Cloudera Impala

Justin Erickson | Product Manager

November 2012
  
Why Data Scientists Love Hadoop

•   Massive volumes of data
•   Data preparation & analytics in 1 environment
•   Highly flexible environment for creating & testing machine learning models
•   10% of the cost/TB under management
  


©2012 Cloudera, Inc. All Rights Reserved.
  
Hadoop Use Cases Moving to Real-Time

•   Already query Hadoop using Hive
•   Already load data into CDH every 90 mins or less
•   Already use HBase for real-time data access

Source: Cloudera customer survey August 2012

But Hadoop Isn’t Fast Enough

•   Need faster queries on Hadoop data
•   Move data from Hadoop to RDBMS for interactive SQL
•   See value today in consolidating to a single platform

Source: Cloudera customer survey August 2012

Beyond Batch – The Next Stage for Hadoop

HADOOP TODAY IS TOO SLOW
•   MapReduce is batch
•   Simple queries can take minutes / tens of minutes

CURRENT DATA MANAGEMENT IS TOO COMPLEX
•   Optimized for rigid schemas & special purpose applications
•   Redundant data storage & processes
•   Very expensive systems: $20K-150K / TB
                                                          	
  
Cloudera Enterprise RTQ

Real-Time Query for Data Stored in Hadoop
Powered by Cloudera Impala.

•   Supports Hive SQL
•   4-30X faster than Hive over MapReduce
•   Supports multiple storage engines & file formats
•   Uses existing drivers, integrates with existing metastore, works with leading BI tools
•   Flexible, cost-effective, no lock-in
•   Deploy & operate with Cloudera Manager

Cloudera Now Powered by Impala

[Diagram: BEFORE IMPALA vs. WITH IMPALA. Layer labels: USER INTERFACE, BATCH PROCESSING, REAL-TIME ACCESS.]

•   Unified Storage:
      –    Supports HDFS and HBase
      –    Flexible file formats
•   Unified Metastore
•   Unified Security
•   Unified Client Interfaces:
      –    ODBC, SQL syntax, Hue Beeswax

•   With Impala:
      –    Real-time SQL queries
      –    Native distributed query engine
      –    Optimized for low latency
•   Provides:
      –    Answers as fast as you can ask
      –    The ability for everyone to ask questions of all data
      –    Big data storage and analytics together

Impala beta features

Today (Cloudera Impala 0.1):
•   Nearly all of Hive's SQL, including insert, join, and subqueries
•   Query results 4-30X faster than Hive
•   Same open Hive metadata model => easy to create & change schema
•   Support for HDFS and HBase storage
•   HDFS file formats: TextFile, SequenceFile
•   HDFS compression: Snappy, GZIP, BZIP
•   ODBC driver and Hue Beeswax shared with Hive
•   Separate CLI from Hive

Next few months:
•   Support for Avro, RCFile & LZO compressed text
•   Additional OS support
•   Trevni columnar format
•   JDBC driver
•   DDL
•   Straggler handling
•   Increased join performance


Impala v0.1 SQL (HiveQL)

•    Select
      –    Boolean, tinyint, smallint, int, bigint, float, double, timestamp, string
      –    All, distinct
      –    Subqueries (in from clause)
      –    Where, group by, having
      –    Order by (with limit initially)
      –    Joins (left, right, full, outer), multi-table, subquery
      –    Union all
      –    Limit
      –    External tables
      –    Relational, arithmetic, logical operators
      –    Math, collection, cast, date, conditional, string, timestamp built-ins (e.g. count, sum, cast, case, like, in, between, coalesce)

•    Insert into
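
To make the listed query shapes concrete, here is a minimal sketch that combines several of them in one statement: a subquery in the FROM clause, WHERE with BETWEEN, GROUP BY with HAVING, a LEFT OUTER JOIN, and ORDER BY paired with LIMIT. It runs against Python's built-in SQLite purely as a stand-in engine, which is an assumption made for runnability; it is not Impala or Hive, and the `orders` and `customers` tables and their contents are hypothetical.

```python
import sqlite3

# SQLite is only a stand-in engine for illustration; it is not Impala or Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL, region TEXT);
    INSERT INTO customers VALUES (1, 'acme'), (2, 'globex'), (3, 'initech');
    INSERT INTO orders VALUES
        (1, 120.0, 'west'), (1, 80.0, 'east'),
        (2, 40.0, 'west'), (3, 300.0, 'east'), (3, 10.0, 'west');
""")

# Subquery in FROM, WHERE/BETWEEN, GROUP BY/HAVING, outer join,
# and ORDER BY combined with LIMIT.
QUERY = """
    SELECT c.name AS name, t.total AS total
    FROM (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        WHERE amount BETWEEN 20 AND 500
        GROUP BY customer_id
        HAVING SUM(amount) > 50
    ) t
    LEFT OUTER JOIN customers c ON c.id = t.customer_id
    ORDER BY t.total DESC
    LIMIT 2
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

The ORDER BY is deliberately paired with a LIMIT, mirroring the "with limit initially" note on the slide.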
  

Cloudera Impala Details

[Architecture diagram. Client: SQL App over ODBC. Shared services: Hive Metastore, YARN, HDFS NN, State Store. Each of three nodes runs a Query Planner, Query Coordinator, and Query Exec Engine alongside an HDFS DN and HBase. Callouts: Common Hive SQL and interface; Unified metadata and scheduler; Fully MPP Distributed; Local Direct Reads.]
  

Cloudera Impala Details

[Architecture diagram, repeated. Callouts: Common Hive SQL and interface; SQL Request.]
  


Cloudera Impala Details

[Architecture diagram, repeated. Callout: Unified metadata and scheduler.]
  


Cloudera Impala Details

[Architecture diagram, repeated. Callout: Fully MPP Distributed.]
  


Cloudera Impala Details

[Architecture diagram, repeated. Callout: Local Direct Reads.]
  

Cloudera Impala Details

[Architecture diagram, repeated. Callouts: SQL Results; In Memory Transfers.]
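
The flow these slides walk through (a SQL request reaches one node, work is distributed to the exec engines, each engine reads its local data, and results come back via in-memory transfers) can be sketched as a scatter-gather aggregation. This is a toy model under stated assumptions, not Impala's implementation: the node names, the `NODE_DATA` rows, and the `exec_engine` and `coordinator` functions are all hypothetical, and everything runs in one process.

```python
from collections import Counter

# Hypothetical per-node data: each list stands for rows stored locally on one data node.
NODE_DATA = {
    "node1": [("west", 120.0), ("east", 80.0)],
    "node2": [("west", 40.0)],
    "node3": [("east", 300.0), ("west", 10.0)],
}


def exec_engine(local_rows):
    """Compute a partial SUM(amount) GROUP BY region over local rows only."""
    partial = Counter()
    for region, amount in local_rows:
        partial[region] += amount
    return partial


def coordinator(node_data):
    """Run the same fragment on every node, then merge the partial results in memory."""
    merged = Counter()
    for rows in node_data.values():
        merged.update(exec_engine(rows))  # Counter.update adds values per key
    return dict(merged)


result = coordinator(NODE_DATA)
print(result)
```

Only the small per-region partial sums cross node boundaries in this model; the raw rows are read where they are stored.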
  


Impala and Hive

•     Shared with Hive:
        –    Metadata (table definitions)
        –    ODBC driver
        –    Hue Beeswax
        –    SQL syntax (HiveQL)
        –    Flexible file formats
        –    Machine pool

•     Improvements:
        –    Purpose-built query engine direct on HDFS and HBase
        –    No JVM and MapReduce
        –    In-memory data transfers
        –    Low-latency scheduler
        –    Native distributed relational query engine
        –    Trevni columnar format (after v0.1)
  




Advantages of Our Approach

•    No high-latency MapReduce batch processing
•    Local processing avoids network bottlenecks
•    No costly data format conversion overhead
•    All data immediately query-able
•    Single machine pool to scale
•    All machines available to both Impala and MapReduce
•    Single, open, and unified metadata and scheduler

[Diagram with three panels: MapReduce, Remote Query, Side Storage. Component labels: Hive, MR, Query Node, Query Engine, NN, DN, HDFS.]


Google Dremel and Impala

•  What is Dremel:
      –  Columnar storage for data with nested structures
      –  Distributed scalable aggregation on top of that

•  Columnar storage in Hadoop: Trevni
      –  New columnar format created by Doug Cutting
      –  Stores data in appropriate native/binary types
      –  Will also store nested structures similar to Dremel's ColumnIO

•  Distributed aggregation: Impala

•  Impala plus Trevni: a superset of the published version of Dremel (which didn't support joins)
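
As a rough illustration of why columnar storage suits aggregation, the sketch below pivots a few row records into per-column lists and then sums a single column without touching the others. It shows the general idea only: it is not the Trevni file format, it ignores nested structures and typed binary encoding, and the records and the `to_columns` helper are hypothetical.

```python
# Hypothetical row-oriented records, as they might sit in a text file.
rows = [
    {"user": "a", "country": "US", "bytes": 100},
    {"user": "b", "country": "DE", "bytes": 250},
    {"user": "c", "country": "US", "bytes": 50},
]


def to_columns(records):
    """Pivot row records into one contiguous list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one column reads only that column's values.
total_bytes = sum(columns["bytes"])
print(total_bytes)
```

In a row layout the same sum would have to step over the `user` and `country` fields of every record.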
  

Benefits of Cloudera Impala

Real-Time Query for Data Stored in Hadoop

• Get answers as fast as you can ask questions
• Interactive analytics directly on source data
• No jumping between data silos
• Reduce duplicate storage with EDW
• Reduce data movement for interactive analysis
• Leverage existing tools and employee skills
• Ask questions of all your data
• No information loss from aggregation or conforming to relational schemas for analysis
• Single metadata store from origination through analysis
• No need to hunt through multiple data silos
  

Validated Beta Partners

[Partner logos not preserved]
  




Impala: Real-time Queries in Hadoop

Mais conteúdo relacionado

Mais procurados

ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in ImpalaCloudera, Inc.
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisKnoldus Inc.
 
2023 COSCUP - Whats new in PostgreSQL 16
2023 COSCUP - Whats new in PostgreSQL 162023 COSCUP - Whats new in PostgreSQL 16
2023 COSCUP - Whats new in PostgreSQL 16José Lin
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera, Inc.
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheusCeline George
 
Locondo 20190215@ec tech_group
Locondo 20190215@ec tech_groupLocondo 20190215@ec tech_group
Locondo 20190215@ec tech_groupShinya Sugiyama
 
Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)Amazon Web Services
 
Kafka vs Pulsar @KafkaMeetup_20180316
Kafka vs Pulsar @KafkaMeetup_20180316Kafka vs Pulsar @KafkaMeetup_20180316
Kafka vs Pulsar @KafkaMeetup_20180316Nozomi Kurihara
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllMichael Mior
 
Luft : 10억 데이터를 10초만에 쿼리하는 DB 개발기
Luft : 10억 데이터를 10초만에 쿼리하는 DB 개발기Luft : 10억 데이터를 10초만에 쿼리하는 DB 개발기
Luft : 10억 데이터를 10초만에 쿼리하는 DB 개발기Hyojun Kim
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)NAVER D2
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
FDW-based Sharding Update and Future
FDW-based Sharding Update and FutureFDW-based Sharding Update and Future
FDW-based Sharding Update and FutureMasahiko Sawada
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaSridhar Kumar N
 
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~NTT DATA OSS Professional Services
 

Mais procurados (20)

ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in Impala
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
2023 COSCUP - Whats new in PostgreSQL 16
2023 COSCUP - Whats new in PostgreSQL 162023 COSCUP - Whats new in PostgreSQL 16
2023 COSCUP - Whats new in PostgreSQL 16
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera cluster
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheus
 
Locondo 20190215@ec tech_group
Locondo 20190215@ec tech_groupLocondo 20190215@ec tech_group
Locondo 20190215@ec tech_group
 
Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
FLiP Into Trino
FLiP Into TrinoFLiP Into Trino
FLiP Into Trino
 
Kafka vs Pulsar @KafkaMeetup_20180316
Kafka vs Pulsar @KafkaMeetup_20180316Kafka vs Pulsar @KafkaMeetup_20180316
Kafka vs Pulsar @KafkaMeetup_20180316
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Luft : 10억 데이터를 10초만에 쿼리하는 DB 개발기
Luft : 10억 데이터를 10초만에 쿼리하는 DB 개발기Luft : 10억 데이터를 10초만에 쿼리하는 DB 개발기
Luft : 10억 데이터를 10초만에 쿼리하는 DB 개발기
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
FDW-based Sharding Update and Future
FDW-based Sharding Update and FutureFDW-based Sharding Update and Future
FDW-based Sharding Update and Future
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
 
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
 

Destaque

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopCloudera, Inc.
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera, Inc.
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaData Science London
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera, Inc.
 
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data AdvantageWebinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data AdvantageCloudera, Inc.
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHortonworks
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentationhadooparchbook
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera, Inc.
 
Introduction to Apache HBase Training
Introduction to Apache HBase TrainingIntroduction to Apache HBase Training
Introduction to Apache HBase TrainingCloudera, Inc.
 
Data Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopData Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopJSI
 
Impala use case @ Zoosk
Impala use case @ ZooskImpala use case @ Zoosk
Impala use case @ ZooskCloudera, Inc.
 
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerApache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerStratio
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala InternalsDavid Groozman
 

Destaque (20)

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for Hadoop
 
Cloudera impala
Cloudera impalaCloudera impala
Cloudera impala
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera Impala
 
Bhan hiv
Bhan hivBhan hiv
Bhan hiv
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
 
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
 
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data AdvantageWebinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentation
 
ImpalaToGo use case
ImpalaToGo use caseImpalaToGo use case
ImpalaToGo use case
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
 
Introduction to Apache HBase Training
Introduction to Apache HBase TrainingIntroduction to Apache HBase Training
Introduction to Apache HBase Training
 
Data Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopData Visualization Design Best Practices Workshop
Data Visualization Design Best Practices Workshop
 
Impala use case @ Zoosk
Impala use case @ ZooskImpala use case @ Zoosk
Impala use case @ Zoosk
 
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerApache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 

Impala: Real-time Queries in Hadoop

  • 1. Cloudera Impala
      Justin Erickson | Product Manager
      November 2012
  • 2. Why Data Scientists Love Hadoop
      • Massive volumes of data
      • Data preparation & analytics in 1 environment
      • Highly flexible environment for creating & testing machine learning models
      • 10% the cost/TB under management
      ©2012 Cloudera, Inc. All Rights Reserved.
  • 3. Hadoop Use Cases Moving to Real-Time
      • Already query Hadoop using Hive
      • Already load data into CDH every 90 mins or less
      • Already use HBase for real-time data access
      Source: Cloudera customer survey, August 2012
  • 4. But Hadoop Isn't Fast Enough
      • Need faster queries on Hadoop data
      • Move data from Hadoop to RDBMS for interactive SQL
      • See value today in consolidating to a single platform
      Source: Cloudera customer survey, August 2012
  • 5. Beyond Batch – The Next Stage for Hadoop
      HADOOP TODAY IS TOO SLOW
      • MapReduce is batch
      • Simple queries can take minutes / tens of minutes
      CURRENT DATA MANAGEMENT IS TOO COMPLEX
      • Optimized for rigid schemas & special purpose applications
      • Redundant data storage & processes
      • Very expensive systems: $20K-150K / TB
  • 6. Cloudera Enterprise RTQ
      Real-Time Query for Data Stored in Hadoop. Powered by Cloudera Impala.
      • Supports Hive SQL
      • 4-30X faster than Hive over MapReduce
      • Supports multiple storage engines & file formats
      • Uses existing drivers, integrates with existing metastore, works with leading BI tools
      • Flexible, cost-effective, no lock-in
      • Deploy & operate with Cloudera Manager
  • 7. Cloudera Now Powered by Impala
      Diagram labels: BEFORE IMPALA, WITH IMPALA; User Interface, Batch Processing, Real-Time Access
      • Unified Storage:
          – Supports HDFS and HBase
          – Flexible file formats
      • Unified Metastore
      • Unified Security
      • Unified Client Interfaces:
          – ODBC, SQL syntax, Hue Beeswax
      • With Impala:
          – Real-time SQL queries
          – Native distributed query engine
          – Optimized for low latency
      • Provides:
          – Answers as fast as you can ask
          – The ability for everyone to ask questions of all data
          – Big data storage and analytics together
  • 8. Impala beta features
      Today (Cloudera Impala 0.1):
      • Nearly all of Hive's SQL, including insert, join, and subqueries
      • Query results 4-30X faster than Hive
      • Same open Hive metadata model => easy to create & change schema
      • Support for HDFS and HBase storage
      • HDFS file formats: TextFile, SequenceFile
      • HDFS compression: Snappy, GZIP, BZIP
      • Common ODBC driver and Hue Beeswax with Hive
      • Separate CLI from Hive
      Next few months:
      • Support for Avro, RCFile & LZO compressed text
      • Additional OS support
      • Trevni columnar format
      • JDBC driver
      • DDL
      • Straggler handling
      • Increased join performance
  • 9. Impala v0.1 SQL (HiveQL)
      • Select
          – Boolean, tinyint, smallint, int, bigint, float, double, timestamp, string
          – All, distinct
          – Subqueries (in from clause)
          – Where, group by, having
          – Order by (with limit initially)
          – Joins (left, right, full, outer), multi-table, subquery
          – Union all
          – Limit
          – External tables
          – Relational, arithmetic, logical operators
          – Math, collection, cast, date, conditional, string, timestamp built-ins (e.g. count, sum, cast, case, like, in, between, coalesce)
      • Insert into
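
To make the slide 9 feature list concrete, here is a rough sketch of one query that combines several of the listed constructs: a subquery in the FROM clause, a join, WHERE, GROUP BY, HAVING, and ORDER BY with LIMIT. It runs against Python's built-in SQLite purely as a stand-in so that it can be executed without a cluster. The tables `customers` and `orders`, their columns, and the data are all invented for illustration, and the statement has not been checked against Impala 0.1 itself.

```python
import sqlite3

# Stand-in engine: SQLite in memory, NOT Impala. Tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
    INSERT INTO orders VALUES (1, 10.0), (1, 15.0), (2, 7.5), (3, 40.0);
""")

# Subquery in FROM, join, GROUP BY, HAVING, and ORDER BY paired with LIMIT
# (the slide notes that ORDER BY initially requires a LIMIT).
QUERY = """
    SELECT c.region, COUNT(*) AS n_orders, SUM(t.amount) AS total
    FROM (SELECT customer_id, amount FROM orders WHERE amount > 5) t
    JOIN customers c ON c.id = t.customer_id
    GROUP BY c.region
    HAVING SUM(t.amount) > 5
    ORDER BY total DESC
    LIMIT 10
"""

rows = conn.execute(QUERY).fetchall()
for region, n_orders, total in rows:
    print(region, n_orders, total)
```

Against a real deployment the same statement would presumably be submitted through the ODBC driver, Hue Beeswax, or the separate Impala CLI mentioned on slide 8.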
  • 10. Cloudera Impala Details
      [Architecture diagram]
      • Common Hive SQL and interface: SQL App, ODBC
      • Unified metadata and scheduler: Hive Metastore, YARN, HDFS NN, State Store
      • Fully MPP, distributed: each node shown runs a Query Planner, Query Coordinator, and Query Exec Engine alongside an HDFS DN and HBase
      • Local Direct Reads
  • 11. Cloudera Impala Details
      [Architecture diagram as on slide 10, highlighting: Common Hive SQL and interface; SQL Request]
  • 12. Cloudera Impala Details
      [Architecture diagram as on slide 10, highlighting: Unified metadata and scheduler]
  • 13. Cloudera Impala Details
      [Architecture diagram as on slide 10, highlighting: Fully MPP Distributed]
  • 14. Cloudera Impala Details
      [Architecture diagram as on slide 10, highlighting: Local Direct Reads]
  • 15. Cloudera Impala Details
      [Architecture diagram as on slide 10, highlighting: In Memory Transfers; SQL Results]
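Slides 10 to 15 trace a query through the architecture: a SQL request arrives, the work is distributed, each execution engine reads data local to its node, and results flow back. A rough sketch of that scatter-gather pattern, for something like `SELECT key, COUNT(*), SUM(value) ... GROUP BY key`, follows. It is a single-process toy: the names `NODE_LOCAL_ROWS`, `exec_fragment`, and `coordinate` are hypothetical, a thread pool stands in for real nodes, and nothing here reflects how Impala is actually implemented.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node data: each "node" only sees the rows stored locally.
NODE_LOCAL_ROWS = {
    "node1": [("a", 1.0), ("b", 2.0)],
    "node2": [("a", 3.0), ("c", 4.0)],
    "node3": [("b", 5.0), ("a", 6.0)],
}

def exec_fragment(node):
    """Partial aggregation over one node's local rows: key -> [count, sum]."""
    partial = defaultdict(lambda: [0, 0.0])
    for key, value in NODE_LOCAL_ROWS[node]:
        partial[key][0] += 1
        partial[key][1] += value
    return dict(partial)

def coordinate(nodes):
    """Scatter the fragment to every node, then merge the partial results."""
    merged = defaultdict(lambda: [0, 0.0])
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        for partial in pool.map(exec_fragment, nodes):
            for key, (count, total) in partial.items():
                merged[key][0] += count
                merged[key][1] += total
    return {key: tuple(agg) for key, agg in sorted(merged.items())}

result = coordinate(list(NODE_LOCAL_ROWS))
print(result)
```

Only the small per-key partials cross between workers and the coordinator, which is the intuition behind local processing avoiding network bottlenecks.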
  • 16. Impala and Hive
      • Shared with Hive:
          – Metadata (table definitions)
          – ODBC driver
          – Hue Beeswax
          – SQL syntax (HiveQL)
          – Flexible file formats
          – Machine pool
      • Improvements:
          – Purpose-built query engine direct on HDFS and HBase
          – No JVM and MapReduce
          – In-memory data transfers
          – Low-latency scheduler
          – Native distributed relational query engine
          – Trevni columnar format (after v0.1)
  • 17. Advantages of Our Approach
      • No high-latency MapReduce batch processing
      • Local processing avoids network bottlenecks
      • No costly data format conversion overhead
      • All data immediately query-able
      • Single machine pool to scale
      • All machines available to both Impala and MapReduce
      • Single, open, and unified metadata and scheduler
      [Diagram of alternative approaches; labels: MapReduce, Remote Query, Side Storage, Query Node, Query Engine, Hive, MR, HDFS, NN, DN]
  • 18. Google Dremel and Impala
      • What is Dremel:
          – Columnar storage for data with nested structures
          – Distributed scalable aggregation on top of that
      • Columnar storage in Hadoop: Trevni
          – New columnar format created by Doug Cutting
          – Stores data in appropriate native/binary types
          – Will also store nested structures similar to Dremel's ColumnIO
      • Distributed aggregation: Impala
      • Impala plus Trevni: a superset of the published version of Dremel (which didn't support joins)
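As a small illustration of the columnar idea on slide 18, the sketch below pivots row-oriented records into one typed array per column, so an aggregate reads only the column it needs. It is not Trevni's file format: the record fields and the `to_columns` helper are hypothetical, and only flat records are handled (the nested structures Dremel supports are out of scope).

```python
from array import array

# Hypothetical flat records; nested structures (as in Dremel) are not covered.
rows = [
    {"user": "ann", "clicks": 3, "spend": 1.5},
    {"user": "bob", "clicks": 5, "spend": 0.0},
    {"user": "cy", "clicks": 2, "spend": 4.25},
]

def to_columns(records):
    """Pivot row-oriented records into one sequence per column."""
    return {
        "user": [r["user"] for r in records],                   # strings
        "clicks": array("q", (r["clicks"] for r in records)),   # native 64-bit ints
        "spend": array("d", (r["spend"] for r in records)),     # native doubles
    }

columns = to_columns(rows)

# An aggregate over one column touches only that column's values.
total_clicks = sum(columns["clicks"])
total_spend = sum(columns["spend"])
print(total_clicks, total_spend)
```

Storing each column in a native binary type, rather than as text, mirrors the slide's point about "appropriate native/binary types".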
  • 19. Benefits of Cloudera Impala
      Real-Time Query for Data Stored in Hadoop
      • Get answers as fast as you can ask questions
      • Interactive analytics directly on source data
      • No jumping between data silos
      • Reduce duplicate storage with EDW
      • Reduce data movement for interactive analysis
      • Leverage existing tools and employee skills
      • Ask questions of all your data
      • No information loss from aggregation or conforming to relational schemas for analysis
      • Single metadata store from origination through analysis
      • No need to hunt through multiple data silos
  • 20. Validated Beta Partners