SlideShare uma empresa Scribd logo
1 de 13
Baixar para ler offline
Cloudera  impala  Performance  
         Evaluation  
(with  Comparison  to  Hive)	
 
               Dec. 8, 2012
     CELLANT Corp. R&D Strategy Division
              Yukinori SUDA
                @sudabon
About  Cloudera  impala	
 
•  Latest version is 0.3 beta
•  Open-sourced implementation inspired by Google Dremel
   and F1
•  Developed by famous Hadoop distributor Cloudera
•  Bring real-time, ad-hoc query capability on Apache Hadoop
•  Query data stored in HDFS or Apache Hbase
•  Use the same metadata, SQL syntax (HiveQL) as Apache Hive
•  Support for TextFile and SequenceFile as Hive storage format
•  Also support SequenceFile compressed as Snappy, Gzip and
   Bzip
•  Directly access the data through a specialized distributed
   query engine
Architecture	
 
•  State Store works as an impala-state-store(statestored) daemon
•  Query Planner, Query Coordinator and Query Exec Engine work as an
   impalad daemon
System  Environment	
 
      •  Install via Cloudera Manager Free Edition
           Master                                          Slave



・HDFS	
   NameNode	
   SecondaryNameNode	
                                                     ・HDFS	
・MapReduceV1	
                                                                DataNode	
   JobTracker	
                                                            ・MapReduceV1	
・impala	
                                                                     TaskTracker	
   impalad	
                                                               ・impala	
   impala-­‐‑state-­‐‑store	
                                                 impalad	
   (statestored)
        1  Sever                                                               13  Servers

      All  servers  are  connected  with  1Gbps  Ethernet  through  an  L2  switch
Server  Specification	
 

•  CPU
   o  Intel Core 2 Duo 2.13 GHz with Hyper Threading

•  Memory
   o  4GB

•  Disk
   o  7,200 rpm SATA mechanical Hard Disk Drive

•  OS
   o  CentOS 6.2
Benchmark	
 
•  Use CDH4.1 + impala version 0.2 and 0.3
•  Use hivebench in open-sourced benchmark tool
   “HiBench”
   o  https://github.com/hibench
•  Modified datasets to 1/10 scale
   o  Default configuration generates table with 1 billion rows
•  Modified query sentence
   o  Deleted “INSERT INTO TABLE …” to evaluate read-only performance
   o  Deleted “datediff” function (I mistook not to be supported)
•  Combines a few Hive storage format with a few
   compression method
   o  TextFile, SequenceFile, RCFile
   o  No compression, Gzip, Snappy
•  Comparison with job query latency
   o  Average job latency over 5 measurements
Modified  Datasets	
 
•  Uservisits table              •  Rankings table
   o  100 million rows              o  12 million rows
   o  Schema                        o  Schema
        •  sourceIP     string           •  pageURL       string
        •  destURL      string           •  pageRank      int
        •  visitDate    string           •  avgDuration   int
        •  adRevenue    double
        •  userAgent    string
        •  countryCode string
        •  languageCode string
        •  searchWord   string
        •  duration     int
Modified  Query	
 
SELECT                                  ON
  sourceIP,                                (R.pageURL = NUV.destURL)
  sum(adRevenue) as totalRevenue,
  avg(pageRank)
                                        GROUP BY sourceIP
FROM                                    ORDER BY totalRevenue DESC
  rankings R                            LIMIT 1
JOIN (
  SELECT
     sourceIP,
     destURL,
     adRevenue
  FROM
     uservisits UV
  WHERE
     UV.visitData >= ‘1999-01-01’
     AND UV.visitData <= ‘2001-01-01’
  ) NUV
Benchmark  Result  
    (Hive)
Benchmark  Result  
 (impala  0.2)
Benchmark  Result  
 (impala  0.3)
Conclusion	
 
•  Impala is over 10 times faster than MR + Hive
   o  Impala 0.3
        •  SequenceFile compressed as Snappy: 14.337 seconds
   o  Impala 0.2
        •  SequenceFile compressed as Gzip: 19.733 seconds
   o  Hive
        •  RCFile compressed as Snappy: 164.161 seconds

•  Hope that impala version 1.0 included in CDH5
   makes faster
   o  Support RCFile and Trevni columner format
Thank  you

Mais conteúdo relacionado

Mais procurados

Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera, Inc.
 
How Impala Works
How Impala WorksHow Impala Works
How Impala WorksYue Chen
 
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Cloudera, Inc.
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala InternalsDavid Groozman
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaManish Maheshwari
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaJason Shih
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoopmarkgrover
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoopmarkgrover
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practicelarsgeorge
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかToshihiro Suzuki
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentationhadooparchbook
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera, Inc.
 
Low Latency SQL on Hadoop - What's best for your cluster
Low Latency SQL on Hadoop - What's best for your clusterLow Latency SQL on Hadoop - What's best for your cluster
Low Latency SQL on Hadoop - What's best for your clusterDataWorks Summit
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Datamichaelguia
 
Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsYifeng Jiang
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014larsgeorge
 

Mais procurados (20)

Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
 
How Impala Works
How Impala WorksHow Impala Works
How Impala Works
 
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling Impala
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Hive vs. Impala
Hive vs. ImpalaHive vs. Impala
Hive vs. Impala
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoop
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentation
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for Hadoop
 
Cloudera impala
Cloudera impalaCloudera impala
Cloudera impala
 
Low Latency SQL on Hadoop - What's best for your cluster
Low Latency SQL on Hadoop - What's best for your clusterLow Latency SQL on Hadoop - What's best for your cluster
Low Latency SQL on Hadoop - What's best for your cluster
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
 
Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfs
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
 

Semelhante a Performance evaluation of cloudera impala (with Comparison to Hive)

Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HiveYukinori Suda
 
Cloudera Impala presentation
Cloudera Impala presentationCloudera Impala presentation
Cloudera Impala presentationmarkgrover
 
Jan 2013 HUG: Impala - Real-time Queries for Apache Hadoop
Jan 2013 HUG: Impala - Real-time Queries for Apache HadoopJan 2013 HUG: Impala - Real-time Queries for Apache Hadoop
Jan 2013 HUG: Impala - Real-time Queries for Apache HadoopYahoo Developer Network
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Modern Data Stack France
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkJames Chen
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanNarayana B
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformRedge Technologies
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfssusere05ec21
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120Hyoungjun Kim
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015Robbie Strickland
 
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)VMware Tanzu
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Cloudera, Inc.
 
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraCassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraDataStax Academy
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopChicago Hadoop Users Group
 
Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwielerlucenerevolution
 
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietach
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietachPLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietach
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietachPROIDEA
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High PerformanceInderaj (Raj) Bains
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad ranaData Con LA
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesHaohui Mai
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresDataWorks Summit
 

Semelhante a Performance evaluation of cloudera impala (with Comparison to Hive) (20)

Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
 
Cloudera Impala presentation
Cloudera Impala presentationCloudera Impala presentation
Cloudera Impala presentation
 
Jan 2013 HUG: Impala - Real-time Queries for Apache Hadoop
Jan 2013 HUG: Impala - Real-time Queries for Apache HadoopJan 2013 HUG: Impala - Real-time Queries for Apache Hadoop
Jan 2013 HUG: Impala - Real-time Queries for Apache Hadoop
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_Plan
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platform
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)
 
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraCassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
 
Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwieler
 
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietach
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietachPLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietach
PLNOG 18 - Paweł Małachowski - Spy hard czyli regexpem po pakietach
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of Files
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
 

Mais de Yukinori Suda

Hadoop operation chaper 4
Hadoop operation chaper 4Hadoop operation chaper 4
Hadoop operation chaper 4Yukinori Suda
 
Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話Yukinori Suda
 
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービスHadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービスYukinori Suda
 
自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜Yukinori Suda
 
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)Yukinori Suda
 
Evaluation of cloudera impala 1.1
Evaluation of cloudera impala 1.1Evaluation of cloudera impala 1.1
Evaluation of cloudera impala 1.1Yukinori Suda
 
HiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取りHiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取りYukinori Suda
 
Performance Evaluation of Cloudera Impala GA
Performance Evaluation of Cloudera Impala GAPerformance Evaluation of Cloudera Impala GA
Performance Evaluation of Cloudera Impala GAYukinori Suda
 
Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)Yukinori Suda
 

Mais de Yukinori Suda (9)

Hadoop operation chaper 4
Hadoop operation chaper 4Hadoop operation chaper 4
Hadoop operation chaper 4
 
Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話
 
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービスHadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービス
 
自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜
 
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
 
Evaluation of cloudera impala 1.1
Evaluation of cloudera impala 1.1Evaluation of cloudera impala 1.1
Evaluation of cloudera impala 1.1
 
HiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取りHiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取り
 
Performance Evaluation of Cloudera Impala GA
Performance Evaluation of Cloudera Impala GAPerformance Evaluation of Cloudera Impala GA
Performance Evaluation of Cloudera Impala GA
 
Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)
 

Último

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Último (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Performance evaluation of cloudera impala (with Comparison to Hive)

  • 1. Cloudera  impala  Performance   Evaluation   (with  Comparison  to  Hive) Dec. 8, 2012 CELLANT Corp. R&D Strategy Division Yukinori SUDA @sudabon
  • 2. About  Cloudera  impala •  Latest version is 0.3 beta •  Open-sourced implementation inspired by Google Dremel and F1 •  Developed by famous Hadoop distributor Cloudera •  Bring real-time, ad-hoc query capability on Apache Hadoop •  Query data stored in HDFS or Apache Hbase •  Use the same metadata, SQL syntax (HiveQL) as Apache Hive •  Support for TextFile and SequenceFile as Hive storage format •  Also support SequenceFile compressed as Snappy, Gzip and Bzip •  Directly access the data through a specialized distributed query engine
  • 3. Architecture •  State Store works as an impala-state-store(statestored) daemon •  Query Planner, Query Coordinator and Query Exec Engine work as an impalad daemon
  • 4. System  Environment •  Install via Cloudera Manager Free Edition Master Slave ・HDFS NameNode SecondaryNameNode ・HDFS ・MapReduceV1 DataNode JobTracker ・MapReduceV1 ・impala TaskTracker impalad ・impala impala-­‐‑state-­‐‑store impalad (statestored) 1  Sever 13  Servers All  servers  are  connected  with  1Gbps  Ethernet  through  an  L2  switch
  • 5. Server  Specification •  CPU o  Intel Core 2 Duo 2.13 GHz with Hyper Threading •  Memory o  4GB •  Disk o  7,200 rpm SATA mechanical Hard Disk Drive •  OS o  CentOS 6.2
  • 6. Benchmark •  Use CDH4.1 + impala version 0.2 and 0.3 •  Use hivebench in open-sourced benchmark tool “HiBench” o  https://github.com/hibench •  Modified datasets to 1/10 scale o  Default configuration generates table with 1 billion rows •  Modified query sentence o  Deleted “INSERT INTO TABLE …” to evaluate read-only performance o  Deleted “datediff” function (I mistook not to be supported) •  Combines a few Hive storage format with a few compression method o  TextFile, SequenceFile, RCFile o  No compression, Gzip, Snappy •  Comparison with job query latency o  Average job latency over 5 measurements
  • 7. Modified  Datasets •  Uservisits table •  Rankings table o  100 million rows o  12 million rows o  Schema o  Schema •  sourceIP string •  pageURL string •  destURL string •  pageRank int •  visitDate string •  avgDuration int •  adRevenue double •  userAgent string •  countryCode string •  languageCode string •  searchWord string •  duration int
  • 8. Modified  Query SELECT ON sourceIP, (R.pageURL = NUV.destURL) sum(adRevenue) as totalRevenue, avg(pageRank) GROUP BY sourceIP FROM ORDER BY totalRevenue DESC rankings R LIMIT 1 JOIN ( SELECT sourceIP, destURL, adRevenue FROM uservisits UV WHERE UV.visitData >= ‘1999-01-01’ AND UV.visitData <= ‘2001-01-01’ ) NUV
  • 10. Benchmark  Result   (impala  0.2)
  • 11. Benchmark  Result   (impala  0.3)
  • 12. Conclusion •  Impala is over 10 times faster than MR + Hive o  Impala 0.3 •  SequenceFile compressed as Snappy: 14.337 seconds o  Impala 0.2 •  SequenceFile compressed as Gzip: 19.733 seconds o  Hive •  RCFile compressed as Snappy: 164.161 seconds •  Hope that impala version 1.0 included in CDH5 makes faster o  Support RCFile and Trevni columner format