SlideShare uma empresa Scribd logo
1 de 15
Baixar para ler offline
Which Hadoop SQL Engine Leads the Herd? 
Stewart Tate 
STSM, Chief Designer, IBM Big Data Cluster 
Lead Architect BigInsights Performance 
IBM Silicon Valley Laboratory ∙ San Jose ∙ California 
tates@us.ibm.com
2 
© 2013 IBM Corporation 
Acknowledgements and Disclaimers 
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. 
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. 
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. 
© Copyright IBM Corporation 2014. All rights reserved. 
— U.S. Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. 
IBM, the IBM logo, ibm.com, BigInsights, and Big SQL are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or TM), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at 
•“Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml 
•TPC Benchmark, TPC-DS, and QphDS are trademarks of Transaction Processing Performance Council 
•Cloudera, the Cloudera logo, Cloudera Impala are trademarks of Cloudera. 
•Other company, product, or service names may be trademarks or service marks of others. 
2
3 
© 2013 IBM Corporation 
Our journey begins… 
•Evaluating three SQL engines in major Hadoop Distributions 
IBM BigInsights 3.0.0.1 Big SQL v3 
Cloudera (CDH5) Impala 1.4 
Hortonworks (HDP 2.1) Hive 0.13 
•Workload 
Simulated Enterprise Retail Operation using 99 Queries 
Hadoop-DS based on TPC-DS benchmark
4 
© 2013 IBM Corporation 
Three Identical Hadoop Clusters… 
Management Node 
One x3650 M4 BD 
Two E5-2680 v2 2.8GHz 10-core 
128GB RAM, 1866MHz 
2TB 3.5” HDD 
Dual-port 10GbE 
RHEL 6.4 
EXT4/HDFS/Parquet/ORC 
Data Nodes 
Sixteen x3650 M4 BD 
Two E5-2680 2.8GHz 10-core 
128GB RAM, 1866MHz 
Ten 2TB 3.5” HDD 
Four 120GB 3.5” SSD 
Dual-port 10GbE 
RHEL 6.4 
EXT4/HDFS/Parquet/ORC 
Using 
Lenovo Servers 
designed for Hadoop
5 
© 2013 IBM Corporation Benchmark Back Office…
6 © 2013 IBM Corporation 
Live access to Audited & Real-time results 
http://app.insightibm.com
7 © 2013 IBM Corporation 
Which SQL Engine Leads the Herd? 
View results from each Hadoop Distro via our mobile app 
http://app.insightibm.com
8 
© 2013 IBM Corporation 
Big SQL – Runs 100% of the queries Key points 
Impala & Hive require many queries to be re-written, some significantly 
Owing to various restrictions, some queries could not be re- written or failed at run-time 
Re-writing queries in a benchmark scenario where results are known is one thing – doing this against real databases in production is another
9 
© 2013 IBM Corporation 
Hadoop-DS benchmark – Single user performance @ 10TB 
Big SQL is 3.6x faster than Impala and 5.4x faster than Hive 0.13 
for single query stream using 46 common queries
10 
© 2013 IBM Corporation 
Hadoop-DS benchmark – multi-user performance @ 10TB 
With 4 streams, Big SQL is 2.1x faster than Impala and 8.5x faster than Hive 0.13 
for 4 query streams using 46 common queries 
**See Speaker notes for disclaimer
11 
© 2013 IBM Corporation 
Hadoop-DS benchmark – Big SQL with 99 queries @ 30TB 
Big SQL completed 4 concurrent query streams @30TB in 1.8x time of a single query stream Big SQL completed 396 queries in only 1.8x time of 99 queries
12 
© 2013 IBM Corporation 
Thank you!
13 
© 2013 IBM Corporation 
About TPC-DS Queries 
•The queries are diverse, and many are complex 
•Reflecting real business needs – a random sample: 
Find customers returning items more frequently than normal (q1) 
States with customers most ammenable to premium priced offers (q6) 
List key metrics for unadvertised in-store promotions by demographic (q7) 
Identify similar customers purchasing through multiple sales outlets (q10) 
Find customers shifting purchasing habits to the web (q11) 
Key measures for catalog sales fulfilled from an alternate warehouse (q16) 
Find frequently sold items and the circumstances under which repeat sales take place (q23) 
Understand the products and retail locations where items are likely to be return and subsequently re-purchased via the catalog (q29) 
Display customers making significant local purchases comparing to buying potential based on dependents and vehicles owned (q34)
14 
© 2013 IBM Corporation 
Hadoop-DS benchmark 
•Aim: To provide the fairest and most meaningful comparison of SQL over Hadoop solutions so far 
•Hadoop-DS benchmark is based on the TPC-DS* benchmark. Key deviations: 
No data maintenance or data persistence phases 
•Not possible across all vendors 
Uses a common query set across all solutions 
•The sub-set of queries that all vendors can successfully execute at that scale factor 
•Queries are not cherry picked 
•It is the most complete TPCDS like benchmark executed to date 
•Worked with TPC certified auditor to review the benchmark 
•Is analogues to porting a relational workload to SQL on Hadoop Bringing Order out of Chaos
15 
© 2013 IBM Corporation 
Big SQL 3.0 – Key Performance Features 
•Advanced re-write and cost based optimizer backed by decades of IBM R&D 
•Optimized HDFS readers for different storage formats 
Native readers 
•Self tuning memory manager 
Optimize memory allocation between Big SQL consumers depending on the workload 
•Informational Constraints 
•Advanced Statistics 
•Resource sharing and WLM

Mais conteúdo relacionado

Mais procurados

Deploying Massive Scale Graphs for Realtime Insights
Deploying Massive Scale Graphs for Realtime InsightsDeploying Massive Scale Graphs for Realtime Insights
Deploying Massive Scale Graphs for Realtime InsightsNeo4j
 
DevOps Culture & Enablement with Postgres Plus Cloud Database
DevOps Culture & Enablement with Postgres Plus Cloud DatabaseDevOps Culture & Enablement with Postgres Plus Cloud Database
DevOps Culture & Enablement with Postgres Plus Cloud DatabaseEDB
 
NoSQL on ACID: Meet Unstructured Postgres
NoSQL on ACID: Meet Unstructured PostgresNoSQL on ACID: Meet Unstructured Postgres
NoSQL on ACID: Meet Unstructured PostgresEDB
 
IBM Power Systems Outlook and Roadmap
IBM Power Systems Outlook and RoadmapIBM Power Systems Outlook and Roadmap
IBM Power Systems Outlook and RoadmapDavid Spurway
 
Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...
Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...
Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...EDB
 
Powerplay: Postgres and Lenovo for the Best Performance & Savings
Powerplay: Postgres and Lenovo for the Best Performance & SavingsPowerplay: Postgres and Lenovo for the Best Performance & Savings
Powerplay: Postgres and Lenovo for the Best Performance & SavingsEDB
 
An overview of reference architectures for Postgres
An overview of reference architectures for PostgresAn overview of reference architectures for Postgres
An overview of reference architectures for PostgresEDB
 
IBM i and Linux case studies
IBM i and Linux case studiesIBM i and Linux case studies
IBM i and Linux case studiesDavid Spurway
 
The Value of Postgres to IT and Finance
The Value of Postgres to IT and FinanceThe Value of Postgres to IT and Finance
The Value of Postgres to IT and FinanceEDB
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j
 
Introduction to Machine Learning on IBM Power Systems
Introduction to Machine Learning on IBM Power SystemsIntroduction to Machine Learning on IBM Power Systems
Introduction to Machine Learning on IBM Power SystemsDavid Spurway
 
Hitachi Accelerated Flash Storage
Hitachi Accelerated Flash Storage Hitachi Accelerated Flash Storage
Hitachi Accelerated Flash Storage Hitachi Vantara
 
Informix 1210 feature overview
Informix 1210 feature overviewInformix 1210 feature overview
Informix 1210 feature overviewJohn Miller
 
Migrations, Health Checks, and Support Experiences - Postgres from the Servic...
Migrations, Health Checks, and Support Experiences - Postgres from the Servic...Migrations, Health Checks, and Support Experiences - Postgres from the Servic...
Migrations, Health Checks, and Support Experiences - Postgres from the Servic...EDB
 
Consolidate your SAP System landscape Teched && d-code 2014
Consolidate your SAP System landscape Teched && d-code 2014Consolidate your SAP System landscape Teched && d-code 2014
Consolidate your SAP System landscape Teched && d-code 2014Goetz Lessmann
 
Automating a PostgreSQL High Availability Architecture with Ansible
Automating a PostgreSQL High Availability Architecture with AnsibleAutomating a PostgreSQL High Availability Architecture with Ansible
Automating a PostgreSQL High Availability Architecture with AnsibleEDB
 
Application Report: Virtualizing Tier-1 Workloads using FC SANs
Application Report: Virtualizing Tier-1 Workloads using FC SANsApplication Report: Virtualizing Tier-1 Workloads using FC SANs
Application Report: Virtualizing Tier-1 Workloads using FC SANsIT Brand Pulse
 
sap hana|sap hana database| Introduction to sap hana
sap hana|sap hana database| Introduction to sap hanasap hana|sap hana database| Introduction to sap hana
sap hana|sap hana database| Introduction to sap hanaJames L. Lee
 
Overcoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQLOvercoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQLEDB
 

Mais procurados (20)

Deploying Massive Scale Graphs for Realtime Insights
Deploying Massive Scale Graphs for Realtime InsightsDeploying Massive Scale Graphs for Realtime Insights
Deploying Massive Scale Graphs for Realtime Insights
 
DevOps Culture & Enablement with Postgres Plus Cloud Database
DevOps Culture & Enablement with Postgres Plus Cloud DatabaseDevOps Culture & Enablement with Postgres Plus Cloud Database
DevOps Culture & Enablement with Postgres Plus Cloud Database
 
NoSQL on ACID: Meet Unstructured Postgres
NoSQL on ACID: Meet Unstructured PostgresNoSQL on ACID: Meet Unstructured Postgres
NoSQL on ACID: Meet Unstructured Postgres
 
IBM Power Systems Outlook and Roadmap
IBM Power Systems Outlook and RoadmapIBM Power Systems Outlook and Roadmap
IBM Power Systems Outlook and Roadmap
 
Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...
Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...
Introducing Data Redaction - an enabler to data security in EDB Postgres Adva...
 
Powerplay: Postgres and Lenovo for the Best Performance & Savings
Powerplay: Postgres and Lenovo for the Best Performance & SavingsPowerplay: Postgres and Lenovo for the Best Performance & Savings
Powerplay: Postgres and Lenovo for the Best Performance & Savings
 
An overview of reference architectures for Postgres
An overview of reference architectures for PostgresAn overview of reference architectures for Postgres
An overview of reference architectures for Postgres
 
IBM i and Linux case studies
IBM i and Linux case studiesIBM i and Linux case studies
IBM i and Linux case studies
 
The Value of Postgres to IT and Finance
The Value of Postgres to IT and FinanceThe Value of Postgres to IT and Finance
The Value of Postgres to IT and Finance
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in Graphdatenbanken
 
Introduction to Machine Learning on IBM Power Systems
Introduction to Machine Learning on IBM Power SystemsIntroduction to Machine Learning on IBM Power Systems
Introduction to Machine Learning on IBM Power Systems
 
AriZona Beverage Company sucess case
AriZona Beverage Company sucess caseAriZona Beverage Company sucess case
AriZona Beverage Company sucess case
 
Hitachi Accelerated Flash Storage
Hitachi Accelerated Flash Storage Hitachi Accelerated Flash Storage
Hitachi Accelerated Flash Storage
 
Informix 1210 feature overview
Informix 1210 feature overviewInformix 1210 feature overview
Informix 1210 feature overview
 
Migrations, Health Checks, and Support Experiences - Postgres from the Servic...
Migrations, Health Checks, and Support Experiences - Postgres from the Servic...Migrations, Health Checks, and Support Experiences - Postgres from the Servic...
Migrations, Health Checks, and Support Experiences - Postgres from the Servic...
 
Consolidate your SAP System landscape Teched && d-code 2014
Consolidate your SAP System landscape Teched && d-code 2014Consolidate your SAP System landscape Teched && d-code 2014
Consolidate your SAP System landscape Teched && d-code 2014
 
Automating a PostgreSQL High Availability Architecture with Ansible
Automating a PostgreSQL High Availability Architecture with AnsibleAutomating a PostgreSQL High Availability Architecture with Ansible
Automating a PostgreSQL High Availability Architecture with Ansible
 
Application Report: Virtualizing Tier-1 Workloads using FC SANs
Application Report: Virtualizing Tier-1 Workloads using FC SANsApplication Report: Virtualizing Tier-1 Workloads using FC SANs
Application Report: Virtualizing Tier-1 Workloads using FC SANs
 
sap hana|sap hana database| Introduction to sap hana
sap hana|sap hana database| Introduction to sap hanasap hana|sap hana database| Introduction to sap hana
sap hana|sap hana database| Introduction to sap hana
 
Overcoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQLOvercoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQL
 

Destaque

Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLArseny Chernov
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...Remy Rosenbaum
 
N(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownN(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownDataWorks Summit
 
Low Latency SQL on Hadoop - What's best for your cluster
Low Latency SQL on Hadoop - What's best for your clusterLow Latency SQL on Hadoop - What's best for your cluster
Low Latency SQL on Hadoop - What's best for your clusterDataWorks Summit
 
Cloudera Showcase: SQL-on-Hadoop
Cloudera Showcase: SQL-on-HadoopCloudera Showcase: SQL-on-Hadoop
Cloudera Showcase: SQL-on-HadoopCloudera, Inc.
 
Functional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming FrameworksFunctional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming FrameworksHuafeng Wang
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat SheetHortonworks
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopCloudera, Inc.
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera, Inc.
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Destaque (14)

Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...
 
N(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownN(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdown
 
Low Latency SQL on Hadoop - What's best for your cluster
Low Latency SQL on Hadoop - What's best for your clusterLow Latency SQL on Hadoop - What's best for your cluster
Low Latency SQL on Hadoop - What's best for your cluster
 
Cloudera Showcase: SQL-on-Hadoop
Cloudera Showcase: SQL-on-HadoopCloudera Showcase: SQL-on-Hadoop
Cloudera Showcase: SQL-on-Hadoop
 
SQL On Hadoop
SQL On HadoopSQL On Hadoop
SQL On Hadoop
 
Functional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming FrameworksFunctional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming Frameworks
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for Hadoop
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Semelhante a Benchmarking Hadoop - Which hadoop sql engine leads the herd

Spark working with a Cloud IDE: Notebook/Shiny Apps
Spark working with a Cloud IDE: Notebook/Shiny AppsSpark working with a Cloud IDE: Notebook/Shiny Apps
Spark working with a Cloud IDE: Notebook/Shiny AppsData Con LA
 
Hadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the HerdHadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the HerdIBM Analytics
 
The convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopThe convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopDataWorks Summit
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Indrajit Poddar
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningDataWorks Summit
 
Accelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsAccelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsIBM
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...DataWorks Summit
 
IMS08 the momentum driving the ims future
IMS08   the momentum driving the ims futureIMS08   the momentum driving the ims future
IMS08 the momentum driving the ims futureRobert Hain
 
Informix REST API Tutorial
Informix REST API TutorialInformix REST API Tutorial
Informix REST API TutorialBrian Hughes
 
Ingesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcIngesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcDataWorks Summit
 
Integrating Structure and Analytics with Unstructured Data
Integrating Structure and Analytics with Unstructured DataIntegrating Structure and Analytics with Unstructured Data
Integrating Structure and Analytics with Unstructured DataDATAVERSITY
 
The Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on HadoopThe Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on HadoopDataWorks Summit
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...ModusOptimum
 
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)DataWorks Summit
 
IBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TBIBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TBGord Sissons
 
TDWI San Diego 2014: Wendy Lucas Describes how BLU Acceleration Delivers In-T...
TDWI San Diego 2014: Wendy Lucas Describes how BLU Acceleration Delivers In-T...TDWI San Diego 2014: Wendy Lucas Describes how BLU Acceleration Delivers In-T...
TDWI San Diego 2014: Wendy Lucas Describes how BLU Acceleration Delivers In-T...IBM Analytics
 
2016 02-16-announce-overview-zsp04505 usen
2016 02-16-announce-overview-zsp04505 usen2016 02-16-announce-overview-zsp04505 usen
2016 02-16-announce-overview-zsp04505 usenDavid Morlitz
 

Semelhante a Benchmarking Hadoop - Which hadoop sql engine leads the herd (20)

Spark working with a Cloud IDE: Notebook/Shiny Apps
Spark working with a Cloud IDE: Notebook/Shiny AppsSpark working with a Cloud IDE: Notebook/Shiny Apps
Spark working with a Cloud IDE: Notebook/Shiny Apps
 
Hadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the HerdHadoop-DS: Which SQL-on-Hadoop Rules the Herd
Hadoop-DS: Which SQL-on-Hadoop Rules the Herd
 
The convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopThe convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on Hadoop
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine Learning
 
13721876
1372187613721876
13721876
 
Accelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsAccelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUs
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
 
IMS08 the momentum driving the ims future
IMS08   the momentum driving the ims futureIMS08   the momentum driving the ims future
IMS08 the momentum driving the ims future
 
Informix REST API Tutorial
Informix REST API TutorialInformix REST API Tutorial
Informix REST API Tutorial
 
Ingesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcIngesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache Orc
 
14 guendert pres
14 guendert pres14 guendert pres
14 guendert pres
 
Integrating Structure and Analytics with Unstructured Data
Integrating Structure and Analytics with Unstructured DataIntegrating Structure and Analytics with Unstructured Data
Integrating Structure and Analytics with Unstructured Data
 
Iod 2013 Jackman Schwenger
Iod 2013 Jackman SchwengerIod 2013 Jackman Schwenger
Iod 2013 Jackman Schwenger
 
The Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on HadoopThe Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on Hadoop
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
 
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
 
IBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TBIBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TB
 
TDWI San Diego 2014: Wendy Lucas Describes how BLU Acceleration Delivers In-T...
TDWI San Diego 2014: Wendy Lucas Describes how BLU Acceleration Delivers In-T...TDWI San Diego 2014: Wendy Lucas Describes how BLU Acceleration Delivers In-T...
TDWI San Diego 2014: Wendy Lucas Describes how BLU Acceleration Delivers In-T...
 
2016 02-16-announce-overview-zsp04505 usen
2016 02-16-announce-overview-zsp04505 usen2016 02-16-announce-overview-zsp04505 usen
2016 02-16-announce-overview-zsp04505 usen
 

Último

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Último (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Benchmarking Hadoop - Which hadoop sql engine leads the herd

  • 1. Which Hadoop SQL Engine Leads the Herd? Stewart Tate STSM, Chief Designer, IBM Big Data Cluster Lead Architect BigInsights Performance IBM Silicon Valley Laboratory ∙ San Jose ∙ California tates@us.ibm.com
  • 2. 2 © 2013 IBM Corporation Acknowledgements and Disclaimers Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. © Copyright IBM Corporation 2014. All rights reserved. — U.S. Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. IBM, the IBM logo, ibm.com, BigInsights, and Big SQL are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or TM), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at •“Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml •TPC Benchmark, TPC-DS, and QphDS are trademarks of Transaction Processing Performance Council •Cloudera, the Cloudera logo, Cloudera Impala are trademarks of Cloudera. •Other company, product, or service names may be trademarks or service marks of others. 2
  • 3. 3 © 2013 IBM Corporation Our journey begins… •Evaluating three SQL engines in major Hadoop Distributions IBM BigInsights 3.0.0.1 Big SQL v3 Cloudera (CDH5) Impala 1.4 Hortonworks (HDP 2.1) Hive 0.13 •Workload Simulated Enterprise Retail Operation using 99 Queries Hadoop-DS based on TPC-DS benchmark
  • 4. 4 © 2013 IBM Corporation Three Identical Hadoop Clusters… Management Node One x3650 M4 BD Two E5-2680 v2 2.8GHz 10-core 128GB RAM, 1866MHz 2TB 3.5” HDD Dual-port 10GbE RHEL 6.4 EXT4/HDFS/Parquet/ORC Data Nodes Sixteen x3650 M4 BD Two E5-2680 2.8GHz 10-core 128GB RAM, 1866MHz Ten 2TB 3.5” HDD Four 120GB 3.5” SSD Dual-port 10GbE RHEL 6.4 EXT4/HDFS/Parquet/ORC Using Lenovo Servers designed for Hadoop
  • 5. 5 © 2013 IBM Corporation Benchmark Back Office…
  • 6. 6 © 2013 IBM Corporation Live access to Audited & Real-time results http://app.insightibm.com
  • 7. 7 © 2013 IBM Corporation Which SQL Engine Leads the Herd? View results from each Hadoop Distro via our mobile app http://app.insightibm.com
  • 8. 8 © 2013 IBM Corporation Big SQL – Runs 100% of the queries Key points Impala & Hive require many queries to be re-written, some significantly Owing to various restrictions, some queries could not be re- written or failed at run-time Re-writing queries in a benchmark scenario where results are known is one thing – doing this against real databases in production is another
  • 9. 9 © 2013 IBM Corporation Hadoop-DS benchmark – Single user performance @ 10TB Big SQL is 3.6x faster than Impala and 5.4x faster than Hive 0.13 for single query stream using 46 common queries
  • 10. 10 © 2013 IBM Corporation Hadoop-DS benchmark – multi-user performance @ 10TB With 4 streams, Big SQL is 2.1x faster than Impala and 8.5x faster than Hive 0.13 for 4 query streams using 46 common queries **See Speaker notes for disclaimer
  • 11. 11 © 2013 IBM Corporation Hadoop-DS benchmark – Big SQL with 99 queries @ 30TB Big SQL completed 4 concurrent query streams @30TB in 1.8x time of a single query stream Big SQL completed 396 queries in only 1.8x time of 99 queries
  • 12. 12 © 2013 IBM Corporation Thank you!
  • 13. 13 © 2013 IBM Corporation About TPC-DS Queries •The queries are diverse, and many are complex •Reflecting real business needs – a random sample: Find customers returning items more frequently than normal (q1) States with customers most ammenable to premium priced offers (q6) List key metrics for unadvertised in-store promotions by demographic (q7) Identify similar customers purchasing through multiple sales outlets (q10) Find customers shifting purchasing habits to the web (q11) Key measures for catalog sales fulfilled from an alternate warehouse (q16) Find frequently sold items and the circumstances under which repeat sales take place (q23) Understand the products and retail locations where items are likely to be return and subsequently re-purchased via the catalog (q29) Display customers making significant local purchases comparing to buying potential based on dependents and vehicles owned (q34)
  • 14. 14 © 2013 IBM Corporation Hadoop-DS benchmark •Aim: To provide the fairest and most meaningful comparison of SQL over Hadoop solutions so far •Hadoop-DS benchmark is based on the TPC-DS* benchmark. Key deviations: No data maintenance or data persistence phases •Not possible across all vendors Uses a common query set across all solutions •The sub-set of queries that all vendors can successfully execute at that scale factor •Queries are not cherry picked •It is the most complete TPCDS like benchmark executed to date •Worked with TPC certified auditor to review the benchmark •Is analogues to porting a relational workload to SQL on Hadoop Bringing Order out of Chaos
  • 15. 15 © 2013 IBM Corporation Big SQL 3.0 – Key Performance Features •Advanced re-write and cost based optimizer backed by decades of IBM R&D •Optimized HDFS readers for different storage formats Native readers •Self tuning memory manager Optimize memory allocation between Big SQL consumers depending on the workload •Informational Constraints •Advanced Statistics •Resource sharing and WLM