Low Latency SQL on Hadoop - What's best for your cluster
DataWorks Summit
1.
Low Latency SQL on Hadoop
What’s best for your cluster?
Prepared by Alan Gardner, June 2014
2.
Alan Gardner
@alanctgardner
gardner@pythian.com
© 2013 Pythian
5.
Overview
• Performance
• Architecture
• Features
• Vendor Support
• Conclusions
6.
Performance
7.
Berkeley Big Data Benchmark
• Hive, Hive-on-Tez, Redshift, Shark, Impala
• Tested on five m2.4xlarge EC2 instances
• Uses Intel’s Hadoop Benchmark, not TPC
• ~150GB of
8.
Berkeley Big Data Benchmark
• Finds Shark fastest at straight scans, and tied with Impala for aggregation and joining
• Hive-on-Tez is a distant third
• Does not use the optimized columnar formats
9.
Cloudera SQL Benchmark
• Impala, Hive-on-Tez, Shark and Presto
• Uses high-end hardware with relatively large memory, fastest data types for each engine
• 15TB scale factor for a TPC-DS based test
10.
Cloudera SQL Benchmark
• Finds Impala to be significantly faster across all data sizes
• Shark and Tez outperform Presto 0.60, with Tez performing better for larger result sets
• It’s unclear if table
11.
Our Configuration
• 9-node cluster of m2.2xlarge instances
• 4 cores, 34GB RAM
• 850GB of instance storage
• 100GB scale factor, only from disk, no RDDs
• Impala 1.3.1 on CDH 5.0.1
• Hive 0.13 from the
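For a rough sense of scale, the per-node figures above can be multiplied out to cluster totals. This is a back-of-envelope sketch only: it assumes all nine nodes act as workers, which the slide does not state, and the function name is made up for illustration.

```python
# Per-node figures come from the slide. Treating all nine nodes as
# workers is an assumption made only for this estimate.
NODES = 9
CORES_PER_NODE = 4
RAM_GB_PER_NODE = 34
DISK_GB_PER_NODE = 850
DATASET_GB = 100  # benchmark scale factor


def cluster_totals(nodes=NODES):
    """Return aggregate cores, RAM (GB) and instance storage (GB)."""
    return {
        "cores": nodes * CORES_PER_NODE,
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "disk_gb": nodes * DISK_GB_PER_NODE,
    }


totals = cluster_totals()
# Ratio of aggregate RAM to the raw data set size.
ram_to_data = totals["ram_gb"] / DATASET_GB
print(totals, round(ram_to_data, 2))
```

Under that assumption the raw data set is about a third of aggregate RAM, which is worth keeping in mind when reading the "only from disk" note.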
12.
File Formats
• Hive, Shark - ORC (ZLIB)
• Presto - ORC (ZLIB)
  - RCFile (LazyBinarySerDe) was slower
  - RCFile (ColumnarSerDe) may be better
• Impala - Parquet (no compression)
14.
TPC-H Queries
• Query 1: filtering and aggregation on a single table
• Query 8: select two columns from joins across many-to-many relationships
• Query 10: select and aggregate on eight
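To make "filtering and aggregation on a single table" concrete, here is a minimal sketch in the spirit of Query 1, run against an in-memory SQLite table. It is a simplified illustration, not the official TPC-H query text: the column names follow the TPC-H lineitem table, but the rows, the reduced column set and the `run_query` helper are made up for this example.

```python
import sqlite3

# Made-up rows for illustration; column names follow TPC-H lineitem.
ROWS = [
    ("A", "F", 10, 100.0, "1998-01-15"),
    ("A", "F", 20, 250.0, "1998-03-02"),
    ("N", "O", 5, 40.0, "1998-08-30"),
    ("N", "O", 7, 70.0, "1998-11-20"),  # after the cutoff, filtered out
]

# Filter on ship date, then aggregate per (return flag, line status).
QUERY = """
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity)      AS sum_qty,
       SUM(l_extendedprice) AS sum_price,
       COUNT(*)             AS count_order
FROM lineitem
WHERE l_shipdate <= ?
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""


def run_query(cutoff="1998-09-02"):
    """Load the sample rows into SQLite and run the aggregation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lineitem ("
        "l_returnflag TEXT, l_linestatus TEXT, l_quantity INTEGER, "
        "l_extendedprice REAL, l_shipdate TEXT)"
    )
    conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, (cutoff,)).fetchall()
    conn.close()
    return rows


result = run_query()
for row in result:
    print(row)
```

A query of this shape mostly stresses scan and aggregation speed, since there are no joins to plan.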
16.
Architecture
17.
• Hive 0.13 runs on Tez, which executes queries as DAGs
• DAGs are more efficient than MRv1 query plans
• Runs on YARN, resources are shared between all jobs
• Individual node failures are tolerated, and the failed work is retried automatically
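The idea of running a query as a DAG with automatic retries can be sketched in a few lines. This is a toy model, not Tez's API: the stage names, the `run_dag` helper and the simulated failure are all hypothetical, and stages run one after another here rather than in parallel across a cluster.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical stages of one query; each maps to the stages it depends on.
PLAN = {
    "scan_orders": set(),
    "scan_lineitem": set(),
    "join": {"scan_orders", "scan_lineitem"},
    "aggregate": {"join"},
}


def run_dag(plan, run_stage, max_attempts=3):
    """Run stages in dependency order, retrying a stage that fails."""
    executed = []
    for stage in TopologicalSorter(plan).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                run_stage(stage, attempt)
                break
            except RuntimeError:
                # Give up only once the retry budget is exhausted.
                if attempt == max_attempts:
                    raise
        executed.append(stage)
    return executed


def flaky_stage(stage, attempt):
    """Simulate a node failure on the first attempt of the join."""
    if stage == "join" and attempt == 1:
        raise RuntimeError("simulated node failure")


executed = run_dag(PLAN, flaky_stage)
print(executed)
```

Only the failed stage is rerun; the completed scans are not repeated, which is the property that makes stage-level retry cheaper than restarting the whole query.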
18.
• HiveServer creates a DAG from HQL submitted over JDBC
• HiveServer requests or reuses a Tez AM to run the query
• Tez handles placement of query fragments based on locality and resources
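Placement "based on locality and resources" can be illustrated with a small greedy scheduler. This is a sketch under stated assumptions, not how Tez is implemented: the `place_fragments` function, the slot model and the node and block names are all hypothetical.

```python
def place_fragments(fragments, block_locations, free_slots):
    """Assign each fragment to a node, preferring a node that holds its block.

    fragments:       list of (fragment_id, block_id)
    block_locations: block_id -> list of nodes holding a replica
    free_slots:      node -> number of free task slots
    Returns fragment_id -> (node, ran_locally).
    """
    slots = dict(free_slots)  # work on a copy, leave the input untouched
    placement = {}
    for fragment_id, block_id in fragments:
        # First choice: nodes with a replica of the block and a free slot.
        local = [n for n in block_locations.get(block_id, []) if slots.get(n, 0) > 0]
        # Fallback: any node with a free slot (a remote read).
        candidates = local or [n for n, s in slots.items() if s > 0]
        if not candidates:
            raise RuntimeError("no free slots in the cluster")
        node = max(candidates, key=lambda n: slots[n])
        slots[node] -= 1
        placement[fragment_id] = (node, bool(local))
    return placement


placement = place_fragments(
    fragments=[("f1", "b1"), ("f2", "b1"), ("f3", "b2")],
    block_locations={"b1": ["node1"], "b2": ["node2"]},
    free_slots={"node1": 1, "node2": 1, "node3": 2},
)
print(placement)
```

In this run the second fragment cannot be placed on `node1` because its only slot is taken, so it falls back to the least-loaded node and reads its block remotely.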
19.
• Shark uses the same core as Hive: the HQL parser and the file and UDF interfaces are compatible
• DAGs produced by Shark are optimized for Spark, rather than Tez
• Spark can run on YARN for resource sharing, as well as on Mesos or standalone
20.
• Spark is more mature and offers a wider range of optimizations right now
• Shark also supports storing results as an RDD within Spark
21.
• Impala runs as an engine ‘next to’ YARN, not on top of it
• To reduce resource contention and allow scheduling to be centralized in YARN, Llama was created
• Llama creates “fake” applications on YARN as placeholders for Impala
22.
• Impalad receives queries, plans and executes them
• Statestore broadcasts metadata updates and node status
• Catalog caches block metadata and Hive table metadata
23.
• Presto doesn’t interact with YARN at all
• cgroups are the only way to share resources between YARN jobs and Presto
• Presto also handles all scheduling and job placement by itself
24.
• Presto has a single coordinator which plans and distributes query fragments
• Workers are still co-located with DataNodes for locality
• Discovery service manages worker status
25.
Functionality
28.
File Formats
            Text  RCFile  Parquet  ORCFile  Avro  SequenceFile
Presto      R     R       R        R        R     R
Impala      R/W   R       R/W      -        R     R
Hive/Shark  R/W   R/W     R/W      R/W      R/W   R/W

Flexibility
            SerDes  Complex Data   UDFs  Spill to Disk  JOIN Reordering
Presto      Yes     Yes, but slow  No    No             None
Impala      No      No             Yes   No             Cost-based
Hive/Shark  Yes     Yes            Yes   Yes            Cardinality
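The JOIN reordering column distinguishes engines that leave join order as written from those that reorder using statistics. A minimal sketch of a cardinality-based heuristic follows; it is an illustration only, not any engine's actual optimizer. The row counts and function names are hypothetical, and the sketch ignores join predicates and selectivity entirely.

```python
def order_by_cardinality(tables):
    """Return table names sorted so the smallest inputs are joined first."""
    ranked = sorted(tables.items(), key=lambda kv: (kv[1], kv[0]))
    return [name for name, _rows in ranked]


def left_deep_plan(order):
    """Render a left-deep join tree for the given join order as a string."""
    plan = order[0]
    for name in order[1:]:
        plan = f"({plan} JOIN {name})"
    return plan


# Hypothetical row counts, for illustration only.
STATS = {
    "lineitem": 6_000_000,
    "orders": 1_500_000,
    "nation": 25,
    "customer": 150_000,
}

order = order_by_cardinality(STATS)
plan = left_deep_plan(order)
print(plan)
```

An engine with no reordering would join tables in the order the query lists them, so the author of the query has to do this ranking by hand.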
31.
Vendor Support
32.
Official Support
        Cloudera  MapR    Hortonworks
Presto  No        No      No
Impala  Yes       Yes     No
Hive    No Tez    No Tez  Yes
Shark   Spark     Yes     Spark
Note: based on vendor documentation as of 31/05/2014
35.
Conclusions
36.
A giant, indecipherable flowchart
37.
Conclusions
• Shark provides a faster alternative to Hive 0.13 for ETL and analytics, but support is lacking and tuning is difficult
• Presto is still nascent: deployment is easy, but querying is not so simple
38.
Thank you
Q&A
To contact us:
gardner@pythian.com
1-877-PYTHIAN
@pythian
@alanctgardner