SlideShare a Scribd company logo
1 of 40
Download to read offline
Our Presto use case
and
performance test
Hironori Ogibayashi
Shin Matsuura
About us
● Hironori Ogibayashi(@angostura11)
● Shin Matsuura
○ IT Infrastructure team in Japanese
telecommunications carrier
○ Mainly working on middleware - test,
installation, deployment.
Todays Topic
● Presto use case
○ Deployment
○ Use case
○ Challenges
○ Future work
● Performance comparison between
Hive+Tez and Presto
Presto use case
Log Collection Flow
Fluentd
Aggregator
Hadoop Cluster Application
WebHDFS
・1500 Fluentd instances
・25,000 msg / sec
・400GB / day
・150 types of log
Log Usage
● Systems Infrastructure team
○ Checking trends in server performance
○ Performance analysis of Oracle
Database
● Application development team
○ Improving system and business
operations.
Application for Oracle DB Performance Analysis
- Check existing/potential problems of
Oracle database, for certain system,
certain period.
- Utilize logs stored in HDFS. Queries
were executed on Hive.
- But, it took more than one hour to
get the result...
- (So, we migrated to Presto.)
Why Presto?
● Frequent use of Interactive / ad-hoc
queries.
● Of cource, faster is better.
Hadoop Slave
Presto Deployment
Hadoop Slave
DataNode
TaskTracker
Presto Worker
Presto
Coordinator
Hive Metastore
Application/Client
・・・
● A decicated physical machine as a
Coordinator.
● Workers run on each Hadoop slaves.
● Logs in HDFS are periodically
converted to RCfiles.
● Presto versions
○ 0.66⇒0.73⇒0.75⇒0.82
Deployment Effect - Elapsed time of a single query
230sec
7sec
- Elapsed time of one of
the queries issued by the
application.
- Query was run on CDH4
(MRv1) cluster.
Deployment and Operation
● Deployment
○ Easy;Just extract binaries in each server and modify
configuration file.
○ Automated by Ansible + yum.
● What we use in operation
○ Query history
■ Coordinator Web UI
○ Logs
■ /var/presto/data/logs/{server.log,launcher.log}
○ Metrics
■ presto-metrics(https://github.com/xerial/presto-
metrics)⇒Fluentd⇒Elasticsearch + Kibana
○ sys schema
Challenges
● Worker crash / hang.
○ OutOfMemory. In case of hanging, we resolve to “kill -9”.
○ We Modified the memory parameter: task.shard.max-
threads×task.max-memory < -Xmx
● At first, we set node-scheduler.include-coordinator=true.
In which case, Coordinator crashed due to heavy query.
● SQL difference from HiveQL
○ At first our Application used both Hive and Presto because we used
Presto experimentally.Hence the Application had to support both
HiveQL and Presto(ANSI SQL).
○ Now, the application no longer use Hive.
Future work
● Improve Coodinator’s availability.
● Security
○ Now, all queries are executed as Presto’s daemon user.
● Resource isolation between Presto and Hadoop daemons.
Presto VS Hive+Tez
Contents
From a Performance perspective
Presto VS Hive+Tez
(not tuning any parameteres)
Conclusion
Presto VS Hive+Tez
Win Lose
How Fast??
Presto VS Hive+Tez
2.0~136 times
more details
Testing environment Configurations 2p12c
64GB Mem
36TB Disk
NN
DN DN DN
Hadoop(HDP2.1)
Presto(0.82)
Coodinator
Worker Worker Worker
Master : 3nodes
Slave : 3nodes
NN
Metastore
Sample data
300GB
csv file
50 columns
1.1B records
Performance measurement perspectives
• Query patterns
• Data format patterns
• Repetitive Querying
Query patterns
Queries
Query1: select count(*) from TestTBL
Query2: select * from TestTBL where col1 = ‘XXX’
Query3: select * from TestTBL where col1 = ‘XXX’ and col2 = ‘YYY’
Query4: select col1, count(*) from TestTBL group by col1
Query5: select col1, count(*) from TestTBL where col2 = ‘YYY’ group by col1
data format :Txt
Results: Query patterns
data format :Txt
Results: Query patterns
100x faster
Presto was faster in processing speed than
Hive+Tez in all queries.
Data format patterns
Data formats
• Text File (Textfile)
• Record Columnar File (RCfile)
• Optimized Row Columnar File (ORCfile)
Results: Data format patterns
※Query: Query2
Results: Data format patterns
※Query: Query2
Presto was faster in processing speed
than Hive+Tez in all data formats.
Repetitive Querying
Change in processing time with repetitions(Presto)
※Query: Query2
※Data format: Txt
Change in processing time with repetitions (Presto)
※Query: Query2
※Data format: Txt
Became faster After the second time.
Cache ???
2.5x faster
Change in processing time with repetitions (Hive+Tez)
※Query: Query2
※Data format: Txt
Change in processing time with repetitions (Hive+Tez)
※Query: Query2
※Data format: Txt
No real change in processing time
+α
Engine:Presto
Query × Data format
Engine:Presto
Query × Data format
Is using RCfile the most stable and fastest
way ??
Summary
Result
● Presto was faster than Hive+Tez in all queries.
● Presto was faster than Hive+Tez in all data formats.
● With repetitive Querying, presto became faster.
● By Using RCfile, Presto was the most stable and fastest.
Next
● Benchmark from node scaling and data volumn
perspectives.
● Benchmark while using compression functions of
ORCfile.
● Benchmark with HDP2.2.
Appendix
ほぼすべての条件で
2回目以降高速

More Related Content

What's hot

Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small StartHiroshi Toyama
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Martin Traverso
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookTreasure Data, Inc.
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015N Masahiro
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016kbajda
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case Kai Sasaki
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178Kai Sasaki
 
Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.Wojciech Biela
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Taro L. Saito
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...viirya
 
Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseMatt Fuller
 
Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FutureDataWorks Summit
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessSATOSHI TAGOMORI
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsSATOSHI TAGOMORI
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiDatabricks
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaDatabricks
 

What's hot (20)

Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small Start
 
Presto
PrestoPresto
Presto
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
 
Presto
PrestoPresto
Presto
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178
 
Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the Enterprise
 
Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and Future
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench Tools
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
 

Viewers also liked

Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetupstevemcpherson
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkDongwon Kim
 
A Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on TezA Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on TezGw Liu
 
SQL on Hadoop 比較検証 【2014月11日における検証レポート】
SQL on Hadoop 比較検証 【2014月11日における検証レポート】SQL on Hadoop 比較検証 【2014月11日における検証レポート】
SQL on Hadoop 比較検証 【2014月11日における検証レポート】NTT DATA OSS Professional Services
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Sadayuki Furuhashi
 

Viewers also liked (6)

Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetup
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
A Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on TezA Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on Tez
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
SQL on Hadoop 比較検証 【2014月11日における検証レポート】
SQL on Hadoop 比較検証 【2014月11日における検証レポート】SQL on Hadoop 比較検証 【2014月11日における検証レポート】
SQL on Hadoop 比較検証 【2014月11日における検証レポート】
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
 

Similar to 20140120 presto meetup_en

Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioAlluxio, Inc.
 
ApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptxApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptxXinliShang1
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
Netflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudNetflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudZhenxiao Luo
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simpleDori Waldman
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodbPGConf APAC
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationScott Miao
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 
Job Queues Overview
Job Queues OverviewJob Queues Overview
Job Queues Overviewjoeyrobert
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...javier ramirez
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Junping Du
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopPrasanna Rajaperumal
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.comRenzo Tomà
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...Hisham Mardam-Bey
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloudOVHcloud
 

Similar to 20140120 presto meetup_en (20)

Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
 
ApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptxApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptx
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Netflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudNetflix running Presto in the AWS Cloud
Netflix running Presto in the AWS Cloud
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter Migration
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Job Queues Overview
Job Queues OverviewJob Queues Overview
Job Queues Overview
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoop
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.com
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 

Recently uploaded

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Recently uploaded (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

20140120 presto meetup_en

  • 1. Our Presto use case and performance test Hironori Ogibayashi Shin Matsuura
  • 2. About us ● Hironori Ogibayashi(@angostura11) ● Shin Matsuura ○ IT Infrastructure team in Japanese telecommunications carrier ○ Mainly working on middleware - test, installation, deployment.
  • 3. Todays Topic ● Presto use case ○ Deployment ○ Use case ○ Challenges ○ Future work ● Performance comparison between Hive+Tez and Presto
  • 5. Log Collection Flow Fluentd Aggregator Hadoop Cluster Application WebHDFS ・1500 Fluentd instances ・25,000 msg / sec ・400GB / day ・150 types of log
  • 6. Log Usage ● Systems Infrastructure team ○ Checking trends in server performance ○ Performance analysis of Oracle Database ● Application development team ○ Improving system and business operations.
  • 7. Application for Oracle DB Performance Analysis - Check existing/potential problems of Oracle database, for certain system, certain period. - Utilize logs stored in HDFS. Queries were executed on Hive. - But, it took more than one hour to get the result... - (So, we migrated to Presto.)
  • 8. Why Presto? ● Frequent use of Interactive / ad-hoc queries. ● Of cource, faster is better.
  • 9. Hadoop Slave Presto Deployment Hadoop Slave DataNode TaskTracker Presto Worker Presto Coordinator Hive Metastore Application/Client ・・・ ● A decicated physical machine as a Coordinator. ● Workers run on each Hadoop slaves. ● Logs in HDFS are periodically converted to RCfiles. ● Presto versions ○ 0.66⇒0.73⇒0.75⇒0.82
  • 10. Deployment Effect - Elapsed time of a single query 230sec 7sec - Elapsed time of one of the queries issued by the application. - Query was run on CDH4 (MRv1) cluster.
  • 11. Deployment and Operation ● Deployment ○ Easy;Just extract binaries in each server and modify configuration file. ○ Automated by Ansible + yum. ● What we use in operation ○ Query history ■ Coordinator Web UI ○ Logs ■ /var/presto/data/logs/{server.log,launcher.log} ○ Metrics ■ presto-metrics(https://github.com/xerial/presto- metrics)⇒Fluentd⇒Elasticsearch + Kibana ○ sys schema
  • 12. Challenges ● Worker crash / hang. ○ OutOfMemory. In case of hanging, we resolve to “kill -9”. ○ We Modified the memory parameter: task.shard.max- threads×task.max-memory < -Xmx ● At first, we set node-scheduler.include-coordinator=true. In which case, Coordinator crashed due to heavy query. ● SQL difference from HiveQL ○ At first our Application used both Hive and Presto because we used Presto experimentally.Hence the Application had to support both HiveQL and Presto(ANSI SQL). ○ Now, the application no longer use Hive.
  • 13. Future work ● Improve Coodinator’s availability. ● Security ○ Now, all queries are executed as Presto’s daemon user. ● Resource isolation between Presto and Hadoop daemons.
  • 15. Contents From a Performance perspective Presto VS Hive+Tez (not tuning any parameteres)
  • 17. How Fast?? Presto VS Hive+Tez 2.0~136 times
  • 19. Testing environment Configurations 2p12c 64GB Mem 36TB Disk NN DN DN DN Hadoop(HDP2.1) Presto(0.82) Coodinator Worker Worker Worker Master : 3nodes Slave : 3nodes NN Metastore
  • 20. Sample data 300GB csv file 50 columns 1.1B records
  • 21. Performance measurement perspectives • Query patterns • Data format patterns • Repetitive Querying
  • 23. Queries Query1: select count(*) from TestTBL Query2: select * from TestTBL where col1 = ‘XXX’ Query3: select * from TestTBL where col1 = ‘XXX’ and col2 = ‘YYY’ Query4: select col1, count(*) from TestTBL group by col1 Query5: select col1, count(*) from TestTBL where col2 = ‘YYY’ group by col1
  • 24. data format :Txt Results: Query patterns
  • 25. data format :Txt Results: Query patterns 100x faster Presto was faster in processing speed than Hive+Tez in all queries.
  • 27. Data formats • Text File (Textfile) • Record Columnar File (RCfile) • Optimized Row Columnar File (ORCfile)
  • 28. Results: Data format patterns ※Query: Query2
  • 29. Results: Data format patterns ※Query: Query2 Presto was faster in processing speed than Hive+Tez in all data formats.
  • 31. Change in processing time with repetitions(Presto) ※Query: Query2 ※Data format: Txt
  • 32. Change in processing time with repetitions (Presto) ※Query: Query2 ※Data format: Txt Became faster After the second time. Cache ??? 2.5x faster
  • 33. Change in processing time with repetitions (Hive+Tez) ※Query: Query2 ※Data format: Txt
  • 34. Change in processing time with repetitions (Hive+Tez) ※Query: Query2 ※Data format: Txt No real change in processing time
  • 35.
  • 37. Engine:Presto Query × Data format Is using RCfile the most stable and fastest way ??
  • 38. Summary Result ● Presto was faster than Hive+Tez in all queries. ● Presto was faster than Hive+Tez in all data formats. ● With repetitive Querying, presto became faster. ● By Using RCfile, Presto was the most stable and fastest. Next ● Benchmark from node scaling and data volumn perspectives. ● Benchmark while using compression functions of ORCfile. ● Benchmark with HDP2.2.