3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
Luke Han
Co-Founder & CEO at Kyligence Inc.
Oct 21, 2015
1.
© Hortonworks Inc. 2015
Apache Tez - Next Generation of Execution Engine on Hadoop
Jeff Zhang (@zjffdu)
2.
Who's this guy
• Started using Pig in 2009; became a Pig committer in Nov 2009
• Joined Hortonworks in 2014
• Tez committer since Oct 2014
3.
Agenda
• Tez Introduction
• Tez Feature Deep Dive
• Tez Status & Roadmap
4.
Example of MR versus Tez
[Figure: Hive on MR runs the query as four jobs separated by I/O synchronization barriers: Job 1 (Join a & b), Job 2 (Group by of a Join b), Job 3 (Group by of c), Job 4 (Join of S & R). Hive on Tez runs the same operators (Join a & b, Group by of a Join b, Group by of c, Join of S & R) as a single job.]
5.
Tez – Introduction
• Distributed execution framework targeted towards data-processing applications.
• Based on expressing a computation as a dataflow graph (DAG).
• Highly customizable to meet a broad spectrum of use cases.
• Built on top of YARN, the resource management framework for Hadoop.
• Open source Apache project and Apache licensed.
6.
What is DAG & Why DAG
• Directed Acyclic Graph
• Any complicated DAG can be composed of the following 3 basic paradigms
– Sequential (Projection, Filter, GroupBy, …)
– Merge (Join, Union, Intersect, …)
– Divide (Split, …)
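The three paradigms can be illustrated with a small Python sketch. This is not Tez code: the vertex names are hypothetical, and the standard-library `graphlib` module stands in for a scheduler that must run every vertex after the vertices it reads from.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical query plan. Each key maps a vertex to the vertices it reads from.
plan = {
    "filter": {"scan_a"},            # sequential: scan_a -> filter
    "join": {"filter", "scan_b"},    # merge: two inputs feed one vertex
    "group_by": {"join"},            # divide: join feeds two consumers
    "project": {"join"},
}


def execution_order(dag):
    """Return one valid order in which the vertices could be started."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """A plan is only a DAG if no vertex depends on itself, even indirectly."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False


order = execution_order(plan)
print(order)
```

Any order returned this way starts `scan_a` before `filter` and `join` before both of its consumers; a plan with a cycle is rejected rather than scheduled.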
7.
Expressing DAG in Tez API
• DAG API (Logical View)
– Allows the user to build a DAG
– Topological structure of the data computation flow
• Runtime API (Runtime View)
– Application logic of each computation unit (vertex)
– How to move/read/write data between vertices
8.
DAG API (Logical View)
• Vertex (Processor, Parallelism, Resource, etc.)
• Edge (EdgeProperty)
– DataMovement
  – Scatter Gather (Join, GroupBy, …)
  – Broadcast (Pig Replicated Join / Hive Broadcast Join)
  – One-to-One (Pig Order by)
  – Custom
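A minimal Python sketch of how the three built-in data movement types differ. It is a model for illustration only, not the Tez Java API: `Vertex`, `Edge`, `DataMovement` and `route` are hypothetical names, and the Custom movement type is left out.

```python
from dataclasses import dataclass
from enum import Enum


class DataMovement(Enum):
    SCATTER_GATHER = "scatter_gather"
    BROADCAST = "broadcast"
    ONE_TO_ONE = "one_to_one"


@dataclass(frozen=True)
class Vertex:
    name: str
    parallelism: int  # number of tasks the vertex runs


@dataclass(frozen=True)
class Edge:
    source: Vertex
    target: Vertex
    movement: DataMovement


def route(edge, source_task):
    """Map each target task index to what it receives from one source task."""
    if not 0 <= source_task < edge.source.parallelism:
        raise IndexError("source task index out of range")
    targets = range(edge.target.parallelism)
    if edge.movement is DataMovement.SCATTER_GATHER:
        # The source output is partitioned; target task t reads partition t.
        return {t: f"partition {t}" for t in targets}
    if edge.movement is DataMovement.BROADCAST:
        # Every target task reads the complete output of the source task.
        return {t: "full output" for t in targets}
    # ONE_TO_ONE: task i of the source feeds only task i of the target.
    if edge.source.parallelism != edge.target.parallelism:
        raise ValueError("one-to-one edges need equal parallelism")
    return {source_task: "full output"}


mappers = Vertex("map", parallelism=2)
reducers = Vertex("reduce", parallelism=3)
print(route(Edge(mappers, reducers, DataMovement.SCATTER_GATHER), 0))
```

Scatter-gather and broadcast both reach every target task; the difference is whether each target sees one partition or the whole output.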
9.
Runtime API (Runtime View)
[Figure: Input → Processor → Output]
• Input
– Through which processor receives data on an edge
– Vertex can have multiple inputs
• Processor
– Application logic (one vertex, one processor)
– Consumes the inputs and produces the outputs
• Output
– Through which processor writes data to an edge
– One vertex can have multiple outputs
• Example of Input/Output/Processor
– MRInput & MROutput (InputFormat/OutputFormat)
– OrderedGroupedKVInput & OrderedPartitionedKVOutput (Scatter Gather)
– UnorderedKVInput & UnorderedKVOutput (Broadcast & One-to-One)
– PigProcessor/HiveProcessor
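The Input / Processor / Output split can be sketched in a few lines of Python. This is an illustration of the roles only, not the Tez runtime API: `ListOutput`, `grouped_input` and `SumProcessor` are hypothetical names, and `grouped_input` only mimics the sorted, grouped view that an ordered key-value input would present.

```python
from collections import defaultdict


class ListOutput:
    """Hypothetical output: collects the records a processor writes."""

    def __init__(self):
        self.records = []

    def write(self, key, value):
        self.records.append((key, value))


def grouped_input(records):
    """Hypothetical input: yields (key, [values]) pairs in key order."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    for key in sorted(groups):
        yield key, groups[key]


class SumProcessor:
    """Application logic of one vertex: sum the values of each key."""

    def run(self, inputs, outputs):
        # Inputs and outputs are looked up by the name of the connected edge.
        for key, values in inputs["upstream"]:
            outputs["downstream"].write(key, sum(values))


out = ListOutput()
SumProcessor().run(
    {"upstream": grouped_input([("b", 1), ("a", 2), ("b", 3)])},
    {"downstream": out},
)
print(out.records)
```

The processor never touches files or the network directly; swapping the input or output object changes how data moves without changing the application logic.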
10.
Benefits of DAG
• Easier to express computation as a DAG
• No intermediate data written to HDFS
• Less pressure on the NameNode
• No resource queuing between jobs & less resource contention
• More optimization opportunities with more global context
11.
Agenda
• Tez Introduction
• Tez Feature Deep Dive
• Tez Improvement & Debuggability
• Tez Status & Roadmap
12.
Container-Reuse
• Reuse the same container across DAGs/Vertices/Tasks
• Benefits of Container-Reuse
– Fewer resources consumed
– Reduced overhead of launching JVMs
– Reduced overhead of negotiating with the ResourceManager
– Reduced overhead of resource localization
– Reduced network IO
– Object Caching (Object Sharing)
13.
Tez Session
• Multiple Jobs/DAGs in one AM
• Container-reuse across Jobs/DAGs
• Data sharing between Jobs/DAGs
14.
Dynamic Parallelism Estimation
• VertexManager
– Listens to the status of other vertices
– Coordinates and schedules its tasks
– Communicates between vertices
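One way such an estimate could work is sketched below in Python. This is an assumption for illustration, not the algorithm of any Tez VertexManager: the function name, its parameters and the extrapolation rule are all hypothetical. The idea is to observe how much data the finished upstream tasks produced, extrapolate to all upstream tasks, and size the downstream vertex so that each task reads roughly a target amount.

```python
import math


def estimate_parallelism(completed_output_bytes, total_source_tasks,
                         desired_bytes_per_task, max_parallelism):
    """Pick a task count for a downstream vertex from partial upstream stats.

    completed_output_bytes: output sizes of the upstream tasks finished so far.
    total_source_tasks:     number of upstream tasks overall.
    desired_bytes_per_task: how much input each downstream task should read.
    max_parallelism:        the task count configured before any estimate.
    """
    if desired_bytes_per_task <= 0:
        raise ValueError("desired_bytes_per_task must be positive")
    if not completed_output_bytes:
        # Nothing observed yet: keep the configured parallelism.
        return max_parallelism
    average = sum(completed_output_bytes) / len(completed_output_bytes)
    estimated_total = average * total_source_tasks
    needed = math.ceil(estimated_total / desired_bytes_per_task)
    # Never go below one task or above the configured parallelism.
    return max(1, min(max_parallelism, needed))


# 2 of 10 upstream tasks are done and produced 100 and 300 bytes.
print(estimate_parallelism([100, 300], 10, 500, 20))
```

In this sketch the estimate can only shrink the configured parallelism, which keeps the decision safe when the early tasks are not representative.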
15.
ATS Integration
• Tez is fully integrated with YARN ATS (Application Timeline Service)
– DAG Status, DAG Metrics, Task Status, Task Metrics are captured
• Diagnostics & Performance analysis
– Data Source for monitoring & diagnostics
– Data Source for performance analysis
16.
Recovery
• AM can crash in corner cases
– OOM
– Node failure
– …
• Continue from the last checkpoint
• Transparent to end users
[Figure: AM Crash]
17.
Order By of Pig
f = Load 'foo' as (x, y);
o = Order f by x;
[Figure: two plans for the script. The first shows Load, Sample (Calculate Histogram), HDFS, Partition, Sort. The second shows Load, Sample (Calculate Histogram), Partition, Sort, connected by Broadcast, One-to-One and Scatter Gather edges.]
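The Sample, Partition and Sort stages named on this slide follow the general pattern of a sampled range partition, which the Python sketch below illustrates on a single machine. It is not Pig's implementation: the function names, the sample size and the choice of evenly spaced sample quantiles as boundaries are assumptions made for the example.

```python
import bisect
import random


def sample_boundaries(keys, num_partitions, sample_size, seed=0):
    """Sample stage: pick num_partitions - 1 split points from a key sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_of(key, boundaries):
    """Partition stage: keys in a higher range go to a higher partition."""
    return bisect.bisect_right(boundaries, key)


def order_by(keys, num_partitions=3, sample_size=100):
    # The boundaries play the role of the histogram that every
    # partitioning task needs to see.
    boundaries = sample_boundaries(keys, num_partitions, sample_size)
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[partition_of(key, boundaries)].append(key)
    # Sort stage: each partition is sorted on its own. Concatenating them
    # in partition order gives a total order because the ranges do not overlap.
    return [key for part in partitions for key in sorted(part)]


data = [random.Random(42).randint(0, 1000) for _ in range(10)]
data = [17, 3, 99, 42, 8, 61, 3, 75, 20, 54]
print(order_by(data, num_partitions=3, sample_size=5))
```

A poor sample only makes the partitions uneven; the result is still fully sorted, which is why sampling is a cheap way to balance the sort work.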
18.
Tez UI
19.
Tez UI
20.
Tez UI
Download data from ATS
21.
RoadMap
• Shared output edges
– Same output to multiple vertices
• Local mode stabilization
• Optimizing (include/exclude) vertex at runtime
• Partial completion VertexManager
• Co-Scheduling
• Framework stats for better runtime decisions
22.
Tez – Adoption
• Apache Hive
– Supported since Hive 0.13
– set hive.execution.engine=tez;
• Apache Pig
– Supported since Pig 0.14
– pig -x tez
• Cascading
• Flink
23.
Tez Community
• Useful Links
– http://tez.apache.org/
– JIRA: https://issues.apache.org/jira/browse/TEZ
– Code Repository: https://git-wip-us.apache.org/repos/asf/tez.git
– Mailing Lists
  – Dev List: dev@tez.apache.org
  – User List: user@tez.apache.org
  – Issues List: issues@tez.apache.org
• Tez Meetup
– http://www.meetup.com/Apache-Tez-User-Group
24.
Thank You!
Questions & Answers