Pivotal HAWQ 
A.Grishchenko 
HadoopKitchen @ Mail.ru 
27 Sep 2014 
Pivotal Confidential––Internal Use Only 1
SQL-on-Hadoop Solutions
Hive (2008)
• Developed by Facebook
– Hive is used for data analysis in Facebook's data warehouse
– The DWH is currently ~300 PB in size, and ~600 TB of data is loaded daily. Data is compressed using ORCFile, with a compression ratio of ~8x
• HiveQL is not compatible with ANSI SQL-92
• Has many limitations on subqueries
• The cost-based optimizer (Optiq) is only in technical preview for now
SQL-on-Hadoop Solutions
Impala (10.2012)
• Developed by Cloudera
– Open-source solution
– Cloudera sells it to enterprise customers
– Was in beta until May 2013
• Supports HiveQL and is moving toward complete ANSI SQL-92 support
• Written in C++; does not use MapReduce to run queries
• Requires a lot of memory; joining big tables usually causes an out-of-memory (OOM) error
SQL-on-Hadoop Solutions
Stinger (02.2013)
• Hortonworks initiative
– A series of steps intended to make Hive run 100x faster
• Tez: Hive queries are translated into Tez jobs, which are similar to MapReduce jobs but can have an arbitrary (DAG) topology
• Optiq: a cost-based query optimizer for Hive (currently in technical preview)
• ORCFile: a columnar storage format with adaptive compression and inline indexes
• HIVE-5317: ACID transactions and UPDATE/DELETE support (release expected around 11.2014)
SQL-on-Hadoop Solutions
HAWQ (02.2013)
• Pivotal product
– Greenplum MPP DBMS, ported to store its data in HDFS
– Written in C; the query optimizer was rewritten for this solution (ORCA)
• Supports ANSI SQL-92 and the analytic extensions from SQL-2003
• Supports complex queries with correlated subqueries, window functions and various join types
• Data is spilled to disk only if the process does not have enough memory
SQL-on-Hadoop Solutions
Vertica, BigSQL (2014)
• HP Vertica
– Supports only the MapR distribution, since it requires updatable storage
– Supports ANSI SQL-92 and SQL-2003
– Supports UPDATE/DELETE
– Officially announced as available in July 2014; no implementations yet
• IBM BigSQL v3
– IBM DB2 ported to store its data in HDFS
– Federated queries, a good query optimizer, etc.
• Both solutions are similar in concept to Pivotal HAWQ
Pivotal HAWQ Components
[Diagram: the Master runs on Server 1 and the Standby Master on Server 2. Servers 3, 4, …, M run the segments, K per server: Segments 1…K on Server 3, Segments K+1…2*K on Server 4, and so on up to Segment N on Server M.]
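The placement in this diagram follows simple arithmetic. The sketch below is a hypothetical helper, not part of HAWQ; it assumes segments are numbered consecutively, that every segment server runs exactly K of them, and that Servers 1 and 2 are reserved for the master and the standby master.

```python
def segment_host(segment: int, per_server: int, first_server: int = 3) -> int:
    """Return the server number that hosts a 1-based segment number.

    Hypothetical helper: assumes segments are packed per_server (K) to a
    server, starting at first_server, because Servers 1 and 2 run the
    master and the standby master.
    """
    if segment < 1 or per_server < 1:
        raise ValueError("segment and per_server must be >= 1")
    return first_server + (segment - 1) // per_server


def total_segments(last_server: int, per_server: int, first_server: int = 3) -> int:
    """Count segments on servers first_server..last_server.

    With the default first_server=3 this is N = (M - 2) * K.
    """
    return (last_server - first_server + 1) * per_server


if __name__ == "__main__":
    K = 4
    for seg in (1, K, K + 1, 2 * K, 2 * K + 1):
        # Segments 1..K -> Server 3, K+1..2K -> Server 4, and so on.
        print(f"segment {seg} -> server {segment_host(seg, K)}")
    print("N for servers 3..10:", total_segments(10, K))
```

Real deployments need not be this uniform; the point is only that the diagram's numbering maps segment i to server 3 + (i - 1) div K.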
Pivotal HAWQ Components
[Diagram: Server 1 runs the HAWQ Master and Server 2 the HAWQ Standby Master; Server 3 runs the HDFS NameNode and Server 4 the SNameNode; three ZooKeeper + QJM (Quorum Journal Manager) instances are shown on these servers; Servers 5, 6, …, M each run an HDFS Datanode and a HAWQ Segment.]
Pivotal HAWQ Components
[Diagram: the HAWQ Master contains a Query Parser, Query Optimizer, Query Executor, Transaction Manager, Metadata Catalog and Process Manager. The HAWQ Standby Master contains the same components and is kept in sync through WAL replication.]
Pivotal HAWQ Components
• Metadata is stored only on the master servers
• Metadata is stored in a modified PostgreSQL instance and replicated to the standby master via WAL
• Metadata contains:
– Table information: schema, names, files
– Statistics: number of unique values, value ranges, sample values, etc.
– Information about users, groups, priorities, etc.
• A master server shutdown leads to a switch to the standby master, and running sessions are lost
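To make the statistics bullet concrete, here is a minimal sketch of the kind of per-column summary a catalog could hold: the number of unique values, the value range and a few sample values. The `ColumnStats` and `collect_stats` names and the fixed-seed sampling are assumptions for illustration only; this is not how HAWQ computes or stores its statistics.

```python
import random
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class ColumnStats:
    n_distinct: int     # number of unique non-NULL values
    min_value: Any      # lower end of the value range
    max_value: Any      # upper end of the value range
    samples: List[Any]  # a few sample values for the optimizer


def collect_stats(values: Sequence[Any], n_samples: int = 3, seed: int = 0) -> ColumnStats:
    """Compute simple statistics for one column, skipping NULLs (None).

    Assumes the non-NULL values are mutually comparable (one column type).
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return ColumnStats(0, None, None, [])
    distinct = sorted(set(non_null))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    samples = rng.sample(distinct, min(n_samples, len(distinct)))
    return ColumnStats(len(distinct), distinct[0], distinct[-1], sorted(samples))


if __name__ == "__main__":
    city = ["San Francisco", "Oakland", None, "San Francisco", "Berkeley"]
    print(collect_stats(city))
```

An optimizer can use such figures to estimate selectivity, for example how many rows a filter like `city = 'San Francisco'` is likely to keep.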
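Pivotal HAWQ Components
[Diagram: a HAWQ Segment consists of a Query Executor that accesses data through libhdfs3 and PXF. The Segment Data Directory lives in the HDFS Datanode, while the Spill Data Directory lives on the local filesystem (xfs).]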
Pivotal HAWQ Components
• Both masters and segments are modified PostgreSQL instances (more precisely, modified Greenplum instances)
• When you open a connection to the master server, the postmaster process forks a backend process that serves your session
• When query execution starts, the master connects to the segment instances, and they also fork a process to execute the query
• The query execution plan is split into independent blocks (slices); each slice is executed as a separate OS process on the segment server, and data is moved between slices over UDP
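A minimal sketch of how a plan can be cut into slices, assuming (as in Greenplum) that slice boundaries fall exactly at Motion nodes, i.e. wherever data has to move between processes. `PlanNode`, `split_into_slices` and `example_plan` are hypothetical names, not HAWQ source code; the example plan mirrors the Bars/Sells join used in the query-execution slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlanNode:
    op: str
    children: List["PlanNode"] = field(default_factory=list)

    @property
    def is_motion(self) -> bool:
        return self.op.startswith("Motion")


def split_into_slices(root: PlanNode) -> Dict[int, List[str]]:
    """Cut a plan tree at Motion nodes.

    The Motion node itself stays in the upper (receiving) slice; the subtree
    below it starts a new (sending) slice. Returns slice id -> operators.
    """
    slices: Dict[int, List[str]] = {}
    next_id = 0

    def visit(node: PlanNode, slice_id: int) -> None:
        nonlocal next_id
        slices.setdefault(slice_id, []).append(node.op)
        for child in node.children:
            if node.is_motion:
                next_id += 1          # data crosses a process boundary here
                visit(child, next_id)
            else:
                visit(child, slice_id)

    visit(root, 0)
    return slices


def example_plan() -> PlanNode:
    return PlanNode("Motion Gather", [
        PlanNode("Project s.beer, s.price", [
            PlanNode("HashJoin b.name = s.bar", [
                PlanNode("Motion Redistribute(b.name)", [
                    PlanNode("Filter b.city = 'San Francisco'", [PlanNode("Scan Bars")]),
                ]),
                PlanNode("Scan Sells"),
            ]),
        ]),
    ])


if __name__ == "__main__":
    for slice_id, ops in split_into_slices(example_plan()).items():
        print(slice_id, ops)
```

With two Motion nodes the plan yields three slices: slice 0 is the receiving end of the Gather Motion on the master, and slices 1 and 2 would each run as a separate process on every segment.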
Pivotal HAWQ Components
• Tables can be stored as:
– Row-oriented (quicklz or zlib compression)
– Column-oriented (quicklz, zlib or RLE compression)
– Parquet tables
• Each segment has a separate directory on HDFS where it stores its shard of the data
• In column-oriented storage, each column is stored as a separate file
• Parquet also stores the table by columns, but without loading the NameNode with many files and block-location requests
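Why the file count matters can be shown with back-of-the-envelope arithmetic. The model below is an assumption for illustration: one file per segment for row-oriented and Parquet tables, one file per column per segment for column-oriented tables, ignoring partitioning and extra files from concurrent inserts. `files_per_table` is a hypothetical helper.

```python
def files_per_table(segments: int, columns: int, storage: str) -> int:
    """Rough count of HDFS data files for one unpartitioned table.

    Hypothetical model: each segment writes one file for a row-oriented or
    Parquet table, and one file per column for a column-oriented table.
    """
    if segments < 1 or columns < 1:
        raise ValueError("segments and columns must be >= 1")
    if storage in ("row", "parquet"):
        return segments
    if storage == "column":
        return segments * columns
    raise ValueError(f"unknown storage type: {storage!r}")


if __name__ == "__main__":
    # e.g. a 200-column table on 100 segments
    for storage in ("row", "column", "parquet"):
        print(storage, files_per_table(100, 200, storage))
```

Under this model the column-oriented table needs 20,000 files against 100 for Parquet. Every HDFS file is tracked in NameNode memory and its blocks must be located at query time, which is the load the Parquet bullet refers to.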
Query Execution in Pivotal HAWQ
[Diagram: the HAWQ Master (Parser, Query Optimizer, Metadata, Transaction Manager, Process Manager, Query Executor) next to the HDFS NameNode, and two HAWQ Segment servers, each with a Backend, an HDFS Datanode, a Segment Directory and a Local Spill Directory.]
[Same diagram, with the query plan built by the master:]
Gather Motion
  Project: s.beer, s.price
    Hash Join: b.name = s.bar
      Redistribute Motion: b.name
        Filter: b.city = 'San Francisco'
          Scan: Bars (b)
      Scan: Sells (s)
[Same diagram: a Query Executor (QE) process is now started in the Backend of each HAWQ Segment.]
[Same diagram: on each HAWQ Segment, the QE runs the plan as slices S1, S2 and S3.]
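The walkthrough above can be simulated in a single process. This is a toy sketch, not HAWQ code: it assumes `Sells` is distributed by `bar` and `Bars` by some other column (here `city`), which is why the plan needs a Redistribute Motion on `b.name`; CRC32 stands in for HAWQ's real distribution hash, and the rows are invented sample data. A plan of this shape would come from a query such as `SELECT s.beer, s.price FROM Bars b JOIN Sells s ON b.name = s.bar WHERE b.city = 'San Francisco'`.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def segment_for(key: str, n_segments: int) -> int:
    """Deterministic hash-to-segment mapping; CRC32 stands in for HAWQ's hash."""
    return zlib.crc32(key.encode("utf-8")) % n_segments


def distribute(rows: List[Row], key: str, n_segments: int) -> List[List[Row]]:
    """Store each row on the segment chosen by hashing its distribution key."""
    shards: List[List[Row]] = [[] for _ in range(n_segments)]
    for row in rows:
        shards[segment_for(row[key], n_segments)].append(row)
    return shards


def run_query(bars: List[List[Row]], sells: List[List[Row]], n_segments: int) -> List[Tuple[str, str]]:
    """Simulate the plan: scan/filter Bars, redistribute, local hash join, gather."""
    # Slice below the Redistribute Motion, on every segment:
    # scan Bars, filter, then send each row to the segment owning hash(b.name).
    moved: List[List[Row]] = [[] for _ in range(n_segments)]
    for shard in bars:
        for b in shard:
            if b["city"] == "San Francisco":
                moved[segment_for(b["name"], n_segments)].append(b)

    # Slice below the Gather Motion, on every segment: local hash join, project.
    gathered: List[Tuple[str, str]] = []
    for seg in range(n_segments):
        build: Dict[str, List[Row]] = defaultdict(list)  # b.name -> Bars rows
        for b in moved[seg]:
            build[b["name"]].append(b)
        for s in sells[seg]:  # probe with the Sells rows stored on this segment
            for _match in build.get(s["bar"], []):
                gathered.append((s["beer"], s["price"]))  # Gather Motion to master
    return sorted(gathered)


EXAMPLE_BARS = [
    {"name": "Joe's", "city": "San Francisco"},
    {"name": "Sue's", "city": "Oakland"},
    {"name": "Blue Bar", "city": "San Francisco"},
]
EXAMPLE_SELLS = [
    {"bar": "Joe's", "beer": "Bud", "price": "2.50"},
    {"bar": "Joe's", "beer": "Miller", "price": "2.75"},
    {"bar": "Sue's", "beer": "Bud", "price": "2.50"},
    {"bar": "Blue Bar", "beer": "Anchor Steam", "price": "3.00"},
]

if __name__ == "__main__":
    n = 4
    bars = distribute(EXAMPLE_BARS, "city", n)   # Bars NOT distributed by name
    sells = distribute(EXAMPLE_SELLS, "bar", n)  # Sells distributed by bar
    print(run_query(bars, sells, n))
```

Because every Bars row is re-hashed on `b.name` with the same function that placed the Sells rows by `s.bar`, matching rows meet on the same segment. Each segment can therefore join locally, and only the final projected rows travel to the master.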
PXF Framework
• Can read different data formats from HDFS:
– Text files, both compressed and uncompressed
– SequenceFiles
– Avro files
• Can read data from external data sources:
– HBase
– Cassandra
– Redis
• Extensible API
PXF Framework
[Diagram: the HAWQ Master (with a PXF Fragmenter and the Process Manager) next to the HDFS NameNode, and two HAWQ Segment servers, each running a Query Executor, a PXF Accessor and a PXF Fragmenter alongside an HDFS Datanode, a Segment Directory and a Local Spill Directory.]
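In the PXF diagram, a Fragmenter runs on the master and Accessors run on the segments. A minimal sketch of the step in between, under stated assumptions: the fragmenter returns fragments tagged with the datanodes that store them (for HDFS, roughly one per block), and the master hands each fragment to a segment on one of those hosts so that the accessor reads locally. `Fragment`, `assign_fragments`, the greedy policy and the file path are all hypothetical; PXF itself implements these roles as Java plugins with their own interfaces.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Fragment:
    """A unit of work from a (hypothetical) fragmenter, e.g. one HDFS block."""
    source: str
    index: int
    hosts: Tuple[str, ...]  # datanodes that store a replica of this fragment


def assign_fragments(fragments: List[Fragment],
                     segment_hosts: Dict[str, str]) -> Dict[str, List[Fragment]]:
    """Greedy, locality-aware assignment of fragments to segments.

    segment_hosts maps a segment name to the host it runs on. Each fragment
    goes to the least-loaded segment on a host that stores a replica; if no
    such segment exists, to the least-loaded segment overall. Ties are broken
    by segment name so the result is deterministic.
    """
    plan: Dict[str, List[Fragment]] = {seg: [] for seg in segment_hosts}
    for frag in fragments:
        local = [seg for seg, host in segment_hosts.items() if host in frag.hosts]
        candidates = local or list(segment_hosts)
        target = min(candidates, key=lambda seg: (len(plan[seg]), seg))
        plan[target].append(frag)
    return plan


if __name__ == "__main__":
    segments = {"seg0": "dn1", "seg1": "dn2"}
    frags = [
        Fragment("/data/t.txt", 0, ("dn1", "dn3")),
        Fragment("/data/t.txt", 1, ("dn2", "dn3")),
        Fragment("/data/t.txt", 2, ("dn3",)),  # no local segment: balance load
        Fragment("/data/t.txt", 3, ("dn1",)),
    ]
    for seg, assigned in assign_fragments(frags, segments).items():
        # each segment's accessor would then read only these fragments
        print(seg, [f.index for f in assigned])
```

This sketch prefers locality over balance, so a segment on a well-replicated host can end up with more fragments; a real scheduler may weigh the two differently.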
Further Steps
• Master server scaling: a pool of master servers
• New native data storage formats and new native compression algorithms
• YARN as the resource manager for HAWQ
• Dynamic segment allocation and decommissioning
Questions? 
 
HUG Meetup 2013: HCatalog / Hive Data Out
HUG Meetup 2013: HCatalog / Hive Data Out HUG Meetup 2013: HCatalog / Hive Data Out
HUG Meetup 2013: HCatalog / Hive Data Out Sumeet Singh
 
Savanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStackSavanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStackSergey Lukjanov
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterEdureka!
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 

Semelhante a Pivotal hawq internals (20)

Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
 
SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases  SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
Big data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideBig data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guide
 
5. pivotal hd 2013
5. pivotal hd 20135. pivotal hd 2013
5. pivotal hd 2013
 
SQL et in-memory sur Hadoop avec Pivotal et HAWQ
SQL et in-memory sur Hadoop avec Pivotal et HAWQSQL et in-memory sur Hadoop avec Pivotal et HAWQ
SQL et in-memory sur Hadoop avec Pivotal et HAWQ
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
 
Learn Hadoop Administration
Learn Hadoop AdministrationLearn Hadoop Administration
Learn Hadoop Administration
 
HAWQ Meets Hive - Querying Unmanaged Data
HAWQ Meets Hive - Querying Unmanaged DataHAWQ Meets Hive - Querying Unmanaged Data
HAWQ Meets Hive - Querying Unmanaged Data
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Hawq meets Hive - DataWorks San Jose 2017
Hawq meets Hive - DataWorks San Jose 2017Hawq meets Hive - DataWorks San Jose 2017
Hawq meets Hive - DataWorks San Jose 2017
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
 
HUG Meetup 2013: HCatalog / Hive Data Out
HUG Meetup 2013: HCatalog / Hive Data Out HUG Meetup 2013: HCatalog / Hive Data Out
HUG Meetup 2013: HCatalog / Hive Data Out
 
May 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data OutMay 2013 HUG: HCatalog/Hive Data Out
May 2013 HUG: HCatalog/Hive Data Out
 
Savanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStackSavanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStack
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 

Último

UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 

Último (20)

UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 

Pivotal HAWQ Internals

  • 1. Pivotal HAWQ A.Grishchenko HadoopKitchen @ Mail.ru 27 Sep 2014 Pivotal Confidential––Internal Use Only 1
  • 2. SQL-on-Hadoop Solutions: Hive (2008)
    - Developed by Facebook
      – Facebook uses Hive for data analysis in its data warehouse
      – The warehouse holds ~300 PB, and ~600 TB of data is loaded daily. Data is compressed with ORCFile at a ratio of ~8x
    - HiveQL is not compatible with ANSI SQL-92
    - Many limitations on subqueries
    - The cost-based optimizer (Optiq) is still only a technical preview
  • 3. SQL-on-Hadoop Solutions: Impala (10.2012)
    - Developed by Cloudera
      – Open-source solution
      – Cloudera sells it to enterprise customers
      – Was in beta until May 2013
    - Supports HiveQL, with full ANSI SQL-92 support planned
    - Written in C++; does not use MapReduce to run queries
    - Requires a lot of memory: joins of large tables usually end in an out-of-memory (OOM) error
  • 4. SQL-on-Hadoop Solutions: Stinger (02.2013)
    - Hortonworks initiative
      – A series of steps to make Hive run 100x faster
    - Tez: Hive queries are translated into Tez jobs, which are similar to MapReduce jobs but may have an arbitrary topology
    - Optiq: cost-based query optimizer for Hive (technical preview at the moment)
    - ORCFile: columnar storage format with adaptive compression and inline indexes
    - HIVE-5317: ACID and UPDATE/DELETE support (release expected around 11.2014)
  • 5. SQL-on-Hadoop Solutions: HAWQ (02.2013)
    - Pivotal product
      – Greenplum MPP DBMS, ported to store its data in HDFS
      – Written in C; the query optimizer (ORCA) was rewritten for this solution
    - Supports ANSI SQL-92 and the analytic extensions from SQL:2003
    - Supports complex queries with correlated subqueries, window functions and various join types
    - Data is spilled to disk only if a process does not have enough memory
  • 6. SQL-on-Hadoop Solutions: Vertica, BigSQL (2014)
    - HP Vertica
      – Supports only the MapR distribution, as it requires updatable storage
      – Supports ANSI SQL-92 and SQL:2003
      – Supports UPDATE/DELETE
      – Officially announced as available in July 2014; no implementations yet
    - IBM BigSQL v3
      – IBM DB2 ported to store data in HDFS
      – Federated queries, a good query optimizer, etc.
    - Both solutions are similar in concept to Pivotal HAWQ
  • 7. Pivotal HAWQ Components
    [Diagram: the Master runs on Server 1 and the Standby Master on Server 2; Servers 3, 4, …, M host K segments each (Segment 1 to Segment K on Server 3, Segment K+1 to Segment 2*K on Server 4, …, up to Segment N on Server M)]
  • 8. Pivotal HAWQ Components
    [Diagram: Server 1 runs the HAWQ Master, Server 2 the HAWQ Standby Master, Server 3 the NameNode and Server 4 the SNameNode; three ZooKeeper (ZK) + Quorum Journal Manager (QJM) instances run alongside them; Servers 5 to M each run an HDFS DataNode and a HAWQ Segment]
  • 9. Pivotal HAWQ Components
    [Diagram: the HAWQ Master and the HAWQ Standby Master each contain a Query Parser, Query Optimizer, Query Executor, Transaction Manager, Metadata Catalog and Process Manager; the master is replicated to the standby through WAL replication]
  • 10. Pivotal HAWQ Components
    - Metadata is stored only on the master servers
    - Metadata is stored in a modified Postgres instance and replicated to the standby master via WAL
    - Metadata contains:
      – Table information: schema, names, files
      – Statistics: number of unique values, value ranges, sample values, etc.
      – Information about users, groups, priorities, etc.
    - A master server shutdown causes a switch to the standby, and running sessions are lost
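The statistics kept in the master's catalog (distinct counts, value ranges, sample values) are what a cost-based optimizer relies on. The Python below is a minimal, illustrative sketch of collecting such per-column statistics; the `ColumnStats` layout and the uniform-distribution selectivity estimate are textbook assumptions, not HAWQ's actual catalog format or estimator.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ColumnStats:
    # Illustrative layout only: not HAWQ's real catalog schema.
    n_rows: int
    n_distinct: int
    min_value: int
    max_value: int
    sample: List[int]


def collect_stats(values: Sequence[int], sample_size: int = 5, seed: int = 0) -> ColumnStats:
    """Gather ANALYZE-style statistics for one column (hypothetical helper)."""
    if not values:
        raise ValueError("cannot collect statistics on an empty column")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = min(sample_size, len(values))
    return ColumnStats(
        n_rows=len(values),
        n_distinct=len(set(values)),
        min_value=min(values),
        max_value=max(values),
        sample=rng.sample(list(values), k),
    )


def estimate_eq_selectivity(stats: ColumnStats) -> float:
    """Textbook uniformity assumption: `col = const` matches 1/n_distinct of the rows."""
    return 1.0 / stats.n_distinct


stats = collect_stats([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
print(stats.n_distinct, stats.min_value, stats.max_value, estimate_eq_selectivity(stats))
```

An optimizer would multiply such a selectivity by the row count to estimate how many rows a filter passes on to the next plan node.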
  • 11. Pivotal HAWQ Components
    [Diagram: a HAWQ Segment contains the Query Executor, libhdfs3 and PXF; it keeps its Segment Data Directory on the HDFS DataNode and its Spill Data Directory on the local filesystem (xfs)]
  • 12. Pivotal HAWQ Components
    - Both masters and segments are modified Postgres instances (to be precise, modified Greenplum instances)
    - When you open a connection to the master server, the postmaster forks a backend process that serves your session
    - When query execution starts, the master connects to the segment instances on your behalf, and each of them also forks a process to execute the query
    - The query execution plan is split into independent blocks (slices); each slice runs as a separate OS process on the segment server, and data moves between slices over UDP
  • 13. Pivotal HAWQ Components
    - Tables can be stored as:
      – Row-oriented (quicklz, zlib compression)
      – Column-oriented (quicklz, zlib, rle compression)
      – Parquet tables
    - Each segment has a separate directory on HDFS where it stores its data shard
    - With columnar storage, each column is stored as a separate file
    - Parquet stores the table by columns without loading the NameNode with many files and block-location requests
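A back-of-the-envelope sketch of why Parquet is gentler on the NameNode. It assumes, hypothetically, one file per segment for row-oriented and Parquet tables and one file per column per segment for column-oriented tables, ignoring partitions and concurrent writers; under those assumptions only the columnar layout multiplies the file count by the number of columns. The `rle_encode` helper illustrates run-length encoding, one of the column-oriented compression options listed above; it is generic RLE, not HAWQ's on-disk encoding.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def hdfs_file_count(storage: str, n_columns: int, n_segments: int) -> int:
    """Rough file count for one unpartitioned table (simplifying assumptions above)."""
    if storage == "column":
        return n_columns * n_segments  # one file per column in every segment directory
    if storage in ("row", "parquet"):
        return n_segments  # one file per segment directory
    raise ValueError(f"unknown storage type: {storage!r}")


def rle_encode(column: Sequence[str]) -> List[Tuple[str, int]]:
    """Generic run-length encoding: collapse runs of equal adjacent values."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs: List[Tuple[str, int]]) -> List[str]:
    """Inverse of rle_encode."""
    return [value for value, count in pairs for _ in range(count)]


print(hdfs_file_count("column", 100, 64), hdfs_file_count("parquet", 100, 64))
print(rle_encode(["SF", "SF", "SF", "LA", "LA", "SF"]))
```

For a hypothetical 100-column table on 64 segments, this gives 6,400 files for column-oriented storage versus 64 for Parquet. RLE pays off on columns with long runs of repeated values, which is why it fits column-oriented storage better than row-oriented storage.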
  • 14. Query Execution in Pivotal HAWQ
    [Diagram: the HAWQ Master (Parser, Query Optimizer, Metadata, Transaction Manager, Process Manager, Query Executor) and the NameNode, above two HAWQ Segments, each with a Backend, an HDFS DataNode holding its Segment Directory, and a Local Spill Directory]
  • 17. Query Execution in Pivotal HAWQ
    [Diagram: the master and segments as on slide 14, annotated with the query plan:]
      Gather Motion
        Project s.beer, s.price
          HashJoin b.name = s.bar
            Redistribute Motion (b.name)
              Filter b.city = 'San Francisco'
                Scan Bars b
            Scan Sells s
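The plan on slide 17 corresponds to a query such as `SELECT s.beer, s.price FROM Bars b JOIN Sells s ON b.name = s.bar WHERE b.city = 'San Francisco'`. The Python below simulates its execution on a two-segment cluster. It assumes that Sells is already distributed by `bar` (which would explain why only the Bars side has a Redistribute Motion) and that Bars is distributed some other way (round-robin here); the toy hash function, the segment count and the sample rows are all hypothetical.

```python
from typing import Dict, List, Tuple

N_SEGMENTS = 2                 # hypothetical cluster size
Bar = Tuple[str, str]          # (name, city)
Sale = Tuple[str, str, float]  # (bar, beer, price)


def segment_of(key: str) -> int:
    """Toy deterministic hash for distribution (not HAWQ's real hash function)."""
    return sum(key.encode("utf-8")) % N_SEGMENTS


def distribute_round_robin(rows: List[Bar]) -> List[List[Bar]]:
    """Bars: assumed NOT distributed by name, so the join needs a Redistribute Motion."""
    segs: List[List[Bar]] = [[] for _ in range(N_SEGMENTS)]
    for i, row in enumerate(rows):
        segs[i % N_SEGMENTS].append(row)
    return segs


def distribute_by_bar(rows: List[Sale]) -> List[List[Sale]]:
    """Sells: assumed distributed by s.bar, so rows already sit on the segment owning the join key."""
    segs: List[List[Sale]] = [[] for _ in range(N_SEGMENTS)]
    for row in rows:
        segs[segment_of(row[0])].append(row)
    return segs


def run_query(bars_by_seg: List[List[Bar]],
              sells_by_seg: List[List[Sale]]) -> List[Tuple[str, float]]:
    # Lower slice on every segment: Scan Bars -> Filter -> Redistribute Motion(b.name)
    inbox: List[List[Bar]] = [[] for _ in range(N_SEGMENTS)]
    for local_bars in bars_by_seg:
        for name, city in local_bars:
            if city == "San Francisco":
                inbox[segment_of(name)].append((name, city))

    # Upper slice on every segment: hash the redistributed Bars rows,
    # probe with the local Sells rows, then Project s.beer, s.price
    gathered: List[Tuple[str, float]] = []
    for seg in range(N_SEGMENTS):
        build: Dict[str, List[Bar]] = {}
        for bar in inbox[seg]:
            build.setdefault(bar[0], []).append(bar)
        for bar_name, beer, price in sells_by_seg[seg]:
            for _match in build.get(bar_name, []):
                gathered.append((beer, price))

    # Gather Motion: the master collects the result rows from all segments
    return gathered


bars = [("Joe's", "San Francisco"), ("Sue's", "Oakland"), ("Moe's", "San Francisco")]
sells = [("Joe's", "Bud", 3.0), ("Sue's", "Miller", 2.5),
         ("Moe's", "Duff", 4.0), ("Joe's", "Anchor", 5.0)]
print(sorted(run_query(distribute_round_robin(bars), distribute_by_bar(sells))))
# [('Anchor', 5.0), ('Bud', 3.0), ('Duff', 4.0)]
```

Because both sides of the join end up hashed with the same `segment_of` function, every matching Bars/Sells pair meets on the same segment, so each segment can join locally before the final gather.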
  • 21. Query Execution in Pivotal HAWQ
    [Diagram: same plan as on slide 17; each HAWQ Segment Backend now shows a Query Executor (QE)]
  • 23. Query Execution in Pivotal HAWQ
    [Diagram: same plan as on slide 17; next to each segment's QE are the plan slices S1, S2 and S3]
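Slide 12 states that the plan is split into slices that run as separate processes. A minimal sketch of one way to make that cut: every Motion node becomes a boundary, with its receiving end in the slice above and its sending end starting a new slice below. The `PlanNode` class and the "(send)"/"(receive)" labels are hypothetical, and the slice numbering HAWQ actually uses (S1 to S3 in the diagram) may not map one-to-one onto this output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    op: str
    children: List["PlanNode"] = field(default_factory=list)
    is_motion: bool = False


def cut_into_slices(root: PlanNode) -> List[List[str]]:
    """Split a plan tree into slices at Motion boundaries (hypothetical model).

    In this model slice 0 is the top of the plan, i.e. the part run on the master;
    each Motion puts its receiving end in the current slice and opens a new slice
    that starts with its sending end.
    """
    slices: List[List[str]] = [[]]

    def visit(node: PlanNode, current: List[str]) -> None:
        if node.is_motion:
            current.append(f"{node.op} (receive)")
            sender: List[str] = [f"{node.op} (send)"]
            slices.append(sender)
            for child in node.children:
                visit(child, sender)
        else:
            current.append(node.op)
            for child in node.children:
                visit(child, current)

    visit(root, slices[0])
    return slices


plan = PlanNode("Gather Motion", is_motion=True, children=[
    PlanNode("Project s.beer, s.price", children=[
        PlanNode("HashJoin b.name = s.bar", children=[
            PlanNode("Scan Sells s"),
            PlanNode("Redistribute Motion (b.name)", is_motion=True, children=[
                PlanNode("Filter b.city = 'San Francisco'", children=[
                    PlanNode("Scan Bars b"),
                ]),
            ]),
        ]),
    ]),
])

for i, ops in enumerate(cut_into_slices(plan)):
    print(f"slice {i}: {ops}")
```

With two Motion nodes the plan falls into three slices; on each segment, every slice below the top one would then be a separate process exchanging rows with its neighbours through the Motion.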
  • 35. PXF Framework
    - Lets you read different data formats from HDFS:
      – Text files, both compressed and uncompressed
      – SequenceFiles
      – Avro files
    - Can read data from external data sources:
      – HBase
      – Cassandra
      – Redis
    - Extensible API
  • 36. PXF Framework
    [Diagram: the NameNode; the HAWQ Master with the PXF Fragmenter and the Process Manager; two HAWQ Segments, each with a Query Executor, a PXF Accessor, a PXF Fragmenter, an HDFS DataNode holding its Segment Directory, and a Local Spill Directory]
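The division of labour on slide 36 can be sketched in a few lines: a Fragmenter lists the fragments of a data source together with the hosts holding them, the master assigns fragments to segments, and each segment's Accessor reads the records of its fragments. The real PXF plugins are Java classes; the Python below only mimics the roles, with hypothetical class names, in-memory "blocks" instead of HDFS, and a locality-first assignment rule that is an assumption, not HAWQ's actual scheduler.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Fragment:
    source: str             # e.g. an HDFS path
    index: int              # position of the fragment within the source
    hosts: Tuple[str, ...]  # DataNodes holding a replica of this fragment


class InMemoryFragmenter:
    """Hypothetical stand-in for a PXF Fragmenter: lists fragments and where they live."""

    def __init__(self, source: str, block_hosts: List[Tuple[str, ...]]):
        self.source = source
        self.block_hosts = block_hosts

    def get_fragments(self) -> List[Fragment]:
        return [Fragment(self.source, i, hosts) for i, hosts in enumerate(self.block_hosts)]


class InMemoryAccessor:
    """Hypothetical stand-in for a PXF Accessor: returns the records of one fragment."""

    def __init__(self, blocks: List[List[str]]):
        self.blocks = blocks

    def read(self, fragment: Fragment) -> Iterator[str]:
        yield from self.blocks[fragment.index]


def assign_fragments(fragments: List[Fragment],
                     segment_hosts: Dict[str, str]) -> Dict[str, List[Fragment]]:
    """Assumed policy: prefer a segment on a replica host, break ties by current load."""
    plan: Dict[str, List[Fragment]] = {seg: [] for seg in segment_hosts}
    for frag in fragments:
        local = [seg for seg, host in segment_hosts.items() if host in frag.hosts]
        candidates = local or list(segment_hosts)
        target = min(candidates, key=lambda seg: (len(plan[seg]), seg))
        plan[target].append(frag)
    return plan


blocks = [["a", "b"], ["c"], ["d", "e", "f"]]
block_hosts = [("dn1", "dn2"), ("dn2",), ("dn3",)]
segment_hosts = {"seg0": "dn1", "seg1": "dn2"}

fragments = InMemoryFragmenter("/data/example", block_hosts).get_fragments()
accessor = InMemoryAccessor(blocks)
for seg, frags in assign_fragments(fragments, segment_hosts).items():
    print(seg, [rec for f in frags for rec in accessor.read(f)])
```

Keeping "where is the data" (Fragmenter) separate from "how do I read it" (Accessor) is what makes the API extensible: a new source such as HBase only needs its own pair of plugins.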
  • 44. Further Steps
    - Master server scaling: a pool of master servers
    - New native data storage formats and new native compression algorithms
    - YARN as the resource manager for HAWQ
    - Dynamic segment allocation and decommissioning
  • 46. BUILT FOR THE SPEED OF BUSINESS