A NEW PLATFORM FOR A NEW ERA
SQL and in-memory on Hadoop
with Pivotal and HAWQ
Alexandre Vasseur
Jérôme Campo
Field Engineering, Pivotal

© Copyright 2013 Pivotal. All rights reserved.
Pivotal
Spin-off of EMC and VMware
Software vendor
More than 1,250 employees

Data Science Team

Pivotal HD
Hadoop at 1,000 nodes for the community
•  1,000 nodes, 24,000 cores
•  48 TB RAM
•  24 PB (12,000 disks)

•  Improve Hadoop
•  Validate the Hadoop ecosystem at scale
http://www.analyticsworkbench.com
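
The aggregate figures above can be broken down per node. The sketch below is only arithmetic on the numbers quoted on the slide; it assumes a homogeneous cluster and decimal units (1 TB = 1000 GB, 1 PB = 1000 TB), neither of which the slide states.

```python
# Aggregate figures quoted on the slide.
NODES = 1_000
CORES = 24_000
RAM_TB = 48
DISK_PB = 24
DISKS = 12_000


def per_node_profile():
    """Average resources per node, assuming every node is identical."""
    return {
        "cores_per_node": CORES // NODES,
        "ram_gb_per_node": RAM_TB * 1000 // NODES,  # assumes 1 TB = 1000 GB
        "disks_per_node": DISKS // NODES,
        "tb_per_disk": DISK_PB * 1000 // DISKS,     # assumes 1 PB = 1000 TB
    }


if __name__ == "__main__":
    for name, value in per_node_profile().items():
        print(f"{name}: {value}")
```

Under these assumptions each node averages 24 cores, 48 GB of RAM and 12 disks of 2 TB.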
Pivotal Hadoop
Architecture diagram: Pivotal HD Enterprise combines Apache components with Pivotal HD Added Value components.

HAWQ – Advanced Database Services (ANSI SQL + Analytics)
•  Xtension Framework
•  Catalog Services
•  Query Optimizer
•  Dynamic Pipelining

Pivotal HD Added Value
•  Command Center: Configure, Deploy, Monitor, Manage
•  Data Loader
•  Hadoop Virtualization (HVE)

Apache
•  HDFS, HBase
•  Pig, Hive, Mahout, Map Reduce
•  Resource Management & Workflow: Yarn, Zookeeper
•  Sqoop, Flume

10 years of R&D on the massively parallel database
•  High-performance SQL engine
–  Multi-petabyte
–  Full ANSI SQL
–  Standard drivers and ecosystem

•  Direct access to Hadoop formats
–  Text, Avro, Hive, HBase, other formats via API

•  Massively parallel database on Hadoop
–  Columnar, compressed, partitioned, polymorphic storage
–  Priority and access management

•  In-Database Analytics (MADlib)
–  Parallelized statistics and machine learning libraries
–  Accessible via R or SQL

How HAWQ works
Clients (JDBC/ODBC, SQL Console) submit a query:

SELECT beer, price
FROM Bars b, Sells s
WHERE b.name = s.bar
AND b.city = 'San Francisco'

Diagram: a HAWQ Master Host (Query Parser, Query Optimizer) and the HDFS Namenode sit above several HAWQ Segment Hosts, each pairing a Query Executor with an HDFS Datanode.
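
To see what the example query returns, here is a minimal sketch that runs it against an in-memory SQLite database standing in for HAWQ. The table schemas and the sample rows are hypothetical (the slide shows only the query), and an ORDER BY is added purely to make the output deterministic.

```python
import sqlite3


def run_example_query():
    """Run the slide's join on toy data; SQLite stands in for HAWQ here."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical schemas inferred from the column names used in the query.
    cur.execute("CREATE TABLE Bars (name TEXT, city TEXT)")
    cur.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")

    # Made-up sample rows.
    cur.executemany(
        "INSERT INTO Bars VALUES (?, ?)",
        [("Bar A", "San Francisco"), ("Bar B", "San Francisco"), ("Bar C", "New York")],
    )
    cur.executemany(
        "INSERT INTO Sells VALUES (?, ?, ?)",
        [("Bar A", "Stout", 7.0), ("Bar B", "Lager", 5.0), ("Bar C", "Pale Ale", 6.0)],
    )

    rows = cur.execute(
        """
        SELECT beer, price
        FROM Bars b, Sells s
        WHERE b.name = s.bar
          AND b.city = 'San Francisco'
        ORDER BY beer
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for beer, price in run_example_query():
        print(beer, price)
```

Only the beers sold in San Francisco bars come back; the New York row is dropped by the filter on b.city.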
How HAWQ works: execution plan
Diagram: the same hosts as on the previous slide, with the execution plan shown next to the HAWQ Master Host. The plan reads bottom-up:

Motion Gather
  Project s.beer, s.price
    HashJoin b.name = s.bar
      Motion Redist(b.name)
        Filter b.city = 'San Francisco'
          Scan Bars b
      Scan Sells s

How HAWQ works: parallel execution
Diagram: the same hosts again, with the plan now shown inside every HAWQ Segment Host. Each Query Executor runs its own copy of the operators (Scan Bars b, Filter b.city = 'San Francisco', Motion Redist(b.name), Scan Sells s, HashJoin b.name = s.bar, Project s.beer, s.price, Motion Gather) next to its HDFS Datanode.
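
The parallel plan can be imitated in a few lines of plain Python. This is a toy simulation, not HAWQ code: the number of segments, the CRC32 hash, the helper names and the sample rows are all assumptions. It also assumes that Sells is already stored hash-distributed on bar, which would explain why the plan redistributes only Bars.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 3  # hypothetical cluster size


def segment_for(key):
    """Map a join key to a segment with a stable hash (built-in hash() is salted)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS


def place_sells(sells):
    """Assumption: Sells rows are stored hash-distributed on s.bar."""
    slices = [[] for _ in range(NUM_SEGMENTS)]
    for row in sells:
        slices[segment_for(row[0])].append(row)
    return slices


def place_bars(bars):
    """Assumption: Bars rows are stored round-robin, not aligned with the join key."""
    slices = [[] for _ in range(NUM_SEGMENTS)]
    for i, row in enumerate(bars):
        slices[i % NUM_SEGMENTS].append(row)
    return slices


def run_plan(bars_slices, sells_slices, city="San Francisco"):
    # Scan Bars -> Filter -> Motion Redist(b.name): every segment sends each
    # surviving Bars row to the segment that owns the hash of b.name.
    inbox = [defaultdict(list) for _ in range(NUM_SEGMENTS)]
    for local_bars in bars_slices:
        for name, bar_city in local_bars:
            if bar_city == city:
                inbox[segment_for(name)][name].append((name, bar_city))

    # On each segment: Scan Sells -> HashJoin -> Project, then Motion Gather
    # collects the projected rows on the master.
    gathered = []
    for seg in range(NUM_SEGMENTS):
        build_side = inbox[seg]  # hash table built from redistributed Bars rows
        for bar, beer, price in sells_slices[seg]:
            for _match in build_side.get(bar, []):
                gathered.append((beer, price))
    return gathered


if __name__ == "__main__":
    bars = [("Bar A", "San Francisco"), ("Bar B", "San Francisco"), ("Bar C", "New York")]
    sells = [("Bar A", "Stout", 7.0), ("Bar B", "Lager", 5.0), ("Bar C", "Pale Ale", 6.0)]
    print(sorted(run_plan(place_bars(bars), place_sells(sells))))
```

Because both sides of the join end up placed by the same hash of the bar name, each segment can join its own slice without talking to the others; only the final gather crosses hosts.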
10 years of R&D on in-memory data grids
Diagram labels: NoSQL/NewSQL, Sensor Data / Feeds, Online Apps, Analytic Apps, Map-Reduce, HAWQ, GPXF, DW, I/P & O/P Formatter, Native Persistence, External Tables, Shared Data (HFiles), HDFS, ICM, Model Refresh, Re-evaluate Model.
In-memory No/NewSQL on Hadoop
•  Benefits of an in-memory data grid
–  Data in memory when it needs to be
–  Very high availability, massive concurrency, in-memory response times

•  Native Hadoop integration
–  Eviction / storage on native HDFS
–  Access to in-memory or global data via SQL/NoSQL and HAWQ
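
The idea of keeping hot data in memory and evicting the rest to HDFS can be sketched with a small least-recently-used store. This is an illustration only, not the product's API: the class name, the capacity parameter and the plain dict standing in for HDFS are all hypothetical.

```python
from collections import OrderedDict


class EvictingGrid:
    """Toy key-value store: hot entries in memory, cold ones evicted to a backing store."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.memory = OrderedDict()   # ordered from least to most recently used
        self.backing = backing_store  # a dict standing in for HDFS

    def put(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)  # mark as most recently used
        self._evict()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        # Miss in memory: read from the backing store (KeyError if unknown)
        # and promote the entry. The in-memory copy is authoritative from now on.
        value = self.backing[key]
        self.put(key, value)
        return value

    def _evict(self):
        # Push least recently used entries out until memory fits the capacity.
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)
            self.backing[old_key] = old_value


if __name__ == "__main__":
    grid = EvictingGrid(capacity=2, backing_store={})
    for k, v in [("a", 1), ("b", 2), ("c", 3)]:
        grid.put(k, v)
    print(list(grid.memory), grid.backing)
```

Reads stay at memory speed for recently used keys, while every key remains reachable because evicted entries are still in the backing store.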

Try Pivotal HD
Pivotal HD Single Node VM
•  Hadoop Stack Components – Pig, Hive, HBase, HDFS, Mahout, YARN, MRv2
•  HAWQ / PXF
•  Command Center
•  DataLoader
•  Eclipse, Maven, Ant
•  Retail Data Set
http://gopivotal.com/pivotal-products/data/pivotal-hd#4

Pivotal HD with Vagrant
•  Multi-VM installation with VirtualBox or VMware Workstation/Fusion
http://blog.gopivotal.com/products/in-45-min-set-up-hadoop-pivotal-hd-on-a-multi-vm-cluster-run-test-data

Big/Fast Demo – Big Data Workflow
Diagram labels: HTTP, Pipe, Filter, Transform, HDFS Sink, Tap (×2), JSON Field Extract (×2), Analytic Counter (×2), Logistic Regression (MADlib).
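
One plausible reading of these labels is a stream that filters and transforms incoming HTTP messages into a sink, while a tap counts the values of one JSON field on the side. The sketch below follows that reading only; the function name, its parameters and the sample messages are hypothetical, and plain Python containers stand in for the HDFS sink and the analytic counter.

```python
import json
from collections import Counter


def run_stream(http_payloads, keep, transform, tap_field):
    """Push payloads through filter -> transform -> sink; a tap counts one JSON field."""
    sink = []            # list standing in for the HDFS sink
    counter = Counter()  # stands in for the analytic counter fed by the tap
    for payload in http_payloads:
        # Tap: observe the raw message without altering the main stream.
        value = json.loads(payload).get(tap_field)
        if value is not None:
            counter[value] += 1
        # Main stream: filter, then transform, then write to the sink.
        if not keep(payload):
            continue
        sink.append(transform(payload))
    return sink, counter


if __name__ == "__main__":
    messages = [
        '{"lang": "fr", "text": "bonjour"}',
        '{"lang": "en", "text": "hello"}',
        '{"lang": "fr", "text": "salut"}',
    ]
    stored, counts = run_stream(
        messages,
        keep=lambda p: json.loads(p)["lang"] == "fr",
        transform=lambda p: json.loads(p)["text"].upper(),
        tap_field="lang",
    )
    print(stored, dict(counts))
```

The tap sees every message, including those the filter later drops, which is why the counter and the sink can disagree on totals.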
We’re hiring!
avasseur@gopivotal.com
jcampo@gopivotal.com

Thank you

SQL et in-memory sur Hadoop avec Pivotal et HAWQ

  • 1. A NEW PLATFORM FOR A NEW ERA
  • 2. SQL et in-memory sur Hadoop avec Pivotal et HAWQ Alexandre Vasseur Jérôme Campo Field Engineering, Pivotal © Copyright 2013 Pivotal. All rights reserved.
  • 3. Pivotal Spin off d’EMC et VMware Editeur logiciel Plus de 1250 employés Data Science Team © Copyright 2013 Pivotal. All rights reserved. Pivotal HD
  • 4. Hadoop à 1000 noeuds pour la communauté Ÿ  1000 noeuds, 24 000 cores Ÿ  48 TB RAM Ÿ  24 PB (12 000 disques) Ÿ  Améliorer Hadoop Ÿ  Valider l’éco système Hadoop à l’échelle http://www.analyticsworkbench.com © Copyright 2013 Pivotal. All rights reserved.
  • 5. Pivotal Hadoop (architecture diagram). HAWQ, Advanced Database Services: ANSI SQL + Analytics, Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining. Pivotal HD Enterprise: Command Center (Configure, Deploy, Monitor, Manage), Data Loader, Hadoop Virtualization (HVE). Other components: Resource Management & Workflow, HBase, Pig, Hive, Mahout, Map Reduce, Yarn, Sqoop, Flume, Zookeeper, HDFS. Legend: Apache; Pivotal HD Added Value.
  • 6. 10 years of R&D on the massively parallel database. High-performance SQL engine: multi-petabyte; full ANSI SQL; standardized drivers and ecosystem. Direct access to Hadoop formats: Text, Avro, Hive, HBase, other formats via API. Massively parallel database on Hadoop: columnar, compressed, partitioned, polymorphic storage; priority and access management. In-Database Analytics (MADlib): parallelized statistics and machine learning libraries; accessible via R or SQL.
  • 7. How HAWQ works. Clients (JDBC/ODBC, SQL Console) submit: SELECT beer, price FROM Bars b, Sells s WHERE b.name = s.bar AND b.city = 'San Francisco'. HAWQ Master Host: Query Parser, Query Optimizer; HDFS Namenode. Three HAWQ Segment Hosts, each with a Query Executor and an HDFS Datanode. ...
  • 8. How HAWQ works: execution plan. From top to bottom: Gather Motion; Project s.beer, s.price; Hash Join b.name = s.bar. The join has two inputs: Scan Sells s, and Redistribute Motion (b.name) over Filter b.city = 'San Francisco' over Scan Bars b. Same topology as the previous slide: clients (JDBC/ODBC, SQL Console); HAWQ Master Host (Query Parser, Query Optimizer); HDFS Namenode; three HAWQ Segment Hosts, each with a Query Executor and an HDFS Datanode. ...
  • 9. How HAWQ works: the plan on the segments. Clients (JDBC/ODBC, SQL Console); HAWQ Master Host (Query Parser, Query Optimizer); HDFS Namenode. The Query Executor on each of the three HAWQ Segment Hosts runs the same plan next to its HDFS Datanode: Gather Motion; Project s.beer, s.price; Hash Join b.name = s.bar; Scan Sells s; Redistribute Motion (b.name); Filter b.city = 'San Francisco'; Scan Bars b. ...
  • 10. 10 years of R&D on NoSQL/NewSQL in-memory data grids. Diagram labels: Sensor Data / Feeds; Map-Reduce; Analytic Apps; Model Refresh; I/P & O/P Formatter; Online Apps; HAWQ; GPXF; DW; Native Persistence; External Tables; Re-evaluate Model; Shared Data - HFiles; HDFS; ICM.
  • 11. In-memory NoSQL/NewSQL on Hadoop. Benefits of an in-memory data grid: data in memory when it is needed; very high availability, massive concurrency, in-memory response times. Native Hadoop integration: eviction / storage on native HDFS; access to in-memory or global data via SQL/NoSQL and HAWQ.
  • 12. Try Pivotal HD. Pivotal HD Single Node VM: Hadoop stack components (Pig, Hive, HBase, HDFS, Mahout, YARN, MRv2); HAWQ / PXF; Command Center; DataLoader; Eclipse, Maven, Ant; Retail Data Set. http://gopivotal.com/pivotal-products/data/pivotal-hd#4 Pivotal HD with Vagrant: multi-VM installation with VirtualBox or VMware Workstation/Fusion. http://blog.gopivotal.com/products/in-45-min-set-up-hadoop-pivotal-hd-on-a-multi-vm-cluster-run-test-data
  • 13. Big/Fast Demo: Big Data Workflow. Diagram labels: HTTP; Pipe; Filter; Transform; Tap; JSON Field Extract; Logistic Regression (MADlib); Analytic Counter; HDFS Sink.
  • 14. We're hiring! avasseur@gopivotal.com, jcampo@gopivotal.com. Thank you.
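
The execution plan on slides 7 to 9 can be illustrated with a small Python simulation. This is only a sketch of the operator order shown on the slides (scan, filter, redistribute motion, hash join, project, gather motion), not HAWQ code. The sample rows, the toy hash in `segment_for`, the storage layout (Sells already distributed on `bar`, Bars spread round-robin) and the helper `run_plan` are all assumptions made for illustration.

```python
from collections import defaultdict

NUM_SEGMENTS = 3  # assumption: three segment hosts, as drawn on the slides


def segment_for(key):
    """Toy deterministic hash mapping a distribution key to a segment."""
    return sum(key.encode("utf-8")) % NUM_SEGMENTS


def run_plan(bars, sells, city):
    """Simulate the plan for: SELECT beer, price FROM Bars b, Sells s
    WHERE b.name = s.bar AND b.city = <city>."""
    # Assumed storage layout: Sells is already distributed on s.bar,
    # while Bars is spread round-robin and must be moved for the join.
    sells_by_segment = [[] for _ in range(NUM_SEGMENTS)]
    for row in sells:
        sells_by_segment[segment_for(row[0])].append(row)
    bars_by_segment = [bars[i::NUM_SEGMENTS] for i in range(NUM_SEGMENTS)]

    # Scan Bars -> Filter b.city -> Redistribute Motion on b.name.
    redistributed = [[] for _ in range(NUM_SEGMENTS)]
    for local_bars in bars_by_segment:
        for name, bar_city in local_bars:
            if bar_city == city:
                redistributed[segment_for(name)].append((name, bar_city))

    gathered = []
    for seg in range(NUM_SEGMENTS):
        # Hash Join b.name = s.bar: build on Bars, probe with local Sells.
        build = defaultdict(list)
        for name, bar_city in redistributed[seg]:
            build[name].append(bar_city)
        for bar, beer, price in sells_by_segment[seg]:
            for _ in build.get(bar, []):
                # Project s.beer, s.price; Gather Motion sends it to the master.
                gathered.append((beer, price))
    return sorted(gathered)


# Hypothetical sample data, not taken from the presentation.
BARS = [("Zeitgeist", "San Francisco"), ("Toronado", "San Francisco"),
        ("McSorley's", "New York")]
SELLS = [("Zeitgeist", "Lager", 5.0), ("Toronado", "IPA", 7.0),
         ("McSorley's", "Ale", 6.0), ("Toronado", "Stout", 8.0)]

print(run_plan(BARS, SELLS, "San Francisco"))
```

Because both join inputs end up hashed on the same key (`b.name` and `s.bar`), every segment can join its local rows without talking to the others; only the final projected rows travel to the master.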