2013 march 26_thug_etl_cdc_talking_points
Adam Muise
Some diagrams for our roundtable on modern ETL/CDC with Hadoop and other new technologies
1.
Data Integration in 2013: A working session
Adam Muise
March 26 2013
Note: This deck is purposely sparse. Want value? Join the conversation in the Toronto Hadoop User Group: http://www.meetup.com/TorontoHUG/
© Hortonworks Inc. 2012
2.
Proposed Agenda
• Introductions
• Discuss common Data Integration Patterns
• Round-table of User Group Member CDC/ETL Use Cases
• New Data Integration Solutions: A change from the Old Guard:
  – Hadoop and the Data Lake
  – Streaming (+ Hadoop)
  – Data Lake Governance / Management (InfoTrellis)
  – Databus (LinkedIn)
3.
Introductions
Who let you in?
4.
General Data Integration Patterns
• Enterprise Application Integration*
  – Metadata lookup
  – Validation
  – Extra-app communication
• Enterprise Service Bus (SOA, Message Bus/Hub)*
• Federation*
  – Bridging multiple databases with a query layer
  – E.g.: Composite
• Extract Transform Load (ETL)*
  – Collection
  – Aggregation
  – Format/Schema transformation
• Data Lake
  – Landing Zone for multiple datasets in one store
  – Mixed schema, often raw structured/unstructured data
  – E.g.: Hadoop
* Source: Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture, Anthony David Giordano, 2010, IBM Press.
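The three ETL steps listed on this slide (collection, aggregation, format/schema transformation) can be sketched in a few lines of Python. This is only an illustration and not something from the deck: the two CSV extracts, the field names, and the function names are all made up for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw extracts from two source systems (made-up data).
ORDERS_CSV = "customer_id,amount\n1,10.50\n2,4.00\n1,2.25\n"
REFUNDS_CSV = "customer_id,amount\n1,-1.50\n"


def collect(*sources):
    """Collection: read rows from every source into a single stream."""
    for text in sources:
        yield from csv.DictReader(io.StringIO(text))


def transform(row):
    """Format/schema transformation: cast string fields to typed fields."""
    return {
        "customer_id": int(row["customer_id"]),
        "amount_cents": round(float(row["amount"]) * 100),
    }


def aggregate(rows):
    """Aggregation: total amount per customer."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["customer_id"]] += row["amount_cents"]
    return dict(totals)


def run_etl():
    """Run collect -> transform -> aggregate over the sample extracts."""
    return aggregate(transform(r) for r in collect(ORDERS_CSV, REFUNDS_CSV))
```

In a data lake setup the same raw extracts would typically be landed unchanged first, with a transformation like this applied later, when the data is read.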
5.
Use Case Roundtable
Data that’s keeping you up at night…
6.
Scotia iTrade: Geoffrey Li
7.
New Data Integration Solutions
Fresh Ideas to new and old problems…
8.
Hadoop: The Data Lake
[Diagram; labels as extracted: Publish Event, Signal Data, Transformation, Model/ Transform & Apply Metadata Aggregate, Publish Exchange, Explore, Visualize, Extract & Report Load, Analyze]
9.
Streaming & Hadoop
http://developer.yahoo.com/blogs/ydn/posts/2013/02/storm-and-hadoop-convergence-of-big-data-and-low-latency-processing/
10.
Streaming & Hadoop
http://developer.yahoo.com/blogs/ydn/posts/2013/02/storm-and-hadoop-convergence-of-big-data-and-low-latency-processing/
11.
DataBus (LinkedIn)
Databus is a low-latency change capture system which has become an integral part of LinkedIn’s data processing pipeline. Databus addresses a fundamental requirement to reliably capture, flow and process primary data changes. Databus provides the following features:
  1. Isolation between sources and consumers
  2. Guaranteed in-order and at-least-once delivery with high availability
  3. Consumption from an arbitrary time point in the change stream, including full bootstrap capability of the entire data
  4. Partitioned consumption
  5. Source consistency preservation
https://github.com/linkedin/databus/wiki
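To make features 2 and 3 concrete, here is a toy Python sketch of a change stream with sequence numbers and consumer checkpoints. It is not the Databus API. `ChangeLog`, `Consumer`, the `scn` field, and the crash simulation are all hypothetical names invented for this example. The sketch assumes a consumer applies changes as idempotent upserts, so a change that is redelivered after a crash does no harm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    scn: int    # sequence number, increasing in commit order (hypothetical field)
    key: str
    value: str


class ChangeLog:
    """Toy in-memory change stream; a stand-in, not Databus itself."""

    def __init__(self):
        self._events = []

    def append(self, key, value):
        event = Change(len(self._events) + 1, key, value)
        self._events.append(event)
        return event

    def read_since(self, scn):
        """Return every change with sequence number > scn, in order."""
        return [e for e in self._events if e.scn > scn]


class Consumer:
    def __init__(self, checkpoint=0):
        self.checkpoint = checkpoint   # last scn known to be fully applied
        self.state = {}
        self.applied = []              # every delivery seen, including repeats

    def poll(self, log, crash_before_checkpoint_of=None):
        for event in log.read_since(self.checkpoint):
            self.state[event.key] = event.value   # idempotent upsert
            self.applied.append(event.scn)
            if event.scn == crash_before_checkpoint_of:
                return   # simulated crash: applied, but checkpoint not saved
            self.checkpoint = event.scn


def demo():
    log = ChangeLog()
    log.append("a", "1")
    log.append("b", "2")
    log.append("a", "3")

    consumer = Consumer()
    consumer.poll(log, crash_before_checkpoint_of=2)   # crashes on scn 2
    consumer.poll(log)                                 # scn 2 is delivered again

    late = Consumer(checkpoint=2)   # start from an arbitrary point in the stream
    late.poll(log)
    return consumer, late
```

Because the checkpoint only advances after a change has been applied, a crash can cause a change to be seen twice but never skipped. A consumer starting at checkpoint 0 replays the whole stream, which is the toy equivalent of a full bootstrap.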
12.
DataBus (LinkedIn)
https://github.com/linkedin/databus/wiki