Hadoop: Extending Your Data Warehouse
Tony Baer | Principal Analyst, Ovum
Moderated by Matt Brandwein | Product Marketing Manager, Cloudera
May 9, 2013
Welcome to the webinar!
• All lines are muted
• Q&A after the presentation
• Ask questions at any time by typing them in the “Questions” pane on your WebEx panel
• Recording of this webinar will be available on-demand at cloudera.com
• Join the conversation on Twitter:
@cloudera @TonyBaer #EDWHadoop
Who is Cloudera?
What the Enterprise Requires
• Only 100% open source Hadoop-based platform with both batch and real-time processing engines, enterprise-ready with native high availability
• Suite of system and data management software
• Comprehensive support and consulting services
• Broadest Hadoop training and certification programs
Extensive Partner Ecosystem
• Over 600 partners across hardware, software and services
The Leader in Big Data Management
• Deliver a revolutionary data management platform powered by Apache Hadoop
• World’s leading commercial vendor of Apache Hadoop
• Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data
Customers & Users Across Industries
• More production deployments than all other vendors combined
© Copyright Ovum. All rights reserved. Ovum is a subsidiary of Informa plc.
Hadoop: Extending your Data
Warehouse
Tony Baer
tony.baer@ovum.com
May 9, 2013
Twitter: @TonyBaer
Agenda
• The BI Bottleneck
• Hadoop & Enterprise Data Warehousing strategy
• How Cloudera supports Hadoop as extended DW
Traditional BI/Data warehousing architecture
[Diagram: Sources → Extract → Staging Server (ETL Tool, Transform) → Load → Target(s): DW, Data Marts]
DW – The base case
• DWs conceived for MBytes/GBytes of structured data
• Data structured based on expected queries & analytics
• Multiple tiers to separate distinct workloads
  • OLTP – ongoing, shallow interactions, simple queries
  • Transform – batch-oriented, IOPS-intensive
  • BI/analytics – data-intensive, spiky
• Reduced or eliminated impact on OLTP
• More complex architecture, more tradeoffs
EDW hitting the wall
• Data growing in volume & complexity
• Use cases require more, richer data
  • Customer retention
  • Operational Efficiency
  • Risk Mitigation
• Data retention mandates/policies forcing hard decisions
• ETL bursting batch windows
• EDWs straining to accommodate volumes, varieties of data
The ELT pattern
[Diagram: Sources → Extract → Load/Transform in Target(s): DW, Data Marts]
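The ELT pattern can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database stands in for the target DW; the table names (`raw_orders`, `orders_clean`) and the sample rows are hypothetical and do not come from the presentation.

```python
import sqlite3

# Hypothetical source rows; in practice these would be extracted from source systems.
source_rows = [
    ("1001", " alice ", "19.99"),
    ("1002", "BOB", "5.00"),
    ("1003", "carol", "not-a-number"),
]

def run_elt(rows):
    """Load raw rows into the target first, then transform with SQL inside it."""
    dw = sqlite3.connect(":memory:")  # stand-in for the target DW (assumption)
    # Load: raw data lands untransformed, every column kept as text.
    dw.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    dw.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    # Transform: runs on the target's own engine, so it shares cycles
    # with whatever analytic queries the target is also serving.
    dw.execute(
        """
        CREATE TABLE orders_clean AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE amount GLOB '[0-9]*.[0-9]*'
        """
    )
    query = "SELECT order_id, customer, amount FROM orders_clean ORDER BY order_id"
    return dw.execute(query).fetchall()

clean = run_elt(source_rows)
```

Because the transform step is just SQL executed in the target, there is no staging server and one fewer data movement, but the target now carries the transformation workload as well as the analytic one.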
The benefits – and limits – of ELT
• Pros
  • Fewer data movements
  • Flatter architecture
  • Reduced errors with fewer data movements
• Cons
  • Transform vs. analytic workload tradeoffs
  • SLAs jeopardized
  • Triggers arms race for more infrastructure
[Chart labels: Processing Times, Infrastructure Costs, Data Volumes; assuming constant SLAs]
Enterprise DWs – Size has its limits
• SLAs hit the wall
• Software licensing costs
• PBytes @ $20k–$50k/TByte get expensive ($$$$$$)
• Managing/transforming new data types consumes resources
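To make the per-TByte figure concrete, the arithmetic can be worked through as below. This is a rough sketch: the $20k and $50k bounds are from the slide, while the decimal conversion of 1000 TBytes per PByte and the function name `edw_cost_range` are assumptions made for illustration.

```python
TB_PER_PB = 1000  # decimal petabyte; an assumption, the slide does not specify

def edw_cost_range(petabytes, low_per_tb=20_000, high_per_tb=50_000):
    """Return the (low, high) cost in dollars for the given capacity."""
    terabytes = petabytes * TB_PER_PB
    return terabytes * low_per_tb, terabytes * high_per_tb

low, high = edw_cost_range(1)
print(f"1 PByte: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions a single PByte lands between $20 million and $50 million, which is the scale the slide's "$$$$$$" points at.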
But what if...
• You don’t have to worry about batch windows
• You don’t have to trade off transformation vs. analytic processing cycles
• You can control s/w license cost escalation
• You can keep that archived data live
• You can more readily consume new types of data & keep your analytic options open
Agenda
• The BI Bottleneck
• Hadoop & Enterprise Data Warehousing strategy
• How Cloudera supports Hadoop as extended DW
Introducing Hadoop
• Originally a data processing framework for solving unique Internet-scale problems
• Based on Google File System (GFS) & MapReduce
• Apache Hadoop community emerged to develop the platform for wider-scale adoption
• Financial services, telcos, retail, and media discovered Hadoop’s benefits
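For readers new to MapReduce, the programming model can be sketched in plain Python. This is a single-process illustration of the map, shuffle, and reduce phases, not Hadoop's API; the function names and the sample log lines are hypothetical.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(records):
    """Run map, shuffle, and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

logs = ["error disk full", "warn disk slow", "error network down"]
counts = word_count(logs)
```

In a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data across the network; because each map call sees one record and each reduce call sees one key, the work can be spread across machines.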
Hadoop benefits
• Scalability: Near linear performance up to 1000s of nodes
• Cost: Leverages commodity h/w & open source s/w
• Flexibility: Versatility with data, analytics & operation
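One way to read the near-linear scalability claim against a fixed batch window is the back-of-the-envelope model below. Every number in it is a made-up assumption (per-node throughput of 0.5 TBytes/hour, 90% scaling efficiency), and the function names are hypothetical; it only shows the shape of the calculation.

```python
import math

def batch_hours(volume_tb, nodes, tb_per_node_hour=0.5, efficiency=0.9):
    """Estimated hours to process volume_tb across `nodes` workers.

    Assumes throughput grows linearly with node count, discounted by a
    constant efficiency factor (an assumption, not a measured value).
    """
    return volume_tb / (nodes * tb_per_node_hour * efficiency)

def nodes_for_window(volume_tb, window_hours, tb_per_node_hour=0.5, efficiency=0.9):
    """Smallest node count that fits the job inside the batch window."""
    return math.ceil(volume_tb / (window_hours * tb_per_node_hour * efficiency))

needed = nodes_for_window(100, 8)   # 100 TBytes in an 8-hour window
hours = batch_hours(100, needed)    # resulting run time with that many nodes
```

If scaling is close to linear, growing data volumes can be met by adding nodes while the batch window stays constant.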
Hadoop’s trump card – Flexibility
• Accommodates all kinds of data
• Accommodates multiple workloads
• Keeps your options open
• Extensibility
  • Life beyond MapReduce
  • Many personalities
• Best of both worlds
  • Convergence with SQL
Get the best of both worlds
Hadoop as Data transformation platform
[Diagram: Sources → Extract → Hadoop (Load/Transform) → Target: existing DW/Data Mart environment (DW, Data Marts)]
Why Hadoop as your data transformation platform?
• Inexpensive cycles/storage
  • Low-cost platform reduces or eliminates tradeoff contingencies
  • No more transformation vs. analytics choice
  • Keep your archive active
• Flexible division of labor
  • Data can remain in Hadoop or be moved to SQL
  • Raw data sits alongside transformed data
Why Hadoop as extension to your DW?
• Efficient division of labor
  • Run time-consuming, resource-intensive analytic workloads inside Hadoop
  • Routine query, analytics, & reporting in SQL DW or data mart
• Query Hadoop directly
  • Most commercial BI tools read Hive metadata
• Query Hadoop interactively
  • Emerging MapReduce alternatives supporting interactive query
Agenda
• The BI Bottleneck
• Hadoop & Enterprise Data Warehousing strategy
• How Cloudera supports Hadoop as extended DW
Cloudera supports SQL convergence
• Partners with leading ETL, BI, and Data warehousing platform & tool providers
  • Connect Hadoop & SQL platforms
  • Emerging trend: BI, ETL tools are working natively inside Hadoop
• Introducing Impala
  • Brings high-performance interactive SQL inside Hadoop
  • Turns Hadoop into an MPP SQL analytic data target
  • Extends, doesn't replace your SQL EDW or data mart
  • Makes your DW strategy more flexible, iterative
Taming Hadoop
• Cloudera Manager
  • Automates deployment and health monitoring
  • Automates Hadoop configuration
  • New side-by-side deployment support
• Cloudera Navigator
  • New feature of Cloudera Manager
  • Tracks data utilization activity from HDFS, Hive & HBase
  • Stepping stone for data security/stewardship… watch this space
• Backup & Disaster Recovery (BDR)
  • New feature to automate recovery workflows
Hadoop – Takeaways
• Economical platform for offloading data transformation cycles
• Extends enterprise analytics
• Hadoop & SQL are converging, broadening your analytic options
• Hadoop won’t replace your EDW, but will take more of the workload
• Cloudera actively broadening CDH to support & extend your EDW
  • SQL convergence
  • Platform manageability
  • Data security & stewardship
Impala: Cloudera’s Design Strategy
An Integrated Part of the Hadoop Platform
[Diagram: platform stack. Engines: Batch Processing (MapReduce, Hive & Pig), Interactive SQL (Impala), Math (Machine Learning, Analytics), … Shared layers: Resource Management, Metadata, Storage, Integration. Storage: HDFS (Text, RCFile, Parquet, Avro, etc.) and HBase (records)]
• Complement MapReduce with interactive MPP SQL engine
• One pool of data
• One metadata model
• One security framework
• One set of system resources
• 100% open source
Impala Use Cases
Cost-effective, ad hoc query environment that offloads the data warehouse for:
• Interactive BI/analytics on more data
• Asking new questions
• Data processing with tight SLAs
• Query-able archive w/ full fidelity
Leading BI tools work with Impala
Questions?
• Type in the “Questions” panel
• Tweet @cloudera #EDWHadoop
• Recording will be available on-demand at cloudera.com
• Contact us:
tony.baer@ovum.com
Twitter: @TonyBaer
mbrandwein@cloudera.com
Twitter: @MattBrandwein
Thank you for attending!
Try Cloudera today
cloudera.com/downloads
Learn more about Impala
cloudera.com/impala
Get Hadoop Training
university.cloudera.com
Ready to go?
Check out Cloudera Quickstart
cloudera.com/quickstart
Hadoop: Extending your Data Warehouse

Mais conteúdo relacionado

Mais procurados

Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraCloudera, Inc.
 
Cloudera Federal Forum 2014: The Building Blocks of the Enterprise Data Hub
Cloudera Federal Forum 2014: The Building Blocks of the Enterprise Data HubCloudera Federal Forum 2014: The Building Blocks of the Enterprise Data Hub
Cloudera Federal Forum 2014: The Building Blocks of the Enterprise Data HubCloudera, Inc.
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
From Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data WarehouseFrom Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data WarehouseOsama Hussein
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InSnapLogic
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015DataWorks Summit
 
GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017Jeremy Maranitch
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Data Con LA
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data editionMark Kerzner
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalDiego Alberto Tamayo
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeCaserta
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
Keynote: The Journey to Pervasive Analytics
Keynote: The Journey to Pervasive AnalyticsKeynote: The Journey to Pervasive Analytics
Keynote: The Journey to Pervasive AnalyticsCloudera, Inc.
 

Mais procurados (20)

Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management Orchestra
 
Cloudera Federal Forum 2014: The Building Blocks of the Enterprise Data Hub
Cloudera Federal Forum 2014: The Building Blocks of the Enterprise Data HubCloudera Federal Forum 2014: The Building Blocks of the Enterprise Data Hub
Cloudera Federal Forum 2014: The Building Blocks of the Enterprise Data Hub
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
From Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data WarehouseFrom Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data Warehouse
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump In
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015
 
GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
Hadoop Trends
Hadoop TrendsHadoop Trends
Hadoop Trends
 
Data lake
Data lakeData lake
Data lake
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
Rob Bearden Keynote Hadoop Summit San Jose
Rob Bearden Keynote Hadoop Summit San JoseRob Bearden Keynote Hadoop Summit San Jose
Rob Bearden Keynote Hadoop Summit San Jose
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
Keynote: The Journey to Pervasive Analytics
Keynote: The Journey to Pervasive AnalyticsKeynote: The Journey to Pervasive Analytics
Keynote: The Journey to Pervasive Analytics
 

Semelhante a Hadoop: Extending your Data Warehouse

Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...TheInevitableCloud
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderainevitablecloud
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Stefan Lipp
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Pactera_US
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldCA Technologies
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksHortonworks
 
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...Big Data Spain
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Capgemini Data Warehouse Optimization Using Hadoop
Capgemini Data Warehouse Optimization Using HadoopCapgemini Data Warehouse Optimization Using Hadoop
Capgemini Data Warehouse Optimization Using HadoopAppfluent Technology
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Barijaxconf
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 
A new platform for a new era emc
A new platform for a new era   emcA new platform for a new era   emc
A new platform for a new era emcTaldor Group
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...EMC
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 

Semelhante a Hadoop: Extending your Data Warehouse (20)

Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-cloudera
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven World
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and Hortonworks
 
Haven 2 0
Haven 2 0 Haven 2 0
Haven 2 0
 
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Capgemini Data Warehouse Optimization Using Hadoop
Capgemini Data Warehouse Optimization Using HadoopCapgemini Data Warehouse Optimization Using Hadoop
Capgemini Data Warehouse Optimization Using Hadoop
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
A new platform for a new era emc
A new platform for a new era   emcA new platform for a new era   emc
A new platform for a new era emc
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 

Mais de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 


Hadoop: Extending your Data Warehouse

  • 1. Hadoop: Extending Your Data Warehouse
      Tony Baer | Principal Analyst, Ovum
      Moderated by Matt Brandwein | Product Marketing Manager, Cloudera
      May 9, 2013
  • 2. Welcome to the webinar!
      • All lines are muted
      • Q&A after the presentation
      • Ask questions at any time by typing them in the “Questions” pane on your WebEx panel
      • Recording of this webinar will be available on-demand at cloudera.com
      • Join the conversation on Twitter: @cloudera @TonyBaer #EDWHadoop
  • 3. Who is Cloudera?
      • What the Enterprise Requires
          • Only 100% open source Hadoop-based platform with both batch and real-time processing engines, enterprise-ready with native high availability
          • Suite of system and data management software
          • Comprehensive support and consulting services
          • Broadest Hadoop training and certification programs
      • Extensive Partner Ecosystem
          • Over 600 partners across hardware, software and services
      • The Leader in Big Data Management
          • Deliver a revolutionary data management platform powered by Apache Hadoop
          • World’s leading commercial vendor of Apache Hadoop
          • Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data
      • Customers & Users Across Industries
          • More production deployments than all other vendors combined
  • 4. Hadoop: Extending your Data Warehouse
      Tony Baer, tony.baer@ovum.com
      May 9, 2013
      Twitter: @TonyBaer
      © Copyright Ovum. All rights reserved. Ovum is a subsidiary of Informa plc.
  • 5. Agenda
      • The BI Bottleneck
      • Hadoop & Enterprise Data Warehousing strategy
      • How Cloudera supports Hadoop as extended DW
  • 6. Traditional BI/Data warehousing architecture
      [Diagram labels: Sources; Staging Server with ETL Tool (Extract, Transform, Load); Target(s): DW, Data Marts]
  • 7. DW: The base case
      • DWs conceived for MBytes/GBytes of structured data
      • Data structured based on expected queries & analytics
      • Multiple tiers to separate distinct workloads
          • OLTP: ongoing, shallow interactions, simple queries
          • Transform: batch-oriented, IOPS-intensive
          • BI/analytics: data-intensive, spiky
      • Reduced or eliminated impact on OLTP
      • More complex architecture, more tradeoffs
  • 8. EDW hitting the wall
      • Data growing in volume & complexity
      • Use cases require more, richer data
          • Customer retention
          • Operational Efficiency
          • Risk Mitigation
      • Data retention mandates/policies forcing hard decisions
      • ETL bursting batch windows
      • EDWs straining to accommodate volumes, varieties of data
  • 9. The ELT pattern
      [Diagram labels: Sources; Extract; Load/Transform; Target(s): DW, Data Marts]
  • 10. The benefits, and limits, of ELT
      • Pros
          • Fewer data movements
          • Flatter architecture
          • Reduced errors with fewer data movements
      • Cons
          • Transform vs. analytic workload tradeoffs
          • SLAs jeopardized
          • Triggers arms race for more infrastructure
      [Chart labels: Processing Times, Infrastructure Costs, Data Volumes; assuming constant SLAs]
  • 11. Enterprise DWs: Size has its limits
      • SLAs hit the wall
      • Software licensing costs
      • PBytes @ $20k - $50k/TByte get $$$$$$
      • Managing/transforming new data types consumes resources
  • 12. But what if...
      • You don’t have to worry about batch windows
      • You don’t have to trade off transformation vs. analytic processing cycles
      • You can control s/w license cost escalation
      • You can keep that archived data live
      • You can more readily consume new types of data & keep your analytic options open
  • 13. Agenda
      • The BI Bottleneck
      • Hadoop & Enterprise Data Warehousing strategy
      • How Cloudera supports Hadoop as extended DW
  • 14. Introducing Hadoop
      • Originally, a data processing framework for solving unique Internet-scale problems
      • Based on Google File System (GFS) & MapReduce
      • Apache Hadoop community emerged to develop the platform for wider-scale adoption
      • FS, telcos, retail, media discovered Hadoop’s benefits
  • 15. Hadoop benefits
      • Scalability: Near linear performance up to 1000s of nodes
      • Cost: Leverages commodity h/w & open source s/w
      • Flexibility: Versatility with data, analytics & operation
  • 16. Hadoop’s trump card: Flexibility
      • Accommodates all kinds of data
      • Accommodates multiple workloads
      • Keeps your options open
      • Extensibility
          • Life beyond MapReduce
          • Many personalities
      • Best of both worlds
          • Convergence with SQL
      Get the best of both worlds
  • 17. Hadoop as Data transformation platform
      [Diagram labels: Sources; Extract; Load/Transform; Hadoop; Target: Existing DW/Data Mart environment (DW, Data Marts)]
  • 18. Why Hadoop as your data transformation platform?
      • Inexpensive cycles/storage
          • Low-cost platform reduces or eliminates tradeoff contingencies
          • No more transformation vs. analytics choice
          • Keep your archive active
      • Flexible division of labor
          • Data can remain in Hadoop or be moved to SQL
          • Raw data sits alongside transformed data
  • 19. Why Hadoop as extension to your DW?
      • Efficient division of labor
          • Run time-consuming, resource-intensive analytic workloads inside Hadoop
          • Routine query, analytics, & reporting in SQL DW or data mart
      • Query Hadoop directly
          • Most commercial BI tools read Hive metadata
      • Query Hadoop interactively
          • Emerging MapReduce alternatives supporting interactive query
  • 20. Agenda
      • The BI Bottleneck
      • Hadoop & Enterprise Data Warehousing strategy
      • How Cloudera supports Hadoop as extended DW
  • 21. Cloudera supports SQL convergence
      • Partners with leading ETL, BI, and Data warehousing platform & tool providers
          • Connect Hadoop & SQL platforms
          • Emerging trend: BI, ETL tools are working natively inside Hadoop
      • Introducing Impala
          • Brings high-performance interactive SQL inside Hadoop
          • Turns Hadoop into an MPP SQL analytic data target
          • Extends, doesn't replace your SQL EDW or data mart
          • Makes your DW strategy more flexible, iterative
  • 22. Taming Hadoop
      • Cloudera Manager
          • Automates deployment and health monitoring
          • Automates Hadoop configuration
          • New side-by-side deployment support
      • Cloudera Navigator
          • New feature of Cloudera Manager
          • Tracks data utilization activity from HDFS, Hive & HBase
          • Stepping stone for data security/stewardship… watch this space
      • Backup & Disaster Recovery (BDR)
          • New feature to automate recovery workflows
  • 23. Hadoop: Takeaways
      • Economical platform for offloading data transformation cycles
      • Extends enterprise analytics
      • Hadoop & SQL are converging, broadening your analytic options
      • Hadoop won’t replace your EDW, but will take more of the workload
      • Cloudera actively broadening CDH to support & extend your EDW
          • SQL convergence
          • Platform manageability
          • Data security & stewardship
  • 24. Impala: Cloudera’s Design Strategy
      An Integrated Part of the Hadoop Platform
      • Complement MapReduce with interactive MPP SQL engine
      • One pool of data
      • One metadata model
      • One security framework
      • One set of system resources
      • 100% open source
      [Diagram labels: Storage (HDFS: TEXT, RCFILE, PARQUET, AVRO, ETC.; HBase: RECORDS); Integration; Resource Management; Metadata; Engines: Batch Processing (MAPREDUCE, HIVE & PIG), Interactive SQL (IMPALA), Math, Machine Learning, Analytics, …]
  • 25. Impala Use Cases
      Cost-effective, ad hoc query environment that offloads the data warehouse for:
      • Interactive BI/analytics on more data
      • Asking new questions
      • Data processing with tight SLAs
      • Query-able archive w/ full fidelity
  • 26. Leading BI tools work with Impala
  • 27. Questions?
      • Type in the “Questions” panel
      • Tweet @cloudera #EDWHadoop
      • Recording will be available on-demand at cloudera.com
      • Contact us: tony.baer@ovum.com, Twitter: @TonyBaer; mbrandwein@cloudera.com, Twitter: @MattBrandwein
      Thank you for attending!
      • Try Cloudera today: cloudera.com/downloads
      • Learn more about Impala: cloudera.com/impala
      • Get Hadoop Training: university.cloudera.com
      • Ready to go? Check out Cloudera Quickstart: cloudera.com/quickstart
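
The per-terabyte price range quoted on slide 11 can be turned into a rough order-of-magnitude figure. The following is a minimal Python sketch of that arithmetic only: the $20k to $50k per TByte range comes from the slide, while the decimal conversion (1 PByte = 1,000 TBytes) and the function name `edw_cost_range` are assumptions made for illustration.

```python
# Illustrative arithmetic only. The price range is taken from slide 11;
# the 1 PB = 1,000 TB conversion and the function name are assumptions.
TB_PER_PB = 1_000  # decimal convention (assumed)


def edw_cost_range(petabytes, low_per_tb=20_000, high_per_tb=50_000):
    """Return the (low, high) cost in dollars for the given capacity."""
    terabytes = petabytes * TB_PER_PB
    return terabytes * low_per_tb, terabytes * high_per_tb


low, high = edw_cost_range(1)
print(f"1 PB: ${low:,} to ${high:,}")  # 1 PB: $20,000,000 to $50,000,000
```

Under these assumptions a single petabyte lands in the tens of millions of dollars, which is the scale the slide's "$$$$$$" alludes to.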

Editor's Notes

  1. The rationale for multi-tiered DW architecture
  2. EDWs are straining to keep up with the new demands being placed on them. Data volumes are snowballing and, increasingly, familiar analytic problems are consuming new forms of data.
     Customer retention: for mass markets, social networks are generating new insights on what customers really think, and who is influencing them. For telcos, preventing customer churn goes well beyond dropped calls. Internet, IM, text, and yes, email are becoming the bulk of mobile carrier traffic, multiplying the volumes and types of log files that must be dissected to understand the customer experience.
     Operational efficiency means tapping into the Internet of Things (M2M), in addition to traditional OLTP systems, for logistics providers to deliver goods on time, for airlines to efficiently sequence airport operations, for utilities to manage generation & transmission, or for mfrs to fine-tune operations on the shop floor.
     Risk mitigation must expand the range of transactions and track externalities to understand exposure to loss.
     Data retention requirements, especially in regulated industries, are forcing organizations to keep more data longer, forcing hard decisions about what data to keep live.
     This is breaking the established EDW model, which is optimized for transforming MBytes/GBytes of well-understood structured data. It is breaking the ETL model as data transformations are bursting their batch windows. Internet players such as Facebook discovered this years ago as their nightly batch windows to MySQL DWs were exceeding 24 hrs. The same issue is now crossing over to mainstream enterprises.
  3. Surging data volumes drove the need to flatten BI architecture, shifting data transformation loads onto the target system in a pattern known as Extract/Load/Transform (ELT). The DW was still separate from source systems. But in place of a staging server where data was drawn in, transformed, and then moved to a target, data transformation was placed inside the EDW. The emergence of the ELT pattern reflected the reality of shrinking batch windows and the need to minimize data movements.
  4. The obvious advantage is that data movements are reduced: only a single movement from source to target is needed. The transformation workload is co-located with the data being stored and analyzed.
  5. Inexpensive data transformation platform
       • Compute cycles cheap because low-cost platform.
       • Well suited for performing batch transformation of data to downstream SQL DWs/data marts. Can replace your ETL staging server.
     Accommodates all kinds of data
       • No need to lay out tables ahead of time because you are loading to a file system.
       • HDFS efficient for sequential loading & processing of all kinds of data.
     Keep your options open
       • Don’t force-fit data & analytics
       • Late-binding approach to structuring data
       • Data schema can evolve over time as new sources of data become available
       • Hadoop has multiple options for representing structured data. You can add a Hive metadata layer and/or persist it in HBase tables.
     Extensibility: Hadoop becoming a platform with multiple personalities
       • Originated with MapReduce processing
       • Many alternatives emerging for different styles of processing
       • Stream processing, graph, HPC patterns, etc.
       • Applying data mining algorithms to uncover new insights
       • New frameworks emerging rapidly
     Best of both worlds
       • SQL convergence via Hive, emerging frameworks such as Impala
       • SQL querying for exploring and understanding your data through familiar BI tools
     Extensibility
       • Low cost of storage allows raw data & transformed data to be kept side-by-side
       • Ability to accommodate variably structured data allows orgs to gain visibility into data & data sources traditionally outside the reach of SQL DWs
  6. Hadoop can replace your ETL server
  7. Ovum believes that the hot spot for Hadoop development in 2013 is convergence with SQL.
     Cloudera has been an active player in making Hadoop SQL-friendly. It has long partnered with leading ETL, BI, and Data warehousing platform and tool providers to offer connectivity between Hadoop and SQL platforms. In turn, many of these technology providers are taking connectivity to new levels by extending their offerings to venture beyond interfacing to Hadoop to operating natively within it.
     Cloudera’s introduction of the Impala open source framework takes Hadoop-SQL convergence to the next level. Impala, an Apache-licensed open source project developed by Cloudera, brings interactive SQL query directly to Hadoop. It offers a high-performance, massively parallel processing framework that works against any Hadoop file format. While Impala utilizes the Hive metadata store, it provides a higher-performance alternative to relying on batch-oriented MapReduce and Hive processes.
     Impala helps business analysts iterate modeling for data that may eventually be migrated to a data warehouse. It provides a low-cost platform for iteratively discovering and structuring data.
  8. Cloudera Navigator, a new feature of Cloudera Manager, tracks how data is utilized; specifically, it compiles an audit trail detailing what operations were performed against specific pieces of data, by whom, and when. In its initial release, Navigator will track activity against HDFS, Hive, and HBase.
  9. Our design strategy is to tightly integrate and couple Impala within the Hadoop system. Impala (and interactive SQL) is just another application that you bring to your data. It’s integrated with Hadoop’s existing security and resource management frameworks and is completely interoperable with existing data formats and processing engines.
     One pool of data
       • Storage platforms (HDFS & HBase)
       • Open data formats (files & records)
       • Shared across multiple processing frameworks
     One metadata model
       • No synchronization of metadata between 2 different systems (analytical DBMS and Hadoop)
       • Same metadata used by other components within Hadoop itself (Hive, Pig, Impala, etc.)
     One security framework
       • Single model for all of Hadoop
       • Doesn’t require “turning off” any portion of native Hadoop security
     One set of system resources
       • One set of nodes: storage, CPU, memory
       • One management console
       • Integrated resource management
       • Scale linearly as capacity or performance needs grow
  10. Interactive BI/Analytics on more data
       • Raw, full fidelity data: nothing lost through aggregation or ETL/LT
       • New sources & types: structured/unstructured
       • Historical data
      Asking new questions
       • Exploration and data discovery for analytics and machine learning: need to find a data set for a model, which requires lots of simple queries to summarize, count, and validate.
       • Hypothesis testing: avoid having to subset and fit the data to a warehouse just to ask a single question
      Data processing with tight SLAs
       • Cost-effective platform
       • Minimize data movement
       • Reduce strain on data warehouse
      Query-able storage
       • Replace production data warehouse for DR/active archive
       • Store decades of data cost effectively (for better modeling or data retention mandates) without sacrificing the capability to analyze
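
Note 5 describes a late-binding approach in which data is loaded to a file system without laying out tables first, and the schema evolves as new sources arrive. The following is a minimal Python sketch of that idea, not Hadoop, Hive, or Impala code: the sample records, the field names, and the helper `read_with_schema` are all hypothetical, and the in-memory list merely stands in for raw files on HDFS.

```python
import json

# Raw events are kept exactly as they arrived; no table layout is declared
# up front. These records and field names are hypothetical.
RAW_LINES = [
    '{"user": "a1", "event": "call_dropped"}',
    '{"user": "b2", "event": "sms", "device": "phone"}',  # a later source adds "device"
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: project each raw record onto `fields`,
    filling any field the record does not carry with None."""
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# The original view of the data.
v1 = list(read_with_schema(RAW_LINES, ["user", "event"]))
# A later view adds a column without rewriting the stored raw lines.
v2 = list(read_with_schema(RAW_LINES, ["user", "event", "device"]))
print(v2[0])  # {'user': 'a1', 'event': 'call_dropped', 'device': None}
```

The point of the sketch is that both views read the same untouched raw data, so a new question or a new source changes only the schema applied at read time.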