Getting Started & Successful with Big Data
© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
@Pentaho #BigDataWebSeries
Your Hosts Today
Paul Brook, Cloud EMEA Program Manager, Dell
Davy Nys, VP EMEA & APAC, Pentaho
Chuck Yarbrough, Technical Solutions Marketing, Pentaho
Pentaho Webinar Series
Sign-up at: pentaho.com
Goals for Today
To Understand:
• How to get a Hadoop cluster up and running
• Where Hadoop and other pieces fit into the architecture
• How you can easily get data in and out of Hadoop
• How to leverage Hadoop with Pentaho
• Initial Best Practices
Complete Analytics and Visual Data Management
• Data Discovery & Visualization
• Enterprise & Ad Hoc Reporting
• Predictive Analytics & Machine Learning
• Data Ingestion, Manipulation & Integration
Hadoop, NoSQL Databases, Analytic Databases
Data Warehouse Optimization
[Diagram. Data Sources: ERP, CRM, CDR, Logs, Other Data. Big Data Architecture: Raw Data, Parsed Data, Analytic Datasets, Master Data. Data Warehouse (Master & Transactional Data). Analytic Data Mart(s). Tape Archive.]
Steps To Start with Hadoop
1. Hadoop Installation
• Install locally in pseudo-distributed mode
• Leverage tools like Dell Crowbar
• Cloud sandbox
2. Pentaho Installation
• Easy download & installation
• Start on desktop
3. Start Loading
• Extract or access data from source systems
• Load it (in its raw form) into Hadoop
• Tokenize & parse as required
• Transform & enrich
• Load into destination
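The loading steps above can be sketched as a small pipeline. This is a minimal illustration in plain Python, not Pentaho or Hadoop code: the log layout, the field names and the `REGION_BY_HOST` lookup are hypothetical, and in-memory lists stand in for both Hadoop and the destination.

```python
# Hypothetical raw web-log lines; in practice these come from source systems.
RAW_LINES = [
    "2013-03-01 10:15:02 host-a GET /index.html 200",
    "2013-03-01 10:15:07 host-b GET /missing 404",
    "malformed line",
]

# Hypothetical reference data used for enrichment.
REGION_BY_HOST = {"host-a": "EMEA", "host-b": "APAC"}


def load_raw(lines):
    """Keep the data in its raw form (a list stands in for Hadoop)."""
    return list(lines)


def parse(line):
    """Tokenize one line; return None when it does not match the layout."""
    tokens = line.split()
    if len(tokens) != 6:
        return None
    date, time, host, method, path, status = tokens
    return {"date": date, "time": time, "host": host,
            "method": method, "path": path, "status": int(status)}


def enrich(record):
    """Add a derived field and a looked-up field to a parsed record."""
    out = dict(record)
    out["is_error"] = record["status"] >= 400
    out["region"] = REGION_BY_HOST.get(record["host"], "unknown")
    return out


def run_pipeline(lines):
    """Load raw, parse, enrich, and return the rows for the destination."""
    raw = load_raw(lines)
    parsed = [rec for rec in (parse(line) for line in raw) if rec is not None]
    return [enrich(rec) for rec in parsed]


if __name__ == "__main__":
    for row in run_pipeline(RAW_LINES):
        print(row)
```

Keeping the raw lines untouched and parsing in a separate step means the parsing rules can change later without re-extracting from the source systems.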
Data Architecture and Integration Challenges
[Diagram. Data Sources: ERP, CRM, CDR, Logs, Other Data. Big Data Architecture: Raw Data, Parsed Data, Analytic Datasets, Master Data, NoSQL. Data Warehouse (Master & Transactional Data). Analytic Data Mart(s).]
Data Architecture and Integration Challenges
[Diagram. Same layout as the previous slide, with these labels added: Extract, Transform, Load, Orchestration & Integration, MR (MapReduce).]
Solution Architecture & Demo
1. Fast and easy way to deploy Hadoop clusters with Dell
Global Marketing
Well, we are ready, but how will the Hardware Team know how to size and design the Hadoop cluster?
I don’t know, and it may take a long time to build the Hadoop cluster.
Time is a critical factor; we need to get this project moving.
Reduce risk & increase flexibility with Dell
• Reduce time to Cluster Sizing, Design & Deployment
• Faster time to productive operations
• Optimize and adapt for your needs
• Deliver the best return on investment
Dell.com/Crowbar
Dell | Hadoop Solution
“Dell … was one of the first of the hardware vendors to grasp the fact that cloud is about provisioning services, not about the hardware.” Maxwell Cooter, Cloud Pro
Excels at supporting complex big data analyses across large collections of structured and unstructured data
• Hadoop handles a variety of workloads, including search, log processing, data warehousing, recommendation systems and video/image analysis
• Work on the most modern scale-out architectures using a clean-sheet design data framework
• Without vendor lock-in
Apache Hadoop software
Crowbar software framework with a Hadoop barclamp
PowerEdge C8000 Series, C6220, R720, R720XD
Force10 or PowerConnect switches
Reference Architecture
Deployment Guide
Joint Service and Support
Proven solutions
Proven components
Partner Ecosystem
Crowbar Software Framework
A modular, open source framework
Crowbar
• Accelerates multi-node deployments
• Simplifies maintenance
• Streamlines ongoing updates
Built with DevOps
• Provides an operational model for managing big data clusters and cloud
Field-proven technologies
• Built on locally deployed Chef Server
• Raw servers to full cluster in <2 hours
• Hardened with more than a year of deployments
Apache 2 open source
• Multi-apps (Hadoop & OpenStack)
• Multi-OS (Ubuntu, RHEL, CentOS, SUSE)
NOT limited to Dell hardware
Deploy a Hadoop cluster in ~2 hours
Use Crowbar to:
• Automate the deployment and configuration of a Hadoop cluster
• Quickly provision bare-metal servers from box to cluster with minimal intervention
• Maintain, upgrade and evolve your Hadoop cluster over time
• Leverage an open source framework backed by a growing global developer ecosystem
Reduce software licensing fees: 100%
Reduce development time: 4-6 mo.
Crowbar software framework
Evolve to meet your needs over time with built in DevOps
Crowbar dashboard provides visibility
Leverage developer expertise worldwide
Download the open source software: https://github.com/dellcloudedge/crowbar
Participate in an active community: http://lists.us.dell.com/mailman/listinfo/crowbar
Get resources on the Wiki: https://github.com/dellcloudedge/crowbar/wiki
Visit Dell.com/Crowbar, Crowbar@Dell.com
Dell.com/Crowbar
2. Pentaho Installation
pentaho.com/download
• Install on a local desktop – no need for a cluster
• “Managed Code”: no additional installations
• Pentaho will write to the Hadoop Distributed Cache for execution
Scheduling, Integration, Manipulation, Orchestration
3. Start Loading
Loading into HDFS & Hive
– Hadoop Copy Files
– Specify source files / destination
Loading into HBase
– ZooKeeper host & port
– Specify HBase mapping
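Outside of Pentaho, a copy of local files into HDFS is commonly done with the `hdfs dfs -put` shell command. The sketch below only builds that command line so it can be inspected before it is run. The file paths are hypothetical, and this is not a description of how the Hadoop Copy Files entry works internally.

```python
import shlex


def build_put_command(source_files, destination_dir):
    """Return the argument list for copying local files into an HDFS directory."""
    if not source_files:
        raise ValueError("at least one source file is required")
    return ["hdfs", "dfs", "-put", *source_files, destination_dir]


if __name__ == "__main__":
    # Hypothetical local files and HDFS target directory.
    cmd = build_put_command(
        ["/tmp/weblogs_1.txt", "/tmp/weblogs_2.txt"],
        "/user/demo/weblogs/raw",
    )
    # Print a shell-safe rendering; pass `cmd` to subprocess.run to execute it
    # on a machine where the Hadoop client is installed.
    print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the source and destination choices easy to review and test.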
Demo
Maximize Performance
• As much as 15x faster than hand-written code.
• Parallel execution as MapReduce in the Hadoop cluster.
Additional Best Practices
Leverage Hadoop
• Don’t do database lookups inside a Mapper/Reducer – bring the data set into HDFS
• Don’t transfer data between two clustering technologies – network overload
Don’t Boil the Ocean
• Start with a small data set and validate logic & performance outside the cluster
• Gradually increase volumes and fine tune the application, cluster, data stores & network
It’s AND…AND
• Leverage the various technologies available
• A combination of easy to use tools, powerful scripting and custom coding provides the best mix
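The advice about avoiding database lookups inside a Mapper/Reducer can be illustrated with a map-side join: the reference data is exported ahead of time, loaded into memory once, and each record is joined against it locally instead of triggering a query. The following is a plain-Python sketch with hypothetical names and data, not Pentaho or Hadoop API code.

```python
import csv
import io

# Hypothetical reference data, exported from the database ahead of time.
CUSTOMER_CSV = "customer_id,segment\n1,gold\n2,silver\n"


def load_lookup(text):
    """Read the whole lookup table into memory once, before mapping starts."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["customer_id"]: row["segment"] for row in reader}


def make_mapper(lookup):
    """Return a mapper that joins each record against the in-memory lookup."""
    def mapper(record):
        customer_id, amount = record.split(",")
        segment = lookup.get(customer_id, "unknown")
        return segment, float(amount)  # key/value pair for the reduce phase
    return mapper


def reduce_sum(pairs):
    """Sum the values per key, as a reducer would after the shuffle."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0.0) + value
    return totals


if __name__ == "__main__":
    mapper = make_mapper(load_lookup(CUSTOMER_CSV))
    records = ["1,10.0", "2,5.5", "1,2.5", "3,1.0"]  # hypothetical sales lines
    print(reduce_sum(mapper(r) for r in records))
```

The lookup is built once per mapper rather than once per record, which is the point of bringing the data set into HDFS first.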
Q & A
Contact Us or Sign-up at:
pentaho.com

Big Data Integration Webinar: Getting Started With Hadoop Big Data

  • 1. Getting Started & Successful with Big Data © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 @Pentaho #BigDataWebSeries
  • 2. Your Hosts Today Paul Brook Cloud EMEA Program Manager Dell 2© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 Davy Nys VP EMEA & APAC Pentaho Chuck Yarbrough Technical Solutions Marketing Pentaho
  • 3. Pentaho Webinar Series 3© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 Sign-up at: pentaho.com
  • 4. Goals for Today 4© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 To Understand: • How to get a Hadoop cluster up and running • Where Hadoop and other pieces fit into the architecture • How you can easily get data in & out Hadoop • How to leverage Hadoop with Pentaho • Initial Best Practices
  • 5. Complete Analytics and Visual Data Management HadoopNoSQL Databases Data Discovery & Visualization Enterprise & Ad Hoc Reporting Predictive Analytics & Machine Learning Data Ingestion, Manipulation & Integration Analytic Databases © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 5
  • 6. Data Warehouse Optimization Data Sources Big Data Architecture Data Warehouse (Master & Transactional Data) ERP CRM CDR Analytic Data Mart(s) Analytic Data Mart(s) Analytic Data Mart(s) Logs Logs Other Data Raw Data Parsed Data Analytic Datasets Master Data Tape Archive
  • 7. Steps To Start with Hadoop 7© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 Hadoop Installation • Install locally – as Pseudo-Distributed mode • Leverage tools like Dell Crowbar • Cloud sandbox • Easy download & installation • Start on desktop Pentaho Installation 1 2 • Extract or access data from source systems • Load it (in its raw form) into Hadoop • Tokenize & parse as required • Transform & enrich • Load into destination Start Loading3
  • 8. Data Architecture and Integration Challenges Data Sources Big Data Architecture Data Warehouse (Master & Transactional Data) ERP CRM CDR Analytic Data Mart(s) Analytic Data Mart(s) Analytic Data Mart(s) Logs Logs Other Data Raw Data Parsed Data Analytic Datasets Master Data NOSQL
  • 9. Data Architecture and Integration Challenges Data Sources Big Data Architecture Data Warehouse (Master & Transactional Data) ERP CRM CDR Analytic Data Mart(s) Analytic Data Mart(s) Analytic Data Mart(s) Logs Logs Other Data Raw Data Parsed Data Analytic Datasets Master Data NOSQL Extract Transform Load Orchestration & Integration MR
  • 10. Data Architecture and Integration Challenges Data Sources Big Data Architecture Data Warehouse (Master & Transactional Data) ERP CRM CDR Analytic Data Mart(s) Analytic Data Mart(s) Analytic Data Mart(s) Logs Logs Other Data Raw Data Parsed Data Analytic Datasets Master Data NOSQL Extract Transform Load Orchestration & Integration MR
  • 11. Solution Architecture & Demo. 1. Fast and easy way to deploy Hadoop clusters with Dell
  • 12. Global Marketing Fast and easy way to deploy Hadoop clusters with Dell
  • 13. "Well, we are ready, but how will the Hardware Team know how to size and design the Hadoop cluster?" "I don’t know, and it may take a long time to build the Hadoop cluster." "Time is a critical factor; we need to get this project moving."
  • 14. Reduce time to cluster sizing, design & deployment. Faster time to productive operations. Optimize and adapt for your needs. Deliver the best return on investment. Reduce risk & increase flexibility with Dell. Dell.com/Crowbar
  • 15. Dell | Hadoop Solution. “Dell … was one of the first of the hardware vendors to grasp the fact that cloud is about provisioning services, not about the hardware.” (Maxwell Cooter, Cloud Pro) Excels at supporting complex big data analyses across large collections of structured and unstructured data: • Hadoop handles a variety of workloads, including search, log processing, data warehousing, recommendation systems and video/image analysis • Work on the most modern scale-out architectures using a clean-sheet design data framework • Without vendor lock-in. Solution elements: Apache Hadoop software; Crowbar software framework with a Hadoop barclamp; PowerEdge C8000 Series, C6220, R720, R720XD; Force10 or PowerConnect switches; Reference Architecture; Deployment Guide; Joint Service and Support. Proven solutions; proven components; partner ecosystem.
  • 16. Crowbar Software Framework: a modular, open source framework. Crowbar: • Accelerates multi-node deployments • Simplifies maintenance • Streamlines ongoing updates. Built with DevOps: • Provides an operational model for managing big data clusters and cloud. Field-proven technologies: • Built on a locally deployed Chef Server • Raw servers to full cluster in <2 hours • Hardened with more than a year of deployments. Apache 2 open source: • Multi-apps (Hadoop & OpenStack) • Multi-OS (Ubuntu, RHEL, CentOS, SUSE) • NOT limited to Dell hardware
  • 17. Crowbar software framework. Deploy a Hadoop cluster in ~2 hours. Reduce software licensing fees 100%. Reduce development time 4-6 mo. Evolve to meet your needs over time with built-in DevOps. Use Crowbar to: • Automate the deployment and configuration of a Hadoop cluster • Quickly provision bare-metal servers from box to cluster with minimal intervention • Maintain, upgrade and evolve your Hadoop cluster over time • Leverage an open source framework backed by a growing global developer ecosystem
  • 18. Global Marketing Crowbar dashboard provides visibility
  • 19. Leverage developer expertise worldwide. Download the open source software: https://github.com/dellcloudedge/crowbar Participate in an active community: http://lists.us.dell.com/mailman/listinfo/crowbar Get resources on the Wiki: https://github.com/dellcloudedge/crowbar/wiki Visit Dell.com/Crowbar, Crowbar@Dell.com
  • 21. Pentaho Installation (step 2): pentaho.com/download • Install on a local desktop – no need for a cluster • “Managed Code”: no additional installations • Pentaho will write to the Hadoop Distributed Cache for execution. Capabilities: Scheduling, Integration, Manipulation, Orchestration
  • 22. Start Loading (step 3). Loading into HDFS & Hive: – Hadoop Copy Files – Specify source files / destination. Loading into HBase: – ZooKeeper host & port – Specify HBase mapping
  • 23. Solution Architecture & Demo: Demo
  • 24. Maximize Performance: As much as 15x faster than hand-written code. Parallel execution as MapReduce in the Hadoop cluster.
  • 25. Additional Best Practices. Leverage Hadoop: • Don’t do database lookups inside a Mapper/Reducer – bring the data set into HDFS • Don’t transfer data between two clustering technologies – network overload. Don’t Boil the Ocean: • Start with a small data set and validate logic & performance outside the cluster • Gradually increase volumes and fine tune the application, cluster, data stores & network. It’s AND…AND: • Leverage the various technologies available • A combination of easy to use tools, powerful scripting and custom coding provides the best mix
  • 26. Solution Architecture & Demo: Q & A
  • 27. Contact Us or Sign-up at: pentaho.com
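
The loading steps listed on slide 7 (extract, load raw, tokenize & parse, transform & enrich, load into destination) can be sketched in a few lines of plain Python. This is only an illustration of the flow, not how Pentaho or Hadoop implements it: the record layout, the `CUSTOMERS` table and all function names are hypothetical, and in a real deployment the raw lines would sit in HDFS rather than in a Python list.

```python
import csv
import io

# Hypothetical raw call-detail records (CDR); the field layout is an assumption.
RAW_LINES = [
    "2013-03-01T10:15:00|4415550001|4415550002|125",
    "2013-03-01T10:17:30|4415550003|4415550001|42",
    "malformed line",
]

# Hypothetical master data used for enrichment.
CUSTOMERS = {"4415550001": "Acme Ltd", "4415550003": "Globex"}

FIELDS = ["timestamp", "caller", "callee", "seconds", "customer", "minutes"]


def parse(line):
    """Tokenize one raw line; return None if it does not match the layout."""
    parts = line.split("|")
    if len(parts) != 4 or not parts[3].isdigit():
        return None
    timestamp, caller, callee, seconds = parts
    return {
        "timestamp": timestamp,
        "caller": caller,
        "callee": callee,
        "seconds": int(seconds),
    }


def enrich(record):
    """Transform & enrich: add the customer name and the duration in minutes."""
    out = dict(record)
    out["customer"] = CUSTOMERS.get(record["caller"], "unknown")
    out["minutes"] = round(record["seconds"] / 60, 2)
    return out


def run_pipeline(lines):
    """Parse the raw lines, drop the ones that fail to parse, enrich the rest."""
    parsed = [r for r in map(parse, lines) if r is not None]
    return [enrich(r) for r in parsed]


def to_csv(records):
    """Stand-in for 'load into destination': serialize the records as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(run_pipeline(RAW_LINES)))
```

Keeping the raw lines untouched and parsing them in a separate step mirrors the slide's advice to load data in its raw form first, so the parsing rules can change later without re-extracting from the source systems.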

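Two of the best practices on slide 25 lend themselves to a small sketch: avoid database lookups inside a Mapper/Reducer, and validate logic on a small data set outside the cluster. The Python below assumes a hypothetical reference file of `caller,region` pairs that has already been copied next to the data; it is loaded once into memory, and a tiny local driver stands in for the cluster. The function names and data are illustrative only and are not part of Hadoop's or Pentaho's API.

```python
from collections import defaultdict


def load_lookup(lines):
    """Build the lookup table once, instead of querying a database per record."""
    table = {}
    for line in lines:
        key, value = line.rstrip("\n").split(",", 1)
        table[key] = value
    return table


def mapper(record, lookup):
    """Emit (region, seconds) for one 'caller,seconds' record."""
    caller, seconds = record.split(",")
    region = lookup.get(caller, "unknown")
    yield region, int(seconds)


def reducer(key, values):
    """Sum all durations that share a region."""
    return key, sum(values)


def run_local(records, lookup):
    """Tiny local driver: map, group by key, reduce. No cluster involved."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record, lookup):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    # Hypothetical sample data, small enough to check by hand.
    lookup = load_lookup(["100,EMEA", "200,APAC"])
    sample = ["100,30", "200,45", "100,15", "300,5"]
    print(run_local(sample, lookup))
```

Once the totals for a hand-checkable sample look right, the same mapper and reducer logic can be moved to the cluster and the volumes increased gradually.
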
Editor's Notes

  1. Two major
  2. TAKE-AWAYS: Pentaho provides complete integrated DI+BI for every leading big data platform.
  3. The company decided to invest in Hadoop to ingest the raw CDR data along with other data. This frees DW capacity for high-value transactional data while also lowering cost and meeting compliance requirements. And since the data is rescued from tape, it becomes available for reporting and analysis.
  4. The company decided to invest in Hadoop to ingest the raw CDR data along with other data. This frees DW capacity for high-value transactional data while also lowering cost and meeting compliance requirements. And since the data is rescued from tape, it becomes available for reporting and analysis.
  5. The company decided to invest in Hadoop to ingest the raw CDR data along with other data. This frees DW capacity for high-value transactional data while also lowering cost and meeting compliance requirements. And since the data is rescued from tape, it becomes available for reporting and analysis.
  6. The company decided to invest in Hadoop to ingest the raw CDR data along with other data. This frees DW capacity for high-value transactional data while also lowering cost and meeting compliance requirements. And since the data is rescued from tape, it becomes available for reporting and analysis.
  7. Delivered as a hardware, software, and services Reference Architecture (RA) which can scale from 6 nodes up to 720 nodes. Currently utilizes PowerEdge C2100/C6100/C6105, R720, R720XD servers and PowerConnect 6248 or Force10 switches. Dell Crowbar: automated solution deployment and configuration (bare metal, OS, solution stack, and monitoring). CDH3 Enterprise: Cloudera Hadoop Distribution, Cloudera Management Tools, Cloudera Support. Partner Ecosystem: software and services capabilities to address broader customer needs around Hadoop; enabling non-technical business users to leverage Hadoop; simplify getting data into Hadoop; intuitive analytics reporting and dashboards. Solution provided via: Reference Architecture, Deployment Guide, Dell Digital Locker, Dell Deployment Services.
  8. First OpenStack cloud solution provider in market. Pioneer OpenStack partner (only Day 1 hardware provider). Most history with the OpenStack technology = expertise + RAs that have been tested longer and more fully than those of newcomers. Dell offers a deep partnership ecosystem. Single point of support and purchase to reduce the problem of dealing with multiple vendors. ONLY company providing automated software to do multi-node OpenStack provisioning: Crowbar. Dell developed software that we open-sourced in the community. OpenStack expertise.