Accelerated Analytics for the Big Data Fabric
       Bay Area Hadoop User Group




       © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
AGENDA

• The Big Data Fabric
• Big Data Preparation – An Everyday Challenge
• Use-Case Scenario – Call Volume Analysis
   – Solution Requirements
   – Solution Workflow
   – Phase I - Data Preparation & Visualization
   – Phase II - Pentaho MapReduce & Orchestration
• Summary




The Big Data Fabric




• Big Analytics
   – Pentaho Business Analytics: Visualization, Dashboards, Interactive Analysis, Reports
   – 3rd Party Tools: R, 3rd Party BI Tools, Applications
• Data Integration
   – Job Orchestration, Workflow, Scheduling
   – High Performance, Visual IDE
• Big Data Mgmt
   – Hadoop, NoSQL Databases, Analytic Databases
Preparing Big Data for Analysis is an Everyday Challenge

• Very technical skills required
• Divide between M-R developers & analysts
• Beyond the reach of many organizations




Pentaho Visual MapReduce




• Accessible by any ETL developer, business analyst or data scientist
• Executes inside Hadoop as a native Java MapReduce task
Pentaho Reporting & Analytics




• Batch Reporting and Ad Hoc Query
• Data Visualization, Discovery and Analysis
• Hadoop, NoSQL, Hybrid
Use Case Scenario – Call Volume Analysis

• A VoIP service provider has excess capacity and is
  considering expansion to consumer markets
• Business Analyst: What are the top 10 states for
  inbound calls on Fridays, Saturdays and Sundays?
• Research data available:
   – Call records – date/timestamp & destination phone #
   – NANP (North American Numbering Plan) data – area
     code by country, state & time zone




Solution Requirements

• Data Preparation
   – Access the call records in HDFS
   – Extract the destination area code for each call
   – Read the area code reference data
   – Look up country, state and time zone by area code, and append them to each
     record
   – Filter out records (non-U.S. calls, calls made Monday through Thursday)
   – Load to a relational database
   – Generate metadata
• Analysis
   – Explore data multi-dimensionally
   – Find the top-10 states by inbound call volume
   – Navigate via a geospatial interface
• Deployment
   – Deploy in MapReduce to handle larger data volumes
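
The data preparation steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not how the Pentaho components work internally: the `AREA_CODES` table is a tiny hypothetical stand-in for the NANP reference data, the function name `prepare` and the record layout (timestamp string, phone string) are assumptions, and the HDFS read and database load steps are left out.

```python
from datetime import datetime

# Hypothetical stand-in for the NANP area code reference data.
AREA_CODES = {
    "415": {"country": "US", "state": "CA", "time_zone": "Pacific"},
    "212": {"country": "US", "state": "NY", "time_zone": "Eastern"},
    "416": {"country": "CA", "state": "ON", "time_zone": "Eastern"},
}

# Friday, Saturday, Sunday as datetime.weekday() values (Monday is 0).
KEPT_WEEKDAYS = {4, 5, 6}


def prepare(call_records):
    """Yield enriched records for U.S. calls placed Friday through Sunday."""
    for timestamp, phone in call_records:
        # Extract the destination area code from the phone number.
        digits = "".join(ch for ch in phone if ch.isdigit())
        # Assumes a 10-digit NANP number, optionally prefixed with country code 1.
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]
        if len(digits) != 10:
            continue
        area_code = digits[:3]

        # Look up country, state and time zone; drop unknown and non-U.S. codes.
        ref = AREA_CODES.get(area_code)
        if ref is None or ref["country"] != "US":
            continue

        # Drop calls made Monday through Thursday.
        when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        if when.weekday() not in KEPT_WEEKDAYS:
            continue

        # Append the reference fields to the record.
        yield {"timestamp": timestamp, "area_code": area_code, **ref}
```

Because `prepare` is a generator, records flow through one at a time, so the same logic can be tried on a small extract before it is pointed at a larger input.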

Solution Workflow


• Phase I - Business Analysts
   – Use a data extract to prepare and validate their analyses
   – Iterate over requirements with executives and stakeholders


• Phase II - MapReduce Developers/Analysts
   – Create production Pentaho MapReduce transformations
   – Manage the deployment and orchestration between the
     Hadoop cluster and the production database




Data Preparation (Phase I)




• The data pipeline implements the data preparation logic
• Each component has a “personality”: access, calculate, join, filter …
• Free-form design
    – As many or as few inputs, transformations and outputs as needed
• Schema contract exists only between connected components
• Pipelined, multi-threaded for performance
• 100% Java-based for deployment flexibility
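
As a rough illustration of the component idea (not Pentaho's implementation, which is Java-based), a pipeline can be modelled as a chain of small steps, each of which reads a stream of rows and emits a new one. All names here (`calculate_area_code`, `filter_known`, `run_pipeline`) are hypothetical, and the sketch is single-threaded; it only shows that a step depends on the fields of the step it is connected to, not on a global schema.

```python
from functools import reduce


def calculate_area_code(rows):
    """Calculator-style step: derive a new field from an existing one."""
    for row in rows:
        yield {**row, "area_code": row["phone"][:3]}


def filter_known(rows, known):
    """Filter-style step: relies only on the 'area_code' field from upstream."""
    for row in rows:
        if row["area_code"] in known:
            yield row


def run_pipeline(source, *steps):
    """Connect steps in order; each consumes the previous step's output stream."""
    return reduce(lambda stream, step: step(stream), steps, source)
```

A usage example under the same assumptions:

```python
rows = [{"phone": "4155550100"}, {"phone": "9995550100"}]
result = run_pipeline(
    rows,
    calculate_area_code,
    lambda stream: filter_known(stream, {"415"}),
)
print(list(result))
```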



Data Pipeline – Input from HDFS

Data Pipeline – Calculator

Data Pipeline – Stream Lookup

Data Pipeline – Row Filter

Data Pipeline – Table Output

Visualization – Multi-Dimensional UX

Visualization – Geographic

Visualization – Heatmap
Deployment to Hadoop (Phase II)


• To process a larger set of data, we can deploy the data pipeline via
  MapReduce
    – Input and output streams are encoded as key-value pairs
    – Two specialized components (pictured on the original slide) provide
      an interface
    – A special job component deploys the data pipeline to the Hadoop
      cluster
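
To make the key-value encoding concrete, here is a minimal local simulation of the map and reduce phases for the top-10 states question. It is a sketch under stated assumptions, not Pentaho MapReduce or the Hadoop API: `mapper`, `reducer` and `run_local` are hypothetical names, the input records are assumed to carry a `state` field already, and the shuffle step is imitated with an in-memory dictionary.

```python
from collections import defaultdict


def mapper(key, record):
    """Emit a (state, 1) pair per call record; the input key is ignored."""
    yield record["state"], 1


def reducer(state, counts):
    """Sum the counts collected for one state."""
    yield state, sum(counts)


def run_local(records, top_n=10):
    """Simulate map, shuffle and reduce in memory, then rank the states."""
    grouped = defaultdict(list)
    for offset, record in enumerate(records):
        for key, value in mapper(offset, record):
            grouped[key].append(value)  # shuffle: group values by key

    totals = [pair for key, values in grouped.items()
              for pair in reducer(key, values)]
    # Highest call volume first; ties broken alphabetically by state.
    return sorted(totals, key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

On a real cluster the grouping and sorting between the two phases is handled by the framework; only the per-record and per-key logic would need to be supplied.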




Pentaho MapReduce – Inputs/Outputs



The core logic of the data pipeline is identical … only the ends change
Pentaho MapReduce – Orchestration




Instant Analytics (Roadmap)




• Choose a Big Data Source, Answer a Few Questions, Publish to Pentaho
• Report, Explore and Analyze
• Customize Model (Optional)
SUMMARY



1. The Big Data Fabric encompasses a large collection of Hadoop
   distributions, NoSQL and analytical databases
2. A component-based approach to data access and integration can:
   – Allow business analysts and data scientists to perform their own data
     preparation
   – Result in more rapid validation of business requirements & metrics
   – Be used to create data pipelines that can be deployed directly to a
     cluster, enabling analytics against much larger data sets
   – Support orchestration across environments




Thank You
Join the conversation. You can find us on:

     http://blog.pentaho.com

     @Pentaho

     Facebook.com/Pentaho

     Pentaho Business Analytics




Bay Area Hadoop User Group

  • 1. Accelerated Analytics for the Big Data Fabric Bay Area Hadoop User Group © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 2. AGENDA  The Big Data Fabric  Big Data Preparation – An Everyday Challenge  Use-Case Scenario – Call Volume Analysis  Solution Requirements  Solution Workflow  Phase I - Data Preparation & Visualization  Phase II - Pentaho MapReduce & Orchestration  Summary 2 © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 3. The Big Data Fabric Data Integration Big Analytics Pentaho Business Analytics 3rd Party Tools R Visualization Dashboards 3rd Party BI Tools Interactive Analysis Reports Applications Data Integration Scheduling Job Orchestration High Performance Workflow Visual IDE Hadoop Analytic Databases NoSQL Databases Big Data Mgmt 3
  • 4. Preparing Big Data for Analysis is an Everyday Challenge • Very technical skills required • Divide between M-R developers & analysts • Beyond the reach of many organizations 4 © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 5. Pentaho Visual MapReduce
      • Accessible by any ETL developer, business analyst or data scientist
      • Executes inside Hadoop as a native Java MapReduce task
  • 6. Pentaho Reporting & Analytics
      • Batch Reporting and Ad Hoc Query
      • Data Visualization, Discovery and Analysis
      • Hadoop, NoSQL, Hybrid
  • 7. Use Case Scenario – Call Volume Analysis
      • VOIP service provider has excess capacity and is considering expansion to consumer markets
      • Business Analyst: what are the top 10 states for inbound calls on Fridays, Saturdays and Sundays?
      • Research data available:
          – Call records – date/timestamp & destination phone number
          – NANP (North American Numbering Plan) data – area code by country, state & time zone
  • 8. Solution Requirements
      • Data Preparation
          – Access the call records in HDFS
          – Extract the destination area code for each call
          – Read the area code reference data
          – Look up country, state and time zone by area code, append to each record
          – Filter out records (non-U.S. calls, calls made on M-Tu-W-Th)
          – Load to a relational database
          – Generate metadata
      • Analysis
          – Explore data multi-dimensionally
          – Find the top-10 states by inbound call volume
          – Navigate via a geospatial interface
      • Deployment
          – Deploy in MapReduce to handle larger data volumes
  • 9. Solution Workflow
      • Phase I - Business Analysts
          – Use a data extract to prepare and validate their analyses
          – Iterate over requirements with executives and stakeholders
      • Phase II - MapReduce Developers/Analysts
          – Create production Pentaho MapReduce transformations
          – Manage the deployment and orchestration between the Hadoop cluster and the production database
  • 10. Data Preparation (Phase I)
      • The data pipeline implements the data preparation logic
      • Each component has a “personality”: access, calculate, join, filter …
      • Free-form design
          – As many or as few inputs, transformations and outputs as needed
      • Schema contract exists only between connected components
      • Pipelined, multi-threaded for performance
      • 100% Java-based for deployment flexibility
  • 11. Data Pipeline – Input from HDFS
  • 12. Data Pipeline – Calculator
  • 13. Data Pipeline – Stream Lookup
  • 14. Data Pipeline – Row Filter
  • 15. Data Pipeline – Table Output
  • 16. Visualization – Multi-Dimensional UX
  • 17. Visualization – Geographic
  • 18. Visualization – Heatmap
  • 19. Deployment to Hadoop (Phase II)
      • To process a larger set of data we can deploy the data pipeline via MapReduce
          – Input and output streams are encoded in key-value pairs
          – Two specialized components provide an interface:
          – A special job component deploys the data pipeline to the Hadoop cluster:
  • 20. Pentaho MapReduce – Inputs/Outputs
      • The core logic of the data pipeline is identical; only the ends change
  • 21. Pentaho MapReduce – Orchestration
  • 22. Instant Analytics (Roadmap) Choose a Big Data Source, Answer a Few Questions, Publish to Pentaho Report, Explore and Analyze Customize Model (Optional)
  • 23. SUMMARY
      1. The Big Data Fabric encompasses a large collection of Hadoop distributions, NoSQL and analytical databases
      2. A component-based approach to data access and integration can:
          – Allow business analysts and data scientists to perform their own data preparation
          – Result in more rapid validation of business requirements & metrics
          – Be used to create data pipelines that can be deployed directly to a cluster, enabling analytics against much larger data sets
          – Support orchestration across environments
  • 24. Summary
  • 25. Thank You. Join the conversation. You can find us on: http://blog.pentaho.com @Pentaho Facebook.com/Pentaho Pentaho Business Analytics
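
The data preparation steps on slide 8 and the key-value encoding mentioned on slide 19 can be sketched in plain Python. This is only an illustration of the logic, not Pentaho's API: in the deck the pipeline is built visually from components. The record layout (`timestamp,phone`), the tiny `NANP` table, the time zone strings and all function names are assumptions made for the example, and timestamps are treated as already being in the destination's local time.

```python
from collections import Counter
from datetime import datetime

# Hypothetical NANP reference rows: area code -> (country, state, time zone).
NANP = {
    "415": ("US", "CA", "America/Los_Angeles"),
    "212": ("US", "NY", "America/New_York"),
    "416": ("CA", "ON", "America/Toronto"),
}

# datetime.weekday() numbers Monday as 0, so 4, 5, 6 are Fri, Sat, Sun.
WEEKEND = {4, 5, 6}


def mapper(line):
    """Turn one call record into zero or one (state, 1) key-value pairs."""
    timestamp, phone = line.strip().split(",", 1)

    # Extract the destination area code (the "calculator" step).
    digits = "".join(ch for ch in phone if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return
    area_code = digits[:3]

    # Look up country, state and time zone (the "stream lookup" step).
    ref = NANP.get(area_code)
    if ref is None:
        return
    country, state, _time_zone = ref

    # Filter out non-U.S. calls and calls made Monday to Thursday.
    if country != "US":
        return
    when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    if when.weekday() not in WEEKEND:
        return

    yield state, 1


def reducer(pairs):
    """Sum the counts emitted for each state key."""
    totals = Counter()
    for state, count in pairs:
        totals[state] += count
    return totals


def top_states(lines, n=10):
    """Run the map and reduce steps and return the n busiest states."""
    pairs = (pair for line in lines for pair in mapper(line))
    return reducer(pairs).most_common(n)


SAMPLE = [
    "2012-03-02 10:15:00,+1 (415) 555-0100",  # Friday, CA
    "2012-03-03 18:40:00,415-555-0101",       # Saturday, CA
    "2012-03-04 09:05:00,212-555-0102",       # Sunday, NY
    "2012-03-05 11:30:00,212-555-0103",       # Monday, filtered out
    "2012-03-03 12:00:00,416-555-0104",       # Saturday, Canada, filtered out
]

if __name__ == "__main__":
    print(top_states(SAMPLE))
```

Run on the five sample records, this keeps the three U.S. weekend calls and reports `[('CA', 2), ('NY', 1)]`. Keeping the per-record logic in `mapper` mirrors the point of slide 20: the core of the pipeline stays the same, and only the input and output ends change when it moves to a cluster.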

Editor's Notes

  1. Leveraging PDI (Pentaho Data Integration) to incorporate Big Data into your data fabric provides immediate access to analytics. Examples: batch and ad hoc reporting directly against Big Data sources using familiar BI tools with no coding (Report Designer, Interactive Reporting); an agile framework to quickly generate, house and manage data marts for interactive analysis, data discovery, etc.