Putting Business Intelligence to Work on Hadoop Data Stores

Ian Fyfe, Chief Technology Evangelist, Pentaho




© 2010, Pentaho. All Rights Reserved. www.pentaho.com.   Worldwide: +1 (866) 660-7555 | Slide 1
Session Abstract
       This presentation covers how to overcome Hadoop's constraints to get more out of your business data analysis.
       An inexpensive way of storing large volumes of data, Hadoop is also scalable and redundant. But getting data out of Hadoop is tough due to the lack of a built-in query language. Also, because users experience high latency (up to several minutes per query), Hadoop is not appropriate for ad hoc query, reporting, and business analysis with traditional tools.
       The first step in overcoming Hadoop's constraints is connecting to Hive, a data warehouse infrastructure built on top of Hadoop, which provides the relational structure necessary for scheduled reporting on large datasets stored in Hadoop files. Hive also provides a simple query language called Hive QL, which is based on SQL and enables users familiar with SQL to query this data.
       But to really unlock the power of Hadoop, you must be able to efficiently extract data stored across multiple (often tens or hundreds of) nodes with a user-friendly ETL (extract, transform and load) tool that then allows you to move your Hadoop data into a relational data mart or warehouse where you can use BI tools for analysis.

   Attendees will learn how an IT person without Java programming skills can:
      Integrate with Hadoop and Hive to bring ETL, data warehousing and BI applications to the task of analyzing Big Data;
      Provide key data integration and transformation functionality to Hadoop data;
      Manage and control Hadoop jobs using a graphical interface;
      Integrate Hadoop data with data from other sources to drive compelling reporting and analytics for today's massive volumes of data.
THE CASE FOR BIG DATA


The Case for Big Data
       Enterprises increasingly face needs to store, process and maintain larger and larger volumes of structured and unstructured data
              Compliance
              Competitive Advantage
       Challenges associated with big data
              Cost – storage and processing power
              Timeliness of data processing
       Why Hadoop?
              Low cost, reliable scale-out architecture for storing massive amounts of data
              Parallel, distributed computing framework for processing data
              Proven success in solving Big Data problems at Fortune 500 companies like Google, Yahoo!, IBM and GE
              Vibrant community, exploding interest, strong commercial investments
       [Figure: Google Trends for ‘Hadoop’]




Hadoop for Data Integration and BI
               Top Use Cases for Hadoop*
                     1. “mine data for improved business intelligence”
                     2. “reducing cost of data analysis”
                     3. “log analysis”

               Top Challenges with Hadoop*
                     1. Steep technical learning curve
                     2. Hiring qualified people
                     3. Availability of appropriate products and tools

        Unfortunately, Hadoop was not designed specifically for ETL and BI use cases:
                 It’s not a database
                 High latency queries and jobs not ideal for all BI use cases
                 Skill set mismatch for traditional ETL users and BI solution architects

   *Based on a survey of 100+ Hadoop users conducted by Karmasphere, Sept. 2010


ESTABLISHING AN ARCHITECTURE FOR BIG DATA

Example Use Cases Today
          Transactional
          • Fraud detection
          • Financial services/stock markets

          Sub-Transactional
          • Weblogs
          • Social/online media
          • Telecoms events
Example Use Cases Today
          Non-Transactional
          • Web pages, blogs etc.
          • Documents
          • Physical events
          • Application events
          • Machine events

          In most cases structured or semi-structured

Traditional Business Intelligence (BI)
          [Diagram labels: Data Source, Data Mart(s), Tape/Trash, ? ? ?]


Data Lake
        • Single source
        • Large volume
        • Not distilled
        • Typically no more than 0-2 lakes per company
        • Known and unknown questions
        • Multiple user communities
        • Doesn’t fit in a traditional RDBMS at a reasonable cost

Data Lake Requirements
          • Store all the data
          • Satisfy routine reporting
            and analysis
          • Satisfy ad-hoc query /
            analysis / reporting
          • Balance performance and
            cost




What if...
          [Diagram labels: Data Source, Data Lake(s), Data Mart(s), Ad-Hoc, Data Warehouse]



Big Data Does Not Replace Data Marts
          It’s not a database
          High latency
          Optimized for massive data-crunching
          Big Data databases are immature
          Databases are no-SQL
What Hadoop Really is….
     Core Components

          HDFS
                  A distributed file system allowing massive storage across a cluster of commodity servers
          MapReduce
                  Framework for distributed computation; common use cases include aggregating, sorting, and filtering BIG data sets
                  Problem is broken up into small fragments of work that can be computed or recomputed in isolation on any node of the cluster
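
To make the MapReduce description above concrete, here is a minimal sketch in plain Java that imitates the three stages on a single machine: a map step that handles each fragment of input in isolation, a grouping step (the "shuffle"), and a reduce step that aggregates per key. It deliberately uses no Hadoop classes; the class name WordCountSketch and its methods are assumptions made for illustration only, and a real Hadoop job would distribute this work across the cluster.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce idea; no Hadoop classes are used.
// All names here are hypothetical and exist only for this sketch.
class WordCountSketch {

    // "Map" phase: turn one fragment of input (a line) into (word, 1) pairs.
    // Each call depends only on its own line, so it could run on any node.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" phase: combine all values that were emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group the pairs by key (the "shuffle"), then reduce.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data on Hadoop", "Hadoop stores big data");
        System.out.println(run(lines));
    }
}
```

Running main prints {big=2, data=2, hadoop=2, on=1, stores=1}. Because map looks at one line at a time and reduce looks at one key at a time, either step can be rerun on another node after a failure, which is the property the slide describes.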

What Hadoop Really is….
     Related Projects
          Hive – a data warehouse infrastructure on top of Hadoop
                  Implements a SQL-like query language, including a JDBC driver
                  Allows MapReduce developers to plug in custom mappers and reducers
          HBase – the Hadoop database – AH HA!
                  A variant of NoSQL databases, problematic for traditional BI
                  Best at storing large amounts of unstructured data
Hadoop and BI?
                  Distributed processing
                  Distributed file system
                  Commodity hardware
                  Platform independent (in theory)
                  Scales out beyond technology and/or economy of an RDBMS

          In many cases it’s the only viable solution

Hadoop and BI?

               90% of new Hadoop use cases are transformation of semi/structured data*

           * of those companies we’ve talked to...

Hadoop and BI?

            “The working conditions within Hadoop are shocking”

         ETL Developer




Hadoop and BI?
          Instead of this...




Hadoop and BI?
          You have to do this in Java...
               public void map(
                   Text key,
                   Text value,
                   OutputCollector output,
                   Reporter reporter)

               public void reduce(
                   Text key,
                   Iterator values,
                   OutputCollector output,
                   Reporter reporter)


People don’t use Hadoop for BI because they want to...



...they do it because they have to...




... and unfortunately it wasn’t designed for most BI requirements



Why not add to Hadoop the things it’s missing...



... until it can do what we need it to?



If only we had a Java, embeddable, data transformation engine...



A Data Integration Engine for Hadoop
          [Diagram labels: Data Marts, Data Warehouse, Analytical Applications; Hadoop; Data Integration Engine (shown three times); Design, Deploy, Orchestrate]

[Diagram labels: Load, Optimize, Visualize; Applications & Systems; Files / HDFS and Hive (Hadoop); DM & DW (RDBMS); Reporting / Dashboards / Analysis (Web Tier)]

[Diagram labels: Metadata; Applications & Systems; Files / HDFS and Hive (Hadoop); DM & DW (RDBMS); Reporting / Dashboards / Analysis (Web Tier)]

[Diagram labels: Data Source, Data Lake(s), Data Mart(s), Ad-Hoc, Data Warehouse]



[Diagram labels: Applications & Systems; Data Lake (Hadoop); RDBMS; Reporting / Dashboards / Analysis (Web Tier)]

Product Requirements for BI Against Hadoop
       Lower technical barriers through graphical ETL environment for creating and managing Hadoop MapReduce jobs
       Extreme ETL scalability through deployment across the Hadoop cluster
       Easily spin off high performance data marts for interactive analysis
       Easily integrate data from Hadoop with data from other sources
       Provide end-to-end BI addressing common BI use cases with Hadoop including reporting, ad hoc query and interactive analysis
       Reduce costs through subscription-based pricing, reduced dependency on scarce technical resources, and easier maintainability

       [Diagram labels: Agile BI; Interactive Analysis; Batch Reporting and Ad Hoc Query; Data Marts; Hive; Hadoop; Data Integration Jobs; Log Files; DBs and other sources]

THE ROAD AHEAD


The Road Ahead
          Other NoSQL Integration
                  Facilitate BI use cases on top of HBase, possibly others like MongoDB, Cassandra
          Streaming Data Source Support
                  In support of near-realtime use cases
                  Long/always running data processing jobs
          Contiguous Meta-data
                  Data Lineage and Impact Analysis covering the entire big data architecture
          The End of MapReduce (… as a concept ETL users need to understand)
                  Push down optimization of Transformations that generate native MapReduce tasks in Hadoop
Hadoop Distro Wars




   The Apache Software Foundation




Tools That Make Hadoop Easier
                   e.g. Apache Pig

       Pig is a platform for analyzing large data sets
              Produces sequences of MapReduce programs
       Integrate Pig scripts into enterprise data integration workflows e.g.
          1. Submit and monitor a series of Pig and MapReduce jobs
          2. Process a database bulk load step to ready data for ad-hoc analysis or report bursting
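
The two-step workflow above can be pictured as a simple sequential runner: each job is submitted in order, its outcome is recorded, and the bulk load only runs if the earlier jobs succeeded. The sketch below is purely illustrative. WorkflowSketch, Step and the step names are hypothetical and are not part of Pig, Hadoop or any Pentaho product; each step is stubbed with a supplier that reports success or failure instead of launching a real job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical workflow runner; none of these names come from Pig, Hadoop or Pentaho.
class WorkflowSketch {

    // One named step; the supplier stands in for a real job and returns true on success.
    static final class Step {
        final String name;
        final BooleanSupplier action;

        Step(String name, BooleanSupplier action) {
            this.name = name;
            this.action = action;
        }
    }

    // Runs the steps in order, records one status line per step,
    // and stops at the first failure so later steps never see bad input.
    static List<String> run(List<Step> steps) {
        List<String> log = new ArrayList<>();
        for (Step step : steps) {
            boolean ok = step.action.getAsBoolean();
            log.add(step.name + ": " + (ok ? "OK" : "FAILED"));
            if (!ok) {
                break;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<Step> steps = Arrays.asList(
                new Step("pig-script", () -> true),
                new Step("mapreduce-job", () -> true),
                new Step("bulk-load", () -> true));
        run(steps).forEach(System.out::println);
    }
}
```

Stopping at the first failure is one possible design choice; a production orchestrator would more likely add retries, alerts and parallel branches.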

Growth in Adoption of Other NoSQL Big Data Platforms

        HBase – the Hadoop database
        MongoDB – scalable, high-performance, document-oriented database
        LexisNexis HPCC – a data-intensive computing system platform
        Many others




Summary
       Hadoop and other Big Data NoSQL platforms
              Great at storing and processing large diverse data volumes
              Not designed for Business Intelligence

       Choosing the right BI technology can unlock your Big Data to drive actionable insights
              Graphical user interfaces
              Scalable
              Spin-off data marts
              Integrate data into data warehouses
              Integrated dashboards, reporting, data analysis, data integration


Thank You!

ifyfe@pentaho.com



© 2010, Pentaho. All Rights Reserved. www.pentaho.com.
 © 2010, Pentaho. All Rights Reserved. www.pentaho.com.                Worldwide: +1 (866) 660-7555 | | Slide
                                                                       US and Worldwide: +1 (866) 660-7555Slide 39

Putting Business Intelligence to Work on Hadoop Data Stores

  • 1. Putting Business Intelligence to Work on Hadoop Data Stores. Ian Fyfe, Chief Technology Evangelist, Pentaho. © 2010, Pentaho. All Rights Reserved. www.pentaho.com. Worldwide: +1 (866) 660-7555
  • 2. Session Abstract. This presentation will cover how to overcome Hadoop's constraints to get more out of your business data analysis. An inexpensive way of storing large volumes of data, Hadoop is also scalable and redundant. But getting data out of Hadoop is tough due to the lack of a built-in query language. Also, because users experience high latency (up to several minutes per query), Hadoop is not appropriate for ad hoc query, reporting, and business analysis with traditional tools. The first step in overcoming Hadoop's constraints is connecting to Hive, a data warehouse infrastructure built on top of Hadoop, which provides the relational structure necessary for scheduled reporting on large datasets stored in Hadoop files. Hive also provides a simple query language called Hive QL, which is based on SQL and which enables users familiar with SQL to query this data. But to really unlock the power of Hadoop, you must be able to efficiently extract data stored across multiple (often tens or hundreds of) nodes with a user-friendly ETL (extract, transform and load) tool that will then allow you to move your Hadoop data into a relational data mart or warehouse where you can use BI tools for analysis. Attendees will learn how an IT person without Java programming skills can: integrate with Hadoop and Hive to bring ETL, data warehousing and BI applications to the tasks of analyzing Big Data; provide key data integration and transformation functionality to Hadoop data; manage and control Hadoop jobs using a graphical interface; integrate Hadoop data with data from other sources to drive compelling reporting and analytics for today's massive volumes of data.
  • 3. THE CASE FOR BIG DATA
  • 4. The Case for Big Data. Enterprises increasingly face needs to store, process and maintain larger and larger volumes of structured and unstructured data: Compliance; Competitive Advantage. Challenges associated with big data: Cost (storage and processing power); Timeliness of data processing. Why Hadoop? [Figure: Google Trends for 'Hadoop'] Low cost, reliable scale-out architecture for storing massive amounts of data; Parallel, distributed computing framework for processing data; Proven success in solving Big Data problems at Fortune 500 companies like Google, Yahoo!, IBM and GE; Vibrant community, exploding interest, strong commercial investments.
  • 5. Hadoop for Data Integration and BI. Top Use Cases for Hadoop*: 1. "mine data for improved business intelligence" 2. "reducing cost of data analysis" 3. "log analysis". Top Challenges with Hadoop*: 1. Steep technical learning curve 2. Hiring qualified people 3. Availability of appropriate products and tools. Unfortunately, Hadoop was not designed specifically for ETL and BI use cases: It's not a database; High latency queries and jobs not ideal for all BI use cases; Skill set mismatch for traditional ETL users and BI solution architects. *Based on a survey of 100+ Hadoop users conducted by Karmasphere, Sept. 2010.
  • 6. ESTABLISHING AN ARCHITECTURE FOR BIG DATA
  • 7. Example Use Cases Today. Transactional: • Fraud detection • Financial services/stock markets. Sub-Transactional: • Weblogs • Social/online media • Telecoms events
  • 8. Example Use Cases Today. Non-Transactional: • Web pages, blogs etc. • Documents • Physical events • Application events • Machine events. In most cases structured or semi-structured.
  • 9. Traditional Business Intelligence (BI). [Diagram labels: Data Source; Data Mart(s); Tape/Trash; question marks]
  • 10. Data Lake • Single source • Large volume • Not distilled • Typically no more than 0-2 lakes per company • Known and unknown questions • Multiple user communities • Don't fit in traditional RDBMS with a reasonable cost
  • 11. Data Lake Requirements • Store all the data • Satisfy routine reporting and analysis • Satisfy ad-hoc query / analysis / reporting • Balance performance and cost
  • 12. What if... [Diagram labels: Data Mart(s); Ad-Hoc; Data Warehouse; Data Lake(s); Data Source]
  • 13. Big Data Does Not Replace Data Marts. It's not a database; High latency; Optimized for massive data-crunching; Big Data databases are immature; Databases are no-SQL.
  • 14. What Hadoop Really Is... Core Components. HDFS: a distributed file system allowing massive storage across a cluster of commodity servers. MapReduce: framework for distributed computation; common use cases include aggregating, sorting, and filtering BIG data sets. Problem is broken up into small fragments of work that can be computed or recomputed in isolation on any node of the cluster.
  • 15. What Hadoop Really Is... Related Projects. Hive: a data warehouse infrastructure on top of Hadoop. Implements a SQL-like query language, including a JDBC driver; Allows MapReduce developers to plug in custom mappers and reducers. HBase: the Hadoop database (AHA!). A variant of NoSQL databases, problematic for traditional BI; Best at storing large amounts of unstructured data.
  • 16. Hadoop and BI? Distributed processing; Distributed file system; Commodity hardware; Platform independent (in theory); Scales out beyond technology and/or economy of an RDBMS; In many cases it's the only viable solution.
  • 17. Hadoop and BI? 90% of new Hadoop use cases are transformation of semi/structured data* (* of those companies we've talked to...)
  • 18. Hadoop and BI? "The working conditions within Hadoop are shocking" (ETL Developer)
  • 19. Hadoop and BI? Instead of this...
  • 20. Hadoop and BI? You have to do this in Java... public void map(Text key, Text value, OutputCollector output, Reporter reporter) public void reduce(Text key, Iterator values, OutputCollector output, Reporter reporter)
  • 21. People don't use Hadoop for BI because they want to...
  • 22. ...they do it because they have to...
  • 23. ... and unfortunately it wasn't designed for most BI requirements
  • 24. Why not add to Hadoop the things it's missing...
  • 25. ... until it can do what we need it to?
  • 26. If only we had a Java, embeddable, data transformation engine...
  • 27. A Data Integration Engine for Hadoop. [Diagram labels: Data Marts, Data Warehouse, Analytical Applications; Design; Deploy; Orchestrate; Hadoop; Data Integration Engine (shown three times)]
  • 28. [Diagram labels: Visualize; Reporting / Dashboards / Analysis; Web Tier; DM & DW; RDBMS; Optimize; Hive; Hadoop Files / HDFS; Load; Applications & Systems]
  • 29. [Diagram labels: Reporting / Dashboards / Analysis; Web Tier; DM & DW; RDBMS; Metadata; Hive; Hadoop Files / HDFS; Applications & Systems]
  • 30. [Diagram labels: Data Mart(s); Ad-Hoc; Data Warehouse; Data Lake(s); Data Source]
  • 31. [Diagram labels: Reporting / Dashboards / Analysis; Web Tier; RDBMS; Data Lake; Hadoop; Applications & Systems]
  • 32. Product Requirements for BI Against Hadoop. Lower technical barriers through graphical ETL environment for creating and managing Hadoop MapReduce jobs; Extreme ETL scalability through deployment across the Hadoop cluster; Easily spin off high performance data marts for interactive analysis; Easily integrate data from Hadoop with data from other sources; Provide end-to-end BI addressing common BI use cases with Hadoop, including reporting, ad hoc query and interactive analysis; Reduce costs through subscription-based pricing, reduced dependency on scarce technical resources, and easier maintainability. [Diagram labels: Interactive Analysis; Batch Reporting; Ad Hoc Query; Data Marts; Agile BI; Hive; Hadoop; Data Integration Jobs; Log Files; DBs and other sources]
  • 33. THE ROAD AHEAD © 2010, Pentaho. All Rights Reserved. www.pentaho.com. Worldwide: +1 (866) 660-7555 | Slide 33
  • 34. The Road Ahead. Other NoSQL Integration: Facilitate BI use cases on top of HBase, possibly others like MongoDB, Cassandra. Streaming Data Source Support: In support of near-realtime use cases; Long/always running data processing jobs. Contiguous Meta-data: Data Lineage and Impact Analysis covering the entire big data architecture. The End of MapReduce (… as a concept ETL users need to understand): Push down optimization of Transformations that generate native MapReduce tasks in Hadoop.
  • 35. Hadoop Distro Wars The Apache Software Foundation © 2010, Pentaho. All Rights Reserved. www.pentaho.com. Worldwide: +1 (866) 660-7555 | Slide 35
  • 36. Tools That Make Hadoop Easier, e.g. Apache Pig. Pig is a platform for analyzing large data sets; Produces sequences of MapReduce programs; Integrate Pig scripts into enterprise data integration workflows, e.g. 1. Submit and monitor a series of Pig and MapReduce jobs 2. Process a database bulk load step to ready data for ad-hoc analysis or report bursting.
  • 37. Growth in Adoption of Other NoSQL Big Data Platforms. HBase: the Hadoop database; mongoDB: scalable, high-performance, document-oriented database; LexisNexis HPCC: a data intensive computing system platform; Many others.
  • 38. Summary. Hadoop and other Big Data NoSQL platforms: Great at storing and processing large diverse data volumes; Not designed for Business Intelligence. Choosing the right BI technology can unlock your Big Data to drive actionable insights: Graphical user interfaces; Scalable; Spin-off data marts; Integrate data into data warehouses; Integrated dashboards, reporting, data analysis, data integration.
  • 39. Thank You! ifyfe@pentaho.com
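
The map and reduce steps described on slides 14 and 20 can be illustrated with a minimal, in-memory sketch. This is not the Hadoop API and not Pentaho's implementation: the class name `MapReduceSketch`, the weblog line format ("page status"), and the single-JVM "shuffle" are assumptions made purely for illustration. It only shows how a problem is split into small fragments (one log line each) that are mapped independently, grouped by key, and then reduced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of the MapReduce idea; no Hadoop classes are used.
public class MapReduceSketch {

    // Map step: each log line is processed in isolation and emits one (page, 1) pair.
    // Assumed line format: "<page> <status>", e.g. "/home 200".
    static Map.Entry<String, Integer> map(String logLine) {
        String page = logLine.split(" ")[0];
        return Map.entry(page, 1);
    }

    // Reduce step: all values emitted for one key are summed into a single count.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group the pairs by key (the "shuffle"), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            totals.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("/home 200", "/cart 200", "/home 404");
        System.out.println(run(lines)); // prints {/cart=1, /home=2}
    }
}
```

On a real cluster the grouping step is performed by the framework across nodes, which is why each map call must not depend on any other line. The sketch keeps everything in one process so that the division of work is easy to follow.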