SlideShare a Scribd company logo
1 of 60
Download to read offline
Big Business Value from
Big Data and Hadoop




© Hortonworks Inc. 2012   Page 1
Topics
•  The Big Data Explosion: Hype or Reality
•  Introduction to Apache Hadoop
•  The Business Case for Big Data




                                             Page 2
        © Hortonworks Inc. 2012
Big Data: Hype or Reality?




© Hortonworks Inc. 2012      Page 3
What is Big Data?


              What is Big Data?




                                  Page 4
   © Hortonworks Inc. 2012
Big Data: Changing The Game for Organizations

                                                                     Transactions + Interactions
Petabytes
                  BIG DATA                       Mobile Web                  + Observations
                                                 Sentiment

                                                  User Click Stream
                                                                    SMS/MMS
                                                                                   = BIG DATA
                                                                         Speech to Text

                                                                Social Interactions & Feeds
  Terabytes       WEB                Web logs
                                                                         Spatial & GPS Coordinates
                                         A/B testing
                                                                                Sensors / RFID / Devices
                                                  Behavioral Targeting
   Gigabytes      CRM                                                                   Business Data Feeds
                                                             Dynamic Pricing
                                     Segmentation                                             External Demographics
                                                                    Search Marketing
                                         Customer Touches                                      User Generated Content
                  ERP
   Megabytes                                                           Affiliate Networks
                   Purchase detail              Support Contacts                                  HD Video, Audio, Images
                                                                         Dynamic Funnels
                   Purchase record
                                                    Offer details          Offer history            Product/Service Logs
                   Payment record



                                                  Increasing Data Variety and Complexity


               © Hortonworks Inc. 2012
Next Generation Data Platform Drivers
Organizations will need to become more data driven to compete



 Business             •     Enable new business models & drive faster growth (20%+)
  Drivers             •     Find insights for competitive advantage & optimal returns



                      •     Data continues to grow exponentially
 Technical            •     Data is increasingly everywhere and in many formats
   Drivers            •     Legacy solutions unfit for new requirements growth



 Financial            •     Cost of data systems, as % of IT spend, continues to grow
   Drivers            •     Cost advantages of commodity hardware & open source




       © Hortonworks Inc. 2012
What is a Data Driven Business?
     •  DEFINITION
        Better use of available data in the decision making process

     •  RULE
        Key metrics derived from data should be tied to goals

     •  PROVEN RESULTS
        Firms that adopt Data-Driven Decision Making have output and
        productivity that is 5-6% higher than what would be expected
        given their investments and usage of information technology*



1110010100001010011101010100010010100100101001001000010010001001000001000100000100
0100100100010000101110000100100010001010010010111101010010001001001010010100100111
11001010010100011111010001001010000010010001010010111101010011001001010010001000111


        * “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” Brynjolfsson, Hitt and Kim (April 22, 2011)


                                                                                                                                            Page 7
                  © Hortonworks Inc. 2012
Path to Becoming Big Data Driven




4
   Key Considerations for a Data Driven Business
      1.     Large web properties were born this way, you may have to adapt a strategy
      2.     Start with a project tied to a key objective or KPI – Don’t OVER engineer
      3.     Make sure your Big Data strategy “fits” your organization and grow it over time
      4.     Don’t do big data just to do big data – you can get lost in all that data




       “Simply put, because of big data, managers can
   measure, and hence know, radically more about their
     businesses, and directly translate that knowledge
       into improved decision making & performance.”
                              - Erik Brynjolfsson and Andrew McAfee




                                                                                     Page 8
    © Hortonworks Inc. 2012
Wide Range of New Technologies

                              In Memory Storage
NoSQL
                             BigTable
 MPP                                    In Memory DB
Cloud                        Hadoop         HBase
                      Pig         Apache
 MapReduce                        YARN
                      Scale-out Storage
                                                    Page 9
   © Hortonworks Inc. 2012
A Few Definitions of New Technologies
• NoSQL
  – A broad range of new database management technologies mostly
    focused on low latency access to large amounts of data.
• MPP or “in database analytics”
  – Widely used approach in data warehouses that incorporates
    analytic algorithms with the data and then distribute processing
• Cloud
  – The use of computing resources that are delivered as a service
    over a network
• Hadoop
  – Open source data management software that combines massively
    parallel computing and highly-scalable distributed storage



                                                                       Page 10
      © Hortonworks Inc. 2012
Big Data: It’s About Scale & Structure

                                 SCALE (storage & processing)


                                         STRUCTURE


    Traditional                             MPP                           Hadoop
                               EDW                       NoSQL
    Database                              Analytics                   Distribution


   Standards and structured              governance                 Loosely structured


   Limited, no data processing           processing         Processing coupled w/ data


   Structured                             data types             Multi and unstructured




    A JDBC connector to Hive does not make you “big data”



                                                                                          Page 11
     © Hortonworks Inc. 2012
Hadoop Market: Growing & Evolving
•  Big data outranks virtualization
   as #1 trend driving spending                            Industry
                                                                         Unstructured
                                                                          Data, $6
   initiatives                                           Applications,                             Hadoop, $14
                                                             $12
   –  Barclays CIO survey April 2012
                                                                                                            NoSQL, $2

                                                 Enterprise
•  Overall market at $100B                     Applications, $5

   –  Hadoop 2nd only to RDBMS in potential
                                               Storage for DB,
                                                     $9

•  Estimates put market growth
   > 40% CAGR                                                                                               RDBMS, $27
                                                     Server for DB,
   –  IDC expects Big Data tech and services             $12
      market to grow to $16.9B in 2015                      Data Integration                Business
   –  According to JPMC 50% of Big Data                      & Quality, $4              Intelligence, $13
      market will be influenced by Hadoop

                                                 Source: Big Data II – New Themes, New Opportunities,
                                                            BofA Merrill Lynch, March 2012




                                                                                                               Page 12
         © Hortonworks Inc. 2012
Hadoop: Poised for Rapid Growth
                                                                                            •  Hadoop is in the chasm
                                                                                            •  Ecosystem evolving
customers
 relative %




                                                                                            •  Enterprises endure 1-3
                                                                                               year adoption cycle




                                              The CHASM
          Innovators,              Early                     Early
                                                                           Late majority,            Laggards,
          technology             adopters,                  majority,
                                                                           conservatives              Skeptics
          enthusiasts           visionaries               pragmatists




                                                                                                                          time
                  Customers want                                            Customers want
              technology & performance                                  solutions & convenience

                                                                                             Source: Geoffrey Moore - Crossing the Chasm



                                                                                                                                 Page 13
                 © Hortonworks Inc. 2012
What’s Needed to Drive Success?
•  Enterprise tooling to become
   a complete data platform
   –  Open deployment & provisioning
   –  Higher quality data loading
   –  Monitoring and management
   –  APIs for easy integration
                                                        www.hortonworks.com/moore

•  Ecosystem needs support & development
   –  Existing infrastructure vendors need to continue to integrate
   –  Apps need to continue to be developed on this infrastructure



•  Market needs to rally around core Apache Hadoop
   –  To avoid further splintering/market distraction
   –  To accelerate adoption




                                                                                    Page 14
        © Hortonworks Inc. 2012
What is Hadoop?




© Hortonworks Inc. 2012   Page 15
What is Apache Hadoop?
Open source data management software that combines massively parallel computing
and highly-scalable distributed storage to provide high performance computation
across large amounts of data




                                                                                  Page 16
  © Hortonworks Inc. 2012
Big Data Driven Means a New Data Platform

• Facilitate data capture
  – Collect data from all sources - structured and unstructured data
  – At all speeds batch, asynchronous, streaming, real-time
• Comprehensive data processing
  – Transform, refine, aggregate, analyze, report
• Simplified data exchange
  – Interoperate with enterprise data systems for query and processing
  – Share data with analytic applications
  – Data between analyst
• Easy to operate platform
  – Provision, monitor, diagnose, manage at scale
  – Reliability, availability, affordability, scalability, interoperability


                                                                              Page 17
      © Hortonworks Inc. 2012
Enterprise Data Platform Requirements
A data platform is an integrated set of components that
allows you to capture, process and share data in any
format, at scale


          Capture
                                  •  Store and integrate data sources in real time and batch


                                •  Process and manage data of any size and includes tools to
     Process
                                   sort, filter, summarize and apply basic functions to the data


          Exchange                •  Open and promote the exchange of data
                                     with new & existing external applications

    Operate                 •  Includes tools to manage and operate and gain
                               insight into performance and assure availability


                                                                                                   Page 18


      © Hortonworks Inc. 2012
Apache Hadoop
Open Source data management with scale-out storage & processing




                      Processing                      Storage


    Map Reduce                        HDFS

    •    Distributed across “nodes”   •    Splits a task across processors
                                           “near” the data & assembles results
    •    Natively redundant
                                      •    Self-Healing, High Bandwidth
    •    Name node tracks locations        Clustered Storage




                                                                                 Page 19
         © Hortonworks Inc. 2012
Apache Hadoop Characteristics
• Scalable
   – Efficiently store and process petabytes of data
   – Linear scale driven by additional processing and storage
• Reliable
   – Redundant storage
   – Failover across nodes and racks
• Flexible
   – Store all types of data in any format
   – Apply schema on analysis and sharing of the data
• Economical
   – Use commodity hardware
   – Open source software guards against vendor lock-in


                                                                Page 20
       © Hortonworks Inc. 2012
Tale of the Tape

               Relational                      VS.          Hadoop
               Required on write              schema        Required on read

                     Reads are fast            speed        Writes are fast

   Standards and structured                  governance     Loosely structured

 Limited, no data processing                 processing     Processing coupled with data

                                Structured   data types     Multi and unstructured


  Interactive OLAP Analytics                                Data Discovery
 Complex ACID Transactions                   best fit use   Processing unstructured data
      Operational Data Store                                Massive Storage/Processing


                                                                                     Page 21
      © Hortonworks Inc. 2012
What is a Hadoop “Distribution”
                                 Talend    WebHDFS           Sqoop          Flume
A complimentary set
                                           HCatalog
of open source                                                         HBase
                                    Pig               Hive
technologies that                   MapReduce                        HDFS
make up a complete                Ambari              Oozie                 HA
data platform                                    ZooKeeper




•  Tested and pre-packaged to ease installation and usage
•  Collects the right versions of the components that all have different
   release cycles and ensures they work together




       © Hortonworks Inc. 2012
Apache Hadoop Related Projects
                                Talend       WebHDFS            Sqoop          Flume
                                            HCatalog
                                                                           HBase
       Capture                       Pig                 Hive
                                     MapReduce                          HDFS
                                   Ambari               Oozie                  HA
                                                    ZooKeeper
  Process

                              WebHDFS
      Exchange                •  REST API that supports the complete FileSystem
                                 interface for HDFS.
                              •  Move data in and out and delete from HDFS
                              •  Perform file and directory functions

  Operate                     •  webhdfs://<HOST>:<HTTP PORT>/PATH

                              •  Standard and included in version 1.0 of Hadoop



                                                                                       Page 23
    © Hortonworks Inc. 2012
Apache Hadoop Related Projects
                                   Talend       WebHDFS             Sqoop          Flume
                                              HCatalog
                                                                                 HBase
       Capture                         Pig                   Hive
                                       MapReduce                            HDFS
                                    Ambari                  Oozie                  HA
                                                        ZooKeeper
  Process

                              Apache Sqoop
      Exchange                •    Sqoop is a set of tools that allow non-Hadoop data
                                   stores to interact with traditional relational databases
                                   and data warehouses.
                              •    A series of connectors have been created to support
                                   explicit sources such as Oracle & Teradata
  Operate
                              •    It moves data in and out of Hadoop

                              •    SQ-OOP: SQL to Hadoop


                                                                                              Page 24
    © Hortonworks Inc. 2012
Apache Hadoop Related Projects
                                 Talend        WebHDFS             Sqoop           Flume
                                             HCatalog
                                                                                HBase
       Capture                        Pig                   Hive
                                     MapReduce                             HDFS
                                   Ambari                  Oozie                  HA
                                                       ZooKeeper
  Process

                              Apache Flume
      Exchange                •  Distributed service for efficiently collecting, aggregating,
                                 and moving streams of log data into HDFS
                              •  Streaming capability with many failover and recovery
                                 mechanisms

  Operate                     •  Often used to move web log files directly into Hadoop




                                                                                           Page 25
    © Hortonworks Inc. 2012
Apache Hadoop Related Projects
                                Talend       WebHDFS             Sqoop          Flume
                                            HCatalog
                                                                             HBase
       Capture                       Pig                  Hive
                                     MapReduce                           HDFS
                                   Ambari                Oozie                  HA
                                                     ZooKeeper
  Process

                              Apache HBase
      Exchange                •  HBase is a non-relational database. It is columnar and
                                 provides fault-tolerant storage and quick access to
                                 large quantities of sparse data. It also adds
                                 transactional capabilities to Hadoop, allowing users to
                                 conduct updates, inserts and deletes.
  Operate




                                                                                        Page 26
    © Hortonworks Inc. 2012
Apache Hadoop Related Projects
                                Talend        WebHDFS            Sqoop          Flume
                                            HCatalog
                                                                              HBase
       Capture                       Pig                  Hive
                                     MapReduce                           HDFS
                                   Ambari                Oozie                  HA
                                                      ZooKeeper
  Process

                              Apache Pig
      Exchange                •  Apache Pig allows you to write complex map reduce
                                 transformations using a simple scripting language. Pig
                                 latin (the language) defines a set of transformations on
                                 a data set such as aggregate, join and sort among
                                 others. Pig Latin is sometimes extended using UDF
  Operate                        (User Defined Functions), which the user can write in
                                 Java and then call directly from the language.




                                                                                        Page 27
    © Hortonworks Inc. 2012
Apache Hadoop Related Projects
                                Talend       WebHDFS             Sqoop          Flume
                                            HCatalog
                                                                             HBase
       Capture                       Pig                  Hive
                                     MapReduce                           HDFS
                                   Ambari                Oozie                  HA
                                                     ZooKeeper
  Process

                              Apache HCatalog
      Exchange                •  HCatalog is a metadata management service for
                                 Apache Hadoop. It opens up the platform and allows
                                 interoperability across data processing tools such as
                                 Pig, Map Reduce and Hive. It also provides a table
                                 abstraction so that users need not be concerned with
  Operate                        where or how their data is stored.




                                                                                         Page 28
    © Hortonworks Inc. 2012
Apache Hadoop Related Projects
                                Talend       WebHDFS             Sqoop          Flume
                                            HCatalog
                                                                             HBase
       Capture                       Pig                  Hive
                                     MapReduce                           HDFS
                                   Ambari                Oozie                  HA
                                                     ZooKeeper
  Process

                              Apache Hive
      Exchange                •  Apache Hive is a data warehouse infrastructure built
                                 on top of Hadoop (originally by Facebook) for providing
                                 data summarization, ad-hoc query, and analysis of
                                 large datasets. It provides a mechanism to project
                                 structure onto this data and query the data using a
  Operate                        SQL-like language called HiveQL (HQL).




                                                                                        Page 29
    © Hortonworks Inc. 2012
Apache Hadoop Related Projects
                                Talend        WebHDFS             Sqoop          Flume
                                            HCatalog
                                                                              HBase
       Capture                       Pig                   Hive
                                     MapReduce                            HDFS
                                   Ambari                Oozie                   HA
                                                     ZooKeeper
  Process

                              Apache Ambari
      Exchange                •  Ambari is a monitoring, administration and lifecycle
                                 management project for Apache Hadoop clusters
                              •  It provides a mechanism to provisions nodes

                              •  Operationalizes Hadoop.
  Operate




                                                                                         Page 30
    © Hortonworks Inc. 2012
Apache Hadoop Related Projects
                                Talend        WebHDFS            Sqoop          Flume
                                             HCatalog
                                                                              HBase
       Capture                       Pig                  Hive
                                     MapReduce                           HDFS
                                   Ambari                 Oozie                 HA
                                                      ZooKeeper
  Process

                              Apache Oozie
      Exchange                •  Oozie coordinates jobs written in multiple languages
                                 such as Map Reduce, Pig and Hive. It is a workflow
                                 system that links these jobs and allows specification of
                                 order and dependencies between them.

  Operate




                                                                                        Page 31
    © Hortonworks Inc. 2012
Apache Hadoop Related Projects
                                Talend        WebHDFS            Sqoop          Flume
                                            HCatalog
                                                                              HBase
       Capture                       Pig                  Hive
                                     MapReduce                           HDFS
                                   Ambari                Oozie                  HA
                                                     ZooKeeper
  Process

                              Apache ZooKeeper
      Exchange                •  ZooKeeper is a centralized service for maintaining
                                 configuration information, naming, providing distributed
                                 synchronization, and providing group services.


  Operate




                                                                                        Page 32
    © Hortonworks Inc. 2012
Using Hadoop “out of the box”
1.  Provision your cluster w/ Ambari
2.  Load your data into HDFS
  – WebHDFS, distcp, Java
3.  Express queries in Pig and Hive
  – Shell or scripts
4.  Write some UDFs and MR-jobs by hand
5.  Graduate to using data scientists
6.  Scale your cluster and your analytics
  – Integrate to the rest of your enterprise data silos
    (TOPIC OF TODAY’S DISCUSSION)




      © Hortonworks Inc. 2012
Hadoop in Enterprise Data Architectures
    Existing Business Infrastructure                                                 Web                      New Tech

                                                                                                                   Datameer
                                                                                                                    Tableau
                                                                                                                  Karmasphere
   IDE &          ODS &             Applications &   Visualization &                  Web                            Splunk
  Dev Tools      Datamarts          Spreadsheets       Intelligence                Applications


                                                                                                                                 Operations

                      Discovery                                                 Low Latency/
                        Tools                         EDW
                                                                                  NoSQL
                                                                                                                                 Custom   Existing



                                                              Templeton        WebHDFS             Sqoop            Flume
                                                                              HCatalog
                                                                                                                  HBase
                                                                       Pig                 Hive
                                                                       MapReduce                           HDFS
                                                                     Ambari                Oozie                    HA
                                                                                       ZooKeeper




                                                            Social               Exhaust                   logs          files
       CRM           ERP             financials             Media                 Data


                                                  Big Data Sources
                                      (transactions, observations, interactions)



                                                                                                                                          Page 34
          © Hortonworks Inc. 2012
The Business Case for
Hadoop




© Hortonworks Inc. 2012   Page 35
Apache Hadoop: Patterns of Use

                                           Big Data
                             Transactions, Interactions, Observations




                             Refine         Explore          Enrich




                                  Business Case

                                                                        Page 36
   © Hortonworks Inc. 2012
Operational Data Refinery
Hadoop as platform for ETL modernization
                                                                             Refine   Explore       Enrich



Unstructured     Log files           DB data   Capture and Archive
                                               •  Capture new unstructured data along with
                                                  logfiles all alongside existing sources
                                               •  Retain inputs in raw form for audit and
         Capture and archive
                                                  continuity purposes
           Parse & Cleanse
                                               Transform, change and refine
          Structure and join                   •  Parse the data & cleanse
                 Upload                        •  Apply structure and definition
                              Refinery
                                               •  Join datasets together across disparate data
                                                  sources
                                               Upload
                                               •  Push to existing data warehouse for
                                                  downstream consumption
              Enterprise
                                               •  Feeds operational reporting and online systems
           Data Warehouse



                                                                                                Page 37
           © Hortonworks Inc. 2012
“Big Bank” Service 2 Users with 1 Pipeline
                                                                                     Empower scientists and systems
various upstream apps and feeds                                                      - Best of breed exploration platforms
- EBCDIC                                                                             - plus existing TD warehouse
- weblogs
- relational feeds


                                                                                        Veritca              Aster
                                                                GPFS
                              10k files per day



                                                                                        Palantir            Teradata


                              transform to Hcatalog datatypes




                                                  1         .          .       .



                                                  .         .          .       .
                 3 - 5 years retention
                                                                                       Detect only changed records
                                                                                           and upload to EDW
                                                  .         .          .       n


                                                 HDFS (scalable-clustered storage)



           © Hortonworks Inc. 2012
“Big Bank” Key Benefits
• Capture and archive
  – Retain 3 – 5 years instead of 2 – 10 days
  – Lower costs
  – Improved compliance
• Transform, change, refine
  – Turn upstream raw dumps into small list of “new, update, delete”
    customer records
  – Convert fixed-width EBCDIC to UTF-8 (Java and DB compatible)
  – Turn raw weblogs into sessions and behaviors
• Upload
  – Insert into Teradata for downstream “as-is” reporting and tools
  – Insert into new exploration platform for scientists to play with



      © Hortonworks Inc. 2012
Big Data Exploration & Visualization
Hadoop as agile, ad-hoc data mart
                                                                             Refine   Explore       Enrich



Unstructured     Log files           DB data   Capture and archive
                                               •  Same as in “refine”
                                               Structure and join
         Capture and archive                   •  Parse the data into queryable format
          Structure and join                   •  Exploratory analysis using Hive and Pig to
                                                  discover value and lower complexity
         Categorize into tables
                                               Categorize into tables
      upload              JDBC / ODBC          •  Label data and type information for
                                     Explore      compatibility and later discovery
                                               •  Pre-compute stats, groupings, patterns in data
                                                  to accelerate analysis
                                               Visualize
                                               •  See key insights in existing powerful tools
                              Visualization    •  Move actionable insights into reports and marts
  Datamarts                       Tools



                                                                                                Page 40
           © Hortonworks Inc. 2012
“Hardware Manufacturer” Unlocking
Customer Survey Data

                                                                          product satisfaction
                                                                        and customer feedback
                                                                         analysis via Tableau




                                                    Searchable text fields
                                                      (Lucene index)

                                                  Multiple choice survey data


                                                        weblog traffic




                                              1           .         .           .



                                              .           .         .           .
                              Existing
                        Customer Service DB

                                              .           .         .           n


                                                       Hadoop cluster


    © Hortonworks Inc. 2012
“Hardware Manufacturer” Key Benefits
• Capture and archive
  – Store 10+ survey forms/year for > 3 years
  – Capture text, audio, and systems data in one platform
• Structure and join
  – Unlock freeform text and audio data
  – Un-anonymize customers
• Categorize into tables
  – Create HCatalog tables “customer”, “survey”, “freeform text”
• Upload, JDBC
  – Visualize natural satisfaction levels and groups
  – Tag customers as “happy” and report back to CRM database




      © Hortonworks Inc. 2012
Application Enrichment
Deliver Hadoop analysis to online apps
                                                                                  Refine   Explore       Enrich



Unstructured      Log files          DB data   Capture
                                               •  Capture data that was once
                                                  too bulky and unmanageable
      Capture
                          Enrich
       Parse
                                               Derive Useful Information
    Derive/Filter                              •    Uncover aggregate characteristics across data
                           Scheduled &
                           near real time      •    Use Hive Pig and Map Reduce to identify patterns
   NoSQL, HBase                                •    Filter useful data from mass streams (Pig)
    Low Latency
                                               •    Micro or macro batch oriented schedules

                                               Share
                                               •  Push results to HBase or other NoSQL alternative
                                                  for real time delivery
     Online
                                               •  Use patterns to deliver right content/offer to the
   Applications                                   right person at the right time


                                                                                                     Page 43
           © Hortonworks Inc. 2012
“Clothing Retailer” Custom Homepages
 anonymous                                             cookied
  shopper                                              shopper




                   Default                  Custom
                                                                 HBase custom homepage data
                  Homepage                 Homepage




                                                                 1            .         .            .



                                  Retail                         .            .         .            .
                                 website

                                                                 .            .         .            n


                                                                     Hadoop for "cluster analysis"
                                             weblogs
              Sales
             database




       © Hortonworks Inc. 2012
“Clothing Retailer” Key Benefits
• Capture
  – Capture weblogs together with sales order history, customer
    master
• Derive useful information
  – Compute relationships between products over time
      – “people who buy shirts eventually need pants”
  – Score customer web behavior / sentiment
  – Connect product recommendations to customer sentiment
• Share
  – Load customer recommendations into HBase for rapid website
    service




     © Hortonworks Inc. 2012
Business Cases of Hadoop

   Vertical Refine                                  Explore                            Enrich
                                                                                       •  Dynamic Pricing
                    •  Log Analysis/Site
 Retail & Web                                       •  Social Network Analysis         •  Session & Content
                       Optimization
                                                                                          Optimization

                    •  Loyalty Program                                                 •  Dynamic Pricing/Targeted
       Retail                                       •  Brand and Sentiment Analysis
                       Optimization                                                       Offer


  Intelligence      •  Threat Identification         •  Person of Interest Discovery    •  Cross Jurisdiction Queries


                    •  Risk Modeling & Fraud
                                                    •  Surveillance and Fraud
                       Identification                                                  •  Real-time upsell, cross sales
     Finance        •  Trade Performance
                                                       Detection
                                                                                          marketing offers
                                                    •  Customer Risk Analysis
                       Analytics

                    •  Smart Grid: Production       •  Grid Failure Prevention
      Energy                                                                           •  Individual Power Grid
                       Optimization                 •  Smart Meters


                                                                                       •  Dynamic Delivery
Manufacturing       •  Supply Chain Optimization    •  Customer Churn Analysis
                                                                                       •  Replacement parts

 Healthcare &       •  Electronic Medical Records   •  Clinical Trials Analysis        •  Insurance Premium
        Payer          (EMPI)                                                             Determination




         © Hortonworks Inc. 2012
Goal: Optimize Outcomes at Scale
                     Media     optimize                 Content
       Intelligence            optimize                 Detection
                Finance        optimize                 Algorithms
       Advertising             optimize                 Performance
                     Fraud     optimize                 Prevention
Retail / Wholesale             optimize                 Inventory turns
   Manufacturing               optimize                 Supply chains
         Healthcare            optimize                 Patient outcomes
           Education           optimize                 Learning outcomes
     Government                optimize                 Citizen services
                                     Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation.

     © Hortonworks Inc. 2012
Hortonworks Data Platform




© Hortonworks Inc. 2012     Page 48
Hortonworks Vision & Role

                                We believe that by the end of 2015,
                                more than half the world's data will be
                                processed by Apache Hadoop.



  1       Be diligent stewards of the open source core

  2       Be tireless innovators beyond the core

  3       Provide robust data platform services & open APIs

  4       Enable the ecosystem at each layer of the stack

  5       Make the platform enterprise-ready & easy to use


      © Hortonworks Inc. 2012
Hadoop Emerging as Enterprise Big Data Platform



  Applications,                                                              Installation & Configuration,
  Business Tools,                                                            Administration,
  Development Tools,                                                         Monitoring,
  Open APIs and access                                                       High Availability,
  Data Movement & Integration,                                               Replication,
  Data Management Systems,                                                   Multi-tenancy, ..
  Systems Management
                                             Hortonworks
                                             Data Platform

                                         DEVELOPER
                                  Data Platform Services & Open APIs

                                     Metadata, Indexing, Search, Security,
                                    Management, Data Extract & Load, APIs




        © Hortonworks Inc. 2012
Delivering on the Promise of Apache Hadoop




      Trusted                                Open                   Innovative

•  Stewards of the core            •    100% open platform      •  Innovating current platform
•  Original builders and           •    No POS holdback            with HCatalog, Ambari, HA
   operators of Hadoop             •    Open to the community   •  Innovating future platform
•  100+ years development          •    Open to the ecosystem      with YARN, HA
   experience                                                   •  Complete vision for
                                   •    Closely aligned to
•  Managed every viable                 Apache Hadoop core         Hadoop-based platform
   Hadoop release
•  HDP built on Hadoop 1.0




         © Hortonworks Inc. 2012
Hortonworks Apache Hadoop Expertise

Hortonworks represent more than
                                                         Hortonworkers
90 years combined Hadoop
                                                         are the builders,
development experience
                                                         operators and
                                                         core architects of
8 of top 15 highest lines of code*                       Apache Hadoop
contributors to Hadoop of all time
                               …and 3 of top 5


     “Hortonworks’ engineers spend all of their time improving open
      source Apache Hadoop. Other Hadoop engineers are going to spend
      the bulk of their time working on their proprietary software”

     “We have noticed more activity over the last year from Hortonworks’
      engineers on building out Apache Hadoop’s more innovative
      features. These include YARN, Ambari and HCatalog..”


     © Hortonworks Inc. 2012
Hortonworks Data Platform

                                                                                                                                                                                                                     •  Simplify deployment to get
                                                                                        Non-Relational Database
Management & Monitoring Svcs




                                                                                                                            Scripting                Query
                                                                                                                              (Pig)                  (Hive)                                                             started quickly and easily




                                                                                                                                                              Data Integration Services
                                                                                                                                                                                          (HCatalog APIs, WebHDFS,
                                                     Workflow & Scheduling




                                                                                                                                                                                           Talend, Sqoop, FlumeNG)
                               (Ambari, Zookeeper)




                                                                                                                                        Metadata
                                                                                                                  (HBase)

                                                                                                                                                                                                                     •  Monitor, manage any size cluster
                                                                                                                                        Services
                                                                                                                                                                                                                        with familiar console and tools
                                                                             (Oozie)




                                                                                                                                        (HCatalog)

                                                                                                                              Distributed Processing                                                                 •  Only platform to include data
                                                                                                                                      (MapReduce)
                                                                                                                                                                                                                        integration services to interact
                                                                                                                               1                                                                                        with any data source
                                                                                                                            Distributed Storage
                                                                                                                                   (HDFS)
                                                                                                                                                                                                                     •  Metadata services opens the
                                                                                                                                                                                                                        platform for integration with
                                                                                       Hortonworks Data Platform                                                                                                        existing applications
                                                        Delivers enterprise grade functionality on a proven
                                                        Apache Hadoop distribution to ease management,                                                                                                               •  Dependable high availability
                                                       simplify use and ease integration into the enterprise                                                                                                            architecture




                                   The only 100% open source data platform for Apache Hadoop

                                                                                © Hortonworks Inc. 2012
Demo




© Hortonworks Inc. 2012   Page 54
Why Hortonworks Data Platform?

ONLY Hortonworks Data Platform provides…
•  Tightly aligned to core Apache Hadoop development line
   - Reduces risk for customers who may add custom coding or projects

•  Enterprise Integration
  - HCatalog provides scalable, extensible integration point to Hadoop data

•  Most reliable Hadoop distribution
  - Full stack high availability on v1 delivers the strongest SLA guarantees

•  Multi-tenant scheduling and resource management
  - Capacity and fair scheduling optimizes cluster resources

•  Integration with operations, eases cluster management
  - Ambari is the most open/complete operations platform for Hadoop clusters




        © Hortonworks Inc. 2012
Hortonworks Growing Partner Ecosystem




                                        Page 56
    © Hortonworks Inc. 2012
Hortonworks Support Subscriptions
Objective: help organizations to successfully develop
and deploy solutions based upon Apache Hadoop
• Full-lifecycle technical support available
  – Developer support for design, development and POCs
  – Production support for staging and production environments
       – Up to 24x7 with 1-hour response times

• Delivered by the Apache Hadoop experts
  – Backed by development team that has released every major
    version of Apache Hadoop since 0.1

• Forward-compatibility
  – Hortonworks’ leadership role helps ensure bug fixes and patches
    can be included in future versions of Hadoop projects



                                                                 Page 57
      © Hortonworks Inc. 2012
Hortonworks Training
                         The expert source for
                         Apache Hadoop training & certification

Role-based training for developers,
administrators & data analysts
  –  Coursework built and maintained by the core Apache Hadoop development team.
  –  The “right” course, with the most extensive and realistic hands-on materials
  –  Provide an immersive experience into real-world Hadoop scenarios
  –  Public and Private courses available


Comprehensive Apache Hadoop Certification
  –  Become a trusted and valuable
     Apache Hadoop expert




                                                                              Page 58
      © Hortonworks Inc. 2012
What next?

1                                 Download Hortonworks Data Platform
                                  hortonworks.com/download




2   Use the getting started guide
    hortonworks.com/get-started



3   Learn more… get support

                                                             Hortonworks Support
       •  Expert role based training                         •  Full lifecycle technical support
       •  Course for admins, developers                         across four service levels
          and operators                                      •  Delivered by Apache Hadoop
       •  Certification program                                 Experts/Committers
       •  Custom onsite options                              •  Forward-compatible
        hortonworks.com/training                             hortonworks.com/support


                                                                                                   Page 59
        © Hortonworks Inc. 2012
Q&A




© Hortonworks Inc. 2012   Page 60

More Related Content

What's hot

Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudHortonworks
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalHortonworks
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalDiego Alberto Tamayo
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 
Data Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopData Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopHortonworks
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarHortonworks
 
Democratizing Big Data with Microsoft Azure HDInsight
Democratizing Big Data with Microsoft Azure HDInsightDemocratizing Big Data with Microsoft Azure HDInsight
Democratizing Big Data with Microsoft Azure HDInsightHortonworks
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageHortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Hortonworks
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicDataWorks Summit
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHortonworks
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsHortonworks
 
Bigger Data For Your Budget
Bigger Data For Your BudgetBigger Data For Your Budget
Bigger Data For Your BudgetHortonworks
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureDataWorks Summit
 

What's hot (20)

Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open Cloud
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_final
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
Data Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopData Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache Hadoop
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 
Democratizing Big Data with Microsoft Azure HDInsight
Democratizing Big Data with Microsoft Azure HDInsightDemocratizing Big Data with Microsoft Azure HDInsight
Democratizing Big Data with Microsoft Azure HDInsight
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble Storage
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for Windows
 
Bigger Data For Your Budget
Bigger Data For Your BudgetBigger Data For Your Budget
Bigger Data For Your Budget
 
Hadoop Trends
Hadoop TrendsHadoop Trends
Hadoop Trends
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and Future
 

Viewers also liked

How to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the CloudHow to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the CloudAttunity
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
Task magic 3
Task magic 3Task magic 3
Task magic 3rvhstl
 
Attunity Solutions for Teradata
Attunity Solutions for TeradataAttunity Solutions for Teradata
Attunity Solutions for TeradataAttunity
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Hortonworks
 
Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data ApplicationsRichard McDougall
 
Online hotel booking application - Design Process
Online hotel booking application - Design ProcessOnline hotel booking application - Design Process
Online hotel booking application - Design Processchayapathi sarath
 
Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop
Partners 2013 LinkedIn Use Cases for Teradata Connectors for HadoopPartners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop
Partners 2013 LinkedIn Use Cases for Teradata Connectors for HadoopEric Sun
 
SQL - Parallel Data Warehouse (PDW)
SQL - Parallel Data Warehouse (PDW)SQL - Parallel Data Warehouse (PDW)
SQL - Parallel Data Warehouse (PDW) Karan Gulati
 
Equinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journeyEquinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journeyPraveen Kumar
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
 
Streaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For ScaleStreaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For ScaleHelena Edelson
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Confluent & Attunity: Mainframe Data Modern Analytics
Confluent & Attunity: Mainframe Data Modern AnalyticsConfluent & Attunity: Mainframe Data Modern Analytics
Confluent & Attunity: Mainframe Data Modern Analyticsconfluent
 
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...Amazon Web Services
 
TechConnectr's Big Data Connection. Digital Marketing KPIs, Targeting, Analy...
TechConnectr's Big Data Connection.  Digital Marketing KPIs, Targeting, Analy...TechConnectr's Big Data Connection.  Digital Marketing KPIs, Targeting, Analy...
TechConnectr's Big Data Connection. Digital Marketing KPIs, Targeting, Analy...Bob Samuels
 
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Helena Edelson
 
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...SoftServe
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics ArchitectureArvind Sathi
 

Viewers also liked (20)

How to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the CloudHow to Operationalise Real-Time Hadoop in the Cloud
How to Operationalise Real-Time Hadoop in the Cloud
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Task magic 3
Task magic 3Task magic 3
Task magic 3
 
Attunity Solutions for Teradata
Attunity Solutions for TeradataAttunity Solutions for Teradata
Attunity Solutions for Teradata
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
 
Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data Applications
 
Online hotel booking application - Design Process
Online hotel booking application - Design ProcessOnline hotel booking application - Design Process
Online hotel booking application - Design Process
 
Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop
Partners 2013 LinkedIn Use Cases for Teradata Connectors for HadoopPartners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop
Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop
 
SQL - Parallel Data Warehouse (PDW)
SQL - Parallel Data Warehouse (PDW)SQL - Parallel Data Warehouse (PDW)
SQL - Parallel Data Warehouse (PDW)
 
Equinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journeyEquinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journey
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
 
Streaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For ScaleStreaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For Scale
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Confluent & Attunity: Mainframe Data Modern Analytics
Confluent & Attunity: Mainframe Data Modern AnalyticsConfluent & Attunity: Mainframe Data Modern Analytics
Confluent & Attunity: Mainframe Data Modern Analytics
 
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
 
TechConnectr's Big Data Connection. Digital Marketing KPIs, Targeting, Analy...
TechConnectr's Big Data Connection.  Digital Marketing KPIs, Targeting, Analy...TechConnectr's Big Data Connection.  Digital Marketing KPIs, Targeting, Analy...
TechConnectr's Big Data Connection. Digital Marketing KPIs, Targeting, Analy...
 
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
 
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics Architecture
 

Similar to Hortonworks roadshow

Big Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightBig Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightHortonworks
 
Hadoop: What It Is and What It's Not
Hadoop: What It Is and What It's NotHadoop: What It Is and What It's Not
Hadoop: What It Is and What It's NotInside Analysis
 
Talend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformTalend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformHortonworks
 
Break Through the Traditional Advertisement Services with Big Data and Apache...
Break Through the Traditional Advertisement Services with Big Data and Apache...Break Through the Traditional Advertisement Services with Big Data and Apache...
Break Through the Traditional Advertisement Services with Big Data and Apache...Hortonworks
 
Powering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopPowering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopHortonworks
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisOW2
 
Tackling big data with hadoop and open source integration
Tackling big data with hadoop and open source integrationTackling big data with hadoop and open source integration
Tackling big data with hadoop and open source integrationDataWorks Summit
 
Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessTeradata Aster
 
OSC2012: Big Data Using Open Source: Netapp Project - Technical
OSC2012: Big Data Using Open Source: Netapp Project - TechnicalOSC2012: Big Data Using Open Source: Netapp Project - Technical
OSC2012: Big Data Using Open Source: Netapp Project - TechnicalAccenture the Netherlands
 
Hadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesHadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesDataWorks Summit
 
Big data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantBig data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantStuart Miniman
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondTeradata Aster
 
When Worlds Collide: Intelligence, Analytics and Operations
When Worlds Collide: Intelligence, Analytics and OperationsWhen Worlds Collide: Intelligence, Analytics and Operations
When Worlds Collide: Intelligence, Analytics and OperationsInside Analysis
 
Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forumbigdatawf
 
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel
A Strategic View of Enterprise Reporting and Analytics: The Data FunnelA Strategic View of Enterprise Reporting and Analytics: The Data Funnel
A Strategic View of Enterprise Reporting and Analytics: The Data FunnelInside Analysis
 
The Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information ArchitectureThe Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information ArchitectureInside Analysis
 
Investigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxInvestigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxData Science London
 

Similar to Hortonworks roadshow (20)

Big Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightBig Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsight
 
Hadoop: What It Is and What It's Not
Hadoop: What It Is and What It's NotHadoop: What It Is and What It's Not
Hadoop: What It Is and What It's Not
 
2012 06 hortonworks paris hug
2012 06 hortonworks paris hug2012 06 hortonworks paris hug
2012 06 hortonworks paris hug
 
Talend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformTalend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data Platform
 
Break Through the Traditional Advertisement Services with Big Data and Apache...
Break Through the Traditional Advertisement Services with Big Data and Apache...Break Through the Traditional Advertisement Services with Big Data and Apache...
Break Through the Traditional Advertisement Services with Big Data and Apache...
 
vBACD July 2012 - Apache Hadoop, Now and Beyond
vBACD July 2012 - Apache Hadoop, Now and BeyondvBACD July 2012 - Apache Hadoop, Now and Beyond
vBACD July 2012 - Apache Hadoop, Now and Beyond
 
Powering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopPowering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache Hadoop
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
 
Tackling big data with hadoop and open source integration
Tackling big data with hadoop and open source integrationTackling big data with hadoop and open source integration
Tackling big data with hadoop and open source integration
 
Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the Business
 
OSC2012: Big Data Using Open Source: Netapp Project - Technical
OSC2012: Big Data Using Open Source: Netapp Project - TechnicalOSC2012: Big Data Using Open Source: Netapp Project - Technical
OSC2012: Big Data Using Open Source: Netapp Project - Technical
 
Hadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesHadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation Architectures
 
Big data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantBig data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You Want
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and Beyond
 
When Worlds Collide: Intelligence, Analytics and Operations
When Worlds Collide: Intelligence, Analytics and OperationsWhen Worlds Collide: Intelligence, Analytics and Operations
When Worlds Collide: Intelligence, Analytics and Operations
 
Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forum
 
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel
A Strategic View of Enterprise Reporting and Analytics: The Data FunnelA Strategic View of Enterprise Reporting and Analytics: The Data Funnel
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel
 
The Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information ArchitectureThe Comprehensive Approach: A Unified Information Architecture
The Comprehensive Approach: A Unified Information Architecture
 
Investigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxInvestigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists Toolbox
 

More from Accenture

Certify 2014trends-report
Certify 2014trends-reportCertify 2014trends-report
Certify 2014trends-reportAccenture
 
Calabrio analyze
Calabrio analyzeCalabrio analyze
Calabrio analyzeAccenture
 
Tier 2 net app baseline design standard revised nov 2011
Tier 2 net app baseline design standard   revised nov 2011Tier 2 net app baseline design standard   revised nov 2011
Tier 2 net app baseline design standard revised nov 2011Accenture
 
Perf stat windows
Perf stat windowsPerf stat windows
Perf stat windowsAccenture
 
Performance problems on ethernet networks when the e0m management interface i...
Performance problems on ethernet networks when the e0m management interface i...Performance problems on ethernet networks when the e0m management interface i...
Performance problems on ethernet networks when the e0m management interface i...Accenture
 
NetApp system installation workbook Spokane
NetApp system installation workbook SpokaneNetApp system installation workbook Spokane
NetApp system installation workbook SpokaneAccenture
 
Migrate volume in akfiler7
Migrate volume in akfiler7Migrate volume in akfiler7
Migrate volume in akfiler7Accenture
 
Migrate vol in akfiler7
Migrate vol in akfiler7Migrate vol in akfiler7
Migrate vol in akfiler7Accenture
 
Data storage requirements AK
Data storage requirements AKData storage requirements AK
Data storage requirements AKAccenture
 
C mode class
C mode classC mode class
C mode classAccenture
 
Akfiler upgrades providence july 2012
Akfiler upgrades providence july 2012Akfiler upgrades providence july 2012
Akfiler upgrades providence july 2012Accenture
 
Reporting demo
Reporting demoReporting demo
Reporting demoAccenture
 
Net app virtualization preso
Net app virtualization presoNet app virtualization preso
Net app virtualization presoAccenture
 
Providence net app upgrade plan PPMC
Providence net app upgrade plan PPMCProvidence net app upgrade plan PPMC
Providence net app upgrade plan PPMCAccenture
 
WSC Net App storage for windows challenges and solutions
WSC Net App storage for windows challenges and solutionsWSC Net App storage for windows challenges and solutions
WSC Net App storage for windows challenges and solutionsAccenture
 
50,000-seat_VMware_view_deployment
50,000-seat_VMware_view_deployment50,000-seat_VMware_view_deployment
50,000-seat_VMware_view_deploymentAccenture
 
Tr 3998 -deployment_guide_for_hosted_shared_desktops_and_on-demand_applicatio...
Tr 3998 -deployment_guide_for_hosted_shared_desktops_and_on-demand_applicatio...Tr 3998 -deployment_guide_for_hosted_shared_desktops_and_on-demand_applicatio...
Tr 3998 -deployment_guide_for_hosted_shared_desktops_and_on-demand_applicatio...Accenture
 
Tr 3749 -net_app_storage_best_practices_for_v_mware_vsphere,_dec_11
Tr 3749 -net_app_storage_best_practices_for_v_mware_vsphere,_dec_11Tr 3749 -net_app_storage_best_practices_for_v_mware_vsphere,_dec_11
Tr 3749 -net_app_storage_best_practices_for_v_mware_vsphere,_dec_11Accenture
 
Snap mirror source to tape to destination scenario
Snap mirror source to tape to destination scenarioSnap mirror source to tape to destination scenario
Snap mirror source to tape to destination scenarioAccenture
 

More from Accenture (20)

Certify 2014trends-report
Certify 2014trends-reportCertify 2014trends-report
Certify 2014trends-report
 
Calabrio analyze
Calabrio analyzeCalabrio analyze
Calabrio analyze
 
Tier 2 net app baseline design standard revised nov 2011
Tier 2 net app baseline design standard   revised nov 2011Tier 2 net app baseline design standard   revised nov 2011
Tier 2 net app baseline design standard revised nov 2011
 
Perf stat windows
Perf stat windowsPerf stat windows
Perf stat windows
 
Performance problems on ethernet networks when the e0m management interface i...
Performance problems on ethernet networks when the e0m management interface i...Performance problems on ethernet networks when the e0m management interface i...
Performance problems on ethernet networks when the e0m management interface i...
 
NetApp system installation workbook Spokane
NetApp system installation workbook SpokaneNetApp system installation workbook Spokane
NetApp system installation workbook Spokane
 
Migrate volume in akfiler7
Migrate volume in akfiler7Migrate volume in akfiler7
Migrate volume in akfiler7
 
Migrate vol in akfiler7
Migrate vol in akfiler7Migrate vol in akfiler7
Migrate vol in akfiler7
 
Data storage requirements AK
Data storage requirements AKData storage requirements AK
Data storage requirements AK
 
C mode class
C mode classC mode class
C mode class
 
Akfiler upgrades providence july 2012
Akfiler upgrades providence july 2012Akfiler upgrades providence july 2012
Akfiler upgrades providence july 2012
 
NA notes
NA notesNA notes
NA notes
 
Reporting demo
Reporting demoReporting demo
Reporting demo
 
Net app virtualization preso
Net app virtualization presoNet app virtualization preso
Net app virtualization preso
 
Providence net app upgrade plan PPMC
Providence net app upgrade plan PPMCProvidence net app upgrade plan PPMC
Providence net app upgrade plan PPMC
 
WSC Net App storage for windows challenges and solutions
WSC Net App storage for windows challenges and solutionsWSC Net App storage for windows challenges and solutions
WSC Net App storage for windows challenges and solutions
 
50,000-seat_VMware_view_deployment
50,000-seat_VMware_view_deployment50,000-seat_VMware_view_deployment
50,000-seat_VMware_view_deployment
 
Tr 3998 -deployment_guide_for_hosted_shared_desktops_and_on-demand_applicatio...
Tr 3998 -deployment_guide_for_hosted_shared_desktops_and_on-demand_applicatio...Tr 3998 -deployment_guide_for_hosted_shared_desktops_and_on-demand_applicatio...
Tr 3998 -deployment_guide_for_hosted_shared_desktops_and_on-demand_applicatio...
 
Tr 3749 -net_app_storage_best_practices_for_v_mware_vsphere,_dec_11
Tr 3749 -net_app_storage_best_practices_for_v_mware_vsphere,_dec_11Tr 3749 -net_app_storage_best_practices_for_v_mware_vsphere,_dec_11
Tr 3749 -net_app_storage_best_practices_for_v_mware_vsphere,_dec_11
 
Snap mirror source to tape to destination scenario
Snap mirror source to tape to destination scenarioSnap mirror source to tape to destination scenario
Snap mirror source to tape to destination scenario
 

Hortonworks roadshow

  • 1. Big Business Value from Big Data and Hadoop © Hortonworks Inc. 2012 Page 1
  • 2. Topics •  The Big Data Explosion: Hype or Reality •  Introduction to Apache Hadoop •  The Business Case for Big Data Page 2 © Hortonworks Inc. 2012
  • 3. Big Data: Hype or Reality? © Hortonworks Inc. 2012 Page 3
  • 4. What is Big Data? What is Big Data? Page 4 © Hortonworks Inc. 2012
  • 5. Big Data: Changing The Game for Organizations Transactions + Interactions Petabytes BIG DATA Mobile Web + Observations Sentiment User Click Stream SMS/MMS = BIG DATA Speech to Text Social Interactions & Feeds Terabytes WEB Web logs Spatial & GPS Coordinates A/B testing Sensors / RFID / Devices Behavioral Targeting Gigabytes CRM Business Data Feeds Dynamic Pricing Segmentation External Demographics Search Marketing Customer Touches User Generated Content ERP Megabytes Affiliate Networks Purchase detail Support Contacts HD Video, Audio, Images Dynamic Funnels Purchase record Offer details Offer history Product/Service Logs Payment record Increasing Data Variety and Complexity © Hortonworks Inc. 2012
  • 6. Next Generation Data Platform Drivers Organizations will need to become more data driven to compete Business •  Enable new business models & drive faster growth (20%+) Drivers •  Find insights for competitive advantage & optimal returns •  Data continues to grow exponentially Technical •  Data is increasingly everywhere and in many formats Drivers •  Legacy solutions unfit for new requirements growth Financial •  Cost of data systems, as % of IT spend, continues to grow Drivers •  Cost advantages of commodity hardware & open source © Hortonworks Inc. 2012
  • 7. What is a Data Driven Business? •  DEFINITION Better use of available data in the decision making process •  RULE Key metrics derived from data should be tied to goals •  PROVEN RESULTS Firms that adopt Data-Driven Decision Making have output and productivity that is 5-6% higher than what would be expected given their investments and usage of information technology* 1110010100001010011101010100010010100100101001001000010010001001000001000100000100 0100100100010000101110000100100010001010010010111101010010001001001010010100100111 11001010010100011111010001001010000010010001010010111101010011001001010010001000111 * “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” Brynjolfsson, Hitt and Kim (April 22, 2011) Page 7 © Hortonworks Inc. 2012
  • 8. Path to Becoming Big Data Driven 4 Key Considerations for a Data Driven Business 1.  Large web properties were born this way, you may have to adapt a strategy 2.  Start with a project tied to a key objective or KPI – Don’t OVER engineer 3.  Make sure your Big Data strategy “fits” your organization and grow it over time 4.  Don’t do big data just to do big data – you can get lost in all that data “Simply put, because of big data, managers can measure, and hence know, radically more about their businesses, and directly translate that knowledge into improved decision making & performance.” - Erik Brynjolfsson and Andrew McAfee Page 8 © Hortonworks Inc. 2012
  • 9. Wide Range of New Technologies In Memory Storage NoSQL BigTable MPP In Memory DB Cloud Hadoop HBase Pig Apache MapReduce YARN Scale-out Storage Page 9 © Hortonworks Inc. 2012
  • 10. A Few Definitions of New Technologies • NoSQL – A broad range of new database management technologies mostly focused on low latency access to large amounts of data. • MPP or “in database analytics” – Widely used approach in data warehouses that incorporates analytic algorithms with the data and then distribute processing • Cloud – The use of computing resources that are delivered as a service over a network • Hadoop – Open source data management software that combines massively parallel computing and highly-scalable distributed storage Page 10 © Hortonworks Inc. 2012
  • 11. Big Data: It’s About Scale & Structure SCALE (storage & processing) STRUCTURE Traditional MPP Hadoop EDW NoSQL Database Analytics Distribution Standards and structured governance Loosely structured Limited, no data processing processing Processing coupled w/ data Structured data types Multi and unstructured A JDBC connector to Hive does not make you “big data” Page 11 © Hortonworks Inc. 2012
  • 12. Hadoop Market: Growing & Evolving •  Big data outranks virtualization as #1 trend driving spending Industry Unstructured Data, $6 initiatives Applications, Hadoop, $14 $12 –  Barclays CIO survey April 2012 NoSQL, $2 Enterprise •  Overall market at $100B Applications, $5 –  Hadoop 2nd only to RDBMS in potential Storage for DB, $9 •  Estimates put market growth > 40% CAGR RDBMS, $27 Server for DB, –  IDC expects Big Data tech and services $12 market to grow to $16.9B in 2015 Data Integration Business –  According to JPMC 50% of Big Data & Quality, $4 Intelligence, $13 market will be influenced by Hadoop Source: Big Data II – New Themes, New Opportunities, BofA Merrill Lynch, March 2012 Page 12 © Hortonworks Inc. 2012
  • 13. Hadoop: Poised for Rapid Growth •  Hadoop is in the chasm •  Ecosystem evolving customers relative % •  Enterprises endure 1-3 year adoption cycle The CHASM Innovators, Early Early Late majority, Laggards, technology adopters, majority, conservatives Skeptics enthusiasts visionaries pragmatists time Customers want Customers want technology & performance solutions & convenience Source: Geoffrey Moore - Crossing the Chasm Page 13 © Hortonworks Inc. 2012
  • 14. What’s Needed to Drive Success? •  Enterprise tooling to become a complete data platform –  Open deployment & provisioning –  Higher quality data loading –  Monitoring and management –  APIs for easy integration www.hortonworks.com/moore •  Ecosystem needs support & development –  Existing infrastructure vendors need to continue to integrate –  Apps need to continue to be developed on this infrastructure •  Market needs to rally around core Apache Hadoop –  To avoid further splintering/market distraction –  To accelerate adoption Page 14 © Hortonworks Inc. 2012
  • 15. What is Hadoop? © Hortonworks Inc. 2012 Page 15
  • 16. What is Apache Hadoop? Open source data management software that combines massively parallel computing and highly-scalable distributed storage to provide high performance computation across large amounts of data Page 16 © Hortonworks Inc. 2012
  • 17. Big Data Driven Means a New Data Platform • Facilitate data capture – Collect data from all sources - structured and unstructured data – At all speeds batch, asynchronous, streaming, real-time • Comprehensive data processing – Transform, refine, aggregate, analyze, report • Simplified data exchange – Interoperate with enterprise data systems for query and processing – Share data with analytic applications – Data between analyst • Easy to operate platform – Provision, monitor, diagnose, manage at scale – Reliability, availability, affordability, scalability, interoperability Page 17 © Hortonworks Inc. 2012
  • 18. Enterprise Data Platform Requirements A data platform is an integrated set of components that allows you to capture, process and share data in any format, at scale Capture •  Store and integrate data sources in real time and batch •  Process and manage data of any size and includes tools to Process sort, filter, summarize and apply basic functions to the data Exchange •  Open and promote the exchange of data with new & existing external applications Operate •  Includes tools to manage and operate and gain insight into performance and assure availability Page 18 © Hortonworks Inc. 2012
  • 19. Apache Hadoop Open Source data management with scale-out storage & processing Processing Storage Map Reduce HDFS •  Distributed across “nodes” •  Splits a task across processors “near” the data & assembles results •  Natively redundant •  Self-Healing, High Bandwidth •  Name node tracks locations Clustered Storage Page 19 © Hortonworks Inc. 2012
  • 20. Apache Hadoop Characteristics • Scalable – Efficiently store and process petabytes of data – Linear scale driven by additional processing and storage • Reliable – Redundant storage – Failover across nodes and racks • Flexible – Store all types of data in any format – Apply schema on analysis and sharing of the data • Economical – Use commodity hardware – Open source software guards against vendor lock-in Page 20 © Hortonworks Inc. 2012
  • 21. Tale of the Tape Relational VS. Hadoop Required on write schema Required on read Reads are fast speed Writes are fast Standards and structured governance Loosely structured Limited, no data processing processing Processing coupled with data Structured data types Multi and unstructured Interactive OLAP Analytics Data Discovery Complex ACID Transactions best fit use Processing unstructured data Operational Data Store Massive Storage/Processing Page 21 © Hortonworks Inc. 2012
  • 22. What is a Hadoop “Distribution” Talend WebHDFS Sqoop Flume A complimentary set HCatalog of open source HBase Pig Hive technologies that MapReduce HDFS make up a complete Ambari Oozie HA data platform ZooKeeper •  Tested and pre-packaged to ease installation and usage •  Collects the right versions of the components that all have different release cycles and ensures they work together © Hortonworks Inc. 2012
  • 23. Apache Hadoop Related Projects Talend WebHDFS Sqoop Flume HCatalog HBase Capture Pig Hive MapReduce HDFS Ambari Oozie HA ZooKeeper Process WebHDFS Exchange •  REST API that supports the complete FileSystem interface for HDFS. •  Move data in and out and delete from HDFS •  Perform file and directory functions Operate •  webhdfs://<HOST>:<HTTP PORT>/PATH •  Standard and included in version 1.0 of Hadoop Page 23 © Hortonworks Inc. 2012
  • 24. Apache Hadoop Related Projects Talend WebHDFS Sqoop Flume HCatalog HBase Capture Pig Hive MapReduce HDFS Ambari Oozie HA ZooKeeper Process Apache Sqoop Exchange •  Sqoop is a set of tools that allow non-Hadoop data stores to interact with traditional relational databases and data warehouses. •  A series of connectors have been created to support explicit sources such as Oracle & Teradata Operate •  It moves data in and out of Hadoop •  SQ-OOP: SQL to Hadoop Page 24 © Hortonworks Inc. 2012
  • 25. Apache Hadoop Related Projects Talend WebHDFS Sqoop Flume HCatalog HBase Capture Pig Hive MapReduce HDFS Ambari Oozie HA ZooKeeper Process Apache Flume Exchange •  Distributed service for efficiently collecting, aggregating, and moving streams of log data into HDFS •  Streaming capability with many failover and recovery mechanisms Operate •  Often used to move web log files directly into Hadoop Page 25 © Hortonworks Inc. 2012
  • 26. Apache Hadoop Related Projects Talend WebHDFS Sqoop Flume HCatalog HBase Capture Pig Hive MapReduce HDFS Ambari Oozie HA ZooKeeper Process Apache HBase Exchange •  HBase is a non-relational database. It is columnar and provides fault-tolerant storage and quick access to large quantities of sparse data. It also adds transactional capabilities to Hadoop, allowing users to conduct updates, inserts and deletes. Operate Page 26 © Hortonworks Inc. 2012
  • 27. Apache Hadoop Related Projects Talend WebHDFS Sqoop Flume HCatalog HBase Capture Pig Hive MapReduce HDFS Ambari Oozie HA ZooKeeper Process Apache Pig Exchange •  Apache Pig allows you to write complex map reduce transformations using a simple scripting language. Pig latin (the language) defines a set of transformations on a data set such as aggregate, join and sort among others. Pig Latin is sometimes extended using UDF Operate (User Defined Functions), which the user can write in Java and then call directly from the language. Page 27 © Hortonworks Inc. 2012
  • 28. Apache Hadoop Related Projects Talend WebHDFS Sqoop Flume HCatalog HBase Capture Pig Hive MapReduce HDFS Ambari Oozie HA ZooKeeper Process Apache HCatalog Exchange •  HCatalog is a metadata management service for Apache Hadoop. It opens up the platform and allows interoperability across data processing tools such as Pig, Map Reduce and Hive. It also provides a table abstraction so that users need not be concerned with Operate where or how their data is stored. Page 28 © Hortonworks Inc. 2012
  • 29. Apache Hadoop Related Projects Talend WebHDFS Sqoop Flume HCatalog HBase Capture Pig Hive MapReduce HDFS Ambari Oozie HA ZooKeeper Process Apache Hive Exchange •  Apache Hive is a data warehouse infrastructure built on top of Hadoop (originally by Facebook) for providing data summarization, ad-hoc query, and analysis of large datasets. It provides a mechanism to project structure onto this data and query the data using a Operate SQL-like language called HiveQL (HQL). Page 29 © Hortonworks Inc. 2012
  • 30. Apache Hadoop Related Projects Talend WebHDFS Sqoop Flume HCatalog HBase Capture Pig Hive MapReduce HDFS Ambari Oozie HA ZooKeeper Process Apache Ambari Exchange •  Ambari is a monitoring, administration and lifecycle management project for Apache Hadoop clusters •  It provides a mechanism to provisions nodes •  Operationalizes Hadoop. Operate Page 30 © Hortonworks Inc. 2012
  • 31. Apache Hadoop Related Projects Talend WebHDFS Sqoop Flume HCatalog HBase Capture Pig Hive MapReduce HDFS Ambari Oozie HA ZooKeeper Process Apache Oozie Exchange •  Oozie coordinates jobs written in multiple languages such as Map Reduce, Pig and Hive. It is a workflow system that links these jobs and allows specification of order and dependencies between them. Operate Page 31 © Hortonworks Inc. 2012
  • 32. Apache Hadoop Related Projects Talend WebHDFS Sqoop Flume HCatalog HBase Capture Pig Hive MapReduce HDFS Ambari Oozie HA ZooKeeper Process Apache ZooKeeper Exchange •  ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Operate Page 32 © Hortonworks Inc. 2012
  • 33. Using Hadoop “out of the box” 1.  Provision your cluster w/ Ambari 2.  Load your data into HDFS – WebHDFS, distcp, Java 3.  Express queries in Pig and Hive – Shell or scripts 4.  Write some UDFs and MR-jobs by hand 5.  Graduate to using data scientists 6.  Scale your cluster and your analytics – Integrate to the rest of your enterprise data silos (TOPIC OF TODAY’S DISCUSSION) © Hortonworks Inc. 2012
  • 34. Hadoop in Enterprise Data Architectures Existing Business Infrastructure Web New Tech Datameer Tableau Karmasphere IDE & ODS & Applications & Visualization & Web Splunk Dev Tools Datamarts Spreadsheets Intelligence Applications Operations Discovery Low Latency/ Tools EDW NoSQL Custom Existing Templeton WebHDFS Sqoop Flume HCatalog HBase Pig Hive MapReduce HDFS Ambari Oozie HA ZooKeeper Social Exhaust logs files CRM ERP financials Media Data Big Data Sources (transactions, observations, interactions) Page 34 © Hortonworks Inc. 2012
  • 35. The Business Case for Hadoop © Hortonworks Inc. 2012 Page 35
  • 36. Apache Hadoop: Patterns of Use Big Data Transactions, Interactions, Observations Refine Explore Enrich Business Case Page 36 © Hortonworks Inc. 2012
  • 37. Operational Data Refinery Hadoop as platform for ETL modernization Refine Explore Enrich Unstructured Log files DB data Capture and Archive •  Capture new unstructured data along with logfiles all alongside existing sources •  Retain inputs in raw form for audit and Capture and archive continuity purposes Parse & Cleanse Transform, change and refine Structure and join •  Parse the data & cleanse Upload •  Apply structure and definition Refinery •  Join datasets together across disparate data sources Upload •  Push to existing data warehouse for downstream consumption Enterprise •  Feeds operational reporting and online systems Data Warehouse Page 37 © Hortonworks Inc. 2012
  • 38. “Big Bank” Service 2 Users with 1 Pipeline Empower scientists and systems various upstream apps and feeds - Best of breed exploration platforms - EBCDIC - plus existing TD warehouse - weblogs - relational feeds Veritca Aster GPFS 10k files per day Palantir Teradata transform to Hcatalog datatypes 1 . . . . . . . 3 - 5 years retention Detect only changed records and upload to EDW . . . n HDFS (scalable-clustered storage) © Hortonworks Inc. 2012
  • 39. “Big Bank” Key Benefits • Capture and archive – Retain 3 – 5 years instead of 2 – 10 days – Lower costs – Improved compliance • Transform, change, refine – Turn upstream raw dumps into small list of “new, update, delete” customer records – Convert fixed-width EBCDIC to UTF-8 (Java and DB compatible) – Turn raw weblogs into sessions and behaviors • Upload – Insert into Teradata for downstream “as-is” reporting and tools – Insert into new exploration platform for scientists to play with © Hortonworks Inc. 2012
  • 40. Big Data Exploration & Visualization Hadoop as agile, ad-hoc data mart Refine Explore Enrich Unstructured Log files DB data Capture and archive •  Same as in “refine” Structure and join Capture and archive •  Parse the data into queryable format Structure and join •  Exploratory analysis using Hive and Pig to discover value and lower complexity Categorize into tables Categorize into tables upload JDBC / ODBC •  Label data and type information for Explore compatibility and later discovery •  Pre-compute stats, groupings, patterns in data to accelerate analysis Visualize •  See key insights in existing powerful tools Visualization •  Move actionable insights into reports and marts Datamarts Tools Page 40 © Hortonworks Inc. 2012
  • 41. “Hardware Manufacturer” Unlocking Customer Survey Data product satisfaction and customer feedback analysis via Tableau Searchable text fields (Lucene index) Multiple choice survey data weblog traffic 1 . . . . . . . Existing Customer Service DB . . . n Hadoop cluster © Hortonworks Inc. 2012
  • 42. “Hardware Manufacturer” Key Benefits • Capture and archive – Store 10+ survey forms/year for > 3 years – Capture text, audio, and systems data in one platform • Structure and join – Unlock freeform text and audio data – Un-anonymize customers • Categorize into tables – Create HCatalog tables “customer”, “survey”, “freeform text” • Upload, JDBC – Visualize natural satisfaction levels and groups – Tag customers as “happy” and report back to CRM database © Hortonworks Inc. 2012
  • 43. Application Enrichment Deliver Hadoop analysis to online apps Refine Explore Enrich Unstructured Log files DB data Capture •  Capture data that was once too bulky and unmanageable Capture Enrich Parse Derive Useful Information Derive/Filter •  Uncover aggregate characteristics across data Scheduled & near real time •  Use Hive Pig and Map Reduce to identify patterns NoSQL, HBase •  Filter useful data from mass streams (Pig) Low Latency •  Micro or macro batch oriented schedules Share •  Push results to HBase or other NoSQL alternative for real time delivery Online •  Use patterns to deliver right content/offer to the Applications right person at the right time Page 43 © Hortonworks Inc. 2012
  • 44. “Clothing Retailer” Custom Homepages anonymous cookied shopper shopper Default Custom HBase custom homepage data Homepage Homepage 1 . . . Retail . . . . website . . . n Hadoop for "cluster analysis" weblogs Sales database © Hortonworks Inc. 2012
  • 45. “Clothing Retailer” Key Benefits • Capture – Capture weblogs together with sales order history, customer master • Derive useful information – Compute relationships between products over time – “people who buy shirts eventually need pants” – Score customer web behavior / sentiment – Connect product recommendations to customer sentiment • Share – Load customer recommendations into HBase for rapid website service © Hortonworks Inc. 2012
  • 46. Business Cases of Hadoop Vertical Refine Explore Enrich •  Dynamic Pricing •  Log Analysis/Site Retail & Web •  Social Network Analysis •  Session & Content Optimization Optimization •  Loyalty Program •  Dynamic Pricing/Targeted Retail •  Brand and Sentiment Analysis Optimization Offer Intelligence •  Threat Identification •  Person of Interest Discovery •  Cross Jurisdiction Queries •  Risk Modeling & Fraud •  Surveillance and Fraud Identification •  Real-time upsell, cross sales Finance •  Trade Performance Detection marketing offers •  Customer Risk Analysis Analytics •  Smart Grid: Production •  Grid Failure Prevention Energy •  Individual Power Grid Optimization •  Smart Meters •  Dynamic Delivery Manufacturing •  Supply Chain Optimization •  Customer Churn Analysis •  Replacement parts Healthcare & •  Electronic Medical Records •  Clinical Trials Analysis •  Insurance Premium Payer (EMPI) Determination © Hortonworks Inc. 2012
  • 47. Goal: Optimize Outcomes at Scale Media optimize Content Intelligence optimize Detection Finance optimize Algorithms Advertising optimize Performance Fraud optimize Prevention Retail / Wholesale optimize Inventory turns Manufacturing optimize Supply chains Healthcare optimize Patient outcomes Education optimize Learning outcomes Government optimize Citizen services Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation. © Hortonworks Inc. 2012
  • 48. Hortonworks Data Platform © Hortonworks Inc. 2012 Page 48
  • 49. Hortonworks Vision & Role We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop. 1 Be diligent stewards of the open source core 2 Be tireless innovators beyond the core 3 Provide robust data platform services & open APIs 4 Enable the ecosystem at each layer of the stack 5 Make the platform enterprise-ready & easy to use © Hortonworks Inc. 2012
  • 50. Hadoop Emerging as Enterprise Big Data Platform Applications, Installation & Configuration, Business Tools, Administration, Development Tools, Monitoring, Open APIs and access High Availability, Data Movement & Integration, Replication, Data Management Systems, Multi-tenancy, .. Systems Management Hortonworks Data Platform DEVELOPER Data Platform Services & Open APIs Metadata, Indexing, Search, Security, Management, Data Extract & Load, APIs © Hortonworks Inc. 2012
  • 51. Delivering on the Promise of Apache Hadoop Trusted Open Innovative •  Stewards of the core •  100% open platform •  Innovating current platform •  Original builders and •  No POS holdback with HCatalog, Ambari, HA operators of Hadoop •  Open to the community •  Innovating future platform •  100+ years development •  Open to the ecosystem with YARN, HA experience •  Complete vision for •  Closely aligned to •  Managed every viable Apache Hadoop core Hadoop-based platform Hadoop release •  HDP built on Hadoop 1.0 © Hortonworks Inc. 2012
  • 52. Hortonworks Apache Hadoop Expertise Hortonworks represent more than Hortonworkers 90 years combined Hadoop are the builders, development experience operators and core architects of 8 of top 15 highest lines of code* Apache Hadoop contributors to Hadoop of all time …and 3 of top 5 “Hortonworks’ engineers spend all of their time improving open source Apache Hadoop. Other Hadoop engineers are going to spend the bulk of their time working on their proprietary software” “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog..” © Hortonworks Inc. 2012
  • 53. Hortonworks Data Platform •  Simplify deployment to get Non-Relational Database Management & Monitoring Svcs Scripting Query (Pig) (Hive) started quickly and easily Data Integration Services (HCatalog APIs, WebHDFS, Workflow & Scheduling Talend, Sqoop, FlumeNG) (Ambari, Zookeeper) Metadata (HBase) •  Monitor, manage any size cluster Services with familiar console and tools (Oozie) (HCatalog) Distributed Processing •  Only platform to include data (MapReduce) integration services to interact 1 with any data source Distributed Storage (HDFS) •  Metadata services opens the platform for integration with Hortonworks Data Platform existing applications Delivers enterprise grade functionality on a proven Apache Hadoop distribution to ease management, •  Dependable high availability simplify use and ease integration into the enterprise architecture The only 100% open source data platform for Apache Hadoop © Hortonworks Inc. 2012
  • 54. Demo © Hortonworks Inc. 2012 Page 54
  • 55. Why Hortonworks Data Platform? ONLY Hortonworks Data Platform provides… •  Tightly aligned to core Apache Hadoop development line - Reduces risk for customers who may add custom coding or projects •  Enterprise Integration - HCatalog provides scalable, extensible integration point to Hadoop data •  Most reliable Hadoop distribution - Full stack high availability on v1 delivers the strongest SLA guarantees •  Multi-tenant scheduling and resource management - Capacity and fair scheduling optimizes cluster resources •  Integration with operations, eases cluster management - Ambari is the most open/complete operations platform for Hadoop clusters © Hortonworks Inc. 2012
  • 56. Hortonworks Growing Partner Ecosystem Page 56 © Hortonworks Inc. 2012
  • 57. Hortonworks Support Subscriptions Objective: help organizations to successfully develop and deploy solutions based upon Apache Hadoop • Full-lifecycle technical support available – Developer support for design, development and POCs – Production support for staging and production environments – Up to 24x7 with 1-hour response times • Delivered by the Apache Hadoop experts – Backed by development team that has released every major version of Apache Hadoop since 0.1 • Forward-compatibility – Hortonworks’ leadership role helps ensure bug fixes and patches can be included in future versions of Hadoop projects Page 57 © Hortonworks Inc. 2012
  • 58. Hortonworks Training The expert source for Apache Hadoop training & certification Role-based training for developers, administrators & data analysts –  Coursework built and maintained by the core Apache Hadoop development team. –  The “right” course, with the most extensive and realistic hands-on materials –  Provide an immersive experience into real-world Hadoop scenarios –  Public and Private courses available Comprehensive Apache Hadoop Certification –  Become a trusted and valuable Apache Hadoop expert Page 58 © Hortonworks Inc. 2012
  • 59. What next? 1 Download Hortonworks Data Platform hortonworks.com/download 2 Use the getting started guide hortonworks.com/get-started 3 Learn more… get support Hortonworks Support •  Expert role based training •  Full lifecycle technical support •  Course for admins, developers across four service levels and operators •  Delivered by Apache Hadoop •  Certification program Experts/Committers •  Custom onsite options •  Forward-compatible hortonworks.com/training hortonworks.com/support Page 59 © Hortonworks Inc. 2012
  • 60. Q&A © Hortonworks Inc. 2012 Page 60