Tuesday, August 21, 2012
Eric Kavanagh
                           Eric.kavanagh@bloorgroup.com




Twitter Tag: #briefr
   • Reveal the essential characteristics of enterprise software, good and bad

   • Provide a forum for detailed analysis of today’s innovative technologies

   • Give vendors a chance to explain their product to savvy analysts

   • Allow audience members to pose serious questions... and get answers!



   August: Analytics

   September: Integration

   October: Database

   November: Cloud

   December: Innovators



Analytics is, and always has been, about discovering insights
that lead to better business decisions. The range of
technologies and use cases in this area is wide:
statistical analysis, data and process mining, predictive
analytics and modeling, and complex event processing.

What is now referred to as Big Data has pushed analytics
beyond the capabilities of traditional solutions. “Big
Analytics” has organizations diving into large heaps of data
that previously were not available or usable.

The growing volume, variety, velocity and complexity of
data have proven to be a major challenge for organizations
that rely on analytics to maintain a competitive edge.

 
                           John is the Principal and Founder of
                           Radiant Advisors. As a recognized thought
                           leader in BI, John has been publishing
                           articles and presenting at conferences for
                           the past 10 years. He has been a Best
                           Practices judge, presenter and panel
                           participant at TDWI. John has also
                           developed and presented his own courses:
                           Radiant Advisors Learning Catalog.

                           John has a B.S. in Mechanical Engineering
                           from California State University and an
                           M.B.A. from the University of Colorado. He
                           is a Certified Business Intelligence
                           Professional with mastery levels in
                           Leadership and Administration, Database
                           Administration and Business Intelligence.
                            




Teradata is known for its analytic data solutions with
            a focus on integrated data warehousing, big data
            analytics and business applications.

            It offers a broad suite of technology platforms and
            solutions, and a wide range of data management
            applications and data mining capabilities.

            Teradata Aster is its MapReduce platform for
            handling big data and big analytics on
            multi-structured data.


Steve Wooledge is Senior Director of
         Marketing at Teradata’s Aster Center of
         Innovation, where he is an evangelist for the
         company’s analytic platform product and
         responsible for awareness, demand
         generation, and solution marketing for the
         data scientist. Steve has more than 10 years
         of experience in product marketing and
         business development for business
         intelligence, data management, Web
         analytics and e-commerce products.

         Prior to his current role, Steve held product
         marketing positions at Interwoven and
         Business Objects as well as sales and
         engineering roles at Business Objects, Dow
         Chemical and Occidental Petroleum.

         Steve has a B.S. in Chemical Engineering and
         an M.B.A. in Marketing and Finance.




The Unified Big Data Architecture &
   Bridging the Analyst Gap for Hadoop
   Steve Wooledge, Sr. Director of Marketing
   August 21, 2012



Topics


   • Quick intro to Teradata Aster

   • The need for a unified big data architecture

   • Bridging the Analyst Gap for Hadoop: Aster SQL-H™




        10   Confidential and proprietary. Copyright © 2012 Teradata Corporation.


Teradata Aster
   Leading Innovator in Data Discovery for the Enterprise

  § Aster Solution: Data discovery platform - Delivers MapReduce analytic
     framework within an MPP database
  § Brings data science to the business: Enables MapReduce processing through
     the analytic language of business, standard SQL
  § Delivers new analytics: Gives businesses new breakthrough analytic apps via
     pre-packaged pattern, path, and graph SQL-MapReduce modules
  § On multi-structured data: Leverages multi-structured data sources for
     increased analytic breadth & accuracy



   Customers




Teradata Aster MapReduce Platform


   Analysts, Customers, Business Users, Data Scientists

   Your Analytic & Advanced Reporting Applications

   Develop: Rapid Analytics Development
   • 50+ pre-built analytic modules
   • Visual IDE; develop apps in hours
   • Many programming languages

   Process: Embedded Analytic Processing
   • SQL-MapReduce framework
   • Analyze both structured & multi-structured data
   • Linear, incremental scalability

   Store: Massively Parallel Data Storage
   • Commodity-hardware based
   • Software only, appliance, or cloud
   • Relational-data architecture can be extended for non-relational types


Business Impact / ROI

   Increased conversions from recommendations with 360-degree view of
   customer across in-store and .com behavior

   Build revenue attribution models to link every purchase to a site feature

   Reduce churn from one day previously to 20 minutes

   • Payment processing analytics down from one day to one minute with
     SQL-MapReduce
   • Web log data processing from seven hours to 20 minutes
   • Interactive dashboards with all KPIs from point of order inception,
     down from five hours to five minutes




             Deeper Consumer Insights with Teradata Aster
Big Data: From Transactions to Interactions

   [Figure: data sources grouped by scale, from transactions to interactions]

   Megabytes (ERP): Purchase detail, Purchase record, Payment record
   Gigabytes (CRM): Segmentation, Offer details, Customer Touches,
      Support Contacts
   Terabytes (WEB): Web logs, Offer history, A/B testing, Dynamic Pricing,
      Affiliate Networks, Search marketing, Behavioral Targeting,
      Dynamic Funnels
   Petabytes (BIG DATA): User Generated Content, Mobile Web, User Click Stream,
      Sentiment, Social Network, External Demographics, Business Data Feeds,
      HD Video, Speech to Text, Product/Service Logs, SMS/MMS

   Increasing data variety and complexity



Unified Big Data Architecture
   Bridging Classic & Big Data Worlds

   Classic Method: Structured & Repeatable Analysis
   • Business determines what questions to ask
   • IT structures the data to answer those questions
   • “Capture only what’s needed”
   • SQL performance and structure

   Big Data Method: Multi-structured & Iterative Analysis
   • IT delivers a platform for storing, refining, and analyzing all data sources
   • Business explores data for questions worth answering
   • “Capture in case it’s needed”
   • MapReduce Processing Flexibility
SQL-MapReduce
   Example: Pattern Matching Analysis

   MapReduce Analytics
   • Single-pass of data
   • Linked list sequential analysis

   Traditional SQL
   • Self-Joins for sequencing
   • Limited operators for ordered data
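
The contrast between self-joins and a single ordered pass can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Aster SQL-MapReduce: the click log, the column names and both function names are hypothetical, and SQLite stands in for a traditional SQL engine. The question asked is "which sessions went from a search page straight to checkout?".

```python
import sqlite3
from itertools import groupby

# Hypothetical click log: (session_id, step number within session, page).
CLICKS = [
    ("s1", 1, "home"), ("s1", 2, "search"), ("s1", 3, "checkout"),
    ("s2", 1, "home"), ("s2", 2, "checkout"),
    ("s3", 1, "search"), ("s3", 2, "home"), ("s3", 3, "search"), ("s3", 4, "checkout"),
]


def sessions_via_self_join(clicks):
    """Traditional SQL: sequencing needs one self-join per extra step."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE clicks (session_id TEXT, step INTEGER, page TEXT)")
    con.executemany("INSERT INTO clicks VALUES (?, ?, ?)", clicks)
    rows = con.execute(
        """
        SELECT DISTINCT a.session_id
        FROM clicks a
        JOIN clicks b
          ON b.session_id = a.session_id AND b.step = a.step + 1
        WHERE a.page = 'search' AND b.page = 'checkout'
        ORDER BY a.session_id
        """
    ).fetchall()
    con.close()
    return [row[0] for row in rows]


def sessions_via_single_pass(clicks, first, second):
    """One ordered scan: keep only the previous page of the current session."""
    matched = []
    ordered = sorted(clicks, key=lambda c: (c[0], c[1]))
    for session_id, events in groupby(ordered, key=lambda c: c[0]):
        previous = None
        for _, _, page in events:
            if previous == first and page == second:
                matched.append(session_id)
                break
            previous = page
    return matched


print(sessions_via_self_join(CLICKS))                          # ['s1', 's3']
print(sessions_via_single_pass(CLICKS, "search", "checkout"))  # ['s1', 's3']
```

Extending the SQL version to a three-step sequence means adding another join, while the scan only needs to remember one more page.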




The Advantages of MapReduce
   Raw click-stream data and pattern matching with nPath
  Goal
  • Increase understanding of customer behavior on a website to improve
    advertising rates or website navigation

  Challenges
  • Full website session-level data needed, typically from raw web logs
  • Requires complex multi-pass SQL queries or non-SQL techniques
  • Requires rewriting query to change number of clicks analyzed

  MapReduce Value
  • Performance: Single pass over data regardless of number of clicks analyzed
  • Manageability: Much simpler code, from 350 lines of SQL to 18-line
    SQL-MapReduce
  • Ease of Use: Pattern flexibility to handle varied numbers of clicks and
    click patterns without rewriting code

  [Chart: Click Stream Analysis, Comparative Performance. Time axis from 0 to
  400; bars for SQL (3pg), SQL-MR (3pg), SQL-MR (4pg), SQL-MR (8pg) and
  SQL-MR (12pg). SQL for 3 pages: 6 minutes. MapReduce for 3, 4, 8, 12 pages:
  77-131 seconds.]

  Example Analytic Logic
  People who search ‘diabetes’ also browse…
  People who download visit pages A, B, D …
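
The "pattern flexibility" point can be illustrated with a small Python sketch. It is not Aster's nPath function; it only mimics the idea of matching a pattern over each session's ordered click path. The symbol table, the sample clicks and the function names are assumptions made for this example. Each session is collapsed into a string of one-letter symbols, so changing the number of clicks analyzed means changing the pattern, not the code.

```python
import re
from collections import defaultdict

# Hypothetical one-letter symbols for page types, chosen for this sketch.
SYMBOLS = {"search": "S", "article": "A", "download": "D", "other": "O"}

# Hypothetical click stream: (session_id, timestamp, page type).
CLICKS = [
    ("u1", 1, "search"), ("u1", 2, "article"), ("u1", 3, "article"), ("u1", 4, "download"),
    ("u2", 1, "search"), ("u2", 2, "download"),
    ("u3", 1, "article"), ("u3", 2, "download"),
]


def session_paths(clicks):
    """Collapse time-ordered clicks into one symbol string per session."""
    paths = defaultdict(list)
    for session_id, _timestamp, page in sorted(clicks, key=lambda c: (c[0], c[1])):
        paths[session_id].append(SYMBOLS.get(page, SYMBOLS["other"]))
    return {sid: "".join(symbols) for sid, symbols in paths.items()}


def match_sessions(clicks, pattern):
    """Return the sessions whose click path contains the given pattern."""
    compiled = re.compile(pattern)
    paths = session_paths(clicks)
    return sorted(sid for sid, path in paths.items() if compiled.search(path))


# Search, then one or more articles, then a download.
print(match_sessions(CLICKS, "SA+D"))  # ['u1']
# Same code, looser pattern: any number of articles in between.
print(match_sessions(CLICKS, "SA*D"))  # ['u1', 'u2']
```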
Need for a Unified Big Data Architecture for New Insights
   Enabling All Users for Any Data Type from Data Capture to Analysis




   Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.

   Discover and Explore
   Reporting and Execution in the Enterprise

   Capture, Store and Refine

   Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP



Teradata Unified Big Data Architecture
   Any User, Any Data, Any Analysis


   Engineers, Data Scientists, Quants, Business Analysts
   Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.

   Aster MapReduce Portfolio: Discovery Platform
   Teradata Analytics Portfolio: Integrated Data Warehouse
   SQL-H

   Capture, Store, Refine

   Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP




Hadoop Points of Integration – Bulk Data Transfer
   • Teradata:Hadoop
   • JDBC (available today)
     − Hadoop programs can call JDBC
   • TDDBinputformat/Dboutputformat (available today)
     − Submits SQL to JDBC
   • Cloudera Sqoop (available today)
     − Command line import/export database objects

   • Aster:Hadoop
   • Aster-Hadoop Adaptor – node:node transfer using SQL-MapReduce




             Opportunity for analysts to more easily access Hadoop data
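
As a rough illustration of the Sqoop route listed above, the following Python sketch assembles the argument list for a `sqoop import` that copies one database table into HDFS over JDBC. The host, database, table, user and directory names are hypothetical, and the exact connector setup depends on the Sqoop and Teradata driver versions installed, so treat this as a sketch rather than a tested recipe.

```python
def sqoop_import_args(jdbc_url, table, target_dir, username, mappers=4):
    """Build the argv list for a `sqoop import` of one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,                       # JDBC connection string
        "--driver", "com.teradata.jdbc.TeraDriver",  # Teradata JDBC driver class
        "--username", username,
        "-P",                                        # prompt for the password
        "--table", table,
        "--target-dir", target_dir,                  # HDFS destination directory
        "--num-mappers", str(mappers),               # parallel import tasks
    ]


# Hypothetical connection details, for illustration only.
args = sqoop_import_args(
    jdbc_url="jdbc:teradata://dbhost.example.com/DATABASE=sales",
    table="orders",
    target_dir="/data/sales/orders",
    username="analyst",
    mappers=8,
)
print(" ".join(args))
```

The list could be handed to `subprocess.run(args, check=True)` on a machine where Sqoop and the JDBC driver are installed.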




Source: Enterprise Strategy Group; April 5, 2012



Bridging the Business Analyst Gap for
                       Hadoop Data




Announced June 12th, 2012


                                                 Aster SQL-H™
   A Business User’s Bridge to Analyze Hadoop Data


Aster SQL-H gives analysts and data scientists a better way to
           analyze data stored cheaply in Hadoop
• Allow standard ANSI SQL access to Hadoop data

• Leverage existing BI tool investments

• Enable 50+ prebuilt SQL-MapReduce Apps and IDE

• Lower costs by making data analysts self-sufficient

The Big Data Architecture Today Has Gaps
  Analyst’s Goal: Get Insights from Data in Hadoop


   Engineers, Data Scientists, Quants, Business Analysts

   Custom Code and Development: MR, Pig, Hive; IT is the optimizer
   Aster MapReduce Portfolio: SQL & SQL-MapReduce on the Teradata Aster
      Discovery Platform
   Teradata Analytics Portfolio: SQL on the Teradata IDW

   HDFS




Analytics on Hadoop Data with Aster SQL-H


   Engineers, Data Scientists, Quants, Business Analysts

   Aster MapReduce Portfolio: SQL-H, SQL & SQL-MapReduce on the Teradata Aster
      Discovery Platform
   Teradata Analytics Portfolio: SQL on the Teradata IDW

   HDFS




Aster SQL-H Integration with Hadoop Catalog
   A Business User’s Bridge to Analyzing Data in Hadoop

   • Industry’s First Database Integration with Hadoop’s HCatalog
   • Abstraction layer to easily and efficiently read structured &
     multi-structured data stored in HDFS
   • Uses Hadoop Catalog (HCatalog) to perform data abstraction functions
     (e.g. automatically understands tables, data partitions)
   • HDFS data presented to users as Aster tables
   • Fully accessible within the Aster SQL and SQL-MapReduce processing
     engines, plus ODBC/JDBC & BI tools

   [Diagram: Aster SQL-H on top of the Hadoop stack (Hadoop MapReduce, Hive,
   Pig, HCatalog, HDFS)]

Data & Processing Locality in SQL-H

Aster Layer: SQL-H
•SQL & SQL-MapReduce processing
•Intermediate data persistence
•Optional: HDFS data subset persistence for maximum performance

Hadoop Layer: HDFS
•HCatalog: metadata store
•HDFS: data repository
•No MapReduce processing in Hadoop
•Directly & in parallel move data from HDFS to Teradata Aster

[Diagram labels: Hadoop MR, Hive, Pig, HCatalog; "Data Filtering" and "Data" between the two layers]
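
The data flow on this slide (use catalog metadata to pick the relevant partitions, then pull them directly and in parallel, with no job run on the source cluster) can be illustrated with a small sketch. It is written in Python rather than Aster SQL, and it is not the SQL-H API: the in-memory `CATALOG`, the helpers `list_partitions`, `read_partition`, and `load_table`, and the sample rows are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a metadata catalog plus storage:
# table name -> partition key -> rows stored in that partition.
CATALOG = {
    "clicks": {
        "2012-08-19": [("u1", "home"), ("u2", "cart")],
        "2012-08-20": [("u1", "checkout")],
        "2012-08-21": [("u3", "home"), ("u3", "checkout")],
    }
}


def list_partitions(table, predicate):
    """Consult only the metadata to decide which partitions are needed."""
    return sorted(p for p in CATALOG[table] if predicate(p))


def read_partition(table, partition):
    """Read one partition's rows straight from storage (no processing job)."""
    return list(CATALOG[table][partition])


def load_table(table, predicate, workers=4):
    """Fetch the selected partitions in parallel and concatenate their rows."""
    partitions = list_partitions(table, predicate)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda p: read_partition(table, p), partitions)
        return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    # Only partitions on or after 2012-08-20 are read at all.
    print(load_table("clicks", lambda p: p >= "2012-08-20"))
```

The point of the sketch is the ordering of work: the partition filter is applied to metadata first, so unneeded partitions are never transferred.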

Benefits of Aster SQL-H™
 Deep metadata layer integration between Aster and Hadoop

 Business Analysts (Powerful analytics & Performance)
 •50+ advanced SQL-MapReduce functions (Aster MapReduce Portfolio)
 •Simplified, SQL-based interface with Hadoop data structures (HCatalog)
 •Interoperability with existing ecosystem & skillset

 Architects and Administrators (Maintainability)
 •Leverage existing DBA skill-sets without additional overhead
 •Simplify administration and monitoring
    - Alternatives require manual creation and maintenance of metadata
    - Less work and fewer errors
    - Can do filtering with Aster; select data from HCatalog, leverage partitioning




Aster MapReduce Portfolio: the App Store of Big Data
   Some of the 50+ out-of-the-box analytical apps



   • Path Analysis: Discover patterns in rows of sequential data
   • Text Analysis: Derive patterns and extract features in textual data
   • Statistical Analysis: High-performance processing of common statistical calculations
   • Segmentation: Discover natural groupings of data points
   • Marketing Analytics: Analyze customer interactions to optimize marketing decisions
   • Data Transformation: Transform data for more advanced analysis



Big Data Architecture:
         Optimizing Workloads with Specialized Approach




When to Use Which?
   The best approach by workload and data type
   • Processing as a Function of Schema Requirements by Data Type

                      Low Cost            Loading and Refining    Transformations       Reporting   Analytics
                      Storage &           (Data Pre-Processing,                                     (User-driven,
                      Retention           Prep, Cleansing)                                          interactive)

 Stable Schema        Teradata / Hadoop   Teradata                Teradata              Teradata    Teradata
                                                                                                    (SQL analytics)

 Evolving Schema      Hadoop              Aster / Hadoop          Aster (joining with   Aster       Aster
                                                                  structured data)                  (SQL + MapReduce Analytics)

 Format, No Schema    Hadoop              Hadoop                  Hadoop                            Aster
                                                                                                    (MapReduce Analytics)

   Example workloads shown for each row:
   • Stable Schema: Financial analysis, ad-hoc/OLAP; Enterprise-wide BI and Reporting; Spatial/Temporal; Active Execution
   • Evolving Schema: Interactive data discovery; Web clickstream; Set-top box analysis; CDRs, Sensor logs, JSON
   • Format, No Schema: Social feeds, text, document, or image processing; Audio/video storage and refining; Storage and batch transformations
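
The matrix above amounts to a lookup from (schema type, workload) to a platform. The sketch below transcribes it into a Python dictionary so the recommendation can be queried. The short keys such as `"stable"` and `"preprocessing"` are hypothetical labels chosen for this example, and the column assignment follows one reading of the slide layout; the cell the slide leaves empty is recorded as `None`.

```python
# Platform named on the slide for each (schema type, workload) cell.
# Key names are illustrative labels, not Teradata terminology.
MATRIX = {
    "stable": {
        "storage": "Teradata / Hadoop",
        "preprocessing": "Teradata",
        "transformations": "Teradata",
        "reporting": "Teradata",
        "analytics": "Teradata (SQL analytics)",
    },
    "evolving": {
        "storage": "Hadoop",
        "preprocessing": "Aster / Hadoop",
        "transformations": "Aster (joining with structured data)",
        "reporting": "Aster",
        "analytics": "Aster (SQL + MapReduce Analytics)",
    },
    "no_schema": {
        "storage": "Hadoop",
        "preprocessing": "Hadoop",
        "transformations": "Hadoop",
        "reporting": None,  # the slide shows no entry for this cell
        "analytics": "Aster (MapReduce Analytics)",
    },
}


def recommend(schema, workload):
    """Return the platform listed in the matrix, or None for an empty cell."""
    return MATRIX[schema][workload]


if __name__ == "__main__":
    for schema in MATRIX:
        print(schema, "->", recommend(schema, "analytics"))
```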

ESG Benchmark Report Summary
   3rd-party validation of Aster and Hadoop “fit”


        Scope
        • Identical hardware for Aster and Hadoop
        • Clickstream, sentiment, & traditional retail data
        • Compare “time to insight” and “time to develop”




        Results
        •Loading: Hadoop 1.8x faster
        •Transforms: Hadoop 1.3x faster
        •Analytics: Aster 35x faster (range: 4-416x)
        •Development: Aster 3x faster




Hadoop vs. Aster Web Clickstream Analytics




   [Chart callouts: Aster 1.5X Faster; Aster 33X Faster; Aster 6X Faster. On average Aster is 18x Faster.]




Example: Golden Path Analysis of Top Site Paths
   Identifying Top Pathing Occurrences (for any event of interest)

  • Business Question
  • How do we find and rank the 10 most frequent paths taken to the checkout page?
     - Page Visits exist in multiple rows in the database, for each user

  • Analytics Question
  • What is the most common path for a user on the site to…
     1. Enter the site
     2. View any page (other than the Help page)
     - Make a purchase on the Checkout page
     - Rank the top 10 occurrences

     SELECT click_path, count(*) as path_frequency
     FROM nPath(
         ON clicks
         PARTITION BY user_id
         ORDER BY timestamp
         MODE( overlapping )
         PATTERN('(RELEVANT|IGNORE)*.BUY')
         SYMBOLS(
             page_type IN ('help.asp') AS IGNORE,
             page_type NOT IN ('help.asp') AS RELEVANT,
             page_type = 'checkout' as BUY)
         RESULT( accum( page_id of RELEVANT) as click_path )
     ) T
     GROUP BY click_path
     ORDER BY count(*) desc
     LIMIT 10;
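
For readers without an Aster system, the idea behind the query above can be approximated in plain Python. This is a simplified sketch, not a reimplementation of nPath: it emits one path per checkout event and restarts the path afterwards, so it does not reproduce the `MODE( overlapping )` matching semantics. The function name `top_checkout_paths`, the tuple layout of a click, and the sample data are assumptions made for this example.

```python
from collections import Counter, defaultdict


def top_checkout_paths(clicks, limit=10):
    """Rank the most frequent page paths that end in a checkout.

    clicks: iterable of (user_id, timestamp, page_id, page_type) tuples.
    Returns a list of (path_tuple, frequency), most frequent first.
    """
    # Equivalent of PARTITION BY user_id: collect each user's clicks.
    by_user = defaultdict(list)
    for user_id, ts, page_id, page_type in clicks:
        by_user[user_id].append((ts, page_id, page_type))

    counts = Counter()
    for events in by_user.values():
        events.sort(key=lambda e: e[0])  # ORDER BY timestamp
        path = []
        for _, page_id, page_type in events:
            if page_type == "checkout":
                # A purchase closes the current path; count it and start over.
                counts[tuple(path)] += 1
                path = []
            elif page_type != "help.asp":
                # Help pages are ignored; every other page extends the path.
                path.append(page_id)
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        ("u1", 1, "home", "home"),
        ("u1", 2, "help", "help.asp"),
        ("u1", 3, "cart", "cart"),
        ("u1", 4, "pay", "checkout"),
        ("u2", 1, "home", "home"),
        ("u2", 2, "cart", "cart"),
        ("u2", 3, "pay", "checkout"),
        ("u3", 1, "search", "search"),
        ("u3", 2, "pay", "checkout"),
    ]
    for path, freq in top_checkout_paths(sample):
        print(" > ".join(path), freq)
```

The grouping, ordering, and accumulation steps are the row-by-row work that the single nPath call expresses declaratively.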




Single Channel Pathing Analysis




Analyzing Multi-channel Identifies Advertising Signal




Hadoop Provides 1.3x Faster ELT on Average




When to Use Which Depends on Data Type
   - Aster faster on parsing and sessionizing Weblogs
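
Sessionizing, mentioned above, means grouping each user's clicks into visits by splitting wherever the gap between consecutive clicks exceeds a timeout. The sketch below shows the general technique in Python; it is not Aster's implementation, and the function name `sessionize`, the input layout, and the 30-minute default timeout are assumptions made for illustration.

```python
def sessionize(clicks, timeout_seconds=1800):
    """Assign a per-user session number to each click.

    clicks: iterable of (user_id, timestamp_in_seconds) tuples.
    Returns a list of (user_id, timestamp, session_number), sorted by
    user and time. A new session starts when the gap since the user's
    previous click exceeds timeout_seconds (an assumed default).
    """
    result = []
    last_seen = {}   # user_id -> timestamp of that user's previous click
    session_no = {}  # user_id -> current session number
    for user_id, ts in sorted(clicks):
        previous = last_seen.get(user_id)
        if previous is None or ts - previous > timeout_seconds:
            # First click for this user, or the gap was too long.
            session_no[user_id] = session_no.get(user_id, -1) + 1
        last_seen[user_id] = ts
        result.append((user_id, ts, session_no[user_id]))
    return result


if __name__ == "__main__":
    for row in sessionize([("a", 0), ("a", 100), ("a", 5000), ("b", 50)]):
        print(row)
```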




Evolving Schema Example
   Aster Digital Marketing Client


     Custom Data                                                                    • Segmentation: Custom SQL-MR
       by Client                                   Analytic Tools
                                                                                      algorithms to match and create
                                                                                      centralized identifiers
                                                                                    • Sessionize by client
                                                                                    • nPath identifies segment path
     Media Data
     (Aggregated)                                     Teradata Aster                  analysis (behavior after ads)


                                                                                    • Benefits:
                                                Cookie-level




       Raw Web                                                           Archival   - Marketing analysts more
                                                   data




         Logs
                                                                                      productive with Aster


       Ad Server
                                          Hadoop (on AWS)
         Logs
                                         (Storage, aggregations,
                                               cleansing)




        39   Confidential and proprietary. Copyright © 2012 Teradata Corporation.


Tuesday, August 21, 2012
Evolving Schema Example
   Aster Digital Marketing Client

   • Segmentation: Custom SQL-MR algorithms to match and create
     centralized identifiers
   • Sessionize by client
   • nPath identifies segment path analysis (behavior after ads)
   • Benefits:
     - Marketing analysts more productive with Aster
     - Lower cost: storage and batch refining done on
       Amazon Elastic MapReduce

   [Diagram labels: Custom Data by Client; Media Data (Aggregated); Raw Web Logs; Ad Server Logs; Hadoop (on AWS): storage, aggregations, cleansing; Cookie-level data; Archival; Teradata Aster; Analytic Tools]

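The "Sessionize by client" and "behavior after ads" steps on the slide above can be illustrated with a minimal sketch in plain Python. This is not Aster SQL-MR or nPath syntax. The function names `sessionize` and `paths_after`, the 30-minute inactivity timeout, the `ad_click` page label, and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity gap that closes a session; the slides do not give one.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Group (cookie_id, timestamp, page) events into per-cookie sessions.

    A new session starts whenever the gap since the cookie's previous
    event exceeds `timeout`. Returns {cookie_id: [[(ts, page), ...], ...]}.
    """
    by_cookie = defaultdict(list)
    for cookie_id, ts, page in events:
        by_cookie[cookie_id].append((ts, page))

    sessions = {}
    for cookie_id, hits in by_cookie.items():
        hits.sort()  # order each cookie's hits by timestamp
        grouped = [[hits[0]]]
        for prev, cur in zip(hits, hits[1:]):
            if cur[0] - prev[0] > timeout:
                grouped.append([cur])    # gap too long: open a new session
            else:
                grouped[-1].append(cur)  # same session continues
        sessions[cookie_id] = grouped
    return sessions


def paths_after(sessions, trigger="ad_click"):
    """Return the pages visited after the first `trigger` page in each session."""
    paths = []
    for cookie_id, grouped in sessions.items():
        for session in grouped:
            pages = [page for _, page in session]
            if trigger in pages:
                paths.append((cookie_id, pages[pages.index(trigger) + 1:]))
    return paths


# Hypothetical cookie-level events, invented for this sketch.
events = [
    ("c1", datetime(2012, 8, 21, 9, 0), "ad_click"),
    ("c1", datetime(2012, 8, 21, 9, 5), "product"),
    ("c1", datetime(2012, 8, 21, 9, 7), "checkout"),
    ("c1", datetime(2012, 8, 21, 13, 0), "home"),
    ("c2", datetime(2012, 8, 21, 9, 1), "home"),
]
sessions = sessionize(events)
after_ads = paths_after(sessions)
```

In this sample, cookie `c1` splits into two sessions because of the long gap before its last hit, and only its first session contains a path that follows an ad click.
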
More Accurate Customer Churn Prevention

   • Hadoop captures, stores and transforms social, images and call records
   • Aster does path and sentiment analysis with multi-structured data

   [Diagram labels: Multi-Structured Raw Data; Social feeds; Clickstream Data; Call Center Voice Records; Check Images; Hadoop (Capture, Retain & Refine Layer); Call Data; Sentiment Scores; Check Data; Aster Discovery Platform; Traditional Data Flow; Data Sources; ETL Tools; Teradata Integrated DW; Dimensional Data; Analytic Results; Analysis + Marketing Automation (Customer Retention Campaign)]


Summary
   Bringing the VALUE of Hadoop to the Enterprise

   • Teradata is focused on extracting the most business value for
     customers from data in Hadoop

   • Mainstream organizations need a unified big data architecture
     - Best-of-breed with Hadoop, Aster, Teradata
     - Brings “Data Science” to business analysts
     - 50+ business-ready MapReduce analytics and apps
     - Enabled by SQL-MapReduce framework and new SQL-H

   • Learn more at www.asterdata.com/mapreduce








THE GREAT DIVIDE:
BRIDGING UNSTRUCTURED AND STRUCTURED DATA FOR NEW CUSTOMER INSIGHTS
   Briefing Room - August 21, 2012
   John O’Brien, Radiant Advisors
   john.obrien@radiantadvisors.com

   © Copyright 2012 Radiant Advisors. All Rights Reserved
   v1.00.000

JOHN O’BRIEN
   Principal and Founder, Radiant Advisors

   With over 25 years of experience delivering value through data warehousing and BI
   programs, John O’Brien's unique perspective comes from the combination of his
   roles as a practitioner, consultant, and vendor in the BI industry. His knowledge in
   designing, building, and growing enterprise BI systems and teams brings real-world
   insights to each role and phase within a BI program.

   Today, through Radiant Advisors, John provides research and advisory services that
   guide companies in meeting the demands of next-generation information
   management, architecture, and emerging technologies.

   Instructor 10+ years
   As a recognized thought leader in BI, John has been publishing articles and
   presenting at conferences in North America and Europe for the past 10 years,
   including The Data Warehousing Institute, where he has been invited as one of
   TDWI’s Best Practices judges, Executive Summit presenters and expert panel
   participants. John has also developed and presented many of his own courses
   that now comprise the initial Radiant Advisors Learning Catalog.

   Experienced
   In 2005, John co-founded and became CTO of a data warehouse appliance company
   that raised $43 million in several rounds of venture capital financing and has
   many global production customers. As CTO, John’s primary role was to focus
   product development and BI market strategy.

   Education
   John has a B.S. in Mechanical Engineering from California State University with
   an emphasis in control systems and instrumentation and an Executive M.B.A. from
   the University of Colorado. He has been a Certified Business Intelligence
   Professional (CBIP) since 2005 with mastery levels in Leadership and
   Administration, Database Administration and Business Intelligence.

Bridging the Great Divide: Unstructured and Structured Data
   WHERE DOES CONTEXT LIVE?

   [Diagram: two side-by-side stacks captioned "More Rigid" and "More Agile", each spanning a Structured and an Unstructured layer. Labels: Context leveraged; Context(s) leveraged; BI Tools, direct access; Context in abstraction; Context in structures; Individual context with data scientists; Context in data scientists; Centralized context in abstraction; Hive; PIG; M/R; HCatalog; MapReduce; Hadoop HDFS]

Bridging the Great Divide: Unstructured and Structured Data
   UNLOCKING UNSTRUCTURED VALUE

   [Diagram: two panels, "Yesterday" and "Tomorrow &", each relating Value to Users Involved. Labels: Very Few Data Scientists; More Analysts; Power Users; Analysts; Casual Users; Many Many Consumers; BI Tool; DB; Hive; PIG; HCatalog; MapReduce; Hadoop HDFS]

Bridging the Great Divide: Unstructured and Structured Data
   DISCOVERY IN BI PROCESSES

   1. Discover Context
   2. Defined Context Available to Structured Database

   [Diagram labels: Very Few Data Scientists; More Analysts/Modelers; Few Analysts/Modelers; Many More Analysts; Many Many Consumers; BI Tool; Hive; PIG; M/R; HCatalog; Hadoop HDFS]

Bridging the Great Divide: Unstructured and Structured Data
   MODERN BI ARCHITECTURES

   Hadoop:
   • Massive Scalability
   • Lowest Cost
   • Handles Complexity

   Data Warehouse:
   • Optimized Work Loads
   • Operational
   • Benefit from Context

   [Diagram labels: Internet, Sensor data; Operational Systems; Insulate Change or Direct to Staging; Migrate History or ETL Acquire; Staging; ETL; Data Marts; MapReduce; Hadoop HDFS; HCatalog; PIG; Hive; Very Few Data Scientists; Few Analysts/Modelers; Many Many Consumers]

Bridging the Great Divide: Unstructured and Structured Data
   SUMMARY

   • Understand context in processes and architectures
   • Realize that value is unlocked with more users
   • Discovery is a powerful BI process to operationalize
   • Modern BI Architectures are integrating Hadoop

          •   Is the Aster solution intended as a Data Discovery Platform, an
              Analytic Engine Platform, or both?

          •   Is there any semantic difference between Teradata's vision of the
              Integrated Data Warehouse and the "Analytic Platform," which
              includes Aster and Hadoop?

          •   Does HCatalog need to be defined before users can use SQL-H to
              query Hadoop?

          •   The Aster MapReduce Portfolio enables its users to query and pull
              data from the Hadoop HDFS directly via SQL-H. When data is pulled
              from HDFS into Aster, are the Aster tables modeled as in HCatalog
              or as key-value pairs?

          •   Is the output of SQL-MR in Aster inserted into another physical
              table for further use?

          •   Given that Hive and PIG are interface layers above the MapReduce
              processing layer, does Aster's SQL-H layer also act as an interface
              to MapReduce? Does SQL-H work similarly to Hive when processing
              data inside HDFS?

          •   For the performance comparisons between Aster and Hadoop, what
              guidelines were given for sizing the Hadoop environment?

          •   Given the commodity nature of Hadoop, does it make sense to
              increase the size of the Hadoop environment to gain performance
              more cost-effectively?

          •   When should Hadoop be used, and when Aster? Based on data type?
              Based on workload (e.g., load, ETL, analyze)? Or based on analysis
              type (e.g., sentiment classification or sessionization)?

          •   Does Aster store "multi-structured" data such as audio, video, image,
              and PDF files as BLOB/CLOB fields in database records, or does it
              store pointers to the files?

          •   Does Aster Data support Predictive Model Markup Language (PMML),
              so that analytic models developed in SAS or other platforms can be
              migrated to Aster for discovery?

Twitter Tag: #briefr
Tuesday, August 21, 2012
August: Analytics

          September: Integration

          October: Database

          November: Cloud

          December: Innovators



Twitter Tag: #briefr
Tuesday, August 21, 2012
Twitter Tag: #briefr
Tuesday, August 21, 2012

Mais conteúdo relacionado

Mais de Inside Analysis

Agile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for SuccessAgile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for SuccessInside Analysis
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationInside Analysis
 
Fit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data LetdownFit For Purpose: Preventing a Big Data Letdown
Fit For Purpose: Preventing a Big Data LetdownInside Analysis
 
To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security Inside Analysis
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeInside Analysis
 
Introducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataIntroducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataInside Analysis
 
The Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionThe Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionInside Analysis
 
Ahead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time AnalyticsAhead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time AnalyticsInside Analysis
 
All Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of EverythingAll Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of EverythingInside Analysis
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLInside Analysis
 
The Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelThe Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelInside Analysis
 
Structurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your ArchitectureStructurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your ArchitectureInside Analysis
 
SQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the RiskSQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the RiskInside Analysis
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataInside Analysis
 
A Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data WarehouseA Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data WarehouseInside Analysis
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopInside Analysis
 
Rethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldRethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldInside Analysis
 
DisrupTech - Dave Duggal
DisrupTech - Dave DuggalDisrupTech - Dave Duggal
DisrupTech - Dave DuggalInside Analysis
 
Phasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey MalafskyPhasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey MalafskyInside Analysis
 

Mais de Inside Analysis (20)

Agile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for SuccessAgile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for Success
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter Integration
 

The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

  • 2. Eric Kavanagh Eric.kavanagh@bloorgroup.com Twitter Tag: #briefr Tuesday, August 21, 2012
  • 3. Reveal the essential characteristics of enterprise software, good and bad. Provide a forum for detailed analysis of today’s innovative technologies. Give vendors a chance to explain their product to savvy analysts. Allow audience members to pose serious questions... and get answers!
  • 4. August: Analytics. September: Integration. October: Database. November: Cloud. December: Innovators.
  • 5. Analytics is, and always has been, about discovering insights that lead to better business decisions. The range of technologies and use cases that inhabit this area is wide: statistical analysis, data and process mining, predictive analytics and modeling, and complex event processing. What is now referred to as Big Data has pushed analytics beyond the capabilities of traditional solutions. “Big Analytics” has organizations diving into large heaps of data that previously were not available or usable. The growing volume, variety, velocity and complexity of data have proven to be a major challenge to organizations that leverage analytics to maintain a competitive edge.
  • 6. John is the Principal and Founder of Radiant Advisors. As a recognized thought leader in BI, John has been publishing articles and presenting at conferences for the past 10 years. He has been a Best Practices judge, presenter and panel participant at TDWI. John has also developed and presented his own courses: Radiant Advisors Learning Catalog. John has a B.S. in Mechanical Engineering from California State University and an M.B.A. from the University of Colorado. He is a Certified Business Intelligence Professional with mastery levels in Leadership and Administration, Database Administration and Business Intelligence.
  • 7. Teradata is known for its analytic data solutions with a focus on integrated data warehousing, big data analytics and business applications. It offers a broad suite of technology platforms and solutions, and a wide range of data management applications and data mining capabilities. Teradata Aster is its MapReduce platform for handling big data and big analytics on multi-structured data.
  • 8. Steve Wooledge is Senior Director of Marketing at Teradata’s Aster Center of Innovation, where he is an evangelist for the company’s analytic platform product and responsible for awareness, demand generation, and solution marketing for the data scientist. Steve has more than 10 years of experience in product marketing and business development for business intelligence, data management, Web analytics and e-commerce products. Prior to his current role, Steve held product marketing positions at Interwoven and Business Objects as well as sales and engineering roles at Business Objects, Dow Chemical and Occidental Petroleum. Steve has a B.S. in Chemical Engineering and an M.B.A. in Marketing and Finance.
  • 9. The Unified Big Data Architecture & Bridging the Analyst Gap for Hadoop. Steve Wooledge, Sr. Director of Marketing. August 21, 2012
  • 10. Topics: • Quick intro to Teradata Aster • The need for a unified big data architecture • Bridging the Analyst Gap for Hadoop: Aster SQL-H™. Confidential and proprietary. Copyright © 2012 Teradata Corporation.
  • 15. Teradata Aster: Leading Innovator in Data Discovery for the Enterprise. • Aster Solution: Data discovery platform that delivers a MapReduce analytic framework within an MPP database • Brings data science to the business: Enables MapReduce processing through the analytic language of business, standard SQL • Delivers new analytics: Gives businesses new breakthrough analytic apps via pre-packaged pattern, path, and graph SQL-MapReduce modules • On multi-structured data: Leverages multi-structured data sources for increased analytic breadth & accuracy
  • 19. Teradata Aster MapReduce Platform. Users: Analysts, Customers, Business Users, Data Scientists. Your Analytic & Advanced Reporting Applications. Develop (Rapid Analytics Development): • 50+ pre-built analytic modules • Visual IDE; develop apps in hours • Many programming languages. Process (Embedded Analytic Processing): • SQL-MapReduce framework • Analyze both structured & multi-structured data. Store (Massively Parallel Data Storage): • Linear, incremental scalability • Commodity-hardware based • Software only, appliance, or cloud • Relational-data architecture can be extended for non-relational types
  • 20. Business Impact / ROI: Deeper Consumer Insights with Teradata Aster. Increased conversions from recommendations with 360-degree view of customer across in-store and .com behavior. • Payment processing analytics down from one day to one minute with SQL-MapReduce. Build revenue attribution models to link every purchase to a site feature. • Web log data processing from seven hours to 20 minutes. Reduce churn from one day previously to 20 minutes. • Interactive dashboards with all KPI’s from point of order inception, down from five hours to five minutes.
  • 21. Big Data: From Transactions to Interactions. [Diagram] Tiers by data volume: ERP (megabytes), CRM (gigabytes), WEB (terabytes). Labels shown: purchase detail, purchase record, payment record, segmentation, offer details, customer touches, support contacts, web logs, A/B testing, offer history, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels.
  • 22. Big Data: From Transactions to Interactions. [Diagram] Increasing data variety and complexity across tiers by data volume: ERP (megabytes), CRM (gigabytes), WEB (terabytes), BIG DATA (petabytes). Labels shown for the ERP, CRM, and WEB tiers: purchase detail, purchase record, payment record, segmentation, offer details, customer touches, support contacts, web logs, A/B testing, offer history, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels. Labels added with the BIG DATA tier: user generated content, social network, mobile web, user click stream, sentiment, external demographics, business data feeds, HD video, speech to text, product/service logs, SMS/MMS.
  • 27. Unified Big Data Architecture: Bridging Classic & Big Data Worlds. Classic Method (Structured & Repeatable Analysis): Business determines what questions to ask; IT structures the data to answer those questions. “Capture only what’s needed.” SQL performance and structure. Big Data Method (Multi-structured & Iterative Analysis): IT delivers a platform for storing, refining, and analyzing all data sources; Business explores data for questions worth answering. “Capture in case it’s needed.” MapReduce processing flexibility.
  • 28. SQL-MapReduce. Example: Pattern Matching Analysis. MapReduce Analytics: • Single pass of data • Linked-list sequential analysis. Traditional SQL: • Self-joins for sequencing • Limited operators for ordered data
  • 29. The Advantages of MapReduce: Raw click-stream data and pattern matching with nPath. Goal: • Increase understanding of customer behavior on a website to improve advertising rates or website navigation. Challenges: • Full website session-level data needed, typically from raw web logs • Requires complex multi-pass SQL queries or non-SQL techniques • Requires rewriting query to change number of clicks analyzed. MapReduce Value: • Performance: Single pass over data regardless of number of clicks analyzed • Manageability: Much simpler code, from 350 lines of SQL to 18-line SQL-MapReduce • Ease of Use: Pattern flexibility to handle varied numbers of clicks and click patterns without rewriting code. [Chart] Click Stream Analysis: Comparative Performance; Time axis from 0 to 400; bars for SQL (3pg), SQL-MR (3pg), SQL-MR (4pg), SQL-MR (8pg), SQL-MR (12pg). SQL for 3 pages: 6 minutes. MapReduce for 3, 4, 8, 12 pages: 77-131 seconds. Example Analytic Logic: People who search ‘diabetes’ also browse…; People who download visit pages A, B, D …
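The single-pass idea on this slide can be illustrated with a minimal Python sketch. It is not Aster's nPath function or SQL-MapReduce; the function name `sessions_matching` and the `(session_id, timestamp, page)` row layout are assumptions made for illustration. Rows are grouped by session in one pass, ordered by time, and scanned for a page sequence, so changing the pattern length needs no rewrite of the logic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout for raw click-stream data: (session_id, timestamp, page).
Click = Tuple[str, int, str]


def sessions_matching(clicks: Iterable[Click], pattern: List[str]) -> List[str]:
    """Return ids of sessions whose time-ordered page views contain
    `pattern` as a contiguous run of pages."""
    by_session: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for session_id, ts, page in clicks:  # one pass over the raw rows
        by_session[session_id].append((ts, page))

    n = len(pattern)
    matches = []
    for session_id, views in by_session.items():
        pages = [page for _, page in sorted(views)]  # order clicks by timestamp
        # Slide a window of the pattern's length over the ordered pages.
        if any(pages[i:i + n] == pattern for i in range(len(pages) - n + 1)):
            matches.append(session_id)
    return sorted(matches)
```

The same function handles a 3-page or a 12-page pattern, which mirrors the slide's point that the SQL self-join approach needs one extra join per additional click while the sequential scan does not.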
  • 30. Need for a Unified Big Data Architecture for New Insights: Enabling All Users for Any Data Type from Data Capture to Analysis. [Diagram] Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc. Layers: Reporting and Execution in the Enterprise; Discover and Explore; Capture, Store and Refine. Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP.
  • 31. Teradata Unified Big Data Architecture: Any User, Any Data, Any Analysis. [Diagram] Users: Engineers, Data Scientists, Quants, Business Analysts. Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc. Components: Aster MapReduce Portfolio; Teradata Analytics Portfolio; Discovery Platform; Integrated Data Warehouse; SQL-H; Capture, Store, Refine. Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP.
  • 32. Hadoop Points of Integration: Bulk Data Transfer. Teradata:Hadoop: • JDBC (available today): Hadoop programs can call JDBC • TDDBinputformat/Dboutputformat (available today): Submits SQL to JDBC • Cloudera Sqoop (available today): Command line import/export database objects. Aster:Hadoop: • Aster-Hadoop Adaptor: node:node transfer using SQL-MapReduce. Opportunity for analysts to more easily access Hadoop data.
  • 33. Source: Enterprise Strategy Group; April 5, 2012
  • 34. Source: Enterprise Strategy Group; April 5, 2012
  • 35. Bridging the Business Analyst Gap for Hadoop Data
  • 36. Announced June 12th, 2012. Aster SQL-H™: A Business User’s Bridge to Analyze Hadoop Data. Aster SQL-H gives analysts and data scientists a better way to analyze data stored cheaply in Hadoop: • Allow standard ANSI SQL access to Hadoop data • Leverage existing BI tool investments • Enable 50+ prebuilt SQL-MapReduce Apps and IDE • Lower costs by making data analysts self-sufficient
  • 37. The Big Data Architecture Today Has Gaps. Analyst’s Goal: Get Insights from Data in Hadoop. [Diagram] Users: Engineers, Data Scientists, Quants, Business Analysts. Custom Code and Development (MR, Pig, Hive) on HDFS, where IT is the optimizer; Aster MapReduce Portfolio (SQL & SQL-MapReduce) on the Teradata Aster Discovery Platform; Teradata Analytics Portfolio (SQL) on the Teradata IDW.
  • 38. Analytics on Hadoop Data with Aster SQL-H. [Diagram] Users: Engineers, Data Scientists, Quants, Business Analysts. Aster MapReduce Portfolio (SQL & MapReduce) on the Teradata Aster Discovery Platform; Teradata Analytics Portfolio (SQL) on the Teradata IDW; HDFS.
  • 39. Analytics on Hadoop Data with Aster SQL-H. [Diagram] Users: Engineers, Data Scientists, Quants, Business Analysts. Aster MapReduce Portfolio (SQL & SQL-MapReduce) on the Teradata Aster Discovery Platform; Teradata Analytics Portfolio (SQL) on the Teradata IDW; SQL-H; HDFS.
  • 40. Aster SQL-H Integration with Hadoop Catalog: A Business User’s Bridge to Analyzing Data in Hadoop. • Industry’s First Database Integration with Hadoop’s HCatalog • Abstraction layer to easily and efficiently read structured & multi-structured data stored in HDFS • Uses Hadoop Catalog (HCatalog) to perform data abstraction functions (e.g. automatically understands tables, data partitions) • HDFS data presented to users as Aster tables • Fully accessible within the Aster SQL and SQL-MapReduce processing engines, plus ODBC/JDBC & BI tools. [Diagram] Aster SQL-H above Hadoop components: MapReduce, Hive, Pig, HCatalog, HDFS.
  • 41. Data & Processing Locality in SQL-H. Aster Layer (SQL-H): • SQL & SQL-MapReduce processing • Intermediate data persistence • Optional: HDFS data subset persistence for maximum performance. Hadoop Layer: • HCatalog: metadata store • HDFS: data repository • No MapReduce processing in Hadoop • Directly & in parallel move data from HDFS to Teradata Aster. [Diagram] Labels: Data Filtering, Data; Hadoop MR, Hive, Pig, HCatalog, HDFS.
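The division of labor on this slide (metadata in a catalog, data in HDFS, filtering before data moves) can be sketched in a few lines of Python. This is only an illustration of partition pruning from metadata, not the SQL-H or HCatalog API; the `Partition` class, the `CATALOG` contents, the paths, and `select_partitions` are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    table: str
    key: str   # partition value, here an ISO date string
    path: str  # hypothetical storage location of the partition's files


# Hypothetical catalog contents; a real system would read these from a metadata store.
CATALOG = [
    Partition("weblogs", "2012-08-19", "/data/weblogs/dt=2012-08-19"),
    Partition("weblogs", "2012-08-20", "/data/weblogs/dt=2012-08-20"),
    Partition("weblogs", "2012-08-21", "/data/weblogs/dt=2012-08-21"),
    Partition("orders", "2012-08-21", "/data/orders/dt=2012-08-21"),
]


def select_partitions(catalog: List[Partition], table: str,
                      lo: str, hi: str) -> List[str]:
    """Pick the storage paths a query needs using metadata only,
    so that only the matching subset of data has to be transferred."""
    # ISO date strings sort lexicographically, so string comparison is enough.
    return [p.path for p in catalog if p.table == table and lo <= p.key <= hi]
```

Deciding from metadata which partitions to read is what lets the data movement step touch a subset of the files rather than the whole table.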
  • 42. Benefits of Aster SQL-H™: Deep metadata layer integration between Aster and Hadoop. Business Analysts (Powerful analytics & Performance): • 50+ advanced SQL-MapReduce functions (Aster MapReduce Portfolio) • Simplified, SQL-based interface with Hadoop data structures (HCatalog) • Interoperability with existing ecosystem & skillset. Architects and Administrators (Maintainability): • Leverage existing DBA skill-sets without additional overhead • Simplify administration and monitoring: alternatives require manual creation and maintenance of metadata; less work and fewer errors; can do filtering with Aster, select data from HCatalog, and leverage partitioning
  • 43. Aster MapReduce Portfolio: the App Store of Big Data. Some of the 50+ out-of-the-box analytical apps: • Path Analysis: Discover patterns in rows of sequential data • Text Analysis: Derive patterns and extract features in textual data • Statistical Analysis: High-performance processing of common statistical calculations • Segmentation: Discover natural groupings of data points • Marketing Analytics: Analyze customer interactions to optimize marketing decisions • Data Transformation: Transform data for more advanced analysis
  • 44. Big Data Architecture: Optimizing Workloads with Specialized Approach
  • 45. When to Use Which? The best approach by workload and data type. Processing as a function of schema requirements by data type, across five stages: low-cost storage & retention; loading and refining (data pre-processing, prep, cleansing); transformations; reporting; analytics (user-driven, interactive). Stable schema (financial analysis, ad hoc/OLAP, enterprise-wide BI and reporting, spatial/temporal, active execution): Teradata / Hadoop for storage; Teradata for loading, transformations, and reporting; Teradata (SQL analytics) for analytics. Evolving schema (interactive data discovery, web clickstream, set-top box analysis, CDRs, sensor logs, JSON): Aster / Hadoop for storage; Hadoop for loading; Aster (joining with structured data) for transformations; Aster for reporting; Aster (SQL + MapReduce analytics) for analytics. Format, no schema (social feeds, text, document, or image processing; audio/video storage and refining; storage and batch transformations): Hadoop for storage, loading, and transformations; Aster (MapReduce analytics) for analytics.
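As a rough illustration, the workload matrix on this slide can be written down as a lookup table. The cell values below are one reading of the flattened slide text, not an official Teradata mapping, and the key names (`stable`, `evolving`, `no_schema`, and the five stage names) as well as the `recommend` helper are assumptions introduced for the example.

```python
from typing import Optional

# One reading of the slide's matrix; the cell the slide appears to leave
# blank (reporting on schema-less data) is simply absent from the table.
MATRIX = {
    "stable": {
        "storage": "Teradata / Hadoop",
        "loading": "Teradata",
        "transformations": "Teradata",
        "reporting": "Teradata",
        "analytics": "Teradata (SQL analytics)",
    },
    "evolving": {
        "storage": "Aster / Hadoop",
        "loading": "Hadoop",
        "transformations": "Aster (joining with structured data)",
        "reporting": "Aster",
        "analytics": "Aster (SQL + MapReduce analytics)",
    },
    "no_schema": {
        "storage": "Hadoop",
        "loading": "Hadoop",
        "transformations": "Hadoop",
        "analytics": "Aster (MapReduce analytics)",
    },
}


def recommend(schema: str, stage: str) -> Optional[str]:
    """Look up the platform suggested for a schema type and processing stage.
    Returns None when the matrix has no entry for that combination."""
    return MATRIX.get(schema, {}).get(stage)
```

For example, `recommend("evolving", "analytics")` returns the Aster entry, while `recommend("no_schema", "reporting")` returns `None` because that cell is empty in this reading.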
  • 49. ESG Benchmark Report Summary: 3rd-party validation of Aster and Hadoop “fit”. Scope: identical hardware for Aster and Hadoop; clickstream, sentiment, and traditional retail data; compare “time to insight” and “time to develop”. Results: Loading: Hadoop 1.8x faster. Transforms: Hadoop 1.3x faster. Analytics: Aster 35x faster (range: 4-416x). Development: Aster 3x faster.
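The per-stage ratios above only turn into an end-to-end "time to insight" once they are weighted by how long each stage actually takes. The sketch below shows that arithmetic in Python. The three speedup factors come from the slide; the baseline stage durations are purely hypothetical numbers invented for illustration (they are not from the ESG report), and the cost of moving data between platforms is ignored.

```python
# Speedup factors quoted on the slide.
HADOOP_LOAD_SPEEDUP = 1.8        # Hadoop loads 1.8x faster than Aster
HADOOP_TRANSFORM_SPEEDUP = 1.3   # Hadoop transforms 1.3x faster than Aster
ASTER_ANALYTICS_SPEEDUP = 35.0   # Aster analytics 35x faster than Hadoop

# HYPOTHETICAL stage durations on Hadoop, in hours (illustration only).
hadoop_hours = {"load": 1.0, "transform": 1.0, "analytics": 10.0}

# The same stages on Aster, derived from the slide's ratios.
aster_hours = {
    "load": hadoop_hours["load"] * HADOOP_LOAD_SPEEDUP,
    "transform": hadoop_hours["transform"] * HADOOP_TRANSFORM_SPEEDUP,
    "analytics": hadoop_hours["analytics"] / ASTER_ANALYTICS_SPEEDUP,
}

# A hybrid pipeline: load and transform on Hadoop, analyze on Aster.
# Assumes zero transfer cost between the two systems.
hybrid_hours = {
    "load": hadoop_hours["load"],
    "transform": hadoop_hours["transform"],
    "analytics": aster_hours["analytics"],
}


def total(hours):
    """Sum the stage durations of one pipeline."""
    return sum(hours.values())


if __name__ == "__main__":
    for name, hours in [("Hadoop only", hadoop_hours),
                        ("Aster only", aster_hours),
                        ("Hybrid", hybrid_hours)]:
        print(f"{name}: {total(hours):.2f} h")
```

With different assumed stage durations the ranking can change, which is the point: the stage mix of a workload matters as much as the individual ratios.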
  • 51. Hadoop vs. Aster Web Clickstream Analytics: on average Aster is 18x faster. Chart callouts: Aster 1.5x faster; Aster 33x faster; Aster 6x faster.
  • 52. Example: Golden Path Analysis of Top Site Paths. Identifying top pathing occurrences (for any event of interest). Business question: How do we find and rank the 10 most frequent paths taken to the checkout page? Page visits exist in multiple rows in the database, for each user. Analytics question: What is the most common path for a user on the site to (1) enter the site, (2) view any page (other than the Help page), then make a purchase on the Checkout page? Rank the top 10 occurrences. Query shown on the slide:
      SELECT click_path, count(*) AS path_frequency
      FROM nPath(
        ON clicks
        PARTITION BY user_id
        ORDER BY timestamp
        MODE( overlapping )
        PATTERN('(RELEVANT|IGNORE)*.BUY')
        SYMBOLS(
          page_type IN ('help.asp') AS IGNORE,
          page_type NOT IN ('help.asp') AS RELEVANT,
          page_type = 'checkout' AS BUY)
        RESULT( accum( page_id of RELEVANT) AS click_path )
      ) T
      GROUP BY click_path
      ORDER BY count(*) DESC
      LIMIT 10;
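For readers without an Aster system at hand, the same business question can be approximated in plain Python. This is a simplified sketch, not a reimplementation of nPath: it collapses the slide's `page_type` and `page_id` columns into a single page name, resets the path after each checkout instead of reproducing the overlapping match mode, and the function name `top_checkout_paths` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict


def top_checkout_paths(clicks, top_n=10,
                       ignore_pages=("help.asp",), buy_page="checkout"):
    """Rank the most frequent page paths that end in a checkout.

    clicks: iterable of (user_id, timestamp, page) tuples (assumed layout).
    Returns a list of (path_tuple, frequency), most frequent first.
    """
    # Equivalent of PARTITION BY user_id.
    by_user = defaultdict(list)
    for user_id, timestamp, page in clicks:
        by_user[user_id].append((timestamp, page))

    counts = Counter()
    for events in by_user.values():
        # Equivalent of ORDER BY timestamp within each user.
        events.sort(key=lambda event: event[0])
        path = []
        for _, page in events:
            if page == buy_page:
                counts[tuple(path)] += 1  # record the path that led here
                path = []                 # simplification: start over
            elif page not in ignore_pages:
                path.append(page)         # help pages are skipped
        # Clicks that never reach a checkout are dropped.

    # Equivalent of GROUP BY / ORDER BY count(*) DESC / LIMIT.
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = [
        ("u1", 1, "home"), ("u1", 2, "help.asp"),
        ("u1", 3, "product"), ("u1", 4, "checkout"),
        ("u2", 1, "home"), ("u2", 2, "product"), ("u2", 3, "checkout"),
    ]
    for path, freq in top_checkout_paths(sample):
        print(" > ".join(path), freq)
```

Everything here runs on one machine in memory; the slide's argument is that the SQL-MapReduce version expresses the same logic declaratively and runs it in parallel across the cluster.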
  • 59. Single Channel Pathing Analysis
  • 60. Analyzing Multi-channel Identifies Advertising Signal
  • 61. Hadoop Provides 1.3x Faster ELT on Average
  • 62. When to Use Which Depends on Data Type: Aster faster on parsing and sessionizing weblogs
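"Sessionizing" weblogs means grouping each user's clicks into visits, usually by starting a new session whenever the gap between two consecutive clicks exceeds an inactivity timeout. The Python sketch below illustrates the idea only; it is not Aster's implementation. The 30-minute timeout is a common convention assumed here rather than a value from the deck, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def sessionize(clicks, timeout_seconds=1800):
    """Assign a per-user session number to every click.

    clicks: iterable of (user_id, timestamp_seconds, page) tuples.
    Returns a list of (user_id, session_id, timestamp_seconds, page),
    ordered by user and then by time. Session numbers start at 0 per user.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, page in clicks:
        by_user[user_id].append((timestamp, page))

    rows = []
    for user_id in sorted(by_user):
        events = sorted(by_user[user_id], key=lambda event: event[0])
        session_id = 0
        previous = None
        for timestamp, page in events:
            # A gap longer than the timeout starts a new session.
            if previous is not None and timestamp - previous > timeout_seconds:
                session_id += 1
            rows.append((user_id, session_id, timestamp, page))
            previous = timestamp
    return rows


if __name__ == "__main__":
    log = [("u1", 0, "home"), ("u1", 100, "product"),
           ("u1", 5000, "home"), ("u2", 50, "search")]
    for row in sessionize(log):
        print(row)
```

Once clicks carry a session number, path analysis can be partitioned by session instead of by user.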
  • 67. Evolving Schema Example: Aster digital marketing client. Segmentation: custom SQL-MR algorithms to match and create centralized identifiers. Sessionize by client. nPath identifies segment path analysis (behavior after ads). Benefits: marketing analysts more productive with Aster; lower cost, with storage and batch refining done on Amazon Elastic MapReduce. Diagram labels: Custom Data by Client; Media Data (Aggregated); Cookie-level data; Raw Web Logs; Ad Server Logs; Archival; Hadoop (on AWS) (storage, aggregations, cleansing); Teradata Aster; Analytic Tools.
  • 69. More Accurate Customer Churn Prevention. Hadoop captures, stores and transforms social, images and call records. Aster does path and sentiment analysis with multi-structured data. Diagram labels: Social feeds; Clickstream Data; Call Center Voice Records; Check Images; Multi-Structured Raw Data; Call Data; Check Data; Hadoop (Capture, Retain & Refine Layer); Aster Discovery Platform; Sentiment Scores; Analytic Results; Dimensional Data; Analysis + Marketing Automation (Customer Retention Campaign); Traditional Data Flow; Data Sources; ETL Tools; Teradata Integrated DW.
  • 70. Summary: Bringing the VALUE of Hadoop to the Enterprise. Teradata is focused on extracting the most business value for customers from data in Hadoop. Mainstream organizations need a unified big data architecture: best-of-breed with Hadoop, Aster, and Teradata; brings “Data Science” to business analysts; 50+ business-ready MapReduce analytics and apps; enabled by the SQL-MapReduce framework and the new SQL-H. Learn more at www.asterdata.com/mapreduce
  • 73. THE GREAT DIVIDE: BRIDGING UNSTRUCTURED AND STRUCTURED DATA FOR NEW CUSTOMER INSIGHTS. Briefing Room, August 21, 2012. John O’Brien, Radiant Advisors, john.obrien@radiantadvisors.com. © Copyright 2012 Radiant Advisors. All Rights Reserved v1.00.000
  • 74. John O’Brien, Principal and Founder, Radiant Advisors. With over 25 years of experience delivering value through data warehousing and BI programs, John O’Brien's unique perspective comes from the combination of his roles as a practitioner, consultant, and vendor in the BI industry. His knowledge in designing, building, and growing enterprise BI systems and teams brings real-world insights to each role and phase within a BI program. Today, through Radiant Advisors, John provides research and advisory services that guide companies in meeting the demands of next-generation information management, architecture, and emerging technologies. Instructor 10+ years: As a recognized thought leader in BI, John has been publishing articles and presenting at conferences in North America and Europe for the past 10 years, including The Data Warehousing Institute, where he has been invited as one of TDWI’s Best Practices judges, Executive Summit presenters and expert panel participants. John has also developed and presented many of his own courses that now comprise the initial Radiant Advisors Learning Catalog. Experienced: In 2005, John co-founded and became CTO of a data warehouse appliance company that raised $43 million in several rounds of venture capital financing and has many global production customers. As CTO, John’s primary role was to focus on product development and BI market strategy. Education: John has a B.S. in Mechanical Engineering from California State University with an emphasis in control systems and instrumentation and an Executive M.B.A. from University of Colorado. He has been a Certified Business Intelligence Professional (CBIP) since 2005 with mastery levels in Leadership and Administration, Database Administration and Business Intelligence.
  • 75. Bridging the Great Divide: Unstructured and Structured Data. WHERE DOES CONTEXT LIVE? Diagram of two stacks on a scale from More Rigid to More Agile, each built on MapReduce and Hadoop HDFS. Labels: Context leveraged; Context(s) leveraged; Structured BI Tools; Context in abstraction; Direct access; Context in structures; Individual Context with Data Scientists; Context in Unstructured Data; Centralized Context in HCatalog abstraction; Hive; PIG; M/R.
  • 76. Bridging the Great Divide: Unstructured and Structured Data. UNLOCKING UNSTRUCTURED VALUE. Diagram comparing Yesterday with Tomorrow by value and users involved. Labels: Value; Users Involved; Very Few Data Scientists; More Analysts; Many Many Consumers; Power Users; Casual Users & Analysts; Hive; PIG; Tool; DB; BI; HCatalog; MapReduce; Hadoop HDFS.
  • 77. Bridging the Great Divide: Unstructured and Structured Data. DISCOVERY IN BI PROCESSES. Steps: 1. Discover Context. 2. Defined Context Available to Structured Database. Diagram labels: Very Few Data Scientists; Few Analysts/Modelers; More Analysts/Modelers; Many More Analysts; Many Many Consumers; Hive; PIG; Tool; BI Tool; BI; HCatalog; M/R; Hadoop HDFS.
  • 78. Bridging the Great Divide: Unstructured and Structured Data. MODERN BI ARCHITECTURES. Data Warehouse: Optimized Work Loads; Benefit from Context; Insulate Change; Migrate History. Hadoop: Massive Scalability; Lowest Cost; Handles Complexity. Other diagram labels: Internet, Sensor data; Operational Systems; Direct to Staging or ETL; Acquire; Staging; ETL; MapReduce or ETL; PIG; Hive; HCatalog; Hadoop HDFS; Data Marts; Very Few Data Scientists; Few Analysts/Modelers; Many Many Consumers.
  • 79. Bridging the Great Divide: Unstructured and Structured Data. SUMMARY • Understand context in processes and architectures • Realize that value is unlocked with more users • Discovery is a powerful BI process to operationalize • Modern BI architectures are integrating Hadoop
  • 80. Is the Aster solution intended as a data discovery platform and/or an analytic engine platform? • Is there any difference in semantics between Teradata's vision of the Integrated Data Warehouse and the "Analytic Platform", which includes Aster and Hadoop? • Does the HCatalog need to be defined before users can use SQL-H to query Hadoop? • The Aster MapReduce Portfolio enables its users to query and pull data from Hadoop HDFS directly via SQL-H. When data is pulled from HDFS into Aster, are the Aster tables modeled as in HCatalog or as key-value pairs? • Is the output of SQL-MR in Aster inserted into another physical table for further usage?
  • 81. Given that Hive and PIG are interface layers above the MapReduce processing layer, does Aster's SQL-H work as an interface layer to MapReduce? Does SQL-H work similarly to Hive when processing data inside HDFS? • When it comes to performance comparisons between Aster and Hadoop, what guidelines were given for sizing the Hadoop environment? • Given the commodity nature of Hadoop, does it make sense to increase the size of the Hadoop environment to gain performance more cost-effectively? • When should you use Hadoop or Aster? Based on data type? Based on workload (e.g. load, ETL, analyze)? Or based on analysis type (e.g. sentiment classification or sessionization)?
  • 82. Does Aster store "multi-structured" data such as audio, video, image, and PDF files as a BLOB/CLOB field in database records, or does it store pointers to files? • Does Aster Data have Predictive Model Markup Language (PMML) compatibility, enabling discovery through interoperability of analytic models so that models developed in SAS or other platforms can be migrated to Aster?