SlideShare uma empresa Scribd logo
1 de 97
Baixar para ler offline
Big Data



Paul Zikopoulos, BA, MBA
Director, IBM Information Management Technical Professionals,
WW Competitive Database, and Big Data                           @BigData_paulz
IBM Certified Advanced Technical Expert (Clusters and DRDA)
IBM Certified Customer Solutions Expert (DBA and BI)
paulz_ibm@msn.com

    March 27, 2012                                                © 2012 IBM Corporation
Paul C. Zikopoulos, B.A., M.B.A., is the
                 Director of Technical Professionals for IBM
                 Software Group’s Information Management

                   division and additionally leads the World
                   Wide Database Competitive and Big Data
                   SWAT teams. Paul is an award-winning
                   writer and speaker with more than 18 years
    of experience in Information Management. Paul has written
    more than 350 magazine articles and 15 books including
    Understanding Big Data: Analytics for Enterprise Class
    Hadoop and Streaming Data, DB2 pureScale: Risk Free
    Agile Scaling, Break Free with DB2 9.7: A Tour of Cost
    Saving Features; Information on Demand: Introduction to
    DB2 9.5 New Features; DB2 Fundamentals Certification for
    Dummies; DB2 for Dummies; and more. Paul is a DB2
    Certified Advanced Technical Expert (DRDA and Clusters)
    and a DB2 Certified Solutions Expert (BI and DBA). In his
    spare time, he enjoys all sorts of sporting activities,
    including running with his dog Chachi, avoiding punches in
    his MMA training, and trying to figure out the world
    according to Chloë—his daughter. You can reach him at:
    paulz_ibm@msn.com.

         @BigData_paulz


2                                                                © 2012 IBM Corporation
Why Big Data
How We Got Here




March 27, 2012    © 2012 IBM Corporation
4
                         In 2005 there were 1.3 billion RFID
                          tags in circulation…




                         …by the end of 2011, this was about
                         30 billion and growing even faster
© 2012 IBM Corporation
An increasingly sensor-enabled and instrumented
     business environment generates HUGE volumes of
        data with MACHINE SPEED characteristics…




              1 BILLION lines of code
5   EACH engine generating 10 TB every 30 minutes!
                                               © 2012 IBM Corporation
350B
                                                         Transactions/Year

                                                          Meter Reads
                                                         every 15 min.


    120M – meter reads/month   3.65B – meter reads/day
6                                                        © 2012 IBM Corporation
 In August of 2010, Adam
      Savage, of “Myth Busters,”
      took a photo of his vehicle
      using his smartphone. He
      then posted the photo to his
      Twitter account including the
      phrase “Off to work.”

     Since the photo was taken by
      his smartphone, the image
      contained metadata revealing
      the exact geographical
      location the photo was taken

     By simply taking and posting
      a photo, Savage revealed the
      exact location of his home,
      the vehicle he drives, and the
      time he leaves for work

7                            © 2012 IBM Corporation
The Social Layer in an Instrumented Interconnected World
                                                                                     4.6
                                                     30 billion RFID             billion
                                                         tags today
                                                                                  camera
                             12+ TBs                   (1.3B in 2005)             phones
                            of tweet data                                          world
                              every day                                             wide



                                                                               100s of
                                                                              millions
                                                                               of GPS
           data every day
    ? TBs of




                                                                              enabled
                                                                                 devices
                                                                                    sold
                                                                                annually

                                   25+ TBs of                                         2+
                                      log data                                   billion
                                     every day                                    people
                                                                                  on the
                                                 76 million smart                Web by
                                                 meters in 2009…                end 2011
                                                  200M by 2014
8                                                                       © 2012 IBM Corporation
Twitter Tweets per Second Record Breakers of 2011
                          Social-media analytics can
                           be used from healthcare to
                           predicting votes

                          Challenges
                           –   Volume
                           –   Velocity
                           –   Variety
                           –   Language Processing: consider
                               that Twitter sentences are not
                               well formed and often use
                               urban talk




9                                                 © 2012 IBM Corporation
Can a Social Media Persona be Monetized?




10                                         © 2012 IBM Corporation
Extract Intent, Life Events, Micro Segmentation Attributes

                       Chloe

                           Name, Birthday, Family
                       Tom Sit

                                 Not Relevant - Noise
                       Tina Mu

                                  Monetizable Intent
                       Jo Jobs
                                 Not Relevant - Noise


               Location                  Wishful Thinking

11
              Relocation
                 Monetizable Intent
                                            SPAMbots
                                                   © 2012 IBM Corporation
12   © 2012 IBM Corporation
1.8 ZB


       1 ZB
     1 ZB=1T GB




                  4Trillion
                    8GB
                   iPods




13                            © 2012 IBM Corporation
The Big Data Opportunity
Extracting insight from an immense volume, variety and velocity of data,
            in context, beyond what was previously possible.


                             Variety: Manage the complexity of data
                                       in many different structures,
                                       ranging from relational, to
                                       logs,
                                        to raw text
                             Velocity:
                                       Streaming data and large
                                       volume data movement
                             Volume:
                                       Scale from Terabytes to
                                       Petabytes (1K TBs) to
                                       Zettabytes (1B TBs)
14                                                          © 2012 IBM Corporation
Data In Motion
           Analyzes and correlates 5M+                             500K/sec, 6B+ IPDRs analyzed
           market messages/sec to execute                          per day on more than 4 PBs/yr.
           algorithmic option trades with                          sustaining 1GBps.
           average latency of 30 micro-secs.




                                                    height:     height:     height:
                                                      640         1280        640
                                               directory: directory: directory: directory:
                                                    width:      width:      width:
                                                  ”/img" ”/img"        ”/opt" ”/img"
                                                      480         1024        480
                                               filename: filename: filename: filename:
                                                    data:       data:       data:
                                                  “cow”     “bird” “java” “cat”
                                                                               tuples
15                                                                                           © 2012 IBM Corporation
The Big Data Conundrum
 The economies of deletion have changed….
     – Leading us into new opportunities and challenges

 The percentage of available data an enterprise can analyze is
  decreasing proportionately to the available to that enterprise

 Quite simply, this means as enterprises, we are getting
  “more naive” about our business over time

                                        Data AVAILABLE to
                                          an organization




                                                          Data an organization
                                                             can PROCESS
16                                                                 © 2012 IBM Corporation
Applications for Big Data Analytics
Smarter Healthcare     Multi-channel        Finance         Log Analysis
                           sales




Homeland Security      Traffic Control      Telecom        Search Quality




     Manufacturing   Trading Analytics   Fraud and Risk   Retail: Churn, NBO




17                                                            © 2012 IBM Corporation
Bigger and Bigger Volumes of Data
 Retailers collect click-stream data from Web site interactions and
  loyalty card-drive transaction data
     – This traditional POS information is used by retailer for shopping basket analysis,
       inventory replenishment, +++
     – But data is being provided to suppliers for customer buying analysis

 Healthcare has traditionally been dominated by paper-based systems,
  but this information is getting digitized

 Science is increasingly dominated by big science initiatives
     – Large-scale experiments generate over 15 PB of data a year and can’t be stored
       within the data center; then sent to laboratories

 Financial services are seeing larger volumes through smaller trading
  sizes, increased market volatility, and technological improvements in
  automated and algorithmic trading

 Improved instrument and sensory technology
     – Large Synoptic Survey Telescope’s GPixel camera generates 6PB+ of image
       data per year or consider Oil and Gas industry
18                                                                           © 2012 IBM Corporation
A Simple Log File
Jun 29 04:03:37 192.168.190.24 nagios: GLOBAL SERVICE EVENT HANDLER: ast-oral-
vip; remedy_oracle_db_latch-contention; (null); (null); (null); handle-all-
critical-event

DATE     TIME     IP ADDRESS   MONITOR
Jun 29 04:03:57 192.168.190.24 nagios: SERVICE ALERT: ast-oral-
vip;remedy_oracle_db_soft-prase-ratio;CRITICAL;HARD;3;CRITICAL – Soft parse
ratio
86.63%

                                           TYPE      COMPONENT
Jun 29 07:50:57 192.168.190.24 nagios: SERVICE ALERT: ast-oral
vip;remedy_oracle_db_latch-waiting; OK;SOFT;2;OK – SGA latch xssinfo freelist
(#350)
sleeping 0.000000% of the time


Jun 29 07:50:57 192.168.190.24 nagios: GLOBAL SERVICE EVENT HANDLER: ast-oral-
vip;remedy_oracle_db_latch-waiting; (null); (null); (null); handle-all-
critical-event



19                                                               © 2012 IBM Corporation
20   © 2012 IBM Corporation
Click Stream Analysis Use Case
                                 Goal: Determine how many abandoned shopping carts there are and
                                 where they were abandoned.
                                 Input: Web log data (IP address, timestamp, URL, HTTP return codes).
                                 Output: List of IP addresses and last page visited by user.

                   Web log input                                            Map Step
                                                                                               K2                 V2
201.67.34.67 | 10:24:48 | www.ff.com | 200                                Make IP          201.67.34.67 10:24:48,www.ff.com
123.99.54.76 | 11:15:21 | www.ff.com/search? | 200                        address the
                                                                                           123.99.54.76 11:15:21,www.ff.com/search?
231.67.66.84 | 02:40:33 | www.ff.com/addItem.jsp | 200
                                                                          key and
                                                                1 line    timestamp        231.67.66.84 02:40:33,www.ff.com/addItem.jsp
101.23.78.92 | 07:09:56 | www.ff.com/shipping.jsp | 200
                                                                          and URL the      101.23.78.92 07:09:56,www.ff.com/shipping.jsp
114.43.89.75 | 01:32:43 | www.ff.com | 200                                value.           114.43.89.75 01:32:43,www.ff.com
109.55.55.81 | 09:45:31 | www.ff.com/confirmation.jsp | 200
                                                                                           109.55.55.81 09:45:31,www.ff.com/confirmation.jsp
         K1 – Ignored, V1 – 1 line of log


                                                                                         Reduce Step
                                        K3                          V3                   Ensure                               K4
      Shuffle/Sort/Group
                                  201.67.34.67            10:24:48,www.ff.com            something is
         Framework                                                                                                      201.67.34.67
                                                                                         in cart and
         does this
         automatically                                                                   determine last
                                                          10:25:15,www.ff.com/search?
                                                                                         page visited.                        V4
                                                                                                                       www.ff.com/shipping
                                                          10:26:23,www.ff.com/shipping




21                                                                                                                     © 2012 IBM Corporation
Solution Architecture – Social Media Analytics for
Media and Entertainment
                           Online flow: Data-in-motion analysis

     Social Media          Stream Computing and Analytics                                          Timely
                                                                                                  Decisions

                                                                      Entity       Predictive
                              Data Ingest     Text Analytics:       Analytics:     Analytics:
                               and Prep       Timely Insights        Profile         Action
                                                                    Resolution    Determination
                                                                                                  Dashboard




                             Hadoop System and Analytics

                                                                  Comprehensive
                                                   Entity
            Social Media                                           Social Media    Predictive       Customer
                             Text Analytics    Analytics and
               Data                                                 Customer       Analytics         Models
                                                Integration
                                                                     Profiles


                       Offline flow: Data-at-rest analysis                                         Reports




22                                                                                                © 2012 IBM Corporation
Most Requested Uses of Big Data
  Log Analytics & Storage

  Smart Grid / Smarter Utilities

  RFID Tracking & Analytics

  Fraud / Risk Management & Modeling

  360° View of the Customer

  Warehouse Extension

  Email / Call Center Transcript Analysis

  Call Detail Record Analysis

  +++
            23
23                                           © 2012 IBM Corporation
Why Didn’t We Use All of the Big Data Before?




     “IBM gave us an opportunity to turn our plans into something that was
     very tangible right from the beginning. IBM had experts within data mining,
     Big Data, and Apache Hadoop and it was clear to use from the beginning we
     wanted to improve our business, not only today, but also prepare for the
     challenges we will face in three to five years, we had to go with IBM.”
     – Lars Christian Christensen VP Plant Siting & Forecasting

24                                                                    © 2012 IBM Corporation
25   © 2012 IBM Corporation
Watson’s advanced analytic capabilities can sort through the equivalent of
26    200 MILLION pages of data to uncover an answer in 3 SECONDS.Corporation
                                                                 © 2012 IBM
IBM Big Data Strategy: Move the Analytics Closer to the Data

  New analytic applications drive                        Analytic Applications
   the requirements for a big data               BI /   Exploration / Functional Industry Predictive Content
                                              Reporting Visualization    App
                                                                                                       BI /
                                                                                   App Analytics Analytics
   platform                                                                                          Reportin
                                                                                                        g
     – Integrate and manage the full
                                                         IBM Big Data Platform
       variety, velocity and volume of data
     – Apply advanced analytics to              Visualization        Application          Systems
                                                                    Development          Management
       information in its native form            & Discovery
     – Visualize all available data for ad-
       hoc analysis                                                    Accelerators
     – Development environment for
       building new analytic applications          Hadoop
                                                   Hadoop              Stream
                                                                      Stream                Data
     – Workload optimization and                   System
                                                   System            Computing
                                                                     Computing            Warehouse
       scheduling
     – Security and Governance

                                                       Information Integration & Governance




27                                                                                       © 2012 IBM Corporation
The IBM Big Data Platform

                                                           InfoSphere BigInsights
                                                           Hadoop-based low latency
                                                         analytics for variety and volume
                                                                   Data-At-Rest


                                                           MPP Hadoop

                                    MPP                                                     MPP Stream
                             Information                                                    Computing
InfoSphere Information
       Server                 Integration                                                                          InfoSphere Streams
                                                                                                                 Low Latency Analytics for
     High volume data                                                                                                 streaming data
      integration and                                                                                            Velocity, Variety & Volume
      transformation
                                                             MPP Data                                                 Data-In-Motion


                                                             Warehouse
                                                    Netezza High        Netezza 1000
                                                  Capacity Appliance BI+Ad Hoc Analytics on
                                                  Queryable Archive for      Structured Data
                                                    Structured Data


                    InfoSphere Warehouse                                                       Smart Analytics System
                        Large volume structured                                                 Operational Analytics on
                             data analytics                                                        Structured Data

28                                                                                                                    © 2012 IBM Corporation
Deep Analytics Appliance – Revolutionized Analytics

       Purpose-built
     analytics appliance
                                             Dedicated High
                                             Performance
Speed:       10-100x faster than
             traditional systems             Disk Storage


Simplicity: Minimal administration
         and tuning

Scalability: Peta-scale user data
         capacity                            Blades With
                                             Custom FPGA
                                             Accelerators
Smart:       High-performance
     advanced analytics




29                                          © 2012 IBM Corporation
Complementary Analytics
                  Traditional Approach                               New Approach
                  Structured, analytical, logical             Creative, holistic thought, intuition




                               Data                                Hadoop
                             Warehouse                             Streams
       Transaction Data                                                                      Web Logs

     Internal App Data                                                                          Social Data
                   Structured
                       Structured                                       Unstructured
                  Repeatable                        Enterprise
                                                                  Unstructured
                                                                        Exploratory
     Mainframe Data
                       Repeatable                   Integration    Exploratory Text Data: emails
                       Linear                                           Iterative
                               Linear
             Monthly sales reports                                    Iterativesentiment
                                                                           Brand
              Profitability analysis                                         Product Sensor data: images
                                                                                     strategy
       OLTP System Datasurveys
               Customer                                                      Maximum asset utilization


          ERP data             Traditional                          New                       RFID
                                Sources                            Sources




30                                                                                               © 2012 IBM Corporation
Most Client Use Cases Combine Multiple Technologies
                          Pre-processing
                              Ingest and analyze unstructured data
                              types and convert to structured data



                          Combine structured and unstructured analysis
                              Augment data warehouse with additional external
                              sources, such as social media



                          Combine high velocity and historical analysis
                              Analyze and react to data in motion; adjust
                              models with deep historical analysis


                          Reuse structured data for exploratory analysis
                              Experimentation and ad-hoc analysis with
                              structured data



31                                                              © 2012 IBM Corporation
Data Governance


                              ed                                     Some
                  im es term                                         “Sche
                                                                          times
                                                                                t e r me
            Somet     a Fi rs
                              t”                                           ma L         d
                   m                                                             ater”
             “Sche
 Separation of Duties                                    Separation of Duties
     – Users can’t delete their audit logs                   – No authorized access


 Privilege Users                                         Privilege Users
     – Monitoring authorized and privilege                   – Monitoring authorized and privilege
       user activities                                         user activities

 Sensitive Data (in tables)                              Sensitive Data (in the file system)
     – Protecting and blocking unauthorized                  – Protecting and blocking unauthorized
       access to the system and the data                       access to the system and the data

 Workflow for audit and compliance                       Workflow for audit and compliance
  controls to validate proper security                     controls to validate proper security
  procedures (satisfies regulations!)                      procedures (satisfies regulations!)
32                                           Internal Use Only                         © 2012 IBM Corporation
Data Security Design Goals


                          ed                            Some
               imes term                                     tim
                                                        “Sche es termed
         Somet    a Fi rs
                          t”                                  ma L
                m                                                  ater”
          “Sche
 Minimal impact to the Database               Minimal impact to the Big Data
  server resources                              server resources
 No Database configuration changes            No Big Data configuration changes
 Separation of Duties: Preventing             Separation of Duties: Preventing
  privilege users from doing malicious          privilege users/jobs from doing
  activities                                    malicious activities
 Audit trail with very granular details       Audit trail with very granular details
  of Database activity                          of Big Data activity
 Real-time alerting and blocking              Real-time alerting and blocking
 Minimal impact to the network                Minimal impact to the network
 100% Database activity visibility            100% Big Data activity visibility
 Heterogeneous support                        Heterogeneous support

33                                Internal Use Only                     © 2012 IBM Corporation
Guardium



                              Customer Identification                                     Privacy
                              Master Data Management
                              Master Data Management                                    Data Privacy
                                                                                        Data Privacy
           InfoSphere
                                         InfoSphere                              Optim for Test Data,
             Quality                                       DB2                     Redaction, +++
                                            MDM
              Stage




      Data
                                        Customer Intelligence Appliance
      Data
     Quality
     Quality
                                                             Data Models         Out-of-the-box analytics
      Information Server




                                                                                         Cognos


                                                           Pre-built
                                    Customer Integration   behavioral                  IBM Global Business
                                         Appliance         attributes                       Services




                           IBM Retail Data Model                  Core Metrics                Unica


     Enterprise Data Warehouse
     Enterprise Data Warehouse                                 Applications and Operational Analytics
                                                               Applications and Operational Analytics


                             Online Archive                        OLTP and Big Data Integration
                           Managing Growth
                           Managing Growth                         Built-in Integration into Big Data
                                                                   Built-in Integration into Big Data

                Optim Data Archive                           DB2 SAP - HANA              solidDB
                                                                                                    Informix
                                                                                 DB2




34                                                                                                             © 2012 IBM Corporation
IBM Flattens the Time to Value Big Data Curve

                +            +                  +          +                  +                 +
                                    Harden                                        Text/ML
     Velocity       Hadoop                           ETL       Dev. Tooling                          Visualization
                                  File System                                     Analytics




                                    Harden
                                                    100+Big Data Tooling
                                                              Dev.                Text/ML
     Velocity       Hadoop                          ETL                                             Visualization
                                  File System                                     Analytics
                                                    Business Partners
                +                               +          +                  +
                             OR




                                                         Signed                                 +




35                                                                                            © 2012 IBM Corporation
First Ever Forrester Wave on Big Data


                               “IBM has the deepest
                               Hadoop platform and
                               application portfolio. IBM,
                               an established EDW vendor,
                               has its own Hadoop
                               distribution; an extensive
                               professional services force
                               working on Hadoop projects;
                               extensive R&D programs
                               developing Hadoop
                               technologies; connections to
                               Hadoop from its EDW.”
                               –The Forrester Wave™: Enterprise
                               Hadoop Solutions, 1Q12


36                                                    © 2012 IBM Corporation
Open Source
    Technology Behind
         Hadoop




March 27, 2012          © 2012 IBM Corporation
A Difference of Processing Models
  SETI@home is a Computational Processing Model
     – Service for Extraterrestrial Intelligence (SETI) uses
       unused desktop CPU processing power to perform
       wide-spread analysis of radio telescope data
     – Pushes data to the program for processing
     – Data to function
  MapReduce is a Data Processing Model
     – Data processing primitives of Mappers and Reducers
     – Can be complex to write MapReduce programs, but a
       very simple configuration change makes it scale to
       1000s of nodes
     – Function to data


38                                                             © 2012 IBM Corporation
Building Blocks of Hadoop




39                          © 2012 IBM Corporation
Hadoop Framework
 Hadoop Common
     – A utility layer that provides access to the HDFS and projects

 HDFS
     – Data storage platform for Hadoop framework
     – Can scale to massive size when distributed over
       multiple computers

 MapReduce
     – Framework to process data across a node clusters
     – MAP process splits work by first mapping the input
       across the control nodes of the cluster, then splitting
       the workload into even smaller data sets and distributing
       it further throughout the computing cluster
        • Allows MPP – think DB2 DPF ‘like’
     – REDUCE collects and combines the nodes’ answers to deliver a result

40                                                                     © 2012 IBM Corporation
Hadoop Explained
      Hadoop computation model
                  – Data stored in a distributed file system spanning many inexpensive computers
                  – Bring function to the data
                  – Distribute application to the compute resources where the data is stored
      Scalable to thousands of nodes and petabytes of data
     public static class TokenizerMapper
       public static class TokenizerMapper
                                                                                         Hadoop Data Nodes
          extends Mapper<Object,Text,Text,IntWritable> {
           extends Mapper<Object,Text,Text,IntWritable> {
        private final static IntWritable
          private final static IntWritable
             one = new IntWritable(1);
               one = new IntWritable(1);
        private Text word = new Text();
          private Text word = new Text();
        public void map(Object key, Text val, Context
          public void map(Object key, Text val, Context
            StringTokenizer itr =
             StringTokenizer itr =
                new StringTokenizer(val.toString());
                 new StringTokenizer(val.toString());


                                                                                                             1. Map Phase
            while (itr.hasMoreTokens()) {
             while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
             word.set(itr.nextToken());
               context.write(word, one);
                context.write(word, one);
            }
             }

                                                                                                               (break job into small parts)
        }
          }
     }
       }
     public static class IntSumReducer
      public static class IntSumReducer
        extends Reducer<Text,IntWritable,Text,IntWrita


                                                                                                             2. Shuffle
         extends Reducer<Text,IntWritable,Text,IntWrita

                                                                  Distribute map
       private IntWritable result = new IntWritable();
        private IntWritable result = new IntWritable();
       public void reduce(Text key,
        public void reduce(Text key,
          Iterable<IntWritable> val, Context context){

                                                                 tasks to cluster
           Iterable<IntWritable> val, Context context){

                                                                                                               (transfer interim output
         int sum = 0;
          int sum = 0;
         for (IntWritable v : val) {
          for (IntWritable v : val) {
           sum += v.get();
            sum += v.get();
     . . .
      . . .                                                                                                    for final processing)

MapReduce Application
                                                                                                             3. Reduce Phase
                                                                                                               (boil all output down to
                                                                         Shuffle                               a single result set)




                 Result Set                                 Return a single result set

41                                                                                                                        © 2012 IBM Corporation
Growing the Hadoop Environment
 Avro: Data serialization system that converts
  data into a fast, compact binary data format
     – Avro data can be stored in a file with versioned schemas

 Chukwa: Large-scale monitoring system that provides
  insights into the Hadoop distributed file system
  and MapReduce

 HBase is a scalable, column-oriented distributed
  database modeled after Google's BigTable distributed
  storage system

 Hive is a data warehouse infrastructure that provides
  ad hoc query and data summarization for Hadoop
  Supported data
     – Hive utilizes SQL-like query language call HiveQL
     – HiveQL also used by programmers to run custom MapReduce jobs.
42                                                                     © 2012 IBM Corporation
Growing the Hadoop Environment
  Mahout is a data mining library designed to work on
   the Hadoop framework
     – Mahout delivers a core set of algorithms designed for clustering,
       classification and batch-based filtering.

  Pig is a high-level programming language and execution
   framework for parallel computation
     – Pig works within the Hadoop and MapReduce frameworks

  ZooKeeper provides coordination, configuration and
   group services for distributed applications working over
   the Hadoop stack




43                                                                         © 2012 IBM Corporation
Committed to Open Source
 Decade of lineage and contributions
  to the open source community
     – Apache Hadoop and Jaql, Apache Derby,
       Apache Geronimo, Apache Jakarta, +++
     – Eclipse: founded by IBM
     – Significant Lucene contributions via IBM
       Lucene Extension Library (ILEL)
     – DRDA, XQuery, SQL, XML4J, XERCES,
       HTTP, Java, Linux, +++

 IBM products built on open source
     – WebSphere: Apache
     – Rational: Eclipse and Apache
     – InfoSphere: Eclipse and Apache, +++

 IBM’s BigInsights (Hadoop) is 100%
  open source compatible with
  no forks



44                                                © 2012 IBM Corporation
Learn Hadoop: At Your Place, At Your Pace
Making Learning Hadoop Easy and Fun

 Flexible on-line delivery allows
  learning @your place and
  @your pace

 Free courses, free study materials

 Cloud-based sandbox for
  exercises – zero setup

 8500+ registered students

 Hadoop Programming Challenge is
  sending 3 students to IOD 2011
  Conference, all expenses paid

45                                          © 2012 IBM Corporation
Why IBM
            for Big Data
          The Solution Side



Paul Zikopoulos, BA, MBA
Director, IBM Information Management Technical Professionals,
WW Competitive Database, and Big Data
IBM Certified Advanced Technical Expert (Clusters and DRDA)
IBM Certified Customer Solutions Expert (DBA and BI)
paulz@ca.ibm.com

    March 27, 2012                                              © 2012 IBM Corporation
A Big Data Platform
Analytics Excellence                     In-Motion Operational      At-Rest Operational Excellence
Text Analytics Toolkit                        Excellence                           Harden Hadoop - GPFS
Machine Learning Toolkit                                                Surface Area Lock Down
                                              Unrivalled….
Industry Accelerators
Development Tooling               Embrace and Extend                Policy Driven Retention & Immutability
                                                                                       Role-Based Security
Visualization Tooling                                                                 Adaptive MapReduce
Deployment Tooling (“App                                                                Workload Manager
Store”)                                                                 Fast Splittable CMX Compression
$14B in 5 yrs. on Analytics                                        REST-exposed Administration
+++                                           Open Source
                                             IBMHadoop
                                                 Big Data                                            +++
                                               Platform

       In-Motion                                                        At-Rest
     Analyze extreme amounts of                                        Beyond traditional
        data in milliseconds                                            structured data
         Uses same analytics as                              BigInsights uses same analytics as Streams
              BigInsights                                    No forked, not ported: Hadoop Extended with
     Data can be analyzed on the way                              operational excellence and security
       into the enterprise for earlier                           Netezza for in-database MapReduce
              pattern detection                                        MPP Data Warehouses

47                                                                                     © 2012 IBM Corporation
Committed to Innovation
 $100M new investment into
  analytics announced 2Q11

 IBM spent $14+ billion in 24
  analytics acquisitions in 5 years

 IBM has the largest commercial
  research organization on Earth
      – 200+ mathematicians developing
        breakthrough analytics

 Largest patent portfolio in
  the industry


For   19    consecutive years IBM inventors have received the most U.S. Patents


        6,000
More than
48
                          patents in a single year                            © 2012 IBM Corporation
A Big Data Platform for Data In-Motion and At-Rest
                                                                               Data Ingest

                                                          01011001100011101001001001001
110100101010011100101001111001000100100010010001000100101 11000100101001001011001001010
                                                          01100100101001001010100010010
       Opportunity Cost Starts Here




                                                          01100100101001001010100010010
                                                          11000100101001001011001001010
                                                          01100100101001001010100010010
                       Bootstrap                          01100100101001001010100010010
                        Enrich                            01100100101001001010100010010
                                                          01100100101001001010100010010
                                                          11000100101001001011001001010
                                                          01100100101001001010100010010
                                                          01100100101001001010100010010
                                                          01100100101001001010100010010
                                      Adaptive            01100100101001001010100010010
                                                          01100100101001001010100010010
                                      Analytics           11000100101001001011001001010
                                                          01100100101001001010100010010
                                       Model              01100100101001001010100010010
                                                          01100100101001001010100010010
                                                          11000100101001001011001001010



  49                                                                          © 2012 IBM Corporation
Data In Motion
                  Hear what’s going on miles away to optimize
                   perimeter displacements

                  Perspective: Try to find the word “Zero” in a
                   1000 MP3 song library in a fraction of a second
                    – Figure out the difference between the sound of a
                      human whisper and the wind




50                                                          © 2012 IBM Corporation
Average cluster size is 100 nodes



                       one
                    hour
     What if        of this 100-node
               cluster cost


51
                      $34                  © 2012 IBM Corporation
52   © 2012 IBM Corporation
Ease of Use for Developers and Users




     End-user Visualization          Development Environment
      Data exploration, crawling,         Familiar coding and tooling
            and analytics           environment, testing, and optimization

53                                                              © 2012 IBM Corporation
What is BigSheets?
     Browser-based Big Data analytics tool for business users

     Big Data Challenges…               How can BigSheets help?
 Business users need a no           Spreadsheet-like discovery interface lets
  programming approach for            business users easily analyze Big Data
  analyzing Big Data                  with ZERO      PROGRAMMING

 Extremely difficult to find
  actionable business insights in    BUILT-IN “readers” can work
  data from multiple sources with     with data in several common formats
  different formats                    – JSON arrays, CSV, TSV, Web
                                         crawler output, . . .

 Translating untapped data into
  actionable business insights is    Users can VISUALLY combine and
  a common requirement that
                                      explore various types of data to identify
  requires visualization
                                      “hidden” insights

54                                                                © 2012 IBM Corporation
55   © 2012 IBM Corporation
8% spread
     change
     affecting
     brand

     Over 1M
     tweets
     analyzed

     2nd most
     Tweets/sec
     run rate as
     of 1Q12




56                 © 2012 IBM Corporation
Big Data Made Easy for the Little Guy
  USC’s Film Forecaster correctly predicted a
   clamor for "Hangover 2” that resulted in $100
   million opening over Memorial Day weekend
     – Looked at 250K-500K Tweets and broke down
       positive and negative messages using a lexicon
       of 1700 words




                                                 The Film Forecaster sounds like a
                                                 big undertaking for USC, but it really
                                                 came down to one communications
                                                 masters student who learned Big
                                                 Sheets in a day, then pulled in the
                                                 tweets and analyzed them - Ryan Kim



57                                                                      © 2012 IBM Corporation
Running Applications from the Web Console




58                                      © 2012 IBM Corporation
A Rich Management Big Data Tool




                                          Quick links to
                                        common functions




                                               Learn more
                                            through external
          Where and how to begin
                                             Web resources
     performing common administrative
            or analytical tasks




59                                                    © 2012 IBM Corporation
A Rich Management Big Data Tool




60                                © 2012 IBM Corporation
Rich Big Data Development Environment
                       Eclipse base Development tools
                         – For JQAL, Hive, PIG, Java
                           MapReduce and Text Analytics




61                                              © 2012 IBM Corporation
The Path to Efficiency: Declarative Languages
                                Offline                                                               Runtime

     Development Environment

                                                                                                                  Extracted
      Declarative Language
       Declarative Language                  Select
                                                                                                                  Objects
                                             Regex
 create view MonetizableIntent as
 select P.name as product,                    Join
        I.clue as strength                                      Select
 from Intent I, Product P               Intent     Product
                                                                Join
                                                                                                       Runtime
                                                                                                        Runtime
 where
     Follows(I.clue, P.name, 0, 20)                          Regex
                                                                     Product
 and Not(ContainsRegex(/b(not)b/,                                            Cost-based
     LeftContext(I.clue, 10)));                        Intent                  optimization                       Input
                                                 …

                                                                                                                  Documents

                                                      Join

                                                 Select
                                                          Product
  Streams, Text Analytics, Machine         Regex                                              High-throughput
   Learning, SQL all are declarative,
   simple to learn languages              Intent                                               Small memory footprint
  All have strong development
   tooling and accelerators                                                                    Optimizes for tasks:
  IBM has been optimizing
   declarative languages for decades,                                                               Example, Text
   IN FACT, IBM INVENTED IT!                                                                         Analytics needs CPU
                                                                                                     optimization

62                                                                                                         © 2012 IBM Corporation
Analytic Accelerators Designed for Variety

         Text          Simple &
     (listen, verb),                                          Acoustic
     (radio, noun)     Advanced Text



                       Mining in                              Advanced
                       Microseconds                           Mathematical Models



                       Predictive        ∑    R ( st , at )
                                       population
                                                              Statistics




                       GeoSpatial                             Image & Video


63                                                                         © 2012 IBM Corporation
Accelerators Improve Time to Value
                  Telecommunications                     Retail Customer
                  CDR streaming analytics                Intelligence
                  Deep Network Analytics                 Customer Behavior and
                                                         Lifetime Value Analysis


                  Finance                                Social Media
                  Streaming options trading              Analytics
                  Insurance and banking DW               Sentiment Analytics, Intent to
                  models                                 purchase


                  Public                                 Data Mining
                  Transportation                         Streaming statistical
                  Real-time monitoring and               analysis
                  routing optimization




Over 100 sample          User Defined         Standard         Industry Data
 applications              Toolkits           Toolkits            Models
                                                             Banking, Insurance, Telco,
                                                                Healthcare, Retail
64                                                                      © 2012 IBM Corporation
Telecommunications CDR Analytic Accelerator
             Analyze Call Detail
            Records in Real Time

Streaming Analytic Accelerators
     – CDR dropped call analysis
     – Determine VIP customers with service
       issues – proactive alerts
     – CDR Adapters – ASN.1, Binary, ASCII
     – Analytic Operators – CDR de-duplication,
       dropped call detection, termination reason,
       customer importance
     – Visualization – real-time KPI dashboard


Data Warehouse Appliance
     – Integrated network, devices, customer, and
       services model
     – Telecom model, KPIs, and KQIs

65                                                   © 2012 IBM Corporation
Without a Big Data Platform                                                   IBM Big Data Platform
                    You Code…
                                                                          Over 100 sample applications and
                                                                            toolkits with industry focused
          Event
                                                                           toolkits with 300+ functions and
                          Custom SQL
         Handling             and                                                      operators!
                            Scripts
                                         Multithreading                  Streams provides development, deployment,
                                                                             runtime, and infrastructure services
      Check         Application
     Pointing       Management
                                                          Accelerators
                                    HA                        and

                                                            Toolkits




                          Performance         Debug
          Connectors
                          Optimization



                                                                          “TerraEchos developers can deliver
     Security                                                             applications 45% faster due to the agility
                                                                          of Streams Processing Language…”
                                                                          – Alex Philip, CEO and President

66                                                                                                           © 2012 IBM Corporation
Streams Runtime Illustrated
                                                                      Optimizing scheduler assigns PEs                Dynamically add
                                                                      to nodes, and continually manages               nodes and jobs
                                                                      resource allocation

                                                                      Commodity hardware – laptop,                    Add in SPSS
                                                                      blades or high performance clusters             jobs in the flow



                           Meters

                                               Company    Usage
                                                          Model
                                               Filter                                                  Temp
                  Meters                                                                               Action

                                     Text                                    Season     Daily
           Usage                     Extract                                 Adjust     Adjust
           Contract


                                    Text
                                    Extract
                                                Degree
                                                History
                                                            Compare
                                                            History                              Store
                                                                                                 History




     X86 Host              X86 Host                       X86 Host                X86 Host                 X86 Host




67                                                                                                                       © 2012 IBM Corporation
Streams Runtime Illustrated

                                PEs on busy nodes, can                                              PEs on failing nodes can be
                                be moved manually by the                                            moved automatically, with
                                Streams administrator                                               communications re-routed




                           Meters

                                               Company    Usage
                                                          Model
                                               Filter                                           Temp
                  Meters                                                                        Action

                                     Text                             Season     Daily
           Usage                     Extract                          Adjust     Adjust
           Contract


                                    Text
                                    Extract
                                                Degree
                                                History
                                                            Compare
                                                            History                       Store
                                                                                          History




     X86 Host              X86 Host                       X86 Host         X86 Host                 X86 Host




68                                                                                                                       © 2012 IBM Corporation
How Text Analytics Works
                Football World Cup 2010, one team distinguished
                themselves well, losing to the eventual champions 1-
                0 in the Final. Early in the second half,
                Netherlands’ striker, Arjen Robben, had a
                breakaway, but the keeper for Spain, Iker Casilas
                made the save. Winger Andres Iniesta scored for
                Spain for the win.



         World Cup 2010 Highlights



         Arjen Robben                Striker        Netherlands
          Iker Casilas               Keeper             Spain
         Andres Iniesta              Winger             Spain




69                                                                 © 2012 IBM Corporation
What’s Wrong with Text Analytics Today
 Current alternative approaches and infrastructure for text
  analytics present challenges for analysts
     – They tend to perform poorly (in terms of accuracy and speed)
     – They are difficult to use

 These alternative approaches rely on the raw text flowing only
  forward through a system of extractors and filters
     – Inflexible and inefficient approach, often resulting in redundant processing

 Existing toolkits are also limited in their expressiveness
     –   Analysts having to develop custom code
     –   Programmer  Analyst (think Java Developer  DBA struggles)
     –   Leads to more delays, complexity, and difficulties getting it right
     –   Biggest factor hurting analyst productivity is the difficulty in determining
         how the system produced a certain result


70                                                                        © 2012 IBM Corporation
Extracting Person and Phone Relationships




71                                          © 2012 IBM Corporation
Text Analytic Toolkit Example
  Semantically search Enron’s emails to support queries such as
   “Find Tom’s phone” in order to find his actual phone number
     – As opposed to emails containing words Tom and Phone

  Need to be able to accurately extract email entities of type Person
   and Phone and the relationship between them




72                                                            © 2012 IBM Corporation
The Enron Email Example
 Start with a naïve set of rules to identify a PersonPhone relationship
  built using some sort of editor
     – Build out an extractor that know how to find a person’s name: Lorraine Smith
     – Build out an extractor that knows how to find a phone number: 607-205-4493




          607.205.4493     Euro spacing
          xt. ext. …       North American spacing
          +0119054132112   Globalstar GSP-1600
          911, 411         +++

73                                                                     © 2012 IBM Corporation
The Tricky Thing About Sentiment…




74                                  © 2012 IBM Corporation
Text Analytics Toolkit
 System T text analytics engine previously only embedded in IBM products and
  hidden from end users
     – Found in Lotus Notes, IBM e-discovery Analyzer, CCI, InfoSphere Warehouse,+++
     – Almost a decade since initial release

 BigInsights is the first time IBM opens up the Text Analytics Engine
  technology for customization and development

 BigInsights Text Analytic Toolkit provides developer tools, an easy to use text
  analytics language, and a set of extractors for fast adoption
   – Multilingual support, including support for DBCS languages

 BigInsights includes Annotator Query Language (AQL): SQL-like!
     – Fully declarative text analytics language
     – No “black boxes” or modules that can’t be customized.
     – Tooling for easy customization because you are abstracted from the
       programmatic details
     – Competing solutions make use of locked up black-box modules that cannot be
       customized, which restricts flexibility and are difficult to optimize for performance
75                                                                            © 2012 IBM Corporation
Accelerating Analytics – Explainability
  Every annotation’s provenance can be visualized

  Provenance of Morgan Stanley Fax: 205-4493 shows the
   FullName rule is responsible for generating incorrect annotation
     – Solutions?
       • Remove Morgan Stanley from FullName dictionary?
       • Create new dictionary for CompanyName?




                                                           Text Analytics Toolkit




76                                                                  © 2012 IBM Corporation
Text Analytics Toolkit Provenance Viewer
  Major challenge for analysts is determining the lineage of changes
   that have been applied to text
     – REALLY difficult to diskern which extractors need to be adjusted to tweak the
       resulting annotations

  Provenance Viewer for interactive visualizations to display output
   annotations for regression, debugging, version enhancements, +++
     – Reduce development of extractors by days to weeks




77                                                                                © 2012 IBM Corporation
Accelerating Analytics – discovery and Pattern Matching
 Contextual Clues discoverer cluster the context surrounding annotation in
  order to detect frequently occurring patterns

 Illustrate clustering results between incorrect PersonPhone pairs
     – Visualize frequent occurrences (bubble size) of clues: i.e. FAX and TELEFAX
     – Developer improves precision of annotator by adding a rule that filters out PersonPhone
       pars if the Phone is preceded by a FAX clue
     – Developer finds call at text as key clue for rule extraction for PersonPhone




                                                                         Text Analytics Toolkit




78                                                                                © 2012 IBM Corporation
Accelerating Analytics – Assisted Development
 IDE that exposes sophisticated techniques to assist rule developers
  throughout all states of the development cycle
     – Promotes agile development because it unifies the business analyst and the
       developer with a common toolset which fosters understanding
        • Business Analyst helps mature and validates extractor results
        • Developer understands visually that AQL enhancements needed to refine rules




                                                                 Text Analytics Toolkit




79                                                                        © 2012 IBM Corporation
IBM Text Analytics Toolkit - Development Toolset




                                Context Sensitive Help
                                                         Outline View

                     Hyperlink Navigation



     Procedural
        Help



80                                                         © 2012 IBM Corporation
BigInsights Text Analytics Development


                                                           Difference
                  Content Assist                               Viewer




                                                                 Regular Expression Builder




                                              Context Sensitive Help
                                                                         Outline View

                                   Hyperlink Navigation



     Procedural
        Help                 Design Time Validation



81                                                                          © 2012 IBM Corporation
Text Analytics Toolkit – Development Accelerator
  Eclipse plug-ins enhance analyst productivity
     – When writing AQL code, the editor features syntax highlighting, and automatic
       detection of syntax error at design time not runtime
     – Pre-built extractors and sample test tools, +++

  Reduce coding time and debugging by 30-50%+!




82                                                                       © 2012 IBM Corporation
Text Analytics Toolkit – Performance Accelerator
  AQL code is highly optimized for MapReduce because of its
   declarative nature

  Unlike other frameworks, AQL optimizer determines order of
   execution of the extractor instructions for maximum efficiency

  Deliver analysis up to 10x faster than other leading alternative
   frameworks running the same extractors




83                                                              © 2012 IBM Corporation
84   © 2012 IBM Corporation
Business Task: Find the Pictures that Have Cats




85                                        © 2012 IBM Corporation
BUT! Your Application Returns the Following…
                                PRECISION
                               (a measure of exactness)




                                 2/4
                                   RECALL
                                (a measure of coverage)




                                  2/5
86                                            © 2012 IBM Corporation
IBM Finds RIGHT Answers Better Than Anyone




87                                     © 2012 IBM Corporation
IBM’s Hadoop System Provides Unique Business Value

 Optimized beyond open source
  Hadoop
     – Workload optimization, security, +++

 Integration with enterprise systems
     – Connectors for multiple data sources

 Accelerators reduce development
  and implementation times
     – Industry & application accelerators
     – Analytic accelerators

 Visualization tools enable business
  users to explore Big Data



88                                            © 2012 IBM Corporation
University of Southern Cal
      Political Debate Monitoring

      Solution to measure public sentiment
       during the Republican Primary and
       Presidential Debates
      Examines trends, volume, and content
       of millions of public Twitter messages
       in real-time
      Analytic accelerators to understand
       sentiment (positive, negative, neutral)
        - Stream Computing and visualization
      Benefits
        - Real-time display of public sentiment as
          candidates respond to questions
        - Debate winner prediction based on
          public opinion instead of solely political
          analysts
89                                    © 2012 IBM Corporation
Cisco turns to IBM big
      data for intelligent
         infrastructure
          m anagem ent
      Optimize building energy
       consumption with centralized
       monitoring and control of
       building monitoring system
      Automates preventive and
       corrective maintenance of
       building corrective systems
      Uses Streams, InfoSphere
       BigInsights and Cognos
        -   Log Analytics
        -   Energy Bill Forecasting
        -   Energy consumption optimization
        -   Detection of anomalous usage
        -   Presence-aware energy mgt.
90
90      -   Policy enforcement 2012 IBM Corporation
                              ©
Stream Computing Provides Unique Business Value

 Real-time answers = low latency
  insight
     – Better outcomes for time sensitive
       applications (e.g. fraud detection, network
       management)

 Solution when data is too large or
  expensive to store
     – Analyze data as it comes to you
     – Persist data of interest for deeper analysis

 Insights derived across multiple
  streams
     – Fuse streams for new insights




91                                                    © 2012 IBM Corporation
Asian telco reduces
     billing costs and
     improves customer
     satisfaction
     Capabilities:

         Stream Computing
         Analytic Accelerators


     Real-time mediation and analysis of
      6B CDRs per day
     Data processing time reduced from
      12 hrs to 1 sec

     Hardware cost reduced to 1/8th
     Proactively address issues
      (e.g. dropped calls) impacting
      customer satisfaction.

92
92                          © 2012 IBM Corporation
Data Warehousing Provides Unique Business Value

  Consolidate, manage and
   reconcile data for enterprise
   business intelligence

  Establish trust, quality and
   governance where necessary
     – Financial data
     – Credit card data
     – Healthcare

  Combine deep and operational
   analytics

  Maintain history for trending
   and historical reporting


93                                      © 2012 IBM Corporation
Pacific Northwest Smart
Grid Demonstration Project

  Capabilities:

      Stream Computing – real-time
      control system

      Deep Analytics Appliance –
      analyze massive data sets




  Demonstrates scalability from
  100 to 500K homes while
  retaining 10 years’ historical data

  60k metered customers in 5 states

  Accommodates ad hoc analysis of
  price fluctuation, energy consumption
  profiles, risk, fraud detection, grid
  health, etc.


 94                                       © 2012 IBM Corporation
Information Integration Provides Unique Business Value


 Movement of large data sets in
  batch and real time
     – Parallel processing engine for
       efficient data movement

 Governance and trust for
  Big Data
     – Lineage and meta data of new Big
       Data data sources
     – Profile sources to determine trust

 Data Quality
     – Standardize and transform data




95                                           © 2012 IBM Corporation
Leaders integrate and govern massive Big Data
 Marketing Services
with InfoSphere big
 Leader integrates
 data for customer
 intelligence
 Capabilities Utilized:
       Information Integration – data
       quality, ETL
       Deep Analytics Appliance




 Complex customer data
 integration for
 54M records/hour


 Processing
 5B simultaneous records
 96
9696                                     © 2012 IBM Corporation
@BigData_paulz




 THINK

                       PDF    http://tinyurl.com/88auawg
                      Hardcopy http://tinyurl.com/7ef9ubs
     March 27, 2012                     © 2012 IBM Corporation
97

Mais conteúdo relacionado

Mais procurados

Why Infrastructure Matters for Big Data & Analytics
Why Infrastructure Matters for Big Data & AnalyticsWhy Infrastructure Matters for Big Data & Analytics
Why Infrastructure Matters for Big Data & AnalyticsRick Perret
 
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address RequirementsGov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address RequirementsDataWorks Summit
 
Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop SampleAlan Quayle
 
Value proposition for big data isv partners 0714
Value proposition for big data isv partners 0714Value proposition for big data isv partners 0714
Value proposition for big data isv partners 0714Niu Bai
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and howbobosenthil
 
NextGen Infrastructure for Big Data
NextGen Infrastructure for Big DataNextGen Infrastructure for Big Data
NextGen Infrastructure for Big DataEd Dodds
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingm_hepburn
 
Telco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsTelco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsAlan Quayle
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data AnalyticsTUSHAR GARG
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersDataWorks Summit
 
Big data, data science & fast data
Big data, data science & fast dataBig data, data science & fast data
Big data, data science & fast dataKunal Joshi
 
Big Data Infrastructure and Analytics Solution on FITAT2013
Big Data Infrastructure and Analytics Solution on FITAT2013Big Data Infrastructure and Analytics Solution on FITAT2013
Big Data Infrastructure and Analytics Solution on FITAT2013Erdenebayar Erdenebileg
 
IBM InfoSphere Data Replication for Big Data
IBM InfoSphere Data Replication for Big DataIBM InfoSphere Data Replication for Big Data
IBM InfoSphere Data Replication for Big DataIBM Analytics
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big DataDataWorks Summit
 
Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forumbigdatawf
 

Mais procurados (20)

Why Infrastructure Matters for Big Data & Analytics
Why Infrastructure Matters for Big Data & AnalyticsWhy Infrastructure Matters for Big Data & Analytics
Why Infrastructure Matters for Big Data & Analytics
 
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address RequirementsGov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
 
Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop Sample
 
Big Data Overview
Big Data OverviewBig Data Overview
Big Data Overview
 
Value proposition for big data isv partners 0714
Value proposition for big data isv partners 0714Value proposition for big data isv partners 0714
Value proposition for big data isv partners 0714
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research report
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and how
 
NextGen Infrastructure for Big Data
NextGen Infrastructure for Big DataNextGen Infrastructure for Big Data
NextGen Infrastructure for Big Data
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
 
Telco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsTelco Big Data 2012 Highlights
Telco Big Data 2012 Highlights
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
 
Big data, data science & fast data
Big data, data science & fast dataBig data, data science & fast data
Big data, data science & fast data
 
Big Data Infrastructure and Analytics Solution on FITAT2013
Big Data Infrastructure and Analytics Solution on FITAT2013Big Data Infrastructure and Analytics Solution on FITAT2013
Big Data Infrastructure and Analytics Solution on FITAT2013
 
IBM InfoSphere Data Replication for Big Data
IBM InfoSphere Data Replication for Big DataIBM InfoSphere Data Replication for Big Data
IBM InfoSphere Data Replication for Big Data
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big Data
 
Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forum
 
Big Data on AWS
Big Data on AWSBig Data on AWS
Big Data on AWS
 

Destaque

Social Data Analytics using IBM Big Data Technologies
Social Data Analytics using IBM Big Data TechnologiesSocial Data Analytics using IBM Big Data Technologies
Social Data Analytics using IBM Big Data TechnologiesNicolas Morales
 
Big Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM PerspectiveBig Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM PerspectiveThe_IPA
 
High-level languages for Big Data Analytics (Presentation)
High-level languages for Big Data Analytics (Presentation)High-level languages for Big Data Analytics (Presentation)
High-level languages for Big Data Analytics (Presentation)Jose Luis Lopez Pino
 
5 Level of MDM Maturity
5 Level of MDM Maturity5 Level of MDM Maturity
5 Level of MDM MaturityPanaEk Warawit
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?James Serra
 
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Nathan Bijnens
 
Creating a Data-Driven Organization: an executive summary
Creating a Data-Driven Organization: an executive summaryCreating a Data-Driven Organization: an executive summary
Creating a Data-Driven Organization: an executive summaryCarl Anderson
 
Data strategy in a Big Data world
Data strategy in a Big Data worldData strategy in a Big Data world
Data strategy in a Big Data worldCraig Milroy
 
Developing a Data Strategy -- A Guide For Business Leaders
Developing a Data Strategy -- A Guide For Business LeadersDeveloping a Data Strategy -- A Guide For Business Leaders
Developing a Data Strategy -- A Guide For Business Leadersibi
 
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyBecoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyDATAVERSITY
 
Palantir, Quid, RecordedFuture: Augmented Intelligence Frontier
Palantir, Quid, RecordedFuture: Augmented Intelligence FrontierPalantir, Quid, RecordedFuture: Augmented Intelligence Frontier
Palantir, Quid, RecordedFuture: Augmented Intelligence FrontierDaniel Kornev
 
Data Governance: Keystone of Information Management Initiatives
Data Governance: Keystone of Information Management InitiativesData Governance: Keystone of Information Management Initiatives
Data Governance: Keystone of Information Management InitiativesAlan McSweeney
 
Four ways data is improving healthcare operations
Four ways data is improving healthcare operationsFour ways data is improving healthcare operations
Four ways data is improving healthcare operationsTableau Software
 
Tableau presentation
Tableau presentationTableau presentation
Tableau presentationkt166212
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Destaque (19)

Social Data Analytics using IBM Big Data Technologies
Social Data Analytics using IBM Big Data TechnologiesSocial Data Analytics using IBM Big Data Technologies
Social Data Analytics using IBM Big Data Technologies
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Big Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM PerspectiveBig Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM Perspective
 
The Palantir Seeing Stone
The Palantir Seeing StoneThe Palantir Seeing Stone
The Palantir Seeing Stone
 
High-level languages for Big Data Analytics (Presentation)
High-level languages for Big Data Analytics (Presentation)High-level languages for Big Data Analytics (Presentation)
High-level languages for Big Data Analytics (Presentation)
 
5 Level of MDM Maturity
5 Level of MDM Maturity5 Level of MDM Maturity
5 Level of MDM Maturity
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
 
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
 
Creating a Data-Driven Organization: an executive summary
Creating a Data-Driven Organization: an executive summaryCreating a Data-Driven Organization: an executive summary
Creating a Data-Driven Organization: an executive summary
 
Data strategy in a Big Data world
Data strategy in a Big Data worldData strategy in a Big Data world
Data strategy in a Big Data world
 
Developing a Data Strategy -- A Guide For Business Leaders
Developing a Data Strategy -- A Guide For Business LeadersDeveloping a Data Strategy -- A Guide For Business Leaders
Developing a Data Strategy -- A Guide For Business Leaders
 
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyBecoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
 
Data Strategy
Data StrategyData Strategy
Data Strategy
 
Palantir, Quid, RecordedFuture: Augmented Intelligence Frontier
Palantir, Quid, RecordedFuture: Augmented Intelligence FrontierPalantir, Quid, RecordedFuture: Augmented Intelligence Frontier
Palantir, Quid, RecordedFuture: Augmented Intelligence Frontier
 
Data Governance: Keystone of Information Management Initiatives
Data Governance: Keystone of Information Management InitiativesData Governance: Keystone of Information Management Initiatives
Data Governance: Keystone of Information Management Initiatives
 
Four ways data is improving healthcare operations
Four ways data is improving healthcare operationsFour ways data is improving healthcare operations
Four ways data is improving healthcare operations
 
Tableau presentation
Tableau presentationTableau presentation
Tableau presentation
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Semelhante a IBM-Why Big Data?

Konceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBMKonceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBMIBM Danmark
 
Ibm swg day 2012 jhb big data (white)
Ibm swg day 2012 jhb big data (white)Ibm swg day 2012 jhb big data (white)
Ibm swg day 2012 jhb big data (white)simonje
 
MICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business Intelligence
MICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business IntelligenceMICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business Intelligence
MICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business IntelligenceTwinergy
 
Future Tech for Concur User Conference
Future Tech for Concur User ConferenceFuture Tech for Concur User Conference
Future Tech for Concur User ConferenceMichael Fauscette
 
Social mobile usage Don't Leave Social at the Office
 Social mobile usage   Don't Leave Social at the Office Social mobile usage   Don't Leave Social at the Office
Social mobile usage Don't Leave Social at the OfficeHeath McCarthy
 
Entering engineering workforce_v4
Entering engineering workforce_v4Entering engineering workforce_v4
Entering engineering workforce_v4Tony Pearson
 
Social Networking 3 - Final
Social Networking 3 - Final Social Networking 3 - Final
Social Networking 3 - Final Willie Favero
 
ASAE Tech Conference 2012 Closing Keynote on Disruptive Tech
ASAE Tech Conference 2012 Closing Keynote on Disruptive TechASAE Tech Conference 2012 Closing Keynote on Disruptive Tech
ASAE Tech Conference 2012 Closing Keynote on Disruptive TechDion Hinchcliffe
 
The mobile traveler experience
The mobile traveler experienceThe mobile traveler experience
The mobile traveler experienceKevin May
 
Cognitive computing big_data_statistical_analytics
Cognitive computing big_data_statistical_analyticsCognitive computing big_data_statistical_analytics
Cognitive computing big_data_statistical_analyticsPietro Leo
 
Smarter Analytics: Big Data and Predictive Governance
Smarter Analytics: Big Data and Predictive GovernanceSmarter Analytics: Big Data and Predictive Governance
Smarter Analytics: Big Data and Predictive GovernanceIBM Danmark
 
Kim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our WorldKim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our WorldBigDataViz
 
Palestra "Technology Trends To Watch In 2012 and beyond"
Palestra "Technology Trends To Watch In 2012 and beyond"Palestra "Technology Trends To Watch In 2012 and beyond"
Palestra "Technology Trends To Watch In 2012 and beyond"Dígitro Tecnologia
 
Big Data: Industry trends and key players
Big Data: Industry trends and key playersBig Data: Industry trends and key players
Big Data: Industry trends and key playersCM Research
 
Intel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntelAPAC
 
Ibm software network2012 claudio cinquepalmi #ibmsocialbiz
Ibm software network2012 claudio cinquepalmi  #ibmsocialbiz Ibm software network2012 claudio cinquepalmi  #ibmsocialbiz
Ibm software network2012 claudio cinquepalmi #ibmsocialbiz Claudio Cinquepalmi
 
big-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptxbig-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptxVaishnavGhadge1
 

Semelhante a IBM-Why Big Data? (20)

Konceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBMKonceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBM
 
Ibm swg day 2012 jhb big data (white)
Ibm swg day 2012 jhb big data (white)Ibm swg day 2012 jhb big data (white)
Ibm swg day 2012 jhb big data (white)
 
MICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business Intelligence
MICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business IntelligenceMICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business Intelligence
MICROSTRATEGY - Sessione introduttiva sulla piattaforma di Business Intelligence
 
Future Tech for Concur User Conference
Future Tech for Concur User ConferenceFuture Tech for Concur User Conference
Future Tech for Concur User Conference
 
Social mobile usage Don't Leave Social at the Office
 Social mobile usage   Don't Leave Social at the Office Social mobile usage   Don't Leave Social at the Office
Social mobile usage Don't Leave Social at the Office
 
ibm
ibmibm
ibm
 
Entering engineering workforce_v4
Entering engineering workforce_v4Entering engineering workforce_v4
Entering engineering workforce_v4
 
Social Networking 3 - Final
Social Networking 3 - Final Social Networking 3 - Final
Social Networking 3 - Final
 
Accelerate Return on Data
Accelerate Return on DataAccelerate Return on Data
Accelerate Return on Data
 
ASAE Tech Conference 2012 Closing Keynote on Disruptive Tech
ASAE Tech Conference 2012 Closing Keynote on Disruptive TechASAE Tech Conference 2012 Closing Keynote on Disruptive Tech
ASAE Tech Conference 2012 Closing Keynote on Disruptive Tech
 
The mobile traveler experience
The mobile traveler experienceThe mobile traveler experience
The mobile traveler experience
 
Cognitive computing big_data_statistical_analytics
Cognitive computing big_data_statistical_analyticsCognitive computing big_data_statistical_analytics
Cognitive computing big_data_statistical_analytics
 
Smarter Analytics: Big Data and Predictive Governance
Smarter Analytics: Big Data and Predictive GovernanceSmarter Analytics: Big Data and Predictive Governance
Smarter Analytics: Big Data and Predictive Governance
 
Mobile Megatrends 2008
Mobile Megatrends 2008Mobile Megatrends 2008
Mobile Megatrends 2008
 
Kim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our WorldKim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our World
 
Palestra "Technology Trends To Watch In 2012 and beyond"
Palestra "Technology Trends To Watch In 2012 and beyond"Palestra "Technology Trends To Watch In 2012 and beyond"
Palestra "Technology Trends To Watch In 2012 and beyond"
 
Big Data: Industry trends and key players
Big Data: Industry trends and key playersBig Data: Industry trends and key players
Big Data: Industry trends and key players
 
Intel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick Knupffer
 
Ibm software network2012 claudio cinquepalmi #ibmsocialbiz
Ibm software network2012 claudio cinquepalmi  #ibmsocialbiz Ibm software network2012 claudio cinquepalmi  #ibmsocialbiz
Ibm software network2012 claudio cinquepalmi #ibmsocialbiz
 
big-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptxbig-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptx
 

Mais de Kun Le

Ibm big data and analytics for a holistic customer journey
Ibm big data and analytics for a holistic customer journeyIbm big data and analytics for a holistic customer journey
Ibm big data and analytics for a holistic customer journeyKun Le
 
Lessons and Challenges from Mining Retail E-Commerce Data
Lessons and Challenges from Mining Retail E-Commerce DataLessons and Challenges from Mining Retail E-Commerce Data
Lessons and Challenges from Mining Retail E-Commerce DataKun Le
 
University of Washington Computer Science & Engineering CSE 403: Software Eng...
University of Washington Computer Science & Engineering CSE 403: Software Eng...University of Washington Computer Science & Engineering CSE 403: Software Eng...
University of Washington Computer Science & Engineering CSE 403: Software Eng...Kun Le
 
Business Intelligence and Retail
Business Intelligence and RetailBusiness Intelligence and Retail
Business Intelligence and RetailKun Le
 
The “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedInThe “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedInKun Le
 
Marketo - Definitive guide to marketing metrics marketing analytics
Marketo - Definitive guide to marketing metrics marketing analyticsMarketo - Definitive guide to marketing metrics marketing analytics
Marketo - Definitive guide to marketing metrics marketing analyticsKun Le
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Kun Le
 
Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Kun Le
 
Under the hood_of_mobile_marketing
Under the hood_of_mobile_marketingUnder the hood_of_mobile_marketing
Under the hood_of_mobile_marketingKun Le
 
IBM-Infoworld Big Data deep dive
IBM-Infoworld Big Data deep diveIBM-Infoworld Big Data deep dive
IBM-Infoworld Big Data deep diveKun Le
 
Smarter analytics for retailers Delivering insight to enable business success
Smarter analytics for retailers Delivering insight to enable business successSmarter analytics for retailers Delivering insight to enable business success
Smarter analytics for retailers Delivering insight to enable business successKun Le
 
Quoc Le, Stanford & Google - Tera Scale Deep Learning
Quoc Le, Stanford & Google - Tera Scale Deep LearningQuoc Le, Stanford & Google - Tera Scale Deep Learning
Quoc Le, Stanford & Google - Tera Scale Deep LearningKun Le
 
IBM - Using Predictive Analytics to Segment, Target and Optimize Marketing
IBM - Using Predictive Analytics to Segment, Target and Optimize MarketingIBM - Using Predictive Analytics to Segment, Target and Optimize Marketing
IBM - Using Predictive Analytics to Segment, Target and Optimize MarketingKun Le
 
Big data that drives marketing roi across all channels & campaigns
Big data that drives marketing roi across all channels & campaignsBig data that drives marketing roi across all channels & campaigns
Big data that drives marketing roi across all channels & campaignsKun Le
 

Mais de Kun Le (14)

Ibm big data and analytics for a holistic customer journey
Ibm big data and analytics for a holistic customer journeyIbm big data and analytics for a holistic customer journey
Ibm big data and analytics for a holistic customer journey
 
Lessons and Challenges from Mining Retail E-Commerce Data
Lessons and Challenges from Mining Retail E-Commerce DataLessons and Challenges from Mining Retail E-Commerce Data
Lessons and Challenges from Mining Retail E-Commerce Data
 
University of Washington Computer Science & Engineering CSE 403: Software Eng...
University of Washington Computer Science & Engineering CSE 403: Software Eng...University of Washington Computer Science & Engineering CSE 403: Software Eng...
University of Washington Computer Science & Engineering CSE 403: Software Eng...
 
Business Intelligence and Retail
Business Intelligence and RetailBusiness Intelligence and Retail
Business Intelligence and Retail
 
The “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedInThe “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedIn
 
Marketo - Definitive guide to marketing metrics marketing analytics
Marketo - Definitive guide to marketing metrics marketing analyticsMarketo - Definitive guide to marketing metrics marketing analytics
Marketo - Definitive guide to marketing metrics marketing analytics
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...
 
Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569
 
Under the hood_of_mobile_marketing
Under the hood_of_mobile_marketingUnder the hood_of_mobile_marketing
Under the hood_of_mobile_marketing
 
IBM-Infoworld Big Data deep dive
IBM-Infoworld Big Data deep diveIBM-Infoworld Big Data deep dive
IBM-Infoworld Big Data deep dive
 
Smarter analytics for retailers Delivering insight to enable business success
Smarter analytics for retailers Delivering insight to enable business successSmarter analytics for retailers Delivering insight to enable business success
Smarter analytics for retailers Delivering insight to enable business success
 
Quoc Le, Stanford & Google - Tera Scale Deep Learning
Quoc Le, Stanford & Google - Tera Scale Deep LearningQuoc Le, Stanford & Google - Tera Scale Deep Learning
Quoc Le, Stanford & Google - Tera Scale Deep Learning
 
IBM - Using Predictive Analytics to Segment, Target and Optimize Marketing
IBM - Using Predictive Analytics to Segment, Target and Optimize MarketingIBM - Using Predictive Analytics to Segment, Target and Optimize Marketing
IBM - Using Predictive Analytics to Segment, Target and Optimize Marketing
 
Big data that drives marketing roi across all channels & campaigns
Big data that drives marketing roi across all channels & campaignsBig data that drives marketing roi across all channels & campaigns
Big data that drives marketing roi across all channels & campaigns
 

Último

React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Último (20)

React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

IBM-Why Big Data?

  • 1. Big Data Paul Zikopoulos, BA, MBA Director, IBM Information Management Technical Professionals, WW Competitive Database, and Big Data @BigData_paulz IBM Certified Advanced Technical Expert (Clusters and DRDA) IBM Certified Customer Solutions Expert (DBA and BI) paulz_ibm@msn.com March 27, 2012 © 2012 IBM Corporation
  • 2. Paul C. Zikopoulos, B.A., M.B.A., is the Director of Technical Professionals for IBM Software Group’s Information Management division and additionally leads the World Wide Database Competitive and Big Data SWAT teams. Paul is an award-winning writer and speaker with more than 18 years of experience in Information Management. Paul has written more than 350 magazine articles and 15 books including Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, DB2 pureScale: Risk Free Agile Scaling, Break Free with DB2 9.7: A Tour of Cost Saving Features; Information on Demand: Introduction to DB2 9.5 New Features; DB2 Fundamentals Certification for Dummies; DB2 for Dummies; and more. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Clusters) and a DB2 Certified Solutions Expert (BI and DBA). In his spare time, he enjoys all sorts of sporting activities, including running with his dog Chachi, avoiding punches in his MMA training, and trying to figure out the world according to Chloë—his daughter. You can reach him at: paulz_ibm@msn.com. @BigData_paulz 2 © 2012 IBM Corporation
  • 3. Why Big Data How We Got Here March 27, 2012 © 2012 IBM Corporation
  • 4. 4 In 2005 there were 1.3 billion RFID tags in circulation… …by the end of 2011, this was about 30 billion and growing even faster © 2012 IBM Corporation
  • 5. An increasingly sensor-enabled and instrumented business environment generates HUGE volumes of data with MACHINE SPEED characteristics… 1 BILLION lines of code 5 EACH engine generating 10 TB every 30 minutes! © 2012 IBM Corporation
  • 6. 350B Transactions/Year Meter Reads every 15 min. 120M – meter reads/month 3.65B – meter reads/day 6 © 2012 IBM Corporation
  • 7.  In August of 2010, Adam Savage, of “Myth Busters,” took a photo of his vehicle using his smartphone. He then posted the photo to his Twitter account including the phrase “Off to work.”  Since the photo was taken by his smartphone, the image contained metadata revealing the exact geographical location the photo was taken  By simply taking and posting a photo, Savage revealed the exact location of his home, the vehicle he drives, and the time he leaves for work 7 © 2012 IBM Corporation
  • 8. The Social Layer in an Instrumented Interconnected World 4.6 30 billion RFID billion tags today camera 12+ TBs (1.3B in 2005) phones of tweet data world every day wide 100s of millions of GPS data every day ? TBs of enabled devices sold annually 25+ TBs of 2+ log data billion every day people on the 76 million smart Web by meters in 2009… end 2011 200M by 2014 8 © 2012 IBM Corporation
  • 9. Twitter Tweets per Second Record Breakers of 2011  Social-media analytics can be used from healthcare to predicting votes  Challenges – Volume – Velocity – Variety – Language Processing: consider that Twitter sentences are not well formed and often use urban talk 9 © 2012 IBM Corporation
  • 10. Can a Social Media Persona be Monetized? 10 © 2012 IBM Corporation
  • 11. Extract Intent, Life Events, Micro Segmentation Attributes Chloe Name, Birthday, Family Tom Sit Not Relevant - Noise Tina Mu Monetizable Intent Jo Jobs Not Relevant - Noise Location Wishful Thinking 11 Relocation Monetizable Intent SPAMbots © 2012 IBM Corporation
  • 12. 12 © 2012 IBM Corporation
  • 13. 1.8 ZB 1 ZB 1 ZB=1T GB 4Trillion 8GB iPods 13 © 2012 IBM Corporation
  • 14. The Big Data Opportunity Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible. Variety: Manage the complexity of data in many different structures, ranging from relational, to logs, to raw text Velocity: Streaming data and large volume data movement Volume: Scale from Terabytes to Petabytes (1K TBs) to Zettabytes (1B TBs) 14 © 2012 IBM Corporation
  • 15. Data In Motion Analyzes and correlates 5M+ 500K/sec, 6B+ IPDRs analyzed market messages/sec to execute per day on more than 4 PBs/yr. algorithmic option trades with sustaining 1GBps. average latency of 30 micro-secs. height: height: height: 640 1280 640 directory: directory: directory: directory: width: width: width: ”/img" ”/img" ”/opt" ”/img" 480 1024 480 filename: filename: filename: filename: data: data: data: “cow” “bird” “java” “cat” tuples 15 © 2012 IBM Corporation
  • 16. The Big Data Conundrum  The economies of deletion have changed…. – Leading us into new opportunities and challenges  The percentage of available data an enterprise can analyze is decreasing proportionately to the available to that enterprise  Quite simply, this means as enterprises, we are getting “more naive” about our business over time Data AVAILABLE to an organization Data an organization can PROCESS 16 © 2012 IBM Corporation
  • 17. Applications for Big Data Analytics Smarter Healthcare Multi-channel Finance Log Analysis sales Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail: Churn, NBO 17 © 2012 IBM Corporation
  • 18. Bigger and Bigger Volumes of Data  Retailers collect click-stream data from Web site interactions and loyalty card-drive transaction data – This traditional POS information is used by retailer for shopping basket analysis, inventory replenishment, +++ – But data is being provided to suppliers for customer buying analysis  Healthcare has traditionally been dominated by paper-based systems, but this information is getting digitized  Science is increasingly dominated by big science initiatives – Large-scale experiments generate over 15 PB of data a year and can’t be stored within the data center; then sent to laboratories  Financial services are seeing larger volumes through smaller trading sizes, increased market volatility, and technological improvements in automated and algorithmic trading  Improved instrument and sensory technology – Large Synoptic Survey Telescope’s GPixel camera generates 6PB+ of image data per year or consider Oil and Gas industry 18 © 2012 IBM Corporation
  • 19. A Simple Log File Jun 29 04:03:37 192.168.190.24 nagios: GLOBAL SERVICE EVENT HANDLER: ast-oral- vip; remedy_oracle_db_latch-contention; (null); (null); (null); handle-all- critical-event DATE TIME IP ADDRESS MONITOR Jun 29 04:03:57 192.168.190.24 nagios: SERVICE ALERT: ast-oral- vip;remedy_oracle_db_soft-prase-ratio;CRITICAL;HARD;3;CRITICAL – Soft parse ratio 86.63% TYPE COMPONENT Jun 29 07:50:57 192.168.190.24 nagios: SERVICE ALERT: ast-oral vip;remedy_oracle_db_latch-waiting; OK;SOFT;2;OK – SGA latch xssinfo freelist (#350) sleeping 0.000000% of the time Jun 29 07:50:57 192.168.190.24 nagios: GLOBAL SERVICE EVENT HANDLER: ast-oral- vip;remedy_oracle_db_latch-waiting; (null); (null); (null); handle-all- critical-event 19 © 2012 IBM Corporation
  • 20. 20 © 2012 IBM Corporation
  • 21. Click Stream Analysis Use Case Goal: Determine how many abandoned shopping carts there are and where they were abandoned. Input: Web log data (IP address, timestamp, URL, HTTP return codes). Output: List of IP addresses and last page visited by user. Web log input Map Step K2 V2 201.67.34.67 | 10:24:48 | www.ff.com | 200 Make IP 201.67.34.67 10:24:48,www.ff.com 123.99.54.76 | 11:15:21 | www.ff.com/search? | 200 address the 123.99.54.76 11:15:21,www.ff.com/search? 231.67.66.84 | 02:40:33 | www.ff.com/addItem.jsp | 200 key and 1 line timestamp 231.67.66.84 02:40:33,www.ff.com/addItem.jsp 101.23.78.92 | 07:09:56 | www.ff.com/shipping.jsp | 200 and URL the 101.23.78.92 07:09:56,www.ff.com/shipping.jsp 114.43.89.75 | 01:32:43 | www.ff.com | 200 value. 114.43.89.75 01:32:43,www.ff.com 109.55.55.81 | 09:45:31 | www.ff.com/confirmation.jsp | 200 109.55.55.81 09:45:31,www.ff.com/confirmation.jsp K1 – Ignored, V1 – 1 line of log Reduce Step K3 V3 Ensure K4 Shuffle/Sort/Group 201.67.34.67 10:24:48,www.ff.com something is Framework 201.67.34.67 in cart and does this automatically determine last 10:25:15,www.ff.com/search? page visited. V4 www.ff.com/shipping 10:26:23,www.ff.com/shipping 21 © 2012 IBM Corporation
  • 22. Solution Architecture – Social Media Analytics for Media and Entertainment Online flow: Data-in-motion analysis Social Media Stream Computing and Analytics Timely Decisions Entity Predictive Data Ingest Text Analytics: Analytics: Analytics: and Prep Timely Insights Profile Action Resolution Determination Dashboard Hadoop System and Analytics Comprehensive Entity Social Media Social Media Predictive Customer Text Analytics Analytics and Data Customer Analytics Models Integration Profiles Offline flow: Data-at-rest analysis Reports 22 © 2012 IBM Corporation
  • 23. Most Requested Uses of Big Data  Log Analytics & Storage  Smart Grid / Smarter Utilities  RFID Tracking & Analytics  Fraud / Risk Management & Modeling  360° View of the Customer  Warehouse Extension  Email / Call Center Transcript Analysis  Call Detail Record Analysis  +++ 23 23 © 2012 IBM Corporation
  • 24. Why Didn’t We Use All of the Big Data Before? “IBM gave us an opportunity to turn our plans into something that was very tangible right from the beginning. IBM had experts within data mining, Big Data, and Apache Hadoop and it was clear to use from the beginning we wanted to improve our business, not only today, but also prepare for the challenges we will face in three to five years, we had to go with IBM.” – Lars Christian Christensen VP Plant Siting & Forecasting 24 © 2012 IBM Corporation
  • 25. 25 © 2012 IBM Corporation
  • 26. Watson’s advanced analytic capabilities can sort through the equivalent of 26 200 MILLION pages of data to uncover an answer in 3 SECONDS.Corporation © 2012 IBM
  • 27. IBM Big Data Strategy: Move the Analytics Closer to the Data  New analytic applications drive Analytic Applications the requirements for a big data BI / Exploration / Functional Industry Predictive Content Reporting Visualization App BI / App Analytics Analytics platform Reportin g – Integrate and manage the full IBM Big Data Platform variety, velocity and volume of data – Apply advanced analytics to Visualization Application Systems Development Management information in its native form & Discovery – Visualize all available data for ad- hoc analysis Accelerators – Development environment for building new analytic applications Hadoop Hadoop Stream Stream Data – Workload optimization and System System Computing Computing Warehouse scheduling – Security and Governance Information Integration & Governance 27 © 2012 IBM Corporation
  • 28. The IBM Big Data Platform InfoSphere BigInsights Hadoop-based low latency analytics for variety and volume Data-At-Rest MPP Hadoop MPP MPP Stream Information Computing InfoSphere Information Server Integration InfoSphere Streams Low Latency Analytics for High volume data streaming data integration and Velocity, Variety & Volume transformation MPP Data Data-In-Motion Warehouse Netezza High Netezza 1000 Capacity Appliance BI+Ad Hoc Analytics on Queryable Archive for Structured Data Structured Data InfoSphere Warehouse Smart Analytics System Large volume structured Operational Analytics on data analytics Structured Data 28 © 2012 IBM Corporation
  • 29. Deep Analytics Appliance – Revolutionized Analytics Purpose-built analytics appliance Dedicated High Performance Speed: 10-100x faster than traditional systems Disk Storage Simplicity: Minimal administration and tuning Scalability: Peta-scale user data capacity Blades With Custom FPGA Accelerators Smart: High-performance advanced analytics 29 © 2012 IBM Corporation
  • 30. Complementary Analytics Traditional Approach New Approach Structured, analytical, logical Creative, holistic thought, intuition Data Hadoop Warehouse Streams Transaction Data Web Logs Internal App Data Social Data Structured Structured Unstructured Repeatable Enterprise Unstructured Exploratory Mainframe Data Repeatable Integration Exploratory Text Data: emails Linear Iterative Linear Monthly sales reports Iterativesentiment Brand Profitability analysis Product Sensor data: images strategy OLTP System Datasurveys Customer Maximum asset utilization ERP data Traditional New RFID Sources Sources 30 © 2012 IBM Corporation
  • 31. Most Client Use Cases Combine Multiple Technologies Pre-processing Ingest and analyze unstructured data types and convert to structured data Combine structured and unstructured analysis Augment data warehouse with additional external sources, such as social media Combine high velocity and historical analysis Analyze and react to data in motion; adjust models with deep historical analysis Reuse structured data for exploratory analysis Experimentation and ad-hoc analysis with structured data 31 © 2012 IBM Corporation
  • 32. Data Governance ed Some im es term “Sche times t e r me Somet a Fi rs t” ma L d m ater” “Sche  Separation of Duties  Separation of Duties – Users can’t delete their audit logs – No authorized access  Privilege Users  Privilege Users – Monitoring authorized and privilege – Monitoring authorized and privilege user activities user activities  Sensitive Data (in tables)  Sensitive Data (in the file system) – Protecting and blocking unauthorized – Protecting and blocking unauthorized access to the system and the data access to the system and the data  Workflow for audit and compliance  Workflow for audit and compliance controls to validate proper security controls to validate proper security procedures (satisfies regulations!) procedures (satisfies regulations!) 32 Internal Use Only © 2012 IBM Corporation
  • 33. Data Security Design Goals ed Some imes term tim “Sche es termed Somet a Fi rs t” ma L m ater” “Sche  Minimal impact to the Database  Minimal impact to the Big Data server resources server resources  No Database configuration changes  No Big Data configuration changes  Separation of Duties: Preventing  Separation of Duties: Preventing privilege users from doing malicious privilege users/jobs from doing activities malicious activities  Audit trail with very granular details  Audit trail with very granular details of Database activity of Big Data activity  Real-time alerting and blocking  Real-time alerting and blocking  Minimal impact to the network  Minimal impact to the network  100% Database activity visibility  100% Big Data activity visibility  Heterogeneous support  Heterogeneous support 33 Internal Use Only © 2012 IBM Corporation
  • 34. Guardium Customer Identification Privacy Master Data Management Master Data Management Data Privacy Data Privacy InfoSphere InfoSphere Optim for Test Data, Quality DB2 Redaction, +++ MDM Stage Data Customer Intelligence Appliance Data Quality Quality Data Models Out-of-the-box analytics Information Server Cognos Pre-built Customer Integration behavioral IBM Global Business Appliance attributes Services IBM Retail Data Model Core Metrics Unica Enterprise Data Warehouse Enterprise Data Warehouse Applications and Operational Analytics Applications and Operational Analytics Online Archive OLTP and Big Data Integration Managing Growth Managing Growth Built-in Integration into Big Data Built-in Integration into Big Data Optim Data Archive DB2 SAP - HANA solidDB Informix DB2 34 © 2012 IBM Corporation
  • 35. IBM Flattens the Time to Value Big Data Curve + + + + + + Harden Text/ML Velocity Hadoop ETL Dev. Tooling Visualization File System Analytics Harden 100+Big Data Tooling Dev. Text/ML Velocity Hadoop ETL Visualization File System Analytics Business Partners + + + + OR Signed + 35 © 2012 IBM Corporation
  • 36. First Ever Forrester Wave on Big Data “IBM has the deepest Hadoop platform and application portfolio. IBM, an established EDW vendor, has its own Hadoop distribution; an extensive professional services force working on Hadoop projects; extensive R&D programs developing Hadoop technologies; connections to Hadoop from its EDW.” –The Forrester Wave™: Enterprise Hadoop Solutions, 1Q12 36 © 2012 IBM Corporation
  • 37. Open Source Technology Behind Hadoop March 27, 2012 © 2012 IBM Corporation
  • 38. A Difference of Processing Models  SETI@home is a Computational Processing Model – Service for Extraterrestrial Intelligence (SETI) uses unused desktop CPU processing power to perform wide-spread analysis of radio telescope data – Pushes data to the program for processing – Data to function  MapReduce is a Data Processing Model – Data processing primitives of Mappers and Reducers – Can be complex to write MapReduce programs, but a very simple configuration change makes it scale to 1000s of nodes – Function to data 38 © 2012 IBM Corporation
  • 39. Building Blocks of Hadoop 39 © 2012 IBM Corporation
  • 40. Hadoop Framework  Hadoop Common – A utility layer that provides access to the HDFS and projects  HDFS – Data storage platform for Hadoop framework – Can scale to massive size when distributed over multiple computers  MapReduce – Framework to process data across a node clusters – MAP process splits work by first mapping the input across the control nodes of the cluster, then splitting the workload into even smaller data sets and distributing it further throughout the computing cluster • Allows MPP – think DB2 DPF ‘like’ – REDUCE collects and combines the nodes’ answers to deliver a result 40 © 2012 IBM Corporation
  • 41. Hadoop Explained  Hadoop computation model – Data stored in a distributed file system spanning many inexpensive computers – Bring function to the data – Distribute application to the compute resources where the data is stored  Scalable to thousands of nodes and petabytes of data public static class TokenizerMapper public static class TokenizerMapper Hadoop Data Nodes extends Mapper<Object,Text,Text,IntWritable> { extends Mapper<Object,Text,Text,IntWritable> { private final static IntWritable private final static IntWritable one = new IntWritable(1); one = new IntWritable(1); private Text word = new Text(); private Text word = new Text(); public void map(Object key, Text val, Context public void map(Object key, Text val, Context StringTokenizer itr = StringTokenizer itr = new StringTokenizer(val.toString()); new StringTokenizer(val.toString()); 1. Map Phase while (itr.hasMoreTokens()) { while (itr.hasMoreTokens()) { word.set(itr.nextToken()); word.set(itr.nextToken()); context.write(word, one); context.write(word, one); } } (break job into small parts) } } } } public static class IntSumReducer public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWrita 2. Shuffle extends Reducer<Text,IntWritable,Text,IntWrita Distribute map private IntWritable result = new IntWritable(); private IntWritable result = new IntWritable(); public void reduce(Text key, public void reduce(Text key, Iterable<IntWritable> val, Context context){ tasks to cluster Iterable<IntWritable> val, Context context){ (transfer interim output int sum = 0; int sum = 0; for (IntWritable v : val) { for (IntWritable v : val) { sum += v.get(); sum += v.get(); . . . . . . for final processing) MapReduce Application 3. Reduce Phase (boil all output down to Shuffle a single result set) Result Set Return a single result set 41 © 2012 IBM Corporation
  • 42. Growing the Hadoop Environment  Avro: Data serialization system that converts data into a fast, compact binary data format – Avro data can be stored in a file with versioned schemas  Chukwa: Large-scale monitoring system that provides insights into the Hadoop distributed file system and MapReduce  HBase is a scalable, column-oriented distributed database modeled after Google's BigTable distributed storage system  Hive is a data warehouse infrastructure that provides ad hoc query and data summarization for Hadoop Supported data – Hive utilizes SQL-like query language call HiveQL – HiveQL also used by programmers to run custom MapReduce jobs. 42 © 2012 IBM Corporation
  • 43. Growing the Hadoop Environment  Mahout is a data mining library designed to work on the Hadoop framework – Mahout delivers a core set of algorithms designed for clustering, classification and batch-based filtering.  Pig is a high-level programming language and execution framework for parallel computation – Pig works within the Hadoop and MapReduce frameworks  ZooKeeper provides coordination, configuration and group services for distributed applications working over the Hadoop stack 43 © 2012 IBM Corporation
  • 44. Committed to Open Source  Decade of lineage and contributions to the open source community – Apache Hadoop and Jaql, Apache Derby, Apache Geronimo, Apache Jakarta, +++ – Eclipse: founded by IBM – Significant Lucene contributions via IBM Lucene Extension Library (ILEL) – DRDA, XQuery, SQL, XML4J, XERCES, HTTP, Java, Linux, +++  IBM products built on open source – WebSphere: Apache – Rational: Eclipse and Apache – InfoSphere: Eclipse and Apache, +++  IBM’s BigInsights (Hadoop) is 100% open source compatible with no forks 44 © 2012 IBM Corporation
  • 45. Learn Hadoop: At Your Place, At Your Pace Making Learning Hadoop Easy and Fun  Flexible on-line delivery allows learning @your place and @your pace  Free courses, free study materials  Cloud-based sandbox for exercises – zero setup  8500+ registered students  Hadoop Programming Challenge is sending 3 students to IOD 2011 Conference, all expenses paid 45 © 2012 IBM Corporation
  • 46. Why IBM for Big Data The Solution Side Paul Zikopoulos, BA, MBA Director, IBM Information Management Technical Professionals, WW Competitive Database, and Big Data IBM Certified Advanced Technical Expert (Clusters and DRDA) IBM Certified Customer Solutions Expert (DBA and BI) paulz@ca.ibm.com March 27, 2012 © 2012 IBM Corporation
  • 47. A Big Data Platform Analytics Excellence In-Motion Operational At-Rest Operational Excellence Text Analytics Toolkit Excellence Harden Hadoop - GPFS Machine Learning Toolkit Surface Area Lock Down Unrivalled…. Industry Accelerators Development Tooling Embrace and Extend Policy Driven Retention & Immutability Role-Based Security Visualization Tooling Adaptive MapReduce Deployment Tooling (“App Workload Manager Store”) Fast Splittable CMX Compression $14B in 5 yrs. on Analytics REST-exposed Administration +++ Open Source IBMHadoop Big Data +++ Platform In-Motion At-Rest Analyze extreme amounts of Beyond traditional data in milliseconds structured data Uses same analytics as BigInsights uses same analytics as Streams BigInsights No forked, not ported: Hadoop Extended with Data can be analyzed on the way operational excellence and security into the enterprise for earlier Netezza for in-database MapReduce pattern detection MPP Data Warehouses 47 © 2012 IBM Corporation
  • 48. Committed to Innovation  $100M new investment into analytics announced 2Q11  IBM spent $14+ billion in 24 analytics acquisitions in 5 years  IBM has the largest commercial research organization on Earth – 200+ mathematicians developing breakthrough analytics  Largest patent portfolio in the industry For 19 consecutive years IBM inventors have received the most U.S. Patents 6,000 More than 48 patents in a single year © 2012 IBM Corporation
  • 49. A Big Data Platform for Data In-Motion and At-Rest Data Ingest 01011001100011101001001001001 110100101010011100101001111001000100100010010001000100101 11000100101001001011001001010 01100100101001001010100010010 Opportunity Cost Starts Here 01100100101001001010100010010 11000100101001001011001001010 01100100101001001010100010010 Bootstrap 01100100101001001010100010010 Enrich 01100100101001001010100010010 01100100101001001010100010010 11000100101001001011001001010 01100100101001001010100010010 01100100101001001010100010010 01100100101001001010100010010 Adaptive 01100100101001001010100010010 01100100101001001010100010010 Analytics 11000100101001001011001001010 01100100101001001010100010010 Model 01100100101001001010100010010 01100100101001001010100010010 11000100101001001011001001010 49 © 2012 IBM Corporation
  • 50. Data In Motion  Hear what’s going on miles away to optimize perimeter displacements  Perspective: Try to find the word “Zero” in a 1000 MP3 song library in a fraction of a second – Figure out the difference between the sound of a human whisper and the wind 50 © 2012 IBM Corporation
  • 51. Average cluster size is 100 nodes one hour What if of this 100-node cluster cost 51 $34 © 2012 IBM Corporation
  • 52. 52 © 2012 IBM Corporation
  • 53. Ease of Use for Developers and Users End-user Visualization Development Environment Data exploration, crawling, Familiar coding and tooling and analytics environment, testing, and optimization 53 © 2012 IBM Corporation
  • 54. What is BigSheets? Browser-based Big Data analytics tool for business users Big Data Challenges… How can BigSheets help?  Business users need a no  Spreadsheet-like discovery interface lets programming approach for business users easily analyze Big Data analyzing Big Data with ZERO PROGRAMMING  Extremely difficult to find actionable business insights in  BUILT-IN “readers” can work data from multiple sources with with data in several common formats different formats – JSON arrays, CSV, TSV, Web crawler output, . . .  Translating untapped data into actionable business insights is  Users can VISUALLY combine and a common requirement that explore various types of data to identify requires visualization “hidden” insights 54 © 2012 IBM Corporation
  • 55. 55 © 2012 IBM Corporation
  • 56. 8% spread change affecting brand Over 1M tweets analyzed 2nd most Tweets/sec run rate as of 1Q12 56 © 2012 IBM Corporation
  • 57. Big Data Made Easy for the Little Guy  USC’s Film Forecaster correctly predicted a clamor for "Hangover 2” that resulted in $100 million opening over Memorial Day weekend – Looked at 250K-500K Tweets and broke down positive and negative messages using a lexicon of 1700 words The Film Forecaster sounds like a big undertaking for USC, but it really came down to one communications masters student who learned Big Sheets in a day, then pulled in the tweets and analyzed them - Ryan Kim 57 © 2012 IBM Corporation
  • 58. Running Applications from the Web Console 58 © 2012 IBM Corporation
  • 59. A Rich Management Big Data Tool Quick links to common functions Learn more through external Where and how to begin Web resources performing common administrative or analytical tasks 59 © 2012 IBM Corporation
  • 60. A Rich Management Big Data Tool 60 © 2012 IBM Corporation
  • 61. Rich Big Data Development Environment  Eclipse base Development tools – For JQAL, Hive, PIG, Java MapReduce and Text Analytics 61 © 2012 IBM Corporation
  • 62. The Path to Efficiency: Declarative Languages Offline Runtime Development Environment Extracted Declarative Language Declarative Language Select Objects Regex create view MonetizableIntent as select P.name as product, Join I.clue as strength Select from Intent I, Product P Intent Product Join Runtime Runtime where Follows(I.clue, P.name, 0, 20) Regex Product and Not(ContainsRegex(/b(not)b/, Cost-based LeftContext(I.clue, 10))); Intent optimization Input … Documents Join Select Product  Streams, Text Analytics, Machine Regex  High-throughput Learning, SQL all are declarative, simple to learn languages Intent  Small memory footprint  All have strong development tooling and accelerators  Optimizes for tasks:  IBM has been optimizing declarative languages for decades,  Example, Text IN FACT, IBM INVENTED IT! Analytics needs CPU optimization 62 © 2012 IBM Corporation
  • 63. Analytic Accelerators Designed for Variety Text Simple & (listen, verb), Acoustic (radio, noun) Advanced Text Mining in Advanced Microseconds Mathematical Models Predictive ∑ R ( st , at ) population Statistics GeoSpatial Image & Video 63 © 2012 IBM Corporation
  • 64. Accelerators Improve Time to Value Telecommunications Retail Customer CDR streaming analytics Intelligence Deep Network Analytics Customer Behavior and Lifetime Value Analysis Finance Social Media Streaming options trading Analytics Insurance and banking DW Sentiment Analytics, Intent to models purchase Public Data Mining Transportation Streaming statistical Real-time monitoring and analysis routing optimization Over 100 sample User Defined Standard Industry Data applications Toolkits Toolkits Models Banking, Insurance, Telco, Healthcare, Retail 64 © 2012 IBM Corporation
  • 65. Telecommunications CDR Analytic Accelerator Analyze Call Detail Records in Real Time Streaming Analytic Accelerators – CDR dropped call analysis – Determine VIP customers with service issues – proactive alerts – CDR Adapters – ASN.1, Binary, ASCII – Analytic Operators – CDR de-duplication, dropped call detection, termination reason, customer importance – Visualization – real-time KPI dashboard Data Warehouse Appliance – Integrated network, devices, customer, and services model – Telecom model, KPIs, and KQIs 65 © 2012 IBM Corporation
  • 66. Without a Big Data Platform IBM Big Data Platform You Code… Over 100 sample applications and toolkits with industry focused Event toolkits with 300+ functions and Custom SQL Handling and operators! Scripts Multithreading Streams provides development, deployment, runtime, and infrastructure services Check Application Pointing Management Accelerators HA and Toolkits Performance Debug Connectors Optimization “TerraEchos developers can deliver Security applications 45% faster due to the agility of Streams Processing Language…” – Alex Philip, CEO and President 66 © 2012 IBM Corporation
  • 67. Streams Runtime Illustrated Optimizing scheduler assigns PEs Dynamically add to nodes, and continually manages nodes and jobs resource allocation Commodity hardware – laptop, Add in SPSS blades or high performance clusters jobs in the flow Meters Company Usage Model Filter Temp Meters Action Text Season Daily Usage Extract Adjust Adjust Contract Text Extract Degree History Compare History Store History X86 Host X86 Host X86 Host X86 Host X86 Host 67 © 2012 IBM Corporation
  • 68. Streams Runtime Illustrated PEs on busy nodes, can PEs on failing nodes can be be moved manually by the moved automatically, with Streams administrator communications re-routed Meters Company Usage Model Filter Temp Meters Action Text Season Daily Usage Extract Adjust Adjust Contract Text Extract Degree History Compare History Store History X86 Host X86 Host X86 Host X86 Host X86 Host 68 © 2012 IBM Corporation
  • 69. How Text Analytics Works Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1- 0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casilas made the save. Winger Andres Iniesta scored for Spain for the win. World Cup 2010 Highlights Arjen Robben Striker Netherlands Iker Casilas Keeper Spain Andres Iniesta Winger Spain 69 © 2012 IBM Corporation
  • 70. What’s Wrong with Text Analytics Today  Current alternative approaches and infrastructure for text analytics present challenges for analysts – They tend to perform poorly (in terms of accuracy and speed) – They are difficult to use  These alternative approaches rely on the raw text flowing only forward through a system of extractors and filters – Inflexible and inefficient approach, often resulting in redundant processing  Existing toolkits are also limited in their expressiveness – Analysts having to develop custom code – Programmer  Analyst (think Java Developer  DBA struggles) – Leads to more delays, complexity, and difficulties getting it right – Biggest factor hurting analyst productivity is the difficulty in determining how the system produced a certain result 70 © 2012 IBM Corporation
  • 71. Extracting Person and Phone Relationships 71 © 2012 IBM Corporation
  • 72. Text Analytic Toolkit Example  Semantically search Enron’s emails to support queries such as “Find Tom’s phone” in order to find his actual phone number – As opposed to emails containing words Tom and Phone  Need to be able to accurately extract email entities of type Person and Phone and the relationship between them 72 © 2012 IBM Corporation
  • 73. The Enron Email Example  Start with a naïve set of rules to identify a PersonPhone relationship built using some sort of editor – Build out an extractor that know how to find a person’s name: Lorraine Smith – Build out an extractor that knows how to find a phone number: 607-205-4493 607.205.4493 Euro spacing xt. ext. … North American spacing +0119054132112 Globalstar GSP-1600 911, 411 +++ 73 © 2012 IBM Corporation
  • 74. The Tricky Thing About Sentiment… 74 © 2012 IBM Corporation
  • 75. Text Analytics Toolkit  System T text analytics engine previously only embedded in IBM products and hidden from end users – Found in Lotus Notes, IBM e-discovery Analyzer, CCI, InfoSphere Warehouse,+++ – Almost a decade since initial release  BigInsights is the first time IBM opens up the Text Analytics Engine technology for customization and development  BigInsights Text Analytic Toolkit provides developer tools, an easy to use text analytics language, and a set of extractors for fast adoption – Multilingual support, including support for DBCS languages  BigInsights includes Annotator Query Language (AQL): SQL-like! – Fully declarative text analytics language – No “black boxes” or modules that can’t be customized. – Tooling for easy customization because you are abstracted from the programmatic details – Competing solutions make use of locked up black-box modules that cannot be customized, which restricts flexibility and are difficult to optimize for performance 75 © 2012 IBM Corporation
  • 76. Accelerating Analytics – Explainability  Every annotation’s provenance can be visualized  Provenance of Morgan Stanley Fax: 205-4493 shows the FullName rule is responsible for generating incorrect annotation – Solutions? • Remove Morgan Stanley from FullName dictionary? • Create new dictionary for CompanyName? Text Analytics Toolkit 76 © 2012 IBM Corporation
  • 77. Text Analytics Toolkit Provenance Viewer  Major challenge for analysts is determining the lineage of changes that have been applied to text – REALLY difficult to diskern which extractors need to be adjusted to tweak the resulting annotations  Provenance Viewer for interactive visualizations to display output annotations for regression, debugging, version enhancements, +++ – Reduce development of extractors by days to weeks 77 © 2012 IBM Corporation
  • 78. Accelerating Analytics – discovery and Pattern Matching  Contextual Clues discoverer cluster the context surrounding annotation in order to detect frequently occurring patterns  Illustrate clustering results between incorrect PersonPhone pairs – Visualize frequent occurrences (bubble size) of clues: i.e. FAX and TELEFAX – Developer improves precision of annotator by adding a rule that filters out PersonPhone pars if the Phone is preceded by a FAX clue – Developer finds call at text as key clue for rule extraction for PersonPhone Text Analytics Toolkit 78 © 2012 IBM Corporation
  • 79. Accelerating Analytics – Assisted Development  IDE that exposes sophisticated techniques to assist rule developers throughout all states of the development cycle – Promotes agile development because it unifies the business analyst and the developer with a common toolset which fosters understanding • Business Analyst helps mature and validates extractor results • Developer understands visually that AQL enhancements needed to refine rules Text Analytics Toolkit 79 © 2012 IBM Corporation
  • 80. IBM Text Analytics Toolkit - Development Toolset Context Sensitive Help Outline View Hyperlink Navigation Procedural Help 80 © 2012 IBM Corporation
  • 81. BigInsights Text Analytics Development Difference Content Assist Viewer Regular Expression Builder Context Sensitive Help Outline View Hyperlink Navigation Procedural Help Design Time Validation 81 © 2012 IBM Corporation
  • 82. Text Analytics Toolkit – Development Accelerator  Eclipse plug-ins enhance analyst productivity – When writing AQL code, the editor features syntax highlighting, and automatic detection of syntax error at design time not runtime – Pre-built extractors and sample test tools, +++  Reduce coding time and debugging by 30-50%+! 82 © 2012 IBM Corporation
  • 83. Text Analytics Toolkit – Performance Accelerator  AQL code is highly optimized for MapReduce because of its declarative nature  Unlike other frameworks, AQL optimizer determines order of execution of the extractor instructions for maximum efficiency  Deliver analysis up to 10x faster than other leading alternative frameworks running the same extractors 83 © 2012 IBM Corporation
  • 84. 84 © 2012 IBM Corporation
  • 85. Business Task: Find the Pictures that Have Cats 85 © 2012 IBM Corporation
  • 86. BUT! Your Application Returns the Following… PRECISION (a measure of exactness) 2/4 RECALL (a measure of coverage) 2/5 86 © 2012 IBM Corporation
  • 87. IBM Finds RIGHT Answers Better Than Anyone 87 © 2012 IBM Corporation
  • 88. IBM’s Hadoop System Provides Unique Business Value  Optimized beyond open source Hadoop – Workload optimization, security, +++  Integration with enterprise systems – Connectors for multiple data sources  Accelerators reduce development and implementation times – Industry & application accelerators – Analytic accelerators  Visualization tools enable business users to explore Big Data 88 © 2012 IBM Corporation
  • 89. University of Southern Cal Political Debate Monitoring  Solution to measure public sentiment during the Republican Primary and Presidential Debates  Examines trends, volume, and content of millions of public Twitter messages in real-time  Analytic accelerators to understand sentiment (positive, negative, neutral) - Stream Computing and visualization  Benefits - Real-time display of public sentiment as candidates respond to questions - Debate winner prediction based on public opinion instead of solely political analysts 89 © 2012 IBM Corporation
  • 90. Cisco turns to IBM big data for intelligent infrastructure m anagem ent  Optimize building energy consumption with centralized monitoring and control of building monitoring system  Automates preventive and corrective maintenance of building corrective systems  Uses Streams, InfoSphere BigInsights and Cognos - Log Analytics - Energy Bill Forecasting - Energy consumption optimization - Detection of anomalous usage - Presence-aware energy mgt. 90 90 - Policy enforcement 2012 IBM Corporation ©
  • 91. Stream Computing Provides Unique Business Value  Real-time answers = low latency insight – Better outcomes for time sensitive applications (e.g. fraud detection, network management)  Solution when data is too large or expensive to store – Analyze data as it comes to you – Persist data of interest for deeper analysis  Insights derived across multiple streams – Fuse streams for new insights 91 © 2012 IBM Corporation
  • 92. Asian telco reduces billing costs and improves customer satisfaction Capabilities: Stream Computing Analytic Accelerators Real-time mediation and analysis of 6B CDRs per day Data processing time reduced from 12 hrs to 1 sec Hardware cost reduced to 1/8th Proactively address issues (e.g. dropped calls) impacting customer satisfaction. 92 92 © 2012 IBM Corporation
  • 93. Data Warehousing Provides Unique Business Value  Consolidate, manage and reconcile data for enterprise business intelligence  Establish trust, quality and governance where necessary – Financial data – Credit card data – Healthcare  Combine deep and operational analytics  Maintain history for trending and historical reporting 93 © 2012 IBM Corporation
  • 94. Pacific Northwest Smart Grid Demonstration Project Capabilities: Stream Computing – real-time control system Deep Analytics Appliance – analyze massive data sets Demonstrates scalability from 100 to 500K homes while retaining 10 years’ historical data 60k metered customers in 5 states Accommodates ad hoc analysis of price fluctuation, energy consumption profiles, risk, fraud detection, grid health, etc. 94 © 2012 IBM Corporation
  • 95. Information Integration Provides Unique Business Value  Movement of large data sets in batch and real time – Parallel processing engine for efficient data movement  Governance and trust for Big Data – Lineage and meta data of new Big Data data sources – Profile sources to determine trust  Data Quality – Standardize and transform data 95 © 2012 IBM Corporation
  • 96. Leaders integrate and govern massive Big Data Marketing Services with InfoSphere big Leader integrates data for customer intelligence Capabilities Utilized: Information Integration – data quality, ETL Deep Analytics Appliance Complex customer data integration for 54M records/hour Processing 5B simultaneous records 96 9696 © 2012 IBM Corporation
  • 97. @BigData_paulz THINK PDF http://tinyurl.com/88auawg Hardcopy http://tinyurl.com/7ef9ubs March 27, 2012 © 2012 IBM Corporation 97