SlideShare uma empresa Scribd logo
1 de 42
Big Data
            Simon Jeggo
            24 May 2012

© 2011 IBM Corporation    IBM Confidential
Agenda
 What is Big Data


 Some Big Data Use Cases


 IBM’s Big Data Platform

© 2011 IBM Corporation   IBM Confidential
What is
Big Data




 © 2011 IBM Corporation   IBM Confidential
The Big Data Challenge – a Term defined
     “Big Data is a term applied to data sets that are large, complex and dynamic (or a combination thereof) and for which there is a
      requirement to capture, manage and process the data set in its entirety, such that it is not possible to process the data using
      traditional software tools and analytic techniques within tolerable time frames.”
     New technologies that bring cost effective approaches to explore, understand and predict better business outcomes
            MPP databases
            Streams
            In-database analytics
            Apache Hadoop                                                                                                                Automate

            Cloud computing platforms
            Archival storage systems
     Why something different?
          Data x Computation > typical warehouse
          Schema Flexibility
          Programming Flexibility
                                                                                                                                   Integrate     Secure




     We are engaged in over 50 clients, working with them to apply big data techniques to a class of problems -- e.g., text analytics, log analysis,
      customer insights, fraud detection etc.
     We have a set of unique value-adds – JAQL, GPFS, System-T and others coming…
     And we can make BigData for our clients sit in their complex IT environment



      © 2011 IBM Corporation                                           IBM Confidential
4
© 2011 IBM Corporation
                          In 2005 there were 1.3 billion RFID
                           tags in circulation…




IBM Confidential




                          …by the end of 2011, this was about
                          30 billion and growing even faster
An increasingly sensor-enabled and instrumented
          business environment generates HUGE volumes of
             data with MACHINE SPEED characteristics…




               1 BILLION lines of code
     EACH engine generating 10 TB every 30 minutes!
© 2011 IBM Corporation         IBM Confidential
350B Transactions/
                                                                                              Year

                                                                                         Meter Reads
                                                                                        every 15 min.


                         120M – meter reads/month            3.65B – meter reads/day
© 2011 IBM Corporation                    IBM Confidential
 In August of 2010, Adam Savage,
                               of “Myth Busters,” took a photo
                               of his vehicle using his
                               smartphone. He then posted the
                               photo to his Twitter account
                               including the phrase “Off to
                               work.”

                              Since the photo was taken by his
                               smartphone, the image
                               contained metadata revealing the
                               exact geographical location the
                               photo was taken

                                By simply taking and posting a
                                    photo, Savage revealed the exact
                                    location of his home, the vehicle
                                    he drives, and the time he leaves
© 2011 IBM Corporation   IBM Confidential work
                                    for
The Social Layer in a Instrumented Interconnected World
                                                                                      4.6
                                                            30 billion RFID       billion
                                                                tags today         camera
                            12+ TBs                            (1.3B in 2005)      phones
                           of tweet data                                        world wide
                             every day



                                                                                 100s of
                                                                                millions
                                                                                 of GPS
        data every day




                                                                                enabled
 ? TBs of




                                                                                   devices
                                                                                      sold
                                                                                  annually


                                   25+ TBs of                                          2+
                                  log data every                                  billion
                                       day                                      people on
                                                                                 the Web
                                                    76 million smart               by end
                                                     meters in 2009…                2011
                                                      200M by 2014
       © 2011 IBM Corporation                      IBM Confidential
Twitter Tweets per Second Record Breakers of 2011
                                    Social-media analytics can be
                                     used from healthcare to
                                     predicting votes

                                    Challenges
                                         –    Volume
                                         –    Velocity
                                         –    Variety
                                         –    Language Processing: consider that
                                              Twitter sentences are not well
                                              formed and often use
                                              urban talk




  © 2011 IBM Corporation   IBM Confidential
Extract Intent, Life Events, Micro Segmentation Attributes

                                      Chloe

                                              Name, Birthday, Family
                                      Tom Sit

                                                 Not Relevant - Noise
                                      Tina Mu

                                                      Monetizable Intent
                                      Jo Jobs
                                                 Not Relevant - Noise


                              Location                                   Wishful Thinking

© 2011 IBM Corporation       Relocation
                                 Monetizable Intent
                                                      IBM Confidential     SPAMbots
Watson’s advanced analytic capabilities can sort through the equivalent of 200
 © 2011 IBM Corporation
               MILLION pages of data to uncover an answer in 3 SECONDS.
                                         IBM Confidential
1.8 ZB


         1 ZB
       1 ZB=1T GB




                                            4Trillion
                                              8GB
                                             iPods




© 2011 IBM Corporation   IBM Confidential
Cisco turns to IBM big
                                              data for intelligent
                                                infrastructure
                                                management
 Big Data                                   •   Optimize building energy
                                                consumption with centralized

 Use                                        •
                                                monitoring
                                                Automate preventive and
                                                corrective maintenance
 Cases
                                            Capabilities Utilized:
                                                • Streaming Analytics
                                                • Hadoop System
                                                • Business Intelligence

                                            Applications:
                                                 •   Log Analytics
                                                 •   Energy Bill Forecasting
                                                 •   Energy consumption optimization
                                                 •   Detection of anomalous usage
© 2011 IBM Corporation   IBM Confidential        •   Presence-aware energy mgt.
Applications for Big Data Analytics
Smarter Healthcare           Multi-channel                         Finance       Log Analysis
                                 sales




Homeland Security
                             Traffic Control                       Telecom      Search Quality




  Manufacturing             Trading Analytics                 Fraud and Risk   Retail: Churn, NBO




   © 2011 IBM Corporation                       IBM Confidential
Retail Industry

 Issues for the Retail Industry
    Deliver value to empowered customers
         Move from market analysis to understanding individuals
         Take charge of growing volume, velocity and variety of data
    Foster lasting connections
         Focus on relationships, not just transactions
         Invest in expanding the corporate brand
    Capture value, measure results
         Developing complete understanding of the point of sale
         Build new skills and solutions




 © 2011 IBM Corporation                    IBM Confidential
Use Case: Social Media Analytics
Problem
    As consumers continue to adopt social media technologies, businesses must be able to track customer sentiment and brand perception, finding
     new opportunities and avoiding business problems from negative perceptions



                                                                                                Structured/Unstructured data
Solution
   Social Media Analytics
         What consumers and the industry are saying


   Optimizing Internal Operations
         Better utilization of tools for web analytics
         Decreased latency for analysis


   Predictive Analytics
         Promotion targeting for offers
         Prospect harvesting
         POS analytics, predictive and discovery


   Competitive Intelligence
   Unlock information across the web
                                                                                                    What is our next best offer?

    © 2011 IBM Corporation                                        IBM Confidential
Warehouse Off-load Use Case: Transactional Analytics

Problem
        Retailers have massive amounts of transaction data that offers a wealth of information about customer purchasing behavior in stores
        This data isn't being used effectively because of its volume, the cost to store it, and the barriers to analyzing massive data




Solution
         Store POS transactions in BigInsights, reducing the cost from
          traditional data warehousing
         BigInsights enables ad-hoc query for historical reporting, trend
          analysis, and analyst needs
         Data mining feeds for store and customer segmentation, market
          basket analysis, promotion targeting and other analytics
          based solutions
         Historical POS made available for analysis of new product
          introductions, new store openings, and other disruptive business
          events




        © 2011 IBM Corporation                                            IBM Confidential
FSS - Customer Correspondence Analytics
Problem
    Current approaches limit insight and predictive analytics to structured data, limiting insight and losing the “state”
     of the customer
    Human-based review of correspondence is limited to small scale sampling
           Results of sampling are too dependant on the skills of reviewer and cannot learn from information sets outside of that
            human reviewers knowledge
    Detecting and acting on rapidly changing customer sentiment and understanding why a service touch is occurring
     from the customer POV
    The need to take cost out of service touch points while improving effectiveness/intamacy



Solution
      Use of un-instrumented or under-instrumented information source to identify and head-off issues
        •   Extends risk modeling to underutilized sources such as email, chat, social media, call center, and CSR interactions and notes
      Move from small scale sampling to 100% coverage using BigInsights and cross correlation of information sources
        –   Natural language analytics combined with machine learning to identify opportunities and issues that are not apparent in small sample sizes
            and human awareness.
      Use of natural language sophisticated analytics to allow develop a predictive understanding customer actions based on
       customer state
        –   Topic and sentiment extraction from email, chat, social media, call center, and CSR interactions and notes to predict call reasons and next
            best action



  © 2011 IBM Corporation                                          IBM Confidential
FSS - Risk Platforms and Analytics
Problem
   Real-time analytics and need to meet SLA windows are outstripping existing infrastructure capabilities
           Burst-oriented trading close volumes and resulting position analytics are expanding faster than traditional technologies can
            cost effectively meet
           Standard policies of flushing the data after hours or days is not meeting risk modeling needs
   Web, unstructured and machine generated data does not fit existing relational analytics tools
           SQL is not the natural tool to manipulate untapped information sources that can improve the dimension of risk modeling
   The changing nature of risk requires flexibility in sizing, speed and methods that are not easy to respond to with existing
    SQL based platforms


Solution
 Predict, identify and triage risk anomalies in real-time
    – Use of SystemT and SystemML analytics engines to identify problems based on historical data and then push those
      models to Streams
 Use of BigInsights to ingest and analyze hundreds of TB an hour to meet SLA requirements for high
  volume and complex trading operations
 Use of un-instrumented or under-instrumented information source to identify and head-off issues
    •   Extends risk modeling to underutilized sources such as email, chat, social media, call center, and CSR interactions
        and notes



  © 2011 IBM Corporation                                       IBM Confidential
FSS - Social Media Analytics
Problem
 Important source of information, but requires new approaches to collecting, storing, understanding and utilizing the value to
  be found.
        Fuzzy and messy data are the norm
        Little if any of the information is easily structured
 Reconciling external and internal sources
        Identifying individuals among the fog of external data is not easily done but is often necessary
 Linking to known individuals requires Entity analytics concepts and capabilities

Solution
 Ability to acquire, parse, analyze, link and persist external information sources to a variety of analytics
  platforms
    – Use of SystemT and SystemML analytics engines to digest and make sense of external sources

 Sophisticated text/language analytics to allow powerful and accurate understanding of the external
  sources
    – Entity resolution capabilities to match external sources to known customers and groups
    – Graphical interfaces to quickly explore data sets, test hypothesis, create production jobs and synthesize data sources
      from multiple disparate internal and external sources
    – Ability to push normalized data to Netezza for analytics with existing methods and tools


  © 2011 IBM Corporation                                         IBM Confidential
Explosion of data in Telecom
                 From 500PB per month 2011
           To 5,000PB per month 2016



© 2011 IBM Corporation     IBM Confidential
Explosion of Data for Telecom
    > 2 Billion Internet users 2011                                             How to lower
                                                                                network costs
      AT&T Global Network carries 24                                            ($/GB)?
      Petabytes of data PER DAY                                                                 How to improve
                                                                                                data revenue
     5 Billon Mobile Phones WW                    Voice             Traffic    Network          ($/GB)?
     – 550K Android phone              $/bit     Dominant                       Cost
     activated every day                                                                               Profitability
  Twitter process 7 terabytes
                                       Traffic                                                         Gap
                                       Volume
  of data every day                                                                                    (value/GB)

    Facebook processes 10
    terabytes of data every day                                                          Revenues

     Skype 300 Million Min of
     Video Calls Per Month

       YouTube – Massive bits through
       Networks                                                               Data Dominant
       48 Hours of Web of Video
       uploaded per min                                                                              Time
       3 Billion views per day

Telecoms need to be smarter….. smarter networks and smarter business models
© 2011 IBM Corporation                           IBM Confidential
                         All Telecom Enterprises have BIG DATA CHALLENGES
Churn Prediction and Targeted Offers
                            with Social Media Text Analytics

Problem
 Lost revenue and increase customer acquisition cost is directly related to churn
      Churn not only lost customers due to pricing, but to service level, new tech offerings, service offerings, and
       customer perception
 Significant challenge increasing ARPU
      Revenue per customer is much harder to increase as competition increases
 Current churn prediction systems are not up to the challenge
      Too slow and not using social media data

Solution

    Improve churn prediction using social media
        – Analyze social media on its own or with current warehouse/BI analytics to predict churn quicker (real-
          time) and more accurately
        – BigInsights Text Analytics is the key to finding new analytics and Streams for RT alerts
    Discover ARPU opportunities directly from social media
        – New source of customer intent and sentiment will drive new revenue opportunities
        – Real time feedback to marketing systems or warehouse/BI to place offers quickly
        – Finding ready-to-buy customers


  © 2011 IBM Corporation                              IBM Confidential
Real Time CDR Analytics and Ingest

Problem
 Gathering CDR’s, mediating them into relevant data, and moving them to analytical systems is slow and
  costly
      By the time CDR data is mediated and ingested by data warehouses, the ability to respond to problems is significantly
       reduced.
      Systems tend to be old and require extensive application maintenance and hardware
 Cannot achieve real time billing, requires handling billions of CDRs per day, and de-duplication against 15
  days worth of CDR data
Solution

 Big Data Streams Telecommunications Mediation and Analytics (TMA) offering
     – Real-time CDR processing
     – Real-time analytics and dashboard
     – Unparalleled price/performance benefits
     – Connectors to Warehouse and BigInsights
 Real-Time dashboards include:
     – Dropped calls by high priority customers, location, providers, etc
     – Terminated calls by location and customer type
     – Revenue monitoring by voice and SMS
 The solution will enable novel Business Intelligence applications
  © 2011 IBM Corporation                                IBM Confidential
CDR Analytics with Extended Data

Problem
 Telecom is experiencing an explosion of data from 3G and LTE (4G) network traffic. CDR’s are almost
  only used for billing systems because storing and analyzing them was too expensive with EDW and BI
  alone.
 Competition driving the need for focus on:
    customer retention
    customer profitability
 No connection between CDR, Web, and other data making everything from fraud detection to targeted
  marketing to ad optimization difficult and expensive
Solution
 BigInsights for cost effective store of original data and large-scale text analytics
     – Stores data unstructured and non-typed ingested with no data model
     – Discovery and Analytics tools are built into BigInsights – Machine Learning extensions
     – Integration to Netezza and DB2. JDBC to other data bases
 Big Data Streams Telecommunications Mediation and Analytics (TMA) offering
     – Real-time CDR processing can be extended to other data sources – fast and low cost
 Netezza integration opens Big Data solutions to warehouse and BI
     – Deep analytics and model development
     – Can act as a high performance operational data store
  © 2011 IBM Corporation                         IBM Confidential
Ad Effectiveness Analysis with Social Media

Problem
 Telecom and Media spend large sums of money on advertising. Measuring the effectiveness of the Ads
  difficult and almost impossible online without costly services
    Service providers are slow with responses and expensive
    Current ad analysis is mostly guesswork and intuition – not lending itself to timely decisions
 Enterprises are demanding better ROI from ad budgets and proof of effectiveness of each ad campaign
    To increase effectiveness, enterprises have to react in near-real-time

Solution

 BigInsights used for social media ingest and fast analysis
   – Answers questions like what was the awareness, who did we reach, and what was the reaction to an
     ad in a few hours vs weeks
   – Offers ad departments to react: modify, localize, and focus
 Streams for real-time ad analysis extending predictive models for fast reaction
 React very quickly to ad effectiveness
   1.   Adjust ad budgets
   2.   Tailor ad’s to geography
   3.   Alter messaging
   4.   Adjust targeted and direct marketing initiatives
 © 2011 IBM Corporation                                 IBM Confidential
Why IBM
                  for Big Data
                The Solution Side




© 2011 IBM Corporation        IBM Confidential
The IBM Big Data Platform


                                                                 InfoSphere BigInsights
                                                              Hadoop-based low latency
                                                            analytics for variety and volume

                                                                      Hadoop


                             Information Integration                                            Stream Computing

InfoSphere Information Server                                                                                        InfoSphere Streams
High volume data integration and                                                                                   Low Latency Analytics for
         transformation                                                                                                 streaming data




                                                           MPP Data Warehouse




                                                                                                                    IBM Smart Analytics
        IBM InfoSphere                       IBM Netezza High                          IBM Netezza 1000
                                                                                                                         System
          Warehouse                         Capacity Appliance                            BI+Ad Hoc
                                                                                                                   Operational Analytics on
  Large volume structured data               Queryable Archive                   Analytics on Structured Data
                                                                                                                       Structured Data
            analytics                         Structured Data

       © 2011 IBM Corporation                                       IBM Confidential
A Big Data Platform
Analytics Excellence                 In-Motion Operational Excellence         At-Rest Operational Excellence
Text Analytics Toolkit                          Unrivalled….                                 Harden Hadoop - GPFS
Machine Learning Toolkit
Industry Accelerators Development   Embrace and Extend                                     Surface Area Lock Down

Tooling Visualization Tooling                                                   Policy Driven Retention & Immutability
Deployment Tooling (“App Store”)                                                                  Role-Based Security
$14B in 5 yrs. on Analytics                                                                      Adaptive MapReduce
+++                                                                                                 Workload Manager
                                                                                   Fast Splittable CMX Compression
                                                                           REST-exposed Administration              ++
                                                                                                                     +

        In-Motion                               Open Source
                                                                                 At-Rest
                                               IBM Big Data
                                                  Hadoop
     Analyze extreme amounts of                  Platform                       Beyond traditional
        data in milliseconds                                                     structured data
     Uses same analytics as BigInsights                               BigInsights uses same analytics as Streams

    Data can be analyzed on the way into                              No forked, not ported: Hadoop Extended with
      the enterprise for earlier pattern                                   operational excellence and security
                  detection                                              Netezza for in-database MapReduce
                                                                                MPP Data Warehouses
      © 2011 IBM Corporation                       IBM Confidential
Stream Computing: A new paradigm for ultra low latency
                         and high throughput in-motion analytics
     Continuous Ingestion                  Continuous Queries /Analytics on data in motion




© 2011 IBM Corporation                    IBM Confidential
Data In Motion
                           Information used to be aggregated and analyzed every 30-60
                            minutes and discarded after 72 hours
                           Analyzing 1000 pieces of unique medical diagnostic information
                            per/sec. and stored in a dynamic model
                           Perspective: 20% drop in mortality of control group in trials
                            (extend approach to daily activities)
                              -   120 children monitored:120K messages/sec…billions/day




© 2011 IBM Corporation                    IBM Confidential
Data In Motion

                                 Hear what’s going on miles away to optimize
                                  perimeter displacements

                                 Perspective: Try to find the word “Zero” in a
                                  1000 MP3 song library in a fraction of a second
                                   – Figure out the difference between the sound of a
                                     human whisper and the wind




© 2011 IBM Corporation              IBM Confidential
Data In Motion – Improving What They
                         Already Have
                             Old Microsoft-based solution not able to keep up with
                              new 3G demands for their real-time xDR analysis
                              business requirements

                             Streams and Netezza solution proposed
                               – Time to merge and load data reduced 90%+
                               – Time to market for new products from 4 hours to minutes




© 2011 IBM Corporation                IBM Confidential
                                             Internal Use Only Reference
How Text Analytics Works

                              Football World Cup 2010, one team distinguished
                              themselves well, losing to the eventual champions 1-0 in
                              the Final. Early in the second half,

                              Netherlands’ striker, Arjen Robben, had a breakaway, but
                              the keeper for Spain, Iker Casilas made the save. Winger
                              Andres Iniesta scored for Spain for the win.
                         World Cup 2010 Highlights



                          Arjen Robben                   Striker        Netherlands
                           Iker Casilas                  Keeper            Spain
                          Andres Iniesta                 Winger            Spain




© 2011 IBM Corporation                               IBM Confidential
IBM Text Analytics Toolkit Lets You…
 Build out world-class text analysis applications 50% faster than manual method
 Run faster text analysis (10x or more vs. some marketplace alternatives)
 Get more precise and correct answers (2x vs. some marketplace alternatives)




     © 2011 IBM Corporation                        IBM Confidential
What is BigSheets?
      Browser-based Big Data analytics tool for business users
  Big Data Challenges…                                   How can BigSheets help?
 Business users need a no                          Spreadsheet-like discovery interface lets
  programming approach for                           business users easily analyze Big Data
  analyzing Big Data                                 with ZERO PROGRAMMING


 Extremely difficult to find
                                                    BUILT-IN “readers” can work with data
  actionable business insights in
                                                         in several common formats
  data from multiple sources with
                                                          – JSON arrays, CSV, TSV, Web
  different formats
                                                             crawler output, . . .

 Translating untapped data into
                                                    Users can VISUALLY combine and
  actionable business insights is a
  common requirement that requires                   explore various types of data to identify
  visualization                                      “hidden” insights

   © 2011 IBM Corporation             IBM Confidential
© 2011 IBM Corporation   IBM Confidential
Big Data Made Easy for the Little Guy

 USC’s Film Forecaster correctly predicted a
  clamor for "Hangover 2” that resulted in $100
  million opening over Memorial Day weekend
    – Looked at 250K-500K Tweets and broke down
      positive and negative messages using a lexicon
      of 1700 words




                                                          The Film Forecaster sounds like a big
                                                          undertaking for USC, but it really came
                                                          down to one communications masters
                                                          student who learned Big Sheets in
                                                          a day, then pulled in the tweets and
                                                          analyzed them - Ryan Kim


 © 2011 IBM Corporation                IBM Confidential
Why IBM for Big Data?
 Only IBM is showing data-in-motion and data-at-rest analytics: a bigger more
  opportunistic view of Big Data


 Development and research sit side by side
 Virtualization tooling, development, file system, analytics
 Not just same company: same org, same people, same leadership


 BigInsights being used in
  IBM products today such
  as Cognos Consumer Insight




    © 2011 IBM Corporation              IBM Confidential
Without a Big Data Platform                                                       IBM Big Data Platform
                   You Code…

                                                                             Over 100 sample applications and
                                                                           toolkits with industry focused toolkits
      Event               Custom SQL                                         with 300+ functions and operators!
     Handling                 and
                            Scripts
                                         Multithreading                    Streams provides development, deployment,
                                                                               runtime, and infrastructure services
 Check            Application
Pointing          Management
                                                            Accelerators
                                       HA                       and

                                                              Toolkits




                          Performance          Debug
      Connectors
                          Optimization



                                                                           “TerraEchos developers can deliver
Security                                                                   applications 45% faster due to the agility
                                                                           of Streams Processing Language…”
                                                                           – Alex Philip, CEO and President
 © 2011 IBM Corporation                                   IBM Confidential
THINK

      https://w3-connections.ibm.com/wikis/home?lang=en_US#/wiki/Info%20Mgmt%20Client%20Technical
                  %20Professional%20Resources%20Wiki/page/Understanding%20Big%20Data
     © 2011 IBM Corporation                    IBM Confidential
42

Mais conteúdo relacionado

Mais procurados

Brian pickering introduction to seserv - seserv se workshop june 2012
Brian pickering   introduction to seserv - seserv se workshop june 2012Brian pickering   introduction to seserv - seserv se workshop june 2012
Brian pickering introduction to seserv - seserv se workshop june 2012ictseserv
 
Cloud 2015: The Road to 15 Billion Connected Devices
Cloud 2015: The Road to 15 Billion Connected DevicesCloud 2015: The Road to 15 Billion Connected Devices
Cloud 2015: The Road to 15 Billion Connected DevicesIntel IT Center
 
Android fragmentation, a valid concern?
Android fragmentation, a valid concern?Android fragmentation, a valid concern?
Android fragmentation, a valid concern?androidaalto
 
The Creativity Machine
The Creativity MachineThe Creativity Machine
The Creativity MachineSal
 
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big SocietyPresentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big SocietySURFnet
 
IBM Smart Camp: Philippe Souidi on Big Data
IBM Smart Camp: Philippe Souidi on Big DataIBM Smart Camp: Philippe Souidi on Big Data
IBM Smart Camp: Philippe Souidi on Big DataPhilippe Souidi
 
Wireless M2M - bringing smart interactivity into the world of moving machines...
Wireless M2M - bringing smart interactivity into the world of moving machines...Wireless M2M - bringing smart interactivity into the world of moving machines...
Wireless M2M - bringing smart interactivity into the world of moving machines...Mobile Monday Brussels
 
Gamechanics Talk - Joddy Hernady
Gamechanics Talk - Joddy HernadyGamechanics Talk - Joddy Hernady
Gamechanics Talk - Joddy HernadySegitiga.Net
 
Innovation & Hyper Connectivity
Innovation & Hyper ConnectivityInnovation & Hyper Connectivity
Innovation & Hyper Connectivitykoumanolis
 
Javier salcedo cloud computing - seserv se workshop june 2012
Javier salcedo   cloud computing - seserv se workshop june 2012Javier salcedo   cloud computing - seserv se workshop june 2012
Javier salcedo cloud computing - seserv se workshop june 2012ictseserv
 
Maximise productivity through dynamic, virtual technology (IBM Websphere)
Maximise productivity through dynamic, virtual technology (IBM Websphere)Maximise productivity through dynamic, virtual technology (IBM Websphere)
Maximise productivity through dynamic, virtual technology (IBM Websphere)IBM Danmark
 
Optimizing Email for Mobile Devices
Optimizing Email for Mobile DevicesOptimizing Email for Mobile Devices
Optimizing Email for Mobile DevicesSilverpop
 
Digital consumer 数字消费者
Digital consumer 数字消费者Digital consumer 数字消费者
Digital consumer 数字消费者Ian Hou
 
World Newspaper Congress 11: Technology Session, Adam Bird
World Newspaper Congress 11: Technology Session, Adam BirdWorld Newspaper Congress 11: Technology Session, Adam Bird
World Newspaper Congress 11: Technology Session, Adam BirdWAN-IFRA
 
Huawei - Mobile Networking evolved
Huawei - Mobile Networking evolvedHuawei - Mobile Networking evolved
Huawei - Mobile Networking evolvedJeremie Tisseau
 
Signal rich media white paper
Signal rich media white paperSignal rich media white paper
Signal rich media white papercbrandis
 
Google Represented at The Mobile VAS 2009
Google Represented at The Mobile VAS 2009Google Represented at The Mobile VAS 2009
Google Represented at The Mobile VAS 2009Paritosh Sharma
 

Mais procurados (20)

Brian pickering introduction to seserv - seserv se workshop june 2012
Brian pickering   introduction to seserv - seserv se workshop june 2012Brian pickering   introduction to seserv - seserv se workshop june 2012
Brian pickering introduction to seserv - seserv se workshop june 2012
 
Cloud 2015: The Road to 15 Billion Connected Devices
Cloud 2015: The Road to 15 Billion Connected DevicesCloud 2015: The Road to 15 Billion Connected Devices
Cloud 2015: The Road to 15 Billion Connected Devices
 
Android fragmentation, a valid concern?
Android fragmentation, a valid concern?Android fragmentation, a valid concern?
Android fragmentation, a valid concern?
 
16h30 p duff-big-data-final
16h30   p duff-big-data-final16h30   p duff-big-data-final
16h30 p duff-big-data-final
 
The Creativity Machine
The Creativity MachineThe Creativity Machine
The Creativity Machine
 
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big SocietyPresentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
 
IBM Smart Camp: Philippe Souidi on Big Data
IBM Smart Camp: Philippe Souidi on Big DataIBM Smart Camp: Philippe Souidi on Big Data
IBM Smart Camp: Philippe Souidi on Big Data
 
Web². The Internet of everything
Web². The Internet of everythingWeb². The Internet of everything
Web². The Internet of everything
 
Wireless M2M - bringing smart interactivity into the world of moving machines...
Wireless M2M - bringing smart interactivity into the world of moving machines...Wireless M2M - bringing smart interactivity into the world of moving machines...
Wireless M2M - bringing smart interactivity into the world of moving machines...
 
Gamechanics Talk - Joddy Hernady
Gamechanics Talk - Joddy HernadyGamechanics Talk - Joddy Hernady
Gamechanics Talk - Joddy Hernady
 
Innovation & Hyper Connectivity
Innovation & Hyper ConnectivityInnovation & Hyper Connectivity
Innovation & Hyper Connectivity
 
Javier salcedo cloud computing - seserv se workshop june 2012
Javier salcedo   cloud computing - seserv se workshop june 2012Javier salcedo   cloud computing - seserv se workshop june 2012
Javier salcedo cloud computing - seserv se workshop june 2012
 
Maximise productivity through dynamic, virtual technology (IBM Websphere)
Maximise productivity through dynamic, virtual technology (IBM Websphere)Maximise productivity through dynamic, virtual technology (IBM Websphere)
Maximise productivity through dynamic, virtual technology (IBM Websphere)
 
Optimizing Email for Mobile Devices
Optimizing Email for Mobile DevicesOptimizing Email for Mobile Devices
Optimizing Email for Mobile Devices
 
Digital consumer 数字消费者
Digital consumer 数字消费者Digital consumer 数字消费者
Digital consumer 数字消费者
 
World Newspaper Congress 11: Technology Session, Adam Bird
World Newspaper Congress 11: Technology Session, Adam BirdWorld Newspaper Congress 11: Technology Session, Adam Bird
World Newspaper Congress 11: Technology Session, Adam Bird
 
InfoBulletin February 2011
InfoBulletin February 2011InfoBulletin February 2011
InfoBulletin February 2011
 
Huawei - Mobile Networking evolved
Huawei - Mobile Networking evolvedHuawei - Mobile Networking evolved
Huawei - Mobile Networking evolved
 
Signal rich media white paper
Signal rich media white paperSignal rich media white paper
Signal rich media white paper
 
Google Represented at The Mobile VAS 2009
Google Represented at The Mobile VAS 2009Google Represented at The Mobile VAS 2009
Google Represented at The Mobile VAS 2009
 

Destaque

Dtcc ibm big data platform 2012-final_cn
Dtcc ibm big data platform 2012-final_cnDtcc ibm big data platform 2012-final_cn
Dtcc ibm big data platform 2012-final_cnyp_fangdong
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
How Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At SpotifyHow Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At SpotifyJosh Baer
 
JORDANrm10
JORDANrm10JORDANrm10
JORDANrm10Year56
 
Welcome to NiceMeeting
Welcome to NiceMeetingWelcome to NiceMeeting
Welcome to NiceMeetingNiceMeeting
 
Корпоративные детские программы
Корпоративные детские программыКорпоративные детские программы
Корпоративные детские программыSunrise child care
 
WILLr8
WILLr8WILLr8
WILLr8Year56
 
Lo kuu razissue7
Lo kuu razissue7Lo kuu razissue7
Lo kuu razissue7j30silva
 
ChLoE rm10
ChLoE rm10ChLoE rm10
ChLoE rm10Year56
 
Q.r. codes at gps
Q.r. codes at gpsQ.r. codes at gps
Q.r. codes at gpsYear56
 
Celina R10
Celina R10Celina R10
Celina R10Year56
 
Mc chat aats!
Mc chat aats!Mc chat aats!
Mc chat aats!aats04
 
hollee r10
hollee r10hollee r10
hollee r10Year56
 
From inspiration to booking... who ate the last cookie?
From inspiration to booking... who ate the last cookie?From inspiration to booking... who ate the last cookie?
From inspiration to booking... who ate the last cookie?Martino Matijevic
 
Marine - David
Marine - DavidMarine - David
Marine - DavidYear56
 
Schoolmentalhealth 090526112700-phpapp02-5
Schoolmentalhealth 090526112700-phpapp02-5Schoolmentalhealth 090526112700-phpapp02-5
Schoolmentalhealth 090526112700-phpapp02-5jbmlfm1234
 

Destaque (20)

Dtcc ibm big data platform 2012-final_cn
Dtcc ibm big data platform 2012-final_cnDtcc ibm big data platform 2012-final_cn
Dtcc ibm big data platform 2012-final_cn
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
How Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At SpotifyHow Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At Spotify
 
JORDANrm10
JORDANrm10JORDANrm10
JORDANrm10
 
Welcome to NiceMeeting
Welcome to NiceMeetingWelcome to NiceMeeting
Welcome to NiceMeeting
 
Корпоративные детские программы
Корпоративные детские программыКорпоративные детские программы
Корпоративные детские программы
 
WILLr8
WILLr8WILLr8
WILLr8
 
Lo kuu razissue7
Lo kuu razissue7Lo kuu razissue7
Lo kuu razissue7
 
Escultura
EsculturaEscultura
Escultura
 
ChLoE rm10
ChLoE rm10ChLoE rm10
ChLoE rm10
 
Almoxarife 2014
Almoxarife 2014Almoxarife 2014
Almoxarife 2014
 
Q.r. codes at gps
Q.r. codes at gpsQ.r. codes at gps
Q.r. codes at gps
 
Celina R10
Celina R10Celina R10
Celina R10
 
Mc chat aats!
Mc chat aats!Mc chat aats!
Mc chat aats!
 
Nachra3
Nachra3Nachra3
Nachra3
 
hollee r10
hollee r10hollee r10
hollee r10
 
From inspiration to booking... who ate the last cookie?
From inspiration to booking... who ate the last cookie?From inspiration to booking... who ate the last cookie?
From inspiration to booking... who ate the last cookie?
 
Marine - David
Marine - DavidMarine - David
Marine - David
 
Schoolmentalhealth 090526112700-phpapp02-5
Schoolmentalhealth 090526112700-phpapp02-5Schoolmentalhealth 090526112700-phpapp02-5
Schoolmentalhealth 090526112700-phpapp02-5
 
Presentation ICHC 2015
Presentation ICHC 2015Presentation ICHC 2015
Presentation ICHC 2015
 

Semelhante a Ibm swg day 2012 jhb big data (white)

Konceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBMKonceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBMIBM Danmark
 
Big data 20120327
Big data 20120327Big data 20120327
Big data 20120327Accenture
 
IBM-Why Big Data?
IBM-Why Big Data?IBM-Why Big Data?
IBM-Why Big Data?Kun Le
 
Smarter Planet: How Big Data changes our world
Smarter Planet: How Big Data changes our worldSmarter Planet: How Big Data changes our world
Smarter Planet: How Big Data changes our worldKim Escherich
 
Intel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntelAPAC
 
Kim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our WorldKim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our WorldBigDataViz
 
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)Mark Heid
 
Intel Cloud Summit: Big Data
Intel Cloud Summit: Big DataIntel Cloud Summit: Big Data
Intel Cloud Summit: Big DataIntelAPAC
 
Big data and big content
Big data and big contentBig data and big content
Big data and big contentJohn Mancini
 
Future of technical innovation 3 trends that impact enterprise users
Future of technical innovation   3 trends that impact enterprise usersFuture of technical innovation   3 trends that impact enterprise users
Future of technical innovation 3 trends that impact enterprise usersJohn Gibbon
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...DATAVERSITY
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...Vladimir Bacvanski, PhD
 
Big Data: Industry trends and key players
Big Data: Industry trends and key playersBig Data: Industry trends and key players
Big Data: Industry trends and key playersCM Research
 
Business Intelligence on AWS Redshift
Business Intelligence on AWS RedshiftBusiness Intelligence on AWS Redshift
Business Intelligence on AWS RedshiftAgileiss
 
Mesa Big Data 2nd Screen Final
Mesa Big Data 2nd Screen FinalMesa Big Data 2nd Screen Final
Mesa Big Data 2nd Screen FinalTripp Payne
 

Semelhante a Ibm swg day 2012 jhb big data (white) (20)

Konceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBMKonceptuelt overblik over Big Data, Flemming Bagger, IBM
Konceptuelt overblik over Big Data, Flemming Bagger, IBM
 
Big data 20120327
Big data 20120327Big data 20120327
Big data 20120327
 
IBM-Why Big Data?
IBM-Why Big Data?IBM-Why Big Data?
IBM-Why Big Data?
 
ibm
ibmibm
ibm
 
Big data
Big dataBig data
Big data
 
Smarter Planet: How Big Data changes our world
Smarter Planet: How Big Data changes our worldSmarter Planet: How Big Data changes our world
Smarter Planet: How Big Data changes our world
 
Intel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick Knupffer
 
Kim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our WorldKim Escherich - How Big Data Transforms Our World
Kim Escherich - How Big Data Transforms Our World
 
Big Data on AWS
Big Data on AWSBig Data on AWS
Big Data on AWS
 
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
 
Intel Cloud Summit: Big Data
Intel Cloud Summit: Big DataIntel Cloud Summit: Big Data
Intel Cloud Summit: Big Data
 
Big data and big content
Big data and big contentBig data and big content
Big data and big content
 
Future of technical innovation 3 trends that impact enterprise users
Future of technical innovation   3 trends that impact enterprise usersFuture of technical innovation   3 trends that impact enterprise users
Future of technical innovation 3 trends that impact enterprise users
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
 
Big data and its impact on indian business
Big data and its impact on indian businessBig data and its impact on indian business
Big data and its impact on indian business
 
Big Data: Industry trends and key players
Big Data: Industry trends and key playersBig Data: Industry trends and key players
Big Data: Industry trends and key players
 
Business Intelligence on AWS Redshift
Business Intelligence on AWS RedshiftBusiness Intelligence on AWS Redshift
Business Intelligence on AWS Redshift
 
Big Data and Cloud Analytics
Big Data and Cloud AnalyticsBig Data and Cloud Analytics
Big Data and Cloud Analytics
 
Mesa Big Data 2nd Screen Final
Mesa Big Data 2nd Screen FinalMesa Big Data 2nd Screen Final
Mesa Big Data 2nd Screen Final
 

Último

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 

Último (20)

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 

Ibm swg day 2012 jhb big data (white)

  • 1. Big Data Simon Jeggo 24 May 2012 © 2011 IBM Corporation IBM Confidential
  • 2. Agenda What is Big Data Some Big Data Use Cases IBM’s Big Data Platform © 2011 IBM Corporation IBM Confidential
  • 3. What is Big Data © 2011 IBM Corporation IBM Confidential
  • 4. The Big Data Challenge – a Term defined  “Big Data is a term applied to data sets that are large, complex and dynamic (or a combination thereof) and for which there is a requirement to capture, manage and process the data set in its entirety, such that it is not possible to process the data using traditional software tools and analytic techniques within tolerable time frames.”  New technologies that bring cost effective approaches to explore, understand and predict better business outcomes  MPP databases  Streams  In-database analytics  Apache Hadoop Automate  Cloud computing platforms  Archival storage systems  Why something different?  Data x Computation > typical warehouse  Schema Flexibility  Programming Flexibility Integrate Secure  We are engaged in over 50 clients, working with them to apply big data techniques to a class of problems -- e.g., text analytics, log analysis, customer insights, fraud detection etc.  We have a set of unique value-adds – JAQL, GPFS, System-T and others coming…  And we can make BigData for our clients sit in their complex IT environment © 2011 IBM Corporation IBM Confidential 4
  • 5. © 2011 IBM Corporation In 2005 there were 1.3 billion RFID tags in circulation… IBM Confidential …by the end of 2011, this was about 30 billion and growing even faster
  • 6. An increasingly sensor-enabled and instrumented business environment generates HUGE volumes of data with MACHINE SPEED characteristics… 1 BILLION lines of code EACH engine generating 10 TB every 30 minutes! © 2011 IBM Corporation IBM Confidential
  • 7. 350B Transactions/ Year Meter Reads every 15 min. 120M – meter reads/month 3.65B – meter reads/day © 2011 IBM Corporation IBM Confidential
  • 8.  In August of 2010, Adam Savage, of “Myth Busters,” took a photo of his vehicle using his smartphone. He then posted the photo to his Twitter account including the phrase “Off to work.”  Since the photo was taken by his smartphone, the image contained metadata revealing the exact geographical location the photo was taken  By simply taking and posting a photo, Savage revealed the exact location of his home, the vehicle he drives, and the time he leaves © 2011 IBM Corporation IBM Confidential work for
  • 9. The Social Layer in a Instrumented Interconnected World 4.6 30 billion RFID billion tags today camera 12+ TBs (1.3B in 2005) phones of tweet data world wide every day 100s of millions of GPS data every day enabled ? TBs of devices sold annually 25+ TBs of 2+ log data every billion day people on the Web 76 million smart by end meters in 2009… 2011 200M by 2014 © 2011 IBM Corporation IBM Confidential
  • 10. Twitter Tweets per Second Record Breakers of 2011  Social-media analytics can be used from healthcare to predicting votes  Challenges – Volume – Velocity – Variety – Language Processing: consider that Twitter sentences are not well formed and often use urban talk © 2011 IBM Corporation IBM Confidential
  • 11. Extract Intent, Life Events, Micro Segmentation Attributes Chloe Name, Birthday, Family Tom Sit Not Relevant - Noise Tina Mu Monetizable Intent Jo Jobs Not Relevant - Noise Location Wishful Thinking © 2011 IBM Corporation Relocation Monetizable Intent IBM Confidential SPAMbots
  • 12. Watson’s advanced analytic capabilities can sort through the equivalent of 200 © 2011 IBM Corporation MILLION pages of data to uncover an answer in 3 SECONDS. IBM Confidential
  • 13. 1.8 ZB 1 ZB 1 ZB=1T GB 4Trillion 8GB iPods © 2011 IBM Corporation IBM Confidential
  • 14. Cisco turns to IBM big data for intelligent infrastructure management Big Data • Optimize building energy consumption with centralized Use • monitoring Automate preventive and corrective maintenance Cases Capabilities Utilized: • Streaming Analytics • Hadoop System • Business Intelligence Applications: • Log Analytics • Energy Bill Forecasting • Energy consumption optimization • Detection of anomalous usage © 2011 IBM Corporation IBM Confidential • Presence-aware energy mgt.
  • 15. Applications for Big Data Analytics Smarter Healthcare Multi-channel Finance Log Analysis sales Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail: Churn, NBO © 2011 IBM Corporation IBM Confidential
  • 16. Retail Industry  Issues for the Retail Industry  Deliver value to empowered customers  Move from market analysis to understanding individuals  Take charge of growing volume, velocity and variety of data  Foster lasting connections  Focus on relationships, not just transactions  Invest in expanding the corporate brand  Capture value, measure results  Developing complete understanding of the point of sale  Build new skills and solutions © 2011 IBM Corporation IBM Confidential
  • 17. Use Case: Social Media Analytics Problem  As consumers continue to adopt social media technologies, businesses must be able to track customer sentiment and brand perception, finding new opportunities and avoiding business problems from negative perceptions Structured/Unstructured data Solution  Social Media Analytics  What consumers and the industry are saying  Optimizing Internal Operations  Better utilization of tools for web analytics  Decreased latency for analysis  Predictive Analytics  Promotion targeting for offers  Prospect harvesting  POS analytics, predictive and discovery  Competitive Intelligence  Unlock information across the web What is our next best offer? © 2011 IBM Corporation IBM Confidential
  • 18. Warehouse Off-load Use Case: Transactional Analytics Problem  Retailers have massive amounts of transaction data that offers a wealth of information about customer purchasing behavior in stores  This data isn't being used effectively because of its volume, the cost to store it, and the barriers to analyzing massive data Solution  Store POS transactions in BigInsights, reducing the cost from traditional data warehousing  BigInsights enables ad-hoc query for historical reporting, trend analysis, and analyst needs  Data mining feeds for store and customer segmentation, market basket analysis, promotion targeting and other analytics based solutions  Historical POS made available for analysis of new product introductions, new store openings, and other disruptive business events © 2011 IBM Corporation IBM Confidential
  • 19. FSS - Customer Correspondence Analytics Problem  Current approaches limit insight and predictive analytics to structured data, limiting insight and losing the “state” of the customer  Human-based review of correspondence is limited to small scale sampling  Results of sampling are too dependant on the skills of reviewer and cannot learn from information sets outside of that human reviewers knowledge  Detecting and acting on rapidly changing customer sentiment and understanding why a service touch is occurring from the customer POV  The need to take cost out of service touch points while improving effectiveness/intamacy Solution  Use of un-instrumented or under-instrumented information source to identify and head-off issues • Extends risk modeling to underutilized sources such as email, chat, social media, call center, and CSR interactions and notes  Move from small scale sampling to 100% coverage using BigInsights and cross correlation of information sources – Natural language analytics combined with machine learning to identify opportunities and issues that are not apparent in small sample sizes and human awareness.  Use of natural language sophisticated analytics to allow develop a predictive understanding customer actions based on customer state – Topic and sentiment extraction from email, chat, social media, call center, and CSR interactions and notes to predict call reasons and next best action © 2011 IBM Corporation IBM Confidential
  • 20. FSS - Risk Platforms and Analytics Problem  Real-time analytics and need to meet SLA windows are outstripping existing infrastructure capabilities  Burst-oriented trading close volumes and resulting position analytics are expanding faster than traditional technologies can cost effectively meet  Standard policies of flushing the data after hours or days is not meeting risk modeling needs  Web, unstructured and machine generated data does not fit existing relational analytics tools  SQL is not the natural tool to manipulate untapped information sources that can improve the dimension of risk modeling  The changing nature of risk requires flexibility in sizing, speed and methods that are not easy to respond to with existing SQL based platforms Solution  Predict, identify and triage risk anomalies in real-time – Use of SystemT and SystemML analytics engines to identify problems based on historical data and then push those models to Streams  Use of BigInsights to ingest and analyze hundreds of TB an hour to meet SLA requirements for high volume and complex trading operations  Use of un-instrumented or under-instrumented information source to identify and head-off issues • Extends risk modeling to underutilized sources such as email, chat, social media, call center, and CSR interactions and notes © 2011 IBM Corporation IBM Confidential
  • 21. FSS - Social Media Analytics Problem  Important source of information, but requires new approaches to collecting, storing, understanding and utilizing the value to be found.  Fuzzy and messy data are the norm  Little if any of the information is easily structured  Reconciling external and internal sources  Identifying individuals among the fog of external data is not easily done but is often necessary  Linking to known individuals requires Entity analytics concepts and capabilities Solution  Ability to acquire, parse, analyze, link and persist external information sources to a variety of analytics platforms – Use of SystemT and SystemML analytics engines to digest and make sense of external sources  Sophisticated text/language analytics to allow powerful and accurate understanding of the external sources – Entity resolution capabilities to match external sources to known customers and groups – Graphical interfaces to quickly explore data sets, test hypothesis, create production jobs and synthesize data sources from multiple disparate internal and external sources – Ability to push normalized data to Netezza for analytics with existing methods and tools © 2011 IBM Corporation IBM Confidential
  • 22. Explosion of data in Telecom From 500PB per month 2011 To 5,000PB per month 2016 © 2011 IBM Corporation IBM Confidential
  • 23. Explosion of Data for Telecom > 2 Billion Internet users 2011 How to lower network costs AT&T Global Network carries 24 ($/GB)? Petabytes of data PER DAY How to improve data revenue 5 Billon Mobile Phones WW Voice Traffic Network ($/GB)? – 550K Android phone $/bit Dominant Cost activated every day Profitability Twitter process 7 terabytes Traffic Gap Volume of data every day (value/GB) Facebook processes 10 terabytes of data every day Revenues Skype 300 Million Min of Video Calls Per Month YouTube – Massive bits through Networks Data Dominant 48 Hours of Web of Video uploaded per min Time 3 Billion views per day Telecoms need to be smarter….. smarter networks and smarter business models © 2011 IBM Corporation IBM Confidential All Telecom Enterprises have BIG DATA CHALLENGES
  • 24. Churn Prediction and Targeted Offers with Social Media Text Analytics Problem  Lost revenue and increase customer acquisition cost is directly related to churn  Churn not only lost customers due to pricing, but to service level, new tech offerings, service offerings, and customer perception  Significant challenge increasing ARPU  Revenue per customer is much harder to increase as competition increases  Current churn prediction systems are not up to the challenge  Too slow and not using social media data Solution  Improve churn prediction using social media – Analyze social media on its own or with current warehouse/BI analytics to predict churn quicker (real- time) and more accurately – BigInsights Text Analytics is the key to finding new analytics and Streams for RT alerts  Discover ARPU opportunities directly from social media – New source of customer intent and sentiment will drive new revenue opportunities – Real time feedback to marketing systems or warehouse/BI to place offers quickly – Finding ready-to-buy customers © 2011 IBM Corporation IBM Confidential
  • 25. Real Time CDR Analytics and Ingest Problem  Gathering CDR’s, mediating them into relevant data, and moving them to analytical systems is slow and costly  By the time CDR data is mediated and ingested by data warehouses, the ability to respond to problems is significantly reduced.  Systems tend to be old and require extensive application maintenance and hardware  Cannot achieve real time billing, requires handling billions of CDRs per day, and de-duplication against 15 days worth of CDR data Solution  Big Data Streams Telecommunications Mediation and Analytics (TMA) offering – Real-time CDR processing – Real-time analytics and dashboard – Unparalleled price/performance benefits – Connectors to Warehouse and BigInsights  Real-Time dashboards include: – Dropped calls by high priority customers, location, providers, etc – Terminated calls by location and customer type – Revenue monitoring by voice and SMS  The solution will enable novel Business Intelligence applications © 2011 IBM Corporation IBM Confidential
  • 26. CDR Analytics with Extended Data Problem  Telecom is experiencing an explosion of data from 3G and LTE (4G) network traffic. CDR’s are almost only used for billing systems because storing and analyzing them was too expensive with EDW and BI alone.  Competition driving the need for focus on:  customer retention  customer profitability  No connection between CDR, Web, and other data making everything from fraud detection to targeted marketing to ad optimization difficult and expensive Solution  BigInsights for cost effective store of original data and large-scale text analytics – Stores data unstructured and non-typed ingested with no data model – Discovery and Analytics tools are built into BigInsights – Machine Learning extensions – Integration to Netezza and DB2. JDBC to other data bases  Big Data Streams Telecommunications Mediation and Analytics (TMA) offering – Real-time CDR processing can be extended to other data sources – fast and low cost  Netezza integration opens Big Data solutions to warehouse and BI – Deep analytics and model development – Can act as a high performance operational data store © 2011 IBM Corporation IBM Confidential
  • 27. Ad Effectiveness Analysis with Social Media Problem  Telecom and Media spend large sums of money on advertising. Measuring the effectiveness of the Ads difficult and almost impossible online without costly services  Service providers are slow with responses and expensive  Current ad analysis is mostly guesswork and intuition – not lending itself to timely decisions  Enterprises are demanding better ROI from ad budgets and proof of effectiveness of each ad campaign  To increase effectiveness, enterprises have to react in near-real-time Solution  BigInsights used for social media ingest and fast analysis – Answers questions like what was the awareness, who did we reach, and what was the reaction to an ad in a few hours vs weeks – Offers ad departments to react: modify, localize, and focus  Streams for real-time ad analysis extending predictive models for fast reaction  React very quickly to ad effectiveness 1. Adjust ad budgets 2. Tailor ad’s to geography 3. Alter messaging 4. Adjust targeted and direct marketing initiatives © 2011 IBM Corporation IBM Confidential
  • 28. Why IBM for Big Data The Solution Side © 2011 IBM Corporation IBM Confidential
  • 29. The IBM Big Data Platform InfoSphere BigInsights Hadoop-based low latency analytics for variety and volume Hadoop Information Integration Stream Computing InfoSphere Information Server InfoSphere Streams High volume data integration and Low Latency Analytics for transformation streaming data MPP Data Warehouse IBM Smart Analytics IBM InfoSphere IBM Netezza High IBM Netezza 1000 System Warehouse Capacity Appliance BI+Ad Hoc Operational Analytics on Large volume structured data Queryable Archive Analytics on Structured Data Structured Data analytics Structured Data © 2011 IBM Corporation IBM Confidential
  • 30. A Big Data Platform Analytics Excellence In-Motion Operational Excellence At-Rest Operational Excellence Text Analytics Toolkit Unrivalled…. Harden Hadoop - GPFS Machine Learning Toolkit Industry Accelerators Development Embrace and Extend Surface Area Lock Down Tooling Visualization Tooling Policy Driven Retention & Immutability Deployment Tooling (“App Store”) Role-Based Security $14B in 5 yrs. on Analytics Adaptive MapReduce +++ Workload Manager Fast Splittable CMX Compression REST-exposed Administration ++ + In-Motion Open Source At-Rest IBM Big Data Hadoop Analyze extreme amounts of Platform Beyond traditional data in milliseconds structured data Uses same analytics as BigInsights BigInsights uses same analytics as Streams Data can be analyzed on the way into No forked, not ported: Hadoop Extended with the enterprise for earlier pattern operational excellence and security detection Netezza for in-database MapReduce MPP Data Warehouses © 2011 IBM Corporation IBM Confidential
  • 31. Stream Computing: A new paradigm for ultra low latency and high throughput in-motion analytics Continuous Ingestion Continuous Queries /Analytics on data in motion © 2011 IBM Corporation IBM Confidential
  • 32. Data In Motion  Information used to be aggregated and analyzed every 30-60 minutes and discarded after 72 hours  Analyzing 1000 pieces of unique medical diagnostic information per/sec. and stored in a dynamic model  Perspective: 20% drop in mortality of control group in trials (extend approach to daily activities) - 120 children monitored:120K messages/sec…billions/day © 2011 IBM Corporation IBM Confidential
  • 33. Data In Motion  Hear what’s going on miles away to optimize perimeter displacements  Perspective: Try to find the word “Zero” in a 1000 MP3 song library in a fraction of a second – Figure out the difference between the sound of a human whisper and the wind © 2011 IBM Corporation IBM Confidential
  • 34. Data In Motion – Improving What They Already Have  Old Microsoft-based solution not able to keep up with new 3G demands for their real-time xDR analysis business requirements  Streams and Netezza solution proposed – Time to merge and load data reduced 90%+ – Time to market for new products from 4 hours to minutes © 2011 IBM Corporation IBM Confidential Internal Use Only Reference
  • 35. How Text Analytics Works Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casilas made the save. Winger Andres Iniesta scored for Spain for the win. World Cup 2010 Highlights Arjen Robben Striker Netherlands Iker Casilas Keeper Spain Andres Iniesta Winger Spain © 2011 IBM Corporation IBM Confidential
  • 36. IBM Text Analytics Toolkit Lets You…  Build out world-class text analysis applications 50% faster than manual method  Run faster text analysis (10x or more vs. some marketplace alternatives)  Get more precise and correct answers (2x vs. some marketplace alternatives) © 2011 IBM Corporation IBM Confidential
  • 37. What is BigSheets? Browser-based Big Data analytics tool for business users Big Data Challenges… How can BigSheets help?  Business users need a no  Spreadsheet-like discovery interface lets programming approach for business users easily analyze Big Data analyzing Big Data with ZERO PROGRAMMING  Extremely difficult to find  BUILT-IN “readers” can work with data actionable business insights in in several common formats data from multiple sources with – JSON arrays, CSV, TSV, Web different formats crawler output, . . .  Translating untapped data into  Users can VISUALLY combine and actionable business insights is a common requirement that requires explore various types of data to identify visualization “hidden” insights © 2011 IBM Corporation IBM Confidential
  • 38. © 2011 IBM Corporation IBM Confidential
  • 39. Big Data Made Easy for the Little Guy  USC’s Film Forecaster correctly predicted a clamor for "Hangover 2” that resulted in $100 million opening over Memorial Day weekend – Looked at 250K-500K Tweets and broke down positive and negative messages using a lexicon of 1700 words The Film Forecaster sounds like a big undertaking for USC, but it really came down to one communications masters student who learned Big Sheets in a day, then pulled in the tweets and analyzed them - Ryan Kim © 2011 IBM Corporation IBM Confidential
  • 40. Why IBM for Big Data?  Only IBM is showing data-in-motion and data-at-rest analytics: a bigger more opportunistic view of Big Data  Development and research sit side by side  Virtualization tooling, development, file system, analytics  Not just same company: same org, same people, same leadership  BigInsights being used in IBM products today such as Cognos Consumer Insight © 2011 IBM Corporation IBM Confidential
  • 41. Without a Big Data Platform IBM Big Data Platform You Code… Over 100 sample applications and toolkits with industry focused toolkits Event Custom SQL with 300+ functions and operators! Handling and Scripts Multithreading Streams provides development, deployment, runtime, and infrastructure services Check Application Pointing Management Accelerators HA and Toolkits Performance Debug Connectors Optimization “TerraEchos developers can deliver Security applications 45% faster due to the agility of Streams Processing Language…” – Alex Philip, CEO and President © 2011 IBM Corporation IBM Confidential
  • 42. THINK https://w3-connections.ibm.com/wikis/home?lang=en_US#/wiki/Info%20Mgmt%20Client%20Technical %20Professional%20Resources%20Wiki/page/Understanding%20Big%20Data © 2011 IBM Corporation IBM Confidential 42

Notas do Editor

  1. You know, a great example is radio frequency ID tags (RFID). These caught lots and lots of attention when Wal-Mart was redesigning their supply chain with them, and the cost of RFID tags have come down so much, they ’ ve just proliferated all over the world. When you think about the Instrumentation characteristic of IBM ’ s Smarter Planet (Instrumented, Interconnected, and Intelligent), this is just one example of how we ’ ve become an instrumented world. On this slide you can see that in 2005, there were 1.3 billion RFID tags in circulation; this turns into 30 billion by the end of last year (2011). That ’ s a pretty significant annual growth rate to get to where we got to at the end of 2011; and again, this is just a single example of instrumentation.   They are a good place to start with Big Data, because they are now ubiquitous as is the opportunity for Big Data. They are used to track cars on a toll route, food supplies for temperature transport, livestock, supplies, inventories, luggage, retail items, tickets used for transportation, you name it.
  2. I was on an Airbus plane the other day, and do you realize that these things are hugely sensor-enabled devices that are instrumented to collect data as they operate. They also generate huge volumes of data.   +CLICK+ For this particular Airbus, over a billion lines of a code and a single engine generates 10 terabytes of data every 30 minutes. And there are four engines there, right?   +CLICK+ And, you know, just taking this particular plane from the UK to New York would generate 640 terabytes of data. Now stop and ponder that for a moment. Propose this amount of data injection to your client and it becomes obvious – there ’ s too much data to process, analyze, store with traditional approaches.
  3. You can see in this slide another example of Big Data in the utilities sector: smart metering. As meter reads have transformed from every other month, to a physical read with a estimation every other month, to monthly, weekly, daily, and hourly – you’ve got an immense amount of data streaming into the enterprise as shown on this slide. Smart metering is also about point in time values, so you can spot spikes, and adjust accordingly, so data in motion is a play here too. Smart meters are smart because they can communicate – not only with the customer about their electricity usage and pricing signals, but they can also communicate with the utility to indicate if there are fluctuations in power or even accurately pinpoint an outage. For a utility company, smart meters are generating a wealth of new information that is fundamentally changing the way they interact with their customers.
  4. The notion is that we are always sharing information about ourselves. For example, this particular Hollywood Star actually gave away the location of his house, when he heads to work, and more just by uploading a photo with GPS location enabled (the default for smartphones by the way). The full story of this is located at http://nyti.ms/917hRh.   The US Army had to send guidance and requirements for military phone lockdowns because geo-positioning capabilities of service men and women’s Blackberries and iPhones gave away sensitive location information when unsuspecting service personnel upload pictures of themselves in the Iraqi desert.
  5. Obviously, there are many other forms and sources of data. Let ’ s start with the hottest topic associated with Big Data today: social networks. Twitter generates about 12 terabytes a day of tweet data – which is every single day. Now, keep in mind, these numbers are hard to count on , so the point is that they ’ re big, right? So don ’ t fixate on the actual number because they change all the time and realize that even if these numbers are out of date in 2 years, it ’ s at a point where it ’ s too staggering to handle exclusively using traditional approaches.   +CLICK+ Facebook over a year ago was generating 25 terabytes of log data every day ( Facebook log data reference: http://www.datacenterknowledge.com/archives/2009/04/17/a-look-inside-facebooks-data-center/ ) and probably about 7 to 8 terabytes of data that goes up on the Internet.   +CLICK+ Google, who knows? Look at Google Plus, YouTube, Google Maps, and all that kind of stuff. So that ’ s the left hand of this chart – the social network layer.   +CLICK+ Now let ’ s get back to instrumentation: there are massive amounts of proliferated technologies that allow us to be more interconnected than in the history of the world – and it just isn ’ t P2P (people to people) interconnections, it ’ s M2M (machine to machine) as well. Again, with these numbers, who cares what the current number is, I try to keep them updated, but it ’ s the point that even if they are out of date, it ’ s almost unimaginable how large these numbers are. Over 4.6 billion camera phones that leverage built-in GP S to tag the location or your photos, purpose built GPS devices, smart metres. If you recall the bridge that collapsed in Minneapolis a number of years ago in the USA, it was rebuilt with smart sensors inside it that measure the contraction and flex of the concrete based on weather conditions, ice build up, and so much more.   So I didn ’ t realise how true it was when Sam P launched Smart Planet: I thought it was a marketing play. But truly the world is more instrumented, interconnected, and intelligent than it ’ s ever been and this capability allows us to address new problems and gain new insight never before thought possible and that ’ s what the Big Data opportunity is all about!
  6.   This slide shows the tweets per second (TPS) record breakers for 2011 – as you can see, the record keeps getting broken and the topics range from news, to safety, to sport, to shocking, to ‘cult’ like movie followers.   The point here is that Twitter is not only growing enormously, but the range of topics is from emergency to world events to social commentary to sport to entertainment and all parts in between.   Source: http://www.mediabistro.com/alltwitter/twitters-tweets-per-second-record-breakers-of-2011-infochart_b17210.
  7. You can just +CLICK+ through this slide as another example of social media (such as Facebook and Twitter) and the valuable information that can be found within; note also in some cases, the information is SPAM and noise – and we want to be able to discard that area as well and find the signals in the noise.   The reason why I am showing social media is it involves heavy text analytics – and that’s the hardest part of Big Data analytics. There are easier use cases, and the IBM platform is terrific at that for sure (such as log analysis). In addition, there are easier ways to use text analytics – for example, use it to get insight into company earnings as it pours through hundreds of pages on the web to spot trends and patterns.
  8. Most of you know of Watson, our computing system designed to compete on the Jeopardy game show. Watson represents a breakthrough in terms of volume of information stored, and the ability to access it quickly (answering natural language questions). I think Watson is impressive, because there are many commercial uses for this technology – and the technology exists today! The game Jeopardy provides the ultimate challenge for Watson because the game’s clues involve analyzing subtle meanings, irony, riddles, and other complexities in which humans excel and computers traditionally do not. If you think about Deep Blue, the 1997 IBM machine that defeated the reigning world chess champion, Watson is yet another major leap in capability of IT systems to identify patterns, gain critical insight and enhance decision-making despite daunting complexities. While Deep Blue was amazing, it was an achievement of the application of compute power to a computationally well-defined and well-bound game: Chess. Watson, on the other hand, faces a challenge that is open-ended, defies the well-bounded mathematical formulation of a game like Chess. Watson has to operate in the near limitless, ambiguous, and high contextual domain of human language and knowledge.   Watson answers a Grand Challenge: Can IBM design a computing system that rivals a human’s ability to answer questions posed in natural language by interpreting meaning and context and then retrieving, analyzing and understanding vast amounts of information in real-time? IBM Watson is a breakthrough in analytic innovation, proving that it is possible to harness vast amounts of information and rival a human’s ability to answer questions posted in natural language in real-time. But it doesn't matter how good the machine is if we don’t have good information to feed it. We live in a time where a computer can compete against humans at answering questions in plain English, based on storing, retrieving, analyzing and understanding vast amounts of information at real-time speeds. These same capabilities can enable you to improve and optimize your business, too. IBM just showed the value of putting that information to work by creating a computing system capable of competing on Jeopardy Well there ’ s a lot of technology that went into Watson – and a lot of Big Data technology in there as well. Now take a moment and think about how this iconic game show is played: you have to answer a question within three seconds. The technology used to analyze and return answers in Watson was a pre-cursor to the Streams technology, in fact, Streams was invented because that technology used in Watson wasn ’ t fast enough for some of the in-motion requirements needed by companies today. Jeopardy questions are not straight forward, they have pun and tricks to make them harder – so some of our text analytic technology with natural language processing, which is part of the IBM Big Data platform, is in there too (that ’ s yet another MAJOR DIFFERENTIATOR for IBM in Big Data: our Text   Analytic Toolkit, which you will hear more about later in this presentation). It wasn ’ t always smooth sailing for Watson, the big breakthrough came when they started to use machine learning (ML), and the IBM Big Data platform will further differentiate itself from the field in 2012 when a corresponding toolkit came to market just like the text analytics toolkit. Finally, Watson had to have access to a heck of a lot of data – and Big Data technologies were used to load and index over 200 million pages of data; Watson had everything from encyclopedias, to the bible, to the world famous music and movie databases, etc.   All these technologies mentioned in the previous paragraph had to work together as well. So IBM clearly has some inflection point understanding of these technologies and how to get them working together. In the case of the text analytics and machine learning – well we have to make that easier to consume because you don ’ t have the world ’ s largest commercial research organization for math at your fingertips. So we need to build tooling, and optimization, and accelerators around that and put these technologies inside consumable toolkits: which are we doing now.
  9. +CLICK+ In 2009, we came close as a world to 0.8 zettabytes of data. Now, that ’ s a number that few people understand (a ZB is a trillion GBs!). We ’ re not used to working with numbers like this, in the same manner the DBAs you show the Airbus data generation rates from earlier in this presentation can ’ t fathom ingesting into their data centers.   +CLICK+ In 2010, we crossed the 1 ZB inflection point in the world data: the game is on.   +CLICK+ By the end of 2011 it ’ s estimated we are at 1.8 ZBs – so there are some pretty good growth rates years over year (YoY)   +CLICK+ And so as you look forward into what the future holds, in the next decade – the Big Data era – and you can see it ’ s gong to get crazy. Let me put it in perspective for you – 4 trillion 8-gigabyte iPods of data by 2020 (35 ZB). And you know what? I ’ m willing to bet this is conservative, because some new social networking capability is going to pop up, or a faster smarter more mobile compute technology that allows us to be even more smarter and interconnected and (hopefully) intelligent, so we are on a pace that is unprecedented in the history of the world.   In short, there ’ s a tremendous amount of data being generated from kinds of all these instrumented and interconnected people and devices.
  10. Think about the suitability of applications for IBM Big Data technologies. I am telling you: every single industry has a Big Data opportunity for you. For example, smarter healthcare where a hospital can pick up the sensor readings off of neonatal babies to try to foreshadow incoming problems based on trends. We work with homeland security today. The US President Barack Obama is the Twitter President, if when an event happens, he tweets about it and homeland defence wants to know how people respond and if there are groups to focus on that are expressing negative sentiment laced with terrorism or wrong-doing. Just look across any industry and you ’ re going to find some reoccurring themes. One of those themes is more data, because I (and business for that matter) believe we can make better decisions when you have access to more data, or we can keep that data longer. More data that ’ s persisted for longer periods of time leads to better models. So that ’ s definitely a recurring Big Data theme: “ I want to keep more and more data to get better and better insight, and I want to be able to have analysis on the data that—when it ’ s NOT only structured ” There ’ s unstructured and semi-structured to fold into our mostly structured analytics of today and ALL industries are facing this challenge today (and can benefit from solving it). Lots of uses cases here. For example: Financial Services: Detect and prevent fraud, model and manage risk, personalize banking and insurance products, compliance, archival, +++ Healthcare: Patient monitoring, predictive modeling, compliance, archival, text search, data drive research, +++ Retail: Behavioral analysis, cross selling, recommendation engine (next best offer – NBO), optimize pricing, placement, and design, optimize inventory and distribution, +++ Web/Social/Mobile: Sentiment analysis, Web log, image, and video analysis, personalization, billing, reporting, network analysis, +++ Manufacturing: simulation, analysis, design, improve service via product sensor data, “Digital Factory” for lean manufacturing, +++ Government: detect and prevent fraud, homeland security and intelligence, support open data initiatives, +++
  11. This agenda was developed to show the affect of IBM’s big data solutions on the areas of the business that most Chief Marketing Officers consider as crucial to the health of the business. We’ll dive into each of the three areas, describe solutions to address retailers’ needs, and present two of the most popular big data use cases.
  12. There are two alternatives for implementing social media analytics. One is CCI and the other is a bespoke solution running on BigInsights. Most will be bespoke because customers already have some of the components they need. CCI is a complete offering with all componentry.
  13. Transactional Analytics Data Warehousing Traditionally, POS-based analytics have been sourced through data warehouses. POS data received from the store is cleansed, formatted and stored in the data warehouse. It is subsequently summarized and used for reporting. Operational and Ad Hoc Analytics Teradata has the dominant data warehousing base in retail with most of their implementations based on storing Point of Sale (POS) data for follow-on summary query and reporting. Teradata's sales mantra while building that base was that retailers should store detailed POS transactions so they could be summarized as needed to support operational and ad hoc analytics. Operational analytics are those used for daily decision making and are repeated as new data is received. An example of operational analytics is weekly sales reports. Ad hoc analytics are those required to answer high value, point-in-time questions. Once the question is answered it is normally not asked again, or it could become an operational report if deemed to have repeatable value. An example of a retail, ad hoc analytics would be a prediction model about potential responses to a sales promotion. Difficulties with Data Warehouse Ad Hoc Analytics Many retailers now store multiples years of POS transactions to enable ad hoc analytics, but in truth rarely use them. The primary reason is the difficulty to access the data for ad hoc questions. In execution the transactions are only used for standard reporting and usually only query of recent information. Only the occasional year-over-year report or compliance requirement query will access old data. In order to provide ad hoc access to historical information in a data warehouse an analyst must write SQL programs to use the database structure to retrieve the necessary data; and those programs must be run with the warehouse is not busy with normal reporting, and that's usually at night. The inability of analysts to access data to answer normal business questions has resulted in many data warehouses earning the nickname “data cemetery”, because once it goes in it's never seen again. BigInsights with its Map/Reduce interface opens access to the older data and allows analysts to quickly build necessary tables without interfering with operational warehouses. In fact, it offers a spreadsheet-like interface. Cost Justification Factors Teradata is an expensive platform, and many retailers suffer with poor response time to minimize the cost of service. A Netezza study estimated the cost/terabyte of data in a Teradata warehouse to be approximately $7K/month. The same study calculated the average cost of a Netezza warehouse to be about $5K/month. Industry studies estimate the cost of building a Hadoop platform to be much less because of the use of commodity hardware and the avoidance of the need to store structure (about half the storage in a data warehouse). Moving little used data to BigInsights could be a self-funding project. Example Implementation We recently performed a proof of concept at a premier, large retailer to “prove” the above scenarios. POS data was loaded into a BigInsights platform in a cloud and analysts were given known business problems to solve. The POC was a success and several current business problems were solved. The platform proved so useful that the POC went beyond its design scope and solved problems that “walked through the door” because the business learned about the capabilities being tested. The implementation at that retailer is a model of what will be proposed at other retailers.
  14. The 360-degree view of the customer is not a new thought, but it is typically implemented in a way that ignores customer correspondence. So not really a 360-degree view if you are ignoring what your customer(s) actually say now is it? The idea here on these use cases is to really look holistically on what customers are telling you about their interest, likes, dislikes, concerns and about their risk to you as a part of how your treat them. BigInsights is used here to gather and do the unstructured analytics, Streams can the look for patterns identified in real time, and as with all these use cases Netezza can be the destination for further analytics.
  15. Risk and compliance are key topics today – what we’re doing with our Big Data portfolio is making broader, faster and more holistic risk management possible. In most cases firms shrink the risk information sources utilized to fit conventional processing methods, and that simply doesn’t work well or need to be the case today. Our stance is use all the available sources and store/retain/compute as necessary to deal with the risk and we, IBM, will provide the platform capabilities necessary to do so.
  16. Much has been said regarding the proliferation of social media – its multiple channels, scope of content and subject matter. Something for everyone. Available to everyone. Immediate and impactful. But what’s different from the media explosion of television and radio some 50 year’s ago is both the sheer volume and influence of social media. 770 million people have visited a social networking site, according to comScore … According to Forrester research, 4 out of 5 Americans use social media in some capacity. But it’s the power of influence and massive distribution that make social media such a potent force in influencing consumer perceptions. In fact, 78% of consumers trust their peer’s recommendations … And it’s this volume of content, distribution and influence that is re-shaping how organizations are engaging their customers and broader constituencies through social media, there relationship to brands, products, services and issues of the day. Given this Social media analytics is a hot topic for most firms, and while some basic solutions are starting to show up, there is a lot of work that remains. Banks/Brokers are keen on trying to better understand the needs and desires of their customers to increase sales. Much of this analytics needs to include Social Media as one of, not the ONLY, source. That makes these systems more hoc in nature and cross information type, and are better supported by BigInsights than data warehouses.
  17. > Click to animate < > Click to animate < This is an EXPLOSION of data in the communications industry > Click to animate < Adding data at the rate of 500 Petabytes per month last year to > Click to animate < 10 times that monthly amount by four years. Some examples of where that data comes > Click to Next Slide < from in the communications industry.
  18. The AT&T global backbone network carries just under 24 petabytes of data traffic on an average business day. Put another way, that is 53,549 lbs (24,289 kg) of blu ray disks of data every day. Over 550,000 new Android smart phones are activated each day on top of a slightly smaller number of iPhones. That is driving data vs voice through networks at the ratio of 11:1. IPTV through services like YouTube is becoming the dominant data traffic on communications networks everywhere. These are just examples that illustrate the following points: Traffic is beginning to exceed infrastructure capacity – driving network costs up. Communications companies are having a hard time increasing revenue to cover those costs. Driving profit from this market is becoming much harder every month, let alone every year. In the next few minutes, I will show you why All Telecommunications (And Media) companies have Big Data Problems! Click to Next Slide < Public Domain Facts and Notes: "AT&T- News Room". Att.com. 2008-10-23. http://www.att.com/gen/press-room?pid=4800&cdvn=news&newsarticleid=30623. Retrieved 2009-08-16.  The last 12 quarters have seen 30 X growth - that's 3,000 % growth - in traffic across the AT&T Network. Growth rate in just Q4 2009 - just that one quarter - was greater than the entire Network traffic for the previous year - 2008. The amount of capital investment and corporate effort in 2010, this year (2011) and next year (2012) will be roughly equivalent to that of building the the Hoover Dam In the field of telecommunications, data retention (or data preservation ) generally refers to the storage of call detail records (CDRs) of telephony and internet traffic and transaction data (IPDRs) by governments and commercial organizations. In the case of government data retention, the data that is stored are usually of telephone calls made and received, emails sent and received and web sites visited. Location data is also collected. The primary objective in government data retention is traffic analysis and mass surveillance. By analyzing the retained data, governments can identify the locations of individuals, an individual's associates and the members of a group such as political opponents. These activities may or may not be lawful, depending on the constitutions and laws of each country. In many jurisdictions access to these databases may be made by a government with little or no judicial oversight (e.g. USA, UK, Australia). In the case of commercial data retention, the data retained will usually be on transactions and web sites visited. Data retention also covers data collected by other means (e.g. by automatic numberplate recognition systems) and held by government and commercial organisations. Telecoms: AT&T transfers about 19 petabytes of data through their networks each day.[9] Telecoms: The AT&T global backbone network carries 23.7 petabytes of data traffic on an average business day. The last 12 quarters have seen 30 X growth - that's 3,000 % growth - in traffic across the AT&T Network. Growth rate in just Q4 2009 - just that one quarter - was greater than the entire Network traffic for the previous year - 2008. The amount of capital investment and corporate effort in 2010, this year (2011) and next year (2012) will be roughly equivalent to that of building the the Hoover Dam. Transcript : Now this explosion in data really is quite extraordinary in terms of how much information is out there today. You know, you look at 7 TB of data every day on Twitter. I mean, that's just a remarkable phenomenon. Facebook, probably a lot of you saw the movie that's out on the -- The Social Network -- 10 TB every day. Some of it's certainly entirely useless; some of it people wish never went on Facebook. You know, that expression what happens in Vegas stays in Vegas, it's not true. What happens in Vegas goes on the web and will live on for hundreds of years, all right. Your, the future members of your family will look back on some of the things that you did that have been digitally recorded and just shake their heads in disgust, so be careful, be careful. So pretty extraordinary in terms of what's happening around data in information. Author ’s Original Notes: IBM IOD 2010_GS Day 2 Transcript : Now this explosion in data really is quite extraordinary in terms of how much information is out there today. You know, you look at 7 TB of data every day on Twitter. I mean, that's just a remarkable phenomenon. Facebook, probably a lot of you saw the movie that's out on the -- The Social Network -- 10 TB every day. Some of it's certainly entirely useless; some of it people wish never went on Facebook. You know, that expression what happens in Vegas stays in Vegas, it's not true. What happens in Vegas goes on the web and will live on for hundreds of years, all right. Your, the future members of your family will look back on some of the things that you did that have been digitally recorded and just shake their heads in disgust, so be careful, be careful. So pretty extraordinary in terms of what's happening around data in information. Author ’s Original Notes: 06/12/12 Transcript : Now this explosion in data really is quite extraordinary in terms of how much information is out there today. You know, you look at 7 TB of data every day on Twitter. I mean, that's just a remarkable phenomenon. Facebook, probably a lot of you saw the movie that's out on the -- The Social Network -- 10 TB every day. Some of it's certainly entirely useless; some of it people wish never went on Facebook. You know, that expression what happens in Vegas stays in Vegas, it's not true. What happens in Vegas goes on the web and will live on for hundreds of years, all right. Your, the future members of your family will look back on some of the things that you did that have been digitally recorded and just shake their heads in disgust, so be careful, be careful. So pretty extraordinary in terms of what's happening around data in information. Author ’s Original Notes: Prensenter name here.ppt Transcript : Now this explosion in data really is quite extraordinary in terms of how much information is out there today. You know, you look at 7 TB of data every day on Twitter. I mean, that's just a remarkable phenomenon. Facebook, probably a lot of you saw the movie that's out on the -- The Social Network -- 10 TB every day. Some of it's certainly entirely useless; some of it people wish never went on Facebook. You know, that expression what happens in Vegas stays in Vegas, it's not true. What happens in Vegas goes on the web and will live on for hundreds of years, all right. Your, the future members of your family will look back on some of the things that you did that have been digitally recorded and just shake their heads in disgust, so be careful, be careful. So pretty extraordinary in terms of what's happening around data in information. Author ’s Original Notes:
  19. Industry statistics show that it is between 4 and 7 to 1 more profitable to keep a customer than attract a new one. Over 70% of CMO ’s interviewed named Churn as one of there top three problems facing their enterprise. Addressing current challenges is an important part of a future-focused strategy. The rising costs of acquisition, retention and servicing customers need to be controlled, churn reduced and average revenue per user (ARPU) erosion has to be curtailed. Such efforts depend on differentiating the customer experience and creating new revenue streams by tapping into rich customer data, applying analytics and working effectively with content providers. > Click to Next Slide <
  20. InfoSphere Streams supports real-time mediation by handling billions of CDRs each day and linear scalability for growth. CDR mediation for billing systems have been around for decades. Using the Streams Telecommunications Mediation and Analytics (TMA) offering supports the following: A platform for real-time analytics on CDR’s Offloaded CDRs processing to Streams platform enhances warehouse performance and improved TCO Single platform for mediation and real time analytics reduces IT complexity The Business Benefits are substantial and include: Real time CDR processing enables real time billing – faster billing equals more profit Provides platform for real-time analytics to drive revenue: for example, location driven marketing campaigns. Data now processed reduced from 12 hours to 1 second. HW costs reduced 87% Support for future growth without the need to re-architect, more data, more analysis. Finding and addressing the negative sentiment that dropped calls by high priority customers proactively. Addressing terminated calls by location and customer type for customer service as well as fraud detection. Real Time Mediation can lead to fresh BI applications. Lets examine what the TMA offering looks like architecturally. > Click to Next Slide <
  21. CDR Analytics can be extended with BigInsights integrating with existing warehouse and BI infrastructure. By using the IBM Big Data at Rest solutions, huge volumes of CDR and OTHER data can be ingested in the format they arrive in. Because of the cost effectiveness of BigInsights and its integration capabilities, new analytics and insights are derived from combining CDR ’s with social media, clickstream, urls, and other unstructured data. An example would be to discover relationships between dropped calls and abandoned carts. Or, how consumer sentiment relates to web navigation, and local cell conditions in order to better predict churn. > Click to Next Slide <
  22. Measuring ad effectiveness is a problem as old as mass media ads where invented. The complexity of this has increased dramatically with the advent of the internet in the ’90’s and mobile technology in the 2000’s. Using social media as a new source for measuring the response to ads quickly or in real time is the focus of this case study. Media customers are very eager to find ways to do this in a cost effective way. There are many agencies and services that can explore social media and report back on sentiment but there is are three problems with that approach so far: Sentiment scores (likes and dislikes) only give a small part of an answer. Social Media Analytics that do not create actionable insights tied to direct business decisions are just a “buzz meter”. The time lag for even simple sentiment scores is too long to take effective action. Social media is ever changing. Getting ahead of it with insights that are very specific to your requirement is very difficult and costly. The solution is a combination of IBM Big Data and IBM ’s methodology developed by GBS. First, lets examine the IBM Big Data Technology that supports the ad effectiveness solution. > Click to Next Slide <
  23. This is a closer look at the Big Data platform. You can see the product view and how each fits into the IBM Big Data platform.
  24.   So I ’ m likely going to start to mention some products here around the IBM Big Data platform. +CLICK+ Hadoop is about bringing all this data into an at-rest batch-based repository. You can see on this slide open source Hadoop can be used to analyze semi-structured data, structured data (there are times when this should be done in an EDW and sometimes in a Big Data system), and unstructured data.   +CLICK+ The IBM Big Data platform EMBRACES and EXTENDS Hadoop. As I mentioned before, IBM won ’ t fork the Hadoop code.   +CLICK+ The IBM at-rest solution for Big Data is built on Hadoop and it ’ s called IBM InfoSphere BigInsights (BigInsights). And as I said before, we ’ re not going to fork that code, we are going to embrace and extend it. BigInsights ‘ hardens ’ Hadoop and rounds it out to make it more enterprise-worthy. Our at-rest story also includes Netezza as a repository and this engine includes the ability to run MapReduce (the programming framework around Hadoop) program IN-DATABASE. For some analytics workloads, MapReduce is a better choice than SQL and for data that ’ s more fit for the EDW (static, structured, repeatable, governed) Netezza is a terrific fit here. We also extend Big Data with industry leading in-motion technology with a product called InfoSphere Streams (I talked about it earlier). None of our competitors are really talking about Big Data in-motion. Some people will talk about complex event processing (CEP), which is about 10,000 transactions a second; it ’ s not at the speed or the scale which this is; and CEP can only tolerate simple rules and mostly structured data.   +CLICK+ The IBM Big Data platform then focuses on two key value propositions: Operational Excellence and Analytical Excellence. Operational Excellence - on the right - for the BigInsights platform (and assumed in the Netezza platform) details what we do for BigInsights above and beyond what open source Hadoop ships. I will be honest, there are a lot of vendors doing this today, Cloudera, MapR, HortonWorks, there ’ s a lot of people talking about making Hadoop operationally excellent. Well, we know something about operational excellence, because we ’ re IBM, so we have this enterprise grade proven file system called GPFS. We ’ ve ported that into work in Hadoop as GPFS SNC and since it ’ s POSIX it makes life easier, more secure, and more performant than an open source Hadoop world. For example, IBM understands that security is important, so we use GPFS SNC and extended capabilities in Hadoop to provide surface area lockdown: BigInsights gives you granular role-base security. You can attach policies around retention and the mutability (change rights). These are pretty important things. Adaptive MapReduce is kind of like connection pooling for a database, it makes the system run faster without you having to tune it. There is a workload manager, with a very fast Hadoop-oriented compression algorithm that is splittable. Rich tooling for management. In short, operational excellence is going to appeal to the folks managing the Big Data platform. It ’ s important. IBM does a great job – others do a good to great job – but at the end of the day, you get a well running Hadoop cluster. That ’ s it.   From an in-motion perspective, the operational excellence is unparalleled and I have yet to see another vendor able to seriously challenge us in this area. The value for the business is on the left side that I refer to as Analytical Excellence. The IBM Big Data platform provide these toolkits that let you get building analytics faster, more reliable and more potent than you could otherwise, and we do this for both Big Data at-rest and in-motion. There are industry accelerators, development tooling, visualization tooling, text analytics tooling, machine learning toolkit (which is coming in the future) and this is where the magic happens. It ’ s where you get a solution – IBM is telling the story on how to get to analytics and I ’ m going to use an example later in this presentation to show you just how much of a head start IBM gives you in Big Data.
  25. UOIT Nosocomial is a term that simply means 'hospital acquired' or 'got this bug/infection while in the hospital. Carolyn is working to detect blood poisoning (a nosocomial infection) which is also called SEPSIS. The specific test is that the infant oxygenation level (aka SpO2 for peripheral oxygenation level) drops below 85% blood pressure (aka Mean Arterial Pressure - MAP) drops below the gestational age measured in weeks for the same 20 seconds. Another test Carolyn is running is to determine if the baby is about to crash. (crash means heart stop). Normal people and premise have variability in their heart rate - speeds up, has difference in peaks, time between the stages of the heart wave, etc. When babies are about to crash, they try to preserve all energy and this variability drops. So, we are analyzing EKG waves to determine each heartbeat wave, and using Fast Fourier Transforms (FFT's) to determine the area under the wave. Then comparing the waves to understand variability. This test is know as Heart Rate Variability (HRV).   Hospital equipment issues an alert when a vital sign goes out of range – prompting the hospital staff to take action immediate. However many live threatening conditions do not reach critical levels right away. Often signs that something is wrong begin to appear long before the situations becomes serious and even a skilled nurse or physician might not be able to spot and interpret these trends in time to avoid serious complications. What’s more is some of these warning indicators are hard to detect and it’s next to impossible to understand their implications until it is too late. For example, nosocomial infection, a life threatening illness contracted in hospitals. Research has shown that signs of this infection can appear 12-24 hours before overt trouble/distress is spotted. Making things more complex, in a baby where this infection has set in, heart rates stay too normal (it doesn’t rise and fall within the day as it would for a healthy baby); all the while the pulse is within the acceptable limits. While information needed to detect the information is present, it’s too subtle, the nurses are too busy to see out of normal individual events. In a neonatal ward, the ability to absorb and reflect upon everything presented is beyond human capacity, there is just too much data. Information in these hospitals just wasn’t being used. Machines provide up to 1000 reading per second is summarized into a single reading every 30-60 minutes and then discard 72 hours later. Consequently, a set of rules that reflect the best understanding of the problem have been built, and they can be dynamically changed. Now extend this to kids with cancer attending school, and so on.   Kinds of things that feed the data: Entrotrachael Tube, Nastrogastic Tube, Ventilator Hose, Oxygen, Pulse, Hearth, Skin temperature, Body temperature, translucency, bilaterally placed electrodes, reference electrode, +++
  26. US Department of Energy is a national defense priority among other things. So research needs to be safeguarded by both above and below ground biological and mechanical threats. Their solution has to continually consume and analyze information in-motion such as movements of animals, humans, the atmosphere (such as wind). Scientists lacked the time to record the data and listen to it later. The data consumption and analytical requirements would be akin to listening to 1000 MP3 song simultaneously and looked for the word “Rocket” in every song – within a fraction of a second. TerraEchos has one of the most robust classification systems in the industry. They use Adelos S4 fiber-optic acoustic sensor technology from the US Navy. They can figure out the difference between a human whisper, the pressure of a footstep, and between the sound of a human voice and the whisper of the wind.
  27. Time is of the essence when analyzing customer call data to serve up location dependent offers/advertisements, identify possible network problems, or provide reps with the latest information on a customer calling with a service problem. Sprint needed to be able to access and analyze call, internet usage and texting detail records (xDRs) in real-time. The company had been using Microsoft SQL Server as part of a homegrown solution to transform data for analysis and feed it to their Netezza warehouse.   With the introduction of 3G technologies and the corresponding explosion in data volume, this Microsoft-based solution was unable to meet SLAs and performance requirements set by the business. The technology owners knew this problem would only get worse with the transition to LTE (4G). The latency created by the system meant Sprint was unable to capitalize on new revenue opportunities, and was forced to be reactive, rather than proactive, in addressing customer and network issues.   The IBM team worked with the part of Sprint’s organization responsible for running the company’s network (rather than the IT department) to propose InfoSphere Streams as a truly real-time conduit for xDR analysis. A proof of concept using InfoSphere Streams provided Sprint with overwhelming and indisputable evidence that the Microsoft-based solution should be replaced. The POC showed that with Streams: The time to merge data was reduced by 91%, the time to load data was reduced by 92%, and storage requirements were reduced by 93%   A core component of IBM’s platform for big data, Streams provides near linear growth when adding additional nodes to the runtime cluster. Applications can be re-deployed without being re-written to take advantage of the extra hardware. This gives Sprint tremendous flexibility to tailor their infrastructure to their business requirements. For example, based on the POC, Sprint can select the number of blades to meet their velocity requirements.    This is a great example of clients
  28. This slide shows a very simple example of the end goal for text analytics. Imagine an application that converts text to speech. In this very simplistic example, it’s a text to speech application that takes a streaming radio broadcast and finds structure within it.
  29. To use an analogy for the text analytic toolkit that comes with the IBM Big Data platform, I will refer to a shopping trip to the art store I had with my daughter the other day. We were looking to buy some color by paintings, and my gosh, t ’ s not just Disney and Phineas & Ferb. There are some very rich detailed beautiful portraits: Monet, Renoir, you name it – I never imagined. And I looked at what we did and in, you know, the IBM Big Data platform, we actually offer this whole toolkit which remind me of this. The toolkit has everything you need. And so you go and here ’ s this really rich and detailed colour by numbers painting (our tools, our Annotator Query Language, our pre-built extractors) that allow you to paint this wonderful picture. I could extend it in the background and put some trees in there (add to the extractors IBM is providing with a further set of rules) and you end up with this vibrant picture.   Now imagine decorating a room. All the other vendors seem to leave you with tools to paint the wall and put a hook in the centre to hang some art and you certainly aren ’ t going to win any type of decorating awards with that. It ’ s all left up to you and this is EXACTLY what Cloudera and MapR are enticing our clients to do. They ’ re getting clients to go in and say that decorating the wall is painting it, or they ’ re just saying, here ’ s the paint, you do the rest. Folks, the painting the wall is the easiest part; it ’ s the operational excellence. It ’ s easy. I think we do a better job but it ’ s easy.   So what do you do if you ’ re on the Cloudera platform? I guess you go and buy some different tools, get some skills, hire them out, and there ’ s my painting. I ’ m not much of an artist. Then I go take some courses or pay for expensive skills and you know what? A lot of your clients, they have smart people so that ’ s where they get to on the right. You can see, well they kind of misinterpreted the branch, they got the nose different and this may look okay but it ’ s not as rich and detailed as the picture on the left. And the point is Big Data technologies were born in the Yahoos, in the Googles, in the Facebooks of the IT world. These folks have mountains of developers, near unlimited development resources. But, you know, if I ’ m an insurance company or I ’ m a credit card company, I don ’ t have unlimited development resources. But chances are I ’ m outsourcing some of that development, and I ’ m going to show you why that won ’ t work, so we ’ re not—you know, we ’ re—this is not our—our core competency is not development; it ’ s our business. So why are you asking me to be core competency? But everybody ’ s jumping on the big data wagon right now because it ’ s the hottest thing going.
  30. One of the Big Data challenges is “ How do I get analysts to go out and analyze this data with zero programming ” . If you don ’ t have such tooling, you create an unnatural dependency on development to go and hand-code and build every piece of visualization and analysis. This is too expensive, inefficient, and just too cumbersome. BigSheets gives you exactly this, with ZERO programming. Your analysts are going to need to be able to go visualize and analyze JSON formats, CVS and text files and all that kind of stuff; they are going to want a programming free crawler, and more. All of this is included in BigSheets – to the end user, it ’ s looks like a spreadsheet; under the covers, it ’ s generate Pig jobs to run on Hadoop.
  31. Here is a screen shot of BigSheets – again, it looks like a ubiquitous spreadsheet software and we all know (for good or bad) that this is the most popular analytic tool in the world, and that ’ s why we built BigSheets to operate like a spreadsheet. Just like in a spreadsheet, you can pivot data union it, run some macros, and so on. BigSheets is built with Web 2.0 technologies and runs within a Web browser – it is tightly integrated into the BigInsights management toolset.
  32. Here is another example of something the University of Southern California Annenberg School of Communication did with the IBM Big Data platform ’ s BigSheets technology. USC@Annenburg created the Film Forecaster tool and used it to correctly predict 2011 ’ s summer block busters based on scraping Twitter and analyzing that against a simple lexicon that described a positive or negative showing for a movie. They made quite the impact since this very solution was featured on ABC News (a national news agency in the USA).   More striking is the quote: the application was built by a communication Masters student who learned Big Sheets in a day.
  33. +CLICK+ Now that we’ve talked about how end users visualize Big Data, and how IT can deploy the applications – let’s talk about the hardest part of all, building them – and let’s start with Big Data in-motion: InfoSphere Streams. How do you do this on your own? If you choose to build it out yourself, and remember this IS NOT CEP, it ’ s WAY more scalable, resilient, available, and has handlers for unstructured data (which is going to be my point): if you build this yourself YOU have to worry about event handling, check pointing, security, availability, provisioning, debugging, and all kinds of other stuff shown (and some not shown) on this slide.   +CLICK+ The IBM Big Data platform offers you a toolkit to build in-motion Big Data applications. Inside this toolkit are accelerators and tooling and more that let you build out something very quickly and powerful. For example it includes the Streams Processing Language (SPL). SPL came out in the 2nd release of InfoSphere Streams. TerraEchos uses this piece of the toolkit extensively, and with it, they are able to build their applications 45 percent faster! So we give you the run time and infrastructure services that kind of take care of all the hard stuff for you, whether one node is overloaded, whether one node goes down, you don ’ t have to worry about that. We ’ ve got that covered. So you just kind of build the logic. SPL is a declarative language in the same way that SQL and Annotated Query Language (AQL) are. Specifically, parts of SPL are are truly declarative, but there are parts and extensions to all of these that are not completely declarative, but that ’ s beyond the scope of the point being made here. IBM has a two decades history of taking declarative languages and getting them to run in massively parallel processing (MPP) environments; and that ’ s exactly what Streams and Hadoop clusters are. This allows you to spend more time on building the application as opposed to fine tuning it for performance which is what folks typically have to do. Think ISAS (DB2 with DPF) and SQL – it ’ s the same concept. Beyond MPP optimization, SPL has number of local optimizations which will include new auto-parallelization and pipelineing that has not made it into the product yet as of 1Q12, but will be coming soon.   IBM ships a number of accelerators to help you get started and flatten the time to value for the Big Data deployment curve. In this section, I will talk about Text Analytics Toolkit for the remainder of this presentation. There are accelerator kits for Telco, Smarter Energy, Public Transportation, Finance, Data Mining --- and more on the way. Over 100 sample applications, user defined toolkits and standard toolkits with over 300 functions and operators.   Telco: Process Call Data in Real Time. This is the foundation for realtime marketing promotions, churn prevention, etc. Finance: Real Time Market data ingestion and management, Real Time decision support for Equities, Derivate, Commodity and Forex trading, Incorporate additional contextual awareness (news, weather etc.) into trading decision, Real Time cross asset pricing, Continuous real time trade monitoring to identify fraudulent trading, Real Time Cross Asset across trading desks and geographies for a continuous enterprise risk level and liquidity management. Smarter Energy: Sample applications to monitor electric transmission grids using phasor measurement units to monitor for voltage stability and transient stability to improve availability. Data Mining: Mining data streams to extract relevant information or intelligence is critical for a majority of stream processing applications. IBM InfoSphere Streams Mining Toolkit integrates with InfoSphere Warehouse using PMML standard. PMML is supported by several state-of-the-art statistics and data mining software tools such as InfoSphere Warehouse, R / Rattle, SAS Enterprise Miner, SPSS, and Weka. InfoSphere Streams, used with the Mining Toolkit, can help you detect fraud, prevent customer churn, segment your customers, and simplify market basket analysis. The in-database data mining capabilities integrate with existing systems to provide scalable, high performing predictive and pattern analysis without moving your data into proprietary data mining platform. Public Transportation: Sample intelligent transportation system to display current location of buses based on GPS readings and estimate time of arrival at future stops based on current traffic . User Defined Toolkits: Create reusable sets of operators and functions. A powerful base for creating cross-domain and domain-specific toolkits.   How Text Analytics Works   This slide shows a very simple example of the end goal for text analytics. Imagine an application that converts text to speech. In this very simplistic example, it’s a text to speech application that takes a streaming radio broadcast and finds structure within it.   What ’ s Wrong with Text Analytics Today There are lots of alternative approaches and infrastructure for text analytics in the marketplace today. They tend to perform poorly in terms of accuracy and speed. They ’ re very difficult to use. They typically require an army of Java programmers to get stuff going. They ’ re often characterized in the Internet as inflexible and inefficient, because the programmer has to go to the analyst; the analyst then takes the annotator designed to extract the text; it doesn ’ t work right; so the two have to get back together again and it becomes an iterative loop and this hurts analyst productivity. If you ’ ve ever worked to resolve performance problems in a Java Hibernate environment with developers and DBAs, or perhaps in an SAS ad DBA environment trying to implement a model, you know the error prone inefficient process here.