



Managing Uncertain Data at Scale
     Nikolay Marin




                                   © 2013 IBM Corporation




Trend: Most of the world’s analyzed data will be uncertain
 • By 2015, 80% of the world’s data will be uncertain
 • Uncertain data management requires new techniques
 • These techniques are necessary for real-world Big Data Analytics

Opportunity: Business leadership using Big Data Analytics
 • Robust, business-aware uncertain data management
 • Use analytics over uncertain web, sensor, and human-generated data
 • Enable good business decisions by understanding analysis confidence

Challenge: Taking Big Data Analytics into an uncertain world
 • Analysis of text is highly nuanced; sensor-based data is imprecise
 • Timely business decisions require efficient large-scale analytics
 • It is more difficult to obtain insight about an individual than a group, especially if the source data is uncertain




The fourth dimension of Big Data: Veracity – handling data in doubt



Dimension    Theme                 Description
Volume       Data at Rest          Terabytes to exabytes of existing data to process
Velocity     Data in Motion        Streaming data, milliseconds to seconds to respond
Variety      Data in Many Forms    Structured, unstructured, text, multimedia
Veracity*    Data in Doubt         Uncertainty due to data inconsistency & incompleteness,
                                   ambiguities, latency, deception, model approximations

* Truthfulness, accuracy or precision, correctness




Uncertainty arises from many sources

Process Uncertainty: Processes contain “randomness”
 • Uncertain travel times
 • Semiconductor yield

Data Uncertainty: Data input is uncertain
 • Text entry: intended spelling vs. actual spelling
 • GPS uncertainty
 • Testimony
 • Ambiguity: {Paris Airport}
 • Conflicting data: {John Smith, Dallas} vs. {John Smith, Kansas}
 • Rumors
 • Contaminated?

Model Uncertainty: All modeling is approximate
 • Fitting a curve to data
 • Forecasting a hurricane (www.noaa.gov)



  By 2015, 80% of all available data will be uncertain


[Figure: Global Data Volume in Exabytes (left axis, 0 to 9000) and Aggregate Uncertainty % (right axis, 0 to 100), 2005 to 2015. Data categories shown: Enterprise Data; VoIP; Social Media (video, audio and text); Sensors (Internet of Things). Multiple sources: IDC, Cisco]

 • By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
 • The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
 • Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.



How to reduce uncertainty in processes, models, and data




Constructing context for better understanding
 • Extract as much information as feasible from each source
 • Combine (condense) data from multiple sources
 • More data from more sources is better
   – Gathers more evidence for statistical methods
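
One way to picture why combining sources helps is inverse-variance weighting, a standard statistical technique that the slides do not name. The sketch below assumes each source reports the same quantity with independent noise of known variance; the function name `fuse_estimates` and the sample readings are hypothetical.

```python
def fuse_estimates(estimates):
    """Combine (value, variance) pairs from independent sources.

    Uses inverse-variance weighting: each source is weighted by
    1 / variance, so more precise sources count for more.
    Returns the fused value and the variance of the fused value.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


# Hypothetical readings of the same quantity: (value, variance)
readings = [(10.2, 4.0), (9.7, 1.0), (10.5, 4.0)]
fused_value, fused_var = fuse_estimates(readings)
print(f"fused value = {fused_value:.3f}, variance = {fused_var:.3f}")
```

Under these assumptions the fused variance is smaller than the variance of the best single source, which is the sense in which more sources gather more evidence.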

Using statistical methods scaled for Big Data
 • Stochastic techniques efficiently reason about uncertainty
 • Monte Carlo techniques explore many possible scenarios in order to gain insight
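
As a minimal sketch of the Monte Carlo idea, the example below reuses the "uncertain travel times" theme from the earlier slide. The route, the normal noise model clipped at zero, and the names `simulate_trip` and `on_time_probability` are all assumptions made for illustration, not something the slides specify.

```python
import random


def simulate_trip(legs, rng):
    """Draw one possible total travel time (minutes) for a route.

    Each leg is (mean, std_dev). Durations are drawn from a normal
    distribution and clipped at zero; this noise model is an assumption.
    """
    return sum(max(0.0, rng.gauss(mean, sd)) for mean, sd in legs)


def on_time_probability(legs, deadline, n_scenarios=20_000, seed=0):
    """Estimate P(total time <= deadline) by sampling many scenarios."""
    rng = random.Random(seed)
    hits = sum(simulate_trip(legs, rng) <= deadline for _ in range(n_scenarios))
    return hits / n_scenarios


# Hypothetical three-leg route: (mean minutes, standard deviation)
route = [(20.0, 5.0), (35.0, 10.0), (15.0, 3.0)]
p = on_time_probability(route, deadline=80.0)
print(f"Estimated P(arrive within 80 min) = {p:.2f}")
```

Each sampled scenario is one plausible outcome; the fraction that meets the deadline approximates the confidence a decision maker can place in arriving on time.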


Requires specific business process and industry context


Statistical techniques reduce uncertainty in analytical models

[Figure: trouble tickets plotted against their attributes. Use stochastic search to find trouble tickets that are similar; help agent find similar tickets.]

Trouble ticket attributes
 • Some attributes such as server type are precise
 • Other attributes such as words in trouble tickets may be imprecise indicators of the problem

Model approximation
 • Treat N attributes as N dimensions in space
 • Model similarity as closeness in the N-dimensional space

Prediction
 • Improve predictability by getting agent feedback

 • Improve suggestions for similar problems using corroborating data and better mathematical techniques
 • Analyze all the data; do not subset
 • Use related techniques to automate Level 1 support, find problem clusters, etc.
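
The "N attributes as N dimensions" idea can be sketched as follows. This is an illustration only: it scores every ticket exhaustively with cosine similarity rather than using the stochastic search the slide mentions, and the ticket contents and the helper names `ticket_vector`, `cosine_similarity`, and `most_similar` are hypothetical.

```python
import math
from collections import Counter


def ticket_vector(server_type, text):
    """Map a ticket to a sparse vector with one dimension per attribute value."""
    vec = Counter(f"word={w}" for w in text.lower().split())
    vec[f"server={server_type}"] += 1  # precise attribute
    return vec


def cosine_similarity(a, b):
    """Closeness of two sparse vectors: 1.0 is identical direction, 0.0 is no overlap."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, tickets, top_k=2):
    """Return the top_k (score, ticket_id) pairs, best match first."""
    scored = [(cosine_similarity(query, vec), tid) for tid, vec in tickets.items()]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical ticket history
tickets = {
    "T1": ticket_vector("linux", "disk full on database server"),
    "T2": ticket_vector("windows", "user cannot log in"),
    "T3": ticket_vector("linux", "database disk space low"),
}
query = ticket_vector("linux", "database server disk almost full")
matches = most_similar(query, tickets)
print(matches)
```

Agent feedback could then be folded in by reweighting dimensions, for example by down-weighting words that agents repeatedly mark as unhelpful; that step is not shown here.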



Analytics is broadly defined as the use of data and computation to make
smart decisions




Data
 • Historical
 • Simulated
 • Text; video, images; audio

Decision point
 • Data instances
 • Reports and queries on data aggregates
 • Predictive models
 • Answers and confidence
 • Feedback and learning

Possible outcomes
 • Option 1
 • Option 2
 • Option 3






Future of Analytics




Explosion of unstructured data
 • Creates new analytics opportunities
 • Addresses new enterprise needs

Consistent, extensible, and consumable analytics platform
 • Reduces cost-to-value for enterprises
 • Increases analytics solution coverage with limited supply of skills

Optimizing across the stack to deploy analytics at scale
 • Analytics becomes a dominant IT workload and drives HW design
 • Opportunity to seamlessly scale from terascale to exascale





  Analytics toolkits will be expanded to support ingestion and interpretation of
  unstructured data, and enable adaptation and learning

Category      Stage                         Technique                          Characteristics
New Methods   Learn                         Adaptive Analysis                  Responding to context
New Methods   Learn                         Continual Analysis                 Responding to local change/feedback
New Methods   Decide and Act                Optimization under Uncertainty     Quantifying or mitigating risk
Traditional   Decide and Act                Optimization                       Decision complexity, solution speed
Traditional   Understand and Predict        Predictive Modeling                Causality, probabilistic, confidence levels
Traditional   Understand and Predict        Simulation                         High fidelity, games, data farming
Traditional   Understand and Predict        Forecasting                        Larger data sets, nonlinear regression
Traditional   Understand and Predict        Alerts                             Rules/triggers, context sensitive, complex events
Traditional   Report                        Query/Drill Down                   In memory data, fuzzy search, geo spatial
Traditional   Report                        Ad hoc Reporting                   Query by example, user defined reports
Traditional   Report                        Standard Reporting                 Real time, visualizations, user interaction
New Data      Collect and Ingest/Interpret  Entity Resolution                  People, roles, locations, things
New Data      Collect and Ingest/Interpret  Relationship, Feature Extraction   Rules, semantic inferencing, matching
New Data      Collect and Ingest/Interpret  Annotation and Tokenization        Automated, crowd sourced

Stage notes
 • Learn: in the context of the decision process
 • Collect and Ingest/Interpret: decide what to count; enable accurate counting

  Extended from: Competing on Analytics, Davenport and Harris, 2007


Finally, what about a longer-term view, say the next 10-50 years?

1. Artificial Intelligence
2. Nano-“everything”
3. Cognitive Computing
4. Deep (Exascale) Computing
5. Atomic & Quantum Computing
6. Human / Computer Interaction
7. Machine to Machine Interaction
8. BioTech / Human Augmentation
9. Robots & Robotics
10. Advanced / Predictive Analytics
11. Security & Privacy
12. 3-D Printing
13. Video-enabled Business Processes
14. Personalized Web/Assistants
15. Ubiquitous Computing
16. Gaming
17. Simulation
18. Virtual Computing (including virtual worlds, tele-presence, etc.)
19. Augmented Reality


IBM Academy of Technology and Global Technology Outlook can help you find some answers

BDD. The Outer Limits. Iosif Itkin at Youcon (in Russian)
 

IBM - Managing Uncertain Data at Scale

  • 1. Managing Uncertain Data at Scale. Nikolay Marin. © 2013 IBM Corporation
  • 2. Managing Uncertain Data at Scale
      Trend: Most of the world’s analyzed data will be uncertain
        - By 2015, 80% of the world’s data will be uncertain
        - Uncertain data management requires new techniques
        - These techniques are necessary for real-world Big Data Analytics
      Opportunity: Business leadership using Big Data Analytics
        - Robust, business-aware uncertain data management
        - Use analytics over uncertain web, sensor, and human-generated data
        - Enable good business decisions by understanding analysis confidence
      Challenge: Taking Big Data Analytics into an uncertain world
        - Analysis of text is highly nuanced; sensor-based data is imprecise
        - Timely business decisions require efficient large-scale analytics
        - It is more difficult to obtain insight about an individual than a group, especially if the source data is uncertain
  • 3. The fourth dimension of Big Data: Veracity – handling data in doubt
        - Volume (Data at Rest): Terabytes to exabytes of existing data to process
        - Velocity (Data in Motion): Streaming data, milliseconds to seconds to respond
        - Variety (Data in Many Forms): Structured, unstructured, text, multimedia
        - Veracity* (Data in Doubt): Uncertainty due to data inconsistency & incompleteness, ambiguities, latency, deception, model approximations
      * Truthfulness, accuracy or precision, correctness
  • 4. Uncertainty arises from many sources
        - Process Uncertainty: Processes contain “randomness”
        - Data Uncertainty: Data input is uncertain
        - Model Uncertainty: All modeling is approximate
      Examples: uncertain travel times; semiconductor yield; intended spelling vs. actual spelling in text entry; GPS uncertainty; testimony; ambiguity ({Paris Airport}; {John Smith, Dallas} vs. {John Smith, Kansas}); contaminated?; rumors; conflicting data; fitting a curve to data; forecasting a hurricane (www.noaa.gov)
  • 5. By 2015, 80% of all available data will be uncertain
        - By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
        - The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
        - Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.
      [Chart: Global Data Volume in Exabytes (0 to 9000) and Aggregate Uncertainty % (0 to 100), 2005 to 2015. Series: Sensors (Internet of Things); Social Media (video, audio and text); VoIP; Enterprise Data. Multiple sources: IDC, Cisco]
  • 6. How to reduce uncertainty in processes, models, and data
      Constructing context for better understanding
        - Extract as much information as feasible from each source
        - Combine (condense) data from multiple sources
        - More data from more sources is better: it gathers more evidence for statistical methods
      Using statistical methods scaled for Big Data
        - Stochastic techniques efficiently reason about uncertainty
        - Monte Carlo techniques explore many possible scenarios in order to gain insight
      Requires specific business process and industry context
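The Monte Carlo idea on this slide can be illustrated with a minimal sketch. It is not taken from the deck: the three travel legs, their minute ranges, and the function names are hypothetical, chosen only to echo the "uncertain travel times" example from slide 4. Each scenario draws one possible duration per leg from a triangular distribution, and the summary reports what many scenarios say about the total.

```python
import random
import statistics

# Hypothetical legs of a trip as (minimum, most likely, maximum) minutes.
LEGS = [(10, 15, 30), (20, 25, 60), (5, 8, 20)]

def simulate_trip_minutes(rng):
    """Draw one possible scenario: total travel time over all legs."""
    # random.triangular takes (low, high, mode), in that order.
    return sum(rng.triangular(low, high, mode) for low, mode, high in LEGS)

def monte_carlo(n_scenarios=10_000, seed=42):
    """Explore many scenarios and summarize the resulting uncertainty."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = sorted(simulate_trip_minutes(rng) for _ in range(n_scenarios))
    return {
        "mean": statistics.fmean(totals),
        "p95": totals[int(0.95 * n_scenarios)],  # 95th percentile
        "prob_over_60": sum(t > 60 for t in totals) / n_scenarios,
    }

summary = monte_carlo()
print(summary)
```

The point of the sketch is the shape of the answer: instead of one travel-time estimate, the analysis yields a distribution from which a confidence statement (for example, the chance of exceeding 60 minutes) can be read.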
  • 7. Statistical techniques reduce uncertainty in analytical models
      Help agent find similar tickets: use stochastic search to find trouble tickets that are similar
      Trouble ticket attributes
        - Some attributes such as server type are precise
        - Other attributes such as words in trouble tickets may be imprecise indicators of the problem
      Model approximation
        - Treat N attributes as N dimensions in space
        - Model similarity as closeness in the N dimensional space
      Prediction
        - Improve predictability by getting agent feedback
      Further points
        - Improve suggestions for similar problems using corroborating data and better mathematical techniques
        - Analyze all the data; do not subset
        - Use related techniques to automate Level 1 support, finding problem clusters, etc.
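A minimal sketch of "similarity as closeness in the N dimensional space" follows. It is an illustration under stated assumptions, not the method used in the deck: the ticket records, field names, and functions are hypothetical, closeness is measured with cosine similarity, and the search is a plain exhaustive comparison rather than the stochastic search the slide mentions.

```python
import math
from collections import Counter

def ticket_vector(ticket):
    """Map a ticket to a sparse vector: one dimension per word or attribute value."""
    features = Counter(word.lower() for word in ticket["text"].split())
    # A precise attribute (server type) becomes its own dimension.
    features["server_type=" + ticket["server_type"]] += 1
    return features

def cosine_similarity(a, b):
    """Closeness of two sparse vectors: 1.0 means same direction, 0.0 means nothing shared."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(query, tickets, top_k=2):
    """Score every ticket against the query and return the closest ones."""
    query_vec = ticket_vector(query)
    scored = [(cosine_similarity(query_vec, ticket_vector(t)), t["id"]) for t in tickets]
    return sorted(scored, reverse=True)[:top_k]

# Hypothetical tickets, for illustration only.
tickets = [
    {"id": "T1", "server_type": "db", "text": "database connection timeout after restart"},
    {"id": "T2", "server_type": "web", "text": "login page returns error 500"},
    {"id": "T3", "server_type": "db", "text": "connection timeout on database"},
]
query = {"id": "Q", "server_type": "db", "text": "database timeout"}
results = most_similar(query, tickets)
print(results)
```

Words are imprecise indicators, so a ticket that shares wording and server type with the query only scores as likely similar; agent feedback on the suggestions is what the slide proposes for improving them over time.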
  • 8. Analytics is broadly defined as the use of data and computation to make smart decisions
      [Diagram: Data (historical and simulated; text, video, images, audio) leads to a decision point with possible outcomes Option 1, Option 2, Option 3]
        - Data instances
        - Reports and queries on data aggregates
        - Predictive models
        - Answers and confidence
        - Feedback and learning
  • 9. Future of Analytics
      Explosion of unstructured data
        - Creates new analytics opportunities
        - Addresses new enterprise needs
      Consistent, extensible, and consumable analytics platform
        - Reduces cost-to-value for enterprises
        - Increases analytics solution coverage with limited supply of skills
      Optimizing across the stack to deploy analytics at scale
        - Analytics becomes a dominant IT workload and drives HW design
        - Opportunity to seamlessly scale from terascale to exascale
  • 10. Analytics toolkits will be expanded to support ingestion and interpretation of unstructured data, and enable adaptation and learning
      Capabilities, from top to bottom of the chart:
        - Adaptive Analysis: Responding to context
        - Continual Analysis: Responding to local change/feedback
        - Optimization under Uncertainty: Quantifying or mitigating risk
        - Optimization: Decision complexity, solution speed
        - Predictive Modeling: Causality, probabilistic, confidence levels
        - Simulation: High fidelity, games, data farming
        - Forecasting: Larger data sets, nonlinear regression
        - Alerts: Rules/triggers, context sensitive, complex events
        - Query/Drill Down: In memory data, fuzzy search, geo spatial
        - Ad hoc Reporting: Query by example, user defined reports
        - Standard Reporting: Real time, visualizations, user interaction
        - Entity Resolution: People, roles, locations, things
        - Relationship, Feature Extraction: Rules, semantic inferencing, matching
        - Annotation and Tokenization: Automated, crowd sourced
      Layer labels: New Methods; Traditional; New Data
      Stage labels: Learn (in the context of the decision process); Decide and Act; Understand and Predict; Report; Collect and Ingest/Interpret (decide what to count; enable accurate counting)
      Extended from: Competing on Analytics, Davenport and Harris, 2007
  • 11. Finally, what about a longer-term view, say the next 10-50 years?
       1. Artificial Intelligence
       2. Nano-“everything”
       3. Cognitive Computing
       4. Deep (Exascale) Computing
       5. Atomic & Quantum Computing
       6. Human / Computer Interaction
       7. Machine to Machine Interaction
       8. BioTech / Human Augmentation
       9. Robots & Robotics
      10. Advanced / Predictive Analytics
      11. Security & Privacy
      12. 3-D Printing
      13. Video-enabled Business Processes
      14. Personalized Web/Assistants
      15. Ubiquitous Computing
      16. Gaming
      17. Simulation
      18. Virtual Computing (including virtual worlds, tele-presence, etc.)
      19. Augmented Reality
      IBM Academy of Technology and Global Technology Outlook can help you find some answers