Emerging Technologies
      DIY Analytics
IBM Software for a Smarter Planet


       Emerging Technology - What Do We Do?


       Innovation/collaborations in technologies
       that we hope garner broad industry
       adoption in a timeframe of 12-18 months

       Our technology initiatives are refined based
       on the marketplace & evolution of web
       technologies

       Voice of the Customer – early & direct
       customer engagements (POCs) to iterate
       on both the technology and the business
       value




IBM Confidential                                Chart   2   © 2009 IBM Corporation


       Evolving Emerging Technology Focus Areas


         Big Data Analytics for Business
         Professionals - DIY Analytic Tool &
         middleware - enabling massive amounts
         of data to be analyzed for actionable
         insights

         Web Browser Application Platform -
         pushing the envelope of next
         generation RIA applications & tooling
         delivered with web browser reach &
         economics

         Mobile - next generation Enterprise-
         Consumer applications & architecture






       New Intelligence




                      DIY Analytics
     Making Hadoop accessible
    to the business professionals






       New Intelligence - New Class of Application On Horizon

        Rich Spectrum DIY Analytic Applications Emerging

        We hear business users asking for the
        ability to directly manipulate, analyze &
        remix massive data sources & services
        • LOB: “… Google whetted my appetite... I
             want more customizable analytics with
             me in the driver’s seat…”

        Leveraging easy-to-use, rich data
        manipulation metaphors like
        spreadsheets, etc.


        Rich visualizations to quickly identify
        insights






       IBM Emerging Technology Project: BigSheets

        What is it?
        An insight engine for enabling ad-hoc business insights for
        business users - at web scale


        How does it work?
        Discovery Process
        1. Point BigSheets to data sources of interest
           • unstructured web data, feeds, XML, etc.
        2. Transform data into a form that can be analyzed
           • Unstructured data becomes semi-structured data
           • Example: name: Rod Smith, employer: IBM, state: GA
           • Apply analytics - enriching the data
        3. “What-if tooling” - browser-based visual front end - spreadsheet
           metaphor to create worksheets for exploring/visualizing the big data



        What’s different?
        • Unlocking insights embedded in unstructured data
        • Analyzing data that was previously unavailable for analysis
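
Step 2 of the discovery process (unstructured text becoming a semi-structured record) can be illustrated with a minimal Python sketch. This is not BigSheets code: the sample sentence, the regular expression, and the `to_record` and `enrich` helpers are assumptions made up for illustration. Only the resulting fields (name, employer, state) come from the slide.

```python
import re

# Hypothetical pattern: "<First Last> works at <Employer> in <ST>".
# Real extraction over web data would need far more robust analytics.
PATTERN = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+) works at "
    r"(?P<employer>[A-Z][A-Za-z]+) in (?P<state>[A-Z]{2})"
)


def to_record(text):
    """Return a dict of extracted fields, or None when nothing matches."""
    match = PATTERN.search(text)
    return match.groupdict() if match else None


def enrich(record):
    """Illustrative 'apply analytics' step: add one derived field."""
    enriched = dict(record)
    enriched["employer_is_ibm"] = record["employer"] == "IBM"
    return enriched


# Made-up input sentence standing in for unstructured web text.
raw = "According to his profile, Rod Smith works at IBM in GA."
record = to_record(raw)
print(record)
print(enrich(record))
```

Once the text is in this record form, each field can become a column in a worksheet, which is what the spreadsheet metaphor in step 3 relies on.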




       BigSheets: Framework on Hadoop


      Expanding upon the Hadoop stack
      • Visual tooling builds extensively on Pig

      BigSheets Architecture Characteristics:
      • Extensible via UDFs
      • REST API for customer choice of analytic service/engine
      • REST API for choice of visualization packages
      • Export content as feeds, XML, etc.
      • ...more to come






        BigSheets in action

        Crowd sourcing - Nikon: what are folks on
        Twitter saying about our cameras - by model

        Input
        • Gather daily tweets for May
        • 64 million tweets per day
        • ~210 terabytes a month

        Map
        • Split data across cluster
        • Emit tweets mentioning Nikon cameras (key=Nikon D90, …)

        Reduce
        • Aggregate tweets for each Nikon model
        • D90: 300 tweets
        • D3000: 68 tweets

        Output
        • Perform sentiment analysis
        • “..Wow, Great, Incredible…”
          “..Lousy, sucks, ...”
          “..no RAW support...”
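
The Input, Map, Reduce, and Output stages on this slide can be sketched in plain Python on a single machine. This is a rough illustration under stated assumptions, not the BigSheets or Hadoop implementation: the sample tweets, the `MODELS` list, the sentiment word sets, and all function names are hypothetical, and the word-matching "sentiment" step is a deliberately naive stand-in.

```python
import re
from collections import defaultdict

# Assumed model list and sentiment vocabularies; the slide names only the
# D90 and D3000 and quotes a handful of example words.
MODELS = ["D90", "D3000"]
POSITIVE = {"wow", "great", "incredible"}
NEGATIVE = {"lousy", "sucks"}


def map_tweet(tweet):
    """Map: emit (key, tweet) for each Nikon model the tweet mentions."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", tweet.upper()))
    if "NIKON" in tokens:
        for model in MODELS:
            if model in tokens:
                yield "Nikon " + model, tweet


def shuffle(pairs):
    """Group mapped values by key, as the framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_model(key, tweets):
    """Reduce: count tweets per model and tally naive sentiment hits."""
    positive = negative = 0
    for tweet in tweets:
        words = set(re.findall(r"[a-z]+", tweet.lower()))
        positive += bool(words & POSITIVE)
        negative += bool(words & NEGATIVE)
    return key, {"tweets": len(tweets), "positive": positive, "negative": negative}


def run(tweets):
    pairs = (pair for tweet in tweets for pair in map_tweet(tweet))
    return dict(reduce_model(k, v) for k, v in shuffle(pairs).items())


# Made-up sample input.
sample = [
    "Wow, my new Nikon D90 is incredible",
    "Nikon D90 has lousy battery life",
    "Thinking about the Nikon D3000, great price",
    "Lunch was great today",
]
for model, stats in run(sample).items():
    print(model, stats)
```

On a cluster, the map and reduce functions would run in parallel over splits of the data; the grouping done here by `shuffle` is the part the framework handles between the two stages.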






       A Demonstration of BigSheets in action

                                              Crowd sourcing - What do people want to buy?

                   • Gather

                   • Created an analysis model, using IBM Content Analytics, looking for ‘buy signals’:

                    • Verb phrase indicating the desire to get something
                      • “I would really love a...”
                    • Buy Target (“I would really love to get myself a cool new phone”)
                    • Brand, Company, and opinion statements in the context of this buy statement

                    • Deployed the analysis model into BigSheets where it gets deployed across the Hadoop
                      cloud

                    ★ In BigSheets each analysis model is considered a macro

                    • Visualize the results
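
To make the idea of a "buy signal" concrete, here is a minimal Python sketch that uses a regular expression as a stand-in for the analysis model. It does not use or imitate IBM Content Analytics: the verb phrases in `BUY_SIGNAL`, the `BRANDS` set, and the `find_buy_signal` function are all assumptions chosen for illustration. Only the example sentence comes from the slide.

```python
import re

# Assumed verb phrases indicating the desire to get something, followed by
# the buy target (everything up to the next punctuation mark).
BUY_SIGNAL = re.compile(
    r"\bI (?:would really love|really want|would love|want|need)"
    r"(?: to (?:get|buy)(?: myself)?)? (?:a|an|some) (?P<target>[^.!?,]+)",
    re.IGNORECASE,
)

# Hypothetical brand list used to tag brands mentioned alongside the signal.
BRANDS = {"nikon", "ibm"}


def find_buy_signal(text):
    """Return the buy target and any known brands, or None if no signal."""
    match = BUY_SIGNAL.search(text)
    if match is None:
        return None
    brands = sorted(
        brand for brand in BRANDS
        if re.search(rf"\b{brand}\b", text, re.IGNORECASE)
    )
    return {"target": match.group("target").strip(), "brands": brands}


print(find_buy_signal("I would really love to get myself a cool new phone"))
print(find_buy_signal("I want a Nikon D90."))
print(find_buy_signal("The weather is nice today"))
```

A function like this is the kind of unit that could be applied to every record in parallel, which matches the slide's description of an analysis model being deployed across the cluster as a macro.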



       Marketplace Application Example - British Library

       Web Archive Opportunity
       Libraries & archives are interested in
       collecting & preserving the web data
       • British Library has opened the UK Web Archive
            portal for researchers & historians to explore
            preserved web content
       • Parliament nearing vote to give the British Library
            the nod to archive all .uk domain data, spanning 4
            million sites & ~128TB today.
            • Today, web page classification for the 5000 British
                   Library web sites is performed by 30 folks

       The Goal
       Can an ET technology project &
       IBM’s Classification Module (ICM)
       electronically classify & tag web
       content & enable/create
       visualizations?

       Web Content To Gather:
       • British Library gathered 1.48 TB of data - 4
         web archive files comprising ~400,000 web
         pages from 300 archived websites

       • 4 machines (dual core), HD 1TB, 8 GB
         RAM




       Marketplace Application Example: AmEx or IBM
       Project:
       Improve IP Portfolio Analysis for Mergers & Acquisitions
       “...please collect all US Patent filings… then let’s do…”

       Business Questions
       • Ongoing tracking of acquisitions and associated IP
       • Visualizations, e.g. corporate genealogy

       Knowledge of Interest:
       • Corporate genealogies
       • IP ownership roll-up
       • Patents ranked by citation
       • Augment analysis with items affecting IP value, inventor affiliation, citation rank by time

       Web Content To Gather:
       • SEC filings, e.g. annual and quarterly reports
       • USPTO patents, assignments and trademarks
       • Company press releases
       • Other M&A, inventor information from feeds, webpages
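
One way to picture an "IP ownership roll-up" over a corporate genealogy is as a walk up an acquisition tree, crediting each company's patents to its ultimate parent. The Python sketch below is only an illustration of that idea: the company names, patent counts, and the `ACQUIRED_BY` mapping are entirely hypothetical and do not come from the project.

```python
from collections import Counter

# Hypothetical genealogy: acquired company -> acquirer.
ACQUIRED_BY = {
    "StartupA": "MidCo",
    "MidCo": "BigCorp",
    "StartupB": "BigCorp",
}

# Hypothetical patent counts keyed by original assignee.
PATENTS = {"StartupA": 3, "MidCo": 10, "StartupB": 2, "BigCorp": 40, "Indie": 5}


def ultimate_parent(company):
    """Follow acquisitions upward; the 'seen' set guards against cycles."""
    seen = set()
    while company in ACQUIRED_BY and company not in seen:
        seen.add(company)
        company = ACQUIRED_BY[company]
    return company


def roll_up(patents):
    """Credit every assignee's patents to its ultimate parent."""
    totals = Counter()
    for assignee, count in patents.items():
        totals[ultimate_parent(assignee)] += count
    return totals


totals = roll_up(PATENTS)
for owner, count in totals.most_common():
    print(owner, count)
```

In practice the genealogy would have to be assembled from sources such as the SEC filings and press releases listed above, and acquisitions would carry dates so that ownership could be evaluated at a point in time; both are left out of this sketch.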




       Let’s Talk Customers: AmEx or IBM
       American Express:
       Evaluating IP with large amounts of public and private data

       Gathered 1,400,000 U.S. patents on record from 2002 - 2009
       • The 1,400,000 cited/referenced another 6,100,000 U.S. & international patents
       ★ Odd fact: a few patents cited/referenced as many as 13,870 other patents
       • ~216 are AMEX patents
       ★ Of the AMEX patents, 90 were cited/referenced, from 24 cited 1 time through one cited 67 times
       • 3600 cases from Court of Appeals, Federal Circuit, 1993 - 2007 (Georgetown Law)
       ★ 43 mentions of U.S. patents issued between 2002 - 2009; relies on exact “Patent No. 9,999,999” match
       • Productivity improvement from weeks to hours
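
The exact “Patent No. 9,999,999” match mentioned above can be sketched as a regular expression applied to court opinion text. This is a minimal Python illustration, not the project's code: the sample opinions and patent numbers are made up, and `patent_mentions` and `count_mentions` are hypothetical helper names.

```python
import re
from collections import Counter

# Exact-format match: the literal words "Patent No." followed by a
# seven-digit number written with commas, e.g. "Patent No. 9,999,999".
PATENT_REF = re.compile(r"Patent No\. (\d,\d{3},\d{3})")


def patent_mentions(text):
    """Return the patent numbers mentioned in the exact format, as ints."""
    return [int(number.replace(",", "")) for number in PATENT_REF.findall(text)]


def count_mentions(documents):
    """Count mentions of each patent number across all documents."""
    counts = Counter()
    for document in documents:
        counts.update(patent_mentions(document))
    return counts


# Made-up opinion snippets; the patent numbers are placeholders.
opinions = [
    "The claims of U.S. Patent No. 7,123,456 are at issue.",
    "Pat. No. 7,123,456 was discussed at length.",  # abbreviated form: missed
    "See Patent No. 7,123,456 and Patent No. 6,000,001.",
]
print(count_mentions(opinions))
```

The second snippet shows the limitation the slide hints at: any citation that deviates from the exact wording is not counted, so a figure such as 43 mentions is best read as a lower bound.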






       Conclusion


                        In God we trust
                   ...all others, bring data




Disruptive Applications with Hadoop__HadoopSummit2010

  • 1. Emerging Technologies DIY Analytics
  • 2. IBM Software for a Smarter Planet Emerging Technology - What Do We Do? Innovation/collaborations in technologies that we hope garner broad industry adoption in timeframe of 12 -18 months Our technology initiatives are refined based on the marketplace & evolution of web technologies Voice of the Customer – early & direct customer engagements (POCs) to iterate on both the technology and the business value IBM Confidential Chart 2 © 2009 IBM Corporation
  • 3. IBM Software for a Smarter Planet Evolving Emerging Technology Focus Areas Big Data Analytics for Business Professionals - DIY Analytic Tool & middleware - enabling massive amounts of data to be in analyzed for actionable insights Web Browser Application Platform - pushing the envelope of next generation RIA applications & tooling delivered with web browser reach & economics Mobile - next generation Enterprise- Consumer applications & architecture IBM Confidential Chart 3 © 2009 IBM Corporation
  • 6. New Intelligence: DIY Analytics. Making Hadoop accessible to business professionals
  • 7. New Intelligence - New Class of Application On Horizon
      • We hear business users asking for the ability to directly manipulate, analyze & remix massive data sources & services
          • LOB: “… Google whetted my appetite... I want more customizable analytics with me in the driver's seat…”
      • Rich spectrum of DIY analytic applications emerging
          • Leveraging easy-to-use, rich data manipulation metaphors like spreadsheets, etc.
          • Rich visualizations to quickly identify insights
  • 8. IBM Emerging Technology Project: BigSheets
      • What is it? An insight engine for enabling ad-hoc business insights for business users - at web scale
      • How does it work? Discovery process:
          1. Point BigSheets to data sources of interest
              • Unstructured web data, feeds, XML, etc.
          2. Transform data into a form that can be analyzed
              • Unstructured data becomes semi-structured data
              • Example: name: Rod Smith, employer: IBM, state: GA
              • Apply analytics - enriching the data
          3. “What if” tooling - browser-based visual front end - spreadsheet metaphor to create worksheets for exploring/visualizing the big data
      • What’s different?
          • Unlocking insights embedded in unstructured data
          • Analyzing data previously unavailable to analyze
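The "unstructured data becomes semi-structured data" step on slide 8 can be pictured with a small sketch. The slide gives only the target record (name, employer, state), not how BigSheets extracts it, so the regular expression, the `to_record` helper, and the sample sentence below are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: the slide shows only the resulting record,
# not the extraction method BigSheets actually uses.
PROFILE = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+) works (?:at|for) "
    r"(?P<employer>[A-Z][A-Za-z]*) in (?P<state>[A-Z]{2})\b"
)


def to_record(text):
    """Return a dict of named fields found in free text, or None if no match."""
    match = PROFILE.search(text)
    return match.groupdict() if match else None


# Invented sample sentence carrying the fields from the slide's example.
record = to_record("According to his bio, Rod Smith works at IBM in GA.")
print(record)
```

Once text is reduced to records like this, each field can become a spreadsheet column, which is the form the "what if" tooling in step 3 works with.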
  • 9. BigSheets: Framework on Hadoop
      • Expanding upon the Hadoop stack
          • Visual tooling builds extensively on Pig
      • BigSheets architecture characteristics:
          • Extensible via UDFs
          • REST API for customer choice of analytic service/engine
          • REST API for choice of visualization packages
          • Export content as feeds, XML, etc.
          • ...more to come
  • 10. BigSheets in action: Crowd sourcing - Nikon: what are folks on Twitter saying about our cameras - by model
      • Input: gather daily tweets for May
          • 64 million tweets per day
          • ~210 terabytes a month
      • Map
          • Split data across cluster
          • Emit tweets mentioning Nikon cameras (key=Nikon D90, …)
      • Reduce: aggregate tweets for each Nikon model
          • D90: 300 tweets
          • D3000: 68 tweets
      • Output: perform sentiment analysis
          • “..Wow, Great, Incredible…”
          • “..Lousy, sucks, ...“
          • “..no RAW support...”
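The map, reduce, and output stages on slide 10 can be sketched in plain Python. This is a single-process illustration, not Hadoop or BigSheets code: the model set, the word lists (borrowed from the slide's quoted phrases), the function names, and the sample tweets are assumptions, and the word-count score is a crude stand-in for whatever sentiment analysis was actually used.

```python
import re
from collections import defaultdict

# Assumed inputs for illustration; the slide names only the D90 and D3000.
MODELS = {"D90", "D3000"}
POSITIVE = {"wow", "great", "incredible"}
NEGATIVE = {"lousy", "sucks"}


def map_tweet(tweet):
    """Map: emit (key, tweet) for every Nikon model the tweet mentions."""
    tokens = set(re.findall(r"[a-z0-9]+", tweet.lower()))
    if "nikon" not in tokens:
        return
    for model in sorted(MODELS):
        if model.lower() in tokens:
            yield f"Nikon {model}", tweet


def reduce_by_model(pairs):
    """Reduce: group the emitted tweets under their model key."""
    grouped = defaultdict(list)
    for key, tweet in pairs:
        grouped[key].append(tweet)
    return dict(grouped)


def sentiment(tweet):
    """Output: naive score, positive words minus negative words."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def run(tweets):
    pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
    grouped = reduce_by_model(pairs)
    return {
        key: {"tweets": len(items), "sentiment": sum(map(sentiment, items))}
        for key, items in grouped.items()
    }


# Invented sample tweets, not data from the deck.
sample = [
    "Wow, my Nikon D90 is incredible",
    "Nikon D90 has lousy battery life",
    "The Nikon D3000 sucks, no RAW support",
    "Lunch was great today",
]
result = run(sample)
print(result)
```

On a real cluster the map calls would run in parallel over splits of the tweet archive and the framework would do the grouping by key; only the per-tweet and per-key logic would look like the functions above.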
  • 11. A Demonstration of BigSheets in action: Crowd sourcing - What do people want to buy?
      • Gather
      • Created an analysis model, using IBM Content Analytics, looking for 'buy signals':
          • Verb phrase indicating the desire to get something: “I would really love a...”
          • Buy target (“I would really love to get myself a cool new phone”)
          • Brand, company, and opinion statements in the context of this buy statement
      • Deployed the analysis model into BigSheets, where it gets deployed across the Hadoop cloud
          ★ In BigSheets each analysis model is considered a macro
      • Visualize the results
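To make the 'buy signal' idea on slide 11 concrete, here is a minimal sketch that uses a regular expression as a stand-in for the IBM Content Analytics model described in the deck. The pattern, the `find_buy_target` name, and the accepted verb list are assumptions; a real analysis model would rely on linguistic annotation rather than a single regex.

```python
import re

# Stand-in for the deck's analysis model: a desire verb phrase followed by
# the buy target. The pattern is an illustrative assumption.
BUY_SIGNAL = re.compile(
    r"\bI (?:would|'d) (?:really )?(?:love|like|want)"
    r"(?: to (?:get|buy)(?: myself)?)? (?:a|an|the) "
    r"(?P<target>[^.!?,]+)",
    re.IGNORECASE,
)


def find_buy_target(text):
    """Return the phrase following a buy signal, or None if there is none."""
    match = BUY_SIGNAL.search(text)
    return match.group("target").strip() if match else None


print(find_buy_target("I would really love to get myself a cool new phone"))
```

Brand, company, and opinion extraction would then be applied to the text around each match, as the slide describes.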
  • 12. Marketplace Application Example - British Library
      • The Goal: Can an ET technology project & IBM’s Classification Module (ICM) electronically classify & tag web content & enable/create data visualizations?
      • Web Archive Opportunity: Libraries & archives are interested in collecting & preserving the web
          • British Library has opened the UK Web Archive portal for researchers & historians to explore preserved web content
          • Parliament nearing vote to give the British Library the nod to archive all .uk domain data, spanning 4 million sites & ~128 TB today
          • Today, web page classification for the 5000 British Library web sites is performed by 30 folks
      • Web Content To Gather:
          • British Library gathered 1.48 TB of data - 4 web archive files comprising ~400,000 web pages from 300 archived websites
          • 4 machines (dual core), 1 TB HD, 8 GB RAM
  • 13. Marketplace Application Example: AmEx or IBM
      • Project: Improve IP Portfolio Analysis for Mergers & Acquisitions
          • “...please collect all US Patent filings… then let’s do…”
      • Business Questions
          • Ongoing tracking of acquisitions and associated IP
          • Visualizations, e.g. corporate genealogy
      • Knowledge of Interest:
          • Corporate genealogies
          • IP ownership roll-up
          • Patents ranked by citation
          • Augment analysis with items affecting IP value, inventor affiliation, citation rank by time
      • Web Content To Gather:
          • SEC filings, e.g. annual and quarterly reports
          • USPTO patents, assignments and trademarks
          • Company press releases
          • Other M&A, inventor information from feeds, webpages
  • 14. Let’s Talk Customers: AmEx or IBM
      • American Express: Evaluating IP with large amounts of public and private data
      • Gathered 1,400,000 U.S. patents on record from 2002 - 2009
          • The 1,400,000 cited/referenced another 6,100,000 U.S. & International patents
          ★ Odd fact: a few patents cited/referenced as many as 13,870 other patents
          • ~216 are AMEX patents
          ★ 90 were cited/referenced; of AMEX cited patents, 24 cited 1 time thru one cited 67 times
      • 3600 cases from Court of Appeals, Federal Circuit, 1993 - 2007 (Georgetown Law)
          ★ 43 mentions of U.S. patents issued between 2002 - 2009; relies on exact “Patent No. 9,999,999” match
      • Productivity improvement from weeks to hours
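Slide 14 notes that patent mentions in the court cases were found by an exact “Patent No. 9,999,999” match. A minimal sketch of that kind of match follows; the regular expression, the `patent_mentions` helper, and the sample text with its patent numbers are made up for illustration and are not taken from the deck.

```python
import re

# Exact-form match only, as the slide describes: the literal words
# "Patent No." followed by a comma-grouped number.
PATENT_REF = re.compile(r"Patent No\. (\d{1,2},\d{3},\d{3})")


def patent_mentions(opinion_text):
    """Return patent numbers cited in the exact 'Patent No. N,NNN,NNN' form."""
    return [int(num.replace(",", "")) for num in PATENT_REF.findall(opinion_text)]


# Hypothetical sentence; the numbers do not refer to real cases.
text = (
    "The claims of U.S. Patent No. 6,123,456 are at issue; "
    "see also Pat. No. 7,000,001 and the '456 patent."
)
print(patent_mentions(text))
```

The abbreviated and shorthand references in the sample are skipped, which suggests why an exact match might surface only a small number of mentions across thousands of cases.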
  • 15. Conclusion: In God we trust... all others, bring data