© 2006 IBM Corporation




                          Enabling ad-hoc
                           Analytic Apps
                            with Hadoop


                                      rod smith (rod.smith@us.ibm.com)

Friday, October 2, 2009
Hadoop World ’09



 Emerging Technology - What do we work on?




      Making Hadoop
       accessible to
         business
       professionals


October 2009                                 SWG Emerging Internet Technology   IBM Software Group





 New Intelligence - Big Data

     Nearly 15 petabytes of data are created
     every day, eight times more than the
     information in all the libraries in the U.S.

     Volume of data in enterprises is doubling
     approximately every 3 years (Forrester Research)
     • Includes structured and unstructured data, excludes rich
        media



     Costs to find, collect & analyze data are
     decreasing significantly as web innovation
     proceeds



     Content is untapped value for business
     insights & intelligence







 New Intelligence - New Class of Application on Horizon?

     Internet Evolution: A web of data
     sources, services for exploring &
     manipulating data, and ways that
     users can connect them together
     (Tom Coates/Yahoo™)

     [Diagram: Gather, Extract, Explore]

     Enterprises recognizing potential of
     leveraging the broader web for
     business intelligence coverage - as
     well as for internal data



     Next wave of content-centric webApps
     emerging
     • Long(er) running data collection
        & analytic applications




 New Intelligence - New Class of Application on Horizon?


   Hear business users asking for
   the ability to directly manipulate,
   analyze & remix massive data
   sources & services
   • LOB “… Google whetted my appetite...I
      want more customizable analytics with
      me in the driver's seat…”



   Leveraging easy-to-use, rich data
   manipulation metaphors like
   spreadsheets, etc.



   Rich visualizations to quickly
   identify insights



   Rich spectrum of DIY analytic applications emerging


 Let's Talk Customer Scenarios - BBC

   BBC Digital Democracy Project
   Achieving Increased Government Transparency

   Business Questions
   • Name names: Who is doing what, who isn't doing what
   • Overlay voting record with demographic & voting records over time
   • Buzz - what are people talking about?
   • Visualize content relationships

   Knowledge of Interest:
   • Members of Parliament (MPs)
   • Bills, Debates, Voting Districts

   Web Content To Gather:
   • UK Parliament Web Site
   • Timeframe: 10+ years





 Let's Talk Customer Scenarios - Thomson Reuters

    Enrich Trader's Desktop
    Enhancement
    Timely aggregation & analytics of content
    originating from public internet sites

    Scenario
    • Gather unstructured data from between 200 and
      2000 data sources, every 15 minutes
    • Perform preprocessing (search, transform, index) over
      each source
    • Publish harvested content for distributed content services
      and downstream mashups

    Business Questions
    • NewsBuzz: What are the headlines? What
      are not the headlines but still in focus?
    • OpinionMonitor: Who is saying what? What
      are the debate topics?
    • NewsTimeline: Chronology (pulse) of
      headline news?
    • TopicCloud: Tag-based topic metrics
    • IssueAnalytics: Link backs to semantically
      related news

    Knowledge of Interest:
    • People, places, events

    Web Content To Gather:
    • ~118 3rd-party financial news services and
      blogs, including: BBC, CNN, Yahoo News,
      Financial Times, NY Times, The Big Picture,
      Fox News, PR Newswire, Market Watch, World
      Press, Forbes, Google News, Wall Street
      Journal, MSNBC, The Sun, ZDNet
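
The scenario's preprocessing step (transform, index, then search) can be sketched as a tiny in-memory inverted index. This is a sketch under stated assumptions, not the Thomson Reuters or IBM implementation: the source names, the headlines, and the `transform`, `build_index`, and `search` helpers are all hypothetical, and fetching from live sites is replaced by a hard-coded dictionary.

```python
import re
from collections import defaultdict

def transform(raw_text):
    """Normalize harvested text into a list of lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", raw_text.lower())

def build_index(harvested):
    """Build an inverted index: token -> set of source names containing it."""
    index = defaultdict(set)
    for source, raw_text in harvested.items():
        for token in transform(raw_text):
            index[token].add(source)
    return index

def search(index, query):
    """Return the sources that contain every token of the query, sorted."""
    tokens = transform(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)

# Hypothetical harvest results standing in for one 15-minute cycle.
harvested = {
    "source-a": "Markets rally as oil prices fall",
    "source-b": "Oil prices climb; markets mixed",
    "source-c": "Tech stocks lead markets higher",
}
index = build_index(harvested)
print(search(index, "oil prices"))
print(search(index, "markets"))
```

In a real deployment the dictionary would be refreshed on each harvest cycle and the index built in parallel across sources; the sketch only shows the shape of the data flowing through the three steps.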




October 2009                                                       SWG Emerging Internet Technology   IBM Software Group



Friday, October 2, 2009
Hadoop World ’09


 IBM Emerging Technology Project: M2

                     What is it?
                     An insight engine for enabling ad-hoc business insights for
                     business users - at web scale

                     How does it work?
                     Discovery Process
                     1. Point M2 to data sources of interest
                           •   unstructured web data, feeds, XML, etc.

                     2. Transform data into a form that can be analyzed
                           •   Unstructured data becomes semi-structured data
                           •   Example: name: Rod Smith, employer: IBM, state: GA
                           •   Apply analytics - enriching the data

                     3. “What if” tooling - browser-based visual front end with a spreadsheet
                        metaphor to create worksheets for exploring/visualizing the data

                     What's different?
                     • Unlocking insights embedded in unstructured data
                     • Analyzing data previously unavailable for analysis
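
Step 2 of the discovery process can be illustrated with a minimal Python sketch. It is not M2's implementation, which the slides do not describe: the `extract_record` and `enrich` functions, the regular expressions, and the sample sentence are all hypothetical, chosen only to reproduce the slide's example record.

```python
import re

# Hypothetical patterns; M2's real extractors are not described in the slides.
PATTERNS = {
    "name": re.compile(r"^([A-Z][a-z]+ [A-Z][a-z]+)"),
    "employer": re.compile(r"works (?:at|for) ([A-Z][A-Za-z]+)"),
    "state": re.compile(r",\s*([A-Z]{2})\b"),
}

def extract_record(text):
    """Turn one free-text sentence into a semi-structured record (a dict)."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            record[field] = match.group(1)
    return record

def enrich(record):
    """Stand-in for the 'apply analytics' step: add a derived field."""
    enriched = dict(record)
    enriched["has_employer"] = "employer" in record
    return enriched

# Made-up input sentence, for illustration only.
sentence = "Rod Smith works at IBM in Atlanta, GA."
record = enrich(extract_record(sentence))
print(record)
```

Fields that no pattern matches are simply left out of the record, which is what makes the result semi-structured rather than a fixed-schema row.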




 M2 -> Demo

   Project:
   Improve IP Portfolio Analysis
   for Mergers & Acquisitions

   “...please collect all US Patent
   filings… then let’s do…”

   Business Questions
   • How much is a target company worth?
   • What are the high-value areas of their portfolio?
   • Explore cited patent topics, litigated patents

   Knowledge of Interest:
   • Patents ranked by citation, e.g., how often a patent
     was referenced determines its value
   • Corporate genealogies - IP ownership roll-up
   • Augment analysis with items affecting IP value,
     inventor affiliation, citation rank by time

   Web Content To Gather:
   • Gathered 1.4 million patent docs from USPTO
   • 1991-2007 case records from the United States Court of
     Appeals for the Federal Circuit (CAFC)
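
Ranking patents by citation, as listed under Knowledge of Interest, amounts to counting how often each patent is referenced by others. A minimal sketch, assuming citations are available as (citing, cited) pairs; the patent identifiers are made up for illustration and the `rank_by_citations` helper is hypothetical, not part of M2.

```python
from collections import Counter

def rank_by_citations(citations):
    """Return (patent, times_cited) pairs, most-cited first.

    `citations` is an iterable of (citing_patent, cited_patent) pairs.
    Ties are broken by patent id so the order is deterministic.
    """
    counts = Counter(cited for _citing, cited in citations)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Made-up identifiers, for illustration only.
citations = [
    ("P4", "P1"), ("P5", "P1"), ("P6", "P1"),
    ("P5", "P2"), ("P6", "P2"),
    ("P6", "P3"),
]
ranking = rank_by_citations(citations)
print(ranking)
```

Patents that are never cited do not appear in the result; a fuller analysis would also weight citations by time, as the slide's "citation rank by time" item suggests.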




 What's Under the Covers: Hadoop


  Emergence of map/reduce programming
  model for a new class of webApp


  Hadoop: provides a framework for large-scale
  parallel processing of map/reduce
  apps (Apache project led by Yahoo)
  • Offers simplicity of “programming” - looks like a
     simple single-threaded app model for developers

  • Handles big data - scalable storage across
     machine clusters (think read-only file system)

  • Deployment: no application knowledge of runtime
     or OS or cloud necessary

  • Today - setting up, coding Hadoop jobs in Java,
     etc. is the domain of skilled Java engineers
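
The map/reduce model named above can be sketched in a few lines of single-process Python. This is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample documents are illustrative assumptions. The point is that the developer writes only the map and reduce functions, which read like simple single-threaded code, while the framework (simulated here by `run_job`) handles grouping and distribution.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def run_job(documents):
    """Simulate one job in-process; Hadoop would spread this over a cluster."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Made-up input records, for illustration only.
docs = ["big data big insights", "data at web scale"]
counts = run_job(docs)
print(counts)
```

Because each call to `map_phase` sees one record and each call to `reduce_phase` sees one key, both can run on many machines at once without the functions themselves changing.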






 IBM Emerging Technology Project: M2 Architectural Components


      Expanding upon the Hadoop stack
      • Visual tooling builds extensively on Pig


      M2 Architecture Characteristics:
      • Extensible via user-defined functions (UDFs)
      • REST API for customer choice of analytic
        service/engine
      • REST API for choice of visualization packages
      • Export content as feeds, XML, etc.
      • ...more to come






 Conclusions




                   In God we trust







    …all others bring data





         Enterprises quickly evolving their thinking
         from a Database strategy to a Data Strategy
         encompassing unstructured & structured
         content


         Repeatable business patterns in broad range
         of industries emerging


         Hadoop has potential to be the platform for
         broad range of solutions from web-based
         analytics -> business event processing ->
         collaboration






 Almost The End


Selecting customer proof
  of concept projects


               INTERESTED?
               www-01.ibm.com/software/ebusiness/jstart/about.html



Hw09 Enabling Ad Hoc Analytics At Web Scale

  • 1. Enabling ad-hoc Analytic Apps with Hadoop. Rod Smith (rod.smith@us.ibm.com). Friday, October 2, 2009. © 2006 IBM Corporation
  • 2. Hadoop World ’09. Emerging Technology - What do we work on? Making Hadoop accessible to business professionals. October 2009, SWG Emerging Internet Technology, IBM Software Group. Friday, October 2, 2009
  • 3. New Intelligence - Big Data. Nearly 15 petabytes of data are created every day, eight times more than the information in all the libraries in the U.S. Volume of data in enterprises is doubling approximately every 3 years (Forrester Research); this includes structured and unstructured data and excludes rich media. Costs to find, collect & analyze data are decreasing significantly as web innovation proceeds. Content is untapped value for business insights & intelligence.
  • 4. New Intelligence - New Class of Application on Horizon? Internet Evolution: a web of data sources, services for exploring & manipulating data, and ways that users can connect them together (Tom Coates/Yahoo™). Enterprises recognizing potential of leveraging the broader web for business intelligence coverage, as well as for internal data. Next wave of content-centric webApps emerging: long(er)-running data collection & analytic applications. [Diagram labels: Gather, Extract, Explore]
  • 5. New Intelligence - New Class of Application on Horizon? Internet Evolution: a web of data sources, services for exploring & manipulating data, and ways that users can connect them together (Tom Coates/Yahoo™). Enterprises recognizing potential of leveraging the broader web for business intelligence coverage, as well as for internal data. Next wave of content-centric webApps emerging: long(er)-running data collection & analytic applications.
  • 6. New Intelligence - New Class of Application on Horizon? Hear business users asking for the ability to directly manipulate, analyze & remix massive data sources & services. LOB: “… Google whetted my appetite... I want more customizable analytics with me in the driver's seat…” Leveraging easy-to-use, rich data manipulation metaphors like spreadsheets, etc. Rich visualizations to quickly identify insights.
  • 7. New Intelligence - New Class of Application on Horizon? Hear business users asking for the ability to directly manipulate, analyze & remix massive data sources & services. LOB: “… Google whetted my appetite... I want more customizable analytics with me in the driver's seat…” Leveraging easy-to-use, rich data manipulation metaphors like spreadsheets, etc. Rich visualizations to quickly identify insights. [Callout: Rich Spectrum DIY Analytic Applications Emerging]
  • 8. Let's Talk Customer Scenarios - BBC. Project: BBC Digital Democracy Project, Achieving Increased Government Transparency. Business Questions: • Name names: who is doing what, who isn't doing what • Overlay voting record with demographic & voting records over time • Buzz: what are people talking about? • Visualize content relationships. Knowledge of Interest: • Members of Parliament (MPs) • Bills, Debates, Voting Districts. Web Content To Gather: • UK Parliament Web Site • Timeframe: 10+ years
  • 9. Let's Talk Customer Scenarios - Thomson Reuters. Project: Enrich Trader's Desktop. Enhancement: timely aggregation & analytics of content originating from public internet sites. Business Questions: • NewsBuzz: What are the headlines? What are not the headlines but still in focus? • OpinionMonitor: Who is saying what? What are the debate topics? • NewsTimeline: Chronology (pulse) of headline news? • TopicCloud: Tag based topic metrix • IssueAnalytics: Link backs to semantically related news. Scenario: • Gather unstructured data from anywhere between 200 and 2000 data sources, every 15 minutes • Perform preprocessing (search, transform, index) over each source • Publish harvested content for distributed content services and downstream Mashups. Knowledge of Interest: • People, places, events. Web Content To Gather: • ~118 3rd Party Financial News Services and Blogs, including: BBC, CNN, Yahoo News, Financial Times, NY Times, The Big Picture, Fox News, PR Newswire, Market Watch, World Press, Forbes, Google News, Wall Street Journal, MSNBC, The Sun, ZDNet,
  • 10. IBM Emerging Technology Project: M2. What is it? An insight engine for enabling ad-hoc business insights for business users, at web scale. How does it work? Discovery Process: 1. Point M2 to data sources of interest (unstructured web data, feeds, XML, etc.). 2. Transform data into a form that can be analyzed: unstructured data becomes semi-structured data (example: name: Rod Smith, employer: IBM, state: GA), then apply analytics, enriching the data. 3. “What if tooling”: browser-based visual front end with a spreadsheet metaphor to create worksheets for exploring/visualizing the data. What's different? • Unlocking insights embedded in unstructured data • Analyzing data previously unavailable to analyze
  • 11. M2 -> Demo. Project: Improve IP Portfolio Analysis for Mergers & Acquisitions. “...please collect all US Patent filings… then let’s do…” Business Questions: • How much is a target company worth? • What are the high-value areas of their portfolio? • Explored cited patent topics, litigated patents. Knowledge of Interest: • Patents ranked by citation (e.g., how often a patent was referenced determines its value) • Corporate genealogies: IP ownership roll-up • Augment analysis with items affecting IP value, inventor affiliation, citation rank by time. Web Content To Gather: • Gathered 1.4m patent docs from USPTO • 1991-2007 case records from the United States Court of Appeals for the Federal Circuit (CAFC)
  • 12. What's Under the Covers: Hadoop. Emergence of map/reduce programming model for a new class of webApp. Hadoop provides a framework for large-scale parallel processing map/reduce apps (Apache projects led by Yahoo): • Offers simplicity of “programming”: looks like a simple single-threaded app model for developers • Handles big data: scalable storage across machine clusters (think read-only file system) • Deployment: no application knowledge of runtime or OS or cloud necessary • Today: setting up, coding Hadoop jobs in Java, etc. is the domain of skilled Java engineers
  • 13. IBM Emerging Technology Project: M2 Architectural Components. Expanding upon the Hadoop stack: • Visual tooling builds extensively on Pig. M2 Architecture Characteristics: • Extensible via UDFs • REST API for customer choice of analytic service/engine • REST API for choice of visualization packages • Export content as feeds, XML, etc. • ...more to come
  • 14. Conclusions: In God we trust
  • 15. Conclusions: …all others bring data
  • 16. Conclusions. Enterprises quickly evolving their thinking from a Database strategy to a Data Strategy encompassing unstructured & structured content. Repeatable business patterns in broad range of industries emerging. Hadoop has potential to be the platform for broad range of solutions, from web-based analytics -> business event processing -> collaboration.
  • 17. Almost The End. Selecting customer proof of concept projects. INTERESTED? www-01.ibm.com/software/ebusiness/jstart/about.html
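
Slide 12 names the map/reduce programming model, and slide 11 mentions ranking patents by how often they are cited. As a rough illustration of how the two fit together, here is a minimal single-process Python sketch. It does not use Hadoop, Pig, or M2, and it is not taken from the talk: the record layout (`id`, `cites`), the sample data, and all function names are hypothetical, chosen only to show the map, shuffle, and reduce steps.

```python
from collections import defaultdict
from itertools import chain


def map_citations(patent):
    """Map step: emit (cited_patent_id, 1) for each citation in one record."""
    for cited_id in patent["cites"]:
        yield cited_id, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_citations(cited_id, counts):
    """Reduce step: sum the emitted counts for one cited patent."""
    return cited_id, sum(counts)


def rank_by_citation(patents):
    """Run map, shuffle, reduce in-process and rank patents by citation count."""
    mapped = chain.from_iterable(map_citations(p) for p in patents)
    reduced = (reduce_citations(k, v) for k, v in shuffle(mapped).items())
    # Highest count first; ties broken by patent id so the order is stable.
    return sorted(reduced, key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample records (not real patent data).
PATENTS = [
    {"id": "P1", "cites": ["P3", "P4"]},
    {"id": "P2", "cites": ["P3"]},
    {"id": "P5", "cites": ["P3", "P4", "P1"]},
]

if __name__ == "__main__":
    for patent_id, count in rank_by_citation(PATENTS):
        print(patent_id, count)
```

Each map and reduce function only sees one record or one key at a time, which is the "simple single-threaded app model" the slide refers to; on a real cluster the framework, not the application, would distribute those calls and perform the grouping.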