   Big Data: Web, SNS, system logs, voice data, images/video, sensor data…
   Growth rate is 45%/year
    ◦ Increase of “unstructured data” such as sensor data

[Figure: data volume grows from 0.8 ZB in 2009 to 35 ZB in 2020 (45% growth/year)]
    ◦ Structured data: customer data, business data
    ◦ Unstructured data: sensor data (5 billion phones), images/video (uploaded videos: 60,000/week), SNS (8,000 tweets/sec)
    ◦ Also labeled: processed data 100 TB/day

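The growth figures on this slide can be checked with a few lines of arithmetic. The sketch below is not part of the original deck; it only assumes simple annual compounding between the two endpoints the slide gives (0.8 ZB in 2009, 35 ZB in 2020). Under that assumption the endpoints imply roughly 41% growth per year, in the same range as the quoted 45%.

```python
# Endpoints taken from the slide; annual compounding is an assumption.
START_ZB, END_ZB = 0.8, 35.0
YEARS = 2020 - 2009


def implied_annual_growth(start, end, years):
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


def project(start, rate, years):
    """Volume after compounding `rate` once per year for `years` years."""
    return start * (1.0 + rate) ** years


rate = implied_annual_growth(START_ZB, END_ZB, YEARS)
at_45_percent = project(START_ZB, 0.45, YEARS)

print(f"implied growth between the endpoints: {rate:.1%} per year")
print(f"0.8 ZB compounded at 45%/year for {YEARS} years: {at_45_percent:.1f} ZB")
```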
   Hadoop: A de facto distributed computing framework for Big Data
    ◦ But not suitable for realtime processing and in-depth analysis

[Figure: Big Data at the base; Batch Processing and Simple Statistics in the lower row, Realtime Processing and In-depth Analysis in the upper row]

[Figure: Big Data processing, batch vs. realtime]
    ◦ Batch application vs. realtime application
    ◦ Simple analysis (statistics) vs. in-depth analysis (classification, estimation, prediction)
    ◦ Batch (stored) vs. realtime (online); Jubatus appears on the realtime side

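To make the batch (stored) versus realtime (online) contrast concrete, here is a minimal sketch that is not taken from the deck. It compares a batch computation over stored values with an online update that touches each value once and keeps only a constant-size summary. The class name `RunningStats` and the sample stream are hypothetical; the update rule is Welford's well-known incremental method.

```python
class RunningStats:
    """Mean and variance updated one observation at a time (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of everything seen so far."""
        return self._m2 / self.n if self.n else 0.0


def batch_stats(values):
    """Batch version: needs the whole stored data set before it can answer."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / n


stream = [3.0, 5.0, 4.0, 10.0, 8.0]  # hypothetical measurements

online = RunningStats()
for x in stream:
    online.update(x)  # an up-to-date answer is available after every event

batch_mean, batch_var = batch_stats(stream)
print(online.mean, online.variance, batch_mean, batch_var)
```

Both paths give the same statistics; the difference is that the online version can answer at any point in the stream without storing it.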
   Requirements: “Scalability,” “Realtime processing,” and “In-depth analysis”
   Joint development with Preferred Infrastructure

[Figure: existing tools positioned along two axes, In-depth Analysis and Scalability (SVMlight among them)]

References:
 • Hadoop: http://hadoop.apache.org/
 • Mahout: http://mahout.apache.org/
 • WEKA: http://weka-jp.info/
 • SVMlight: http://svmlight.joachims.org/
 • Yahoo! S4: http://s4.io/
 • Twitter Storm: http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
 • CEP: Complex Event Processing

   【Big Data】Big stream ⇒ worldwide: 8,000 tweets/sec, Japanese: 500~2,000 tweets/sec
   【Realtime processing】Recognition of “good”/“bad” news by learning ⇒ following up bursty tweets
   【In-depth analysis】Automatic classification of “tweets related to topics of interest (keyword)”

[Figure: Twitter ⇒ realtime analysis by Jubatus ⇒ results ⇒ client application]
    ◦ 【Big Data】Tweets: worldwide 8,000 tweets/sec, Japanese 2,000 tweets/sec
    ◦ 【Realtime】【In-depth analysis】Automatic realtime classification of tweets highly related to the issue of concern (keyword)
    ◦ Client application (keyword: NTT): monitors NTT-related tweets, which need not contain “NTT”

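The idea of classifying tweets as they arrive, including tweets that never mention the keyword itself, can be illustrated with a small online linear classifier. This is only a sketch under stated assumptions: it uses a plain multiclass perceptron over bag-of-words features, it is not the Jubatus API or its algorithm, and the class name `OnlinePerceptron`, the labels, and the sample tweets are all hypothetical.

```python
class OnlinePerceptron:
    """Multiclass perceptron over bag-of-words features, trained per example."""

    def __init__(self):
        self.weights = {}  # label -> {word -> weight}

    def _score(self, label, words):
        w = self.weights[label]
        return sum(w.get(word, 0.0) for word in words)

    def classify(self, text):
        """Return the best-scoring known label, or None before any training."""
        if not self.weights:
            return None
        words = text.lower().split()
        return max(self.weights, key=lambda label: self._score(label, words))

    def train(self, text, label):
        """Update the model from a single labeled tweet (mistake-driven)."""
        self.weights.setdefault(label, {})
        predicted = self.classify(text)
        if predicted == label:
            return  # already correct, nothing to change
        for word in text.lower().split():
            right, wrong = self.weights[label], self.weights[predicted]
            right[word] = right.get(word, 0.0) + 1.0
            wrong[word] = wrong.get(word, 0.0) - 1.0


# Hypothetical labeled stream: "related" means relevant to a monitored carrier.
labeled_stream = [
    ("lunch was great today", "unrelated"),
    ("carrier network outage in tokyo", "related"),
    ("great weather in tokyo today", "unrelated"),
    ("new fiber plan from the carrier", "related"),
]

clf = OnlinePerceptron()
for text, label in labeled_stream:
    clf.train(text, label)  # the model is usable after every single update

print(clf.classify("network outage again"))
print(clf.classify("great lunch today"))
```

The first query does not contain the word "carrier", yet the learned weights for "network" and "outage" still place it in the related class, which is the behavior the slide describes for tweets that do not contain the keyword.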
Realtime recommendation for e-commerce sites / on-demand TV
  ・Conventional batch processing: recommended items are fixed for a certain period
  ・Jubatus: instant recognition of sudden changes in buying trends

[Figure: recommendation accuracy over time for real behavior, Jubatus, and batch processing; marked events: sudden order increase after a TV expose, sudden order increase after the death of a celebrity]

[Figure: customers ⇒ customer buying history ⇒ realtime recommendation by Jubatus]
  ・Recommended items are updated in realtime by relating them to trends in other customers’ buying histories

2-3 machines for the current Twitter stream

[Figure: tweets from Twitter classified into company categories]
    ◦ 【Big Data】Tweets: worldwide 8,000 tweets/sec
    ◦ 【Realtime】&【In-depth analysis】Realtime automatic company classification for “tweets”
    ◦ Company categories: Company A, Company B, Company C, Company D, ...

【Big Data】&【Realtime processing】
   100,000/sec update throughput per server
【Big Data】&【In-depth analysis】
   Response time: 0.1 sec for 30 million users (10x faster than Mahout)

[Figure: buying/search queries update a user-item matrix, from which the recommended item is read]

             Item 1   Item 2   Item 3   ...   Item X
   User A      ○                 ○              ○
   User B               ○
   ...         ○                                ○
   User Y               ○        ○

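The user-item matrix on this slide can be sketched as a structure that is updated per event and queried immediately afterwards. The code below is an illustration only: it is not how Jubatus implements recommendation, it scans all users linearly and so says nothing about the throughput and latency figures above, and the names `LiveRecommender`, `update`, and `recommend` are hypothetical. Similarity between users is measured with the Jaccard index of their item sets, and the sample data mirrors the ○ marks in the matrix.

```python
from collections import defaultdict


class LiveRecommender:
    """In-memory user-item sets with neighbor-based recommendation."""

    def __init__(self):
        self.items_by_user = defaultdict(set)

    def update(self, user, item):
        """Record one buying/search event; the next query sees it at once."""
        self.items_by_user[user].add(item)

    @staticmethod
    def _jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def recommend(self, user, top_n=3):
        """Rank items the user lacks by the summed similarity of their owners."""
        mine = self.items_by_user.get(user, set())
        scores = defaultdict(float)
        for other, theirs in self.items_by_user.items():
            if other == user:
                continue
            similarity = self._jaccard(mine, theirs)
            if similarity == 0.0:
                continue
            for item in theirs - mine:
                scores[item] += similarity
        # Highest score first; ties broken by item name for a stable order.
        return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


rec = LiveRecommender()
for user, item in [
    ("A", "item1"), ("A", "item3"), ("A", "itemX"),
    ("B", "item2"),
    ("Y", "item2"), ("Y", "item3"),
]:
    rec.update(user, item)

before = rec.recommend("B")   # B shares item2 with Y only
rec.update("B", "item3")      # a new event arrives for user B
after = rec.recommend("B")    # B now also overlaps with A

print(before, after)
```

The recommendation for user B changes as soon as the new event is recorded, without any batch recomputation step.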
   Jubatus OSS website
    ◦ http://jubat.us
    ◦ 2nd edition will be released on 17th Feb.

   Features
    ◦ 1st ed.: Linear classification
    ◦ 2nd ed.: Regression, Statistics, Recommendation

   OSS community
    ◦ Web: http://jubat.us
    ◦ GitHub: https://github.com/jubatus/jubatus
    ◦ Twitter: @JubatusOfficial

Jubatus Presentation on R&D forum 2011
