2. Big Data: Web, SNS, system logs, voice data, images/video, sensor data…
Growth rate is 45%/year
◦ Increase of “unstructured data” such as sensor data
[Figure: total data volume grows from 0.8 ZB in 2009 to 35 ZB in 2020 (45% growth/year). Structured data: customer data, business data. Unstructured data: sensor data, images/video, SNS. Annotations: 5 billion phones; processed data: 100 TB/day; uploaded videos: 60,000/week; 8,000 tweets/sec.]
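As a quick arithmetic check on the figures quoted on this slide (0.8 ZB in 2009, 35 ZB in 2020, 45%/year), the sketch below computes the compound annual growth rate implied by the two endpoints. It assumes the endpoints are exact and that growth compounds once per year, neither of which the slide states.

```python
# Endpoints as quoted on the slide: 0.8 ZB in 2009, 35 ZB in 2020.
start_zb, end_zb = 0.8, 35.0
years = 2020 - 2009

# Compound annual growth rate implied by the two endpoints.
implied_rate = (end_zb / start_zb) ** (1 / years) - 1

# Volume that would be reached in 2020 at exactly 45%/year from 2009.
projected_zb = start_zb * 1.45 ** years

print(f"implied growth rate: {implied_rate:.1%}")
print(f"2020 volume at 45%/year: {projected_zb:.1f} ZB")
```

The implied rate comes out a little above 40% per year, in the same range as the rounded 45% figure on the slide.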
3. Hadoop: A de facto distributed computing framework for Big Data
But not suitable for realtime processing and in-depth analysis
[Figure: quadrant chart for big data. One axis runs from batch processing to realtime processing, the other from simple statistics to in-depth analysis.]
4. [Figure: big data applications. Batch (stored) applications vs. realtime (online) applications; simple analysis (statistics) vs. in-depth analysis (classification, estimation, prediction). Jubatus is marked on the chart.]
5. Requirements: “Scalability,” “Realtime processing,” and “In-depth analysis”
Joint development with Preferred Infrastructure
[Figure: positioning of existing tools against the requirements; surviving labels: In-depth Analysis, Scalability, SVMlight.]
References:
• Hadoop: http://hadoop.apache.org/
• Mahout: http://mahout.apache.org/
• WEKA: http://weka-jp.info/
• SVMlight: http://svmlight.joachims.org/
• Yahoo! S4: http://s4.io/
• Twitter Storm: http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
• CEP: Complex Event Processing
6. 【Big Data】 Big stream ⇒ worldwide: 8000 tweets/sec, Japanese: 500~2000 tweets/sec
【Realtime processing】 Recognition of “good”/“bad” news by learning ⇒ following up bursty tweets
【In-depth analysis】 Automatic classification of tweets related to topics of interest (keyword)
[Figure: realtime analysis by Jubatus. 【Big Data】 The Twitter stream (worldwide: 8000 tweets/sec; Japanese: 2000 tweets/sec) is fed to Jubatus. A client application monitors NTT-related tweets (keyword: NTT) and receives the results; matching tweets do not need to contain the string “NTT”.]
【Realtime】【In-depth analysis】 Automatic realtime classification of tweets highly related to the issue of concern (keyword)
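The idea of classifying tweets as they arrive, without requiring the keyword itself to appear, can be illustrated with a small online linear classifier. The sketch below is a plain perceptron over bag-of-words features; it is not Jubatus's API or its actual algorithm, and the class name, helper function, and sample tweets are all hypothetical.

```python
from collections import defaultdict


def featurize(text):
    """Bag-of-words features: lowercase tokens mapped to counts."""
    feats = defaultdict(float)
    for token in text.lower().split():
        feats[token] += 1.0
    return feats


class OnlinePerceptron:
    """Binary linear classifier updated one example at a time."""

    def __init__(self):
        self.weights = {}

    def score(self, feats):
        return sum(self.weights.get(t, 0.0) * v for t, v in feats.items())

    def predict(self, text):
        return 1 if self.score(featurize(text)) > 0 else -1

    def train(self, text, label):
        """label is +1 (related to the topic) or -1 (unrelated)."""
        feats = featurize(text)
        # Mistake-driven update: weights change only on a wrong or zero score.
        if label * self.score(feats) <= 0:
            for t, v in feats.items():
                self.weights[t] = self.weights.get(t, 0.0) + label * v


# Hypothetical labeled stream; none of the tweets contains the keyword itself.
stream = [
    ("new fiber plan from the telecom carrier", 1),
    ("great ramen for lunch today", -1),
    ("carrier network outage in tokyo", 1),
    ("watching a movie tonight", -1),
]

clf = OnlinePerceptron()
for text, label in stream:
    clf.train(text, label)  # the model is usable after every single update

print(clf.predict("telecom network outage reported"))
print(clf.predict("ramen and a movie"))
```

Because each tweet is processed once and then discarded, the model never needs the stored corpus that a batch job would scan.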
7. Realtime recommendation for e-commerce sites/on-demand TV
・Conventional batch processing: a recommended item is fixed for a certain period
・Jubatus: instant recognition of sudden changes in buying trends
[Figure: recommendation accuracy over time, built from customers' buying history. Curves: real behavior, Jubatus, batch processing. Marked events: a sudden order increase after the death of a celebrity; a sudden order increase after a TV exposé.]
Realtime recommendation by Jubatus: recommended items are updated in realtime by relating them to trends in other customers' buying histories.
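One simple way to notice a sudden change in buying trend is to compare each new order count with a running average of earlier counts. The sketch below uses an exponentially weighted moving average; the slide does not say how Jubatus detects such changes, so the class, its thresholds, and the sample counts are assumptions made for illustration.

```python
class BurstDetector:
    """Flags an item when its latest count far exceeds its running average."""

    def __init__(self, alpha=0.3, ratio=3.0, min_count=5):
        self.alpha = alpha          # smoothing factor of the moving average
        self.ratio = ratio          # multiple of the average that counts as a burst
        self.min_count = min_count  # ignore very small counts
        self.average = {}

    def update(self, item, count):
        """Feed the order count for one time window; return True on a burst."""
        prev = self.average.get(item)
        is_burst = (
            prev is not None
            and count >= self.min_count
            and count > self.ratio * prev
        )
        # Update the moving average only after the comparison.
        if prev is None:
            self.average[item] = float(count)
        else:
            self.average[item] = self.alpha * count + (1 - self.alpha) * prev
        return is_burst


detector = BurstDetector()
hourly_orders = [4, 5, 3, 4, 30, 28]  # hypothetical counts for one item
flags = [detector.update("album-123", n) for n in hourly_orders]
print(flags)
```

The jump to 30 orders is flagged in the window where it happens, whereas a periodic batch job would only see it at its next run.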
8. [Figure: company classification of the Twitter stream. 【Big Data】 Tweets worldwide: 8000 tweets/sec. Tweets from Twitter are sorted into company categories: Company A, Company B, Company C, Company D, ... Annotation: “2-3 machines for current”.]
【Realtime】 & 【In-depth analysis】 Realtime automatic company classification for tweets
9. 【Big Data】 & 【Realtime processing】
Update throughput: 100,000/sec per server
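[Figure: user-item matrix updated by buying/search queries. Rows: User A, User B, ..., User Y. Columns: Item 1, Item 2, Item 3, ..., Item X. A ○ marks an interaction between a user and an item. Output: recommended item.]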
【Big Data】 & 【In-depth analysis】
Response time: 0.1 sec for 30 million users (10x faster than Mahout)
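The user-item matrix on this slide can be sketched as a sparse structure that is updated per event and queried at any time. The example below scores unseen items by the cosine similarity between users' item sets; it is an assumption-laden illustration, not Jubatus's recommender or its API, and it makes no claim about the throughput or response time quoted above. The class name and the sample users and items are hypothetical.

```python
from collections import defaultdict
from math import sqrt


class Recommender:
    """Sparse user-item matrix with incremental updates."""

    def __init__(self):
        self.items_by_user = defaultdict(set)

    def update(self, user, item):
        """Record one buying/search event; it takes effect immediately."""
        self.items_by_user[user].add(item)

    def similarity(self, a, b):
        """Cosine similarity between two users' binary item vectors."""
        items_a = self.items_by_user.get(a, set())
        items_b = self.items_by_user.get(b, set())
        if not items_a or not items_b:
            return 0.0
        return len(items_a & items_b) / sqrt(len(items_a) * len(items_b))

    def recommend(self, user):
        """Return the unseen item with the highest similarity-weighted score."""
        seen = self.items_by_user.get(user, set())
        scores = defaultdict(float)
        for other, items in self.items_by_user.items():
            if other == user:
                continue
            sim = self.similarity(user, other)
            if sim == 0.0:
                continue
            for item in items - seen:
                scores[item] += sim
        if not scores:
            return None
        # Sort first so that ties are broken deterministically by item name.
        return max(sorted(scores), key=scores.get)


rec = Recommender()
events = [
    ("A", "item1"), ("A", "item2"), ("A", "item3"),
    ("B", "item2"),
    ("Y", "item1"), ("Y", "item3"), ("Y", "itemX"),
]
for user, item in events:
    rec.update(user, item)

print(rec.recommend("A"))
```

User Y shares two items with User A, so the one item Y has that A lacks becomes A's recommendation.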
10. Jubatus OSS website
◦ http://jubat.us
◦ 2nd edition will be released on 17th Feb.
Features
◦ 1st ed.: Linear classification
◦ 2nd ed.: Regression, Statistics, Recommendation
OSS community
◦ Web: http://jubat.us
◦ GitHub: https://github.com/jubatus/jubatus
◦ Twitter: @JubatusOfficial