Treasure Data: Big Data Analytics on Heroku

Muga Nishizawa (@muga_nishizawa)
Chief Software Architect, Treasure Data

Treasure Data Overview
▪ Founded to deliver big data analytics in days, not months, without specialist IT resources, at one-tenth the cost of the alternatives
▪ Service-based subscription business model
▪ World-class open source team
  • Founded the world’s largest Hadoop User Group
  • Developed Fluentd and MessagePack
  • Contributed to Memcached, Hibernate, etc.
▪ Treasure Data is in production
  • 20 customers incl. Fortune 500 companies
  • 100+ billion records stored
▪ Processing 10,000 messages per second

Our Customers – Fortune Global 500 leaders and start-ups including:
[Slide image: customer logos]

One Hundred Billion Records and Growing!
[Chart: records stored, y-axis 20 to 120 (billions, per the title), x-axis Sep 2011 to Aug 2012]

Treasure Data Service
“Store Your Data Now for Future Insights”


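[Diagram: Treasure Data architecture]
▪ Data sources: User, Apache, App, App, RDBMS, other data sources
▪ Storage: Treasure Data columnar data storage
▪ Query processing: MapReduce jobs on the Query Processing Cluster; HIVE, PIG (to be supported)
▪ Query interfaces: td-command, Query API, JDBC, REST
▪ Consumers: User, BI apps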

[Diagram: same architecture, overlaid with an example record]
2012-02-04 01:33:51
myappdb.buylog {
  "user": "12345",
  "path": "/buyItem",
  "price": 150,
  "referer": "/landing"
}
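The example record can be read as three parts: a destination (`myappdb.buylog`, i.e. database and table), a timestamp, and a JSON body. A minimal Python sketch of that shape follows. The helper `make_event`, the dictionary keys `tag`/`time`/`record`, and the choice of UTC for the timestamp are assumptions for illustration, not the format td-agent actually uses on the wire.

```python
import json
from datetime import datetime, timezone


def make_event(database, table, timestamp, record):
    """Bundle a record with its destination and Unix time (hypothetical helper)."""
    tag = f"{database}.{table}"  # destination written as database.table
    # Assumption: the slide's timestamp is UTC; the deck does not say.
    epoch = int(timestamp.replace(tzinfo=timezone.utc).timestamp())
    return {"tag": tag, "time": epoch, "record": record}


event = make_event(
    "myappdb",
    "buylog",
    datetime(2012, 2, 4, 1, 33, 51),
    {"user": "12345", "path": "/buyItem", "price": 150, "referer": "/landing"},
)
print(json.dumps(event, indent=2))
```

The record body is schema-free JSON, so new fields can be added by the application without changing the table first.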

[Diagram: same architecture, overlaid with an example query from the command line]
$ td query -w -d myappdb \
  "SELECT
     TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day,
     COUNT(1) AS cnt
   FROM buylog
   GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT')
   ORDER BY cnt"

[Diagram: same architecture, overlaid with the query result]
+------------+------+
| day        | cnt  |
+------------+------+
| 2012-05-26 | 4981 |
| 2012-05-27 | 4481 |
| 2012-05-28 |  481 |
+------------+------+
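What the query computes can be sketched locally in Python: format each event time as a day in Pacific Daylight Time, count rows per day, and order by the count. The function `count_by_day`, the fixed UTC-7 offset standing in for "PDT", and the sample timestamps are assumptions for illustration; they are not data from the slides and not how the service executes the query.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: "PDT" is treated as a fixed UTC-7 offset for this sketch.
PDT = timezone(timedelta(hours=-7), "PDT")


def count_by_day(epoch_times, tz=PDT):
    """Return (day, cnt) rows sorted by cnt ascending (GROUP BY day, ORDER BY cnt)."""
    days = Counter(
        datetime.fromtimestamp(t, tz).strftime("%Y-%m-%d") for t in epoch_times
    )
    return sorted(days.items(), key=lambda row: row[1])


# Made-up event times, expressed as Unix seconds.
sample = [
    datetime(2012, 5, 26, 0, 0, tzinfo=PDT).timestamp(),
    datetime(2012, 5, 26, 1, 0, tzinfo=PDT).timestamp(),
    datetime(2012, 5, 27, 0, 0, tzinfo=PDT).timestamp(),
]
rows = count_by_day(sample)
for day, cnt in rows:
    print(day, cnt)
```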

Comparing On-Premise & Cloud Big Data Markets
[Chart: deployment model (On-Premise to Cloud) plotted against data volume (Low to High)]
▪ Cloud: Database-as-a-Service, Big Data-as-a-Service
▪ On-Premise: Traditional DBMS (ODS, Data Mart), Hadoop, Data Warehouse

© 2012 Forrester Research, Inc. Reproduction Prohibited

Treasure Data as Heroku Add-on


Demo with Heroku


Synergy Effect for Data-Driven Development!

Heroku × Treasure Data

The Power of the Cloud

Easier to Scale
Easier to Maintain
Easier to Iterate


Implementation Process
Traditional DW and On-Premise Big Data

Implementation Process
Traditional DW and On-Premise Big Data
Heroku × Treasure Data

Dramatically streamlined implementation process

Viki.com: “Global Hulu”


Viki Before
▪ Hard to manage Hadoop
▪ Complicated data collection

Viki After
▪ No more Hadoop maintenance
▪ Versatile data collector, td-agent

How Does It Work?


Query Processing
Query Language

Query Execution

Columnar Data

Object Storage


1/4: Compile SQL into MapReduce

SELECT COUNT(DISTINCT ip) FROM tbl;
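One way to picture this step is a toy MapReduce in Python: the map phase emits each `ip` as a key, the shuffle groups identical keys, and the reduce phase counts the groups. The function names and the sample rows are assumptions for illustration; the plan a real SQL-to-MapReduce compiler generates is more involved.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit the ip as the key; the value carries no information."""
    for row in rows:
        yield row["ip"], None


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: each key arrives once, so the distinct count is the number of keys."""
    return sum(1 for _ in groups)


def count_distinct_ip(rows):
    return reduce_phase(shuffle(map_phase(rows)))


# Made-up rows standing in for tbl.
tbl = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}, {"ip": "10.0.0.1"}]
distinct = count_distinct_ip(tbl)
print(distinct)  # 2
```

Because the map phase works row by row, it can be split across many workers; only the grouping and final count need the keys brought together.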


2/4: MapReduce is executed in parallel

cc2.8xlarge cluster compute instances (up to 100 nodes × 32 threads)

3/4: Columnar Data Access

10Gbps Network

Read ONLY the Required Part of Data
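The idea of reading only the required part can be sketched with a toy columnar layout in Python: records are pivoted into one list per field, and a query over one field touches only that list. The helpers `to_columns` and `sum_column` and the second sample record are assumptions for illustration; the actual storage format is not described in the slides.

```python
# Row-oriented records; the second one is made up for the example.
rows = [
    {"user": "12345", "path": "/buyItem", "price": 150, "referer": "/landing"},
    {"user": "67890", "path": "/buyItem", "price": 300, "referer": "/search"},
]


def to_columns(records):
    """Pivot row records into one list per field (assumes all records share fields)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def sum_column(columns, field):
    """Read only the requested column; the other columns are never accessed."""
    return sum(columns[field])


columns = to_columns(rows)
total = sum_column(columns, "price")
print(total)  # 450
```

With a row layout, the same sum would have to scan every field of every record, which is the extra I/O a columnar layout avoids.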


4/4: Object-based Storage


Enjoy Data-Driven Development!


Big Data for the Rest of Us

www.treasure-data.com | @TreasureData

Great Investors
▪ Bill Tai
▪ Naren Gupta – Nexus Ventures, Director of Red Hat, TIBCO
▪ Dave Stamm – Clarify, Daisy Systems, Enkata
▪ Othman Laraki – Twitter
▪ James Lindenbaum, Adam Wiggins and Orion Henry – Heroku
▪ Anand Babu Periasamy and Hitesh Chellani – Gluster
▪ Yukihiro “Matz” Matsumoto – Creator of Ruby, now at Heroku
▪ Dan Scheinman – Former Cisco SVP
▪ Jean-Philippe Emelie Marcos – Tango, D.E. Shaw
▪ + executives from Cisco, Red Hat, Salesforce.com, GREE

What are your options?
▪ Traditional
  • Too much complexity
  • Too long to get live
  • Too expensive to maintain
  • Can only innovate at speed of vendor
▪ On-Premise Hadoop
  • Never designed for analytic processing
  • Too many people
  • Too much software from too many sources
▪ Cloud Hadoop
  • Partial solution
  • Vendor lock-in

Example Use Case – MySQL to TD

