The document discusses Treasure Data, a big data analytics service that allows customers to analyze large volumes of data in the cloud without needing specialist resources. Some key points:
- Treasure Data was founded to deliver cost-effective big data analytics without months of setup or expensive infrastructure.
- It offers a subscription-based cloud service that can store over 100 billion records and process thousands of messages per second.
- Major customers include Fortune 500 companies processing tens of billions of records.
- The service integrates with Heroku and allows developers to easily collect, store and query data to power applications.
- It aims to simplify big data analytics in the cloud compared to on-premise Hadoop deployments.
3. Treasure Data Overview
Founded to deliver big data analytics in days, not months, without
specialist IT resources, at one-tenth the cost of the alternatives
Subscription-based service business model
World class open source team
• Founded world’s largest Hadoop User Group
• Developed Fluentd and MessagePack
• Contributed to Memcached, Hibernate, etc.
Treasure Data is in production
• 20 customers incl. Fortune 500 companies
• 100+ billion records stored
Processing 10,000 messages per second
4. Our Customers – Fortune Global 500 leaders and start-ups
[Customer logos]
5. One Hundred Billion Records and Growing!
[Chart: records stored, in billions, growing from near zero in Sep 2011 to over 100 billion by Aug 2012]
7. Treasure Data Service
“Store Your Data Now for Future Insights”
[Architecture diagram] Data sources (Apache logs, user apps, RDBMSs, and other sources) feed into Treasure Data’s columnar data storage. MapReduce jobs (Hive today; Pig to be supported) run the query processing, and users submit queries through td-command or the JDBC/REST API, including from BI apps.
8. Treasure Data Service
“Store Your Data Now for Future Insights”
[Architecture diagram, as on slide 7, with an incoming record highlighted]
An app writes a timestamped record into the myappdb.buylog table:

2012-02-04 01:33:51 myappdb.buylog {
  "user": "12345",
  "path": "/buyItem",
  "price": 150,
  "referer": "/landing"
}
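As a minimal sketch of what an app does before shipping such a record, the event can be assembled as a plain dictionary with a Unix `time` field and serialized to JSON. The `make_event` helper is hypothetical, not part of any Treasure Data SDK; in practice the deck's Fluentd would forward records like this.

```python
import json
import time

def make_event(user, path, price, referer):
    # Build a buylog record shaped like the one on the slide.
    # Each event carries a Unix "time" field alongside the payload.
    return {
        "time": int(time.time()),
        "user": user,
        "path": path,
        "price": price,
        "referer": referer,
    }

event = make_event("12345", "/buyItem", 150, "/landing")
# Serialize for shipping, e.g. as one line of a forwarded log stream.
payload = json.dumps(event)
```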
9. Treasure Data Service
“Store Your Data Now for Future Insights”
[Architecture diagram, as on slide 7, with the query path highlighted]
The user issues a Hive query through td-command:

$ td query -w -d myappdb \
  "SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day,
          COUNT(1) AS cnt
     FROM buylog
    GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT')
    ORDER BY cnt"
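What that query computes can be illustrated with a plain-Python equivalent: bucket each record's Unix timestamp into a local calendar day, count per day, and sort by count. The sample rows and the fixed UTC-7 offset standing in for "PDT" are illustrative assumptions, not Treasure Data code.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7))  # stand-in for the query's "PDT" zone

def daily_counts(rows):
    # Rough equivalent of:
    #   SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day,
    #          COUNT(1) AS cnt
    #   FROM buylog GROUP BY ... ORDER BY cnt
    counts = Counter(
        datetime.fromtimestamp(t, PDT).strftime("%Y-%m-%d") for t, _ in rows
    )
    return sorted(counts.items(), key=lambda kv: kv[1])

# Hypothetical buylog sample: (unix_time, user) pairs; the first two fall
# on the same PDT day, the third on the next day.
rows = [(1338083000, "u1"), (1338083100, "u2"), (1338170000, "u3")]
result = daily_counts(rows)
```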
10. Treasure Data Service
“Store Your Data Now for Future Insights”
[Architecture diagram, as on slide 7, with results returned to the user]
The query returns the daily counts:

+------------+------+
| day        | cnt  |
+------------+------+
| 2012-05-26 | 4981 |
| 2012-05-27 | 4481 |
| 2012-05-28 |  481 |
+------------+------+
29. Big Data for the Rest of Us
www.treasure-data.com | @TreasureData
32. Great Investors
Bill Tai
Naren Gupta – Nexus Ventures; director of Red Hat, TIBCO
Dave Stamm – Clarify, Daisy Systems, Enkata
Othman Laraki – Twitter
James Lindenbaum, Adam Wiggins and Orion Henry – Heroku
Anand Babu Periasamy and Hitesh Chellani – Gluster
Yukihiro “Matz” Matsumoto – Creator of Ruby, now at Heroku
Dan Scheinman – Former Cisco SVP
Jean-Philippe Emelie Marcos – Tango, D.E. Shaw
+ executives from Cisco, Red Hat, Salesforce.com, GREE
33. What are your options?
Traditional On-Premise Hadoop
• Never designed for analytic processing
• Too many people
• Too much software from too many sources
• Too much complexity
• Too long to go live
• Too expensive to maintain
Cloud Hadoop
• Partial solution
• Vendor lock-in
• Can only innovate at the speed of the vendor