14. Web 2.0 is *pushing the envelope*…
• Scale
• CPU-intensive text analytics
• Search outside the column
• 7x24 operation
15. Web Application Heresies?
• REST and Resource-oriented data
• Cloud computing
• Map/Reduce will be next decade’s MVC
• Semi-structured data
• Grassroots, flexible schemas…microformats
• Distributed hash tables
• Offline browser clients
( * Adapted from Sam Ruby)
18. Save. See. Share. Secure.
FOUR PILLARS OF DATA MANAGEMENT*
( * According to Damien Katz )
20. The Three Levels of Platforms you will meet on the Internet *
ACCESS. PLUG-IN. RUNTIME.
* Marc Andreessen, http://blog.pmarca.com/2007/09/the-three-kinds.html
27. Lowers barriers to entry. Enables situational applications. Isolates concerns to app.
LEVEL 3
28. Who is building Level 3 platforms?
Ning Social Application Platform
Salesforce Sforce/AppExchange
Google Mashup Editor, et al
Second Life Scriptable 3D world
Amazon Elastic Compute Cloud
Akamai EdgeComputing
Yahoo! Pipes
IBM Mashup Maker
29. “IN THE LONG RUN, ALL CREDIBLE LARGE-SCALE
INTERNET COMPANIES WILL PROVIDE LEVEL 3
PLATFORMS”
* Marc Andreessen, http://blog.pmarca.com/2007/09/the-three-kinds.html
35. Column-oriented databases
Id  Last_name  First_name  Salary
1   Smith      Joe         40000
2   Jones      Mary        50000
3   Johnson    Cathy       44000
Row-oriented storage: 1,Smith,Joe,40000;2,Jones,Mary,50000;3,Johnson,Cathy,44000;
Column-oriented storage: 1,2,3;Smith,Jones,Johnson;Joe,Mary,Cathy;40000,50000,44000;
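A minimal sketch of the two layouts on this slide, assuming the same three-row table. The function names are hypothetical and chosen only for illustration; real column stores add compression and indexing that are not shown here.

```python
# The sample table from the slide: (Id, Last_name, First_name, Salary).
ROWS = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def serialize_row_oriented(rows):
    """Store each record contiguously: every field of row 1, then row 2, ..."""
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)


def serialize_column_oriented(rows):
    """Store each column contiguously: all ids, then all last names, ..."""
    columns = zip(*rows)  # transpose rows into columns
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)


def total_salary(column_serialized):
    """Aggregate one column by reading only its segment (index 3 = Salary)."""
    salary_segment = column_serialized.rstrip(";").split(";")[3]
    return sum(int(v) for v in salary_segment.split(","))


if __name__ == "__main__":
    print(serialize_row_oriented(ROWS))
    print(serialize_column_oriented(ROWS))
    print(total_salary(serialize_column_oriented(ROWS)))
```

The aggregate touches a single contiguous segment in the column layout, which is the usual argument for column stores in analytic workloads.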
36. Google Apps Based on BigTable
• Google Reader • Google Docs
• Google Maps • Google Calendar
• Google Print • Google Page Creator
• Google Earth • Google Notebook
• Blogger.com • Google Mashup Editor
• Google Code • Etc.
• Orkut
• YouTube
41. Common themes?
• Flexible schema
• Highly distributed
• HTTP is the database driver
• JSON, XML, HTML, and JavaScript
• Full text search
43. One Size Fits All
AN IDEA WHOSE TIME HAS COME AND GONE?
( * Michael Stonebraker )
45. The first is that there will be a dedicated core, those that are heavily invested, either
monetarily or professionally, in the status quo, and they will resist any change.
The second is that change doesn't care about your investment.
TWO RULES FOR ANY CHANGE IN
TECHNOLOGY *
( * Joe Gregorio, http://bitworking.org/news/217/Ch-ch-changes )
56. CouchDB
• Green implementation, no legacy
• Designed to:
– implement the four pillars of data management
– leverage recent paradigm shifts
• Level 3 Data Platform
57. CouchDB: Feature Summary
• Robust Data Storage
• Replication
• REST API
• User Authentication
• Views
• Built on Erlang/OTP
• Append-only writes
• MVCC with optimistic concurrency
• ETags
• Full text search
• Map/Reduce
• (Your feature here. It’s open source!)
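To make "append-only writes" and "MVCC with optimistic concurrency" concrete, here is a toy in-memory sketch. It is not CouchDB's implementation: the class `TinyDocStore`, the exception `ConflictError`, and the revision format are all hypothetical, assumed only for illustration.

```python
import uuid


class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class TinyDocStore:
    """Toy document store: never overwrites, only appends new revisions."""

    def __init__(self):
        self._log = []     # append-only list of (doc_id, rev, body)
        self._latest = {}  # doc_id -> index of the newest revision in the log

    def get(self, doc_id):
        """Return (rev, body) for the newest revision of a document."""
        _, rev, body = self._log[self._latest[doc_id]]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        """Write a new revision; the caller must name the revision it read."""
        index = self._latest.get(doc_id)
        current_rev = self._log[index][1] if index is not None else None
        if rev != current_rev:
            # Someone else wrote first: no locks were held, so reject and
            # let the caller re-read and retry (optimistic concurrency).
            raise ConflictError(f"{doc_id}: expected {current_rev}, got {rev}")
        new_rev = uuid.uuid4().hex  # hypothetical revision token
        self._log.append((doc_id, new_rev, dict(body)))
        self._latest[doc_id] = len(self._log) - 1
        return new_rev


if __name__ == "__main__":
    store = TinyDocStore()
    first = store.put("doc1", {"title": "draft"})
    store.put("doc1", {"title": "final"}, rev=first)
    try:
        store.put("doc1", {"title": "stale edit"}, rev=first)
    except ConflictError as err:
        print("conflict:", err)
```

Old revisions stay in the log, so readers are never blocked by writers; a writer only fails if the document changed since it was read.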
58. CouchDB: REST API
• Easy retrieval using our favorite, scalable
architecture: HTTP
• Exchange in industry-standard formats:
(XML/JSON)
• Simple and intuitive interface
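A sketch of what "HTTP is the interface" can look like from a client, assuming a CouchDB instance at `http://localhost:5984` and JSON as the exchange format. The database name `albums`, the document id, and the helper names are hypothetical. The functions only build requests; nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
from urllib.request import Request

# Assumption: a local CouchDB instance listening on its default port.
BASE_URL = "http://localhost:5984"


def build_put_document(db, doc_id, doc):
    """Build a PUT request that stores `doc` at /<db>/<doc_id>."""
    return Request(
        f"{BASE_URL}/{db}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get_document(db, doc_id):
    """Build a GET request that retrieves the document at /<db>/<doc_id>."""
    return Request(f"{BASE_URL}/{db}/{doc_id}", method="GET")


if __name__ == "__main__":
    # Hypothetical database and document, for illustration only.
    put_req = build_put_document("albums", "abbey-road", {"artist": "The Beatles"})
    get_req = build_get_document("albums", "abbey-road")
    print(put_req.get_method(), put_req.full_url)
    print(get_req.get_method(), get_req.full_url)
```

Because each document is an ordinary HTTP resource, the same URL serves both the write and the read, and any HTTP client, cache, or proxy can sit in between.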
60. The Hadoop stack (from a DBMS perspective)
MapReduce: Java framework to write parallel scans and aggregations
HBase: Simple database
HDFS: Distributed file system
IBM Impliance
Muse Query Language: Declarative query language
MapReduce+: Enhancements to MapReduce
Muse Data Model: Semi-structured data model
HBase Core: Database storage, transactions
HDFS: Distributed file system
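The MapReduce layer of the stack can be sketched in a few lines. This is a single-process illustration of the programming model only, written in Python rather than Hadoop's Java API; all function names are hypothetical, and a real framework would run the map and reduce phases in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to that key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for each word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


def word_count(lines):
    """Run the full map -> shuffle -> reduce pipeline for word counting."""
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)


if __name__ == "__main__":
    print(word_count(["save see share", "share secure", "see share"]))
```

The user supplies only the mapper and the reducer; the scan, grouping, and aggregation plumbing belongs to the framework, which is what makes the model easy to parallelize.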
61. “Luckily, there are only a handful of companies…in the world that need to operate at
[this] scale.”
DOES EVERYBODY NEED THIS DEGREE
OF SCALE?
( * Dare Obasanjo, http://www.25hoursaday.com/weblog/2007/10/06/ThoughtsOnAmazonsInternalStorageSystemDynamo.aspx )
62. “BEWARE OF FOCUSING TOO MUCH ON THE
APPS OF THE PAST WHEN LOOKING AT
PLATFORMS OF THE FUTURE”
65. Understand and communicate HTTP Resource vs. RDBMS differences
Research, explore, and push the limits of the MapReduce programming model
Discover where distributed hash tables may make sense over RDBMS
ACADEMIC PROPOSAL
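As a starting point for the distributed hash table question, here is a minimal consistent-hashing sketch, one common way to decide which node owns a key. The class name `HashRing`, the node names, and the `replicas` parameter are assumptions for illustration; a real DHT also handles routing, replication, and node failure.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing: each node owns the arcs preceding its points."""

    def __init__(self, nodes, replicas=64):
        # Place several virtual points per node to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        """Return the node owning the first ring point after the key's hash."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


if __name__ == "__main__":
    small = HashRing(["node-a", "node-b", "node-c"])
    large = HashRing(["node-a", "node-b", "node-c", "node-d"])
    keys = [f"user:{i}" for i in range(1000)]
    moved = sum(1 for k in keys if small.node_for(k) != large.node_for(k))
    print(f"{moved} of {len(keys)} keys moved after adding a node")
```

Unlike a table partitioned by `hash(key) % node_count`, adding a node here relocates only the keys that the new node takes over, which is one reason this structure suits clusters that grow and shrink.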
66. View ourselves as a Level 3 Platform by…
Taking the runtime out of the developer’s control
Leveraging IBM’s Impliance project for massive data scaling
PROJECT ZERO PROPOSAL