The document discusses the rapid growth of data on the web and how NoSQL databases provide an alternative to traditional relational databases by being able to handle massive amounts of unstructured and semi-structured data across a large number of servers in a simple and scalable way. It reviews different types of NoSQL databases like key-value stores, document databases, and graph databases and provides examples of popular NoSQL databases like MongoDB, CouchDB, HBase, and Neo4j that are being used by large companies to store and query large datasets.
2. About me
• Student at the Faculty of Computer Science, first year
• Ruby & Rails fan
• Building RailsAdmin for RubySOC 2010
• Interested in Scalability and High-Availability
• http://twitter.com/hurrycane
4. Data growth on the web
• Facebook Photos +25TB/week
• Twitter +7TB/day
• Flickr +21GB/hour
• +150GB of tweets while I give this talk
• All these amounts multiply every year
5. Data size
• Exabytes of data stored on the web (source: IDC)
• 2006: 161, 2007: 253, 2008: 397, 2009: 623, 2010: 988
• 1 exabyte = 10^18 bytes
7. Databases
More T RDBMS
MySQL, Oracle
Online transaction
processing (OLTP)
NoSQL
MongoDB, CouchDB
Less T
8. RDBMS
• Relational database management system
• Based on E. F. Codd's relational model (1969)
• Dominant for transactional & analytical
applications
• The most popular and easy-to-use RDBMS is MySQL
9. RDBMS performance
• Chart of performance against data complexity, marking: salary list, the majority of web apps, social networks, trend analysis
10. NoSQL
• Doesn’t mean No to SQL
• It actually means Not Only SQL
• A class of data stores that may not require
fixed table schemas (wikipedia)
• The majority of them are based on Google’s BigTable
12. Common grounds
• They all support huge amounts of data
• The majority of them support replication
and sharding
• Have some sort of failure detection
mechanism
• Can scale to the complexity of data they
store
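To make the sharding bullet concrete, here is a minimal sketch of hash-based sharding in plain JavaScript. Everything in it (hashKey, shardFor, ShardedStore, the key format) is hypothetical and made up for illustration; it is not how any particular database listed in this talk routes its keys.

```javascript
// Hypothetical illustration: route each key to one of N shards by hashing it.
function hashKey(key) {
  // Simple 32-bit string hash (an assumption, not any real database's algorithm).
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h;
}

// The same key always maps to the same shard index.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

class ShardedStore {
  constructor(shardCount) {
    // Each Map stands in for one server.
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }
  put(key, value) {
    this.shards[shardFor(key, this.shards.length)].set(key, value);
  }
  get(key) {
    return this.shards[shardFor(key, this.shards.length)].get(key);
  }
}

const shardedDemo = new ShardedStore(4);
shardedDemo.put('user:1', 'joe');
console.log(shardedDemo.get('user:1'));
```

A plain modulo like this reshuffles most keys when the shard count changes, which is one reason real systems tend to use more careful schemes such as consistent hashing.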
13. Key-Value stores
• Scale to huge amounts of data
• Can handle massive load
• Based on Amazon’s Dynamo
• A big persistent associative-array
• Examples: Voldemort, Tokyo Tyrant/Cabinet
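The "big persistent associative array" idea can be sketched in a few lines of Node.js. TinyKVStore and the file name are hypothetical, and rewriting one JSON file on every put is only an assumption made to keep the example short; real key-value stores use far more efficient storage.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical toy store: an associative array that survives restarts
// because every write is flushed to a JSON file.
class TinyKVStore {
  constructor(file) {
    this.file = file;
    // Reload earlier entries if the file already exists.
    this.data = fs.existsSync(file)
      ? new Map(JSON.parse(fs.readFileSync(file, 'utf8')))
      : new Map();
  }
  put(key, value) {
    this.data.set(key, value);
    fs.writeFileSync(this.file, JSON.stringify([...this.data]));
  }
  get(key) {
    // The store only looks values up by key; it never inspects them.
    return this.data.get(key);
  }
}

const kvDemo = new TinyKVStore(path.join(os.tmpdir(), 'tiny-kv-demo.json'));
kvDemo.put('user:1', { name: 'joe' });
console.log(kvDemo.get('user:1'));
```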
14. BigTable clones
• Tables similar to an RDBMS, but semi-structured
• Based on Google’s BigTable paper
• Column oriented
• Examples: HBase, Cassandra
15. Document databases
• Similar to Key-Value Stores but the DB
knows what the Value is
• Collection of key-value collections
• Documents are often versioned
• Examples: CouchDB, MongoDB
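What "the DB knows what the Value is" buys you can be sketched in plain JavaScript: because values are structured documents, they can be filtered by their fields instead of only fetched by key. The helpers matches and find and the sample data are hypothetical; this is an in-memory illustration, not the API of any real document database.

```javascript
// Hypothetical helper: a document matches when every queried field is equal.
function matches(doc, query) {
  return Object.keys(query).every((field) => doc[field] === query[field]);
}

// Hypothetical helper: return every document in the collection that matches.
function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

// Documents in one collection do not need to share the same fields.
const users = [
  { first_name: 'Ann', last_name: 'Smith' },
  { first_name: 'Bob', last_name: 'Jones', twitter: '@bob' },
  { first_name: 'Cat', last_name: 'Smith', age: 30 },
];

const smiths = find(users, { last_name: 'Smith' });
console.log(smiths.map((u) => u.first_name));
```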
16. Graph databases
• Focus on the structure of data
• Scales to the complexity of data
• Data stored in nodes
• Lots of cool Graph algorithms can be
implemented
• Examples: Neo4j, FlockDB
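As one example of the graph algorithms mentioned above, here is a minimal breadth-first search over a small "follows" graph. The data and the shortestPath function are hypothetical and run in memory only; a real graph database would expose traversals through its own API.

```javascript
// Hypothetical data: each node maps to the nodes it follows.
const follows = new Map([
  ['alice', ['bob', 'carol']],
  ['bob', ['dave']],
  ['carol', ['dave']],
  ['dave', ['erin']],
  ['erin', []],
]);

// Breadth-first search: returns a shortest path from start to goal, or null.
function shortestPath(graph, start, goal) {
  const previous = new Map([[start, null]]); // node -> node it was reached from
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === goal) {
      // Walk the "previous" links back to the start to rebuild the path.
      const path = [];
      for (let n = goal; n !== null; n = previous.get(n)) path.unshift(n);
      return path;
    }
    for (const next of graph.get(node) || []) {
      if (!previous.has(next)) {
        previous.set(next, node);
        queue.push(next);
      }
    }
  }
  return null;
}

console.log(shortestPath(follows, 'alice', 'erin'));
```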
17. But how do I query it?
• RESTful interface
• Query APIs
• SPARQL
• Gremlin - graph traversal language
• GQL - SQL-like query language for Google BigTable
19. Who uses NoSQL?
• BigTable, Cassandra, FlockDB, Dynamo, Voldemort, MongoDB
20. MongoDB
• Scalable, high-performance, open-source,
document-oriented database
• Somewhere between key-value stores and
relational databases
• Big community
• Tons of features
21. MongoDB
• JSON-style documents
{ author: 'joe', created: new Date('03-28-2009'), title: 'Yet another blog post' }
• Schema free
• Flexibility
• Data is stored in collections
22. MongoDB
• Dynamic queries
db.people.update( { name:"Joe" }, { $inc: { n : 1 } } );
• Replication - very easy to set up
• Indexing
• Sharding
• GeoSpatial Index - location based queries
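To show roughly what the update with $inc on this slide does, here is an in-memory sketch in plain JavaScript. The function updateInc and the sample array are hypothetical. The sketch assumes the update touches only the first matching document and treats a missing field as 0, which is my understanding of the default behaviour rather than something stated on the slide.

```javascript
// Hypothetical stand-in for an update with { $inc: ... } on an in-memory array.
function updateInc(collection, query, inc) {
  // Assumption: only the first document matching the query is updated.
  const doc = collection.find((d) =>
    Object.keys(query).every((field) => d[field] === query[field])
  );
  if (!doc) return false;
  for (const [field, amount] of Object.entries(inc)) {
    // Assumption: a field that does not exist yet starts from 0.
    doc[field] = (doc[field] || 0) + amount;
  }
  return true;
}

const people = [{ name: 'Joe' }, { name: 'Ann', n: 4 }];
updateInc(people, { name: 'Joe' }, { n: 1 });
updateInc(people, { name: 'Joe' }, { n: 1 });
console.log(people[0]);
```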
23. MongoDB
• Map Reduce search
res = db.events.mapReduce(m, r, { query : {type:'sale'} });
• Simple querying
db.users.find({'last_name': 'Smith'})
• GridFS - for storing large files
• Support for many programming languages
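The mapReduce call on this slide leaves m and r undefined, so here is a hedged guess at what they might look like, together with a tiny in-memory runner so the sketch executes without a database. The runner runMapReduce, the sample events, and the item and amount fields are all hypothetical. Unlike the MongoDB shell, where emit is provided globally, this sketch passes emit to the map function as an argument.

```javascript
// Hypothetical in-memory runner that imitates the shape of a map-reduce call.
function runMapReduce(docs, map, reduce, options = {}) {
  const query = options.query || {};
  const groups = new Map(); // emitted key -> list of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    // Only documents matching the query are passed to the map function.
    const selected = Object.keys(query).every((f) => doc[f] === query[f]);
    if (selected) map.call(doc, emit); // "this" is the current document
  }
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

// Map: emit one (item, amount) pair per event.
function m(emit) {
  emit(this.item, this.amount);
}

// Reduce: add up all amounts emitted for one item.
function r(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

const events = [
  { type: 'sale', item: 'book', amount: 10 },
  { type: 'sale', item: 'book', amount: 5 },
  { type: 'view', item: 'book', amount: 0 },
  { type: 'sale', item: 'pen', amount: 2 },
];

const res = runMapReduce(events, m, r, { query: { type: 'sale' } });
console.log(res);
```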
30. Hummingbird
• Real time Web Traffic Visualiser
• Lets you see visitors interacting with your
site in real time
• Did I say real time?
31. Hummingbird technology
• Node.js
Evented TCP server written in JavaScript
powered by the V8 JS engine.
• WebSockets
• SVG Graphs
• Real time = 20 times per second