4. 4
• Massive volume of both structured and
unstructured data, which is difficult to process
using traditional database and software
techniques.
5. 5
• Approximately 90 percent of all the real-time
information being created today is unstructured data.
• Every day 2.5 quintillion (2.5 × 10^18) bytes of data are
generated.
• 90 percent of the world's present-day data was
created in the previous two years alone.
6. 6
• High Volume – amount of data
– Facebook takes in 500 terabytes of new data every day
• High Variety – range of data types and sources
– geospatial data, audio, video, etc.
• High Velocity – speed of data in and out
– on-line gaming systems support millions of
concurrent users, each producing multiple inputs
per second
8. 8
What is NoSQL
• Non-relational, next-generation databases
• Different data models (not relational)
• A collection of very different products
• Most do not use SQL for queries
• No predefined schema
9. 9
• Relational
– Data is divided into tables and related through foreign keys; DB
constraints, normalized data; the interface is SQL
• NoSQL
– Data is stored in a schemaless format; redundancy is
encouraged; the interface varies and is optimized for the
implementation; no forced DB constraints
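To make the contrast concrete, here is a minimal JavaScript sketch that uses plain in-memory arrays and objects rather than a real database. The names `users`, `orders`, `ordersForUser`, and `userDocument` are hypothetical and chosen only for illustration.

```javascript
// Relational style: normalized rows in separate "tables",
// linked by a foreign key (orders.userId -> users.id).
const users = [{ id: 1, name: "jeff" }];
const orders = [
  { id: 10, userId: 1, item: "pizza" },
  { id: 11, userId: 1, item: "hat" },
];

// Reading a user's orders means joining on the foreign key.
function ordersForUser(userId) {
  return orders.filter((o) => o.userId === userId).map((o) => o.item);
}

// NoSQL (document) style: the related data is embedded,
// redundantly if need be, inside a single record.
const userDocument = {
  name: "jeff",
  orders: [{ item: "pizza" }, { item: "hat" }],
};

console.log(ordersForUser(1));
console.log(userDocument.orders.map((o) => o.item));
```

Both reads return the same items; the difference is whether the relationship lives in a key that must be joined or inside the record itself.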
10. 10
• Document DB: MongoDB, CouchDB
• Wide Column (Column Family): Cassandra, HBase, Amazon SimpleDB
• Key Value: DynamoDB, Voldemort, MemcacheDB
• Graph: Neo4j, OrientDB
• Many, many more (http://nosql-database.org/)
12. 12
• MongoDB is an open-source NoSQL database developed by
10gen in C++.
• First released in 2009
• A MongoDB server hosts a set of databases
• Each database contains multiple collections
• Every collection can contain different types of objects,
called documents
• Every document is represented as a JSON structure: a list of
key-value pairs.
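The nesting described above (databases, then collections, then documents) can be pictured with a small in-memory model. This is only a sketch in plain JavaScript: the `server` object, the `blog` database, and the helper functions are hypothetical and are not part of any MongoDB API.

```javascript
// Hypothetical in-memory model: server -> databases -> collections -> documents.
const server = {
  blog: {                                   // a database
    posts: [                                // a collection
      { title: "Hello", tags: ["intro"] },  // documents: lists of key-value pairs
      { title: "Sharding", views: 42 },
    ],
    authors: [{ name: "ben" }],             // another collection
  },
};

// Names of the collections inside one database.
function collectionNames(dbName) {
  return Object.keys(server[dbName]);
}

// Number of documents inside one collection.
function countDocuments(dbName, collectionName) {
  return server[dbName][collectionName].length;
}

console.log(collectionNames("blog"));
console.log(countDocuments("blog", "posts"));
```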
17. 17
• Data stored as documents (BSON)
• Schema free
• Dynamic Queries
• Capped collections
• Sharding (sometimes called partitioning) for
scalability
• Replication
18. 18
• MongoDB does not need a predefined data schema.
• Every document could have different data!
{name: "jeff",
eyes: "blue",
height: 72,
boss: "ben"}
{name: "brendan",
aliases: ["el diablo"]}
{name: "ben",
hat: "yes"}
{name: "matt",
pizza: "DiGiorno",
height: 72,
boss: "555.555.1212"}
{name: "will",
eyes: "blue",
birthplace: "NY",
aliases: ["bill", "la ciacco"],
gender: "???",
boss: "ben"}
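As a rough illustration of what "schema free" means in practice, the sketch below keeps three of the documents above in one plain JavaScript array and inspects them. The helpers `fieldsUsed` and `withField` are hypothetical names; no MongoDB call is involved.

```javascript
// Documents of different shapes living side by side in one "collection".
const people = [
  { name: "jeff", eyes: "blue", height: 72, boss: "ben" },
  { name: "brendan", aliases: ["el diablo"] },
  { name: "ben", hat: "yes" },
];

// Collect every field name that appears in at least one document.
function fieldsUsed(docs) {
  const fields = new Set();
  for (const doc of docs) {
    for (const key of Object.keys(doc)) fields.add(key);
  }
  return [...fields].sort();
}

// Names of the documents that carry a given field; documents
// without it are simply skipped, and nothing has to be migrated.
function withField(docs, field) {
  return docs.filter((doc) => field in doc).map((doc) => doc.name);
}

console.log(fieldsUsed(people));
console.log(withField(people, "height"));
```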
19. 19
• JSON (JavaScript Object Notation): a format for exchanging and storing data
• Schemaless
• Excellent performance
• MongoDB actually uses BSON (Binary JSON).
20. 20
• Working with data in BSON is the same as in JSON.
• It is serialized into a special binary-encoded format.
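To give a feel for the binary encoding, here is a deliberately tiny encoder that follows the BSON layout for just two value types, UTF-8 strings (type byte 0x02) and 32-bit integers (type byte 0x10). It is a sketch, not the encoder MongoDB drivers use: `encodeBson` is a hypothetical name, nested documents, arrays, and all other types are left out, and it assumes Node.js for the global `Buffer`.

```javascript
// Encode a flat object whose values are strings or 32-bit integers.
function encodeBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = Buffer.from(key + "\0", "utf8"); // field name as a C string
    if (typeof value === "string") {
      const text = Buffer.from(value + "\0", "utf8");
      const length = Buffer.alloc(4);
      length.writeInt32LE(text.length, 0); // byte length, trailing NUL included
      parts.push(Buffer.from([0x02]), name, length, text);
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const number = Buffer.alloc(4);
      number.writeInt32LE(value, 0); // little-endian int32
      parts.push(Buffer.from([0x10]), name, number);
    } else {
      throw new TypeError(`unsupported value for field "${key}"`);
    }
  }
  const body = Buffer.concat(parts);
  const header = Buffer.alloc(4);
  // Total document size: 4-byte header + elements + 1-byte terminator.
  header.writeInt32LE(4 + body.length + 1, 0);
  return Buffer.concat([header, body, Buffer.from([0x00])]);
}

const bytes = encodeBson({ hello: "world" });
console.log(bytes.length, bytes.toString("hex"));
```

The application still hands over an ordinary JSON-like object; only the stored form is binary, with explicit lengths and type bytes.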
21. 21
• Run a query without planning for it in advance
• Like SQL queries in an RDBMS
• Why is this listed as a feature?
• Not all NoSQL databases support dynamic queries
(for example CouchDB, MongoDB's biggest “competitor”)
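The idea of a dynamic query, one built at run time with no view or index prepared for it beforehand, can be sketched with a small matcher over in-memory documents. The functions `matches` and `find` below are hypothetical stand-ins written for this example; the `$gt` and `$lt` operator names are modeled on MongoDB's comparison operators, but the code is not MongoDB's query engine.

```javascript
const docs = [
  { name: "jeff", eyes: "blue", height: 72 },
  { name: "will", eyes: "blue", birthplace: "NY" },
  { name: "matt", height: 70 },
];

// Does one document satisfy an ad hoc query object?
// Supports plain equality plus $gt / $lt comparisons.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      return Object.entries(condition).every(([op, operand]) => {
        if (op === "$gt") return value > operand;
        if (op === "$lt") return value < operand;
        throw new Error(`unsupported operator ${op}`);
      });
    }
    return value === condition;
  });
}

// Evaluate any query object against the collection at call time.
function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

console.log(find(docs, { eyes: "blue" }).map((d) => d.name));
console.log(find(docs, { height: { $gt: 71 } }).map((d) => d.name));
```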
22. 22
• Fixed-size collections are called capped collections
• Use the db.createCollection command and mark the collection as
capped
// limit our capped collection to 2 megabytes (2097152 bytes)
db.createCollection('logs', {capped: true, size: 2097152})
• When it reaches the 2 MB limit, the oldest documents are
automatically removed
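The eviction behavior can be imitated in a few lines of plain JavaScript. This is only a sketch of the idea: `CappedCollection` is a hypothetical class, and it approximates a document's size by the length of its JSON text, whereas MongoDB counts stored BSON bytes.

```javascript
class CappedCollection {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.docs = []; // kept in insertion order, oldest first
  }

  insert(doc) {
    // Approximate the stored size by the length of the JSON text.
    const size = JSON.stringify(doc).length;
    if (size > this.maxBytes) {
      throw new RangeError("document larger than the collection");
    }
    this.docs.push({ doc, size });
    this.usedBytes += size;
    // Drop the oldest documents until the collection fits again.
    while (this.usedBytes > this.maxBytes) {
      this.usedBytes -= this.docs.shift().size;
    }
  }

  find() {
    return this.docs.map((entry) => entry.doc);
  }
}

// A tiny 40-"byte" cap so that eviction is easy to observe.
const logs = new CappedCollection(40);
for (let i = 1; i <= 5; i++) logs.insert({ msg: `event ${i}` });
console.log(logs.find());
```

Each log entry here serializes to 17 characters, so only the two most recent entries fit under the cap.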
23. 23
• Scale up, vertical scaling
– Increasing server capacity (more CPU, RAM)
– Hard to manage, possible downtime
• Scale out, horizontal scaling
– Adding servers
– No single point of failure takes down the whole system (servers fail often, hence
replication)
24. 24
• MongoDB provides horizontal scale-out for
databases using a technique called sharding
• MongoDB distributes data across multiple physical
partitions called shards (mongod instances).
25. 25
• Manual sharding:
– the application maintains connections to several completely
independent database servers
– it stores different data on different servers
– it queries the appropriate server to get data back
• Autosharding:
– eliminates some of the administrative headaches of
manual sharding
– handles splitting up data and rebalancing automatically
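Manual sharding, as described above, puts the routing logic in the application itself. The sketch below assumes three independent "servers", each modeled as a JavaScript `Map`, and a simple hash of the key to pick one; `hashKey`, `serverFor`, `put`, and `get` are hypothetical names, and the hash is illustrative only.

```javascript
// Three completely independent "database servers".
const servers = [new Map(), new Map(), new Map()];

// Simple deterministic string hash (illustrative, not for production).
function hashKey(key) {
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  return hash;
}

// The application decides which server owns a key.
function serverFor(key) {
  return servers[hashKey(key) % servers.length];
}

function put(key, value) {
  serverFor(key).set(key, value);
}

function get(key) {
  return serverFor(key).get(key);
}

put("jeff", { eyes: "blue" });
put("ben", { hat: "yes" });
put("will", { birthplace: "NY" });
console.log(get("ben"));
console.log(servers.map((s) => s.size));
```

Adding a fourth server changes `servers.length` and therefore the owner of most keys, which is the kind of rebalancing chore that autosharding is meant to take over.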
26. 26
• Collections are split into smaller chunks
• Chunks are distributed across shards
• Each shard is responsible for a subset of the total data
set
• Applications should not need to know which shard holds which
data
• So a routing process called mongos runs in front of the
shards
• It knows where all of the data is located, so applications
can connect to it
• To the application, it looks like a connection to a normal mongod
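The role of the router can be sketched with a chunk table that maps shard-key ranges to shards. Everything here is a simplification under stated assumptions: the `chunks` table, the shard names, and the `router` object are hypothetical, the shard key is the `name` field, and `"\uffff"` merely stands in for an upper bound on string keys.

```javascript
// Hypothetical chunk table: each chunk covers shard-key values in [min, max).
const chunks = [
  { min: "", max: "h", shard: "shardA" },
  { min: "h", max: "p", shard: "shardB" },
  { min: "p", max: "\uffff", shard: "shardA" },
];

// Each shard holds only its own subset of the documents.
const shards = { shardA: [], shardB: [] };

// Look up which shard owns a given shard-key value.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new RangeError(`no chunk covers key ${key}`);
  return chunk.shard;
}

// The router is the only thing the application talks to.
const router = {
  insert(doc) {
    shards[shardFor(doc.name)].push(doc);
  },
  findByName(name) {
    return shards[shardFor(name)].find((doc) => doc.name === name);
  },
};

router.insert({ name: "ben", hat: "yes" });
router.insert({ name: "jeff", eyes: "blue" });
router.insert({ name: "will", birthplace: "NY" });
console.log(router.findByName("jeff"));
```

The application calls only `router.insert` and `router.findByName`; which shard answers is hidden behind the chunk table.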
28. 28
• Replication via replica set
• Replica sets distribute data across machines for
– redundancy
– automated failover
29. 29
• A replica set consists of
– exactly one primary node
– one or more secondary nodes
• The primary node can accept both reads and writes
• The secondary nodes are read-only
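The division of roles can be sketched as follows. This is a toy model, not how MongoDB implements replication: `Member` and `ReplicaSet` are hypothetical classes, writes are copied to the secondaries synchronously, and failover just promotes the first live secondary, whereas a real replica set replicates asynchronously and holds an election.

```javascript
class Member {
  constructor(name) {
    this.name = name;
    this.data = [];
    this.alive = true;
  }
}

class ReplicaSet {
  constructor(names) {
    this.members = names.map((name) => new Member(name));
    this.primary = this.members[0]; // exactly one primary
  }

  get secondaries() {
    return this.members.filter((m) => m !== this.primary && m.alive);
  }

  // Only the primary accepts writes; they are then copied to secondaries.
  write(member, doc) {
    if (member !== this.primary) throw new Error(`${member.name} is read-only`);
    member.data.push(doc);
    for (const secondary of this.secondaries) secondary.data.push(doc);
  }

  // Any member can serve reads in this simplified model.
  read(member) {
    return [...member.data];
  }

  // Simplified failover: promote the first live secondary.
  failPrimary() {
    this.primary.alive = false;
    const next = this.secondaries[0];
    if (!next) throw new Error("no secondary to promote");
    this.primary = next;
  }
}

const rs = new ReplicaSet(["a", "b", "c"]);
rs.write(rs.primary, { x: 1 });
rs.failPrimary();
rs.write(rs.primary, { x: 2 });
console.log(rs.primary.name, rs.read(rs.primary));
```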