The emerging world of mongo db csp

Agenda
Reflection
What NoSQL means
We're thinking about changes
What MongoDB is like
Let's get started with MongoDB
What's going wrong?
Document structure
Basic operations: CRUD
Index, explain, hint
Data model: be quiet - the answer
Sharding - scaling

{
"Name" : "Carlos Sánchez Pérez",
"Love" : "Web Dev",
"Title" : "Rough Boy",
"Twitter" : "@carlossanchezp",
"Blog" : "carlossanchezperez.wordpress.com",
"Job" : "ASPgems",
"Github" : "carlossanchezp"
}

REFLECTION
Through all this time one thing has stayed constant—relational
databases store the data and our decision is almost always implied

Because the true spirit of “NoSQL” does not consist in the
way data is queried. It consists in the way data is stored.
NoSQL is all about data storage.
“NoSQL” should be called “SQL with alternative storage
models”

We are thinking about changes...
Wait, let me show you

● How will we add new machines?
● Are there any single points of failure?
● Do the writes scale as well?
● How much administration will the system require?
● If it's open source, is there a healthy community?
● How much time and effort would we have to expend to deploy and integrate it?
● Does it use technology which we know we can work with?

MongoDB is a powerful, flexible, and scalable
general-purpose database

Ease of Use
MongoDB is a document-oriented database, not a relational one.
One of the reasons for moving away from the relational model is to make
scaling out easier.

Ease of Use
A document-oriented database replaces the concept of a “row” with a more
flexible model: the “document”.
By allowing embedded documents and arrays, the document-oriented
approach makes it possible to represent complex hierarchical
relationships with a single record.

Ease of Use
Without a fixed schema, adding or removing fields as needed
becomes easier.
This makes development faster as developers can quickly iterate.
It is also easier to experiment.
Developers can try a lot of models for the data and then choose the
best one.

Easy Scaling
Data set sizes for applications are growing at an incredible pace.
As the amount of data that developers need to store grows, developers
face a difficult decision:
how should they scale their databases?
Scaling a database comes down to the choice between:
scaling up : getting a bigger machine.
 scaling out : partitioning data across more machines.


Let's see some of the basic concepts of MongoDB:
• A document is the basic unit of data for MongoDB and is
equivalent to a row in a relational database.
• A collection can be thought of as a table with a dynamic schema.
• A single instance of MongoDB can host multiple independent
databases, each of which can have its own collections.
• Every document has a special key, "_id", that is unique within its
collection.
• MongoDB comes with a JavaScript shell, which is useful for
administration and data manipulation.

Schemaless dynamics
MongoDB is "schemaless", but that doesn't mean you don't
need to think about designing your schema!
A "dynamic schema" means there is no ALTER TABLE and no
schema migration.

At the core of MongoDB is the document: an ordered set of keys
with associated values.
In JavaScript, for example, documents are represented as objects:
{"name" : "Hello, MongoDB world!"}
{"name" : "Hello, MongoDB world!", "foo" : 3}
{"name" : "Hello, MongoDB world!",
"foo" : 3,
"fruit": ["pear","apple"]}

The keys in a document are strings. Any UTF-8 character is allowed in
a key, with a few notable exceptions:
• Keys must not contain the character \0 (the null character). This
character is used to signify the end of a key.
• The . and $ characters have some special properties and should be
used only in certain circumstances; in general, they should be
considered reserved.
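
The key rules above can be sketched as a small check. This is only an illustration: isSafeKey is a hypothetical helper, not part of MongoDB or its drivers, and it is deliberately stricter than the server, since it rejects every key containing "." or starting with "$".

```javascript
// Hypothetical helper (not a MongoDB API): checks a key against the rules above.
function isSafeKey(key) {
  if (typeof key !== "string") return false; // keys are strings
  if (key.includes("\0")) return false;      // the null character marks the end of a key
  if (key.startsWith("$")) return false;     // "$" prefixes are used by operators such as $set
  if (key.includes(".")) return false;       // "." is used to reach into nested fields
  return true;
}

console.log(isSafeKey("name"));        // true
console.log(isSafeKey("$set"));        // false
console.log(isSafeKey("address.zip")); // false
```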

SQL Terms/Concepts      MongoDB Terms/Concepts
database                database
table                   collection
row                     document or BSON document
column                  field
index                   index
foreign key / joins     embedded documents and linking
primary key             primary key, automatically set to the _id field

MongoDB is type-sensitive and case-sensitive.
For example, these documents are distinct:
{"foo" : "3"}
{"foo" : 3}
{"Foo" : 3}
A document cannot contain duplicate keys. For example, this is not a legal document:
{"name" : "Hello, world!", "name" : "Hello, MongoDB!"}
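
Plain JavaScript shows the same three points. This is a small sketch that needs no server; it only illustrates how JavaScript itself treats types, key case, and repeated keys, which is what the shell works with.

```javascript
// JSON.parse keeps only the last value when a key is repeated,
// so the first "name" is silently lost.
const doc = JSON.parse('{"name": "Hello, world!", "name": "Hello, MongoDB!"}');
console.log(doc); // { name: 'Hello, MongoDB!' }

// A string and a number are different values...
console.log("3" === 3); // false

// ...and keys that differ only in case are different keys.
const keys = Object.keys({ foo: 3, Foo: 3 });
console.log(keys.length); // 2
```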

Data
Database: use db_name
Collection: db.blog.<MongoDB method>
Document: {…..}
Subdocument: {...}
Field: name: type
Array: [….]
Array of subdocuments inside a document: {…..[{...}]......}

DOCUMENT
{ _id: ObjectId('4bd9e8e17cefd644108961bb'),   // default ID, or your own (for example _id: 1)
title: 'My own Adventures in Databases',
url: 'http://example.com/exampledatabases.txt',
author: 'csp',
vote_count: 5,
tags: ['databases', 'mongodb', 'indexing'],   // array
image: {   // subdocument
url: 'http://example.com/db.jpg',
caption: '',
type: 'jpg',
size: 75381,
data: "Binary"
},
comments: [   // array of subdocuments
{ user: 'abc', text: 'Nice article!' },
{ user: 'jkl', text: 'Another related article is at http://example.com/db/mydb.txt' }
]
}

Collections
A collection is like a table in SQL.
A document is like a row in SQL:
{"title" : "First", "edge" : 34}

A collection is a group of documents. If a document is the
MongoDB analog of a row in a relational database, then a
collection can be thought of as the analog to a table.

Dynamic Schemas
Collections have dynamic schemas. This means that the
documents within a single collection can have any number of
different “shapes.”
For example, both of these documents could be stored in a single
collection:
{"greeting" : "Hello, mongoDB world!"}
{"foo" : 23}

Subcollections
One convention for organizing collections is to use namespaced
subcollections separated by the “.” character.
For example, a blog application might have a collection
named blog.posts and a separate collection named
blog.authors.

Basic Operations
CREATE: The insert function adds a document to a collection.
> post = {"title" : "My first post in my blog",
... "content" : "Here's my blog post.",
... "date" : new Date()}
> db.blog.insert(post)
> db.blog.find()
{
"_id" : ObjectId("5037ee4a1084eb3ffeef7228"),
"title" : "My first post in my blog",
"content" : "Here's my blog post.",
"date" : ISODate("2013-10-05T16:13:42.181Z")
}

this.insertEntry = function (title, body, tags, author, callback) {
    "use strict";
    console.log("inserting blog entry " + title + " " + body);
    // fix up the permalink to not include whitespace
    var permalink = title.replace( /\s/g, '_' );
    // then drop every character that is not a letter, digit or underscore
    permalink = permalink.replace( /\W/g, '' );
    // Build a new post
    var post = {"title": title,
                "author": author,
                "body": body,
                "permalink": permalink,
                "tags": tags,
                "comments": [],
                "date": new Date()};
    // now insert the post
    posts.insert(post, function (err, post) {
        "use strict";
        if (!err) {
            console.log("Inserted new post");
            console.dir("Successfully inserted: " + JSON.stringify(post));
            return callback(null, post);
        }
        return callback(err, null);
    });
}
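
The permalink clean-up can be tried on its own. This is a minimal sketch, assuming the two regular expressions in insertEntry are meant to be \s (whitespace) and \W (non-word characters); makePermalink is a hypothetical name used only here.

```javascript
// Hypothetical standalone version of the permalink clean-up.
function makePermalink(title) {
  return title
    .replace(/\s/g, "_")  // every whitespace character becomes "_"
    .replace(/\W/g, "");  // drop anything that is not a letter, digit or "_"
}

console.log(makePermalink("My first post, in my blog!")); // My_first_post_in_my_blog
```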

Basic Operations
FIND: find and findOne can be used to query a collection.
> db.blog.findOne()
{
"date" : ISODate("2012-08-24T21:12:09.982Z")
}

this.getPostsByTag = function(tag, num, callback) {
    "use strict";
    posts.find({ tags : tag }).sort('date', -1).limit(num).toArray(function(err, items) {
        "use strict";
        if (err) return callback(err, null);
        console.log("Found " + items.length + " posts");
        callback(err, items);
    });
}
this.getPostByPermalink = function(permalink, callback) {
    "use strict";
    posts.findOne({'permalink': permalink}, function(err, post) {
        "use strict";
        if (err) return callback(err, null);
        callback(err, post);
    });
}

Basic Operations
UPDATE: If we would like to modify our post, we can use update. update takes (at
least) two parameters: the first is the criteria to find which document to update,
and the second is the new document.
> post.comments = []
> db.blog.update({title : "My first post in my blog"}, post)
> db.blog.find()
{
"date" : ISODate("2013-10-05T16:13:42.181Z"),
"comments" : [ ]
}

this.addComment = function(permalink, name, email, body, callback) {
    "use strict";
    var comment = {'author': name, 'body': body};
    if (email != "") {
        comment['email'] = email;
    }
    posts.update({'permalink': permalink}, { $push: { "comments": comment } }, {safe: true},
        function (err, comment) {
            "use strict";
            if (!err) {
                console.log("Inserted new comment");
                console.log(comment);
                return callback(null, comment);
            }
            return callback(err, null);
        });
}
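
The two update styles seen so far behave differently: passing a whole document replaces the stored one, while $push only appends to an array field. A minimal sketch on plain objects follows; replaceDocument and applyPush are hypothetical helpers that imitate that behaviour, not driver calls.

```javascript
// Hypothetical helpers on plain objects; they only imitate the two update styles.

// Full replacement: the new document takes over, only the _id is kept.
function replaceDocument(oldDoc, newDoc) {
  return { ...newDoc, _id: oldDoc._id };
}

// $push-style update: append a value to an array field, creating it if missing.
function applyPush(doc, pushSpec) {
  const updated = { ...doc };
  for (const [field, value] of Object.entries(pushSpec)) {
    updated[field] = [...(doc[field] || []), value];
  }
  return updated;
}

const post = { _id: 1, title: "My first post in my blog", comments: [] };
const pushed = applyPush(post, { comments: { author: "abc", body: "Nice article!" } });
const replaced = replaceDocument(post, { title: "New title" });

console.log(pushed.comments.length); // 1
console.log(replaced);               // { title: 'New title', _id: 1 }
```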

Basic Operations
DELETE: remove permanently deletes documents from the database. Called with
no parameters, it removes all documents from a collection. It can also take a
document specifying criteria for removal.

> db.blog.remove({title : "My first post in my blog"})
> db.blog.find()
{
"title" : "My second post in my blog",
"content" : "Here's my second blog post.",
"date" : ISODate("2013-10-05T16:13:42.181Z"),
"comments" : [ ]
}

Comparison
Name   Description
$gt    Matches values that are greater than the value specified in the query.
$gte   Matches values that are equal to or greater than the value specified in the query.
$in    Matches any of the values that exist in an array specified in the query.
$lt    Matches values that are less than the value specified in the query.
$lte   Matches values that are less than or equal to the value specified in the query.
$ne    Matches all values that are not equal to the value specified in the query.
$nin   Matches values that do not exist in an array specified in the query.

Logical
Name   Description
$or    Joins query clauses with a logical OR; returns all documents that match the
       conditions of either clause.
$and   Joins query clauses with a logical AND; returns all documents that match the
       conditions of both clauses.
$not   Inverts the effect of a query expression and returns documents that do not
       match the query expression.
$nor   Joins query clauses with a logical NOR; returns all documents that fail to
       match both clauses.

Examples
db.scores.find( { score : { $gt : 50, $lt : 60 } } );
db.scores.find( { $or : [ { score : { $lt : 50 } }, { score : { $gt : 90 } } ] } );
db.users.find( { name : { $regex : "q" }, email : { $exists : true } } );
db.users.find( { friends : { $all : [ "Joe" , "Bob" ] }, favorites : { $in : [ "running" , "pickles" ] } } );
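
To get a feel for how these operators combine, here is a minimal in-memory sketch. It is not how MongoDB evaluates queries: matches and comparators are hypothetical names, only the comparison operators and $or, $and, $nor are covered, and array fields and missing fields are ignored.

```javascript
// Hypothetical in-memory matcher for plain objects; not MongoDB's implementation.
const comparators = {
  $gt: (v, x) => v > x,
  $gte: (v, x) => v >= x,
  $lt: (v, x) => v < x,
  $lte: (v, x) => v <= x,
  $ne: (v, x) => v !== x,
  $in: (v, xs) => xs.includes(v),
  $nin: (v, xs) => !xs.includes(v),
};

function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$or") return cond.some((q) => matches(doc, q));
    if (key === "$and") return cond.every((q) => matches(doc, q));
    if (key === "$nor") return !cond.some((q) => matches(doc, q));
    const value = doc[key];
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      // Several operators on one field must all hold (implicit AND).
      return Object.entries(cond).every(([op, x]) => comparators[op](value, x));
    }
    return value === cond; // plain equality
  });
}

const scores = [{ score: 45 }, { score: 55 }, { score: 95 }];
const inRange = scores.filter((d) => matches(d, { score: { $gt: 50, $lt: 60 } }));
const extremes = scores.filter((d) =>
  matches(d, { $or: [{ score: { $lt: 50 } }, { score: { $gt: 90 } }] })
);

console.log(inRange);  // [ { score: 55 } ]
console.log(extremes); // [ { score: 45 }, { score: 95 } ]
```

Note that both bounds of a range go inside one object for the field. Writing the field name twice in the same query object would keep only the last condition.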

> for (i=0; i<1000000; i++) {
...
... db.users.insert(
... {
... "i" : i,
... "username" : "user"+i,
... "age" : Math.floor(Math.random()*120),
... "created" : new Date()
... }
... );
... }
> db.users.count()
1000000
> db.users.find()
{ "_id" : ObjectId("526403c77c1042777e4dd7f1"), "i" : 0, "username" : "user0", "age" : 80, "created" : ISODate("2013-10-20T16:24:39.780Z") }
...
{ "_id" : ObjectId("526403c77c1042777e4dd7f3"), "i" : 2, "username" : "user2", "age" : 5, "created" : ISODate("2013-10-20T16:24:39.826Z") }
...

> db.users.find({username: "user999999"}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1000000,
"nscanned" : 1000000,
"nscannedObjectsAllPlans" : 1000000,
"nscannedAllPlans" : 1000000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 392,
"indexBounds" : {
},
"server" : "desarrollo:27017"
}

What other values can mean: a query could return 5 documents (n), having scanned
9 documents from the index (nscanned) and then read 5 full documents from the
collection (nscannedObjects).

The results of explain() describe the details of how MongoDB executes the query.
Some of the relevant fields are:
cursor: A result of BasicCursor indicates a non-indexed query. If we had used an
indexed query, the cursor would have a type of BtreeCursor.
nscanned and nscannedObjects: The difference between these two similar fields is
subtle but important. The total number of documents scanned by the query is
represented by nscannedObjects. The number of documents and index entries scanned is
represented by nscanned. Depending on the query, it's possible for nscanned to be
greater than nscannedObjects.
n: The number of matching documents.
millis: Query execution duration in milliseconds.
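
Why the scanned counts matter can be seen with plain arrays. This is a sketch under stated assumptions: scanFind and indexFind are hypothetical functions, and a Map stands in for the index (a real MongoDB index is a B-tree, not a hash map), so only the counts are comparable.

```javascript
// 1000 sample users held in memory (a stand-in for the users collection).
const users = [];
for (let i = 0; i < 1000; i++) {
  users.push({ i, username: "user" + i });
}

// Without an index: every document is examined.
function scanFind(username) {
  let nscanned = 0;
  const results = [];
  for (const doc of users) {
    nscanned++;
    if (doc.username === username) results.push(doc);
  }
  return { n: results.length, nscanned };
}

// With an "index": a Map from username to document, built once up front.
const usernameIndex = new Map(users.map((doc) => [doc.username, doc]));
function indexFind(username) {
  const doc = usernameIndex.get(username);
  return { n: doc ? 1 : 0, nscanned: doc ? 1 : 0 };
}

console.log(scanFind("user999"));  // { n: 1, nscanned: 1000 }
console.log(indexFind("user999")); // { n: 1, nscanned: 1 }
```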

> db.users.ensureIndex({"username" : 1})
> db.users.find({username: "user101"}).limit(1).explain()
{
"cursor" : "BtreeCursor username_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 40,
"indexBounds" : {
"username" : [
[
"user101",
"user101"
]
]
},
"server" : "desarrollo:27017"
}

> db.users.ensureIndex({"age" : 1, "username" : 1})
> db.users.find({"age" : {"$gte" : 41, "$lte" : 60}}).
... sort({"username" : 1}).
... limit(1000).
... hint({"age" : 1, "username" : 1})

> db.users.find({"age" : {"$gte" : 41, "$lte" : 60}}).
... sort({"username" : 1}).
... limit(1000).
... hint({"username" : 1, "age" : 1})

Where are the answers?
1) Embedded or Link
2) 1:1
3) 1:many
4) 1:few
5) many:many
6) few:few

So gimme just a minute and I'll tell you why

Because any document can be put into any collection,
I wonder:
“Why do we need separate collections at all?”
With no need for separate schemas for different kinds of
documents,
why should we use more than one collection?

POST
{ _id: 1,
title: ____,
body: ___,
author: ___,
date: ___ }

COMMENTS
{ _id: 1,
post_id: ____,
author: ___,
author_email: _,
order: ___ }

TAGS
{ _id: ___,
tag: ____,
post_id: 1 }

Keep in mind:
1) Embedded: a document is limited to 16 MB
2) Living without constraints: in MongoDB, integrity depends on you
3) No JOINs

Model One-to-One Relationships with Embedded Documents
With references, the data lives in 2 collections:
{
_id: "csp",
name: "Carlos Sánchez Pérez"
}
{
patron_id: "csp",
street: "123 Aravaca",
city: "Madrid",
Number: "25 3º A",
zip: 12345
}

Things to consider:
1) Frequency of access: think about memory, because all of the information is loaded
2) Growth in the size of the items: how data is written, separate or embedded
3) Documents larger than 16 MB
4) Atomicity of data
If the address data is frequently retrieved with the name information, then with
referencing, your application needs to issue multiple queries to resolve the reference.

The better data model would be to embed the address data in the patron data:
{
_id: "csp",
name: "Carlos Sánchez Pérez",
address: {
city: "Madrid",
Number: "25 3º A",
zip: 12345
}
}

More documents with the same shape:
{
_id: "csp2",
name: "Carlos SP3",
address: {
city: "Madrid",
Number: "75 3º A",
zip: 12345
}
}
{
_id: "csp1",
name: "Carlos1 Sánchez1",
address: {
city: "Madrid",
Number: "95 3º A",
zip: 12345
}
}
{
_id: "csp",
name: "Carlos SN",
address: {
city: "Madrid",
Number: "45 3º A",
zip: 12345
}
}
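
The cost of referencing compared with embedding can be counted in a small in-memory sketch. Everything here is hypothetical: makeCollection, findById and readCount are illustration names, not MongoDB APIs, and the address is keyed by the patron's id for simplicity.

```javascript
// Hypothetical in-memory "collection" that counts reads; not a MongoDB API.
function makeCollection(docs) {
  const byId = new Map(docs.map((doc) => [doc._id, doc]));
  let reads = 0;
  return {
    findById(id) { reads++; return byId.get(id); },
    readCount() { return reads; },
  };
}

// Referenced model: patron and address live in separate collections.
const patrons = makeCollection([{ _id: "csp", name: "Carlos Sánchez Pérez" }]);
const addresses = makeCollection([{ _id: "csp", city: "Madrid", zip: 12345 }]);

function loadReferenced(id) {
  const patron = patrons.findById(id);
  const address = addresses.findById(id); // second read to resolve the reference
  return { ...patron, address };
}

// Embedded model: the address travels inside the patron document.
const embeddedPatrons = makeCollection([
  { _id: "csp", name: "Carlos Sánchez Pérez", address: { city: "Madrid", zip: 12345 } },
]);

const referenced = loadReferenced("csp");
const embedded = embeddedPatrons.findById("csp");

console.log(patrons.readCount() + addresses.readCount()); // 2
console.log(embeddedPatrons.readCount());                 // 1
```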

Model One-to-Many Relationships with Embedded
Documents
{
_id: "csp",
}
{
patron_id: "csp",
city: "Madrid",
Number: "25 3º A",
zip: 12345
}
{
patron_id: "csp",
city: "Madrid",
Number: "55 1º B",
zip: 12345
}

3 Collections

Things to consider:
1) Frequency of access
2) Growing size
3) Documents larger than 16 MB
4) Atomicity of data

Model One-to-Many Relationships with Embedded
Documents
{
_id: "csp",
addresses: [
{
city: "Madrid",
Number: "25 3º A",
zip: 12345
},
{
city: "Madrid",
Number: "25 3º A",
zip: 12345
}
]
}

Model One-to-Many Relationships with Document
References
People
{
_id: "csp",
City: "MD",
….................
}
City
{
_id: "MD",
Name: …..
….................
}

If you are thinking of embedding: it is not a good solution in this case.
Why? Redundant information.

Model One-to-Many Relationships with Document
References
BLOG
Embedding is a good fit here: one-to-few.

post {
title: "My first title",
author: "Carlos Sánchez Pérez",
date: "19/08/2013",
comments: [ {name: "Antonio López", comment: "my comment"}, { .... } ],
tags: ["tag1", "tag2", "tag3"]
}
author { _id: "Carlos Sánchez Pérez", password: "", …….. }

Model Many-to-Many Relationships
Books and Authors
Books
{
_id: 12,
title: "MongoDB: My Definitive easy Guide",
author: [32]
…........................
}
Authors
{
_id: 32,
author: "Peter Garden",
books: [12,34,5,6,78,65,99]
…........................
}
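
Because each side stores the ids of the other, the application resolves the links itself. A minimal in-memory sketch follows, using only the fields shown above; authorNamesOf and bookTitlesBy are hypothetical helpers, and plain arrays stand in for the two collections.

```javascript
// Plain arrays stand in for the Books and Authors collections.
const books = [{ _id: 12, title: "MongoDB: My Definitive easy Guide", author: [32] }];
const authors = [{ _id: 32, author: "Peter Garden", books: [12, 34, 5, 6, 78, 65, 99] }];

// Follow the links from a book to its authors.
function authorNamesOf(bookId) {
  const book = books.find((b) => b._id === bookId);
  if (!book) return [];
  return authors.filter((a) => book.author.includes(a._id)).map((a) => a.author);
}

// Follow the links from an author to the books present in the sample data.
function bookTitlesBy(authorId) {
  const author = authors.find((a) => a._id === authorId);
  if (!author) return [];
  return books.filter((b) => author.books.includes(b._id)).map((b) => b.title);
}

console.log(authorNamesOf(12)); // [ 'Peter Garden' ]
console.log(bookTitlesBy(32));  // [ 'MongoDB: My Definitive easy Guide' ]
```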

MODEL: few-to-few
Embedding books: not a good solution.
Benefits of Embedding
Performance: reads are faster.
Be careful if the document changes a lot: writes become slow.

Sharding
Vertical scaling adds more CPU and storage resources to increase
capacity. Scaling by adding capacity has limitations: high performance
systems with large numbers of CPUs and large amount of RAM are
disproportionately more expensive than smaller systems. Additionally,
cloud-based providers may only allow users to provision smaller
instances. As a result there is a practical maximum capability for
vertical scaling.
Sharding, or horizontal scaling, by contrast, divides the data set and
distributes the data over multiple servers, or shards. Each shard is an
independent database, and collectively, the shards make up a single
logical database.

[Diagram: APP -> mongos (router) -> shards S1 (r1, r2, r3), s2, s3, s4, s5; config server]

COLLECTIONS
user0 ....................................................... user99999

First: when a collection is sharded, it can be thought of as a single chunk
from the smallest value of the shard key to the largest.

$minKey  -> user100
user100  -> user300
user300  -> user600
user600  -> user900
user900  -> user1200
user1200 -> user1500
user1500 -> $maxKey

Sharding splits the collection into many chunks based on shard key ranges
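
Finding the chunk for a given shard key value is a range lookup. The sketch below makes two assumptions: the shard key is a plain user number rather than a string such as "user300" (strings compare alphabetically, so "user1200" would sort before "user300"), and -Infinity and Infinity stand in for $minKey and $maxKey. chunkFor is a hypothetical function, not something mongos exposes.

```javascript
// Chunk boundaries taken from the ranges above, as numbers (assumption).
const boundaries = [100, 300, 600, 900, 1200, 1500];

// Returns the chunk whose range contains the key.
// The lower bound is inclusive, the upper bound is exclusive.
function chunkFor(key) {
  let i = 0;
  while (i < boundaries.length && key >= boundaries[i]) i++;
  const min = i === 0 ? -Infinity : boundaries[i - 1];
  const max = i === boundaries.length ? Infinity : boundaries[i];
  return { min, max };
}

console.log(chunkFor(42));   // { min: -Infinity, max: 100 }
console.log(chunkFor(300));  // { min: 300, max: 600 }
console.log(chunkFor(2000)); // { min: 1500, max: Infinity }
```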

Any questions?
….. I'll shoot it to you straight and look you in the eye.
So gimme just a minute and I'll tell you why
I'm a rough boy.

That's all folks!!
and
Thanks a lot for your
attention
I'm coming soon...
