This is the third webinar in the Back to Basics series, which introduces the MongoDB database. This webinar explains the architecture of document databases.
5. Course Agenda
Date           Time         Webinar
25 May 2016    16:00 CEST   Introduction to NoSQL
7 June 2016    16:00 CEST   Your First MongoDB Application
21 June 2016   16:00 CEST   Document-Oriented Schema Design
7 July 2016    16:00 CEST   Advanced Indexing, Text and Geospatial Indexes
19 July 2016   16:00 CEST   Introduction to the Aggregation Framework
28 July 2016   16:00 CEST   Deploying to Production
6. Summary of Webinars 1 and 2
• Why does NoSQL exist?
• Types of NoSQL databases
• Key features of MongoDB
• Installation and creation of databases and collections
• CRUD operations
• Indexes and explain()
7. Thinking in Documents
• MongoDB documents are JavaScript objects (JSON)
• They are stored encoded as BSON
• BSON is "Binary JSON"
• BSON is an efficient way to encode and decode JSON
• Required for efficient transmission and storage on disk
• Eliminates the need to "text parse" all the sub-objects
• To learn more: http://bsonspec.org/
8. Documento de Ejemplo
{
  name : "Rubén Terceño",
  title : "Senior Solutions Architect",
  employee_number : 653,
  location : {
    type : "Point",
    coordinates : [ 43.34, -3.26 ]
  },
  expertise : [ "MongoDB", "Java", "Geospatial" ],
  address : {
    address1 : "Rutilo 11",
    address2 : "Piso 1, Oficina 2",
    zipcode : "28041"
  }
}
• Documents are made up of fields
• Field values are typed
• Fields can contain sub-documents
• Fields can contain arrays
9. Some Example Queries
• Find all Solutions Architects
db.mongo.find({title : "Solutions Architect"})
• Find all employees who know Java and work in Support or Consulting
db.mongo.find({expertise : "Java",
               department : {$in : ["Support", "Consulting"]}})
• Find all employees in my postcode (zipcode is stored as a string, so the query value must be a string too)
db.mongo.find({"address.zipcode" : "28041"})
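The three queries rely on three matching rules: a plain value matches any element of an array field, dot notation reaches into sub-documents, and equality is type-sensitive. The sketch below imitates those rules in plain JavaScript so they can be tried without a server. It is a simplified illustration, not MongoDB's matcher: `getPath` and `matches` are hypothetical helpers, and the `department` field is assumed because the example document does not include one.

```javascript
// Simplified illustration of three matching rules; NOT MongoDB's real matcher.
// getPath and matches are hypothetical helpers written for this example.

// Resolve a dotted path such as "address.zipcode" inside a nested document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Match one field against a plain value or an { $in: [...] } condition.
function matches(doc, path, condition) {
  const value = getPath(doc, path);
  // An array field matches if any of its elements matches.
  const candidates = Array.isArray(value) ? value : [value];
  const wanted =
    condition !== null && typeof condition === "object" && Array.isArray(condition.$in)
      ? condition.$in
      : [condition];
  // Strict equality: the string "28041" does not equal the number 28041.
  return candidates.some((v) => wanted.some((w) => v === w));
}

const employee = {
  name: "Rubén Terceño",
  expertise: ["MongoDB", "Java", "Geospatial"],
  department: "Consulting", // assumed field, not in the example document
  address: { zipcode: "28041" },
};

console.log(matches(employee, "expertise", "Java"));                              // true
console.log(matches(employee, "department", { $in: ["Support", "Consulting"] })); // true
console.log(matches(employee, "address.zipcode", "28041"));                       // true
console.log(matches(employee, "address.zipcode", 28041));                         // false
```

The last call shows why a numeric literal does not find a zipcode that was stored as a string.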
10. Modelling and Cardinality
• One to One
• Author to blog post
• One to Many
• Blog post to comments
• One to Millions
• Blog post to site views (e.g. Huffington Post)
11. One To One Relationships
• “Belongs to” relationships are often embedded
• Holistic representation of entities with their embedded attributes and relationships
• Great read performance
Most important:
• Keeps simple things simple
• Frees up time to tackle harder schema issues
12. One To One Relationships
{
  "Title" : "This is a blog post",
  "Author" : {
    name : "Rubén Terceño",
    login : "ruben@mongodb.com"
  },
  …
}
We can index on "Title" and "Author.login".
13. One to Many - Embedding
{
  "_id" : ObjectId("ZZZZ"),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : [{
    name : "Juan Amores",
    email : "jamores@mongodb.com",
    comment : "I love your writing style"
  },
  {
    name : "Pedro Víbora",
    email : "pvibora@mongodb.com",
    comment : "I hate your writing style"
  }]
}
Where we expect a small number of sub-documents, we can embed them in the main document.
14. Key Concerns
• What are the write patterns?
• Comments are added more frequently than posts
• Comments may have images, tags, large bodies of text
• What are the read patterns?
• Comments may not be displayed
• May be shown in their own window
• People rarely look at all the comments
15. One to Many – Linking I
• Keep all comments in a separate comments collection
• Add a reference to the post ID in each comment
• Requires two queries to display a blog post and its associated comments
// comments collection
{
  _id : ObjectId("AAAA"),
  post_id : ObjectId("ZZZZ"),
  name : "Juan Amores",
  email : "jamores@mongodb.com",
  comment : "I love your writing style"
}
{
  _id : ObjectId("AAAB"),
  post_id : ObjectId("ZZZZ"),
  name : "Pedro Víbora",
  email : "pvibora@mongodb.com",
  comment : "I hate your writing style"
}
// posts collection
{
  "_id" : ObjectId("ZZZZ"),
  "Title" : "A Blog Title",
  "Body" : "A blog post"
}
{
  "_id" : ObjectId("YYYY"),
  "Title" : "Another Blog Title",
  "Body" : "Another blog post"
}
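The two-query read pattern can be sketched in plain JavaScript, with in-memory arrays standing in for the two collections. This is only an illustration under assumptions: the string IDs and the `loadPostWithComments` helper are hypothetical. Against a real database the equivalent steps would be a `findOne` on the posts collection followed by a `find` on the comments collection filtered by `post_id`.

```javascript
// In-memory stand-ins for the posts and comments collections (hypothetical data).
const posts = [
  { _id: "ZZZZ", Title: "A Blog Title", Body: "A blog post" },
  { _id: "YYYY", Title: "Another Blog Title", Body: "Another blog post" },
];
const comments = [
  { _id: "AAAA", post_id: "ZZZZ", name: "Juan Amores", comment: "I love your writing style" },
  { _id: "AAAB", post_id: "ZZZZ", name: "Pedro Víbora", comment: "I hate your writing style" },
];

// Hypothetical helper: query 1 fetches the post, query 2 fetches its comments.
function loadPostWithComments(postId) {
  const post = posts.find((p) => p._id === postId);                  // query 1
  if (!post) return null;
  const postComments = comments.filter((c) => c.post_id === postId); // query 2
  return { ...post, comments: postComments };
}

console.log(loadPostWithComments("ZZZZ").comments.length); // 2
console.log(loadPostWithComments("YYYY").comments.length); // 0
```

Because each comment carries its own `post_id`, adding a comment is a single write and the post document never grows.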
16. One to Many – Linking II
• Keep all comments in a separate comments collection
• Add references to comments as an array of comment IDs
• Requires two queries to display a blog post and its associated comments
• Requires two writes to create a comment
{
  _id : ObjectId("AAAA"),
  name : "Joe Drumgoole",
  email : "Joe.Drumgoole@mongodb.com",
  comment : "I love your writing style"
}
{
  _id : ObjectId("AAAB"),
  name : "John Smith",
  email : "Joe.Drumgoole@mongodb.com",
  comment : "I hate your writing style"
}
{
  "_id" : ObjectId("ZZZZ"),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : [ ObjectId("AAAA"),
                 ObjectId("AAAB") ]
}
{
  "_id" : ObjectId("ZZZZ"),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : []
}
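The "two writes" cost of this variant can be seen in a small plain JavaScript sketch. It uses in-memory objects instead of a database, and `createComment` and `commentsForPost` are hypothetical helper names chosen for the illustration.

```javascript
// In-memory stand-ins (hypothetical): one post and the comments collection.
const comments = [];
const post = { _id: "ZZZZ", Title: "A Blog Title", Body: "A blog post", comments: [] };

// Creating a comment takes two writes in this model.
function createComment(comment) {
  comments.push(comment);          // write 1: store the comment document
  post.comments.push(comment._id); // write 2: record its ID on the post
}

// Displaying the post takes two reads: the post, then the comments it lists.
function commentsForPost() {
  return post.comments.map((id) => comments.find((c) => c._id === id));
}

createComment({ _id: "AAAA", name: "Joe Drumgoole", comment: "I love your writing style" });
createComment({ _id: "AAAB", name: "John Smith", comment: "I hate your writing style" });

console.log(post.comments);                           // [ 'AAAA', 'AAAB' ]
console.log(commentsForPost().map((c) => c.name));    // [ 'Joe Drumgoole', 'John Smith' ]
```

If the second write fails after the first succeeds, the comment exists but the post does not list it, which is the integrity cost of keeping the references on the post side.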
17. One To Many – Hybrid Approach
{
  _id : ObjectId("ZZZZ"),
  Title : "A Blog Title",
  Body : "A blog post",
  last_comments : [{
    _id : ObjectId("AAAA"),
    name : "Juan Amores",
    comment : "I love your writing style"
  },
  {
    _id : ObjectId("AAAB"),
    name : "Pedro Víbora",
    comment : "I hate your writing style"
  }]
}
[ {
  "_id" : ObjectId("AAAA"),
  "post_id" : ObjectId("ZZZZ"),
  "name" : "Juan Amores",
  "email" : "jamores@mongodb.com",
  "comment" : "I love your writing style"
},
{...},{...},{...},{...},{...},{...},{...},{...},{...},{...} ]
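A minimal sketch of the hybrid write path, in plain JavaScript with in-memory stand-ins for the two collections. The cap of 10 entries, the `addComment` helper, and the sample data are assumptions made for the illustration; the slide does not state how many recent comments are kept. In MongoDB itself, a capped array of this kind can be maintained with the `$push` operator together with its `$each` and `$slice` modifiers.

```javascript
const MAX_LAST_COMMENTS = 10; // assumed cap; the slide does not give a number

// In-memory stand-ins (hypothetical): one post and the full comments collection.
const post = { _id: "ZZZZ", Title: "A Blog Title", Body: "A blog post", last_comments: [] };
const comments = [];

function addComment(comment) {
  // Write 1: the full comment goes to the comments collection.
  comments.push({ ...comment, post_id: post._id });

  // Write 2: a trimmed copy is appended to the post, keeping only the newest entries.
  const { _id, name, comment: text } = comment;
  post.last_comments.push({ _id, name, comment: text });
  if (post.last_comments.length > MAX_LAST_COMMENTS) {
    post.last_comments = post.last_comments.slice(-MAX_LAST_COMMENTS);
  }
}

for (let i = 1; i <= 12; i++) {
  addComment({
    _id: `C${i}`,
    name: `Reader ${i}`,
    email: `reader${i}@example.com`,
    comment: `Comment ${i}`,
  });
}

console.log(comments.length);           // 12
console.log(post.last_comments.length); // 10
console.log(post.last_comments[0]._id); // C3
```

The post can then be rendered with its most recent comments from a single read, while the complete history stays in the comments collection.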
18. Linking vs. Embedding
• Embedding
• Terrific for read performance
• Webapp “front pages” and pre-aggregated material
• Writes can be slow
• Data integrity needs to be managed
• Linking
• Flexible
• Data integrity is built-in
• Work is done during reads
19. Let’s do crazy things!
• What if we were tracking mouse positions to build heat maps?
• Each user will generate hundreds of data points per visit
• Thousands of data points per post
• Millions of data points per blog site
• Relational-like model
• Store a blog ID per event
• Be polymorphic, my friend!
{
  "post_id" : ObjectId("ZZZZ"),
  "timestamp" : ISODate("2005-01-02T16:35:24Z"),
  "event" : {
    type : "click",
    position : [240, 345]
  }
}
{
  "post_id" : ObjectId("ZZZZ"),
  "timestamp" : ISODate("2005-01-02T16:35:24Z"),
  "event" : {
    type : "close"
  }
}
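Because the `event` sub-document varies by type, code that consumes these events branches on `event.type`. The plain JavaScript sketch below buckets click positions into heat-map cells and skips events that carry no position. The cell size, the second click, the differing timestamps, and the `heatMap` helper are assumptions made for the illustration.

```javascript
// Events share post_id and timestamp; the "event" sub-document varies by type.
const events = [
  { post_id: "ZZZZ", timestamp: new Date("2005-01-02T16:35:24Z"), event: { type: "click", position: [240, 345] } },
  { post_id: "ZZZZ", timestamp: new Date("2005-01-02T16:35:25Z"), event: { type: "click", position: [247, 349] } },
  { post_id: "ZZZZ", timestamp: new Date("2005-01-02T16:35:30Z"), event: { type: "close" } },
];

const CELL_SIZE = 50; // assumed heat-map cell size in pixels

// Hypothetical helper: count clicks per grid cell, keyed as "column,row".
function heatMap(eventList) {
  const cells = {};
  for (const { event } of eventList) {
    if (event.type !== "click") continue; // events without a position are skipped
    const [x, y] = event.position;
    const key = `${Math.floor(x / CELL_SIZE)},${Math.floor(y / CELL_SIZE)}`;
    cells[key] = (cells[key] || 0) + 1;
  }
  return cells;
}

console.log(heatMap(events)); // { '4,6': 2 }
```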
22. Data Governance with Document Validation
Implement data governance without sacrificing the agility that comes from a dynamic schema
• Enforce data quality across multiple teams and applications
• Use familiar MongoDB expressions to control document structure
• Validation is optional and can be as simple as a single field, all the way to every field, including existence, data types, and regular expressions
23. Document Validation Example
The example below adds a rule to the contacts collection that validates:
• The year of birth is no later than 1998
• The document contains a phone number and/or an email address
• When present, the phone number and email address are strings
db.runCommand({
  collMod : "contacts",
  validator : {
    $and : [
      {year_of_birth : {$lte : 1998}},
      {$or : [
        {phone : {$type : "string"}},
        {email : {$type : "string"}}
      ]}
    ]
  }
})
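To see which documents such a rule accepts, the same logic can be written as an ordinary JavaScript predicate. This is a simplified sketch for experimenting without a server, not how MongoDB evaluates validators; `isValidContact` is a hypothetical helper name.

```javascript
// Hypothetical predicate mirroring the validator: $and of a year check and an $or.
function isValidContact(doc) {
  // {year_of_birth : {$lte : 1998}}: the field must exist and be a number <= 1998.
  const bornInTime = typeof doc.year_of_birth === "number" && doc.year_of_birth <= 1998;
  // {$or : [phone is a string, email is a string]}: at least one must hold.
  const hasStringContact = typeof doc.phone === "string" || typeof doc.email === "string";
  return bornInTime && hasStringContact;
}

console.log(isValidContact({ year_of_birth: 1990, email: "someone@example.com" })); // true
console.log(isValidContact({ year_of_birth: 2005, email: "someone@example.com" })); // false
console.log(isValidContact({ year_of_birth: 1990 }));                               // false
console.log(isValidContact({ year_of_birth: 1990, phone: 5551234 }));               // false
```

The last case is rejected because the only contact field present is a number, so neither branch of the `$or` is satisfied.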
24. Summary
• Schema design is different in MongoDB
• But basic data design principles stay the same
• Focus on how an application accesses/manipulates data
• Seek out and capture belongs-to 1:1 relationships
• Don't get stuck in "one record per item" thinking
• Embrace the hierarchy and think about cardinality
• Evolve the schema to meet requirements as they change
• Be polymorphic!
• Updates to a single document are atomic, so they behave like transactions
• Use validation to your advantage
25. Next Webinar
Advanced Indexing, Text and Geospatial Indexes
• 7 July 2016 – 16:00 CEST, 11:00 ART, 9:00
• Register if you haven't already!
• Text indexes enable "Google-style" searches across all the fields of all the records in the dataset.
• Geospatial indexes help us run queries based on position, both simple (proximity, distance, etc.) and advanced (intersection, inclusion, etc.)
• Register at: https://www.mongodb.com/webinars
• Please give us your feedback: back-to-basics@mongodb.com
Delighted to have you here. Hope you can make it to all the sessions. The sessions will be recorded and sent out afterwards, so don't worry if you miss one.
If you have questions please pop them in the sidebar.
Let’s summarize one-to-one relationships.
Read performance is optimized because we only need a single query and a single disk/memory hit. Write performance change is negligible.
MongoDB is especially useful in dealing with 1:1 belongs-to relationships.