New applications, users and inputs demand new types of data, like unstructured, semi-structured and polymorphic data. Adopting MongoDB means adopting a new, document-based data model.
While most developers have internalized the rules of thumb for designing schemas for relational databases, these rules don't apply to MongoDB. Documents can represent rich data structures, providing lots of viable alternatives to the standard, normalized, relational model. In addition, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.
In this session, Buzz Moschetti explores how you can take advantage of MongoDB's document model to build modern applications.
2. Before We Begin
• This webinar is being recorded
• Use The Chat Window for
• Technical assistance
• Q&A
• MongoDB Team will answer quick questions in realtime
• “Common” questions will be reviewed at the end of the webinar
3. Theme #1:
Great Data Design involves
much more than the database
• Easily understood structures
• Harmonized with software
• Acknowledging legacy issues
4. Theme #2:
Today’s solutions need to
accommodate tomorrow’s needs
• End of “Requirements Complete”
• Ability to economically scale
• Shorter solutions lifecycles
17. Substructure Works Well With Code
> addr = db.patrons.findOne({_id: "joe"}, {_id: 0, address: 1})
{
  address: {
    street: "123 Fake St.",
    city: "Faketon",
    state: "MA",
    zip: "12345"
  }
}
// Pass the whole Map to this function:
doSomethingWithOneAddress(addr);
// Somewhere else in the code is the actual function:
doSomethingWithOneAddress(Map addr)
{ // Look for state }
19. Substructure Amplifies Agility
> addr = db.patrons.findOne({_id: "bob"}, {_id: 0, address: 1})
{
  address: {
    street: "139 W45 St.",
    city: "NY",
    state: "NY",
    country: "USA"
  }
}
// Pass the whole Map to this function:
doSomethingWithOneAddress(addr);
doSomethingWithOneAddress(Map addr)
{ // Look for state and optional country }
NO CHANGE to queries
Only the single implementation that looks for country needs to change
NO COMPILE-TIME DEPENDENCIES when passing Maps
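The agility claim above can be sketched in plain JavaScript. This is an illustrative consumer (the function name `doSomethingWithOneAddress` comes from the slides; the internals here are assumptions): it reads `state` and the optional `country` field from a generic map, so old-shape and new-shape documents flow through the same unchanged call.

```javascript
// Hypothetical consumer of an address map; internals are illustrative.
// It looks for state and, if present, the optional country field.
function doSomethingWithOneAddress(addr) {
  var label = addr.state;
  if (addr.country !== undefined) {
    label = label + ", " + addr.country; // new field simply flows through
  }
  return label;
}

// Old-shape document (no country) and new-shape document (with country)
// both pass through the same call, with no query or call-stack change:
var joeAddr = { street: "123 Fake St.", city: "Faketon", state: "MA", zip: "12345" };
var bobAddr = { street: "139 W45 St.", city: "NY", state: "NY", country: "USA" };

var a = doSomethingWithOneAddress(joeAddr); // "MA"
var b = doSomethingWithOneAddress(bobAddr); // "NY, USA"
```

Only this one function changed when `country` appeared; nothing upstream was touched.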
20. The Advantage over Rectangles
resultSet = select street, state, city, country, …
Map addr = processIntoMap(resultSet);
// Pass the whole Map to this function:
doSomethingWithOneAddress(addr);
doSomethingWithOneAddress(Map addr)
{ // Look for state and optional country }
Queries must change to pick up new columns
Compile-time dependency to process new columns to Map
22. One-to-One Relationships
• “Belongs to” relationships are often embedded
• Holistic representation of entities with their embedded attributes and relationships
• Great read performance
Most important:
• Keeps simple things simple
• Frees up time to tackle harder schema design issues
24. A Patron and their Addresses
> db.patrons.find({ _id: "bob" })
{
  _id: "bob",
  name: "Bob Knowitall",
  addresses: [
    {street: "1 Vernon St.", city: "Newton", …},
    {street: "52 Main St.", city: "Boston", …}
  ]
}
25. A Patron and their Addresses
> db.patrons.find({ _id: "bob" })
{
  _id: "bob",
  name: "Bob Knowitall",
  addresses: [
    {street: "1 Vernon St.", city: "Newton", …},
    {street: "52 Main St.", city: "Boston", …}
  ]
}
> db.patrons.find({ _id: "joe" })
{
  _id: "joe",
  name: "Joe Bookreader",
  address: { street: "123 Fake St.", city: "Faketon", …}
}
26. Migration Options
• Migrate all documents when the schema changes.
• Migrate on-demand
– As we pull up a patron’s document, we make the change.
– Any patrons that never come into the library never get updated.
• Leave it alone
– The code layer knows about both address and addresses
27. Letting The Code Deal With Documents
Map d = collection.findOne(new BasicDBObject("_id", "bob"));
// Contract: Return either a List of addresses or a null
// if no addresses exist
// Try to get the new "version 2" shape:
List addrl = (List) d.get("addresses");
// If not there, try to get the old one:
if (addrl == null) {
    Map oneAddr = (Map) d.get("address");
    if (oneAddr != null) {
        addrl = new ArrayList();
        addrl.add(oneAddr);
    }
}
// addrl either exists with 1 or more items or is null
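The same read-side contract can be sketched in plain JavaScript (a minimal sketch; the function name `getAddresses` is mine, not from the slides): return a list of addresses, or null, regardless of which shape the document has.

```javascript
// A minimal sketch of the same contract: return a list of addresses
// (or null) whether the document has the old or the new shape.
function getAddresses(doc) {
  // Try the new "version 2" shape first:
  var addrl = doc.addresses;
  if (addrl === undefined) {
    // Fall back to the old single-address shape:
    var oneAddr = doc.address;
    addrl = (oneAddr !== undefined) ? [oneAddr] : null;
  }
  return addrl; // a non-empty array, or null
}

var newDoc = { _id: "bob", addresses: [{ city: "Newton" }, { city: "Boston" }] };
var oldDoc = { _id: "joe", address: { city: "Faketon" } };
var noneDoc = { _id: "ann" };
```

Callers above this function never know, or care, which shape was stored.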
29. Book
• MongoDB: The Definitive Guide,
• By Kristina Chodorow and Mike Dirolf
• Published: 9/24/2010
• Pages: 216
• Language: English
• Publisher: O’Reilly Media, CA
32. One-To-Many Using Embedding
• Optimized for read performance of Books
• We accept data duplication
• An index on “publisher.name” provides:
– Efficient lookup of all books for given publisher name
– Efficient way to find all publisher names (distinct)
• Does not automatically mean there is no “master” Publisher collection (from which data is copied when creating a new Book)
34. Single Book with Linked Publisher
> book = db.books.findOne({ _id: "123" })
{
  _id: "123",
  publisher_id: "oreilly",
  title: "MongoDB: The Definitive Guide",
  …
}
> db.publishers.find({ _id: book.publisher_id })
{
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: 1980,
  locations: ["CA", "NY"]
}
35. Multiple Books with Linked Publisher
var tmpm = {};
db.books.find({ pages: {$gt: 100} }).forEach(function(book) {
  // Do whatever you need with the book document, but
  // in addition, capture publisher ID uniquely by
  // using it as a key in an object (Map)
  tmpm[book.publisher_id] = true;
});
uniqueIDs = Object.keys(tmpm); // extract ONLY keys
db.publishers.find({"_id": {"$in": uniqueIDs}});
36. Cartesian Product != Desired Structure
resultSet = "select B.name, B.publish_date, P.name, P.founded
from Books B, Publisher P
where P.name = B.publisher_name
and B.pages > 100";
B.Name B.publish_date P.name P.founded
More Jokes 2003 Random House 1843
Perl Tricks 1998 O’Reilly 1980
More Perl 2000 O’Reilly 1980
Starting Perl 1996 O’Reilly 1980
Flying Kites 1980 Random House 1843
Using Perl 2002 O’Reilly 1980
Bad Food 2011 Random House 1843
37. …And Tough To Use Without ORDER BY
resultSet = "select B.name, B.publish_date, P.name, P.founded
from Books B, Publisher P
where P.name = B.publisher_name
and B.pages > 100
order by P.name";
B.Name B.publish_date P.name P.founded
Perl Tricks 1998 O’Reilly 1980
More Perl 2000 O’Reilly 1980
Using Perl 2002 O’Reilly 1980
Starting Perl 1996 O’Reilly 1980
Flying Kites 1980 Random House 1843
Bad Food 2011 Random House 1843
More Jokes 2003 Random House 1843
38. SQL Is About Disassembly
resultSet = "select B.name, B.publish_date, P.name, P.founded
from Books B, Publisher P
where P.name = B.publisher_name
and B.pages > 100
order by P.name";
prevName = null;
while(resultSet.next()) {
  if(!resultSet.getString("P.name").equals(prevName)) {
    // "next" publisher name found. Process material
    // accumulated and reset for next items.
    makeNewObjects(); // etc.
    prevName = resultSet.getString("P.name");
  }
}
39. One-To-Many Using Linking
• Optimized for efficient management of mutable data
• Familiar way to organize basic entities
• Code is used to assemble fetched material into other objects, not disassemble a single ResultSet
– More complicated queries may be easier to code and maintain with assembly vs. disassembly
47. Embedding vs. Linking
• Embedding
– Terrific for read performance
• Webapp “front pages” and pre-aggregated material
• Complex structures
– Great for insert-only / immutable designs
– Inserts might be slower than linking
– Data integrity for not-belongs-to needs to be managed
• Linking
– Flexible
– Data integrity is built-in
– Work is done during reads
• But not necessarily more work than RDBMS
56. Summary
• Physical design is different in MongoDB
– But basic data design principles stay the same
• Focus on how an application accesses/manipulates data
• Seek out and capture belongs-to 1:1 relationships
• Use substructure to better align to code objects
• Be polymorphic!
• Evolve the schema to meet requirements as they change
HELLO!
This is Buzz Moschetti at MongoDB, and welcome to today’s webinar entitled “Thinking in Documents”, part of our Back To Basics series.
If your travel plans today do not include exploring the document model in MongoDB then please exit the aircraft immediately and see an agent at the gate
Otherwise – WELCOME ABOARD for about the next hour.
Let’s talk about some of the terms.
JOINS: RDBMS uses JOIN to stitch together fundamentally simple things into larger, more complex data.
MongoDB uses embedding of data within data, and linking, to produce the same result.
There are three themes that are important to grasp when Thinking In Documents in MongoDB and these will be reinforced throughout the presentation.
First, great schema design…. I’ll repeat it: great…
This is not new or something revolutionary to MongoDB.
It is something we have been doing all along in the construction of solutions. Sometimes well, sometimes not.
It’s just that the data structures and APIs used by MongoDB make it much easier to satisfy the first two bullet points.
Particularly for an up-stack software engineer kind of person like myself, the ease and power of well-harmonizing your persistence with your code – java, javascript, perl, python – is a vital part of
ensuring your overall information architecture is robust.
Part of the exercise also involves candidly addressing legacy RDBMS issues that we see over and over again after 40 years, like schema explosion and field overloading and flattening
Boils down to success = “schema” + code
Potentially very provocative.
In the old days, we had long requirements cycles and one or both of the following things happened:
Requirements changed midstream, invalidating a lot of schema design already performed especially for certain 1:1 entity relationships
High quality problem: The solution was asked to scale well beyond the original goals
Increasingly, it is no longer acceptable to have a 6 or 12 or 18 month requirements phase to build a system that lasts 10 years or more.
You want to get the initial information architecture done in a couple of weeks, expand upon it incrementally, and build a system that doesn’t have to live 10 years to “justify its development time.”
Because MongoDB is impedance matched to the software around it, you have a great deal of design flexibility and even innovation in the way you construct the overall solution.
We’ll explore some of those best practices today that will really put you on the road to Thinking In Documents
A document is not a PDF or a MS word artifact.
A document is a term for a rich shape: structures of structures of lists of structures that ultimately, at the leaves, have familiar scalars like int, double, datetime, and string.
In this example we see also that we’re carrying a thumbnail photo in a binary byte array type; that’s natively supported as well.
This is different than the traditional row-column approach used in RDBMS.
Another important difference is that In MongoDB, it is not required for every document in a collection to be the same shape; shapes can VARY
With the upcoming release of 3.2, we will be supporting document validation, so in those designs where certain fields and their types are absolutely mandatory, we’ll be able to enforce that at the DB engine level similar to – but not exactly like – traditional schemas.
Truth is, in most non-trivial systems, even with RDBMS and stored procs, plenty of validation and logic is being handled outside the database.
Now here is something very important:
For the purposes of the webinar, we will be seeing this “to-string” representation of a document as it is emitted from the MongoDB CLI.
This is easy to read and gets the structural design points across nicely.
But make no mistake: you want most of your actual software interaction with MongoDB (and frankly any DB) to be via high fidelity types, not a big string with whitespaces and CR and quotes and whatnot.
Drivers in each language represent documents in a language-native form most appropriate for that language.
Java has maps, python has dictionaries. You deal with actual objects like Dates, not strings that must be constructed or parsed.
Another important note: We’ll be using query functionality to kill 2 birds with one stone:
To show the shape of the document
To show just a bit of the MongoDB query language itself including dotpath notation to “dig into” substructures
Note also that in MongoDB, documents go into collections in the same shape they come out so we won’t focus on insert.
This is a very different design paradigm from RDBMS, where, for example, the read-side of an operation implemented as an 8 way join is very different than the set of insert statements (some of them in a loop) required for the write side.
Let’s get back to data design.
…
Traditional data design is characterized by some of the points above, and this is largely because the design goals and constraints of legacy RDBMS engines heavily influence the data being put into them.
These platforms were designed when CPUs were slow and memory was VERY expensive.
Perhaps more interesting is that the languages of the time – COBOL, FORTRAN, APL, PASCAL, C – were very compile time oriented and very rectangular in their expression of data structures.
One could say rigid schema combined with these languages was in fact well-harmonized.
Overall, the platform is VERY focused on the physical representation of data.
For example, although most have been conflated, the legacy types of char, varchar, text, CLOB etc. to represent a string suggest a strong coupling to byte-wise storage concerns.
Documents, on the other hand, are more like business entities.
You’ll want to think of your data moving in and out as objects.
And the types and features of Document APIs are designed to be well-harmonized with today’s programming languages – Java, C#, python, node.js, Scala , C++ -- languages that are not nearly as compile-time oriented and offer great capabilities to dynamically manipulate data and perform reflection/introspection upon it.
So what better way to show this than diving right into a problem?
Today we’ll explore data structures and schema for a library management application
Good example because in general most of you have some familiarity with the entities involved and we can explore some 1:1, 1:n and other design elements
To make things a little more interesting, we’ll capture our emerging requirements in the form of questions the application needs to ask the database.
This is useful because, again, it helps us keep our eye on the software layer above the database
Imagine a patron walks up to the counter and presents his/her library card to check out some books. The first thing a librarian might want to do is confirm the patron’s address so as to have a place to send the library police when the book isn’t returned in a timely manner.
This might be the very first cut at such a thing, with 2 collections, patrons and addresses. Both share a common key space on the _id unique ID field.
To get name and address will require 2 queries.
Very important: You notice here that a list of favorite genres has been set up. It is a LIST of strings. Not a semicolon or comma delimited string.
Each element of that list is completely addressable, queryable, and updatable via dotpath notation. As is the entire list.
And should we require it, it is indexable. But let’s not get too far ahead.
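The array-matching behavior described above can be sketched in plain JavaScript (a simulation of the semantics, not the server implementation; `favoriteGenres` is an illustrative field name): a query like `{favoriteGenres: "biography"}` matches a document whether the field equals the value or is an array containing it.

```javascript
// A sketch of MongoDB's array-matching rule: a simple equality predicate
// matches if the field equals the value OR is an array containing it.
function fieldMatches(fieldValue, queryValue) {
  if (Array.isArray(fieldValue)) {
    return fieldValue.indexOf(queryValue) !== -1;
  }
  return fieldValue === queryValue;
}

// A LIST of strings, not a comma-delimited string:
var patron = { _id: "joe", favoriteGenres: ["history", "biography", "sci-fi"] };

var hit = fieldMatches(patron.favoriteGenres, "biography");  // true
var miss = fieldMatches(patron.favoriteGenres, "romance");   // false
```

This is why each element of the list is individually queryable (and indexable) without any string parsing.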
Can we do better?
I think we can: embed the address substructure into the patron document.
Only 1 query is necessary.
You notice here that we treated address like an Object. We did not “flatten” everything out into addr_street, addr_city, addr_state, etc.
We’ll come back to that again in just a few slides and see the power it provides.
Already, with this simple example, two important aspects of Thinking In Documents are highlighted: arrays and substructures.
A common concern that pops up at this point is that every query will act like “SELECT *” and drag back potentially a LOT of data.
In MongoDB you can control that with projection as we see here, where fields can be explicitly included or excluded.
This operation is of course processed server side and is efficient.
Note that we can both pick up an entire substructure as a unit OR JUST specific subfields using “dot notation”.
Very powerful.
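To make the dot-notation idea concrete, here is a rough client-side simulation of inclusion projection (an illustrative sketch only; the real work happens server-side): it shows what a projection like `{"address.state": 1}` leaves behind.

```javascript
// A rough simulation of inclusion projection with dot notation.
// Walks one dot path into the document and rebuilds just that branch.
function projectField(doc, dotPath) {
  var parts = dotPath.split(".");
  var src = doc, out = {}, cur = out;
  for (var i = 0; i < parts.length; i++) {
    src = src[parts[i]];
    if (src === undefined) return {};       // path not present
    if (i < parts.length - 1) {
      cur[parts[i]] = {};                   // rebuild the substructure
      cur = cur[parts[i]];
    } else {
      cur[parts[i]] = src;                  // copy the leaf (or subtree)
    }
  }
  return out;
}

var patron = { _id: "joe", name: "Joe", address: { city: "Faketon", state: "MA" } };
var result = projectField(patron, "address.state");
// result is { address: { state: "MA" } } -- _id, name, city all excluded
```

Projecting `"address"` instead would return the entire substructure as a unit.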
Before I mentioned we’d see more about the power of substructure.
Substructure is the concept that allows your schema to really well-harmonize with software.
By being able to pull out the entire address structure, you can pass it to a function that does something interesting with no matter what fields are or are not set.
And what’s really great is that these substructures are generic Maps or dictionaries – NO COMPILE TIME dependencies!
If you’re Thinking in Documents, then remember that Document Shapes can vary.
Apologies for the small font.
But here, we are inserting a new document where address contains an additional field “country” (and, if you’re watching closely, does NOT have the zipcode field!)
When we do a find and ask for the address field, two slightly different shapes emerge.
But more importantly, only one place “needs” to change: the place where the new “country” field is actually consumed.
Neither the query nor the intermediate parts of the call stack change at all – the new country field simply flows through!
Assuming you’re not in a “select *” environment, which has its own huge set of issues, this is the traditional approach to the same problem.
For brevity I am leaving out the entire issue of ALTER TABLE and schema migration scripts which may be avoided in MongoDB.
The focus here is agility and accuracy of Day 2 mods, and how many places you have to make them.
Here, every SQL statement that pulled out city, state, and zip has to be assessed in the context of doSomethingWithOneAddress(). In other words, if the call to doSomethingWithOneAddress() exists in code – the call, NOT the one implementation – then BOTH the SQL statement and the thing that converts it into a useful object (located hopefully nearby) have to be modified. This is potentially a large undertaking!
The problem is magnified by the number of addresses that you may be pulling out.
In SQL and rectangles, you have to track down everything that looks like it is part of an address and ALTER TABLE and add columns to the select statements.
In MongoDB and Documents, the additional material is automatically collected and flowed downstream with no DDL or compile time entanglements.
We’ll see in a moment how Lists in Documents make this easier still!
Let’s summarize one-to-one relationships.
Read performance – especially for complex shapes of documents that otherwise would entail multiway JOINs -- is optimized because we only need a single query and a single disk/memory hit..
MongoDB is especially useful in dealing with 1:1 belongs-to relationships
Not every non-scalar datum is going to force you to create a new table, a foreign key, and key management.
It keeps simple things simple.
Business Requirements Change!
People may have more than one address.
And: we acknowledge that the legacy practice of assuming some small number of instances and flattening out addresses into address1 and address2 is bound to bump up against a situation where someone has 3 addresses.
We’ve seen where that leads traditional schema design.
We don’t want to repeat it.
Now, just model addresses as an array.
That was easy.
No need to create a new table, set up foreign keys, do key management, etc.
We combine the power of arrays with substructures.
At times it takes longer to describe the change than it does to implement it and have it available as part of the information architecture including querying and indexing.
And again, these rich structures are presented in the client side driver as Lists, Maps, dictionaries, arrays and other high-fidelity types.
Now suppose this design change occurred on Day 2, after the system was placed into production.
What about the old documents that had address, not addresses?
There are 2 aspects to this issue.
The MongoDB paradigm is that the code layer just above the DB is responsible for managing these types of things. The same code that is responsible for the higher-level business validation and verification is also responsible for selectively using (both on a read and write basis) the address or addresses field.
There are several migration strategies that one could employ in conjunction with functionality at the code layer.
Remember, Thinking In Documents means thinking about documents with varying shapes.
Migrate on release: If address exists, write into new array of size 1 of new fields addresses and remove address field.
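That migrate-on-read/migrate-on-release step can be sketched as a small helper (a sketch only; `migratePatron` is my name for it): if the old `address` field exists, rewrite it as a one-element `addresses` array and drop the old field, mirroring what a `$set`/`$unset` update would do server-side.

```javascript
// One possible migration helper: convert the old single-address shape
// to the new array shape, in place. Idempotent: already-migrated or
// address-less documents pass through untouched.
function migratePatron(doc) {
  if (doc.address !== undefined && doc.addresses === undefined) {
    doc.addresses = [doc.address];  // array of size 1
    delete doc.address;             // remove the old field
  }
  return doc;
}

var old = { _id: "joe", address: { street: "123 Fake St.", city: "Faketon" } };
migratePatron(old);
// old now has addresses: [ {street: ..., city: ...} ] and no address field
```

Run at read time, this migrates exactly the patrons who actually come into the library.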
When we say “let the code deal with it” we mean something like this.
The Data Access Layer code in cooperation with the Document model significantly eases Day 2 information architecture change.
And again, this is an important aspect of Thinking In Documents.
Moving on, let’s look at another relationship in our emerging information architecture: publisher and books.
Shameless plug!
Note that a book may have multiple authors….
Here’s a first cut at the design.
Here, the 1:1 belongs-to relationship between book and publisher is not nearly as strong as patron to address.
But for the moment, let’s choose to duplicate publisher in every book that the publisher has published.
Data duplication is OK because the publisher is largely immutable.
Here’s an even better one: making the author name more granular by making it a substructure with separate first and last name elements.
A good rule of thumb is to try to eliminate embedded whitespace or punctuation in a field by breaking it down into substructure.
This design will permit very high speed display of relevant book information.
And we can still quickly look up all books for a given publisher.
Suppose publisher info is not so static.
Maybe we want to take a more traditional route and separate publisher information out into its own collection.
We need to assign _ids and we’ll use simplified examples here.
Note that in MongoDB each document needs a unique _id but it’s not important for today’s talk to get into the advanced topic of _id construction
If you don’t supply one, one will be created automatically.
And here’s a teaser: _id does not have to be a string or number or scalar: it can be a multi-field substructure!
Note that the belongs-to 1:1 relationship of locations carries over nicely without fuss.
It now takes 2 queries to pull the information.
Now this isn’t the end of the world.
It’s effectively no more code work than a JOIN. Instead of decomposing a resultset or passing the result into an ORM, your code layer deals directly with the database.
If we want to get more than one book, no problem.
Get the books and as you’re processing the book data, capture the publisher ID.
We store it in a map to remove duplicate IDs
Then we turn around and do an efficient lookup by primary key into the publishers collection. This works well regardless of whether the list is 1 long or 10,000.
This is effectively the multi value JOIN. And of course the approach applies to linking any number of collections, not just the 2 we see here.
In MongoDB, the application assembles material from multiple queries.
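The two-query assembly pattern from slide 35 can be sketched end-to-end in plain JavaScript (the collections are simulated here with in-memory arrays; the data is illustrative): collect unique publisher IDs from a batch of books, then look the publishers up by primary key.

```javascript
// In-memory stand-ins for the books and publishers collections:
var books = [
  { _id: "1", title: "Perl Tricks", publisher_id: "oreilly" },
  { _id: "2", title: "More Jokes",  publisher_id: "rh" },
  { _id: "3", title: "More Perl",   publisher_id: "oreilly" }
];
var publishers = [
  { _id: "oreilly", name: "O'Reilly Media" },
  { _id: "rh",      name: "Random House" }
];

// "Query 1": iterate the books, capturing publisher IDs uniquely by
// using them as keys in an object (Map):
var tmpm = {};
books.forEach(function (book) { tmpm[book.publisher_id] = true; });
var uniqueIDs = Object.keys(tmpm); // ["oreilly", "rh"] -- duplicates gone

// "Query 2": the equivalent of db.publishers.find({_id: {$in: uniqueIDs}}):
var matched = publishers.filter(function (p) {
  return uniqueIDs.indexOf(p._id) !== -1;
});
```

No Giant Rectangle to disassemble: the application holds the books and the publishers as distinct, fully shaped objects.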
Before the chat window lights up with all sorts of questions and rage, let’s do a quick compare and contrast to SQL.
This example should be straightforward to most of you.
As you know, the basic behavior of JOIN has some subtle issues:
The cartesian product in the ResultSet is a giant rectangle containing repeating column values; you have to do something to change that into a form easily digestible by your application.
Depending on the number of JOINed columns and the cardinality, you could be redundantly dragging LOTS of data across the network.
And on top of it all, it’s hard to deal with Giant Rectangle efficiently because the JOINed data is jumping around.
This leads us to run things through ORDER BY so that we can now more easily deal with the Giant Rectangle.
What does this do to performance?
What if it’s more than a 2 way JOIN? 4 or 8 way?
In the end, for anything except the most trivial of JOINs, the amount of work required to unwind or disassemble the Giant Rectangle is largely the same amount of code as it takes for MongoDB to assemble material.
For sophisticated queries to complex shapes, it could be argued that assembly is both easier to write AND maintain Day 2.
This design will permit very high speed display of relevant book information.
And we can still quickly look up all books for a given publisher.
We have already seen a way to do this : by performing a find() on the Books collection with a particular publisher name OR publisher ID.
But is there another way?
As much as arrays are very useful in MongoDB, beware of the temptation to treat them as “rows.”
Arrays should be “slow-growing” and not without a reasonable bound.
For example, you wouldn’t carry system-of-record customer transactions as an array inside the customer.
You MIGHT, however, for the purposes of speed at scale, duplicate the last 3 transactions as an array in the customer record – but that array is not growing without bound.
From an underlying technology basis, this is less of an issue now with the WiredTiger storage engine available in 3.0 but you should still be mindful of it.
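The "last 3 transactions" idea above can be sketched as a bounded push (a sketch; the field names are illustrative), mirroring what a server-side `$push` with `$each`/`$slice: -3` accomplishes:

```javascript
// Keep only the most recent n items in an array -- the bounded-array
// pattern, so the embedded array never grows without limit.
function pushBounded(arr, item, n) {
  arr.push(item);
  return arr.slice(-n); // retain only the last n entries
}

// Illustrative customer record carrying its last 3 transaction amounts:
var customer = { _id: "c1", recentTxns: [10, 20, 30] };
customer.recentTxns = pushBounded(customer.recentTxns, 40, 3);
// customer.recentTxns is now [20, 30, 40]
```

The full transaction history would still live in its own collection; the embedded copy is just a fast-read duplicate with a hard upper bound.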
As our schema is evolving, we’re beginning to see that an increasing number of questions can be efficiently answered by existing set of entities and relationships.
This holds for authors; recall we set up the substructure of first and last names. But suppose we want more detail on an author?
Let’s go ahead and create the Authors collection and link to it with an Author ID.
Yes, we are using some fancier code here to extract the author IDs but in the end, it is still only 2 queries.
The fancy code here is using a map() operation to extract just the author IDs which will be passed along to the $in operator
The authors documents contain all sorts of information about each author.
We have left the first and last names in the books document for the purposes of speed
But we could get rid of them if we wanted.
It is a choice of integrity vs. speed of retrieval.
Inevitably, we turn the question around and start to expose ourselves to the MANY-TO-MANY problem.
Here’s one way to do it: acknowledging that most authors are not Stephen King or Danielle Steel and have something of an upper bound on the books they’ve written, we can doubly link the books and authors collections.
In general the data is static, so referential integrity is not as much of an issue as performance.
Again, we’ve kept title in the structure within the books field for performance purposes (to keep it to a single query) when looking up basic info about an author.
But if so desired, we could just have books be an array of book IDs and use a 2-query approach as we’ve seen in prior slides.
Note this is different than publishers having an array of books because the upper bound for author-to-books is much less than publisher-to-books.
Remember: all fields including those inside structures inside arrays are indexable.
We saw an example of this earlier with indexing on publisher.name
Complex structures like arrays of coordinate pairs, deep nested data, etc.
With immutable designs, sometimes the safest / easiest route is to dereference the data. Note that with the emergence of deduping storage arrays, this is increasingly becoming a practical and exciting solution!
Suppose the library wants to add value by having book clubs enhance the authors information with bespoke information.
There are at least 3 challenges here:
1. The book club is operating semi-independently from the main system designers.
2. We don’t know early on what the book club is going to add.
3. Each author may have a different set of information, created by a different member of the book club.
This is a very, very exciting part of MongoDB.
No need to come up with userDefined column 1, column 2, etc.
We see here that Kristina and Mike have very different substructures inside the personalData field.
We call this polymorphism: the variation of shape from document-to-document within the same collection.
The library application logic only is looking for a field called “personalData”; actions will be taken dynamically based on the shape and types in the substructure!
For example, It is a very straightforward exercise to recursively “walk” the structure and construct a panel in a GUI – especially if you are using AngularJS and the MEAN stack
(MongoDB / Express / Angular / Node.js )
No need to use XML or blobs or serialized objects. It’s all native MongoDB -- and documents are represented in the form most easily natively manipulated by the language
Every field is queryable and if so desired, indexable! Documents that do not contain fields in a query predicate are simply treated as unset.
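The recursive "walk" mentioned above can be sketched in a few lines (an illustrative sketch; the `personalData` contents are made up): it flattens any polymorphic substructure into dot-path/value pairs, which is exactly the kind of output a dynamically built GUI panel could consume.

```javascript
// Recursively walk a polymorphic substructure, emitting [dotPath, value]
// pairs for every leaf. Arrays are treated as leaf values here.
function walk(obj, prefix, out) {
  Object.keys(obj).forEach(function (k) {
    var path = prefix ? prefix + "." + k : k;
    var v = obj[k];
    if (v !== null && typeof v === "object" && !Array.isArray(v)) {
      walk(v, path, out);   // descend into substructure
    } else {
      out.push([path, v]);  // leaf: record its dot path and value
    }
  });
  return out;
}

// Two authors could have completely different shapes here; the walker
// doesn't care:
var personalData = { hobbies: ["sailing"], pet: { name: "Rex", kind: "dog" } };
var pairs = walk(personalData, "", []);
// [["hobbies", ["sailing"]], ["pet.name", "Rex"], ["pet.kind", "dog"]]
```

Because the code discovers the shape at runtime, Day 2 additions to `personalData` need no code change at all.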
Polymorphism is such an important aspect of Thinking In Documents that it deserves another slide..
Here’s an events collection. We’ve taken out _id for some extra space.
Type and ts and data are the well-known fields, and there is probably an index on type and ts.
But the data field contents are specialized for each type.
No need for complex tables or blobs or XML.
And very easily extensible to new types we may need to add on day 2.
This model extends to products, transactions, actors in a data design..
If you think about your data in terms of entities, you’ll discover that you probably only need a handful. That plus polymorphism within the entity gets you a long way.
Because of this design approach, rarely will a MongoDB database have 100s of collections
This is an example of faceted search or tagging in general.
We can begin simply by implementing Categories as an array
We probably will only have a few, they don’t change very often, and they have a natural upper bound.
An index on categories will allow rapid retrieval.
Again, a very appropriate use of belongs-to 1:1
If we want a little more hierarchy, we can use regex operations.
The category field is indexed as before, but now as a string
Uses an index because of the anchored regular expression.
There is a downside: when a category hierarchy gets changed, all documents will need to be re-categorized.
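The anchored-regex filtering can be sketched with plain strings (illustrative data; the anchoring with `^` is the point, since that is what lets MongoDB satisfy such a query with an index range scan):

```javascript
// Hierarchical categories as path-like strings:
var books = [
  { title: "Starting Perl", category: "technology/programming/perl" },
  { title: "Flying Kites",  category: "recreation/outdoors" },
  { title: "More Perl",     category: "technology/programming/perl" }
];

// Everything under technology/ -- the regex is anchored at the start of
// the string, which is what makes it index-friendly server-side:
var tech = books.filter(function (b) {
  return /^technology\//.test(b.category);
});
// tech contains "Starting Perl" and "More Perl"
```

An unanchored regex (no `^`) would force a full scan, so the anchor matters as much as the index itself.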
A little techno-trailer for you: Document Validation.
Coming in v3.2, Document Validation “inverts” the use of existing MongoDB query language.
Documents being inserted or updated will be assessed against a validator if it exists. There are a number of special features including “strictness on update” that are beyond the scope of today’s presentation but the good news is that it is coming!
We’ll close with something really cool – document validation that adapts to change over time.
Assuming you “soft version” your documents by including a logical version ID ( in this case, a simple integer in field v) , you can maintain multiple different shapes of documents in one collection, each of them validation enforced to the version rules appropriate at the time. And again, because it is at the DB engine level, enforcement is guaranteed through all drivers.
SUPER POWERFUL!
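The dispatch-by-version idea can be sketched client-side (a sketch only; on the server this would be expressed through the 3.2 validator feature, and the rules here are made-up stand-ins): a map from the logical version field `v` to the rule for that shape.

```javascript
// Soft-versioned validation: each logical document version gets its own
// rule, keyed by the version field "v".
var rulesByVersion = {
  1: function (doc) { return typeof doc.address === "object"; },   // old shape
  2: function (doc) { return Array.isArray(doc.addresses); }       // new shape
};

function isValid(doc) {
  var rule = rulesByVersion[doc.v];
  return rule !== undefined && rule(doc);
}

// Both shapes coexist in one collection, each held to its own rules:
var v1doc = { v: 1, address: { city: "Faketon" } };
var v2doc = { v: 2, addresses: [{ city: "Boston" }] };
```

A document claiming an unknown version simply fails validation, which is usually the safe default.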
On behalf of all of us at MongoDB , thank you for attending this webinar!
I hope what you saw and heard today gave you some insight and clues into what you might face in your own data design efforts.
Remember you can always reach out to us at MongoDB for guidance.
With that, code well and be well.