We like to use strongly typed languages, and we use them alongside flexible-schema databases. What challenges do we face keeping data coherent and formats valid, and what strategies and tools help — ODMs, versioning, migrations, et al.? We also review the tradeoffs of each strategy.
Traditional RDBMS
Table with checks
create table cat_pictures(
    id int not null,
    size int not null,
    picture blob not null,
    user_id int,
    primary key (id),
    foreign key (user_id) references users(id));
Null checks
Foreign and primary key checks
Is this Flexible?
• What happens when we need to change the schema?
  – Add new fields
  – Add new relations
  – Change data types
• What happens when we need to scale out our data structure?
Flexible Schema
• No mandatory schema definition
• No structure restrictions
• No schema validation process
We start from code
public class CatPicture {
    int size;
    byte[] blob;
}

public class User {
    int id;
    String firstname;
    String lastname;
    CatPicture[] cat_pictures;
}
Flexible Schema Databases
• Challenges
  – Different versions of documents
  – Different structures of documents
  – Different value types for fields in documents
Different Versions of Documents
The same document changes how it represents its data over time.

First version:
{ "_id" : 174, "firstname": "Juan" }

Second version:
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

Third version:
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "cat_pictures":
  [{"size": 10, "picture": BinData("0x133334299399299432")}] }
Different Versions of Documents
The same document changes how it represents its data over time.

{ "_id" : 174, "firstname": "Juan" }
{ "_id" : 174, "name": { "first": "Juan", "last": "Olivo"} }

Different structure for the same data
Different Structures of Documents
Different documents coexisting in the same collection

{ "_id" : 175, "brand": "Ford", "model": "Mustang", "date": ISODate("XXX") }
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

Within the same collection
Different Data Types for Fields
Different documents coexisting in the same collection
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "bdate": 1224234312}
{ "_id" : 175, "firstname": "Paco", "lastname": "Hernan", "bdate": "2015-06-27"}
{ "_id" : 176, "firstname": "Tomas", "lastname": "Marce", "bdate": ISODate("2015-06-27")}
Same field, different data type
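A defensive read path in application code can absorb this variability. Below is a minimal Java sketch (the class and method names are ours, not from the talk) that normalizes the three "bdate" shapes shown above — epoch millis, an ISO date string, and a native date — into a single java.util.Date:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper: turns the mixed "bdate" representations
// seen in one collection into a single Java type on read.
public class BdateNormalizer {

    public static Date normalize(Object raw) {
        if (raw instanceof Date) {
            // ISODate maps to java.util.Date in the Java driver
            return (Date) raw;
        }
        if (raw instanceof Number) {
            // numeric bdate: treat as epoch milliseconds
            return new Date(((Number) raw).longValue());
        }
        if (raw instanceof String) {
            try {
                // string bdate: expect the "2015-06-27" shape
                return new SimpleDateFormat("yyyy-MM-dd").parse((String) raw);
            } catch (java.text.ParseException e) {
                throw new IllegalArgumentException("unparseable bdate string: " + raw, e);
            }
        }
        throw new IllegalArgumentException("unsupported bdate value: " + raw);
    }

    public static void main(String[] args) {
        System.out.println(normalize(1435394461522L));
        System.out.println(normalize("2015-06-27"));
    }
}
```

The point is not this particular helper but the pattern: when the database does not enforce a type, the reading code has to.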
Decoupled Architectures
• Allows the business logic to evolve independently of the data layer
• Decouples the underlying storage / persistence option from the business service
• Changes are "requested" and not imposed across all applications
• Better versioning control of each request and its mapping
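As a minimal illustration of the decoupling idea (all names here are hypothetical, not from the talk): the business code depends only on an interface, so the storage engine behind it can be swapped without touching business logic:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical port: the business service talks only to this contract.
interface UserStore {
    void save(int id, String firstname);
    Optional<String> findFirstname(int id);
}

// One interchangeable adapter; a MongoDB- or JDBC-backed implementation
// would satisfy the same interface without changing callers.
class InMemoryUserStore implements UserStore {
    private final Map<Integer, String> byId = new HashMap<>();
    public void save(int id, String firstname) { byId.put(id, firstname); }
    public Optional<String> findFirstname(int id) { return Optional.ofNullable(byId.get(id)); }
}

public class DecouplingDemo {
    public static void main(String[] args) {
        UserStore store = new InMemoryUserStore(); // the only line that names a backend
        store.save(174, "Juan");
        System.out.println(store.findFirstname(174).orElse("?"));
    }
}
```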
ODM
• Reduces the impedance mismatch between code and databases
• Data management facilitator
• Hides the complexity of operators
• Tries to decouple business complexity with "magic" recipes
Spring Data
• POJO-centric model
• MongoTemplate or CrudRepository extensions make the connection to the repositories
• Uses annotations to override default field names and even data types (data type mapping)

public interface UserRepository extends MongoRepository<User, Integer> {
}

public class User {
    @Id
    int id;
    @Field("first_name")
    String firstname;
    String lastname;
}
Spring Data Considerations
• Data formats, versions and types still need to be managed
• Does not solve issues like type validation out of the box
• Can make things more complicated but more "controllable"

@Field("first_name")
String firstname;
Morphia
• Data-source centric
• Discovers all POJOs in a given package
• Also uses annotations to perform overrides and deal with object mapping

@Entity("users")
public class User {
    @Id
    int id;
    String firstname;
    String lastname;
}

morphia.mapPackage("examples.odms.morphia.pojos");
Datastore datastore = morphia.createDatastore(new MongoClient(), "morphia_example");
datastore.save(user);
Morphia Considerations
• Enables better control at class loading
• Like Spring Data, facilitates field overriding (annotations to define field keys)
• Better support for object polymorphism
Versioning
Versioning of data structures (especially documents) can be very helpful:
• Recreate documents over time
• Flow control
• Data / field multi-version requirements
• Archiving and history purposes
Versioning – Option 0
Change the existing document on each write, with a monotonically increasing version number inside:

{ "_id" : 174, "v" : 1, "firstname": "Juan" }
{ "_id" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

> db.users.update( {"_id": 174}, { "$set": { ... }, "$inc": { "v": 1 } } )

Increment the version field on every write
Versioning – Option 1
Store the full document on each write, with a monotonically increasing version number inside:

{ "docId" : 174, "v" : 1, "firstname": "Juan" }
{ "docId" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
{ "docId" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

> db.docs.insert( {"docId": 174, …} )
> db.docs.find({"docId": 174}).sort({"v": -1}).limit(-1)

Always fetch the latest version
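The read path of this option can be sketched in plain Java without a database (class and method names are ours, not from the talk); the in-memory lookup mirrors the sort-by-version-descending query above:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Option 1's read path: among all stored revisions
// of a docId, pick the one with the highest "v".
public class LatestVersion {

    public static Map<String, Object> findLatest(List<Map<String, Object>> docs, int docId) {
        return docs.stream()
                .filter(d -> Integer.valueOf(docId).equals(d.get("docId")))
                .max(Comparator.comparingInt(d -> (Integer) d.get("v")))
                .orElse(null);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(new HashMap<>(Map.of("docId", 174, "v", 1, "firstname", "Juan")));
        docs.add(new HashMap<>(Map.of("docId", 174, "v", 2, "firstname", "Juan", "lastname", "Olivo")));
        System.out.println(findLatest(docs, 174).get("v")); // the highest version wins
    }
}
```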
Versioning
Schema                    | Fetch 1       | Fetch Many     | Update       | Recover if Fail
--------------------------|---------------|----------------|--------------|----------------
0) Increment Version      | Easy, Fast    | Fast           | Easy, Medium | N/A
1) New Document           | Easy, Fast    | Not Easy, Slow | Medium       | Hard
2) Embedded in Single Doc | Easy, Fastest | Easy, Fastest  | Medium       | N/A
3) Separate Collection    | Easy, Fastest | Easy, Fastest  | Medium       | Medium, Hard
Change Field Names
Again, you can do it programmatically:

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
{ "_id" : 174, "first": "Juan", "last": "Olivo" }

> db.users.update( {"_id": 174}, {"$rename": { "firstname": "first", "lastname": "last" } })
Change Field Data Type
Align with a new code change and move from int to string:

{..."bdate": 1435394461522}  →  {..."bdate": "2015-06-27"}

1) Batch process
2) Aggregation Framework
3) Change based on usage
Change Field Data Type
1) Batch process – bulk API

public void migrateBulk(){
    // "yyyy-MM-dd": lowercase dd is day-of-month (uppercase DD is day-of-year)
    DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
    ...
    List<UpdateOneModel<Document>> toUpdate =
        new ArrayList<UpdateOneModel<Document>>();
    for (Document doc : coll.find()){
        String dateAsString = df.format(new Date(doc.getInteger("bdate", 0)));
        Document filter = new Document("_id", doc.getInteger("_id"));
        Document value = new Document("bdate", dateAsString);
        Document update = new Document("$set", value);
        toUpdate.add(new UpdateOneModel<Document>(filter, update));
    }
    coll.bulkWrite(toUpdate);
}
Change Field Data Type
1) Batch process – bulk API

public void migrateBulk(){
    ...
    for (Document doc : coll.find()){
        ...
    }
    coll.bulkWrite(toUpdate);
}

Is there any problem with this?
Change Field Data Type
1) Batch process – bulk API

public void migrateBulk(){
    ...
    // BSON type 16 is the int32 data type:
    // only touch documents still holding the old format
    Document query = new Document("bdate", new Document("$type", 16));
    for (Document doc : coll.find(query)){
        ...
    }
    coll.bulkWrite(toUpdate);
}

More efficient filtering!
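Option 3, "change based on usage", gets no code on these slides. A hypothetical, database-free sketch of the idea (names are ours) is a migrate-on-read helper: a document is upgraded only when the application touches it, so the migration cost spreads over normal traffic instead of one batch job:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "change based on usage" (lazy migration on read).
public class LazyMigration {

    // Reads a document; if "bdate" is still the old numeric epoch-millis
    // form, converts it to the new string form in place.
    public static Map<String, Object> readAndMigrate(Map<String, Object> doc) {
        Object bdate = doc.get("bdate");
        if (bdate instanceof Number) { // old format detected on read
            String asString = new SimpleDateFormat("yyyy-MM-dd")
                    .format(new Date(((Number) bdate).longValue()));
            doc.put("bdate", asString); // in a real system: issue a $set back to the DB
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("bdate", 1435394461522L);
        System.out.println(readAndMigrate(doc));
    }
}
```

The tradeoff: reads stay slightly more complex until every document has been touched, and documents that are never read are never migrated.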
Tradeoffs
Decoupled Architecture
  Positives: should be your default approach; clean solution; scalable
  Penalties: N/A

Data Structures Variability
  Positives: reflects today's data structures; you can push decisions for later
  Penalties: more complex code base

Data Structures Strictness
  Positives: simple to maintain; always aligned with your code base
  Penalties: will eventually need migrations; restricts your code iterations
Recap
• Flexible and Dynamic Schemas are a great tool
– Use them wisely
– Make sure you understand the tradeoffs
– Make sure you understand the different strategies and
options
• Works well with Strongly Typed Languages