The document describes an architecture and implementation for server-side data synchronization for mobile apps. It discusses syncing scenarios, challenges with the existing solution, and the new architecture and implementation. The key aspects covered are using GUIDs as unique identifiers, having the server suggest the "from" value for incremental syncing, transferring record states instead of operations, and algorithms for resolving conflicts, including for hierarchical data (sorting by hierarchy and updating ids).
9. Scenario
Brownfield project
several mobile apps for tracking user-generated data (calendar, notes, bio data)
iOS & Android
~10 K users, steadily growing at 1.2 K/month
10. Scenario
MongoDB
Legacy app based on CodeIgniter
Existing RPC-wannabe-REST API for data sync
11. Scenario
get updates:
POST /m/<app>/get/<user_id>/<res>/<updated_from>
send updates:
POST /m/<app>/update/<user_id>/<res_id>/<dev_id>/<res>
17. Not Invented Here?
Don't Reinvent The Wheel,
Unless You Plan on Learning More About Wheels
J. Atwood
18. Architecture
2 different mobile platforms
several teams with different skill levels
changing storage wasn’t an option
forcing a particular technology client side wasn’t an option
20. Implementation
In the sync domain all resources are the same
For every app:
- one endpoint for getting new data
- one endpoint for pushing changes
- one endpoint for uploading images
21. Get changes
Get all changes (1st sync):
GET /apps/{app}/users/{user_id}/changes
Get latest changes:
GET /apps/{app}/users/{user_id}/changes?from={from}
timestamp?
23. Server suggests the sync time
timestamps are inaccurate (clock skew and developer errors)
server suggests the “from” parameter to be used in the next request
c1 → server: GET /changes
server → c1: { "next": 12345, "data": […] }
25. Server suggests the sync time
c1 → server: GET /changes
server → c1: { "next": 12345, "data": […] }
c1 → server: GET /changes?from=12345
server → c1: { "next": 45678, "data": […] }
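The exchange above can be sketched from the client's side. This is a minimal sketch, not the deck's implementation: `build_changes_url`, `pull_changes`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real HTTP GET. The only point it illustrates is that the client never computes `from` itself; it stores whatever `next` the server returned and sends it back.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def build_changes_url(app: str, user_id: str, from_token: Optional[int] = None) -> str:
    """Build the changes URL; omit `from` on the first sync."""
    url = f"/apps/{app}/users/{user_id}/changes"
    if from_token is not None:
        url += f"?from={from_token}"
    return url


def pull_changes(
    fetch: Callable[[str], Dict[str, Any]],
    app: str,
    user_id: str,
    from_token: Optional[int] = None,
) -> Tuple[List[dict], int]:
    """Fetch changes and return (records, token to use in the next request)."""
    response = fetch(build_changes_url(app, user_id, from_token))
    return response["data"], response["next"]


def fake_fetch(url: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the server, mirroring the slide's exchange."""
    if "from=12345" in url:
        return {"next": 45678, "data": []}
    return {"next": 12345, "data": [{"id": "1", "type": "note"}]}


# First sync: no `from`. Second sync: reuse the server-suggested token.
records, token = pull_changes(fake_fetch, "notes", "42")
more_records, next_token = pull_changes(fake_fetch, "notes", "42", token)
```

The client treats the token as opaque, so the server is free to change what it means (a timestamp, a sequence number) without breaking clients.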
27. what to transfer
we chose to transfer states (not operations)
{ "id": "1", "type": "measure", "_deleted": true }
{ "id": "2", "type": "note" }
{ "id": "3", "type": "note" }
ps: soft delete all the things!
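A minimal sketch of what receiving states (rather than operations) could look like on either side, assuming an in-memory dictionary as the store. `apply_states` and `visible_records` are hypothetical names. Each incoming state replaces the stored record, and a deletion is just another state carrying `_deleted`, so applying the same batch twice gives the same result.

```python
from typing import Dict, List


def apply_states(store: Dict[str, dict], states: List[dict]) -> None:
    """Replace each stored record with the incoming state (idempotent)."""
    for state in states:
        store[state["id"]] = dict(state)


def visible_records(store: Dict[str, dict]) -> List[dict]:
    """Soft delete: deleted records stay in the store but are hidden from the app."""
    return [r for r in store.values() if not r.get("_deleted", False)]


store: Dict[str, dict] = {}
batch = [
    {"id": "1", "type": "measure", "_deleted": True},
    {"id": "2", "type": "note"},
    {"id": "3", "type": "note"},
]
apply_states(store, batch)
apply_states(store, batch)  # re-applying the same states changes nothing
```

Keeping the deleted record as a tombstone is what lets the deletion itself be synced to other devices.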
28. unique identifiers
How do we generate a unique id in a distributed system?
UUID: several implementations (RFC 4122)
Local ids / global id: the server generates GUIDs, clients use local ids to manage their records
c1 → server: GET /changes
server → c1: { "data": { "guid": "58f0bdd7-1481" } }
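A minimal sketch of the local id / global id split, assuming the server uses random (version 4) UUIDs as GUIDs. `assign_guids` and `remember_guids` are hypothetical names; the `lid` and `guid` field names follow the POST /merge example later in the deck.

```python
import uuid
from typing import Dict, List


def assign_guids(incoming: List[dict]) -> List[dict]:
    """Server side: give a GUID to every record that does not have one yet."""
    acks = []
    for record in incoming:
        guid = record.get("guid") or str(uuid.uuid4())
        acks.append({"lid": record["lid"], "guid": guid})
    return acks


def remember_guids(id_map: Dict[int, str], acks: List[dict]) -> None:
    """Client side: map each local id to the GUID the server assigned."""
    for ack in acks:
        id_map[ack["lid"]] = ack["guid"]


acks = assign_guids([{"lid": 1, "type": "note"}, {"lid": 2, "guid": "abc", "type": "note"}])
id_map: Dict[int, str] = {}
remember_guids(id_map, acks)
```

The client keeps using `lid` internally and only needs the map when talking to the server.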
30. conflict resolution algorithm (plain data)
server handles conflict resolution
mobile-generated data are “temporary” until synced to the server
conflict resolution:
- domain independent: last write wins
- domain dependent: use domain knowledge to resolve
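The two strategies above can be sketched as a default rule plus per-type overrides. This is a sketch under assumptions, not the deck's algorithm: the `updated_at` field, the `merge_notes` rule, and the `resolvers` registry are all hypothetical, and `updated_at` is assumed to be comparable across devices.

```python
from typing import Callable, Dict, Optional

Resolver = Callable[[dict, dict], dict]


def last_write_wins(server: dict, client: dict) -> dict:
    """Domain independent: keep the state written last (ties go to the client)."""
    return client if client["updated_at"] >= server["updated_at"] else server


def merge_notes(server: dict, client: dict) -> dict:
    """Hypothetical domain rule: keep both note texts instead of dropping one."""
    newest = last_write_wins(server, client)
    if server["text"] == client["text"]:
        return newest
    return {**newest, "text": server["text"] + "\n" + client["text"]}


def resolve(server: Optional[dict], client: dict, resolvers: Dict[str, Resolver]) -> dict:
    """Pick the domain rule for the record type, else fall back to last write wins."""
    if server is None:  # nothing stored yet, so there is no conflict
        return client
    return resolvers.get(client["type"], last_write_wins)(server, client)


old = {"guid": "g1", "type": "measure", "value": 36.5, "updated_at": 10}
new = {"guid": "g1", "type": "measure", "value": 37.0, "updated_at": 20}
winner = resolve(old, new, {})  # no rule for "measure": last write wins
```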
56. enforcing domain constraints
e.g. “only one temperature can be registered in a given day”
how do we enforce domain constraints on data?
1) relax constraints
2) integrate constraints in sync algorithm
57. enforcing domain constraints
from findByGuid to findSimilar
first look up by GUID, then by domain rules
“two measures are similar if they refer to the same date”
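A minimal sketch of the findByGuid to findSimilar step, assuming records are dictionaries with `guid`, `type`, and `date` fields (the `date` field and the function names are hypothetical). The lookup tries the GUID first and only then applies the domain rule quoted above for measures.

```python
from typing import List, Optional


def find_by_guid(records: List[dict], guid: Optional[str]) -> Optional[dict]:
    """Plain lookup: match on the global id only."""
    if guid is None:
        return None
    return next((r for r in records if r.get("guid") == guid), None)


def find_similar(records: List[dict], candidate: dict) -> Optional[dict]:
    """Look up by GUID first, then fall back to the domain rule."""
    match = find_by_guid(records, candidate.get("guid"))
    if match is not None:
        return match
    if candidate["type"] == "measure":
        # Domain rule: two measures are similar if they refer to the same date.
        for record in records:
            if record["type"] == "measure" and record["date"] == candidate["date"]:
                return record
    return None


stored = [
    {"guid": "g1", "type": "measure", "date": "2014-05-01", "value": 36.6},
    {"guid": "g2", "type": "note", "date": "2014-05-01"},
]
# A second device registers a temperature for the same day, with no GUID yet.
incoming = {"type": "measure", "date": "2014-05-01", "value": 37.1}
existing = find_similar(stored, incoming)
```

When a similar record is found, the incoming one is treated as a conflicting update to it rather than as a new record, which keeps the "one temperature per day" constraint intact.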
64. dealing with binary data
Binary data uploaded via custom endpoint
Sync data remain small
Uploads can be resumed
65. dealing with binary data
Two steps*
1) data are synced to the server
2) related images are uploaded
* this means a record can exist without its file for some time
66. dealing with binary data
c1 → server: POST /merge
{ "lid": 1, "type": "baby", "image": "myimage.jpg" }
server → c1: { "lid": 1, "guid": "ac435-f8345" }
c1 → server: POST /upload/ac435-f8345/image
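The two-step flow can be sketched from the client's side as follows. The function names are hypothetical, and `fake_post_merge` and `fake_upload` stand in for real HTTP calls. The point is the ordering: the image upload needs the GUID that only the merge response provides.

```python
from typing import Callable, List, Tuple


def sync_record_with_image(
    post_merge: Callable[[dict], dict],
    upload: Callable[[str, bytes], None],
    record: dict,
    image_bytes: bytes,
) -> str:
    """Step 1: sync the record. Step 2: upload its image under the returned GUID."""
    ack = post_merge(record)
    guid = ack["guid"]
    upload(f"/upload/{guid}/image", image_bytes)
    return guid


uploads: List[Tuple[str, bytes]] = []


def fake_post_merge(record: dict) -> dict:
    """Hypothetical stand-in for POST /merge, echoing the slide's response."""
    return {"lid": record["lid"], "guid": "ac435-f8345"}


def fake_upload(path: str, payload: bytes) -> None:
    """Hypothetical stand-in for the upload endpoint: just record the call."""
    uploads.append((path, payload))


guid = sync_record_with_image(
    fake_post_merge,
    fake_upload,
    {"lid": 1, "type": "baby", "image": "myimage.jpg"},
    b"jpg-bytes",
)
```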
67. What we learned
Implementing this stuff is tricky
Explore existing solutions if you can
Understanding the domain is important
69. CRDT
Conflict-free Replicated Data Types (CRDTs)
Constraining the types of operations in order to:
- ensure convergence of changes to shared data by uncoordinated, concurrent actors
- eliminate network failure modes as a source of error
70. Math!!!
CRDT
Bounded join-semilattices
- a join operation defining a least upper bound
- a partially ordered set
- state is always increasing
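A grow-only counter is one of the simplest structures that fits the properties listed above, and is shown here only as an illustration (the deck does not name a specific CRDT). Each replica increments its own slot, the join is an element-wise maximum, and the empty counter is the bottom element. The function names are hypothetical.

```python
from typing import Dict

GCounter = Dict[str, int]  # replica name -> number of increments seen from it


def increment(counter: GCounter, replica: str, amount: int = 1) -> GCounter:
    """A replica only ever increases its own slot, so state never goes backwards."""
    updated = dict(counter)
    updated[replica] = updated.get(replica, 0) + amount
    return updated


def join(a: GCounter, b: GCounter) -> GCounter:
    """Least upper bound of two states: element-wise maximum."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def value(counter: GCounter) -> int:
    """The counter's value is the sum over all replicas."""
    return sum(counter.values())


phone = increment({}, "phone", 2)
tablet = increment({}, "tablet", 3)
merged = join(phone, tablet)
```

Because `join` is commutative, associative, and idempotent, replicas can exchange states in any order, any number of times, and still converge.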
71. Couchbase Mobile
Gateway handles sync
Data flows through channels
- partition the data set
- authorization
- limit the data
Uses revision trees
72. Riak
Distributed DB
Eventual/strong consistency
Data types
Configurable conflict resolution
- db level for built-in data types
- application level for custom data