This document discusses data intensive applications and some of the challenges, tools, and best practices related to them. The key challenges with data intensive applications include large quantities of data, complex data structures, and rapidly changing data. Common tools mentioned include NoSQL databases, message queues, caches, search indexes, and batch/stream processing frameworks. The document also discusses concepts like distributed systems architectures, outage case studies, and strategies for improving reliability, scalability, and maintainability in data systems. Engineers working in this field need an accurate understanding of various tools and how to apply the right tools for different use cases while avoiding common pitfalls.
5. Engineer's Job
• Accurate understanding of tools
• Dig deeper into the buzzwords and mine out the trade-offs
• Understand the principles and algorithms, and check
• Where each tool fits in
• How to make good use of each tool
• How to avoid pitfalls
6. Big outages
• Facebook - https://www.facebook.com/notes/facebook-engineering/more-details-on-todays-outage/431441338919/
• Amazon - http://money.cnn.com/2011/04/22/technology/amazon_ec2_cloud_outage/index.htm
• Google - https://www.cnet.com/news/google-outage-reportedly-caused-big-drop-in-global-traffic/
• Sweden dropped off the internet - http://www.networkworld.com/article/2232047/data-center/missing-dot-drops-sweden-off-the-internet.html
• EBS impact - https://aws.amazon.com/message/65648/
7. Flipkart Big Billion 2015, 2014
Crashes, no search results, "Please try after sometime"
What went wrong?
9. AWS Problems
• Whole-zone failures happen
• Virtual hardware has a shorter life than physical hardware: about 200 days on average
• Better to be in more than one zone, with redundancy across zones
• Multi-zone failures happen too, so go multi-region as well
• To maintain high uptime, EBS is not the best option
• I/O rates on EBS are poor
• EBS fails at the region level, not on a per-volume basis
• Failure of an EBS volume can lock up the entire Linux machine, leaving it inaccessible and affecting even operations that have no direct disk activity
• Other AWS services that use EBS may fail when EBS fails
• Services like ELB, RDS, and Elastic Beanstalk use EBS
• EC2 and S3 don't use EBS
Ref: http://www.talisman.org/~erlkonig/misc/aws-the-good-the-bad+the-ugly/
10. 1. Does this architecture ensure that the data remains correct and complete, even when things go wrong internally?
2. Does it provide consistently good performance even when parts of the system are degraded?
3. Does it scale to handle increases in load?
4. What does an API for this kind of service look like?
A typical system architecture
12. Reliability
• The system should work correctly in the face of adversity
• Correctly - performing the correct function at the desired level of performance
• Tolerate user mistakes, prevent unauthorised access …
• Adversity - hardware faults, software faults, and even human error
• Anticipate faults and design for them
• Even AWS has problems and needs its own way of planning
13. Software Errors
• Errors
• A runaway process that hogs a shared resource like CPU, memory, disk, or network bandwidth
• A service that has slowed down or become unresponsive
• Cascading failures of components
• Fixes
• Careful analysis of assumptions and interactions in the system
• Thorough testing
• Process isolation
• Allowing processes to crash and restart (cf. Netflix's Chaos Monkey)
• Measuring, monitoring, and analysing system behaviour in production
• Constantly checking the guarantees a system provides, and raising an alert in case of discrepancies
14. Human Errors
• Well-defined abstractions, APIs, and admin interfaces
• These make it easy to do the "right thing" and discourage the "wrong thing"
• Set up a fully featured non-production sandbox environment
• Here people can explore and experiment with real data without affecting real users
• Unit, integration, automated, and manual testing
• Automated testing is particularly good for covering corner cases
• Allow quick and easy recovery from human errors
• Make it fast to roll back config changes, roll out new code gradually, and provide tools to recompute data
• Set up metrics, monitoring, and error rates
• These give early warning signals, check whether any assumption is being violated, and help diagnose an issue in case of errors/faults/failures
15. Scalability
• As the system grows, there should be reasonable ways of dealing with the growth
• Growth - in data volume, traffic volume, or complexity
16. Describing Load
• Load parameters, e.g.
• Requests/sec to a web server
• Ratio of reads to writes to a database
• Number of simultaneously active users in a chat room
• Hit rate on a cache
• Twitter - 2 main operations
• Post tweet - 4.6K requests/sec on avg, 12K requests/sec at peak (2012)
• Home timeline - 300K requests/sec
• Hybrid approach to implementation
• Users with few followers - fan out the tweet immediately to the home timeline caches of all of the user's followers
• Celebrities (30M followers) - fetch the celebrity's tweets separately and merge them into a follower's timeline only when that follower loads their home timeline
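The hybrid fanout above can be sketched in a few lines. This is a hedged illustration, not Twitter's actual design: the threshold, data structures, and function names are all assumptions made for the example.

```python
from collections import defaultdict

FANOUT_THRESHOLD = 1000  # assumed cutoff for "celebrity"; the real value is unknown

followers = defaultdict(set)          # user -> ids of their followers
timelines = defaultdict(list)         # user -> cached home timeline (fanout on write)
celebrity_tweets = defaultdict(list)  # celebrity -> their own recent tweets

def post_tweet(user, tweet):
    if len(followers[user]) >= FANOUT_THRESHOLD:
        celebrity_tweets[user].append(tweet)   # store once, merge at read time
    else:
        for f in followers[user]:              # fan out to every follower's cache
            timelines[f].append(tweet)

def home_timeline(user, following):
    merged = list(timelines[user])             # cheap: precomputed cache
    for u in following:                        # plus celebrity tweets, fetched on read
        if len(followers[u]) >= FANOUT_THRESHOLD:
            merged.extend(celebrity_tweets[u])
    return merged
```

The trade-off: normal users pay a small cost at write time so reads stay cheap, while celebrities shift that cost to read time to avoid 30M cache writes per tweet.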
17. Describing Performance
• Performance parameters, e.g.
• Throughput - in batch processing systems like Hadoop
• Response time - in online systems
• Response time does not always stay the same, for reasons like
• Context switch to a background process
• Loss of a network packet and TCP retransmission
• Garbage collection pause
• Page fault forcing a read from disk
• Mechanical vibrations in the server rack
18. Measuring Performance
• Use the median and percentiles (p95, p99, p99.9) of performance metrics
• Plot them on a histogram
• Aggregate histograms from all servers (histograms can be added together; percentiles cannot)
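A minimal sketch of computing these metrics from raw response-time samples, using the nearest-rank percentile definition (the function name and sample data are illustrative):

```python
def percentile(samples, p):
    """Nearest-rank percentile: the ceil(p/100 * n)-th smallest sample."""
    s = sorted(samples)
    k = max(1, -(-len(s) * p // 100))  # integer ceiling, at least 1
    return s[k - 1]

# Invented response times in ms; note the single outlier dominating the tail
times_ms = [12, 15, 14, 300, 13, 16, 18, 15, 14, 17]
median = percentile(times_ms, 50)  # typical case
p99 = percentile(times_ms, 99)     # tail latency
```

The median here is 15 ms while p99 is 300 ms, which is exactly why a single "average latency" number hides what slow requests experience.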
19. Maintainability
• Over time many people will work on the system, and they should be able to work productively
• Fix bugs, investigate failures
• Keep the system operational
• Implement new use cases
• Repay technical debt
• 3 design principles for a maintainable system
• Operability
• Simplicity
• Evolvability
20. Operability
• Operational tasks
• Health monitoring and restoring a service from a bad state
• Tracking down the cause of failures or degraded performance
• Updates, security patches
• Capacity planning
• Setting up tools for deployment and configuration management
• Moving applications from one platform to another
• Preserving knowledge as people come and go
• How data systems can support effective operations
• Good monitoring - visibility into runtime behaviour and system internals
• Support for automation
• Avoiding dependency on individual machines
• Good documentation
• Good default behaviour, with the option to override defaults
• Self-healing where appropriate, with the option to manually control system state
21. Operations-friendly services: best practices
• Expect failures; handle all failures gracefully
• A component may crash/stop
• A dependent component may crash/stop
• The network can fail
• Disks can run out of space
• Keep things simple
• Avoid unnecessary dependencies
• Installation should be simple
• Failures on one server should have no impact on the rest of the data centre
• Automate everything
• People make mistakes, they need sleep, they forget things
• Automated processes are testable and fixable, and therefore more reliable
Ref: https://www.usenix.org/legacy/events/lisa07/tech/full_papers/hamilton/hamilton.pdf
22. Latency
• Understand latency from the entire latency distribution curve
• Simply looking at the 95th or 99th percentile is not sufficient
• Tail latency matters
• The median is not representative of the common case; the average is even worse
• No single metric can describe the behaviour of latency
• Be careful with monitoring tools and the data they report
• Percentiles can't be averaged
• Latency is not service time
• Plot your data: with coordinated omission there is often a sudden steep rise in the curve, while a non-omitted test usually shows a smoother curve
• Very few tools actually correct for coordinated omission
• HdrHistogram
• Is additive, uses log buckets, and is helpful for capturing high-volume data in production
Ref: http://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
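The "percentiles can't be averaged" point is easy to demonstrate: averaging two servers' p99 values gives a number unrelated to the p99 of the combined traffic. A toy illustration, with data invented for the example:

```python
def p99(samples):
    # 99th percentile by rank in the sorted sample
    s = sorted(samples)
    return s[int(len(s) * 0.99) - 1]

server_a = [10] * 97 + [1000] * 3   # mostly fast, a few slow outliers
server_b = [10] * 100               # uniformly fast

avg_of_p99 = (p99(server_a) + p99(server_b)) / 2   # what a naive dashboard shows
true_p99 = p99(server_a + server_b)                # percentile of the combined traffic
```

Here the averaged value is 505 ms while the real p99 of the merged distribution is 1000 ms. This is why HdrHistogram aggregates histograms (which are additive) rather than precomputed percentiles.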
24. Document vs Relational
• Document databases store one-to-many relationships (nested records) within the parent record, not in a separate table
• One-to-many - one person can have many contact details
• Both document and relational databases store many-to-one and many-to-many relationships using a unique identifier, called a foreign key in the relational model and a document reference in the document model
• Many-to-one - many persons can live at one address
• Many-to-many - many persons can have many skills
25. Document vs Relational cont.
• Data model: Document - closer to the data structures used by the application, schema flexibility, better performance due to locality; Relational - better support for joins, many-to-one relationships, and many-to-many relationships
• Fault tolerance
• Concurrency
• Document model is good for: analytics apps where many-to-many relationships are not needed
• Document model is bad for:
• Reading a small portion of a large document
• Writes that increase the size of a large document
• Recommended use:
• Keep documents fairly small
• Avoid writes that increase document size
27. Facebook Thundering Herd Problem
• Problem:
• Millions of people tune in to a celebrity's live broadcast simultaneously; potentially hundreds of thousands of video requests see a cache miss at the edge cache servers
• This results in excessive queries to the origin cache and live processing servers, which are not designed to handle highly concurrent loads
• Solution:
• Create request queues at the edge cache servers,
• allowing one request to go through to the livestream server and return the content to the edge cache, where it is distributed to the rest of the queue all at once
Ref: https://code.facebook.com/posts/1653074404941839/under-the-hood-broadcasting-live-video-to-millions/
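The queueing solution is often called request coalescing. Below is a single-threaded toy model of the idea, not Facebook's implementation: class and method names are invented, and the split between `request` and `origin_response` stands in for an async fetch.

```python
class EdgeCache:
    def __init__(self):
        self.cache = {}
        self.pending = {}       # key -> waiters queued behind one in-flight fetch
        self.origin_fetches = 0

    def request(self, key, respond):
        if key in self.cache:
            respond(self.cache[key])            # cache hit
        elif key in self.pending:
            self.pending[key].append(respond)   # coalesce: no extra origin call
        else:
            self.pending[key] = [respond]
            self.origin_fetches += 1            # only the first miss hits origin

    def origin_response(self, key, value):
        self.cache[key] = value
        for respond in self.pending.pop(key, []):
            respond(value)                      # drain the whole queue at once
```

A production version would guard `pending` with a lock (or use async futures) and bound the queue length, but the shape is the same: N concurrent misses become 1 origin request.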
28. PostgreSQL vs MongoDB
• Flexibility: PostgreSQL - have to match the schema; MongoDB - put anything in any document
• Integrity: PostgreSQL - read valid data only; MongoDB - read anything out
• Consistency: PostgreSQL - written means written, no exceptions (except disk failure; use RAID); MongoDB - written means written, unless something goes wrong (e.g. server crash, network partition, disk failure)
• Availability: PostgreSQL - if the master dies, stop to avoid corruption; MongoDB - if the master dies, rebalance to avoid downtime
• Bigger servers (expensive, can't use cloud): PostgreSQL - good, up to 64 cores and 1TB RAM; MongoDB - bad, per-database write lock
• Sharding (cheaper, works in cloud): PostgreSQL - bad, hard to choose shards that maintain integrity; MongoDB - good, built-in support with mongos
• Replication: PostgreSQL - doesn't help write throughput, always hits the master; MongoDB - faster failover
• Ideal use case: MongoDB is good for storing arbitrary pieces of JSON, when you don't care at all what is inside that JSON. If your code expects something to be present in the JSON, then MongoDB is the wrong choice. Never use MongoDB if one document has conceptual links to another document(s).
Ref: https://speakerdeck.com/conradirwin/mongodb-confessions-of-a-postgresql-lover
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
https://www.infoq.com/presentations/data-types-issues
29. Storage Engines
• Optimised for one of
• Transaction processing
• Analytics, such as column-oriented engines
• Belong to one of two families
• Log-structured storage engines
• Page-oriented storage engines such as B-trees
30. Data structure behind databases
#!/bin/bash
db_set () {
echo "$1,$2" >> database
}
db_get () {
grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
$ db_set 81 '{"x":"11","places":["London Eye"]}'
$ db_set 42 '{"x":"23","places":["Exploratorium"]}'
$ db_set 42 '{"x":"35","places":["Golden Gate"]}'
$ db_get 42
{"x":"35","places":["Golden Gate"]}
$ cat database
81,{"x":"11","places":["London Eye"]}
42,{"x":"23","places":["Exploratorium"]}
42,{"x":"35","places":["Golden Gate"]}
Many dbs use a log, an append-only data file, similar to what db_set does
But a real database has to deal with more issues
• Concurrency control
• Reclaiming disk space
• Log size control
• Handling errors, crash recovery
• Partially written records
• File format
• Deleting records
An append-only log is efficient
• Appending and segment merging are sequential and fast
• Concurrency and crash recovery are much simpler if segment files are append-only or immutable
• Merging old segments avoids fragmentation problems
32. Indexes
• Hash Index
• Must fit in memory; for a very large number of keys, a hash index won't work
• Range queries won't work efficiently
• SSTables and LSM-Trees
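A hash index is exactly what turns the `db_set`/`db_get` log above into something readable in one seek. The sketch below layers an in-memory dict of byte offsets over the same comma-separated log; class and method names are invented, and real engines in this family (e.g. Bitcask) add checksums, segment files, and compaction.

```python
import os
import tempfile

class LogDB:
    def __init__(self, path):
        self.path = path
        self.index = {}              # key -> byte offset of its latest record
        open(path, "a").close()      # create the log file if missing

    def set(self, key, value):
        with open(self.path, "a") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()        # where this record will start
            f.write(f"{key},{value}\n")   # append-only, like db_set
        self.index[key] = offset     # hash index points at the newest record

    def get(self, key):
        if key not in self.index:
            return None
        with open(self.path) as f:
            f.seek(self.index[key])  # one seek instead of a full scan
            line = f.readline().rstrip("\n")
        return line.split(",", 1)[1]

# usage (temp file so the sketch is self-contained; no newline escaping handled)
db = LogDB(os.path.join(tempfile.mkdtemp(), "database"))
db.set("42", '{"x":"23"}')
db.set("42", '{"x":"35"}')
```

The whole index must fit in RAM, and `get` for a range of keys still needs a scan, which is the limitation the slide notes and the motivation for SSTables/LSM-trees.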
33. Traditional RDBMS wisdom
• Row store
• Data is in disk-block format (heavily encoded)
• With a main-memory buffer pool of blocks
• Query plans
• Optimise CPU, I/O
• The fundamental operation is reading a row
• Indexing via B-trees
• Clustered or unclustered
• Dynamic row-level locking
• ARIES-style write-ahead log
• Replication (sync or async)
• Update the primary first
• Then move the log to the other sites
• And roll forward at the secondary(s)
• MySQL, Oracle, Postgres, SQL Server, DB2
• Traditional wisdom is now obsolete
34. DBMS marketplace
• Data warehouses (1/3 of market)
• Lots of big reads; bulk-loaded from OLTP systems
• Market already moving towards column stores, which are not based on traditional wisdom (e.g. HP Vertica, Amazon ParAccel)
• Column stores are 50-100 times faster than row stores
• OLTP (1/3 of market)
• Lots of small updates, and a few reads
• Not clear who will win, but NewSQL dbs are wildly faster (e.g. VoltDB, Google Spanner)
• Everything else (1/3 of market)
• Hadoop, NoSQL, graph dbs, array dbs …
35. Why column stores are faster
• A typical warehouse query reads 4-5 attributes from a 100-column fact table
• Row store - reads all 100 attributes
• Column store - reads just the ones you need
• Compression is much easier and more effective in a column store
• Each column holds data of the same type, so each block contains one kind of attribute; bitmaps can be used
• No big record headers in a column store
• Record headers don't compress well
• A column executor is wildly faster than a row executor
• Because of vector processing
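A toy illustration of the read-only-what-you-need point, using plain Python lists for the two layouts (the table shape and column names are invented; real stores work on compressed disk blocks, not lists):

```python
# Row store: each record is stored together, so reading one attribute
# still means visiting every full record.
rows = [{"id": i, "price": i % 7, "qty": 1, "region": "EU"} for i in range(1000)]

# Column store: one array per attribute; each array is contiguous,
# same-typed, and therefore easy to compress and scan.
columns = {
    "id": [r["id"] for r in rows],
    "price": [r["price"] for r in rows],
    "qty": [r["qty"] for r in rows],
    "region": [r["region"] for r in rows],
}

row_total = sum(r["price"] for r in rows)   # touches all 4 attributes per record
col_total = sum(columns["price"])           # touches only the price column
```

With a 100-column fact table the row-store scan drags roughly 20-25x more data through the I/O path for the same answer, which is where the 50-100x figures come from once compression and vectorised execution are added.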
36. OLTP and NewSQL
What the future holds for OLTP
• Main-memory DBMS
• With anti-caching
• Deterministic concurrency control
• HA via active-active
OLTP databases - 3 big decisions
• Main memory vs disk orientation
• Concurrency control strategy
• Replication strategy
Ref: http://slideshot.epfl.ch/play/suri_stonebraker
37. Data format or schema changes
• A data format/schema change often needs a change in application code
• Code changes often cannot happen instantaneously
• Server-side apps - staged rollout (installing new code on some nodes and gradually rolling it out to the rest as the new code is found to work fine)
• Client-side apps - some users may not install an upgrade for some time
• Hence old & new versions of code, and old & new data formats, may coexist in the system at the same time
• Backward compatibility - newer code can read data that was written by older code
• Forward compatibility (trickier) - older code can read data that was written by newer code
• Data encoding formats support achieving the above requirements
• JSON, XML, Protocol Buffers, Thrift, Avro
38. Encoding formats
• Programs generally work with data in 2 representations
• In-memory representation - objects, structs, lists, arrays, hash tables, trees, and so on
• These data structures are optimised for efficient access by the CPU, typically using pointers
• Disk file and/or over-the-network representation - a self-contained sequence of bytes to be stored in a file or transferred over the network
• Since a pointer wouldn't make sense to any other process, this representation is quite different from the in-memory one
• Encoding
• Translation from the in-memory representation to a byte sequence
• Also called marshalling or serialisation
• Decoding
• Translation from a byte sequence to the in-memory representation
• Also called unmarshalling, deserialisation, or parsing
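A minimal encode/decode round trip using JSON as the byte-sequence format (the record is invented for the example):

```python
import json

# In-memory representation: a dict holding a list; full of pointers internally
record = {"user": "alice", "skills": ["go", "sql"], "logins": 42}

# Encoding (marshalling): in-memory structure -> self-contained byte sequence
encoded = json.dumps(record).encode("utf-8")

# Decoding (unmarshalling): byte sequence -> a fresh in-memory structure
decoded = json.loads(encoded.decode("utf-8"))
```

`encoded` can be written to a file or sent over a socket and parsed by any process in any language, which is exactly what the pointer-laden in-memory form cannot do.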
39. Language-specific vs Standard formats
Language-specific (java.io.Serializable, Ruby Marshal, Python pickle, PHP serialize/unserialize):
• Encoding is tied to the programming language
• To restore data in the same object types, the decoding process needs to instantiate arbitrary classes, which has security issues
• Data versioning is not taken care of; backward and forward compatibility is always an issue
• Efficiency (CPU time and size of the encoded data) is always an afterthought
Standard (JSON, XML, CSV):
• Lots of ambiguity in number encoding: XML and CSV can't distinguish a number from a string; JSON distinguishes numbers from strings, but doesn't distinguish integers from floating point
• JSON and XML support Unicode character strings (human-readable text) but not binary strings (sequences of bytes without a character encoding); Base64 is generally used as a workaround
• There is optional schema support for XML and JSON; the schemas are powerful but quite complicated; CSV has no schema
• CSV is a vague format: confusion arises if a value contains a comma or newline character, and its escaping rules are not implemented correctly by all parsers
40. Security issue with arbitrary class instantiation
• A vulnerability in Java environments
• Any application that accepts serialized Java objects is likely vulnerable, even if a framework or library is responsible and not your custom code
• There's no easy way to protect applications en masse; it will take organizations a long time to find and fix all the different variants of this vulnerability
• There's no way to know what you're deserializing before you've decoded it
• An attacker can serialize a bunch of malicious objects and send them to your application
• ObjectInputStream in = new ObjectInputStream( inputStream );
• return (Data)in.readObject();
• Once you call readObject(), it's too late; the attacker's malicious objects have already been instantiated and have taken over your entire server
• Solution: allow deserialization, but make it impossible for attackers to create instances of arbitrary classes
• List<Class<?>> safeClasses = Arrays.asList( BitSet.class, ArrayList.class );
• Data data = safeReadObject( Data.class, safeClasses, 10, 50, inputStream );
• Limit the input to a maximum of 10 embedded objects and 50 bytes of input
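Python's pickle has the same class of vulnerability, and its documentation recommends the same allow-list defence: vet every class name while the stream is being decoded, before anything is instantiated. A sketch (the SAFE set and helper names are chosen for the example):

```python
import io
import pickle

SAFE = {("builtins", "list"), ("builtins", "dict"), ("builtins", "set")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every class reference in the stream, BEFORE instantiation,
        # unlike Java's readObject() where rejection comes too late.
        if (module, name) in SAFE:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"{module}.{name} is forbidden")

def safe_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

class Evil:
    # Stand-in for a malicious payload: loading it would call
    # builtins.print("pwned"); a real exploit would use os.system or similar.
    def __reduce__(self):
        return (print, ("pwned",))
```

`safe_loads` still reads plain containers, but any pickle that references a class outside the allow-list is rejected with an exception instead of being executed.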
41. JavaScript - working with large numbers
• JS supports only 53-bit integers
• All numbers in JS are floating-point numbers
• Numbers, including integers, are represented as sign x mantissa x 2^exponent
• The mantissa has 53 bits
• The exponent can be used to reach higher numbers, but they won't be contiguous
• Twitter keeps 64-bit integer ids for status, user, direct message, and search ids
• Due to the JS integer limitation, JSON returned by the Twitter API includes ids twice, once as a JSON number and once as a decimal string
• {"id": 10765432100123456789, "id_str": "10765432100123456789", ...}
• Languages that use 64-bit unsigned integers can use the id property and don't need id_str
• JavaScript can use id_str along with a library like strint to do all kinds of math operations on id_str
Ref: http://2ality.com/2012/07/large-integers.html
https://groups.google.com/forum/#!topic/twitter-development-talk/ahbvo3VTIYI
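Python floats are the same IEEE 754 doubles as JavaScript numbers, so the 53-bit limit can be demonstrated directly (the id value is the one from the slide):

```python
LIMIT = 2 ** 53  # 9007199254740992, the largest exactly-representable range

# Below the limit, adjacent integers map to distinct floats
still_exact = float(LIMIT - 1) != float(LIMIT)

# At the limit, 2**53 + 1 has no double representation and collapses
# onto 2**53, so two different ids would compare equal
precision_lost = float(LIMIT) == float(LIMIT + 1)

# Shipping the id as a decimal string sidesteps the problem entirely
big_id_str = "10765432100123456789"
recovered = int(big_id_str)
```

This collapse is precisely why parsing Twitter's numeric `id` in JavaScript silently corrupts it, while `id_str` round-trips losslessly.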
42. Scaling to Higher Load
• Shared memory
• Many CPUs, many RAM chips, and many disks joined together under one OS, with a fast interconnect allowing any CPU to access any part of the memory or disk
• Cost is super-linear: a machine twice the size may not necessarily handle twice the load
• Also called vertical scaling or scaling up; the simplest approach, i.e. buy a powerful machine
• Crash recovery is easiest, but concurrency control is a little difficult because the lock table becomes a hot spot
• Shared disk
• Uses several machines with independent CPUs and RAM, but stores data on an array of disks shared between the machines, connected via a fast network
• Concurrency control is most difficult because of coordinating multiple copies of the same lock table, and syncing writes to a common log or logs
• Shared nothing
• Each machine running the database software is called a node; each node has its own CPU, RAM, and disks, and any coordination between nodes is done at the software level, using a conventional network
• The application developer needs to be super cautious: since the data is distributed over multiple nodes, constraints and trade-offs need to be handled at the software level
• Also called horizontal scaling or scaling out
• Concurrency control is more difficult because it requires a distributed deadlock detector and a multi-phase commit protocol
Ref: http://www.benstopford.com/2009/11/24/understanding-the-shared-nothing-architecture/
43. Scaling to Higher Load cont.
                                                      Shared memory  Shared disk  Shared nothing
Difficulty of concurrency control                           2             3             2
Difficulty of crash recovery                                1             3             2
Difficulty of database design                               2             2             3
Difficulty of load balancing                                1             2             3
Difficulty of high availability                             3             2             1
Number of messages                                          1             2             3
Bandwidth required                                          3             2             1
Ability to scale to a large number of machines              3             2             1
Ability to scale to large distances between machines        3             2             1
Susceptibility to critical sections                         3             2             1
Number of system images                                     1             3             3
Susceptibility to hot spots                                 3             3             3
Ranking: 1 - best, 2 - 2nd best, 3 - 3rd best
Ref: http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf
44. Replication
Use cases
• Reduce latency
• Increase availability
• Increase read throughput
Scenarios
• Small dataset, stored on a single machine
• Partitioning or sharding, stored on multiple machines
• Faults
• Synchronous vs asynchronous replication
• Handling failed replicas
• Eventual consistency
• Setting up new followers
Replicating changes
• Single leader
• Multi leader
• Leaderless
45. Leader-based replication
• Also known as active/passive or master/slave replication
• Built-in feature of
• Postgres, MySQL, Oracle Data Guard, SQL Server Availability Groups
• MongoDB, RethinkDB, Espresso
• Kafka, RabbitMQ
• Network file systems, replicated block devices like DRBD
• Synchronous replication
• The leader waits for confirmation from the follower before reporting success to its client
• Guarantees an up-to-date copy on the follower
• All followers can never be synchronous: any one node outage would cause the whole system to grind to a halt
• Asynchronous replication
• The leader sends the message to the follower and reports success to its client (does not wait for confirmation from the follower)
• Often, leader-based replication is asynchronous
• Non-durable: if the leader fails and is not recoverable, all un-replicated writes are lost
• It is inevitable with many followers or geographically distributed followers
• Semi-synchronous replication
• If the sync follower becomes unavailable or slow, an async follower is made synchronous
46. Setting up new followers
• Take a consistent snapshot of the leader's db (without taking a lock on the entire db)
• Most dbs have this built in; 3rd-party tools like innobackupex for MySQL can also be used
• The snapshot should record the exact position in the leader's replication log. This position is called the log sequence number (Postgres) or binlog coordinates (MySQL)
• Copy the snapshot to the new follower node
• The follower connects to the leader and requests all data changes that happened after the log sequence number
• After the follower has processed the backlog of data changes, it is said to have caught up. The follower can now continue processing data changes from the leader as they happen
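The bootstrap steps above can be sketched as follows. The class names are invented, and using the in-memory log length as the log sequence number is a simplifying assumption; a real system snapshots to disk and streams changes over the network.

```python
class Leader:
    def __init__(self):
        self.log = []     # append-only replication log
        self.data = {}

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value

    def snapshot(self):
        # Consistent snapshot plus its exact log position
        # (the LSN / binlog coordinates of the slide)
        return dict(self.data), len(self.log)

    def changes_since(self, position):
        return self.log[position:]

class Follower:
    def __init__(self, leader):
        self.data, self.position = leader.snapshot()  # step 1-2: copy snapshot
        self.leader = leader

    def catch_up(self):
        # Step 3-4: replay every change made after the snapshot position
        for key, value in self.leader.changes_since(self.position):
            self.data[key] = value
        self.position = len(self.leader.log)
```

Because the snapshot carries its log position, writes that land on the leader between the copy and the catch-up are neither lost nor applied twice.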
47. Handling node outages
• Leader failure handling is trickier than follower failure handling:
• Determining that the leader has failed
• Timeout is the most popular strategy for detecting a leader's failure (nodes bounce messages back and forth between each other, and when a node doesn't respond for, say, 30 secs it is assumed to be dead)
• Choosing a new leader
• Either an election process or a previously chosen controller node. The best candidate is usually the replica with the most up-to-date data changes from the old leader
• Reconfiguring the system to use the new leader
• Using request routing, clients now send writes to the new leader. When the old leader comes back, the system has to ensure that it becomes a follower and recognises the new leader
• Failover is subject to things that may go wrong
• Async replication: the new leader may not have all the writes from the old leader
• If the old leader rejoins the cluster, what should happen to those writes?
• The new leader may have received conflicting writes in the meantime!
• Commonly, these writes are discarded, which has its own problems
• Violation of the client's durability expectations
• Dangerous situations may arise if other storage systems outside the database need to be coordinated with the database contents
• E.g. the GitHub incident where an out-of-date MySQL follower was promoted to leader. Some auto-increment primary keys were reused by the old and new leaders. The same keys were used in a Redis store, resulting in private data of some users being disclosed to other users
• Split brain: 2 nodes both believing they are the leader
• Sometimes this leads to a shutdown of both nodes
• Timeout: choosing the right timeout for the leader to be declared dead
• A short timeout can lead to unnecessary failovers
• A long response time can be due to network load or a traffic spike; a failover during such a situation can make things worse
Due to the lack of easy solutions to these problems, many ops teams prefer manual failover even when the software supports automatic failover
48. Implementations of replication logs
• Statement-based
• Every write statement is logged and sent to followers, i.e. every insert, update, and delete statement is forwarded. The follower parses and executes the statement as if it had been received from a client
• Cons: a statement that calls a non-deterministic function like NOW() or RAND() is likely to generate a different value on each replica; auto-increment may have a different effect if statements are executed in a different order
• Used in MySQL before version 5.1. MySQL now switches to row-based replication whenever any non-determinism is present in a statement
• Write-ahead log shipping
• The database's own log is used to build a replica on another node; both log-structured storage engines and B-trees use a log in some way to store the data
• Cons: the log describes the data at a very low level, including details like which bytes were changed in which disk block. This tightly couples replication to the storage engine: a zero-downtime upgrade of the database software, by first upgrading the followers and then making one of the nodes the leader, is not possible
• Used in PostgreSQL and Oracle
• Logical (row-based) log
• Uses different log formats for replication and for the storage engine. A transaction that modifies several rows generates several such log records
• Pros: decouples the replication log from the storage engine, so a zero-downtime upgrade is possible. The logical log can also be sent to external systems such as a data warehouse, custom indexes, or caches
• MySQL's binlog, when configured to use row-based replication, uses this approach
• Trigger-based
• Involves application code; replication is moved up to the application layer, e.g. when only a subset of the data is to be replicated, or when replicating from one kind of database to another. Triggers and stored procedures are used to achieve this
• Cons: greater overhead than other replication methods; more prone to bugs and limitations
• Databus for Oracle and Bucardo for Postgres
Ref: http://www.benstopford.com/2009/11/24/understanding-the-shared-nothing-architecture/
49. Multi-leader replication
• A somewhat retrofitted feature in many databases
• Often it causes pitfalls and problems with other database features
• Auto-incrementing keys
• Triggers
• Integrity constraints
• Multi-leader replication is often considered dangerous territory that should be avoided if possible
50. Use cases for multi-leader replication
• Multi-datacenter operation
• Performance is better, as every write can be processed in the local/nearest datacenter
• Datacenter outages can be better tolerated
• Network problems can be better tolerated
• Clients with offline operation
• Calendar apps on mobile phones, laptops, and other devices need to allow creating/editing/viewing calendar events even while not connected to the internet
• All offline changes need to be synced with the server and other devices when the device is next online
• Each device's local database acts as a leader, and there is an async multi-leader replication process between the replicas of the calendar on all devices
• There is a rich history of broken calendar sync implementations; multi-leader replication is a tricky thing to get right
• CouchDB is designed to make this use case easier
• Collaborative editing
• Google Docs, Etherpad
• Changes are instantly applied to the local replica and asynchronously replicated to the server and to other users editing the same document
• To avoid conflicts, each user can obtain a lock before editing the document
• For faster collaboration, the unit of change is made very small, e.g. a single keystroke
51. Conflict resolution in multi-leader replication
• Custom conflict resolution
• On write
• Bucardo works this way
• When a conflict is detected, the database calls a conflict handler; in Bucardo it can be a Perl script
• The handler runs in the background and cannot prompt the user
• On read
• CouchDB works this way
• When a conflict is detected, all conflicting writes are stored
• The next time the data is read, all the versions of the data are returned to the application code
• The application may prompt the user, or resolve the conflict itself and write the result back to the database
• Automatic conflict resolution
• Used by Amazon
• Products removed from the cart frequently still appear in the cart, due to errors in the conflict resolution logic
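The on-read flow can be sketched as follows (class and method names are invented; this is the shape of the CouchDB approach, not its implementation). The union-style merge at the end also hints at how the Amazon cart bug happens: a merge that unions siblings can resurrect deleted items.

```python
class ConflictStore:
    def __init__(self):
        self.versions = {}  # key -> list of conflicting sibling values

    def write_conflicting(self, key, value):
        # A conflict was detected: keep the write as a sibling
        # instead of silently overwriting.
        self.versions.setdefault(key, []).append(value)

    def read(self, key):
        # Return ALL versions; the application decides what to do.
        return list(self.versions.get(key, []))

    def resolve(self, key, merge):
        # Application-supplied merge function; result is written back.
        siblings = self.versions.get(key, [])
        if not siblings:
            return None
        if len(siblings) > 1:
            self.versions[key] = [merge(siblings)]
        return self.versions[key][0]

store = ConflictStore()
store.write_conflicting("cart", {"milk"})
store.write_conflicting("cart", {"eggs"})  # concurrent write via another leader
merged = store.resolve("cart", lambda vs: set().union(*vs))
```

Set-union is a reasonable merge for additions, but it cannot distinguish "never added" from "added then removed", so removals need explicit tombstones.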
52. Ensuring Consistency in Multi-leader Replication
• Pessimistic locking
• Wait for your turn
• Optimistic locking
• The early bird gets the worm
• Conflict resolution
• Your mother cleans up later
• Conflict avoidance
• Solve the problem by not having it
Ref: https://www.percona.com/live/mysql-conference-2013/sessions/state-art-mysql-multi-master-replication slide 7
53. Microservices at Uber
• Microservices bring benefits like
• Each team owning its own release cycle
• Each team responsible for its own uptime
• Microservices have challenges like
• The aggregate velocity can be much slower; e.g. the Java team has to figure out how to talk to the metrics system, and so do the Node people and the Go people
• A hard-fought bug on one platform has to be fought again on the other platforms
• "I hadn't expected the cost of multiple languages to be as high as it was" - Matt Ranney (Uber's Chief System Architect)
• Present in lots of data centres around the world
• TLS termination at the front end
• Riak clusters manage the state of all in-progress jobs
• Completed jobs travel from Marketplace to other logic systems through Kafka
• Marketplace - the dispatch system which supports all sorts of logistics including rides, UberEATS etc., written in Node.js, Java, Go
• Other queues execute other workflows, e.g. prompting the user to get the receipt and rate the trip
• Map services compute the ETAs and routes for the trip; some high-throughput systems are written in Java
• All Kafka streams go to Hadoop for analytical processing
• Moving towards type-safe and verifiable interfaces between services, as the cost of type-unsafe JSON is too high
• A lot of early code used JSON over HTTP, which makes it hard to validate interfaces
• An army of mobile phones around the world does black-box testing