Data Intensive
Applications
• Challenges
• Quantity of data
• Complexity of data
• Speed of change in data
When data is the primary challenge
Tools
• NoSQL
• Message Queues
• Caches
• Search Indexes
• Batch processing frameworks
• Stream processing frameworks
Buzzwords
Engineer's Job
• Accurate understanding of tools
• Dig deeper into the buzzwords and mine out the
trade-offs
• Understand the principles and algorithms, and check
• Where each tool fits in
• How to make good use of each tool
• How to avoid pitfalls
Big outages
• Facebook - https://www.facebook.com/notes/facebook-engineering/
more-details-on-todays-outage/431441338919/
• Amazon - http://money.cnn.com/2011/04/22/technology/
amazon_ec2_cloud_outage/index.htm
• Google - https://www.cnet.com/news/google-outage-reportedly-
caused-big-drop-in-global-traffic/
• Sweden dropped off the internet - http://www.networkworld.com/article/
2232047/data-center/missing-dot-drops-sweden-off-the-internet.html
• EBS impact - https://aws.amazon.com/message/65648/
Flipkart Big Billion 2015, 2014
Crashes,
No search results,
“Please try after sometime”
What went wrong?
What is the technology
implementation behind
“Sale Prices Reveal at 8PM”?
Myntra 2017 sales
AWS Problems
• Whole zone failure problem
• Virtual h/w lifetime is shorter than real h/w, ~200 days on avg
• Better to be in more than one zone, and redundant across zones
• Multi zone failures too happen, so go for multi-region also
• To maintain high uptime, EBS is not the best option
• I/O rates on EBS are poor
• EBS fails at the region level, not on a per-volume basis
• Failure of an EBS volume can lock up the entire Linux machine, leaving it inaccessible and affecting even
operations that don't have direct disk activity
• Other AWS services that use EBS may fail when EBS fails
• Services like ELB, RDS, Elastic Beanstalk use EBS
• EC2 and S3 don't use EBS
Ref: http://www.talisman.org/~erlkonig/misc/aws-the-good-the-bad+the-ugly/
1. Does this arch ensure that the data remains correct and complete, even when things go wrong internally?
2. Does it provide consistently good performance, even when parts of the system are degraded?
3. Does it scale to handle an increase in load?
4. What does an API for this kind of service look like?
A typical system architecture
Basic requirements from DIA
• Reliability
• Scalability
• Maintainability
Reliability
• The system should work correctly in the face of adversity
• Correctly - Performing the correct function at the desired
level of performance
• Tolerate user mistakes, prevent unauthorised access …
• Adversity - Hardware faults, software faults, and even human
error
• Anticipate faults and design for them
• Even AWS has problems and needs its own way of
planning
Software Errors
• Errors
• A runaway process that uses a shared resource like cpu, memory, disk, or network bandwidth
• A service which has slowed down, become unresponsive
• Cascading failures of components
• Fixes
• Careful analysis of assumptions and interactions in the system
• Thorough testing
• Process isolation
• Allowing processes to crash and restart. Chaos Monkey by Netflix.
• Measuring, monitoring and analysing system behaviour in production
• Constantly checking the guarantees a system provides, and raising an alert in case of discrepancies.
Human Errors
• Well defined abstractions, APIs, and admin interfaces
• These make it easy to do the “right thing” and discourage the “wrong thing”
• Setup fully featured non-production sandbox environment
• Here people can explore and experiment using real data w/o affecting real users
• Unit, integration, automated and manual testing.
• Automated is particularly good for covering corner cases
• Allow quick and easy recovery from human errors
• Make it fast to roll back config changes, gradually roll out new code, and provide tools to recompute data
• Set up metrics, monitoring and error rates
• These give us early warning signals, check if any assumption is being violated, and help diagnose an issue in case of errors/faults/failures.
Scalability
• As the system grows, there should be reasonable ways of dealing with the growth
• Grows - growth in data volume, traffic volume or
complexity
Describing Load
• Load parameters like
• Request/sec to a web server
• Ratio of reads to writes to a database
• No of simultaneously active users in a chat room
• Hit rate on a cache.
• Twitter - 2 main operations
• Post tweet - 4.6K requests/sec on avg, 12k requests/sec at peak (2012)
• Home timeline - 300K requests/sec
• Hybrid implementation approach
• Users with a small following - Fan out the tweet immediately to the home timeline caches of all the user's followers
• Celebrities (30M followers) - Fetch the celebrity's tweets separately and merge them into a follower's timeline only when the follower loads their home timeline
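A toy sketch of that hybrid fanout (the threshold and data structures are illustrative, not Twitter's actual implementation):

```python
# Hybrid fanout sketch: normal users fan out on write, celebrities merge on read.
FANOUT_LIMIT = 1000     # assumed cutoff separating "celebrities" from others

followers = {}          # user -> set of follower ids
home_cache = {}         # follower -> home timeline cache (filled on write)
celebrity_tweets = {}   # celebrity -> own tweets (merged at read time)

def post_tweet(user, tweet):
    if len(followers.get(user, ())) > FANOUT_LIMIT:
        celebrity_tweets.setdefault(user, []).append(tweet)   # store once
    else:
        for f in followers.get(user, ()):                     # fan out on write
            home_cache.setdefault(f, []).append(tweet)

def home_timeline(user, followed_celebrities=()):
    timeline = list(home_cache.get(user, []))
    for c in followed_celebrities:                            # merge on read
        timeline.extend(celebrity_tweets.get(c, []))
    return timeline
```

The trade-off: fanout-on-write keeps home timeline reads cheap (the 300K reads/sec path), while the read-time merge caps the write amplification a celebrity tweet would otherwise cause.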
Describing Performance
• Performance parameters like
• Throughput - In batch processing systems like Hadoop
• Response time - In online systems
• Response time does not always remain the same, for reasons like
• Context switch to a background process
• Loss of a network packet and TCP retransmission
• Garbage collection pause
• Page fault forcing a read from disk
• Mechanical vibrations in the server rack
Measuring Performance
• Median and percentiles (95p, 99p, 99.9p) of
performance metrics
• Plotting them on a histogram
• Averaging out histograms for all servers
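A minimal nearest-rank percentile over raw samples (real systems usually use histograms such as HdrHistogram rather than sorting every sample):

```python
# Nearest-rank percentile: the value below which roughly p% of samples fall.
def percentile(samples, p):
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))   # 1-based nearest rank
    return ordered[rank - 1]

response_times_ms = [12, 13, 13, 14, 14, 15, 15, 16, 500, 900]
median = percentile(response_times_ms, 50)   # the typical request
p95 = percentile(response_times_ms, 95)      # 1 in 20 requests is slower
```

Note how the median (14 ms) says nothing about the outliers; only the high percentiles expose the 500-900 ms tail.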
Maintainability
• Over time, many people will work on the system, and they should be able to work productively
• Fix bugs, investigate failures
• Keep system operational
• Implement new use cases
• Repay technical debt
• 3 design principles for a maintainable system
• Operability
• Simplicity
• Evolvability
Operability
• Operational tasks
• Health monitoring and restoring a
service from bad state
• Tracking down cause of failures or
degraded performances
• Updates, security patches
• Capacity planning
• Setting up tools for Deployment and
configuration management
• Moving applications from one
platform to another
• Preserving knowledge as people
come and go
• How data systems can support
effectiveness of operational tasks
• Good monitoring - visibility into
runtime behaviour and system
internals
• Support automation
• Avoiding dependency on individual
machines
• Provide good documentation
• Provide good default behaviour, and
option to override defaults
• Self healing where appropriate with
option to manually control system
state
Operations friendly services
best practices
• Expect failures, handle all failures gracefully
• Component may crash/stop
• Dependent component may crash/stop
• Network failure
• Disk can go out of space
• Keep things simple
• Avoid unnecessary dependencies
• Installation should be simple
• Failures on one server should have no impact on rest of the data centre.
• Automate everything
• People make mistakes, they need sleep, they forget things
• Automated processes are testable, fixable and therefore more reliable
Ref: https://www.usenix.org/legacy/events/lisa07/tech/full_papers/hamilton/hamilton.pdf
Latency
• Understand latency from the entire latency distribution curve
• Simply looking at 95th or 99th percentile is not sufficient
• Tail latency matters
• Median is not representative of common case. Average is even worse.
• No single metric can define behaviour of latency
• Be conscious of the monitoring tools and the data they report
• Percentiles can't be averaged
• Latency is not service time
• Plot data corrected for coordinated omission and there is often a quick, high rise in the curve
• An uncorrected test often shows a deceptively smoother curve
• Very few tools actually correct for coordinated omission
• HdrHistogram
• Is additive, uses log buckets, helpful in capturing high volume data in production
Ref: http://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
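The "percentiles can't be averaged" point in one example: averaging per-server p99s is not the p99 of the combined traffic (illustrative numbers):

```python
# Averaging two servers' p99 values vs the p99 of all traffic combined.
def p99(samples):
    ordered = sorted(samples)
    return ordered[(99 * len(ordered)) // 100]   # nearest-rank, integer math

server_a = [10] * 99 + [1000]    # one server with a slow outlier
server_b = [10] * 100            # one uniformly fast server

avg_of_p99s = (p99(server_a) + p99(server_b)) / 2   # naive averaging
combined_p99 = p99(server_a + server_b)             # real p99 of all traffic
```

Here the average of the two p99s is 505 ms while the true p99 of the merged distribution is 10 ms: the aggregate hides where the slow requests actually sit. This is why additive histograms (such as HdrHistogram) are merged first and percentiles computed last.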
Declarative vs Imperative Languages
Document vs Relational
• Document database store one-to-many relationships or nested records within the parent record
(not in a separate table)
• One-to-many - One person can have many contact details
• Document and relational databases store many-to-one and many-to-many relationships using a unique identifier, called a foreign key in the relational model and a document reference in the document model.
• Many-to-one - Many persons can have one address
• Many-to-many - Many persons can have many skills
Document vs Relational cont.
Data Model
• Document: Closer to the data structures used by the application; schema flexibility; better performance due to locality
• Relational: Better support for joins; better support for many-to-one relationships; better support for many-to-many relationships
Fault Tolerance
Concurrency
Good for
• Document: Analytics apps where many-to-many relationships are not needed
Bad for
• Document: Reading a small portion of a large document; writes that increase the size of a large document
Recommended use
• Document: Keep documents fairly small; avoid writes that increase document size
TAO - Facebook distributed data store
Facebook Thundering herd Problem
• Problem:
• Millions of people tune in to a celebrity Live broadcast simultaneously,
potentially 100s of thousands of video requests will see a cache miss at the
Edge Cache servers.
• This results in excessive queries to the Origin Cache and Live processing
servers, which are not designed to handle high concurrent loads.
• Solution:
• Create request queues at the Edge cache servers,
• Allowing one request to go through to the livestream server and return the content
to the Edge cache, where it is distributed to the rest of the queue all at once.
Ref: https://code.facebook.com/posts/1653074404941839/under-the-hood-
broadcasting-live-video-to-millions/
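The queueing fix can be sketched as request coalescing: among concurrent misses for a key, one "leader" request goes to origin and every queued request gets its result (a toy sketch; the real edge-cache implementation is more involved):

```python
import threading

class CoalescingCache:
    """Edge cache that collapses concurrent misses for a key into one
    origin fetch; every waiting request receives the same result."""
    def __init__(self, fetch_from_origin):
        self.fetch = fetch_from_origin
        self.cache = {}
        self.inflight = {}                 # key -> Event the waiters block on
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.cache:
                return self.cache[key]     # hit: no origin traffic
            event = self.inflight.get(key)
            leader = event is None
            if leader:                     # first miss becomes the leader
                event = self.inflight[key] = threading.Event()
        if leader:
            value = self.fetch(key)        # exactly one origin request per key
            with self.lock:
                self.cache[key] = value
                del self.inflight[key]
            event.set()                    # release the queued requests
            return value
        event.wait()                       # queued behind the leader
        with self.lock:
            return self.cache[key]
```

With this in place, a 100,000-viewer cache miss produces one request to the Origin Cache instead of 100,000.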
PostgreSQL vs MongoDB
Flexibility
• PostgreSQL: Have to match the schema
• MongoDB: Put anything in any document
Integrity
• PostgreSQL: Read valid data only
• MongoDB: Read anything out
Consistency
• PostgreSQL: Written means written, no exceptions (except disk failure; use RAID)
• MongoDB: Written means written, unless something goes wrong (e.g. server crash, network partition, disk failure)
Availability
• PostgreSQL: If the master dies, stop to avoid corruption
• MongoDB: If the master dies, rebalance to avoid downtime
Bigger servers (expensive, can't use cloud)
• PostgreSQL: Good, up to 64 cores, 1TB RAM
• MongoDB: Bad, per-database write lock
Sharding (cheaper, works in cloud)
• PostgreSQL: Bad, hard to choose shards to maintain integrity
• MongoDB: Good, built-in support with mongos
Replication
• Doesn't help write throughput, always hits the master
• Faster failover
Ideal use case
• MongoDB is good for storing arbitrary pieces of JSON, when you don't care at all what is inside that JSON.
• If your code expects something to be present in the JSON, then MongoDB is the wrong choice.
• Never use MongoDB if one document has conceptual links to another document(s).
Ref: https://speakerdeck.com/conradirwin/mongodb-confessions-of-a-postgresql-lover
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
https://www.infoq.com/presentations/data-types-issues
Storage Engines
• Optimised for one of
• Transaction processing
• Analytics such as column oriented
• Belong to one of families
• Log structured storage engines
• Page oriented storage engines such as B-trees
Data structure behind databases
#!/bin/bash
db_set () {
echo "$1,$2" >> database
}
db_get () {
grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
$ db_set 81 '{"x":"11","places":["London Eye"]}'
$ db_set 42 '{"x":"23","places":["Exploratorium"]}'
$ db_set 42 '{"x":"35","places":["Golden Gate"]}'
$ db_get 42
{"x":"35","places":["Golden Gate"]}
$ cat database
81,{"x":"11","places":["London Eye"]}
42,{"x":"23","places":["Exploratorium"]}
42,{"x":"35","places":["Golden Gate"]}
Many dbs use a log, an append-only data file, similar to
what db_set does
But a real database has to deal with more issues
• Concurrency control
• Reclaiming disk space
• Log size control
• Handling errors, crash recovery
• Partially written records
• File format
• Deleting records
Append-log is efficient
• Appending and segment merging are faster
• Concurrency and crash recovery are much simpler if segment files are append-only or immutable
• Merging old segments avoids fragmentation problem
Performing compaction and segment merging simultaneously in an append-only log file
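Compaction over the simple key,value log written by db_set means keeping only the most recent value for each key and dropping overwritten ones:

```python
# Compact an append-only key,value log: later writes for a key win.
def compact(log_lines):
    latest = {}
    for line in log_lines:
        key, _, value = line.partition(",")
        latest[key] = value                       # overwrite older value
    return [f"{k},{v}" for k, v in latest.items()]

log = [
    '81,{"x":"11"}',
    '42,{"x":"23"}',
    '42,{"x":"35"}',   # supersedes the previous record for key 42
]
compacted = compact(log)
```

A real engine runs this over frozen segments in the background and atomically switches reads to the merged segment, which is why compaction and normal writes can proceed simultaneously.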
Indexes
• Hash Index
• Must fit in memory; for a very large number of keys, a hash index won't work
• Range queries won't work efficiently
• SSTables and LSM-Trees
Traditional RDBMS wisdom
• Row store
• Data is in disk block formatting (heavily encoded)
• With a main memory buffer pool of blocks
• Query plans
• Optimize CPU, I/O
• Fundamental operation is to read a row
• Indexing via B-Trees
• Clustered or Unclustered
• Dynamic row-level locking
• ARIES-style write-ahead log
• Replication (sync or async)
• Update the primary first
• Then move the log to the other sites
• And roll forward at the secondary(s)
• MySQL, Oracle, Postgres, SQLServer, DB2
• Traditional wisdom is now obsolete
DBMS marketplace
Data warehouses (1/3 of market)
• Lots of big reads
• Bulk-loaded from OLTP systems
• Market already moving towards column stores (which are not based on traditional wisdom, ex. HP Vertica, Amazon Paraccel)
• Column stores are 50 - 100 times faster than row stores
OLTP (1/3 of market)
• Lots of small updates
• And a few reads
• Not clear who will win, but NewSQL dbs are wildly faster. Ex. VoltDB, Google Spanner
• OLTP and NewSQL
Everything else (1/3 of market)
• Hadoop, NoSQL, graph dbs, array dbs …
Why column-stores are faster
• A typical warehouse query reads 4-5 attributes from a 100-column fact table
• Row store - reads all 100 attributes
• Column store - reads just the ones you need
• Compression is way easier and more productive in a column store
• Each column has data of the same type -> each block contains data of one kind of attribute. Bitmaps can be used
• No big record headers in a column store
• Record headers don't compress well
• A column executor is wildly faster than a row executor
• Because of vector processing
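The read-amplification argument can be made concrete with a toy in-memory layout (sizes and the example query are illustrative; real stores work on disk blocks):

```python
import random

# Row vs column layout for a warehouse query touching 2 of 100 attributes.
random.seed(0)
NUM_ROWS, NUM_COLS = 1_000, 100
rows = [[random.random() for _ in range(NUM_COLS)] for _ in range(NUM_ROWS)]
columns = [[row[c] for row in rows] for c in range(NUM_COLS)]  # same data, by column

# Query: SELECT sum(col_3) WHERE col_7 > 0.5
values_scanned_row_store = NUM_ROWS * NUM_COLS   # must read every full row
values_scanned_col_store = NUM_ROWS * 2          # reads only columns 3 and 7
result = sum(v for v, p in zip(columns[3], columns[7]) if p > 0.5)
```

The 50x difference in values scanned is the first-order reason for the 50-100x speedups quoted above; compression and vectorised execution add more on top.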
OLTP and NewSQL
What future holds for OLTP
• Main memory DBMS
• With anti-caching
• Deterministic concurrency control
• HA via active-active
OLTP databases - 3 big decisions
• Main memory vs disk orientation
• Concurrency control strategy
• Replication strategy
Ref : http://slideshot.epfl.ch/play/suri_stonebraker
Data format or schema changes
• Data format/schema change often needs a
change in application code
• Code changes often cannot happen
instantaneously
• Server side apps - Staged rollout (installing new code on some nodes, and gradually installing it on the other nodes as the new code is found to work fine)
• Client side apps - Some users may, and some may not, install the upgrade for some time
• Hence old & new versions of code, and old
& new data formats may potentially coexist
in the system at the same time.
• Backward compatibility - Newer code
can read data that was written by older
code
• Forward compatibility (trickier)- Older
code can read data that was written by
newer code
• Data encoding formats support achieving the above requirements
• JSON, XML, Protocol Buffers, Thrift, Avro
Encoding formats
• Programs generally work with data in 2 representations
• In-memory representation - As objects, structs, lists, arrays, hash tables, trees and so on.
• These data structures are often optimised for efficient usage by the CPU, typically using pointers
• Disk file and/or over-the-network representation - A self-contained sequence of bytes to be stored in a disk file or transferred over the network
• Since a pointer wouldn't make sense to any other process, this representation is quite different from the in-memory one
• Encoding
• Translation from in-memory to byte
sequence
• Also called marshalling,
serialisation
• Decoding
• Translation from byte sequence to
in-memory
• Also called unmarshalling,
deserialisation, parsing
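The round trip described above, sketched with JSON as the byte format:

```python
import json

# In-memory objects (pointers, hash tables) vs a self-contained byte sequence
# suitable for a disk file or the network, using JSON as the example format.
in_memory = {"user_id": 42, "places": ["London Eye", "Exploratorium"]}

encoded = json.dumps(in_memory).encode("utf-8")   # encoding / marshalling
decoded = json.loads(encoded.decode("utf-8"))     # decoding / unmarshalling
```

The `bytes` value is what crosses the process boundary; the dict on either side is a process-local reconstruction, not the same object.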
Language-specific vs Standard formats
Language specific (java.io.Serializable, Ruby Marshal, Python Pickle, PHP serialize/unserialize functions)
• Encoding is tied to the programming language
• To restore data in the same object types, the decoding process needs to instantiate arbitrary classes, which has security issues
• Data versioning is not taken care of; backward and forward compatibility is always an issue
• Efficiency - CPU time and size of encoded data - is always an afterthought
Standard (JSON, XML, CSV)
• Lots of ambiguity in number encoding. XML and CSV can't distinguish between a number and a string. JSON distinguishes numbers from strings, but it doesn't distinguish integers from floating points
• JSON and XML support Unicode character strings i.e. human-readable text, but don't support binary strings i.e. sequences of bytes w/o character encoding. Generally Base64 is used as a workaround.
• There is optional schema support for XML and JSON. The schemas are powerful but quite complicated too. CSV doesn't have a schema.
• CSV is a vague format; confusion arises if a value contains a comma or newline character. Its escaping rules are not correctly implemented by all parsers
Security issue with arbitrary class
instantiation
• A Vulnerability in Java environments
• Any application that accepts serialized Java objects is likely vulnerable,
even if a framework or library is responsible and not your custom code.
• There’s no easy way to protect applications en-masse. It will take
organizations a long time to find and fix all the different variants of this
vulnerability.
• There’s no way to know what you’re deserializing before you’ve
decoded it.
• An attacker can serialize a bunch of malicious objects and send them to
your application.
• ObjectInputStream in = new ObjectInputStream(inputStream);
• return (Data) in.readObject();
• Once you call readObject(), it's too late. The attacker's malicious objects have already been instantiated, and have taken over your entire server.
• Solution : Allow deserialization, but
make it impossible for attackers to
create instances of arbitrary classes.
• List<Class<?>> safeClasses =
Arrays.asList( BitSet.class,
ArrayList.class );
• Data data =
safeReadObject( Data.class,
safeClasses, 10, 50, inputStream );
• Limit the input to a maximum of
10 embedded objects and 50
bytes of input.
Javascript - working with large
numbers
• JS supports only 53-bit integers
• All numbers in JS are floating point numbers
• Numbers, including integers and floating points, are represented as sign x mantissa x 2^exponent
• The mantissa has 53 bits
• The exponent can be used to reach higher numbers, but they won't be contiguous
• Twitter keeps 64 bit integer ids for
status, user, direct message, search ids
• Due to JS integer limitation, json
returned by twitter api includes ids
twice, once as json number and once
as decimal string.
• {"id": 10765432100123456789, "id_str":
"10765432100123456789", ...}
• Languages that use 64 bit unsigned
integers can use property id and don't
need id_str
• Javascript can use id_str along with
library like strint to do all kind of math
operations on id_str
Ref : http://2ality.com/2012/07/large-integers.html

https://groups.google.com/forum/#!topic/twitter-development-talk/ahbvo3VTIYI
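The 53-bit limit and the id_str workaround can be reproduced outside JavaScript by forcing the id through a double (Python floats are the same IEEE-754 doubles JS uses for all numbers):

```python
# The Twitter-style id from the example above does not survive a trip
# through a 64-bit float, but does survive as a string.
big_id = 10765432100123456789

as_double = float(big_id)        # what a plain JSON number parse gives JS
as_string = str(big_id)          # the id_str workaround

assert int(as_double) != big_id  # precision silently lost above 2**53
assert int(as_string) == big_id  # the string round-trips exactly
```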
Scaling to Higher Load
Shared memory
• Many CPUs, many RAM chips and many disks joined together under one OS; a fast interconnect allows any CPU to access any part of the memory or disk
• Also called vertical scaling or scaling up. It's the simplest approach, i.e. buy a powerful machine
• Cost is super-linear: a machine twice the size may not necessarily handle twice the load
• Crash recovery is easiest, but concurrency control is a little difficult because of the necessity of dealing with the lock table as a hot spot
Shared disk
• Uses several machines with independent CPUs and RAM, but stores data on an array of disks that is shared between the machines, connected via a fast network
• Concurrency control is most difficult because of coordinating multiple copies of the same lock table, and syncing writes to a common log or logs
Shared nothing
• Each machine running the database software is called a node. Each node has its own CPU, RAM, and disks. Any coordination between nodes is done at the software level, using a conventional network
• Also called horizontal scaling or scaling out
• The application developer needs to be super cautious. Since the data is distributed over multiple nodes, constraints and trade-offs need to be taken care of at the software level
• Concurrency control is more difficult because it requires a distributed deadlock detector and a multi-phase commit protocol
Ref: http://www.benstopford.com/2009/11/24/understanding-the-shared-nothing-architecture/
Scaling to Higher Load cont..
Shared memory Shared disk Shared nothing
Difficulty of concurrency control 2 3 2
Difficulty of crash recovery 1 3 2
Difficulty of data base design 2 2 3
Difficulty of load balancing 1 2 3
Difficulty of high availability 3 2 1
Number of messages 1 2 3
Bandwidth required 3 2 1
Ability to scale to large no of machines 3 2 1
Ability to scale to large distances between machines 3 2 1
Susceptibility to critical sections 3 2 1
Number of system images 1 3 3
Susceptibility to hot spots 3 3 3
Ranking 1 - best, 2 - 2nd best, 3 - 3rd best

Ref: http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf
Replication
Use cases
• Reduce latency
• Increase availability
• Increase read throughput
Scenarios
• Small dataset, stored in a single
machine
• Partitioning or Sharding, stored in
multiple machines
• Faults
• Synchronous vs Asynchronous
replication
• Handling failed replicas
• Eventual consistency
• Setting up new followers
Replicating changes
• Single leader
• Multi leader
• Leaderless
Leader based replication
• Also known as Active/Passive and
Master/Slave replication
• Built in feature of
• Postgres, Mysql, Oracle Data
Guard, Sql Server Availability
Group
• MongoDB, RethinkDB, Espresso
• Kafka, RabbitMQ
• Network file systems, replicated block devices like DRBD
• Synchronous replication
• Leader waits for confirmation from follower before reporting success to its
client
• Guarantee of up-to-date copy between leader and follower
• All followers can never be synchronous : any one node outage would cause
the whole system to grind to a halt
• Asynchronous replication
• Leader sends the message to the follower and reports success to its client (does not wait for confirmation from the follower)
• Often leader based replication is asynchronous
• Non durable - If the leader fails and is not recoverable, all un-replicated writes are lost
• It is inevitable when there are many followers or geographically distributed followers
• Semi-synchronous replication
• If a sync follower becomes unavailable or slow, an async follower is made synchronous
Setting up new followers
• Take a consistent snapshot of leader’s db (without taking a lock on entire db)
• Most dbs have this built in. 3rd party tools like innobackupex for Mysql can also be
used.
• The snapshot should record the exact position in the leader's replication log. This position is called the log sequence number (Postgres) or binlog coordinates (MySQL).
• Copy the snapshot to the new follower node
• The follower connects to the leader, and requests all the data changes that happened after the log sequence number
• After the follower has processed the backlog of data changes, it is said to have caught up. Now the follower can continue processing data changes from the leader as they happen
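Those steps as a minimal in-memory sketch (class names and the list-index LSN are illustrative):

```python
# Follower setup: snapshot at a log position (LSN), copy it, then replay
# every change after that position until caught up.
class Leader:
    def __init__(self):
        self.state = {}
        self.log = []                      # append-only replication log

    def write(self, key, value):
        self.log.append((key, value))
        self.state[key] = value

    def snapshot(self):
        # A consistent snapshot together with its exact log position
        return dict(self.state), len(self.log)

def bootstrap_follower(leader):
    state, lsn = leader.snapshot()         # take + copy the snapshot
    for key, value in leader.log[lsn:]:    # request changes after the LSN
        state[key] = value                 # process the backlog: caught up
    return state
```

Recording the LSN with the snapshot is the crucial step: without it, the follower cannot know which writes the snapshot already contains and which it still needs to replay.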
Handling node outages
• Leader failure handling is trickier than
follower failure handling:-
• Determining that the leader has failed
• Timeout is the most popular strategy to detect a leader's failure (nodes bounce messages back and forth between each other, and when a node doesn't respond for e.g. 30 secs it is assumed to be dead)
• Choosing a new leader
• Either election process or
previously chosen controller node.
The best candidate is usually the
replica with most up-to-date data
changes from the old leader.
• Reconfiguring the system to use the
new leader
• Using request routing, clients now send writes to the new leader. When the old leader comes back, the system has to ensure that it becomes a follower and recognises the new leader
• Failover is subject to things that may go wrong
• Async repl : The new leader may not have all the writes from the old leader
• If the old leader rejoins the cluster, what should happen to those writes?
• The new leader may have received conflicting writes in the meantime!!
• Commonly, these writes are discarded, which has its own problems
• Violation of client’s durability expectation
• Dangerous situations may arise if other storage systems outside of
database needs to be coordinated with the database contents
• Ex. Github incident when out-of-date MySQL follower was
promoted to leader. Some auto increment primary keys were
reused by old and new leaders. The same keys were used by
Redis store. This resulted in private data of some users shared
with some other users
• Split brain : 2 nodes both believing they are the leader
• Sometimes this leads to shutdown of both systems
• Timeout : Choosing the right timeout for the leader to be declared dead
• A short timeout can lead to unnecessary failovers
• A long timeout means slower recovery; and when the slowness is due to network load or a traffic spike, a failover during such a situation can worsen it further
Due to unavailability of easy solutions to these problems, most devops teams prefer to use manual failover even if software supports auto failover
Implementations of replication logs
Statement based
• Every write statement is logged and sent to the follower, i.e. every insert, update, delete statement is forwarded. The follower parses and executes the statement as if it were received from a client.
• Cons: A statement that calls a non-deterministic function like NOW() or RAND() is likely to generate a different value on the replica. Auto-increment may have a different impact if executed in a different order.
• Used in MySQL before ver 5.1. Now MySQL switches to row-based repl whenever any non-determinism is present in a statement.
Write-ahead log
• The database log is used to build a replica on another node - both log-structured storage engines and B-trees use a log in some way or the other to store the data.
• Cons: The log describes data at a very low level, including details like which bytes were changed on which disk block. This makes for tight coupling with the storage engine - a zero-downtime upgrade of the database software, by first upgrading followers and then making one of the nodes the leader, is not possible.
• Used in PostgreSQL and Oracle.
Logical log
• Uses different log formats for replication and for the storage engine. A transaction that modifies several rows generates several such log records.
• Pros: Allows for decoupling of the replication log and the storage engine - a zero-downtime upgrade is hence possible. The logical log can also be sent to external systems such as a data warehouse, custom indexes, caches.
• MySQL's binlog, when configured to use row-based replication, uses this approach.
Trigger based
• Involves application code; replication is moved up to the application layer. Ex. when only a subset of data is to be replicated, or when you want to replicate from one kind of database to another. Triggers and stored procedures are used to achieve this.
• Cons: Greater overhead than other replication methods; more prone to bugs and limitations.
• Databus for Oracle and Bucardo for Postgres.
Multi leader replication
• A somewhat retrofitted
feature in many databases
• Often it causes pitfalls and
problems with other
database features
• Auto incrementing keys
• Triggers
• Integrity constraints
• Multi leader replication is
often considered a
dangerous territory that
should be avoided if
possible
Use cases for Multi leader replication
• Multi-datacenter operation
• Performance is better as every write can be processed in the local/nearest datacenter
• Datacenter outages can be better tolerated
• Network problems can be better tolerated
• Clients with offline operation
• Calendar apps on mobile phones, laptops and other devices need to allow create/edit/view of calendar events even when not connected to the internet.
• All offline changes need to be synced with the server and other devices when the device is next online.
• Each device's local database acts as a leader, and there is an async multi-leader replication process between the replicas of the calendar on all devices
• There is a rich history of broken calendar sync implementations. Hence multi-leader repl is a tricky thing to get right
• CouchDB is designed for making this use case easier
• Collaborative editing
• Google docs, Etherpad
• Changes are instantly applied to local replica and async replicated to server and other users editing the same document
• To avoid conflicts, each user obtains a lock before editing the document
• For faster collaboration, the unit of change is made very small, ex. a single keystroke.
Conflict resolution - in multi leader replication
• Custom conflict resolution
• On write
• Bucardo works this way
• When a conflict is detected, the database calls a conflict handler. In Bucardo it can be a Perl script.
• The handler runs in the background and cannot prompt the user
• On read
• CouchDB works this way
• When a conflict is detected, all conflicting writes are stored.
• The next time the data is read, all the versions of the data are returned to the application code.
• The application may prompt the user or resolve the conflict, and write the result back to the database.
• Automatic conflict resolution
• Used by Amazon
• Frequently, products that were removed from the cart still appear in the cart, due to errors in the conflict resolution logic
Ensuring Consistency in Multi-leader Replication
• Pessimistic Locking
• Wait for your turn
• Optimistic Locking
• Early bird gets the worm
• Conflict Resolution
• Your mother cleans up later
• Conflict Avoidance
• Solve the problem by not having it
#https://www.percona.com/live/mysql-conference-2013/sessions/state-art-mysql-multi-master-replication slide 7
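Of these, optimistic locking ("early bird gets the worm") is easy to sketch: writers never block each other, but a write only succeeds if the version it read is still current (a minimal sketch; names are illustrative):

```python
# Optimistic locking via version numbers: stale writes are rejected,
# and the losing writer must re-read and retry.
class VersionedStore:
    def __init__(self):
        self.data = {}                      # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self.data.get(key, (0, None))
        if current_version != expected_version:
            return False                    # someone wrote first; retry
        self.data[key] = (current_version + 1, value)
        return True
```

Pessimistic locking would instead make the second writer wait its turn; conflict avoidance would route all writes for a given record through one leader so the race never happens.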
Microservices at UBER
• Microservices bring benefits like
• Each team owning their own release cycles
• Each team being responsible for their own uptime
• Microservices have challenges like
• The aggregate velocity can be
much slower for ex. the Java team
has to figure out how to talk to
metrics system, so do Node people
and Go people
• A hard fought bug on one platform
has also to be fought on another
platform
• “I hadn't expected the cost of multiple
languages to be as high as it was” —
Matt Ranney (Uber’s Chief System
Architect)
Present in lots of data
centres around the world
TLS termination
at front end
Riak clusters to manage the state of all in-progress jobs
Completed jobs travel from
Marketplace to other logic
systems through Kafka
Marketplace - the dispatch system which
supports all sorts of logistics including rides,
ubereats etc. in Node.js, Java, Go
Other queues execute other
workflows ex. prompt user
to get the receipt and rate
the trip
Map services compute the ETAs and routes for the trip.
Some high throughput
systems written in Java
All Kafka streams go to Hadoop for
analytical processing
Moving towards type-safe and verifiable interfaces between services, as the cost of type-unsafe JSON is too high
A lot of early code was using
JSON over HTTP, which
makes it hard to validate
interfaces.
Army of mobile phones around the world
doing black box testing
Death of MapReduce at Google

Building data intensive applications

A typical system architecture
1. Does this architecture ensure that the data remains correct and complete, even when things go wrong internally?
2. Does it provide consistently good performance even when parts of the system are degraded?
3. Does it scale to handle an increase in load?
4. What does an API for this kind of service look like?
Basic requirements from a DIA
• Reliability
• Scalability
• Maintainability
Reliability
• The system should work correctly in the face of adversity
• Correctly - performing the correct function at the desired level of performance; tolerating user mistakes; preventing unauthorised access
• Adversity - hardware faults, software faults, and even human error
• Anticipate faults and design for them
• Even AWS has problems and needs its own planning
Software Errors
• Errors
• A runaway process that uses up a shared resource like CPU, memory, disk, or network bandwidth
• A service that has slowed down or become unresponsive
• Cascading failures of components
• Fixes
• Careful analysis of assumptions and interactions in the system
• Thorough testing
• Process isolation
• Allowing processes to crash and restart; Chaos Monkey by Netflix
• Measuring, monitoring and analysing system behaviour in production
• Constantly checking the guarantees a system provides, and raising an alert in case of discrepancies
Human Errors
• Well-defined abstractions, APIs, and admin interfaces
• These make it easy to do the "right thing" and discourage the "wrong thing"
• Set up a fully featured non-production sandbox environment
• Here people can explore and experiment using real data without affecting real users
• Unit, integration, automated and manual testing
• Automation is particularly good for covering corner cases
• Allow quick and easy recovery from human errors
• Make it fast to roll back config changes, gradually roll out new code, and build tools to recompute data
• Set up metrics, monitoring and error rates
• These give early warning signals, check whether any assumption is being violated, and help diagnose errors, faults and failures
Scalability
• As the system grows, there should be reasonable ways of dealing with the growth
• Growth - in data volume, traffic volume or complexity
Describing Load
• Load parameters, e.g.
• Requests/sec to a web server
• Ratio of reads to writes to a database
• Number of simultaneously active users in a chat room
• Hit rate on a cache
• Twitter - 2 main operations
• Post tweet - 4.6K requests/sec on average, 12K requests/sec at peak (2012)
• Home timeline - 300K requests/sec
• Hybrid implementation approach
• Users with a small following - fan the tweet out immediately to the home timeline caches of all the user's followers
• Celebrities (30M followers) - fetch the celebrity's tweets separately and merge them into a follower's timeline only when that follower loads their home timeline
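The hybrid fan-out idea can be sketched with in-memory stand-ins. The data structures and threshold below are illustrative, not Twitter's actual implementation.

```python
# In-memory sketch of hybrid fan-out: normal users fan out on write,
# celebrities are merged in on read. Dicts stand in for the real
# follower graph and timeline caches.
followers = {}         # user -> set of follower ids
timelines = {}         # follower -> list of tweets (home timeline cache)
celebrity_tweets = {}  # celebrity -> their recent tweets

def post_tweet(user, tweet, celeb_threshold=1_000_000):
    if len(followers.get(user, ())) >= celeb_threshold:
        # Celebrity: one write; followers pull the tweet at read time.
        celebrity_tweets.setdefault(user, []).append(tweet)
    else:
        # Normal user: push the tweet into every follower's cache now.
        for f in followers.get(user, ()):
            timelines.setdefault(f, []).append(tweet)

def home_timeline(user, following):
    # Cheap read from the precomputed cache, plus a merge step for
    # any celebrities this user follows.
    merged = list(timelines.get(user, []))
    for followee in following:
        merged.extend(celebrity_tweets.get(followee, []))
    return merged
```

The trade-off is visible in the two functions: fan-out-on-write makes posting expensive but reads trivial, while the celebrity path keeps the write cheap and pays a small merge cost on every read.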
Describing Performance
• Performance parameters, e.g.
• Throughput - in batch processing systems like Hadoop
• Response time - in online systems
• Response time does not always stay the same, for reasons like
• A context switch to a background process
• Loss of a network packet and TCP retransmission
• A garbage collection pause
• A page fault forcing a read from disk
• Mechanical vibrations in the server rack
Measuring Performance
• Median and percentiles (95p, 99p, 99.9p) of performance metrics
• Plotting them on a histogram
• Averaging out histograms across all servers
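As a sketch, the median and tail percentiles can be computed directly from raw samples; the response times below are made up for illustration.

```python
import math

# Nearest-rank percentile over raw response-time samples (milliseconds).
def percentile(samples, p):
    """Smallest sample value such that at least p% of samples are <= it."""
    s = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(s)))  # 1-based nearest rank
    return s[rank - 1]

response_times = [12, 13, 13, 14, 14, 15, 15, 16, 200, 950]
median = percentile(response_times, 50)  # the typical user's experience
p99 = percentile(response_times, 99)     # tail latency
# The mean here is 126.2 ms, dominated by the two outliers: it
# describes almost nobody's actual experience, which is why the
# median and high percentiles are reported instead.
```

Note that histograms from multiple servers can be merged and then queried for percentiles, but percentile values themselves cannot be averaged.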
Maintainability
• Over time many people will work on the system, and they should be able to work productively
• Fix bugs, investigate failures
• Keep the system operational
• Implement new use cases
• Repay technical debt
• 3 design principles for a maintainable system
• Operability
• Simplicity
• Evolvability
Operability
• Operational tasks
• Health monitoring and restoring a service from a bad state
• Tracking down the cause of failures or degraded performance
• Updates, security patches
• Capacity planning
• Setting up tools for deployment and configuration management
• Moving applications from one platform to another
• Preserving knowledge as people come and go
• How data systems can support effective operations
• Good monitoring - visibility into runtime behaviour and system internals
• Support for automation
• Avoiding dependency on individual machines
• Good documentation
• Good default behaviour, with the option to override defaults
• Self-healing where appropriate, with the option to manually control system state
Operations-friendly services best practices
• Expect failures; handle all failures gracefully
• A component may crash or stop
• A dependent component may crash or stop
• The network can fail
• A disk can run out of space
• Keep things simple
• Avoid unnecessary dependencies
• Installation should be simple
• Failures on one server should have no impact on the rest of the data centre
• Automate everything
• People make mistakes, they need sleep, they forget things
• Automated processes are testable and fixable, and therefore more reliable
Ref: https://www.usenix.org/legacy/events/lisa07/tech/full_papers/hamilton/hamilton.pdf
Latency
• Understand latency from the entire latency distribution curve
• Simply looking at the 95th or 99th percentile is not sufficient
• Tail latency matters
• The median is not representative of the common case; the average is even worse
• No single metric can describe the behaviour of latency
• Be conscious of your monitoring tools and the data they report
• Percentiles can't be averaged
• Latency is not service time
• Plotted with coordinated omission corrected, the data often shows a quick, high rise in the curve
• An uncorrected test often has a smoother curve
• Very few tools actually correct for coordinated omission
• HdrHistogram
• Is additive, uses log buckets, and is helpful for capturing high-volume data in production
Ref: http://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
  • 24. Document vs Relational • Document databases store one-to-many relationships, or nested records, within the parent record (not in a separate table) • One-to-many - one person can have many contact details • Both document and relational databases store many-to-one and many-to-many relationships using a unique identifier, called a foreign key in the relational model and a document reference in the document model • Many-to-one - many persons can live at one address • Many-to-many - many persons can have many skills
  • 25. Document vs Relational cont.
Document: data model closer to the data structures used by the application; schema flexibility; better performance due to locality.
Relational: better support for joins, for many-to-one and many-to-many relationships, for fault tolerance and for concurrency.
The document model is good for analytics apps where many-to-many relationships are not needed. It is bad for reading a small portion of a large document, and for writes that increase the size of a large document.
Recommended use: keep documents fairly small; avoid writes that increase document size.
  • 26. TAO - Facebook distributed data store
  • 27. Facebook Thundering herd Problem • Problem: • Millions of people tune in to a celebrity Live broadcast simultaneously, potentially 100s of thousands of video requests will see a cache miss at the Edge Cache servers. • This results in excessive queries to the Origin Cache and Live processing servers, which are not designed to handle high concurrent loads. • Solution: • Create request queues at the Edge cache servers, • Allowing one request to go through to the livestream server and return the content to the Edge cache, where it is distributed to the rest of the queue all at once. Ref: https://code.facebook.com/posts/1653074404941839/under-the-hood- broadcasting-live-video-to-millions/
  • 28. PostgreSQL vs MongoDB
Flexibility: PostgreSQL - data has to match the schema; MongoDB - put anything in any document.
Integrity: PostgreSQL - read valid data only; MongoDB - read anything out.
Consistency: PostgreSQL - written means written, no exceptions (except disk failure; use RAID); MongoDB - written means written, unless something goes wrong (e.g. server crash, network partition, disk failure).
Availability: PostgreSQL - if the master dies, stop to avoid corruption; MongoDB - if the master dies, rebalance to avoid downtime.
Bigger servers (expensive, can't use cloud): PostgreSQL - good, up to 64 cores, 1TB RAM; MongoDB - bad, per-database write lock.
Sharding (cheaper, works in cloud): PostgreSQL - bad, hard to choose shards so as to maintain integrity; MongoDB - good, built-in support with mongos.
Replication: doesn't help write throughput in either, since writes always hit the master; MongoDB offers faster failover.
Ideal use case: MongoDB is good for storing arbitrary pieces of JSON, when you don't care at all what is inside that JSON. If your code expects something to be present in the JSON, then MongoDB is the wrong choice. Never use MongoDB if one document has conceptual links to other document(s). Ref: https://speakerdeck.com/conradirwin/mongodb-confessions-of-a-postgresql-lover
 http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
 https://www.infoq.com/presentations/data-types-issues
  • 29. Storage Engines • Optimised for one of • Transaction processing • Analytics (such as column-oriented engines) • Belong to one of two families • Log-structured storage engines • Page-oriented storage engines such as B-trees
  • 30. Data structure behind databases

#!/bin/bash
db_set () {
    echo "$1,$2" >> database
}
db_get () {
    grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}

$ db_set 81 '{"x":"11","places":["London Eye"]}'
$ db_set 42 '{"x":"23","places":["Exploratorium"]}'
$ db_set 42 '{"x":"35","places":["Golden Gate"]}'
$ db_get 42
{"x":"35","places":["Golden Gate"]}
$ cat database
81,{"x":"11","places":["London Eye"]}
42,{"x":"23","places":["Exploratorium"]}
42,{"x":"35","places":["Golden Gate"]}

Many dbs use a log, an append-only data file, similar to what db_set does. But a real database has to deal with more issues: • Concurrency control • Reclaiming disk space, log size control • Handling errors, crash recovery • Partially written records • File format • Deleting records
An append-only log is efficient: • Appending and segment merging are fast sequential operations • Concurrency and crash recovery are much simpler if segment files are append-only or immutable • Merging old segments avoids the fragmentation problem
  • 31. Performing compaction and segment merging simultaneously in an append-only log file
  • 32. Indexes • Hash Index • Must fit in memory; for a very large number of keys, a hash index won't work • Range queries won't work • SSTables and LSM-Trees
  • 33. Traditional RDBMS wisdom • Row store • Data is in disk block formatting (heavily encoded) • With a main-memory buffer pool of blocks • Query plans • Optimize CPU, I/O • Fundamental operation is reading a row • Indexing via B-Trees • Clustered or unclustered • Dynamic row-level locking • ARIES-style write-ahead log • Replication (sync or async) • Update the primary first • Then move the log to the other sites • And roll forward at the secondary(s) • MySQL, Oracle, Postgres, SQL Server, DB2 • Traditional wisdom is now obsolete
  • 34. DBMS marketplace (each segment is roughly 1/3 of the market)
Data warehouses:
• Lots of big reads; bulk-loaded from OLTP systems
• The market has already moved towards column stores, which are not based on traditional wisdom (e.g. HP Vertica, Amazon ParAccel)
• Column stores are 50-100 times faster than row stores
OLTP:
• Lots of small updates, and a few reads
• Not clear who will win, but NewSQL dbs are wildly faster (e.g. VoltDB, Google Spanner)
Everything else:
• Hadoop, NoSQL, graph dbs, array dbs, ...
  • 35. Why column-stores are faster • A typical warehouse query reads 4-5 attributes from a 100-column fact table • Row store - reads all 100 attributes • Column store - reads just the ones you need • Compression is way easier and more productive in a column store • Each column has data of the same type, so each block contains data of one kind of attribute; bitmaps can be used • No big record headers in a column store (headers don't compress well) • A column executor is wildly faster than a row executor because of vector processing
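The read-pattern difference can be illustrated with a toy Python sketch (a synthetic 100-column fact table; all names and numbers are invented): a query touching two columns scans every value in the row layout but only the two needed columns in the column layout.

```python
# Toy fact table: 1,000 rows, 100 columns c0..c99.
rows = [{f"c{i}": r * 100 + i for i in range(100)} for r in range(1000)]

# Row store: even a 2-column query still reads every full row.
row_values_touched = sum(len(row) for row in rows)

# Column store: the same data laid out column-wise; the query reads
# only the two columns it needs.
columns = {f"c{i}": [row[f"c{i}"] for row in rows] for i in range(100)}
col_values_touched = len(columns["c3"]) + len(columns["c7"])
```

Here the row layout touches 100,000 values against 2,000 for the column layout, a 50x difference before compression and vectorised execution add their own factors.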
  • 36. OLTP and NewSQL What the future holds for OLTP • Main-memory DBMS • With anti-caching • Deterministic concurrency control • HA via active-active OLTP databases - 3 big decisions • Main memory vs disk orientation • Concurrency control strategy • Replication strategy Ref : http://slideshot.epfl.ch/play/suri_stonebraker
  • 37. Data format or schema changes • A data format/schema change often needs a change in application code • Code changes often cannot happen instantaneously • Server-side apps - staged rollout (installing new code on some nodes, and gradually rolling it out to the other nodes as the new code is found to be working fine) • Client-side apps - some users may, and some may not, install an upgrade for some time • Hence old & new versions of code, and old & new data formats, may potentially coexist in the system at the same time • Backward compatibility - newer code can read data that was written by older code • Forward compatibility (trickier) - older code can read data that was written by newer code • Data encoding formats help in achieving these requirements • JSON, XML, Protocol Buffers, Thrift, Avro
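Both compatibility directions can be illustrated with a small JSON sketch (the v1/v2 record schemas and function names are invented for this example): old readers ignore unknown fields, new readers supply defaults for missing ones.

```python
import json

# Hypothetical schemas: v1 writes {"name": ...}; v2 adds an optional "email".

def write_v2(name, email):
    return json.dumps({"name": name, "email": email})

def read_v1(payload):
    """Old code reading new data (forward compatibility): it simply
    ignores fields it does not know about."""
    record = json.loads(payload)
    return {"name": record["name"]}

def read_v2(payload):
    """New code reading old data (backward compatibility): it supplies
    a default for the field old writers never produced."""
    record = json.loads(payload)
    return {"name": record["name"], "email": record.get("email")}

old_data = json.dumps({"name": "Asha"})          # written by v1 code
new_data = write_v2("Asha", "asha@example.com")  # written by v2 code
```

Schema-aware formats like Protocol Buffers, Thrift and Avro make these same rules explicit (field tags, defaults, schema resolution) instead of leaving them to application-level discipline.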
  • 38. Encoding formats • Programs generally work with data in 2 representations • In-memory representation - as objects, structs, lists, arrays, hash tables, trees and so on • These data structures are often optimised for efficient access by the CPU, typically using pointers • Disk file and/or over-the-network representation - a self-contained sequence of bytes to be stored in a disk file or transferred over the network • Since a pointer wouldn't make sense to any other process, this representation is quite different from the in-memory one • Encoding • Translation from the in-memory representation to a byte sequence • Also called marshalling or serialisation • Decoding • Translation from a byte sequence to the in-memory representation • Also called unmarshalling, deserialisation or parsing
  • 39. Language-specific vs Standard formats
Language-specific (java.io.Serializable, Ruby Marshal, Python pickle, PHP serialize/unserialize):
• Encoding is tied to one programming language
• To restore data as the same object types, the decoding process needs to instantiate arbitrary classes, which has security issues
• Data versioning is not taken care of; backward and forward compatibility is always an issue
• Efficiency (CPU time and size of encoded data) is always an afterthought
Standard (JSON, XML, CSV):
• Lots of ambiguity in number encoding. XML and CSV can't distinguish between a number and a string. JSON distinguishes numbers from strings, but it doesn't distinguish integers from floating point
• JSON and XML support Unicode character strings, i.e. human-readable text, but don't support binary strings, i.e. sequences of bytes without a character encoding; Base64 is generally used as a workaround
• There is optional schema support for XML and JSON; the schema languages are powerful but quite complicated. CSV doesn't have a schema
• CSV is a vague format; confusion arises if a value contains a comma or newline character, and its escaping rules are not correctly implemented by all parsers
  • 40. Security issue with arbitrary class instantiation • A vulnerability in Java environments • Any application that accepts serialized Java objects is likely vulnerable, even if a framework or library is responsible and not your custom code • There's no easy way to protect applications en masse; it will take organizations a long time to find and fix all the different variants of this vulnerability • There's no way to know what you're deserializing before you've decoded it • An attacker can serialize a bunch of malicious objects and send them to your application • ObjectInputStream in = new ObjectInputStream( inputStream ); • return (Data)in.readObject(); • Once you call readObject(), it's too late. The attacker's malicious objects have already been instantiated and may have taken over your entire server • Solution: allow deserialization, but make it impossible for attackers to create instances of arbitrary classes • List<Class<?>> safeClasses = Arrays.asList( BitSet.class, ArrayList.class ); • Data data = safeReadObject( Data.class, safeClasses, 10, 50, inputStream ); • Limit the input to a maximum of 10 embedded objects and 50 bytes of input
  • 41. Javascript - working with large numbers • JS integers are exact only up to 53 bits • All numbers in JS are floating point numbers • Numbers, including integers and floating point, are represented as sign x mantissa x 2^exponent • The mantissa has 53 bits • The exponent can be used to reach higher numbers, but they won't be contiguous • Twitter uses 64-bit integer ids for status, user, direct message and search ids • Due to the JS integer limitation, the JSON returned by the Twitter API includes ids twice, once as a JSON number and once as a decimal string • {"id": 10765432100123456789, "id_str": "10765432100123456789", ...} • Languages that use 64-bit unsigned integers can use the id property and don't need id_str • JavaScript can use id_str along with a library like strint to do all kinds of math operations on id_str Ref : http://2ality.com/2012/07/large-integers.html
 https://groups.google.com/forum/#!topic/twitter-development-talk/ahbvo3VTIYI
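Python floats are IEEE 754 doubles, the same representation JavaScript uses for all its numbers, so the 53-bit limit can be demonstrated directly (the large id below is the illustrative one from the slide):

```python
twitter_style_id = 10765432100123456789  # > 2**53, so not exactly representable

# Round-tripping through a double (what parsing a JSON number does in JS)
# silently changes the value:
as_double = float(twitter_style_id)
damaged = int(as_double) != twitter_style_id

# 2**53 is the last point where every neighbouring integer is still exact;
# 2**53 + 1 rounds back down to 2**53:
boundary_ok = float(2**53) == 2**53
boundary_lost = float(2**53 + 1) == float(2**53)

# Keeping the id as a string (id_str) sidesteps the problem entirely:
roundtrip_str = int("10765432100123456789") == twitter_style_id
```

This is exactly why consumers in 64-bit-integer languages can use `id` while JavaScript clients must stick to `id_str`.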
  • 42. Scaling to Higher Load
Shared memory (also called vertical scaling, or scaling up):
• Many CPUs, many RAM chips and many disks joined together under one OS; a fast interconnect allows any CPU to access any part of the memory or disk
• The simplest approach: buy a more powerful machine
• Cost is super-linear; a machine twice the size may not necessarily handle twice the load
• Crash recovery is easiest, but concurrency control is a little difficult because the lock table becomes a hot spot
Shared disk:
• Several machines with independent CPUs and RAM, but data is stored on an array of disks shared between the machines, connected via a fast network
• Concurrency control is the most difficult because of coordinating multiple copies of the same lock table, and syncing writes to a common log or logs
Shared nothing (also called horizontal scaling, or scaling out):
• Each machine running the database software is called a node; each node has its own CPU, RAM and disks. Any coordination between nodes is done at the software level, using a conventional network
• The application developer needs to be super cautious: since the data is distributed over multiple nodes, constraints and trade-offs need to be taken care of at the software level
• Concurrency control is more difficult because it requires a distributed deadlock detector and a multi-phase commit protocol
Ref: http://www.benstopford.com/2009/11/24/understanding-the-shared-nothing-architecture/
  • 43. Scaling to Higher Load cont. (ranking: 1 - best, 2 - 2nd best, 3 - 3rd best)
                                              SM  SD  SN
Difficulty of concurrency control              2   3   2
Difficulty of crash recovery                   1   3   2
Difficulty of database design                  2   2   3
Difficulty of load balancing                   1   2   3
Difficulty of high availability                3   2   1
Number of messages                             1   2   3
Bandwidth required                             3   2   1
Ability to scale to a large no. of machines    3   2   1
Ability to scale to large distances            3   2   1
Susceptibility to critical sections            3   2   1
Number of system images                        1   3   3
Susceptibility to hot spots                    3   3   3
(SM = shared memory, SD = shared disk, SN = shared nothing)
 Ref: http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf
  • 44. Replication Use cases • Reduce latency • Increase availability • Increase read throughput Scenarios • Small dataset, stored in a single machine • Partitioning or Sharding, stored in multiple machines • Faults • Synchronous vs Asynchronous replication • Handling failed replicas • Eventual consistency • Setting up new followers Replicating changes • Single leader • Multi leader • Leaderless
  • 45. Leader based replication • Also known as active/passive or master/slave replication • Built-in feature of • Postgres, MySQL, Oracle Data Guard, SQL Server Availability Groups • MongoDB, RethinkDB, Espresso • Kafka, RabbitMQ • Network file systems, replicated block devices like DRBD • Synchronous replication • Leader waits for confirmation from the follower before reporting success to its client • Guarantee of an up-to-date copy on the follower • All followers can never be synchronous: any one node outage would cause the whole system to grind to a halt • Asynchronous replication • Leader sends the message to the follower and reports success to its client (does not wait for confirmation from the follower) • Often leader-based replication is asynchronous • Non-durable - if the leader fails and is not recoverable, all un-replicated writes are lost • It is inevitable with many followers or geographically distributed followers • Semi-synchronous replication • If the sync follower becomes unavailable or slow, an async follower is made synchronous
  • 46. Setting up new followers • Take a consistent snapshot of the leader's db (without taking a lock on the entire db) • Most dbs have this built in; 3rd-party tools like innobackupex for MySQL can also be used • The snapshot should record the exact position in the leader's replication log. This position is called the log sequence number (Postgres) or binlog coordinates (MySQL) • Copy the snapshot to the new follower node • The follower connects to the leader and requests all the data changes that happened after the log sequence number • After the follower has processed the backlog of data changes, it is said to have caught up. Now the follower can continue processing data changes from the leader as they happen
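The snapshot-plus-catch-up procedure above can be sketched as a toy in-memory model (the `Leader`/`Follower` classes and the list-based log are invented for illustration; real systems ship binary log records, not Python tuples):

```python
class Leader:
    """Toy replication log: each write appends a record tagged with a
    monotonically increasing log sequence number (LSN)."""

    def __init__(self):
        self.log = []                        # list of (lsn, key, value)

    def write(self, key, value):
        lsn = len(self.log) + 1
        self.log.append((lsn, key, value))
        return lsn

    def snapshot(self):
        """Consistent snapshot of current state plus the LSN it reflects."""
        state = {}
        for _, key, value in self.log:
            state[key] = value
        return state, len(self.log)

    def changes_since(self, lsn):
        return [entry for entry in self.log if entry[0] > lsn]

class Follower:
    def __init__(self, leader):
        # Steps 1-2: copy the snapshot and remember its log position.
        self.data, self.lsn = leader.snapshot()
        self.leader = leader

    def catch_up(self):
        # Steps 3-4: request and apply everything after the snapshot LSN.
        for lsn, key, value in self.leader.changes_since(self.lsn):
            self.data[key] = value
            self.lsn = lsn
```

Because the snapshot carries its LSN, writes that happen on the leader while the snapshot is being copied are not lost; the follower simply replays everything after that position.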
  • 47. Handling node outages • Leader failure handling is trickier than follower failure handling: • Determining that the leader has failed • Timeout is the most popular strategy to detect a leader's failure (nodes bounce messages back and forth between each other, and when a node doesn't respond for, say, 30 secs it is assumed to be dead) • Choosing a new leader • Either an election process or a previously chosen controller node. The best candidate is usually the replica with the most up-to-date data changes from the old leader • Reconfiguring the system to use the new leader • Using request routing, clients now send writes to the new leader. When the old leader comes back, the system has to ensure that it becomes a follower and recognises the new leader • Failover is subject to things that may go wrong • Async replication: the new leader may not have all the writes from the old leader • If the old leader rejoins the cluster, what should happen to those writes? • The new leader may have received conflicting writes in the meantime! • Commonly these writes are discarded, which has its own problems • Violation of clients' durability expectations • Dangerous situations may arise if other storage systems outside of the database need to be coordinated with the database contents • Ex. the GitHub incident when an out-of-date MySQL follower was promoted to leader. Some auto-increment primary keys were reused by the old and new leaders. The same keys were used in a Redis store, which resulted in private data of some users being shared with other users • Split brain: 2 nodes both believing they are the leader • Sometimes this leads to shutdown of both nodes • Timeout: picking the right timeout for a leader to be declared dead • A short timeout can lead to unnecessary failovers • A long timeout delays recovery; and if slow responses are due to network load or a traffic spike, an unnecessary failover during such a situation can make things worse Due to the unavailability of easy solutions to these problems, most ops teams prefer manual failover even if the software supports auto failover
  • 48. Implementations of replication logs
Statement based:
• Every write statement (insert, update, delete) is logged and forwarded to the followers; the follower parses and executes the statement as if it were received from a client
• Cons: a statement that calls a non-deterministic function like NOW() or RAND() is likely to generate a different value on the replica; auto-increment columns may behave differently if statements are executed in a different order
• Used in MySQL before ver 5.1; MySQL now switches to row-based replication whenever any non-determinism is present in a statement
Write-ahead log (WAL) shipping:
• The database's own log is used to build a replica on another node; both log-structured storage engines and B-trees use a log in some way to store data
• Cons: the log describes data at a very low level, including details like which bytes were changed on which disk block. This creates tight coupling with the storage engine, so a zero-downtime upgrade of the database software (first upgrading followers, then making one of them the leader) is not possible
• Used in PostgreSQL and Oracle
Logical (row-based) log:
• Uses different log formats for replication and for the storage engine; a transaction that modifies several rows generates several such log records
• Pros: decouples the replication log from the storage engine, so a zero-downtime upgrade is possible; the logical log can also be sent to external systems such as data warehouses, custom indexes and caches
• MySQL's binlog, when configured for row-based replication, uses this approach
Trigger based:
• Involves application code; replication is moved up to the application layer, using triggers and stored procedures. Useful when only a subset of the data is to be replicated, or to replicate from one kind of database to another
• Cons: greater overhead than the other replication methods, more prone to bugs and limitations
• Ex. Databus for Oracle and Bucardo for Postgres
Ref: http://www.benstopford.com/2009/11/24/understanding-the-shared-nothing-architecture/
  • 49. Multi leader replication • A somewhat retrofitted feature in many databases • Often it causes pitfalls and problems with other database features • Auto incrementing keys • Triggers • Integrity constraints • Multi leader replication is often considered a dangerous territory that should be avoided if possible
  • 50. Use cases for Multi leader replication • Multi-datacenter operation • Performance is better as every write can be processed in the local/nearest datacenter • Datacenter outages can be better tolerated • Network problems can be better tolerated • Clients with offline operation • Calendar apps on mobile phones, laptops and other devices need to allow create/edit/view of calendar events even when not connected to the internet • All offline changes need to be synced with the server and other devices when the device is next online • Each device's local database acts as a leader, and there is an async multi-leader replication process between the replicas of the calendar on all devices • There is a rich history of broken calendar sync implementations, hence multi-leader replication is a tricky thing to get right • CouchDB is designed for making this use case easier • Collaborative editing • Google Docs, Etherpad • Changes are instantly applied to the local replica and asynchronously replicated to the server and to other users editing the same document • To avoid conflicts, each user can obtain a lock before editing the document • For faster collaboration, the unit of change is made very small, e.g. a single keystroke
  • 51. Conflict resolution - in multi leader replication • Custom conflict resolution • On write • Bucardo works this way • When a conflict is detected, the database calls a conflict handler; in Bucardo it can be a Perl script • The handler runs in the background and cannot prompt the user • On read • CouchDB works this way • When a conflict is detected, all conflicting writes are stored • The next time the data is read, all the versions of the data are returned to the application code • The application may prompt the user, or resolve the conflict and write the result back to the database • Automatic conflict resolution • Used by Amazon • Frequently products that were removed from the cart reappear in the cart due to errors in the conflict resolution logic
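The on-read approach can be sketched as follows (a toy model in the spirit of CouchDB's conflict siblings; the `SiblingStore` class and method names are invented): conflicting writes are kept side by side, a read hands all versions to the application, and the application writes the resolved value back.

```python
class SiblingStore:
    """Toy read-time conflict handling: concurrent writes for a key are
    kept as siblings; a read returns all of them, and the application
    resolves the conflict and writes the merged value back."""

    def __init__(self):
        self.siblings = {}   # key -> list of conflicting values

    def write_conflicting(self, key, value):
        # Simulates a write that conflicted with an existing one.
        self.siblings.setdefault(key, []).append(value)

    def read(self, key):
        return list(self.siblings.get(key, []))

    def resolve(self, key, merge):
        # Application-supplied merge function collapses the siblings.
        merged = merge(self.read(key))
        self.siblings[key] = [merged]
        return merged
```

Note how a naive union-style merge can never express a removal: an item deleted in one sibling reappears after the merge, which is the same class of bug behind the slide's "removed products reappear in the cart" anomaly.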
  • 52. Ensuring Consistency in Multi-leader Replication • Pessimistic Locking • Wait for your turn • Optimistic Locking • Early bird gets the worm • Conflict Resolution • Your mother cleans up later • Conflict Avoidance • Solve the problem by not having it #https://www.percona.com/live/mysql-conference-2013/sessions/state-art-mysql-multi-master-replication slide 7
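Optimistic locking ("early bird gets the worm") can be sketched with a simple version-number check (hypothetical class, not any particular database's implementation): an update succeeds only if the row's version has not moved since the caller read it; a loser must re-read and retry.

```python
class VersionedStore:
    """Toy optimistic locking: every row carries a version number; an
    update only succeeds if the caller still holds the current version,
    otherwise it must re-read and retry."""

    def __init__(self):
        self.rows = {}   # key -> (version, value)

    def read(self, key):
        return self.rows.get(key, (0, None))

    def update(self, key, expected_version, value):
        version, _ = self.rows.get(key, (0, None))
        if version != expected_version:
            return False                     # someone got there first: retry
        self.rows[key] = (version + 1, value)
        return True
```

Unlike pessimistic locking ("wait for your turn"), no one blocks; contention shows up as failed updates that the application retries, which works well when conflicts are rare.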
  • 53. Microservices at Uber • Microservices bring benefits like • Each team owning their own release cycles • Each team responsible for their own uptime • Microservices have challenges like • The aggregate velocity can be much slower, e.g. the Java team has to figure out how to talk to the metrics system, and so do the Node and Go people • A hard-fought bug on one platform has to be fought again on another platform • "I hadn't expected the cost of multiple languages to be as high as it was" - Matt Ranney (Uber's Chief System Architect) • Present in lots of datacenters around the world • TLS termination at the front end • Riak clusters manage the state of all in-progress jobs • Completed jobs travel from Marketplace to other logic systems through Kafka • Marketplace - the dispatch system which supports all sorts of logistics including rides, UberEATS etc., written in Node.js, Java and Go • Other queues execute other workflows, e.g. prompting the user to get the receipt and rate the trip • Map services compute the ETAs and routes for the trip; some high-throughput systems are written in Java • All Kafka streams go to Hadoop for analytical processing • Moving towards type-safe and verifiable interfaces between services, as the cost of type-unsafe JSON is too high • A lot of early code used JSON over HTTP, which makes it hard to validate interfaces • An army of mobile phones around the world does black-box testing
  • 54. Death of MapReduce at Google