Dynamo db

Dynamo:
Amazon’s Highly Available Key-value Store
&
Amazon DynamoDB

Presented by:

Zuhair Khayyat

What is Dynamo
● Dynamo is an eventually-consistent key-value storage
system used in Amazon's web services to support scalable
highly available data access.
● Dynamo is used mainly to manage the state of services,
such as S3 and e-commerce.
● Optimized for availability (an always-on experience) to
maximize customer satisfaction, at the expense of:
– Data consistency
– Durability
– Performance
Dynamo & DynamoDB

Dynamo: Why not relational database
● Many services on Amazon’s platform that have high
reliability requirements need only primary-key access to a
data store.
● Relational databases are highly optimized for complex
query processing; however, they have limited scalability
and choose consistency over availability.
● The complicated features of relational databases require
expensive hardware and highly skilled administrators.


Dynamo: Amazon's Requirements
● Simple reads and writes to binary objects no larger than
1 MB; no operation spans multiple data items.
● Very fast data access (< 300 ms response time).
● Heterogeneous commodity hardware infrastructure.
● Used by decentralized, loosely coupled services.
● Highly available (always on); expect small frequent
network and server failures.


Dynamo: Consistency and Replication
● Strong data consistency and high data availability cannot
be achieved simultaneously.
● “Dynamo is designed to be an eventually consistent data
store; that is all updates reach all replicas eventually.”
● An “always writable” data store: write operations are not
rejected even when the data is inconsistent.
– Imagine you are ordering from Amazon.com and the
website rejects adding an item to your cart!
● Conflict resolution: the application is responsible for
resolving data conflicts.

Dynamo vs. Bigtable
                          Dynamo                        Bigtable
Cluster setup             Decentralized                 Centralized (GFS)
Data access               (primary key, version*)       (row key, col key, timestamp)
Data partitioning and     Customized consistent         64K partitions stored in least
load balancing            hashing                       utilized machines (GFS)
Data query                Zero-hop DHT                  Ask the master (GFS)
Read operation            Multiple copies read          Single copy read
Typical value size        Less than 1 MB                Not specified (GFS)
Write operation on        Accept all write operations   Make data unavailable until
inconsistent data         and resolve conflicts         consistent (GFS)


Dynamo: Interface
● Key-value storage system with operators:
– get(key): returns a single object or a list of objects with
conflicting versions.
– put(key,context,object): places the object and writes its
replicas to disk. The context contains information about the
object, such as its version.
● MD5 hashing is applied to the key to generate a 128-bit
identifier.
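
The key-to-identifier step can be sketched in a few lines of Python. This is only an illustration of hashing a key with MD5 into a 128-bit integer; the function name `key_to_ring_id` and the sample key are assumptions made for this example, not part of Dynamo's interface.

```python
import hashlib


def key_to_ring_id(key: bytes) -> int:
    """Map a key to a 128-bit integer identifier using MD5 (hypothetical helper)."""
    digest = hashlib.md5(key).digest()  # MD5 always yields 16 bytes = 128 bits
    return int.from_bytes(digest, "big")


# Hypothetical key; any byte string works.
ring_id = key_to_ring_id(b"cart:12345")
print(f"{ring_id:032x}")  # the identifier as 32 hex digits
```

The same key always maps to the same identifier, which is what lets every node locate a key's position without a central directory.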


Dynamo: Partitioning
● Dynamo is designed to scale incrementally one machine
at a time.
● Consistent hashing generates a fixed output space
constructed as a ring.
● A variant of consistent hashing (virtual nodes) is used
by Dynamo to dynamically repartition and load balance
the data over the storage hosts.
● Each storage host acquires data depending on its
capacity.
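
A minimal sketch of consistent hashing with virtual nodes, assuming each host places one token on the ring per unit of capacity. The `Ring` class, the `host#i` token naming, and the host capacities are all assumptions chosen for illustration; Dynamo's actual token assignment differs in detail.

```python
import bisect
import hashlib


def ring_hash(data: str) -> int:
    """Position on the ring: the MD5 digest read as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(data.encode()).digest(), "big")


class Ring:
    def __init__(self) -> None:
        self._tokens = []  # sorted list of (position, host)

    def add_host(self, host: str, vnodes: int) -> None:
        # More virtual nodes means a larger share of the key space.
        for i in range(vnodes):
            bisect.insort(self._tokens, (ring_hash(f"{host}#{i}"), host))

    def owner(self, key: str) -> str:
        # First token clockwise from the key's position, wrapping around.
        idx = bisect.bisect_left(self._tokens, (ring_hash(key), ""))
        return self._tokens[idx % len(self._tokens)][1]


ring = Ring()
for host, capacity in [("A", 8), ("B", 8), ("C", 16)]:
    ring.add_host(host, vnodes=capacity)

keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}
ring.add_host("D", vnodes=8)  # scale out by one machine
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to D")
```

Only keys that now fall on the new host's tokens change owner; every other key stays where it was, which is the property that makes incremental scaling cheap.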


Dynamo: Consistent Hashing
[Figure: consistent hashing ring; each storage host owns one key range]
Before:
  A [1,10], D [11,20], E [21,30], B [31,40], F [41,50], C [51,60], G [61,70], H [71,80]
After adding a node (storage host) I:
  A [1,10], D [11,20], E [21,30], B [31,40], F [41,46], I [47,54], C [55,60], G [61,70], H [71,80]

Dynamo: Variant of Consistent Hashing
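[Figure: ring with virtual nodes; starred labels such as A* appear to mark a second virtual node of the same host]
Before:
  A [1,10], D [11,20], C* [21,30], B [31,40], A* [41,50], C [51,60], B* [61,70], D* [71,80]
After adding a node (storage host) E:
  A [1,10], D [11,16], E [17,24], C* [25,30], B [31,40], A* [41,46], E* [47,54], C [55,60], B* [61,70], D* [71,80]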

Dynamo: Replication
● Each key (k) is assigned to a coordinator node (i).
● Each value (v) is replicated to (N-1) clockwise
successor logical nodes in the ring.
● Node (i) is responsible for updating all other (N-1)
replicas for the keys it owns.
● Each key (k) has a preference list of physical
nodes that are responsible for maintaining and serving
the key's data.
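
One way to picture a preference list: start at the key's coordinator and walk clockwise, skipping virtual nodes whose physical host is already in the list. The sketch below assumes the ring from the virtual-node example earlier in the slides, with each token placed at the upper bound of its range; `TOKENS` and `preference_list` are names invented for this illustration.

```python
import bisect

# (upper bound of key range, physical host), read off the virtual-node ring
# in the slides before host E is added. Treat this layout as an assumption.
TOKENS = [(10, "A"), (20, "D"), (30, "C"), (40, "B"),
          (50, "A"), (60, "C"), (70, "B"), (80, "D")]


def preference_list(position: int, n: int) -> list:
    """Return up to n distinct physical hosts, clockwise from the key's position."""
    start = bisect.bisect_left(TOKENS, (position, ""))
    hosts = []
    for step in range(len(TOKENS)):
        host = TOKENS[(start + step) % len(TOKENS)][1]
        if host not in hosts:  # skip further virtual nodes of a host already chosen
            hosts.append(host)
        if len(hosts) == n:
            break
    return hosts


print(preference_list(45, 3))  # ['A', 'C', 'B']
print(preference_list(75, 3))  # ['D', 'A', 'C'], the second D token is skipped
```

Skipping repeated hosts matters because N replicas on N virtual nodes of one machine would not survive that machine failing.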

Dynamo: Data Versioning
● Eventual consistency protocol is used to update all
data replicas asynchronously.
● put() returns before all replicas are updated.
● get() can return multiple versions for the same key.
● Dynamo tracks each data mutation as a new version
to support the “write always” protocol.
● Dynamo uses vector clocks for versioning.


Dynamo: vector clocks example 1
Updates are handled by nodes A, B, and C; each version is shown with its vector clock.

Without conflict:
  A: write            Value=100   (A:1)
  B: +1               Value=101   (A:1,B:1)
  C: +4               Value=105   (A:1,B:1,C:1)
  B: +100             Value=205   (A:1,B:2,C:1)
  C: +110             Value=315   (A:1,B:2,C:2)

With conflict:
  A: write            Value=100   (A:1)
  B: +1               Value=101   (A:1,B:1)
  C: +4 (on 101)      Value=105   (A:1,B:1,C:1)
  B: +100 (on 101)    Value=201   (A:1,B:2)
  C: +110 (on 201)    Value=311   (A:1,B:2,C:1)
  C: +110 (on 105)    Value=215   (A:1,B:1,C:2)
  Conflict!
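
The conflict check in this example can be sketched as a component-wise comparison of vector clocks. The clocks below are the ones from the slide; the function names `descends` and `compare` are assumptions for this illustration, not Dynamo's API.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    if a == b:
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "conflict"  # concurrent updates: neither clock dominates


# No conflict: 315 (A:1,B:2,C:2) was derived from 205 (A:1,B:2,C:1).
print(compare({"A": 1, "B": 2, "C": 2}, {"A": 1, "B": 2, "C": 1}))

# Conflict: 311 (A:1,B:2,C:1) and 215 (A:1,B:1,C:2) diverged.
print(compare({"A": 1, "B": 2, "C": 1}, {"A": 1, "B": 1, "C": 2}))
```

In the second case B's counter is higher in one clock and C's counter is higher in the other, so neither version can be discarded automatically.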

Dynamo: resolving conflicts
● Syntactic reconciliation:
– The system resolves the conflict automatically when one
version's vector clock supersedes the other's.
● Semantic reconciliation:
– The application merges the conflicting versions; the user
may need to revise the merged values.
– Example: Amazon's shopping cart:
● Preserve “Add to cart” items.
● Deleted items can resurface.
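
A minimal sketch of semantic reconciliation for the shopping cart, assuming the merge is a plain set union of the conflicting versions. The function `merge_carts` and the cart contents are made up for this illustration.

```python
def merge_carts(versions: list) -> set:
    """Union of all conflicting cart versions: no added item is ever lost."""
    merged = set()
    for cart in versions:
        merged |= cart
    return merged


# Both branches started from the cart {"book", "pen"}.
branch_1 = {"book", "bike"}         # removed "pen", added "bike"
branch_2 = {"book", "pen", "lamp"}  # added "lamp", never saw the removal

merged = merge_carts([branch_1, branch_2])
print(sorted(merged))  # ['bike', 'book', 'lamp', 'pen']
```

Both additions survive the merge, but the deleted "pen" resurfaces because the union cannot tell a removal from an item that was never seen.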


Dynamo: Processing put() & get()
● The user can issue commands in either of the
following ways:
– A generic load balancer directs the user's
requests to the least utilized node.
– A partition-aware library directs the request to one of
the data owners directly.
● The system requires two configurable values:
– R: the number of available healthy nodes required for a
successful read.
– W: the number of available healthy nodes required for a
successful write.
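
How R and W interact with the replication factor N can be checked by brute force. The sketch below enumerates every possible read set and write set and tests whether they always share a node, which happens exactly when R + W > N. The function name `always_overlap` is an assumption for this example.

```python
from itertools import combinations


def always_overlap(n: int, r: int, w: int) -> bool:
    """True if every set of r readers shares a node with every set of w writers."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 2), (3, 1, 1), (3, 1, 3)]:
    print((n, r, w), always_overlap(n, r, w), r + w > n)
```

Lower R and W values reduce latency and raise availability, at the cost of reads that may miss the latest write.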

Dynamo: Hinted Handoff
● Assuming N=3, a put() operation that fails on node A is
temporarily handled by B.
● After A recovers, B sends the result of the put() operation
back to A.
● Advantage: a temporary failure has minimal effect on the
application.
[Figure: ring of nodes A, B, C, D holding replicas A', A'', C', D']
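
The hinted handoff flow can be sketched with two in-memory nodes. Everything here (the `Node` class, `put_with_handoff`, `deliver_hints`, and the sample key) is hypothetical and only mirrors the steps described on the slide: B keeps the write together with a hint naming A, then hands it back once A is reachable.

```python
class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.store = {}  # key -> value owned by this node
        self.hints = []  # (intended owner name, key, value) held for others


def put_with_handoff(key, value, owner: Node, fallback: Node) -> None:
    if owner.up:
        owner.store[key] = value
    else:
        # Keep the write, tagged with the node it was meant for.
        fallback.hints.append((owner.name, key, value))


def deliver_hints(holder: Node, nodes_by_name: dict) -> None:
    remaining = []
    for owner_name, key, value in holder.hints:
        owner = nodes_by_name[owner_name]
        if owner.up:
            owner.store[key] = value  # hand the write back
        else:
            remaining.append((owner_name, key, value))
    holder.hints = remaining


a, b = Node("A"), Node("B")
nodes = {"A": a, "B": b}

a.up = False
put_with_handoff("cart:1", ["book"], owner=a, fallback=b)  # accepted by B
a.up = True
deliver_hints(b, nodes)
print(a.store, b.hints)  # {'cart:1': ['book']} []
```

The write is accepted while A is down, so the caller never sees the failure.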

Dynamo: Scalability
● Adding or removing a node requires a third-party tool
or direct user interaction.
● A gossip-based protocol is used to propagate membership
throughout the cluster and to detect failures.
● Replica synchronization is done using Merkle hash trees.
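
A minimal sketch of how a Merkle hash tree lets two replicas compare a key range by exchanging a single root hash. The choice of SHA-256, the `key=value` leaf encoding, and the function names are assumptions for this illustration; the slides do not specify them.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(items: dict) -> bytes:
    """Root hash over the key/value pairs, taken in sorted key order."""
    level = [_h(f"{k}={v}".encode()) for k, v in sorted(items.items())]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last hash
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


replica_1 = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_2 = {"k3": "v3", "k1": "v1", "k2": "v2"}      # same data, other order
replica_3 = {"k1": "v1", "k2": "stale", "k3": "v3"}   # one divergent value

print(merkle_root(replica_1) == merkle_root(replica_2))  # True: in sync
print(merkle_root(replica_1) == merkle_root(replica_3))  # False: needs repair
```

When the roots differ, the replicas would compare child hashes level by level to find the divergent keys, so only those keys need to be transferred.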


Dynamo: Peak Performance
● Shopping cart service during a holiday peak:
– 10 million requests
– 3 million checkouts
– 100,000+ concurrent sessions
– No downtime!


DynamoDB

What is DynamoDB
● A NoSQL database service available publicly through
Amazon Web Services (AWS); released in 2012.
● Based on Dynamo, a scalable, highly available (key,
value) storage system used by Amazon's servers;
published at SOSP 2007.


DynamoDB: Data Model
● The database is a collection of tables.
● A table is a collection of items.
● An item is a collection of attributes.
● Primary key is required.
● No nulls or empty strings.
● No schema is required; items can vary in the number of
attributes. How is this possible?


DynamoDB: Example
● Table name: ProductCatalog
{ Id = 101
ProductName = "Book 101 Title"
ISBN = "111-1111111111"
Authors = [ "Author 1","Author 2" ]
Price = -2
Dimensions = "8.5 x 11.0 x 0.5"
PageCount = 500
InPublication = 1
ProductCategory = "Book"
}
{ Id = 201
ProductName = "18-Bicycle 201"
Description = "201 description"
BicycleType = "Road"
Brand = "Brand-Company A"
Price = 100
Gender = "M"
Color = [ "Red", "Black" ]
ProductCategory = "Bike"
}
{ Id = 202
ProductName = "21-Bicycle 202"
Description = "202 description"
BicycleType = "Road"
Brand = "Brand-Company A"
Price = 200
Gender = "M"
Color = [ "Green", "Black" ]
ProductCategory = "Bike"
}

DynamoDB: Example
● Storage in Dynamo:
– <Table_List, {ProductCatalog,....}>
– <ProductCatalog, {101,102,201,202}>
– <101, {ProductName={},ISBN={},Authors={}...}>
– or –
– <Table_List, {ProductCatalog,....}>
– <ProductCatalog, {101,102,201,202}>
– <101, {ProductName,ISBN,Authors...}>
– <101_Authors,{Author 1,Author 2}>

DynamoDB: Table Primary Keys
● A table in DynamoDB must have a primary key.
● A primary key can be either “hash only” or “hash and range”.
● DynamoDB uses an unsorted hash index, while the range index
is sorted.
● A hash-only primary key is based on a single attribute.
● A hash-and-range primary key is based on two attributes.
● Data types:
– Scalar data types: Number, String, and Binary.
– Multi-valued types: String Set, Number Set, and Binary Set.
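
An in-memory sketch of a hash-and-range primary key, assuming an unsorted lookup on the hash attribute and a sorted list of range values per hash value. This is not the DynamoDB API; the `Table` class, its methods, and the `CustomerId` / `OrderDate` attributes are invented for this illustration.

```python
import bisect
from collections import defaultdict


class Table:
    def __init__(self, hash_key: str, range_key: str) -> None:
        self.hash_key, self.range_key = hash_key, range_key
        self._range_values = defaultdict(list)  # hash value -> sorted range values
        self._items = {}                        # (hash value, range value) -> item

    def put_item(self, item: dict) -> None:
        h, r = item[self.hash_key], item[self.range_key]
        if (h, r) not in self._items:
            bisect.insort(self._range_values[h], r)  # keep range values sorted
        self._items[(h, r)] = item  # same primary key: overwrite

    def query(self, h, low, high) -> list:
        """Items with hash value h and low <= range value <= high, in range order."""
        values = self._range_values.get(h, [])
        lo = bisect.bisect_left(values, low)
        hi = bisect.bisect_right(values, high)
        return [self._items[(h, r)] for r in values[lo:hi]]


orders = Table(hash_key="CustomerId", range_key="OrderDate")
orders.put_item({"CustomerId": 7, "OrderDate": "2012-01-18", "Total": 100})
orders.put_item({"CustomerId": 7, "OrderDate": "2012-03-02", "Total": 200, "Gift": "yes"})
orders.put_item({"CustomerId": 9, "OrderDate": "2012-02-11"})  # fewer attributes

march = orders.query(7, "2012-03-01", "2012-03-31")
print(march)
```

Only the two key attributes are mandatory; the other attributes vary freely from item to item, matching the schema-less data model.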

DynamoDB: Read operation
● Availability and durability are maintained through data
replication.
● Updating all the replicas after data mutation requires some
latency; DynamoDB eventually will synchronize all the replicas.
● DynamoDB supports two read operations:
– Eventually consistent read
● Does not necessarily reflect the last data mutation.
● Very fast data access; not affected by failures.

– Consistent read
● Always reflects the last successful data mutation.
● Waits for data to be consistent in all replicas; affected by
network and storage failures.

DynamoDB: Similar services
● Datastore on Google App Engine
● Cloudant Data Layer (CouchDB)


DynamoDB: try it today

