Dynamo:
Amazon’s Highly Available Key-value Store
                   &
          Amazon DynamoDB


             Presented by:

             Zuhair Khayyat
What is Dynamo
●   Dynamo is an eventually-consistent key-value storage
    system used in Amazon's web services to support scalable
    highly available data access.
●   Dynamo is mainly used to manage the state of services,
    such as S3 and e-commerce.
●   Optimized for availability (an “always on” experience) to
    maximize customer satisfaction, at the expense of:
         –   Data consistency
         –   Durability
         –   Performance
                                Dynamo & DynamoDB
Dynamo: Why not relational database
●   Many services on Amazon’s platform have high reliability
    requirements but only need primary-key access to a
    data store.
●   Relational databases are highly optimized for complex
    query processing; however, they have limited scalability
    and choose consistency over availability.
●   The complicated features of relational databases require
    expensive hardware and highly skilled administrators.



Dynamo: Amazon's Requirements
●   Simple reads and writes to binary objects no larger than
    1 MB; no operation spans multiple data items.
●   Very fast data access (response time under 300 ms).
●   Heterogeneous commodity hardware infrastructure.
●   Used by decentralized, loosely coupled services.
●   Highly available (always on); small but frequent
    network and server failures are expected.


Dynamo: Consistency and Replication
●   Strong data consistency and high data availability cannot
    be achieved simultaneously.
●   “Dynamo is designed to be an eventually consistent data
    store; that is all updates reach all replicas eventually.”
●   An “always writable” data store: write operations are not
    rejected even when the data is inconsistent.
         –   Imagine you are ordering from Amazon.com and the
               website rejects adding an item to your cart!
●   Conflict resolution: the application is responsible for
    resolving data conflicts.
Dynamo VS Bigtable
                              Dynamo                          Bigtable
Cluster setup                 Decentralized                   Centralized (GFS)
Data access                   (primary key, version*)         (row key, column key, timestamp)
Data partitioning and         Customized consistent           64K partitions stored in least
load balancing                hashing                         utilized machines (GFS)
Data query                    Zero-hop DHT                    Ask the master (GFS)
Read operation                Multiple copies read            Single copy read
Typical value size            Less than 1 MB                  Not specified (GFS)
Write operations on           Accept all write operations     Make data unavailable until
inconsistent data             and resolve conflicts           consistent (GFS)




Dynamo: Interface
●   Key-value storage system with two operations:
        –   get(key): returns a single object, or a list of objects
              with conflicting versions.
        –   put(key, context, object): places the object and writes
              its replicas to disk. The context contains information
              about the object, such as its version.
●   MD5 hashing is applied to the key to generate a 128-bit
    identifier.
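
The hashing step can be sketched with Python's standard hashlib. This is only an illustration: the function name key_to_identifier and the sample key are assumptions, not part of Dynamo's interface.

```python
import hashlib

def key_to_identifier(key: bytes) -> int:
    """Hash a key with MD5 and return the digest as a 128-bit integer."""
    digest = hashlib.md5(key).digest()      # 16 bytes = 128 bits
    return int.from_bytes(digest, "big")

if __name__ == "__main__":
    ident = key_to_identifier(b"shopping-cart:42")   # hypothetical key
    print(f"{ident:032x}")                           # 32 hex digits
```

The integer can then be treated as the key's position in the hash output space.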



Dynamo: Partitioning
●   Dynamo is designed to scale incrementally one machine
    at a time.
●   Consistent hashing maps keys onto a fixed output space
    that is treated as a ring.
●   A variant of consistent hashing (virtual nodes) is used
    by Dynamo to dynamically repartition and load balance
    the data over the storage hosts.
●   Each storage host acquires data depending on its
    capacity.
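
A minimal sketch of consistent hashing with virtual nodes, under stated assumptions: the class HashRing, the "host#i" token labels, and the token counts are all hypothetical, and MD5 is reused only because the slides mention it for keys. It is not Dynamo's actual implementation.

```python
import bisect
import hashlib

def ring_position(label: str) -> int:
    """Map a label onto the ring using MD5 (128-bit output space)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest(), "big")

class HashRing:
    def __init__(self):
        self._positions = []   # sorted token positions on the ring
        self._owner = {}       # token position -> physical host

    def add_host(self, host: str, tokens: int) -> None:
        """Place `tokens` virtual nodes for `host`; more tokens, more data."""
        for i in range(tokens):
            pos = ring_position(f"{host}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = host

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the next token."""
        pos = ring_position(key)
        idx = bisect.bisect_left(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]

if __name__ == "__main__":
    ring = HashRing()
    ring.add_host("A", tokens=8)
    ring.add_host("B", tokens=16)   # assumed to have twice A's capacity
    print(ring.lookup("cart:42"))
```

When a host is added, a key either keeps its owner or moves to the new host; no key moves between two existing hosts, which is what allows incremental scaling.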

Dynamo: Consistent Hashing
(Figure: a hash ring before and after adding node I, a storage host.
Each host owns the key range listed below.)

      Host      Range before      Range after
      A         [1,10]            [1,10]
      D         [11,20]           [11,20]
      E         [21,30]           [21,30]
      B         [31,40]           [31,40]
      F         [41,50]           [41,46]
      I         (not present)     [47,54]
      C         [51,60]           [55,60]
      G         [61,70]           [61,70]
      H         [71,80]           [71,80]
Dynamo: Variant of Consistent Hashing
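(Figure: a hash ring with virtual nodes before and after adding a storage
host with two virtual nodes, E and E*. A starred name is a second virtual
node of the same host.)

      Virtual node   Range before      Range after
      A              [1,10]            [1,10]
      D              [11,20]           [11,16]
      E              (not present)     [17,24]
      C*             [21,30]           [25,30]
      B              [31,40]           [31,40]
      A*             [41,50]           [41,46]
      E*             (not present)     [47,54]
      C              [51,60]           [55,60]
      B*             [61,70]           [61,70]
      D*             [71,80]           [71,80]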
Dynamo: Replication
●   Each key (k) is assigned to a coordinator node (i).
●   Each value (v) is replicated to the (N-1) clockwise
    successor logical nodes in the ring.
●   Node (i) is responsible for updating the other (N-1)
    replicas of the keys it owns.
●   Each key (k) has a preference list of physical
    nodes that are responsible for storing and serving
    the key's data.
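
One way to build a preference list is to walk clockwise from the key's position and collect distinct physical hosts. The sketch below assumes each virtual node sits at the upper end of its range, using the eight ranges from the "Variant of Consistent Hashing" slide before a node is added; the function name preference_list is hypothetical.

```python
import bisect

def preference_list(tokens, key_pos, n):
    """Return the first n distinct hosts clockwise from key_pos.

    tokens: list of (position, host) pairs sorted by position.
    """
    positions = [pos for pos, _ in tokens]
    start = bisect.bisect_left(positions, key_pos)
    hosts = []
    for step in range(len(tokens)):
        _, host = tokens[(start + step) % len(tokens)]
        if host not in hosts:          # skip extra virtual nodes of a host
            hosts.append(host)
        if len(hosts) == n:
            break
    return hosts

if __name__ == "__main__":
    # A [1,10], D [11,20], C* [21,30], B [31,40],
    # A* [41,50], C [51,60], B* [61,70], D* [71,80]
    tokens = [(10, "A"), (20, "D"), (30, "C"), (40, "B"),
              (50, "A"), (60, "C"), (70, "B"), (80, "D")]
    print(preference_list(tokens, key_pos=45, n=3))   # ['A', 'C', 'B']
```

The first host in the list acts as the coordinator; the remaining N-1 hold the replicas.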
Dynamo: Data Versioning
●   An eventual consistency protocol is used to update all
    data replicas asynchronously.
●   put() may return before all replicas are updated.
●   get() can return multiple versions for the same key.
●   Dynamo tracks each data mutation as a new version
    to support the “write always” protocol.
●   Dynamo uses vector clocks for versioning.

Dynamo: vector clocks example 1
(Figure: a value updated in turn by nodes A, B, and C.)

      Step   Node   Update   Value   Vector clock
      1      A      write    100     A:1
      2      B      +1       101     A:1,B:1
      3      C      +4       105     A:1,B:1,C:1
      4      B      +100     205     A:1,B:2,C:1
      5      C      +110     315     A:1,B:2,C:2

Each clock descends from the previous one, so no conflict arises.
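Dynamo: vector clocks example 2
(Figure: the same value, but two branches are updated concurrently.)

      Update   Applied to   Value   Vector clock
      write    -            100     A:1
      +1       100          101     A:1,B:1
      +100     101          201     A:1,B:2
      +4       101          105     A:1,B:1,C:1
      +110     201          311     A:1,B:2,C:1
      +110     105          215     A:1,B:1,C:2

Conflict! Neither A:1,B:2,C:1 nor A:1,B:1,C:2 descends from the other.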
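
The conflict in example 2 can be checked mechanically. The sketch below represents a vector clock as a dict from node name to counter; the helper names descends and compare are illustrative, not Dynamo's API.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen everything clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    """Classify two vector clocks as equal, ordered, or conflicting."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "conflict"

if __name__ == "__main__":
    v311 = {"A": 1, "B": 2, "C": 1}   # clock of Value=311
    v215 = {"A": 1, "B": 1, "C": 2}   # clock of Value=215
    print(compare(v311, v215))        # conflict
```

In example 1 every new clock supersedes the previous one, so the older version can simply be discarded.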
Dynamo: resolving conflicts
●   Syntactic reconciliation:
         –   The system resolves the conflict automatically when one
               version's vector clock supersedes the other.
●   Semantic reconciliation:
         –   The application merges the conflicting versions; the user
               may have to revise the merged values.
         –   Example: Amazon's shopping cart:
                   ●   “Add to cart” items are preserved.
                   ●   Deleted items can resurface.
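
A rough sketch of the shopping-cart merge, assuming a cart is modeled as a set of item names and that reconciliation is a plain set union. The function name merge_carts and the items are hypothetical.

```python
def merge_carts(versions):
    """Union of all conflicting cart versions: no added item is lost."""
    merged = set()
    for cart in versions:
        merged |= cart
    return merged

if __name__ == "__main__":
    base = {"book", "bike"}
    replica_1 = base | {"helmet"}      # the customer adds a helmet
    replica_2 = base - {"bike"}        # concurrently, the bike is deleted
    print(sorted(merge_carts([replica_1, replica_2])))
    # ['bike', 'book', 'helmet']: the deleted bike resurfaces
```

The union keeps every added item, which is why a concurrent delete can be undone by the merge.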



Dynamo: Processing put() & get()
●   A client can issue requests in either of the
    following ways:
         –   Through a generic load balancer that directs the client's
               requests to the least utilized node.
         –   Through a partition-aware library that sends the request
               directly to one of the data owners.
●   The system requires two configurable values:
         –   R: the number of healthy nodes that must take part in a
               successful read.
         –   W: the number of healthy nodes that must take part in a
              successful write.
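
The roles of R and W can be sketched as below. The overlap condition R + W > N is described in the Dynamo paper rather than on this slide, and the function names and the (3, 2, 2) configuration are illustrative only.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """R + W > N means every read set intersects every write set."""
    return r + w > n

def operation_succeeds(acks: int, required: int) -> bool:
    """A read or write succeeds once `required` replicas have answered."""
    return acks >= required

if __name__ == "__main__":
    n, r, w = 3, 2, 2                                # example configuration
    print(quorums_overlap(n, r, w))                  # True
    print(operation_succeeds(acks=1, required=w))    # False: one ack, W=2
```

Lowering R or W reduces latency, since fewer replicas must answer, but weakens the overlap guarantee.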
Dynamo: Hinted Handoff
●   Assuming N=3, a failed put() operation on node A is
    temporarily handled by B.
●   After A recovers, B sends the result of the put() operation
    back to A.
●   Advantage: a temporary failure has minimal effect
    on the application.
(Figure: hash ring with virtual nodes A, D, C', B, A', C, A'', D'.)
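
The handoff described above can be simulated in a few lines. This is a toy model under stated assumptions: the Node class, the hint tuples, and both function names are hypothetical and ignore networking, timeouts, and durability.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}     # key -> value stored on this node
        self.hints = []    # (intended owner, key, value) held for others

def put_with_handoff(owner, fallback, key, value):
    """Write to the owner, or park the write on the fallback with a hint."""
    if owner.up:
        owner.data[key] = value
    else:
        fallback.hints.append((owner.name, key, value))

def deliver_hints(holder, recovered):
    """Hand parked writes back to a recovered node and drop the hints."""
    remaining = []
    for intended, key, value in holder.hints:
        if intended == recovered.name and recovered.up:
            recovered.data[key] = value
        else:
            remaining.append((intended, key, value))
    holder.hints = remaining

if __name__ == "__main__":
    a, b = Node("A"), Node("B")
    a.up = False
    put_with_handoff(a, b, "cart:42", ["book"])   # A is down, B keeps a hint
    a.up = True
    deliver_hints(b, a)                           # A recovers, B hands off
    print(a.data, b.hints)
```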
Dynamo: Scalability
●   Adding or removing a node requires a third-party tool
    or direct user interaction.
●   A gossip-based protocol is used to propagate membership
    throughout the cluster and to detect failures.
●   Replica synchronization is done using Merkle hash trees.
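
A minimal sketch of how a Merkle tree lets two replicas compare a key range by exchanging a single root hash. SHA-256 and the duplicate-last-hash padding are assumptions chosen for illustration; the slides do not say which hash or tree layout Dynamo uses.

```python
import hashlib

def merkle_root(values):
    """Return the root hash of a Merkle tree built over `values` (bytes)."""
    level = [hashlib.sha256(v).digest() for v in values]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:                 # pad odd levels with the last hash
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

if __name__ == "__main__":
    replica_a = [b"k1=v1", b"k2=v2", b"k3=v3"]
    replica_b = [b"k1=v1", b"k2=STALE", b"k3=v3"]
    print(merkle_root(replica_a) == merkle_root(replica_a))   # True
    print(merkle_root(replica_a) == merkle_root(replica_b))   # False
```

When the roots differ, a fuller implementation would compare child hashes level by level to find the out-of-sync keys without transferring the whole range.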




Dynamo: Peak Performance
●   Shopping Cart Service during a holiday:
        –   10 million requests
        –   3 million checkouts
        –   100,000+ concurrent sessions
        –   No downtime!




DynamoDB




What is DynamoDB
●   A NoSQL database service publicly available through
    Amazon Web Services; released in 2012.
●   Based on Dynamo, a scalable, highly available (key,
    value) storage system used by Amazon's servers;
    published at SOSP 2007.




DynamoDB: Data Model
●   The database is a collection of tables.
●   A table is a collection of items.
●   An item is a collection of attributes.
●   A primary key is required.
●   No nulls or empty strings.
●   No schema is required; items can vary in the number of
    attributes. How is this possible?


DynamoDB: Example
●   Table name: ProductCatalog
{       Id = 101
        ProductName = "Book 101 Title"
        ISBN = "111-1111111111"
        Authors = [ "Author 1","Author 2" ]
        Price = -2
        Dimensions = "8.5 x 11.0 x 0.5"
        PageCount = 500
        InPublication = 1
        ProductCategory = "Book"
}
{       Id = 201
        ProductName = "18-Bicycle 201"
        Description = "201 description"
        BicycleType = "Road"
        Brand = "Brand-Company A"
        Price = 100
        Gender = "M"
        Color = [ "Red", "Black" ]
        ProductCategory = "Bike"
}
{       Id = 202
        ProductName = "21-Bicycle 202"
        Description = "202 description"
        BicycleType = "Road"
        Brand = "Brand-Company A"
        Price = 200
        Gender = "M"
        Color = [ "Green", "Black" ]
        ProductCategory = "Bike"
}
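
One way to picture a schema-less table is a mapping from primary key to an item that carries its own attribute names. The sketch below uses plain Python dicts with a subset of the attributes above; the helper attribute_names is hypothetical, and this is an illustration of the idea, not how DynamoDB stores data.

```python
# A table as a dict: primary key (Id) -> item (attribute name -> value).
product_catalog = {
    101: {"ProductName": "Book 101 Title", "ISBN": "111-1111111111",
          "Authors": ["Author 1", "Author 2"], "ProductCategory": "Book"},
    201: {"ProductName": "18-Bicycle 201", "BicycleType": "Road",
          "Color": ["Red", "Black"], "ProductCategory": "Bike"},
}

def attribute_names(table, item_id):
    """Each item defines its own attributes; no table-wide schema exists."""
    return set(table[item_id])

if __name__ == "__main__":
    print(sorted(attribute_names(product_catalog, 101)))
    print(sorted(attribute_names(product_catalog, 201)))
```

Because attribute names travel with each item, a book and a bicycle can share a table while having different attributes.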
DynamoDB: Example
●   Storage in Dynamo:
        –   <Table_List, {ProductCatalog,....}>
        –   <ProductCatalog, {101,102,201,202}>
        –   <101, {ProductName={},ISBN={},Authors={}...}>
                                – or –
        –   <Table_List, {ProductCatalog,....}>
        –   <ProductCatalog, {101,102,201,202}>
        –   <101, {ProductName,ISBN,Authors...}>
        –   <101_Authors,{Author 1,Author 2}>
DynamoDB: Table Primary Keys
●   A table in DynamoDB must have a primary key.
●   A primary key can be either “hash only” or “hash and range”.
●   DynamoDB uses an unsorted hash index, while the range index
    is sorted.
●   A hash-only primary key is based on a single attribute.
●   A hash-and-range primary key is based on two attributes.
●   Data types:
         –   Scalar data types: Number, String, and Binary.
         –   Multi-valued types: String Set, Number Set, and Binary Set.
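
A hash-and-range key can be pictured as an unsorted hash lookup followed by a sorted range lookup. The sketch below is an in-memory illustration only; the class HashRangeTable, its methods, and the forum-style keys are hypothetical and unrelated to the DynamoDB API.

```python
import bisect
from collections import defaultdict

class HashRangeTable:
    def __init__(self):
        # hash key -> list of (range key, item), kept sorted by range key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [rk for rk, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)        # overwrite existing item
        else:
            rows.insert(i, (range_key, item))  # keep range keys sorted

    def query(self, hash_key, low, high):
        """Return items whose range key lies in [low, high]."""
        rows = self._partitions.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        lo = bisect.bisect_left(keys, low)
        hi = bisect.bisect_right(keys, high)
        return [item for _, item in rows[lo:hi]]

if __name__ == "__main__":
    table = HashRangeTable()
    table.put("thread-1", 3, "third reply")
    table.put("thread-1", 1, "first reply")
    table.put("thread-1", 2, "second reply")
    print(table.query("thread-1", 1, 2))   # ['first reply', 'second reply']
```

The hash key picks the partition in one step, while the sorted range keys make range queries within that partition cheap.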
DynamoDB: Read operation
●   Availability and durability are maintained through data
    replication.
●   Updating all the replicas after a data mutation takes some
    time; DynamoDB eventually synchronizes all the replicas.
●   DynamoDB supports two read operations:
         –   Eventually consistent read
                   ●   Does not necessarily reflect the last data mutation.
                   ●   Very fast data access; not affected by failures.
         –   Consistent read
                   ●   Always reflects the last data mutation.
                   ●   Waits for data to be consistent in all replicas; affected by
                        network and storage failures.
DynamoDB: Similar services
●   Datastore on Google App Engine
●   Cloudant Data Layer (CouchDB)




DynamoDB: try it today




       Dynamo & DynamoDB

Mais conteúdo relacionado

Mais procurados

Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and consSaniya Khalsa
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Web Services
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftAmazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationVolodymyr Rovetskiy
 
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceReal-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceAmazon Web Services
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
AWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAmazon Web Services
 
NoSQL overview #phptostart turin 11.07.2011
NoSQL overview #phptostart turin 11.07.2011NoSQL overview #phptostart turin 11.07.2011
NoSQL overview #phptostart turin 11.07.2011David Funaro
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best PracticesAmazon Web Services
 
SRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon RedshiftSRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon RedshiftAmazon Web Services
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon RedshiftKel Graham
 

Mais procurados (20)

Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and cons
 
Project Voldemort
Project VoldemortProject Voldemort
Project Voldemort
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech Talks
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon Redshift
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentation
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceReal-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
AWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing Performance
 
NoSQL overview #phptostart turin 11.07.2011
NoSQL overview #phptostart turin 11.07.2011NoSQL overview #phptostart turin 11.07.2011
NoSQL overview #phptostart turin 11.07.2011
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
 
SRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon RedshiftSRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon Redshift
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
 

Destaque

Efficiency, rating, and applications of dynamos
Efficiency, rating, and applications of dynamosEfficiency, rating, and applications of dynamos
Efficiency, rating, and applications of dynamosMaria Romina Angustia
 
Efficiency, rating & applications of dynamos
Efficiency, rating & applications of dynamosEfficiency, rating & applications of dynamos
Efficiency, rating & applications of dynamosMaria Romina Angustia
 
Efficiency, rating & applications of dynamos
Efficiency, rating & applications of dynamosEfficiency, rating & applications of dynamos
Efficiency, rating & applications of dynamosMaria Romina Angustia
 
La big datacamp-2014-aws-dynamodb-overview-michael_limcaco
La big datacamp-2014-aws-dynamodb-overview-michael_limcacoLa big datacamp-2014-aws-dynamodb-overview-michael_limcaco
La big datacamp-2014-aws-dynamodb-overview-michael_limcacoData Con LA
 
Kai – An Open Source Implementation of Amazon’s Dynamo
Kai – An Open Source Implementation of Amazon’s DynamoKai – An Open Source Implementation of Amazon’s Dynamo
Kai – An Open Source Implementation of Amazon’s DynamoTakeru INOUE
 
rms value
rms valuerms value
rms value2461998
 
rms value average value
rms value average valuerms value average value
rms value average value2461998
 
Construction of dc machines
Construction of dc machinesConstruction of dc machines
Construction of dc machinesRohini Haridas
 
Generator 101 power point
Generator 101 power pointGenerator 101 power point
Generator 101 power pointJorge Palgan
 
AWS Webcast - Data Modeling for low cost and high performance with DynamoDB
AWS Webcast - Data Modeling for low cost and high performance with DynamoDBAWS Webcast - Data Modeling for low cost and high performance with DynamoDB
AWS Webcast - Data Modeling for low cost and high performance with DynamoDBAmazon Web Services
 
A PRESENTATION ON HALF WAVE RECTIFIER
A PRESENTATION ON  HALF WAVE RECTIFIERA PRESENTATION ON  HALF WAVE RECTIFIER
A PRESENTATION ON HALF WAVE RECTIFIERSimran Singh
 
Operation and Maintenance of Diesel Power Generating Plants
Operation and Maintenance of Diesel Power Generating PlantsOperation and Maintenance of Diesel Power Generating Plants
Operation and Maintenance of Diesel Power Generating PlantsLiving Online
 

Destaque (20)

Dynamo
DynamoDynamo
Dynamo
 
Dynamo & its functions
Dynamo & its functionsDynamo & its functions
Dynamo & its functions
 
Faraday’s dynamo
Faraday’s dynamoFaraday’s dynamo
Faraday’s dynamo
 
Efficiency, rating, and applications of dynamos
Efficiency, rating, and applications of dynamosEfficiency, rating, and applications of dynamos
Efficiency, rating, and applications of dynamos
 
Dynamo ppt
Dynamo pptDynamo ppt
Dynamo ppt
 
Efficiency, rating & applications of dynamos
Efficiency, rating & applications of dynamosEfficiency, rating & applications of dynamos
Efficiency, rating & applications of dynamos
 
Efficiency, rating & applications of dynamos
Efficiency, rating & applications of dynamosEfficiency, rating & applications of dynamos
Efficiency, rating & applications of dynamos
 
La big datacamp-2014-aws-dynamodb-overview-michael_limcaco
La big datacamp-2014-aws-dynamodb-overview-michael_limcacoLa big datacamp-2014-aws-dynamodb-overview-michael_limcaco
La big datacamp-2014-aws-dynamodb-overview-michael_limcaco
 
Kai – An Open Source Implementation of Amazon’s Dynamo
Kai – An Open Source Implementation of Amazon’s DynamoKai – An Open Source Implementation of Amazon’s Dynamo
Kai – An Open Source Implementation of Amazon’s Dynamo
 
rms value
rms valuerms value
rms value
 
rms value average value
rms value average valuerms value average value
rms value average value
 
Shielding Design
Shielding DesignShielding Design
Shielding Design
 
Construction of dc machines
Construction of dc machinesConstruction of dc machines
Construction of dc machines
 
Generator 101 power point
Generator 101 power pointGenerator 101 power point
Generator 101 power point
 
AWS Webcast - Data Modeling for low cost and high performance with DynamoDB
AWS Webcast - Data Modeling for low cost and high performance with DynamoDBAWS Webcast - Data Modeling for low cost and high performance with DynamoDB
AWS Webcast - Data Modeling for low cost and high performance with DynamoDB
 
A PRESENTATION ON HALF WAVE RECTIFIER
A PRESENTATION ON  HALF WAVE RECTIFIERA PRESENTATION ON  HALF WAVE RECTIFIER
A PRESENTATION ON HALF WAVE RECTIFIER
 
Earthing
EarthingEarthing
Earthing
 
diesel generator
diesel generatordiesel generator
diesel generator
 
Rectifier
RectifierRectifier
Rectifier
 
Operation and Maintenance of Diesel Power Generating Plants
Operation and Maintenance of Diesel Power Generating PlantsOperation and Maintenance of Diesel Power Generating Plants
Operation and Maintenance of Diesel Power Generating Plants
 

Semelhante a Dynamo db

Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorialmubarakss
 
Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?Felix Geisendörfer
 
Alternator webinar september 2019
Alternator webinar   september 2019Alternator webinar   september 2019
Alternator webinar september 2019Nadav Har'El
 
How History Justifies System Architecture (or Not)
How History Justifies System Architecture (or Not)How History Justifies System Architecture (or Not)
How History Justifies System Architecture (or Not)Thomas Zimmermann
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBAmazon Web Services
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBAmazon Web Services
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersJen Aman
 
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)Matthew Lease
 
Dx11 performancereloaded
Dx11 performancereloadedDx11 performancereloaded
Dx11 performancereloadedmistercteam
 
TriHUG 3/14: HBase in Production
TriHUG 3/14: HBase in ProductionTriHUG 3/14: HBase in Production
TriHUG 3/14: HBase in Productiontrihug
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Roopa Tangirala
 
Swug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainathSwug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainathDennis Chung
 
Ict mock exam answer
Ict mock exam answerIct mock exam answer
Ict mock exam answerGary Tsang
 
개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016
개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016
개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016Amazon Web Services Korea
 

Semelhante a Dynamo db (20)

Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorial
 
Amazon DynamoDB and Amazon DAX
Amazon DynamoDB and Amazon DAXAmazon DynamoDB and Amazon DAX
Amazon DynamoDB and Amazon DAX
 
Amazon DynamoDB and DAX
Amazon DynamoDB and DAXAmazon DynamoDB and DAX
Amazon DynamoDB and DAX
 
Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?
 
Alternator webinar september 2019
Alternator webinar   september 2019Alternator webinar   september 2019
Alternator webinar september 2019
 
How History Justifies System Architecture (or Not)
How History Justifies System Architecture (or Not)How History Justifies System Architecture (or Not)
How History Justifies System Architecture (or Not)
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
 
Amazon DynamoDB and DAX
Amazon DynamoDB and DAXAmazon DynamoDB and DAX
Amazon DynamoDB and DAX
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
 
Dx11 performancereloaded
Dx11 performancereloadedDx11 performancereloaded
Dx11 performancereloaded
 
Node.js - As a networking tool
Node.js - As a networking toolNode.js - As a networking tool
Node.js - As a networking tool
 
TriHUG 3/14: HBase in Production
TriHUG 3/14: HBase in ProductionTriHUG 3/14: HBase in Production
TriHUG 3/14: HBase in Production
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup)
 
Swug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainathSwug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainath
 
3DRepo
3DRepo3DRepo
3DRepo
 
Ict mock exam answer
Ict mock exam answerIct mock exam answer
Ict mock exam answer
 
개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016
개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016
개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016
 
NoSQL Infrastructure
NoSQL InfrastructureNoSQL Infrastructure
NoSQL Infrastructure
 



Dynamo db

  • 1. Dynamo: Amazon’s Highly Available Key-value Store & Amazon DynamoDB Presented by: Zuhair Khayyat
  • 2. What is Dynamo
      ● Dynamo is an eventually consistent key-value storage system used in Amazon's web services to support scalable, highly available data access.
      ● Dynamo is used mainly to manage the state of services, such as S3 and e-commerce.
      ● Optimized for availability (an "always on" experience) to maximize customer satisfaction, at the cost of:
          – Data consistency
          – Durability
          – Performance
  • 3. Dynamo: Why not a relational database
      ● Many services on Amazon’s platform that have high reliability requirements need only primary-key access to a data store.
      ● Relational databases are highly optimized for complex query processing; however, they have limited scalability and choose consistency over availability.
      ● The complicated features of relational databases require expensive hardware and highly skilled administrators.
  • 4. Dynamo: Amazon's Requirements
      ● Simple reads and writes to binary objects no larger than 1 MB; no operation spans multiple data items.
      ● Very fast data access (<300 ms response time).
      ● Heterogeneous commodity hardware infrastructure.
      ● Used by decentralized, loosely coupled services.
      ● Highly available (always on); expect small, frequent network and server failures.
  • 5. Dynamo: Consistency and Replication
      ● Strong data consistency and high data availability cannot be achieved simultaneously.
      ● “Dynamo is designed to be an eventually consistent data store; that is all updates reach all replicas eventually.”
      ● “Always writable” data store: do not reject write operations if data is inconsistent.
          – Imagine you are ordering from Amazon.com and the website rejects adding an item to your cart!
      ● Conflict resolution: the application is responsible for resolving data conflicts.
  • 6. Dynamo VS Bigtable
                                              Dynamo                              Bigtable
      Cluster setup                           Decentralized                       Centralized (GFS)
      Data access                             (primary key, version*)             (row key, col key, timestamp)
      Data partitioning and load balancing    Customized consistent hashing       64K partitions stored in least utilized machines (GFS)
      Data query                              Zero-hop DHT                        Ask the master (GFS)
      Read operation                          Multiple copies read                Single copy read
      Typical value size                      Less than 1 MB                      Not specified (GFS)
      Write operation on inconsistent data    Accept all write operations         Make data unavailable until
                                              and resolve conflicts               consistent (GFS)
  • 7. Dynamo: Interface
      ● Key-value storage system with two operators:
          – get(key): returns a single object, or a list of objects with conflicting versions.
          – put(key, context, object): places the object and writes its replicas to disk. The context contains information about the object, such as its version.
      ● MD5 hashing is applied to the key to generate a 128-bit identifier.
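The key-to-identifier step on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `key_to_ring_position` and the sample key are assumptions, not part of Dynamo's interface. MD5 is used here purely for placement, not for security.

```python
import hashlib


def key_to_ring_position(key: bytes) -> int:
    """Hash a key with MD5 and read the 16-byte digest as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


# Hypothetical key; the same key always maps to the same identifier.
position = key_to_ring_position(b"cart:alice")
print(0 <= position < 2 ** 128)
```

Because the output space is fixed at 128 bits, it can be treated as a ring, which is what the partitioning slides build on.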
  • 8. Dynamo: Partitioning
      ● Dynamo is designed to scale incrementally, one machine at a time.
      ● Consistent hashing generates a fixed output space constructed as a ring.
      ● A variant of consistent hashing (virtual nodes) is used by Dynamo to dynamically repartition and load balance the data over the storage hosts.
      ● Each storage host acquires data depending on its capacity.
  • 9. Dynamo: Consistent Hashing
      [Figure: hash ring before and after adding a node (storage host).]
      Before: A [1,10], D [11,20], E [21,30], B [31,40], F [41,50], C [51,60], G [61,70], H [71,80]
      After adding node I: F [41,46], I [47,54], C [55,60]; all other ranges unchanged.
  • 10. Dynamo: Variant of Consistent Hashing
      [Figure: hash ring with virtual nodes (marked *) before and after adding a node (storage host).]
      Before: A [1,10], D [11,20], C* [21,30], B [31,40], A* [41,50], C [51,60], B* [61,70], D* [71,80]
      After adding node E: D [11,16], E [17,24], C* [25,30], A* [41,46], E* [47,54], C [55,60]; all other ranges unchanged.
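The virtual-node variant can be sketched as follows. This is a minimal illustration under stated assumptions, not Dynamo's implementation: the class name `Ring`, the `host#i` token naming, and the default of 8 virtual nodes per host are all hypothetical choices. A host with more capacity would simply be given more virtual nodes.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Position on the ring: the MD5 digest read as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(value.encode()).digest(), "big")


class Ring:
    """Consistent-hash ring in which every host owns several virtual nodes."""

    def __init__(self):
        self._tokens = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> physical host

    def add_host(self, host: str, vnodes: int = 8) -> None:
        # More virtual nodes means a larger share of the key space.
        for i in range(vnodes):
            token = ring_hash(f"{host}#{i}")
            bisect.insort(self._tokens, token)
            self._owner[token] = host

    def lookup(self, key: str) -> str:
        # The owner is the first virtual node clockwise from the key's position.
        idx = bisect.bisect_right(self._tokens, ring_hash(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]


ring = Ring()
for name in ("A", "B", "C", "D"):
    ring.add_host(name)
print(ring.lookup("cart:alice"))
```

The property worth noticing is that adding a host only moves keys onto the new host; keys never shuffle between the existing hosts.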
  • 11. Dynamo: Replication
      ● Each key (k) is assigned to a coordinator node (i).
      ● Each value (v) is replicated to the (N-1) clockwise successor logical nodes in the ring.
      ● Node (i) is responsible for updating all other (N-1) replicas for the keys it owns.
      ● Each key (k) has a preference list of physical nodes that are responsible for maintaining and serving the key's data.
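One way to build such a preference list is to walk the ring clockwise from the key's position and collect the first N distinct physical hosts, skipping extra virtual nodes of a host that is already on the list. The sketch below assumes the ring layout read off slide 10 before node E is added, with each range represented by its upper bound; the function name and the small integer positions are illustrative only.

```python
import bisect

# (upper bound of the range, physical host); A, B, C, D each own two virtual nodes.
RING = [(10, "A"), (20, "D"), (30, "C"), (40, "B"),
        (50, "A"), (60, "C"), (70, "B"), (80, "D")]


def preference_list(ring, position, n):
    """Return the first n distinct physical hosts clockwise from position."""
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, position) % len(ring)
    hosts = []
    for step in range(len(ring)):
        host = ring[(start + step) % len(ring)][1]
        if host not in hosts:  # skip further virtual nodes of the same host
            hosts.append(host)
        if len(hosts) == n:
            break
    return hosts


print(preference_list(RING, 45, 3))  # ['A', 'C', 'B']
```

The first host on the list acts as the coordinator; the remaining entries hold the other N-1 replicas.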
  • 12. Dynamo: Data Versioning
      ● An eventual consistency protocol is used to update all data replicas asynchronously.
      ● put() returns before all replicas are updated.
      ● get() can return multiple versions for the same key.
      ● Dynamo tracks each data mutation as a new version to support the “always writable” protocol.
      ● Dynamo uses vector clocks for versioning.
  • 13-17. Dynamo: vector clocks example 1 (one figure, built up over five slides)
      [Figure: successive writes handled by nodes A, B, and C, each shown with the resulting value and vector clock.]
      Node A:        Value=100, clock (A:1)
      Node B: +1     Value=101, clock (A:1,B:1)
      Node C: +4     Value=105, clock (A:1,B:1,C:1)
      Node B: +100   Value=205, clock (A:1,B:2,C:1)
      Node C: +110   Value=315, clock (A:1,B:2,C:2)
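  • 18. Dynamo: vector clocks example 2
      [Figure: concurrent writes handled by nodes B and C produce versions whose clocks cannot be ordered.]
      Node A:        Value=100, clock (A:1)
      Node B: +1     Value=101, clock (A:1,B:1)
      Node B: +100   Value=201, clock (A:1,B:2)
      Node C: +4     Value=105, clock (A:1,B:1,C:1)
      Node C: +110   Value=311, clock (A:1,B:2,C:1)
      Node C: +110   Value=215, clock (A:1,B:1,C:2)
      Conflict! (A:1,B:2,C:1) and (A:1,B:1,C:2) are concurrent versions.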
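The comparison behind these two examples can be sketched as below. One clock supersedes another when every counter in it is at least as large; if neither supersedes the other, the versions are concurrent and conflict. The function names and the dictionary representation are assumptions made for illustration.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with the given node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify two versions by their vector clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "conflict"


# Example 1: (A:1,B:2,C:2) was derived from (A:1,B:2,C:1), so it supersedes it.
print(compare({"A": 1, "B": 2, "C": 2}, {"A": 1, "B": 2, "C": 1}))
# Example 2: the two final versions are concurrent.
print(compare({"A": 1, "B": 2, "C": 1}, {"A": 1, "B": 1, "C": 2}))
```

In the first case the older version can be discarded safely; in the second, both versions have to be returned to the reader for reconciliation.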
  • 19. Dynamo: resolving conflicts
      ● Syntactic reconciliation:
          – The system resolves the conflict automatically when one version's vector clock supersedes the other's.
      ● Semantic reconciliation:
          – The application merges the conflicting versions and, where needed, lets the user revise the merged values.
          – Example: Amazon's shopping cart:
              ● “Add to cart” items are preserved.
              ● Deleted items can resurface.
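The shopping cart behaviour can be illustrated with a union merge. This is a simplified sketch that assumes a cart is just a set of item names; the function name and the sample items are hypothetical.

```python
def merge_carts(versions):
    """Semantic reconciliation by union: keep every item seen in any version."""
    merged = set()
    for cart in versions:
        merged |= cart
    return merged


# Two replicas diverge from the same cart {"book", "pen"}:
replica_1 = {"book"}                 # the user removed "pen" here
replica_2 = {"book", "pen", "lamp"}  # the user added "lamp" here

print(sorted(merge_carts([replica_1, replica_2])))  # ['book', 'lamp', 'pen']
```

The added "lamp" is never lost, but the removed "pen" comes back, which is exactly the trade-off the slide describes.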
  • 20. Dynamo: Processing put() & get()
      ● The user can issue commands in either of the following ways:
          – A generic load balancer directs the user's requests to the least utilized node.
          – A partition-aware library directs the request to one of the data owners directly.
      ● The system requires two configurable values:
          – R: the number of available healthy nodes required for a successful read.
          – W: the number of available healthy nodes required for a successful write.
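The slide does not say how R and W relate to the replication factor N. The Dynamo paper notes that choosing R + W > N gives a quorum-like system, because every read set then overlaps every write set. The brute-force check below illustrates that claim; the function name is an assumption and the code is not part of Dynamo.

```python
from itertools import combinations


def always_intersect(n: int, r: int, w: int) -> bool:
    """True if every set of r replicas shares a node with every set of w replicas."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


print(always_intersect(3, 2, 2))  # True: R + W > N
print(always_intersect(3, 1, 1))  # False: a read may miss the latest write
```

Lower values of R or W reduce latency, since fewer nodes must respond, at the price of possibly reading a stale version.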
  • 21. Dynamo: Hinted Handoff
      ● Assuming N=3, a failed put() operation on node A is temporarily handled by B.
      ● After A recovers, B sends the result of the put() operation back to A.
      ● Advantage: a temporary failure has minimal effect on the application.
      [Figure: ring with nodes A, B, C, D and virtual nodes A', A'', C', D'.]
  • 22. Dynamo: Scalability
      ● Adding or removing a node requires a third-party tool or direct user interaction.
      ● A gossip-based protocol is used to propagate membership throughout the cluster and to detect failures.
      ● Replica synchronization is done using Merkle hash trees.
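A Merkle tree lets two replicas compare a whole key range by exchanging a single root hash, and descend into subtrees only when the roots differ. The sketch below computes just the root. SHA-256, the `key=value` leaf encoding, and the function name are assumptions chosen for illustration; the slides do not specify them.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(items: dict) -> bytes:
    """Root hash over the key/value pairs, taken in sorted key order."""
    level = [_h(f"{key}={value}".encode()) for key, value in sorted(items.items())]
    if not level:
        return _h(b"")
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            # An unpaired last node is combined with itself.
            right = level[i + 1] if i + 1 < len(level) else level[i]
            parents.append(_h(level[i] + right))
        level = parents
    return level[0]


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "stale", "k3": "v3"}
print(merkle_root(replica_a) == merkle_root(replica_b))  # False: replicas differ
```

When the roots match, no further data needs to be transferred for that key range.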
  • 23. Dynamo: Peak Performance
      ● Shopping Cart Service on a holiday:
          – 10 million requests
          – 3 million checkouts
          – 100,000+ concurrent sessions
          – No downtime!
  • 24. DynamoDB
  • 25. What is DynamoDB
      ● A NoSQL database service available publicly through Amazon Web Services; released in 2012.
      ● Based on Dynamo, a scalable, highly available (key, value) storage system used by Amazon's servers; published at SOSP 2007.
  • 26. DynamoDB: Data Model
      ● The database is a collection of tables.
      ● A table is a collection of items.
      ● An item is a collection of attributes.
      ● A primary key is required.
      ● No nulls or empty strings.
      ● No schema is required; items can vary in the number of attributes. How is this possible?
  • 27. DynamoDB: Example
      ● Table name: ProductCatalog
          {
            Id = 101
            ProductName = "Book 101 Title"
            ISBN = "111-1111111111"
            Authors = [ "Author 1", "Author 2" ]
            Price = -2
            Dimensions = "8.5 x 11.0 x 0.5"
            PageCount = 500
            InPublication = 1
            ProductCategory = "Book"
          }
          {
            Id = 201
            ProductName = "18-Bicycle 201"
            Description = "201 description"
            BicycleType = "Road"
            Brand = "Brand-Company A"
            Price = 100
            Gender = "M"
            Color = [ "Red", "Black" ]
            ProductCategory = "Bike"
          }
          {
            Id = 202
            ProductName = "21-Bicycle 202"
            Description = "202 description"
            BicycleType = "Road"
            Brand = "Brand-Company A"
            Price = 200
            Gender = "M"
            Color = [ "Green", "Black" ]
            ProductCategory = "Bike"
          }
  • 28. DynamoDB: Example
      ● Storage in Dynamo:
          – <Table_List, {ProductCatalog,....}>
          – <ProductCatalog, {101,102,201,202}>
          – <101, {ProductName={},ISBN={},Authors={}...}>
      ● or:
          – <Table_List, {ProductCatalog,....}>
          – <ProductCatalog, {101,102,201,202}>
          – <101, {ProductName,ISBN,Authors...}>
          – <101_Authors, {Author 1,Author 2}>
  • 29. DynamoDB: Table Primary Keys
      ● A table in DynamoDB must have a primary key.
      ● A primary key can be either “hash only” or “hash and range”.
      ● DynamoDB uses an unsorted hash index, while the range index is sorted.
      ● A hash-only primary key is based on a single attribute.
      ● A hash-and-range primary key is based on two attributes.
      ● Data types:
          – Scalar data types: Number, String, and Binary.
          – Multi-valued types: String Set, Number Set, and Binary Set.
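The difference between the unsorted hash index and the sorted range index can be illustrated with a small in-memory model. This is a toy sketch, not the DynamoDB API: the class `Table`, its `put` and `query` methods, and the forum-style sample data are all hypothetical.

```python
import bisect
from collections import defaultdict


class Table:
    """Toy hash-and-range table: exact match on the hash key, sorted range keys."""

    def __init__(self):
        # hash key -> list of (range key, item), kept sorted by range key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [rk for rk, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)       # same primary key: overwrite
        else:
            rows.insert(i, (range_key, item))  # keep the range index sorted

    def query(self, hash_key, low, high):
        """Items with the given hash key whose range key lies in [low, high]."""
        rows = self._partitions.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        return [item for _, item in
                rows[bisect.bisect_left(keys, low):bisect.bisect_right(keys, high)]]


replies = Table()
replies.put("thread-1", "2012-03-01", "first reply")
replies.put("thread-1", "2012-03-05", "second reply")
replies.put("thread-2", "2012-03-02", "other thread")
print(replies.query("thread-1", "2012-03-02", "2012-03-31"))  # ['second reply']
```

The hash key only supports exact lookups, while the sorted range key is what makes range conditions within one hash key cheap.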
  • 30. DynamoDB: Read operation
      ● Availability and durability are maintained through data replication.
      ● Updating all the replicas after a data mutation takes some time; DynamoDB will eventually synchronize all the replicas.
      ● DynamoDB supports two read operations:
          – Eventually consistent read
              ● Does not necessarily reflect the last data mutation.
              ● Very fast data access; not affected by failures.
          – Consistent read
              ● Always reflects the last data mutation.
              ● Waits for the data to be consistent in all replicas; affected by network and storage failures.
  • 31. DynamoDB: Similar services
      ● Datastore on Google App Engine
      ● Cloudant Data Layer (CouchDB)
  • 32. DynamoDB: try it today