Under the Covers of DynamoDB
                            Matt Wood
               Principal Data Scientist
                                @mza
Hello.
Overview
 1. Getting started
 2. Data modeling
 3. Partitioning
 4. Replication & Analytics
 5. Customer story: Localytics
1. Getting started
DynamoDB is a managed
NoSQL database service.

Store and retrieve any amount of data.
  Serve any level of request traffic.
Without the operational burden.
Consistent, predictable performance.

        Single digit millisecond latency.
         Backed by solid-state drives.
Flexible data model.

Key/attribute pairs. No schema required.
    Easy to create. Easy to adjust.
Seamless scalability.

No table size limits. Unlimited storage.
            No downtime.
Durable.

             Consistent, disk-only writes.
Replication across data centers and availability zones.
Without the operational burden.
Focus on your app.
Two decisions + three clicks
     = ready for use
Level of throughput
        Primary keys


Provisioned throughput.


Reserve IOPS for reads and writes.
  Scale up or down at any time.
Pay per capacity unit.


Priced per hour of provisioned throughput.
Write throughput.

Size of item x writes per second
   $0.0065 per hour for 10 write units
Consistent writes.

        Atomic increment and decrement.

Optimistic concurrency control: conditional writes.
Transactions.

    Item level transactions only.

Puts, updates and deletes are ACID.
Strong or eventual consistency


 Read throughput.
Strongly consistent reads:

Provisioned units = size of item x reads per second

          $0.0065 per hour for 50 units
Eventually consistent reads:

Provisioned units = (size of item x reads per second) / 2

          $0.0065 per hour for 100 units
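The throughput formulas and prices above can be turned into a small calculator. This is a minimal sketch: the function names are hypothetical, the prices are the ones quoted on the slides, and rounding the item size up to the next whole KB is an assumption that the slides do not state.

```python
import math

# Prices quoted on the slides (USD per hour).
WRITE_PRICE_PER_10_UNITS = 0.0065
READ_PRICE_PER_50_UNITS = 0.0065


def write_units(item_size_kb: float, writes_per_second: int) -> int:
    """Provisioned write units = size of item x writes per second."""
    # Assumption: item size is rounded up to the next whole KB.
    return math.ceil(item_size_kb) * writes_per_second


def read_units(item_size_kb: float, reads_per_second: int, strong: bool = True) -> int:
    """Provisioned read units = size of item x reads per second,
    halved when eventually consistent reads are acceptable."""
    units = math.ceil(item_size_kb) * reads_per_second
    return units if strong else math.ceil(units / 2)


def hourly_cost(read_capacity: int, write_capacity: int) -> float:
    """Hourly price for the given provisioned read and write units."""
    read_cost = (read_capacity / 50) * READ_PRICE_PER_50_UNITS
    write_cost = (write_capacity / 10) * WRITE_PRICE_PER_10_UNITS
    return read_cost + write_cost
```

For example, 100 eventually consistent reads per second of a 1 KB item need 50 provisioned read units under this model, which matches the "100 units for $0.0065 per hour" figure.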
Strong or eventual consistency
  Same latency expectations.

 Mix and match at ‘read time’.
Provisioned throughput is
managed by DynamoDB.
Data is partitioned and
managed by DynamoDB.
Indexed data storage.


    $0.25 per GB per month.
    Tiered bandwidth pricing:
aws.amazon.com/dynamodb/pricing
Reserved capacity.


Save up to 53% with a 1-year reservation.
Save up to 76% with a 3-year reservation.
Authentication.

   Session based to minimize latency.
Uses the Amazon Security Token Service.
        Handled by AWS SDKs.
          Integrates with IAM.
Monitoring.


             CloudWatch metrics:
latency, consumed read and write throughput,
             errors and throttling.
Libraries, mappers and mocks.

  ColdFusion, Django, Erlang, Java, .Net,
     Node.js, Perl, PHP, Python, Ruby

         http://j.mp/dynamodb-libs
2. Data modeling
 id =      date =                  total =
 100       2012-05-16-09-00-10     25.00
 id =      date =                  total =
 101       2012-05-15-15-00-11     35.00
 id =      date =                  total =
 101       2012-05-16-12-00-10     100.00

Table: the full set of items above.
Item: one row, such as the row with id = 100.
Attribute: one name = value pair, such as date = 2012-05-16-09-00-10.
Where is the schema?

Tables do not require a formal schema.
  Items are an arbitrarily sized hash.
Indexing.

Items are indexed by primary and secondary keys.
        Primary keys can be composite.
      Secondary keys are local to the table.
Hash key   Range key   Secondary range key

   ID        Date             Total

Composite primary key: hash key (ID) + range key (Date).
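The key layout above (hash key, range key, secondary range key) could be expressed as a table definition. This is a minimal sketch that only builds the request as a dictionary, in the shape that boto3's low-level `create_table` call accepts. The function name, the table name used in the example, the index name `total-index` and the attribute types are all assumptions made for illustration.

```python
def build_create_table_request(table_name, read_units, write_units):
    """Build keyword arguments for a table keyed on id (hash) + date (range),
    with a local secondary index that uses total as an alternative range key."""
    return {
        "TableName": table_name,
        # Only attributes used in a key schema are declared up front;
        # every other attribute stays schema-free.
        "AttributeDefinitions": [
            {"AttributeName": "id", "AttributeType": "N"},
            {"AttributeName": "date", "AttributeType": "S"},
            {"AttributeName": "total", "AttributeType": "N"},
        ],
        # Composite primary key: hash key + range key.
        "KeySchema": [
            {"AttributeName": "id", "KeyType": "HASH"},
            {"AttributeName": "date", "KeyType": "RANGE"},
        ],
        # Secondary range key, local to the table (same hash key).
        "LocalSecondaryIndexes": [
            {
                "IndexName": "total-index",  # hypothetical name
                "KeySchema": [
                    {"AttributeName": "id", "KeyType": "HASH"},
                    {"AttributeName": "total", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }
```

With AWS credentials configured, the result could be passed on as `boto3.client("dynamodb").create_table(**build_create_table_request("orders", 50, 10))`.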
Programming DynamoDB.

  Small but perfectly formed API.
CreateTable           PutItem

UpdateTable           GetItem

DeleteTable        UpdateItem

DescribeTable       DeleteItem

ListTables       BatchGetItem

Query           BatchWriteItem

Scan
Conditional updates.

PutItem, UpdateItem, DeleteItem can take
     optional conditions for operation.

UpdateItem performs atomic increments.
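A conditional update combined with an atomic increment might look like the request below. This is a sketch under stated assumptions: it uses the expression-style parameters of the current low-level API (which may differ from the API available when these slides were written), the `version` attribute is a hypothetical optimistic-concurrency counter, and the function name is made up for illustration.

```python
def build_conditional_increment(table_name, item_id, date, amount, expected_version):
    """Build UpdateItem arguments that add `amount` to total, but only if the
    item's version still equals `expected_version`."""
    return {
        "TableName": table_name,
        "Key": {"id": {"N": str(item_id)}, "date": {"S": date}},
        # ADD performs the increment atomically on the server side.
        "UpdateExpression": "ADD #total :amount, #version :one",
        # The write is rejected if another writer changed the version first.
        "ConditionExpression": "#version = :expected",
        # Placeholders avoid clashes with reserved words.
        "ExpressionAttributeNames": {"#total": "total", "#version": "version"},
        "ExpressionAttributeValues": {
            ":amount": {"N": str(amount)},
            ":one": {"N": "1"},
            ":expected": {"N": str(expected_version)},
        },
    }
```

Passed to a boto3 client as `client.update_item(**request)`, a failed condition surfaces as a `ConditionalCheckFailedException`, at which point the caller would typically re-read the item and retry.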
One API call, multiple items

       BatchGet returns multiple items by key.
BatchWrite performs up to 25 put or delete operations.
    Throughput is measured by IO, not API calls.
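Because BatchWrite accepts at most 25 put or delete operations per call, a larger list has to be split first. A minimal sketch of that chunking step; the helper name is hypothetical and no AWS call is made here.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")

# Limit stated on the slide: up to 25 put or delete operations per call.
MAX_BATCH_WRITE_OPERATIONS = 25


def chunk_operations(
    operations: List[T], size: int = MAX_BATCH_WRITE_OPERATIONS
) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` operations."""
    for start in range(0, len(operations), size):
        yield operations[start:start + size]
```

Each yielded chunk would become one BatchWriteItem request. Batching reduces the number of API calls, but each item written still consumes write throughput.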
Query vs Scan

      Query returns items by key.
Scan reads the whole table sequentially.
Query patterns

    Retrieve all items by hash key.

         Range key conditions:
==, <, >, >=, <=, begins with, between.

  Counts. Top and bottom n values.
         Paged responses.
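As an illustration of these query patterns, the sketch below builds a Query request that fetches the last n items for one hash key whose range key begins with a given prefix. It assumes the expression-style parameters of the current low-level API; the function name and the `id`/`date` attribute names (borrowed from the earlier example table) are illustrative.

```python
def build_top_n_query(table_name, item_id, date_prefix, n):
    """Build Query arguments: one hash key, a begins_with range condition,
    descending order, and at most n items per page."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#id = :id AND begins_with(#date, :prefix)",
        "ExpressionAttributeNames": {"#id": "id", "#date": "date"},
        "ExpressionAttributeValues": {
            ":id": {"N": str(item_id)},
            ":prefix": {"S": date_prefix},
        },
        # False returns items in descending range key order ("top n").
        "ScanIndexForward": False,
        "Limit": n,
    }
```

For paged responses, a follow-up request would copy the `LastEvaluatedKey` from the previous response into `ExclusiveStartKey`.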
EXAMPLE 1: Mapping relationships.
Players
 user_id =   location =     joined =
   mza       Cambridge     2011-07-04
 user_id =   location =     joined =
  jeffbarr     Seattle     2012-01-20
 user_id =   location =     joined =
  werner     Worldwide     2011-05-15

Scores (query for scores by user)
 user_id =    game =        score =
   mza       angry-birds    11,000
 user_id =    game =        score =
   mza         tetris      1,223,000
 user_id =    game =        score =
  werner     bejewelled     55,000

Leader boards (high scores by game)
  game =        score =    user_id =
 angry-birds    11,000       mza
  game =        score =    user_id =
   tetris      1,223,000     mza
  game =        score =    user_id =
   tetris      9,000,000    jeffbarr
EXAMPLE 2: Storing large items.
Unlimited storage.

Unlimited attributes per item.
 Unlimited items per table.

 Maximum of 64k per item.
Split across items.


message_id = 1          part = 1          message = <first 64k>

message_id = 1          part = 2          message = <second 64k>

message_id = 1          part = 3          message = <third 64k>
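Splitting a large message across items, as shown above, can be sketched in a few lines. The helper names are hypothetical. Note that an item's size includes its attribute names and key values as well as the message bytes, so in practice `part_size` would need to be somewhat below the 64k limit; the default here simply mirrors the figure on the slide.

```python
MAX_PART_BYTES = 64 * 1024  # per-item limit quoted on the slide


def split_message(message_id: int, body: bytes, part_size: int = MAX_PART_BYTES):
    """Split body into numbered parts that share the same message_id."""
    parts = []
    for index, start in enumerate(range(0, len(body), part_size), start=1):
        parts.append(
            {
                "message_id": message_id,  # hash key
                "part": index,             # range key
                "message": body[start:start + part_size],
            }
        )
    return parts


def join_message(parts):
    """Reassemble the original body from its parts, in part order."""
    ordered = sorted(parts, key=lambda p: p["part"])
    return b"".join(p["message"] for p in ordered)
```

With `message_id` as the hash key and `part` as the range key, a single query by hash key would return all parts of a message in order.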
Store a pointer to S3.


message_id = 1          message = http://s3.amazonaws.com...

message_id = 2          message = http://s3.amazonaws.com...

message_id = 3          message = http://s3.amazonaws.com...
EXAMPLE 3: Time series data
Hot and cold tables.
April
        event_id =          timestamp =       key =
          1000          2013-04-16-09-59-01   value
        event_id =          timestamp =       key =
          1001          2013-04-16-09-59-02   value
        event_id =          timestamp =       key =
          1002          2013-04-16-09-59-02   value

March
        event_id =          timestamp =       key =
          1000          2013-03-01-09-59-01   value
        event_id =          timestamp =       key =
          1001          2013-03-01-09-59-02   value
           ...

One table per month: December   January   February   March   April
Archive data.

  Move old data to S3: lower cost.
    Still available for analytics.

Run queries across hot and cold data
      with Elastic MapReduce.
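One way to organize hot and cold tables is to derive the table name from the month of each event. A minimal sketch, assuming a hypothetical naming scheme of `<prefix>_<year>_<month>` and a configurable number of months kept hot; neither detail comes from the slides.

```python
from datetime import date


def table_name_for(day: date, prefix: str = "events") -> str:
    """Name of the monthly table that holds events for the given day."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


def previous_month(day: date) -> date:
    """First day of the month before the one containing `day`."""
    if day.month == 1:
        return date(day.year - 1, 12, 1)
    return date(day.year, day.month - 1, 1)


def hot_tables(today: date, months_hot: int = 2, prefix: str = "events") -> list:
    """Names of the tables still considered hot, newest first."""
    names = []
    current = date(today.year, today.month, 1)
    for _ in range(months_hot):
        names.append(table_name_for(current, prefix))
        current = previous_month(current)
    return names
```

Tables that fall outside the hot list are candidates for lower provisioned throughput or for archiving to S3.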
3. Partitioning
Uniform workload.

         Data stored across multiple partitions.
       Data is primarily distributed by primary key.

Provisioned throughput is divided evenly across partitions.
To achieve and maintain full
 provisioned throughput, spread
workload evenly across hash keys.
Non-Uniform workload.

Might be throttled, even at high levels of throughput.
BEST PRACTICE 1: Distinct values for hash keys.
   Hash key elements should have a
    high number of distinct values.
Lots of users with unique user_id.
Workload well distributed across hash key.
user_id =        first_name =     last_name =
  mza                 Matt            Wood
user_id =        first_name =     last_name =
 jeffbarr              Jeff           Barr
user_id =        first_name =     last_name =
 werner              Werner          Vogels
user_id =        first_name =     last_name =
 simone             Simone          Brunozzi

   ...               ...              ...
BEST PRACTICE 2: Avoid limited hash key values.
    Hash key elements should have a
     high number of distinct values.
Small number of status codes.
Uneven, non-uniform workload.

status =                   date =
  200                2012-04-01-00-00-01
status =                   date =
  404                2012-04-01-00-00-01
status =                   date =
  404                2012-04-01-00-00-01
status =                   date =
  404                2012-04-01-00-00-01
BEST PRACTICE 3: Model for even distribution.
Access by hash key value should be evenly
     distributed across the dataset.
Large number of devices.
Small number which are much more popular than others.
           Workload unevenly distributed.
            mobile_id =          access_date =
               100            2012-04-01-00-00-01
            mobile_id =          access_date =
               100            2012-04-01-00-00-02
            mobile_id =          access_date =
               100            2012-04-01-00-00-03
            mobile_id =          access_date =
               100            2012-04-01-00-00-04

                ...                   ...
Sample access pattern.
Workload randomized by hash key.

  mobile_id =          access_date =
    100.1           2012-04-01-00-00-01
  mobile_id =          access_date =
    100.2           2012-04-01-00-00-02
  mobile_id =          access_date =
    100.3           2012-04-01-00-00-03
  mobile_id =          access_date =
    100.4           2012-04-01-00-00-04

      ...                   ...
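The randomized keys above (100.1, 100.2, ...) suggest appending a suffix to a popular hash key so that its writes spread across several hash key values. A minimal sketch of that idea; the function names and the choice of a uniformly random suffix are assumptions, and the number of suffixes is a tuning parameter the slides do not specify.

```python
import random


def sharded_key(mobile_id: int, shard: int) -> str:
    """Hash key of the form '<mobile_id>.<shard>', e.g. '100.3'."""
    return f"{mobile_id}.{shard}"


def key_for_write(mobile_id: int, shard_count: int, rng=random) -> str:
    """Pick one of shard_count suffixes at random for each write."""
    return sharded_key(mobile_id, rng.randint(1, shard_count))


def keys_for_read(mobile_id: int, shard_count: int) -> list:
    """All hash keys that must be queried to read one device's data."""
    return [sharded_key(mobile_id, s) for s in range(1, shard_count + 1)]
```

The trade-off is on the read side: fetching everything for one device now takes one query per suffix, with the results merged by the caller.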
4. Replication & Analytics
Seamless scale.


Scalable methods for data processing.
Scalable methods for backup/restore.
Amazon Elastic MapReduce.

    Managed Hadoop service for
     data-intensive workflows.

       aws.amazon.com/emr
create external table items_db
 (id string, votes bigint, views bigint) stored by
 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
  tblproperties
 ("dynamodb.table.name" = "items",
  "dynamodb.column.mapping" =
  "id:id,votes:votes,views:views");
select id, votes, views
from items_db
order by views desc;
5
DynamoDB @ Localytics

                       Mohit Dilawari
              Director of Engineering
                          @mdilawari
About Localytics
• Mobile App Analytics Service

• 750+ Million Devices and over 20,000 Apps

• Customers Include:




                       …and many more.
About the Development Team

•   Small team of four managing the entire AWS infrastructure: 100 EC2
    instances
•   Experts in big data
•   Leveraging Amazon's services has been the key to our success
•   Large-scale users of:
o   SQS
o   S3
o   ELB
o   RDS
o   Route 53
o   ElastiCache
o   EMR
                         …and of course DynamoDB
Why DynamoDB?

Set it and Forget it




Our use-case: Dedup Data
•   Each datapoint includes a globally unique ID

•   Mobile traffic over 2G/3G periodically uploads duplicate data

•   We accept data within a 28-day window




First Design for Dedup table
Unique ID:    aaaaaaaaaaaaaaaaaaaaaaaaa333333333333333


 Table Name = dedup_table

 ID

 aaaaaaaaaaaaaaaaaaaaaaaaa111111111111111

 aaaaaaaaaaaaaaaaaaaaaaaaa222222222222222
aaaaaaaaaaaaaaaaaaaaaaaaa333333333333333



                "Test and Set" in a single operation

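A "test and set" in a single operation can be expressed as a conditional put that succeeds only if the ID is not already stored. The sketch below is not Localytics' actual code: it builds the request as a dictionary, assumes the expression-style parameters of the current low-level API, and uses a hypothetical function name.

```python
def build_test_and_set_put(table_name: str, unique_id: str) -> dict:
    """Build PutItem arguments that insert the ID only if it is not present."""
    return {
        "TableName": table_name,
        "Item": {"ID": {"S": unique_id}},
        # The put is rejected when an item with this key already exists.
        "ConditionExpression": "attribute_not_exists(#id)",
        "ExpressionAttributeNames": {"#id": "ID"},
    }
```

When sent with a boto3 client as `client.put_item(**request)`, a `ConditionalCheckFailedException` would mean the datapoint is a duplicate; success would mean it is new and has now been recorded.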
Optimization One - Data Aging

•   Partition by Month

•   Create the new table the day before the month starts

•   Need to keep two months of data




Optimization One - Data Aging
Unique ID:   bbbbbbbbbbbbbbbbbbbbbbbbb333333333333333

  Check Previous month:

     Table Name = March2013_dedup
     ID
     aaaaaaaaaaaaaaaaaaaaaaaaa111111111111111   Not Here!

     aaaaaaaaaaaaaaaaaaaaaaaaa222222222222222




Optimization One - Data Aging
Unique ID:    bbbbbbbbbbbbbbbbbbbbbbbbb333333333333333

  Test and Set in current month:

     Table Name = April2013_dedup

     ID

     bbbbbbbbbbbbbbbbbbbbbbbbb111111111111111
     bbbbbbbbbbbbbbbbbbbbbbbbb222222222222222
     bbbbbbbbbbbbbbbbbbbbbbbbb333333333333333   Inserted


Optimization Two

•   Reduce the index size, which reduces costs

•   Each item has a 100-byte overhead, which is substantial

•   Combine multiple IDs into one record

•   Split each ID into two halves
o   The first half is the key; the second half is added to the set




Optimization Two - Use Sets
Unique ID:         ccccccccccccccccccccccccccc999999999999999


           ccccccccccccccccccccccccccc                  999999999999999

Prefix                          Values

aaaaaaaaaaaaaaaaaaaaaaaaa       [111111111111111, 222222222222222, 333333333333333]

bbbbbbbbbbbbbbbbbbbbbbbbb       [444444444444444, 555555555555555, 666666666666666]

ccccccccccccccccccccccccccc     [777777777777777, 888888888888888,                  ]
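The prefix/values layout above can be modelled in memory to show the dedup logic. This is only a stand-in for the real table: the class name and the `prefix_length` parameter are hypothetical, and it does not show how the check and the insert are combined into one DynamoDB operation.

```python
from collections import defaultdict


class DedupTable:
    """In-memory model: one record per ID prefix, holding a set of suffixes."""

    def __init__(self, prefix_length: int):
        self.prefix_length = prefix_length
        self.rows = defaultdict(set)

    def test_and_set(self, unique_id: str) -> bool:
        """Return True if the ID is new (and record it), False if duplicate."""
        prefix = unique_id[:self.prefix_length]
        suffix = unique_id[self.prefix_length:]
        values = self.rows[prefix]
        if suffix in values:
            return False
        values.add(suffix)
        return True
```

Storing many suffixes under one prefix means many IDs share a single item, which is how this layout reduces the per-item overhead.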




Optimization Three - Combine Months
   •     Go back to a single table



Prefix           March2013                          April2013

aaaaaaaaaa...    [111111111111111, 22222222222...   [1212121212121212, 3434343434....

bbbbbbbbbb...    [444444444444444, 555555555....    [4545454545454545, 6767676767.....

ccccccccccc...   [777777777777777, 888888888...     [8989898989898989, 1313131313....


  One operation:
    1. Delete the February2013 field
    2. Check ID in March2013
    3. Test and set into April2013
Recap

 Compare Plans for 20 Billion IDs per month
Plan                   Storage costs   Read costs   Write costs   Total    Savings

Naive (after a year)   $8400           0            $4000         $12400

Data Age               $900            $350         $4000         $5250    57%

Using Sets             $150            $350         $4000         $4500    64%

Multiple Months        $150            0            $4000         $4150    67%


Thank You
@mdilawari



Summary
 1. Getting started
 2. Data modeling
 3. Partitioning
 4. Replication & Analytics
 5. Customer story: Localytics
Free tier.
aws.amazon.com/dynamodb
Thank you!
matthew@amazon.com
@mza

 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Under the Covers of DynamoDB

  • 1. Under the Covers of DynamoDB Matt Wood Principal Data Scientist @mza
  • 3. Overview 1. Getting started 2. Data modeling 3. Partitioning 4. Replication & Analytics 5. Customer story: Localytics
  • 5. DynamoDB is a managed NoSQL database service. Store and retrieve any amount of data. Serve any level of request traffic.
  • 7. Consistent, predictable performance. Single-digit millisecond latency. Backed by solid-state drives.
  • 8. Flexible data model. Key/attribute pairs. No schema required. Easy to create. Easy to adjust.
  • 9. Seamless scalability. No table size limits. Unlimited storage. No downtime.
  • 10. Durable. Consistent, disk-only writes. Replication across data centers and availability zones.
  • 13. Two decisions + three clicks = ready for use
  • 14. Two decisions + three clicks = ready for use. The two decisions: level of throughput and primary keys.
  • 16. Provisioned throughput. Reserve IOPS for reads and writes. Scale up or down at any time.
  • 17. Pay per capacity unit. Priced per hour of provisioned throughput.
  • 18. Write throughput. Provisioned units = size of item x writes per second. $0.0065 per hour for 10 write units.
  • 19. Consistent writes. Atomic increment and decrement. Optimistic concurrency control: conditional writes.
  • 20. Transactions. Item level transactions only. Puts, updates and deletes are ACID.
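Slides 19 and 20 describe conditional writes as the basis for optimistic concurrency control, plus atomic counters. The sketch below imitates that behaviour with an in-memory dictionary rather than the real service. `InMemoryTable`, `ConditionFailed`, `put_if_version`, and the `version` attribute are hypothetical names chosen for illustration, not part of any SDK.

```python
class ConditionFailed(Exception):
    """Raised when the stored version does not match what the writer expected."""


class InMemoryTable:
    """Toy stand-in for a table keyed by a single hash key (not the real API)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        item = self._items.get(key)
        return dict(item) if item is not None else None

    def put_if_version(self, key, item, expected_version):
        """Write only if the stored version equals expected_version.

        expected_version=None means "the item must not exist yet".
        """
        current = self._items.get(key)
        current_version = None if current is None else current.get("version")
        if current_version != expected_version:
            raise ConditionFailed(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = 1 if expected_version is None else expected_version + 1
        self._items[key] = {**item, "version": new_version}

    def increment(self, key, attribute, delta=1):
        """Counter update applied in one step; the item must already exist."""
        item = self._items[key]
        item[attribute] = item.get(attribute, 0) + delta
        return item[attribute]
```

A writer reads an item, remembers its version, and passes that version back on the write. If another writer got there first, the write is rejected and the caller can re-read and retry.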
  • 21. Strong or eventual consistency. Read throughput.
  • 22. Strongly consistent reads: provisioned units = size of item x reads per second. $0.0065 per hour for 50 units.
  • 23. Eventually consistent reads: provisioned units = (size of item x reads per second) / 2. $0.0065 per hour for 100 units.
  • 24. Strong or eventual consistency. Read throughput. Same latency expectations. Mix and match at read time.
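The throughput formulas on slides 18 and 22 to 23 can be turned into a small calculator. This is a rough sketch using the prices quoted on the slides. It assumes item size is measured in KB and rounded up to a whole KB, and that price scales linearly with units. Both are assumptions, and the function names are illustrative.

```python
import math

PRICE_PER_HOUR = 0.0065      # USD, as quoted on the slides
WRITE_UNITS_PER_PRICE = 10   # $0.0065 per hour buys 10 write units
READ_UNITS_PER_PRICE = 50    # $0.0065 per hour buys 50 read units


def write_units(item_size_kb, writes_per_second):
    """Provisioned write units = size of item x writes per second."""
    return math.ceil(item_size_kb) * writes_per_second


def read_units(item_size_kb, reads_per_second, consistent=True):
    """Provisioned read units; eventually consistent reads need half as many."""
    units = math.ceil(item_size_kb) * reads_per_second
    return units if consistent else math.ceil(units / 2)


def hourly_cost(write_capacity, read_capacity):
    """Hourly price in USD for the given provisioned capacity (linear assumption)."""
    return PRICE_PER_HOUR * (
        write_capacity / WRITE_UNITS_PER_PRICE
        + read_capacity / READ_UNITS_PER_PRICE
    )
```

For example, 100 eventually consistent reads per second of 1 KB items need 50 provisioned units, which is why the slide quotes the same $0.0065 for "100 units" of eventually consistent reads.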
  • 26. Data is partitioned and managed by DynamoDB.
  • 27. Indexed data storage. $0.25 per GB per month. Tiered bandwidth pricing: aws.amazon.com/dynamodb/pricing
  • 28. Reserved capacity. Save up to 53% with a 1-year reservation. Save up to 76% with a 3-year reservation.
  • 29. Authentication. Session based to minimize latency. Uses the Amazon Security Token Service. Handled by AWS SDKs. Integrates with IAM.
  • 30. Monitoring. CloudWatch metrics: latency, consumed read and write throughput, errors and throttling.
  • 31. Libraries, mappers and mocks. ColdFusion, Django, Erlang, Java, .Net, Node.js, Perl, PHP, Python, Ruby http://j.mp/dynamodb-libs
  • 33. id = 100, date = 2012-05-16-09-00-10, total = 25.00; id = 101, date = 2012-05-15-15-00-11, total = 35.00; id = 101, date = 2012-05-16-12-00-10, total = 100.00
  • 34. Table: the full set of items shown on slide 33.
  • 35. Item: one row of the table, e.g. id = 100, date = 2012-05-16-09-00-10, total = 25.00.
  • 36. Attribute: one name/value pair within an item, e.g. total = 25.00.
  • 37. Where is the schema? Tables do not require a formal schema. Items are an arbitrarily sized hash.
  • 38. Indexing. Items are indexed by primary and secondary keys. Primary keys can be composite. Secondary keys are local to the table.
  • 39. ID Date Total
  • 40. Hash key ID Date Total
  • 41. Hash key Range key ID Date Total Composite primary key
  • 42. Hash key Range key Secondary range key ID Date Total
  • 43. Programming DynamoDB. Small but perfectly formed API.
  • 44. Table operations: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables. Item operations: PutItem, GetItem, UpdateItem, DeleteItem, BatchGetItem, BatchWriteItem. Read operations across items: Query, Scan.
  • 47. Conditional updates. PutItem, UpdateItem, DeleteItem can take optional conditions for operation. UpdateItem performs atomic increments.
  • 48. One API call, multiple items. BatchGetItem returns multiple items by key. BatchWriteItem performs up to 25 put or delete operations. Throughput is measured by IO, not API calls.
  • 50. Query vs Scan. Query returns items by key. Scan reads the whole table sequentially.
  • 51. Query patterns. Retrieve all items by hash key. Range key conditions: ==, <, >, >=, <=, begins with, between. Counts. Top and bottom n values. Paged responses.
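The difference between Query and Scan on a composite (hash + range) key can be illustrated with a toy in-memory table. This is only a sketch of the access pattern, not the service API. `CompositeKeyTable` and its method names are hypothetical, and the sample data in the usage note comes from the orders example on slide 33.

```python
from collections import defaultdict


class CompositeKeyTable:
    """Toy table: items grouped by hash key and kept sorted by range key."""

    def __init__(self):
        # hash key -> list of (range key, item), sorted by range key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        # Same primary key overwrites the previous item.
        rows = [r for r in self._partitions[hash_key] if r[0] != range_key]
        rows.append((range_key, item))
        rows.sort(key=lambda r: r[0])
        self._partitions[hash_key] = rows

    def query(self, hash_key, begins_with=None, between=None,
              descending=False, limit=None):
        """Touch a single hash key and filter on its sorted range keys."""
        rows = self._partitions.get(hash_key, [])
        if begins_with is not None:
            rows = [r for r in rows if r[0].startswith(begins_with)]
        if between is not None:
            low, high = between
            rows = [r for r in rows if low <= r[0] <= high]
        if descending:
            rows = rows[::-1]
        return [item for _, item in rows[:limit]]

    def scan(self, predicate):
        """Read every item in the table, then filter."""
        return [item
                for rows in self._partitions.values()
                for _, item in rows
                if predicate(item)]
```

With orders stored under hash key `id` and range key `date`, `query(101, begins_with="2012-05-16")` reads only the items for id 101, while `scan(...)` visits every item regardless of key. "Top n" is a descending query with a limit.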
  • 53. Players: user_id = mza, location = Cambridge, joined = 2011-07-04; user_id = jeffbarr, location = Seattle, joined = 2012-01-20; user_id = werner, location = Worldwide, joined = 2011-05-15
  • 54. Players (as on slide 53). Scores: user_id = mza, game = angry-birds, score = 11,000; user_id = mza, game = tetris, score = 1,223,000; user_id = werner, game = bejewelled, score = 55,000
  • 55. Players and Scores (as on slide 54). Leader boards: game = angry-birds, score = 11,000, user_id = mza; game = tetris, score = 1,223,000, user_id = mza; game = tetris, score = 9,000,000, user_id = jeffbarr
  • 56. Players, Scores and Leader boards (as on slide 55). Query for scores by user.
  • 57. Players, Scores and Leader boards (as on slide 55). High scores by game.
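The Scores and Leader boards tables on slides 54 to 57 hold the same facts under different keys, so that "scores by user" and "high scores by game" are each a single-key lookup. A minimal in-memory sketch of that idea follows. The function names are hypothetical, and keeping one leader board entry per user per game is an assumption made here for simplicity.

```python
from collections import defaultdict

scores = defaultdict(dict)          # user_id -> {game: score}
leader_boards = defaultdict(list)   # game -> [(score, user_id)], highest first


def record_score(user_id, game, score):
    """Write the same fact to both tables so each read needs only one key."""
    scores[user_id][game] = score
    board = [(s, u) for s, u in leader_boards[game] if u != user_id]
    board.append((score, user_id))
    board.sort(reverse=True)
    leader_boards[game] = board


def scores_by_user(user_id):
    """All games and scores for one user (Scores table, keyed by user_id)."""
    return dict(scores.get(user_id, {}))


def high_scores(game, n=10):
    """Top n (score, user_id) pairs for one game (Leader boards table)."""
    return leader_boards.get(game, [])[:n]
```

The cost of this layout is a second write per score. In exchange, neither read has to scan.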
  • 59. Unlimited storage. Unlimited attributes per item. Unlimited items per table. Maximum of 64 KB per item.
  • 60. Split across items. message_id = 1, part = 1, message = <first 64k>; message_id = 1, part = 2, message = <second 64k>; message_id = 1, part = 3, message = <third 64k>
  • 61. Store a pointer to S3. message_id = 1, message = http://s3.amazonaws.com...; message_id = 2, message = http://s3.amazonaws.com...; message_id = 3, message = http://s3.amazonaws.com...
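Splitting a large value across several items, as on slide 60, might look like the following. This is a sketch under stated assumptions: the body is raw bytes, the chunk size is the full 64 KB quoted on slide 59 (a real item would also need room for its key and attribute names), and `split_message` and `join_message` are hypothetical helper names.

```python
MAX_PART_BYTES = 64 * 1024   # per-item limit quoted on slide 59


def split_message(message_id, body, part_size=MAX_PART_BYTES):
    """Return one item per chunk, keyed by (message_id, part)."""
    return [
        {
            "message_id": message_id,
            "part": index + 1,
            "message": body[start:start + part_size],
        }
        for index, start in enumerate(range(0, len(body), part_size))
    ]


def join_message(items):
    """Reassemble the original bytes from the parts, in any order."""
    ordered = sorted(items, key=lambda item: item["part"])
    return b"".join(item["message"] for item in ordered)
```

Using `message_id` as the hash key and `part` as the range key means one query by hash key returns all the parts, already in order.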
  • 63. Hot and cold tables. April: event_id = 1000, timestamp = 2013-04-16-09-59-01, key = value; event_id = 1001, timestamp = 2013-04-16-09-59-02, key = value; event_id = 1002, timestamp = 2013-04-16-09-59-02, key = value. March: event_id = 1000, timestamp = 2013-03-01-09-59-01, key = value; event_id = 1001, timestamp = 2013-03-01-09-59-02, key = value; event_id = ..., timestamp = ..., key = ...
  • 64. December January February March April
  • 65. Archive data. Move old data to S3: lower cost. Still available for analytics. Run queries across hot and cold data with Elastic MapReduce.
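One way to picture the hot and cold tables on slides 63 to 65 is a naming scheme with one table per month, where only the most recent months stay "hot". The sketch below is illustrative only. The `events` prefix, the two-month hot window, and the function names are all assumptions rather than anything stated in the deck.

```python
from datetime import date


def table_name(day, prefix="events"):
    """Name of the monthly table that holds data for `day`."""
    return f"{prefix}_{day:%Y_%m}"


def months_back(day, n):
    """First day of the month n months before `day`."""
    index = day.year * 12 + (day.month - 1) - n
    return date(index // 12, index % 12 + 1, 1)


def hot_and_cold(today, hot_months=2, known_months=5):
    """Split the last `known_months` monthly tables into hot and archive candidates."""
    names = [table_name(months_back(today, i)) for i in range(known_months)]
    return names[:hot_months], names[hot_months:]
```

Cold tables are the candidates for moving to S3, where they stay available for analytics at a lower storage cost.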
  • 67. Uniform workload. Data stored across multiple partitions. Data is primarily distributed by primary key. Provisioned throughput is divided evenly across partitions.
  • 68. To achieve and maintain full provisioned throughput, spread workload evenly across hash keys.
  • 69. Non-Uniform workload. Might be throttled, even at high levels of throughput.
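Because provisioned throughput is divided evenly across partitions (slide 67), a single hot hash key can be throttled at a fraction of the table's total. The sketch below makes that arithmetic concrete. The placement function (MD5 of the key modulo the partition count) is purely an illustrative assumption, since the deck does not describe how the service hashes keys, and all function names are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(hash_key, partitions):
    """Illustrative placement only: MD5 of the key modulo the partition count."""
    digest = hashlib.md5(str(hash_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def per_partition_limit(provisioned_units, partitions):
    """Throughput available to any one partition under an even split."""
    return provisioned_units / partitions


def load_by_partition(requested_keys, partitions):
    """Count how many requests land on each partition."""
    return Counter(partition_for(key, partitions) for key in requested_keys)
```

With 1,000 provisioned units over 10 partitions, each partition can serve 100. A workload of 1,000 requests per second against one key lands entirely on one partition and exceeds that limit, while the same volume spread over many keys does not.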
  • 70. BEST PRACTICE 1: Distinct values for hash keys. Hash key elements should have a high number of distinct values.
  • 71. Lots of users with unique user_id. Workload well distributed across hash key. user_id = mza, first_name = Matt, last_name = Wood; user_id = jeffbarr, first_name = Jeff, last_name = Barr; user_id = werner, first_name = Werner, last_name = Vogels; user_id = simone, first_name = Simone, last_name = Brunozzi; ...
  • 72. BEST PRACTICE 2: Avoid limited hash key values. Hash key elements should have a high number of distinct values.
  • 73. Small number of status codes. Uneven, non-uniform workload. status = 200, date = 2012-04-01-00-00-01; status = 404, date = 2012-04-01-00-00-01; status = 404, date = 2012-04-01-00-00-01; status = 404, date = 2012-04-01-00-00-01
  • 74. BEST PRACTICE 3: Model for even distribution. Access by hash key value should be evenly distributed across the dataset.
  • 75. Large number of devices. Small number which are much more popular than others. Workload unevenly distributed. mobile_id = 100, access_date = 2012-04-01-00-00-01; mobile_id = 100, access_date = 2012-04-01-00-00-02; mobile_id = 100, access_date = 2012-04-01-00-00-03; mobile_id = 100, access_date = 2012-04-01-00-00-04; ...
  • 76. Sample access pattern. Workload randomized by hash key. mobile_id = 100.1, access_date = 2012-04-01-00-00-01; mobile_id = 100.2, access_date = 2012-04-01-00-00-02; mobile_id = 100.3, access_date = 2012-04-01-00-00-03; mobile_id = 100.4, access_date = 2012-04-01-00-00-04; ...
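The randomized keys on slide 76 (100.1, 100.2, ...) can be produced by appending a random shard suffix on write and fanning out over all suffixes on read. A minimal sketch, assuming a fixed shard count chosen by the application; the helper names are hypothetical.

```python
import random


def shard_key(mobile_id, shards, rng=random):
    """Pick one of `shards` suffixed keys for a write, e.g. 100 -> "100.3"."""
    return f"{mobile_id}.{rng.randint(1, shards)}"


def all_shard_keys(mobile_id, shards):
    """Every key a reader must query to see the full history of one device."""
    return [f"{mobile_id}.{n}" for n in range(1, shards + 1)]
```

Writes for a popular device now spread over several hash keys. The trade-off is on the read side, which has to issue one query per shard key and merge the results.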
  • 78. Seamless scale. Scalable methods for data processing. Scalable methods for backup/restore.
  • 79. Amazon Elastic MapReduce. Managed Hadoop service for data-intensive workflows. aws.amazon.com/emr
  • 80. create external table items_db (id string, votes bigint, views bigint) stored by 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' tblproperties ("dynamodb.table.name" = "items", "dynamodb.column.mapping" = "id:id,votes:votes,views:views");
  • 81. select id, votes, views from items_db order by views desc;
  • 82. 5
  • 83. DynamoDB @ Localytics Mohit Dilawari Director of Engineering @mdilawari
  • 84. About Localytics. Mobile app analytics service. 750+ million devices and over 20,000 apps. Customers include: … and many more.
  • 85. About the Development Team. Small team of four managing the entire AWS infrastructure (100 EC2 instances). Experts in big data. Leveraging Amazon's services has been the key to our success. Large-scale users of SQS, S3, ELB, RDS, Route 53, ElastiCache, EMR, and of course DynamoDB.
  • 86. Why DynamoDB? Set it and forget it.
  • 87. Our use case: dedup data. Each datapoint includes a globally unique ID. Mobile traffic over 2G/3G will periodically upload duplicate data. We accept data up to a 28-day window.
  • 88. First Design for Dedup table. Unique ID: aaaaaaaaaaaaaaaaaaaaaaaaa333333333333333. Table Name = dedup_table. ID column: aaaaaaaaaaaaaaaaaaaaaaaaa111111111111111; aaaaaaaaaaaaaaaaaaaaaaaaa222222222222222; aaaaaaaaaaaaaaaaaaaaaaaaa333333333333333. "Test and Set" in a single operation.
  • 89. Optimization One - Data Aging. Partition by month. Create the new table the day before the month starts. Need to keep two months of data.
  • 90. Optimization One - Data Aging. Unique ID: bbbbbbbbbbbbbbbbbbbbbbbbb333333333333333. Check previous month: Table Name = March2013_dedup. ID column: aaaaaaaaaaaaaaaaaaaaaaaaa111111111111111; aaaaaaaaaaaaaaaaaaaaaaaaa222222222222222. Not here!
  • 91. Optimization One - Data Aging. Unique ID: bbbbbbbbbbbbbbbbbbbbbbbbb333333333333333. Test and Set in current month: Table Name = April2013_dedup. ID column: bbbbbbbbbbbbbbbbbbbbbbbbb111111111111111; bbbbbbbbbbbbbbbbbbbbbbbbb222222222222222; bbbbbbbbbbbbbbbbbbbbbbbbb333333333333333 (inserted).
  • 92. Optimization Two. Reduce the index size, which reduces costs. Each item has a 100-byte overhead, which is substantial. Combine multiple IDs into one record. Split each ID into two halves: the first half is the key, the second half is added to the set.
  • 93. Optimization Two - Use Sets. Unique ID: ccccccccccccccccccccccccccc999999999999999, split into ccccccccccccccccccccccccccc and 999999999999999. Prefix and values: aaaaaaaaaaaaaaaaaaaaaaaaa: [111111111111111, 222222222222222, 333333333333333]; bbbbbbbbbbbbbbbbbbbbbbbbb: [444444444444444, 555555555555555, 666666666666666]; ccccccccccccccccccccccccccc: [777777777777777, 888888888888888, ]
  • 94. Optimization Three - Combine Months. Go back to a single table with columns Prefix, March2013, April2013. aaaaaaaaaa...: March2013 = [111111111111111, 22222222222..., April2013 = [1212121212121212, 3434343434....; bbbbbbbbbb...: March2013 = [444444444444444, 555555555...., April2013 = [4545454545454545, 6767676767.....; ccccccccccc...: March2013 = [777777777777777, 888888888..., April2013 = [8989898989898989, 1313131313..... One operation: 1. Delete February2013 field. 2. Check ID in March2013. 3. Test and Set into April2013.
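The Localytics dedup design (prefix as key, suffixes kept in per-month sets, one test-and-set step) can be sketched in memory as below. This is not their implementation. `DedupTable` and `PREFIX_LENGTH` are hypothetical, the split point is an illustrative assumption, and the real system performs the steps as a single operation against the service, not against a Python dictionary.

```python
PREFIX_LENGTH = 25   # assumption: where the ID is split into key and set member


class DedupTable:
    """Toy model: prefix -> {month name: set of suffixes}."""

    def __init__(self):
        self._items = {}

    def test_and_set(self, unique_id, current_month, previous_month):
        """Return True if unique_id is new, recording it under current_month."""
        prefix, suffix = unique_id[:PREFIX_LENGTH], unique_id[PREFIX_LENGTH:]
        months = self._items.setdefault(prefix, {})

        # 1. Delete month fields older than the two months being kept.
        for month in list(months):
            if month not in (current_month, previous_month):
                del months[month]

        # 2. Check the previous month.
        if suffix in months.get(previous_month, set()):
            return False

        # 3. Test and set into the current month.
        current = months.setdefault(current_month, set())
        if suffix in current:
            return False
        current.add(suffix)
        return True
```

Keeping both months in one item is what removes the separate read of the previous month's table, which is why the "Multiple Months" plan in the recap shows no read cost.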
  • 95. Recap. Compare plans for 20 billion IDs per month.
      Plan                   Storage costs   Read costs   Write costs   Total    Savings
      Naive (after a year)   $8400           0            $4000         $12400
      Data Age               $900            $350         $4000         $5250    57%
      Using Sets             $150            $350         $4000         $4500    64%
      Multiple Months        $150            0            $4000         $4150    67%
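The totals and savings in the recap follow from the three cost columns. The short check below recomputes them from the slide's figures. The dictionary layout and function names are illustrative. The computed savings come out at roughly 57.7%, 63.7% and 66.5%, each within one percentage point of the rounded values on the slide.

```python
# Costs in USD, copied from slide 95.
PLANS = {
    "Naive (after a year)": {"storage": 8400, "read": 0, "write": 4000},
    "Data Age": {"storage": 900, "read": 350, "write": 4000},
    "Using Sets": {"storage": 150, "read": 350, "write": 4000},
    "Multiple Months": {"storage": 150, "read": 0, "write": 4000},
}


def total(plan_name):
    """Sum of storage, read and write costs for one plan."""
    return sum(PLANS[plan_name].values())


def savings_percent(plan_name, baseline="Naive (after a year)"):
    """Saving relative to the naive plan, as a percentage."""
    return 100 * (1 - total(plan_name) / total(baseline))
```

Write cost is the same $4000 in every plan, so all of the savings come from shrinking storage and removing the extra read.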
  • 97. Summary 1. Getting started 2. Data modeling 3. Partitioning 4. Replication & Analytics 5. Customer story: Localytics