The document summarizes a presentation on advanced design patterns for building ultra-high performance apps using Amazon DynamoDB. The presentation covers using hash and range schemas to model social networks, secondary indexes for flexible querying of image data, conditional writes for synchronizing game state, and fine-grained access control for user data. Examples are provided for each pattern discussed.
2. Plan
• Social Network (hash + range schemas)
• Image Tagging (secondary indexes)
• Game State (conditional writes)
• User Data (fine-grained access control)
10. Social Network
Friends Table
User | Friend
Bob | Alice
Alice | Bob
Alice | Carol
Alice | Dan
Users Table
User | Nicknames
Bob | [ Rob, Bobby ]
Alice | [ Allie ]
Carol | [ Caroline ]
Dan | [ Daniel, Danny ]
11. Social Network
Friends Table (User, Friend): Hash-and-Range Primary Key Schema
Users Table (User, Nicknames)
Table contents as on slide 10
12. Social Network
Query for Alice’s friends
Friends Table rows where User = Alice (table contents as on slide 10):
User | Friend
Alice | Bob
Alice | Carol
Alice | Dan
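The query on this slide can be sketched in plain Python. This is a minimal in-memory model, not the DynamoDB API. It assumes User is the hash key and Friend is the range key of the Friends table (the slides name the schema type but not the key roles), and `HashRangeTable` is a hypothetical helper introduced only for illustration.

```python
from bisect import insort
from collections import defaultdict


class HashRangeTable:
    """Toy model of a hash-and-range table: items are grouped by hash key
    and kept sorted by range key within each group."""

    def __init__(self):
        self._partitions = defaultdict(list)  # hash key -> sorted range keys

    def put(self, hash_key, range_key):
        keys = self._partitions[hash_key]
        if range_key not in keys:
            insort(keys, range_key)

    def query(self, hash_key):
        """Return every range key stored under one hash key, in sorted order."""
        return list(self._partitions.get(hash_key, []))


# Rows of the Friends table from the slide (assumed: User = hash, Friend = range).
friends = HashRangeTable()
for user, friend in [("Bob", "Alice"), ("Alice", "Bob"),
                     ("Alice", "Carol"), ("Alice", "Dan")]:
    friends.put(user, friend)

alice_friends = friends.query("Alice")
```

Because all of Alice's rows share one hash key, a single lookup returns her whole friend list without touching other users' rows.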
23. Image Tagging
Local Secondary Index on Date
Images Table
Table:
User | Image | Date | Link
Bob | aed4c | 2013-10-01 | s3://…
Bob | cf2e2 | 2013-09-05 | s3://…
Bob | f93bae | 2013-10-08 | s3://…
Alice | ca61a | 2013-09-12 | s3://…
ByDate Local Secondary Index:
User | Date | Image
Bob | 2013-09-05 | cf2e2
Bob | 2013-10-01 | aed4c
Bob | 2013-10-08 | f93bae
Alice | 2013-09-12 | ca61a
24. Image Tagging
Query for Bob’s two most recent images
Images Table and ByDate Local Secondary Index as on slide 23; the two newest ByDate rows for User = Bob:
User | Date | Image
Bob | 2013-10-08 | f93bae
Bob | 2013-10-01 | aed4c
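A rough sketch of what this query does, using the rows from the slide. `query_by_date` is a hypothetical in-memory stand-in for a Query against the ByDate index. The `query_params` dictionary shows how the same request might look with the expression-based parameters of the current DynamoDB Query operation, which postdate this talk, so treat that part as an assumption about how one would write it today.

```python
images = [
    {"User": "Bob", "Image": "aed4c", "Date": "2013-10-01"},
    {"User": "Bob", "Image": "cf2e2", "Date": "2013-09-05"},
    {"User": "Bob", "Image": "f93bae", "Date": "2013-10-08"},
    {"User": "Alice", "Image": "ca61a", "Date": "2013-09-12"},
]


def query_by_date(items, user, limit, newest_first=True):
    """Emulate a Query on the ByDate index: same hash key (User),
    results ordered by the index range key (Date), truncated to `limit`."""
    matches = [item for item in items if item["User"] == user]
    # ISO-8601 dates sort correctly as plain strings.
    matches.sort(key=lambda item: item["Date"], reverse=newest_first)
    return matches[:limit]


recent = [item["Image"] for item in query_by_date(images, "Bob", limit=2)]

# Roughly equivalent request parameters for the DynamoDB Query operation.
query_params = {
    "TableName": "Images",
    "IndexName": "ByDate",
    "KeyConditionExpression": "#u = :user",
    "ExpressionAttributeNames": {"#u": "User"},
    "ExpressionAttributeValues": {":user": {"S": "Bob"}},
    "ScanIndexForward": False,  # read the index in descending Date order
    "Limit": 2,
}
```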
25. Image Tagging
• Query a user’s images
• Query a user’s images by date
• Tag other users in images
• Query images a user is tagged in
34. Image Tagging
Global Secondary Index on User, Image
ImageTags Table
Table:
Image | User
aed4c | Alice
aed4c | Bob
f93bae | Alice
f93bae | Bob
ByUser Global Secondary Index:
User | Image
Bob | aed4c
Bob | f93bae
Alice | aed4c
Alice | f93bae
36. Image Tagging
Query for images tagged Alice
ImageTags Table and ByUser Global Secondary Index as on slide 34; ByUser rows where User = Alice:
User | Image
Alice | aed4c
Alice | f93bae
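The idea behind the ByUser index can be sketched as a second view of the same rows keyed the other way round. This is an in-memory illustration only. It assumes Image is the hash key of the ImageTags table, and `build_by_user_index` is a hypothetical helper. In DynamoDB the index is maintained by the service, not by application code.

```python
from collections import defaultdict

# ImageTags rows from the slide as (Image, User) pairs.
image_tags = [("aed4c", "Alice"), ("aed4c", "Bob"),
              ("f93bae", "Alice"), ("f93bae", "Bob")]


def build_by_user_index(tags):
    """Project the table into a view keyed on User, then Image."""
    index = defaultdict(list)
    for image, user in tags:
        index[user].append(image)
    return {user: sorted(tagged) for user, tagged in index.items()}


by_user = build_by_user_index(image_tags)
tagged_alice = by_user.get("Alice", [])
```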
37. Recap: Image Tagging
• Local Secondary Indexes support flexible queries
• Global Secondary Indexes unlock even more flexible queries
47. Gaming the System
Clients: Bob (1), Bob (2), Bob (3)
Item in Amazon DynamoDB: State : STARTED, Turn : Bob, Top-Right : O
48. Gaming the System
Bob (1) Update: Turn : Alice, Top-Left : X
Bob (2) Update: Turn : Alice, Mid : X
Bob (3) Update: Turn : Alice, Low-Right : X
Item in Amazon DynamoDB: State : STARTED, Turn : Bob, Top-Right : O
49. Gaming the System
Bob (1) Update: Turn : Alice, Top-Left : X
Bob (2) Update: Turn : Alice, Mid : X
Bob (3) Update: Turn : Alice, Low-Right : X
Item in Amazon DynamoDB: State : STARTED, Turn : Alice, Top-Right : O, Top-Left : X, Mid : X, Low-Right : X
50. Conditional Writes
• Accept a write only if values are as expected
• Otherwise reject the write
• Performance like normal writes
• Single item only (no transactions)
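As a hedged illustration, a conditional write for a tic-tac-toe move could be expressed with the `ConditionExpression` parameter of the current UpdateItem operation (the talk itself predates this syntax). The table name `Games` and key attribute `GameId` are hypothetical, and `build_move_request` is a helper invented for this sketch. It only builds the request and does not call AWS.

```python
def build_move_request(game_id, player, next_player, square, mark):
    """Build UpdateItem parameters that apply a move only if it is
    `player`'s turn and `square` is still empty."""
    return {
        "TableName": "Games",                # hypothetical table name
        "Key": {"GameId": {"S": game_id}},   # hypothetical key attribute
        "UpdateExpression": "SET #turn = :next, #sq = :mark",
        # The write is accepted only when both conditions hold.
        "ConditionExpression": "#turn = :player AND attribute_not_exists(#sq)",
        # Placeholders are needed because names like Top-Left contain a hyphen.
        "ExpressionAttributeNames": {"#turn": "Turn", "#sq": square},
        "ExpressionAttributeValues": {
            ":next": {"S": next_player},
            ":player": {"S": player},
            ":mark": {"S": mark},
        },
    }


move = build_move_request("game-1", "Bob", "Alice", "Top-Left", "X")
```

With an SDK such as boto3, a dictionary like this would be passed to the client's `update_item` call, and a rejected write surfaces as a `ConditionalCheckFailedException`.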
51. Tic Tac Toe (Fixed)
Bob (1) Update: Turn : Alice, Top-Left : X; Expect: Turn : Bob, Top-Left : null
Bob (2) Update: Turn : Alice, Mid : X; Expect: Turn : Bob, Mid : null
Bob (3) Update: Turn : Alice, Low-Right : X; Expect: Turn : Bob, Low-Right : null
Item in Amazon DynamoDB: State : STARTED, Turn : Bob, Top-Right : O
53. Tic Tac Toe (Fixed)
Bob (1) Update: Turn : Alice, Top-Left : X; Expect: Turn : Bob, Top-Left : null
Bob (2) Update: Turn : Alice, Mid : X; Expect: Turn : Bob, Mid : null
Bob (3) Update: Turn : Alice, Low-Right : X; Expect: Turn : Bob, Low-Right : null
Item in Amazon DynamoDB: State : STARTED, Turn : Alice, Top-Right : O, Top-Left : X
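The outcome on this slide can be reproduced with a small in-memory simulation. `conditional_update` and `ConditionFailed` are hypothetical names for this sketch. The three updates are applied one after another here, whereas DynamoDB evaluates the expectation and the write atomically per item, so whichever request arrives first wins.

```python
class ConditionFailed(Exception):
    """Raised when an item does not match the expected values."""


def conditional_update(item, updates, expected):
    """Apply `updates` to `item` only if every expected attribute matches.
    An expected value of None means the attribute must be absent."""
    for name, value in expected.items():
        if item.get(name) != value:
            raise ConditionFailed(name)
    item.update(updates)


game = {"State": "STARTED", "Turn": "Bob", "Top-Right": "O"}
outcomes = []
for square in ("Top-Left", "Mid", "Low-Right"):  # Bob (1), Bob (2), Bob (3)
    try:
        conditional_update(game,
                           updates={"Turn": "Alice", square: "X"},
                           expected={"Turn": "Bob", square: None})
        outcomes.append("accepted")
    except ConditionFailed:
        outcomes.append("rejected")
```

Only the first update sees Turn : Bob. The other two fail the expectation and leave the item unchanged.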
54. Recap: Game State
• Conditional writes synchronize state transitions
• Multi-item transactions require application-level coordination
62. Fine-Grained Access Control
• Limit access to particular hash key values
• Limit access to specific attributes
• Use policy substitution variables to write the policy once
63. Fine-Grained Access Control
Images Table
User | Image | Date | Link
Bob | aed4c | 2013-10-01 | s3://…
Bob | 5f2e2 | 2013-09-05 | s3://…
Bob | f93bae | 2013-10-08 | s3://…
Alice | ca61a | 2013-09-12 | s3://…
“Allow all authenticated Facebook users to Query the Images table, but only on items where their Facebook ID is the hash key”
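A policy along the lines of the quoted rule might look like the sketch below, built as a Python dictionary and serialized to JSON. It uses the `dynamodb:LeadingKeys` condition key and the `${graph.facebook.com:id}` substitution variable from the IAM documentation for fine-grained access control. The region and account ID are placeholder values, and `images_query_policy` is a hypothetical helper.

```python
import json


def images_query_policy(region, account_id):
    """Allow Query on the Images table, restricted to items whose hash key
    equals the caller's Facebook ID (filled in by IAM at request time)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:Query"],
            "Resource": [f"arn:aws:dynamodb:{region}:{account_id}:table/Images"],
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Substitution variable: the policy is written once
                    # and resolved per authenticated user.
                    "dynamodb:LeadingKeys": ["${graph.facebook.com:id}"]
                }
            },
        }],
    }


# Placeholder region and account ID, for illustration only.
policy = images_query_policy("us-west-2", "123456789012")
policy_json = json.dumps(policy, indent=2)
```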
64. Fine-Grained Access Control
Images Table (contents as on slide 63)
Bob “logs in” using web identity federation
Diagram: Bob → AWS IAM
65. Fine-Grained Access Control
Images Table (contents as on slide 63)
Bob can Query for Images where User=“Bob”
66. Fine-Grained Access Control
Images Table (contents as on slide 63)
Bob cannot Query for Images where User=“Alice”
67. Two-tier Architecture Tradeoffs
• Pros:
– Lower latency
– Lower cost
– Lower operational complexity
• Cons:
– Less visibility into application behavior
– More difficult to make changes to persistence layer
– Requires “scoping” items to a given user
Diagram: Users → Amazon DynamoDB
68. Recap: User Data
• Web identity federation makes it easy for end users to call AWS directly
• Fine-grained access control supports a secure two-tier architecture on Amazon DynamoDB
69. Recap (Thanks!)
• Social Network (hash + range schemas)
• Image Tagging (secondary indexes)
• Game State (conditional writes)
• User Data (fine-grained access control)
70. An Amazon DynamoDB Adoption Story:
Before, After, and Beyond
David Tuttle, Engineering Manager, Platform Team
73. An Amazon DynamoDB Adoption Story
• Before: Pre-DynamoDB world
• During: The move to DynamoDB
• After: DynamoDB has improved our lives
• Beyond: Future direction
74. Before: Challenges in a Pre-DynamoDB World
• Complex manual scaling
– Scaling up was not a fast operation
– DB/DB slave maintenance was painful
– Handcrafted table sharding required to scale DB with application
table_shardBit0_shardBit1 {
shardKey string, primary key;
}
• Slow queries on large tables
– Developed processes to rotate tables before they got too large
– Hindered by unpredictable query time
– Developer effort spent optimizing SQL queries
• Schema changes are difficult on large active tables
– Forced to code a workaround when we would have preferred altering a table
• Replication delay
75. Move to DynamoDB
• DynamoDB forced a different way of thinking about data
– Data organized into “items” around a primary key (hash key, [range key])
– Configured level of read + write throughput in DynamoDB is achievable if and only if certain conditions are met
  • Avoid sparse hash keys
  • Workload evenly distributed across hash keys
• Throttled scans and queries
– A scan or a query without a limit risked consuming all of the throughput for an internal partition of a DynamoDB table
– Parallel scans required for high-speed jobs
• Porting the application backend to use DynamoDB
– Wealth of provided AWS SDKs: Java, PHP, Python boto, etc.
– DynamoDB web service facilitates development in any programming language
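The two scan points above (bounded page sizes and parallel scans) can be sketched as request parameters for the DynamoDB Scan operation, which accepts `Segment`, `TotalSegments`, and `Limit`. `parallel_scan_requests` is a hypothetical helper, and the table name and numbers in the usage line are made up for illustration.

```python
def parallel_scan_requests(table_name, total_segments, page_limit):
    """Build one Scan request per worker. Each worker reads a disjoint
    segment of the table, and Limit caps how many items a single page reads."""
    return [
        {
            "TableName": table_name,
            "Segment": segment,
            "TotalSegments": total_segments,
            "Limit": page_limit,
        }
        for segment in range(total_segments)
    ]


# Hypothetical job: 4 workers, at most 100 items evaluated per page.
scan_requests = parallel_scan_requests("events", total_segments=4, page_limit=100)
```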
76. Move to DynamoDB – Data Migration
• Migration performed in a single 4-hour window
– I/O overhead requires careful balancing to achieve desired processing speeds
– Provisioned throughput was orders of magnitude higher than required for post-migration load
– Excessive internal partitioning and non-uniform workload led to configured/observed throughput mismatch
• Better: dual-writes with multi-phased deployment
Chart: Configured vs. Ideal vs. Actual throughput
77. Alive – Client Network Health Checks
• Alive is a mission-critical Devicescape service
– Use event data to both bill customers and grow our WiFi database
– 250 million events per day (and growing!)
• Reliable high performance is needed
– Alive is written to DynamoDB within the bounds of a device’s HTTP request
– It must be read from DynamoDB to support real-time processing
• Data is organized as a time series
– Each table represents a particular day in UTC
– The next day’s table is created via a Python script
– We keep up to 60 days’ worth of data and expire old data based on a retention policy
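A minimal sketch of the table-per-day bookkeeping described here. It assumes tables are named `alive_<YYYYMMDD>` and that "up to 60 days" means today plus the 59 preceding UTC days. `tables_to_create_and_expire` is a hypothetical helper, not Devicescape's actual script, and it only computes names without calling AWS.

```python
from datetime import date, timedelta

RETENTION_DAYS = 60  # assumed reading of "up to 60 days' worth of data"


def table_name(day):
    """Name of the table holding one UTC day of events."""
    return f"alive_{day:%Y%m%d}"


def tables_to_create_and_expire(today, existing):
    """Return (name of tomorrow's table, sorted names past retention)."""
    tomorrow = table_name(today + timedelta(days=1))
    oldest_kept = table_name(today - timedelta(days=RETENTION_DAYS - 1))
    # YYYYMMDD suffixes compare correctly as plain strings.
    expired = sorted(name for name in existing if name < oldest_kept)
    return tomorrow, expired


today = date(2013, 10, 28)
existing_tables = [table_name(today - timedelta(days=n)) for n in range(62)]
to_create, to_expire = tables_to_create_and_expire(today, existing_tables)
```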
78. Alive – Schema
• Hash key: Timestamp + UUID shard
– Indexed by time
– Writes distributed over 256 hash keys/second
• Collisions are infrequent, but must be handled
– Append index to range key
– Atomic increment index in first item
alive_<YYYYMMDD> {
tsUuidShard string, hash key;
uuidIndex string, range key;
index number;
. . .
}
tsUuidShard | uuid | uuidIndex | index
1382969157_e2 | 0e645c9c-08b9-48b4-a5b0-c310957451e2 | 0e645c9c-08b9-48b4-a5b0-c310957451e2_0 |
1382969157_1e | c349262e-75a3-444b-9716-20f09823411e | c349262e-75a3-444b-9716-20f09823411e_0 | 1
1382969157_1e | c349262e-75a3-444b-9716-20f09823411e | c349262e-75a3-444b-9716-20f09823411e_1 |
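The key layout and collision handling can be sketched as follows. Two things are assumptions read off the example rows rather than stated on the slide: the shard is the last two hex digits of the UUID (which gives 256 shards), and the counter in the first item equals the number of collisions seen so far. A plain dictionary stands in for the table, and the in-place increment stands in for DynamoDB's atomic counter update.

```python
def ts_uuid_shard(timestamp, device_uuid):
    """Hash key: epoch seconds plus a shard taken from the UUID's
    last two hex digits (assumed from the example rows)."""
    return f"{timestamp}_{device_uuid[-2:]}"


def store_event(table, timestamp, device_uuid, payload):
    """Insert an event, appending an index to the range key.
    The first item (suffix _0) carries the `index` counter."""
    hash_key = ts_uuid_shard(timestamp, device_uuid)
    first_key = (hash_key, f"{device_uuid}_0")
    if first_key not in table:
        table[first_key] = dict(payload)
        return first_key
    first = table[first_key]
    first["index"] = first.get("index", 0) + 1  # stands in for an atomic increment
    key = (hash_key, f"{device_uuid}_{first['index']}")
    table[key] = dict(payload)
    return key


alive = {}
device = "c349262e-75a3-444b-9716-20f09823411e"
first_item = store_event(alive, 1382969157, device, {})
second_item = store_event(alive, 1382969157, device, {})
```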
79. Alive – Plan for Failure
• Distribute HTTP front ends over 3 Availability Zones
– Amazon DynamoDB and Elastic Load Balancing inherently multi-AZ
• One “Patient uploader” per instance
– HTTPD tries once to write to DynamoDB and writes to a local file on failure
– Patient uploader monitors local file and retries events with backoff
– Operations team alerted if patient uploader is > 1 hour behind
Diagram: Elastic Load Balancing → HTTP front ends in AZ-A, AZ-B, AZ-C → Amazon DynamoDB
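The try-once-then-spool pattern might be sketched as below. This is not Devicescape's implementation: `handle_event` and `drain_spool` are hypothetical names, the `write` callable stands in for the DynamoDB put, and the JSON-lines spool format, attempt count, and delays are illustrative choices.

```python
import json
import time


def handle_event(event, write, spool_path):
    """Front-end path: try the write once; on failure append to a local file."""
    try:
        write(event)
        return True
    except Exception:
        with open(spool_path, "a", encoding="utf-8") as spool:
            spool.write(json.dumps(event) + "\n")
        return False


def drain_spool(spool_path, write, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Patient uploader: retry each spooled event with exponential backoff.
    Events that still fail are written back for the next pass.
    Returns the number of events uploaded."""
    with open(spool_path, encoding="utf-8") as spool:
        events = [json.loads(line) for line in spool if line.strip()]
    remaining = []
    for event in events:
        for attempt in range(max_attempts):
            try:
                write(event)
                break
            except Exception:
                sleep(base_delay * 2 ** attempt)
        else:
            remaining.append(event)  # every attempt failed
    with open(spool_path, "w", encoding="utf-8") as spool:
        for event in remaining:
            spool.write(json.dumps(event) + "\n")
    return len(events) - len(remaining)
```

Keeping the HTTP request path to a single attempt bounds request latency, while the uploader absorbs longer outages in the background.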
80. Alive – Data Processing
• System usage and customer billing
– Alive data is a key component of our connection detail record (CDR)
– We have a processor running 2 hours behind current time that queries each second’s event data
• Multiple threads access 1 shard each
• Near real-time WiFi data improvement
– Alive helps us quickly discover high-quality access points
– Processor running 5 minutes behind current time
• Batch processing
– Daily extraction to Amazon S3 allows us to perform counting and aggregation of the alive data in Amazon EMR with HBase and Hive
– We can perform broader trend analysis on the data that is not feasible in real-time
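As a sketch of how a lagging processor could enumerate one second of data: with 256 shards per second, each second maps to 256 hash keys that can be divided among worker threads. The two-hex-digit shard suffix is inferred from the example keys in the schema slide, and both helper names are hypothetical.

```python
def shard_keys_for_second(timestamp, shards=256):
    """All hash keys holding events for one epoch second."""
    return [f"{timestamp}_{shard:02x}" for shard in range(shards)]


def split_among_workers(keys, workers):
    """Deal keys out round-robin so each worker queries a disjoint subset."""
    return [keys[i::workers] for i in range(workers)]


def second_to_process(now, lag_seconds):
    """Epoch second a processor running `lag_seconds` behind should read."""
    return int(now) - lag_seconds


keys = shard_keys_for_second(1382969157)
assignments = split_among_workers(keys, workers=4)
```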
81. After: How Devicescape uses DynamoDB
• Critical ‘online’ services principally on DynamoDB
• Infrastructure cost savings of 30%
• Greatly reduced DB maintenance
• New developments use DynamoDB
Architecture diagram labels: HTTP front ends in AZ-A and AZ-B, Application, Amazon DynamoDB, Amazon EMR, Amazon S3, CloudWatch, ElastiCache
82. Beyond: Future Plans
• Move more into DynamoDB!
– Some other data sources that we thought could remain in MySQL (e.g. event logs) have already begun to hit pain points.
• Investigate new Geospatial Library
– We have worried about how our Geohash + Memcache + MySQL solution for WiFi geolocation data would translate into the DynamoDB and NoSQL worlds.
– Eventually the costs of scaling these data sources in our handcrafted MySQL sharding solution will become prohibitive.
• Use DynamoDB Local
– Incorporate private, local DynamoDB version into developer sandboxes
85. What is Dropcam?
• Software company in SF
• Wi-Fi enabled camera
• Intelligent activity recognition
• Apps (iOS, Android, Web)
• Cloud recording
86. Dropcam Uses
Home security
Burglar caught – Bellevue, WA
Pets
Woodland Park Zoo – Seattle, WA
Family
Baby
Dropcam employee
Small business
The Baconery - NYC
Kyra – N. Virginia
Just because
Toyota dealer saw I-5 bridge collapse
87. Cloud Recording (CVR)
• Continuous recording to the cloud
• Accessible instantly from anywhere
• Activity recognition
– Motion detection
– Machine learning
$9.95 / mo.
$29.95 / mo.
91. Dropcam on Amazon DynamoDB
• camera_records
– metadata about cameras
• cuepoints
– metadata about activity
• recording_sessions
– metadata about CVR data in S3
• user_sessions
– session data for logged-in user
case class CameraRecord(
  cameraId: Int, // hash key
  ownerId: Int,
  subscribers: Set[Int],
  hoursOfRecording: Int,
  ...
)
case class Cuepoint(
  cameraId: Int, // hash key
  timestamp: Long, // range key
  `type`: String, // backticks needed: type is a Scala keyword
  ...
)
92. Example: Cuepoints
• Activity detected → PutItem
– camera_id, timestamp, type (motion, sound, etc.), other data…
• Activity recognized → UpdateItem
– category_id
• Available via API → Query
– Get all for camera_id, or
– Get all for camera_id BETWEEN start_time and end_time
• Expire after 7 days (or 30 days) → BatchWriteItem (delete)
– Periodically, per camera: query where timestamp < now - 7 days, then delete
– Spikes chew up IOPS → self-throttling
– Another approach: table per week, DeleteTable after 5 weeks
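The expiry step could be sketched as below. BatchWriteItem accepts at most 25 put or delete requests per call, so expired keys are chunked accordingly. The sketch assumes timestamps are epoch seconds stored as numbers, which the slides do not specify, and both helper names are hypothetical. It only builds request payloads and does not call AWS.

```python
def expired_keys(cuepoints, now, max_age_days=7):
    """Keys of cuepoints older than the retention window."""
    cutoff = now - max_age_days * 86400
    return [(c["camera_id"], c["timestamp"])
            for c in cuepoints if c["timestamp"] < cutoff]


def delete_batches(table_name, keys, batch_size=25):
    """Yield BatchWriteItem payloads with at most 25 delete requests each."""
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        yield {"RequestItems": {table_name: [
            {"DeleteRequest": {"Key": {
                "camera_id": {"N": str(camera_id)},
                "timestamp": {"N": str(timestamp)},
            }}}
            for camera_id, timestamp in chunk
        ]}}


now = 1_383_000_000
sample = [{"camera_id": 1, "timestamp": now - 8 * 86400 - n} for n in range(60)]
sample.append({"camera_id": 1, "timestamp": now - 3600})  # recent, kept
batches = list(delete_batches("cuepoints", expired_keys(sample, now)))
```

Sending the batches with a pause between calls is one way to do the self-throttling the slide mentions, so deletes do not starve live traffic.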
93. ...Not That Simple
• “Eventual consistency”
• Choosing hash key == sharding
– Important up-front design decision
– Ensure uniform access over your key space!
• IOs are front-and-center
– Actually, even harder: IOs Per Second
• Throttling behavior is opaque
– Actively watch for throttled responses
• No built-in expiration
94. Less is More
• The right set of tradeoffs
• Managed NoSQL
– Focus on our own software and product
• Scales easily with our business
• High availability
• Very fast (SSDs)
• Predictable performance
• Inexpensive