Learn how Digital Advertising customers are leveraging the integration between Amazon DynamoDB and Amazon Redshift to manage their high-scale data, from creation to analysis. In this session, we will describe the three essential ingredients of efficient data flow in the cloud and introduce a reference architecture that enables customers to meet the low-latency, high-volume demands of the Digital Advertising industry. You will learn how to use existing SQL-based tools and business intelligence systems to gain deeper insight from your data at lower cost. The design principles presented here apply to any environment where managing data at scale is a challenge.
9. Fast Application Development
Time to build new applications
• Flexible data models
• Simple API
• High-scale queries
• Laptop development
Diagram: Amazon DynamoDB, with labels DEVS, OPS, USERS
11. Request-based capacity provisioning model
Provisioned Throughput
Throughput is declared and updated via the API or the console:
CreateTable(foo, reads/sec = 100, writes/sec = 150)
UpdateTable(foo, reads/sec = 10000, writes/sec = 4500)
DynamoDB handles the rest:
• Capacity is reserved and available when needed
• Scaling up triggers repartitioning and reallocation
• No impact on performance or availability
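The CreateTable and UpdateTable calls above are slide shorthand. As a minimal sketch, the matching request parameters could be assembled as below. TableName, AttributeDefinitions, KeySchema, ProvisionedThroughput, ReadCapacityUnits, and WriteCapacityUnits are DynamoDB API parameter names; the hash key (a string attribute named userid) and the helper function names are hypothetical. Capacity units equal reads or writes per second only for small items (up to 4 KB per read, 1 KB per write).

```python
from typing import Any


def create_table_params(table_name: str, reads: int, writes: int) -> dict[str, Any]:
    """Build CreateTable request parameters with provisioned throughput.

    Hypothetical key schema: a single string hash key named "userid".
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [{"AttributeName": "userid", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "userid", "KeyType": "HASH"}],
        "ProvisionedThroughput": {"ReadCapacityUnits": reads, "WriteCapacityUnits": writes},
    }


def update_table_params(table_name: str, reads: int, writes: int) -> dict[str, Any]:
    """Build UpdateTable request parameters that change only the throughput."""
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {"ReadCapacityUnits": reads, "WriteCapacityUnits": writes},
    }


# The two calls from the slide: create "foo", then scale it up.
create = create_table_params("foo", reads=100, writes=150)
update = update_table_params("foo", reads=10000, writes=4500)

# With boto3 installed and AWS credentials configured, these could be sent as:
#   client = boto3.client("dynamodb")
#   client.create_table(**create)
#   client.update_table(**update)
print(update["ProvisionedThroughput"])
```

Keeping the parameters as plain dicts makes the throughput change a data edit rather than a code change, which matches the slide's point that scaling is a single API call.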
13. WRITES
Replicated continuously to 3 Availability Zones (AZs)
Persisted to disk (custom SSD)
READS
Strongly or eventually consistent
No latency trade-off
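Choosing strong or eventual consistency does not change latency, but it does change how much provisioned capacity a read consumes. One read capacity unit covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads of that size; one write capacity unit covers one write per second of an item up to 1 KB. A small calculator under those rules (the item sizes and rates in the example are hypothetical):

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one read capacity unit covers an item of up to 4 KB
WRITE_UNIT_BYTES = 1 * 1024  # one write capacity unit covers an item of up to 1 KB


def read_capacity_units(item_bytes: int, reads_per_sec: int, strongly_consistent: bool) -> int:
    """RCUs needed for a steady read rate of items of a given size."""
    units_per_read = math.ceil(item_bytes / READ_UNIT_BYTES)
    total = units_per_read * reads_per_sec
    # An eventually consistent read costs half as much as a strongly consistent one.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_bytes: int, writes_per_sec: int) -> int:
    """WCUs needed for a steady write rate of items of a given size."""
    return math.ceil(item_bytes / WRITE_UNIT_BYTES) * writes_per_sec


# Hypothetical workload: 6 KB profile items read 100 times per second,
# and 1,500-byte items written 150 times per second.
print(read_capacity_units(6 * 1024, 100, strongly_consistent=True))   # 200
print(read_capacity_units(6 * 1024, 100, strongly_consistent=False))  # 100
print(write_capacity_units(1500, 150))                                # 300
```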
14. Latest news: DynamoDB Local
• Disconnected development
• Full API support
• Download from http://aws.amazon.com/dynamodb/resources/#testing
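Because DynamoDB Local supports the full API, switching between it and the service is usually just a matter of the endpoint. A minimal sketch, assuming the Python SDK boto3 (region_name and endpoint_url are real boto3.client parameters); the helper name and the default region are illustrative choices, and port 8000 is DynamoDB Local's default.

```python
def dynamodb_client_kwargs(local: bool, region: str = "us-east-1", port: int = 8000) -> dict[str, str]:
    """Keyword arguments for boto3.client("dynamodb", ...).

    With local=True, requests go to DynamoDB Local on this machine instead of
    the AWS endpoint for `region`.
    """
    kwargs = {"region_name": region}
    if local:
        # DynamoDB Local listens on port 8000 unless it is started with -port.
        kwargs["endpoint_url"] = f"http://localhost:{port}"
    return kwargs


# During disconnected development:
#   boto3.client("dynamodb", **dynamodb_client_kwargs(local=True))
print(dynamodb_client_kwargs(local=True))
```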
15. “Compared to similar products, DynamoDB
provides an amazing feature set, including super
low latencies, (literally) push-button scaling,
automatic data persistence, and seamless
integration with Redshift and other AWS services.”
Peter Bogunovich, RightAction Inc
17. Diagram: a visitor sends an ad request to the Ad Servers (EC2), which query the Profiles Database (DynamoDB) and return an ad URL
1. Visitor loads a web page
2. Web page issues a request to ad servers on EC2
3. Query to DynamoDB returns the ad to display
4. Link to the ad is returned to the visitor
DynamoDB tables:
• cookie: hash key = userid, range key = timestamp
• user-profile: hash key = userid
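The key schema above can be illustrated with an in-memory model. This is a sketch, not a DynamoDB client: dicts stand in for the cookie table (hash key userid, range key timestamp) and the user-profile table (hash key userid). The ad-selection rule in choose_ad, the page and interest attributes, and the ad paths are all hypothetical.

```python
from bisect import insort
from collections import defaultdict

# In-memory stand-ins for the two tables (hypothetical data, not a DynamoDB client):
#   cookie:       hash key = userid, range key = timestamp
#   user-profile: hash key = userid
cookie_table: dict[str, list[tuple[int, str]]] = defaultdict(list)
profile_table: dict[str, dict[str, str]] = {}


def put_cookie(userid: str, timestamp: int, page: str) -> None:
    # Keep the items for one hash key ordered by their range key.
    insort(cookie_table[userid], (timestamp, page))


def latest_cookies(userid: str, limit: int) -> list[tuple[int, str]]:
    # Comparable to a Query on the hash key with ScanIndexForward=False and Limit=limit.
    return list(reversed(cookie_table.get(userid, [])))[:limit]


def choose_ad(userid: str, default_ad: str = "ads/generic.png") -> str:
    """Hypothetical rule: show the ad matching the user's profile interest."""
    profile = profile_table.get(userid)
    if profile is None:
        return default_ad
    return f"ads/{profile['interest']}.png"


put_cookie("u1", 100, "/sports")
put_cookie("u1", 200, "/travel")
profile_table["u1"] = {"userid": "u1", "interest": "travel"}
print(latest_cookies("u1", 1))  # [(200, '/travel')]
print(choose_ad("u1"))          # ads/travel.png
```

The range key is what makes "most recent activity for this user" a single ordered read instead of a scan.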
18. Diagram: the ad-serving setup (Ad Servers on EC2, Profiles Database on DynamoDB) alongside a real-time bidding platform. A bid request reaches the Bidder, which uses DynamoDB (Ads, Profiles) and Queues and Buffer, and returns a bid response.
Real-time bidding timeline segments:
• Request network transit
• Response network transit
• Decision on best ad and bid price, based on optimization that needs multiple data look-ups
• Contingency time buffer
Durations shown: 20 ms, 20 ms, 20 ms, and 40 ms (100 ms in total)
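To reason about how many data look-ups the decision step can afford within that budget, assume for illustration that the 40 ms segment is the decision window (the slide text does not say which duration belongs to which segment), that each look-up takes 5 ms, and that the price calculation itself needs 10 ms. All three figures are hypothetical.

```python
SEGMENTS_MS = [20, 20, 20, 40]  # durations shown on the slide


def max_sequential_lookups(decision_ms: float, lookup_ms: float, compute_ms: float) -> int:
    """Number of back-to-back look-ups that fit in the decision window
    after reserving compute_ms for the bid calculation itself."""
    if lookup_ms <= 0:
        raise ValueError("lookup_ms must be positive")
    remaining = decision_ms - compute_ms
    if remaining <= 0:
        return 0
    return int(remaining // lookup_ms)


total_ms = sum(SEGMENTS_MS)
# Hypothetical: 40 ms decision window, 5 ms per look-up, 10 ms of computation.
lookups = max_sequential_lookups(decision_ms=40, lookup_ms=5, compute_ms=10)
print(total_ms, lookups)  # 100 6
```

Issuing independent look-ups in parallel (for example, batching keys into one DynamoDB BatchGetItem request) fits more data into the same window than the sequential count suggests.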
19. Diagram: the ad-serving flow (visitor, Ad Servers on EC2, Profiles Database on DynamoDB), with CloudFront delivering advertisements and Amazon S3 holding static repository files and impression logs
1. Ad files are downloaded from CloudFront
2. Impressions are captured in logs written to S3
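Once impression logs land in S3, a first processing step is usually counting requests per ad file. A minimal sketch, treating each successful (HTTP 200) request for an ad file as one impression, which is a simplifying assumption. The parser reads column positions from the `#Fields:` header of a CloudFront-style, tab-separated access log; the sample log lines, edge locations, and ad paths are hypothetical and abbreviated.

```python
from collections import Counter
from typing import Iterable


def count_impressions(log_lines: Iterable[str]) -> Counter:
    """Count successful (HTTP 200) requests per object path in a CloudFront-style access log.

    Column positions come from the '#Fields:' header instead of being hard-coded.
    """
    fields: list[str] = []
    counts: Counter = Counter()
    for raw in log_lines:
        line = raw.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other header lines such as '#Version:'
        row = dict(zip(fields, line.split("\t")))
        if row.get("sc-status") == "200":
            counts[row.get("cs-uri-stem", "")] += 1
    return counts


# Hypothetical, abbreviated log (real CloudFront logs carry more fields per line).
SAMPLE_LOG = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status",
    "2013-11-14\t10:00:01\tIAD12\t5120\t192.0.2.10\tGET\td111111abcdef8.cloudfront.net\t/ads/travel.png\t200",
    "2013-11-14\t10:00:02\tIAD12\t5120\t192.0.2.11\tGET\td111111abcdef8.cloudfront.net\t/ads/travel.png\t200",
    "2013-11-14\t10:00:03\tSFO5\t4096\t192.0.2.12\tGET\td111111abcdef8.cloudfront.net\t/ads/sports.png\t200",
    "2013-11-14\t10:00:04\tSFO5\t0\t192.0.2.13\tGET\td111111abcdef8.cloudfront.net\t/ads/sports.png\t304",
]

print(count_impressions(SAMPLE_LOG))
```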
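20. Diagram: Ad Servers on EC2 (MAZ: multiple Availability Zones) behind Elastic Load Balancing, receiving ad requests from visitors and returning ad URLs; Profiles Database on DynamoDB; CloudFront serving advertisements, with Amazon S3 holding static repository files and impression logs; Click-through Servers behind a second Elastic Load Balancing layer, handling click-through requests and producing click-through log files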
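A click-through server records the click and then redirects the visitor to the advertiser. A minimal sketch using only the standard library; the query parameters `ad` and `user`, the landing-page table, and the JSON-lines log format are hypothetical, and in the architecture above many such servers would sit behind Elastic Load Balancing.

```python
import json
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical mapping from ad id to the advertiser's landing page.
LANDING_PAGES = {"ad-42": "https://advertiser.example.com/offer"}


def handle_click(request_url: str, click_log: list[str], now: Optional[float] = None) -> tuple[int, str]:
    """Record a click-through, then redirect the visitor to the advertiser.

    Returns (HTTP status, Location header value).
    """
    query = parse_qs(urlparse(request_url).query)
    ad_id = query.get("ad", [""])[0]
    user_id = query.get("user", [""])[0]
    target = LANDING_PAGES.get(ad_id)
    if target is None:
        return 404, ""
    # Append one JSON line per click to the click-through log.
    click_log.append(json.dumps({
        "ts": time.time() if now is None else now,
        "ad": ad_id,
        "user": user_id,
    }))
    return 302, target


log: list[str] = []
print(handle_click("http://clicks.example.com/c?ad=ad-42&user=u1", log, now=0.0))
```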
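23. Redshift dramatically reduces I/O
• Direct-attached storage
• Large data block sizes
• Columnar storage
• Data compression
• Zone maps
Example table:
Id    Age   State
123   20    CA
345   25    WA
678   40    FL
Row storage keeps each record’s values together (123, 20, CA | 345, 25, WA | 678, 40, FL); column storage keeps each column’s values together (123, 345, 678 | 20, 25, 40 | CA, WA, FL).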
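Two of these techniques can be shown in a few lines of plain Python: a columnar layout lets a query on Age read only the Age values, and a zone map (the minimum and maximum of each block) lets a range query skip blocks that cannot match. This is an illustration of the ideas, not Redshift's on-disk format; the block size and the sorted age column in the second example are hypothetical.

```python
rows = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]

# Row storage: each record's values sit together.
row_store = [value for row in rows for value in row]

# Column storage: each column's values sit together.
column_store = {
    "id": [r[0] for r in rows],
    "age": [r[1] for r in rows],
    "state": [r[2] for r in rows],
}

# Averaging Age touches 3 values in the column store but all 9 in the row store.
avg_age = sum(column_store["age"]) / len(column_store["age"])


def build_zone_map(values: list[int], block_size: int) -> list[tuple[int, int, list[int]]]:
    """Split a column into blocks and record each block's min and max."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b), b) for b in blocks]


def count_in_range(zone_map: list[tuple[int, int, list[int]]], low: int, high: int) -> tuple[int, int]:
    """Count values in [low, high]; also report how many blocks had to be read."""
    matches = blocks_read = 0
    for block_min, block_max, block in zone_map:
        if block_max < low or block_min > high:
            continue  # min/max prove no value in this block can match: skip it
        blocks_read += 1
        matches += sum(low <= v <= high for v in block)
    return matches, blocks_read


# Hypothetical sorted column (ages 18..65) stored in blocks of 8 values.
ages = list(range(18, 66))
print(count_in_range(build_zone_map(ages, 8), 30, 35))  # (6, 2): 2 of 6 blocks read
```

Zone maps skip the most when the column is sorted, which is one reason sort key choice matters.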
28. Redshift works with existing BI tools
BI tools connect to Amazon Redshift via JDBC/ODBC
More coming soon…
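Connecting a BI tool mostly comes down to a connection string. Amazon Redshift listens on port 5439 by default, and because it speaks the PostgreSQL protocol, PostgreSQL (libpq-based) clients can connect as well. A sketch that builds both forms; the cluster endpoint, database, and user names are hypothetical.

```python
def redshift_jdbc_url(host: str, database: str, port: int = 5439) -> str:
    """JDBC URL in the jdbc:redshift://host:port/database form."""
    return f"jdbc:redshift://{host}:{port}/{database}"


def libpq_dsn(host: str, database: str, user: str, port: int = 5439) -> str:
    """Keyword/value connection string for PostgreSQL (libpq-based) clients."""
    return f"host={host} port={port} dbname={database} user={user}"


HOST = "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com"  # hypothetical endpoint
print(redshift_jdbc_url(HOST, "dev"))
print(libpq_dsn(HOST, "dev", "analyst"))
```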
29. Redshift is Priced to Analyze All Your Data
$0.85 per hour on demand (2 TB node)
$999 per TB per year (3-yr reservation)
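The two price points can be compared per TB per year, assuming the on-demand node runs all 8,760 hours of a year and holds 2 TB: $0.85 × 8,760 / 2 ≈ $3,723 per TB per year, roughly 3.7 times the reserved price.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def on_demand_per_tb_year(price_per_hour: float, tb_per_node: float) -> float:
    """Annual on-demand cost of one node, divided by the node's storage."""
    return price_per_hour * HOURS_PER_YEAR / tb_per_node


on_demand = on_demand_per_tb_year(0.85, 2.0)  # prices from the slide
reserved = 999.0                               # per TB per year, 3-year reservation
print(round(on_demand))                # 3723
print(round(on_demand / reserved, 1))  # 3.7
```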
30. “Amazon Redshift introduces a major
opportunity to improve the performance of
our real-time reporting, allowing us to run
queries up to 50 times faster than our current
OLAP solution.” – Niek Sanders, VP Engineering
Realized a 20x to 40x reduction in query times
“Redshift is the real deal”
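32. Diagram: the full architecture. Visitors’ ad requests pass through Elastic Load Balancing to Ad Servers on EC2 (MAZ), which use the Profiles Database (DynamoDB) and return ad URLs. CloudFront serves advertisements, with Amazon S3 holding static repository files and impression logs. Click-through Servers behind Elastic Load Balancing handle click-through requests and write click-through log files. Amazon Redshift is loaded via ETL, with flows labeled bid history and user history; Amazon EMR is connected by flows labeled impressions, new requests, user history, and updated profiles.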
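The EMR flows (impressions and user history in, updated profiles out) describe an enrichment job. The single-machine sketch below stands in for such a job: it merges new impression and click counts into profile items and recomputes click-through rate. The input shapes and the profile attributes impressions, clicks, and ctr are hypothetical; on EMR the same counting would be distributed across the cluster, and the resulting items could be written back to the DynamoDB profiles table.

```python
from collections import defaultdict
from typing import Any


def enrich_profiles(
    impressions: list[tuple[str, str]],
    clicks: list[tuple[str, str]],
    profiles: dict[str, dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Merge new impression and click counts into user profiles and recompute CTR.

    impressions and clicks are (userid, ad_id) pairs; profiles maps userid to an item.
    Only users seen in the new data are returned.
    """
    new_counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [impressions, clicks]
    for user_id, _ad in impressions:
        new_counts[user_id][0] += 1
    for user_id, _ad in clicks:
        new_counts[user_id][1] += 1

    updated: dict[str, dict[str, Any]] = {}
    for user_id, (imps, clks) in new_counts.items():
        item = dict(profiles.get(user_id, {"userid": user_id}))  # copy; never mutate input
        item["impressions"] = item.get("impressions", 0) + imps
        item["clicks"] = item.get("clicks", 0) + clks
        item["ctr"] = item["clicks"] / item["impressions"] if item["impressions"] else 0.0
        updated[user_id] = item
    return updated


result = enrich_profiles(
    impressions=[("u1", "ad-1"), ("u1", "ad-2"), ("u2", "ad-1")],
    clicks=[("u1", "ad-1")],
    profiles={"u1": {"userid": "u1", "interest": "travel"}},
)
print(result["u1"])
```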
33. Optimizing with Redshift
Bid Optimization: drive qualified users to advertisers’ sites
• Ad server logs
• 3rd party data
• Bid history
• User history
Cost Optimization: optimize return on advertising expenditure
• Impressions
• 3rd party data
• User history
• Enrichment
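Cost optimization comes down to SQL aggregations such as return on advertising spend (revenue divided by spend) per campaign. As a local stand-in for a Redshift cluster, the sketch below runs such a query with Python's built-in sqlite3. The schema and numbers are hypothetical; the query uses only standard constructs (LEFT JOIN, a grouped subquery, COALESCE) that Redshift also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE impressions (campaign TEXT, user_id TEXT, cost REAL);
    CREATE TABLE conversions (campaign TEXT, user_id TEXT, revenue REAL);
    INSERT INTO impressions VALUES ('travel', 'u1', 2.0), ('travel', 'u2', 2.0), ('sports', 'u3', 4.0);
    INSERT INTO conversions VALUES ('travel', 'u1', 10.0);
""")

# Spend and revenue per campaign; aggregating conversions in a subquery first
# keeps revenue from being counted once per matching impression row.
ROAS_SQL = """
    SELECT i.campaign,
           SUM(i.cost) AS spend,
           COALESCE(c.revenue, 0) AS revenue
    FROM impressions AS i
    LEFT JOIN (
        SELECT campaign, SUM(revenue) AS revenue
        FROM conversions
        GROUP BY campaign
    ) AS c ON i.campaign = c.campaign
    GROUP BY i.campaign, c.revenue
    ORDER BY i.campaign
"""

roas = {campaign: revenue / spend for campaign, spend, revenue in conn.execute(ROAS_SQL)}
print(roas)  # {'sports': 0.0, 'travel': 2.5}
```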
34. Three steps to optimal data performance
1. Describe the full lifecycle of data
Identify data consumption patterns, expected data volumes, and SLAs (latency, availability, durability) at each point on the timeline
2. Leverage specialized options
DynamoDB – real-time transaction processing
Redshift – online reporting and analysis
EMR – enrichment
S3 – data staging
35. 3. Optimize access patterns
Design database schemas for maximum efficiency
DynamoDB
» minimize payloads
» separate hot data from cold
Redshift
» choose distribution and sort keys carefully; test alternatives as needed
» efficient ingestion (from DynamoDB and S3)
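On the Redshift side, key selection and ingestion are both expressed in SQL. A sketch that generates the statements: DISTKEY, SORTKEY, COPY, IAM_ROLE, GZIP, DELIMITER, and READRATIO are Redshift syntax, while the table and column names, the S3 bucket, the role ARN, the key choices, and the 25% read ratio are hypothetical. Distributing on user_id co-locates each user's rows for joins on that column, and sorting on ts lets zone maps skip blocks outside a time range.

```python
def create_table_sql(table: str, columns: list[tuple[str, str]], distkey: str, sortkey: str) -> str:
    """CREATE TABLE with a distribution key and a sort key."""
    cols = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {table} (\n    {cols}\n)\nDISTKEY ({distkey})\nSORTKEY ({sortkey});"


def copy_from_dynamodb_sql(table: str, dynamodb_table: str, iam_role: str, read_ratio: int) -> str:
    """COPY from a DynamoDB table; READRATIO caps the percentage of the
    table's provisioned read capacity that the load may consume."""
    return (f"COPY {table} FROM 'dynamodb://{dynamodb_table}' "
            f"IAM_ROLE '{iam_role}' READRATIO {read_ratio};")


def copy_from_s3_sql(table: str, s3_prefix: str, iam_role: str) -> str:
    """COPY gzip-compressed, tab-delimited files under an S3 prefix."""
    return (f"COPY {table} FROM '{s3_prefix}' "
            f"IAM_ROLE '{iam_role}' GZIP DELIMITER '\\t';")


ROLE = "arn:aws:iam::123456789012:role/RedshiftLoadRole"  # hypothetical role ARN
ddl = create_table_sql("impressions",
                       [("user_id", "VARCHAR(64)"), ("ad_id", "VARCHAR(64)"), ("ts", "TIMESTAMP")],
                       distkey="user_id", sortkey="ts")
print(ddl)
print(copy_from_s3_sql("impressions", "s3://example-ad-logs/impressions/", ROLE))
print(copy_from_dynamodb_sql("user_profiles", "user-profile", ROLE, read_ratio=25))
```

COPY from DynamoDB maps item attribute names to the columns of an existing Redshift table, and a low READRATIO leaves most of the table's provisioned read capacity for the production workload.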
36. Resources
DynamoDB
• Best Practices, How-Tos, and Tools: http://aws.amazon.com/dynamodb/resources/
• Download DynamoDB Local: http://aws.amazon.com/dynamodb/resources/#testing
Redshift
• Best practices for loading data: http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
• Best practices for designing tables: http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html