3. Problems with this approach
[Diagram: Client → Application → Relational database]
• It doesn't scale
• Management is hard
• High cost
• Low performance
• Migration is difficult
4. Why do we get these problems?
When all you have is a hammer, everything looks like a nail
[Diagram: Client → Application → Relational database]
6. AWS service and use case mapping
Data type    AWS service
Search       Amazon CloudSearch
NoSQL        DynamoDB
SQL          Amazon RDS
DWH          Amazon Redshift
Cache        ElastiCache
Hadoop       Amazon EMR
Blob store   Amazon S3
ETL          AWS Data Pipeline
8. Social gaming
[Diagram components: Mobile client, Elastic Load Balancer, Auto Scaling, DynamoDB, Amazon S3 (log files), Amazon Elastic MapReduce, with callouts 1 to 3]
Social games generate a large volume of transactions, all of which require
high performance and extreme scalability.
① Player data is stored in Amazon DynamoDB, which scales in both data volume and performance.
② Long-term usage log files are sent in parallel to S3 for unlimited, cheap storage.
③ Big data analytics run in EMR, which integrates easily with both DynamoDB and S3.
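The flow in the notes above, where each game event updates player data and is logged in parallel, might be sketched as follows. This is only an illustration under stated assumptions: the in-memory `player_table` and `usage_log_bucket` are hypothetical stand-ins for a DynamoDB table and an S3 bucket, and no AWS API is called.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins; a real system would call DynamoDB and S3.
player_table = {}        # stand-in for a DynamoDB table keyed by player id
usage_log_bucket = []    # stand-in for an S3 bucket of log records


def save_player_state(player_id, state):
    """Store the latest player state (the DynamoDB role in the slide)."""
    player_table[player_id] = state


def append_usage_log(event):
    """Append a usage log record (the S3 role in the slide)."""
    usage_log_bucket.append(event)


def handle_game_event(player_id, state, event):
    """Run both writes in parallel and wait until both have finished."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(save_player_state, player_id, state),
            pool.submit(append_usage_log, event),
        ]
        for future in futures:
            future.result()  # re-raises any exception from the worker
```

The log store is append-only, so analytics (the EMR role) can later read it in bulk without touching the latency-sensitive player table.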
9. E-commerce site
[Diagram components: End users, Auto Scaling, ElastiCache, RDS (Master), RDS (Slave), Amazon CloudSearch, Amazon DynamoDB, with callouts 1 to 4]
An e-commerce site needs high availability, search performance, and the
flexibility to change data structures rapidly to fit new business requirements.
① For high-performance, low-latency responses, check the cache in ElastiCache first.
② Order and customer information is stored in a traditional but fault-tolerant RDS.
③ Item metadata, such as color and title, is stored in DynamoDB for a very flexible data schema.
④ For scalable search, metadata is indexed into CloudSearch, which handles full-text search easily.
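The cache-first read described in the first note above is commonly implemented as a cache-aside (lazy loading) lookup. The sketch below is a minimal illustration, not the deck's implementation: `CacheAsideReader` is a hypothetical name, a plain dict stands in for ElastiCache, and the `load_from_db` callable stands in for a query against RDS.

```python
from typing import Callable, Dict, Optional


class CacheAsideReader:
    """Read through a cache first and fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], Optional[dict]]):
        self._cache: Dict[str, dict] = {}   # stand-in for ElastiCache
        self._load_from_db = load_from_db   # stand-in for an RDS query
        self.db_reads = 0                   # counts how often the database is hit

    def get(self, key: str) -> Optional[dict]:
        # 1. Try the cache first for a low-latency answer.
        if key in self._cache:
            return self._cache[key]
        # 2. On a miss, read from the database.
        self.db_reads += 1
        value = self._load_from_db(key)
        # 3. Populate the cache so the next read is served from memory.
        if value is not None:
            self._cache[key] = value
        return value
```

A real deployment would also need an expiry or invalidation policy so that cached orders do not go stale; that is left out here for brevity.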
10. How do I know which service to pick?
The “data temperature” method
11. What is “data temperature”?
Data
http://www.amazon.co.jp/dp/B0016V9FCQ
12. Data temperature
              Hot        Warm     Cold
Volume        MB–GB      GB–TB    PB
Item size     B–KB       KB–MB    KB–TB
Latency       ms         ms–s     min–hr
Durability    Low–high   High     Very high
Request rate  Very high  High     Low
Cost/GB       $$–$       $–¢¢     ¢
The temperature of the data will vary depending on its format and use.
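As a rough illustration of how the table could be applied, the sketch below classifies data by its required latency alone. The numeric boundaries are assumptions chosen for the example (the table gives only orders of magnitude), and `classify_by_latency` is a hypothetical helper, not part of any AWS tool.

```python
# Illustrative boundaries only; the slide states "ms", "ms-s", and "min-hr".
HOT_MAX_LATENCY_S = 0.01    # assumption: up to 10 ms counts as "ms"
WARM_MAX_LATENCY_S = 10.0   # assumption: up to 10 s counts as "ms-s"


def classify_by_latency(required_latency_s: float) -> str:
    """Return 'hot', 'warm', or 'cold' for a required access latency in seconds."""
    if required_latency_s <= 0:
        raise ValueError("latency must be positive")
    if required_latency_s <= HOT_MAX_LATENCY_S:
        return "hot"
    if required_latency_s <= WARM_MAX_LATENCY_S:
        return "warm"
    return "cold"
```

In practice the other rows (volume, item size, request rate, cost) would be weighed as well; latency is used here only because it is the easiest row to express as a single number.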
13. The AWS service heat map
[Figure: heat map placing Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon S3, Amazon Redshift, and Amazon EMR along four axes (data volume, latency, cost/GB, request rate), each marked Low and High]
14. How do I know which service to pick?
The cost estimation method
15. Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
• “I’m currently scoping out a project that will greatly
increase my team’s use of Amazon S3. Hoping you
could answer some questions. The current iteration of
the design calls for many small files, perhaps up to a
billion during peak. The total size would be on the
order of 1.5 TB per month…”

Request rate (writes/s)  Object size (bytes)  Total size (GB/month)  Objects per month
300                      2048                 1483                   777,600,000
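The figures in the table can be reproduced from the request rate and object size. The sketch below assumes a 30-day month and binary gigabytes (2^30 bytes); both are assumptions, chosen because they match the slide's numbers.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30      # assumption: reproduces 777,600,000 objects
BYTES_PER_GB = 2 ** 30   # assumption: binary GB reproduces 1483 GB


def monthly_objects(writes_per_second: int) -> int:
    """Number of objects written in one month at a constant write rate."""
    return writes_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH


def monthly_size_gb(writes_per_second: int, object_bytes: int) -> float:
    """Total data written in one month, in GB."""
    return monthly_objects(writes_per_second) * object_bytes / BYTES_PER_GB


print(monthly_objects(300))               # 777600000
print(int(monthly_size_gb(300, 2048)))    # 1483
```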
16. Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
• Time for …
* http://calculator.s3.amazonaws.com/index.html?lng=ja_JP
17. Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
Request rate (writes/s)  Object size (bytes)  Total size (GB/month)  Objects per month
300                      2048                 1483                   777,600,000
DynamoDB
Monthly cost: $669.56
Amazon S3
Monthly cost: $4325.33
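To make the two monthly totals easier to compare, they can be normalized per million objects. The sketch below uses only the figures shown on the slide; it does not model AWS pricing, and the variable names are illustrative.

```python
OBJECTS_PER_MONTH = 777_600_000

# Monthly totals as quoted on the slide (USD).
monthly_cost = {"DynamoDB": 669.56, "Amazon S3": 4325.33}

# Cost per million objects written, for each service.
cost_per_million = {
    service: cost / (OBJECTS_PER_MONTH / 1_000_000)
    for service, cost in monthly_cost.items()
}

# How many times more expensive S3 is than DynamoDB for this workload.
s3_to_dynamodb_ratio = monthly_cost["Amazon S3"] / monthly_cost["DynamoDB"]

for service, cost in cost_per_million.items():
    print(f"{service}: ${cost:.2f} per million objects")
print(f"S3 / DynamoDB cost ratio: {s3_to_dynamodb_ratio:.2f}")
```

For this workload of many small objects, the slide's totals put S3 at roughly six and a half times the DynamoDB cost.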
18. Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
            Request rate (writes/s)  Object size (bytes)  Total size (GB/month)  Objects per month  Winner
Scenario 1  300                      2048                 1483                   777,600,000        DynamoDB
Scenario 2  300                      32,768               23,730                 777,600,000        Amazon S3
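One likely factor behind the flip between the two scenarios is how write throughput is metered. DynamoDB provisioned write capacity is counted in units of one write per second for an item of up to 1 KB, so larger items consume more units, whereas S3 PUT requests are charged per request regardless of object size. The sketch below works through the capacity arithmetic only; treating this as the explanation for the slide's result is an assumption, and no prices are modeled.

```python
import math

WRITE_UNIT_BYTES = 1024  # one write capacity unit covers an item up to 1 KB


def required_write_capacity_units(writes_per_second: int, item_bytes: int) -> int:
    """Provisioned write capacity units needed for a steady write rate."""
    units_per_write = math.ceil(item_bytes / WRITE_UNIT_BYTES)
    return writes_per_second * units_per_write


print(required_write_capacity_units(300, 2048))    # 600
print(required_write_capacity_units(300, 32768))   # 9600
```

The request count is identical in both scenarios, so the per-request side of the S3 bill stays flat while the DynamoDB write capacity grows sixteenfold.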
20. Summary
• The era of relational-database-only, on-premises
architecture is over.
• Performance, reliability, and scalability can
all be improved by the cloud, but choosing the
right architecture is a must.
• There are several ways to choose the right
service for the job:
– Use the “data temperature” and use case
– Use the reverse cost estimate method
– Ask AWS sales
21. When in doubt, contact us
https://aws.amazon.com/jp/contact-us/
23. Amazon RDS
A fully managed relational database service
• Create and scale with a few clicks
• Automated backups every 5 minutes for DR
• Manual snapshot feature
• Automated security patching
• 4 supported engines
• Monitoring and automatic recovery
[Diagram: Master in Availability Zone A and Slave in Availability Zone B, with data synchronization, automatic failover, and automated backup]
24. Amazon RDS
A fully managed relational database service
When to use
• Transactions
• Complex queries
• Medium to high query/write rate
– Up to 30 K IOPS (15 K reads + 15 K writes)
• 100s of GB to low TBs
• Workload can fit in a single node
• High durability
When not to use
• Massive read/write rates
– Example: 150 K write requests per second
• Data size or throughput demands sharding
– Example: 10s or 100s of terabytes
• Simple Get/Put and queries that a NoSQL store can handle
• Complex analytics
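The thresholds in the lists above can be turned into a simple screening check. The sketch below is illustrative only: the 30 K IOPS ceiling comes from the slide, while the 10 TB cutoff is an assumption that reads "low TBs" as anything below the "10s of terabytes" given as a counterexample. `Workload` and `rds_concerns` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

RDS_MAX_IOPS = 30_000     # from the slide: up to 30 K IOPS
RDS_MAX_SIZE_TB = 10.0    # assumption: "low TBs" means below 10 TB


@dataclass
class Workload:
    peak_iops: int
    data_size_tb: float


def rds_concerns(workload: Workload) -> List[str]:
    """Return the reasons a workload may not fit a single RDS node."""
    concerns = []
    if workload.peak_iops > RDS_MAX_IOPS:
        concerns.append("request rate exceeds the single-node IOPS range")
    if workload.data_size_tb >= RDS_MAX_SIZE_TB:
        concerns.append("data size would demand sharding")
    return concerns
```

An empty list means none of the slide's size or throughput warnings apply; it does not cover the qualitative criteria such as the need for transactions or complex analytics.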
25. DynamoDB
Fully managed NoSQL service
• Easy administration and high availability
– No SPOF
– Data is replicated across 3 availability zones
– Storage scales, and data is automatically partitioned
• No limit on storage
– Only pay for the storage you use
– No need to add nodes or disks as storage grows
[Diagram labels: Client, Region]
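Automatic partitioning generally means that each item's key is hashed to decide which partition stores it, so data spreads out without manual sharding. The sketch below illustrates that general idea only; it is not DynamoDB's actual algorithm, and `partition_for` is a hypothetical helper.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition index using a stable hash of the key."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % partition_count


# The same key always maps to the same partition.
for player_id in ("player-1", "player-2", "player-3"):
    print(player_id, "->", partition_for(player_id, 4))
```

A simple modulo scheme like this moves most keys when the partition count changes; managed services use more elaborate schemes so that storage can grow without that cost.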
26. DynamoDB
Fully managed NoSQL service
When to use
• Fast and predictable performance
• Seamless/massive scale
• Autosharding
• Consistent/low latency
• No size or throughput limits
• Very high durability
• Key-value or simple queries
When not to use
• Need multi-item/row or cross-table transactions
• Need complex queries, joins
• Need real-time analytics on historic data
• Storing cold data
27. Amazon Redshift
Fully managed data warehouse service
• DWH as a Service: Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service
• Scalable: 160 GB to petabytes
• Fast: Amazon Redshift has a massively parallel processing (MPP) architecture, parallelizing and distributing SQL operations to take advantage of all available resources.
• Low cost: No initial cost, no license fees, and only pay for what you use.
[Diagram: BI tools connect over JDBC/ODBC to the leader node, the SQL endpoint that parallelizes queries and creates results; the leader node and the compute nodes are linked by a 10GigE mesh, more nodes can be added, and the cluster integrates with S3, DynamoDB, and EMR]
28. Amazon Redshift
Fully managed data warehouse service
When to use
• Information analysis and reporting
• Complex DW queries that summarize historical data
• Batched large updates, e.g. daily sales totals
• 10s of concurrent queries
• 100s of GB to PB
• Compression
• Column based
• Very high durability
When not to use
• OLTP workloads
– 1000s of concurrent users
– Large number of singleton updates
29. Amazon S3
Low-cost, highly reliable object storage service
[Diagram: on the user side, Files A, B, and C; on the infrastructure side, Datacenters A, B, and C]
• Designed not to lose data, with 99.999999999% durability
• Data automatically replicated
• Choose from over 9 regions globally
• Only put data, with no need to worry about scalability, infrastructure, volume expansion, etc.
• Only pay for what you use
Example: 1 GB/month costs about 3 yen
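To give the durability figure some intuition, the sketch below converts eleven nines (99.999999999%, the annual per-object durability S3 is designed for) into an expected loss rate. It uses exact fractions to avoid floating-point rounding; the function names are illustrative.

```python
from fractions import Fraction

# Designed annual durability per object: 99.999999999% (eleven nines).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)
ANNUAL_LOSS_PROBABILITY = 1 - DURABILITY   # exactly 1 in 10^11


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year for a given object count."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years before one object is expected to be lost."""
    return 1 / expected_annual_losses(object_count)


print(years_per_expected_loss(10_000_000))   # 10000
```

With ten million stored objects, the expected loss works out to a single object roughly once every 10,000 years.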
30. Amazon S3
Low-cost, highly reliable object storage service
When to use
• Store large objects
• Key-value store: Get/Put/List
• Unlimited storage
• Versioning
• Very high durability
– 99.999999999%
• Very high throughput (via parallel clients)
• Use for storing persistent data
– Backups
– Source/target for EMR
– Blob store with metadata in SQL or NoSQL
When not to use
• Complex queries
• Very low latency (ms)
• Search
• Read-after-write consistency for overwrites
• Need transactions