In this popular session, you will learn about the latest features and use cases for Amazon EBS, including best practices, an overview of newly introduced features, and brand-new re:Invent announcements. In particular, we will cover the expanded portfolio of volume types, including provisioned IOPS, cold storage, and throughput-optimized. This session will help database admins and application architects understand how to balance performance and cost for applications in big data analytics, data warehousing, and transactional and NoSQL databases.
5. What is Amazon EC2 instance store?
• Local to instance
• Non-persistent data store
• Data not replicated (by default)
• No snapshot support
• SSD or HDD
[Diagram: EC2 instances on a physical host with instance store]
12. What is EBS?
[Diagram: an EC2 instance in an Availability Zone of an AWS Region, with an EBS boot volume and two EBS data volumes]
• Volumes attach to one instance at a time
• Many volumes can attach to an instance
• Separate boot volume from data volumes
14. What is EBS?
EBS is designed for:
• 99.999% service availability
• 0.1% to 0.2% annual failure rate (AFR)
15. What is an EBS snapshot?
[Diagram: EBS volume and replica in an Availability Zone; EBS snapshot in Amazon S3, within the AWS Region]
16. How does an EBS snapshot work?
• Point-in-time backup of modified volume blocks
• Stored in S3, accessed via EBS APIs
• Subsequent snapshots are incremental
• Deleting a snapshot removes only the data exclusive to that snapshot
17. What can you do with a snapshot?
[Diagram: EBS volume, EBS snapshot, AMI, and EC2 instance in an Availability Zone of an AWS Region]
18. What can you do with a snapshot?
[Diagram: EBS volume and replica in one Availability Zone; EBS snapshot in Amazon S3; a second EBS volume and replica in another Availability Zone of the AWS Region]
19. What can you do with a snapshot?
[Diagram: EBS volume, replica, and EBS snapshot (Amazon S3) in one AWS Region; EBS snapshot, EBS volume, and replica in a second AWS Region]
20. What can you do with a snapshot?
Public datasets on AWS available as EBS snapshots:
• Genomic
• Census
• Global weather
• Transportation
https://aws.amazon.com/public-data-sets/
[Diagram: EBS volume and replica in an Availability Zone of an AWS Region]
21. What is an EBS-optimized instance?
[Diagram: EBS-optimized EC2 instance and EBS volume in an Availability Zone of an AWS Region]
22. What is an EBS-optimized instance?
[Diagram labels: c3.2xlarge EC2 instances; EBS; S3; Databases; Internet; ~ 125 MB/s; Shared]
23. What is an EBS-optimized instance?
[Diagram labels: c3.2xlarge EC2 instances; EBS; S3; Databases; Internet; ~ 125 MB/s; Shared]
24. What is an EBS-optimized instance?
More details:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html
• Dedicated network bandwidth for EBS I/O
• Enabled by default on c4, d2, m4, p2, and x1 instances
• Can be enabled at instance launch or on a running instance
• Not an option on some 10 Gbps instance types (c3.8xlarge, r3.8xlarge, i2.8xlarge)
25. What is EBS encryption?
Encryption
• Attach both encrypted and unencrypted volumes
• No volume performance impact
• Any current generation instance
• Supported by all EBS volume types
• Snapshots also encrypted
• No extra cost
• Boot and data volumes can be encrypted
31. EBS volume types: I/O Provisioned
General Purpose SSD
gp2
Throughput: 160 MB/s
Latency: Single-digit ms
Capacity: 1 GB to 16 TB
Baseline: 3 IOPS per GB up to 10,000
Burst: 3,000 IOPS (for volumes up to 1 TB)
Great for boot volumes, low-latency applications, and bursty databases
33. Burst bucket: General Purpose SSD (gp2)
• Max I/O credit per bucket is 5.4M
• Always accumulating 3 I/O credits per GiB per second
• You can spend up to 3,000 I/O credits per second (3,000 IOPS)
• Baseline performance = 3 IOPS per GiB, with a minimum of 100 IOPS
34. How long can I burst on gp2?
[Chart: minutes of burst (y-axis, 0 to 700) versus volume size in GB (x-axis, 1 to 950). Callouts: 43 min, 1 hour, 10 hours.]
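The chart callouts can be reproduced from the burst bucket figures (5.4M credits, 3,000 IOPS spend rate, 3 IOPS per GiB refill). The following is a minimal sketch, assuming a full bucket, a sustained 3,000 IOPS load, and a 100 IOPS baseline floor; the function names are illustrative, not part of any AWS API.

```python
import math

GP2_BUCKET_CREDITS = 5_400_000  # max I/O credits per bucket (from the slide)
GP2_BURST_IOPS = 3_000          # maximum spend rate
GP2_IOPS_PER_GIB = 3            # refill rate, which is also the baseline
GP2_MIN_BASELINE_IOPS = 100     # assumed floor on baseline performance


def gp2_baseline_iops(size_gib: float) -> float:
    """Baseline (and refill) rate for a gp2 volume of the given size."""
    return max(GP2_MIN_BASELINE_IOPS, GP2_IOPS_PER_GIB * size_gib)


def gp2_burst_minutes(size_gib: float) -> float:
    """Minutes a full bucket lasts at a sustained 3,000 IOPS."""
    net_drain = GP2_BURST_IOPS - gp2_baseline_iops(size_gib)
    if net_drain <= 0:
        return math.inf  # baseline already meets the burst rate
    return GP2_BUCKET_CREDITS / net_drain / 60


for size in (300, 500, 950):
    print(size, "GiB:", round(gp2_burst_minutes(size), 1), "minutes")
```

Under these assumptions, 300, 500, and 950 GiB volumes come out at roughly 43 minutes, 1 hour, and 10 hours.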
35. How do I monitor gp2 burst balance?
[Chart: VolumeWriteOps and BurstBalance for a 500 GB gp2 volume]
• 900,000 write I/Os over 5 min = 3,000 IOPS
• 450,000 write I/Os over 5 min = 1,500 IOPS
37. Choosing an EBS volume type
[Decision tree]
IOPS is more important:
• > 65,000 IOPS: i2
• ≤ 65,000 IOPS: Latency?
  • < 1 ms: i2
  • Single-digit ms: Which is more important? Cost: gp2. Performance: io1.
38. EBS volume types: I/O Provisioned
Provisioned IOPS SSD
io1
Baseline: 100 to 20,000 IOPS
Throughput: 320 MB/s
Latency: Single-digit ms
Capacity: 4 GB to 16 TB
Ideal for critical applications and databases with sustained IOPS
39. Scaling Provisioned IOPS SSD (io1)
[Chart: available provisioned IOPS (y-axis, up to 20,000) versus volume size in TB (x-axis, up to 16)]
• Maximum provisioned IOPS follows a maximum IOPS:GB ratio of 50:1
• The 20,000 IOPS maximum is reached at ~ 400 GB
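The 50:1 ratio and the 20,000 IOPS ceiling combine into a simple cap. This is a small sketch of that arithmetic, using the limits quoted on these slides; the function name is illustrative.

```python
IO1_MAX_IOPS = 20_000   # per-volume ceiling quoted on the slide
IO1_IOPS_PER_GB = 50    # maximum IOPS:GB ratio


def io1_max_provisioned_iops(size_gb: int) -> int:
    """Largest IOPS value that can be provisioned for a volume of this size."""
    return min(IO1_MAX_IOPS, IO1_IOPS_PER_GB * size_gb)


for size in (4, 100, 400, 16_000):
    print(size, "GB ->", io1_max_provisioned_iops(size), "IOPS")
```

Volumes of 400 GB and larger all reach the same ceiling, which is why the curve on the chart flattens at ~400 GB.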
40. Choosing an EBS volume type
[Decision tree]
IOPS is more important:
• > 65,000 IOPS: i2
• ≤ 65,000 IOPS: Latency?
  • < 1 ms: i2
  • Single-digit ms: Which is more important? Cost: gp2. Performance: io1.
Throughput?
41. Choosing an EBS volume type
[Decision tree]
IOPS is more important (small, random I/O):
• > 65,000 IOPS: i2
• ≤ 65,000 IOPS: Latency?
  • < 1 ms: i2
  • Single-digit ms: Which is more important? Cost: gp2. Performance: io1.
Throughput is more important (large, sequential I/O): Aggregate throughput?
• > 1,250 MB/s: d2
• ≤ 1,250 MB/s: Which is more important? Cost or performance. Performance: st1.
42. EBS volume types: Throughput Provisioned
Throughput
Optimized HDD
st1
Baseline: 40 MB/s per TB up to 500 MB/s
Capacity: 500 GB to 16 TB
Burst: 250 MB/s per TB up to 500 MB/s
Ideal for large-block, high-throughput sequential workloads
43. Throughput Optimized HDD – burst and base
[Chart: st1 burst and base throughput in MB/s (y-axis, 0 to 600) versus volume size in TB (x-axis, 0.5 to 16). Callout: 320 MB/s.]
44. Burst bucket: Throughput Optimized HDD (st1)
• Max I/O bucket credit is 1 TB of credit per TB in volume
• Always accumulating 40 MB/s per TB
• You can spend up to 250 MB/s per TB
• Baseline performance = 40 MB/s per TB
45. Burst bucket: example 8 TB st1 volume
• Up to 8 TB in I/O credit
• Always accumulating 320 MB/s
• You can spend up to 500 MB/s
• Baseline performance = 320 MB/s
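How long the 8 TB example can sustain its burst follows from the bucket size and the net drain rate. The sketch below assumes decimal units (1 TB = 1,000,000 MB), a full bucket, and continuous spending at the maximum rate; `burst_hours` is an illustrative name.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units


def burst_hours(bucket_tb: float, refill_mb_s: float, spend_mb_s: float) -> float:
    """Hours a full bucket lasts while spending faster than it refills."""
    net_drain = spend_mb_s - refill_mb_s
    if net_drain <= 0:
        return float("inf")  # refill keeps up, so the bucket never empties
    return bucket_tb * MB_PER_TB / net_drain / 3600


# 8 TB st1 volume: 8 TB bucket, 320 MB/s refill, 500 MB/s spend
print(round(burst_hours(8, 320, 500), 1), "hours")
```

Under these assumptions the 8 TB volume can burst at 500 MB/s for a little over 12 hours before dropping to its 320 MB/s baseline.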
46. Choosing an EBS volume type
[Decision tree]
IOPS is more important (small, random I/O):
• > 65,000 IOPS: i2
• ≤ 65,000 IOPS: Latency?
  • < 1 ms: i2
  • Single-digit ms: Which is more important? Cost: gp2. Performance: io1.
Throughput is more important (large, sequential I/O): Aggregate throughput?
• > 1,250 MB/s: d2
• ≤ 1,250 MB/s: Which is more important? Cost: sc1. Performance: st1.
47. EBS volume types: Throughput Provisioned
Cold HDD
sc1
Baseline: 12 MB/s per TB up to 192 MB/s
Capacity: 500 GB to 16 TB
Burst: 80 MB/s per TB up to 250 MB/s
Ideal for sequential throughput workloads, such as logging and backup
48. Cold HDD – burst and base
[Chart: sc1 burst and base throughput in MB/s (y-axis, 0 to 300) versus volume size in TB (x-axis, 0.5 to 16). Callout: 192 MB/s.]
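Both HDD charts plot the same kind of curve: a per-TB rate capped at a volume-wide maximum. The sketch below encodes the st1 and sc1 figures from these slides; the `HddProfile` type and `throughput` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HddProfile:
    base_mb_s_per_tb: int
    base_cap_mb_s: int
    burst_mb_s_per_tb: int
    burst_cap_mb_s: int


# Figures as quoted on the st1 and sc1 slides
ST1 = HddProfile(40, 500, 250, 500)
SC1 = HddProfile(12, 192, 80, 250)


def throughput(profile: HddProfile, size_tb: float) -> tuple:
    """Return (baseline MB/s, burst MB/s) for a volume of the given size."""
    base = min(profile.base_mb_s_per_tb * size_tb, profile.base_cap_mb_s)
    burst = min(profile.burst_mb_s_per_tb * size_tb, profile.burst_cap_mb_s)
    return base, burst


print("st1  8 TB:", throughput(ST1, 8))
print("sc1 16 TB:", throughput(SC1, 16))
```

The 320 MB/s and 192 MB/s callouts on the charts correspond to the st1 baseline at 8 TB and the sc1 baseline at 16 TB.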
49. Burst bucket: Cold HDD (sc1)
• Max I/O bucket credit is 1 TB of credit per TB in volume
• Always accumulating 12 MB/s per TB
• You can spend up to 80 MB/s per TB
• Baseline performance = 12 MB/s per TB
50. Choosing an EBS volume type
[Decision tree]
IOPS is more important (small, random I/O):
• > 65,000 IOPS: i2
• ≤ 65,000 IOPS: Latency?
  • < 1 ms: i2
  • Single-digit ms: Which is more important? Cost: gp2. Performance: io1.
Throughput is more important (large, sequential I/O): Aggregate throughput?
• > 1,250 MB/s: d2
• ≤ 1,250 MB/s: Which is more important? Cost: sc1. Performance: st1.
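The decision tree on this slide can be read as a small function. This is one possible encoding, assuming the branches are IOPS versus throughput first, then the 65,000 IOPS, sub-millisecond latency, and 1,250 MB/s thresholds, then cost versus performance; the function and parameter names are illustrative.

```python
def choose_storage(priority: str, *, iops: int = 0, sub_ms_latency: bool = False,
                   throughput_mb_s: int = 0, favor_cost: bool = True) -> str:
    """Walk the slide's decision tree and return a volume or instance family."""
    if priority == "iops":  # small, random I/O
        if iops > 65_000 or sub_ms_latency:
            return "i2"  # instance store rather than EBS
        return "gp2" if favor_cost else "io1"
    if priority == "throughput":  # large, sequential I/O
        if throughput_mb_s > 1_250:
            return "d2"  # instance store rather than EBS
        return "sc1" if favor_cost else "st1"
    raise ValueError("priority must be 'iops' or 'throughput'")


print(choose_storage("iops", iops=10_000, favor_cost=False))
print(choose_storage("throughput", throughput_mb_s=400))
```

Real sizing decisions involve more inputs than this (instance limits, burst behavior, price), so treat the function as a reading aid for the diagram rather than a sizing tool.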
51. Pricing: I/O Provisioned Volumes and Throughput Provisioned Volumes
I/O Provisioned Volumes
• gp2: $0.10 per GB
• io1: $0.125 per GB, plus $0.065 per PIOPS
Throughput Provisioned Volumes
• st1: $0.045 per GB
• sc1: $0.025 per GB
* All prices are per month, and from the us-west-2 Region as of April 2016
Snapshot storage for all volume types is $0.05 per GB per month
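To compare volume types on cost, the per-GB and per-PIOPS prices can be combined into a monthly estimate. The sketch below uses only the April 2016 us-west-2 prices from this slide, which will not match current pricing; the function name is illustrative.

```python
# Prices per month, us-west-2, April 2016 (from the slide; not current pricing)
PRICE_PER_GB = {"gp2": 0.10, "io1": 0.125, "st1": 0.045, "sc1": 0.025}
PRICE_PER_PIOPS = 0.065
SNAPSHOT_PRICE_PER_GB = 0.05


def monthly_volume_cost(volume_type: str, size_gb: float,
                        provisioned_iops: int = 0) -> float:
    """Monthly storage cost in USD; provisioned IOPS are billed only for io1."""
    cost = PRICE_PER_GB[volume_type] * size_gb
    if volume_type == "io1":
        cost += PRICE_PER_PIOPS * provisioned_iops
    return cost


for vt in ("gp2", "st1", "sc1"):
    print(vt, round(monthly_volume_cost(vt, 1000), 2))
print("io1", round(monthly_volume_cost("io1", 1000, provisioned_iops=10_000), 2))
```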
52. Hybrid volume use cases
Case Study (STG205): Librato’s Experience Running Cassandra Using Amazon EBS
[Diagram labels: c4, i2, gp2, st1, Data files, Commit log]
53. Hybrid volume use cases
Case Study (STG311): How Videology and Zendesk Modernized Their Big Data Platforms on Amazon EBS
Tiered Elasticsearch data:
• Hot data, 0–7 days: gp2
• Warm data, 8–30 days: st1
• Cold data, 31–60 days: sc1
54. Hybrid volume use cases
Case Study:
Info: https://aws.amazon.com/solutions/case-studies/infor-ebs/
“We’ve seen much stronger performance for our database backup workloads with the Amazon EBS ST1 volumes, and we’re also saving 75 percent on our monthly backup costs.”
Randy Young, Director of Cloud Operations, Infor
[Diagram: SQL Server database on i2; transaction logs, full backups, and partial backups on st1 volumes; EBS snapshots]
55. Hybrid volume use cases
Example: Amazon EMR, Apache Hadoop
[Diagram: EMR cluster instance with gp2 and st1 or sc1 volumes; Frameworks on YARN; HDFS]
• Random, small I/O
• Shuffle, spill, and temp operations
• Large, sequential I/O
• Multiple volumes for more parallelism
59. How do we count I/Os for GP2 and IO1?
When possible, we merge sequential I/Os (up to 256 KB in size) to minimize I/O charges on IO1 and maximize burst on GP2
60. How do we count I/Os for GP2 and IO1?
Example 1: Random I/Os
• 4 random I/Os (i.e., non-sequential I/Os)
• Each I/O 64 KB
• Counted as 4 I/Os (each counted I/O is up to 256 KB)
61. How do we count I/Os for GP2 and IO1?
Example 2: Sequential I/O
• 4 sequential I/Os
• Each I/O 64 KB
• Counted as 1 I/O (each counted I/O is up to 256 KB)
62. How do we count I/Os for GP2 and IO1?
Example 3: Large I/O
• 1 I/O
• 1024 KB
• Counted as 4 I/Os (each counted I/O is up to 256 KB)
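The three examples follow one rule: contiguous requests are merged, and each merged run is then counted in units of up to 256 KB. The following is a simplified model of that rule, not the actual EBS implementation; requests are hypothetical `(offset_kb, size_kb)` pairs in submission order.

```python
import math


def count_ios(requests, max_io_kb=256):
    """Count I/Os after merging contiguous requests into runs.

    Each run is then split into chunks of at most max_io_kb.
    """
    counted = 0
    run_kb = 0       # size of the contiguous run being accumulated
    run_end = None   # offset where the current run ends
    for offset_kb, size_kb in requests:
        if run_end is not None and offset_kb == run_end:
            run_kb += size_kb  # sequential: extend the current run
        else:
            counted += math.ceil(run_kb / max_io_kb)  # close previous run
            run_kb = size_kb
        run_end = offset_kb + size_kb
    return counted + math.ceil(run_kb / max_io_kb)


random_ios = [(0, 64), (1000, 64), (5000, 64), (9000, 64)]
sequential_ios = [(0, 64), (64, 64), (128, 64), (192, 64)]
large_io = [(0, 1024)]
print(count_ios(random_ios), count_ios(sequential_ios), count_ios(large_io))
```

Passing `max_io_kb=1024` models the larger merge size that applies to ST1 and SC1.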
63. How do we count I/Os for ST1 and SC1?
• When possible, we merge sequential I/Os (up to 1 MB in size)
• Workloads with primarily large, sequential I/Os perform best on ST1 and SC1
• Ex: Big Data/EMR, Hadoop, Kafka, Log Processing, Data Warehouses
64. How do we count I/Os for ST1 and SC1?
Example 1: Random I/Os
• 4 random I/Os
• Each I/O 64 KB
• Counted as 4 I/Os or 4 MB/s of burst (each counted I/O is up to 1024 KB)
65. How do we count I/Os for ST1 and SC1?
Example 2: Sequential I/O
• 4 sequential I/Os
• Each I/O 1024 KB
• Counted as 4 I/Os or 4 MB/s of burst (each counted I/O is up to 1024 KB)
66. How do we count I/Os for ST1 and SC1?
Example 3: Mixed I/O
• 2 * 512 KB sequential I/Os
• 2 * 64 KB random I/Os
• 2 * 128 KB sequential I/Os
• Counted as 4 I/Os or 4 MB/s of burst (but only ~ 1.4 MB of data transferred; each counted I/O is up to 1024 KB)
67. Burst balance for ST1 and SC1
[Chart: burst balance % (y-axis, 0 to 120) versus time in hours (x-axis, 0 to 10) for a 4 TB ST1 volume; series: 1 MB sequential and 16 KB random]
• 1 MB sequential: 500 MB/s for 3 hours
• 16 KB random: 8 MB/s for 3 hours
68. Burst balance for ST1 and SC1
[Chart: data transferred in GB (0 to 6,000) for a 4 TB ST1 volume; series: 1 MB sequential and 16 KB random]
• 1 MB sequential: 5.4 TB transferred
• 16 KB random: 87 GB transferred
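The gap between the two workloads comes from each counted I/O consuming a full 1 MB of burst credit regardless of its actual size. The sketch below works through that arithmetic, assuming decimal units (1 MB = 1,000 KB), a 500 MB/s credit spend rate, and the 3-hour burst window from the chart; the function names are illustrative.

```python
def effective_mb_s(io_size_kb: float, credit_mb_s: float = 500.0,
                   credit_per_io_mb: float = 1.0, kb_per_mb: float = 1000.0) -> float:
    """Data rate achieved when every I/O costs a fixed amount of credit."""
    ios_per_s = credit_mb_s / credit_per_io_mb
    return min(credit_mb_s, ios_per_s * io_size_kb / kb_per_mb)


def transferred_gb(io_size_kb: float, hours: float) -> float:
    """Data moved during the burst window, in decimal GB."""
    return effective_mb_s(io_size_kb) * hours * 3600 / 1000


print("1 MB sequential:", effective_mb_s(1000), "MB/s,", transferred_gb(1000, 3), "GB")
print("16 KB random:   ", effective_mb_s(16), "MB/s,", transferred_gb(16, 3), "GB")
```

Under these assumptions the sequential workload moves about 5.4 TB in 3 hours while the random workload moves under 90 GB, in line with the chart.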
69. Verify workload I/O patterns
• iostat for Linux
• perfmon for Windows
$ iostat -xm
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
xvdf 0.00 0.20 0.00 523.40 0.00 523.00 2046.44 3.99 7.62 1.61 100.00
avgrq-sz: 2046 sectors x 512 bytes/sector = ~1024 KiB
74. I/O requests in a Linux virtual world: 3.8+ kernel
[Diagram: userspace process and kernel in the instance; I/O driver domain and hypervisor; EBS]
• Request queue with scheduler: noop, deadline, or cfq
• Maximum request size: 44 KB pre 3.8; 128 KB to 1024 KB post 3.8
• Up to 32 requests in queue
75. I/O requests in a Linux virtual world: 4.2+ kernel
[Diagram: userspace process and kernel in the instance; I/O driver domain and hypervisor; EBS]
• Request queue: per core (blk-mq)
• Maximum request size: 44 KB pre 3.8; 128 KB to 1024 KB post 3.8
• Up to 32 requests in queue
76. ST1 & SC1: Linux performance tuning
Increase maximum request size:
• Recommended for ST1, SC1 on a 4.2+ Linux kernel
• OS boot command line configuration
• Memory allocated per device
• Default is 32, max for EC2 is 256
For example with GRUB’s /boot/grub/menu.lst configuration:
kernel /boot/vmlinuz-4.4.5-15.26.amzn1.x86_64 root=LABEL=/ console=ttyS0 xen_blkfront.max=256
Verify setting:
/sys/module/xen_blkfront/parameters/max
77. ST1 & SC1: Linux performance tuning
Increase read-ahead buffer:
• Recommended for high-throughput read workloads
• Per device configuration
• Default is 128 KiB (256 sectors) for Amazon Linux
• Smaller or random I/O will degrade performance
For example:
$ sudo blockdev --setra 2048 /dev/xvdf
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html
84. Best practice: RAID
Avoid RAID for redundancy
• EBS data is already replicated
• RAID5/6 loses 20 – 30% of usable I/O to parity
• RAID1 halves available EBS bandwidth
86. What about EC2 instance failure?
[Diagram: EC2 instance attached to an EBS volume (with replica) in an Availability Zone of an AWS Region]
87. What about EC2 instance failure?
[Diagram: new EC2 instance attached to the EBS volume (with replica) in an Availability Zone of an AWS Region]
88. EBS enables EC2 auto recovery
Amazon CloudWatch per-instance metric alarm: StatusCheckFailed_System
When alarm triggers? RECOVER Instance
Instance retains:
• Instance ID
• Instance metadata
• Private IP addresses
• Elastic IP addresses
• EBS volume attachments
* Supported on C3, C4, M3, M4, P2, R3, T2, and X1 instance types with EBS-only storage
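An alarm of this kind can be described as a set of parameters for the CloudWatch `PutMetricAlarm` API. The sketch below only builds the parameter dictionary, so it runs without AWS access; the alarm name, period, and evaluation settings are assumptions chosen for illustration, not values from the session.

```python
def recovery_alarm_params(instance_id: str, region: str) -> dict:
    """Build PutMetricAlarm parameters for an EC2 auto recovery alarm."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 60,               # assumption: evaluate every minute
        "EvaluationPeriods": 2,     # assumption: two failing periods in a row
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # The recover action is expressed as an ARN in the alarm's actions
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


params = recovery_alarm_params("i-0123456789abcdef0", "us-west-2")
print(params["AlarmActions"][0])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)`.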
89. What about EC2 instance termination?
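[Diagram: EC2 instance and EBS volume (with replica) in an Availability Zone of an AWS Region]
• DeleteOnTermination = True: the volume is deleted when the instance terminates
• DeleteOnTermination = False: the volume persists after the instance terminates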
90. Best practice: taking snapshots from Linux
Quiesce I/O
1. Database: FLUSH and LOCK tables
2. Filesystem: sync and fsfreeze
3. EBS: snapshot all volumes
4. When CreateSnapshot API returns success, it is safe to resume
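The filesystem and snapshot steps above can be sketched as a freeze, snapshot, thaw sequence. In this sketch `create_snapshot` is a hypothetical callable standing in for a wrapper around the EC2 CreateSnapshot API, and the command runner is injectable so the flow can be exercised without root access; the database step (FLUSH and LOCK) is left to the caller.

```python
import subprocess


def snapshot_with_frozen_fs(mount_point, volume_ids, create_snapshot,
                            run=subprocess.check_call):
    """Flush and freeze a filesystem, snapshot its volumes, then thaw it.

    create_snapshot: hypothetical callable taking a volume ID and returning
    once the CreateSnapshot call has succeeded.
    """
    run(["sync"])                            # flush dirty pages to disk
    run(["fsfreeze", "-f", mount_point])     # block new writes
    try:
        return [create_snapshot(vol) for vol in volume_ids]
    finally:
        run(["fsfreeze", "-u", mount_point])  # always resume I/O


# Dry run with a recording runner instead of real commands
commands = []
result = snapshot_with_frozen_fs("/data", ["vol-1", "vol-2"],
                                 lambda vol: "snap-for-" + vol,
                                 run=commands.append)
print(result)
print(commands)
```

The `try`/`finally` ensures the filesystem is unfrozen even if a snapshot call fails, so a failed backup does not leave the application blocked.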
91. Best practice: taking snapshots from Windows
1. sync equivalent available
2. Use Volume Shadow Copy Service (VSS)-aware utilities for backups
3. EBS: backups on dedicated volume for snapshots
92. Best practice: taking EBS snapshots from Windows
[Diagram: Windows EC2 instance with EBS boot, data, and backup volumes; Windows Server Backup; EBS snapshot]
93. EBS volume initialization
New EBS volume?
• Attach and it’s ready to go
New EBS volume from snapshot?
• Initialize for best performance
• Random read across volume
95. Best practice: automate snapshots
Key ingredients:
• AWS Lambda
• Amazon EC2 Run Command
• Tagging
https://aws.amazon.com/ec2/run-command/
96. Best practice: automate snapshots
Lambda scheduled event: daily snapshots
1. Search for instances tagged “Backup”
2. EC2 Run commands to quiesce file system
3. Snapshot attached volumes
4. Tag snapshots with expire date
[Diagram: EC2 instances tagged Backup, with Retention: 30 days]
97. Best practice: automate snapshot expiration
Lambda scheduled event: daily expire
1. Search for snapshots tagged to “Expire On” today
2. Delete expired snapshots
[Diagram: EBS snapshots tagged Backup, with ExpireOn: Date]
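The tagging and expiration logic can be sketched without any AWS calls. The example below assumes an ISO-formatted date in an `ExpireOn` tag and snapshot records shaped like the EC2 DescribeSnapshots output (`SnapshotId` plus a `Tags` list); it selects snapshots due today or earlier, a slight widening of the slide's "today" so that a missed run is caught the next day.

```python
from datetime import date, timedelta


def expire_on_tag(created: date, retention_days: int = 30) -> dict:
    """Tag to attach at snapshot time, based on the retention period."""
    expires = created + timedelta(days=retention_days)
    return {"Key": "ExpireOn", "Value": expires.isoformat()}


def expired_snapshot_ids(snapshots, today: date):
    """Return IDs of snapshots whose ExpireOn date is today or earlier."""
    expired = []
    for snap in snapshots:
        tags = {t["Key"]: t["Value"] for t in snap.get("Tags", [])}
        expire_on = tags.get("ExpireOn")
        if expire_on and date.fromisoformat(expire_on) <= today:
            expired.append(snap["SnapshotId"])
    return expired


snaps = [
    {"SnapshotId": "snap-old", "Tags": [expire_on_tag(date(2016, 11, 1))]},
    {"SnapshotId": "snap-new", "Tags": [expire_on_tag(date(2016, 12, 1))]},
    {"SnapshotId": "snap-untagged"},
]
print(expired_snapshot_ids(snaps, today=date(2016, 12, 5)))
```

Untagged snapshots are never selected, which keeps manually created snapshots out of the automated cleanup.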
101. Best practice: encryption
Create a new AWS KMS master key for EBS
• Define key rotation policy
• Enable AWS CloudTrail auditing
• Control who can use key
• Control who can administer key
106. Summary
• Use encryption if you need it
• Take snapshots, tag snapshots
• Select the right instance for your workload
• Select the right volume for your workload