AWS Summit 2013
Navigating the Cloud
Understanding Amazon EBS Availability and Performance
Eric Anderson
CopperEgg
April 18, 2013
CopperEgg: EBS Use Case
• How CopperEgg uses EBS
• EBS vs Provisioned IOPS EBS
• EBS and RAID
• Backup/Snapshot best practices
• Filesystem selection and tuning
• Monitoring/Migrations/Planning
How CopperEgg uses EBS
• Real-time monitoring (every 5s)
– System information
– Processes
– Synthetic HTTP/TCP/etc
– Application metrics
– Tons more..
• Requirements:
– Store many terabytes of data
– Persist the data over long periods of time
– Backups (use snapshots)
– High IO: 50-60k+ ops/s per node
• SSD + Provisioned IOPS EBS
– Consistent IO behavior (non-spiky)
EBS vs Provisioned IOPS EBS
• Standard EBS
– Good for low IO volume
– Bursty workloads may be a good fit: do the math
• Provisioned IOPS EBS
– Great for steady IO patterns that need consistency
– Not always more expensive than standard!
– Be sure to use the IOPS you provision!
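One way to "do the math" is sketched below. It assumes a pricing model in which standard EBS bills per GB-month plus per million I/O requests, while Provisioned IOPS EBS bills per GB-month plus per provisioned IOPS-month. The function names and all prices passed in are illustrative placeholders, not quoted AWS rates; substitute the current figures for your region.

```shell
# monthly_cost_standard SIZE_GB AVG_IOPS PRICE_PER_GB PRICE_PER_MILLION_IO
# Assumed model: storage charge plus a charge per million I/O requests.
monthly_cost_standard() {
  awk -v gb="$1" -v iops="$2" -v pgb="$3" -v pio="$4" 'BEGIN {
    secs = 30 * 24 * 3600                      # seconds in a 30-day month
    printf "%.2f\n", gb * pgb + (iops * secs / 1000000) * pio
  }'
}

# monthly_cost_piops SIZE_GB PROVISIONED_IOPS PRICE_PER_GB PRICE_PER_IOPS
# Assumed model: storage charge plus a flat charge per provisioned IOPS.
monthly_cost_piops() {
  awk -v gb="$1" -v iops="$2" -v pgb="$3" -v pio="$4" 'BEGIN {
    printf "%.2f\n", gb * pgb + iops * pio
  }'
}

# Example with placeholder prices: 100 GB at a sustained 1000 IOPS.
# monthly_cost_standard 100 1000 0.10 0.10
# monthly_cost_piops    100 1000 0.125 0.10
```

With these placeholder numbers the steady 1000 IOPS workload comes out cheaper on Provisioned IOPS, which is the point of "not always more expensive than standard". A bursty workload with a low average IOPS shifts the result the other way.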
EBS and RAID
• Which RAID?
– Depends on your use case, but:
• We use stripes (RAID 0) for most things
– Good performance, we build our fault tolerance at a different level
• RAID 10 (stripe of mirrors)
– RAID 0-like performance, with better fault tolerance from the mirrors
– Twice the cost of RAID 0
• RAID 0+1 (mirror of stripes)
– Don’t do this – same performance, worse fault tolerance
• RAID 5 (stripe with parity)
– Can be dangerous: software RAID 5 can lose data if any write caching is enabled.
– RAID 6 (dual parity) may be an option.
• Block size
– Use an appropriate stripe size for best results
• We use 64 KB – but you need to test various configs to get the best fit for your application
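A minimal sketch of creating a striped array with a 64 KB chunk follows. The helper only prints the `mdadm` command so it can be reviewed before being run as root. The function name and the device names `/dev/xvdf` and `/dev/xvdg` are hypothetical; the deck does not say which devices CopperEgg uses.

```shell
# build_raid0_cmd MD_DEVICE CHUNK_KB DISK...
# Prints (does not run) an mdadm command for a RAID 0 stripe.
build_raid0_cmd() {
  md="$1"; chunk="$2"; shift 2
  # After the shift, $# is the number of member disks and $* lists them.
  printf 'mdadm --create %s --level=0 --chunk=%s --raid-devices=%s %s\n' \
    "$md" "$chunk" "$#" "$*"
}

# Example (hypothetical devices):
# build_raid0_cmd /dev/md0 64 /dev/xvdf /dev/xvdg
```

Printing the command first makes it easy to try several chunk sizes in a test environment and benchmark each one, as the slide recommends.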
Backup/Snapshot best practices
• Snapshot regularly
– At least once per day, more if you can
– The first snapshot takes a while; subsequent ones are faster
– Schedule for when your IO load is lowest to reduce impact
• We do it at around 9pm CST
• Use consistent naming for snapshots
– {hostname}-{raid device}-{device}-{timestamp}
• Use the API for creation
– Faster kickoff, more likely to be consistent (script it!)
– ec2-create-snapshot -d "{hostname}-{raid device}-{device}-{timestamp}" vol-d726382
• Move older snapshots to S3/Glacier for long-term storage
• RAID makes this a bit more complex:
– Make sure you unmount/snapshot/remount your file system, or use fsfreeze to keep consistent snapshots!
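The naming scheme and the freeze/snapshot/unfreeze sequence could be scripted roughly as below. This is a sketch, not CopperEgg's actual script: the function names and the `VOLUME_ID:DEVICE` argument format are assumptions made for illustration, and `snapshot_raid` needs root, `fsfreeze`, and the `ec2-create-snapshot` tool from the slide on the PATH.

```shell
# snapshot_name HOST RAID_DEVICE DEVICE TIMESTAMP
# Builds the {hostname}-{raid device}-{device}-{timestamp} description.
snapshot_name() {
  printf '%s-%s-%s-%s\n' "$1" "$2" "$3" "$4"
}

# snapshot_raid MOUNTPOINT RAID_DEVICE VOLUME_ID:DEVICE ...
# Freezes the file system, starts one snapshot per member volume,
# then unfreezes. Every snapshot shares the same timestamp.
snapshot_raid() {
  mnt="$1"; raid="$2"; shift 2
  ts=$(date -u +%Y%m%dT%H%M%SZ)
  fsfreeze -f "$mnt" || return 1
  for pair in "$@"; do
    vol=${pair%%:*}      # text before the first colon: the volume ID
    dev=${pair#*:}       # text after the first colon: the device name
    ec2-create-snapshot -d "$(snapshot_name "$(hostname)" "$raid" "$dev" "$ts")" "$vol"
  done
  fsfreeze -u "$mnt"     # always unfreeze, even if a snapshot call failed
}

# Example (hypothetical IDs):
# snapshot_raid /data md0 vol-aaaa1111:xvdf vol-bbbb2222:xvdg
```

Freezing once around all member volumes is what keeps the pieces of the array consistent with each other.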
Choosing a good file system
• We like ext3/4, but we love XFS
– High performance, consistent
– Robust and lots of options for tweaking/adjusting as needed
• Our favorite mount options: (your mileage may vary)
– inode64, noatime, nodiratime, attr2, nobarrier, logbufs=8, logbsize=256k, osyncisdsync, nobootwait, noauto
– Yields great performance, reduces unnecessary writes, stable
• We like ZFS a lot too, but we want to see more runtime on Linux first
– But FreeBSD/ZFS would be a fine choice
• However: test your workload!
– File systems behave differently under different workloads
EBS/File system performance tuning
• Tuning file systems:
– Set the scheduler to use 'deadline' (for each disk in RAID array/EBS):
• [as root] echo deadline > /sys/block/[disk device]/queue/scheduler
– Adjust how aggressively the cache is written to disk. Tune these back if you are bursty in write IO:
• vm.dirty_ratio=30
• vm.dirty_background_ratio=20
• Track what you change!
– Before changing anything, monitor it
– After you make the change, monitor it
– Then: KEEP monitoring it – things can change over time in unexpected ways
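The tuning steps above can be wrapped in small helpers, sketched here under a few assumptions: the function names are hypothetical, the kernel in use offers a scheduler called `deadline`, and the commands run as root. `active_scheduler` reads the sysfs format in which the active scheduler is shown in square brackets.

```shell
# active_scheduler: read a scheduler line on stdin (e.g. "noop [deadline] cfq")
# and print the bracketed, currently active entry.
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# set_deadline DISK...: switch each disk to deadline and report the result.
set_deadline() {
  for disk in "$@"; do
    echo deadline > "/sys/block/$disk/queue/scheduler"
    printf '%s: ' "$disk"
    active_scheduler < "/sys/block/$disk/queue/scheduler"
  done
}

# set_dirty_ratios DIRTY_RATIO BACKGROUND_RATIO
set_dirty_ratios() {
  sysctl -w vm.dirty_ratio="$1" vm.dirty_background_ratio="$2"
}

# Example (hypothetical disks; values taken from the slide):
# set_deadline xvdf xvdg
# set_dirty_ratios 30 20
```

Settings applied this way do not survive a reboot, so record them alongside the monitoring data you collect before and after the change.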
Monitoring
• Observing:
– iostat -xcd -t 1
• Watch the sum of r/s and w/s – this is your IOPS metric. For PIOPS, you want it close to the provisioned amount. We monitor this using CopperEgg custom metrics, and alert if it goes low or high.
– grep -A 1 dirty /proc/vmstat
• If nr_dirty approaches nr_dirty_threshold, lower vm.dirty_ratio and vm.dirty_background_ratio to flush writes more often.
• Reference: http://docs.neo4j.org/chunked/stable/linux-performance-guide.html
• Useful stats to capture:
– In /proc/fs/xfs/stat
• xs_trans* -> transactions
• xs_read/write* -> read/write operations stats
• xb_* -> buffer stats
• Ignore SMART - does not work for EBS
• Watch the console log
– Use the AWS API to look for warning signs of EBS issues
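The two observations above (IOPS as r/s plus w/s, and nr_dirty relative to nr_dirty_threshold) can be extracted with awk, as in this sketch. The function names are hypothetical. Because iostat column order differs between sysstat versions, the parser locates the `r/s` and `w/s` columns from the header line instead of assuming fixed positions.

```shell
# iops_from_iostat: read `iostat -x` output on stdin, print "device iops".
iops_from_iostat() {
  awk '
    /^Device/ {                      # header row: locate the r/s and w/s columns
      in_dev = 1
      for (i = 1; i <= NF; i++) {
        if ($i == "r/s") r = i
        if ($i == "w/s") w = i
      }
      next
    }
    NF == 0 { in_dev = 0; next }     # a blank line ends the device block
    in_dev && r && w { printf "%s %.1f\n", $1, $r + $w }
  '
}

# dirty_pct: read /proc/vmstat-format text on stdin and print nr_dirty
# as a whole-number percentage of nr_dirty_threshold.
dirty_pct() {
  awk '$1 == "nr_dirty"           { d = $2 }
       $1 == "nr_dirty_threshold" { t = $2 }
       END { if (t > 0) printf "%d\n", 100 * d / t }'
}

# Example:
# iostat -xd 1 2 | iops_from_iostat
# dirty_pct < /proc/vmstat
```

The output of either helper can be fed to whatever custom-metric mechanism you use for alerting.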
Migrations and Capacity Planning
• Using PIOPS?
– Plan on a data migration path if you need to increase PIOPS
• You can't (yet) increase IOPS on the fly
• Migration steps from an EBS backed RAID:
1. Snapshot 1hr before, then again, and again – each time it takes less time
2. Stop all services
3. Unmount the filesystem
4. Stop the RAID (mdadm --stop /dev/md0)
5. Take final snapshot
6. Create new volumes based on last snapshot
7. Attach the new volumes – mdadm should detect and assemble the array automatically.
8. Mount the filesystem
9. Restart services
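The host-side steps (2 to 4 and 7 to 9) might be scripted as below, with a dry-run switch so the sequence can be checked first. This is a sketch under assumptions: the function names, the `DRY_RUN` variable, and the use of the `service` command to stop and start the application are all hypothetical. Snapshot and volume creation (steps 1, 5, and 6) go through the AWS API and are left out.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

# migrate_stop SERVICE MOUNTPOINT MD_DEVICE   (steps 2-4)
migrate_stop() {
  run service "$1" stop &&
  run umount "$2" &&
  run mdadm --stop "$3"
}

# migrate_start SERVICE MOUNTPOINT MD_DEVICE  (steps 7-9, after attaching
# the new volumes created from the final snapshots)
migrate_start() {
  run mdadm --assemble --scan &&
  run mount "$3" "$2" &&
  run service "$1" start
}

# Example (hypothetical service and paths):
# DRY_RUN=1; migrate_stop myapp /data /dev/md0
```

Chaining with `&&` stops the sequence at the first failure, so the array is never stopped while the file system is still mounted.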
