AWS Summit 2013
Navigating the Cloud
Understanding Amazon EBS Availability and Performance
Eric Anderson
CopperEgg
April 18, 2013
CopperEgg: EBS Use Case
• How CopperEgg uses EBS
• EBS vs Provisioned IOPS EBS
• EBS and RAID
• Backup/Snapshot best practices
• Filesystem selection and tuning
• Monitoring/Migrations/Planning
How CopperEgg uses EBS
• Real-time monitoring (every 5s)
– System information
– Processes
– Synthetic HTTP/TCP/etc
– Application metrics
– And much more
• Requirements:
– Store many terabytes of data
– Persist the data over long periods of time
– Backups (use snapshots)
– High IO: 50-60k+ ops/s per node
• SSD + Provisioned IOPS EBS
– Consistent IO behavior (non-spiky)
EBS vs Provisioned IOPS EBS
• Standard EBS
– Good for low IO volume
– Bursty workloads may be a good fit: do the math
• Provisioned IOPS EBS
– Great for steady IO patterns that need consistency
– Not always more expensive than standard!
– Be sure to use the IOPS you provision!
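To make "do the math" concrete, here is a rough monthly cost comparison. It is a sketch under stated assumptions: the four prices are placeholder values (check current pricing for your region), a 30-day month is assumed, and the 100-IOPS floor in the demo is an assumed minimum provisioning level.

```python
# Rough monthly cost comparison: standard EBS vs Provisioned IOPS EBS.
# ASSUMPTION: all prices are illustrative placeholders, not current AWS pricing.

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumed 30-day month

STANDARD_PER_GB_MONTH = 0.10    # $ per GB-month (assumed)
STANDARD_PER_MILLION_IO = 0.10  # $ per 1M I/O requests (assumed)
PIOPS_PER_GB_MONTH = 0.125      # $ per GB-month (assumed)
PIOPS_PER_IOPS_MONTH = 0.10     # $ per provisioned IOPS-month (assumed)


def standard_cost(size_gb: float, avg_iops: float) -> float:
    """Standard EBS bills storage plus every I/O request actually issued."""
    requests = avg_iops * SECONDS_PER_MONTH
    return size_gb * STANDARD_PER_GB_MONTH + requests / 1e6 * STANDARD_PER_MILLION_IO


def piops_cost(size_gb: float, provisioned_iops: float) -> float:
    """PIOPS bills storage plus the IOPS you provision, whether used or not."""
    return size_gb * PIOPS_PER_GB_MONTH + provisioned_iops * PIOPS_PER_IOPS_MONTH


if __name__ == "__main__":
    for iops in (10, 100, 1000):
        # Assumed minimum of 100 provisioned IOPS for the PIOPS column.
        print(f"{iops:>5} IOPS: standard ${standard_cost(100, iops):8.2f}, "
              f"PIOPS ${piops_cost(100, max(iops, 100)):8.2f}")
```

Under these assumed prices, one sustained IOPS on standard EBS costs about $0.26 per month in request charges (2.592 million requests), versus $0.10 provisioned. That is why steady workloads can favor PIOPS, and why unused provisioned IOPS are wasted money.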
EBS and RAID
• Which RAID?
– Depends on your use case, but:
• We use stripes (RAID 0) for most things
– Good performance, we build our fault tolerance at a different level
• RAID 10 (stripe of mirrors)
– RAID 0-like performance, with added fault tolerance from the mirrors
– Twice the cost of RAID 0 for the same usable capacity
• RAID 0+1 (mirror of stripes)
– Don’t do this – same performance, worse fault tolerance
• RAID 5 (stripe with parity)
– Could be dangerous: software RAID 5 is risky with any write caching enabled, because an interrupted write can leave data and parity out of sync
– RAID 6 (dual parity) may be an option
• Block (stripe) size
– Use an appropriate stripe size for best results
• We use 64 KB, but you need to test various configurations to find the best fit for your application
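The following minimal sketch shows what the stripe size controls in RAID 0: which member volume each chunk of a logical address lands on, and how many members a single request touches. It also tabulates usable capacity and worst-case failures survived for the RAID levels above. Equal-size members and a mirrored-pair RAID 10 layout are assumptions, and the function names are hypothetical.

```python
# Sketch: RAID 0 offset mapping for a given chunk ("stripe") size, plus
# usable capacity and guaranteed fault tolerance per RAID level.
# ASSUMPTIONS: equal-size members; RAID 10 laid out as mirrored pairs.
from __future__ import annotations

KIB = 1024


def raid0_locate(offset: int, chunk: int, n_disks: int) -> tuple[int, int]:
    """Return (member index, byte offset on that member) for a logical offset."""
    chunk_index = offset // chunk
    disk = chunk_index % n_disks     # chunks are dealt out round-robin
    stripe = chunk_index // n_disks  # complete stripes before this chunk
    return disk, stripe * chunk + offset % chunk


def disks_touched(offset: int, length: int, chunk: int, n_disks: int) -> set[int]:
    """Members hit by one request of `length` bytes starting at `offset`."""
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return {c % n_disks for c in range(first, last + 1)}


def raid_summary(level: str, n_disks: int, disk_gb: int) -> tuple[int, int]:
    """Return (usable GB, member failures survived in the worst case)."""
    minimum = {"0": 2, "10": 4, "5": 3, "6": 4}
    if level not in minimum or n_disks < minimum[level]:
        raise ValueError(f"unsupported level or too few disks: RAID {level} x{n_disks}")
    if level == "0":
        return n_disks * disk_gb, 0
    if level == "10":
        if n_disks % 2:
            raise ValueError("this sketch assumes an even number of disks for RAID 10")
        return n_disks // 2 * disk_gb, 1  # more if failures hit different mirrors
    if level == "5":
        return (n_disks - 1) * disk_gb, 1
    return (n_disks - 2) * disk_gb, 2     # RAID 6
```

With a 64 KB chunk on four members, an aligned 256 KB request is spread across all four volumes, while a 4 KB request touches only one. `raid_summary` also shows why RAID 10 needs twice the volumes of RAID 0 for the same usable space.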
Backup/Snapshot best practices
• Snapshot regularly
– At least once per day, more if you can
– The first snapshot takes a while; subsequent ones are faster because they are incremental
– Schedule for when your IO load is lowest to reduce impact
• We do it at around 9pm CST
• Use consistent naming for snapshots
– {hostname}-{raid device}-{device}-{timestamp}
• Use the API for creation
– Faster kickoff, more likely to be consistent (script it!)
– ec2-create-snapshot -d "{hostname}-{raid device}-{device}-{timestamp}" vol-d726382
• Move older snapshots to S3/Glacier for long-term storage
• RAID makes this a bit more complex:
– Make sure you unmount/snapshot/remount your file system, or use fsfreeze, so the snapshots of all RAID members are consistent with each other!
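The naming convention and the freeze/snapshot/thaw sequence can be scripted roughly as follows. This sketch swaps the older `ec2-create-snapshot` CLI for boto3's `create_snapshot` call. It assumes boto3 with EC2 credentials, root access for `fsfreeze`, and a UTC `%Y%m%d%H%M%S` timestamp. The `members` mapping of device names to volume IDs is a hypothetical input.

```python
# Sketch: freeze the filesystem, snapshot every RAID member, then thaw.
# ASSUMPTIONS: boto3 with EC2 credentials; run as root (fsfreeze); the
# members mapping and timestamp format are hypothetical choices.
from __future__ import annotations

import socket
import subprocess
from datetime import datetime, timezone


def snapshot_name(hostname: str, raid_dev: str, device: str, when: datetime) -> str:
    """{hostname}-{raid device}-{device}-{timestamp}, timestamp in UTC."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{hostname}-{raid_dev}-{device}-{stamp}"


def snapshot_raid(mountpoint: str, raid_dev: str, members: dict[str, str]) -> list[str]:
    """members maps member device names (e.g. "xvdf") to EBS volume IDs."""
    import boto3  # imported here so snapshot_name works without boto3

    ec2 = boto3.client("ec2")
    host = socket.gethostname()
    now = datetime.now(timezone.utc)  # one timestamp shared by all members
    snapshot_ids = []
    subprocess.run(["fsfreeze", "--freeze", mountpoint], check=True)
    try:
        for device, volume_id in members.items():
            resp = ec2.create_snapshot(
                VolumeId=volume_id,
                Description=snapshot_name(host, raid_dev, device, now),
            )
            snapshot_ids.append(resp["SnapshotId"])
    finally:
        # Each snapshot's point in time is fixed once create_snapshot returns,
        # so writes can resume while the snapshots are still "pending".
        subprocess.run(["fsfreeze", "--unfreeze", mountpoint], check=True)
    return snapshot_ids
```

For example, `snapshot_raid("/data", "md0", {"xvdf": "vol-11111111", "xvdg": "vol-22222222"})` (placeholder mount point and IDs) freezes `/data`, starts both snapshots with a shared timestamp, and thaws the filesystem even if one of the API calls fails.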
Choosing a good file system
• We like ext3/4, but we love XFS
– High performance, consistent
– Robust and lots of options for tweaking/adjusting as needed
• Our favorite mount options: (your mileage may vary)
– inode64, noatime, nodiratime, attr2, nobarrier, logbufs=8, logbsize=256k, osyncisdsync, nobootwait, noauto
– Yields great performance, reduces unnecessary writes, stable
• We like ZFS a lot too, but we want to see more runtime on Linux first
– But FreeBSD/ZFS would be a fine choice
• However: test your workload!
– File systems behave differently under different workloads
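Not every mount option ends up visible or in effect on every kernel, so it can help to check what was actually applied. The following small sketch assumes the Linux `/proc/mounts` layout. `WANTED` is an illustrative subset of the options above: fstab-only options such as noauto and nobootwait never appear in `/proc/mounts`, and some kernels omit defaults such as inode64.

```python
# Sketch: report which wanted mount options are missing from /proc/mounts.
# ASSUMPTIONS: Linux /proc/mounts format "device mountpoint fstype options 0 0";
# WANTED is an illustrative subset of the options on the slide.
from __future__ import annotations

WANTED = frozenset({"noatime", "logbufs=8", "logbsize=256k"})


def mount_options(mounts_text: str, mountpoint: str) -> set[str] | None:
    """Options of the last (topmost) mount at mountpoint, or None if not mounted."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = set(fields[3].split(","))
    return found


def missing_options(mounts_text: str, mountpoint: str,
                    wanted: frozenset[str] = WANTED) -> set[str]:
    """Wanted options that the kernel does not report for this mount."""
    options = mount_options(mounts_text, mountpoint)
    if options is None:
        raise ValueError(f"{mountpoint} is not mounted")
    return set(wanted) - options
```

For example, `missing_options(open("/proc/mounts").read(), "/data")` returns an empty set when everything in `WANTED` took effect. Here `/data` is a placeholder mount point.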
EBS/File system performance tuning
• Tuning file systems:
– Set the scheduler to use 'deadline' (for each disk in RAID array/EBS):
• [as root] echo deadline > /sys/block/[disk device]/queue/scheduler
– Adjust how aggressively the cache is written to disk. Tune these back (lower) if your write IO is bursty:
• vm.dirty_ratio=30
• vm.dirty_background_ratio=20
• Track what you change!
– Before changing anything, monitor it
– After you make the change, monitor it
– Then: KEEP monitoring it – things can change over time in unexpected ways
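The tuning steps, combined with "track what you change", might look like the following. The sketch writes the same sysfs/procfs files as the echo commands above and returns a (path, old value, new value) log. It assumes root access and hypothetical device names. The `root` parameter exists only so the sketch can be exercised against a scratch directory. On newer multi-queue kernels the scheduler is named `mq-deadline` instead.

```python
# Sketch: apply the scheduler and dirty-ratio settings, logging old values
# so every change is tracked (and can be rolled back).
# ASSUMPTIONS: run as root; device names such as "xvdf" are hypothetical.
from __future__ import annotations

from pathlib import Path


def active_scheduler(text: str) -> str:
    """The kernel brackets the active scheduler, e.g. 'noop [deadline] cfq'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return text.strip()


def tune(devices: list[str], root: Path = Path("/")) -> list[tuple[str, str, str]]:
    """Apply the slide's settings; return (path, old value, new value) per change."""
    settings = [(root / "sys/block" / dev / "queue/scheduler", "deadline") for dev in devices]
    settings += [
        (root / "proc/sys/vm/dirty_ratio", "30"),
        (root / "proc/sys/vm/dirty_background_ratio", "20"),
    ]
    changes = []
    for path, new in settings:
        raw = path.read_text()
        old = active_scheduler(raw) if path.name == "scheduler" else raw.strip()
        path.write_text(new + "\n")  # same effect as `echo value > path`
        changes.append((str(path), old, new))
    return changes
```

Storing the returned log alongside your monitoring data makes it easy to correlate a later change in behavior with the setting that caused it.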
Monitoring
• Observing:
– iostat -xcd -t 1
• Watch the sum of r/s and w/s: this is your IOPS metric. For PIOPS, you want it close to the provisioned amount. We monitor this using CopperEgg custom metrics and alert if it goes too low or too high.
– grep -A 1 dirty /proc/vmstat
• If nr_dirty approaches nr_dirty_threshold, lower the vm.dirty_* ratios so that writes are flushed more often.
• Reference: http://docs.neo4j.org/chunked/stable/linux-performance-guide.html
• Useful stats to capture:
– In /proc/fs/xfs/stat
• xs_trans* -> transactions
• xs_read/write* -> read/write operations stats
• xb_* -> buffer stats
• Ignore SMART: it does not work for EBS
• Watch the console log
– Use the AWS API to look for warning signs of EBS issues
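The IOPS and dirty-page checks can be automated with a sketch like the following. It derives r/s + w/s from two samples of `/proc/diskstats` (the counters iostat itself reads), flags drift from the provisioned level, and computes nr_dirty / nr_dirty_threshold from `/proc/vmstat`. The ±20% alert band and the function names are assumptions.

```python
# Sketch: iostat-style IOPS from /proc/diskstats plus a dirty-page check.
# ASSUMPTIONS: Linux /proc/diskstats layout (field 4 = reads completed,
# field 8 = writes completed); the +/-20% alert band is illustrative.
from __future__ import annotations

import time


def completed_ios(diskstats_text: str, device: str) -> int:
    """Reads completed + writes completed so far for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[2] == device:
            return int(fields[3]) + int(fields[7])
    raise KeyError(f"device {device!r} not found in diskstats")


def iops(before: str, after: str, device: str, seconds: float) -> float:
    """Equivalent of iostat's r/s + w/s over the sampling interval."""
    return (completed_ios(after, device) - completed_ios(before, device)) / seconds


def sample_iops(device: str, interval: float = 1.0) -> float:
    """Take two /proc/diskstats samples `interval` seconds apart."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        after = f.read()
    return iops(before, after, device, interval)


def iops_alert(measured: float, provisioned: float, band: float = 0.2) -> str | None:
    """'low' or 'high' when IOPS drift outside the band around the provisioned level."""
    if measured < provisioned * (1 - band):
        return "low"
    if measured > provisioned * (1 + band):
        return "high"
    return None


def dirty_fraction(vmstat_text: str) -> float:
    """nr_dirty / nr_dirty_threshold; values approaching 1.0 are the warning sign."""
    stats = dict(line.split() for line in vmstat_text.splitlines() if line.strip())
    return int(stats["nr_dirty"]) / int(stats["nr_dirty_threshold"])
```

Feeding `sample_iops` and `dirty_fraction` into whatever metrics system you use turns the two manual checks into alerts.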
Migrations and Capacity Planning
• Using PIOPS?
– Plan on a data migration path if you need to increase PIOPS
• You can't (yet) increase IOPS on the fly
• Migration steps from an EBS backed RAID:
1. Snapshot 1 hour before, then again, and again; each snapshot takes less time than the last
2. Stop all services
3. Unmount the filesystem
4. Stop the RAID (mdadm --stop /dev/md0)
5. Take final snapshot
6. Create new volumes based on last snapshot
7. Attach the new volumes and reassemble the RAID: mdadm should detect the array from the members' superblocks and bring it back automatically.
8. Mount the filesystem
9. Restart services
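Steps 6 and 7 can be scripted along these lines. The sketch assumes boto3 with EC2 credentials and that the old volumes have already been detached (or that new device names are used). The instance ID, snapshot IDs and device names are hypothetical, and the zone must be the instance's Availability Zone. It uses the "io1" Provisioned IOPS volume type. `mdadm --assemble --scan` is one way to have mdadm find the array.

```python
# Sketch of steps 6-7: new PIOPS volumes from the final snapshots, attach, reassemble.
# ASSUMPTIONS: boto3 with EC2 credentials; old volumes already detached;
# instance, zone, snapshot IDs and device names are hypothetical placeholders.
from __future__ import annotations

import subprocess


def volume_request(snapshot_id: str, zone: str, iops: int) -> dict:
    """Keyword arguments for ec2.create_volume; size defaults to the snapshot's."""
    return {
        "SnapshotId": snapshot_id,
        "AvailabilityZone": zone,
        "VolumeType": "io1",  # Provisioned IOPS volume type
        "Iops": iops,
    }


def restore_members(instance_id: str, zone: str, iops: int,
                    snapshots: dict[str, str]) -> list[str]:
    """snapshots maps attach device names (e.g. "/dev/sdf") to snapshot IDs."""
    import boto3  # imported here so volume_request works without boto3

    ec2 = boto3.client("ec2")
    volume_ids = []
    for device, snapshot_id in snapshots.items():
        vol = ec2.create_volume(**volume_request(snapshot_id, zone, iops))["VolumeId"]
        ec2.get_waiter("volume_available").wait(VolumeIds=[vol])
        ec2.attach_volume(Device=device, InstanceId=instance_id, VolumeId=vol)
        volume_ids.append(vol)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=volume_ids)
    # mdadm identifies members by their RAID superblocks, not by device order.
    subprocess.run(["mdadm", "--assemble", "--scan"], check=True)
    return volume_ids
```

Because every member is recreated from the same final snapshot set, the reassembled array holds exactly the data that was on disk when the services were stopped.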