Michael Richardson
Twitter: @Mr_SPB
© 2011 Energized Work - www.energizedwork.com
Availability and Recoverability
So what is High Availability?
• Five 9s?
• No Single point of failure?
• Multiple Data Centres?
• Fault Tolerance?
• Load Balancing?
• Uptime?
The 9s of Availability
Availability            Downtime per Year
One nine (90%)          36.5 days
Two nines (99%)         3.65 days
Three nines (99.9%)     8.76 hours
Four nines (99.99%)     52.56 minutes
Five nines (99.999%)    5.26 minutes
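As a rough check on the table above, each downtime figure is the unavailable fraction multiplied by the length of a year. A minimal Python sketch, assuming a 365-day year; the function name is illustrative and not from the talk:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year, which the table's figures imply


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime per year permitted by an availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    levels = [
        ("One nine", 90.0),
        ("Two nines", 99.0),
        ("Three nines", 99.9),
        ("Four nines", 99.99),
        ("Five nines", 99.999),
    ]
    for label, pct in levels:
        hours = downtime_hours_per_year(pct)
        # Show both hours and minutes so each row can be compared with the table.
        print(f"{label} ({pct}%): {hours:.4f} hours = {hours * 60:.2f} minutes")
```

Each extra nine divides the permitted downtime by ten, which is why the table moves from days to hours to minutes.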
Problem with the 9s
• What do they mean?
• Guaranteed, or just an SLA?
• Multiplicity: the availabilities of components that must all work multiply together
(99.9% * 99.9% * 99.9% = 99.7%)
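The multiplicity arithmetic can be reproduced with a few lines of Python. This is only a sketch: it assumes every component in the chain must be up for the service to be up, and that the components fail independently. The function name is hypothetical.

```python
from math import prod


def serial_availability(availabilities):
    """Combined availability of components that must ALL be up (assumes independent failures)."""
    return prod(availabilities)


if __name__ == "__main__":
    # Three components in a chain, each at three nines, as on the slide.
    chain = [0.999, 0.999, 0.999]
    print(f"Combined availability: {serial_availability(chain):.2%}")
```

The longer the chain of dependencies, the further the combined figure falls below the availability of any single component.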
SLA availability numbers just aim to provide a level of confidence in a website’s service.
No Single Point of Failure (SPOF)
Two of everything?
Start with this
[Diagram: Users, Index.html]
End with this
[Diagram: Users; Firewall 1, Firewall 2; switch 1, switch 2; WEB1, WEB2; APP1, APP2; DB1, DB2]
Problems with eliminating SPOF
• It’s expensive ££
• Where do you draw the line?
• Are failures independent?
• Can you guarantee no SPOF?
• Increased complexity
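Whether failures are independent matters because the usual argument for "two of everything" relies on it. The Python sketch below uses hypothetical names and made-up availability figures. It compares a redundant pair under the independence assumption with the same pair sitting behind one shared dependency (for example a common power feed), which acts as a hidden SPOF.

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` identical components when any one of them is enough.

    Assumes failures are independent, which is exactly what the slide questions.
    """
    return 1 - (1 - single) ** copies


def with_shared_dependency(single: float, copies: int, shared: float) -> float:
    """The same redundant group, but every copy also needs one shared component."""
    return redundant_availability(single, copies) * shared


if __name__ == "__main__":
    pair = redundant_availability(0.99, 2)          # two 99% servers, independent
    behind_shared = with_shared_dependency(0.99, 2, 0.999)  # plus a 99.9% shared part
    print(f"Independent pair:       {pair:.4%}")
    print(f"Pair behind shared dep: {behind_shared:.4%}")
```

Under these assumptions the shared component caps the whole group: the pair can never be more available than the thing both copies depend on.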
Problem: Data Centres Fail
Solution: Get a 2nd Data Centre
Hot/Hot Multisite
• Full range of services available in multiple locations.
• Easy to automate failover of sites.
• Data consistency is hard.
• Capacity planning concerns.
Hot/Warm Multisite
• Simpler than Hot/Hot
• Depends on the read/write ratio
• Replicate data synchronously or asynchronously?
Hot/Cold Multisite
• Easy to set up
• Will it work?
• Can it be trusted?
• Cold site rapidly becomes stale
• Is it actually valuable?
DR Multisite
• Fingers crossed you never need it.
• How can/should you test it?
• Cloud?
Problems with Multiple Sites
• ££ - it’s expensive
• Managing more systems
• Managing consistency of data
• Managing capacity
• Is it still fail-proof?
• Unless you test it, it’s just a plan
We now have a Complex System
Complex Systems
• More redundancy and automation lead to more complexity.
• More complexity often adds more points of failure.
“How Complex Systems Fail”
Author: Dr. Richard Cook
• Catastrophe is always just around the corner.
• Human operators have dual roles.
• Change introduces new forms of failure.
Failure and Recovery
Questions for the Customer
• What is the cost of downtime?
• What are the RTO and RPO?
RTO = Recovery Time Objective
RPO = Recovery Point Objective
Aggressive RTO and RPO targets are expensive and have a performance impact.
RTO / RPO example
Problem
• Simple DB
• Business can tolerate up to 15 minutes of downtime (the RTO)
• Business can tolerate up to a 10-minute window of data loss (the RPO)
Possible solution
1. Continuously replicate data to a 2nd host.
2. Continue with nightly backups, and also copy DB transaction logs from the primary host to another system.
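To see why nightly backups alone miss the 10-minute data-loss window while frequent transaction-log copies can meet it, here is a small Python sketch. The 5-minute copy interval and 1-minute transfer time are invented for illustration, and the function names are hypothetical. The worst case is modelled simply as the copy interval plus the time a copy takes to land on the other system.

```python
from datetime import timedelta

RPO = timedelta(minutes=10)  # from the example: tolerable window of data loss


def worst_case_data_loss(copy_interval: timedelta,
                         transfer_time: timedelta = timedelta(0)) -> timedelta:
    """Age of the oldest unprotected write just before the next copy has landed.

    Simplifying assumption: copies start every `copy_interval` and take
    `transfer_time` to arrive on the other system.
    """
    return copy_interval + transfer_time


def meets_rpo(copy_interval: timedelta,
              transfer_time: timedelta = timedelta(0),
              rpo: timedelta = RPO) -> bool:
    """True if the worst-case data loss fits inside the recovery point objective."""
    return worst_case_data_loss(copy_interval, transfer_time) <= rpo


if __name__ == "__main__":
    # Nightly backups only: up to 24 hours of data could be lost.
    print("Nightly backup meets RPO:", meets_rpo(timedelta(hours=24)))
    # Hypothetical log shipping: copy logs every 5 minutes, 1 minute to transfer.
    print("Log shipping meets RPO:  ",
          meets_rpo(timedelta(minutes=5), timedelta(minutes=1)))
```

The same check could be applied to the 15-minute downtime limit by comparing the measured time of a test restore against the RTO.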
So what’s more important?
Increasing Availability or Reducing Recovery Time?
MTBF (mean time between failures) or MTTR (mean time to recovery)?
What about MTTD (mean time to detect)?
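One way to frame the MTBF-or-MTTR question is the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The Python sketch below is not from the talk and its numbers are invented. Treating detection time (MTTD) as extra downtime added to MTTR is an assumption, since some teams already count detection inside MTTR.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to recover."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: a failure every 1000 hours, 1 hour to recover.
baseline = availability(1000, 1)

# Two ways to improve: fail half as often, or recover twice as fast.
doubled_mtbf = availability(2000, 1)
halved_mttr = availability(1000, 0.5)

# Assumption: MTTR is counted from detection, so a slow 30-minute detection
# adds directly to the downtime of every incident.
slow_detection = availability(1000, 1 + 0.5)

if __name__ == "__main__":
    print(f"Baseline:            {baseline:.5%}")
    print(f"Doubled MTBF:        {doubled_mtbf:.5%}")
    print(f"Halved MTTR:         {halved_mttr:.5%}")
    print(f"With slow detection: {slow_detection:.5%}")
```

Under this formula, doubling MTBF and halving MTTR give the same availability, so the choice comes down to which is cheaper to achieve. Time spent not noticing a failure counts against you either way.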
Answer?
It Depends
Failure is inevitable
Ask anyone
Thank you
The End
Twitter - @Mr_SPB

System Availability Talk


Editor's Notes (numbered by slide)

  3. (The 9s of Availability) Ask any business how much downtime is acceptable and you will get a consistent answer.
  5. (Problem with the 9s) The nines are found more in marketing literature than in technical literature.
  6. (SLA availability numbers) An SLA is just an instrument that makes business people comfortable (just like insurance).
  11. (Problems with eliminating SPOF) Bullets 1 and 2: diminishing returns. Paradoxically, adding more components to an overall system design can undermine efforts to achieve high availability. Cascading failures.
  14. (Hot/Hot Multisite) Read and write anywhere. Global server load balancing with DNS.
  15. (Hot/Warm Multisite) Read-intensive apps are well suited to this: reads are Hot/Hot.
  16. (Hot/Cold Multisite) The cold site is so untrusted that perhaps spending hours restoring the primary DC is a better and safer bet.
  17. (DR Multisite) The cold site is so untrusted that perhaps spending hours restoring the primary DC is a better and safer bet.
  18. (Problems with Multiple Sites) Talk about capacity planning. Hot/Hot: config switches. Most companies don’t thoroughly test DC failover. When failure occurs, many companies will focus on restoring the failure in the primary DC rather than attempt a failover. So why bother having a 2nd DC anyway? If you plan on having multiple DCs or DR, then test your procedures when you’re not in an emergency situation. Game Day events.
  21. (“How Complex Systems Fail”) Mention John Allspaw’s QCon talk.
      Bullet 2, dual roles of humans: defenders against failure and producers of failure.
      Bullet 3, introducing a technology change: a change made to prevent low-consequence but high-frequency failures may introduce low-frequency but high-consequence failures, and new pathways to large-scale, catastrophic failures. The focus of humans is on the beneficial characteristics of the change. New failures may be difficult to foresee. Give config management example (Knife, resolv.conf).
      Bullet 3 also covers maintenance and why many find it difficult: the build-and-forget mentality.
  23. (Questions for the Customer) Cost of downtime: is it easy or difficult to measure? Can downtime actually be equated to lost revenue? Give online shopping example.
  24. (RTO and RPO) RTO and RPO are often in competition. Give an example of replication lag between 2 sites. Zero RPO example: if replication lags between systems and you have an aggressive RPO, you may be better off taking a few hours' outage and focusing on restoring your primary site. Zero RTO example: if replication lags between DCs, you may decide to fail over immediately and take the data loss for some in-flight transactions. Aggressive RTO and RPO is expensive and has a performance impact.
  26. (RTO / RPO example: problem) Typical nightly backups aren’t going to cut it. Common practice is to back up systems nightly. Is your business happy to lose up to 24 hours of data? Probably not.
  27. (RTO / RPO example: possible solution) Option 1 covers you for any catastrophic hardware failure; the 2nd host has independent storage infrastructure. Data corruption would, however, result in 2 copies of crap. Option 2 covers you for data corruption. Playing back transaction logs will also allow you to identify the place where corruption occurred.
  29. (MTBF or MTTR) What about MTTD?
  30. (Answer? It Depends) My experience tells me most companies focus on availability. How many companies take nightly tape backups but have never bothered trying to restore or test them? If you think you can build a completely fail-proof system, you are kidding yourself. How many companies have game days?