The Top Outages of 2022: Analysis and Takeaways


  1. © 1992–2023 Cisco Systems, Inc. All rights reserved.
  2. Featured Speakers
      • Chris Villemez, Technical Marketing Engineer
      • Brian Tobia, Technical Marketing Engineer
  3. Before We Begin... (@ThousandEyes)
      • If you have any questions, please type them in the Questions window.
      • If you have any audio problems, please chat us for help.
      • A recording of this presentation will be sent to you in a few days.
  4. Agenda
      • About ThousandEyes
      • Noteworthy Outages of 2022
      • Primer: Digital Service Building Blocks
      • Top Ten Outage Countdown
      • Lessons & Takeaways
      • Q&A
  5. Actionable Insight for Internet, Cloud, and SaaS
      • Correlated Insights: Quickly isolate issues to app, network, or service
      • Network Visibility: Overlay, hop-by-hop underlay, ISP performance, and BGP routing
      • App Experience: SaaS, API, and internal app performance and user experience
  6. 2022 Noteworthy Outages (severity legend: Major, Significant, Shadow)
      • British Airways (2/25)
      • Twitter prefixes hijacked (3/28)
      • Atlassian services unavailable (4/5)
      • Rogers routing failure (7/8)
      • AWS AZ failure (7/28)
      • Zoom outage (9/15)
      • Zscaler Internet Access failure (10/25)
      • WhatsApp outage (10/25)
      • AWS packet loss (12/5)
  7. The Building Blocks of Today’s Digital Services: CDN, Cloud, BGP, DNS, SaaS
  8. Many Options, Complex Dependencies (diagram: Users, ISP, DNS, BGP, CDN, Security, Your App)
  9. Many Options, Complex Dependencies (diagram: Users, ISP, DNS, BGP, CDN, Security, Your App, Cloud APIs, Data Center, Cloud IaaS)
  10. Step 1: DNS - Where Are We Going? (diagram: Users, ISP, BGP, CDN, Your App; DNS lookup via Root Server, TLD Server, Authoritative Server)
  11. Step 2: How Do We Get There? (diagram: Users, ISP, BGP, DNS, CDN, Your App)
  12. Step 3: CDNs - Do We Have to Travel So Far? (diagram: Users, ISP, BGP, DNS, CDN, Your App)
  13. Step 4: Rinse and Repeat for Services & API Calls (diagram: Your App, SaaS Apps, Cloud APIs, Data Center, Backend Services)
  14. Top Ten Countdown
  15. Outages #10 to #6 (in slide order, not rank order)
      • Atlassian, Apr 5, 2022. ~2.5 days; service unavailable/data loss. Due to a maintenance script error, Atlassian services experienced a days-long outage. Lesson: One cannot rely on status pages alone to communicate about outages. Customers can be left worrying with no answer as to how serious an outage is and when it will be fixed.
      • Zscaler Internet Access, Oct 25, 2022. ~30 minutes; network traffic loss. Customers using Zscaler Internet Access (ZIA) experienced connectivity failures or high latency in reaching Zscaler proxies. Lesson: Having network-agnostic data for complex scenarios like this can enable quicker attribution and remediation.
      • WhatsApp, Oct 25, 2022. ~2 hours; failure to send/receive messages. The two-hour outage left WhatsApp users unable to send or receive messages. Lesson: A thriving SaaS business relies on continuous improvement, which is why an immediate feedback loop, whereby mistakes can be rectified quickly, is necessary.
      • AWS, Dec 5, 2022. ~1 hour; network traffic/packet loss. Significant packet loss between two global locations and AWS' us-east-2 region. Lesson: It’s important to monitor not just the applications, but also the cloud infrastructure components and any dependent cloud software services.
      • Rogers, Jul 8, 2022. ~24 hours; app + routing issues. Rogers withdrew its prefixes due to an internal routing issue, rendering it unreachable across the Internet for nearly 24 hours. Lesson: No provider is immune to outages. Plan for a backup network provider that can alleviate the length and scope of an outage.
  16. Zoom, September 15th, 2022 (#5)
      • Service unavailable ~20 minutes
      • Users were unable to log in or join meetings
      • Most of the HTTP errors seen were 502 Bad Gateway responses, indicative of potential CDN issues
      • The service would appear to be available if tested only at the IP level, but HTTP results and service status tell a different story
      Lesson: The app itself, rather than the network, may be causing the issue. Knowing which one is at fault can prevent confusion and finger-pointing during root cause analysis.
  17. British Airways, February 25, 2022 (#4)
      • Service unavailable ~20 minutes
      • Outage caused hundreds of flight cancellations and disruptions in the airline's operations
      • Network paths to the airline’s online services (and servers) were reachable, but server and site responses were timing out
      Lesson: Architecting backends that avoid single points of failure can reduce the likelihood of a cascading chain of failures.
  18. Google, August 9, 2022 (#3)
      • Service unavailable for ~60 minutes
      • Outage affected Google Search and Maps
      • During this time, Google web servers responded with HTTP 500 Internal Server Error messages, 502 Bad Gateway errors, and timeouts
      Lesson: It is important to monitor not just your application front ends but also the performance-critical dependencies that power your app.
  19. AWS AZ Failure, July 28th, 2022 (#2)
      • Service unavailable ~20 minutes, ~3 hours for customers to recover
      • Caused by an Availability Zone power failure
      • Impacted applications such as Webex, Okta, and Splunk
      • Affected EC2 instances and EBS volumes as well as traffic routing
      Lesson: Build redundancy across multiple AZs. Multi-AZ deployments are typically active/active, which removes the need to execute a separate backup plan.
  20. Twitter, March 28th, 2022 (#1)
      • Service unavailable ~45 minutes
      • Twitter was rendered unreachable for some users when JSC RTComm.RU (AS 8342) announced one of Twitter’s prefixes and subsequently blackholed traffic
      • Since Twitter’s service is not located within RTComm’s network, any Twitter traffic routed to RTComm would have failed
      Lesson: Even if your company has implemented RPKI to fend off BGP threats, your telco might not have. This is something to consider when selecting ISPs.
  21. Lessons and Takeaways
      • BGP powers the Internet, but it can also be misused and abused. Visibility and planning are needed to protect your network.
      • Public cloud is ubiquitous and reliable, but ensure that you are monitoring all cloud dependencies.
      • Avoid single points of failure. Your apps are only as resilient as your architecture.
      • Security is essential, but it can add great complexity that requires continuous end-to-end visibility.
      • Whenever the infrastructure is touched, failures can occur. Visibility is critical before and after each network change to avoid impacts.
  22. Next Steps
      Learn more
      • Subscribe! https://blog.thousandeyes.com
      • Get a real-time view of the health of the Internet: https://thousandeyes.com/outages
      Free Trial / Demo
      • Sign up for a Free Trial: https://www.thousandeyes.com/signup
      • Request a demo: https://www.thousandeyes.com/request-demo
      Copyright ©2023 ThousandEyes
  23. Q&A
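
The DNS lookup path on slide 10 (root server, then TLD server, then authoritative server) can be sketched as a walk over referrals. This is a minimal illustration, not a real resolver: the server names, the `app.example.com` record, and the address `203.0.113.10` are hypothetical in-memory data, and no network queries are made.

```python
# Hypothetical in-memory "servers": each maps a name suffix to either a
# referral (the next server to ask) or a final answer (an IP address).
SERVERS = {
    "root": {"com": ("referral", "tld-com")},
    "tld-com": {"example.com": ("referral", "auth-example")},
    "auth-example": {"app.example.com": ("answer", "203.0.113.10")},
}


def resolve(name, start="root", max_hops=8):
    """Follow referrals from the root until a server returns an answer.

    Returns (ip, path), where path lists the servers queried in order.
    Raises LookupError if no server can answer for the name.
    """
    name = name.rstrip(".").lower()
    labels = name.split(".")
    # Candidate suffixes, longest first: app.example.com, example.com, com
    suffixes = [".".join(labels[i:]) for i in range(len(labels))]
    server, path = start, []
    for _ in range(max_hops):
        path.append(server)
        zone = SERVERS[server]
        match = next((s for s in suffixes if s in zone), None)
        if match is None:
            raise LookupError(f"{server} has no data for {name}")
        kind, value = zone[match]
        if kind == "answer":
            if match != name:  # only an exact match counts as an answer
                raise LookupError(f"{server} has no record for {name}")
            return value, path
        server = value  # referral: ask the next server down the hierarchy
    raise LookupError(f"too many referrals resolving {name}")


ip, path = resolve("app.example.com")
print(ip, path)
```

A failure at any hop in `path` stops the lookup, which is why a DNS problem can make an otherwise healthy application unreachable.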
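
The Zoom and British Airways slides (16 and 17) both describe a server that was reachable at the network layer while the application returned errors or timed out. A minimal sketch of that triage logic follows. The `ProbeResult` type, the `classify` function, and the labels it returns are hypothetical names chosen for illustration, not part of any ThousandEyes API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Outcome of one synthetic check against a service (hypothetical shape)."""
    tcp_connected: bool          # did the TCP handshake to the server succeed?
    http_status: Optional[int]   # None if no HTTP response arrived (timeout)


def classify(probe: ProbeResult) -> str:
    """Attribute a failed check to the network or to the application."""
    if not probe.tcp_connected:
        # Could not even reach the server: look at the path, routing, or ISP.
        return "network"
    if probe.http_status is None:
        # Reachable, but the server never answered (the British Airways pattern).
        return "application"
    if probe.http_status >= 500:
        # Reachable, but answering with 5xx errors (the Zoom pattern).
        return "application"
    # 2xx to 4xx: the server responded, so treat it as available here.
    return "ok"


print(classify(ProbeResult(tcp_connected=True, http_status=502)))
```

An IP-level test alone would only ever populate `tcp_connected`, which is why it can report a service as available while users see errors.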
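
The Twitter slide (20) mentions RPKI as a defense against prefix hijacks. The sketch below shows the idea behind route origin validation as described in RFC 6811: an announcement is checked against Route Origin Authorizations (ROAs). It is a simplified illustration under stated assumptions. The prefix `203.0.113.0/24` and AS 64500 are documentation placeholders, not Twitter's real prefix or ASN; only AS 8342 comes from the slide.

```python
import ipaddress
from typing import List, NamedTuple


class ROA(NamedTuple):
    prefix: str      # prefix the holder has authorized
    max_length: int  # longest prefix length the origin may announce
    asn: int         # AS authorized to originate the prefix


def validate(prefix: str, origin_asn: int, roas: List[ROA]) -> str:
    """Return 'valid', 'invalid', or 'not-found' for a BGP announcement."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA covers the route if the route's prefix falls inside the ROA's.
        if net.version != roa_net.version or not net.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and net.prefixlen <= roa.max_length:
            return "valid"
    # Covered by a ROA but no match: wrong origin AS or too specific.
    return "invalid" if covered else "not-found"


# Placeholder ROA: AS 64500 is the hypothetical legitimate origin.
ROAS = [ROA("203.0.113.0/24", 24, 64500)]

print(validate("203.0.113.0/24", 64500, ROAS))
print(validate("203.0.113.0/24", 8342, ROAS))
```

A network that drops `invalid` routes would not have accepted the second announcement, but as the slide's lesson notes, this only helps if the ISPs along the path perform the check.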
