SRE Demystified - 16 - NALSD - Non-Abstract Large System Design


This document discusses Non-Abstract Large System Design (NALSD), an iterative process for designing distributed systems. NALSD means designing with realistic constraints in mind from the start and assessing how a design would work at scale. The deck describes taking a basic design and refining it through iterations, asking at each step whether the design is feasible, resilient, and able to meet its goals with the available resources. Each iteration informs the next. NALSD is a skill for evaluating how well systems can fulfill their requirements when deployed in real environments.


SRE Demystified
NALSD
ganesh@ganeshniyer.com
ganesh.vigneswara@gmail.com
http://ganeshniyer.com
Dr Ganesh Neelakanta Iyer

Designing distributed systems
• There are many ways to design distributed systems.
• One way involves growing systems organically:
  • Components are rewritten or redesigned as the system handles more requests.
• Another method starts with a proof of concept:
  • Once the system adds value to the business, a second version is designed from the ground up.

Non-abstract large system design (NALSD)
• NALSD describes an iterative process for designing, assessing, and evaluating distributed systems, such as Borg (Google's cluster manager for distributed computing) and the Google File System (GFS)
• NALSD describes a skill critical to SRE: the ability to assess, design, and evaluate large systems
• Practically, NALSD combines elements of capacity planning, component isolation, and graceful system degradation that are crucial to highly available production systems
https://research.google/pubs/pub51/
https://research.google/pubs/pub43438/
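Graceful degradation, one of the elements listed above, can be illustrated with a small sketch: a request handler that falls back to a cheaper, possibly stale answer when a dependency fails, instead of failing the whole request. This is a minimal sketch under stated assumptions, not a production pattern: the names `serve`, `primary`, and `cache` are hypothetical, and catching only `ConnectionError` assumes that is how the backend signals being unavailable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Response:
    items: List[str]
    degraded: bool  # True when the answer came from the fallback, not the primary backend


def serve(query: str,
          primary: Callable[[str], List[str]],
          cache: Dict[str, List[str]]) -> Response:
    """Answer from the primary backend; if it is unreachable, degrade to cached results."""
    try:
        items = primary(query)
        cache[query] = items  # refresh the fallback copy on every successful call
        return Response(items=items, degraded=False)
    except ConnectionError:
        # Isolate the failure: return a stale or empty answer rather than an error.
        return Response(items=cache.get(query, []), degraded=True)


def backend_down(query: str) -> List[str]:
    # Hypothetical stand-in for a failed dependency.
    raise ConnectionError("backend unavailable")


cache: Dict[str, List[str]] = {}
print(serve("q", lambda q: ["a", "b"], cache))  # full answer, degraded=False
print(serve("q", backend_down, cache))          # cached answer, degraded=True
```

The `degraded` flag lets callers and monitoring tell a full answer from a fallback one, so the degradation stays visible rather than silent.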

Why Non-abstract?
• All systems will eventually have to run on real computers in real datacenters using real networks
• The people designing distributed systems need to develop, and continuously exercise, the muscle of turning a whiteboard design into concrete estimates of resources at multiple steps in the process
• This extra bit of work up front typically leads to fewer last-minute system design changes to account for some unforeseen physical constraint
• The value of this exercise is in combining many imperfect-but-reasonable results into a better understanding of the design
https://landing.google.com/sre/workbook/chapters/non-abstract-design/
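Turning a whiteboard box such as "store the request logs" into a concrete resource estimate is mostly arithmetic. The sketch below is a hypothetical back-of-the-envelope calculation: the traffic rate, record size, retention, replication factor, and disk size are illustrative assumptions, not figures from the slides.

```python
import math

SECONDS_PER_DAY = 86_400


def storage_estimate(qps: int, bytes_per_record: int, retention_days: int,
                     replication: int, disk_capacity_tb: int) -> tuple[int, int]:
    """Return (total replicated bytes retained, number of disks needed to hold them)."""
    raw_bytes = qps * bytes_per_record * SECONDS_PER_DAY * retention_days
    replicated_bytes = raw_bytes * replication
    # Decimal terabytes (10**12 bytes), as disk vendors quote capacity.
    disks = math.ceil(replicated_bytes / (disk_capacity_tb * 10**12))
    return replicated_bytes, disks


# Hypothetical inputs: 10k records/s of 1 KB each, kept 30 days, 3 replicas, 4 TB disks.
total, disks = storage_estimate(10_000, 1_000, 30, 3, 4)
print(total, disks)  # 77760000000000 20  (about 77.8 TB on 20 disks)
```

Each input is imperfect but reasonable; the estimate deliberately ignores compression, indexing overhead, free-space headroom, and disk failures, which are natural refinements for a later iteration.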

Design process
• An iterative approach to designing systems that meet our goals
• Each iteration defines a potential design and examines its strengths and weaknesses
• This analysis either feeds into the next iteration or indicates that the design is good enough to recommend
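The loop described above can be modeled, very loosely, in code. This toy sketch assumes each candidate design can be summarized by two hypothetical metrics (p99 latency and monthly cost) checked against fixed goals; in a real NALSD exercise the next candidate is written after reviewing the previous one's weaknesses, rather than listed up front as it is here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """One iteration's design, reduced to the (hypothetical) numbers the review checks."""
    name: str
    p99_latency_ms: float
    monthly_cost_usd: float


def review(candidates: List[Candidate], max_latency_ms: float,
           budget_usd: float) -> Tuple[Optional[Candidate], List[Tuple[str, List[str]]]]:
    """Return the first candidate meeting every goal, plus the weaknesses found in earlier ones."""
    notes: List[Tuple[str, List[str]]] = []
    for c in candidates:
        weaknesses = []
        if c.p99_latency_ms > max_latency_ms:
            weaknesses.append("latency")
        if c.monthly_cost_usd > budget_usd:
            weaknesses.append("cost")
        if not weaknesses:
            return c, notes  # good enough to recommend
        notes.append((c.name, weaknesses))  # feeds the next iteration
    return None, notes


designs = [
    Candidate("single machine", 900, 2_000),
    Candidate("sharded", 120, 15_000),
    Candidate("sharded + cache", 80, 9_000),
]
chosen, notes = review(designs, max_latency_ms=200, budget_usd=10_000)
print(chosen.name, notes)
```

The `notes` list is the point of the exercise: each rejected design records why it fell short, which is what motivates the next iteration.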

Two phases
1. Basic design
• Is it possible?
  • Is the design even possible? If we didn’t have to worry about having enough RAM, CPU, network bandwidth, and so on, what would we design to satisfy the requirements?
• Can we do better?
  • For any such design, we ask, “Can we do better?” For example, can we make the system meaningfully faster, smaller, or more efficient? If the design solves the problem in O(N) time, can we solve it more quickly, say in O(ln(N))?
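A classic instance of the O(N) versus O(ln(N)) question is lookup in a large, time-ordered log. The sketch below assumes the timestamps are already sorted (a hypothetical precondition the basic design would have to guarantee); with that, a linear scan can be replaced by binary search via Python's standard `bisect` module. Note that O(ln(N)) and O(log2(N)) differ only by a constant factor, so they are the same complexity class.

```python
import bisect
from typing import List


def find_linear(timestamps: List[int], target: int) -> int:
    """O(N): examine entries one by one until the target is found."""
    for i, ts in enumerate(timestamps):
        if ts == target:
            return i
    return -1


def find_binary(sorted_timestamps: List[int], target: int) -> int:
    """O(log N): halve the search range each step; requires sorted input."""
    i = bisect.bisect_left(sorted_timestamps, target)
    if i < len(sorted_timestamps) and sorted_timestamps[i] == target:
        return i
    return -1


ts = list(range(0, 1_000_000, 5))  # 200,000 sorted timestamps
print(find_linear(ts, 500), find_binary(ts, 500))  # 100 100
```

Both functions return the same index, but binary search needs about 18 comparisons for 200,000 entries instead of up to 200,000. The cost moves elsewhere: something must keep the data sorted, which is exactly the kind of trade-off "Can we do better?" should surface.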

Two phases
2. Scale up the basic design
• Is it feasible?
  • Is it possible to scale this design, given constraints on money, hardware, and so on? If necessary, what distributed design would satisfy the requirements?
• Is it resilient?
  • Can the design fail gracefully? What happens when this component fails? How does the system work when an entire datacenter fails?
• Can we do better?
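The feasibility and resilience questions can be combined in one estimate: how many machines does each datacenter need so that the surviving datacenters still carry the full peak load after an entire datacenter fails? This is a sketch under assumptions: peak load, per-machine throughput, the 70% target utilization, and the even spread of traffic across datacenters are all hypothetical inputs.

```python
import math


def machines_per_datacenter(peak_qps: float, qps_per_machine: float,
                            datacenters: int, tolerated_dc_failures: int = 1,
                            target_utilization: float = 0.7) -> tuple[int, int]:
    """Return (machines per datacenter, total machines) so that the remaining
    datacenters can serve peak_qps after tolerated_dc_failures outages."""
    survivors = datacenters - tolerated_dc_failures
    if survivors < 1:
        raise ValueError("need more datacenters than tolerated failures")
    # Usable throughput per machine, keeping headroom below 100% utilization.
    usable_qps = qps_per_machine * target_utilization
    per_dc = math.ceil(peak_qps / (survivors * usable_qps))
    return per_dc, per_dc * datacenters


# Hypothetical inputs: 50k QPS peak, 500 QPS per machine, 3 datacenters.
print(machines_per_datacenter(50_000, 500, 3, tolerated_dc_failures=0))  # (48, 144)
print(machines_per_datacenter(50_000, 500, 3, tolerated_dc_failures=1))  # (72, 216)
```

Surviving the loss of one of three datacenters raises the fleet from 144 to 216 machines in this example. Whether that extra cost fits the budget is the feasibility question; asking whether more, smaller datacenters would shrink it is the next "Can we do better?"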

References
• https://landing.google.com/sre/workbook/chapters/non-abstract-design/
• https://www.usenix.org/sites/default/files/conference/protected-files/srecon18americas_slides_virji.pdf
• https://cloud.google.com/blog/products/management-tools/sre-principles-and-flashcards-to-design-nalsd
• https://www.youtube.com/watch?v=modXC5IWTJI

