This document discusses Non-abstract Large System Design (NALSD), an iterative process for designing distributed systems. NALSD involves designing systems with realistic constraints in mind from the start, and assessing how designs would work at scale. It describes taking a basic design and refining it through iterations, considering whether the design is feasible, resilient, and can meet goals with available resources. Each iteration informs the next. NALSD is a skill for evaluating how well systems can fulfill requirements when deployed in real environments.
2. Designing distributed systems
• There are many ways to design distributed systems.
• One way involves growing systems organically
• Components are rewritten or redesigned as the system
handles more requests
• Another method starts with a proof of concept.
• Once the system adds value to the business, a second
version is designed from the ground up
3. Non-abstract large system design (NALSD)
• NALSD describes an iterative process for designing,
assessing, and evaluating distributed systems, such as Borg
cluster management for distributed computing and
the Google distributed file system
• NALSD describes a skill critical to SRE: the ability to
assess, design, and evaluate large systems
• Practically, NALSD combines elements of capacity planning,
component isolation, and graceful system degradation that
are crucial to highly available production systems
https://research.google/pubs/pub51/
https://research.google/pubs/pub43438/
4. Why Non-abstract?
• All systems will eventually have to run on real computers in real
datacenters using real networks
• The people designing distributed systems need to develop and
continuously exercise the muscle of turning a whiteboard design into
concrete estimates of resources at multiple steps in the process
• This extra bit of work up front typically leads to fewer last-minute
system design changes to account for some unforeseen physical
constraint
• The value of this exercise is in combining many imperfect-but-
reasonable results into a better understanding of the design
https://landing.google.com/sre/workbook/chapters/non-abstract-design/
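The habit of turning a whiteboard design into concrete resource estimates can be sketched as a short back-of-envelope calculation. This is a minimal sketch, not a method taken from the slides: the function names and every number below (peak load, per-machine throughput, item count, item size, replica count) are hypothetical placeholders chosen only for illustration.

```python
import math


def machines_needed(peak_qps: float, qps_per_machine: float) -> int:
    """Serving machines required at peak, rounded up to whole machines."""
    return math.ceil(peak_qps / qps_per_machine)


def storage_bytes(items: int, bytes_per_item: int, replicas: int) -> int:
    """Raw storage required, counting every replica of every item."""
    return items * bytes_per_item * replicas


# All inputs are assumed values for illustration, not measurements.
PEAK_QPS = 10_000          # assumed peak queries per second
QPS_PER_MACHINE = 400      # assumed throughput of one machine
ITEMS = 1_000_000_000      # assumed number of stored items
BYTES_PER_ITEM = 2_000     # assumed average item size in bytes
REPLICAS = 3               # assumed replication factor

serving = machines_needed(PEAK_QPS, QPS_PER_MACHINE)
storage = storage_bytes(ITEMS, BYTES_PER_ITEM, REPLICAS)

print(f"serving machines: {serving}")
print(f"storage: {storage / 1e12:.1f} TB")
```

Each figure is imperfect on its own. The point is that several such rough estimates, taken together, show early whether a design fits the hardware that is actually available.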
5. Design process
• An iterative approach to designing systems that meet our goals
• Each iteration defines a potential design and examines its
strengths and weaknesses
• This analysis either feeds into the next iteration or
indicates when the design is good enough to recommend
6. Two phases
1. Basic design
• Is it possible?
• Is the design even possible? If we didn’t have to worry
about enough RAM, CPU, network bandwidth, and so
on, what would we design to satisfy the requirements?
• Can we do better?
• For any such design, we ask, “Can we do better?” For
example, can we make the system meaningfully faster,
smaller, more efficient? If the design solves the problem
in O(N) time, can we solve it more quickly, say in O(ln(N)) time?
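The "Can we do better?" question can be illustrated with a lookup in sorted data. The sketch below is an assumed example, not one from the slides: the hypothetical helpers `linear_find` and `binary_find` return the same index, but the first takes O(N) steps and the second takes a logarithmic number of steps. (Logarithms of different bases differ only by a constant factor, so O(ln(N)) and O(log2(N)) describe the same growth rate.)

```python
def linear_find(sorted_items, target):
    """O(N) lookup: scan from the start. Returns (index or -1, steps taken)."""
    steps = 0
    for i, item in enumerate(sorted_items):
        steps += 1
        if item == target:
            return i, steps
    return -1, steps


def binary_find(sorted_items, target):
    """O(log N) lookup: halve the search range on every step.

    Requires sorted input. Returns (index or -1, steps taken).
    """
    lo, hi = 0, len(sorted_items)
    steps = 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_items) and sorted_items[lo] == target
    return (lo if found else -1), steps


data = list(range(1_000_000))
print(linear_find(data, 999_999))  # worst case: scans every element
print(binary_find(data, 999_999))  # same index, far fewer steps
```

The improvement depends on an extra assumption (the data is kept sorted), which is typical: a faster design usually trades one constraint for another, and that trade is what each iteration examines.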
7. Two phases
2. Scale up the basic design
• Is it feasible?
• Is it possible to scale this design, given constraints on
money, hardware, and so on? If necessary, what
distributed design would satisfy the requirements?
• Is it resilient?
• Can the design fail gracefully? What happens when this
component fails? How does the system work when an
entire datacenter fails?
• Can we do better?
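The feasibility and resilience questions above can be combined into one rough check: how many sites does the design need so that peak load is still served after a whole datacenter fails? This is a minimal sketch under stated assumptions. The helper names and all capacities are hypothetical, and the model assumes identical sites and load that redistributes evenly across the survivors.

```python
import math


def sites_needed(peak_qps: float, capacity_per_site_qps: float,
                 tolerated_failures: int = 1) -> int:
    """Sites required to serve peak load, plus spares for tolerated failures."""
    return math.ceil(peak_qps / capacity_per_site_qps) + tolerated_failures


def survives_failures(peak_qps: float, capacity_per_site_qps: float,
                      sites: int, failed_sites: int = 1) -> bool:
    """True if the remaining sites can still carry the full peak load."""
    remaining = sites - failed_sites
    return remaining > 0 and remaining * capacity_per_site_qps >= peak_qps


# Assumed values for illustration only.
PEAK_QPS = 10_000
CAPACITY_PER_SITE_QPS = 4_000

print(sites_needed(PEAK_QPS, CAPACITY_PER_SITE_QPS))           # sites for N+1
print(survives_failures(PEAK_QPS, CAPACITY_PER_SITE_QPS, 3))   # too few sites
print(survives_failures(PEAK_QPS, CAPACITY_PER_SITE_QPS, 4))   # enough sites
```

A real analysis would also ask whether the design can degrade gracefully (for example, by serving reduced functionality) instead of only passing or failing, which this simple check does not model.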