This document discusses lessons learned about building scalable service architectures. It covers key principles like making services self-contained and disposable, treating backing services as external resources, using stateless processes and storing data externally, implementing microservices patterns, monitoring systems and traffic, load balancing, failure isolation, and automating deployments. The goal is to build systems that can handle increased workloads by applying these scalability strategies in a cost-effective manner.
11. Defining scalability
Scalability is the ability to handle increased workload by repeatedly applying a cost-effective strategy for extending a system’s capacity. (CMU paper, 2006)
How well a solution to some problem will work when the size of the problem increases; when the size decreases, the solution must still fit. (dictionary.com and Theo Schlossnagle, 2006)
12. Self-contained service
Explicitly declare and isolate dependencies
Isolation from the outside system
Static linking
Pay attention to the GPL
Do not rely on system packages
13. Disposability
Maximize robustness with fast startup and graceful shutdown
Disposable processes
Graceful shutdown on SIGTERM
Handling sudden death: robust queue backend
14. Backing Services
Treat backing services as attached resources
No distinction between local and third-party services
Easily swap out resources
Export services via port binding
Become the backing service for another app
Drawing source: 12factor.net
15. Processes, concurrency
Stateless processes (not even sticky sessions)
Process types by work type
We <3 Linux processes
Container > VM
Shared-nothing: adding concurrency is safe
Process distribution spanning machines
16. Statelessness
Store everything in a datastore
Aggregate data
Aggregator / map & reduce
CQEngine
Cassandra
Scalable datastores
Handling user sessions
17. Microservices
Self-contained
Disposable
Stateless
Shared-nothing
API communication
Dependency management moved outside the service
Be Warned!
Image credits: christofcoetzee.co.za, techblog.netflix.com
18. Monitoring
Metrics collection: Graphite, New Relic
Self-aware applications
Cluster state: Zookeeper, Consul
Scaling decisions: capacity amount, graph derivative, app requests
20. Load Balance
DNS or API
App-level balancing
Uniform entry point or proxy
Balance decisions: load, Zookeeper state, resource policies
25. Reading
Scalable Internet Architectures by Theo Schlossnagle
The 12-factor App: http://12factor.net/
Carnegie Mellon Paper: http://www.sei.cmu.edu/reports/06tn012.pdf
Circuit Breaker: http://martinfowler.com/bliki/CircuitBreaker.html
Release It! by Michael T. Nygard
Netflix Tech Blog: http://techblog.netflix.com/
Quick description of the streaming stack, roles of components, how they require scaling
- Transcontroller/transcoder scaling
- UMS scaling
30-day viewer graph. Clear peaks -> need for scaling
Scaling delivery CDN, UCDN, other talk
Scaling applications!
Now comes some scaling theory
Carnegie Mellon University paper by Charles B. Weinstock, John B. Goodenough: On System Scalability
LINFO: The Linux Information Project http://www.linfo.org/
Next: principles
Example: calling imagemagick or curl from code – they may or may not be present on the host system
Bundle everything into the app instead
Disposable process: they can be started or stopped at a moment’s notice
For a web process, graceful shutdown is achieved by ceasing to listen on the service port (thereby refusing any new requests), allowing any current requests to finish, and then exiting. Implicit in this model is that HTTP requests are short (no more than a few seconds), or in the case of long polling, the client should seamlessly attempt to reconnect when the connection is lost.
For a worker process, graceful shutdown is achieved by returning the current job to the work queue.
A backing service is any service the app consumes over the network as part of its normal operation. Examples include datastores (such as MySQL or CouchDB), messaging/queueing systems (such as RabbitMQ or Beanstalkd), SMTP services for outbound email (such as Postfix), and caching systems (such as Memcached).
Put a resource locator in the config only – environment variables
Example: Easily swap out a local mysql to a remote service
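As a sketch, the resource locator can be a single environment variable parsed at startup; the `DATABASE_URL` name and hosts below are illustrative:

```python
import os
from urllib.parse import urlparse

# Swapping a local MySQL for a remote service is a config change only:
#   export DATABASE_URL="mysql://user:pass@db.example.com:3306/app"
url = urlparse(os.environ.get("DATABASE_URL",
                              "mysql://root@localhost:3306/app"))
db_host, db_port, db_name = url.hostname, url.port, url.path.lstrip("/")
```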
The app does not rely on runtime injection of a webserver into the execution environment to create a web-facing service. The web app exports HTTP as a service by binding to a port, and listening to requests coming in on that port.
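A minimal port-binding sketch using Python's stdlib HTTP server; the port is taken from the environment (defaulting to 0, i.e. any free port, so the sketch runs anywhere):

```python
import os
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The app itself speaks HTTP; no webserver is injected at runtime.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok\n")

port = int(os.environ.get("PORT", "0"))
server = HTTPServer(("0.0.0.0", port), Handler)
# server.serve_forever()  # blocking call; start it in a real process
```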
One app can become the backing service for another app, by providing the URL to the backing app as a resource handle in the config for the consuming app
Handle diverse workloads by assigning each type of work to a process type. For example, HTTP requests may be handled by a web process, and long-running background tasks handled by a worker process
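The process-type idea can be sketched as a formation of named process types; the names and the `spawn` helper are illustrative, not part of any framework:

```python
from multiprocessing import Process

def web_process():
    pass  # bind a port and serve HTTP requests

def worker_process():
    pass  # pull jobs off the queue and run them

PROCESS_TYPES = {"web": web_process, "worker": worker_process}

def spawn(formation):
    """formation, e.g. {"web": 2, "worker": 4} -> unstarted Processes."""
    return [Process(target=PROCESS_TYPES[ptype])
            for ptype, count in formation.items()
            for _ in range(count)]
```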
An individual VM can only grow so large (vertical scale), so the application must also be able to span multiple processes running on multiple physical machines.
Aggregate everything within the app and write it out in bulk – careful about write frequency, must not lose too much data on a crash
Aggregator map-reduce
Redis: scales reads, write problematic
Cassandra: quick scaling questionable
Aerospike: scales reads and writes, working together with their eng team
User sessions: persistent connection, NIO+
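The in-app aggregation described above might look like the sketch below, where `flush_every` and `max_items` bound both write frequency and the amount of data a crash can lose; `sink` stands in for the bulk write to the datastore:

```python
import time

class Aggregator:
    """Buffer events in-process and write them out in bulk."""
    def __init__(self, sink, flush_every=5.0, max_items=1000):
        self.sink = sink              # callable taking a list of events
        self.flush_every = flush_every
        self.max_items = max_items
        self.buf = []
        self.last_flush = time.monotonic()

    def add(self, event):
        self.buf.append(event)
        if (len(self.buf) >= self.max_items or
                time.monotonic() - self.last_flush >= self.flush_every):
            self.flush()

    def flush(self):
        if self.buf:
            self.sink(self.buf)       # one bulk write instead of many small ones
            self.buf = []
        self.last_flush = time.monotonic()
```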
Report everything to graphite, constantly check graph trends automatically
Apps are self-aware, they know their health
App instances report into Zookeeper and thus know about each other
Central logic can request resource based on capacity or graph, app can request based on self-check or zookeeper
Zookeeper, Consul: why use them, and what their advantages are
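Reporting a datapoint to Graphite uses its plaintext protocol, one `<path> <value> <timestamp>` line per metric over TCP (port 2003 by default); the host name below is a placeholder:

```python
import socket
import time

def format_point(metric, value, ts=None):
    # Graphite plaintext protocol: "<path> <value> <timestamp>\n"
    ts = int(ts if ts is not None else time.time())
    return f"{metric} {value} {ts}\n"

def report(metric, value, host="graphite.example.com", port=2003):
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(format_point(metric, value).encode("ascii"))
```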
load balancing distributes workloads across multiple computing resources
Flexibility: can increase or decrease its own size; example: thread pools
Adapting to CPU, RAM, disk, network
App level: transcontroller selects transcoder
App-level balancing through a proxy can be a SPOF – be careful
Resource policies: even distribution, keep large chunks free for possible large tasks (transcoder use case), group requests together on some attribute (pro, etc)
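A balance decision that places a task on the least-loaded node with enough free capacity can be sketched like this; the `nodes` map stands in for state read from Zookeeper:

```python
def pick_node(nodes, task_size):
    """nodes: {name: (used, capacity)}. Returns the least-loaded node
    that still has room for the task, or None if nothing fits
    (a node without enough free capacity is never considered)."""
    candidates = [(used / cap, name)
                  for name, (used, cap) in nodes.items()
                  if cap - used >= task_size]
    return min(candidates)[1] if candidates else None
```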
Failure is inevitable because of: large numbers of components, hardware issues, independent networks
Hystrix by Netflix 2011/12
Circuit Breaker: Martin Fowler post from 2014
Decoupling: serving one request should not wait on others
Service decoupling example: inserting layers between DB and UMS -> RGW. Then another layer between RGW and UMS -> Queue
Antipattern example: connection limit, if filled up, new connections are kept waiting until a resource frees up
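A minimal circuit breaker in the spirit of Fowler's post can be sketched as below; the thresholds are illustrative:

```python
import time

class CircuitBreaker:
    """After max_failures consecutive errors the circuit opens and calls
    fail fast; after reset_timeout one probe call is let through."""
    def __init__(self, func, max_failures=3, reset_timeout=30.0):
        self.func = func
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one probe call
        try:
            result = self.func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

Hystrix wraps the same idea with thread-pool isolation and fallback logic.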
Docker: build images from dockerfile, deploy from repository
Tasks before shutdown: moving jobs, log collection, sleep
There is an environment, and we put a Kubernetes cluster into it; it runs on N machines, of which 1-3 are the Kubernetes masters and the rest are worker nodes running the users' applications. The environments are well separated and do not affect each other, but they can find each other when needed.
Pods are sets of containers (one or more, usually one). For example, UHS and a Chunkserver on one machine form a single pod, because the chunkserver reads the files written out by Ingest. That is two containers, but still one pod.
An app consists of services and replication controllers; the replication controller supervises that the right number of instances of a given pod are running (parent pod).
Logs: logs as stream / stdout (factor #11), collect / transport / process
Scaling API: Other considerations: price, network line to the cloud provider, instance type (spot vs normal)
Openstack, Ganeti