Docker is an amazing technology. In particular, its build-once-run-anywhere model unlocks the world of cluster schedulers like Mesos and Kubernetes. These solve many of the problems of running high-scale websites, but introduce new challenges that need addressing.
In this talk, Evan will describe PaaSTA, a PaaS built on top of open source tools including Docker, Mesos, Marathon, and Chronos. PaaSTA provides tooling for developers to quickly turn their microservice into a monitored, highly available application spanning multiple datacenters and cloud regions. Evan will give an overview of the open-source technologies that power PaaSTA, discuss how Yelp has glued these together to give developers control without burdening them with the complexities of the infrastructure, and show the workflow used by developers to update and maintain their services on PaaSTA.
The Glue is the Hard Part: Making a Production-Ready PaaS
1. The Glue is the Hard Part:
Making a Production-Ready PaaS
Evan Krall
Site Reliability Engineer @ Yelp
2. Agenda
● Intro
  ○ Context: Yelp before PaaSTA
  ○ What's in a PaaS?
● PaaSTA
  ○ What parts does PaaSTA have?
  ○ How did we glue them together?
● Production-Ready
  ○ What makes a PaaS production-ready?
● Wrap-up
  ○ Lessons learned
  ○ Next steps
8. Dependency Hell
As services gain adoption, shared libraries become difficult to upgrade. Not all services are Python anymore.
9. Too Many Services
We can no longer fit all services on each service host. How do we split them up?
10. “I wonder how many organizations that say they're "doing DevOps" are actually building a bespoke PaaS. And how many of those realize it.”
— @markimbriaco
30. Scheduling in PaaSTA: Mesos and Marathon
● Mesos is an "SDK for distributed systems", batteries not included.
● Requires a framework
  ○ Marathon (like an ASG for Mesos)
  ○ Chronos (periodic tasks)
● Supports Docker as task executor
31. Marathon
● Run N copies of a Docker image
● Works with Mesos to find space on the cluster
● Replaces dead instances
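To make the "run N copies of a Docker image" idea concrete, here is a hedged sketch of a Marathon app definition following the general shape of Marathon's v2 `/v2/apps` schema; the app id, image name, registry, and resource numbers are made up for illustration.

```python
import json

# Illustrative Marathon app definition: ask Marathon to keep 3 copies of a
# Docker image running somewhere on the Mesos cluster. All concrete values
# (id, image, cpus, mem) are hypothetical.
app = {
    "id": "/service_1",
    "instances": 3,
    "cpus": 0.25,
    "mem": 256,
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "docker-registry.example.com/service_1:12abcd34",
            "network": "BRIDGE",
        },
    },
}

# Marathon accepts this as JSON via its REST API.
print(json.dumps(app, indent=2))
```

Marathon then works with Mesos to place the instances and restarts any that die, which is what makes the declarative "instances: 3" field enough.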
43. Aside: Declarative Control
● Describe the end goal, not the path
● Helps achieve fault tolerance
"Deploy 12abcd34 to prod"
vs.
"Commit 12abcd34 should be running in prod"
Gas pedal vs. cruise control
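The "cruise control" style can be sketched as a reconciliation function: compare the desired state with what is observed and compute corrective actions, so re-running it after any failure converges on the goal. All names here are illustrative, not PaaSTA's actual code.

```python
# Minimal sketch of declarative control: instead of executing a one-shot
# "deploy" command, repeatedly reconcile observed state toward desired state.
def reconcile(desired_sha, desired_count, running):
    """running: list of commit SHAs currently running in prod."""
    actions = []
    # Stop anything running the wrong version.
    for sha in running:
        if sha != desired_sha:
            actions.append(("stop", sha))
    correct = sum(1 for sha in running if sha == desired_sha)
    # Start more copies until the desired count is met.
    for _ in range(desired_count - correct):
        actions.append(("start", desired_sha))
    return actions

print(reconcile("12abcd34", 3, ["old99", "12abcd34"]))
# [('stop', 'old99'), ('start', '12abcd34'), ('start', '12abcd34')]
```

Running the same loop again once both starts succeed produces no actions, which is exactly the fault tolerance the slide describes.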
44. Configuring Marathon
● Need a wall around Marathon: it has root on your entire cluster.
● A cron job combines per-service config and the currently-blessed Docker image
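A hedged sketch of what that cron job might do: merge a service's own settings with the currently-blessed image tag to produce the app definition handed to Marathon. Field and function names are hypothetical.

```python
# Illustrative combine step: per-service config (owned by developers) plus
# the blessed image (owned by the deploy pipeline) yields the Marathon app.
def build_app(service_config, blessed_image):
    return {
        "id": "/" + service_config["name"],
        "instances": service_config.get("instances", 1),
        "container": {
            "type": "DOCKER",
            "docker": {"image": blessed_image},
        },
    }

app = build_app({"name": "service_1", "instances": 12},
                "registry.example.com/service_1:12abcd34")
```

Keeping developers on the config side of this function, and the cron job as the only thing that talks to Marathon, is the "wall" the slide refers to.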
48. Discovery in PaaSTA: SmartStack
● A registration agent on each box writes to ZooKeeper
● A discovery agent on each box reads from ZooKeeper and configures HAProxy
50. Registering with SmartStack
● configure_nerve.py queries the local mesos-slave API
● Keeping it local means registration works even if the Mesos master or Marathon is down
● We can register non-PaaSTA services as well
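An illustrative sketch of the local query: the mesos-slave state endpoint (served from the slave itself, so it keeps working when the master is down) lists the tasks running on that box. The response structure here is heavily simplified and the names are assumptions.

```python
# Walk a (simplified) mesos-slave state document and collect the names of
# locally running tasks, which a script like configure_nerve.py could then
# register in ZooKeeper.
def running_tasks(state):
    tasks = []
    for framework in state.get("frameworks", []):
        for executor in framework.get("executors", []):
            for task in executor.get("tasks", []):
                if task.get("state") == "TASK_RUNNING":
                    tasks.append(task["name"])
    return tasks

sample = {"frameworks": [{"executors": [{"tasks": [
    {"name": "service_1.main", "state": "TASK_RUNNING"},
    {"name": "service_2.main", "state": "TASK_FAILED"},
]}]}]}
print(running_tasks(sample))  # ['service_1.main']
```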
52. Nerve registers service instances in ZooKeeper:
/nerve/region:myregion
├── service_1
│   └── server_1_0000013614
├── service_2
│   └── server_1_0000000959
├── service_3
│   ├── server_1_0000002468
│   └── server_2_0000002467
[...]
ZooKeeper data for one instance:
{
  "host": "10.0.0.123",
  "port": 31337,
  "name": "server_1",
  "weight": 10
}
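The registration record is small enough to sketch; the fields mirror the ZooKeeper data shown above. Writing it as an ephemeral, sequential znode (which is where suffixes like `_0000013614` come from, and what a client library such as kazoo provides) means the entry disappears automatically if the registering host dies. The function name is illustrative.

```python
import json

# Build the JSON payload a registration agent like Nerve stores per instance.
def registration(host, port, name, weight=10):
    return json.dumps({"host": host, "port": port,
                       "name": name, "weight": weight})

payload = registration("10.0.0.123", 31337, "server_1")
```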
53. hacheck
Normally hacheck acts as a transparent proxy for healthchecks:
$ curl -s yocalhost:6666/http/service_1/1234/status
{
  "uptime": 5693819.315988064,
  "pid": 2595160,
  "host": "server_1",
  "version": "b6309e09d71da8f1e28213d251f7c"
}
$
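hacheck's core behavior can be sketched as a single decision: if an operator has marked the service down, answer the healthcheck with a failure; otherwise pass it through to the real service. The `down_states` store stands in for hacheck's actual state, and the names are hypothetical.

```python
# Simplified model of a healthcheck proxy with an operator override.
def check(service, down_states, proxy_healthcheck):
    if service in down_states:
        since, who = down_states[service]
        return 503, f"Service {service} in down state since {since}: {who}"
    # Otherwise, transparently proxy to the service's own healthcheck.
    return proxy_healthcheck(service)

down = {"service_1": (1443217910, "krall")}
status, body = check("service_1", down, lambda s: (200, "OK"))
# → (503, 'Service service_1 in down state since 1443217910: krall')
```

This is what makes `hadown` (next slide) a safe pre-shutdown step: load balancers see the instance as unhealthy and drain traffic before the process actually stops.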
54. hacheck
Can also force healthchecks to fail before we shut down a service:
$ hadown service_1
$ curl -s yocalhost:6666/http/service_1/1234/status
Service service_1 in down state since 1443217910: krall
$
57. HAProxy
● By default, binds to 0.0.0.0
● Bind only to yocalhost on public-facing servers
● Gives us goodies for all clients:
  ○ Redispatch on connection failure
  ○ Easy request logging
  ○ Rock-solid load balancing
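A minimal illustrative `haproxy.cfg` fragment showing those three goodies together; the listener name, address, port, and backend server are made up, and a real SmartStack-generated config would be more elaborate.

```
listen service_1
    bind 169.254.255.254:20001   # yocalhost only, not 0.0.0.0
    mode http
    balance roundrobin           # load balancing across instances
    option redispatch            # retry a failed connection on another server
    option httplog               # per-request logging
    server server_1 10.0.0.123:31337 check
```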
58. yocalhost
● One HAProxy per host
● What address should HAProxy bind to?
● 127.0.0.1 is per-container
● Add a loopback address to the host: 169.254.255.254
● This also works on servers without Docker
59. yocalhost
[Diagram: each Docker container has its own lo (127.0.0.1) and an eth0 in 169.254.0.0/16 (e.g. 169.254.14.17, 169.254.14.18) attached to the host's docker0 bridge (169.254.1.1). The host's eth0 is 10.1.2.3, and HAProxy listens on the host loopback alias lo:0 at 169.254.255.254, reachable from every container.]
63. Monitoring a PaaS is different
● Things can change frequently
  ○ Which boxes run which services?
  ○ What services even exist?
● Traditional "host X runs service Y" checks don't work anymore.
64. Monitor the invariants
● N copies of a service are running
● Marathon is running on hosts X, Y, Z
● All nodes are running mesos-slave, synapse, nerve, and docker
● Cron jobs have succeeded recently
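An invariant-style check can be sketched as a pure function: instead of "host X runs service Y", assert that the right *number* of copies is running somewhere on the cluster. Names and statuses are illustrative.

```python
# Check one invariant: at least `expected` copies of a service are running.
# Returns a (status, message) pair in the spirit of Nagios/Sensu checks.
def check_instance_count(expected, running):
    if running < expected:
        return ("CRITICAL", f"{running}/{expected} copies running")
    return ("OK", f"{running}/{expected} copies running")

print(check_instance_count(12, 11))  # ('CRITICAL', '11/12 copies running')
```

Because the check only cares about counts, it keeps working as the scheduler moves instances between hosts.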
65. Sensu monitoring
● Decentralized checking
● Clients execute checks and put results on a message queue
● Sensu servers handle results from the queue and route them to email, PagerDuty, JIRA, etc.
66. We can send our own events
try:
    something_that_might_fail()
except Exception:
    send_failure_event()
else:
    send_success_event()
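A hedged sketch of what sending such an event can look like: a Sensu Core client accepts check results as JSON on a local input socket (TCP port 3030 by default), with status 0 meaning OK and 2 meaning critical. The event name here is made up.

```python
import json

# Build a check-result payload in the shape Sensu's client socket accepts.
def make_event(name, status, output):
    return json.dumps({"name": name, "status": status, "output": output})

# e.g. from the except/else branches above, write the payload to the local
# Sensu client:
#   socket.create_connection(("localhost", 3030)).sendall(
#       make_event("deploy_service_1", 2, "deploy failed").encode())
```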
70. Between infra components
The right abstractions can save you a lot of work if you need to swap components.
71. Iterative improvements find local optima
Sometimes you need to take bigger risks to get bigger rewards.
"Evolution versus Revolution"
72. What's next for PaaSTA?
● It's open source now!
● More polish, docs, examples
● Support more technologies
  ○ Chronos in-progress
  ○ Docker Swarm?
  ○ Kubernetes?