DockerCon EU 2015: Placing a container on a train at 200mph
Presented by Casper S. Jensen, Software Engineer, Uber
At Uber, we've been introducing Docker to give service owners more control over their environments. However, everything at Uber moves very fast, so we had to fit Docker into the existing infrastructure in a way that let services migrate seamlessly, without any interruptions. In this talk we cover the challenges we faced while doing this: handling both non-Docker and Docker builds, image replication, integration with our deployment systems, and other challenges of deploying Docker at scale.
1. Placing a container on a train at 200mph
Casper S. Jensen
Software Engineer, Uber
2. About Me
● Joined Uber in January 2015,
Compute Platform
Aarhus office, Denmark
● PhD, CS
On a completely unrelated topic
● Linux aficionado
● Docker “user” since February
10. Not that hard...
You just have to handle
● 24/7 availability across the globe
● Very different markets
● 1000s of developers and teams
● Adding new features like there’s no tomorrow
UberPOOL, UberKITTEN, UberICECREAM, UberEATS,
UberWHATEVERYOUCANIMAGINE
● Hypergrowth in all dimensions
● Datacenters, servers, infrastructure, etc.
Basically, you have to make magic happen every time a user
opens the application
12. A fair amount of frustration
1) Write service RFC
2) Wait for feedback
3) Do all necessary scaffolding by hand
4) Start developing your service
5) Wait for infra team to write service scaffolding
6) Wait for IT to allocate servers
7) Wait for infra team to provision servers
8) Deploy to development servers and test
9) Deploy to production
10) Monitor and iterate
Steps 5–7 could take days or weeks...
13. It's just not scalable
But you have to start somewhere
14. “Make it easier for service
owners to manage their local
service environments.”
—Internal e-mail, February 2015
15.
16. New development process
1) Write service RFC
2) Wait for feedback
3) Do all necessary scaffolding using tools
4) Start developing your service
5) Deploy to development servers and test
6) Deploy to production
7) Monitor and iterate
19. All the things you did not consider
● Routing
● Dynamic service discovery
● Deployment
● Placement engine
● Logging and tracing
● Dual build environments
● Handling of secrets
● Security updates
● Private repositories
● Replicating images across multiple datacenters
Also, how much freedom do you really want to give your developers?
21. uDeploy
● Rolling upgrades
● Automatic rollbacks on failure
● Health checks, stats, exceptions
○ Load and system tests
● Service building
● Build replication
● 4,000+ upgrades/week
● 3,000+ builds/week
● 300+ rollbacks/week
● 600+ managed services
Our in-house deployment/cluster
management system
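The rolling-upgrade-with-rollback loop at uDeploy's core can be sketched roughly as follows. This is a minimal, hypothetical model (all function names are illustrative, not Uber's actual API); the real system also tracks stats and exceptions across many hosts in parallel:

```python
def rolling_upgrade(hosts, new_build, old_build, deploy, health_check):
    """Upgrade hosts one at a time; roll everything back on the first failure."""
    upgraded = []
    for host in hosts:
        deploy(host, new_build)
        if health_check(host):
            upgraded.append(host)
        else:
            # Automatic rollback: restore the old build on every touched host.
            for h in upgraded + [host]:
                deploy(h, old_build)
            return False
    return True
```

With 300+ rollbacks a week, the key property is that a failed health check leaves the fleet exactly where it started rather than half-upgraded.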
22. Moving to Docker with zero downtime
Build multiplexing
We want to keep on trucking while migrating to Docker
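Build multiplexing amounts to routing each service to either the legacy build path or the Docker path based on its own configuration, so both can coexist during the migration. A toy sketch (the flag name and function are illustrative assumptions, not uDeploy internals):

```python
def select_build(service_config):
    """Pick a build pipeline per service; Docker and legacy builds coexist."""
    if service_config.get("docker_enabled"):
        return "docker"
    return "legacy"

def build_all(services):
    # Multiplex: each service keeps shipping on whichever pipeline it is on,
    # and flipping one flag migrates it without touching the others.
    return {name: select_build(cfg) for name, cfg in services.items()}
```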
23. Build process & scaffolding
Declarative build scripts
● Service configuration in git
● Preset service frameworks
● Many options
● Generator creating
○ Dockerfile
○ Health checks
○ Entry point scripts inside container
○ In general, all glue between host and service
● Possible to supply custom Dockerfile
service_name: test-uber-service
owning_team: udeploy
backend_port: 123
frontend_port: 456
service_type: clay_wheel
clay_wheel:
  celeries:
    - queue: test-uber-service
  has_celerybeat: true
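A generator along these lines could turn the declarative config above into a Dockerfile. This is a toy sketch under assumed conventions (the base-image naming and entry-point path are invented for illustration); the real generator also emits health checks and the host-to-service glue:

```python
def generate_dockerfile(config):
    """Render a minimal Dockerfile from a declarative service config."""
    lines = [
        # Hypothetical per-framework base image, keyed on service_type.
        f"FROM uber/{config['service_type']}:latest",
        f"LABEL service={config['service_name']} team={config['owning_team']}",
        f"EXPOSE {config['backend_port']}",
        "COPY . /app",
        'CMD ["/app/entrypoint.sh"]',
    ]
    return "\n".join(lines)
```

The point of generating rather than hand-writing the Dockerfile is that service owners declare intent while the platform keeps control of the glue, yet a custom Dockerfile remains an escape hatch.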
24. Image replication
● Multiple datacenters
● Images must be stored within DCs
● Build once, replicate everywhere
● Traffic restrictions: push allowed, but not pull
Current setup
● Stock docker registry
● File back-end
● Docker-mover
● Syncing images using pull/push
● Use notification API to speed up replication
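The pull/push sync that docker-mover performs can be modelled with two registries, represented in-memory here purely for illustration; the real mover talks to stock Docker registries and uses the notification API so it reacts to new pushes instead of only polling:

```python
def sync_images(source, destination):
    """Replicate any image missing from the destination registry.

    Models the build-once-replicate-everywhere flow: pull each image
    from the source registry and push it to the destination.
    """
    copied = []
    for image, blob in source.items():
        if image not in destination:
            destination[image] = blob  # pull from source, push to destination
            copied.append(image)
    return copied
```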
25. Service discovery & routing
● Previously, we used HAProxy + scripts to do this
● Now, we use Hyperbahn + TChannel RPC
https://github.com/uber/{hyperbahn|tchannel}
○ Used for Docker and legacy services
○ Required in order to move containers around in seconds
○ Dynamic routing, circuit breaking, retries, rate limiting,
load balancing
○ Completely dynamic, no fixed ports
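Hyperbahn/TChannel take retries and circuit breaking off each caller's plate; the core retry idea can be sketched as below. This is a simplified illustration of the pattern, not TChannel's actual API:

```python
def call_with_retries(request, attempts=3):
    """Retry a failing RPC a bounded number of times before giving up."""
    last_error = None
    for _ in range(attempts):
        try:
            return request()
        except ConnectionError as err:
            last_error = err  # transient failure: try the next attempt
    raise last_error
```

Bounding the attempts matters: unbounded retries against a struggling backend are exactly what circuit breaking exists to prevent.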
27. The good & the bad
The good
● Remove team dependencies
● More freedom
● Not tied to specific frameworks or versions (hi, Python 3)
● Easy to experiment with new technologies
The bad
● Too much freedom
● Non-trivial integrating with a large running system
● Infrastructure must be dynamic throughout
● Containers are only a minor part of the infrastructure, don't forget that
28. Current and future wins
● Today, 30% of all services run in Docker
● Soon-ish, 100%
● Great improvements in provisioning time (done)
● Framework and service owners can manage their own
environment (done)
● Faster and automatic scaling of capacity (in progress)