This talk explores a scalable and cost-efficient way of deploying and running microservices workloads using quality-of-service scheduling on top of Amazon EC2 Container Service. Running services in a pay-as-you-go fashion will soon be as much of a reality as today's on-demand compute.
Microservices and elastic resource pools with Amazon EC2 Container Service
1. Microservices and elastic resource pools with ECS
Boyan Dimitrov,
Platform Automation Lead @ Hailo
@nathariel
2.
3. Microservices intro
• Small, self-contained units of execution with a well-defined API
• Built around business capabilities or domain objects
• Responsible for one thing and one thing only
• Fully automated lifecycle
Each service (at Hailo) gets for free:
• Service-to-service communication libs
• Discovery
• Configuration
• A/B testing capabilities
• Monitoring & instrumentation
• … and much more
(Diagram: a monolith app contrasted with a set of services A–E.)
4. What do we have
• Microservices ecosystem based on Go
• Designed specifically for the cloud – different building blocks and components will constantly be in flux, broken or unavailable
• 1000+ AWS instances spanning multiple regions
• 200+ services in production
6. Service deployment specifics
Main goals: Reliability, Ease of Use, Resource Efficiency
• Each service is decoupled from the rest and deployed individually
• We run multiple services on the same instance
• We rely on auto scaling groups for organizing and scaling our workload
• We use static partitioning to match a service to an auto scaling group
• An automated deployment system takes care of all service lifecycle details
7. Deployment overview and our journey towards containers
(Diagram: the CI pipeline publishes builds to Amazon S3 and a Docker registry; the provisioning manager and the per-instance provisioning service place services onto instances in auto scaling groups, either as a plain process or as a container.)
8. How hard is it to deploy a service?
A developer supplies three inputs:
• service name
• version
• auto scaling group
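As a rough sketch of what that amounts to (the type and the field values below are hypothetical, not Hailo's actual tooling), the whole deploy request fits in three fields:

package deploy

// Request captures everything a developer has to supply to ship a service.
// All values in the example are made up for illustration.
type Request struct {
	Service          string // service name
	Version          string // build identifier produced by the CI pipeline
	AutoScalingGroup string // the auto scaling group the service should run in
}

var example = Request{
	Service:          "com.hailo.service.foo",
	Version:          "20150708.1",
	AutoScalingGroup: "services-default",
}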
9. Is this good enough?
Main goals: Reliability, Ease of Use, Resource Efficiency
As a developer:
• service name
• version
• auto scaling group – How do I figure this one out? Would my service live there forever? What if my team owns 20+ services?
10. What about resource efficiency?
Main goals: Reliability, Ease of Use, Resource Efficiency
(Diagram: two auto scaling groups spread across AZs eu-west-1a, eu-west-1b and eu-west-1c – Auto Scaling Group A running at 35% utilization, Auto Scaling Group B at 85%.)
11. Challenges
• Our overall utilization across the services' auto scaling groups is between 25% and 50%
• Performance of individual services is way more complex than simple CPU and memory calculations. Accumulated interference on the instance needs to be accounted for
• Static partitioning of services is hard and does not scale
• Our developers should not care about service placement or infrastructure specifics!
12. So what do we want?
Main goals: Reliability, Ease of Use, Resource Efficiency
An elastic resource pool – one word, such a difference!
(Diagram: a single pool of instances spanning eu-west-1a, eu-west-1b and eu-west-1c, running at 75–80% utilization.)
13. Our solution – cluster management on top of an elastic resource pool
(Diagram: a QoS scheduler sits on top of ECS as the cluster manager and AWS as the cloud provider; every instance in the elastic resource pool across eu-west-1a, eu-west-1b and eu-west-1c runs the ECS agent.)
14. Why ECS?
• It is a managed service!
• It is great for storing and enforcing task state
• Designed with custom schedulers in mind
• The agent code is available on a public GitHub repo and … it is in Go!
• Easy to integrate with other AWS services
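The integration point for a custom scheduler is small: the scheduler decides where a task should run and asks ECS to start it there, while ECS stores and enforces the resulting task state. A minimal sketch using the AWS SDK for Go (cluster name, task definition and instance ARN below are placeholders, not values from the talk):

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ecs"
)

// startOnInstance asks ECS to run a task definition on the container
// instance our scheduler has already picked.
func startOnInstance(svc *ecs.ECS, cluster, taskDef, instanceARN string) error {
	out, err := svc.StartTask(&ecs.StartTaskInput{
		Cluster:            aws.String(cluster),
		TaskDefinition:     aws.String(taskDef),
		ContainerInstances: []*string{aws.String(instanceARN)},
	})
	if err != nil {
		return err
	}
	fmt.Printf("started %d task(s)\n", len(out.Tasks))
	return nil
}

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("eu-west-1")}))
	svc := ecs.New(sess)
	// All three arguments are placeholders for illustration only.
	if err := startOnInstance(svc, "elastic-pool", "foo-service:1",
		"arn:aws:ecs:eu-west-1:123456789012:container-instance/abc"); err != nil {
		log.Fatal(err)
	}
}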
15. Why build our own scheduler?
We want a cloud-native scheduler that is aware of the cloud specifics and our microservices ecosystem:
• Service priority
• Service-specific runtime metrics
• Interference
• Cloud awareness (availability zones, pool elasticity…)
Running services in a pay-as-you-go fashion will soon be as much of a reality as today's on-demand compute.
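To make the criteria above concrete, here is one way such a scheduler could rank candidate instances; the weighting is invented for illustration and is not Hailo's actual algorithm:

package scheduler

// Candidate is what the scheduler knows about one instance in the pool.
type Candidate struct {
	AZ           string  // availability zone, used for spreading
	FreeCPU      float64 // unreserved CPU share, 0..1
	FreeMemory   float64 // unreserved memory share, 0..1
	Interference float64 // estimated noisy-neighbour penalty, 0..1
}

// score ranks an instance for a given service; higher is better.
// azCount holds how many copies of the service already run in each AZ,
// so under-represented zones get a bonus.
func score(c Candidate, azCount map[string]int) float64 {
	s := c.FreeCPU + c.FreeMemory // prefer spare capacity
	s -= 2 * c.Interference       // penalise noisy instances heavily
	if azCount[c.AZ] == 0 {
		s += 0.5 // reward spreading across availability zones
	}
	return s
}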
16. Take Service Priority as an example:

{
  "service": "Foo",
  "minCPU": 10,
  "minMemory": 500,
  "minInstances": 3,
  "Priority": "Default"
}

{
  "service": "Baz",
  "minCPU": 50,
  "minMemory": 1500,
  "minInstances": 3,
  "Priority": "Critical"
}
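On the scheduler side, those definitions could be decoded into a small Go struct; a sketch whose field names simply mirror the JSON above:

package scheduler

// ServiceSpec carries the per-service scheduling hints shown above.
type ServiceSpec struct {
	Service      string `json:"service"`
	MinCPU       int    `json:"minCPU"`       // requested CPU
	MinMemory    int    `json:"minMemory"`    // requested memory
	MinInstances int    `json:"minInstances"` // minimum number of running copies
	Priority     string `json:"Priority"`     // e.g. "Default" or "Critical"
}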
17. Service criticality matters when resources are constrained
(Diagram: the resource pool shown over time t0–t3 – as the pool fills up, a lower-priority task is stopped (X) so that a critical service can start.)
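One possible way to express that rule in the scheduler (again a sketch, not Hailo's actual implementation): when a critical service cannot fit anywhere, evict the least critical running task and reschedule it later.

package scheduler

// Task is a running unit of a service together with its priority;
// a higher number means more critical.
type Task struct {
	Service  string
	Priority int
}

// victim returns the least critical task on an instance that an incoming,
// more critical task may displace, or nil if nothing there is less
// important than the newcomer.
func victim(running []Task, incoming Task) *Task {
	var v *Task
	for i := range running {
		t := &running[i]
		if t.Priority >= incoming.Priority {
			continue // never evict something at least as critical
		}
		if v == nil || t.Priority < v.Priority {
			v = t
		}
	}
	return v
}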
We built our custom provisioning system and we started by running a number of services on a single instance
Initially we were running services as normal processes on the instance, but this started causing noisy-neighbour problems
Several months ago we gradually started moving to containers, aiming for isolation and resource-control capabilities
Using static partitioning leaves a lot of unused resources
When running many services on the same instance, using generic auto scaling group triggers is inefficient
Running a lot of services together, containers or not, creates interference
We want an elastic resource pool where services are scheduled on an as-needed basis
We don't want to manage services manually; we want to leave that to a smart scheduler