4. About Me
● Software engineer
○ Netflix Edge Engineering
○ Sun Microsystems + Oracle Corp.
○ Resource scheduling, stream processing, distributed systems
● Author of Fenzo scheduling library
5. Let’s address a few questions
● Why Apache Mesos?
● Why focus on scheduling?
● How to guarantee capacity for various apps?
● What’s needed from the container executor?
13. A few common themes
Large variation in peak to trough resource requirements
● Mantis: 2M - 8M events/sec
● Titus: 10s - 1000s of concurrent jobs
14. A few common themes
Heterogeneous mix of jobs and resources
Resource           Task request     Agent sizes
CPU                1 - 32 CPUs      8 - 32 CPUs
Memory             2 - 200+ GB      32 - 244 GB
Network bandwidth  10 - 1024 Mbps   1024 - 10240 Mbps
Resource affinity based on task type
Task locality
15. A few common themes
Jobs needing high availability of tasks across ephemeral cloud resources
[Diagram: a job with N tasks spread across Host1 (ec2 zone=d), Host2 (ec2 zone=e), and Host3 (ec2 zone=f)]
16. What kind of scheduler do I need?
The scheduler must achieve multiple scheduling objectives:
● Cluster wide optimizations: #servers, heterogeneous mix, security
● User centric optimizations: resource affinity, task locality
[Diagram: the scheduler produces assignments balancing both sets of objectives]
17. Functions of a framework
Framework components:
● API - domain specific
● Resource scheduling - potentially common
● Persistence - environment specific
18. NetflixOSS Fenzo scheduling library
https://github.com/Netflix/Fenzo
● Heterogeneous mix of task and resource sizes
● Autoscaling of Mesos agent clusters
● Customizable scheduling objectives
20. Fenzo scheduling strategy
For each task:
    On each host:
        Validate hard constraints
        Eval fitness and soft constraints
    Until fitness is “good enough”, and a minimum #hosts evaluated
Fitness functions and constraint evaluators are plugins.
Sample plugins: bin packing fitness function, and soft/hard constraint evaluators for resource affinity and task locality
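The loop above can be sketched in Python (illustrative only; Fenzo itself is a Java library, and the names `hard_constraints`, `fitness`, `good_enough`, and `min_hosts_to_eval` here are assumptions, not its API):

```python
# Sketch of the per-task evaluation loop: hosts are filtered by hard
# constraints, scored by a pluggable fitness function, and the search
# stops early once fitness is "good enough" and enough hosts were seen.
def schedule(tasks, hosts, hard_constraints, fitness,
             good_enough=0.9, min_hosts_to_eval=5):
    assignments = {}
    for task in tasks:
        best_host, best_fit = None, -1.0
        evaluated = 0
        for host in hosts:
            # Every hard constraint must pass for the host to be a candidate.
            if not all(c(task, host) for c in hard_constraints):
                continue
            fit = fitness(task, host)  # 0.0 (worst) .. 1.0 (best)
            evaluated += 1
            if fit > best_fit:
                best_host, best_fit = host, fit
            # Early exit: fitness is good enough AND a minimum number
            # of hosts has been evaluated.
            if best_fit >= good_enough and evaluated >= min_hosts_to_eval:
                break
        if best_host is not None:
            assignments[task] = best_host
    return assignments
```

Soft constraints would fold into the `fitness` score rather than filtering hosts out, which is why they appear alongside fitness evaluation on the slide.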
21. Fenzo agent cluster autoscaling
● Scaling up is relatively easy
● Scaling down requires bin packing
○ By resource footprint, runtime, etc.
[Diagram: tasks packed onto Hosts 1-2 with Hosts 3-4 left idle, vs. tasks spread thinly across all of Hosts 1-4]
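Scale-down depends on tasks being packed onto few hosts so that idle hosts emerge and can be terminated. A bin packing fitness function in that spirit might look like the following sketch (the field names `used_cpu`/`total_cpu` are illustrative, not Fenzo's API):

```python
# Sketch of a CPU bin packing fitness function: the score is the
# fraction of the host's CPUs in use after placing the task, so
# already-busy hosts win and empty hosts drain for scale-down.
def cpu_bin_packing_fitness(task_cpu, host):
    used_after = host["used_cpu"] + task_cpu
    if used_after > host["total_cpu"]:
        return 0.0  # task does not fit: worst possible fitness
    return used_after / host["total_cpu"]

def pick_host(task_cpu, hosts):
    # Choose the host that ends up fullest (ties keep the first seen).
    return max(hosts, key=lambda h: cpu_bin_packing_fitness(task_cpu, h))
```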
24. Capacity guarantees
Guarantee capacity for timely job starts
(Mesos support for quotas, etc. is evolving)
Agreed upon: generally, optimize throughput for batch jobs and start latency for service jobs
25. Capacity guarantees
Guarantee capacity for timely job starts
(Mesos support for quotas, etc. is evolving)
Agreed upon: generally, optimize throughput for batch jobs and start latency for service jobs
But some service style jobs may be less important.
Categorize by expected behavior instead: Critical versus Flex (flexible scheduling requirements)
30. Capacity guarantees: hybrid view
Head of line blocking: what if a ‘Critical’ task isn’t satisfied? Or it isn’t ready?
[Diagram: dynamic scheduling across Critical and Flex tiers]
31. Capacity guarantees: hybrid view
Head of line blocking is addressed with automatic advance reservation for task T2.
[Diagram: on HostA’s timeline, dynamic scheduling runs T1 while capacity is reserved in advance for T2, across Critical and Flex tiers]
32. Capacity guarantees: hybrid view
Automatic advance reservation for task T2, but the capacity held for T2 sits idle until it starts: underutilization.
[Diagram: HostA’s timeline with T1 running and reserved capacity idle ahead of T2]
33. Capacity guarantees: hybrid view
Automatic advance reservation for task T2; back filling task T3 into the otherwise idle reserved capacity improves utilization.
[Diagram: HostA’s timeline with T1 running, T3 back filled, and T2’s reservation still honored]
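The reservation-plus-back-filling idea can be sketched as a simple timeline check (all names are illustrative; this is not code from Fenzo or Titus): a queued task is back filled only if its estimated runtime ends before the reservation starts.

```python
# Sketch of back filling against an advance reservation: a candidate
# task may use the idle window only if its estimated runtime completes
# before the reserved task's start time, so the reservation is honored.
def can_backfill(now, task_runtime, reservation_start):
    return now + task_runtime <= reservation_start

def backfill(now, queue, reservation_start):
    # Pick queued (name, est_runtime) tasks that fit before the reservation.
    return [name for name, runtime in queue
            if can_backfill(now, runtime, reservation_start)]
```

In practice this depends on runtime estimates; a task that overruns its estimate would still collide with the reservation, which is one reason preemption (next slides) remains necessary.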
35. Capacity guarantees: “utilization”
What if ‘Critical’ is under utilizing its guaranteed capacity?
Let Flex use it, but preempt Flex tasks when Critical needs the capacity back.
“Fairness” across Critical and Flex via composable functions
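One way to read “fairness via composable functions” is that several scoring functions are combined, e.g. by weighted average, into a single fairness score per tier; the sketch below is an assumption about the shape of such composition, not the actual implementation.

```python
# Illustrative composition of fairness functions: each function maps a
# tier's usage snapshot to a score in 0..1, and a weighted average
# combines them into one score used to decide which tier launches next.
def compose(*weighted_fns):
    total = sum(w for w, _ in weighted_fns)
    def fairness(usage):
        return sum(w * fn(usage) for w, fn in weighted_fns) / total
    return fairness
```

For example, composing “share of guarantee used” with “queue pressure” lets operators tune how aggressively Flex borrows Critical’s idle capacity without hard-coding one policy.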
38. Container executor
Augment the missing pieces for multi-tenant operation:
● IP per container
● Security: Security Groups, IAM roles
● Isolation for networking b/w, disk I/O
39. Plumbing VPC Networking into Docker
[Diagram: a Titus agent EC2 VM with three ENIs - eni0 (eth0, SG=Titus Agent), eni1 (eth1, SecGrp=X), and eni2 (eth2, SG=Y) - holding IP 1 - IP 3. Task 0 needs no IP and shares docker0 behind IPTables NAT; Tasks 1-3 each run an app in a pod root network namespace with a veth<id> pair, a VPC IP, and their security group (X or Y). Linux policy based routing steers container traffic to the right ENI, and an EC2 metadata proxy serves 169.254.169.254 to the containers.]