1. Introduction Off-Line Problem On-Line Problem Summary Appendix
Dynamic Fractional Resource Scheduling
for Cluster Platforms
Mark Stillwell
Department of Information and Computer Sciences
University of Hawai’i at M¯anoa
Achievement Rewards for College Scientists
2010 Scholarship Awards Program
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
2. Introduction Off-Line Problem On-Line Problem Summary Appendix
Clusters
Definition
A cluster is a group of independent computers, or nodes,
working together closely, usually connected by a high-speed
network.
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
3. Introduction Off-Line Problem On-Line Problem Summary Appendix
Jobs
The system can accept user requests to run jobs or the
administrator can instantiate jobs directly
Running jobs are made up of nearly identical tasks
The number of tasks is specified by user/administrator
Tasks can block while communicating with each other
The assignment of resources to tasks is called scheduling
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
4. Introduction Off-Line Problem On-Line Problem Summary Appendix
Current Approaches
Service Hosting (Off-Line scheduling)
traditionally, dedicated machines
current interest in server consolidation
few good theoretical models
heavy “engineering” bias
High-Performance Computing (On-Line Scheduling)
Usually First-Come-First-Served (FCFS) with backfilling
Backfilling needs (unreliable) compute time estimates
Unbounded wait times
Inefficient use of nodes/resources
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
5. Introduction Off-Line Problem On-Line Problem Summary Appendix
Our Proposal
Use virtual machine technology.
Multiple tasks on one node
Performance isolation
Sharing of fractional resources
Define a run-time computable metric that captures notions
of performance and fairness.
Design heuristics that allocate resources to jobs while
explicitly trying to achieve high ratings by our metric.
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
6. Introduction Off-Line Problem On-Line Problem Summary Appendix
Requirements, Needs, and Yield
Tasks have memory requirements and CPU needs
All tasks of a job have the same requirements and needs
For a task to be placed on a node there must be memory
available at least equal to its requirements
A task can be allocated less CPU than its need, and the
ratio of the allocation to the need is the yield
All tasks of a job must have the same yield, so we can also
speak of the yield of a job
The yield of a job gives its performance relative to if it were
run on a dedicated system
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
7. Introduction Off-Line Problem On-Line Problem Summary Appendix
Off-Line Problem Assumptions
Steady-state execution with infinite jobs
Models an ideal service hosting environment
Makes the problem more tractable [Marchal et al., 2006]
Good when job duration longer than schedule time
Jobs are serial (single task)
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
8. Introduction Off-Line Problem On-Line Problem Summary Appendix
Objective Function
The performance of a job is correlated to its yield
Maximizing the average or sum of the yields may be unfair
to some jobs
Instead, we seek to maximize the minimum yield
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
9. Introduction Off-Line Problem On-Line Problem Summary Appendix
Task Placement Heuristics
Greedy Task Placement – Incremental, puts each task on
the node with the lowest computational load on which it
can fit without violating memory constraints
Randomized Rounding Task Placement – Based on
relaxing an MILP to an LP and rounding probabilistically
MCB Task Placement – Global, iteratively applies
multi-capacity (vector) bin-packing heuristics during a
binary search for the maximized minimum yield
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
10. Introduction Off-Line Problem On-Line Problem Summary Appendix
Large Problem Set: Minimum Yield vs. Free Memory
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0.3
0.35
0.4
0.45
0.5
0.55
Slack
MinimumYield
bound
MCB8
GR
GB
SG
SGB
RRND
RRNZ
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
11. Introduction Off-Line Problem On-Line Problem Summary Appendix
Conclusions
The problem is tractable
Multi-capacity bin packing algorithms perform well
Greedy algorithms perform okay, and are very fast
Randomized rounding approaches are not very promising
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
12. Introduction Off-Line Problem On-Line Problem Summary Appendix
On-Line Problem Assumptions
Job arrival/completion times are not known in advance
Jobs are temporary
The user wants a final result
Quick turnaround relative to runtime is desired
Jobs are not interactive
So can wait until resources are available to start
We avoid the use of runtime estimates
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
13. Introduction Off-Line Problem On-Line Problem Summary Appendix
Stretch
Our goal: minimize maximum stretch (aka slowdown)
Stretch: the time a job spends in the system divided by the
time that would be spent in a dedicated
system [Bender et al., 1998]
Popular to quantify schedule quality post-mortem
Not generally used to make scheduling decisions
Runtime computation requires (unreliable) user estimates.
Minimizing average stretch prone to starvation
Minimizing maximum stretch captures notions of both
performance and fairness.
Our approach: try to maximize minimum yield
Similar, but not the same, as minimizing maximum stretch
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
14. Introduction Off-Line Problem On-Line Problem Summary Appendix
Max Stretch Degradation vs. Load, no migration cost
1
10
100
1000
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
StretchDegradationFactor
Load
EASY
FCFS
GREEDY
GREEDYP
GREEDYPM
DynMCB8
MCB8P 600
MCB8PG 600
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
15. Introduction Off-Line Problem On-Line Problem Summary Appendix
Conclusions
DFRS approaches can significantly outperform traditional
approaches
Aggressive repacking can lead to much better resource
allocations
But also to heavy migration costs
Greedy migration is not that useful
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
16. Introduction Off-Line Problem On-Line Problem Summary Appendix
Summary
We have proposed a novel approach to job scheduling on
clusters, Dynamic Fractional Resource Scheduling, that
makes use of modern virtual machine technology and
seeks to optimize a runtime-computable, user-centric
measure of performance called the minimum yield
Multi-capacity bin packing heuristics can be used to find
good solutions
Our approach avoids the use of unreliable runtime
estimates
This approach has the potential to lead to
order-of-magnitude improvements in performance over
current technology
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
17. Introduction Off-Line Problem On-Line Problem Summary Appendix
References I
Bender, M. A., Chakrabarti, S., and Muthukrishnan, S.
(1998).
Flow and Stretch Metrics for Scheduling Continuous Job
Streams.
In Proc. of the 9th ACM-SIAM Symp. On Discrete
Algorithms, pages 270–279.
Marchal, L., Yang, Y., Casanova, H., and Robert, Y. (2006).
Steady-state scheduling of multiple divisible load
applications on wide-area distributed computing platforms.
Intl. J. of High Performance Computing Applications,
20(3):365–381.
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
18. Introduction Off-Line Problem On-Line Problem Summary Appendix
References II
Stillwell, M., Shanzenbach, D., Vivien, F., and Casanova,
H. (2009).
Resource Allocation using Virtual Clusters.
In CCGrid, pages 260–267. IEEE.
Stillwell, M., Vivien, F., and Casanova, H. (2010).
Dynamic fractional resource scheduling for HPC workloads.
In IPDPS.
to appear.
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters