XCPU3: Workload Distribution & Aggregation
Pravin Shinde & Eric Van Hensbergen
Problem

• Workload distribution hasn't evolved much from when we were batch scheduling tasks to single machines.
• Today's cluster-based schedulers:
  • Not interactive.
  • Not resilient to failure.
  • Difficult for existing tasks to dynamically grow or shrink the resources allocated to them.
  • Difficult to deploy & administer.
  • Based on middleware instead of integrated with the underlying operating system.
  • In many cases tightly bound to the underlying runtime or language.
  • Unlikely to function at exascale.

Related Work

• System V UNIX: Provided synthetic file system access to process information, which was later extended to a hierarchy in Linux procfs.
• Plan 9 from Bell Labs: Extended basic procfs concepts by also enabling control and debug interfaces. The nature of the Plan 9 distributed namespace also made these process interfaces available over the network.
• XCPU (LANL): Built an application-layer file system for UNIX systems using the Plan 9 model. XCPU extended previous work by allowing process creation to occur via the file system and allowed for execution and coordination of groups of processes on remote systems.
Our Approach
• Establish a hierarchical namespace of cluster services.
• Automount remote servers based on reference (ie. cd /csrv/criswell).
• Export local services for use elsewhere within the network.

[Figure: the /csrv namespace hierarchy — each node (c1–c4, l1, l2, t, I1, I2, L) exports its /local service tree into /csrv.]

Environment Syntax

• key=value
• OBJTYPE=386
• SYSTYPE=Linux
• etc.

Name Space File Syntax

• bind [-abcC] new old: Bind new on old.
• mount [-abcC] servename old [spec]: Mount servename on old.
• import [-abc] host [remotepath] mountpoint: Import remotepath from machine server and attach it to mountpoint.
• cd dir: Change the working directory to dir.
• unmount [new] old: Unmount new from old, or everything mounted on old if new is missing.
• clear: Clear the name space with rfork(RFCNAMEG).
• . path: Execute the namespace file path. Note that path must be present in the name space being built.

Per-host service files:

arch - architecture & platform (ie. Linux i386)
env - default environment variables for host
ns - default name space for host
fs - access to host file system
net - access to host network (i.e. Plan 9 devip)
status - load average, running jobs, available memory
clone - open to establish new session
/0, /1, ... /n - session subdirectories
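A per-host default name space could be described with the file syntax above. The following fragment is a hypothetical sketch (the host name criswell comes from the poster's example; the specific paths are illustrative, not taken from the poster):

```
# hypothetical default name space for a compute node
mount -a /srv/csrv /csrv            # attach the cluster service tree
bind -b /csrv/criswell/fs/bin /bin  # union a remote host's binaries into /bin
import criswell /net /net.alt       # import criswell's network stack
cd /csrv/local                      # start in the local service directory
```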
Control File Syntax
• reserve [n] [os] [arch] - reserve a (number of) resources with os and arch specification
• dir [wdir] - set the working directory for the task
• exec args ... - spawn a host process to run the command with arguments as given
• kill - kill the host command immediately
• killonclose - set the device to kill the host command when the ctl file is closed
• nice [n] - set scheduling priority of the host command
• splice [path] - splice the standard output to [path] (on executing host)

Per-session task files:

ctl - reservation and task control
env - environment variables for task
ns - name space for task
args - task arguments
wait - blocks until all threads complete
status - current task status (reserved, running, etc.)
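The control verbs above are plain text written into a session's ctl file. The sketch below shows the idiom; the verbs (reserve, dir, exec) are from the poster, but the ctl file would really live under a /csrv session directory, so a scratch directory stands in here to keep the sketch runnable:

```shell
# A session's ctl file is a text command channel; in the real system
# this path would be /csrv/<host>/<session>/ctl.
SESSION=$(mktemp -d)             # scratch stand-in for a session directory
{
  echo 'reserve 4 Linux 386'     # reserve four Linux/386 resources
  echo 'dir /tmp'                # set the task working directory
  echo 'exec date'               # spawn the command on the reservation
} >> "$SESSION/ctl"
cat "$SESSION/ctl"
```

Each line is one command, so a task can be reserved, configured, and launched from any tool that can write to a file.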
stdin - aggregate standard input for task
stdout - aggregate standard output for task
stdio - combined standard I/O for task
/0, ... /n - component thread session subdirectories

Per-thread files:

ctl - thread control
env - environment variables for thread
ns - name space for thread
args - thread arguments
wait - blocks until thread completes
status - current thread status (reserved, running, etc.)
stdin - standard input for thread
stdout - standard output for thread
stdio - standard I/O for thread

[Figure: service composition diagrams — local service, proxy service, and aggregate service over remote services. Panels: Desktop Extension; PUSH Pipeline Model; Aggregation via Dynamic Namespace; Scaling and Reliability; Distributed Service Model.]
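The clone/session pattern behind the listings above can be sketched in shell. The layout mimics the poster's /csrv tree, with a scratch directory standing in for the synthetic file system; the "server side" lines fake what the file server would do when clone is opened:

```shell
# Sketch of the clone idiom: opening clone yields a fresh session id,
# and the session is then driven through its numbered subdirectory.
CSRV=$(mktemp -d)            # stand-in for /csrv/local
mkdir "$CSRV/0"              # server side: session directory 0 appears
echo 0 > "$CSRV/clone"       # server side: clone reads back the new id
sid=$(cat "$CSRV/clone")     # client side: open clone, learn session id
echo 'exec hostname' > "$CSRV/$sid/ctl"   # control the new session
cat "$CSRV/$sid/ctl"
```

Because each open of clone yields a distinct session directory, many clients can hold independent reservations on the same host without coordinating.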
This project is supported in part by the U.S. Department of Energy under Award Number DE-FG02-08ER25851.
For More Information: http://www.research.ibm.com/austin • http://www.research.ibm.com/hare