1. ISSGC'09 – 7th International Summer School on Grid Computing
UNICORE Day at ISSGC'09
Presenters: Rebecca Breu, Bastian Demuth, Mathilde Romberg
Jülich Supercomputing Centre (JSC)
7 July 2009
Member of the Helmholtz Association
2. Agenda
9:00 – 10:30: Principles of Job Submission and Execution Management
  Set the scene
11:00 – 12:30: UNICORE – Architecture and Components
  Technical overview of how UNICORE works and how it is used
14:00 – 15:30: UNICORE Basic Practical
  Practical: submitting jobs with the command-line client
16:00 – 17:30: UNICORE Workflow Practical
  Practical: submitting workflows with the graphical client
18:00 – 19:00: UNICORE: An Application
  Example applications using UNICORE
3. ISSGC'09 – 7th International Summer School on Grid Computing
Session 9: Principles of Job Submission and Execution Management
Authors/Presenters: Achim Streit, Mathilde Romberg
Jülich Supercomputing Centre (JSC)
7 July 2009
Member of the Helmholtz Association
5. Jobs
Job
Some work to be executed
Requires CPU and memory
Possibly accesses additional resources,
e.g., storage, devices, services
Job scheduling
Policy for assigning jobs to resources
Courtesy of Prof. Felix Wolf, RWTH Aachen
6. Resources
Compute: memory, central processing units (CPUs), nodes, threads/tasks
Data: size, transfer rate
Network: bandwidths
...
7. How to Differentiate Compute Resources?
By number of CPUs:
Single processor
Multiprocessor
Multiprocessor systems can be grouped into
Shared memory
Equal access time to memory from each processor
Distributed memory
Each CPU has its own memory and I/O
Different address spaces
Distributed shared memory
Shared address space
Access time depends on location of data in memory
8. Multiprocessor Systems – Examples
SMP (Symmetric (shared-memory) MultiProcessors): IBM Power 4/5/6 nodes, multi-core chips (Jülich)
MPP (Massively Parallel Processor): IBM Blue Gene/P (Jülich), Cray XT4 (Helsinki)
NUMA (Non-Uniform Memory Access): SGI Altix (Munich)
Cluster: Mare Nostrum, IBM Power4/5/6 system (Barcelona); Tera-10, self-built cluster
10. Job Scheduling
Policy for assigning jobs to resources
Inputs are
Set of jobs with requirements
Set of resources
Criteria for assignment
Fairness
Efficiency
Minimize response time (interactive users)
and turnaround time (batch jobs)
Maximize throughput
Courtesy of Prof. Felix Wolf, RWTH Aachen
11. Usage of Multiprocessor Systems
Typically the user/job resource demands are greater than
the available resources → users/jobs compete
Typically resource requirements differ from one user
(or application) to the other
Large/small (in terms of number of processors)
Large/small (in terms of amount of memory)
Long/short (in terms of duration of resource usage)
A form of resource management and job scheduling is required!
How to share the available resources among the
competing jobs?
When does a job start and which resources are
assigned?
12. Resource Management & Job Scheduling – 1
Time-sharing (or time-slicing)
Several jobs share the same resource
Jobs are executed quasi-simultaneously
Resources are not exclusively assigned to jobs
Resource usage of jobs is reduced to short time slices
(some clock ticks of the processor)
Jobs need more than a single time slice to complete
Each job gets the resource assigned in a round-robin fashion
New jobs start immediately
Execution takes longer than on a dedicated resource
Typically handled by the operating system
Examples: SMP machines, your own Linux PC
13. Resource Management & Job Scheduling – 2
Space-sharing (or space-slicing)
Resources are exclusively assigned to a job until it completes
Jobs may have to wait until enough resources are free before they start
Needs a separate resource management system (also known as
batch system) and job scheduler
Examples:
MPP systems, clusters, etc.
LoadLeveler, Torque + Maui, PBSPro, OpenCCS, SLURM, …
Space-sharing based resource management and job scheduling
is commonly used on clusters and other multiprocessor systems
14. Job Submission on Multiprocessor Systems
Example – LoadLeveler
IBM Tivoli Workload Scheduler LoadLeveler
Available for AIX, Linux
Basic LoadLeveler commands
llsubmit
Submit a job
llq
Show queued and running jobs
llcancel <job_id>
Delete a queued or running job
llstatus
Displays status information
Job submission via job command file
llsubmit <cmdfile>
16. LoadLeveler cmd_file examples – 2
IBM p690 eServer Cluster 1600 @ Jülich – JUMP
#@ job_type = parallel
#@ output = out.$(jobid).$(stepid)
#@ error = err.$(jobid).$(stepid)
#@ wall_clock_limit = 00:15:00
#@ notify_user = a.streit@fz-juelich.de
#@ node = 2
#@ total_tasks = 64
#@ data_limit = 1.5GB
#@ queue
myprogram
wall_clock_limit is the runtime/duration; node (number of nodes for the job), total_tasks (total number of tasks in the job), and data_limit are the resource requirements; myprogram is the executable to run.
17. Job Submission on Multiprocessor Systems
Example – Torque + Maui
Torque is the resource manager
Maui is the cluster scheduler
Basic Maui commands
msub – submit a new job
showq – display a detailed, prioritized list of active and idle jobs
canceljob – cancel existing jobs
showstart – show the estimated start time of idle jobs
showstats – show detailed usage statistics for users, groups, and accounts the user has access to
18. Job submission in Maui
Via command line:
msub -l nodes=32:ppn=2,pmem=1800mb,walltime=3600 myscript
The resource list requests 32 nodes with 2 processors each, 1800 MB of memory per task, and 3600 seconds of walltime; myscript is the script file to run.
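As a sketch, the same request could also be written as Torque/PBS directives inside the submitted script itself, assuming a Torque resource manager underneath Maui (the embedded-directive form and the program name are illustrative, not from the slides):

#!/bin/bash
#PBS -l nodes=32:ppn=2   # 32 nodes with 2 processors each
#PBS -l pmem=1800mb      # 1800 MB of physical memory per task
#PBS -l walltime=3600    # 3600 seconds of wall-clock time
cd "$PBS_O_WORKDIR"      # Torque starts jobs in the home directory by default
./myprogram              # the application itself (hypothetical name)

With the directives in the script, the submission reduces to: msub myscript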
19. Lessons Learned
Each job submission system is different
Different commands for submission, status query,
cancellation
Different options, scheduling policies, …
Even different configurations of the same job submission system
exist on different multiprocessor systems
Job requirements are specified differently
Command-line parameters for the job submission command
Separate job command file
Different job requirements exist
Nodes and tasks per node, total tasks, …
20. Job submission and the Grid
A higher, meta level with more abstraction is needed to
describe the requirements of jobs in a Grid of
heterogeneous systems
Many proprietary solutions exist; each Grid middleware
uses its own language, e.g.
AJO in UNICORE 5, ClassAds/JDL in Condor, JDL in
gLite, RSL in Globus Toolkit, xRSL in ARC/NorduGrid,
etc…
And there is JSDL 1.0
Open Grid Forum (OGF) standard
http://www.gridforum.org/documents/GFD.56.pdf
21. JSDL – Introduction
JSDL stands for Job Submission Description Language
A language for describing the requirements of computational
jobs for submission to Grids and other systems
Can also be used between middleware systems or for
submitting to any Grid middleware (→ interoperability)
JSDL does not define a submission interface, what the results
of a submission look like, or how resources are selected
22. JSDL Document
A JSDL document is an XML document, which may contain
Generic (job) identification information
Application description
Resource requirements (main focus is computational jobs)
Description of required data files
A JSDL document is a template, which can be submitted multiple
times and can be used to create multiple job instances
No job-instance-specific attributes can be defined,
e.g. start or end time
23. JSDL – A Hello World Example
<?xml version="1.0" encoding="UTF-8"?>
<jsdl:JobDefinition
 xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
 xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
 <jsdl:JobDescription>
  <jsdl:Application>
   <jsdl-posix:POSIXApplication>
    <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
    <jsdl-posix:Input>/dev/null</jsdl-posix:Input>
    <jsdl-posix:Output>std.out</jsdl-posix:Output>
    <jsdl-posix:Argument>hello</jsdl-posix:Argument>
    <jsdl-posix:Argument>world</jsdl-posix:Argument>
   </jsdl-posix:POSIXApplication>
  </jsdl:Application>
 </jsdl:JobDescription>
</jsdl:JobDefinition>
24. JSDL – Resource Requirements Description
Supports simple descriptions of resource requirements
NOT a comprehensive resource requirements language
Can be extended with other elements for richer or more
abstract descriptions
Main target is compute jobs
CPU, memory, file system/disk, operating system
requirements
Allows some flexibility for aggregate (total) requirements:
"I want 10 CPUs in total and each resource should have
2 or more" (see the sketch below)
Very basic support for network requirements
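A minimal sketch of how that aggregate requirement could be expressed with the JSDL 1.0 Resources element (element names as defined in the JSDL 1.0 specification; the fragment would sit inside the JobDescription of a JobDefinition like the Hello World example above):

<jsdl:Resources>
 <jsdl:TotalCPUCount>
  <jsdl:Exact>10</jsdl:Exact> <!-- 10 CPUs in total -->
 </jsdl:TotalCPUCount>
 <jsdl:IndividualCPUCount>
  <jsdl:LowerBoundedRange>2</jsdl:LowerBoundedRange> <!-- 2 or more CPUs per resource -->
 </jsdl:IndividualCPUCount>
</jsdl:Resources>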
25. JSDL application extensions
SPMD (single-program-multiple-data)
http://www.ogf.org/documents/GFD.115.pdf
Extends JSDL 1.0 for parallel applications (MPI, PVM, etc.)
Allows specifying the number of processes, processes per
host, and threads per process along with the application
(see the sketch below)
HPC (high performance computing)
http://www.ogf.org/documents/GFD.111.pdf
Extends JSDL 1.0 for HPC applications running as
operating system processes (e.g. username, environment,
working directory can be specified)
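A minimal sketch of an SPMD application description for an MPI job, assuming the element names and namespace URI from GFD.115 (illustrative only; the executable path is hypothetical):

<jsdl:Application>
 <jsdl-spmd:SPMDApplication
   xmlns:jsdl-spmd="http://schemas.ogf.org/jsdl/2007/02/jsdl-spmd">
  <jsdl-posix:Executable>/usr/bin/myapp</jsdl-posix:Executable> <!-- hypothetical MPI program -->
  <jsdl-spmd:NumberOfProcesses>64</jsdl-spmd:NumberOfProcesses>
  <jsdl-spmd:ProcessesPerHost>2</jsdl-spmd:ProcessesPerHost>
  <jsdl-spmd:ThreadsPerProcess>1</jsdl-spmd:ThreadsPerProcess>
  <jsdl-spmd:SPMDVariation>http://www.ogf.org/jsdl/2007/02/jsdl-spmd/MPI</jsdl-spmd:SPMDVariation>
 </jsdl-spmd:SPMDApplication>
</jsdl:Application>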
26. Lessons Learned
JSDL is a standardized language to describe jobs to be submitted
to Grid resources
Not only the job itself (application, arguments, input, output,
etc.), but also resource requirements (CPU, memory, etc.)
Extensions for specific application domains (parallel
programs, HPC applications) exist
BUT: JSDL cannot be submitted directly to Grid resources, i.e.
to the resource management and job scheduling system of a
cluster or multiprocessor system in a Grid
27. Execution and Job Management – 1
One of the essential functionalities and components in a Grid
middleware
Deals with
Initiating/submitting, monitoring and managing jobs
Handling and staging of all job data
Coordinating and scheduling of multi-step jobs
Examples:
XNJS in UNICORE 6 (→ sessions 10-12 today)
WS-GRAM in GT4 (→ sessions 19-21 on Thursday)
WMS in gLite (→ sessions 24-26 on Friday)
ARC Client in NorduGrid/ARC (→ sessions 29-30 on Saturday)
28. Execution and Job Management – 2
Initiating/submitting, monitoring and managing jobs
Translates the Grid job into a specific job (application details,
resources, etc.) for the target system
Submits the job to the resource management system using
its proprietary way of job submission
Frequently polls the job status (waiting/queued,
running/executing, failed, aborted, paused, finished, etc.)
from the resource management system
Provides “access” to the job, its status and data during its
runtime and after its (successful or unsuccessful)
completion
If the resource management system is not available/reachable
at job submission time, the job is cached for a future
hand-over to it
29. Execution and Job Management – 3
Handling and staging of all job data, incl. job directory and persistence
Creates, manages, destroys the job directory
All data submitted with the job as input is stored in the job
directory
Data is staged in from remote data resources/archives
All data generated by the job is preserved and/or staged out after
the successful completion of the job
Coordinating and scheduling of multi-step jobs
If a job consists of more than one step (a workflow), the required
resources are orchestrated
Manages the proper initiation of the workflow execution
The execution of the workflow is controlled and monitored
30. Lessons Learned
Execution and job management is needed
A meta-layer on top of the Grid resources is needed
to provide a uniform way of accessing the Grid
and to provide an intuitive, secure, and easy-to-use
interface for the user
32. (Short) History Lesson
UNiform Interface to COmputing REsources
seamless, secure, and intuitive
Initial development started in two German projects funded by the
German Ministry of Education and Research (BMBF)
08/1997 – 12/1999: UNICORE project
Results: well defined security architecture with X.509 certificates,
intuitive GUI, central job supervisor based on Codine
(predecessor of SGE) from Genias
1/2000 – 12/2002: UNICORE Plus project
Results: implementation enhancements (e.g. replacement of
Codine by custom NJS), extended job control (workflows),
application specific interfaces (plugins)
Continuous development since 2002 in several EU projects
Open Source community development since Summer 2004
33. Projects
More than a decade of German and European research & development and infrastructure projects (1999–2011), among them: UNICORE, UNICORE Plus, EUROGRID, GRIP, GRIDSTART, OpenMolGRID, UniGrids, VIOLA, DEISA, DEISA2, eDEISA, NextGRID, CoreGRID, D-Grid IP, D-Grid IP 2, EGEE-II, OMII-Europe, A-WARE, Chemomentum, PHOSPHORUS, D-MON, SmartLM, PRACE, ETICS2, WisNetGrid, and many others
34. UNICORE – Grid driving HPC
Used in
DEISA (European Distributed Supercomputing Infrastructure)
National German Supercomputing Center NIC
Gauss Center for Supercomputing (Alliance of the three
German HPC centers)
PRACE (European PetaFlop HPC Infrastructure) – starting-up
But also used in non-HPC-focused infrastructures (e.g., D-Grid)
Taking up major requirements from, e.g.:
HPC users
HPC user support teams
HPC operations teams
35. UNICORE – www.unicore.eu
Open source (BSD license)
Open developer community on SourceForge
Contributing your own developments is easily possible
Design principles
Standards: OGSA-conformant, WS-RF compliant
Open, extensible, interoperable
End-to-End, seamless, secure and intuitive
Security: X.509, proxy and VO support
Workflow and application support directly integrated
Variety of clients: graphical, command-line, portal, API, etc.
Quick and simple installation and configuration
Support for many operating and batch systems
100% Java 5
37. Usage in D-Grid
Core D-Grid sites committing parts of
their existing resources to D-Grid
Approx. 700 CPUs
Approx. 1 PByte of storage
UNICORE is installed and used
Additional sites received extra money
from the BMBF for buying compute
clusters and data storage
Approx. 2000 CPUs
Approx. 2 PByte of storage
UNICORE (as well as Globus and gLite) is installed
as soon as systems are in place
[Map of D-Grid sites, including LRZ and DLR-DFD]
38. DEISA – Distributed European Infrastructure for Supercomputing Applications
Consortium of leading national HPC centers in Europe
Deploy and operate a persistent, production-quality,
distributed, heterogeneous HPC environment (www.deisa.eu)
UNICORE as Grid Middleware
On top of DEISA’s core services:
Dedicated network
Shared file system
Common production
environment at all
sites
Used e.g. for workflow applications
Partners: IDRIS – CNRS (Paris, France), FZJ (Jülich, Germany), RZG (Garching, Germany), CINECA (Bologna, Italy), EPCC (Edinburgh, UK), CSC (Helsinki, Finland), SARA (Amsterdam, NL), HLRS (Stuttgart, Germany), BSC (Barcelona, Spain), LRZ (Munich, Germany), ECMWF (Reading, UK)
→ more in session 33, Monday 9:00 – 10:30
39. Interoperability and Usability of Grid Infrastructures
Focus on providing common interfaces and integration of major Grid
software infrastructures
OGSA-DAI, VOMS, GridSphere, OGSA-BES, OGSA-RUS
UNICORE, gLite, Globus Toolkit, CROWN
Apply interoperability components in application use-cases
40. Grid Services based Environment to enable Innovative Research
Provide an integrated Grid solution for workflow-centric, complex
applications with a focus on data, semantics and knowledge
Provide decision support services for risk assessment, toxicity
prediction, and drug design
End user focus
ease of use
domain specific tools
“hidden Grid”
Based on UNICORE 6
→ more in sessions 12-13 this afternoon
41. Commercial usage at T-Systems
(Slide courtesy of Alfred Geiger, T-Systems SfR)
42. Lessons Learned
UNICORE has a strong HPC background, but is not limited
to HPC use cases; it can be used in any Grid
UNICORE is OGSA-conformant and WS-RF compliant
UNICORE is open, extensible and interoperable
UNICORE is open source and coded in Java
UNICORE is used in EU and national projects,
European e-infrastructures, National Grid Initiatives (NGI),
commercial environments, etc.
Documentation, tutorials, mailing lists, community links,
software, source code, and more:
http://www.unicore.eu