1. A Model-Driven Approach to Job/Task
Composition in Cluster Computing
(Slide background intentionally left blank.)
George K. Thiruvathukal (presenter)
Neeraj Mehta, Yogesh Kanitkar, and Konstantin Läufer
Loyola University Chicago
Computer Science / Emerging Technologies Laboratory
http://www.etl.luc.edu
http://code.google.com/p/neighborhood
2. An “Outsider’s” View of Grid/
Cluster/HPC Research
“My personal reservation is that most of the Grid [and cluster]
researchers are operating at the level of threads, sockets, and files. I
believe the Grid will be a federation of applications and data servers –
the DataGrid in the taxonomy above rather than a number cruncher
ComputerGrid. This data-centric view means that files (and FTP) are
the wrong metaphor. Classes, methods, and objects (encapsulated data)
are the right metaphor.”
Jim Gray, Microsoft and Grid Computing, 2002
http://research.microsoft.com/~Gray/papers/
Microsoft_and_Grid_Computing.doc
3. Outline
• About Model-Driven Development
• CN in a Nutshell
• UML Overview
• UML Activity Diagrams for Task Composition
• CN/XML Job Descriptors
• XML Metadata Interchange (XMI)
• Model Transformation (XMI to CNX)
• Tagged Values
• Conclusions
• Future Plans
4. Model-Driven
Development
• Approach: Express computational tasks as a high-level domain
model and generate executable code from the model
automatically.
• Goals/motivation:
• Much easier to adapt high-level model to changing
requirements.
• Hide low-level details of mapping tasks to computational
resources and inter-task communication.
• Independence from any specific cluster configuration.
5. CN in a Nutshell
• CN Server: CN Servers run on the various nodes of the cluster.
• CN API: Client programs use the CN API to run on the cluster and exploit
its various resources.
• CN Intelligent Object Editor: The user can specify the details required to
generate the client program using this graphical user interface.
• CNX: CNX (XML) is a compositional language that captures the details of
the client program.
• CNX2Java: CNX2Java is an XSLT document that translates CNX to
compilable Java code.
• XMI2CNX: XMI2CNX is an XSLT document that translates a UML model in
XMI format to CNX.
6. Why UML?
• Standard modeling notation
• Wide tool support (e.g. Eclipse, NetBeans,
Poseidon, Rational, and others)
• Tool interoperability through XMI (XML
Metadata Interchange)
• XMI itself is transformable to other formats
via XSLT or ad hoc transformers.
7. UML Diagrams
• UML supports many dynamic diagrams (beyond sequence
diagrams, the most common case)
• An activity diagram is a visual representation of an activity
graph.
• An activity graph is a state machine whose states represent
actions or subactivities and where transitions out of states
are triggered by the completion of the corresponding actions
or subactivities.
• Activity diagrams are thus intended for modeling computations
whose control flow is driven by internal processing.
8. Activity Diagrams and
Parallel Computing
• In parallel computing, a computational job typically
consists of one or more concurrent tasks whose
dependencies form a directed acyclic graph
• In CN, a client is composed from one or more
such tasks. While the details of jobs and tasks are
implemented modularly in the target programming
language, the jobs and tasks need to be “glued”
together outside of the implementation itself.
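A minimal sketch of the dependency idea above, not CN's actual scheduler: the task names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for whatever ordering logic a cluster framework would use. It groups tasks into "waves" whose members have no dependencies on one another and could therefore run concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each task maps to the set of tasks it depends on.
dependencies = {
    "split": set(),
    "worker_a": {"split"},
    "worker_b": {"split"},
    "merge": {"worker_a", "worker_b"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks within one wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose predecessors are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
print(waves)
```

The two worker tasks land in the same wave because neither depends on the other; only `merge` has to wait for both.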
9. CN/XML Job
Descriptors (Nutshell)
• CN/XML (CNX) is a job/task composition language inspired by CSP.
• Task attributes:
• name: user-friendly name
• jar/archive: where to find the code (most CN jobs are Java, so you can pack the job into
a .jar file)
• class: entry point
• dependencies: predecessors
• Task requirements:
• memory: memory needed to run Task
• run model: e.g. create new VM or run threads within current VM
• Task parameters: Properties (key/value pairs)
• Tasks may be composed recursively using parallel or sequential CSP-like constructs. Task
dependencies (and IDs) are generated automatically.
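The recursive composition described above can be sketched as follows. This is an illustration under assumptions, not CNX itself: the `Task`, `Seq`, and `Par` classes and the `flatten` helper are hypothetical names, and only three of the listed task attributes are modeled. The point is that nesting sequential and parallel constructs is enough to derive task IDs and predecessor lists mechanically.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Task:
    name: str  # user-friendly name
    jar: str   # archive holding the code
    cls: str   # entry-point class


@dataclass
class Seq:
    parts: list  # children run one after another


@dataclass
class Par:
    parts: list  # children may run concurrently


def flatten(node, preds, ids, out):
    """Record each task's predecessors in `out`; return the IDs that finish last."""
    if isinstance(node, Task):
        task_id = next(ids)
        out[task_id] = (node.name, sorted(preds))
        return {task_id}
    if isinstance(node, Seq):
        for part in node.parts:
            preds = flatten(part, preds, ids, out)  # each part waits for the previous one
        return set(preds)
    exits = set()
    for part in node.parts:  # Par: every branch starts from the same predecessors
        exits |= flatten(part, preds, ids, out)
    return exits or set(preds)


job = Seq([
    Task("split", "job.jar", "demo.Split"),
    Par([
        Task("worker_a", "job.jar", "demo.Worker"),
        Task("worker_b", "job.jar", "demo.Worker"),
    ]),
    Task("merge", "job.jar", "demo.Merge"),
])

table = {}
flatten(job, set(), count(1), table)
for task_id, (name, depends_on) in table.items():
    print(task_id, name, depends_on)
```

Nothing in the job definition mentions an ID or a dependency; both fall out of the nesting.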
10. XML Metadata
Interchange
• XMI is an XML-based format for persisting a UML model for possible exchange
among different UML tools.
• CNX is a rich descriptor for CN Jobs and Tasks (transformable to XMI)
• UML's XML Metadata Interchange is a bare-bones descriptor framework. Annotations
can be made using tagged values.
• <UML:ActionState>: Each CN task is represented as an ActionState.
• <UML:StateVertex.outgoing>: The tasks to be notified when a particular task has
completed.
• <UML:StateVertex.incoming>: The tasks that must be completed before this task can
run.
• Jobs and tasks are nameless in CN. In XMI, tasks are translated into nodes, which
have document-wide unique identifiers.
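As a rough illustration of how the incoming/outgoing elements above encode dependencies, the sketch below reads a small hand-written fragment with Python's standard `xml.etree.ElementTree`. The fragment is heavily simplified, the namespace URI and the `xmi.id`/`xmi.idref` attribute usage are assumptions about the export format, and the task names are hypothetical; real tool exports carry considerably more structure.

```python
import xml.etree.ElementTree as ET

UML_NS = "org.omg.xmi.namespace.UML"  # assumed namespace URI

# Simplified, hand-written fragment: split -> work -> merge.
XMI_TEXT = """
<XMI xmlns:UML="org.omg.xmi.namespace.UML">
  <UML:ActionState xmi.id="a1" name="split">
    <UML:StateVertex.outgoing><UML:Transition xmi.idref="t1"/></UML:StateVertex.outgoing>
  </UML:ActionState>
  <UML:ActionState xmi.id="a2" name="work">
    <UML:StateVertex.incoming><UML:Transition xmi.idref="t1"/></UML:StateVertex.incoming>
    <UML:StateVertex.outgoing><UML:Transition xmi.idref="t2"/></UML:StateVertex.outgoing>
  </UML:ActionState>
  <UML:ActionState xmi.id="a3" name="merge">
    <UML:StateVertex.incoming><UML:Transition xmi.idref="t2"/></UML:StateVertex.incoming>
  </UML:ActionState>
</XMI>
"""


def qualified(local_name):
    """Return the namespace-qualified tag name that ElementTree expects."""
    return f"{{{UML_NS}}}{local_name}"


def transition_refs(state, direction):
    """Collect transition references under StateVertex.incoming or .outgoing."""
    holder = state.find(qualified(f"StateVertex.{direction}"))
    if holder is None:
        return set()
    return {t.get("xmi.idref") for t in holder.findall(qualified("Transition"))}


def dependencies_from_xmi(text):
    """Map each task name to the names of the tasks that must finish first."""
    states = list(ET.fromstring(text).iter(qualified("ActionState")))
    source_of = {}  # transition id -> name of the task it leaves
    for state in states:
        for ref in transition_refs(state, "outgoing"):
            source_of[ref] = state.get("name")
    return {
        state.get("name"): sorted(
            source_of[ref]
            for ref in transition_refs(state, "incoming")
            if ref in source_of
        )
        for state in states
    }


deps = dependencies_from_xmi(XMI_TEXT)
print(deps)
```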
11. Tagged Values
• Tagged values allow metadata to be included in the UML activity diagram (model or
domain-specific information)
• Example of CN-specific metadata included as UML “tagged values”:
• jar: tctask.jar
• class: org.jhpc.cn2.trnsclsrtask.TCTask
• memory: 1000
• runmodel: RUN AS THREAD IN TM
• ptype0: java.lang.Integer
• pvalue0: 2
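A small sketch of how such tagged values might be gathered into a task configuration. It assumes that `ptypeN`/`pvalueN` pairs describe the N-th task parameter, which the example above suggests but does not state; the helper names are hypothetical.

```python
# Tagged values copied from the example above, as plain strings.
tagged_values = {
    "jar": "tctask.jar",
    "class": "org.jhpc.cn2.trnsclsrtask.TCTask",
    "memory": "1000",
    "runmodel": "RUN AS THREAD IN TM",
    "ptype0": "java.lang.Integer",
    "pvalue0": "2",
}


def collect_parameters(tags):
    """Pair ptypeN/pvalueN tags (assumed convention) into an ordered list."""
    params = []
    index = 0
    while f"ptype{index}" in tags and f"pvalue{index}" in tags:
        params.append((tags[f"ptype{index}"], tags[f"pvalue{index}"]))
        index += 1
    return params


def task_settings(tags):
    """Split tags into fixed task attributes and positional parameters."""
    fixed = {
        key: value
        for key, value in tags.items()
        if not key.startswith(("ptype", "pvalue"))
    }
    return fixed, collect_parameters(tags)


settings, params = task_settings(tagged_values)
print(settings)
print(params)
```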
12. Activity Diagram for Guiding Example using Explicit
Concurrency (Static Composition)
13. UML for Guiding Example
(Dynamic Configuration Based on Runtime Parameter)
14. Model Transformation
• The UML model for the CN computation is
created in the form of an activity diagram
• The UML model is exported as an XMI document
• The XMI document is transformed, using XSL
Transformations to a CNX client descriptor
• The CNX client descriptor is transformed, using
XSL Transformations, to a client program in the
target language (currently Java)
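The two-stage pipeline above can be mimicked with plain functions. This is only a stand-in: the actual transformations are XSLT documents, whereas the sketch uses `xml.etree.ElementTree`; the `<job>`/`<task>` element names, the namespace URI, and the `submit(...)` output lines are all hypothetical and are not CNX or the CN API.

```python
import xml.etree.ElementTree as ET

UML_NS = "org.omg.xmi.namespace.UML"  # assumed namespace URI


def xmi_to_cnx(xmi_text):
    """Stage 1 stand-in: copy each ActionState into a hypothetical <task> element."""
    root = ET.fromstring(xmi_text)
    job = ET.Element("job")
    for state in root.iter(f"{{{UML_NS}}}ActionState"):
        ET.SubElement(job, "task", name=state.get("name"))
    return ET.tostring(job, encoding="unicode")


def cnx_to_client(cnx_text):
    """Stage 2 stand-in: emit one line of pseudo client code per task."""
    job = ET.fromstring(cnx_text)
    return "\n".join(f"submit({task.get('name')!r})" for task in job.iter("task"))


xmi = (
    '<XMI xmlns:UML="org.omg.xmi.namespace.UML">'
    '<UML:ActionState name="split"/>'
    '<UML:ActionState name="merge"/>'
    "</XMI>"
)

# Each stage consumes the previous stage's output, as in the slide's pipeline.
client = cnx_to_client(xmi_to_cnx(xmi))
print(client)
```

Keeping each stage as a document-to-document function mirrors the design on the slide: the intermediate descriptor can be inspected or edited before the final client code is generated.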
18. Conclusions
• This work is at a preliminary stage. CN has a mature implementation but is still
not a prevalent approach.
• We have successfully gone from UML to CN's XML descriptors to running
programs.
• Not all parallel programs have a simple decomposition. (Breaking news!)
• Other UML functionality will need to be introduced to model complex
(internal) task structure.
• CN has much in common with workflow systems but is not aimed directly at
supporting SOA. CN computations, of course, can be exposed as proper
services.
• That said, UML is aimed at large-scale software engineering. A component
modeled with UML can be embedded in a larger UML diagram and/or stand on
its own.
19. Futures
• CN is an active and ongoing research effort.
• Explore other aspects of UML, especially for modeling
internal task behavior.
• Integrate CN (a P2P clustering framework) with Hydra (a
P2P storage framework).
• Replace CN task messaging/signaling with Trull, an event-
based framework for concurrent (and distributed) systems,
led by Konstantin Läufer (also at Loyola University
Chicago).
• Redo Intelligent Object Editor (not covered here) using
Eclipse or use Eclipse UML plugin.
• Extend work for other grid/clustering projects.