The document describes the OGCE WorkflowSuite, which provides tools for composing and executing scientific workflows. It includes the Generic Service Toolkit for wrapping applications as web services, the XRegistry for information sharing, and XBaya for graphical workflow composition and monitoring. Workflows can integrate various resources and be made flexible, dynamic, and interoperable. Example applications discussed are weather forecasting, genome analysis, and computational evaluation.
1. OGCE WorkflowSuite for Science Gateways
Suresh Marru, Raminder Singh, Chathura Herath & Marlon Pierce
Indiana University
2. OGCE
[Diagram labels: Gateways (LEAD, GridChem, …); TeraGrid User Portal; TG GIG; Generalize, Harden, Build, Test; Gateways/E-Science Community]
3. Requirements from gateways
• Gateways demand that scientific workflow systems be:
– Flexible
– Dynamic
– Interactive
– Technology adaptive
– Interoperable with emerging computational resources and their job management interfaces
4. OGCE Workflow Suite
• Generic Service Toolkit
– Tool to wrap command-line applications as web services
– Handles file staging & job submissions
– Extensible runtime for security, resource brokering & urgent computing
– Generic Factory service for on-demand creation of application services
• XRegistry
– Information repository for the OGCE workflow suite
– Register, search, retrieve & share XML documents
– User & hierarchical group-based authorization
• XBaya
– GUI-based tool to compose & monitor workflows
– Extensible support for compiler plug-ins like BPEL & Jython
– Dynamic workflow execution support to start, pause, resume and rewind workflow executions
OGCE Workflow Tutorial
5. Features
• Security
– Authentication and authorization
– Secure invocations between services
– Support for gateway community accounts
– Support for multiple user accounts
• Reliability
– Retry job submissions and file staging
– Fault Tolerance and Recovery service
• Over-provisioning and migration
• Compatibility
– Taverna, Kepler and Triana
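The "retry job submissions and file staging" item under Reliability can be illustrated with a minimal Python sketch. It is not the OGCE implementation: `TransientError`, `retry` and `make_flaky_submit` are hypothetical names invented for this illustration, and the exponential backoff policy is an assumption rather than something the slides specify.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. a busy queue)."""


def retry(action: Callable[[], T], attempts: int = 3, base_delay: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `action` until it succeeds or `attempts` tries have been used."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except TransientError:
            if attempt == attempts:
                raise  # out of retries: hand the failure to a recovery layer
            # Assumed policy: wait base_delay, then 2x, 4x, ... between tries.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


def make_flaky_submit(failures: int) -> Callable[[], str]:
    """Build a stand-in job submission that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def submit() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("submission rejected")
        return f"job-{state['calls']}"

    return submit
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller could wrap a file-staging step the same way as a job submission.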
6. Application Services
• Workflows are built by composing web services
– Fortran applications are “wrapped” by an Application Factory, which generates a web service for the app.
• Registers the WSDL for the service with a registry
– Each service generates a stream of notifications that log the service actions back to the XMC Cat Metadata Catalog.
[Diagram labels: Application Factory; App Service; Run program & publish events]
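The wrapping idea above (a factory produces a service for a command-line application, and the service runs the program and publishes events) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `make_service` and `run_wrapped_app` are hypothetical names, the "service" is a plain in-process callable rather than a web service, and no WSDL or registry is involved.

```python
import subprocess
import sys
from typing import Callable, List

Publish = Callable[[str], None]


def run_wrapped_app(command: List[str], publish: Publish) -> str:
    """Run a command-line program and report progress through `publish`."""
    publish("started")
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        publish(f"failed: exit code {result.returncode}")
        raise RuntimeError(result.stderr.strip())
    publish("finished")
    return result.stdout


def make_service(command_prefix: List[str]) -> Callable[[List[str], Publish], str]:
    """Hypothetical factory: turn a fixed command line into a callable 'service'."""

    def service(args: List[str], publish: Publish) -> str:
        return run_wrapped_app(command_prefix + args, publish)

    return service


# Stand-in application: a one-line Python program that upper-cases its argument.
upper_service = make_service(
    [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]
)
```

Calling `upper_service(["hello"], print)` would run the stand-in program and print the "started" and "finished" events as they occur; in a real deployment the events would go to a notification channel instead of `print`.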
7. Workflow Composition, Execution & Monitoring
XBaya enables users to construct, share, execute and monitor sequences of tasks executing on resources ranging from their local workstations to high-end compute resources.
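A composed workflow can be thought of as a set of tasks plus the dependencies between them. The Python sketch below shows that idea only; it is not XBaya's model or API. `run_workflow`, the task names and the dependency map are all hypothetical, and execution is sequential and local rather than distributed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, Set

Task = Callable[[Dict[str, object]], object]


def run_workflow(tasks: Dict[str, Task],
                 depends_on: Dict[str, Set[str]]) -> Dict[str, object]:
    """Run each task after the tasks it depends on, passing their results in."""
    graph = {name: depends_on.get(name, set()) for name in tasks}
    unknown = set().union(*graph.values()) - set(tasks)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    results: Dict[str, object] = {}
    # static_order() yields every task after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](inputs)
    return results


# Hypothetical four-step workflow used only to exercise the sketch.
EXAMPLE_TASKS: Dict[str, Task] = {
    "load": lambda inputs: [3, 1, 2],
    "sort": lambda inputs: sorted(inputs["load"]),
    "total": lambda inputs: sum(inputs["load"]),
    "report": lambda inputs: f"max={inputs['sort'][-1]} total={inputs['total']}",
}
EXAMPLE_DEPENDS_ON: Dict[str, Set[str]] = {
    "sort": {"load"},
    "total": {"load"},
    "report": {"sort", "total"},
}
```

`run_workflow(EXAMPLE_TASKS, EXAMPLE_DEPENDS_ON)` runs "load" first and "report" last; "sort" and "total" have no mutual dependency, which is where a real engine could schedule them on different resources.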
8. Service Monitoring via Events
• The service output is a stream of events
– I am running your request
– I have started to move your input files.
– I have all the files
– I am running your application.
– The application is finished
– I am moving the output to your file space
– I am done.
• These are automatically generated by the service using a distributed event system (WS-Eventing / WS-Notification)
– Topic-based pub-sub system with a well-known “channel”.
[Diagram labels: Application Service Instance; events 1 to 6; Notification Channel; Subscribe Topic=x; listener; publisher]
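The topic-based publish/subscribe pattern described above can be sketched as a small in-process Python class. It stands in for the idea only: `NotificationChannel` is a hypothetical name, and nothing here implements WS-Eventing or WS-Notification, which are distributed web-service specifications.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Listener = Callable[[str], None]


class NotificationChannel:
    """In-process stand-in for a well-known channel with topic-based routing."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, topic: str, listener: Listener) -> None:
        """Register `listener` to receive every message published on `topic`."""
        self._listeners[topic].append(listener)

    def publish(self, topic: str, message: str) -> int:
        """Deliver `message` to the topic's listeners; return how many got it."""
        listeners = list(self._listeners.get(topic, []))
        for listener in listeners:
            listener(message)
        return len(listeners)
```

A monitoring tool would subscribe to the topic of one workflow run, and each application service would publish its progress messages ("I am running your application.", "I am done.") on that same topic; messages on other topics never reach the listener.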
11. XML Metadata Catalog (XMC Cat)
Taming Complex Scientific Metadata Schemas
“A significant need exists in many disciplines for long-term, distributed, and stable data and metadata repositories”
– NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure
“Metadata is key to being able to share results”
– UK e-Science Core Programme Study
[Diagram labels: Portal; Message Bus; Metadata Catalog; Workflow Notifications; Record Workflow Outputs; Intermediate Results; Workflow Configuration and Inputs; Record Workflow Results; Search; Query For Workflows; Compose Workflow; Monitor Workflow]
More Info: Scott Jensen
12. Applications
• LEAD
– Lower entry barrier to using weather analysis tools
– Improve detection, analysis & prediction of mesoscale weather
• Motif-Network
– Transformation of sequenced genomes to “domain-space”
• Cyber-Infrastructure Evaluation
– Performance evaluation of future supercomputer architectures
• ADAM
– Algorithms for feature extraction, data normalization, classification and normalization
• GridChem
– Molecular Chemistry Grid helping researchers run chemistry applications in a Grid environment
13. LEAD: A Weather Forecasting Workflow (1/2)
[Figure: LEAD weather forecasting workflow diagram.
Inputs: terrain data files; NAM, RUC, GFS data; surface data, upper air mesonet data and wind profiler data; surface, terrestrial data files; radar data (Level II); radar data (Level III); satellite data.
Components: Terrain Preprocessor; WRF Static Preprocessor; 3D Model Data Interpolator (Initial Boundary Conditions); 3D Model Data Interpolator (Lateral Boundary Conditions); 88D Radar Re-mapper; NIDS Radar Re-mapper; Satellite Data Re-mapper; ADAS; ARPS to WRF Data Interpolator; WRF; ARPS Ensemble Generator; WRF to ARPS Data Interpolator; ARPS Plotting Program; IDV; ADAM.
Annotations: run once per forecast region; repeat periodically for new data; visualization on user's request; data mining: look for storm signature; triggered if a storm is detected.
Legend: static data; real time data; initialization; forecast; visualization; analysis; data mining.]
14. LEAD: A Weather Forecasting Workflow (2/2)
WRF-Static running on Tungsten
15. Motif-Network: Whole Genome workflow
• Domain webs of large genomes
– Input list of amino acid sequences
– Identify all known domains
– Construct webs
[Diagram labels: Ensemble-type processing (minimal network reqs); Capacity-type computing; Parallel processing; Capability-type computing]
Jeff Tilson, RENCI
16. CI: Execute Sub-Workflow
• Input a campaign step filename
• Execute GAMESS per step specification
Jeff Tilson, RENCI
17. Example: “Optimal” Weather Prediction Using Dynamic Adaptivity
[Diagram labels: Streaming Observations; Data Mining; Storms Forming; Forecast Model; Instrument Steering; Refine forecast grid; On-Demand Grid Computing]