Presentation given by Bertram at the Data Integration in the Life Sciences (DILS) Workshop in Leipzig, Germany, 2004.
Reference:
Bowers, Shawn, and Bertram Ludäscher. "An ontology-driven framework for data transformation in scientific workflows." In International Workshop on Data Integration in the Life Sciences (DILS), pp. 1-16. Springer, 2004.
So this isn't new -- but still relevant :-)
ABSTRACT. Ecologists spend considerable effort integrating heterogeneous data for statistical analyses and simulations, for example, to run and test predictive models. Our research is focused on reducing this effort by providing data integration and transformation tools, allowing researchers to focus on “real science,” that is, discovering new knowledge through analysis and modeling. This paper defines a generic framework for transforming heterogeneous data within scientific workflows. Our approach relies on a formalized ontology, which serves as a simple, unstructured global schema. In the framework, inputs and outputs of services within scientific workflows can have structural types and separate semantic types (expressions of the target ontology). In addition, a registration mapping can be defined to relate input and output structural types to their corresponding semantic types. Using registration mappings, appropriate data transformations can then be generated for each desired service composition. Here, we describe our proposed framework and an initial implementation for services that consume and produce XML data.
An ontology-driven framework for data transformation in scientific workflows
1. An Ontology-Driven Framework for Data Transformation in Scientific Workflows
Shawn Bowers
Bertram Ludäscher
San Diego Supercomputer Center
University of California, San Diego
2. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Outline
• Background (SEEK Project)
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work
4. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Science Environment for Ecological Knowledge (SEEK)
• Domain Science Driver
– Ecology (LTER), biodiversity, …
• Analysis & Modeling System
– Design and execution of ecological models and analysis
– End user focus
– {application,upper}-ware
• Semantic Mediation System
– Data integration of hard-to-relate sources and processes
– Semantic types and ontologies
– upper middleware
• EcoGrid
– Access to ecology data and tools
– {middle,under}-ware
[Figure: SEEK architecture (cf. US cyberinfrastructure, UK e-Science), with a callout labeled “this paper”]
5. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Outline
• The SEEK Project
• Scientific Workflows
– Focus: analysis & component integration on top of data integration
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work
6. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Promoter Identification in Kepler [SSDBM’03]
• Problems
– Many components (web services) are NOT designed to fit!
“The problem P that X solves is simple, and X doesn’t solve it well”
– Semantically meaningful connections are structurally incompatible
• Approach
– Distinguish structural type and semantic type
– Structural type: e.g. XML Schema
– Semantic type: e.g. OWL expressions
– Exploit the (optional!) semantic type as much as possible
10. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
A Very Simple Scientific Workflow
[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1 to P5]
observations: population samples for life stages of the common field grasshopper [Begon et al, 1996]
Phase         Observed
Eggs          44,000
Instar I      3,513
Instar II     2,529
Instar III    1,922
Instar IV     1,461
Adults        1,300
life stage periods: periods of development in terms of phases
Period        Phases
Nymphal       {Instar I, Instar II, Instar III, Instar IV}
k-value for each period of observation:
[(nymphal, 0.44)]
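As a worked illustration of what S2 might compute, the sketch below derives the nymphal k-value from the grasshopper counts on this slide. It assumes the usual life-table definition of a k-value (log10 of the count entering the period minus log10 of the count entering the phase that follows it), and it assumes the logarithms are rounded to two decimals before subtracting, which reproduces the 0.44 shown on the slide; unrounded logarithms give roughly 0.43. The function name `k_value` and the data layout are hypothetical, not taken from the SEEK services.

```python
import math

# Observed counts per phase, in developmental order (from the slide).
OBSERVED = [
    ("Eggs", 44000),
    ("Instar I", 3513),
    ("Instar II", 2529),
    ("Instar III", 1922),
    ("Instar IV", 1461),
    ("Adults", 1300),
]

# Life stage periods expressed as ordered lists of phases (from the slide).
PERIODS = {"Nymphal": ["Instar I", "Instar II", "Instar III", "Instar IV"]}


def k_value(period_phases, observed, log_digits=2):
    """k = log10(count entering the period) - log10(count entering the next phase).

    Assumption: logs are rounded to `log_digits` decimals before subtracting.
    """
    names = [name for name, _ in observed]
    counts = dict(observed)
    first = period_phases[0]
    # The phase that follows the last phase of the period.
    after = names[names.index(period_phases[-1]) + 1]
    start = round(math.log10(counts[first]), log_digits)
    end = round(math.log10(counts[after]), log_digits)
    return round(start - end, log_digits)


result = [(name.lower(), k_value(phases, OBSERVED)) for name, phases in PERIODS.items()]
print(result)
```

Keeping the period definitions separate from the counts mirrors the slide, where the two arrive as separate inputs to the workflow.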
11. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Scientific Workflows
A scientific workflow consists of a network of connected services …
A service can be any software component (including a web service or even a data source) …
Each service (optionally) takes input and (optionally) produces output
12. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Scientific Workflows
SEEK adopts a Ptolemy II “workflow” model:
– A service is called an actor
– Each actor has zero or more input and output ports (and possibly parameters)
– Data flows through a workflow based on connections made from output to input ports
– (ignored here: different models of computation, directors, …)
[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1 to P5]
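The actor and port vocabulary above can be sketched as a small data model. This is only an illustration of the concepts, not the Ptolemy II or Kepler API: the class names `Port`, `Actor`, and `Workflow` are hypothetical, and the port directions for P1, P4, and P5 are assumptions (the slides only imply that P2 is an output of S1 and P3 an input of S2).

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    name: str
    direction: str  # "in" or "out"


@dataclass
class Actor:
    name: str
    ports: dict = field(default_factory=dict)

    def add_port(self, name, direction):
        self.ports[name] = Port(name, direction)
        return self.ports[name]


@dataclass
class Workflow:
    actors: dict = field(default_factory=dict)
    connections: list = field(default_factory=list)  # (actor, out port, actor, in port)

    def add_actor(self, actor):
        self.actors[actor.name] = actor

    def connect(self, src_actor, src_port, dst_actor, dst_port):
        src = self.actors[src_actor].ports[src_port]
        dst = self.actors[dst_actor].ports[dst_port]
        # Data flows only from an output port to an input port.
        if src.direction != "out" or dst.direction != "in":
            raise ValueError("connections must go from an output port to an input port")
        self.connections.append((src_actor, src_port, dst_actor, dst_port))


# Assumed directions: P1 in / P2 out on S1; P3, P4 in / P5 out on S2.
s1 = Actor("S1")
s1.add_port("P1", "in")
s1.add_port("P2", "out")
s2 = Actor("S2")
for port_name, direction in [("P3", "in"), ("P4", "in"), ("P5", "out")]:
    s2.add_port(port_name, direction)

wf = Workflow()
wf.add_actor(s1)
wf.add_actor(s2)
wf.connect("S1", "P2", "S2", "P3")
print(wf.connections)
```

Models of computation and directors are left out here, as they are on the slide.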
13. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Outline
• The SEEK Project
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work
14. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Service Reusability
A scientist wishes to connect two (independent) services
[Figure: desired connection from output port Ps of the source service to input port Pt of the target service]
15. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Service Reusability
In Ptolemy II/Kepler (and in web services), input and output ports (message parts) have structural types (XML Schema)
[Figure: desired connection from Ps (source service) to Pt (target service), each port annotated with its structural type]
16. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Service Reusability
Unless “designed to fit,” independent services are structurally incompatible
→ Generally, the source output type will not be a subtype of the target input type
[Figure: desired connection from Ps (source service) to Pt (target service); the structural type of Ps is incompatible (⋠) with the structural type of Pt]
17. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Service Reusability
A transformation mapping (δ) is required to connect the services … artificially creating subtype compatibility
If such a δ exists, the services are “structurally feasible”
[Figure: desired connection from Ps (source service) to Pt (target service); the structural type of Ps is incompatible (⋠) with the structural type of Pt, and δ produces δ(Ps), which is structurally compatible (≺)]
18. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Service Reusability
SEEK annotates services with semantic types for discovery and interoperability of services
[Figure: desired connection from Ps (source service) to Pt (target service); the semantic types of Ps and Pt, drawn from ontologies (OWL), are compatible (⊑)]
19. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Service Reusability
Services can be semantically compatible, but structurally incompatible
[Figure: desired connection from Ps (source service) to Pt (target service); the semantic types, drawn from ontologies (OWL), are compatible (⊑), while the structural types are incompatible (⋠) until δ produces δ(Ps), which is structurally compatible (≺)]
22. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Example Semantic Types
Portion of SEEK measurement ontology
[Figure: ontology graph with classes Observation, MeasContext, MeasProperty, Entity, AccuracyQualifier, EcologicalProperty, AbundanceCount, LifeStageProperty, NumericValue, and SpatialLocation, linked by the properties hasContext, hasProperty, appliesTo, itemMeasured, hasLocation, hasCount, and hasValue with their cardinalities]
Same in OWL, a description logic standard (here, Sparrow syntax):
Observation subClassOf forall hasContext/MeasContext and
    forall hasProperty/MeasProperty and
    exists itemMeasured/Entity.
MeasContext subClassOf exists appliesTo/Entity and
    atMost 1/appliesTo.
EcologicalProperty subClassOf Entity.
LifeStageProperty subClassOf EcologicalProperty.
AbundanceCount subClassOf EcologicalProperty and
    exists hasLocation/SpatialLocation and
    atMost 1/hasLocation and
    exists hasCount/NumericValue and
    atMost 1/hasCount.
24. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Example Semantic Types
Semantic types for P2 and P3
[Figure: workflow with services S1 (life stage property) and S2 (mortality rate for period) and ports P1 to P5, together with the ontology graphs for semType(P2) and semType(P3), where semType(P2) ⊑ semType(P3)]
semType(P3) subClassOf Observation and
    exists hasContext/(MeasContext and
        exists appliesTo/LifeStageProperty and
        atMost 1/appliesTo) and
    exists itemMeasured/AbundanceCount and
    atMost 1/itemMeasured.
semType(P2) subClassOf Observation and
    exists hasContext/(MeasContext and
        exists appliesTo/LifeStageProperty and
        atMost 1/appliesTo) and
    exists itemMeasured/AbundanceCount and
    atMost 1/itemMeasured and
    exists hasProperty/AccuracyQualifier and
    atMost 1/hasProperty.
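One way to see why semType(P2) ⊑ semType(P3) holds: semType(P2) repeats every conjunct of semType(P3) and adds two more, so it is the more specific type. The sketch below checks only that syntactic condition. It is not a description logic reasoner, the helper `subsumed_by` is hypothetical, and the test is sufficient but not necessary for subsumption in general. The conjunct strings follow the class names of the ontology slide.

```python
# Each semantic type is modelled as a set of conjuncts taken from the definitions above.
CONTEXT = (
    "exists hasContext/(MeasContext and "
    "exists appliesTo/LifeStageProperty and atMost 1/appliesTo)"
)

SEM_TYPE_P3 = frozenset({
    "Observation",
    CONTEXT,
    "exists itemMeasured/AbundanceCount",
    "atMost 1/itemMeasured",
})

# semType(P2) adds the accuracy qualifier constraints to semType(P3).
SEM_TYPE_P2 = SEM_TYPE_P3 | {
    "exists hasProperty/AccuracyQualifier",
    "atMost 1/hasProperty",
}


def subsumed_by(sub, sup):
    """Sufficient test for sub ⊑ sup: every conjunct of sup also appears in sub."""
    return sup <= sub


print(subsumed_by(SEM_TYPE_P2, SEM_TYPE_P3))
print(subsumed_by(SEM_TYPE_P3, SEM_TYPE_P2))
```

The check succeeds only in the direction needed for the desired connection: data leaving P2 satisfies what P3 expects, but not the reverse.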
25. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Outline
• The SEEK Project
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work
26. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
The Ontology-Driven Framework
Define semantic registration mappings (“semantic views”) to connect structural and semantic types
Use registration mappings to (semi-)automate transformation, based on derived structural correspondences
Depending on the ontologies and registration mappings, it may not be possible to find an appropriate transformation δ … (since the correspondence is often underspecified)
40. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
The Ontology-Driven Framework
[Figure: desired connection from Ps (source service) to Pt (target service). Registration mappings (output and input) relate the structural types of Ps and Pt to their semantic types in the ontologies (OWL), which are compatible (⊑). The resulting correspondence is used to generate the transformation δ(Ps).]
41. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Example Result (XQuery)
Based on the structural correspondences and certain assumptions, we derive the transformation XQuery:
(: one <measurement> per source <sample>, carrying its <cnt> and <lsp> values :)
<cohortTable>
{ for $s in /population/sample return
    <measurement>
    { for $c in $s/meas/cnt return <obs>{$c/text()}</obs> }
    { for $l in $s/lsp return <phase>{$l/text()}</phase> }
    </measurement>
}
</cohortTable>
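For readers without an XQuery processor at hand, the same restructuring can be sketched with Python's standard `xml.etree.ElementTree`. This mirrors the query above rather than the framework's generator. The sample source document is hypothetical, shaped only to match the paths used in the query, and the function name `transform` is an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document shaped to match the paths used in the XQuery.
SOURCE_XML = """
<population>
  <sample><meas><cnt>44000</cnt></meas><lsp>Eggs</lsp></sample>
  <sample><meas><cnt>3513</cnt></meas><lsp>Instar I</lsp></sample>
</population>
"""


def transform(source_root):
    """Mirror the XQuery: one <measurement> per <sample>, with <obs> then <phase> children."""
    cohort_table = ET.Element("cohortTable")
    for sample in source_root.findall("sample"):  # /population/sample
        measurement = ET.SubElement(cohort_table, "measurement")
        for cnt in sample.findall("meas/cnt"):  # $s/meas/cnt
            ET.SubElement(measurement, "obs").text = cnt.text
        for lsp in sample.findall("lsp"):  # $s/lsp
            ET.SubElement(measurement, "phase").text = lsp.text
    return cohort_table


target = transform(ET.fromstring(SOURCE_XML))
print(ET.tostring(target, encoding="unicode"))
```

Like the query, the sketch assumes that `cnt` and `lsp` hold simple text content.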
42. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Assumptions Made
(or why this may not work for you…)
• Common XPath prefixes refer to the same element
• Elements in correspondences have compatible cardinalities
– source is equivalent or stricter than target (e.g., + is stricter than *)
• Primitive data types are compatible
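The cardinality assumption can be made concrete by reading each occurrence indicator as a range of allowed counts: the source is equivalent or stricter when its range lies inside the target's range. The sketch below assumes the DTD-style indicators `1`, `?`, `+`, and `*`; the slide names only `+` and `*`, so the other two symbols and the function name `cardinality_compatible` are assumptions.

```python
import math

# Occurrence indicators as (min, max) ranges. The symbol set is an assumption.
RANGES = {
    "1": (1, 1),
    "?": (0, 1),
    "+": (1, math.inf),
    "*": (0, math.inf),
}


def cardinality_compatible(source, target):
    """True when the source allows no occurrence count that the target forbids."""
    s_min, s_max = RANGES[source]
    t_min, t_max = RANGES[target]
    return s_min >= t_min and s_max <= t_max


print(cardinality_compatible("+", "*"))  # source stricter than target
print(cardinality_compatible("*", "+"))  # source may be empty, target may not
```

Under this reading, a source element declared `+` can always feed a target declared `*`, while the reverse could yield an empty sequence that the target does not accept.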
43. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Framework Operations and Properties
In the paper, we define:
– A semantic registration mapping R as a set of rules q ↔ p, where q is a substructure selection (query) and p is a contextual path (a path in an ontology)
– A structural correspondence as a rule qs → qt, where qs and qt are substructure selections over the source and target, resp.
– The semantic composition of registration mappings Rs and Rt, which returns a set of structural correspondence rules
– The semantic subpath operation (subconcept), which is used by the semantic composition to find matching substructure selection rules
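The definitions above can be sketched as follows, under heavy simplification. Each registration mapping is a list of (query, contextual path) rules, and composition pairs a source rule with a target rule whenever the source path is a semantic subpath of the target path. The subpath test used here (equal length, step-wise subconcept match) is a simplification and not the paper's definition. The contextual paths are hypothetical, chosen to be consistent with the ontology and the XQuery example in this deck, and all function names are illustrative.

```python
# Toy fragment of the class hierarchy from the ontology slide (child -> parent).
PARENT = {
    "LifeStageProperty": "EcologicalProperty",
    "AbundanceCount": "EcologicalProperty",
    "EcologicalProperty": "Entity",
}


def is_subconcept(sub, sup):
    """Walk up the hierarchy from `sub`; names without a parent must match exactly."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT.get(sub)
    return False


def is_semantic_subpath(source_path, target_path):
    """Simplified test: equal length and step-wise subconcept match."""
    return len(source_path) == len(target_path) and all(
        is_subconcept(s, t) for s, t in zip(source_path, target_path)
    )


def compose(rs, rt):
    """Return structural correspondences (qs, qt) for every matching pair of rules."""
    return [(qs, qt) for qs, ps in rs for qt, pt in rt if is_semantic_subpath(ps, pt)]


# Hypothetical registration mappings for the source output and the target input.
RS = [
    ("/population/sample/meas/cnt",
     ("Observation", "itemMeasured", "AbundanceCount", "hasCount", "NumericValue")),
    ("/population/sample/lsp",
     ("Observation", "hasContext", "MeasContext", "appliesTo", "LifeStageProperty")),
]
RT = [
    ("/cohortTable/measurement/obs",
     ("Observation", "itemMeasured", "AbundanceCount", "hasCount", "NumericValue")),
    ("/cohortTable/measurement/phase",
     ("Observation", "hasContext", "MeasContext", "appliesTo", "EcologicalProperty")),
]

correspondences = compose(RS, RT)
for qs, qt in correspondences:
    print(qs, "->", qt)
```

In this toy setup the `lsp` rule matches the `phase` rule only because LifeStageProperty is a subconcept of EcologicalProperty, which is the role the subconcept operation plays in the composition.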
44. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Framework Operations and Properties
In the paper, we define:
– Registration mapping properties (cardinality consistency and partial complete registrations) and discuss the impact on determining structural transformations
– The simple XPath and Semantic Path languages for defining registration mappings, and the corresponding semantic join operator to find correspondences
45. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Outline
• The SEEK Project
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• A Simple Framework Implementation
• Future Work
46. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Future Work
• Extend the registration mapping language
– XPath is too limited …
→ try a more general query language (e.g., XPath + variables)
→ relational/Datalog based substructure selection (query)
• Formalize the properties of registration mappings and their effect on automated transformation
• Introduce conversion routines (e.g., for units) at the ontology level; apply them in transformations
• Extend transformations to different computation models and workflow scheduling algorithms
• Add to the Kepler Scientific Workflow System
47. Bowers & Ludäscher – Ontology-Driven Data Transformations, DILS’04, Leipzig
Acknowledgements
• NSF/ITR Science Environment for Ecological Knowledge
• NSF/ITR Geosciences Network
• NIH Biomedical Informatics Research Network
• DOE Scientific Data Management Center