Scientists publish computational experiments in ways that do not facilitate reproducibility or reuse. Significant domain expertise, time and effort are required to understand scientific experiments and their research outputs. To improve this situation, mechanisms are needed to capture the exact details and the context of computational experiments. Only then will Intelligent Systems be able to help researchers understand, discover, link and reuse products of existing research.
In this presentation I will introduce my work and vision towards enabling scientists to share, link, curate and reuse their computational experiments and results. In the first part of the talk, I will present my work on capturing and sharing the context of scientific experiments by using scientific workflows and machine-readable representations. Thanks to this approach, experiment results are described in an unambiguous manner, have a clear trace of their creation process and include a pointer to the sources used for their generation. In the second part of the talk, I will describe examples of how the context of scientific experiments may be exploited to browse, explore and inspect research results. I will end the talk by presenting new ideas for improving and benefiting from the capture of context of scientific experiments, and for involving scientists in the process of curating and creating abstractions on available research metadata.
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
1. Capturing Context in Scientific Experiments:
Towards Computer-Driven Science
Daniel Garijo
Information Sciences Institute and
Department of Computer Science
https://w3id.org/people/dgarijo
@dgarijov
dgarijo@isi.edu
2. A prediction of the future… from the past
Useful for:
• Every day tasks
  • Organize agenda
  • Calls
  • Look for information
• Research features
  • Summarize related work
  • Reuse and comparison of work
  • Highlights
  • Do new data analyses
Source: https://www.businessinsider.com.au/apple-future-computer-knowledge-navigator-john-sculley-george-lucas-2017-10,
https://www.youtube.com/watch?v=QRH8eimU_20
The knowledge navigator (Apple, 1987)
3. Meeting expectations…
• In terms of Data
  • Open datasets
  • Open metadata portals
• In terms of Software
  • Open Source repositories
  • Containers and virtual machines
• In terms of Publications
  • Open journals
  • Open methods/protocols
4. What are we missing?
• Methods in publications are not designed for intelligent systems
• Objectives, hypotheses, methodology and conclusions are tailored for humans
• The link between data, software and publications is not clear (if it exists at all)
• Functionality and instructions for executing software require specific domain expertise
• Publications are difficult to reuse and reproduce
Retracted Scientific Studies: A Growing List - NYTimes.com
The retraction by Science of a study of changing attitudes about gay marriage is
the latest prominent withdrawal of research results from scientific literature.
And it very likely won't be the last. A 2011 study in Nature found a 10-fold
increase in retraction notices during the preceding decade.
Many retractions barely register outside of the scientific field. But in some
instances, the studies that were clawed back made major waves in societal
discussions of the issues they dealt with. This list recounts some prominent
retractions that have occurred since 1980.
In 1998, The Lancet, a British medical journal,
published a study by Dr. Andrew Wakefield
that suggested that autism in children was
caused by the combined vaccine for measles,
mumps and rubella. In 2010, The Lancet
retracted the study following a review of Dr.
Wakefield's scientific methods and financial
conflicts.
Despite challenges to the study, Dr.
Wakefield's research had a strong effect on
many parents. Vaccination rates tumbled in
Britain, and measles cases grew. American
antivaccine groups also seized on the research. The United States had more
cases of measles in the first month of 2015
than the number that is typically diagnosed in a full year.
Vaccines and Autism
5. The Cost of Reproducibility
• Necessary to fill in the gaps
• 2 months of effort in reproducing published method [Kinnings et al, PLOS 2010]
• Authors' expertise was required
[Figure: steps of the reproduced method: Comparison of ligand binding sites; Comparison of dissimilar protein structures; Graph network generation; Molecular Docking]
[Garijo et al PLOS]
Collaboration with UCSD
6. Scientist-Driven Science
Scientist
Scientists:
• Keep their own records
• Write their own software
• Data cleaning
• Reformatting
• Analysis
• Run the experiments
• Manually analyze results and compare to state of the art
Scientist + Automated Tools
Automated Tools help:
• Searching
• Setting up execution
• Visualizing
• Sharing
Requirements:
• Data/Dataset metadata
• Software/Software metadata
• Method description
• User/domain expertise
Scientist + Intelligent System
Intelligent Systems help:
• Comparing
• Reusing/Repurposing
• Testing new hypotheses
• Explaining results
Requirements:
• Functionality
• Relations between data, software and method
• Provenance
Context of a computational experiment
7. Outline
• Capturing and publishing context of computational experiments
  • From scientific workflows to Linked Data
  • Capturing software functionality
  • Representing software metadata
• Using context to facilitate reusability and exploration of experiments
  • Detecting commonalities among experiments
  • Explaining computational results
• Using context in Intelligent Systems
  • Hypothesis testing
  • Environmental sciences modeling
• A vision for context capture in computer-driven science
8. Introduction
Background: Computational Experiments
[Figure: laboratory practice next to its computational counterpart: Lab book / Digital Log; Laboratory Protocol (recipe) / Scientific Workflow; Experiment / In silico experiment]
11. Requirements
Requirements for workflow representation [Garijo et al., 2017 FGCS]
• Workflow template description
• Workflow execution trace description
• Workflow attribution
• Workflow metadata
• Link between templates and executions
Vocabularies:
• Plan: P-Plan [Garijo et al 2012], http://purl.org/net/p-plan
• Provenance: PROV (W3C) [Lebo et al 2013], http://www.w3.org/ns/prov#
• Dublin Core, PROV (W3C)
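As a rough illustration of how the requirements for workflow representation could map onto P-Plan and PROV, the sketch below describes a two-step template and one executed step as plain (subject, predicate, object) tuples. Every `ex:` resource and both helper functions are hypothetical. The P-Plan and PROV term names are written as I understand those vocabularies and should be checked against the published specifications before reuse.

```python
# Minimal sketch: one workflow template (P-Plan) linked to one execution (PROV).
# Triples are plain (subject, predicate, object) tuples; every "ex:" resource
# is hypothetical and only serves the illustration.
PPLAN = "http://purl.org/net/p-plan#"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace


def build_example():
    """Return a set of triples for a two-step template and one executed step."""
    t = set()
    # Workflow template description: a plan, its steps and a connecting variable.
    t.add((EX + "plan", RDF_TYPE, PPLAN + "Plan"))
    for step in ("stem", "weight"):
        t.add((EX + step, RDF_TYPE, PPLAN + "Step"))
        t.add((EX + step, PPLAN + "isStepOfPlan", EX + "plan"))
    t.add((EX + "stemmed", RDF_TYPE, PPLAN + "Variable"))
    t.add((EX + "stemmed", PPLAN + "isOutputVarOf", EX + "stem"))
    t.add((EX + "weight", PPLAN + "hasInputVar", EX + "stemmed"))
    # Workflow execution trace description: what ran and what it produced.
    t.add((EX + "run1_stem", RDF_TYPE, PROV + "Activity"))
    t.add((EX + "file42", RDF_TYPE, PROV + "Entity"))
    t.add((EX + "file42", PROV + "wasGeneratedBy", EX + "run1_stem"))
    # Workflow attribution.
    t.add((EX + "file42", PROV + "wasAttributedTo", EX + "alice"))
    # Link between templates and executions.
    t.add((EX + "run1_stem", PPLAN + "correspondsToStep", EX + "stem"))
    t.add((EX + "file42", PPLAN + "correspondsToVariable", EX + "stemmed"))
    return t


def steps_of(triples, plan):
    """List the steps that declare themselves part of the given plan."""
    return sorted(s for s, p, o in triples
                  if p == PPLAN + "isStepOfPlan" and o == plan)


triples = build_example()
print(steps_of(triples, EX + "plan"))
```

Keeping the template and the execution as separate resources, joined only by the "corresponds to" links, is what lets one template be related to many executions.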
13. Publishing workflows as Linked Data
Publishing scientific workflows as Linked Data
Why Linked Data?
• Facilitates exploitation of workflow resources in a homogeneous manner
Adapted methodology from [Villazón-Terrazas et al 2011]
Tested it for the WINGS workflow system
Step 1: Specification
Base URI = http://www.opmw.org/
Ontology URI = http://www.opmw.org/ontology/
Assertion URI = http://www.opmw.org/export/resource/ClassName/instanceName
Examples:
http://www.opmw.org/export/resource/WorkflowTemplate/ABSTRACTSUBWFDOCKING
http://www.opmw.org/export/resource/WorkflowExecutionAccount/ACCOUNT1348629350796
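The URI scheme from the specification step can be sketched as a small helper. The function name `assertion_uri` and the percent-encoding of both path segments are assumptions made here for illustration; the slide only gives the pattern and the two example URIs.

```python
from urllib.parse import quote

BASE_URI = "http://www.opmw.org/"
RESOURCE_PATH = "export/resource/"


def assertion_uri(class_name, instance_name):
    """Build <Base URI>export/resource/<ClassName>/<instanceName>.

    Percent-encoding the two segments is an assumption, added so that
    names with spaces or slashes still give a valid URI.
    """
    return (BASE_URI + RESOURCE_PATH
            + quote(class_name, safe="") + "/" + quote(instance_name, safe=""))


print(assertion_uri("WorkflowTemplate", "ABSTRACTSUBWFDOCKING"))
# prints http://www.opmw.org/export/resource/WorkflowTemplate/ABSTRACTSUBWFDOCKING
```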
14. Publishing workflows as Linked Data
Step 2: Modeling
[Figure: vocabularies used for modeling: OPMW, P-Plan, OPM, DC, PROV]
15. Publishing workflows as Linked Data
Step 3: Generation
[Figure: workflow templates and workflow executions from the workflow system go through an OPMW export, which produces OPMW RDF]
16. Publishing workflows as Linked Data
Step 4: Publication
[Figure: OPMW RDF goes through an RDF upload interface into an RDF triple store with a SPARQL endpoint, and into a permanent web-accessible file store]
17. Publishing workflows as Linked Data
Step 5: Exploitation
[Figure: published workflows can be accessed with Curl, a Linked Data browser, the SPARQL endpoint or a workflow explorer]
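To make the exploitation step more concrete, here is a minimal sketch of preparing a SPARQL query over HTTP, which is the kind of request a tool like Curl would send to a SPARQL endpoint. The endpoint address is a placeholder and the query (listing resources typed as `opmw:WorkflowTemplate`) is an assumption based on the ontology URI and example URIs given earlier. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address: replace with the real SPARQL endpoint of the repository.
ENDPOINT = "http://example.org/sparql"

# Assumed query: list a few resources typed as workflow templates.
QUERY = """
PREFIX opmw: <http://www.opmw.org/ontology/>
SELECT ?template WHERE { ?template a opmw:WorkflowTemplate } LIMIT 10
"""


def build_request(endpoint, query):
    """Prepare a SPARQL query as an HTTP GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req); that call is left out
# because the endpoint above is only a placeholder.
```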
19. Capturing software functionality
[Garijo et al 2014a] (Collaboration with U. of Manchester)
Is it possible to generalize workflow steps based on their functionality in an experiment?
• What kind of data manipulations are performed in a workflow?
• E.g.:
  • Data retrieval
  • Data preparation
  • Data curation
  • Data visualization
  • etc.
20. Capturing software functionality
[Garijo et al 2014a] (Collaboration with U. of Manchester)
Analyzed software steps of 260 workflows from 4 different workflow systems
Created a catalog of workflow step functionalities (motifs)
Guidelines for annotating workflows
Catalog available at: http://purl.org/net/wf-motifs#
[Figure: workflows analyzed per workflow system: 89 + 125 + 26 + 20 = 260 workflows]
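Once steps are annotated with motifs, simple statistics over a corpus become easy to compute. The sketch below assumes a hypothetical annotation format (workflow name mapped to a list of step/motif pairs) and made-up workflows. Only the motif names come from the examples listed in the talk.

```python
from collections import Counter

# Hypothetical annotations: workflow -> list of (step name, motif).
annotations = {
    "wf1": [("fetch", "Data retrieval"),
            ("clean", "Data preparation"),
            ("plot", "Data visualization")],
    "wf2": [("download", "Data retrieval"),
            ("filter", "Data curation")],
}


def motif_frequency(annotations):
    """Fraction of workflows in which each motif appears at least once."""
    counts = Counter()
    for steps in annotations.values():
        # A set makes each motif count once per workflow.
        counts.update({motif for _, motif in steps})
    total = len(annotations)
    return {motif: n / total for motif, n in counts.items()}


freq = motif_frequency(annotations)
print(freq["Data retrieval"], freq["Data curation"])
```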
22. Capturing Software Metadata
[Gil et al 2015]
• Scientific workflows capture some software metadata
• A large amount of software is not used in scientific workflows
• Software in open repositories often has missing metadata
  • How to use it?
  • What can I use it with?
  • What are the dependencies?
  • Is it still maintained?
  • How can I contribute?
  • …
• Ontology for scientific software metadata
• Described with scientists in mind:
  • How can scientists contribute to populate it?
  • What do scientists need in terms of software?
23. Software Metadata: Categories
Used in the OntoSoft metadata registry:
http://ontosoft.org/portals
http://ontosoft.org/software
24. Using the ontology in the OntoSoft software registry
[Figure: annotated screenshots of the OntoSoft registry]
• Software entries from distributed repositories are readily accessible
• Semantic search
• Comparison matrix of software entries (PIHM, PIHMgis, DrEICH, TauDEM, WBMsed)
• Metadata completion highlighted
• Software is contrasted by property
26. Detecting commonalities in computational experiments
[Garijo et al 2014b]
Problems to address:
• Workflows have many detailed steps and may be difficult to understand
• The general method may not be apparent
• How are different workflows related?
• What steps do they have in common?
[Figure: Workflow 1, Workflow 2 and Workflow 3, drawn as graphs of steps labelled A to H, with their common workflow fragments marked]
27. A method for detecting reusable workflow fragments
[Garijo et al 2014b]
Step 1 of 4
[Figure: example workflow with nodes Dataset, Stemmer algorithm, Result, Term weighting algorithm and FinalResult, next to a reduced version with only Stemmer algorithm and Term weighting algorithm]
• Duplicated workflows are removed
• Single-step workflows are removed
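The two filtering rules above can be sketched as follows. The workflow representation (a dict with step labels and labelled edges) and the choice to treat two workflows as duplicates when their sorted steps and edges match are assumptions for illustration. The published method may use a different notion of equality.

```python
def canonical(wf):
    """Order-independent key for a workflow (assumed duplicate criterion)."""
    return (tuple(sorted(wf["steps"])), tuple(sorted(wf["edges"])))


def prepare(workflows):
    """Drop single-step workflows and keep one copy of each duplicate."""
    seen, kept = set(), []
    for wf in workflows:
        if len(wf["steps"]) < 2:
            continue  # single-step workflows are removed
        key = canonical(wf)
        if key in seen:
            continue  # duplicated workflows are removed
        seen.add(key)
        kept.append(wf)
    return kept


# Hypothetical corpus of three workflows.
wf_a = {"steps": ["Stemmer", "TermWeighting"],
        "edges": [("Stemmer", "TermWeighting")]}
wf_b = {"steps": ["TermWeighting", "Stemmer"],
        "edges": [("Stemmer", "TermWeighting")]}
wf_c = {"steps": ["Plot"], "edges": []}

kept = prepare([wf_a, wf_b, wf_c])
print(len(kept))
```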
28. A method for detecting reusable workflow fragments
[Garijo et al 2014b]
Step 2 of 4
Popular graph mining techniques (frequent subgraph mining, FSM):
• Inexact FSM: uses heuristics to calculate the similarity between two graphs. The solution might not be complete
• Exact FSM: delivers all the possible fragments to be found in the dataset
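To give a feel for what "exact" means here, the sketch below enumerates every single-edge fragment that reaches a minimum support across a corpus. This is a deliberately reduced stand-in, not the algorithm used in the talk: real exact FSM algorithms also grow and count larger subgraphs. The workflows, step labels and the `min_support` parameter are hypothetical.

```python
from collections import Counter


def frequent_edges(workflows, min_support):
    """Return every edge that occurs in at least min_support workflows.

    Each workflow is a list of (source step, target step) edges. Only
    fragments of size one edge are considered in this simplified sketch.
    """
    support = Counter()
    for edges in workflows:
        support.update(set(edges))  # count each edge once per workflow
    return {edge: n for edge, n in support.items() if n >= min_support}


# Hypothetical corpus of three workflows over steps labelled with letters.
corpus = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("B", "F")],
    [("A", "B"), ("B", "C"), ("C", "G")],
]

result = frequent_edges(corpus, min_support=2)
print(result)
```

Because every candidate is counted, no frequent single-edge fragment can be missed, which is the property that heuristic (inexact) approaches give up in exchange for speed.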
29. A method for detecting reusable workflow fragments
[Garijo et al 2014b]
Step 3 of 4
Remove redundant fragments
30. A method for detecting reusable workflow fragments
[Garijo et al 2014b]
Step 4 of 4
Link fragments back to the workflows where they were found
http://purl.org/net/wf-fd
31. Workflow fragment assessment
Research question: Are our proposed workflow fragments useful?
• A fragment is useful if it has been designed and (re)used by a user.
• Comparison between proposed fragments and user designed fragments (groupings) and workflows
32. Workflow fragment assessment
Metrics: Precision and recall
Compared sets: Fragments (F), Workflows (W), Groupings (G)
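The slide names the metrics but not their formulas, so the sketch below is one plausible reading rather than the definition used in the study. It assumes a proposed fragment counts as a hit when it equals a user grouping or a user workflow, with precision measured over F and recall over the union of G and W. Fragments are stood in for by hypothetical string identifiers.

```python
def precision_recall(fragments, groupings, workflows):
    """Precision and recall of proposed fragments against user-made structures.

    Assumed definitions (not stated in the source):
      precision = |F ∩ (G ∪ W)| / |F|
      recall    = |F ∩ (G ∪ W)| / |G ∪ W|
    """
    proposed = set(fragments)
    reference = set(groupings) | set(workflows)
    hits = proposed & reference
    precision = len(hits) / len(proposed) if proposed else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical identifiers standing in for fragments, groupings and workflows.
F = {"f1", "f2", "f3", "f4"}
G = {"f1", "f5"}
W = {"f2"}

p, r = precision_recall(F, G, W)
print(p, r)
```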
33. Workflow fragment assessment
Workflow corpora
User Corpus 1 (WC1)
• Designed mostly by a single user
• 790 workflows (475 after data preparation)
User Corpus 2 (WC2)
• Created by a user, with collaborations of others
• 113 workflows (96 after data preparation)
Multi User Corpus 3 (WC3)
• Workflows submitted by 62 users during the month of Jan 2014
• 5859 workflows (357 after data preparation)
User Corpus 4 (WC4)
• Designed mostly by a single user
• 53 workflows (50 after data preparation)
34. Workflow fragment assessment
Result assessment
• 30%-60% of proposed fragments are equal to user defined groupings or workflows
• 40%-80% of proposed fragments are equal or similar to user defined groupings or workflows
Commonly occurring patterns are potentially useful for users designing workflows
What about the rest of the fragments? Are those useful?
35. Workflow fragment assessment
User feedback: user survey
Q1: Would you consider the proposed fragment a valuable grouping?
• I would not select it as a grouping (0)
• I would use it as a grouping with major changes (i.e., adding/removing more than 30% of the steps) (1)
• I would use it as a grouping with minor changes (i.e., adding/removing less than 30% of the steps) (2)
• I would use it as a grouping as it is (3)
Q2: What do you think about the complexity of the fragment?
• The fragment is too simple (0)
• The fragment is fine as it is (1)
• The fragment has too many steps (2)
Not enough evidence to state that all proposed workflow fragments are useful
37. Using captured context to explain results
[Gil and Garijo 2016]
Current methods in papers are ambiguous, incomplete and described at inconsistent levels of detail
[Figure: workflow steps: Comparison of ligand binding sites; Comparison of dissimilar protein structures; Graph network generation; Molecular Docking]
Excerpt from a published method description:
The SMAP software was used to compare the binding sites of the 749 M.tb protein structures plus 1,446 homology models (a total of 2,195 protein structures) with the 962 binding sites of 274 approved drugs, in an all-against-all manner. While the binding sites of the approved drugs were already defined by the bound ligand, the entire protein surface of each of the 2,195 M.tb protein structures was scanned in order to identify alternative binding sites. For each pairwise comparison, a P-value representing the significance of the binding site similarity was calculated.
38. Using captured context to explain results
[Gil and Garijo 2016]
Goal: Automatically generate reports from computer-generated data analysis records
• Reports must:
  • Be truthful to actual events
  • Enable inspection
  • Be human-understandable
  • Abstract details
• Ideally:
  • Become part of papers
  • Have persistent evidence
  • Be adapted to different audiences/expertise/purpose
39. Data Narratives
1. A record of events that describe a new result
  • A workflow and/or provenance of all the computations executed
2. Persistent entries for key entities involved
  • URIs/DOIs for data, software versions, workflow, …
3. Narrative account(s)
  • Human-consumable rendering(s) that include pointers to the detailed records and entries
  • Each account is generated for a different audience/purpose
  • A casual reader, a close colleague, someone inspecting how the work was done, someone reproducing the work
40. Data Narrative Accounts: An example
• Execution view
  • Inputs, parameters and main outputs
  • “Topic modeling was run on the Reuters R8 dataset (10.6084/m9.figshare.776887), and English Words dataset (10.6084/m9.figshare.776888), with iterations set to 100, stop word size set to 3, number of topics set to 10 and batch size set to 10. The results are at 10.6084/m9.figshare.776856”
• Data view
  • Just the data that influenced the results
  • “The topics at 10.6084/m9.figshare.776856 were found in the Reuters R8 dataset (10.6084/m9.figshare.776887) and English Words dataset (10.6084/m9.figshare.776888)”
• Method view
  • Main steps based on their functionality
  • “Topic training was run on the input dataset. The results are product of PlotTopics, a visualization step”
41. Data Narrative Accounts: An example
• Dependency view
  • How the steps depend on each other
  • “First, the input data is filtered by Stop Words, followed by Small Words, Format Dataset, and Train Topics. The final results are produced by Plot Topics”
• Implementation view
  • How the steps were implemented in the execution
  • “Train topics was implemented using Latent Dirichlet allocation”
• Software view
  • Details on the software used to implement the steps
  • “The train topics step was generated with Online LDA open source software, written in Java. Plot topics was generated with the Termite software.”
42. DANA: DAta NArratives
[Figure: DANA architecture, with components: Experiment Records (input), Provenance Repository, Software registry, Data Narrative aggregator, Experiment-specific Knowledge Base, Query patterns, DANA Generator and Narrative accounts (output). Labelled interactions: resource request/response, get pattern, get query pattern result]
1. Identify which experiment records to describe
2. Generation of an experiment-specific knowledge base
3. Creation of the Data Narrative from templates
4. Produce narrative accounts
https://knowledgecaptureanddiscovery.github.io/DataNarratives/
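The idea of creating a narrative from templates can be sketched with Python's standard `string.Template`. This is not DANA's code: the record layout, the template wording and the helper names are assumptions made for illustration. The dataset names, DOIs and parameter values are those of the execution view example shown in the talk.

```python
from string import Template

# Assumed template for an "execution view" account.
EXECUTION_VIEW = Template(
    "$method was run on $inputs, with $params. The results are at $output"
)


def join_items(items):
    """Join items as 'a', 'a and b' or 'a, b and c'."""
    items = list(items)
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def execution_view(record):
    """Render inputs, parameters and main output of one experiment record."""
    inputs = join_items(f"the {name} dataset ({doi})"
                        for name, doi in record["inputs"])
    params = join_items(f"{key} set to {value}"
                        for key, value in record["parameters"].items())
    return EXECUTION_VIEW.substitute(method=record["method"], inputs=inputs,
                                     params=params, output=record["output"])


# Hypothetical record layout filled with the values from the example account.
record = {
    "method": "Topic modeling",
    "inputs": [("Reuters R8", "10.6084/m9.figshare.776887"),
               ("English Words", "10.6084/m9.figshare.776888")],
    "parameters": {"iterations": 100, "stop word size": 3,
                   "number of topics": 10, "batch size": 10},
    "output": "10.6084/m9.figshare.776856",
}

text = execution_view(record)
print(text)
```

Other views (data, method, dependency, and so on) would follow the same pattern with a different template and a different selection of fields from the record.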
43. Formative evaluation
• Survey with 6 target scenarios
• Each scenario:
  • Description of a situation where a user has to do a task
  • A workflow sketch of the analysis done
  • Six candidate narratives of that workflow sketch
• 12 responses from users
• Results
  • Each narrative is considered appropriate for describing some scenario
  • Different users chose different narratives for each scenario
45. Using Context for Hypothesis Testing
[Gil et al 2016]
[Figure: a hypothesis (“Protein PRKCDBP is expressed in samples of patient P36”) is tested against data through workflows and meta-workflows (comparison, hypothesisRevision), leading to a revision (“PRKCDBP mutation is expressed in P36”)]
46. Hypothesis Testing: My Contribution
[Garijo et al 2017]
[Figure: two versions of a hypothesis, organised into Statement, Qualifier, Evidence and History graphs (HS1/HQ1/HE1/HG1 and HS2/HQ2/HE2/HG2). HS1 states: Protein EGFR, AssociatedWith, Colon Cancer. HS2 states: Protein EGFR, AssociatedWith, Colon Cancer SubtypeA, and is a revisionOf HS1. Each version is linked by wasGeneratedBy to an execution (Execution 1, Execution 2) and carries hasConfidenceReport (C1) and hasConfidenceLevel (L1, L2)]
The DISK Ontology: http://disk-project.org/ontology/disk/
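The structure in the figure (a statement, a confidence qualifier, evidence in the form of the generating execution, and a revision history) can be mirrored in a small data structure. The class, its field names and the confidence values below are hypothetical and are not the DISK ontology's terms. Only the EGFR example and the relation names in the comments are taken from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    """One version of a hypothesis (hypothetical structure, not DISK itself)."""
    subject: str
    predicate: str
    obj: str
    confidence_level: Optional[float] = None      # qualifier (hasConfidenceLevel)
    generated_by: Optional[str] = None            # evidence (wasGeneratedBy)
    revision_of: Optional["Hypothesis"] = None    # history (revisionOf)

    def history(self) -> List["Hypothesis"]:
        """Return this version followed by all earlier versions."""
        chain, current = [], self
        while current is not None:
            chain.append(current)
            current = current.revision_of
        return chain


# Confidence values are made up for the example.
h1 = Hypothesis("Protein EGFR", "associatedWith", "Colon Cancer",
                confidence_level=0.6, generated_by="Execution 1")
h2 = Hypothesis("Protein EGFR", "associatedWith", "Colon Cancer SubtypeA",
                confidence_level=0.8, generated_by="Execution 2",
                revision_of=h1)

print([h.obj for h in h2.history()])
# prints ['Colon Cancer SubtypeA', 'Colon Cancer']
```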
47. Using Context for Environmental Sciences Modeling
Work in progress
• Modeler wants to predict a situation
  • E.g., impact of drought in the Amazon
• Intelligent system assists:
  • Finding data of interest
  • Connecting environmental models: hydrology, economy, agronomy, etc.
  • Facilitating the execution of models
  • Visualizing results
My contribution:
• Extending our software ontology to capture requirements of environmental models
• Relating variables to inputs, units, time, etc.
[Figure: given a Scenario, an Intelligent System uses a Data Catalog and a Model Catalog to connect a Climate Model, a Hydrology Model, an Economy model, …, through variables and predictions. Variables shown: Albedo, Soil moisture, Soil quality, Precipitation, Commodity prices, Property rights, Market access, Crop/forest yields, Land use, Household type]
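One way to picture "connecting environmental models" is to match the variables one model produces with the variables another model needs. The sketch below does this over a toy catalog. The catalog layout and the assignment of variables to models are assumptions for illustration only, since the talk does not say which model produces or consumes which variable.

```python
# Hypothetical model catalog: which variables each model reads and writes.
catalog = {
    "climate": {"inputs": set(),
                "outputs": {"precipitation", "albedo"}},
    "hydrology": {"inputs": {"precipitation"},
                  "outputs": {"soil moisture"}},
    "economy": {"inputs": {"soil moisture", "commodity prices"},
                "outputs": {"crop/forest yields"}},
}


def find_links(catalog):
    """List (producer, variable, consumer) triples where models can be chained."""
    links = []
    for producer, p in catalog.items():
        for consumer, c in catalog.items():
            if producer == consumer:
                continue
            for variable in p["outputs"] & c["inputs"]:
                links.append((producer, variable, consumer))
    return sorted(links)


links = find_links(catalog)
print(links)
```

Variables that no model produces (here, commodity prices) are the ones that would have to come from a data catalog instead.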
49. Where are we headed?
[Figure: from Scientist-Driven Science to Computer-Driven Science: Scientist; Scientist + Automated Tools; Scientist + Intelligent System; Intelligent System + Scientist]
• Can an Intelligent System co-author a paper? Can it be an author?
• Can it win a Nobel prize? [Kitano, ISWC 2016]
• What do we need to capture (in Software, Data, Methods, Provenance)?
  1. Functionality and abstraction
  2. Granularity
  3. Importance
50. Next steps for context capture in computational experiments
• Capturing different levels of abstraction in experiments
• Using user expertise to curate captured context
  • What do users consider important?
• Improve explanation of details
  • How can we identify the core function of a software step?
• Represent the goal and objectives of a computational experiment
51. Summing up
• Context is needed to understand and reuse computational experiments
• Sharing context from computational experiments
  • Scientific workflows and their executions
  • Software functionality and metadata
• Getting value out of context
  • Reusability, exploration, explanation
  • Used to power intelligent systems!
• Next steps
  • Representing functionality and levels of abstraction
  • Interact with users to curate context
52. Special thanks
• Yolanda Gil
• Varun Ratnakar
• Oscar Corcho
• Pinar Alper
• Khalid Belhajjame
• Asuncion Gomez Perez
• Idafen Santana Perez
• Felisa Verdejo
• Francisco Garijo
53. References
• [Kinnings et al, PLOS 2010]: Kinnings SL, Xie L, Fung KH, Jackson RM, Xie L, Bourne PE (2010) The Mycobacterium tuberculosis Drugome and Its Polypharmacological Implications. PLoS Comput Biol 6(11): e1000976. https://doi.org/10.1371/journal.pcbi.1000976
• [Garijo et al PLOS]: Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLoS ONE 8(11): e80278. https://doi.org/10.1371/journal.pone.0080278
• [Garijo et al 2014a]: Garijo, D.; Alper, P.; Belhajjame, K.; Corcho, O.; Gil, Y.; and Goble, C. Common motifs in scientific workflows: An empirical analysis. Future Generation Computer Systems, 36: 338-351. 2014.
• [Garijo et al 2014b]: Garijo, D.; Corcho, O.; Gil, Y.; Gutman, B. A.; Dinov, I. D.; Thompson, P.; and Toga, A. Fragflow automated fragment detection in scientific workflows. In e-Science (e-Science), 2014 IEEE 10th International Conference on, volume 1, pages 281-289, 2014. IEEE
• [Gil and Garijo 2016]: Gil, Y.; and Garijo, D. Towards Automating Data Narratives. In Proceedings of the 22nd International Conference on Intelligent User Interfaces, pages 565-576, 2017. ACM
• [Garijo et al 2017]: Garijo, D.; Gil, Y.; and Ratnakar, V. The DISK Hypothesis Ontology: Capturing Hypothesis Evolution for Automated Discovery. In Proceedings of the Workshop on Capturing Scientific Knowledge (SciKnow), held in conjunction with the ACM International Conference on Knowledge Capture (K-CAP), Austin, Texas, 2017.
• [Garijo et al 2017 FGCS]: Garijo, D.; Gil, Y.; and Corcho, O. Abstract, link, publish, exploit: An end to end framework for workflow sharing. Future Generation Computer Systems. 2017.
• [Gil et al 2015]: Gil, Y.; Ratnakar, V.; and Garijo, D. OntoSoft: Capturing scientific software metadata. In Proceedings of the 8th International Conference on Knowledge Capture, pages 32, 2015. ACM
• [Kitano ISWC 2016]: Kitano, H. Artificial Intelligence to Win the Nobel Prize and Beyond: Creating the Engine for Scientific Discovery. Keynote. http://iswc2016.semanticweb.org/pages/program/keynote-kitano.html
Editor's Notes
This slide details what we can do to fix the current situation
Data driven, usually represented as Directed Acyclic Graphs (DAGs)
State the benefits (briefly)
Workflow template and instance: steps and their dependencies
Workflow execution trace: provenance of the results
Experiment metadata: specific methods, author contribution, etc.
P-Plan is simple and extensible (to cater to cases that require more complex wf operators)
Say that P-Plan has been used for describing scientific processes in social sciences and lab protocols
State that the focus is workflow description
Explain that this is necessary to relate software together, and to capture the role of software in an experiment
Overview of the steps here. Say clearly that
Motivation.
Functionality: Relation between similar software, data and methods. GOALS of a method.
Granularity: What level of detail is needed to communicate a finding?
Importance: Which analyses are important? What are the most important steps?
In this slide, I could mention potential collaboration opportunities, such as AMRs and work from Gully to represent tables from papers