1. Executable Papers:
publishing science
that works
Anita de Waard, Elsevier Labs
HCLS Scientific Discourse Group
June 20, 2011
2. Elsevier Challenges
Goals:
- Invite and survey ideas in innovative science publishing
- Create a community of people working on similar issues, from different
backgrounds/viewpoints
Rules:
- Open submission; very interdisciplinary panel of judges; open publication of
submissions
- IPR stays with the author; in case of commercial development, Elsevier has
the right of first refusal
Challenges so far:
- 2008/9: Elsevier Grand Challenge for knowledge enhancement in the life
sciences: http://www.elseviergrandchallenge.com
- 2010/11: ISMB Killer App award: rewarding bioinformatics apps that work for
biologists http://killerapp.iscb.org/
- 2011: Elsevier Executable Paper challenge: http://www.executablepapers.com/
3. Executable Paper Challenge
Driven by issues in publishing computational science:
- How can we develop a model for executable files that is compatible
with the user’s operating system and architecture and adaptable to
future systems?
- How do we manage very large file sizes?
- How do we validate data and code, and decrease the reviewer’s
workload?
- How do we support registering and tracking the actions taken on the
‘executable paper’?
Co-organised with the International Conference on Computational Science
(http://www.iccs-meeting.org):
- Aimed at high-performance computing and the (geo/eco/bio/chem)-informatics fields
- In the end, the challenge participants came from a different community!
4. The Finalists:
http://www.executablepapers.com/finalists.html
1. SHARE - a web portal for creating and sharing executable research papers
http://sites.google.com/site/executablepaper/
2. A data and code model for reproducible research and executable papers
http://dirac.cnrs-orleans.fr/~hinsen/executable_paper_challenge.tar.gz
3. A-R-E: The Author-Review-Execute environment
http://iwb.fluidops.com:7878/resource/AREpaper
4. Planetary System: Web 3.0 and Active Documents
https://trac.mathweb.org/planetary/wiki/EPCDemo
5. Paper Mache: Creating Dynamic Reproducible Science
http://oware.cse.tamu.edu:8080/
6. A Provenance Based Infrastructure for Creating Executable Papers
http://www.vistrails.org/index.php/ExecutablePapers
7. Universal Identifier for Computation Results
http://vcr.stanford.edu
8. R2 Platform for Reproducible Research
http://rsquared.stat.uni-muenchen.de/
9. The Collage Authoring Environment
http://collage.cyfronet.pl
5. SHARE - a web portal for creating
and sharing executable research papers
http://sites.google.com/site/executablepaper/
- built to house the submissions to the Transformation Tool Contest (TTC)
- an environment where all software and data related to the paper is
optimally installed and ready for (temporary and secure) evaluation
- a specific virtual machine image can be instantiated within the paper
- SHARE supports multiple operating systems, both at the level of the remote
virtual machines and at the level of the connecting clients running on the
user’s machine
- more than 100 heterogeneous images have been contributed by different
research communities so far
6. A-R-E: The Author-Review-Execute environment
http://iwb.fluidops.com:7878/resource/AREpaper
- A data-driven, loosely coupled, and distributed approach to support
the life cycle of an (executable) paper: authoring, reviewing, publication
and study:
- finding which paragraph provides the piece of information pertinent
to a reference
- navigating from data points in a plot to the items in the raw
experimental data that produced them (e.g. pointing to an Excel
sheet column of experimental data)
- navigating into the program code that generated a specific data set
- Based on a semantic wiki
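The figure-to-data navigation described above can be sketched in a few lines. This is a hypothetical illustration, not the actual A-R-E data model: each computed plot point keeps a back-reference to the raw records it was derived from, so a reader can navigate from the figure to the underlying data.

```python
RAW_DATA = [  # stand-in for an Excel sheet of experimental records
    {"sample": "A", "value": 1.5},
    {"sample": "A", "value": 2.5},
    {"sample": "B", "value": 4.0},
]

def plot_point(sample):
    # Compute one point for a plot and record which raw rows fed into it,
    # enabling navigation from the published figure back to the data.
    rows = [i for i, rec in enumerate(RAW_DATA) if rec["sample"] == sample]
    mean = sum(RAW_DATA[i]["value"] for i in rows) / len(rows)
    return {"y": mean, "source_rows": rows}

point = plot_point("A")
# point["source_rows"] identifies rows 0 and 1 as the underlying data
```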
7. A Provenance Based Infrastructure for
Creating Executable Papers
http://www.vistrails.org/index.php/ExecutablePapers
- VisTrails provides a mechanism to store provenance for workflows
- Code and plug-ins for LaTeX, Wiki, Microsoft Word, and PowerPoint
- CrowdLabs (http://www.crowdlabs.org) to allow papers to point to results
that can be executed on a remote server and interactively explored from a
Web browser
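The provenance idea behind VisTrails can be sketched minimally as follows; the class and method names here are invented for illustration, not the VisTrails API. Each executed workflow step logs its name, parameters, and upstream dependencies, so a result in a paper can point back at the exact pipeline that produced it.

```python
class Workflow:
    """Toy workflow runner that keeps a provenance log of executed steps."""

    def __init__(self):
        self.provenance = []  # ordered log: one record per executed step

    def run_step(self, name, func, *args, upstream=(), **params):
        # Execute one step and append its provenance record.
        output = func(*args, **params)
        self.provenance.append(
            {"step": name, "params": params, "upstream": list(upstream)}
        )
        return output

wf = Workflow()
data = wf.run_step("load", lambda: [3, 1, 2])
result = wf.run_step("sort", sorted, data, upstream=["load"])
# wf.provenance now records both steps and the dependency between them
```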
8. Universal Identifier for Computation Results
http://vcr.stanford.edu
- Verifiable Computational Result (VCR): a computational result (e.g. table,
figure, chart, dataset), together with metadata describing in detail the
computations that created it. Every computation automatically generates a
detailed chronicle of its inputs and outputs as part of the process
execution; the chronicle is automatically stored in a standard format in a
VCR repository for later access
- Verifiable Result Repository (Repository): a web-services provider that
archives VCRs and later serves up views of specific computational results
- Verifiable Result Identifier (VRI): a URL (web address) that universally
and permanently identifies a repository and causes it to serve up views of
a specific VCR; a DOI-like string that permanently and uniquely identifies
the chronicle associated with that result and the repository that can
serve views of that chronicle
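The chronicle-plus-identifier scheme can be sketched as below. This is a minimal illustration of the idea, not the Stanford VCR implementation: the repository here is an in-memory dict, and the `vcr://` identifier scheme is invented for the example.

```python
import hashlib
import json

def run_with_chronicle(func, inputs, repository):
    # Run a computation, record a chronicle of its inputs and output,
    # archive it, and return a DOI-like identifier for later retrieval.
    result = func(**inputs)
    chronicle = {"function": func.__name__, "inputs": inputs, "result": result}
    # Content-based digest: the same computation always yields the same ID.
    digest = hashlib.sha256(
        json.dumps(chronicle, sort_keys=True).encode()
    ).hexdigest()[:16]
    repository[digest] = chronicle  # archive the chronicle
    vri = "vcr://example-repository/" + digest  # hypothetical VRI (URL)
    return result, vri

def mean(xs):
    return sum(xs) / len(xs)

repo = {}
value, vri = run_with_chronicle(mean, {"xs": [1.0, 2.0, 3.0]}, repo)
# repo now holds the chronicle that the VRI permanently identifies
```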
9. The Collage Authoring Environment
http://collage.cyfronet.pl
- an environment that enables authors to seamlessly embed chunks of
executable code (called assets) into scientific publications:
- input forms: used by the user to feed input data into the running
experiment
- visualizations: render an experiment result so it can be viewed
directly in the research paper
- code snippets: embed an editable view of the code which enacts a
specific computation and may be used to generate additional assets
- allows repeated execution of these assets on underlying computing and
data storage resources
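The interplay of the three asset types can be sketched as follows; the names and the `result` convention are hypothetical illustrations, not the actual Collage API. A code-snippet asset is re-executed against values supplied through an input form, and the output feeds a visualization.

```python
def run_snippet(source, inputs):
    # Execute a code-snippet asset against reader-supplied input-form
    # values; the snippet is expected to bind its output to `result`.
    namespace = dict(inputs)
    exec(source, namespace)
    return namespace["result"]

snippet = "result = sum(samples) / len(samples)"  # editable code asset
form_values = {"samples": [4, 8, 6]}              # values from an input form
value = run_snippet(snippet, form_values)         # re-executable on demand
print("mean =", value)                            # stand-in for a visualization
```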
10. Next step: The Executable Journal?
- Ideally, we’d like all these tools to work together
- In fact, we’d like that to be how we communicate informatics/computer
science!
- Submit a paper with a piece of working code
- The code works on the platform
- The code stays on the platform, and is available for other papers
to run on, too!
- Advantages:
- Clearer communication of software
- Less reinvention of the wheel
- More collaboration
11. In other words:
“I like the idea of [...] a research object corresponding to a
PhD thesis sitting on the (digital) library shelf and then being
re-executed as new data comes along. So the thesis sits
there and new results (or papers, or research objects) pop
out. I like this example because it involves tying down the
method and letting the data flow, instead of the widely held
view that the data sits there and methods are applied to it.
[...]
These papers then become a way of distributing data and
methods in a highly usable and user-centric way [...]. So
scientists don't need to download and install tools and learn
user interfaces. They just interact with the published
executable papers...”
Dave De Roure, email to Wf4ever group
12. What does this have to do with HCLS?
- Might be a good area to explore this in?
- E.g. interchange of annotations that we are
exploring w/Tim Clark’s group...
- Next step:
- Funding?
- Format?
- Platform?
- Thoughts??