Presentation at the event "Let's do it together: How to implement Open Science Practices in Research Projects" (29/11/2019), organised by Universidad Politécnica de Madrid, where we discuss the need to take into account not only open access and open research data, but also all the other artefacts that result from our research processes.
Open Data (and Software, and other Research Artefacts): A proper management
1. Open Data (and Software, and
other Research Artefacts):
A proper management
Seminar: Let’s Do it Together! How to implement
Open Science practices in Research Projects
Universidad Politécnica de Madrid
29/11/2019
With contributions from Esteban González, Daniel Garijo,
Idafen Santana, Olga Giraldo
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho, @opencitydata_es
https://www.slideshare.com/ocorcho
2. License
• This work is licensed under the license
CC BY-NC-SA 4.0 International
• http://purl.org/NET/rdflicense/cc-by-nc-sa4.0
• You are free:
• to Share — to copy, distribute and transmit the work
• to Remix — to adapt the work
• Under the following conditions
• Attribution — You must attribute the work by inserting
• “[source Oscar Corcho]” at the footer of each reused slide
• a credits slide stating: “These slides are partially based on
“Open Data (and Software, and other Research Artefacts):
A proper management” by O. Corcho”
• Non-commercial
• Share-Alike
3. The key messages of my talk...
Open Science ≠ Open Access
Science is not only about papers (other objects exist)
Open Science = Open Access + Research Data Management
+ Research Object Management
We all need principled approaches and clear guidelines
(community or institution driven) to adopt an Open
Science approach
We expect this (non-extra) work to pay off in the future
4. Outline
• From Open (Government) Data to Open Science
• Our previous OEG-UPM research and development
to support Open Science practices
• Research Objects
• Systematic (Meta)Data Management in Research
• Ontology-based Representation of Laboratory Protocols
• Reproducibility of Computational Experiments
• Our (practical) understanding of Open Science and
current practices at OEG-UPM
6. What is Open (Government) Data?
• Open data is data that can be freely used, re-used
and redistributed by anyone - subject only, at most, to
the requirement to attribute and sharealike
• Key aspects:
• Availability and access: the data must be available as a
whole and at no more than a reasonable reproduction cost,
preferably by downloading over the Internet. The data must
also be available in a convenient and modifiable form.
• Re-use and redistribution: the data must be provided
under terms that permit re-use and redistribution including
the intermixing with other datasets.
• Universal participation: everyone must be able to use, re-
use and redistribute - there should be no discrimination
against fields of endeavour or against persons or groups
[source: Open Data Handbook, http://opendatahandbook.org/en/what-is-open-data/ ]
7. Relevant Legislation. Europe and Spain
• Open Access Initiative (2001). Scientific information; > 510 orgs
• Aarhus Convention (1998). Right to participate and access; 41
countries and the EU
• Convention on official documentation access (2009). 12 countries
• (Open Data and) PSI-reuse Directives (2003/98/EC, 2013/37/EU and
(EU) 2019/1024)
• https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information
• List of high-value datasets: geospatial, Earth Observation and environment, meteorological, statistics,
companies and company ownership, mobility
• Law 37/2007. PSI reuse (transposition of Directive 2003/98/EC)
• Modified by Law 18/2015 (BOE 10/07/2015, Directive 2013/37/EU)
• Directive (EU) 2019/1024 to be transposed by 17/07/2021
• Law 11/2007. Citizens' right of access to good-quality public services
• RD 4/2010 Esquema Nacional de Interoperabilidad (National Interoperability Framework)
• Open standards, technology neutral, open source
• RD 1495/2011. Develops Law 37/2007 for national agencies
• Norma Técnica de Interoperabilidad (Technical Interoperability Standard; 19/02/2013, BOE 4/3/2013)
[source: based on material from Antonio Rodríguez Pascual (CNIG)]
9. Some of our activities in Open (Government) Data
Culture (@BNE), Geography (@IGN), Meteorology (@AEMET)
Cities (@Zaragoza, Gob Aragón, Catalogues)
Host of esDBpedia
UNE 178301:2015, standard on Open Data for Smart Cities
11. Open Scientific Data vs Open Government Data (I)
• Is Open Data in Science actually much different from Open
Government Data?
• NO
• “freely used, re-used and redistributed by anyone - subject
only, at most, to the requirement to attribute and sharealike”
• Funders encourage the generation of open research data
• E.g., guidelines on FAIR Data Management H2020
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• YES
• Not such a long history of legislation
• Initially, most work focused on open access (papers)
• Not only available for use and reuse, but also for reproducibility
• It is often not useful without the rest of research artefacts that
come together with it (methods, software, protocols, papers)
12. Open Scientific Data vs Open Government Data (II)
The same explosion of Data Portals
General-purpose Region-specific
Domain-specific (e.g. Astronomy) Institution-specific
13. Open Scientific Data vs Open Government Data (III)
• And a good number of alternative technologies
• And a good number of metadata schemas
• DataCite
• CrossRef
• CKAN Metadata
• DDI4
• DCAT
• …
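To make one of these schemas concrete, the following is a minimal sketch of a DCAT dataset description serialised as JSON-LD. It only uses terms from the DCAT and Dublin Core vocabularies (dcat:Dataset, dcat:distribution, dcat:downloadURL, dcat:mediaType, dct:title, dct:description, dct:license). The helper name dcat_dataset, the dataset title and every example.org URL are hypothetical, introduced only for illustration.

```python
import json


def dcat_dataset(title, description, license_url, download_url, media_type_url):
    """Build a minimal DCAT dataset description as a JSON-LD dictionary."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dct:license": {"@id": license_url},
        # A dataset may have several distributions (CSV, JSON, ...); one is shown.
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": {"@id": media_type_url},
            }
        ],
    }


# Hypothetical dataset: all values below are placeholders, not real resources.
record = dcat_dataset(
    title="Example photometer readings",
    description="Night-sky brightness readings from a hypothetical sensor network.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    download_url="https://example.org/data/readings.csv",
    media_type_url="https://www.iana.org/assignments/media-types/text/csv",
)

print(json.dumps(record, indent=2))
```

The same facts could be expressed with DataCite or CKAN metadata instead; the point of the sketch is only that a handful of fields (title, licence, download location, format) already make a dataset findable and reusable.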
14. Are we applying what we learned in Open Gov Data?
• Some of the same mistakes are being made
• Setting up a portal/infrastructure does not mean that you are
better than others
• Having more objects in your repository does not mean that
you are doing more or better Open Science
• No clear instructions on what to upload or not, and how to
ensure quality (except for mature domains or organisations)
• No clear governance (handled by researchers?, handled by
data centers?, handled by libraries?)
• And a few more things
• No clear relationship among all research artefacts
• No clear relationship between the Data Management Plans
and the way in which data is finally handled
16. How do we do Science? Main components
[source: Idafen Santana]
17. The life of our researchers at OEG-UPM
[Diagram: life cycle of a Research Object (RO), with the roles Scientist and Librarian/Curator]
• Scientist: Live RO
• "My supervisor calls me to report my work": RO snapshot (<<copy>>, <<versionOf>>). Permanent URI, some metadata, some curation, mostly private (for my group)
• "My supervisor calls me again and we decide to publish our RO+paper": RO snapshot (<<copy>>, <<versionOf>>). Permanent URI, some metadata, some curation, mostly private (for my group and for paper reviewers)
• "Reviews received and final version published": Archived RO (<<copy, filter and curate>>), handled by the Librarian/Curator. Permanent URI, good metadata and curation, mostly public
• "A new PhD student continues my work": Live RO (<<copy>>)
18. Research Objects
A Research Object bundles and relates the digital resources of a scientific experiment
or investigation using standard mechanisms, “tool middleware”
http://www.w3.org/community/rosc/
http://www.researchobject.org/
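The idea of bundling resources and moving them from a live Research Object to a snapshot and then to an archived version can be sketched as a small data structure. This is an illustration only, not the Research Object model itself: the class ResearchObject, the functions snapshot and archive, the field names, the direction of the version_of links and all example.org URIs are assumptions made for this sketch.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ResearchObject:
    """A bundle of related resources with a URI and a life-cycle status."""
    uri: str
    resources: Tuple[str, ...]
    status: str = "live"            # "live", "snapshot" or "archived"
    visibility: str = "private"     # "private" or "public"
    version_of: Optional[str] = None


def snapshot(live: ResearchObject, uri: str) -> ResearchObject:
    """Copy a live RO into a snapshot with its own permanent URI."""
    return replace(live, uri=uri, status="snapshot", version_of=live.uri)


def archive(snap: ResearchObject, uri: str,
            keep: Callable[[str], bool]) -> ResearchObject:
    """Copy, filter and curate a snapshot into a public archived RO."""
    kept = tuple(r for r in snap.resources if keep(r))
    return ResearchObject(uri=uri, resources=kept, status="archived",
                          visibility="public", version_of=snap.uri)


# Hypothetical example: file names and URIs are placeholders.
live = ResearchObject(
    uri="https://example.org/ro/live",
    resources=("data/readings.csv", "scripts/analysis.py", "notes/scratch.txt"),
)
snap = snapshot(live, "https://example.org/ro/snapshot-1")
arch = archive(snap, "https://example.org/ro/archived-1",
               keep=lambda r: not r.startswith("notes/"))

print(arch)
```

Making the objects immutable means every snapshot is a new object that points back to its source, which mirrors the <<copy>> and <<versionOf>> relations in the life-cycle diagram.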
19. Systematic (meta)data Management in Research
• Open (Research) Data portals
• Data Management
• Data Publication
• DOIs
• Sensor Data (photometers)
• Management
• Visualisation
» And Citizen Science
23. How do we do it at OEG-UPM?
• Which research artefacts do we handle at OEG-UPM?
• Papers (sure, let’s see the following talk by UPM’s library)
• Data Management Plans (DMPOnline; PaGoDa did not exist)
• Datasets
• Normally in GitHub, e.g. https://github.com/oeg-upm/btn100
• Software source code
• Normally in GitHub: http://www.github.com/oeg-upm
• Docker images, models and APIs
• Normally in DockerHub: https://hub.docker.com/u/oegupm/
• Ontologies, thesauri, etc.
• Normally in GitHub, e.g.,
https://github.com/CiudadesAbiertas/vocab-sector-publico-agenda-municipal
• And published online, e.g.,
http://vocab.ciudadesabiertas.es/def/sector-publico/agenda-municipal/
• …
24. And which are our (good) practices?
• Still missing many, but...
• When a research project or experiment starts, a new GitHub
repository is created
• The repository is connected to Zenodo, so as to get DOIs
and ensure archival
• Automated archival process after every release
• DOIs also added to the GitHub repository
• Our papers cite those DOIs
• Bit.ly, dropbox, GDrive links, etc., are strictly prohibited in
our papers
• Zenodo community
• https://zenodo.org/communities/ontologyengineeringgroup/
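The rule against short or non-archival links in papers can be checked automatically before submission. The following is a minimal sketch under stated assumptions: the host list, the function name find_prohibited_links and the sample text (including the placeholder DOI) are hypothetical, and the check covers plain http(s) URLs only.

```python
import re
from urllib.parse import urlparse

# Hosts treated as non-archival. This list is an assumption for illustration.
PROHIBITED_HOSTS = {"bit.ly", "dropbox.com", "drive.google.com"}

# Matches http(s) URLs up to the next whitespace or closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def find_prohibited_links(text):
    """Return the URLs in text whose host is a prohibited host or a subdomain of one."""
    hits = []
    for raw in URL_PATTERN.findall(text):
        url = raw.rstrip(".,;")  # drop trailing sentence punctuation
        host = (urlparse(url).hostname or "").lower()
        if any(host == h or host.endswith("." + h) for h in PROHIBITED_HOSTS):
            hits.append(url)
    return hits


# Hypothetical paper excerpt; the DOI is a placeholder, not a real record.
sample = (
    "Data is archived at https://doi.org/10.5281/zenodo.0000000 and "
    "mirrored at https://bit.ly/abc, also https://www.dropbox.com/s/x "
    "(see https://drive.google.com/file/d/1)."
)

for link in find_prohibited_links(sample):
    print("Replace with a DOI or other permanent identifier:", link)
```

A check like this could run over the paper source in the same GitHub repository that holds the data and code, so that a link which is not permanent is caught before it reaches reviewers.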
25. The key messages of my talk...
Open Science ≠ Open Access
Science is not only about papers (other objects exist)
Open Science = Open Access + Research Data Management
+ Research Object Management
We all need principled approaches and clear guidelines
(community or institution driven) to adopt an Open
Science approach
We expect this (non-extra) work to pay off in the future
Editor's Notes
http://opendatahandbook.org/en/what-is-open-data/
To share your research materials (RO as a social object)
To facilitate reproducibility and reuse of methods
To be recognized and cited (even for constituent resources)
To preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun)
Middleware