Open Science Globally: Some Developments/Dr Simon Hodson
Simon Hodson, Executive Director, CODATA
www.codata.org
SA-EU Open Science Dialogue Workshop
Centurion Lake Hotel
15 May 2018
Why Open Science / FAIR Data?
• Good scientific practice depends on communicating the evidence.
• Open research data are essential for reproducibility, self-correction.
• Academic publishing has not kept up with the age of digital data.
• Danger of a replication / evidence / credibility gap.
• Boulton: to fail to communicate the data that supports scientific
assertions is malpractice
• Open data practices have transformed certain areas of research.
• Genomics and related biomedical sciences; crystallography;
astronomy; areas of earth systems science; various disciplines using
remote sensing data…
• FAIR data enable the use of data at scale, including by machines, harnessing
technological potential.
• Research data often have considerable potential for reuse,
reinterpretation, use in different studies.
• Open data foster innovation and accelerate scientific discovery through
reuse of data within and outside the academic system.
• Research data produced by publicly funded research are a public
asset.
Policy Push for Open Research Data
• Bits of Power: Issues in Global Access to Scientific Data (1997)
• The three Bs (Budapest, Berlin and Bethesda) and Open Access, 2002-3
• OECD Principles and Guidelines on Access to Research Data, 2004, 2007
• UK Funder Data Policies, from 2001, accelerating from 2009
• NSF Data Management Plan Requirements, 2010
• Royal Society Report ‘Science as an Open Enterprise’, 2012
• OSTP Memo ‘Increasing Access to the Results of Federally Funded Scientific Research’,
Feb 2013
• G8 Science Ministers Statement, June 2013
• G8 Open Data Charter and Technical Appendix, June 2013
• EC H2020 Open Data Policy Pilot, 2014; Adoption of FAIR Data Principles, 2017.
• Science International Accord on Open Data in a Big Data World, Dec 2015:
http://bit.ly/opendata-bigdata
Developments: Donor Policies
Bill and Melinda Gates Foundation, Open Access and Open Data Policy
https://www.gatesfoundation.org/how-we-work/general-information/open-access-policy
‘Data Underlying Published Research Results Will Be Accessible and Open Immediately. The
foundation will require that data underlying the published research results be immediately accessible
and open. This too is subject to the transition period and a 12-month embargo may be applied.’
MSF Data Sharing Policy: http://www.msf.org/en/msf-data-sharing-policy
‘MSF recognizes the ethical imperative it has to share its data openly, transparently and in a timely
manner for the greater public health good.’
Appropriate restrictions for consent, privacy, etc.
European Commission Data Policy: ‘as open as possible, as closed as necessary’, FAIR Data
Wellcome Trust: strong support for Open Data sharing, with appropriate restrictions.
Developments: Journal Policies
Dryad Joint Data Archiving Policy, Feb 2010: http://datadryad.org/jdap
This journal requires, as a condition for publication, that data supporting the results in the
paper should be archived in an appropriate public archive, such as GenBank, TreeBASE,
Dryad, or the Knowledge Network for Biocomplexity.
PLOS Data Availability Policy, revised Feb 2014:
http://www.plosone.org/static/policies.action#sharing
PLOS journals require authors to make all data underlying the findings described in their
manuscript fully available without restriction, with rare exceptions.
Springer Nature initiative to standardise policies:
http://www.springernature.com/gp/group/data-policy/policy-types
An RDA Interest Group is developing standardised journal data policies.
Open Science, FAIR Data:
Commons, Clouds, Platforms…
Commons: ‘collectively owned and managed by a community of users’
Clouds: European Open Science Cloud (not just European, not entirely Open, not just for
science and not exclusively cloud technology)…
Platform Approaches:
brokerage for discovery and access, reinforced by the development of common
standards and principles or policies (e.g. GEOSS, Research Data Australia);
brokerage of services: approaches for discovery and access, augmented by the
provision of services for particular research disciplines, including the promotion of
skills, training, competences, standards, tools for analysis, etc. (e.g. Elixir, CESSDA and
other ESFRIs, CGIAR on a global scale);
platform environment: utilizing the capacity of Cloud Computing for efficiency, access
management, analysis across vast numbers of datasets, marketisation of services in a
platform economy in which standards and common rules minimize vendor lock-in (e.g.
NIH Data Commons, European Open Science Cloud).
EOSC Declaration
https://ec.europa.eu/research/openscience/pdf/eosc_declaration.pdf
Data culture, Open access by default, Skills, Data stewardship,
Rewards and incentives, FAIR principles, Standards, FAIR Data
governance, Implementation and transition to FAIR, Research data
repositories, Accreditation / certification, Data Management Plans,
Technical implementation, Citation system, Common catalogues,
Semantic layer, FAIR tools and services, Data expert organisations,
Legal aspects, EOSC architecture, Implementation, Legacy, User
needs, Service provision, Service deployment, Thematic areas,
Research infrastructures, EU-added value and coordination, HPC
and the EOSC, Innovation, Governance model, Governance
framework, Executive board, Coordination structure, Long-term
sustainability, Funding, Global aspects.
[FAIR principles] Implementation of the FAIR principles must be
pragmatic and technology-neutral, encompassing all four
dimensions: findability, accessibility, interoperability and reusability.
FAIR principles are neither standards nor practices. The disciplinary
sectors must develop their specific notions of FAIR data in a
coordinated fashion and determine the desired level of FAIR-ness.
FAIR principles should apply not only to research data but also to
data-related algorithms, tools, workflows, protocols, services and
other kinds of digital research objects.
Emerging Policy Consensus? FAIR Data
• FAIR Data (see original guiding principles at https://www.force11.org/node/6062)
• Findable: have sufficiently rich metadata and a unique and persistent identifier.
• Accessible: retrievable by humans and machines through a standard protocol;
open and free by default; authentication and authorization where necessary.
• Interoperable: metadata use a ‘formal, accessible, shared, and broadly applicable
language for knowledge representation’.
• Reusable: metadata provide rich and accurate information; clear usage license;
detailed provenance.
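To make the four FAIR attributes above more concrete, the following Python sketch checks a dataset's metadata record for the properties they name. It is a minimal illustration under stated assumptions: the `DatasetRecord` fields and the `fair_gaps` function are hypothetical, and the checks are crude proxies (a DOI-style `10.` prefix stands in for a persistent identifier, HTTP(S) stands in for "a standard protocol"). Because FAIR principles are neither standards nor practices, any real assessment would depend on criteria defined by each discipline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical, minimal metadata record; field names are illustrative only."""
    identifier: str = ""         # persistent identifier, e.g. a DOI ("10.<prefix>/<suffix>")
    title: str = ""              # descriptive metadata supporting discovery
    access_url: str = ""         # where humans and machines can retrieve the data
    metadata_standard: str = ""  # shared community standard, e.g. "DDI" or "Darwin Core"
    license: str = ""            # clear usage licence, e.g. "CC-BY-4.0"
    provenance: str = ""         # how the data were produced and processed


def fair_gaps(record: DatasetRecord) -> List[str]:
    """Return crude FAIR gaps for a record; an empty list means none were detected.

    These checks are illustrative heuristics, not a FAIR assessment standard.
    """
    gaps = []
    # Findable: a DOI-style persistent identifier plus at least a title.
    if not record.identifier.startswith("10."):
        gaps.append("F: no DOI-style persistent identifier")
    if not record.title.strip():
        gaps.append("F: no descriptive title")
    # Accessible: retrievable through a standard protocol (only HTTP(S) is checked here).
    if not record.access_url.startswith(("https://", "http://")):
        gaps.append("A: no HTTP(S) access URL")
    # Interoperable: metadata declared against a shared, community standard.
    if not record.metadata_standard.strip():
        gaps.append("I: no metadata standard declared")
    # Reusable: an explicit licence and provenance information.
    if not record.license.strip():
        gaps.append("R: no usage licence")
    if not record.provenance.strip():
        gaps.append("R: no provenance information")
    return gaps


if __name__ == "__main__":
    incomplete = DatasetRecord(title="Example survey", access_url="https://example.org/data")
    for gap in fair_gaps(incomplete):
        print(gap)
```

In this sketch the record with only a title and a URL reports four gaps: identifier, metadata standard, licence and provenance. Returning a list of gaps rather than a single pass/fail flag reflects the idea that FAIR-ness is a matter of degree.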
European Commission Expert Group on FAIR Data
Core Deliverables
1. To develop recommendations on what needs to be done to turn each component of the FAIR data principles into reality.
2. To propose indicators to measure progress on each of the FAIR components.
3. To actively support the creation of the FAIR Data Action Plan by proposing a list of concrete actions as part of its Final Report.
4. Draft for consultation, released 11 June 2018; final report October 2018.
5. To support the Commission in presenting the FAIR Data Action Plan in Autumn 2018.
Report Structure
1. Concepts: Why FAIR?
2. Creating a culture of FAIR data
3. Making FAIR data a reality: technical perspective
4. Skills and capacities for FAIR data
5. Measuring Change
6. Facilitating Change: a FAIR Data Action Plan
International ‘ecosystem’ of open science and FAIR data components
Open Science infrastructure is not just the network, storage and compute.
Ecosystem of components which are created and governed internationally.
Reporting Research Outputs: information systems for research output reporting (CRIS), metadata
standards e.g. CERIF, managed by euroCRIS.
Persistent and Unique Identifiers: DOIs for articles (CrossRef); DOIs for data sets (DataCite); author IDs
(ORCID).
Data and Metadata Standards: CIF in crystallography, FITS in astronomy, DDI in social science surveys,
Darwin Core in biodiversity, etc.
DCC Registry of Metadata Standards http://www.dcc.ac.uk/resources/metadata-standards ; now
maintained by RDA IG http://rd-alliance.github.io/metadata-directory/
Data Repositories: listed in Re3Data, registry of data repositories: https://www.re3data.org/
Trusted Data Repositories: CoreTrustSeal https://www.coretrustseal.org/, a merger of the Data Seal of
Approval and the World Data System certification criteria.
Criteria for Trustworthy Digital Archives (DIN 31644) http://www.data-archive.ac.uk/curate/trusted-digital-repositories/standards-of-trust?index=3
Audit and certification of trustworthy digital repositories (ISO 16363) http://www.data-archive.ac.uk/curate/trusted-digital-repositories/standards-of-trust?index=2
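Persistent identifiers such as the ORCID author IDs listed in this ecosystem are designed to be machine-actionable. One small example is that their final character is a check digit computed with ISO 7064 MOD 11-2, as described in ORCID's documentation. The Python sketch below validates the format and checksum of an ORCID iD. The function names are hypothetical. A valid checksum only shows the string is well formed. It does not show that the iD is actually registered.

```python
import re

# Four groups of four characters; the last character may be "X" (check value 10).
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has the hyphenated format and a correct check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # True
    print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check digit
```

Because the checksum is computed locally, a repository or publishing system could catch mistyped iDs before trying to resolve them.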
African Open Science Platform
Vision of a coordinating activity to help put in place and link the enabling practices,
capacities and technologies for Open Science.
Pan-African in ambition.
Current three-year pilot funded by the Department of Science and Technology via the National
Research Foundation; delivered by ASSAf, directed by CODATA.
Ambitious programme of engagement with a number of African countries and key stakeholders
(universities, academies, NRENs, emerging research infrastructures).
Preparing the foundations for a broader initiative in terms of outputs and network building.
Successful first strategy workshop (March 2018), to be followed by a stakeholder workshop (Sept
2018) to prepare the platform initiative.
Aim for this to be launched at Science Forum South Africa, Dec 2018.
Framework for Policies, Incentives, Training and Technical Infrastructures
Key deliverables of the pilot project will be foundations for the platform in these four key
areas:
1. Frameworks and guidance to assist policy development at national and institutional
level.
2. Study and recommendations to reduce barriers and provide constructive incentives for
Open Science.
3. Framework for data science training (including RDM, data stewardship and science of
data); curriculum framework, training materials, recommendations for training
initiatives.
4. Framework and roadmap for data infrastructure development: emphasising
partnerships, economies of scale and de-duplication between national systems,
institutions and domain initiatives.
African Open Science Platform:
Suggested Phase Two Activities
1. Registry of African data initiatives, collections and services
2. Coordination and provision of network, compute and storage (building on current work of
NRENs, targeting needs of Open Science, achieving economies of scale).
3. A virtual space for scientists to find, deposit, manage, share and reuse data, software and
metadata (i.e. support for and/or provision of FAIR data components, data stewardship and
Research Infrastructures).
4. An African Data Science Institute (to develop African capacities at the international cutting
edge of research in data analytics, artificial intelligence, machine learning and data
stewardship).
5. Major data-intensive programmes in science areas where Africa is data-asset rich (process
for identifying these areas, obtaining funding, ensuring that RIs are in place).
6. Network for Education and Skills in Data and Information (training programmes in data
science, data stewardship, data literacy, targeted at all stages of education).
7. Network for Open Science Access and Dialogue (building full engagement and joint action in
transdisciplinary and citizen science initiatives as an essential component of Open Science).
INTERNATIONAL DATA WEEK
IDW 2018
Gaborone, Botswana: 5-8 November 2018
Information: http://internationaldataweek.org/
Deadline for abstracts, 31 May:
https://www.scidatacon.org/IDW2018/
International Data Week
Keynotes
Joy Phumaphi, former Minister of Health,
Botswana; co-chair of WHO Group on
Family and Community Health.
Rob Adam, Director of SKA South Africa, a
major African science and data initiative.
Ismail Serageldin, founding Director of the
new Biblioteca Alexandrina, noted thinker
on science policy issues.
Elizabeth Marincola, former CEO of PLOS;
now leading the African Academy of
Sciences publication initiatives (see AAS
Open Research).
Tshilidzi Marwala, Vice-Chancellor of the University of
Johannesburg, noted thinker in Big Data
and AI.
CODATA-RDA School of
Research Data Science
• Annual foundational school at ICTP, Trieste (with the
objective of building a network of partners and training
the trainers).
• Advanced workshops, ICTP, Trieste, following the
foundational school.
• National or regional schools, organised with local
partners.
2018
• Next #DataTrieste Summer School, 6-17 August 2018.
• Next #DataTrieste Advanced Workshops 20-24 August
2018.
• Call for applications, deadline 21 May:
http://www.codata.org/datatrieste2018
• Schools in Brisbane (UQ and Australian Academy of
Science); ICTP Kigali (October); ICTP São Paulo
(December).
Simon Hodson
Executive Director CODATA
www.codata.org
http://lists.codata.org/mailman/listinfo/codata-international_lists.codata.org
Email: simon@codata.org
Twitter: @simonhodson99
Tel (Office): +33 1 45 25 04 96 | Tel (Cell): +33 6 86 30 42 59
CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, 75016 Paris,
Thank you for your attention!
SciDataCon, part of International Data Week
SciDataCon aims to help the research data community ensure that it has a concrete scientific record of its
work: peer-reviewed abstracts > presentations > Special Collection in the Data Science
Journal.
Themes and Scope: see
https://www.scidatacon.org/conference/IDW2018/conference_themes_and_scope/
Approved Sessions: https://www.scidatacon.org/conference/IDW2018/approved_sessions/
An incredibly rich range of topics. If you do not find a suitable topic there, you can submit an abstract
to General Submissions.
Abstracts can be submitted to Approved Sessions or to General Submissions. All abstracts will be peer
reviewed and distributed into the programme.
Abstracts may be submitted for presentations and for lightning talks/posters.
Deadline is 31 May: https://www.scidatacon.org/conference/IDW2018/call_for_papers/
CODATA-RDA School of Research
Data Science
• Contemporary research – particularly
when addressing the most significant,
interdisciplinary research challenges –
increasingly depends on a range of skills
relating to data.
• These skills include the principles and
practice of Open Science; research data
management and curation, how to
prepare a data management plan and to
annotate data; software and data
carpentry; principles and practices of
visualisation; data analysis, statistics and
machine learning; use of computational
infrastructures. The ensemble of these
skills, relating to data in research, can
usefully be called ‘Research Data Science’.
DataTrieste Film on Vimeo: https://vimeo.com/232209813
Call for applications, deadline 21 May: http://www.codata.org/datatrieste2018
Speaker Notes
Open Science, Open Data and FAIR data have become internationally important for very good reasons.
We have witnessed a policy push for Open Access, Open Research Data and Open Science.
We are experiencing a
It is not an exaggeration to say that there is an emerging policy consensus around FAIR. In an accessible way, FAIR summarises attributes of data which have been stressed in a number of policy documents, including the Royal Society report on Science as an Open Enterprise and its definition of ‘intelligent openness’.
Gaborone International Convention Centre (GICC)
CODATA was established by the International Council for Science to promote the availability and quality of data for all areas of research.
CODATA has three strategic priority areas (please consult the CODATA Strategy and Prospectus for more information):
promoting data principles, policies and practice: recent work includes a survey of research data policies, a report on the value of open data sharing for GEO, the promotion of data citation and the Science International Accord on Open Data in a Big Data World, which has been endorsed by IUCr.
advancing the frontiers of data science: this is done through Task Groups and Working Groups; by means of the Data Science Journal, relaunched with Ubiquity Press; and through regular conferences (henceforth we intend to organise a CODATA Conference in odd years and International Data Week, with RDA and WDS, in even years).
mobilising data capacity (with particular attention to strategies, skills and ‘soft’ infrastructure in LMICs): through the initiative for a foundational curriculum for research data science (the research data science summer schools), the regular Open Data Training Workshops hosted by CODATA China and the capacity-building element of initiatives such as the African Open Science Platform.