DataverseNL as structured data hub
1. dans.knaw.nl
DANS is an institute of KNAW and NWO
The development of DataverseNL data repository
as Structured Data Hub
Dataverse Community Meeting, 16th of June, 2017
Harvard University
Vyacheslav Tykhonov, Peter Doorn & Marion Wittenberg (DANS)
2. DataverseNL facts
Started as a service at DANS in 2014.
In the Netherlands, DataverseNL was first installed at Utrecht
University in 2010, after which it developed into a
shared service of 15 institutions.
General statistics (June 2017):
• 227 dataverses
• 448 datasets
• 1,569 files
• 7,151 downloads
5. Value of DataverseNL for the Dutch data landscape
• DataverseNL is a service implementing best practices for data management
• The vision of DANS is to store data for ongoing projects in DataverseNL; once a
project is finished, the original and produced datasets should go to a
Trusted Digital Repository
• The community is the biggest value of DataverseNL: hundreds of people use
this service to deposit the data they produce in different projects
• DataverseNL is a collaboration platform that allows researchers from 15
universities and organisations to work together and share the results of their
research with the public
• DataverseNL is a major integration point where datasets from different
disciplines, produced by research communities in the Netherlands, come
together
• DataverseNL can serve as the main entry point for using tools from Virtual
Research Environments (VREs) on various types of objects (data, video,
audio) and for sharing them between members of the community
6. Persistent identifiers (PIDs)
Within the context of DANS’ mission, it is obligatory that every (digital)
object archived via DANS has a PID, so that it can be (re)located and
cited. DANS uses PIDs for both (digital) objects and people.
DataverseNL for ongoing research projects:
• every dataset has its own handle (for Dutch universities)
• revisions of a dataset don't change the handle; each new version changes
only the citation (see the retrieval sketch after this list)
EASY for permanent data archiving (DOIs):
• each archived dataset has a DOI
• every version of a dataset archived from DataverseNL produces a new DOI
• all metadata is exposed in Dublin Core
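As an illustration of how the handle stays stable across versions, here is a minimal sketch that retrieves a dataset's metadata through the standard Dataverse native API. The base URL and handle are placeholders, not a real DataverseNL dataset.

```python
import requests

# Fetch a dataset's metadata by its persistent identifier (handle or DOI)
# via the standard Dataverse native API.
BASE_URL = "https://dataverse.nl"   # assumed installation URL
PID = "hdl:10411/XXXXX"             # hypothetical handle

resp = requests.get(
    f"{BASE_URL}/api/datasets/:persistentId",
    params={"persistentId": PID},
)
resp.raise_for_status()
dataset = resp.json()["data"]

# The handle stays the same across revisions; only the version number
# (and therefore the citation) changes.
version = dataset["latestVersion"]
print(version["versionNumber"], version["versionMinorNumber"])
```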
7. Linking DataverseNL to the Semantic Web
• Our goal is to make DataverseNL dataset metadata available
as Linked Open Data (LOD)
• we're working on markup that uses Schema.org, SKOS and
other vocabularies to migrate the existing metadata schemas
to the Structured Data Hub
• the idea is that every metadata field can be described as a
"subject/predicate/object" triple and linked to the proper
vocabulary (ontology), as in the sketch after this list
• different disciplines and projects have different controlled
vocabularies, so the same metadata can be linked to various
ontologies
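As a minimal sketch of the triple idea, the snippet below expresses one metadata field with rdflib, linking its value to a concept from a controlled vocabulary. All URIs are illustrative placeholders, not DataverseNL's actual mapping.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import SKOS

SDO = Namespace("https://schema.org/")

g = Graph()
g.bind("sdo", SDO)
g.bind("skos", SKOS)

dataset = URIRef("https://hdl.handle.net/10411/XXXXX")       # hypothetical handle
concept = URIRef("http://vocabularies.example.org/history")  # hypothetical concept

# A metadata field as subject/predicate/object: the keyword points to a
# vocabulary concept instead of being stored as a bare string.
g.add((dataset, SDO.name, Literal("Example dataset")))
g.add((dataset, SDO.about, concept))
g.add((concept, SKOS.prefLabel, Literal("History", lang="en")))

print(g.serialize(format="turtle"))
```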
8. Schema.org introduction
“Schema.org is an initiative launched on 2 June 2011 by Bing, Google
and Yahoo! (then operators of the world's largest search engines) to
‘create and support a common set of schemas for structured data
markup on web pages’.” (from Wikipedia)
• We're trying to link dataverses and datasets from
DataverseNL to the proper entities from controlled
vocabularies (see the markup sketch after this list)
• Linked data should go into the Google Knowledge Graph and can
be queried via their API to get triples back
• To credit people who contribute to their knowledge
base, search engines display pages with Linked Data in a
special format
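A minimal sketch of what such markup could look like: a schema.org/Dataset description serialized as JSON-LD, the form search engines harvest for their knowledge graphs. All values are placeholders.

```python
import json

# schema.org/Dataset markup as JSON-LD; typically embedded in a landing
# page inside <script type="application/ld+json"> ... </script>.
markup = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example DataverseNL dataset",               # placeholder
    "identifier": "https://hdl.handle.net/10411/XXXXX",  # hypothetical handle
    "creator": {"@type": "Person", "name": "J. Researcher"},
    "publisher": {"@type": "Organization", "name": "DANS"},
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
}

print(json.dumps(markup, indent=2))
```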
10. Digital preservation in the Long Term Archive
• DANS has developed a plugin to archive datasets
deposited in DataverseNL temporary storage to a
Trusted Digital Repository (TDR)
• Before putting datasets in the Long Term Archive,
users should create an account in the TDR and get the proper
permissions to archive their data
• The archival plugin is open source software and can be
easily extended to support any TDR that accepts
BagIt packages (see the sketch after this list):
https://github.com/DANS-KNAW/dataverse-bridge
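The plugin itself lives in the repository linked above; as a sketch of the exchange format it targets, the snippet below builds a BagIt package from a directory of exported dataset files with the bagit Python library. The directory name and metadata fields are illustrative, not the plugin's own code.

```python
import bagit  # pip install bagit

# Turn a directory of exported dataset files into a BagIt package,
# the kind of self-describing, checksummed bundle a TDR can ingest.
bag = bagit.make_bag(
    "exported_dataset",                            # directory to package
    {
        "Source-Organization": "DataverseNL",
        "External-Identifier": "hdl:10411/XXXXX",  # hypothetical handle
    },
)

bag.validate()   # checksums in the manifests must match the payload
print(bag.info)
```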
11. Dataset archiving process
The "Archive" button is available to local Dataverse administrators
to push datasets to the EASY archive for long-term preservation.
12. Archived version of the dataset in the TDR
The archived version of the dataset is available on the landing page
of the EASY Trusted Digital Repository and can be cited in research
papers.
13. EASY metadata export to the Linked Open Data cloud
• an OAI-PMH endpoint exposes metadata in the Dublin Core
schema
• a semantic pipeline converts Dublin Core entities to RDF
triples (sketched after this list)
• triples are stored in the Huygens Timbuctoo Linked Data
repository (CLARIAH project) and in Virtuoso (DANS research),
ready for SPARQL querying
• outcome: EASY metadata becomes data input for a Linked
Open Data repository
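A rough sketch of the first two steps: harvest one Dublin Core record over OAI-PMH and re-express its fields as RDF triples. The endpoint URL, record identifier, and DOI are placeholders, and the real pipeline (Timbuctoo, Virtuoso) is more elaborate.

```python
import xml.etree.ElementTree as ET

import requests
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

# Harvest a single record in Dublin Core (oai_dc) over OAI-PMH.
OAI_ENDPOINT = "https://easy.dans.knaw.nl/oai"  # assumed endpoint location
resp = requests.get(OAI_ENDPOINT, params={
    "verb": "GetRecord",
    "metadataPrefix": "oai_dc",
    "identifier": "oai:easy.dans.knaw.nl:easy-dataset:XXXXX",  # placeholder
})
root = ET.fromstring(resp.content)

# Re-express every dc:* element as a triple about the dataset's DOI.
DC_NS = "{http://purl.org/dc/elements/1.1/}"
subject = URIRef("https://doi.org/10.17026/XXXXX")  # hypothetical DOI

g = Graph()
for el in root.iter():
    if el.tag.startswith(DC_NS) and el.text:
        g.add((subject, DC[el.tag.removeprefix(DC_NS)], Literal(el.text.strip())))

print(g.serialize(format="turtle"))  # triples ready for a SPARQL store
```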
15. DataverseNL as a Linked Data repository
A Linked Data object in DataverseNL consists of:
• metadata with authorship and citation information
• a data usage licence
• a handle as persistent identifier
• information on how to obtain a key (API token) to start using the API endpoint(s)
• a link to the API endpoint delivering the data
• a representation of the API (interactive documentation, Swagger)
• data provenance
• controlled vocabularies to meet domain-specific community standards
(optional)
A public demonstration is available on the Dataverse demo website; an
illustrative serialization follows below.
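For illustration only (the demo website shows the real thing), the snippet below gathers the components listed above into one machine-readable description. The property names and URLs are hypothetical, not a fixed DataverseNL schema.

```python
import json

# One possible shape for a Linked Data object combining the components
# listed on the slide; every field value here is a placeholder.
linked_data_object = {
    "pid": "hdl:10411/XXXXX",                      # persistent identifier
    "citation": "Researcher, J. (2017). Example dataset. DataverseNL.",
    "licence": "CC0-1.0",
    "api": {
        "endpoint": "https://dataverse.nl/api/access/datafile/1234",  # hypothetical
        "documentation": "https://dataverse.nl/swagger.json",         # Swagger, hypothetical
        "token_info": "https://dataverse.nl/account/apitoken",        # hypothetical
    },
    "provenance": "derived from survey X, version 2",
    "vocabularies": ["http://vocabularies.example.org/domain"],       # optional
}

print(json.dumps(linked_data_object, indent=2))
```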
16. Linked Data API endpoint example
Source: http://grlc.io/api/CLARIAH/wp4-queries/
(Screenshot labels: Dataverse object with PID; API specification in Swagger)
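grlc (the service at the source URL above) turns SPARQL queries stored in a GitHub repository into a REST API described by a Swagger specification. The sketch below discovers the operations of such an API; the /spec path and the JSON handling are assumptions about grlc's conventions, not something stated in this deck.

```python
import requests

API = "http://grlc.io/api/CLARIAH/wp4-queries"

# Each path in the Swagger spec corresponds to one SPARQL query stored
# in the GitHub repository CLARIAH/wp4-queries.
spec = requests.get(f"{API}/spec").json()
for path in spec.get("paths", {}):
    print(path)

# Calling one operation executes its query and returns the results,
# e.g. (operation name is hypothetical):
# rows = requests.get(f"{API}/some_query",
#                     headers={"Accept": "application/json"}).json()
```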