This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
External controlled vocabularies
support in Dataverse
Slava Tykhonov (DANS-KNAW)
lead software engineer, DANS R&D
Dataverse Community Meeting 2021
16 June 2021
Harvard University
DANS Data Stations - Future Data Services
Dataverse is an API-based data platform and a key framework for Open Innovation!
FAIR and Dataverse
Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
Out of the box CV support in Dataverse (1)
Source: Dataverse Metadata Schema
Out of the box CV support in Dataverse (2)
Internal vocabularies are stored in Dataverse, but we need more CVs!
The importance of standards and ontologies
Generic controlled vocabularies for linking metadata in bibliographic collections are well
known: ORCID, GRID, GeoNames, Getty.
Medical knowledge graphs powered by:
● Biological Expression Language (BEL)
● Medical Subject Headings (MeSH®) by the U.S. National Library of Medicine (NLM), part of the NIH
● Wikidata (Open ontology) - Wikipedia
Integration based on metadata standards:
● MARC21, Dublin Core (DC), Data Documentation Initiative (DDI)
Most prominent ontologies are already available as web services with API endpoints.
Simple Knowledge Organization System (SKOS)
SKOS models thesaurus-like resources:
- skos:Concept instances carry preferred labels and alternative labels (synonyms), expressed as
skos:prefLabel and skos:altLabel.
- A skos:Concept can be related to other concepts with the skos:broader, skos:narrower and skos:related properties.
- Terms and concepts can have more than one broader term or concept.
SKOS makes it possible to create a semantic layer on top of objects: a network of statements and
relationships.
A major difference between SKOS and formal ontologies lies in the logical “is-a” hierarchies of the latter.
In thesauri the hierarchical relation can represent anything from “is-a” to “part-of”.
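The SKOS properties listed above can be illustrated with a small sketch. It is not an RDF model, only a plain-Python stand-in; the class name `Concept`, the helper `ancestors` and the `ex:` URIs are assumptions made for illustration. It shows a concept with two broader concepts, which SKOS permits.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Illustrative stand-in for a skos:Concept (not an RDF model)."""
    uri: str
    pref_label: dict                                 # language code -> skos:prefLabel
    alt_labels: list = field(default_factory=list)   # skos:altLabel synonyms
    broader: list = field(default_factory=list)      # URIs of skos:broader concepts


def ancestors(uri, concepts):
    """Collect every concept reachable by following skos:broader links."""
    seen = set()
    stack = list(concepts[uri].broader)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(concepts[current].broader)
    return seen


# Hypothetical mini-thesaurus; "ex:bioinformatics" has two broader concepts.
concepts = {
    "ex:science": Concept("ex:science", {"en": "Science"}),
    "ex:biology": Concept("ex:biology", {"en": "Biology"}, broader=["ex:science"]),
    "ex:informatics": Concept("ex:informatics", {"en": "Informatics"}, broader=["ex:science"]),
    "ex:bioinformatics": Concept(
        "ex:bioinformatics",
        {"en": "Bioinformatics"},
        alt_labels=["Bio-informatics"],
        broader=["ex:biology", "ex:informatics"],
    ),
}
```

Calling `ancestors("ex:bioinformatics", concepts)` walks both broader branches and meets the shared top concept only once.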
Global Research Identifier Database (GRID) in SKOS
We already have a lot of data in the global Dataverse network.
Can we provide depositors with a convenient web interface to link their metadata to external controlled vocabularies?
Is it possible to disambiguate concepts and create links automatically?
SKOSMOS framework to discover ontologies
● SKOSMOS is developed in Europe by the National Library of Finland (NLF)
● active global user community
● search and browsing interface for SKOS concepts
● multilingual vocabulary support
● used for different use cases (publishing vocabularies, building discovery systems, vocabulary visualization)
SKOSMOS API specification in Swagger
Source: Finto API
SKOSMOS API example for GRID ontology
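As a rough sketch of what such a call could look like, the snippet below builds a search URL for the SKOSMOS REST API (`search` endpoint with the `query`, `lang` and `vocab` parameters). The helper name `skosmos_search_url` and the vocabulary id `grid` are assumptions for illustration; the actual id depends on how the SKOSMOS instance is configured. No network request is made.

```python
from urllib.parse import urlencode, urljoin


def skosmos_search_url(api_base, query, vocab=None, lang="en"):
    """Build a SKOSMOS REST search URL without sending the request."""
    params = {"query": query, "lang": lang}
    if vocab:
        params["vocab"] = vocab   # restrict the search to one vocabulary
    return urljoin(api_base, "search") + "?" + urlencode(params)


# "grid" is a hypothetical vocabulary id; the trailing * asks for prefix matches.
url = skosmos_search_url("http://api.finto.fi/rest/v1/", "Harvard*", vocab="grid")
```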
Semantic Gateway as plugin app (in development)
Source: Dataverse gateway
Dataverse deposit form with selected CVs
Every field could be linked to controlled vocabularies in a FAIR way!
One metadata field linked to many ontologies
The language switch in Dataverse changes the language of the suggested terms!
Improved metadata schema with 4 child fields
Vocabulary and Term are selected by the user; Vocabulary URL and Term URL are filled in automatically:
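A minimal sketch of the four-child model, assuming the vocabulary service returns a suggestion with `prefLabel` and `uri` keys (as SKOSMOS search results do). The class `CvocEntry`, the function `build_entry` and the example URLs are hypothetical names chosen for illustration, not the field names used in Dataverse.

```python
from dataclasses import dataclass


@dataclass
class CvocEntry:
    vocabulary: str       # selected by the user
    term: str             # selected by the user
    vocabulary_url: str   # filled in automatically
    term_url: str         # filled in automatically


def build_entry(vocabulary, suggestion, vocabulary_urls):
    """Derive the two URL fields from the user's two selections."""
    return CvocEntry(
        vocabulary=vocabulary,
        term=suggestion["prefLabel"],
        vocabulary_url=vocabulary_urls[vocabulary],
        term_url=suggestion["uri"],
    )


# Hypothetical configuration and suggestion payload.
vocabulary_urls = {"unesco": "https://vocab.example.org/unesco/"}
suggestion = {"prefLabel": "Open science", "uri": "https://vocab.example.org/unesco/c123"}
entry = build_entry("unesco", suggestion, vocabulary_urls)
```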
Configuration for external controlled vocabularies
Pull Request to Dataverse core https://github.com/IQSS/dataverse/pull/7712
JavaScript interface
The CV interface is implemented in JavaScript and placed outside of the Dataverse application.
Internal:
"js-url": "/resources/js/cvoc-interface.js"
External:
"js-url": "https://raw.githubusercontent.com/Dans-labs/semantic-gateway/main/static/js/interface.js"
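The difference between the two settings can be sketched as follows: an internal `js-url` is a path resolved against the Dataverse site, while an external one is already a full URL. Only the `js-url` key comes from the slide; the function name `resolve_js_url` and the site address `https://dataverse.example.org/` are assumptions.

```python
from urllib.parse import urljoin, urlparse


def resolve_js_url(config, site_url):
    """Return the absolute URL of the CV interface script."""
    js_url = config["js-url"]
    if urlparse(js_url).scheme in ("http", "https"):
        return js_url                     # external script, use as given
    return urljoin(site_url, js_url)      # internal script, served by the site itself


internal = resolve_js_url({"js-url": "/resources/js/cvoc-interface.js"},
                          "https://dataverse.example.org/")
external = resolve_js_url(
    {"js-url": "https://raw.githubusercontent.com/Dans-labs/semantic-gateway/main/static/js/interface.js"},
    "https://dataverse.example.org/",
)
```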
SKOSMOS Python module (SKOSMOS-Client)
from skosmos_client import SkosmosClient
# then you can create your own client
skosmos = SkosmosClient(api_base='http://api.finto.fi/rest/v1/')
Finding the available vocabularies:
Vocabulary id: afo title: AFO - Natural resource and environment ontology
Vocabulary id: allars title: Allärs - General thesaurus in Swedish
Vocabulary id: cn title: Finnish Corporate Names
Vocabulary id: ic title: Iconclass
...
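A listing like the one above could be produced along these lines. This is a sketch that assumes the `skosmos-client` package is installed and that `SkosmosClient.vocabularies(lang=...)` returns dictionaries with `id` and `title` keys; the helper names `format_vocabularies` and `list_finto_vocabularies` are made up for illustration. The network call is kept in a separate function so the formatting can be tried offline.

```python
def format_vocabularies(vocabs):
    """Turn vocabulary records into the 'Vocabulary id: ... title: ...' lines."""
    return ["Vocabulary id: {} title: {}".format(v["id"], v["title"]) for v in vocabs]


def list_finto_vocabularies(lang="en"):
    """Query the Finto API; needs network access and the skosmos-client package."""
    from skosmos_client import SkosmosClient   # third-party import, assumed installed
    skosmos = SkosmosClient(api_base="http://api.finto.fi/rest/v1/")
    return format_vocabularies(skosmos.vocabularies(lang=lang))


# Offline example using two records copied from the listing above.
sample = [
    {"id": "cn", "title": "Finnish Corporate Names"},
    {"id": "ic", "title": "Iconclass"},
]
lines = format_vocabularies(sample)
```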
Other SKOSMOS supported services
● Finto (Finnish thesaurus and ontology service)
● CESSDA CV Service (has implemented the SKOSMOS interface)
● CESSDA ELSST (European Language Social Science Thesaurus)
● ACDH Vocabularies (Austrian Academy of Sciences)
● Thesaurus INRAE (Paris, France)
● AGROVOC Multilingual Thesaurus (United Nations)
● UNESCO Thesaurus
● European Space Agency ESA
NDE (Netwerk Digitaal Erfgoed) is working with DANS on (partial)
support of the SKOSMOS protocol to get a proper external CV connection to
DANS Data Stations.
Collaboration with GDCC
External controlled vocabulary working group.
Consensus proposal for the CVV support implementation, the current state and requirements matrix:
https://docs.google.com/document/d/1txdcFuxskRx_tLsDQ7KKLFTMR_r9IBhorDu3V_r445w/edit?ts=607451c0
Pull Request
https://github.com/IQSS/dataverse/pull/7712
Demonstration
http://github.com/GlobalDataverseCommunityConsortium/dataverse/tree/external-cvoc2
Known issues with support of external controlled vocabularies
● how CV support could be applied to any field
● support of any available vocabulary
● backward compatibility with fields from the old metadata schema
● clean UI experience (one selection can fill 4 child fields)
● whether non-managed vocabularies or free-text values can be used in the same field
● concept drift (the change of meaning of concepts)
● interoperability across all Dataverse instances
● how to ensure CVs are coming from authoritative services
...
Issue: how CV support could be applied to any field?
Problem: the current implementation would support the keyword field (with the addition of one child field) and any
new or existing fields built to have the 4 required child fields. What about other fields, for example subject, funder ID, grant ID, etc.?
Possible solution: changes could be applied to existing text fields without modifying the metadata block,
by adding new fields to store URIs. However, this requires changes on the CV interface side.
Issue: support of any available vocabulary
Problem: the implementation is currently specific to the SKOSMOS protocol,
which handles many vocabularies.
Solution: the interface to external API endpoints with vocabularies has been
placed outside of Dataverse as external JavaScript and could be extended
to support any API, for example the ORCID service.
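The extension idea can be sketched as a small registry that maps a protocol name to a function building the lookup request. The sketch is in Python rather than JavaScript for brevity. The config keys `protocol` and `cvoc-url`, the function names, and the ORCID public search address are assumptions for illustration and are not taken from the pull request.

```python
from urllib.parse import urlencode


def skosmos_lookup_url(base, text, lang="en"):
    """SKOSMOS REST search; the trailing * requests prefix matches."""
    return base.rstrip("/") + "/search?" + urlencode({"query": text + "*", "lang": lang})


def orcid_lookup_url(base, text, lang="en"):
    """ORCID-style search; the language argument is ignored here."""
    return base.rstrip("/") + "/search/?" + urlencode({"q": text})


# New protocols are supported by registering one more builder.
PROTOCOLS = {"skosmos": skosmos_lookup_url, "orcid": orcid_lookup_url}


def lookup_url(field_config, text, lang="en"):
    """Pick the builder named in the (hypothetical) field configuration."""
    build = PROTOCOLS[field_config["protocol"]]
    return build(field_config["cvoc-url"], text, lang)


skosmos_url = lookup_url({"protocol": "skosmos", "cvoc-url": "http://api.finto.fi/rest/v1/"}, "cat")
orcid_url = lookup_url({"protocol": "orcid", "cvoc-url": "https://pub.orcid.org/v3.0"}, "Tykhonov")
```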
Issue: Backward Compatibility
Problem: our implementation of external controlled vocabulary support requires 4
child fields instead of 3 (the Dataverse default).
Possible solution: create a Flyway script to adapt existing field entries if the metadata
schema is extended with a new 4th field to hold the concept URIs. A second
option is to leave the new URI field empty and require depositors to fill it in manually.
Source: Harvard Dataverse
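The second option (adding an empty URI field to existing entries) can be sketched in a few lines. A real migration would be a Flyway SQL script; this Python version only illustrates the logic. The three existing child names follow the Dataverse keyword field, while `keywordTermURI` is a hypothetical name for the new 4th child.

```python
def add_term_uri(entry, term_uri_field="keywordTermURI"):
    """Return a copy of a 3-child entry extended with an empty 4th child."""
    migrated = dict(entry)
    migrated.setdefault(term_uri_field, "")   # keep a URI that is already present
    return migrated


old_entry = {
    "keywordValue": "Open science",
    "keywordVocabulary": "UNESCO Thesaurus",
    "keywordVocabularyURI": "https://vocab.example.org/unesco/",
}
new_entry = add_term_uri(old_entry)
```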
Issue: clean User Interface experience
Problem: the display retains the 4 fields even though one selection determines
all 4. Could the other fields be hidden or disabled? With SKOSMOS-served vocabularies,
some child fields will be filled in automatically.
Possible solution: use a more flexible configuration to define 2 child fields
(label/URI) instead of 4 where possible. Or, if 4 fields are unavoidable, make 3 of them
read-only and managed by Dataverse rather than by the user.
Issue: non-managed vocabularies or free-text values
Problem: can a user mix non-managed controlled vocabularies or free-text
values in the same field?
Possible solution: the input could allow disabling the selector through a
‘manual’ mode. If a user wants to store a term that doesn’t match any
entry in the CV, it could be stored as plain text. However, this is not a
sustainable solution: how would free-text terms be synced with external CVs?
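A ‘manual’ mode could behave roughly as in this sketch: a term found in the vocabulary is stored with its URI, anything else falls back to free text with an empty URI. The function `store_keyword`, the `managed` flag and the record layout are assumptions for illustration.

```python
def store_keyword(text, vocabulary_index, allow_free_text=True):
    """Store a managed term with its URI, or fall back to free text."""
    term = text.strip()
    uri = vocabulary_index.get(term.lower())
    if uri is not None:
        return {"term": term, "termUri": uri, "managed": True}
    if allow_free_text:
        # Free-text values carry no URI, so they cannot be synced with the CV later.
        return {"term": term, "termUri": "", "managed": False}
    raise ValueError("term not found in the controlled vocabulary: " + term)


# Hypothetical index of lower-cased labels to concept URIs.
index = {"open science": "https://vocab.example.org/unesco/c123"}
managed = store_keyword("Open science", index)
free = store_keyword("My local project code", index)
```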
Issue: how to ensure CV is from an Authoritative service?
Problem: since the service URL is part of the configuration, Dataverse could be configured to
use other services (a locally managed one, a mirror, etc.).
Possible solution: the admin is responsible for the decision to use an
authoritative source. However, we don’t know how to control this in a
distributed network. It could become a serious issue if a vocabulary moves
from one service provider to another; mirrors should also be considered
here.
Issue: Concept Drift
Concept drift covers cases where a concept takes over the meaning of
other concepts, or other concepts take over its meaning. It can lead to
data quality problems that are very difficult to trace and address.
Possible scenarios of concept drift:
- at the concept identifier level (label drift)
- in the basic properties of the concept (intensional drift)
- in the things the concept refers to (extensional drift)
Source: Detecting and Reporting Extensional Concept Drift in Statistical Linked Data
Possible solution: create and maintain a cache of every concept inside the data
repository
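The caching idea can be sketched as a store of dated snapshots per concept URI, which makes the simplest case (label drift) detectable by comparing the oldest and newest snapshot. The class `ConceptCache` and its methods are hypothetical; intensional and extensional drift would need richer comparisons than shown here.

```python
from datetime import datetime, timezone


class ConceptCache:
    """Keeps every retrieved version of a concept, keyed by its URI."""

    def __init__(self):
        self._snapshots = {}   # term URI -> list of (retrieved_at, metadata)

    def store(self, term_uri, metadata, retrieved_at=None):
        stamp = retrieved_at or datetime.now(timezone.utc)
        self._snapshots.setdefault(term_uri, []).append((stamp, dict(metadata)))

    def label_drift(self, term_uri):
        """Return (old_label, new_label) if the preferred label changed, else None."""
        history = self._snapshots.get(term_uri, [])
        if len(history) < 2:
            return None
        old = history[0][1].get("prefLabel")
        new = history[-1][1].get("prefLabel")
        return (old, new) if old != new else None


cache = ConceptCache()
cache.store("ex:c1", {"prefLabel": "Computer science"})
cache.store("ex:c1", {"prefLabel": "Informatics"})
cache.store("ex:c2", {"prefLabel": "Biology"})
```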
Issue: interoperability across all Dataverse instances
Problem: this implementation requires the same configuration files in order to import
data and metadata from another Dataverse instance. If an instance is not configured,
the field shows as 4 child fields by default.
Possible solution: terms from unsupported (undefined) vocabularies would simply be
shown as their URLs in another instance.
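The proposed fallback amounts to a small display rule, sketched below under the assumption that each entry carries `term`, `termUrl` and `vocabularyUrl` keys (hypothetical names) and that an instance knows the set of vocabulary URLs it has configured.

```python
def display_value(entry, configured_vocabularies):
    """Show the label for configured vocabularies, otherwise the bare term URL."""
    if entry.get("vocabularyUrl") in configured_vocabularies and entry.get("term"):
        return entry["term"]
    return entry.get("termUrl", "")


entry = {
    "term": "Open science",
    "termUrl": "https://vocab.example.org/unesco/c123",
    "vocabularyUrl": "https://vocab.example.org/unesco/",
}
shown_here = display_value(entry, {"https://vocab.example.org/unesco/"})
shown_elsewhere = display_value(entry, set())
```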
Required development for sustainability
This proposal leverages the work already done in PR #7712. The additional work needed to implement the
proposal above includes:
1. Create a new vocabulary table (termUri string, term metadata (JSON text))
○ Column for service type/URL?
○ Column for retrieval date?
2. Add a CRUD API for the vocabulary table. The API would allow the addition of a termURI and would then perform a
web call to populate the term metadata (versus allowing user input of metadata)
3. Adapt the current PR to add the termURI to the table during upload.
4. Adapt the current PR (config file example) to work with a single field versus the parent/4 child model
5. Adapt the SKOSMOS JavaScript to handle display as well as input.
6. Develop a plan for migrating existing keyword entries to the new model
○ E.g. identify existing CVV entries and identify/create a SKOSMOS service to provide them, develop an SQL script to replace existing values
7. Develop recommendations/documentation/examples to support using CVVs in keyword and custom
fields.
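Steps 1 and 2 can be sketched with an in-memory SQLite table. The table and column names, and the idea of passing the web call in as a `fetch_metadata` function, are assumptions for illustration; Dataverse itself uses PostgreSQL and its own API layer.

```python
import json
import sqlite3


def create_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vocabulary ("
        "term_uri TEXT PRIMARY KEY, "     # termUri string
        "term_metadata TEXT NOT NULL, "   # term metadata as JSON text
        "service_url TEXT, "              # optional column: service type/URL
        "retrieved_at TEXT)"              # optional column: retrieval date
    )


def add_term(conn, term_uri, fetch_metadata, service_url=None, retrieved_at=None):
    """Register a term URI; metadata comes from the service, not from the user."""
    metadata = fetch_metadata(term_uri)
    conn.execute(
        "INSERT OR REPLACE INTO vocabulary VALUES (?, ?, ?, ?)",
        (term_uri, json.dumps(metadata), service_url, retrieved_at),
    )


def get_term(conn, term_uri):
    row = conn.execute(
        "SELECT term_metadata FROM vocabulary WHERE term_uri = ?", (term_uri,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def delete_term(conn, term_uri):
    conn.execute("DELETE FROM vocabulary WHERE term_uri = ?", (term_uri,))


conn = sqlite3.connect(":memory:")
create_table(conn)
# The lambda stands in for the web call to the vocabulary service.
add_term(conn, "ex:c1", lambda uri: {"prefLabel": "Open science"}, retrieved_at="2021-06-16")
```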
Caching function for CVV
Linked Data Serverless Resolver as a Lambda function on the Harvard AWS cloud
https://github.com/Dans-labs/ld-serverless-resolver
Features:
● Shared service for all Dataverse instances
● Memento protocol support is a must-have
● Integrated with the LD Proxy service
● Archived concepts for every dataset version
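In the Memento protocol (RFC 7089) a client asks for the state of a resource at a given time through the `Accept-Datetime` request header. The sketch below only builds that header for a dataset version timestamp; the function name `memento_headers` is an assumption, and how the resolver actually answers such requests is not shown in the slides.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(version_time):
    """Build the Accept-Datetime header for a timezone-aware timestamp."""
    as_utc = version_time.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


# Example: ask for a concept as it was when a dataset version was published.
headers = memento_headers(datetime(2021, 6, 16, 12, 0, tzinfo=timezone.utc))
```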
Caching concept URIs
WikiData MeSH
Archived concepts incorporated in the dataset metadata export are the link to Linked Open Data!
Linking data (files) to external CVs, not only metadata
Source: Scholars Portal’s Data Curation Tool (Canada)
Thank you for your attention!
Slava Tykhonov (DANS-KNAW)
vyacheslav.tykhonov@dans.knaw.nl
https://www.sshopencloud.eu
info@sshopencloud.eu
@SSHOpenCloud
/in/sshopencloud
Join our community

Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Controlled vocabularies and ontologies in Dataverse data repository

  • 1. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782. External controlled vocabularies support in Dataverse. Slava Tykhonov (DANS-KNAW), lead software engineer, DANS R&D. Dataverse Community Meeting 2021, 16 June 2021, Harvard University.
  • 3. DANS Data Stations - Future Data Services. Dataverse is an API-based data platform and a key framework for Open Innovation!
  • 4. FAIR and Dataverse. Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
  • 5. Out of the box CV support in Dataverse (1). Source: Dataverse Metadata Schema
  • 6. Out of the box CV support in Dataverse (2). Internal vocabularies are stored in Dataverse; we need more CVs!
  • 7. The importance of standards and ontologies. Generic controlled vocabularies for linking metadata in bibliographic collections are well known: ORCID, GRID, GeoNames, Getty. Medical knowledge graphs are powered by: ● Biological Expression Language (BEL) ● Medical Subject Headings (MeSH®) by the U.S. National Library of Medicine (NIH) ● Wikidata (Open ontology) - Wikipedia. Integration is based on metadata standards: ● MARC21, Dublin Core (DC), Data Documentation Initiative (DDI). Most of the prominent ontologies are already available as Web Services with API endpoints.
  • 8. Simple Knowledge Organization System (SKOS). SKOS models thesauri-like resources: - skos:Concepts with preferred labels and alternative labels (synonyms) attached to them (skos:prefLabel, skos:altLabel). - skos:Concepts can be related via the skos:broader, skos:narrower and skos:related properties. - terms and concepts can have more than one broader term or concept. SKOS makes it possible to create a semantic layer on top of objects, a network of statements and relationships. A major difference between SKOS and formal ontologies lies in logical “is-a” hierarchies: in thesauri the hierarchical relation can represent anything from “is-a” to “part-of”.
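The SKOS structure on slide 8 can be illustrated with a minimal Python sketch. It assumes plain in-memory objects rather than an RDF library; the class name `Concept`, the `ancestors` helper and the example.org concepts are hypothetical, and only the property names (prefLabel, altLabel, broader, related) come from SKOS.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare concepts by identity, so cyclic links stay safe
class Concept:
    """Illustrative stand-in for a skos:Concept (not an RDF implementation)."""
    uri: str
    pref_label: str                                  # skos:prefLabel
    alt_labels: list = field(default_factory=list)   # skos:altLabel (synonyms)
    broader: list = field(default_factory=list)      # skos:broader, may hold several
    related: list = field(default_factory=list)      # skos:related

    def ancestors(self):
        """Collect every concept reachable through skos:broader links."""
        seen, stack = [], list(self.broader)
        while stack:
            concept = stack.pop()
            if concept not in seen:
                seen.append(concept)
                stack.extend(concept.broader)
        return seen


# Hypothetical sample data: one concept with two broader concepts.
animals = Concept("http://example.org/animals", "animals")
pets = Concept("http://example.org/pets", "pets", broader=[animals])
cats = Concept("http://example.org/cats", "cats",
               alt_labels=["felines"], broader=[animals, pets])
```

Because `broader` is a list, a concept can sit under several parents at once, which is the polyhierarchy the slide mentions.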
  • 9. Global Research Identifier Database (GRID) in SKOS. We already have a lot of data in the global Dataverse network. Can we provide depositors with a convenient web interface to link their metadata to external controlled vocabularies? Is it possible to disambiguate concepts and create links automatically?
  • 10. SKOSMOS framework to discover ontologies ● SKOSMOS is developed in Europe by the National Library of Finland (NLF) ● active global user community ● search and browsing interface for SKOS concepts ● multilingual vocabularies support ● used for different use cases (publishing vocabularies, building discovery systems, vocabulary visualization)
  • 11. SKOSMOS API specification in Swagger. Source: Finto API
  • 12. SKOSMOS API example for GRID ontology
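The API example on slide 12 is an image that did not survive in this transcript. As a rough sketch of what such a request could look like, the snippet below builds a search URL, assuming the SKOSMOS REST API `search` method with `query`, `vocab` and `lang` parameters. The function name, the vocabulary id `grid` and the search string are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_search_url(api_base, query, vocab=None, lang=None):
    """Compose a SKOSMOS-style search URL (assumed parameter names)."""
    params = {"query": query}
    if vocab:
        params["vocab"] = vocab   # restrict the search to one vocabulary
    if lang:
        params["lang"] = lang     # language of the labels to match
    return api_base.rstrip("/") + "/search?" + urlencode(params)


# Hypothetical usage: wildcard search for organisations starting with "Harvard".
url = build_search_url("http://api.finto.fi/rest/v1/", "Harvard*",
                       vocab="grid", lang="en")
```

The trailing `*` acts as a wildcard in the query and is percent-encoded by `urlencode`.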
  • 13. Semantic Gateway as plugin app (in development). Source: Dataverse gateway
  • 14. Dataverse deposit form with selected CVs. Every field could be linked to controlled vocabularies in a FAIR way!
  • 15. One metadata field linked to many ontologies. The language switch in Dataverse changes the language of the suggested terms!
  • 16. Improved metadata schema with 4 child fields. Vocabulary and Term are selected by the user; Vocabulary URL and Term URL are filled in automatically.
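The four-child-field model on slide 16 can be sketched as a small function that derives all four values from a single selection. This is an illustration only: the dictionary keys and the function name are hypothetical, and the shape of `concept` (with `prefLabel` and `uri` keys) is an assumption modelled on SKOSMOS search results, not the schema used in the pull request.

```python
def fill_child_fields(vocab_name, vocab_url, concept):
    """Derive the four child field values from one user selection.

    The user picks the vocabulary and the term; both URLs follow from that.
    """
    return {
        "vocabulary": vocab_name,        # chosen by the user
        "term": concept["prefLabel"],    # chosen by the user
        "vocabulary_url": vocab_url,     # filled in automatically
        "term_url": concept["uri"],      # filled in automatically
    }


# Hypothetical selection with example.org identifiers.
selected = {"prefLabel": "cats", "uri": "http://example.org/vocab/cats"}
fields = fill_child_fields("Example vocabulary", "http://example.org/vocab/", selected)
```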
  • 17. Configuration for external controlled vocabularies. Pull Request to Dataverse core: https://github.com/IQSS/dataverse/pull/7712
  • 18. Javascript interface. The CV interface is implemented in Javascript and placed outside of the Dataverse application. Internal: "js-url": "/resources/js/cvoc-interface.js". External: "js-url": "https://raw.githubusercontent.com/Dans-labs/semantic-gateway/main/static/js/interface.js"
  • 19. SKOSMOS Python module (SKOSMOS-Client). Import the client: from skosmos_client import SkosmosClient; then you can create your own client: skosmos = SkosmosClient(api_base='http://api.finto.fi/rest/v1/'). Finding the available vocabularies: Vocabulary id: afo, title: AFO - Natural resource and environment ontology; Vocabulary id: allars, title: Allärs - General thesaurus in Swedish; Vocabulary id: cn, title: Finnish Corporate Names; Vocabulary id: ic, title: Iconclass; ...
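A listing like the one on slide 19 could also be produced without the client module. The sketch below calls the REST API directly with the standard library, assuming a `vocabularies` method that returns JSON with a `vocabularies` list whose entries carry `id` and `title`. The function names are hypothetical, and the `fetch` parameter exists only so the HTTP call can be swapped out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def list_vocabularies(api_base, lang="en", fetch=http_get):
    """Return (id, title) pairs for the vocabularies a service offers."""
    url = api_base.rstrip("/") + "/vocabularies?" + urlencode({"lang": lang})
    payload = json.loads(fetch(url))
    return [(v["id"], v["title"]) for v in payload["vocabularies"]]


def format_listing(vocabs):
    """Render the pairs in the style shown on the slide."""
    return ["Vocabulary id: %s title: %s" % (vid, title) for vid, title in vocabs]
```

Calling `format_listing(list_vocabularies('http://api.finto.fi/rest/v1/'))` would perform a live request, so the result depends on what the service publishes at that moment.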
  • 20. Other SKOSMOS supported services ● Finto (Finnish thesaurus and ontology service) ● CESSDA CV Service, which has implemented a SKOSMOS interface ● CESSDA ELSST (European Language Social Science Thesaurus) ● ACDH Vocabularies (Austrian Academy of Sciences) ● Thesaurus INRAE (Paris, France) ● AGROVOC Multilingual Thesaurus (United Nations) ● UNESCO Thesaurus ● European Space Agency (ESA). NDE (Netwerk Digitaal Erfgoed) is working with DANS on (partial) support of the SKOSMOS protocol to get a proper external CV connection to DANS Data Stations.
  • 21. Collaboration with GDCC. External controlled vocabulary working group. Consensus proposal for the CVV support implementation, the current state and requirements matrix: https://docs.google.com/document/d/1txdcFuxskRx_tLsDQ7KKLFTMR_r9IBhorDu3V_r445w/edit?ts=607451c0 Pull Request: https://github.com/IQSS/dataverse/pull/7712 Demonstration: http://github.com/GlobalDataverseCommunityConsortium/dataverse/tree/external-cvoc2
  • 22. Known issues with support of external controlled vocabularies ● how CV support could be applied to any field ● support of any available vocabulary ● backward compatibility with fields from the old metadata schema ● clean UI experience (one selection can fill 4 child fields) ● whether non-managed vocabularies or free-text values can be used in the same field ● concept drift (the change of meaning of concepts) ● interoperability across all Dataverse instances ● how to ensure CVs are coming from authoritative services ...
  • 23. Issue: how could CV support be applied to any field? Problem: the current implementation would support the keyword field (with the addition of one child field) and any new or existing fields built to have the 4 required child fields. What about other fields, for example subject, funder ID, grant ID, etc.? Possible solution: changes could be applied to existing text fields without modifying the metadata block, by adding new fields to store URIs. However, this requires changes on the CV interface side.
  • 24. Issue: support of any available vocabulary. Problem: the implementation is currently specific to the SKOSMOS protocol, which handles many vocabularies. Solution: the interface to external API endpoints with vocabularies has been placed outside of Dataverse as external Javascript and could be extended to support any API, for example the ORCID service.
  • 25. Issue: Backward Compatibility. Problem: our implementation of external controlled vocabularies support requires 4 child fields instead of 3 (the default for Dataverse). Possible solution: create a Flyway script to adapt existing field entries if the metadata schema is extended with a new 4th field to hold the concept URIs. A second option is to keep the new URI field empty and require depositors to fill it in manually. Source: Harvard Dataverse
  • 26. Issue: clean User Interface experience. Problem: the display retains the 4 fields even though one selection determines all 4. Could the other fields be hidden or disabled? With SKOSMOS-served vocabularies, some child fields will be filled automatically. Possible solution: use a more flexible configuration to define 2 child fields (label/URI) instead of 4 where possible. Or, if 4 fields are unavoidable, make 3 of them read-only and managed by Dataverse, not the user.
  • 27. Issue: non-managed vocabularies or free-text values. Problem: can a user mix non-managed controlled vocabularies or free-text values in the same field? Possible solution: the input could allow disabling the selector with some ‘manual’ mode. If a user wants to store a term that doesn’t match any entry in the CV, it could be stored as text. However, this is not a sustainable solution: how would free-text terms be synced with external CVs?
  • 28. Issue: how to ensure a CV is from an authoritative service? Problem: since the service URL is part of the config, it could be configured to use other services (a locally managed one, a mirror, etc.). Possible solution: the admin is responsible for the decision to use an authoritative source. However, we don’t know how to control this in a distributed network. It could become a serious issue if a service moves from one provider to another; mirrors should also be considered.
  • 29. Issue: Concept Drift. Concept drift refers to cases where a concept may replace the meaning of other concepts, or other concepts can take over its meaning. It can lead to data quality problems that are very difficult to trace and address. Possible scenarios of concept drift: - at the concept identifier level (label drift) - in the basic properties of the concept (intensional drift) - in the things the concept refers to (extensional drift). Source: Detecting and Reporting Extensional Concept Drift in Statistical Linked Data. Possible solution: create and maintain a cache of every concept inside the data repository.
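To make the caching idea on slide 29 concrete, here is a minimal sketch of how a repository-side cache could flag label drift. It covers only the simplest of the three scenarios, and it assumes the cache and the live service can each be reduced to a mapping from concept URI to preferred label. The function name and sample data are hypothetical.

```python
def detect_label_drift(cached, current):
    """Return the URIs whose preferred label changed since they were cached.

    cached, current: dicts mapping concept URI -> preferred label.
    URIs missing from `current` are skipped; a vanished concept is a
    different problem from a relabelled one.
    """
    return sorted(
        uri for uri, label in cached.items()
        if uri in current and current[uri] != label
    )


# Hypothetical snapshots: labels stored at deposit time vs. fetched today.
cached_labels = {
    "http://example.org/c/1": "data archive",
    "http://example.org/c/2": "thesaurus",
    "http://example.org/c/3": "ontology",
}
live_labels = {
    "http://example.org/c/1": "data repository",   # relabelled
    "http://example.org/c/2": "thesaurus",         # unchanged
}
drifted = detect_label_drift(cached_labels, live_labels)
```

Intensional and extensional drift would need richer cached metadata than a single label per concept.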
  • 30. Issue: interoperability across all Dataverse instances. Problem: this implementation requires the same configuration files to import data and metadata from another Dataverse instance. If not configured, the field shows as 4 child fields by default. Possible solution: terms from unsupported (undefined) vocabularies would just show as their URLs in another instance.
  • 31. Required development for sustainability. This proposal leverages the work already done in PR #7712. The additional work needed to implement the proposal above includes: 1. Creation of a new vocabulary table (termUri string, term metadata (JSON text)) ○ Column for service type/URL? ○ Column for retrieval date? 2. Add a CRUD API for the vocabulary table. The API would allow addition of a termURI and would then perform a web call to populate the term metadata (versus allowing user input of metadata). 3. Adapt the current PR to add the termURI to the table during upload. 4. Adapt the current PR (config file example) to work with a single field versus the parent/4 child model. 5. Adapt the SKOSMOS Javascript to handle display as well as input. 6. Develop a plan for migrating existing keyword entries to the new model ○ E.g. identify existing CVV entries and identify/create a SKOSMOS service to provide them, develop an SQL script to replace existing values. 7. Develop recommendations/documentation/examples to support using CVVs in keyword and custom fields.
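Items 1 and 2 of the plan on slide 31 can be sketched with an in-memory SQLite database. This is an assumption-heavy illustration: SQLite stands in for whatever database Dataverse uses, the table and column names are hypothetical, and the metadata is passed in by the caller instead of being populated through the web call the slide describes.

```python
import json
import sqlite3


def create_table(conn):
    """Create the vocabulary cache table (hypothetical column names)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS vocabulary (
               term_uri      TEXT PRIMARY KEY,
               term_metadata TEXT NOT NULL,   -- JSON text
               service_url   TEXT,            -- optional column from the slide
               retrieved_on  TEXT             -- optional column from the slide
           )"""
    )


def upsert_term(conn, term_uri, metadata, service_url, retrieved_on):
    """Create or update one cached term."""
    conn.execute(
        "INSERT OR REPLACE INTO vocabulary VALUES (?, ?, ?, ?)",
        (term_uri, json.dumps(metadata), service_url, retrieved_on),
    )


def get_term(conn, term_uri):
    """Read the cached metadata for a term, or None if it is not cached."""
    row = conn.execute(
        "SELECT term_metadata FROM vocabulary WHERE term_uri = ?", (term_uri,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def delete_term(conn, term_uri):
    """Remove a term from the cache."""
    conn.execute("DELETE FROM vocabulary WHERE term_uri = ?", (term_uri,))
```

Keying the table on the term URI means a repeated upload of the same term refreshes the cached metadata instead of duplicating it.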
  • 32. Caching function for CVV. Linked Data Serverless Resolver as a Lambda Function on the Harvard AWS cloud: https://github.com/Dans-labs/ld-serverless-resolver Features: ● Shared service for all Dataverse instances ● Memento protocol support is a must-have ● Integrated with the LD Proxy service ● Archived concepts for every dataset version
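The slide names Memento support as a requirement without showing how the resolver provides it. As a hedged illustration of the client side only, the sketch below builds the `Accept-Datetime` request header that Memento (RFC 7089) uses to ask for the state of a resource at a given time. The function name is hypothetical and nothing here reflects the resolver's actual code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(snapshot_time):
    """Build the Accept-Datetime header for a Memento datetime negotiation.

    snapshot_time must be timezone-aware; it is converted to UTC because
    the header value is an HTTP date expressed in GMT.
    """
    as_utc = snapshot_time.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


# Ask for a concept as it existed when a dataset version was published
# (the date is only an example).
headers = memento_headers(datetime(2021, 6, 16, 12, 0, tzinfo=timezone.utc))
```

A resolver that archives concepts per dataset version could answer such a request with the snapshot closest to the requested time.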
  • 33. Caching concept URIs (WikiData, MeSH). Archived concepts incorporated in the dataset metadata export are the link to Linked Open Data!
  • 34. Linking data (files) to external CVs, not only metadata. Source: Scholars Portal’s Data Curation Tool (Canada)
  • 35. Thank you for your attention! Slava Tykhonov (DANS-KNAW) vyacheslav.tykhonov@dans.knaw.nl https://www.sshopencloud.eu info@sshopencloud.eu @SSHOpenCloud /in/sshopencloud Join our community