SlideShare a Scribd company logo
1 of 19
dans.knaw.nl
DANS is een instituut van KNAW en NWO
Data standardization process
for arts and humanities
Vyacheslav Tykhonov
Senior Information Scientist
(DANS-KNAW, Netherlands)
Developing the SSHOC Reference Ontology workshop
ICS-FORTH , Heraklion, Crete
21-22 May, 2019
DANS-KNAW core services
Outline
• Standardization process during data deposit and archiving
(metadata level created by users)
• Research data management and harmonization of deposited
datasets (file level)
• Standardization and enrichment of harvested content (metadata
level provided by different data providers)
• Tracking provenance information for data and tools, moving to FAIR
Big problem: researchers and librarians are not talking to each other
and there is no common Reference model!
Metadata schemas
• EASY TDR has own metadata schema developed for Dutch
scientific landscape but allows Dublin Core export from OAI-
PMH endpoint
• NARCIS is an aggregator that harvesting metadata from
various repositories, no standardization pipeline
• Metadata from Dataverse can be exported as:
Controlled vocabulary and thesaurus
• Linked data is one step forward (or actually backward in the right
direction) on solving some of standardization problems.
• By having shared controlled vocabularies (CV) created and
maintained by experts on various domains, the digital items can
be annotated with them and easily retrieved by other experts
from the same domain without being librarian. It’s clear
indication which vocabulary is good enough and shared by a
critical mass.
• A thesaurus is a semantic network of unique concepts, including
relationships between synonyms, broader and narrower
(parent/child) contexts, and other related concepts. Thesaurus is
hierarchy for controlled vocabularies.
SSHOC data repository
DANS-KNAW is leading the development of SSHOC DataverseEU project.
We’re developing multilingual web interface and localizing metadata fields and developed data
standardization technique based on APIs for CESSDA CVs, Topic Classification and CESSDA CV Manager
services.
SSHOC/CESSDA DataverseEU:
• Hungary (TARKI)
• Sweden (SND)
• Slovenia (ADP)
• Germany (GESIS)
• France (SciencesPro)
• Austria (AUSSDA)
• United Kingdom (UKDA)
• Italy (CNR, UniData)
• Belgium (SODA)
• Latvia (LSZDA)
• Poland (PSNC)
• Norway (DataverseNO)
• Netherlands (DANS-KNAW)
SKOS RDF Vocabularies (CESSDA)
We’re importing thesaurus delivered as SKOS RDF, for example:
Rest API endpoint delivers back JSON suitable for web applications.
Metadata standardization during deposit process
Standardized metadata in Dataverse
Standardized metadata in RDF
All relations exported and available in the Knowledge Graph
and ready for the further querying and exploration:
Research data management
Data standardization process plays a key role in the data
management plan of any organization but current situation in
research data management is very complex:
• too much data chaos in datasets
• no data transparency
• sometimes no standards available
• no provenance information attached to data
• homonyms, synonyms, generalizations, specializations,
spelling variations and mistakes, language versions are all
complicating the keyword-based search and retrieval of
information
Data standardization pipeline based on
chatbot
Mapping produced by AI as result
mappings:
Image-image:
predicateobjects:
- [a, 'http://xmlns.com/foaf/0.1/Image']
- [a, 'http://schema.org/ImageObject']
- [a, 'http://schema.org/CreativeWork']
- [a, 'http://xmlns.com/foaf/0.1/Document']
- ['http://www.w3.org/2000/01/rdf-schema#label', $(image)]
- ['http://schema.org/image', $(image)]
source: dataset-source
subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Image/$(image)
Person-name:
predicateobjects:
- [a, 'http://schema.org/Person']
- [a, 'http://www.w3.org/2000/10/swap/pim/contact#Person']
- [a, 'http://xmlns.com/foaf/0.1/Person']
- [a, 'http://purl.org/dc/terms/Agent']
- [a, 'http://purl.org/goodrelations/v1#BusinessEntity']
- [a, 'http://rhizomik.net/ontologies/copyrightonto.owl#LegalPerson']
- [a, 'http://schema.org/Thing']
- [a, 'http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing']
- [a, 'http://xmlns.com/foaf/0.1/Agent']
- ['http://www.w3.org/2000/01/rdf-schema#label', $(name)]
source: dataset-source
subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Person/$(name)
Place-birth_cty:
predicateobjects:
- [a, 'http://schema.org/Place']
- [a, 'http://purl.org/goodrelations/v1#Location']
- [a, 'http://rdfs.co/juso/SpatialThing']
- [a, 'http://schema.org/Thing']
- ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_cty)]
- ['http://dbpedia.org/ontology/era', $(era)]
- ['http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#isDescribedBy', $(era)]
- ['http://dbpedia.org/ontology/birthDate', $(birth), 'http://www.w3.org/2001/XMLSchema#datetime']
source: dataset-source
subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_cty)
Place-birth_prv:
predicateobjects:
- [a, 'http://schema.org/Place']
- [a, 'http://purl.org/goodrelations/v1#Location']
- [a, 'http://rdfs.co/juso/SpatialThing']
- [a, 'http://schema.org/Thing']
- ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_prv)]
- ['http://dbpedia.org/ontology/deathDate', $(death), 'http://www.w3.org/2001/XMLSchema#datetime']
source: dataset-source
subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_prv)
NARCIS metadata (example)
No authority linking or controlled vocabularies support, but…
Tracking Provenance information
Prov-O example from PARTHENOS project
Time Machine association
• large scale project with 300+
partners
• development and support of
sustainable networked services
• trends watching and tracking of
software maturity level
• reliable governance model
• the foundation for the further
innovation!
Conclusion
• development of large-scale networked services out of research
pipelines
• every service should be mature enough, maintainable and follow
continuous integration pipeline
• tracking provenance information for every tool and dataset is the
highest priority
• creation and governance of standardization pipelines based on
services providing access to domain specific controlled vocabularies
and ontologies
• providing access to data, metadata and provenance (processes) in the
Knowledge Graph
• further integration of services maintained by different partners and
deployed in the Cloud
Questions?
Feel free to ask questions!
Vyacheslav (Slava) Tykhonov
e-mail: vyacheslav.tykhonov@dans.knaw.nl
website: http://dans.knaw.nl (DANS-KNAW)

More Related Content

What's hot

Data(base) taxonomy
Data(base) taxonomyData(base) taxonomy
Data(base) taxonomyDejan Radic
 
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...4Science
 
DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15Michele Mennielli
 
DSpace-CRIS_An open source solution for Research_EDU15
DSpace-CRIS_An open source solution for Research_EDU15DSpace-CRIS_An open source solution for Research_EDU15
DSpace-CRIS_An open source solution for Research_EDU15Michele Mennielli
 
Ado Fundamentals
Ado FundamentalsAdo Fundamentals
Ado Fundamentalsasim78
 
HDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsHDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsAhmad Assaf
 
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...Jens Mittelbach
 
An Introduction to Linked Data and Microdata
An Introduction to Linked Data and MicrodataAn Introduction to Linked Data and Microdata
An Introduction to Linked Data and MicrodataDLFCLIR
 
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)Jan Polowinski
 
Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talkbenosteen
 
Expressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDLExpressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDLCredential Engine
 
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...ariadnenetwork
 

What's hot (20)

No sql
No sqlNo sql
No sql
 
Data(base) taxonomy
Data(base) taxonomyData(base) taxonomy
Data(base) taxonomy
 
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...
 
Database
DatabaseDatabase
Database
 
Dkan
DkanDkan
Dkan
 
DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15
 
DSpace-CRIS_An open source solution for Research_EDU15
DSpace-CRIS_An open source solution for Research_EDU15DSpace-CRIS_An open source solution for Research_EDU15
DSpace-CRIS_An open source solution for Research_EDU15
 
Ado Fundamentals
Ado FundamentalsAdo Fundamentals
Ado Fundamentals
 
I say NoSQL you say what
I say NoSQL you say whatI say NoSQL you say what
I say NoSQL you say what
 
HDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsHDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data Portals
 
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...
 
An Introduction to Linked Data and Microdata
An Introduction to Linked Data and MicrodataAn Introduction to Linked Data and Microdata
An Introduction to Linked Data and Microdata
 
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)
 
Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talk
 
Expressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDLExpressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDL
 
The Danish National Bibliography as LOD
The Danish National Bibliography as LODThe Danish National Bibliography as LOD
The Danish National Bibliography as LOD
 
Web Spa
Web SpaWeb Spa
Web Spa
 
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
 
Cassandra Learning
Cassandra LearningCassandra Learning
Cassandra Learning
 
LODLAM Landscape
LODLAM LandscapeLODLAM Landscape
LODLAM Landscape
 

Similar to Data standardization process for social sciences and humanities

Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesvty
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...vty
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And VisualizationIvan Ermilov
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataMarin Dimitrov
 
Intro to Digitization Projects
Intro to Digitization ProjectsIntro to Digitization Projects
Intro to Digitization Projectszsrlibrary
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphsSören Auer
 
PoolParty Semantic Platform - Overview
PoolParty Semantic Platform - OverviewPoolParty Semantic Platform - Overview
PoolParty Semantic Platform - OverviewSemantic Web Company
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunitiesvty
 
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar SeriesKatina Toufexis
 
Putting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAMPutting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAM4Science
 
lodlam summit session browsable linked data
lodlam summit session browsable linked datalodlam summit session browsable linked data
lodlam summit session browsable linked dataEnno Meijers
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 
Minimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationMinimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationDenodo
 
Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Stefan Dietze
 
DataverseNL as structured data hub
DataverseNL as structured data hubDataverseNL as structured data hub
DataverseNL as structured data hubvty
 
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Stuart Chalk
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsCarole Goble
 

Similar to Data standardization process for social sciences and humanities (20)

Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanities
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Linked Data to Improve the OER Experience
Linked Data to Improve the OER ExperienceLinked Data to Improve the OER Experience
Linked Data to Improve the OER Experience
 
Intro to Digitization Projects
Intro to Digitization ProjectsIntro to Digitization Projects
Intro to Digitization Projects
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphs
 
PoolParty Semantic Platform - Overview
PoolParty Semantic Platform - OverviewPoolParty Semantic Platform - Overview
PoolParty Semantic Platform - Overview
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
 
Putting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAMPutting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAM
 
lodlam summit session browsable linked data
lodlam summit session browsable linked datalodlam summit session browsable linked data
lodlam summit session browsable linked data
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Minimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationMinimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data Virtualization
 
Databases for Data Science
Databases for Data ScienceDatabases for Data Science
Databases for Data Science
 
mx & dbs
mx & dbsmx & dbs
mx & dbs
 
Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)
 
DataverseNL as structured data hub
DataverseNL as structured data hubDataverseNL as structured data hub
DataverseNL as structured data hub
 
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
 

More from vty

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs vty
 
Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs vty
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure vty
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museumvty
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataversevty
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...vty
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7vty
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyvty
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes vty
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21vty
 
Controlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryControlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryvty
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...vty
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligencevty
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Projectvty
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataversevty
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research datavty
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataversevty
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution vty
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataversevty
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse vty
 

More from vty (20)

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museum
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataverse
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhy
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
Controlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryControlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repository
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Project
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataverse
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research data
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataverse
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse
 

Recently uploaded

Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 

Recently uploaded (20)

Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 

Data standardization process for social sciences and humanities

  • 1. dans.knaw.nl DANS is een instituut van KNAW en NWO Data standardization process for arts and humanities Vyacheslav Tykhonov Senior Information Scientist (DANS-KNAW, Netherlands) Developing the SSHOC Reference Ontology workshop ICS-FORTH , Heraklion, Crete 21-22 May, 2019
  • 3. Outline • Standardization process during data deposit and archiving (metadata level created by users) • Research data management and harmonization of deposited datasets (file level) • Standardization and enrichment of harvested content (metadata level provided by different data providers) • Tracking provenance information for data and tools, moving to FAIR Big problem: researchers and librarians are not talking to each other and there is no common Reference model!
  • 4. Metadata schemas • EASY TDR has own metadata schema developed for Dutch scientific landscape but allows Dublin Core export from OAI- PMH endpoint • NARCIS is an aggregator that harvesting metadata from various repositories, no standardization pipeline • Metadata from Dataverse can be exported as:
  • 5. Controlled vocabulary and thesaurus • Linked data is one step forward (or actually backward in the right direction) on solving some of standardization problems. • By having shared controlled vocabularies (CV) created and maintained by experts on various domains, the digital items can be annotated with them and easily retrieved by other experts from the same domain without being librarian. It’s clear indication which vocabulary is good enough and shared by a critical mass. • A thesaurus is a semantic network of unique concepts, including relationships between synonyms, broader and narrower (parent/child) contexts, and other related concepts. Thesaurus is hierarchy for controlled vocabularies.
  • 6. SSHOC data repository DANS-KNAW is leading the development of SSHOC DataverseEU project. We’re developing multilingual web interface and localizing metadata fields and developed data standardization technique based on APIs for CESSDA CVs, Topic Classification and CESSDA CV Manager services. SSHOC/CESSDA DataverseEU: • Hungary (TARKI) • Sweden (SND) • Slovenia (ADP) • Germany (GESIS) • France (SciencesPro) • Austria (AUSSDA) • United Kingdom (UKDA) • Italy (CNR, UniData) • Belgium (SODA) • Latvia (LSZDA) • Poland (PSNC) • Norway (DataverseNO) • Netherlands (DANS-KNAW)
  • 7. SKOS RDF Vocabularies (CESSDA) We’re importing thesaurus delivered as SKOS RDF, for example: Rest API endpoint delivers back JSON suitable for web applications.
  • 10. Standardized metadata in RDF All relations exported and available in the Knowledge Graph and ready for the further querying and exploration:
  • 11. Research data management Data standardization process plays a key role in the data management plan of any organization but current situation in research data management is very complex: • too much data chaos in datasets • no data transparency • sometimes no standards available • no provenance information attached to data • homonyms, synonyms, generalizations, specializations, spelling variations and mistakes, language versions are all complicating the keyword-based search and retrieval of information
  • 12. Data standardization pipeline based on chatbot
  • 13. Mapping produced by AI as result mappings: Image-image: predicateobjects: - [a, 'http://xmlns.com/foaf/0.1/Image'] - [a, 'http://schema.org/ImageObject'] - [a, 'http://schema.org/CreativeWork'] - [a, 'http://xmlns.com/foaf/0.1/Document'] - ['http://www.w3.org/2000/01/rdf-schema#label', $(image)] - ['http://schema.org/image', $(image)] source: dataset-source subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Image/$(image) Person-name: predicateobjects: - [a, 'http://schema.org/Person'] - [a, 'http://www.w3.org/2000/10/swap/pim/contact#Person'] - [a, 'http://xmlns.com/foaf/0.1/Person'] - [a, 'http://purl.org/dc/terms/Agent'] - [a, 'http://purl.org/goodrelations/v1#BusinessEntity'] - [a, 'http://rhizomik.net/ontologies/copyrightonto.owl#LegalPerson'] - [a, 'http://schema.org/Thing'] - [a, 'http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing'] - [a, 'http://xmlns.com/foaf/0.1/Agent'] - ['http://www.w3.org/2000/01/rdf-schema#label', $(name)] source: dataset-source subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Person/$(name) Place-birth_cty: predicateobjects: - [a, 'http://schema.org/Place'] - [a, 'http://purl.org/goodrelations/v1#Location'] - [a, 'http://rdfs.co/juso/SpatialThing'] - [a, 'http://schema.org/Thing'] - ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_cty)] - ['http://dbpedia.org/ontology/era', $(era)] - ['http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#isDescribedBy', $(era)] - ['http://dbpedia.org/ontology/birthDate', $(birth), 'http://www.w3.org/2001/XMLSchema#datetime'] source: dataset-source subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_cty) Place-birth_prv: predicateobjects: - [a, 'http://schema.org/Place'] - [a, 'http://purl.org/goodrelations/v1#Location'] - [a, 'http://rdfs.co/juso/SpatialThing'] - [a, 'http://schema.org/Thing'] - ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_prv)] - ['http://dbpedia.org/ontology/deathDate', $(death), 'http://www.w3.org/2001/XMLSchema#datetime'] source: dataset-source subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_prv)
  • 14. NARCIS metadata (example) No authority linking or controlled vocabularies support, but…
  • 16. Prov-O example from PARTHENOS project
  • 17. Time Machine association • large scale project with 300+ partners • development and support of sustainable networked services • trends watching and tracking of software maturity level • reliable governance model • the foundation for the further innovation!
  • 18. Conclusion • development of large-scale networked services out of research pipelines • every service should be mature enough, maintainable and follow continuous integration pipeline • tracking provenance information for every tool and dataset is the highest priority • creation and governance of standardization pipelines based on services providing access to domain specific controlled vocabularies and ontologies • providing access to data, metadata and provenance (processes) in the Knowledge Graph • further integration of services maintained by different partners and deployed in the Cloud
  • 19. Questions? Feel free to ask questions! Vyacheslav (Slava) Tykhonov e-mail: vyacheslav.tykhonov@dans.knaw.nl website: http://dans.knaw.nl (DANS-KNAW)