Metaverse for Dataverse
Building a collaborative Machine Learning platform for the
Dataverse network
Slava Tykhonov, R&D
(DANS-KNAW)
DANS seminar, 29.03.2022
What is Metaverse?
“A metaverse is a network of 3D virtual worlds focused on social connection. In
futurism and science fiction, it is often described as a hypothetical iteration of the
Internet as a single, universal virtual world that is facilitated by the use of virtual
and augmented reality headsets.”
“Access points for metaverses include general-purpose computers and
smartphones, in addition to augmented reality (AR), mixed reality, virtual reality
(VR), and virtual world technologies.”
Wikipedia
Where is the place of Open Science in this vision?
Moving towards Open Science
Source: Citizen Science and Open Science Core Concepts and Areas of Synergy (Vohland and Göbel, 2017)
Time Machine project
● An international collaboration to bring 5000 years of European history to
life
● Digitising millions of historical documents, paintings and monuments
● The largest computer simulation ever developed
● An open access, interactive resource
● 600+ consortium members from the European countries
● top academic and research institutions
● Private sector partners from SMEs to international companies
“Our focus is on the joint efforts on Big Data, artificial intelligence,
augmented reality and 3D and the development of European
platforms in line with European values”.
Visit http://timemachine.eu
(Semantic) Web 3.0 as a new Internet
● Web 3.0 is democratized – it will be built on a decentralized blockchain protocol where there
is no centralized ownership of content, services, or platforms.
● It is semantic – the Semantic Web is not identical to Web 3.0, but is an underlying technology
for the third generation of the internet. It would allow multiple internet pages to be correlated
using a semantic protocol so that the relationships between pages are apparent, indexed, and
searchable.
● It may be spatial in nature – While Web 1.0 and Web 2.0 offered two-dimensional
experiences, Web 3.0 may be immersive and offers spatial experiences similar to the real
world. To achieve this, there will be a spatially-interactive layer on top of the digital
information layer, which uses sensory triggers and controls like voice, gesture, biometric
commands, and others.
Source: XR Today
NFT (a non-fungible token) in Metaverse
“The metaverse is a future evolution of the Internet based on persistent, shared
virtual worlds in which people interact as 3D avatars.
Blockchain technology may provide the backbone of the metaverse, with
interoperable NFT assets that can be used across different metaverse spaces.”
Source: Decrypt
Open questions: can a Metaverse be created without common FAIR principles?
What should the semantic interoperability layer look like?
Vision: Semantic interoperability on the infrastructure level
We envision a situation where thousands of Dataverse instances on the web (due to EOSC)
can be searched for data simultaneously and will form a Data Lake.
The old dream of federated search / a universal catalogue can only be realised if:
(1) Crosswalks, i.e. mappings across different metadata schemes, are implemented
(2) Metadata schemes offer ways to enrich indexes with values from controlled
vocabularies
Standard response (centralized) = standardisation and harmonisation = repository
software, certain metadata standards, or certain controlled vocabularies
New response (distributed) = explore agile solutions (Proof of Concepts) which can be
implemented by different communities (even smaller ones), so we keep variety and still
enable integration by applying Linked Data technologies.
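The crosswalk from point (1) can be sketched as a plain field mapping. This is a minimal illustration only: the field names, the mapping table and the sample record are made-up assumptions, not actual CESSDA, CLARIN or Dataverse definitions.

```python
# Hypothetical crosswalk: field names of a source scheme -> common field names.
CROSSWALK = {
    "titl": "title",
    "AuthEnty": "creator",
    "keyword": "subject",
}


def apply_crosswalk(record, crosswalk):
    """Rename known fields; keep unmapped fields under 'unmapped' so nothing is lost."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        if field in crosswalk:
            mapped[crosswalk[field]] = value
        else:
            unmapped[field] = value
    if unmapped:
        mapped["unmapped"] = unmapped
    return mapped


# Made-up record in the source scheme.
source_record = {
    "titl": "Labour survey 1900",
    "AuthEnty": "Example Archive",
    "geogCover": "NL",
}
common_record = apply_crosswalk(source_record, CROSSWALK)
```

Keeping the unmapped fields visible is a deliberate choice here: it shows each community where its scheme still lacks a mapping instead of silently dropping values.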
Data Stations - Future Data Services
Dataverse is an API-based data platform and a key framework for Open Innovation!
Conceptual approach: building common infrastructure components
Dataverse Semantic API in release 5.6: https://github.com/IQSS/dataverse/releases/tag/v5.6
“Dataset metadata can be retrieved, set, and updated using a new, flatter JSON-LD format -
following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier transfer of
metadata to/from other systems (i.e. without needing to know Dataverse's metadata block and
field storage architecture). This new API also allows for the update of terms metadata.”
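A minimal sketch of what an update through the flat JSON-LD format could look like. The server URL, dataset id and API token are placeholders; the endpoint path and the `replace` parameter follow the Dataverse API guide as we understand it and should be checked against the release in use. The request is only prepared, not sent.

```python
import json
import urllib.request

# Placeholders, not real values.
SERVER_URL = "https://dataverse.example.org"
DATASET_ID = 42
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"


def build_metadata_update(server_url, dataset_id, api_token, jsonld):
    """Prepare (but do not send) a PUT request carrying flat JSON-LD metadata."""
    url = f"{server_url}/api/datasets/{dataset_id}/metadata?replace=true"
    return urllib.request.Request(
        url,
        data=json.dumps(jsonld).encode("utf-8"),
        method="PUT",
        headers={
            "X-Dataverse-key": api_token,
            "Content-Type": "application/ld+json",
        },
    )


# Flat JSON-LD: full term URIs as keys instead of nested metadata blocks.
update = {"http://purl.org/dc/terms/title": "Updated dataset title"}
request = build_metadata_update(SERVER_URL, DATASET_ID, API_TOKEN, update)
# urllib.request.urlopen(request) would send it to a real server.
```

The caller only needs to know the term URI (here Dublin Core `title`), not the metadata block the field is stored in.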
External controlled vocabularies support is being developed by DANS in the SSHOC project and
is already integrated into the Dataverse core as of release 5.7.
Proposal:
https://docs.google.com/document/d/1txdcFuxskRx_tLsDQ7KKLFTMR_r9IBhorDu3V_r445w/
Interfaces: http://github.com/gdcc/dataverse-external-vocab-support
Integrations: Wikidata, ORCID, MeSH, Skosmos vocabularies
External controlled vocabularies in Dataverse
Any research community can run the same Dataverse instance with its own controlled vocabularies linked in a FAIR way!
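As an illustration of how a metadata field could look up terms in a Skosmos-hosted vocabulary, the sketch below only builds the lookup URL. The host name and vocabulary id are hypothetical; the `/rest/v1/search` path and its `query`, `vocab` and `lang` parameters are those of the Skosmos REST API as we understand it.

```python
from urllib.parse import urlencode

# Hypothetical Skosmos host, used only for illustration.
SKOSMOS_URL = "https://skosmos.example.org"


def skosmos_search_url(base_url, term, vocab, lang="en"):
    """Build a Skosmos REST search URL for a partial term (wildcard suffix added)."""
    params = urlencode({"query": f"{term}*", "vocab": vocab, "lang": lang})
    return f"{base_url}/rest/v1/search?{params}"


# "demo-vocab" is a made-up vocabulary id.
url = skosmos_search_url(SKOSMOS_URL, "legal", "demo-vocab")
```

The response of such a search carries term URIs, so the value stored in the repository can be a link to the concept rather than a free-text string.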
“Archive in a box” - SSHOC Dataverse
● fully automatic Dataverse deployment with Traefik proxy
● Dataverse configuration managed through the environment file .env
● different Dataverse distributions, with services of your choice, suitable for different
use cases
● external controlled vocabularies support (demo of CESSDA CMM metadata fields
connected to Skosmos framework)
● MinIO storage support for Cloud Storage
● data previewers integrated in the distribution
● startup process managed through scripts located in init.d folder
● automatic Solr reindex
● external services integration through PostgreSQL triggers
● support of custom metadata schemes (CESSDA CMM, CLARIN CMDI, ...)
● built-in Web interface localization uses Dataverse language pack to support multiple
languages out of the box
https://github.com/IQSS/dataverse-docker
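To make the idea of configuration through a .env file concrete, here is a small sketch that reads KEY=VALUE settings. The variable names are invented for the example; the actual .env file in the repository may use different ones.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical variable names, for illustration only.
SAMPLE_ENV = """
# settings of one distribution
EXAMPLE_HOSTNAME=dataverse.example.org
EXAMPLE_LANGUAGE=en
EXAMPLE_STORAGE=minio
"""

settings = parse_env(SAMPLE_ENV)
```

Switching a distribution to another storage backend or language would then mean editing one line of the file rather than changing the deployment scripts.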
Building Dataverse distributions
“Software distribution is the process of delivering software to the end user.
A distro is a collection of software components built, assembled and configured so that it can
essentially be used "as is". It is often the closest thing to turnkey form of free software. A distro may
take the form of a binary distribution, with an executable installer which can be downloaded from the
Internet. Examples range from whole operating system distributions to server and interpreter
distributions (for example WAMP installers).
Examples of software distributions include BSD-based distros (such as FreeBSD, NetBSD, OpenBSD,
and DragonflyBSD) and Linux-based distros (such as openSUSE, Ubuntu, and Fedora).”
Source: Wikipedia
We can build Dataverse-based distributions for research communities and link them into a distributed
data network to solve interoperability problems! EOSC, CESSDA, CLARIAH, DARIAH, ODISSEI,
… will have their own metadata schemes but use the same Dataverse technology.
Benefits of the Common Data Infrastructure (Distributions)
● maintenance costs will drop substantially: the more organizations join, the less
expensive support becomes
● It’s distributed and sustainable, suitable for the future
● maintenance costs could be reallocated to the training and further
development of the new (common) features
● reuse of the same infrastructure components will enforce the quality and the
speed of the knowledge exchange
● building multidisciplinary teams that reuse the same infrastructure can bring us new
insights and unexpected views
● Common Data Infrastructure plays the role of a universal gravitation layer
for Data Science projects
(and so on…)
Historically, most datasets have been preserved in data silos (archives), not interlinked and
lacking standardization. There are cultural, structural and technological
challenges.
Solutions:
● Integrating Linked Data and Semantic Web technologies, forcing research
communities to share data and add more interoperability following FAIR
principles
● Create a standardized (meta)data layer for Large Scale projects like Time
Machine and CoronaWhy
● Working on the automatic metadata linkage to ontologies and external
controlled vocabularies in order to get it linked in the Linked Open Data Cloud
● Using the Knowledge Graph for Machine Learning
Supporting Semantic Web for Data
Why Artificial Intelligence?
Human resources are very expensive and scarce; it is difficult to find
appropriate expertise in-house.
Solution:
● Building AI/ML pipelines for the automatic metadata enrichment and
linkage prediction
● applying NLP for NER, data mining, topic classification, etc.
● building multidisciplinary knowledge graphs should facilitate the
development of new projects with economic and social scientists; they will
take ownership of their own data if they see value (Clio Infra)
How to control Artificial Intelligence
Problem:
It is naive to fully trust Machine Learning and AI; we need to support “human
in the loop” processes to keep control over automatic workflows. Ethics is
also important, as is the problem of detecting fakes.
Solution:
Many “human in the loop” tools have already been developed in research projects. We
need to support the best ones for the different use cases, bring them to the appropriate
maturity (for example, with CI/CD) and introduce them to research
communities.
Human-in-the-Loop for Machine Learning
“Computers are incredibly fast, accurate
and stupid; humans are incredibly slow,
inaccurate and brilliant; together they
are powerful beyond imagination.”
(attributed to Albert Einstein)
“A combination of AI and Human
Intelligence gives rise to an extremely
high level of accuracy and intelligence
(Super Intelligence)”
Source: Hackernoon.com
Human in the loop explained
General blueprint for a human-in-the-loop interactive AI system. Credits: Stanford University HAI
The question shifts from “how do we build a smarter system?” to “how do we incorporate useful,
meaningful human interaction into the system?”
Hypothes.is annotations as a peer review service
1. The AI pipeline performs domain-specific entity extraction and ranking of relevant papers.
2. Automatic entities and statements will be added; important fragments should be highlighted.
3. Human annotators should verify results and validate all statements.
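The three steps above can be sketched as a small review queue. This is only an illustration of the human-in-the-loop principle: the class, the threshold and the sample statements are assumptions, and no Hypothes.is API is called.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    text: str
    entity: str
    score: float             # model confidence between 0 and 1
    status: str = "pending"  # pending -> accepted / rejected


def highlight(statements, threshold=0.8):
    """Step 2: rank machine statements and keep the confident ones as highlights."""
    ranked = sorted(statements, key=lambda s: s.score, reverse=True)
    return [s for s in ranked if s.score >= threshold]


def review(statement, approved):
    """Step 3: only a human decision moves a statement out of 'pending'."""
    statement.status = "accepted" if approved else "rejected"
    return statement


# Step 1 stand-in: made-up output of an entity extraction pipeline.
candidates = [
    Statement("The trial evaluated remdesivir in adults.", "remdesivir", 0.93),
    Statement("Aspirin is mentioned in the discussion.", "aspirin", 0.55),
]
highlighted = highlight(candidates)
review(highlighted[0], approved=True)
```

Nothing is marked as accepted by the machine itself; the score only decides what the annotator sees first.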
Doccano annotation with Machine Learning
Source: Doccano Labs
Dataverse network as a solution for the Metaverse
- persistency: how to archive and move an object (artifact) from one
metaverse space to another virtual world
- how persistent identifiers (PIDs) in Dataverse can solve the problem of PIDs for
files (from archiving to live representation)
- the interoperability layer should allow building smart contracts and tracking their
usage
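The role of persistent identifiers can be illustrated with a tiny resolver: an object keeps the same PID wherever it is displayed, and each space turns it into a resolvable link. The identifier below is made up; only the public DOI and Handle resolver addresses are real.

```python
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def pid_to_url(pid):
    """Turn a 'doi:' or 'hdl:' identifier into a resolver URL."""
    scheme, _, value = pid.partition(":")
    if scheme not in RESOLVERS or not value:
        raise ValueError(f"unsupported persistent identifier: {pid!r}")
    return RESOLVERS[scheme] + value


# Made-up identifier for illustration.
url = pid_to_url("doi:10.5072/FK2/EXAMPLE")
```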
Problem: this requires too many human resources, and adoption time is about 3-5 years.
How to speed up?
spaCy - a state-of-the-art NLP library
spaCy models
Using modern NLP frameworks (spaCy, for example)
Recognised entities can form a metadata layer and be stored in Dataverse
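A sketch of how recognised entities could be turned into a metadata layer. To stay runnable without a language model, the entity pairs are hard-coded; with spaCy installed they could come from `[(ent.text, ent.label_) for ent in nlp(text).ents]`. The output keys `keyword` and `entityType` are hypothetical, not Dataverse field names.

```python
def entities_to_keywords(entities, allowed_labels=("ORG", "GPE", "PERSON")):
    """Deduplicate (text, label) pairs case-insensitively and keep selected labels."""
    seen = {}
    for text, label in entities:
        key = text.strip().lower()
        if label in allowed_labels and key and key not in seen:
            seen[key] = {"keyword": text.strip(), "entityType": label}
    return list(seen.values())


# Made-up NER result standing in for spaCy output.
entities = [
    ("DANS", "ORG"),
    ("Netherlands", "GPE"),
    ("dans", "ORG"),   # duplicate in different case, dropped
    ("2022", "DATE"),  # label not in the allowed list, dropped
]
keywords = entities_to_keywords(entities)
```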
Projects in Doccano
Doccano annotation tool
Integrating metadata keywords in annotation
Metadata / Dataverse
Annotation / Doccano
Keyword: “legal research”
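One way to carry metadata keywords into annotation is to export dataset descriptions as JSON Lines with the keywords as pre-assigned labels. The `text` and `label` keys follow the JSONL layout we understand Doccano to accept for text classification imports; the exact keys depend on the Doccano version, and the dataset record is made up.

```python
import json


def to_doccano_jsonl(datasets):
    """Serialise datasets as one JSON object per line, keywords becoming labels."""
    lines = []
    for ds in datasets:
        record = {"text": ds["description"], "label": list(ds["keywords"])}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Made-up dataset metadata as it might be read from a repository.
datasets = [
    {
        "description": "Court rulings on data protection, 2010-2020.",
        "keywords": ["legal research"],
    }
]
jsonl = to_doccano_jsonl(datasets)
```

Annotators then start from the repository's keywords and correct them, instead of labelling from scratch.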
Building Metaverse: COVID-19 Museum
● The idea to create a ‘museum’ of Covid-19 objects emerged at the
Université de Paris, France; the project is directed by Prof. Yves Rozenholc
● the COVID-19 Museum is envisioned as a public service and will respect
the ownership of all digital artefacts
● the aim is to create a virtual space where all metadata related to the
pandemic is collected and interlinked
● while it starts as a French initiative, the vision of its makers is to turn it
into an international effort in multiple languages
● This museum should bring together researchers and artists, curious and
critical, amateurs and professionals around their objects of care, study
and astonishment.
COVID-19 Museum demonstrator
Instagram workflow:
● create a subcollection of images related to a few COVID-19 keywords like ‘nurse’,
‘art’, ‘museum’, etc.
● extract and store metadata in Dataverse collections without breaking any copyright,
keeping original references to the artifacts
COVID-19 pandemic coverage in newspapers:
● collect news articles from the French news media
● for each article, extract important public attributes such as title, date, summary,
illustrating image and a few publicly available lines of text
Corona pages in Wikipedia:
● collect attributes from wiki page revisions: date of change, author, contribution
measured in number of characters
● build a 2D timeline of this set of Wikipedia pages with respect to creation time and
revision history, offering a view of how every page evolved
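The Wikipedia part of the workflow can be sketched with the MediaWiki revisions query. The function below only builds the request URL; the revision list is made-up sample data in the shape of that API's response. Note that MediaWiki reports page size in bytes, so the delta is an approximation of the contribution in characters.

```python
from urllib.parse import urlencode


def revisions_url(title, limit=50, lang="fr"):
    """Build a MediaWiki API URL listing revisions of a page, oldest first."""
    params = urlencode({
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|size",
        "rvdir": "newer",
        "rvlimit": limit,
        "format": "json",
    })
    return f"https://{lang}.wikipedia.org/w/api.php?{params}"


def contributions(revisions):
    """Given revisions oldest first, compute each author's change in page size."""
    rows, previous = [], 0
    for rev in revisions:
        rows.append({
            "date": rev["timestamp"],
            "author": rev["user"],
            "delta": rev["size"] - previous,
        })
        previous = rev["size"]
    return rows


url = revisions_url("COVID-19")

# Made-up revisions for illustration.
revisions = [
    {"timestamp": "2020-03-01T10:00:00Z", "user": "EditorA", "size": 1200},
    {"timestamp": "2020-03-02T09:30:00Z", "user": "EditorB", "size": 1500},
    {"timestamp": "2020-03-05T18:45:00Z", "user": "EditorA", "size": 1400},
]
timeline = contributions(revisions)
```

Each row of `timeline` is one point on the page's history axis; plotting several pages side by side gives the second dimension of the timeline.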
COVID-19 Museum Dataverse
Annotation with Hypothes.is
Media timeline in chronological order
Try here!
State of mind…
Ukraine, March 2022
https://www.instagram.com/p/CbUeSOOMnWL/
Thank you! Questions?
Slava Tykhonov
DANS-KNAW
vyacheslav.tykhonov@dans.knaw.nl

Mais conteúdo relacionado

Mais procurados

Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best PracticesDATAVERSITY
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricNathan Bijnens
 
Fujitsu Hybrid IT & Multi Cloud Services
Fujitsu Hybrid IT & Multi Cloud ServicesFujitsu Hybrid IT & Multi Cloud Services
Fujitsu Hybrid IT & Multi Cloud ServicesAlessandro Guli
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Kent Graziano
 
Introduction to Knowledge Graphs for Information Architects.pdf
Introduction to Knowledge Graphs for Information Architects.pdfIntroduction to Knowledge Graphs for Information Architects.pdf
Introduction to Knowledge Graphs for Information Architects.pdfHeather Hedden
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Accenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdfAccenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdfRajvir Kaushal
 
Workshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data ScienceWorkshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data ScienceNeo4j
 
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...DataWorks Summit
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
How to build a business glossary linked with data dictionary
How to build a business glossary linked with data dictionaryHow to build a business glossary linked with data dictionary
How to build a business glossary linked with data dictionaryPiotr Kononow
 
Data Warehouse Agility Array Conference2011
Data Warehouse Agility Array Conference2011Data Warehouse Agility Array Conference2011
Data Warehouse Agility Array Conference2011Hans Hultgren
 
Metadata and ontologies
Metadata and ontologiesMetadata and ontologies
Metadata and ontologiesDavid Lamas
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?Precisely
 
Making connections matter: 2 use cases on graphs & analytics solutions
Making connections matter: 2 use cases on graphs & analytics solutionsMaking connections matter: 2 use cases on graphs & analytics solutions
Making connections matter: 2 use cases on graphs & analytics solutionsNeo4j
 

Mais procurados (20)

Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
Fujitsu Hybrid IT & Multi Cloud Services
Fujitsu Hybrid IT & Multi Cloud ServicesFujitsu Hybrid IT & Multi Cloud Services
Fujitsu Hybrid IT & Multi Cloud Services
 
Why Data Vault?
Why Data Vault? Why Data Vault?
Why Data Vault?
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Introduction to Knowledge Graphs for Information Architects.pdf
Introduction to Knowledge Graphs for Information Architects.pdfIntroduction to Knowledge Graphs for Information Architects.pdf
Introduction to Knowledge Graphs for Information Architects.pdf
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Accenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdfAccenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdf
 
Workshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data ScienceWorkshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data Science
 
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ...
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
How to build a business glossary linked with data dictionary
How to build a business glossary linked with data dictionaryHow to build a business glossary linked with data dictionary
How to build a business glossary linked with data dictionary
 
Data Warehouse Agility Array Conference2011
Data Warehouse Agility Array Conference2011Data Warehouse Agility Array Conference2011
Data Warehouse Agility Array Conference2011
 
Metadata and ontologies
Metadata and ontologiesMetadata and ontologies
Metadata and ontologies
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
 
Making connections matter: 2 use cases on graphs & analytics solutions
Making connections matter: 2 use cases on graphs & analytics solutionsMaking connections matter: 2 use cases on graphs & analytics solutions
Making connections matter: 2 use cases on graphs & analytics solutions
 

Semelhante a Metaverse for Dataverse

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs vty
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Projectvty
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution vty
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22marpierc
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligencevty
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic WaveKaniska Mandal
 
Open Source Clouds: Be The Change...
Open Source Clouds: Be The Change...Open Source Clouds: Be The Change...
Open Source Clouds: Be The Change...GreenQloud
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things PayamBarnaghi
 
Semantic Technolgy
Semantic TechnolgySemantic Technolgy
Semantic TechnolgyTalat Fakhri
 
A Comprehensive Guide to Data Science Technologies.pdf
A Comprehensive Guide to Data Science Technologies.pdfA Comprehensive Guide to Data Science Technologies.pdf
A Comprehensive Guide to Data Science Technologies.pdfGeethaPratyusha
 
The Social Semantic Web
The Social Semantic WebThe Social Semantic Web
The Social Semantic WebJohn Breslin
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)vty
 
Designing High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPCDesigning High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPCObject Automation
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Artefactual Systems - AtoM
 
Metaverse Developments, Technologies, and Standards - Towards a Military Meta...
Metaverse Developments, Technologies, and Standards - Towards a Military Meta...Metaverse Developments, Technologies, and Standards - Towards a Military Meta...
Metaverse Developments, Technologies, and Standards - Towards a Military Meta...Andy Fawkes
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...vty
 
OpenReality Vision
OpenReality VisionOpenReality Vision
OpenReality Visionktweedy1
 

Semelhante a Metaverse for Dataverse (20)

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Project
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
 
Edu.03 assignment
Edu.03 assignment Edu.03 assignment
Edu.03 assignment
 
Edu.03
Edu.03 Edu.03
Edu.03
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic Wave
 
Open Source Clouds: Be The Change...
Open Source Clouds: Be The Change...Open Source Clouds: Be The Change...
Open Source Clouds: Be The Change...
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things
 
Semantic Technolgy
Semantic TechnolgySemantic Technolgy
Semantic Technolgy
 
Linked Data to Improve the OER Experience
Linked Data to Improve the OER ExperienceLinked Data to Improve the OER Experience
Linked Data to Improve the OER Experience
 
A Comprehensive Guide to Data Science Technologies.pdf
A Comprehensive Guide to Data Science Technologies.pdfA Comprehensive Guide to Data Science Technologies.pdf
A Comprehensive Guide to Data Science Technologies.pdf
 
The Social Semantic Web
The Social Semantic WebThe Social Semantic Web
The Social Semantic Web
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)Running Dataverse repository in the European Open Science Cloud (EOSC)
Running Dataverse repository in the European Open Science Cloud (EOSC)
 
Designing High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPCDesigning High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPC
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
 
Metaverse Developments, Technologies, and Standards - Towards a Military Meta...
Metaverse Developments, Technologies, and Standards - Towards a Military Meta...Metaverse Developments, Technologies, and Standards - Towards a Military Meta...
Metaverse Developments, Technologies, and Standards - Towards a Military Meta...
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
 
OpenReality Vision
OpenReality VisionOpenReality Vision
OpenReality Vision
 

Mais de vty

Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs vty
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museumvty
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...vty
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7vty
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyvty
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes vty
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21vty
 
Controlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryControlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryvty
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...vty
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataversevty
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research datavty
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataversevty
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataversevty
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse vty
 
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC,  Service QA and DataverseIntegration of WORSICA’s thematic service in EOSC,  Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataversevty
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes vty
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challengesvty
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloudvty
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2vty
 
Dataverse in the European Open Science Cloud
Dataverse in the European Open Science CloudDataverse in the European Open Science Cloud
Dataverse in the European Open Science Cloudvty
 

Metaverse for Dataverse

  • 1. Metaverse for Dataverse Building collaborative Machine Learning platform for Dataverse network Slava Tykhonov, R&D (DANS-KNAW) DANS seminar, 29.03.2022
  • 2. What is Metaverse? “A metaverse is a network of 3D virtual worlds focused on social connection. In futurism and science fiction, it is often described as a hypothetical iteration of the Internet as a single, universal virtual world that is facilitated by the use of virtual and augmented reality headsets.” “Access points for metaverses include general-purpose computers and smartphones, in addition to augmented reality (AR), mixed reality, virtual reality (VR), and virtual world technologies.” Wikipedia Where is the place of Open Science in this vision?
  • 3. Moving towards Open Science Source: Citizen Science and Open Science Core Concepts and Areas of Synergy (Vohland and Göbel, 2017)
  • 4. Time Machine project ● An international collaboration to bring 5000 years of European history to life ● Digitising millions of historical documents, paintings and monuments ● The largest computer simulation ever developed ● An open access, interactive resource ● 600+ consortium members from European countries ● Top academic and research institutions ● Private sector partners from SMEs to international companies “Our focus is on the joint efforts on Big Data, artificial intelligence, augmented reality and 3D and the development of European platforms in line with European values”. Visit http://timemachine.eu
  • 5. (Semantic) Web 3.0 as a new Internet ● Web 3.0 is democratized – it will be built on a decentralized blockchain protocol where there is no centralized ownership of content, services, or platforms. ● It is semantic – the Semantic Web is not identical to Web 3.0, but is an underlying technology for the third generation of the internet. It would allow multiple internet pages to be correlated using a semantic protocol so that the relationships between pages are apparent, indexed, and searchable. ● It may be spatial in nature – While Web 1.0 and Web 2.0 offered two-dimensional experiences, Web 3.0 may be immersive and offers spatial experiences similar to the real world. To achieve this, there will be a spatially-interactive layer on top of the digital information layer, which uses sensory triggers and controls like voice, gesture, biometric commands, and others. Source: XR Today
  • 6. NFT (a non-fungible token) in Metaverse “The metaverse is a future evolution of the Internet based on persistent, shared virtual worlds in which people interact as 3D avatars. Blockchain technology may provide the backbone of the metaverse, with interoperable NFT assets that can be used across different metaverse spaces.” Source: Decrypt Open questions: can the Metaverse be created without common FAIR principles? What should the semantic interoperability layer look like?
  • 7. Vision: Semantic interoperability on the infrastructure level We envision a situation where thousands of Dataverse instances on the web (due to EOSC) can be searched simultaneously for data and will form a Data Lake. The old dream of Federated search/Universal catalogue can only be realised if: (1) crosswalks, i.e. mappings across different metadata schemes, are implemented; (2) in metadata schemes we seek ways to enrich indexes with values from controlled vocabularies. Standard response (centralized) = standardisation and harmonisation = repository software, certain metadata standards, or certain controlled vocabularies. New response (distributed) = explore agile solutions (Proofs of Concept) which can be implemented by different communities (even smaller ones), so we keep variety and still enable integration by applying Linked Data technologies.
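The crosswalk idea on this slide can be illustrated with a minimal sketch. It assumes a crosswalk is a plain field-to-field mapping; the field names on both sides (`dc:title`, `title`, and so on) and the sample record are hypothetical and not taken from any real Dataverse metadata block. Unmapped fields are kept aside rather than dropped, so a curator can decide how to extend the mapping.

```python
from collections import defaultdict

# Hypothetical crosswalk between two metadata schemes; the field names on
# both sides are illustrative and not taken from any real Dataverse block.
CROSSWALK = {
    "dc:title": "title",
    "dc:creator": "author",
    "dc:subject": "keyword",
}

def apply_crosswalk(record, crosswalk):
    """Map source fields to target fields; keep unmapped fields for review."""
    mapped = defaultdict(list)
    unmapped = {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped[target].append(value)
    return dict(mapped), unmapped

# Illustrative source record (made-up values).
source_record = {
    "dc:title": "Labour relations 1500-2000",
    "dc:subject": "economic history",
    "dc:coverage": "Europe",
}
mapped, unmapped = apply_crosswalk(source_record, CROSSWALK)
```

Here `mapped` holds the fields that a target index could ingest, while `unmapped` lists what the crosswalk does not yet cover.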
  • 8. Data Stations - Future Data Services Dataverse is an API-based data platform and a key framework for Open Innovation!
  • 9. Conceptual approach: building common infrastructure components Dataverse Semantic API in release 5.6: https://github.com/IQSS/dataverse/releases/tag/v5.6 “Dataset metadata can be retrieved, set, and updated using a new, flatter JSON-LD format - following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier transfer of metadata to/from other systems (i.e. without needing to know Dataverse's metadata block and field storage architecture). This new API also allows for the update of terms metadata”. External controlled vocabularies support is being developed by DANS in the SSHOC project and is already integrated in Dataverse core as of release 5.7. Proposal: https://docs.google.com/document/d/1txdcFuxskRx_tLsDQ7KKLFTMR_r9IBhorDu3V_r445w/ Interfaces: http://github.com/gdcc/dataverse-external-vocab-support Integrations: Wikidata, ORCID, MeSH, Skosmos vocabularies
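As a rough illustration of the flatter JSON-LD format, the sketch below builds (but does not send) a metadata update request. It assumes the `PUT /api/datasets/{id}/metadata?replace=true` endpoint and the `X-Dataverse-key` header as described in the Dataverse API guide; the server URL, dataset id, token and title are placeholders, and the helper name `build_metadata_update` is hypothetical. Check the guide for your Dataverse version before relying on it.

```python
import json
import urllib.request

def build_metadata_update(server_url, dataset_id, api_token, title):
    """Build a JSON-LD metadata update request without sending it."""
    # Flat JSON-LD: the field value plus a @context mapping it to a term URI.
    payload = {
        "Title": title,
        "@context": {"Title": "http://purl.org/dc/terms/title"},
    }
    url = f"{server_url.rstrip('/')}/api/datasets/{dataset_id}/metadata?replace=true"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "X-Dataverse-key": api_token,          # placeholder token
            "Content-Type": "application/ld+json",
        },
    )

# Placeholder server, dataset id and token; nothing is sent over the network.
request = build_metadata_update("https://demo.example.org", 42, "API-TOKEN", "New title")
```

Passing `request` to `urllib.request.urlopen` would perform the call against a real installation; it is left out here so the sketch runs offline.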
  • 10. External controlled vocabularies in Dataverse Any research community can run the same Dataverse instance with its own controlled vocabularies linked in a FAIR way!
  • 11. “Archive in a box” - SSHOC Dataverse ● fully automatic Dataverse deployment with Traefik proxy ● Dataverse configuration managed through the environment file .env ● different Dataverse distributions with services of your preference, suitable for different use cases ● external controlled vocabularies support (demo of CESSDA CMM metadata fields connected to Skosmos framework) ● MinIO storage support for Cloud Storage ● data previewers integrated in the distribution ● startup process managed through scripts located in the init.d folder ● automatic SOLR reindex ● external services integration through PostgreSQL triggers ● support of custom metadata schemes (CESSDA CMM, CLARIN CMDI, ...) ● built-in Web interface localization uses the Dataverse language pack to support multiple languages out of the box https://github.com/IQSS/dataverse-docker
  • 12. Building Dataverse distributions “Software distribution is the process of delivering software to the end user. A distro is a collection of software components built, assembled and configured so that it can essentially be used "as is". It is often the closest thing to turnkey form of free software. A distro may take the form of a binary distribution, with an executable installer which can be downloaded from the Internet. Examples range from whole operating system distributions to server and interpreter distributions (for example WAMP installers). Examples of software distributions include BSD-based distros (such as FreeBSD, NetBSD, OpenBSD, and DragonflyBSD) and Linux-based distros (such as openSUSE, Ubuntu, and Fedora).” Source: Wikipedia We can build Dataverse-based distributions for research communities and link them into a distributed data network to solve all interoperability problems! EOSC, CESSDA, CLARIAH, DARIAH, ODISSEI, … will have their own metadata schemes but use the same Dataverse technology.
  • 13. Benefits of the Common Data Infrastructure (Distributions) ● maintenance costs will drop massively: the more organizations join, the less expensive it will be to support ● it is distributed and sustainable, suitable for the future ● maintenance costs could be reallocated to training and further development of new (common) features ● reuse of the same infrastructure components will improve the quality and the speed of knowledge exchange ● building multidisciplinary teams that reuse the same infrastructure can bring us new insights and unexpected views ● the Common Data Infrastructure plays the role of a universal gravitation layer for Data Science projects (and so on…)
  • 14. Historically, most datasets have been preserved in data silos (archives), not interlinked and lacking standardization. There are cultural, structural and technological challenges. Solutions: ● Integrating Linked Data and Semantic Web technologies, forcing research communities to share data and add more interoperability following FAIR principles ● Creating a standardized (meta)data layer for Large Scale projects like Time Machine and CoronaWhy ● Working on the automatic linkage of metadata to ontologies and external controlled vocabularies in order to get it linked in the Linked Open Data Cloud ● Using the Knowledge Graph for Machine Learning Supporting Semantic Web for Data
  • 15. Why Artificial Intelligence? Human resources are very expensive and scarce, and it is difficult to find appropriate expertise in-house. Solution: ● Building AI/ML pipelines for automatic metadata enrichment and linkage prediction ● Applying NLP for NER, data mining, topic classification etc. ● Building multidisciplinary knowledge graphs should facilitate the development of new projects with economic and social scientists; they will take ownership of their own data if they see value (Clio Infra)
  • 16. How to control Artificial Intelligence Problem: It is naive to fully trust Machine Learning and AI; we need to support “human in the loop” processes to keep control over automatic workflows. Ethics is also important, as is the problem of fake detection. Solution: Many “human in the loop” tools have already been developed in research projects; we need to support the best ones for the different use cases, bring them to the appropriate maturity (for example, with CI/CD) and introduce them to research communities.
  • 17. Human-in-the-Loop for Machine Learning “Computers are incredibly fast, accurate and stupid; humans are incredibly slow, inaccurate and brilliant; together they are powerful beyond imagination.” Albert Einstein “A combination of AI and Human Intelligence gives rise to an extremely high level of accuracy and intelligence (Super Intelligence)” Source: Hackernoon.com
  • 18. Human in the loop explained General blueprint for a human-in-the-loop interactive AI system. Credits: Stanford University HAI “how do we build a smarter system?” to “how do we incorporate useful, meaningful human interaction into the system?”
  • 19. Hypothes.is annotations as a peer review service 1. The AI pipeline performs domain-specific entity extraction and ranks relevant papers. 2. Automatically extracted entities and statements are added, and important fragments are highlighted. 3. Human annotators verify the results and validate all statements.
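The three steps above can be sketched as a small review loop. This is a toy model under stated assumptions, not the Hypothes.is workflow itself: the `Statement` class, the `review` function and the sample statements are hypothetical, and the `decide` callback stands in for an interactive human annotator.

```python
from dataclasses import dataclass

@dataclass
class Statement:
    text: str
    score: float              # ranking score from a (hypothetical) AI pipeline
    status: str = "pending"   # becomes "validated" or "rejected" after review

def review(statements, decide):
    """Ask the reviewer about each statement, highest score first."""
    for statement in sorted(statements, key=lambda s: s.score, reverse=True):
        statement.status = "validated" if decide(statement) else "rejected"
    # Only human-validated statements are passed on.
    return [s for s in statements if s.status == "validated"]

# Made-up pipeline output.
candidates = [
    Statement("Drug A inhibits protein B", 0.91),
    Statement("Protein B is a city in France", 0.40),
]
# Stand-in for a human annotator; in practice this is an interactive step.
approved = review(candidates, decide=lambda s: "city" not in s.text)
```

Nothing leaves the loop in the `pending` state, which mirrors the requirement that annotators validate all statements rather than only the low-scoring ones.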
  • 20. Doccano annotation with Machine Learning Source: Doccano Labs
  • 21. Dataverse network as a solution for the Metaverse - persistency: how to archive and move an object (artifact) from one metaverse space to another virtual world - how persistent identifiers in Dataverse can solve the problem: PIDs for files (from archiving to live representation) - the interoperability layer should allow building smart contracts and tracking their usage. Problem: this requires too many human resources, and adoption time is about 3-5 years. How to speed up?
  • 22. spaCy - state-of-the-art NLP library
  • 24. Using modern NLP frameworks (spaCy, for example) Recognised entities can form a metadata layer and be stored in Dataverse
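A minimal sketch of how recognised entities might become a metadata layer is shown below. It takes `(text, label)` pairs, which is the shape obtained from spaCy's `doc.ents` via `ent.text` and `ent.label_`, and turns them into de-duplicated keyword records. The function name, the output keys (`value`, `entity_type`) and the label filter are assumptions for illustration; they are not Dataverse field names.

```python
def entities_to_keywords(entities, allowed_labels=("ORG", "GPE", "PERSON")):
    """Turn (text, label) entity pairs into de-duplicated keyword records."""
    seen = set()
    keywords = []
    for text, label in entities:
        key = (text.lower(), label)  # case-insensitive de-duplication
        if label in allowed_labels and key not in seen:
            seen.add(key)
            keywords.append({"value": text, "entity_type": label})
    return keywords

# With spaCy installed, the pairs would come from something like:
#   [(ent.text, ent.label_) for ent in nlp(text).ents]
# Here they are written out by hand so the sketch runs without a model.
entities = [("DANS", "ORG"), ("Paris", "GPE"), ("dans", "ORG"), ("2022", "DATE")]
keywords = entities_to_keywords(entities)
```

The resulting records could then be mapped onto whichever keyword fields the target metadata scheme provides.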
  • 27. Integrating metadata keywords in annotation Metadata / Dataverse Annotation / Doccano Keyword: “legal research”
  • 28. Building Metaverse: COVID-19 Museum ● The idea to create a ‘museum’ of COVID-19 objects emerged at the Université de Paris, France, and is directed by Prof. Yves Rozenholc ● the COVID-19 Museum is envisioned as a public service and will respect the ownership of all digital artefacts ● the aim is to create a virtual space where all metadata related to the pandemic is collected and interlinked ● while it starts as a French initiative, the vision of its makers is to turn it into an international effort in multiple languages ● This museum should bring together researchers and artists, curious and critical, amateurs and professionals around their objects of care, study and astonishment.
  • 29. COVID-19 Museum demonstrator Instagram workflow: create a subcollection of images related to a few COVID-19 related keywords like ‘nurse’, ‘art’, ‘museum’, etc. Extraction and storage of metadata in Dataverse collections should be done without breaking any copyrights, by keeping original references to the artifacts. COVID-19 pandemic coverage in newspapers: collect news articles from the French news media; for each article, extract important public attributes like title, date, summary, illustrating image and a few lines of publicly available text. Corona pages in Wikipedia: collect attributes from wiki page revisions (date of change, author, contribution by amount of characters); realize a 2D timeline of this set of Wikipedia pages with respect to creation time and history, and offer a view of the historical evolution of every page.
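The Wikipedia part of the demonstrator can be sketched as follows. The input is assumed to be a list of revision records with `timestamp`, `user` and `size` keys, as returned by the MediaWiki revisions API; the sample data is made up. Note that `size` is reported in bytes, so the computed delta only approximates the "amount of characters" mentioned on the slide.

```python
from datetime import datetime

def revision_timeline(revisions):
    """Order revisions by time and compute each author's size contribution."""
    # ISO 8601 UTC timestamps sort correctly as plain strings.
    ordered = sorted(revisions, key=lambda r: r["timestamp"])
    timeline = []
    previous_size = 0
    for rev in ordered:
        moment = datetime.strptime(rev["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        timeline.append({
            "date": moment.date().isoformat(),
            "author": rev["user"],
            "delta": rev["size"] - previous_size,  # bytes added or removed
        })
        previous_size = rev["size"]
    return timeline

# Made-up revisions, deliberately out of order.
sample = [
    {"timestamp": "2020-03-02T10:00:00Z", "user": "B", "size": 1500},
    {"timestamp": "2020-03-01T09:00:00Z", "user": "A", "size": 1200},
]
timeline = revision_timeline(sample)
```

Each entry gives one point on the timeline (date, author, contribution), which is the minimum needed to plot the evolution of a page.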
  • 32. Media timeline in chronological order Try here!
  • 33. State of mind… Ukraine, March 2022 https://www.instagram.com/p/CbUeSOOMnWL/
  • 34. Thank you! Questions? Slava Tykhonov DANS-KNAW vyacheslav.tykhonov@dans.knaw.nl