20191119_The OpenAIRE Research Graph

@openaire_eu OpenAIRE-Connect Review
23rd of April, 2018 - Brussels
The OpenAIRE Research Graph
Bringing scholarly communication back into the
hands of scientists
Paolo Manghi
Institute of Information Science and Technologies
Consiglio Nazionale delle Ricerche

Materializing the Open Science Graph
[Diagram. Entities: Community, Project, Funder, Funding, Product (Publication, Research Data, Software, Other research products), Organization, Source. Processes: Harvesting (following the GUIDELINES) from Research Infrastructures and Publishing sources; Mining; Deduplication; End-user feedback. Output: Scientific product catalogue.]
OpenAIRE Advance 1st Review | Luxembourg | 10 Oct 2019

Providing an open metadata
research graph of interlinked
scientific products, with Open
Access information, linked to
funding information and research
communities
The OpenAIRE research graph
Open
Complete
De-duplicated
Transparent
Participatory
Decentralized
Trusted

De-duplicated
More information about the de-duplication framework used by OpenAIRE can be found by searching Zenodo for:
• “De-duplicating the OpenAIRE Scholarly Communication Big Graph” (poster)
• “GDup: De-Duplication of Scholarly Communication Big Graphs”
Metadata records corresponding to equivalent objects are merged
Scientific products
Organizations
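
The merge of equivalent records mentioned above can be illustrated with a small sketch. This is not the GDup framework: it assumes, purely for illustration, that two records are equivalent when they share a DOI or a normalized title, and the field names (`id`, `doi`, `title`) and record identifiers are hypothetical.

```python
from collections import defaultdict


def normalize(title: str) -> str:
    """Lowercase a title and keep only letters and digits."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def find(parent: dict, x: str) -> str:
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def deduplicate(records: list[dict]) -> list[dict]:
    """Group records sharing a DOI or normalized title, then merge each group."""
    parent = {r["id"]: r["id"] for r in records}
    seen = {}  # matching key -> id of the first record seen with that key
    for r in records:
        keys = [("title", normalize(r["title"]))]
        if r.get("doi"):
            keys.append(("doi", r["doi"].lower()))
        for key in keys:
            if key in seen:
                # Same key as an earlier record: put both in one group.
                parent[find(parent, r["id"])] = find(parent, seen[key])
            else:
                seen[key] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(parent, r["id"])].append(r)

    # Toy merge policy (an assumption): first title wins, first DOI found wins.
    return [
        {
            "merged_ids": sorted(m["id"] for m in members),
            "title": members[0]["title"],
            "doi": next((m["doi"] for m in members if m.get("doi")), None),
        }
        for members in groups.values()
    ]


records = [
    {"id": "repoA::1", "doi": "10.1234/abc", "title": "Graph De-duplication"},
    {"id": "repoB::7", "doi": None, "title": "Graph de-duplication."},
    {"id": "repoC::3", "doi": "10.1234/ABC", "title": "Graph Deduplication (preprint)"},
    {"id": "repoD::9", "doi": None, "title": "An unrelated dataset"},
]
result = deduplicate(records)
```

Here the first three records collapse into one representative (two match by title, the third by DOI), while the fourth stays on its own. A production framework would use richer similarity functions than exact key matches.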

Complete: community-trusted sources
Academic Graph
… and more

• Rely on quality scholarly communication sources of different kinds
Participatory
• Include solutions and content from any interested and known content provider in scholarly communication
Institutional repositories
Aggregators
Data archives
Software repositories
Research infrastructure sources
Funder grant databases
Authors & Orgs entity registries
Publishers & journals

• Metadata in the graph includes provenance when harvested and reliability indicators when obtained from mining
Transparent
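
One way to picture this transparency property is a link that always carries its origin. The sketch below is an assumption about how such metadata could be modelled, not OpenAIRE's actual data model; every class and field name (`Provenance`, `origin`, `reliability`, and so on) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Where a piece of metadata came from (hypothetical field names)."""
    origin: str                          # "harvested" or "inferred"
    source: str                          # data source or mining algorithm
    reliability: Optional[float] = None  # set only for inferred metadata


@dataclass(frozen=True)
class Link:
    source_id: str
    relation: str
    target_id: str
    provenance: Provenance


def describe(link: Link) -> str:
    """Render a human-readable provenance statement for a link."""
    p = link.provenance
    if p.origin == "harvested":
        return f"{link.relation}: harvested from {p.source}"
    return f"{link.relation}: inferred by {p.source} (reliability {p.reliability:.2f})"


harvested = Link("product:1", "hostedBy", "source:9",
                 Provenance("harvested", "Example Repository"))
mined = Link("product:1", "fundedBy", "project:42",
             Provenance("inferred", "text-mining", reliability=0.85))

# Consumers can then keep harvested links and only sufficiently reliable inferred ones.
trusted = [l for l in (harvested, mined)
           if l.provenance.origin == "harvested"
           or (l.provenance.reliability or 0.0) >= 0.8]
```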

• Preservation and ownership beyond OpenAIRE
  Exchanged with other graph initiatives
  Broker Service: redistributed via subscription and notification to contributing data sources (provide.openaire.eu)
• Openly accessible via APIs (develop.openaire.eu)
Decentralized

• Authors in the loop to enrich their ORCID record
• Validation of end-user “claims”
Trusted (November 2019)

Harvesting: Revised Classification of Research Products
Publications
• Article
• Preprint
• Report
• …
Datasets
• Dataset
• Collection
• Clinical Trials
• …
Software
• Research Software
• …
Other Research Products
• Service
• Workflow
• Interactive Resource
• …
Harvested from:
• Institutional/publication repositories
• Journals/publishers
• Data repositories
• Software repositories
• Other Products repositories
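
The classification above can be read as a lookup from a product type to one of four top-level categories. The sketch below encodes only the types named on the slide (the "…" entries are left out), and the fallback to "Other Research Products" for unknown types is an assumption made for illustration, not a documented OpenAIRE rule.

```python
# Top-level categories and the example types listed on the slide.
CLASSIFICATION = {
    "Publications": ["Article", "Preprint", "Report"],
    "Datasets": ["Dataset", "Collection", "Clinical Trials"],
    "Software": ["Research Software"],
    "Other Research Products": ["Service", "Workflow", "Interactive Resource"],
}

# Inverted index: lowercase type name -> category.
TYPE_TO_CATEGORY = {
    type_name.lower(): category
    for category, type_names in CLASSIFICATION.items()
    for type_name in type_names
}


def classify(product_type: str) -> str:
    """Return the category of a product type (unknown types fall back, by assumption)."""
    return TYPE_TO_CATEGORY.get(product_type.strip().lower(), "Other Research Products")
```

For example, `classify("preprint")` yields `"Publications"` and `classify("Clinical Trials")` yields `"Datasets"`.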
Workshop Técnico OpenAIRE / LA Referencia | 29-30 October, 2019 | Costa Rica

Open Science publishing
Bridging RIs and Scholarly Communication
Transparency and reproducibility
[Diagram. e-Infrastructures and Research Infrastructures side: Thematic Service, Experiment (Input Dataset, Input Method, Parameters, Output Dataset), Experiment product. "Publishing the experiment" leads to an Experiment repo, Data repo, Method repo, and Publications. Scholarly Communication infrastructure side: Harvesting of research data, software, workflows, and publications.]

• EPOS Research Infrastructure
Reproducibility
Transparency
Seamless publishing
Open Science publishing workflows

Pre-processed sources
Article-dataset links
480Mi links
CrossRef enriched
85Mi publication records
DOIBoost
Academic Graph
Published every 6 months
(new versions to be published next week)

Context Propagation
[Diagram: context propagates across graph relations. Entities: Product, Project, Funder (National Funder), Source (institutional repository), Organization, Country, Community. Relations: supplementedBy, fundedBy, funds, hostedBy, located, jurisdiction, ofInterest. Figures shown: 157K, 8Mi, 10K.]
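
As a rough illustration of context propagation, the sketch below applies a single rule: a product linked by `supplementedBy` inherits the `fundedBy` projects of the product it supplements. The rule and its direction are assumptions chosen for the example (the slide names the relations but not the exact rules), and all identifiers are made up.

```python
Triple = tuple[str, str, str]  # (source, relation, target)


def propagate_funding(triples: set[Triple]) -> set[Triple]:
    """Return fundedBy links inferred by following supplementedBy links."""
    known = set(triples)
    inferred: set[Triple] = set()
    changed = True
    while changed:  # repeat until no new link appears (fixpoint)
        changed = False
        funded = [(s, t) for s, r, t in known if r == "fundedBy"]
        supplements = [(s, t) for s, r, t in known if r == "supplementedBy"]
        for product, supplement in supplements:
            for funded_product, project in funded:
                if funded_product == product:
                    new = (supplement, "fundedBy", project)
                    if new not in known:
                        known.add(new)
                        inferred.add(new)
                        changed = True
    return inferred


graph = {
    ("article:1", "fundedBy", "project:X"),
    ("article:1", "supplementedBy", "dataset:2"),
    ("dataset:2", "supplementedBy", "software:3"),
}
inferred = propagate_funding(graph)
```

The loop runs to a fixpoint, so the funding context reaches `software:3` through `dataset:2` even though the two are not directly linked to the article's project.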

Production: Open Access CAPs
BETA: Open Science CAPs
[Bar charts in four panels (literature, research data, software, other), with series labelled Old CAP / New CAP and PROD / BETA. Data labels: 110Mi, 30Mi, 1Mi, 10Mi, 100K, 180K, 3Mi, 7.5Mi.]
Harvested content
• Data sources: 10K+
• Records: ~480Mi
• Publication full-texts: ~12Mi (Springer N. coming)
• Links (also text-mined): ~960Mi

Microsoft Research (being drafted)
Unpaywall (ongoing)
ORCID membership (November 2019)
RDA IG Open Science Graphs for FAIR Data
FREYA, ResearchGraph, OpenCitations,
Open Knowledge Research Graph
IG Session at RDA Helsinki 2019 (15th of October 2019)
Liaisons
Academic Graph

• October-November 2019:
OpenAIRE Research Graph open for consultation
Collecting feedback via Trello (operational end of September)
• December 2019:
OpenAIRE Research Graph
in production
BETA Graph Open Consultation
http://beta.explore.openaire.eu

Thank you!
Paolo Manghi
paolo.manghi@isti.cnr.it

Architecture, technologies, and infrastructure

Architecture and technologies: today
[Architecture diagram, summarized by subsystem]
• Aggregation subsystem: Collect metadata records and files from Data Sources; Transform; Clean (cleaned records); full-text cache; Populate the native graph
• De-duplication subsystem: Identify equivalent products and organisations; Action Sets (similarity rels); Merge equivalent objects; deduped graph
• Information Inference subsystem: Extract full-text; copy of deduped graph; text-mining of the full-texts and the graph to derive new semantic links; Action Set (inferred links); Enrich graphs with links; Propagation; enriched graph
• Other components: Data provision subsystem, native graph “slices”, Publishing subsystem, Data Monitoring, Front-end
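
The order of the subsystems can be sketched as a chain of stages, each taking and returning a list of records. The stage bodies below are toy stand-ins written for this example, not the actual OpenAIRE implementation: the record fields (`title`, `sources`, `fulltext`) and the "grant agreement" pattern are hypothetical.

```python
import re


def clean(records: list[dict]) -> list[dict]:
    """Aggregation (toy): collapse whitespace in titles, drop untitled records."""
    cleaned = []
    for r in records:
        title = " ".join(r.get("title", "").split())
        if title:
            cleaned.append({**r, "title": title})
    return cleaned


def merge_equivalent(records: list[dict]) -> list[dict]:
    """De-duplication (toy): merge records whose lowercase titles match."""
    merged: dict[str, dict] = {}
    for r in records:
        key = r["title"].lower()
        if key in merged:
            merged[key]["sources"] = merged[key]["sources"] + r["sources"]
        else:
            merged[key] = {**r, "sources": list(r["sources"])}
    return list(merged.values())


def enrich(records: list[dict]) -> list[dict]:
    """Information inference (toy): mine grant numbers from the full text."""
    enriched = []
    for r in records:
        grants = re.findall(r"grant agreement (\d+)", r.get("fulltext", ""))
        links = [("fundedBy", g) for g in grants]
        enriched.append({**r, "inferred_links": links})
    return enriched


PIPELINE = [clean, merge_equivalent, enrich]


def run(records: list[dict]) -> list[dict]:
    """Apply every stage in order, feeding each stage the previous output."""
    for stage in PIPELINE:
        records = stage(records)
    return records


raw = [
    {"title": "  Open   Science Graph ", "sources": ["repoA"],
     "fulltext": "Funded under grant agreement 123456."},
    {"title": "open science graph", "sources": ["repoB"], "fulltext": ""},
    {"title": "   ", "sources": ["repoC"], "fulltext": ""},
]
graph = run(raw)
```

Keeping each stage a plain function over records mirrors the diagram's separation between native, deduped, and enriched graphs: every stage produces a new collection rather than modifying its input.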

Task 9.1. System administration - infrastructure: before Jan 2018
• Public System: 20 srv, 122 CPU, 320 GB, 8 TB
• Mining System: 21 srv, 406 CPU, 2 TB, 385 TB
• Data provision System: 23 srv, 154 CPU, 430 GB, 23 TB
• Testing System: 5 srv, 30 CPU, 100 GB, 3 TB
Second set of figures on the slide (label not preserved in the text):
• Public System: 44 srv, 274 CPU, 905 GB, 20 TB
• Mining System: 22 srv, 414 CPU, 2.2 TB, 388 TB
• Data provision System: 23 srv, 154 CPU, 430 GB, 24 TB
• Testing System: 14 srv, 86 CPU, 302 GB, 9 TB
