Once your repository platform has been made OpenAIRE compliant, researchers from your institution will
be able to deposit their publications by providing the relevant files and bibliographic metadata, including
license information and the list of EC projects that funded those publications. Integrating your repository
with the OpenAIRE infrastructure is an important step towards helping your researchers comply with
the EC Open Access mandate. However, while this is a clear benefit for future deposits, what happens to
all the publications deposited in the past, whose metadata did not include EC project information?
You can approach the problem in two ways. The so-called manual approach consists of asking your
researchers to revise and complete all past deposits through the newly provided user interfaces. Since
this may be a tedious job, the OpenAIRE infrastructure also offers an automatic inference approach, in
which dedicated services infer, from the PDF files of the publications, the list of EC projects
that are likely to have funded them.
To this end, repository managers must make the PDF files of the publications available to the OpenAIRE
infrastructure. This can be done through standard protocols, such as FTP, to be agreed with the OpenAIRE
technical team. Most importantly, the name of each PDF file must include the OAI-PMH identifier provided
with the corresponding metadata record. This implicit link makes it possible to complete the metadata
with the EC project information extracted by OpenAIRE.
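
The naming requirement above can be sketched as follows. This is only an illustration: the exact naming convention has to be agreed with the OpenAIRE technical team, and the function names, the percent-encoding scheme, and the sample identifier are all assumptions made here. Percent-encoding is used because OAI-PMH identifiers usually contain characters such as ':' or '/' that are awkward in file names, and because the encoding can be reversed to recover the identifier.

```python
from urllib.parse import quote, unquote

PDF_SUFFIX = ".pdf"


def pdf_name_for(oai_identifier: str) -> str:
    """Build a PDF file name that embeds the OAI-PMH identifier.

    Hypothetical convention: percent-encode every reserved character
    (including ':' and '/') and append '.pdf'.
    """
    return quote(oai_identifier, safe="") + PDF_SUFFIX


def oai_identifier_from(pdf_name: str) -> str:
    """Recover the OAI-PMH identifier from a name built by pdf_name_for."""
    if not pdf_name.endswith(PDF_SUFFIX):
        raise ValueError(f"not a PDF file name: {pdf_name!r}")
    return unquote(pdf_name[: -len(PDF_SUFFIX)])


# Made-up identifier, for illustration only.
example_id = "oai:repo.example.org:1234/5"
example_name = pdf_name_for(example_id)
print(example_name)
print(oai_identifier_from(example_name))
```

Because the encoding is reversible, the same function pair can later be used to map the file names returned by the inference process back to local metadata records.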
The inference process returns to repository managers the list of file names for which at least one EC project
could be inferred, each followed by the corresponding list of grant agreement numbers. The list can be provided in
several formats, including txt or Excel files, to be agreed with the OpenAIRE technical team. Repository
managers must write scripts that process this list and complete the local database with the missing
associations between publications and EC projects. At this stage, repository managers may ask
researchers to confirm the results of the inference process, which amounts to a simplified and faster
manual approach.
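
A processing script of this kind might look like the minimal sketch below. The actual file format is to be agreed with the OpenAIRE technical team, so everything concrete here is an assumption: a txt file with one line per publication, consisting of the file name, a tab, and a comma-separated list of grant agreement numbers. An in-memory SQLite table stands in for the local repository database, and the table name, column names, and sample values are invented for illustration. The `confirmed` column is one possible way to record the researchers' later confirmation.

```python
import sqlite3


def parse_inference_list(lines):
    """Yield (file_name, [grant_numbers]) for each usable line.

    Assumed line format: file name, a tab, comma-separated grant numbers.
    Blank lines and lines without any grant number are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        file_name, _, grants = line.partition("\t")
        numbers = [g.strip() for g in grants.split(",") if g.strip()]
        if numbers:
            yield file_name, numbers


def store_links(conn, links):
    """Insert publication/project links; existing pairs are left untouched."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publication_project ("
        "file_name TEXT, grant_number TEXT, confirmed INTEGER DEFAULT 0, "
        "PRIMARY KEY (file_name, grant_number))"
    )
    with conn:  # commit on success, roll back on error
        for file_name, numbers in links:
            conn.executemany(
                "INSERT OR IGNORE INTO publication_project "
                "(file_name, grant_number) VALUES (?, ?)",
                [(file_name, number) for number in numbers],
            )


# Made-up sample data in the assumed format.
sample = [
    "oai_repo_1234.pdf\t123456,234567",
    "",
    "oai_repo_5678.pdf\t345678",
]
conn = sqlite3.connect(":memory:")
store_links(conn, parse_inference_list(sample))
for row in conn.execute(
    "SELECT file_name, grant_number, confirmed FROM publication_project "
    "ORDER BY file_name, grant_number"
):
    print(row)
```

Using `INSERT OR IGNORE` with a composite primary key keeps the script safe to re-run when an updated list arrives, since pairs already stored are not duplicated.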
The automatic inference service requires considerable CPU time to parse large sets of PDF
files and identify references to EC project grant agreement numbers. For this purpose, OpenAIRE exploits the
GRID computing power provided by the D4Science infrastructure, which is in turn powered by the gCube software
system. For further information, please visit the highlighted URLs.


OpenAIRE Text notes of the Tutorial on Automatic Inference Of Links
