
4th Content Providers Community Call

Presentation of the 4th Content Providers Community Call, covering the following topics:
1) OpenAIRE Content provider dashboard updates;
2) OpenAIRE aggregation and enrichment processes: specifications and good practices;
3) Community questions & comments.



  1. @openaire_eu | 4th Community Call, OpenAIRE content providers managers: aggregation and enrichment processes. Alessia Bardi (CNR-ISTI), Andreas Czerniak (UNIBI), Pedro Príncipe (UMINHO). 04/03/2020
  2. AGENDA
     1) OpenAIRE Provide updates
     2) OpenAIRE aggregation and enrichment processes
        - The OpenAIRE Aggregator
        - How to monitor the aggregation of your data source
        - OpenAIRE enrichment processes
     3) Questions & comments (please share your use cases and issues)
     Notes & Agenda ⇨ https://bit.ly/2rTgJwy
     www.openaire.eu/provide-community-calls
  3. OpenAIRE Provide: recent news
     • Dashboard UI/UX redesign: https://beta.provide.openaire.eu (coming soon, March)
     • Your participation is needed (take part in the user board)
     • Collection monitor feature (more complete aggregation history)
     • Subscribe to the newsletter: www.openaire.eu/past-cp-newsletters/listing
     • Provide Public Roadmap: https://trello.com/b/JHbHKLZ4/openaire-provide-roadmap
  4. @openaire_eu | OpenAIRE aggregation and enrichment processes: specifications and good practices. CNR & Bielefeld University. Community Call | 04 MAR 2020
  5. The OpenAIRE Aggregator: how the OpenAIRE Research Graph is materialized
  6. An open metadata research graph of interlinked scientific products, with access rights information, linked to funding information and research communities. A graph is a model for the representation of information: OpenAIRE uses it to represent objects in the scholarly communication domain and the relationships that exist among them. Each object is a node of the graph, and each edge is annotated with a label that specifies the semantics of the relationship between the two objects it connects.
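
The labelled graph described on slide 6 can be illustrated with a small sketch. This is not the OpenAIRE data model: the class, the identifiers, and the relationship labels (`isProducedBy`, `isSupplementTo`) are hypothetical and only show nodes connected by edges that carry a semantic label.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ResearchGraph:
    """Toy labelled graph: nodes are scholarly objects, edges carry a semantic label."""
    nodes: dict = field(default_factory=dict)   # node id -> node type
    edges: list = field(default_factory=list)   # (source id, label, target id)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, source, label, target):
        # Both endpoints must already be nodes of the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        self.edges.append((source, label, target))

    def neighbours(self, node_id):
        """Group the targets reachable from node_id by relationship label."""
        grouped = defaultdict(list)
        for source, label, target in self.edges:
            if source == node_id:
                grouped[label].append(target)
        return dict(grouped)


graph = ResearchGraph()
graph.add_node("pub:1", "publication")
graph.add_node("data:1", "dataset")
graph.add_node("proj:1", "project")
graph.add_edge("pub:1", "isProducedBy", "proj:1")     # hypothetical label
graph.add_edge("data:1", "isSupplementTo", "pub:1")   # hypothetical label
print(graph.neighbours("pub:1"))
```

Keeping the label on the edge, rather than on the nodes, is what lets the same two objects be connected by several relationships with different meanings.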
  7. The OpenAIRE Research Graph in numbers
     • Production: Data sources 17K; Publications 37M (deduplicated, 8M with full-texts); Datasets 975K; Software 52K; Projects 3M; Funders 22
     • Beta: Data sources 10K; Publications 110M (deduplicated, 10M with full-texts); Datasets 2M; Software 43K; Projects 3M; Funders 29
  8. Collecting metadata, links, and full-texts from more than 10K sources worldwide to materialize a graph where entities of the research life cycle are linked to each other. [Figure: source logos, labelled "Academic Graph", "European and international funders", "… and more"]
  9. Aggregated data sources. [Diagram: aggregators, repositories, registries, OA journals, publishers, and CRIS supply metadata, relationships, and full-texts to the RAW graph]
  10. Enrichment: the supply chain from the RAW graph to the OpenAIRE Research Graph
      • Different records representing the same entity (results or organizations) are merged into one.
      • Full-texts and abstracts are mined to identify links (e.g. to projects, to datasets, to software), affiliations, subject classification, and citations.
      • Records are enriched with information about relevant research communities and infrastructures, based on the provenance of the records and their keywords.
      • Propagation: records are enriched based on information available in the records that are linked to them by a relationship with "strong" semantics (e.g. supplements/isSupplementTo).
  11. Integration Scenarios
      ● Directly harvested (from repositories, journals)
      ● Indirectly harvested (via aggregators, publishers)
        ○ see "collected from a compatible aggregator" in the Explore portal
        ○ records are marked as collected from the aggregator and hosted by the specific repository/journal (if resolvable via OpenDOAR/re3data/ISSN)
        ○ when the hosting source cannot be resolved, the record appears as hosted by the "Unknown repository"
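
The "collected from" versus "hosted by" distinction on slide 11 can be sketched as a simple lookup with a fallback. Everything named here is an assumption made for illustration: the `KNOWN_SOURCES` table, its identifiers, and the `hosting_source_id` field are hypothetical, and the real resolution against OpenDOAR, re3data, or ISSN is not shown.

```python
UNKNOWN_REPOSITORY = "Unknown repository"

# Hypothetical lookup table: identifier of the hosting source -> display name.
# In practice the identifiers would be resolved via OpenDOAR, re3data, or an ISSN.
KNOWN_SOURCES = {
    "opendoar:0001": "Example Institutional Repository",
    "issn:0000-0000": "Example Journal",
}


def provenance(record, aggregator_name, known_sources=KNOWN_SOURCES):
    """Return 'collected from' and 'hosted by' labels for an indirectly harvested record."""
    hosting_id = record.get("hosting_source_id")
    # Fall back to the placeholder when the hosting source cannot be resolved.
    hosted_by = known_sources.get(hosting_id, UNKNOWN_REPOSITORY)
    return {"collected_from": aggregator_name, "hosted_by": hosted_by}


resolved = provenance({"hosting_source_id": "opendoar:0001"}, "Example Aggregator")
unresolved = provenance({"hosting_source_id": "opendoar:9999"}, "Example Aggregator")
print(resolved["hosted_by"], "|", unresolved["hosted_by"])
```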
  12. OpenAIRE's Guidelines for Open Science Content Providers: https://guidelines.openaire.eu
  13. How to join
  14. OpenAIRE aggregator: aggregation workflows, one per data source. [Diagram: each data source (aggregators, repositories, registries, OA journals, publishers, CRIS) has its own workflow of collection followed by transformation, plus full-text collection; the result is a snapshot of transformed records (RAW)]
  15. Aggregation of metadata. OpenAIRE aggregation team: UNIBI. Activities:
      • Activate the aggregation workflow
      • Check supplied data
      • Configure the transformation step to
        • assign the proper typologies to records (literature, dataset, software, or other)
        • address metadata quality imperfections
      • Contact repository managers to
        • suggest improvements
        • ask for permission to download Open Access full-texts
  16. Aggregated record and data source types
      Record types:
      • Publications: Article, Preprint, Report, Patent, …
      • Datasets: Dataset, Collection, Clinical Trial, …
      • Software: Research Software, …
      • Other Research Products: Service, Workflow, Interactive Resource, …
      Data source types: institutional/publication repositories, journals/publishers, data repositories, software repositories, other products repositories, CRIS
  17. Open Access full-texts
      • OpenAIRE collects them but does not redistribute them
      • The OpenAIRE Explore portal sends the user to your URL
      • OpenAIRE is a way to get new users and more accesses to your platform
      • OpenAIRE runs mining algorithms to enrich the available metadata
      • You can get this information back via the Broker
      • Let other repositories know that you have an Open Access version of the paper
      • OpenAIRE is directly connected to the EC participant portal:
        • If an Open Access version of the paper is known to exist, the life of the project coordinator will be easier...
  18. How to monitor your aggregation workflow?
  19. Collection monitor: each box is an aggregation stage, either COLLECT or TRANSFORM
  20. Collection monitor: collection mode
      - REFRESH: OpenAIRE collected everything
      - INCREMENTAL: OpenAIRE collected only records that have been updated since the previous collection
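
The difference between the two collection modes on slide 20 can be sketched as a filter over a data source's records. This is only an illustration under stated assumptions: the function name, the record layout, and the `updated` field are hypothetical, and the actual harvesting protocol is not modelled.

```python
from datetime import date


def records_to_collect(records, mode, previous_collection=None):
    """Select the records a collection run would fetch in the given mode."""
    if mode == "REFRESH":
        # Everything is collected again, regardless of modification dates.
        return list(records)
    if mode == "INCREMENTAL":
        if previous_collection is None:
            raise ValueError("INCREMENTAL mode needs the date of the previous collection")
        # Only records updated since the previous collection are fetched.
        return [r for r in records if r["updated"] > previous_collection]
    raise ValueError(f"unknown collection mode: {mode}")


source_records = [
    {"id": "rec-1", "updated": date(2020, 1, 10)},
    {"id": "rec-2", "updated": date(2020, 2, 20)},
    {"id": "rec-3", "updated": date(2020, 3, 1)},
]
full = records_to_collect(source_records, "REFRESH")
delta = records_to_collect(source_records, "INCREMENTAL", date(2020, 2, 1))
print(len(full), [r["id"] for r in delta])
```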
  21. Collection monitor: the number of records is different because some records had to be discarded by the aggregation team. Suggestion: validate your repository and check the report.
  22. Collection monitor (screenshot annotations)
      • Look for the OpenAIRE logo
      • The portal shows the metadata collected on this date...
      • ...but only this number of records gets into the pipeline
      • This version of the metadata is not yet visible in the portal
  23. OpenAIRE enrichment processes
  24. Enrichment of the graph
      • Infer information with full-text and data mining
      • Fostering PIDs
        • Propagation of ORCID iDs
      • Improve discovery
        • Propagation of abstracts between articles, datasets, and software
      • Improve monitoring
        • Propagation of organizations from institutional data sources to related products, and from products to linked products with the same authors
      OpenAIRE-Advance Kick off | Athens | 17-19 Jan 2018
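
Propagation of ORCID iDs, as listed on slide 24, can be sketched as copying an identifier across a link between two products that share an author. This is a simplification under stated assumptions: the set of "strong" relationship labels, the record layout, and the exact-name author matching are all hypothetical, and real author matching would need to be far more careful.

```python
# Assumed set of relationship labels treated as having "strong" semantics.
STRONG_RELATIONS = {"isSupplementTo", "isSupplementedBy"}


def propagate_orcids(products, links):
    """Copy ORCID iDs across strong links for authors with the same name.

    products: id -> {"authors": [{"name": str, "orcid": str or None}, ...]}
    links: iterable of (source id, relation label, target id)
    Returns the number of ORCID iDs that were filled in.
    """
    filled = 0
    for source, relation, target in links:
        if relation not in STRONG_RELATIONS:
            continue  # weak links do not justify propagation
        known = {a["name"]: a["orcid"] for a in products[source]["authors"] if a["orcid"]}
        for author in products[target]["authors"]:
            if author["orcid"] is None and author["name"] in known:
                author["orcid"] = known[author["name"]]
                filled += 1
    return filled


products = {
    "pub:1": {"authors": [{"name": "J. Carberry", "orcid": "0000-0002-1825-0097"}]},
    "data:1": {"authors": [{"name": "J. Carberry", "orcid": None},
                           {"name": "A. Nother", "orcid": None}]},
    "pub:2": {"authors": [{"name": "J. Carberry", "orcid": None}]},
}
links = [("pub:1", "isSupplementedBy", "data:1"), ("pub:1", "cites", "pub:2")]
filled = propagate_orcids(products, links)
print(filled, products["data:1"]["authors"][0]["orcid"])
```

Restricting propagation to strong relationships is the design point: a citation link says little about shared authorship, so the sketch leaves `pub:2` untouched.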
  25. Enrichment of the graph
      • Under implementation: introduce relevant links between scientific products, for example:
        • Rapid view of science: identify links between articles and related presentations
        • Hidden research software: identify links to URLs targeting rar and zip archives
  26. Enrichment: the supply chain (same diagram as slide 10), focusing on the merging of equivalent records
      • Metadata records corresponding to equivalent objects are merged.
      • Pre-print, post-print, and published versions are considered equivalent for stats & monitoring purposes.
      • Harvested publications: 160M; unique publications: 110M
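
The merging step on slide 26 can be sketched as grouping records by an equivalence key and keeping one merged record per group. The key used here (a normalized title) is a deliberately naive assumption chosen for illustration, and the field names are hypothetical; it is not the matching logic OpenAIRE applies.

```python
import re
from collections import defaultdict


def dedup_key(record):
    """Naive equivalence key: lower-cased title without punctuation or extra spaces."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    return " ".join(title.split())


def merge_equivalent(records):
    """Merge records sharing a key; pre-print and published versions end up together."""
    groups = defaultdict(list)
    for record in records:
        groups[dedup_key(record)].append(record)
    merged = []
    for group in groups.values():
        merged.append({
            "title": group[0]["title"],
            "versions": sorted({r["version"] for r in group}),
            "sources": sorted({r["source"] for r in group}),
        })
    return merged


harvested = [
    {"title": "Open Science: A Primer", "version": "published", "source": "journal"},
    {"title": "Open science - a primer", "version": "pre-print", "source": "repository"},
    {"title": "Another Paper", "version": "published", "source": "journal"},
]
unique = merge_equivalent(harvested)
print(len(harvested), "harvested ->", len(unique), "unique")
```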
  27. Enriching metadata
      • Inference: mining of 10M OA full-texts. Mining output:
        • Text-mined links (130M): links to projects, software, datasets, research infra/communities, similarities
        • Text-mined values (178M): citations, abstracts, subject classification terms
        • Coming soon: links to patents (EPO/PATSTAT)
      • Info deduction: assign research products to communities/infrastructures based on their provenance and subjects
        • Example (diagram): a community c with subject: s and a product r with subjects: [s, s1, … , sn] yield r part_of c
      • Info propagation: abstracts, links to projects, countries, communities/infrastructures, and ORCID iDs are propagated from a research product to other products
        • Under consideration: propagation of organization from one product to another linked to it with "supplementedBy/supplementTo"
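
The subject-based assignment in the slide 27 example can be sketched as a set intersection between a product's subjects and the subjects configured for each community. The community names, subject terms, and case-insensitive matching are assumptions made for illustration; assignment by provenance is not covered.

```python
def assign_communities(product, communities):
    """Return the communities whose subjects overlap with the product's subjects."""
    product_subjects = {s.lower() for s in product["subjects"]}
    return sorted(
        name
        for name, subjects in communities.items()
        if product_subjects & {s.lower() for s in subjects}
    )


# Hypothetical community configuration: name -> subjects of interest.
communities = {
    "marine-science": ["Oceanography", "Marine biology"],
    "digital-humanities": ["Text encoding", "Digital archives"],
}
product = {"id": "r", "subjects": ["oceanography", "Climate"]}
assigned = assign_communities(product, communities)
print(assigned)
```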
  28. AGENDA (same as slide 2)
  29. Upcoming calls: April 1st, main topic "DSpace-CRIS for OpenAIRE: implementation of the CRIS guidelines and beyond". www.openaire.eu/provide-community-calls
  30. Subscribe to our newsletter! www.openaire.eu/past-cp-newsletters
  31. Thank you! www.openaire.eu/provide-community-calls
      • Alessia Bardi (CNR-ISTI), alessia.bardi@isti.cnr.it
      • Andreas Czerniak (UNIBI), openaire-helpdesk@uni-bielefeld.de
      • Pedro Príncipe (UMINHO), pedroprincipe@sdum.uminho.pt
