Freedom for bibliographic references: OpenCitations arise

Scholarly citations from one publication to another, expressed as reference lists within academic articles, are core elements of scholarly communication. Unfortunately, they usually can be accessed en masse only by paying significant subscription fees to commercial organizations, while the few services that do make them available for free impose strict limitations on their reuse. In this paper we provide an overview of the OpenCitations Project (http://opencitations.net), undertaken to remedy this situation, and of its main product, the OpenCitations Corpus: an open repository of accurate bibliographic citation data harvested from the scholarly literature and made available in RDF under a Creative Commons public domain dedication.

Paper at: https://w3id.org/oc/paper/occ-lisc2016.html

  1. Freedom for bibliographic references: OpenCitations arise
     Silvio Peroni, David Shotton, Fabio Vitali
     4th International Workshop on Linked Data for Information Extraction (LD4IE 2016)
     Kobe, Japan, October 18, 2016
     https://w3id.org/oc/paper/occ-lisc2016.html
  2. The Venice analogy
     • Island = scholarly publication
     • Bridge = citation
     • Current situation:
       – local travel to the next island is permitted
       – unrestricted travel over the entire network of bridges requires an expensive season ticket
       – the general populace is excluded
     https://w3id.org/oc/paper/the-venice-analogy.html
  3. Opening the bridges
     • What
       – Citation data are one of the main tools researchers use to gain knowledge about particular topics, and they also serve institutional goals, for example in research assessment
     • Problem
       – The most authoritative citation databases, Scopus and Web of Science, can only be accessed by paying significant annual fees
       – The University of Bologna pays about 6,000,000 euros per year for access to digital bibliographic resources
     • Solution
       – Create a citation database that freely and legally makes citation data available in an open repository, to assist scholars with their academic studies and to serve knowledge to the wider public
  4. OpenCitations
     • The OpenCitations Project aims to create an open repository of scholarly citation data, the OpenCitations Corpus (OCC), which provides accurate citation information (bibliographic references) harvested from the scholarly literature, in RDF, under a Creative Commons public domain dedication
       – All scripts are released under the open-source ISC Licence and are available on GitHub at http://github.com/essepuntato/opencitations
     • Currently processing papers from the PubMed Central Open Access subset (which contains papers in the medical, biological and life science domains) by means of the Europe PubMed Central API
     • As of October 17, 2016, the OCC contains
       – 1,311,196 citing/cited bibliographic resources
       – 1,584,945 citation links
     http://opencitations.net
  5. OpenCitations Ontology
     • The OpenCitations Ontology (OCO) groups complementary ontological entities from several existing ontologies in order to provide descriptive metadata for the OCC
     • SPAR Ontologies reused:
       – FRBR-aligned Bibliographic Ontology (FaBiO, http://purl.org/spar/fabio)
       – Publishing Roles Ontology (PRO, http://purl.org/spar/pro)
       – Bibliographic Reference Ontology (BiRO, http://purl.org/spar/biro)
       – Citation Counting and Context Characterization Ontology (C4O, http://purl.org/spar/c4o)
       – DataCite Ontology (http://purl.org/spar/datacite)
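The SPARQL queries later in the deck abbreviate terms from these ontologies with prefixes such as fabio: and biro:. As a minimal Python sketch, assuming the conventional namespace IRIs (the URLs listed above with a trailing slash), the following maps each prefix to its namespace, expands a prefixed name into a full IRI and renders matching SPARQL PREFIX declarations. The names SPAR_PREFIXES, expand_curie and sparql_prefix_block are hypothetical and are not part of OCO.

```python
# Minimal sketch: prefix-to-namespace table for the SPAR ontologies listed on
# slide 5. The trailing "/" on each IRI and all names here are assumptions.
SPAR_PREFIXES: dict[str, str] = {
    "fabio": "http://purl.org/spar/fabio/",
    "pro": "http://purl.org/spar/pro/",
    "biro": "http://purl.org/spar/biro/",
    "c4o": "http://purl.org/spar/c4o/",
    "datacite": "http://purl.org/spar/datacite/",
}


def expand_curie(curie: str, prefixes: dict[str, str] = SPAR_PREFIXES) -> str:
    """Expand a prefixed name such as 'fabio:Expression' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


def sparql_prefix_block(prefixes: dict[str, str] = SPAR_PREFIXES) -> str:
    """Render one 'PREFIX p: <iri>' declaration per namespace, sorted by prefix."""
    return "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in sorted(prefixes.items()))


if __name__ == "__main__":
    print(expand_curie("fabio:Expression"))
    print(sparql_prefix_block())
```

Keeping the table as plain data makes it easy to add further vocabularies used in the queries (for example CiTO) without touching the helpers.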
  6. OpenCitations Corpus
     • Six distinct kinds of bibliographic entities
       – bibliographic resources (citing/cited articles, journals, books, proceedings, etc.)
       – resource embodiments (format information about bibliographic resources)
       – bibliographic entries (literal textual entries occurring in the reference lists)
       – responsible agents (agents having certain roles with respect to the bibliographic resources)
       – agent roles (author, editor, publisher)
       – identifiers (DOI, ORCID, PubMed ID, URL, etc.)
     • Provenance for each entity is handled by means of PROV-O
       – as described in the Drift-a-LOD 2016 paper (a workshop held in Bologna next month during EKAW 2016), available at https://w3id.org/oc/paper/occ-driftalod2016.html
     • Access the OCC via
       – HTTP (content negotiation; formats: JSON-LD, RDF/XML, TriG, HTML), e.g. https://w3id.org/oc/corpus/br/1
       – SPARQL endpoint, available at https://w3id.org/oc/sparql
       – dumps, downloadable at https://opencitations.net/download
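A minimal Python sketch of the first two access routes, using only the standard library. It assumes standard HTTP content negotiation through the Accept header and the SPARQL 1.1 Protocol GET form with a query parameter; the media types are the usual registered ones for the formats named above, and the helper names entity_request and sparql_request are hypothetical. The sketch only builds the requests, so it runs offline; whether the 2016 URLs still respond is not assumed.

```python
import urllib.parse
import urllib.request

# Assumed media types for the formats listed on the slide.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "rdf/xml": "application/rdf+xml",
    "trig": "application/trig",
    "html": "text/html",
}


def entity_request(iri: str, fmt: str) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for `iri` in format `fmt`."""
    return urllib.request.Request(iri, headers={"Accept": MEDIA_TYPES[fmt]})


def sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    # quote (not quote_plus) percent-encodes spaces as %20.
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    return urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )


if __name__ == "__main__":
    req = entity_request("https://w3id.org/oc/corpus/br/1", "json-ld")
    print(req.full_url, req.get_header("Accept"))
    # To actually fetch: urllib.request.urlopen(req).read()
```

Because the IRI stays the same and only the Accept header changes, one identifier can serve both human readers (HTML) and RDF clients.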
  7. Ingestion workflow
     1. BEE (EuropeanPubMedCentralProcessor): parse the XML source of PubMed Central Open Access articles and produce, for SPACIN, JSON with DOIs and bibliographic entries, e.g.:
        {
          "doi": "10.1590/1414-431x20154655",
          "localid": "MED-26577845",
          "curator": "BEE EuropeanPubMedCentralProcessor",
          "source": "http://www.ebi.ac.uk/europepmc/webservices/rest/PMC4678653/fullTextXML",
          "source_provider": "Europe PubMed Central",
          "pmid": "26577845",
          "pmcid": "PMC4678653",
          "references": [
            {
              "bibentry": "Wenger, NK. Coronary heart disease: an older woman's major health risk, BMJ, 1997, 315, 1085, 1090, DOI: 10.1136/bmj.315.7115.1085, PMID: 9366743",
              "pmid": "9366743",
              "doi": "10.1136/bmj.315.7115.1085",
              "pmcid": "PMC2127693",
              "process_entry": "True"
            }
            …
          ]
        }
     2. SPACIN, ResourceFinder: for each citing/cited resource, if an ID (DOI, PMID, PMCID) is specified, check whether the resource already exists. If it does, go to step 5.
     3. SPACIN, CrossRefProcessor and ORCIDProcessor: if the resource doesn't exist, extract possible IDs from the entry and query Crossref and ORCID.
     4. SPACIN, GraphEntity: new metadata resources are created. If Crossref/ORCID returned something, all the related metadata is used; otherwise only basic metadata (IDs and entries) is added.
     5. SPACIN, GraphSet, ProvSet and DatasetHandler: store the resulting data.
     6. Storer: load all the statements onto the triplestore (the OCC) and store them in the file system for easy recovery.
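The find-or-create logic of the ingestion workflow (check identifiers, otherwise query external services, then create the resource) can be illustrated with a minimal, in-memory Python sketch. Everything here is hypothetical: the index dict stands in for ResourceFinder's lookup, lookup_external for the Crossref/ORCID queries, and the stored dicts for the RDF entities SPACIN actually creates. The input shape follows the BEE JSON example, trimmed.

```python
import json

ID_KEYS = ("doi", "pmid", "pmcid")


def find_or_create(record, index, store, lookup_external):
    """Return the local id of the resource described by `record` (hypothetical)."""
    ids = {k: record[k] for k in ID_KEYS if record.get(k)}
    # Step 2: reuse an existing resource if any known identifier matches.
    for key, value in ids.items():
        if (key, value) in index:
            return index[(key, value)]
    # Step 3: no match, so ask external services for richer metadata.
    extra = lookup_external(ids) or {}
    # Step 4: create a new resource; without external data only basic metadata remains.
    local_id = f"br/{len(store) + 1}"
    store[local_id] = {**ids, "entry": record.get("bibentry"), **extra}
    for pair in ids.items():
        index[pair] = local_id
    # Step 5 (storing) is represented here by the in-memory `store` dict.
    return local_id


def ingest(bee_json, index, store, lookup_external):
    """Turn one BEE-style JSON document into (citing, cited) id pairs."""
    doc = json.loads(bee_json)
    citing = find_or_create(doc, index, store, lookup_external)
    links = []
    for ref in doc.get("references", []):
        if ref.get("process_entry") != "True":  # flag is a string in the example
            continue
        links.append((citing, find_or_create(ref, index, store, lookup_external)))
    return links


if __name__ == "__main__":
    sample = json.dumps({
        "doi": "10.1590/1414-431x20154655", "pmid": "26577845", "pmcid": "PMC4678653",
        "references": [{"doi": "10.1136/bmj.315.7115.1085", "pmid": "9366743",
                        "pmcid": "PMC2127693", "process_entry": "True"}],
    })
    index, store = {}, {}
    print(ingest(sample, index, store, lambda ids: None))
```

Running ingest twice on the same document reuses the existing ids instead of duplicating resources, which is the point of the lookup in step 2.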
  8. Test
     • Hardware: MacBook Pro with a 2 GHz Intel Core i7 processor, 8 GB DDR3 1600 MHz RAM, OS X 10.11.3
     • BEE: in a 30-minute run (querying the Europe PubMed Central API) it produced 185 JSON files (~6 new JSON files per minute)
     • SPACIN
       – 45 minutes to process all BEE JSON files related to the 67 papers in the ISWC 2015 Proceedings (sources kindly made available by Springer Nature)
       – 210 minutes to process the BEE JSON files related to 67 papers from Europe PubMed Central (OA subset)
     All these data are available on Figshare; their URLs are included in the article.
  9. ISWC2015: most cited papers
        PREFIX dcterms: <http://purl.org/dc/terms/>
        PREFIX fabio: <http://purl.org/spar/fabio/>
        PREFIX cito: <http://purl.org/spar/cito/>
        SELECT ?cited ?title ?tot {
          {
            SELECT ?cited (count(?citing) as ?tot) {
              ?cited a fabio:Expression ;
                ^cito:cites ?citing
            } GROUP BY ?cited
          }
          OPTIONAL { ?cited dcterms:title ?title }
        } ORDER BY DESC(?tot) LIMIT 15
     Annotation on the results: "no title?" (one of the most cited resources has no title)
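To make explicit what the query above computes, here is a minimal Python sketch of the same aggregation over in-memory data rather than a triplestore. The input shapes (a list of (citing, cited) pairs standing for cito:cites links, a title dict for the OPTIONAL clause, a set of fabio:Expression resources) are illustrative assumptions, and most_cited is a hypothetical name.

```python
from collections import Counter


def most_cited(citations, titles, expressions, limit=15):
    """Rank cited resources by how many resources cite them.

    citations   -- iterable of (citing, cited) pairs (cito:cites links)
    titles      -- dict cited -> title; a missing key mimics the OPTIONAL clause
    expressions -- set of resources typed fabio:Expression
    """
    links = set(citations)  # RDF graphs are sets, so a repeated link counts once
    # GROUP BY ?cited with count(?citing), restricted to fabio:Expression.
    totals = Counter(cited for _citing, cited in links if cited in expressions)
    rows = [(cited, titles.get(cited), tot) for cited, tot in totals.items()]
    rows.sort(key=lambda row: row[2], reverse=True)  # ORDER BY DESC(?tot)
    return rows[:limit]  # LIMIT; ties come out in no guaranteed order
```

A row whose title is None corresponds to the "no title?" result: the resource is counted even though no dcterms:title was found for it.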
  10. No Crossref metadata
        PREFIX biro: <http://purl.org/spar/biro/>
        PREFIX c4o: <http://purl.org/spar/c4o/>
        PREFIX dcterms: <http://purl.org/dc/terms/>
        PREFIX frbr: <http://purl.org/vocab/frbr/core#>
        SELECT ?citing ?entry {
          <http://localhost:8000/corpus/br/1302> ^biro:references ?ref .
          ?ref c4o:hasContent ?entry ;
            ^frbr:part ?citing
        }
      • How the "no title" paper has been referenced in the 4 papers citing it
      • SPACIN used the URL in the textual entries (i.e. "http://www.w3.org/DesignIssues/LinkedData.html") to associate them with the same bibliographic resource: <http://localhost:8000/corpus/br/1302>
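The idea on this slide, using a URL found in the textual entry as the key that ties references from different citing papers to one bibliographic resource, can be sketched in Python as follows. The regex and normalisation rules (lower-case scheme and host, drop trailing punctuation and fragments) are simplifying assumptions rather than SPACIN's actual code; url_key and group_by_url are hypothetical names, and the sample entries below are invented for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

URL_RE = re.compile(r"https?://[^\s\"<>]+", re.IGNORECASE)


def url_key(entry):
    """Return a normalised URL found in a textual reference entry, or None."""
    match = URL_RE.search(entry)
    if match is None:
        return None
    # Simplification: also strips a ")" that legitimately ends a URL.
    url = match.group(0).rstrip(".,;)")
    parts = urlsplit(url)
    # Scheme and host are case-insensitive; keep the path as written, drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, ""))


def group_by_url(entries):
    """Map each normalised URL to the (citing, entry) pairs that mention it."""
    groups = defaultdict(list)
    for citing, entry in entries:
        key = url_key(entry)
        if key is not None:
            groups[key].append((citing, entry))
    return dict(groups)


if __name__ == "__main__":
    sample = [
        ("br/1", "Berners-Lee T. Linked Data. http://www.w3.org/DesignIssues/LinkedData.html"),
        ("br/2", "Linked Data - Design Issues (http://www.w3.org/DesignIssues/LinkedData.html)."),
        ("br/3", "Some book without a URL, 2010"),
    ]
    for url, refs in group_by_url(sample).items():
        print(url, [citing for citing, _ in refs])
```

Each group would then map to a single bibliographic resource, as br/1302 does for the four entries on the slide.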
  11. Conclusions
      • We have introduced the OpenCitations Project, which has created an open repository of accurate bibliographic references harvested from the scholarly literature, i.e. the OpenCitations Corpus (OCC)
      • The number of citation links grows day by day (about 25,000 new citation links per day) as the continuous workflow dynamically adds new data from Europe PubMed Central (and other authoritative sources, i.e. Crossref and ORCID)
      • First adopter: Wikidata (via WikiCite)
        – The Wikidata community has created a property for associating the OCC bibliographic resource identifier with the metadata about scholarly papers in Wikidata
        – Several links from Wikidata to the OCC have already been added
      • Future plans: developing tools for linking the resources within the OCC with those included in other datasets, e.g. Wikidata, Scholarly Data, Springer LOD
      • Don't hesitate to poke me during the poster and demo session on Wednesday (panel P30) for additional details about OpenCitations, and don't forget to vote for it, of course :-)
  12. Thanks for your attention
      Silvio Peroni, David Shotton, Fabio Vitali
      4th International Workshop on Linked Data for Information Extraction (LD4IE 2016)
      Kobe, Japan, October 18, 2016
