Freedom for bibliographic references:
OpenCitations arise
Silvio Peroni, David Shotton, Fabio Vitali
4th International Workshop on Linked Data for Information Extraction (LD4IE 2016)

Kobe, Japan, October 18, 2016
https://w3id.org/oc/paper/occ-lisc2016.html
The Venice analogy
• Island = scholarly publication
• Bridge = citation
• Current situation:
– local travel to the next island is permitted
– unrestricted travel over the entire network of bridges requires an expensive season ticket
– the general populace is excluded
https://w3id.org/oc/paper/the-venice-analogy.html
Opening the bridges
• What – Citation data are one of the main tools used by
researchers to gain knowledge about particular topics, and
they also serve institutional goals, for example in research
assessment
• Problem – The most authoritative databases of citation data,
Scopus and Web of Science, can only be accessed by paying
significant annual access fees
– The University of Bologna pays about 6,000,000 euros per year for access to digital bibliographic resources
• Solution – Create a citation database that freely and legally makes citation data available in an open repository, to assist scholars with their academic studies and to serve knowledge to the wider public
OpenCitations
• The OpenCitations Project aims to create an open repository of scholarly citation data, the OpenCitations Corpus (OCC), released under a Creative Commons public domain dedication, which provides accurate citation information (bibliographic references) in RDF, harvested from the scholarly literature
– All scripts are released under the open source ISC Licence and are available on GitHub at http://github.com/essepuntato/opencitations
• The project is currently processing papers in the PubMed Central Open Access subset (which contains papers in the medical, biological and life science domains) by means of the Europe PubMed Central API
• As of October 17, 2016, the OCC contains
– 1,311,196 citing/cited bibliographic resources
– 1,584,945 citation links
http://opencitations.net
OpenCitations Ontology
• The OpenCitations Ontology (OCO) groups existing complementary ontological entities from several other ontologies to provide descriptive metadata for the OCC
• SPAR Ontologies reused:
– FRBR-aligned Bibliographic Ontology (FaBiO, http://purl.org/spar/fabio)
– Publishing Roles Ontology (PRO, http://purl.org/spar/pro)
– Bibliographic Reference Ontology (BiRO, http://purl.org/spar/biro)
– Citation Counting and Context Characterization Ontology (C4O, http://purl.org/spar/c4o)
– DataCite Ontology (http://purl.org/spar/datacite)
OpenCitations Corpus
• Six distinct kinds of bibliographic entities
– bibliographic resources (citing/cited articles, journals, books, proceedings, etc.)
– resource embodiments (format information about bibliographic resources)
– bibliographic entries (literal textual entries occurring in the reference lists)
– responsible agents (agents having certain roles with respect to the bibliographic resources)
– agent roles (author, editor, publisher)
– identifiers (DOI, ORCID, PubMedID, URL, etc.)
• Provenance for each entity is handled by means of PROV-O, as described in the Drift-a-LOD 2016 paper available at https://w3id.org/oc/paper/occ-driftalod2016.html (Drift-a-LOD is a workshop held in Bologna next month during EKAW 2016)
• Access the OCC via
– HTTP with content negotiation (formats: JSON-LD, RDF/XML, TriG, HTML), e.g. https://w3id.org/oc/corpus/br/1
– SPARQL endpoint, available at https://w3id.org/oc/sparql
– dumps, downloadable at https://opencitations.net/download
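A minimal Python sketch of the HTTP access route, using only the standard library. It assumes the server maps the standard media types (application/ld+json, application/rdf+xml, application/trig, text/html) to the four formats listed above; the helper names `build_entity_request` and `fetch_entity` are hypothetical and not part of any OpenCitations tool.

```python
import urllib.request

# Standard media types for the serialisations listed above. That the OCC
# server selects each format from exactly these Accept values is an assumption.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "rdf/xml": "application/rdf+xml",
    "trig": "application/trig",
    "html": "text/html",
}


def build_entity_request(iri: str, fmt: str) -> urllib.request.Request:
    """Build a GET request asking for `iri` in the given serialisation."""
    try:
        accept = MEDIA_TYPES[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return urllib.request.Request(iri, headers={"Accept": accept})


def fetch_entity(iri: str, fmt: str = "json-ld", timeout: float = 30.0) -> str:
    """Download the description of one OCC entity (performs a network call).

    urlopen follows the HTTP redirect issued by the w3id.org resolver.
    """
    request = build_entity_request(iri, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_entity("https://w3id.org/oc/corpus/br/1", "trig")` should return the TriG description of the first bibliographic resource, provided the 2016 identifiers still resolve.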
Ingestion workflow
BEE (EuropeanPubMedCentralProcessor)
1. Parse the XML source of PubMed Central Open Access articles.
2. Produce JSON with the DOI and the bibliographic entries, e.g.:
{
  "doi": "10.1590/1414-431x20154655",
  "localid": "MED-26577845",
  "curator": "BEE EuropeanPubMedCentralProcessor",
  "source": "http://www.ebi.ac.uk/europepmc/webservices/rest/PMC4678653/fullTextXML",
  "source_provider": "Europe PubMed Central",
  "pmid": "26577845",
  "pmcid": "PMC4678653",
  "references": [
    {
      "bibentry": "Wenger, NK. Coronary heart disease: an older woman's major health risk, BMJ, 1997, 315, 1085, 1090, DOI: 10.1136/bmj.315.7115.1085, PMID: 9366743",
      "pmid": "9366743",
      "doi": "10.1136/bmj.315.7115.1085",
      "pmcid": "PMC2127693",
      "process_entry": "True"
    } …
  ]
}
SPACIN
3. ResourceFinder: for each citing/cited resource, if an ID (DOI, PMID, PMCID) is specified, check whether the resource already exists. If it does, go to step 5.
4. CrossRefProcessor, ORCIDProcessor: if the resource does not exist, extract possible IDs from the entry and query Crossref and ORCID.
5. GraphEntity: new metadata resources are created. If Crossref/ORCID returned something, all the related metadata are used; otherwise only basic metadata (IDs and entries) are added.
6. GraphSet, ProvSet, DatasetHandler, Storer: load all the statements into the OCC triplestore and store them in the file system for easy recovery.
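The decision made in steps 3 to 5 can be sketched in Python as follows. This is a hypothetical illustration, not SPACIN's code: an in-memory dictionary keyed by (scheme, value) stands in for ResourceFinder's lookup against the triplestore, ID values are lower-cased on the assumption that matching should be case-insensitive (as DOIs are), and the Crossref/ORCID queries of step 4 are left out, so items that would need them are simply marked "lookup".

```python
import json

# The ID schemes named in step 3 of the workflow.
ID_SCHEMES = ("doi", "pmid", "pmcid")


def index_key(scheme: str, value: str) -> tuple:
    """Key used by the (hypothetical) index: DOIs are case-insensitive,
    so every value is normalised to lower case."""
    return (scheme, value.strip().lower())


def find_existing(item: dict, index: dict):
    """Step 3: return the known resource for any ID of `item`, or None."""
    for scheme in ID_SCHEMES:
        value = item.get(scheme)
        if value:
            hit = index.get(index_key(scheme, value))
            if hit is not None:
                return hit
    return None


def plan_ingestion(bee_json: str, index: dict) -> list:
    """Return ("reuse", resource) or ("lookup", item) for the citing
    article and for each of its references, in document order."""
    record = json.loads(bee_json)
    items = [record] + record.get("references", [])
    plan = []
    for item in items:
        known = find_existing(item, index)
        if known is not None:
            plan.append(("reuse", known))  # already in the corpus: go to step 5
        else:
            plan.append(("lookup", item))  # step 4: query Crossref/ORCID
    return plan
```

With an index that already maps the DOI 10.1136/bmj.315.7115.1085 to a resource, the reference in the JSON above would be reused, while the citing article itself would be marked for lookup.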
Test
• Hardware: MacBook Pro with a 2 GHz Intel Core i7 processor, 8 GB of 1600 MHz DDR3 RAM, OS X 10.11.3
• BEE: ran for 30 minutes (querying the Europe PubMed Central API) and produced 185 JSON files (~6 new JSON files per minute)
• SPACIN
– 45 minutes to process all the BEE JSON files related to the 67 papers in the ISWC 2015 Proceedings (sources kindly made available by Springer Nature)
– 210 minutes to process the BEE JSON files related to 67 papers from Europe PubMed Central (OA subset)
All these data are available on Figshare; their URLs are included in the article.
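As a quick check of the reported figures, the snippet below derives per-minute and per-paper rates from them. It only restates the numbers above; it says nothing about why the two SPACIN runs differ.

```python
# Figures reported in the test above.
BEE_MINUTES, BEE_FILES = 30, 185
SPACIN_RUNS = {
    "ISWC 2015 Proceedings": (45, 67),  # (minutes, citing papers)
    "Europe PMC OA subset": (210, 67),
}


def per_minute(count: int, minutes: int) -> float:
    """Items produced per minute."""
    return count / minutes


def minutes_per_paper(minutes: int, papers: int) -> float:
    """Average processing time per citing paper, references included."""
    return minutes / papers


if __name__ == "__main__":
    print(f"BEE: {per_minute(BEE_FILES, BEE_MINUTES):.1f} JSON files/minute")
    for name, (mins, papers) in SPACIN_RUNS.items():
        print(f"SPACIN, {name}: {minutes_per_paper(mins, papers):.2f} minutes/paper")
```

Running it prints about 6.2 JSON files per minute for BEE, and about 0.67 versus 3.13 minutes per citing paper for the two SPACIN runs.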
ISWC2015: most cited papers
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX fabio: <http://purl.org/spar/fabio/>
PREFIX cito: <http://purl.org/spar/cito/>
# The 15 most cited works; ?title stays unbound when no title is recorded
SELECT ?cited ?title ?tot {
  { # number of works citing each fabio:Expression
    SELECT ?cited (COUNT(?citing) AS ?tot) {
      ?cited a fabio:Expression ; ^cito:cites ?citing
    } GROUP BY ?cited
  }
  OPTIONAL { ?cited dcterms:title ?title }
} ORDER BY DESC(?tot) LIMIT 15
No title? No Crossref metadata was available for that resource
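Queries like the one above can also be sent programmatically. The following is a minimal standard-library sketch: it issues a SPARQL 1.1 Protocol GET request (query in the `query` parameter, JSON results requested through the Accept header) and turns the bindings of the most-cited query into rows, mapping an unbound ?title to None. The default endpoint is the public one listed earlier; because the queries on these slides use localhost IRIs, they were presumably run against a local test instance, so `endpoint` may need adjusting. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://w3id.org/oc/sparql"  # public endpoint named on the corpus slide


def build_query_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """SPARQL 1.1 Protocol 'query via GET', asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def most_cited_rows(results: dict) -> list:
    """Convert SPARQL JSON results of the most-cited query into
    (cited, title, tot) tuples; title is None when OPTIONAL did not bind it."""
    rows = []
    for binding in results["results"]["bindings"]:
        title = binding["title"]["value"] if "title" in binding else None
        rows.append((binding["cited"]["value"], title, int(binding["tot"]["value"])))
    return rows


def run_most_cited(query: str, endpoint: str = ENDPOINT) -> list:
    """Send the query (network call) and return the parsed rows."""
    with urllib.request.urlopen(build_query_request(query, endpoint), timeout=60) as resp:
        return most_cited_rows(json.load(resp))
```

Rows whose title is None correspond to the "no title" case discussed above.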
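PREFIX biro: <http://purl.org/spar/biro/>
PREFIX c4o: <http://purl.org/spar/c4o/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
# Citing works and the textual reference entries they use for br/1302
SELECT ?citing ?entry {
  <http://localhost:8000/corpus/br/1302> ^biro:references ?ref .  # references pointing at br/1302
  ?ref c4o:hasContent ?entry ;  # the literal text of each reference
       ^frbr:part ?citing       # the work whose reference list contains it
}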
How the “no title” paper has been referenced in the 4 papers citing it
SPACIN used the URL in the textual entries (here “http://www.w3.org/DesignIssues/LinkedData.html”) to associate them with the same bibliographic resource: <http://localhost:8000/corpus/br/1302>
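The slide does not say how SPACIN detects the URL inside an entry. The sketch below is a hypothetical approximation, not SPACIN's code: it extracts the first http(s) URL from each textual entry with a regular expression, trims trailing punctuation, and groups the citing papers by that URL, so that each group would map to one bibliographic resource such as br/1302.

```python
import re
from collections import defaultdict

# Rough pattern: an http(s) URL runs until whitespace, a quote, an angle
# bracket, a closing square bracket or a closing parenthesis.
URL_RE = re.compile(r"https?://[^\s\"'<>\])]+")


def extract_url(entry: str):
    """Return the first URL found in a textual reference entry, or None."""
    match = URL_RE.search(entry)
    if match is None:
        return None
    # Drop sentence punctuation that the pattern swallows at the end.
    return match.group(0).rstrip(".,;:")


def group_by_url(entries: dict) -> dict:
    """Group citing papers by the URL in their reference entry.

    `entries` maps a citing-paper identifier to the textual entry it uses;
    entries without a URL are left out.
    """
    groups = defaultdict(list)
    for citing, text in entries.items():
        url = extract_url(text)
        if url is not None:
            groups[url].append(citing)
    return dict(groups)
```

Real reference strings are messier (line-broken URLs, DOIs written as URLs, redirects), so a production matcher would need more normalisation than this.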
Conclusions
• We have introduced the OpenCitations Project, which has created an open
repository of accurate bibliographic references harvested from the scholarly
literature, i.e. the OpenCitations Corpus (OCC)
• The number of citation links is growing day by day (about 25,000 new citation links per day) as the continuous workflow dynamically adds new data from Europe PubMed Central (and from other authoritative sources, namely Crossref and ORCID)
• First adopter: Wikidata (via WikiCite)
– The Wikidata community has created a property for associating the OCC bibliographic
resource identifier to the metadata about scholarly papers in Wikidata
– Several links from Wikidata to the OCC have already been added
• Future plans: developing tools for linking the resources within the OCC with those
included in other datasets, e.g. Wikidata, Scholarly Data, Springer LOD
• Don’t hesitate to poke me during the poster and demo session on Wednesday (panel P30) for additional details about OpenCitations, and don’t forget to vote for it, of course :-)
Thanks for your attention

PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 

Último (20)

TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 

Freedom for bibliographic references: OpenCitations arise

  • 1. Freedom for bibliographic references: OpenCitations arise
    Silvio Peroni, David Shotton, Fabio Vitali
    4th International Workshop on Linked Data for Information Extraction (LD4IE 2016)
    Kobe, Japan, October 18, 2016
    https://w3id.org/oc/paper/occ-lisc2016.html
  • 2. The Venice analogy
    • Island = scholarly publication
    • Bridge = citation
    • Current situation:
      – local travel to the next island is permitted
      – unrestricted travel over the entire network of bridges requires an expensive season ticket
      – the general populace is excluded
    https://w3id.org/oc/paper/the-venice-analogy.html
  • 3. Opening the bridges
    • What – Citation data are one of the main tools researchers use to gain knowledge about particular topics, and they also serve institutional goals, for example in research assessment
    • Problem – The most authoritative databases of citation data, Scopus and Web of Science, can only be accessed by paying significant annual fees
      – The University of Bologna pays about 6,000,000 euros per year for access to digital bibliographic resources
    • Solution – To create a citation database that freely and legally makes citation data available in an open repository, to assist scholars with their academic studies and to serve knowledge to the wider public
  • 4. OpenCitations
    • The OpenCitations Project aims to create an open repository of scholarly citation data, the OpenCitations Corpus (OCC), made available under a Creative Commons public domain dedication, which provides accurate citation information (bibliographic references) in RDF, harvested from the scholarly literature
      – All scripts are released under the open-source ISC Licence and are available on GitHub at http://github.com/essepuntato/opencitations
    • Currently processing papers available in the PubMed Central Open Access subset (which contains papers from the medical, biological and life-science domains) by means of the Europe PubMed Central API
    • As of October 17, 2016, the OCC contains
      – 1,311,196 citing/cited bibliographic resources
      – 1,584,945 citation links
    http://opencitations.net
  • 5. OpenCitations Ontology
    • The OpenCitations Ontology (OCO) groups existing, complementary ontological entities from several other ontologies in order to provide descriptive metadata for the OCC
    • SPAR Ontologies reused:
      – FRBR-aligned Bibliographic Ontology (FaBiO, http://purl.org/spar/fabio)
      – Publishing Roles Ontology (PRO, http://purl.org/spar/pro)
      – Bibliographic Reference Ontology (BiRO, http://purl.org/spar/biro)
      – Citation Counting and Context Characterization Ontology (C4O, http://purl.org/spar/c4o)
      – DataCite Ontology (http://purl.org/spar/datacite)
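The SPARQL queries later in this talk abbreviate SPAR and related terms with prefixes such as `fabio:` and `biro:`. As a reading aid, here is a minimal Python sketch that expands such prefixed names into full IRIs. The prefix table copies the `PREFIX` declarations used in those queries; the `expand` helper is hypothetical and not part of the OpenCitations code.

```python
# Prefix table copied from the PREFIX declarations in the OCC example queries.
PREFIXES = {
    "fabio": "http://purl.org/spar/fabio/",
    "cito": "http://purl.org/spar/cito/",
    "biro": "http://purl.org/spar/biro/",
    "c4o": "http://purl.org/spar/c4o/",
    "dcterms": "http://purl.org/dc/terms/",
    "frbr": "http://purl.org/vocab/frbr/core#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'fabio:Expression' into a full IRI (hypothetical helper)."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    for term in ("fabio:Expression", "cito:cites", "c4o:hasContent", "frbr:part"):
        print(term, "->", expand(term))
```

Note that `frbr:` ends in `#` rather than `/`, so simple string concatenation gives the right IRI for both namespace styles.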
  • 6. OpenCitations Corpus
    • Six distinct kinds of bibliographic entities
      – bibliographic resources (citing/cited articles, journals, books, proceedings, etc.)
      – resource embodiments (format information about bibliographic resources)
      – bibliographic entries (literal textual entries occurring in reference lists)
      – responsible agents (agents having certain roles with respect to the bibliographic resources)
      – agent roles (author, editor, publisher)
      – identifiers (DOI, ORCID, PubMedID, URL, etc.)
    • Provenance for each entity is handled by means of PROV-O
      – as described in the Drift-a-LOD 2016 paper (a workshop held in Bologna next month during EKAW 2016), available at https://w3id.org/oc/paper/occ-driftalod2016.html
    • Access the OCC via
      – HTTP (content negotiation; formats: JSON-LD, RDF/XML, TriG, HTML), e.g. https://w3id.org/oc/corpus/br/1
      – SPARQL endpoint, available at https://w3id.org/oc/sparql
      – dumps, downloadable at https://opencitations.net/download
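HTTP content negotiation means the client picks the format through the `Accept` header while the URL stays the same. A minimal Python sketch that builds (but does not send) such a request for an OCC entity follows. The media types are the standard registered ones for the four listed formats and are assumed, not confirmed, to be what the OCC server honours; the base URL comes from the slide and may have changed since 2016; `build_request` is a hypothetical helper.

```python
import urllib.request

# Standard media types for the formats listed on the slide (assumed to match the server).
FORMATS = {
    "json-ld": "application/ld+json",
    "rdf/xml": "application/rdf+xml",
    "trig": "application/trig",
    "html": "text/html",
}

CORPUS_BASE = "https://w3id.org/oc/corpus/"  # base IRI taken from the slide


def build_request(entity_path: str, fmt: str) -> urllib.request.Request:
    """Build, without sending, a GET request for one OCC entity in the chosen format."""
    try:
        accept = FORMATS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return urllib.request.Request(CORPUS_BASE + entity_path, headers={"Accept": accept})


if __name__ == "__main__":
    req = build_request("br/1", "JSON-LD")
    print(req.full_url, req.get_header("Accept"))
    # To actually fetch it (network access required):
    # body = urllib.request.urlopen(req).read()
```

The same URL with `"trig"` or `"html"` would ask the server for a different serialization of the same resource.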
  • 7. Ingestion workflow
    1. BEE (EuropeanPubMedCentralProcessor): parsing the XML source of PubMed Central Open Access articles
    2. Producing JSON with the DOI and the bibliographic entries (input to SPACIN):
       {
         "doi": "10.1590/1414-431x20154655",
         "localid": "MED-26577845",
         "curator": "BEE EuropeanPubMedCentralProcessor",
         "source": "http://www.ebi.ac.uk/europepmc/webservices/rest/PMC4678653/fullTextXML",
         "source_provider": "Europe PubMed Central",
         "pmid": "26577845",
         "pmcid": "PMC4678653",
         "references": [
           {
             "bibentry": "Wenger, NK. Coronary heart disease: an older woman's major health risk, BMJ, 1997, 315, 1085, 1090, DOI: 10.1136/bmj.315.7115.1085, PMID: 9366743",
             "pmid": "9366743",
             "doi": "10.1136/bmj.315.7115.1085",
             "pmcid": "PMC2127693",
             "process_entry": "True"
           },
           …
         ]
       }
    3. ResourceFinder: for each citing/cited resource, if an ID (DOI, PMID, PMCID) is specified, check whether the resource already exists. If it does, go to 5.
    4. CrossRefProcessor, ORCIDProcessor: if the resource doesn't exist, extract possible IDs from the entry and query CrossRef and ORCID.
    5. GraphEntity: new metadata resources are created. If CrossRef/ORCID returned something, all the related metadata are used; otherwise only basic metadata (IDs and entries) are added.
    6. Storer: load all the statements onto the triplestore and store them in the file system for easy recovery → OCC
    Other SPACIN components shown: GraphSet, ProvSet, DatasetHandler
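The decision logic of steps 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SPACIN's code: a plain dict keyed by (ID type, value) stands in for the triplestore lookup done by ResourceFinder, a caller-supplied `lookup_external` function stands in for the Crossref/ORCID queries, and new identifiers such as `br/2` come from a simple counter. All function names are hypothetical.

```python
import json

# Trimmed copy of the BEE output shown on the slide (one reference kept).
BEE_JSON = """
{
  "doi": "10.1590/1414-431x20154655",
  "pmid": "26577845",
  "pmcid": "PMC4678653",
  "references": [
    {
      "bibentry": "Wenger, NK. Coronary heart disease: an older woman's major health risk, BMJ, 1997, 315, 1085, 1090, DOI: 10.1136/bmj.315.7115.1085, PMID: 9366743",
      "pmid": "9366743",
      "doi": "10.1136/bmj.315.7115.1085",
      "pmcid": "PMC2127693",
      "process_entry": "True"
    }
  ]
}
"""

ID_KEYS = ("doi", "pmid", "pmcid")


def find_existing(record, index):
    """Step 3 stand-in: return the known resource for any ID in the record, else None."""
    for key in ID_KEYS:
        value = record.get(key)
        if value and (key, value) in index:
            return index[(key, value)]
    return None


def resolve(record, index, lookup_external, next_id):
    """Return (resource_id, action) for one citing or cited record."""
    found = find_existing(record, index)
    if found is not None:
        return found, "reused"  # already in the corpus: no external lookup needed
    metadata = lookup_external(record)  # step 4 stand-in for the Crossref/ORCID queries
    resource_id = f"br/{next_id}"
    for key in ID_KEYS:  # step 5: register the new resource under all its IDs
        if record.get(key):
            index[(key, record[key])] = resource_id
    return resource_id, ("enriched" if metadata else "basic")


def process(bee_doc, index, lookup_external):
    """Resolve the citing paper and each processable reference; return citation links."""
    data = json.loads(bee_doc)
    # Assumes existing resources are numbered br/1 .. br/n.
    next_id = len(set(index.values())) + 1

    def resolve_one(record):
        nonlocal next_id
        rid, action = resolve(record, index, lookup_external, next_id)
        if action != "reused":
            next_id += 1
        return rid, action

    citing, _ = resolve_one(data)
    links = []
    for ref in data.get("references", []):
        if ref.get("process_entry") != "True":
            continue
        cited, action = resolve_one(ref)
        links.append((citing, cited, action))
    return links


if __name__ == "__main__":
    # Pretend the cited BMJ paper is already stored as br/1; the offline stub finds nothing.
    index = {("doi", "10.1136/bmj.315.7115.1085"): "br/1"}
    print(process(BEE_JSON, index, lambda record: None))
```

With the cited paper already indexed, the run prints `[('br/2', 'br/1', 'reused')]`: the citing article becomes a new resource with basic metadata, and the reference is linked to the existing one without an external query.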
  • 8. Test
    • Hardware: MacBook Pro with 2 GHz Intel Core i7 processor, 8 GB DDR3 1600 MHz RAM, OS X 10.11.3
    • BEE: running for 30 minutes (querying the Europe PubMed Central API), it produced 185 JSON files (~6 new JSON files per minute)
    • SPACIN
      – 45 minutes to process all the BEE JSON files related to the 67 papers in the ISWC 2015 Proceedings (sources kindly made available by Springer Nature)
      – 210 minutes to process the BEE JSON files related to 67 papers from Europe PubMed Central (OA subset)
    • All these data are available on Figshare; their URLs are included in the article.
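The rates on the test slide follow from simple division. This small Python sketch reproduces them, assuming the reported times are wall-clock totals for each batch. It gives about 6.2 JSON files per minute for BEE, and for SPACIN roughly 40 s per ISWC 2015 paper versus roughly 188 s per Europe PubMed Central paper. The function names are hypothetical.

```python
def rate_per_minute(count: int, minutes: float) -> float:
    """Items produced per minute of wall-clock time."""
    return count / minutes


def seconds_per_item(minutes: float, count: int) -> float:
    """Average wall-clock seconds spent per item."""
    return minutes * 60 / count


if __name__ == "__main__":
    # Figures from the test slide.
    print(f"BEE: {rate_per_minute(185, 30):.2f} JSON files/minute")
    print(f"SPACIN, ISWC 2015: {seconds_per_item(45, 67):.0f} s per paper")
    print(f"SPACIN, Europe PMC: {seconds_per_item(210, 67):.0f} s per paper")
```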
  • 9. ISWC2015: most cited papers
      PREFIX dcterms: <http://purl.org/dc/terms/>
      PREFIX fabio: <http://purl.org/spar/fabio/>
      PREFIX cito: <http://purl.org/spar/cito/>
      SELECT ?cited ?title ?tot {
        {
          SELECT ?cited (COUNT(?citing) AS ?tot) {
            ?cited a fabio:Expression ;
              ^cito:cites ?citing
          } GROUP BY ?cited
        }
        OPTIONAL { ?cited dcterms:title ?title }
      }
      ORDER BY DESC(?tot)
      LIMIT 15
    • Callout on the results: "no title?" (one of the most cited resources has no title)
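To make the query's semantics concrete, here is a minimal Python sketch that performs the same aggregation over an in-memory list of (citing, cited) pairs: count distinct citing works per cited resource, attach a title only when one exists (the role of the OPTIONAL clause), sort by count descending and keep the top 15. It assumes every cited item is a `fabio:Expression`; the data and the `most_cited` name are invented for illustration, and the tie-break by identifier is an addition, since the query leaves tie order unspecified.

```python
from collections import Counter


def most_cited(citations, titles, limit=15):
    """Rank cited resources by number of distinct citing works.

    citations: iterable of (citing, cited) pairs; duplicates are ignored, as in an RDF graph.
    titles: dict mapping a resource to its title; missing keys yield None (like OPTIONAL).
    """
    counts = Counter(cited for _citing, cited in set(citations))
    # ORDER BY DESC(?tot); the identifier tie-break is an added assumption for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(cited, titles.get(cited), total) for cited, total in ranked[:limit]]


if __name__ == "__main__":
    # Toy data invented for illustration.
    pairs = [("ex:c1", "ex:untitled"), ("ex:c2", "ex:untitled"), ("ex:c3", "ex:untitled"),
             ("ex:c1", "ex:p1"), ("ex:c2", "ex:p1"), ("ex:c3", "ex:p2"),
             ("ex:c1", "ex:p1")]  # duplicate pair, counted once
    titles = {"ex:p1": "Paper one", "ex:p2": "Paper two"}
    for row in most_cited(pairs, titles):
        print(row)
```

The top row comes back as `("ex:untitled", None, 3)`, which is exactly the kind of result that prompts the "no title?" question on the slide.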
  • 10. No Crossref metadata
      PREFIX biro: <http://purl.org/spar/biro/>
      PREFIX c4o: <http://purl.org/spar/c4o/>
      PREFIX dcterms: <http://purl.org/dc/terms/>
      PREFIX frbr: <http://purl.org/vocab/frbr/core#>
      SELECT ?citing ?entry {
        <http://localhost:8000/corpus/br/1302> ^biro:references ?ref .
        ?ref c4o:hasContent ?entry ;
          ^frbr:part ?citing
      }
    • How the "no title" paper has been referenced in the 4 papers citing it: SPACIN used the URL in the textual entries (i.e. "http://www.w3.org/DesignIssues/LinkedData.html") to associate them with the same bibliographic resource, <http://localhost:8000/corpus/br/1302>
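The idea of matching free-text entries through a shared URL can be sketched in Python: pull URLs out of each reference entry with a regular expression and group entries that share one, so that a loader could map each group to a single bibliographic resource. This is a minimal sketch, not SPACIN's implementation; the regex, the trailing-punctuation rule and the sample entries are assumptions made for illustration.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r'https?://[^\s<>"]+')


def extract_urls(entry):
    """Return the distinct URLs in a reference entry, in order of appearance.

    Trailing '.', ',', ';' and ')' are stripped as sentence punctuation; this
    simplification would break URLs that legitimately end with ')'.
    """
    return list(dict.fromkeys(u.rstrip(".,;)") for u in URL_RE.findall(entry)))


def group_by_url(entries):
    """Map each URL to the entries that mention it."""
    groups = defaultdict(list)
    for entry in entries:
        for url in extract_urls(entry):
            groups[url].append(entry)
    return dict(groups)


# Sample entries invented for illustration.
ENTRIES = [
    "Linked Data - Design Issues. http://www.w3.org/DesignIssues/LinkedData.html",
    "Berners-Lee T. Linked Data (2006). Available at http://www.w3.org/DesignIssues/LinkedData.html.",
    "An entry with no URL at all, 2015.",
]

if __name__ == "__main__":
    for url, matched in group_by_url(ENTRIES).items():
        print(url, len(matched))
```

Both differently worded entries end up under the same URL key, while the entry without a URL is left for other matching strategies.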
  • 11. Conclusions
    • We have introduced the OpenCitations Project, which has created an open repository of accurate bibliographic references harvested from the scholarly literature, i.e. the OpenCitations Corpus (OCC)
    • The number of citation links grows every day (about 25,000 new citation links per day) as the continuous workflow dynamically adds new data from Europe PubMed Central (and from other authoritative sources, i.e. Crossref and ORCID)
    • First adopter: Wikidata (via WikiCite)
      – The Wikidata community has created a property for associating the OCC bibliographic resource identifier with the metadata about scholarly papers in Wikidata
      – Several links from Wikidata to the OCC have already been added
    • Future plans: developing tools for linking the resources within the OCC with those included in other datasets, e.g. Wikidata, Scholarly Data, Springer LOD
    • Don't hesitate to poke me during the poster and demo session on Wednesday (panel P30) for additional details about OpenCitations, and don't forget to vote for it, of course :-)
  • 12. Thanks for your attention
    Silvio Peroni, David Shotton, Fabio Vitali
    4th International Workshop on Linked Data for Information Extraction (LD4IE 2016)
    Kobe, Japan, October 18, 2016