The reasons why an author cites other publications are varied: an author can cite previous works to gain assistance of some sort in the form of background information, ideas, methods, or to review, critique or refute previous works. The problem is that the best possible way to retrieve the nature of citations is very time consuming: one should read article by article to assign a particular characterisation to each citation. In this work we propose an algorithm, called CiTalO, to infer automatically the function of citations by means of Semantic Web technologies and NLP techniques. We also present some preliminary experiments and discuss some strengths and limitations of this approach.
3. Citations as tools
• Bibliographic citations can be seen as tools for:
✦ linking research: making pointers to related works, to source of experimental data, to
methods used, etc.
✦ disseminating research: conference proceedings, journals,Web platforms (e.g. blogs,
wikis), Semantic Publishing platforms and projects (e.g. OpenCitation,
OpenBibliography, Lucero)
✦ exploring research: new ways of browsing article through networks of citations (e.g.
CiteWiz, Citation Sensitive In-browser Summariser)
✦ evaluating research: measuring the importance of journals (e.g. impact factor) or the
scientific productivity of authors (e.g. h-index)
• This work is based on the assumption that all these activities can be
radically improved by exploiting the actual function of citations, i.e.
author’s reason for citing a given paper
✦ Could a paper that is cited many times negatively be given a high score?
✦ Could a paper containing several self-citations be given the same score of a paper with
heterogeneous citations?
4. Intended vs. interpreted
meaning of citation functions
It extends the research outlined in earlier work [3].
Intended meaning
Interpreted meaning
The author of the text
A reader of the text Another reader of the text
“Text as a medium through which an
author transmits knowledge”
“The reader decodes this meaning
and literally ‘make sense’ of it”
“The meaning is not
contained within the words
themselves but in the
minds of the participants.”
Quotations from From Proteins to Fairytales: Directions in Semantic Publishing by Anita De Waard, DOI: 10.1109/MIS.2010.49
I want to convey a
particular meaning through
this text
I recognise
a particular meaning by
reading that text
I recognise
a particular meaning by
reading that text
These three
“meanings” are
not the same
5. http://wit.istc.cnr.it:8080/tools/citalo
Our setting
It extends the research outlined in earlier work [3].Interpreted meaning
A reader of the text Another reader of the text
I recognise
a particular meaning by
reading that text
I recognise
a particular meaning by
reading that text
I recognise a particular
meaning by reading that text
CiTalO (English translation: cite it) is a tool
that recognises the function of citations by
exploiting Semantic Web technologies
(CiTO, FRED, SPARQL) and NLP techniques
(IMS,AlchemyAPI)
6. pipeline
It extends the research
outlined in earlier work X.
Ontology
learning
Citation type
extraction
Word-sense
disambiguation
Alignment
to CiTO
Sentiment
analysis
output:
cito:extends
Input: a sentence
containing a
reference to a
bibliographic entity
indicated by an “X”
Derive a logical (i.e.
an OWL ontology)
representation of
the sentence
through FRED
Extract candidate
types for the citation
by looking for
patterns in FRED
output via SPARQL
Gather the sense of
the candidate types
through IMS with
respect to
OntoWordNet
Capture the sentiment
polarity emerging
from the text through
AlchemyAPI
Assign CiTO types
to the citation
through SPARQL
CONSTRUCT
7. Ontology learning and
citation type extraction
FRED output
It is returned as an OWL ontology in RDF/XML format
SELECT ?type WHERE {
?subj ?prop fred:X; a ?typeTmp.
?typeTmp rdfs:subClassOf+ ?type}
SELECT ?type WHERE {
?subj ?prop fred:X; a ?type}
SELECT ?type WHERE {?subj a dul:Event;
boxer:patient ?patient. ?patient a ?type}
SELECT ?type WHERE {?subj a dul:Event,
?type. FILTER(?type != dul:Event)}
Candidates
extraction
Looking for patterns
through SPARQL
8. Word-sense disambiguation and
alignment to CiTO
• We use IMS (a word-sense disambiguator) to disambiguate the candidate
types found, with respect to OntoWordNet (OWN)
✦ EarlierWork and Work: own:synset-work-noun-1
✦ Extend: own:synset-prolong-verb-1
✦ Outline: own:synset-delineate-verb-3
✦ Research: own:synset-research-noun-1
It is possible to extend
the set of retrieved
synsets by adding the
proximal ones
automatically
Wordnetsynsets
retrievedbyIMS
CiTO2Wordnet is an
ontology we developed
to maps all the CiTO
properties defining
citations with related
Wordnet synsets
CiTO is an ontology
that defines
functions of citations
as object properties
• Finally we perform SPARQL
CONSTRUCT to align the synsets to
those defined in the CiTO2Wordnet
ontology, which imports CiTO
✦ Properties in CiTO having opposite polarity
(that is defined in the CiTOFunctions
ontology) to what is returned by the
sentiment analysis module are not
considered
✦ If no alignment is performed,
cito:citesForInformation is returned as default
cito:extends
skos:closeMatch
own:synset-prolong-verb-1 .
cito:extends
is selected!
9. A preliminary empirical evaluation
• Comparing the results of CiTalO with a human classification of citations
• The test bed we used for our experiments includes some scientific papers (written in
English) encoded in DocBook, containing citations of different types
• We automatically extracted citation sentences, through an XSLT document, from all
the papers published in the 7th volume of Balisage Proceedings
✦ 18 scientific papers written by different authors
✦ 377 citations (20.94 citations per paper)
• We filtered all the citation sentences from the selected articles
✦ The annotation of citation functions is an hard problem to address
✦ We kept only 106 citations, which were accompanied by verbs and/or other grammatical structures
carrying explicitly a particular citation function, as suggested by Teufel et al. [18]
• We marked (according to CiTO properties) these 106 citations, obtaining at least
one representative citation for each of the 18 paper used (5.89 citations per paper)
• We used 21 CiTO properties out of 38 to annotate all these citations
10. CiTalO setting
• We run CiTalO on these 106 citation sentences and compared
results with our annotations
• We also tested eight different configurations of CiTalO,
corresponding to all possible combinations of three options:
✦ activating or deactivating the sentiment-analysis module;
✦ applying or not the proximal synsets to the IMS output;
✦ using an extended version of CiTO2Wordnet that includes all the synsets
Wordnet retrieves for a particular string (including those synsets having the
gloss radically different to the CiTO property in consideration)
CiTO2Wordnet
does not link cito:extends to that synset
CiTO2Wordnet – extended
do link the synset with cito:extends
Consider the synset own:synset-credit-verb-3, having gloss
“accounting: enter as credit”
and the CiTO definition for cito:credits
“the citing entity acknowledges contributions made by the cited entity”
11. Results
Kinds and number of citations marked by us
Similarly to Teufel et al. [19] the most
neutral CiTO property,
citesForInformation, was the most
prevalent function in our dataset
too, as the second most used
property was usedMethodIn
Precision and recall of CiTalO according to different configurations
No
configuration
that emerges as
the absolutely
best one from
these data
Worst
configurations
were those that
took into
account all the
proximal synsets
12. Conclusions
• The implementation of CiTalO is still at an early stage
• Current experiments are not enough to fully validate this approach
• However, the goal of this work was to build such a modular
architecture, to perform some exploratory experiments and to
identify issues and possible developments of our approach
• Future works:
✦ Propose extensions to CiTO to cover more scenarios (e.g. cito:speculatesOn
and cito:citesAsPotentialSolution)
✦ Decrease the noise given by proximal synsets
✦ Identification of anaphoras (e.g. nouns, pronouns) speaking about the citations
✦ Perform exhaustive tests with a larger set of documents and users
13. Thanks for your attention
Please come to see CiTalO in action
during the ESWC Demo Session on Tuesday