Talk at ISWC 2012 Workshop on Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine (SATBI+SWIM 2012)
1. Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA)
Anita de Waard, Disruptive Technologies Director, Elsevier Labs, Jericho, VT, USA
Jodi Schneider, PhD Researcher, DERI, Galway, Ireland
2. Outline
• Background:
– Metadiscourse, epistemic modality, and knowledge attribution, oh my!
– Some related work: genre studies, linguistics, NLP
• Our model:
– What it models
– The ontology
– How can we find this in text?
• Possible applications:
– Possible uses
– Next steps
4. Scientists make uncertain claims
Uncertainty: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.
5. But uncertainty gets lost while citing
Uncertainty: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.
Certainty: Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006)
6. Uncertainty in action: “[Y]ou can transform .. fiction into fact just by adding or subtracting references”, Bruno Latour [1]
• Voorhoeve et al., 2006: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.
• Kloosterman and Plasterk, 2006: In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).
• Yabuta et al., 2007: [On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
• Okada et al., 2011: Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).
7. Uncertainty = Hedging:
• Why do authors hedge?
– Make a claim ‘pending […] acceptance in the community’ [2]
– ‘Create A Research Space’: hedging allows authors to insert themselves into the discourse in a community [3]
– ‘the strongest claim a careful researcher can make’ [4]
• Hedging cues, speculative language, modality/negation:
– Light et al. [5]: finding speculative language
– Wilbur et al. [6]: focus, polarity, certainty, evidence, and directionality
– Thompson et al. [7]: level of speculation, type/source of the evidence and level of certainty
• Sentiment detection (e.g. Kim and Hovy [8], among others):
– Holder of the opinion, strength, polarity as ‘mathematical function’ acting on main propositional content
– Wide applications in product reviews; but not (yet) in science!
9. Our model for epistemic evaluations:
For a Proposition P, an epistemically marked clause E is an evaluation of P, written $E_{V,B,S}(P)$, with:
– V = Value: 3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
– B = Basis: Reasoning, Data
– S = Source: A = speaker is author A, explicit; IA = speaker is author A, implicit; N = other author N, explicit; NN = other author NN, implicit
Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
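The triple (Value, Basis, Source) attached to a proposition can be written down as a small data structure. The following is only a minimal sketch of that idea in Python: the class and member names (`Value`, `Basis`, `Source`, `EpistemicEvaluation`, `notation`) are illustrative assumptions, not part of ORCA itself; only the scale and the codes A, IA, N, NN come from the slide.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Value(IntEnum):
    """Epistemic value V, using the numeric scale from the slide."""
    ASSUMED_UNTRUE = -3
    PROBABLY_UNTRUE = -2
    POSSIBLY_UNTRUE = -1
    UNKNOWN = 0
    POSSIBLE = 1
    PROBABLE = 2
    ASSUMED_TRUE = 3


class Basis(Enum):
    """Basis B of the evaluation."""
    REASONING = "Reasoning"
    DATA = "Data"


class Source(Enum):
    """Source S, using the codes from the slide."""
    AUTHOR_EXPLICIT = "A"
    AUTHOR_IMPLICIT = "IA"
    OTHER_EXPLICIT = "N"
    OTHER_IMPLICIT = "NN"


@dataclass(frozen=True)
class EpistemicEvaluation:
    """An evaluation E_{V,B,S}(P) of a proposition P (names are hypothetical)."""
    proposition: str
    value: Value
    basis: Basis
    source: Source

    def notation(self) -> str:
        # Compact rendering of the (V, B, S) triple.
        return f"E[V={int(self.value)}, B={self.basis.value}, S={self.source.value}](P)"


# Example: a cited, data-based claim presented as fact.
example = EpistemicEvaluation(
    proposition="only hMOB1A and hMOB1B interact with both LATS1 and LATS2",
    value=Value.ASSUMED_TRUE,
    basis=Basis.DATA,
    source=Source.OTHER_EXPLICIT,
)
print(example.notation())
```

Using an integer enum for Value keeps the ordering of the scale available for comparisons such as `example.value > Value.POSSIBLE`.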
10. Adding Epistemic Evaluation
• Together, Lats2 and ASPP1 shunt p53 to proapoptotic promoters and promote the death of polyploid cells [1]. (…)
– Value = 3, Source = N, Basis = 0
• Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39]. (…)
– Value = 3, Source = N, Basis = Data
• Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression.
– Value = 1, Source = Author, Basis = Data
• Furthermore, we demonstrated that the direct inhibition of LATS2 protein was mediated by miR-373 and manipulated the expression of miR-373 to affect esophageal cancer cells growth.
– Value = 2 (3?), Source = Author, Basis = Data
11. Finding hedges in text [9]:
• Modal auxiliary verbs (e.g. can, could, might)
• Qualifying adverbs and adjectives (e.g. interestingly, possibly, likely, potential, somewhat, slightly, powerful, unknown, undefined)
• References, either external (e.g. ‘[Voorhoeve et al., 2006]’) or internal (e.g. ‘See fig. 2a’).
• Reporting/epistemic verbs (e.g. suggest, imply, indicate, show)
– either within the clause: ‘These results suggest that...’
– or in a subordinate clause governed by reporting-verb matrix clause ‘{These results suggest that} indeed, this represents the true endogenous activity.’
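The four marker types above lend themselves to a simple keyword lookup. The sketch below is an assumption-laden illustration, not the method used in [9]: the word lists are just the examples from the slide, the two reference patterns and the function name `find_hedge_cues` are hypothetical, and the suffix handling for verbs is deliberately crude (it misses forms such as "implies" or "shown").

```python
import re

# Cue words copied from the slide examples; illustrative, not exhaustive.
CUES = {
    "modal_auxiliary": ["can", "could", "might"],
    "qualifier": ["interestingly", "possibly", "likely", "potential",
                  "somewhat", "slightly", "powerful", "unknown", "undefined"],
    "reporting_verb": ["suggest", "imply", "indicate", "show"],
}

# Hypothetical patterns for the two kinds of reference mentioned on the slide.
REFERENCE_PATTERNS = {
    "external_reference": re.compile(r"[\[(][A-Z][^\]()\[]*?et al\.,? \d{4}[\])]"),
    "internal_reference": re.compile(r"\b[Ss]ee fig\. ?\d+[a-z]?"),
}


def find_hedge_cues(sentence):
    """Return (category, matched text) pairs for every cue found."""
    found = []
    for category, words in CUES.items():
        # Crude inflection handling for verbs only (suggests, indicated, showing).
        suffix = r"(?:s|d|ed|ing)?" if category == "reporting_verb" else ""
        for word in words:
            pattern = re.compile(r"\b" + re.escape(word) + suffix + r"\b", re.IGNORECASE)
            for match in pattern.finditer(sentence):
                found.append((category, match.group(0).lower()))
    for category, pattern in REFERENCE_PATTERNS.items():
        for match in pattern.finditer(sentence):
            found.append((category, match.group(0)))
    return found


sentence = ("In a genetic screen, miR-372 and miR-373 were found to allow "
            "proliferation of primary human cells that express oncogenic RAS "
            "and active p53, possibly by inhibiting the tumor suppressor LATS2 "
            "(Voorhoeve et al., 2006).")
cues = find_hedge_cues(sentence)
print(cues)
```

A lookup like this only flags candidate markers; deciding which clause a marker governs still needs the syntactic analysis discussed on the later slides.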
12. Manual identification:
Value            Modal Aux   Reporting Verb   Ruled by RV   Adverbs/Adjectives   References   None       Total
Total Value = 3  1 (0.5%)    81 (40%)         24 (12%)      7 (4%)               41 (20%)     47 (24%)   201 (100%)
Total Value = 2  -           29 (51%)         23 (40%)      1 (2%)               4 (7%)       -          57 (100%)
Total Value = 1  9 (27%)     11 (33%)         11 (33%)      1 (3%)               1 (3%)       -          33 (100%)
Total Value = 0  -           9 (64%)          3 (21%)       1 (7%)               1 (7%)       -          14 (100%)
Total No Modality -          16 (37%)         3 (7%)        0                    3 (7%)       22 (50%)   44 (100%)
Overall Total    10 (2%)     146 (23%)        64 (10%)      10 (2%)              50 (8%)      69 (11%)   640 (100%)
13. Most prevalent clause type: “These results suggest that...”
Adverb/Connective: thus, therefore, together, recently, in summary
Determiner/Pronoun: it, this, these, we/our
Adjective: previous, future, better
Noun phrase: data, report, study, result(s); method or reference
Modal: form of ‘to be’, may, remain
Adjective: often, recently, generally
Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
Preposition: that, to
14. Reporting verbs vs. epistemic value:
Value = 0 (unknown): establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report
Value = 1 (hypothetical): be important, consider, expect, hypothesize (5x), give insight, raise possibility that, suspect, think
Value = 2 (probable): appear, believe, implicate (2x), imply, indicate (12x), play a role, represent, suggest (18x), validate (2x)
Value = 3 (presumed true): be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results, 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
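The verb lists above can be turned into a lookup from a reporting verb to the epistemic values it was observed with. This is a minimal sketch using only a subset of the verbs on the slide; the names `VERBS_BY_VALUE` and `candidate_values` are hypothetical, and the input is assumed to be an already lemmatised verb.

```python
from collections import defaultdict

# A subset of the verbs listed on the slide, keyed by epistemic value.
VERBS_BY_VALUE = {
    0: ["describe", "report"],
    1: ["consider", "expect", "hypothesize", "suspect", "think"],
    2: ["appear", "believe", "implicate", "imply", "indicate", "suggest", "validate"],
    3: ["confirm", "demonstrate", "detect", "find", "identify", "observe",
        "report", "reveal", "show"],
}

# Invert the table: a verb may be attested with more than one value.
VALUES_BY_VERB = defaultdict(set)
for value, verbs in VERBS_BY_VALUE.items():
    for verb in verbs:
        VALUES_BY_VERB[verb].add(value)


def candidate_values(verb_lemma):
    """Return the sorted list of values a verb lemma was seen with (may be empty)."""
    return sorted(VALUES_BY_VERB.get(verb_lemma.lower(), set()))


print(candidate_values("suggest"))
print(candidate_values("report"))
```

Returning a list rather than a single number reflects the slide itself: "report" is listed under both Value = 0 and Value = 3, so the verb alone does not always fix the value.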
15. Finding Claimed Knowledge Updates [10]:
Definition:
1) A CKU expresses a proposition about biological entities
2) A CKU is a new proposition
3) The authors present the CKU as factual: => Strength = Certainty
4) A CKU is derived from experimental work described in the article: => Basis = Data
5) The ownership is attributed to the author(s) of the article. => Source = Author, Explicit
3), 4) and 5) are either explicitly expressed or structurally conveyed:
Here we used mass spectrometry to identify HuD as a novel SMN-interacting partner
Our analysis of known HuD-associated mRNAs identified cpg15 mRNA as a highly abundant mRNA in HuD IPs
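The five criteria can be read as a conjunction over an annotated clause. The sketch below assumes the clause has already been annotated by hand or by an upstream tool; the `AnnotatedClause` fields and the `is_cku` function are hypothetical names, and criteria 1 and 2 are represented as plain flags because the slide does not say how they are decided.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedClause:
    """A clause with the annotations needed for the CKU test (hypothetical fields)."""
    text: str
    about_biological_entities: bool  # criterion 1
    is_new: bool                     # criterion 2
    strength: str                    # criterion 3
    basis: str                       # criterion 4
    source: str                      # criterion 5


def is_cku(clause):
    """True only if all five criteria from the definition hold."""
    return (
        clause.about_biological_entities
        and clause.is_new
        and clause.strength == "Certainty"
        and clause.basis == "Data"
        and clause.source == "Author, Explicit"
    )


claim = AnnotatedClause(
    text="Here we used mass spectrometry to identify HuD as a novel SMN-interacting partner",
    about_biological_entities=True,
    is_new=True,
    strength="Certainty",
    basis="Data",
    source="Author, Explicit",
)
print(is_cku(claim))
```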
16. Automatic hedge detection with the Xerox Incremental Parser:
Concept-matching:
• Match concept patterns with rules
• Assign features to keywords, dependencies and sentences
General linguistic analysis of running texts:
• Extract syntactic dependencies between words
• Chunking
• Part-of-speech disambiguation
• Segment the sentences into words
• Segment the text into sentences
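To make the layering concrete, here is a toy sketch of only the lowest two layers (sentence and word segmentation) and a much simplified concept-matching step. It does not use or imitate the Xerox Incremental Parser: part-of-speech disambiguation, chunking and dependency extraction are skipped entirely, and the keyword table and function names are hypothetical.

```python
import re


def segment_sentences(text):
    # Naive rule: split after ., ! or ? when followed by whitespace and a capital.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]


def segment_words(sentence):
    # Keep hyphenated tokens such as "co-IP" together.
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", sentence)


# Hypothetical keyword-to-feature table standing in for concept-matching rules.
CONCEPTS = {"suggest": "REPORTING", "indicate": "REPORTING",
            "possibly": "HEDGE", "may": "MODAL"}


def analyse(text):
    """Run the toy layers bottom-up and assign features to each sentence."""
    results = []
    for sentence in segment_sentences(text):
        words = segment_words(sentence)
        features = sorted({CONCEPTS[w.lower()] for w in words if w.lower() in CONCEPTS})
        results.append({"sentence": sentence, "words": words, "features": features})
    return results


analysis = analyse("These results indicate that SMN associates with HuD. "
                   "The mechanism may vary between tumors.")
for item in analysis:
    print(item["features"], item["sentence"])
```

In a real system the features would be attached to dependencies as well as keywords, which is what lets a reporting verb in a matrix clause mark the subordinate clause it governs.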
17. Result: CKUs appear throughout the paper
bio-event
entity 1	event name	entity 2	location
HuD	interaction	SMN	motor neurons
• Title: Interaction of survival of motor neuron (SMN) and HuD proteins [with mRNA cpg15 rescues motor neuron axonal deficits]
• Abstract: Here we used mass spectrometry to identify HuD as a novel neuronal SMN-interacting partner.
• Intro.: Here we identify HuD as a novel interacting partner of SMN,
• Results: Together with our co-IP data, these results indicate that SMN associates with HuD in motor neurons.
• Figures: SMN interacts with HuD.
• Discussion: Our MS and co-IP data demonstrate a strong interaction between SMN and HuD in spinal motor neuron axons.
• Citation: Furthermore, these findings are consistent with recent studies demonstrating that the interaction of HuD with the spinal muscular atrophy (SMA) protein SMN …
18. The Xerox Incremental Parser:
Concept-matching:
• Match concept patterns with rules
• Assign features to keywords, dependencies and sentences
General linguistic analysis of running texts:
• Extract syntactic dependencies between words
• Chunking
• Part-of-speech disambiguation
• Segment the sentences into words
• Segment the text into sentences
25. How to represent the hierarchy?
lack of knowledge < hypothetical knowledge < dubitative knowledge < doxastic knowledge
• skos:broaderThan is not appropriate
• skos Collections add an unwanted layer of complexity.
• Our approach: transitive properties “lessCertain” and “moreCertain”
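What transitivity buys can be shown without any ontology tooling. The sketch below states only the three adjacent "lessCertain" links from the hierarchy above and derives the rest by transitive closure. It assumes that "A lessCertain B" reads as "A is less certain than B" and that "moreCertain" is its inverse; both readings, and all function names, are assumptions rather than definitions taken from the ontology.

```python
# Directly asserted "lessCertain" links between adjacent levels of the hierarchy.
LESS_CERTAIN = {
    "lack of knowledge": {"hypothetical knowledge"},
    "hypothetical knowledge": {"dubitative knowledge"},
    "dubitative knowledge": {"doxastic knowledge"},
}


def transitive_closure(direct):
    """Add every link implied by transitivity: a<b and b<c gives a<c."""
    closure = {node: set(targets) for node, targets in direct.items()}
    changed = True
    while changed:
        changed = False
        for targets in closure.values():
            for middle in list(targets):
                for end in list(closure.get(middle, ())):
                    if end not in targets:
                        targets.add(end)
                        changed = True
    return closure


CLOSURE = transitive_closure(LESS_CERTAIN)


def less_certain(a, b):
    return b in CLOSURE.get(a, set())


def more_certain(a, b):
    # Treated here as the inverse of less_certain (an assumption).
    return less_certain(b, a)


print(less_certain("lack of knowledge", "doxastic knowledge"))
print(more_certain("doxastic knowledge", "hypothetical knowledge"))
```

Only three links are asserted, yet any pair of levels can be compared, which is the point of declaring the properties transitive instead of enumerating every pair.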
29. Add knowledge value/basis/source to a bio-event
Biological statement with epistemic markup, followed by its epistemic evaluation:
• Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression.
– Value = Probable, Source = Author, Basis = Data
• Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39].
– Value = Presumed true, Source = Reference, Basis = Data
• Moreover, the mechanisms by which tumor suppressor genes are inhibited may vary between tumors.
– Value = Possible, Source = Unknown, Basis = Unknown
30. E.g. to augment Medscan [12]
• Biological statement with Medscan/epistemic markup: Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01).
• MedScan Analysis:
– IL-6 → NUCB2 (nesfatin-1)
– Relation: MolTransport
– Effect: Positive
– CellType: Adipocytes
– Cell Line: 3T3-L1
• Epistemic evaluation:
– Value = Probable, Source = Author, Basis = Data
31. Or Biological Expression Language [13]:
• Biological statement with BEL/epistemic markup: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.
• BEL representation:
– Increased abundance of miR-372 decreases: Increased activity of TP53 decreases activity of CDK protein family
r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family")))
– Increased abundance of miR-372 decreases abundance of LATS2
r(MIR:miR-372) -| r(HUGO:LATS2)
• Epistemic evaluation:
– Value = Possible, Source = Unknown, Basis = Unknown
32. Using ORCA for Nanopublications [14]:
• Use to indicate Strength, Basis, Source of Assertions
[Figure: nanopublication diagram with the labels Knowledge; Methods; Authors, DOIs; Strength, Basis, Source]
33. Next steps:
• Continuing experiments with automated detection
• Can be used in Claim-Evidence network projects, e.g. Data2Semantics or DIKB
• Could replace more complicated models of argumentation
• Ontology is available for all to use!
34. Thank you!
• Funding:
– Elsevier Labs
– NWO Casimir programme
• Discussion partners and collaborators:
– Phil Bourne, UCSD
– Ed Hovy
– Gully Burns, ISI
– Henk Pander Maat, UU
– Joanne Luciano, RPI
– Agnes Sandor, XRCE
– Tim Clark et al., Harvard
– Siegfried Handschuh, DERI
– Rinke Hoekstra & co, VU
– Richard Boyce & co, UPitt
– Maria Liakata, EBI
– Sophia Ananiadou & co, NaCTeM
35. Questions?
Anita de Waard, a.dewaard@elsevier.com, http://elsatglabs.com/labs/anita/
Jodi Schneider, jodi.schneider@deri.org, http://jodischneider.com/jodi.html
36. References
[1] Latour, B. and Woolgar, S. (1979). Laboratory Life: The Social Construction of Scientific Facts. Sage.
[2] Myers, G. (1992). ‘In this paper we report’: Speech acts and scientific facts. Journal of Pragmatics 17, 295-313.
[3] Swales, J. (1990). Genre Analysis: English in Academic and Research Settings. Cambridge University Press.
[4] Salager-Meyer, F. (1994). Hedges and Textual Communicative Function in Medical English Written Discourse. English for Specific Purposes, Vol. 13, No. 2, pp. 149-170.
[5] Light, M., Qiu, X.Y. and Srinivasan, P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases, 17-24.
[6] Wilbur, W.J., Rzhetsky, A. and Shatkay, H. (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 7:356.
[7] Thompson, P., Venturi, G. et al. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Workshop on Building and Evaluating Resources for Biomedical Text Mining.
[8] Kim, S-M. and Hovy, E.H. (2004). Determining the Sentiment of Opinions. COLING conference, Geneva.
[9] de Waard, A. and Pander Maat, H. (2012). Epistemic Modality and Knowledge Attribution in Scientific Discourse: A Taxonomy of Types and Overview of Features. Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
[10] Sándor, Á. and de Waard, A. (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles. Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
[11] de Waard, A. and Schneider, J. (2012). Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA). SATBI+SWIM, ISWC 2012.
[12] Medscan
[13] Biological Expression Language, http://www.openbel.org
[14] Groth et al. (2010). ‘The anatomy of a nanopublication’. Information Services & Use 30:51-6.