How Scientists Read, And Whether Computers Can Help Them
1. How Scientists Read, And Whether Computers Can Help Them
Anita de Waard
Disruptive Technologies Director, Elsevier Labs
Making Sense of Biological Systems, Bozeman, MT
2. Outline
• Why do scientists read?
• How do we read? (Discourse comprehension 101)
• What do we need to read:
– Noun phrases
– Triples
– Metadiscourse
– Claims and Evidence
• Can the computer identify these components?
• Some thoughts on explaining our texts to computers
3. How and why scientists read:
• Why do we read? To learn, i.e.: obtain the knowledge contained within the text and integrate it with what we already know.
• What do we read? Things that are ‘interesting’:
– Pertinent
– Possibly/probably true
– Novel, but in agreement with what I know
• How do we read?
4. Discourse Comprehension 101
• Letter < syllable < word < clause < sentence < discourse: This is how linguistics is structured. But it is not how we understand text!
• Kintsch and Van Dijk, ‘93: we read a text at three levels:
– surface code: literal text, exact words/syntax
– text base: preserves meaning, but not exact wording
– situation model: ‘microworld’ that the text is about: constructed inferentially through interaction between the text and background knowledge
• We use knowledge about text genre to activate a schema: this allows creation of the text base and situation model
13. What is this paper about? A. NOUN PHRASES
transiently expressed miRNA sponges
human breast cancer
high-grade malignancy
miR-31
noninvasive MCF7-Ras
antisense oligonucleotides
cell viability
cloned retroviral vector
Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?
14. What is this paper about? B. TRIPLES
miR-31 expression DEPRIVE metastatic cells
miR-31 PREVENT acquisition of aggressive traits
miR-31 INHIBIT noninvasive MCF7-Ras cells
miR-31 ENHANCE invasion
cell viability AFFECT inhibitor
Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?
15. What is this paper about? C. METADISCOURSE
The preceding observations demonstrated that X expression deprives Y cells of attributes associated with Z. We next asked whether X also prevents the acquisition of A traits by B cells. To do so, we transiently inhibited X in C cells with either D or E. Both approaches inhibited X function by >4.5-fold (Figure S7A). Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was unaffected by either inhibitor (Figure 3A; Figure S7B). The E sponge reduced X function by 2.5-fold, but did not affect the activity of other known Js (Figures S8A and S8B). Collectively, these data indicated that sustained X activity is necessary to prevent the acquisition of Z traits by both K and untransformed B cells.
Is it pertinent? -> Need content
Is it true? -> Sounds likely! I know this stuff!
Is it new, but in agreement with what I know? -> Need content
16. What is this paper about? D. CLAIMS AND EVIDENCE
Claim:
• Sustained miR-31 activity is necessary to prevent the acquisition of aggressive traits by both tumor cells and untransformed breast epithelial cells.
Evidence: Method:
• We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either antisense oligonucleotides or miRNA sponges.
Evidence: Result:
• Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).
• Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).
• The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the activity of other known antimetastatic miRNAs (Figures S8A and S8B).
Is it pertinent? -> Probably
Is it true? -> Sounds likely!
Is it new, but in agreement with what I know? -> Check/know
17. What is this paper about? E. JOURNAL & AUTHOR’S NAMES/AFFILIATIONS
Is it pertinent? -> Possibly
Is it true? -> Probably!
Is it new, but in agreement with what I know? -> Need background
18. In summary, how scientists read:
• Surface code provides noun phrases and triples that offer pointers re. topical relevance
• Text base and situation model are created through specific metadiscourse conventions (e.g. refs at the end) that create a biological reasoning model:
– We next asked whether … (Hypothesis)
– To do so, we transiently inhibited… (Goal/Method)
– Suppression of X enhanced invasion … (Result)
– but F was unaffected … (Figure 3A). … (Results)
– Collectively, these data indicated that … (Implication)
• This can be expressed as a set of claims, linked to evidence, that can help represent key points in the paper
• Journal name and author’s affiliation help define schema and provide ‘willingness to be convinced’ socially/interpersonally.
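The cue phrases in the reasoning model above can be turned into a toy segment labeller. The following is a minimal sketch, not the method used in the talk: the pattern list, the `label_segment` name, and the fallback label `Unlabelled` are all illustrative assumptions, and the patterns cover only the example phrases quoted on this slide.

```python
import re

# Hypothetical cue patterns, derived only from the example phrases on the slide.
# Order matters: an implication that also cites a figure should stay an Implication.
CUE_PATTERNS = [
    ("Hypothesis", re.compile(r"\bwe (next )?asked whether\b", re.I)),
    ("Goal/Method", re.compile(r"^to do so, we\b", re.I)),
    ("Implication", re.compile(
        r"\bthese (data|results) (indicate|indicated|suggest|suggested) that\b", re.I)),
    # A parenthesised figure pointer is treated as a sign of a reported result.
    ("Result", re.compile(r"\((figures?|fig\.)\s*[^)]*\)", re.I)),
]


def label_segment(segment: str) -> str:
    """Return the first matching discourse label, or 'Unlabelled'."""
    text = segment.strip()
    for label, pattern in CUE_PATTERNS:
        if pattern.search(text):
            return label
    return "Unlabelled"


if __name__ == "__main__":
    for s in [
        "We next asked whether X also prevents the acquisition of A traits by B cells.",
        "To do so, we transiently inhibited X in C cells with either D or E.",
        "Suppression of X enhanced invasion by 20-fold (Figure 3A; Figure S7B).",
        "Collectively, these data indicated that sustained X activity is necessary.",
    ]:
        print(label_segment(s), "|", s)
```

A real system would need far richer cues and context; the point is only that metadiscourse, unlike the biological content, is regular enough to match with surface patterns.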
19. Can computers help us identify:
A. Noun phrases
B. Triples
C. Metadiscourse elements
D. Claims + evidence
E. Journal and author’s names and affiliation
22. Noun Phrases: some progress
• Despite these difficulties, noun phrase recall/precision is quite high, e.g. I2B2 2011 [1], [2], others: 90%-98%
• Many tools, see [3] for a list; e.g. GoPubMed
23. Triples: some issues:
• Contingent on good NP & VP detection
• Hard to parse text! E.g. a commercial tool gave:
– Triple: insulin / maintaining / glucose homeostasis
  Sentence: When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues.
– Triple: insulin / may be involved / glucose homeostasis
  Sentence: Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.
24. Triples: some progress: Biological Expression Language [4]:
We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.
Increased abundance of miR-372 decreases activity of TP53
r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
Context: cancer
SET Disease = "Cancer"
Activity of TP53 decreases cell growth
tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
25. Metadiscourse: why it matters
“[Y]ou can transform .. fiction into fact just by adding or subtracting references”, Bruno Latour [5]
• Voorhoeve et al., 2006: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.
• Kloosterman and Plasterk, 2006: In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).
• Yabuta et al., 2007: [On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
• Okada et al., 2011: Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).
26. Metadiscourse: some progress
• Hedging cues, speculative language, modality/negation:
– Light et al [6]: finding speculative language
– Wilbur et al (Hagit) [7]: focus, polarity, certainty, evidence, and directionality
– Thompson et al (Sophia) [8]: level of speculation, type/source of the evidence and level of certainty
• Sentiment detection (e.g. Kim and Hovy [9] a.m.o.):
– Holder of the opinion, strength, polarity as ‘mathematical function’ acting on main propositional content
• Can make this part of the semantic web (e.g., Ontology for Reasoning, Certainty and Attribution, ORCA [10]):
– Value (Presumed True, Probable, Possible, Unknown)
– Source (Author, Named Other, Unknown)
– Basis (Data, Reasoning, Unknown)
27. Claims and Evidence: some issues:
• Data2Semantics [11]: linking clinical guidelines to evidence. Inconsistency within guideline and guidelines v. evidence:
– Studies have demonstrated inconsistent results regarding the use of such markers of inflammation as C-reactive protein (CRP), interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in neutropenic patients with cancer [55–57].
– [55]: PCT and IL-6 are more reliable markers than CRP for predicting bacteremia in patients with febrile neutropenia
– [56] In conclusion, daily measurement of PCT or IL-6 could help identify neutropenic patients with a stable course when the fever lasts >3 d. …, it would reduce adverse events and treatment costs.
– [57] Our study supports the value of PCT as a reliable tool to predict clinical outcome in febrile neutropenia.
• Drug Interaction Knowledgebase [12]: how to identify evidence?
– R-citalopram_is_not_substrate_of_cyp2c19:
– At 10uM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% of control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6).
28. Claims and Evidence: some progress
• Defining ‘salient knowledge components’ in text:
– Argumentative zones, CoreSC can both be found
– Blake, Claim networks (more soon!)
– Claimed Knowledge Updates (Sandor/de Waard, [13])
29. Perhaps we should start writing for computers?
• So why doesn’t the author add this information? If you know you’re going to mine it, why bury it?
• Authoring tools for entity identification: MS for Chemistry, Math, proteins; some experiments but no solution yet [14]
• Authoring tool for triple identification (MS ActiveText)
• But the question remains: After we’ve ‘extracted’ all the ‘facts’, what is all the gunk that remains in the filter?
30. Perhaps we should explain: a paper is rhetorical?
Aristotle | Quintilian | Description | Scientific Paper
prooimion | Introduction / exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning
prothesis | Statement of Facts / narratio | The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question
| Summary / propositio | The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents
pistis | Proof / confirmatio | The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results
| Refutation / refutatio | As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work
epilogos | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications.
- goal of the paper is to be published; it uses author/journal as a host
- format has co-evolved: predator-prey relationship with reviewers
31. Perhaps we should explain: a paper is a story?
Story Grammar | The Story of Goldilocks and the Three Bears | Paper Grammar | The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
Setting
Time | Once upon a time | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
Character | a little girl named Goldilocks | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
Location | She went for a walk in the forest. Pretty soon, she came upon a house. | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein
Theme
Goal | She knocked and, when no one answered, | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
Attempt | she walked right in. | Hypothesis | Atx-1 may play a role in the regulation of gene expression
Episode
Name | At the table in the kitchen, there were three bowls of porridge. | Name | dAtX-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
Subgoal | Goldilocks was hungry. | Subgoal | test the function of the AXH domain
Attempt | She tasted the porridge from the first bowl. | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
Outcome | This porridge is too hot! she exclaimed. | Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
Attempt | So, she tasted the porridge from the second bowl. | |
Outcome | This porridge is too cold, she said | Data | (data not shown),
Attempt | So, she tasted the last bowl of porridge. | Results | both genotypes show many large holes and loss of cell integrity at 28 days
Outcome | Ahhh, this porridge is just right, she | | (Figures 1B-1D).
32. A closer look at verb tense:
Conceptual realm: ‘state’ (gnomic) present
• ‘Dopaminergic innervation plays a major role in the control of mood
and its perturbation’
Experimental realm: ‘event’ past
• ‘Four out of seven cell lines expressed this cluster’,
• ‘Adult rats were individually housed for 2 days before testing.’
Argumentational realm: ‘instantaneous’ present; to-infinitive
• ‘These results suggest that...’,
• ‘To identify these mechanisms…’
Discourse progression: ‘instantaneous’ present
• ‘Fig 2a shows that’
• ‘see figure 7A’,
Reference to other work: present perfect - ‘finalised’ past
• ‘Previous work has demonstrated that VPCs are sensitive to the
levels of let-60/RAS (Han and Sternberg, 1990).’
33. Tense use in science and mythology:
Tense use | Science | Mythology
Facts in the eternal present | Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans. | I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, --the glorious one whom all the blessed throughout high Olympus reverence and honor.
Events in the simple past | Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s. | Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.
Events with embedded facts | We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005). | And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men-of warriors, with whom she is wroth, she, the daughter of the mighty sire.
Attribution in the present perfect | miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993) | In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.
Implications are hedged, and in the present tense | These results indicate that although miR-372&3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact | Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat
34. Some conclusions:
• How we read: surface code, textbase, situation model
• Useful components: find noun phrases, triples, metadiscourse, claims and evidence
• Computers keep getting better at identifying these
• Authoring tools might let us help computers
• But for the foreseeable future, scientists will continue to need to scan the literature to understand and believe science and make connections between knowledge
• To achieve progress, perhaps focus less on what computers can do and more on how humans communicate?
• Let’s pursue collaborations with linguists, cognitive psychologists etc. on how we read and learn!
35. Acknowledgements
• Funding:
– Elsevier Labs
– NWO
• Collaborators:
– Henk Pander Maat, UU
– Agnes Sandor, XRCE
– Jodi Schneider, DERI
– Rinke Hoekstra co, VU
– Richard Boyce co, UPitt
– Maria Liakata, EBI
– Sophia Ananiadou co, NaCTeM
• Discussion partners:
– Phil Bourne, UCSD
– Ed Hovy
– Gully Burns, ISI
– Joanne Luciano, RPI
– Tim Clark et al., Harvard
… and all of you!
36. Questions?
Anita de Waard
a.dewaard@elsevier.com
http://elsatglabs.com/labs/anita/
37. References
[1] J Am Med Inform Assoc. 2010 September; 17(5): 514–518 http://dx.doi.org/10.1136/jamia.2010.003947
[2] Quanzhi Li, Yi-Fang Brook Wu (2006): Identifying important concepts from medical documents, Journal of Biomedical Informatics 39 (2006) 668–679
[3] Useful list of resources in bioinformatics http://www.bioinformatics.ca/
[4] Biological Expression Language – http://www.openbel.org
[5] Latour, B. and Woolgar, S., Laboratory Life: the Social Construction of Scientific Facts, 1979, Sage Publications
[6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
[7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
[8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp Building and Evaluating Resources for Biomedical Text Mining 2008.
[9] Kim, S-M. Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
[10] de Waard, A. and Schneider, J. (2012) Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA), Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop at ISWC 2012 (submitted)
[11] Data2Semantics project: http://www.data2semantics.org/
[12] Boyce R, Collins C, Horn J, Kalet I. (2009) Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10, see also http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
[13] Sándor, Àgnes and de Waard, Anita, (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles, Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
[14] See e.g. http://ucsdbiolit.codeplex.com/ and http://research.microsoft.com/en-us/projects/ontology/ for MS Word ontology add-ins
39. Logical structure of epistemic evaluations:
For a Proposition P, an epistemically marked clause E is an evaluation of P, written E_{V,B,S}(P), with:
– V = Value: 3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown, (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
– B = Basis: Reasoning, Data
– S = Source:
  A = speaker is author A, explicit
  IA = speaker is author A, implicit
  N = other author N, explicit
  NN = other author NN, implicit
Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
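One way to make the evaluation E(V, B, S)(P) concrete is as a small record type. This is a hedged sketch rather than the ORCA ontology itself: the class and field names are assumptions chosen for illustration, the `UNKNOWN` basis is borrowed from the ORCA list earlier in the deck, and the example proposition is one of the annotated claims shown later in the deck.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Value(IntEnum):
    """Epistemic value V, using the -3..3 scale from the slide."""
    ASSUMED_UNTRUE = -3
    PROBABLY_UNTRUE = -2
    POSSIBLY_UNTRUE = -1
    UNKNOWN = 0
    POSSIBLE = 1
    PROBABLE = 2
    ASSUMED_TRUE = 3


class Basis(Enum):
    """Basis B of the evaluation."""
    REASONING = "Reasoning"
    DATA = "Data"
    UNKNOWN = "Unknown"  # listed for ORCA elsewhere in the deck, not on this slide


class Source(Enum):
    """Source S: who holds the evaluation, and whether that is stated explicitly."""
    AUTHOR_EXPLICIT = "A"
    AUTHOR_IMPLICIT = "IA"
    OTHER_EXPLICIT = "N"
    OTHER_IMPLICIT = "NN"


@dataclass(frozen=True)
class EpistemicEvaluation:
    """An evaluation E of proposition P with value V, basis B and source S."""
    proposition: str
    value: Value
    basis: Basis
    source: Source

    def notation(self) -> str:
        # Hypothetical compact rendering of the subscripted form on the slide.
        return f"E[V={int(self.value)}, B={self.basis.value}, S={self.source.value}](P)"


if __name__ == "__main__":
    ev = EpistemicEvaluation(
        proposition="only hMOB1A and hMOB1B interact with both LATS1 and LATS2",
        value=Value.ASSUMED_TRUE,
        basis=Basis.DATA,
        source=Source.OTHER_EXPLICIT,
    )
    print(ev.notation())
```

Using an integer-backed enum for the value keeps the ordering of certainty levels comparable, which is convenient when tracking how a claim hardens or softens across citing papers.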
40. Adding Epistemic Evaluation
Claim | ORCA Value
Together, Lats2 and ASPP1 shunt p53 to proapoptotic promoters and promote the death of polyploid cells [1]. (…) | Value = 3; Source = N; Basis = 0
Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39]. (…) | Value = 3; Source = N; Basis = Data
Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression. | Value = 1 or 2 ?; Source = Author; Basis = Data
Furthermore, we demonstrated that the direct inhibition of LATS2 protein was mediated by miR-373 and manipulated the expression of miR-373 to affect esophageal cancer cells growth. | Value = 2 (or 3?); Source = Author; Basis = Data
41. Textual Markers
• Modal auxiliary verbs (e.g. can, could, might)
• Qualifying adverbs and adjectives (e.g. interestingly, possibly, likely, potential, somewhat, slightly, powerful, unknown, undefined)
• References, either external (e.g. ‘[Voorhoeve et al., 2006]’) or internal (e.g. ‘See fig. 2a’).
• Reporting/epistemic verbs (e.g. suggest, imply, indicate, show)
– either within the clause: ‘These results suggest that...’
– or in a subordinate clause governed by reporting-verb matrix clause: ‘{These results suggest that} indeed, this represents the true endogenous activity.’
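The four marker families listed above lend themselves to a simple lexical spotter. The sketch below is illustrative only: the word lists contain just the examples given on the slide, the regular expressions for references and verb inflections are assumptions of this example, and the test sentence is made up for demonstration rather than quoted from any paper.

```python
import re

# Word lists copied from the slide's examples; a real lexicon would be much larger.
MODAL_AUXILIARIES = {"can", "could", "might"}
QUALIFIERS = {
    "interestingly", "possibly", "likely", "potential", "somewhat",
    "slightly", "powerful", "unknown", "undefined",
}

# Assumed patterns: simple inflections of the four example reporting verbs.
REPORTING_VERB = re.compile(
    r"\b(?:suggest(?:s|ed)?|impl(?:y|ies|ied)|indicat(?:e|es|ed)|show(?:s|ed|n)?)\b",
    re.I,
)
# External reference such as "[Voorhoeve et al., 2006]" or "(Han and Sternberg, 1990)".
EXTERNAL_REF = re.compile(
    r"[\[(][A-Z][A-Za-z-]+(?: et al\.?| and [A-Z][A-Za-z-]+)?,? \d{4}[\])]"
)
# Internal reference such as "See fig. 2a" or "Figure 3A".
INTERNAL_REF = re.compile(r"\b(?:see\s+)?fig(?:ure|\.)?\s*\d+[a-z]?\b", re.I)


def find_markers(clause: str) -> dict:
    """Return the textual markers found in a clause, grouped by marker family."""
    words = re.findall(r"[a-z]+", clause.lower())
    return {
        "modal_auxiliary": [w for w in words if w in MODAL_AUXILIARIES],
        "qualifier": [w for w in words if w in QUALIFIERS],
        "reporting_verb": [m.group(0).lower() for m in REPORTING_VERB.finditer(clause)],
        "reference": [m.group(0) for m in EXTERNAL_REF.finditer(clause)]
        + [m.group(0) for m in INTERNAL_REF.finditer(clause)],
    }


if __name__ == "__main__":
    # Made-up sentence that combines several marker types.
    demo = "These results suggest that X might be a potential oncogene [Voorhoeve et al., 2006]."
    print(find_markers(demo))
```

Word-list matching of this kind cannot tell whether a marker actually governs the proposition of interest; that distinction between a marker within the clause and one in a governing matrix clause would need syntactic analysis.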
42. Markers v. Types: 1 paper, 640 segments
Value | Modal Aux | Reporting Verb | Ruled by RV | Adverbs/Adjectives | References | None | Total
Total Value = 3 | 1 (0.5%) | 81 (40%) | 24 (12%) | 7 (4%) | 41 (20%) | 47 (24%) | 201 (100%)
Total Value = 2 | | 29 (51%) | 23 (40%) | 1 (2%) | 4 (7%) | | 57 (100%)
Total Value = 1 | 9 (27%) | 11 (33%) | 11 (33%) | 1 (3%) | 1 (3%) | | 33 (100%)
Total Value = 0 | | 9 (64%) | 3 (21%) | 1 (7%) | 1 (7%) | | 14 (100%)
Total No Modality | | 16 (37%) | 3 (7%) | 0 | 3 (7%) | 22 (50%) | 44 (100%)
Overall Total | 10 (2%) | 146 (23%) | 64 (10%) | 10 (2%) | 50 (8%) | 69 (11%) | 640 (100%)
43. Most prevalent clause type: “These results suggest that...”
Adverb/Connective: thus, therefore, together, recently, in summary
Determiner/Pronoun: it, this, these, we/our
Adjective: previous, future, better
Noun phrase: data, report, study, result(s); method or reference
Modal: form of ‘to be’, may, remain
Adjective: often, recently, generally
Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
Preposition: that, to
44. Reporting verbs vs. epistemic value:
Value = 0 (unknown): establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report
Value = 1 (hypothetical): be important, consider, expect, hypothesize (5x), give insight, raise possibility that, suspect, think
Value = 2 (probable): appear, believe, implicate (2x), imply, indicate (12x), play a role, represent, suggest (18x), validate (2x)
Value = 3 (presumed true): be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results, 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
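The verb lists above can be read as a tiny lexicon from reporting verb to likely epistemic value. The following sketch uses only a subset of the verbs and counts from this slide; the function names are hypothetical, verbs listed without an "(Nx)" count are assumed to have occurred once, and input is assumed to be a lemma already.

```python
from collections import Counter

# (verb lemma, epistemic value) -> occurrences, for a subset of the slide's verbs.
# Assumption: verbs shown without "(Nx)" occurred once.
OBSERVATIONS = {
    ("report", 0): 1,
    ("hypothesize", 1): 5,
    ("suspect", 1): 1,
    ("suggest", 2): 18,
    ("indicate", 2): 12,
    ("imply", 2): 1,
    ("report", 3): 2,
    ("demonstrate", 3): 15,
    ("show", 3): 24,
    ("reveal", 3): 3,
}


def value_distribution(verb: str) -> Counter:
    """Count how often a verb lemma was seen with each epistemic value."""
    return Counter(
        {value: n for (lemma, value), n in OBSERVATIONS.items() if lemma == verb}
    )


def most_likely_value(verb: str):
    """Return the most frequent value for a verb, or None if the verb is unseen."""
    dist = value_distribution(verb)
    if not dist:
        return None
    return dist.most_common(1)[0][0]


if __name__ == "__main__":
    for verb in ["suggest", "show", "report", "dance"]:
        print(verb, most_likely_value(verb), dict(value_distribution(verb)))
```

Note that "report" appears under both Value = 0 and Value = 3 on the slide, so a single verb does not determine the value on its own; keeping the full distribution rather than one label preserves that ambiguity.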