How Scientists Read, And Whether Computers Can Help Them

Anita de Waard
Disruptive Technologies Director
Elsevier Labs

Making Sense of Biological Systems, Bozeman, MT
  
Outline
•  Why do scientists read?
•  How do we read? (Discourse comprehension 101)
•  What do we need to read:
   –  Noun phrases
   –  Triples
   –  Metadiscourse
   –  Claims and Evidence
•  Can the computer identify these components?
•  Some thoughts on explaining our texts to computers
  
How and why scientists read:
•  Why do we read?
   To learn, i.e.: obtain the knowledge contained within the text and integrate it with what we already know.
•  What do we read?
   Things that are ‘interesting’:
   –  Pertinent
   –  Possibly/probably true
   –  Novel, but in agreement with what I know
•  How do we read?
  	
  
Discourse Comprehension 101
•  Letter < syllable < word < clause < sentence < discourse:
   This is how linguistics is structured.
   But it is not how we understand text!
  
  
•  Kintsch and Van Dijk, ’93: we read a text at three levels:
   –  surface code: literal text, exact words/syntax
   –  text base: preserves meaning, but not exact wording
   –  situation model: ‘microworld’ that the text is about: constructed inferentially through interaction between the text and background knowledge
•  We use knowledge about text genre to activate a schema: this allows creation of the text base and situation model
  
Examples of schemas:
  	
  
What is this paper about?

A. NOUN PHRASES
   transiently expressed miRNA sponges
   human breast cancer
   high-grade malignancy
   miR-31
   noninvasive MCF7-Ras
   antisense oligonucleotides
   cell viability
   cloned
   retroviral vector

Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?
  
What is this paper about?

B. TRIPLES
   miR-31 expression DEPRIVE metastatic cells
   miR-31 PREVENT acquisition of aggressive traits
   miR-31 INHIBIT noninvasive MCF7-Ras cells
   miR-31 ENHANCE invasion
   cell viability AFFECT inhibitor

Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?
  
What is this paper about?

C. METADISCOURSE
The preceding observations demonstrated that X expression deprives Y cells of attributes associated with Z.
We next asked whether X also prevents the acquisition of A traits by B cells.
To do so, we transiently inhibited X in C cells with either D or E.
Both approaches inhibited X function by > 4.5-fold (Figure S7A).
Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was unaffected by either inhibitor (Figure 3A; Figure S7B).
The E sponge reduced X function by 2.5-fold, but did not affect the activity of other known Js (Figures S8A and S8B).
Collectively, these data indicated that sustained X activity is necessary to prevent the acquisition of Z traits by both K and untransformed B cells.

   Is it pertinent? -> Need content
   Is it true? -> Sounds likely! I know this stuff!
   Is it new, but in agreement with what I know? -> Need content
  	
  
What is this paper about?

D. CLAIMS AND EVIDENCE
Claim:
•  sustained miR-31 activity is necessary to prevent the acquisition of aggressive traits by both tumor cells and untransformed breast epithelial cells
Evidence: Method:
•  We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either antisense oligonucleotides or miRNA sponges.
Evidence: Result:
•  Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).
•  Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).
•  The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the activity of other known antimetastatic miRNAs (Figures S8A and S8B).

   Is it pertinent? -> Probably
   Is it true? -> Sounds likely!
   Is it new, but in agreement with what I know? -> Check/know
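To make the claim and evidence listed above machine-readable, one option is a small record type that links a claim to its method and result statements. The sketch below is a hypothetical Python data model: the class names `Claim` and `Evidence` and the `kind` labels are assumptions, not an existing standard. It encodes the miR-31 example and collects the figure panels cited as evidence.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Evidence:
    kind: str  # hypothetical label: "method" or "result"
    text: str

    def figures(self) -> list[str]:
        """Return figure panels cited in the text, e.g. '3A' or 'S7B'."""
        panels = []
        # Each match covers "Figure ..." or "Figures ..." up to the closing parenthesis.
        for match in re.finditer(r"Figures?\s+([^)]*)", self.text):
            panels.extend(re.findall(r"S?\d+[A-Z]?", match.group(1)))
        return panels


@dataclass
class Claim:
    text: str
    evidence: list[Evidence] = field(default_factory=list)

    def cited_figures(self) -> list[str]:
        """Collect figure panels from all linked evidence, in order."""
        return [panel for ev in self.evidence for panel in ev.figures()]


claim = Claim(
    "sustained miR-31 activity is necessary to prevent the acquisition of "
    "aggressive traits by both tumor cells and untransformed breast epithelial cells",
    [
        Evidence("method", "We transiently inhibited miR-31 in noninvasive MCF7-Ras cells "
                 "with either antisense oligonucleotides or miRNA sponges."),
        Evidence("result", "Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A)."),
        Evidence("result", "Suppression of miR-31 enhanced invasion by 20-fold and motility by "
                 "5-fold, but cell viability was unaffected by either inhibitor "
                 "(Figure 3A; Figure S7B)."),
        Evidence("result", "The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not "
                 "affect the activity of other known antimetastatic miRNAs (Figures S8A and S8B)."),
    ],
)

print(claim.cited_figures())  # ['S7A', '3A', 'S7B', 'S8A', 'S8B']
```

In this structure a reader or a tool can move from the claim to the exact figures that support it. That is the link a human reader otherwise rebuilds by hand.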
  
What is this paper about?

E. JOURNAL & AUTHOR’S NAMES/AFFILIATIONS

Is it pertinent? -> Possibly
Is it true? -> Probably!
Is it new, but in agreement with what I know? -> Need background
  
In summary, how scientists read:
•  Surface code provides noun phrases and triples that offer pointers regarding topical relevance
•  Text base and situation model are created through specific metadiscourse conventions (e.g. refs at the end) that create a biological reasoning model:
      We next asked whether …                        -> Hypothesis
      To do so, we transiently inhibited…            -> Goal/Method
      Suppression of X enhanced invasion …           -> Result
      but F was unaffected …(Figure 3A). …           -> Results
      Collectively, these data indicated that … .    -> Implication
•  This can be expressed as a set of claims, linked to evidence, that can help represent key points in the paper
•  Journal name and author’s affiliation help define schema and provide ‘willingness to be convinced’ socially/interpersonally.
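The mapping from metadiscourse cues to reasoning steps in the summary above can be approximated, very roughly, with cue-phrase matching. The sketch below is an illustrative assumption, not a published tool. The cue patterns in `MOVE_CUES` are hand-picked from the example sentences on slide C, and a realistic system would need far richer patterns or a trained classifier.

```python
import re

# Hypothetical cue patterns, hand-picked from the example passage.
# They are tried in order, so the more specific sentence-initial cues come first.
MOVE_CUES = [
    (r"^we next asked whether\b", "Hypothesis"),
    (r"^to do so\b", "Goal/Method"),
    (r"^collectively, these data (indicate|indicated|suggest)\b", "Implication"),
    (r"\b(enhanced|reduced|inhibited|was unaffected|did not affect)\b", "Result"),
]


def tag_move(sentence: str) -> str:
    """Label a sentence with the first reasoning step whose cue it contains."""
    text = sentence.strip().lower()
    for pattern, label in MOVE_CUES:
        if re.search(pattern, text):
            return label
    return "Unknown"


passage = [
    "We next asked whether X also prevents the acquisition of A traits by B cells.",
    "To do so, we transiently inhibited X in C cells with either D or E.",
    "Suppression of X enhanced invasion by 20-fold and motility by 5-fold.",
    "Collectively, these data indicated that sustained X activity is necessary.",
]
for sentence in passage:
    print(f"{tag_move(sentence):12} {sentence}")
```

Order matters here. "To do so, we transiently inhibited X" contains the result verb "inhibited", but the sentence-initial cue wins because it is checked first.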
  
Can computers help us identify:
A.  Noun phrases
B.  Triples
C.  Metadiscourse elements
D.  Claims + evidence
E.  Journal and author’s names and affiliation
  
  
Noun Phrases: some issues
•  Problem 1: disambiguating terms (© GoPubMed):
   –  Hnrpa1 = Tis = Fli-2 = nuclear ribonucleoprotein A1 = helix destabilizing protein = single-strand binding protein = hnRNP core protein A1 = HDP-1 = topoisomerase-inhibitor suppressed.
   –  Cellulose 1,4-beta-cellobiosidase = exoglucanase
   –  COLD ≠ C.O.L.D. ≠ cold (runny nose) ≠ cold (low T)
•  Problem 2: disambiguating entities (© M. Martone):
   –  95 antibodies were (manually!) identified in 8 articles
   –  52 did not contain enough information to determine the antibody used
   –  Some provided details in other papers
   –  Failed to give species, clonality, vendor, or catalog number
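Problem 1 is commonly tackled with a synonym dictionary that maps every known surface form to one canonical key. The minimal sketch below assumes a hand-built lexicon. The canonical keys (`HNRNPA1`, `exoglucanase`) and the labels in the COLD example are our own hypothetical choices, and real systems draw synonyms from curated resources. It shows both the benefit and the limit: exact lookup unifies the Hnrpa1 synonyms, but lower-casing makes COLD and cold collide, so those senses need context to separate.

```python
from typing import Optional

# Synonyms as listed on the slide (source: GoPubMed); the canonical keys are our own choice.
SYNONYM_GROUPS = {
    "HNRNPA1": [
        "Hnrpa1", "Tis", "Fli-2", "nuclear ribonucleoprotein A1",
        "helix destabilizing protein", "single-strand binding protein",
        "hnRNP core protein A1", "HDP-1", "topoisomerase-inhibitor suppressed",
    ],
    "exoglucanase": ["Cellulose 1,4-beta-cellobiosidase", "exoglucanase"],
}


def build_lexicon(groups: dict[str, list[str]]) -> dict[str, str]:
    """Map each lower-cased surface form to its canonical key."""
    lexicon: dict[str, str] = {}
    for canonical, synonyms in groups.items():
        for name in synonyms:
            key = name.lower()
            # Refuse silent merges: a surface form may point to only one concept.
            if lexicon.get(key, canonical) != canonical:
                raise ValueError(f"ambiguous surface form: {name!r}")
            lexicon[key] = canonical
    return lexicon


LEXICON = build_lexicon(SYNONYM_GROUPS)


def normalize(term: str) -> Optional[str]:
    """Return the canonical key for a term, or None if it is not in the lexicon."""
    return LEXICON.get(term.lower())


print(normalize("HDP-1"))                  # HNRNPA1
print(normalize("hnRNP core protein A1"))  # HNRNPA1

# Lower-casing is exactly what makes COLD and cold collide (hypothetical labels):
try:
    build_lexicon({"COLD (acronym)": ["COLD"], "cold (low temperature)": ["cold"]})
except ValueError as err:
    print(err)  # ambiguous surface form: 'cold'
```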
  
Noun Phrases: some progress
•  Despite these difficulties, noun phrase recall/precision is quite high, e.g. I2B2 2011 [1], [2], others: 90%-98%
•  Many tools, see [3] for a list; e.g. GoPubMed:
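Recall and precision figures like the 90%-98% quoted above come from comparing a tool's extracted noun phrases with a manually annotated gold standard. The minimal sketch below assumes exact string matching of phrase sets. Shared tasks such as i2b2 also define partial-match scoring, which is not shown here. The gold phrases are taken from slide A, and the "tool output" is invented for illustration only.

```python
def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Exact-match scores of predicted phrases against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Gold phrases from the example paper's noun phrases; the "tool output" is made up.
gold = {"miR-31", "human breast cancer", "antisense oligonucleotides", "retroviral vector"}
predicted = {"miR-31", "human breast cancer", "retroviral vector"}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")  # precision=1.00 recall=0.75 F1=0.86
```

Precision asks how many extracted phrases are right. Recall asks how many of the right phrases were found. A tool that misses "antisense oligonucleotides" keeps perfect precision but loses recall.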
  	
  
	
  
Triples: some issues:
•  Contingent on good NP & VP detection
•  Hard to parse text! E.g. a commercial tool gave:
   (insulin, maintaining, glucose homeostasis) from:
   When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues.
   (insulin, may be involved, glucose homeostasis) from:
   Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.
  
Triples: some progress:
Biological Expression Language [4]:
We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.

Increased abundance of miR-372 decreases activity of TP53
   r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
Context: cancer
   SET Disease = "Cancer"
Activity of TP53 decreases cell growth
   tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
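To show how such statements reduce to subject-relation-object triples, here is a deliberately simplified parser for single-line BEL statements of the form shown above. It is a sketch, not the OpenBEL toolkit. It recognizes only the two relationship symbols used here (`-|` for decreases, `->` for increases), rejects SET lines and nested statements, and the names `parse_bel` and `Triple` are hypothetical. The namespaces are copied as they appear on the slide.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Only the two relationship symbols that appear on the slide; BEL defines many more.
RELATIONS = {"-|": "decreases", "->": "increases"}

# subject, whitespace, relation symbol, whitespace, object (one statement per line)
STATEMENT = re.compile(r"^\s*(?P<subj>\S.*?)\s+(?P<rel>-\||->)\s+(?P<obj>\S.*?)\s*$")


def parse_bel(line: str) -> Triple:
    """Split a single-line, non-nested BEL statement into a triple."""
    match = STATEMENT.match(line)
    if match is None:
        raise ValueError(f"not a simple BEL statement: {line!r}")
    return Triple(match["subj"], RELATIONS[match["rel"]], match["obj"])


t1 = parse_bel("r(MIR:miR-372) -| tscript(p(HUGO:Trp53))")
t2 = parse_bel('tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")')
print(t1)
print(t2)

# The two triples share a node, so they can be chained into a path:
if t1.obj == t2.subject:
    print(f"{t1.subject} {t1.relation} {t1.obj} {t2.relation} {t2.obj}")
```

Because the object of the first statement is the subject of the second, the triples link into a small network. That linking is what makes this representation more useful than isolated noun phrases.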
  
Metadiscourse: why it matters
   “[Y]ou can transform … fiction into fact just by adding or subtracting references”, Bruno Latour [5]
•  Voorhoeve et al., 2006: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.
•  Kloosterman and Plasterk, 2006: In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).
•  Yabuta et al., 2007: [On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
•  Okada et al., 2011: Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).
  
Metadiscourse: some progress
•  Hedging cues, speculative language, modality/negation:
   –  Light et al [6]: finding speculative language
   –  Wilbur et al (Hagit) [7]: focus, polarity, certainty, evidence, and directionality
   –  Thompson et al (Sophia) [8]: level of speculation, type/source of the evidence and level of certainty
•  Sentiment detection (e.g. Kim and Hovy [9], among others):
   –  Holder of the opinion, strength, polarity as ‘mathematical function’ acting on main propositional content
•  Can make this part of the semantic web (e.g., Ontology for Reasoning, Certainty and Attribution, ORCA [10]):
   –  Value (Presumed True, Probable, Possible, Unknown)
   –  Source (Author, Named Other, Unknown)
   –  Basis (Data, Reasoning, Unknown)
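As a toy illustration, hedging cues could be mapped onto ORCA's Value dimension. The cue lists and the cue-to-value mapping below are assumptions made for this sketch. They are not taken from ORCA [10] or from the hedge-detection work in [6]-[8], which uses much richer features. The two sample sentences are the Voorhoeve et al. (2006) and Okada et al. (2011) statements quoted under "Metadiscourse: why it matters", and they show the hedge disappearing as the claim is cited.

```python
import re
from enum import Enum


class Value(Enum):
    """ORCA's Value dimension, as listed on the slide."""
    PRESUMED_TRUE = "Presumed True"
    PROBABLE = "Probable"
    POSSIBLE = "Possible"
    UNKNOWN = "Unknown"  # never produced by this toy classifier


# Hypothetical cue lexicons, checked from most to least hedged.
POSSIBLE_CUES = ["possibly", "may", "might", "could", "potential"]
PROBABLE_CUES = ["probably", "likely", "suggest", "suggests", "indicate", "indicates"]


def has_cue(sentence: str, cues: list[str]) -> bool:
    """True if any cue occurs as a whole word, ignoring case."""
    return any(re.search(rf"\b{re.escape(cue)}\b", sentence, re.IGNORECASE) for cue in cues)


def certainty(sentence: str) -> Value:
    """Assign a Value from the strongest hedge cue found; unhedged text is presumed true."""
    if has_cue(sentence, POSSIBLE_CUES):
        return Value.POSSIBLE
    if has_cue(sentence, PROBABLE_CUES):
        return Value.PROBABLE
    return Value.PRESUMED_TRUE


voorhoeve_2006 = ("These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct "
                  "inhibition of the expression of the tumor suppressor LATS2.")
okada_2011 = ("Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of "
              "Lats2, thereby allowing tumorigenic growth in the presence of p53.")

print(certainty(voorhoeve_2006).value)  # Possible
print(certainty(okada_2011).value)      # Presumed True
```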
  
Claims and Evidence: some issues:
•  Data2Semantics [11]: linking clinical guidelines to evidence. Inconsistency within guideline and guidelines v. evidence:
   •  Studies have demonstrated inconsistent results regarding the use of such markers of inflammation as C-reactive protein (CRP), interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in neutropenic patients with cancer [55–57].
      •  [55]: PCT and IL-6 are more reliable markers than CRP for predicting bacteremia in patients with febrile neutropenia
      •  [56] In conclusion, daily measurement of PCT or IL-6 could help identify neutropenic patients with a stable course when the fever lasts >3 d. …, it would reduce adverse events and treatment costs.
      •  [57] Our study supports the value of PCT as a reliable tool to predict clinical outcome in febrile neutropenia.
•  Drug Interaction Knowledgebase [12]: how to identify evidence?
   •  R-citalopram_is_not_substrate_of_cyp2c19:
      •  At 10 µM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% of control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6).
  	
  
Claims and Evidence: some progress
•  Defining ‘salient knowledge components’ in text:
   –  Argumentative zones, CoreSC can both be found
   –  Blake, Claim networks (more soon!)
   –  Claimed Knowledge Updates (Sandor/de Waard, [13]):
  	
  
	
  
Perhaps we should start writing for computers?
•  So why doesn’t the author add this information?
   If you know you’re going to mine it, why bury it?
•  Authoring tools for entity identification: MS for Chemistry, Math, proteins; some experiments but no solution yet [14]
•  Authoring tool for triple identification (MS ActiveText)
•  But the question remains:
      After we’ve ‘extracted’ all the ‘facts’, what is all the gunk that remains in the filter?
  	
  
	
  
Perhaps we should explain: a paper is rhetorical?

Aristotle / Quintilian: description -> Scientific Paper
•  prooimion
   –  Introduction (exordium): The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. -> Introduction: positioning
•  prothesis
   –  Statement of Facts (narratio): The speaker here provides a narrative account of what has happened and generally explains the nature of the case. -> Introduction: research question
   –  Summary (propositio): The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. -> Summary of contents
•  pistis
   –  Proof (confirmatio): The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. -> Results
   –  Refutation (refutatio): As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. -> Related Work
•  epilogos
   –  peroratio: Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. -> Discussion: summary, implications.

- goal of the paper is to be published; it uses author/journal as a host
- format has co-evolved: predator-prey relationship with reviewers
  
Perhaps we should explain: a paper is a story?

Story Grammar: The Story of Goldilocks and the Three Bears | Paper Grammar: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins

Setting
•  Time: Once upon a time | Background: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
•  Character: a little girl named Goldilocks | Objects of study: the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
•  Location: She went for a walk in the forest. Pretty soon, she came upon a house. | Experimental setup: studied and compared in vivo effects and interactions to those of the human protein
Theme
•  Goal: She knocked and, when no one answered, | Research goal: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
•  Attempt: she walked right in. | Hypothesis: Atx-1 may play a role in the regulation of gene expression
Episode
•  Name: At the table in the kitchen, there were three bowls of porridge. | Name: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
•  Subgoal: Goldilocks was hungry. | Subgoal: test the function of the AXH domain
•  Attempt: She tasted the porridge from the first bowl. | Method: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
•  Outcome: This porridge is too hot! she exclaimed. | Results: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
•  Attempt: So, she tasted the porridge from the second bowl.
•  Outcome: This porridge is too cold, she said | Data: (data not shown),
•  Attempt: So, she tasted the last bowl of porridge. | Results: both genotypes show many large holes and loss of cell integrity at 28 days (Figures 1B-1D).
•  Outcome: Ahhh, this porridge is just right, she
A closer look at verb tense:
  
Conceptual realm: ‘state’ (gnomic) present
•  ‘Dopaminergic innervation plays a major role in the control of mood
   and its perturbation’
Experimental realm: ‘event’ past
•  ‘Four out of seven cell lines expressed this cluster’,
•  ‘Adult rats were individually housed for 2 days before testing.’
Argumentational realm: ‘instantaneous’ present; to-infinitive
•  ‘These results suggest that...’,
•  ‘To identify these mechanisms…’
Discourse progression: ‘instantaneous’ present
•  ‘Fig 2a shows that’
•  ‘see figure 7A’,
Reference to other work: present perfect - ‘finalised’ past
•  ‘Previous work has demonstrated that VPCs are sensitive to the
   levels of let-60/RAS (Han and Sternberg, 1990).’	
  
Tense use in science and mythology:

•  Facts in the eternal present
   Science: Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans.
   Mythology: I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, the glorious one whom all the blessed throughout high Olympus reverence and honor.
•  Events in the simple past
   Science: Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s.
   Mythology: Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.
•  Events with embedded facts
   Science: We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005).
   Mythology: And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men, of warriors, with whom she is wroth, she, the daughter of the mighty sire.
•  Attribution in the present perfect
   Science: miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993).
   Mythology: In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.
•  Implications are hedged, and in the present tense
   Science: These results indicate that although miR-3723 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact
   Mythology: Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat
  
Some conclusions:
•  How we read: surface code, textbase, situation model
•  Useful components: find noun phrases, triples, metadiscourse, claims and evidence
•  Computers keep getting better at identifying these
•  Authoring tools might let us help computers
•  But for the foreseeable future, scientists will continue to need to scan the literature to understand and believe science and make connections between knowledge
•  To achieve progress, perhaps focus less on what computers can do and more on how humans communicate?
•  Let’s pursue collaborations with linguists, cognitive psychologists etc. on how we read and learn!
  
Acknowledgements
•  Funding:
   –  Elsevier Labs
   –  NWO
•  Collaborators:
   –  Henk Pander Maat, UU
   –  Agnes Sandor, XRCE
   –  Jodi Schneider, DERI
   –  Rinke Hoekstra & co, VU
   –  Richard Boyce & co, UPitt
   –  Maria Liakata, EBI
   –  Sophia Ananiadou & co, NaCTeM
•  Discussion partners:
   –  Phil Bourne, UCSD
   –  Ed Hovy
   –  Gully Burns, ISI
   –  Joanne Luciano, RPI
   –  Tim Clark et al., Harvard
   … and all of you :)!
  
    	
  
Questions?

Anita de Waard
a.dewaard@elsevier.com
http://elsatglabs.com/labs/anita/
  	
  
References

[1] J Am Med Inform Assoc. 2010 September; 17(5): 514–518. http://dx.doi.org/10.1136/jamia.2010.003947
[2] Quanzhi Li, Yi-Fang Brook Wu (2006): Identifying important concepts from medical documents, Journal of Biomedical Informatics 39 (2006) 668–679.
[3] Useful list of resources in bioinformatics: http://www.bioinformatics.ca/
[4] Biological Expression Language: http://www.openbel.org
[5] Latour, B. and Woolgar, S., Laboratory Life: the Social Construction of Scientific Facts, 1979, Sage Publications.
[6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
[7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
[8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp Building and Evaluating Resources for Biomedical Text Mining 2008.
[9] Kim, S-M. and Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
[10] de Waard, A. and Schneider, J. (2012) Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA), Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop at ISWC 2012 (submitted).
[11] Data2Semantics project: http://www.data2semantics.org/
[12] Boyce R, Collins C, Horn J, Kalet I. (2009) Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10, see also http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
[13] Sándor, Ágnes and de Waard, Anita, (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles, Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
[14] See e.g. http://ucsdbiolit.codeplex.com/ and http://research.microsoft.com/en-us/projects/ontology/ for MS Word ontology add-ins.
  
Appendix: ORCA

Logical structure of epistemic evaluations:

For a proposition P, an epistemically marked clause E is an evaluation of P, where $E = E_{V,B,S}(P)$, with:
    –  V = Value:
          3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown
          (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
    –  B = Basis:
          Reasoning
          Data
    –  S = Source:
          A = speaker is author A, explicit
          IA = speaker is author A, implicit
          N = other author N, explicit
          NN = other author NN, implicit

Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
  
Adding Epistemic Evaluation

Claim / ORCA value

•  Together, Lats2 and ASPP1 shunt p53 to proapoptotic promoters and promote the death of polyploid cells [1]. (…)
   –  Value = 3, Source = N, Basis = 0
•  Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39]. (…)
   –  Value = 3, Source = N, Basis = Data
•  Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression.
   –  Value = 1 or 2?, Source = Author, Basis = Data
•  Furthermore, we demonstrated that the direct inhibition of LATS2 protein was mediated by miR-373 and manipulated the expression of miR-373 to affect esophageal cancer cells growth.
   –  Value = 2 (or 3?), Source = Author, Basis = Data
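The evaluation $E_{V,B,S}(P)$ from the appendix, together with the annotated claims above, maps onto a small record type. Below is a minimal Python sketch of that idea; it is not ORCA's own representation (the slides do not show one). The class and field names are illustrative, the `is_asserted_true` helper is hypothetical, and `basis` is optional on the assumption that "Basis = 0" in the first example means that no basis is stated.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Optional


class Value(IntEnum):
    """V: epistemic value, from assumed untrue (-3) to assumed true (3)."""
    ASSUMED_UNTRUE = -3
    PROBABLY_UNTRUE = -2
    POSSIBLY_UNTRUE = -1
    UNKNOWN = 0
    POSSIBLE = 1
    PROBABLE = 2
    ASSUMED_TRUE = 3


class Basis(Enum):
    """B: whether the evaluation rests on reasoning or on data."""
    REASONING = "Reasoning"
    DATA = "Data"


class Source(Enum):
    """S: who holds the evaluation, and whether that is stated explicitly."""
    A = "speaker is author, explicit"
    IA = "speaker is author, implicit"
    N = "other author, explicit"
    NN = "other author, implicit"


@dataclass(frozen=True)
class Evaluation:
    """E_{V,B,S}(P): an epistemic evaluation of proposition P."""
    proposition: str
    value: Value
    source: Source
    basis: Optional[Basis] = None  # None = no basis stated (assumed reading of "Basis = 0")

    def is_asserted_true(self) -> bool:
        # Hypothetical helper: count "probable" and stronger as a positive claim.
        return self.value >= Value.PROBABLE


# The second example claim from the slide above, encoded with this sketch.
claim = Evaluation(
    proposition="only hMOB1A and hMOB1B interact with both LATS1 and LATS2 "
                "in vitro and in vivo",
    value=Value.ASSUMED_TRUE,
    source=Source.N,
    basis=Basis.DATA,
)
```

Encoding V as an `IntEnum` keeps the scale ordered, so thresholds such as "probable or stronger" become plain comparisons.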
  	
  
	
                                                                                                 	
  
Textual Markers

•  Modal auxiliary verbs (e.g. can, could, might)
•  Qualifying adverbs and adjectives (e.g. interestingly, possibly, likely, potential, somewhat, slightly, powerful, unknown, undefined)
•  References, either external (e.g. ‘[Voorhoeve et al., 2006]’) or internal (e.g. ‘See fig. 2a’).
•  Reporting/epistemic verbs (e.g. suggest, imply, indicate, show)
   –  either within the clause: ‘These results suggest that...’
   –  or in a subordinate clause governed by a reporting-verb matrix clause: ‘{These results suggest that} indeed, this represents the true endogenous activity.’
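The marker types above lend themselves to simple pattern matching. The following is a hypothetical, regex-based sketch that uses only the example words from the slide, so its coverage is deliberately narrow; the function name `find_markers` and the two citation patterns are assumptions, not part of any system described here.

```python
import re

# Example words from the slide only; a usable tagger would need fuller lexicons.
MARKER_PATTERNS = {
    "modal_aux": re.compile(r"\b(?:can|could|might)\b", re.I),
    "qualifier": re.compile(
        r"\b(?:interestingly|possibly|likely|potential|somewhat|slightly|"
        r"powerful|unknown|undefined)\b", re.I),
    # Author-year citations like "[Voorhoeve et al., 2006]" or numeric ones like "[39]".
    "external_ref": re.compile(r"\[(?:[^\]]*\d{4}|\d+(?:,\s*\d+)*)\]"),
    # Internal pointers such as "See fig. 2a".
    "internal_ref": re.compile(r"\bsee\s+fig(?:ure)?s?\.?\s*\w+", re.I),
    "reporting_verb": re.compile(
        r"\b(?:suggest\w*|impl(?:y|ies|ied|ying)|indicat\w*|show\w*)\b", re.I),
}


def find_markers(sentence: str) -> dict:
    """Map each marker type found in the sentence to the list of matched strings."""
    found = {}
    for name, pattern in MARKER_PATTERNS.items():
        hits = [m.group(0) for m in pattern.finditer(sentence)]
        if hits:
            found[name] = hits
    return found
```

For instance, `find_markers("These results suggest that miR-373 might be a potential oncogene [Voorhoeve et al., 2006]; see fig. 2a.")` returns one hit for each of the five patterns. The sketch does not attempt the second reporting-verb case, where the marker sits in a governing matrix clause outside the segment being tagged.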
  
Markers v. Types: 1 paper, 640 segments

Value          | Modal Aux | Reporting Verb | Ruled by RV | Adverbs/Adjectives | References | None     | Total
Value = 3      | 1 (0.5%)  | 81 (40%)       | 24 (12%)    | 7 (4%)             | 41 (20%)   | 47 (24%) | 201 (100%)
Value = 2      | –         | 29 (51%)       | 23 (40%)    | 1 (2%)             | 4 (7%)     | –        | 57 (100%)
Value = 1      | 9 (27%)   | 11 (33%)       | 11 (33%)    | 1 (3%)             | 1 (3%)     | –        | 33 (100%)
Value = 0      | –         | 9 (64%)        | 3 (21%)     | 1 (7%)             | 1 (7%)     | –        | 14 (100%)
No modality    | –         | 16 (37%)       | 3 (7%)      | 0                  | 3 (7%)     | 22 (50%) | 44 (100%)
Overall total  | 10 (2%)   | 146 (23%)      | 64 (10%)    | 10 (2%)            | 50 (8%)    | 69 (11%) | 640 (100%)
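Reading the blank cells of the table as zeros makes every row and column total agree with the figures given. The short check below recomputes them; the names are illustrative. It also shows that the six marker columns together account for 349 of the 640 segments, so the percentages in the "Overall total" row are shares of all 640 segments rather than of the 349 tabulated ones.

```python
COLUMNS = ["Modal Aux", "Reporting Verb", "Ruled by RV",
           "Adverbs/Adjectives", "References", "None"]

# Counts per epistemic value (rows) and marker type (columns); blank cells read as 0.
COUNTS = {
    "Value = 3":   [1, 81, 24, 7, 41, 47],
    "Value = 2":   [0, 29, 23, 1, 4, 0],
    "Value = 1":   [9, 11, 11, 1, 1, 0],
    "Value = 0":   [0, 9, 3, 1, 1, 0],
    "No modality": [0, 16, 3, 0, 3, 22],
}
ALL_SEGMENTS = 640


def row_shares(counts):
    """Each cell as a percentage of its row total, rounded to one decimal."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


def column_totals(table):
    """Sum each marker column over all value rows."""
    return [sum(column) for column in zip(*table.values())]


if __name__ == "__main__":
    totals = column_totals(COUNTS)
    for name, n in zip(COLUMNS, totals):
        print(f"{name:<20}{n:>4} ({100 * n / ALL_SEGMENTS:.0f}% of {ALL_SEGMENTS})")
    print("tabulated segments:", sum(totals))
```

The printed percentages reproduce the "Overall total" row (2%, 23%, 10%, 2%, 8%, 11%).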
  
Most prevalent clause type:
      “These results suggest that...”

Adverb/Connective     thus, therefore, together, recently, in summary
Determiner/Pronoun    it, this, these, we/our
Adjective             previous, future, better
Noun phrase           data, report, study, result(s); method or reference
Modal                 form of ‘to be’, may, remain
Adverb                often, recently, generally
Verb                  show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
Preposition           that, to
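The slot lists above can be read as a template for this most frequent clause type. A sketch, assuming the slots occur in the order listed and that every slot except the determiner/pronoun and the verb is optional (both are assumptions, not claims from the slide), is a single regular expression; `is_reporting_clause` is a hypothetical name.

```python
import re

# Word lists from the slide; the slot order and optionality are assumptions.
CLAUSE = re.compile(
    r"^(?:(?:thus|therefore|together|recently|in summary),?\s+)?"  # adverb/connective
    r"(?:it|this|these|we|our)\s+"                                  # determiner/pronoun
    r"(?:(?:previous|future|better)\s+)?"                           # adjective
    r"(?:(?:data|reports?|stud(?:y|ies)|results?)\s+)?"             # noun phrase
    r"(?:(?:may|remain|is|are|was|were|be)\s+)?"                    # modal / form of 'to be'
    r"(?:(?:often|recently|generally)\s+)?"                         # adverb
    r"(?:show|obtain|consider|view|reveal|suggest|hypothesiz|indicat|believ)\w*"  # verb
    r"\s+(?:that|to)\b",                                            # 'that' / 'to'
    re.IGNORECASE,
)


def is_reporting_clause(sentence: str) -> bool:
    """True if the sentence opens with a 'These results suggest that'-style clause."""
    return CLAUSE.match(sentence.strip()) is not None
```

It accepts "These results suggest that…", "Together, these data indicate that…" and "It is suggested that…", but not "We next asked whether…". A noun outside the slide's list, as in "Our findings reveal that…", also fails to match, which shows how narrow the listed fillers are.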
  
Reporting verbs vs. epistemic value:

•  Value = 0 (unknown): establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report
•  Value = 1 (hypothetical): be important, consider, expect, hypothesize (5x), give insight, raise possibility that, suspect, think
•  Value = 2 (probable): appear, believe, implicate (2x), imply, indicate (12x), play a role, represent, suggest (18x), validate (2x)
•  Value = 3 (presumed true): be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results, 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
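The lists above amount to a lexicon from reporting verb to epistemic value. A deliberately crude sketch of using such a lexicon for automatic tagging follows. The lexicon is a hypothetical subset of the slide's lists (verbs the slide places under two values, such as *report*, are omitted), and the suffix stripping in `lemma_candidates` stands in for a real lemmatizer.

```python
import re

# Hypothetical subset of the slide's lists; ambiguous verbs like "report" are left out.
VERB_VALUE = {
    # 0 = unknown
    "establish": 0, "elucidate": 0, "examine": 0, "determine": 0, "describe": 0,
    # 1 = hypothetical
    "consider": 1, "expect": 1, "hypothesize": 1, "suspect": 1, "think": 1,
    # 2 = probable
    "appear": 2, "believe": 2, "implicate": 2, "imply": 2, "indicate": 2,
    "represent": 2, "suggest": 2, "validate": 2,
    # 3 = presumed true
    "confirm": 3, "demonstrate": 3, "detect": 3, "discover": 3, "find": 3,
    "identify": 3, "observe": 3, "reveal": 3, "show": 3,
}
IRREGULAR = {"found": "find", "thought": "think", "shown": "show"}


def lemma_candidates(token):
    """Yield crude base-form guesses for a token by stripping common suffixes."""
    t = token.lower()
    yield IRREGULAR.get(t, t)
    if t.endswith(("ies", "ied")):
        yield t[:-3] + "y"
    for suffix in ("es", "ed", "s", "d"):
        if t.endswith(suffix):
            yield t[: -len(suffix)]


def epistemic_value(sentence):
    """Value of the first lexicon verb in the sentence, or None if there is none."""
    for token in re.findall(r"[A-Za-z]+", sentence):
        for candidate in lemma_candidates(token):
            if candidate in VERB_VALUE:
                return VERB_VALUE[candidate]
    return None
```

Applied to the third example claim on the "Adding Epistemic Evaluation" slide, "Our findings reveal that miR-373 would be a potential oncogene…", it returns 3 because of *reveal*, whereas the slide's manual annotation is 1 or 2: the hedging sits in *would* and *potential*, not in the reporting verb, so verb lookup alone is not enough.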
  


How Scientists Read, And Whether Computers Can Help Them

  • 1. How  Scien*sts  Read,  And  Whether   Computers  Can  Help  Them   Anita  de  Waard   Disrup*ve  Technologies  Director   Elsevier  Labs   Making  Sense  of  Biological  Systems,  Bozeman,  MT  
  • 2. Outline   •  Why  do  scien*sts  read?   •  How  do  we  read?  (Discourse  comprehension  101)   •  What  do  we  need  to  read:     –  Noun  phrases   –  Triples   –  Metadiscourse   –  Claims  and  Evidence   •  Can  the  computer  iden*fy  these  components?     •  Some  thoughts  on  explaining  our  texts  to  computers  
  • 3. How  and  why  scien*sts  read:   •  Why  do  we  read?     To  learn,  i.e.:  obtain  the  knowledge  contained  within  the   text  and  integrate  it  with  what  we  already  know.   •  What  do  we  read?     Things  that  are  ‘interes*ng’  :   –  Per*nent   –  Possibly/probably  true   –  Novel,  but  in  agreement  with  what  I  know   •  How  do  we  read?    
  • 4. Discourse  Comprehension  101   •  LeTer  <  syllable  <  word  <  clause  <  sentence  <  discourse:   This  is  how  linguis*cs  is  structured.     But  it  is  not  how  we  understand  text!  
  • 5. Discourse  Comprehension  101   •  LeTer  <  syllable  <  word  <  clause  <  sentence  <  discourse:   This  is  how  linguis*cs  is  structured.     But  it  is  not  how  we  understand  text!  
  • 6. Discourse  Comprehension  101   •  LeTer  <  syllable  <  word  <  clause  <  sentence  <  discourse:   This  is  how  linguis*cs  is  structured.     But  it  is  not  how  we  understand  text!  
  • 7. Discourse  Comprehension  101   •  LeTer  <  syllable  <  word  <  clause  <  sentence  <  discourse:   This  is  how  linguis*cs  is  structured.     But  it  is  not  how  we  understand  text!  
  • 8. Discourse  Comprehension  101   •  LeTer  <  syllable  <  word  <  clause  <  sentence  <  discourse:   This  is  how  linguis*cs  is  structured.     But  it  is  not  how  we  understand  text!  
  • 9. Discourse  Comprehension  101   •  LeTer  <  syllable  <  word  <  clause  <  sentence  <  discourse:   This  is  how  linguis*cs  is  structured.     But  it  is  not  how  we  understand  text!  
  • 10. Discourse Comprehension 101
    • Letter < syllable < word < clause < sentence < discourse:
      This is how linguistics is structured.
      But it is not how we understand text!
    • Kintsch and Van Dijk, '93: we read a text at three levels:
      – surface code: literal text, exact words/syntax
      – text base: preserves meaning, but not exact wording
      – situation model: 'microworld' that the text is about, constructed inferentially through interaction between the text and background knowledge
    • We use knowledge about text genre to activate a schema: this allows creation of the text base and situation model
  • 12. What is this paper about?
  • 13. What is this paper about?
    A. NOUN PHRASES
      – transiently expressed miRNA sponges
      – human breast cancer
      – high-grade malignancy
      – miR-31
      – noninvasive MCF7-Ras
      – antisense oligonucleotides
      – cell viability
      – cloned
      – retroviral vector
    Is it pertinent? -> Possibly…
    Is it true? -> ?
    Is it new, but in agreement with what I know? -> ?
  • 14. What is this paper about?
    B. TRIPLES
      – miR-31 expression DEPRIVE metastatic cells
      – miR-31 PREVENT acquisition of aggressive traits
      – miR-31 INHIBIT noninvasive MCF7-Ras cells
      – miR-31 ENHANCE invasion
      – cell viability AFFECT inhibitor
    Is it pertinent? -> Possibly…
    Is it true? -> ?
    Is it new, but in agreement with what I know? -> ?
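To make the idea of a triple concrete, here is a minimal Python sketch that stores the five triples above verbatim as subject-predicate-object tuples and lists what is asserted about one subject. The `Triple` type and the `about()` helper are hypothetical names for illustration, not part of any tool discussed in the talk.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement, as shown on the slide."""
    subject: str
    predicate: str
    obj: str


# The five triples from the slide, copied as-is (including the odd last one).
TRIPLES = [
    Triple("miR-31 expression", "DEPRIVE", "metastatic cells"),
    Triple("miR-31", "PREVENT", "acquisition of aggressive traits"),
    Triple("miR-31", "INHIBIT", "noninvasive MCF7-Ras cells"),
    Triple("miR-31", "ENHANCE", "invasion"),
    Triple("cell viability", "AFFECT", "inhibitor"),
]


def about(subject, triples=TRIPLES):
    """Return (predicate, object) pairs for every triple whose subject matches exactly."""
    return [(t.predicate, t.obj) for t in triples if t.subject == subject]


if __name__ == "__main__":
    for predicate, obj in about("miR-31"):
        print(f"miR-31 {predicate} {obj}")
```

Because matching is exact, the "miR-31 expression" triple is not returned for "miR-31"; linking such variants is the noun-phrase normalization problem discussed under "Noun Phrases: some issues".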
  • 15. What is this paper about?
    C. METADISCOURSE
    The preceding observations demonstrated that X expression deprives Y cells of attributes associated with Z.
    We next asked whether X also prevents the acquisition of A traits by B cells.
    To do so, we transiently inhibited X in C cells with either D or E.
    Both approaches inhibited X function by > 4.5-fold (Figure S7A).
    Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was unaffected by either inhibitor (Figure 3A; Figure S7B).
    The E sponge reduced X function by 2.5-fold, but did not affect the activity of other known Js (Figures S8A and S8B).
    Collectively, these data indicated that sustained X activity is necessary to prevent the acquisition of Z traits by both K and untransformed B cells.
    Is it pertinent? -> Need content
    Is it true? -> Sounds likely! I know this stuff!
    Is it new, but in agreement with what I know? -> Need content
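The metadiscourse view above was produced by replacing domain noun phrases with letters. A minimal sketch of that masking step, assuming the noun phrases have already been identified (here they are supplied by hand); `mask_entities()` is a hypothetical helper, and the input sentence is an illustrative reconstruction from the wording on slides 15 and 16.

```python
import re
import string


def mask_entities(text, entities):
    """Replace each domain noun phrase with a placeholder letter.

    Letters are assigned in the order the entities are given: X, Y, Z first
    (as on the slide), then A, B, C, ...
    """
    if not entities:
        return text
    letters = ["X", "Y", "Z"] + [c for c in string.ascii_uppercase if c not in "XYZ"]
    if len(entities) > len(letters):
        raise ValueError("more entities than placeholder letters")
    mapping = dict(zip(entities, letters))
    # One regex pass with the longest phrases first, so a longer phrase wins over
    # a shorter one it contains and an inserted letter is never replaced again.
    alternatives = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in alternatives))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


if __name__ == "__main__":
    sentence = ("We next asked whether miR-31 also prevents the acquisition of "
                "aggressive traits by untransformed breast epithelial cells.")
    print(mask_entities(sentence, ["miR-31", "aggressive", "breast epithelial"]))
```

What survives the masking ("We next asked whether…", "also prevents the acquisition of…") is exactly the metadiscourse that lets a reader judge the argument without knowing the biology.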
  • 16. What is this paper about?
    D. CLAIMS AND EVIDENCE
    Claim:
      • sustained miR-31 activity is necessary to prevent the acquisition of aggressive traits by both tumor cells and untransformed breast epithelial cells
    Evidence: Method:
      • We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either antisense oligonucleotides or miRNA sponges.
    Evidence: Result:
      • Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).
      • Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).
      • The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the activity of other known antimetastatic miRNAs (Figures S8A and S8B).
    Is it pertinent? -> Probably
    Is it true? -> Sounds likely!
    Is it new, but in agreement with what I know? -> Check/know
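One way to keep a claim and its supporting evidence together is a small record type. The sketch below is an assumption about representation (the `Claim` class is hypothetical, not a published schema); it encodes the miR-31 example above, with methods and results kept as separate evidence lists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """A claim together with the evidence offered for it."""
    text: str
    methods: List[str] = field(default_factory=list)
    results: List[str] = field(default_factory=list)

    def evidence_count(self) -> int:
        return len(self.methods) + len(self.results)

    def outline(self) -> str:
        """Render the claim with its evidence as an indented outline."""
        lines = [f"Claim: {self.text}"]
        lines += [f"  Evidence (method): {m}" for m in self.methods]
        lines += [f"  Evidence (result): {r}" for r in self.results]
        return "\n".join(lines)


MIR31 = Claim(
    text=("sustained miR-31 activity is necessary to prevent the acquisition of "
          "aggressive traits by both tumor cells and untransformed breast epithelial cells"),
    methods=["We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either "
             "antisense oligonucleotides or miRNA sponges."],
    results=[
        "Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).",
        "Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, but "
        "cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).",
        "The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the "
        "activity of other known antimetastatic miRNAs (Figures S8A and S8B).",
    ],
)

if __name__ == "__main__":
    print(MIR31.outline())
```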
  • 17. What is this paper about?
    E. JOURNAL & AUTHOR'S NAMES/AFFILIATIONS
    Is it pertinent? -> Possibly
    Is it true? -> Probably!
    Is it new, but in agreement with what I know? -> Need background
  • 18. In summary, how scientists read:
    • Surface code provides noun phrases and triples that offer pointers regarding topical relevance
    • Text base and situation model are created through specific metadiscourse conventions (e.g. refs at the end) that create a biological reasoning model:
      We next asked whether … -> Hypothesis
      To do so, we transiently inhibited … -> Goal/Method
      Suppression of X enhanced invasion … -> Result
      but F was unaffected … (Figure 3A). … -> Results
      Collectively, these data indicated that … . -> Implication
    • This can be expressed as a set of claims, linked to evidence, that can help represent key points in the paper
    • Journal name and author's affiliation help define the schema and provide 'willingness to be convinced' socially/interpersonally.
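The cue phrases in the reasoning model above can be turned into a toy rule-based labeller. This is a sketch under strong assumptions: the cue list contains only the phrases from the example, a parenthesised figure reference is taken as a sign of a result, and `segment_type()` is a hypothetical name. It is not the method behind any of the tools cited in this talk.

```python
import re

# Hypothetical cue phrases taken from the example above; a usable classifier
# would need far more cues (and would still make mistakes).
CUES = [
    ("we next asked whether", "Hypothesis"),
    ("to do so", "Goal/Method"),
    ("collectively", "Implication"),
    ("these data indicate", "Implication"),
]
FIGURE_REF = re.compile(r"\(fig(?:ure)?s?\b", re.IGNORECASE)


def segment_type(sentence):
    """Label a sentence with a discourse segment type from surface cues."""
    lowered = sentence.lower()
    for cue, label in CUES:
        if cue in lowered:
            return label
    # Sentences that point at a figure are treated as reporting results.
    if FIGURE_REF.search(sentence):
        return "Result"
    return "Unknown"
```

Rule order matters here: an implication sentence that also cites a figure is labelled by its cue phrase first.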
  • 19. Can computers help us identify:
    A. Noun phrases
    B. Triples
    C. Metadiscourse elements
    D. Claims + evidence
    E. Journal and author's names and affiliation
  • 21. Noun Phrases: some issues
    • Problem 1: disambiguating terms (© GoPubMed):
      – Hnrpa1 = Tis = Fli-2 = nuclear ribonucleoprotein A1 = helix destabilizing protein = single-strand binding protein = hnRNP core protein A1 = HDP-1 = topoisomerase-inhibitor suppressed.
      – Cellulose 1,4-beta-cellobiosidase = exoglucanase
      – COLD ≠ C.O.L.D. ≠ cold (runny nose) ≠ cold (low T)
    • Problem 2: disambiguating entities (© M. Martone):
      – 95 antibodies were (manually!) identified in 8 articles
      – 52 did not contain enough information to determine the antibody used
      – Some provided details in other papers
      – Failed to give species, clonality, vendor, or catalog number
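Problem 1 is often approached with a synonym dictionary that maps every surface form to one canonical name. A minimal sketch, using the synonym sets from the slide and treating the first name in each set as canonical (an arbitrary choice for illustration); `normalize()` is a hypothetical helper. A dictionary lookup cannot separate the two senses of "cold", which needs context.

```python
# Synonym sets from the slide; the first name in each list is used as the
# canonical form (an arbitrary choice for this sketch).
SYNONYM_SETS = [
    ["Hnrpa1", "Tis", "Fli-2", "nuclear ribonucleoprotein A1",
     "helix destabilizing protein", "single-strand binding protein",
     "hnRNP core protein A1", "HDP-1", "topoisomerase-inhibitor suppressed"],
    ["Cellulose 1,4-beta-cellobiosidase", "exoglucanase"],
]

# Map every surface form to the canonical form of its set.
CANONICAL = {name: names[0] for names in SYNONYM_SETS for name in names}


def normalize(term):
    """Return the canonical name for a term, or None if it is not in the dictionary.

    Matching is exact and case-sensitive, so 'COLD' and 'cold' are not conflated;
    telling 'cold' (runny nose) from 'cold' (low temperature) would need context.
    """
    return CANONICAL.get(term)
```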
  • 22. Noun Phrases: some progress
    • Despite these difficulties, noun phrase recall/precision is quite high, e.g. I2B2 2011 [1], [2], others: 90%-98%
    • Many tools, see [3] for a list; e.g. GoPubMed:
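Recall and precision figures like those above come from comparing a tagger's output with a gold-standard annotation. A set-based sketch follows; the example noun phrases are hypothetical (four correct, one spurious fragment), so precision and recall are both 4/5. Shared-task evaluations often distinguish exact from partial span matches; this sketch only counts exact string matches.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for extracted noun phrases."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: five gold noun phrases from the slide and a tagger
    # output with four correct phrases and one spurious fragment ("breast").
    gold = ["miR-31", "human breast cancer", "cell viability",
            "retroviral vector", "antisense oligonucleotides"]
    predicted = ["miR-31", "human breast cancer", "cell viability",
                 "retroviral vector", "breast"]
    print(precision_recall_f1(predicted, gold))
```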
  • 23. Triples: some issues:
    • Contingent on good NP & VP detection
    • Hard to parse text! E.g. a commercial tool gave:
      – (insulin, maintaining, glucose homeostasis), from: "When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues."
      – (insulin, may be involved, glucose homeostasis), from: "Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis."
  • 24. Triples: some progress:
    Biological Expression Language [4]:
    We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.
      – Increased abundance of miR-372 decreases activity of TP53:
        r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
      – Context: cancer:
        SET Disease = "Cancer"
      – Activity of TP53 decreases cell growth:
        tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
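The BEL statements above each have a subject, a relationship symbol and an object. The regex sketch below splits such statements; it only understands the two relationship symbols used here, `->` (increases) and `-|` (decreases), does not validate BEL's nested term syntax, rejects annotations such as `SET`, and `parse_bel()` is a hypothetical name, not part of the OpenBEL tooling.

```python
import re

# Only the two short-form relationships needed here; BEL defines many more.
RELATIONS = {"->": "increases", "-|": "decreases"}
STATEMENT = re.compile(r"^(?P<subject>\S.*?)\s+(?P<rel>->|-\|)\s+(?P<object>\S.*)$")


def parse_bel(statement):
    """Split 'subject REL object' into (subject, relation name, object)."""
    m = STATEMENT.match(statement.strip())
    if m is None:
        raise ValueError(f"not a subject-relation-object statement: {statement!r}")
    return m.group("subject"), RELATIONS[m.group("rel")], m.group("object")


if __name__ == "__main__":
    statements = [
        "r(MIR:miR-372) -| tscript(p(HUGO:Trp53))",
        'tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")',
    ]
    for s in statements:
        print(parse_bel(s))
```

Because the object of the first statement is the subject of the second, the two can be chained: miR-372 decreases TP53 activity, which in turn decreases cell growth.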
  • 25. Metadiscourse: why it matters
    "[Y]ou can transform … fiction into fact just by adding or subtracting references", Bruno Latour [5]
    • Voorhoeve et al., 2006: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.
    • Kloosterman and Plasterk, 2006: In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).
    • Yabuta et al., 2007: [On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
    • Okada et al., 2011: Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).
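The drift in this citation chain can be made visible by listing hedging and boosting cues in each citing sentence. A sketch, assuming tiny hand-picked cue lists (not a validated hedging lexicon) and lightly trimmed sentences (the bracketed "[On the other hand,]" is dropped); `cues()` is a hypothetical helper.

```python
import re

# Citing sentences from the slide, oldest first.
CITATIONS = [
    ("Voorhoeve et al., 2006",
     "These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct "
     "inhibition of the expression of the tumor suppressor LATS2."),
    ("Kloosterman and Plasterk, 2006",
     "In a genetic screen, miR-372 and miR-373 were found to allow proliferation of "
     "primary human cells that express oncogenic RAS and active p53, possibly by "
     "inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006)."),
    ("Yabuta et al., 2007",
     "Two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in "
     "testicular germ cell tumors by inhibition of LATS2 expression, which suggests "
     "that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006)."),
    ("Okada et al., 2011",
     "Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of "
     "Lats2, thereby allowing tumorigenic growth in the presence of p53 "
     "(Voorhoeve et al., 2006)."),
]

# Illustrative cue lists only.
HEDGES = {"possibly", "potential", "suggests", "may", "might", "could"}
BOOSTERS = {"directly", "clearly", "demonstrate", "demonstrates"}


def cues(sentence):
    """Return (hedges, boosters) found in the sentence, in order of appearance."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w in HEDGES], [w for w in words if w in BOOSTERS]


if __name__ == "__main__":
    for source, sentence in CITATIONS:
        hedges, boosters = cues(sentence)
        print(f"{source}: hedges={hedges} boosters={boosters}")
```

On these four sentences, the 2011 version is the only one with no hedge cue and a booster ("directly"), which is the fiction-to-fact pattern the slide highlights.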
  • 26. Metadiscourse: some progress
    • Hedging cues, speculative language, modality/negation:
      – Light et al. [6]: finding speculative language
      – Wilbur et al. (Hagit) [7]: focus, polarity, certainty, evidence, and directionality
      – Thompson et al. (Sophia) [8]: level of speculation, type/source of the evidence and level of certainty
    • Sentiment detection (e.g. Kim and Hovy [9], among others):
      – Holder of the opinion, strength, polarity as a 'mathematical function' acting on the main propositional content
    • Can make this part of the semantic web (e.g., Ontology for Reasoning, Certainty and Attribution, ORCA [10]):
      – Value (Presumed True, Probable, Possible, Unknown)
      – Source (Author, Named Other, Unknown)
      – Basis (Data, Reasoning, Unknown)
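The three ORCA dimensions map naturally onto enumerations. The sketch below is an assumption about how one might encode them in Python, not the ORCA ontology itself (which is designed for the semantic web); the class names are hypothetical, the numeric codes for Value follow the 3-to-0 scale given later in the talk, and the example reproduces the rating given there to the hMOB claim (Value 3, Source N, Basis Data).

```python
from dataclasses import dataclass
from enum import Enum


class Value(Enum):
    # Numeric codes follow the 3..0 scale used later in the talk.
    PRESUMED_TRUE = 3
    PROBABLE = 2
    POSSIBLE = 1
    UNKNOWN = 0


class Source(Enum):
    AUTHOR = "Author"
    NAMED_OTHER = "Named Other"
    UNKNOWN = "Unknown"


class Basis(Enum):
    DATA = "Data"
    REASONING = "Reasoning"
    UNKNOWN = "Unknown"


@dataclass(frozen=True)
class OrcaAnnotation:
    """One claim with its ORCA-style Value, Source and Basis."""
    claim: str
    value: Value
    source: Source
    basis: Basis


if __name__ == "__main__":
    hmob = OrcaAnnotation(
        "only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo",
        Value.PRESUMED_TRUE, Source.NAMED_OTHER, Basis.DATA,
    )
    print(hmob.value.name, hmob.source.value, hmob.basis.value)
```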
  • 27. Claims and Evidence: some issues:
    • Data2Semantics [11]: linking clinical guidelines to evidence. Inconsistency within guideline and guidelines v. evidence:
      • Studies have demonstrated inconsistent results regarding the use of such markers of inflammation as C-reactive protein (CRP), interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in neutropenic patients with cancer [55–57].
      • [55]: PCT and IL-6 are more reliable markers than CRP for predicting bacteremia in patients with febrile neutropenia
      • [56]: In conclusion, daily measurement of PCT or IL-6 could help identify neutropenic patients with a stable course when the fever lasts >3 d. …, it would reduce adverse events and treatment costs.
      • [57]: Our study supports the value of PCT as a reliable tool to predict clinical outcome in febrile neutropenia.
    • Drug Interaction Knowledgebase [12]: how to identify evidence?
      • R-citalopram_is_not_substrate_of_cyp2c19:
      • At 10 µM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% of control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6).
  • 28. Claims and Evidence: some progress
    • Defining 'salient knowledge components' in text:
      – Argumentative zones, CoreSC can both be found
      – Blake, Claim networks (more soon!)
      – Claimed Knowledge Updates (Sandor/de Waard, [13]):
  • 29. Perhaps we should start writing for computers?
    • So why doesn't the author add this information? If you know you're going to mine it, why bury it?
    • Authoring tools for entity identification: MS for Chemistry, Math, proteins; some experiments but no solution yet [14]
    • Authoring tool for triple identification (MS ActiveText)
    • But the question remains: after we've 'extracted' all the 'facts', what is all the gunk that remains in the filter?
  • 30. Perhaps we should explain: a paper is rhetorical?
    Aristotle / Quintilian: description -> Scientific paper
    • prooimion / Introduction (exordium): The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. -> Introduction: positioning
    • prothesis / Statement of Facts (narratio): The speaker here provides a narrative account of what has happened and generally explains the nature of the case. -> Introduction: research question
    • Summary (propositio): The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. -> Summary of contents
    • pistis / Proof (confirmatio): The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. -> Results
    • Refutation (refutatio): As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. -> Related Work
    • epilogos / peroratio: Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. -> Discussion: summary, implications.
    – goal of the paper is to be published; it uses author/journal as a host
    – format has co-evolved: predator-prey relationship with reviewers
  • 31. Perhaps we should explain: a paper is a story?
    Story grammar ("The Story of Goldilocks and the Three Bears") | Paper grammar ("The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins"):
    • Setting
      – Time: Once upon a time | Background: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
      – Character: a little girl named Goldilocks | Objects of study: the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
      – Location: She went for a walk in the forest. Pretty soon, she came upon a house. | Experimental setup: studied and compared in vivo effects and interactions to those of the human protein
    • Theme
      – Goal: She knocked and, when no one answered, | Research goal: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
      – Attempt: she walked right in. | Hypothesis: Atx-1 may play a role in the regulation of gene expression
    • Episode
      – Name: At the table in the kitchen, there were three bowls of porridge. | Name: dAtX-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
      – Subgoal: Goldilocks was hungry. | Subgoal: test the function of the AXH domain
      – Attempt: She tasted the porridge from the first bowl. | Method: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
      – Outcome: This porridge is too hot! she exclaimed. | Results: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q].
      – Attempt: So, she tasted the porridge from the second bowl. | Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
      – Outcome: This porridge is too cold, she said | Data: (data not shown),
      – Attempt: So, she tasted the last bowl of porridge. | Results: both genotypes show many large holes and loss of cell integrity at 28 days
      – Outcome: Ahhh, this porridge is just right, she | (Figures 1B-1D).
  • 32. A closer look at verb tense:
    • Conceptual realm: 'state' (gnomic) present
      – 'Dopaminergic innervation plays a major role in the control of mood and its perturbation'
    • Experimental realm: 'event' past
      – 'Four out of seven cell lines expressed this cluster'
      – 'Adult rats were individually housed for 2 days before testing.'
    • Argumentational realm: 'instantaneous' present; to-infinitive
      – 'These results suggest that...'
      – 'To identify these mechanisms…'
    • Discourse progression: 'instantaneous' present
      – 'Fig 2a shows that'
      – 'see figure 7A'
    • Reference to other work: present perfect, 'finalised' past
      – 'Previous work has demonstrated that VPCs are sensitive to the levels of let-60/RAS (Han and Sternberg, 1990).'
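The five realms above can be approximated with surface patterns. This is a deliberately crude sketch: the rule order, the regular expressions and the `realm()` name are assumptions, and treating any "-ed" word as a past-tense event will misfire on words such as "need" or "indeed". It does assign the slide's own examples to the realms listed.

```python
import re

# Ordered (pattern, realm) rules; the first match wins. These cue patterns are
# an illustrative assumption built from the example sentences on the slide.
RULES = [
    (re.compile(r"\b(?:see\s+)?fig(?:ure)?\.?\s*\d", re.I), "discourse progression"),
    (re.compile(r"\b(?:suggest|indicate|imply)s?\s+that\b|^to\s+[a-z]+\b", re.I),
     "argumentational"),
    (re.compile(r"\b(?:has|have)\s+(?:been\s+)?[a-z]+(?:ed|en|wn)\b", re.I),
     "reference to other work"),
    (re.compile(r"\b(?:was|were)\b|\b[a-z]+ed\b", re.I), "experimental"),
]


def realm(sentence):
    """Guess the realm of a sentence; fall back to the gnomic-present 'conceptual' realm."""
    for pattern, name in RULES:
        if pattern.search(sentence):
            return name
    return "conceptual"
```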
  • 33. Tense use in science and mythology:
    • Facts in the eternal present
      – Science: Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans.
      – Mythology: I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, the glorious one whom all the blessed throughout high Olympus reverence and honor.
    • Events in the simple past
      – Science: Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s.
      – Mythology: Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.
    • Events with embedded facts
      – Science: We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al., 2005).
      – Mythology: And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men, of warriors, with whom she is wroth, she, the daughter of the mighty sire.
    • Attribution in the present perfect
      – Science: miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993).
      – Mythology: In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.
    • Implications are hedged, and in the present tense
      – Science: These results indicate that although miR-372/3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact
      – Mythology: Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat
  • 34. Some conclusions:
    • How we read: surface code, textbase, situation model
    • Useful components: find noun phrases, triples, metadiscourse, claims and evidence
    • Computers keep getting better at identifying these
    • Authoring tools might let us help computers
    • But for the foreseeable future, scientists will continue to need to scan the literature to understand and believe science and make connections between knowledge
    • To achieve progress, perhaps focus less on what computers can do and more on how humans communicate?
    • Let's pursue collaborations with linguists, cognitive psychologists etc. on how we read and learn!
  • 35. Acknowledgements
    • Funding: Elsevier Labs; NWO
    • Discussion partners: Phil Bourne, UCSD; Ed Hovy
    • Collaborators: Gully Burns, ISI; Henk Pander Maat, UU; Joanne Luciano, RPI; Agnes Sandor, XRCE; Tim Clark et al., Harvard; Jodi Schneider, DERI; Rinke Hoekstra & co, VU; Richard Boyce & co, UPitt; Maria Liakata, EBI; Sophia Ananiadou & co, NaCTeM
    …and all of you :-)!
  • 36. Questions?
    Anita de Waard
    a.dewaard@elsevier.com
    http://elsatglabs.com/labs/anita/
  • 37. References
    [1] J Am Med Inform Assoc. 2010 Sep;17(5):514–518. http://dx.doi.org/10.1136/jamia.2010.003947
    [2] Li Q, Wu Y-FB (2006). Identifying important concepts from medical documents. Journal of Biomedical Informatics 39 (2006) 668–679.
    [3] Useful list of resources in bioinformatics: http://www.bioinformatics.ca/
    [4] Biological Expression Language: http://www.openbel.org
    [5] Latour B, Woolgar S (1979). Laboratory Life: the Social Construction of Scientific Facts. Sage Publications.
    [6] Light M, Qiu XY, Srinivasan P (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
    [7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
    [8] Thompson P, Venturi G, McNaught J, Montemagni S, Ananiadou S (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Workshop on Building and Evaluating Resources for Biomedical Text Mining, 2008.
    [9] Kim S-M, Hovy EH (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
    [10] de Waard A, Schneider J (2012). Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA). Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop at ISWC 2012 (submitted).
    [11] Data2Semantics project: http://www.data2semantics.org/
    [12] Boyce R, Collins C, Horn J, Kalet I (2009). Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10; see also http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
    [13] Sándor Á, de Waard A (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles. Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
    [14] See e.g. http://ucsdbiolit.codeplex.com/ and http://research.microsoft.com/en-us/projects/ontology/ for MS Word ontology add-ins
  • 39. Logical structure of epistemic evaluations:
    For a proposition P, an epistemically marked clause E is an evaluation of P, written $E_{V,B,S}(P)$, with:
      – V = Value: 3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
      – B = Basis: Reasoning, Data
      – S = Source:
        A = speaker is author A, explicit
        IA = speaker is author A, implicit
        N = other author N, explicit
        NN = other author NN, implicit
    Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
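A direct encoding of an evaluation E of P with value V, basis B and source S is a record that rejects codes outside the scales defined above. Sketch only: the `EpistemicEvaluation` class and its label strings are hypothetical, and the example proposition is the hMOB claim that is rated on the next slide.

```python
from dataclasses import dataclass

VALUE_LABELS = {
    3: "assumed true", 2: "probable", 1: "possible", 0: "unknown",
    -1: "possibly untrue", -2: "probably untrue", -3: "assumed untrue",
}
BASES = {"Reasoning", "Data"}
SOURCES = {
    "A": "author, explicit",
    "IA": "author, implicit",
    "N": "other author, explicit",
    "NN": "other author, implicit",
}


@dataclass(frozen=True)
class EpistemicEvaluation:
    """E_{V,B,S}(P): an evaluation of proposition P with value V, basis B, source S."""
    proposition: str
    value: int
    basis: str
    source: str

    def __post_init__(self):
        # Reject codes outside the scales defined on the slide.
        if self.value not in VALUE_LABELS:
            raise ValueError(f"value must be an integer from -3 to 3, got {self.value!r}")
        if self.basis not in BASES:
            raise ValueError(f"unknown basis {self.basis!r}")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source {self.source!r}")

    def describe(self):
        return f"{VALUE_LABELS[self.value]} ({self.basis}; {SOURCES[self.source]}): {self.proposition}"


if __name__ == "__main__":
    e = EpistemicEvaluation(
        "only hMOB1A and hMOB1B interact with both LATS1 and LATS2", 3, "Data", "N")
    print(e.describe())
```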
  • 40. Adding Epistemic Evaluation
    Claim -> ORCA value
    • "Together, Lats2 and ASPP1 shunt p53 to proapoptotic promoters and promote the death of polyploid cells [1]. (…)" -> Value = 3; Source = N; Basis = 0
    • "Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39]. (…)" -> Value = 3; Source = N; Basis = Data
    • "Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression." -> Value = 1 or 2?; Source = Author; Basis = Data
    • "Furthermore, we demonstrated that the direct inhibition of LATS2 protein was mediated by miR-373 and manipulated the expression of miR-373 to affect esophageal cancer cells growth." -> Value = 2 (or 3?); Source = Author; Basis = Data
  • 41. Textual Markers
    • Modal auxiliary verbs (e.g. can, could, might)
    • Qualifying adverbs and adjectives (e.g. interestingly, possibly, likely, potential, somewhat, slightly, powerful, unknown, undefined)
    • References, either external (e.g. '[Voorhoeve et al., 2006]') or internal (e.g. 'See fig. 2a').
    • Reporting/epistemic verbs (e.g. suggest, imply, indicate, show)
      – either within the clause: 'These results suggest that...'
      – or in a subordinate clause governed by a reporting-verb matrix clause: '{These results suggest that} indeed, this represents the true endogenous activity.'
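The four marker types above can be detected with regular expressions. The sketch below uses the slide's example words; the extra modal forms (may, would, should), the reference patterns and the name `detect_markers()` are assumptions for illustration. A segment can carry several markers at once.

```python
import re

# Regular expressions for the four marker types on the slide.
MARKERS = {
    "modal_aux": r"\b(?:can|could|may|might|would|should)\b",
    "qualifier": (r"\b(?:interestingly|possibly|likely|potential|somewhat|"
                  r"slightly|powerful|unknown|undefined)\b"),
    "reference": (r"\[[^\]]*\d{4}\]"               # [Voorhoeve et al., 2006]
                  r"|\[\d+(?:,\s*\d+)*\]"           # [39] or [1, 2]
                  r"|\([^)]*et al\.[^)]*\d{4}\)"    # (Voorhoeve et al., 2006)
                  r"|\bfig(?:ure)?s?\.?\s*\d+"),    # See fig. 2a, Figure 3A
    "reporting_verb": r"\b(?:suggest|indicate|show)(?:s|d|ed|n)?\b|\bimpl(?:y|ies|ied)\b",
}


def detect_markers(segment):
    """Return the sorted names of marker types that occur in a text segment."""
    return sorted(name for name, pattern in MARKERS.items()
                  if re.search(pattern, segment, re.IGNORECASE))
```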
  • 42. Markers v. Types: 1 paper, 640 segments
    | Value | Modal aux | Reporting verb | Ruled by RV | Adverbs/adjectives | References | None | Total |
    |---|---|---|---|---|---|---|---|
    | Value = 3 | 1 (0.5%) | 81 (40%) | 24 (12%) | 7 (4%) | 41 (20%) | 47 (24%) | 201 (100%) |
    | Value = 2 | - | 29 (51%) | 23 (40%) | 1 (2%) | 4 (7%) | - | 57 (100%) |
    | Value = 1 | 9 (27%) | 11 (33%) | 11 (33%) | 1 (3%) | 1 (3%) | - | 33 (100%) |
    | Value = 0 | - | 9 (64%) | 3 (21%) | 1 (7%) | 1 (7%) | - | 14 (100%) |
    | No modality | - | 16 (37%) | 3 (7%) | 0 | 3 (7%) | 22 (50%) | 44 (100%) |
    | Overall total | 10 (2%) | 146 (23%) | 64 (10%) | 10 (2%) | 50 (8%) | 69 (11%) | 640 (100%) |
  • 43. Most prevalent clause type: "These results suggest that..."
    – Adverb/Connective: thus, therefore, together, recently, in summary
    – Determiner/Pronoun: it, this, these, we/our
    – Adjective: previous, future, better
    – Noun phrase: data, report, study, result(s); method or reference
    – Modal: form of 'to be', may, remain
    – Adverb: often, recently, generally
    – Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
    – Preposition: that, to
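The word classes above suggest a template for the "These results suggest that..." clause. A sketch, assuming a fixed slot order (connective, determiner/pronoun, adjective, noun, modal or form of "to be", verb, "that") and only regular verb endings; `is_reporting_clause()` is a hypothetical helper, not a published pattern.

```python
import re

# Word classes from the slide; slot order and suffix handling are assumptions.
CONNECTIVES = ["in summary", "thus", "therefore", "together", "recently"]
DETERMINERS = ["these", "this", "our", "we", "it"]
ADJECTIVES = ["previous", "future", "better"]
NOUNS = ["data", "report", "study", "results?", "method", "reference"]
MODALS = ["may", "remain", "is", "are", "was", "were", "be"]
VERBS = ["show", "obtain", "consider", "view", "reveal", "suggest",
         "hypothesize", "indicate", "believe"]


def _alt(words):
    """Build a non-capturing regex alternation from a word list."""
    return "(?:" + "|".join(words) + ")"


CLAUSE = re.compile(
    rf"^(?:{_alt(CONNECTIVES)},?\s+)?"
    rf"{_alt(DETERMINERS)}\s+"
    rf"(?:{_alt(ADJECTIVES)}\s+)?"
    rf"(?:{_alt(NOUNS)}\s+)?"
    rf"(?:{_alt(MODALS)}\s+)?"
    rf"{_alt(VERBS)}(?:s|d|ed|n)?\s+that\b",
    re.IGNORECASE,
)


def is_reporting_clause(sentence):
    """True if the sentence opens with a 'These results suggest that...' style clause."""
    return CLAUSE.match(sentence.strip()) is not None
```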
  • 44. Reporting verbs vs. epistemic value:
    • Value = 0 (unknown): establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report
    • Value = 1 (hypothetical): be important, consider, expect, hypothesize (5x), give insight, raise possibility that, suspect, think
    • Value = 2 (probable): appear, believe, implicate (2x), imply, indicate (12x), play a role, represent, suggest (18x), validate (2x)
    • Value = 3 (presumed true): be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results, 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
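The verb lists above can serve as a lookup table from reporting verb to epistemic value. A partial sketch: only single-word verbs with regular inflections are included, verbs listed under two values (such as "report") are omitted, and `reporting_verb_values()` is a hypothetical name. As the epistemic-evaluation examples earlier show, modals and qualifiers in the same clause can still shift the value, which a verb lookup alone does not capture.

```python
import re

# A partial lexicon built from the slide's single-word verbs. Verbs listed
# under more than one value (e.g. "report", under both 0 and 3) are left out.
VERB_VALUE = {
    "establish": 0, "elucidate": 0, "describe": 0,
    "consider": 1, "expect": 1, "hypothesize": 1, "suspect": 1, "think": 1,
    "appear": 2, "believe": 2, "implicate": 2, "indicate": 2, "suggest": 2,
    "validate": 2,
    "confirm": 3, "demonstrate": 3, "detect": 3, "discover": 3, "observe": 3,
    "reveal": 3, "show": 3,
}

# Only regular inflections are recognised: -s, -es, -d, -ed, -n.
PATTERNS = {verb: re.compile(rf"\b{verb}(?:s|es|d|ed|n)?\b", re.IGNORECASE)
            for verb in VERB_VALUE}


def reporting_verb_values(clause):
    """Return (verb, value) for each lexicon verb found in the clause, in lexicon order."""
    return [(verb, VERB_VALUE[verb]) for verb, pattern in PATTERNS.items()
            if pattern.search(clause)]
```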