What’s wrong with research papers - and (how) can we fix it?

Anita de Waard
Disruptive Technologies Director
Elsevier Labs
a.dewaard@elsevier.com
http://elsatglabs.com/labs/anita

The Big Problem:

1) There are too many papers
2) We have too little time to read them

To address this problem, we make:
• databases
• text mining tools
• nanopublications
• data publications
• wiki publications
• ontologies; ontology integration tools
• workflow/data integration systems
• executable components
• ... and write emails/grants/papers/blogs about this...
• ... and we end up with:
   1) Even more papers!!
   2) Even less time to read them!!

What problems are we solving?

• We’re mostly improving the format of the research article.
• This talk: aspects of the format that are being improved
  (and some examples of work to improve them):
   A. Issues with the paper format
   B. Issues pertaining to habits of writing
   C. Issues inherent to language
   D. Issues in trying to create connected content
• Do any of these address the Big Problem?
• What shall we do about it?

A. Issue: the paper format
A1: Paper is two-dimensional
A2: Paper is linear
A3: Paper is not interactive

A1: Issue: paper is two-dimensional
• Some experiments: allow representations of interactive
  figures (Wolfram Alpha), Utopia, Chem-3d
• Lack of experimentation with formats: the internet is
  multi-dimensional, so why do we still need page limits?

A2: Issue: paper is linear
• Read from front to back (although research
  suggests readers quickly skim to the core parts,
  and linearity helps them do that)
• References are at the end, so your reading is
  not interrupted
• Headers are sequential - and not directly
  accessible

A2: (Old) Experiment: ABCDE
• LaTeX Stylesheet:
   – Annotation
   – Background
   – Contribution
   – Discussion
   – Entities (references, projects,
     terms in ontologies, etc.) in RDF
   – Core sentences create structured
     abstract
• E.g. in proceedings: collect all core Contribution
  components
• I still have the stylesheets, if anyone’s interested :-)!

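The "collect all core Contribution components" idea can be sketched in a few lines of Python. This is only an illustration: the slides do not show the ABCDE stylesheet's actual markup, so the environment names (`background`, `contribution`, `discussion`) and the helpers `extract_sections` and `collect_contributions` are assumptions.

```python
import re

# Hypothetical markup: each ABCDE section is a LaTeX environment named after
# the section. The real stylesheet's macro names are not given in the slides.
SECTION_RE = re.compile(
    r"\\begin\{(?P<name>background|contribution|discussion)\}"
    r"(?P<body>.*?)"
    r"\\end\{(?P=name)\}",
    re.DOTALL,
)


def extract_sections(latex_source: str) -> dict[str, str]:
    """Return a mapping of section name to its whitespace-normalized text."""
    sections = {}
    for match in SECTION_RE.finditer(latex_source):
        sections[match.group("name")] = " ".join(match.group("body").split())
    return sections


def collect_contributions(papers: dict[str, str]) -> list[tuple[str, str]]:
    """Gather the Contribution section of every paper in a proceedings volume."""
    collected = []
    for paper_id, source in papers.items():
        contribution = extract_sections(source).get("contribution")
        if contribution is not None:
            collected.append((paper_id, contribution))
    return collected


# Two made-up papers standing in for a proceedings volume.
proceedings = {
    "paper-1": r"""
\begin{background} Papers are linear. \end{background}
\begin{contribution} We propose a modular format. \end{contribution}
""",
    "paper-2": r"""
\begin{contribution} We evaluate the format
on one proceedings volume. \end{contribution}
\begin{discussion} More work is needed. \end{discussion}
""",
}

overview = collect_contributions(proceedings)
for paper_id, text in overview:
    print(f"{paper_id}: {text}")
```

Because the sections are explicitly marked, the overview comes from simple pattern matching rather than from text mining the prose.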
A3: Paper is not interactive
• Experiment: executable papers:
   – Run code within a paper
   – Experiments: R, SPSS, Vistrails
   – Rerender code within a paper, change algorithm/see effect;
     run different dataset
   – How do you archive software?
     Satyanarayanan at CMU: Olive, ‘Internet ecosystem
     of curated virtual machine image collections’

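As a toy illustration of "run different dataset", the sketch below treats a sentence of the paper as a template whose numbers come from running the analysis. It is not how R, SPSS, or Vistrails integrations work; the template text and the functions `analyse` and `render` are invented for this example.

```python
from statistics import mean
from string import Template

# Hypothetical paper fragment: the reported numbers are placeholders that are
# filled in by running the analysis, not typed in by the author.
PAPER_TEXT = Template("The mean response over $n samples was $mean_value.")


def analyse(dataset: list[float]) -> dict[str, str]:
    """The 'code within the paper': compute the values the text reports."""
    return {"n": str(len(dataset)), "mean_value": f"{mean(dataset):.2f}"}


def render(dataset: list[float]) -> str:
    """Re-run the analysis and rerender the sentence for a given dataset."""
    return PAPER_TEXT.substitute(analyse(dataset))


original = render([1.0, 2.0, 3.0, 6.0])   # the dataset the authors used
alternative = render([2.0, 5.0])          # a reader swaps in another dataset
print(original)
print(alternative)
```

The sentence and the computation cannot drift apart, because the sentence is regenerated on every run.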
B. Issue: habits of writing
B1: Cite a paper - not a claim
B2: No precision in describing entities
B3: We write post-mortems (stories :-)!)

B1: Citations create facts:
-   Voorhoeve, 2006: “These miRNAs neutralize p53-mediated CDK
    inhibition, possibly through direct inhibition of the expression of the
    tumor-suppressor LATS2.”

-   Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and
    miR-373 were found to allow proliferation of primary human cells
    that express oncogenic RAS and active p53, possibly by inhibiting
    the tumor suppressor LATS2 (Voorhoeve et al., 2006).”

-   Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372
    and -373, function as potential novel oncogenes in testicular germ
    cell tumors by inhibition of LATS2 expression, which suggests
    that Lats2 is an important tumor suppressor (Voorhoeve et al.,
    2006).”

-   Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and
    miR-373, directly inhibit the expression of Lats2, thereby allowing
    tumorigenic growth in the presence of p53 (Voorhoeve et al.,
    2006).”

B1: TAC2012: Add author’s text to citation
Voorhoeve, P. M.; le Sage, C. et al. (2006). A Genetic Screen Implicates miRNA-372 and
miRNA-373 As Oncogenes in Testicular Germ Cell Tumors, Cell 124 (6) pp. 1169-1181
Citing goal: “To perform genetic screens for novel functions of miRNAs,”
-   in order to identify miRNAs functionally associated with carcinogenesis
-   to identify miRNAs that when overexpressed could substitute for p53 loss and allow
    continued proliferation in the context of Ras activation
Citing method: “We subsequently created a human miRNA expression library (miR-Lib) by
cloning almost all annotated human miRNAs into our vector (Rfam release 6) (Figure S3).”
-   Voorhoeve et al. (116) employed a novel strategy by combining an miRNA vector library
    and corresponding bar code array
-   using a retroviral expression library of miRNAs,
-   Using a novel retroviral miRNA expression library, Agami and co-workers performed a
    cell-based screen
Citing result: “we identified miR-372-373, each permitting proliferation and tumorigenesis of
primary human cells that harbor both oncogenic RAS and active wildtype p53.”
-   miR-372 and miR-373 were consequently found to permit proliferation and tumorigenesis
    of these primary cells carrying both oncogenic RAS and wild-type p53,
-   Voorhoeve et al. (2006) identified miR-372 and miR-373
-   miR-372 and miR-373 were found to allow proliferation of primary human cells that
    express oncogenic RAS and active p53,

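One way to picture "cite a claim, not a paper" is a citation record that carries the cited author's own wording and its role (goal, method, or result). The sketch below is a hypothetical data structure, not the TAC2012 format; the class names `Role`, `CitedPassage`, and `ClaimCitation` are assumptions, and the quoted text is taken from the slide above.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """Which part of the cited paper a citing sentence refers to."""
    GOAL = "goal"
    METHOD = "method"
    RESULT = "result"


@dataclass(frozen=True)
class CitedPassage:
    source: str        # the cited paper
    role: Role         # goal, method, or result
    author_text: str   # the cited authors' own wording, hedges included


@dataclass(frozen=True)
class ClaimCitation:
    citing_text: str       # what the citing paper says
    target: CitedPassage   # the passage it is actually citing

    def render(self) -> str:
        """Show the citing sentence next to the original authors' wording."""
        t = self.target
        return f'{self.citing_text} [{t.source}, {t.role.value}: "{t.author_text}"]'


citation = ClaimCitation(
    citing_text="Voorhoeve et al. (2006) identified miR-372 and miR-373",
    target=CitedPassage(
        source="Voorhoeve et al., 2006",
        role=Role.RESULT,
        author_text=(
            "we identified miR-372-373, each permitting proliferation and "
            "tumorigenesis of primary human cells that harbor both oncogenic "
            "RAS and active wildtype p53."
        ),
    ),
)
rendered = citation.render()
print(rendered)
```

Keeping the author's text attached to the citation lets a reader check whether the citing sentence has strengthened the original claim.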
B2: Issue: entities in papers are not exact

 • Midfrontal cortex tissue samples from neurologically unimpaired subjects (n = 9) and
   from subjects with AD (n = 11) were obtained from the Rapid Autopsy Program
 • Immunoblot analysis and antibodies
 • The following antibodies were used for immunoblotting: -actin mAb (1:10,000 dilution,
   Sigma-Aldrich); -tubulin mAb (1:10,000, Abcam); T46 mAb (specific to tau 404–441, 1:1000,
   Invitrogen); Tau-5 mAb (human tau 218–225, 1:1000, BD Biosciences) (Porzig et al., 2007); AT8
   mAb (phospho-tau Ser199, Ser202, and Thr205, 1:500, Innogenetics); PHF-1 mAb (phospho-tau
   Ser396 and Ser404, 1:250, gift from P. Davies); 12E8 mAb (phospho-tau Ser262 and Ser356,
   1:1000, gift from P. Seubert); NMDA receptors 2A, 2B and 2D goat pAbs (C terminus, 1:1000,
   Santa Cruz Biotechnology)…

 • 95 antibodies were identified in 8 articles
 • 52 did not contain enough information to determine the antibody used

Maryann Martone, Jan 2012:
2012 ACM SIGHIT International Health Informatics Symposium (IHI 2012)

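To make the numbers concrete: 52 of 95 antibodies is roughly 55% that could not be identified. The sketch below pairs that arithmetic with a hypothetical check for whether a mention is precise enough. The criterion used (vendor plus catalog number) is an assumption for illustration, not the rule used in the study, and the last record is entirely made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AntibodyMention:
    name: str
    vendor: Optional[str] = None
    catalog_number: Optional[str] = None

    def is_identifiable(self) -> bool:
        # Assumed criterion (not from the slides): a vendor plus a catalog
        # number pins down one product; a vendor alone does not.
        return bool(self.vendor and self.catalog_number)


mentions = [
    AntibodyMention("T46 mAb", vendor="Invitrogen"),
    AntibodyMention("Tau-5 mAb", vendor="BD Biosciences"),
    AntibodyMention("PHF-1 mAb"),  # "gift from P. Davies": no vendor at all
    # Made-up record showing what a fully specified mention could look like.
    AntibodyMention("example mAb", vendor="Example Vendor", catalog_number="EX-0001"),
]
unidentifiable = [m.name for m in mentions if not m.is_identifiable()]

# Figures reported on the slide: 95 antibodies in 8 articles, 52 unidentifiable.
total_antibodies = 95
not_identifiable = 52
share = not_identifiable / total_antibodies
print(unidentifiable)
print(f"{share:.1%} of antibodies could not be identified")
```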
B3: Issue: methods are written post-mortem
• Yolanda Gil at ISI modeled the Bourne et al. paper in Wings
• Anecdotal evidence: Phil Bourne couldn’t remember most
  of this, even after digging through emails!

B3: So why not write the data first and wrap the paper around it??
1. Research: Each item in the system has metadata (including provenance) and relations to other data items added to it.
2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system.
3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document.
4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, the paper gets updated, until it is validated.
5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data item, and its heritage can be traced.
6. User applications: distributed applications run on this ‘exposed data’ universe.

[Figure: data items, each labeled "metadata", surround a paper draft reading "Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-". Other labels: Review, Edit, Revise; Some other publisher.]

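The phrase "its heritage can be traced" in step 5 can be illustrated with a small provenance graph. This is a sketch under assumptions: the `DataItem` class, its fields, and the item names are invented here and do not describe any particular workflow system.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """A data item with metadata and links to the items it was derived from."""
    item_id: str
    metadata: dict[str, str]
    derived_from: list["DataItem"] = field(default_factory=list)

    def heritage(self) -> list[str]:
        """Return the ids of this item and all its ancestors, nearest first."""
        order: list[str] = []
        queue = [self]
        while queue:
            item = queue.pop(0)
            if item.item_id not in order:
                order.append(item.item_id)
                queue.extend(item.derived_from)
        return order


# Steps 1-2: items created in the lab carry metadata and provenance.
raw = DataItem("raw-recording", {"created_by": "lab workflow system"})
fig2 = DataItem("fig-2", {"kind": "figure"}, derived_from=[raw])

# Steps 3-5: the paper pulls in the figure and stays connected to it.
paper = DataItem("paper", {"status": "published"}, derived_from=[fig2])

trail = paper.heritage()
print(" <- ".join(trail))
```

A reader who clicks on fig 2 in the published paper would follow exactly this trail back to the underlying data.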
C. Issue: language
C1: Language is coherent
C2: Language is narrative
C3: Language is abstract

C1: Language is coherent:
Adding drug-drug interactions to DIKB
• Drug-Interaction Knowledge Base:
  Clinically-oriented, evidence-based knowledge base
  designed to support adding data to product inserts
• Contains quantitative and qualitative assertions about drug
  mechanisms and pharmacokinetic drug-drug interactions
  (DDI) for over 60 drugs
• HCLS Sig: Currently working on expanding the DIKB with
  more content and making a “mash-up” view of package
  inserts adding up-to-date information

 View project: http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
 SPARQL endpoint: http://dbmi-icode-01.dbmi.pitt.edu:2020/directory/Drugs

C1: Coherent language is hard to parse
• Self-reference:
    R-CT and its metabolites, studied using the same procedures, had
    properties very similar to those of the corresponding S-enantiomers.

• Reference to external data sources:
    Average relative in vivo abundances equivalent to the relative activity
    factors, were estimated using methods described in detail previously
    (Crespi, 1995; Venkatakrishnan et al., 1998 a,c, 1999, 2000, 2001;
    von Moltke et al., 1999 a,b; Störmer et al., 2000).

• Ways of describing meant for human eyes:
    Based on established index reactions, S-CT and S-DCT were negligible
    inhibitors (IC50 > 100 µM) of CYP1A2, -2C9, -2C19, -2E1, and -3A, and
    weakly inhibited CYP2D6 (IC50 = 70–80 µM).

• Many statements wrapped into one:
    S-CT was transformed to S-DCT by CYP2C19 (Km = 69 µM), CYP2D6 (Km
    = 29 µM), and CYP3A4 (Km = 588 µM).
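
The last two bullets can be made concrete with a minimal Python sketch of what a parser has to do before these sentences become separate machine-readable statements. This is an illustration only, not the DIKB method: the function names `expand_enzymes` and `split_km_statement` and the regular expressions are assumptions made for this example, and they are tuned to exactly these two sentences.

```python
import re

# The two example fragments from the slide, copied verbatim.
ENZYME_LIST = "CYP1A2, -2C9, -2C19, -2E1, and -3A"
KM_SENTENCE = ("S-CT was transformed to S-DCT by CYP2C19 (Km = 69 µM), "
               "CYP2D6 (Km = 29 µM), and CYP3A4 (Km = 588 µM).")


def expand_enzymes(text):
    """Expand 'CYP1A2, -2C9, ...' so every entry carries the full prefix."""
    first = re.search(r"\b([A-Z]+)(\d\w*)", text)
    if first is None:
        return []
    prefix = first.group(1)
    names = [first.group(0)]
    # Later '-2C9'-style fragments inherit the prefix of the first full name.
    for suffix in re.findall(r"(?<!\w)-(\d\w*)", text[first.end():]):
        names.append(prefix + suffix)
    return names


def split_km_statement(sentence):
    """Return one (substrate, product, enzyme, Km) tuple per enzyme mentioned."""
    head = re.match(r"(\S+) was transformed to (\S+) by (.*)", sentence)
    if head is None:
        return []
    substrate, product, rest = head.groups()
    # Each enzyme is followed by '(Km = <number> <unit>)'.
    pairs = re.findall(r"(CYP\w+) \(Km\s*=\s*(\d+)\s*\S+\)", rest)
    return [(substrate, product, enzyme, int(km)) for enzyme, km in pairs]


if __name__ == "__main__":
    print(expand_enzymes(ENZYME_LIST))
    for statement in split_km_statement(KM_SENTENCE):
        print(statement)
```

Even this toy version needs a separate hand-written pattern per sentence shape, which is the point of the slide: prose that is compact for a human reader hides several distinct statements.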
C2: Issue: Language is narrative
• ‘The truth can only be told in stories’
• Complex knowledge such as scientific theories,
  findings, conclusions have a narrative/rhetorical
  structure
• Typical pattern: claim/hypothesis, discussion of
  experimental findings, recap of claim, rebuttals,
  recap of claim
• Roughly the same claim appears 4 or 5 times in a
  paper
C2: Experiment: ‘Claimed Knowledge Updates’
C3: Issue: Language is abstract
“These results are consistent with those obtained by RPA
and demonstrate that AhR ligands suppress IL-6 mRNA levels
by approximately 40–60%.”
“Data presented in Figure 5A extend previous studies
performed with monocytes by demonstrating that
LPS induces NF-κB-DNA binding in bone marrow stromal cells.”
“An added incentive for these studies was provided by the
observation that the IL-6 gene promoter contains an NF-κB
binding site which plays a major role in regulating LPS-induced
IL-6 transcription [55-57].”
• Purple = deictic/anaphoric markers, pointing to current text
• Blue = metalanguage/epistemic evaluation
• Green = experimental method
• Red = conceptual claim
• Orange = claim referred to in other work
C3: Formal Language:
          Biological Expression Language
In a screen for miRNAs that cooperate with oncogenes in cellular transformation,
we identified miR-372 and miR-373, each permitting proliferation and tumorigenesis
of primary human cells that harbor both oncogenic RAS and active wild-type p53.
Increased abundance of miR-372 increases cell proliferation
r(MIR:miR-372) -> bp(GO:"Cell Proliferation")
Increased abundance of miR-372 increases tumorigenesis
r(MIR:miR-372) -> bp(GO:Tumorgenesis)

We provide evidence that these miRNAs are potential novel oncogenes
participating in the development of human testicular germ cell tumors by numbing
the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.
Increased abundance of miR-372 decreases activity of TP53
r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
Activity of TP53 decreases cell growth
Context: cancer
SET Disease = "Cancer"
tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
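
A rough Python sketch of how such a statement splits into a subject, a relation and an object. This is not a BEL parser: it only handles the short-form symbols `->` (increases) and `-|` (decreases), and the names `RELATIONS` and `parse_statement` are assumptions made for this illustration.

```python
import re

# Short-form relation symbols handled by this sketch, with their long names.
RELATIONS = {"->": "increases", "-|": "decreases"}


def parse_statement(statement):
    """Split 'subject SYMBOL object' into a (subject, relation, object) triple."""
    match = re.match(r"^(.*?)\s+(->|-\|)\s+(.*)$", statement.strip())
    if match is None:
        raise ValueError("no supported relation found in: " + statement)
    subject, symbol, obj = match.groups()
    return subject, RELATIONS[symbol], obj


if __name__ == "__main__":
    # Statement taken from the slide above.
    print(parse_statement("r(MIR:miR-372) -| tscript(p(HUGO:Trp53))"))
```

Once the claim is in this form, the direction of the effect is explicit rather than buried in phrases like "numbing the p53 pathway".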
C3: Experiment: add epistemic evaluation/
          knowledge attribution to BEL
For a Proposition P, an epistemically marked clause E is an
Evaluation of P, E_{V,B,S}(P), with:
-   V = Value:
         3 = Assumed true, 2 = Probable, 1 = Possible,
         0 = Unknown,
         -1 = Possibly untrue, -2 = Probably untrue, -3 = Assumed untrue
-   B = Basis:
         Reasoning
         Data
-   S = Source:
         A = speaker is author A, explicit
         IA = speaker is author A, implicit
         N = other author N, explicit
         NN = other author NN, implicit
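
The Value/Basis/Source scheme above can be sketched as a small Python data structure. This is one possible encoding, not the authors' implementation: the class name `Evaluation`, its field names and the example annotation at the bottom are assumptions made for illustration.

```python
from dataclasses import dataclass

# The three dimensions of the evaluation, as listed on the slide.
VALUE_LABELS = {
    3: "assumed true", 2: "probable", 1: "possible", 0: "unknown",
    -1: "possibly untrue", -2: "probably untrue", -3: "assumed untrue",
}
BASES = {"Reasoning", "Data"}
SOURCES = {
    "A": "speaker is author, explicit",
    "IA": "speaker is author, implicit",
    "N": "other author, explicit",
    "NN": "other author, implicit",
}


@dataclass(frozen=True)
class Evaluation:
    """An epistemic evaluation of a proposition P: value, basis and source."""
    proposition: str
    value: int
    basis: str
    source: str

    def __post_init__(self):
        # Reject anything outside the scheme so annotations stay comparable.
        if self.value not in VALUE_LABELS:
            raise ValueError(f"value must be in -3..3, got {self.value}")
        if self.basis not in BASES:
            raise ValueError(f"unknown basis: {self.basis}")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source}")

    def describe(self):
        return (f"{self.proposition} [{VALUE_LABELS[self.value]}; "
                f"basis: {self.basis}; source: {SOURCES[self.source]}]")


if __name__ == "__main__":
    # Hypothetical reading of one sentence; not an annotation from the study.
    example = Evaluation("AhR ligands suppress IL-6 mRNA levels", 3, "Data", "IA")
    print(example.describe())
```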
D. Collections of papers
D1: Can’t search papers easily
D2: Can’t connect papers well
D3: Can’t combine knowledge from different papers
D1: Searching collections of papers
• It is relatively easy to find a paper you are looking for:
  Google Scholar, Google,..., Scopus... (in that order?)
• But it is very hard to find if something was done about a
  certain topic (e.g. ‘citances’)
• And it’s impossible to know if nothing was done on a
  topic
• Why aren’t more people working on this?
• What happened to the semantic desktop??
D2: How do we connect papers?
• Papers exist within a con-text: preceding knowledge,
  succeeding knowledge, knowledge in your head or on
  your computer
• How can we annotate these relations, maintain
  connections, explore ones that others have made?
D2: Experiment:
Annotation in SWAN using DOMEO
[Figure: RDF annotation graph with parts labelled G1, G5 and G6,
using the properties rdf:type, dct:title,
swanrel:referencesAsSupportiveEvidence and pav:contributedBy]
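
A minimal Python sketch of what an annotation built from these properties could look like as plain subject/predicate/object triples. Only the four property names come from the figure. Every `ex:` identifier, the title string and the helper `objects` are invented for illustration, and no RDF library is used.

```python
# Each statement is a (subject, predicate, object) triple.
# All ex: identifiers and the title are hypothetical placeholders.
TRIPLES = [
    ("ex:claim1", "rdf:type", "ex:Claim"),
    ("ex:claim1", "dct:title", "Example claim text"),
    ("ex:claim1", "swanrel:referencesAsSupportiveEvidence", "ex:paper42"),
    ("ex:claim1", "pav:contributedBy", "ex:curator7"),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    # Which publications back the claim, and who made the annotation?
    print(objects(TRIPLES, "ex:claim1", "swanrel:referencesAsSupportiveEvidence"))
    print(objects(TRIPLES, "ex:claim1", "pav:contributedBy"))
```

Storing the link between a claim and its evidence as data, with a named contributor, is what lets others later explore connections they did not make themselves.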
D3: Tracing the heritage of a statement

• On paper, you can’t see whether a claim or a
  recommendation is valid
• E.g. for a clinical recommendation you need to check:
 – Is this statistically valid?
 – Was it shown for my patient?
 – Are there other things I need to know (side effects,
   funding, etc.)?
D3: Experiment:
          Linking Clinical Guidelines to Evidence
A. Philips’ Electronic Patient Records
B. Elsevier-published Clinical Guideline
C. Elsevier (or other publisher’s) Research Report or Data
Step 1: Patient data + diagnosis link to Guideline recommendation
Step 2: Guideline recommendation links to research report/data
D3: The reality of linking evidence:

Recommendation in Guideline: 5.1. Laboratory tests should include a CBC count
  with differential leukocyte count and platelet count;
  Level: A-III
  Evidence (in the text): No evidence in text
  Ref: No reference

Recommendation in Guideline: 5.2. measurement of serum levels of creatinine
  and blood urea nitrogen;
  Level: A-III
  Evidence (in the text): CBC counts and determination of the levels of serum
    creatinine and urea nitrogen are needed to plan supportive care and to
    monitor for the possible occurrence of drug toxicity.
  Ref: No reference

Recommendation in Guideline: 5.3. and measurement of electrolytes, hepatic
  transaminase enzymes, and total bilirubin (A-III).
  Level: A-III
  Evidence (in the text): No evidence in text
  Ref: No reference

Recommendation in Guideline: Not mentioned: GET ENOUGH BLOOD, IN TWO
  SEPARATE BOTTLES
  Evidence (in the text): The total volume of blood cultured is a crucial
    determinant of detecting a bloodstream infection [47]. (a ‘‘set’’ consists
    of 1 venipuncture or catheter access draw of 20 mL of blood divided into
    1 aerobic and 1 anaerobic blood culture bottle).
  Ref: [47]
  Recommendation in Reference: Our data, together with an analysis of previous
    studies, show that the yield of blood cultures in adults increases
    approximately 3% per millilitre of blood cultured.

Recommendation in Guideline: Not mentioned: REPEAT TESTS
  Evidence (in the text): These tests should be done at least every 3 days
    during the course of intensive antibiotic therapy. At least weekly
    monitoring of serum transaminase levels is advisable for ...
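
The gap this table shows can be checked mechanically once guideline rows are stored as records. The Python sketch below covers the three numbered recommendations only. The class `Recommendation`, its field names and the helper functions are assumptions for illustration, not the project's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    label: str                 # short name of the guideline recommendation
    level: str                 # evidence level stated in the guideline
    evidence_in_text: bool     # does the guideline text argue for it?
    reference: Optional[str]   # citation key, or None if nothing is cited


# The three numbered rows of the table, shortened.
ROWS = [
    Recommendation("5.1 CBC with differential and platelet count", "A-III", False, None),
    Recommendation("5.2 serum creatinine and blood urea nitrogen", "A-III", True, None),
    Recommendation("5.3 electrolytes, transaminases, total bilirubin", "A-III", False, None),
]


def without_reference(rows):
    """Recommendations that cannot be traced to any cited source."""
    return [r.label for r in rows if r.reference is None]


def without_any_support(rows):
    """Recommendations with neither supporting text nor a citation."""
    return [r.label for r in rows
            if r.reference is None and not r.evidence_in_text]


if __name__ == "__main__":
    print(without_reference(ROWS))
    print(without_any_support(ROWS))
```

Run on these rows, all three recommendations lack a reference and two of them lack any supporting text as well.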
In summary:
Type    Problems                    Experiments                Issues
A. Paper format:
A1      Two-dimensional             Utopia, Wolfram CDF        Standards, tools
A2      Linear                      ABCDE                      Adoption?
A3      Not interactive             Executable papers          Adoption
B. Writing habits
B1      Reference to papers         TAC: Citance summaries     Need to start at author
B2      Inexact entity references   NIF antibodies             Need mandate!
B3      Methods post-mortem         Data-centric publishing    Change research recording!
C. Language:
C1      Coherent                    DIKB                       Hard to parse!
C2      Narrative                   CKUs                       Fractal nature of paper
C3      Abstract                    BEL                        Formalize knowledge level
D. Collections of papers:
D1      Can’t find                  Scientific search engines? Is anyone working on this?
D2      Can’t compare               DOMEO/SWAN                 Manual, doesn’t scale
D3      Can’t combine               Evidence-based guidelines  Inconsistencies!
Have we solved the Big Problem?
1) Too many papers?
• Do not make publication numbers a factor in evaluation
• Do not make conference attendance contingent on publication
• Write fewer papers! Limit yourself to writing only what is
  significant and profound (and entertaining!)
2) Too little time to read?
• Collectively: change expectations of work in a day
• Make the grant process less of a waste of time and talent
• Reduce the burden of administration on (senior) scientists: reinstate
  departmental administrators!
• Teach administration as a class: Lethbridge journal incubator
• Make time to read some new (or old!) interesting work!
So how do we tackle all this?
• DERI-Elsevier collaboration - define research projects?
• Perhaps under aegis of Force11?
    • Dagstuhl Workshop in August of 2011: 35 invited
      attendees from different parts of science, industry,
      funding agencies, data centers
    • Goal: map main obstacles preventing new models
      of science publishing and develop ways to
      overcome them
    • Just received funding from the Sloan Foundation to:
      –Start online community
      –Hold next workshop
      –Collaboratively work on next steps
• Any thoughts?
Acknowledgements/collaborations:
1. Executable papers: Juliana Freire, NYU & Matthias Troyer, ETH Zurich
   (Vistrails); Micah Altman, Harvard SQSS (R), Gloriana St. Claire &
   Mahadev Satyanarayanan, CMU (Olive) (pending IMLS grant)
2. Citance summaries: Lucy Vanderwende, Microsoft Research; Hoa
   Trang, NIST; Eduard Hovy, ISI/USC
3. NIF antibodies: Maryann Martone, NIF/UCSD
4. Data-centric publishing: Phil Bourne, UCSD, Yolanda Gil, ISI/USC
   (funded in part by Elsevier Labs)
5. DIKB: Rich Boyce, U Pittsburgh, Jodi Schneider, DERI, Maria Liakata,
   EBI (looking for funding opportunities!)
6. CKUs: Agnes Sandor, Xerox Research Europe
7. BEL/knowledge attribution: Dexter Pratt, Selventa; Henk Pander Maat,
   University Utrecht (funded in part by NWO)
8. DOMEO/SWAN: Paolo Ciccarese & Tim Clark, Harvard/MGH (funded in
   part by Elsevier Labs)
9. Evidence-based guidelines: Paul Groth, Rinke Hoekstra, Frank van
   Harmelen, VU; Richard Vdovjak, Philips Research (funded by STW)
10. Force11: Phil Bourne, UCSD; Eduard Hovy, ISI/USC; Tim Clark,
   Harvard/MGH; Cameron Neylon, PLoS; Ivan Herman, W3C (funded in
   part by Sloan Foundation)
Anything here we can work on?
Type    Problems                    Experiments                 Issues
A. Paper format:
A1      Two-dimensional             Utopia, Wolfram CDF         Standards, tools
A2      Linear                      ABCDE                       Adoption?
A3      Not interactive             Executable papers           Adoption
B. Writing habits
B1      Reference to papers         TAC: Citance summaries      Need to start at author
B2      Inexact entity references   NIF antibodies              Need mandate!
B3      Methods post-mortem         Data-centric publishing     Change research recording!
C. Language:
C1      Coherent                    DIKB                        Hard to parse!
C2      Narrative                   CKUs                        Fractal nature of paper
C3      Abstract                    BEL                         Formalize knowledge level
D. Collections of papers:
D1      Can’t find                  Scientific search engines?  Is anyone working on this?
D2      Can’t compare               DOMEO/SWAN                  Manual, doesn’t scale
D3      Can’t combine               Evidence-based guidelines   Inconsistencies!
Writing less and reading more       Force11, perhaps?           Social/political/personal!
What about writing completely differently?
  Internet of things: (Bleecker, [1])
  Interact with ‘objects that blog’ or ‘Blogjects’, that:
  • track where they are and where they’ve been;
  • have histories of their encounters and experiences;
  • have agency - an assertive voice on the social web [1]
  Research Objects: (Bechhofer et al., [2])
  Create semantically rich aggregations of resources
  that can possess some scientific intent or support
  some research objective
  Networked Knowledge: (Neylon, [3])
  If we care about taking advantage of the web and
  internet for research then we must tackle the building
  of scholarly communication networks.
  These networks will have two critical characteristics:
  scale and a lack of friction. [3]

[1] Bleecker, J. ‘A Manifesto for Networked Objects — Cohabiting with Pigeons, Arphids and Aibos in the Internet of Things’,
http://nearfuturelaboratory.com/2006/02/26/a-manifesto-for-networked-objects/
[2] Bechhofer, S., De Roure, D., Gamble, M., Goble, C. and Buchan, I. (2010) Research Objects: Towards Exchange and
Reuse of Digital Knowledge. In: The Future of the Web for Collaborative Science (FWCS 2010), April 2010, Raleigh, NC, USA.
http://precedings.nature.com/documents/4626/version/1
[3] Neylon, C. ‘Network Enabled Research: Maximise scale and connectivity, minimise friction’,
http://cameronneylon.net/blog/network-enabled-research/
Networked science in action:
• Galaxy Zoo: citizen science: classify galaxies in the comfort
  of your own home – like Hanny!
• Tim Gowers, Polymath: “This is to normal research as driving is
  to pushing a car”
• Mathoverflow: virtual network of mathematicians working
  collectively to answer big/small, clear/fuzzy questions
• Jean-Claude Bradley: ‘short-form chemistry’: tweet/blog
  about an experiment, Storify into a narrative
• Read Cameron Neylon’s blog on networked science!




Anything here we can work on?
Type  Problems                       Experiments                  Issues
A. Paper format:
A1    Two-dimensional                Utopia, Wolfram CDF          Standards, tools
A2    Linear                         ABCDE                        Adoption?
A3    Not interactive                Executable papers            Adoption
B. Writing habits:
B1    Reference to papers            TAC: Citance summaries       Need to start at author
B2    Inexact entity references      NIF antibodies               Need mandate!
B3    Methods post-mortem            Data-centric publishing      Change research recording!
C. Language:
C1    Coherent                       DIKB                         Hard to parse!
C2    Narrative                      CKUs                         Fractal nature of paper
C3    Abstract                       BEL                          Formalize knowledge level
D. Collections of papers:
D1    Can’t find                     Scientific search engines?   Is anyone working on this?
D2    Can’t compare                  DOMEO/SWAN                   Manual, doesn’t scale
D3    Can’t combine                  Evidence-based guidelines    Inconsistencies!
Networked science                    Mathoverflow, Bradley        But is it science?
Writing less and reading more        Force11, perhaps?            Social/political/personal!

What’s wrong with research papers - and (how) can we fix it?

  • 1. What’s wrong with research papers - and (how) can we fix it? Anita de Waard, Disruptive Technologies Director, Elsevier Labs, a.dewaard@elsevier.com, http://elsatglabs.com/labs/anita
  • 5. The Big Problem: 1) There are too many papers 2) We have too little time to read them
  • 8. To address this problem, we make: • databases • text mining tools • nanopublications • data publications • wiki publications • ontologies; ontology integration tools • workflow/data integration systems • executable components • ... and write emails/grants/papers/blogs about this... • ... and we end up with: 1) Even more papers!! 2) Even less time to read them!!
  • 13. What problems are we solving? • We’re mostly improving the format of the research article. • This talk: aspects of the format that are being improved (and some examples of work to improve them): A. Issues with the paper format B. Issues pertaining to habits of writing C. Issues inherent to language D. Issues in trying to create connected content • Do any of these address the Big Problem? • What shall we do about it?
  • 18. A. Issue: the paper format A1: Paper is two-dimensional A2: Paper is linear A3: Paper is not interactive
  • 22. A1: Issue: paper is two-dimensional • Some experiments: allow representations of interactive figures (Wolfram Alpha), Utopia, Chem-3d • Lack of experimentation with formats: the internet is multi-dimensional, so why do we still need page limits?
  • 26. A2: Issue: paper is linear • Read from front to back (although research suggests a quick skim to core parts, but linearity helps us do that) • References are at the end, so your reading is not interrupted • Headers are sequential - and not directly accessible
  • 30. A2: (Old) Experiment: ABCDE • LaTeX Stylesheet: – Annotation – Background – Contribution – Discussion – Entities (references, projects, terms in ontologies, etc) in RDF – Core sentences create structured abstract • E.g. in proceedings: collect all core Contribution components • I still have the stylesheets, if anyone’s interested :-)!
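
The proceedings use case above (collect all core Contribution components) can be sketched in a few lines. This is a minimal illustration, assuming each paper has already been split into ABCDE sections with its core sentences flagged; the `Sentence` type, the `core_contributions` function and the sample papers are hypothetical and are not taken from the original LaTeX stylesheet.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    section: str        # e.g. "Background", "Contribution", "Discussion"
    text: str
    core: bool = False  # True if the author marked this as a core sentence


def core_contributions(proceedings: dict[str, list[Sentence]]) -> dict[str, list[str]]:
    """Map each paper title to its core Contribution sentences."""
    return {
        title: [s.text for s in sentences if s.core and s.section == "Contribution"]
        for title, sentences in proceedings.items()
    }


# Made-up proceedings with two papers.
proceedings = {
    "Paper A": [
        Sentence("Background", "Prior tools ignore structure.", core=True),
        Sentence("Contribution", "We introduce a structured format.", core=True),
        Sentence("Contribution", "Implementation details follow."),
    ],
    "Paper B": [
        Sentence("Contribution", "We evaluate the format on proceedings.", core=True),
    ],
}
print(core_contributions(proceedings))
```

Because the section label and the core flag are explicit, the same filter could build a structured abstract per paper by selecting core sentences from every section instead of only Contribution.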
  • 32. A3: Paper is not interactive • Experiment: Executable papers: – Run code within a paper – Experiments: R, SPSS, VisTrails – Rerender code within a paper, change algorithm/see effect; run different dataset – How do you archive software? Satyanarayanan at CMU: Olive, ‘Internet ecosystem of curated virtual machine image collections’
  • 36. B. Issue: habits of writing B1: Cite a paper - not a claim B2: No precision in describing entities B3: We write post-mortems (stories :-)!)
  • 41. B1: Citations create facts: - Voorhoeve, 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.” - Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).” - Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).” - Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”
  • 45. B1: TAC2012: Add author’s text to citation Voorhoeve, P. M.; le Sage, C et al. (2006). A Genetic Screen Implicates miRNA-372 and miRNA-373 As Oncogenes in Testicular Germ Cell Tumors, Cell 124 (6) pp. 1169-1181 Citing goal: “To perform genetic screens for novel functions of miRNAs,” − in order to identify miRNAs functionally associated with carcinogenesis − to identify miRNAs that when overexpressed could substitute for p53 loss and allow continued proliferation in the context of Ras activation Citing method: “We subsequently created a human miRNA expression library (miR-Lib) by cloning almost all annotated human miRNAs into our vector (Rfam release 6) (Figure S3).” − Voorhoeve et al. (116) employed a novel strategy by combining an miRNA vector library and corresponding bar code array − using a retroviral expression library of miRNAs, − Using a novel retroviral miRNA expression library, Agami and co-workers performed a cell-based screen Citing result: “we identified miR-372-373, each permitting proliferation and tumorigenesis of primary human cells that harbor both oncogenic RAS and active wildtype p53.” − miR-372 and miR-373 were consequently found to permit proliferation and tumorigenesis of these primary cells carrying both oncogenic RAS and wild-type p53, − Voorhoeve et al. (2006) identified miR-372 and miR-373 − miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53,
  • 48. B2: Issue: entities in papers are not exact • Midfrontal cortex tissue samples from neurologically unimpaired subjects (n9) and from subjects with AD (n11) were obtained from the Rapid Autopsy Program • Immunoblot analysis and antibodies • The following antibodies were used for immunoblotting: -actin mAb (1:10,000 dilution, Sigma-Aldrich); -tubulin mAb (1:10,000, Abcam); T46 mAb (specific to tau 404–441, 1:1000, Invitrogen); Tau-5 mAb (human tau 218–225, 1:1000, BD Biosciences) (Porzig et al., 2007); AT8 mAb (phospho-tau Ser199, Ser202, and Thr205, 1:500, Innogenetics); PHF-1 mAb (phospho-tau Ser396 and Ser404, 1:250, gift from P. Davies); 12E8 mAb (phospho-tau Ser262 and Ser356, 1:1000, gift from P. Seubert); NMDA receptors 2A, 2B and 2D goat pAbs (C terminus, 1:1000, Santa Cruz Biotechnology)… • 95 antibodies were identified in 8 articles • 52 did not contain enough information to determine the antibody used Maryann Martone, Jan 2012: 2012 ACM SIGHIT International Health Informatics Symposium (IHI 2012)
  • 51. B3: Issue: methods are written post-mortem • Yolanda Gil at ISI modeled Bourne et al. paper in Wings • Anecdotal evidence: Phil Bourne couldn’t remember most of this, even after digging through emails!
  • 52. B3: So why not write the data first and wrap the paper around it??
  • 53. B3: So why not write the data first and wrap the paper around it?? metadata 1. Research: Each item in the system has metadata (including metadata provenance) and relations to other data items added to it. metadata metadata metadata
  • 54. B3: So why not write the data first and wrap the paper around it?? metadata 1. Research: Each item in the system has metadata (including metadata provenance) and relations to other data items added to it. 2. Workflow: All data items created in the lab are added to a metadata (lab-owned) workflow system. metadata metadata
  • 55. B3: So why not write the data first and wrap the paper around it?? metadata 1. Research: Each item in the system has metadata (including metadata provenance) and relations to other data items added to it. 2. Workflow: All data items created in the lab are added to a metadata (lab-owned) workflow system. 3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document. metadata metadata Rats  were  subjected  to  two  grueling   tests (click  on  fig  2  to  see  underlying  data).   These  results  suggest  that  the   neurological  pain  pro-­‐
  • 56. B3: So why not write the data first and wrap the paper around it?? metadata 1. Research: Each item in the system has metadata (including metadata provenance) and relations to other data items added to it. 2. Workflow: All data items created in the lab are added to a metadata (lab-owned) workflow system. 3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document. metadata 4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. metadata Reports are stored in the authoring/editing system, the paper gets updated, until it is validated. Rats  were  subjected  to  two  grueling   tests (click  on  fig  2  to  see  underlying  data).   These  results  suggest  that  the   neurological  pain  pro-­‐ Review Revise Edit
  • 57. B3: So why not write the data first and wrap the paper around it?? metadata 1. Research: Each item in the system has metadata (including metadata provenance) and relations to other data items added to it. 2. Workflow: All data items created in the lab are added to a metadata (lab-owned) workflow system. 3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document. metadata 4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. metadata Reports are stored in the authoring/editing system, the paper gets updated, until it is validated. 5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data item, and its heritage can Rats  were  subjected  to  two  grueling   be traced. tests (click  on  fig  2  to  see  underlying  data).   These  results  suggest  that  the   neurological  pain  pro-­‐ Review Revise Edit
  • 58. B3: So why not write the data first and wrap the paper around it?
    1. Research: Each item in the system has metadata (including metadata provenance) and relations to other data items added to it.
    2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system.
    3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document.
    4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, and the paper gets updated until it is validated.
    5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data item, and its heritage can be traced.
    6. User applications: distributed applications run on this ‘exposed data’ universe.
    [Figure: data items tagged with metadata pass through a Review / Revise / Edit cycle; sample paper text: “Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-”; label: Some other publisher]
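    The idea that a published statement stays connected to its data item, so that its heritage can be traced, can be sketched in a few lines. This is only an illustration under assumptions: the class name `DataItem`, its fields, and the identifiers `raw-001`, `fig-2` and `claim-1` are hypothetical and do not come from any system named in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """Hypothetical lab data item: metadata plus links to the items it was derived from."""
    item_id: str
    metadata: dict
    derived_from: list = field(default_factory=list)  # parent DataItem objects

    def heritage(self):
        """Return the ids of all ancestors, nearest first, without repeats."""
        seen, order, queue = set(), [], list(self.derived_from)
        while queue:
            parent = queue.pop(0)
            if parent.item_id not in seen:
                seen.add(parent.item_id)
                order.append(parent.item_id)
                queue.extend(parent.derived_from)
        return order


# Illustrative chain: raw measurement -> figure -> statement in the paper.
raw = DataItem("raw-001", {"kind": "measurement", "created_by": "lab workflow"})
fig = DataItem("fig-2", {"kind": "figure"}, derived_from=[raw])
claim = DataItem("claim-1", {"kind": "paper statement"}, derived_from=[fig])

print(claim.heritage())  # ['fig-2', 'raw-001']
```

    Walking the `derived_from` links breadth-first is one simple way to answer "where did this statement come from?"; a real workflow system would also need to store who added each link and when.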
  • 62. C. Issue: language
    C1: Language is coherent
    C2: Language is narrative
    C3: Language is abstract
  • 67. C1: Language is coherent: Adding drug-drug interactions to DIKB
    • Drug-Interaction Knowledge Base: clinically oriented, evidence-based knowledge base designed to support adding data to product inserts
    • Contains quantitative and qualitative assertions about drug mechanisms and pharmacokinetic drug-drug interactions (DDI) for over 60 drugs
    • HCLS Sig: Currently working on expanding the DIKB with more content and making a “mash-up” view of package inserts that adds up-to-date information
    View project: http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
    SPARQL endpoint: http://dbmi-icode-01.dbmi.pitt.edu:2020/directory/Drugs
  • 75. C1: Coherent language is hard to parse
    • Self-reference: R-CT and its metabolites, studied using the same procedures, had properties very similar to those of the corresponding S-enantiomers.
    • Reference to external data sources: Average relative in vivo abundances equivalent to the relative activity factors, were estimated using methods described in detail previously (Crespi, 1995; Venkatakrishnan et al., 1998 a,c, 1999, 2000, 2001; von Moltke et al., 1999 a,b; Störmer et al., 2000).
    • Ways of describing meant for human eyes: Based on established index reactions, S-CT and S-DCT were negligible inhibitors (IC50 > 100 µM) of CYP1A2, -2C9, -2C19, -2E1, and -3A, and weakly inhibited CYP2D6 (IC50 = 70–80 µM)
    • Many statements wrapped into one: S-CT was transformed to S-DCT by CYP2C19 (Km = 69 µM), CYP2D6 (Km = 29 µM), and CYP3A4 (Km = 588 µM).
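    To make the "many statements wrapped into one" point concrete, the sketch below unpacks that one sentence into three separate assertions. It is a toy under assumptions: the regular expressions are written for this exact sentence pattern only, and the dictionary keys (`substrate`, `product`, `enzyme`, `km_uM`) are made up for illustration, not taken from the DIKB schema.

```python
import re

sentence = ("S-CT was transformed to S-DCT by CYP2C19 (Km = 69 µM), "
            "CYP2D6 (Km = 29 µM), and CYP3A4 (Km = 588 µM).")

# Substrate and product come from the fixed phrase "X was transformed to Y by".
head = re.match(r"(\S+) was transformed to (\S+) by", sentence)
substrate, product = head.group(1), head.group(2)

# Each enzyme is followed by its own Km value in parentheses.
pairs = re.findall(r"(CYP\w+) \(Km = (\d+) µM\)", sentence)

# One assertion per enzyme: the single sentence carried three statements.
assertions = [
    {"substrate": substrate, "product": product, "enzyme": enzyme, "km_uM": int(km)}
    for enzyme, km in pairs
]

for a in assertions:
    print(a)
```

    The pattern breaks as soon as the wording changes (for example the "-2C9, -2C19" shorthand in the previous bullet), which is the slide's point: prose written for human readers resists simple parsing.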
  • 80. C2: Issue: Language is narrative
    • ‘The truth can only be told in stories’
    • Complex knowledge such as scientific theories, findings, conclusions have a narrative/rhetorical structure
    • Typical pattern: claim/hypothesis, discussion of experimental findings, recap of claim, rebuttals, recap of claim
    • Roughly the same claim appears 4 or 5 times in a paper
  • 83. C3: Issue: Language is abstract
    “These results are consistent with those obtained by RPA and demonstrate that AhR ligands suppress IL-6 mRNA levels by approximately 40–60%.”
    “Data presented in Figure 5A extend previous studies performed with monocytes by demonstrating that LPS induces NF-κB-DNA binding in bone marrow stromal cells.”
    “An added incentive for these studies was provided by the observation that the IL-6 gene promoter contains an NF-κB binding site which plays a major role in regulating LPS-induced IL-6 transcription [55-57].”
    Colour key for the highlighted phrases:
    • Purple = deictic/anaphoric markers, pointing to current text
    • Blue = metalanguage/epistemic evaluation
    • Green = experimental method
    • Red = conceptual claim
    • Orange = claim referred to in other work
  • 84. C3: Formal Language: Biological Exchange Language
    Text: “In a screen for miRNAs that cooperate with oncogenes in cellular transformation, we identified miR-372 and miR-373, each permitting proliferation and tumorigenesis of primary human cells that harbor both oncogenic RAS and active wild-type p53.”
    Increased abundance of miR-372 increases cell proliferation:
      r(MIR:miR-372) -> bp(GO:"Cell Proliferation")
    Increased abundance of miR-372 increases tumorigenesis:
      r(MIR:miR-372) -> bp(GO:Tumorigenesis)
    Text: “We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.”
    Increased abundance of miR-372 decreases activity of TP53:
      r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
    Context: cancer:
      SET Disease = "Cancer"
    Activity of TP53 decreases cell growth:
      tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
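    A formal statement like these is easy for a program to take apart, which is the attraction of BEL over prose. The sketch below handles only the simple `subject relation object` shape and assumes the BEL convention that `->` means "increases" and `-|` means "decreases"; the function name `parse_statement` is made up here, and this is not a BEL parser.

```python
import re

# Assumed relation symbols: "->" increases, "-|" decreases.
RELATIONS = {"->": "increases", "-|": "decreases"}


def parse_statement(text):
    """Split 'subject REL object' into a (subject, relation, object) triple."""
    match = re.match(r"^(.+?)\s+(->|-\|)\s+(.+)$", text.strip())
    if match is None:
        raise ValueError(f"not a simple subject-relation-object statement: {text!r}")
    subject, symbol, obj = match.groups()
    return subject, RELATIONS[symbol], obj


print(parse_statement('r(MIR:miR-372) -| tscript(p(HUGO:Trp53))'))
print(parse_statement('tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")'))
```

    Each call returns a triple such as `('r(MIR:miR-372)', 'decreases', 'tscript(p(HUGO:Trp53))')`, which could then be loaded into a graph and chained: miR-372 decreases TP53 activity, and TP53 activity decreases cell growth.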
  • 86. C3: Experiment: add epistemic evaluation/knowledge attribution to BEL
    For a Proposition P, an epistemically marked clause E is an Evaluation of P, E_{V,B,S}(P), with:
    - V = Value: 3 = assumed true, 2 = probable, 1 = possible, 0 = unknown (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
    - B = Basis: Reasoning, Data
    - S = Source: A = speaker is author A, explicit; IA = speaker is author A, implicit; N = other author N, explicit; NN = other author NN, implicit
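    The Value/Basis/Source scheme maps naturally onto a small record type. The sketch below is one possible encoding, assuming that Basis takes the two values Reasoning and Data; the class names, the `label` format, and the scoring of the example sentence (taken from the earlier slide on abstract language) are illustrative choices, not part of BEL or of the experiment described.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Value(IntEnum):
    ASSUMED_TRUE = 3
    PROBABLE = 2
    POSSIBLE = 1
    UNKNOWN = 0
    POSSIBLY_UNTRUE = -1
    PROBABLY_UNTRUE = -2
    ASSUMED_UNTRUE = -3


class Basis(Enum):
    REASONING = "Reasoning"
    DATA = "Data"


class Source(Enum):
    A = "speaker is author, explicit"
    IA = "speaker is author, implicit"
    N = "other author, explicit"
    NN = "other author, implicit"


@dataclass(frozen=True)
class Evaluation:
    """An epistemic evaluation E[V,B,S](P) of a proposition P."""
    proposition: str
    value: Value
    basis: Basis
    source: Source

    def label(self):
        return f"E[{int(self.value)},{self.basis.name[0]},{self.source.name}]({self.proposition})"


# Illustrative scoring of "These results ... demonstrate that AhR ligands suppress IL-6 mRNA levels".
ev = Evaluation("AhR ligands suppress IL-6 mRNA levels",
                Value.ASSUMED_TRUE, Basis.DATA, Source.A)
print(ev.label())  # E[3,D,A](AhR ligands suppress IL-6 mRNA levels)
```

    Using an integer enum for Value keeps the ordering from the slide, so evaluations can be compared or filtered by strength (for example, keeping only claims with value 2 or higher).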
  • 90. D. Collections of papers
    D1: Canʼt search papers easily
    D2: Canʼt connect papers well
    D3: Canʼt combine knowledge from different papers
  • 96. D1: Searching collections of papers
    • It is relatively easy to find a paper you are looking for: Google Scholar, Google,..., Scopus... (in that order?)
    • But it is very hard to find if something was done about a certain topic (e.g. ‘citances’)
    • And it’s impossible to know if nothing was done on a topic
    • Why aren’t more people working on this?
    • What happened to the semantic desktop?
  • 99. D2: How do we connect papers?
    • Papers exist within a con-text: preceding knowledge, succeeding knowledge, knowledge in your head or on your computer
    • How can we annotate these relations, maintain connections, explore ones that others have made?
  • 100. D2: Experiment: Annotation in SWAN using DOMEO
    [Figure: RDF annotation graph using the properties rdf:type, dct:title, swanrel:referencesAsSupportiveEvidence and pav:contributedBy, with named graphs G1, G5 and G6]
  • 106. D3: Tracing the heritage of a statement
    • On paper, you can’t see whether a claim or a recommendation is valid
    • E.g. required to check for clinical recommendations:
      – Is this statistically valid?
      – Was it shown for my patient?
      – Are there other things I need to know (side effects, funding, etc.)?
  • 109. D3: Experiment: Linking Clinical Guidelines to Evidence
    A. Philips’ Electronic Patient Records
    B. Elsevier-published Clinical Guideline
    C. Elsevier (or other publisher’s) Research Report or Data
    Step 1: Patient data + diagnosis link to Guideline recommendation
    Step 2: Guideline recommendation links to research report/data
  • 110. D3: The reality of linking evidence:
    Recommendation in Guideline | Level | Evidence (in the text) | Ref | Recommendation in Reference
    5.1. Laboratory tests should include a CBC count with differential leukocyte count and platelet count; | A-III | No evidence in text | No reference |
    5.2. measurement of serum levels of creatinine and blood urea nitrogen; | A-III | CBC counts and determination of the levels of serum creatinine and urea nitrogen are needed to plan supportive care and to monitor for the possible occurrence of drug toxicity. | No reference |
    5.3. and measurement of electrolytes, hepatic transaminase enzymes, and total bilirubin (A-III). | A-III | No evidence in text | No reference |
    Not mentioned: GET ENOUGH BLOOD, IN TWO SEPARATE BOTTLES | | The total volume of blood cultured is a crucial determinant of detecting a bloodstream infection [47]. (a ‘‘set’’ consists of 1 venipuncture or catheter access draw of 20 mL of blood divided into 1 aerobic and 1 anaerobic blood culture bottle). | [47] | Our data, together with an analysis of previous studies, show that the yield of blood cultures in adults increases approximately 3% per millilitre of blood cultured.
    Not mentioned: REPEAT TESTS | | These tests should be done at least every 3 days during the course of intensive antibiotic therapy. At least weekly monitoring of serum transaminase levels is advisable for | |
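    Once recommendations, their supporting text and their references are held as structured records, gaps like the ones in this table can be listed mechanically. The sketch below encodes only the three numbered recommendations shown above; the `Recommendation` class and its field names are hypothetical and are not taken from the guideline-linking experiment itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    rec_id: str
    level: str
    evidence_in_text: Optional[str]  # None when the guideline text gives no evidence
    reference: Optional[str]         # None when no reference is cited


recs = [
    Recommendation("5.1", "A-III", None, None),
    Recommendation(
        "5.2", "A-III",
        "CBC counts and determination of the levels of serum creatinine and urea "
        "nitrogen are needed to plan supportive care and to monitor for the "
        "possible occurrence of drug toxicity.",
        None,
    ),
    Recommendation("5.3", "A-III", None, None),
]

# Recommendations that cannot be traced to a research report.
no_reference = [r.rec_id for r in recs if r.reference is None]
# Recommendations with neither supporting text nor a reference.
no_support = [r.rec_id for r in recs
              if r.reference is None and r.evidence_in_text is None]

print(no_reference)  # ['5.1', '5.2', '5.3']
print(no_support)    # ['5.1', '5.3']
```

    The reverse check (evidence in the text that no numbered recommendation picks up, such as the blood-volume finding) needs the free-text rows to be linked by hand first, which is where the inconsistencies show up.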
  • 111. In summary:
    Type | Problems | Experiments | Issues
    A. Paper format:
    A1 | Two-dimensional | Utopia, Wolfram CDF | Standards, tools
    A2 | Linear | ABCDE | Adoption?
    A3 | Not interactive | Executable papers | Adoption
    B. Writing habits:
    B1 | Reference to papers | TAC: Citance summaries | Need to start at author
    B2 | Inexact entity references | NIF antibodies | Need mandate!
    B3 | Methods post-mortem | Data-centric publishing | Change research recording!
    C. Language:
    C1 | Coherent | DIKB | Hard to parse!
    C2 | Narrative | CKUs | Fractal nature of paper
    C3 | Abstract | BEL | Formalize knowledge level
    D. Collections of papers:
    D1 | Can’t find | Scientific search engines? | Is anyone working on this?
    D2 | Can’t compare | DOMEO/SWAN | Manual, doesn’t scale
    D3 | Can’t combine | Evidence-based guidelines | Inconsistencies!
  • 113. Have we solved the Big Problem?
    1) Too many papers?
      • Do not make publication numbers a factor in evaluation
      • Do not make conference attendance contingent on publication
      • Write fewer papers! Limit yourself to writing only what is significant and profound (and entertaining!)
    2) Too little time to read?
      • Collectively: change expectations of what can be done in a day
      • Make the grant process less of a waste of time and talent
      • Reduce the burden of administration on (senior) scientists: reinstate departmental administrators!
      • Teach administration as a class: Lethbridge journal incubator
      • Make time to read some new (or old!) interesting work!
  • 114. So how do we tackle all this?
    • DERI-Elsevier collaboration - define research projects?
    • Perhaps under aegis of Force11?
    • Dagstuhl Workshop in August of 2011: 35 invited attendees from different parts of science, industry, funding agencies, data centers
    • Goal: map main obstacles preventing new models of science publishing and develop ways to overcome them
    • Just received funding from Sloan Foundation to:
      – Start online community
      – Hold next workshop
      – Collaboratively work on next steps
    • Any thoughts?
  • 115. Acknowledgements/collaborations:
    1. Executable papers: Juliana Freire, NYU & Matthias Troyer, ETH Zurich (Vistrails); Micah Altman, Harvard SQSS (R), Gloriana St. Claire & Mahadev Satyanarayanan, CMU (Olive) (pending IMLS grant)
    2. Citance summaries: Lucy Vanderwende, Microsoft Research; Hoa Trang, NIST; Eduard Hovy, ISI/USC
    3. NIF antibodies: Maryann Martone, NIF/UCSD
    4. Data-centric publishing: Phil Bourne, UCSD, Yolanda Gil, ISI/USC (funded in part by Elsevier Labs)
    5. DIKB: Rich Boyce, U Pittsburgh, Jodi Schneider, DERI, Maria Liakata, EBI (looking for funding opportunities!)
    6. CKUs: Agnes Sandor, Xerox Research Europe
    7. BEL/knowledge attribution: Dexter Pratt, Selventa; Henk Pander Maat, University Utrecht (funded in part by NWO)
    8. DOMEO/SWAN: Paolo Ciccarese & Tim Clark, Harvard/MGH (funded in part by Elsevier Labs)
    9. Evidence-based guidelines: Paul Groth, Rinke Hoekstra, Frank van Harmelen, VU; Richard Vdovjak, Philips Research (funded by STW)
    10. Force11: Phil Bourne, UCSD; Eduard Hovy, ISI/USC; Tim Clark, Harvard/MGH; Cameron Neylon, PLoS; Ivan Herman, W3C (funded in part by Sloan Foundation)
  • 116. Anything here we can work on?
    Type | Problems | Experiments | Issues
    A. Paper format:
    A1 | Two-dimensional | Utopia, Wolfram CDF | Standards, tools
    A2 | Linear | ABCDE | Adoption?
    A3 | Not interactive | Executable papers | Adoption
    B. Writing habits:
    B1 | Reference to papers | TAC: Citance summaries | Need to start at author
    B2 | Inexact entity references | NIF antibodies | Need mandate!
    B3 | Methods post-mortem | Data-centric publishing | Change research recording!
    C. Language:
    C1 | Coherent | DIKB | Hard to parse!
    C2 | Narrative | CKUs | Fractal nature of paper
    C3 | Abstract | BEL | Formalize knowledge level
    D. Collections of papers:
    D1 | Can’t find | Scientific search engines? | Is anyone working on this?
    D2 | Can’t compare | DOMEO/SWAN | Manual, doesn’t scale
    D3 | Can’t combine | Evidence-based guidelines | Inconsistencies!
     | Writing less and reading more | Force11, perhaps? | Social/political/personal!
  • 123. What about writing completely differently?
    Internet of things (Bleecker, [1]): Interact with ‘objects that blog’ or ‘Blogjects’, that:
      - track where they are and where they’ve been;
      - have histories of their encounters and experiences;
      - have agency - an assertive voice on the social web
    Research Objects (Bechhofer et al., [2]): Create semantically rich aggregations of resources, that can possess some scientific intent or support some research objective
    Networked Knowledge (Neylon, [3]): If we care about taking advantage of the web and internet for research then we must tackle the building of scholarly communication networks. These networks will have two critical characteristics: scale and a lack of friction.
    [1] Bleecker, J. ‘A Manifesto for Networked Objects — Cohabiting with Pigeons, Arphids and Aibos in the Internet of Things’, http://nearfuturelaboratory.com/2006/02/26/a-manifesto-for-networked-objects/
    [2] Bechhofer, S., De Roure, D., Gamble, M., Goble, C. and Buchan, I. (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge. In: The Future of the Web for Collaborative Science (FWCS 2010), April 2010, Raleigh, NC, USA. http://precedings.nature.com/documents/4626/version/1
    [3] Neylon, C. ‘Network Enabled Research: Maximise scale and connectivity, minimise friction’, http://cameronneylon.net/blog/network-enabled-research/
  • 124. Networked science in action:
    • Galaxy Zoo: citizen science: classify galaxies in the comfort of your own home – like Hanny!
    • Tim Gowers, Polymath: “This is to normal research as driving is to pushing a car”
    • Mathoverflow: virtual network of mathematicians working collectively to answer big/small, clear/fuzzy questions
    • Jean-Claude Bradley: ‘short-form chemistry’: tweet/blog about an experiment, Storify into a narrative
    • Read Cameron Neylon’s blog on networked science!
  • 125. Anything here we can work on?
    Type | Problems | Experiments | Issues
    A. Paper format:
    A1 | Two-dimensional | Utopia, Wolfram CDF | Standards, tools
    A2 | Linear | ABCDE | Adoption?
    A3 | Not interactive | Executable papers | Adoption
    B. Writing habits:
    B1 | Reference to papers | TAC: Citance summaries | Need to start at author
    B2 | Inexact entity references | NIF antibodies | Need mandate!
    B3 | Methods post-mortem | Data-centric publishing | Change research recording!
    C. Language:
    C1 | Coherent | DIKB | Hard to parse!
    C2 | Narrative | CKUs | Fractal nature of paper
    C3 | Abstract | BEL | Formalize knowledge level
    D. Collections of papers:
    D1 | Can’t find | Scientific search engines? | Is anyone working on this?
    D2 | Can’t compare | DOMEO/SWAN | Manual, doesn’t scale
    D3 | Can’t combine | Evidence-based guidelines | Inconsistencies!
     | Networked science | Mathoverflow, Bradley | But is it science?
     | Writing less and reading more | Force11, perhaps? | Social/political/personal!