Measuring progress
toward a cultural norm of
  shared (and reused!)
biomedical research data
          Heather Piwowar

   Department of Biomedical Informatics
         University of Pittsburgh
Sharing research data




http://upload.wikimedia.org/wikipedia/commons/7/76/PeptideMSMS.jpg
http://en.wikipedia.org/wiki/Image:Helices.png
http://en.wikipedia.org/wiki/Image:Heatmap.png
http://en.wikipedia.org/wiki/Image:Microarray2.gif
http://zellig.cpmc.columbia.edu/medlee/demo/
http://www.plosone.org/article/fetchArticle.action?articleURI=info:doi/10.1371/journal.pone.0000441
Sharing research data




         PAST MEDICAL HISTORY:
         Past medical history showed she had
         superficial phlebitis times two in the past, had
         non-insulin dependent diabetes mellitus for
         four years.
         She had been hypothyroid for three years.
         HISTORY OF PRESENT ILLNESS:
         The patient is a 58-year-old female, …
http://upload.wikimedia.org/wikipedia/commons/7/76/PeptideMSMS.jpg
http://en.wikipedia.org/wiki/Image:Helices.png
http://en.wikipedia.org/wiki/Image:Heatmap.png
http://en.wikipedia.org/wiki/Image:Microarray2.gif
http://zellig.cpmc.columbia.edu/medlee/demo/
http://www.plosone.org/article/fetchArticle.action?articleURI=info:doi/10.1371/journal.pone.0000441
Shared data benefits science
 Verify
 Understand
 Extend
 Explore
 Combine
 Synergize
 Train
 Reduce
But... costly for authors
    Find
    Organize
    Document
    Deidentify
    Format
    Decide
    Ask
    Submit

    Answer questions
    Worry about mistakes being found
    Worry about data being misinterpreted
    Worry about being scooped
    Forgo money and IP and prestige???
As a result, policy makers have spent 
 lots of time and money ....




                      http://www.flickr.com/photos/johnnyvulkan/381941233/
                           http://www.flickr.com/photos/tonivc/2283676770/
... on initiatives, requests, 
  requirements, and tools
     NIH data sharing plan requirement

     Journal requirements

     Public databases

     Data sharing grids like BIRN and caBIG

     Data formatting standards

     Editorials, letters to the editor, discussion....
http://www.flickr.com/photos/mesh/14102209/
lots of data sharing!




                        http://www.genome.jp/en/db_growth.html
but how much isn’t 
 shared?

  what isn’t shared?
              who isn’t sharing it?
why not?
     how much does it matter?
             what can we do 
              about it?
you cannot manage
what you do not measure




               http://www.flickr.com/photos/archeon/2941655917/
research questions

  1. Is there benefit for those who share?
  2. Do journal policies increase rates of sharing?
  3. What other factors are correlated with
     sharing and withholding data?
http://en.wikipedia.org/wiki/DNA_microarray
   http://en.wikipedia.org/wiki/Image:Heatmap.png
   http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG




microarray
      data
1. Is there benefit for 
 those who share?




                 http://www.flickr.com/photos/sunrise/35819369/
currency of value?

     Citations.

           $50!




                     Diamond,Arthur M. What is a Citation Worth?.
                        The Journal of Human Resources (1986)
                        vol. 21 (2) pp. 200-215
Prior work focused on the citation
 advantage of an open access
 publishing model.

Our question: are articles that share
 their raw research data cited more
 than articles that don’t?
dataset
85 cancer microarray trials published in 1999-2003, as
identified by Ntzani and Ioannidis (2003)

citations
ISI Web of Science Citation index, citations from
2004-2005

data sharing locations
Publisher and lab websites, microarray databases, WayBack
Internet Archive, Oncomine

statistics
Multivariate linear regression
Note: log scale
In multivariate regression, we found studies
that had made their data publicly available
received 69% more citations than similar
studies that did not share their data
(95% confidence interval: 18% to 143%)

Piwowar, Day and Fridsma (2007) Sharing Detailed Research
Data Is Associated with Increased Citation Rate.
PLoS ONE 2(3): e308
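
A minimal sketch of how a percent difference relates to a coefficient when the outcome is modelled on a log scale. It assumes a natural-log transform of citation counts (the slides say only "log scale"), and the function names are illustrative, not taken from the paper. It uses only the three figures reported above.

```python
import math

def percent_to_log_coef(percent_increase):
    """Coefficient on a natural-log outcome implied by a percent increase."""
    return math.log(1.0 + percent_increase / 100.0)

def log_coef_to_percent(coef):
    """Percent increase implied by a coefficient on a natural-log outcome."""
    return (math.exp(coef) - 1.0) * 100.0

# Figures reported on the slide: 69% more citations, 95% CI 18% to 143%.
estimate = percent_to_log_coef(69)
lower = percent_to_log_coef(18)
upper = percent_to_log_coef(143)

# On the log scale a confidence interval is usually roughly symmetric
# around the point estimate, which is why it looks lopsided in percent terms.
midpoint = (lower + upper) / 2.0

print("log-scale estimate:", round(estimate, 3))
print("log-scale interval midpoint:", round(midpoint, 3))
```

Under that assumption, the asymmetric percent interval (18% to 143% around 69%) is what a symmetric interval on the log scale looks like after back-transformation.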
future work
     • collect a larger dataset for citation
       analysis (stay tuned)

     • investigate other datatypes
     • examine citation context
2. Do journal data sharing 
 policies increase sharing?




                 http://www.flickr.com/photos/ryanr/142455033/
“An inherent principle of
 publication is that others
 should be able to replicate and
 build upon the authors'
 published claims. Therefore, a
 condition of publication
 in a Nature journal is that
 authors are required to make
 materials, data and associated
 protocols available in a
 publicly accessible database
 …”
         http://www.nature.com/authors/editorial_policies/availability.html
             http://www.nature.com/nature/journal/v453/n7197/index.html
Prior work examined data sharing
 policies in biomedicine, but these
 reviews are now dated,
 consider a variety of resources,
 and don’t correlate policy to
 behaviour.




          McCain. Science Communication, Vol. 16, No. 4. (1 June 1995), pp. 403-431
                 NAS. Sharing Publication-Related Data and Materials. (2003), p. 33
Our aim: look at data sharing policies
 within Instructions to Authors
 statements of 70 journals, as they
 apply to gene expression microarray
 data.
content of data sharing policies

   Very diverse policies in terms of:
    •   statements of policy motivation
    •   datatype-specific policies
    •   requested vs. required
    •   data location
    •   data format
    •   data completeness
    •   timeliness of sharing
    •   consequences for not sharing
    •   exceptions
strength of data sharing policies

    No applicable policy (43%)


    Weak policy (24%)
      should, recommend, request
      must, but without database accession number
    Strong policy (33%)
      must, required, condition of publication
      requires database accession number
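
The three categories above can be sketched as a keyword rule. This is only an illustration: the slides do not say how policies were actually classified, the keyword lists are taken from the slide text, and treating "strong" as mandatory wording plus an accession number requirement is an assumption.

```python
import re

# Keyword stems taken from the slide; matching on stems is an assumption.
MANDATORY_TERMS = ("must", "require", "condition of publication")
REQUEST_TERMS = ("should", "recommend", "request")

def _mentions(terms, text):
    """True if any term starts a word somewhere in the text."""
    return any(re.search(r"\b" + re.escape(term), text) for term in terms)

def policy_strength(policy_text):
    """Classify an Instructions to Authors excerpt as none, weak, or strong."""
    text = policy_text.lower()
    mandatory = _mentions(MANDATORY_TERMS, text)
    requested = _mentions(REQUEST_TERMS, text)
    asks_for_accession = "accession number" in text
    if mandatory and asks_for_accession:
        return "strong"
    if mandatory or requested:
        return "weak"
    return "none"

print(policy_strength("Authors must deposit data and supply the accession number."))
print(policy_strength("Authors should deposit microarray data in a public database."))
print(policy_strength("Manuscripts are submitted online."))
```

A real review would still need a human reader, since a keyword rule cannot tell whether "must" refers to data or to something else in the same paragraph.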
strength of data sharing policies
multivariate associations

  Impact Factor
  Open Access?
  Society Publisher?
  • Biochemistry & Molecular Biology
  • Oncology

           Journal has a data sharing policy?
strength of data sharing policies
associated with impact factor
                   High-impact journals
                        tend to have
                   a strong data-sharing
                           policy
data sharing policies
associated with amount of sharing

     For each of the 70 journals,
         we measured the percent of articles that
         were cited from within GEO and
         ArrayExpress.


     We considered this a proxy for percent of articles
     with shared data.
data sharing policies
associated with amount of sharing

  Having a data-sharing policy?
  Impact Factor
  Open Access?
  Society Publisher?
  • Genetics & Heredity
  • Multidisciplinary Sciences

          % of articles with shared data
•   our corpus of “gene expression microarray” articles
    may have included some that reused data and did not
    themselves produce primary data

•   these results should be considered preliminary, pending
    a more precise filter (stay tuned)


                                      http://www.flickr.com/photos/vlastula/300102949/
future work on journal policies

    • use a more precise filter to isolate
      data producing articles and thereby
      understand the absolute levels of data
      sharing
    • investigate other datatypes
    • look at associations with reviewer
      instructions and opinions
future work on funder policies

    • are they effective? (stay tuned)
    • what do people propose in data
      sharing plans? Do they do what they
      propose? Why not?
    • quantify the perceived worth of data
      sharing plans and accomplishments in
      funding and promotion decisions
3. What other factors are 
 correlated with sharing 
 and withholding data?




                   http://www.flickr.com/photos/cogdog/123072/
Prior work has focused on surveys and
studies of intention.


Our aim: measure associations between
observed data sharing behaviour and
environmental variables

                             Blumenthal et al. Acad Med. 2006
                                   Campbell et al. JAMA. 2002
                           Kyzas et al. J Natl Cancer Inst. 2005
                                  Vogeli et al. Acad Med. 2006
                                 Reidpath et al. Bioethics 2001
pilot dataset


  Ochsner et al. manually reviewed 20 journals for 2007:
       400 studies
       200 shared their microarray data


  Ochsner et al. (2008). Much room for improvement in
  deposition rates of expression microarray datasets. Nature
  Methods, 5(12), 991.
pilot variables

  Funder mandates
  Journal mandates
  Journal impact factor
  Investigator “experience”

              Is research data shared
                  after publication?
funder mandates



 NIH 2003 Data Sharing Requirement

 Requires a data sharing plan
 for studies funded after October 2003
 that receive more than $500 000 in direct funding per year
funder mandates


 Assumed the data sharing requirement was applicable if
 the NIH grant numbers associated with the PubMed entry had

    $750 000 in total funding in any year since 2004
    plus
    an NIH grant number with a leading “1” or “2” since 2004
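
The rule above can be sketched as a small function. The record layout (keys `grant_number`, `year`, `total_funding`) is hypothetical, as are the example grant numbers. Summing funding across grants within a year, and treating the $750 000 threshold as inclusive, are assumptions the slide does not settle.

```python
from collections import defaultdict

THRESHOLD = 750_000   # total funding per year, from the slide
FIRST_YEAR = 2004     # "since 2004", from the slide

def sharing_plan_assumed(grant_records):
    """Apply the two-part rule to the grants linked to one PubMed entry.

    grant_records: iterable of dicts with hypothetical keys
    'grant_number' (str), 'year' (int), 'total_funding' (int).
    """
    funding_by_year = defaultdict(int)
    has_leading_1_or_2 = False
    for record in grant_records:
        if record["year"] < FIRST_YEAR:
            continue  # only funding since 2004 counts
        funding_by_year[record["year"]] += record["total_funding"]
        if record["grant_number"].strip()[:1] in ("1", "2"):
            has_leading_1_or_2 = True
    large_year = any(total >= THRESHOLD for total in funding_by_year.values())
    return large_year and has_leading_1_or_2

# Made-up records for illustration only.
example = [
    {"grant_number": "1R01XX000000", "year": 2005, "total_funding": 500_000},
    {"grant_number": "5R01XX000001", "year": 2005, "total_funding": 300_000},
]
print(sharing_plan_assumed(example))
```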
author experience
   Publication history and impact proxy

   First and last authors:
   • years since first paper
   • h-index (the largest number N such that
      an author has N papers cited at least N
      times)
   • a-index
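
A short sketch of the h-index as defined above, computed from a list of per-paper citation counts. The a-index is not defined on the slide; the version here assumes the common definition (mean citations of the papers inside the h-core), so treat it as an assumption.

```python
def h_index(citations):
    """Largest N such that N papers have at least N citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def a_index(citations):
    """Mean citations of the h most-cited papers (assumed definition)."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    core = sorted(citations, reverse=True)[:h]
    return sum(core) / h

papers = [10, 8, 5, 4, 3]   # made-up citation counts
print(h_index(papers))      # 4 papers have at least 4 citations each
print(a_index(papers))      # (10 + 8 + 5 + 4) / 4
```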
author experience
Derived h-index (pubmedi citation indices):

 Author publication history:

 Author name disambiguation: Author-ity web service
   Torvik & Smalheiser. (2009). Author Name Disambiguation in MEDLINE.
   ACM Transactions on Knowledge Discovery from Data, 3(3):11.

 Citation counts:
pilot variables

  Funder mandates
  Journal mandates
  Journal impact factor
  Investigator “experience”

              Is research data shared
                  after publication?
stats

    Univariate odds ratios
    Multivariate logistic regression
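
A minimal sketch of a univariate odds ratio for one binary variable against the shared / not shared outcome. The 2x2 counts are made up for illustration and are not results from the pilot. The confidence interval uses the standard log-odds (Woolf) approximation, which is a choice made here, not something stated in the slides.

```python
import math

def odds_ratio(shared_with, unshared_with, shared_without, unshared_without):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    'with' / 'without' refer to studies that do or do not have the factor
    of interest (for example, a journal mandate).
    """
    cells = (shared_with, unshared_with, shared_without, unshared_without)
    if any(c <= 0 for c in cells):
        raise ValueError("all four cells must be positive for this estimate")
    estimate = (shared_with * unshared_without) / (unshared_with * shared_without)
    # Standard error of log(OR), then back-transform the interval.
    se = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper

# Hypothetical counts, for illustration only.
est, lo, hi = odds_ratio(30, 10, 20, 40)
print(round(est, 2), round(lo, 2), round(hi, 2))
```

An interval that excludes 1 would suggest an association in the univariate sense; the multivariate logistic regression then checks whether it survives adjustment for the other variables.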
results of pilot
 Not statistically significant             Statistically significant

  Funder mandates
  Journal mandates
  Journal impact factor
  Investigator “experience”

                        Is research data shared
                            after publication?
results of pilot


                   33%
PhD dissertation

  More samples,
  more variables




                   http://www.flickr.com/photos/krcla/2069243613/
More samples:

  Developed and evaluated automated
  methods to:

   • Identify studies that generate datasets that
    could potentially be shared

   • Determine which of these have in fact been
    shared
To identify studies that generate datasets,

use a query on the full text of published articles:
  ("gene expression" AND microarray AND cell AND rna)
  AND (rneasy OR trizol OR "real-time pcr")
  NOT ("tissue microarray*" OR "cpg island*")
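
The boolean query above can be sketched as a plain-text filter. This is an approximation: the search engine behind the original query is not named in the slides, so whole-word, case-insensitive matching and the handling of the trailing `*` as "any word ending" are assumptions.

```python
import re

ALL_OF = ("gene expression", "microarray", "cell", "rna")
ANY_OF = ("rneasy", "trizol", "real-time pcr")
NONE_OF = ("tissue microarray*", "cpg island*")

def _has(term, text):
    """Whole-word match; a trailing * allows any word ending."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text) is not None

def matches_query(full_text):
    """True if the article text satisfies the three parts of the query."""
    text = full_text.lower()
    return (
        all(_has(term, text) for term in ALL_OF)
        and any(_has(term, text) for term in ANY_OF)
        and not any(_has(term, text) for term in NONE_OF)
    )

sample = ("Gene expression was profiled by microarray. Total RNA was "
          "extracted from each cell line with TRIzol.")
print(matches_query(sample))
print(matches_query(sample + " Tissue microarrays were also stained."))
```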
To determine which articles have shared data,

use a query on the full text of published articles:
  pubmed_gds[filter] and query ArrayExpress
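
One way to use the `pubmed_gds[filter]` term is through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL and makes no network call. Combining the filter with a journal and a publication year is an assumption for illustration, and the ArrayExpress side of the lookup is not covered.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def geo_linked_search_url(journal, year, retmax=0):
    """URL for a PubMed search of articles linked to GEO datasets.

    journal and year narrow the search (an assumption for illustration);
    retmax=0 asks for the hit count only.
    """
    term = f'pubmed_gds[filter] AND "{journal}"[journal] AND {year}[pdat]'
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)

print(geo_linked_search_url("PLoS ONE", 2007))
```

Dividing the count returned for such a search by the count from the same search without the filter would give a rough per-journal sharing proportion, under the same proxy assumption the slides describe.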
More variables:

  Use PubMed and a variety of other internet
  resources...
Funder:        funded by NIH?; size of grant; sharing plan req’d?; funded by non-NIH?
Journal:       impact factor; strength of policy; open access?; number of microarray studies published
Investigator:  years since first paper; h-index; a-index; previously shared?; previously reused?; gender
Institution:   sector; size; impact rank; country
Study:         humans?; mice?; plants?; cancer?; clinical trial?; number of authors; year
stats

    Univariate odds ratios
    Multivariate logistic regression
    Exploratory factor analysis
results?




           http://www.flickr.com/photos/skrb/2427171774/
research questions

  1. Is there benefit for those who share?
  2. Do journal policies increase rates of sharing?
  3. What other factors are correlated with
     sharing and withholding data?
what’s next?
future work previously mentioned...

     • citation analysis of larger cohort
     • journal policies with refined filter
     • beyond microarray data
     • deeper into journal and funder policies
     • and, finally....
Reuse.




         http://www.flickr.com/photos/boitabulle/3668162701/
who reuses data?
                  why?
      when?
                     who doesn’t?
 which datasets are most likely 
  to be reused?
       how many datasets could be 
        reused but aren’t?
   why aren’t they?
                 what can we do 
                  about it?
One possible reuse research agenda

  1. Inventory reuse acknowledgement patterns
  2. Build full-text and metadata filters to identify
     instances of data reuse
  3. Analyze patterns in data reuse choices
  4. Survey data producers and data consumers
     to augment with intentions and perspectives
Resources

 • GEO list of reuse
   articles (currently 618)
 • Previous work in citation context
   classification
 • Amazon Mechanical Turk for annotation
 • Experimental Philosophy for insight into
   cultural norms
 • ...

   Teufel et al. (2006) Automatic classification of citation function. EMNLP.
Stakeholders
  • readers
  • reusers
  • authors
  • editors
  • reviewers
  • funders
  • database designers, maintainers, curators
  • patients, subjects, or populations

  For their perspectives, and also to design studies
  that have actionable results for these groups
Data sharing plan



  I post my data, code, and statistical scripts at
  http://www.dbmi.pitt.edu/piwowar
  Share yours too!



                           http://www.flickr.com/photos/myklroventine/892446624/
Dept of Biomedical Informatics at U of Pittsburgh
NLM for training grant funding
Open science online community and those who release their
 articles, datasets and photos openly
Dr Wendy Chapman for her support and feedback


                thank you
“Does anyone want your data?

 That’s hard to predict[…]


 After all, no one ever knocked on your door asking to buy
 those figurines collecting dust in your cabinet before you listed
 them on eBay.

 Your data, too, may simply be awaiting an effective
 matchmaker.”




                                           Got data? Nature Neuroscience 10, 931 (2007)
Journal
mandates




           variables
Correlates with self-reported data withholding
  • industry involvement
  • perceived competitiveness of field
  • male
  • sharing discouraged in training
  • human participants
  • academic productivity

                                             Blumenthal et al. Acad Med. 2006
Self-reported reasons for data withholding
  • sharing is too much effort
  • want student or jr faculty to publish more
  • they themselves want to publish more
  • cost
  • industrial sponsor
  • confidentiality
  • commercial value of results

                                                     Campbell et al. JAMA 2002.
Prevalence of data withholding via surveys
  • self-reported denying a request in last 3 years
  • trainees self-reported denying a request
  • been denied access to data, materials, code
  • authors “not able to retrieve raw data”
  • not willing to release data

                                                                Campbell et al. JAMA. 2002.
                                                        Kyzas et al. J Natl Cancer Inst. 2005.
                                                               Vogeli et al. Acad Med. 2006.
                                                              Reidpath et al. Bioethics 2001.

Mais conteúdo relacionado

Mais procurados

GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.caGenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.cafionabrinkman
 
Thesis Proposal, as presented for dissertation proposal defense
Thesis Proposal, as presented for dissertation proposal defenseThesis Proposal, as presented for dissertation proposal defense
Thesis Proposal, as presented for dissertation proposal defenseHeather Piwowar
 
W3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesW3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesMichel Dumontier
 
dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET
 
Why study Data Sharing? (+ why share your data)
Why study Data Sharing?  (+ why share your data)Why study Data Sharing?  (+ why share your data)
Why study Data Sharing? (+ why share your data)Heather Piwowar
 
Research into Open Research Data
Research into Open Research DataResearch into Open Research Data
Research into Open Research DataHeather Piwowar
 
Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Obj...
Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Obj...Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Obj...
Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Obj...GigaScience, BGI Hong Kong
 
Seattle-Denver VA Center for Innovation
Seattle-Denver VA Center for InnovationSeattle-Denver VA Center for Innovation
Seattle-Denver VA Center for InnovationBrian Bot
 
Opportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationOpportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationBenjamin Good
 
biomedical research in an increasingly digital world
biomedical research in an increasingly digital worldbiomedical research in an increasingly digital world
biomedical research in an increasingly digital worldBrian Bot
 
Web-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government InformationWeb-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government InformationChristopher Brown
 
Laurie Goodman: Overcoming Hurdles to Data Publication
Laurie Goodman: Overcoming Hurdles to Data PublicationLaurie Goodman: Overcoming Hurdles to Data Publication
Laurie Goodman: Overcoming Hurdles to Data PublicationGigaScience, BGI Hong Kong
 
GigaScience: a new resource for the big-data community.
GigaScience: a new resource for the big-data community.GigaScience: a new resource for the big-data community.
GigaScience: a new resource for the big-data community.GigaScience, BGI Hong Kong
 
Nicole Nogoy: GigaScience...how licensing can change the way we do research
Nicole Nogoy: GigaScience...how licensing can change the way we do researchNicole Nogoy: GigaScience...how licensing can change the way we do research
Nicole Nogoy: GigaScience...how licensing can change the way we do researchGigaScience, BGI Hong Kong
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectivepetermurrayrust
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHpetermurrayrust
 
AI in translational medicine webinar
AI in translational medicine webinarAI in translational medicine webinar
AI in translational medicine webinarPistoia Alliance
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKpetermurrayrust
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECAProject
 

Mais procurados (20)

GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.caGenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
 
Thesis Proposal, as presented for dissertation proposal defense
Thesis Proposal, as presented for dissertation proposal defenseThesis Proposal, as presented for dissertation proposal defense
Thesis Proposal, as presented for dissertation proposal defense
 
W3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesW3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description Guidelines
 
dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019
 
Why study Data Sharing? (+ why share your data)
Why study Data Sharing?  (+ why share your data)Why study Data Sharing?  (+ why share your data)
Why study Data Sharing? (+ why share your data)
 
Research into Open Research Data
Research into Open Research DataResearch into Open Research Data
Research into Open Research Data
 
Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Obj...
Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Obj...Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Obj...
Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Obj...
 
Seattle-Denver VA Center for Innovation
Seattle-Denver VA Center for InnovationSeattle-Denver VA Center for Innovation
Seattle-Denver VA Center for Innovation
 
Opportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationOpportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocuration
 
biomedical research in an increasingly digital world
biomedical research in an increasingly digital worldbiomedical research in an increasingly digital world
biomedical research in an increasingly digital world
 
Web-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government InformationWeb-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government Information
 
Laurie Goodman: Overcoming Hurdles to Data Publication
Laurie Goodman: Overcoming Hurdles to Data PublicationLaurie Goodman: Overcoming Hurdles to Data Publication
Laurie Goodman: Overcoming Hurdles to Data Publication
 
GigaScience: a new resource for the big-data community.
GigaScience: a new resource for the big-data community.GigaScience: a new resource for the big-data community.
GigaScience: a new resource for the big-data community.
 
Nicole Nogoy: GigaScience...how licensing can change the way we do research
Nicole Nogoy: GigaScience...how licensing can change the way we do researchNicole Nogoy: GigaScience...how licensing can change the way we do research
Nicole Nogoy: GigaScience...how licensing can change the way we do research
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
 
Cartegena051811
Cartegena051811Cartegena051811
Cartegena051811
 
AI in translational medicine webinar
AI in translational medicine webinarAI in translational medicine webinar
AI in translational medicine webinar
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIR
 

Destaque

RDAP13 Ixchel Faniel: Can Quantitative Social Scientists Get Data Reuse Satis...
RDAP13 Ixchel Faniel: Can Quantitative Social Scientists Get Data Reuse Satis...RDAP13 Ixchel Faniel: Can Quantitative Social Scientists Get Data Reuse Satis...
RDAP13 Ixchel Faniel: Can Quantitative Social Scientists Get Data Reuse Satis...ASIS&T
 
Altmetrics: how librarians can support researchers in improving their impact
Altmetrics: how librarians can support researchers in improving their impactAltmetrics: how librarians can support researchers in improving their impact

Measuring Progress Toward Shared Biomedical Data

  • 1. Measuring progress toward a cultural norm of shared (and reused!) biomedical research data Heather Piwowar Department of Biomedical Informatics University of Pittsburgh
  • 2. Sharing research data http://upload.wikimedia.org/wikipedia/commons/7/76/PeptideMSMS.jpg; http://en.wikipedia.org/wiki/Image:Helices.png; http://en.wikipedia.org/wiki/Image:Heatmap.png; http://en.wikipedia.org/wiki/Image:Microarray2.gif; http://zellig.cpmc.columbia.edu/medlee/demo/; http://www.plosone.org/article/fetchArticle.action?articleURI=info:doi/10.1371/journal.pone.0000441
  • 4. Sharing research data PAST MEDICAL HISTORY: Past medical history showed she had superficial phlebitis times two in the past, had non-insulin dependent diabetes mellitus for four years. She had been hypothyroid for three years. HISTORY OF PRESENT ILLNESS: The patient is a 58-year-old female, … http://upload.wikimedia.org/wikipedia/commons/7/76/PeptideMSMS.jpg; http://en.wikipedia.org/wiki/Image:Helices.png; http://en.wikipedia.org/wiki/Image:Heatmap.png; http://en.wikipedia.org/wiki/Image:Microarray2.gif; http://zellig.cpmc.columbia.edu/medlee/demo/; http://www.plosone.org/article/fetchArticle.action?articleURI=info:doi/10.1371/journal.pone.0000441
  • 8. Shared data benefits science: Verify, Understand, Extend, Explore, Combine, Synergize, Train, Reduce
  • 9. But... costly for authors: Find, Organize, Document, Deidentify, Format, Decide, Ask, Submit, Answer questions, Worry about mistakes being found, Worry about data being misinterpreted, Worry about being scooped, Forgo money and IP and prestige???
  • 10. As a result, policy makers have spent lots of time and money .... http://www.flickr.com/photos/johnnyvulkan/381941233/ http://www.flickr.com/photos/tonivc/2283676770/
  • 11. ... on initiatives, requests, requirements, and tools: NIH data sharing plan requirement; Journal requirements; Public databases; Data sharing grids like BIRN and caBIG; Data formatting standards; Editorials, letters to the editor, discussion....
  • 13. lots of data sharing! http://www.genome.jp/en/db_growth.html
  • 14. but how much isn’t shared? what isn’t shared? who isn’t sharing it? why not? how much does it matter? what can we do about it?
  • 15. you cannot manage what you do not measure http://www.flickr.com/photos/archeon/2941655917/
  • 16. research questions 1. Is there benefit for those who share? 2. Do journal policies increase rates of sharing? 3. What other factors are correlated with sharing and withholding data?
  • 17. microarray data http://en.wikipedia.org/wiki/DNA_microarray http://en.wikipedia.org/wiki/Image:Heatmap.png http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG
  • 18. microarray data
  • 19. 1. Is there benefit for those who share? http://www.flickr.com/photos/sunrise/35819369/
  • 20. currency of value? Citations. $50! Diamond, Arthur M. What is a Citation Worth? The Journal of Human Resources (1986) vol. 21 (2) pp. 200-215
  • 21. Prior work focused on the citation advantage of an open access publishing model. Our question: are articles that share their raw research data cited more than articles that don’t?
  • 22. Dataset: 85 cancer microarray trials published in 1999-2003, as identified by Ntzani and Ioannidis (2003). Citations: ISI Web of Science Citation index, citations from 2004-2005. Data sharing locations: publisher and lab websites, microarray databases, WayBack Internet Archive, Oncomine. Statistics: multivariate linear regression.
  • 24. In multivariate regression, we found studies that had made their data publicly available received 69% more citations than similar studies that did not share their data (95% confidence interval: 18% to 143%) Piwowar, Day and Fridsma (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308
  • 25. future work • collect a larger dataset for citation analysis (stay tuned) • investigate other datatypes • examine citation context
  • 26. 2. Do journal data sharing policies increase sharing? http://www.flickr.com/photos/ryanr/142455033/
  • 27. “An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …” http://www.nature.com/authors/editorial_policies/availability.html http://www.nature.com/nature/journal/v453/n7197/index.html
  • 28. Prior work examined data sharing policies in biomedicine, but these reviews are now dated, consider a variety of resources, and don’t correlate policy to behaviour. McCain. Science Communication, Vol. 16, No. 4. (1 June 1995), pp. 403-431 NAS. Sharing Publication-Related Data and Materials. (2003), p. 33
  • 29. Our aim: look at data sharing policies within the Instructions to Authors statements of 70 journals, as they apply to gene expression microarray data.
  • 30. content of data sharing policies Very diverse policies in terms of: • statements of policy motivation • datatype-specific policies • requested vs. required • data location • data format • data completeness • timeliness of sharing • consequences for not sharing • exceptions
  • 31. strength of data sharing policies. No applicable policy (43%). Weak policy (24%): "should", "recommend", "request"; or "must", but without a database accession number. Strong policy (33%): "must", "required", "condition of publication"; requires a database accession number.
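The three policy tiers on the slide above can be sketched as a simple keyword rule. This is only an illustration under stated assumptions: the function name `policy_strength`, the keyword tuples, and the prefix-matching rule are hypothetical and are not the coding procedure used in the study, which the slides do not describe in detail.

```python
import re

# Hypothetical keyword lists, copied from the wording on the slide.
MANDATORY = ("must", "required", "condition of publication")
ADVISORY = ("should", "recommend", "request")
ACCESSION = "accession number"


def _mentions(text, keywords):
    """True if any keyword starts a word in the text (so 'recommended' matches)."""
    return any(re.search(r"\b" + re.escape(k), text) for k in keywords)


def policy_strength(policy_text):
    """Classify an Instructions to Authors excerpt as 'strong', 'weak', or 'none'."""
    text = policy_text.lower()
    mandatory = _mentions(text, MANDATORY)
    if mandatory and ACCESSION in text:
        return "strong"  # mandatory wording plus an accession number
    if mandatory or _mentions(text, ADVISORY):
        return "weak"    # advisory wording, or "must" without an accession number
    return "none"        # no applicable policy found
```

For example, an excerpt that says authors "must" deposit data and supply a database accession number would be labeled strong, while the same sentence without the accession number would be labeled weak.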
  • 32. strength of data sharing policies: multivariate associations [Figure: subject category (Biochemistry & Molecular Biology; Oncology), Impact Factor, Open Access?, and Society Publisher?, each linked to the outcome "Journal has a data sharing policy?"]
  • 33. strength of data sharing policies associated with impact factor High-impact journals tend to have a strong data-sharing policy
  • 34. data sharing policies associated with amount of sharing For each of the 70 journals, we measured the percent of articles that were cited from within GEO and ArrayExpress. We considered this a proxy for percent of articles with shared data.
  • 35. data sharing policies associated with amount of sharing [Figure: Having a data-sharing policy?, subject category (Genetics & Heredity; Multidisciplinary Sciences), Impact Factor, Open Access?, and Society Publisher?, each linked to the outcome "% of articles with shared data"]
  • 36. our corpus of “gene expression microarray” articles may have included some that reused data and did not themselves produce primary data • these results should be considered preliminary, pending a more precise filter (stay tuned) http://www.flickr.com/photos/vlastula/300102949/
  • 37. future work on journal policies • use a more precise filter to isolate data producing articles and thereby understand the absolute levels of data sharing • investigate other datatypes • look at associations with reviewer instructions and opinions
  • 38. future work on funder policies • are they effective? (stay tuned) • what do people propose in data sharing plans? Do they do what they propose? Why not? • quantify the perceived worth of data sharing plans and accomplishments in funding and promotion decisions
  • 40. Prior work has focused on surveys and studies of intention. Our aim: measure associations between observed data sharing behaviour and environmental variables. Blumenthal et al. Acad Med. 2006; Campbell et al. JAMA. 2002; Kyzas et al. J Natl Cancer Inst. 2005; Vogeli et al. Acad Med. 2006; Reidpath et al. Bioethics 2001
  • 41. pilot dataset: Ochsner et al. manually reviewed 20 journals for 2007: 400 studies, 200 shared their microarray data. Ochsner et al. (2008). Much room for improvement in deposition rates of expression microarray datasets. Nature Methods, 5(12), 991.
  • 42. pilot variables: journal impact factor, funder mandates, journal mandates, investigator “experience”. Outcome: is research data shared after publication?
  • 43. funder mandates: NIH 2003 Data Sharing Requirement. Requires a data sharing plan for studies funded after October 2003 that receive more than $500 000 in direct funding per year
  • 44. funder mandates: Assumed the data sharing requirement was applicable if the NIH grant numbers associated with the PubMed entry had $750 000 in total funding in any year since 2004, plus an NIH grant number with a leading “1” or “2” since 2004
  • 45. author experience Publication history and impact proxy First and last authors: • years since first paper • h-index (the largest number N such that an author has N papers cited at least N times) • a-index
  • 46. author experience. Derived h-index (pubmedi citation indices). Author publication history: author name, Author-ity web service. Torvik & Smalheiser. (2009). Author Name Disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3):11. Citation counts:
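The h-index definition on the author experience slide translates directly into a few lines of code. The sketch below is illustrative only: the slides do not define the a-index, so the version here assumes the common definition (mean citation count of the papers in the h-core), and the function names and citation counts are made up for the example.

```python
def h_index(citations):
    """Largest N such that the author has N papers cited at least N times."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def a_index(citations):
    """Assumed definition: average citations of the h most-cited papers."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    core = sorted(citations, reverse=True)[:h]
    return sum(core) / h


# Hypothetical citation counts for one author's five papers.
papers = [10, 8, 5, 4, 3]
print(h_index(papers), a_index(papers))
```

With these made-up counts, four papers have at least four citations each, so the h-index is 4 and the assumed a-index is the mean of 10, 8, 5, and 4.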
  • 47. pilot variables: journal impact factor, funder mandates, journal mandates, investigator “experience”. Outcome: is research data shared after publication?
  • 48. stats: univariate odds ratios; multivariate logistic regression
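As a reminder of what a univariate odds ratio measures here, the sketch below computes one from a 2x2 table of a binary variable against whether data was shared, with a standard Woolf (log-scale) 95% confidence interval. The function name, argument layout, and example counts are hypothetical; the slides do not report the underlying tables or say which interval method was used.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: factor present, data shared      b: factor present, data not shared
    c: factor absent, data shared       d: factor absent, data not shared
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    low = math.exp(math.log(ratio) - z * se)
    high = math.exp(math.log(ratio) + z * se)
    return ratio, low, high


# Made-up counts, for illustration only.
ratio, low, high = odds_ratio(30, 10, 20, 40)
print(f"OR = {ratio:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

An interval that excludes 1 would suggest an association between the factor and sharing when the factor is considered on its own; the multivariate logistic regression then adjusts for the other variables.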
  • 49. results of pilot [Figure: journal impact factor, funder mandates, journal mandates, and investigator “experience”, each marked as statistically significant or not statistically significant in relation to the outcome "Is research data shared after publication?"]
  • 57. PhD dissertation: more samples, more variables http://www.flickr.com/photos/krcla/2069243613/
  • 58. More samples: Developed and evaluated automated methods to: • Identify studies that generate datasets that could potentially be shared • Determine which of these have in fact been shared
  • 59. To identify studies that generate datasets, use a query on the full text of published articles: ("gene expression" AND microarray AND cell AND rna) AND (rneasy OR trizol OR "real-time pcr") NOT ("tissue microarray*" OR "cpg island*")
  • 60. To determine which articles have shared data, use a query on the full text of published articles: pubmed_gds[filter] and query ArrayExpress
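The boolean full-text query for identifying data-producing studies can be approximated locally. The sketch below is an assumption-laden illustration, not the search engine used in the dissertation: it treats terms as case-insensitive whole words, treats a trailing `*` as a prefix wildcard, applies no stemming (so "microarrays" would not match "microarray"), and uses hypothetical helper names.

```python
import re

REQUIRED = ("gene expression", "microarray", "cell", "rna")
LAB_METHODS = ("rneasy", "trizol", "real-time pcr")
EXCLUDED = ("tissue microarray*", "cpg island*")


def _has(text, term):
    """Case-insensitive whole-word match; a trailing '*' matches any word ending."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def looks_like_data_producing_study(full_text):
    """Apply (all REQUIRED) AND (any LAB_METHODS) NOT (any EXCLUDED)."""
    required = all(_has(full_text, t) for t in REQUIRED)
    lab_method = any(_has(full_text, t) for t in LAB_METHODS)
    excluded = any(_has(full_text, t) for t in EXCLUDED)
    return required and lab_method and not excluded
```

Articles that pass this filter would then be checked against GEO and ArrayExpress to see whether a dataset was actually deposited.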
  • 61. More variables: Use PubMed and a variety of other internet resources...
  • 62. Funder: funded by NIH? size of grant; sharing plan req’d? funded by non-NIH? Journal: impact factor; strength of policy; open access? number of microarray studies published. Investigator: years since first paper; h-index; a-index; previously shared? previously reused? gender. Institution: sector; size; impact rank; country. Study: humans? mice? plants? cancer? clinical trial? number of authors; year.
  • 63. stats: univariate odds ratios; multivariate logistic regression; exploratory factor analysis
  • 64. results? http://www.flickr.com/photos/skrb/2427171774/
  • 65. research questions 1. Is there benefit for those who share? 2. Do journal policies increase rates of sharing? 3. What other factors are correlated with sharing and withholding data?
  • 67. future work previously mentioned... • citation analysis of larger cohort • journal policies with refined filter • beyond microarray data • deeper into journal and funder policies • and, finally....
  • 68. Reuse. http://www.flickr.com/photos/boitabulle/3668162701/
  • 69. who reuses data? why? when? who doesn’t? which datasets are most likely to be reused? how many datasets could be reused but aren’t? why aren’t they? what can we do about it?
  • 70. One possible reuse research agenda 1. Inventory reuse acknowledgement patterns 2. Build full-text and metadata filters to identify instances of data reuse 3. Analyze patterns in data reuse choices 4. Survey data producers and data consumers to augment with intentions and perspectives
  • 71. Resources • GEO list of reuse articles (currently 618) • Previous work in citation context classification • Amazon Mechanical Turk for annotation • Experimental Philosophy for insight into cultural norms • ... Teufel et al. (2006) Automatic classification of citation function. EMNLP.
  • 72. Stakeholders • readers • reusers • authors • editors • reviewers • funders • database designers, maintainers, curators • patients, subjects, or populations For their perspectives, and also to design studies that have actionable results for these groups
  • 73. Data sharing plan I post my data, code, and statistical scripts at http://www.dbmi.pitt.edu/piwowar Share yours too! http://www.flickr.com/photos/myklroventine/892446624/
  • 74. thank you • Dept of Biomedical Informatics at U of Pittsburgh • NLM for training grant funding • Open science online community and those who release their articles, datasets and photos openly • Dr Wendy Chapman for her support and feedback
  • 76. “Does anyone want your data? That’s hard to predict[…] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay. Your data, too, may simply be awaiting an effective matchmaker.” Got data? Nature Neuroscience 10, 931 (2007)
  • 77. Journal mandates variables
  • 80. Correlates with self-reported data withholding (bar chart): industry involvement; perceived competitiveness of field; male; sharing discouraged in training; human participants; academic productivity. Blumenthal et al. Acad Med. 2006
  • 81. Self-reported reasons for data withholding (bar chart, percent of respondents): sharing is too much effort; want student or jr faculty to publish more; they themselves want to publish more; cost; industrial sponsor; confidentiality; commercial value of results. Campbell et al. JAMA 2002.
  • 82. Prevalence of data withholding via surveys (bar chart, percent): self-reported denying a request in last 3 years; trainees self-reported denying a request; been denied access to data, materials, code; authors “not able to retrieve raw data”; not willing to release data. Campbell et al. JAMA. 2002. Kyzas et al. J Natl Cancer Inst. 2005. Vogeli et al. Acad Med. 2006. Reidpath et al. Bioethics 2001.