Public data archiving:

     Who shares?
    Who doesn’t?
What can we do about it?
               Heather Piwowar
         Presented at UBC BLISS, Sept 2010

 DataONE postdoc with Dryad and NESCent, @UBC
PhD in Dept of Biomedical Informatics, U of Pittsburgh
http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
http://www.flickr.com/photos/jsmjr/62443357/
http://www.flickr.com/photos/camilleharrington/3587294608/
http://www.flickr.com/photos/rkuhnau/3318245976/
http://www.flickr.com/photos/conformpdx/1796399674/
http://www.flickr.com/photos/rkuhnau/3317418699/
http://www.flickr.com/photos/zemlinki/261617721/
http://www.flickr.com/photos/tracenmatt/3020786491/
http://www.flickr.com/photos/the-o/2078239333/
http://www.flickr.com/photos/ryanr/142455033/
http://www.flickr.com/photos/75166820@N00/5318468/
Find
Organize
Document
Deidentify
Format
Decide
Ask
Submit

Answer questions
Worry about mistakes being found
Worry about data being misinterpreted
Worry about being scooped
Forgo money and IP and prestige???
not very motivating.
As a result, policy makers have spent
 lots of time and money...




                      http://www.flickr.com/photos/johnnyvulkan/381941233/
                           http://www.flickr.com/photos/tonivc/2283676770/
building databases, 
developing standards, 
articulating best practices

to support public archiving of 
 research datasets 
lots of data sharing!




                        http://www.genome.jp/en/db_growth.html
but how much isn’t 
 shared?

  what isn’t shared?
              who isn’t sharing it?
why not?
     how much does it matter?
             what can we do 
              about it?
you cannot manage
what you do not measure




               quote: Lord Kelvin
               http://www.flickr.com/photos/archeon/2941655917/
As we seek to embrace and
 encourage data sharing,

understanding patterns of adoption
 will allow us to make informed
 decisions about tools, policies, and
 best practices.

Measuring adoption over time will
 allow us to note progress and
 identify best practices and
 opportunities for improvement.
research questions

  1. Is there benefit for those who share?
  2. How can we study data sharing behaviour in
     a scalable, systematic way?
  3. What factors are correlated with sharing
     and withholding data?
http://www.flickr.com/photos/paulhami/1020538523//
Which data?




Where?




With whom?




When?




Under what terms?




• gene expression microarray data
• raw intensity data
• upon publication
• publicly on the internet
• (centralized databases)

http://en.wikipedia.org/wiki/DNA_microarray
http://en.wikipedia.org/wiki/Image:Heatmap.png
http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG




microarray
      data
1.  Is there benefit for 
 those who share?




                 http://www.flickr.com/photos/sunrise/35819369/
currency of value?

     Citations.

           $50!




                     Diamond, Arthur M. What is a Citation Worth?
                        The Journal of Human Resources (1986)
                        vol. 21 (2) pp. 200-215
dataset
85 cancer microarray trials published in 1999-2003, as
identified by Ntzani and Ioannidis (2003)

citations
ISI Web of Science Citation index, citations from
2004-2005

data sharing locations
Publisher and lab websites, microarray databases, WayBack
Internet Archive, Oncomine

statistics
Multivariate linear regression
Note: log scale

~70% increase in citations for trials with publicly available data
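Because citations are analysed on a log scale, a regression coefficient on a "data shared" indicator reads as a multiplicative effect on citations. A minimal Python sketch of that arithmetic, assuming the regression outcome is log(citations); the helper names are hypothetical and the coefficient is illustrative, not the study's fitted value:

```python
import math


def percent_change_from_log_coef(beta: float) -> float:
    """Convert a coefficient on a log-scale outcome into a percent change.

    If log(citations) = ... + beta * shared, sharing multiplies expected
    citations by exp(beta), i.e. a change of (exp(beta) - 1) * 100 percent.
    """
    return (math.exp(beta) - 1.0) * 100.0


def coef_for_percent_change(pct: float) -> float:
    """Inverse: the log-scale coefficient that corresponds to a percent change."""
    return math.log(1.0 + pct / 100.0)


# Illustrative only: the coefficient that a ~70% increase would correspond to.
beta_shared = coef_for_percent_change(70.0)
print(round(beta_shared, 3))                                # 0.531
print(round(percent_change_from_log_coef(beta_shared), 1))  # 70.0
```

The other covariates in the multivariate model shift the intercept but not this conversion, which is why the effect can be quoted as a single percentage.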
2. Need automated methods to:

a) Identify studies that create datasets
b) Determine which of these
        have in fact been shared
c) Extract attributes about the environment
a) Identify studies that create datasets




                                 http://www.flickr.com/photos/lofaesofa/248546821/
Look for wetlab methods in article full text:




                         http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1522022&tool=pmcentrez
                         http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1590031&tool=pmcentrez
                   http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1482311&tool=pmcentrez#id331936
                         http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2082469&tool=pmcentrez
                    http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=126870&tool=pmcentrez#id442745
Combined, these full-text portals reach 85%
of the articles available through
U of Pittsburgh library subscriptions.
But how to generate an effective query?
Use open access articles.
• text analysis: automatically catalogued
  single words and word-pairs from full text
• assessed precision and recall
• combined the high performers:
Derived query:
  ("gene expression" AND microarray AND cell AND rna)

  AND (rneasy OR trizol OR "real-time pcr")

  NOT ("tissue microarray*" OR "cpg island*")
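The derived query can be read as a Boolean filter over article full text. A minimal Python sketch, assuming case-insensitive whole-word matching and treating a trailing `*` as prefix truncation; the full-text portals apply their own tokenization and stemming, so the hypothetical `matches_query` helper only approximates their behaviour:

```python
import re

# Term lists transcribed from the derived query above.
REQUIRED_ALL = ["gene expression", "microarray", "cell", "rna"]
REQUIRED_ANY = ["rneasy", "trizol", "real-time pcr"]
EXCLUDED = ["tissue microarray*", "cpg island*"]


def term_pattern(term: str) -> re.Pattern:
    """Compile one query term into a case-insensitive regex.

    Words in a phrase may be separated by any whitespace. A trailing '*' is
    treated as prefix truncation ("cpg island*" also matches "CpG islands");
    otherwise the term must end on a word boundary.
    """
    truncated = term.endswith("*")
    words = term.rstrip("*").split()
    body = r"\s+".join(re.escape(w) for w in words)
    tail = r"\w*" if truncated else r"\b"
    return re.compile(r"\b" + body + tail, re.IGNORECASE)


def matches_query(text: str) -> bool:
    """(ALL required terms) AND (ANY wetlab term) AND NOT (ANY excluded term)."""
    def has(term: str) -> bool:
        return term_pattern(term).search(text) is not None

    return (all(has(t) for t in REQUIRED_ALL)
            and any(has(t) for t in REQUIRED_ANY)
            and not any(has(t) for t in EXCLUDED))


methods = ("Total RNA was extracted from each cell line with the RNeasy kit, "
           "and gene expression was profiled on a microarray.")
print(matches_query(methods))                                          # True
print(matches_query(methods + " Staining used a tissue microarray."))  # False
```

The wetlab terms (RNeasy, TRIzol, real-time PCR) are what distinguish articles that *generated* microarray data from articles that merely discuss it.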
Evaluation:
Ochsner et al. Nature Methods (2008)
400 studies across 20 journals

Precision: 90% (conf int: 86% to 93%)
Recall:    56% (conf int: 52% to 61%)
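Precision and recall come from comparing the query's hits against a reference set such as the Ochsner et al. studies. A minimal sketch of the calculation; the slides do not say how the confidence intervals were computed, so the Wilson score interval here is an assumption, and the counts are hypothetical rather than the ones behind the figures above:

```python
import math
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical counts, for illustration only.
tp, fp, fn = 90, 10, 70
precision, recall = precision_recall(tp, fp, fn)
p_lo, p_hi = wilson_interval(tp, tp + fp)  # precision: denominator is all retrieved
r_lo, r_hi = wilson_interval(tp, tp + fn)  # recall: denominator is all relevant
print(f"precision={precision:.2f} (95% CI {p_lo:.2f} to {p_hi:.2f})")
print(f"recall={recall:.2f} (95% CI {r_lo:.2f} to {r_hi:.2f})")
```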
a) Identify studies that create datasets
b) Determine which of these
        have in fact been shared
c) Extract attributes about the environment
b) Determine which datasets
        have in fact been shared
77%
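One way to operationalize step (b) is to check each dataset-creating article against the set of articles that database records link back to. A minimal sketch, assuming the PubMed IDs cited by GEO or ArrayExpress records have already been collected; the helper names and the IDs are placeholders, not real records:

```python
from typing import Dict, Iterable, Set


def flag_shared(study_pmids: Iterable[str], linked_pmids: Set[str]) -> Dict[str, bool]:
    """Mark each dataset-creating study as shared if a database record links to it."""
    return {pmid: pmid in linked_pmids for pmid in study_pmids}


def proportion_shared(flags: Dict[str, bool]) -> float:
    """Fraction of studies flagged as shared (0.0 for an empty input)."""
    return sum(flags.values()) / len(flags) if flags else 0.0


# Placeholder identifiers, for illustration only.
studies = ["1001", "1002", "1003", "1004"]
linked_from_geo_or_ae = {"1002", "1004", "9999"}
flags = flag_shared(studies, linked_from_geo_or_ae)
print(flags)                     # {'1001': False, '1002': True, '1003': False, '1004': True}
print(proportion_shared(flags))  # 0.5
```

A link-based check like this only finds data deposited in the linked databases, so it undercounts sharing on lab websites or as supplementary files.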
a) Identify studies that create datasets
b) Determine which of these
        have in fact been shared
c) Extract attributes about the environment
Funder   Journal       Investigator   Institution   Study




                   Is research data shared
                       after publication?
Funder: funded by NIH?; size of grant; sharing plan req’d?; funded by non-NIH?
Journal: impact factor; strength of policy; open access?; number of microarray studies published
Investigator: years since first paper; # pubs; # citations; previously shared?; previously reused?; gender
Institution: sector; size; impact rank; country
Study: humans?; mice?; plants?; cancer?; clinical trial?; number of authors; year
journal rank
journal data sharing policy


          “An inherent principle of publication is that
           others should be able to replicate and build
           upon the authors' published claims.
           Therefore, a condition of publication
           in a Nature journal is that authors are
           required to make materials, data and
           associated protocols available in a publicly
           accessible database …”


                          http://www.nature.com/authors/editorial_policies/availability.html
                              http://www.nature.com/nature/journal/v453/n7197/index.html
institution rank




Yu et al. BMC Medical
  Informatics and Decision
  Making (2007) vol. 7 pp. 17
study type
author “experience”

Author publication history:

Author name disambiguation: Author-ity web service
  Torvik & Smalheiser (2009). Author Name Disambiguation in MEDLINE.
  ACM Transactions on Knowledge Discovery from Data, 3(3):11.



Citation counts:
author gender
funding level

PubMed grant lists   + NIH grant details
funder mandates




     Requires a data sharing plan
     for studies funded after October 2003
     that receive more than $500 000 in
     direct funding per year
funder mandates

Proxy for NIH data sharing policy
applicability:

If in any year since 2004,
• funded by an NIH grant number
   with a “1” or “2” type code
• received more than $750 000 in
   total funding from the grant
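The proxy rule above is concrete enough to express as a predicate. A minimal sketch, assuming per-year award records (full grant number, fiscal year, total dollars) have already been joined from PubMed grant lists and NIH grant details; the `GrantYear` layout, the placeholder grant numbers, and the reading of "since 2004" as inclusive are all assumptions, not the study's code:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class GrantYear:
    """One fiscal year of award data for one grant (hypothetical record layout)."""
    full_grant_number: str  # leading digit is the NIH application type code
    fiscal_year: int
    total_dollars: float    # direct + indirect costs awarded that year


def type_code(full_grant_number: str) -> str:
    """Return the leading application type code ("1" = new, "2" = competing renewal)."""
    return full_grant_number.strip()[:1]


def nih_policy_applies(grant_years: Iterable[GrantYear],
                       since_year: int = 2004,
                       threshold: float = 750_000) -> bool:
    """True if, in any year since 2004, a grant with type code 1 or 2
    received more than $750 000 in total funding."""
    return any(
        gy.fiscal_year >= since_year
        and type_code(gy.full_grant_number) in {"1", "2"}
        and gy.total_dollars > threshold
        for gy in grant_years
    )


records = [
    GrantYear("5R01XX000000-03", 2005, 900_000),  # type 5: non-competing continuation
    GrantYear("1R01XX000001-01", 2006, 800_000),  # new award above the threshold
]
print(nih_policy_applies(records))      # True
print(nih_policy_applies(records[:1]))  # False
```

The $750 000 total threshold stands in for the policy's $500 000 direct-cost cutoff, since the grant details report total (direct plus indirect) funding.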
and so on...


    124 variables
Now equipped with automated methods to:

a) Identify studies that create datasets
b) Determine which of these
        have in fact been shared
c) Extract attributes about the environment
3.  What factors are correlated 
 with sharing and withholding 
 data?
                     http://www.flickr.com/photos/cogdog/123072/
11,603 datapoints


25% had links from datasets in databases
univariate analysis
Proportion of articles with shared datasets, by year
[Figure (Across time): proportion of articles with datasets found in GEO or ArrayExpress (y-axis, 0.05 to 0.35) by year article published (x-axis, 2000 to 2009)]
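The univariate plots are per-group proportions: of the articles in each year (or journal, or institution), what fraction had a dataset found in GEO or ArrayExpress. A minimal sketch of that grouping, with a hypothetical helper name and made-up (year, shared?) pairs for illustration only:

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def proportion_by_group(records: Iterable[Tuple[Hashable, bool]]) -> Dict[Hashable, float]:
    """Given (group, shared) pairs, return the proportion shared per group.

    The group can be a publication year, a journal, or an institution.
    """
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # [shared, total]
    for group, shared in records:
        counts[group][0] += int(shared)
        counts[group][1] += 1
    return {group: s / n for group, (s, n) in counts.items()}


# Hypothetical (year, shared?) pairs, for illustration only.
articles = [(2006, True), (2006, False), (2006, False),
            (2008, True), (2008, True), (2008, False)]
shares = proportion_by_group(articles)
print({year: round(p, 2) for year, p in sorted(shares.items())})  # {2006: 0.33, 2008: 0.67}
```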
Proportion of datasets shared, by journal
[Figure: proportion of datasets shared (0.0 to 1.0) for each journal; callout: Physiological Genomics]
Journals plotted: Physiol Genomics, PLoS Genet, Genome Biol, Microbiology, PLoS One, BMC Genomics, Plant Cell, Genome Res, Eukaryot Cell, Appl Environ Microbiol, BMC Med Genomics, Hum Mol Genet, Proc Natl Acad Sci U S A, Infect Immun, Am J Respir Cell Mol Biol, Dev Biol, J Bacteriol, Mol Endocrinol, BMC Cancer, Plant Physiol, Biol Reprod, Blood, J Immunol, FASEB J, Toxicol Sci, J Exp Bot, Nucleic Acids Res, Diabetes, Mol Cell Biol, Mol Cancer Ther, BMC Bioinformatics, Stem Cells, FEBS Lett, J Neurosci, Am J Pathol, J Biol Chem, J Virol, OTHER, Cancer Res, J Clin Endocrinol Metab, Plant Mol Biol, Clin Cancer Res, Genomics, Invest Ophthalmol Vis Sci, Mol Hum Reprod, Carcinogenesis, Gene, Endocrinology, Oncogene, Cancer Lett, Biochem Biophys Res Commun
Proportion of datasets shared, by institution
[Figure: proportion of datasets shared (0.0 to 1.0) for each institution; callout: Stanford]
Institutions plotted: Stanford University; University of Pennsylvania; University of Illinois; University of California, Los Angeles; University of Wisconsin, Madison; University of Washington; University of California, Davis; The University of British Columbia; University of California, San Francisco; University of Florida; University of California, San Diego; University of Minnesota, Twin Cities; Baylor College of Medicine; OTHER; Max Planck Gesellschaft; Harvard University; Duke University Medical Center; Yale University; Johns Hopkins University; University of Pittsburgh; Washington University in Saint Louis; University of Toronto; University of California, Berkeley; University of Michigan, Ann Arbor; Michigan State University; National Cancer Institute; Tokyo Daigaku
Proportion of datasets shared, by institution rank
[Figure: proportion of datasets shared (0.0 to 1.0) by institution rank (1 to 1901)]
multivariate analysis
factor analysis
multivariate logistic regression over
the first-order factors
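The regression results are reported as odds ratios with 95% intervals on a log axis. A minimal sketch of the conversion from a logistic-regression coefficient, assuming a Wald interval (coefficient plus or minus 1.96 standard errors on the log-odds scale); the helper name, coefficient, and standard error are hypothetical, not values from the fitted model:

```python
import math
from typing import Tuple


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (odds ratio, lower, upper) for a logistic-regression coefficient.

    The interval is built on the log-odds scale (beta +/- z * se) and then
    exponentiated, so it is symmetric on a log axis (0.25, 0.5, 1, 2, 4, ...).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error, for illustration only.
or_, lo, hi = odds_ratio_ci(beta=math.log(2.0), se=0.2)
print(f"OR={or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An interval that excludes 1.00 corresponds to a coefficient whose Wald test is significant at roughly the 5% level; intervals above 1.00 mark factors associated with higher odds of sharing.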
Multivariate nonlinear regressions with interactions
[Figure: odds ratios with 0.95 confidence intervals (log axis, 0.25 to 8.00) for the first-order factors: Has journal policy; Count of R01 & other NIH grants; Authors prev GEOAE sharing & OA & microarray creation; NO K funding or P funding; Journal impact; Institution high citations & collaboration; Journal policy consequences & long halflife; Institution high citations NOT collaboration & animals or mice; Institution is government & NOT higher ed; NOT animals or mice; Last author num prev pubs & first year pub; Large NIH grant; Humans & cancer; NO geo reuse + YES high institution output; First author num prev pubs & first year pub]
logistic regression
using second-order factors
Multivariate nonlinear regression with interactions
[Figure: odds ratios with 0.95 confidence intervals (log axis, 0.25 to 4.00) for the second-order factors: OA journal & previous GEO-AE sharing; Amount of NIH funding; Journal impact factor and policy; Higher Ed in USA; Cancer & humans]
Conclusions:
   • data sharing rates are increasing,
     but overall levels are low

Preliminary evidence:
   • levels are particularly low in cancer
   • levels are highest for those who
      • publish in a journal with a policy
      • publish in an open access journal
      • have shared data before
•   data and filters were imperfect
•   many assumptions
•   didn’t capture all types of sharing
•   don’t know how generalizable across datatypes
•   should be considered hypothesis-generating


                                  http://www.flickr.com/photos/vlastula/300102949/
http://www.flickr.com/photos/gatewaystreets/3838452287/
NSF-funded distributed framework
 and cyberinfrastructure for
 environmental science.



Dryad is a repository of data
 underlying scientific publications,
 with an initial focus on evolution,
 ecology, and related fields.


The National Evolutionary
  Synthesis Center, NSF-funded:
• Duke University,
• UNC at Chapel Hill
• North Carolina State University
1.  new domain
• evolution and ecology
    datasets
•   raw data that support results
•   upon publication
    or short embargo
•   publicly on the internet




challenges!

  1. No PubMed
  2. Diverse data types, norms, repositories
  3. Data almost always collected for a specific
     hypothesis
  4. Less public sharing so far
2.  new initiatives
JDAP
       •   The American Naturalist
       •   Evolution
       •   Journal of Evolutionary Biology
       •   Molecular Ecology
       •   Evolutionary Applications
       •   Genetics
       •   Heredity
       •   Molecular Biology and Evolution
       •   Systematic Biology
       •   Paleobiology
       •   BMC Evolutionary Biology
Blumenthal et al. Acad Med. 2006
        Campbell et al. JAMA. 2002.
Kyzas et al. J Natl Cancer Inst. 2005.
       Vogeli et al. Acad Med. 2006.
      Reidpath et al. Bioethics 2001.
http://www.flickr.com/photos/jima/606588905/
3.  Reuse.




             http://www.flickr.com/photos/boitabulle/3668162701/
who reuses data?
                  why?
     when?
                       who doesn’t?
which datasets are most likely 
 to be reused?
         how many datasets could be 
          reused but aren’t?
 why aren’t they?
      does it matter?
                  what can we do 
                   about it?
http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png
I post my data, code, and statistical scripts on
GitHub (links from http://researchremix.org)
Share yours too!


                         http://www.flickr.com/photos/myklroventine/892446624/
“Does anyone want your data?

That’s hard to predict […]
After all, no one ever knocked on your door asking to
buy those figurines collecting dust in your cabinet
before you listed them on eBay.

Your data, too, may simply be awaiting an effective
matchmaker.”




                     Got data? Nature Neuroscience (2007)
Dept of Biomedical Informatics at U of Pittsburgh
Wendy Chapman for support and feedback
Todd Vision, Mike Whitlock for ongoing discussions
NIH NLM. NSF through DataONE, NESCent, Dryad.
Open science online community and those who release their
 articles, datasets and photos openly


                thank you
http://www.flickr.com/photos/jep42/3017149415/in/set-72157608797298056/
Journal
mandates




           variables
perspectives, and also driving towards actionable results for these groups:
• readers
• reusers
• authors
• editors
• reviewers
• funders
• database designers, maintainers, curators
• patients, subjects, or populations
http://www.flickr.com/photos/sunrise/35819369/
http://www.flickr.com/photos/fboyd/2156630044/
Correlates with self-reported data withholding
• industry involvement
• perceived competitiveness of field
• male
• sharing discouraged in training
• human participants
• academic productivity
[Figure: value for each correlate plotted on an axis from 0 to 3]
                                             Blumenthal et al. Acad Med. 2006
Self-reported reasons for data withholding
• sharing is too much effort
• want student or jr faculty to publish more
• they themselves want to publish more
• cost
• industrial sponsor
• confidentiality
• commercial value of results
[Figure: percentage for each reason plotted on an axis from 0% to 80%]
                                                      Campbell et al. JAMA 2002.
Table 2: Second-order factor loadings, by first-order factors

                   Amount of NIH funding
                0.88 Count of R01 & other NIH grants
                         0.49 Large NIH grant
                   -0.55 NO K funding or P funding

                       Cancer & humans
                        0.83 Humans & cancer

           OA journal & previous GEO-AE sharing
     0.59 Authors prev GEOAE sharing & OA & microarray creation
               0.43 Institution high citations & collaboration
             0.31 First author num prev pubs & first year pub
            -0.36 Last author num prev pubs & first year pub

               Journal impact factor and policy
                          0.57 Journal impact
            0.51 Last author num prev pubs & first year pub

                         Higher Ed in USA
            0.40 NO geo reuse + YES high institution output
           -0.44 Institution is government & NOT higher ed
Table 3: Second-order factor loadings, by original variables

                   Amount of NIH funding
             0.87 nih.cumulative.years.tr
             0.85 num.grants.via.nih.tr
             0.84 max.grant.duration.tr
             0.82 num.grant.numbers.tr
             0.80 pubmed.is.funded.nih
             0.79 nih.max.max.dollars.tr
             0.70 nih.sum.avg.dollars.tr
             0.70 nih.sum.sum.dollars.tr
             0.59 has.R.funding
             0.59 num.post2003.morethan500k.tr
             0.58 country.usa
             0.58 has.U.funding
             0.57 has.R01.funding
             0.55 num.post2003.morethan750k.tr
             0.53 has.T.funding
             0.53 num.post2003.morethan1000k.tr
             0.49 num.post2004.morethan500k.tr
             0.45 num.post2004.morethan750k.tr
             0.44 has.P.funding
             0.43 num.post2004.morethan1000k.tr
             0.43 num.nih.is.nci.tr
             0.35 num.post2005.morethan500k.tr
             0.32 num.nih.is.nigms.tr
             0.31 num.post2005.morethan750k.tr

                   Cancer & humans
             0.60 pubmed.is.cancer
             0.59 pubmed.is.humans
             0.52 pubmed.is.cultured.cells
             0.43 pubmed.is.core.clinical.journal
             0.39 institution.is.medical
            -0.58 pubmed.is.plants
            -0.50 pubmed.is.fungi
            -0.37 pubmed.is.shared.other
            -0.30 pubmed.is.bacteria

                   OA journal & previous GEO-AE sharing
             0.40 first.author.num.prev.geoae.sharing.tr
             0.37 pubmed.is.open.access
             0.37 first.author.num.prev.oa.tr
             0.35 last.author.num.prev.geoae.sharing.tr
             0.32 pubmed.is.effectiveness
             0.32 last.author.num.prev.oa.tr
             0.31 pubmed.is.geo.reuse
            -0.38 country.japan

                   Journal impact factor and policy
             0.48 journal.impact.factor.log
             0.47 jour.policy.requires.microarray.accession
             0.46 jour.policy.mentions.exceptions
             0.46 pubmed.num.cites.from.pmc.tr
             0.45 journal.5yr.impact.factor.log
             0.45 jour.policy.contains.word.miame.mged
             0.42 last.author.num.prev.pmc.cites.tr
             0.41 jour.policy.requests.accession
             0.40 journal.immediacy.index.log
             0.40 journal.num.articles.2008.tr
             0.39 years.ago.tr
             0.36 jour.policy.says.must.deposit
             0.35 pubmed.num.cites.from.pmc.per.year
             0.33 institution.mean.norm.citation.score
             0.32 last.author.year.first.pub.ago.tr
             0.31 country.usa
             0.31 last.author.num.prev.pubs.tr
             0.31 jour.policy.contains.word.microarray
            -0.31 pubmed.is.open.access

                   Higher Ed in USA
             0.36 institution.stanford
             0.36 institution.is.higher.ed
             0.35 country.usa
             0.35 has.R.funding
             0.33 has.R01.funding
             0.30 institution.harvard
            -0.37 institution.is.govnt

 
Open Science Incentives/Veerle van den Eynden
Open Science Incentives/Veerle van den EyndenOpen Science Incentives/Veerle van den Eynden
Open Science Incentives/Veerle van den Eynden
 
Incentivizing data sharing: a "bottom up" perspective/Louise Bezuidenhout
Incentivizing data sharing: a "bottom up" perspective/Louise BezuidenhoutIncentivizing data sharing: a "bottom up" perspective/Louise Bezuidenhout
Incentivizing data sharing: a "bottom up" perspective/Louise Bezuidenhout
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and Solutions
 
Open Data and the Panton Principles in the Humanities
Open Data and the Panton Principles in the HumanitiesOpen Data and the Panton Principles in the Humanities
Open Data and the Panton Principles in the Humanities
 
2-6-14 ESI Supplemental Webinar: The Data Information Literacy Project
2-6-14 ESI Supplemental Webinar: The Data Information  Literacy Project2-6-14 ESI Supplemental Webinar: The Data Information  Literacy Project
2-6-14 ESI Supplemental Webinar: The Data Information Literacy Project
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data management
 
LEARN Conference - How to cost
LEARN Conference - How to costLEARN Conference - How to cost
LEARN Conference - How to cost
 
Without data, science is merely an opinion: African Open Science Platform/Ina...
Without data, science is merely an opinion: African Open Science Platform/Ina...Without data, science is merely an opinion: African Open Science Platform/Ina...
Without data, science is merely an opinion: African Open Science Platform/Ina...
 
Introduction to research data management
Introduction to research data managementIntroduction to research data management
Introduction to research data management
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Data
 
Open science curriculum for students, June 2019
Open science curriculum for students, June 2019Open science curriculum for students, June 2019
Open science curriculum for students, June 2019
 
Open science and data sharing: the DataFirst experience/Martin Wittenberg
Open science and data sharing: the DataFirst experience/Martin WittenbergOpen science and data sharing: the DataFirst experience/Martin Wittenberg
Open science and data sharing: the DataFirst experience/Martin Wittenberg
 
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
 
Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521
 
Open Data - strategies for research data management & impact of best practices
Open Data - strategies for research data management & impact of best practicesOpen Data - strategies for research data management & impact of best practices
Open Data - strategies for research data management & impact of best practices
 
The African Open Science Platform/Geoffrey Boulton
The African Open Science Platform/Geoffrey BoultonThe African Open Science Platform/Geoffrey Boulton
The African Open Science Platform/Geoffrey Boulton
 
Data Science and What It Means to Library and Information Science
Data Science and What It Means to Library and Information ScienceData Science and What It Means to Library and Information Science
Data Science and What It Means to Library and Information Science
 
Data and communication of research: incentives and disincentives
Data and communication of research: incentives and disincentivesData and communication of research: incentives and disincentives
Data and communication of research: incentives and disincentives
 


Public data archiving: Who does? Who doesn't? What can we do about it?

  • 1. Public data archiving: Who shares? Who doesn’t? What can we do about it? Heather Piwowar Presented at UBC BLISS, Sept 2010 DataONE postdoc with Dryad and NESCent, @UBC PhD in Dept of Biomedical Informatics, U of Pittsburgh
  • 13. Find, organize, document, deidentify, format, decide, ask, submit. Answer questions. Worry about mistakes being found. Worry about data being misinterpreted. Worry about being scooped. Forgo money and IP and prestige???
  • 15. As a result, policy makers have spent lots of time and money... http://www.flickr.com/photos/johnnyvulkan/381941233/ http://www.flickr.com/photos/tonivc/2283676770/
  • 17. lots of data sharing! http://www.genome.jp/en/db_growth.html
  • 18. but how much isn’t shared? what isn’t shared? who isn’t sharing it? why not? how much does it matter? what can we do about it?
  • 19. you cannot manage what you do not measure. quote: Lord Kelvin http://www.flickr.com/photos/archeon/2941655917/
  • 20. As we seek to embrace and encourage data sharing, understanding patterns of adoption will allow us to make informed decisions about tools, policies, and best practices. Measuring adoption over time will allow us to note progress and identify best practices and opportunities for improvement.
  • 21. research questions 1. Is there benefit for those who share? 2. How can we study data sharing behaviour in a scalable, systematic way? 3. What factors are correlated with sharing and withholding data?
  • 23. Which data? http://www.flickr.com/photos/paulhami/1020538523//
  • 24. Where? http://www.flickr.com/photos/paulhami/1020538523//
  • 25. With whom? http://www.flickr.com/photos/paulhami/1020538523//
  • 26. When? http://www.flickr.com/photos/paulhami/1020538523//
  • 27. Under what terms? http://www.flickr.com/photos/paulhami/1020538523//
  • 30. • gene expression microarray data • raw intensity data • upon publication • publicly on the internet • (centralized databases) http://www.flickr.com/photos/paulhami/1020538523//
  • 31. http://en.wikipedia.org/wiki/DNA_microarray http://en.wikipedia.org/wiki/Image:Heatmap.png http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG microarray data
  • 32. microarray data
  • 33. 1. Is there benefit for those who share? http://www.flickr.com/photos/sunrise/35819369/
  • 34. currency of value? Citations.
  • 35. currency of value? Citations. $50! Diamond, Arthur M. What is a Citation Worth? The Journal of Human Resources (1986) vol. 21 (2) pp. 200-215
  • 36. dataset: 85 cancer microarray trials published in 1999-2003, as identified by Ntzani and Ioannidis (2003); citations: ISI Web of Science Citation index, citations from 2004-2005; data sharing locations: publisher and lab websites, microarray databases, WayBack Internet Archive, Oncomine; statistics: multivariate linear regression
  • 39. ~70%
  • 40. 2. Need automated methods to: a) Identify studies that create datasets b) Determine which of these have in fact been shared c) Extract attributes about the environment
  • 41. a) Identify studies that create datasets http://www.flickr.com/photos/lofaesofa/248546821/
  • 42. Look for wetlab methods in article full text: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1522022&tool=pmcentrez http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1590031&tool=pmcentrez http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1482311&tool=pmcentrez#id331936 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2082469&tool=pmcentrez http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=126870&tool=pmcentrez#id442745
  • 43. Combined, these full-text portals reach 85% of the articles available through U of Pittsburgh library subscriptions.
  • 44. But how to generate an effective query? Use open access articles.
  • 45. • text analysis: automatically catalogued single words and word-pairs from full text • assessed precision and recall • combined the high performers:
  • 46. Derived query: ("gene expression" AND microarray AND cell AND rna) AND (rneasy OR trizol OR "real-time pcr") NOT ("tissue microarray*" OR "cpg island*")
  • 47. Evaluation against the reference set of Ochsner et al. Nature Methods (2008), 400 studies across 20 journals: Precision 90% (conf int: 86% to 93%); Recall 56% (conf int: 52% to 61%)
  • 48. a) Identify studies that create datasets b) Determine which of these have in fact been shared c) Extract attributes about the environment
  • 49. b) Determine which datasets have in fact been shared
  • 53. a) Identify studies that create datasets b) Determine which of these have in fact been shared c) Extract attributes about the environment
  • 54. Funder Journal Investigator Institution Study Is research data shared after publication?
  • 55. Funder: funded by NIH?, size of grant, sharing plan req’d?, funded by non-NIH?; Journal: impact factor, strength of policy, open access?, number of microarray studies published; Investigator: years since first paper, # pubs, # citations, previously shared?, previously reused?, gender; Institution: sector, size, impact, rank, country; Study: humans?, mice?, plants?, cancer?, clinical trial?, number of authors, year
  • 57. journal data sharing policy “An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …” http://www.nature.com/authors/editorial_policies/availability.html http://www.nature.com/nature/journal/v453/n7197/index.html
  • 58. institution rank Yu et al. BMC Medical Informatics and Decision Making (2007) vol. 7 p. 17
  • 60. author “experience”: author publication history via author name disambiguation with the Author-ity web service. Torvik & Smalheiser (2009). Author Name Disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3):11. Citation counts:
  • 62. funding level PubMed grant lists + NIH grant details
  • 63. funder mandates Requires a data sharing plan for studies funded after October 2003 that receive more than $500 000 in direct funding per year
  • 64. funder mandates Proxy for NIH data sharing policy applicability: If in any year since 2004, • funded by an NIH grant number with a “1” or “2” type code • received more than $750 000 in total funding from the grant
  • 65. and so on... 124 variables
  • 66. Now equipped with automated methods to: a) Identify studies that create datasets b) Determine which of these have in fact been shared c) Extract attributes about the environment
  • 68. 11,603 datapoints; 25% had links from datasets in databases
  • 70. Across time: proportion of articles with shared datasets, by year (y-axis: proportion of articles with datasets found in GEO or ArrayExpress; x-axis: year article published, 2000 to 2009)
  • 71. Journals: proportion of datasets shared (0.0 to 1.0), by journal: Physiol Genomics, PLoS Genet, Genome Biol, Microbiology, PLoS One, BMC Genomics, Plant Cell, Genome Res, Eukaryot Cell, Appl Environ Microbiol, BMC Med Genomics, Hum Mol Genet, Proc Natl Acad Sci U S A, Infect Immun, Am J Respir Cell Mol Biol, Dev Biol, J Bacteriol, Mol Endocrinol, BMC Cancer, Plant Physiol, Biol Reprod, Blood, J Immunol, FASEB J, Toxicol Sci, J Exp Bot, Nucleic Acids Res, Diabetes, Mol Cell Biol, Mol Cancer Ther, BMC Bioinformatics, Stem Cells, FEBS Lett, J Neurosci, Am J Pathol, J Biol Chem, J Virol, OTHER, Cancer Res, J Clin Endocrinol Metab, Plant Mol Biol, Clin Cancer Res, Genomics, Invest Ophthalmol Vis Sci, Mol Hum Reprod, Carcinogenesis, Gene, Endocrinology, Oncogene, Cancer Lett, Biochem Biophys Res Commun (highlighted: Physiological Genomics)
  • 72. Institutions: proportion of datasets shared (0.0 to 1.0), by institution: Stanford University; University of Pennsylvania; University of Illinois; University of California, Los Angeles; University of Wisconsin, Madison; University of Washington; University of California, Davis; The University of British Columbia; University of California, San Francisco; University of Florida; University of California, San Diego; University of Minnesota, Twin Cities; Baylor College of Medicine; OTHER; Max Planck Gesellschaft; Harvard University; Duke University Medical Center; Yale University; Johns Hopkins University; University of Pittsburgh; Washington University in Saint Louis; University of Toronto; University of California, Berkeley; University of Michigan, Ann Arbor; Michigan State University; National Cancer Institute; Tokyo Daigaku (highlighted: Stanford)
  • 73. Proportion of datasets shared (0.0 to 1.0) by institution rank (x-axis: institution rank, 1 to 1901)
  • 88. multivariate logistic regression over the first-order factors
  • 89. Multivariate nonlinear regressions with interactions (figure: odds ratios with 0.95 intervals, axis from 0.25 to 8.00) for: Has journal policy; Count of R01 & other NIH grants; Authors prev GEOAE sharing & OA & microarray creation; NO K funding or P funding; Journal impact; Institution high citations & collaboration; Journal policy consequences & long halflife; NOT animals or mice; Institution is government & NOT higher ed; Last author num prev pubs & first year pub; Large NIH grant; Humans & cancer; NO geo reuse + YES high institution output; First author num prev pubs & first year pub
  • 92. Multivariate nonlinear regression with interactions (figure: odds ratios with 0.95 intervals, axis from 0.25 to 4.00) for: OA journal & previous GEO-AE sharing; Amount of NIH funding; Journal impact factor and policy; Higher Ed in USA; Cancer & humans
  • 94. Conclusions: • data sharing rates are increasing, but overall levels are low Preliminary evidence: • levels are particularly low in cancer • levels are highest for those who • publish in a journal with a policy • publish in an open access journal • have shared data before
  • 95. data and filters were imperfect • many assumptions • didn’t capture all types of sharing • don’t know how generalizable across datatypes • should be considered hypothesis-generating http://www.flickr.com/photos/vlastula/300102949/
  • 97. DataONE: NSF-funded distributed framework and cyberinfrastructure for environmental science. Dryad is a repository of data underlying scientific publications, with an initial focus on evolution, ecology, and related fields. NESCent (The National Evolutionary Synthesis Center), NSF-funded: • Duke University • UNC at Chapel Hill • North Carolina State University
  • 101. • evolution and ecology datasets • raw data that support results • upon publication or short embargo • publicly on the internet http://www.flickr.com/photos/paulhami/1020538523//
  • 102. challenges! 1. No PubMed 2. Diverse data types, norms, repositories 3. Data almost always collected for a specific hypothesis 4. Less public sharing so far
  • 105. JDAP (Joint Data Archiving Policy) • The American Naturalist • Evolution • Journal of Evolutionary Biology • Molecular Ecology • Evolutionary Applications • Genetics • Heredity • Molecular Biology and Evolution • Systematic Biology • Paleobiology • BMC Evolutionary Biology
  • 106. Blumenthal et al. Acad Med. 2006 Campbell et al. JAMA. 2002. Kyzas et al. J Natl Cancer Inst. 2005. Vogeli et al. Acad Med. 2006. Reidpath et al. Bioethics 2001. http://www.flickr.com/photos/jima/606588905/
  • 107. 3. Reuse. http://www.flickr.com/photos/boitabulle/3668162701/
  • 108. who reuses data? why? when? who doesn’t? which datasets are most likely to be reused? how many datasets could be reused but aren’t? why aren’t they? does it matter? what can we do about it?
  • 109. http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png
  • 112. I post my data, code, and statistical scripts on GitHub (links from http://researchremix.org) Share yours too! http://www.flickr.com/photos/myklroventine/892446624/
  • 113. “Does anyone want your data? That’s hard to predict […] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay. Your data, too, may simply be awaiting an effective matchmaker.” Got data? Nature Neuroscience (2007)
  • 114. Dept of Biomedical Informatics at U of Pittsburgh; Wendy Chapman for support and feedback; Todd Vision, Mike Whitlock for ongoing discussions; NIH NLM; NSF through DataONE, NESCent, Dryad; open science online community and those who release their articles, datasets and photos openly. thank you
  • 117. Journal mandates variables
  • 118. perspectives, and also driving towards actionable results for these groups: • readers • reusers • authors • editors • reviewers • funders • database designers, maintainers, curators • patients, subjects, or populations
  • 122. Correlates with self-reported data withholding: industry involvement; perceived competitiveness of field; male; sharing discouraged in training; human participants; academic productivity (figure, axis 0 to 3). Blumenthal et al. Acad Med. 2006
  • 123. Self-reported reasons for data withholding: sharing is too much effort; want student or jr faculty to publish more; they themselves want to publish more; cost; industrial sponsor; confidentiality; commercial value of results (figure, axis 0% to 80%). Campbell et al. JAMA 2002.
  • 124. Table 2: Second-order factor loadings, by first-order factors. Amount of NIH funding: Count of R01 & other NIH grants 0.88; Large NIH grant 0.49; NO K funding or P funding -0.55. Cancer & humans: Humans & cancer 0.83. OA journal & previous GEO-AE sharing: Authors prev GEOAE sharing & OA & microarray creation 0.59; Institution high citations & collaboration 0.43; First author num prev pubs & first year pub 0.31; Last author num prev pubs & first year pub -0.36. Journal impact factor and policy: Journal impact 0.57; Last author num prev pubs & first year pub 0.51. Higher Ed in USA: NO geo reuse + YES high institution output 0.40; Institution is government & NOT higher ed -0.44.
  • 125. Table 3: Second-order factor loadings, by OA journal & previous GEO-AE sharing original variables 0.40 first.author.num.prev.geoae.sharing.tr Amount of NIH funding 0.37 pubmed.is.open.access 0.87 nih.cumulative.years.tr 0.37 first.author.num.prev.oa.tr 0.85 num.grants.via.nih.tr 0.35 last.author.num.prev.geoae.sharing.tr 0.84 max.grant.duration.tr 0.32 pubmed.is.effectiveness 0.82 num.grant.numbers.tr 0.32 last.author.num.prev.oa.tr 0.80 pubmed.is.funded.nih 0.31 pubmed.is.geo.reuse 0.79 nih.max.max.dollars.tr -0.38 country.japan 0.70 nih.sum.avg.dollars.tr 0.70 nih.sum.sum.dollars.tr Journal impact factor and policy 0.59 has.R.funding 0.48 journal.impact.factor.log 0.59 num.post2003.morethan500k.tr 0.47 jour.policy.requires.microarray.accession 0.58 country.usa 0.46 jour.policy.mentions.exceptions 0.58 has.U.funding 0.46 pubmed.num.cites.from.pmc.tr 0.57 has.R01.funding 0.45 journal.5yr.impact.factor.log 0.55 num.post2003.morethan750k.tr 0.45 jour.policy.contains.word.miame.mged 0.53 has.T.funding 0.42 last.author.num.prev.pmc.cites.tr 0.53 num.post2003.morethan1000k.tr 0.41 jour.policy.requests.accession 0.49 num.post2004.morethan500k.tr 0.40 journal.immediacy.index.log 0.45 num.post2004.morethan750k.tr 0.40 journal.num.articles.2008.tr 0.44 has.P.funding 0.39 years.ago.tr 0.43 num.post2004.morethan1000k.tr 0.36 jour.policy.says.must.deposit 0.43 num.nih.is.nci.tr 0.35 pubmed.num.cites.from.pmc.per.year 0.35 num.post2005.morethan500k.tr 0.33 institution.mean.norm.citation.score 0.32 num.nih.is.nigms.tr 0.32 last.author.year.first.pub.ago.tr 0.31 num.post2005.morethan750k.tr 0.31 country.usa 0.31 last.author.num.prev.pubs.tr Cancer & humans 0.31 jour.policy.contains.word.microarray 0.60 pubmed.is.cancer -0.31 pubmed.is.open.access 0.59 pubmed.is.humans 0.52 pubmed.is.cultured.cells Higher Ed in USA 0.43 pubmed.is.core.clinical.journal 0.36 institution.stanford 0.39 institution.is.medical 0.36 institution.is.higher.ed -0.58 pubmed.is.plants 0.35 
country.usa -0.50 pubmed.is.fungi 0.35 has.R.funding -0.37 pubmed.is.shared.other 0.33 has.R01.funding -0.30 pubmed.is.bacteria 0.30 institution.harvard -0.37 institution.is.govnt
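Slide 45 describes the query-building step only in outline: catalogue single words and word-pairs from open access full text, score each by precision and recall, and combine the high performers. A minimal Python sketch of the scoring step follows, assuming each article is already labelled as dataset-creating or not. The tokeniser, the function names `features` and `score_features`, and the toy articles are all hypothetical, not taken from the original analysis.

```python
import re
from collections import Counter


def features(text: str) -> set[str]:
    """Single words plus adjacent word pairs, lower-cased (hypothetical tokeniser)."""
    words = re.findall(r"[a-z0-9-]+", text.lower())
    pairs = {f"{a} {b}" for a, b in zip(words, words[1:])}
    return set(words) | pairs


def score_features(docs: list[str], labels: list[bool]) -> dict[str, tuple[float, float]]:
    """Treat 'feature appears in the text' as a prediction that the article
    created a dataset, and return (precision, recall) for every feature."""
    n_positive = sum(labels)
    in_docs = Counter()  # documents containing the feature
    in_pos = Counter()   # dataset-creating documents containing the feature
    for text, created_dataset in zip(docs, labels):
        for f in features(text):
            in_docs[f] += 1
            if created_dataset:
                in_pos[f] += 1
    return {
        f: (in_pos[f] / in_docs[f], in_pos[f] / n_positive if n_positive else 0.0)
        for f in in_docs
    }


# Invented toy articles; True means the article created a gene expression dataset.
docs = [
    "RNA was extracted with TRIzol and hybridized to a gene expression microarray",
    "Total RNA was purified using RNeasy columns before microarray hybridization",
    "We reviewed published microarray studies of breast cancer",
    "Tissue microarray slides were stained for protein expression",
]
labels = [True, True, False, False]
scores = score_features(docs, labels)
```

On this toy set, `microarray` has perfect recall but only 0.5 precision (review articles and tissue microarrays also mention it), which is the kind of trade-off that motivates combining several terms into one query.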
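The derived query on slide 46 can be read as a Boolean filter over an article's full text. The Python sketch below evaluates it under stated assumptions: matching is case-insensitive on whole words (so `cell` does not match `cells`), quoted phrases may be separated by any whitespace, and a trailing `*` matches any word ending. The full-text portals that actually ran the query may apply different rules (stemming, phrase handling), and `compile_term` and `matches_query` are hypothetical names.

```python
import re


def compile_term(term: str) -> re.Pattern:
    """Compile a word or phrase; a trailing '*' allows any word ending."""
    wildcard = term.endswith("*")
    words = term.rstrip("*").split()
    body = r"\s+".join(re.escape(w) for w in words)
    suffix = r"\w*" if wildcard else r"\b"
    return re.compile(r"\b" + body + suffix, re.IGNORECASE)


# The three clauses of the derived query.
ALL_OF = ["gene expression", "microarray", "cell", "rna"]
ANY_OF = ["rneasy", "trizol", "real-time pcr"]
NONE_OF = ["tissue microarray*", "cpg island*"]


def matches_query(text: str) -> bool:
    """Evaluate the derived query against one article's full text."""
    def has(term: str) -> bool:
        return compile_term(term).search(text) is not None

    return (
        all(has(t) for t in ALL_OF)
        and any(has(t) for t in ANY_OF)
        and not any(has(t) for t in NONE_OF)
    )
```

The NOT clause does most of the disambiguation work: "tissue microarray" studies share nearly all the vocabulary of expression studies but do not create gene expression datasets.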
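Slide 47 reports precision and recall with confidence intervals but does not say how the intervals were computed. As one common choice for a binomial proportion, swapped in here as an assumption, a Wilson score interval can be computed as below. The counts in the example are hypothetical and are not the study's.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives ~95%."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


# Hypothetical: 45 of 50 articles flagged by a query truly created datasets,
# so precision is 0.9 and the interval brackets it.
lo, hi = wilson_interval(45, 50)
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when the proportion is near 0 or 1, which matters for high-precision filters.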
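Slides 49, 68 and 70 count an article as having shared its data when a dataset record in GEO or ArrayExpress links back to it, then plot that proportion by publication year. A minimal sketch of the bookkeeping, assuming each dataset record exposes the PubMed IDs it cites and each article has a publication year. The IDs, record names and helper functions are invented for illustration; the slides do not describe how the links were actually retrieved.

```python
from collections import defaultdict


def shared_flags(article_pmids, dataset_citations):
    """True for each article whose PubMed ID is cited by at least one dataset record."""
    linked = {pmid for cited in dataset_citations.values() for pmid in cited}
    return {pmid: pmid in linked for pmid in article_pmids}


def proportion_shared_by_year(article_years, flags):
    """article_years: {pmid: publication year}. Returns {year: fraction shared}."""
    totals, shared = defaultdict(int), defaultdict(int)
    for pmid, year in article_years.items():
        totals[year] += 1
        shared[year] += int(flags[pmid])
    return {year: shared[year] / totals[year] for year in sorted(totals)}


# Invented PubMed IDs and dataset records, for illustration only.
article_years = {"1001": 2006, "1002": 2006, "1003": 2008, "1004": 2008}
dataset_citations = {
    "dataset_A": ["1001"],
    "dataset_B": ["1003", "1004", "9999"],  # 9999 is not in the article set
}
flags = shared_flags(article_years, dataset_citations)
print(proportion_shared_by_year(article_years, flags))  # {2006: 0.5, 2008: 1.0}
```

Because sharing is detected only through links from database records, an article whose dataset was deposited without a citation, or posted elsewhere, is counted as not shared; this is one of the imperfections acknowledged on slide 95.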
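Slide 64 gives the proxy used to decide whether the NIH data sharing policy plausibly applied to a study. Expressed as a predicate, assuming per-year grant records with an application type code and a total-funding amount (the `GrantYear` record and its fields are hypothetical), it might look like this. In NIH grant numbers, type code 1 denotes a new application and 2 a competing renewal. Two readings are assumptions: "since 2004" is treated as inclusive, and all conditions must hold for the same grant-year record. One plausible reading, not stated on the slides, is that the higher total-funding threshold stands in for the $500 000 direct-cost rule on slide 63, since total funding also includes indirect costs.

```python
from dataclasses import dataclass


@dataclass
class GrantYear:
    """One hypothetical grant-year record linked to a study."""
    grant_number: str
    fiscal_year: int
    type_code: str        # NIH application type code: "1" new, "2" competing renewal
    total_funding: float  # total dollars from this grant in this year (assumed field)


def nih_policy_probably_applies(records: list[GrantYear], since: int = 2004,
                                threshold: float = 750_000) -> bool:
    """Slide 64 proxy: any grant-year since `since` with type code 1 or 2
    and more than `threshold` dollars of total funding."""
    return any(
        r.fiscal_year >= since
        and r.type_code in {"1", "2"}
        and r.total_funding > threshold
        for r in records
    )


# Hypothetical records for one study.
study = [
    GrantYear("hypothetical-A", 2003, "1", 900_000),  # before 2004: ignored
    GrantYear("hypothetical-B", 2005, "5", 800_000),  # type 5: not counted
]
print(nih_policy_probably_applies(study))  # False
```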
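Slides 89 to 93 report effects as odds ratios on an axis from 0.25 to 8. In a logistic regression, a coefficient β on the log-odds scale corresponds to an odds ratio exp(β), and a Wald-type 95% interval is exp(β ± 1.96·SE). The slides do not state how their intervals were computed, and the nonlinear terms and interactions they mention are not modelled here. This Python sketch only shows the coefficient-to-odds-ratio conversion, with a hypothetical coefficient.

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Convert a log-odds coefficient and its standard error into
    (odds ratio, lower bound, upper bound) using a Wald-type interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient: log(2) means the factor doubles the odds of sharing.
or_, lo, hi = odds_ratio_with_ci(math.log(2), 0.1)
```

An odds ratio above 1 means the factor is associated with higher odds that the dataset was shared; an interval that excludes 1 is the usual informal signal that the association is distinguishable from none. Plotting on a doubling axis (0.25, 0.5, 1, 2, 4, 8) makes halving and doubling the odds look equally far from 1.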