BioMed Central’s open data
        initiatives
 Alliance for Permanent Access conference
              7th November 2012

              Iain Hrynaszkiewicz
    Publisher (Open Science), BioMed Central
     iain.hrynaszkiewicz@biomedcentral.com
                    @iainh_z
About BioMed Central
• Launched in 2000, largest global publisher of peer-
  reviewed open access journals (>240)
• >136,000 peer-reviewed open access articles published
• Part of Springer Science+Business Media since 2008
• Publish using Creative Commons (CC-BY) licenses
• Non-journal products include ISRCTN database
• Interested in innovation and recognise the growing need
  for data sharing and publication
  http://blogs.biomedcentral.com/bmcblog/tag/Open-Data/
BioMed Central and open data
• Increasing transparency in scientific research and
  scholarly communication is at the core of strategy
• Data are an increasingly integral part of scholarly
  communication, with many opportunities for increasing
  the pace of knowledge discovery
• Publishers, particularly open access publishers, are well-
  placed to share information across domain boundaries
   http://www.biomedcentral.com/about/access


“By ‘open data’ BioMed Central means that these data are freely available on the public
    internet permitting any user to download, copy, analyse, re-process, pass them to
    software or use them for any other purpose without financial, legal, or technical
    barriers other than those inseparable from gaining access to the internet itself. BioMed
    Central encourages the use of fully open formats wherever possible.”
BioMed Central open data initiatives
• Data journals and article types
• Open Data Award
• Data hosting, citation, deposition and linking
• Lab notebook-journal integration (LabArchives)
• Data licensing
• Guidance and best practice e.g. human subjects –
  confidentiality and consent
• Data formats and standards – efficient reuse
• Facilitation of data/text mining research
Problem: Lack of credit/recognition for
    data sharing and publication
• In science credit is everything but incentives for data
  publication are still emerging
• Datasets are not generally as discoverable and
  citable as journal articles – yet
• Requirements for data sharing are field/location-
  specific
• Need more empirical evidence of the benefits of data
  publication for individual scientists
Solution #1: Journals and article types
       enabling data publication
              Data notes: “[B]riefly describe a biomedical data
              set or database, with the data being readily
              accessible and attributed to a source”
              http://bit.ly/y3Jb3b
              Research: E.g. The International Stroke Trial
              database
              http://www.trialsjournal.com/content/12/1/101


              Data notes: “[E]xceptional datasets deposited
              in our GigaScience repository that have been
              selected for further peer review”
              http://bit.ly/yPBsAA
Solution #2: Open Data Award


“We ... recognize researchers who have ... have demonstrated
leadership in the sharing, standardization, publication, or
re-use of biomedical research data.”
http://www.biomedcentral.com/researchawards/opendata
Solution #3: Enable and
        encourage/require data citation
“References
...
Only articles, datasets and abstracts that have been published or
are in press, or are available through public e-print/preprint servers,
may be cited
…
“Dataset with persistent identifier
Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-
F; Jiang, S; Ramachandran, S; Liu, C-M; Jing, H-C (2011): Genome
data from sweet and grain sorghum (Sorghum bicolor).
GigaScience. http://dx.doi.org/10.5524/100012."




                       http://blogs.biomedcentral.com/bmcblog/2012/01/19/citing-and-linking-dat
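The "Dataset with persistent identifier" reference style quoted above can be sketched in code. This is a minimal illustration only: the `DatasetRecord` type and `format_dataset_citation` function are hypothetical names, not part of any BioMed Central or GigaScience tool, and the layout is inferred from the single sorghum example on this slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical container for the fields visible in the slide's example."""
    authors: List[str]   # each author as "Surname, Initials"
    year: int
    title: str
    publisher: str       # the repository or journal hosting the data
    doi: str             # bare DOI, e.g. "10.5524/100012"


def format_dataset_citation(record: DatasetRecord) -> str:
    """Build a citation string in the style of the slide's example.

    Assumed layout: authors joined by "; ", then "(year): title.",
    then the publisher, then the DOI as a dx.doi.org link.
    """
    authors = "; ".join(record.authors)
    return (
        f"{authors} ({record.year}): {record.title}. "
        f"{record.publisher}. http://dx.doi.org/{record.doi}"
    )


# The sorghum dataset cited on the slide.
sorghum = DatasetRecord(
    authors=[
        "Zheng, L-Y", "Guo, X-S", "He, B", "Sun, L-J", "Peng, Y", "Dong, S-S",
        "Liu, T-F", "Jiang, S", "Ramachandran, S", "Liu, C-M", "Jing, H-C",
    ],
    year=2011,
    title="Genome data from sweet and grain sorghum (Sorghum bicolor)",
    publisher="GigaScience",
    doi="10.5524/100012",
)
print(format_dataset_citation(sorghum))
```

Keeping the DOI as a separate field, rather than storing only the full link, lets the same record be rendered against a different resolver address if needed.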
Problem: Where can data be stored –
           permanently?
• Publishers not best placed to run repositories for long
  term preservation of large datasets
• Mirrors of publisher content not able to accept
  arbitrary amounts of additional data
• Many data repositories exist but most are
  domain/location specific and there are many different
  types of funding model, license agreement and
  persistent identifiers in use
Solution #1: Journal with integrated database
Editor-in-Chief: Laurie Goodman, BGI (USA)
Editor: Scott Edmunds, BGI (China)
Assistant Editor: Alexandra Basford, BGI (China)

GigaScience publishes ‘big-data’ studies from the entire
spectrum of life sciences

Benefits
• Novel publishing format - manuscript publication and
  data hosting
• Assignment of data DOIs allows separate data citation
• The BGI is covering all APCs for the first year after
  launch

www.gigasciencejournal.com
www.biomedcentral.c
http://gigadb.org/
GigaDB is a new database integrated with the GigaScience journal to meet the needs of a new generation of biological
and biomedical research as it enters the era of “big-data”… (see more)
Anatomy of a GigaScience Publication
[Diagram labels: Idea, Study, Metadata, Data, Analysis, Answer]
Solution #2: Comprehensive author
information on available data repositories


     http://datacite.org/repolist



                                    http://www.biomedcentral.com/about/su
Solution #3: Research on repositories



http://publicationethics.org/files/u661/EthicalEditing_Autumn2012_final.pdf



We are looking for
repositories with interests
in clinical research data –
can you help?
Problem: Data are not consistently
         linked to publications
• Data deposition policies are not established in all
  fields
• Even where they are, links/accession numbers tend to
  be inconsistently presented and rarely cited
• Researchers may, independently of journal
  requirements, deposit data in repositories
• A missed opportunity to enhance the literature
Solution #1: ‘Availability of supporting
          data’ article section
• A tool to put data deposition policies – encouraged or
  mandated – into practice
• Provides links in a consistent place within an article to
  supporting data, regardless of the location or format
  of the data
• Data must be permanently available (DOI or
  equivalent)
• ~50 journals including GigaScience, BMC series




                       http://www.biomedcentral.com/about/supportingdata
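The "permanently available (DOI or equivalent)" requirement could be screened for automatically. The sketch below is an assumption-laden illustration, not a BioMed Central tool: `extract_doi` is a hypothetical helper, the pattern is a commonly used approximation of DOI syntax, and only DOIs are recognised, so other persistent identifiers ("or equivalent") would need their own checks.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Approximate DOI syntax: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Resolver hosts treated as DOI links in this sketch.
DOI_RESOLVERS = {"doi.org", "dx.doi.org"}


def extract_doi(link: str) -> Optional[str]:
    """Return the DOI in a supporting-data link, or None if there is none.

    Accepts a bare DOI ("10.5524/100012") or a resolver URL
    ("http://dx.doi.org/10.5524/100012"). Anything else returns None.
    """
    link = link.strip()
    if DOI_PATTERN.match(link):
        return link
    parsed = urlparse(link)
    if parsed.netloc.lower() in DOI_RESOLVERS:
        candidate = parsed.path.lstrip("/")
        if DOI_PATTERN.match(candidate):
            return candidate
    return None


# Links taken from elsewhere in this deck.
for link in ("http://dx.doi.org/10.5524/100012", "http://gigadb.org/"):
    print(link, "->", extract_doi(link))
```

A link that returns None is not necessarily impermanent. It only means the check found no DOI, so such links would still need an editor's judgment.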
Availability of supporting data



BMC Res Notes 2012, 5:21 http://www.biomedcentral.com/1756-0500/5/21/




GigaScience 2012, 1:3 http://www.gigasciencejournal.com/content/1/1/3
Solution #3: Lab notebook integration
    • BMC authors entitled to LabArchives’ online lab notebook
      (http://www.labarchives.com/bmc) with 100 MB of free
      storage
    • Features include:
      - Data publishing with DOI assignment
      - Citable, linkable data supporting publications
      - Reusable/integrate-able data with CC0 waiver
      - Integrated manuscript submission to BMC journals
      - Additional free storage (standard is 25 MB)
http://blogs.openaccesscentral.com/blogs/bmcblog/entry/labarchives_and_biomed_central_a
LabArchives partnership
24 Oct 2012
Open data
partnership leads to
release of data
from Nobel Prize-
winning laboratory
for public use
http://www.biomedcentral.com/presscenter/pressreleases/20121024c
Problem: Licensing that restricts efficient
         data integration and (re)use


http://pantonprinciples.org/

“[P]eople mis-use copyright licenses on uncopyrightable
materials and data sets: the confusion of the legal right of
attribution in copyright with the academic and professional
norm of citation of one's efforts.” John Wilbanks, VP, Science,
Creative Commons, http://bit.ly/djl5Fa August 11, 2010

“...any restrictions on use should be strongly resisted and
we endorse explicit encouragement of open sharing.”
Schofield et al.: Post-publication sharing of data and tools.
Nature 2009, 461:171.

“The data should be released in standardized formats without
intellectual property constraints.” Conway PH, VanLare JM:
Improving Access to Health Care Data: The Open Government
Strategy. JAMA 2010;304(9):1007-1008.

http://www.isitopendata.org/
Why Creative Commons CC0?
• interoperability: CC0 is human and machine-
  readable
• universality: CC0 is global and universal and
  widely recognized
• simplicity: no need for humans to make, and
  respond to, individual data requests – avoids
  “attribution stacking” with CC-BY licenses
 Schaeffer P: Why does Dryad use CC0?
 http://blog.datadryad.org/2011/10/05/why-does-dryad-use-cc0/




                          http://creativecommons.org/publicdomain/zero/1.0/
Solution: Stakeholder engagement and
  community collaboration, leadership
Public consultation on
implementing CC0 for
data published in open
access journals: closes
10th November 2012
http://blogs.biomedcentral.com/bmcblog/2012/09/10/put-the-open-in-open-data/


Hrynaszkiewicz I, Cockerill MJ:
Open by default: a proposed
copyright license and waiver
agreement for open access
research and data in peer-
reviewed journals. BMC Research
Notes 2012, 5:494
http://www.biomedcentral.com/1756-0500/5/494
Implementing CC0 in journals – how?

• Specify a date from which the new license would
  apply to data (CC-BY remains for other content)
• Only applies to data submitted to the journal
• Some relatively minor technical and operational
  implications
• Cultural change may be the biggest challenge
• Consultation is identifying common concerns, FAQs,
  and further definitions and use cases for open data in
  journal publications
 Hrynaszkiewicz I, Cockerill MJ: Open by default: a proposed copyright
 license and waiver agreement for open access research and data in
 peer-reviewed journals. BMC Research Notes 2012, 5:494 
 http://www.biomedcentral.com/1756-0500/5/494
Problem: Lack of guidance, exemplars,
   incentives to make data reusable
• Sharing/publishing detailed human subjects data, in
  the absence of explicit consent, can potentially
  infringe privacy (ethically and legally)
• Data are more (re)usable if published in community
  endorsed, standard formats
• Standards and appropriate guidance do not yet exist
  in all domains
• Few incentives to follow data standards
Solution #1: Work with journal editors
to produce guidance where it is needed

                     BMJ 2010;340:c181
                     Co-published in:
                     Trials 2010, 11:9
Solution #2: Publish exemplars
Solution #3: Incentivize, promote and
             share best practice and standards
http://www.biomedcentral.com/bmcresnotes/series/datasharing   http://biosharing.org/standards_view
Problem: Adding value to data of use to
  researchers, readers and publishers
• Text/data mining applications are often research
  project or research specific and not always attractive
  to commercial publishing platforms and their
  customers
• Value to the non-expert can be limited
• Makes business model/case challenging for
  publishers
http://www.biomedcentral.com/about/datamining/
www.casesdatabase.com
www.casesdatabase.com –
      coming soon
The future...




Image adapted from Gillam
et al: The Healthcare
Singularity and the Age
of Semantic Medicine. In
The Fourth Paradigm (2009)
Questions?


            Iain Hrynaszkiewicz
     Publisher (Open Science), BioMed Central
      iain.hrynaszkiewicz@biomedcentral.com

http://www.mendeley.com/profiles/iain-hrynaszkiewicz/
            http://uk.linkedin.com/in/iainhz
                         @iainh_z


Editor's Notes

  1. We publish under the Creative Commons license. Authors/copyright owners irrevocably grant to anyone the right to use, reproduce or disseminate the research article in its entirety or in part, in perpetuity. An article processing charge is typically levied for each accepted article. When we say open access we mean research which is free to build upon, distribute and use with the minimum of barriers – the Budapest definition. We launched our first data publishing journal, BMC Research Notes, in 2008.
  2. In addition to service provision, as a successful open access publisher, increasing transparency in science communication is at the core of our strategy. And as a technology-driven company, we believe that data are and will become an increasingly integral part of the published scientific record. Moreover, publishers who serve a broad section of the scientific community are well placed to share information across domain boundaries, i.e. between scientists more likely to share data and those who are not. Our policy on open data is analogous to our policy on open access to papers, although as we will see the appropriate legal tools (licenses for data) are not the same.
  3. Current projects include, but are not limited to, those listed here. In other words BMC’s initiatives have been about removing barriers to data sharing and publication, and solving problems. I’m not going to dwell too much on why we should preserve, share and publish research data. Data publication has moved on from the why to the how.
  4. So what are the problems we’re trying to solve? I imagine we all know that credit for data sharing is a major barrier, and the mandates and incentives are often field-specific. Data sharing only happens routinely in some scientific fields (e.g. physics and genomics have led the way). The benefits of data sharing for science as a whole, the economy and, in clinical research, patients are well documented, but evidence of the benefits to individual scientists and evidence of intelligent reuses are still emerging in some fields.
  5. Data journals are no longer a new idea, and many publishers now have them. They are helpful for overcoming the credit problem because they make data publication equivalent to publication of research. BMC has always had, since it launched in 2000, a “reproducible research friendly” ethos, encouraging submission of supplementary materials (additional files). And for more than 5 years BMC has published journals that specifically aim to make raw data available, either as additional files (supplementary material) with research or as data-driven articles: data papers (we call them data notes). In the case of data notes, data are the primary purpose of publication. Data notes describe a biomedical dataset or database, with the data being readily attributable to a source. Furthermore, efficient online publication processes can facilitate dataset publication. Currently, only a fraction of experimental datasets make it into the literature, and many more datasets have the potential to be useful but do not warrant a traditional publication. There are different approaches to data publishing and a number of new journals/services are emerging (F1000, datasets international). Datasets can still be included as supplementary material (virtually unlimited numbers of files no larger than 20Mb per file). BMCRN: data can be in additional files or elsewhere on the web. GigaScience: data in the GigaScience repository, GigaDB. Trials: data as additional files (so far).
  6. Other solutions. Everyone likes getting prizes. Since 2010 we have had an open data category in our annual research awards, which aims to recognise researchers who have shown leadership in the sharing, standardization, publication or re-use of biomedical research data. The 2011 winner, awarded in 2012, is pictured here. More about the specifics of their dataset later.
  7. The STM Association announced endorsement for data citation in reference lists in June 2012. BMC has included data citation in the style of every journal it publishes since January 2012, and since July 2011 selected journals have encouraged or required data citation and linking consistently from published research.
  8. We often hear debate about publishing additional files (supplementary material). Publishers are not best placed to run repositories for long-term preservation of large datasets; mirrors of publisher content are not able to accept arbitrary amounts of additional data; and long-term preservation presents a challenge with respect to continuity. BGI is capable of sequencing ~2000 genomes per day (6 Tb/day, or roughly 2 Pb/year).
  9. GigaScience aims to revolutionize data dissemination, organization and use. It publishes big data studies across the life sciences and includes a novel publication format which links manuscript publication with an extensive database that hosts all associated data and provides analysis tools and cloud-computing resources. It published its first articles in July 2012, including high-profile articles on, e.g., the Puerto Rican parrot genome.
  10. But what if you want to publish everything: all your data and code (analysis tools)? You want a fully executable account of your research, allowing others to explore, validate and reproduce your findings. Well, in data-intensive science we have a solution for that problem too.
  11. Will soon look like this. The database accepts data and code for all research and data papers accepted for the GigaScience journal. It also accepts important datasets supporting publications in other journals; it has data from Genome Biology and Nature. It uses the Creative Commons CC0 waiver for all data, ensuring maximum potential for integration and reuse.
  12. Data and code for one of the first research articles in GigaDB. About 80Gb of data. Large and linked in scientific publishing: the launch of ‘big data’ journal GigaScience   BGI, the world’s largest genomics institute, and BioMed Central, a leader in scientific data sharing, aim to revolutionize science publishing with the launch of GigaScience , a new open access, open data journal with a scope that embraces all life science research that generates ‘big data’.   This launch is a major first step towards the open access publication of complete, reproducible accounts of all parts of data-intensive scientific research projects. Together GigaScience and its integrated database Giga DB provide scientific analyses, full dataset hosting, and access to the software tools used to conduct these analyses, along with publication of more traditional scientific articles describing the studies.   Having all these together finally allows readers to not only glean the scientific conclusions in the papers, but also to directly test these using the underlying data and analysis tools. In this way, GigaScience offers a way to help overcome the growing problem of the lack of reproducibility of research. GigaScience publications also include Digital Object Identifier (DOIs) for all datasets in the journal database, Giga DB. This helps make datasets more permanent, as well as fully track-able, discoverable, linkable, and citable, which traditionally has only been possible for journal articles. Citation enables scientists, who generate these enormous datasets and share them with the community, to gain more appropriate credit for their contributions to research.   Laurie Goodman, Editor-in-Chief, says, “The full use of large-scale data has sadly lagged far behind our ability to produce it. 
The leaders of BGI realized they had the ability, given their vast computational resources, to create an innovative new journal format — one where enormous datasets could be fully hosted and directly linked to their original scientific studies. By including analysis tools in a data platform, as well as the planned addition of cloud technology later this year, GigaScience can serve as a means to put such data into the hands of researchers who do not have the vast computational resources required for optimal data use. This is in keeping with the goals of our co-publisher BioMed Central, which makes them the perfect partner in achieving this endeavour.”   Exemplifying GigaScience and Giga DB’s innovative approach to publishing, in the launch edition, is a research article from Stephan Beck’s group at the University College London, UK. This article focuses on ways to conduct whole-genome analyses of DNA methylation, an important mechanism that regulates gene expression. The article contains all of the supporting data and software tools needed to recreate the experiments — a total of 84 GB — freely available for download and reuse    from Giga DB. Using BGI’s data storage capacity, GigaScience is able to host these and other files, which are far larger than any other journals are able to publish. GigaDB furthermore supports open data by giving up all copyright in published datasets by its use of the Creative Commons CC0 public domain dedication waiver. This allows anyone to access and reuse published data without restrictions.       This is part of a   forward thinking, technology-driven approach to science publishing. As Publisher Iain Hrynaszkiewicz says, “We traditionally have only had access to limited amounts of scientific knowledge – usually articles in journals summarizing experiments – which means we do not reap the full benefits of research. 
Through GigaScience ’s open access, open data journal and database we are entering a new era of publishing, where large amounts of scientific data are as accessible, citable and interconnected as the literature which they support. BioMed Central are delighted to be leading this revolution in open data and science communication with GigaScience and BGI, which we hope will ultimately help make scientific research faster and more reliable.”   As well as this innovative, big-data-driven publication format the journal also provides reviews and commentaries that address the many hurdles that still need to be surmounted to improve future big-data handling .     BioMed Central and the GigaScience editors will be marking the journal’s launch at the ISMB conference 15-17 July 2012 (booth number 36).   -ENDS-   Media Contact Rebecca Fairbairn Public Relations Manager, BioMed Central Tel:  +44 (0) 20 3192 2433 Mob: +44 (0) 7825 257423 Email: [email_address]     Notes to Editors BioMed Central ( http://www.biomedcentral.com/ ) is an STM (Science, Technology and Medicine) publisher which has pioneered the open access publishing model. All peer-reviewed research articles published by BioMed Central are made immediately and freely accessible online, and are licensed to allow redistribution and reuse. BioMed Central is part of Springer Science+Business Media, a leading global publisher in the STM sector. BGI (formerly known as Beijing Genomics Institute) was founded in 1999 and has since become the largest genomic organization in the world. With a focus on research and applications in the healthcare, agriculture, conservation, and bio-energy fields, BGI has a proven track record of innovative, high profile research, which has generated over 178 publications in top-tier journals such as Nature and Science. BGI’s distinguished achievements have made a great contribution to the development of genomics in both China and the world. 
Their goal is to make leading-edge genomics highly accessible to the global research community by integrating industry’s best technology, economies of scale, and expert bioinformatics resources. BGI and its affiliates, BGI Americas and BGI Europe, have established partnerships and collaborations with leading academic and government research institutions, as well as global biotechnology and pharmaceutical companies.   GigaScience (http://www.gigasciencejournal.com) is co-published by BGI, the world’s largest genomics institute, and BioMed Central, the world’s largest open-access publisher. The journal covers research that uses or produces ‘big data’ from the full spectrum of the life-sciences. It also serves as a forum for discussing the difficulties of and unique needs for handling large-scale data from all areas of the life sciences. The journal has a completely novel publication format — one that integrates manuscript publication with complete data hosting, and analyses tool incorporation. To encourage transparent reporting of scientific research as well as enable future access and analyses, it is a requirement of manuscript submission to GigaScience that all supporting data and source code be made available in the GigaScience database, Giga DB (http://gigadb.org), as well as in their publicly available repositories. GigaScience will provide users access to associated online tools and workflows, and will be integrating cloud resources into the database later this year, maximizing the potential utility and re-use of data. (Follow us on twitter @GigaScience; and keep up-to-date on our blogs http://blogs.openaccesscentral.com/blogs/gigablog/feed/entries/rss).    
  13. Consider the typical flow of scientific knowledge: we usually only capture these last two steps. The aim is to publish “executable” papers, capturing all outputs of a project and enabling truly reproducible research. This doesn’t replace community expectations to deposit data in e.g. sequence databases, but GigaDB can co-host, and include the analysis tools; the latter is not possible in e.g. NCBI databases.
  14. But of course few journals have this functionality. Therefore we’ve been working with DataCite, the Digital Curation Centre and the British Library to provide a comprehensive list of >100 data repositories, which is linked from our instructions for authors. Also, if as a publisher we are encouraging data sharing and publication, it’s a good service to our authors to tell them where it can be done.
  15. As well as passively identifying potentially useful repositories we are trying to fill known “gaps in the market” through research. In medicine, there is no obvious/central/major repository. We recently received funding to work with researchers at Ottawa to address these gaps in knowledge and produce comprehensive information on the features and practices of existing repositories which have common interests, or potential interest, in sharing and public disclosure of clinical trials data. The methodology of this study includes reviewing existing resources that catalogue information on data repositories, such as Databib (http://databib.org/), literature review, analysis of repository websites, and engagement of relevant stakeholders, such as interviews with repository managers. We aim to capture the methods of existing repositories for public disclosure of clinical data, and non-public forms of data sharing, such as the unique and persistent identification systems for datasets; the license, use or other agreements employed by the repositories; and sustainability (business) models, to understand how they have addressed the issues and summarize what are considered good practices. We also plan to gather information on how repositories define raw data and metadata; data formats and standards; methodology of data preparation; privacy; standards of quality control; policies and terms of data inclusion and access to data for re-use; system architecture; features that encourage data sharing across geographical and domain boundaries, such as networks of repositories (federated) vs. a centralized repository; and collaborations with other stakeholders in clinical research data, including journals and publishers.
  16. With all these repositories we know data archiving goes on in a number of disciplines/institutions but unless there is a community or journal/editorial mandate for data availability this will not be consistently checked as part of the peer review and publication process and documented in the article. If data are assigned persistent identifiers then they can potentially be linked to publications, and substantially enhance the reliability and reproducibility of the literature.
  17. A number of journals require ‘data sharing statements’ from their authors (e.g. BMJ, Annals), which are a step in the right direction. But I’m more for data sharing than for data sharing statements. There are different approaches to data deposition policies, so BMC offers a different approach, an article section, that enables consistent linking of datasets when they are permanently available online. In effect, it is a tool for editors and communities to put data deposition policies into practice. It makes clear to readers when they can access the data as well as the paper.
  18. Simple, standard statement that can be used for a variety of formats, and also when data are available in additional files.
  19. Another way to link data publications is to give authors their own personal repository, and the ability to permanently publish datasets. This is what LabArchives offers through its online lab notebook. So anyone can be a data publisher now. Integrated submission: manuscript templates and links to the submission system.
  20. Last month a data paper was published in BMC Research Notes which described a data set from a Nobel Prize winning lab.
  21. Back to trying to solve problems. To gain the full benefit from scientific data it must be free to build upon and reuse with the minimum of barriers, which includes legal barriers. Removing them requires data to be placed in the public domain, and there have been loud calls for the right license or, in fact, no copyright license to be applied to research data. Problems with copyright and data: some examples here from Nature, JAMA, and prominent figures and initiatives in the open science/data community. The most permissive license under which journal content is typically published is CC-BY, even in open access journals.
  22. A solution for making data maximally reusable in accordance with these open data principles is the Creative Commons (again) CC0 public domain dedication waiver. For data this legal tool is attractive for a number of reasons. A waiver is a means to give up rights rather than assert them. These reasons have been cogently expressed by someone else (Peggy Schaeffer at the Dryad repository), so I’ll paraphrase here. Interoperability: since CC0 is both human and machine-readable, other people and indexing services will automatically be able to determine the terms of use. Universality: CC0 is a single mechanism that is both global and universal, covering all data and all countries. It is also widely recognized and can be applied in all jurisdictions. Simplicity: there is no need for humans to make, and respond to, individual data requests, and no need for click-through agreements. This allows more scientists to spend their time doing science. Basically, it’s much more efficient.
  23. So what is the solution? Copyright and licensing aren’t often the topics at the forefront of scientists’ minds. How do you engage scientists and other stakeholders in research on licensing and promotion of reuse of data?
  24. We made a series of announcements, including a draft position statement on open data, on the BMC blog. We invited contributions from the entire scientific community and held a series of working group meetings. This led to where we are today: development of a detailed proposal and protocol for implementing a variable/combined license agreement for OA journals, which places data in the public domain under CC0 and the remainder of papers under CC-BY. This went out for public consultation in September and is open until the end of this week.
  25. So, how can a combined CC-BY and CC0 license agreement be implemented? On one hand it’s quite easy. The publisher just sets a date from which submitting authors must agree to the new license. You would need to address some relatively minor technical and operational tasks, such as what license information you encode in article XML and, if you publish it, RDF. Also, you need to have a robust system for handling opt-outs when the standard license is not feasible for a small number of authors. But on the other hand it’s quite complicated, as there may well be cultural objections to change. A way of addressing this is a public consultation, which was launched last week by BioMed Central, along with publication of a detailed white paper setting out all the issues and practical steps. Publishing platform developments: implementing a new license agreement will have technical as well as policy and procedural implications. These include: tagging of articles and data files published under a non-standard license agreement (where authors have opted out of the new default open access-open data license); editing standard embedded license information in article XML metadata and RDF, and a tool to automate insertion of non-standard licensing terms; and insertion of license information into additional files and associated metadata. Furthermore, the following would be desirable to enhance the discoverability and usefulness of open data in journal articles: tagging and classification of published data files, for example by file type; a tool to automatically discover and aggregate additional files; a tool to (retrospectively) associate data objects with papers on the web; approaches to associating published datasets with journal articles which go beyond hyperlinking, such as through linked data methods; and searching within and filtering of additional files.
  26. There are various domain-specific challenges in preparing data for publication, which as a publisher we come to understand through our interactions with different communities. In clinical research patient privacy must be protected, but at the same time it does not always need to be an excuse for not sharing data. Data standards are important for efficient data reuse but there are many of them, and there are not always the right guidelines and incentives for authors to adhere to them.
  27. From 2008-10 we worked with some of the editors of a journal of clinical trials research, Trials, to help facilitate publication of raw data. We convened a working group involving publishing, funding, research ethics and editorial stakeholders. We developed practical guidelines on preparing clinical data for publication while maintaining privacy, and on the alternatives when open access to clinical data is not possible. These were published in the BMJ and in Trials in 2010.
  28. This guidance was put into practice by the International Stroke Trial group, led by Peter Sandercock. They published a paper and dataset in the journal Trials for the primary purpose of making the individual patient data (IPD) from one of the largest trials in acute stroke ever conducted, comprising IPD for some 19,000 patients, available for alternative analyses and reuse.
  29. Which interestingly amounted to less than 5Mb of data. The data will be useful for a number of purposes including teaching and for planning future trials (the resources currently available in the developing world are fairly similar to the resources available for the IST in the 1990s).
  30. BMC Research Notes partnered with a group called BioSharing in Oxford to work collaboratively on the cataloguing and promotion of use of data standards. BMCRN has since 2008 published data papers and encourages the publication of software tools, databases and datasets. A key objective of the journal is to ensure that associated data files will, wherever possible, be published in standardised, reusable formats and to define appropriate recommendations for domain-specific data file standards. As an incentive, anyone developing a data standard for life sciences can publish a paper describing it, and publish an exemplary dataset, in BMCRN, and BMC is waiving the article processing charge.
  31. BioMed Central has, since 2003 or 2004, made its full text corpus of articles published under CC-BY available for bulk download for data mining research. This enables the scientific community, without special permission, to explore and build on the open access literature as a scientific resource.
  32. And we’ve always had our own visions of adding value to the open access literature. First mooted in 2007 when we launched our Journal of Medical Case Reports, our database of medical case reports is soon to be launched. What is it?
  33. It’s search, effectively. It uses TEMIS text mining technology and medical ontologies to mine XML from all BioMed Central journals which publish medical case reports. And we are working with other publishers to include their content in the database. Publishers who use the CC-BY license and deposit content in PubMed Central can be included efficiently.
  34. It is not a clinical decision support tool, but it can potentially supplement evidence-based resources when clinicians come across a situation where there isn’t an RCT or treatment guideline that matches the history, co-medications and co-morbidity of the patient they are treating. It is also useful for teaching, for regulators and for researchers.
  35. It uses text mining technology. OA and OA licenses make text mining a much more efficient process. So as well as generating “the problem” (in air quotes), OA and the internet can also find solutions to information overload. By removing barriers to discovery and reuse of the literature we may ultimately improve the pace of research and patient outcomes.
  36. Products such as CasesDB will help us deal with the inevitable information overload in the literature. OA may be driving growth in the literature but when done intelligently (with the right licenses) can also drive innovation and make the literature more useful.
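
Notes 33 to 35 describe mining article XML with text mining technology and medical ontologies so that case reports become searchable. The sketch below is only a toy illustration of that general idea, not the TEMIS technology or BioMed Central's actual pipeline: it assumes a hypothetical two-entry "ontology" of terms and synonyms, made-up article XML with invented element names, and simple dictionary matching in place of real text mining.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mini-ontology (illustrative only): concept -> list of synonyms.
ONTOLOGY = {
    "myocardial infarction": ["myocardial infarction", "heart attack"],
    "warfarin": ["warfarin"],
}

# Made-up article XML; the element names are assumptions, not a real schema.
SAMPLE_ARTICLES = {
    "case-1": (
        "<article><title>Heart attack in a young athlete</title>"
        "<body><p>The patient was taking <i>warfarin</i>.</p></body></article>"
    ),
    "case-2": (
        "<article><title>Recurrent myocardial infarction</title>"
        "<body><p>No anticoagulants were prescribed.</p></body></article>"
    ),
}


def extract_text(article_xml):
    """Return all text in the article with markup removed and whitespace collapsed."""
    root = ET.fromstring(article_xml)
    return " ".join(" ".join(root.itertext()).split())


def tag_concepts(text, ontology):
    """Return the set of ontology concepts with at least one synonym in the text."""
    found = set()
    for concept, synonyms in ontology.items():
        for synonym in synonyms:
            # Whole-word, case-insensitive match of the synonym.
            pattern = r"\b" + re.escape(synonym) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                found.add(concept)
                break
    return found


def build_index(articles, ontology):
    """Map each concept to the set of article ids in which it was found."""
    index = {concept: set() for concept in ontology}
    for article_id, article_xml in articles.items():
        for concept in tag_concepts(extract_text(article_xml), ontology):
            index[concept].add(article_id)
    return index


if __name__ == "__main__":
    for concept, ids in build_index(SAMPLE_ARTICLES, ONTOLOGY).items():
        print(concept, sorted(ids))
```

Mapping synonyms to a single concept is what lets a search for one term find reports that use another; openly licensed full-text XML is what makes it possible to run this kind of indexing across a whole corpus without requesting permission first.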