Scott Edmunds' talk in the "Policies and Standards for Reproducible Research" session, "Revolutionizing Data Dissemination: GigaScience", at the Genomic Standards Consortium meeting in Shenzhen, 6th March 2012.
2. Now taking submissions…
Large-Scale Data
Journal/Database
In conjunction with:
Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Assistant Editor: Alexandra Basford, PhD
Lead Curator: Tam Sneddon D.Phil
www.gigasciencejournal.com
13. New incentives/credit
Credit where credit is overdue:
“One option would be to provide researchers who release data to
public repositories with a means of accreditation.”
“An ability to search the literature for all online papers that used a
particular data set would enable appropriate attribution for those
who share.”
Nature Biotechnology 27, 579 (2009)
Prepublication data sharing
(Toronto International Data Release Workshop)
“Data producers benefit from creating a citable reference, as it can
later be used to reflect impact of the data sets.”
Nature 461, 168-170 (2009)
14. Data citation: DataCite and DOIs
Aims to: “increase acceptance of research data as
legitimate, citable contributions to the
scholarly record”.
“data generated in the course of research
are just as valuable to the ongoing
academic discourse as papers and
monographs”.
15. For data citation to work, needs:
• Proven utility/potential user base.
• Acceptance/inclusion by journals.
• Data+Citation: inclusion in the references.
• Tracking by citation indexes.
• Usage of the metrics by the community…
17. BGI Datasets Get DOI®s
Many released pre-publication…
doi:10.5524/100004

Invertebrate:
Ant
- Florida carpenter ant
- Jerdon’s jumping ant
- Leaf-cutter ant
Roundworm
Silkworm

Vertebrates:
Giant panda
Macaque
- Chinese rhesus
- Crab-eating
Naked mole rat
Penguin
- Emperor penguin
- Adelie penguin
Pigeon, domestic
Polar bear
Sheep
Tibetan antelope

Plants:
Chinese cabbage
Cucumber
Foxtail millet
Pigeonpea
Potato
Sorghum

H:
Asian individual (YH)
- DNA Methylome
- Genome Assembly
- Transcriptome
Ancient DNA (coming soon)
- Saqqaq Eskimo
- Aboriginal Australian

Microbe:
E. coli O104:H4 TY-2482

Cell-Line:
Chinese Hamster Ovary
18. Our first DOI:
To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G;
Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S;
Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z;
Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and
the Escherichia coli O104:H4 TY-2482 isolate genome sequencing
consortium (2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI
Shenzhen. doi:10.5524/100001
http://dx.doi.org/10.5524/100001
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
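The citation on this slide follows an "authors (year) title. publisher. DOI" pattern, and the DOI resolves through the dx.doi.org address shown. As a rough illustration only, the Python sketch below assembles such a string and builds the resolver link. The function names `doi_to_url` and `format_dataset_citation` are hypothetical helpers invented for this example, and the author list is shortened to two names for brevity; none of this is GigaDB's or DataCite's own tooling.

```python
def doi_to_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Turn 'doi:10.5524/100001' (or a bare DOI) into a resolver URL."""
    bare = doi.strip()
    # Drop the optional 'doi:' prefix used in the printed citation.
    if bare.lower().startswith("doi:"):
        bare = bare[4:]
    return resolver + bare


def format_dataset_citation(authors, year, title, publisher, doi):
    """Join the parts in the order used on the slide (hypothetical helper)."""
    return f"{'; '.join(authors)} ({year}) {title}. {publisher}. {doi}"


# Author list shortened for illustration; the slide lists the full consortium.
citation = format_dataset_citation(
    ["Li, D", "Xi, F"],
    2011,
    "Genomic data from Escherichia coli O104:H4 isolate TY-2482",
    "BGI Shenzhen",
    "doi:10.5524/100001",
)
link = doi_to_url("doi:10.5524/100001")
```

Here `link` matches the address printed on the slide, so the same identifier serves both as the reference-list entry and as the route to the data.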
25. • Data submitted to NCBI databases:
- Raw data SRA:SRA046843
- Assemblies of 3 strains Genbank:AHAO00000000-AHAQ00000000
- SNPs dbSNP:1056306
- CNVs, InDels, SV dbVar:nstd63
• Submission to public databases complemented by
its citable form in GigaDB.
32. Data citation: tracking?
DataCite metadata in harvestable form (OAI-PMH)
Plans in 2012 to link central metadata repository with WoS
- Will finally track and credit use!
To be continued…
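To make "harvestable form (OAI-PMH)" concrete, here is a minimal Python sketch of what a harvester might do: build a `ListRecords` request and pull Dublin Core identifiers and titles out of the response. The `verb` and `metadataPrefix` parameters and the XML namespaces come from the OAI-PMH 2.0 protocol. The base URL, the helper names and the sample response are assumptions made up for this example; the sample is hand-written, not real DataCite output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL (base_url is a placeholder)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def extract_records(xml_text):
    """Return (identifier, title) pairs from an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        ident = record.findtext(".//dc:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((ident, title))
    return pairs


# Hand-written sample response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Genomic data from Escherichia coli O104:H4 isolate TY-2482</dc:title>
          <dc:identifier>doi:10.5524/100001</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://oai.example.org/oai")  # hypothetical endpoint
records = extract_records(SAMPLE)
```

A citation index could, in principle, match the harvested DOIs against reference lists, which is the kind of tracking this slide anticipates.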
33. Thanks to:
Laurie Goodman Alexandra Basford
Tam Sneddon Shaoguang Liang
Tin-Lap Lee (CUHK) Qiong Luo (HKUST)
Contact us: scott@gigasciencejournal.com
editorial@gigasciencejournal.com
Follow us: @gigascience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog/
www.gigasciencejournal.com
34. GSC13 special series
Seeking submissions highlighting best practice in
genomics research:
• Discussion/comment/white papers
• Cloud computing, software for data handling
• Research highlighting best practice
• Rapid review - rolling publication after launch issue
• High-visibility – published/promoted by BMC/GigaScience
• Article Processing Charge covered by BGI
• Hosting of any test datasets in GigaDB
Contact: editorial@gigasciencejournal.com
www.gigasciencejournal.com
Editor's Notes
Helps reproducibility, but there is some debate over how much it can help with scaling.
Raw data has been submitted to the SRA, the assembly to GenBank (no number), and SV data to dbVar (it’s the first plant data they’ve received). GigaDB complements the traditional public databases by holding all these “extra” data types in one place, in citable form.