www.gigasciencejournal.com
Overview
• Introduction
  – Genomics #101
  – Data-sharing issues
• Adventures in Data Citation
  – How it’s working…
  – Downstream consequences…
• Our Examples
  – My two RMB/what is still needed…
A brief history of genomics…
Human Genome Project: 1990-2003.
1 Genome = $3 Billion
   Source: http://www.genome.gov/Images/press_photos/highres/38-300.jpg
A brief history of genomics…
[Figure: sequencing costs over time; eras labelled 1st Gen, 2nd (next) Gen, 3rd (next-next) Gen?]
 Source: http://www.genome.gov/sequencingcosts/ (with apologies)
BGI Introduction

• Formerly known as Beijing Genomics Institute
• Founded in 1999; contributed 1% of the Human Genome Project (HGP)
• Not-for-profit research institute funded by
  commercial sequencing-as-a-service
• Now the largest genomic organization in the world
• Goals
  – Use genomics technology to have an impact on society
  – Make leading-edge genomics highly
    accessible to the global research community
Global, with HQ in Shenzhen
Global Sequencing Capacity
BGI Sequencing Capacity

Sequencers
137   Illumina/HiSeq 2000
27    LifeTech/SOLiD 4
1     454 GS FLX+
2     Illumina iScan
1     Illumina MiSeq
1     Ion Torrent

Data production: 5.6 Tb/day (> 1500X of human genome/day)
Multiple supercomputing centers: 157 TFLOPS, 20 TB memory, 14.7 PB storage
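As a rough consistency check on these figures, assuming "Tb" here means terabases and taking a haploid human genome as about 3.1 gigabases (neither assumption is stated on the slide):

```latex
\frac{5.6 \times 10^{12}\ \text{bases/day}}{3.1 \times 10^{9}\ \text{bases/genome}}
  \approx 1.8 \times 10^{3}\ \text{human-genome equivalents/day}
```

This is in line with the "> 1500X" claim. At an assumed 30X coverage per sample, it would correspond to roughly 60 human genomes sequenced per day.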
Goal – “Just sequence it.”
  M+M+M: Million Genome Projects
• Plant and Animal Genomes: G10K, i5K…
• Variation Genomes: 10K rice resequencing…
• Human Genomes: Ancient, Population, Medical
• Cell Genomes: cancer single cell
• Micro Ecosystems: MetaHIT, EMP, etc.
• Personal Genomes
BGI Goes Denmark
Genomics: the data-sharing success story?

Sharing/reproducibility helped by
stability of:
[Figure panels: 1st Gen, 2nd Gen]
1. Platforms
2. Repositories
3. Standards
Genomics Data Sharing Policies…
   Bermuda Accords 1996/1997/1998:
   1. Automatic release of sequence assemblies within 24 hours.
   2. Immediate publication of finished annotated sequences.
   3. Aim to make the entire sequence freely available in the public domain for
      both research and development in order to maximise benefits to society.

   Fort Lauderdale Agreement, 2003:
   1. Sequence traces from whole genome shotgun projects are to be
      deposited in a trace archive within one week of production.
   2. Whole genome assemblies are to be deposited in a public nucleotide
      sequence database as soon as possible after the assembled sequence
      has met a set of quality evaluation criteria.
   Toronto International Data Release Workshop, 2009:
    The goal was to reaffirm and refine, where needed, the policies related to
    the early release of genomic data, and to extend, if possible, similar data
    release policies to other types of large biological datasets – whether from
    proteomics, biobanking or metabolite research.
Challenges for the future…
  (A) Cumulative base pairs in INSDC over time, excluding the Trace Archive.
  (B) Base pairs in INSDC, broken down into selected data components.
Karsch-Mizrachi I et al. Nucl. Acids Res. 2012;40:D33-D37. Published by Oxford University Press 2011.
Challenges for the future…
1. Data Volumes (transfer, backlogs, funding issues)

2. Compliance

3. Lack of interoperability/sufficient metadata

4. Long tail of curation (“Democratization” of “big-data”)
New incentives/credit
Credit where credit is overdue:
“One option would be to provide researchers who release data to
public repositories with a means of accreditation.”
“An ability to search the literature for all online papers that used a
particular data set would enable appropriate attribution for those
who share.”
Nature Biotechnology 27, 579 (2009)

Prepublication data sharing
(Toronto International Data Release Workshop)
“Data producers benefit from creating a citable reference, as it can
later be used to reflect impact of the data sets.”
Nature 461, 168-170 (2009)
New incentives/credit
      = Data Citation?
         “increase acceptance of research data as
         legitimate, citable contributions to the
         scholarly record”.

         “data generated in the course of research
         are just as valuable to the ongoing
         academic discourse as papers and
         monographs”.
First issue next month…




      Large-Scale Data
      Journal/Database
    In conjunction with:


Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Assistant Editor: Alexandra Basford, PhD
Lead Curator: Tam Sneddon, DPhil
  www.gigasciencejournal.com
Associated Database




   www.gigaDB.org
Papers in the era of big-data
       goal: Executable Research Objects
[Figure label: Citable DOI]
Adventures in Data Citation




  doi:10.5524/100001
For data citation to work, it needs:

1. Proven utility/potential user base.

2. Acceptance/inclusion by journals.

3. Data+Citation: inclusion in the references.

4. Tracking by citation indexes.

5. Usage of the metrics by the community…
Data citation 1: utility/user base.
Establishment of data DOIs and use by databases:
                  Shackleton NJ, Hall MA, Vincent E (2001): Mean stable carbon isotope ratios
                  of Cibicidoides wuellerstorfi from sediment core MD95-2042 on the Iberian
                  margin, North Atlantic. PANGAEA - Data Publisher for Earth & Environmental
                  Science. http://doi.pangaea.de/10.1594/PANGAEA.58229
 Cited in:
 Pahnke K, Zahn R: Southern Hemisphere Water Mass Conversion Linked with North Atlantic
 Climate Variability. Science 2005, 307:1741 -1746.


                              Nocek B, Xu X, Savchenko A, Edwards A, Joachimiak A. 2007. PDB
                             ID: 2P06 Crystal structure of a predicted coding region AF_0060
                             from Archaeoglobus fulgidus DSM 4304. 10.2210/pdb2p06/pdb.

 Cited in:
 Andreeva A, Howorth D, Chandonia J-M, Brenner SE, Hubbard TJP, Chothia C, Murzin AG: Data
 growth and its impact on the SCOP database: new developments. Nucleic Acids Res.
 2008, 36:D419-425.
BGI Datasets Get DOI®s
Many released pre-publication…

Invertebrate
- Ant: Florida carpenter ant, Jerdon’s jumping ant, Leaf-cutter ant
- Roundworm
- Schistosoma
- Silkworm

Vertebrates
- Giant panda
- Macaque: Chinese rhesus, Crab-eating
- Mini-Pig
- Naked mole rat
- Penguin: Emperor penguin, Adelie penguin
- Pigeon, domestic
- Polar bear
- Sheep
- Tibetan antelope

Plants
- Chinese cabbage
- Cucumber
- Foxtail millet
- Pigeonpea
- Potato
- Sorghum

Human
- Asian individual (YH): DNA methylome, genome assembly, transcriptome
- Cancer (14TB)
- Ancient DNA: Saqqaq Eskimo, Aboriginal Australian

Microbe
- E. coli O104:H4 TY-2482

Cell-Line
- Chinese Hamster Ovary

doi:10.5524/100004
Our first DOI:


To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:

Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang,
J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J;
Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X;
Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the
Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium
(2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI
Shenzhen. doi:10.5524/100001
http://dx.doi.org/10.5524/100001
             To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
                 Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
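The requested citation follows the pattern DataCite recommends for datasets: Creator (PublicationYear): Title. Publisher. Identifier. A minimal Python sketch of generating that string from structured metadata is below; the `DatasetRecord` class and its field names are hypothetical (not the DataCite schema), optional elements such as version are left out, and `https://doi.org/` is used as the current form of the `dx.doi.org` link.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record; field names are illustrative only."""
    creators: list[str]   # each already formatted as "Family, Initials"
    year: int             # publication year
    title: str
    publisher: str
    doi: str              # bare DOI, e.g. "10.5524/100001"


def datacite_citation(rec: DatasetRecord) -> str:
    """Format as: Creator (PublicationYear): Title. Publisher. Identifier"""
    creators = "; ".join(rec.creators)
    title = rec.title.rstrip(".")  # avoid a doubled full stop after the title
    return f"{creators} ({rec.year}): {title}. {rec.publisher}. https://doi.org/{rec.doi}"


if __name__ == "__main__":
    # Author list shortened for the example
    rec = DatasetRecord(
        creators=["Li, D", "Xi, F", "Zhao, M"],
        year=2011,
        title="Genomic data from Escherichia coli O104:H4 isolate TY-2482",
        publisher="BGI Shenzhen",
        doi="10.5524/100001",
    )
    print(datacite_citation(rec))
```

Generating the string from metadata, rather than typing it by hand, keeps the author order and DOI identical wherever the dataset is cited.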
Downstream consequences:
1. Therapeutics (primers, antimicrobials)
2. Platform Comparisons (Loman et al., Nature Biotech 2012)
3. Speed/legal-freedom




“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli
strain that infected roughly 4,000 people in Germany between May and July. But he knew that it might take days
for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could
use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that
allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and
publish their work without wasting time on legal wrangling.”
Data Citation 2: acceptance by journals
Data+Citation 3: inclusion in the references
• Data submitted to NCBI databases:
-   Raw data                      SRA:SRA046843
-   Assemblies of 3 strains       GenBank:AHAO00000000-AHAQ00000000
-   SNPs                          dbSNP:1056306
-   CNVs, InDels, SV              dbVar:nstd63


• Submission to public databases complemented by
  its citable form in GigaDB (doi:10.5524/100012).
In the references…
Is the DOI…
And now in Nature Biotech…
And in more journals…

               Hodkinson BP, Uehling JK, Smith ME (2012) Data from: Lepidostroma
               vilgalysii, a new basidiolichen from the New World. Dryad Digital
               Repository. doi:10.5061/dryad.j1g5dh23
Cited in:
Hodkinson BP, Uehling JK, Smith ME: Lepidostroma vilgalysii, a new basidiolichen
from the New World. Mycological Progress 2012. Advance Online Publication.



                        Roberts SB (2012) Herring Hepatic Transcriptome 34300
                        contigs.fa. Figshare. Available:
                        hdl.handle.net/10779/084d34370fbda29bbc67b3c5ecb02575. Accessed 2012 Jan 20.
 Cited in:
 Roberts SB, Hauser L, Seeb LW, Seeb JE (2012) Development of Genomic Resources
 for Pacific Herring through Targeted Transcriptome Pyrosequencing. PLoS ONE 7(2):
 e30908. doi:10.1371/journal.pone.0030908
For data citation to work, it needs:

1. Proven utility/potential user base.   ✔
2. Acceptance/inclusion by journals.     ✔
3. Data+Citation: inclusion in the references.   ✔
4. Tracking by citation indexes.

5. Usage of the metrics by the community…
Data citation 4: tracking?
✗ FAIL
       DataCite metadata in harvestable form (OAI-PMH)

               - lists some DataCite DOIs, but says:

Datasets listed are the “result of approximations in the indexing
algorithms.”
“Google Scholar's intended coverage is for scholarly articles. At
this point, we don't include datasets.”

✗      Working on it.       Coming soon?       …the final challenge?
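Harvesting metadata over OAI-PMH is plain HTTP GET with a `verb` parameter, paging through large result sets with a `resumptionToken`. A standard-library Python sketch of the two pieces an indexer needs is below; the endpoint URL, sample response and identifiers are placeholders, and `oai_dc` (the Dublin Core format every OAI-PMH repository must support) stands in for whatever richer format a given repository exposes.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://oai.example.org/oai"  # hypothetical endpoint, not a real service


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if token:
        # resumptionToken is an exclusive argument: only verb may accompany it
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvesting by datestamp (YYYY-MM-DD)
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumptionToken or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # an empty resumptionToken element marks the last page of the list
    next_token = token_el.text if token_el is not None and token_el.text else None
    return ids, next_token


# Placeholder response page with made-up OAI identifiers
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:10.5524/100001</identifier></header></record>
    <record><header><identifier>oai:example.org:10.5524/100012</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2012-01-01"))
    print(parse_page(SAMPLE))
```

A harvester loops: fetch the first URL, parse it, and request the next page with the returned token until the token comes back empty.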
Data citation 5: metrics?
“As a result of diverse practices and tool
limitations, data citations are currently very
difficult to track.”
Data citation 5: metrics?
✗ FAIL
    Research Remix, 29th May 2012: http://researchremix.wordpress.com/2012/05/29/dear-research-data-advocate-please-sign-the-petition-oamonday/

I’m afraid we are making promises to data
creators about attribution and reward that we
can’t keep. “Make your data citeable!” is the cry.
Ok. So citeable is step one. Cited is step two. But
for the citation to be useful, it has to be indexed
so that citation metrics can be tracked and
admired and used.
Who is indexing data citations right now? As far
as I can tell: absolutely no one.
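At minimum, an index of data citations has to recognize dataset DOIs in reference lists and tally them. A rough Python sketch of that counting step is below. The DOI regular expression is an approximation (a variant of the widely used Crossref pattern, which catches most but not all DOIs), the `10.5524/` prefix filter is just GigaDB's prefix from these slides, and the sample references are abbreviated from ones shown here, with the last one invented for illustration.

```python
import re
from collections import Counter

# Approximate DOI pattern; catches most modern DOIs, not every legal one
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(reference: str) -> list[str]:
    """Find DOIs in one reference string; lowercase because DOIs are case-insensitive."""
    return [m.rstrip(".,;").lower() for m in DOI_PATTERN.findall(reference)]


def count_dataset_citations(references, prefix="10.5524/"):
    """Count how many references cite each DOI under a given prefix (once per reference)."""
    counts = Counter()
    for ref in references:
        for doi in set(extract_dois(ref)):
            if doi.startswith(prefix):
                counts[doi] += 1
    return counts


REFS = [
    "Li, D; Xi, F; Zhao, M (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001",
    "Zheng, L-Y (2011): Genome data from sweet and grain sorghum (Sorghum bicolor); GigaScience. http://dx.doi.org/10.5524/100012",
    "Roberts SB, Hauser L, Seeb LW, Seeb JE (2012) PLoS ONE 7(2): e30908. doi:10.1371/journal.pone.0030908",
    "Hypothetical citing reference: doi:10.5524/100001.",
]

if __name__ == "__main__":
    print(count_dataset_citations(REFS))
```

Real indexes also have to cope with DOIs split across lines, URL-encoded, or given only as repository accessions, which is part of why the slide reports nobody doing this yet.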
Where data citation is in 2012:
1. Proven utility/potential user base.   ✔
2. Acceptance/inclusion by journals.     ✔
3. Data+Citation: inclusion in the references.   ✔
4. Tracking by citation indexes.       ✗
5. Usage of the metrics by the community… ✗
Minor quibbles: export to citation managers

                       DCC/DataCite recommended format:
Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F; Jiang, S;
Ramachandran, S; Liu, C-M; Jing, H-C; (2011): Genome data from sweet and grain
sorghum (Sorghum bicolor); GigaScience. http://dx.doi.org/10.5524/100012

                  formatting:
Zheng, L-Y (2011). Genome data from sweet and grain sorghum (Sorghum bicolor).
GigaScience. Retrieved from http://dx.doi.org/10.5524/100012


       Mendeley formatting:
Zheng L-Y; Guo X-S; He B; Sun L-J; Peng Y; Dong S-S; Liu T-F; Jiang S;
Ramachandran S; Liu C-M; Jing H-C: Genome data from sweet and grain sorghum
(Sorghum bicolor). 2011.
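One way around hand-formatted exports is to let the DOI resolver return machine-readable metadata that citation managers import directly: doi.org supports content negotiation for DataCite and Crossref DOIs, so the same DOI can be requested as BibTeX, RIS or a formatted reference via the HTTP `Accept` header. A standard-library Python sketch is below; the media types are the ones documented for DOI content negotiation, the helper names are hypothetical, and how complete the returned record is depends on the metadata the registrant deposited.

```python
import urllib.request

RESOLVER = "https://doi.org/"

# Accept media types used for DOI content negotiation
FORMATS = {
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
    "apa": "text/x-bibliography; style=apa",
}


def citation_request(doi: str, fmt: str = "bibtex") -> urllib.request.Request:
    """Build (but do not send) a request for the DOI's metadata in one format."""
    bare = doi.removeprefix("doi:")  # accept both "doi:10.x/y" and bare "10.x/y"
    return urllib.request.Request(RESOLVER + bare, headers={"Accept": FORMATS[fmt]})


def fetch_citation(doi: str, fmt: str = "bibtex", timeout: float = 10.0) -> str:
    """Send the request (needs network access) and return the response body as text."""
    with urllib.request.urlopen(citation_request(doi, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    req = citation_request("doi:10.5524/100012", "ris")
    print(req.full_url, req.get_header("Accept"))
```

Because the author list comes from the registered metadata rather than from a scraped web page, a reference manager importing the RIS or BibTeX output should not need to re-split names the way the Mendeley example above does.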
Minor quibbles: clearer guidelines
     Rules for versioning/where do you set granularity?

Experiment (e.g. ACRG project)    e.g. doi:10.5524/100001                                  Papers
Datasets (e.g. cancer type)       e.g. doi:10.5524/100001-2                                Data/Micropubs
Sample (e.g. specimen xyz)        e.g. doi:10.5524/100001-2000 or doi:10.5524/100001_xyz
Smaller still?                    Facts/assertions (~10^13 in literature)                  Nanopubs
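If granularity is encoded in the DOI suffix as in the examples above, software can recover the parent experiment from a dataset- or sample-level DOI. A Python sketch of one such hypothetical two-level scheme is below; it handles only the numeric `-N` form (not the `_xyz` variant), and the class and function names are illustrative assumptions, not GigaDB's actual rules.

```python
from dataclasses import dataclass
from typing import Optional

PREFIX = "10.5524"  # GigaDB prefix used in the examples on this slide


@dataclass(frozen=True)
class DatasetDOI:
    """Hypothetical two-level scheme: experiment suffix plus optional numeric part."""
    experiment: str             # e.g. "100001"
    part: Optional[int] = None  # dataset or sample number within the experiment

    def __str__(self) -> str:
        suffix = self.experiment if self.part is None else f"{self.experiment}-{self.part}"
        return f"doi:{PREFIX}/{suffix}"

    def parent(self) -> "DatasetDOI":
        """The experiment-level DOI this part belongs to."""
        return DatasetDOI(self.experiment)


def parse_doi(doi: str) -> DatasetDOI:
    """Parse 'doi:10.5524/100001' or 'doi:10.5524/100001-2' under this scheme."""
    body = doi.removeprefix("doi:")
    prefix, _, suffix = body.partition("/")
    if prefix != PREFIX or not suffix:
        raise ValueError(f"not a {PREFIX} DOI: {doi}")
    experiment, sep, part = suffix.partition("-")
    return DatasetDOI(experiment, int(part) if sep else None)


if __name__ == "__main__":
    sample = parse_doi("doi:10.5524/100001-2000")
    print(sample, "->", sample.parent())
```

DOIs are generally treated as opaque strings, so an alternative is to keep suffixes meaningless and record the hierarchy in metadata instead, for example with DataCite's `relatedIdentifier` and the `IsPartOf`/`HasPart` relation types.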
Papers in the era of big-data
                            goal: Executable Research Objects

July 2012   Wilson GA, Dhami P, Feber A, Cortázar D, Suzuki Y, Schulz R, Schär P, Beck S:
            Resources for methylome analysis suitable for gene knockout studies of
            potential epigenome modifiers. GigaScience 2012, 1:3. (in press)
            GigaDB hosting all data + tools (84 GB total): doi:10.5524/100035
                                                    +
            Partial (~80%) integration of workflow into our data platform.
            (all the data processing steps, but not the enrichment analysis)

            Data in ISA-Tab compliant format



Next stage…        Papers fully integrating all data + all workflows in our platform.
Do you have interesting large-scale
            biological data sets?
   Submit to:
• Rapid review/Open Access/High-visibility
• Article Processing Charge covered by BGI
• Hosting of any test datasets/workflows in GigaDB

   Interested in Reproducible Research?
Take part in our session on: “Cloud and workflows for reproducible bioinformatics”
Thanks to:
Laurie Goodman       Alexandra Basford
Tam Sneddon          Shaoguang Liang
Tin-Lap Lee (CUHK)   Qiong Luo (HKUST)
                        scott@gigasciencejournal.com
Contact us:
                        editorial@gigasciencejournal.com



                          @gigascience

 Follow us:               facebook.com/GigaScience

                          blogs.openaccesscentral.com/blogs/gigablog/


            www.gigasciencejournal.com

Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"
 
GigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDBGigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDB
 
Scott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeScott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data deluge
 
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
 
Scott Edmunds A*STAR open access workshop: how licensing can change the way w...
Scott Edmunds A*STAR open access workshop: how licensing can change the way w...Scott Edmunds A*STAR open access workshop: how licensing can change the way w...
Scott Edmunds A*STAR open access workshop: how licensing can change the way w...
 
The Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics ResearchersThe Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics Researchers
 
Scott Edmunds: GigaScience Datacite meeting Rapid Fire Talk
Scott Edmunds: GigaScience Datacite meeting Rapid Fire TalkScott Edmunds: GigaScience Datacite meeting Rapid Fire Talk
Scott Edmunds: GigaScience Datacite meeting Rapid Fire Talk
 
GigaScience: a new resource for the big-data community.
GigaScience: a new resource for the big-data community.GigaScience: a new resource for the big-data community.
GigaScience: a new resource for the big-data community.
 
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p college
 
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
 
Sequencing Genomics: The New Big Data Driver
Sequencing Genomics:The New Big Data DriverSequencing Genomics:The New Big Data Driver
Sequencing Genomics: The New Big Data Driver
 
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
 
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
 
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
 
Genome sequencing and the development of our current information library
Genome sequencing and the development of our current information libraryGenome sequencing and the development of our current information library
Genome sequencing and the development of our current information library
 
Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis
Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysisTin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis
Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis
 
Shorthouse
ShorthouseShorthouse
Shorthouse
 

Mais de GigaScience, BGI Hong Kong

IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...GigaScience, BGI Hong Kong
 
Scott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteScott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteGigaScience, BGI Hong Kong
 
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...GigaScience, BGI Hong Kong
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...GigaScience, BGI Hong Kong
 
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...GigaScience, BGI Hong Kong
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...GigaScience, BGI Hong Kong
 
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...GigaScience, BGI Hong Kong
 
Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...GigaScience, BGI Hong Kong
 
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixRicardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixGigaScience, BGI Hong Kong
 
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserAnil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserGigaScience, BGI Hong Kong
 
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...GigaScience, BGI Hong Kong
 
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceVenice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceGigaScience, BGI Hong Kong
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...GigaScience, BGI Hong Kong
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...GigaScience, BGI Hong Kong
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveGigaScience, BGI Hong Kong
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...GigaScience, BGI Hong Kong
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...GigaScience, BGI Hong Kong
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...GigaScience, BGI Hong Kong
 
Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...
Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...
Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...GigaScience, BGI Hong Kong
 

Mais de GigaScience, BGI Hong Kong (20)

IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...
 
Scott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteScott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByte
 
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
 
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
 
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
 
Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...
 
Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10
 
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixRicardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
 
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserAnil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
 
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
 
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceVenice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
 
Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...
Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...
Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...
 

Último

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 

Último (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

Scott Edmunds at DataCite 2012: Adventures in Data Citation

  • 2. Overview
      – Introduction: Genomics #101; data-sharing issues
      – Adventures in Data Citation: how it's working…; downstream consequences…
      – Our examples: my two RMB/what is still needed…
  • 3. A brief history of genomics… Human Genome Project: 1990-2003. 1 genome = $3 billion. Source: http://www.genome.gov/Images/press_photos/highres/38-300.jpg
  • 4. A brief history of genomics… Source: http://www.genome.gov/sequencingcosts/ (with apologies)
  • 5. A brief history of genomics… [Figure annotations: 1st Gen; 2nd (next) Gen; 3rd (next-next) Gen?] Source: http://www.genome.gov/sequencingcosts/ (with apologies)
  • 6. A brief history of genomics… [Figure annotation: 3rd (next-next) Gen?] Source: http://www.genome.gov/sequencingcosts/ (with apologies)
  • 7. BGI Introduction
      – Formerly known as Beijing Genomics Institute
      – Founded in 1999 (contributed 1% of the HGP)
      – Not-for-profit research institute funded by commercial sequencing-as-a-service
      – Now the largest genomics organization in the world
      – Goals: use genomics technology to impact society; make leading-edge genomics highly accessible to the global research community
  • 8. Global, with HQ in Shenzhen
  • 9. Global, with HQ in Shenzhen
  • 10. Global Sequencing Capacity: data production of 5.6 Tb/day (> 1500X of a human genome per day); multiple supercomputing centers with 157 TFlops, 20 TB memory and 14.7 PB storage.
  • 11. BGI Sequencing Capacity
      – Sequencers: 137 Illumina HiSeq 2000; 27 LifeTech SOLiD 4; 1 454 GS FLX+; 2 Illumina iScan; 1 Illumina MiSeq; 1 Ion Torrent
      – Data production: 5.6 Tb/day (> 1500X of a human genome per day)
      – Computing: multiple supercomputing centers; 157 TFlops; 20 TB memory; 14.7 PB storage
  • 13. Goal: “Just sequence it.” M+M+M: Million Genome Projects
      – Plant and animal genomes: G10K, i5K...
      – Variation genomes: 10K rice resequencing...
      – Human genomes: ancient, population, medical
      – Cell genomes: cancer single cell
      – Micro-ecosystems: MetaHIT, EMP, etc.
      – Personal genomes
  • 16. Genomics: the data-sharing success story?
  • 17. Sharing/reproducibility helped by the stability of platforms, repositories and standards. [Figure annotations: 1st Gen; 2nd Gen]
  • 18. Genomics data-sharing policies…
      Bermuda Accords, 1996/1997/1998:
      1. Automatic release of sequence assemblies within 24 hours.
      2. Immediate publication of finished annotated sequences.
      3. Aim to make the entire sequence freely available in the public domain for both research and development in order to maximise benefits to society.
      Fort Lauderdale Agreement, 2003:
      1. Sequence traces from whole-genome shotgun projects are to be deposited in a trace archive within one week of production.
      2. Whole-genome assemblies are to be deposited in a public nucleotide sequence database as soon as possible after the assembled sequence has met a set of quality evaluation criteria.
      Toronto International Data Release Workshop, 2009: the goal was to reaffirm and refine, where needed, the policies related to the early release of genomic data, and to extend, if possible, similar data release policies to other types of large biological datasets, whether from proteomics, biobanking or metabolite research.
  • 19. Challenges for the future… (A) Cumulative base pairs in INSDC over time, excluding the Trace Archive. (B) Base pairs in INSDC, broken down into selected data components. Published by Oxford University Press 2011. Karsch-Mizrachi I et al. Nucl. Acids Res. 2012;40:D33-D37
  • 20. Challenges for the future… 1. Data Volumes (transfer, backlogs, funding issues) 2. Compliance 3. Lack of interoperability/sufficient metadata 4. Long tail of curation (“Democratization” of “big-data”)
  • 21. New incentives/credit
    Credit where credit is overdue, Nature Biotechnology 27, 579 (2009): “One option would be to provide researchers who release data to public repositories with a means of accreditation.” “An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.”
    Prepublication data sharing (Toronto International Data Release Workshop), Nature 461, 168-170 (2009): “Data producers benefit from creating a citable reference, as it can later be used to reflect impact of the data sets.”
  • 22. New incentives/credit = Data Citation? “increase acceptance of research data as legitimate, citable contributions to the scholarly record”. “data generated in the course of research are just as valuable to the ongoing academic discourse as papers and monographs”.
  • 23. First issue next month… Large-Scale Data Journal/Database
    In conjunction with:
    Editor-in-Chief: Laurie Goodman, PhD; Editor: Scott Edmunds, PhD; Assistant Editor: Alexandra Basford, PhD; Lead Curator: Tam Sneddon, D.Phil
    www.gigasciencejournal.com
  • 24. Associated Database www.gigaDB.org
  • 25. Papers in the era of big data. Goal: Executable Research Objects; Citable DOI
  • 26. Adventures in Data Citation doi:10.5524/100001
  • 27. For data citation to work, needs: 1. Proven utility/potential user base. 2. Acceptance/inclusion by journals. 3. Data+Citation: inclusion in the references. 4. Tracking by citation indexes. 5. Usage of the metrics by the community…
  • 28. Datacitation 1: utility/user base. Establishment of data DOIs and use by databases:
    – Shackleton NJ, Hall MA, Vincent E (2001): Mean stable carbon isotope ratios of Cibicidoides wuellerstorfi from sediment core MD95-2042 on the Iberian margin, North Atlantic. PANGAEA - Data Publisher for Earth & Environmental Science. http://doi.pangaea.de/10.1594/PANGAEA.58229. Cited in: Pahnke K, Zahn R: Southern Hemisphere Water Mass Conversion Linked with North Atlantic Climate Variability. Science 2005, 307:1741-1746.
    – Nocek B, Xu X, Savchenko A, Edwards A, Joachimiak A. 2007. PDB ID: 2P06 Crystal structure of a predicted coding region AF_0060 from Archaeoglobus fulgidus DSM 4304. 10.2210/pdb2p06/pdb. Cited in: Andreeva A, Howorth D, Chandonia J-M, Brenner SE, Hubbard TJP, Chothia C, Murzin AG: Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008, 36:D419-425.
  • 29. BGI Datasets Get DOI®s (many released pre-publication…)
    Invertebrate: Ant (Florida carpenter ant, Jerdon’s jumping ant, Leaf-cutter ant); Roundworm; Schistosoma; Silkworm
    Plants: Chinese cabbage; Cucumber; Foxtail millet; Pigeonpea; Potato; Sorghum
    Vertebrates: Giant panda; Macaque (Chinese rhesus, Crab-eating); Mini-Pig; Naked mole rat; Penguin (Emperor penguin, Adelie penguin); Pigeon, domestic; Polar bear; Sheep; Tibetan antelope
    Human: Asian individual (YH) (DNA Methylome, Genome Assembly, Transcriptome); Cancer (14TB); Ancient DNA (Saqqaq Eskimo, Aboriginal Australian)
    Microbe: E. coli O104:H4 TY-2482
    Cell-Line: Chinese Hamster Ovary
    e.g. doi:10.5524/100004
  • 30. Our first DOI: To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:
    Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001
    To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
  • 34. Downstream consequences:
    1. Therapeutics (primers, antimicrobials)
    2. Platform Comparisons (Loman et al., Nature Biotech 2012)
    3. Speed/legal-freedom: “Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he knew that it might take days for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and publish their work without wasting time on legal wrangling.”
  • 35. Data Citation 2: acceptance by journals
  • 37. Data+Citation 3: inclusion in the references
  • 38. Data submitted to NCBI databases:
    – Raw data: SRA:SRA046843
    – Assemblies of 3 strains: GenBank:AHAO00000000-AHAQ00000000
    – SNPs: dbSNP:1056306
    – CNVs, InDels, SV: dbVar:nstd63
    Submission to public databases is complemented by its citable form in GigaDB (doi:10.5524/100012).
  • 43. And now in Nature Biotech…
  • 44. And in more journals…
    Hodkinson BP, Uehling JK, Smith ME (2012) Data from: Lepidostroma vilgalysii, a new basidiolichen from the New World. Dryad Digital Repository. doi:10.5061/dryad.j1g5dh23. Cited in: Hodkinson BP, Uehling JK, Smith ME: Lepidostroma vilgalysii, a new basidiolichen from the New World. Mycological Progress 2012. Advance Online Publication.
    Roberts SB (2012) Herring Hepatic Transcriptome 34300 contigs.fa. Figshare. Available: hdl.handle.net/10779/084d34370fbda29bbc67b3c5ecb02575. Accessed 2012 Jan 20. Cited in: Roberts SB, Hauser L, Seeb LW, Seeb JE (2012) Development of Genomic Resources for Pacific Herring through Targeted Transcriptome Pyrosequencing. PLoS ONE 7(2): e30908. doi:10.1371/journal.pone.0030908
  • 45. For data citation to work, needs: 1. Proven utility/potential user base. ✔ 2. Acceptance/inclusion by journals. ✔ 3. Data+Citation: inclusion in the references. ✔ 4. Tracking by citation indexes. 5. Usage of the metrics by the community…
  • 47. Datacitation 4: tracking? ✗ FAIL
    DataCite metadata in harvestable form (OAI-PMH)
    – lists some DataCite DOIs, but says: Datasets listed are the “result of approximations in the indexing algorithms.”
    – “Google Scholar's intended coverage is for scholarly articles. At this point, we don't include datasets.”
  • 48. Datacitation 4: tracking? ✗ FAIL. DataCite metadata in harvestable form (OAI-PMH). ✗ Working on it. Coming soon? …the final challenge?
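To make "metadata in harvestable form (OAI-PMH)" concrete: a harvester issues `ListRecords` requests and follows `resumptionToken` paging. The sketch below builds such a request and extracts `dc:identifier` values from a response. The base URL is a placeholder, not any real service's endpoint, the sample XML is hand-written and trimmed, and network access is deliberately left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken is sent without other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the dc:identifier values and the resumptionToken (if any)."""
    root = ET.fromstring(xml_text)
    identifiers = []
    for record in root.iterfind(".//oai:record", NS):
        for el in record.iterfind(".//dc:identifier", NS):
            if el.text:
                identifiers.append(el.text.strip())
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return identifiers, token


# A hand-written, trimmed response used only for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>10.5524/100001</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://oai.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

The header's own `identifier` lives in the OAI namespace, so only the Dublin Core identifier (the dataset DOI) is collected; an empty token on the last page is returned as `None`.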
  • 50. Datacitation 5: metrics? “As a result of diverse practices and tool limitations, data citations are currently very difficult to track.”
  • 51. Datacitation 5: metrics? ✗ FAIL
    Research Remix, 29th May 2012: http://researchremix.wordpress.com/2012/05/29/dear-research-data-advocate-please-sign-the-petition-oamonday/
    I’m afraid we are making promises to data creators about attribution and reward that we can’t keep. “Make your data citeable!” is the cry. Ok. So citeable is step one. Cited is step two. But for the citation to be useful, it has to be indexed so that citation metrics can be tracked and admired and used. Who is indexing data citations right now? As far as I can tell: absolutely no one.
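In its simplest form, indexing data citations means finding dataset DOIs in reference lists and counting the citing references. This is a minimal sketch, assuming the reference strings are already available as text; the DOI pattern is deliberately loose (not the full DOI syntax), the GigaDB prefix is used only as an example filter, and the sample references are made up.

```python
import re
from collections import Counter
from typing import Iterable

# Loose DOI matcher: "10." + registrant code + "/" + suffix up to whitespace.
# An illustrative assumption, not the full DOI syntax.
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"<>]+')
TRAILING_PUNCT = ".,;)"


def count_dataset_citations(references: Iterable[str],
                            dataset_prefixes: Iterable[str] = ("10.5524/",)) -> Counter:
    """Count how many references cite each DOI under the given dataset prefixes."""
    prefixes = tuple(p.lower() for p in dataset_prefixes)
    counts: Counter = Counter()
    for ref in references:
        # A set, so a reference that repeats a DOI is counted only once.
        dois = {m.rstrip(TRAILING_PUNCT).lower() for m in DOI_RE.findall(ref)}
        for doi in dois:
            if doi.startswith(prefixes):
                counts[doi] += 1
    return counts


if __name__ == "__main__":
    # Made-up reference strings, for illustration only.
    refs = [
        "Paper A ... doi:10.5524/100001.",
        "Paper B ... http://dx.doi.org/10.5524/100012; doi:10.5524/100001",
        "Paper C ... doi:10.1371/journal.pone.0030908",
    ]
    print(count_dataset_citations(refs))
```

The hard part the slide points to is not this counting step but getting reference lists into one harvestable place so that someone runs it at scale.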
  • 52. Where data citation is in 2012: 1. Proven utility/potential user base. ✔ 2. Acceptance/inclusion by journals. ✔ 3. Data+Citation: inclusion in the references. ✔ 4. Tracking by citation indexes. ✗ 5. Usage of the metrics by the community… ✗
  • 53. Minor quibbles: export to citation managers
    DCC/DataCite recommended format: Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F; Jiang, S; Ramachandran, S; Liu, C-M; Jing, H-C; (2011): Genome data from sweet and grain sorghum (Sorghum bicolor); GigaScience. http://dx.doi.org/10.5524/100012
    formatting: Zheng, L-Y (2011). Genome data from sweet and grain sorghum (Sorghum bicolor). GigaScience. Retrieved from http://dx.doi.org/10.5524/100012
    Mendeley formatting: Zheng L-Y; Guo X-S; He B; Sun L-J; Peng Y; Dong S-S; Liu T-F; Jiang S; Ramachandran S; Liu C-M; Jing H-C: Genome data from sweet and grain sorghum (Sorghum bicolor). 2011.
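One way to sidestep mangled exports is to keep creators as a list and serialize per target format. The sketch below assumes a hypothetical `DatasetRecord` type; `format_citation` approximates the layout of the DCC/DataCite example above (without the semicolon before the year), and `to_ris` writes the RIS tagged format with one `AU` line per author. Choosing the RIS `DATA` reference type and the dx.doi.org resolver are assumptions; https://doi.org/ works equally well as a resolver prefix.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    creators: List[str]  # "Family, Initials" strings
    year: int
    title: str
    publisher: str
    doi: str             # "doi:10.5524/..." or a bare "10.5524/..."


def bare_doi(doi: str) -> str:
    """Strip an optional 'doi:' prefix."""
    return doi[4:] if doi.lower().startswith("doi:") else doi


def format_citation(rec: DatasetRecord, resolver: str = "http://dx.doi.org/") -> str:
    """Approximate the 'Creators (Year): Title; Publisher. URL' layout."""
    authors = "; ".join(rec.creators)
    return f"{authors} ({rec.year}): {rec.title}; {rec.publisher}. {resolver}{bare_doi(rec.doi)}"


def to_ris(rec: DatasetRecord) -> str:
    """Export as RIS, one AU line per creator so no author list must be re-split."""
    lines = ["TY  - DATA"]
    lines += [f"AU  - {name}" for name in rec.creators]
    lines += [
        f"PY  - {rec.year}",
        f"TI  - {rec.title}",
        f"PB  - {rec.publisher}",
        f"DO  - {bare_doi(rec.doi)}",
        "ER  - ",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    sorghum = DatasetRecord(
        creators=["Zheng, L-Y", "Guo, X-S", "He, B", "Sun, L-J", "Peng, Y",
                  "Dong, S-S", "Liu, T-F", "Jiang, S", "Ramachandran, S",
                  "Liu, C-M", "Jing, H-C"],
        year=2011,
        title="Genome data from sweet and grain sorghum (Sorghum bicolor)",
        publisher="GigaScience",
        doi="doi:10.5524/100012",
    )
    print(format_citation(sorghum))
    print(to_ris(sorghum))
```

Because each author is a separate `AU` record, a reference manager never has to guess where one name ends and the next begins, which is where the exports above go wrong.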
  • 54. Minor quibbles: clearer guidelines. Rules for versioning/where do you set granularity?
    – Experiment / Papers (e.g. ACRG project): e.g. doi:10.5524/100001
    – Data/Datasets / Micropubs (e.g. cancer type): e.g. doi:10.5524/100001-2
    – Sample (e.g. specimen xyz): e.g. doi:10.5524/100001-2000 or doi:10.5524/100001_xyz
    – Smaller still? Facts/Assertions (~10^13 in literature) / Nanopubs
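The granularity question surfaces as soon as identifiers like these are parsed. This is a minimal sketch, assuming a hypothetical grammar that covers only the example shapes listed above: nothing in `100001-2` versus `100001-2000` says whether the part is a dataset or a sample, so the parser can only return the pieces, and the level still has to come from a documented convention.

```python
import re
from typing import NamedTuple, Optional


class ParsedDOI(NamedTuple):
    prefix: str            # registrant prefix, e.g. "10.5524"
    base: str              # top-level record, e.g. "100001"
    part: Optional[str]    # numeric part after "-", e.g. "2" or "2000"
    label: Optional[str]   # named part after "_", e.g. "xyz"


# Hypothetical grammar covering only the example shapes on the slide.
_DOI_RE = re.compile(r"^(?:doi:)?(10\.\d+)/(\d+)(?:-(\d+))?(?:_(\w+))?$", re.IGNORECASE)


def parse_doi(doi: str) -> ParsedDOI:
    """Split a DOI of the assumed shape into prefix, base, numeric part and label."""
    m = _DOI_RE.match(doi.strip())
    if m is None:
        raise ValueError(f"unrecognised DOI shape: {doi!r}")
    return ParsedDOI(*m.groups())


if __name__ == "__main__":
    for d in ("doi:10.5524/100001", "doi:10.5524/100001-2",
              "doi:10.5524/100001-2000", "doi:10.5524/100001_xyz"):
        print(d, "->", parse_doi(d))
```

Both `-2` and `-2000` come back as a plain `part`; the syntax alone cannot distinguish a dataset from a sample, which is why explicit versioning and granularity rules are needed.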
  • 57. Papers in the era of big data. Goal: Executable Research Objects
    July 2012: Wilson GA, Dhami P, Feber A, Cortázar D, Suzuki Y, Schulz R, Schär P, Beck S: Resources for methylome analysis suitable for gene knockout studies of potential epigenome modifiers. GigaScience 2012, 1:3. (in press)
    – GigaDB hosting all data + tools (84GB total): doi:10.5524/100035
    – Partial (~80%) integration of workflow into our data platform (all the data processing steps, but not the enrichment analysis)
    – Data in ISA-Tab compliant format
    Next stage… Papers fully integrating all data + all workflows in our platform.
  • 58. Do you have interesting large-scale biological data sets? Submit to:
    – Rapid review/Open Access/High-visibility
    – Article Processing Charge covered by BGI
    – Hosting of any test datasets/workflows in GigaDB
    Interested in Reproducible Research? Take part in our session on: “Cloud and workflows for reproducible bioinformatics”
  • 59. Thanks to: Laurie Goodman, Alexandra Basford, Tam Sneddon, Shaoguang Liang, Tin-Lap Lee (CUHK), Qiong Luo (HKUST)
    Contact us: scott@gigasciencejournal.com, editorial@gigasciencejournal.com
    Follow us: @gigascience, facebook.com/GigaScience, blogs.openaccesscentral.com/blogs/gigablog/
    www.gigasciencejournal.com

Editor's Notes

  1. BGI (formerly known as Beijing Genomics Institute) was founded in 1999 and has since become the largest genomic organization in the world, with a focus on research and applications in healthcare, agriculture, conservation, and bio-energy. Our goal is to make leading-edge genomics highly accessible to the global research community by leveraging the industry’s best technology, economies of scale and expert bioinformatics resources. BGI Americas was established as an interface with customers and collaborators in North and South America.
  2. Our facilities feature Sanger and next-generation sequencing technologies, providing the highest-throughput sequencing capacity in the world. Powered by 137 Illumina HiSeq 2000 instruments and 27 Applied Biosystems SOLiD™ 4 Systems, we provide high-quality sequencing results with industry-leading turnaround time. As of December 2010, our sequencing capacity is 5 Tb of raw data per day, supported by several supercomputing centers with a total peak performance of up to 102 Tflops, 20 TB of memory, and 10 PB of storage. We provide stable and efficient resources to store and analyze the massive amounts of data generated by next-generation sequencing.
  3. Our facilities feature Sanger and next-generation sequencing technologies, providing the highest-throughput sequencing capacity in the world. Powered by 137 Illumina HiSeq 2000 instruments and 27 Applied Biosystems SOLiD™ 4 Systems, we provide high-quality sequencing results with industry-leading turnaround time. As of December 2010, our sequencing capacity is 5 Tb of raw data per day, supported by several supercomputing centers with a total peak performance of up to 102 Tflops, 20 TB of memory, and 15 PB of storage. We provide stable and efficient resources to store and analyze the massive amounts of data generated by next-generation sequencing. The LHC of Biology?
  4. Raw data has been submitted to the SRA, the assembly to GenBank (no number), and SV data to dbVar (it’s the first plant data they’ve received). This complements the traditional public databases by having all these “extra” data types in one place, and it’s citable.