Curation
Ewan Birney (tweetable)
Who am I?
• Associate Director at European Bioinformatics Institute (EBI)
• Involved in genomics since I was 19 (> 20 years!)
• Trained as a biochemist – most people think I am CS
• Analysed – sometimes led – human/mouse/rat/platypus etc genomes, ENCODE, others.
• EBI is in Hinxton, South Cambridgeshire
• EBI is part of EMBL, ~like CERN for molecular biology
Molecular Biology
• The study of how life works – at a molecular level

• Key molecules:
  • DNA – Information store (Disk)
  • RNA – Key information transformer, also does stuff (RAM)
  • Proteins – The business end of life (Chip, robotic arms)
  • Metabolites – Fuel and signalling molecules (electricity)
• We have theories of how these interact, but no theories to predict what
  they are
• Instead we determine attributes of molecules and store them in
  globally accessible, open, databases
Theory vs Observation
[Figure: fields placed on a spectrum from "Must directly observe" to "Can accurately predict from models": Molecular Biology; Geology, Astronomy; Climate modelling; High Energy Physics]
This ratio is not well correlated with data size


[Figure: Data Size (y-axis, ~5PB to ~60PB) plotted against Ratio of model predictability (x-axis), with points for High Energy Physics, Molecular Biology, Astronomy and Climate Models]
“Knowing stuff” is critical to biology…

• The bases of the human genome
  • … and the Mouse, Rat, Wheat, E. coli, Plasmodium, Cow…
• The functions of proteins
  • Enzymes, Transcription Factors, Signalling….
• The types of cells, their lineages and organ composition
  • …and all the molecular components in each cell
• Small molecules
  • … and their conversions, binding partners
• Structures of molecules, complexes and cells
  • … at atomic and higher resolution
Two fundamental types of information

• Experimental data
  • The result of a specific experiment
  • Often an experiment-specific, data-heavy part plus a “meta-data” part
  • Might be contradictory
  • “Primary paper”
• Consensus Knowledge
  • Integration of different strands of information on a topic
  • Realised as a computationally accessible scheme
  • “Review article”
Five types of curation
Experimental Data Entry

• IntAct – Protein:Protein
  interactions


• GWAS Catalog –
  extraction of summary
  statistics
Experimental Meta data capture

• Sample, CDS lines in
  ENA
• Sample in Metabolights,
  PRIDE etc
• Machine and analysis
  specification in PDB,
  PRIDE, ENA
Consensus integration of information

• GENCODE gene models in
  human
• Summaries and GO
  assignment in UniProt
• Pathway information in
  Reactome
• GO assignment and
  summaries in MODs (eg,
  PomBase, WormBase,
  PhytoPathDB etc)
Knowledge frameworks

•   The EC classification
•   Cell type ontologies
•   Cell lineages – Worms!
•   SNOMED, HPO etc
•   GO ontologies
Knowledge management

• Creation of rules
  representing ENA
  standards compliance
• Cross-ontology
  coordination (eg, EFO) or
  tying (GO to ChEBI)
• RuleBase / UniRule
  curation processes
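
A minimal sketch of what "rules representing standards compliance" could look like in code. This is not the actual ENA checklist or the UniRule system; the field names (`sample_alias`, `taxon_id`) and the helpers `require_field`, `numeric_field` and `validate` are hypothetical, chosen only to illustrate curated knowledge expressed as small, machine-checkable rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical record shape: a flat mapping of metadata field -> value.
Record = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    name: str
    # Returns an error message, or None when the record passes the rule.
    check: Callable[[Record], Optional[str]]


def require_field(field: str) -> Rule:
    """Rule: the field must be present and non-empty."""
    def check(record: Record) -> Optional[str]:
        if not record.get(field, "").strip():
            return f"missing required field '{field}'"
        return None
    return Rule(f"require:{field}", check)


def numeric_field(field: str) -> Rule:
    """Rule: if the field has a value, it must consist of digits only."""
    def check(record: Record) -> Optional[str]:
        value = record.get(field, "")
        if value and not value.isdigit():
            return f"field '{field}' must be numeric, got '{value}'"
        return None
    return Rule(f"numeric:{field}", check)


# Illustrative rule set (assumed, not taken from any real standard).
RULES: List[Rule] = [
    require_field("sample_alias"),
    require_field("taxon_id"),
    numeric_field("taxon_id"),
]


def validate(record: Record, rules: List[Rule] = RULES) -> List[str]:
    """Apply every rule and collect the messages of those that fail."""
    errors = []
    for rule in rules:
        message = rule.check(record)
        if message is not None:
            errors.append(f"{rule.name}: {message}")
    return errors
```

Calling `validate({"sample_alias": "s1", "taxon_id": "9606"})` returns an empty list, while a record with a blank alias and a non-numeric taxon reports one message per failed rule. Keeping each rule as data rather than burying it in a script is one way a curator's decision can be reviewed and reused.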
Data Entry vs Programming

[Figure: a spectrum from Direct Data Entry to Programmatic Data Entry, with improved data entry tools, “messy” scripting, and RuleBase / computationally accessible standards placed along it]
Thank You!
Curation Dilemma

• If you do your job well…
  • Everyone assumes it’s easy
  • People forget about the complexity
  • You are ignored
• If you do your job badly…
  • Everyone assumes it’s easy
  • People forget about the complexity
  • People complain
Why we need an infrastructure…
Infrastructures are critical…
But we only notice them when they go wrong
Biology already needs an information
infrastructure

• For the human genome
  • (…and the mouse, and the rat, and… x 150 now, 1000 in the
    future!) - Ensembl
• For the function of genes and proteins
  • For all genes, in text and computational form – UniProt and GO
• For all 3D structures
  • To understand how proteins work – PDBe
• For where things are expressed
  • The differences and functionality of cells - Atlas
…But this keeps on going…

• We have to scale across all of (interesting) life
  • There are a lot of species out there!
• We have to handle new areas, in particular medicine
  • A set of European haplotypes for good imputation
  • A set of actionable variants in germline and cancers
• We have to improve our chemical understanding
  • Of biological chemicals
  • Of chemicals which interfere with Biology
ELIXIR’s mission
To build a sustainable European infrastructure for biological
information, supporting life science research and its translation to:
• medicine
• environment
• bioindustries
• society
How?

Fully Centralised
Pros: Stability, reuse, learning ease
Cons: Hard to concentrate expertise across life science; geographic and language placement; bottlenecks and lack of diversity

Fully Distributed
Pros: Responsive; geographic and language responsive
Cons: Internal communication overhead; harder for end users to learn; harder to provide multi-decade stability
Research vs Healthcare
• International vs National
• EBI / Elixir vs Healthcare
• English vs National Language
• Low legalities vs Complex legalities
Other infrastructures needed for biology
• EuroBioImaging
  • Cellular and whole organism Imaging
• BioBanks (BBMRI)
  • We need numbers – European populations – in particular for rare
    diseases, but also for specific sub types of common disease
• Mouse models and phenotypes (Infrafrontier)
  • A baseline set of knockouts and phenotypes in our most tractable
    mammalian model
  • (it’s hard to prove something in human)
• Robust molecular assays in a clinical setting (EATRIS)
  • The ability to reliably use state of the art molecular techniques in a
    clinical research setting
(you can follow me on twitter @ewanbirney)
I blog and update this on Google Plus publicly

Ahmedabad Call Girls CG Road 🔝9907093804 Short 1500 💋 Night 6000
 
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
 
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original PhotosBook Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
 
Call Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknow
Call Girl Lucknow Mallika 7001305949 Independent Escort Service LucknowCall Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknow
Call Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknow
 
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service JaipurHigh Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
 
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
 
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service MumbaiVIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
 
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
 
Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024
 

Ewan Birney Biocuration 2013

  • 2. Who am I? • Associate Director at European Bioinformatics Institute (EBI) • Involved in genomics since I was 19 (> 20 years!) • Trained as a biochemist – most people think I am CS • Analysed – sometimes led – human/mouse/rat/platypus etc genomes, ENCODE, Others. • EBI is in Hinxton, South Cambridgeshire • EBI is part of EMBL, ~like CERN for molecular biology
  • 3. Molecular Biology • The study of how life works – at a molecular level • Key molecules: • DNA – Information store (Disk) • RNA – Key information transformer, also does stuff (RAM) • Proteins – The business end of life (Chip, robotic arms) • Metabolites – Fuel and signalling molecules (electricity) • Theories of how these interact – no theories to predict what they are • Instead we determine attributes of molecules and store them in globally accessible, open databases
  • 4. Theory vs Observation • A spectrum from “Must directly observe” to “Can accurately predict from models” • Fields shown along the axis: Molecular Biology; Geology, Astronomy; Climate modelling; High Energy Physics
  • 5. This ratio is not well correlated with data size • [Plot: data size (axis marked ~5PB and ~60PB) against ratio of model predictability, showing High Energy Physics, Molecular Biology, Astronomy and Climate Models]
  • 6. “Knowing stuff” is critical to biology… • The bases of the human genome • … and the Mouse, Rat, Wheat, Ecoli, Plasmodium, Cow…. • The functions of proteins • Enzymes, Transcription Factors, Signalling…. • The types of cells, their lineages and organ composition • …and all the molecular components in each cell • Small molecules • … and their conversions, binding partners • Structures of molecules, complexes and cells • … at atomic and higher resolution
  • 7. Two fundamental types of information • Experimental data • The result of a specific experiment • Often an experiment specific, data heavy part plus a “meta-data” part • Might be contradictory • “Primary paper” • Consensus Knowledge • Integration of different strands of information on a topic • Realised as a computationally accessible scheme • “Review article”
  • 8. Five types of curation
  • 9. Experimental Data Entry • IntAct – Protein:Protein interactions • GWAS Catalog – extraction of summary statistics
  • 10. Experimental Meta data capture • Sample, CDS lines in ENA • Sample in Metabolights, PRIDE etc • Machine and analysis specification in PDB, PRIDE, ENA
  • 11. Consensus integration of information • GENCODE gene models in human • Summaries and GO assignment in UniProt • Pathway information in Reactome • GO assignment and summaries in MODs (eg, PomBase, WormBase, PhytoPathDB etc)
  • 12. Knowledge frameworks • The EC classification • Cell type ontologies • Cell lineages – Worms! • SNOMED, HPO etc • GO ontologies
  • 13. Knowledge management • Creation of rules representing ENA standards compliance • Cross-ontology coordination (eg, EFO) or tying (GO and ChEBI) • RuleBase / UniRule curation processes
  • 14. Data Entry vs Programming • A spectrum from Direct Data Entry to Programmatic Data Entry • “Messy” Scripting • Improved Data entry tools • RuleBase, Computationally Accessible Standards
  • 16. Curation Dilemma • If you do your job well… • Everyone assumes it’s easy • People forget about the complexity • You are ignored • If you do your job badly… • Everyone assumes it’s easy • People forget about the complexity • People complain
  • 17. Why we need an infrastructure…
  • 19. But we only notice them when they go wrong
  • 20. Biology already needs an information infrastructure • For the human genome • (…and the mouse, and the rat, and… x 150 now, 1000 in the future!) - Ensembl • For the function of genes and proteins • For all genes, in text and computational – UniProt and GO • For all 3D structures • To understand how proteins work – PDBe • For where things are expressed • The differences and functionality of cells - Atlas
  • 21. ..But this keeps on going… • We have to scale across all of (interesting) life • There are a lot of species out there! • We have to handle new areas, in particular medicine • A set of European haplotypes for good imputation • A set of actionable variants in germline and cancers • We have to improve our chemical understanding • Of biological chemicals • Of chemicals which interfere with Biology
  • 22. ELIXIR’s mission • To build a sustainable European infrastructure for biological information, supporting life science research and its translation to: • medicine • environment • bioindustries • society
  • 23. How? • Fully Centralised • Pros: Stability, reuse, Learning ease • Cons: Hard to concentrate expertise across life science • Geographic, language placement • Bottlenecks and lack of diversity • Fully Distributed • Pros: Responsive, Geographic, Language responsive • Cons: Internal communication overhead • Harder for end users to learn • Harder to provide multi-decade stability
  • 24. Research vs Healthcare • Research: International • EBI / Elixir • English • Low legalities • Healthcare: National • Healthcare • National Language • Complex legalities
  • 25. Other infrastructures needed for biology • EuroBioImaging • Cellular and whole organism Imaging • BioBanks (BBMRI) • We need numbers – European populations – in particular for rare diseases, but also for specific sub types of common disease • Mouse models and phenotypes (Infrafrontier) • A baseline set of knockouts and phenotypes in our most tractable mammalian model • (it’s hard to prove something in human) • Robust molecular assays in a clinical setting (EATRIS) • The ability to reliably use state of the art molecular techniques in a clinical research setting
  • 26. (you can follow me on twitter @ewanbirney) I blog and update this on Google Plus publicly