An online game for improving human phenotype prediction
Benjamin M Good, Salvatore Loguercio, Andrew I Su
The Scripps Research Institute, La Jolla, California, USA

ABSTRACT

An important goal for biomedical research is to produce genetic and genomic predictors for human phenotypes such as disease prognosis or drug response. To this end, we can now quantify an extremely large number of potential biomarkers for any biological sample. In fact, a single sample could reasonably be described by millions of molecular variations in DNA, RNA, proteins, and metabolites. However, the actual number of samples processed typically remains small in comparison. As a result, attempts to use this data to build predictors often face problems of overfitting. (While a predictive pattern may describe training data very well, it may not reproduce well on other datasets.)

It has recently been shown that biological knowledge in the form of gene annotations and pathway databases can be used to guide the process of inferring phenotype predictors [1-3]. While promising, such methods are limited by the amount, quality and problem-specific applicability of the structured knowledge that is available.

Following in the line of games that have recently demonstrated success as a means of ‘crowdsourcing’ difficult biological problems [4,5], we are developing games with the purpose of improving human phenotype predictions. Our games work on two levels: (1) games such as Dizeez and GenESP collect novel gene annotations and (2) games like Combo engage players directly in the process of predictor inference.

Play game prototypes at: http://www.genegames.org
(Also see Poster I03)

Challenge

[Figure: find patterns in cancer and normal samples, then make predictions on new samples (cancer or normal)]

• With tens of thousands of measurements but only hundreds of samples, many possible patterns are found.
• But which ones are real?

Motivation

• Using prior biological knowledge, it is possible to identify stronger, more consistent predictive patterns.
• Prior knowledge encoded in protein-protein interaction databases [1,2] and pathway databases [3] has been used to improve phenotype prediction.
• What about knowledge that is not recorded in structured databases?

[Figure: Network Guided Forest from Dutkowski et al (2011)]

Opportunity

• Online games are successfully tapping into the knowledge and reasoning abilities of thousands of people.
• COMBO is designed to motivate and enable people to help improve phenotype predictors.

[Figure: examples of online games: Label all images on the Web; Devise protein folding algorithms; Design RNA molecules; Fix multiple sequence alignments]
[Figure: select predictive gene sets]

Combo: feature selection with community intelligence

• Goal: pick the best set of genes
• Best: the gene set that produces the best decision tree classifier
• Classifier: created using training data and selected genes, used to predict phenotype (e.g. breast cancer prognosis)

[Figure: A game board; A hand; Score: 78 (percent correct); Inferred decision tree with leaves Phenotype 1 and Phenotype 2]

Game Score: determined by estimating performance of trees constructed using the selected features on training data.

Feature sets from many individual games used to create a Decision Tree Forest classifier. (Each tree votes once.)

Human Guided Forest

Ensemble classifier where components are decision trees constructed using manually selected subsets of features. Adaptation of Network Guided and Random Forests [1].

REFERENCES

1. Dutkowski and Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology
2. Winter et al (2012) Google Goes Cancer: Improving Outcome Prediction for Cancer Patients by Network-Based Ranking of Marker Genes. PLoS Computational Biology
3. Liu et al (2012) Identifying dysregulated pathways in cancers from pathway interaction networks. BMC Bioinformatics
4. Good and Su (2011) Games with a Scientific Purpose. Genome Biology
5. Kawrykow et al (2012) Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment. PLoS One

CONTACT

Benjamin Good: bgood@scripps.edu
Salvatore Loguercio: loguerci@scripps.edu
Andrew Su: asu@scripps.edu

FUNDING

We acknowledge support from the National Institute of General Medical Sciences (GM089820 and GM083924) and the NIH through the FaceBase Consortium for a particular emphasis on craniofacial genes (DE-20057).
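
Illustrative sketch (not from the poster)

The Combo game score and the Human Guided Forest vote described in the poster could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the poster does not say which tree learner or performance estimate is used, so a small Gini-based tree and 3-fold cross-validation are assumed here. The gene names, expression values, and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Grow a small decision tree that may only split on `features`."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: predicted phenotype
    best = None  # (weighted impurity, feature, threshold)
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no feature has two distinct values
        return majority
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], features, depth + 1, max_depth),
            build_tree([r for r, _ in hi], [y for _, y in hi], features, depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return its phenotype label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def game_score(rows, labels, hand, folds=3):
    """Percent correct for trees built on the player's hand (assumed: k-fold CV)."""
    correct = 0
    for k in range(folds):
        train = [i for i in range(len(rows)) if i % folds != k]
        test = [i for i in range(len(rows)) if i % folds == k]
        tree = build_tree([rows[i] for i in train], [labels[i] for i in train], hand)
        correct += sum(predict(tree, rows[i]) == labels[i] for i in test)
    return round(100 * correct / len(rows))

def forest_predict(trees, row):
    """Human Guided Forest style vote: each tree votes once, majority wins."""
    return Counter(predict(t, row) for t in trees).most_common(1)[0][0]

# Hypothetical training data: expression of three made-up genes per sample.
rows = [
    {"geneA": 5.0, "geneB": 1.0, "geneC": 3.0},
    {"geneA": 4.5, "geneB": 2.0, "geneC": 1.0},
    {"geneA": 4.8, "geneB": 1.5, "geneC": 2.0},
    {"geneA": 1.0, "geneB": 1.2, "geneC": 2.5},
    {"geneA": 1.5, "geneB": 2.2, "geneC": 1.5},
    {"geneA": 0.8, "geneB": 1.8, "geneC": 3.5},
]
labels = ["cancer", "cancer", "cancer", "normal", "normal", "normal"]

# Hands (gene sets) picked by three hypothetical players in separate games.
hands = [["geneA"], ["geneA", "geneB"], ["geneB", "geneC"]]
forest = [build_tree(rows, labels, hand) for hand in hands]
new_sample = {"geneA": 4.9, "geneB": 1.1, "geneC": 2.2}
print(game_score(rows, labels, ["geneA"]), forest_predict(forest, new_sample))
```

In this sketch each player's hand becomes exactly one tree, which mirrors the poster's note that each tree votes once. A real dataset would need a proper learner and a held-out test set to check for the overfitting that the abstract warns about.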
