1 of 33
GeneGames.org
  The Gene Wiki: Crowdsourcing
     human gene annotation
                Andrew Su, Ph.D.
                    @andrewsu
                  asu@scripps.edu
                   http://sulab.org

          Genome Informatics

          September 6, 2012
2
The Gene Wiki crib sheet
                                                   http://www.slideshare.net/andrewsu

   • Bulk creation of ~10k Wikipedia articles
     (http://dx.doi.org/10.1371/journal.pbio.0060175)
   • Monthly stats: > 4 million views, > 1000 edits
     (http://dx.doi.org/10.1093/nar/gkr925)
   • Text mining reveals novel Gene Ontology and Disease
     Ontology annotations
     (http://dx.doi.org/doi:10.1186/1471-2164-12-603)
   • Mash-up with SNPedia for crowdsourced gene-disease
     database (http://www.jbiomedsem.com/content/3/S1/S6)
   • Merging Wikipedia with the Semantic Web
     (http://dx.doi.org/10.1093/database/bar060)
3



Seven million human hours




                            http://www.flickr.com/photos/archana3k1/4124330493/
4



Twenty million human hours




                             http://www.flickr.com/photos/ableman/2171326385/
5
    150 billion human hours
              per year




                              http://www.flickr.com/photos/rvp-cw/6243289302/
6
Using games to fold proteins



        Fold.it players have successfully:
        • Outperformed state of the art protein
          folding algorithms (Cooper, Nature, 2010)
        • Solved a previously-intractable crystal
          structure (Khatib, Nat Struct Mol Biol, 2011)
        • Designed an improved protein folding
          algorithm (Khatib, PNAS, 2011)
        • Improved enzyme activity of de novo
          designed enzyme (Eiben, Nat Biotechnol, 2011)

                         http://fold.it
7
Using games to fold RNAs




              http://eterna.cmu.edu/
8
Using games to align sequences




              http://phylo.cs.mcgill.ca
9
Using games to annotate genes?




              http://genegames.org
10
No good gene-disease annotation database
             Query: Apolipoprotein E




            Alzheimer's disease (AD)
            Lipoprotein glomerulopathy
            Sea-blue histiocyte disease
11
No good gene-disease annotation database
             Query: Apolipoprotein E




            Alzheimer's disease (AD)
            Lipoprotein glomerulopathy
            Sea-blue histiocyte disease
            Hyperlipoproteinemia, type III
            Macular degeneration, age-related
            Myocardial infarction susceptibility
12
No good gene-disease annotation database
              Query: Apolipoprotein E




           ? Alzheimer's disease (AD)
           ? Lipoprotein glomerulopathy
           ? Sea-blue histiocyte disease
             Hyperlipoproteinemia, type III
           ? Macular degeneration, age-related
           ? Myocardial infarction susceptibility
             HIV
             Psoriasis
             Vascular Diseases
13
No good gene-disease annotation database
             Query: Apolipoprotein E




            Alzheimer's disease (AD)    Memory
                                        Coronary Artery Disease
            Neuropsychological Tests    Hypertension
            Cognition Disorders         Mental Status Schedule
                                        Psychiatric Status Rating
            Dementia                        Scales
            Cognition                   Hyperlipidemias
                                        Atrophy
            Disease Progression         Dementia, Vascular
            Cardiovascular Diseases     Parkinson Disease
                                        Brain Injuries
            Coronary Disease            Myocardial Infarction
            Diabetes Mellitus, Type 2   …

            Memory Disorders            477 diseases!
14
Play Dizeez to annotate gene-disease links
   1. Read the clue (gene)
   2. Click the related disease (only one is “right”)
   3. If it’s ‘right’, you get points
   4. Then on to the next question…
   5. Hurry!
   6. Play to win!
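
A minimal sketch of one Dizeez round as described on this slide, not the actual game code. The function names, the number of choices (5), and the points per correct answer (10) are assumptions made for illustration; the slide only says that one option is "right" and that a right answer earns points.

```python
import random

def make_question(gene, known_disease, all_diseases, n_choices=5, rng=random):
    """Build one multiple-choice question: the one 'right' disease plus distractors."""
    distractors = [d for d in all_diseases if d != known_disease]
    choices = rng.sample(distractors, n_choices - 1) + [known_disease]
    rng.shuffle(choices)
    return {"gene": gene, "right": known_disease, "choices": choices}

def score_guess(question, guess, points=10):
    """Return (points awarded, logged record). Every guess is logged, right or not."""
    is_right = guess == question["right"]
    record = (question["gene"], guess, is_right)
    return (points if is_right else 0), record
```

Logging every guess, including the ones scored as wrong, is the design choice that matters here: the logged (gene, disease) pairs are the raw material for the annotation counts.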
15
Dizeez players seem pretty smart…

  In total (since Dec 2011):
  • 207 unique gamers
  • 1045 games played
  • 8525 guesses

# Occurrences   Gene    Disease              Pubmed   OMIM   PharmGKB   Gene Wiki

      7         GAST    gastrinoma
      7         RBP3    retinoblastoma
      7         SSX1    synovial sarcoma
      6         TG      Graves' disease
      6         CRYGC   Cataract
      6         SOX8    mental retardation
      6         WRN     Werner syndrome
      6         ABL1    leukemia
      6         MLL3    leukemia
      6         SNAI2   breast carcinoma
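
The "# Occurrences" column can be read as a count of how often players chose the same gene-disease pair. A minimal sketch of that tally, assuming the game log is a list of (gene, disease) tuples; the function names and the `min_count` cutoff are hypothetical, and the example data below are toy values, not the real Dizeez log.

```python
from collections import Counter

def count_pairs(guesses):
    """Count how often each (gene, disease) pair was chosen across all games."""
    return Counter((gene, disease) for gene, disease in guesses)

def recurring_pairs(guesses, min_count=3):
    """Pairs chosen at least min_count times, most frequent first."""
    counts = count_pairs(guesses)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]

# Toy log (illustrative counts only)
toy_log = (
    [("GAST", "gastrinoma")] * 3
    + [("WRN", "Werner syndrome")] * 2
    + [("ABL1", "leukemia")]
)
top = recurring_pairs(toy_log, min_count=2)
```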
16
Dizeez players seem pretty smart…

  In total (since Dec 2011):
  • 207 unique gamers
  • 1045 games played
  • 8525 guesses

# Occurrences   Gene    Disease                  Pubmed   OMIM   PharmGKB   Gene Wiki

      5         MECOM   sarcoma
      4         ATF7    cancer
      3         ABCB5   acute myeloid leukemia
      3         SART1   glioblastoma
      3         NCK1    leukemia
      3         NEK1    cancer
17
Using games to predict phenotype from genotype?




                                  The Cure




               http://genegames.org
18
Classification problems in genome biology

                                                   Classify new
   cancer                    normal                  samples


                                      find patterns
                                                                  cancer
   100,000s features




                                                                  normal
                                          SVM
                                         Neural
                                        networks
                                          Naïve
                                          Bayes
                                          KNN
                                           …
                       100s samples
19
Random forests
                                      Sample subset
                                       of cases and   Train decision
  cancer                     normal       features         tree
   100,000s features




                       100s samples
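
The two steps on this slide (sample a subset of cases and features, then train a decision tree) can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the analysis code behind the talk: each "tree" is a depth-1 stump to keep the example short, and `n_trees`, `n_features`, and all function names are made up for the sketch. New samples are classified by majority vote over the trees.

```python
import random
from collections import Counter

def train_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the fewest errors among `features`."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority label
        maj = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]

def train_forest(rows, labels, n_trees=25, n_features=2, rng=random):
    """Train n_trees stumps, each on a bootstrap of cases and a random feature subset."""
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # sample cases with replacement
        feats = rng.sample(range(len(rows[0])), n_features)   # sample a subset of features
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Classify a new sample by majority vote over all trees."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]

# Toy data: 6 samples x 3 features (real data would have 100s x 100,000s)
rows = [[5, 6, 7], [6, 5, 8], [7, 7, 6], [1, 2, 1], [2, 1, 2], [0, 2, 1]]
labels = ["cancer"] * 3 + ["normal"] * 3
forest = train_forest(rows, labels, rng=random.Random(0))
```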
20
Random forests


  cancer                     normal
   100,000s features




                       100s samples
21
Random forests

                                                         Classify new
  cancer                     normal                        samples



                                                                        cancer
   100,000s features




                                                                        normal




                                      How to interject
                                        biological
                       100s samples    knowledge?
22
Network-guided forests




                         Dutkowski & Ideker (2011). PLoS Computational Biology
23
Network-guided forests
                                          Sample
                                      features by PPI   Train decision
  cancer                     normal       network            tree
   100,000s features




                       100s samples
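
One way to picture "sample features by PPI network": instead of drawing features uniformly at random, start from a random seed gene and collect its network neighbors, so each tree sees a connected set of genes. The sketch below is only an illustration of that idea and is not the published Dutkowski & Ideker algorithm; the breadth-first walk, the function name, and the toy network (genes A, B, C, X, Y) are all assumptions.

```python
import random
from collections import deque

def sample_features_by_network(ppi, k, rng=random):
    """Pick a random seed gene, then collect up to k genes by a
    breadth-first walk over its interaction partners."""
    seed = rng.choice(sorted(ppi))
    picked, queue, seen = [], deque([seed]), {seed}
    while queue and len(picked) < k:
        gene = queue.popleft()
        picked.append(gene)
        neighbors = sorted(ppi.get(gene, ()))
        rng.shuffle(neighbors)  # visit partners in random order
        for nb in neighbors:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return picked

# Toy network: gene -> set of interaction partners (hypothetical)
toy_ppi = {
    "A": {"B", "C"}, "B": {"A"}, "C": {"A"},
    "X": {"Y"}, "Y": {"X"},
}
features = sample_features_by_network(toy_ppi, k=2, rng=random.Random(1))
```

The returned genes would replace the uniformly sampled feature subset when training each tree.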
24
Human-guided forests
                                        Sample
                                      features by    Train decision
  cancer                     normal      human            tree
                                      intelligence
   100,000s features




                       100s samples
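
A sketch of what "sample features by human intelligence" could look like, assuming the game produces a per-gene count of how often players chose that gene. That data format, the `prior` smoothing term, and the function name are assumptions for illustration; the slides do not specify how player choices are turned into feature samples.

```python
import random

def sample_features_by_votes(votes, k, rng=random, prior=1.0):
    """Draw k distinct features, each with probability proportional
    to its player vote count plus a small prior."""
    pool = {f: v + prior for f, v in votes.items()}
    picked = []
    while pool and len(picked) < k:
        feats = list(pool)
        choice = rng.choices(feats, weights=[pool[f] for f in feats], k=1)[0]
        picked.append(choice)
        del pool[choice]  # sample without replacement
    return picked

# Toy vote counts (hypothetical gene labels)
toy_votes = {"g1": 50, "g2": 0, "g3": 0, "g4": 0}
subset = sample_features_by_votes(toy_votes, k=3, rng=random.Random(0))
```

The prior keeps genes that no player picked from being excluded entirely, so the forest still explores features outside what players already know.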
25
The Cure: Genomic predictors for disease
26
The Cure: Genomic predictors for disease
27
The Cure: Genomic predictors for disease
28
The Cure: Genomic predictors for disease
29
The Cure: Genomic predictors for disease
30
The Cure: Genomic predictors for disease
31
Human-guided forests

                       Classify new
                         samples



                                      cancer
                                      normal
32
“Critical Assessment”-style challenge




      Will this work? Check our blog after October 15.
33
Collaborators
   Doug Howe, ZFIN
   John Hogenesch, U Penn
   Jon Huss, GNF
   Luca de Alfaro, UCSC
   Angel Pizzaro, U Penn
   Faramarz Valafar, SDSU
   Pierre Lindenbaum, Fondation Jean Dausset
   Michael Martone, Rush
   Konrad Koehler, Karo Bio
   Warren Kibbe, Simon Lim, Northwestern
   Many Wikipedia editors
   WP:MCB Project

Group members
   Ben Good
   Max Nanis
   Salvatore Loguercio
   Chunlei Wu
   Ian Macleod

Contact
   http://sulab.org
   asu@scripps.edu
   @andrewsu
   +Andrew Su
   @genegame

Recruiting graduate students in quantitative biology! See
http://education.scripps.edu/

Funding and Support
   (BioGPS: GM83924, Gene Wiki: GM089820)

UiPath Community: Communication Mining from Zero to Hero
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)

  • 1. GeneGames.org The Gene Wiki: Crowdsourcing human gene annotation Andrew Su, Ph.D. @andrewsu asu@scripps.edu http://sulab.org Genome Informatics September 6, 2012
  • 2. The Gene Wiki crib sheet http://www.slideshare.net/andrewsu • Bulk creation of ~10k Wikipedia articles (http://dx.doi.org/10.1371/journal.pbio.0060175) • Monthly stats: > 4 million views, > 1000 edits (http://dx.doi.org/10.1093/nar/gkr925) • Text mining reveals novel Gene Ontology and Disease Ontology annotations (http://dx.doi.org/doi:10.1186/1471-2164-12-603) • Mash-up with SNPedia for crowdsourced gene-disease database (http://www.jbiomedsem.com/content/3/S1/S6) • Merging Wikipedia with the Semantic Web (http://dx.doi.org/10.1093/database/bar060)
  • 3. Seven million human hours http://www.flickr.com/photos/archana3k1/4124330493/
  • 4. Twenty million human hours http://www.flickr.com/photos/ableman/2171326385/
  • 5. 150 billion human hours per year http://www.flickr.com/photos/rvp-cw/6243289302/
  • 6. Using games to fold proteins Fold.it players have successfully: • Outperformed state-of-the-art protein folding algorithms (Cooper, Nature, 2010) • Solved a previously intractable crystal structure (Khatib, Nat Struct Mol Biol, 2011) • Designed an improved protein folding algorithm (Khatib, PNAS, 2011) • Improved enzyme activity of de novo designed enzyme (Eiben, Nat Biotechnol, 2011) http://fold.it
  • 7. Using games to fold RNAs http://eterna.cmu.edu/
  • 8. Using games to align sequences http://phylo.cs.mcgill.ca
  • 9. Using games to annotate genes? http://genegames.org
  • 10. No good gene-disease annotation database Query: Apolipoprotein E. Results: Alzheimer's disease (AD); Lipoprotein glomerulopathy; Sea-blue histiocyte disease
  • 11. No good gene-disease annotation database Query: Apolipoprotein E. Results: Alzheimer's disease (AD); Lipoprotein glomerulopathy; Sea-blue histiocyte disease; Hyperlipoproteinemia, type III; Macular degeneration, age-related; Myocardial infarction susceptibility
  • 12. No good gene-disease annotation database Query: Apolipoprotein E. Results: ? Alzheimer's disease (AD); ? Lipoprotein glomerulopathy; ? Sea-blue histiocyte disease; Hyperlipoproteinemia, type III; ? Macular degeneration, age-related; ? Myocardial infarction susceptibility; HIV; Psoriasis; Vascular Diseases
  • 13. No good gene-disease annotation database Query: Apolipoprotein E. Results: Alzheimer's disease (AD); Memory; Coronary Artery Disease; Neuropsychological Tests; Hypertension; Cognition Disorders; Mental Status Schedule; Psychiatric Status Rating Scales; Dementia; Cognition; Hyperlipidemias; Atrophy; Disease Progression; Dementia, Vascular; Cardiovascular Diseases; Parkinson Disease; Brain Injuries; Coronary Disease; Myocardial Infarction; Diabetes Mellitus, Type 2; …; Memory Disorders. 477 diseases!
  • 14. Play Dizeez to annotate gene-disease links 1. Read the clue (gene) 2. Click the related disease (only one is “right”) 3. If it's “right”, you get points 4. Then on to the next question… 5. Hurry! 6. Play to win!
  • 15. Dizeez players seem pretty smart… In total (since Dec 2011): • 207 unique gamers • 1045 games played • 8525 guesses. Table columns: # Occurrences, Gene, Disease, Pubmed, OMIM, PharmGKB, Gene Wiki. Rows (# Occurrences, Gene, Disease): 7, GAST, gastrinoma; 7, RBP3, retinoblastoma; 7, SSX1, synovial sarcoma; 6, TG, Graves' disease; 6, CRYGC, Cataract; 6, SOX8, mental retardation; 6, WRN, Werner syndrome; 6, ABL1, leukemia; 6, MLL3, leukemia; 6, SNAI2, breast carcinoma
  • 16. Dizeez players seem pretty smart… In total (since Dec 2011): • 207 unique gamers • 1045 games played • 8525 guesses. Table columns: # Occurrences, Gene, Disease, Pubmed, OMIM, PharmGKB, Gene Wiki. Rows (# Occurrences, Gene, Disease): 5, MECOM, sarcoma; 4, ATF7, cancer; 3, ABCB5, acute myeloid leukemia; 3, SART1, glioblastoma; 3, NCK1, leukemia; 3, NEK1, cancer
  • 17. Using games to predict phenotype from genotype? The Cure http://genegames.org
  • 18. Classification problems in genome biology. Data: 100s of samples (cancer vs. normal) with 100,000s of features. Find patterns (SVM, neural networks, Naïve Bayes, KNN, …), then classify new samples as cancer or normal.
  • 19. Random forests: sample a subset of cases and features, then train a decision tree (cancer vs. normal; 100,000s of features, 100s of samples)
  • 20. Random forests (cancer vs. normal; 100,000s of features, 100s of samples)
  • 21. Random forests: classify new samples as cancer or normal (100,000s of features, 100s of samples). How to interject biological knowledge?
  • 22. Network-guided forests: Dutkowski & Ideker (2011), PLoS Computational Biology
  • 23. Network-guided forests: sample features by PPI network, then train a decision tree (cancer vs. normal; 100,000s of features, 100s of samples)
  • 24. Human-guided forests: sample features by human intelligence, then train a decision tree (cancer vs. normal; 100,000s of features, 100s of samples)
  • 25. The Cure: Genomic predictors for disease
  • 26. The Cure: Genomic predictors for disease
  • 27. The Cure: Genomic predictors for disease
  • 28. The Cure: Genomic predictors for disease
  • 29. The Cure: Genomic predictors for disease
  • 30. The Cure: Genomic predictors for disease
  • 31. Human-guided forests: classify new samples as cancer or normal
  • 32. “Critical Assessment”-style challenge. Will this work? Check our blog after October 15.
  • 33. Collaborators: Doug Howe, ZFIN; John Hogenesch, U Penn; Jon Huss, GNF; Luca de Alfaro, UCSC; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; Many Wikipedia editors; WP:MCB Project. Group members: Ben Good, Max Nanis, Salvatore Loguercio, Chunlei Wu, Ian Macleod. Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su, @genegame. Recruiting graduate students in quantitative biology! See http://education.scripps.edu/. Funding and Support (BioGPS: GM83924, Gene Wiki: GM089820)
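
The Dizeez tallies on slides 15 and 16 rank gene-disease pairs by how often players asserted them. A minimal sketch of that kind of aggregation follows. The log format (one `(player, gene, disease)` tuple per guess), the function name `tally_guesses`, the `min_votes` cutoff, and the choice to count each player at most once per pair are all assumptions made for illustration; the slides do not describe how the game computes "# Occurrences".

```python
from collections import Counter

def tally_guesses(guesses, min_votes=2):
    """Rank (gene, disease) pairs by the number of distinct players asserting them.

    guesses: iterable of (player, gene, disease) tuples (hypothetical log format).
    min_votes: hypothetical cutoff; pairs with fewer distinct players are dropped.
    """
    # Assumption: a player repeating the same guess counts only once.
    distinct = set(guesses)
    counts = Counter((gene, disease) for _player, gene, disease in distinct)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(pair, n) for pair, n in ranked if n >= min_votes]

# Toy log using gene-disease pairs named on slide 15.
example_log = [
    ("p1", "GAST", "gastrinoma"),
    ("p2", "GAST", "gastrinoma"),
    ("p1", "GAST", "gastrinoma"),  # repeat by the same player
    ("p3", "WRN", "Werner syndrome"),
]
consensus = tally_guesses(example_log)
```

Pairs that clear the cutoff could then be checked against existing resources such as the Pubmed, OMIM, PharmGKB, and Gene Wiki columns shown on the slides.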

Editor's Notes

  1. Empire State Building
  2. One of the seven wonders of the modern world
  3. Except for a bit of personal pleasure, that expended effort has no societal value. Over the last ~decade, “serious games” have attempted to harness this resource: training and education, health and fitness.
  4. Question: how to interject biological knowledge in the feature selection process?
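
One way to read the "human-guided forests" idea is as a random forest in which the features offered to each tree are drawn with probabilities set by prior knowledge instead of uniformly. The sketch below illustrates only that sampling step, under several assumptions: the per-feature `weights` stand in for scores that might be derived from game play, one-level decision stumps are swapped in for full decision trees to keep the example short, and every function name is hypothetical. It is not the implementation behind The Cure.

```python
import random

def weighted_feature_sample(weights, k, rng):
    """Draw up to k distinct feature indices with probability proportional to weight."""
    pool = {i: w for i, w in enumerate(weights) if w > 0}
    chosen = []
    for _ in range(min(k, len(pool))):
        idxs = list(pool)
        pick = rng.choices(idxs, weights=[pool[i] for i in idxs], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen

def train_stump(X, y, features):
    """Return the (feature, threshold, label_if_above) with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for hi in (1, 0):
                errs = sum((hi if row[f] > t else 1 - hi) != label
                           for row, label in zip(X, y))
                if best is None or errs < best[0]:
                    best = (errs, f, t, hi)
    return best[1:]

def train_forest(X, y, weights, n_trees=25, k=2, seed=0):
    """Each stump sees a bootstrap sample of cases and a weighted sample of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in X]  # bootstrap the cases
        feats = weighted_feature_sample(weights, k, rng)
        forest.append(train_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest

def predict(forest, row):
    """Majority vote over stumps; 1 = cancer, 0 = normal in this toy encoding."""
    votes = sum(hi if row[f] > t else 1 - hi for f, t, hi in forest)
    return int(2 * votes > len(forest))

# Toy data: feature 0 separates the classes; features 1 and 2 are noise.
X = [[0.1, 5, 3], [0.2, 1, 9], [0.9, 4, 2], [0.8, 2, 8]]
y = [0, 0, 1, 1]
forest = train_forest(X, y, weights=[1.0, 0.0, 0.0])
```

With all of the weight on feature 0, every stump is restricted to that feature. Uniform weights would recover ordinary random feature sampling, and weights derived from a protein-protein interaction network would correspond to the network-guided variant.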