GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
1. GeneGames.org
The Gene Wiki: Crowdsourcing
human gene annotation
Andrew Su, Ph.D.
@andrewsu
asu@scripps.edu
http://sulab.org
Genome Informatics
September 6, 2012
2. The Gene Wiki crib sheet
http://www.slideshare.net/andrewsu
• Bulk creation of ~10k Wikipedia articles (http://dx.doi.org/10.1371/journal.pbio.0060175)
• Monthly stats: > 4 million views, > 1000 edits (http://dx.doi.org/10.1093/nar/gkr925)
• Text mining reveals novel Gene Ontology and Disease Ontology annotations (http://dx.doi.org/doi:10.1186/1471-2164-12-603)
• Mash-up with SNPedia for crowdsourced gene-disease database (http://www.jbiomedsem.com/content/3/S1/S6)
• Merging Wikipedia with the Semantic Web (http://dx.doi.org/10.1093/database/bar060)
5. 150 billion human hours per year
http://www.flickr.com/photos/rvp-cw/6243289302/
6. Using games to fold proteins
Fold.it players have successfully:
• Outperformed state-of-the-art protein folding algorithms (Cooper, Nature, 2010)
• Solved a previously intractable crystal structure (Khatib, Nat Struct Mol Biol, 2011)
• Designed an improved protein folding algorithm (Khatib, PNAS, 2011)
• Improved enzyme activity of a de novo designed enzyme (Eiben, Nat Biotechnol, 2011)
http://fold.it
10. No good gene-disease annotation database
Query: Apolipoprotein E
Alzheimer's disease (AD)
Lipoprotein glomerulopathy
Sea-blue histiocyte disease
11. No good gene-disease annotation database
Query: Apolipoprotein E
Alzheimer's disease (AD)
Lipoprotein glomerulopathy
Sea-blue histiocyte disease
Hyperlipoproteinemia, type III
Macular degeneration, age-related
Myocardial infarction susceptibility
12. No good gene-disease annotation database
Query: Apolipoprotein E
? Alzheimer's disease (AD)
? Lipoprotein glomerulopathy
? Sea-blue histiocyte disease
Hyperlipoproteinemia, type III
? Macular degeneration, age-related
? Myocardial infarction susceptibility
HIV
Psoriasis
Vascular Diseases
13. No good gene-disease annotation database
Query: Apolipoprotein E
Alzheimer's disease (AD)
Memory
Coronary Artery Disease
Neuropsychological Tests
Hypertension
Cognition Disorders
Mental Status Schedule
Psychiatric Status Rating Scales
Dementia
Cognition
Hyperlipidemias
Atrophy
Disease Progression
Dementia, Vascular
Cardiovascular Diseases
Parkinson Disease
Brain Injuries
Coronary Disease
Myocardial Infarction
Diabetes Mellitus, Type 2
Memory Disorders
…
477 diseases!
14. Play Dizeez to annotate gene-disease links
1. Read the clue (gene)
2. Click the related disease (only one is "right")
3. If it's 'right', you get points
4. Then on to the next question…
5. Hurry!
6. Play to win!
15. Dizeez players seem pretty smart…
In total (since Dec 2011):
• 207 unique gamers
• 1045 games played
• 8525 guesses
# Occurrences Gene Disease Pubmed OMIM PharmGKB Gene Wiki
7 GAST gastrinoma
7 RBP3 retinoblastoma
7 SSX1 synovial sarcoma
6 TG Graves' disease
6 CRYGC Cataract
6 SOX8 mental retardation
6 WRN Werner syndrome
6 ABL1 leukemia
6 MLL3 leukemia
6 SNAI2 breast carcinoma
16. Dizeez players seem pretty smart…
In total (since Dec 2011):
• 207 unique gamers
• 1045 games played
• 8525 guesses
# Occurrences Gene Disease Pubmed OMIM PharmGKB Gene Wiki
5 MECOM sarcoma
4 ATF7 cancer
3 ABCB5 acute myeloid leukemia
3 SART1 glioblastoma
3 NCK1 leukemia
3 NEK1 cancer
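One way to read the "# Occurrences" column is as a tally of how often players linked a given gene to a given disease. The sketch below illustrates that reading only; the guess log, the `tally` helper, and the `min_count` cutoff are hypothetical and are not taken from the Dizeez code or data.

```python
from collections import Counter

# Hypothetical guess log: one (gene, disease) pair per player click.
# These counts are made up for illustration, not the figures on the slide.
guesses = [
    ("GAST", "gastrinoma"),
    ("WRN", "Werner syndrome"),
    ("GAST", "gastrinoma"),
    ("ABL1", "leukemia"),
    ("WRN", "Werner syndrome"),
    ("GAST", "gastrinoma"),
]

def tally(guesses, min_count=2):
    """Count identical gene-disease guesses and keep the repeated ones."""
    counts = Counter(guesses)
    return [
        (n, gene, disease)
        for (gene, disease), n in counts.most_common()
        if n >= min_count
    ]

for n, gene, disease in tally(guesses):
    print(n, gene, disease)
```

Pairs that many independent players agree on can then be checked against existing resources such as the ones named in the table header.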
17. Using games to predict phenotype from genotype?
The Cure
http://genegames.org
18. Classification problems in genome biology
[Figure: a data matrix of 100s of samples (cancer, normal) by 100,000s of features. Find patterns, then classify new samples.]
Methods: SVM, neural networks, Naïve Bayes, KNN, …
19. Random forests
[Figure: a data matrix of 100s of samples (cancer, normal) by 100,000s of features.]
Sample a subset of cases and features, then train a decision tree.
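The sampling step on this slide can be sketched in a few lines. This is a toy illustration, not the talk's implementation: each "tree" is simplified to a one-split decision stump, and the data, function names, and parameters (`n_trees`, `n_features`, `seed`) are all assumptions made for the example.

```python
import random
from collections import Counter

def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def train_stump(rows, labels, feature_ids):
    """Depth-1 tree: pick the (feature, threshold) split with fewest errors."""
    default = majority(labels, None)
    best = None  # (errors, feature, threshold, low_label, high_label)
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            low = [y for row, y in zip(rows, labels) if row[f] <= t]
            high = [y for row, y in zip(rows, labels) if row[f] > t]
            low_label = majority(low, default)
            high_label = majority(high, default)
            errors = (sum(y != low_label for y in low)
                      + sum(y != high_label for y in high))
            if best is None or errors < best[0]:
                best = (errors, f, t, low_label, high_label)
    return best[1:]

def train_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Each tree sees a bootstrap sample of cases and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        cases = [rng.randrange(n) for _ in range(n)]         # sample cases
        feats = rng.sample(range(len(rows[0])), n_features)  # sample features
        forest.append(train_stump([rows[i] for i in cases],
                                  [labels[i] for i in cases], feats))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(low if row[f] <= t else high for f, t, low, high in forest)
    return votes.most_common(1)[0][0]

# Made-up toy data: 6 samples x 4 features (real data: 100s x 100,000s).
rows = [
    [0.9, 0.2, 0.8, 0.5], [0.8, 0.7, 0.9, 0.1], [0.7, 0.4, 0.7, 0.9],
    [0.1, 0.3, 0.2, 0.4], [0.2, 0.8, 0.1, 0.6], [0.3, 0.5, 0.3, 0.2],
]
labels = ["cancer", "cancer", "cancer", "normal", "normal", "normal"]

forest = train_forest(rows, labels)
print(predict(forest, [0.85, 0.5, 0.75, 0.5]))
```

Restricting each tree to a random feature subset is the step where the feature selection happens, so that is the natural place to bias the sampling if outside knowledge about the features is available.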
33.
Collaborators
Doug Howe, ZFIN
John Hogenesch, U Penn
Jon Huss, GNF
Luca de Alfaro, UCSC
Angel Pizzaro, U Penn
Faramarz Valafar, SDSU
Pierre Lindenbaum, Fondation Jean Dausset
Michael Martone, Rush
Konrad Koehler, Karo Bio
Warren Kibbe, Simon Lim, Northwestern
Many Wikipedia editors
WP:MCB Project
Group members
Ben Good
Max Nanis
Salvatore Loguercio
Chunlei Wu
Ian Macleod
Contact
http://sulab.org
asu@scripps.edu
@andrewsu
+Andrew Su
@genegame
Recruiting graduate students in quantitative biology! See http://education.scripps.edu/
Funding and Support
(BioGPS: GM83924, Gene Wiki: GM089820)
Editor's Notes
Empire state building
One of the seven wonders of the modern world
Except for a bit of personal pleasure, that expended effort has no societal value.
Over the last ~decade, "serious games" have attempted to harness this resource:
• Training and education
• Health and fitness
Question: how to interject biological knowledge in the feature selection process?