ASHG poster - Games for gene annotation and phenotype classification
Andrew I. Su, Salvatore Loguercio, Benjamin M. Good
Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA
ABSTRACT
The Empire State Building was built with 7 million hours of human effort. The Panama Canal took 20 million hours to complete. By comparison, it is estimated that up to 150 billion hours are spent playing games every year (9 billion on Solitaire alone). People obviously play games because they are enjoyable and fun. Aside from that enjoyment, however, games largely yield no tangible benefit, either to the individual or to society at large.

Recently, several groups have built “games with a purpose”, a class of games that focuses on collaboratively harnessing gamers for productive ends. In biology, games have been built to fold proteins and RNAs and to perform multiple sequence alignment. Here, we present our efforts to apply games to two critical challenges in genetics.

First, we have built games focused on organizing and structuring gene annotations. With the increasing popularity of genome-scale science, many analysis strategies (including gene set enrichment, pathway analysis, and cross-species comparisons) depend on comprehensive and accurate gene annotations. These structured annotations are mostly the result of centralized manual curation efforts, but such initiatives do not scale well with the explosive growth of the biomedical literature. We describe several games that target working biologists to extract their expert domain knowledge in computable form.

Second, we describe a game for predicting human phenotypes from molecular descriptors. Researchers can now relatively easily characterize any biological sample according to a number of features, including genotype, gene expression, and epigenetics. A key challenge in the field is identifying exactly which of those molecular features can be used to predict a clinical phenotype such as disease susceptibility or adverse drug events. While statistical classifiers have been applied to this challenge, they typically do not incorporate prior biological knowledge, and they often fail to replicate in external test populations. Here, we present results from “The Cure”, a game to help identify biomarker gene sets that can be used to improve predictions of breast cancer prognosis based on gene expression.

Play these games now at: http://genegames.org

Game 3: The Cure

The Challenge
[Figure: find patterns in labeled cancer and normal samples, then make predictions on new samples]
• With tens of thousands of measurements but only hundreds of samples, many possible patterns are found.
• But which ones are real?
• Prior knowledge encoded in databases has been used to improve classifiers by guiding the search for predictive gene sets [3].
• What about knowledge that is not recorded in structured databases?
• The Cure is designed to motivate and enable people to help improve the feature selection step for predictor inference.
http://genegames.org/cure/

The Game
[Screenshot annotations: gene info provided from Gene Ontology and GeneRIFs; the search box highlights genes with an annotation match]
• Goal: pick the best set of genes.
• Best: the gene set that produces the best decision tree classifier of breast cancer prognosis.
• Classifier: created using the training data and the selected genes; used to predict the phenotype.
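The Cure's scoring idea (a decision tree trained only on the genes in a player's hand, scored by cross-validated accuracy) can be illustrated with a small, dependency-free sketch. This is a hypothetical illustration under stated assumptions, not the game's actual implementation: the poster does not specify the tree-learning algorithm, so the Gini-impurity splits, the depth limit of 2, the 5-fold split, the helper names (`score_hand`, `build_tree`, `predict`), and the toy gene names `GENE_A` and `GENE_B` are all assumptions.

```python
import random
from collections import Counter

# Hypothetical sketch, not The Cure's actual code: each sample is a dict
# mapping gene name -> expression value, and each label is a prognosis class.


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, genes):
    """Return the (gene, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for gene in genes:
        values = sorted({row[gene] for row in rows})
        # Candidate thresholds sit halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[gene] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[gene] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (gene, threshold), score
    return best


def build_tree(rows, labels, genes, depth):
    """Grow a depth-limited tree; leaves hold the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels, genes) if depth > 0 else None
    if split is None:
        return majority
    gene, threshold = split
    left = [i for i, row in enumerate(rows) if row[gene] <= threshold]
    right = [i for i, row in enumerate(rows) if row[gene] > threshold]
    return (gene, threshold,
            build_tree([rows[i] for i in left], [labels[i] for i in left], genes, depth - 1),
            build_tree([rows[i] for i in right], [labels[i] for i in right], genes, depth - 1))


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        gene, threshold, left, right = tree
        tree = left if row[gene] <= threshold else right
    return tree


def score_hand(rows, labels, hand, k=5, depth=2, seed=0):
    """Mean k-fold cross-validated accuracy of a tree restricted to `hand`."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        tree = build_tree([rows[i] for i in train], [labels[i] for i in train], hand, depth)
        correct = sum(predict(tree, rows[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Toy data: GENE_A separates the classes with one threshold; GENE_B does not.
    rows = [{"GENE_A": float(i), "GENE_B": float((i * 7) % 10)} for i in range(10)]
    labels = ["good" if i < 5 else "poor" for i in range(10)]
    print(score_hand(rows, labels, ["GENE_A"]), score_hand(rows, labels, ["GENE_B"]))
```

The tree is hand-rolled only to keep the example self-contained; in practice a library decision tree and cross-validation utility would usually be used instead. Restricting `build_tree` to the genes in `hand` is what ties a player's choices to the score.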
• Score: cross-validation performance of the decision tree using the selected genes and the training data.
[Screenshot annotations: your current ‘hand’ (a round ends at 5 cards); decision trees are built automatically using the genes in the player’s hand]

RESULTS
• 214 players registered (125 in the 1st week); 40% have a PhD.
• 3,954 games played in 47 days.
• The predictor scored 69% correct on the Sage Breast Cancer Prognosis Challenge test set [4] (the best of all submitted predictors scored 72%).
• Awaiting results on an external validation set.
[Figure: genes selected at highest frequency; clinical data (age, etc.)]

Game 1: Dizeez
• Purpose: identify new gene-disease links.
• Rules:
  • Select a biological area (e.g. ‘cancer’) to start the game.
  • Given a gene, guess the related disease.
  • Points are awarded for correct guesses within one minute.
  • ‘Correct’ answers are drawn from text mining.
• Data:
  • When several different players suggest the same ‘incorrect’ gene-disease link, we detect a new candidate gene annotation.

Dizeez Results
• Time frame: 2 months
• Unique players: 230
• Games played: 1,045
• Guesses collected: 8,525
• Unique gene-disease pairs: 6,941
• Guesses that match existing annotation: 4,804 (69%)
• For 14 novel gene-disease pairs guessed by >3 players, 9 (64%) were validated by a literature search.
• Player consensus correlates with probability of validation [1].

Game 2: GenESP
Guess what genes your partner is thinking about when they see ‘neuroblastoma’.
• Direct reward for consensus formation
• Multiplayer
• Open-ended
• Tested pattern [2]
• Work in progress

REFERENCES
1. Salvatore Loguercio, Benjamin M. Good, Andrew I. Su (2012) Dizeez: an online game for human gene-disease annotation. In: Bio-Ontologies SIG, ISMB: 15 July 2011, Vienna. http://bio-ontologies.knowledgeblog.org/438
2. Luis von Ahn and Laura Dabbish (2004) Labeling images with a computer game. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
3. Janusz Dutkowski and Trey Ideker (2011) Protein networks as logic functions in development and cancer. PLoS Computational Biology.
4. Sage Bionetworks: DREAM7 Breast Cancer Prognosis Challenge. http://www.the-dream-project.org/challenges/sage-bionetworks-dream-breast-cancer-prognosis-challenge

Contact and Acknowledgements
Benjamin Good: bgood@scripps.edu, @bgood; Andrew Su: asu@scripps.edu, @andrew.su
We acknowledge support from the National Institute of General Medical Sciences (GM089820 and GM083924) and from the NIH through the FaceBase Consortium, with a particular emphasis on craniofacial genes (DE-20057).
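The Dizeez candidate-detection rule (several different players independently proposing the same gene-disease link that is absent from existing annotations) can be sketched as a simple count of distinct players per pair. This is a hypothetical sketch, not the game's code: the poster does not define how many players count as “several”, so the cutoff of more than three distinct players is borrowed from the validation result (pairs guessed by >3 players), and the function name `candidate_annotations`, the `(player, gene, disease)` tuple layout, and the ranking by player count are all assumptions.

```python
from collections import defaultdict


def candidate_annotations(guesses, known, min_players=3):
    """Hypothetical Dizeez-style detector.

    guesses: iterable of (player_id, gene, disease) tuples.
    known:   set of (gene, disease) pairs already in existing annotations.
    Returns [((gene, disease), n_players), ...] for pairs absent from `known`
    that were guessed by more than `min_players` distinct players, ranked by
    how many players agreed (assumed here to reflect consensus strength).
    """
    players_by_pair = defaultdict(set)
    for player, gene, disease in guesses:
        pair = (gene, disease)
        if pair not in known:  # only "incorrect" guesses can become new candidates
            players_by_pair[pair].add(player)  # a set, so repeat guesses count once
    supported = [
        (pair, len(players))
        for pair, players in players_by_pair.items()
        if len(players) > min_players
    ]
    # Highest agreement first; ties broken alphabetically for stable output.
    return sorted(supported, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy = [(p, "GENE_X", "disease_1") for p in ("p1", "p2", "p3", "p4")]
    toy += [("p1", "GENE_Y", "disease_2")] * 5  # one player repeating a guess
    print(candidate_annotations(toy, known=set()))
    # → [(('GENE_X', 'disease_1'), 4)]
```

Counting distinct players rather than raw guesses keeps a single enthusiastic player from manufacturing a candidate on their own, which is the point of requiring agreement among several players.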