SlideShare uma empresa Scribd logo
1 de 40
Baixar para ler offline
@yannick__ http://wurmlab.github.io
Opportunities to
overcome human &
computational challenges
in biology.
2015-03-25-CW15
This changes
everything.454
Illumina
Solid...
Any lab can
sequence
anything!
• Biology/life is complex
• Field is young.
• Biologists lack computational training.
• Generally, analysis tools suck.
• badly written
• badly tested
• hard to install
• output quality… often questionable.
• Understanding/visualizing/massaging data is hard.
• Datasets continue to grow!
Genomics is hard.
Four tools
suck less.
Four tools that
Four tools
suck less.
(hopefully)
Four tools that
1. SequenceServer
“Can you BLAST this for me?”
BLAST is the most commonly used tool: >100,000 citations
But:
•convoluted interface
•challenging on custom data
http://www.sequenceserver.com/
If no config file:Asks interactive setup questions.
If needed: Downloads BLAST binaries
If needed: Formats FASTA into BLAST database.
1. Installing
gem install sequenceserver
### Launched SequenceServer at: http://0.0.0.0:4567
2. Launch
sequenceserver
Demo
Anurag Priyam - @yeban
2. oswitch
Bioinformatics software challenges
• Software hard to install 😖
• Central versions on cluster
• “I need it now” - slow system administrators? → delays
• Impossible to install?
• AmazonVirtual Machine - copy there - run analysis - copy
back→😕
• Consistent version for specific project?
One project dozens of tools?
Miracle solution?
Too complicated & too much copying back & forth
mymacbook:~/2015-­‐02-­‐01-­‐myproject>	
  abyss-­‐pe	
  k=25	
  reads.fastq.gz	
  
	
  	
  	
  	
  zsh:	
  command	
  not	
  found:	
  abyss-­‐pe	
  
mymacbook:~/2015-­‐02-­‐01-­‐myproject>	
  oswitch	
  -­‐l	
  
	
  	
  	
  	
  yeban/biolinux:8	
  
	
  	
  	
  	
  ubuntu:14.04	
  
	
  	
  	
  	
  ontouchstart/texlive-­‐full	
  
	
  	
  	
  	
  ipython/ipython	
  
mymacbook:~/2015-­‐02-­‐01-­‐myproject>	
  oswitch	
  yeban/biolinux	
  
	
  	
  	
  	
  ######	
  You	
  are	
  now	
  running:	
  biolinux	
  in	
  container	
  biolinux-­‐7187.	
  ######	
  
biolinux-­‐7187:~/2015-­‐02-­‐01-­‐myproject>	
  abyss-­‐pe	
  k=25	
  reads.fastq.gz	
  
	
  	
  	
  	
  [...	
  just	
  works	
  on	
  your	
  files	
  where	
  they	
  are…]	
  
biolinux-­‐7187:~/2015-­‐02-­‐01-­‐myproject>	
  exit	
  
mymacbook:~/2015-­‐02-­‐01-­‐myproject>	
  
	
  	
  	
  	
  [...	
  output	
  is	
  where	
  you	
  expect	
  it	
  to	
  be	
  ...]	
  
https://github.com/yeban/oswitch
oSwitch
One-line access to other operating systems.
oSwitch
One-line access to other operating systems.
Things feel (largely) unchanged:
• Current working directory is maintained.
• User name, uid and gid are maintained.
• Login shell (bash/zsh/fish) is maintained.
• Home directory is maintained (thus all .dotfiles and config files are maintained).
• read/write permissions are maintained.
• Paths are maintained whenever possible.Thus volumes (external drives, NAS, USB)
mounted on the host are available in the container at the same path.
https://github.com/yeban/oswitch
Working with Gene predictions
Gene prediction
Dozens of software algorithms: dozens of predictions
20% failure rate:
•missing pieces
•extra pieces
•incorrect merging
•incorrect splitting
Visual inspection... and
manual fixing required.
1 gene = 5 minutes to 3 days
Yandell&Ence2013NRG
GTTTTtACCTGTTTTtGAAAAGGTAATTTTCTTTAGATATATTATGTTGAATaTTAGGGTTTTTATAAAGAATGTGTATATTGUTTACAATATAAAAGACACAATTGCAAACTAGCATGATTGTAAACAATTGCTAAACGGATCAATATAAATTAAAATTGTAATATTAAGTATCAAACCGATAATTTTTA
Evidence
Consensus:
2. GeneValidator
Monica Dragan
Ismail Moghul
https://github.com/monicadragan/GeneValidator
http://genevalidator.sbcs.qmul.ac.uk
Monica Dragan
https://github.com/monicadragan/GeneValidator
http://genevalidator.sbcs.qmul.ac.uk
Ismail Moghul
GeneValidator
Run on:
★whole geneset: identify most problematic predictions
★alternative models for a gene (choose best)
★individual genes (while manually curating)
3.Afra: Crowdsourcing
gene model curation
Gene prediction
Dozens of software algorithms: dozens of predictions
20% failure rate:
•missing pieces
•extra pieces
•incorrect merging
•incorrect splitting
Visual inspection... and
manual fixing required.
1 gene = 20 minutes to 3 days
15,000 genes * 20 species =
impossible.
Yandell&Ence2013NRG
TTTTtACCTGTTTTtGAAAAGGTAATTTTCTTTAGATATATACAGTTTGTAATaTTAGGTATTTTATAAACAGTGTGTATATTTCTTACAATATAAAAGACACAATTGCAAACTAGCATGATTGTAAACAATTGCTAAACGGATCAATATAAATTAAAATTGTAATATTAAGTATCAAACCGATAATTTTT
Evidence
Consensus:
Algorithm discovery by protein folding game players
Firas Khatiba
, Seth Cooperb
, Michael D. Tykaa
, Kefan Xub
, Ilya Makedonb
, Zoran Popovićb
,
David Bakera,c,1
, and Foldit Players
a
Department of Biochemistry; b
Department of Computer Science and Engineering; and c
Howard Hughes Medical Institute, University of Washington,
Box 357370, Seattle, WA 98195
Contributed by David Baker, October 5, 2011 (sent for review June 29, 2011)
Foldit is a multiplayer online game in which players collaborate
and compete to create accurate protein structure models. For spe-
cific hard problems, Foldit player solutions can in some cases out-
perform state-of-the-art computational methods. However, very
little is known about how collaborative gameplay produces these
results and whether Foldit player strategies can be formalized and
structured so that they can be used by computers. To determine
whether high performing player strategies could be collectively
codified, we augmented the Foldit gameplay mechanics with tools
for players to encode their folding strategies as “recipes” and to
share their recipes with other players, who are able to further mod-
ify and redistribute them. Here we describe the rapid social evolu-
tion of player-developed folding algorithms that took place in the
year following the introduction of these tools. Players developed
over 5,400 different recipes, both by creating new algorithms and
by modifying and recombining successful recipes developed by
other players. The most successful recipes rapidly spread through
the Foldit player population, and two of the recipes became parti-
cularly dominant. Examination of the algorithms encoded in these
two recipes revealed a striking similarity to an unpublished algo-
rithm developed by scientists over the same period. Benchmark
calculations show that the new algorithm independently discov-
ered by scientists and by Foldit players outperforms previously
published methods. Thus, online scientific game frameworks have
the potential not only to solve hard scientific problems, but also to
discover and formalize effective new strategies and algorithms.
citizen science ∣ crowd-sourcing ∣ optimization ∣ structure prediction ∣
strategy
Citizen science is an approach to leveraging natural human
abilities for scientific purposes. Most such efforts involve
visual tasks such as tagging images or locating image features
(1–3). In contrast, Foldit is a multiplayer online scientific discovery
game, in which players become highly skilled at creating accurate
protein structure models through extended game play (4, 5). Foldit
recruits online gamers to optimize the computed Rosetta energy
using human spatial problem-solving skills. Players manipulate
protein structures with a palette of interactive tools and manipula-
tions. Through their interactive exploration Foldit players also uti-
lize user-friendly versions of algorithms from the Rosetta structure
prediction methodology (6) such as wiggle (gradient-based energy
minimization) and shake (combinatorial side chain rotamer pack-
ing). The potential of gamers to solve more complex scientific pro-
blems was recently highlighted by the solution of a long-standing
protein structure determination problem by Foldit players (7).
One of the key strengths of game-based human problem ex-
ploration is the human ability to search over the space of possible
strategies and adapt those strategies to the type of problem and
stage of problem solving (5). The variability of tactics and
strategies stems from the individuality of each player as well as
multiple methods of sharing and evolution within the game
(group play, game chat), and outside of the game [wiki pages (8)].
One way to arrive at algorithmic methods underlying successful
human Foldit play would be to apply machine learning techniques
to the detailed logs of expert Foldit players (9). We chose instead
to rely on a superior learning machine: Foldit players themselves.
As the players themselves understand their strategies better than
anyone, we decided to allow them to codify their algorithms
directly, rather than attempting to automatically learn approxi-
mations. We augmented standard Foldit play with the ability to
create, edit, share, and rate gameplay macros, referred to as
“recipes” within the Foldit game (10). In the game each player
has their own “cookbook” of such recipes, from which they can
invoke a variety of interactive automated strategies. Players can
share recipes they write with the rest of the Foldit community or
they can choose to keep their creations to themselves.
In this paper we describe the quite unexpected evolution of
recipes in the year after they were released, and the striking con-
vergence of this very short evolution on an algorithm very similar
to an unpublished algorithm recently developed independently
by scientific experts that improves over previous methods.
Results
In the social development environment provided by Foldit,
players evolved a wide variety of recipes to codify their diverse
strategies to problem solving. During the three and a half month
study period (see Materials and Methods), 721 Foldit players ran
5,488 unique recipes 158,682 times and 568 players wrote 5,202
recipes. We studied these algorithms and found that they fell
into four main categories: (i) perturb and minimize, (ii) aggressive
rebuilding, (iii) local optimize, and (iv) set constraints. The first
category goes beyond the deterministic minimize function
provided to Foldit players, which has the disadvantage of readily
being trapped in local minima, by adding in perturbations to lead
the minimizer in different directions (11). The second category
uses the rebuild tool, which performs fragment insertion with
loop closure, to search different areas of conformation space;
these recipes are often run for long periods of time as they are
designed to rebuild entire regions of a protein rather than just
refining them (Fig. S1). The third category of recipes performs
local minimizations along the protein backbone in order to im-
prove the Rosetta energy for every segment of a protein. The final
category of recipes assigns constraints between beta strands or
pairs of residues (rubber bands), or changes the secondary struc-
ture assignment to guide subsequent optimization.
Different algorithms were used with very different frequencies
during the experiment. Some are designated by the authors as
public and are available for use by all Foldit players, whereas
others are private and available only to their creator or their
Foldit team. The distribution of recipe usage among different
players is shown in Fig. 1 for the 26 recipes that were run over
1,000 times. Some recipes, such as the one represented by the
leftmost bar, were used many times by many different players,
while others, such as the one represented by the pink bar in the
Author contributions: F.K., S.C., Z.P., and D.B. designed research; F.K., S.C., M.D.T., and
F.P. performed research; F.K., S.C., M.D.T., K.X., and I.M. analyzed data; and F.K., S.C., Z.P.,
and D.B. wrote the paper.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
1
To whom correspondence should be addressed. E-mail: dabaker@u.washington.edu.
This article contains supporting information online at www.pnas.org/lookup/suppl/
doi:10.1073/pnas.1115898108/-/DCSupplemental.
BIOPHYSICSAND
COMPUTATIONALBIOLOGY
PSYCHOLOGICALAND
COGNITIVESCIENCES
http://Fold.it
• Recruiting & retaining contributors
Crowd-sourcing the visual inspection + correction
of gene models.
Challenges
Recruiting & retaining contributors
Plan A: get students.
• Increase accessibility:
• Make tasks small & simple
• Need excellent tutorials & training
• Need an intelligent “mothering” user interface.
• Provide rewards:
• Better grades
• Learning experience
• Good karma (helping science)
• Prestige & pride (on facebook; points & badges “leaderboard”, with
certificates, in publications)
• Opportunities to develop expertise & responsibilities
Crowd-sourcing the visual inspection + correction
of gene models.
Challenges
• Recruiting & retaining contributors
• Ensuring quality
Ensuring quality
• Excellent tutorials/training
• Make tasks small & simple
• Redundancy
• Review of conflicts by senior
users.
Begin
Needs curation
Create initial tasks
Being curated
Curate
Being curated
Curate
Being curated
Curate
Submit Submit Submit
Auto-check
Done
Inconsistent: create
“review” task
Consistent:
create next required task
Crowd-sourcing the visual inspection + correction.
Challenges
http://afra.sbcs.qmul.ac.ukAnurag Priyam http://github.com/yeban/afra
• Recruiting & retaining contributors
• Ensuring quality
Warning:Work in Progress
Timelines
• Rolled out to:
• 8 MSc students
• 20 3rd year students
• Need to improve tutorials/guidance/documentation/outputs
• Roll out to 200 first years (autumn)
• Expand
5. Bionode… stay tuned.
Thanks!
y.wurm@qmul.ac.uk
@yannick__
http://wurmlab.github.io
Colleagues & Collaborators
@ QMUL & UNIL
Anurag Priyam 		 @yeban
Monica Dragan
Ismail Moghul
Vivek Rai
Bruno Vieira @bmpvieira
Minimally guided demo :)
1. Set up a custom BLAST server (Sequenceserver)
2. Set up oswitch to rapidly switch to yeban/biolinux
@yannick__

Mais conteúdo relacionado

Semelhante a Sustainable software institute Collaboration workshop

Genetic Algorithm Demonstation System
Genetic Algorithm Demonstation SystemGenetic Algorithm Demonstation System
Genetic Algorithm Demonstation System
Benjamin Murphy
 
10 Ways To Improve Your Code( Neal Ford)
10  Ways To  Improve  Your  Code( Neal  Ford)10  Ways To  Improve  Your  Code( Neal  Ford)
10 Ways To Improve Your Code( Neal Ford)
guestebde
 
Software Carpentry for the Geophysical Sciences
Software Carpentry for the Geophysical SciencesSoftware Carpentry for the Geophysical Sciences
Software Carpentry for the Geophysical Sciences
Aron Ahmadia
 

Semelhante a Sustainable software institute Collaboration workshop (20)

Genetic Algorithm Demonstation System
Genetic Algorithm Demonstation SystemGenetic Algorithm Demonstation System
Genetic Algorithm Demonstation System
 
8 Usability Lessons from the UPA Conference by Mark Alves
8 Usability Lessons from the UPA Conference by Mark Alves8 Usability Lessons from the UPA Conference by Mark Alves
8 Usability Lessons from the UPA Conference by Mark Alves
 
Sciences Games #Glass2015
Sciences Games #Glass2015Sciences Games #Glass2015
Sciences Games #Glass2015
 
Ready, Set, Refactor
Ready, Set, RefactorReady, Set, Refactor
Ready, Set, Refactor
 
402 w2
402 w2402 w2
402 w2
 
Reproducible Science and Deep Software Variability
Reproducible Science and Deep Software VariabilityReproducible Science and Deep Software Variability
Reproducible Science and Deep Software Variability
 
KorraAI - a probabilistic virtual agent framework
KorraAI - a probabilistic virtual agent frameworkKorraAI - a probabilistic virtual agent framework
KorraAI - a probabilistic virtual agent framework
 
10 Ways To Improve Your Code( Neal Ford)
10  Ways To  Improve  Your  Code( Neal  Ford)10  Ways To  Improve  Your  Code( Neal  Ford)
10 Ways To Improve Your Code( Neal Ford)
 
Software Carpentry for the Geophysical Sciences
Software Carpentry for the Geophysical SciencesSoftware Carpentry for the Geophysical Sciences
Software Carpentry for the Geophysical Sciences
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021
 
Cshl minseqe 2013_ouellette
Cshl minseqe 2013_ouelletteCshl minseqe 2013_ouellette
Cshl minseqe 2013_ouellette
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
10 Ways To Improve Your Code
10 Ways To Improve Your Code10 Ways To Improve Your Code
10 Ways To Improve Your Code
 
myExperiment - Defining the Social Virtual Research Environment
myExperiment - Defining the Social Virtual Research EnvironmentmyExperiment - Defining the Social Virtual Research Environment
myExperiment - Defining the Social Virtual Research Environment
 
Ludo: An Ontology to Create Linked Data Driven Serious Games
Ludo: An Ontology to Create Linked Data Driven Serious GamesLudo: An Ontology to Create Linked Data Driven Serious Games
Ludo: An Ontology to Create Linked Data Driven Serious Games
 
Will Postgres Live Forever?
Will Postgres Live Forever?Will Postgres Live Forever?
Will Postgres Live Forever?
 
Reproducibility: 10 Simple Rules
Reproducibility: 10 Simple RulesReproducibility: 10 Simple Rules
Reproducibility: 10 Simple Rules
 
OpenAI-Copilot-ChatGPT.pptx
OpenAI-Copilot-ChatGPT.pptxOpenAI-Copilot-ChatGPT.pptx
OpenAI-Copilot-ChatGPT.pptx
 
Introduction to Google Colaboratory.pdf
Introduction to Google Colaboratory.pdfIntroduction to Google Colaboratory.pdf
Introduction to Google Colaboratory.pdf
 
Science Game Lab
Science Game LabScience Game Lab
Science Game Lab
 

Mais de Yannick Wurm

Mais de Yannick Wurm (20)

2018 09-03-ses open-fair_practices_in_evolutionary_genomics
2018 09-03-ses open-fair_practices_in_evolutionary_genomics2018 09-03-ses open-fair_practices_in_evolutionary_genomics
2018 09-03-ses open-fair_practices_in_evolutionary_genomics
 
2018 08-reduce risks of genomics research
2018 08-reduce risks of genomics research2018 08-reduce risks of genomics research
2018 08-reduce risks of genomics research
 
2017 11-15-reproducible research
2017 11-15-reproducible research2017 11-15-reproducible research
2017 11-15-reproducible research
 
2016 09-16-fairdom
2016 09-16-fairdom2016 09-16-fairdom
2016 09-16-fairdom
 
2016 05-31-wurm-social-chromosome
2016 05-31-wurm-social-chromosome2016 05-31-wurm-social-chromosome
2016 05-31-wurm-social-chromosome
 
2016 05-30-monday-assembly
2016 05-30-monday-assembly2016 05-30-monday-assembly
2016 05-30-monday-assembly
 
2016 05-29-intro-sib-springschool-leuker bad
2016 05-29-intro-sib-springschool-leuker bad2016 05-29-intro-sib-springschool-leuker bad
2016 05-29-intro-sib-springschool-leuker bad
 
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
 
2015 11-17-programming inr.key
2015 11-17-programming inr.key2015 11-17-programming inr.key
2015 11-17-programming inr.key
 
2015 11-10-bio-in-docker-oswitch
2015 11-10-bio-in-docker-oswitch2015 11-10-bio-in-docker-oswitch
2015 11-10-bio-in-docker-oswitch
 
Week 5 genetic basis of evolution
Week 5   genetic basis of evolutionWeek 5   genetic basis of evolution
Week 5 genetic basis of evolution
 
Biol113 week4 evolution
Biol113 week4 evolutionBiol113 week4 evolution
Biol113 week4 evolution
 
Evolution week3
Evolution week3Evolution week3
Evolution week3
 
2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research
 
2015 10-7-9am regex-functions-loops.key
2015 10-7-9am regex-functions-loops.key2015 10-7-9am regex-functions-loops.key
2015 10-7-9am regex-functions-loops.key
 
Evolution week2
Evolution week2Evolution week2
Evolution week2
 
2015 9-30-sbc361-research methcomm
2015 9-30-sbc361-research methcomm2015 9-30-sbc361-research methcomm
2015 9-30-sbc361-research methcomm
 
2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key
 
Sbc322 intro.key
Sbc322 intro.keySbc322 intro.key
Sbc322 intro.key
 
2015 09-28 bio721 intro
2015 09-28 bio721 intro2015 09-28 bio721 intro
2015 09-28 bio721 intro
 

Último

Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
ssuser79fe74
 

Último (20)

Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 

Sustainable software institute Collaboration workshop

  • 1. @yannick__ http://wurmlab.github.io Opportunities to overcome human & computational challenges in biology. 2015-03-25-CW15
  • 3. • Biology/life is complex • Field is young. • Biologists lack computational training. • Generally, analysis tools suck. • badly written • badly tested • hard to install • output quality… often questionable. • Understanding/visualizing/massaging data is hard. • Datasets continue to grow! Genomics is hard.
  • 8. “Can you BLAST this for me?” BLAST is the most commonly used tool: >100,000 citations But: •convoluted interface •challenging on custom data
  • 9. http://www.sequenceserver.com/ If no config file:Asks interactive setup questions. If needed: Downloads BLAST binaries If needed: Formats FASTA into BLAST database. 1. Installing gem install sequenceserver ### Launched SequenceServer at: http://0.0.0.0:4567 2. Launch sequenceserver Demo Anurag Priyam - @yeban
  • 10.
  • 11.
  • 13. Bioinformatics software challenges • Software hard to install 😖 • Central versions on cluster • “I need it now” - slow system administrators? → delays • Impossible to install? • AmazonVirtual Machine - copy there - run analysis - copy back→😕 • Consistent version for specific project? One project dozens of tools?
  • 14. Miracle solution? Too complicated & too much copying back & forth
  • 15. mymacbook:~/2015-­‐02-­‐01-­‐myproject>  abyss-­‐pe  k=25  reads.fastq.gz          zsh:  command  not  found:  abyss-­‐pe   mymacbook:~/2015-­‐02-­‐01-­‐myproject>  oswitch  -­‐l          yeban/biolinux:8          ubuntu:14.04          ontouchstart/texlive-­‐full          ipython/ipython   mymacbook:~/2015-­‐02-­‐01-­‐myproject>  oswitch  yeban/biolinux          ######  You  are  now  running:  biolinux  in  container  biolinux-­‐7187.  ######   biolinux-­‐7187:~/2015-­‐02-­‐01-­‐myproject>  abyss-­‐pe  k=25  reads.fastq.gz          [...  just  works  on  your  files  where  they  are…]   biolinux-­‐7187:~/2015-­‐02-­‐01-­‐myproject>  exit   mymacbook:~/2015-­‐02-­‐01-­‐myproject>          [...  output  is  where  you  expect  it  to  be  ...]   https://github.com/yeban/oswitch oSwitch One-line access to other operating systems.
  • 16. oSwitch One-line access to other operating systems. Things feel (largely) unchanged: • Current working directory is maintained. • User name, uid and gid are maintained. • Login shell (bash/zsh/fish) is maintained. • Home directory is maintained (thus all .dotfiles and config files are maintained). • read/write permissions are maintained. • Paths are maintained whenever possible.Thus volumes (external drives, NAS, USB) mounted on the host are available in the container at the same path. https://github.com/yeban/oswitch
  • 17. Working with Gene predictions
  • 18. Gene prediction Dozens of software algorithms: dozens of predictions 20% failure rate: •missing pieces •extra pieces •incorrect merging •incorrect splitting Visual inspection... and manual fixing required. 1 gene = 5 minutes to 3 days Yandell&Ence2013NRG GTTTTtACCTGTTTTtGAAAAGGTAATTTTCTTTAGATATATTATGTTGAATaTTAGGGTTTTTATAAAGAATGTGTATATTGUTTACAATATAAAAGACACAATTGCAAACTAGCATGATTGTAAACAATTGCTAAACGGATCAATATAAATTAAAATTGTAATATTAAGTATCAAACCGATAATTTTTA Evidence Consensus:
  • 19.
  • 23. GeneValidator Run on: ★whole geneset: identify most problematic predictions ★alternative models for a gene (choose best) ★individual genes (while manually curating)
  • 25. Gene prediction Dozens of software algorithms: dozens of predictions 20% failure rate: •missing pieces •extra pieces •incorrect merging •incorrect splitting Visual inspection... and manual fixing required. 1 gene = 20 minutes to 3 days 15,000 genes * 20 species = impossible. Yandell&Ence2013NRG TTTTtACCTGTTTTtGAAAAGGTAATTTTCTTTAGATATATACAGTTTGTAATaTTAGGTATTTTATAAACAGTGTGTATATTTCTTACAATATAAAAGACACAATTGCAAACTAGCATGATTGTAAACAATTGCTAAACGGATCAATATAAATTAAAATTGTAATATTAAGTATCAAACCGATAATTTTT Evidence Consensus:
  • 26.
  • 27. Algorithm discovery by protein folding game players Firas Khatiba , Seth Cooperb , Michael D. Tykaa , Kefan Xub , Ilya Makedonb , Zoran Popovićb , David Bakera,c,1 , and Foldit Players a Department of Biochemistry; b Department of Computer Science and Engineering; and c Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195 Contributed by David Baker, October 5, 2011 (sent for review June 29, 2011) Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For spe- cific hard problems, Foldit player solutions can in some cases out- perform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as “recipes” and to share their recipes with other players, who are able to further mod- ify and redistribute them. Here we describe the rapid social evolu- tion of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and recombining successful recipes developed by other players. The most successful recipes rapidly spread through the Foldit player population, and two of the recipes became parti- cularly dominant. Examination of the algorithms encoded in these two recipes revealed a striking similarity to an unpublished algo- rithm developed by scientists over the same period. Benchmark calculations show that the new algorithm independently discov- ered by scientists and by Foldit players outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms. citizen science ∣ crowd-sourcing ∣ optimization ∣ structure prediction ∣ strategy Citizen science is an approach to leveraging natural human abilities for scientific purposes. Most such efforts involve visual tasks such as tagging images or locating image features (1–3). In contrast, Foldit is a multiplayer online scientific discovery game, in which players become highly skilled at creating accurate protein structure models through extended game play (4, 5). Foldit recruits online gamers to optimize the computed Rosetta energy using human spatial problem-solving skills. Players manipulate protein structures with a palette of interactive tools and manipula- tions. Through their interactive exploration Foldit players also uti- lize user-friendly versions of algorithms from the Rosetta structure prediction methodology (6) such as wiggle (gradient-based energy minimization) and shake (combinatorial side chain rotamer pack- ing). The potential of gamers to solve more complex scientific pro- blems was recently highlighted by the solution of a long-standing protein structure determination problem by Foldit players (7). One of the key strengths of game-based human problem ex- ploration is the human ability to search over the space of possible strategies and adapt those strategies to the type of problem and stage of problem solving (5). The variability of tactics and strategies stems from the individuality of each player as well as multiple methods of sharing and evolution within the game (group play, game chat), and outside of the game [wiki pages (8)]. One way to arrive at algorithmic methods underlying successful human Foldit play would be to apply machine learning techniques to the detailed logs of expert Foldit players (9). We chose instead to rely on a superior learning machine: Foldit players themselves. As the players themselves understand their strategies better than anyone, we decided to allow them to codify their algorithms directly, rather than attempting to automatically learn approxi- mations. We augmented standard Foldit play with the ability to create, edit, share, and rate gameplay macros, referred to as “recipes” within the Foldit game (10). In the game each player has their own “cookbook” of such recipes, from which they can invoke a variety of interactive automated strategies. Players can share recipes they write with the rest of the Foldit community or they can choose to keep their creations to themselves. In this paper we describe the quite unexpected evolution of recipes in the year after they were released, and the striking con- vergence of this very short evolution on an algorithm very similar to an unpublished algorithm recently developed independently by scientific experts that improves over previous methods. Results In the social development environment provided by Foldit, players evolved a wide variety of recipes to codify their diverse strategies to problem solving. During the three and a half month study period (see Materials and Methods), 721 Foldit players ran 5,488 unique recipes 158,682 times and 568 players wrote 5,202 recipes. We studied these algorithms and found that they fell into four main categories: (i) perturb and minimize, (ii) aggressive rebuilding, (iii) local optimize, and (iv) set constraints. The first category goes beyond the deterministic minimize function provided to Foldit players, which has the disadvantage of readily being trapped in local minima, by adding in perturbations to lead the minimizer in different directions (11). The second category uses the rebuild tool, which performs fragment insertion with loop closure, to search different areas of conformation space; these recipes are often run for long periods of time as they are designed to rebuild entire regions of a protein rather than just refining them (Fig. S1). The third category of recipes performs local minimizations along the protein backbone in order to im- prove the Rosetta energy for every segment of a protein. The final category of recipes assigns constraints between beta strands or pairs of residues (rubber bands), or changes the secondary struc- ture assignment to guide subsequent optimization. Different algorithms were used with very different frequencies during the experiment. Some are designated by the authors as public and are available for use by all Foldit players, whereas others are private and available only to their creator or their Foldit team. The distribution of recipe usage among different players is shown in Fig. 1 for the 26 recipes that were run over 1,000 times. Some recipes, such as the one represented by the leftmost bar, were used many times by many different players, while others, such as the one represented by the pink bar in the Author contributions: F.K., S.C., Z.P., and D.B. designed research; F.K., S.C., M.D.T., and F.P. performed research; F.K., S.C., M.D.T., K.X., and I.M. analyzed data; and F.K., S.C., Z.P., and D.B. wrote the paper. The authors declare no conflict of interest. Freely available online through the PNAS open access option. 1 To whom correspondence should be addressed. E-mail: dabaker@u.washington.edu. This article contains supporting information online at www.pnas.org/lookup/suppl/ doi:10.1073/pnas.1115898108/-/DCSupplemental. BIOPHYSICSAND COMPUTATIONALBIOLOGY PSYCHOLOGICALAND COGNITIVESCIENCES http://Fold.it
  • 28. • Recruiting & retaining contributors Crowd-sourcing the visual inspection + correction of gene models. Challenges
  • 29. Recruiting & retaining contributors Plan A: get students. • Increase accessibility: • Make tasks small & simple • Need excellent tutorials & training • Need an intelligent “mothering” user interface. • Provide rewards: • Better grades • Learning experience • Good karma (helping science) • Prestige & pride (on facebook; points & badges “leaderboard”, with certificates, in publications) • Opportunities to develop expertise & responsibilities
  • 30. Crowd-sourcing the visual inspection + correction of gene models. Challenges • Recruiting & retaining contributors • Ensuring quality
  • 31. Ensuring quality • Excellent tutorials/training • Make tasks small & simple • Redundancy • Review of conflicts by senior users. Begin Needs curation Create initial tasks Being curated Curate Being curated Curate Being curated Curate Submit Submit Submit Auto-check Done Inconsistent: create “review” task Consistent: create next required task
  • 32. Crowd-sourcing the visual inspection + correction. Challenges http://afra.sbcs.qmul.ac.ukAnurag Priyam http://github.com/yeban/afra • Recruiting & retaining contributors • Ensuring quality
  • 34.
  • 35.
  • 36.
  • 37. Timelines • Rolled out to: • 8 MSc students • 20 3rd year students • Need to improve tutorials/guidance/documentation/outputs • Roll out to 200 first years (autumn) • Expand
  • 39. Thanks! y.wurm@qmul.ac.uk @yannick__ http://wurmlab.github.io Colleagues & Collaborators @ QMUL & UNIL Anurag Priyam @yeban Monica Dragan Ismail Moghul Vivek Rai Bruno Vieira @bmpvieira
  • 40. Minimally guided demo :) 1. Set up a custom BLAST server (Sequenceserver) 2. Set up oswitch to rapidly switch to yeban/biolinux @yannick__