BIOINFORMATICS WORKFLOW STEPS
Today's bioinformatics lesson
is brought to you by the letter 'W'
by
Keith Bradnam
Image from flickr.com/91619273@N00/
W is for Workflows
A typical bioinformatics workflow
Illumina data
(FASTQ format)
Remove adapter contamination
scythe
cutadapt
trimgalore
skewer
Btrim
Trimmomatic
Lots of tools
you could use!
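The adapter-removal tools above differ in their details, but the core idea can be sketched in a few lines. This is a minimal sketch, assuming a single known 3' adapter and exact matching only: the default adapter string `AGATCGGAAGAGC` (the common start of Illumina TruSeq adapters) and the function name `trim_3prime_adapter` are illustrative assumptions, and real tools such as cutadapt also tolerate sequencing errors within the adapter.

```python
# Minimal sketch: exact-match 3' adapter trimming for a single read.
# Assumptions: one known adapter; no mismatches or indels are tolerated.
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"  # assumed adapter prefix, for illustration


def trim_3prime_adapter(read: str, adapter: str = ILLUMINA_ADAPTER,
                        min_overlap: int = 3) -> str:
    """Remove a 3' adapter (or a partial adapter at the read end) from a read.

    min_overlap must be at least 1; shorter partial matches are ignored.
    """
    # Case 1: the full adapter occurs inside the read; cut at its start.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends partway through the adapter. Test the longest
    # adapter prefix first, down to min_overlap bases, so that very short
    # chance matches do not trigger trimming.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT"))  # ACGTACGT (full adapter)
    print(trim_3prime_adapter("ACGTACGTAGATC"))             # ACGTACGT (partial adapter)
```

Checking the longest prefix first matters: a read ending in `...AGATC` should lose all five adapter bases, not just a shorter matching prefix.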
Trim reads for low quality bases
sickle
Qtrim
FastQC
FastX
PRINSEQ
Trimmomatic
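Quality trimming works on the per-base Phred scores stored in the fourth line of each FASTQ record. The sketch below assumes Phred+33 encoding (used by current Illumina pipelines) and applies a running-sum rule from the 3' end, similar to the approach BWA and cutadapt describe for 3' quality trimming; tools such as sickle use sliding windows instead. The function names and the default cutoff of 20 are illustrative assumptions.

```python
# Minimal sketch: 3' quality trimming of one FASTQ record.
# Assumptions: Phred+33 quality encoding; a single quality cutoff.


def phred33_scores(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def quality_trim_3prime(seq: str, qual: str, cutoff: int = 20) -> tuple[str, str]:
    """Trim the 3' end of a read using a running sum of (cutoff - score).

    Walking back from the 3' end, the sum grows while bases are below the
    cutoff; the read is cut where the sum peaks, and the scan stops as soon
    as the sum turns negative (good-quality bases dominate).
    """
    scores = phred33_scores(qual)
    running, best, cut = 0, 0, len(scores)
    for i in range(len(scores) - 1, -1, -1):
        running += cutoff - scores[i]
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return seq[:cut], qual[:cut]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(quality_trim_3prime("ACGTACGTAC", "IIIIIIII##"))
    # ('ACGTACGT', 'IIIIIIII')
```

Compared with chopping off every base below the cutoff, the running sum lets an isolated low-quality base survive when it is surrounded by good ones.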
Map reads to genome/transcriptome
BWA
Bowtie
TopHat
SHRiMP
BFAST
MAQ
From ebi.ac.uk/~nf/hts_mappers/
There are a lot of
read mappers out there!
[Figure: timeline of short-read mapping tools released from 2001 to 2015; x-axis: Years]
[Figure: the same mapper timeline (from ebi.ac.uk/~nf/hts_mappers/), overlaid with the first page of the paper "ARYANA: Aligning Reads by Yet Another Approach"]
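Mappers such as BWA and Bowtie build an index based on the Burrows-Wheeler transform so that millions of reads can be placed quickly, with mismatches allowed. The toy sketch below ignores all of that and simply reports every position where a read matches the reference exactly, on the forward strand only. It is meant only to show what a mapper's output fundamentally is (a read plus zero, one, or several reference positions); the function name is hypothetical.

```python
# Toy sketch: naive exact-match "mapping" of one read to a reference string.
# Assumptions: forward strand only, no mismatches or gaps, 0-based positions.


def map_read_exact(read: str, reference: str) -> list[int]:
    """Return every 0-based position where `read` occurs in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping matches are also found.
        start = reference.find(read, start + 1)
    return positions


if __name__ == "__main__":
    reference = "ACGTTTACGTGG"
    print(map_read_exact("ACGT", reference))  # [0, 6]: maps to two places
    print(map_read_exact("TTTA", reference))  # [3]: maps to one place
    print(map_read_exact("CCCC", reference))  # []: does not map
```

A read that comes back with more than one position is the multi-mapping case that the uniqueness filter deals with.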
Filter for uniquely mapped reads
SAMtools
Picard
GATK
Unix
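How multi-mapping reads are reported depends on the aligner (for example through MAPQ values or secondary-alignment flags), so the sketch below abstracts that away. It assumes a hypothetical list of `(read_id, position)` alignment records, one per reported alignment, and keeps only reads that occur exactly once.

```python
# Minimal sketch: keep only reads that aligned to exactly one location.
# Assumption: alignments are hypothetical (read_id, position) pairs,
# with one pair per reported alignment.
from collections import Counter


def uniquely_mapped(alignments: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return the alignments of reads that have exactly one alignment."""
    hits_per_read = Counter(read_id for read_id, _ in alignments)
    return [(read_id, pos) for read_id, pos in alignments
            if hits_per_read[read_id] == 1]


if __name__ == "__main__":
    alignments = [("r1", 10), ("r2", 5), ("r2", 90), ("r3", 42)]
    print(uniquely_mapped(alignments))  # [('r1', 10), ('r3', 42)]
```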
Filter for high quality alignments
SAMtools
Picard
GATK
Unix
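The worked example later in this deck keeps reads whose alignment score is zero. In SAM output, optional fields follow the 11 mandatory columns as `TAG:TYPE:VALUE`, and `AS:i` holds the aligner's alignment score. The sketch below assumes an aligner that writes `AS:i`, such as Bowtie 2 in its default end-to-end mode, where scores are zero or negative and 0 is the best possible score. The function names are hypothetical; for real BAM files a dedicated library or SAMtools would be the usual choice.

```python
# Minimal sketch: keep SAM alignment lines whose AS:i (alignment score) is 0.
# Assumption: the aligner writes an AS:i tag, as Bowtie 2 does; in its
# end-to-end mode 0 is the best possible score.
from typing import Optional


def alignment_score(sam_line: str) -> Optional[int]:
    """Return the AS:i value from a SAM alignment line, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        if field.startswith("AS:i:"):
            return int(field[5:])
    return None


def perfect_alignments(sam_lines: list[str]) -> list[str]:
    """Drop header lines (starting with '@') and keep alignments with AS:i:0."""
    return [line for line in sam_lines
            if not line.startswith("@") and alignment_score(line) == 0]


if __name__ == "__main__":
    header = "@HD\tVN:1.6"
    good = "\t".join(["r1", "0", "tx1", "100", "42", "4M", "*", "0", "0",
                      "ACGT", "IIII", "AS:i:0"])
    mismatched = "\t".join(["r2", "0", "tx1", "200", "42", "4M", "*", "0", "0",
                            "ACGA", "IIII", "AS:i:-6"])
    print(perfect_alignments([header, good, mismatched]) == [good])  # True
```

Unmapped reads carry no `AS:i` tag, so they are dropped along with imperfect alignments.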
Data suitable for
final analysis
Some questions you should ask yourself…
W is for 'Why?'
Why is each of these steps needed?
Why should I use tool 'X' at this step?
W is for 'What?'
What is the effect of running each step?
What is a good result?
The effect of applying many
'bioinformatics axes'
Illumina data
(FASTQ format)
2 FASTQ files
Files are ~6.5 GB
52.5 million reads total
Remove adapters & trim
50.1 million reads
Align to transcriptome with Bowtie
35.8 million reads map
Filter for uniquely mapped reads
31.4 million reads align uniquely
Filter for high quality alignments
22.7 million reads have alignment scores of zero
Data suitable for
final analysis
Reduced data from 52.5 to 22.7 million reads
It can be helpful to know how the different
steps in a workflow reduce your data
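Using the read counts from the worked example above, the per-step and cumulative retention are simple ratios. The sketch below just reproduces that arithmetic; the step labels are shortened from the slides and the function name is illustrative.

```python
# Minimal sketch: how each workflow step reduces the data, using the read
# counts (in millions) from the example above.
steps = [
    ("Raw Illumina reads", 52.5),
    ("After adapter removal & trimming", 50.1),
    ("Mapped with Bowtie", 35.8),
    ("Uniquely mapped", 31.4),
    ("Alignment score of zero", 22.7),
]


def retention_table(counts):
    """Yield (label, millions, % of previous step, % of input) per step."""
    start = counts[0][1]
    previous = start
    for label, millions in counts:
        yield label, millions, 100 * millions / previous, 100 * millions / start
        previous = millions


if __name__ == "__main__":
    for label, millions, of_prev, of_start in retention_table(steps):
        print(f"{label:34} {millions:5.1f}M  {of_prev:5.1f}% of previous  "
              f"{of_start:5.1f}% of input")
```

For these numbers the final filtered set is about 43% of the input reads, and the two steps that remove the most data are mapping (about 71% of trimmed reads map) and the alignment-score filter (about 72% of uniquely mapped reads are kept).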
One final tip…
ls -ltr
(-l: long listing, including file sizes; -t: sort by modification time; -r: reverse the order)
Run this command after
every step of a workflow
Lets you see whether output files
were actually created
Lets you see whether output files
contain any data
Most recently modified files will be
at the bottom of your terminal window
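Inside a scripted workflow, the same two checks (does each output exist, and is it non-empty) can be automated. The sketch below is one way to do that with only the Python standard library; the function names and the idea of passing a list of expected output paths are assumptions, not part of the original tip.

```python
# Minimal sketch: automate the checks that `ls -ltr` lets you do by eye.
from pathlib import Path


def newest_last(directory: str = ".") -> list[tuple[str, int]]:
    """List (name, size in bytes) for regular files, oldest first, like `ls -ltr`."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)
    return [(p.name, p.stat().st_size) for p in files]


def check_outputs(paths: list[str]) -> list[tuple[str, str]]:
    """Report expected output files that are missing or empty."""
    problems = []
    for path in paths:
        p = Path(path)
        if not p.exists():
            problems.append((path, "missing"))
        elif p.stat().st_size == 0:
            problems.append((path, "empty"))
    return problems


if __name__ == "__main__":
    for name, size in newest_last():
        print(f"{size:>12}  {name}")
```

Calling `check_outputs` at the end of each step and stopping the workflow when it returns anything catches a failed step before later steps consume its empty output.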
The end

Mais conteúdo relacionado

Semelhante a BIOINFORMATICS WORKFLOW STEPS

20110114 Next Generation Sequencing Course
20110114 Next Generation Sequencing Course20110114 Next Generation Sequencing Course
20110114 Next Generation Sequencing CoursePierre Lindenbaum
 
Sequencing and Bioinformatics PGRP Summer 2015
Sequencing and Bioinformatics PGRP Summer 2015Sequencing and Bioinformatics PGRP Summer 2015
Sequencing and Bioinformatics PGRP Summer 2015Surya Saha
 
Invited talk @ ESIP summer meeting, 2009
Invited talk @ ESIP summer meeting, 2009Invited talk @ ESIP summer meeting, 2009
Invited talk @ ESIP summer meeting, 2009Paolo Missier
 
Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015Surya Saha
 
ICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick ProvartICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick ProvartAraport
 
[13.09.19] 16S workshop introduction
[13.09.19] 16S workshop introduction[13.09.19] 16S workshop introduction
[13.09.19] 16S workshop introductionMads Albertsen
 
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)Jing-Doo Wang
 
Should I be dead? a very personal genomics
Should I be dead? a very personal genomicsShould I be dead? a very personal genomics
Should I be dead? a very personal genomicsNeil Saunders
 
Biomedical Signal Extraction for Computer-assisted Clinical Decision Making -...
Biomedical Signal Extraction for Computer-assisted Clinical Decision Making -...Biomedical Signal Extraction for Computer-assisted Clinical Decision Making -...
Biomedical Signal Extraction for Computer-assisted Clinical Decision Making -...Catalina Arango
 
EnviroInsite training workshop - Overview of EnviroInsite Features
EnviroInsite training workshop - Overview of EnviroInsite FeaturesEnviroInsite training workshop - Overview of EnviroInsite Features
EnviroInsite training workshop - Overview of EnviroInsite FeaturesBruce Jacobs
 
From Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats
From Buffer-Overflowing Genomic Tools to Securing Biomedical File FormatsFrom Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats
From Buffer-Overflowing Genomic Tools to Securing Biomedical File FormatsCharles Fracchia
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookKeiichiro Ono
 
The Next Linux Superpower: eBPF Primer
The Next Linux Superpower: eBPF PrimerThe Next Linux Superpower: eBPF Primer
The Next Linux Superpower: eBPF PrimerSasha Goldshtein
 
Quick Introduction to Cytoscape for Undergraduates
Quick Introduction to Cytoscape for UndergraduatesQuick Introduction to Cytoscape for Undergraduates
Quick Introduction to Cytoscape for UndergraduatesKeiichiro Ono
 

Semelhante a BIOINFORMATICS WORKFLOW STEPS (20)

CSIRT-Kit: Your Security Response toolkit
CSIRT-Kit: Your Security Response toolkitCSIRT-Kit: Your Security Response toolkit
CSIRT-Kit: Your Security Response toolkit
 
Introduction to bioinformatics
Introduction to bioinformaticsIntroduction to bioinformatics
Introduction to bioinformatics
 
20110114 Next Generation Sequencing Course
20110114 Next Generation Sequencing Course20110114 Next Generation Sequencing Course
20110114 Next Generation Sequencing Course
 
Arduino uno-schematic
Arduino uno-schematicArduino uno-schematic
Arduino uno-schematic
 
Sequencing and Bioinformatics PGRP Summer 2015
Sequencing and Bioinformatics PGRP Summer 2015Sequencing and Bioinformatics PGRP Summer 2015
Sequencing and Bioinformatics PGRP Summer 2015
 
Invited talk @ ESIP summer meeting, 2009
Invited talk @ ESIP summer meeting, 2009Invited talk @ ESIP summer meeting, 2009
Invited talk @ ESIP summer meeting, 2009
 
Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015
 
ICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick ProvartICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick Provart
 
[13.09.19] 16S workshop introduction
[13.09.19] 16S workshop introduction[13.09.19] 16S workshop introduction
[13.09.19] 16S workshop introduction
 
NASA Biocene Workshop 10th Sept 2019
NASA Biocene Workshop 10th Sept 2019NASA Biocene Workshop 10th Sept 2019
NASA Biocene Workshop 10th Sept 2019
 
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
 
Should I be dead? a very personal genomics
Should I be dead? a very personal genomicsShould I be dead? a very personal genomics
Should I be dead? a very personal genomics
 
Sequencing
SequencingSequencing
Sequencing
 
Chang Sha, China
Chang Sha, ChinaChang Sha, China
Chang Sha, China
 
Biomedical Signal Extraction for Computer-assisted Clinical Decision Making -...
Biomedical Signal Extraction for Computer-assisted Clinical Decision Making -...Biomedical Signal Extraction for Computer-assisted Clinical Decision Making -...
Biomedical Signal Extraction for Computer-assisted Clinical Decision Making -...
 
EnviroInsite training workshop - Overview of EnviroInsite Features
EnviroInsite training workshop - Overview of EnviroInsite FeaturesEnviroInsite training workshop - Overview of EnviroInsite Features
EnviroInsite training workshop - Overview of EnviroInsite Features
 
From Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats
From Buffer-Overflowing Genomic Tools to Securing Biomedical File FormatsFrom Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats
From Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter Notebook
 
The Next Linux Superpower: eBPF Primer
The Next Linux Superpower: eBPF PrimerThe Next Linux Superpower: eBPF Primer
The Next Linux Superpower: eBPF Primer
 
Quick Introduction to Cytoscape for Undergraduates
Quick Introduction to Cytoscape for UndergraduatesQuick Introduction to Cytoscape for Undergraduates
Quick Introduction to Cytoscape for Undergraduates
 

Mais de Keith Bradnam

Thoughts on the feasibility of an Assemblathon 3 contest
Thoughts on the feasibility of an Assemblathon 3 contestThoughts on the feasibility of an Assemblathon 3 contest
Thoughts on the feasibility of an Assemblathon 3 contestKeith Bradnam
 
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...Keith Bradnam
 
Genome assembly: then and now (with notes) — v1.2
Genome assembly: then and now (with notes) — v1.2Genome assembly: then and now (with notes) — v1.2
Genome assembly: then and now (with notes) — v1.2Keith Bradnam
 
Genome assembly: then and now — v1.2
Genome assembly: then and now — v1.2Genome assembly: then and now — v1.2
Genome assembly: then and now — v1.2Keith Bradnam
 
Genome assembly: then and now — v1.1
Genome assembly: then and now — v1.1Genome assembly: then and now — v1.1
Genome assembly: then and now — v1.1Keith Bradnam
 
Genome assembly: then and now — with notes — v1.1
Genome assembly: then and now — with notes — v1.1Genome assembly: then and now — with notes — v1.1
Genome assembly: then and now — with notes — v1.1Keith Bradnam
 
The art of good science writing
The art of good science writingThe art of good science writing
The art of good science writingKeith Bradnam
 
Genome assembly: then and now — v1.0
Genome assembly: then and now — v1.0Genome assembly: then and now — v1.0
Genome assembly: then and now — v1.0Keith Bradnam
 
Polish that presentation! 25 tips to bring clarity to your slides
Polish that presentation! 25 tips to bring clarity to your slidesPolish that presentation! 25 tips to bring clarity to your slides
Polish that presentation! 25 tips to bring clarity to your slidesKeith Bradnam
 
10 tips for adding polish to presentations
10 tips for adding polish to presentations10 tips for adding polish to presentations
10 tips for adding polish to presentationsKeith Bradnam
 
Database talk for Bits & Bites meeting
Database talk for Bits & Bites meetingDatabase talk for Bits & Bites meeting
Database talk for Bits & Bites meetingKeith Bradnam
 
Benchmarking short-read mapping programs
Benchmarking short-read mapping programsBenchmarking short-read mapping programs
Benchmarking short-read mapping programsKeith Bradnam
 
Thoughts on the recent announcements by Oxford Nanopore Technologies
Thoughts on the recent announcements by Oxford Nanopore TechnologiesThoughts on the recent announcements by Oxford Nanopore Technologies
Thoughts on the recent announcements by Oxford Nanopore TechnologiesKeith Bradnam
 
When is a genome finished?
When is a genome finished? When is a genome finished?
When is a genome finished? Keith Bradnam
 
Twitter 101 - an introduction to Twitter
Twitter 101  - an introduction to TwitterTwitter 101  - an introduction to Twitter
Twitter 101 - an introduction to TwitterKeith Bradnam
 

Mais de Keith Bradnam (15)

Thoughts on the feasibility of an Assemblathon 3 contest
Thoughts on the feasibility of an Assemblathon 3 contestThoughts on the feasibility of an Assemblathon 3 contest
Thoughts on the feasibility of an Assemblathon 3 contest
 
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
 
Genome assembly: then and now (with notes) — v1.2
Genome assembly: then and now (with notes) — v1.2Genome assembly: then and now (with notes) — v1.2
Genome assembly: then and now (with notes) — v1.2
 
Genome assembly: then and now — v1.2
Genome assembly: then and now — v1.2Genome assembly: then and now — v1.2
Genome assembly: then and now — v1.2
 
Genome assembly: then and now — v1.1
Genome assembly: then and now — v1.1Genome assembly: then and now — v1.1
Genome assembly: then and now — v1.1
 
Genome assembly: then and now — with notes — v1.1
Genome assembly: then and now — with notes — v1.1Genome assembly: then and now — with notes — v1.1
Genome assembly: then and now — with notes — v1.1
 
The art of good science writing
The art of good science writingThe art of good science writing
The art of good science writing
 
Genome assembly: then and now — v1.0
Genome assembly: then and now — v1.0Genome assembly: then and now — v1.0
Genome assembly: then and now — v1.0
 
Polish that presentation! 25 tips to bring clarity to your slides
Polish that presentation! 25 tips to bring clarity to your slidesPolish that presentation! 25 tips to bring clarity to your slides
Polish that presentation! 25 tips to bring clarity to your slides
 
10 tips for adding polish to presentations
10 tips for adding polish to presentations10 tips for adding polish to presentations
10 tips for adding polish to presentations
 
Database talk for Bits & Bites meeting
Database talk for Bits & Bites meetingDatabase talk for Bits & Bites meeting
Database talk for Bits & Bites meeting
 
Benchmarking short-read mapping programs
Benchmarking short-read mapping programsBenchmarking short-read mapping programs
Benchmarking short-read mapping programs
 
Thoughts on the recent announcements by Oxford Nanopore Technologies
Thoughts on the recent announcements by Oxford Nanopore TechnologiesThoughts on the recent announcements by Oxford Nanopore Technologies
Thoughts on the recent announcements by Oxford Nanopore Technologies
 
When is a genome finished?
When is a genome finished? When is a genome finished?
When is a genome finished?
 
Twitter 101 - an introduction to Twitter
Twitter 101  - an introduction to TwitterTwitter 101  - an introduction to Twitter
Twitter 101 - an introduction to Twitter
 

Último

Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptxAneriPatwari
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6Vanessa Camilleri
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 

Último (20)

Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptx
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 

BIOINFORMATICS WORKFLOW STEPS

  • 1. Today's bioinformatics lesson is brought to you by the letter 'W' by Keith Bradnam Image from flickr.com/91619273@N00/ Today'sbloinformatieslesson isbroughttoyoubytheletter1W1 Imagefromflickr.com/91619273©NO0/
  • 3. A typical bioinformatics workflow Illumina data (FASTQ format) Remove adapter contamination Atypicalbioinformaticsworkflow Removeadaptercontamination
  • 4. A typical bioinformatics workflow Illumina data (FASTQ format) Remove adapter contamination scythe cutadapt trimgalore skewer Btrim Trimmomatic Atypicalbioinformaticsworkflow Removeadaptercontamination scythe cutadapt trimgalore skewer Btrim Trimmomatic
  • 5. A typical bioinformatics workflow Illumina data (FASTQ format) Remove adapter contamination scythe cutadapt trimgalore skewer Btrim Trimmomatic Lots of tools you could use! Atypicalbioinformaticsworkflow Lotsoftools youcoulduse! Removeadaptercontamination scythe cutadapt trimgalore skewer Btrim Trimmomatic
  • 6. Trim reads for low quality bases sickle Qtrim FastQC FastX PRINSEQ Trimmomatic Trimreadsforlowqualitybases sickle Qtrim FastQC FastX PRINSEC) Trimmomatic
  • 7. Map reads to genome/transcriptome BWA Bowtie TopHat SHRiMP BFAST MAQ From ebi.ac.uk/~nf/hts_mappers/ There are a lot of read mappers out there! Fromebi.ac.uk/-nf/hts_mappers/ H I S A T •-JAGuaR • - BWA-PSSM • - - MOSAIK•- - - - - - Hobbes2 • CUSHAW3a- NextGenMap • Subread/Subjunc • CRAC•- SRmapper•- GEM• STAR • ERNE•- BatMelh•- BLASRa- YAHA • SeciAlto • Batmis • Therearealotof DynMaPp O S A • ContextMap•- as?n1 •- RUMa_ readmappersoutthere!StampydrFAST•-Bismark•- •- MapSplicea-REALa-- BS-Seekera-- - B S - S e e k e r 2 - •• Supersplat liceMapRAT • - B R A T - S W -•- BFAST•- segemeht•- GNUMAP•- GenomeMapper•- mrFAST • • - mrsFAST m r s FA S T- L i l t r a - -• - - - - PerM • - - - - - --- RNA-Mate • - - -X-Matea- - - - SBSMAP • - - - - S p l a z e r RazerS • --•--MicroRazerS - • - - • RazerS3 SHRIMPa ——•SHR1MP2-• BWAs - - •BWA-SW CloudBurst • ProbeMatch •• W H A M - • TopHata- T o p H a t 2-•- Bowlie •- B o w t i e 2 •- MOM4- PASS•- P A S S - b i s - -• Slider • - - -Slider-II- ()PALMA • SOCS"- MAO• SegMap • ZOOM• PalMaNa- RMAP• SOAP• —SOAP2--• BWT-SW • - - S O A P S p l i c e - -• Blata- SSAHA• GMAP • Exonerate • Mummer3 • ELAND • GSNAP-a- 20012002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 Years
  • 8. Map reads to genome/transcriptome: BWA, Bowtie, TopHat, SHRiMP, BFAST, MAQ (from ebi.ac.uk/~nf/hts_mappers/). [Figure: the same mapper timeline, partly covered by the first page of yet another aligner paper, "ARYANA: Aligning Reads by Yet Another Approach"]
  • 9. Filter for uniquely mapped reads: SAMtools, Picard, GATK, Unix
  • 10. Filter for high-quality alignments: SAMtools, Picard, GATK, Unix
  • 11. Data suitable for final analysis
  • 12. Some questions you should ask yourself…
  • 14. Why is each of these steps needed?
  • 15. Why should I use tool 'X' at this step?
  • 17. What is the effect of running each step?
  • 18. What is a good result?
  • 19. The effect of applying many 'bioinformatics axes': Illumina data (FASTQ format), 2 FASTQ files, files are ~6.5 GB, 52.5 million reads total
  • 20. Remove adapters & trim: 50.1 million reads
  • 21. Align to transcriptome with Bowtie: 35.8 million reads map
  • 22. Filter for uniquely mapped reads: 31.4 million reads align uniquely
  • 23. Filter for high-quality alignments: 22.7 million reads have alignment scores of zero
  • 24. Data suitable for final analysis: reduced data from 52.5 to 22.7 million reads
  • 25. It can be helpful to know how the different steps in a workflow reduce your data
  • 28. Run this command after every step of a workflow
  • 29. Lets you see whether output files were actually created
  • 30. Lets you see whether output files contain any data
  • 31. Most recently modified files will be at the bottom of your terminal window
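Slides 3 to 5 list tools for removing adapter contamination. As a rough illustration of what those tools do, here is a minimal Python sketch of 3' adapter trimming that handles exact matches only: it cuts the read at the first full copy of the adapter or, failing that, removes the longest read suffix that equals a prefix of the adapter. Real tools such as cutadapt also tolerate sequencing errors, so treat this as a simplified sketch. The function name, the `min_overlap` default and the example adapter `AGATCGGAAGAGC` (a commonly used Illumina adapter prefix) are illustrative assumptions, not values taken from the slides.

```python
def trim_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Exact-match 3' adapter trimming (hypothetical helper, no error tolerance)."""
    # Case 1: a full copy of the adapter occurs somewhere in the read.
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Case 2: the read runs into the adapter, so only the start of the
    # adapter is visible at the 3' end. Try the longest overlap first.
    min_overlap = max(min_overlap, 1)
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[: len(seq) - k]
    # No adapter found: return the read unchanged.
    return seq


ADAPTER = "AGATCGGAAGAGC"  # assumed example adapter prefix
print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # ACGTACGT (full adapter copy)
print(trim_adapter("ACGTACGTAGATC", ADAPTER))             # ACGTACGT (partial adapter at 3' end)
```

A small `min_overlap` removes more genuine adapter fragments, but it also trims some reads whose last few bases happen to match the start of the adapter by chance.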
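Slide 6 is about trimming low-quality bases. One widely used rule, the running-sum method used by BWA's quality-trimming option and by cutadapt, scans from the 3' end, adds up `quality - threshold` for each base and cuts where that sum is lowest. The Python sketch below implements that idea; it assumes Phred+33 encoded quality strings (standard for current Illumina FASTQ), and the threshold of 20 is only an example value. Other tools on the slide use different rules (sickle, for example, uses a sliding window), so this is one possible approach rather than what every trimmer does.

```python
def quality_trim_3prime(seq: str, qual: str, threshold: int = 20, offset: int = 33):
    """Trim low-quality bases from the 3' end using the running-sum rule."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    running = 0
    lowest = 0
    cut = len(qual)  # default: keep the whole read
    for i in range(len(qual) - 1, -1, -1):
        # Decode the Phred score (assumed Phred+33) and compare it to the threshold.
        running += (ord(qual[i]) - offset) - threshold
        if running > 0:
            break  # good bases now outweigh the bad ones; stop scanning
        if running < lowest:
            lowest = running
            cut = i  # cutting here removes the most low-quality "debt" seen so far
    return seq[:cut], qual[:cut]


# 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
print(quality_trim_3prime("ACGTACGT", "IIIIII##"))  # ('ACGTAC', 'IIIIII')
```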
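Slides 7 and 8 show how many read mappers exist. To make the basic idea concrete, here is a toy exact-match mapper in Python: it indexes every k-mer of the reference, looks up the read's first k-mer as a seed and then checks the rest of the read against the reference. It is only a sketch; real mappers such as BWA or Bowtie use compressed indexes, allow mismatches and gaps, and also try the reverse complement of each read, none of which is done here. The function names and the choice of `k` are hypothetical. The number of positions returned is also what decides whether a read maps uniquely, which is what slide 9 filters on.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i : i + k]].append(i)
    return index


def map_read_exact(read: str, reference: str, index: dict, k: int) -> list:
    """Return every 0-based position where the read matches the reference exactly."""
    if len(read) < k:
        raise ValueError("read is shorter than the seed length k")
    hits = []
    # Seed: look up the read's first k bases, then verify the full read.
    for pos in index.get(read[:k], []):
        if reference[pos : pos + len(read)] == read:
            hits.append(pos)
    return hits


ref = "GATTACAGATTACA"
idx = build_kmer_index(ref, k=4)
print(map_read_exact("GATTACA", ref, idx, k=4))  # [0, 7]: maps to two places
print(map_read_exact("TACAG", ref, idx, k=4))    # [3]: maps uniquely
```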
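Slide 9 filters for uniquely mapped reads, and how "unique" is defined depends on the aligner. The Python sketch below assumes Bowtie2-style SAM output, in which an `XS:i` tag is present only when the aligner found another alignment for the read besides the one reported; it keeps header lines and mapped reads without that tag. Treat that rule as an assumption about your data: other aligners report multi-mapping differently (some use `XS:A` for strand instead, which this sketch ignores), and filtering on the MAPQ column with SAMtools is a common alternative. The function names and the two-argument command line are hypothetical, and the sketch reads plain-text SAM, not BAM.

```python
import sys


def optional_tags(fields: list) -> dict:
    """Parse SAM optional fields (column 12 onward), e.g. 'AS:i:0' -> {'AS': ('i', '0')}."""
    tags = {}
    for field in fields[11:]:
        tag, tag_type, value = field.split(":", 2)
        tags[tag] = (tag_type, value)
    return tags


def filter_unique(lines):
    """Yield SAM header lines plus mapped reads that carry no XS:i tag."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep the header unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:
            continue  # FLAG bit 0x4: read is unmapped
        xs = optional_tags(fields).get("XS")
        if xs is not None and xs[0] == "i":
            continue  # XS:i present: another alignment exists (assumed Bowtie2 semantics)
        yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python filter_unique.py input.sam unique.sam
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(filter_unique(src))
```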
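Slide 10 filters for high-quality alignments, and slide 23 shows what that meant in this example: keeping reads with an alignment score of zero. In Bowtie2's default end-to-end mode the alignment score (`AS:i` tag) is 0 when the alignment incurs no penalties (no mismatches, gaps or ambiguous bases) and negative otherwise, so a cutoff of 0 acts as a perfect-match filter. The Python sketch below assumes that tag is present; `min_score` and the other names are hypothetical, and for local alignment mode or other aligners a different cutoff would make sense.

```python
import sys


def alignment_score(fields: list):
    """Return the integer AS:i value from a split SAM line, or None if absent."""
    for field in fields[11:]:
        if field.startswith("AS:i:"):
            return int(field[5:])
    return None


def filter_by_score(lines, min_score: int = 0):
    """Yield SAM header lines plus mapped reads whose AS:i score is >= min_score."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep the header unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:
            continue  # FLAG bit 0x4: read is unmapped
        score = alignment_score(fields)
        if score is not None and score >= min_score:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python filter_by_score.py unique.sam best.sam
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(filter_by_score(src))
```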
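Slide 19 starts from 52.5 million reads in two FASTQ files. A quick way to get such numbers is to count records: in standard Illumina FASTQ each read takes exactly four lines (header, sequence, '+' line, qualities). The Python sketch below counts reads that way and opens gzip-compressed files transparently. It assumes four-line records and would give wrong answers for FASTQ files whose sequences are wrapped over several lines; the function names are hypothetical.

```python
import gzip


def count_fastq_records(lines) -> int:
    """Count reads in an iterable of FASTQ lines, assuming 4 lines per record."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; file may be truncated")
    return n_lines // 4


def count_fastq_file(path: str) -> int:
    """Count reads in a FASTQ file, decompressing on the fly if it ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_records(handle)
```

Summing `count_fastq_file` over both input files gives a total comparable to the "52.5 million reads total" on slide 19, and rerunning it on the output of each later step gives the per-step counts.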
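Slides 19 to 24 track how each step reduced the data, and slide 25 notes why that is worth knowing. The Python snippet below re-tabulates the read counts from those slides (in millions) and computes, for each step, the percentage of reads kept relative to the previous step and relative to the raw data. The counts are the ones on the slides; only the arithmetic is added.

```python
# Read counts (millions) after each step, as reported on slides 19 to 23.
STEPS = [
    ("Illumina data (FASTQ)", 52.5),
    ("Remove adapters & trim", 50.1),
    ("Align to transcriptome with Bowtie", 35.8),
    ("Filter for uniquely mapped reads", 31.4),
    ("Filter for high-quality alignments", 22.7),
]


def retention_table(steps):
    """Return (step, millions, % of previous step, % of raw reads) for each step."""
    rows = []
    raw = steps[0][1]
    previous = raw
    for name, millions in steps:
        rows.append((name, millions, 100 * millions / previous, 100 * millions / raw))
        previous = millions
    return rows


for name, millions, step_pct, total_pct in retention_table(STEPS):
    print(f"{name:<36} {millions:5.1f}M  {step_pct:5.1f}% of previous  {total_pct:5.1f}% of raw")
```

The resulting table shows that alignment (about 71.5% of trimmed reads kept) and the alignment-score filter (about 72.3% of uniquely mapped reads kept) discard the most data, leaving about 43.2% of the original reads for final analysis.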
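Slides 28 to 31 recommend running a directory-listing command after every step; the command itself is not reproduced in this part of the transcript. For readers who prefer to script the same check, here is a Python sketch that lists the regular files in a directory oldest first (so the most recently modified appear at the bottom, as on slide 31), shows their sizes, and flags zero-byte files. The function names are hypothetical. A compressed output file is never zero bytes even when it contains no reads, so the size check only catches the most obvious failures.

```python
import os
import time


def order_and_flag(entries):
    """Sort (name, size_bytes, mtime) tuples oldest first and flag empty files."""
    ordered = sorted(entries, key=lambda entry: entry[2])
    return [(name, size, mtime, size == 0) for name, size, mtime in ordered]


def scan_directory(path: str = "."):
    """Collect regular files in `path` and return them oldest first, newest last."""
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file():
                st = entry.stat()
                entries.append((entry.name, st.st_size, st.st_mtime))
    return order_and_flag(entries)


if __name__ == "__main__":
    for name, size, mtime, empty in scan_directory("."):
        stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
        flag = "  <-- EMPTY" if empty else ""
        print(f"{stamp}  {size:>12}  {name}{flag}")
```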