Today's bioinformatics lesson
is brought to you by the letter 'D'
by
Keith Bradnam
Image from flickr.com/91619273@N00/
D
is for Default parameters
D
is also for Danger!
Nobody reads a toaster manual!
But everyone would read a manual for this
Bioinformatics programs are not toasters!
Read the manual!
At least, read *some* of the manual
[Screenshot: the Bowtie home page, bowtie-bio.sourceforge.net]
Bowtie
An ultrafast, memory-efficient short read aligner
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
Latest release: Bowtie 1.1.1 (10/1/14)
How to use bowtie
bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
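As a hedged illustration of that usage line, the sketch below spells out a paired-end Bowtie run instead of leaving everything at its default. The index name, read files, and chosen values are hypothetical placeholders; the flags themselves (-q, -p, -X, --best, -S, -1, -2) are ones described in the Bowtie manual. The function only prints the command, so it can be reviewed (and recorded) before anything is run.

```bash
#!/usr/bin/env bash
# Dry-run sketch: build a Bowtie paired-end command with explicit options.
# All file names and values below are hypothetical placeholders.
bowtie_paired_cmd() {
    local index=$1 mate1=$2 mate2=$3 out=$4
    local threads=${5:-4}     # bowtie's own default for -p is 1
    local maxins=${6:-500}    # bowtie's own default for -X is 250
    printf 'bowtie -q -p %s -X %s --best -S %s -1 %s -2 %s %s\n' \
        "$threads" "$maxins" "$index" "$mate1" "$mate2" "$out"
}

bowtie_paired_cmd hg38_index reads_1.fq reads_2.fq aligned.sam
```

Printing rather than executing is a deliberate choice here: the exact command line is the thing worth checking against the manual.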
Bowtie has a lot of options!
-q  The query input files (specified either as <m1> and <m2>, or as <s>) are FASTQ files (usually having extension .fq or .fastq). This is the default. See also: --solexa-quals and --integer-quals.
-f  The query input files (specified either as <m1> and <m2>, or as <s>) are FASTA files (usually having extension .fa, .mfa, .fna or similar). All quality values are assumed to be 40 on the Phred quality scale.
-r  The query input files (specified either as <m1> and <m2>, or as <s>) are Raw files: one sequence per line, without quality values or names. All quality values are assumed to be 40 on the Phred quality scale.
-c  The query sequences are given on command line. I.e. <m1>, <m2> and <singles> are comma-separated lists of reads rather than lists of read files.
-C/--color  Align in colorspace. Read characters are interpreted as colors. The index specified must be a colorspace index (i.e. built with bowtie-build -C), or bowtie will print an error message and quit. See Colorspace alignment for more details.
-Q/--quals <files>  Comma-separated list of files containing quality values for corresponding unpaired CSFASTA reads. Use in combination with -C and -f. --integer-quals is set automatically when -Q/--quals is specified.
--Q1 <files>  Comma-separated list of files containing quality values for corresponding CSFASTA #1 mates. Use in combination with -C, -f, and -1. --integer-quals is set automatically when --Q1 is specified.
--Q2 <files>  Comma-separated list of files containing quality values for corresponding CSFASTA #2 mates. Use in combination with -C, -f, and -2. --integer-quals is set automatically when --Q2 is specified.
-s/--skip <int>  Skip (i.e. do not align) the first <int> reads or pairs in the input.
-u/--qupto <int>  Only align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped). Default: no limit.
-5/--trim5 <int>  Trim <int> bases from high-quality (left) end of each read before alignment (default: 0).
-3/--trim3 <int>  Trim <int> bases from low-quality (right) end of each read before alignment (default: 0).
--phred33-quals  Input qualities are ASCII chars equal to the Phred quality plus 33. Default: on.
--phred64-quals  Input qualities are ASCII chars equal to the Phred quality plus 64. Default: off.
--solexa-quals  Convert input qualities from Solexa (which can be negative) to Phred (which can't). This is usually the right option for use with (unconverted) reads emitted by GA Pipeline versions prior to 1.3. Default: off.
--solexa1.3-quals  Same as --phred64-quals. This is usually the right option for use with (unconverted) reads emitted by GA Pipeline version 1.3 or later. Default: off.
--integer-quals  Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40..., rather than ASCII characters, e.g., II?I.... Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified.
-k <int>  Report up to <int> valid alignments per read or pair (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment "stratum" will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
-a/--all  Report all valid alignments per read or pair (default: off). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment "stratum" will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower if -a/--all is specified. If you would like to use Bowtie with -a, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
-m <int>  Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
-M <int>  Behaves like -m except that if a read has more than <int> reportable alignments, one is reported at random. In default output mode, the selected alignment's 7th column is set to <int>+1 to indicate the read has at least <int>+1 valid alignments. In -S/--sam mode, the selected alignment is given a MAPQ (mapping quality) of 0 and the XM:I field is set to <int>+1. This option requires --best; if specified without --best, --best is enabled automatically.
--best  Make Bowtie guarantee that reported singleton alignments are "best" in terms of stratum (i.e. number of mismatches, or mismatches in the seed in the case of -n mode) and in terms of the quality values at the mismatched position(s). Stratum always trumps quality; e.g. a 1-mismatch alignment where the mismatched position has Phred quality 40 is preferred over a 2-mismatch alignment where the mismatched positions both have Phred quality 10. When --best is not specified, Bowtie may report alignments that are sub-optimal in terms of stratum and/or quality (though an effort is made to report the best alignment). --best mode also removes all strand bias. Note that --best does not affect which alignments are considered "valid" by bowtie, only which valid alignments are reported by bowtie. When --best is specified and multiple hits are allowed (via -k or -a), the alignments for a given read are guaranteed to appear in best-to-worst order in bowtie's output. bowtie is somewhat slower when --best is specified.
--strata  If many valid alignments exist and are reportable (e.g. are not disallowed via the -k option) and they fall into more than one alignment "stratum", report only those alignments that fall into the best stratum. By default, Bowtie reports all reportable alignments regardless of whether they fall into multiple strata. When --strata is specified, --best must also be specified.
-v <int>  Report alignments with at most <int> mismatches. -e and -l options are ignored and quality values have no effect on what alignments are valid. -v is mutually exclusive with -n.
-n/--seedmms <int>  Maximum number of mismatches permitted in the "seed", i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. This option is mutually exclusive with the -v option.
-e/--maqerr <int>  Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
-l/--seedlen <int>  The "seed length"; i.e., the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
--nomaqround  Maq accepts quality values in the Phred quality scale, but internally rounds values to the nearest 10, with a maximum of 30. By default, bowtie also rounds this way. --nomaqround prevents this rounding in bowtie.
-I/--minins <int>  The minimum insert size for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates. Default: 0.
-X/--maxins <int>  The maximum insert size for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates. Default: 250.
--fr/--rf/--ff  The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate1 appears upstream of the reverse complement of mate2 and the insert length constraints are met, that alignment is valid. Also, if mate2 appears upstream of the reverse complement of mate1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate1 be reverse-complemented and a downstream mate2 be forward-oriented. --ff requires both an upstream mate1 and a downstream mate2 to be forward-oriented. Default: --fr when -C (colorspace alignment) is not specified, --ff when -C is specified.
--nofw/--norc  If --nofw is specified, bowtie will not attempt to align against the forward reference strand. If --norc is specified, bowtie will not attempt to align against the reverse-complement reference strand. For paired-end reads using --fr or --rf modes, --nofw and --norc apply to the forward and reverse-complement pair orientations. I.e. specifying --nofw and --fr will only find reads in the R/F orientation where mate 2 occurs upstream of mate 1 with respect to the forward reference strand.
--maxbts  The maximum number of backtracks permitted when aligning a read in -n 2 or -n 3 mode (default: 125 without --best, 800 with --best). A "backtrack" is the introduction of a speculative substitution into the alignment. Without this limit, the default ...
-t/--time  Print the amount of wall-clock time taken by each phase.
-B/--offbase <int>  When outputting alignments in Bowtie format, consider the first base of a reference sequence to have offset <int>. This option has no effect in -S/--sam mode, since SAM mandates 1-based offsets. Default: 0.
--quiet  Print nothing besides alignments.
--refout  Write alignments to a set of files named refXXXXX.map, where XXXXX is the 0-padded index of the reference sequence aligned to. This can be a useful way to break up work for downstream analyses when dealing with, for example, large numbers of reads aligned to the assembled human genome. If <hits> is also specified, it will be ignored.
--refidx  When a reference sequence is referred to in a reported alignment, refer to it by 0-based index (its offset into the list of references that were indexed) rather than by name.
--al <filename>  Write all reads for which at least one alignment was reported to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is aligned.fq, the #1 and #2 mates that align at least once will be written to aligned_1.fq and aligned_2.fq respectively.
--un <filename>  Write all reads that could not be aligned to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within Bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is unaligned.fq, the #1 and #2 mates that fail to align will be written to unaligned_1.fq and unaligned_2.fq respectively. Unless --max is also specified, reads with a number of valid alignments exceeding the limit set with the -m option are also written to <filename>.
--max <filename>  Write all reads with a number of valid alignments exceeding the limit set with the -m option to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is max.fq, the #1 and #2 mates that exceed the -m limit will be written to max_1.fq and max_2.fq respectively. These reads are not written to the file specified with --un.
--suppress <cols>  Suppress columns of output in the default output mode. E.g. if --suppress 1,5,6 is specified, the read name, read sequence, and read quality fields will be omitted. See Default Bowtie output for field descriptions. This option is ignored if the output mode is -S/--sam.
--fullref  Print the full reference sequence name, including whitespace, in alignment output. By default bowtie prints everything up to but not including the first whitespace.
Colorspace
--snpphred <int>  When decoding colorspace alignments, use <int> as the SNP penalty. This should be set to the user's best guess of the true ratio of SNPs per base in the subject genome, converted to the Phred quality scale. E.g., if the user expects about 1 SNP every 1,000 positions, --snpphred should be set to 30 (which is also the default). To specify the fraction directly, use --snpfrac.
--snpfrac <dec>  When decoding colorspace alignments, use <dec> as the estimated ratio of SNPs per base. For best decoding results, this should be set to the user's best guess of the true ratio. bowtie internally converts the ratio to a Phred quality, and behaves as if that quality had been set via the --snpphred option. Default: 0.001.
--col-cseq  If reads are in colorspace and the default output mode is active, --col-cseq causes the reads' color sequence to appear in the read-sequence column (column 5) instead of the decoded nucleotide sequence. See the Decoding colorspace alignments section for details about decoding. This option is ignored in -S/--sam mode.
--col-cqual  If reads are in colorspace and the default output mode is active, --col-cqual causes the reads' original (color) quality sequence to appear in the quality column (column 6) instead of the decoded qualities. See the Colorspace alignment section for details about decoding. This option is ignored in -S/--sam mode.
--col-keepends  When decoding colorspace alignments, bowtie trims off a nucleotide and quality from the left and right edges of the alignment. This is because those nucleotides are supported by only one color, in contrast to the middle nucleotides which are supported by two. Specify --col-keepends to keep the extreme-end nucleotides and qualities.
SAM
-S/--sam  Print alignments in SAM format. See the SAM output section of the manual for details. To suppress all SAM headers, use --sam-nohead in addition to -S/--sam. To suppress just the @SQ headers (e.g. if the alignment is against a very large number of reference sequences), use --sam-nosq in addition to -S/--sam. bowtie does not write BAM files directly, but SAM output can be converted to BAM on the fly by piping bowtie's output to samtools view. -S/--sam is not compatible with --refout.
--mapq <int>  If an alignment is non-repetitive (according to -m, --strata and other options) set the MAPQ (mapping quality) field to this value. See the SAM Spec for details about the MAPQ field. Default: 255.
--sam-nohead  Suppress header lines (starting with @) when output is -S/--sam. This must be specified in addition to -S/--sam. --sam-nohead is ignored unless -S/--sam is also specified.
--sam-nosq  Suppress @SQ header lines when output is -S/--sam. This must be specified in addition to -S/--sam. --sam-nosq is ignored unless -S/--sam is also specified.
--sam-RG <text>  Add <text> (usually of the form TAG:VAL, e.g. ID:IL7LANE2) as a field on the @RG header line. Specify --sam-RG multiple times to set multiple fields. See the SAM Spec for details about what fields are legal. Note that, if any @RG fields are set using this option, the ID and SM fields must both be among them to make the @RG line legal according to the SAM Spec. --sam-RG is ignored unless -S/--sam is also specified.
Performance
-o/--offrate <int>  Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.
-p/--threads <int>  Launch <int> parallel search threads (default: 1). Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is fairly close to linear. This option is only available if bowtie is linked with the pthreads library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time).
--mm  Use memory-mapped I/O to load the index, rather than normal C file I/O. Memory-mapping the index allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible.
--shmem  Use shared memory to load the index, rather than normal C file I/O. Using shared memory allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not desirable. Unlike --mm, --shmem installs the index into shared memory permanently, or until the user deletes the shared memory chunks manually. See your operating system documentation for details on how to manually list and remove shared memory chunks (on Linux and Mac OS X, these commands are ipcs and ipcrm). You may also need to increase your OS's maximum shared-memory chunk size to accommodate larger indexes; see your OS documentation.
Other
--seed <int>  Use <int> as the seed for pseudo-random number generator.
--verbose  Print verbose output (for debugging).
--version  Print version information and quit.
-h/--help  Print usage information and quit.
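One default from this list that is easy to get wrong is the quality encoding: --phred33-quals is on unless told otherwise. The sketch below is a small worked example, not part of Bowtie, that decodes a single FASTQ quality character under either offset; the function name is made up for illustration.

```bash
# Decode one FASTQ quality character to a Phred score.
# offset is 33 for --phred33-quals (the default) or 64 for --phred64-quals.
phred_score() {
    local char=$1 offset=${2:-33}
    local ascii
    ascii=$(printf '%d' "'$char")   # the leading quote makes printf return the character code
    echo $(( ascii - offset ))
}

phred_score I 33   # 'I' is ASCII 73, so this prints 40
phred_score h 64   # 'h' is ASCII 104, so this prints 40
phred_score I 64   # the same 'I' read with the wrong offset prints 9
```

The last call shows why the default matters: the same character decodes to very different qualities depending on which encoding the aligner assumes.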
flickr.com/photos/dannyjackson
"I'll just use the default parameters!"
"What could go wrong?"
First, some terminology…
[Diagram labels: DNA/RNA fragment, adapter, Read 1, Read 2, 'Insert', inner-mate pair distance]
We can plot the distribution
of inner mate pair distances
[Plot: Reads mapped to transcriptome with Bowtie 2. X-axis: inner size between mapped read pairs (200 to 800)]
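A distribution like this can be approximated directly from SAM output. The sketch below is one possible way to do it, not the method used for the slide: it assumes TLEN (column 9) spans the whole fragment and that both mates have the same length as the SEQ field, and the function name is hypothetical.

```bash
# Print one inner-mate distance per properly paired read pair in a SAM stream.
# Assumes TLEN (column 9) spans the whole fragment and both mates have the
# length of the SEQ field (column 10); both are simplifying assumptions.
inner_distances() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines
        int($2 / 2) % 2 == 1 && $9 > 0 {   # FLAG bit 0x2 set, leftmost mate only
            print $9 - 2 * length($10)
        }'
}

# Tiny example record: 10-bp read, fragment length 50, so the inner distance is 30.
printf 'r1\t99\tchr1\t100\t42\t10M\t=\t140\t50\tACGTACGTAC\tIIIIIIIIII\n' | inner_distances
```

Piping the result through `sort -n | uniq -c` gives counts per distance that can be plotted with any tool.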
Notice anything unusual?
Bowtie 2 has an -X option
for 'max fragment length'
The default value is 500 bp
= 100 + 100 + 300
What happens if we
increase -X to 2000 bp?
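The arithmetic on this slide can be written out as a short sketch. With 100-bp reads, a 500-bp maximum fragment length leaves room for at most 300 bp between the mates, so pairs further apart will not align concordantly. The index and file names below are hypothetical, and the second function only prints a candidate command rather than running Bowtie 2.

```bash
# Fragment length implied by a read length and an inner-mate distance,
# assuming both reads have the same length.
fragment_length() {
    local read_len=$1 inner=$2
    echo $(( 2 * read_len + inner ))
}

fragment_length 100 300    # 100 + 100 + 300: prints 500, the default -X

# Dry run of a Bowtie 2 command with a larger limit (names are placeholders).
bowtie2_cmd() {
    local maxins=$1
    printf 'bowtie2 -p 4 -X %s -x transcriptome_index -1 reads_1.fq -2 reads_2.fq -S aligned.sam\n' "$maxins"
}

bowtie2_cmd 2000
```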
New data!
[Plot: Reads mapped to transcriptome with Bowtie 2. X-axis: inner size between mapped read pairs (0 to 800)]
Most programs will have some options
that you should consider changing
Some options from TopHat

TopHat command-line option   Meaning                                                      Default value
--num-threads                How many CPU threads to use when running TopHat              1
--min-intron-length          Minimum intron length                                        70
-r / --mate-inner-dist       Expected (mean) inner distance between mate pairs            50
--mate-std-dev               Standard deviation for the distribution on inner distances   20
You nearly always can run with more
processors/threads than the default (1)
Some options from TopHat

TopHat command-line option   Meaning                 Default value
--min-intron-length          Minimum intron length   70
This might not be suitable for non-vertebrates
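Putting the TopHat options together, a run that states each of these values explicitly might look like the sketch below. The index name, read files, and the particular numbers are hypothetical; sensible values depend on the organism and the library, and the function only prints the command for review.

```bash
# Dry-run sketch: a TopHat command that sets the four options from the table
# explicitly. Index name, read files, and values are hypothetical.
tophat_cmd() {
    local threads=$1 min_intron=$2 inner_dist=$3 inner_sd=$4
    printf 'tophat --num-threads %s --min-intron-length %s --mate-inner-dist %s --mate-std-dev %s genome_index reads_1.fq reads_2.fq\n' \
        "$threads" "$min_intron" "$inner_dist" "$inner_sd"
}

# The defaults are 1, 70, 50 and 20; the values here are illustrative only.
tophat_cmd 8 30 300 50
```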
D
is for Documentation
You should document your efforts!
You should document as you go
Lab books are good…
…but electronic lab books are more helpful
Tools like Microsoft Word
might not be future proof
Consider using plain text files
I.e. something that can be read using 'less'
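One way to make plain-text documentation a habit is a tiny helper that appends a dated note and the exact command to a notes file. This is only a sketch: the function name, the note, and the example command are hypothetical, and the demo writes to a temporary file rather than a real project file.

```bash
# Append a dated note and the exact command to a plain-text notes file.
# The command is indented by four spaces so it also reads as code in Markdown.
log_step() {
    local notes=$1 note=$2 cmd=$3
    {
        printf '\n%s: %s\n\n' "$(date +%Y-%m-%d)" "$note"
        printf '    %s\n' "$cmd"
    } >> "$notes"
}

# Demo against a temporary file (a real project would use its own notes file).
notes=$(mktemp)
log_step "$notes" "Re-ran alignment with a larger maximum fragment length" \
    "bowtie2 -p 4 -X 2000 -x transcriptome_index -1 reads_1.fq -2 reads_2.fq -S aligned.sam"
cat "$notes"
```

Because the result is plain text, it stays readable with 'less' long after the analysis is finished.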
I like to write README files in Markdown format for everything
Milk-DNase-Seq-Project: RNA-Seq Analysis
----------------------------------------

See main [README](README.md) file for more information about this project.

## Bovine RNA-seq data ##

Stored in `/share/tamu/Data/RNA-Seq/Cow/2014-10`. Looks like paired-read 100 bp data. In total 31 x 2 files, ranging from 1-3.5 GB in size. See also the `/share/tamu/Data/RNA-Seq/Cow/Metadata` directory which contains a metadata file which suggests that we have data from 15 virgin cows and 16 lactating cows.

The ultimate goal is to find genes that are differentially expressed between these two developmental stages.

These files were originally compressed with bzip2, will re-compress with gzip so that existing pipelines can work with them. And will also rename them to have fastq suffix:

```bash
cd /share/tamu/Data/RNA-Seq/Cow/2014-10
bunzip2 *.bz2
rename.pl s/txt/fastq/ *.txt
gzip *.fastq
```

## Checking barcodes in RNA-Seq data ##

Let's check on all barcodes being used. Will make some soft links to the data files

```bash
cd Analysis/Test
mkdir RNA-Seq_Barcode_check
cd RNA-Seq_Barcode_check
qlogin
bunzip2 -c ../../../Data/RNA-Seq/Cow/2014-10/*.bz2 | grep "@HWI" | sed 's/.* [12]:N:0://' | sort | uniq -c > barcodes_in_identifiers.txt
```

Unfortunately this failed due to a 'No space left on device' error. So maybe need to treat each file separately.

## Test run of Scythe and Sickle ##
Easy to output to HTML or PDF
[Screenshot: the same README rendered as HTML. The rendering continues into the next section:]
Test run of Scythe and Sickle
Unlike the DNase-Seq data, we now have paired-end data, which requires running Sickle a little differently. So first, let's do a test (using 10,000 reads from each of two paired FASTQ files):
cd /share/tamu/Analysis/Test
mkdir Paired_end_scythe_sickle_test
http://korflab.ucdavis.edu/bootcamp.md
http://korflab.ucdavis.edu/bootcamp.html
Markdown is easy to read, and converts to
useful HTML (with hyperlinks and formatting)
Title: Command-line Bootcamp
Authors: Keith Bradnam
Date: 2015-06-14
Address: Genome Center, UC Davis, Davis, CA, 95616

# Command-line Bootcamp
### Keith Bradnam
### UC Davis Genome Center
### Version 1.0 --- June 2015

<br><br><br>

><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. Please send feedback, questions, money, or abuse to <krbradnam@ucdavis.edu>

## Introduction [Introduction]

This 'bootcamp' is intended to provide the reader with a basic overview of essential Unix/Linux commands that will allow them to navigate a file system and move, copy, edit files. It will also introduce a brief overview of some 'power' commands in Unix.

## Why Unix? [Why Unix]

The [Unix operating system][Unix] has been around since 1969. Back then there was no such thing as a graphical user interface. You typed everything. It may seem archaic to use a keyboard to issue commands today, but it's much easier to automate keyboard tasks than mouse tasks. There are several variants of Unix (including [Linux][Linux]), though the differences do not matter much for most basic functions.

[Unix]: http://en.wikipedia.org/wiki/Unix
[Linux]: http://en.wikipedia.org/wiki/Linux

Increasingly, the raw output of biological research exists as _in silico_ data, usually in the form of large text files. Unix is particularly suited to working with such files and has several powerful (and flexible) commands that can process your data for you. The real strength of learning Unix is that most of these commands can be combined in an almost unlimited fashion. So if you can learn just five Unix commands, you will be able to do a lot more than just five things.

## Typeset Conventions [Typeset]

Command-line examples that you are meant to type into a terminal window will be shown ...
Command-line Bootcamp
Keith Bradnam
UC Davis Genome Center
Version 1.0 — June 2015
This work is licensed under a Creative Commons Attribution-
NonCommercial-ShareAlike 4.0 International License. Please send
feedback, questions, money, or abuse to krbradnam@ucdavis.edu
Introduction
This 'bootcamp' is intended to provide the reader with a basic overview of essential
Unix/Linux commands that will allow them to navigate a file system and move, copy,
edit files. It will also introduce a brief overview of some 'power' commands in Unix.
Why Unix?
The Unix operating system has been around since 1969. Back then there was no
such thing as a graphical user interface. You typed everything. It may seem archaic to
use a keyboard to issue commands today, but it's much easier to automate keyboard
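As a rough illustration of the Markdown-to-HTML step, the toy filter below turns Markdown headings like the ones in the source above into HTML heading tags. It is only a sketch: the function name `md_headings_to_html` is made up for this example, it handles heading levels 1 to 3 and nothing else, and a real Markdown converter does far more (links, emphasis, lists, metadata).

```shell
# Toy filter (hypothetical helper, not a real Markdown converter):
# reads Markdown on stdin and rewrites level 1-3 headings as HTML tags.
# All other lines pass through unchanged.
md_headings_to_html() {
    # [^#] after the hashes stops "###" from matching a "####" heading.
    sed -E \
        -e 's|^### *([^#].*)$|<h3>\1</h3>|' \
        -e 's|^## *([^#].*)$|<h2>\1</h2>|' \
        -e 's|^# *([^#].*)$|<h1>\1</h1>|'
}
```

It could be applied to a file with something like `md_headings_to_html < bootcamp.md`, where `bootcamp.md` is a placeholder file name.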
KorfLab/Milk-DNase-Seq-Project
Milk-DNase-Seq-Project: RNA-Seq Analysis
See main README file for more information about this project.
Bovine RNA-seq data
Stored in /share/tamu/Data/RNA-Seq/Cow/2014-10. Looks like paired-read 100 bp data. In total 31 x 2 files, ranging
from 1-3.5 GB in size. See also the /share/tamu/Data/RNA-Seq/Cow/Metadata directory which contains a metadata
file which suggests that we have data from 15 virgin cows and 16 lactating cows.
The ultimate goal is to find genes that are differentially expressed between these two developmental stages.
These files were originally compressed with bzip2, will re-compress with gzip so that existing pipelines can work with
them. And will also rename them to have fastq suffix:
cd /share/tamu/Data/RNA-Seq/Cow/2014-10
bunzip2 *.bz2
rename.pl s/txt/fastq/ *.txt
gzip *.fastq
Checking barcodes in RNA-Seq data
Let's check on all barcodes being used. Will make some soft links to the data files
cd Analysis/Test
mkdir RNA-Seq_Barcode_check
cd RNA-Seq_Barcode_check
qlogin
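The recompression step in the README excerpt above can also be done one file at a time, which avoids relying on a site-specific `rename.pl` script. This is a sketch under assumptions the README does not state: that the compressed files are named `*.txt.bz2`, and that keeping the original `.bz2` files is acceptable. The function names `fastq_gz_name` and `recompress_dir` are hypothetical.

```shell
# Map an original name such as sample_1.txt.bz2 to sample_1.fastq.gz.
fastq_gz_name() {
    printf '%s\n' "${1%.txt.bz2}.fastq.gz"
}

# Recompress every *.txt.bz2 file in the given directory from bzip2 to gzip,
# renaming .txt to .fastq on the way. The original .bz2 files are kept.
# Runs in a subshell so the caller's working directory is not changed.
recompress_dir() (
    cd "$1" || exit 1
    for f in *.txt.bz2; do
        [ -e "$f" ] || continue    # glob matched nothing: skip
        bunzip2 -c "$f" | gzip > "$(fastq_gz_name "$f")"
    done
)
```

Unlike the three commands in the README, this never writes the uncompressed text to disk, which may matter for files of several GB each.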
Sites like GitHub use Markdown
Reproducible science is important!
Reviewers increasingly want more
details regarding bioinformatics methods
Make it easy for others to follow your work
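One lightweight way to keep such details is to log every command exactly as it was run, including parameters that would otherwise be left at their defaults. A minimal sketch, assuming a POSIX shell; the helper name `log_run` and the `commands.log` file name are made up for this example.

```shell
# Hypothetical helper: append a timestamped copy of a command line to a
# log file, then run the command unchanged.
LOGFILE="${LOGFILE:-commands.log}"

log_run() {
    # "$*" joins the arguments with spaces for the log entry;
    # "$@" runs the command with its arguments preserved one by one.
    printf '%s\t%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOGFILE"
    "$@"
}
```

For example, `log_run bowtie -n 2 -l 28 -e 70 my_index reads.fq hits.map` would record the seed mismatch limit, seed length and quality ceiling (written out here at their default values) before running the aligner; `my_index`, `reads.fq` and `hits.map` are placeholder names.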
The end

This bioinformatics lesson is brought to you by the letter 'D'

  • 1. Today's bioinformatics lesson is brought to you by the letter 'D' by Keith Bradnam. Image from flickr.com/91619273@N00/
  • 2. D is for Default parameters
  • 3. D is also for Danger!
  • 4.
  • 5. [Image of a toaster manual; text not legible]
  • 6. X
  • 7. X Nobody reads a toaster manual!
  • 8.
  • 9. But everyone would read a manual for this
  • 10. Bioinformatics programs are not toasters!
  • 12. At least, read *some* of the manual
  • 13. Bowtie: an ultrafast, memory-efficient short read aligner (Johns Hopkins University)
      Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
      Recent news
      Lighter released: Lighter is an extremely fast and memory-efficient program for correcting sequencing errors in DNA sequencing data. For details on how error correction can help improve the speed and accuracy of downstream analysis tools, see the paper in Genome Biology. Source and software available at GitHub.
      1.1.1 - 10/1/2014: Fixed a compiling linkage problem related with Mac OS X Mavericks. Improved performance for cases where the reference contains many stretches of Ns. Some minor automatic tests updates.
      1.1.0 - 7/19/2014: Added support for large and small indexes, removing 4-billion-nucleotide barrier. Bowtie can now be used with reference genomes of any size. No longer releasing 32-bit binaries. Simplified manual and Makefile accordingly. Phased out CygWin support. Improved efficiency of index files loading. Fixed a bug that made bowtie-inspect fail in some situations. (This release was briefly given version number 2.0.0, but we changed it to 1.1.0 to avoid confusion with Bowtie 2.)
      1.0.1 release - 3/14/2014
      bowtie-bio.sourceforge.net
      Site map: Home, News archive, Getting started, Manual, Tools that use Bowtie
      Latest release: Bowtie 1.1.1, 10/1/14
      Please cite: Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.
      For release updates, subscribe to the mailing list.
      Related tools: Bowtie 2: Fast, accurate read alignment. Crossbow: Genotyping, cloud computing. Tophat: RNA-Seq splice junction mapper. Cufflinks: Isoform assembly, quantitation. Myrna: Cloud, differential gene expression. Lighter: Fast error correction. Other tools using Bowtie.
      Pre-built indexes: Consider using Illumina's iGenomes collection. Each iGenomes archive contains pre-built Bowtie and Bowtie 2 indexes. H. sapiens, NCBI GRCh38, 2.7 GB
  • 14. How to use bowtie: bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
  • 15. How to use bowtie: bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
  • 16. bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>] Bowtie has a lot of options!
  • 17. -q: The query input files (specified either as <m1> and <m2>, or as <s>) are FASTQ files (usually having extension .fq or .fastq). This is the default. See also: --solexa-quals and --integer-quals.
      -f: The query input files (specified either as <m1> and <m2>, or as <s>) are FASTA files (usually having extension .fa, .mfa, .fna or similar). All quality values are assumed to be 40 on the Phred quality scale.
      -r: The query input files (specified either as <m1> and <m2>, or as <s>) are Raw files: one sequence per line, without quality values or names. All quality values are assumed to be 40 on the Phred quality scale.
      -c: The query sequences are given on command line. I.e. <m1>, <m2> and <singles> are comma-separated lists of reads rather than lists of read files.
      -C/--color: Align in colorspace. Read characters are interpreted as colors. The index specified must be a colorspace index (i.e. built with bowtie-build -C), or bowtie will print an error message and quit. See Colorspace alignment for more details.
      -Q/--quals <files>: Comma-separated list of files containing quality values for corresponding unpaired CSFASTA reads. Use in combination with -C and -f. --integer-quals is set automatically when -Q/--quals is specified.
      --Q1 <files>: Comma-separated list of files containing quality values for corresponding CSFASTA #1 mates. Use in combination with -C, -f, and -1. --integer-quals is set automatically when --Q1 is specified.
      --Q2 <files>: Comma-separated list of files containing quality values for corresponding CSFASTA #2 mates. Use in combination with -C, -f, and -2. --integer-quals is set automatically when --Q2 is specified.
      -s/--skip <int>: Skip (i.e. do not align) the first <int> reads or pairs in the input.
      -u/--qupto <int>: Only align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped). Default: no limit.
      -5/--trim5 <int>: Trim <int> bases from high-quality (left) end of each read before alignment (default: 0).
      -3/--trim3 <int>: Trim <int> bases from low-quality (right) end of each read before alignment (default: 0).
      --phred33-quals: Input qualities are ASCII chars equal to the Phred quality plus 33. Default: on.
      --phred64-quals: Input qualities are ASCII chars equal to the Phred quality plus 64. Default: off.
      --solexa-quals: Convert input qualities from Solexa (which can be negative) to Phred (which can't). This is usually the right option for use with (unconverted) reads emitted by GA Pipeline versions prior to 1.3. Default: off.
      --solexa1.3-quals: Same as --phred64-quals. This is usually the right option for use with (unconverted) reads emitted by GA Pipeline version 1.3 or later. Default: off.
      --integer-quals: Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40..., rather than ASCII characters. Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified.
  • 18. -k <int>: Report up to <int> valid alignments per read or pair (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment "stratum" will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
      -a/--all: Report all valid alignments per read or pair (default: off). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment "stratum" will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower if -a/--all is specified. If you would like to use Bowtie with -a, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
      -m <int>: Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
      -M <int>: Behaves like -m except that if a read has more than <int> reportable alignments, one is reported at random. In default output mode, the selected alignment's 7th column is set to <int>+1 to indicate the read has at least <int>+1 valid alignments. In -S/--sam mode, the selected alignment is given a MAPQ (mapping quality) of 0 and the XM:I field is set to <int>+1. This option requires --best; if specified without --best, --best is enabled automatically.
      --best: Make Bowtie guarantee that reported singleton alignments are "best" in terms of stratum (i.e. number of mismatches, or mismatches in the seed in the case of -n mode) and in terms of the quality values at the mismatched position(s). Stratum always trumps quality; e.g. a 1-mismatch alignment where the mismatched position has Phred quality 40 is preferred over a 2-mismatch alignment where the mismatched positions both have Phred quality 10. When --best is not specified, Bowtie may report alignments that are sub-optimal in terms of stratum and/or quality (though an effort is made to report the best alignment). --best mode also removes all strand bias. Note that --best does not affect which alignments are considered "valid" by bowtie, only which valid alignments are reported by bowtie. When --best is specified and multiple hits are allowed (via -k or -a), the alignments for a given read are guaranteed to appear in best-to-worst order in bowtie's output. bowtie is somewhat slower when --best is specified.
      --strata: If many valid alignments exist and are reportable (e.g. are not disallowed via the -k option) and they fall into more than one alignment "stratum", report only those alignments that fall into the best stratum. By default, Bowtie reports all reportable alignments regardless of whether they fall into multiple strata. When --strata is specified, --best must also be specified.
  • 19. -v <int>: Report alignments with at most <int> mismatches. -e and -l options are ignored and quality values have no effect on what alignments are valid. -v is mutually exclusive with -n.
      -n/--seedmms <int>: Maximum number of mismatches permitted in the "seed", i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. This option is mutually exclusive with the -v option.
      -e/--maqerr <int>: Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
      -l/--seedlen <int>: The "seed length"; i.e., the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
      --nomaqround: Maq accepts quality values in the Phred quality scale, but internally rounds values to the nearest 10, with a maximum of 30. By default, bowtie also rounds this way. --nomaqround prevents this rounding in bowtie.
      -I/--minins <int>: The minimum insert size for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates. Default: 0.
      -X/--maxins <int>: The maximum insert size for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates. Default: 250.
      --fr/--rf/--ff: The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate1 appears upstream of the reverse complement of mate2 and the insert length constraints are met, that alignment is valid. Also, if mate2 appears upstream of the reverse complement of mate1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate1 be reverse-complemented and a downstream mate2 be forward-oriented. --ff requires both an upstream mate1 and a downstream mate2 to be forward-oriented. Default: --fr when -C (colorspace alignment) is not specified, --ff when -C is specified.
      --nofw/--norc: If --nofw is specified, bowtie will not attempt to align against the forward reference strand. If --norc is specified, bowtie will not attempt to align against the reverse-complement reference strand. For paired-end reads using --fr or --rf modes, --nofw and --norc apply to the forward and reverse-complement pair orientations. I.e. specifying --nofw and --fr will only find reads in the R/F orientation where mate 2 occurs upstream of mate 1 with respect to the forward reference strand.
      --maxbts: The maximum number of backtracks permitted when aligning a read in -n 2 or -n 3 mode (default: 125 without --best, 800 with --best). A "backtrack" is the introduction of a speculative substitution into the alignment. Without this limit, the default
  • 20. -t/--time: Print the amount of wall-clock time taken by each phase.
      -B/--offbase <int>: When outputting alignments in Bowtie format, consider the first base of a reference sequence to have offset <int>. This option has no effect in -S/--sam mode, since SAM mandates 1-based offsets. Default: 0.
      --quiet: Print nothing besides alignments.
      --refout: Write alignments to a set of files named refXXXXX.map, where XXXXX is the 0-padded index of the reference sequence aligned to. This can be a useful way to break up work for downstream analyses when dealing with, for example, large numbers of reads aligned to the assembled human genome. If <hits> is also specified, it will be ignored.
      --refidx: When a reference sequence is referred to in a reported alignment, refer to it by 0-based index (its offset into the list of references that were indexed) rather than by name.
      --al <filename>: Write all reads for which at least one alignment was reported to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is aligned.fq, the #1 and #2 mates that align at least once will be written to aligned_1.fq and aligned_2.fq respectively.
      --un <filename>: Write all reads that could not be aligned to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within Bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is unaligned.fq, the #1 and #2 mates that fail to align will be written to unaligned_1.fq and unaligned_2.fq respectively. Unless --max is also specified, reads with a number of valid alignments exceeding the limit set with the -m option are also written to <filename>.
      --max <filename>: Write all reads with a number of valid alignments exceeding the limit set with the -m option to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is max.fq, the #1 and #2 mates that exceed the -m limit will be written to max_1.fq and max_2.fq respectively. These reads are not written to the file specified with --un.
      --suppress <cols>: Suppress columns of output in the default output mode. E.g. if --suppress 1,5,6 is specified, the read name, read sequence, and read quality fields will be omitted. See Default Bowtie output for field descriptions. This option is ignored if the output mode is -S/--sam.
      --fullref: Print the full reference sequence name, including whitespace, in alignment output. By default bowtie prints everything up to but not including the first whitespace.
  • 21. Colorspace --snpphred <int> --snpfrac <dec> --col-cseq --col-equal --col-keepends SAM -S/--sam Whendecodingcolorspacealignments,use <int> astheSNPpenalty.Thisshouldbesetto theuser'sbestguessof thetrue ratio ofSNPsperbasein thesubjectgenome,converted to thePhredqualityscale.E.g., if theuserexpectsabout1SNPevery1,000 positions,--snpphredshouldbeset to30(whichisalsothedefault).Tospecifythefractiondirectly,use --snpfrac. Whendecodingcolorspacealignments,use<dot>astheestimatedratio ofSNPsperbase.Forbestdecodingresults, thisshould beset to theuser'sbestguessof thetrue ratio. bowtie internallyconvertsthe ratio toaPhredquality,andbehavesas if that qualityhadbeensetviathe--zinpphredoption.Default:0.001. Ifreadsareincolorspaceandthe defaultoutputmodeisactive, --col-cseq causesthereads'colorsequencetoappearinthe read-sequencecolumn(column5)instead of thedecodednucleotidesequence.SeetheDecodingcolorspacealignmentssection fordetailsaboutdecoding.Thisoptionisignoredin -s/--sammode. Ifreadsareincolorspaceandthedefaultoutputmodeisactive,--col-cguaicausesthereadsoriginal(color)qualitysequence toappearinthe qualitycolumn(column6)instead of thedecodedqualities.SeetheColorspacealignmentsectionfor details aboutdecoding.Thisoptionisignoredin-S1--sarrimode. Whendecodingcolorpsacealignments,bowtie trims offanucleotideandqualityfromthe leftandrightedgesofthealignment. Thisisbecausethosenucleotidesaresupportedbyonlyonecolor,in contrasttothemiddlenucleotideswhicharesupportedby two.Specify--col-keepends tokeeptheextreme-endnucleotidesandqualities. PrintalignmentsinSAMformat.SeetheSAMoutputsectionofthemanualfordetails.TosuppressallSAMheaders,use--sam- noheadinaddition to -S/--sam.Tosuppressjust the headers (e.g. if thealignmentisagainstaverylargenumberofreference sequences),use--sam-nosqinaddition to -S/--sam. bowtiedoesnot writeBAMfilesdirectly, butSAMoutputcanbeconvertedto BAMonthe flybypiping•DowtielSoutput tosamtools view. -Si—sarnisnotcompatiblewith --refout. 
    --mapq <int>: If an alignment is non-repetitive (according to -m, --strata and other options) set the MAPQ (mapping quality) field to this value. See the SAM Spec for details about the MAPQ field. Default: 255.
    --sam-nohead: Suppress header lines (starting with @) when output is -S/--sam. This must be specified in addition to -S/--sam. --sam-nohead is ignored unless -S/--sam is also specified.
    --sam-nosq: Suppress @SQ header lines when output is -S/--sam. This must be specified in addition to -S/--sam. --sam-nosq is ignored unless -S/--sam is also specified.
    --sam-RG <text>: Add <text> (usually of the form TAG:VAL, e.g. ID:IL7LANE2) as a field on the @RG header line. Specify --sam-RG multiple times to set multiple fields. See the SAM Spec for details about what fields are legal. Note that, if any @RG fields are set using this option, the ID and SM fields must both be among them to make the @RG line legal according to the SAM Spec. --sam-RG is ignored unless -
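The --snpphred and --snpfrac descriptions on this slide are two views of the same number, related by the usual Phred transform Q = -10 · log10(p). The following is a minimal Python sketch of that conversion; the function names are illustrative and are not part of bowtie.

```python
import math


def snp_fraction_to_phred(fraction):
    """Convert an expected SNPs-per-base ratio to the Phred scale."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    return -10 * math.log10(fraction)


def phred_to_snp_fraction(phred):
    """Convert a Phred-scaled value back to a SNPs-per-base ratio."""
    return 10 ** (-phred / 10)


# About 1 SNP every 1,000 positions corresponds to the stated default of 30.
print(round(snp_fraction_to_phred(1 / 1000)))
# And a Phred value of 30 maps back to roughly 0.001.
print(phred_to_snp_fraction(30))
```

This is why the two defaults quoted in the manual (30 and 0.001) describe the same setting.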
  • 22. Performance options: -o/--offrate <int>, -p/--threads <int>, --mm, --shmem. Other options: --seed <int>, --verbose, --version, -h/--help.
    -o/--offrate <int>: Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.
    -p/--threads <int>: Launch <int> parallel search threads (default: 1). Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is fairly close to linear. This option is only available if bowtie is linked with the pthreads library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time).
    --mm: Use memory-mapped I/O to load the index, rather than normal C file I/O. Memory-mapping the index allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible.
    --shmem: Use shared memory to load the index, rather than normal C file I/O. Using shared memory allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not desirable. Unlike --mm, --shmem installs the index into shared memory permanently, or until the user deletes the shared memory chunks manually. See your operating system documentation for details on how to manually list and remove shared memory chunks (on Linux and Mac OS X, these commands are ipcs and ipcrm). You may also need to increase your OS's maximum shared-memory chunk size to accommodate larger indexes; see your OS documentation.
    --seed <int>: Use <int> as the seed for the pseudo-random number generator.
    --verbose: Print verbose output (for debugging).
    --version: Print version information and quit.
    -h/--help: Print usage information and quit.
  • 24. "I'll just use the default parameters!"
  • 25. "What could go wrong?"
  • 26. First, some terminology… [Diagram labels: DNA/RNA fragment, adapters, Read 1, Read 2, 'Insert', inner-mate pair distance]
  • 27. We can plot the distribution of inner mate pair distances
  • 28. [Plot: Reads mapped to Transcriptome with Bowtie 2; x-axis: inner size between mapped read pairs]
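A distribution like the one plotted on this slide can be tabulated from paired-end alignments. The sketch below is one possible way to do it in Python, not the method used for the slides. It assumes tab-separated SAM records in which column 9 (TLEN) spans the outer ends of both mates, both mates have the same length, and there is no clipping; the two toy record pairs are invented for illustration.

```python
from collections import Counter


def inner_distances(sam_lines):
    """Return one inner mate-pair distance per read pair.

    Assumes TLEN (column 9) covers both mates end to end and that the two
    mates are the same length as the SEQ field (column 10) of the record.
    """
    distances = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        template_length = int(fields[8])
        if template_length <= 0:  # count each pair once, via its leftmost mate
            continue
        read_length = len(fields[9])
        distances.append(template_length - 2 * read_length)
    return distances


def histogram(distances, bin_width=50):
    """Count distances per bin, keyed by the lower edge of each bin."""
    return Counter((d // bin_width) * bin_width for d in distances)


def toy_record(name, flag, pos, mate_pos, tlen, read_length=100):
    """Build an invented SAM record for demonstration purposes only."""
    return "\t".join([name, str(flag), "tx1", str(pos), "42",
                      f"{read_length}M", "=", str(mate_pos), str(tlen),
                      "A" * read_length, "I" * read_length])


records = [
    "@HD\tVN:1.0",
    toy_record("pair1", 99, 100, 500, 500),
    toy_record("pair1", 147, 500, 100, -500),
    toy_record("pair2", 99, 100, 260, 260),
    toy_record("pair2", 147, 260, 100, -260),
]

distances = inner_distances(records)
for lower_edge, count in sorted(histogram(distances).items()):
    print(lower_edge, count)
```

The binned counts can then be passed to any plotting tool to draw the histogram.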
  • 30. Bowtie 2 has an -X option for 'max fragment length'. The default value is 500 bp = 100 + 100 + 300 (e.g. two 100 bp reads plus an inner distance of up to 300 bp). What happens if we increase -X to 2000 bp?
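The arithmetic on this slide can be made explicit. The sketch below, in Python, works out the largest inner distance that fits under a given -X value and counts how many pairs from a toy list would fit; the list of inner distances and the file names in the command are invented for illustration, and only the -x, -1, -2, -X and -S flags are taken from Bowtie 2 itself.

```python
def max_inner_distance(max_fragment_length, read_length_1, read_length_2):
    """Largest gap between the mates that still fits within -X."""
    return max_fragment_length - read_length_1 - read_length_2


def pairs_within_limit(inner_distances, max_fragment_length, read_length=100):
    """Keep the pairs whose whole fragment is no longer than -X."""
    limit = max_inner_distance(max_fragment_length, read_length, read_length)
    return [d for d in inner_distances if d <= limit]


# Invented inner distances (bp) for five read pairs.
toy_inner_distances = [150, 280, 320, 450, 900]

for max_fragment_length in (500, 2000):
    limit = max_inner_distance(max_fragment_length, 100, 100)
    kept = pairs_within_limit(toy_inner_distances, max_fragment_length)
    print(f"-X {max_fragment_length}: inner distance <= {limit} bp, "
          f"{len(kept)} of {len(toy_inner_distances)} pairs fit")

# A command line that states the non-default value explicitly.
# Index and file names are placeholders.
bowtie2_command = ["bowtie2", "-x", "transcriptome_index",
                   "-1", "reads_1.fastq", "-2", "reads_2.fastq",
                   "-X", "2000", "-S", "alignments.sam"]
print(" ".join(bowtie2_command))
```

With 100 bp reads, the default leaves room for at most 300 bp between the mates, so pairs from longer fragments cannot be reported as valid paired-end alignments until -X is raised.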
  • 31. New data! [Plot: Reads mapped to Transcriptome with Bowtie 2; x-axis: inner size between mapped read pairs]
  • 32. Most programs will have some options that you should consider changing
  • 33. Some options from TopHat
      TopHat command-line option | Meaning | Default value
      --num-threads | How many CPU threads to use when running TopHat | 1
      --min-intron-length | Minimum intron length | 70
      -r / --mate-inner-dist | Expected (mean) inner distance between mate pairs | 50
      --mate-std-dev | Standard deviation for the distribution on inner distances | 20
  • 34. You can nearly always run with more processors/threads than the default (1)
  • 35. Some options from TopHat (same table as slide 33): --min-intron-length | Minimum intron length | 70
  • 36. This might not be suitable for non-vertebrates
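One way to avoid relying on defaults silently is to write every value into the command, changed or not. The Python sketch below does this for the four TopHat options listed on slide 33, using the default values from that table. The helper name, the index and read file names, and the override values are hypothetical; suitable values depend on the organism and the library.

```python
# Defaults as listed in the table on slide 33.
TOPHAT_DEFAULTS = {
    "--num-threads": 1,
    "--min-intron-length": 70,
    "--mate-inner-dist": 50,
    "--mate-std-dev": 20,
}


def build_tophat_command(index, reads_1, reads_2, **overrides):
    """Return (command, changed), spelling out every option explicitly.

    Keyword names map onto flags, e.g. min_intron_length becomes
    --min-intron-length. Unknown names raise KeyError.
    """
    settings = dict(TOPHAT_DEFAULTS)
    for name, value in overrides.items():
        flag = "--" + name.replace("_", "-")
        if flag not in settings:
            raise KeyError(f"not one of the options in the table: {flag}")
        settings[flag] = value

    command = ["tophat"]
    for flag, value in settings.items():
        command += [flag, str(value)]
    command += [index, reads_1, reads_2]

    changed = {flag: value for flag, value in settings.items()
               if value != TOPHAT_DEFAULTS[flag]}
    return command, changed


# Placeholder inputs and arbitrary override values, for illustration only.
command, changed = build_tophat_command(
    "genome_index", "reads_1.fastq", "reads_2.fastq",
    num_threads=8, min_intron_length=30,
)
print(" ".join(command))
print("changed from default:", changed)
```

Printing the full command also gives a line that can be pasted straight into a README or a methods section.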
  • 38. You should document your efforts!
  • 39. You should document as you go
  • 41. Lab books are good…
  • 42. …but electronic lab books are more helpful
  • 45. Tools like Microsoft Word might not be future proof
  • 46. Consider using plain text files
  • 47. I.e. something that can be read using 'less'
  • 48. I like to write README files in Markdown format for everything

    Milk-DNase-Seq-Project: RNA-Seq Analysis
    ========================================

    See main [README](README.md) file for more information about this project.

    ## Bovine RNA-seq data ##

    Stored in `/share/tamu/Data/RNA-Seq/Cow/2014-10`. Looks like paired-read 100 bp data. In total 31 x 2 files, ranging from 1-3.5 GB in size. See also the `/share/tamu/Data/RNA-Seq/Cow/Metadata` directory which contains a metadata file which suggests that we have data from 15 virgin cows and 16 lactating cows.

    The ultimate goal is to find genes that are differentially expressed between these two developmental stages.

    These files were originally compressed with bzip2, will re-compress with gzip so that existing pipelines can work with them. And will also rename them to have fastq suffix:

        cd /share/tamu/Data/RNA-Seq/Cow/2014-10
        bunzip2 *.bz2
        rename.pl s/txt/fastq/ *.txt
        gzip *.fastq

    ## Checking barcodes in RNA-Seq data ##

    Let's check on all barcodes being used. Will make some soft links to the data files

        cd Analysis/Test
        mkdir RNA-Seq_Barcode_check
        cd RNA-Seq_Barcode_check
        qlogin
        bunzip2 -c ../../../Data/RNA-Seq/Cow/2014-10/*.bz2 | grep "@HWI" | sed 's/.* [12]:N:0://' | sort | uniq -c > barcodes_in_identifiers.txt

    Unfortunately this failed due to a 'No space left on device error'. So maybe need to treat each file separately.

    # Test run of Scythe and Sickle #
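Documenting as you go is easier when adding an entry takes one call. The Python sketch below appends a dated section, a short note and the commands that were run to a Markdown README, in the same shape as the example on this slide. The function name and entry layout are assumptions made for illustration, not the author's tooling, and the demonstration writes to a temporary directory with an arbitrary date.

```python
import datetime
import tempfile
from pathlib import Path


def log_step(readme_path, heading, note, commands, when=None):
    """Append one dated entry to a Markdown README.

    Commands are written as an indented (four-space) Markdown code block.
    """
    when = when or datetime.date.today()
    lines = [f"## {heading} ({when.isoformat()}) ##", "", note, ""]
    lines += [f"    {command}" for command in commands]
    lines.append("")
    with Path(readme_path).open("a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


# Demonstration in a scratch directory so no real README is touched.
with tempfile.TemporaryDirectory() as scratch:
    readme = Path(scratch) / "README.md"
    log_step(
        readme,
        "Checking barcodes in RNA-Seq data",
        "Let's check on all barcodes being used.",
        ["cd Analysis/Test", "mkdir RNA-Seq_Barcode_check"],
        when=datetime.date(2015, 6, 14),  # arbitrary date for the example
    )
    print(readme.read_text(encoding="utf-8"))
```

Because the file is plain text, it stays readable with 'less' and can still be rendered by anything that understands Markdown.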
  • 49. Easy to output to HTML or PDF
    [Image: the Markdown source of the README from slide 48 next to its rendered output. The rendered page continues with the next section:]
    Test run of Scythe and Sickle
    Unlike the DNase-Seq data, we now have paired-end data, which requires running Sickle a little differently. So first, let's do a test (using 10,000 reads from each of two paired FASTQ files):
        cd /share/tamu/Analysis/Test
        mkdir Paired_end_scythe_sickle_test
  • 50. http://korflab.ucdavis.edu/bootcamp.md http://korflab.ucdavis.edu/bootcamp.html Markdown is easy to read, and converts to useful HTML (with hyperlinks and formatting)
  • 51. [Image: Markdown source of the Command-line Bootcamp document next to its rendered HTML]

    Title: Command-line Bootcamp
    Authors: Keith Bradnam
    Date: 2015-06-14
    Address: Genome Center, UC Davis, Davis, CA, 95616

    # Command-line Bootcamp
    ### Keith Bradnam
    ### UC Davis Genome Center
    ### Version 1.0 --- June 2015

    <br><br><br>

    ><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

    Please send feedback, questions, money, or abuse to <krbradnam@ucdavis.edu>

    ## Introduction [Introduction]

    This 'bootcamp' is intended to provide the reader with a basic overview of essential Unix/Linux commands that will allow them to navigate a file system and move, copy, edit files. It will also introduce a brief overview of some 'power' commands in Unix.

    ## Why Unix? [Why Unix]

    The [Unix operating system][Unix] has been around since 1969. Back then there was no such thing as a graphical user interface. You typed everything. It may seem archaic to use a keyboard to issue commands today, but it's much easier to automate keyboard tasks than mouse tasks. There are several variants of Unix (including [Linux][Linux]), though the differences do not matter much for most basic functions.

    [Unix]: http://en.wikipedia.org/wiki/Unix
    [Linux]: http://en.wikipedia.org/wiki/Linux

    Increasingly, the raw output of biological research exists as _in silico_ data, usually in the form of large text files. Unix is particularly suited to working with such files and has several powerful (and flexible) commands that can process your data for you. The real strength of learning Unix is that most of these commands can be combined in an almost unlimited fashion. So if you can learn just five Unix commands, you will be able to do a lot more than just five things.

    ## Typeset Conventions [Typeset]

    Command-line examples that you are meant to type into a terminal window will be shown…
  • 52. [Screenshot: the README from slide 48 (README_RNA-Seq_analysis.md) rendered on GitHub in the KorfLab/Milk-DNase-Seq-Project repository]
  • 53. Sites like GitHub use Markdown
  • 54. Reproducible science is important!
  • 55. Reviewers increasingly want more details regarding bioinformatics methods
  • 56. Make it easy for others to follow your work