Rapid automatic microbial
genome annotation
using Prokka
Dr Torsten Seemann
Applied Bioinformatics and Public Health Microbiology - Wed 15 May 2013 @ 1530h - Moller Centre, Cambridge, UK
Background
We come from a land down-under
The team
● Simon Gladman
o VelvetOptimiser author, presenting Galaxy poster
● Paul Harrison
o author of Nesoni toolkit
● David Powell
o author of VAGUE, software wizard, theoretician
● Dieter Bulach
o sequence magician, closes genomes at will
... and we are recruiting.
History
[Timeline figure: 2003 to 2014]
Introduction
De novo assembly
Align reads to a reference
Process
De novo assembly
Ideally, one sequence per replicon.
Millions of short sequences (reads)
A few long sequences (contigs)
Reconstruct the original genome sequence from the sequence reads only
Annotation
Adding biological information to sequences.
ACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGA
AAAGCAGCCTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTC
CCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTG
GCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGG
ACAGAATGCCCTGCAGGAACTTCTTCTAGAAGACCTTCTCCTCCTG
CAAATAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGA
CCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCT
CTCCGTCCGTCCGTGGGCCACGGCCACCGCTTTTTTTTTTGCC
delta toxin (PubMed: 15353161)
ribosome binding site
transfer RNA Leu-(UUR)
tandem repeat: CCGT x 3
homopolymer: 10 x T
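Simple sequence features such as homopolymers and tandem repeats can be located with a few lines of code. The following is a minimal sketch in Python, not anything Prokka itself does: the function names, thresholds and the toy sequence are all assumptions made for illustration.

```python
import re

def find_homopolymers(seq, min_len=8):
    """Return (start, end, base, length) for single-base runs of at least min_len.

    Coordinates are 0-based, end exclusive.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1), m.end() - m.start())
            for m in pattern.finditer(seq)]

def find_tandem_repeats(seq, unit_len=4, min_copies=3):
    """Return (start, end, unit, copies) for back-to-back copies of a unit_len motif."""
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # a single repeated base is a homopolymer, reported separately
        hits.append((m.start(), m.end(), unit, (m.end() - m.start()) // unit_len))
    return hits

# Toy sequence built for illustration only (not the sequence on the slide).
toy = "AAG" + "CCGT" * 3 + "GGA" + "T" * 10 + "GCC"
print(find_homopolymers(toy))
print(find_tandem_repeats(toy))
```

The regular expression reports the leftmost phase of a repeat, so the unit it names can be a rotation of the one a human would pick (for example TCCG rather than CCGT), depending on the flanking bases.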
What's in an annotation?
● Location
o which sequence? chromosome 2
o where on the sequence? 100..659
o what strand? -ve
● Feature type
o what is it? protein coding gene
● Attributes
o protein product? alcohol dehydrogenase
o enzyme code? EC:1.1.1.1
o subcellular location? cytoplasm
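The three parts of an annotation (location, feature type, attributes) map naturally onto a small record type. Below is a sketch in Python that holds the example above and renders it as one nine-column GFF3 line. The class name, the attribute keys (`product`, `ec_number`, `location`), the `source` value and the sequence identifier `chromosome_2` are assumptions for illustration, and attribute values are assumed to contain no reserved characters that would need escaping.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    seqid: str          # which sequence
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    strand: str         # "+" or "-"
    ftype: str          # feature type, e.g. "CDS" or "tRNA"
    attributes: dict = field(default_factory=dict)

    def to_gff3(self, source="example"):
        """Render the feature as a single tab-separated GFF3 line."""
        attrs = ";".join(f"{key}={value}" for key, value in self.attributes.items())
        phase = "0" if self.ftype == "CDS" else "."   # phase only applies to CDS
        columns = [self.seqid, source, self.ftype, str(self.start),
                   str(self.end), ".", self.strand, phase, attrs]
        return "\t".join(columns)

adh = Feature(
    seqid="chromosome_2", start=100, end=659, strand="-", ftype="CDS",
    attributes={"product": "alcohol dehydrogenase",
                "ec_number": "1.1.1.1",
                "location": "cytoplasm"},
)
print(adh.to_gff3())
```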
Bacterial feature types
● protein coding genes
o promoter (-10, -35)
o ribosome binding site (RBS)
o coding sequence (CDS)
 signal peptide, protein domains, structure
o terminator
● non coding genes
o transfer RNA (tRNA)
o ribosomal RNA (rRNA)
o non-coding RNA (ncRNA)
● other
o repeat patterns, operons, origin of replication, ...
Automatic annotation
Key bacterial features
● tRNA
o easy to find and annotate: anti-codon
● rRNA
o easy to find and annotate: 5S, 16S, 23S
● CDS
o straightforward to find candidates
 false positives are often small ORFs
 wrong start codon
o partial genes, remnants
o pseudogenes
o assigning function is the bulk of the workload
Automatic annotation
Two strategies for identifying coding genes:
● sequence alignment
o find known protein sequences in the contigs
 transfer the annotation across
o will miss proteins not in your database
o may miss partial proteins
● ab initio gene finding
o find candidate open reading frames
 build model of ribosome binding sites
 predict coding regions
o may choose the incorrect start codon
o may miss atypical genes, overpredict small genes
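The "find candidate open reading frames" step can be sketched naively in Python. This is an illustration only and far simpler than a real gene finder: it builds no ribosome binding site model, assumes ATG as the only start codon, and always takes the first ATG after the previous stop, which is one way the wrong start codon gets chosen. All names and the `min_codons` threshold are assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_codons):
    """Yield (start, end) of ATG...stop ORFs in the three forward frames.

    Coordinates are 0-based, end exclusive, and include the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG after the last stop
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    yield (start, i + 3)
                start = None

def find_orfs(seq, min_codons=30):
    """Return sorted (start, end, strand) candidates on both strands."""
    seq = seq.upper()
    n = len(seq)
    hits = [(s, e, "+") for s, e in orfs_in_strand(seq, min_codons)]
    # Convert reverse-strand coordinates back to forward-strand coordinates.
    hits += [(n - e, n - s, "-")
             for s, e in orfs_in_strand(reverse_complement(seq), min_codons)]
    return sorted(hits)

toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```

Lowering `min_codons` shows the overprediction problem directly: short random ORFs start to appear as candidates.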
Some good existing tools
Software      Ab initio  Alignment  Availability  Speed
RAST          yes        yes        web only      12-24 hours
xBASE         yes        no         web only      >8 hours
BG7           no         yes        standalone    >10 hours
PGAAP (NCBI)  yes        yes        email / we    >1 month
Why another tool?
● Convenience
o I have sequence, just tell me what's in it, please.
● Speed
o exploit multi-core computers (aim < 15min)
● Standards compliant
o GFF3/GBK for viewing, TBL/FSA for GenBank submission
● Rich consistent trustworthy output
o /product /gene /EC_number
● Provenance
o a record of where/how/why it was annotated so
Why "Prokka" ?
● Unique in Google
● I like the letter "k"
● Easy to type
● It sounds Aussie
● Loosely fits "Prokaryotic Annotation"
● It rhymes with "Quokka"
o Australian cat-sized nocturnal marsupial herbivore
o first Aussie mammal seen by Europeans - "giant rat"
Prokka pipeline (simplified)
[Pipeline diagram]
● Input: FASTA contigs
● Aragorn -> tRNA
● RNAmmer -> rRNA
● Infernal -> ncRNA
● Prodigal -> CDS
● SignalP -> sig_peptide
● BLAST+, HMMER3 -> protein annotation, protein domains
● Databases: Rfam, Swiss, Pfam, TIGR, User
● Output: GFF3, GBK, ASN1
What can you trust?
Predicting protein function
Sequence similarity is a proxy for homology
● Sequence based (alignment)
o tools: BLAST, BLAT, FASTA, Exonerate
o databases: RefSeq, UniProt, ...
● Model based ("fuzzy sequence" matching)
o PSSM: position specific scoring matrix
 tools: RPS-BLAST, PSI-BLAST
 databases: CDD, COG, SMART
o HMM: hidden Markov models
 tools: HMMER, HHblits
 databases: Pfam, TIGRfams
Sequence databases
I'll just BLAST against the non-redundant database.
-- Anonymous
● Which one?
o nucleotide (nt) or protein (nr)
● It's actually quite redundant
o only eliminates exact matching sequences
● It's not picky
o nearly anything is admitted, garbage in garbage out
● It's too big
o searching takes too long
Hierarchical searching
● Facts
o searching against smaller databases is faster
o searching against similar sequences is faster
● Idea
o start with small set of close proteins
o advance to larger sets of more distant proteins
● Prokka
o your own custom "trusted" set (optional)
o core bacterial proteome (default)
o genus specific proteome (optional)
o whole protein HMMs: PRK clusters, TIGRfams
o protein domain HMMs: Pfam
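The hierarchical idea can be sketched as a cascade: each tier only sees the proteins that every earlier tier failed to annotate. The Python below is a toy illustration, not Prokka's code. The function name, the `fallback` label and the dictionary lookups standing in for real BLAST or HMM searches are all assumptions.

```python
def annotate_hierarchically(protein_ids, tiers, fallback="unannotated"):
    """Assign each protein the product from the first tier that matches it.

    tiers is an ordered list of (tier_name, search) pairs, smallest and most
    trusted first. search(protein_id) returns a product string or None.
    """
    results = {}
    remaining = list(protein_ids)
    for tier_name, search in tiers:
        unmatched = []
        for pid in remaining:
            product = search(pid)
            if product is None:
                unmatched.append(pid)      # pass on to the next, larger tier
            else:
                results[pid] = (product, tier_name)
        remaining = unmatched
    for pid in remaining:
        results[pid] = (fallback, None)    # nothing matched in any tier
    return results

# Stand-ins for real database searches (hypothetical data).
trusted = {"p1": "delta toxin"}
core = {"p1": "toxin", "p2": "alcohol dehydrogenase"}
tiers = [("trusted", trusted.get), ("core", core.get)]

print(annotate_hierarchically(["p1", "p2", "p3"], tiers))
```

Because the expensive, broad searches run last and only on what is left over, most of the work is done against the small databases. Keeping the tier name alongside the product also records which database each annotation came from.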
Core bacterial proteome
● Many bacterial proteins are conserved
o experimentally validated
o small number of them
o good annotations
● Prokka provides this database
o derived from UniProtKB/Swiss-Prot
o only bacterial proteins
o only accept protein existence level 1 (protein evidence) or 2 (transcript evidence)
o reject "Fragment" entries
o extract /gene /EC_number /product /db_xref
● First step gets ~50% of the genes
o BLAST+ blastp, multi-threading to use all CPUs
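The filtering rules for the core database amount to a simple predicate over database entries. A sketch in Python, assuming each entry has already been parsed into a dictionary; the field names (`kingdom`, `evidence_level`, `is_fragment`) and the example records are hypothetical and do not reflect a real parser's output.

```python
def keep_for_core(entry):
    """Apply the three filters: bacterial, evidence level 1 or 2, not a fragment."""
    return (
        entry["kingdom"] == "Bacteria"
        and entry["evidence_level"] in (1, 2)
        and not entry["is_fragment"]
    )

# Hypothetical parsed entries for illustration.
entries = [
    {"id": "A", "kingdom": "Bacteria", "evidence_level": 1, "is_fragment": False},
    {"id": "B", "kingdom": "Bacteria", "evidence_level": 3, "is_fragment": False},
    {"id": "C", "kingdom": "Eukaryota", "evidence_level": 1, "is_fragment": False},
    {"id": "D", "kingdom": "Bacteria", "evidence_level": 2, "is_fragment": True},
]

core = [e["id"] for e in entries if keep_for_core(e)]
print(core)
```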
The remainder
● Prokka has genus specific databases
o aim to capture "genus specific" naming conventions
o derived from proteins in completed genomes
o proteins are clustered and majority annotation wins
o some annotations are rubbish though
● Custom model databases
o I took COG/PRK MSAs and made HMMs
● Existing model databases
o Pfam, TIGRfams are well curated
● And if all else fails
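The "majority annotation wins" rule for a cluster of proteins can be sketched with a counter. This is an illustration under assumptions: the function name and the example product names are made up, and ties are broken by whichever product was seen first.

```python
from collections import Counter

def majority_annotation(products):
    """Return the most common product in a cluster and the fraction supporting it."""
    counts = Counter(products)
    product, votes = counts.most_common(1)[0]
    return product, votes / len(products)

# Products of the proteins in one hypothetical cluster.
cluster = ["DNA gyrase subunit A", "DNA gyrase subunit A", "gyrA protein"]
print(majority_annotation(cluster))
```

Reporting the supporting fraction alongside the winner makes it easier to spot clusters where the majority is weak and the annotation may be rubbish.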
Provenance
Provenance
Recording where an annotation came from.
Prokka uses GenBank "evidence qualifier" tags:
Wet lab
/experiment="EXISTENCE:Northern blot"
Dry lab
/inference="similar to DNA sequence:INSD:AACN010222672.1"
/inference="profile:tRNAscan:2.1"
/inference="protein motif:InterPro:IPR001900"
/inference="ab initio prediction:Glimmer:3.0"
Example from Prokka
Feature Type:
tRNA
Location:
contig000341 @ 655..730 +
Attributes:
/gene="tRNA-Leu(UUR)"
/anticodon=(pos:678..680,aa:Leu)
/product="transfer RNA-Leu(UUR)"
/inference="profile:Aragorn:1.2"
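The dry-lab qualifiers above follow a "category:program:version" pattern, so recording provenance is mostly a matter of carrying the tool name and version along with each prediction. A small Python sketch; the helper name is an assumption, and the values are taken from the examples above.

```python
def inference_qualifier(category, program, version):
    """Build an /inference qualifier of the form category:program:version."""
    return f'/inference="{category}:{program}:{version}"'

print(inference_qualifier("profile", "Aragorn", "1.2"))
print(inference_qualifier("ab initio prediction", "Glimmer", "3.0"))
```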
Software quality
Software goals
● Follow basic conventions
o "prokka" should say something helpful
o "prokka -h" or "prokka --help" should show help
● All options should be optional
o "prokka contigs.fa" should do something useful
● Fail gracefully
o check your dependencies exist
o produce useful error messages
o generate a log file (provenance!)
● Use standard input and output file formats
o or at least tab-separated values if you insist...
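Checking dependencies up front and logging what was found is straightforward to sketch. The Python below is an illustration of the "fail gracefully" goal, not Prokka's own code. The list of executables and the log file name are assumptions, and the `which` parameter exists only so the check can be exercised without the tools installed.

```python
import logging
import shutil

REQUIRED_TOOLS = ["aragorn", "prodigal", "blastp", "hmmscan"]  # illustrative list

def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]

def check_dependencies(tools, log, which=shutil.which):
    """Log one clear message per missing tool; return True if all are present."""
    missing = missing_tools(tools, which)
    for tool in missing:
        log.error("Cannot find '%s' on PATH. Please install it and try again.", tool)
    if not missing:
        log.info("All %d dependencies found.", len(tools))
    return not missing

if __name__ == "__main__":
    # Writing the log to a file keeps a record of the run (provenance).
    logging.basicConfig(filename="run.log", level=logging.INFO)
    ok = check_dependencies(REQUIRED_TOOLS, logging.getLogger("annotate"))
    raise SystemExit(0 if ok else 1)
```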
Prokka in context
● Prokka is not
o particularly original
o technically or algorithmically significant
o foolproof to install, because of some dependencies
o for everyone
BUT
● Prokka
o is an ongoing project which will only improve :-)
o checks it will run properly before wasting your time
o does what it claims, and does it quickly
o is being used widely
Conclusions
Prokka in the wild
● Pathogen Informatics @ Sanger UK
o Andrew Page
o 50,000 draft genomes in 2 weeks (24 sec each!)
● Austin Hospital @ Melbourne AU
o Ben Howden - Dept Infectious Diseases
o assembly & annotation of MiSeq clinical isolates
● VBC @ Monash AU
o assemble & annotate all of SRA
● Many more
o Public Health Agency of Canada
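The "24 sec each" figure for the Sanger run follows from dividing the elapsed time by the number of genomes. A quick check in Python. Note the assumption: this is average wall-clock time per genome across the whole run, so if genomes were processed in parallel, each individual annotation could have taken longer than this.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600

def seconds_per_genome(genomes, weeks):
    """Average elapsed seconds per genome over the whole run."""
    return weeks * SECONDS_PER_WEEK / genomes

print(round(seconds_per_genome(50_000, 2), 1))
```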
Planned features
● Modularity
o alternative sub-tools, e.g. Aragorn vs tRNAscan-SE
o every sub-system should be optional
o facilitate entry into Galaxy Toolshed
● Better support for
o metagenome assemblies, viruses and archaea
o broken genes, pseudogenes, assembly breakpoints
● Faster
o smaller core databases
o better parallelisation and less disk I/O
● Prokka-Web
o web server version currently in beta testing
Acknowledgements
● Organisers
o Conference committee - for inviting me
o Wellcome Trust - Laura Hubbard
● Original Prokka testers
o Simon Gladman & Dieter Bulach (internal)
o Tim Stinear & Scott Chandry (external)
● Funding
o VLSCI / LSCC
o Monash University
● Family
o Naomi, Oskar, Zoe - for tolerating my absences
Contact
Email torsten.seemann@monash.edu
Twitter @torstenseemann
Blog TheGenomeFactory.blogspot.com
Web www.bioinformatics.net.au
Thank you.

Sequencing your poo with a usb stick - Linux.conf.au 2016 miniconf - mon 1 ...
 
De novo genome assembly - IMB Winter School - 7 July 2015
De novo genome assembly - IMB Winter School - 7 July 2015De novo genome assembly - IMB Winter School - 7 July 2015
De novo genome assembly - IMB Winter School - 7 July 2015
 
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
 
Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015
Long read sequencing -  WEHI  bioinformatics seminar - tue 16 june 2015Long read sequencing -  WEHI  bioinformatics seminar - tue 16 june 2015
Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015
 
Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012
Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012
Cleaning illumina reads - LSCC Lab Meeting - Fri 23 Nov 2012
 
Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...
Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...
Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...
 
Long read sequencing - LSCC lab talk - fri 5 june 2015
Long read sequencing - LSCC lab talk - fri 5 june 2015Long read sequencing - LSCC lab talk - fri 5 june 2015
Long read sequencing - LSCC lab talk - fri 5 june 2015
 
Assembling NGS Data - IMB Winter School - 3 July 2012
Assembling NGS Data - IMB Winter School - 3 July 2012Assembling NGS Data - IMB Winter School - 3 July 2012
Assembling NGS Data - IMB Winter School - 3 July 2012
 
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...
Parallel computing in bioinformatics   t.seemann - balti bioinformatics - wed...Parallel computing in bioinformatics   t.seemann - balti bioinformatics - wed...
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...
 

Recently uploaded

Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.Cherry
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 
Cot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNACot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNACherry
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxCherry
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIADr. TATHAGAT KHOBRAGADE
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsSérgio Sacani
 
Site specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdfSite specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdfCherry
 
Plasmid: types, structure and functions.
Plasmid: types, structure and functions.Plasmid: types, structure and functions.
Plasmid: types, structure and functions.Cherry
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.Cherry
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Cherry
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cherry
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 
Concept of gene and Complementation test.pdf
Concept of gene and Complementation test.pdfConcept of gene and Complementation test.pdf
Concept of gene and Complementation test.pdfCherry
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 

Recently uploaded (20)

Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Cot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNACot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNA
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptx
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Site specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdfSite specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdf
 
Plasmid: types, structure and functions.
Plasmid: types, structure and functions.Plasmid: types, structure and functions.
Plasmid: types, structure and functions.
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Concept of gene and Complementation test.pdf
Concept of gene and Complementation test.pdfConcept of gene and Complementation test.pdf
Concept of gene and Complementation test.pdf
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 

Prokka - rapid bacterial genome annotation - ABPHM 2013

  • 1. Rapid automatic microbial genome annotation using Prokka Dr Torsten Seemann Applied Bioinformatics and Public Health Microbiology - Wed 15 May 2013 @ 1530h - Moller Centre, Cambridge, UK
  • 3. We come from a land down-under
  • 4. The team ● Simon Gladman o VelvetOptimiser author, presenting Galaxy poster ● Paul Harrison o author of Nesoni toolkit ● David Powell o author of VAGUE, software wizard, theoretician ● Dieter Bulach o sequence magician, closes genomes at will ... and we are recruiting.
  • 5. History [timeline figure: 2003 to 2014]
  • 7. De novo assembly Align reads to a reference Process
  • 8. De novo assembly Ideally, one sequence per replicon. Millions of short sequences (reads) A few long sequences (contigs) Reconstruct the original genome sequence from the sequence reads only
  • 9. Annotation Adding biological information to sequences. ACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGA AAAGCAGCCTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTC CCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTG GCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGG ACAGAATGCCCTGCAGGAACTTCTTCTAGAAGACCTTCTCCTCCTG CAAATAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGA CCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCT CTCCGTCCGTCCGTGGGCCACGGCCACCGCTTTTTTTTTTGCC delta toxin PubMed: 15353161 ribosome binding site transfer RNA Leu-(UUR) tandem repeat CCGT x 3 homopolymer 10 x T
  • 10. What's in an annotation? ● Location o which sequence? chromosome 2 o where on the sequence? 100..659 o what strand? -ve ● Feature type o what is it? protein coding gene ● Attributes o protein product? alcohol dehydrogenase o enzyme code? EC:1.1.1.1 o subcellular location? cytoplasm
  • 11. Bacterial feature types ● protein coding genes o promoter (-10, -35) o ribosome binding site (RBS) o coding sequence (CDS)  signal peptide, protein domains, structure o terminator ● non coding genes o transfer RNA (tRNA) o ribosomal RNA (rRNA) o non-coding RNA (ncRNA) ● other o repeat patterns, operons, origin of replication, ...
  • 13. Key bacterial features ● tRNA o easy to find and annotate: anti-codon ● rRNA o easy to find and annotate: 5S, 16S, 23S ● CDS o straightforward to find candidates  false positives are often small ORFs  wrong start codon o partial genes, remnants o pseudogenes o assigning function is the bulk of the workload
  • 14. Automatic annotation Two strategies for identifying coding genes: ● sequence alignment o find known protein sequences in the contigs  transfer the annotation across o will miss proteins not in your database o may miss partial proteins ● ab initio gene finding o find candidate open reading frames  build model of ribosome binding sites  predict coding regions o may choose the incorrect start codon o may miss atypical genes, overpredict small genes
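The "find candidate open reading frames" step can be illustrated with a deliberately naive sketch. This is not how Prodigal or any real gene finder works: it scans only the forward strand, accepts only ATG as a start codon, and always takes the first ATG in a frame, which is one way a tool can end up choosing the wrong start codon. The function name and the `min_codons` threshold are assumptions made for this example.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based and half-open; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first in-frame ATG opens the candidate
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next candidate in this frame
    return orfs


toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```

Lowering `min_codons` returns more candidates, including the small ORFs that the slide flags as frequent false positives.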
  • 15. Some good existing tools

      Software       ab initio   alignment   Availability   Speed
      RAST           yes         yes         web only       12-24 hours
      xBASE          yes         no          web only       >8 hours
      BG7            no          yes         standalone     >10 hours
      PGAAP (NCBI)   yes         yes         email / we     >1 month
  • 16. Why another tool? ● Convenience o I have sequence, just tell me what's in it, please. ● Speed o exploit multi-core computers (aim < 15 min) ● Standards compliant o GFF3/GBK for viewing, TBL/FSA for GenBank submission ● Rich consistent trustworthy output o /product /gene /EC_number ● Provenance o a record of where/how/why it was annotated so
  • 17. Why "Prokka" ? ● Unique in Google ● I like the letter "k" ● Easy to type ● It sounds Aussie ● Loosely fits "Prokaryotic Annotation" ● It rhymes with "Quokka" o Australian cat-sized nocturnal marsupial herbivore o first Aussie mammal seen by Europeans - "giant rat"
  • 18. Prokka pipeline (simplified) [diagram] ● Input: FASTA contigs ● Features: tRNA, rRNA, ncRNA, CDS, sig_peptide, protein domains, protein annotation ● Tools: Aragorn, RNAmmer, Infernal, Prodigal, SignalP, HMMER3, BLAST+ ● Databases: Rfam, Swiss, Pfam, TIGR, User ● Output: GFF3, GBK, ASN1
  • 19. What can you trust?
  • 20. Predicting protein function Sequence similarity is a proxy for homology ● Sequence based (alignment) o tools: BLAST, BLAT, FASTA, Exonerate o databases: RefSeq, Uniprot, ... ● Model based ("fuzzy sequence" matching) o PSSM: position specific scoring matrix  tools: RPS-BLAST, Psi-BLAST  databases: CDD, COG, Smart o HMM: hidden Markov models  tools: HMMER, HHblits  databases: Pfam, TIGRfams
  • 21. Sequence databases I'll just BLAST against the non-redundant database. -- Anonymous ● Which one? o nucleotide (nt) or protein (nr) ● It's actually quite redundant o only eliminates exact matching sequences ● It's not picky o nearly anything is admitted, garbage in garbage out ● It's too big o searching takes too long
  • 22. Hierarchical searching ● Facts o searching against smaller databases is faster o searching against similar sequences is faster ● Idea o start with small set of close proteins o advance to larger sets of more distant proteins ● Prokka o your own custom "trusted" set (optional) o core bacterial proteome (default) o genus specific proteome (optional) o whole protein HMMs: PRK clusters, TIGRfams o protein domain HMMs: Pfam
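The hierarchical idea can be sketched as a loop over databases ordered from small and close to large and distant, stopping at the first hit. This is a minimal illustration and not Prokka's code: each tier here is a plain dictionary lookup standing in for a real BLAST or HMM search, and the tier names, protein IDs and products are invented for the example.

```python
def annotate(protein_id, tiers):
    """Return (product, tier_name) from the first tier that has a hit.

    tiers is an ordered list of (name, lookup) pairs, where lookup(protein_id)
    returns a product string or None. Returns (None, None) if nothing matches.
    """
    for name, lookup in tiers:
        product = lookup(protein_id)
        if product is not None:
            return product, name  # stop early: later, larger tiers are never searched
    return None, None


# Toy stand-ins for the search tiers, smallest and most trusted first.
trusted = {}
core = {"p1": "alcohol dehydrogenase"}
domains = {"p1": "zinc-binding domain protein", "p2": "ABC transporter domain protein"}

tiers = [("trusted", trusted.get), ("core", core.get), ("domains", domains.get)]

print(annotate("p1", tiers))
print(annotate("p2", tiers))
print(annotate("p3", tiers))
```

Because the loop returns at the first hit, most proteins never reach the slower, larger tiers, which is where the speed gain comes from.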
  • 23. Core bacterial proteome ● Many bacterial proteins are conserved o experimentally validated o small number of them o good annotations ● Prokka provides this database o derived from UniProt-Swissprot o only bacterial proteins o only accept evidence level 1 (aa) or 2 (RNA) o reject "Fragment" entries o extract /gene /EC_number /product /db_xref ● First step gets ~50% of the genes o BLAST+ blastp, multi-threading to use all CPUs
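The filtering rules listed above can be written as a single predicate. This is only a sketch under stated assumptions: the dictionary keys (`taxon`, `evidence_level`, `is_fragment`) and the sample entries are hypothetical and do not reflect the actual UniProt file format or Prokka's database build scripts.

```python
def keep_for_core_db(entry):
    """Apply the slide's rules: bacterial, evidence level 1 or 2, not a fragment."""
    return (
        entry["taxon"] == "Bacteria"
        and entry["evidence_level"] in (1, 2)
        and not entry["is_fragment"]
    )


# Invented example entries; the field names are assumptions for illustration.
entries = [
    {"id": "A", "taxon": "Bacteria", "evidence_level": 1, "is_fragment": False},
    {"id": "B", "taxon": "Bacteria", "evidence_level": 3, "is_fragment": False},
    {"id": "C", "taxon": "Eukaryota", "evidence_level": 1, "is_fragment": False},
    {"id": "D", "taxon": "Bacteria", "evidence_level": 2, "is_fragment": True},
]

core = [e["id"] for e in entries if keep_for_core_db(e)]
print(core)
```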
  • 24. The remainder ● Prokka has genus specific databases o aim to capture "genus specific" naming conventions o derived from proteins in completed genomes o proteins are clustered and majority annotation wins o some annotations are rubbish though ● Custom model databases o I took COG/PRK MSAs and made HMMs ● Existing model databases o Pfam, TIGRfams are well curated ● And if all else fails
  • 26. Provenance Recording where an annotation came from. Prokka uses Genbank "evidence qualifier" tags: Wet lab /experiment="EXISTENCE:Northern blot" Dry lab /inference="similar to DNA sequence:INSD:AACN010222672.1" /inference="profile:tRNAscan:2.1" /inference="protein motif:InterPro:IPR001900" /inference="ab initio prediction:Glimmer:3.0"
  • 27. Example from Prokka Feature Type: tRNA Location: contig000341 @ 655..730 + Attributes: /gene="tRNA-Leu(UUR)" /anticodon=(pos:678..680,aa:Leu) /product="transfer RNA-Leu(UUR)" /inference="profile:Aragorn:1.2"
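As an illustration, the tRNA feature above can be rendered as one nine-column GFF3 line with the provenance carried in the attributes column. This is a sketch rather than Prokka's actual writer: placing the tool name in the source column and the exact attribute keys are assumptions, and the `/anticodon` qualifier is omitted for brevity.

```python
def gff3_escape(value):
    """Percent-encode the characters GFF3 reserves in the attributes column."""
    for ch in "%;=&,\t\n":  # "%" must be handled first
        value = value.replace(ch, "%{:02X}".format(ord(ch)))
    return value


def gff3_line(seqid, source, ftype, start, end, strand, attributes):
    """Build one tab-separated GFF3 line; score and phase are left as '.'."""
    attr_text = ";".join(
        "{}={}".format(key, gff3_escape(val)) for key, val in attributes.items()
    )
    fields = [seqid, source, ftype, str(start), str(end), ".", strand, ".", attr_text]
    return "\t".join(fields)


line = gff3_line(
    "contig000341", "Aragorn:1.2", "tRNA", 655, 730, "+",
    {
        "gene": "tRNA-Leu(UUR)",
        "product": "transfer RNA-Leu(UUR)",
        "inference": "profile:Aragorn:1.2",
    },
)
print(line)
```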
  • 29. Software goals ● Follow basic conventions o "prokka" should say something helpful o "prokka -h" or "prokka --help" should show help ● All options should be optional o "prokka contigs.fa" should do something useful ● Fail gracefully o check your dependencies exist o produce useful error messages o generate a log file (provenance!) ● Use standard input and output file formats o or at least tab-separated values if you insist...
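The "check your dependencies exist" goal can be sketched with the standard library alone. The executable names below are assumptions about how the tools named in the pipeline might be installed, and this is not Prokka's own startup check.

```python
import shutil
import sys


def missing_tools(names):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust to match the local installation.
    required = ["aragorn", "prodigal", "blastp", "hmmscan"]
    missing = missing_tools(required)
    if missing:
        # Report every missing tool at once instead of failing on the first.
        print("Missing dependencies: " + ", ".join(missing), file=sys.stderr)
    else:
        print("All dependencies found.")
```

Running the check before any analysis starts means a missing tool is reported in seconds rather than partway through a long job.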
  • 30. Prokka in context ● Prokka is not o particularly original o technically or algorithmically significant o foolproof to install some dependencies o for everyone BUT ● Prokka o is an ongoing project which will only improve :-) o checks it will run properly before wasting your time o does what it claims, and does it quickly o is being used widely
  • 32. Prokka in the wild ● Pathogen Informatics @ Sanger UK o Andrew Page o 50,000 draft genomes in 2 weeks (24 sec each!) ● Austin Hospital @ Melbourne AU o Ben Howden - Dept Infectious Diseases o assembly & annotation of MiSeq clinical isolates ● VBC @ Monash AU o assemble & annotate all of SRA ● Many more o Public Health Agency of Canada
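A quick arithmetic check of the Sanger figure, assuming the quoted 24 seconds is the effective time per genome with genomes finishing one after another:

```python
genomes = 50_000
seconds_each = 24
seconds_per_day = 24 * 60 * 60

total_seconds = genomes * seconds_each
total_days = total_seconds / seconds_per_day

print(total_seconds)         # total time in seconds
print(round(total_days, 1))  # total time in days
```

The total is 1,200,000 seconds, a little under 14 days, which matches the "2 weeks" on the slide.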
  • 33. Planned features ● Modularity o alternate sub-tools eg. Aragorn vs tRNAscan-SE o every sub-system should be optional o facilitate entry into Galaxy Toolshed ● Better support for o metagenome assemblies, viruses and archaea o broken genes, pseudogenes, assembly breakpoints ● Faster o smaller core databases o better parallelisation and less disk i/o ● Prokka-Web o web server version currently in beta testing
  • 34. Acknowledgements ● Organisers o Conference committee - for inviting me o Wellcome Trust - Laura Hubbard ● Original Prokka testers o Simon Gladman & Dieter Bulach (internal) o Tim Stinear & Scott Chandry (external) ● Funding o VLSCI / LSCC o Monash University ● Family o Naomi, Oskar, Zoe - for tolerating my absences
  • 35. Contact Email torsten.seemann@monash.edu Twitter @torstenseemann Blog TheGenomeFactory.blogspot.com Web www.bioinformatics.net.au