Mapping to assign reads to genes
Joachim Jacob
20 and 27 January 2014

This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to
http://www.bits.vib.be/ if you use this presentation or parts hereof.
Goal
Assign reads to genes.
The result of the mapping will be used to
construct a summary of the counts: the
count table.

GeneA: 12
GeneB: 5
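
As a minimal sketch of what the count table is, assume the mapping step has already produced one gene name per read (the list `reads_to_genes` below is made-up illustration data, with `None` for reads that could not be assigned). Counting the names gives the table:

```python
from collections import Counter

def count_table(assignments):
    """Return {gene: number of reads assigned to it}, skipping unassigned reads."""
    return Counter(gene for gene in assignments if gene is not None)

# Hypothetical per-read assignments; None marks a read not assigned to any gene.
reads_to_genes = ["GeneA"] * 12 + ["GeneB"] * 5 + [None] * 3

table = count_table(reads_to_genes)
for gene, n_reads in sorted(table.items()):
    print(f"{gene}: {n_reads}")
```

In a real analysis the assignments come from the alignment plus the annotation, not from a hand-written list.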
2 scenarios
Reference genome sequence available
NO reference genome sequence available
● De novo assembly of the reads into a transcriptome
(Trinity)
● Map the reads to the assembly
(RSEM)
● Extract the count table
(Note: no removal of polyA is required.
Computationally expensive!)
Reference genome sequence available
Preprocessed reads are mapped to the
reference sequence:
1. The reference is a haploid sequence, while the sample contains a mixture of alleles. This leads to mismatches.

[Figure: 35 → for the 2 alleles together]

If we compare samples within the same specimen, this effect is similar for all samples.

2. Reads contain sequencing errors
3. Reads derived from mRNA, genome is DNA.
mRNA reads: some reads span introns
● Reads are derived from mRNA (in this example, one isoform).

[Figure: a gene with its exons and introns, and the spliced mRNA]

● Many reads span introns: they need to be aligned with gaps. This can be used to detect intron-exon junctions.

http://www.ensembl.org
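
In the SAM format, a gap over an intron is written as an `N` operation in the CIGAR string of the alignment. The sketch below, with a hypothetical helper name `introns_from_cigar`, shows how the skipped reference intervals (candidate introns) could be read back from a mapping position and a CIGAR string. It is only an illustration of the idea, not what any particular aligner does internally.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume positions on the reference sequence.
REF_CONSUMING = set("MDN=X")

def introns_from_cigar(pos, cigar):
    """Return 1-based inclusive (start, end) reference intervals skipped by 'N'.

    pos is the 1-based leftmost mapping position (column 4 of a SAM record).
    """
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns

# A read starting at position 100: 30 bases aligned, a 500 bp gap, 20 bases aligned.
print(introns_from_cigar(100, "30M500N20M"))
```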
mRNA reads: multiple isoforms exist
● Isoforms are transcribed at different levels, contributing differently to the number of reads.

http://www.ensembl.org
Algorithm: gapped read mapping
● Exon-first approach: TopHat (popular)

A junction database is constructed to try to map the reads that remained unmapped.

TopHat: discovering splice junctions with RNA-Seq. Bioinformatics, Vol. 25 no. 9 2009, pages 1105–1111. doi:10.1093/bioinformatics/btp120
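
The exon-first idea can be sketched as a toy two-pass procedure: first try to place each read contiguously on the genome, then try the leftover reads against sequences built by joining exons. The code below uses exact string matching and takes the exon coordinates as given. Both are simplifying assumptions for illustration, and this is not TopHat's actual implementation.

```python
def exon_first_map(reads, genome, exons):
    """Toy two-pass mapping by exact string matching.

    genome: reference sequence (str)
    exons:  sorted list of 0-based half-open (start, end) exon coordinates
    Returns {read: ("unspliced", pos) | ("junction", (donor_end, acceptor_start)) | None}
    """
    result = {}
    unmapped = []
    # Pass 1: contiguous placement on the genome.
    for read in reads:
        pos = genome.find(read)
        if pos >= 0:
            result[read] = ("unspliced", pos)
        else:
            unmapped.append(read)
    # Junction database: each exon joined directly to every downstream exon.
    junctions = {}
    for i, (s1, e1) in enumerate(exons):
        for s2, e2 in exons[i + 1:]:
            junctions[(e1, s2)] = (genome[s1:e1], genome[s2:e2])
    # Pass 2: try the unmapped reads against the junction sequences.
    for read in unmapped:
        result[read] = None
        for key, (left, right) in junctions.items():
            hit = (left + right).find(read)
            # Accept only hits that really cross the exon-exon boundary.
            if hit >= 0 and hit < len(left) < hit + len(read):
                result[read] = ("junction", key)
                break
    return result

# Made-up genome: exon 1, a 10-base intron, exon 2.
genome = "AAACCCGGG" + "TTTTTTTTTT" + "GATTACAGA"
exons = [(0, 9), (19, 28)]
mapped = exon_first_map(["ACCCGG", "CGGGGATT", "CCCCCC"], genome, exons)
for read, where in mapped.items():
    print(read, where)
```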
Principle of gapped read mapping
● STAR: fast and suited for longer reads

STAR: ultrafast universal RNA-seq aligner
Alexander Dobin et al. Bioinformatics
Checklist for mapping to reference genome
1. A reference genome sequence (fasta),
to be indexed by the alignment software.
2. A genome annotation file (GFF3 or
GTF), with indication of currently known
annotations (optional, but highly
recommended)
3. The cleaned (preprocessed) reads (fastq)
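
To show where the three checklist items end up, here is a sketch that only builds (and does not run) command lines for STAR, one of the mappers named above. The file and directory names are hypothetical placeholders, and the options should be checked against the STAR manual for the installed version before use.

```python
def star_index_cmd(fasta, gtf, genome_dir, read_length=100, threads=4):
    """Build the indexing command from checklist items 1 (fasta) and 2 (GTF)."""
    return ["STAR", "--runMode", "genomeGenerate",
            "--genomeDir", genome_dir,
            "--genomeFastaFiles", fasta,
            "--sjdbGTFfile", gtf,
            "--sjdbOverhang", str(read_length - 1),
            "--runThreadN", str(threads)]

def star_map_cmd(fastq_files, genome_dir, prefix, threads=4):
    """Build the mapping command from checklist item 3 (the cleaned reads)."""
    return ["STAR", "--genomeDir", genome_dir,
            "--readFilesIn", *fastq_files,
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", prefix,
            "--runThreadN", str(threads)]

# Hypothetical file names, for illustration only.
index_cmd = star_index_cmd("genome.fa", "genes.gtf", "star_index")
map_cmd = star_map_cmd(["sample_1.fastq", "sample_2.fastq"], "star_index", "sample_")
print(" ".join(index_cmd))
print(" ".join(map_cmd))
```

The lists can be passed to `subprocess.run` once the paths point at real files.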
Getting your reference genome sequence
● Genomes to be used by TopHat can be fetched from iGenomes, and for STAR here.
● If your genome is not listed above, check http://ensembl.org and http://ensemblgenomes.org, and follow the indexing instructions of your alignment software.
● If still no luck, try a specialized species website, e.g.
Indexing a genome
● Mapping reads is fairly fast because the heavy lifting is done beforehand: the reference genome sequence is preprocessed by indexing, which takes a lot of time.
● On Galaxy, the indexing has already been performed for you. Just choose your genome from the list.
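
Why indexing pays off can be illustrated with a toy k-mer index: all positions of every k-mer in the genome are stored once, so that each read only needs a dictionary lookup followed by a short verification. This is a deliberately simple stand-in. Real aligners use more compact index structures, and the function names here are made up for the example.

```python
from collections import defaultdict

def build_index(genome, k):
    """Slow step, done once: record every position of every k-mer."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index

def map_read(read, genome, index, k):
    """Fast step, done per read: look up the first k-mer, then verify the rest."""
    hits = []
    for start in index.get(read[:k], []):
        if genome[start:start + len(read)] == read:
            hits.append(start)
    return hits

genome = "ACGTACGTTAGC"  # made-up reference
k = 4
index = build_index(genome, k)
print(map_read("ACGTT", genome, index, k))  # one location
print(map_read("ACGT", genome, index, k))   # two locations: a 'multi read'
```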
Using genome annotation information
● Annotation info is stored in text files formatted as GTF or GFF3.
● If sequencing is deep enough, the complete transcriptome structure can be derived from the mapping: splice junctions, isoforms, variants, ... Cufflinks, for example, reconstructs the annotation from an alignment and generates a GTF file that can be used downstream. Potentially novel transcripts are included in this file. But remember, this is NOT OUR GOAL.
● We will use a GTF file from a respected genome database to assist the mapping of reads.

http://cufflinks.cbcb.umd.edu/
Using genome annotation information
GTF example
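
A GTF record is one line with nine tab-separated columns: sequence name, source, feature, start, end, score, strand, frame and attributes. The sketch below parses one such line. The example record is invented for illustration, and the simple split on `;` assumes that attribute values contain no semicolons.

```python
def parse_gtf_line(line):
    """Parse one GTF record into a dict; coordinates are 1-based and inclusive."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF record has 9 tab-separated columns")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    attrs = {}
    for item in attributes.split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip('"')
    return {"seqname": seqname, "source": source, "feature": feature,
            "start": int(start), "end": int(end), "score": score,
            "strand": strand, "frame": frame, "attributes": attrs}

# Hypothetical record, not taken from a real annotation.
example = 'chr1\texample_db\texon\t11869\t12227\t.\t+\t.\tgene_id "GeneA"; transcript_id "GeneA.1";'
record = parse_gtf_line(example)
print(record["feature"], record["start"], record["end"], record["attributes"]["gene_id"])
```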
Mapping in Galaxy
Basic settings
[Screenshots: settings of the mapping tool in Galaxy, with the important options marked]
Mapping QC
● TIP: align a subsample of the reads in Galaxy. Play with the settings, and determine which give the best outcome.
● Set the mapping fairly liberal: map as much as possible, and let the mapper assign mapping qualities. Ideally, every read maps once ('uniquely mapped'). In the following step, we will discard reads mapped to multiple locations ('multi reads').
● The outcome of the alignment is a file in SAM or BAM format, which you can visualize in Galaxy (or with a stand-alone viewer such as GenomeView or IGV).
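
One way to separate uniquely mapped reads from multi reads is the optional `NH` tag of a SAM record, which holds the number of reported alignments for the read. The sketch below assumes that the mapper writes this tag, and treats records without it as not uniquely mapped. That is a conservative choice made for this example, not a rule of the format. The SAM records are invented.

```python
def is_uniquely_mapped(sam_line):
    """True if the record is mapped and its NH tag reports exactly one alignment."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # bit 0x4 set: the read is unmapped
        return False
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    return False  # no NH tag: assumption of this sketch, treat as not unique

# Hypothetical SAM content: a header line and three records.
sam_lines = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:1",
    "read2\t0\tchr1\t200\t3\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:2",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
records = [line for line in sam_lines if not line.startswith("@")]
kept = [line.split("\t")[0] for line in records if is_uniquely_mapped(line)]
print(kept)
```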
Mapping QC
Check whether the visualization matches what you expect:
- paired end
- splice junctions
- strandedness
- ...

Let's visualize
Practical tips
[Screenshot of the alignment in a viewer, with these callouts:]
- Position on the reference genome sequence
- Add the GTF to the visualization
- These are the reads, in 2 colours because of the sense and antisense strand (obviously this library was not stranded!)
- Some reads span an intron
Mapping QC - RSeQC
After checking the mapping visually, determine more metrics with RSeQC.

http://rseqc.sourceforge.net/
Mapping QC - RSeQC
Duplication rate observed in the RNA-seq data.

http://rseqc.sourceforge.net/
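
A position-based duplication rate can be sketched as follows: count how many reads start at each (chromosome, position, strand), then count how many positions were seen once, twice, and so on. The input tuples are invented, and this simple version is only an assumption about how such a plot could be derived, not a description of how RSeQC computes it.

```python
from collections import Counter

def duplication_histogram(alignments):
    """alignments: iterable of (chrom, start, strand), one tuple per mapped read.

    Returns {occurrence: number of distinct positions seen that many times}.
    """
    per_position = Counter(alignments)
    return Counter(per_position.values())

# Hypothetical mapped reads.
alignments = (
    [("chr1", 100, "+")] * 3
    + [("chr1", 200, "+")]
    + [("chr1", 300, "-")]
    + [("chr2", 50, "+")] * 2
)
histogram = duplication_histogram(alignments)
for occurrence in sorted(histogram):
    print(occurrence, histogram[occurrence])
```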
Mapping QC - RSeQC
Read quality of aligned reads

http://rseqc.sourceforge.net/
Mapping QC - RSeQC
Sequence depth saturation
Q1 → Q4: from low-count genes to high-count genes.

Early flattening points to saturation.

http://rseqc.sourceforge.net/
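
The idea behind a saturation check is subsampling: take increasing fractions of the reads and see whether the result still changes. The sketch below uses a simpler measure than the plot described above: it counts how many genes are detected at each fraction, instead of looking at expression estimates per quartile. The input list and the function name are made up for the example.

```python
import random

def saturation_curve(read_genes, fractions, seed=0):
    """Return [(fraction, number of genes detected)] for nested subsamples.

    read_genes: list with one gene name per assigned read.
    """
    shuffled = list(read_genes)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible subsamples
    curve = []
    for fraction in fractions:
        n_reads = int(len(shuffled) * fraction)
        curve.append((fraction, len(set(shuffled[:n_reads]))))
    return curve

# Hypothetical assignments: one high-count, one medium-count, one low-count gene.
read_genes = ["GeneA"] * 12 + ["GeneB"] * 5 + ["GeneC"]
curve = saturation_curve(read_genes, [0.25, 0.5, 0.75, 1.0])
for fraction, detected in curve:
    print(fraction, detected)
```

If the curve stops rising well before all reads are used, sequencing deeper would add little for this measure.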
Mapping QC - RSeQC
[Figure: RSeQC output with a deviating result highlighted ("Deviating!")]

http://rseqc.sourceforge.net/
Mapping QC - BamQC
Another useful tool is BamQC of the Qualimap Suite.
Be aware however: also useful for DNA-seq!

http://qualimap.bioinfo.cipf.es/
Mapping QC: BamQC

Fraction of genome sequence not covered

http://qualimap.bioinfo.cipf.es/
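
The fraction of the genome sequence not covered can be sketched as: merge the aligned intervals, add up their lengths, and subtract the covered fraction from 1. The code assumes a single reference sequence and 0-based half-open intervals, and the numbers are invented. For RNA-seq a large uncovered fraction is expected, because only transcribed regions produce reads.

```python
def fraction_not_covered(genome_length, intervals):
    """intervals: 0-based half-open (start, end) aligned blocks on one sequence."""
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip the part already counted
        if end > start:
            covered += end - start
            current_end = end
    return 1 - covered / genome_length

# Hypothetical aligned blocks on a sequence of length 100.
blocks = [(0, 10), (5, 20), (50, 60), (52, 55)]
uncovered = fraction_not_covered(100, blocks)
print(round(uncovered, 2))
```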
Mapping QC: BamQC
Some examples to watch out for.

http://qualimap.bioinfo.cipf.es/
Keywords
haplotype
Gapped mapping
GTF
duplication
isoforms
strandedness
coverage

Write in your own words what the terms mean
Exercise
→ Mapping exercise
Break

This PowerPoint helps students to consider the concept of infinity.
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 

RNA-seq: Mapping and quality control - part 3

  • 1. Mapping to assign reads to genes Joachim Jacob 20 and 27 January 2014 This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to http://www.bits.vib.be/ if you use this presentation or parts hereof.
  • 2. Goal Assign reads to genes. The result of the mapping will be used to construct a summary of the counts: the count table. GeneA: 12 GeneB: 5
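The count table from slide 2 can be sketched in a few lines of Python. This is only an illustration: it assumes each read has already been assigned to at most one gene, and the read names and the `build_count_table` helper are hypothetical, not part of any tool named in the slides.

```python
from collections import Counter


def build_count_table(read_to_gene):
    """Count how many reads were assigned to each gene.

    read_to_gene maps a read name to a gene name, or to None when the
    read could not be assigned to any gene.
    """
    counts = Counter()
    for gene in read_to_gene.values():
        if gene is not None:  # unassigned reads do not contribute
            counts[gene] += 1
    return dict(counts)


# Hypothetical assignments, for illustration only.
assignments = {
    "read1": "GeneA",
    "read2": "GeneB",
    "read3": "GeneA",
    "read4": None,
}
count_table = build_count_table(assignments)
```

In practice the assignment itself comes from comparing mapped read positions with the gene annotation, which is what the rest of the presentation prepares.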
  • 3. 2 scenarios: (1) Reference genome sequence available. (2) NO reference genome sequence available: ● De novo assembly of the reads (Trinity) (transcriptome construction) ● Map the reads to the assembly (RSEM mapper) ● Extract count table (Note: no removal of polyA is required. Computationally expensive!)
  • 4. Reference genome sequence available Preprocessed reads are mapped to the reference sequence: 1. Reference is haplotype: mixture of alleles, leads to mismatches. 35 → for 2 alleles together If we compare samples within the same specimen, this effect is similar for all samples. 2. Reads contain sequencing errors 3. Reads derived from mRNA, genome is DNA.
  • 5. mRNA reads: some reads span introns ● Reads are derived from mRNA mRNA One isoform! exon intron etc. Many reads span introns: they need to be aligned with gaps. This can be used to detect intron-exon junctions http://www.ensembl.org
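To illustrate how gapped alignments reveal intron-exon junctions: in the SAM format, a gap that skips over an intron is recorded as an `N` operation in the CIGAR string. The sketch below, with the hypothetical helper `splice_junctions`, walks a CIGAR string and reports the reference interval skipped by each `N`. It is a minimal example, not how TopHat or STAR detect junctions internally.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def splice_junctions(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based and inclusive.

    pos is the 1-based leftmost reference position of the alignment,
    as stored in the POS column of a SAM record.
    """
    junctions = []
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":  # skipped region on the reference: an intron
            junctions.append((ref_pos, ref_pos + length - 1))
        if op in "MDN=X":  # operations that consume reference bases
            ref_pos += length
    return junctions


# A 50-base read: 30 bases in one exon, a 500-base intron, 20 bases in the next exon.
example = splice_junctions(100, "30M500N20M")
```

A read with a CIGAR string such as `50M` lies entirely within one exon and yields no junctions.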
  • 6. mRNA reads: multiple isoforms exist ● Isoforms are transcribed at different levels, contributing differently to the number of reads. http://www.ensembl.org
  • 7. Algorithm: gapped read mapping ● Exon-first approach: TopHat (popular). A junction database is constructed to try to map the reads that remained unmapped. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics, Vol. 25 no. 9 2009, pages 1105–1111, doi:10.1093/bioinformatics/btp120
  • 8. Principle of gapped read mapping ● STAR: fast and suited for longer reads STAR: ultrafast universal RNA-seq aligner Alexander Dobin et al. Bioinformatics
  • 9. Checklist for mapping to reference genome 1. A reference genome sequence (fasta), to be indexed by the alignment software. 2. A genome annotation file (GFF3 or GTF) describing the currently known annotations (optional, but highly recommended). 3. The cleaned (preprocessed) reads (fastq).
  • 10. Getting your reference genome sequence ● Genomes to be used by TopHat can be fetched from iGenomes and for STAR here ● If your genome is not listed above, check http://ensembl.org and http://ensemblgenomes.org, and index the sequence following the instructions of your alignment software ● If still no luck, try a specialized species website, e.g.
  • 11. Indexing a genome Mapping reads is fairly fast because the heavy lifting is done beforehand: the reference genome sequence is preprocessed once by indexing, which takes a lot of time but makes every later mapping run fast. ● On Galaxy, the indexing has already been performed for you. Just choose your genome from the list.
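Why indexing pays off can be shown with a toy example. Real aligners rely on far more compact structures (Bowtie, used by TopHat, builds an FM-index; STAR uses suffix arrays), so the k-mer hash table below is a deliberately simplified stand-in. The function names and the tiny reference sequence are assumptions made for illustration.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Preprocess the reference once: map every k-mer to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_exact(read, reference, index, k):
    """Return 0-based positions where the read matches the reference exactly.

    Only positions sharing the read's first k-mer are checked, so the
    reference is never scanned from start to end.
    """
    hits = []
    for start in index.get(read[:k], []):
        if reference[start:start + len(read)] == read:
            hits.append(start)
    return hits


reference = "ACGTACGTTTGACC"
k = 4
index = build_kmer_index(reference, k)  # slow step, done once
hits = find_exact("ACGTT", reference, index, k)  # fast step, done per read
```

The index is built once per genome and reused for every read, which is the same trade-off the slide describes.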
  • 12. Using genome annotation information Annotation info is stored in text files formatted as GTF or GFF3 files. ● If sequencing is deep enough, the complete transcriptome structure can be derived from the mapping: splice junctions, isoforms, variants,... Cufflinks, for example, reconstructs the annotation from an alignment and generates a GFF file that can be used downstream. Potentially novel transcripts are included in this file. But remember, this is NOT OUR GOAL. ● We will use a GTF file from a respected genome database to assist the mapping of reads. ● http://cufflinks.cbcb.umd.edu/
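As a small illustration of what such an annotation file contains: a GTF line has nine tab-separated columns, the last of which holds `key "value";` attribute pairs such as `gene_id`. The parser below is a minimal sketch for well-formed lines only; the helper name and the example record are invented for illustration.

```python
def parse_gtf_line(line):
    """Split one GTF feature line into a dict of its nine columns."""
    cols = line.rstrip("\n").split("\t")
    seqname, source, feature, start, end, score, strand, frame, attributes = cols
    attrs = {}
    for item in attributes.strip().split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip('"')
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),      # 1-based, inclusive
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attrs,
    }


# Invented example record, not taken from a real annotation.
record = parse_gtf_line(
    'chr1\texample\texon\t1000\t1200\t.\t+\t.\tgene_id "GeneA"; transcript_id "GeneA.1";'
)
```

Grouping exon records by `gene_id` is the step that later lets mapped reads be assigned to genes.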
  • 13. Using genome annotation information
  • 15. Mapping in Galaxy Basic settings!
  • 18. Mapping QC TIP: align a subsample of reads in Galaxy. Play with the settings, and determine the best outcome. ● Set the mapping fairly liberal: map as much as possible, and let the mapper assign mapping qualities. Ideally, every read maps once ('uniquely mapped'). In the following step, we will discard reads mapped to multiple locations ('multi reads'). ● The outcome of the alignment is a SAM or BAM file, which you can visualize in Galaxy (or with a stand-alone viewer such as GenomeView or IGV).
  • 19. Mapping QC The outcome of the alignment is a SAM or BAM file, which you can visualize in Galaxy (or with a stand-alone viewer such as GenomeView or IGV). Check whether this visualization matches your expectations: - paired end - splice junctions - strandedness - ... Let's visualize
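Discarding multi reads can be sketched directly on SAM text lines. The example assumes the aligner writes the optional `NH:i:` tag (the number of reported alignments for the read); reads without that tag are treated as not uniquely mapped, which is a conservative choice made here and not a rule from the slides. The helper name and the three SAM records are invented.

```python
def is_uniquely_mapped(sam_line):
    """Return True when a SAM alignment line is mapped and has NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # bit 0x4 set: the read is unmapped
        return False
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    return False  # no NH tag: cannot tell, so do not count it as unique


# Invented records: unique, multi-mapped (3 locations), and unmapped.
sam_lines = [
    "r1\t0\tchr1\t100\t50\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t300\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
unique_reads = [line.split("\t")[0] for line in sam_lines if is_uniquely_mapped(line)]
```

Filtering on the mapping quality column is a common alternative, but the meaning of MAPQ values differs between aligners, so the threshold would have to be checked per tool.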
  • 20. Practical tips Position on the reference genome sequence Add the GTF to the viz These are the reads, 2 colours because of the sense and antisense strand. (obviously this library was not stranded!) Some reads span an intron
  • 21. Mapping QC - RSeQC After checking the mapping visually, determine more metrics with RSeQC. http://rseqc.sourceforge.net/
  • 22. Mapping QC - RSeQC Duplication rate observed in the RNA-seq data. http://rseqc.sourceforge.net/
  • 23. Mapping QC - RSeQC Read quality of aligned reads http://rseqc.sourceforge.net/
  • 24. Mapping QC - RSeQC Sequence depth saturation Q1 → Q4: from low count genes to high count genes Early flattening points to saturation http://rseqc.sourceforge.net/
  • 25. Mapping QC - RSeQC Sequence depth saturation http://rseqc.sourceforge.net/
  • 26. Mapping QC - RSeQC After checking visually, determine more metrics with RSeQC. http://rseqc.sourceforge.net/
  • 27. Mapping QC - RSeQC After checking visually, determine more metrics with RSeQC. Deviating! http://rseqc.sourceforge.net/
  • 28. Mapping QC - BamQC Another useful tool is BamQC of the Qualimap Suite. Be aware however: also useful for DNA-seq! http://qualimap.bioinfo.cipf.es/
  • 32. Mapping QC: BamQC Fraction of genome sequence not covered http://qualimap.bioinfo.cipf.es/
  • 38. Mapping QC: BamQC Some examples to watch out for. http://qualimap.bioinfo.cipf.es/
  • 39. Mapping QC: BamQC Some examples to watch out for. http://qualimap.bioinfo.cipf.es/
  • 40. Mapping QC: BamQC Some examples to watch out for. http://qualimap.bioinfo.cipf.es/
  • 43. Break