Here you have your reads: now what?
Making sense of high-throughput sequencing illustrated with ChIP- and RNA-seq data
Javier Quilez Oliete - Bioinformatician @ Beato Lab
Core analysis vs downstream analyses (ChIP-seq, RNA-seq, …)
Core analysis
- Sample-level
- Homogeneous
- Similar steps across *seq types
Downstream analyses
- Multi-sample
- Project-specific
- Varied/flexible
- Combine different *seq types
ChIP-seq
[Figure: DNA with bound protein, cross-linking, fragmentation and sequencing of the ChIP fragment]
- DNA and protein
- Formaldehyde (chemical binding)
- Sonication (physical fragmentation)
- ChIP fragment flanked by technical sequences (e.g. adapters)
- The fragment sequenced includes sequence beyond that of the actual binding
- Single end (most common) or paired end
Core analysis (ChIP-seq)
Trimming
- sequencing adapters
- low-quality ends
- too-short reads
Trimmomatic
Improves alignment to
genome sequence
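The three trimming steps can be sketched in a few lines. This is a simplified illustration, not how Trimmomatic works internally: it assumes the adapter appears as an exact, full-length match and that base qualities are already decoded to integer Phred scores. The function name `trim_read` and its thresholds are hypothetical.

```python
def trim_read(seq, quals, adapter, min_qual=20, min_len=20):
    """Return (seq, quals) after trimming, or None if the read ends up too short.

    seq     -- read sequence (str)
    quals   -- per-base Phred scores (list of int), same length as seq
    adapter -- adapter sequence to remove (exact match only, an assumption)
    """
    # 1. Sequencing adapters: cut the read where the adapter starts.
    pos = seq.find(adapter)
    if pos != -1:
        seq, quals = seq[:pos], quals[:pos]

    # 2. Low-quality ends: drop bases below the threshold from both ends.
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    start = 0
    while start < end and quals[start] < min_qual:
        start += 1
    seq, quals = seq[start:end], quals[start:end]

    # 3. Too-short reads: discard, as they tend to align ambiguously.
    if len(seq) < min_len:
        return None
    return seq, quals


trimmed = trim_read("ACGTACGTAC" + "AGATCGG", [30] * 17, "AGATCGG", min_len=5)
```

Real trimmers also handle partial adapter matches at the read end and use sliding-window quality checks, which this sketch leaves out.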
Alignment
[Figure: reads aligned to the genome sequence around the protein binding site]
Read-by-read sequence alignment to the genome sequence, with the goal of identifying the genomic location from which the ChIP fragment originated
Tools: BWA, Bowtie, GEM, …
Read counts profiles
[Figure: read counts profile along the genome sequence around the protein binding site]
Tools: bam2wig, BEDtools, SAMtools, Deeptools, …
- Raw read counts from samples sequenced at different depths (100 million reads vs 10 million reads): not comparable!
- Reads per million: comparable!
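A minimal sketch of the reads-per-million (RPM) scaling, assuming per-window read counts are already available as plain lists. The function name and the toy numbers are illustrative only.

```python
def reads_per_million(counts, total_reads):
    """Scale raw per-window read counts to reads per million sequenced reads."""
    return [c * 1_000_000 / total_reads for c in counts]


# Same underlying signal, sequenced at two different depths.
deep = reads_per_million([500, 1000, 200], total_reads=100_000_000)
shallow = reads_per_million([50, 100, 20], total_reads=10_000_000)

print(deep)     # raw counts differ tenfold between the two samples...
print(shallow)  # ...but the RPM profiles are on the same scale
```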
Peak calling
Identification of regions showing significant signal enrichment over the background levels (MACS2, Zerone…)
[Figure: signal background and signal enrichment along the genome sequence, with the peak region marked]
[Figure: control (no ChIP) and ChIP sample tracks, with peak region, true enrichment and spurious enrichment labelled]
Including a control sample allows accounting for spurious enrichments (resulting from structural variation in the genome, ChIP artefacts) and improves the accuracy of the peak calling by reducing the false positives
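To illustrate why the control matters, here is a toy window-based caller. It is not the algorithm used by MACS2 or Zerone: it simply assumes read counts in fixed windows follow a Poisson distribution whose mean is the depth-scaled control count, and flags windows where the ChIP count is improbably high. All names and the threshold are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the cumulative sum."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(chip, control, chip_total, control_total, alpha=1e-5):
    """Return indices of windows enriched in the ChIP sample over the control."""
    scale = chip_total / control_total  # correct for sequencing depth
    peaks = []
    for i, (chip_count, control_count) in enumerate(zip(chip, control)):
        lam = max(control_count * scale, 1.0)  # floor avoids a zero background
        if poisson_sf(chip_count, lam) < alpha:
            peaks.append(i)
    return peaks


# Window 1 is enriched only in the ChIP sample (true enrichment);
# window 2 is high in both ChIP and control (spurious enrichment).
chip = [3, 40, 35, 2]
control = [2, 3, 30, 1]
peaks = call_peaks(chip, control, chip_total=1000, control_total=1000)
```

Without the control, windows 1 and 2 would both stand out over a flat background; with it, only window 1 is reported.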
Downstream analyses (ChIP-seq)
Genome Browser
[Figure: genome browser view (hg38, chr9, around 137,300,000 to 137,400,000, 50 kb scale bar) showing RPM profiles and peaks identified with MACS2 for T47D control samples (gDNA, input) and T47D PR ChIP-seq samples (T0, T60, and T30 at 1nM, 2nM, 5nM, 10nM and 100nM), together with GENCODE v24, Pseudogenes, Segmental Dups, Simple Repeats, RepeatMasker and WM + SDust tracks. Annotations mark the control samples, the true peaks and a false positive.]
Overlap of peaks genomic coordinates
http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html
- Replicate 1 vs Replicate 2: measure overlap between ChIP-seq replicate samples (expected to be high) as a quality metric
- Protein A vs Protein B, Treatment 1 vs Treatment 2: interrogate overlap between proteins/conditions
(Venn diagrams of more than 3 groups cannot be proportional and are harder to interpret)
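The replicate-overlap metric can be sketched without any external tool. The example assumes peaks are given as BED-style `(chrom, start, end)` tuples with 0-based, half-open coordinates; the function names and coordinates are made up for illustration, and the all-against-all comparison is only suitable for small inputs.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in A that overlap at least one peak in B."""
    if not peaks_a:
        return 0.0
    hits = sum(1 for a in peaks_a if any(overlaps(a, b) for b in peaks_b))
    return hits / len(peaks_a)


rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
rep2 = [("chr1", 150, 250), ("chr2", 400, 500)]

print(fraction_overlapping(rep1, rep2))  # 1 of 3 replicate-1 peaks overlaps
print(fraction_overlapping(rep2, rep1))  # 1 of 2 replicate-2 peaks overlaps
```

The metric is not symmetric, so it is worth reporting it in both directions.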
Signal enrichment over regions
Is there consistent ChIP-seq signal enrichment over gene promoters?
[Figure: ChIP signal over gene promoters in relation to gene expression]
[Figure: heatmaps and average profiles of Protein A and Protein B signal over gene promoters, Protein C peaks and random regions]
- For each promoter (rows) the normalised protein A ChIP-seq signal is shown for the promoter (center of the row) as well as for its flanking region
- The darker the color in the heatmap, the higher the intensity of the ChIP-seq signal (i.e. number of reads)
- Average profile: curve showing the average for all rows (e.g. gene promoters)
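The average profile is just the column-wise mean of the heatmap matrix. A minimal sketch, assuming the signal has already been binned into a matrix with one row per region and one column per position relative to the region center (the values are invented):

```python
def average_profile(matrix):
    """Column-wise mean of a regions x positions signal matrix."""
    n_rows = len(matrix)
    return [sum(column) / n_rows for column in zip(*matrix)]


# Three promoters (rows), five bins each; the middle bin is the promoter center.
signal = [
    [0.0, 1.0, 4.0, 1.0, 0.0],
    [1.0, 2.0, 6.0, 2.0, 1.0],
    [0.0, 0.0, 2.0, 0.0, 0.0],
]
profile = average_profile(signal)
```

A profile that peaks at the central bin suggests consistent enrichment over the regions; a flat profile, as expected for random regions, suggests none.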
Genomic distribution of peaks
- Percentage of peaks falling in each of the annotation categories
- Percentage of peaks at a given distance from a transcription start site (TSS)
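A sketch of the distance-to-TSS summary, assuming a single chromosome, peak midpoints as integers and a non-empty sorted list of TSS positions. The distance bins and all names are hypothetical choices for illustration.

```python
from bisect import bisect_left


def distance_to_nearest_tss(peak_mid, tss_sorted):
    """Absolute distance from a peak midpoint to the closest TSS (sorted list)."""
    i = bisect_left(tss_sorted, peak_mid)
    candidates = tss_sorted[max(i - 1, 0):i + 1]  # neighbours on either side
    return min(abs(peak_mid - t) for t in candidates)


def percent_by_distance(peak_mids, tss_sorted, bins=(1_000, 10_000, 100_000)):
    """Percentage of peaks in each distance-to-TSS bin."""
    labels = [f"<={b}" for b in bins] + [f">{bins[-1]}"]
    counts = dict.fromkeys(labels, 0)
    for mid in peak_mids:
        d = distance_to_nearest_tss(mid, tss_sorted)
        for b, label in zip(bins, labels):
            if d <= b:
                counts[label] += 1
                break
        else:  # farther than the largest bin
            counts[labels[-1]] += 1
    return {label: 100 * n / len(peak_mids) for label, n in counts.items()}


tss = [1_000, 50_000, 200_000]
peak_mids = [1_200, 45_000, 120_000, 900_000]
distribution = percent_by_distance(peak_mids, tss)
```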
Motif discovery analysis
[Figure: peak region and signal enrichment along the genome sequence around the protein binding site; the fragment sequenced includes sequence beyond that of the actual binding]
[Figure: the sequence TGTTCT recurring across the peak sequences]
• Is there any motif/sequence over-represented in the sequences of the peaks (relative to the genome)?
• Does such a motif correspond to any annotated motif (e.g. transcription factors)?
• Motif discovery allows defining the binding site of the target protein as well as the binding of secondary proteins
• Need to account for the fact that the peak may reflect a region broader than the precise binding site
1. Find consensus motif
The height of each letter is proportional to its frequency in that position within the motif
RC = reverse complement
2. Motif annotation
Search for known transcription factors compatible with the consensus motif
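The over-representation question can be illustrated with a naive k-mer count. This is far simpler than what dedicated motif discovery tools do: it only compares exact k-mer frequencies in peak sequences against a background set, counting each k-mer together with its reverse complement (RC). The sequences and function names are invented for the example.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (upper-case A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seqs, k):
    """Count k-mers, merging each k-mer with its reverse complement."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


def enrichment(peak_seqs, background_seqs, k=6, pseudo=1.0):
    """Ratio of k-mer frequency in peaks vs background (with a pseudocount)."""
    fg, bg = kmer_counts(peak_seqs, k), kmer_counts(background_seqs, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    return {
        kmer: ((n + pseudo) / (fg_total + pseudo))
        / ((bg[kmer] + pseudo) / (bg_total + pseudo))
        for kmer, n in fg.items()
    }


peaks = ["GGTGTTCTAA", "CCAGAACATT", "ATTGTTCTGC"]   # second one carries the RC
background = ["ACGTACGTAC", "GGCCTTAAGG", "TTAACCGGTA"]
scores = enrichment(peaks, background)
best = max(scores, key=scores.get)
```

Here `best` is reported in its canonical form, i.e. whichever of the k-mer and its reverse complement sorts first alphabetically.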
RNA-seq
[Figure: from gene to sequenced cDNA]
- DNA: gene with Exon1, Exon2, …, ExonN
- Transcription et al* produces mRNA with a poly-A tail
  *splicing plus addition of poly-A tail
- Poly-A selection + cDNA synthesis produces cDNA
- cDNA flanked by technical sequences (e.g. adapters)
- Paired end
The example shows an RNA-seq experiment targeting messenger RNA (mRNA), as this is one of the most common applications; however, other applications exist (e.g. total RNA, ribosomal RNA)
Core analysis (RNA-seq)
Trimming
- sequencing adapters
- low-quality ends
- too-short reads
Trimmomatic
Improves alignment to
genome sequence
Alignment
[Figure: reads aligned to the genome sequence over a gene with Exon1, Exon2, …, ExonN; a read spanning two exons is split (red region)]
Tools: STAR, TopHat, GEM, …
Read-by-read sequence alignment to the genome sequence, with the goal of identifying the genomic location from which the RNA originated
Some RNA-seq reads will originate from different exons and thus map to non-contiguous genomic positions. RNA-seq aligners need to be aware of this and split reads accordingly (red region) during the alignment
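In SAM/BAM output a split read shows up as an `N` operation (skipped reference region) in the CIGAR string. The sketch below, with a hypothetical helper name, turns a 1-based alignment start and a CIGAR string into the genomic blocks the read covers; the coordinates are invented.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) blocks covered by an alignment.

    pos   -- 1-based leftmost mapping position, as in the SAM POS field
    cigar -- CIGAR string, e.g. "50M1000N50M"
    """
    blocks = []
    ref = pos
    block_start = None
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=XD":            # consume reference inside a block
            if block_start is None:
                block_start = ref
            ref += length
        elif op == "N":             # skipped region (e.g. an intron): close block
            if block_start is not None:
                blocks.append((block_start, ref - 1))
                block_start = None
            ref += length
        # I, S, H and P do not consume reference positions
    if block_start is not None:
        blocks.append((block_start, ref - 1))
    return blocks


spliced = aligned_blocks(100, "50M1000N50M")   # read split across two exons
unspliced = aligned_blocks(10, "5S45M")        # soft-clipped, contiguous read
```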
Read counts profiles
[Figure: read counts profile along the genome sequence over a gene with Exon1, Exon2, …, ExonN]
Tools: STAR, bam2wig, BEDtools, SAMtools, Deeptools, …
Normalise by the number of million reads to make different experiments comparable
Expression quantification
• The number of reads is proportional to the level of expression (i.e. more RNA, more reads)
• Expression can be quantified at either gene or transcript level (Kallisto, HTSeq, featureCounts)
• There are several units to measure expression:
  • read counts per gene/transcript: not normalised, i.e. they do not account for the sample library size (the number of reads sequenced) or for gene/transcript length
  • Reads Per Kilobase of transcript per Million mapped reads (RPKM): accounts for library size and locus length
  • Transcripts Per Million (TPM): accounts for library size and locus length
• TPM is becoming more popular than RPKM, and some argue that the latter is inconsistent across samples (http://blog.nextgenetics.net/?e=51)
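The two units differ only in the order of the normalisation steps. A minimal sketch, assuming the total number of mapped reads equals the sum of the per-gene counts (a simplification) and lengths are in base pairs; the toy numbers are invented.

```python
def rpkm(counts, lengths_bp):
    """Reads Per Kilobase per Million: scale by library size, then by length."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts Per Million: scale by length first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


counts = [10, 20, 30]
lengths = [1_000, 2_000, 1_000]

print(rpkm(counts, lengths))
print(tpm(counts, lengths))
print(sum(tpm(counts, lengths)))  # TPM values sum to one million in every sample
```

Because TPM always sums to the same total, a given TPM value represents the same share of the sample in every library, which is the property RPKM lacks.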
Downstream analyses (RNA-seq)
Genome Browser
[Figure: genome browser view (hg38, chr6, around 35,590,000 to 35,680,000, 50 kb scale bar) of the FKBP5 locus showing RPM profiles for T47D T0 and T47D R6 RNA-seq samples (three each), labelled Untreated and Treated, together with a T47D gDNA profile and GENCODE v24, Pseudogenes, Segmental Dups, Simple Repeats, RepeatMasker and WM + SDust tracks.]
Differential expression analysis
Looking for genes that show differential expression between two conditions (e.g. treated vs untreated) is likely the most common application of RNA-seq
It is not performed by visual inspection in the browser but with dedicated software (sleuth, DESeq2, edgeR); these tools account for biological/technical variation between replicates and assign a significance value to the differential expression
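To make the idea concrete, here is a toy comparison for a single gene. It is not what sleuth, DESeq2 or edgeR do internally: it assumes already normalised expression values for three replicates per condition, and uses a log2 fold change plus a Welch t statistic as a crude stand-in for the replicate-aware statistics those tools provide. The values are invented.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, untreated, pseudo=1.0):
    """log2 ratio of mean expression; the pseudocount avoids division by zero."""
    return math.log2((mean(treated) + pseudo) / (mean(untreated) + pseudo))


def welch_t(treated, untreated):
    """Welch t statistic: difference in means relative to replicate variation."""
    standard_error = math.sqrt(
        variance(treated) / len(treated) + variance(untreated) / len(untreated)
    )
    return (mean(treated) - mean(untreated)) / standard_error


untreated = [10.0, 12.0, 11.0]
treated = [40.0, 44.0, 48.0]

lfc = log2_fold_change(treated, untreated)
t_stat = welch_t(treated, untreated)
```

A large fold change with little spread between replicates gives a large statistic; the same fold change with noisy replicates would not.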
ChIP-seq
Core analysis
- Trimming: Trimmomatic
- Alignment: BWA, Bowtie, GEM
- Read counts profiles: bam2wig, BEDtools, SAMtools, Deeptools
- Peak calling: MACS2, Zerone
Downstream analyses
- Genome Browser
- Overlaps of peaks genomic coordinates
- Signal enrichment over regions
- Genomic distribution of peaks
- Motif discovery analysis
- …
RNA-seq
Core analysis
- Trimming: Trimmomatic
- Alignment: STAR, TopHat, GEM
- Read counts profiles: STAR, bam2wig, BEDtools, SAMtools, Deeptools
- Expression quantification: Kallisto, featureCounts, HTSeq
Downstream analyses
- Genome Browser
- Differential expression analysis
- …

More Related Content

What's hot

Aug2014 nist integration plans
Aug2014 nist integration plansAug2014 nist integration plans
Aug2014 nist integration plansGenomeInABottle
 
W8_2: Inside the UoS Educational Processor
W8_2: Inside the UoS Educational ProcessorW8_2: Inside the UoS Educational Processor
W8_2: Inside the UoS Educational ProcessorDaniel Roggen
 
Winter training,Readymade Projects,Buy Projects,Corporate Training
Winter training,Readymade Projects,Buy Projects,Corporate TrainingWinter training,Readymade Projects,Buy Projects,Corporate Training
Winter training,Readymade Projects,Buy Projects,Corporate TrainingTechnogroovy
 
Two-Tailed PCR - New Ultrasensitive and Ultraspecific Technique for the Quant...
Two-Tailed PCR - New Ultrasensitive and Ultraspecific Technique for the Quant...Two-Tailed PCR - New Ultrasensitive and Ultraspecific Technique for the Quant...
Two-Tailed PCR - New Ultrasensitive and Ultraspecific Technique for the Quant...Kate Barlow
 
Samsung 1.8 inch AMOLED(128x60) Datasheet
Samsung 1.8 inch AMOLED(128x60) DatasheetSamsung 1.8 inch AMOLED(128x60) Datasheet
Samsung 1.8 inch AMOLED(128x60) DatasheetPanox Display
 
Scale17x buffer overflows
Scale17x buffer overflowsScale17x buffer overflows
Scale17x buffer overflowsjohseg
 
Optimized methods to use Cas9 nickases in genome editing
Optimized methods to use Cas9 nickases in genome editingOptimized methods to use Cas9 nickases in genome editing
Optimized methods to use Cas9 nickases in genome editingIntegrated DNA Technologies
 
Apollo 324
Apollo 324Apollo 324
Apollo 324kbebak
 
Reducing off-target events in CRISPR genome editing applications with a novel...
Reducing off-target events in CRISPR genome editing applications with a novel...Reducing off-target events in CRISPR genome editing applications with a novel...
Reducing off-target events in CRISPR genome editing applications with a novel...Integrated DNA Technologies
 
Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...Integrated DNA Technologies
 
Embedded Systems Training & Live Projects @Technogroovy Systems India Pvt Ltd
Embedded Systems Training & Live Projects @Technogroovy Systems India Pvt Ltd Embedded Systems Training & Live Projects @Technogroovy Systems India Pvt Ltd
Embedded Systems Training & Live Projects @Technogroovy Systems India Pvt Ltd Technogroovy India
 
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...Eli Kaminuma
 
Hacking PLCs and Causing Havoc on Critical Infrastructures
Hacking PLCs and Causing Havoc on Critical InfrastructuresHacking PLCs and Causing Havoc on Critical Infrastructures
Hacking PLCs and Causing Havoc on Critical InfrastructuresPriyanka Aash
 
selected input/output - sensors and actuators
selected input/output - sensors and actuatorsselected input/output - sensors and actuators
selected input/output - sensors and actuatorsEueung Mulyana
 

What's hot (20)

Aug2014 nist integration plans
Aug2014 nist integration plansAug2014 nist integration plans
Aug2014 nist integration plans
 
W8_2: Inside the UoS Educational Processor
W8_2: Inside the UoS Educational ProcessorW8_2: Inside the UoS Educational Processor
W8_2: Inside the UoS Educational Processor
 
Winter training,Readymade Projects,Buy Projects,Corporate Training
Winter training,Readymade Projects,Buy Projects,Corporate TrainingWinter training,Readymade Projects,Buy Projects,Corporate Training
Winter training,Readymade Projects,Buy Projects,Corporate Training
 
Two-Tailed PCR - New Ultrasensitive and Ultraspecific Technique for the Quant...
Two-Tailed PCR - New Ultrasensitive and Ultraspecific Technique for the Quant...Two-Tailed PCR - New Ultrasensitive and Ultraspecific Technique for the Quant...
Two-Tailed PCR - New Ultrasensitive and Ultraspecific Technique for the Quant...
 
Grodins model
Grodins modelGrodins model
Grodins model
 
GPCR ORF Clones
GPCR ORF ClonesGPCR ORF Clones
GPCR ORF Clones
 
Samsung 1.8 inch AMOLED(128x60) Datasheet
Samsung 1.8 inch AMOLED(128x60) DatasheetSamsung 1.8 inch AMOLED(128x60) Datasheet
Samsung 1.8 inch AMOLED(128x60) Datasheet
 
Ffpe pcr array
Ffpe pcr arrayFfpe pcr array
Ffpe pcr array
 
PIC18F452
PIC18F452PIC18F452
PIC18F452
 
Biblioteca PIC18F452
Biblioteca PIC18F452Biblioteca PIC18F452
Biblioteca PIC18F452
 
Scale17x buffer overflows
Scale17x buffer overflowsScale17x buffer overflows
Scale17x buffer overflows
 
Optimized methods to use Cas9 nickases in genome editing
Optimized methods to use Cas9 nickases in genome editingOptimized methods to use Cas9 nickases in genome editing
Optimized methods to use Cas9 nickases in genome editing
 
Apollo 324
Apollo 324Apollo 324
Apollo 324
 
Reducing off-target events in CRISPR genome editing applications with a novel...
Reducing off-target events in CRISPR genome editing applications with a novel...Reducing off-target events in CRISPR genome editing applications with a novel...
Reducing off-target events in CRISPR genome editing applications with a novel...
 
What the Fax!?
What the Fax!?What the Fax!?
What the Fax!?
 
Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...
 
Embedded Systems Training & Live Projects @Technogroovy Systems India Pvt Ltd
Embedded Systems Training & Live Projects @Technogroovy Systems India Pvt Ltd Embedded Systems Training & Live Projects @Technogroovy Systems India Pvt Ltd
Embedded Systems Training & Live Projects @Technogroovy Systems India Pvt Ltd
 
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...
 
Hacking PLCs and Causing Havoc on Critical Infrastructures
Hacking PLCs and Causing Havoc on Critical InfrastructuresHacking PLCs and Causing Havoc on Critical Infrastructures
Hacking PLCs and Causing Havoc on Critical Infrastructures
 
selected input/output - sensors and actuators
selected input/output - sensors and actuatorsselected input/output - sensors and actuators
selected input/output - sensors and actuators
 

Viewers also liked

RNA-Seq Analysis: Everything You Always Wanted to Know...and then some
RNA-Seq Analysis: Everything You Always Wanted to Know...and then someRNA-Seq Analysis: Everything You Always Wanted to Know...and then some
RNA-Seq Analysis: Everything You Always Wanted to Know...and then somebasepairtech
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAGRF_Ltd
 
How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)James Hadfield
 
RNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingRNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingmikaelhuss
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2BITS
 

Viewers also liked (7)

RNA-Seq Analysis: Everything You Always Wanted to Know...and then some
RNA-Seq Analysis: Everything You Always Wanted to Know...and then someRNA-Seq Analysis: Everything You Always Wanted to Know...and then some
RNA-Seq Analysis: Everything You Always Wanted to Know...and then some
 
presentation
presentationpresentation
presentation
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)
 
RNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingRNA-seq quality control and pre-processing
RNA-seq quality control and pre-processing
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2
 
RNA-Seq with R-Bioconductor
RNA-Seq with R-BioconductorRNA-Seq with R-Bioconductor
RNA-Seq with R-Bioconductor
 

Similar to 20161021_master_lesson_no_feedback

D Belver FEE for Trasgos
D Belver  FEE for TrasgosD Belver  FEE for Trasgos
D Belver FEE for TrasgosMiguel Morales
 
Pp gef rt2_profiler_0212_web
Pp gef rt2_profiler_0212_webPp gef rt2_profiler_0212_web
Pp gef rt2_profiler_0212_webElsa von Licy
 
B_Command_Parameters.pdf
B_Command_Parameters.pdfB_Command_Parameters.pdf
B_Command_Parameters.pdfwafawafa52
 
Anne_Vaittinen_advanced_seminar_presentation
Anne_Vaittinen_advanced_seminar_presentationAnne_Vaittinen_advanced_seminar_presentation
Anne_Vaittinen_advanced_seminar_presentationAnne Vaittinen
 
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...Thermo Fisher Scientific
 
Aai 2007-pcr array-poster
Aai 2007-pcr array-posterAai 2007-pcr array-poster
Aai 2007-pcr array-posterElsa von Licy
 
Ascb 2007-pcr array-poster
Ascb 2007-pcr array-posterAscb 2007-pcr array-poster
Ascb 2007-pcr array-posterElsa von Licy
 
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...QIAGEN
 
Chipqpcrpresentation
ChipqpcrpresentationChipqpcrpresentation
ChipqpcrpresentationElsa von Licy
 
Microprocessor 8086 instructions
Microprocessor 8086 instructionsMicroprocessor 8086 instructions
Microprocessor 8086 instructionsRavi Anand
 
ATE Testers Overview
ATE Testers OverviewATE Testers Overview
ATE Testers Overviewstn_tkiller
 
Mi rna data analysis 2013
Mi rna data analysis 2013Mi rna data analysis 2013
Mi rna data analysis 2013Elsa von Licy
 
100513_homology_search(ensembl)
100513_homology_search(ensembl)100513_homology_search(ensembl)
100513_homology_search(ensembl)ocha_kaneko
 
design-compiler.pdf
design-compiler.pdfdesign-compiler.pdf
design-compiler.pdfFrangoCamila
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging RubyAman Gupta
 
Jitsi Videobridge, Octopodes, and Kotlin
Jitsi Videobridge, Octopodes, and KotlinJitsi Videobridge, Octopodes, and Kotlin
Jitsi Videobridge, Octopodes, and KotlinBoris Grozev
 

Similar to 20161021_master_lesson_no_feedback (20)

D Belver FEE for Trasgos
D Belver  FEE for TrasgosD Belver  FEE for Trasgos
D Belver FEE for Trasgos
 
Ad7716
Ad7716Ad7716
Ad7716
 
Pp gef rt2_profiler_0212_web
Pp gef rt2_profiler_0212_webPp gef rt2_profiler_0212_web
Pp gef rt2_profiler_0212_web
 
Abrf poster2007
Abrf poster2007Abrf poster2007
Abrf poster2007
 
B_Command_Parameters.pdf
B_Command_Parameters.pdfB_Command_Parameters.pdf
B_Command_Parameters.pdf
 
Anne_Vaittinen_advanced_seminar_presentation
Anne_Vaittinen_advanced_seminar_presentationAnne_Vaittinen_advanced_seminar_presentation
Anne_Vaittinen_advanced_seminar_presentation
 
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...
 
Aai 2007-pcr array-poster
Aai 2007-pcr array-posterAai 2007-pcr array-poster
Aai 2007-pcr array-poster
 
Ascb 2007-pcr array-poster
Ascb 2007-pcr array-posterAscb 2007-pcr array-poster
Ascb 2007-pcr array-poster
 
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...
RT2 Profiler PCR Arrays: Pathway-focused Gene Expression Profiling with qRT-P...
 
Chipqpcrpresentation
ChipqpcrpresentationChipqpcrpresentation
Chipqpcrpresentation
 
Microprocessor 8086 instructions
Microprocessor 8086 instructionsMicroprocessor 8086 instructions
Microprocessor 8086 instructions
 
ATE Testers Overview
ATE Testers OverviewATE Testers Overview
ATE Testers Overview
 
Mi rna data analysis 2013
Mi rna data analysis 2013Mi rna data analysis 2013
Mi rna data analysis 2013
 
100513_homology_search(ensembl)
100513_homology_search(ensembl)100513_homology_search(ensembl)
100513_homology_search(ensembl)
 
design-compiler.pdf
design-compiler.pdfdesign-compiler.pdf
design-compiler.pdf
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging Ruby
 
Arm architecture
Arm architectureArm architecture
Arm architecture
 
3D-DRESD ASIDA
3D-DRESD ASIDA3D-DRESD ASIDA
3D-DRESD ASIDA
 
Jitsi Videobridge, Octopodes, and Kotlin
Jitsi Videobridge, Octopodes, and KotlinJitsi Videobridge, Octopodes, and Kotlin
Jitsi Videobridge, Octopodes, and Kotlin
 

20161021_master_lesson_no_feedback

  • 1. Here you have your reads: now what? Making sense of high-throughput sequencing illustrated with ChIP- and RNA-seq data Javier Quilez Oliete - Bioinformatician @ Beato Lab 1
  • 2. Downstream analyses Core analysis ChIP-seq RNA-seq … - Sample-level - Homogeneous - Similar steps across *seq types - Multi-sample - Project-specific - Varied/flexible - Combine different *seq types
  • 6. ChIP-seq DNA Protein Formaldehyde (chemical binding) X X Technical sequences (e.g. adapters) ChIP fragment X Sonication (physical fragmentation) The fragment sequenced includes sequence beyond that of the actual binding
  • 7. ChIP-seq DNA Protein Formaldehyde (chemical binding) X X Technical sequences (e.g. adapters) Single end Single end ChIP fragment X Sonication (physical fragmentation) Most common
  • 8. ChIP-seq DNA Protein Formaldehyde (chemical binding) X X Technical sequences (e.g. adapters) Paired end Paired end ChIP fragment X Sonication (physical fragmentation)
  • 10. Trimming - sequencing adapters - low-quality ends - too-short reads Trimmomatic Improves alignment to genome sequence
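The quality-end and length filters on the trimming slide can be sketched in a few lines. This is a minimal illustration, not Trimmomatic's implementation: the function name `trim_read`, the thresholds, and the Phred+33 quality encoding are all assumptions, and adapter removal is not covered.

```python
def trim_read(sequence, quality, min_quality=20, min_length=36, offset=33):
    """Trim low-quality bases from the 3' end; return None if the read gets too short.

    Assumes qualities are Phred scores encoded as ASCII characters with the
    given offset (Phred+33 here). All thresholds are illustrative defaults.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    end = len(sequence)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and ord(quality[end - 1]) - offset < min_quality:
        end -= 1
    # Discard reads that are too short after trimming.
    if end < min_length:
        return None
    return sequence[:end], quality[:end]


# Hypothetical read: 'I' encodes Phred 40 and '#' encodes Phred 2.
kept = trim_read("ACGTACGTAC", "IIIIIIII##", min_length=5)
dropped = trim_read("ACGTACGTAC", "IIIIIIII##", min_length=9)
```

Here `kept` holds the read without its two low-quality trailing bases, while `dropped` is `None` because only 8 bases survive the trimming.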
  • 12. Alignment Protein binding site Genome sequence BWA Bowtie GEM … Read-by-read sequence alignment to genome sequence with the goal of identifying the genomic location from which the ChIP fragment originated
  • 13. Read counts profiles Protein binding site Genome sequence bam2wig BEDtools SAMtools Deeptools …
  • 14. Read counts profiles 100 million reads 10 million reads Not comparable!
  • 15. Read counts profiles Reads per million Comparable!
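The reads-per-million scaling from slides 14 and 15 can be written out as a short sketch. The function name `reads_per_million` and the example counts are hypothetical; the library sizes are the 100 million and 10 million reads mentioned on the slide.

```python
def reads_per_million(bin_counts, total_reads):
    """Scale raw per-bin read counts to reads per million sequenced reads (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return [count * 1_000_000 / total_reads for count in bin_counts]


# Hypothetical profiles: the same relative signal sequenced at two depths.
deep_raw = [200, 1000, 200]       # from a 100-million-read sample
shallow_raw = [20, 100, 20]       # from a 10-million-read sample

deep_rpm = reads_per_million(deep_raw, total_reads=100_000_000)
shallow_rpm = reads_per_million(shallow_raw, total_reads=10_000_000)
```

The raw counts differ tenfold between the two samples, but after scaling both profiles take the same values and can be compared directly.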
  • 17. Peak calling Genome sequence Signal background Signal enrichment
  • 18. Peak calling Peak region Genome sequence Identification of regions showing significant signal enrichment over the background levels (MACS2, Zerone…) Signal background Signal enrichment
  • 19. Peak calling Control (no ChIP) ChIP sample Peak region Signal enrichment Signal enrichment
  • 20. Peak calling Control (no ChIP) ChIP sample Including a control sample allows accounting for spurious enrichments (resulting from structural variation in the genome, ChIP artefacts) and improves the accuracy of the peak calling by reducing the false positives Peak region True enrichment Spurious enrichment
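To illustrate why a control helps, the sketch below flags bins by fold enrichment of ChIP signal over control signal. It is deliberately naive and is not the statistical model used by MACS2 or Zerone; the function name `enriched_bins`, the fold threshold and the pseudocount are assumptions made for the example.

```python
def enriched_bins(chip_rpm, control_rpm, min_fold=4.0, pseudocount=0.1):
    """Return indices of bins where ChIP signal exceeds control by at least min_fold.

    Both inputs are per-bin RPM values over the same genomic bins. The
    pseudocount avoids division by zero in bins without control reads.
    """
    if len(chip_rpm) != len(control_rpm):
        raise ValueError("ChIP and control profiles must cover the same bins")
    hits = []
    for index, (chip, control) in enumerate(zip(chip_rpm, control_rpm)):
        fold = (chip + pseudocount) / (control + pseudocount)
        if fold >= min_fold:
            hits.append(index)
    return hits


# Hypothetical profiles: bin 1 is enriched only in the ChIP sample,
# bin 2 is high in both samples (a spurious enrichment).
chip = [0.5, 6.0, 5.0, 0.4]
control = [0.5, 0.6, 5.0, 0.5]
peaks = enriched_bins(chip, control)
```

Only bin 1 is reported. Bin 2 has high ChIP signal but equally high control signal, so it is not called; thresholding the ChIP profile alone would have reported it as a false positive.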
  • 22. Genome Browser (hg38, chr9, axis ticks from 137,300,000 to 137,400,000, 50 kb scale bar). Tracks show RPM profiles and peaks identified with MACS2 for: T47D gDNA (DNA-seq, peaks called without control); Input T0 (Roberto); T47D PR T0 (gv_009_02_01_chipseq); T47D PR T60 (gv_066_01_01_chipseq); T47D input (gv_098_01_01_chipseq); T47D T0 PR (gv_092_01_01_chipseq); T47D T30 PR at 1nM (gv_093_01_01_chipseq), 2nM (gv_094_01_01_chipseq), 5nM (gv_095_01_01_chipseq), 10nM (gv_097_01_01_chipseq) and 100nM (gv_096_01_01_chipseq). Annotation tracks: GENCODE v24, Pseudogenes, Segmental Dups, Simple Repeats, RepeatMasker, WM + SDust. Highlighted: Control samples
  • 23. Genome Browser (same view as slide 22). Highlighted: Control samples, True peaks
  • 24. Genome Browser (same view as slide 22). Highlighted: Control samples, True peaks, False positive
  • 25. Overlap of peaks genomic coordinates http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html
  • 26. Overlap of peaks genomic coordinates http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html Replicate 1 Replicate 2 Measure overlap between ChIP-seq replicate samples (expected to be high) as a quality metric
  • 27. Overlap of peaks genomic coordinates http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html Replicate 1 Replicate 2 Measure overlap between ChIP-seq replicate samples (expected to be high) as a quality metric. Protein A Treatment 1 Protein B Protein A Treatment 2 Interrogate overlap between proteins/conditions (Venn diagrams with more than 3 groups cannot be proportional and are harder to interpret)
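The replicate-overlap metric can be sketched as the fraction of peaks in one replicate that overlap at least one peak in the other. This is a small illustration with hypothetical peak coordinates and function names; it assumes BED-style half-open intervals and compares every pair of peaks, whereas BEDtools intersect does the same job efficiently on real peak files.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap any peak in peaks_b."""
    if not peaks_a:
        return 0.0
    hits = sum(1 for a in peaks_a if any(overlaps(a, b) for b in peaks_b))
    return hits / len(peaks_a)


# Hypothetical peak sets for two replicates.
replicate_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
replicate_2 = [("chr1", 150, 250), ("chr2", 300, 400)]

shared_fraction = fraction_overlapping(replicate_1, replicate_2)
```

Note that the metric is not symmetric: the fraction of replicate 1 peaks found in replicate 2 can differ from the reverse, so it is worth computing in both directions.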
  • 28. Signal enrichment over regions Gene expression Gene promoter ChIP-signal
  • 29. Signal enrichment over regions Gene expression Gene promoter ChIP-signal … Is there consistent ChIP-seq signal enrichment over gene promoters?
  • 30. Signal enrichment over regions. Panel labels: Gene promoters, Protein C peaks, Random regions; Protein A, Protein B. For each promoter (rows) the normalised protein A ChIP-seq signal is shown for the promoter (center of the row) as well as for its flanking region. The darker the color in the heatmap, the higher the intensity of the ChIP-seq signal (i.e. number of reads). Average profile: curve showing the average for all rows (e.g. gene promoters)
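The average profile described on slide 30 is the column-wise mean of the heatmap matrix. A minimal sketch, assuming the signal has already been normalised and binned into equal-width rows centred on each region (the function name and the numbers are hypothetical):

```python
def average_profile(signal_matrix):
    """Column-wise mean of a heatmap matrix (rows = regions, columns = bins)."""
    if not signal_matrix:
        return []
    width = len(signal_matrix[0])
    if any(len(row) != width for row in signal_matrix):
        raise ValueError("all rows must have the same number of bins")
    # zip(*matrix) iterates over columns, i.e. the same bin across all regions.
    return [sum(column) / len(signal_matrix) for column in zip(*signal_matrix)]


# Hypothetical heatmap: 3 promoters x 5 bins, promoter at the central bin.
heatmap = [
    [0, 2, 8, 2, 0],
    [1, 3, 9, 3, 1],
    [0, 1, 4, 1, 0],
]
profile = average_profile(heatmap)
```

In this example the profile is highest at the central bin, which is the pattern expected when the protein is consistently enriched over promoters.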
  • 31. Genomic distribution of peaks Percentage of peaks falling in each of the annotation categories
  • 32. Genomic distribution of peaks Percentage of peaks falling in each of the annotation categories Percentage of peaks at a given distance from a transcription start site (TSS)
  • 33. Peak region Genome sequence Signal enrichment Motif discovery analysis Protein binding site The fragment sequenced includes sequence beyond that of the actual binding
  • 34. Motif discovery analysis (example motif TGTTCT, shown recurring across the peak sequences) • Is there any motif/sequence over-represented in the sequences of the peaks (relative to the genome)? • Does such a motif correspond to any annotated motif (e.g. transcription factors)? • Motif discovery allows defining the binding site of the target protein as well as the binding of secondary proteins • Need to account for the fact that the peak may reflect a region broader than the precise binding site
  • 35. Motif discovery analysis 1. Find consensus motif The height of each letter is proportional to its frequency in that position within the motif RC = reverse complement
  • 36. Motif discovery analysis 1. Find consensus motif The height of each letter is proportional to its frequency in that position within the motif RC = reverse complement 2. Motif annotation Search for known transcription factors compatible with the consensus motif
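The per-position frequencies behind a motif logo, the consensus motif and its reverse complement (RC) can be computed from a set of aligned binding sites. The sketch below assumes the sites are already aligned and of equal length, which is the hard part that motif discovery tools solve; the site sequences are hypothetical variants of the TGTTCT example from slide 34.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def position_frequencies(sites):
    """Per-position base frequencies for equal-length, pre-aligned sites."""
    if not sites or any(len(site) != len(sites[0]) for site in sites):
        raise ValueError("sites must be non-empty and of equal length")
    frequencies = []
    for column in zip(*sites):
        counts = Counter(column)
        frequencies.append({base: counts[base] / len(sites) for base in "ACGT"})
    return frequencies


def consensus(frequencies):
    """Most frequent base at each position."""
    return "".join(max("ACGT", key=column.get) for column in frequencies)


# Hypothetical aligned sites resembling the TGTTCT motif.
sites = ["TGTTCT", "TGTTCT", "TGTACT", "AGTTCT"]
frequencies = position_frequencies(sites)
motif = consensus(frequencies)
motif_rc = reverse_complement(motif)
```

With these sites the consensus is TGTTCT and its reverse complement is AGAACA; both orientations need checking because a protein can bind either strand.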
  • 37. Downstream analyses Core analysis ChIP-seq RNA-seq … - Sample-level - Homogeneous - Similar steps across *seq types - Multi-sample - Project-specific - Varied/flexible - Combine different *seq types
  • 39. RNA-seq DNA Gene Exon1 … Exon2 ExonN Transcription et al* *splicing plus addition of polyA-tail Poly-A tail mRNA
  • 40. RNA-seq DNA Gene Exon1 … Exon2 ExonN Transcription et al* *splicing plus addition of polyA-tail Poly-A tail mRNA cDNA Poly-A selection + cDNA synthesis
  • 41. RNA-seq DNA Gene Exon1 … Exon2 ExonN Transcription et al* *splicing plus addition of polyA-tail Poly-A tail mRNA cDNA Poly-A selection + cDNA synthesis RNA-seq experiment targeting messenger RNA (mRNA) as this is one of the most common applications —however, other applications exist (e.g. total RNA, ribosomal RNA)
  • 42. RNA-seq DNA Gene Exon1 … Exon2 ExonN Transcription et al* *splicing plus addition of polyA-tail Poly-A tail mRNA cDNA Technical sequences (e.g. adapters) Poly-A selection + cDNA synthesis RNA-seq experiment targeting messenger RNA (mRNA) as this is one of the most common applications; however, other applications exist (e.g. total RNA, ribosomal RNA)
  • 43. RNA-seq DNA Gene Exon1 … Exon2 ExonN Transcription et al* *splicing plus addition of polyA-tail Poly-A tail mRNA cDNA Technical sequences (e.g. adapters) Poly-A selection + cDNA synthesis RNA-seq experiment targeting messenger RNA (mRNA) as this is one of the most common applications; however, other applications exist (e.g. total RNA, ribosomal RNA) Paired end Paired end
  • 45. Trimming - sequencing adapters - low-quality ends - too-short reads Trimmomatic Improves alignment to genome sequence
  • 46. Alignment Genome sequence Gene Exon1 … Exon2 ExonN STAR TopHat GEM … Read-by-read sequence alignment to genome sequence with the goal of identifying the genomic location from which the RNA originated. Some RNA-seq reads will originate from different exons and thus map to non-contiguous genomic positions; RNA-seq aligners need to be aware of this and split reads accordingly (red region) during the alignment
  • 47. Read counts profiles Genome sequence Gene Exon1 … Exon2 ExonN STAR bam2wig BEDtools SAMtools Deeptools … Normalise by the number of million reads to make different experiments comparable
  • 48. Expression quantification • The number of reads is proportional to the level of expression (i.e. more RNA, more reads) • Expression can be quantified at either gene- or transcript-level (Kallisto, HTSeq, featureCounts) • There are several units to measure expression: • read counts per gene/transcript: not normalised, does not account for the sample library size (i.e. the number of reads sequenced) or gene/transcript length • Reads Per Kilobase of transcript per Million mapped reads (RPKM): accounts for library size and locus length • Transcripts Per Million (TPM): accounts for library size and locus length • TPM is becoming more popular than RPKM, and some argue that RPKM values are inconsistent across samples (http://blog.nextgenetics.net/?e=51)
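The two normalised units on slide 48 can be compared with a short sketch using their standard definitions. The gene names, counts and lengths are hypothetical, and the total of the counts is used as the number of mapped reads purely for illustration.

```python
def rpkm(counts, lengths_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total_mapped_reads)
        for gene in counts
    }


def tpm(counts, lengths_bp):
    """Transcripts Per Million: length-normalise first, then scale to one million."""
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total_rate = sum(rates.values())
    if total_rate == 0:
        raise ValueError("no reads assigned to any gene")
    return {gene: rate / total_rate * 1e6 for gene, rate in rates.items()}


# Hypothetical sample: read counts and gene lengths in base pairs.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

sample_rpkm = rpkm(counts, lengths_bp, total_mapped_reads=sum(counts.values()))
sample_tpm = tpm(counts, lengths_bp)
```

geneA and geneB get the same value in both units because geneB has three times the reads but is three times longer. TPM values sum to one million in every sample by construction, whereas the sum of RPKM values depends on the sample, which is the basis of the consistency argument mentioned on the slide.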
  • 50. Genome Browser (hg38, chr6, axis ticks from 35,590,000 to 35,680,000, 50 kb scale bar; genes shown: FKBP5, SNORA40, MIR5690). Tracks show RPM profiles for: T47D gDNA (DNA-seq, peaks identified with MACS2 without control); T47D T0 (fd_004_02_01_rnaseq, fd_004_01_01_rnaseq, fd_004_03_01_rnaseq); T47D R6 (fd_006_03_01_rnaseq, fd_005_02_01_rnaseq, fd_005_01_01_rnaseq). Annotation tracks: GENCODE v24, Pseudogenes, Segmental Dups, Simple Repeats, RepeatMasker, WM + SDust. Sample groups labelled: Treated, Untreated
  • 51. Genome Browser (same view as slide 50). Indeed, looking for genes that show differential expression between two conditions (e.g. treated vs untreated) is likely the most common application of RNA-seq. It is not performed by visual inspection in the browser but with dedicated software (sleuth, DESeq2, edgeR); these account for biological/technical variation between replicates and assign a significance value to the differential expression
  • 54. Summary. ChIP-seq core analysis: Trimming (Trimmomatic); Alignment (BWA, Bowtie, GEM); Read counts profiles (bam2wig, BEDtools, SAMtools, Deeptools); Peak calling (MACS2, Zerone). ChIP-seq downstream analyses: Genome Browser; Overlaps of peaks genomic coordinates; Signal enrichment over regions; Genomic distribution of peaks; Motif discovery analysis; … RNA-seq core analysis: Trimming (Trimmomatic); Alignment (STAR, TopHat, GEM); Read counts profiles (STAR, bam2wig, BEDtools, SAMtools, Deeptools); Expression quantification (Kallisto, featureCounts, HTSeq). RNA-seq downstream analyses: Genome Browser; Differential expression analysis; …