DESeq, voom and vst
Qiang Kou
qkou@umail.iu.edu
April 28, 2014
Qiang Kou (qkou@umail.iu.edu) DESeq, voom and vst April 28, 2014 1 / 31
Background
Advantages of RNA-seq Compared to Microarray
Detecting novel transcripts and isoforms
High reproducibility, low background
Detection of gene fusions and SNPs
Differential Expression Analysis
Steps
Normalization
Dispersion estimation
Statistical testing
Methods to be presented
DESeq: negative binomial distribution [1]
voom: variance modelling at the observational level [2]
vst: variance-stabilizing transformation [1, 3]
Timeline
[Figure: timeline, 2002 to 2016, marking vst, limma, cufflinks, DESeq, edgeR, baySeq and voom]
Why different models?
RNA-seq is Discrete
Garber et al. (2011) Nature Methods 8:469-477
Length Normalization
Within sample: gene length
Between samples: library size
RPKM and FPKM
Reads/fragments per kilobase per million mapped reads
Normalization for gene length and library size
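A minimal sketch of the RPKM calculation, assuming the library size is taken as the column sum of the count matrix (a stand-in for "mapped reads"). The toy counts, gene lengths and the function name rpkm are hypothetical.

```r
# Hypothetical toy data: 3 genes x 2 samples (sample s2 is sequenced twice as deep)
counts <- matrix(c(100, 200, 300,
                   200, 400, 600), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
gene_length_bp <- c(g1 = 1000, g2 = 2000, g3 = 500)

rpkm <- function(counts, gene_length_bp) {
  lib_size <- colSums(counts)                      # between samples: library size
  per_million <- t(t(counts) / lib_size) * 1e6     # reads per million
  per_million / (gene_length_bp / 1e3)             # within sample: per kilobase
}

rpkm_values <- rpkm(counts, gene_length_bp)
```

In this toy example the two samples differ only in depth, so their RPKM columns coincide.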
Different Distribution
[Figure: density plots of expression values. (a) Microarray: density vs. expression. (b) RNA-seq: density vs. log10(fpkm) for genes, by condition (Untreated, CG8144_RNAi)]
Differential Expression as a Function of Transcript Length
[Figure: percentage of differentially expressed genes (%DE) vs. transcript length (bp). Panels: (a) Sequencing Data (Sultan), (b) Array Data (Sultan), (c) Sequencing Data (Cloonan), (d) Sequencing Data (Marioni), (e) Array Data (Marioni), (f) Sequencing Data (Marioni), (g) Array Data (Marioni)]
Oshlack et al. (2009) Biology Direct 4:14
Poisson and Negative Binomial Distribution
Poisson Distribution
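[Figure: Poisson probability mass function; graph from Wikipedia]
$\Pr(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
$E(X) = \mathrm{Var}(X) = \lambda$
A list of genes $g_1, g_2, \ldots, g_n$
$X \sim \mathrm{Poisson}(\lambda)$, a random variable representing the number of reads falling in $g_i$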
Likelihood ratio test
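The slide does not spell out the test, so the following is only one possible sketch: a likelihood ratio test comparing the Poisson rate of one gene between two samples. The function name poisson_lrt and the use of library sizes n1 and n2 as exposures are assumptions.

```r
# Likelihood ratio test for one gene: same Poisson rate in both samples (H0)
# versus a separate rate per sample (H1). n1 and n2 are library sizes.
poisson_lrt <- function(x1, x2, n1, n2) {
  rate0 <- (x1 + x2) / (n1 + n2)                 # pooled rate under H0
  ll0 <- dpois(x1, n1 * rate0, log = TRUE) + dpois(x2, n2 * rate0, log = TRUE)
  ll1 <- dpois(x1, x1, log = TRUE) + dpois(x2, x2, log = TRUE)  # MLE: n * (x / n) = x
  stat <- 2 * (ll1 - ll0)
  c(statistic = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

same <- poisson_lrt(50, 50, n1 = 1e6, n2 = 1e6)
diff <- poisson_lrt(100, 200, n1 = 1e6, n2 = 1e6)
```

The statistic is compared with a chi-squared distribution with one degree of freedom, which is the usual large-sample approximation.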
Negative Binomial Distribution
[Figure: negative binomial probability mass function; graph from Wikipedia]
$X \sim \mathrm{NB}(r; p)$
$\Pr(X = k) = \binom{k+r-1}{k} p^k (1 - p)^r$
p: probability of success
r: predefined number of failures
X: number of successes until r failures
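A quick check of the formula above against base R. Note that dnbinom() counts failures before a target number of successes, so the slide's p corresponds to prob = 1 - p there. The function name nb_pmf_slide and the values of r and p are illustrative.

```r
# Probability mass function exactly as written on the slide
nb_pmf_slide <- function(k, r, p) choose(k + r - 1, k) * p^k * (1 - p)^r

k <- 0:20
r <- 3
p <- 0.6

from_slide <- nb_pmf_slide(k, r, p)
from_base_r <- dnbinom(k, size = r, prob = 1 - p)  # roles of success/failure swapped

# Moments in the slide's parameterization: the variance exceeds the mean
nb_mean <- r * p / (1 - p)
nb_var <- r * p / (1 - p)^2
```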
DESeq, voom and vst
Normalization in DESeq
Assumption
Most genes are not differentially expressed
Differentially expressed genes are divided equally between up- and down-regulation
Steps
For each gene, compute the geometric mean of its counts across all samples
Divide each gene's counts by that geometric mean
Normalization factor for a sample: median of these ratios over all genes
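The three steps can be sketched in base R as follows. The function name size_factors and the toy counts are hypothetical; in the DESeq package this is done by estimateSizeFactors().

```r
size_factors <- function(counts) {
  # Step 1: log of the geometric mean of each gene across samples
  log_geo_mean <- rowMeans(log(counts))
  # Genes with a zero count in any sample have geometric mean 0 and are skipped
  keep <- is.finite(log_geo_mean)
  # Steps 2 and 3: per sample, the median ratio of counts to geometric mean
  apply(counts, 2, function(col) {
    exp(median(log(col[keep]) - log_geo_mean[keep]))
  })
}

# Toy data: sample s2 has exactly twice the depth of s1
counts <- cbind(s1 = c(10, 20, 30, 40), s2 = c(20, 40, 60, 80))
sf <- size_factors(counts)
normalized <- sweep(counts, 2, sf, "/")
```

After dividing by the size factors, the two toy samples have identical normalized counts.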
Model in DESeq
Read counts for gene i in sample j follow a negative binomial distribution:
$K_{ij} \sim \mathrm{NB}(\mu_{ij}, \sigma_{ij}^2)$
Why not Poisson distribution?
In RNA-seq, the variance is larger than the mean (overdispersion), while Poisson forces them to be equal
Very difficult to estimate $\mu_{ij}$ and $\sigma_{ij}^2$
Parameter estimation is the main difference between methods based on the NB distribution
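A small simulation illustrating the point, assuming R's mean/size parameterization of the negative binomial, where the variance is mu + mu^2 / size. The values of mu and size are arbitrary.

```r
set.seed(42)
mu <- 100

# Poisson: variance is tied to the mean
pois_counts <- rpois(10000, lambda = mu)

# Negative binomial with the same mean; size = 2 is a hypothetical value
# (smaller size means stronger overdispersion)
nb_counts <- rnbinom(10000, mu = mu, size = 2)

pois_ratio <- var(pois_counts) / mean(pois_counts)  # close to 1
nb_ratio <- var(nb_counts) / mean(nb_counts)        # far above 1
```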
Model in DESeq
Count sum for gene i in condition A: a
Count sum for gene i in condition B: b
Sum: κ = a + b
p(a), p(b) and p(a, b)
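p-value (the sums run over all splits i + j = κ of the total count; here i and j are summation indices, not gene and sample):
$p = \frac{\sum_{i+j=\kappa,\; p(i,j) \le p(a,b)} p(i, j)}{\sum_{i+j=\kappa} p(i, j)}$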
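A sketch of this p-value, assuming the two count sums are independent negative binomial variables with known means and a common size parameter, so that p(i, j) = p(i) p(j). The function name nb_exact_pvalue and all parameter values are hypothetical; DESeq estimates the means and variances from the data rather than taking them as inputs.

```r
nb_exact_pvalue <- function(a, b, mu_a, mu_b, size) {
  kappa <- a + b
  i <- 0:kappa                                   # all splits with i + j = kappa
  p_ij <- dnbinom(i, mu = mu_a, size = size) *
    dnbinom(kappa - i, mu = mu_b, size = size)
  p_ab <- p_ij[a + 1]                            # probability of the observed split
  # Sum over splits that are no more likely than the observed one
  sum(p_ij[p_ij <= p_ab]) / sum(p_ij)
}

p_balanced <- nb_exact_pvalue(10, 10, mu_a = 10, mu_b = 10, size = 2)
p_extreme <- nb_exact_pvalue(0, 20, mu_a = 10, mu_b = 10, size = 2)
```

A perfectly balanced observation is the most likely split under the null here, so every split is counted and the p-value is 1.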
R code for DESeq
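library(DESeq)
# Build the count data set from the simulated counts and treatment labels
DESeq.cds = newCountDataSet(countData = data.sim$counts,
                            conditions = factor(data.sim$treatment))
# Normalization: median-of-ratios size factors
DESeq.cds = estimateSizeFactors(DESeq.cds)
# Dispersion estimation with a local regression fit
DESeq.cds = estimateDispersions(DESeq.cds, fitType = "local")
# Negative binomial test between conditions "1" and "2"
DESeq.test = nbinomTest(DESeq.cds, "1", "2")
DESeq.pvalues = DESeq.test$pval
# Benjamini-Hochberg adjustment for multiple testing
DESeq.adjpvalues = p.adjust(DESeq.pvalues, method = "BH")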
Model in limma
Linear Models for Microarray Data: lmFit()
Classical t-test: $t_j = \frac{\mu_{1j} - \mu_{2j}}{\sqrt{\sigma_j^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
Very hard to estimate $\sigma_j^2$ from a small sample size
limma: moderated t-test
Use information from other genes
$\sigma_j^2 \sim \text{Inverse Gamma}(\alpha, \beta)$
Empirical Bayes for parameter estimation: eBayes()
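A sketch of how borrowing information changes the test statistic, following the posterior variance form in [3]: the gene's own variance is averaged with a prior variance, weighted by degrees of freedom. The function name moderated_t and the prior values d0 and s0sq are hypothetical inputs here; in limma they are estimated from all genes by eBayes().

```r
# Two-sample t statistic with the pooled variance shrunk toward a prior value.
# d0 = prior degrees of freedom, s0sq = prior variance (both supplied by hand).
moderated_t <- function(x1, x2, d0 = 0, s0sq = 1) {
  n1 <- length(x1)
  n2 <- length(x2)
  d <- n1 + n2 - 2                                   # residual degrees of freedom
  s2 <- (sum((x1 - mean(x1))^2) + sum((x2 - mean(x2))^2)) / d
  s2_post <- (d0 * s0sq + d * s2) / (d0 + d)         # shrunken variance
  (mean(x1) - mean(x2)) / sqrt(s2_post * (1 / n1 + 1 / n2))
}

x1 <- c(5.1, 4.8, 5.6)
x2 <- c(6.9, 7.4, 7.0)
t_classical <- moderated_t(x1, x2)                   # d0 = 0: no shrinkage
t_moderated <- moderated_t(x1, x2, d0 = 4, s0sq = 0.5)
```

With d0 = 0 the function reduces to the classical pooled-variance t statistic; a larger prior variance pulls an unusually small gene-wise variance upward and makes the statistic less extreme.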
Model in voom
voom: variance modelling at the observational level
Locally weighted regression to get the relation between count and variance
Moderated t-test in limma
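A simplified base R sketch of the idea, not limma's implementation: it assumes a single group (so the fitted value for a gene is just its mean log-cpm), simulated counts, and a lowess span of 0.5. All variable names are illustrative; voom() itself fits the linear model given by the design matrix.

```r
set.seed(1)
n_genes <- 200
n_samples <- 6
mu <- 2^seq(2, 10, length.out = n_genes)          # hypothetical expression levels
counts <- matrix(rnbinom(n_genes * n_samples, mu = rep(mu, n_samples), size = 5),
                 nrow = n_genes)
lib_size <- colSums(counts)

# log2 counts per million, with offsets to avoid taking the log of zero
log_cpm <- log2(t((t(counts) + 0.5) / (lib_size + 1)) * 1e6)

# Per gene: average log2 count size and square root of the standard deviation
mean_log_count <- rowMeans(log_cpm) + mean(log2(lib_size + 1)) - log2(1e6)
sqrt_sd <- sqrt(apply(log_cpm, 1, sd))

# Locally weighted regression of sqrt(sd) on the average log2 count
trend <- lowess(mean_log_count, sqrt_sd, f = 0.5)
predict_sqrt_sd <- approxfun(trend, rule = 2)

# Observation-level weight: inverse of the variance predicted by the trend
fitted_log_count <- outer(rowMeans(log_cpm), log2(lib_size + 1), "+") - log2(1e6)
weights <- matrix(1 / predict_sqrt_sd(fitted_log_count)^4, nrow = n_genes)
```

The weights would then be passed to a weighted linear model fit, which is where the moderated t-test in limma takes over.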
Model in voom
[Figure: voom mean-variance trend. Panels a to c plot sqrt(standard deviation) against average log2(count size + 0.5) (panels a and b) and fitted log2(count size + 0.5) (panel c)]
Law et al. Genome Biology 2014, 15:R29
R code for voom
library(limma)
library(edgeR)  # provides calcNormFactors()
group = factor(conditions)
design = model.matrix(~group)
# TMM normalization factors
nf = calcNormFactors(data.matrix, method = "TMM")
# voom transformation using effective library sizes
voom.data = voom(data.matrix, design = design,
                 lib.size = colSums(data.matrix) * nf)
voom.data$genes = rownames(data.matrix)
# Linear model fit and moderated t-test
voom.fitlimma = lmFit(voom.data, design = design)
voom.fitbayes = eBayes(voom.fitlimma)
voom.pvalues = voom.fitbayes$p.value[, 2]
voom.adjpvalues = p.adjust(voom.pvalues, method = "BH")
# Genes with adjusted p-value at most 0.05
voom.genes <- data.matrix[which(voom.adjpvalues <= 0.05), ]
Model in vst
Variance-stabilizing transformation
Find a simple function f such that the variability of the transformed values y = f(x) does not depend on the mean
A method used in microarray data analysis [4]
Moderated t-test in limma
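A textbook illustration of the idea, not the transformation DESeq uses: for Poisson counts the square root is approximately variance-stabilizing, with variance close to 1/4 whatever the mean. DESeq instead derives its transformation from the fitted dispersion-mean relation. The means below are arbitrary.

```r
set.seed(7)
lambdas <- c(20, 200, 2000)

# Raw counts: the variance grows with the mean
raw_var <- sapply(lambdas, function(l) var(rpois(10000, l)))

# After y = sqrt(x): the variance stays near 0.25 for every mean
vst_var <- sapply(lambdas, function(l) var(sqrt(rpois(10000, l))))
```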
R code for vst
library(DESeq)
library(limma)
group = factor(conditions)
DESeq.cds = newCountDataSet(countData = data.matrix, conditions = group)
DESeq.cds = estimateSizeFactors(DESeq.cds)
# Blind dispersion estimate: condition labels are ignored
DESeq.cds = estimateDispersions(DESeq.cds, method = "blind",
                                fitType = "local")
# Variance-stabilized expression matrix
DESeq.vst = getVarianceStabilizedData(DESeq.cds)
# Linear model fit and moderated t-test on the transformed data
DESeq.vst.fitlimma = lmFit(DESeq.vst, design = model.matrix(~group))
DESeq.vst.fitbayes = eBayes(DESeq.vst.fitlimma)
DESeq.vst.pvalues = DESeq.vst.fitbayes$p.value[, 2]
DESeq.vst.adjpvalues = p.adjust(DESeq.vst.pvalues, method = "BH")
# Genes with adjusted p-value at most 0.05
DESeq.vst.genes <- data.matrix[which(DESeq.vst.adjpvalues <= 0.05), ]
Results from Simulation
AUC Results
[Figure: AUC vs. number of samples per condition (5 to 15) for baySeq, DESeq, EBSeq, edgeR, NBPSeq, SAMseq, ShrinkSeq, TSPM, voom and vst]
Differential Expression Gene Number
[Figure: number of genes called differentially expressed by each software (baySeq, DESeq, NBPSeq, voom, vst, edgeR, ShrinkSeq, TSPM, EBSeq, SAMseq), split into correct and incorrect calls]
Running Time
[Figure: running time in seconds vs. number of samples per condition (5 to 15) for baySeq, DESeq, EBSeq, edgeR, NBPSeq, SAMseq, ShrinkSeq, TSPM, voom and vst]
Running Time with 15 Samples per Condition
Software    AUC      Time (sec)
edgeR       0.810      0.630
DESeq       0.652     48.388
NBPSeq      0.767     24.942
baySeq      0.495    210.781
EBSeq       0.769     12.666
TSPM        0.836      7.486
SAMseq      0.827      1.801
voom        0.835      0.264
vst         0.830      0.138
ShrinkSeq   0.796    343.260
Venn Diagram for Drosophila melanogaster
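[Figure: Venn diagram of differentially expressed genes called by DESeq, voom and vst; region counts shown in the figure: 4, 7, 13, 11, 310, 178, 17]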
Some Conclusions
Each method makes many assumptions
The negative binomial model has relatively better specificity and sensitivity
voom and vst perform well in both accuracy and running time, with no difference between them
All methods perform better with larger samples; however, sample size is very limited in practice
cuffdiff normalizes differently: it accounts for both alternative isoforms and transcript length
References
[1] Simon Anders and Wolfgang Huber. Differential expression analysis for sequence count data. Genome Biology, 11:R106, 2010.
[2] Charity W Law, Yunshun Chen, Wei Shi, and Gordon K Smyth. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15(2):R29, 2014.
[3] Gordon K Smyth. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3:Article 3, 2004.
[4] Blythe P Durbin, Johanna S Hardin, Douglas M Hawkins, and David M Rocke. A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics, pages S105–S110, 2002.
Thanks
Thank you for your time!
Qiang Kou
qkou@umail.iu.edu
Congestive Cardiac Failure..presentation
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 

DESeq, voom and vst

  • 1. DESeq, voom and vst. Qiang Kou (qkou@umail.iu.edu), April 28, 2014.
  • 2. Background: Advantages of RNA-seq Compared to Microarray. Detecting novel transcripts and isoforms; high reproducibility, low background; detection of gene fusions and SNPs.
  • 3. Background: Differential Expression Analysis. Steps: normalization, dispersion estimation, statistical testing. Methods to be presented: DESeq (negative binomial distribution) [1]; voom (variance modelling at the observational level) [2]; vst (variance-stabilizing transformation) [1, 3].
  • 4. Background: Timeline. [Figure: timeline from 2002 to 2016 showing, in order, vst, limma, cufflinks, DESeq, edgeR, baySeq, voom.]
  • 5. Background: Why different models?
  • 6. Background: RNA-seq is Discrete. [Figure from Garber et al. (2011) Nature Methods 8:469-477.]
  • 7. Background: Length Normalization. Within sample: gene length. Between samples: library size. RPKM and FPKM: reads/fragments per kilobase per million mapped reads; they normalize for both gene length and library size.
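The RPKM definition on slide 7 can be written out as a short calculation. The following is a minimal sketch in base R; the gene names, counts and lengths are made-up toy values, and the helper `rpkm()` is hypothetical (it is not a function from DESeq or limma).

```r
# Toy data: all names and numbers are invented for illustration.
counts <- c(geneA = 500, geneB = 500, geneC = 2000)
gene_length_bp <- c(geneA = 1000, geneB = 2000, geneC = 4000)

# Assumption: these three genes make up the whole library.
total_mapped_reads <- sum(counts)

# Reads per kilobase of gene per million mapped reads.
rpkm <- function(counts, length_bp, total_reads) {
  counts / (length_bp / 1e3) / (total_reads / 1e6)
}

rpkm_values <- rpkm(counts, gene_length_bp, total_mapped_reads)
print(rpkm_values)
```

In this toy example geneA and geneB have the same raw count, but geneB is twice as long, so its RPKM is half that of geneA.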
  • 8. Background: Different Distribution. [Figure: density plots. (a) Microarray: density of expression. (b) RNA-seq: density of log10(fpkm) over genes, for conditions Untreated and CG8144_RNAi.]
  • 9. Background: Differential Expression as a Function of Transcript Length. [Figure: %DE against transcript length (bp). Panels: (a) Sequencing Data (Sultan), (b) Array Data (Sultan), (c) Sequencing Data (Cloonan), (d) Sequencing Data (Marioni), (e) Array Data (Marioni), (f) Sequencing Data (Marioni), (g) Array Data (Marioni). From Oshlack et al. (2009) Biology Direct 4:14.]
  • 10. Background: Poisson and Negative Binomial Distribution
  • 11. Background: Poisson Distribution (graph from Wikipedia). $\Pr(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$, with $E(X) = \mathrm{Var}(X) = \lambda$. Given a list of genes $g_1, g_2, \ldots, g_n$, $X \sim \mathrm{Poisson}(\lambda)$ is a random variable representing the number of reads falling in $g_i$. Testing: likelihood ratio test.
  • 12. Background: Negative Binomial Distribution (graph from Wikipedia). $X \sim \mathrm{NB}(r; p)$, $\Pr(X = k) = \binom{k+r-1}{k} p^k (1 - p)^r$, where $p$ is the probability of success, $r$ is the predefined number of failures, and $X$ is the number of successes until $r$ failures.
  • 13. DESeq, voom and vst
  • 14. DESeq, voom and vst: Normalization in DESeq. Assumptions: most genes are not differentially expressed; differentially expressed genes are divided equally between up- and down-regulation. Steps: (1) compute the geometric mean of each gene's counts across all samples; (2) divide each gene's counts by that geometric mean; (3) take the median of these ratios within a sample as that sample's normalization factor.
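The three normalization steps on slide 14 can be sketched in base R as follows. This is only an illustration of the median-of-ratios idea, not the code of `estimateSizeFactors()`; the count matrix is a toy example and the helper name `median_of_ratios()` is hypothetical.

```r
# Toy count matrix: sample s2 is sequenced exactly twice as deeply as s1.
counts <- matrix(c(10,  20,
                   100, 200,
                   50,  100,
                   5,   10),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))

median_of_ratios <- function(counts) {
  # Step 1: log of the geometric mean of each gene across samples.
  log_geo_mean <- rowMeans(log(counts))
  # Genes with a zero count in any sample have a geometric mean of 0; skip them.
  keep <- is.finite(log_geo_mean)
  # Steps 2 and 3: per-sample ratios to the geometric mean, then their median.
  apply(counts, 2, function(sample_counts) {
    exp(median(log(sample_counts[keep]) - log_geo_mean[keep]))
  })
}

size_factors <- median_of_ratios(counts)
normalized <- sweep(counts, 2, size_factors, "/")
print(size_factors)
print(normalized)
```

Because the second sample has exactly twice the counts of the first, its size factor is twice as large, and the two normalized columns coincide.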
  • 15. Model in DESeq. Read counts for gene $i$ in sample $j$ follow a negative binomial distribution: $K_{ij} \sim \mathrm{NB}(\mu_{ij}, \sigma^2_{ij})$. Why not a Poisson distribution? In RNA-seq, the variance is larger than the mean. It is very difficult to estimate $\mu_{ij}$ and $\sigma^2_{ij}$. Parameter estimation is the main difference between methods based on the NB distribution.
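The point that the variance exceeds the mean can be seen in a small simulation. The sketch below uses base R's `rpois()` and `rnbinom()`; the mean of 100 and the `size` (inverse dispersion) of 5 are arbitrary values chosen for illustration, not estimates from the talk's data.

```r
set.seed(1)
n  <- 10000
mu <- 100

# Poisson counts: variance equals the mean.
poisson_counts <- rpois(n, lambda = mu)

# Negative binomial counts with the same mean: variance = mu + mu^2 / size.
nb_counts <- rnbinom(n, mu = mu, size = 5)

summary_table <- rbind(
  poisson = c(mean = mean(poisson_counts), variance = var(poisson_counts)),
  negbin  = c(mean = mean(nb_counts),      variance = var(nb_counts))
)
print(summary_table)
```

Both rows have a mean near 100, but only the negative binomial row shows a variance far above the mean, which is the overdispersion a Poisson model cannot capture.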
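  • 16. Model in DESeq. Count sum for gene $i$ in condition A: $a$. Count sum for gene $i$ in condition B: $b$. Sum: $\kappa = a + b$. From the fitted model, compute the probabilities $p(a)$, $p(b)$ and $p(a, b)$. The p-value sums the probabilities of all outcomes with the same total $\kappa$ that are no more likely than the observed one: $p = \frac{\sum_{i+j=\kappa,\; p(i,j) \le p(a,b)} p(i, j)}{\sum_{i+j=\kappa} p(i, j)}$.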
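The p-value on slide 16 can be sketched directly with `dnbinom()`. This is a simplified illustration rather than the code of `nbinomTest()`: it assumes the two condition sums are independent, so that p(i, j) = p(i) p(j), and it takes the means and `size` parameters as given. The function name `nb_exact_pvalue()` and all numeric inputs are hypothetical.

```r
# Exact test conditional on the total kappa = a + b.
# Assumption: p(i, j) = p(i) * p(j), each factor a negative binomial pmf.
nb_exact_pvalue <- function(a, b, mu_a, mu_b, size_a, size_b) {
  kappa <- a + b
  i <- 0:kappa
  # Probability of every split (i, kappa - i) of the total.
  p_ij <- dnbinom(i, mu = mu_a, size = size_a) *
          dnbinom(kappa - i, mu = mu_b, size = size_b)
  p_ab <- p_ij[a + 1]
  # Sum over splits that are at most as likely as the observed one.
  sum(p_ij[p_ij <= p_ab]) / sum(p_ij)
}

# Balanced counts under identical parameters: no evidence of a difference.
p_center <- nb_exact_pvalue(50, 50, mu_a = 50, mu_b = 50, size_a = 5, size_b = 5)
# Very unbalanced counts under identical parameters: strong evidence.
p_extreme <- nb_exact_pvalue(0, 100, mu_a = 50, mu_b = 50, size_a = 5, size_b = 5)
print(c(center = p_center, extreme = p_extreme))
```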
  • 17. Model in DESeq: R code for DESeq

      library(DESeq)

      # Build the count data set from the simulated counts and treatment labels
      DESeq.cds = newCountDataSet(countData = data.sim$counts,
                                  conditions = factor(data.sim$treatment))
      # Normalization: median-of-ratios size factors
      DESeq.cds = estimateSizeFactors(DESeq.cds)
      # Dispersion estimation with a local regression fit
      DESeq.cds = estimateDispersions(DESeq.cds, fitType = "local")
      # Negative binomial test between condition "1" and condition "2"
      DESeq.test = nbinomTest(DESeq.cds, "1", "2")
      DESeq.pvalues = DESeq.test$pval
      # Benjamini-Hochberg correction for multiple testing
      DESeq.adjpvalues = p.adjust(DESeq.pvalues, method = "BH")
  • 18. Model in limma. Linear Models for Microarray Data: lmFit(). Classical t-test: $t_j = \frac{\mu_{1j} - \mu_{2j}}{\sqrt{\sigma_j^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$. It is very hard to estimate $\sigma_j^2$ from a small sample. limma uses a moderated t-test that borrows information from other genes: $\sigma_j^2 \sim \text{Inverse Gamma}(\alpha, \beta)$. Empirical Bayes parameter estimation: eBayes().
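To make the difference between the classical and the moderated t-test concrete, here is a minimal base R sketch for a single gene. In limma the prior degrees of freedom and prior variance are estimated from all genes by `eBayes()`; here `d0` and `s0_sq` are fixed by hand, which is an assumption made purely for illustration, as are the expression values.

```r
# Toy log-expression values for one gene in two groups (invented numbers).
group1 <- c(5.1, 4.8, 5.4)
group2 <- c(6.0, 6.3, 5.9)
n1 <- length(group1)
n2 <- length(group2)

# Pooled sample variance and its residual degrees of freedom.
d    <- n1 + n2 - 2
s_sq <- ((n1 - 1) * var(group1) + (n2 - 1) * var(group2)) / d

# Hypothetical prior: in practice eBayes() estimates these from all genes.
d0    <- 4
s0_sq <- 0.1

# Moderated variance: weighted average of prior and sample variance.
s_tilde_sq <- (d0 * s0_sq + d * s_sq) / (d0 + d)

scale_factor <- 1 / n1 + 1 / n2
t_classical  <- (mean(group1) - mean(group2)) / sqrt(s_sq * scale_factor)
t_moderated  <- (mean(group1) - mean(group2)) / sqrt(s_tilde_sq * scale_factor)
print(c(classical = t_classical, moderated = t_moderated))
```

The moderated variance always lies between the gene's own variance and the prior variance, so genes with unusually small sample variances no longer produce extreme t-statistics by chance.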
  • 19. Model in voom. voom: variance modelling at the observational level. Locally weighted regression is used to estimate the relation between count size and variance; the moderated t-test in limma is then applied.
  • 20. Model in voom. [Figure: voom mean-variance trend, sqrt(standard deviation) against average log2(count size + 0.5) in panels (a) and (b), and against fitted log2(count size + 0.5) in panel (c). From Law et al. Genome Biology 2014, 15:R29.]
  • 21. Model in voom: R code for voom

      library(limma)
      library(edgeR)  # calcNormFactors() is provided by edgeR

      group = factor(conditions)
      # TMM normalization factors
      nf = calcNormFactors(data.matrix, method = "TMM")
      # voom transformation with precision weights, using the effective library sizes
      voom.data = voom(data.matrix, design = model.matrix(~group),
                       lib.size = colSums(data.matrix) * nf)
      voom.data$genes = rownames(data.matrix)
      # Linear model fit followed by the empirical Bayes moderated t-test
      voom.fitlimma = lmFit(voom.data, design = model.matrix(~group))
      voom.fitbayes = eBayes(voom.fitlimma)
      # Column 2 holds the p-values of the group coefficient
      voom.pvalues = voom.fitbayes$p.value[, 2]
      voom.adjpvalues = p.adjust(voom.pvalues, method = "BH")
      # Genes called differentially expressed at an adjusted p-value of 0.05
      voom.genes <- data.matrix[which(voom.adjpvalues <= 0.05), ]
  • 22. Model in vst. Variance-stabilizing transformation: find a simple function $f$ that creates new values $y = f(x)$ such that the variability of $y$ is not related to its mean. The method was first used in microarray data analysis [4]. Testing uses the moderated t-test in limma.
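A classic textbook instance of a variance-stabilizing transformation is the square root applied to Poisson counts, whose variance after transformation is roughly 1/4 whatever the mean. The sketch below illustrates that idea only; it is not the transformation computed by `getVarianceStabilizedData()`, which is derived from the fitted negative binomial dispersion trend. The means 20 and 200 are arbitrary.

```r
set.seed(1)
n <- 100000

# Poisson counts at a low and a high mean (arbitrary illustrative values).
low  <- rpois(n, lambda = 20)
high <- rpois(n, lambda = 200)

# Raw scale: the variance grows with the mean.
raw_variances <- c(low = var(low), high = var(high))

# After y = f(x) = sqrt(x): the variance is roughly constant, close to 0.25.
vst_variances <- c(low = var(sqrt(low)), high = var(sqrt(high)))

print(rbind(raw = raw_variances, sqrt_transformed = vst_variances))
```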
  • 23. Model in vst: R code for vst

      library(DESeq)
      library(limma)

      group = factor(conditions)
      DESeq.cds = newCountDataSet(countData = data.matrix, conditions = group)
      DESeq.cds = estimateSizeFactors(DESeq.cds)
      # "blind" ignores the condition labels when estimating dispersions
      DESeq.cds = estimateDispersions(DESeq.cds, method = "blind",
                                      fitType = "local")
      # Variance-stabilized expression matrix
      DESeq.vst = getVarianceStabilizedData(DESeq.cds)
      # Linear model fit followed by the empirical Bayes moderated t-test
      DESeq.vst.fitlimma = lmFit(DESeq.vst, design = model.matrix(~group))
      DESeq.vst.fitbayes = eBayes(DESeq.vst.fitlimma)
      DESeq.vst.pvalues = DESeq.vst.fitbayes$p.value[, 2]
      DESeq.vst.adjpvalues = p.adjust(DESeq.vst.pvalues, method = "BH")
      # Genes called differentially expressed at an adjusted p-value of 0.05
      DESeq.vst.genes <- data.matrix[which(DESeq.vst.adjpvalues <= 0.05), ]
  • 24. Results from Simulation: AUC Results. [Figure: AUC against number of samples per condition (5 to 15) for baySeq, DESeq, EBSeq, edgeR, NBPSeq, SAMseq, ShrinkSeq, TSPM, voom and vst.]
  • 25. Results from Simulation: Differential Expression Gene Number. [Figure: number of genes called by each software (baySeq, DESeq, NBPSeq, voom, vst, edgeR, ShrinkSeq, TSPM, EBSeq, SAMseq), split into correct and incorrect calls.]
  • 26. Results from Simulation: Running Time. [Figure: running time (sec) against number of samples per condition (5 to 15) for baySeq, DESeq, EBSeq, edgeR, NBPSeq, SAMseq, ShrinkSeq, TSPM, voom and vst.]
  • 27. Results from Simulation: Running Time with 15 Samples per Condition

      Software    AUC     Time (sec)
      edgeR       0.810     0.630
      DESeq       0.652    48.388
      NBPSeq      0.767    24.942
      baySeq      0.495   210.781
      EBSeq       0.769    12.666
      TSPM        0.836     7.486
      SAMseq      0.827     1.801
      voom        0.835     0.264
      vst         0.830     0.138
      ShrinkSeq   0.796   343.260
  • 28. Results from Simulation: Venn Diagram for Drosophila melanogaster. [Figure: Venn diagram of the genes called by DESeq, voom and vst; region counts in extraction order: 4, 7, 13, 11, 310, 178, 17.]
  • 29. Some Conclusions. Each method makes many assumptions. The negative binomial model has relatively better specificity and sensitivity. voom and vst perform well in both accuracy and running time, with no difference between them. All methods perform better with larger samples; however, sample size is very limited in practice. cuffdiff uses a different normalization that accounts for both alternative isoforms and transcript length.
  • 30. References
      [1] Simon Anders and Wolfgang Huber. Differential expression analysis for sequence count data. Genome Biology, 11:R106, 2010.
      [2] Charity W Law, Yunshun Chen, Wei Shi, and Gordon K Smyth. Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15(2):R29, 2014.
      [3] Gordon K Smyth. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3:Article 3, 2004.
      [4] Blythe P Durbin, Johanna S Hardin, Douglas M Hawkins, and David M Rocke. A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics, pages S105–S110, 2002.
  • 31. Thanks. Thank you for your time! Qiang Kou, qkou@umail.iu.edu