Chapter 12
Reviewer : Sunwoo Kim
Christopher M. Bishop
Pattern Recognition and Machine Learning
Yonsei University
Department of Applied Statistics
Chapter 12. Continuous Latent Variables
2
What are we doing?
This chapter is pretty familiar to us.
Unlike the discrete latent variables we covered in chapters 9 and 10, here we deal with 'continuous' latent variables.
We have already met such variables in the multivariate statistical analysis class, in the form of PCA.
First, let's think about what a 'continuous latent variable' is.
Consider the figure above showing the digit 3.
The images are all the digit 3, changed only by 'rotation', 'vertical translation', and 'horizontal translation'.
These changes correspond to three degrees of freedom, so the images can be described by a three-dimensional (ℝ³) latent space!
Note that the changes are not observed perfectly; they come with 'noise'. Thus it is important to account for the noise and errors!
Now, let’s take a deeper look at some methods.
Chapter 12.1. Principal Component Analysis
3
What is PCA?
Originally, PCA was proposed to overcome the problem of ‘multicollinearity’.
That is, we re-organize the features into uncorrelated vectors!
Here, let's focus on the perspective of 'dimensionality reduction'.
Goals of PCA.
PCA1. We are finding projections which maximize the variance.
PCA2. We are finding projections which minimize the reconstruction error.
The aims of the two approaches are different.
However, we will see that the results are the same!!
Let's take a look at how these approaches differ.
PCA is a method which projects each data point onto the eigenvectors of the covariance matrix.
The reason why 'eigenvectors of the covariance matrix' are used will be covered soon.
If we use all the eigenvectors, we simply transform the original features into the same
number of uncorrelated variables.
If we keep only some of them, we are performing dimensionality reduction!
Chapter 12.1. Principal Component Analysis
4
Variance maximization
Let's begin with a projection onto a one-dimensional space (M = 1).
We are trying to project each data point onto a vector u₁ of unit length (u₁ᵀu₁ = 1).
Here, the variance of the projected data can be written as
As we mentioned, our primary goal is to find u₁ that maximizes the variance of the projected data. This can be obtained by constrained optimization!
By introducing a Lagrange multiplier and setting the derivative to zero, we get
That is, the resulting u₁ is an eigenvector
of the covariance matrix S.
Here, the variance can be calculated as
What does this mean? It indicates that the resulting variance equals the eigenvalue of
the corresponding eigenvector. Thus, we should choose the eigenvectors starting from
the largest eigenvalue, in descending order!
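As a quick numerical illustration of this slide (a minimal sketch with synthetic data; the variable names are only illustrative), we can check that projecting onto the leading eigenvector of the sample covariance matrix gives a projection whose variance equals the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.3, 0.2]])
X = rng.normal(size=(200, 3)) @ A          # synthetic correlated data (N x D)

S = np.cov(X, rowvar=False)                # sample covariance matrix S
eigvals, eigvecs = np.linalg.eigh(S)       # eigh handles symmetric matrices
order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

u1 = eigvecs[:, 0]                         # first principal direction, unit length
proj = (X - X.mean(axis=0)) @ u1           # one-dimensional projection
print(np.isclose(proj.var(ddof=1), eigvals[0]))   # variance of projection == largest eigenvalue
```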
Chapter 12.1. Principal Component Analysis
5
Error minimization
Here, we consider the same eigenvector formulation.
However, this time we try to minimize the error that occurs due to the reduction of dimensionality!
** Then why does such an error occur?
Consider a dataset with 7 features. If we reduce it to 5 new features, there will obviously be a difference!
If we assume the basis is complete, every data point can be represented as a linear combination of the basis vectors.
That is, writing the basis vectors as uᵢ, we can choose the coefficients α in the obvious way, namely αₙⱼ = xₙᵀuⱼ. Then we can write
Now suppose we perform
dimensionality reduction.
The approximation can be written as
Note that zₙᵢ is defined separately for each
data point, while bᵢ is a universal constant
that is applied equally to every data point!
Chapter 12.1. Principal Component Analysis
6
Error minimization
Thus, the overall error can be defined as
By setting the derivatives to zero, we can rewrite z and b as follows…
By applying the new notation, we obtain… Thus, the error we are trying to minimize is…
Again, we can see this has the same form as the previous variance maximization case, and we can again use a Lagrange multiplier.
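To make the equivalence concrete, here is a small sketch (synthetic data, illustrative names) checking that the mean squared reconstruction error after keeping the top-M components equals the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 3] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=200)   # induce correlation between features

Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / len(Xc)                     # covariance with the 1/N convention
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

M = 2
U = eigvecs[:, :M]                          # keep the top-M eigenvectors
X_hat = Xc @ U @ U.T                        # project down, then reconstruct
recon_err = np.mean(np.sum((Xc - X_hat) ** 2, axis=1))
print(np.isclose(recon_err, eigvals[M:].sum()))   # error == sum of discarded eigenvalues
```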
Chapter 12.1. Principal Component Analysis
7
Interpretation & Application
Here, we can achieve some interesting insights.
1. First, we tried to maximize the variance of the transformed data.
2. From a different perspective, we tried to minimize the reconstruction error.
3. The results were the same! Furthermore, we can interpret the eigenvalues of the discarded eigenvectors as the loss of information.
4. That is why the chosen eigenvalues can be interpreted as the proportion of variance explained!
We can use PCA for…
- Dimensionality reduction - Data transformation (to uncorrelated variables) - Data visualization
This can be applied to multicollinearity issues,
which we covered in regression analysis!
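As a small sketch of point 4 (synthetic data; names are illustrative), the proportion of variance explained by each component is simply its eigenvalue divided by the sum of all eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # arbitrary correlated data
S = np.cov(X, rowvar=False)
lam = np.sort(np.linalg.eigvalsh(S))[::-1]                # eigenvalues, descending
explained = lam / lam.sum()                               # proportion explained per component
print(np.round(explained, 3), np.round(np.cumsum(explained), 3))
```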
Chapter 12.2. Probabilistic PCA
8
Idea of probabilistic PCA
In fact, the PCA described so far is not really 'statistics'.
Rather, it is pure linear algebra, carried out by deterministic matrix computations.
Now, let's view PCA from the perspective of statistics and probability!
We consider a Gaussian latent variable with prior p(z) = N(z | 0, I).
Similarly, we define the conditional distribution of the data as p(x|z) = N(x | Wz + μ, σ²I).
Note that the data is a linear function of the latent variable plus noise!
Thus, we can write our data as…
𝑿 = 𝑾𝒛 + 𝝁 + 𝝐
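A minimal sketch of this generative model (the values of W, μ, and σ² below are hypothetical, chosen only for illustration): sampling x = Wz + μ + ε and checking that the empirical covariance approaches C = WWᵀ + σ²I.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, N = 5, 2, 20000
W = rng.normal(size=(D, M))                        # hypothetical loading matrix
mu = rng.normal(size=D)
sigma2 = 0.1

Z = rng.normal(size=(N, M))                        # z ~ N(0, I_M)
eps = rng.normal(scale=np.sqrt(sigma2), size=(N, D))
X = Z @ W.T + mu + eps                             # x = W z + mu + eps

emp_cov = np.cov(X, rowvar=False)
C = W @ W.T + sigma2 * np.eye(D)                   # model covariance
print(np.abs(emp_cov - C).max())                   # small for large N
```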
Chapter 12.2. Probabilistic PCA
9
Estimating parameters
Here, we have to estimate W, μ, and σ².
The likelihood function can be written as
** Detailed calculations are skipped!
This can be simply derived by…
Here, an orthogonal transformation R of W does not affect the
resulting distribution, since the marginal covariance depends on W only through WWᵀ.
Thus, the estimation of W is determined only up to such a rotation R.
Chapter 12.2. Probabilistic PCA
10
Estimating parameters
To evaluate the predictive distribution and find the posterior, we need the inverse of C. Using the fact that
The resulting posterior is…
Here, note that the only observed variable is xₙ.
The rest are not observed; they exist only as random variables or unknown constants!
Note that this is quite similar to Bayesian regression, but here W, z, and μ are all unobserved,
so the estimation process is a bit different!
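For reference, here is a small sketch of the posterior over z given x under assumed (illustrative) parameters, using the standard form E[z|x] = M⁻¹Wᵀ(x − μ) and Cov[z|x] = σ²M⁻¹ with M = WᵀW + σ²I:

```python
import numpy as np

rng = np.random.default_rng(0)
D, latent_dim = 4, 2
W = rng.normal(size=(D, latent_dim))               # hypothetical parameters
mu = np.zeros(D)
sigma2 = 0.5
x = rng.normal(size=D)                             # a single observed data point

M = W.T @ W + sigma2 * np.eye(latent_dim)          # M = W^T W + sigma^2 I
post_mean = np.linalg.solve(M, W.T @ (x - mu))     # E[z | x]
post_cov = sigma2 * np.linalg.inv(M)               # Cov[z | x]
print(post_mean, post_cov)
```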
Chapter 12.2. Probabilistic PCA
11
Maximum likelihood solution
From the marginal distribution of x, we can estimate W, μ, and σ².
Here, the closed-form solution for μ is the sample mean x̄.
This is really intuitive!
By substituting x̄ for μ, we get the following
form of the likelihood. Note that S
corresponds to the sample covariance matrix of X.
Direct estimation of W looks hard, but the maximum likelihood solution is known to have the closed form shown here.
- Here, U_M is the matrix whose columns are the top-M eigenvectors of S.
- L_M is the diagonal matrix of the corresponding eigenvalues.
- R is an arbitrary orthogonal matrix. (This arbitrary R reflects the rotational
indeterminacy noted earlier: the likelihood depends on W only through WWᵀ, so any rotation of the latent space fits equally well.)
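A sketch of how this closed-form fit is typically computed in practice (assuming the standard formulas W_ML = U_M(L_M − σ²I)^{1/2}R with R = I, and σ²_ML equal to the average of the discarded eigenvalues, as discussed on the next slide; the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 1000, 6, 2
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))    # synthetic correlated data

Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / N                                        # sample covariance (1/N convention)
lam, U = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

sigma2_ml = lam[M:].mean()                               # average of the discarded eigenvalues
W_ml = U[:, :M] @ np.diag(np.sqrt(lam[:M] - sigma2_ml))  # R taken as the identity matrix
C = W_ml @ W_ml.T + sigma2_ml * np.eye(D)                # fitted model covariance
print(np.round(sigma2_ml, 4))
```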
Chapter 12.2. Probabilistic PCA
12
Maximum likelihood solution
For the variance term, it can be computed by…
There is one notable fact.
Even though the estimated matrix W depends on R, the product WWᵀ is invariant to R (since (WR)(WR)ᵀ = WWᵀ, as noted earlier).
Note that if we set R to the identity matrix, W reduces to the basic PCA solution, rescaled:
each column keeps the direction of the corresponding eigenvector and is only scaled by √(λᵢ − σ²).
Let's revisit the form of the covariance matrix once again.
As we have seen, C = WWᵀ + σ²I. Consider the variance of the model along a unit vector v (vᵀv = 1), namely vᵀCv.
If v is orthogonal to the retained eigenvectors, it yields vᵀCv = σ², since the first term vanishes; in those directions the model explains only noise.
If v is one of the retained principal directions, v = uᵢ, then the variance becomes (λᵢ − σ²) + σ² = λᵢ, which means the model captures that variance exactly!
Then what does this mean??
In my opinion, this is about how well the latent variable z captures the data.
In the directions spanned by the leading eigenvectors of the covariance matrix, the covariance of the generated data X is
similar to that of the original data.
On the other hand, in the directions not captured by the latent space (orthogonal to the retained PCs), the resulting variance is only σ², i.e., the
noise term.
𝑿 = 𝑾𝒛 + 𝝁 + 𝝐
Chapter 12.2. Probabilistic PCA
13
Intuitive understanding
Let’s see whether the resulting parameters fit our intuition.
First, what if we use a full latent vector, i.e., the latent dimension equals the original dimension? Then…
the model exactly recovers the original data's covariance!
Secondly, unlike standard PCA, which maps data points onto the reduced subspace, probabilistic PCA maps latent vectors into the data space.
That is,
On the other hand, if we let σ² → 0, the posterior mean becomes…
which means we are projecting the data onto the estimated W subspace!
Chapter 12.2. Probabilistic PCA
14
EM algorithm for PCA
You may wonder why we apply EM to PCA despite the fact that we know the closed form of each parameter.
It's because EM is sometimes computationally more efficient!
Here, we treat z as the latent variable, and W and σ²
as our parameters (μ is fixed at the sample mean x̄)!
The parameters can be computed as
Complete-data log-likelihood (to be maximized)
E-step
M-step
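A minimal EM sketch for probabilistic PCA under the usual update equations (E-step posterior moments, M-step updates of W and σ²); the data, dimensions, and fixed iteration count are illustrative choices rather than a tuned implementation:

```python
import numpy as np

def ppca_em(X, latent_dim, n_iter=100, seed=0):
    """EM for probabilistic PCA: a compact illustrative sketch, not optimized."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Xc = X - X.mean(axis=0)                       # mu is fixed at the sample mean
    W = rng.normal(size=(D, latent_dim))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(latent_dim))
        Ez = Xc @ W @ Minv                        # rows are E[z_n]
        Ezz = N * sigma2 * Minv + Ez.T @ Ez       # sum_n E[z_n z_n^T]
        # M-step: re-estimate W and sigma^2 (new W is used in the sigma^2 update)
        W = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(Ezz @ W.T @ W)) / (N * D)
    return W, sigma2

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
W_hat, sigma2_hat = ppca_em(X, latent_dim=2)
print(np.round(sigma2_hat, 4))
```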
Chapter 12.2. Probabilistic PCA
15
Bayesian PCA
Naturally, we can also treat PCA from a Bayesian perspective.
This can help in choosing an appropriate dimensionality for the latent vectors, with the help of the model evidence!
However, to obtain a Bayesian model we need to marginalize over the parameters, in particular W, which is analytically intractable.
To make the computation tractable, we use ARD (automatic relevance determination). / Detailed calculations are skipped!
That is, a separate prior, with its own precision αᵢ, is defined for each column of the W matrix!
Then, as usual, we find appropriate α values after integrating W out of the likelihood (this can also be computed using Gibbs sampling!).
Then, we can get
p(x|z) = N(x | Wz + μ, σ²I),
where A is a diagonal matrix filled with the αᵢ.
Note that the α update also involves W, so we need an estimate of it as well!
So, we obtain the quantities in a sequential way, like EM!
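A tiny sketch of the ARD re-estimation step as it is usually written, αᵢ = D / (wᵢᵀwᵢ) for each column wᵢ of the current W (the W values below are hypothetical, chosen so that one column is nearly switched off):

```python
import numpy as np

# Hypothetical current estimate of W (D x M); one ARD precision per column.
W = np.array([[ 1.2,  0.01],
              [ 0.8, -0.02],
              [-0.9,  0.03]])
D = W.shape[0]
alpha = D / np.sum(W ** 2, axis=0)   # alpha_i = D / (w_i^T w_i)
print(np.round(alpha, 2))            # the near-zero column gets a huge precision, i.e. gets pruned
```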
Chapter 12.2. Probabilistic PCA
16
Factor Analysis
We have already studied this topic in detail in multivariate statistics!
It can be expressed as…
Likewise, we assume the data is generated from some
unobserved factors, and we recover the factors through a
linear combination plus error terms!
This is really similar to probabilistic PCA.
(Diagram: unobserved factors 'Intelligence' and 'Language skill' generate the observed data 'IQ-test', 'Soo-Neung', and 'GPA'.)
Here, Ψ is a D×D diagonal matrix called the 'uniqueness', representing independent noise for each observed variable.
Furthermore, W is called the 'factor loading' matrix.
Similarly, we can find that p(x) = N(x | μ, WWᵀ + Ψ).
Then, we can find
Chapter 12.3. Kernel PCA
17
Kernel-based approach of PCA
We studied kernels, in particular valid kernels (kernels for which we do not need to explicitly compute φ(x)), in chapters 6 and 7.
Kernels can also be applied to PCA, which means we project the data points into a non-linear feature space!
As you can see, it is natural to express the products xₙxₙᵀ in kernel form!
Chapter 12.3. Kernel PCA
18
Calculation
We map each data point to a non-linear feature vector φ(xₙ).
As we have studied, computing φ explicitly is not a good idea, since some kernels (like the Gaussian kernel) correspond to an infinite-dimensional feature space
whose inner products we can only evaluate implicitly.
First, let's assume the mapped features have zero mean (Σₙ φ(xₙ) = 0). (This is unrealistic, but we will relax the condition later.)
As we have seen, the resulting covariance matrix and eigenvalue equation can be written as
The last equation can be rewritten as follows. Note that each eigenvector is a linear
combination of the φ(xₙ),
and the coefficient of each φ(xₙ) is a scalar.
However, a bare φ(xₙ) term still remains.
We have to turn it into kernel form!
By taking the inner product of both sides with φ(xₗ)…
Chapter 12.3. Kernel PCA
19
Calculation
Using matrix notation, we can write this as…
There is a factor of K on both sides of the equation.
We can cancel it by multiplying both sides by K⁻¹.
Since a valid kernel matrix is positive definite (λᵢ > 0), its inverse always exists!
** Note that det K ≠ 0 when K is positive definite.
Thus, the overall equation can be expressed as…
Normalization condition for the aᵢ
How to compute the projection of a data point onto the principal components.
Chapter 12.3. Kernel PCA
20
Non-centered data vector
So far we have assumed that the φ(xₙ) are centered (zero mean).
However, in practice the mapped features are generally not centered. Thus, we have to center them before working with the covariance matrix C.
Note that here again we must avoid computing φ(xₙ) explicitly!
On the left is the form of the centered features. Let's rewrite the Gram matrix K in terms of these tilde values!
The equations on the left can be written in this matrix notation,
where 1_N denotes the N×N matrix in which
every element equals 1/N.
Note that computing such a Gram matrix is not always feasible,
since it is an N×N matrix.
For very large datasets it can become intractable,
so approximation methods may be used.
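Putting the pieces of this section together, here is a minimal kernel PCA sketch (Gaussian kernel, synthetic data; the kernel width and names are illustrative): build the Gram matrix, center it with the 1_N trick, eigendecompose, and project the training points.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
N = len(X)

K = rbf_kernel(X, X)
one_N = np.full((N, N), 1.0 / N)                          # the 1_N matrix from this slide
K_tilde = K - one_N @ K - K @ one_N + one_N @ K @ one_N   # centered Gram matrix

eigvals, A_coef = np.linalg.eigh(K_tilde)
order = np.argsort(eigvals)[::-1]
eigvals, A_coef = eigvals[order], A_coef[:, order]

M = 2
A_coef = A_coef[:, :M] / np.sqrt(eigvals[:M])   # normalize so feature-space eigenvectors have unit length
projections = K_tilde @ A_coef                  # kernel PCA coordinates of the training points
print(projections.shape)
```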
Chapter 12.3. Kernel PCA
21
Example of kernel PCA
In this example, a Gaussian kernel
of the following form is used.
The lines (a bit blurred here; please look at the
figure in the textbook) are the principal components.
Check that the PC lines capture the
original data distribution well!
Let's see a more practical example.
Chapter 12.3. Kernel PCA
22
Example of kernel PCA
This example is from the Wikipedia article on kernel PCA:
https://en.wikipedia.org/wiki/Kernel_principal_component_analysis
That is, kernel PCA can make such data separable using just a single dimension (the x-axis in the figure)!
Chapter 12.4. Nonlinear Latent Variable Models
23
Non-Gaussian?
Restricting the model to be linear and the distributions to be Gaussian may limit practical applications.
Thus, let's briefly cover some more general latent variable models.
Independent component analysis
The overall discussion begins from
That is, the latent variables are assumed to be independent!
Note that previously we only imposed linear de-correlation (orthogonality);
here we assume full probabilistic independence!
The power of this method is that we do not assume a Gaussian structure for the model.
Here, for example, we can assume the distribution to be
There are not many examples in the
book.. So.. anyone who is interested
may do some extra study.. ;(
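For a practical taste (not from the book), here is a hedged sketch using scikit-learn's FastICA, assuming scikit-learn is installed; the two synthetic non-Gaussian sources and the mixing matrix are purely illustrative:

```python
import numpy as np
from sklearn.decomposition import FastICA   # assumes scikit-learn is available

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sign(np.sin(3 * t))                 # square-wave source (non-Gaussian)
s2 = rng.laplace(size=t.size)               # heavy-tailed source (non-Gaussian)
S = np.c_[s1, s2]
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                  # mixing matrix
X = S @ A.T                                 # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                # recovered sources, up to sign/scale/permutation
print(S_hat.shape)
```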
Chapter 12.4. Nonlinear Latent Variable Models
24
Auto-associative neural networks
This is a neural-network model for dimensionality reduction.
Its overall form is the same as the autoencoder we are all familiar with!
In order to train this network, we need an error function,
which can be the sum-of-squares reconstruction error.
If the network contains no non-linear functions, it learns something very close to
basic PCA. However, there is no orthogonality constraint on the hidden units.
To introduce non-linearity, we can use a deeper model (the one on the left-hand side!).
This model can be interpreted as follows…
Please note that the mapping F₂
can be a non-linear embedding, since it
contains non-linear units between the
network layers!
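As a rough illustration (assuming PyTorch is available; the layer sizes, data, and training schedule are arbitrary choices, not taken from the book), a small auto-associative network with a linear bottleneck and non-linear hidden layers can be trained on its own inputs like this:

```python
import torch
import torch.nn as nn

D, M = 10, 2
model = nn.Sequential(
    nn.Linear(D, 16), nn.Tanh(),   # non-linear hidden layer (part of F1)
    nn.Linear(16, M),              # bottleneck: the M-dimensional code
    nn.Linear(M, 16), nn.Tanh(),   # non-linear hidden layer (part of F2)
    nn.Linear(16, D),              # map back to the data space
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()             # sum-of-squares style reconstruction error

torch.manual_seed(0)
X = torch.randn(512, D) @ torch.randn(D, D)   # synthetic correlated data
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), X)    # reconstruct the input from itself
    loss.backward()
    optimizer.step()
print(float(loss))
```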