Monte Carlo Methods
Lecture slides for Chapter 17 of Deep Learning
www.deeplearningbook.org
Ian Goodfellow
Last updated 2017-12-29
(Goodfellow 2017)
Roadmap
• Basics of Monte Carlo methods
• Importance Sampling
• Markov Chains
(Goodfellow 2017)
Randomized Algorithms
                 Las Vegas                       Monte Carlo
Type of answer   Exact                           Random amount of error
Runtime          Random (until answer found)     Chosen by user (longer runtime gives less error)
(Goodfellow 2017)
Estimating sums / integrals
with samples
17.1.2 Basics of Monte Carlo Sampling

When a sum or an integral cannot be computed exactly (for example, the sum has an exponential number of terms, and no exact simplification is known), it is often possible to approximate it using Monte Carlo sampling. The idea is to view the sum or integral as if it were an expectation under some distribution and to approximate the expectation by a corresponding average. Let

    s = \sum_x p(x) f(x) = \mathbb{E}_p[f(x)]    (17.1)

or

    s = \int p(x) f(x) \, dx = \mathbb{E}_p[f(x)]    (17.2)

be the sum or integral to estimate, rewritten as an expectation, with the constraint that p is a probability distribution (for the sum) or a probability density (for the integral) over random variable x.

We can approximate s by drawing n samples x^{(1)}, \ldots, x^{(n)} from p and then forming the empirical average

    \hat{s}_n = \frac{1}{n} \sum_{i=1}^{n} f(x^{(i)}).    (17.3)

This approximation is justified by a few different properties. The first trivial observation is that the estimator \hat{s}_n is unbiased.
(Goodfellow 2017)
Justification
• Unbiased:
• The expected value for finite n is equal to the correct value
• The value for any specific n samples will have random
error, but the errors for different sample sets cancel out
• Low variance:
• Variance is O(1/n)
• For very large n, the error converges “almost surely” to 0
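A minimal sketch of both properties, assuming NumPy and a toy choice of p and f that does not appear in the slides (p = N(0, 1), f(x) = x², so the true value is E_p[x²] = 1 and Var_p[f(x)] = 2): across many independent sample sets the estimator stays centred on the true value for every n, while its variance shrinks roughly as Var_p[f]/n.

```python
import numpy as np

# Toy choice (assumed, not from the slides): p = N(0, 1), f(x) = x**2,
# so the true value is s = E_p[x^2] = 1 and Var_p[f(x)] = 2.
rng = np.random.default_rng(0)

def mc_estimate(n):
    """Draw n samples from p and return the empirical average of f."""
    x = rng.standard_normal(n)
    return np.mean(x ** 2)

for n in [10, 100, 1000, 10000]:
    estimates = np.array([mc_estimate(n) for _ in range(2000)])
    print(f"n={n:6d}  mean={estimates.mean():.4f}  "
          f"variance={estimates.var():.6f}  (~ 2/n = {2.0 / n:.6f})")
```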
(Goodfellow 2017)
Roadmap
• Basics of Monte Carlo methods
• Importance Sampling
• Markov Chains
(Goodfellow 2017)
Non-unique decomposition
Say we want to compute \int a(x)\, b(x)\, c(x)\, dx.
Which part is p? Which part is f?
p = a and f = bc? p = ab and f = c? etc.
No unique decomposition.
We can always pull part of any p into f.
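A small sketch of the non-uniqueness, with made-up factors that are not in the slides (a = N(0, 1) density, b = N(0, 1) density, c = cos): two different choices of which part plays the role of p converge to the same integral, because whatever is pulled out of p reappears inside f.

```python
import numpy as np
from scipy.stats import norm

# Made-up factors (not from the slides): a = N(0,1) pdf, b = N(0,1) pdf, c = cos.
rng = np.random.default_rng(0)
b = norm(0.0, 1.0).pdf
c = np.cos
n = 200_000

# Decomposition 1: p = a, f = b * c  ->  sample x ~ N(0, 1), average b(x) * c(x).
x1 = rng.standard_normal(n)
est1 = np.mean(b(x1) * c(x1))

# Decomposition 2: p ∝ a * b, which is a Gaussian with variance 1/2; the
# normalizer Z_ab = ∫ a(x) b(x) dx = 1 / (2 sqrt(pi)) is pulled into f = Z_ab * c.
Z_ab = 1.0 / (2.0 * np.sqrt(np.pi))
x2 = rng.normal(0.0, np.sqrt(0.5), size=n)
est2 = Z_ab * np.mean(c(x2))

print(est1, est2)   # both converge to the same integral (about 0.22)
```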
(Goodfellow 2017)
Importance Sampling
An important step in equation 17.2 is deciding which part of the integrand should play the role of the probability p(x) and which part should play the role of the quantity f(x) whose expected value (under that probability distribution) is to be estimated. There is no unique decomposition, because p(x)f(x) can always be rewritten as

    p(x) f(x) = q(x) \frac{p(x) f(x)}{q(x)},    (17.8)

where we now sample from q and average \frac{p(x) f(x)}{q(x)}. In many cases, we wish to compute an expectation for a given p and f, and the fact that the problem is specified as an expectation from the start suggests that this p and f would be a natural choice of decomposition.

Here q(x) is our new p, meaning it is the distribution we will draw samples from, and the ratio \frac{p(x) f(x)}{q(x)} is our new f, meaning we will evaluate it at each sample.
(Goodfellow 2017)
Why use importance sampling?
• Maybe it is feasible to sample from q but not from p
• This is how GANs work
• A good q can reduce the variance of the estimate
• Importance sampling is unbiased for every q that is nonzero wherever p(x)f(x) is nonzero
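A hedged illustration of the variance-reduction point, using an assumed rare-event setup rather than anything from the slides (p = N(0, 1), f(x) = 1[x > 3], proposal q = N(3, 1)): both estimators target the same expectation, but sampling from q and weighting by p/q concentrates the samples where f is nonzero and gives a far less noisy estimate.

```python
import numpy as np
from scipy.stats import norm

# Assumed rare-event setup (not from the slides): s = P(x > 3) under p = N(0, 1).
rng = np.random.default_rng(0)
p = norm(0.0, 1.0)
q = norm(3.0, 1.0)                        # proposal concentrated where f is nonzero
f = lambda x: (x > 3.0).astype(float)
n = 100_000

# Plain Monte Carlo: sample from p and average f.
x_p = p.rvs(size=n, random_state=rng)
plain = f(x_p).mean()

# Importance sampling: sample from q and average f * p/q.
x_q = q.rvs(size=n, random_state=rng)
weights = p.pdf(x_q) / q.pdf(x_q)
weighted = (f(x_q) * weights).mean()

print(plain, weighted, 1.0 - p.cdf(3.0))  # both near 0.00135; the weighted one is much less noisy
```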
(Goodfellow 2017)
Optimal q
• Determining the optimal q requires solving the original integral,
so not useful in practice
• Useful to understand intuition behind importance sampling
• This q minimizes the variance
• Places more mass on points where the weighted function is larger
    \mathrm{Var}[\hat{s}_q] = \frac{1}{n} \mathrm{Var}\!\left[\frac{p(x) f(x)}{q(x)}\right]    (17.12)

The minimum variance occurs when q is

    q^*(x) = \frac{p(x)\,|f(x)|}{Z},    (17.13)

where Z is the normalization constant, chosen so that q^*(x) sums or integrates to 1 as appropriate. Better importance sampling distributions put more weight where the integrand p(x)|f(x)| is larger. In fact, when f(x) does not change sign, \mathrm{Var}[\hat{s}_{q^*}] = 0, meaning that a single sample is sufficient when the optimal distribution is used. Of course, this is only because the computation of q^* has essentially solved the original problem, so it is usually not practical to draw a single sample from the optimal distribution. Any sampling distribution q is valid in the sense of yielding the correct expected value; q^* is simply the one with minimum variance.
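A tiny discrete check of equation 17.13, with arbitrary made-up p and f (f nonnegative, so it does not change sign): sampling from q* ∝ p|f| makes every importance-weighted sample equal the exact sum, i.e. zero variance.

```python
import numpy as np

# Arbitrary made-up discrete problem: x in {0, 1, 2, 3}, f nonnegative.
rng = np.random.default_rng(0)
p = np.array([0.1, 0.4, 0.3, 0.2])
f = np.array([2.0, 0.5, 1.0, 4.0])
s = np.sum(p * f)                       # exact value of the sum

q_star = p * np.abs(f)
q_star /= q_star.sum()                  # q*(x) = p(x) |f(x)| / Z  (eq. 17.13)

idx = rng.choice(len(p), size=10, p=q_star)
values = p[idx] * f[idx] / q_star[idx]  # importance-weighted values
print(s, values)                        # every weighted sample equals s: zero variance
```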
(Goodfellow 2017)
Roadmap
• Basics of Monte Carlo methods
• Importance Sampling
• Markov Chains
(Goodfellow 2017)
Sampling from p or q
• So far we have assumed we can sample from p or q
easily
• This is true when p or q has a directed graphical
model representation
• Use ancestral sampling
• Sample each node given its parents, moving from
roots to leaves
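A sketch of ancestral sampling on an assumed three-node chain a → b → c with invented conditional probability tables (none of this comes from the slides): each variable is sampled given its already-sampled parents, in topological order.

```python
import numpy as np

# Invented CPTs (not from the slides) for the chain a -> b -> c, all binary.
rng = np.random.default_rng(0)

def ancestral_sample():
    """Sample each node given its parents, moving from roots to leaves."""
    a = rng.random() < 0.6                  # root:         P(a=1) = 0.6
    b = rng.random() < (0.9 if a else 0.2)  # given parent: P(b=1 | a)
    c = rng.random() < (0.7 if b else 0.1)  # given parent: P(c=1 | b)
    return int(a), int(b), int(c)

samples = np.array([ancestral_sample() for _ in range(100_000)])
print(samples.mean(axis=0))   # empirical marginals; e.g. P(b=1) should be near 0.62
```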
(Goodfellow 2017)
Sampling from undirected
models
• Sampling from undirected models is more difficult
• Can’t get a fair sample in one pass
• Use a Monte Carlo algorithm that incrementally updates samples, coming closer to sampling from the right distribution at each step
• This is called a Markov Chain
(Goodfellow 2017)
Simple Markov Chain: Gibbs
sampling
• Repeatedly cycle through all variables
• For each variable, randomly sample that variable
given its Markov blanket
• For an undirected model, the Markov blanket is
just the neighbors in the graph
• Block Gibbs trick: conditionally independent
variables may be sampled simultaneously
(Goodfellow 2017)
Gibbs sampling example
• Initialize a, s, and b
• For n repetitions
• Sample a from P(a|s) and b from P(b|s)
• Sample s from P(s|a,b)
[Figure 16.6(a), reproduced from Chapter 16: the graph a — s — b, in which s separates a from b.]
Block Gibbs trick lets us
sample a and b in parallel
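A rough sketch of this loop for the undirected chain a — s — b, with invented pairwise potentials (the numbers are illustrative only): a and b are each resampled from their Markov blanket {s}, which the block Gibbs trick lets happen in the same step because they are conditionally independent given s, then s is resampled given both neighbors.

```python
import numpy as np

# Invented pairwise potentials for the undirected chain a - s - b (binary variables).
rng = np.random.default_rng(0)
phi_as = np.array([[4.0, 1.0],            # phi_as[a, s]
                   [1.0, 4.0]])
phi_sb = np.array([[3.0, 1.0],            # phi_sb[s, b]
                   [1.0, 3.0]])

def draw(weights):
    p = weights / weights.sum()
    return rng.choice(len(p), p=p)

a, s, b = 0, 0, 0                          # arbitrary initialization
samples = []
for _ in range(20_000):
    # Block Gibbs step: a and b are conditionally independent given s,
    # so both can be resampled from their Markov blanket in the same step.
    a = draw(phi_as[:, s])                 # P(a | s) ∝ phi_as[a, s]
    b = draw(phi_sb[s, :])                 # P(b | s) ∝ phi_sb[s, b]
    s = draw(phi_as[a, :] * phi_sb[:, b])  # P(s | a, b) ∝ phi_as[a, s] phi_sb[s, b]
    samples.append((a, s, b))

print(np.mean(samples[1000:], axis=0))     # approximate marginals after a burn-in period
```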
(Goodfellow 2017)
Equilibrium
• Running a Markov Chain long enough causes it to
mix
• After mixing, it samples from an equilibrium
distribution
• Sample before update comes from distribution π(x)
• Sample after update is a different sample, but still
from distribution π(x)
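A small sketch of equilibrium for an assumed three-state transition matrix T (not from the slides): repeatedly applying the Markov chain update to any starting distribution converges to a π(x) that one further update leaves unchanged.

```python
import numpy as np

# Assumed 3-state transition matrix: T[i, j] = P(next state = j | current state = i).
T = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

pi = np.array([1.0, 0.0, 0.0])   # arbitrary starting distribution
for _ in range(1000):
    pi = pi @ T                  # one Markov chain update of the distribution

print(pi)          # the equilibrium distribution pi(x)
print(pi @ T)      # a further update leaves it unchanged: pi = pi T
```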
(Goodfellow 2017)
Downsides
• Generally infeasible to…
• …know ahead of time how long mixing will take
• …know how far a chain is from equilibrium
• …know whether a chain is at equilibrium
• Usually in deep learning we just run for n steps, for
some n that we think will be big enough, and hope for
the best
(Goodfellow 2017)
Trouble in Practice
• Mixing can take an infeasibly long time
• This is especially true for
• High-dimensional distributions
• Distributions with strong correlations between
variables
• Distributions with multiple highly separated modes
(Goodfellow 2017)
Difficult Mixing
Figure 17.1: Paths followed by Gibbs sampling for three distributions, with the Markov chain initialized at the mode in all cases. (Left) A multivariate normal distribution with two independent variables. Gibbs sampling mixes well because the variables are independent. (Center) A multivariate normal distribution with highly correlated variables. The correlation between variables makes it difficult for the Markov chain to mix. Because the update for each variable must be conditioned on the other variable, the correlation reduces the rate at which the Markov chain can move away from the starting point. (Right) A mixture of Gaussians with widely separated modes that are not axis aligned. Gibbs sampling mixes very slowly because it is difficult to change modes while altering only one variable at a time.
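A sketch of the center-panel behavior under assumed parameters (a standard bivariate Gaussian with correlation ρ; its exact Gibbs conditionals are x1 | x2 ~ N(ρ·x2, 1 − ρ²)): with ρ near 1 each conditional step is tiny, so the chain creeps toward the target and decorrelates slowly.

```python
import numpy as np

# Standard bivariate Gaussian with correlation rho (assumed setup mirroring the
# center panel): the exact Gibbs conditionals are x1 | x2 ~ N(rho*x2, 1 - rho**2).
rng = np.random.default_rng(0)

def gibbs_path(rho, steps=200):
    x1, x2 = 3.0, 3.0                       # start away from the mean at (0, 0)
    path = []
    for _ in range(steps):
        x1 = rng.normal(rho * x2, np.sqrt(1.0 - rho ** 2))
        x2 = rng.normal(rho * x1, np.sqrt(1.0 - rho ** 2))
        path.append((x1, x2))
    return np.array(path)

for rho in (0.0, 0.99):
    path = gibbs_path(rho)
    # With rho = 0 the chain forgets its start almost immediately; with rho = 0.99
    # each conditional step is tiny, so the chain drifts toward the mode very slowly.
    print(rho, path[:20].mean(axis=0))
```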
(Goodfellow 2017)
Difficult Mixing in Deep
Generative Models
Energy-based models may be augmented with an extra parameter β controlling how sharply peaked the distribution is:

    p_\beta(x) \propto \exp(-\beta E(x)).    (17.26)

The β parameter is often described as being the reciprocal of the temperature, reflecting the origin of energy-based models in statistical physics. When the temperature falls to zero, and β rises to infinity, the energy-based model becomes deterministic.
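A short sketch of equation 17.26 on a made-up discrete energy function (the energies are illustrative only), showing how β trades off between a nearly uniform, easy-to-mix distribution and a sharply peaked one.

```python
import numpy as np

# Made-up discrete energy function; p_beta(x) ∝ exp(-beta * E(x)) as in eq. 17.26.
E = np.array([0.0, 0.5, 1.0, 3.0])

for beta in (0.1, 1.0, 10.0):
    p = np.exp(-beta * E)
    p /= p.sum()
    print(beta, np.round(p, 3))
# Small beta (high temperature): nearly uniform, so Markov chains mix easily.
# Large beta (low temperature): mass collapses onto the minimum-energy state.
```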
Figure 17.2: An illustration of the slow mixing problem in deep probabilistic models. Each panel should be read left to right, top to bottom. (Left) Consecutive samples from Gibbs sampling applied to a deep Boltzmann machine trained on the MNIST dataset. Consecutive samples are similar to each other. Because the Gibbs sampling is performed in a deep graphical model, this similarity is based more on semantic than raw visual features, but it is still difficult for the Gibbs chain to transition from one mode of the distribution to another, for example, by changing the digit identity. (Right) Consecutive ancestral samples from a generative adversarial network. Because ancestral sampling generates each sample independently from the others, there is no mixing problem.
(Goodfellow 2017)
For more information…