Université Paris-Dauphine, academic year 2012-2013
Département de Mathématique

                             NOISE exam, version A
                                     Preliminaries

  This exam is to be carried out on a computer using the R language. It must
  be handed in both on paper, for the detailed answers, and as a computer file
  Examen, for the R functions used. The computer files must be saved following
  the procedure below and will count towards the final mark. Any duplication
  of R files will lead to disciplinary proceedings. Failure to save a file will
  result in a mark of zero, with no possibility of appeal.
    1. Save your files regularly on the computer, without using accents,
       spaces or special characters.
    2. If you use Rkward, you must save with the “Save script” (or “Save
       script as”) button and not with “Save”.
    3. Check that your files have been saved properly by reopening them before
       logging off. Do not hesitate to reopen your file with another text
       editor to check that it contains all of your R code.
    4. In case of a problem or concern, contact an instructor without logging
       off. Otherwise it is impossible for us to recover the automatic backup
       files.

  No electronic document is allowed; only books on R are.
  The use of any messaging or e-mail service is forbidden and, if established,
  will be penalised.
  The problems are independent and may be treated in any order. Solve three
  and only three exercises of your choice.



Exercise 1
Load the dataset LakeHuron:

> data(LakeHuron)
> huron = jitter(as.vector(LakeHuron))

We assume that those observations are iid realisations $\mathbf{X}_n = (X_1, \ldots, X_n)$ of a random
variable $X$.

We denote by $\mathrm{IQ}_{0.5}(\mathbf{X}_n)$ the inter-quartile range of the sample $\mathbf{X}_n$. It is defined as
$\mathrm{IQ}_{0.5}(\mathbf{X}_n) = Q_{0.75}(\mathbf{X}_n) - Q_{0.25}(\mathbf{X}_n)$, where $Q_{0.75}(\mathbf{X}_n)$ and $Q_{0.25}(\mathbf{X}_n)$ are the empirical
quartiles of the sample $\mathbf{X}_n$ at levels 75% and 25%. We would like to calibrate $\mathrm{IQ}_{0.5}(\mathbf{X}_n)$
by a coefficient $\alpha$ so that it becomes an unbiased estimator of the standard deviation $\sigma$
of the distribution of the $X_i$'s.
  1. Write an R function iqar(x) which produces the statistic $\mathrm{IQ}_{0.5}(\mathbf{X}_n)$ associated
     with the sample x, taking special care of the case when x has 3 elements or fewer.
     Compare your output with that of the built-in R function IQR() on huron.
  2. Simulate $10^4$ replicas of a normal $\mathcal{N}(\mu, \sigma^2)$ sample $\mathbf{X}_n$ of size $n = 10$ and deduce a
     Monte Carlo evaluation of the coefficient $\alpha$ such that $\alpha\,\mathbb{E}[\mathrm{IQ}_{0.5}(\mathbf{X}_n)] = \sigma$.
     (Extra-credit: Explain why the values of $\mu$ and $\sigma$ can be chosen arbitrarily.)
  3. Repeat the above question with $10^4$ replicas of a normal $\mathcal{N}(\mu, \sigma^2)$ sample $\mathbf{X}_n$ of size
     $n = 50$. (Extra-credit: Do you notice enough similarity between both $\alpha$'s to accept
     the hypothesis that they are equal?)
  4. Getting back to the case of question 2., when $n = 10$, and using the $10^4$ realisations
     of $\mathrm{IQ}_{0.5}(\mathbf{X}_n)$ generated in question 2., deduce a 96% confidence interval
     on $\mathrm{IQ}_{0.5}(\mathbf{X}_n)/\sigma$. (Hint: Use the empirical cdf of the $\mathrm{IQ}_{0.5}(\mathbf{X}_n)$'s, rather than
     bootstrap.) Compare with the asymptotically normal 96% confidence interval on
     $\mathbb{E}[\mathrm{IQ}_{0.5}(\mathbf{X}_n)]/\sigma$. Check whether or not 1.3490 belongs to these intervals.
     (Extra-credit: Justify the choice $\alpha = 1/1.3490$.)
  5. Check whether or not huron is a normal sample (with unknown
     mean and variance).
  6. Since huron is not necessarily a normal sample, denoting by $\sigma$ the standard deviation
     of the distribution of the $X_i$'s, construct by bootstrap a 96% confidence interval
     on $\mathbb{E}[\mathrm{IQ}_{0.5}(\mathbf{X}_n)]/\sigma$, where $\sigma$ is estimated by the usual empirical standard deviation
     $\hat{\sigma}(\mathbf{X}_n)$. Does it still contain 1.3490?
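
One possible way to approach questions 1 to 4 and 6 of Exercise 1 is sketched below. It is only a sketch under stated assumptions: the quartiles follow the default rule of quantile() (type 7), samples of fewer than two points return NA, the seed is arbitrary, and the helper name calibrate_alpha is hypothetical. The exam itself does not fix any of these choices.

```r
# Sketch for Exercise 1 (assumptions: default type-7 quantiles, NA returned
# for fewer than 2 points; the exam leaves both choices open).
iqar <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 2) return(NA_real_)            # a single point has no spread
  if (length(x) <= 3) warning("3 points or fewer: quartiles are crude")
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  q[2] - q[1]
}

data(LakeHuron)
set.seed(1)
huron <- jitter(as.vector(LakeHuron))
c(iqar = iqar(huron), IQR = IQR(huron))          # same quantile rule in both

# Questions 2 and 3: alpha = sigma / E[IQ], estimated over nrep normal samples.
calibrate_alpha <- function(n, nrep = 1e4, mu = 0, sigma = 1) {
  iq <- replicate(nrep, iqar(rnorm(n, mu, sigma)))
  list(alpha = sigma / mean(iq), iq = iq)
}
res10 <- calibrate_alpha(10)
res50 <- calibrate_alpha(50)
c(alpha10 = res10$alpha, alpha50 = res50$alpha)

# Question 4 (sigma = 1): 96% interval on IQ/sigma from empirical quantiles,
# then the CLT-based 96% interval on E[IQ]/sigma.
ci_emp  <- quantile(res10$iq, c(0.02, 0.98), names = FALSE)
ci_norm <- mean(res10$iq) +
  c(-1, 1) * qnorm(0.98) * sd(res10$iq) / sqrt(length(res10$iq))

# Question 6: percentile-bootstrap 96% interval on E[IQ]/sigma for huron.
boot <- replicate(2000, {
  xb <- sample(huron, replace = TRUE)
  iqar(xb) / sd(xb)
})
ci_boot <- quantile(boot, c(0.02, 0.98), names = FALSE)
rbind(ci_emp, ci_norm, ci_boot)
```

For question 5, a normality check such as shapiro.test(huron) or a normal QQ-plot (qqnorm) could be added; which test to use is left open by the exam.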

Exercise 2
Consider the Rider density function

$$f_k(x) = \frac{n!}{(k!)^2} \left(\frac{1}{4} - \frac{1}{\pi^2}\arctan^2 x\right)^{k} \frac{1}{\pi(1 + x^2)},$$

where $n = 2k + 1$ and $k \ge 1$ is an integer.
  1. Check by numerical integration that $f_k$ is a proper density for $k = 5, 10, 20$.
  2. Design an accept-reject function in R that produces an iid sample of
     arbitrary size $m$ for an arbitrary parameter $k$. Produce a graphical verification of
     the fit for $m = 10^3$ and $k = 5, 10, 20$.
  3. We want to check from the acceptance rate of this accept-reject algorithm that
     the normalising constant given above is correct. Produce 520 realisations of an empirical
     acceptance rate based on 100 proposals and deduce a 94% confidence interval on
     the expectation of the acceptance rate. Check whether or not it contains the inverse
     normalising constant.
  4. This density is actually the distribution of the median of a Cauchy sample of size
     $n = 2k + 1$. Generate a sample from the above accept-reject algorithm with $m = 520$
     and $k = 10$, then another sample of $m = 520$ medians from samples of 21 Cauchy
     variates. Test whether they have the same distribution.
  5. Check whether or not the p-value of the above test is distributed as a uniform $\mathcal{U}(0, 1)$
     random variate. (Extra-credit: Establish why the distribution of the p-value should
     be uniform.)
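
A possible sketch for questions 1 to 4 of Exercise 2 follows. It assumes a standard Cauchy proposal, which the exam does not impose. With the Cauchy cdf $F$, the bracket in $f_k$ equals $F(x)(1 - F(x))$, so the ratio of $f_k$ to the proposal is $M_k\, w(x)^k$ with $w(x) = 1 - (2\arctan(x)/\pi)^2 \in [0, 1]$ and $M_k = n!/((k!)^2 4^k)$. The names log_M, w, rider and rrider are hypothetical helpers and the seed is arbitrary.

```r
# Sketch for Exercise 2 (assumption: standard Cauchy proposal).
# f_k(x) = M_k * w(x)^k * dcauchy(x), so the theoretical acceptance rate is 1 / M_k.
log_M <- function(k) lfactorial(2 * k + 1) - 2 * lfactorial(k) - k * log(4)
w <- function(x) pmax(0, 1 - (2 * atan(x) / pi)^2)
rider <- function(x, k) exp(log_M(k)) * w(x)^k * dcauchy(x)

# Question 1: numerical integration over the real line.
mass <- sapply(c(5, 10, 20), function(k) integrate(rider, -Inf, Inf, k = k)$value)
mass

# Question 2: accept-reject, proposing in batches until m values are accepted.
rrider <- function(m, k) {
  out <- numeric(0)
  while (length(out) < m) {
    x <- rcauchy(2 * m)
    out <- c(out, x[runif(2 * m) <= w(x)^k])
  }
  out[seq_len(m)]
}

set.seed(2)
k <- 10

# Question 3: 520 empirical acceptance rates, each based on 100 proposals,
# and a CLT-based 94% interval on their expectation.
rates <- replicate(520, mean(runif(100) <= w(rcauchy(100))^k))
ci94 <- mean(rates) + c(-1, 1) * qnorm(0.97) * sd(rates) / sqrt(length(rates))
c(lower = ci94[1], upper = ci94[2], theory = exp(-log_M(k)))

# Question 4: accept-reject sample against medians of 21 Cauchy variates.
x_ar  <- rrider(520, k)
x_med <- replicate(520, median(rcauchy(21)))
ks.test(x_ar, x_med)
```

For the graphical check of question 2, a histogram of rrider(1e3, k) drawn with hist(..., freq = FALSE) can be overlaid with the density through curve(rider(x, k), add = TRUE).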
Exercise 3
If $U_1, U_2, \ldots, U_k$ is a sample from the $\mathcal{U}(0, 1)$ distribution, then $M_k = \min(U_1, \ldots, U_k)$
follows the $\mathrm{Beta}(1, k)$ distribution. We wish to verify that

$$k M_k \xrightarrow[k \to \infty]{\mathcal{L}} \mathrm{Exp}(1)$$

   1. Create a function rbeta2(n, k) which simulates $n$ realizations of the $\mathrm{Beta}(1, k)$
      distribution, using $nk$ realizations of the uniform distribution. (Note: if you do not
      manage this question, you can use the R function rbeta(n,1,k) for the remainder
      of the exercise.)
   2. For $k = 50$ and $n = 1000$, propose a graphical way to verify the fit of $k M_k$ to the
      $\mathrm{Exp}(1)$ distribution.
   3. Using ks.test() and $n = 1000$, check whether the exponential distribution is an
      acceptable fit when $k = 10$, $k = 50$, $k = 200$.
   4. From now on, $k = 200$ and $n = 1000$. We now have a test to check the fit of a sample
      $x$ to the $\mathrm{Beta}(1, k)$ distribution: we accept the null hypothesis that $x$ comes from
      the $\mathrm{Beta}(1, k)$ distribution iff the Kolmogorov-Smirnov test accepts the hypothesis
      that $kx$ fits the $\mathrm{Exp}(1)$ distribution. Perform a bootstrap experiment to calculate
      the probability of accepting the null hypothesis for a sample which comes from the
      $\mathrm{Beta}(1, k)$ distribution.
   5. Perform another bootstrap experiment to calculate the same probability when using
      the Kolmogorov-Smirnov test directly for fit to the $\mathrm{Beta}(1, k)$ distribution (whose
      cdf exists in R as pbeta).
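
The questions of Exercise 3 could be approached along the following lines. This is a sketch, not a reference solution: it assumes a 5% level for every Kolmogorov-Smirnov test, reads the "bootstrap experiment" as repeated simulation from $\mathrm{Beta}(1, k)$, and uses 200 repetitions to keep the run short. The helper name accept is hypothetical.

```r
# Sketch for Exercise 3 (assumptions: 5% test level, 200 simulated samples).
rbeta2 <- function(n, k) {
  u <- matrix(runif(n * k), nrow = n, ncol = k)   # one row of k uniforms per draw
  apply(u, 1, min)                                # row minima are Beta(1, k)
}

set.seed(3)
n <- 1000

# Question 3: KS test of k * M_k against Exp(1) for increasing k.
pvals <- sapply(c(10, 50, 200),
                function(k) ks.test(k * rbeta2(n, k), "pexp", 1)$p.value)
names(pvals) <- c("k=10", "k=50", "k=200")
pvals

# Questions 4 and 5: frequency of acceptance under the null, first through
# the exponential approximation, then directly against the Beta(1, k) cdf.
k <- 200
accept <- function(pvalue_of) mean(replicate(200, pvalue_of(rbeta2(n, k)) > 0.05))
acc_exp  <- accept(function(x) ks.test(k * x, "pexp", 1)$p.value)
acc_beta <- accept(function(x) ks.test(x, "pbeta", 1, k)$p.value)
c(via_exponential = acc_exp, direct_beta = acc_beta)
```

For the graphical check of question 2, one option is a histogram of 50 * rbeta2(1000, 50) with the dexp density overlaid, or a QQ-plot against qexp(ppoints(1000)).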

Exercise 4
The $\mathrm{SkewLogistic}(\alpha)$ distribution defines a random variable $X$ which takes values in $\mathbb{R}$,
with cumulative distribution function

$$F(x) = \frac{1}{(1 + e^{-x})^{\alpha}}$$
   1. Using the generic inversion method, write a function rskewlogistic(n,α) which
      outputs $n$ realizations of the $\mathrm{SkewLogistic}(\alpha)$ distribution.
   2. For $\alpha = 2$, give a Monte Carlo experiment to estimate $\mathrm{Var}(X)$ and the median of
      $X$. Calculate (on paper) the theoretical value of the median and compare it to your
      estimate.
   3. Propose a bootstrap experiment to evaluate the bias of your variance and median
      estimators.
   4. For $\alpha = 2$, use the Kolmogorov-Smirnov test to verify that the variable

      $$Y = \log(1 + e^{-X})$$

      follows an $\mathrm{Exp}(2)$ distribution.
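
A minimal sketch for Exercise 4 is given below. Inverting $u = (1 + e^{-x})^{-\alpha}$ gives $x = -\log(u^{-1/\alpha} - 1)$, and solving $F(m) = 1/2$ for $\alpha = 2$ gives the median $m = \log(1 + \sqrt{2})$. The sample size, the seed and the number of bootstrap replicates are arbitrary choices of this sketch, and $\mathrm{Exp}(2)$ is read as an exponential with rate 2.

```r
# Sketch for Exercise 4. Inverting u = (1 + exp(-x))^(-alpha) gives
# x = -log(u^(-1/alpha) - 1); expm1() evaluates u^(-1/alpha) - 1 stably.
rskewlogistic <- function(n, alpha) {
  u <- runif(n)
  -log(expm1(-log(u) / alpha))
}

set.seed(4)
x <- rskewlogistic(1e4, 2)

# Question 2: Monte Carlo estimates next to the theoretical median.
est <- c(variance = var(x), median = median(x),
         median_theory = log(1 + sqrt(2)))
est

# Question 3: bootstrap bias = mean of bootstrap replicates minus the
# estimate computed on the original sample.
boot <- replicate(1000, {
  xb <- sample(x, replace = TRUE)
  c(var(xb), median(xb))
})
bias <- rowMeans(boot) - c(var(x), median(x))
names(bias) <- c("variance", "median")
bias

# Question 4: Y = log(1 + exp(-X)) tested against the exponential with rate 2.
y <- log1p(exp(-x))
ks.test(y, "pexp", 2)
```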

Exercise 5
Given the probability density

$$f(x|\theta, \delta) = \frac{C}{\theta}\, e^{-|x - \delta|/\theta},$$
  1. explain why an importance sampling technique, designed to approximate the
     constant $C$, that is based on the Normal density cannot work. Illustrate this
     lack of convergence with a numerical experiment using $\theta = 2$ and $\delta = 4$.
  2. Propose a more suitable importance distribution.
We now focus on the integral

$$I = \int_{\mathbb{R}} x f(x|2, 4)\,dx$$

using samples of size $n = 10^2$.
  3. Propose a Monte Carlo approximation of $I$. (Hint: Note that the integral over $\mathbb{R}$ is
     twice the integral over $\mathbb{R}_+$ when $\delta = 0$ and connect $f$ with a standard distribution
     on $(\delta, \infty)$.)
  4. Approximate $I$ by importance sampling using the same distribution $g$ as in question
     2.
  5. Compute a confidence interval on $I$ at level 95% for each of your methods. Which
     of the two estimates reaches the lowest precision?
  6. Design a Monte Carlo experiment in order to check whether or not the asymptotic
     coverage level of the CI holds. Repeat the experiment with samples of size $n = 10^3$.
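
The six questions of Exercise 5 could be explored with the sketch below. Since $\int_{\mathbb{R}} e^{-|x-\delta|/\theta}/\theta\,dx = 2$, the target value is $C = 1/2$, and by symmetry $I = \delta$. The Cauchy proposal for question 2 is an assumption of this sketch (any proposal with tails heavier than the target would do), as are the seed, the first sample size of $10^4$ and the hypothetical helper names h, ci, rlaplace, is_terms and coverage.

```r
# Sketch for Exercise 5 with theta = 2 and delta = 4 (assumed Cauchy proposal).
set.seed(5)
theta <- 2; delta <- 4
h <- function(x) exp(-abs(x - delta) / theta) / theta   # f / C, integrates to 2

# Questions 1 and 2: estimate 1/C as the mean of the importance weights.
n <- 1e4
xn <- rnorm(n, delta, theta)
wn <- h(xn) / dnorm(xn, delta, theta)      # unbounded weights, infinite variance
xc <- rcauchy(n, delta, theta)
wc <- h(xc) / dcauchy(xc, delta, theta)    # bounded weights
c(C_normal = 1 / mean(wn), C_cauchy = 1 / mean(wc),
  max_w_normal = max(wn), max_w_cauchy = max(wc))

# Questions 3 to 5: direct simulation (delta plus or minus an exponential)
# and importance sampling with C = 1/2, each with a CLT-based 95% interval.
ci <- function(v) mean(v) + c(-1, 1) * qnorm(0.975) * sd(v) / sqrt(length(v))
rlaplace <- function(n) {
  delta + sample(c(-1, 1), n, replace = TRUE) * rexp(n, rate = 1 / theta)
}
is_terms <- function(n) {
  x <- rcauchy(n, delta, theta)
  x * 0.5 * h(x) / dcauchy(x, delta, theta)
}
n <- 100
ci_mc <- ci(rlaplace(n))
ci_is <- ci(is_terms(n))
rbind(ci_mc, ci_is)

# Question 6: empirical coverage of the 95% interval of the direct estimator.
coverage <- function(n, nrep = 2000) {
  mean(replicate(nrep, {
    b <- ci(rlaplace(n))
    b[1] <= delta && delta <= b[2]
  }))
}
c(n100 = coverage(100), n1000 = coverage(1000))
```

Plotting cumsum(wn) / seq_along(wn) against the sample index is one way to visualise the erratic behaviour of the Normal-based estimate in question 1.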

Exercise 6
Given the Galton density on $\mathbb{R}_+^*$,

$$f(x|\mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\{-(\log(x) - \mu)^2 / 2\sigma^2\}$$
  1. Determine which of the following distributions can be used in an A/R algorithm
     designed to sample from $f(x|0, 1)$:

     $$g_1(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k} \qquad g_2(x) = \frac{1}{\theta^k}\frac{1}{\Gamma(k)}\, x^{k-1} e^{-x/\theta} \qquad g_3(x) = (1 + \alpha x)^{-1/\alpha - 1}$$

     which are respectively a Weibull, a Gamma and a generalized Pareto distribution.
     Determine the appropriate upper bounds.
  2. Using the inversion method, write an algorithm that samples from the selected $g$.
  3. Write an R function AR() that samples from $f(x|0, 1)$. (Extra-credit: Optimize the
     parameters of the proposal density $g$.)
  4. Based on a sample of size $10^4$ from $f(x|0, 1)$, estimate by Monte Carlo the mean
     and variance of $h(X) = \log(X)$ when $X \sim f$ and give a confidence interval at level
     95% for both quantities.
  5. The distribution associated with $f$ can be obtained by the transform $\exp\{Z\}$ when
     $Z \sim \mathcal{N}(\mu, \sigma^2)$. Establish this result and test it, based on the sample used in question
     4.
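
A sketch for questions 2 to 5 of Exercise 6 follows. It assumes the generalized Pareto proposal $g_3$ with the arbitrary choice $\alpha = 1$, that is $g_3(x) = (1 + x)^{-2}$. Writing $t = \log x$, the ratio $f(x|0,1)/g_3(x)$ equals $(2 + 2\cosh t)\,e^{-t^2/2}/\sqrt{2\pi}$, which is maximal at $t = 0$, hence the bound $M = 4/\sqrt{2\pi}$ for this particular $\alpha$. The helper names galton, dg3, rg3 and clt_ci are hypothetical.

```r
# Sketch for Exercise 6 (assumption: proposal g3 with alpha = 1).
galton <- function(x) exp(-log(x)^2 / 2) / (x * sqrt(2 * pi))   # f(x | 0, 1)
dg3 <- function(x) (1 + x)^(-2)
rg3 <- function(n) 1 / runif(n) - 1     # inversion of G(x) = 1 - 1 / (1 + x)
M <- 4 / sqrt(2 * pi)                   # sup of galton / dg3, attained at x = 1

# Question 3: accept-reject, proposing in batches until n values are accepted.
AR <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    y <- rg3(2 * n)
    out <- c(out, y[runif(2 * n) * M * dg3(y) <= galton(y)])
  }
  out[seq_len(n)]
}

set.seed(6)
x <- AR(1e4)
hx <- log(x)

# Question 4: CLT-based 95% intervals; the variance is treated as the mean
# of the squared deviations from the sample mean.
clt_ci <- function(v) mean(v) + c(-1, 1) * qnorm(0.975) * sd(v) / sqrt(length(v))
ci_mean <- clt_ci(hx)
ci_var  <- clt_ci((hx - mean(hx))^2)
rbind(ci_mean, ci_var)

# Question 5: if X = exp(Z) with Z standard normal, then log(X) is N(0, 1).
ks.test(hx, "pnorm", 0, 1)
```

The expected acceptance rate of this sketch is $1/M = \sqrt{2\pi}/4$, roughly 0.63; other values of $\alpha$ would need their own bound.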

R exam (A) given in Paris-Dauphine, Licence Mido, Jan. 11, 2013
