This is one of two exams given to our students this year. They had two hours to solve three problems and had to hand in their R code as well as handwritten explanations.
R exam (A) given in Paris-Dauphine, Licence Mido, Jan. 11, 2013
Université Paris-Dauphine, academic year 2012-2013
Department of Mathematics
NOISE exam, paper A
Preliminaries
This exam is to be carried out on a computer using the R language and handed in both on paper, for the detailed answers, and as a computer file named Examen, for the R functions used. The computer files must be saved following the procedure below and will count towards the final grade. Any duplication of R files will lead to disciplinary proceedings. Failure to save a document will result in a grade of zero, with no possibility of appeal.
1. Save your files regularly on the computer, without using accents, spaces or special characters.
2. If you use Rkward, you must save with the “Save script” (or “Save script as”) button and not with “Save”.
3. Check that your files have been saved properly by reopening them before logging out. Do not hesitate to reopen your file with another text editor to check that it contains all of your R code.
4. In case of a problem or concern, contact an instructor without logging out. Otherwise we cannot recover the automatic backup files.
No electronic document is allowed; only R books are permitted. Using any messaging or email service is forbidden and, if established, will be penalised.
The problems are independent and may be treated in any order. Solve three and only three exercises of your choice.
Exercise 1
Load the dataset LakeHuron:
> data(LakeHuron)
> huron = jitter(as.vector(LakeHuron))
We assume that those observations are iid realisations $\mathbf{X}_n = (X_1, \ldots, X_n)$ of a random variable $X$.
We denote by $IQ_{0.5}(\mathbf{X}_n)$ the interquartile range of the sample $\mathbf{X}_n$. It is defined as $IQ_{0.5}(\mathbf{X}_n) = Q_{0.75}(\mathbf{X}_n) - Q_{0.25}(\mathbf{X}_n)$, where $Q_{0.75}(\mathbf{X}_n)$ and $Q_{0.25}(\mathbf{X}_n)$ are the empirical quartiles of the sample $\mathbf{X}_n$ at levels 75% and 25%. We would like to calibrate $IQ_{0.5}(\mathbf{X}_n)$ by a coefficient $\alpha$ so that it becomes an unbiased estimator of the standard deviation $\sigma$ of the distribution of the $X_i$'s.
1. Write an R function iqar(x) which produces the statistic $IQ_{0.5}(\mathbf{X}_n)$ associated with the sample x, taking special care of the case when x has 3 elements or fewer. Compare your output with that of the built-in R function IQR() on huron.
2. Simulate $10^4$ replicas of a normal $\mathcal{N}(\mu, \sigma^2)$ sample $\mathbf{X}_n$ of size $n = 10$ and deduce a Monte Carlo evaluation of the coefficient $\alpha$ such that $\alpha\,E[IQ_{0.5}(\mathbf{X}_n)] = \sigma$. (Extra credit: Explain why the values of $\mu$ and $\sigma$ can be chosen arbitrarily.)
3. Repeat the above question with $10^4$ replicas of a normal $\mathcal{N}(\mu, \sigma^2)$ sample $\mathbf{X}_n$ of size $n = 50$. (Extra credit: Do you notice enough similarity between both $\alpha$'s to accept the hypothesis that they are equal?)
4. Getting back to the case of question 2, when $n = 10$, and using the $10^4$ realisations of $IQ_{0.5}(\mathbf{X}_n)$ generated in question 2, deduce a 96% confidence interval on $IQ_{0.5}(\mathbf{X}_n)/\sigma$. (Hint: Use the empirical cdf of the $IQ_{0.5}(\mathbf{X}_n)$'s, rather than bootstrap.) Compare with the asymptotically normal 96% confidence interval on $E[IQ_{0.5}(\mathbf{X}_n)]/\sigma$. Check whether or not 1.3490 belongs to these intervals. (Extra credit: Justify the choice $\alpha = 1/1.3490$.)
5. Check whether or not huron can be regarded as a sample from a normal distribution (with unknown mean and variance).
6. Since huron is not necessarily a normal sample, denoting by $\sigma$ the standard deviation of the distribution of the $X_i$'s, construct by bootstrap a 96% confidence interval on $E[IQ_{0.5}(\mathbf{X}_n)]/\sigma$, where $\sigma$ is estimated by the usual empirical standard deviation $\hat{\sigma}(\mathbf{X}_n)$. Does it still contain 1.3490?
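One possible way to approach questions 1, 2, 3, 4 and 6 is sketched below, as an illustration rather than a model answer. The helper names mc_alpha, a10, a50 and boot are made up for this sketch. The quantile convention (linear interpolation between order statistics, which is the default used by quantile() and hence by IQR()) and the value 0 returned for fewer than two observations are assumptions, and the last interval is a plain percentile bootstrap interval on the ratio of the interquartile range to the empirical standard deviation.

```r
set.seed(2013)
huron <- jitter(as.vector(LakeHuron))

# Interquartile range built from the order statistics, interpolating linearly
# at position h = 1 + (n - 1) * p (assumed convention, default of quantile()).
iqar <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n < 2) return(0)  # assumption: fewer than two points have zero spread
  q <- function(p) {
    h <- 1 + (n - 1) * p
    lo <- floor(h)
    hi <- ceiling(h)
    x[lo] + (h - lo) * (x[hi] - x[lo])
  }
  q(0.75) - q(0.25)
}
c(iqar(huron), IQR(huron))  # the two values should agree

# Monte Carlo evaluation of alpha = sigma / E[IQ] from N normal samples of size n
mc_alpha <- function(n, N = 1e4, mu = 0, sigma = 1) {
  iq <- replicate(N, iqar(rnorm(n, mu, sigma)))
  list(alpha = sigma / mean(iq), iq = iq)
}
a10 <- mc_alpha(10)
a50 <- mc_alpha(50)
c(a10$alpha, a50$alpha)

# 96% interval on IQ / sigma from the empirical distribution (sigma = 1 here)
quantile(a10$iq, c(0.02, 0.98))
# Asymptotically normal 96% interval on E[IQ] / sigma
mean(a10$iq) + c(-1, 1) * qnorm(0.98) * sd(a10$iq) / sqrt(length(a10$iq))

# Percentile bootstrap interval on IQ / sigma-hat for huron
boot <- replicate(2000, {
  xb <- sample(huron, replace = TRUE)
  iqar(xb) / sd(xb)
})
quantile(boot, c(0.02, 0.98))
```

Because the interquartile range is location-invariant and scales linearly with $\sigma$, running mc_alpha with other values of mu and sigma should give the same coefficient up to Monte Carlo error.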
Exercise 2
Consider the Rider density function
$$f_k(x) = \frac{n!}{(k!)^2}\left(\frac{1}{4} - \frac{1}{\pi^2}\arctan^2 x\right)^{k} \frac{1}{\pi(1 + x^2)},$$
where $n = 2k + 1$ and $k \ge 1$ is an integer.
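As a hedged illustration of how this density can be coded and checked numerically, the sketch below defines a function drider (a name chosen here, not part of the exam) and integrates it over the real line with integrate(). The constant is computed on the log scale and the bracketed term is truncated at zero; both are precautions added for this sketch.

```r
# Rider density f_k, with n = 2k + 1
drider <- function(x, k) {
  n <- 2 * k + 1
  const <- exp(lfactorial(n) - 2 * lfactorial(k))  # n! / (k!)^2
  core <- pmax(1 / 4 - atan(x)^2 / pi^2, 0)        # guard against rounding below 0
  const * core^k / (pi * (1 + x^2))
}

# Numerical check that f_k integrates to one
totals <- sapply(c(5, 10, 20), function(k) integrate(drider, -Inf, Inf, k = k)$value)
totals
```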
1. Check by numerical integration that $f_k$ is a proper density for $k = 5, 10, 20$.
2. Design an accept-reject algorithm in R that produces an iid sample of arbitrary size $m$ for an arbitrary parameter $k$. Produce a graphical verification of the fit for $m = 10^3$ and $k = 5, 10, 20$.
3. We want to check from the acceptance rate of this accept-reject algorithm that
the normalisation is correct in the above. Produce 520 realisations of an empirical
acceptance rate based on 100 proposals and deduce a 94% confidence interval on
the expectation of the acceptance rate. Check whether or not it contains the inverse
normalising constant.
4. This density is actually the distribution of the median of a Cauchy sample of size $n = 2k + 1$. Generate a sample from the above accept-reject algorithm with $m = 520$ and $k = 10$, then another sample of $m = 520$ medians from samples of 21 Cauchy variates. Test whether they have the same distribution.
5. Check whether or not the p-value of the above test is distributed as a uniform $\mathcal{U}(0, 1)$ random variate. (Extra credit: Establish why the distribution of the p-value should be uniform.)
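The sketch below shows one way questions 2 to 4 might be set up, assuming standard Cauchy proposals. Since $\frac{1}{4} - \frac{1}{\pi^2}\arctan^2 x \le \frac{1}{4}$, the density is bounded by $M$ times the Cauchy density with $M = n!/((k!)^2 4^k)$, and the acceptance ratio simplifies to $(1 - 4\arctan^2(x)/\pi^2)^k$. The names accept_prob, rrider, rates, ci94 and inv_const are hypothetical, and taking $1/M$ as the reference value for the acceptance rate relies on that assumed bound.

```r
set.seed(2013)

# Acceptance probability f_k(x) / (M * dcauchy(x)) for the assumed bound M
accept_prob <- function(x, k) (1 - 4 * atan(x)^2 / pi^2)^k

# Accept-reject sampler returning m draws from the Rider density
rrider <- function(m, k) {
  out <- numeric(0)
  while (length(out) < m) {
    x <- rcauchy(2 * m)                      # proposals
    out <- c(out, x[runif(2 * m) <= accept_prob(x, k)])
  }
  out[seq_len(m)]
}

# 520 empirical acceptance rates, each based on 100 proposals (k = 5 here)
k <- 5
rates <- replicate(520, {
  x <- rcauchy(100)
  mean(runif(100) <= accept_prob(x, k))
})
ci94 <- mean(rates) + c(-1, 1) * qnorm(0.97) * sd(rates) / sqrt(length(rates))
inv_const <- exp(2 * lfactorial(k) + k * log(4) - lfactorial(2 * k + 1))  # 1 / M
ci94
inv_const

# Accept-reject sample versus medians of 21 Cauchy variates (k = 10)
x_ar  <- rrider(520, 10)
x_med <- replicate(520, median(rcauchy(21)))
pval  <- ks.test(x_ar, x_med)$p.value
pval
```

Repeating the last three lines many times and applying ks.test() to the collected p-values against "punif" is one way to address question 5.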
Exercise 3
If $U_1, U_2, \ldots, U_k$ is a sample from the $\mathcal{U}(0, 1)$ distribution, then $M_k = \min(U_1, \ldots, U_k)$ follows the Beta$(1, k)$ distribution. We wish to verify that
$$k M_k \xrightarrow[k \to \infty]{\mathcal{L}} \mathrm{Exp}(1)$$
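A short heuristic for this limit, given as a sketch rather than a full proof: the $U_i$ are independent, so for $0 \le t \le k$ the survival function of $kM_k$ factorises.

```latex
\[
P(kM_k > t) = P\!\left(U_1 > \tfrac{t}{k}, \ldots, U_k > \tfrac{t}{k}\right)
            = \left(1 - \frac{t}{k}\right)^{k}
            \xrightarrow[k \to \infty]{} e^{-t}.
\]
```

The right-hand side is the survival function of the Exp(1) distribution, which is what the simulations in this exercise are meant to confirm.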
1. Create a function rbeta2(n, k) which simulates $n$ realizations of the Beta$(1, k)$ distribution, using $nk$ realizations of the uniform distribution. (Note: if you do not manage this question, you can use the R function rbeta(n,1,k) for the remainder of the exercise.)
2. For $k = 50$ and $n = 1000$, propose a graphical way to verify the fit of $kM_k$ to the Exp(1) distribution.
3. Using ks.test() and $n = 1000$, check whether the exponential distribution is an acceptable fit when $k = 10$, $k = 50$, $k = 200$.
4. From now on, $k = 200$ and $n = 1000$. We now have a test to check the fit of a sample x to the Beta$(1, k)$ distribution: we accept the null hypothesis that x comes from the Beta$(1, k)$ distribution iff the Kolmogorov-Smirnov test accepts the hypothesis that $kx$ fits the Exp(1) distribution. Perform a bootstrap experiment to calculate the probability of accepting the null hypothesis for a sample which comes from the Beta$(1, k)$ distribution.
5. Perform another bootstrap experiment to calculate the same probability when using directly the Kolmogorov-Smirnov test for fit to the Beta$(1, k)$ distribution (whose cdf exists in R as pbeta).
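A minimal sketch of questions 1, 3, 4 and 5 follows. It is only one possible reading: the experiments in questions 4 and 5 are implemented here as repeated simulation of fresh Beta$(1, k)$ samples, the 5% test level and the 200 repetitions are assumptions of this sketch, and the names accept_exp and accept_beta are hypothetical.

```r
set.seed(42)

# n draws from Beta(1, k), each obtained as the minimum of k uniforms
rbeta2 <- function(n, k) {
  u <- matrix(runif(n * k), nrow = n, ncol = k)
  apply(u, 1, min)
}

n <- 1000
# Kolmogorov-Smirnov fit of k * M_k to Exp(1) for several k
for (k in c(10, 50, 200)) {
  p <- ks.test(k * rbeta2(n, k), "pexp", 1)$p.value
  cat("k =", k, " p-value =", p, "\n")
}

# Probability of accepting the null (assumed level: 5%) for true Beta(1, k) samples
k <- 200
x <- rbeta2(n, k)
accept_exp  <- replicate(200, ks.test(k * rbeta2(n, k), "pexp", 1)$p.value > 0.05)
accept_beta <- replicate(200, ks.test(rbeta2(n, k), "pbeta", 1, k)$p.value > 0.05)
c(via_exp = mean(accept_exp), via_beta = mean(accept_beta))
```

For the graphical check of question 2, one option is hist(k * x, freq = FALSE) overlaid with curve(dexp(x), add = TRUE), or a quantile-quantile plot against qexp.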
Exercise 4
The SkewLogistic($\alpha$) distribution defines a random variable $X$ which takes values in $\mathbb{R}$ and has cumulative distribution function
$$F(x) = \frac{1}{(1 + e^{-x})^{\alpha}}$$
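For reference, the cdf can be inverted in closed form; the following worked step is a sketch, valid for $0 < u < 1$ and $\alpha > 0$.

```latex
\[
u = \left(1 + e^{-x}\right)^{-\alpha}
\iff 1 + e^{-x} = u^{-1/\alpha}
\iff x = F^{-1}(u) = -\log\!\left(u^{-1/\alpha} - 1\right).
\]
```

Setting $u = 1/2$ and $\alpha = 2$ gives the median $-\log(\sqrt{2} - 1) \approx 0.8814$.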
1. Using the generic inversion method, write a function rskewlogistic(n,α) which outputs $n$ realizations of the SkewLogistic($\alpha$) distribution.
2. For $\alpha = 2$, give a Monte Carlo experiment to estimate $\mathrm{Var}(X)$ and the median of $X$. Calculate (on paper) the theoretical value of the median and compare it to your estimate.
3. Propose a bootstrap experiment to evaluate the bias of your variance and median estimators.
4. For $\alpha = 2$, use the Kolmogorov-Smirnov test to verify that the variable
$$Y = \log(1 + e^{-X})$$
follows an Exp(2) distribution.
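The four questions could be sketched along the following lines. This is an illustration under stated assumptions: the second argument is spelled alpha rather than α to keep the code ASCII, the sample size $10^4$ and the 500 bootstrap replicates are arbitrary choices, and the names boot_med and boot_var are hypothetical.

```r
set.seed(7)

# Inversion: solve u = (1 + exp(-x))^(-alpha) for x, with u uniform on (0, 1)
rskewlogistic <- function(n, alpha) {
  u <- runif(n)
  -log(u^(-1 / alpha) - 1)
}

x <- rskewlogistic(1e4, alpha = 2)
c(variance = var(x), median = median(x))
-log(sqrt(2) - 1)  # theoretical median for alpha = 2

# Bootstrap estimates of the bias of the median and variance estimators
boot_med <- replicate(500, median(sample(x, replace = TRUE)))
boot_var <- replicate(500, var(sample(x, replace = TRUE)))
c(bias_median = mean(boot_med) - median(x), bias_var = mean(boot_var) - var(x))

# Kolmogorov-Smirnov check that Y = log(1 + exp(-X)) is Exp(2)
y <- log(1 + exp(-x))
pval <- ks.test(y, "pexp", 2)$p.value
pval
```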
Exercise 5
Given the probability density
$$f(x \mid \theta, \delta) = \frac{C}{\theta}\, e^{-|x - \delta|/\theta},$$
1. Explain why an importance sampling technique, designed to approximate the constant $C$, that is based on the Normal density cannot work. Illustrate this lack of convergence with a numerical experiment using $\theta = 2$ and $\delta = 4$.
2. Propose a more suitable importance distribution.
We now focus on the integral
$$I = \int_{\mathbb{R}} x f(x \mid 2, 4)\,dx$$
using samples of size $n = 10^2$.
3. Propose a Monte Carlo approximation of $I$. (Hint: Note that the integral over $\mathbb{R}$ is twice the integral over $\mathbb{R}_+$ when $\delta = 0$ and connect $f$ with a standard distribution on $(\delta, \infty)$.)
4. Approximate $I$ by importance sampling using the same distribution $g$ as in question 2.
5. Compute a confidence interval on $I$ at level 95% for each of your methods. Which one of the two estimates has the lowest precision?
6. Design a Monte Carlo experiment in order to check whether or not the asymptotic coverage level of the CI holds. Repeat the experiment with samples of size $n = 10^3$.
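A possible skeleton for questions 3 to 6 is given below, as a sketch under assumptions. It uses $C = 1/2$, which follows from $\int_{\mathbb{R}} e^{-|x-\delta|/\theta}\,dx = 2\theta$, so that $I = \delta = 4$. The Cauchy proposal centred at $\delta$ with scale $\theta$ is an assumed answer to question 2, chosen because its tails are heavier than those of $f$. The names x_mc, w, ci, covers, cover_100 and cover_1000 are hypothetical.

```r
set.seed(3)
theta <- 2
delta <- 4
n <- 100

# Target density with C = 1/2
f <- function(x) 0.5 / theta * exp(-abs(x - delta) / theta)

# (a) Direct Monte Carlo: delta plus a random sign times an exponential variate
x_mc <- delta + sample(c(-1, 1), n, replace = TRUE) * rexp(n, rate = 1 / theta)

# (b) Importance sampling with a Cauchy proposal g (assumed choice)
y <- rcauchy(n, location = delta, scale = theta)
w <- y * f(y) / dcauchy(y, location = delta, scale = theta)

# Asymptotic 95% confidence interval based on the central limit theorem
ci <- function(v) mean(v) + c(-1, 1) * qnorm(0.975) * sd(v) / sqrt(length(v))
ci(x_mc)
ci(w)

# Empirical coverage of the interval for the direct Monte Carlo estimator
covers <- function(n) {
  x <- delta + sample(c(-1, 1), n, replace = TRUE) * rexp(n, rate = 1 / theta)
  int <- ci(x)
  int[1] <= delta && delta <= int[2]
}
cover_100  <- mean(replicate(2000, covers(100)))
cover_1000 <- mean(replicate(2000, covers(1000)))
c(cover_100, cover_1000)
```

Comparing the widths of ci(x_mc) and ci(w) gives a direct reading of which estimator is less precise for this sample size.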
Exercise 6
Given the Galton density on $\mathbb{R}_+^*$,
$$f(x \mid \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\{-(\log(x) - \mu)^2 / 2\sigma^2\}$$
1. Determine which of the following distributions can be used in an A/R algorithm designed to sample from $f(x \mid 0, 1)$:
$$g_1(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \qquad g_2(x) = \frac{1}{\theta^k}\,\frac{1}{\Gamma(k)}\, x^{k-1} e^{-x/\theta}, \qquad g_3(x) = (1 + \alpha x)^{-1/\alpha - 1},$$
which are respectively a Weibull, a Gamma and a generalized Pareto distribution. Determine the appropriate upper bounds.
2. Using the inversion method write an algorithm that samples from the selected g.
3. Write an R function AR() that samples from f (x|0, 1). (Extra-credit : Optimize the
parameters of the proposal density g.)
4. Based on a sample of size $10^4$ from $f(x \mid 0, 1)$, estimate by Monte Carlo the mean and variance of $h(X) = \log(X)$ when $X \sim f$ and give a confidence interval at level 95% for both quantities.
5. The distribution associated with $f$ can be obtained by the transform $\exp\{Z\}$ when $Z \sim \mathcal{N}(\mu, \sigma^2)$. Establish this result and test it, based on the sample used in question 4.
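The sketch below illustrates one way the exercise might be carried out, assuming the generalized Pareto $g_3$ is selected as proposal (its polynomial tail dominates that of $f$) and fixing $\alpha = 1$ without any optimisation. The bound $M = \sup f/g_3$ is found numerically on the log scale; for $\alpha = 1$ it should be close to $4/\sqrt{2\pi}$, attained at $x = 1$. The helper names rgpd, dgpd, ratio and M are hypothetical, and the interval on the variance relies on a central limit approximation for the squared deviations.

```r
set.seed(11)
alpha <- 1  # assumed proposal parameter, not optimised

# Generalized Pareto proposal: density and sampler by inversion of
# G(x) = 1 - (1 + alpha * x)^(-1 / alpha)
dgpd <- function(x) (1 + alpha * x)^(-1 / alpha - 1)
rgpd <- function(n) ((1 - runif(n))^(-alpha) - 1) / alpha

# Target: Galton density with mu = 0, sigma = 1 (log-normal, dlnorm in R)
f <- function(x) dlnorm(x, meanlog = 0, sdlog = 1)

# Upper bound M = sup f / g, searched over t = log(x)
ratio <- function(t) f(exp(t)) / dgpd(exp(t))
M <- optimize(ratio, c(-10, 10), maximum = TRUE)$objective
c(M, 4 / sqrt(2 * pi))

# Accept-reject sampler for f(x | 0, 1)
AR <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- rgpd(2 * n)
    out <- c(out, x[runif(2 * n) <= f(x) / (M * dgpd(x))])
  }
  out[seq_len(n)]
}

x <- AR(1e4)
h <- log(x)
v <- (h - mean(h))^2
# 95% intervals on the mean and on the variance of log(X)
mean(h) + c(-1, 1) * qnorm(0.975) * sd(h) / sqrt(length(h))
mean(v) + c(-1, 1) * qnorm(0.975) * sd(v) / sqrt(length(v))

# Test that log(X) is standard normal
pval <- ks.test(h, "pnorm", 0, 1)$p.value
pval
```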