Statistics Lab
Rodolfo Metulini
IMT Institute for Advanced Studies, Lucca, Italy
Lesson 2 - Application to the Central Limit Theory - 16.01.2015
Introduction
Modern statistics was built and developed around the normal
distribution.
Academics often say that, if the empirical distribution is
normal (or approximately normal), everything works well. Whether
this holds depends mainly on the sample size.
That said, it is important to understand under which circumstances we
can state that the distribution is normal.
Two founding statistical theorems are helpful: The Central Limit
Theorem and The Law of Large Numbers.
The Law of Large Numbers (LLN)
Suppose we have a random variable X with expected value
E(X) = µ.
We extract n observations from X (say x = {x1, x2, ..., xn}).
If we define $\hat{X}_n = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$, the LLN states that, for
$n \to \infty$,
$\hat{X}_n \to \mu$
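As a rough illustration of the LLN statement above, the following sketch tracks the running mean of a growing sample. The exponential distribution, its mean mu = 2, the seed and the sample size are all assumptions chosen for the example; they do not come from the lecture.

```r
# Running mean of draws from an exponential distribution with mean mu = 2
# (distribution, seed and sample size are illustrative assumptions).
set.seed(1)
mu <- 2
x <- rexp(100000, rate = 1 / mu)

# Element k is the mean of the first k draws
running_mean <- cumsum(x) / seq_along(x)

# Sample mean after 10, 1000 and 100000 draws: it settles near mu
running_mean[c(10, 1000, 100000)]
```

The last value should sit much closer to mu than the first, which is what the LLN predicts.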
The Central Limit Theorem (CLT)
Suppose we have a random variable X with expected value
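$E(X) = \mu$ and variance $v(X) = \sigma^2$.
We extract n observations from X (say x = {x1, x2, ..., xn}).
Let's define $\hat{X}_n = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$.
$\hat{X}_n$ is distributed with expected value $\mu$ and variance $\frac{\sigma^2}{n}$.
As $n \to \infty$ (in practice, n > 30 is a common rule of thumb),
$\hat{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately, whatever the distribution of X (provided $\sigma^2$ is finite).
N.B. If X is normally distributed, $\hat{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ even if
n < 30.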
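A minimal sketch of the CLT statement, assuming a parent distribution that is clearly not normal. The exponential distribution with rate 1 (so mu = 1 and sigma^2 = 1), the number of replications and the seed are assumptions made for this example only.

```r
# Sample means from a skewed parent: X ~ Exponential(rate = 1),
# so mu = 1 and sigma^2 = 1 (all values here are illustrative assumptions).
set.seed(2)
n <- 30        # size of each sample
reps <- 5000   # number of samples drawn

xbar <- replicate(reps, mean(rexp(n, rate = 1)))

# Compare the simulated mean and variance of xbar with mu and sigma^2 / n
c(mean = mean(xbar), variance = var(xbar), theory_variance = 1 / n)
```

A call such as hist(xbar, prob = TRUE) would then show a roughly bell-shaped histogram, even though the parent distribution is strongly skewed.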
CLT: Empiricals
To better understand the CLT, it is recommended to examine the
theorem empirically, step by step, introducing new commands in
the R programming language along the way.
In the first part, we will show how to draw and visualize a sample
of random numbers from a distribution.
Then, we will examine the mean and standard deviation of the
sample, then the distribution of the sample means.
Drawing random numbers - 1
We already introduced the use of the letters d, p and q in relation
to the various distributions (e.g. normal, uniform, exponential). A
reminder of their use follows:
d is for density: it is used to find values of the probability
density function.
p is for probability: it is used to find the probability that the
random variable lies to the left of a given number.
q is for quantile: it is used to find the quantiles of a given
distribution.
There is a fourth letter, namely r, used to draw random numbers
from a distribution. For example runif and rexp would be used to
draw random numbers from the uniform and exponential
distributions, respectively.
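The four prefixes can be tried side by side. The sketch below uses the standard normal for d, p and q, and the uniform and exponential for r; the specific arguments and the seed are arbitrary choices for illustration.

```r
# d: density of the standard normal at 0
d0 <- dnorm(0)

# p: probability that a standard normal lies to the left of 1.96
p196 <- pnorm(1.96)

# q: quantile, the inverse of p (returns 1.96 when given p196)
q <- qnorm(p196)

# r: random draws (the seed is fixed only to make the run repeatable)
set.seed(3)
u <- runif(5, min = 0, max = 1)   # 5 draws from Uniform(0, 1)
e <- rexp(5, rate = 2)            # 5 draws from Exponential(rate = 2)

c(d0 = d0, p196 = p196, q = q)
```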
Drawing random numbers - 2
Let us use the rnorm command to draw 500 numbers at random from
a normal distribution having mean 100 and standard deviation (sd)
10.
> x = rnorm(500, mean = 100, sd = 10)
Typing x in the R console returns a vector of 500 numbers
extracted at random from a normal distribution with mean 100 and
sd 10.
When you examine the numbers stored in the vector x, there is a
sense that you are pulling random numbers that are clumped about
a mean of 100. However, a histogram of this selection provides a
different picture of the data stored.
> hist(x,prob=TRUE)
Drawing random numbers - Comments
Several comments are in order regarding the histogram in the
figure.
1. The histogram is approximately normal in shape.
2. The balance point of the histogram appears to be located
near 100, suggesting that the random numbers were drawn
from a distribution having mean 100.
3. Almost all of the values are within 3 increments of 10 from
the mean, suggesting that random numbers were drawn from
a normal distribution having standard deviation 10.
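The three visual impressions above can also be checked numerically. This is only a sketch: the seed is an assumption, and the exact figures change with every new draw.

```r
# Numerical counterpart of the visual checks on the histogram
set.seed(4)   # illustrative seed, not part of the lecture
x <- rnorm(500, mean = 100, sd = 10)

sample_mean <- mean(x)   # balance point, expected near 100
sample_sd <- sd(x)       # spread, expected near 10

# Share of values within 3 increments of 10 (3 sd) of the mean
within3 <- mean(abs(x - 100) <= 30)

c(mean = sample_mean, sd = sample_sd, within3 = within3)
```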
Drawing random numbers - a new drawing
Let's try the experiment again, drawing a new set of 500 random
numbers from the normal distribution having mean 100 and
standard deviation 10:
> x = rnorm(500, mean = 100, sd = 10)
> hist(x, prob = TRUE, ylim = c(0, 0.04))
Take a look at the histogram ... It is different from the first one;
however, it shares some common traits: (1) it appears normal in
shape; (2) it appears to be balanced around 100; (3) all values
appear to occur within 3 increments of 10 of the mean.
This is strong evidence that the random numbers have been
drawn from a normal distribution having mean 100 and sd 10. We
can provide further evidence for this claim by superimposing a normal
density curve:
> curve(dnorm(x, mean = 100, sd = 10), 70, 130, add = TRUE, lwd = 2, col = "red")
The curve command
The curve command is new. Some comments on its use
follow:
1. In its simplest form, the syntax curve(f(x), from = , to = )
draws the function defined by f(x) on the interval (from, to).
Our function is dnorm(x, mean = 100, sd = 10). The curve
command sketches this function of x on the interval
(70, 130).
2. The notation from = and to = may be omitted if the
arguments are in the proper order to the curve command:
function first, value of from second, value of to third. That is
what we have done.
3. If the argument add is set to TRUE, then the curve is added
to the existing figure. If the argument is omitted (or FALSE),
then a new plot is drawn, erasing the previous graph.
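The three points can be seen in one short sketch. The second density (sd = 15) and the dashed line type are assumptions added only to make the overlay visible; they are not part of the lecture.

```r
# Point 1: named from/to arguments; with add omitted, a new plot is drawn.
# curve() invisibly returns the points it plotted, kept here for inspection.
pts <- curve(dnorm(x, mean = 100, sd = 10), from = 70, to = 130)

# Points 2 and 3: from/to passed by position, and add = TRUE overlays the
# new curve on the existing figure instead of replacing it.
curve(dnorm(x, mean = 100, sd = 15), 70, 130, add = TRUE, lty = 2)

# The plotted x values span the requested interval
range(pts$x)
```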
The distribution of ˆXn (sample mean)
In our previous example we drew 500 random numbers from a
normal distribution with mean 100 and standard deviation 10. This
leads to ONE sample of n = 500. Now the question is: what is
the mean of our sample?
> mean(x)
[1] 100.14132
If we take another sample of 500 random numbers from the SAME
distribution, we get a new sample with a different mean.
> x = rnorm(500, mean = 100, sd = 10)
> mean(x)
[1] 100.07884
What happens if we draw a sample several times?
Producing a vector of sample means
We will repeatedly sample from the normal distribution, 500 times.
Each of the 500 samples will select 5 random numbers (instead of
500) from the normal distribution having mean 100 and sd 10. We
will then compute the mean of those samples.
We begin by declaring the mean and the standard deviation. Then,
we declare the sample size.
> µ = 100; σ = 10
> n = 5
We then need some place to store the means of the samples. We
initialize a vector xbar that initially contains 500 zeros.
> xbar = rep(0, 500)
Producing a vector of sample means - the for loop
It is easy to draw a sample of size n = 5 from the normal
distribution having mean µ = 100 and standard deviation σ = 10.
We simply issue the command
rnorm(n, mean = µ, sd = σ).
To find the mean of this result, we simply wrap the command in
mean():
mean(rnorm(n, mean = µ, sd = σ)).
The final step is to store this result in the vector xbar. Then we
must repeat this same process an additional 499 times. This
requires the use of a for loop.
> for (i in 1:500) {xbar[i] = mean(rnorm(n, mean = µ, sd = σ))}
The for loop
The i in for (i in 1:500) is called the index of the for loop.
The index i is first set equal to 1, then the body of the for
loop is executed. On the next iteration, i is set equal to 2 and
the body of the loop is executed again. The loop continues in
this manner, incrementing by 1, finally setting the index i to
500. After executing the last iteration, the for loop terminates.
In the body of the for loop, we have
xbar[i] = mean(rnorm(n, mean = µ, sd = σ)). This draws a
sample of size 5 from the normal distribution, calculates the
mean of the sample, and stores the result in xbar[i].
When the for loop completes 500 iterations, the vector xbar
contains the means of 500 samples of size 5 drawn from the
normal distribution having µ = 100 and σ = 10.
> hist(xbar, prob = TRUE, breaks = 12, xlim = c(70, 130), ylim = c(0, 0.1))
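As an aside, the same vector of sample means can be built without preallocating xbar or writing an explicit loop. The sketch below swaps in replicate(), a base R function; it is an alternative to the for loop used in the lecture, not the lecture's own method. The ASCII names mu and sigma and the seed are choices made for this example.

```r
# Alternative to the for loop: replicate() evaluates the expression
# 500 times and collects the results in a vector.
set.seed(5)   # illustrative seed
mu <- 100
sigma <- 10
n <- 5

xbar <- replicate(500, mean(rnorm(n, mean = mu, sd = sigma)))

length(xbar)   # one sample mean per replication
```

Both versions produce 500 sample means; the for loop makes each step explicit, while replicate() is more compact.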
Distribution of ˆXn - observations
1. The previous histograms describe the shape of 500
randomly selected numbers; here, the histogram
describes the distribution of 500 different sample means, each
of which was found by selecting n = 5 random numbers from the
normal distribution.
2. The distribution of xbar appears normal in shape. This is so
even though the sample size is relatively small (n = 5).
3. It appears that the balance point occurs near 100. This can
be checked with the following command:
> mean(xbar)
This is the mean of the sample means, which is almost equal to
the mean of the distribution the random numbers were drawn from.
4. The distribution of the sample means appears to be narrower
than the distribution of the random numbers.
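Observation 4 can be quantified: by the CLT slide, the standard deviation of the sample means should be close to σ/√n rather than σ. The sketch below repeats the loop with ASCII variable names (mu, sigma) and a fixed seed, both of which are assumptions of this example.

```r
# How much narrower is the distribution of sample means?
set.seed(6)   # illustrative seed
mu <- 100
sigma <- 10
n <- 5

xbar <- rep(0, 500)
for (i in 1:500) xbar[i] <- mean(rnorm(n, mean = mu, sd = sigma))

observed_sd <- sd(xbar)        # spread of the 500 sample means
theory_sd <- sigma / sqrt(n)   # CLT prediction, about 4.47 for n = 5

c(observed = observed_sd, theory = theory_sd, parent = sigma)
```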
Increasing the sample size
Let's repeat the last experiment, but this time let's draw samples
of size n = 10 from the same distribution (µ = 100, σ = 10).
> µ = 100; σ = 10
> n = 10
> xbar = rep(0, 500)
> for (i in 1:500) {xbar[i] = mean(rnorm(n, mean = µ, sd = σ))}
> hist(xbar, prob = TRUE, breaks = 12, xlim = c(70, 130), ylim = c(0, 0.1))
The histogram produced is even narrower than the one obtained with
n = 5.
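To see the narrowing in numbers rather than by eye, the experiment can be run for several sample sizes at once. This is a sketch: the sizes 5, 10 and 30, the seed, and the use of sapply() and replicate() are choices made for the example, not part of the lecture.

```r
# Spread of 500 sample means for several sample sizes
set.seed(7)   # illustrative seed
mu <- 100
sigma <- 10
sizes <- c(5, 10, 30)

spread <- sapply(sizes, function(n) {
  sd(replicate(500, mean(rnorm(n, mean = mu, sd = sigma))))
})

# Observed spread next to the theoretical value sigma / sqrt(n)
data.frame(n = sizes, observed_sd = spread, theory_sd = sigma / sqrt(sizes))
```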
Key Ideas
1. When we select samples from a normal distribution, then the
distribution of sample means is also normal in shape
2. The mean of the distribution of sample means appears to be
the same as the mean of the random numbers
(parent population) (compare the balance points).
3. By increasing the size of our samples, the histograms
become narrower. In fact, we would expect a more accurate
estimate of the mean of the parent population if we take the
mean from a larger sample.
4. Imagine drawing sample means from samples of size n → ∞. The
histogram would be entirely concentrated (P = 1) at Xbar = µ,
since the variance $\sigma^2 / n$ tends to 0.
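The variance used in key ideas 3 and 4 follows in one line, under the assumption (implicit in the experiments above) that the n draws are independent and share the same variance $\sigma^2$:

```latex
v(\hat{X}_n)
  = v\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n} v(x_i)
  = \frac{n\,\sigma^2}{n^2}
  = \frac{\sigma^2}{n}
  \;\longrightarrow\; 0
  \quad \text{as } n \to \infty .
```

The second equality is where independence is needed: without it, covariance terms between the draws would appear in the sum.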
Summary
We conclude by restating what we found about the CLT:
1. If you draw samples from a normal distribution, then the
distribution of the sample means is also normal
2. The mean of the sample means is roughly identical to the
mean of the parent population
3. The higher the sample size that is drawn, the narrower will be
the spread of the distribution of the sample means.
Homework
Experiment 1: Draw the Xbar histogram for n = 1000. What is
the shape of the histogram?
Experiment 2: Repeat the full experiment, drawing random
numbers and sample means from (1) a uniform and (2) a
Poisson distribution. Is the histogram of Xbar normal in shape for
n = 5 and for n = 30?
Experiment 3: Repeat the full experiment using real data instead
of random numbers. (HINT: select samples of size n = 5
from the real data, not from rnorm.)
Recommended: Try to evaluate the agreement of the sample mean
histogram with the normal distribution by means of a Q-Q plot and
the Shapiro-Wilk test.
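For the recommended check, base R provides qqnorm(), qqline() and shapiro.test(). The sketch below applies them to the lecture's normal example with n = 5, not to the homework experiments; the seed and the use of replicate() are assumptions of this example.

```r
# Normality checks on a vector of sample means
set.seed(8)   # illustrative seed
xbar <- replicate(500, mean(rnorm(5, mean = 100, sd = 10)))

# Q-Q plot: points close to the reference line suggest normality
qqnorm(xbar)
qqline(xbar, col = "red")

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(xbar)
sw$p.value
```

Note that shapiro.test() accepts between 3 and 5000 observations, so 500 sample means are within its range.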
Application to the Law of Large Numbers
Experiment: toss a coin 100 times.
This experiment is like repeating 100 times a random draw from a
Bernoulli distribution with parameter ρ = 0.5.
We expect to get heads (value = 1) 50 times and tails
(value = 0) 50 times, if the coin is fair.
But, in practice, this does not happen: repeating the experiment, we
obtain a distribution of the number of heads centered at 50, but spread out.
Let's define $\hat{X}_n$ as the mean number of heads
across n experiments. For $n \to \infty$, $\hat{X}_n \to 50$.
Application to the Law of Large Numbers - 2
x = rbinom(100, 1, 0.5)
x
x2 = rbinom(100, 2, 0.5)
# histogram of the random numbers
hist(x)
# define the empirical frequency
sum(x)
# define the empirical frequency for the sample mean
xfreq = rep(0, 1000)
xfreq
# for loop: define the index i
N = rep(0, 1000)
for (i in 1:1000) N[i] = i
N
# define the cumulative frequency (total)
xfreq[1] = sum(rbinom(100, 1, 0.5))
xfreq[1]
for (i in 2:1000) xfreq[i] = (sum(rbinom(100, 1, 0.5)) + xfreq[i-1])
xfreq
# define the sample mean (cumulative frequency divided by the number of experiments)
xfreq2 = rep(0, 1000)
for (i in 1:1000) xfreq2[i] = xfreq[i]/i
xfreq2
plot(xfreq2, ylim = c(48, 52))
mu = rep(50, 1000)
mu
points(mu, col = "red")
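The same experiment can be written more compactly. In the sketch below, rbinom(1000, 100, 0.5) draws the number of heads in each 100-toss experiment directly, and cumsum() replaces the loops; this is a swapped-in shortcut that matches the loop-based code in distribution, not the lecture's own code. The seed and variable names are assumptions.

```r
# Compact version of the coin-toss experiment
set.seed(9)   # illustrative seed

# Number of heads in each of 1000 experiments of 100 fair tosses
heads <- rbinom(1000, size = 100, prob = 0.5)

# Mean number of heads after 1, 2, ..., 1000 experiments
running_mean <- cumsum(heads) / seq_along(heads)

plot(running_mean, type = "l", ylim = c(48, 52))
abline(h = 50, col = "red")   # expected number of heads

running_mean[1000]
```

The line should wander at first and then settle close to 50 as the number of experiments grows.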

  • 1. Statistics Lab Rodolfo Metulini IMT Institute for Advanced Studies, Lucca, Italy Lesson 2 - Application to the Central Limit Theory - 16.01.2015
  • 2. Introduction The modern statistics was built and developed around the normal distribution. Academic world use to say that, if the empirical distribution is normal (or approximative normal), everything works good. This depends mainly on the sample dimension Said this, it is important to undestand in which circumstances we can state the distribution is normal. Two founding statistical theorems are helpful: The Central Limit Theorem and The Law of Large Numbers.
  • 3. The Law of Large Numbers (LLN) Suppose we have a random variable X with expected value E(X) = µ. We extract n observations from X (say x = {x1, x2, ..., xn}). If we define $\hat{X}_n = \frac{1}{n}\sum_i x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$, the LLN states that, for $n \to \infty$, $\hat{X}_n \to \mu$.
  • 4. The Central Limit Theorem (CLT) Suppose we have a random variable X with expected value E(X) = µ and variance v(X) = σ². We extract n observations from X (say x = {x1, x2, ..., xn}). Let's define $\hat{X}_n = \frac{1}{n}\sum_i x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$. $\hat{X}_n$ is distributed with expected value µ and variance $\sigma^2 / n$. As $n \to \infty$ (in practice, n > 30), $\hat{X}_n \sim N(\mu, \sigma^2 / n)$ approximately, whatever the distribution of X. N.B. If X is normally distributed, $\hat{X}_n \sim N(\mu, \sigma^2 / n)$ even if n < 30.
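The CLT statement can be checked numerically with a parent distribution that is clearly not normal. The following is a minimal sketch, assuming an exponential parent with rate 1 (so µ = 1 and σ² = 1); the seed, the number of repetitions and the variable names are illustrative choices, not part of the slides.

```r
# Sketch: sample means of a skewed (exponential) parent distribution.
set.seed(42)                 # assumed seed, only for reproducibility
n    <- 30                   # sample size
reps <- 2000                 # number of samples drawn (illustrative)

# Each replicate draws n exponential numbers and returns their mean.
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

mean(xbar)                   # should be close to mu = 1
var(xbar)                    # should be close to sigma^2 / n = 1/30

# Histogram of the sample means with the N(mu, sigma^2 / n) density on top.
hist(xbar, prob = TRUE, breaks = 30)
curve(dnorm(x, mean = 1, sd = sqrt(1 / n)), add = TRUE, col = "red", lwd = 2)
```

If the theorem applies, the histogram should look roughly bell-shaped around 1 even though the exponential density itself is strongly skewed.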
  • 5. CLT: Empiricals To better understand the CLT, it is recommended to examine the theorem empirically, step by step, introducing new commands of the R programming language along the way. In the first part, we will show how to draw and visualize a sample of random numbers from a distribution. Then, we will examine the mean and standard deviation of the sample, and finally the distribution of the sample means.
  • 6. Drawing random numbers - 1 We already introduced the use of the letters d, p and q in relation to the various distributions (e.g. normal, uniform, exponential). A reminder of their use follows: d is for density: it is used to find values of the probability density function. p is for probability: it is used to find the probability that the random variable lies to the left of a given number. q is for quantile: it is used to find the quantiles of a given distribution. There is a fourth letter, namely r, used to draw random numbers from a distribution. For example, runif and rexp would be used to draw random numbers from the uniform and exponential distributions, respectively.
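As a quick illustration of the four prefixes, the sketch below applies them to a normal distribution with mean 100 and sd 10, plus one call each to runif and rexp. The variable names and the seed are assumptions made for the example.

```r
# d: value of the density function at a point
d <- dnorm(100, mean = 100, sd = 10)     # equals 1 / (10 * sqrt(2 * pi))

# p: probability of falling to the left of a given number
p <- pnorm(110, mean = 100, sd = 10)     # P(X <= 110), about 0.841

# q: quantile for a given probability (the inverse of p)
q <- qnorm(0.975, mean = 100, sd = 10)   # about 119.6

# r: random draws
set.seed(123)                            # assumed seed, for reproducibility
u <- runif(3, min = 0, max = 1)          # three uniform numbers on (0, 1)
e <- rexp(3, rate = 1)                   # three exponential numbers

c(density = d, probability = p, quantile = q)
```

Note that q and p undo each other: qnorm(pnorm(110, 100, 10), 100, 10) returns 110.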
  • 7. Drawing random numbers - 2 Let us use the rnorm command to draw 500 numbers at random from a normal distribution having mean 100 and standard deviation (sd) 10. > x = rnorm(500, mean = 100, sd = 10) The result, obtained by typing x in the R console, is a vector of 500 numbers drawn at random from a normal distribution with mean 100 and sd 10. When you examine the numbers stored in the vector x, there is a sense that you are pulling random numbers that are clumped about a mean of 100. However, a histogram of this selection provides a different picture of the data stored. > hist(x, prob = TRUE)
  • 8. Drawing random numbers - Comments Several comments are in order regarding the histogram in the figure. 1. The histogram is approximately normal in shape. 2. The balance point of the histogram appears to be located near 100, suggesting that the random numbers were drawn from a distribution having mean 100. 3. Almost all of the values are within 3 increments of 10 from the mean, suggesting that random numbers were drawn from a normal distribution having standard deviation 10.
  • 9. Drawing random numbers - a new drawing Let's try the experiment again, drawing a new set of 500 random numbers from the normal distribution having mean 100 and standard deviation 10: > x = rnorm(500, mean = 100, sd = 10) > hist(x, prob = TRUE, ylim = c(0, 0.04)) Take a look at the histogram ... It is different from the first one; however, it shares some common traits: (1) it appears normal in shape; (2) it appears to be balanced around 100; (3) all values appear to occur within 3 increments of 10 of the mean. This is strong evidence that the random numbers have been drawn from a normal distribution having mean 100 and sd 10. We can provide further evidence for this claim by superimposing a normal density curve: > curve(dnorm(x, mean = 100, sd = 10), 70, 130, add = TRUE, lwd = 2, col = "red")
  • 10. The curve command The curve command is new. Some comments on its use follow: 1. In its simplest form, the syntax curve(f(x), from = , to = ) draws the function defined by f(x) on the interval (from, to). Our function is dnorm(x, mean = 100, sd = 10). The curve command sketches this function of x on the interval (from, to). 2. The notation from = and to = may be omitted if the arguments are passed to the curve command in the proper order: function first, value of from second, value of to third. That is what we have done. 3. If the argument add is set to TRUE, then the curve is added to the existing figure. If the argument is omitted (or FALSE), then a new plot is drawn, erasing the previous graph.
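The three points about curve can be tried out on a small example. This is a sketch using the standard normal density; the interval, the line types and the name pts are arbitrary choices for illustration.

```r
# 1. Named arguments: a new plot of the standard normal density on (-4, 4).
curve(dnorm(x), from = -4, to = 4, ylab = "density")

# 2. and 3. Positional from/to, and add = TRUE to draw on the existing plot.
curve(dnorm(x, mean = 0, sd = 2), -4, 4, add = TRUE, lty = 2)

# curve() also returns (invisibly) the points it drew, as a list with x and y.
pts <- curve(dnorm(x), from = -4, to = 4, n = 101)
head(pts$x)
```

The last call omits add, so it starts a fresh plot and the two earlier curves are erased.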
  • 11. The distribution of $\hat{X}_n$ (sample mean) In our previous example we drew 500 random numbers from a normal distribution with mean 100 and standard deviation 10. This leads to ONE sample of n = 500. Now the question is: what is the mean of our sample? > mean(x) [1] 100.14132 If we take another sample of 500 random numbers from the SAME distribution, we get a new sample with a different mean. > x = rnorm(500, mean = 100, sd = 10) > mean(x) [1] 100.07884 What happens if we draw a sample several times?
  • 12. Producing a vector of sample means We will repeatedly sample from the normal distribution, 500 times. Each of the 500 samples will select 5 random numbers (instead of 500) from the normal distribution having mean 100 and sd 10. We will then compute the mean of each of those samples. We begin by declaring the mean and the standard deviation. Then, we declare the sample size. > µ = 100; σ = 10 > n = 5 We then need some place to store the means of the samples. We initialize a vector xbar to contain 500 zeros. > xbar = rep(0, 500)
  • 13. Producing a vector of sample means - cycle for It is easy to draw a sample of size n = 5 from the normal distribution having mean µ = 100 and standard deviation σ = 10. We simply issue the command rnorm(n, mean = µ, sd = σ). To find the mean of this result, we simply wrap it in mean: mean(rnorm(n, mean = µ, sd = σ)). The final step is to store this result in the vector xbar. Then we must repeat this same process an additional 499 times. This requires the use of a for loop. > for (i in 1:500) { xbar[i] = mean(rnorm(n, mean = µ, sd = σ)) }
  • 14. Cycle for The i in for (i in 1:500) is called the index of the for loop. The index i is first set equal to 1, then the body of the for loop is executed. On the next iteration, i is set equal to 2 and the body of the loop is executed again. The loop continues in this manner, incrementing by 1, finally setting the index i to 500. After executing the last iteration, the for cycle terminates. In the body of the for loop, we have xbar[i] = mean(rnorm(n, mean = µ, sd = σ)). This draws a sample of size 5 from the normal distribution, calculates the mean of the sample, and stores the result in xbar[i]. When the for loop completes 500 iterations, the vector xbar contains the means of 500 samples of size 5 drawn from the normal distribution having µ = 100 and σ = 10. > hist(xbar, prob = TRUE, breaks = 12, xlim = c(70, 130), ylim = c(0, 0.1))
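The for loop is the approach used in the slides. As a possible alternative, base R's replicate evaluates an expression a given number of times and collects the results, which removes the need to pre-allocate xbar. The sketch below compares the two; the names mu and sigma are ASCII stand-ins for µ and σ, and the seed is an assumption added so that both versions draw the same numbers.

```r
mu <- 100; sigma <- 10   # ASCII stand-ins for the slides' µ and σ
n  <- 5                  # sample size

# Version 1: explicit for loop, as in the slides.
set.seed(2015)           # assumed seed, for reproducibility
xbar_loop <- rep(0, 500)
for (i in 1:500) {
  xbar_loop[i] <- mean(rnorm(n, mean = mu, sd = sigma))
}

# Version 2: replicate() repeats the expression 500 times.
set.seed(2015)           # same seed, so the same random numbers are drawn
xbar_rep <- replicate(500, mean(rnorm(n, mean = mu, sd = sigma)))

# Both vectors hold 500 sample means and should agree.
all.equal(xbar_loop, xbar_rep)
```

Which form to use is mostly a matter of taste; the loop makes the indexing explicit, while replicate keeps the experiment to a single line.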
  • 15. Distribution of $\hat{X}_n$ - observations 1. The previous histograms described the shape of 500 randomly selected numbers; here, the histogram describes the distribution of 500 different sample means, each of which was found by selecting n = 5 random numbers from the normal distribution. 2. The distribution of xbar appears normal in shape. This is so even though the sample size is relatively small (n = 5). 3. It appears that the balance point occurs near 100. This can be checked with the following command: > mean(xbar) This is the mean of the sample means, which is almost equal to the mean of the distribution the random numbers were drawn from. 4. The distribution of the sample means appears to be narrower than the distribution of the random numbers.
  • 16. Increasing the sample size Let's repeat the last experiment, but this time let's draw samples of size n = 10 from the same distribution (µ = 100, σ = 10). > µ = 100; σ = 10 > n = 10 > xbar = rep(0, 500) > for (i in 1:500) { xbar[i] = mean(rnorm(n, mean = µ, sd = σ)) } > hist(xbar, prob = TRUE, breaks = 12, xlim = c(70, 130), ylim = c(0, 0.1)) The histogram produced is even narrower than the one obtained with n = 5.
  • 17. Key Ideas 1. When we select samples from a normal distribution, the distribution of the sample means is also normal in shape. 2. The mean of the distribution of sample means appears to be the same as the mean of the random numbers (the parent population); compare the balance points. 3. As we increase the sample size, the histogram becomes narrower. In fact, we would expect a more accurate estimate of the mean of the parent population if we take the mean from a larger sample. 4. Imagine drawing sample means from samples of size n = ∞. The histogram would be concentrated entirely (P = 1) at Xbar = µ, since the variance σ²/n goes to 0.
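The narrowing described in the key ideas can be quantified: the standard deviation of the sample means should be close to σ/√n. The following sketch compares the observed spread with this value for a few sample sizes; the sizes, the seed and the variable names are assumptions chosen for illustration.

```r
set.seed(7)                      # assumed seed, for reproducibility
mu <- 100; sigma <- 10           # parent population parameters
sizes <- c(5, 10, 50)            # sample sizes to compare (illustrative)

# For each n: draw 500 samples of size n and take the sd of their means.
observed_sd <- sapply(sizes, function(n) {
  sd(replicate(500, mean(rnorm(n, mean = mu, sd = sigma))))
})

# Value expected from the CLT: sigma / sqrt(n).
theoretical_sd <- sigma / sqrt(sizes)

rbind(sizes, observed_sd, theoretical_sd)
```

The observed values should shrink as n grows and stay close to the theoretical row, which is the numerical counterpart of the narrower histograms.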
  • 18. Summary We finish by restating what we have verified about the CLT: 1. If you draw samples from a normal distribution, then the distribution of the sample means is also normal. 2. The mean of the sample means is roughly identical to the mean of the parent population. 3. The larger the sample size, the narrower the spread of the distribution of the sample means.
  • 19. Homework Experiment 1: Draw the Xbar histogram for n = 1000. What is the shape of the histogram? Experiment 2: Repeat the full experiment drawing random numbers and sample means from (1) a uniform and (2) a Poisson distribution. Is the histogram of Xbar normal in shape for n = 5 and for n = 30? Experiment 3: Repeat the full experiment using real data instead of random numbers. (HINT: select samples of size n = 5 from the real data, not from rnorm.) Recommended: Try to evaluate the agreement of the sample mean histogram with the normal distribution by means of the Q-Q plot and the Shapiro-Wilk test.
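For the recommended check, base R provides qqnorm, qqline and shapiro.test. The sketch below applies them to the sample means from the normal example of the lesson (n = 5), not to the homework distributions; the seed and the name sw are assumptions.

```r
set.seed(11)                     # assumed seed, for reproducibility
xbar <- replicate(500, mean(rnorm(5, mean = 100, sd = 10)))

# Q-Q plot: points close to the reference line suggest normality.
qqnorm(xbar)
qqline(xbar, col = "red")

# Shapiro-Wilk test: the null hypothesis is that the data are normal.
sw <- shapiro.test(xbar)
sw$statistic                     # W statistic, close to 1 for normal-looking data
sw$p.value                       # a small p-value is evidence against normality
```

The same three calls can be reused on any other xbar vector produced in the experiments.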
  • 20. Application to Large Number Law Experiment: toss a coin 100 times. This experiment is like repeating 100 times a random draw from a Bernoulli distribution with parameter ρ = 0.5. If the coin is fair, we expect to get heads (value = 1) 50 times and tails (value = 0) 50 times. In practice, however, this does not happen: repeating the experiment, we obtain a distribution of the number of heads that is centered at 50, but spread out. Let's define $\hat{X}_n$ as the mean number of heads across n experiments. For $n \to \infty$, $\hat{X}_n \to 50$.
  • 21. Application to Large Number Law - 2
      x = rbinom(100, 1, 0.5)
      x
      x2 = rbinom(100, 2, 0.5)
      # histogram of the random numbers
      hist(x)
      # define the empirical frequency
      sum(x)
      # define the empirical frequency for the sample mean
      xfreq = rep(0, 1000)
      xfreq
      # for loop: define the number i
      N = rep(0, 1000)
      for (i in 1:1000) N[i] = i
      N
      # define the cumulated frequency (total)
      xfreq[1] = sum(rbinom(100, 1, 0.5))
      xfreq[1]
      for (i in 2:1000) xfreq[i] = (sum(rbinom(100, 1, 0.5)) + xfreq[i-1])
      xfreq
      # define the sample mean (cumulative freq divided by number of experiments)
      xfreq2 = rep(0, 1000)
      for (i in 1:1000) xfreq2[i] = xfreq[i]/i
      xfreq2
      plot(xfreq2, ylim = c(48, 52))
      mu = rep(50, 1000)
      mu
      points(mu, col = "red")
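The script of slide 21 can also be written without loops. This is a sketch of one possible vectorised version, not the author's code: it assumes that drawing the number of heads in 100 tosses directly from a binomial distribution is an acceptable substitute for summing 100 Bernoulli draws (the two have the same distribution), and the names heads and running_mean are hypothetical.

```r
set.seed(21)                         # assumed seed, for reproducibility

# Number of heads in each of 1000 experiments of 100 fair coin tosses.
heads <- rbinom(1000, size = 100, prob = 0.5)

# Running mean: cumulative number of heads divided by number of experiments.
running_mean <- cumsum(heads) / seq_along(heads)

# The running mean should settle around the expected value 50.
plot(running_mean, type = "l", ylim = c(48, 52),
     xlab = "number of experiments", ylab = "mean number of heads")
abline(h = 50, col = "red")
```

Here cumsum replaces the loop that accumulates xfreq, and abline draws the reference level in one call instead of plotting a vector of 1000 copies of 50.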