2. Normal Distribution: What It Is, Properties, Uses, and
Formula
In graphical form, the normal distribution appears as a
"bell curve."
Normal distribution, also known as the Gaussian distribution, is a probability
distribution that is symmetric about the mean, showing that data near the mean
are more frequent in occurrence than data far from the mean.
3. KEY TAKEAWAYS
The normal distribution is the proper term for a probability bell curve.
In the standard normal distribution, the mean is zero and the standard deviation is 1.
It has zero skew and a kurtosis of 3.
Normal distributions are symmetrical, but not all symmetrical
distributions are normal.
Many naturally-occurring phenomena tend to approximate the normal
distribution.
In finance, however, most pricing distributions are not perfectly normal.
4. Normal distribution
The normal distribution is the most common type of
distribution assumed in technical stock market analysis
and in other types of statistical analyses. The normal
distribution has two parameters: the mean and the
standard deviation.
5. The normal distribution
The normal distribution model is important in statistics and is
key to the Central Limit Theorem (CLT). This theorem states that
averages calculated from independent, identically distributed
random variables have approximately normal distributions,
regardless of the type of distribution from which the variables
are sampled (provided it has finite variance).
6. Properties of the Normal Distribution
The normal distribution has several key features and properties
that define it.
First, its mean (average), median (midpoint), and mode (most
frequent observation) are all equal to one another.
Moreover, these values all represent the peak, or highest point,
of the distribution.
The distribution falls symmetrically around the mean, the width
of which is defined by the standard deviation.
7. The Empirical Rule
For all normal distributions, 68.2% of the observations will
appear within plus or minus one standard deviation of the
mean;
95.4% of the observations will fall within +/- two standard
deviations;
and 99.7% within +/- three standard deviations. This fact is
sometimes referred to as the "empirical rule," a heuristic that
describes where most of the data in a normal distribution will
appear.
8. Empirical rule (68 – 95 – 99.7)
For an approximately normal data set:
the values within one standard deviation of the mean account for about 68% of the set;
the values within two standard deviations account for about 95%;
and the values within three standard deviations account for about 99.7%.
(Figure: the shown percentages are rounded theoretical probabilities intended only to approximate the empirical data derived from a normal population.)
10. Skewness
Skewness measures the degree of symmetry of a distribution.
The normal distribution is symmetric and has a skewness of
zero.
If the distribution of a data set instead has a skewness less
than zero, or negative skewness (left-skewness), then the left
tail of the distribution is longer than the right tail;
positive skewness (right-skewness) implies that the right tail of
the distribution is longer than the left.
12. Kurtosis
Kurtosis measures the thickness of the tails of a distribution
relative to the tails of the normal distribution. The normal distribution has a
kurtosis equal to 3.0.
Distributions with kurtosis greater than 3.0 exhibit tail data
exceeding the tails of the normal distribution (e.g., five or more
standard deviations from the mean). Such distributions are called
leptokurtic and are said to have "fat tails"; the excess over 3.0 is known
in statistics as excess kurtosis. The occurrence of fat tails in
financial markets describes what is known as tail risk.
Distributions with kurtosis less than 3.0 (platykurtic) exhibit
tails that are generally less extreme ("skinnier") than the tails of the
normal distribution.
13. If the estimated kurtosis values for all considered parameters are positive,
the distribution has heavier tails and a sharper peak than the normal distribution.
Conversely, in a distribution where a negative kurtosis value is observed, the curve has lighter
tails and a flatter, more rounded peak than the normal distribution; this is not
observed in this study. In Figure 2 below, the solid line indicates the normal distribution and
the dotted line shows a distribution with positive kurtosis.
Figure 2: Positive Kurtosis
14. The Formula for the Normal Distribution
The normal distribution follows the formula below. Note that
only the values of the mean (μ) and standard deviation (σ) are
necessary:
f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
Z-score and standard deviation
The Z-score, or standard score, is the number of standard deviations
a given data point lies above or below the mean. Standard deviation is
essentially a reflection of the amount of variability within a given data set.
15. Sample question
Let us use the standard normal distribution table to find an area.
Use the standard formula Z = (X – µ) / σ
Where
Z is the standard score (Z-score)
X is a normal random variable
µ is the mean
σ is the standard deviation
Example: for µ = 5 and σ = 2 in a normal distribution, find the area between x = 6
and x = 9.
First convert each boundary to a Z-score:
Z = (6 – 5)/2 = 0.5
Z = (9 – 5)/2 = 2. Looking at the standard normal distribution table, the value for 0.5 is 0.6915
and for 2 it is 0.9772. Subtracting one from the other, 0.9772 – 0.6915 = 0.2857; this gives us the
area between 6 and 9, which is 28.57%.
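The table-based calculation above can be checked in code. Below is a minimal sketch using SciPy's norm.cdf (assuming SciPy is available); note that the exact value is 0.2858, and the 0.2857 above comes from table rounding:

```python
# Sketch: verifying the worked example with SciPy (assumes scipy is installed).
from scipy.stats import norm

mu, sigma = 5, 2
z1 = (6 - mu) / sigma               # 0.5
z2 = (9 - mu) / sigma               # 2.0
area = norm.cdf(z2) - norm.cdf(z1)  # P(6 < X < 9)
print(round(area, 4))               # → 0.2858 (table rounding gives 0.2857)
```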
16. Example 2
Let µ = 5 and σ = 2; let's find the area between x = 1 and x = 3.
Z = (1 – 5)/2 = –2
Z = (3 – 5)/2 = –1
For a negative Z-value, subtract 0.5000 from the table value for |Z| to find the area
between the mean and Z: for –1, 0.8413 – 0.5000 = 0.3413; for –2, 0.9772 – 0.5000 = 0.4772.
So |Z| = 1 gives 0.3413 and |Z| = 2 gives 0.4772.
Subtracting one from the other: 0.4772 – 0.3413 = 0.1359, the area between 1 and 3.
18. Example
Assuming the mean weight of an orange is 200 g and the standard deviation is
determined as 50 g, what will be the percentage of oranges weighing 300 g and above,
assuming that the orange weights follow a normal distribution?
Z = (X – µ)/σ = (300 – 200)/50 = 2 (for X = 300). The table value for Z = 2 is 0.9772,
so the area between the mean and Z is 0.9772 – 0.5000 = 0.4772. The area to the
right is found by subtracting this from half of the symmetric normal distribution:
0.5 – 0.4772 = 0.0228, so 2.28 percent of oranges are heavier than
300 grams.
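The same right-tail calculation can be sketched with SciPy's survival function norm.sf (assuming SciPy is available):

```python
# Sketch: the right-tail probability via SciPy's survival function (assumes scipy).
from scipy.stats import norm

z = (300 - 200) / 50   # 2.0
p = norm.sf(z)         # upper-tail area, 1 - Φ(2)
print(round(p, 4))     # → 0.0228, i.e. about 2.28% of oranges exceed 300 g
```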
19. How to develop a hypothesis
Another question regarding the distribution of the weights of the oranges
mentioned above: below what value do 60 percent of the oranges fall?
As shown in the figure, we need to find the z-value in order to find the x-value.
For hypothesis generation:
H0: the mean weight of the oranges is 300 g
H1: the mean weight of the oranges is 200 g (with σ = 50 g)
If the test statistic falls in the acceptance zone, H0 is accepted; otherwise it is
rejected. Here H1 is accepted because the arithmetic mean falls in the rejection
region of H0.
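The first part of the question (the weight below which 60 percent of the oranges fall) can be answered with the inverse CDF; a sketch assuming SciPy is available:

```python
# Sketch: finding the 60th percentile of the orange weights (assumes scipy).
from scipy.stats import norm

z = norm.ppf(0.60)     # z-value with 60% of the area to its left, ≈ 0.2533
x = 200 + z * 50       # convert back to a weight via x = µ + z·σ
print(round(x, 1))     # → 212.7 g
```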
20. T distribution
The t distribution is a symmetrical distribution and its appearance
resembles a normal distribution.
If the sample size is less than 30, the t-distribution table is used.
If the sample size is large, the standard normal distribution
table is used instead of the t-distribution, because the larger the
sample size, the closer the t-distribution is to the standard normal
distribution.
21. T distribution formula
t = (x̄ – µ) / (s / √n)
Where
t = t-statistic
x̄ = sample mean
µ = population mean
s = sample standard deviation
n = sample size
22. Example
For example, let n = 15; let's find the value of t that bounds a 40% area
measured from t = 0.
In the distribution table, we subtract the value 0.4 (the 40% area) from 0.5,
which is half of the total area:
0.5 – 0.4 = 0.10,
so for the 40% area in question, we obtain the value of t given in the table
for an upper-tail area of 0.10.
At the intersection with the df = n – 1 = 15 – 1 = 14 row, this t-value is 1.345, and it
is shown as t0.10(14) = 1.345.
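The table lookup above can be reproduced with SciPy's t.ppf (an upper-tail area of 0.10 corresponds to the 0.90 quantile); a sketch assuming SciPy is available:

```python
# Sketch: reproducing t0.10(14) = 1.345 with SciPy (assumes scipy is installed).
from scipy.stats import t

crit = t.ppf(0.90, df=14)  # 0.90 quantile = upper-tail area of 0.10, df = 14
print(round(crit, 3))      # → 1.345
```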
23.
24. Example 2
Let’s say we want to look up the critical value for a one-tailed t-test for a mean with an alpha level of 0.05. The total number of students
involved in this study is 25. What critical value should we compare t to?
Answer:
Firstly, we see that there are 25 students involved in this study. To get the degrees of freedom (df), we have
to subtract 1 from the sample size. Therefore, df = n – 1 = 25 – 1 = 24.
Hint:
Next, we see that our t-test is one-tailed. So we will choose the one-tail row to map our alpha level.
Next, we look for the alpha value along the above highlighted row. Our alpha level for this example
is 0.05. Let us map the same on the table
Once that is done, let us map the degrees of freedom under the leftmost column of the table
under (df)
The intersection of these two presents us with the critical value we are looking for
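The same lookup can be done in code (a sketch assuming SciPy is available); for a one-tailed test with α = 0.05, the critical value is the 0.95 quantile:

```python
# Sketch: critical value for a one-tailed t-test, α = 0.05, df = 24 (assumes scipy).
from scipy.stats import t

crit = t.ppf(0.95, df=24)  # one-tailed α = 0.05 → 0.95 quantile
print(round(crit, 3))      # → 1.711
```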
26. Chi-Square (Χ²) Distributions | Definition &
Examples
A chi-square (Χ2) distribution is a continuous probability
distribution that is used in many hypothesis tests.
The shape of a chi-square distribution is determined by
the parameter k (degrees of freedom). The graph below
shows examples of chi-square distributions with different
values of k.
27. K = degrees of freedom (often abbreviated as df or d) tells you how many
numbers in your grid are actually independent. For a Chi-square grid, the
degrees of freedom can be said to be the number of cells you need to fill in
before, given the totals in the margins, you can fill in the rest of the grid
using a formula.
28. What is a chi-square distribution?
Chi-square (Χ2) distributions are a family of continuous probability
distributions. They’re widely used in hypothesis tests, including the chi-
square goodness of fit test and the chi-square test of independence.
The shape of a chi-square distribution is determined by the parameter k,
which represents the degrees of freedom.
Very few real-world observations follow a chi-square distribution. The
main purpose of chi-square distributions is hypothesis testing, not
describing real-world distributions.
In contrast, most other widely used distributions, like normal
distributions or Poisson distributions, can describe useful things such as
newborns’ birth weights or disease cases per year, respectively.
29. Relationship to the standard normal
distribution
Chi-square distributions are useful for hypothesis
testing because of their close relationship to
the standard normal distribution.
The standard normal distribution, which is a
normal distribution with a mean of zero and a
variance of one, is central to many
important statistical tests and theories.
30. Chi-square distributions
Imagine taking a random sample of a standard
normal distribution (Z).
If you squared all the values in the sample, you
would have the chi-square distribution with k = 1.
Χ²₁ = Z²
31. Now imagine taking samples from two standard
normal distributions (Z1 and Z2).
If each time you sampled a pair of values, you
squared them and added them together, you would
have the chi-square distribution with k = 2.
Χ²₂ = Z₁² + Z₂²
32. chi-square
More generally, if you sample from k independent
standard normal distributions and then square and
sum the values, you’ll produce a chi-square
distribution with k degrees of freedom.
Χ²ₖ = Z₁² + Z₂² + … + Zₖ²
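This construction is easy to verify by simulation; below is a sketch with NumPy (assumed available) that sums k squared standard normals and checks the sample mean and variance against the chi-square values k and 2k:

```python
# Sketch: building chi-square samples from squared standard normals (assumes numpy).
import numpy as np

rng = np.random.default_rng(0)
k = 3
z = rng.standard_normal((100_000, k))
chi2_samples = (z ** 2).sum(axis=1)  # each row: Z1² + Z2² + ... + Zk²

print(round(chi2_samples.mean(), 2))  # ≈ k = 3 (the chi-square mean)
print(round(chi2_samples.var(), 2))   # ≈ 2k = 6 (the chi-square variance)
```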
33. Chi-square tests
Chi-square tests are hypothesis tests with test statistics that follow a chi-square
distribution under the null hypothesis. Pearson’s chi-square test was the first chi-
square test to be discovered and is the most widely used.
Pearson’s chi-square test statistic is:
Χ² = Σ (O – E)² / E
Where
Χ² is the chi-square test statistic
Σ is the summation operator (it means “take the sum of”)
O is the observed frequency
E is the expected frequency
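A minimal sketch of Pearson's statistic using SciPy's chisquare function, with counts invented for illustration:

```python
# Sketch: Pearson's chi-square statistic on hypothetical counts (assumes scipy).
from scipy.stats import chisquare

observed = [18, 22, 20, 20]  # hypothetical observed frequencies
expected = [20, 20, 20, 20]  # expected frequencies under the null
stat, p = chisquare(observed, f_exp=expected)
print(round(stat, 1))  # Σ (O−E)²/E = (4 + 4 + 0 + 0)/20 = 0.4
```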
34. When k is one or two, the chi-square distribution is a curve shaped like a
backwards “J.” The curve starts out high and then drops off, meaning that
there is a high probability that Χ² is close to zero.
35. When k is greater than two
When k is greater than two, the chi-square distribution is
hump-shaped. The curve starts out low, increases, and then
decreases again. There is low probability that Χ² is very close
to or very far from zero.
The most probable value of Χ² is k − 2.
36. When k is only a bit greater than two, the distribution is
much longer on the right side of its peak than its left (i.e., it
is strongly right-skewed).
37. As k increases, the distribution looks more and more similar
to a normal distribution. In fact, when k is 90 or greater, a
normal distribution is a good approximation of the chi-
square distribution.
38. Properties of chi-square distributions
Chi-square distributions start at zero and continue to
infinity.
The chi-square distribution starts at zero because it
describes the sum of squared random variables,
and a squared number can’t be negative.
The mean (μ) of the chi-square distribution is its degrees
of freedom, k. Because the chi-square distribution is
right-skewed, the mean is greater than the median and
mode.
The variance of the chi-square distribution is 2k.
39. Unit root
Unit Root Test
A unit root test tests whether a time series is non-
stationary, i.e., whether it contains a unit root, in time series
analysis.
The presence of a unit root in time series defines
the null hypothesis, and the alternative hypothesis
defines time series as stationary (desired).
40. Unit root
Mathematically, the model underlying the unit root test can be represented as
yt = Dt + zt + ɛt
Where,
Dt is the deterministic component.
zt is the stochastic component.
ɛt is the stationary error process.
The unit root test’s basic concept is to determine whether the stochastic
component zt contains a unit root or not.
41. Unit root
It is an econometric approach that tests whether the mean and variance
change over time, taking into account the autoregressive structure of the
time series.
Autoregressive models, as the name suggests, are models in which a
variable is regressed on its own past values.
That is, the dependent variable and the explanatory variable are the
same variable, except that the dependent variable is observed at a later
time (t) than the independent variable (t–1).
We say chronologically ordered because we are now at time (t). If we
go forward one period, we go to (t+1), and if we go back one period,
we go to (t–1).
42. What is “Unit Root”?
A unit root (also called a unit root process or a difference-stationary
process) is a stochastic trend in a time series, sometimes called a “random
walk with drift.” If a time series has a unit root, it shows a systematic
pattern that is unpredictable.
(Figure: the red line shows the drop in output and the path of recovery if the
time series has a unit root; the blue line shows the recovery if there is no unit
root and the series is trend-stationary.)
43. Unit root
In probability theory and statistics, a unit root is a feature of
some stochastic processes (variable or process that has
uncertainty) (such as random walks) that can cause problems
in statistical inference involving time series models.
A linear stochastic process has a unit root if 1 is a root of the
process's characteristic equation. Such a process is non-
stationary but does not always have a trend.
44. Unit root
If the other roots of the characteristic equation lie inside the
unit circle—that is, have a modulus (absolute value) less
than one—then the first difference of the process will be
stationary; otherwise, the process will need to be
differenced multiple times to become stationary.
If there are d unit roots, the process will have to be
differenced d times in order to make it stationary. Due to
this characteristic, unit root processes are also
called difference stationary.
45. Unit root
In statistics, a unit root test tests whether a time
series variable is non-stationary using an
autoregressive model.
A well-known test that is valid in large samples is
the augmented Dickey–Fuller test.
46. Unit root
A test for determining whether the mean, variance
and covariance of a time series are independent of
time.
47. unit root
A unit root is a property that indicates how non-stationary a time
series model is; a series with a unit root is not stationary.
For a series that may be a unit root process, we determine the
stochasticity of the model using statistical hypothesis
testing.
stochasticity :process involving a randomly determined
sequence of observations
‘These are statistical hypothesis tests of stationarity that are
designed for determining whether differencing is required.’
48. Why is this important?
In a model that has a unit root, spikes and shocks to the
model will persist,
meaning that a stock price might make a big jump or a big
fall that has nothing to do with seasonality.
If there is no unit root in the model (the series is stationary), the
effect of such a shock will disappear with time.
An important thing to take into consideration when building
a broader business model.
49. Types of Unit Root Tests
• The Dickey Fuller Test/ Augmented Dickey Fuller Test
• The Elliott–Rothenberg–Stock Test, which has two subtypes:
1. The P-test (panel unit root test) that is based on a notion of median
unbiased estimation that uses the invariance property and the median
function of panel pooled OLS estimator, and takes the error term’s serial
correlation into account.
2. The DF-GLS test can be applied to detrended data without intercept.
• The Schmidt–Phillips Test: Subtypes are the rho-test and the tau-test.
• The Phillips–Perron (PP) Test is a modification of the Dickey Fuller test,
and corrects for autocorrelation and heteroscedasticity in the errors.
• The Zivot-Andrews test allows a break at an unknown point in the
intercept or linear trend (4).
50. Dickey Fuller Test
The Dickey-Fuller test is a statistical hypothesis
test for the presence of a unit root (stochastic trend) in a
time series model. The Dickey-Fuller test is based
on linear regression.
The Dickey-Fuller test produces a t-statistic that is
compared to predetermined critical values.
Being below (more negative than) the critical value
allows us to reject the null hypothesis and accept
the alternative.
51. Unit root
A unit root process is a data-generating process whose first
difference is stationary. In other words, a unit root
process yt has the form
yt = yt–1 + stationary process.
A unit root test attempts to determine whether a given time
series is consistent with a unit root process.
The next section gives more details of unit root processes,
and suggests why it is important to detect them.
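The definition above can be illustrated with NumPy (assumed available): a random walk is the cumulative sum of a stationary noise series, and taking the first difference recovers that stationary series exactly:

```python
# Sketch: a unit root process and its stationary first difference (assumes numpy).
import numpy as np

rng = np.random.default_rng(1)
eps = rng.standard_normal(500)  # a stationary (white-noise) process
y = np.cumsum(eps)              # y_t = y_{t-1} + eps_t : a unit root process
dy = np.diff(y)                 # first difference, Δy_t = y_t - y_{t-1}

# The first difference recovers the stationary noise exactly.
print(np.allclose(dy, eps[1:]))  # → True
```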
52. What does a unit root test do?
In statistics, a unit root test tests whether a time
series variable is non-stationary and
possesses a unit root.
The null hypothesis is generally defined as the
presence of a unit root and
the alternative hypothesis is either stationarity,
trend stationarity or explosive root depending on
the test used.
53. Modeling Unit Root Processes
There are two basic models for economic data with linear growth
characteristics:
• Trend-stationary process (TSP): yt = c + δt + stationary process,
I(0)
• Unit root process, also called a difference-stationary process
(DSP): Δyt = δ + stationary process, I(1)
Here Δ is the differencing operator, Δyt = yt – yt–1 = (1 – L)yt, where L is the lag operator
defined by L^i yt = yt–i.
54. Unit root
The processes are indistinguishable for finite data.
In other words, there are both a TSP and a DSP that fit a
finite data set arbitrarily well.
However, the processes are distinguishable when restricted
to a particular subclass of data-generating processes, such
as AR(p) (autoregressive) processes.
After fitting a model to the data, a unit root test checks
whether the AR(1) coefficient equals 1.
55. Unit root
There are two main reasons to distinguish between these
types of processes:
• Forecasting
• Spurious Regression
57. ADF
Serial correlation can be an issue, in which case
the Augmented Dickey-Fuller (ADF) test can be
used.
The ADF test handles larger and more complex
models.
58. Below are the results from an Augmented Dickey Fuller Test
from two different data sets.
One being stochastic in nature, the other not.
The first test, color-coded in purple, has a high p-value and a test
statistic above (less negative than) the highest critical value.
This means the series has a unit root process and therefore is stochastic
in nature: we fail to reject the null hypothesis.
59. The second test, color-coded in green, has a low p-value and a test
statistic well below the lowest critical value.
This means the series does not have a unit root process and therefore is
non-stochastic in nature.
We reject the null hypothesis and accept the alternative (desired
for significance).
60. Forecasting
A trend-stationary process (TSP) and a difference-stationary process (DSP)
produce different forecasts. Basically, shocks to a TSP return to the trend line
c + δt as time increases. In contrast, shocks to a DSP can be persistent
over time.
For example, consider the simple trend-stationary model
y1,t = 0.9·y1,t–1 + 0.02t + ε1,t
and the difference-stationary model (obtained by subtracting yt–1 from yt: the
difference Δyt = yt – yt–1 equals εt, or yt – yt–1 = α + εt, so the differenced
process is stationary)
y2,t = 0.2 + y2,t–1 + ε2,t.
62. TSP trend stationarity process
Examining the fitted parameters (e.g., by passing the estimated
model to a summary function), you find the estimation did an
excellent job.
The TSP has confidence intervals that do not grow with
time, whereas the DSP has confidence intervals that grow.
Furthermore, the TSP returns to the trend line quickly, while
the DSP does not tend towards the trend line
y = 0.2t asymptotically.
64. Spurious Regression-
In statistics, a spurious relationship or spurious
correlation is a mathematical relationship in
which two or more events or variables
are associated but not causally related,
due to either coincidence or the presence of a
certain third, unseen factor (referred to as a
"common response variable").
65. Spurious Regression
The presence of unit roots can lead to false inferences in regressions between time
series.
Suppose xt and yt are unit root processes with independent increments, such as
random walks with drift
xt = c1 + xt–1 + ε1(t)
yt = c2 + yt–1 + ε2(t),
where εi(t) are independent innovation processes. Regressing y on x results, in
general, in a nonzero regression coefficient and a significant coefficient of
determination R². This result holds despite xt and yt being independent random
walks.
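This effect is easy to reproduce by simulation; below is a sketch with NumPy (assumed available) that generates two independent random walks with drift and measures the R² between them:

```python
# Sketch: spurious regression between two independent random walks (assumes numpy).
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = np.cumsum(0.5 + rng.standard_normal(n))  # x_t = 0.5 + x_{t-1} + ε1(t)
y = np.cumsum(0.3 + rng.standard_normal(n))  # y_t = 0.3 + y_{t-1} + ε2(t)

# R² of regressing y on x: large despite x and y being independent.
r2 = np.corrcoef(x, y)[0, 1] ** 2
print(r2 > 0.8)  # → True: the linear trends alone induce strong "correlation"
```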
66. Spurious Regression
If both processes have trends (ci ≠ 0), there is a correlation
between x and y because of their linear trends.
However, even if the ci = 0, the presence of unit roots in
the xt and yt processes yields correlation.
67. Testing unit root
There are four Econometrics Toolbox™ tests for unit roots. These functions test for the existence of
a single unit root. When there are two or more unit roots, the results of these tests might not be valid.
68. Testing for Unit Roots
Transform Data
Choose Models to Test
Determine Appropriate Lags
Conduct Unit Root Tests at Multiple Lags
69. Transform Data
Transform your time series to be approximately linear
before testing for a unit root.
If a series has exponential growth, take its logarithm.
For example, GDP and consumer prices typically have
exponential growth, so test their logarithms for unit roots.
70. Choose Models to Test
•For the adftest or pptest, choose the model as follows:
•If your data show a linear trend, set model to 'TS'.
•If your data show no trend, but seem to have a
nonzero mean, set model to 'ARD'.
•If your data show no trend and seem to have a zero
mean, set model to 'AR' (the default).
71. Determine Appropriate Lags
Setting appropriate lags depends on the test you use:
adftest — One method is to begin with a maximum lag and
then test down by assessing the significance of the
coefficient of the term at the maximum lag p.
73. Unit root test : Augmented Dickey Fuller test (ADF)
Explanation of the Dickey-Fuller test.
A simple AR model can be represented as:
yt = ρ·yt–1 + ut
where
yt is the variable of interest at time t,
ρ is a coefficient that defines the unit root, and
ut is noise, or can be considered an error term.
If ρ = 1, a unit root is present in the time series, and the time series is
non-stationary.
74. ADF
The regression model can equivalently be represented as
Δyt = ẟ·yt–1 + ut
Where
Δ is a difference operator and
ẟ = ρ – 1.
So here, if ρ = 1 (ẟ = 0), the differenced series is just the error term; if
the coefficient is smaller or bigger than one, the changes depend on the
past observation.
75. There can be three versions of the test:
a test for a unit root,
a test for a unit root with a constant,
and a test for a unit root with a constant and a deterministic
time trend.
76. So if a time series is non-stationary, its behavior will include a
stochastic trend or a deterministic trend in the time values.
If the series is stationary, it will tend to revert to its mean.
In a stationary time series, a large value tends to be
followed by a small value, and a small value tends to be
followed by a large value.
In a non-stationary time series, large and small values
occur with probabilities that do not depend on
the current value of the time series.
77. The augmented Dickey-Fuller test is an extension of the
Dickey-Fuller test that accounts for autocorrelation in the
series and then tests following a procedure similar to that of
the Dickey-Fuller test.
The augmented Dickey-Fuller test produces a statistic that is
a negative number, and rejection of the
hypothesis depends on that number:
the more negative the statistic, the stronger the
evidence against the presence of a unit root at some
confidence level in the time series.
78. We apply the ADF test to a model that can be represented
mathematically as
Δyt = ɑ + βt + γ·yt–1 + δ1·Δyt–1 + … + δp–1·Δyt–p+1 + εt
Where
ɑ is a constant,
β is the coefficient on the time trend,
γ is the coefficient on the lagged level yt–1, and
p is the lag order of the autoregressive process.
Here in the mathematical representation of the ADF test, we have
added the lagged differencing terms; these are what distinguish
the ADF test from the Dickey-Fuller test.
79. The unit root test is then carried out under the null
hypothesis γ = 0 against the alternative hypothesis
γ < 0, and a value for the test statistic is computed.
80. ADF
A key point to remember here is: since the null hypothesis
assumes the presence of a unit root,
the statistic is compared to the relevant critical value for the
Dickey-Fuller test. The test statistic has a specific distribution,
with critical values given by the Dickey–Fuller table.
The p-value obtained by the test should be less than the
significance level (say 0.05) to reject the null hypothesis,
thereby inferring that the series is stationary.
81. Implementation of ADF Test
To perform the ADF test in Python, the statsmodels package provides the
implementation function adfuller().
The function adfuller() provides the following information:
p-value
Value of the test statistic
Number of lags used
The critical values
Next, we perform the unit root test and interpret the results.
82. Autocorrelation
‘Just as correlation measures the extent of a linear relationship between two
variables, autocorrelation measures the linear relationship between lagged values of
a time series.’ Autocorrelation means that the linear model is self-referential: it is
constantly taking into account a past version of itself.
A time lag is when the model measures its current performance in comparison to past
performance after a given interval of time.
This is yet another way of measuring the stationarity of a model.
It is a simpler approach than trying to color-code your graphs based on the outcome
of your ADF test results.
83. Below is the correlogram for Google stocks over a period
of 5 years. The lines being extremely close to one and in
a slight descending visual pattern indicates high
amounts of autocorrelation.
84. Looking at the second correlogram below, we can see that
autocorrelation is very low and random meaning our model is
stable.
Usually correlograms start with the first lag being fully correlated
with itself at 1.
The closer to 0 each lag lies, the less autocorrelation present in
the model.
85. Differencing
Differencing is a technique that can be applied to a data set in order to remove any
sort of stochasticity.
It is a method to make a non-stationary time series stationary — compute the
differences between consecutive observations.
This is a technique applied after a unit root test and autocorrelation test have been
run.
The 2 lines of code for differencing
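The two lines of code are not shown on the slide; a common pandas version (an assumption, not necessarily the author's original code) would be:

```python
# Sketch: first differencing with pandas (assumed; the slide's code is not shown).
import pandas as pd

series = pd.Series([100, 102, 101, 105, 110])  # toy price series
diffed = series.diff().dropna()                # difference, then drop the leading NaN
print(diffed.tolist())  # → [2.0, -1.0, 4.0, 5.0]
```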
86. Next week we will perform the
unit root test and OLS in EViews
Thank you