4. Presenting Topics
Measures of dispersion.
Importance and necessity of measures of dispersion
Classification of measures of dispersion.
Merits & Demerits of Standard Deviation
Merits & Demerits of Mean Deviation
Merits & Demerits of Quartile Deviation
Advantages & Disadvantages of Range
5. Presenting Topics
Difference between absolute and relative measures of dispersion
Qualities of a good measure of dispersion
Best measure of dispersion
Difference between
a) SD and CV
b) MD and SD
c) Variance and CV
Standard deviation is the best measure of dispersion
Coefficient of correlation (Karl Pearson’s), scatter diagram, rank correlation (Spearman’s)
6. Presenting Topics
Perfect correlation and curvilinear correlation, with examples.
Characteristics and limitations of correlation.
Endogenous and exogenous variables.
Least squares method.
Difference between correlation analysis and regression analysis.
Degree of correlation.
7. Presenting Topics
Regression & its importance
Simple and multiple regression & regression coefficient
Probability Distributions
Continuous and discrete probability distributions
Continuous distribution
Example of the distribution of weights
Distribution plot of the weight of adult males
Discrete distribution
Example of the number of customer complaints
8. Presenting Topics
The different types of discrete and continuous distributions
Conditions of the binomial distribution
Properties of the binomial distribution
Relation of the binomial, Poisson and normal distributions
9. Dispersion
Measures of dispersion are called averages of the second order.
A measure of dispersion describes the degree of scatter shown by the observations and is usually measured as an average deviation about some central value.
Importance and necessity of measures of dispersion
To judge the reliability of the measures of central tendency
To compare the variability of two or more sets of data
To suggest various methods for controlling the variation in a set of observations
To serve as a basis for further statistical analysis
10. Dispersion
There are two types of measures of dispersion:
1. The absolute measures of dispersion
2. The relative measures of dispersion
The absolute measures of dispersion
When dispersion is measured in the original units, it is known as an absolute measure of dispersion. There are four types of absolute measures of dispersion:
1. Range
2. Mean deviation
3. Standard deviation
4. Quartile deviation
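As a rough illustration, the four absolute measures can be computed in Python for a small made-up data set. This is a minimal sketch under stated assumptions: the mean deviation is taken about the mean (it can also be taken about the median), the standard deviation uses the population divisor n, and the quartiles come from `statistics.quantiles` with its default method, so textbook quartile formulas may give slightly different values. The function name `absolute_dispersion` and the data are hypothetical.

```python
from statistics import mean, pstdev, quantiles


def absolute_dispersion(values):
    """Return the four absolute measures of dispersion, in the data's own units."""
    xs = sorted(values)
    x_bar = mean(xs)
    rng = xs[-1] - xs[0]                               # range = largest - smallest
    md = sum(abs(x - x_bar) for x in xs) / len(xs)     # mean deviation about the mean
    sd = pstdev(xs)                                    # population SD (divisor n)
    q1, _, q3 = quantiles(xs, n=4)                     # Q1, median, Q3 (default method)
    qd = (q3 - q1) / 2                                 # semi-interquartile range
    return {"range": rng, "mean_deviation": md,
            "standard_deviation": sd, "quartile_deviation": qd}


# Hypothetical sample data
result = absolute_dispersion([2, 4, 4, 4, 5, 5, 7, 9])
# {'range': 7, 'mean_deviation': 1.5, 'standard_deviation': 2.0, 'quartile_deviation': 1.25}
```

All four results are in the same units as the data, which is what makes them absolute measures.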
11. Dispersion
The relative measure of dispersion
A relative measure of dispersion is independent of the original units.
Relative measures of dispersion are generally expressed in terms of ratios, percentages, etc.
The relative measures of dispersion are as follows:
1. Coefficient of range
2. Coefficient of mean deviation
3. Coefficient of standard deviation
4. Coefficient of quartile deviation
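A minimal sketch of the four coefficients, assuming the usual textbook forms: (L - S)/(L + S) for the range, MD divided by the mean, SD divided by the mean (multiplied by 100 this is the coefficient of variation, CV), and (Q3 - Q1)/(Q3 + Q1) for the quartile deviation. The function name and the kilogram/gram data are hypothetical; converting the same data from kilograms to grams leaves every coefficient unchanged, which is what "independent of the original units" means.

```python
from statistics import mean, pstdev, quantiles


def relative_dispersion(values):
    """Unit-free coefficients matching the four absolute measures."""
    xs = sorted(values)
    x_bar = mean(xs)
    large, small = xs[-1], xs[0]
    md = sum(abs(x - x_bar) for x in xs) / len(xs)   # MD taken about the mean
    q1, _, q3 = quantiles(xs, n=4)
    return {
        "coef_range": (large - small) / (large + small),
        "coef_mean_deviation": md / x_bar,
        "coef_sd": pstdev(xs) / x_bar,               # CV (%) = 100 * coef_sd
        "coef_quartile_deviation": (q3 - q1) / (q3 + q1),
    }


kg = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical data in kilograms
grams = [1000 * x for x in kg]          # the same data in grams
in_kg, in_g = relative_dispersion(kg), relative_dispersion(grams)
cv_percent = 100 * in_kg["coef_sd"]     # about 40 %
```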
12. Standard Deviation
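The standard deviation is the positive square root of the mean of the squared deviations of a set of values from their mean. It is generally denoted by σ (sigma) and expressed as
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
where $\bar{x}$ is the mean of the $n$ values $x_1, \dots, x_n$.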
Merits of Standard Deviation
It is rigidly defined and widely used.
It is less affected by sampling fluctuations.
It is useful for calculating skewness, kurtosis, the coefficient of correlation and so forth.
It measures the consistency of data.
It is less erratic.
13. Dispersion
Demerits of Standard Deviation
It is not so easy to compute
It is affected by extreme values.
Merits of Mean Deviation
It is easy to compute and understand.
It is based on all the values in the set.
It is a useful measure of dispersion.
It is not greatly affected by extreme values.
14. Dispersion
Demerits of Mean Deviation
This measure is not satisfactory unless the data are symmetrical.
It is not suitable for further mathematical treatment.
Uses of Mean Deviation
It is used in certain economic and anthropological studies.
It is often sufficient when an informal measure of dispersion is required.
15. Dispersion
Merits of Quartile Deviation
It is easy to compute and simple to understand.
It is not affected by extreme values.
This measure of dispersion is superior to the range.
It is also useful for measuring variation in open-ended distributions.
16. Dispersion
Demerits of Quartile Deviation
It is not based on all the observations in the series.
It is highly affected by sampling fluctuations.
It is not suitable for further algebraic or mathematical treatment.
Advantages of Range
It is the simplest measure of dispersion.
The range is easy to compute and easy to understand.
17. Dispersion
It is based on the extreme observations only, so no detailed information is required.
The chief merit of the range is that it gives a quick idea of the variability of a set of data.
It does not depend on the measures of central tendency.
Disadvantages of Range
The range is not a precise measure.
It is influenced entirely by the extreme values.
It cannot be computed for open-ended distributions.
It is not suitable for further mathematical treatment.
20. Dispersion
Qualities of a good measure of dispersion
It should be rigidly defined.
It should be easy to understand and easy to calculate.
It should be based on all the values of the given data.
It should be suitable for further mathematical treatment.
It should be less affected by sampling fluctuations.
It should not be unduly influenced by extreme values.
21. Dispersion
Range
The range is easy to compute and rigidly defined, but it is not based on all the observations.
It is influenced by extreme values and is not useful for further mathematical treatment.
It cannot be used for open-ended class intervals and is affected by sampling fluctuations.
This measure is employed in quality control, weather forecasting, etc.
22. Mean deviation
The mean deviation is rigidly defined and easy to understand.
It is based on all observations.
It is not well suited to further algebraic manipulation, because the signs of the deviations are ignored.
It is less affected by sampling fluctuations and extreme values.
It cannot be determined for open-ended class intervals.
It is freely used in the statistical analysis of economics, business, etc.
It is less suitable than the standard deviation.
23. Dispersion
Quartile deviation
Quartile deviation is rigidly defined, easy to understand and easy to calculate.
It is not based on all the values.
It is not suitable for further mathematical treatment and is affected by sampling fluctuations.
It is used to measure variation in open-ended distributions.
It is not influenced by extreme values.
It is suitable for studying social phenomena.
24. Standard deviation
Standard deviation is rigidly defined.
It is based on all the values.
It is less affected by sampling fluctuations.
It is suitable for further algebraic manipulation.
It is not easy to understand and not easy to calculate.
It is not suitable for open-ended distributions.
It is affected by extreme values.
Standard deviation is extensively used in the theory of sampling, regression, correlation, analysis of variance, etc.
25. Standard deviation
From the above comparison of the measures of dispersion, the standard deviation is the most popular and widely used measure of dispersion in a series.
It has great practical utility in sampling and statistical inference.
It is the most important measure of variation and one of the pillars of statistics.
30. Standard deviation is the best measure of dispersion
Standard deviation is the most popular and widely used measure of dispersion in a series.
It has great practical utility in sampling and statistical inference.
It is the most important measure of variation and one of the pillars of statistics.
Standard deviation is extensively used in the theory of sampling, regression, correlation, analysis of variance, etc.
31. Coefficient of correlation
Karl Pearson’s
Karl Pearson’s coefficient of correlation is a widely used mathematical method in which a numerical expression is used to calculate the degree and direction of the relationship between linearly related variables.
Pearson’s method, popularly known as the Pearsonian coefficient of correlation, is the most extensively used quantitative method in practice.
The coefficient of correlation is denoted by “r”.
32. Coefficient of correlation
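If the relationship between two variables X and Y is to be ascertained, the following formula is used:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$
where $\bar{X}$ = mean of the X variable and $\bar{Y}$ = mean of the Y variable.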
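The deviation form of Pearson’s formula translates directly into code. This is a minimal sketch; the helper name `pearson_r` and the five data pairs are hypothetical, chosen so the sums are easy to check by hand (Σxy deviations = 6, Σx deviations² = 10, Σy deviations² = 6).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


X = [1, 2, 3, 4, 5]        # hypothetical data
Y = [2, 4, 5, 4, 5]
r = pearson_r(X, Y)        # 6 / sqrt(10 * 6), about 0.775
```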
Scatter diagram
A graphic representation of bivariate data as a set of points in the plane whose Cartesian coordinates equal the corresponding values of the two variates.
33. Coefficient of correlation
Rank correlation (Spearman’s)
This method of finding out co-variability, or the lack of it, between two variables was developed by the British psychologist Charles Edward Spearman in 1904.
This measure is especially useful when a quantitative measure of certain factors (such as in the evaluation of leadership ability or the judgment of female beauty) cannot be fixed, but the individuals in the group can be arranged in order, thereby obtaining for each individual a number indicating his or her rank in the group.
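Spearman’s coefficient is usually computed from rank differences d with the standard formula $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, which is exact only when there are no tied ranks. The sketch below assumes no ties (it raises an error otherwise); the helper names and the two judges’ scores are hypothetical.

```python
def ranks(values):
    """Rank 1 = largest value. This simple sketch assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks; not handled here")
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


judge_a = [86, 72, 90, 65, 78]        # hypothetical scores for five contestants
judge_b = [80, 75, 95, 60, 70]
rho = spearman_rho(judge_a, judge_b)  # two ranks swapped, sum(d^2) = 2, rho ≈ 0.9
```

Only the ranks enter the formula, so the judges could just as well have supplied rank orders directly.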
35. Coefficient of correlation
Significance of the coefficient of correlation
It helps us interpret the value of r.
The closer r is to +1 or -1, the closer the relationship between the variables; the closer r is to 0, the weaker the relationship.
Beyond this, it is not safe to go.
The full interpretation of r depends on circumstances, one of which is the size of the sample.
All that can really be said is that, when estimating the value of one variable from the value of another, the higher the absolute value of r, the better the estimate.
So, the higher the absolute value of r, the more significant the relationship it indicates.
36. Correlation
Perfect correlation
When a change in the values of one variable is associated with a corresponding and proportional change in the other, the correlation is called perfect correlation.
Curvilinear correlation
When the amount of change in one variable does not bear a constant ratio to the corresponding change in the other variable, the correlation is said to be curvilinear (non-linear). If the ratio is constant, the correlation is linear.
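A hypothetical example makes the two definitions concrete. In this sketch (the helper `pearson_r` and the data are illustrative only), y = 2x + 3 changes in a constant ratio with x and gives r = 1, while y = x² on values symmetric about zero depends exactly on x but gives r = 0, because Pearson’s r only measures the linear part of a relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 3 for x in xs]   # constant ratio of change: perfect linear correlation
curved = [x ** 2 for x in xs]      # ratio of change not constant: curvilinear

r_linear = pearson_r(xs, linear)   # 1.0
r_curved = pearson_r(xs, curved)   # 0.0, although y is fully determined by x
```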
39. Correlation
Characteristics of correlation
The major characteristics of Pearson’s coefficient of correlation, r, are as follows:
(a). The value of r lies between -1 and +1, i.e. $-1 \le r \le +1$.
When r = +1, there is perfect positive correlation.
When r = -1, there is perfect negative correlation.
(b). When r = 0, there is no linear relationship between the two variables, but there may be a curvilinear (non-linear) relationship.
When r = 0, the two variables may also be independent.
40. Correlation
(c). It gives the degree of concomitant movements or
variation between two variables.
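(d). r is independent of change of origin and scale.
(e). The coefficient of correlation is the geometric mean of the two regression coefficients. Symbolically,
$$r = \pm\sqrt{b_{yx} \, b_{xy}}$$
where $b_{yx}$ and $b_{xy}$ are the regression coefficients of y on x and of x on y, and r takes their common sign.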
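Property (d) can be checked numerically. In this sketch (hypothetical data and helper name), shifting X by 3 and dividing by 2, and multiplying Y by 10 and adding 7, leaves r unchanged. One caveat worth knowing: if a scale factor is negative, the magnitude of r is kept but its sign flips, as the last line shows.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


X = [1, 2, 3, 4, 5]                   # hypothetical data
Y = [2, 4, 5, 4, 5]
U = [(x - 3) / 2 for x in X]          # change of origin (3) and scale (2)
V = [10 * y + 7 for y in Y]           # change of scale (10) and origin (7)

r_original = pearson_r(X, Y)
r_transformed = pearson_r(U, V)       # same value as r_original
r_flipped = pearson_r(X, [-y for y in Y])   # negative scale: -r_original
```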
41. Correlation
Limitations of correlation
To determine the coefficient of correlation (r), we have to assume that there is a linear relationship between the variables; r does not measure non-linear relationships.
It is valid when we have a random sample from a bivariate normal distribution.
If the sample size is small, r does not give a reliable measure of the relationship.
42. Variable
Endogenous variable
Endogenous variables are used in econometrics and
sometimes in linear regression.
They are similar to (but not exactly the same as) dependent variables.
According to Daniel Little, University of Michigan-Dearborn, an endogenous variable is defined in the following way:
A variable is said to be endogenous within the causal model M if its value is determined or influenced by one or more of the independent variables X (excluding itself).
Exogenous Variables
An exogenous variable is a variable that is not affected
by other variables in the system.
44. Variable
Least squares method
To find the coefficient of correlation by the least squares method, we have to calculate the values of the two regression coefficients:
1. The regression coefficient of x on y ($b_{xy}$).
2. The regression coefficient of y on x ($b_{yx}$).
Then the correlation coefficient is the square root of the product of the regression coefficients; symbolically,
$$r = \pm\sqrt{b_{xy} \times b_{yx}}$$
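A short sketch of this route, assuming the usual least-squares slopes $b_{yx} = S_{xy}/S_{xx}$ and $b_{xy} = S_{xy}/S_{yy}$ built from sums of deviation products. The helper name and data are hypothetical; with these values $b_{yx} = 0.6$ and $b_{xy} = 1.0$, and the square root of their product matches Pearson’s r for the same data. Both coefficients share the sign of $S_{xy}$, so `copysign` gives r that sign.

```python
from math import copysign, sqrt


def regression_coefficients(xs, ys):
    """Least-squares slopes b_yx (y on x) and b_xy (x on y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy


b_yx, b_xy = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
r = copysign(sqrt(b_yx * b_xy), b_yx)   # geometric mean, with the common sign
```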
45. Correlation
Degree of correlation
Perfect correlation: If the value of r is +1 or -1, the correlation is said to be perfect. With r = +1, as one variable increases (or decreases), the other also increases (or decreases); with r = -1, the other moves in the opposite direction.
High degree of correlation: If |r| lies between 0.50 and 1 (excluding 1), it is said to be a high degree of (strong) correlation.
Moderate degree: If |r| lies between 0.30 and 0.49, it is said to be a medium/moderate degree of correlation.
Low degree: When |r| is 0.29 or below, it is said to be a low degree of correlation.
No correlation: When r = 0, there is no correlation.
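The thresholds above can be encoded as a small lookup. This is a hypothetical helper, assuming the cut-offs apply to the absolute value of r and that anything below 0.30 (but not exactly 0) counts as low; other textbooks use slightly different boundaries.

```python
def degree_of_correlation(r):
    """Classify |r| with the cut-offs listed above (hypothetical helper)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    a = abs(r)
    if a == 1:
        return "perfect"
    if a >= 0.50:
        return "high"
    if a >= 0.30:
        return "moderate"
    if a > 0:
        return "low"
    return "no correlation"


labels = [degree_of_correlation(r) for r in (1.0, -0.775, 0.35, 0.12, 0.0)]
# ['perfect', 'high', 'moderate', 'low', 'no correlation']
```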
48. Regression
Regression is a statistical measure used in finance, investing and other disciplines that attempts to determine the strength of the relationship between one dependent variable and one or more independent variables.
Importance of Regression
The regression equation provides a concise and meaningful summary of the relationship between the dependent variable y and the independent variable x.
The relation can be used for predictive purposes.
If the form of the relationship between y and x is known, the parameters of interest can be estimated from relevant data.
In situations where the variable of interest (y) depends on a number of factors, it is possible to assess and disentangle the contribution of each factor individually with the help of regression analysis.
49. Regression
Simple Regression
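When the dependent variable is estimated from only one independent variable, it is called simple regression.
Multiple Regression
When there is more than one independent variable, it is called multiple regression.
Equation for multiple regression:
$$y = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
where a is the intercept and $b_1, \dots, b_k$ are the partial regression coefficients of the k independent variables.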
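One way to estimate a and the b’s is ordinary least squares, which reduces to solving the normal equations $(X^\top X)\beta = X^\top y$. The sketch below does that in plain Python with Gauss-Jordan elimination; it is an illustration, not a production solver. The function name is hypothetical, it assumes the independent variables are not perfectly collinear (otherwise the system is singular), and the data are made up so that y = 1 + 2x₁ + 3x₂ holds exactly.

```python
def fit_multiple_regression(rows, ys):
    """Least-squares [a, b1, ..., bk] for y = a + b1*x1 + ... + bk*xk."""
    X = [[1.0, *row] for row in rows]            # leading 1 for the intercept a
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(p):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [akj - f * acj for akj, acj in zip(A[k], A[col])]
                b[k] -= f * b[col]
    return [b[i] / A[i][i] for i in range(p)]   # A is now diagonal


rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]   # hypothetical (x1, x2) pairs
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]     # exact relationship for checking
a, b1, b2 = fit_multiple_regression(rows, ys)     # ≈ 1, 2, 3
prediction = a + b1 * 6 + b2 * 2                  # ≈ 19 for x1 = 6, x2 = 2
```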
50. Regression
Regression coefficient
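For the regression line $y = a + bx$, the regression coefficient is the constant b that represents the rate of change of one variable (y) as a function of changes in the other (x); it is the slope of the regression line.
Alternatively,
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$$
Here, $b_{yx}$ = regression coefficient of y on x and $b_{xy}$ = regression coefficient of x on y.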
51. Probability Distributions
A listing of all the values the random variable can assume, with their corresponding probabilities, makes a probability distribution.
A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence.
A note about random variables.
A random variable does not mean that the values can be anything (a random number).
Random variables have a well-defined set of outcomes and well-defined probabilities for the occurrence of each outcome.
The word “random” refers to the fact that the outcomes happen by chance; that is, you don’t know which outcome will occur next.
52. Probability Distributions
Here’s an example probability distribution that results from rolling a single fair die:
x (face):   1    2    3    4    5    6
P(X = x):  1/6  1/6  1/6  1/6  1/6  1/6
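The fair-die distribution can be written as a table in code, using exact fractions so the arithmetic is easy to verify. In this sketch (variable names are illustrative), the probabilities sum to 1, the expected value works out to 7/2, and the probability of an even face is the sum of three table entries.

```python
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}   # fair six-sided die

total = sum(die.values())                               # Fraction(1, 1)
expected = sum(x * p for x, p in die.items())           # Fraction(7, 2)
p_even = sum(p for x, p in die.items() if x % 2 == 0)   # Fraction(1, 2)
```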
Probability distributions are either continuous probability
distributions or discrete probability distributions, depending on
whether they define probabilities for continuous or discrete
variables.
Continuous and discrete probability distributions
54. Probability Distributions
Continuous distribution
A continuous distribution describes the probabilities of
the possible values of a continuous random variable.
A continuous random variable is a random variable with a
set of possible values (known as the range) that is infinite
and uncountable.
Probabilities of a continuous random variable (X) are defined as the area under the curve of its probability density function (PDF).
Thus, only ranges of values can have a nonzero
probability.
The probability that a continuous random variable equals
some value is always zero.
55. Probability Distributions
Distribution of weights
The continuous normal distribution can describe the distribution of the weights of adult males.
For example, you can calculate the probability that a man
weighs between 160 and 170 pounds.
56. Probability Distributions
Distribution plot of the weight of adult males
The shaded region under the curve in this example represents the range from 160 to 170 pounds.
The area of this range is 0.136; therefore, the probability that a
randomly selected man weighs between 160 and 170 pounds is
13.6%.
The entire area under the curve equals 1.0.
However, the probability that X is equal to some value is zero because
the area under the curve at a single point, which has no width, is zero.
For example, the probability that a man weighs exactly 190 pounds
to infinite precision is zero.
You could calculate a nonzero probability that a man weighs more
than 190 pounds, or less than 190 pounds, or between 189.9 and
190.1 pounds, but the probability that he weighs exactly 190 pounds
is zero.
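The same areas can be computed with Python’s `statistics.NormalDist`. The mean and standard deviation of the plotted weight distribution are not stated here, so this sketch assumes hypothetical values of 180 lb and 10 lb, chosen only because they reproduce an area of about 0.136 between 160 and 170 pounds. The helper `prob_between` is also hypothetical: an interval of width 0.2 lb around 190 has a small positive probability, while a single point has probability zero.

```python
from statistics import NormalDist

# Hypothetical parameters (not given in the source): mean 180 lb, SD 10 lb
weights = NormalDist(mu=180, sigma=10)


def prob_between(dist, lo, hi):
    """P(lo < X < hi) as the area under the PDF between lo and hi."""
    return dist.cdf(hi) - dist.cdf(lo)


p_160_170 = prob_between(weights, 160, 170)       # about 0.136
p_near_190 = prob_between(weights, 189.9, 190.1)  # small but nonzero
p_exactly_190 = prob_between(weights, 190, 190)   # 0.0: a point has no width
```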
58. Probability Distributions
Discrete distribution
A discrete distribution describes the probability of
occurrence of each value of a discrete random
variable.
A discrete random variable is a random variable
that has countable values, such as a list of non-
negative integers.
With a discrete probability distribution, each
possible value of the discrete random variable can
be associated with a non-zero probability.
Thus, a discrete probability distribution is often
presented in tabular form.
59. Probability Distributions
Number of customer complaints
With a discrete distribution, unlike with a continuous
distribution, you can calculate the probability that X is
exactly equal to some value.
For example, you can use the discrete Poisson
distribution to describe the number of customer
complaints within a day.
Suppose the average number of complaints per day is
10 and you want to know the probability of receiving
5, 10 and 15 customer complaints in a day.
You can also view a discrete distribution on a
distribution plot to see the probabilities between
ranges.
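The complaint probabilities follow from the Poisson probability mass function $P(X = k) = e^{-\lambda}\lambda^k / k!$ with λ = 10 complaints per day, as in the example. A minimal sketch (the function name is hypothetical); summing the mass function over a range of counts gives the probability of falling in that range.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


lam = 10                                                # average complaints per day
probs = {k: poisson_pmf(k, lam) for k in (5, 10, 15)}   # ≈ 0.0378, 0.1251, 0.0347
p_5_to_15 = sum(poisson_pmf(k, lam) for k in range(5, 16))   # P(5 <= X <= 15)
```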
62. Probability Distributions
Discrete distribution
Binomial and Poisson are discrete probability distributions, and the normal distribution is a continuous probability distribution.
Binomial distribution
A random variable X is said to follow the binomial distribution if it assumes only non-negative values and its probability mass function is given by
$$P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, 2, \dots, n$$
where p + q = 1, q = 1 - p, $0 \le p \le 1$; p = probability of success, q = probability of failure, n = number of Bernoulli trials.
When n = 1, it is called the Bernoulli distribution.
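The binomial mass function can be evaluated with `math.comb`. In this sketch (function name and parameters are illustrative), four fair-coin trials give the familiar probabilities 1/16, 4/16, 6/16, 4/16, 1/16, and setting n = 1 reduces the formula to the Bernoulli case, P(0) = q and P(1) = p.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    if not 0 <= p <= 1 or not 0 <= x <= n:
        raise ValueError("need 0 <= p <= 1 and 0 <= x <= n")
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


pmf = [binomial_pmf(x, 4, 0.5) for x in range(5)]      # [0.0625, 0.25, 0.375, 0.25, 0.0625]
bernoulli = [binomial_pmf(x, 1, 0.3) for x in (0, 1)]  # n = 1: [q, p]
```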
63. Probability Distributions
Conditions of the binomial distribution
The experiment consists of n repeated trials.
Each trial results in just two possible outcomes; we call one of these outcomes a success and the other a failure.
The probability of success, denoted by p, is the same on every trial.
The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.
64. Probability Distributions
Properties of binomial distribution
It is a discrete probability distribution.
The mean and variance of the binomial distribution are np and npq, respectively.
The parameters of the binomial distribution are n and p.
When n is large enough, i.e. $n \to \infty$, and neither p nor q is very small, the binomial distribution tends to the normal distribution.
When $n \to \infty$ and $p \to 0$ such that np remains finite, the binomial distribution tends to the Poisson distribution.
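The mean np and variance npq can be confirmed by summing directly over the mass function. A minimal sketch with hypothetical parameters n = 10 and p = 0.3, for which np = 3 and npq = 2.1; the helper name is illustrative.

```python
from math import comb


def binomial_moments(n, p):
    """Mean and variance computed directly from the binomial pmf."""
    q = 1 - p
    pmf = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
    return mean, var, sum(pmf)


mean, var, total = binomial_moments(10, 0.3)   # ≈ (3.0, 2.1, 1.0), i.e. np, npq, 1
```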
65. Probability Distributions
Normal distribution
A random variable x is said to have a normal distribution if and only if, for σ > 0 and $-\infty < \mu < \infty$, the density function of x is
$$y = f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
where
y = the computed height of an ordinate at a distance of x from the mean
σ = standard deviation of the given normal distribution
π = 3.1416
e = the constant 2.7183
μ = mean or average of the given normal distribution.
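The density formula can be coded term by term and checked against `statistics.NormalDist.pdf` from the standard library. In this sketch (the function name `normal_pdf` is illustrative), the peak of the standard normal curve is $1/\sqrt{2\pi} \approx 0.3989$.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the normal curve: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


peak = normal_pdf(0, 0, 1)                  # about 0.3989
ours = normal_pdf(1.3, 2, 0.5)              # arbitrary illustrative point
library = NormalDist(2, 0.5).pdf(1.3)       # should agree with ours
```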
67. Probability Distributions
A normal random variable with any mean and standard deviation can be transformed to a standardized normal variable by subtracting the mean and dividing by the standard deviation.
For a normal distribution with mean μ and standard deviation σ, the standardized variable z is obtained as
$$z = \frac{x - \mu}{\sigma}$$
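Standardizing means a single standard normal table (or function) serves every normal distribution. In this sketch the mean 50 and standard deviation 5 are hypothetical; x = 60 gives z = 2, and the cumulative probability computed directly from N(50, 5) matches the one from the standard normal at z = 2 (about 0.977).

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """z = (x - mu) / sigma."""
    return (x - mu) / sigma


mu, sigma = 50, 5                           # hypothetical normal distribution
z = standardize(60, mu, sigma)              # 2.0
p_direct = NormalDist(mu, sigma).cdf(60)    # P(X <= 60) from N(50, 5)
p_via_z = NormalDist().cdf(z)               # P(Z <= 2) from the standard normal
```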
68. Probability Distributions
Relation of Binomial, Poisson and Normal distributions
When n is large and the probability p of occurrence of an event is close to zero so that np remains a finite constant, the binomial distribution tends to the Poisson distribution.
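The binomial-to-Poisson limit can be seen numerically. This sketch uses hypothetical values n = 1000 and p = 0.002, so that np = 2 stays finite, and compares the two mass functions for counts 0 to 10; the largest pointwise difference stays below 0.001.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)


n, p = 1000, 0.002          # hypothetical: many trials, rare event
lam = n * p                 # np = 2 stays finite
gaps = [abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11)]
max_gap = max(gaps)         # below 0.001 for these values
```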
Similarly, there is a relation between the binomial and normal distributions. The normal distribution is a limiting form of the binomial distribution under the following conditions:
n, the number of trials, is very large, i.e. $n \to \infty$.
Neither p nor q is very small.
69. Probability Distributions
In fact, it can be proved that the binomial distribution approaches a normal distribution with standardized normal variable
$$z = \frac{X - np}{\sqrt{npq}} = \frac{X - np}{\sqrt{np(1-p)}}$$
that is, z will follow the normal distribution with mean zero and variance one.
Similarly, the Poisson distribution also approaches a normal distribution (as its mean λ becomes large) with standardized normal variable
$$z = \frac{X - \lambda}{\sqrt{\lambda}}$$
where λ is the mean (and also the variance) of the Poisson distribution. In other words, z will follow the normal distribution with mean zero and variance one.
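Both standardizations are one-liners. This sketch uses hypothetical numbers: 100 fair-coin tosses (np = 50, √npq = 5) and a Poisson count with λ = 100. It also compares the exact binomial tail P(X ≥ 60) with its normal approximation; note that the approximation here adds a continuity correction of 0.5, a common refinement that is not part of the formula above.

```python
from math import comb, sqrt
from statistics import NormalDist


def z_binomial(x, n, p):
    """Standardize a binomial count: (X - np) / sqrt(npq)."""
    return (x - n * p) / sqrt(n * p * (1 - p))


def z_poisson(x, lam):
    """Standardize a Poisson count: (X - lambda) / sqrt(lambda)."""
    return (x - lam) / sqrt(lam)


n, p = 100, 0.5                                   # hypothetical: 100 fair coin tosses
z = z_binomial(60, n, p)                          # (60 - 50) / 5 = 2.0
exact_tail = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(60, n + 1))
# Normal approximation of P(X >= 60) with a continuity correction of 0.5
approx_tail = 1 - NormalDist().cdf(z_binomial(59.5, n, p))
z_p = z_poisson(130, 100)                         # (130 - 100) / 10 = 3.0
```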