2. It deals with association between two or
more variables
Correlation analysis deals with
covariation between two or more
variables
Types
1. Positive or negative
2. Simple or multiple
3. Linear or non-linear
3. Methods of Measuring correlation
1. Graphic Method
2. Diagrammatic Method: Scatter Diagram
3. Algebraic method
a. Karl Pearson's Coefficient of Correlation
b. Spearman's Rank Correlation Coefficient
c. Coefficient of Concurrent deviations
d. Least Squares Method
4. Karl Pearson's Coefficient of Correlation
$$\gamma\ (\text{Gamma}) = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \, \sum d_y^2}} = \frac{\sum d_x d_y}{N \sigma_x \sigma_y}$$
$d_x = x - \bar{x}$
$d_y = y - \bar{y}$
$\sum d_x d_y$ = sum of products of deviations from the respective arithmetic means of both series
5. Karl Pearson's Coefficient of Correlation
Using deviations from assumed (working) means Ax and Ay, i.e. $d_x = x - A_x$ and $d_y = y - A_y$:
$$\gamma\ (\text{Gamma}) = \frac{N \sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{\left[N \sum d_x^2 - (\sum d_x)^2\right] \left[N \sum d_y^2 - (\sum d_y)^2\right]}}$$
$\sum d_x d_y$ = total of products of deviations from the assumed means of the x and y series
$\sum d_x$ = total of deviations of the x series
$\sum d_y$ = total of deviations of the y series
$\sum d_x^2$ = total of squared deviations of the x series
$\sum d_y^2$ = total of squared deviations of the y series
N = number of items (number of paired items)
6. Karl Pearson's Coefficient of Correlation
Using deviations from assumed (working) means Ax and Ay:
$$\gamma\ (\text{Gamma}) = \frac{\sum d_x d_y - \dfrac{(\sum d_x)(\sum d_y)}{N}}{\sqrt{\left[\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}\right] \left[\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}\right]}}$$
7. Assumptions of Karl Pearson's Coefficient of Correlation
1. A linear relationship exists between the variables
Properties of Karl Pearson's Coefficient of Correlation
1. Its value lies between +1 and -1
2. Zero means no linear correlation
3. $\gamma\ (\text{Gamma}) = \pm\sqrt{b_{xy} \times b_{yx}}$, where $b_{xy}$ and $b_{yx}$ are the two regression coefficients and γ takes their common sign
Merit
Convenient for accurate interpretation, as it gives both the degree and the direction of the relationship between two variables
8. Limitations
1. Assumes a linear relationship, even though the actual relationship may not be linear
2. The method and process of calculation are difficult and time-consuming
3. Affected by extreme values in the distribution
9. Probable Error of Karl Pearson's Coefficient of Correlation
$$\text{Probable Error of } \gamma\ (\text{Gamma}) = 0.6745 \times \frac{1 - \gamma^2}{\sqrt{N}}$$
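The probable error formula can be sketched as a small Python function. The function name `probable_error` and the inputs (a coefficient of 0.8 from 25 pairs) are illustrative assumptions, not values taken from the slides.

```python
from math import sqrt


def probable_error(r: float, n: int) -> float:
    """Probable error of a correlation coefficient r computed from n pairs."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


# Arbitrary illustrative inputs (assumed, not from the slides).
pe = probable_error(0.8, 25)
print(round(pe, 4))
```

A smaller probable error relative to the coefficient suggests a more reliable estimate; the error shrinks as N grows or as the coefficient approaches +1 or -1.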
10. Q7. Calculate the coefficient of correlation for the following data
X: 65 63 67 64 68 62 70 66 68 67 69 71
Y: 68 66 68 65 69 66 68 65 71 67 68 70
Ans.
$$\gamma\ (\text{Gamma}) = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \, \sum d_y^2}} = \frac{\sum d_x d_y}{N \sigma_x \sigma_y}$$
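As a check on Q7, here is a minimal Python sketch that evaluates the coefficient two ways: from deviations about the actual means, and from deviations about assumed means. The function names and the assumed means (67 and 68) are choices made for this illustration only.

```python
from math import sqrt

x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]


def pearson_from_means(xs, ys):
    """Coefficient from deviations about the actual arithmetic means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [v - x_bar for v in xs]
    dy = [v - y_bar for v in ys]
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    return s_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def pearson_assumed_means(xs, ys, ax, ay):
    """Coefficient from deviations about assumed (working) means ax, ay."""
    n = len(xs)
    dx = [v - ax for v in xs]
    dy = [v - ay for v in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    s_dx2 = sum(a * a for a in dx)
    s_dy2 = sum(b * b for b in dy)
    num = n * s_dxdy - s_dx * s_dy
    den = sqrt((n * s_dx2 - s_dx ** 2) * (n * s_dy2 - s_dy ** 2))
    return num / den


r_direct = pearson_from_means(x, y)
r_shortcut = pearson_assumed_means(x, y, ax=67, ay=68)  # assumed means
print(round(r_direct, 4), round(r_shortcut, 4))
```

Both routes give the same value (about 0.70), since the choice of working mean does not affect the coefficient.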
14. Rank Correlation: sometimes variables are not quantitative in nature but can be arranged in serial order.
This is especially so when dealing with attributes like honesty, beauty, character, morality, etc.
To deal with such situations, Charles Edward Spearman developed, in 1904, a formula for obtaining the correlation coefficient between the ranks of n individuals in two attributes under study, or between the ranks given by two or three judges.
15. Rank coefficient of correlation
$$\rho\ (\text{rho}) = 1 - \frac{6 \sum d^2}{N^3 - N}$$
$$\rho\ (\text{rho}) = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$$
$\sum d^2$ = total of squared differences of ranks
N = number of items
16. Q9. Ten competitors in a cooking competition are ranked by three judges in the following way. Using the rank correlation method, find out which pair of judges has the nearest approach.
Competitor   P    Q    R
1            1    3    6
2            6    5    4
3            5    8    9
4            10   4    8
5            3    7    1
6            2    10   2
7            4    2    3
8            9    1    10
9            7    6    5
10           8    9    7
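A minimal Python sketch of Q9, applying the rank correlation formula to each pair of judges. The helper name `spearman_rho` is an assumption for this illustration; the formula used is the no-ties form, which applies here because each judge assigns every rank from 1 to 10 exactly once.

```python
from itertools import combinations

# Ranks given by judges P, Q and R to competitors 1..10.
ranks = {
    "P": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "Q": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "R": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}


def spearman_rho(a, b):
    """rho = 1 - 6 * sum(d^2) / (N * (N^2 - 1)), valid when there are no ties."""
    n = len(a)
    sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(a, b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


rho = {
    (j1, j2): spearman_rho(ranks[j1], ranks[j2])
    for j1, j2 in combinations(ranks, 2)
}
closest_pair = max(rho, key=rho.get)

for pair, value in rho.items():
    print(pair, round(value, 4))
print("closest pair:", closest_pair)
```

The pair with the highest ρ is the one whose rankings agree most closely; for this data that is judges P and R.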
18. Regression Analysis is the process of developing a statistical model which is used to predict the value of a dependent variable from an independent variable
Application
Advertising v/s sales revenue
First used by Sir Francis Galton in 1877 for a study of the height of sons w.r.t. the height of fathers
19. Regression: literally, going back, reverting to a former condition, or returning
Refers to the functional relationship between x and y, and gives estimates of the value of the dependent variable y for given values of the independent variable x
Example: relationship between income of employees and savings
Regression coefficients can be used to calculate the correlation coefficient: $\gamma\ (\text{Gamma}) = \pm\sqrt{b_{xy} \times b_{yx}}$
20. Types of Regression
1. Simple & Multiple Regression
2. Total or Partial
3. Linear / Non-linear
Methods of Regression Analysis
1. Scatter Diagram
2. Regression Equations
3. Regression Lines
21. Line of Regression of y on x: y = a + bx
The coefficient b is the slope of the line of regression of y on x.
It represents the increment in the value of the dependent variable y for a unit change in the value of the independent variable x, i.e. the rate of change of y w.r.t. x.
It is written as $b_{yx}$.
Regression coefficient (coefficient of regression) of y on x:
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum d_x d_y}{\sum d_x^2}$$
Equation of the line of regression of y on x:
$$y - \bar{y} = b_{yx}\,(x - \bar{x})$$
22. Line of Regression of x on y: x = a + by
The coefficient b is the slope of the line of regression of x on y.
It represents the increment in the value of the dependent variable x for a unit change in the value of the independent variable y, i.e. the rate of change of x w.r.t. y.
It is written as $b_{xy}$.
Regression coefficient (coefficient of regression) of x on y:
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum d_x d_y}{\sum d_y^2}$$
Equation of the line of regression of x on y:
$$x - \bar{x} = b_{xy}\,(y - \bar{y})$$
23. Q2. From the data given below, find:
a. the two regression coefficients
b. the two regression equations
c. the coefficient of correlation between marks in Economics and Statistics
d. the most likely marks in Statistics when marks in Economics are 30
Let marks in Economics be x and marks in Statistics be y.
Marks in Eco (x):   25 28 35 32 31 36 29 38 34 32
Marks in Stat (y):  43 46 49 41 36 32 31 30 33 39
24.
Marks in Eco (x):   25 28 35 32 31 36 29 38 34 32   Σx = 320   $\bar{x}$ = 32
Marks in Stat (y):  43 46 49 41 36 32 31 30 33 39   Σy = 380   $\bar{y}$ = 38
27. Regression coefficient (coefficient of regression) of y on x:
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum d_x d_y}{\sum d_x^2} = \frac{-93}{140} = -0.6643$$
Equation of regression of y on x:
$$y - \bar{y} = b_{yx}\,(x - \bar{x})$$
y - 38 = -0.6643(x - 32)
y - 38 = -0.6643x + 0.6643 × 32
y = -0.6643x + 38 + 0.6643 × 32
y = -0.6643x + 38 + 21.2576
y = -0.6643x + 59.2576
28. Coefficient of regression of x on y:
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum d_x d_y}{\sum d_y^2} = \frac{-93}{398} = -0.2337$$
Equation of regression of x on y:
$$x - \bar{x} = b_{xy}\,(y - \bar{y})$$
x - 32 = -0.2337(y - 38)
x - 32 = -0.2337y + 0.2337 × 38
x - 32 = -0.2337y + 8.8806
x = -0.2337y + 32 + 8.8806
x = -0.2337y + 40.8806
29. Correlation coefficient:
$$\gamma = \pm\sqrt{b_{xy} \times b_{yx}} = -\sqrt{(-0.2337) \times (-0.6643)} = -\sqrt{0.1552} = -0.394$$
The negative root is taken since $b_{yx}$ and $b_{xy}$ are both negative.
30. In order to estimate the most likely marks in Statistics (y) when marks in Economics (x) are 30, we use the line of regression of y on x.
The required estimate is given by
y = -0.6643 × 30 + 59.2576 = -19.929 + 59.2576 = 39.3286
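The whole of Q2 can be reproduced with a short Python sketch. Variable and function names (`b_yx`, `b_xy`, `predict_stat`) are choices made for this illustration; the data are the marks given in the question.

```python
from math import sqrt

eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]   # x: marks in Economics
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]  # y: marks in Statistics

n = len(eco)
x_bar = sum(eco) / n
y_bar = sum(stat) / n

# Deviations from the actual means.
dx = [v - x_bar for v in eco]
dy = [v - y_bar for v in stat]

s_dxdy = sum(a * b for a, b in zip(dx, dy))
s_dx2 = sum(a * a for a in dx)
s_dy2 = sum(b * b for b in dy)

b_yx = s_dxdy / s_dx2  # regression coefficient of y on x
b_xy = s_dxdy / s_dy2  # regression coefficient of x on y

# The correlation coefficient takes the common sign of the two coefficients.
sign = -1 if b_yx < 0 else 1
r = sign * sqrt(b_yx * b_xy)


def predict_stat(x):
    """Estimate y from the line of regression of y on x."""
    return y_bar + b_yx * (x - x_bar)


y_at_30 = predict_stat(30)
print(round(b_yx, 4), round(b_xy, 4), round(r, 3), round(y_at_30, 4))
```

Small differences in the last decimal place against the hand calculation come from rounding the coefficients to four places before substituting.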
31. Sum of squares (x and y):
$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum d_x d_y = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
Sum of squares (xx):
$$SS_{xx} = \sum (x - \bar{x})^2 = \sum d_x^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$
34. Slope and intercept of the regression line:
$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum d_x d_y}{\sum d_x^2}$$
y = a + bx
Σy = Σa + bΣx
Σy = n·a + bΣx
n·a = Σy - bΣx
$$a = \frac{\sum y - b \sum x}{n} = \frac{\sum y}{n} - \frac{b \sum x}{n}$$
38.
$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{6565}{344.25} = 19.0704$$
y = a + bx
Σy = Σa + bΣx
Σy = n·a + bΣx
n·a = Σy - bΣx
$$a = \frac{\sum y - b \sum x}{n} = \frac{\sum y}{n} - \frac{b \sum x}{n} = \frac{13060}{12} - \frac{19.0704 \times 1221}{12} = -852.08$$
39. Equation of the simple regression line (regression of y on x):
y = a + bx
y = -852.08 + 19.0704x
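The slope and intercept can be recomputed from the summary totals quoted in the worked example (n = 12, Σx = 1221, Σy = 13060, SSxy = 6565, SSxx = 344.25). This is a minimal sketch; the variable names and the `predict` helper are assumptions for illustration.

```python
# Summary totals quoted in the worked example.
n = 12
sum_x = 1221
sum_y = 13060
ss_xy = 6565
ss_xx = 344.25

b = ss_xy / ss_xx              # slope
a = (sum_y - b * sum_x) / n    # intercept, from n*a = sum_y - b*sum_x


def predict(x):
    """Value of y on the fitted line y = a + b*x."""
    return a + b * x


print(round(b, 4), round(a, 2))
```

A quick sanity check on any fitted line of this kind is that it passes through the point of means, so `predict(sum_x / n)` should equal `sum_y / n`.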
40. For testing the fit
$y_i$ = recorded (actual) value of y in the given data
$\bar{y}$ = mean (average) of y
$\hat{y}$ = predicted value from the regression line
Deviation = $(y_i - \bar{y})$ = difference between the actual value of y and the mean
Residual = $(y_i - \hat{y})$ = gap (error, difference) between the actual value of y and the predicted value calculated from the regression line
Deviation of predicted value from mean = $(\hat{y} - \bar{y})$
a = intercept on the y-axis
b = slope of the regression line
41. Total sum of squares = $SST = \sum (y_i - \bar{y})^2$
Regression sum of squares = $SSR = \sum (\hat{y} - \bar{y})^2$
Error sum of squares = $SSE = \sum (y_i - \hat{y})^2$
$$\text{Coefficient of determination} = \gamma^2 = \frac{SSR}{SST}$$
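To make the three sums of squares concrete, the sketch below fits the line of y on x to the Economics/Statistics marks from Q2 and checks that SST splits into SSR + SSE. Reusing the Q2 data here, and all variable names, are choices made for this illustration.

```python
eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]   # x
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]  # y

n = len(eco)
x_bar = sum(eco) / n
y_bar = sum(stat) / n

# Least-squares slope and intercept of y on x.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(eco, stat))
ss_xx = sum((xi - x_bar) ** 2 for xi in eco)
b = ss_xy / ss_xx
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in eco]  # predicted values

sst = sum((yi - y_bar) ** 2 for yi in stat)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)              # regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(stat, y_hat))  # error

r_squared = ssr / sst
print(round(sst, 2), round(ssr, 2), round(sse, 2), round(r_squared, 4))
```

For a least-squares line, SSR + SSE equals SST, and SSR/SST matches the square of the correlation coefficient found for the same data (about 0.155 here).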
42. Standard error of estimate:
$$S_{yx} = \sqrt{\frac{SSE}{n - 2}}$$
To determine whether a significant linear relationship exists between the independent variable x and the dependent variable y, we test whether the population slope β is zero:
$$t = \frac{b - \beta}{S_b}$$
where $S_b$ is the standard error of b:
$$S_b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$
43. H0: Slope of the regression line is zero (β = 0)
H1: Slope of the regression line is not zero (β ≠ 0)
44. Standard error of estimate:
$$S_{yx} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum (y_i - \hat{y})^2}{n - 2}} = \sqrt{\frac{13769.21}{12 - 2}} = \sqrt{1376.92} = 37.1068$$
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 124581 - \frac{(1221)^2}{12} = 344.25$$
Standard error of b:
$$S_b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$
45. Standard error of b:
$$S_b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$
$$t = \frac{b - \beta}{S_b} = \frac{19.07 - 0}{37.1068 / \sqrt{344.25}} = 9.53$$
As the calculated value of t is more than the table value of t for 12 - 2 = 10 degrees of freedom, the null hypothesis is rejected.
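The slope test can be sketched in Python from the quoted totals (SSE = 13769.21, n = 12, SSxx = 344.25, b = 19.0704). The slides do not state a significance level, so the 5% two-tailed level and its tabulated value of 2.228 for 10 degrees of freedom are assumptions made here for illustration.

```python
from math import sqrt

# Totals quoted in the worked example.
n = 12
sse = 13769.21
ss_xx = 344.25
b = 19.0704
beta_0 = 0.0  # slope under H0

s_yx = sqrt(sse / (n - 2))       # standard error of estimate
s_b = s_yx / sqrt(ss_xx)         # standard error of the slope
t_stat = (b - beta_0) / s_b

# Assumption: 5% two-tailed test; tabulated t for n - 2 = 10 df is 2.228.
t_critical = 2.228
reject_h0 = abs(t_stat) > t_critical

print(round(s_yx, 4), round(s_b, 4), round(t_stat, 2), reject_h0)
```

With the calculated t well above the assumed table value, the conclusion matches the slide: the hypothesis of zero slope is rejected.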
46. Coefficient of Determination: Definition
The coefficient of determination, also known as R squared, is interpreted as the goodness of fit of a regression.
The higher the coefficient of determination, the larger the share of the variance of the dependent variable that is explained by the independent variable.
The coefficient of determination is the overall measure of the usefulness of a regression.
For example, suppose $r^2$ is 0.95. This means that 95% of the variation in the dependent variable is explained by the independent variable. That is a good regression.
47. The coefficient of determination can be calculated as the regression sum of squares, SSR, divided by the total sum of squares, SST:
$$\text{Coefficient of determination} = \gamma^2 = \frac{SSR}{SST}$$