2. SIMPLE REGRESSION AND CORRELATION
Both regression and correlation measure the strength of the
relationship between two variables. In linear regression, we
examine the amount of variability in one variable (Y, the
dependent variable) that is explained by changes in another
variable (X, the independent variable). Specifically, we look for
straight-line (linear) changes in Y as X changes.
Regression analysis is usually done in situations in which
we have control of the X variable and can measure it
essentially without error. For simplicity, we will avoid
discussing curvilinear relationships between variables.
3. Regression and correlation…
Correlation analysis is used when both variables are
experimental and measured with error. It is more
preliminary than regression analysis and generally
measures the relationship between two variables of
interest.
Let us consider two examples to further
highlight the differences between regression
and correlation analysis.
4. Example 10.1
A biology student wishes to determine the relationship between
temperature and heart rate (beats/minute) in the
common leopard frog. He manipulates the temperature in 2 °C
increments ranging from 2 to 18 °C and records the heart
rate at each interval. His data are presented in the table
below.
Rec. No.   Temp (X), °C   Heart rate (Y), beats/min
1          2              5
2          4              11
3          6              11
4          8              14
5          10             22
6          12             23
7          14             32
8          16             29
9          18             32
5. Example 10.1 …
How should he proceed to describe the relationship
between these variables (temperature and heart rate)?
Clearly the two variables have functional
dependence – as the temperature increases the
heart rate increases. Here the temperature is
controlled by the student and can take exactly the
same values in another experiment with a
different frog. Temperature is the INDEPENDENT
or “predictor” variable (X). Heart rate is
determined by temperature and is, therefore, the
DEPENDENT variable or “response” variable (Y).
7. Example 10.2
A biologist interested in the morphology of West
Indian chitons measured the length and width
of each of 10 chitons, as follows:
Animal   Length (cm)   Width (cm)
1        10.7          5.8
2        11.0          6.0
3        9.5           5.0
4        11.1          6.0
5        10.3          5.3
6        10.7          5.8
7        9.9           5.2
8        10.6          5.7
9        10.0          5.3
10       12.0          6.3
8. Example 10.2….
This data set is fundamentally different from the
data in Example 10.1 because neither variable
is under the biologist’s control. To try to predict
length from width is as logical as to try to
predict width from length. Both variables are
free to vary (Fig 10.2). A correlational study is
more appropriate here than a regression
analysis. Because some of the calculations are
similar, regression and correlation are often
confused.
10. SIMPLE LINEAR REGRESSION
We assume:

X                               Y
1. Independent variable         1. Dependent variable
2. Measured without error,      2. Free to vary
   fixed and repeatable
11. Linear Model Assumptions
1. X’s are fixed and measured without error.
2. The expected or mean value of the variable Y
for a given value of X is described by a linear
function

   $\mu_{Y|X} = \alpha + \beta X$

where $\alpha$ and $\beta$ are constant real numbers
and $\beta \neq 0$.
$\alpha$ and $\beta$ represent the intercept and slope,
respectively, of the linear relationship between
X and Y.
12. Linear Model Assumptions
3. For any fixed value of X, there may be several
corresponding values of the dependent variable Y. For
example, at a fixed temperature several frogs may show
several different heart rates. However, we assume that for
any given value $X_i$, the $Y_i$'s are independent of each
other and normally distributed. We can represent each
$Y_i$ value as

   $Y_i = \alpha + \beta X_i + e_i$

$Y_i$ is described as the expected value ($\alpha + \beta X_i$) plus a
deviation ($e_i$) from that expectation. We assume the $e_i$'s
are normally distributed error terms with a mean of
zero.
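As a rough illustration of this assumption, the sketch below draws many Y values at one fixed X from the model "expected value plus a normal deviation with mean zero". The intercept, slope and error standard deviation (`ALPHA`, `BETA`, `SIGMA`) are hypothetical values chosen only for the demonstration; they are not estimates from any data set in these slides.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
ALPHA = 2.0   # intercept
BETA = 1.8    # slope
SIGMA = 3.0   # common standard deviation of the error terms e_i

def simulate_y(x, n, rng):
    """Draw n independent Y values at a fixed X: Y_i = ALPHA + BETA*x + e_i."""
    return [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for _ in range(n)]

rng = random.Random(0)           # fixed seed so the run is repeatable
ys = simulate_y(10.0, 10_000, rng)
mean_y = sum(ys) / len(ys)
expected = ALPHA + BETA * 10.0   # mean of Y at X = 10 under the model
print(f"mean of simulated Y = {mean_y:.2f}, expected value = {expected:.2f}")
```

Because the deviations average out to zero, the mean of the simulated Y values should land close to the value given by the straight line at that X.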
13. Linear Model Assumptions…
4. The variances of the distributions of Y for different
values of X are assumed to be equal.
To describe the experimental regression relationship
between Y and X, we need to do the following:
a) Graph the data to ascertain that an apparent linear
relationship exists.
b) Find the best-fitting straight line for the data set.
c) Test whether or not the fitted line explains a
significant portion of the variability in Y, i.e., test
whether the linear relationship is real or not.
14. Regression coefficient
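The regression coefficient or slope (b) is

   $b = \dfrac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}$

b is the amount by which Y changes for every unit change in X.
Therefore, b is expressed in the units of the original data
(units of Y per unit of X).
If we have the value of ‘b’ we can calculate the
value of the intercept ‘a’ from $\bar{Y} = a + b\bar{X}$, i.e., $a = \bar{Y} - b\bar{X}$.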
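The slope formula is often written in terms of deviations from the means. The two forms are algebraically identical, as the identities below show; the deviation form makes it easier to see that b compares how X and Y vary together with how X varies on its own.

```latex
\begin{aligned}
\sum (X - \bar{X})(Y - \bar{Y}) &= \sum XY - \frac{(\sum X)(\sum Y)}{n}, \\
\sum (X - \bar{X})^2 &= \sum X^2 - \frac{(\sum X)^2}{n}, \\
b &= \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2},
\qquad a = \bar{Y} - b\,\bar{X}.
\end{aligned}
```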
15. Calculation of b
Referring to the example of the temperature and heart rate
relationship in the frog, we have

n = 9
$\sum X = 90$, $\bar{X} = 10.0$
$\sum X^2 = 1140$
$\sum Y = 179$, $\bar{Y} = 19.9$
$\sum Y^2 = 4365$
$\sum XY = 2216$

b = 1.78

THIS MEANS THAT FOR EVERY 1 °C INCREASE (OR DECREASE) IN TEMPERATURE,
HEART RATE INCREASES (OR DECREASES) BY ABOUT 1.78 BEATS/MIN.
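The arithmetic behind b can be checked with a few lines of Python. This is only a sketch that applies the slope formula to the nine temperature/heart-rate pairs of Example 10.1; the intercept a is not reported in the slides and is computed here from $\bar{Y} = a + b\bar{X}$.

```python
# Frog data from Example 10.1
temp = [2, 4, 6, 8, 10, 12, 14, 16, 18]       # X, degrees C
rate = [5, 11, 11, 14, 22, 23, 32, 29, 32]    # Y, beats/min

n = len(temp)
sum_x = sum(temp)
sum_y = sum(rate)
sum_x2 = sum(x * x for x in temp)
sum_xy = sum(x * y for x, y in zip(temp, rate))

# Slope: corrected sum of cross products over corrected sum of squares of X
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
# Intercept from Y-bar = a + b * X-bar
a = sum_y / n - b * sum_x / n

print(n, sum_x, sum_x2, sum_y, sum_xy)   # 9 90 1140 179 2216
print(f"b = {b:.3f}, a = {a:.3f}")       # b = 1.775, a = 2.139
```

The slope works out to 426/240 = 1.775, which the slide reports rounded as 1.78.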
16. Simple Linear Correlation Analysis
Correlation analysis is used to measure the intensity
of association observed between any pair of
variables. We are largely concerned with whether
two variables are interdependent or co-vary. Here
we do not express one variable as a function of
the other and do not imply that Y is dependent
on X as we did with regression analysis. Both X
and Y are measured with error and we wish to
estimate the degree to which these variables vary
together.
17. …Correlation
A widely used index of the association of two
quantitative variables is the Pearson Product-Moment
Correlation Coefficient, usually called the
correlation coefficient (r).
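   $r = \dfrac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}$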
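To see how the formula for r behaves, here is a minimal Python sketch. The function name `pearson_r` and the three small data sets are made up for illustration; they show a perfect increasing line, a perfect decreasing line, and an imperfect positive association.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return ss_xy / sqrt(ss_xx * ss_yy)

# Made-up values, for illustration only
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])     # perfect increasing line
r_down = pearson_r(x, [10, 8, 6, 4, 2])   # perfect decreasing line
r_mixed = pearson_r(x, [2, 1, 4, 3, 5])   # positive but imperfect association
print(r_up, r_down, round(r_mixed, 3))    # 1.0 -1.0 0.8
```

Swapping the roles of X and Y leaves r unchanged, which fits the point that correlation does not treat either variable as dependent on the other.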
18. ….Correlation
$-1 \le r \le 1$,   $r^2 = \dfrac{\text{Explainable variability}}{\text{Total variability}}$

$r^2$ = coefficient of determination.

Correlation coefficients and the corresponding coefficients of
determination

r        r^2
0.00     0.00
±0.10    0.01
±0.20    0.04
±0.30    0.09
±0.40    0.16
±0.50    0.25
±0.60    0.36
±0.70    0.49
±0.80    0.64
±0.90    0.81
±1.00    1.00
19. Correlation…
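The standard error of the correlation coefficient is

   $s_r = \sqrt{\dfrac{1 - r^2}{n - 2}}$

Using this standard error we can develop a test
of hypothesis for

   $H_0: \rho = 0$
   $H_a: \rho \neq 0$

with the test statistic

   $t = \dfrac{r - 0}{s_r} = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$

with $\nu = n - 2$ degrees of freedom.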
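Dividing by the standard error is the same as multiplying by its reciprocal, so the test statistic can also be written as a single expression that needs only r and n. This is just a rearrangement of the definitions above, not a different test.

```latex
t \;=\; \frac{r - 0}{s_r}
  \;=\; \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}
  \;=\; r\,\sqrt{\frac{n - 2}{1 - r^2}},
\qquad \nu = n - 2 .
```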
20. Example 10.4: Analysis of Example 10.2 as a correlation problem
Let X be the chiton length (cm) and Y be the
chiton width (cm). The data for the problem
and the preliminary calculations are given below.

Length   Width      Length   Width
10.7     5.8        10.7     5.8
11.0     6.0        9.9      5.2
9.5      5.0        10.6     5.7
11.1     6.0        10.0     5.3
10.3     5.3        12.0     6.3
21. Example 10.4
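$\sum X = 105.8$, $\bar{X} = 10.58$
$\sum Y = 56.4$, $\bar{Y} = 5.64$
$\sum X^2 = 1123.9$, $\sum Y^2 = 319.68$, $\sum XY = 599.31$
$n = 10$

   $r = \dfrac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}} = 0.969$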
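The value of r can be reproduced from the preliminary sums alone. The sketch below plugs the sums listed for the chiton data into the correlation formula; the intermediate names (`ss_xy`, `ss_xx`, `ss_yy`) are labels chosen here for the corrected sums, not notation from the slides.

```python
from math import sqrt

# Preliminary sums for the chiton data (Example 10.4)
n = 10
sum_x, sum_y = 105.8, 56.4          # length and width totals (cm)
sum_x2, sum_y2 = 1123.9, 319.68     # sums of squares
sum_xy = 599.31                     # sum of cross products

ss_xy = sum_xy - sum_x * sum_y / n  # corrected sum of cross products
ss_xx = sum_x2 - sum_x ** 2 / n     # corrected sum of squares, X
ss_yy = sum_y2 - sum_y ** 2 / n     # corrected sum of squares, Y

r = ss_xy / sqrt(ss_xx * ss_yy)
print(f"SSxy = {ss_xy:.3f}, SSxx = {ss_xx:.3f}, SSyy = {ss_yy:.3f}")
print(f"r = {r:.3f}")               # r = 0.969
```

Squaring r gives a coefficient of determination of about 0.94, so roughly 94% of the variability in one dimension is associated with variability in the other.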
22. Correlation….
Test whether there is a significant correlation
with α = 0.05 and ν = n − 2 = 10 − 2 = 8.

   $H_0: \rho = 0$
   $H_a: \rho \neq 0$

   $s_r = \sqrt{\dfrac{1 - r^2}{n - 2}} = \sqrt{\dfrac{1 - (0.969)^2}{10 - 2}} = 0.087$

So the test statistic is

   $t = \dfrac{r - 0}{s_r} = \dfrac{0.969 - 0}{0.087} = 11.14$
23. Correlation
The critical values from Table
C.4 for ν = 8 with α = 0.05 are
± 2.306. Since 11.14 >> 2.306,
we reject H0 and find a STRONG
LINEAR CORRELATION between length
and width of chiton shells.
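The test can be sketched in Python as follows. It takes r = 0.969 and n = 10 from Example 10.4 and uses the critical value 2.306 quoted from Table C.4, rather than looking it up programmatically.

```python
from math import sqrt

r, n = 0.969, 10     # correlation and sample size from Example 10.4
t_crit = 2.306       # two-tailed critical value quoted from Table C.4 (nu = 8, alpha = 0.05)

s_r = sqrt((1 - r ** 2) / (n - 2))   # standard error of r
t = (r - 0) / s_r                    # test statistic under H0: rho = 0
reject_h0 = abs(t) > t_crit

print(f"s_r = {s_r:.3f}, t = {t:.2f}, reject H0: {reject_h0}")
```

Carrying the standard error at full precision gives t of about 11.09 instead of the 11.14 obtained with the rounded 0.087. The difference comes only from rounding and does not affect the conclusion.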
24. Solve the problem
• The following are records of the amount of feed
ingested (kg) and live weight (kg) of 10 broilers. Test
whether there is a significant correlation between
feed intake and body weight. How much weight
does a broiler gain per 1 kg of feed?

Bird No.    1    2    3    4    5    6    7    8    9    10
Feed (kg)   3.6  3.9  4.1  4.0  3.9  4.4  4.2  4.0  3.9  4.6
Wt. (kg)    2.1  2.2  2.4  2.3  2.0  2.9  2.8  2.5  2.7  2.7

r = 0.726, sig. (2-tailed) p ≈ 0.017
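One way to check the stated answer is to run the same calculations on the ten feed/weight pairs. The sketch below treats feed as X and weight as Y, and reads the second question as asking for the regression slope of weight on feed; that reading is an assumption, since the exercise does not say so explicitly. The resulting t can be compared with the critical value 2.306 given earlier for ν = 8 and α = 0.05.

```python
from math import sqrt

feed = [3.6, 3.9, 4.1, 4.0, 3.9, 4.4, 4.2, 4.0, 3.9, 4.6]     # X, kg
weight = [2.1, 2.2, 2.4, 2.3, 2.0, 2.9, 2.8, 2.5, 2.7, 2.7]   # Y, kg

n = len(feed)
ss_xy = sum(x * y for x, y in zip(feed, weight)) - sum(feed) * sum(weight) / n
ss_xx = sum(x * x for x in feed) - sum(feed) ** 2 / n
ss_yy = sum(y * y for y in weight) - sum(weight) ** 2 / n

r = ss_xy / sqrt(ss_xx * ss_yy)          # correlation coefficient
t = r / sqrt((1 - r ** 2) / (n - 2))     # test statistic, nu = n - 2 = 8
b = ss_xy / ss_xx                        # slope of weight on feed (kg per kg)

print(f"r = {r:.3f}, t = {t:.2f}, b = {b:.2f}")
```

Under these assumptions r comes out near 0.726 and t near 2.98, which exceeds 2.306, so the correlation is significant at α = 0.05. The slope is roughly 0.79 kg of live weight per kg of feed.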