Rank correlation: some features and an application
1. On some interesting features and an
application of rank correlation
Kushal Kr. Dey
Indian Statistical Institute
D.Basu Memorial Award Talk 2011
Kushal Kr. Dey [1.5 pt] Indian Statistical Institute D.Basu Memorial Award Talk 2011
On some interesting features and an application of rank correla
2. List of contents
1 Historical overview of rank correlation.
2 Some properties of rank correlation.
3 A practical example of rank correlation.
3. Historical Overview—Correlation
In 1886, Sir Francis Galton coined the term correlation,
writing:
"length of a human arm is said to be correlated with
that of the leg, because a person with long arm has
usually long legs and conversely."
Galton wanted a measure of correlation that takes the value
+1 for perfect correspondence, 0 for independence, and -1 for
perfect inverse correspondence.
5. Historical overview—contd.
Karl Pearson, a student of Galton, worked on his idea and
formulated his "product-moment" measure of correlation in
1896:
\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}. \tag{1} \]
Spearman observed that for characteristics that are not
quantitatively measurable, the Pearsonian measure fails to
measure the association. This motivated him to use rank-based
methods for association and to develop his rank correlation
coefficient in 1904. ["The proof and measurement of
association between two things" by C. Spearman in The
American Journal of Psychology (1904)].
7. Historical overview contd
In 1938, two years after the death of Pearson, Maurice
Kendall, a British scientist, while working on psychological
experiments, came up with a new measure of correlation
popularly known as Kendall's τ. ["A new measure of rank
correlation", M. Kendall, Biometrika (1938)].
The next few years saw extensive research in this area by
Kendall, Daniels, Hoeffding and others.
In 1954, Goodman and Kruskal modified Kendall's coefficient
to handle ties. ["Measures of association for cross
classifications", Part I, L.A. Goodman and W.H. Kruskal,
J. Amer. Statist. Assoc. (1954)]
9. Daniels' generalized correlation coefficient
H.E. Daniels of Cambridge University, a close associate of
Kendall, proposed a measure in 1944 to unify Pearson's r,
Spearman's ρ and Kendall's τ ["The relation between
measures of correlation in the universe of sample
permutations", H.E. Daniels, Biometrika (1944)].
Consider n data points $(X_i, Y_i)$, $i = 1, \ldots, n$. To each
pair $(X_i, X_j)$ of X's we allot a score $a_{ij}$ with
$a_{ij} = -a_{ji}$ and $a_{ii} = 0$; similarly, we allot a score
$b_{ij}$ to the pair $(Y_i, Y_j)$. Daniels' generalized
coefficient D is then given by
\[ D = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ij}}{\left( \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^{2} \cdot \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij}^{2} \right)^{1/2}} \tag{2} \]
11. Daniels' generalized coefficient contd.
Special cases
Put $a_{ij} = X_j - X_i$ and $b_{ij} = Y_j - Y_i$ to get Pearson's r.
Put $a_{ij} = \mathrm{Rank}(X_j) - \mathrm{Rank}(X_i)$ and
$b_{ij} = \mathrm{Rank}(Y_j) - \mathrm{Rank}(Y_i)$ to get Spearman's ρ.
Put $a_{ij} = \mathrm{sgn}(X_j - X_i)$ and $b_{ij} = \mathrm{sgn}(Y_j - Y_i)$ to get
Kendall's τ.
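The three special cases can be checked numerically. The following is a minimal sketch, not code from the talk: the function names and the five-point data set are illustrative assumptions, and ties are assumed absent.

```python
import math


def ranks(values):
    """Rank of each value (1 = smallest); assumes there are no ties."""
    return [1 + sum(v < u for v in values) for u in values]


def sign(t):
    """sgn(t): +1, 0 or -1."""
    return (t > 0) - (t < 0)


def daniels_coefficient(x, y, score):
    """Generalized coefficient D with a_ij = score(x_i, x_j), b_ij = score(y_i, y_j)."""
    n = len(x)
    numerator = sum_a2 = sum_b2 = 0.0
    for i in range(n):
        for j in range(n):
            a = score(x[i], x[j])
            b = score(y[i], y[j])
            numerator += a * b
            sum_a2 += a * a
            sum_b2 += b * b
    return numerator / math.sqrt(sum_a2 * sum_b2)


def pearson_r(x, y):
    # a_ij = x_j - x_i
    return daniels_coefficient(x, y, lambda u, v: v - u)


def spearman_rho(x, y):
    # a_ij = Rank(x_j) - Rank(x_i)
    return daniels_coefficient(ranks(x), ranks(y), lambda u, v: v - u)


def kendall_tau(x, y):
    # a_ij = sgn(x_j - x_i)
    return daniels_coefficient(x, y, lambda u, v: sign(v - u))


# Illustrative data (an assumption, not from the talk).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(pearson_r(x, y), spearman_rho(x, y), kendall_tau(x, y))
```

For this data set the values coincide with their ranks, so Pearson's r and Spearman's ρ agree (0.8), while Kendall's τ is 0.6 (8 concordant and 2 discordant pairs out of 10).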
12. Alternative expression for τ and ρ
First, we define $d_{ij}$ to be +1 when the rank j (j > i) precedes
the rank i in the second ranking, and zero otherwise.
We can write Kendall's τ as
\[ \tau = 1 - \frac{4Q}{n(n-1)} \tag{3} \]
where $Q = \sum_{i<j} d_{ij}$ is the total score and n is the total
number of elements in the sample.
Similarly, we can write Spearman's ρ as
\[ \rho = 1 - \frac{12V}{n(n^{2}-1)} \tag{4} \]
where $V = \sum_{i<j} (j - i) d_{ij}$ is the sum of inversions weighted
by the numerical difference between the ranks inverted. This
difference is called the weight of the inversion.
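The inversion-based expressions can be tried on a small ranking. This sketch assumes the items are listed in the natural order 1, ..., n of the first ranking, so that the input is simply the sequence of second ranks. The function names are hypothetical. The last function uses the familiar squared-difference form of Spearman's ρ, which is not stated in the talk, as a cross-check.

```python
def inversion_scores(second_ranking):
    """Return (Q, V) for second ranks listed in the natural order of the first ranking."""
    n = len(second_ranking)
    q = 0
    v = 0
    for p in range(n):
        for s in range(p + 1, n):
            earlier, later = second_ranking[p], second_ranking[s]
            if earlier > later:
                # The larger rank precedes the smaller one: an inversion (d_ij = 1).
                q += 1
                v += earlier - later  # weight of the inversion
    return q, v


def tau_from_inversions(second_ranking):
    n = len(second_ranking)
    q, _ = inversion_scores(second_ranking)
    return 1 - 4 * q / (n * (n - 1))


def rho_from_inversions(second_ranking):
    n = len(second_ranking)
    _, v = inversion_scores(second_ranking)
    return 1 - 12 * v / (n * (n * n - 1))


def rho_from_squared_differences(second_ranking):
    """Cross-check: 1 - 6 * sum(d^2) / (n (n^2 - 1)), with d the rank differences."""
    n = len(second_ranking)
    d2 = sum((rank - (k + 1)) ** 2 for k, rank in enumerate(second_ranking))
    return 1 - 6 * d2 / (n * (n * n - 1))


example = [2, 1, 4, 3, 5]  # illustrative ranking, not from the talk
print(inversion_scores(example))
print(tau_from_inversions(example), rho_from_inversions(example))
```

Here the two inversions (2 before 1, and 4 before 3) each have weight 1, so Q = 2 and V = 2.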
14. An interesting result
We simulated large samples from a bivariate normal
distribution and plotted the mean values of Spearman's ρ and
Kendall's τ against Pearson's r. We obtained the following
graph.
15. The graph
16. Relation of τ and ρ with r for BVN
In 1907, Pearson, in his book ["On Further Methods of
Determining Correlation", Karl Pearson, Biometric Series IV
(1907)], established the following relation between
Spearman's ρ and his r for the bivariate normal distribution:
\[ r = 2 \sin\left( \frac{\pi}{6} \rho \right) \tag{5} \]
Cramér, in 1946, also established a relation between Kendall's
τ and Pearson's r for the bivariate normal:
\[ r = \sin\left( \frac{\pi}{2} \tau \right) \tag{6} \]
Relation (6) for Kendall's τ can be shown to hold for any
elliptical distribution. Relation (5) for Spearman's ρ relies
on normality and need not hold for other elliptical
distributions.
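Relations (5) and (6) can be checked roughly by simulation. The sketch below is an illustration and not the simulation used for the talk: the sample size, the seed, the grid of r values and all function names are assumptions. It draws one bivariate normal sample per r and converts the sample τ and ρ back to r through (6) and (5).

```python
import math
import random


def simulate_bvn(n, r, rng):
    """Draw n pairs from a standard bivariate normal with correlation r."""
    root = math.sqrt(1.0 - r * r)
    pairs = []
    for _ in range(n):
        v, w = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((v * root + w * r, w))
    return pairs


def ranks(values):
    """Ranks 1..n obtained by sorting; ties have probability zero here."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position + 1
    return result


def kendall_tau(pairs):
    n = len(pairs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (product > 0) - (product < 0)
    return 2.0 * score / (n * (n - 1))


def spearman_rho(pairs):
    n = len(pairs)
    rank_x = ranks([p[0] for p in pairs])
    rank_y = ranks([p[1] for p in pairs])
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def compare(r, n=500, seed=2011):
    """Return (tau, rho, r implied by (6), r implied by (5)) for one simulated sample."""
    pairs = simulate_bvn(n, r, random.Random(seed))
    tau, rho = kendall_tau(pairs), spearman_rho(pairs)
    return tau, rho, math.sin(math.pi * tau / 2.0), 2.0 * math.sin(math.pi * rho / 6.0)


for r in (0.0, 0.3, 0.6, 0.9):
    tau, rho, r_from_tau, r_from_rho = compare(r)
    print(f"r={r:.1f}  tau={tau:.3f}  rho={rho:.3f}  "
          f"sin(pi*tau/2)={r_from_tau:.3f}  2sin(pi*rho/6)={r_from_rho:.3f}")
```

With a single sample of moderate size the recovered values only approximate r; averaging over many samples, as described in the talk, would tighten the agreement.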
19. Relation between Kendall’s τ and r for bivariate
normal
Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample drawn from
BVN(0,0,1,1,r). Then Kendall's τ computed from the data is
an unbiased estimator of
\[ 2P((X_1 - X_2)(Y_1 - Y_2) > 0) - 1 = 2P(Z_1 Z_2 > 0) - 1 \tag{7} \]
where $(Z_1, Z_2) \sim$ BVN(0, 0, 2, 2, r), that is, zero means,
variances 2 and correlation r (so that the covariance is 2r).
Note that $(Z_1, Z_2) \overset{d}{=} \sqrt{2}\,(V\sqrt{1 - r^{2}} + Wr,\; W)$, where V and W
are independent standard normal variables. Since $(Z_1, Z_2)$ is
symmetric about (0, 0), the quantity in (7) equals
\[ 4P(Z_1 > 0, Z_2 > 0) - 1 = 4P(V\sqrt{1 - r^{2}} + Wr > 0,\; W > 0) - 1 \tag{8} \]
Use the polar transformation on (V, W) to evaluate this
probability; the expression in (8) then equals $\frac{2}{\pi} \sin^{-1} r$.
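The polar step can be checked numerically. For independent standard normal V and W the polar angle is uniform on [0, 2π) and independent of the radius, and the event in (8) depends on the angle only. The sketch below, with a hypothetical function name and an arbitrary grid size, measures the fraction of angles satisfying both inequalities and compares the result with (2/π) sin⁻¹ r.

```python
import math


def orthant_expression(r, steps=200_000):
    """Evaluate 4 P(V sqrt(1 - r^2) + W r > 0, W > 0) - 1 by scanning the polar angle."""
    root = math.sqrt(1.0 - r * r)
    hits = 0
    for k in range(steps):
        theta = 2.0 * math.pi * (k + 0.5) / steps  # midpoint of the k-th angular cell
        v, w = math.cos(theta), math.sin(theta)    # the radius cancels out of both inequalities
        if w > 0 and v * root + w * r > 0:
            hits += 1
    return 4.0 * hits / steps - 1.0


for r in (-0.8, 0.0, 0.5, 0.9):
    print(r, orthant_expression(r), 2.0 / math.pi * math.asin(r))
```

Writing r = sin α, the two inequalities hold for angles in (0, α + π/2), an arc of length π/2 + α, which gives 4(π/2 + α)/(2π) − 1 = 2α/π.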
21. Relation between Spearman’s ρ and r for bivariate
normal
We now sketch a proof of the relationship between Pearson's r
and Spearman's ρ for the bivariate normal distribution.
Let $R(X_i)$ and $R(Y_i)$ be the ranks of $X_i$ and $Y_i$. Define
$H(t) = I_{\{t > 0\}}$. Then, observe that
\[ R(X_i) = \sum_{j=1}^{n} H(X_i - X_j) + 1 \tag{9} \]
Note that Spearman's ρ is Pearson's correlation coefficient
between $R(X_i)$ and $R(Y_i)$, which is
\[ \frac{h - \frac{1}{4} n(n-1)^{2}}{\frac{1}{12} n(n^{2}-1)} \]
where $h = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} H(X_i - X_j) H(Y_i - Y_k)$.
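The representation of ρ through h can be verified on a small data set. In this sketch the function names and the data are illustrative assumptions, and ties are assumed absent. It computes h by the triple sum and compares the resulting ratio with the Pearson correlation of the ranks obtained from (9).

```python
import math


def H(t):
    """Indicator of t > 0."""
    return 1 if t > 0 else 0


def triple_sum_h(x, y):
    n = len(x)
    return sum(
        H(x[i] - x[j]) * H(y[i] - y[k])
        for i in range(n) for j in range(n) for k in range(n)
    )


def rho_from_h(x, y):
    n = len(x)
    h = triple_sum_h(x, y)
    return (h - n * (n - 1) ** 2 / 4) / (n * (n * n - 1) / 12)


def ranks_from_indicator(values):
    """Ranks via equation (9): R(X_i) = sum_j H(X_i - X_j) + 1."""
    return [sum(H(u - v) for v in values) + 1 for u in values]


def rho_from_ranks(x, y):
    """Pearson correlation of the two rank vectors."""
    rx, ry = ranks_from_indicator(x), ranks_from_indicator(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(triple_sum_h(x, y), rho_from_h(x, y), rho_from_ranks(x, y))
```

For this data h = 28, so the ratio is (28 − 20)/10 = 0.8, matching the rank-based computation.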
22. Proof continued
Case 1
If i, j, k are distinct, then $(X_i - X_j, Y_i - Y_k)$ is distributed as
BVN(0, 0, 2, 2, r/2), where the last parameter is the correlation.
$E\{H(X_i - X_j) H(Y_i - Y_k)\}$ then reduces to the integral of the
probability density over the positive quadrant.
Following the same technique as in the case of τ, we can check
that this integral is $\frac{1}{2}\left(1 - \frac{1}{\pi} \cos^{-1} \frac{r}{2}\right)$.
23. Proof continued
Case 2
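If $i \neq j = k$, then $(X_i - X_j, Y_i - Y_k)$ is distributed as
BVN(0, 0, 2, 2, r) and the above expectation reduces to
$\frac{1}{2}\left(1 - \frac{1}{\pi} \cos^{-1} r\right)$. Terms with j = i or k = i vanish
because H(0) = 0. Then,
\[ E\left[ \frac{h - \frac{1}{4} n(n-1)^{2}}{\frac{1}{12} n(n^{2}-1)} \right] = \frac{6}{\pi} \left[ \frac{n-2}{n+1} \sin^{-1} \frac{r}{2} + \frac{1}{n+1} \sin^{-1} r \right] \tag{10} \]
As n goes to infinity, the R.H.S. reduces to $\frac{6}{\pi} \sin^{-1} \frac{r}{2}$.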
24. Reason for approximate linear relationship between
Spearman’s ρ and Pearson’s r for BVN
As observed from the graph, Spearman's ρ for the bivariate
normal is almost linearly related to Pearson's r. This may
be attributed to the fact that
\[
\begin{aligned}
\rho &= \frac{6}{\pi} \sin^{-1} \frac{r}{2} \\
     &= \frac{6}{\pi} \left( \frac{r}{2} + \frac{1}{6} \cdot \frac{r^{3}}{8} + \ldots \right) \\
     &= \frac{3}{\pi} r + \text{terms very small compared to the first-order term} \\
     &\approx \frac{3}{\pi} r
\end{aligned}
\]
For Kendall's τ, using a similar expansion, we can also show
that τ is a convex function of r on the interval [0, 1].
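The size of the neglected terms can be gauged numerically. This sketch, with hypothetical function names and an arbitrary grid of 101 points on [0, 1], compares (6/π) sin⁻¹(r/2) with its linear approximation 3r/π and checks the convexity of τ = (2/π) sin⁻¹ r through second differences.

```python
import math


def rho_bvn(r):
    """Spearman's rho as a function of r for the bivariate normal."""
    return 6.0 / math.pi * math.asin(r / 2.0)


def rho_linear(r):
    """First-order term of the expansion."""
    return 3.0 * r / math.pi


def tau_bvn(r):
    """Kendall's tau as a function of r for the bivariate normal."""
    return 2.0 / math.pi * math.asin(r)


grid = [k / 100 for k in range(101)]

# Largest gap between rho and its linear approximation on [0, 1].
max_gap = max(abs(rho_bvn(r) - rho_linear(r)) for r in grid)

# Non-negative second differences indicate convexity of tau on the grid.
second_differences = [
    tau_bvn(grid[k - 1]) - 2.0 * tau_bvn(grid[k]) + tau_bvn(grid[k + 1])
    for k in range(1, 100)
]

print(max_gap, min(second_differences))
```

The gap is largest at r = 1, where ρ = 1 and 3/π ≈ 0.955, so the linear approximation is off by less than 0.05 over the whole interval.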
25. Kendall’s comparative assessment of τ and ρ
Kendall in his paper admitted that ρ can take $\frac{n^{3}-n}{6}$ values
between −1 and +1, whereas τ can take only $\frac{n^{2}-n}{2}$ values in
the range, but according to him, this does not seriously affect
the sensitivity of τ.
Both Kendall’s τ and Spearman’s ρ computed from the
sample have asymptotically normal distributions.
But Kendall showed using simulation experiments that the
distribution for his correlation coefficient is surprisingly close
to normal even for small values of n, which is not the case for
Spearman’s correlation.
26. Bias properties of Kendall’s τ and Spearman’s ρ
Consider a finite population. Let $\rho_{\mathrm{pop}}$ and $\tau_{\mathrm{pop}}$ be Spearman's
and Kendall's rank correlation coefficients computed from the
entire population.
Suppose that we have a simple random sample without
replacement from that population, and we compute
Spearman's ρ and Kendall's τ from the sample.
Then the sample τ is an unbiased estimator of $\tau_{\mathrm{pop}}$, but the
sample ρ is a biased estimator of $\rho_{\mathrm{pop}}$.
If the population size N tends to infinity, the expected value of
the sample ρ goes to $\frac{1}{n+1} \{ 3\tau_{\mathrm{pop}} + (n-2)\rho_{\mathrm{pop}} \}$, where n is the
size of the sample.
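For a very small population the bias statement can be checked exactly by enumerating every sample. The sketch below is an illustration under stated assumptions: the four-unit population, the sample size 3 and the function names are all hypothetical, and the population is given as the Y ranks listed in increasing order of X, without ties.

```python
from itertools import combinations


def tau(y):
    """Kendall's tau for y values listed in increasing order of x (no ties)."""
    n = len(y)
    score = sum(1 if y[j] > y[i] else -1 for i in range(n) for j in range(i + 1, n))
    return 2 * score / (n * (n - 1))


def rho(y):
    """Spearman's rho for y values listed in increasing order of x (no ties)."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    rank = [0] * n
    for position, index in enumerate(order):
        rank[index] = position + 1
    d2 = sum((rank[i] - (i + 1)) ** 2 for i in range(n))
    return 1 - 6 * d2 / (n * (n * n - 1))


def mean_over_samples(statistic, population, n):
    """Average a statistic over all simple random samples of size n without replacement."""
    values = [
        statistic([population[i] for i in subset])
        for subset in combinations(range(len(population)), n)
    ]
    return sum(values) / len(values)


population = [2, 1, 4, 3]  # hypothetical population of size N = 4
print(tau(population), mean_over_samples(tau, population, 3))
print(rho(population), mean_over_samples(rho, population, 3))
```

In this example the average of the sample τ over the four possible samples equals the population value 1/3, whereas the average of the sample ρ is 0.5 against a population value of 0.6.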
27. Small sample distribution of τ, ρ and r
It is well known that for a simple random sample of size n
drawn from a bivariate normal distribution, under the
assumption of zero correlation, Pearson's r satisfies
\[ \frac{r \sqrt{n-2}}{\sqrt{1 - r^{2}}} \sim t_{n-2} \tag{11} \]
But the distribution of r for small samples from a normal
distribution with non-zero correlation, and from non-normal
distributions, is not tractable.
τ and ρ are distribution-free statistics in the sense that their
distributions do not depend on the distribution of the data so
long as X and Y are independent. Consequently, their
distributions under the hypothesis of independence of X and
Y can be tabulated.
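Such a table can be produced by brute force for small n. Under independence and in the absence of ties, every permutation of the second ranking is equally likely, so the null distribution of τ follows from counting inversions over all n! permutations. This is a sketch with a hypothetical function name, practical only for small n.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def null_distribution_tau(n):
    """Exact distribution of Kendall's tau under independence (no ties), as {tau: probability}."""
    counts = Counter()
    for perm in permutations(range(1, n + 1)):
        q = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        # tau = 1 - 4Q / (n(n - 1)), kept as an exact fraction
        counts[Fraction(n * (n - 1) - 4 * q, n * (n - 1))] += 1
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}


for t, p in null_distribution_tau(4).items():
    print(f"tau = {str(t):>5}   probability = {p}")
```

For n = 4 the 24 permutations split over the seven possible values of τ with frequencies 1, 3, 5, 6, 5, 3, 1.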
28. Asymptotic normality of r , ρ and τ
Note that each of Pearson's r, Spearman's ρ and Kendall's τ
computed from bivariate data is asymptotically normally
distributed.
Asymptotic normality of Pearson’s r can be derived using
Central Limit Theorem applied to various bivariate sample
moments.
Asymptotic normality of Spearman’s ρ follows from
asymptotic normality of linear rank statistics.
Asymptotic normality of Kendall’s τ follows from asymptotic
normality of U-statistics.
30. A practical application of rank correlation
Recently, the Ministry of Human Resource Development
(MHRD) considered giving weightage to the marks scored in
the 10+2 Board exams for admission to engineering colleges
in India.
The raw scores across the Boards are not comparable, so the
Ministry sought help in this regard from the Indian
Statistical Institute.
The Indian Statistical Institute recommended the use of
percentile ranks of students based on their aggregate scores.
32. The Data
The Indian Statistical Institute was provided data from 4 boards
(namely, ICSE, CBSE, West Bengal Board and
Tamil Nadu Board) for two consecutive years, 2008 and 2009.
Though the recommendation from the Indian Statistical Institute
was to use the aggregate score of a student for computing the
percentile rank of the student (and that recommendation was
favorably accepted by MHRD), a statistically interesting
question is what happens if we consider the various subject
scores separately instead of the aggregate score.
We intend to investigate this issue under some appropriate
assumptions.
35. The Model
For convenience, let us consider only two subjects namely
Mathematics and Physics.
Let us denote the observed scores of a student in Mathematics
and Physics by $X_M$ and $X_P$. Assume the existence of
unobserved merit variables $W_M$ and $W_P$ such that the scores
in the two subjects are related as
\[ X_M \approx g_M(W_M), \qquad X_P \approx g_P(W_P) \tag{12} \]
$W_M$ and $W_P$ may be treated as attributes of the student
which depend on the knowledge and understanding of Maths
and Physics respectively, and also on other factors like
schooling, intelligence etc.
$g_M$ and $g_P$ relate to the examination procedure corresponding
to the two subjects. They may vary across the boards.
38. Formulation of the model
Two students may obtain different scores in Mathematics and
Physics because of the difference in their merit variables $W_M$
and $W_P$, or due to the difference in the examination procedures
$g_M$ and $g_P$ across the boards.
It is time that we lay down our assumptions about $W_M$, $W_P$,
$g_M$ and $g_P$.
39. Assumptions of the model
Assumption 1
The functions $g_P$ and $g_M$ are monotonically increasing. This
implies the scores of the students are expected to increase
from less meritorious to more meritorious students for each of
the two subjects.
Assumption 2
The joint distribution of $(W_P, W_M)$ for the students is the
same in different boards.
40. How Assumptions can be checked
Imagine a common test in Mathematics and Physics taken by
students of all the boards.
Mathematics score in the common test would be a monotone
function of the Mathematics score in the board examination,
as both are monotone functions of the same merit variable.
(The same holds for Physics scores).
This can be tested by using Spearman’s ρ and Kendall’s τ
statistics.
Mathematics and Physics scores in the common test would
have the same distribution in the subpopulations
corresponding to different boards.
This can be tested using any non-parametric test for equality
of bivariate distributions.
41. Is there a way to check the validity of these
assumptions using currently available data?
42. How assumptions can be checked without a
common test
According to Assumption 2, the dependence between merits
in Physics and Mathematics should be similar in all the
boards.
Rank correlation between Physics and Mathematics scores in
a particular board should not depend on the board-specific
monotone functions $g_M$ and $g_P$.
Therefore, the rank correlation between Physics and Mathematics
scores should be the same across the boards.
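The invariance behind this argument is easy to illustrate. In the sketch below everything is hypothetical: the simulated merit variables, the two boards and their increasing scoring functions are invented for illustration and are not the talk's data. The same merit values are scored by two different pairs of increasing functions, and Kendall's τ between the subject scores is unchanged.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau from concordant minus discordant pairs (no ties assumed)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return 2.0 * score / (n * (n - 1))


rng = random.Random(0)

# Hypothetical merit variables W_M and W_P, correlated through a shared component.
merit_m, merit_p = [], []
for _ in range(200):
    common = rng.gauss(0.0, 1.0)
    merit_m.append(common + rng.gauss(0.0, 1.0))
    merit_p.append(common + rng.gauss(0.0, 1.0))

# Board A: hypothetical linear (increasing) scoring functions g_M and g_P.
maths_a = [50.0 + 10.0 * w for w in merit_m]
physics_a = [60.0 + 8.0 * w for w in merit_p]

# Board B: hypothetical nonlinear increasing scoring functions.
maths_b = [100.0 / (1.0 + math.exp(-w)) for w in merit_m]
physics_b = [math.exp(0.5 * w) for w in merit_p]

print(kendall_tau(merit_m, merit_p))
print(kendall_tau(maths_a, physics_a))
print(kendall_tau(maths_b, physics_b))
```

Because increasing functions preserve the order of the merit values, all three printed values agree. A difference in rank correlation between boards therefore cannot be explained by the scoring functions alone.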
43. Rank correlation between Physics & Maths for
different boards and years
44. Rank correlation Physics & Chemistry
Figure: Rank correlation between Physics and Chemistry marks over
years
45. bar chart of rank correlation Chemistry & Maths
Figure: Rank correlation between Chemistry and Maths marks over years
46. Subject percentile graph WBHS 2008
47. Variation of a subject across a board same year
48. Inference from the data analysis
Between-board variation is significantly higher than
within-board variation across the two years.
Visibly, there is high correlation in the Tamil Nadu Board,
whereas low correlation is observed in the CBSE Board.
If we interpret the data available as a large sample from a
larger hypothetical population, the rank correlation computed
for a board in a particular year will have an approximately
normal distribution.
So, we can use these rank correlation values to carry out an
ANOVA-type statistical analysis to see whether the values
differ significantly across different boards and across
different years. When this is done, the rank correlation
appears to differ significantly across boards.
This essentially implies a breakdown of Assumption 2.
Study of the rank correlation brings out this fact even without
scores of a common test.
53. Acknowledgement
I would like to express my gratitude towards my mentors for this
project, Prof. Probal Chaudhuri and Prof. Debasis Sengupta,
for their immense co-operation. I would also like to thank all those
who have been associated with this work in some way or the other.
54. Thank You