A short overview of distance and statistical distance, which are at the core of multivariate analysis. Here you will find some simple ideas about distances and statistical distance.
2. Md. Menhazul Abedin
M.Sc. Student
Dept. of Statistics
Rajshahi University
Mob: 01751385142
Email: menhaz70@gmail.com
3. Objectives
• To know the meaning of statistical distance
and its relation to, and difference from,
ordinary (Euclidean) distance
4. Content
Definition of Euclidean distance
Concept & intuition of statistical distance
Definition of Statistical distance
Necessity of statistical distance
Concept of Mahalanobis distance (population
& sample)
Distribution of Mahalanobis distance
Mahalanobis distance in R
Acknowledgement
9. We see two specific points in each picture.
Our problem is to determine the distance between
the two points.
But how?
Assume that the pictures lie in two-dimensional
space and that the points are joined by a
straight line.
10. Let the 1st point be (x1, y1) and the 2nd point be (x2, y2).
Then the distance is
d = √((x1 − x2)² + (y1 − y2)²)
What happens when the dimension is three?
12. Distance is given by
• Points are (x1, x2, x3) and (y1, y2, y3):
√((x1 − y1)² + (x2 − y2)² + (x3 − y3)²)
13. For n dimension it can be written
as the following expression and
named as Euclidian distance
P = (x1, x2, …, xp), Q = (y1, y2, …, yp)
d(P, Q) = √((x1 − y1)² + (x2 − y2)² + … + (xp − yp)²)
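The Euclidean distance formula above can be sketched directly in R; this is a minimal illustration, not from the slides, using a toy pair of points:

```r
## Euclidean distance between two p-dimensional points
euclid <- function(P, Q) sqrt(sum((P - Q)^2))

P <- c(1, 2, 3)
Q <- c(4, 6, 3)
euclid(P, Q)          ## 5
## built-in check: dist() on the two stacked points gives the same value
dist(rbind(P, Q))
```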
14. 12/12/2016
Properties of Euclidean Distance and
Mathematical Distance
• Usual human concept of distance is Eucl. Dist.
• Each coordinate contributes equally to the distance
P = (x1, x2, …, xp), Q = (y1, y2, …, yp)
d(P, Q) = √((x1 − y1)² + (x2 − y2)² + … + (xp − yp)²)
Mathematicians, generalizing its three properties,
define distance on any set:
1) d(P,Q) = d(Q,P);
2) d(P,Q) = 0 if and only if P = Q; and
3) d(P,Q) ≤ d(P,R) + d(R,Q) for all R.
17. • The Manhattan distance is the simple sum of
the horizontal and vertical components,
whereas
the diagonal distance can be computed by
applying the Pythagorean theorem.
19. • Manhattan distance: 12 units
• Diagonal (straight-line, Euclidean) distance:
√(6² + 6²) = 6√2
We observe that the Euclidean distance is less than
the Manhattan distance.
23. Relationship between Manhattan &
Euclidean distance.
• It now seems that the distance from A to C is 7 blocks,
while the distance from A to B is 6 blocks.
• Unless we choose to go off-road, B is now closer to A
than C.
• Taxicab distance is sometimes equal to Euclidean
distance, but otherwise it is greater than Euclidean
distance.
Euclidean distance ≤ Taxicab distance
Is it always true?
Does it hold for n dimensions?
27. For high dimensions
• The inequality holds in the high-dimensional case as well:
Σᵢ (xᵢ − yᵢ)² ≤ Σᵢ |xᵢ − yᵢ|² + 2 Σᵢ<ⱼ |xᵢ − yᵢ||xⱼ − yⱼ| = (Σᵢ |xᵢ − yᵢ|)²
which implies
√(Σᵢ (xᵢ − yᵢ)²) ≤ Σᵢ |xᵢ − yᵢ|, i.e.
d_E ≤ d_T
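The inequality d_E ≤ d_T can be checked numerically in R; this is a small sketch with simulated points (the random data are illustrative, not from the slides):

```r
## Numerical check that Euclidean distance never exceeds taxicab distance
set.seed(1)
x <- rnorm(10)                   ## a random 10-dimensional point
y <- rnorm(10)                   ## another random point
dE <- sqrt(sum((x - y)^2))       ## Euclidean distance
dT <- sum(abs(x - y))            ## Manhattan / taxicab distance
dE <= dT                         ## TRUE
```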
28. Statistical Distance
• Weight coordinates subject to a great deal of
variability less heavily than those that are not
highly variable.
[Figure: two points at the same Euclidean distance from the origin; which is nearer to the data set if it were a point?]
29. • Here
variability along the x1 axis > variability along the x2 axis.
Is the same distance from the origin meaningful?
Ans: No.
But how do we take the different variabilities
into account?
Ans: Give different weights to the axes.
30. Statistical Distance for Uncorrelated Data
Standardization (weights): x1* = x1/√s11, x2* = x2/√s22
P = (x1, x2), O = (0, 0)
d(O, P) = √((x1*)² + (x2*)²) = √(x1²/s11 + x2²/s22)
31. All points that have coordinates (x1, x2) and
are a constant squared distance c² from the
origin must satisfy
x1²/s11 + x2²/s22 = c²
But how to choose c? That is a problem.
Choose c so that 95% of the observations fall in this area.
s11 > s22  =>  1/s11 < 1/s22
33. • This expression can be generalized: the
statistical distance from an arbitrary point
P = (x1, x2) to any fixed point Q = (y1, y2) is
d(P, Q) = √((x1 − y1)²/s11 + (x2 − y2)²/s22)
For p dimensions:
d(P, Q) = √((x1 − y1)²/s11 + (x2 − y2)²/s22 + … + (xp − yp)²/spp)
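The weighted (statistical) distance for uncorrelated data can be sketched in R; the variance values below are hypothetical, chosen only to illustrate the formula:

```r
## Statistical distance with variance weights (uncorrelated case):
## each squared coordinate difference is divided by its variance s_ii
stat_dist <- function(P, Q, s) sqrt(sum((P - Q)^2 / s))

s <- c(4, 1)                         ## hypothetical variances s11, s22
stat_dist(c(2, 1), c(0, 0), s)       ## sqrt(2^2/4 + 1^2/1) = sqrt(2)
## with equal variances it reduces to a scaled Euclidean distance
stat_dist(c(2, 1), c(0, 0), c(1, 1)) ## sqrt(5)
```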
34. Remarks:
1) The distance from P to the origin O is
obtained by setting all yᵢ = 0.
2) If all sᵢᵢ are equal, the Euclidean
distance formula is appropriate.
36. • How do you measure the statistical distance of
the above data set?
• Ans: First make it uncorrelated.
• But why, and how?
• Ans: Rotate the axes, keeping the origin fixed.
41. Choice of θ
Which θ will you choose?
How will you do it?
Data matrix → centered data matrix → covariance of
data matrix → eigenvectors
θ = angle between the 1st eigenvector and [1, 0]
or
the angle between the 2nd eigenvector and [0, 1]
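The recipe above (center → covariance → eigenvectors → angle) can be sketched in R; the correlated data here are simulated for illustration and are not the slides' data set:

```r
## Sketch of the theta recipe: find the rotation angle from the
## first eigenvector of the covariance matrix
set.seed(2)
x1 <- rnorm(100)
x2 <- 0.8 * x1 + rnorm(100, sd = 0.3)     ## correlated with x1
dat  <- cbind(x1, x2)
cdat <- scale(dat, center = TRUE, scale = FALSE)  ## centered data matrix
e1 <- eigen(cov(cdat))$vectors[, 1]               ## 1st eigenvector
theta <- atan2(e1[2], e1[1])                      ## angle with [1, 0]
theta * 180 / pi                                  ## rotation angle in degrees
```

Rotating the centered data by the eigenvector matrix then gives uncorrelated coordinates, which is what the later slides exploit.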
42. Why is θ the angle between the 1st eigenvector and
[1, 0], or between the 2nd eigenvector and [0, 1]?
Ans: Let B be a (p × p) positive definite matrix
with eigenvalues λ1 ≥ λ2 ≥ … ≥ λp > 0
and associated normalized eigenvectors
e1, e2, …, ep. Then
max_{x≠0} (x′Bx / x′x) = λ1, attained when x = e1
min_{x≠0} (x′Bx / x′x) = λp, attained when x = ep
43. max_{x ⊥ e1,…,ek} (x′Bx / x′x) = λ_{k+1}, attained when
x = e_{k+1}, k = 1, 2, …, p − 1.
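The maximization lemma can be checked numerically in R; this is a sketch with one small hand-picked positive definite matrix, not a proof:

```r
## Numerical check: the Rayleigh quotient x'Bx / x'x is maximized
## at lambda_1 (x = e_1) and minimized at lambda_p (x = e_p)
B  <- matrix(c(4, 1, 1, 2), 2, 2)   ## symmetric positive definite
ev <- eigen(B)
rayleigh <- function(x) drop(t(x) %*% B %*% x) / drop(t(x) %*% x)

rayleigh(ev$vectors[, 1])           ## equals ev$values[1] (the max)
rayleigh(ev$vectors[, 2])           ## equals ev$values[2] (the min)
rayleigh(c(1, 1)) <= ev$values[1]   ## TRUE: any other x lies in between
```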
44. Choice of θ
#### Exercise 16, page 309: Heights in inches (x) &
Weights in pounds (y). An Introduction to Statistics
and Probability, M. Nurul Islam #######
x=c(60,60,60,60,62,62,62,64,64,64,66,66,66,66,68,68,68,70,70,70);x
y=c(115,120,130,125,130,140,120,135,130,145,135,170,140,155,150,160,175,180,160,175);y
############
data=data.frame(x,y)
cdata=scale(data,center=TRUE,scale=FALSE) ## centered data matrix
V=eigen(cov(cdata))$vectors;V             ## eigenvectors
as.matrix(cdata)%*%V                      ## rotated (uncorrelated) data
plot(x,y)
49. ######## comparison of both methods ########
comparison=tdata-as.matrix(cbind(xx,yy));comparison
round(comparison,4)
50. ###### using package: md from original data ######
md=mahalanobis(data,colMeans(data),cov(data),inverted=F);md
## md = Mahalanobis distance
###### Mahalanobis distance from transformed data ######
tmd=mahalanobis(tdata,colMeans(tdata),cov(tdata),inverted=F);tmd
###### comparison ######
md-tmd
51. Mahalanobis distance: manually
mu=colMeans(tdata);mu
incov=solve(cov(tdata));incov ## inverse covariance matrix
md1=t(tdata[1,]-mu)%*%incov%*%(tdata[1,]-mu);md1
md2=t(tdata[2,]-mu)%*%incov%*%(tdata[2,]-mu);md2
md3=t(tdata[3,]-mu)%*%incov%*%(tdata[3,]-mu);md3
.............
md20=t(tdata[20,]-mu)%*%incov%*%(tdata[20,]-mu);md20
The md values from the package and from the manual computation are equal.
59. • The above distances are completely
determined by the coefficients (weights)
a_ik; i, k = 1, 2, …, p. These can be
arranged in a rectangular array (matrix);
this matrix must be symmetric and positive
definite.
60. Why positive definite?
Let A be a positive definite matrix. Then A = C′C
for some matrix C, so
x′Ax = x′C′Cx = (Cx)′(Cx) = y′y ≥ 0,
which obeys all the distance properties.
Hence x′Ax defines a distance, and
different choices of A give different distances.
61. • Why a positive definite matrix?
• Ans: Spectral decomposition: the spectral
decomposition of a k×k symmetric matrix
A is given by
A = Σᵢ λᵢ eᵢ eᵢ′
• where (λᵢ, eᵢ); i = 1, 2, …, k are the pairs of
eigenvalues and eigenvectors, with λ1 ≥ λ2 ≥ λ3 ≥ …
If A is positive definite, every λᵢ > 0
and A is invertible.
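The spectral decomposition can be verified in R; this sketch reconstructs a small hand-picked symmetric matrix from its eigenvalue/eigenvector pairs:

```r
## Reconstruct a symmetric matrix from its spectral decomposition
## A = sum_i lambda_i e_i e_i'
A  <- matrix(c(2, 1, 1, 3), 2, 2)
ev <- eigen(A)
A_rec <- ev$values[1] * ev$vectors[, 1] %*% t(ev$vectors[, 1]) +
         ev$values[2] * ev$vectors[, 2] %*% t(ev$vectors[, 2])
round(A - A_rec, 10)   ## zero matrix: the decomposition recovers A
all(ev$values > 0)     ## TRUE: A is positive definite, hence invertible
```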
66. • Consider the Euclidean distances from the
point Q to the point P and to the origin O.
• Obviously d(Q,P) > d(Q,O).
But P appears to be more like the points in
the cluster than does the origin.
If we take into account the variability of the
points in the cluster and measure distance by
statistical distance, then Q will be closer to P
than to O.
67. Mahalanobis distance
• The Mahalanobis distance is a descriptive
statistic that provides a relative measure of a
data point's distance from a common point. It
is a unitless measure introduced by P. C.
Mahalanobis in 1936
68. Intuition of Mahalanobis Distance
• Recall the equation
d(O,P) = √(x′Ax)  =>  d²(O,P) = x′Ax
where x = (x1, x2)′ and
A = [a11 a12; a21 a22].
71. Mahalanobis Distance
• Mahalanobis used the inverse of the covariance
matrix Σ instead of A.
• Thus d²(O,P) = x′Σ⁻¹x ……………..(1)
• And used μ (the center of gravity) instead of y:
d²(P,Q) = (x − μ)′Σ⁻¹(x − μ) ………..(2)
72. Mahalanobis Distance
• The above equations are nothing but the
Mahalanobis distance.
• For example, suppose we took a single
observation from a bivariate population with
variables X and Y, and that our two
variables had the following characteristics:
73. • Single observation: X = 410 and Y = 400.
The Mahalanobis distance for that single
observation is computed as:
75. • Therefore, our single observation would have
a distance of 1.825 standardized units from
the mean (the mean is at X = 500, Y = 500).
• If we took many such observations, graphed
them, and colored them according to their
Mahalanobis values, we would see elliptical
Mahalanobis regions emerge.
76. • The points are actually distributed along two
primary axes:
77. [Figure: scatter of points along two primary axes]
78. If we calculate Mahalanobis distances for each
of these points and shade them according to
their distance value, we see clear elliptical
patterns emerge:
79. [Figure: points shaded by Mahalanobis distance, showing elliptical patterns]
80. • We can also draw actual ellipses at regions of
constant Mahalanobis values:
68% of obs, 95% of obs, 99.7% of obs.
81. • Which ellipse do you choose?
Ans: Use the 68-95-99.7 rule:
1) about two-thirds (68%) of the points should
be within 1 unit of the origin (along each axis);
2) about 95% should be within 2 units;
3) about 99.7% should be within 3 units.
83. Sample Mahalanobis Distance
• The sample Mahalanobis distance is obtained by
replacing Σ by S and μ by X̄,
i.e. (X − X̄)′S⁻¹(X − X̄)
84. For a sample:
(X − X̄)′S⁻¹(X − X̄) ≤ χ²p(α)
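The chi-square cutoff above can be illustrated in R; this sketch simulates a bivariate normal sample and checks what fraction of sample Mahalanobis distances fall inside the 95% region:

```r
## Fraction of squared sample Mahalanobis distances within the
## chi-square(0.95, p = 2) cutoff; should be close to 0.95
set.seed(3)
X  <- matrix(rnorm(1000 * 2), ncol = 2)   ## bivariate normal sample
d2 <- mahalanobis(X, colMeans(X), cov(X)) ## squared distances (X-Xbar)'S^-1(X-Xbar)
mean(d2 <= qchisq(0.95, df = 2))          ## close to 0.95
```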
85. Distribution of Mahalanobis distance
Let X1, X2, …, Xn be independent
observations from
any population with
mean μ and finite (nonsingular) covariance Σ.
Then
√n (X̄ − μ) is approximately Np(0, Σ)
and
n (X̄ − μ)′S⁻¹(X̄ − μ) is approximately χ²p
for n − p large.
This is nothing but the central limit theorem.
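The approximate χ²p distribution can be checked by simulation in R; this sketch deliberately uses a non-normal (exponential) population, so the chi-square behaviour comes from the central limit theorem alone:

```r
## Simulate n (Xbar - mu)' S^-1 (Xbar - mu) many times and compare
## its distribution with chi-square(p)
set.seed(4)
p <- 3; n <- 200; reps <- 2000
stat <- replicate(reps, {
  X <- matrix(rexp(n * p), ncol = p)  ## exponential population, mean mu = 1
  xbar <- colMeans(X)
  drop(n * t(xbar - 1) %*% solve(cov(X)) %*% (xbar - 1))
})
mean(stat <= qchisq(0.95, df = p))    ## close to 0.95
```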
86. Mahalanobis distance in R
• ########### Mahalanobis Distance ##########
• x=rnorm(100);x
• dm=matrix(x,nrow=20,ncol=5,byrow=F);dm ## dm = data matrix
• cm=colMeans(dm);cm ## cm = column means
• cov=cov(dm);cov ## cov = covariance matrix
• incov=solve(cov);incov ## incov = inverse of covariance matrix
87. Mahalanobis distance in R
• ####### MAHALANOBIS DISTANCE: MANUALLY ######
• @@@ Mahalanobis distance of first observation @@@
• ob1=dm[1,];ob1 ## first observation
• mv1=ob1-cm;mv1 ## deviation of first observation from center of gravity
• md1=t(mv1)%*%incov%*%mv1;md1 ## Mahalanobis distance of first observation from center of gravity
88. Mahalanobis distance in R
• @@@ Mahalanobis distance of second observation @@@
• ob2=dm[2,];ob2 ## second observation
• mv2=ob2-cm;mv2 ## deviation of second observation from center of gravity
• md2=t(mv2)%*%incov%*%mv2;md2 ## Mahalanobis distance of second observation from center of gravity
................
89. Mahalanobis distance in R
…………
@@@ Mahalanobis distance of 20th observation @@@
• ob20=dm[20,];ob20 ## 20th observation
• mv20=ob20-cm;mv20 ## deviation of 20th observation from center of gravity
• md20=t(mv20)%*%incov%*%mv20;md20 ## Mahalanobis distance of 20th observation from center of gravity
90. Mahalanobis distance in R
####### MAHALANOBIS DISTANCE: PACKAGE ########
• md=mahalanobis(dm,cm,cov,inverted=F);md ## md = Mahalanobis distance
• md=mahalanobis(dm,cm,cov);md
91. Another example
• x <- matrix(rnorm(100*3), ncol = 3)
• Sx <- cov(x)
• D2 <- mahalanobis(x, colMeans(x), Sx)