This document discusses clustering financial time series using correlation matrices. For 560 credit default swaps observed over 2500 trading days, the eigenvalue density of the empirical correlation matrix closely matches the theoretical Marchenko-Pastur distribution, indicating that most of the matrix is noise. Only 26 eigenvalues exceed the theoretical maximum; these may correspond to market and industry factors. Hierarchical clustering can reorder assets to reveal correlation patterns, and filtering the matrix according to these patterns exposes the underlying structure. Beyond correlations, copulas represent the dependence structure, and a distance is proposed that combines a dependence term (d1, computed on the cumulative distribution function transforms of the variables) with a distribution term (d0, computed on the marginal distributions), so that clustering uses full distributions rather than correlations alone. Stability tests show the proposed approach yields more stable clusters than standard correlation-based methods.
On Clustering Financial Time Series - Beyond Correlation
GAUTIER MARTI, PHILIPPE DONNAT AND FRANK NIELSEN
NOISY CORRELATION MATRICES
Let X be the matrix storing the standardized returns of N = 560 assets (credit default swaps) over a period of T = 2500 trading days.
Then, the empirical correlation matrix of the returns is
$$C = \frac{1}{T} X X^\top.$$
We can compute the empirical density of its eigenvalues
$$\rho(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda},$$
where n(λ) counts the number of eigenvalues of C less than λ.
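The two definitions above can be sketched numerically. This is a minimal illustration, not the authors' code: it assumes the returns are stored as an array with one row per asset (shape N × T), uses synthetic Gaussian data in place of the credit default swap returns, and approximates the density with a normalized histogram. The function names are hypothetical.

```python
import numpy as np

def empirical_correlation(returns):
    """Correlation matrix C = X X^T / T from raw returns of shape (N, T)."""
    # Standardize each asset (row) to zero mean and unit variance.
    X = (returns - returns.mean(axis=1, keepdims=True)) / returns.std(axis=1, keepdims=True)
    T = X.shape[1]
    return X @ X.T / T

def eigenvalue_density(C, bins=50):
    """Histogram estimate of the eigenvalue density of a symmetric matrix C."""
    eigvals = np.linalg.eigvalsh(C)  # real eigenvalues, ascending order
    density, edges = np.histogram(eigvals, bins=bins, density=True)
    return eigvals, density, edges

# Synthetic stand-in for the data (assumption: i.i.d. Gaussian returns).
rng = np.random.default_rng(0)
returns = rng.standard_normal((56, 250))
C = empirical_correlation(returns)
eigvals, density, edges = eigenvalue_density(C)
```

Because each row is standardized with the same normalization used in the product, the diagonal of C equals 1 and the eigenvalues sum to N.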
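From random matrix theory, the Marchenko-Pastur distribution gives the limit distribution as N → ∞, T → ∞ with T/N fixed. It reads:
$$\rho(\lambda) = \frac{T/N}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda},$$
where $\lambda_{\min}^{\max} = 1 + N/T \pm 2\sqrt{N/T}$, and $\lambda \in [\lambda_{\min}, \lambda_{\max}]$.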
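As a sketch of how the Marchenko-Pastur formula can be evaluated, the following code computes the two bounds and the density on a grid. The function names are hypothetical, and the density is set to zero outside the support as a convention for plotting.

```python
import numpy as np

def marchenko_pastur_bounds(N, T):
    """Return (lambda_min, lambda_max) = 1 + N/T -/+ 2 sqrt(N/T)."""
    q = N / T
    return 1 + q - 2 * np.sqrt(q), 1 + q + 2 * np.sqrt(q)

def marchenko_pastur_density(lam, N, T):
    """Marchenko-Pastur density evaluated on an array of points lam."""
    lam = np.asarray(lam, dtype=float)
    lam_min, lam_max = marchenko_pastur_bounds(N, T)
    rho = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)  # density is zero elsewhere
    x = lam[inside]
    rho[inside] = (T / N) / (2 * np.pi) * np.sqrt((lam_max - x) * (x - lam_min)) / x
    return rho

# Parameters from the poster: N = 560 assets, T = 2500 trading days.
lam_min, lam_max = marchenko_pastur_bounds(560, 2500)
grid = np.linspace(lam_min, lam_max, 200001)
rho = marchenko_pastur_density(grid, 560, 2500)
```

For N = 560 and T = 2500 the bounds come out at roughly 0.28 and 2.17, so any eigenvalue well above 2.17 lies outside the noise band.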
Figure 1: Marchenko-Pastur density vs. empirical density of the correlation matrix eigenvalues (x-axis: λ; y-axis: ρ(λ))
Notice that the Marchenko-Pastur density fits the empirical density well, meaning that most of the information contained in the empirical correlation matrix amounts to noise: only 26 eigenvalues are greater than λmax.
The highest eigenvalue corresponds to the ‘market’; the 25 others can be associated with ‘industrial sectors’.
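Counting the eigenvalues that exceed λmax can be illustrated on synthetic data. The sketch below assumes a toy one-factor model (a single hypothetical 'market' series added to independent noise), which is not the poster's dataset; it only shows the mechanics of the comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 100, 1000

# Toy model (assumption): every asset loads on one common 'market' factor.
market = rng.standard_normal(T)
returns = 0.5 * market + rng.standard_normal((N, T))

# Standardize each asset, then form the empirical correlation matrix.
X = (returns - returns.mean(axis=1, keepdims=True)) / returns.std(axis=1, keepdims=True)
C = X @ X.T / T

eigvals = np.linalg.eigvalsh(C)           # ascending order
lam_max = 1 + N / T + 2 * np.sqrt(N / T)  # Marchenko-Pastur upper edge
n_signal = int(np.sum(eigvals > lam_max))
print(f"{n_signal} eigenvalue(s) above lambda_max = {lam_max:.3f}")
```

In this toy setting the common factor produces one eigenvalue far above the upper edge, playing the role of the 'market' eigenvalue described above.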
CLUSTERING TIME SERIES
Given a correlation matrix of the returns,
Figure 2: An empirical and noisy correlation matrix
one can re-order assets using a hierarchical clustering algorithm to make the hierarchical correlation pattern apparent,
Figure 3: The same noisy correlation matrix re-ordered by a hierarchical clustering algorithm
and finally filter the noise according to the correlation pattern:
Figure 4: The resulting filtered correlation matrix
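The re-ordering and filtering steps can be sketched with SciPy's hierarchical clustering. The poster does not specify the linkage, the distance, or the filtering rule, so the choices here are assumptions: average linkage on the distance sqrt(2(1 − C)), a fixed number of flat clusters, and a filter that replaces each block of correlations by its mean. The function name `reorder_and_filter` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list, fcluster
from scipy.spatial.distance import squareform

def reorder_and_filter(C, n_clusters):
    """Re-order a correlation matrix by hierarchical clustering and
    replace each cluster block by its mean correlation (assumed filter)."""
    D = np.sqrt(np.clip(2 * (1 - C), 0, None))  # correlation distance
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    order = leaves_list(Z)                      # dendrogram leaf order
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    filtered = np.empty_like(C)
    for a in np.unique(labels):
        for b in np.unique(labels):
            idx = np.ix_(labels == a, labels == b)
            block = C[idx]
            if a == b and block.shape[0] > 1:
                n = block.shape[0]
                # Mean of off-diagonal entries within the cluster.
                mean = (block.sum() - np.trace(block)) / (n * (n - 1))
            else:
                mean = block.mean()
            filtered[idx] = mean
    np.fill_diagonal(filtered, 1.0)
    return order, labels, C[np.ix_(order, order)], filtered[np.ix_(order, order)]

# Synthetic example: two interleaved groups, each driven by its own factor.
rng = np.random.default_rng(2)
T = 500
factors = rng.standard_normal((2, T))
membership = np.arange(20) % 2
returns = factors[membership] + 0.5 * rng.standard_normal((20, T))
C = np.corrcoef(returns)
order, labels, C_reordered, C_filtered = reorder_and_filter(C, n_clusters=2)
```

Plotting `C`, `C_reordered` and `C_filtered` as images gives a small-scale analogue of Figures 2 to 4: the interleaved groups become contiguous blocks after re-ordering, and the block averages remove the within-block noise.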
BEYOND CORRELATION
Sklar’s Theorem. For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal cumulative distribution functions $F_i$, its joint cumulative distribution F is uniquely expressed as
$$F(X_1, \ldots, X_N) = C(F_1(X_1), \ldots, F_N(X_N)),$$
where C, the multivariate distribution of uniform marginals, is known as the copula of X.
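In practice the marginals F_i are unknown, and a common way to obtain observations from the copula is to replace each sample by its normalized rank. The sketch below assumes this rank-based estimate (it is not stated on the poster) and illustrates that it is unchanged by strictly increasing transformations of the marginals. The function name is hypothetical.

```python
import numpy as np

def pseudo_observations(x):
    """Empirical cdf of x evaluated at its own samples: rank / n."""
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n (no ties assumed)
    return ranks / len(x)

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)
y = 0.7 * x + 0.3 * rng.standard_normal(1000)

# Copula observations: each marginal is mapped to (approximately) uniform.
u, v = pseudo_observations(x), pseudo_observations(y)

# A strictly increasing transform changes the marginal but not the ranks.
u_transformed = pseudo_observations(np.exp(x))
```

Here `u` and `u_transformed` coincide, which reflects the separation in Sklar's theorem between the marginals and the dependence structure.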
Figure 5: ArcelorMittal and Société Générale prices are projected on dependence ⊕ distribution space; notice their heavy-tailed exponential distribution.
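Let $\theta \in [0, 1]$. Let $(X, Y) \in V^2$. Let $G = (G_X, G_Y)$, where $G_X$ and $G_Y$ are respectively the marginal cdfs of X and Y. We define the following distance
$$d_\theta^2(X, Y) = \theta \, d_1^2(G_X(X), G_Y(Y)) + (1 - \theta) \, d_0^2(G_X, G_Y),$$
where
$$d_1^2(G_X(X), G_Y(Y)) = 3 \, \mathbb{E}\left[ |G_X(X) - G_Y(Y)|^2 \right]$$
and
$$d_0^2(G_X, G_Y) = \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dG_X}{d\lambda}} - \sqrt{\frac{dG_Y}{d\lambda}} \right)^2 d\lambda.$$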
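A sample-based sketch of this distance follows. It rests on two assumptions that the poster does not spell out: the cdf values G_X(X) are estimated by normalized ranks, and the densities in d0 are estimated by histograms on a shared grid, with d0 taken in its Hellinger form (square roots of the densities). Function names and the `bins` parameter are hypothetical.

```python
import numpy as np

def empirical_cdf_values(x):
    """Rank-based estimate of G_X(X) at the sample points (no ties assumed)."""
    return (np.argsort(np.argsort(x)) + 1) / len(x)

def d1_squared(x, y):
    """Dependence term: 3 E[|G_X(X) - G_Y(Y)|^2], estimated from paired samples."""
    return 3.0 * np.mean((empirical_cdf_values(x) - empirical_cdf_values(y)) ** 2)

def d0_squared(x, y, bins=50):
    """Distribution term: squared Hellinger distance between histogram densities."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    edges = np.linspace(lo, hi, bins + 1)
    px = np.histogram(x, bins=edges)[0] / len(x)  # bin probabilities
    py = np.histogram(y, bins=edges)[0] / len(y)
    return 0.5 * np.sum((np.sqrt(px) - np.sqrt(py)) ** 2)

def d_theta_squared(x, y, theta=0.5, bins=50):
    """Convex combination of the dependence and distribution terms."""
    return theta * d1_squared(x, y) + (1 - theta) * d0_squared(x, y, bins)

rng = np.random.default_rng(4)
x = rng.standard_normal(2000)
same_ranks = 2 * x + 1                 # perfectly dependent, different distribution
independent = rng.standard_normal(2000)  # same distribution, independent of x
```

With θ = 1 only the dependence matters, so `x` and `same_ranks` are at distance 0; with θ = 0 only the marginal distributions matter, so `x` and `independent` are close while `x` and `same_ranks` are not. Both terms lie in [0, 1], hence so does their combination.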
CLUSTERING RESULTS & STABILITY
[Histogram: Standard Deviations Histogram; x-axis: Standard Deviation in basis points; y-axis: Number of occurrences]
Figure 6: (Top) The returns correlation structure appears more clearly using rank correlation; (Bottom) Clusters of returns distributions can be partly described by the returns volatility
Figure 7: Stability test on Odd/Even trading days subsampling: our approach (GNPR) yields more stable clusters with respect to this perturbation than standard approaches (using Pearson correlation or L2 distances).
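The odd/even subsampling test can be sketched as follows. This is only an illustration of the protocol under stated assumptions: it clusters with Pearson correlation and average linkage as a stand-in (it does not implement GNPR), uses synthetic data, and scores agreement between the two partitions with the Rand index, which the poster does not name. All function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_labels(returns, n_clusters):
    """Flat clusters from average-linkage clustering on correlation distance."""
    C = np.corrcoef(returns)
    D = np.sqrt(np.clip(2 * (1 - C), 0, None))
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

def rand_index(a, b):
    """Fraction of asset pairs on which two partitions agree."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float(np.mean(same_a[upper] == same_b[upper]))

def odd_even_stability(returns, n_clusters):
    """Cluster even-indexed and odd-indexed days separately and compare."""
    even = cluster_labels(returns[:, 0::2], n_clusters)
    odd = cluster_labels(returns[:, 1::2], n_clusters)
    return rand_index(even, odd)

# Synthetic example (assumption): three groups, each driven by its own factor.
rng = np.random.default_rng(5)
T = 1000
factors = rng.standard_normal((3, T))
membership = np.repeat(np.arange(3), 10)
returns = factors[membership] + 0.5 * rng.standard_normal((30, T))
score = odd_even_stability(returns, n_clusters=3)
```

A score close to 1 means the clustering is insensitive to which half of the trading days is used; swapping `cluster_labels` for another clustering procedure allows methods to be compared under the same perturbation.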