Backtesting Value at Risk and Expected Shortfall with
Underlying Volatility Clustering and Fat Tails
by
Stefano Bochicchio Estival BSc
A thesis submitted in conformity with the requirements
for the degree of Master of Science
Department of Mathematics
Faculty of Mathematical & Physical Sciences
University College London
September, 2016
Disclaimer
I, Stefano Bochicchio Estival, confirm that the work presented in this thesis is
my own. Where information has been derived from other sources, I confirm that this
has been indicated in the thesis.
Signature
Date
Abstract
Since the financial crisis in 2008, risk management has become one of the most
important topics in finance. The need to accurately assess the risk exposure of a
financial entity has ignited a discussion between academics and regulators to search
for the most accurate and reliable way to measure risk. The most prominent risk
measures are Value at Risk (VaR) and Expected Shortfall (ES). Furthermore, backtesting has become an important tool to verify the performance of risk measures.
In the context of the behaviour of financial time series, “volatility clustering” and “fat tails” are the most important properties [15, 36]. This motivates the following
question: What is the effect of these properties on the backtesting procedure of VaR
and ES?
The objective of this thesis is to investigate and analyse the backtesting procedure
of VaR and ES when exposed to data enriched with the properties of “fat tails” and
“volatility clustering”.
The structure of this thesis is as follows. First, the GARCH(1,1) model is proposed as a reliable tool that embodies the property of “volatility clustering” and the Student's t distribution as a trustworthy model that captures the “fat tails” property. Second, the parameters of the proposed GARCH(1,1) model and the Student's t distribution are estimated using the JPMM stock data, and random simulations are generated in order to obtain the “in-sample” and “out-of-sample” subsets. Third, the VaR and ES estimates for both models are computed using the “in-sample” subset. Fourth, VaR is backtested using Christoffersen's [12] tests, while ES is backtested using Acerbi and Szekely's [1] Tests I and II on the “out-of-sample” dataset. Finally, the corresponding p-values of the tests are calculated in order to conclude whether the estimated risk measures pass the backtest.
In conclusion, the following results were obtained. Regarding the GARCH(1,1) model, the VaR estimate overestimated the expected VaR violations. Additionally, these violations were not independent due to the “volatility clustering” property. Furthermore, the ES estimate passed the backtest but suggested that the “real” risk was overestimated.
Regarding the Student's t distribution, the VaR estimate passed the backtest as the VaR violations were in line with the estimation. Moreover, the violations proved to be independent. Likewise, the ES estimate passed the backtest. Hence, the “fat tails” property did not affect the backtesting procedure for either risk measure. Finally, further lines of investigation are recommended in order to study this topic with a different focus.
This thesis was completed under the supervision of Professor Johannes Ruf and
Professor Alejandro Gómez.
Acknowledgments
Firstly, I would like to thank Prof. Alejandro Gómez for his unconditional support and excellent supervision; I highly appreciate the dedication he showed to this thesis. Secondly, I would like to thank Prof. Johannes Ruf for his teachings during this whole year; I am deeply grateful for the help that he provided throughout my Master's studies and during this thesis. Thirdly, I would like to thank la bandita. Last but not least, I would like to thank my parents and Vanessa for their continuous support.
Stefano Bochicchio Estival, University College London, September 2016
To my family and to Vanessa, thanks for all the unconditional support.
Contents
Disclaimer
Abstract
Acknowledgments
List of Tables
List of Figures
1 Introduction
2 Properties of Financial Time Series
2.1 Volatility Clustering
2.1.1 GARCH Model
2.2 Fat Tails
2.2.1 Student's t Distribution
3 Properties of Risk Measures and Introduction to VaR and ES
3.1 Properties of Risk Measures
3.1.1 Coherence
3.1.2 Elicitability
3.1.3 Robustness
3.2 Introduction to Value at Risk
3.3 Introduction to Expected Shortfall
4 Theoretical Background for Backtesting VaR and ES
4.1 Statistical Background
4.2 Backtesting Value at Risk
4.2.1 Regulatory Framework
4.2.2 Statistical Framework
4.2.3 Unconditional Coverage Tests
4.2.3.1 Violation Ratio
4.2.3.2 Failure Test
4.2.3.3 Proportion of Failures (POF)
4.2.3.4 Christoffersen's Unconditional Coverage Test
4.2.4 Independence Tests
4.2.4.1 Christoffersen's Independence Test (Markov Test)
4.2.4.2 Christoffersen and Pelletier's Duration Test
4.2.5 Conditional Coverage Tests
4.2.5.1 Joint Markov Test
4.2.5.2 Christoffersen's Conditional Coverage Joint Test
4.3 Backtesting Expected Shortfall
4.3.1 Quantile Approximation
4.3.2 Acerbi and Szekely Test
4.3.2.1 Test I
4.3.2.2 Test II
4.3.2.3 Test III
5 Backtesting VaR and ES with the Generated Data
5.1 Data Generation
5.1.1 Volatility Clustering
5.1.2 Fat Tails
5.2 Computation of Risk Measures
5.2.1 Computation of Value at Risk
5.2.2 Computation of Expected Shortfall
5.3 Backtesting Value at Risk and Expected Shortfall Using Selected Tests
5.3.1 Backtesting Value at Risk
5.3.2 Backtesting Expected Shortfall
5.3.2.1 Acerbi and Szekely Test I
5.3.2.2 Acerbi and Szekely Test II
6 Conclusions, limitations and further research
A Statistical Tests For VaR Backtesting
Bibliography
List of Tables
2.1 Kurtosis of the FTSE Index with Fitted Normal Distribution
2.2 Kurtosis of the FTSE Index with Fitted Student's t Distribution
3.1 Summary of Properties for VaR and ES
4.1 Hypothesis Testing Summary Table
4.2 Contingency Table for Christoffersen's Markov Test
5.1 Statistical Test for the Residuals of the Returns Data Series of JPMM with α = 0.05
5.2 Parameter Estimates, Standard Error and Test Statistic for the Fitted GARCH(1,1) Model
5.3 Estimates and Standard Error of the Fitted Student's t Distribution
5.4 Kurtosis of the Simulated Student's t Distribution
5.5 Estimates of VaR for the GARCH(1,1) Model and the Student's t Distribution with the Corresponding α
5.6 Estimates of VaR and ES for Both Methods with Various Confidence Levels α
5.7 Test Statistic and p-value for the Z1(X) Test
5.8 Test Statistic and p-value for the Z2(X) Test
A.1 Statistical Test for the GARCH(1,1) Model with α = 0.05
A.2 Statistical Test for the GARCH(1,1) Model with α = 0.025
A.3 Statistical Test for the GARCH(1,1) Model with α = 0.01
A.4 Statistical Test for the Student's t Distribution with α = 0.05
A.5 Statistical Test for the Student's t Distribution with α = 0.025
A.6 Statistical Test for the Student's t Distribution with α = 0.01
List of Figures
2.1 FTSE Daily Returns
2.2 Empirical and Fitted Normal Distribution of the FTSE Index
2.3 Rescale of Empirical and Fitted Normal Distribution of the FTSE Index
2.4 Rescale of Empirical and Fitted Student's t Distribution of the FTSE Index
3.1 VaR and ES for a Loss Function that is Normally Distributed with µ = 0 and σ² = 1
5.1 Daily Returns of JPMM
5.2 Sample Autocorrelation Function and Sample Partial Autocorrelation Function
5.3 Simulated Conditional Variance and Returns for the Fitted GARCH(1,1) Model for a Selected Path
5.4 Sample Autocorrelations for the Conditional Variance (up) and Returns (down)
5.5 Fitted Student's t Distribution vs. Empirical Distribution
5.6 Cumulative Mean for the VaR of the GARCH(1,1) Model with Selected Values of α
5.7 Moving Average with a Window of 1,000 Observations for the VaR of the GARCH(1,1) Model with Selected Values of α
5.8 Cumulative Mean for the VaR of the Student's t Distribution with Selected Values of α in Conjunction with the Quantile Values Derived from the Distribution
5.9 Cumulative Mean for the ES of the GARCH(1,1) Model with Selected Values of α
5.10 Moving Average with a Window of 1,000 Observations for the ES of the GARCH(1,1) Model with Selected Values of α
5.11 Cumulative Mean for the ES of the Student's t Distribution with Selected Values of α
5.12 Backtesting VaR with Student's t Distribution Generated Data with α = 0.05
5.13 Backtesting VaR with Student's t Distribution Generated Data with α = 0.025
5.14 Backtesting VaR with Student's t Distribution Generated Data with α = 0.01
5.15 Backtesting VaR with GARCH(1,1) Generated Data with α = 0.05
5.16 Backtesting VaR with GARCH(1,1) Generated Data with α = 0.025
5.17 Backtesting VaR with GARCH(1,1) Generated Data with α = 0.01
5.18 Contribution to the Z2(X) Test Statistic for the GARCH(1,1) Model with α = 0.05
5.19 Contribution to the Z2(X) Test Statistic for the GARCH(1,1) Model with α = 0.025
5.20 Contribution to the Z2(X) Test Statistic for the GARCH(1,1) Model with α = 0.01
5.21 Contribution to the Z2(X) Test Statistic for the Student's t Distribution with α = 0.05
5.22 Contribution to the Z2(X) Test Statistic for the Student's t Distribution with α = 0.025
5.23 Contribution to the Z2(X) Test Statistic for the Student's t Distribution with α = 0.01
Chapter 1
Introduction
Since the financial crisis in 2008, risk management has become one of the most
important topics in finance. The need to accurately assess the risk exposure of a
financial entity has ignited a discussion between academics and regulators to search
for the most accurate and reliable way to measure risk. The most common type
of risk is market risk, which measures the sensitivity of the value of a portfolio
with respect to changes in the price of the underlying financial products. Another
important manifestation of risk is credit risk, which embodies the risk of not receiving outstanding payments from a financial counterparty due to a default. Within the realm of credit risk there exists a subset called counterparty credit risk, which is mainly incurred when trading OTC¹ derivatives, as the fulfilment of future cash flows
depends directly on a financial counterparty. Moreover, liquidity risk corresponds to
the risk that arises when financial positions cannot be opened or closed at the desired
prices due to a lack of trading activity in the market. Operational risk measures the
risk associated with partial or complete failure of internal processes such as human
or computational systems [39]. Within operational risk, legal risk corresponds to the
unexpected losses attributed to a defective transaction related to a dispute or legal
action against a certain financial entity [38] (for more information about other types of risk, see McNeil et al. [39]).

¹ OTC stands for “over the counter”, which denotes the non-standardized contracts that are not traded on exchanges but directly between counterparties.
A natural question that arises when dealing with risk is: How can risk be measured?
In their seminal article, Emmer et al. [24] mention that the concept of risk measurement is fundamental to the correct management of risk. Specifically, Kou et al. [30] indicate that “a risk measure attempts to assign a single numerical value to the random loss of a portfolio of assets.”
The modern history of risk measurement starts with Markowitz in 1952 [37], when he introduced the concept of risk together with the return of a financial product. In his work, Markowitz defines risk as the “standard deviation” of returns [24].
At the end of 1974, the Basel Committee on Banking Supervision (BCBS) was
established by the members of the Group of Ten (G-10) countries. Its objective is to
ensure global financial stability by setting the minimum regulatory framework for the
supervision of the banking industry. In the second agreement of the BCBS in 2004
[4], Value at Risk (VaR) was adopted as the benchmark downside risk measure to
quantify the market risk for financial institutions (for more information about other
downside risk measures please see Nawrocki [40]) [6, 31, 39]. In October 2013, the
BCBS [5] proposed a change in its regulations and, therefore, introduced Expected
Shortfall (ES) as a suggested financial risk measure to capture unexpected losses
incurred in financial distress [1].
In the financial regulatory framework, risk measures need to be backtested to accurately assess the capital that needs to be set aside in order to cover extreme portfolio losses. When it comes to backtesting VaR, certain standardized tests can be implemented in order to cross-check the current capital requirements, as explained by Campbell [9] and Kupiec [31].
On the other hand, Gneiting [26] and Carver [10] mention that ES is not backtestable because it does not fulfil the property of elicitability (see Section 3.1.2). Nevertheless, Acerbi and Szekely [1], Kerkhof et al. [29] and Costanzino et al. [18] argue that elicitability is not a necessary condition for a risk measure to be backtestable. As a consequence, these authors introduce standardized non-parametric backtesting procedures for ES.
In the context of the behaviour of financial time series, the property of “volatility clustering” is frequently exhibited by financial assets, as first shown by Mandelbrot [36] and studied by Cont [16]. Moreover, after the financial crisis of 2008, the property
of “fat tails” in the probability distribution of prices has manifested in the dynamics
of financial markets (for further information please see Dash [20]). Now the following
question can be asked: What is the effect of these properties on the backtesting
procedure of VaR and ES?
The objective of this thesis is to investigate and analyse the backtesting procedure
of VaR and ES when exposed to data enriched with the properties of “fat tails” and
“volatility clustering”.
In Chapter 2, the properties of “volatility clustering” and “fat tails” are presented. The GARCH(1,1) model is proposed as a tool that embodies the property of “volatility clustering” and the Student's t distribution is taken as a trustworthy model that captures the “fat tails” property. Moreover, the fulfilment of these properties is empirically evidenced in a specific financial time series, namely the FTSE index. In Chapter 3, a background on the properties that are important for risk measures is given. Additionally, VaR and ES are introduced. In Chapter 4, a thorough analysis of the backtesting procedures available for VaR and ES is undertaken.
In Chapter 5, the methodology of the thesis is introduced. First, the parameters of the proposed GARCH(1,1) model and the Student's t distribution are estimated using the JPMM stock data, and random simulations are generated in order to obtain the “in-sample” and “out-of-sample” subsets. As a next step, the VaR and ES estimates for both models are computed using the “in-sample” subset. Afterwards, VaR is backtested using Christoffersen's [12] tests, while ES is backtested using Acerbi and Szekely's [1] Tests I and II on the “out-of-sample” dataset. Finally, the corresponding p-values of the tests are calculated in order to conclude whether the estimated risk measures pass the backtest.
Chapter 2
Properties of Financial Time Series
In this chapter, a theoretical background introduces the two most common properties of financial time series: “volatility clustering” and “fat tails”. Furthermore, two models are introduced as catalysts of these properties. In particular, the GARCH(1,1) model is used to capture the “volatility clustering” property and the Student's t distribution is chosen to embody the property of “fat tails”.
2.1 Volatility Clustering
The volatility clustering phenomenon was first described by Mandelbrot [36]: “large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes.” In other words, when volatility is high during a certain period it tends to remain high in subsequent periods, and vice versa. Moreover, Cont [16] indicates that the volatility clustering effect corresponds to the fact that the returns of financial time series exhibit non-linear dependence across time.
As Figure 2.1 illustrates, large returns arrive in consecutive clusters in the FTSE Index. This is a clear manifestation of the volatility clustering effect on financial instruments.
Figure 2.1: FTSE Daily Returns¹
2.1.1 GARCH Model
The GARCH (Generalized Autoregressive Conditional Heteroscedasticity) model
was developed by Engle [25] and generalized by Bollerslev [8]. This model is a
popular reference in modelling the dynamic variability of time series. Due to the
¹ Price data obtained from www.yahoofinance.com.
fact that prices fluctuate during periods of financial stress, conditional variances are
non-constant.
GARCH models have proven to be interesting tools to embody the volatility
clustering effect on financial time series [8, 25]. This is due to the fact that in the
GARCH model, the present level of volatility is dependent on the volatility of one
period before. For example, if volatility is high in the previous time step, it is likely to remain high in the next time step. Therefore, in the realm of finance, the GARCH model is an appealing option for modelling financial time series [44, 50].
Cont [16] even calls the volatility clustering feature the “GARCH effect”. However, the author mentions that this feature is non-parametric and is not intrinsically tied to the GARCH(1,1) model specification.
Definition 2.1.1 GARCH process. The process X_t follows a GARCH(p, q) process with q past squared innovations (X²_{t−i}) and p past conditional variances (σ²_{t−i}) if
\[
\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i X_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2,
\qquad
X_t = \sigma_t \epsilon_t,
\tag{2.1}
\]
where ω ∈ ℝ, α_i, β_i ≥ 0 and ε_t ∼ N(0, 1).
Due to both its usefulness and importance in the financial industry [44, 46], this thesis focuses on the GARCH(1,1) process, which takes one lag for the past conditional variance (σ²_{t−1}) and one lag for the past squared innovation (X²_{t−1}). The GARCH(1,1) model can be represented using Definition 2.1.1 with p = 1 and q = 1:
\[
\sigma_t^2 = \omega + \alpha X_{t-1}^2 + \beta \sigma_{t-1}^2, \tag{2.2}
\]
where ω ∈ ℝ and α, β ≥ 0.
For the purpose of this thesis, the returns X_t of the selected financial time series follow a GARCH(1,1) process with X_t ∼ N(0, σ²_t), where σ²_t satisfies Equation 2.2. Additionally, in order for the GARCH(1,1) model to have a stationary solution, the following condition needs to hold:
\[
\alpha + \beta < 1. \tag{2.3}
\]
Lindner [34] argues that the process X_t has a finite variance if, and only if, Equation 2.3 is fulfilled.
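The recursion in Equation 2.2 and the stationarity condition in Equation 2.3 can be illustrated with a short simulation. The sketch below assumes Python with NumPy, and the parameter values are illustrative only, not the fitted JPMM estimates of Chapter 5:

```python
import numpy as np

def simulate_garch11(omega, alpha, beta, n, seed=0):
    """Simulate n returns from a GARCH(1,1) process with N(0,1) innovations."""
    assert alpha + beta < 1, "stationarity condition (Eq. 2.3) violated"
    rng = np.random.default_rng(seed)
    sigma2 = np.empty(n)
    x = np.empty(n)
    # start at the unconditional variance omega / (1 - alpha - beta)
    sigma2[0] = omega / (1 - alpha - beta)
    x[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = omega + alpha * x[t - 1] ** 2 + beta * sigma2[t - 1]
        x[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return x, sigma2

# illustrative (hypothetical) parameters with high persistence alpha + beta = 0.98
x, sigma2 = simulate_garch11(omega=1e-6, alpha=0.08, beta=0.90, n=10_000)
print(x.var(), 1e-6 / (1 - 0.08 - 0.90))  # sample vs. unconditional variance
```

Persistence (α + β close to 1) produces the visible volatility clusters: a large |X_{t−1}| raises σ²_t, which in turn makes another large return more likely.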
For the estimation of the parameters, the maximum likelihood approach is usually used to produce the estimated parameters of the model. It is known that
\[
X_t \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = \omega + \alpha X_{t-1}^2 + \beta \sigma_{t-1}^2, \tag{2.4}
\]
so in order to find the estimated coefficient vector ν = (ω, α, β)^T, the following quantities are computed:
\[
\nabla L(\theta) = 0.5 \sum_{t=2}^{n} \left( \frac{X_t^2}{\sigma_t^2} - 1 \right) \frac{1}{\sigma_t} \frac{\partial \sigma_t}{\partial \nu}, \tag{2.5}
\]
\[
J = -0.5 \sum_{t=2}^{n} \mathbb{E}\left[ \frac{1}{\sigma_t^2} \frac{\partial \sigma_t}{\partial \nu} \frac{\partial \sigma_t}{\partial \nu^T} \right], \tag{2.6}
\]
where ∇L(θ) is the gradient of the log-likelihood function and J is Fisher's information matrix. Consequently, the estimated parameters can be found using the iterative scheme of Newton's optimization method (see Yang [54]).
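As an illustration of the maximum likelihood step, the sketch below fits a GARCH(1,1) model to simulated data by direct numerical minimization of the negative log-likelihood. It assumes Python with NumPy/SciPy and, for simplicity, replaces the Newton scheme based on Equations 2.5 and 2.6 with a generic derivative-free optimizer; all parameter values are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, x):
    """Negative Gaussian log-likelihood of a GARCH(1,1) model, up to a constant."""
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf          # enforce positivity and stationarity (Eq. 2.3)
    sigma2 = np.empty(len(x))
    sigma2[0] = x.var()        # a common initialization choice
    for t in range(1, len(x)):
        sigma2[t] = omega + alpha * x[t - 1] ** 2 + beta * sigma2[t - 1]
    return 0.5 * np.sum(np.log(sigma2) + x ** 2 / sigma2)

# simulate data from known (illustrative) parameters, then try to recover them
rng = np.random.default_rng(1)
omega0, alpha0, beta0, n = 1e-6, 0.08, 0.90, 5_000
s2 = omega0 / (1 - alpha0 - beta0)
x = np.empty(n)
for t in range(n):
    if t > 0:
        s2 = omega0 + alpha0 * x[t - 1] ** 2 + beta0 * s2
    x[t] = np.sqrt(s2) * rng.standard_normal()

res = minimize(neg_loglik, x0=[1e-5, 0.10, 0.80], args=(x,), method="Nelder-Mead")
omega_hat, alpha_hat, beta_hat = res.x
print(alpha_hat, beta_hat)   # compare with the true values 0.08 and 0.90
```

In practice a Newton-type scheme with the analytic gradient and information matrix, as in Equations 2.5 and 2.6, converges faster; the derivative-free route above simply avoids coding those derivatives.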
2.2 Fat Tails
The property of fat tails² in time series has been labelled a stylized fact³ of financial assets, as evidenced by [15, 21]. This feature refers to the property that the data possesses extreme values that tend to lie far from the mean of the distribution. In other words, the data is underestimated by a normal distribution, as it assigns a low probability to events far from the mean. Therefore, a better treatment can be provided by heavy-tailed distributions such as the Student's t distribution. Cont [15] mentions that the precise behaviour of the tails may sometimes be difficult to determine.
From a mathematical viewpoint, the property of fat tails can be represented as
\[
P(X > x) \sim x^{-\alpha}, \qquad \alpha > 0. \tag{2.7}
\]
In other words, the asymptotic density of the extreme events decays polynomially with exponent α > 0. By contrast, the density of a normal distribution decays exponentially in x², which is faster than any polynomial decay, whereas the tail of the Student's t distribution does decay polynomially.
A useful indicator of the property of fat tails in the data is the kurtosis (normalized fourth moment) of the distribution, which is defined as
\[
K_X = \frac{\mathbb{E}[(X - \mu)^4]}{\left(\mathbb{E}[(X - \mu)^2]\right)^2}, \tag{2.8}
\]
where µ corresponds to the mean of the random variable X.

² The terms fat tails and heavy tails are used interchangeably in the literature and in this thesis.
³ Cont [15] mentions that a stylized fact is defined as “a common denominator among the properties observed in studies of different markets and instruments.”
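Equation 2.8 can be checked numerically on simulated samples: for a normal distribution the kurtosis is 3, while a fat-tailed Student's t sample gives a markedly larger value. A sketch assuming Python with NumPy:

```python
import numpy as np

def kurtosis(x):
    """Normalized fourth moment K_X from Equation 2.8 (equals 3 for a normal law)."""
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

rng = np.random.default_rng(0)
normal_sample = rng.standard_normal(1_000_000)
t_sample = rng.standard_t(df=5, size=1_000_000)  # fat-tailed alternative

print(kurtosis(normal_sample))  # close to 3
print(kurtosis(t_sample))       # well above 3 (theoretical value is 9 for df = 5)
```

The excess kurtosis of the t sample mirrors what Table 2.1 shows for the FTSE data: extreme observations inflate the fourth moment far beyond the normal benchmark of 3.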
Distribution Kurtosis
Empirical Distribution 12.5354
Fitted Normal Distribution 3.0018
Table 2.1: Kurtosis of the FTSE Index with Fitted Normal Distribution
Table 2.1 shows that the kurtosis of the empirical data extracted from the FTSE index is higher than that of the fitted normal distribution. Hence, as already mentioned, the excess kurtosis observed in the FTSE index cannot be accurately modelled by a normal distribution.
Moreover, Figures 2.2 and 2.3 present the empirical distribution in conjunction with the fitted normal distribution for the daily compounded returns of the FTSE index. As can be seen in the graphs, the empirical distribution assigns more probability to extreme events than the fitted normal distribution. Hence, this suggests that the empirical data may possess the property of fat tails, as also observed in Table 2.1.
Figure 2.3 presents the events that are higher than µ + 3σ for the empirical distribution. It is known that the fitted normal distribution covers about 99.7% of its area in the interval (µ − 3σ, µ + 3σ); therefore, events higher than µ + 3σ are extremely unlikely. To put this into perspective, if a normal random number were drawn every day, the event that a draw lies outside the interval (µ − 6σ, µ + 6σ) would occur about once every 1.38 million years. On the contrary, Figure 2.3 shows that this event happened more than once in the last 32 years of FTSE index data.
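The waiting-time figure quoted above can be reproduced directly; a sketch assuming Python with SciPy:

```python
from scipy.stats import norm

# probability that a normal draw falls outside (mu - 6*sigma, mu + 6*sigma)
p_six_sigma = 2 * norm.sf(6)          # sf(6) = 1 - Phi(6)
days_between = 1 / p_six_sigma        # expected waiting time with one draw per day
years_between = days_between / 365.25
print(f"p = {p_six_sigma:.3e}, once every {years_between:,.0f} years")
```

With one draw per day, the expected waiting time for a 6σ event comes out at roughly 1.4 million years, in line with the figure quoted above.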
2.2.1 Student’s t Distribution
As evidenced in Section 2.2, the normal distribution may underestimate the true underlying behaviour of the returns of a financial time series. Therefore, when it comes to modelling extreme returns in financial time series, the use of fat-tailed distributions seems appropriate.

Figure 2.2: Empirical and Fitted Normal Distribution of the FTSE Index
Figure 2.3: Rescale of Empirical and Fitted Normal Distribution of the FTSE Index
One of the most famous fat-tailed distributions is the Student's t distribution. Stoyanov [48] mentions that this distribution is so widespread because of its simplicity and the ease of implementing numerical methods for its application. Therefore, the Student's t distribution is used as the catalyst to generate the dataset enriched with the property of fat tails.
Definition 2.2.1 Student's t distribution. Let X be a random variable. X follows a Student's t distribution with ν degrees of freedom if it has the following probability density function:
\[
f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}
\left( 1 + \frac{1}{\nu}\left(\frac{t-\mu}{\sigma}\right)^2 \right)^{-\frac{\nu+1}{2}}, \tag{2.9}
\]
where µ and σ correspond to the location and the scaling parameters respectively. Moreover, Γ corresponds to the Gamma function, which is defined in Equation 2.10:
\[
\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\, dx. \tag{2.10}
\]
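The location-scale form in Equation 2.9 corresponds to the loc and scale parameters of scipy.stats.t. The following sketch (Python with NumPy/SciPy, hypothetical data) fits both a Student's t and a normal distribution to a heavy-tailed sample and compares the probability each fitted model assigns to a large loss:

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(0)
# hypothetical daily returns drawn from a Student's t law with 4 degrees of
# freedom, scaled so the standard deviation is about 1% (t variance is nu/(nu-2))
nu = 4.0
scale = 0.01 / np.sqrt(nu / (nu - 2))
returns = t.rvs(df=nu, loc=0.0, scale=scale, size=100_000, random_state=rng)

# fit both candidate models by maximum likelihood
df_hat, loc_t, scale_t = t.fit(returns)
loc_n, scale_n = norm.fit(returns)

# probability of a daily loss worse than -5% under each fitted model
p_t = t.cdf(-0.05, df_hat, loc=loc_t, scale=scale_t)
p_n = norm.cdf(-0.05, loc=loc_n, scale=scale_n)
print(p_t, p_n)
```

The normal fit matches the bulk of the sample but assigns orders of magnitude less probability to the 5% move, which is precisely the underestimation of tail risk discussed above.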
Returning to the example concerning the FTSE index, Figure 2.4 shows the same information as Figure 2.3, except that a fitted Student's t distribution is used instead. As the graph shows, this distribution gives more weight to the extreme values than the fitted normal distribution. Finally, Table 2.2 reports the kurtosis of the fitted Student's t distribution compared to the empirical distribution. Clearly, the Student's t distribution matches the actual kurtosis of the empirical data better than the normal fit of Table 2.1.
Figure 2.4: Rescale of Empirical and Fitted Student's t Distribution of the FTSE Index

Distribution Kurtosis
Empirical Distribution 12.5354
Fitted Student's t Distribution 20.2836
Table 2.2: Kurtosis of the FTSE Index with Fitted Student's t Distribution
Chapter 3
Properties of Risk Measures and
Introduction to VaR and ES
In this chapter, the theoretical background of risk measures is presented. Moreover, the standard risk measures proposed by regulators and the industry are introduced, namely VaR and ES.
3.1 Properties of Risk Measures
As already mentioned in Chapter 1, “a risk measure attempts to assign a single numerical value to the random loss of a portfolio of assets” (Kou et al. [30]). In this section, a formal definition is given in terms of the desired properties that a risk measure must possess. This section is based on the layout presented by Emmer et al. [24].
3.1.1 Coherence
The concept of coherence is important as it groups various mathematical properties that should be taken into account in order to select a suitable risk measure [24]. Specifically, Artzner et al. [3] propose the following four key properties that need to be fulfilled in order for a risk measure to be coherent.
Definition 3.1.1 Homogeneity. A certain risk measure ζ(·) is called homogeneous
if for all loss variables L and h ≥ 0 it holds that
ζ(hL) = hζ(L)
Definition 3.1.2 Subadditivity. A certain risk measure ζ(·) is called subadditive
if for all loss variables L and K it holds that
ζ(L + K) ≤ ζ(L) + ζ(K)
Definition 3.1.3 Monotonicity. A certain risk measure ζ(·) is called monotonic
if for all loss variables L and K it holds that
L ≤ K =⇒ ζ(L) ≤ ζ(K)
Definition 3.1.4 Translation Invariance. A certain risk measure ζ(·) is called
translation invariant if for all loss variables L and δ ∈ R it holds that
ζ(L − δ) = ζ(L) − δ
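Homogeneity and translation invariance can be verified numerically for the empirical quantile, which underlies the VaR measure introduced in Section 3.2. A sketch assuming Python with NumPy; the sample and the values of h and δ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal(1_000_000)     # a sample of losses

def empirical_var(losses, alpha=0.99):
    """Empirical alpha-quantile of the loss sample (the VaR of Section 3.2)."""
    return np.quantile(losses, alpha)

h, delta = 2.5, 0.7
v = empirical_var(L)
print(np.isclose(empirical_var(h * L), h * v))          # homogeneity holds
print(np.isclose(empirical_var(L - delta), v - delta))  # translation invariance holds
```

Subadditivity, in contrast, can fail for quantile-based measures, which is the classical objection to VaR as a coherent risk measure.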
3.1.2 Elicitability

Elicitability [26, 32, 43] plays an important role in the determination of an appropriate risk measure. Before formalizing the definition of elicitability, the following definitions need to be introduced.
Definition 3.1.5 Scoring function. A scoring function is a map
s : ℝ × ℝ → [0, ∞), (x, y) ↦ s(x, y),
where x and y correspond to the forecast and the realization respectively. In words, a scoring function assigns a numerical score based on the distance between the forecasted value and the realized value. For example, this difference could be measured by the squared error s(x, K) = (x − K)² or the absolute error s(x, K) = |x − K|.
Definition 3.1.6 Consistency. Let τ be a functional on a class of probability measures P on ℝ:
τ : P → 2^ℝ, Q ↦ τ(Q) ⊂ ℝ.
A scoring function s : ℝ × ℝ → [0, ∞) is consistent for the functional τ relative to the class P if and only if, for all Q ∈ P, t ∈ τ(Q) and x ∈ ℝ, with L the loss random variable defined on (Ω, F, Q),
E_Q[s(t, L)] ≤ E_Q[s(x, L)].

Definition 3.1.7 Strict Consistency. A scoring function s is strictly consistent if and only if it is consistent and
E_Q[s(t, L)] = E_Q[s(x, L)] =⇒ x ∈ τ(Q).
Finally, the definition of elicitability can be introduced.
Definition 3.1.8 Elicitability. The functional τ is elicitable relative to P if and
only if there exists a scoring function S which is strictly consistent for τ relative to
P.
This definition is used by Emmer et al. [24]. Moreover, the authors mention that
elicitability is a very helpful property for the determination of optimal point forecasts.
Hence, if there exists a strictly consistent scoring function s for a functional τ, then
the elicited statistic can be written as (a formulation also used by Acerbi and Szekely [1])
ι(K) = arg min_x E[s(x, K)]   (3.1)
where s is a scoring function and ι(K) is a statistic of the random variable K. One
of the most important properties of elicitability is that it can be utilized to assess the
performance of forecast models [26]. It is worth noting that usually elicitability refers
to the risk measure itself and not to a functional with respect to the risk measure.
In this sense, a “weak” second-order notion of elicitability can be defined as follows
Definition 3.1.9 Conditional Elicitability. A functional τ of Q is called condi-
tionally elicitable if there exist functionals γ : Q → 2^R and κ : D → 2^R with
D ⊂ Q × 2^R that satisfy the following
• γ is elicitable relative to Q
• (P, γ(P)) ∈ D ∀P ∈ Q
• ∀c ∈ γ(Q) the functional κ_c : Q_c → 2^R, P ↦ κ(P, c) ⊂ R is elicitable relative
to Q_c = {P ∈ Q : (P, c) ∈ D}
The property of conditional elicitability is relevant when forecasting risk measures
that are not elicitable.
3.1.3 Robustness
Robustness refers to the sensitivity of a model to changes in its underlying
parameters. A robust risk measure, in a strict sense, is not significantly affected by
external or internal shocks. In the risk context, Emmer et al. [24] mention that
without robustness, results may lose their relevance, since small measurement errors
can lead to large changes in the estimated risk measure.
Furthermore, Cont et al. [17] define robustness with a different focus: instead of
attributing the sensitivity to measurement errors, they attribute it to the inflow of
new data used to estimate the model.
When analysing the robustness of a certain risk measure, a distance must be
defined. Emmer et al. [24] propose the following definition
Definition 3.1.10 Wasserstein distance. The Wasserstein distance between two
probability measures P and Q is
D_ws(P, Q) = inf{E(|X − Y |) : X ∼ P, Y ∼ Q}   (3.2)
Using Definition 3.1.10, the definition of robustness can be introduced.
Definition 3.1.11 Robustness. A risk measure µ is called robust with respect to
the Wasserstein distance if
lim_{n→∞} D_ws(X_n, X) = 0 ⇒ lim_{n→∞} |µ(X_n) − µ(X)| = 0   (3.3)
where X_n ∼ P_n, n ∈ N, each P_n corresponds to a probability measure and µ
corresponds to a certain risk measure.
3.2 Introduction to Value at Risk
VaR is the most widespread risk measure in finance. It was developed by J.P.
Morgan in 1994 in their publication of the RiskMetrics framework, which established
it as a benchmark risk measure in the industry [28, 41]. Afterwards, as already
mentioned in Chapter 1, the Basel Committee on Banking Supervision introduced
VaR as the internal benchmark for banks to calculate their capital requirements.
Definition 3.2.1 Value at Risk (VaR). A portfolio’s Value at Risk (VaR) corre-
sponds to the α quantile of the profit and loss distribution X [9]
VaR_t(α) = −F^{−1}(α),  where F^{−1}(α) = inf{x ∈ R : F(x) ≥ α}   (3.4)
and F^{−1}(α) is the quantile function (inverse CDF¹) of the profit and loss distribu-
tion.
When defining VaR, a confidence level (1 − α) and a time interval (t) must be given.
Specifically, t determines the horizon of the distribution, and from the risk metric
perspective this parameter is introduced for convenience and clarity.
As an illustrative example, let t = 1 and α = 0.01. If the VaR corresponds to the
value of $1,000, then under the mathematical definition the loss incurred by the
portfolio in one day exceeds $1,000 only 1% of the time; 99% of the time it does not.
In spite of the popularity and simplicity of VaR, some shortcomings have been
diagnosed in this risk measure. As a first important drawback, VaR does not provide
any information regarding the magnitude of the excess loss beyond the α level. This
is an important pitfall, as the VaR could underestimate the actual loss of the portfo-
lio [42,45,53]. Second, VaR is criticized for its lack of subadditivity (see Definition
3.1.2) and therefore its lack of coherence (see Definition 3.1.1). This result is
disturbing because diversification may bring no direct benefit: the VaR of a combined
portfolio can actually be higher than the sum of the individual VaRs [39,53]. Never-
theless, there are some cases in which VaR is subadditive. For example, according
to Haugh [27], VaR is indeed subadditive when dealing with elliptical distributions
as well as with distributions that are continuous and symmetric.
¹ Assuming F is continuous and the inverse function exists.
Regarding the computation of VaR, there exist three major techniques that are
commonly implemented.
1. Variance-Covariance Approach. The variance-covariance approach is based
on the assumption that returns are normally distributed. Historical data is
therefore used to estimate the parameters of the normal distribution (µ, σ²).
Consequently, computing VaR quantiles reduces to reading them off the fitted
normal distribution.
A strong advantage of this method is that it is very flexible and simple to use.
Moreover, it facilitates the inclusion of stress scenarios to analyse the sensitivity
of the results when parameters are changed [51].
However, the most important pitfall of this technique is that the returns of
the portfolio are assumed to be normally distributed. As already explained
in Section 2.2, the normal distribution may sometimes underestimate the true
behaviour of financial assets.
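As a minimal sketch of this approach (the return sample and parameter choices below are hypothetical), the quantile calculation under normality reduces to a few lines:

```python
import statistics

def variance_covariance_var(returns, alpha=0.01):
    """One-day VaR under the normality assumption: the alpha quantile of
    the fitted N(mu, sigma^2) return distribution, reported as a loss."""
    mu = statistics.mean(returns)
    sigma = statistics.stdev(returns)
    z = statistics.NormalDist().inv_cdf(alpha)  # about -2.326 for alpha = 0.01
    return -(mu + z * sigma)

# Hypothetical daily return sample
returns = [0.001, -0.004, 0.002, -0.012, 0.005, -0.003, 0.007, -0.008]
var_99 = variance_covariance_var(returns, alpha=0.01)
```

The whole method amounts to estimating two parameters and evaluating one normal quantile, which is what makes it so cheap and so sensitive to the normality assumption.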
2. Historical Simulation. As mentioned by O’Brien et al. [42], the Historical
Simulation technique is the most popular approach for calculating VaR.
This technique is mainly based on the historical information of the financial
products that compose a specific portfolio. It is assumed that the weights
of the financial instruments in the portfolio do not change for the observation
period. In this case, VaR is obtained by inspecting the quantiles of the empirical
distribution generated by the historical prices.
The key advantage of this technique lies in the fact that it is non-parametric.
In other words, there is no need to estimate any kind of parameters as the
distribution is based on the historical prices. Moreover, Nieppola [41] mentions
that using historical data series can account for the property of “heavy tails”
of the distribution.
One of the most important pitfalls of this model is that it assumes that the
behaviour of past prices is a good model for its behaviour in the future; “driving
by looking in the rearview mirror”. Therefore, it assumes that history could
repeat itself in the future. For example, Dowd [22] mentions that if the data is
unusually quiet, the VaR calculated under the Historical Simulation approach
could underestimate the “true risk”. Moreover, another important shortcoming
of the model is that, as past prices are its most important input, a long history
of data is needed. That could pose a problem for financial instruments that have a
short-lived history [41] (for more information regarding the Historical Simulation
approach refer to Dowd [22]).
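A minimal sketch of the Historical Simulation approach, assuming a hypothetical return history and a simple left-tail index convention for the empirical quantile:

```python
def historical_var(returns, alpha=0.01):
    """Historical Simulation VaR: the empirical alpha quantile of the
    observed return series, reported as a loss (sign flipped)."""
    ordered = sorted(returns)
    k = int(alpha * len(ordered))  # simple left-tail index convention
    return -ordered[k]

# Hypothetical history of 100 daily returns between -0.50 and 0.49
history = [i / 100 for i in range(-50, 50)]
var_95 = historical_var(history, alpha=0.05)  # 6th-worst return, sign flipped
```

No distributional parameters are estimated: the whole "model" is the sorted history itself, which is exactly why the method inherits both the heavy tails and the quiet periods of the sample.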
3. Monte Carlo Simulation. The Monte Carlo Simulation approach, despite
being a very powerful VaR calculation technique, is the most challenging one
to implement [22]. The Monte Carlo method relies on the simulation of financial
variables whose models are estimated from market data. Specifically, price
paths are simulated at various times to obtain the implied distribution from
which VaR estimates can be computed.
One of the most important disadvantages of the Monte Carlo approach is its
extremely high computational cost. Many simulated paths need to be generated
in order to obtain a robust result, which requires substantial computational
memory. This can be crucial when trading in a high-frequency environment or
when estimating the risk of the whole portfolio of a large bank (for more
information about the Monte Carlo technique see Nieppola [41]).
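A minimal sketch of the Monte Carlo idea, assuming a fitted one-day normal return model N(µ, σ²) instead of full price-path dynamics; all parameters are hypothetical:

```python
import random

def monte_carlo_var(mu, sigma, alpha=0.01, n_paths=100_000, seed=42):
    """Monte Carlo VaR: simulate one-day returns from the fitted model and
    read the alpha quantile off the simulated distribution."""
    rng = random.Random(seed)
    simulated = sorted(rng.gauss(mu, sigma) for _ in range(n_paths))
    return -simulated[int(alpha * n_paths)]

# With mu = 0 and sigma = 1%, the true 1% VaR is about 2.33%
var_mc = monte_carlo_var(mu=0.0, sigma=0.01, alpha=0.01)
```

Even this toy version needs 100,000 draws to pin the tail quantile down, which illustrates the computational-cost point above: a realistic multi-asset portfolio multiplies that effort by the number of instruments and horizons.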
3.3 Introduction to Expected Shortfall
As already shown in Section 3.2, one of the main drawbacks of VaR is that it
does not take into account the magnitude of the loss. As a consequence, Expected
Shortfall² was introduced as an enhanced risk measure.
Definition 3.3.1 Expected Shortfall. Let X be a profit and loss random variable
such that E(X) < ∞ with probability density function f_X. The ES can be defined as
follows
ES_t(α) = (1/(1 − α)) ∫_α^1 g_u(f_X) du = −E[X | X ≤ −VaR_t(α)]   (3.5)
where g(·) corresponds to the quantile function, i.e. the inverse cumulative distribution
function of the profit and loss distribution.
Put into words, the Expected Shortfall as denoted in Equation 3.5 averages the
tail of the loss distribution over the losses that exceed the VaR threshold. As a
consequence, the following relationship holds
|ES_t(α)| ≥ |VaR_t(α)|   (3.6)
2
Expected Shortfall is also defined with a different nomenclature. Across the literature, it is also
called Expected Tail Loss, Conditional VaR, Tail VaR, Tail Conditional Expectation, and Worst
Conditional Expectation. For more information refer to Acerbi and Tasche [2].
Figure 3.1: VaR and ES for a Loss Function that is Normally Distributed with µ = 0 and
σ2 = 1.
Figure 3.1 depicts the values of VaR and ES for a normally distributed loss
random variable with µ = 0 and σ² = 1. As can be observed in the graph,
Equation 3.6 holds.
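The relationship in Equation 3.6 can be checked in closed form for the standard normal case of Figure 3.1; the formula ES(α) = φ(z_α)/α is the standard normal tail-expectation result, not taken from the text:

```python
import math
from statistics import NormalDist

def normal_var_es(alpha=0.01):
    """Closed-form VaR and ES when the P&L is standard normal:
    VaR = -z_alpha and ES = pdf(z_alpha) / alpha, z_alpha the alpha quantile."""
    z = NormalDist().inv_cdf(alpha)  # about -2.326 for alpha = 0.01
    var = -z
    es = math.exp(-z * z / 2) / (math.sqrt(2 * math.pi) * alpha)
    return var, es

var_99, es_99 = normal_var_es(0.01)  # Equation 3.6: |ES| >= |VaR|
```

For α = 0.01 this gives VaR ≈ 2.33 and ES ≈ 2.67, matching the ordering shown in Figure 3.1.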
The key advantage of ES with respect to VaR is that it takes into account the
magnitude of the loss beyond the VaR threshold and therefore proves to be a more
precise measure of the actual exposure to market risk. Moreover, from a mathemati-
cal standpoint, ES fulfils all the properties of a coherent risk measure (see Definition
3.1.1) as shown by Artzner et al. [3]. Also, Dowd [22] defines ES as “the most
attractive coherent risk measure”.
Nonetheless, the most troublesome drawback of ES arises from its lack of the
elicitability property (see Definition 3.1.8). Specifically, some authors such as
Gneiting [26] and Carver [10] mention that ES is not backtestable due to the fact
that it is not elicitable. Nevertheless, Acerbi and Szekely [1], Kerkhof et al. [29],
and other authors have done substantial work in developing non-parametric
standardized backtesting procedures to test ES.
In summary, Table 3.1 shows the properties that hold for VaR and ES.
Property                     VaR    ES
Coherence                    No     Yes
Robustness (Wasserstein)     Yes    Yes
Elicitability                Yes    No
Conditional Elicitability    Yes    Yes
Table 3.1: Summary of Properties for VaR and ES³
³ Table taken from Emmer et al. [24]
Chapter 4
Theoretical Background for
Backtesting VaR and ES
The concept of backtesting risk measures is crucial for the validation of risk
models. Backtesting is essential for employers and risk managers who need to
assess whether risk measures are well calibrated [22]. Backtesting consists of
statistical and quantitative tests that verify whether a certain risk measure (in this
case VaR and ES) is consistent with the assumptions of the model.
In this chapter the statistical background of hypothesis testing is introduced and
a variety of backtesting procedures from the academic literature is exhibited for
VaR and ES.
Chapter 4. Theoretical Background for Backtesting VaR and ES 40
4.1 Statistical Background
When backtesting risk measures, the procedure of hypothesis testing is crucial to
assess their performance.
A hypothesis test normally defines two types of hypotheses: a null hypothesis
(H0) and an alternative hypothesis (Ha). Usually, the objective of hypothesis
testing is to verify whether the null hypothesis can be rejected.
Decision           H0 True              H0 False
Reject H0          Type I error (α)     correct decision
Not Reject H0      correct decision     Type II error (β)
Table 4.1: Hypothesis Testing Summary Table
Table 4.1 describes the different cases when testing the null hypothesis. The most
serious error that can be made is rejecting the null hypothesis when in fact it is
true. This is called a Type I error, and its probability is the significance level (α).
Another name found in the literature is “false positive.” Under normal circumstances,
this error level is set at the beginning of the test, with values normally ranging from
0.01 to 0.05. The probabilistic interpretation of α is the chance of rejecting the null
hypothesis when it is actually true. Moreover, another possible error when performing
a hypothesis test is the so-called Type II error, or β. Specifically, β corresponds
to the probability of accepting the null hypothesis when it is actually false. In
the statistical literature, this is called a “false negative.” Finally, in the statistical
literature, the quantity 1 − β is called the “power” of the test; that is, the probability
of rejecting the null hypothesis when it is indeed false. Hence, it is desirable to obtain
the highest possible power when performing hypothesis testing [11].
As the significance level α decreases, it becomes harder to reject the null hypothesis,
and therefore the probability that the “true” model is rejected decreases (Type I
error). Nevertheless, this implies that it becomes more probable to incorrectly
accept a “false” model (Type II error) [22].
When performing hypothesis testing, a test statistic is crucial. A test statistic is
a function of a given data sample that is used to judge whether the null hypothesis
is true or not. Specifically, it is compared to a critical value determined by α in
order to check whether the null hypothesis can be rejected.
The usual way of evaluating the test statistic is via the p-value. The p-value (ρ)
corresponds to the probability of observing a value at least as extreme as the realized
test statistic under the null hypothesis. Hence, if ρ ≤ α the null hypothesis is
rejected, and if ρ > α then it is not possible to reject the null hypothesis.
In order to clarify the aforementioned definitions, an illustrative example is shown.
Let us suppose that there is a sample of a certain random variable X.¹ The null
hypothesis is that these data points are normally distributed with mean equal to
0 (µ = 0) and a certain known standard deviation (σ). Therefore, H0 corresponds to
X ∼ N(0, σ²) and for Ha it can be said that X ∼ N(φ, σ²) where φ > 0.
Now the following test statistic is taken in order to verify whether µ = 0
Z = X̄ / (σ/√n)   (4.1)
where n denotes the sample size, X̄ the sample mean and σ the given standard
deviation. This is known as the “z-test.”
Now considering the case where X̄ = 3, n = 10 and σ = 4, then Z ≈ 2.37. There-
fore, by setting the significance to α = 0.01, the p-value is
ρ = P(Z ≥ 2.37) = 1 − 0.9911 ≈ 0.0089   (4.2)
¹ Samples of the random variable X are assumed to be identically distributed and independent.
Hence ρ < α: under the null hypothesis it is highly unlikely to observe Z ≥ 2.37,
and therefore the null hypothesis can be rejected.
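The worked example can be sketched as follows; note that σ = 4 is used here, which is the value consistent with a statistic of roughly 2.4 given X̄ = 3 and n = 10:

```python
import math
from statistics import NormalDist

def one_sided_z_test(sample_mean, sigma, n, alpha=0.01):
    """Z-test of H0: mu = 0 against Ha: mu > 0 with known sigma.
    Returns the statistic, the p-value and the rejection decision."""
    z = sample_mean / (sigma / math.sqrt(n))
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value, p_value <= alpha

z, p, reject = one_sided_z_test(sample_mean=3, sigma=4, n=10, alpha=0.01)
```

The decision rule is exactly the ρ ≤ α comparison described above, applied to the one-sided upper tail.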
4.2 Backtesting Value at Risk
The following section depicts different popular backtesting models used for VaR.
Specifically, the regulatory framework as explained by Campbell [9] is exposed as
the principal method used by the BCBS. Moreover, the unconditional coverage tests
proposed by Li [33], Jorion [28], Kupiec [31] and Christoffersen [12] are shown. As a
complement to the unconditional coverage tests, two independence tests are intro-
duced: Christoffersen’s Markov test [12] and Christoffersen and Pelletier’s duration
test [13]. Finally, the conditional coverage tests, which cover both the independence
and the unconditional coverage properties, proposed by Christoffersen [12] and
Christoffersen and Pelletier [13], are described (for more backtesting tests refer to
Nieppola [41] and Campbell [9]).
4.2.1 Regulatory Framework
The regulatory guidelines require banks to calculate the capital needed to be
set aside in order to cover non-conventional losses. The amount that should be
reserved in the regulatory nomenclature is denoted as the market risk capital (MRC).
The MRC is a function of the internal VaR that the financial institution calculates.
Specifically, the MRC takes the highest of the following two factors. First, the
traditional 1% VaR calculated over a 10 day horizon. Second, the 60 day average
of the previous reported 1% VaR adjusted by a factor (st). In a mathematical
perspective, it is defined with the following formula
MRC_t = max( VaR_t(0.01), s_t · (1/60) Σ_{i=0}^{59} VaR_{t−i}(0.01) ) + c_t   (4.3)
where ct corresponds to the credit risk associated with the bank’s portfolio. Moreover
st is a multiplication factor determined by the times of VaR violations in the previous
250 trading days. Or more specifically
s_t =
  3                  if N ≤ 4
  3 + 0.2(N − 4)     if 5 ≤ N ≤ 9
  4                  if N ≥ 10
(4.4)
where N denotes the number of violations exceeding VaR.
Put into words, when the factor s_t increases (more violations of VaR in the 250
testing days), the second argument of the maximum in Equation 4.3 grows and
therefore the MRC increases. This is logical, as more violations of the VaR point out
that the current VaR calculation model may not be accurate and that the MRC level
should therefore be adjusted upwards. Campbell [9] calls this technique the
“traffic light” approach as the multiplicative factor st is divided into three different
sets: the “green” light, which logically means the least amount of VaR violations,
the “amber” light which accounts for a higher amount of VaR violations and the
“red” light which is the maximum value taken by st.
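Equation 4.4 translates directly into code; the zone names in the comments follow the “traffic light” terminology described above:

```python
def basel_multiplier(n_violations):
    """Basel 'traffic light' multiplication factor s_t, driven by the
    number of 1% VaR violations over the previous 250 trading days."""
    if n_violations <= 4:                      # green zone
        return 3.0
    if n_violations <= 9:                      # amber zone
        return 3.0 + 0.2 * (n_violations - 4)
    return 4.0                                 # red zone
```

The multiplier then scales the averaged VaR term inside the maximum of Equation 4.3.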
4.2.2 Statistical Framework
In this section, the statistical terms common to the various backtesting procedures
are introduced.
A key term in VaR backtesting is the “hit” function which counts how many
times the profit and loss realizations during a certain time exceed the VaR estimate.
Put into a mathematical context
I_{t+1}(α) =
  1 if X_{t,t+1} ≤ −VaR_t(α)
  0 if X_{t,t+1} > −VaR_t(α)
(4.5)
where Xt,t+1 denotes the profit and loss over the period (t, t + 1). In his work from
1998, Christoffersen [12] mentions that the VaR accuracy can be determined by
inspecting whether the “hit” sequence fulfils the following properties.
• Unconditional Coverage Property. The property of unconditional cover-
age requires that the probability that a realized loss exceeds the VaR estimate
should be α · 100%. In other words, P(I_{t+1} = 1) = α. As an illustrative
example, let α = 0.05. In this case, 5 VaR violations would be expected for
every 100 realized returns if the VaR estimate is accurate. However, if there
were more VaR violations, then the VaR estimate may underestimate the “real”
risk. On the other hand, if there were fewer than 5 VaR violations, then the
VaR estimate may overestimate the “real” risk.
• Independence Property. This property analyses how VaR violations occur.
Specifically, the independence property states that two arbitrary elements of
the “hit” sequence have to be strictly independent of each other. In other
words, the prior history of the “hit” sequence should not convey any kind of
information on whether the future “hit” sequence occurs. As an illustrative
example, if there is clustering in the data, it may be expected that the “hit”
sequence clusters on that same period. In this case, the evidence may suggest
that the times of the “hit” sequence are not independent.
• Conditional Coverage Property. This property is mainly a joint test that
considers the unconditional coverage property as well as the independence prop-
erty simultaneously. Campbell [9] synthesizes this property with the following
statement
It(α) ∼ B(α) i.i.d. (4.6)
where B(α) denotes the Bernoulli distribution with probability α.
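The “hit” function of Equation 4.5, on which all three properties are formulated, can be sketched as follows (the P&L and VaR series are hypothetical):

```python
def hit_sequence(pnl, var_estimates):
    """Equation 4.5: I_t = 1 when realized P&L falls at or below -VaR_t."""
    return [1 if x <= -v else 0 for x, v in zip(pnl, var_estimates)]

# Hypothetical P&L series against a constant VaR estimate of 2.0
pnl = [0.5, -1.2, 0.3, -2.5, 0.1]
hits = hit_sequence(pnl, [2.0] * 5)
```

Only the day with a 2.5 loss breaches the 2.0 VaR, so the sequence contains a single hit; the tests in the following sections all operate on sequences of this form.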
4.2.3 Unconditional Coverage Tests
In this section the unconditional coverage tests proposed by Li [33], Jorion [28],
Kupiec [31] and Christoffersen [12] are discussed.
4.2.3.1 Violation Ratio
The following test taken from Li [33] is composed of the following test statistic.
ζ = ( Σ_{t=1}^T I_t(α) ) / (T · α)   (4.7)
Put into words, if the VaR estimate were accurate, then the numerator of Equation
4.7 would be close to the denominator: the number of realized “hits” would be
similar to the theoretically expected number of VaR violations. The rule of thumb
used to verify the result is that 0.8 < ζ < 1.2.
4.2.3.2 Failure Test
This test, exposed by Jorion [28], records the failure rate, which is calculated as
the proportion of time in which VaR violations occur. Let N be the number of
exceptions and T the total number of days analysed. Hence, N/T denotes the
failure rate. Ideally, α̂ = N/T should be an unbiased estimator of α, the violation
probability implied by the confidence level of the test. The set-up for this test is
exactly the testing framework of Bernoulli trials. That is, under the null hypothesis
the number of exceptions N follows the binomial distribution
P(N = k) = C(n, k) α^k (1 − α)^{n−k}   (4.8)
where the mean and variance are nα and nα(1 − α) respectively.
In the case where T is large enough, then using the Central Limit Theorem (CLT)
the binomial distribution can be approximated by the normal distribution
m = (N − αn) / √(α(1 − α)n)  →_d  N(0, 1)   (4.9)
Consequently, m is approximately normally distributed, so the critical values can be
obtained directly. For example, if the test is defined at the 95% level (α = 0.05),
the corresponding critical value is 1.96.
4.2.3.3 Proportion of Failures (POF)
The POF test, proposed by Kupiec [31], is based on the following test statistic
(using the notation in [9])
POF = 2 log( (α̂/α)^{I(α)} ((1 − α̂)/(1 − α))^{T−I(α)} )
I(α) = Σ_{t=1}^T I_t(α)
α̂ = I(α)/T
(4.10)
where T denotes the number of total observations, and It(α) is the “hit” sequence.
By simple inspection of Equation 4.10 it can be seen that, if the empirical prob-
ability of VaR violations (α̂) is exactly the same as α, then the POF test statistic
collapses to the value of zero. Conversely, when the empirical probability of VaR
violations differs from the expected violation rate (α), the POF test statistic may
indicate that the VaR overestimates or underestimates the actual underlying risk.
As an example, for one trading year (i.e. T = 255) with α = 0.03, one would expect
on average 7.65 VaR violations. If the actual number of VaR violations were 12,
then α̂ ≈ 0.047 and the POF statistic would be approximately 2.18.
A normalized version of the POF test can be expressed in the following way (using
the notation of [9]).
z = √T (α̂ − α) / √(α(1 − α))   (4.11)
As the test statistic z is approximately normally distributed, the hypothesis
testing procedure may be undertaken in the traditional way. In other words, the
suitable critical point of a normal distribution would be compared to the realized
test statistic in order to determine the acceptance or rejection of the null hypothesis.
An advantage of this approach is that the statistic z remains well defined even when
there are no VaR violations at all. This fixes an anomaly of the POF statistic in
Equation 4.10, whose likelihood-ratio form runs into log(0) at the boundary [9].
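A sketch of the POF statistic using the worked numbers above (12 violations in 255 days at α = 0.03); as a likelihood-ratio statistic it is non-negative, and these numbers give roughly 2.18:

```python
import math

def kupiec_pof(n_hits, n_obs, alpha):
    """Kupiec POF likelihood-ratio statistic (Equation 4.10); under H0 it is
    asymptotically chi-squared with one degree of freedom."""
    if n_hits == 0 or n_hits == n_obs:
        return float("nan")  # boundary cases hit the log(0) anomaly
    a_hat = n_hits / n_obs
    return 2.0 * (n_hits * math.log(a_hat / alpha)
                  + (n_obs - n_hits) * math.log((1 - a_hat) / (1 - alpha)))

# 12 violations in 255 days against an expected rate of 3 percent
pof = kupiec_pof(12, 255, alpha=0.03)  # roughly 2.18, below the 3.84 cutoff
```

Since 2.18 is below the 5% critical value of a chi-squared distribution with one degree of freedom (3.84), this hypothetical violation count would not be rejected.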
4.2.3.4 Christoffersen’s Unconditional Coverage Test
Christoffersen [12] proposes the following test statistic in order to test the uncon-
ditional coverage property
CUCT = 2 log( α̂^{I(α)} (1 − α̂)^{T−I(α)} ) − 2 log( α^{I(α)} (1 − α)^{T−I(α)} )
I(α) = Σ_{t=1}^T I_t(α)
α̂ = I(α)/T
(4.12)
As T → ∞,
CUCT →_d χ²(1)   (4.13)
For example, if α = 0.05 then the test statistic CUCT would be compared against
the corresponding critical value of a χ² distribution with one degree of freedom.
In spite of the unconditional coverage tests’ simplicity and popularity, they are
haunted by an important pitfall: there is no analysis of whether the VaR violations
occur in a specific fashion (i.e. whether they “cluster” in certain periods or occur in
pairs). As a consequence, a complementary test needs to examine the independence
between VaR violations.
4.2.4 Independence Tests
In this section the independence tests, namely Christoffersen’s Markov test [12] and
Christoffersen and Pelletier’s duration test [13], are introduced.
4.2.4.1 Christoffersen’s Independence Test (Markov Test)
The Markov Test inspects the independence property by implementing the fol-
lowing 2 × 2 contingency table
                  I_t(α) = 0    I_t(α) = 1
I_{t+1}(α) = 0    T1            T3            T1 + T3
I_{t+1}(α) = 1    T2            T4            T2 + T4
                  T1 + T2       T3 + T4       T
Table 4.2: Contingency Table for Christoffersen’s Markov Test 2
where It(α) corresponds to the “hit” sequence as defined in Section 4.2.2. More-
over, T1 and T2 represent the non-violation and violation of the VaR at time t + 1
given that there was no violation in the prior time step, respectively. Conversely, T3
and T4 represent the non-violation and violation of the VaR at time t + 1 given that
there was a violation in the prior time step, respectively.
Ideally if the process It+1(α) is independent, then the following should hold
T2 / (T1 + T2) = T4 / (T3 + T4)   (4.14)
In other words, the proportion of VaR violations given that there was no violation in
the previous time step should be the same as the proportion of VaR violations given
that there was a VaR violation in the previous period. Therefore, the fact that there
was or was not a violation in the previous time step does not provide any kind of
² Table taken from [13]
information to whether there is a VaR violation in the current time step and hence
the independence property holds.
The test statistic is defined as follows
CIT = −2 ln[ (1 − π)^{T1+T3} π^{T2+T4} / ( (1 − π0)^{T1} π0^{T2} (1 − π1)^{T3} π1^{T4} ) ]   (4.15)
where
π0 = T2 / (T1 + T2),   π1 = T4 / (T3 + T4),   π = (T2 + T4) / (T1 + T2 + T3 + T4)   (4.16)
and CIT is asymptotically distributed as a χ² distribution with one degree of freedom.
For example, if α = 0.05 then the test statistic CIT would be compared to the critical
value of a χ² distribution with one degree of freedom.
4.2.4.2 Christoffersen and Pelletier’s Duration Test
Christoffersen and Pelletier [13] proposed in 2004 a different approach in order
to test the independence property in VaR calculations. If VaR violations are
independent of each other, the time elapsed between two VaR violations should
be independent of the time that has elapsed since the last violation. In other words,
Campbell [9] mentions that the time between VaR violations should not present any
type of “duration dependence.”
Despite the sophistication of this approach, it cannot be depicted in a 2 × 2
contingency table as in the Markov test. Therefore, a full statistical model has to
be estimated for the duration between VaR violations. In their work, Christoffersen
and Pelletier [13] propose the exponential distribution as the null distribution of
the duration between VaR violations, as it possesses the memorylessness property.
4.2.5 Conditional Coverage Tests
In order to have a reliable VaR measure, the independence as well as the un-
conditional coverage property need to be fulfilled. The following section covers the
conditional coverage tests proposed by Christoffersen [12] and Christoffersen and
Pelletier [13].
4.2.5.1 Joint Markov Test
The joint Markov test is based on the duration test by Christoffersen and Pel-
letier [13] used in Section 4.2.4.2 combined with the Markov test implemented by
Christoffersen [12]. Invoking Table 4.2, the joint Markov test proposes the following
equality in case the unconditional coverage and the independence property hold
T2 / (T1 + T2) = T4 / (T3 + T4) = α   (4.17)
where α is the confidence level for the test. Specifically, the LHS of the equality corre-
sponds to the independence property and the RHS corresponds to the unconditional
coverage property.
4.2.5.2 Christoffersen’s Conditional Coverage Joint Test
Christoffersen’s conditional coverage test is simply the aggregation of the uncondi-
tional coverage test and the independence (Markov) test, both from [12].
The test statistics of the two tests are added up to create a new test statistic CCCT
CCCT = CIT + CUCT   (4.18)
where CCCT is asymptotically distributed as a χ² with two degrees of freedom.
Therefore, the value of the new test statistic is compared to the corresponding
critical value of a χ² distribution with 2 degrees of freedom.
4.3 Backtesting Expected Shortfall
In the case of backtesting ES, the procedure is not as direct as VaR backtesting,
according to Wimmerstedt [53] and Acerbi and Szekely [1]. Some authors attribute
this difficulty to the fact that ES does not fulfil the property of elicitability (see
Definition 3.1.8) [26].
In this thesis, the method employed by Emmer et al. [24] and the various tests
implemented by Acerbi and Szekely [1] are reviewed (for more backtesting methods
refer to Clift et al. [14]).
4.3.1 Quantile Approximation
The following approach is based on a research paper by Acerbi and Tasche [2].
This method is recognized by its simplicity as it is far less complex than the other
approaches used to backtest ES. As a first step, ES is represented in terms of VaR.
ES_t(α) = (1/(1 − α)) ∫_α^1 VaR_t(k) dk   (4.19)
In the next step, dividing the interval [α, 1] into four subintervals of equal length
∆k = (1 − α)/4, the following is obtained:
k0 = [α, α + (1 − α)/4],   k1 = [α + (1 − α)/4, α + (1 − α)/2],
k2 = [α + (1 − α)/2, α + (3/4)(1 − α)],   k3 = [α + (3/4)(1 − α), 1]
As a next step, approximating the integral in Equation 4.19 by a left Riemann sum,
the following holds:
ES_t(α) ≈ (1/(1 − α)) Σ_{i=1}^{4} VaR_t(k_{i−1}) ∆k   (4.20)
where k_{i−1} denotes the left endpoint of the i-th subinterval.
Finally, by simplifying the expression, the desired result is obtained:
ES_t(α) ≈ (1/4) [VaR_t(α) + VaR_t(0.75α + 0.25) + VaR_t(0.5α + 0.5) + VaR_t(0.25α + 0.75)]   (4.21)
For example, when α = 0.01 the following holds
ES_t(0.01) ≈ (1/4) [VaR_t(0.01) + VaR_t(0.2575) + VaR_t(0.505) + VaR_t(0.7525)]   (4.22)
where the VaR_t(·) terms correspond to the backtested VaR estimates. Therefore, the
various VaR estimates need to be backtested in order to determine whether the ES
passes the backtesting procedure.
A remarkable advantage of this method is that it does not rely on Monte Carlo
simulations [1]. However, due to the fact that this method is based on a linear
approximation of the ES, it may sometimes be difficult to assess how many supporting
points suffice in order to ensure the reliability of the backtesting procedure.
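A sketch of the four-point approximation in Equation 4.21, under the assumption that VaR_t(k) denotes the k-quantile of the loss distribution (the confidence-level convention, so α near 1 addresses the loss tail); the standard normal case allows comparison with the exact ES:

```python
import math
from statistics import NormalDist

def quantile_approx_es(loss_quantile, alpha):
    """Four-point approximation of ES from VaR levels (Equation 4.21):
    ES(alpha) ~ average of the quantiles at the four supporting points."""
    levels = [alpha,
              0.75 * alpha + 0.25,
              0.50 * alpha + 0.50,
              0.25 * alpha + 0.75]
    return sum(loss_quantile(u) for u in levels) / 4.0

# Standard normal loss distribution, confidence-level convention alpha = 0.99
nd = NormalDist()
es_approx = quantile_approx_es(nd.inv_cdf, alpha=0.99)
z = nd.inv_cdf(0.99)
es_exact = math.exp(-z * z / 2) / (math.sqrt(2 * math.pi) * 0.01)
```

Because the quantile function is increasing, the left Riemann sum (about 2.54 here) slightly underestimates the exact ES (about 2.67), which is exactly the supporting-point reliability issue mentioned above.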
4.3.2 Acerbi and Szekely Test
The following collection of non-parametric tests proposed by Acerbi and Szekely
[1] is implemented using Monte Carlo simulations. As the test statistics do not have
predefined distributions, simulations need to be run in order to obtain reliable
empirical distributions.
In this case, the null hypothesis states that the predicted model perfectly fits
the realized model; in that case the estimate of ES passes the backtest.
Finally, it is worth noting that these are one-sided tests. In other words, the null
hypothesis is rejected only if the risk measure underestimates the actual risk. Hence,
the null hypothesis may be accepted even with a risk measure that overestimates
the actual risk.
4.3.2.1 Test I
Invoking Equation 3.5, the following holds
$$\mathrm{ES}_t(\alpha) = -\mathbb{E}\left[X_t \mid X_t + \mathrm{VaR}_t(\alpha) < 0\right] \qquad (4.23)$$
where (X_t)_{t=1}^T corresponds to the series of returns. Rewriting Equation 4.23, the following equation is obtained
$$\mathbb{E}\left[\frac{X_t}{\mathrm{ES}_t(\alpha)} + 1 \,\middle|\, X_t + \mathrm{VaR}_t(\alpha) < 0\right] = 0 \qquad (4.24)$$
Using the definition of the “hit” function I_t(α) from Section 4.2.2, and denoting by T the number of observations and by N_T the number of VaR violations, the following test statistic is defined
$$Z_1(X) = \sum_{t=1}^{T}\frac{X_t I_t}{N_T\,|\mathrm{ES}_t(\alpha)|} + 1 \qquad (4.25)$$
In the next step, the hypothesis testing is implemented by defining the following
$$H_0: P_t^{\alpha} = F_t^{\alpha}, \quad \forall t \qquad (4.26)$$
where P_t^α corresponds to the conditional tail distribution of P_t, which is the predicted distribution of returns (known). Moreover, F_t corresponds to the realized distribution of returns (unknown) and F_t^α denotes its conditional tail distribution.³
For the alternative hypothesis the following holds
$$H_a: \mathrm{ES}_t^{F}(\alpha) \geq \mathrm{ES}_t(\alpha)\;\;\forall t, \qquad \mathrm{VaR}_t^{F}(\alpha) = \mathrm{VaR}_t(\alpha)\;\;\forall t \qquad (4.27)$$
where ES_t^F(α) and VaR_t^F(α) denote the estimated ES and VaR from the realized returns.
Put into words, under the alternative hypothesis the ES is underestimated by the model, while the VaR estimate is not rejected. Therefore, this test is only exposed to the magnitude of the VaR violations and is independent of their frequency [1]. Furthermore,
$$\mathbb{E}_{H_0}[Z_1] = 0 \quad \text{and} \quad \mathbb{E}_{H_1}[Z_1] < 0 \qquad (4.28)$$
Put into words, if the mean of the test statistic Z_1 is 0, then the ES passes the backtest. However, if the mean is significantly below zero, there is enough evidence to show that the ES could be underestimated.

³ Acerbi and Szekely assume that the functions F_t and P_t are continuous.
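The statistic Z_1 can be sketched numerically. The following is a minimal Python illustration (not the authors' code; the thesis uses Matlab), under the assumption that VaR and ES are supplied as positive loss thresholds, so that a violation occurs when X_t + VaR(α) < 0, i.e. X_t < −VaR(α).

```python
import numpy as np

# Minimal sketch of the Acerbi-Szekely Z1 statistic (Equation 4.25).
# Assumption: var and es are positive loss thresholds.
def z1_statistic(returns, var, es):
    hits = returns < -var          # I_t(alpha): VaR violation indicator
    n_t = hits.sum()               # N_T, number of VaR violations
    if n_t == 0:
        return np.nan              # statistic is undefined without violations
    return returns[hits].sum() / (n_t * abs(es)) + 1.0
```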
4.3.2.2 Test II
The second test is based on the unconditional representation of ES, as shown in the following formula
$$\mathrm{ES}_t(\alpha) = -\mathbb{E}\left[\frac{X_t I_t(\alpha)}{\alpha}\right] \qquad (4.29)$$
where (X_t)_{t=1}^T corresponds to the series of returns. Furthermore, I_t(α) corresponds to the “hit” function from Section 4.2.2. After rearranging, the following test statistic holds
$$Z_2(X) = \sum_{t=1}^{T}\frac{X_t I_t}{T\alpha\,|\mathrm{ES}_t(\alpha)|} + 1 \qquad (4.30)$$
As a next step, in order to implement the hypothesis testing the following is defined
$$H_0: P_t^{\alpha} = F_t^{\alpha}, \quad \forall t \qquad (4.31)$$
where P_t^α corresponds to the conditional tail distribution of P_t, which is the predicted distribution of returns (known). Moreover, F_t corresponds to the realized distribution of returns (unknown) and F_t^α denotes its conditional tail distribution.⁴ Put into words, the null hypothesis H_0 describes that the predicted model perfectly fits the realized model. Therefore, the estimate of ES passes the backtest.

⁴ Acerbi and Szekely assume that the functions F_t and P_t are continuous.
For the alternative hypothesis the following holds
$$H_a: \mathrm{ES}_t^{F}(\alpha) \geq \mathrm{ES}_t(\alpha)\;\;\forall t, \qquad \mathrm{VaR}_t^{F}(\alpha) \geq \mathrm{VaR}_t(\alpha)\;\;\forall t \qquad (4.32)$$
where ES_t^F(α) and VaR_t^F(α) denote the estimated ES and VaR from the realized returns. Put into words, the ES is underestimated by the model compared to the realized model. Moreover, the alternative hypothesis rejects ES and VaR jointly. Therefore, this test is affected by both the magnitude and the frequency of the VaR violations.
Additionally,
$$\mathbb{E}_{H_0}[Z_2] = 0 \quad \text{and} \quad \mathbb{E}_{H_1}[Z_2] < 0 \qquad (4.33)$$
Finally, Acerbi and Szekely [1] propose the following relationship between the two test statistics
$$Z_2 = 1 - (1 - Z_1)\,\frac{\sum_{t=1}^{T} I_t(\alpha)}{T\alpha} \qquad (4.34)$$
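As with Z_1, the statistic Z_2 can be sketched numerically. The following minimal Python illustration (not the authors' code; the thesis uses Matlab) assumes VaR and ES are supplied as positive loss thresholds, so a violation occurs when X_t < −VaR(α).

```python
import numpy as np

# Minimal sketch of the Acerbi-Szekely Z2 statistic (Equation 4.30).
# Assumption: var and es are positive loss thresholds.
def z2_statistic(returns, var, es, alpha):
    hits = returns < -var                 # I_t(alpha): VaR violation indicator
    t = len(returns)
    return returns[hits].sum() / (t * alpha * abs(es)) + 1.0
```

For any sample, Z_2 also coincides with 1 − (1 − Z_1)·Σ I_t(α)/(Tα) from Equation 4.34, which can serve as a sanity check of an implementation.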
4.3.2.3 Test III
The following approach is based on Berkowitz [7]. The test analyses whether the observed ranks U_t = P_t(X_t) are i.i.d. U(0, 1). Ideally, P_t(X_t) ∼ U(0, 1).
Acerbi and Szekely [1] use the following definition of ES
$$\mathrm{ES}^{N}_{t}(\alpha) = \mathrm{ES}^{N}_{t,\alpha}(Y) = -\frac{1}{\lfloor N\alpha\rfloor}\sum_{i=1}^{\lfloor N\alpha\rfloor} Y_{i} \qquad (4.35)$$
where N is the number of returns and Y corresponds to the ordered returns. Additionally, the operator ⌊·⌋ corresponds to the floor (lowest integer) operator. In other words, Equation 4.35 corresponds to the average of the ⌊Nα⌋ lowest returns, where Nα is the expected number of exceptions in a sample of size N. Hence, the following test statistic is proposed
$$Z_3 = -\frac{1}{T}\sum_{t=1}^{T}\frac{\mathrm{ES}^{T}_{t,\alpha}\!\left(P_t^{-1}(U)\right)}{\mathbb{E}_V\!\left[\mathrm{ES}^{T}_{t,\alpha}\!\left(P_t^{-1}(V)\right)\right]} + 1 \qquad (4.36)$$
As already stated in Sections 4.3.2.1 and 4.3.2.2, the following holds
$$\mathbb{E}_{H_0}[Z_3] = 0 \quad \text{and} \quad \mathbb{E}_{H_1}[Z_3] < 0 \qquad (4.37)$$
For this case the null hypothesis
$$H_0: P_t = F_t \quad \forall t \qquad (4.38)$$
is tested against the alternative hypothesis
$$H_1: P_t \succeq F_t \quad \forall t \qquad (4.39)$$
where ⪰ stands for weak stochastic dominance.
Chapter 5
Backtesting VaR and ES with the
Generated Data
In this chapter a methodology is proposed in order to analyse the backtesting procedures for VaR and ES using data enriched with the properties of volatility clustering and fat tails. The analysis and results are presented with the potential advantages and drawbacks of dealing with these stylized facts in mind.
Methodology
The methodology is exposed below and is structured as follows.
1. Data generation. As a first step, data is generated based on the GARCH(1,1)
model and Student’s t distribution. Moreover, the generated dataset is divided
in “in-sample” and “out-of-sample”. Specifically, the “in-sample” subset is
used to estimate the VaR and ES and the “out-of-sample” subset is used for
the backtesting procedures.
2. Computation of risk measures. In this step, VaR and ES are estimated based on the “in-sample” subset. Moreover, the use of extra simulations guarantees the robustness of the estimations.
3. Backtesting of risk measures using selected tests. In this step, the
performance of the estimates of VaR and ES calculated in the previous step is
analysed using the “out-of-sample” subset. Specifically, a selection of the tests
exposed in Sections 4.2 and 4.3 are implemented.
4. Analysis of results. As a final step, an analysis is carried out to assess the
statistical significance and viability of the estimations.
5.1 Data Generation
In this section, the stock data of JPMM is used to estimate the parameters for the GARCH(1,1) model and the Student’s t distribution.
Furthermore, the “in-sample” and “out-of-sample” subsets are generated. Specifically, for the “in-sample” set, a total of 10,000 paths, each composed of 7,000 simulations, is calculated to provide a robust workspace for the estimation of VaR and ES in the next section. The “out-of-sample” set is constituted by one path of 10,000 simulations.
It is worth noting that the “out-of-sample” data is independent with respect
to the “in-sample” set. This ensures the independence of the estimation and the
validation of the model.
5.1.1 Volatility Clustering
As already mentioned in Section 2.1.1, the GARCH(1,1) model is used to capture
the volatility clustering effect property that is observed in financial time series.
First, using the daily prices of the stock JPMM from the year 1983 (start of the available financial time series) to 2015¹, the continuously compounded daily returns are calculated as depicted in Figure 5.1.
Figure 5.1: Daily Returns of JPMM
Second, before fitting the GARCH(1,1) process to the JPMM compounded daily
returns, according to Tsay [49], significant autocorrelations need to be eliminated
from the data. In other words, it needs to be tested whether there exist autocorrela-
tions in the JPMM returns. In order to address this, the Ljung-Box test is undertaken
and the graphs of the autocorrelation and partial autocorrelation are inspected (for more information about this test refer to Ljung and Box [35]).

¹ Prices obtained from www.yahoofinance.com.
Moreover, another condition to ensure the suitability of the GARCH(1,1) model
is based on testing whether the residuals are serially correlated as stated by Da
Rocha [19]. Specifically, there needs to be evidence of an outstanding ARCH effect.
This property is proved using Engle’s test (for more information refer to Engle [25]).
Figure 5.2 depicts the sample autocorrelation function as well as the sample partial autocorrelation function. As can be seen, the autocorrelations of the residuals are not significantly different from zero at the significance level α = 0.05. This is corroborated by the Ljung-Box test in Table 5.1. As the p-value of the Ljung-Box test is bigger than the significance level α = 0.05, there is not enough evidence to reject the null hypothesis that the residuals are not serially autocorrelated. Moreover, in the same table, the Engle ARCH effect test presents a p-value of 0 and therefore shows that there exists an ARCH effect in the data. In summary, no further statistical treatment of the JPMM returns series is needed, as it already lacks autocorrelation and possesses the ARCH effect. Given the prior statistical analysis, it is indeed reasonable to use the GARCH(1,1) model to fit the data.
Test statistic Critical value p-value
Ljung-Box test 28.2229 31.4104 0.1042
Engle ARCH effect test 472.3581 3.8415 0
Table 5.1: Statistical Test for the Residuals of the Returns Data Series of JPMM with
α = 0.05
As a next step, the parameters of the GARCH(1,1) model are estimated². Table 5.2 depicts the estimated parameters together with the corresponding standard error.

² The estimation of the parameters is based on the maximum likelihood approach, implemented using the built-in econometrics toolbox in Matlab 2016a.
Figure 5.2: Sample Autocorrelation Function and Sample Partial Autocorrelation Func-
tion
Parameter Estimated value Standard error
ω 0.0321078 0.00404445
β 0.91575 0.0021361
α 0.0830329 0.00164358
Table 5.2: Parameter Estimates, Standard Error and Test Statistic for the Fitted
GARCH(1,1) Model
Figure 5.3 graphs the conditional variance jointly with the returns time series for a selected path. As can be seen from the graph, the volatility clustering effect is present in the simulation of the GARCH(1,1) model³.
Finally, Figure 5.4 graphs the sample correlogram for the conditional variance and
returns respectively for the same path as Figure 5.3. Specifically, for the conditional
variance, there exists a high dependence with respect to the previous variance. That
is expected due to the fact that the GARCH(1,1) model is highly dependent on the
volatility of the previous time step. However, the correlation slowly deteriorates as
the lag between variances increases.
³ The simulations are generated using the Monte Carlo method in the built-in econometrics toolbox in Matlab 2016a.
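The simulation of the fitted model can be sketched as follows. This is a minimal Python illustration (the thesis uses Matlab's built-in toolbox), using the parameter estimates from Table 5.2 and, as a simplifying assumption, standard normal innovations with the recursion started at the unconditional variance.

```python
import numpy as np

# Sketch of simulating the fitted GARCH(1,1) model:
#   r_t = sigma_t * eps_t,  sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2
# Parameters from Table 5.2; Gaussian innovations are an assumption of this sketch.
def simulate_garch11(n, omega=0.0321078, alpha=0.0830329, beta=0.91575, seed=0):
    rng = np.random.default_rng(seed)
    r = np.empty(n)
    var = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    for t in range(n):
        r[t] = np.sqrt(var) * rng.standard_normal()
        var = omega + alpha * r[t] ** 2 + beta * var
    return r
```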
Figure 5.3: Simulated Conditional Variance and Returns for the Fitted GARCH(1,1)
Model for a Selected Path
Figure 5.4: Sample Autocorrelations for the Conditional Variance (up) and Returns
(down)
5.1.2 Fat Tails
As already mentioned in Section 2.2 the Student’s t distribution is used to embody
the fat tails property that is observed in financial time series.
First, using the daily compounded returns of the stock JPMM from 1983 to 2015, the parameters of the Student’s t distribution are estimated⁴.
Table 5.3 depicts the estimated values for the fitted Student’s t distribution model.
Figure 5.5 portrays the fitting process of the Student’s t distribution in the ob-
served empirical data. Specifically, the empirical distribution of the JPMM returns
is graphed in conjunction with the fitted Student’s t distribution with the estimated
parameters of Table 5.3.
Parameter Estimated value Standard error
µ 0.0269772 0.0192303
σ 1.40222 0.021351
ν 2.82064 0.104641
Table 5.3: Estimates and Standard Error of the Fitted Student’s t Distribution
Second, Table 5.4 presents the kurtosis of the Student’s t distribution in comparison with that of a normal distribution. This suggests the presence of fat tails in the estimated model. Finally, the “in-sample” data is simulated⁵.
Distribution Kurtosis
Fitted Student’s t distribution 30.4492
Fitted Normal Distribution 3.0797
Table 5.4: Kurtosis of the Simulated Student’s t Distribution
⁴ The parameters are estimated based on the maximum likelihood approach, implemented using the built-in econometrics toolbox in Matlab 2016a.
⁵ The simulations are generated using the Monte Carlo method (inverse CDF approach) in Matlab 2016a.
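The inverse-CDF simulation can be sketched as follows. This is a Python illustration (the thesis uses Matlab), assuming SciPy is available to evaluate the Student's t quantile function, with the location-scale parameters from Table 5.3.

```python
import numpy as np
from scipy import stats

# Sketch of the inverse-CDF (probability integral transform) approach to
# simulate the fitted location-scale Student's t distribution (Table 5.3).
mu, sigma, nu = 0.0269772, 1.40222, 2.82064

def simulate_t_returns(n, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)                    # U(0,1) draws
    return mu + sigma * stats.t.ppf(u, df=nu)  # inverse CDF (quantile) transform
```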
Figure 5.5: Fitted Student’s t Distribution vs. Empirical Distribution
5.2 Computation of Risk Measures
In this section, the estimations of VaR and ES are calculated with “in-sample”
data generated in Section 5.1.
It is worth noting that the significance levels α = 0.05, 0.025, 0.01 are used to compute the risk measures, as these are the ones most used in practice.
5.2.1 Computation of Value at Risk
The approach used for the calculation of VaR is the Monte Carlo method. As
already described in Section 3.2, one of the most important disadvantages of the
Monte Carlo method is that the computational cost is extremely high [41]. However,
this method is really useful for treating complex processes like the GARCH(1,1)
model.
First, for the GARCH(1,1) model, as there is no predefined distribution that
models the process, an iterative simulation approach is implemented in order to ob-
tain a reliable and robust VaR estimate. Specifically, the VaR estimate is calculated
with the following procedure.
1. For every sample path, the 7,000 simulations are sorted from lowest to highest
simulated returns.
2. In order to find the return that corresponds to the VaR, the corresponding index ι_α is computed with the following formula
$$\iota_\alpha = c(N\alpha) \qquad (5.1)$$
where N corresponds to the total number of simulations, in this case 7,000. Moreover, 1 − α stands for the desired confidence level. Furthermore, c(·) denotes the ceiling function, which rounds a given input value up to the nearest integer; therefore ι_α ∈ ℕ.
3. Once ι_α is calculated, it is substituted back into the corresponding sorted vector to obtain the estimated VaR value. In other words,
$$\mathrm{VaR}^{k}_t(\alpha) = \vartheta^{k}(\iota_\alpha) \qquad (5.2)$$
where ϑ^k(·) corresponds to the sorted vector of returns calculated in the first step and k denotes the current simulation path being analysed (k = 1, ..., 10,000) [52].
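The steps above can be sketched as follows. This is a minimal Python illustration (the thesis uses Matlab) of the per-path VaR estimate in Equations 5.1 and 5.2.

```python
import numpy as np

# Sketch of the per-path Monte Carlo VaR estimate (Equations 5.1-5.2):
# sort one path's simulated returns and read off the ceil(N*alpha)-th value.
def path_var(simulated_returns, alpha=0.05):
    sorted_returns = np.sort(simulated_returns)       # lowest to highest
    iota = int(np.ceil(len(sorted_returns) * alpha))  # index iota_alpha = c(N*alpha)
    return sorted_returns[iota - 1]                   # 1-based index into the sorted vector
```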
The simulations in each path of the GARCH(1,1) model are used to calculate a single VaR estimate. By the Law of Large Numbers (LLN) (refer to Durrett [23] for a definition of the Law of Large Numbers), an estimate based on many simulations converges to its mean, in this case the VaR. Moreover, the LLN applies whenever the random variable (in this case the sampling procedure) has bounded variance. In particular, the GARCH(1,1) model has finite variance, as Equation 2.3 holds for the estimated parameters. Figure 5.6 presents the cumulative mean of the VaR estimates for selected values of α. The graph suggests that the cumulative mean stabilizes as more VaR simulations are averaged.
Sinharay [47] proposes running mean plots as a useful way to validate whether the Monte Carlo method converges appropriately. If the running mean plot stabilizes, the algorithm converges. Figure 5.7 depicts the moving average for selected values of α using a window of 1,000 observations, as well as the overall mean. As can be observed, the moving average values remain stationary and close to the overall mean.
Figure 5.6: Cumulative Mean for the VaR of the GARCH(1,1) Model with Selected Values
of α.
Figure 5.7: Moving Average with a Window of 1,000 Observations for the VaR of the
GARCH(1,1) Model with Selected Values of α.
In the case of the Student’s t distribution, the estimation of VaR is less complex. Specifically, as the Student’s t distribution is a predefined probability distribution with specified parameters, the calculation of VaR reduces to finding the corresponding quantile for the given α.
For the sake of completeness, the same simulation procedure is undertaken as in the GARCH(1,1) model in order to verify that the simulations indeed converge to the quantile value determined by the probability distribution. Simulations within each path are generated using the inverse cumulative distribution function (CDF) approach⁶. Furthermore, the convergence test suggested by Sinharay [47] is redundant for the Student’s t distribution, as it is a parametric distribution with a well-defined probability density function.
Figure 5.8 portrays the cumulative mean taken from the Student’s t simulation. As can be observed, the cumulative mean of the simulations converges to the quantile value of the distribution almost immediately.
Finally, Table 5.5 presents the VaR estimates for the GARCH(1,1) model as well as the Student’s t distribution. As can be seen in the table, for every α the magnitude of the VaR value is higher for the GARCH(1,1) model than for the Student’s t distribution. This may be because the implied GARCH(1,1) simulated distribution possesses a higher frequency of extreme values due to the volatility clustering effect. Moreover, for the Student’s t distribution it could be the case that, although extreme values are theoretically present, their frequency is not high enough to affect the VaR estimate.
⁶ For more information regarding this method to generate random numbers with the desired distribution, refer to a standard statistics book, for example, Dowd [22]. Additionally, the uniform random numbers needed are generated with the rand() function in Matlab 2016a.
Figure 5.8: Cumulative Mean for the VaR of the Student’s t Distribution with Selected
Values of α in Conjunction with the Quantile Values Derived from the Distribution
-VaR
GARCH(1,1) α = 0.05 -5.8043
GARCH(1,1) α = 0.025 -8.2585
GARCH(1,1) α = 0.01 -12.2022
Student’s t distribution α = 0.05 -3.3603
Student’s t distribution α = 0.025 -4.6004
Student’s t distribution α = 0.01 -6.6754
Table 5.5: Estimates of VaR for the GARCH(1,1) Model and the Student’s t Distribution
with the Correspondent α.
5.2.2 Computation of Expected Shortfall
In order to calculate the Expected Shortfall, Equation 3.5 is used. The average
value is taken over all losses that exceed the VaR value calculated in Section 5.2.1.
This method is applied to both the GARCH(1,1) model as well as the Student’s t
distribution. In order to find a reliable estimate of ES, for the GARCH(1,1) model
and the Student’s t distribution, the arithmetic mean is proposed. Specifically, the
mean is calculated as the average of the ES values for the different paths.
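This computation can be sketched as follows. The snippet is a minimal Python illustration (the thesis uses Matlab) in which the per-path ES is the average of the simulated returns at or below the VaR threshold, with the threshold expressed as a negative return level as in Table 5.5.

```python
import numpy as np

# Sketch of the per-path ES estimate (Equation 3.5): average of all
# simulated returns in a path that fall at or below the VaR threshold.
def path_es(simulated_returns, var_threshold):
    tail = simulated_returns[simulated_returns <= var_threshold]
    return tail.mean()
```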
Firstly, Figure 5.9 describes the cumulative average of the GARCH(1,1) ES es-
timate. As it can be seen on the graph, the cumulative mean stabilizes after some
iterations. As already mentioned in Section 5.2.1, Figure 5.10 serves as a visual test
to assess if a convergence is reached for the value of the ES. In other words, a moving
average with a time window of 1,000 observations is implemented to test whether
the moving averages vary with respect to each other. After inspecting the graph, the
moving average indeed remains stationary and it is close to the overall mean.
Secondly, the ES estimate for the Student’s t distribution is computed and analysed. Because the Student’s t distribution is a parametric probability distribution, the convergence of the simulations to the “real value” is expected to be reached rapidly. For the sake of completeness, Figure 5.11 presents the cumulative mean for the ES value of the Student’s t distribution. As expected, the convergence occurs quickly.
Table 5.6 shows that the ES values obtained for the GARCH(1,1) model are
greater in magnitude than the ones calculated for the Student’s t distribution.
It can be noted that the “in-sample” calculations for VaR and ES are stationary
and thus do not take into account the “out-of-sample” innovations. This is important
as risk measures should be exposed to data different from the one that was used to
estimate them. The fact that the risk measure is not constantly morphing to adapt
to new data provides better insights into the potential pitfalls of their performance.
Figure 5.9: Cumulative Mean for the ES of the GARCH(1,1) Model with Selected Values
of α
Figure 5.10: Moving Average with a Window of 1,000 Observations for the ES of the
GARCH(1,1) Model with Selected Values of α.
Figure 5.11: Cumulative Mean for the ES of the Student’s t Distribution with Selected
Values of α.
-VaR -ES |VaR − ES|
GARCH(1,1) α = 0.05 -5.8042 -9.6233 3.8191
GARCH(1,1) α = 0.025 -8.2585 -12.5765 4.3180
GARCH(1,1) α = 0.01 -12.2021 -16.9243 4.7222
Student’s t distribution α = 0.05 -3.3606 -5.6972 2.3366
Student’s t distribution α = 0.025 -4.6004 -7.5066 2.9062
Student’s t distribution α = 0.01 -6.6754 -10.6152 3.9398
Table 5.6: Estimates of VaR and ES for Both Methods with Various Confidence Levels α
5.3 Backtesting Value at Risk and Expected Shortfall Using Selected Tests
In the next step, the VaR and ES estimates from Section 5.2 are backtested
with the “out-of-sample” data. Particularly, a selection of the tests presented in
Sections 4.2 and 4.3 is implemented.
5.3.1 Backtesting Value at Risk
In order to backtest the VaR estimations obtained in Section 5.2.1, a selection of the tests introduced in Section 4.2 is used. Specifically, the collection of Christoffersen’s [12] tests is implemented.
The key reasons why this group of tests is chosen are the following. First, the whole collection provides an overall assessment of the most important properties that VaR should fulfil in order to be a reliable market risk estimate. Second, these tests have a predefined distribution for the test statistic, namely the χ² distribution, which makes the hypothesis testing more robust.
In the first step, the unconditional coverage property is tested using the test introduced in Section 4.2.3.4. In the second step, the Markov test from Section 4.2.4.1 is undertaken to test the independence property. Finally, the conditional coverage property, which assesses the overall performance of the backtesting procedure, is tested using the conditional coverage test exposed in Section 4.2.5.2.
Before proceeding with these tests, the “hit” function, as defined in Section 4.2.2,
is analysed in order to provide further insights into the backtesting procedure. Fig-
ures 5.12 to 5.17 illustrate the performance of the VaR estimate with respect to the
“out-of-sample” data. Figures 5.12a, 5.13a, 5.14a, 5.15a, 5.16a and 5.17a show the
“out-of-sample” data analysed in conjunction with the estimated “in-sample” VaR
threshold. Moreover, these graphs present in red the simulated returns that exceed
the estimated VaR threshold. On the other hand, Figures 5.12b, 5.13b, 5.14b, 5.15b,
5.16b and 5.17b present the cumulative sum of the “hit” function.
In Figures 5.15 to 5.17 the GARCH(1,1) generated data is presented. In Fig-
ures 5.15a, 5.16a and 5.17a it can be seen that the returns show volatility clustering
and therefore the VaR is surpassed consecutively. This leads to the “hit” function
having a drastic jump, as depicted in Figures 5.15b, 5.16b and 5.17b.
In Figures 5.12 to 5.14 the Student’s t distribution generated data is presented. Figures 5.12a, 5.13a and 5.14a show that there is no clear volatility clustering effect in the data.
However, some returns are more extreme than the ones observed in the GARCH(1,1)
generated data. Moreover, Figures 5.12b, 5.13b and 5.14b show that VaR violations
arrive in a more uniform fashion when compared to the GARCH(1,1) data. This may
influence the independence property when implementing Christoffersen’s statistical
test further on.
Finally, it is interesting to note for both models that, as α decreases, the plots behave in a more erratic and discontinuous fashion. In the case of the Student’s t distribution, the graph starts to lose the shape of a straight line. Similarly, for the GARCH(1,1) model, the sudden clusters of VaR violations occur in a more accentuated way.
(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.12: Backtesting VaR with Student’s t Distribution Generated Data with α = 0.05
(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.13: Backtesting VaR with Student’s t Distribution Generated Data with α =
0.025
(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.14: Backtesting VaR with Student’s t Distribution Generated Data with α = 0.01
(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.15: Backtesting VaR with GARCH(1,1) Generated Data with α = 0.05
(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.16: Backtesting VaR with GARCH(1,1) Generated Data with α = 0.025
(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.17: Backtesting VaR with GARCH(1,1) Generated Data with α = 0.01
Now, focus is given to the statistical tests proposed at the beginning of this
section.
Tables A.1 to A.6 provide a summary of the selected statistical tests. The test
statistic, as well as the p-value, are presented in order to test whether the null
hypothesis is true or false.
In this collection of statistical tests the null hypothesis stands for the desired
property. For instance, the null hypothesis in the independence test corresponds to
the fact that VaR violations are independent.
Firstly, Tables A.1 to A.3 show the summarized statistics for the GARCH(1,1)
model. Starting with the unconditional coverage analysis, it can be seen that the
p-value indicates that there is enough evidence to reject the null hypothesis for the
various significance levels. Hence, the “hit” function lacks the property of uncon-
ditional coverage. That is, the total number of exceptions embodied by the “hit”
function does not match the expected theoretical exceptions given by Tα. Likewise, using Equation 4.7, a violation ratio of approximately 0.5 is obtained. That means that fewer VaR violations occur than expected. In summary, the VaR estimate of the GARCH(1,1) model overestimates the “true” risk value. For example, Figure 5.15 shows that when α = 0.05, there are almost 300 observations that surpass VaR, compared to the 10,000 · 0.05 = 500 expected in theory. In practice, this would give a very conservative calculation of the actual risk and may not use resources optimally.
Secondly, Tables A.1 to A.3 show there is enough evidence to reject the hypothesis
that the “hit” function is independent. This behaves in line with what is visualized
in Figures 5.15 to 5.17. Therefore, the volatility clustering effect is solid evidence
that the “hit” function is not independent. This is confirmed by the rule of thumb
in Equation 4.14 which shows a big disparity between the probability of the arrival
of a VaR violation given no violation in the previous period and the probability of
the arrival of a VaR violation given a VaR violation in the previous period.
Finally, the conditional coverage test, which jointly tests for both of the above-stated properties, is not relevant as it was already shown that both are rejected for the GARCH(1,1) model. Hence, the conditional coverage property does not hold.
Proceeding with the analysis of the Student’s t distribution in the next step,
Tables A.4 to A.6 present the statistical tests for this model.
First, the analysis of the unconditional coverage property across all the selected
significance levels does not provide enough evidence to reject the null hypothesis.
In other words, the statistical test supports the fact that the arrivals of VaR violations match the ones stated by the model. This is supported by the fact that the violation ratio, as calculated in Equation 4.7, lies between the desired values of 0.8 and 1.2. For example, when α = 0.05, the actual number of violations is very close to 500 as shown in Figure 5.12b, while the number of theoretical violations is 10,000 · 0.05 = 500.
In summary, the expected VaR violations closely match the “out-of-sample” realized
VaR violations.
Second, the independence property for the Student’s t distribution’s “hit” func-
tion holds as there is not enough evidence to reject the hypothesis that the “hit”
function is independent. Similarly, this is indicated by the close magnitude on both
sides of Equation 4.14. In other words, the probability of the arrival of a VaR vi-
olation given no violation in the previous period compared to the arrival of a VaR
violation given a VaR violation in the previous period is similar.
Finally, the analysis of the conditional coverage property does not show enough evidence to reject the hypothesis that the Student’s t distribution fulfils the conditional coverage property. This is expected, as the test is composed of the unconditional coverage as well as the independence property, which indeed hold.
5.3.2 Backtesting Expected Shortfall
In order to backtest ES estimations, a selection of the tests introduced in Section
4.3 is implemented. Specifically, Test I and II from Acerbi and Szekely [1] are used
for the backtesting procedure. Those tests were selected as they are non-parametric
as mentioned in Section 4.3.2 and, therefore, do not assume any kind of return
distribution. Within the tests, the unconditional coverage for ES is tested. The
independence property does not need to be tested as it is equivalent to the one
calculated in the VaR section.
A significant difference of this backtesting method compared to the methods
implemented for backtesting VaR is that the test statistic does not have a predefined
distribution. As a consequence, simulations need to be implemented in order to
propose a reliable empirical distribution for the test statistic.
Acerbi and Szekely [1] propose the following guideline in order to calculate the empirical p-value.
1. Simulate independent and identically distributed samples from the predicted return distribution, R̊_t^j ∼ R_t, ∀t, ∀j = 1, ..., N, where N corresponds to the number of simulated paths.
2. Compute the test statistic Z^j = Z(R̊^j) based on the simulated returns.
3. Assess the test statistic by calculating its respective empirical p-value, which is determined as follows
$$\rho = \frac{1}{N}\sum_{j=1}^{N}\mathbf{1}\{Z^{j} < Z(R)\} \qquad (5.3)$$
where Z(R) corresponds to the “out-of-sample” realized value of the test statistic.
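The guideline above can be sketched as follows. This is a minimal Python illustration (not the authors' code), where `simulate_statistic` is a hypothetical callable that draws one simulated value of the test statistic under H₀.

```python
import numpy as np

# Sketch of the Monte Carlo empirical p-value (Equation 5.3): simulate the
# test statistic under H0 and count how often it falls below the realized value.
def empirical_p_value(simulate_statistic, z_realized, n_sims=1000, seed=0):
    rng = np.random.default_rng(seed)
    z_sims = np.array([simulate_statistic(rng) for _ in range(n_sims)])
    return np.mean(z_sims < z_realized)
```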
Finally, as already mentioned in Section 4.3.2, the null hypothesis of these tests
states that the ES estimate is a good estimate of the market risk and, therefore,
that the ES estimate passes the backtest. Note, however, that these are one-sided
tests. In other words, the null hypothesis is rejected only if the risk measure
underestimates the actual risk. Hence, the null hypothesis may be accepted even
when the risk measure overestimates the actual risk.
5.3.2.1 Acerbi and Szekely Test I
First, Test I, as explained in Section 4.3.2.1, is carried out. Recall the formula for
the test statistic:

$$Z_1(X) = \frac{1}{N_T} \sum_{t=1}^{T} \frac{X_t I_t}{|ES_t(\alpha)|} + 1$$

Rearranging this expression, the following is obtained:

$$Z_1(X) = \frac{\frac{1}{N_T}\sum_{t=1}^{T} X_t I_t}{|ES_t(\alpha)|} + 1$$

Now the numerator embodies the "out-of-sample" estimate of the ES, while the "in-
sample" estimate is represented in the denominator. Hence, if the "out-of-sample"
and "in-sample" estimates of the ES are identical, $Z_1(X) = 0$. This means that
the "in-sample" estimate of the ES passes the backtest successfully. Conversely,
when there is a significant difference between the two estimates, the null hypothesis
is rejected and the ES estimate has therefore underestimated the actual risk.
Acerbi and Szekely [1] mention that in order to perform this test, an estimate
of VaR needs to be available due to the presence of the indicator $I_t(\alpha)$. Furthermore,
as already introduced in Section 4.3, the authors note that $Z_1(X)$ is an average over
the VaR exceptions; it is therefore sensitive to the magnitude of the exceptions but
not to their frequency.
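The $Z_1$ statistic can be sketched as follows. This is a minimal illustration under my own conventions, not the thesis code: VaR and ES estimates are quoted as positive numbers, returns $X_t$ carry losses as negative values, and the function names are mine.

```python
import numpy as np

def z1_statistic(returns, var, es):
    """Acerbi-Szekely Test I statistic:
        Z1 = (1/N_T) * sum_t X_t * I_t / |ES_t(alpha)| + 1,
    where I_t = 1{X_t < -VaR_t} flags the VaR exceptions.

    returns : realized returns X_t (losses negative)
    var, es : per-period VaR and ES estimates, quoted as positive numbers
    """
    returns, var, es = map(np.asarray, (returns, var, es))
    exceptions = returns < -var              # I_t: VaR violation indicator
    n_t = exceptions.sum()
    if n_t == 0:
        raise ValueError("no VaR exceptions observed; Z1 is undefined")
    return float((returns[exceptions] / np.abs(es[exceptions])).sum() / n_t + 1.0)

# If every exception loss exactly matches the ES estimate, Z1 = 0:
x = np.array([0.010, -0.050, 0.002, -0.050])
q = np.full(4, 0.030)                        # VaR estimates
s = np.full(4, 0.050)                        # ES estimates
z1_statistic(x, q, s)  # → 0.0
```

Consistent with the discussion above, exception losses worse than the ES estimate drive $Z_1$ below zero (risk underestimated), while milder exception losses push it above zero.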
  • 1. Backtesting Value at Risk and Expected Shortfall with Underlying Volatility Clustering and Fat Tails by Stefano Bochicchio Estival BSc A thesis submitted in conformity with the requirements for the degree of Master of Science Department of Mathematics Faculty of Mathematical & Physical Sciences University College London September, 2016
  • 2. Disclaimer I, Stefano Bochicchio Estival, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis. Signature Date 2
  • 3. Abstract Since the financial crisis in 2008, risk management has become one of the most important topics in finance. The need to accurately assess the risk exposure of a financial entity has ignited a discussion between academics and regulators to search for the most accurate and reliable way to measure risk. The most prominent risk measures are Value at Risk (VaR) and Expected Shortfall (ES). Furthermore, back- testing has become an important tool to verify the performance of risk measures. In the context of the behaviour of financial time series, “volatility clustering” and “fat tails” are the most important properties. [15,36]. This motivates the following question: What is the effect of these properties on the backtesting procedure of VaR and ES? The objective of this thesis is to investigate and analyse the backtesting procedure of VaR and ES when exposed to data enriched with the properties of “fat tails” and “volatility clustering”. The structure of this thesis is integrated as follows. First, the GARCH(1,1) model is proposed as a reliable tool that embodies the property of “volatility clus- tering” and the Student t’s distribution as trustworthy model that captures the “fat tails” property. Second, the parameters of the proposed GARCH(1,1) model and the Student t’s distribution are estimated using the JPMM stock data and random 3
  • 4. simulations are generated in order to obtain the “in-sample” and “out-of-sample” subsets. Third, the VaR and ES estimates for both models are computed using the “in-sample” subset. Fourth, VaR is backtested using Christoffersen’s [12] tests, while ES is backtested using Acerbi and Szekely’s [1] Test I and II on the “out-sample” dataset. Finally, the correspondent p-values of the tests are calculated in order to conclude whether the estimated risk measures pass the backtest. In conclusion, the following results were obtained. Regarding the GARCH(1,1) model, the VaR estimate overestimated the expected VaR violations. Additionally, these violations were not independent due to the “volatility clustering” property. Furthermore, The ES estimate indeed passed the backtest but suggested that the “real” risk was overestimated. Regarding the Student’s t distribution, the VaR estimate passed the backtest as the VaR violations were in line with the estimation. Moreover, the violations proved to be independent. Likewise, The ES estimate passed the backtest. Hence, the “fat tails” property did not affect the backtesting procedure for both risk measures. Finally, further lines of investigation are recommended in order to study this topic with a different focus. This thesis was completed under the supervision of Professor Johannes Ruf and Professor Alejandro G´omez. 4
  • 5. Acknowledgments Firstly I would like to thank Prof. Alejandro G´omez for his unconditional support and excellent supervision, I highly appreciate the dedication he showed to this thesis. Secondly I would like to thank Prof. Johannes Ruf for his teachings during this whole year, I am deeply grateful to the help that he provided throughout my Master studies and during this thesis. Thirdly I would like to thank la bandita. Last but not least, I would like to thank my parents and Vanessa for their con- tinuous support. Stefano Bochicchio Estival, University College London, September 2016 5
  • 6. To my family and to Vanessa, thanks for all the unconditional support.
  • 7. Contents Disclaimer 2 Abstract 3 Acknowledgments 5 List of Tables 10 List of Figures 12 1 Introduction 14 2 Properties of Financial Time Series 18 2.1 Volatility Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.1.1 GARCH Model . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.2 Fat Tails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.2.1 Student’s t Distribution . . . . . . . . . . . . . . . . . . . . . 23 3 Properties of Risk Measures and Introduction to VaR and ES 27 3.1 Properties of Risk Measures . . . . . . . . . . . . . . . . . . . . . . . 28 3.1.1 Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.1.2 Elicitability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.1.3 Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.2 Introduction to Value at Risk . . . . . . . . . . . . . . . . . . . . . . 33 3.3 Introduction to Expected Shortfall . . . . . . . . . . . . . . . . . . . 36 7
  • 8. Contents 8 4 Theoretical Background for Backtesting VaR and ES 39 4.1 Statistical Background . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.2 Backtesting Value at Risk . . . . . . . . . . . . . . . . . . . . . . . . 42 4.2.1 Regulatory Framework . . . . . . . . . . . . . . . . . . . . . . 42 4.2.2 Statistical Framework . . . . . . . . . . . . . . . . . . . . . . 43 4.2.3 Unconditional Coverage Tests . . . . . . . . . . . . . . . . . . 45 4.2.3.1 Violation Ratio . . . . . . . . . . . . . . . . . . . . . 45 4.2.3.2 Failure Test . . . . . . . . . . . . . . . . . . . . . . . 46 4.2.3.3 Proportion of Failures (POF) . . . . . . . . . . . . . 47 4.2.3.4 Christoffersen’s Unconditional Coverage Test . . . . 48 4.2.4 Independence Tests . . . . . . . . . . . . . . . . . . . . . . . . 49 4.2.4.1 Christoffersen’s Independence Test (Markov Test) . . 49 4.2.4.2 Christoffersen and Pelletier’s Duration Test . . . . . 50 4.2.5 Conditional Coverage Tests . . . . . . . . . . . . . . . . . . . 51 4.2.5.1 Joint Markov Test . . . . . . . . . . . . . . . . . . . 51 4.2.5.2 Christoffersen’s Conditional Coverage Joint Test . . . 51 4.3 Backtesting Expected Shortfall . . . . . . . . . . . . . . . . . . . . . 52 4.3.1 Quantile Approximation . . . . . . . . . . . . . . . . . . . . . 52 4.3.2 Acerbi and Szekely Test . . . . . . . . . . . . . . . . . . . . . 54 4.3.2.1 Test I . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.3.2.2 Test II . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.3.2.3 Test III . . . . . . . . . . . . . . . . . . . . . . . . . 57 5 Backtesting VaR and ES with the Generated Data 59 5.1 Data Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 5.1.1 Volatility Clustering . . . . . . . . . . . . . . . . . . . . . . . 61 5.1.2 Fat Tails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 5.2 Computation of Risk Measures . . . . . . . . . . . . . . . . . . . . . 67 5.2.1 Computation of Value at Risk . . . . . . . . 
. . . . . . . . . . 67 5.2.2 Computation of Expected Shortfall . . . . . . . . . . . . . . . 72 5.3 Backtesting Value at Risk and Expected Shortfall Using Selected Tests 75 5.3.1 Backtesting Value at Risk . . . . . . . . . . . . . . . . . . . . 75
  • 9. Contents 9 5.3.2 Backtesting Expected Shortfall . . . . . . . . . . . . . . . . . 81 5.3.2.1 Acerbi and Szekely Test I . . . . . . . . . . . . . . . 82 5.3.2.2 Acerbi and Szekely Test II . . . . . . . . . . . . . . . 84 6 Conclusions, limitations and further research 90 A Statistical Tests For VaR Backtesting 95 Bibliography 97
  • 10. List of Tables 2.1 Kurtosis of the FTSE Index with Fitted Normal Distribution . . . . . 23 2.2 Kurtosis of the FTSE Index with Fitted Student’s t Distribution . . . 26 3.1 Summary of Properties for VaR and ES 1 . . . . . . . . . . . . . . . . 38 4.1 Hypothesis Testing Summary Table . . . . . . . . . . . . . . . . . . . 40 4.2 Contingency Table for Christoffersen’s Markov Test 2 . . . . . . . . . 49 5.1 Statistical Test for the Residuals of the Returns Data Series of JPMM with α = 0.05 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 5.2 Parameter Estimates, Standard Error and Test Statistic for the Fitted GARCH(1,1) Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.3 Estimates and Standard Error of the Fitted Student’s t Distribution . 65 5.4 Kurtosis of the Simulated Student’s t Distribution . . . . . . . . . . . 65 5.5 Estimates of VaR for the GARCH(1,1) Model and the Student’s t Distribution with the Correspondent α. . . . . . . . . . . . . . . . . . 71 5.6 Estimates of VaR and ES for Both Methods with Various Confidence Levels α . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 5.7 Test Statistic and p-value for the Z1 (X) Test . . . . . . . . . . . . . . 83 5.8 Test Statistic and p-value for the Z2 (X) Test . . . . . . . . . . . . . . 85 A.1 Statistical Test for the GARCH(1,1) Model with α = 0.05 . . . . . . 95 A.2 Statistical Test for the GARCH(1,1) Model with α = 0.025 . . . . . . 95 A.3 Statistical Test for the GARCH(1,1) Model with α = 0.01 . . . . . . 95 A.4 Statistical Test for Student t’s Distribution with α = 0.05 . . . . . . . 96 A.5 Statistical Test for Student t’s Distribution with α = 0.025 . . . . . . 96 10
  • 11. A.6 Statistical Test for Student t’s Distribution with α = 0.01 . . . . . . . 96 11
  • 12. List of Figures
2.1 FTSE Daily Returns 19
2.2 Empirical and Fitted Normal Distribution of the FTSE Index 24
2.3 Rescale of Empirical and Fitted Normal Distribution of the FTSE Index 24
2.4 Rescale of Empirical and Fitted Student's t Distribution of the FTSE Index 26
3.1 VaR and ES for a Loss Function that is Normally Distributed with µ = 0 and σ² = 1 37
5.1 Daily Returns of JPMM 61
5.2 Sample Autocorrelation Function and Sample Partial Autocorrelation Function 63
5.3 Simulated Conditional Variance and Returns for the Fitted GARCH(1,1) Model for a Selected Path 64
5.4 Sample Autocorrelations for the Conditional Variance (up) and Returns (down) 64
5.5 Fitted Student's t Distribution vs. Empirical Distribution 66
5.6 Cumulative Mean for the VaR of the GARCH(1,1) Model with Selected Values of α 69
5.7 Moving Average with a Window of 1,000 Observations for the VaR of the GARCH(1,1) Model with Selected Values of α 69
5.8 Cumulative Mean for the VaR of the Student's t Distribution with Selected Values of α in Conjunction with the Quantile Values Derived from the Distribution 71
5.9 Cumulative Mean for the ES of the GARCH(1,1) Model with Selected Values of α 73
5.10 Moving Average with a Window of 1,000 Observations for the ES of the GARCH(1,1) Model with Selected Values of α 73
5.11 Cumulative Mean for the ES of the Student's t Distribution with Selected Values of α 74
5.12 Backtesting VaR with Student's t Distribution Generated Data with α = 0.05 77
5.13 Backtesting VaR with Student's t Distribution Generated Data with α = 0.025 77
5.14 Backtesting VaR with Student's t Distribution Generated Data with α = 0.01 77
5.15 Backtesting VaR with GARCH(1,1) Generated Data with α = 0.05 78
5.16 Backtesting VaR with GARCH(1,1) Generated Data with α = 0.025 78
5.17 Backtesting VaR with GARCH(1,1) Generated Data with α = 0.01 78
5.18 Contribution to the Z2(X) Test Statistic for the GARCH(1,1) Model with α = 0.05 87
5.19 Contribution to the Z2(X) Test Statistic for the GARCH(1,1) Model with α = 0.025 87
5.20 Contribution to the Z2(X) Test Statistic for the GARCH(1,1) Model with α = 0.01 88
5.21 Contribution to the Z2(X) Test Statistic for the Student's t Distribution with α = 0.05 88
5.22 Contribution to the Z2(X) Test Statistic for the Student's t Distribution with α = 0.025 89
5.23 Contribution to the Z2(X) Test Statistic for the Student's t Distribution with α = 0.01 89
  • 14. Chapter 1 Introduction Since the financial crisis in 2008, risk management has become one of the most important topics in finance. The need to accurately assess the risk exposure of a financial entity has ignited a discussion between academics and regulators to search for the most accurate and reliable way to measure risk. The most common type of risk is market risk, which measures the sensitivity of the value of a portfolio with respect to changes in the price of the underlying financial products. Another important manifestation of risk is called credit risk, which embodies the risk of not receiving outstanding payments from a financial counterpart due to a default. Within the realm of credit risk there exists a subset called credit counterparty risk which is mainly incurred when trading OTC1 derivatives, as the fulfilment of future cashflows depends directly on a financial counterparty. Moreover, liquidity risk corresponds to the risk that arises when financial positions cannot be opened or closed at the desired prices due to a lack of trading activity in the market. Operational risk measures the risk associated with partial or complete failure of internal processes such as human or computational systems [39]. Within operational risk, legal risk corresponds to the unexpected losses attributed to a defective transaction related to a dispute or legal action against a certain financial entity [38] (for more information about other types 1 OTC stands for “over the counter” which denotes the non-standardized contracts that are not traded in exchanges but directly between counterparties. 14
  • 15. Chapter 1. Introduction 15 of risk, see McNeil et al. [39]). A natural question that arises when dealing with risk is: How can risk be measured? In their seminal article, Emmer et al. [24] mention that the concept of risk measurement is fundamental to the correct management of risk. Specifically, Kou et al. [30] indicate that “a risk measure attempts to assign a single numerical value to the random loss of a portfolio of assets.” The modern history of risk measurement starts with Markowitz in 1952 [37], when he introduced the concept of risk together with the return of a financial product. In his work, Markowitz defines risk as the “standard deviation” of returns [24]. At the end of 1974, the Basel Committee on Banking Supervision (BCBS) was established by the members of the Group of Ten (G-10) countries. Its objective is to ensure global financial stability by setting the minimum regulatory framework for the supervision of the banking industry. In the second agreement of the BCBS in 2004 [4], Value at Risk (VaR) was adopted as the benchmark downside risk measure to quantify the market risk of financial institutions (for more information about other downside risk measures see Nawrocki [40]) [6, 31, 39]. In October 2013, the BCBS [5] proposed a change in its regulations and introduced Expected Shortfall (ES) as a suggested financial risk measure to capture unexpected losses incurred in financial distress [1]. In the financial regulatory framework, risk measures need to be backtested to accurately assess the capital that needs to be set aside in order to cover extreme portfolio losses. When it comes to backtesting VaR, certain standardized tests can be implemented in order to cross-check the current capital requirements, as explained by Campbell [9] and Kupiec [31]. On the other hand, Gneiting [26] and Carver [10] mention that ES is not backtestable due to the fact that it does not fulfil the property of elicitability (see Section
  • 16. Chapter 1. Introduction 16 3.1.2). Nevertheless, Acerbi and Szekely [1], Kerkhof et al. [29] and Costanzino et al. [18] mention that elicitability is not a necessary factor to determine whether a risk measure is backtestable. As a consequence, these authors introduce standardized non-parametric backtesting procedures for ES. In the context of the behaviour of financial time series, the property of “volatility clustering” is frequently exhibited by financial assets, as shown first by Mandelbrot [36] and studied by Cont [16]. Moreover, after the financial crisis of 2008, the property of “fat tails” in the probability distribution of prices has manifested itself in the dynamics of financial markets (for further information see Dash [20]). Now the following question can be asked: What is the effect of these properties on the backtesting procedure of VaR and ES? The objective of this thesis is to investigate and analyse the backtesting procedure of VaR and ES when exposed to data enriched with the properties of “fat tails” and “volatility clustering”. In Chapter 2, the properties of “volatility clustering” and “fat tails” are presented. The GARCH(1,1) model is proposed as a tool that embodies the property of “volatility clustering” and the Student's t distribution is taken as a trustworthy model that captures the “fat tails” property. Moreover, the fulfilment of these properties is empirically evidenced in a specific financial time series, namely the FTSE index. In Chapter 3, a background on the properties that are important for risk measures is presented. Additionally, VaR and ES are introduced. In Chapter 4, a thorough analysis of the backtesting procedures available for VaR and ES is undertaken. In Chapter 5 the methodology of the thesis is introduced.
First, the parameters of the proposed GARCH(1,1) model and the Student's t distribution are estimated using the JPMM stock data and random simulations are generated in order to obtain the “in-sample” and “out-of-sample” subsets. As a next step, the VaR and ES estimates for both models are computed using the “in-sample” subset. Afterwards,
  • 17. Chapter 1. Introduction 17 VaR is backtested using Christoffersen's [12] tests, while ES is backtested using Acerbi and Szekely's [1] Test I and Test II on the “out-of-sample” dataset. Finally, the corresponding p-values of the tests are calculated in order to conclude whether the estimated risk measures pass the backtest.
  • 18. Chapter 2 Properties of Financial Time Series In this chapter, a theoretical background introduces the role of the two most common properties of financial time series: “volatility clustering” and “fat tails”. Furthermore, two models are introduced as catalysts of these properties. In particular, the GARCH(1,1) model is used to capture the “volatility clustering” property and the Student's t distribution is chosen to embody the property of “fat tails”. 18
  • 19. Chapter 2. Properties of Financial Time Series 19 2.1 Volatility Clustering The volatility clustering phenomenon was first described by Mandelbrot [36] as “large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes.” In other words, when volatility is high during a certain period of time it tends to remain high for subsequent periods, and vice versa. Moreover, Cont [16] indicates that the volatility clustering effect corresponds to the fact that financial time series returns are non-linearly dependent in time. As Figure 2.1 illustrates, large returns arrive in consecutive clusters in the FTSE Index. This is a clear manifestation of the volatility clustering effect on financial instruments. Figure 2.1: FTSE Daily Returns1 2.1.1 GARCH Model The GARCH (Generalized Autoregressive Conditional Heteroscedasticity) model was developed by Engle [25] and generalized by Bollerslev [8]. This model is a popular reference in modelling the dynamic variability of time series. Due to the 1 Price data obtained from www.yahoofinance.com.
  • 20. Chapter 2. Properties of Financial Time Series 20 fact that prices fluctuate during periods of financial stress, conditional variances are non-constant. GARCH models have proven to be interesting tools to embody the volatility clustering effect in financial time series [8, 25]. This is due to the fact that in the GARCH model, the present level of volatility depends on the volatility of the previous period. For example, if volatility is high in a previous time step, it suggests that it will still be high in the next time step. Therefore, in the realm of finance, the GARCH model is an appealing option to model financial time series [44,50]. Cont [16] even calls the volatility clustering feature the “GARCH effect”. However, the author mentions that this phenomenon is non-parametric and is not implicitly linked to the GARCH(1,1) model specification. Definition 2.1.1 GARCH process. The process $X_t$ follows a GARCH process composed of $p$ past conditional variances ($\sigma_{t-i}^2$) and $q$ past squared innovations ($X_{t-i}^2$) if
$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i X_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2, \qquad X_t = \sigma_t \varepsilon_t \qquad (2.1)$$
where $\omega \in \mathbb{R}$, $\alpha_i, \beta_i \geq 0$ and $\varepsilon_t \sim N(0, 1)$. Due to both its usefulness and importance in the financial industry [44,46], this thesis focuses on the GARCH(1,1) process, which takes one lag for the past conditional variance ($\sigma_{t-1}^2$) and one lag for the past squared innovation ($X_{t-1}^2$). The GARCH(1,1) model can be represented using Definition 2.1.1 with p = 1 and q = 1:
$$\sigma_t^2 = \omega + \alpha X_{t-1}^2 + \beta \sigma_{t-1}^2 \qquad (2.2)$$
  • 21. Chapter 2. Properties of Financial Time Series 21 where $\omega \in \mathbb{R}$ and $\alpha, \beta \geq 0$. For the purpose of this thesis, the returns of certain financial time series ($X_t$) follow a GARCH(1,1) process and fulfil $X_t \sim N(0, \sigma_t^2)$, where $\sigma_t^2$ satisfies Equation 2.2. Additionally, in order to have a stationary solution for the GARCH(1,1) model, the following condition needs to hold:
$$\alpha + \beta < 1 \qquad (2.3)$$
Lindner [34] argues that the process $X_t$ has a finite variance if, and only if, Equation 2.3 is fulfilled. For the estimation of the parameters, the maximum likelihood approach is usually used to produce the estimated parameters of the model. It is known that
$$X_t \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = \omega + \alpha X_{t-1}^2 + \beta \sigma_{t-1}^2 \qquad (2.4)$$
so in order to find the estimated coefficient vector $\nu = (\omega, \alpha, \beta)^T$, the following is obtained:
$$\nabla L(\nu) = \frac{1}{2} \sum_{t=2}^{n} \left( \frac{X_t^2}{\sigma_t^2} - 1 \right) \frac{1}{\sigma_t} \frac{\partial \sigma_t}{\partial \nu} \qquad (2.5)$$
$$J = -\frac{1}{2} \sum_{t=2}^{n} E\left[ \frac{1}{\sigma_t^2} \frac{\partial \sigma_t}{\partial \nu} \frac{\partial \sigma_t}{\partial \nu^T} \right] \qquad (2.6)$$
where $\nabla L(\nu)$ is the gradient of the log-likelihood function and $J$ is Fisher's Information Matrix. Consequently, the estimated parameters can be found using the iterative scheme from Newton's optimization method (see Yang [54]).
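The recursion in Equation 2.2 can be sketched with a short simulation. The parameters below (ω = 10⁻⁶, α = 0.08, β = 0.90) are hypothetical values chosen only to satisfy the stationarity condition of Equation 2.3, not estimates from the thesis data; the check at the end illustrates the volatility clustering signature, namely that squared returns are positively autocorrelated.

```python
import numpy as np

def simulate_garch11(omega, alpha, beta, n, seed=0):
    """Simulate X_t = sigma_t * eps_t with
    sigma_t^2 = omega + alpha * X_{t-1}^2 + beta * sigma_{t-1}^2 (Equation 2.2)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    sigma2 = np.empty(n)
    x = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    x[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * x[t - 1] ** 2 + beta * sigma2[t - 1]
        x[t] = np.sqrt(sigma2[t]) * eps[t]
    return x, sigma2

# Hypothetical parameters with alpha + beta = 0.98 < 1 (Equation 2.3)
x, sigma2 = simulate_garch11(omega=1e-6, alpha=0.08, beta=0.90, n=5000)

def lag1_autocorr(v):
    """Sample lag-1 autocorrelation."""
    v = v - v.mean()
    return float(np.dot(v[:-1], v[1:]) / np.dot(v, v))

# Volatility clustering: squared returns are positively autocorrelated
# even though the returns themselves are conditionally uncorrelated.
acf_sq = lag1_autocorr(x ** 2)
```

A positive `acf_sq` on the simulated path is the non-parametric “GARCH effect” mentioned by Cont [16].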
  • 22. Chapter 2. Properties of Financial Time Series 22 2.2 Fat Tails The property of fat tails2 in time series has been labelled a stylized fact3 of financial assets, as evidenced by [15,21]. This feature refers to the property that data possesses extreme values that tend to be separated from the mean of the distribution. In other words, the data is underestimated by a normal distribution, as it assigns a low probability to events far from the mean. Therefore, a better treatment can be provided with the use of heavy-tailed distributions such as the Student's t distribution. Cont [15] mentions that the precise behaviour of the tails may sometimes be difficult to determine. From a mathematical viewpoint, the property of fat tails can be represented with the following formula:
$$P(X > x) \sim x^{-\alpha}, \qquad \alpha > 0 \qquad (2.7)$$
In other words, the tail probability of extreme events decays polynomially with exponent $\alpha > 0$. By contrast, the density of a normal distribution decays exponentially in $x^2$, which is faster than any polynomial decay, whereas the Student's t distribution does exhibit a polynomial tail. A useful way to detect the property of fat tails in the data is the kurtosis (normalized fourth moment) of the distribution, which is defined as follows:
$$K_X = \frac{E[(X - \mu)^4]}{(E[(X - \mu)^2])^2} \qquad (2.8)$$
2 The terms fat tails and heavy tails are used interchangeably in the literature and in this thesis. 3 Cont [15] mentions that a stylized fact is defined as “a common denominator among the properties observed in studies of different markets and instruments.”
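Equation 2.8 is straightforward to evaluate on a sample. The sketch below uses synthetic draws (not the FTSE data) to contrast the kurtosis of a normal sample, which is close to the theoretical value of 3, with that of a fat-tailed Student's t sample with 5 degrees of freedom, whose theoretical kurtosis is 3 + 6/(ν − 4) = 9.

```python
import numpy as np

def kurtosis(x):
    """Normalized fourth moment K_X = E[(X-mu)^4] / (E[(X-mu)^2])^2 (Equation 2.8)."""
    x = np.asarray(x, dtype=float)
    c = x - x.mean()
    return float(np.mean(c ** 4) / np.mean(c ** 2) ** 2)

rng = np.random.default_rng(42)
k_normal = kurtosis(rng.standard_normal(100_000))    # theory: 3
k_t5 = kurtosis(rng.standard_t(df=5, size=100_000))  # theory: 3 + 6/(5 - 4) = 9
```

The sample kurtosis of the t draws is noticeably larger than that of the normal draws, mirroring the gap between the empirical FTSE kurtosis and the fitted normal kurtosis reported in Table 2.1.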
  • 23. Chapter 2. Properties of Financial Time Series 23 where µ corresponds to the mean of the random variable X.

Distribution | Kurtosis
Empirical Distribution | 12.5354
Fitted Normal Distribution | 3.0018

Table 2.1: Kurtosis of the FTSE Index with Fitted Normal Distribution

Table 2.1 shows that the kurtosis of the empirical data extracted from the FTSE index is higher than that obtained from the fitted normal distribution. Hence, as already mentioned, the excess kurtosis observed in the FTSE index can't be accurately modelled by a normal distribution. Moreover, Figures 2.2 and 2.3 present the empirical distribution in conjunction with the fitted normal distribution for the FTSE index daily compounded returns. As can be seen in the graphs, the empirical distribution assigns more probability to extreme events in comparison to the fitted normal distribution. Hence, this suggests that the empirical data may possess the property of fat tails, as also observed in Table 2.1. Figure 2.3 presents the events that are higher than µ + 3σ for the empirical distribution. It is known that the fitted normal distribution covers about 99.7% of its area in the interval (µ − 3σ, µ + 3σ). Therefore, events higher than µ + 3σ are extremely unlikely. To put it into perspective, if normal random numbers were drawn every day, the event that a trial lies outside the interval (µ − 6σ, µ + 6σ) would occur once every 1.38 million years. On the contrary, Figure 2.3 shows that this event happened more than once in the last 32 years of FTSE index data. 2.2.1 Student's t Distribution As evidenced in Section 2.2, the normal distribution may underestimate the true underlying behaviour of the returns of financial time series. Therefore, when it comes
  • 24. Chapter 2. Properties of Financial Time Series 24 Figure 2.2: Empirical and Fitted Normal Distribution of the FTSE Index Figure 2.3: Rescale of Empirical and Fitted Normal Distribution of the FTSE Index
  • 25. Chapter 2. Properties of Financial Time Series 25 to model excessive returns in financial time series, the use of fat-tailed distributions seems appropriate. One of the most famous fat-tailed distributions is the Student's t distribution. Stoyanov [48] mentions that the underlying reason why this distribution is so widespread is its simplicity and the easy implementation of a numerical method for its application. Therefore, the Student's t distribution is used as the catalyst to generate the dataset enriched with the property of fat tails. Definition 2.2.1 Student's t distribution. Let X be a random variable. X follows a Student's t distribution with ν degrees of freedom if it has the following probability density function:
$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sigma\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{1}{\nu}\left(\frac{t-\mu}{\sigma}\right)^2\right)^{-\frac{\nu+1}{2}} \qquad (2.9)$$
where µ and σ correspond to the location and the scaling parameters respectively. Moreover, Γ corresponds to the Gamma function, which is defined in Equation 2.10:
$$\Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\,dx \qquad (2.10)$$
Returning to the example concerning the FTSE index, Figure 2.4 shows the same information as Figure 2.3, with the exception that a fitted Student's t distribution is utilized instead. As the graph shows, this distribution takes the extreme values more into consideration in comparison to the fitted normal distribution. Finally, Table 2.2 illustrates the kurtosis of the fitted Student's t distribution compared to the empirical distribution. Clearly, the Student's t distribution matches the actual kurtosis of the empirical data better than the fitted normal distribution of Table 2.1.
  • 26. Chapter 2. Properties of Financial Time Series 26 Figure 2.4: Rescale of Empirical and Fitted Student's t Distribution of the FTSE Index

Distribution | Kurtosis
Empirical Distribution | 12.5354
Fitted Student's t Distribution | 20.2836

Table 2.2: Kurtosis of the FTSE Index with Fitted Student's t Distribution
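One simple way to fit the parameters of Definition 2.2.1 to data is moment matching; the thesis itself relies on maximum likelihood, so the closed-form sketch below on synthetic returns is only an illustrative alternative. For ν > 4 the kurtosis is K = 3 + 6/(ν − 4), so ν = 4 + 6/(K − 3), and Var(X) = σ²ν/(ν − 2) then recovers the scale σ.

```python
import numpy as np

# Synthetic heavy-tailed "returns" (true df = 10, scale = 0.01); a stand-in
# for an empirical return series, not the thesis data.
rng = np.random.default_rng(7)
returns = 0.01 * rng.standard_t(df=10, size=200_000)

# Moment matching: degrees of freedom from the sample kurtosis (Eq. 2.8),
# scale from the sample variance. Valid only when K > 3 (i.e. nu > 4).
c = returns - returns.mean()
K = float(np.mean(c ** 4) / np.mean(c ** 2) ** 2)
df_hat = 4 + 6 / (K - 3)
sigma_hat = float(np.sqrt(np.var(returns) * (df_hat - 2) / df_hat))
```

For very heavy tails (ν ≤ 4) the kurtosis does not exist and this shortcut breaks down, which is one reason maximum likelihood is preferred in practice.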
  • 27. Chapter 3 Properties of Risk Measures and Introduction to VaR and ES In this chapter the theoretical background of risk measures is presented. Moreover, the standard risk measures proposed by the regulators and the industry are introduced, namely VaR and ES. 27
  • 28. Chapter 3. Properties of Risk Measures and Introduction to VaR and ES 28 3.1 Properties of Risk Measures As already mentioned in Chapter 1, “a risk measure attempts to assign a single numerical value to the random loss of a portfolio of assets” (Kou et al. [30]). In this section a formal definition is given in terms of the desired properties that a risk measure must possess. This section is based on the layout presented by Emmer et al. [24]. 3.1.1 Coherence The concept of coherence is important as it groups various mathematical properties that should be taken into account in order to select a suitable risk measure [24]. Specifically, Artzner et al. [3] propose the following four key properties that need to be fulfilled in order for a risk measure to be coherent. Definition 3.1.1 Homogeneity. A certain risk measure ζ(·) is called homogeneous if for all loss variables L and h ≥ 0 it holds that ζ(hL) = hζ(L). Definition 3.1.2 Subadditivity. A certain risk measure ζ(·) is called subadditive if for all loss variables L and K it holds that ζ(L + K) ≤ ζ(L) + ζ(K). Definition 3.1.3 Monotonicity. A certain risk measure ζ(·) is called monotonic if for all loss variables L and K it holds that L ≤ K =⇒ ζ(L) ≤ ζ(K).
  • 29. Chapter 3. Properties of Risk Measures and Introduction to VaR and ES 29 Definition 3.1.4 Translation Invariance. A certain risk measure ζ(·) is called translation invariant if for all loss variables L and δ ∈ ℝ it holds that ζ(L − δ) = ζ(L) − δ. 3.1.2 Elicitability Elicitability [26,32,43] plays an important role in the determination of an appropriate risk measure. Before formalizing the definition of elicitability, the following definitions need to be introduced. Definition 3.1.5 Scoring function. A scoring function is defined as s : ℝ × ℝ → [0, ∞), (x, y) ↦ s(x, y), where x and y correspond to the forecast and the realization respectively. Put into words, a scoring function assigns a numerical score in terms of the distance between the forecasted value and the realized value. For example, this difference could be measured by the squared error s(x, K) = (x − K)² or the absolute error s(x, K) = |x − K|. Definition 3.1.6 Consistency. Let τ be a functional on a class of probability measures P on ℝ: τ : P → 2^ℝ, Q ↦ τ(Q) ⊂ ℝ. A scoring function s : ℝ × ℝ → [0, ∞) is consistent for the functional τ relative to
  • 30. Chapter 3. Properties of Risk Measures and Introduction to VaR and ES 30 the class P if and only if, ∀Q ∈ P, t ∈ τ(Q), x ∈ ℝ and L being the loss random variable defined on (Ω, F, Q), E_Q[s(t, L)] ≤ E_Q[s(x, L)]. Definition 3.1.7 Strict Consistency. A scoring function s is strictly consistent if and only if it is consistent and E_Q[s(t, L)] = E_Q[s(x, L)] =⇒ x ∈ τ(Q). Finally, the definition of elicitability can be introduced. Definition 3.1.8 Elicitability. The functional τ is elicitable relative to P if and only if there exists a scoring function s which is strictly consistent for τ relative to P. This definition is used by Emmer et al. [24]. Moreover, the authors mention that elicitability is a very helpful property for the determination of optimal point forecasts. Hence, if there exists a strictly consistent scoring function s for a functional τ, then the elicited statistic can be written as follows (also used by Acerbi and Szekely [1]): ι(K) = arg min_x E[s(x, K)] (3.1), where s(·, ·) is a scoring function and ι(K) is a statistic of the random variable K. One of the most important properties of elicitability is that it can be utilized to assess the performance of forecast models [26]. It is worth noting that usually elicitability refers to the risk measure itself and not to a functional with respect to the risk measure. In this sense, a “weak” second-order notion of elicitability can be defined as
  • 31. Chapter 3. Properties of Risk Measures and Introduction to VaR and ES 31 follows. Definition 3.1.9 Conditional Elicitability. A functional τ of Q is called conditionally elicitable if there exist functionals κ̃ and κ : D → 2^ℝ with D ⊂ Q × 2^ℝ that satisfy the following: • κ̃ is elicitable relative to Q • (P, κ̃(P)) ∈ D ∀P ∈ Q • ∀c ∈ κ̃(Q) the functional κ_c : Q_c → 2^ℝ, P ↦ κ(P, c) ⊂ ℝ is elicitable relative to Q_c = {P ∈ Q : (P, c) ∈ D}. The property of conditional elicitability is relevant when forecasting risk measures that are not elicitable. 3.1.3 Robustness Robustness refers to the sensitivity of a model to changes in its underlying parameters. A robust risk measure, in a strict sense, is not significantly affected by external or internal shocks. In the risk context, Emmer et al. [24] mention that without robustness, results may not be meaningful, as small measurement errors can lead to big changes in the estimated risk measure. Furthermore, Cont et al. [17] define robustness with a different focus. Specifically, instead of assuming that the sensitivity comes from measurement errors, they attribute it to the actual inflow of new data used to estimate the model. When analysing the robustness of a certain risk measure, a distance should be defined. Emmer et al. [24] propose the following definition.
  • 32. Chapter 3. Properties of Risk Measures and Introduction to VaR and ES 32 Definition 3.1.10 Wasserstein distance. The Wasserstein distance between two probability measures P and Q is
$$D_{ws}(P, Q) = \inf\{E(|X - Y|) : X \sim P, Y \sim Q\} \qquad (3.2)$$
Using the distance of Definition 3.1.10, the definition of robustness can be introduced. Definition 3.1.11 Robustness. A risk measure µ is called robust with respect to the Wasserstein distance if
$$\lim_{n\to\infty} D_{ws}(X_n, X) = 0 \implies \lim_{n\to\infty} |\mu(X_n) - \mu(X)| = 0 \qquad (3.3)$$
where X_n ∼ P_n, n ∈ ℕ, P_n corresponds to a probability measure and µ corresponds to a certain risk measure.
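For empirical distributions on the real line, the distance in Definition 3.1.10 has a convenient form: with two equal-size samples, the optimal coupling pairs the order statistics, so D_ws reduces to the mean absolute difference of the sorted samples. The sketch below (synthetic data) shifts a sample by a small constant and confirms that a quantile-based functional such as VaR moves by exactly that amount, in the spirit of Definition 3.1.11.

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.sort(rng.standard_normal(100_000))
y = np.sort(x + 0.01)  # a small shift, so the two samples are close

# For equal-size empirical samples on R, the 1-Wasserstein distance equals
# the mean absolute difference of the order statistics.
d_ws = float(np.mean(np.abs(x - y)))

# The 5% quantile (the VaR level) of the shifted sample moves by the same 0.01.
var_x = -np.quantile(x, 0.05)
var_y = -np.quantile(y, 0.05)
```

Here both `d_ws` and the change in the quantile equal the shift of 0.01, so small perturbations of the data produce comparably small changes in the estimated risk measure.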
  • 33. Chapter 3. Properties of Risk Measures and Introduction to VaR and ES 33 3.2 Introduction to Value at Risk VaR is the most widespread risk measure in finance. VaR was developed by J.P. Morgan in 1994 with the publication of the RiskMetrics framework. This catapulted it to the status of benchmark risk measure in the industry [28, 41]. Afterwards, as already mentioned in Chapter 1, the Basel Committee on Banking Supervision introduced VaR as the internal benchmark for banks to calculate their capital requirements. Definition 3.2.1 Value at Risk (VaR). A portfolio's Value at Risk (VaR) corresponds to the α quantile of the profit and loss distribution X [9]:
$$\mathrm{VaR}_t(\alpha) = -F^{-1}(\alpha), \qquad F^{-1}(\alpha) = \inf\{x \in \mathbb{R} : F(x) \geq \alpha\} \qquad (3.4)$$
where F^{-1}(α) is the quantile function (inverse CDF1) of the profit and loss distribution. When defining VaR, a confidence level (1 − α) and a time interval (t) must be given. Specifically, t determines the distribution, and from the risk-metric perspective this parameter is introduced for convenience and clarity. As an illustrative example, let t = 1 and α = 0.01. If VaR corresponds to a value of $1,000, then under the mathematical definition the one-day loss of the portfolio will not exceed $1,000 with 99% probability. In spite of the popularity and simplicity of VaR, some shortcomings have been diagnosed for this risk measure. As a first important drawback, VaR does not provide any information regarding the magnitude of the excess loss beyond the α level. This is an important pitfall, as the VaR could underestimate the actual loss of the portfolio [42,45,53]. Second, VaR is criticized for its lack of subadditivity (see Definition 3.1.2) and therefore its lack of coherence (see Section 3.1.1). This result is
  • 34. Chapter 3. Properties of Risk Measures and Introduction to VaR and ES 34 disturbing for the following reason: there may be no direct benefit from diversifying portfolios, as the VaR of the combined portfolio can actually be higher [39,53]. Nevertheless, there are some cases in which VaR is subadditive. For example, according to Haugh [27], VaR is indeed subadditive when dealing with elliptical distributions as well as with distributions that are continuous and symmetric. Regarding the computation of VaR, there exist three major techniques that are commonly implemented. 1. Variance-Covariance Approach. The variance-covariance approach is based on the assumption that returns are normally distributed. Therefore, historical data is taken in order to estimate the parameters of the normal distribution (µ, σ²). Consequently, when quantiles need to be obtained, the calculation simplifies to evaluating the quantiles of the fitted normal distribution. A strong advantage of this method is that it is very flexible and simple to use. Moreover, it facilitates the inclusion of stress scenarios to analyse the sensitivity of the results when parameters are changed [51]. However, the most important pitfall of this technique is that the returns of the portfolio are assumed to be normally distributed. As already explained in Section 2.2, the normal distribution may sometimes underestimate the true behaviour of financial assets. 2. Historical Simulation. As mentioned by O'Brien et al. [42], the Historical Simulation technique is the most popular approach for calculating VaR. This technique is mainly based on the historical information of the financial products that compose a specific portfolio. It is assumed that the weights of the financial instruments in the portfolio do not change over the observation period. In this case, VaR is obtained by inspecting the quantiles of the empirical
  • 35. Chapter 3. Properties of Risk Measures and Introduction to VaR and ES 35 distribution generated by the historical prices. The key advantage of this technique lies in the fact that it is non-parametric. In other words, there is no need to estimate any kind of parameters, as the distribution is based on the historical prices. Moreover, Nieppola [41] mentions that using historical data series can account for the property of “heavy tails” of the distribution. One of the most important pitfalls of this model is that it assumes that the behaviour of past prices is a good model for their behaviour in the future; “driving by looking in the rearview mirror”. Therefore, it assumes that history could repeat itself in the future. For example, Dowd [22] mentions that if the data is unusually quiet, the VaR calculated under the Historical Simulation approach could underestimate the “true risk”. Moreover, another important shortcoming of the model is that, as past prices are the most important input for the model, a long history of data is needed. That could pose a problem when taking into account financial instruments that have a short-lived history [41] (for more information regarding the Historical Simulation approach refer to Dowd [22]). 3. Monte Carlo Simulation. The Monte Carlo Simulation approach, despite being a really powerful VaR calculation technique, is the most challenging technique to implement [22]. The Monte Carlo method relies on the simulation of financial variables which are estimated with respect to market data. Specifically, price paths are simulated at various times to calculate the implied distribution from which VaR estimates can be computed. One of the most important disadvantages of the Monte Carlo approach is that the computational cost is extremely high. In other words, multiple simulated paths need to be generated in order to obtain a robust result, requiring substantial computational memory. This can be crucial when trading in a
  • 36. Chapter 3. Properties of Risk Measures and Introduction to VaR and ES 36 high-frequency environment or when estimating the risk of the whole portfolio of a large bank (for more information about the Monte Carlo technique see Nieppola [41]). 3.3 Introduction to Expected Shortfall As already shown in Section 3.2, one of the main drawbacks of VaR is that it does not take into account the magnitude of the loss. As a consequence, Expected Shortfall2 was introduced as an enhanced risk measure. Definition 3.3.1 Expected Shortfall. Let X be a profit and loss random variable such that E(X) < ∞ with probability density function f_X. The ES can be defined as follows:
$$\mathrm{ES}_t(\alpha) = \frac{1}{1-\alpha} \int_{\alpha}^{1} g_u(f_X)\,du = -E[X \mid X \leq -\mathrm{VaR}_t(\alpha)] \qquad (3.5)$$
where g(·) corresponds to the quantile function, or inverse cumulative density function, of the profit and loss distribution. Put into words, the Expected Shortfall as denoted in Equation 3.5 weights the probability under the tail of the loss distribution for the losses that exceed the VaR threshold. As a consequence, the following relationship holds:
$$|\mathrm{ES}_t(\alpha)| \geq |\mathrm{VaR}_t(\alpha)| \qquad (3.6)$$
2 Expected Shortfall is also defined with a different nomenclature. Across the literature, it is also called Expected Tail Loss, Conditional VaR, Tail VaR, Tail Conditional Expectation, and Worst Conditional Expectation. For more information refer to Acerbi and Tasche [2].
  • 37. Chapter 3. Properties of Risk Measures and Introduction to VaR and ES 37 Figure 3.1: VaR and ES for a Loss Function that is Normally Distributed with µ = 0 and σ² = 1. Figure 3.1 depicts the values of VaR and ES with respect to a normally distributed loss random variable with µ = 0 and σ² = 1. As can be observed in the graph, Equation 3.6 holds. The key advantage of ES with respect to VaR is that it takes into account the magnitude of the loss beyond the VaR threshold and therefore proves to be a more precise measure of the actual exposure to market risk. Moreover, from a mathematical standpoint, ES fulfils all the properties of a coherent risk measure (see Section 3.1.1), as shown by Artzner et al. [3]. Also, Dowd [22] defines ES as “the most attractive coherent risk measure”. Nonetheless, the most troublesome drawback of ES arises from its lack of the elicitability property (see Definition 3.1.8). Specifically, some authors such as Gneiting [26] and Carver [10] mention that ES is not backtestable due to the fact that it is not elicitable. Nevertheless, Acerbi and Szekely [1], Kerkhof et al. [29], and other authors have done substantial work in developing non-parametric standardized backtesting procedures to test ES. In summary, Table 3.1 shows the properties that hold for VaR and ES.
  • 38. Chapter 3. Properties of Risk Measures and Introduction to VaR and ES 38

Property | VaR | ES
Coherence | ✗ | ✓
Robustness | ✓ | ✓
Elicitability | ✓ | ✗
Conditional Elicitability | ✓ | ✓

Table 3.1: Summary of Properties for VaR and ES3 3 Table taken from Emmer et al. [24]
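The elicitability of VaR recorded in Table 3.1 can be made concrete. A quantile is elicited by the “pinball” scoring function s(x, y) = (1{x ≥ y} − α)(x − y), which is strictly consistent for the α-quantile in the sense of Definition 3.1.7; the sketch below (synthetic data) minimizes the average score over a grid of candidate forecasts and recovers the empirical 5% quantile, near −1.645 for a standard normal sample.

```python
import numpy as np

def pinball_score(x, y, alpha):
    """Strictly consistent scoring function for the alpha-quantile:
    s(x, y) = (1{x >= y} - alpha) * (x - y)."""
    return ((x >= y).astype(float) - alpha) * (x - y)

rng = np.random.default_rng(1)
sample = rng.standard_normal(50_000)
alpha = 0.05

# Minimizing the average score over candidate forecasts elicits the quantile
# (Equation 3.1 with the pinball score as s).
candidates = np.linspace(-3.0, 0.0, 601)
avg_scores = [pinball_score(c, sample, alpha).mean() for c in candidates]
best = float(candidates[int(np.argmin(avg_scores))])
```

No such strictly consistent scoring function exists for ES alone, which is precisely the Gneiting [26] obstruction discussed in Section 3.1.2.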
  • 39. Chapter 4 Theoretical Background for Backtesting VaR and ES The concept of backtesting risk measures is crucial for the validation of risk models. Backtesting is essential for supervisors and risk managers who need to assess whether risk measures are well calibrated [22]. Backtesting consists of statistical and quantitative tests that verify whether a certain risk measure (in this case VaR or ES) is consistent with the assumptions of the model. In this chapter the statistical background of hypothesis testing is introduced, and a variety of backtesting procedures from the academic literature is exhibited for VaR and ES. 39
  • 40. Chapter 4. Theoretical Background for Backtesting VaR and ES 40 4.1 Statistical Background When backtesting risk measures, the procedure of hypothesis testing is crucial to assess the performance of risk measures. A hypothesis test normally defines two hypotheses: the null hypothesis (H0) and the alternative hypothesis (Ha). Usually, the objective of hypothesis testing relies on verifying whether the null hypothesis is true.

Decision | H0 True | H0 False
Reject H0 | Type I error (α) | Correct decision
Not Reject H0 | Correct decision | Type II error (β)

Table 4.1: Hypothesis Testing Summary Table

Table 4.1 describes the different cases when testing the null hypothesis. The most troublesome decision that can be made is rejecting the null hypothesis when in fact it is true. This is called a Type I error, and its probability is the significance level (α). Another term that can be found in the literature is “false positive.” Under normal circumstances, this error level is set at the beginning of the test, with values normally ranging from 0.01 to 0.05. The probabilistic interpretation of α is the chance of rejecting the null hypothesis when it is actually true. Moreover, another type of possible error when performing a hypothesis test is the so-called Type II error, or β. Specifically, β corresponds to the probability of accepting the null hypothesis when it is actually false. In the statistical literature, this is called a “false negative”. Finally, the quantity 1 − β is called the “power” of the test, that is, the probability of rejecting the null hypothesis when it is indeed false. Hence, it is desirable to obtain the highest “power” when performing hypothesis testing [11]. As the significance level decreases, it is more probable that the null hypothesis is accepted, and therefore the probability that the “true model” is rejected decreases
(Type I error). Nevertheless, this implies that it is more probable to incorrectly accept the "false" model (Type II error) [22].

When performing hypothesis testing, a test statistic is crucial. A test statistic is basically a function of a given data sample that is used to judge whether the null hypothesis is true or not. Specifically, it is compared to a certain critical value, determined by α, to check whether the null hypothesis can be rejected or not. The usual way of evaluating the test statistic is with the p-value. That is, the p-value (ρ) corresponds to the probability of observing a value at least as extreme as the realized test statistic under the null hypothesis. Hence, if ρ ≤ α holds, the null hypothesis is rejected, and if ρ > α then it is not possible to reject the null hypothesis.

In order to clarify the aforementioned definitions, an illustrative example is shown. Suppose that there is a sample of a certain random variable X.¹ The null hypothesis is that these data points are normally distributed with mean equal to 0 (µ = 0) and a certain standard deviation σ. Therefore, H0 corresponds to X ∼ N(0, σ²), and for Ha it can be said that X ∼ N(φ, σ²) where φ > 0. Now the following test statistic is taken in order to verify whether µ = 0:

z = X̄ / (σ/√n)    (4.1)

where n denotes the sample size, X̄ the sample mean and σ the given standard deviation. This is known as the "z-test." Now considering the case where X̄ = 3, n = 10 and σ = 8, then z = 2.4. Therefore, by setting the significance to α = 0.01, the p-value is

ρ = P(Z > z) = 1 − 0.9918 = 0.0082    (4.2)

¹ That is, samples of the random variable X which are independent and identically distributed.
hence ρ < α. Therefore, under the null hypothesis it is highly unlikely to observe a test statistic larger than 2.4, and hence the null hypothesis can be rejected.

4.2 Backtesting Value at Risk

The following section depicts different popular backtesting models used for VaR. Specifically, the regulatory framework as explained by Campbell [9] is exposed as the principal method used by the BCBS. Moreover, the unconditional coverage tests proposed by Li [33], Jorion [28], Kupiec [31] and Christoffersen [12] are shown. As a complement to the unconditional coverage tests, the independence test in the version of Christoffersen's Markov test [12] as well as Christoffersen and Pelletier's duration test [13] is introduced. Finally, the conditional coverage tests, which cover both the independence and the unconditional coverage property, described by Christoffersen [12] and Christoffersen and Pelletier [13], are presented (for more backtesting tests refer to Nieppola [41] and Campbell [9]).

4.2.1 Regulatory Framework

The regulatory guidelines require banks to calculate the capital needed to be set aside in order to cover non-conventional losses. The amount that should be reserved is denoted in the regulatory nomenclature as the market risk capital (MRC). The MRC is a function of the internal VaR that the financial institution calculates. Specifically, the MRC takes the highest of the following two factors: first, the traditional 1% VaR calculated over a 10-day horizon; second, the 60-day average of the previously reported 1% VaR adjusted by a factor s_t. In a mathematical
perspective, it is defined with the following formula:

MRC_t = max( VaR_t(0.01), s_t · (1/60) Σ_{i=0}^{59} VaR_{t−i}(0.01) ) + c_t    (4.3)

where c_t corresponds to the credit risk associated with the bank's portfolio. Moreover, s_t is a multiplication factor determined by the number of VaR violations in the previous 250 trading days. More specifically,

s_t = 3                  if N ≤ 4
s_t = 3 + 0.2(N − 4)     if 5 ≤ N ≤ 9
s_t = 4                  if N ≥ 10    (4.4)

where N denotes the number of violations exceeding VaR. Put into words, when the factor s_t increases (more violations of VaR in the 250 testing days), the second term inside the maximum in Equation 4.3 augments and therefore the MRC increases. This is logical, as more violations of the VaR point out that the current VaR calculation model may not be accurate and should therefore be adjusted in order to improve the MRC level.

Campbell [9] calls this technique the "traffic light" approach, as the multiplicative factor s_t is divided into three different sets: the "green" light, which logically means the least amount of VaR violations; the "amber" light, which accounts for a higher amount of VaR violations; and the "red" light, which is the maximum value taken by s_t.

4.2.2 Statistical Framework

In this section, the statistical terms that are a common denominator to the backtesting procedures are introduced.

A key term in VaR backtesting is the "hit" function, which counts how many
times the profit and loss realizations during a certain period exceed the VaR estimate. Put into a mathematical context,

I_{t+1}(α) = 1 if X_{t,t+1} ≤ −VaR_t(α)
I_{t+1}(α) = 0 if X_{t,t+1} > −VaR_t(α)    (4.5)

where X_{t,t+1} denotes the profit and loss over the period (t, t + 1). In his work from 1998, Christoffersen [12] mentions that the VaR accuracy can be determined by inspecting whether the "hit" sequence fulfils the following properties.

• Unconditional Coverage Property. The property of unconditional coverage states that the probability that a realized loss exceeds the VaR estimate should be α · 100%. In other words, P(I_{t+1} = 1) = α. As an illustrative example, let α = 0.05. In this case, it would be expected to encounter 5 VaR violations for every 100 realized returns if the VaR estimate is accurate. However, if there were more VaR violations then the VaR estimate may underestimate the "real" risk. On the other hand, if there are fewer than 5 VaR violations, then the VaR estimate may overestimate the "real" risk.

• Independence Property. This property analyses how VaR violations occur. Specifically, the independence property states that two arbitrary elements of the "hit" sequence have to be strictly independent of each other. In other words, the prior history of the "hit" sequence should not convey any kind of information on whether a future "hit" occurs. As an illustrative example, if there is clustering in the data, it may be expected that the "hit" sequence clusters in that same period. In this case, the evidence may suggest that the times of the "hit" sequence are not independent.

• Conditional Coverage Property. This property is mainly a joint test that
considers the unconditional coverage property as well as the independence property simultaneously. Campbell [9] synthesizes this property with the following statement:

I_t(α) ∼ B(α) i.i.d.    (4.6)

where B(α) denotes the Bernoulli distribution with probability α.

4.2.3 Unconditional Coverage Tests

In this section the unconditional coverage tests proposed by Li [33], Jorion [28], Kupiec [31] and Christoffersen [12] are discussed.

4.2.3.1 Violation Ratio

The following test, taken from Li [33], is composed of the following test statistic:

ζ = ( Σ_{t=1}^{T} I_t(α) ) / (T · α)    (4.7)

Put into words, if the VaR estimate were accurate then the numerator of Equation 4.7 would be close to the denominator. Therefore, the sum of the "hit" arrivals would be similar to the theoretically expected number of VaR violations. The rule of thumb used to verify the result is that 0.8 < ζ < 1.2.
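Since the thesis's computations are carried out in Matlab, the following Python sketch is purely illustrative. It implements the "hit" sequence of Equation 4.5 and the violation ratio of Equation 4.7, assuming that realized returns and one-day-ahead VaR forecasts are aligned arrays and that VaR is reported as a positive number; the function names are our own.

```python
import numpy as np

def hit_sequence(returns, var_forecasts):
    """I_t(alpha) from Equation 4.5: 1 when the realized P&L
    falls at or below -VaR_t(alpha), 0 otherwise."""
    return (np.asarray(returns) <= -np.asarray(var_forecasts)).astype(int)

def violation_ratio(returns, var_forecasts, alpha):
    """Equation 4.7: observed violations divided by the expected
    number of violations T * alpha."""
    hits = hit_sequence(returns, var_forecasts)
    return hits.sum() / (len(hits) * alpha)
```

By the rule of thumb quoted above, a ratio between 0.8 and 1.2 is considered acceptable; values above 1.2 point to a VaR model that underestimates risk.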
4.2.3.2 Failure Test

This test, exposed by Jorion [28], records the failure rate, which is calculated as the proportion of the time in which VaR violations occur. Let N be the number of exceptions and T the total number of days analysed. Hence, N/T denotes the failure rate. Ideally, α̂ = N/T should be an unbiased estimator of the violation probability α. The set-up for this test is exactly the testing framework of Bernoulli trials. That is, under the null hypothesis the number of exceptions N is binomially distributed,

P(N = k) = C(T, k) α^k (1 − α)^{T−k}    (4.8)

where the mean and variance are Tα and Tα(1 − α) respectively. In the case where T is large enough, using the Central Limit Theorem (CLT) the binomial distribution can be approximated by the normal distribution:

m = (N − αT) / √(α(1 − α)T)  →d  N(0, 1)    (4.9)

Consequently, m is approximately normally distributed, so the critical values can be obtained directly. For example, if the test is defined with a 95% level (α = 0.05), the correspondent critical value is 1.96.
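The normal approximation of Equation 4.9 can be sketched as follows (Python rather than the thesis's Matlab; the example counts are hypothetical).

```python
import math

def failure_test_statistic(n_exceptions, n_obs, alpha):
    """Equation 4.9: m = (N - alpha*T) / sqrt(alpha*(1-alpha)*T),
    approximately N(0,1) for large T under the null hypothesis."""
    return (n_exceptions - alpha * n_obs) / math.sqrt(alpha * (1 - alpha) * n_obs)

# Hypothetical example: 15 exceptions in 250 trading days at the 5% level
m = failure_test_statistic(15, 250, 0.05)
reject = abs(m) > 1.96   # compare with the 95% critical value
```

Here the observed count of 15 is close to the expected 12.5 exceptions, so the statistic stays well inside the critical bounds and the null hypothesis is not rejected.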
4.2.3.3 Proportion of Failures (POF)

The POF test, proposed by Kupiec [31], is based on the following test statistic (using the notation in [9]):

POF = 2 log( (α̂/α)^{I(α)} · ((1 − α̂)/(1 − α))^{T−I(α)} )

I(α) = Σ_{t=1}^{T} I_t(α),   α̂ = I(α)/T    (4.10)

where T denotes the number of total observations and I_t(α) is the "hit" sequence. By simple inspection of Equation 4.10 it can be seen that, if the empirical probability of VaR violations (α̂) is exactly the same as α, then the POF test statistic collapses to the value of zero. Conversely, when the empirical probability of VaR violations differs from the expected violation rate α, the POF test statistic may indicate that the VaR overestimates or underestimates the actual underlying risk. As an example, for one trading year (i.e. T = 255) with α = 0.03, one would expect to spot on average 7.65 VaR violations. If the actual number of VaR violations were 12, then α̂ = 0.047 and α = 0.03, so the POF statistic would be approximately 2.18.

A normalized version of the POF can be expressed in the following way (using the notation of [9]):

z = √T (α̂ − α) / √(α(1 − α))    (4.11)

As the test statistic z is normally distributed, the hypothesis testing procedure may be undertaken in the traditional way. In other words, the suitable critical point of a normal distribution would be compared to the realized test statistic in order to determine the acceptance or rejection of the null hypothesis.
An advantage of this approach is that the statistic z remains well defined even when there are no VaR violations at all. This fixes an anomaly encountered in the POF statistic of Equation 4.10, which is undefined when there are no VaR violations at all since log(0) is not defined [9].

4.2.3.4 Christoffersen's Unconditional Coverage Test

Christoffersen [12] proposes the following test statistic in order to test the unconditional coverage property:

CUCT = 2 log( α̂^{I(α)} (1 − α̂)^{T−I(α)} ) − 2 log( α^{I(α)} (1 − α)^{T−I(α)} )

I(α) = Σ_{t=1}^{T} I_t(α),   α̂ = I(α)/T    (4.12)

As T → ∞,

CUCT →d χ²(1)    (4.13)

For example, if α = 0.05 then the realized test statistic CUCT would be compared with the critical value of a χ² distribution with one degree of freedom.

In spite of the unconditional coverage tests' simplicity and popularity, they are haunted by an important pitfall: there is no analysis of whether the VaR violations occur in a specific fashion (i.e. they "cluster" in certain periods or occur in pairs). As a consequence, a complementary test needs to exploit the independence between various groups of VaR violations.
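The POF statistic of Equation 4.10 (equivalently, Christoffersen's unconditional coverage statistic of Equation 4.12, which is the same likelihood ratio written as a difference of log-likelihoods) can be sketched in Python as follows. The guard for zero counts mirrors the log(0) remark above, and the function name is our own.

```python
import math

CHI2_1_95 = 3.8415   # 95% critical value of the chi-square distribution, 1 dof

def pof_statistic(n_exceptions, n_obs, alpha):
    """Kupiec's POF likelihood ratio (Equation 4.10), asymptotically
    chi-square(1). Terms with a zero count are dropped, avoiding log(0)."""
    n, t = n_exceptions, n_obs
    p_hat = n / t
    log_lr = 0.0
    if n > 0:
        log_lr += n * math.log(p_hat / alpha)
    if n < t:
        log_lr += (t - n) * math.log((1 - p_hat) / (1 - alpha))
    return 2 * log_lr

# The worked example: 12 violations in T = 255 days with alpha = 0.03
stat = pof_statistic(12, 255, 0.03)
reject = stat > CHI2_1_95
```

With these inputs the statistic falls below the χ²(1) critical value, so the VaR model would not be rejected at the 5% test level.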
4.2.4 Independence Tests

In this section the independence test in the version of Christoffersen's Markov test [12] as well as Christoffersen and Pelletier's duration test [13] is introduced.

4.2.4.1 Christoffersen's Independence Test (Markov Test)

The Markov test inspects the independence property by implementing the following 2 × 2 contingency table:

                   I_t(α) = 0    I_t(α) = 1
I_{t+1}(α) = 0     T1            T3            T1 + T3
I_{t+1}(α) = 1     T2            T4            T2 + T4
                   T1 + T2       T3 + T4       T

Table 4.2: Contingency Table for Christoffersen's Markov Test²

where I_t(α) corresponds to the "hit" sequence as defined in Section 4.2.2. Moreover, T1 and T2 represent the non-violation and violation of the VaR at time t + 1 given that there was no violation in the prior time step, respectively. Conversely, T3 and T4 represent the non-violation and violation of the VaR at time t + 1 given that there was a violation in the prior time step, respectively. Ideally, if the process I_{t+1}(α) is independent, then the following should hold:

T2/(T1 + T2) = T4/(T3 + T4)    (4.14)

In other words, the proportion of VaR violations given that there was no violation in the previous time step should be the same as the proportion of VaR violations given that there was a VaR violation in the previous period. Therefore, the fact that there was or was not a violation in the previous time step does not provide any kind of

² Table taken from [13]
information on whether there is a VaR violation in the current time step, and hence the independence property holds. The test statistic is defined as follows:

CIT = −2 ln[ (1 − π)^{T1+T3} π^{T2+T4} / ( (1 − π0)^{T1} π0^{T2} (1 − π1)^{T3} π1^{T4} ) ]    (4.15)

where

π0 = T2/(T1 + T2),   π1 = T4/(T3 + T4),   π = (T2 + T4)/(T1 + T2 + T3 + T4)    (4.16)

and CIT is asymptotically distributed with a χ² distribution with one degree of freedom. For example, if α = 0.05 then the realized test statistic CIT would be compared with the critical value of a χ² distribution with one degree of freedom.

4.2.4.2 Christoffersen and Pelletier's Duration Test

Christoffersen and Pelletier [13] proposed in 2004 a different approach in order to test the independence property in VaR calculations. If VaR violations are independent of each other, the time elapsed between two VaR violations should be independent of the time that has elapsed since the last violation. In other words, Campbell [9] mentions that the time between VaR violations should not present any type of "duration dependence." Despite the sophistication of this approach, it cannot be depicted in a 2 × 2 contingency table as in the Markov test. Therefore, a whole statistical model has to be estimated for the duration between VaR violations. In their work, Christoffersen and Pelletier [13] propose the exponential distribution as the desired distribution of
the duration between VaR violations, as it possesses the memoryless property.

4.2.5 Conditional Coverage Tests

In order to have a reliable VaR measure, the independence as well as the unconditional coverage property need to be fulfilled. The following section covers the conditional coverage tests proposed by Christoffersen [12] and Christoffersen and Pelletier [13].

4.2.5.1 Joint Markov Test

The joint Markov test is based on the Markov test implemented by Christoffersen [12] (Section 4.2.4.1) combined with the unconditional coverage requirement. Invoking Table 4.2, the joint Markov test proposes the following equality in case the unconditional coverage and the independence property hold:

T2/(T1 + T2) = T4/(T3 + T4) = α    (4.17)

where α is the coverage level of the test. Specifically, the first equality corresponds to the independence property and the equality with α corresponds to the unconditional coverage property.

4.2.5.2 Christoffersen's Conditional Coverage Joint Test

Christoffersen's conditional coverage test is simply the aggregation of the unconditional coverage test [12] and the independence test [12]. The test statistics of both tests are added up to create a new test statistic CCCT:

CCCT = CIT + CUCT    (4.18)
where CCCT is distributed with a χ² distribution with two degrees of freedom. Therefore, the value of the new test statistic is compared to the correspondent critical value of a χ² distribution with 2 degrees of freedom.

4.3 Backtesting Expected Shortfall

In the case of backtesting ES, the procedure is not as direct as VaR backtesting, according to Wimmerstedt [53] and Acerbi and Szekely [1]. Some authors attribute this difficulty to the fact that ES does not fulfil the property of elicitability (see Definition 3.1.2) [26]. In this thesis, the method employed by Emmer et al. [24] and the various tests implemented by Acerbi and Szekely [1] are reviewed (for more backtesting methods refer to Clift et al. [14]).

4.3.1 Quantile Approximation

The following approach is based on a research paper by Acerbi and Tasche [2]. This method is recognized for its simplicity, as it is far less complex than the other approaches used to backtest ES. As a first step, ES is represented in terms of VaR:

ES_t(α) = (1/(1 − α)) ∫_α^1 VaR_t(k) dk    (4.19)
In the next step, dividing the interval [α, 1] into four subintervals of equal length ∆k = (1 − α)/4, the following is obtained:

k0 = [α, α + (1 − α)/4]
k1 = [α + (1 − α)/4, α + (1 − α)/2]
k2 = [α + (1 − α)/2, α + 3(1 − α)/4]
k3 = [α + 3(1 − α)/4, 1]

As a next step, approximating the integral in Equation 4.19 by a left-endpoint Riemann sum over these subintervals, the following holds:

ES_t(α) ≈ (1/(1 − α)) Σ_{i=1}^{4} VaR_t(k_{i−1}) ∆k    (4.20)

where VaR_t(k_{i−1}) denotes the VaR at the left endpoint of the i-th subinterval. Finally, by simplifying the expression (note that ∆k/(1 − α) = 1/4), the desired result is obtained:

ES_t(α) ≈ (1/4) [VaR_t(α) + VaR_t(0.75α + 0.25) + VaR_t(0.5α + 0.5) + VaR_t(0.25α + 0.75)]    (4.21)

For example, when α = 0.01 the following holds:

ES_t(0.01) ≈ (1/4) [VaR_t(0.01) + VaR_t(0.2575) + VaR_t(0.505) + VaR_t(0.7525)]    (4.22)

where the VaR_t(·) terms correspond to the backtested VaR estimates. Therefore, the various VaR estimates need to be backtested in order to determine if the ES passes the backtesting procedure. A remarkable advantage of this method is that it does not rely on Monte Carlo simulations [1]. However, due to the fact that this method is based on a linear approximation of the ES, it may sometimes be difficult to assess how many supporting points suffice in order to ensure the reliability of the backtesting procedure.
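Equation 4.21 reduces to averaging four VaR estimates, which can be sketched as follows (Python, not the thesis's Matlab; `var_fn` is a hypothetical callable returning the backtested VaR estimate at a given level k).

```python
import numpy as np

def es_quantile_approx(var_fn, alpha):
    """Equation 4.21: ES approximated as the mean of the VaR evaluated
    at the four left endpoints of the subintervals of [alpha, 1]."""
    levels = [alpha,
              0.75 * alpha + 0.25,
              0.50 * alpha + 0.50,
              0.25 * alpha + 0.75]
    return np.mean([var_fn(k) for k in levels])
```

With alpha = 0.01 the four levels are 0.01, 0.2575, 0.505 and 0.7525, matching Equation 4.22.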
4.3.2 Acerbi and Szekely Test

The following collection of non-parametric tests proposed by Acerbi and Szekely [1] is implemented using Monte Carlo simulations. As the test statistics do not have a predefined distribution, simulations need to be implemented to obtain a reliable empirical distribution. In this case, the null hypothesis states that the predicted model perfectly fits the realized model; therefore, the estimate of ES passes the backtest. Finally, it is worth noting that these are one-sided tests. In other words, the null hypothesis is rejected only if the risk measure underestimates the actual risk. Hence, the null hypothesis may be accepted with a risk measure that overestimates the actual risk.

4.3.2.1 Test I

Invoking Equation 3.5, the following holds:

ES_t(α) = −E[X_t | X_t + VaR_t(α) < 0]    (4.23)

where (X_t)_{t=1}^{T} corresponds to the series of returns. Rewriting Equation 4.23, the following equation is obtained:

E[ X_t/ES_t(α) + 1 | X_t + VaR_t(α) < 0 ] = 0    (4.24)

Using the definition of the "hit" function I_t(α) from Section 4.2.2, denoting T as the number of observations and N_T as the number of VaR violations, the following test
statistic is defined:

Z1(X) = (1/N_T) Σ_{t=1}^{T} ( X_t I_t(α) / |ES_t(α)| ) + 1    (4.25)

In the next step, the hypothesis testing is implemented by defining the following:

H0: P_t^α = F_t^α,  ∀ t    (4.26)

where P_t^α corresponds to the conditional tail distribution of P_t, which is the predicted distribution of returns (known). Moreover, F_t corresponds to the realized distribution of returns (unknown) and F_t^α denotes its conditional tail distribution.³ For the alternative hypothesis the following holds:

Ha: ES_t^F(α) ≥ ES_t(α) ∀t,  VaR_t^F(α) = VaR_t(α) ∀t    (4.27)

where ES_t^F(α) and VaR_t^F(α) denote the ES and VaR estimated from the realized returns. Put into words, under the alternative hypothesis the ES is underestimated by the model, while the VaR estimate is not rejected. Therefore, this test is only exposed to the magnitude of the violations and is independent of the violations' frequency [1]. Furthermore,

E_{H0}[Z1] = 0 and E_{H1}[Z1] < 0    (4.28)

Put into words, if the mean of the test statistic Z1 is 0 then the ES passes the

³ Acerbi and Szekely assume that the functions F_t and P_t are continuous.
backtest. However, if the mean is different from zero then there is enough evidence to show that the ES could be underestimated.

4.3.2.2 Test II

The second test is based on the unconditional representation of ES, as shown in the following formula:

ES_t(α) = −E[ X_t I_t(α) ] / α    (4.29)

where (X_t)_{t=1}^{T} corresponds to the series of returns and I_t(α) corresponds to the "hit" function from Section 4.2.2. After rearranging, the following test statistic is obtained:

Z2(X) = Σ_{t=1}^{T} ( X_t I_t(α) / (T α |ES_t(α)|) ) + 1    (4.30)

As a next step, in order to implement the hypothesis testing the following is defined:

H0: P_t^α = F_t^α,  ∀ t    (4.31)

where P_t^α corresponds to the conditional tail distribution of P_t, which is the predicted distribution of returns (known). Moreover, F_t corresponds to the realized distribution of returns (unknown) and F_t^α denotes its conditional tail distribution.⁴ Put into words, the null hypothesis H0 states that the predicted model perfectly fits the realized model; therefore, the estimate of ES passes the backtest.

⁴ Acerbi and Szekely assume that the functions F_t and P_t are continuous.
For the alternative hypothesis the following holds:

Ha: ES_t^F(α) ≥ ES_t(α) ∀t,  VaR_t^F(α) ≥ VaR_t(α) ∀t    (4.32)

where ES_t^F(α) and VaR_t^F(α) denote the ES and VaR estimated from the realized returns. Put into words, the ES is underestimated by the model compared to the realized model. Moreover, the alternative hypothesis rejects ES and VaR jointly. Therefore, this test is affected by both the magnitude as well as the frequency of the VaR violations. Additionally,

E_{H0}[Z2] = 0 and E_{H1}[Z2] < 0    (4.33)

Finally, Acerbi and Szekely [1] propose the following relationship between the two test statistics:

Z2 = 1 − (1 − Z1) · ( Σ_{t=1}^{T} I_t(α) ) / (Tα)    (4.34)
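Given series of returns and of (positive) one-day VaR and ES forecasts at level α, the statistics Z1 (Equation 4.25) and Z2 (Equation 4.30) can be sketched as follows. This is a Python sketch with illustrative names and assumed sign conventions; the Monte Carlo simulation of the null distribution needed to turn these statistics into p-values is omitted.

```python
import numpy as np

def acerbi_szekely_z1_z2(returns, var, es, alpha):
    """Z1 (Eq. 4.25) and Z2 (Eq. 4.30). `var` and `es` hold positive
    VaR/ES forecasts; a hit occurs when the return breaches -VaR."""
    returns, var, es = map(np.asarray, (returns, var, es))
    hits = returns <= -var                    # I_t(alpha)
    n_t = int(hits.sum())
    t = len(returns)
    tail = returns[hits] / es[hits]           # X_t / |ES_t(alpha)|, negative terms
    z1 = tail.sum() / n_t + 1 if n_t > 0 else float("nan")
    z2 = tail.sum() / (t * alpha) + 1
    return z1, z2

# Tiny illustration: one violation in four observations at alpha = 0.25
z1, z2 = acerbi_szekely_z1_z2([-3.0, 1.0, -0.5, 2.0], [1.0] * 4, [2.0] * 4, 0.25)
```

Values of Z1 or Z2 well below zero indicate an underestimated ES, and the identity of Equation 4.34, Z2 = 1 − (1 − Z1) · N_T/(Tα), can be checked directly on the output.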
4.3.2.3 Test III

The following approach is based on Berkowitz [7]. The test analyses whether the observed ranks U_t = P_t(X_t) are i.i.d. U(0, 1); ideally, P_t(X_t) ∼ U(0, 1). Acerbi and Szekely [1] use the following definition of ES:

ES_{t,α}^N(Y) = −(1/⌊Nα⌋) Σ_{i=1}^{⌊Nα⌋} Y_i    (4.35)

where N is the number of returns and Y corresponds to the ordered returns. Additionally, the operator ⌊·⌋ corresponds to the floor function, i.e. the largest integer not exceeding its argument. In other words, Equation 4.35 corresponds to the average of the ⌊Nα⌋ lowest returns, where Nα is the expected number of exceptions in a sample of size N. Hence, the following test statistic is proposed:

Z3 = −(1/T) Σ_{t=1}^{T} ( ES_{t,α}^T(P_t^{-1}(U)) / E_V[ES_{t,α}^T(P_t^{-1}(V))] ) + 1    (4.36)

As already stated for Z1 and Z2 in Sections 4.3.2.1 and 4.3.2.2, the following holds:

E_{H0}[Z3] = 0 and E_{H1}[Z3] < 0    (4.37)

In this case the null hypothesis

H0: P_t = F_t ∀t    (4.38)

is tested against the alternative hypothesis

H1: P_t ⪰ F_t ∀t    (4.39)

where ⪰ stands for weak stochastic dominance.
Chapter 5

Backtesting VaR and ES with the Generated Data

In this chapter a methodology is proposed in order to analyse the backtesting procedures for VaR and ES using data enriched with the properties of volatility clustering and fat tails. The analysis and the results are then presented, bearing in mind the potential advantages and/or drawbacks of dealing with these stylized facts.
Methodology

The methodology is exposed below and its structure is as follows.

1. Data generation. As a first step, data is generated based on the GARCH(1,1) model and the Student's t distribution. Moreover, the generated dataset is divided into an "in-sample" and an "out-of-sample" subset. Specifically, the "in-sample" subset is used to estimate the VaR and ES, and the "out-of-sample" subset is used for the backtesting procedures.

2. Computation of risk measures. In this step, a detailed explanation of the estimation of VaR and ES based on the "in-sample" subset is given. Moreover, the use of extra simulations guarantees the robustness of the estimations.

3. Backtesting of risk measures using selected tests. In this step, the performance of the estimates of VaR and ES calculated in the previous step is analysed using the "out-of-sample" subset. Specifically, a selection of the tests exposed in Sections 4.2 and 4.3 is implemented.

4. Analysis of results. As a final step, an analysis is carried out to assess the statistical significance and viability of the estimations.

5.1 Data Generation

In this section, the data of the JPMM stock is used to estimate the parameters for the GARCH(1,1) model and the Student's t distribution. Furthermore, the "in-sample" and "out-of-sample" subsets are generated. Specifically, for the "in-sample" set a total number of 7,000 paths composed of 10,000 simulations is calculated to provide a robust workspace for the estimation of VaR and ES in the next section. The "out-of-sample" set is constituted by one path of 10,000 simulations.
It is worth noting that the "out-of-sample" data is independent of the "in-sample" set. This ensures the independence of the estimation and the validation of the model.

5.1.1 Volatility Clustering

As already mentioned in Section 2.1.1, the GARCH(1,1) model is used to capture the volatility clustering property that is observed in financial time series.

First, using the daily prices of the JPMM stock from the year 1983 (the official start date of the financial time series) to 2015,¹ the daily continuously compounded returns are calculated, as depicted in Figure 5.1.

Figure 5.1: Daily Returns of JPMM

Second, before fitting the GARCH(1,1) process to the JPMM compounded daily returns, according to Tsay [49], significant autocorrelations need to be eliminated from the data. In other words, it needs to be tested whether there exist autocorrelations in the JPMM returns. In order to address this, the Ljung-Box test is undertaken and the graphs of the autocorrelation and partial autocorrelation are inspected (for

¹ Prices obtained from www.yahoofinance.com.
more information about this test refer to Ljung and Box [35]). Moreover, another condition to ensure the suitability of the GARCH(1,1) model is based on testing whether the residuals are serially correlated, as stated by Da Rocha [19]. Specifically, there needs to be evidence of an outstanding ARCH effect. This property is verified using Engle's test (for more information refer to Engle [25]).

Figure 5.2 depicts the sample autocorrelation function as well as the sample partial autocorrelation function. As can be seen, the sample autocorrelations are not significantly different from zero at a significance level of α = 0.05. This can be corroborated with the Ljung-Box test in Table 5.1. As the p-value of the Ljung-Box test is bigger than the significance level α = 0.05, there is not enough evidence to reject the null hypothesis, which states that the residuals are not serially autocorrelated. Moreover, analysing the same table, the Engle ARCH effect test presents a p-value of 0 and therefore shows that there exists an ARCH effect in the data. In summary, there is no further statistical treatment needed for the return series of JPMM, as it already lacks autocorrelation and possesses the ARCH effect. Given the prior statistical analysis, it is indeed reasonable to use the GARCH(1,1) model to fit the data.

Test                     Test statistic   Critical value   p-value
Ljung-Box test           28.2229          31.4104          0.1042
Engle ARCH effect test   472.3581         3.8415           0

Table 5.1: Statistical Tests for the Residuals of the Return Series of JPMM with α = 0.05

As a next step, the parameters of the GARCH(1,1) model are estimated.² Table 5.2 depicts the estimated parameters together with the correspondent standard errors.

² The estimation of the parameters is calculated based on the maximum likelihood approach using the built-in econometrics toolbox in Matlab 2016a.
Figure 5.2: Sample Autocorrelation Function and Sample Partial Autocorrelation Function

Parameter   Estimated value   Standard error
ω           0.0321078         0.00404445
β           0.91575           0.0021361
α           0.0830329         0.00164358

Table 5.2: Parameter Estimates and Standard Errors for the Fitted GARCH(1,1) Model

Figure 5.3 graphs the conditional variance jointly with the return time series for a selected path. As can be seen from the graph, the volatility clustering effect is present in the simulation of the GARCH(1,1) model.³

Finally, Figure 5.4 graphs the sample correlogram for the conditional variance and the returns, respectively, for the same path as Figure 5.3. Specifically, for the conditional variance there exists a high dependence with respect to the previous variance. That is expected due to the fact that the GARCH(1,1) model is highly dependent on the volatility of the previous time step. However, the correlation slowly deteriorates as the lag between variances increases.

³ The simulations are calculated using the Monte Carlo method in the built-in econometrics toolbox in Matlab 2016a.
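The simulation step can be illustrated outside Matlab by writing out the GARCH(1,1) recursion directly. The following Python sketch uses the point estimates from Table 5.2 with standard normal innovations (an assumption made for simplicity; the thesis's toolbox set-up may differ) and checks for volatility clustering through the first-order autocorrelation of squared returns.

```python
import numpy as np

# Point estimates from Table 5.2
OMEGA, BETA, ALPHA = 0.0321078, 0.91575, 0.0830329

def simulate_garch11(n, rng):
    """Simulate r_t = sigma_t * z_t with
    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
    z = rng.standard_normal(n)
    var = np.empty(n)
    r = np.empty(n)
    var[0] = OMEGA / (1.0 - ALPHA - BETA)   # unconditional variance as the seed
    r[0] = np.sqrt(var[0]) * z[0]
    for t in range(1, n):
        var[t] = OMEGA + ALPHA * r[t - 1] ** 2 + BETA * var[t - 1]
        r[t] = np.sqrt(var[t]) * z[t]
    return r, var

rng = np.random.default_rng(0)
returns, cond_var = simulate_garch11(10_000, rng)
# Volatility clustering appears as positive autocorrelation in squared returns
sq = returns ** 2
acf1 = np.corrcoef(sq[:-1], sq[1:])[0, 1]
```

Since α + β ≈ 0.9988 is close to (but below) one, shocks to the conditional variance are highly persistent, which is exactly the slow decay of the correlogram discussed above.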
Figure 5.3: Simulated Conditional Variance and Returns for the Fitted GARCH(1,1) Model for a Selected Path

Figure 5.4: Sample Autocorrelations for the Conditional Variance (top) and Returns (bottom)
5.1.2 Fat Tails

As already mentioned in Section 2.2, the Student's t distribution is used to embody the fat tails property that is observed in financial time series.

First, using the daily compounded returns of the JPMM stock from the year 1983 to 2015, the parameters of the Student's t distribution are estimated.⁴ Table 5.3 depicts the estimated values for the fitted Student's t distribution.

Figure 5.5 portrays the fit of the Student's t distribution to the observed empirical data. Specifically, the empirical distribution of the JPMM returns is graphed in conjunction with the fitted Student's t distribution with the estimated parameters of Table 5.3.

Parameter   Estimated value   Standard error
µ           0.0269772         0.0192303
σ           1.40222           0.021351
ν           2.82064           0.104641

Table 5.3: Estimates and Standard Errors of the Fitted Student's t Distribution

Second, Table 5.4 presents the kurtosis of the simulated Student's t distribution in comparison with that of a normal distribution. This suggests the presence of fat tails in the estimated model. Finally, the "in-sample" data is simulated from the fitted distribution.⁵

Distribution                      Kurtosis
Fitted Student's t distribution   30.4492
Fitted normal distribution        3.0797

Table 5.4: Kurtosis of the Simulated Distributions

⁴ The estimation of the parameters is calculated based on the maximum likelihood approach using the built-in econometrics toolbox in Matlab 2016a.
⁵ The simulations are calculated using the Monte Carlo method (inverse CDF approach) in Matlab 2016a.
Figure 5.5: Fitted Student's t Distribution vs. Empirical Distribution
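The fat-tail effect summarized in Table 5.4 can be reproduced in outline with the fitted parameters of Table 5.3. This Python sketch uses numpy's Student's t sampler in place of the thesis's inverse-CDF Matlab routine; the exact kurtosis figure will differ from Table 5.4 across runs, since with ν ≈ 2.82 the theoretical kurtosis is infinite and the sample kurtosis varies strongly.

```python
import numpy as np

MU, SIGMA, NU = 0.0269772, 1.40222, 2.82064   # Table 5.3 point estimates

def sample_kurtosis(x):
    """Plain (non-excess) sample kurtosis, as reported in Table 5.4."""
    x = np.asarray(x)
    d = x - x.mean()
    return (d ** 4).mean() / ((d ** 2).mean() ** 2)

rng = np.random.default_rng(1)
t_sample = MU + SIGMA * rng.standard_t(NU, size=100_000)   # fat-tailed returns
normal_sample = rng.standard_normal(100_000)

k_t = sample_kurtosis(t_sample)             # far above the Gaussian benchmark
k_normal = sample_kurtosis(normal_sample)   # close to 3
```

Comparing the two kurtosis values reproduces the qualitative message of Table 5.4: the fitted t model places much more mass in the tails than a Gaussian alternative.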
5.2 Computation of Risk Measures

In this section, the estimates of VaR and ES are calculated with the "in-sample" data generated in Section 5.1. It is worth noting that the significance levels α = 0.05, 0.025, 0.01 are used to compute the risk measures, as these are the ones most used in practice.

5.2.1 Computation of Value at Risk

The approach used for the calculation of VaR is the Monte Carlo method. As already described in Section 3.2, one of the most important disadvantages of the Monte Carlo method is its extremely high computational cost [41]. However, the method is very useful for treating complex processes like the GARCH(1,1) model.

First, for the GARCH(1,1) model, as there is no predefined distribution that models the process, an iterative simulation approach is implemented in order to obtain a reliable and robust VaR estimate. Specifically, the VaR estimate is calculated with the following procedure.

1. For every sample path, the 7,000 simulated returns are sorted from lowest to highest.
2. In order to find the return that corresponds to the VaR, the corresponding index ι_α is computed with the following formula

   ι_α = c(Nα)    (5.1)

   where N corresponds to the total number of simulations, in this case 7,000, and 1 − α stands for the desired confidence level. Furthermore, c(·) denotes the ceiling function, which rounds its input up to the nearest integer, so that ι_α ∈ ℕ.
3. Once ι_α is calculated, it is substituted into the corresponding sorted vector to obtain the estimated VaR value. In other words,

   VaR_t^k(α) = ϑ^k(ι_α)    (5.2)

   where ϑ^k(·) corresponds to the sorted vector of returns calculated in the first step and k denotes the simulation path being analysed (k = 1, ..., 10,000) [52].

The simulations for each path of the GARCH(1,1) model are used to calculate a single VaR estimate. By the Law of Large Numbers (LLN) (refer to Durrett [23] for a definition), if the estimate is averaged over many simulated paths, it converges to its mean, in this case the VaR estimate. Moreover, the LLN applies whenever the random variable (here, the sampling procedure) has a bounded variance. In particular, the GARCH(1,1) model has a finite variance, as Equation 2.3 holds for the estimated parameters.

Figure 5.6 presents the cumulative mean of the VaR estimates for selected values of α. The graph suggests that the cumulative mean stabilizes as more VaR simulations are averaged.

Sinharay [47] proposes running mean plots as a useful way to validate whether the Monte Carlo method converges appropriately: if the running mean plot stabilizes, the algorithm has converged. Figure 5.7 depicts the moving average for selected values of α using a window of 1,000 observations as well as the overall mean. As can be observed, the moving average remains stationary and close to the overall mean.
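The three-step procedure above can be sketched as follows. This is an illustrative Python version (the thesis implementation is in Matlab 2016a), and the toy simulated path is a hypothetical stand-in for one GARCH(1,1) sample path.

```python
import math
import numpy as np

def mc_var(simulated_returns, alpha):
    """Monte Carlo VaR for one sample path, following steps 1-3:
    sort the N simulated returns, compute the ceiling index
    i_alpha = c(N * alpha), and read off the corresponding sorted return."""
    sorted_r = np.sort(simulated_returns)   # step 1: lowest to highest
    n = len(sorted_r)
    i_alpha = math.ceil(n * alpha)          # step 2: Equation 5.1
    return sorted_r[i_alpha - 1]            # step 3: Equation 5.2 (1-based index)

rng = np.random.default_rng(1)
# Illustrative stand-in for one path of 7,000 simulated GARCH(1,1) returns;
# the thesis then averages this estimate over k = 1, ..., 10,000 paths.
path = rng.standard_t(df=3, size=7000)
print(f"VaR(0.05) estimate for this path: {mc_var(path, 0.05):.4f}")
```

Note the `i_alpha - 1` shift: Equation 5.1 indexes the sorted vector from 1, while Python arrays are 0-based.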
Figure 5.6: Cumulative Mean for the VaR of the GARCH(1,1) Model with Selected Values of α.

Figure 5.7: Moving Average with a Window of 1,000 Observations for the VaR of the GARCH(1,1) Model with Selected Values of α.
In the case of the Student's t distribution, the estimation of VaR is less complex. Specifically, as the Student's t distribution is a predefined probability distribution with specified parameters, the calculation of VaR reduces to finding the corresponding quantile for a given α.

For the sake of completeness, the same simulation procedure is undertaken as for the GARCH(1,1) model, in order to verify that the simulations indeed converge to the quantile value given by the probability distribution. Simulations within each path are generated using the inverse cumulative distribution function (CDF) approach⁶. Furthermore, the convergence test suggested by Sinharay [47] is redundant when implemented for the Student's t distribution, as it is a parametric distribution with a well-defined probability density function.

Figure 5.8 portrays the cumulative mean taken from the Student's t simulation. As can be observed, the cumulative mean of the simulations converges to the quantile value of the distribution almost immediately.

Finally, Table 5.5 presents the VaR estimates for the GARCH(1,1) model as well as the Student's t distribution. As can be seen in the table, the magnitude of the VaR values for every α is higher for the GARCH(1,1) model than for the Student's t distribution. This may be due to the fact that the implied GARCH(1,1) simulated distribution possesses a higher frequency of extreme values due to the volatility clustering effect. For the Student's t distribution, by contrast, although extreme values are theoretically present, their frequency is probably not high enough to affect the VaR estimate.

⁶ For more information regarding this method of generating random numbers with the desired distribution, refer to a standard statistics book, for example, Dowd [22].
Additionally, the uniform random numbers needed are generated with the rand() function in Matlab 2016a.
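The inverse-CDF approach described above can be sketched as follows. This is an illustrative Python sketch rather than the Matlab rand()-based implementation, with the fitted parameter values of Table 5.3 assumed.

```python
import numpy as np
from scipy import stats

# Fitted parameters from Table 5.3 (assumed here for illustration).
nu, mu, sigma = 2.82064, 0.0269772, 1.40222
alpha = 0.05

# Inverse-CDF sampling: uniform draws mapped through the t quantile function,
# the analogue of feeding Matlab's rand() output into the inverse CDF.
rng = np.random.default_rng(2)
u = rng.uniform(size=7000)
simulated = stats.t.ppf(u, df=nu, loc=mu, scale=sigma)

# For a parametric distribution, VaR is simply the alpha-quantile ...
var_exact = stats.t.ppf(alpha, df=nu, loc=mu, scale=sigma)
# ... and the empirical quantile of the simulations converges to it.
var_mc = np.quantile(simulated, alpha)
print(f"quantile VaR = {var_exact:.4f}, simulated VaR = {var_mc:.4f}")
```

The exact quantile here is the "real value" the cumulative mean in Figure 5.8 converges to.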
Figure 5.8: Cumulative Mean for the VaR of the Student's t Distribution with Selected Values of α in Conjunction with the Quantile Values Derived from the Distribution

Model                                     -VaR
GARCH(1,1), α = 0.05                      -5.8043
GARCH(1,1), α = 0.025                     -8.2585
GARCH(1,1), α = 0.01                      -12.2022
Student's t distribution, α = 0.05        -3.3603
Student's t distribution, α = 0.025       -4.6004
Student's t distribution, α = 0.01        -6.6754

Table 5.5: Estimates of VaR for the GARCH(1,1) Model and the Student's t Distribution with the Corresponding α.
5.2.2 Computation of Expected Shortfall

In order to calculate the Expected Shortfall, Equation 3.5 is used: the average is taken over all losses that exceed the VaR value calculated in Section 5.2.1. This method is applied to both the GARCH(1,1) model and the Student's t distribution. In order to find a reliable estimate of ES for both models, the arithmetic mean is proposed; specifically, the ES estimate is calculated as the average of the ES values over the different paths.

Firstly, Figure 5.9 describes the cumulative average of the GARCH(1,1) ES estimate. As can be seen in the graph, the cumulative mean stabilizes after some iterations. As already mentioned in Section 5.2.1, Figure 5.10 serves as a visual test of whether convergence is reached for the value of the ES: a moving average with a time window of 1,000 observations is implemented to test whether the moving averages vary with respect to each other. After inspecting the graph, the moving average indeed remains stationary and close to the overall mean.

Secondly, the ES estimate for the Student's t distribution is computed and analysed. Because the Student's t distribution is a parametric probability distribution, convergence of the simulations to the "real value" is expected to be reached rapidly. For the sake of completeness, Figure 5.11 presents the cumulative mean for the ES value of the Student's t distribution. As expected, convergence occurs quickly.

Table 5.6 shows that the ES values obtained for the GARCH(1,1) model are greater in magnitude than those calculated for the Student's t distribution.

It can be noted that the "in-sample" calculations of VaR and ES are kept fixed and thus do not take into account the "out-of-sample" innovations.
This is important, as risk measures should be exposed to data different from the data used to estimate them. The fact that the risk measure is not constantly adapting to new data provides better insight into the potential pitfalls of its performance.
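The per-path ES computation and the averaging across paths described above can be sketched as follows. This is an illustrative Python sketch (the thesis uses Matlab), with 10 short toy paths standing in for the 10,000 paths of 7,000 draws.

```python
import numpy as np

def mc_es(simulated_returns, alpha):
    """ES per Equation 3.5: the average of all simulated losses that
    exceed the VaR threshold obtained as in Section 5.2.1."""
    var = np.quantile(simulated_returns, alpha)       # VaR threshold
    tail = simulated_returns[simulated_returns <= var]  # losses beyond VaR
    return tail.mean()

rng = np.random.default_rng(4)
# Per-path ES values averaged across paths, as proposed in the text
# (toy dimensions: 10 paths of 7,000 simulated returns each).
paths = rng.standard_t(df=3, size=(10, 7000))
es_per_path = np.array([mc_es(p, 0.05) for p in paths])
print(f"averaged ES estimate: {es_per_path.mean():.4f}")
```

Because ES averages the whole tail beyond VaR, each per-path ES is necessarily more negative than the corresponding per-path VaR.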
Figure 5.9: Cumulative Mean for the ES of the GARCH(1,1) Model with Selected Values of α

Figure 5.10: Moving Average with a Window of 1,000 Observations for the ES of the GARCH(1,1) Model with Selected Values of α.
Figure 5.11: Cumulative Mean for the ES of the Student's t Distribution with Selected Values of α.

Model                                  -VaR       -ES        |VaR − ES|
GARCH(1,1), α = 0.05                   -5.8042    -9.6233    3.8191
GARCH(1,1), α = 0.025                  -8.2585    -12.5765   4.3180
GARCH(1,1), α = 0.01                   -12.2021   -16.9243   4.7222
Student's t distribution, α = 0.05     -3.3606    -5.6972    2.3366
Student's t distribution, α = 0.025    -4.6004    -7.5066    2.9062
Student's t distribution, α = 0.01     -6.6754    -10.6152   3.9398

Table 5.6: Estimates of VaR and ES for Both Methods with Various Confidence Levels α
5.3 Backtesting Value at Risk and Expected Shortfall Using Selected Tests

In the next step, the VaR and ES estimates from Section 5.2 are backtested with the "out-of-sample" data. Particularly, a selection of the tests presented in Sections 4.2 and 4.3 is implemented.

5.3.1 Backtesting Value at Risk

In order to backtest the VaR estimates obtained in Section 5.2.1, a selection of the tests introduced in Section 4.2 is implemented; specifically, the collection of Christoffersen's [12] tests. The key reasons for choosing this group of tests are the following. First, the collection provides an overall assessment of the most important properties that VaR should fulfil in order to be a reliable market risk estimate. Second, these tests have a predefined distribution for the test statistic, namely the χ² distribution, which makes the hypothesis testing more robust.

In the first step, the unconditional coverage property is tested using the test introduced in Section 4.2.3.4. In the second step, Markov's test from Section 4.2.4.1 is undertaken to assess the independence property. Finally, the conditional coverage property, which assesses the overall performance of the backtesting procedure, is tested using the conditional coverage test presented in Section 4.2.5.2.

Before proceeding with these tests, the "hit" function, as defined in Section 4.2.2, is analysed in order to provide further insight into the backtesting procedure. Figures 5.12 to 5.17 illustrate the performance of the VaR estimate with respect to the "out-of-sample" data. Figures 5.12a, 5.13a, 5.14a, 5.15a, 5.16a and 5.17a show the "out-of-sample" data in conjunction with the estimated "in-sample" VaR
threshold. Moreover, these graphs present in red the simulated returns that exceed the estimated VaR threshold. On the other hand, Figures 5.12b, 5.13b, 5.14b, 5.15b, 5.16b and 5.17b present the cumulative sum of the "hit" function.

Figures 5.15 to 5.17 present the GARCH(1,1) generated data. In Figures 5.15a, 5.16a and 5.17a it can be seen that the returns show volatility clustering and therefore the VaR is surpassed consecutively. This leads to the "hit" function having drastic jumps, as depicted in Figures 5.15b, 5.16b and 5.17b.

Figures 5.12 to 5.14 present the Student's t distribution generated data. Figures 5.12a, 5.13a and 5.14a show that there is no clear volatility clustering effect in the data; however, some returns are more extreme than those observed in the GARCH(1,1) generated data. Moreover, Figures 5.12b, 5.13b and 5.14b show that VaR violations arrive in a more uniform fashion compared to the GARCH(1,1) data. This may influence the independence property when Christoffersen's statistical test is implemented further on.

Finally, it is interesting to note for both models that, as α decreases, the plots behave in a more erratic and discontinuous fashion. In the case of the Student's t distribution, the graph starts to lose the shape of a straight line; similarly, for the GARCH(1,1) model, the sudden clusters of VaR violations occur in a more accentuated way.
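The "hit" function and the cumulative sums plotted in Figures 5.12 to 5.17 can be sketched as follows. This is an illustrative Python sketch, assuming the Table 5.3 parameters for the out-of-sample draws and the Table 5.5 VaR threshold at α = 0.05.

```python
import numpy as np

def hit_function(out_of_sample, var_estimate):
    """'Hit' function of Section 4.2.2: 1 when a return falls below the
    fixed in-sample VaR threshold, 0 otherwise."""
    return (np.asarray(out_of_sample) < var_estimate).astype(int)

rng = np.random.default_rng(3)
# Illustrative out-of-sample Student's t returns (Table 5.3 parameters);
# the cumulative sum of the hits is the (roughly straight) line plotted
# in Figures 5.12b-5.14b.
r = 0.027 + 1.402 * rng.standard_t(df=2.82, size=10000)
hits = hit_function(r, var_estimate=-3.3603)   # Table 5.5, alpha = 0.05
cumulative = np.cumsum(hits)
print(f"violations: {hits.sum()} (about 500 expected at alpha = 0.05)")
```

For GARCH(1,1) data the same function produces the step-like jumps of Figures 5.15b to 5.17b, since clustered volatility makes consecutive hits arrive in bursts.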
(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.12: Backtesting VaR with Student's t Distribution Generated Data with α = 0.05

(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.13: Backtesting VaR with Student's t Distribution Generated Data with α = 0.025

(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.14: Backtesting VaR with Student's t Distribution Generated Data with α = 0.01
(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.15: Backtesting VaR with GARCH(1,1) Generated Data with α = 0.05

(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.16: Backtesting VaR with GARCH(1,1) Generated Data with α = 0.025

(a) Returns and VaR Estimate (b) Sum of VaR Violations
Figure 5.17: Backtesting VaR with GARCH(1,1) Generated Data with α = 0.01
Now, focus is given to the statistical tests proposed at the beginning of this section. Tables A.1 to A.6 provide a summary of the selected statistical tests: the test statistic as well as the p-value are presented in order to test whether the null hypothesis can be rejected. In this collection of statistical tests, the null hypothesis stands for the desired property; for instance, the null hypothesis of the independence test is that VaR violations are independent.

Firstly, Tables A.1 to A.3 show the summarized statistics for the GARCH(1,1) model. Starting with the unconditional coverage analysis, the p-value indicates that there is enough evidence to reject the null hypothesis at the various significance levels. Hence, the "hit" function lacks the unconditional coverage property; that is, the total number of exceptions embodied by the "hit" function does not match the expected theoretical number of exceptions given by Tα. Likewise, using Equation 4.7, a violation ratio of approximately 0.5 is obtained, meaning that fewer VaR violations occur than expected. In summary, the VaR estimate of the GARCH(1,1) model overestimates the "true" risk value. For example, Figure 5.15 shows that when α = 0.05, almost 300 observations surpass the VaR, compared to the 10,000 · 0.05 = 500 expected in theory. In practice, this would give a very conservative calculation of the actual risk and may lead to a suboptimal use of resources.

Secondly, Tables A.1 to A.3 show there is enough evidence to reject the hypothesis that the "hit" function is independent. This is in line with what is visualized in Figures 5.15 to 5.17: the volatility clustering effect provides solid evidence that the "hit" function is not independent.
This is confirmed by the rule of thumb in Equation 4.14, which shows a large disparity between the probability of the arrival of a VaR violation given no violation in the previous period and the probability of
the arrival of a VaR violation given a VaR violation in the previous period.

Finally, the conditional coverage test, which jointly tests both of the above-stated properties, is redundant here, as both have already been rejected for the GARCH(1,1) model. Hence, the conditional coverage property does not hold.

Proceeding with the analysis of the Student's t distribution, Tables A.4 to A.6 present the statistical tests for this model. First, the analysis of the unconditional coverage property across all the selected significance levels does not provide enough evidence to reject the null hypothesis. In other words, the statistical test supports the fact that the arrivals of VaR violations match those stated by the model. This is supported by the fact that the violation ratio, as calculated in Equation 4.7, lies between the desired bounds of 0.8 and 1.2. For example, when α = 0.05, the actual number of violations is very close to 500, as shown in Figure 5.12b, while the number of theoretical violations is 10,000 · 0.05 = 500. In summary, the expected VaR violations closely match the "out-of-sample" realized VaR violations.

Second, the independence property of the Student's t distribution's "hit" function holds, as there is not enough evidence to reject the hypothesis that the "hit" function is independent. Similarly, this is indicated by the close magnitude of both sides of Equation 4.14: the probability of the arrival of a VaR violation given no violation in the previous period is similar to that given a VaR violation in the previous period.

Finally, the analysis of the conditional coverage property does not show enough evidence to reject the hypothesis that the Student's t distribution fulfils the conditional coverage property.
This is expected, as the test combines the unconditional coverage and the independence properties, both of which indeed hold.
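The three likelihood-ratio tests used in this subsection can be sketched as follows. This is an illustrative Python implementation in the spirit of Christoffersen's tests (the thesis uses Matlab, and the exact formulations are those of Sections 4.2.3.4, 4.2.4.1 and 4.2.5.2); the periodic toy hit sequence is an assumption for the example.

```python
import numpy as np
from scipy import stats
from scipy.special import xlogy   # xlogy(0, 0) = 0 handles empty cells

def christoffersen_tests(hits, alpha):
    """Unconditional coverage (LR_uc ~ chi2(1)), independence of the
    first-order Markov chain of hits (LR_ind ~ chi2(1)), and conditional
    coverage (LR_cc = LR_uc + LR_ind ~ chi2(2)). Returns (stat, p-value)."""
    hits = np.asarray(hits, dtype=int)
    T, x = len(hits), int(hits.sum())
    pi = x / T
    # LR_uc: observed violation rate pi versus the nominal rate alpha.
    lr_uc = -2 * (xlogy(T - x, (1 - alpha) / (1 - pi)) + xlogy(x, alpha / pi))
    # Transition counts of the Markov chain of hits.
    prev, curr = hits[:-1], hits[1:]
    n00 = np.sum((prev == 0) & (curr == 0)); n01 = np.sum((prev == 0) & (curr == 1))
    n10 = np.sum((prev == 1) & (curr == 0)); n11 = np.sum((prev == 1) & (curr == 1))
    pi01, pi11 = n01 / (n00 + n01), n11 / (n10 + n11)
    pi1 = (n01 + n11) / (T - 1)
    def loglik(p, stay, move):
        return xlogy(stay, 1 - p) + xlogy(move, p)
    # LR_ind: one common transition probability versus two state-dependent ones.
    lr_ind = -2 * (loglik(pi1, n00 + n10, n01 + n11)
                   - loglik(pi01, n00, n01) - loglik(pi11, n10, n11))
    lr_cc = lr_uc + lr_ind
    return {"uc": (lr_uc, stats.chi2.sf(lr_uc, 1)),
            "ind": (lr_ind, stats.chi2.sf(lr_ind, 1)),
            "cc": (lr_cc, stats.chi2.sf(lr_cc, 2))}

# Toy example: hits at exactly the nominal rate, arriving periodically.
hits = np.tile([0] * 19 + [1], 500)
results = christoffersen_tests(hits, alpha=0.05)
print(results)
```

With the violation rate exactly at α, LR_uc is zero (p-value 1), matching the intuition behind the unconditional coverage results discussed above.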
5.3.2 Backtesting Expected Shortfall

In order to backtest the ES estimates, a selection of the tests introduced in Section 4.3 is implemented. Specifically, Tests I and II from Acerbi and Szekely [1] are used for the backtesting procedure. These tests were selected because they are non-parametric, as mentioned in Section 4.3.2, and therefore do not assume any kind of return distribution. Within the tests, the unconditional coverage for ES is tested; the independence property does not need to be tested, as it is equivalent to the one assessed in the VaR section.

A significant difference between this backtesting method and the methods implemented for backtesting VaR is that the test statistic does not have a predefined distribution. As a consequence, simulations need to be implemented in order to obtain a reliable empirical distribution for the test statistic. Acerbi and Szekely [1] propose the following guideline to calculate the empirical p-value.

1. Simulate independent and identically distributed samples of a certain return distribution, R̊_t^j ~ R_t for all t and all j = 1, ..., N, where N corresponds to the number of simulated paths.
2. Compute the test statistic Z^j = Z(R̊^j) based on the simulated returns.
3. Assess the test statistic by calculating its empirical p-value, which is determined as follows:

   ρ = (1/N) Σ_{j=1}^{N} 1{Z^j < Z(←R)}    (5.3)

   where Z(←R) corresponds to the "out-of-sample" realized value of the test statistic.

Finally, as already mentioned in Section 4.3.2, the null hypothesis of these tests stands for the fact that the ES estimate is a good estimate of the market risk and,
therefore, the ES estimate passes the backtest. Nevertheless, this is a one-sided test: the null hypothesis is rejected only if the risk measure underestimates the actual risk. Hence, the null hypothesis may fail to be rejected even when the risk measure overestimates the actual risk.

5.3.2.1 Acerbi and Szekely Test I

First, Test I, as explained in Section 4.3.2.1, is carried out. The test statistic is recalled:

Z₁(X) = Σ_{t=1}^{T} [X_t I_t / |ES_t(α)|] / N_T + 1

Since the "in-sample" ES estimate is held fixed over time, this can be rearranged as

Z₁(X) = [(1/N_T) Σ_{t=1}^{T} X_t I_t] / |ES(α)| + 1

so that the numerator embodies the "out-of-sample" estimate of the ES while the "in-sample" estimate appears in the denominator. Hence, if the "out-of-sample" and "in-sample" estimates of the ES are identical, Z₁(X) = 0, meaning that the "in-sample" estimate of the ES passes the backtest successfully. Conversely, when there is a significant difference between the two estimates, the null hypothesis is rejected and the ES underestimates the actual risk.

Acerbi and Szekely [1] mention that in order to perform this test, an estimate of VaR needs to be available due to the presence of I_t(α). Furthermore, as already introduced in Section 4.3, the authors note that Z₁(X) is an average over the VaR exceptions; it is therefore sensitive to the exceptions' magnitude but not to their frequency.
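Test I and the empirical p-value guideline above can be sketched as follows. This is an illustrative Python sketch: the Student's t null, the fixed VaR and ES thresholds (taken from Tables 5.3 and 5.6), and the path and simulation counts are assumptions for the example, not the thesis's exact configuration.

```python
import numpy as np
from scipy import stats

# Assumed for illustration: Student's t null with Table 5.3 parameters and
# the alpha = 0.05 in-sample estimates of Table 5.6.
NU, MU, SIGMA = 2.82064, 0.0269772, 1.40222
VAR_IN, ES_IN = -3.3606, -5.6972

def z1_statistic(returns, var_in, es_in):
    """Z1(X) = [ (1/N_T) * sum of returns beyond VaR ] / |ES| + 1.
    Equals 0 when the realized tail mean matches the (negative) in-sample ES."""
    exceptions = returns[returns < var_in]
    if exceptions.size == 0:
        return np.nan                 # no VaR exceptions: Z1 undefined
    return exceptions.mean() / abs(es_in) + 1

rng = np.random.default_rng(5)

def simulate_z1():
    """Steps 1-2: one i.i.d. path under H0 and its Z1 statistic."""
    path = stats.t.rvs(df=NU, loc=MU, scale=SIGMA, size=10000, random_state=rng)
    return z1_statistic(path, VAR_IN, ES_IN)

# Step 3: empirical p-value, Equation 5.3 (200 simulated paths here).
realized = stats.t.rvs(df=NU, loc=MU, scale=SIGMA, size=10000, random_state=rng)
z_realized = z1_statistic(realized, VAR_IN, ES_IN)
z_sims = np.array([simulate_z1() for _ in range(200)])
p_value = np.mean(z_sims < z_realized)
print(f"Z1 = {z_realized:.4f}, empirical p-value = {p_value:.3f}")
```

Since the "realized" data here is drawn under the null, Z₁ lands near zero and the p-value is unremarkable; a strongly negative Z₁ with a small p-value would indicate that the ES estimate underestimates the risk.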