
New and Improved Robust
Portfolio Selection Models
Denis Zuev
Balliol College
University of Oxford
A thesis submitted for the degree of
Doctor of Philosophy
Hilary Term 2009

Winter Morning
A. S. Pushkin
24 July 2009
1 A section
Bells are ringing. The air smells of French bread.
A lad on a bicycle flies past, wrapped up in his jacket.
A light rain is falling. People head to the pub
To share a bottle of something hoppy with a friend.
And I sit here working, writing for the good of the house;
Two volumes will find a home in the Oxford library.
But now and then I glance at the field of the monitor,
And the formulas dance at the director’s will.
01.06.2006
1.1 Accented characters
You can still use accents and diacritica as usual:
Übermäßiger Likörgenuss in der Öffentlichkeit führt zu Ärger
Être ou ne pas être c’est une question très intéressante en Français
1.2 Fonts
Of course, all variants of the TeX fonts are also available with Cyrillic.

Acknowledgements
It is time for my DPhil to end, and therefore time to move on with my life, to float free into
the real world, a world for which I feel very well-prepared. As is the case for many students, this
thesis was difficult to write and, more, to ensure that it stands out and makes an impact. I hope I
have achieved this, something that makes me just a bit more proud, and more excited about what
the future may hold! For this, I would like to thank my supervisors Dr Raphael Hauser and Dr
William Shaw for obtaining this fully-funded DPhil project. I am especially grateful to Raphael
for his guidance, support and general ‘looking-after’ during those four years at Oxford. I am truly
amazed by Raphael’s ability to always see into the heart of any problem in any subject. This thesis
would not have been possible without our regular discussions and Raphael’s insights and advice.
My DPhil project was kindly sponsored by EPSRC and Nomura, for which I am very grateful.
Moreover, I would like to thank William for his support and for introducing me to colleagues at
Nomura Research in London, with whom I had extremely valuable discussions. In particular, I
would like to thank Reza Ghassemieh, Roger Wilson, Andy Dougan and Eric Shirbini from Nomura
for sharing their experience and knowledge, as well as providing me with data for my experiments.
I would also like to thank Raphael for encouraging me to participate in conferences such as
CIM Thematic Term on Optimisation ’05, ISMP ’06, ICCOPT-II (MOPTA) ’07 and OptBridge ’07,
during which I had a number of stimulating discussions. I would like to thank Reha Tütüncü, Luis
Zuluaga, Victor DeMiguel, Zhaosong Lu, Ralf Werner, Katrine Schöttle, Olessandr Romanko and
many, many other participants.
My thanks go to Sam Howison, Nick Gould, Ben Hambly and Michael Monoyios, my colleagues
at OCIAM and Comlab, for supporting my research. Special thanks go to my fellow DPhil
friends from OCIAM and Comlab: Tino Kluge, Paddy Hewlett, Andrejs Novikovs, Nachi Gupta,
Jonothan Tse, Simona Svoboda, Klaus Schmidt, Daniel Schwarz and Max Skipper. Thanks for all
the discussions and fun that we have had both in and outside the office!
I would also like to thank Balliol College for providing me with a home for several long years and
for supporting me during conferences. Special thanks go to my tutor at Balliol, Keith Hannabuss.
Life at Oxford turned out to be very special for me and for this I would like to thank all my
friends at Oxford and the Oxford University Volleyball Club for some magical moments on court
and for 16000 miles of cumulative match travel distance!
Finally, I would like to say that this work would not have been possible without incredible support
from my family, all of whom will be very proud of me once this thesis reaches the Oxford library!
I would like to especially acknowledge the enormous amount of trust, guidance, support and love
given by my wonderful parents, Anna and Sergei, and by my dear wife Marina. Thank you.

Abstract
Decision tools in risk management figure amongst the oldest applications of numerical
optimisation. Optimisation models that arise in this context typically rely on model
parameters. In practical applications these parameters have to be estimated and are
therefore not known with certainty.
In the first part of this thesis, we analyse the effects of parameter uncertainty on portfolio
selection problems. Our research focuses on identifying potentially extremely
harmful investment decisions in which a portfolio based on estimated risks and expected
returns may be far from the true unknown optimal portfolio. We prove that, for a class
of portfolio selection problems, extremely harmful investment decisions arise whenever
the smallest eigenvalue of the estimated risk covariance matrix is very small. This allows us
to develop a certificate for dangerous investments and a heuristic to limit the possibility
of these situations. This heuristic results in a number of new portfolio optimisation
problems which are shown to perform well on both simulated and historical market data.
In the second part, we mitigate the effects of instability of optimal investment decisions
as a function of the model parameters via robust optimisation, which aims at finding
solutions that behave well for values of the model parameters that lie in a chosen
uncertainty set. The existing literature largely treats uncertainty in parameters
that model risk independently of parameters that model expected returns. We argue
that - in the typical situation where return data cannot be independently sampled but is
available through historical data - functional dependencies between the risk and expected
return terms of the model parameters arise naturally, leading to structured uncertainty
sets that are much smaller and less pessimistic than the standard models considered in
the literature. We show that several new portfolio optimisation models based on
structured uncertainty sets that reside in quadratic manifolds have equivalent reformulations
as convex quadratic programming problems. Our numerical results on both simulated
and real market data suggest that the practical performance of our new models
compares favourably with existing methods.

Contents
1 Introduction
1.1 Outline
1.2 Notation
1.3 Overview of portfolio selection models
1.3.1 Classical models: Markowitz approach
1.3.2 Return models
1.3.3 Advances in portfolio optimisation
1.3.4 Computing portfolios
2 Limiting the Extremal Behaviour of Classical Investment Models
2.1 Understanding the portfolio selection problem
2.1.1 Introduction
2.1.2 Motivations
2.1.3 Identifying extremal portfolios
2.1.4 When to take action
2.1.5 Heuristic to limit the extremal behaviour of classical portfolios
2.2 MMEH under APT assumptions for returns
2.2.1 Nonlinear SDP methods
2.2.2 Approximating algorithm
2.2.3 Theoretical properties of the approximating algorithm
2.2.4 Numerical properties of the sequential linear algorithm
2.3 Conclusions
3 Robust Portfolio Optimisation with Structured Uncertainty Sets
3.1 Overview of robust models in finance
3.1.1 General framework
3.1.2 Robust counterparts in portfolio selection
3.1.3 Choices of uncertainty sets
3.2 The exploitation of structured uncertainty sets
3.3 Structured uncertainty sets under Gaussian returns
3.3.1 The a priori and a posteriori views on estimation
3.3.2 Robust investment models with a-posteriori estimation
3.3.3 Comparison of a-priori and a-posteriori uncertainty sets
3.4 Multifactor APT model for asset returns
3.4.1 APT structured uncertainty sets
3.4.2 Robust risk-adjusted uncertain returns APT model
3.4.3 Robust risk-adjusted APT model
3.4.4 Goldfarb Iyengar model [37]
3.4.5 Lu model [60]
3.5 Analytical results
3.5.1 Solution to models with a-posteriori estimation
3.5.2 Solution to the robust risk-adjusted APT problem
3.6 Generalisations
4 Numerical Experiments
4.1 Introduction
4.2 Numerical results on simulated returns data
4.2.1 Measures of performance
4.2.2 Generation of returns
4.2.3 Portfolio properties
4.3 Numerical results on historical returns data
4.3.1 Data
4.3.2 Measures of performance
4.3.3 Numerical results
4.3.4 Access to implementations
4.3.5 Conclusions
5 Conclusions
5.1 Conclusions
5.2 Future work
5.2.1 Extending MMEH methodology to different constraint sets
5.2.2 Proving a linear convergence of the sequential linear algorithm
5.2.3 Finding a convex reformulation of other portfolio selection models under structured uncertainty sets
5.2.4 Modeling the size of the uncertainty sets for robust models
A List of Stocks
Bibliography

List of Figures
1.1 An illustration of an efficient frontier with explanations.
2.1 Real portfolio utility as a function of the smallest eigenvalue.
2.2 The first illustration shows the uncertainty around $\mu_0$. The second illustration shows the uncertainty around $\phi_0$, which was obtained from (2.12). $\lambda = 0.5$, $\kappa = 1$, $\mu_0 = (0.625, 2.042)^T$, $\phi_0 = (1/2, 1/2)^T$. Vectors $q_1$ and $q_n$ are the eigenvectors related to the largest and the smallest eigenvalue of $\Sigma$ respectively. $\Sigma_{11} = 0.5$, $\Sigma_{12} = \Sigma_{21} = 0.7$ and $\Sigma_{22} = 3.33$.
2.3 Example of where possible values of $\phi^*$ can lie. By minimising (2.17) we will prefer the ellipse shown in the right hand illustration.
2.4 Graph showing how $\tilde{\mu}$, which belongs to the black circle, is projected onto a hyperplane $L$ described by the near-vertical black line that is perpendicular to the vector $\tilde{\mathbf{1}}$. The red ellipse represents the $\bar{\Sigma}^{-1/2}$ transformation and provides an insight into how $L$ will be stretched.
2.5 These graphs are similar to those in Figures 2.2 and 2.4, but where $\mu$ is believed to belong to a polyhedral set.
2.6 Example of the 1D problem (Q) with $F = 1$, $D = 1$ subject to the constraints $-2 \le v \le 4$. Finding the maximal $t$ is equivalent to finding the maximal value of the function in this figure.
2.7 The graph shows the smallest eigenvalue function as a function of $V \in G$. Vertical lines define the set $G$. The graph also shows how the $A_k^{(i)}$ are constructed.
2.8 This figure shows the graph of $\log(t_{k+1} - t_k)$ as a function of the k’th iteration.
2.9 This figure shows the graph of $\log \|v_{k+1} - v_k\|$ as a function of the k’th iteration.
2.10 This figure demonstrates the speed of convergence of $t_k$.
2.11 This figure demonstrates the speed of convergence of $v_k$.
3.1 $U$, represented by the white ellipse, is usually strictly contained in the direct product $U_\mu \times U_\Sigma$, which is represented by the rectangle.
3.2 An example of uncertainty sets for returns in 2D. The rectangle represents the uncertainty set for the returns in the model by Tütüncü and König [89], whereas the ellipse represents the a-posteriori uncertainty set that is used for the RRAUR and RSRUR models.
3.3 These two images illustrate the uncertainty in the robust utility model. In the first image the quadratic bold line represents the uncertainty for the risk as a function of the expected returns (RRAUR). In the second image the uncertainty for risk is a gradient-coloured surface which is a quadratic function of the returns. In the first and second illustrations the uncertainty set in the Haldorsson and Tütüncü model is given by a dashed rectangle and a dashed parallelepiped respectively.
3.4 These three figures show how each component of the symmetric $\Sigma(\mu)$ varies with $\mu$ restricted to an ellipse.
3.5 Example of one-dimensional function $f(u)$. $A = 0.7$, $B = 1$ and $\phi = 5$.
4.1 Boxplot of the distribution of $\|\phi - \phi_T\|_2$ for RAMP, MMEH, RR, RMMEH, RRMMEH and RRAUR with different values of $\omega$. Here $\phi_T = \phi(\mu, \Sigma)$, $\lambda = 0.5$, the estimation window is 90 days and the data was obtained from normal distribution. $C = \{\phi \in \mathbb{R}^n : \mathbf{1}^T \phi = 1\}$.
4.2 Boxplot of the distribution of $|U(\phi) - U_T|$ for RAMP, MMEH, RR, RMMEH, RRMMEH and RRAUR with different values of $\omega$. Here $U_T = U(\phi(\mu, \Sigma))$, $\lambda = 0.5$, the estimation window is 90 days and the data was generated under APT assumptions. $C = \{\phi \in \mathbb{R}^n : \mathbf{1}^T \phi = 1\}$.
4.3 Boxplot of the distribution of $\|\phi - \phi_T\|_2$ for RAMPA, GI, RRAA and LU models for different values of $\omega$. Here $\phi_T = \phi(\mu, \Sigma)$, $\lambda = 0.5$, the estimation window is 90 days and the data was obtained from normal distribution. $C = \{\phi \in \mathbb{R}^n : \mathbf{1}^T \phi = 1\}$.
4.4 Boxplot of the distribution of $\|U(\phi) - U_T\|_2$ for RAMPA, GI, RRAA and LU models for different values of $\omega$. Here $U_T = U(\phi(\mu, \Sigma))$, $\lambda = 0.5$, the estimation window is 90 days and the data was generated under APT assumptions. $C = \{\phi \in \mathbb{R}^n : \mathbf{1}^T \phi = 1\}$.
4.5 Real data investment strategy.
5.1 Optimal size of the uncertainty set for a 200 days investment strategy using RRAUR. Optimality is calculated based on the maximal return achieved after 200 days. For this experiment we took $\lambda = 0.5$ and worked with the same data as in Chapter 4.

List of Tables
4.1 A table illustrating the average performance on real data, shortselling is not allowed. λ = 1, ω = 0.6, number of factors is 7 and number of days in the estimation window is 90.
… is 90.
… is 180.
4.4 A table illustrating the average performance on real data using the 130/30 strategy. … is 90.
… is 90.
… is 180.
4.7 A table illustrating the average performance on real data with shortselling. λ = 1, ω = 0.6, number of factors is 7 and number of days in the estimation window is 90.
… ω = 0.95, number of factors is 7 and number of days in the estimation window is 90.
… ω = 0.95, number of factors is 7 and number of days in the estimation window is 180.

Chapter 1
Introduction
With the growth of the financial services sector, larger volumes of stocks, options and commodities
are traded every day by banks, hedge funds, pension funds and private investors from all around
the world. Many banks now offer stock investment services to individuals, making trading in stocks,
options and commodities available to anyone. Although entering a stock market nowadays
is relatively simple, it can still be challenging for a market newcomer to make good investment
decisions under highly uncertain and fast-paced market conditions. The ultimate aim of this thesis
is to make the reader aware of several models within a certain class of investment strategies
that they can use for their own investments. After reviewing some classical theory on investment
strategies, we shall develop improved investment models that respond more robustly to uncertain
market conditions and numerically compare their performance on historical data. The style of this
thesis is directed towards individual investors who wish to use mathematical techniques in their
investments. We believe that this makes the thesis easier to follow, and it by no means
prevents our models from being implemented and used in large financial organisations.
Any individual entering a stock market needs to answer a number of questions about the aims of
their investment. These consist of: (i) identifying the purpose of the investment and the investment
time horizon, i.e. for how long the investor wants to be exposed to the market. Once the
investment time horizon has been identified, investors should choose whether they need to consume
some part of the investment at regular intervals or whether they want to maximise the returns at the
end of the investment horizon; (ii) selecting financially sound assets to invest in; (iii) identifying the
amount of risk that investors are prepared to take on, as higher returns come with higher risks; and
(iv) allocating wealth such that higher returns are achieved for the same amount of risk. Portfolios
that provide the best trade-off between returns and risks are called optimal portfolios. This thesis
will primarily deal with the theory behind computing optimal portfolios in one-period models, in
which no decisions have to be made during the investment period and in which the investor is only
concerned with the total wealth at the end of that period. Nevertheless, let us describe these
four concepts in greater detail.
Investment time horizons are subjectively selected by individual investors and could be measured
in years, months, days, hours, seconds or even milliseconds. Naturally, the market has different
characteristics depending on the time scale that the investor is considering, and these must be taken
into account when deciding on the best way to allocate one’s wealth. The problem of consumption
of wealth at regular intervals during the investment period is an example of so-called multiperiod
optimisation problems in which portfolio investment decisions are made each time consumption
takes place. Often, such problems are solved via dynamic programming or stochastic optimisation
methods that are outside the scope of this thesis.
One of the most common ways in which portfolio managers assess their performance consists of creating
a portfolio that performs better than a particular stock index - a collection of stocks weighted by
capitalisation, e.g. the FTSE 100 and the S&P 500, amongst others. One of the problems that portfolio
managers face consists of selecting a small number of assets that can track an index as closely as
possible. Investing in all the assets in an index is not efficient because of the large transaction costs that
would be incurred. Large transaction costs occur for several reasons. Firstly, the broker charges a fee for
transactions. Secondly, money is lost on every stock transaction because of the bid-ask spread and
slippage, which is the difference between the expected price of a trade and the price at which the trade
is actually executed.
Finding and selecting well-performing assets is a difficult task and there exist multiple strategies
none of which is considered to be the ultimate one. Generally, stock picking strategies are based
on a mixture of fundamental and technical analyses. Fundamental analysis considers investment
highlights of a company based on cash flows, management profile, industry growth among others,
[42]. Technical analysis on the other hand selects stocks based on statistical information obtained
from the market activity such as volume traded, stock prices and any other noticeable trends. By
combining the two techniques one may be able to select stocks that may perform better than others
in the future.
One of the most popular money-making opportunities for an individual investor consists of trying
to exploit inefficiencies in the markets in order to gain extra return on investment compared to the
risk-free bank rates, which guarantee a fixed level of returns in the short term, independently of market
conditions. Gaining extra returns requires the investor to take on greater risks of losing part or
even the whole value of the investment. The higher the returns one wants to achieve, the higher the
risks that one needs to take on. The existence of this relationship allows investors to seek an
optimal trade-off between returns and risks subject to various legal and personal requirements on
the portfolio held.
One-period investment problems do not require the investor to take any decisions during the
actual investment time period. Thus at the beginning of an investment, the investor should create
a portfolio that will be optimal at the end of the investment period. Optimality at the end of
the investment period is achieved by assuming a certain behaviour of the stock returns process. Since
future returns are unknown, it is common to assume that a stationary (or covariance stationary), ergodic
stochastic process drives stock returns. This ensures that the mean and variance of the process do not
change over time and that, by averaging a series through time, one is continually adding new and useful
information to the average. As a result of these assumptions, historical data is used to estimate
average returns and risks for a set of stocks at the end of the investment period, thus allowing the
investor to characterise numerically future returns and risks. Numerical values are then used to
determine the optimal investment portfolio.
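As a rough illustration of this estimation step, the following sketch computes a sample mean and a covariance estimate from a window of returns. It is only a sketch under stated assumptions: the returns are simulated rather than historical, the window length of 90 observations and the number of assets are arbitrary choices, and the names `returns`, `mu_bar` and `sigma_bar` are hypothetical.

```python
import numpy as np

# Hypothetical data: k observed returns for n assets (simulated, not market data).
rng = np.random.default_rng(0)
k, n = 90, 4
returns = rng.normal(loc=0.0005, scale=0.01, size=(k, n))

# Sample mean of the returns: an estimate of the expected returns.
mu_bar = returns.mean(axis=0)

# Covariance estimate that divides by k (the Gaussian maximum likelihood form),
# rather than by k - 1 as in the unbiased sample covariance.
centered = returns - mu_bar
sigma_bar = centered.T @ centered / k
```

Both quantities are themselves random, since they depend on the particular window of data used, which is the source of the parameter uncertainty discussed in the remainder of this chapter.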
Seminal work on one-period portfolio selection was presented by H. Markowitz [63] where portfolio
risks were characterised by portfolio variance. In order to find the optimal portfolio, expected
portfolio returns are maximised subject to a bound on portfolio risks and simple portfolio constraints
which led to a quadratic programming formulation of the portfolio selection problem.
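One way to write down the problem just described is sketched below. This is an illustrative form only, not necessarily the exact statement used in [63]: the symbol $\sigma_{\max}^2$ for the risk bound is an assumption introduced here, and the budget constraint $\mathbf{1}^T \phi = 1$ stands in for the "simple portfolio constraints". The symbols $\mu$, $\Sigma$ and $\phi$ are those of Section 1.2.

```latex
\begin{align*}
\max_{\phi \in \mathbb{R}^n} \quad & \mu^T \phi \\
\text{subject to} \quad & \phi^T \Sigma \phi \le \sigma_{\max}^2, \\
& \mathbf{1}^T \phi = 1 .
\end{align*}
```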
In practice, it has been shown that classical optimal portfolios (those based on Markowitz’s work) suffer from
poor diversification and from instability with respect to slight changes in the estimates of the expected
returns and risks [10, 17].
Different approaches were developed to tackle these issues. Firstly, researchers imposed additional
constraints on portfolio weights to limit the unnatural extremal behaviour of the classical portfolio,
[43]. Secondly, an accurate estimation of parameters, such as expected returns and risks, has been a
considerable issue. Inability to accurately determine these parameters at the end of the investment
period implies uncertainty in the model. In order to take uncertainty into account, some authors
estimated returns and risks via robust statistical techniques [22, 47]. Another popular approach
consists of applying Bayesian theory to parameter estimation. In this approach the investor specifies
a prior distribution for the parameters under consideration, from which a posterior distribution for the
returns and risks given some historical data is deduced [3, 16].
By acknowledging the existence of uncertainty around parameter estimates, researchers have put forward
robust frameworks in which parameter uncertainties in the objective function and in the constraints
are explicitly incorporated into the optimisation problem. The robust optimisation framework addresses
optimisation problems which are subject to parameter uncertainty and aims to find a solution
that is optimal for all possible parameter values in a (closed and bounded) uncertainty set. This
is achieved by requiring the optimal solution to satisfy the constraints for all possible values of the
parameters. As for the objective function, the robust optimisation framework requires the optimal
solution to maximise/minimise the worst case objective function subject to the parameters lying in
an uncertainty set. For this reason, the robust optimisation approach is often called the optimal-pessimal
or the worst-case approach. Because of the high degree of sensitivity of classical portfolios to
errors in the expected returns and risks, the robust optimisation paradigm has been applied to portfolio
optimisation with success in [7, 36, 39, 60, 89]. In this thesis, we find that robust optimisation
models for portfolio selection that have been proposed in the literature can be significantly improved
by considering robust problems with smaller uncertainty sets that have a particular structure.
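To make the worst-case idea concrete, the sketch below evaluates the worst-case expected return of a fixed portfolio when the expected returns are only known to lie in an ellipsoid. Everything in it is an assumption chosen for illustration: the ellipsoidal set {µ : (µ − µ̄)ᵀ S⁻¹ (µ − µ̄) ≤ κ²}, the numerical values, and the names `worst_case_return`, `S` and `kappa`; it is not one of the models developed in this thesis. For such a set the minimum of µᵀφ has the closed form µ̄ᵀφ − κ (φᵀSφ)^{1/2}, which the code checks against sampled boundary points.

```python
import numpy as np

def worst_case_return(phi, mu_bar, S, kappa):
    """Minimum of mu @ phi over the ellipsoid
    (mu - mu_bar)^T S^{-1} (mu - mu_bar) <= kappa^2 (closed form)."""
    return mu_bar @ phi - kappa * np.sqrt(phi @ S @ phi)

# Illustrative, made-up inputs for two assets.
mu_bar = np.array([0.05, 0.08])
S = np.array([[0.04, 0.01], [0.01, 0.09]])
phi = np.array([0.6, 0.4])
kappa = 1.0

wc = worst_case_return(phi, mu_bar, S, kappa)

# Numerical check: sample points on the boundary of the ellipsoid.
# With S = L L^T, every point mu_bar + kappa * L u with ||u|| = 1 lies on it.
L = np.linalg.cholesky(S)
angles = np.linspace(0.0, 2.0 * np.pi, 2001)
units = np.stack([np.cos(angles), np.sin(angles)])
mus = mu_bar[:, None] + kappa * (L @ units)
sampled_min = (phi @ mus).min()
```

A robust model would then choose `phi` to maximise this worst-case value rather than the nominal return; the larger the uncertainty set, the more pessimistic the resulting portfolio tends to be.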
In this thesis we aim (i) to understand situations in which classical approaches fail to provide
stable and robust portfolios. We shall aim to provide investors with a certificate that warns
when potentially dangerous investments may be made, and to develop a heuristic based on the classical
portfolio selection model that avoids those dangerous investments; (ii) to understand in what ways
existing robust portfolio selection problems can be improved and, as a result, to develop tractable new
improved robust portfolio selection models; (iii) to use historical data to compare the performance of
the different portfolio selection models described in this thesis with each other in order to determine
their characteristics and behaviour in real markets.
1.1 Outline
This thesis is structured as follows. Chapter 1 is an introductory chapter in which advances in
portfolio optimisation will be discussed. After introducing classical portfolio selection problems,
we shall discuss in greater detail their practical limitations which often consist of a high degree
of sensitivity to the errors in the estimated input parameters. Portfolio sensitivity to input model
parameters has been tackled by a number of different approaches for which we shall give a brief
overview. This will allow us to move on to the subject of robust optimisation - an area of recent
interest and research. Robust optimisation, which was developed to mitigate the effects of parameter
uncertainty, has been applied with a high degree of success to portfolio selection problems. We
discuss these applications and establish basic concepts that will be used throughout this thesis.
Finally, we mention standard optimisation tools used in calculating portfolios in practice.
In Chapter 2 we consider classical portfolio selection problems and analyse the sensitivity of
computed portfolios as a function of the input parameters which are estimated from the data. We
aim to understand under what conditions the computed portfolio may end up being extremely far
away from the true unknown portfolio. We thus measure the sensitivity of computed portfolios
by the distance from the true unknown optimal portfolio. For given estimates of expected returns
and risks we show that the maximum sensitivity of the computed portfolio depends on the
smallest eigenvalue of the estimated covariance matrix. Based on this result we then develop a
heuristic for the portfolio selection problem to reduce the sensitivity of the computed portfolio. This
heuristic is based on maximising the smallest eigenvalue of the estimated covariance matrix and
is developed for different theoretical assumptions on asset returns. For one of the assumptions
on returns, our heuristic results in a nonlinear semidefinite problem, for which we
introduce a sequential semidefinite algorithm and establish its properties.
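The role of the smallest eigenvalue can be seen in a simplified setting. The sketch below is an assumption-laden illustration, not the result proved in Chapter 2: it uses an unconstrained risk-adjusted objective µᵀφ − λ φᵀΣφ, whose maximiser is φ = Σ⁻¹µ / (2λ), and perturbs µ by a small amount along an eigenvector of Σ. The numerical values of Σ, µ and λ are borrowed from the caption of Figure 2.2 purely as example inputs; the function name and `eps` are hypothetical.

```python
import numpy as np

def unconstrained_portfolio(mu, sigma, lam):
    """Maximiser of mu @ phi - lam * phi @ sigma @ phi (no constraints)."""
    return np.linalg.solve(sigma, mu) / (2.0 * lam)

# Example inputs (values taken from the Figure 2.2 caption, used illustratively).
sigma = np.array([[0.5, 0.7], [0.7, 3.33]])
mu = np.array([0.625, 2.042])
lam = 0.5

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
gammas, Q = np.linalg.eigh(sigma)
gamma_min, q_min = gammas[0], Q[:, 0]
gamma_max, q_max = gammas[-1], Q[:, -1]

eps = 0.01  # size of the error in the estimated expected returns
phi = unconstrained_portfolio(mu, sigma, lam)
shift_min = np.linalg.norm(unconstrained_portfolio(mu + eps * q_min, sigma, lam) - phi)
shift_max = np.linalg.norm(unconstrained_portfolio(mu + eps * q_max, sigma, lam) - phi)
```

In this simplified model an error of size `eps` along the eigenvector of the smallest eigenvalue moves the portfolio by `eps / (2 * lam * gamma_min)`, so the change in the portfolio grows without bound as the smallest eigenvalue approaches zero.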
Chapter 3 takes a different approach to the sensitivity of portfolios to errors in the parameter
inputs. In Chapter 3 we consider the robust optimisation approach to portfolio selection that we
introduce later in Chapter 1. We find that the uncertainty sets for the input parameters used in
existing robust portfolio selection problems are often too large, which results in overly
conservative portfolios. We explain how one can obtain smaller structured uncertainty sets under different
theoretical assumptions on asset returns and how this implies a number of different formulations of
the robust portfolio selection problem. As they stand, these formulations are not directly solvable, yet
in the second part of Chapter 3 we show how the new problem formulations can be reformulated as
conic programming problems that can be solved efficiently on a standard computer.
Having developed several new portfolio selection models, we highlight portfolio characteristics
by running an extensive set of experiments on both simulated and real historical data, comparing
our models with existing ones. All the computational experiments are described in Chapter 4, where
we show that our models perform favourably compared to existing models.
We conclude this thesis in Chapter 5 and define the areas for future research.
1.2 Notation
In this thesis, scalars are represented by letters in standard font, e.g. $a \in \mathbb{R}$, and vectors are represented
by bold letters, such as $\mathbf{v} \in \mathbb{R}^n$. General matrices are represented by capital letters in standard
font, e.g. $A \in \mathbb{R}^{m \times n}$. Matrices in which each matrix element is a known real scalar are denoted
by “straightened” capital letters such as $\mathrm{A}$. The identity matrix is denoted by $I$, the zero matrix
is denoted by $0$, and their dimensions will be determined by the context in which they are used. The
j’th column of $I$ is denoted by $\mathbf{e}_j$. We shall also use the vector of ones, denoted by $\mathbf{1}$, whose
dimension will also be determined by the context in which it is used.
We denote by $S\mathbb{R}^n$ the space of symmetric $n \times n$ matrices. The matrices $E_{ij}$ form a basis
for this space, where $E_{ij} = e_i e_j^T + e_j e_i^T$ if $i \neq j$, and $E_{ii} = e_i e_i^T$. The “natural” norm
to use on $S\mathbb{R}^n$ is the Frobenius norm, which is induced by the inner product
\[ \langle A, B \rangle, \quad A, B \in S\mathbb{R}^n, \]
where $\langle A, B \rangle$ denotes the trace of $AB$. Frequent use will be made of properties of the trace,
such as $\mathrm{trace}(AB) = \mathrm{trace}(BA)$.
For a real symmetric matrix $A \in S\mathbb{R}^n$ we denote by $\gamma_1, \dots, \gamma_n$, where
$\gamma_1 \ge \dots \ge \gamma_n$, the eigenvalues of $A$. We also define $\gamma_{\max} = \gamma_1$ and
$\gamma_{\min} = \gamma_n$. The expression $A \succ 0$ means that $A$ is positive definite, and $A \succeq 0$
means that $A$ is positive semidefinite. The symmetric positive semidefinite square root of a symmetric
positive semidefinite matrix $A$ will be denoted by $A^{1/2}$. We shall use a couple of norms on the vector
space $\mathbb{R}^n$. These are the two-norm of $v \in \mathbb{R}^n$, defined as
\[ \|v\|_2 = \sqrt{v^T v}, \]
and
\[ \|v\|_G = \sqrt{v^T G v}, \]
where $G \succ 0$.
Random vectors will be denoted by bold capital letters, e.g. $\mathbf{R} \in \mathbb{R}^n$. Often, we will use
$R$ to denote a vector of random asset returns. The vector $\phi \in \mathbb{R}^n$ denotes portfolio weights,
i.e. the proportion of wealth invested in each asset, and $\lambda \in \mathbb{R}$ will denote the risk-aversion
parameter. We assume that $\mu = E(R)$ and $\Sigma = \mathrm{Cov}(R) = E[(R - \mu)(R - \mu)^T]$. We denote
maximum likelihood estimators by a bar over the quantity to be estimated, such as $\bar{\mu}$ or $\bar{\Sigma}$.
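For any matrix $M \in \mathbb{R}^{v \times w}$ we define $\mathrm{vec}(M) \in \mathbb{R}^{vw}$ as the vector obtained
by placing the columns of $M$ one underneath the other. We also define
$\mathrm{mat} : \mathbb{R}^{vw} \to \mathbb{R}^{v \times w}$ as the inverse operator to
$\mathrm{vec} : \mathbb{R}^{v \times w} \to \mathbb{R}^{vw}$, such that $\mathrm{mat}(\mathrm{vec}(M)) = M$.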
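As a small illustration of the vec and mat operators, the following sketch stacks the columns of a matrix and then rebuilds it. It is only a plain-Python illustration: the function names `vec` and `mat`, the list-of-rows representation and the explicit `rows`/`cols` arguments are assumptions made here, not notation taken from the thesis.

```python
from typing import List

Matrix = List[List[float]]


def vec(M: Matrix) -> List[float]:
    """Stack the columns of M underneath each other (column-major order)."""
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]


def mat(x: List[float], rows: int, cols: int) -> Matrix:
    """Inverse of vec: rebuild a rows-by-cols matrix from its stacked columns."""
    if len(x) != rows * cols:
        raise ValueError("length of x must equal rows * cols")
    return [[x[j * rows + i] for j in range(cols)] for i in range(rows)]


# Hypothetical 2 x 3 example.
M = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
stacked = vec(M)               # columns (1, 4), (2, 5), (3, 6) placed in sequence
restored = mat(stacked, 2, 3)  # recovers M
```

Because the stacked vector alone does not record the shape, `mat` needs the dimensions to be supplied; in the text they are fixed by the context.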
Unless otherwise stated, in this thesis, n will usually denote the number of assets, m will denote
the number of macroeconomic factors that drive asset prices and k will usually stand for the number
of daily historical returns used in the estimation of risks and expected returns.
1.3 Overview of portfolio selection models
The benefits of portfolio diversification have been known since as early as c. 200 BC. Early
references to the idea of diversification can be found in the Talmud, which gives a formula for splitting
one’s assets into thirds: one third in business, one third kept liquid and one third in land.
In Britain, the advantages of keeping money in different places were mentioned by William Shakespeare
at the end of the 16th century. Indeed, in The Merchant of Venice (1596-1598), Act I, Scene I, Antonio
says:
My ventures are not in one bottom trusted,
Nor to one place; nor is my whole estate
Upon the fortune of this present year;
Therefore, my merchandise makes me not sad.
Act I, Scene 1
telling us that it is not wise to put all one's eggs into one basket. Along the same lines, Captain Long
John Silver in Treasure Island (1883) says
I puts it all away, some here, some there,
none too much anywheres, by reason of suspicion.
Chapter 11. Robert Louis Stevenson

Moreover, the Dutch-Swiss mathematician Daniel Bernoulli, in his famous 1738 article about
the St. Petersburg Paradox [9], argues by example that risk-averse investors will want to diversify:
“(...) it is advisable to divide goods which are exposed to some small danger into several portions
rather than to risk them all together”. However, it was not until 1952 that the notions of risk and
return were quantified and put into perspective by Harry Markowitz.
1.3.1 Classical models: Markowitz approach
The most important model in portfolio selection of the 20th century was proposed by H. M.
Markowitz as early as 1952 in his seminal work [63], which was later extended in [64, 65]. Markowitz
considered the problem of finding an optimal portfolio maximising the expected portfolio return for
some level of portfolio risk over a fixed period during which there will be no changes to the portfolio.
In [63], portfolio risk is modelled by the variance of the portfolio return, which allowed Markowitz
to formulate his portfolio selection problem as an optimisation problem, which we describe
below.
We start by assuming that a particular investor has wealth $w_0$ at the time that the portfolio has
to be created. The investor wishes to invest in a predefined set of $n$ assets. Since the
portfolio has to be held without any changes for a fixed amount of time, called the investment period,
investors need to make sure that their portfolio is optimal at the end of the investment period. Thus,
assume that the asset returns at the end of the investment period are described by a random vector
$R \in \mathbb{R}^n$ and let the vector $\phi \in \mathbb{R}^n$ represent the proportions of wealth that should be
invested in each asset. At the end of the investment period we define the portfolio return to be
$\Pi = (w_0 R^T \phi)/w_0$, which is equal to $R^T \phi$. The expected portfolio return is then $\mu^T \phi$ and
the portfolio variance is given by $\phi^T \Sigma \phi$, where $\mu = E(R)$ and
$\Sigma = \mathrm{Cov}(R) = E[(R - \mu)(R - \mu)^T]$. The Markowitz problem, which we will call the classical
model and which is also known as the mean-variance problem, can be expressed as an optimisation
problem of the following kind:
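\[
\begin{aligned}
\max_{\phi} \quad & \mu^T \phi \\
\text{s.t.} \quad & \phi^T \Sigma \phi \le \sigma^2_{\mathrm{targ}}, \\
& \phi \in C.
\end{aligned}
\tag{1.1}
\]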
The value $\sigma^2_{\mathrm{targ}}$ is chosen subjectively by the investor and represents the exposure to risk
that the investor is willing to take. The feasible set $C$ for the portfolio weights is taken to be a simple
polyhedral convex set $C := \{\phi \in \mathbb{R}^n : b - A\phi \le 0\}$ described by $q$ linear constraints, where
$b \in \mathbb{R}^q$ and $A \in \mathbb{R}^{q \times n}$. Although $C$ does not have to be polyhedral, linear constraints
can describe a number of situations in which investors wish to constrain their portfolio weights. Some
of the most important situations are outlined below:

• Budget constraints. These constraints specify how much of the total wealth is available to
invest in assets. Usually these constraints are of the following form
\[ \mathbf{1}^T \phi = 1, \]
meaning that all the available wealth is invested in assets.
• Short-selling constraints. Unless specified otherwise, portfolio weights can be negative. A negative
weight corresponds to short-selling, i.e. selling stock that the investor does not own.
A system of share lending has been set up in order to allow large financial institutions to
short-sell stock. Yet a small investor may not be able to short-sell, and therefore one
needs to restrict portfolio weights to non-negative values only,
\[ \phi_i \ge 0, \quad i = 1, \dots, n. \]
• Diversification constraints. In certain situations, an investor may want to limit artificially the
proportion of wealth that is invested in one asset, with the aim of avoiding investing all the
money in one asset. Mathematically, that would amount to, for example,
\[ \phi_i \le 0.6, \quad i = 1, \dots, n. \]
In this example not more than 60% of wealth may be invested in each asset. In a similar
manner, the investor may wish to specify the minimum proportion of wealth to be invested in
each asset.
• Turnover constraints and transaction costs. Turnover is defined as the total amount by which
the portfolio changed over an investment period, as a ratio of the portfolio value at the beginning
of the investment period. Let $w_0$ and $\phi^{(0)}$ be the total wealth and the proportions invested in
each asset at the beginning of an investment period. Let $r$ be the actual realisation of returns
at the end of the investment period. The portfolio value at the end of
the investment period is then $w_1 = w_0 r^T \phi^{(0)}$ and portfolio turnover is defined as
\[ \sum_{i=1}^{n} \frac{\left| w_0 r_i \phi_i^{(0)} - w_1 \phi_i \right|}{w_0}, \tag{1.2} \]
where $\phi_i$ is the proportion of wealth to be invested in the $i$'th stock at the end of the investment
period, i.e. at the beginning of the next investment period. The expression in (1.2) is a convex
function of $\phi$ and thus defines a convex set if one bounds it from above. This particular
convex set can be reformulated as a series of linear constraints by augmenting the space of
decision variables to $(\phi, \psi) \in \mathbb{R}^{2n}$ and by imposing the additional constraints
\[ -\psi_i \le w_0 r_i \phi_i^{(0)} - w_1 \phi_i \le \psi_i, \quad i = 1, \dots, n. \tag{1.3} \]
Let $b \in \mathbb{R}$ be an upper bound on the portfolio turnover in (1.2). Then the portfolio turnover
constraint can be reformulated as the series of linear constraints
\[
\begin{aligned}
& \sum_{i=1}^{n} \frac{\psi_i}{w_0} \le b, \\
& -\psi_i \le w_0 r_i \phi_i^{(0)} - w_1 \phi_i \le \psi_i, \quad i = 1, \dots, n.
\end{aligned}
\]
Transaction costs occur because transactions are processed by brokers who charge a small fee
for their work that is based on the number or volume of the transactions. There exist two main
types of transaction costs. The first type consists of charging the investor a fixed price for
each transaction (15) and this type of transaction costs are widely used in banks that provide
investment services to individuals. The second type consists of charging the investor based
on the total volume of the transactions or portfolio turnover as defined in (1.2). The fee in
this case is usually a small proportion of the turnover (around 1-2%). Second type transaction
costs can be described by linear inequalities in a similar manner to the turnover constraints.
• 130-30 strategy. This is a relatively popular strategy among practitioners in which
short-selling of up to 30% of the portfolio value is permitted in order to leverage the
purchase of other assets. In this strategy it is assumed that all wealth is invested and that
extra money obtained as a result of short-selling is used to buy more stock. The following
constraints would be used in order to define this strategy,
\[ \sum_{i=1}^{n} |\phi_i| \le 1.6, \tag{1.4} \]
\[ \mathbf{1}^T \phi = 1. \tag{1.5} \]
To see that these constraints represent the 130-30 strategy, without loss of generality let $\phi_i^-$
and $\phi_i^+$ be the negative and the positive parts of $\phi_i$. Assume also that there are $n_1$ assets
that are sold and $n_2$ assets that are bought. Naturally, $n_1 + n_2 = n$. Our aim is to show that
under constraints (1.4) and (1.5), $\sum_{i=1}^{n_1} |\phi_i^-| \le 0.3$. Let
$\theta := \sum_{i=1}^{n_2} \phi_i^+ - 1$. From (1.5), $\theta$ is non-negative. From the same equation it can be
deduced that $\theta = -\sum_{i=1}^{n_1} \phi_i^- = \sum_{i=1}^{n_1} |\phi_i^-|$. From (1.4),
\[ 1.6 \ge \sum_{i=1}^{n} |\phi_i| = \sum_{i=1}^{n_1} |\phi_i^-| + 1 + \theta = 2 \sum_{i=1}^{n_1} |\phi_i^-| + 1, \]
which implies the result.
Above, we have described some of the most commonly used constraints; however, we want to highlight
that there exist endless possibilities for the convex set $C$.
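The turnover expression (1.2) and the 130-30 constraints (1.4) and (1.5) can be checked numerically with a few lines of code. The sketch below is illustrative only: all function names and the numbers are hypothetical, and $r$ is read as a vector of gross returns so that $w_1 = w_0 r^T \phi^{(0)}$ is the end-of-period portfolio value, which is an assumption about the convention intended above.

```python
from typing import Sequence


def end_of_period_wealth(w0: float, r: Sequence[float], phi0: Sequence[float]) -> float:
    """Portfolio value w1 = w0 * r^T phi0; r is read as gross returns (assumption)."""
    return w0 * sum(ri * pi for ri, pi in zip(r, phi0))


def turnover(w0: float, r: Sequence[float], phi0: Sequence[float],
             phi_new: Sequence[float]) -> float:
    """Turnover (1.2): sum_i |w0 r_i phi0_i - w1 phi_new_i| / w0."""
    w1 = end_of_period_wealth(w0, r, phi0)
    traded = sum(abs(w0 * ri * p0 - w1 * p1)
                 for ri, p0, p1 in zip(r, phi0, phi_new))
    return traded / w0


def satisfies_130_30(phi: Sequence[float], tol: float = 1e-9) -> bool:
    """Check (1.4) and (1.5): sum |phi_i| <= 1.6 and 1^T phi = 1."""
    gross = sum(abs(p) for p in phi)
    return gross <= 1.6 + tol and abs(sum(phi) - 1.0) <= tol


def short_exposure(phi: Sequence[float]) -> float:
    """Total short position: the sum of |phi_i| over the negative weights."""
    return sum(-p for p in phi if p < 0)


# Hypothetical data: three assets, initial wealth 100.
w0 = 100.0
r = [1.10, 0.95, 1.00]
phi0 = [0.5, 0.3, 0.2]
w1 = end_of_period_wealth(w0, r, phi0)
drifted = [w0 * ri * p / w1 for ri, p in zip(r, phi0)]  # weights if nothing is traded
no_trade = turnover(w0, r, phi0, drifted)                # no trading, zero turnover
rebalance = turnover(w0, r, phi0, phi0)                  # trading back to phi0
```

In this reading, keeping the drifted weights costs no turnover, while rebalancing back to the initial weights does; a portfolio such as (0.7, 0.6, -0.3) satisfies both 130-30 constraints with a short exposure of 0.3.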
Up until now, we have only considered investments in risky assets. It is likely, however, that
investors will keep part of their investment in a bank account or government bonds, or alternatively
borrow money from a bank to finance their portfolio. The classical Markowitz problem (1.1) can be
readily adjusted in order to incorporate a risk-free asset. Let $r_f$ be the return on the risk-free asset
over the investment period and let $\phi_0$ be the share of wealth in the risk-free asset. The expected
portfolio return is thus $r_f \phi_0 + \mu^T \phi$. The risk-free asset has variance 0 by definition and is
uncorrelated with the risky assets. Thus the portfolio variance is
\[
\begin{pmatrix} \phi_0 \\ \phi \end{pmatrix}^T
\begin{pmatrix} 0 & 0^T \\ 0 & \Sigma \end{pmatrix}
\begin{pmatrix} \phi_0 \\ \phi \end{pmatrix}
= \phi^T \Sigma \phi, \tag{1.6}
\]
i.e. the same portfolio risk as if the riskless asset did not exist. This is natural since the riskless
asset does not bring extra risk.
Equivalence of formulations
Markowitz's classical mean-variance problem has several equivalent formulations, in the sense that under
suitably chosen user-defined parameters the optimal portfolios are the same for each formulation.
In particular, (1.1) is equivalent to minimising portfolio risk under a constraint on the expected return,
\[
\begin{aligned}
\min_{\phi} \quad & \phi^T \Sigma \phi \\
\text{s.t.} \quad & \mu^T \phi \ge \mu_{\mathrm{targ}}, \\
& \phi \in C,
\end{aligned}
\tag{1.7}
\]
where µtarg is a user-specified parameter. This is a different kind of investment problem where the
investor specifies the amount of required return and minimises the total portfolio risk within a set
of feasible portfolios. Such a formulation is attractive to investors because it allows the specification
of user parameters in terms of the minimal bound on the expected portfolio return which is easier
to understand than a bound on portfolio risk as in (1.1).
An alternative formulation consists of optimising a function in which portfolio returns are in
competition with portfolio risks. An equivalent formulation for problems (1.1) and (1.7) is
\[
\begin{aligned}
\max_{\phi} \quad & \mu^T \phi - \lambda \phi^T \Sigma \phi \\
\text{s.t.} \quad & \phi \in C,
\end{aligned}
\tag{1.8}
\]
for a suitably chosen $\lambda$, where the objective function is an example of a utility function - a measure
of preference of one portfolio over another. In other words, if for a particular portfolio $\phi^{(1)}$ the value
of the utility function is greater than the value of the same utility function for $\phi^{(2)}$, then the investor
prefers portfolio $\phi^{(1)}$ over portfolio $\phi^{(2)}$. An optimisation problem of the type (1.8) allows one to
find a portfolio with the largest value of the utility function. We need to mention that the choice of
utility function is not unique: there exist infinitely many utility functions, all of which need to
satisfy certain criteria. Utility theory is outside the scope of this thesis; however, we direct
the interested reader to [41, 73] for information on financial decision making and utility theory.
Equivalence of (1.1), (1.7) and (1.8) can be very easily deduced from the Karush-Kuhn-Tucker
(KKT) conditions [38, 82] which are necessary for a solution in nonlinear programming to be optimal.
Moreover, the optimal solution is global and unique.
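The equivalence can be made concrete in the simplest possible setting. The sketch below assumes two risky assets with $\mu_1 > \mu_2$, a feasible set $C$ consisting of the budget constraint only (so that $\phi = (w, 1 - w)$), and hypothetical input numbers; the closed-form expressions follow from elementary calculus in this special case and are not the general solution method. It solves (1.8) for a given $\lambda$, then feeds the resulting return and variance into (1.7) and (1.1) as targets and recovers the same weight.

```python
import math


def portfolio_return(w, mu1, mu2):
    """Expected return of the portfolio (w, 1 - w)."""
    return w * mu1 + (1.0 - w) * mu2


def portfolio_variance(w, var1, var2, cov12):
    """Variance of the portfolio (w, 1 - w)."""
    return w * w * var1 + 2.0 * w * (1.0 - w) * cov12 + (1.0 - w) ** 2 * var2


def solve_utility(mu1, mu2, var1, var2, cov12, lam):
    """Maximiser of (1.8): stationary point of the concave quadratic in w."""
    d = var1 + var2 - 2.0 * cov12          # assumed strictly positive
    return ((mu1 - mu2) / (2.0 * lam) + var2 - cov12) / d


def solve_min_risk(mu1, mu2, var1, var2, cov12, mu_targ):
    """Minimiser of (1.7): minimum-variance weight unless the return target binds."""
    d = var1 + var2 - 2.0 * cov12
    w_min_var = (var2 - cov12) / d
    w_required = (mu_targ - mu2) / (mu1 - mu2)   # uses mu1 > mu2
    return max(w_min_var, w_required)


def solve_max_return(mu1, mu2, var1, var2, cov12, var_targ):
    """Maximiser of (1.1): largest w whose variance does not exceed var_targ."""
    d = var1 + var2 - 2.0 * cov12
    disc = (var2 - cov12) ** 2 - d * (var2 - var_targ)
    return ((var2 - cov12) + math.sqrt(max(disc, 0.0))) / d


# Hypothetical inputs.
mu1, mu2 = 0.10, 0.05
var1, var2, cov12 = 0.04, 0.01, 0.005
lam = 2.0

w_utility = solve_utility(mu1, mu2, var1, var2, cov12, lam)
mu_targ = portfolio_return(w_utility, mu1, mu2)
var_targ = portfolio_variance(w_utility, var1, var2, cov12)
w_min_risk = solve_min_risk(mu1, mu2, var1, var2, cov12, mu_targ)
w_max_return = solve_max_return(mu1, mu2, var1, var2, cov12, var_targ)
```

With these numbers all three weights coincide, which is the sense in which the three formulations are equivalent under suitably matched user parameters.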

Efficient frontier
An efficient frontier is a good way of illustrating the best achievable trade-off between the optimal
portfolio return and portfolio risks using a particular portfolio selection model. The trade-off is
represented by a 2D graph with optimal portfolio risk on the x-axis and optimal expected portfolio
return on the y-axis. With every value of portfolio risk there is an associated maximum value
of expected portfolio returns. The function that links the two is called an efficient frontier. An
illustration of the efficient frontier can be seen in Figure 1.1.
Figure 1.1: An illustration of an efficient frontier with explanations.
In the case of the classical investment model given by (1.1), where feasible portfolios only satisfy
the budget constraints, it can be shown that the efficient frontier is a parabola [67]. Optimal
(efficient) portfolios lie by definition on the efficient frontier, since for a given amount of portfolio
risk the portfolio return is maximised. Therefore no portfolio can have a risk/return pair above the
efficient frontier. On the other hand, individual assets with a certain level of expected return and
risk lie under the frontier. The capital allocation line (CAL) is the line of expected return plotted
against risk (standard deviation) that connects all portfolios that can be formed using a risky asset
and a risk-less asset. It can be proved that it is a straight line intersecting the y-axis at $r_f$ and
passing through the risk/return point of the risky asset. Therefore, the best possible portfolios that
can be obtained by investing in a risk-free asset and a risky portfolio lie on the best CAL, which
is tangent to the efficient frontier. The portfolio at which the best CAL touches the efficient frontier is
called the tangent or market portfolio, and it has the largest Sharpe ratio (defined in Section 1.3.1).
By holding a portfolio which consists of a linear combination of the market portfolio and the risk-free
asset, the investor can guarantee a higher expected return on their portfolio than by holding
a purely risky portfolio with the same amount of risk.

Looking at the efficient frontier can be a good way of comparing different investment models. If
one model produces an efficient frontier that is higher than the efficient frontier of another model
under the same investment conditions, then the investor will prefer to choose the model that gives
more return for the same amount of risk.
The Sharpe ratio
Another way of looking at the performance of an optimal portfolio is to consider the Sharpe
ratio [80, 81], which measures the excess portfolio return per unit of portfolio risk. The Sharpe ratio
for portfolio $\phi$ is defined as
\[ \frac{\mu^T \phi - r_f}{\sqrt{\phi^T \Sigma \phi}}, \tag{1.9} \]
where $r_f$ is the risk-free rate. The Sharpe ratio can be used to determine the attractiveness of one
portfolio over another. Indeed, a portfolio with a larger Sharpe ratio gives the investor more
return over the risk-free rate per unit of risk than a portfolio with a smaller Sharpe ratio. This
is why investors may be interested in finding a portfolio that maximises the Sharpe ratio,
\[
\begin{aligned}
\max_{\phi} \quad & \frac{\mu^T \phi - r_f}{\sqrt{\phi^T \Sigma \phi}} \\
\text{s.t.} \quad & \phi \in C.
\end{aligned}
\tag{1.10}
\]
Problem (1.10) is a non-convex problem and finding an optimal portfolio directly can be difficult.
Fortunately, in situations when $C$ is defined at least by the budget constraint (1.5), problem (1.10)
can be turned into a problem of type (1.1) using a so-called “space lifting technique” [18], which
artificially increases the dimension of the problem space. As a result, the non-convex problem (1.10) is
reformulated as a convex problem of type (1.1) in a higher dimensional space. The optimal solution to
the resulting convex optimisation problem coincides with the optimal solution to (1.10). Constructing
a convex problem whose solution coincides with the optimal solution to a non-convex problem is an
important concept which will often be used in relation to the robust portfolio selection problems in
Chapters 3 and 4.
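For intuition, the Sharpe ratio (1.9) and its maximiser can be computed directly in a tiny example. The sketch below is not the space lifting technique of [18]: it assumes two risky assets and a feasible set containing only the budget constraint, and it uses the standard closed form in which the maximising weights are proportional to $\Sigma^{-1}(\mu - r_f \mathbf{1})$. The function names and numbers are hypothetical, and the closed form additionally assumes that the normalising sum is positive.

```python
import math
from typing import List, Sequence


def sharpe_ratio(phi: Sequence[float], mu: Sequence[float],
                 sigma: Sequence[Sequence[float]], rf: float) -> float:
    """Sharpe ratio (1.9): (mu^T phi - rf) / sqrt(phi^T Sigma phi)."""
    n = len(phi)
    excess = sum(m * p for m, p in zip(mu, phi)) - rf
    variance = sum(phi[i] * sigma[i][j] * phi[j]
                   for i in range(n) for j in range(n))
    return excess / math.sqrt(variance)


def tangency_two_assets(mu: Sequence[float], sigma: Sequence[Sequence[float]],
                        rf: float) -> List[float]:
    """Weights proportional to Sigma^{-1}(mu - rf 1), rescaled to sum to one.

    The determinant of Sigma cancels in the rescaling, so only the adjugate
    of the 2 x 2 covariance matrix is needed.
    """
    e1, e2 = mu[0] - rf, mu[1] - rf
    z1 = sigma[1][1] * e1 - sigma[0][1] * e2
    z2 = sigma[0][0] * e2 - sigma[0][1] * e1
    total = z1 + z2                      # assumed strictly positive
    return [z1 / total, z2 / total]


# Hypothetical inputs.
mu = [0.10, 0.05]
sigma = [[0.04, 0.005],
         [0.005, 0.01]]
rf = 0.02

phi_tan = tangency_two_assets(mu, sigma, rf)
best_sharpe = sharpe_ratio(phi_tan, mu, sigma, rf)
```

The closed form works because, once $\mathbf{1}^T \phi = 1$ holds, the numerator of (1.9) equals $(\mu - r_f \mathbf{1})^T \phi$, so the ratio is unchanged when $\phi$ is rescaled by a positive constant.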
CAPM
The Capital Asset Pricing Model (CAPM) was developed from mean-variance portfolio selection
and is an equilibrium asset pricing model. Under the assumptions [26] that
• investors make decisions based on the expected return and variance of returns,
• investors are rational and risk-averse,
• investors follow classical Markowitz model for portfolio diversification,
• investors all invest for the same period of time,

• investors have the same expectations about future returns and variance of all assets,
• there is a risk-free asset and investors can borrow or lend any amount at the risk-free rate,
• capital markets are perfectly competitive and frictionless,
it was deduced that every rational investor holds a proportion of their wealth in both the risk-free
asset and the market portfolio. As a result, the return on any asset $a$ at time $t$ can be modelled as a
linear function of the return of the market portfolio at time $t$,
\[ r_{a,t} = \alpha_a + \beta_a r_{M,t} + \epsilon_{a,t}, \tag{1.11} \]
where $\epsilon_{a,t}$ is an error term distributed normally with zero mean and variance $\sigma_a^2$. The variance
term $\sigma_a^2$ is called specific risk and is particular to a single asset. The term $\beta_a$ is known as the
systematic risk of holding the market portfolio. Thus, under the returns model (1.11) the portfolio risk
will be described by $\beta_a$ and $\sigma_a^2$ for different values of the index $a$. Specific risk is also known to
be a risk that can be diversified away by investing in many assets, whereas systematic risk cannot be
diversified away.
To see this, assume that we invest equally in $n$ assets for which the $\epsilon_{a,t}$ are independent
identically distributed normal random variables with expectation zero and the same variance $\sigma^2$.
The variance of the resulting portfolio is
\[
\mathrm{Var}\left( \frac{1}{n} \sum_{a=1}^{n} r_{a,t} \right)
= \frac{1}{n} \sigma^2 + \frac{1}{n^2} \mathrm{Var}\left( r_{M,t} \sum_{a=1}^{n} \beta_a \right)
= \frac{1}{n} \sigma^2 + \sigma_M^2 \left( \sum_{a=1}^{n} \frac{\beta_a}{n} \right)^2, \tag{1.12}
\]
where $\sigma_M^2$ is the variance of the market portfolio returns. It can be seen from (1.12) that as
$n \to \infty$ specific risk disappears, while systematic risk stays non-zero since the betas are bounded
away from zero by the initial assumptions.
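The two terms of (1.12) can be tabulated for a growing number of assets to see the diversification effect. The following sketch simply evaluates the formula; the function name and the parameter values (a common specific variance of 0.09, a market variance of 0.04 and all betas equal to one) are hypothetical.

```python
from typing import List, Tuple


def equal_weight_risk(betas: List[float], sigma2_specific: float,
                      sigma2_market: float) -> Tuple[float, float]:
    """Return the (specific, systematic) variance terms of (1.12)."""
    n = len(betas)
    specific = sigma2_specific / n                 # (1/n) * sigma^2
    mean_beta = sum(betas) / n
    systematic = sigma2_market * mean_beta ** 2    # sigma_M^2 * (sum beta_a / n)^2
    return specific, systematic


# Hypothetical parameters.
sigma2_specific = 0.09
sigma2_market = 0.04
table = {}
for n in (1, 10, 100, 1000):
    table[n] = equal_weight_risk([1.0] * n, sigma2_specific, sigma2_market)
```

In this example the specific term falls from 0.09 towards zero as $n$ grows, whereas the systematic term stays at the market variance because the average beta does not change.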
Arbitrage Pricing Theory
The CAPM is often contrasted with the Arbitrage Pricing Theory (APT), which acknowledges the
existence of $m$ sources of risk for which the investor should be rewarded. It states that the expected
return of a financial asset can be modelled as a linear function of various macroeconomic factors,
where the sensitivity to changes in each factor is represented by a factor-specific beta coefficient.
Mathematically, an asset return $r_{a,t}$ at time $t$ is given by
\[ r_{a,t} = \alpha_a + \beta_1 r_{1,t} + \beta_2 r_{2,t} + \dots + \beta_m r_{m,t} + \epsilon_t, \tag{1.13} \]
where $r_{i,t}$ ($i = 1, \dots, m$) are the returns on the factors and $\epsilon_t$ is the source of specific risk at
time $t$. The APT allows for an explanatory model of asset returns, and assumes that each investor will
hold a unique portfolio with its own particular array of betas, as opposed to the identical market
portfolio of the CAPM. The factors themselves are likely to change over time and between economies
and can represent stock indices, oil prices, interest rates and other macroeconomic indicators.

In the literature [37], asset returns given by (1.13) are said to
follow the “factor model” or the “multifactor CAPM model”.
1.3.2 Return models
The literature on portfolio optimisation often assumes that the stock return process $R_t$ is described by
some random walk model. Random walk models for stock returns appear naturally in finance. In
random walk models time is split into equal intervals, $t_1, \dots, t_n$, at the end of which stock prices
$P_t$ are observed. Usually investors consider adjusted closing prices at the end of daily, monthly or
quarterly periods. The return over period $t$ is defined as
\[ R_t = \frac{P_t - P_{t-1}}{P_{t-1}}, \tag{1.14} \]
representing the percentage change relative to the stock price in the previous period, $P_{t-1}$. There
are two major frameworks for modelling the returns.
• Arithmetic random walk framework. Under this framework, the returns change according to
the arithmetic random walk model,
\[ R_t = \mu + \eta_t, \tag{1.15} \]
where $\mu$ is a constant and $\eta_t$ is a white noise term. White noise is a sequence of independent
identically distributed (IID) random variables with zero mean and finite variance. Often the white
noise term is replaced by a scaled standard normal variable $\sigma \epsilon_t$, where $\epsilon_t \sim N(0, 1)$, so that
\[ R_t = \mu + \sigma \epsilon_t, \tag{1.16} \]
making the returns normally distributed with variance $\sigma^2$. Normality assumptions
were introduced for simplicity; however, there has been empirical evidence [29] that $\eta_t$ in (1.15)
is best described by Student's t-distribution.
• Lognormal framework. Under this framework, let $p_t := \log P_t$ be the logarithm of the stock
price. By defining $r_t := \log(1 + R_t)$, the logarithmic returns can be modelled by a random walk
process with constant $\mu$ and white noise $\eta_t$,
\[ r_t = p_t - p_{t-1} = \mu + \eta_t. \tag{1.17} \]
Often $\eta_t$ is replaced by $\sigma \epsilon_t$, where $\epsilon_t \sim N(0, 1)$ and $E(\epsilon_t \epsilon_s) = 0$ for $t \neq s$. Also, if
logarithmic returns are normally distributed then gross returns are distributed lognormally. As in the
arithmetic random walk framework, there has been evidence [29] that logarithmic returns
are better described by a t-distribution.
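The two return definitions, (1.14) and $r_t = \log(1 + R_t)$ from the lognormal framework, are easy to compute from a price series. The sketch below uses a short hypothetical list of prices; the function names are illustrative.

```python
import math
from typing import List, Sequence


def arithmetic_returns(prices: Sequence[float]) -> List[float]:
    """R_t = (P_t - P_{t-1}) / P_{t-1}, as in (1.14)."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def log_returns(prices: Sequence[float]) -> List[float]:
    """r_t = log P_t - log P_{t-1}, as in (1.17)."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


# Hypothetical adjusted closing prices.
prices = [100.0, 105.0, 99.75, 102.0]
R = arithmetic_returns(prices)
r = log_returns(prices)
total_log_return = sum(r)   # log returns add up over consecutive periods
```

One practical difference between the two frameworks is visible here: log returns add up across periods to $\log(P_T / P_0)$, whereas arithmetic returns compound multiplicatively.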

Generally, the returns (or logarithmic returns) process is considered to be stationary or covariance
stationary, whereas the price process is not. A process $r_t$ is covariance stationary if
\[ E(r_t) = \mu = \text{constant} \quad \forall t, \tag{1.18} \]
\[ \mathrm{Var}(r_t) = \sigma^2 = \text{constant} \quad \forall t, \tag{1.19} \]
\[ \mathrm{Cov}(r_t, r_{t+s}) = f(s) \quad \forall t, \tag{1.20} \]
for some deterministic function $f(s)$. Covariance stationarity essentially tells us that the means and
variances/covariances do not change over time. Stationarity, on the other hand, requires that the
whole distribution of the returns does not change over time. Moreover, it is often assumed that
the returns process is an ergodic process. An ergodic process is one which conforms to the ergodic
theorem. The theorem allows the time average of a conforming process to equal the ensemble
average. In practice this means that statistical sampling can be performed at one instant across a
group of identical processes, or over time on a single process, with no change in the measured
result. The above assumptions on the returns allow us to estimate future expected returns and variance
risk by sampling historical asset returns.
Extension to multiple assets
Previously, we only looked at the return models for one asset, and we need to extend the assumptions
made on the returns for one asset to multiple assets. Following the analysis in Section 1.3.2, one can
model returns (or logarithmic returns) for $n$ assets by a multidimensional random walk,
\[ r_t = \mu + \Sigma^{1/2} \epsilon_t, \tag{1.21} \]
where $\epsilon_t \sim N(0, I)$. Thus asset returns are modelled by a multivariate normal distribution with
expectation $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$. In other words,
\[ r_t \sim N(\mu, \Sigma). \tag{1.22} \]
In this thesis the arithmetic random walk framework will be used to estimate mean returns and
variances of individual stocks. In practice, the lognormal framework can be easily adapted for empirical
calculations.
Multiple assets within the APT
The Arbitrage Pricing Theory (APT) allows the investor to choose market factors such as oil prices or
interest rates to be the main sources of risk for the behaviour of the asset prices. Let there be $m$
factors and denote the factor returns at time $t$ by $f_t \in \mathbb{R}^m$. It is assumed that the factors follow a
multivariate normal random walk, i.e. $f_t \sim N(\eta, F)$, $F \in S\mathbb{R}^m$. Under the APT, the corresponding
returns $r_t$ for $n$ assets depend linearly on the dynamics of $f_t$,
\[ r_t = \alpha + V^T f_t + \epsilon. \tag{1.23} \]
In equation (1.23), $\alpha \in \mathbb{R}^n$ is a vector of constants, $V \in \mathbb{R}^{m \times n}$ is a matrix of factor
loadings and $\epsilon \sim N(0, D)$, independent of $f_t$, represents the asset-specific risk or residual. The
$n \times n$ matrix $D$ is a diagonal matrix with positive diagonal entries. Thus, the returns $r_t$ have a
multivariate normal distribution with mean $\alpha + V^T \eta$ and non-singular symmetric covariance matrix
$V^T F V + D$.
The return models described by (1.22) and (1.23) form the foundation of this thesis and will thus be
used throughout.
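The mean $\alpha + V^T \eta$ and covariance $V^T F V + D$ implied by the factor model (1.23) can be assembled directly from its ingredients. The sketch below does this with plain nested lists for a hypothetical example with $m = 2$ factors and $n = 3$ assets; the function name and all numbers are illustrative, and the diagonal of $D$ is passed as a list.

```python
from typing import List, Sequence, Tuple


def factor_model_moments(alpha: Sequence[float], V: Sequence[Sequence[float]],
                         eta: Sequence[float], F: Sequence[Sequence[float]],
                         d_diag: Sequence[float]) -> Tuple[List[float], List[List[float]]]:
    """Return (alpha + V^T eta, V^T F V + D) for V of shape m x n."""
    m, n = len(V), len(V[0])
    mean = [alpha[j] + sum(V[k][j] * eta[k] for k in range(m)) for j in range(n)]
    cov = [[sum(V[k][i] * F[k][l] * V[l][j] for k in range(m) for l in range(m))
            + (d_diag[i] if i == j else 0.0)      # specific risk on the diagonal
            for j in range(n)]
           for i in range(n)]
    return mean, cov


# Hypothetical inputs: two factors, three assets.
alpha = [0.01, 0.00, 0.02]
V = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]
eta = [0.05, 0.02]
F = [[0.04, 0.00],
     [0.00, 0.01]]
d_diag = [0.02, 0.03, 0.01]

mean, cov = factor_model_moments(alpha, V, eta, F, d_diag)
```

In this example assets 1 and 3 load on different, uncorrelated factors, so their covariance is zero, while asset 2 loads on both factors and is correlated with each of them.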
Risk measures
Up until now we have only talked about one risk measure - the portfolio variance. In practice many
other risk measures have been developed and used with varying success. Below we give an overview of
some of the most common risk measures used in finance. Let $\Pi$ denote the portfolio return.
• The variance measure. Risk is measured by the variance $\mathrm{Var}(\Pi)$ of the portfolio return at
the end of the investment period. Essentially, the riskiness of the investment is determined
by how much the portfolio return spreads around its mean. This is a plausible risk measure since
higher portfolio variance means that there is a higher probability of losing a larger share of
the investment. The use of the variance as a measure of risk can be motivated from stochastic
calculus. Often an asset price $S_t$ at time $t$ is assumed to follow the stochastic differential
equation
\[ \frac{dS_t}{S_t} = \mu \, dt + \sigma \, dW_t, \tag{1.24} \]
driven by the Brownian motion $W_t$. By applying Ito's lemma to (1.24) we can obtain an
expression for $S_t$ in terms of the Brownian motion $W_t$,
\[ S_t = S_0 e^{\left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma W_t}. \tag{1.25} \]
After recalling the definition of log returns, $\log(S_t/S_0)$, the expected log return for an asset is
\[ \left( \mu - \frac{1}{2} \sigma^2 \right) t. \tag{1.26} \]
One can notice that, because of the uncertainty in the stock prices, the expected return has
been reduced by half of the variance of the log returns. Thus, the variance is a good measure
of risk as it penalises the expected returns.
Alternatively, since a high variance also implies a higher probability of large positive returns, and
these are penalised as well, the variance measure can be suitable for an investor who is seeking
stable average returns on their investment. This risk measure, on the other hand, may not be the
best one to use for investors who are only concerned with downward movements of their portfolio value.

• Value-at-Risk (VaR) measure. Given a probability $\beta \in (0, 1)$ and some user-defined portfolio
loss distribution $L_\Pi$, e.g. negative returns ($L_\Pi = -R^T \phi$), $\beta$-VaR is the level of loss such that
any loss exceeding $\beta$-VaR occurs with probability smaller than $\beta$,
\[ \beta\text{-VaR} = \inf \{ p \in \mathbb{R} \mid P(L_\Pi \ge p) \le \beta \}. \tag{1.27} \]
To get a better understanding, $\beta$-VaR can be viewed as the upper $\beta$-quantile of $L_\Pi$. The Value-at-Risk
measure is an example of a downside risk measure, which considers the values of losses and places
no emphasis on the gains. Minimising Value-at-Risk consists of minimising a potential portfolio
loss that can happen with probability $\beta$.
It was shown in the literature [76] that Value-at-Risk can destroy the convexity in some convex
optimisation problems, which led to the definition of a class of coherent risk measures - measures
with nice properties for optimisation [2, 76]. An alternative measure to $\beta$-VaR is called the
Conditional Value-at-Risk, $\beta$-CVaR, which measures the expected loss on a portfolio that can occur
with probability $\beta$,
\[ \beta\text{-CVaR} := E \left[ L_\Pi \mid L_\Pi \ge \beta\text{-VaR} \right]. \tag{1.28} \]
CVaR answers the question “in the unlikely event of large losses, what will be the average
loss?”. By the definition of $\beta$-CVaR in (1.28), $\beta$-CVaR is at least as large as $\beta$-VaR and hence small
values of CVaR guarantee small values of VaR. Moreover, CVaR can be viewed as a measure of how
heavy the tail of the portfolio loss distribution is. This is the reason CVaR is also known as
the Mean Excess Loss, Mean Shortfall or Tail VaR, and it is considered a more consistent
measure of risk in the context of [74]. Artzner et al. [2] define and discuss the properties of
coherent risk measures.
Value-at-Risk and Conditional Value-at-Risk measures have been widely used in the context
of portfolio selection [14, 55, 61, 77].
• There exist a number of other similar risk measures that are used less often in the financial
literature. One example is the Omega risk measure, introduced in [15], which reflects all
the statistical properties of the returns distribution; in other words, all the moments of the
distribution are contained within that measure. The Omega measure is described in detail in
[28, 49, 88].
For other risk measures used in finance, a good overview can be found in [86]. These risk
measures are outside the scope of this thesis.
This thesis focuses on a class of portfolio selection problems in which portfolio risk will be
measured by the portfolio variance.
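Although the downside measures are not pursued further, a short sample-based sketch helps to fix the definitions of $\beta$-VaR and $\beta$-CVaR given above. It reads $\beta$ as a small tail probability, takes $\beta$-VaR to be the smallest level $p$ for which at most a fraction $\beta$ of the sampled losses are at or above $p$, and takes $\beta$-CVaR to be the average of the sampled losses at or above that level. This empirical estimator, the function name and the data are assumptions made for illustration only.

```python
import math
from typing import Sequence, Tuple


def empirical_var_cvar(losses: Sequence[float], beta: float) -> Tuple[float, float]:
    """Sample estimates of beta-VaR and beta-CVaR for a list of losses."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    ordered = sorted(losses)
    n = len(ordered)
    # Largest number of samples that may lie at or above the VaR level;
    # the small offset guards against floating-point round-off in beta * n.
    k = min(math.floor(beta * n + 1e-12), n - 1)
    var = ordered[n - k - 1]                  # infimum of the admissible levels p
    tail = [x for x in ordered if x >= var]   # losses at or beyond the VaR level
    cvar = sum(tail) / len(tail)
    return var, cvar


# Hypothetical losses 1, 2, ..., 10 with tail probability 0.2.
losses = [float(i) for i in range(1, 11)]
var_20, cvar_20 = empirical_var_cvar(losses, 0.2)
```

For these ten equally likely losses the estimate of VaR is 8 and the estimate of CVaR is the average of 8, 9 and 10, so CVaR is at least as large as VaR, as noted in the text.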

1.3.3 Advances in portfolio optimisation
Since Markowitz's seminal work in 1952 [63], much attention has been devoted to analysing the
characteristics of classical optimal portfolios, as well as their behaviour in real markets. A number of
authors found that classical optimal portfolios are sometimes unnatural and suffer from instability over
time, poor diversification and sensitivity to estimation error. In what follows we discuss potential
drawbacks of classical Markowitz models and finish with an overview of a series of approaches that were
designed to tackle these issues.
Issues with the classical Markowitz model
There are many articles in the literature about problems related to the classical portfolio selection
problem which is based on the sample means and covariances for asset returns. Typically, a portfolio
that is computed based on these assumptions performs poorly out-of-sample on the real data [12,
17, 32, 33, 57, 69].
It has been identified that the poor results are related to errors in the parameter estimates, which
result in very unstable optimal portfolios. In particular, it is known that it is more difficult to estimate
the means than the covariances of asset returns [68] and that errors in the estimates of the mean have a
larger impact on the portfolio than errors in the estimates of the covariance. This result was
obtained by Chopra and Ziemba in [17], who showed that a slight change in the estimate of the
mean returns can have a big impact on the optimal portfolio. They also showed that errors
in the estimates of the mean are ten times more important than errors in the estimates of the
variances or the covariances. A somewhat similar result was shown by Best and Grauer in [10],
where the authors investigate the sensitivity of mean-variance (MV) efficient portfolios to changes
in the means of individual assets. That article analyses to what extent
changes in the means affect the mean, variance and weights of the optimal MV portfolio.
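A toy calculation shows how strongly the optimal weights can react to the mean estimates. The sketch below is not a reproduction of the experiments in [17] or [10]: it assumes two assets, the budget constraint only and the utility formulation (1.8), for which the optimal weight of the first asset has a simple closed form, and all numbers are hypothetical.

```python
def optimal_weight(mu1: float, mu2: float, var1: float, var2: float,
                   cov12: float, lam: float) -> float:
    """Weight of asset 1 maximising (1.8) for two assets with weights (w, 1 - w)."""
    d = var1 + var2 - 2.0 * cov12          # assumed strictly positive
    return ((mu1 - mu2) / (2.0 * lam) + var2 - cov12) / d


def weight_shift(mu1: float, mu2: float, var1: float, var2: float,
                 cov12: float, lam: float, delta: float) -> float:
    """Change in the optimal weight when the estimate of mu1 moves by delta."""
    base = optimal_weight(mu1, mu2, var1, var2, cov12, lam)
    bumped = optimal_weight(mu1 + delta, mu2, var1, var2, cov12, lam)
    return bumped - base


lam, delta = 2.0, 0.01   # one percentage point error in the mean of asset 1

# Hypothetical, moderately correlated assets.
shift_moderate = weight_shift(0.10, 0.05, 0.04, 0.01, 0.005, lam, delta)

# Hypothetical, highly correlated assets with equal variances.
shift_correlated = weight_shift(0.10, 0.05, 0.04, 0.04, 0.038, lam, delta)
```

In this toy setting the shift equals delta divided by $2\lambda(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12})$, so an error of one percentage point in a mean moves about 6% of wealth in the first case and more than 60% when the two assets are close substitutes.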
In general, accurately estimating the input parameters is a very daunting task, as there is never
enough data to be able to do so with a high degree of accuracy; uncertainty in the estimates thus plays
a very important role in making simple models perform worse. It was noticed by many authors that
the optimal portfolio is sometimes not well diversified and goes against the intuition of practitioners.
Moreover, it has been noticed that optimal portfolios seem to be unstable over time. A review of
many of these problems can be found in [3, 69]. The effects of disregarding uncertainty are described
in more detail in [45, 46, 47, 69].
Imposing additional constraints
In order to overcome the problem of poor portfolio diversification, in which most of the wealth
is invested in a small number of assets, practitioners and academics have proposed imposing extra
artificial constraints on each portfolio weight. As an example, one could require at least 2% of the
total wealth to be held in each asset (assuming that one is investing in 50 assets or fewer).
Alternatively, one could cap the share of wealth invested in a single stock, e.g. at 50% of the total
investment value. Such a constraint guarantees that not all eggs are put into one basket, thus
increasing diversification. Although imposing additional constraints does not seem like a consistent
approach, Jagannathan and Ma [43] demonstrate the benefits of imposing extra “wrong” constraints.
As correctly noted by Frank Lutgens in his PhD thesis [62], this approach is “merely a treatment of
symptoms by limiting the harm the uncertainty may do”. The approach of constraining weights has
been applied by Frost and Savarino in [33] and more recently by DeMiguel et al. in [20].
Robust Statistics
The challenge in classical portfolio selection consists of estimating the true risk and returns as
accurately as possible. It was described in Section 1.3.3 why the uncertainty around the estimates
can make the resulting optimal portfolio suboptimal and unreliable in practice. Uncertainty in the
estimates results from two sources: 1) a limited number of data points and 2) outliers that
correspond to a misspecification of the probabilistic model for future returns.
Robust statistics addresses the problem of slight misspecification of the model that is assumed
prior to any statistical analysis. Robust estimators are less sensitive to outliers, which makes them
attractive in the uncertain world of financial markets. Important concepts of robust statistics are
the breakdown point, the influence function and M-estimators. The breakdown point is a useful tool
for checking how robust a particular estimator is; it is not used to find the most robust estimator.
The breakdown point of an estimator is defined as the smallest proportion of observations that have
to change in order to make the estimator arbitrarily large. It is a number between 0 and 0.5, with
0.5 standing for the maximum robustness of an estimator. For example, the sample mean has a
breakdown point of 0, since changing a single observation by a sufficiently large amount makes the
sample mean arbitrarily large. On the other hand, the sample median has a breakdown point of 0.5,
since about half of the observations have to change in order to make the sample median arbitrarily
large. An overview of the breakdown point can be found in the book by Rousseeuw and Leroy [78].
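The contrast between the two breakdown points can be illustrated with a short Python sketch. The
return values and the helper contaminate are hypothetical and serve only as an illustration: a single
corrupted observation is enough to ruin the sample mean, whereas the sample median stays within the
range of the clean data until more than half of the sample is corrupted.

```python
from statistics import mean, median


def contaminate(sample, n_bad, bad_value):
    """Replace the first n_bad observations of the sample with bad_value."""
    return [bad_value] * n_bad + list(sample[n_bad:])


# Hypothetical daily returns, used for illustration only.
returns = [0.01, -0.02, 0.015, 0.003, -0.007, 0.012, -0.001]

one_outlier = contaminate(returns, 1, 1e6)
three_outliers = contaminate(returns, 3, 1e6)  # 3 of 7: fewer than half
four_outliers = contaminate(returns, 4, 1e6)   # 4 of 7: more than half

print("mean, 1 outlier:   ", mean(one_outlier))       # dominated by the outlier
print("median, 1 outlier: ", median(one_outlier))     # unchanged
print("median, 3 outliers:", median(three_outliers))  # still a clean observation
print("median, 4 outliers:", median(four_outliers))   # finally breaks down
```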
One way of obtaining a robust estimate is to calculate a so-called M-estimator. Unlike maximum
likelihood estimators, which are obtained by minimising $\sum_i -\log f(x_i)$ for observations $x_i$
with density function $f$, M-estimators minimise a more general function $\sum_i \rho(x_i)$, where
$\rho$ is some function. There are many different choices of $\rho$, e.g. squared errors, absolute
errors, biweight, Winsorizing at a given level and many more. M-estimators are regarded as more
robust versions of the classical estimates and need not be unique. In portfolio selection, the
estimation of risks via M-estimators was approached by DeMiguel and Nogales in [22]. Other robust
statistical methods were applied in [13].
Shrinkage estimators
Robust estimates can often be hard to work with. The standard alternative is to use maximum
likelihood estimators. It has been shown, however, that in certain cases, if the number of assets is
at least three, these estimators are no longer admissible, i.e. they are suboptimal with respect to
the mean squared error, $\mathrm{MSE}(\hat\theta)$, of an estimator $\hat\theta$ of a parameter
$\theta$. The MSE is defined as
$$\mathrm{MSE}(\hat\theta) := E\big((\hat\theta - \theta)^2\big). \tag{1.29}$$
Minimising the MSE gives rise to the so-called “shrink” estimators, or James-Stein estimators [44,
84], which are admissible in the sense that no other estimator dominates them (has smaller MSE).
A particular property of the shrink estimators is that they are a weighted average of the sample
maximum likelihood estimator and some constant. This is why the estimator is said to shrink
towards a constant value. James-Stein estimators can be generalised so that the shrink estimators
for means and variances are represented by a linear combination of the maximum likelihood
estimators and some target value. In particular, the shrinkage estimator for the mean could take
the following form
$$\mu_s = \delta\mu_0 + (1 - \delta)\bar\mu, \tag{1.30}$$
for $0 \le \delta \le 1$, where $\bar\mu$ is the vector of sample means and $\mu_0$ is some target
constant. Thus, the shrink estimator in (1.30) shrinks the sample means towards a common value,
usually taken to be the grand mean over all assets. A similar expression
$$\Sigma_s = \delta\Sigma_0 + (1 - \delta)\bar\Sigma, \tag{1.31}$$
was proposed in [32, 54] for the shrink covariance matrix estimator, with $0 \le \delta \le 1$, some
shrinkage target $\Sigma_0$ and sample covariance matrix $\bar\Sigma$. In order to obtain $\mu_s$ or
$\Sigma_s$ one needs to select the shrinkage target, calculate the sample estimates and use the data
to estimate the shrinkage parameter $\delta$. Once the estimates for the expected returns and the
covariance matrix have been obtained, they can be used as input to the classical portfolio
optimisation model. This has been investigated by Jorion in [47]. James-Stein estimators have a lot
in common with the estimates obtained via the Bayesian procedure.
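As a minimal sketch of (1.30), the following Python fragment shrinks a vector of sample means
towards their grand mean. The function name shrink_means, the sample means and the values of
$\delta$ are hypothetical; in practice $\delta$ would be estimated from the data rather than fixed
by hand, a step that is not shown here.

```python
def shrink_means(sample_means, delta):
    """Shrinkage estimator (1.30): mu_s = delta * mu_0 + (1 - delta) * mu_bar.

    The target mu_0 is taken to be the grand mean over all assets
    (an illustrative choice of shrinkage target).
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    grand_mean = sum(sample_means) / len(sample_means)
    return [delta * grand_mean + (1.0 - delta) * m for m in sample_means]


# Hypothetical sample means of four assets.
sample_means = [0.02, -0.01, 0.05, 0.00]

for delta in (0.0, 0.5, 1.0):
    # delta = 0 keeps the sample means, delta = 1 collapses them to the grand mean.
    print(delta, shrink_means(sample_means, delta))
```

Intermediate values of $\delta$ reduce the spread of the estimated means without changing their
average.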
Bayesian approach
Bayesian methods for selecting the best estimate of future returns and risk have been among the
most popular with practitioners. One of the reasons this methodology is attractive is that it
allows practitioners to specify a so-called prior belief distribution about the parameters of interest
and to combine it with the information obtained from the historical data sample. More rigorously,
let $f(\theta, x)$ be the density function for a set of measurements $x_1, \dots, x_k$ of some random
variable. Let $p(\theta)$ denote the distribution of the parameter $\theta$, called the prior
distribution. The posterior distribution $g$ for the parameter $\theta$ is defined by
$$g(\theta \mid x_1, \dots, x_k) \propto p(\theta) \prod_{i=1}^{k} f(\theta, x_i). \tag{1.32}$$
It can be shown that the Bayes (posterior) estimate for $\theta$ in certain cases takes the form
described by equation (1.30) and is thus closely related to the shrink estimators. To see this
similarity in the case of one asset return, assume that the future return over some period is
distributed according to the normal distribution $N(\theta, \sigma^2)$, where $\theta$ is unknown and
$\sigma^2$ is considered known. The investor believes that the mean $\theta$ is itself described by a
normal distribution with known mean $\mu$ and variance $\tau^2$. Assume that there is only one
realisation $X$ from $N(\theta, \sigma^2)$. It can be shown that the posterior estimate for $\theta$
is given by
$$\hat\theta^B = \frac{\sigma^2}{\sigma^2 + \tau^2}\,\mu + \frac{\tau^2}{\sigma^2 + \tau^2}\,X, \tag{1.33}$$
which corresponds to (1.30) with $\delta = \sigma^2/(\sigma^2 + \tau^2)$, target $\mu$ and sample
estimate $X$. One can see that the Bayes estimate shrinks the observation $X$ towards the prior mean
$\mu$ according to how much the investor believes in the accuracy of $\mu$. These results can be
generalised to unknown $\sigma^2$ and unknown $\mu$ and $\tau$, which can be explicitly estimated
from the data. Such a procedure is called empirical Bayes. A good introduction to empirical Bayes is
given by G. Casella in [16], whereas Efron and Morris in [23] give an excellent overview of the
interrelationship between Bayes methods and shrink estimators.
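The shrinkage behaviour of (1.33) can be checked numerically with the following Python sketch. The
function name posterior_mean and all numerical values are hypothetical. A tight prior (small
$\tau^2$) pulls the estimate towards $\mu$, while a diffuse prior (large $\tau^2$) leaves it close
to the observation $X$.

```python
def posterior_mean(x, sigma2, mu, tau2):
    """Bayes estimate (1.33) for a N(theta, sigma2) observation x
    under a N(mu, tau2) prior on theta."""
    weight_prior = sigma2 / (sigma2 + tau2)  # plays the role of delta in (1.30)
    weight_data = tau2 / (sigma2 + tau2)
    return weight_prior * mu + weight_data * x


# Hypothetical numbers: observed return 10%, prior mean 2%.
x, mu, sigma2 = 0.10, 0.02, 0.04

for tau2 in (1e-6, 0.04, 1e6):
    print(tau2, posterior_mean(x, sigma2, mu, tau2))
```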
Posterior estimates of θ deduced from equation (1.32) can be used as input to the classical
Markowitz portfolio selection model (1.1). Authors such as Klein and Bawa [51] and Frost and
Savarino [32] have proposed Bayesian approaches to efficient portfolio selection. Bayesian methods
have also been applied in a wide range of other publications, such as [48, 71, 75, 83].
Robust optimisation
One of the approaches we consider in this thesis for reducing the sensitivity of portfolios to errors
in the parameter estimates is robust optimisation [5, 6]. It is based on the assumption of a priori
knowledge of a reasonable range of values (a so-called uncertainty set) for the uncertain model
parameters, and on the aim of finding solutions that are guaranteed to perform reasonably well for
all (or most) possible realisations of the uncertain model parameters within this uncertainty set.
To render the conceptual framework of robust optimisation more concrete, let us consider a
generic optimisation problem in which both the objective function $f(x, p)$ and the feasible set
$X_p$ depend on a vector $p$ of model parameters,
$$\max_{x \in \mathbb{R}^n} f(x, p) \quad \text{s.t.} \quad x \in X_p. \tag{1.34}$$
When $p$ is known, (1.34) is a standard optimisation problem. Robust optimisation is concerned
with the case where $p$ is uncertain but known to lie in an uncertainty set $U$. This set can be
thought of as representing the set of possible scenarios or realisations for the parameters $p$.
Robust optimisation formulations optimise some worst-case performance metric, where the
“worst case” is computed over the uncertainty set. In most cases [25, 37, 39], the objective is to
optimise the worst-case realisation of the objective function. For the optimisation problem (1.34),
this leads to the following formulation:
$$\max_{x \in \bigcap_{p \in U} X_p} \Big\{ \min_{p \in U} f(x, p) \Big\}. \tag{1.35}$$
In a practical framework, uncertainty sets would often be formed on the basis of different personal
opinions, on estimates using different forecasting models, on confidence regions of statistical
estimators, on Bayesian or Kalman filtering methods for tracking the evolution of an assumed
probability distribution for $p$, and on other related ideas. While the current literature does not
provide clear guidelines on the construction of uncertainty sets, their shape often reflects the
sources of uncertainty while their size depends on the desired level of robustness. The choice is not
completely free, however, as mathematical considerations make it desirable to choose an uncertainty
set that renders the robust problem (1.35) tractable (polynomial-time solvable). Typical families of
sets that render robust linear or quadratic programming tractable¹ are the following (see [7]):
(i) $U = \{p_1, p_2, \dots, p_k\}$ (a finite set of scenarios),
(ii) $U = \mathrm{conv}(p_1, p_2, \dots, p_k)$ (a polytopic set),
(iii) $U = \{p : l \le p \le u\}$ (intervals),
(iv) $U = \{p : p = p_0 + Mu,\ \|u\| \le 1\}$ (an ellipsoidal set),
(v) direct products of sets of type (i)–(iv),
(vi) unions and intersections of sets of type (i)–(v).
¹ Strictly speaking, the robust problems in question have convex quadratic relaxations the optimal
solution of which is provably feasible for the original problem.
Robust problems of type (1.35) are examples of semi-infinite problems in which infinitely many
constraints are imposed on finitely many decision variables. Problems of this kind are hard to solve
and, as far as we know, there currently exists no general algorithm for solving them efficiently.
Yet, a number of tools have been applied in order to obtain an optimal solution to (1.35). The first
one consists of deducing analytically the optimal solution of the inner optimisation problem
$$p^*(x) = \operatorname{argmin}_{p \in U} f(x, p) \tag{1.36}$$
while keeping $x$ fixed, thus reformulating the robust optimisation problem (1.35), often as a
nonlinear optimisation problem
$$\max_{x \in \bigcap_{p \in U} X_p} f(x, p^*(x)). \tag{1.37}$$
In certain problems a solution to problem (1.37) can be found using optimisation techniques from
convex or nonlinear programming.
An alternative tool consists of constructing a convex problem for which a unique solution exists
and can be found very efficiently, in such a way that this solution coincides with the optimal solution
to the robust problem (1.35). This elegant approach is at the heart of some publications in robust
financial mathematics [37, 60] and of Chapter 3 of this thesis.
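The first of the two tools, solving the inner problem (1.36) analytically, can be illustrated on a
deliberately simple case. The sketch below assumes a linear objective $f(x, p) = p^T x$ and an
interval uncertainty set of type (iii); both choices, as well as the function names and numbers, are
illustrative assumptions and not the models developed in this thesis. For fixed $x$ the worst-case
$p^*(x)$ takes the lower bound in every coordinate where $x_i \ge 0$ and the upper bound elsewhere,
which is confirmed by enumerating the vertices of the box.

```python
from itertools import product


def worst_case_return(phi, lower, upper):
    """Inner problem (1.36) for f(phi, p) = p^T phi over the box lower <= p <= upper.

    Returns the worst-case value and the minimising parameter vector p_star.
    """
    p_star = [lo if w >= 0 else up for w, lo, up in zip(phi, lower, upper)]
    value = sum(p * w for p, w in zip(p_star, phi))
    return value, p_star


def brute_force_worst_case(phi, lower, upper):
    """Check: a linear function attains its minimum over a box at a vertex."""
    return min(
        sum(p * w for p, w in zip(vertex, phi))
        for vertex in product(*zip(lower, upper))
    )


# Hypothetical portfolio (with one short position) and return intervals.
phi = [0.6, 0.7, -0.3]
lower = [0.00, 0.01, -0.02]
upper = [0.04, 0.03, 0.02]

value, p_star = worst_case_return(phi, lower, upper)
print(value, p_star, brute_force_worst_case(phi, lower, upper))
```

Substituting $p^*(x)$ back into the objective gives a problem of the form (1.37) in $x$ alone.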
1.3.4 Computing portfolios
In the previous sections we discussed various models for portfolio selection without mentioning the
optimisation tools that are required to compute the actual portfolios. We believe that a large class
of the problems discussed in this chapter, as well as all the new models developed throughout this
thesis, can be solved either via analytical tools or via convex optimisation software. In what follows
we shall describe the most common ways of computing financial portfolios.
Analytical approach
In very few simple cases the optimal portfolio can be obtained analytically from the model. Consider
the classical Markowitz model of the form (1.8) with the budget constraint
$$\max_{\phi}\ \mu^T\phi - \lambda\phi^T\Sigma\phi \quad \text{s.t.} \quad 1^T\phi = 1. \tag{1.38}$$
Optimisation problem (1.38) is a convex quadratic optimisation problem with a single global
optimum. It is convex because $\Sigma$ is a covariance matrix, which we take to be positive definite,
and because the concave objective function can be turned into a convex one by placing a minus sign
in front of it and replacing the max operator by the min operator. There are several ways of solving
(1.38). One way is to notice that the objective function restricted to the hyperplane
$1^T\phi = 1$ is still a quadratic function with a single maximum, so that straightforward
differentiation allows us to find the optimal portfolio. Another way of obtaining the optimal
portfolio for (1.38) is more general and consists of minimising the Lagrangian function
$$L(\phi, \gamma) = \lambda\phi^T\Sigma\phi - \mu^T\phi + \gamma(1^T\phi - 1). \tag{1.39}$$
After straightforward algebraic manipulations (performed in Chapter 2) one can show that the
optimal portfolio $\phi^*$ is given by the following equation
$$\phi^* = \frac{1}{2\lambda}\,\Sigma^{-1}\left(\mu - \frac{1^T\Sigma^{-1}\mu}{1^T\Sigma^{-1}1}\,1\right) + \frac{\Sigma^{-1}1}{1^T\Sigma^{-1}1}. \tag{1.40}$$
An explicit expression for the optimal portfolio helps to analyse the behaviour of the portfolio as a
function of $\mu$ and $\Sigma$. Unfortunately, it is rarely possible to obtain explicit results for
more general formulations, and hence numerical search algorithms are needed to calculate the optimal
solutions.
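The closed form (1.40) can be evaluated with a few lines of Python. The sketch below is an
illustration under stated assumptions: the helper solve (plain Gaussian elimination) and the
three-asset inputs are hypothetical, and $\Sigma$ is assumed to be positive definite. The two
quantities $\Sigma^{-1}\mu$ and $\Sigma^{-1}1$ are obtained by solving linear systems rather than by
forming the inverse explicitly.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def markowitz_weights(mu, sigma, lam):
    """Closed form (1.40) for max mu^T phi - lam * phi^T Sigma phi s.t. 1^T phi = 1."""
    ones = [1.0] * len(mu)
    s_inv_mu = solve(sigma, mu)      # Sigma^{-1} mu
    s_inv_one = solve(sigma, ones)   # Sigma^{-1} 1
    a = sum(s_inv_mu)                # 1^T Sigma^{-1} mu
    b = sum(s_inv_one)               # 1^T Sigma^{-1} 1
    return [(x - (a / b) * y) / (2.0 * lam) + y / b
            for x, y in zip(s_inv_mu, s_inv_one)]


# Hypothetical inputs for three assets.
mu = [0.05, 0.03, 0.04]
sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
phi = markowitz_weights(mu, sigma, lam=2.0)
print(phi, sum(phi))
```

The weights sum to one, and the gradient $2\lambda\Sigma\phi^* - \mu$ is a multiple of the vector of
ones, as required by the stationarity of (1.39).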
Search algorithms
A variety of methods have been developed to solve both unconstrained and constrained linear,
quadratic and more general nonlinear problems. The literature on this subject is substantial; for an
overview of the basic methods we refer the reader to the work by Gould et al. [38]. The authors cover
basic methods in unconstrained optimisation, such as linesearch and trust-region methods, as well as
in constrained optimisation, where problems are solved by a variety of methods such as active set
methods and interior point methods [11], among many others.
In our work we shall often make use of interior point algorithms for convex problems in order
to solve a particular portfolio selection problem. A convex optimisation problem is defined by a
convex objective function and by a convex constraint set. We prefer to study convex problems not
only because they appear often in portfolio selection problems, but also because it can be shown
that strictly convex problems have a single optimal solution which is also a global one. This avoids
the difficulty of finding a global optimum in the case when there are several local optima, which is
known to be a hard problem.
Interior point methods were developed to solve constrained problems via optimising a sequence
of merit/barrier functions allowing us to make use of unconstrained nonlinear optimisation methods
to find a sequence of points converging to the optimal solution of the original problem.
In this thesis we shall put emphasis on conic optimisation problems, which can be solved
efficiently on a desktop computer. Conic optimisation problems form a subclass of the general class
of convex problems. A general conic optimisation problem is described by a linear objective
function and conic constraints. Mathematically, for a decision vector $\phi \in \mathbb{R}^n$ the
conic optimisation problem is expressed as follows
$$\min_{\phi}\ b^T\phi \quad \text{s.t.} \quad c - A^T\phi \in K. \tag{1.41}$$
In (1.41), $b \in \mathbb{R}^n$ and $c \in \mathbb{R}^m$ are known constants,
$A \in \mathbb{R}^{n \times m}$ for some integer $m$, and $K$ is a cone. $K$ is a convex cone if
and only if $\alpha x + \beta y \in K$ for all $x$ and $y$ in $K$ and all
$\alpha, \beta \in \mathbb{R}_+$. There are several “nice” cones for which numerical solvers have
been implemented. In particular, the standard problem for which we shall be seeking a numerical
solution involves three cones, namely the positive (linear) cone, the quadratic cone and the
positive semidefinite cone. We shall define these cones below. The general conic problem with three
different conic constraints is given by
$$\begin{aligned}
\min_{\phi}\quad & b^T\phi \\
\text{s.t.}\quad & c_l - A_l^T\phi \in K_l, \\
& c_q - A_q^T\phi \in K_q, \\
& A_0 - \sum_{i=1}^{n} A_i\phi_i \in K_s.
\end{aligned} \tag{1.42}$$
In (1.42), $b \in \mathbb{R}^n$, $c_l \in \mathbb{R}^{n_l}$ and $c_q \in \mathbb{R}^{n_q}$ are known
constants, and $A_l \in \mathbb{R}^{n \times n_l}$, $A_q \in \mathbb{R}^{n \times n_q}$ for some
integers $n_l$ and $n_q$. The matrices $A_i$, $i = 0, 1, \dots, n$, belong to $S\mathbb{R}^m$, the
set of symmetric matrices, for some integer $m$. The notation $K_l$, $K_q$ and $K_s$ stands for the
linear, quadratic and semidefinite cones of an appropriate size. The definition of each cone is
given below.
• The linear cone. It is called linear because it is characterised by the nonnegative Euclidean
orthant, which is a cone. Mathematically,
$$K_l := \{x \in \mathbb{R}^n \mid x_i \ge 0,\ i = 1, \dots, n\}. \tag{1.43}$$
Problems of type (1.42) constrained only by $K_l$ are standard linear programming problems.
• The quadratic cone. It is so called because it describes a class of convex quadratic
constraints,
$$K_q := \left\{ \begin{pmatrix} x_0 \\ x \end{pmatrix} \in \mathbb{R}^{n+1} \;\middle|\; \|x\|_2 \le x_0 \right\}. \tag{1.44}$$
The quadratic cone can be visualised in 3D as an ice-cream cone. It turns out that many
constraints can be reformulated as quadratic cone constraints. For example, the constraint
$\phi^T\Sigma\phi \le \eta$ with a positive semidefinite $\Sigma \in \mathbb{R}^{n \times n}$ is
equivalent to
$$\begin{pmatrix} 1 + \eta \\ 1 - \eta \\ 2\Sigma^{1/2}\phi \end{pmatrix} \in K_q. \tag{1.45}$$
As another illustration, it can be shown that the constraints $1 \le \tau\eta$ and
$\tau, \eta \ge 0$ are equivalent to
$$\begin{pmatrix} \tau + \eta \\ \tau - \eta \\ 2 \end{pmatrix} \in K_q. \tag{1.46}$$
Problems of type (1.42) constrained by $K_l$ and $K_q$ are called Second-Order Cone Programming
(SOCP) problems. Applications of SOCP to real-life situations are discussed in much detail
in [58], whereas the algorithmic theory is discussed in [72]. Algorithms for solving SOCP have been
proved to reach a solution of a specified accuracy in polynomial time, which makes them very
efficient.
• The positive semidefinite cone. A symmetric $n \times n$ matrix $S$ belongs to the positive
semidefinite cone $K_s$ if and only if $S$ is a positive semidefinite matrix ($x^T S x \ge 0$ for
all $x \in \mathbb{R}^n$), written for convenience as $S \succeq 0$. Mathematically,
$$K_s := \{S \in S\mathbb{R}^{n \times n} \mid x^T S x \ge 0,\ \forall x \in \mathbb{R}^n\}. \tag{1.47}$$
Problems of type (1.42) constrained by $K_s$ are called semidefinite programming (SDP)
problems. Polynomial-time algorithms based on interior point methods are available and are used in
practice. An important reference on the theory and applications of semidefinite programming
is given by Vandenberghe and Boyd [92].
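The two reformulations (1.45) and (1.46) can be checked numerically with the following Python
sketch. All function names and numbers are hypothetical, and $\Sigma$ is assumed to be diagonal so
that $\Sigma^{1/2}$ is simply the elementwise square root; for a general $\Sigma$ a matrix square
root or Cholesky factor would be needed instead.

```python
import math


def in_quadratic_cone(x0, x, tol=1e-12):
    """Membership test for K_q in (1.44): ||x||_2 <= x0."""
    return math.sqrt(sum(v * v for v in x)) <= x0 + tol


def variance_constraint_as_cone(phi, sigma_diag, eta):
    """Vector of (1.45) for a diagonal Sigma (illustrative assumption)."""
    root_part = [2.0 * math.sqrt(s) * w for s, w in zip(sigma_diag, phi)]
    return 1.0 + eta, [1.0 - eta] + root_part


def hyperbolic_constraint_as_cone(tau, eta):
    """Vector of (1.46), encoding 1 <= tau * eta with tau, eta >= 0."""
    return tau + eta, [tau - eta, 2.0]


phi, sigma_diag = [0.5, 0.5], [0.04, 0.09]
variance = sum(s * w * w for s, w in zip(sigma_diag, phi))  # phi^T Sigma phi

for eta in (0.03, 0.04):
    x0, x = variance_constraint_as_cone(phi, sigma_diag, eta)
    print(eta, variance <= eta, in_quadratic_cone(x0, x))

for tau, eta in ((2.0, 1.0), (0.5, 1.0)):
    x0, x = hyperbolic_constraint_as_cone(tau, eta)
    print(tau, eta, tau * eta >= 1.0, in_quadratic_cone(x0, x))
```

In each printed row the direct check of the constraint and the cone membership test agree.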
There exist both open-licence and commercial software packages for solving SOCP, SDP and general
problems of type (1.42). SOCP problems are efficiently solved by the open-licensed SeDuMi [85], and
SDP problems can be solved by the open-licensed SDPA [34]. General convex cone problems (1.42) can
be solved by the open-licensed SDPT3 [90, 91] or by the commercial MOSEK software [70]. In this
thesis we shall use SeDuMi and SDPT3 integrated with Matlab to perform the computations.

Chapter 2
Limiting the Extremal Behaviour
of Classical Investment Models
2.1 Understanding the portfolio selection problem
2.1.1 Introduction
In Chapter 1 we have introduced the classical Markowitz investment problem [63] that is given by the
equations (1.1), (1.7) and (1.8). Recall that in the above problems the portfolio risks are modelled
by the variance of the portfolio returns which is estimated by a maximum likelihood estimator.
Likewise, the expected portfolio returns are estimated from the past data via the sample mean.
Even with a large amount of data available there always exists some estimation error.
Furthermore, with probability one the estimate will not be equal to the true value of the estimated
parameters. In the context of portfolio optimisation it is then interesting to analyse how much the
hypothetical true portfolios, based on the true expected returns and variance, differ from
the portfolios obtained from the estimated expected returns and variance. Understanding the
behaviour of portfolios calculated from parameter estimates may allow us to quantify
the situations in which such portfolios can potentially lead the investor to disastrous investments.
This in turn may allow us to develop robust algorithms designed specifically to avoid dangerous
situations. This is precisely the approach taken in this chapter.
In the past, several authors have noticed that optimal investment portfolios based on the classical
Markowitz model suffer from being very sensitive to input parameters such as the expected returns
and the variance. As mentioned in the introductory chapter, it was shown that errors in the estimates
of the expected returns have the largest effect on portfolio sensitivity [10, 17]. It is therefore
customary to study the way in which uncertainties in the returns affect the resulting portfolios. In
what follows, we will in fact show that the high sensitivity of portfolios results not only from
errors in the returns, but also from the estimate of the covariance matrix.
In the course of this chapter we assume that the asset returns $R^{[t]} \sim N(\mu, \Sigma)$
independently of time.

2.1.2 Motivations
In what follows we shall be working with the utility formulation of the classical Markowitz
investment problem given by (1.8). Namely, if $\phi$ represents the portfolio weights, $\mu$ the
expected asset returns and $\Sigma$ the covariance matrix of asset returns, then the classical
problem (1.8) is given by
$$\max_{\phi}\ \mu^T\phi - \lambda\phi^T\Sigma\phi \quad \text{s.t.} \quad \phi \in C, \tag{2.1}$$
where $\lambda$ is a constant risk aversion parameter. In this chapter we shall be working with a
convex set $C$ that is defined by the budget constraint only,
$$C = \{\phi \in \mathbb{R}^n : 1^T\phi = 1\}. \tag{2.2}$$
It has been noticed in the past that portfolios obtained from (2.1) are very sensitive to input
parameters, especially to errors in the estimation of the expected returns. In this chapter we shall
address the problem of extreme sensitivity of portfolios obtained from (2.1).
Our aim is to reduce the largest possible distance between the “estimated” portfolio
(i.e. the portfolio obtained from estimated expected returns and variances) and the “true”
portfolio (i.e. the portfolio obtained by solving problem (2.1) with the true, and unknown,
expected returns and risks). To render the framework more concrete let us define
$$\phi_{(\mu,\Sigma)} = \operatorname{argmax}_{1^T\phi = 1}\{\mu^T\phi - \lambda\phi^T\Sigma\phi\} \tag{2.3}$$
and let $\bar\mu$ and $\bar\Sigma$ be the maximum likelihood estimates of $\mu$ and $\Sigma$
respectively, under the assumption that daily (or period) returns are independent multivariate
normal random vectors.
We are interested in changing the inputs to the classical problem (2.1) from $\bar\mu$ and
$\bar\Sigma$ to $\tilde\mu$ and $\tilde\Sigma$ in such a way that the maximal distance between the
calculated portfolio and the true portfolio for this period is minimised. Mathematically, we want to
make sure that for some uncertainty set $U$
$$\max_{(\mu,\Sigma) \in U} \left\| \phi_{(\mu,\Sigma)} - \phi_{(\tilde\mu,\tilde\Sigma)} \right\|_2 \tag{2.4}$$
is minimised by considering different input parameters $(\tilde\mu, \tilde\Sigma)$ to (2.1). We seek
a direct way of making the classical portfolio robust in the sense that (2.4) is not as large as it
could be with the sample $\bar\mu$ and $\bar\Sigma$. By analysing extremal behaviour we hope to
understand why classical portfolios are unstable with respect to the input parameters.
Before we proceed with the analysis of the extremal situations in which classical portfolios can
end up, we shall consider several simple extremal examples that could be observed in practice.
Limited amount of data used for estimation
Assume that $k$ daily returns for $n$ assets were observed and put in matrix form
$R = [r^{(1)} \dots r^{(k)}] \in \mathbb{R}^{n \times k}$, where $r^{(i)}$ represents the vector of
realised returns of the $n$ assets on day $i$. A classical way of estimating the expected portfolio
return and the portfolio variance is to use the sample mean and the sample variance (which are the
maximum likelihood estimators under the assumption that the returns are i.i.d. multivariate normal).
For now, let us focus on the estimation of the covariance matrix.
It can easily be seen that a sample covariance matrix $\bar\Sigma$ with
$$\bar\sigma_{ij} = \frac{1}{k-1} \sum_{l=1}^{k} \left(r_i^{(l)} - \bar r_i\right)\left(r_j^{(l)} - \bar r_j\right) \tag{2.5}$$
can be written in the matrix form
$$\bar\Sigma = \frac{1}{k-1}\, R \left(I - \frac{1}{k}\,11^T\right) R^T, \tag{2.6}$$
where $I$ is the identity matrix and $1$ is the vector of ones of size $k$. Although the sample
covariance matrix (2.6) is the best one could get on the basis of the likelihood function, it could
still suffer from a number of problems that can be harmful to the investor who follows classical
investment decisions.
As an example, the covariance matrix (2.6) can be positive semidefinite with one or more zero
eigenvalues. This can happen because of rounding errors or because there is just not enough
information to estimate $\Sigma$ accurately. To explain this, consider equation (2.6) in greater
detail. Notice that the rank of the matrix $I - \frac{1}{k}11^T$ is equal to $k - 1$ and thus the
rank of $\bar\Sigma$ is at most $k - 1$. Therefore, when the number of assets $n$ is larger than
$k$, $\bar\Sigma$ has at least $n - k + 1$ zero eigenvalues. Intuitively, there is not enough
available information to describe the covariance matrix.
Consider problem (2.1) and assume that $\bar\Sigma$ has at least one zero eigenvalue. Then it is
possible to find portfolio weights $\phi_a \neq 0$ such that $\phi_a^T\bar\Sigma\phi_a = 0$. To see
this, let $\Gamma = \mathrm{diag}(\gamma_1, \dots, \gamma_n) \in S\mathbb{R}^{n \times n}$ be the
diagonal matrix of eigenvalues of $\bar\Sigma$, where $\gamma_1 \ge \dots \ge \gamma_n$ and
$\gamma_n = 0$. Let $Q$ be a matrix of eigenvectors such that $\bar\Sigma = Q\,\Gamma\,Q^T$. If one
defines $\phi_a = Q(0, \dots, 0, a)^T$ for any $a \in \mathbb{R} \setminus \{0\}$, then
$\phi_a^T\bar\Sigma\phi_a = 0$. Requiring $\phi_a \in C$, with $C$ defined by (2.2), forces
$a = 1/(1^T q_n)$, where $q_n$ is the $n$-th eigenvector. Also, with probability one
$\mu^T\phi_a \neq 0$, and therefore we have constructed a portfolio that has zero risk and a
non-zero return. Thus, we have managed to construct an arbitrage portfolio based on a limited amount
of data. Moreover, if $\bar\Sigma$ has more than two zero eigenvalues then it is possible to
construct a portfolio with zero risk and arbitrarily large expected returns.
Such portfolios are unrealistic.
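The construction can be reproduced on a toy example. The Python sketch below assumes $n = 3$ assets
observed over only $k = 2$ days, with hypothetical return values; in that special case every
centred observation is a multiple of the difference $d = r^{(1)} - r^{(2)}$, so any portfolio
orthogonal to $d$ has zero estimated variance. The helper names and the use of a cross product to
find such a portfolio are illustrative choices specific to this three-asset case.

```python
def sample_covariance(returns):
    """Sample covariance (2.5); `returns` is a list of k observation vectors of length n."""
    k, n = len(returns), len(returns[0])
    means = [sum(r[i] for r in returns) / k for i in range(n)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in returns) / (k - 1)
             for j in range(n)] for i in range(n)]


def quad_form(phi, sigma):
    """Estimated portfolio variance phi^T Sigma phi."""
    n = len(phi)
    return sum(phi[i] * sigma[i][j] * phi[j] for i in range(n) for j in range(n))


def cross(u, v):
    """Cross product in R^3; the result is orthogonal to both u and v."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def zero_risk_portfolio(r1, r2, helper=(1.0, 0.0, 0.0)):
    """Budget-normalised weights orthogonal to d = r1 - r2 (three assets, two days)."""
    d = [a - b for a, b in zip(r1, r2)]
    raw = cross(d, helper)
    total = sum(raw)
    if abs(total) < 1e-14:
        raise ValueError("choose a different helper vector")
    return [w / total for w in raw]


# Hypothetical returns of three assets on two days.
r1, r2 = [0.01, 0.02, -0.01], [0.03, -0.01, 0.00]
sigma_bar = sample_covariance([r1, r2])
mu_bar = [(a + b) / 2 for a, b in zip(r1, r2)]
phi = zero_risk_portfolio(r1, r2)

print("weights:", phi)
print("estimated variance:", quad_form(phi, sigma_bar))
print("estimated return:  ", sum(m * w for m, w in zip(mu_bar, phi)))
```

The estimated variance vanishes up to rounding error while the estimated return does not, which is
the spurious arbitrage described above.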
It can be conjectured that zero (or maybe just small) minimal eigenvalues of ¯Σ along with its
eigenvector play an important role in determining whether a portfolio is “bad” or “unrealistic”.
Distance between true and estimated portfolios
We continue with motivational examples. In this section we are interested in the distance between
the estimated portfolio, i.e. the portfolio obtained from (2.1) with estimated parameters
$(\bar\mu, \bar\Sigma)$, and the unknown true portfolio that is based on the true $(\mu, \Sigma)$.
The optimal portfolio $\phi_{(\mu,\Sigma)}$ depends on the input parameters $(\mu, \Sigma)$. It is
interesting to see how much $\phi_{(\mu,\Sigma)}$ changes with a change in $\mu$ or $\Sigma$. From
the literature it is known that errors in the estimates of the returns result in larger errors in
the portfolio weights than errors in the sample covariance. For a fixed estimate of the covariance
$\bar\Sigma$, let us proceed by bounding the difference between $\phi_{(\bar\mu,\bar\Sigma)}$ and
$\phi_{(\mu,\bar\Sigma)}$ in some vector norm. Such a norm bound has been obtained by Best and
Grauer in [10]. If we let $\delta = \bar\mu - \mu$, it can be shown that
$$\left\| \phi_{(\bar\mu,\bar\Sigma)} - \phi_{(\mu,\bar\Sigma)} \right\|_2 \le \frac{\|\delta\|_2}{2\lambda}\, \frac{1}{\gamma_{\min}(\bar\Sigma)} \left(1 + \frac{\gamma_{\max}(\bar\Sigma)}{\gamma_{\min}(\bar\Sigma)}\right). \tag{2.7}$$
Therefore the maximal difference between the portfolios depends on the size of the error $\delta$
and on $\gamma_{\min}(\bar\Sigma)$ and $\gamma_{\max}(\bar\Sigma)$, the minimal and the maximal
eigenvalues of $\bar\Sigma$ respectively. We see that in certain cases the smallest eigenvalue of
the sample covariance matrix plays a role in increasing the maximum distance between the
“estimated” and the “true” portfolio. At the moment it is not very clear in what situations the
smallest eigenvalue of the covariance matrix significantly increases the sensitivity of portfolios
to errors in the returns. We shall investigate these issues in more detail in later
sections of this chapter.
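The bound (2.7) can be examined numerically in a two-asset setting. The Python sketch below is an
illustration under stated assumptions: the covariance matrix, the risk aversion and the errors
$\delta$ are hypothetical, and the weight difference is computed from the closed form (1.40), in
which the term that does not depend on $\mu$ cancels. Closed-form expressions for the inverse and
the eigenvalues of a symmetric $2 \times 2$ matrix keep the code self-contained.

```python
import math


def inv2(s):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def eig2(s):
    """(smallest, largest) eigenvalue of a symmetric 2x2 matrix."""
    mid = (s[0][0] + s[1][1]) / 2.0
    rad = math.hypot((s[0][0] - s[1][1]) / 2.0, s[0][1])
    return mid - rad, mid + rad


def weight_shift(delta, sigma, lam):
    """phi(mu + delta) - phi(mu) under the budget constraint, derived from (1.40)."""
    s_inv = inv2(sigma)
    s_delta, s_one = matvec(s_inv, delta), matvec(s_inv, [1.0, 1.0])
    ratio = sum(s_delta) / sum(s_one)
    return [(x - ratio * y) / (2.0 * lam) for x, y in zip(s_delta, s_one)]


def bound(delta, sigma, lam):
    """Right-hand side of (2.7)."""
    g_min, g_max = eig2(sigma)
    return math.hypot(*delta) / (2.0 * lam) / g_min * (1.0 + g_max / g_min)


sigma_bar = [[0.04, 0.018], [0.018, 0.09]]  # hypothetical covariance estimate
for delta in ([0.01, 0.0], [0.0, 0.01], [0.01, -0.005]):
    shift = weight_shift(delta, sigma_bar, lam=2.0)
    print(delta, math.hypot(*shift), "<=", bound(delta, sigma_bar, lam=2.0))
```

In this example the actual shift in the weights is well below the bound, which suggests that (2.7)
can be conservative.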
Numerical evidence
For this motivational example, consider the investment problem (2.1) with $\mu = 0$. Let
$\bar\Sigma$ be a statistical estimate of the true $\Sigma$. Define
$$\phi_{\Sigma} = \operatorname{argmin}_{1^T\phi = 1}\ \phi^T\Sigma\phi \tag{2.8}$$
and suppose that the utility function for a portfolio $\phi$ is defined by
$$U(\phi) = -\phi^T\Sigma\phi. \tag{2.9}$$
Note that $U(\phi_{\bar\Sigma}) \le U(\phi_{\Sigma})$ by definition (2.8). In Figure 2.1 we show
how $U(\phi_{\bar\Sigma})$ depends on the smallest eigenvalue of $\bar\Sigma$. To do this we took 29
real assets (the list of stocks used can be found in Table A of Appendix A), estimated their
expected returns and covariance matrix based on the 1470 days of data that we had available, and
assumed that the resulting estimates are the true ones. Then, based on these estimates, we generated
90 days of returns 1000 times. The generated data was used to obtain $\bar\Sigma$ and then
$\phi_{\bar\Sigma}$.
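The design of this experiment can be sketched in Python on a much smaller scale. The fragment below
is only an illustration under stated assumptions: it uses two hypothetical assets instead of the 29
real ones, a made-up “true” covariance matrix, zero-mean normal returns and the standard library
random number generator, so its output is not comparable with Figure 2.1. It records the utility
loss $U(\phi_{\Sigma}) - U(\phi_{\bar\Sigma})$, which is nonnegative by (2.8).

```python
import math
import random


def min_variance_weights(s):
    """Solution of (2.8) for two assets: Sigma^{-1} 1 / (1^T Sigma^{-1} 1)."""
    a, b, d = s[0][0], s[0][1], s[1][1]
    denom = a + d - 2.0 * b
    return [(d - b) / denom, (a - b) / denom]


def utility(phi, s):
    """U(phi) = -phi^T Sigma phi, as in (2.9)."""
    return -(s[0][0] * phi[0] ** 2 + 2.0 * s[0][1] * phi[0] * phi[1]
             + s[1][1] * phi[1] ** 2)


def simulated_covariance(s, days, rng):
    """Sample covariance of `days` zero-mean normal draws with covariance s."""
    l11 = math.sqrt(s[0][0])              # 2x2 Cholesky factor of s
    l21 = s[0][1] / l11
    l22 = math.sqrt(s[1][1] - l21 ** 2)
    xs = []
    for _ in range(days):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append((l11 * z1, l21 * z1 + l22 * z2))
    m1 = sum(x for x, _ in xs) / days
    m2 = sum(y for _, y in xs) / days
    c11 = sum((x - m1) ** 2 for x, _ in xs) / (days - 1)
    c22 = sum((y - m2) ** 2 for _, y in xs) / (days - 1)
    c12 = sum((x - m1) * (y - m2) for x, y in xs) / (days - 1)
    return [[c11, c12], [c12, c22]]


def experiment(true_sigma, days=90, trials=1000, seed=0):
    """Utility losses U(phi_Sigma) - U(phi_Sigma_bar) over repeated simulations."""
    rng = random.Random(seed)
    best = utility(min_variance_weights(true_sigma), true_sigma)
    losses = []
    for _ in range(trials):
        estimate = simulated_covariance(true_sigma, days, rng)
        losses.append(best - utility(min_variance_weights(estimate), true_sigma))
    return losses


losses = experiment([[0.0004, 0.00018], [0.00018, 0.0009]])
print("largest utility loss:", max(losses))
```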
We can see from Figure 2.1 that large deviations from the optimal $U$ do tend to happen whenever
the smallest eigenvalue of $\bar\Sigma$ is small. Small values of the smallest eigenvalue are
necessary for large $|U|$, but not sufficient. Since symmetric matrices are fully characterised by
their eigenvectors and corresponding eigenvalues, it is possible to conclude that the smallest
eigenvalue as well as the eigenvectors contribute in some way towards large values of $|U|$.
Figure 2.1: Real portfolio utility as a function of the smallest eigenvalue.
2.1.3 Identifying extremal portfolios
Identifying cases of extremal behaviour of classical portfolios could be very useful. The possibility to
identify bad portfolios could provide the investor with a chance to avoid extremely bad performance.
Alternatively it could send a warning signal to the investor that something could go wrong.
We shall tackle the problem of minimising (2.4) in two stages. In the first stage we will assume that
$\Sigma$ is known and given. After estimating $\mu$ by its sample mean $\bar\mu$ we shall analyse
changes in
$$\left\| \phi_{(\bar\mu,\bar\Sigma)} - \phi_{(\mu,\bar\Sigma)} \right\|_2 \tag{2.10}$$
as $\mu$ changes within some uncertainty set $U$. We will see that the maximum value of (2.10) can
be minimised if one uses some carefully constructed $\Sigma_m$ instead of $\bar\Sigma$. In the
second stage, using the triangle inequality
$$\left\| \phi_{(\bar\mu,\Sigma_m)} - \phi_{(\mu,\Sigma)} \right\|_2 \le \left\| \phi_{(\bar\mu,\Sigma_m)} - \phi_{(\mu,\Sigma_m)} \right\|_2 + \left\| \phi_{(\mu,\Sigma_m)} - \phi_{(\mu,\Sigma)} \right\|_2, \tag{2.11}$$
it becomes clear that it is not possible to use an arbitrary $\Sigma_m$, since
$\|\phi_{(\mu,\Sigma_m)} - \phi_{(\mu,\Sigma)}\|_2$ may become larger even though
$\|\phi_{(\bar\mu,\Sigma_m)} - \phi_{(\mu,\Sigma_m)}\|_2$ may get smaller. This trade-off will be
analysed.
Optimal portfolios with no constraints
Our first aim consists of analysing the sensitivity of portfolios in (2.1) with respect to
$\mu \in U$. Consider the problem of maximising a quadratic portfolio utility for some given risk
aversion parameter $\lambda$ without any constraints on the portfolio weights
$$\max_{\phi}\ \mu^T\phi - \lambda\phi^T\bar\Sigma\phi. \tag{2.12}$$
Notice that (2.12) differs from (1.8) by being free of any constraints. Taking the derivatives, one
sees that
$$\phi^* = \frac{1}{2\lambda}\,\bar\Sigma^{-1}\mu \tag{2.13}$$
is the optimal solution to (2.12). We now assume that the true $\mu$ belongs to an elliptic
uncertainty set
$$U = E^{\kappa}_{S^{-1}}(\bar\mu) := \{\mu \in \mathbb{R}^n,\ \mu = \bar\mu + u : \|u\|_{S^{-1}} \le \kappa\} \tag{2.14}$$
which is centered at $\bar\mu$, is of size $\kappa$ and is defined by some positive definite matrix
$S$. Such an uncertainty set would be obtained naturally from estimating a confidence region for the
expected daily returns that follow a multivariate normal distribution (see Section 3.3.2 in
Chapter 3). It can be deduced from (2.13) and (2.14) that
$$\phi^* = \frac{1}{2\lambda}\,\bar\Sigma^{-1}\bar\mu + \frac{1}{2\lambda}\,v \tag{2.15}$$
where
$$v \in E^{\kappa}_{\bar\Sigma^T S^{-1}\bar\Sigma}(0). \tag{2.16}$$
Since $S$ is positive definite, $\bar\Sigma^T S^{-1}\bar\Sigma$ is also positive definite. In other
words, whenever the true $\mu$ lies within some ellipse, $\phi^*$ also lies in an ellipse. A typical
value for $S$ is $\bar\Sigma$ (up to a multiple), in which case $\|v\|_{\bar\Sigma} \le \kappa$. We
shall often use $S = \bar\Sigma$. In particular, in Figure 2.2 we illustrate how the optimal
portfolio $\phi^*$ changes with different true $\mu \in U$ defined by (2.14). In the first
illustration of Figure 2.2, one can see an ellipse centered at $\mu_0 := \bar\mu$ in which the true
$\mu$ lies. We denote by $q_1$ and $q_n$ the eigenvectors related to the largest and the smallest
eigenvalues of $\bar\Sigma$ respectively (note that in this example $n = 2$). One can also see two
points A and B which were added as examples of possible true values of $\mu$. The second
illustration in Figure 2.2 shows the ellipse within which the optimal $\phi^*$ lies subject to
variations in $\mu$. This ellipse is centered at $\phi^*_0 := \bar\Sigma^{-1}\bar\mu/2\lambda$. The
diagonal line¹ in this illustration is described by the equation $\phi_1 + \phi_2 = 1$ and will be
considered later in this chapter. From the illustration we can see that this ellipse is stretched
out in the direction of the eigenvector that relates to the smallest eigenvalue of $\bar\Sigma$.
Thus
$$\max_{\mu \in U} \left\| \phi_{(\mu,\bar\Sigma)} - \phi_{(\bar\mu,\bar\Sigma)} \right\|_2 = \max_{v \in E^{\kappa}_{\bar\Sigma}(0)} \frac{1}{2\lambda}\,\|v\|_2 = \frac{\kappa}{2\lambda\sqrt{\gamma_{\min}(\bar\Sigma)}} \tag{2.17}$$
¹ In Figure 2.2, $\mu_0$ was chosen such that $\phi^*_0$ is on the line.
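Equation (2.17) can be verified numerically for $n = 2$ with the Python sketch below. It assumes
$S = \bar\Sigma$ and the unconstrained problem (2.12); the matrix, $\kappa$ and $\lambda$ are
hypothetical values. Every boundary point of the ellipse $\{v : v^T\bar\Sigma v \le \kappa^2\}$ can
be written as $v = \kappa d / \sqrt{d^T\bar\Sigma d}$ for a unit direction $d$, so the maximum of
$\|v\|_2/(2\lambda)$ is found by scanning over directions and compared with the closed form.

```python
import math


def smallest_eigenvalue(s):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    mid = (s[0][0] + s[1][1]) / 2.0
    rad = math.hypot((s[0][0] - s[1][1]) / 2.0, s[0][1])
    return mid - rad


def max_shift_sampled(sigma, kappa, lam, steps=20000):
    """Largest ||v||_2 / (2 lam) over boundary points of {v : v^T sigma v <= kappa^2}."""
    best = 0.0
    for i in range(steps):
        t = math.pi * i / steps  # directions d and -d give the same norm
        d = (math.cos(t), math.sin(t))
        q = (sigma[0][0] * d[0] ** 2 + 2.0 * sigma[0][1] * d[0] * d[1]
             + sigma[1][1] * d[1] ** 2)  # d^T sigma d
        best = max(best, kappa / math.sqrt(q) / (2.0 * lam))
    return best


def max_shift_formula(sigma, kappa, lam):
    """Right-hand side of (2.17)."""
    return kappa / (2.0 * lam * math.sqrt(smallest_eigenvalue(sigma)))


sigma_bar = [[0.04, 0.018], [0.018, 0.09]]  # hypothetical covariance estimate
print(max_shift_sampled(sigma_bar, kappa=0.05, lam=2.0))
print(max_shift_formula(sigma_bar, kappa=0.05, lam=2.0))
```

The two printed values agree up to the resolution of the scan, and the maximising direction is the
eigenvector belonging to the smallest eigenvalue of $\bar\Sigma$.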