Econometrics: Basic
Econometrics
Black-Litterman Model
OLS
VaR and volatility estimation
Stocks for the long run
Style analysis (OLS application)
Principal component
Logarithmic random walk
Types of return and their properties
Markowitz portfolio optimization (algebra/calculus application)
Probability Mathematics and Laws
Matlab questions
Black-Litterman Model
The scope of the Black-Litterman model is to estimate the market expected returns while avoiding the pitfall of Markowitz optimization (namely: historical returns are so volatile that narrow confidence intervals at high probability levels cannot be defined, and this high sampling error prevents the Markowitz method from properly finding the weights of the market portfolio). The basic idea is to use, as weights for the market allocation, the ones provided by some well diversified index, adjusted with our views expressed as departures from that index asset allocation. It is an application of Bayesian statistics: basically we want to find a new distribution given some new information provided by us.
The methodology proposed consists of a multi-step process:
At first we should perform the estimation of the B-L variables:
o We will choose a market index, from which we will obtain the corresponding weights. Here we are making some assumptions on the index: the chosen market proxy should be mean-variance efficient. This assumption is not really strong, in fact it is reasonable that a market proxy is at least not too mean-variance inefficient. However, we should remember that a subset of an efficient portfolio is not in general efficient (it would be only if the sub-portfolio were built by a random sampling technique, so that it keeps the same sub-class exposures).
o The available market information is summarized by a Normal distribution, $\mu \sim N(\pi, \Gamma\Sigma)$, where the mean $\pi$ is the estimated market expected return and the variance is the Var-Cov matrix $\Sigma$ times a scalar $\Gamma$ smaller than one.
o We already know the relationship between the Var-Cov matrix, the weights, the market return and the risk-aversion coefficient, as defined by the Markowitz optimization; hence it is possible to invert that formula and find out the implicit market expectation, $\pi = \lambda\Sigma w_{mkt}$.
Since the estimated expected market return highly depends on the choice of the proxy index, to lessen the problem we should use a big portfolio. However, the bigger the portfolio, the more numerically demanding the required computation; so, to maintain numerical manageability, we can exploit the CAPM: we use a big portfolio and estimate, for each of our securities, just the betas, hence we do not need to estimate the whole Var-Cov matrix (the drawback is that stocks with low correlation with the market tend to give unstable results, so it becomes necessary to implement a multifactor model).
o Γ is the parameter transforming the Var-Cov matrix; its meaning is to account for the relative importance given to the market information versus our view information, and what matters is the ratio between it and the view matrix: the larger the view uncertainty $\Omega$ relative to $\Gamma\Sigma$, the higher the confidence placed in the market.
o We will make some assumptions on the Var-Cov matrix. The matrix is usually estimated from monthly historical data (usually a three-year time frame) or by smoothed estimates.
A typical problem in the Var-Cov matrix is the overestimation of correlations, which lowers the positive effect of diversification: if two securities have similar expected returns and high correlation, there will be an over-concentration in the asset with the higher expected return. There exists a procedure to reduce this problem, similar to the adjusted beta: blend (i.e., take a weighted average of) the estimated matrix with a reference matrix having ones on the diagonal and the average of the off-diagonal elements of the estimated matrix elsewhere.
o The risk-aversion parameter (assuming the absence of the risk-free asset) is given by the Markowitz formula: expected excess return over variance, $\lambda = \frac{E[r_m] - r_f}{\sigma_m^2}$. Note that the numerator is an a priori guess, since it is exactly what we are looking for; we can use an iterated process.
o The views must be given in a numerical form, so that it is possible to immediately check their effect on the allocation. The asset manager's views concern portfolio returns and are summarized by a Normal, $P\mu \sim N(V, \Omega)$, with mean V, the expected returns of the view portfolios (given the manager's views), and a diagonal Var-Cov matrix $\Omega$ expressing the confidence in those views (its values should ideally be set so that a 95% confidence interval contains our views).
Here P holds the weights that deliver the expected returns V given the expected returns of the securities in the market.
Given all the previous information, Black and Litterman propose to combine the two sets of information through an optimization, aiming to minimize the distance between our parameter and both the market's and the manager's information.
Note that if we use only the market portfolio information, the investor will end up with the market portfolio itself; the innovation of the model is the possibility to add views and so obtain a different allocation.
The solution can be expressed in two ways as well:
$\mu_{BL} = \pi + \Gamma\Sigma P'(P\Gamma\Sigma P' + \Omega)^{-1}(V - P\pi)$, which can be read through the tangency portfolio of the Markowitz optimization theorem: to the market expectation we add a spread position representing the view correction.
$\mu_{BL} = [(\Gamma\Sigma)^{-1} + P'\Omega^{-1}P]^{-1}[(\Gamma\Sigma)^{-1}\pi + P'\Omega^{-1}V]$, which is like a weighted average of the market expectation $\pi$ and the views V. The equivalent weights are $g(\Gamma\Sigma)^{-1}$ and $gP'\Omega^{-1}P$, where the parameter g is a constant that makes the weights sum to 1.
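As a minimal numerical sketch of the second expression (not from the notes: the two assets, the view and all parameter values below are illustrative assumptions):
% Black-Litterman sketch: posterior expected returns for two assets, one view
Sigma = [0.04 0.01; 0.01 0.09];    % assumed Var-Cov matrix
w     = [0.6; 0.4];                % index weights
lam   = 2.5;                       % assumed risk-aversion coefficient
gam   = 0.05;                      % Gamma, scalar smaller than one
pi_eq = lam * Sigma * w;           % implicit market expectation (inverted Markowitz)
P     = [1 -1];                    % view portfolio: long asset 1, short asset 2
V     = 0.02;                      % the view: a 2% spread
Om    = 0.001;                     % confidence in the view (diagonal, here 1x1)
A     = inv(gam*Sigma) + P' * (Om \ P);       % sum of the two sets of weights
b     = (gam*Sigma) \ pi_eq + P' * (Om \ V);
mu_bl = A \ b                      % weighted average of market and view info
With no views the posterior collapses back to pi_eq, i.e., the investor holds the market portfolio, consistently with the note above.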
OLS
OLS, ordinary least squares, is a method used to estimate the parameters of a linear regression: a set of variables regressed on common factors. It assumes a linear relationship between the dependent variable and the coefficients, not necessarily in the independent variables.
Besides OLS there exist other methods to estimate the regression parameters: the method of moments and maximum likelihood. The OLS estimate consists of minimizing the sum of squared errors, $\min_\beta\,(y - X\beta)'(y - X\beta)$: we want our model to be, on average, equal to y. This method is preferred because it has an analytic solution and, under certain hp (hypotheses), is superior to any other method, as proved by the Gauss-Markov theorem. An estimator needs an important feature to be useful, that is, unbiasedness; if it cannot be achieved we need to require consistency, an asymptotic property which requires weaker hp on the error and on its correlation with the independent variables.
Setting the first derivatives to 0 (a sufficient condition, since the function is convex) we end up with the normal equations $X'X\hat\beta = X'y$, hence $\hat\beta = (X'X)^{-1}X'y$ and $V(\hat\beta) = \sigma^2(X'X)^{-1}$.
From this formula we see that the estimation quality increases as the range (dispersion) of the independent variables increases.
As the formula shows, the randomness of the dependent variable comes from the presence of the error; hence the conditional and unconditional distributions have the same form, and furthermore under the weak hp we can say:
$E(\hat\beta) = \beta$ and $V(\hat\beta) = \sigma^2(X'X)^{-1}$
OLS requires certain hp to work properly and to allow the user to build confidence intervals (IC).
Weak hp are three, and together they ensure that the OLS estimators are BLUE:
o The expected value of the error is 0 (always the case if the intercept is included in the model) and the errors are not correlated with X; if X is random we should require $E(\varepsilon|X) = 0$.
o The variance of the error is constant and the correlation among errors is 0. If this hp fails, so that $V(\varepsilon) = \sigma^2\Omega$ with $\Omega \neq I$, we can still estimate the $\beta$ with the generalized least squares (GLS) method, where it is still BLUE. We need to transform the original equation so as to obtain another one with spherical errors: premultiplying by $\Omega^{-1/2}$ gives $V(\Omega^{-1/2}\varepsilon) = \sigma^2 I$, and in GLS $\hat\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$.
o The matrix X is a full-rank one, to avoid multicollinearity and to ensure that the matrix (X'X) is invertible. The effect of multicollinearity is an increase in the betas' variance.
The Gauss-Markov theorem states that $\hat\beta_{OLS}$ is BLUE, using a definition of variance efficiency which states that if $\hat\beta$ and $\tilde\beta$ are both unbiased estimators, we can say that $\hat\beta$ is not worse than $\tilde\beta$ iff $V(\tilde\beta) - V(\hat\beta)$ is at least positive semi-definite (psd). We should also consider that if we want to estimate a set of linear functions $H\beta$, where H is not random, the definition of BLUE estimator is invariant to this. We call this property "invariance to linear transforms", and it is the strongest argument in favor of this definition of 'not worse' estimator. An implicit hp in the previous theorem is that the class of estimators considered is linear in the dependent variable.
Strong hp are two: the errors are independent of each other and of X, and they are distributed as a Normal. It follows that the betas have the same distribution as well, since they are linear combinations of the errors. Under those hp we can build confidence intervals and test the statistical significance of the model parameters.
There are several tests used in statistics to assess the fit and the significance of the coefficients, jointly and one by one:
o The t-ratio: since the error variance is unobservable we use the sample variance, so instead of the Gaussian distribution we use the Student's t distribution, with percentile $t_\alpha$ to define the IC and to check whether 0 is included (equivalently, we can use the p-value); the degrees of freedom are n - k, and one test is run for each estimator. The general idea is to divide the quantity under the null hp by its standard deviation. The paired-sample trick is a procedure to test the difference between two estimators by introducing the difference d as a new parameter in the model; its standard deviation is then computed automatically by the model and accounts for the potential correlation among the estimators (positive correlation reduces the variance).
o The F-test, to test several hp jointly. The F ratio is defined as $F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n-k)}$, where k is the number of parameters and q is the number of restrictions tested. The F-test on one variable in general gives the same result as a two-sided t-test.
o The $R^2 = 1 - \frac{SSR}{SST}$, if the constant is included in the regression (or, more precisely, if the residuals sum to zero). This measure can never decrease when adding new independent variables. Note that in the univariate case $R^2 = \rho_{xy}^2$, since $\hat y$ is a linear combination of x.
Some considerations based on exam questions:
o Cov(r1, r2), where both returns have been regressed on the same factors, is equal to $\beta_1'V(F)\beta_2$ (assuming the idiosyncratic errors are uncorrelated).
o Remember that the expected value of each beta in a multivariate regression is the real beta, and that the difference of any pair of estimated betas is therefore an unbiased estimate of the difference of the real ones.
o If we use the estimated OLS parameters to make inference in a region outside the observed X (forecasting), we have to assume that the betas in the new region are the same and are still distributed according to a Normal with the same parameters. The target function to estimate the IC is $\hat y_0 = x_0'\hat\beta$, whose variance is $\sigma^2 x_0'(X'X)^{-1}x_0$. If we are building an IC for the forecast itself, the variance becomes $\sigma^2(1 + x_0'(X'X)^{-1}x_0)$, since the new observation carries its own error.
o If the constant is included in the model, the residuals sum to zero, $\sum_t \hat\varepsilon_t = 0$, and the fitted value of y at the average value of the X is the average of the fitted values, which is equal to the average of the real y as well: $\bar{\hat y} = \bar y$. If we use a model without intercept, these properties do not hold.
o The mean square error is $MSE(T) = E[(T - \theta)^2] = V(T) + [E(T) - \theta]^2$, where T is the estimator and $\theta$ is the real value.
o If we do not consider the complete model, but omit one independent variable which is correlated with the included ones, there will be a bias in our coefficients, since the included variables will be correlated with the error.
o If the intercept is excluded from the model (and it is actually different from zero), then the estimates of the betas are biased. However, if the intercept is really 0, the coefficient variances will be lower.
o If Cov(Xi, Xj) = 0 for every pair, then each beta can be estimated by the univariate formula $\hat\beta_i = \frac{Cov(x_i, y)}{V(x_i)}$, since in that case V, the Var-Cov matrix of X, is diagonal.
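A minimal MATLAB sketch of the estimator and tests above (the simulated data and all values are illustrative assumptions):
% OLS sketch: beta estimates, t-ratios and R^2 on simulated data
n = 200;  X = [ones(n,1) randn(n,2)];        % intercept plus two regressors
beta_true = [0.5; 1.0; -2.0];
y = X*beta_true + 0.3*randn(n,1);            % linear model, Gaussian errors
beta_hat = (X'*X) \ (X'*y);                  % solves the normal equations
res  = y - X*beta_hat;
k    = size(X,2);
s2   = (res'*res) / (n-k);                   % sample error variance
Vb   = s2 * inv(X'*X);                       % Var-Cov of the estimator
trat = beta_hat ./ sqrt(diag(Vb));           % t-ratios, n-k degrees of freedom
R2   = 1 - (res'*res) / sum((y - mean(y)).^2)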
VaR and volatility estimation
Before talking about VaR and its estimation procedures, we should spend some words on volatility itself, on its meaning and on how to estimate it.
In finance, volatility is used as a measure of risk, to get a sense of the unpredictability of an event. It is usually computed by looking at the historical trend of a variable, or by looking at the derivatives market, i.e., the implied volatility: the one that makes the market price consistent with the other variables in a pre-specified pricing formula.
In finance, tail behavior is essential to estimate VaR, which is used to assess the maximum possible future loss over a certain time interval. The VaR inputs are the exposure amount and the percentile indicating the given probability of experiencing a loss at least equal to the one indicated by the percentile itself. (A general limit of the VaR methodology: it gives no information on the event causing the loss, only on its probability; it ignores the shape of the distribution beyond the estimated quantile; and it is a pro-cyclical measure, since many of the proposed methodologies use parameters estimated from past data, so a positive (negative) trend brings a momentum that biases the estimate downward (upward).) As we can see, the hp on the distribution of the tails of returns is the key to making this tool meaningful. The book proposes four possible approaches to the data distribution:
The parametric one is the first methodology proposed. It consists of a Gaussian distribution with parameters inferred from historical data. The parameters needed to find any quantile are $\mu$ and $\sigma$; they are estimated from historical data, and in particular the volatility is estimated with RiskMetrics.
o Our goals are to estimate the quantile, $q_\alpha = \mu + z_\alpha\sigma$, and its lower bound, since the variance is itself estimated.
The estimate is $\hat q_\alpha = \hat\mu + z_\alpha\hat\sigma$, where we are using the sample moments as proxies; the lower bound is then obtained by shifting the quantile down by the sampling error of $\hat\sigma$.
o This method has several limits highlighted by empirical evidence; in fact the underlying hp of Gaussian returns is contradicted by the data.
The mixture of Gaussian distributions consists of mixing two or more distributions (Gaussian or not) with different parameters, weighted by their probability of occurrence. The general idea is to use, for the first component, the parameters of the normal regime and, for the second, those of the exceptional one. The blended distribution can be computed only numerically, by the maximum likelihood method, where the mixing probability is estimated before running the quasi-likelihood function (in log form). However, the tails will still decline at an exponential rate, like the Gaussian distribution; this method is like a GARCH model with infinite components, so the unconditional distribution has a non-constant variance.
The non-parametric approach consists of using the empirical distribution, based on a frequentist probability approach. We use the empirical cumulative function as the distribution; no parameters are needed.
o The confidence intervals are built by finding the i-th ordered observation at which the wanted empirical probability is reached, using the frequentist approach.
o To find the lower bound we need to compute the volatility of the frequentist probability:
We compute the probability of occurrence of that i-th observation using a binomial distribution; the cumulative distribution is $P = \sum_{i=0}^{j}\binom{n}{i}p^i(1-p)^{n-i}$.
With n large this distribution converges to a Gaussian.
The lower bound is the j-th ordered observation, where j is the largest index that keeps the cumulative binomial probability within the chosen confidence level.
o The drawback is the little insight provided for extreme quantiles, since observations become either granular (non-contiguous) or totally absent; hence this method is weak against alternative parametric distributions (high sampling error).
The semi-parametric approach is a blend of a parametric model for the central values (close to the mean) and a non-parametric one for the tails, where the non-parametric part serves to find where to plug in the tail model.
o The parametric part for the central values is a Gaussian distribution, as in the parametric method.
o The non-parametric part suggested for the tail data consists of building a function approximating the behavior of the tails: for large x, $P(X > x) \approx L(x)\,x^{-a}$, where L(.) is a slowly varying function and a is the speed at which the tail goes to 0.
o To estimate a (the only parameter) we represent the log frequency distribution as $\ln\hat P(X > x) \approx C - a\ln x + \epsilon$, where $\epsilon$ represents all the approximations made and C is a constant (in fact the log of a slowly varying function is basically a constant); a is then estimated by OLS. This gives a polynomially declining rate for the tails.
o Then we graphically search for the plug-in point, which is the point from where the empirical cumulative distribution starts to behave as a linear function (in log-log scale).
Once we have found that subset of data, we use it to estimate the quantile: given a, the target probability and the first point where the plug-in starts, the tail formula is inverted to obtain the quantile.
The procedure to find the lower bound from the quantile probability is then the same as in the non-parametric case, applied to the tail estimate.
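A compact MATLAB sketch contrasting the parametric and the non-parametric estimates above (simulated returns; sample size and confidence level are illustrative, and norminv requires the Statistics Toolbox):
% VaR sketch: Gaussian (parametric) vs empirical (non-parametric) quantile
r     = 0.01*randn(1000,1) - 0.0002;      % simulated daily returns
alpha = 0.05;                             % left-tail probability
mu  = mean(r);  sig = std(r);
var_param = -(mu + norminv(alpha)*sig);   % parametric: fitted Gaussian quantile
rs  = sort(r);                            % order the observations
i   = ceil(alpha*numel(rs));              % i-th ordered observation
var_hist = -rs(i);                        % non-parametric (frequentist) quantile
fprintf('parametric %.4f, historical %.4f\n', var_param, var_hist)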
Stocks for the long run
Stocks for the long run is a common mistake in the finance field. It states that an investor should choose his investment strategy by picking the stocks with the highest expected return, without considering the underlying risk. The statement is based on two ex-ante hp and one ex-post hp; these hp come from the intuition (an LRW world) that after a long enough time period any kind of return can be achieved, regardless of the risk:
First hp: given the Sharpe ratio formula, the idea is that with a sufficiently big n any result can be acquired; in other words, there is a time interval over which the probability of obtaining a given expected return is reached (usually with 95% confidence). It is a direct consequence of the LRW hp, under which the expected return grows at rate n while the volatility grows at rate sqrt(n) (the theme of explosive growth in unit-root autoregressive models, of which the LRW is one); see the worked ratio after this list.
Second hp: taking two investment strategies with the same mean and variance, one in 10 uncorrelated securities for one year and the other in just one security for 10 years, the hp suggests the existence of time diversification.
Third hp (ex post): looking at the historical performance of the US stock exchange, it seems to make sense to invest in it compared to other investment strategies.
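Making the first hp explicit (a standard LRW computation, added here for clarity):
$$SR_n = \frac{n\mu}{\sqrt{n}\,\sigma} = \sqrt{n}\,\frac{\mu}{\sigma} = \sqrt{n}\,SR_1, \qquad V\Big(\sum_{t=1}^{n} r_t\Big) = n\sigma^2$$
so the Sharpe ratio diverges as n grows (the shortfall probability shrinks toward zero), while the variance of the cumulative return keeps growing linearly in n: the uncertainty about final wealth never disappears.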
As can be seen, the statement is a consequence of how we build confidence intervals; however, it can be proven wrong:
First critique: it makes some hp on the investors' utility function, i.e., on how an investor chooses his strategy. The statement assumes that investors choose only by comparing Sharpe ratios over the long run, and that they will not change their strategy. A further comment on the strategy: trusting the Sharpe criterion only over a certain long time frame, while rejecting the same criterion for each sub-period, amounts to assuming a peculiar utility function for the investor. Not only is the statement wrong, in fact the investment is not superior for any given horizon, yet for a sufficiently long time horizon this strategy seems to be the best among all the other possibilities. Furthermore, since we are interested in the total return (not in the per-period expected return), we notice that the dispersion of the possible total returns, whose variance is $n\sigma^2$, keeps increasing over time, hence the uncertainty is not declining.
Second critique: it is an error based on the wrong idea that two investment strategies with different time frames are comparable; thus there is no such thing as time diversification.
Third critique: since the US stock market has shown the highest return over the last century, you should invest in stocks. This is an ex-post statement and cannot be projected onto the future; in fact the positive US trend has been sustained by the economic growth of that economy, and we cannot infer a similar future success from historical data.
Style analysis (OLS application)
Style analysis is a statistical way to compare an asset manager's performance with a specified ex-post portfolio built using market indexes: we want to know whether the manager has been able to outperform the market, hence whether he has deserved the management fees. The capability to add value is what is not replicable by investors using public information; it is an ex-post analysis.
The suggested methodology consists of regressing the fund return on some indexes, which the investor subjectively assumes to be a good proxy of the management strategy.
We then consider the spread between the actual and the estimated return (hence we are considering the error), and we analyze whether its mean is statistically significant and whether the cumulative sum of the errors shows any trend.
Sharpe suggests building the model following this procedure (a sketch of the resulting rolling regression closes this section):
o Set constraints on the betas: they must sum to one and there is no intercept. This can be imposed by an ex-ante method or an ex-post one (normalizing the values); keep in mind that the two methods do not give the same results. This is a simplification made by Sharpe to ensure a self-financing strategy and to avoid the presence of a constant return over time (not even the risk-free asset can achieve that result, although finance theory allows using short-term risk-free investments here).
o The regression is run on sub-windows of constant length, moving them forward one observation at a time.
The critiques of this methodology consist of three points:
o The weights have been set to maintain constant relative proportions; this is a limit and a costly strategy, and alternatives exist: buy-and-hold or trend strategies, or even changes within the constant-weight scheme.
o If the fund manager knows how he will be judged and knows more than the investor about the composition of the market portfolio, he can easily outperform the benchmark; replicating the market portfolio ex ante is not an easy task.
o The analysis does not consider the difference in variance produced by the two strategies, and this can give an advantage to the fund manager.
There are three possible conclusions of the analysis, depending on the value of the error:
o The cumulative error is negative: this is strong evidence against the fund's performance, since a totally passive strategy would have been more efficient.
o The cumulative error is zero, or not statistically significantly different from 0: it is hard to assess whether the management performance is unsatisfactory.
o The cumulative error is positive: it cannot be considered evidence of the goodness of the management team, since this measure alone is affected by many strong simplifying assumptions and does not consider the volatility difference between the implemented passive strategy and the effective one.
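A minimal MATLAB sketch of the rolling constrained regression described above, using the ex-post normalization method (the simulated data, window length and weights are illustrative assumptions):
% Style analysis sketch: rolling no-intercept regression, betas summing to one
T = 300;  k = 3;
F    = 0.01*randn(T,k);                      % style index returns
fund = F*[0.5; 0.3; 0.2] + 0.002*randn(T,1); % fund returns to be explained
win  = 60;                                   % constant window length
W    = zeros(T-win+1, k);                    % rolling style weights
for t = 1:T-win+1
    b      = F(t:t+win-1,:) \ fund(t:t+win-1);  % OLS without intercept
    W(t,:) = (b / sum(b))';                     % ex-post normalization
end
e     = fund(win:end) - sum(F(win:end,:) .* W, 2);  % style benchmark error
cum_e = cumsum(e);                           % check for a trend in the error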
Principal component
One of the key tasks in the asset management industry is to estimate the expected returns $E[r] = \sum_k \lambda_k\beta_k$, where the $\lambda_k$ are the prices of risk multiplying the betas (sensitivities). Those prices are usually proxied with portfolio returns; the problem is that we need to jointly estimate both the factors and the betas, so there is an infinite range of possible weights to be used as a solution.
There exist two methods to test the meaningfulness of a model: the first one is to check whether the intercept equals zero; however it is not very powerful and, furthermore, it is not a good criterion (we may have a really well-fitting model which fails the test). The other one is to test the linear relationship between returns and betas. This last method is a two-step process: at first we estimate the beta of each portfolio, then we run a cross-sectional regression to check that the estimated betas and factors are consistent with market data (to increase the power, returns are grouped into buckets built to maximize the distance between observations). We may add other terms, like the square of beta or the error terms, to see whether those terms are meaningful.
Principal component analysis is an old, alternative method to estimate factors and betas using the spectral theorem, where the number of principal components is less than or equal to the number of original variables. The rationale of the method is to proxy the unobservable factors with portfolio returns, built so as to satisfy suitable constraints; basically we choose as factors the portfolios $f = x'r$ built on the eigenvectors of the returns' Var-Cov matrix. We need to jointly estimate the factors and the betas.
This transformation is defined in such a way that the first principal component has as high a variance as possible, that is, accounts for as much of the variability in the data as possible (given the constraint that the squared weights sum to one; otherwise there would be no bound, since the variance could be changed arbitrarily by multiplying by a constant; alternatives exist, such as constraining the modulus, but they do not allow an analytic solution). Each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (uncorrelated with) the preceding components; hence the first eigenvalue of the error matrix is smaller than the smallest of the factors'. Principal components are guaranteed to be independent only if the data set is jointly normally distributed. PCA is sensitive to the relative scaling of the original variables.
Assume the variance is known and the Var matrix is time-independent. This last assumption is added just to simplify the calculus; in fact more complex methodologies exist to apply principal components. The returns' variance can then be represented by the spectral theorem, $V(r) = X\Lambda X'$. A further assumption is that V(r) is a full-rank matrix, so its rank k equals m, the number of returns used.
o Here the columns of X are the eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues, ordered from the highest to the smallest starting from the upper-left position. (Remember that the Var-Cov matrix must be PD; if it is merely PSD we cannot directly apply the theorem this way. The eigenvalues solve the characteristic equation $\det(V(r) - \lambda I) = 0$, whose order equals the rank of the Var-Cov matrix, so in general it can be solved only numerically.)
o The factors proposed are portfolio returns, computed using the eigenvectors and the market returns. Since each portfolio is given by $f_j = x_j'r$, where $x_j$ is an eigenvector, each of these portfolios is independent of the others, so we can use the univariate formula to compute our betas:
$\beta_{ij} = \frac{Cov(r_i, f_j)}{V(f_j)} = x_{ij}$, so the betas are the eigenvector entries for the specified factor.
o The variance of these factors is equal to the diagonal matrix in the spectral decomposition, $V(f) = \Lambda$, which is diagonal.
o Since this model completely explains the return behavior, to turn it into something closer to a common regression we rearrange the formula: we divide the factors into two groups, the first forming the variables matrix and the residual forming the error matrix.
The residual matrix has mean 0 and is uncorrelated with the factors.
Thus the highest eigenvalue of the residuals' Var-Cov is smaller than those of the factors.
The factor matrix rank is equal to q, where q is the number of factors considered.
There is a drawback in this methodology: it does not in general respect the pricing theory, which states that there should be no extra remuneration for not bearing any risk. In fact the residuals can be correlated with some returns, so they are not idiosyncratic; furthermore this risk is not negligible, and an asset can have an excess return even if it is not correlated with the included factors.
There is another way to build principal components: maximize the portfolio risk under the constraints that each portfolio is orthogonal to the other components and that the sum of the squared weights is set to one.
o We build a Lagrangian function to maximize the variance under the constraint, and we end up with the weights being the eigenvectors and the variances being the diagonal elements of the spectral decomposition of the variance of the returns.
The constraint is made to have an analytic solution, even if it does not have an economic meaning; in fact, in general such a linear combination of returns is not a portfolio (we could constrain the absolute sum of the weights instead, but then only numerical solutions are available).
o The book suggests looking at the marginal contribution of each component to the total variance, to notice how basically all the variance is explained by the first three components.
Assuming an unknown Var-Cov matrix: we can start from an a priori estimate of V(r) using historical data; however, its quality could be too low, which is why another methodology is suggested. We can estimate the components one by one, starting from the highest.
o This method consists of maximizing the variance under the usual constraint x'x = 1, leaving all the estimation error in the last components, since in this way we improve the estimate of the first ones.
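A short MATLAB sketch tying this section to the pcacov command listed in the Matlab section (simulated returns; keeping q = 3 factors is an illustrative choice):
% PCA sketch: spectral decomposition of an estimated Var-Cov matrix
R = 0.01*randn(500,6);                  % simulated returns, 6 assets
V = cov(R);                             % estimated Var-Cov matrix
[coeff, latent] = pcacov(V);            % eigenvectors and ordered eigenvalues
f = R * coeff;                          % principal-component portfolios
explained = cumsum(latent)/sum(latent)  % marginal contribution to variance
q = 3;                                  % number of factors kept
B = coeff(:, 1:q);                      % betas = eigenvector entries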
Logarithmic random walk
In finance we are interested in forecasting returns; however, the uncertainty around returns is not predictable (differently from games of chance), so we need to make assumptions about a plausible probability distribution. One of the first models used is the LRW. It assumes that the price evolution over time is approximated by a stochastic difference equation:
$\ln P_t = \ln P_{t-1} + \mu + \varepsilon_t$
As the equation shows, the current price level depends on the past evolution and on an idiosyncratic component, so it is like saying that price movements are led by modeled chance, the underlying distribution being assumed Gaussian. The log form is used since it allows multi-period returns to preserve normality: the log of a product is the sum of the logs (linear functions preserve the underlying distribution). The idiosyncratic component has zero mean, constant variance, and zero covariance across time. Sometimes the hp is added that the errors are jointly normally distributed, hence independent of each other, consistently with the chosen time window: in fact, if observations are aggregated between periods, the new idiosyncratic component will be uncorrelated only over the new time window, while it remains correlated with the intermediate ones, hence those middle observations must be dropped. Note that aggregating the variance over time with a correlation
structure between the observations no longer gives n times the one-period variance but $n\sigma^2 + 2\sum_{i<j} Cov(\varepsilon_i, \varepsilon_j)$; hence the variance increases at a higher rate than the LRW variance if the correlation is positive.
Nowadays the logarithmic random walk is simply used as a descriptive method to make accruals on returns, since no other alternative has reached enough consensus in the finance field. However, the LRW hp are contradicted by empirical data: prices do not evolve by chance as suggested by the LRW; there is strong empirical evidence against constant variance and in favor of the presence of correlation among securities; and (in its arithmetic version) it can lead to negative price levels.
The accrual convention consists of annualizing returns by multiplying the one-period expected return by the number of periods and the volatility by its square root; this is the correct procedure in the LRW case, while it is only an accrual convention for actual securities.
Another proposed model is the geometric RW, which is basically the LRW applied to prices instead of returns; this model has a log-normal distribution (hence a positive skewness, which is related to the number of periods considered). Some useful properties are: the price cannot become negative, and the volatility is a function of the price level (lower for small prices, bigger for big ones).
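A small MATLAB sketch of the LRW and of its scaling properties (all parameter values are illustrative assumptions):
% LRW sketch: simulate log-price paths and check the scaling of mean and vol
mu = 0.0003;  sig = 0.01;  n = 252;  nSim = 5000;
eps  = sig * randn(n, nSim);              % iid Gaussian idiosyncratic terms
logP = cumsum(mu + eps, 1);               % log-price paths starting at 0
rn   = logP(end, :);                      % n-period log returns
fprintf('mean %.4f vs n*mu %.4f; std %.4f vs sqrt(n)*sig %.4f\n', ...
        mean(rn), n*mu, std(rn), sqrt(n)*sig)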
Types of return and their properties
Even if in finance we are interested in the price evolution over time, all models and assumptions are based on returns. The easiest hp on their possible evolution basically amounts to supposing the existence of a stationary process in prices, a statement which is highly contentious in finance. There are two typologies of return; neither of them is better than the other, it depends on what we want to do:
Linear returns, $r_t = \frac{P_t - P_{t-1}}{P_{t-1}}$, are best used for portfolio returns over one time period, to compute the expected return and variance of a portfolio, since the portfolio linear return is the linear combination $r_p = \sum_i w_i r_i$. The log return of a portfolio instead has no linear function putting the securities together; any combination of stocks has a non-linear relationship, making any optimization problem incredibly difficult, because the portfolio return is a non-linear function of the stocks' returns: $r_p^{log} = \ln(\sum_i w_i e^{r_i^{log}})$.
Logarithmic returns, $r_t^{log} = \ln\frac{P_t}{P_{t-1}}$, are best used for a single stock's return over time; in this case the multi-period return depends only on the initial and last elements of the time series: $\sum_t r_t^{log} = \ln\frac{P_T}{P_0}$.
The relationship between these returns can be better understood using the Taylor expansion of the log return, $\ln(1+r) = r - \frac{r^2}{2} + \frac{r^3}{3} - \dots$, which is the same as the linear return if truncated at the first term. This formula shows that the difference between the linear and the log return (for price ratios far from 1) is always greater than zero, since $r \ge \ln(1+r)$; a numerical check follows.
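For instance, with an illustrative price ratio of 1.10: $r = 0.10$ while $r^{log} = \ln 1.10 \approx 0.0953$, so the gap $r - r^{log} \approx 0.0047$ is positive, and it widens as the price ratio moves away from 1.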
In finance the ratio of consecutive prices (possibly corrected by taking accruals into account) is often modeled as a random variable with an expected value very near to 1. This implies that the two definitions shall give different values with sizable probability only when the variance (or, more generally, the dispersion) of the price-ratio distribution is non-negligible, so that observations far from the expected value have non-negligible probability. Since standard models in finance assume that the variance of returns increases with the time between returns, the two definitions shall more likely imply different values when applied to long-term returns.
The mean is hard to estimate because of the relative size of the volatility, which is so big that the IC basically ends up including 0. Furthermore, increasing the sampling frequency provides no benefit: nothing changes, since the estimated mean depends only on the first and the last observation of the sample, $\hat\mu = \frac{\ln P_T - \ln P_0}{T}$, whatever the frequency.
We estimate the volatility using historical data; several procedures exist, from the simplest one, based on the LRW hp, to more complex ones that properly address some empirical features of volatility.
The one based on the LRW is simply the equally weighted sum of squared differences between each observation and the mean, $\hat\sigma^2 = \frac{1}{n}\sum_i(r_i - \bar r)^2$. This measure has one big drawback: the hp that the marginal contribution of the newest observation to the estimate is equal to that of the oldest one.
To overcome this assumption and move toward a procedure more tailored to the market, the financial industry introduced the RiskMetrics procedure. The new formula is an exponentially smoothed estimate, $\hat\sigma_t^2 = \lambda\hat\sigma_{t-1}^2 + (1-\lambda)r_t^2$, with coefficient $\lambda$ usually around 0.95, bounded strictly between 0 and 1, and with the hp of zero mean (this hp is a consequence of the data: the volatility of the mean is so high that over small intervals the mean is not significantly different from 0; it also yields more conservative volatility estimates, good for a long-term investor, not for a trader or hedger).
Alternatively it can be written as $\hat\sigma_t^2 = (1-\lambda)\sum_{i=0}^{n-1}\lambda^i r_{t-i}^2 + \lambda^n\hat\sigma_{t-n}^2$, where the last term vanishes for n large.
o The drawback of this estimate is the loss of the unbiasedness property (if the data have a constant volatility), and the formula basically truncates the available information: with daily data the effective window is one year at most, even for a high coefficient.
The smoothed estimate coincides with the classic one only if $w_i = 1/n$; indeed, the variance of the estimator is minimized with $w_i = 1/n$, as can be shown with a Lagrangian.
The variance estimation, on the other side, has a sampling error that is small relative to the quantity being estimated (we are reducing the variance of the variance), and it can be improved by increasing the frequency: assuming Gaussian returns (so that the fourth moment is $3\sigma^4$), $V(\hat\sigma^2) = \frac{2\sigma^4}{n}$; moving from monthly data to higher-frequency observations increases n, so the formula gives a smaller value than before.
Both volatility estimators suffer from the so-called ghost effect: an extremely high new observation has a large impact on the level of our estimate. This behavior is asymmetric, in fact extremely low observations are capped, and it is more severe for the classic formula, where the volatility level changes abruptly when the outlier drops out of the sample, or is reduced at a rate 1/n if the whole sample is kept. In the case of the smoothed estimator the outlier instead decays by a factor $\lambda$ per period.
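A minimal MATLAB sketch of the RiskMetrics recursion against the equally weighted estimate (simulated data; lambda = 0.95 as in the text):
% RiskMetrics sketch: exponentially smoothed variance vs equally weighted
r   = 0.01*randn(500,1);                % simulated daily returns, zero-mean hp
lam = 0.95;                             % smoothing coefficient
s2  = zeros(size(r));  s2(1) = var(r);  % initialize with the sample variance
for t = 2:numel(r)
    s2(t) = lam*s2(t-1) + (1-lam)*r(t)^2;
end
vol_ewma  = sqrt(s2(end))               % smoothed volatility estimate
vol_class = std(r)                      % equally weighted benchmark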
Markowitz portfolio optimization (algebra/calculus application)
Markowitz portfolio optimization is a methodology to build mean-variance efficient portfolios using a set of stocks. In general this is not related to the CAPM, which is a general equilibrium model; however, if we consider the whole market, the Markowitz optimal portfolio becomes the CAPM market portfolio itself.
The model assumes that the criterion used in the market is mean-variance efficiency and that the investment time window is unique and preset at the beginning of the investment process (no changes after that).
o The hp needed to apply this method are that we know both the expected values of the single stocks and the Var-Cov matrix; if those assumptions fail there will be problems on the sampling-error side.
o One possible formulation: the portfolio is built to minimize the variance under the constraint of achieving a specific return. One of the most important results is that the relative weights within the risky portfolio are always the same and do not depend on the chosen return. This is a first instance of the separation theorem, meaning that the expected return we want to achieve depends solely on the allocation between the risk-free asset and the portfolio.
o The same result can be achieved by maximizing the return given a certain risk; the tangency portfolio in this case is equivalent to the result of the previous program. This is a sort of mean-variance utility function.
The volatility of the overall return is always equal to the portfolio's volatility times the weight invested in it.
The ratio of the expected excess return to its standard deviation is the same for all efficient portfolios; hence all the portfolios have the same marginal contribution to the composition of the stock portfolio risk.
We want to show two results: the first is that the efficient frontier is the line $E[r_p] = r_f + SR\,\sigma_p$, whose slope SR is the Sharpe ratio; basically we want to show that all efficient portfolios share the same value of $\frac{E[r_p] - r_f}{\sigma_p}$:
o Consider the optimal weights $w = \lambda V^{-1}(\mu - r_f\iota)$ and plug them into the Markowitz lambda;
o plug this lambda back into the Markowitz weights to obtain the tangency portfolio;
o then consider the market allocation;
o the portfolio return will be $r_p = w'r + (1 - w'\iota)r_f$; computing the expected value and the variance of this equation and equating the Sharpe ratios, we end up with the two stated results.
Investors take on risk in order to generate higher expected returns. This trade-off implies that an investor must balance the return contribution of each security against its contribution to portfolio risk. Central to achieving this balance is some measure of the correlation of each investment's returns with those of the portfolio.
We do not believe there is one optimally estimated covariance matrix. Rather, we use approaches designed to balance trade-offs along several dimensions and choose parameters that make sense for the task at hand.
o One important trade-off arises from the desire to track time-varying volatilities, which must be balanced against the imprecision that results from using only recent data. This balance is very different when the investment horizon is short, for example a few weeks, versus when it is longer, such as a quarter or a year.
o Another trade-off arises from the desire to extract as much information from the data as possible, which argues for measuring returns over short intervals. This desire must be balanced against the reality that the structure of volatility and correlation is not stable and may be contaminated by mean-reverting noise over very short intervals, such as intraday or even daily returns.
All portfolios must have the same Sharpe ratio; if we can invest in the risk-free asset we can add the risk-free rate as an intercept, otherwise a self-financing strategy should have an intercept of zero.
For an equally weighted portfolio the variance decomposes as $\sigma_p^2 = \frac{1}{n}\overline{\sigma^2} + \frac{n-1}{n}\overline{cov}$: the first term is the average-variance component, which vanishes as n grows; the second term is the average covariance, which does not diversify away.
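A minimal MATLAB sketch of the tangency portfolio logic above (the three-asset inputs are illustrative assumptions):
% Tangency portfolio sketch: weights proportional to inv(V)*(mu - rf)
mu = [0.08; 0.10; 0.12];                                % assumed expected returns
V  = [0.04 0.01 0.00; 0.01 0.09 0.02; 0.00 0.02 0.16];  % assumed Var-Cov
rf = 0.02;
w  = V \ (mu - rf);               % un-normalized Markowitz weights
w  = w / sum(w);                  % relative weights: independent of the target
SR = (w'*mu - rf) / sqrt(w'*V*w)  % Sharpe ratio of the tangency portfolio
Scaling w up or down (lending or borrowing at rf) moves the investor along the efficient line without changing the relative composition, which is the separation theorem at work.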
Probability Mathematics and Laws
$P(A|B) = \frac{P(A \cap B)}{P(B)}$ is the conditional probability; this formula allows us to update a probability with new information. Bayes proposed an alternative (more usable) formula: $P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$.
Any random variable has an associated distribution:
o In the continuous case the density is $f(x)$ and the cumulative probability is $F(x) = \int_{-\infty}^{x} f(t)\,dt$, which is a strictly increasing function.
The concept of percentile (q bounded between 0 and 1): z is the corresponding value such that $F(z) = q$, i.e., the value below which q% of the data/values fall.
The expected value is $E[X] = \int x f(x)\,dx$ while the variance is $V(X) = \int (x - E[X])^2 f(x)\,dx$; those are population moments, computed using the real CDF.
Each distribution is defined by three kinds of parameters:
o Location: those which shift the distribution to the right or left.
o Shape: the residual category.
o Scale: those which change σ and nothing else.
Possible (useful) distributions:
o Binomial (repeated Bernoulli trials): $P(K = k) = \binom{n}{k}p^k(1-p)^{n-k}$. The parameters are n, the number of experiments, and p, the probability of success of each experiment (assumed constant for all experiments). K is the target number of successes; for n large the binomial approximates a Gaussian distribution.
o Lognormal: a right-skewed distribution.
o Multivariate: the distribution of the joint behavior of two or more variables.
Matrix operations:
o The rank is the number of linearly independent rows (equivalently, columns).
o A matrix is invertible iff it is square and full rank.
o In the product AB, A must have as many columns as B has rows; the resulting matrix has A's number of rows and B's number of columns.
o $(AB)' = B'A'$, and $(AB)^{-1} = B^{-1}A^{-1}$ when the inverses exist.
o To sum rows: premultiply by a vector of ones, $\iota'A$.
o With two vectors we can build a matrix (outer product): $ab'$.
Inequalities:
o Chebyshev's inequality: $P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$ (the modulus means both tails are accounted for).
o Vysochanskij-Petunin inequality: $P(|X - \mu| \ge k\sigma) \le \frac{4}{9k^2}$, for unimodal distributions and $k > \sqrt{8/3}$.
o Cantelli, one-sided: $P(X - \mu \ge k\sigma) \le \frac{1}{1 + k^2}$.
Distribution measures:
o The skewness measures the asymmetry of a distribution; it is the standardized third central moment. Positive values indicate right asymmetry, negative values left asymmetry.
o The kurtosis is a measure of the mass in the shoulders versus the tails: the higher the value, the higher the concentration in the tails. It is affected by asymmetry as well, and it is always > 0.
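A tiny MATLAB check of the inequalities above against simulated data (sample size and k are illustrative):
% Compare the empirical tail probability with the Chebyshev/Cantelli bounds
x = randn(1e6, 1);  k = 2;
emp  = mean(abs(x - mean(x)) >= k*std(x)); % empirical P(|X - mu| >= k*sigma)
cheb = 1/k^2;                              % Chebyshev two-sided bound
cant = 1/(1 + k^2);                        % Cantelli one-sided bound
fprintf('empirical %.4f, Chebyshev %.4f, Cantelli %.4f\n', emp, cheb, cant)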
Matlab questions
data = xlsread('filename', 'worksheet', 'range'); if worksheet = -1, an Excel window opens to select the data interactively. The worksheet can be given either as a string (name) or as a number (index).
xlswrite('filename', data, 'worksheet', 'range') writes the matrix data to the file.
inv(A) computes the inverse of the matrix.
[coeff, latent] = pcacov(A) performs the PCA of a covariance matrix: coeff holds the eigenvectors, latent the corresponding eigenvalues (the diagonal of lambda).
flipud(A) flips the matrix upside down: the last row becomes the first.
cov(A) computes the Var-Cov matrix of the columns of A.
for i = 0:4:12 (...) end is a loop where 0 is the starting value, 4 is the step, and 12 is the final value.
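Putting the commands together (a sketch; the file names and ranges are hypothetical, and xlsread/xlswrite belong to older MATLAB releases):
% Hypothetical end-to-end use of the commands above
data = xlsread('returns.xls', 1, 'A1:F500');  % read a range of returns
V    = cov(data);                             % Var-Cov matrix
[coeff, latent] = pcacov(V);                  % principal components
xlswrite('pca_out.xls', [latent coeff'], 1, 'A1');
for i = 0:4:12                                % start 0, step 4, end 12
    disp(i)
end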