CORRELATION AND REGRESSION Quantitative Analysis
R-11 (SS-3)
Page 1 of 12
 Covariance:
o Measures the linear relationship between two variables.
o Its value is not very meaningful, as it ranges from negative to positive infinity and is presented
in squared units (e.g. %², $²).
 Correlation:
o Standardized measure of the linear relationship between two variables.
o Its value has no measurement unit and ranges from -1 (perfectly negatively correlated) to +1
(perfectly positively correlated).
o Limitations include the impact of outliers, potential for spurious correlation, and non-linear
relationships.
 Interpreting a scatter plot:
o A collection of points on a graph where each point represents the value of two variables.
o If correlation equals +1, the points lie exactly on an upward-sloping line; the opposite holds when
correlation equals -1 (a downward-sloping line).
 Hypothesis Testing for statistical significance:
o Test whether the correlation between the population values of the two variables is equal to zero
(two-tailed test with n-2 degrees of freedom at a given confidence level).
o Test structure: H0: ρ = 0 vs. Ha: ρ ≠ 0
o Test statistic: (assuming normal distribution) t = r·√(n-2) / √(1-r²)
o Decision rule: reject H0 if t > +t_critical or t < -t_critical
o Interpretation:
If null cannot be rejected, we conclude that the correlation between variables X and Y is not
significantly different from zero at the given significance level (e.g. 5%).
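The test above can be sketched numerically. The data below are illustrative (not from the notes); the t-statistic uses n-2 degrees of freedom:

```python
# Sketch: significance test of a sample correlation, t = r*sqrt(n-2)/sqrt(1-r^2).
# The data are made up for illustration.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))

r = cov / (sx * sy)                                # sample correlation
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # test statistic, n-2 df
print(round(r, 4), round(t, 2))
# compare |t| with the two-tailed critical t at n-2 = 4 df (2.776 at the 5% level)
```

Here t far exceeds the critical value, so the null of zero correlation would be rejected.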
 Simple Linear Regression:
o Purpose: To explain the variation in a dependent variable in terms of the variation in a single
independent variable.
Dependent variable = explained, endogenous, or predicted variable.
Independent variable = explanatory, exogenous, or predicting variable.
o Assumptions: (mostly related to the residual -disturbance or error- term (ε))
 A linear relationship exists between the dependent and the independent variable.
 The independent variable is uncorrelated with the residuals.
 The expected value of the residual term is zero.
 The variance of the residual term is constant for all observations. (otherwise, the data is
heteroskedastic)
 The residual term is independently distributed; that is, the residual for one observation is
not correlated with the residual of another. (otherwise, the data exhibits autocorrelation)
 The residual term is normally distributed.
o Model construction:
 The linear equation (regression line or line of best fit) is the line which minimizes the Sum of
Squared Errors (SSE), that’s why simple linear regression is often called Ordinary Least
Squares (OLS) regression and the estimated values are called least squares estimates.
 Slope coefficient: describes the change in Y for a one-unit change in X.
(stock's β or systematic risk level, when X = market excess returns and Y = stock excess returns)
 Intercept term: the line's intersection with the Y axis (value of Y at X = 0).
(ex-post α or excess risk-adjusted return relative to a market benchmark, when X = market
excess returns and Y = stock excess returns)
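A minimal sketch of the OLS estimates described above, using made-up excess-return data; the slope plays the role of β and the intercept of ex-post α:

```python
# Sketch: OLS slope and intercept for simple linear regression (illustrative data).
# X = market excess returns, Y = stock excess returns; b1 ~ beta, b0 ~ alpha.
x = [0.01, -0.02, 0.03, 0.015, -0.01, 0.02]     # market excess returns
y = [0.012, -0.025, 0.041, 0.02, -0.008, 0.03]  # stock excess returns
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# b1 = Cov(X, Y) / Var(X); b0 = mean(Y) - b1 * mean(X)
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx
print(b1, b0)
```

By construction the fitted line minimizes the sum of squared errors, which is why these are called least squares estimates.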
o Importance of the regression model in explaining the dependent variable:
Requires determining the statistical significance of the regression (slope) coefficient through:
 Confidence Interval:
 Structure: b̂1 ± tc × s(b̂1), where s(b̂1) is the standard error of the slope coefficient and
tc is the critical two-tailed t-value for a given confidence level with n-2 df.
 Decision rule & interpretation:
If the confidence interval doesn't include zero, we can conclude that the slope
coefficient is significantly different from zero.
 Hypothesis Testing:
 Test structure: H0: b1 = 0 vs. Ha: b1 ≠ 0
 Test statistic: (assuming normal distribution) t = (b̂1 - b1) / s(b̂1), with n-2 df
 Decision rule: reject H0 if |t| > t_critical
 Interpretation:
If null cannot be rejected, we conclude that the slope coefficient is not significantly
different from the hypothesized value of b1 (zero in this case) at the given
significance level (e.g. 5%).
 F-Test: to be discussed later at the end of this reading
o Standard Error of Estimate (SEE):
 Also known as the standard error of the residual or standard error of the regression; measures
the degree of variability of the actual Y-values relative to the estimated Y-values from a
regression equation (SEE = σ̂ε, the standard deviation of the error term).
 The higher the correlation, the smaller the standard error, and the better the fit.
o Coefficient of determination (R²):
 The percentage of the total variation in the dependent variable explained by the
independent variable.
 For simple linear regression, R² = ρ²
o Confidence interval for predicted values:
 Structure: Ŷ ± tc × sf
tc is the critical two-tailed t-value for a given confidence level with n-2 df.
sf is the standard error of the forecast. (Calculating sf is highly improbable in the exam.)
 Interpretation:
Given a forecasted value of X, we can be (e.g.) 95% confident that Y will be between Ŷ -
tc×sf and Ŷ + tc×sf.
o Analysis of Variance (ANOVA):
Total variation = Explained variation + Unexplained variation
 Total variation = Total Sum of Squares (SST) = Σ(Yi - Ȳ)²
 Explained variation = Regression Sum of Squares (RSS) = Σ(Ŷi - Ȳ)²
 Unexplained variation = Sum of Squared Errors (SSE) = Σ(Yi - Ŷi)²
 If we denote the number of independent variables as k, then regression df = k
= 1 for simple linear regression, and error df = n-k-1 = n-2 for the same.
 MSR = RSS/k is the mean regression sum of squares and MSE = SSE/(n-k-1) is the mean squared error.
 R² = Explained variation (RSS) / Total variation (SST)
 Standard Error of Estimate (SEE) = √MSE = √(SSE / (n-2))
 Variance of Y = SST / (n-1)
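The ANOVA decomposition can be verified numerically. A sketch with illustrative data, checking SST = RSS + SSE and computing R² and SEE:

```python
# Sketch: ANOVA decomposition for a simple linear regression (made-up data).
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.9, 6.1, 8.0, 9.8]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx
y_hat = [b0 + b1 * a for a in x]

sst = sum((b - my) ** 2 for b in y)                 # total variation
rss = sum((f - my) ** 2 for f in y_hat)             # explained variation
sse = sum((b - f) ** 2 for b, f in zip(y, y_hat))   # unexplained variation

r2 = rss / sst                   # coefficient of determination
see = math.sqrt(sse / (n - 2))   # standard error of estimate
print(r2, see)
```

The identity SST = RSS + SSE holds exactly (up to floating-point rounding) for any OLS fit with an intercept.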
o The F-Statistic: (more useful with multiple regression)
Assesses how well a set of independent variables, as a group, explains the variation in the
dependent variable at a desired level of significance. In other words, whether at least one of
the independent variables explains a significant portion of the variation of the dependent
variable. The F-test is a one-tailed test.
 Test structure: H0: b1 = 0 vs. Ha: b1 ≠ 0
 Test statistic: F = MSR / MSE = (RSS/k) / (SSE/(n-k-1))
 Fc is the critical F-value at a given level of significance and the following df:
df(numerator) = k = 1
df(denominator) = n-k-1 = n-2
 Decision rule: reject H0 if F > Fc
 Interpretation:
If null cannot be rejected, we conclude that the slope coefficient is not significantly
different from zero at the given significance level (e.g. 5%).
o Limitations of regression analysis:
 Linear relationships can change over time (parameter instability)
 Its usefulness is limited if other market participants are aware of and act on it.
 If the assumptions of the model don’t hold, the interpretation of the results will not be valid.
Major reasons for model invalidity include heteroskedasticity (non-constant variance of
error terms) and autocorrelation (error terms are not independent).
MULTIPLE REGRESSION & ISSUES IN REGRESSION ANALYSIS Quantitative Analysis
R-12 (SS-3)
Page 6 of 12
 Multiple regression is a regression analysis with more than one independent variable.
o Slope coefficient: describes the change in Y for a one-unit change in Xk, holding the other independent
variables constant.
o Intercept term: the line's intersection with the Y axis (value of Y when all Xs = 0).
 Hypothesis Testing (t-tests).
 p-values: the smallest level of significance for which the null hypothesis can be rejected. So, if the p-value <
significance level, the null hypothesis can be rejected. Otherwise, the null hypothesis cannot be rejected.
 Confidence interval for a regression coefficient.
 If an independent variable proves statistically insignificant (its coefficient is not different from zero
at a given confidence level), the whole model needs to be re-estimated, as the coefficients of the other
significant variables will likely change.
 Assumptions: same as univariate regression, plus the assumption that there is no exact linear relation
between any two or more independent variables. (otherwise, Multicollinearity)
 The F-Statistic
 R²: the percentage of variation in the dependent variable that is collectively explained by all of the
independent variables.
o Multiple R: the correlation between actual and forecasted values of Y. Multiple R is the square root
of R². For simple regression, the correlation between the dependent and independent variables is
the same as multiple R, with the same sign as the slope coefficient.
 Adjusted R²: R² increases as more independent variables are added to the model, regardless of their
explanatory power; this problem is called overestimating the regression. To overcome this problem, R²
should be adjusted for the number of independent variables as per the following formula:
Adjusted R² = 1 - [(n-1) / (n-k-1)] × (1 - R²)
o Adjusted R² <= R²
o Adding a new variable to the model will increase R², while it may increase or decrease adjusted R²
o Adjusted R² may be less than zero if R² is low enough
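The adjusted R² formula above can be sketched as a one-line function (the inputs are illustrative):

```python
# Sketch: adjusted R^2 = 1 - [(n-1)/(n-k-1)] * (1 - R^2),
# penalizing the number of independent variables k.
def adjusted_r2(r2, n, k):
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)

print(adjusted_r2(0.80, 62, 5))  # always <= the raw R^2 of 0.80
```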
 Dummy variables:
o Usually used to quantify the impact of qualitative binary events (on or off). Dummy variables are
assigned values of 1 or 0 for on or off status.
o Whenever we need to distinguish between n classes we must use n-1 dummy variables. Otherwise,
the multiple regression assumption of no exact linear relationship between independent variables
would be violated.
The omitted class should be thought of as the reference point, which is represented by the intercept.
o Testing the statistical significance of a dummy's slope coefficient is equivalent to testing whether
the mean value for that class is different from the mean of the omitted class (the intercept).
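The n-1 rule can be sketched as follows; `quarter_dummies` is a hypothetical helper encoding four quarters with three dummies, with Q4 as the omitted reference class:

```python
# Sketch: encoding a 4-class qualitative variable (quarters) with n-1 = 3 dummies.
# Q4 is the omitted reference class, represented by the intercept (all zeros).
def quarter_dummies(quarter):
    # returns [Q1, Q2, Q3]; quarter 4 maps to [0, 0, 0]
    return [1 if quarter == q else 0 for q in (1, 2, 3)]

print(quarter_dummies(2))  # [0, 1, 0]
print(quarter_dummies(4))  # [0, 0, 0]
```

Using four dummies plus an intercept would create an exact linear relation among the regressors, violating the assumption above.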
 Issues in regression analysis:
o Heteroskedasticity:
 Definition:
Occurs when the variance of the residuals is not the same across all observations. There are
two types:
 Unconditional Heteroskedasticity:
Not related to the level of the independent variables. Not a major problem.
 Conditional Heteroskedasticity:
Related to the level of the independent variables. Significant problem.
 Effect:
The standard errors are unreliable (affecting t-tests and the F-test) while the coefficients are not
affected. Standard errors that are too small are the main concern, as they might lead to Type I
errors: rejecting the null hypothesis of a zero coefficient when it is true.
 Detection:
 Examine the scatter plot of the residuals against the independent variables.
 Breusch-Pagan (BP) test: conduct a second regression of the squared
residuals (from the 1st regression) against the independent variables and test
whether the independent variables significantly contribute to the explanation of the
squared residuals.
The test statistic has a chi-square (χ²) distribution with k degrees of freedom and
is calculated as: BP = n × R²resid (where R²resid is the R² of the second regression)
This is a one-tailed test, as the concern is having too large an R²resid.
If test statistic > chi-square critical value ⟹ Reject the null hypothesis and
conclude that a conditional heteroskedasticity problem is present.
 Correction:
 Use robust standard errors (White-corrected standard errors or heteroskedasticity-
consistent standard errors) which are usually higher than the original standard
errors.
 Use generalized least squares, which modifies the original equation in an attempt to
eliminate heteroskedasticity.
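A sketch of the Breusch-Pagan idea for the one-regressor case, using illustrative data and hypothetical `ols`/`r_squared` helpers:

```python
# Sketch: BP statistic = n * R^2 of the auxiliary regression of squared residuals
# on the independent variable (one-regressor case, made-up data).
def ols(x, y):
    # simple OLS slope/intercept (hypothetical helper)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

def r_squared(x, y):
    # R^2 = 1 - SSE/SST for the regression of y on x
    b0, b1 = ols(x, y)
    my = sum(y) / len(y)
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.8, 4.5, 4.7, 6.4]

b0, b1 = ols(x, y)
resid_sq = [(b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)]

bp = len(x) * r_squared(x, resid_sq)  # compare to chi-square critical value, df = k = 1
print(bp)
```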
o Serial Correlation (Autocorrelation):
 Definition:
Occurs when the residual terms are correlated with one another. It’s a common problem
with time series data. There are two types:
 Positive serial correlation:
When a positive error in one period increases the probability of observing a positive
error in the next period.
 Negative serial correlation:
When a positive error in one period increases the probability of observing a
negative error in the next period.
 Effect:
The tendency of the data to cluster together underestimates the coefficient standard errors,
leading to type I errors.
 Detection:
 Examine the scatter plot of the residuals against time.
 Durbin-Watson (DW) statistic (the calculation is impractical for the exam)
If the sample size is very large ⟹ DW ≈ 2(1 - r), where r is the correlation coefficient
between residuals from one period and those from the previous period.
⟹ DW = 2 if r = 0 (no serial correlation)
⟹ DW > 2 if r < 0 (negative serial correlation)
⟹ DW < 2 if r > 0 (positive serial correlation)
For the Durbin-Watson test, there are upper and lower DW values depending on the
level of significance, number of observations, degrees of freedom (number of
variables k)
o Test Structure: H0: No positive serial correlation
o Decision Rule: with lower and upper critical values DL and DU,
reject H0 if 0 ≤ DW < DL; the test is inconclusive if DL ≤ DW ≤ DU; fail to reject H0 if DW > DU.
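Although the notes call the hand calculation impractical, the exact DW statistic is straightforward in code. A sketch with illustrative residuals:

```python
# Sketch: Durbin-Watson statistic, DW = sum((e_t - e_(t-1))^2) / sum(e_t^2).
# Values near 2 suggest no serial correlation; residuals below are made up.
def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e ** 2 for e in resid)

dw = durbin_watson([0.5, -0.4, 0.3, -0.2, 0.4, -0.5])
print(dw)  # alternating signs push DW above 2 (negative serial correlation)
```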
 Correction:
 Use Hansen method to provide Hansen-White standard errors, which also could be
used to correct for conditional heteroskedasticity. The general rule for use of
adjusted standard errors is:
o If the problem is serial correlation only ⟹ Hansen Method
o If the problem is conditional heteroskedasticity only ⟹ White-corrected
o If the problem is both ⟹ Hansen Method
 Improve the model specification, by including a seasonal term to reflect the time
series nature of the data. This can be tricky.
o Multicollinearity:
 Definition:
Occurs when linear combinations of independent variables are highly correlated. For k>2,
high correlation between individual independent variables (>0.7) suggests the possibility of
multicollinearity, but low correlation doesn’t necessarily indicate no multicollinearity.
 Effect:
Slope coefficients tend to be unreliable, and standard errors are artificially inflated. Hence,
there is a greater probability of Type II error.
 Detection:
While the F-test is statistically significant and R² is high, the t-tests indicate no significance of
the individual coefficients.
 Correction:
Use statistical procedures, like stepwise regression, which systematically remove variables
from the regression until multicollinearity is minimized.
 Model misspecification:
o Categories:
I. The functional form can be misspecified:
1. Important variables are omitted.
2. Variables should be transformed.
(e.g. when the dependent variable is linearly related to the natural log of the variable, or
when standardizing B/S items by dividing by Total Assets, and P&L and CF items by Sales.
Common mistakes include squaring or taking the square root of the variable.)
3. Data is improperly pooled.
(By pooling sub-periods that exhibit structural change).
II. Explanatory variables are correlated with the error terms in time series analysis:
1. A lagged dependent variable is used as an independent variable.
2. A function of the dependent variable is used as an independent variable
(“forecasting the past” i.e. use end of month market cap to predict returns during
the month).
3. Independent variables are measured with error.
(Use free float as a proxy for the corporate governance quality or actual inflation as
a proxy for expected inflation).
III. Other time-series misspecifications that result in nonstationarity.
o Effect:
Regression coefficients are often biased and/or inconsistent ⟹ unreliable hypothesis testing and
inaccurate predictions.
 Qualitative (dummy) dependent variables:
o Probit and logit models:
 Estimate the probability that the event occurs (i.e. probability of default or merger).
 The maximum likelihood is used to estimate coefficients.
 A probit model is based on normal distribution, while a logit model is based on a logistic
distribution.
o Discriminant models:
 Results in a linear function, similar to an ordinary regression, which generates an overall
score for an observation. The scores can then be used to rank or classify observations.
 Example: use financial ratios to get a score that places a company in a bankrupt or not
bankrupt class.
 Similar to probit and logit models but make different assumptions regarding the
independent variables.
TIME-SERIES ANALYSIS Quantitative Analysis
R-13 (SS-3)
Page 10 of 12
 A time series is a set of observations over successive periods of time.
 Linear trend model: y_t = b0 + b1·t + ε_t (the data plot on a straight line)
 Log-linear trend model: ln(y_t) = b0 + b1·t + ε_t (the data plot in a curve)
The model defines y as an exponential function of time, y_t = e^(b0 + b1·t). By taking the natural log of both
sides, we transform the equation from an exponential to a linear function.
For financial time series which display exponential growth, the log-linear model provides a better fit for the data
and, thus, increases the model's predictive power.
 When a variable grows at a constant rate (e.g. financial data and company sales), a log-linear model is most
appropriate. When it grows by a constant amount (e.g. inflation), a linear trend model is most appropriate.
 Limitation of trend models:
When time-series residuals exhibit serial correlation, as evidenced by the DW test, we need to use an
autoregressive (AR) model. This is done by regressing the dependent variable against one or more lagged values
of itself, on condition that the time-series being modeled is covariance stationary.
 Conditions for covariance stationary:
I. Constant and finite expected value (mean-reverting level).
II. Constant and finite variance (homoscedastic).
III. Constant and finite covariance between values at any given lag (the covariance of the time series
with leading or lagged values of itself is constant).
 An AR model of order p, AR(p), is expressed as:
x_t = b0 + b1·x_(t-1) + b2·x_(t-2) + ... + bp·x_(t-p) + ε_t
(where p is the number of lagged values included as independent variables)
 Forecasting with an autoregressive model:
Applying the chain rule of forecasting, it’s necessary to calculate a one-step-ahead forecast before a two-
step-ahead forecast. This implies that:
o Multi-period forecasts are more uncertain than single-period forecasts.
o Sample size = number of observations - p (the order of the AR model).
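The chain rule of forecasting can be sketched with a hypothetical AR(1) model (the coefficients are made up):

```python
# Sketch: chain rule of forecasting with AR(1), x_t = b0 + b1 * x_(t-1).
# The two-step-ahead forecast must use the one-step-ahead forecast as its input.
b0, b1 = 1.0, 0.6
x_t = 5.0                # last observed value

x_t1 = b0 + b1 * x_t     # one-step-ahead forecast
x_t2 = b0 + b1 * x_t1    # two-step-ahead forecast uses the forecast, not data
print(x_t1, x_t2)        # 4.0 3.4
```

Because x_t2 is built on a forecast rather than an observation, multi-period forecasts carry more uncertainty.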
 Detection and correction for autocorrelation in autoregressive models:
The DW test used with trend models is not appropriate with AR models, instead, the following steps are
followed to detect autocorrelation and make sure the AR model is correctly specified:
I. Estimate the AR(1) model.
II. Calculate the autocorrelation of the model’s residuals.
III. t-test whether the autocorrelations are significantly different from zero, with df = T-2:
t = ρ̂(ε_t, ε_(t-k)) / (1/√T), where T is the number of observations.
IV. If any of the autocorrelations is significantly different from zero, the model is not specified correctly.
Add more lags to the model and repeat step II until all autocorrelations are insignificant.
 Mean reversion and random walk:
For a time-series to be covariance stationary it must have a constant and finite mean-reverting level, which
is the value the time-series tends to move toward. Once this level is reached, it's expected that x_(t+1) = x_t
⟹ mean-reverting level = b0 / (1 - b1)
For b1 = 1 (called a unit root) the model doesn't have a finite mean-reverting level, and thus is not covariance
stationary. This happens when the model follows a random walk process, which is classified as:
o Random walk without a drift: x_t = x_(t-1) + ε_t
o Random walk with a drift: x_t = b0 + x_(t-1) + ε_t
 Unit root detection:
As testing whether b1 = 1 cannot be performed directly, use the Dickey-Fuller (DF) test by transforming the AR(1)
model to run a simple regression, subtracting x_(t-1) from both sides as follows:
x_t - x_(t-1) = b0 + (b1 - 1)·x_(t-1) + ε_t ⟹ Δx_t = b0 + g·x_(t-1) + ε_t, where g = b1 - 1
Then test whether the transformed coefficient g is different from zero using a modified t-test. With
H0: g = 0, if we fail to reject, we conclude that the series has a unit root.
 Unit root correction:
Use first differencing to transform the data into a covariance stationary time series for which b1 = 0. This is
done by constructing the following AR(1) model:
y_t = b0 + b1·y_(t-1) + ε_t, where y_t = x_t - x_(t-1), b0 = 0, and b1 = 0
⟹ mean-reverting level = b0 / (1 - b1) = 0 / (1 - 0) = 0 (finite value)
If the data has a linear trend, first difference the data. If the data has an exponential trend, first difference
the natural log of the data.
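First differencing can be sketched on a simulated random walk (the simulation is illustrative, not from the notes):

```python
# Sketch: first differencing a simulated random walk (unit root) yields a series
# whose sample mean is close to the mean-reverting level of 0 derived above.
import random

random.seed(42)
walk = [0.0]
for _ in range(200):
    walk.append(walk[-1] + random.gauss(0, 1))  # x_t = x_(t-1) + eps

diff = [walk[t] - walk[t - 1] for t in range(1, len(walk))]  # y_t = x_t - x_(t-1)
mean_diff = sum(diff) / len(diff)
print(round(mean_diff, 3))
```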
 Seasonality detection:
A pattern that tends to repeat from year to year. Not accounting for seasonality, when it’s present, will
make the AR model misspecified and unreliable for forecasting purposes.
Seasonality can be detected by observing that the residual autocorrelation for the month or quarter from
previous year (month 12 or quarter 4) is significantly different from zero.
 Seasonality correction:
Add an additional lag corresponding to the same period last year to the original model.
 In-sample forecasts are made within the range of data used to estimate the model. Out-of-sample forecasts
are made outside the sample period, to assess the predictive power of the model.
Given two models, to assess which one is better, apply root mean squared error (RMSE) criterion (the
square root of the average of the squared errors) on out-of-sample data. The model with the lowest RMSE is
the most accurate.
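The RMSE criterion can be sketched as follows, comparing two hypothetical models on illustrative out-of-sample data:

```python
# Sketch: root mean squared error (RMSE) for comparing out-of-sample forecasts.
import math

def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

actual = [1.0, 2.0, 3.0, 4.0]   # out-of-sample observations (made up)
model_a = [1.1, 1.9, 3.2, 3.8]  # forecasts from hypothetical model A
model_b = [1.5, 2.5, 2.5, 4.5]  # forecasts from hypothetical model B

print(rmse(actual, model_a) < rmse(actual, model_b))  # True -> model A is more accurate
```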
 As financial and economic environments are dynamic and frequently subject to structural shifts, there is a
tradeoff between the increased statistical reliability when using long time-series periods, and the increased
stability of the estimates when using shorter periods.
 Autoregressive Conditional Heteroskedasticity (ARCH) model:
ARCH exists if the variance of the residuals in one period is dependent on the variance of the residuals in a
previous period. The ARCH(1) model is expressed as:
ε²_t = a0 + a1·ε²_(t-1) + u_t
If the coefficient a1 is statistically different from zero, it can be positive (the variance increases over time)
or negative (the variance decreases over time), indicating that the error terms exhibit heteroskedasticity. In
either case, the time-series is ARCH(1) and, according to our need, we can either:
o Correct the model using procedures that correct for heteroskedasticity, such as generalized least
squares.
o Predict the variance of the residuals in future periods: σ̂²_(t+1) = â0 + â1·ε²_t
 Considerations for using two time-series variables in a linear regression:
Test each series for covariance stationarity (by detecting the presence of autocorrelation or a unit root), with
the following possibilities along with whether the data can be used or not:
1. Both time-series are covariance stationary ⟹ Yes
2. Only one variable is covariance stationary ⟹ No
3. Both time-series are not covariance stationary:
3.1. The two series are cointegrated ⟹ Yes
3.2. The two series are not cointegrated ⟹ No
 Cointegration:
Means that the two time-series are economically linked or follow the same trend, and that relationship is not
expected to change.
To test for cointegration, regress one variable on the other using the following model: y_t = b0 + b1·x_t + ε_t
The residuals are tested for a unit root using the DF test with critical t-values calculated by Engle and Granger
(i.e. the DF-EG test). If the test rejects the null hypothesis of a unit root, we conclude that the error terms generated
by the two time series are covariance stationary and the two series are cointegrated.
4.11.24 Poverty and Inequality in America.pptxmary850239
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxAneriPatwari
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 

Último (20)

ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptx
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 

Quantitative Methods - Level II - CFA Program

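The correlation significance test described above (a two-tailed t-test with n-2 degrees of freedom) can be sketched in a few lines of Python. The return series here are hypothetical, used only for illustration:

```python
import math

# Hypothetical monthly returns (in %) for two assets
x = [1.2, 0.8, -0.5, 2.1, 1.7, -0.3, 0.9, 1.4]
y = [0.9, 1.1, -0.2, 1.8, 1.5, -0.6, 0.7, 1.2]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
r = cov / (sx * sy)  # sample correlation, bounded in [-1, +1]

# t-statistic for H0: population correlation = 0, with n - 2 df
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
```

If `t_stat` exceeds the critical two-tailed t-value (2.447 for df = 6 at the 5% level), the null of zero correlation is rejected.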
 The expected value of the residual term is zero.
 The variance of the residual term is constant for all observations (otherwise, the data is heteroskedastic).
 The residual term is independently distributed; that is, the residual for one observation is not correlated with the residual of another (otherwise, the data exhibits autocorrelation).
 The residual term is normally distributed.
o Model construction:
 The linear equation (regression line, or line of best fit) is the line that minimizes the Sum of Squared Errors (SSE), which is why simple linear regression is often called Ordinary Least Squares (OLS) regression and the estimated values are called least squares estimates.
 Slope coefficient: describes the change in Y for a one-unit change in X (the stock's β, or systematic risk level, when X = market excess returns and Y = stock excess returns).
 Intercept term: the line's intersection with the Y axis, i.e. the value of Y at X = 0 (the ex-post α, or excess risk-adjusted return relative to a market benchmark, when X = market excess returns and Y = stock excess returns).
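The OLS estimates can be computed directly as slope = Cov(X, Y) / Var(X) and intercept = mean(Y) - slope × mean(X). A minimal sketch with hypothetical excess-return data (in the market-model reading, the slope is β and the intercept is ex-post α):

```python
# Hypothetical excess returns: X = market, Y = stock (both in %)
x = [2.0, -1.0, 3.0, 0.5, -2.0, 1.5]
y = [2.6, -0.8, 3.9, 1.0, -2.1, 2.2]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
slope = sxy / sxx            # the stock's beta in the market-model reading
intercept = my - slope * mx  # ex-post alpha in the same reading
```

Note that the fitted line always passes through the point (mean(X), mean(Y)).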
o Importance of the regression model in explaining the dependent variable: requires determining the statistical significance of the regression (slope) coefficient through:
 Confidence interval:
 Structure: b1 ± (tc × sb1), where tc is the critical two-tailed t-value for a given confidence level with n-2 df and sb1 is the standard error of the slope coefficient.
 Decision rule & interpretation: if the confidence interval does not include zero, we can conclude that the slope coefficient is significantly different from zero.
 Hypothesis testing:
 Test structure: H0: b1 = 0 versus Ha: b1 ≠ 0.
 Test statistic (assuming normal distribution): t = b1 / sb1, with n-2 df.
 Decision rule: reject H0 if |t| > tc.
 Interpretation: if the null cannot be rejected, we conclude that the slope coefficient is not significantly different from the hypothesized value (zero in this case) at the given significance level (e.g. 5%).
 F-test: to be discussed at the end of this reading.
o Standard Error of Estimate (SEE):
 Also known as the standard error of the residual or standard error of the regression; measures the degree of variability of the actual Y-values relative to the estimated Y-values from the regression equation (SEE = σε).
 The higher the correlation, the smaller the standard error and the better the fit.
o Coefficient of determination (R²):
 The percentage of the total variation in the dependent variable explained by the independent variable.
 For simple linear regression, R² = ρ².
o Confidence interval for predicted values:
 Structure: Ŷ ± (tc × sf), where tc is the critical two-tailed t-value for a given confidence level with n-2 df and sf is the standard error of the forecast. (Calculating sf is highly improbable in the exam.)
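The slope significance test and the SEE and R² measures above can be sketched as follows, again with hypothetical data; the t-statistic is compared against the two-tailed critical t-value with n-2 df:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxx = sum((a - mx) ** 2 for a in x)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
intercept = my - slope * mx

pred = [intercept + slope * a for a in x]
sse = sum((b - p) ** 2 for b, p in zip(y, pred))  # unexplained variation
sst = sum((b - my) ** 2 for b in y)               # total variation
r2 = 1 - sse / sst                                # coefficient of determination
see = math.sqrt(sse / (n - 2))                    # standard error of estimate

# standard error of the slope and t-statistic for H0: b1 = 0
se_slope = see / math.sqrt(sxx)
t_slope = slope / se_slope
```

With `t_slope` far above the 5% critical value for df = 3 (3.182), the slope is clearly significant for this toy data.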
 Interpretation: given a forecasted value of X, we can be (e.g.) 95% confident that Y will lie between Ŷ - tc×sf and Ŷ + tc×sf.
o Analysis of Variance (ANOVA): Total variation = Explained variation + Unexplained variation
 Total variation = Total Sum of Squares (SST) = Σ(Yi - Ȳ)²
 Explained variation = Regression Sum of Squares (RSS) = Σ(Ŷi - Ȳ)²
 Unexplained variation = Sum of Squared Errors (SSE) = Σ(Yi - Ŷi)²
 If we denote the number of independent variables as k, then regression df = k (= 1 for simple linear regression) and error df = n-k-1 (= n-2 for the same).
 MSR (mean regression sum of squares) = RSS / k, and MSE (mean squared error) = SSE / (n-k-1).
 R² = Explained variation (RSS) / Total variation (SST)
 Standard Error of Estimate (SEE) = √MSE
 Variance of Y = SST / (n-1)
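The ANOVA identity SST = RSS + SSE can be verified numerically on any fitted OLS line (hypothetical data below):

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 4.0, 8.0, 7.0]
n = len(y)
mx, my = sum(x) / n, sum(y) / n

slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
pred = [intercept + slope * a for a in x]

sst = sum((b - my) ** 2 for b in y)               # total variation
rss = sum((p - my) ** 2 for p in pred)            # explained variation
sse = sum((b - p) ** 2 for b, p in zip(y, pred))  # unexplained variation
```

Because the decomposition is exact, R² can be computed equivalently as RSS/SST or as 1 - SSE/SST.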
o The F-statistic: (more useful with multiple regression)
Assesses how well a set of independent variables, as a group, explains the variation in the dependent variable at a desired level of significance; in other words, whether at least one of the independent variables explains a significant portion of the variation of the dependent variable. The F-test is a one-tailed test.
 Test structure: H0: b1 = 0 versus Ha: b1 ≠ 0.
 Test statistic: F = MSR / MSE
 Fc is the critical F-value at a given level of significance with df(numerator) = k = 1 and df(denominator) = n-k-1 = n-2.
 Decision rule: reject H0 if F > Fc.
 Interpretation: if the null cannot be rejected, we conclude that the slope coefficient is not significantly different from zero at the given significance level (e.g. 5%).
o Limitations of regression analysis:
 Linear relationships can change over time (parameter instability).
 Its usefulness is limited if other market participants are aware of and act on it.
 If the assumptions of the model don't hold, the interpretation of the results will not be valid. Major reasons for model invalidity include heteroskedasticity (non-constant variance of the error terms) and autocorrelation (error terms that are not independent).
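A sketch of the F-statistic (F = MSR / MSE) on hypothetical data. For simple linear regression the F-statistic equals the squared t-statistic of the slope, which gives a useful self-check:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 4.0, 8.0, 7.0]
n, k = len(x), 1
mx, my = sum(x) / n, sum(y) / n

sxx = sum((a - mx) ** 2 for a in x)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
intercept = my - slope * mx
pred = [intercept + slope * a for a in x]

rss = sum((p - my) ** 2 for p in pred)
sse = sum((b - p) ** 2 for b, p in zip(y, pred))
msr, mse = rss / k, sse / (n - k - 1)
f_stat = msr / mse  # compare against the one-tailed critical F with (k, n-k-1) df

# for simple linear regression, F equals the squared t-statistic of the slope
t_slope = slope / (math.sqrt(mse) / math.sqrt(sxx))
```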
MULTIPLE REGRESSION & ISSUES IN REGRESSION ANALYSIS Quantitative Analysis R-12 (SS-3)
 Multiple regression is regression analysis with more than one independent variable.
o Slope coefficient: describes the change in Y for a one-unit change in Xk, holding the other independent variables constant.
o Intercept term: the line's intersection with the Y axis (the value of Y when all Xs = 0).
 Hypothesis testing (t-tests).
 p-value: the smallest level of significance for which the null hypothesis can be rejected. If the p-value < significance level, the null hypothesis can be rejected; otherwise, it cannot.
 Confidence interval for a regression coefficient.
 If an independent variable proves statistically insignificant (its coefficient is not different from zero at a given confidence level), the whole model needs to be re-estimated, as the coefficients of the other, significant variables will likely change.
 Assumptions: the same as for univariate regression, plus the requirement that there is no exact linear relation between any two or more independent variables (otherwise, multicollinearity).
 The F-statistic.
 R²: the percentage of variation in the dependent variable that is collectively explained by all of the independent variables.
o Multiple R: the correlation between the actual and forecasted values of Y. Multiple R is the square root of R². For simple regression, the correlation between the dependent and independent variables is the same as multiple R, with the same sign as the slope coefficient.
 Adjusted R²: R² increases as more independent variables are added to the model, regardless of their explanatory power; this problem is called overestimating the regression. To overcome it, R² should be adjusted for the number of independent variables as per the following formula:
Adjusted R² = 1 - [(n-1) / (n-k-1)] × (1 - R²)
o Adjusted R² <= R²
o Adding a new variable to the model will always increase R², while it may increase or decrease adjusted R².
o Adjusted R² may be less than zero if R² is low enough.
 Dummy variables:
o Usually used to quantify the impact of qualitative binary events (on or off); dummy variables are assigned values of 1 or 0 for on or off status.
o Whenever we need to distinguish between n classes, we must use n-1 dummy variables; otherwise, the multiple regression assumption of no exact linear relationship between independent variables would be violated.
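The adjusted R² formula above is a one-liner; note how raising k with R² held fixed always lowers the adjusted value, which is exactly the penalty for adding variables with no explanatory power:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2),
    for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```

For example, adjusted_r2(0.5, 20, 3) gives 0.40625, below the raw R² of 0.5; adding a fourth (useless) variable lowers it further.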
The omitted class should be thought of as the reference point, which is represented by the intercept.
o Testing the statistical significance of the slope coefficients is equivalent to testing whether the class represented by each dummy variable differs from the omitted class (the intercept).
 Issues in regression analysis:
o Heteroskedasticity:
 Definition: occurs when the variance of the residuals is not the same across all observations. There are two types:
 Unconditional heteroskedasticity: not related to the level of the independent variables; not a major problem.
 Conditional heteroskedasticity: related to the level of the independent variables; a significant problem.
 Effect: the standard errors are unreliable (affecting t-tests and the F-test), while the coefficients are unaffected. Standard errors that are too small are the main concern, as they might lead to a Type I error by rejecting the null hypothesis of an insignificant coefficient.
 Detection:
 Examine the scatter plot of the residuals against the independent variables.
 Breusch-Pagan (BP) test: conduct a second regression of the squared residuals (from the first regression) against the independent variables and test whether the independent variables significantly contribute to explaining the squared residuals. The test statistic has a chi-square (χ²) distribution with k degrees of freedom and is calculated as BP = n × R²(resid), where R²(resid) is the R² of this auxiliary regression. This is a one-tailed test, as the concern is an R²(resid) that is too large. If the test statistic > the chi-square critical value ⟹ reject the null hypothesis and conclude that a conditional heteroskedasticity problem is present.
 Correction:
 Use robust standard errors (White-corrected, or heteroskedasticity-consistent, standard errors), which are usually higher than the original standard errors.
 Use generalized least squares, which modifies the original equation in an attempt to eliminate heteroskedasticity.
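The Breusch-Pagan statistic can be sketched end to end: fit an auxiliary regression of the squared residuals on X and take n × R². The residuals below are hypothetical, constructed so their spread grows with x (conditional heteroskedasticity):

```python
def simple_r2(x, y):
    """R^2 from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
resid = [0.1, -0.2, 0.4, -0.6, 0.9, -1.1]  # spread grows with x
sq = [e * e for e in resid]                # squared residuals

# BP = n * R^2 of the auxiliary regression; chi-square with k = 1 df here
bp = len(x) * simple_r2(x, sq)
```

Compared against the 5% chi-square critical value with 1 df (3.84), this toy `bp` rejects the null, flagging conditional heteroskedasticity.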
o Serial correlation (autocorrelation):
 Definition: occurs when the residual terms are correlated with one another; a common problem with time-series data. There are two types:
 Positive serial correlation: a positive error in one period increases the probability of observing a positive error in the next period.
 Negative serial correlation: a positive error in one period increases the probability of observing a negative error in the next period.
 Effect: the tendency of the data to cluster together underestimates the coefficient standard errors, leading to Type I errors.
 Detection:
 Examine the scatter plot of the residuals against time.
 Durbin-Watson (DW) statistic (the full calculation is impractical for the exam). If the sample size is very large: DW ≈ 2(1 - r), where r is the correlation coefficient between residuals from one period and those from the previous period.
⟹ DW = 2 if r = 0 (no serial correlation)
⟹ DW > 2 if r < 0 (negative serial correlation)
⟹ DW < 2 if r > 0 (positive serial correlation)
For the Durbin-Watson test, there are upper (DU) and lower (DL) critical values that depend on the level of significance, the number of observations, and the number of independent variables k.
o Test structure: H0: no positive serial correlation.
o Decision rule: reject H0 if DW < DL; the test is inconclusive if DL <= DW <= DU; fail to reject H0 if DW > DU.
 Correction:
 Use the Hansen method to produce Hansen-White standard errors, which can also correct for conditional heteroskedasticity. The general rule for the use of adjusted standard errors: if the problem is serial correlation only ⟹ Hansen method; if the problem is conditional heteroskedasticity only ⟹ White-corrected; if the problem is both ⟹ Hansen method.
 Improve the model specification, e.g. by including a seasonal term to reflect the time-series nature of the data; this can be tricky.
o Multicollinearity:
 Definition: occurs when linear combinations of the independent variables are highly correlated. For k > 2, high correlation between individual independent variables (> 0.7) suggests the possibility of multicollinearity, but low correlation doesn't necessarily indicate its absence.
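The Durbin-Watson statistic itself is a short computation over the residual series; the two toy residual sequences below illustrate how sign-alternating residuals push DW above 2 and sign-persistent residuals push it below 2:

```python
def durbin_watson(resid):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2.
    Near 2: no serial correlation; below 2: positive; above 2: negative."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

dw_negative = durbin_watson([1.0, -1.0, 1.0, -1.0])  # alternating signs
dw_positive = durbin_watson([1.0, 1.0, -1.0, -1.0])  # persistent signs
```

This matches the large-sample approximation DW ≈ 2(1 - r) in the notes above.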
 Effect: slope coefficients tend to be unreliable, and standard errors are artificially inflated; hence, there is a greater probability of Type II error.
 Detection: the F-test is statistically significant and R² is high, yet the t-tests indicate no significance of the individual coefficients.
 Correction: use statistical procedures, such as stepwise regression, that systematically remove variables from the regression until multicollinearity is minimized.
 Model misspecification:
o Categories:
I. The functional form can be misspecified:
1. Important variables are omitted.
2. Variables should be transformed (e.g. when the dependent variable is linearly related to the natural log of the variable, or when standardizing B/S items by dividing by total assets, and P&L and CF items by sales; common mistakes include squaring or taking the square root of the variable).
3. Data is improperly pooled (by pooling sub-periods that exhibit structural change).
II. Explanatory variables are correlated with the error terms in time-series analysis:
1. A lagged dependent variable is used as an independent variable.
2. A function of the dependent variable is used as an independent variable ("forecasting the past", e.g. using end-of-month market cap to predict returns during that month).
3. Independent variables are measured with error (e.g. using free float as a proxy for corporate-governance quality, or actual inflation as a proxy for expected inflation).
III. Other time-series misspecifications that result in nonstationarity.
o Effect: regression coefficients are often biased and/or inconsistent ⟹ unreliable hypothesis testing and inaccurate predictions.
 Qualitative (dummy) dependent variables:
o Probit and logit models:
 Estimate the probability that an event occurs (e.g. the probability of default or of a merger).
 Maximum likelihood is used to estimate the coefficients.
 A probit model is based on the normal distribution, while a logit model is based on the logistic distribution.
o Discriminant models:
 Result in a linear function, similar to an ordinary regression, that generates an overall score for an observation; the scores can then be used to rank or classify observations.
 Example: use financial ratios to derive a score that places a company in the bankrupt or not-bankrupt class.
 Similar to probit and logit models, but make different assumptions regarding the independent variables.
TIME-SERIES ANALYSIS Quantitative Analysis R-13 (SS-3)
 A time series is a set of observations over successive periods of time.
 Linear trend model: (the data plot on a straight line) yt = b0 + b1(t) + εt.
 Log-linear trend model: (the data plot on a curve) the model defines y as an exponential function of time; taking the natural log of both sides, ln(yt) = b0 + b1(t) + εt, transforms the equation from an exponential to a linear function. For financial time series that display exponential growth, a log-linear model provides a better fit for the data and thus increases the model's predictive power.
 When a variable grows at a constant rate (e.g. financial data such as company sales), a log-linear model is most appropriate; when it grows by a constant amount (e.g. inflation), a linear trend model is most appropriate.
 Limitation of trend models: when the time-series residuals exhibit serial correlation, as evidenced by the DW test, we need to use an autoregressive (AR) model. This is done by regressing the dependent variable against one or more lagged values of itself, on condition that the time series being modeled is covariance stationary.
 Conditions for covariance stationarity:
I. Constant and finite expected value (a mean-reverting level).
II. Constant and finite variance (homoskedastic).
III. Constant and finite covariance between values at any given lag (the covariance of the time series with leading or lagged values of itself is constant).
 An AR model of order p, AR(p), is expressed as: xt = b0 + b1·x(t-1) + b2·x(t-2) + ... + bp·x(t-p) + εt, where p is the number of lagged values included as independent variables.
 Forecasting with an autoregressive model: applying the chain rule of forecasting, it's necessary to calculate a one-step-ahead forecast before a two-step-ahead forecast. This implies that:
o Multi-period forecasts are more uncertain than single-period forecasts.
o Sample size = number of observations - the order p of the AR model.
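The chain rule of forecasting is only a few lines: the two-step-ahead forecast feeds the one-step-ahead forecast back into the model rather than an observed value (coefficients below are hypothetical):

```python
b0, b1 = 1.0, 0.6  # hypothetical AR(1) coefficients
x_t = 5.0          # most recent observed value

x_1 = b0 + b1 * x_t  # one-step-ahead forecast
x_2 = b0 + b1 * x_1  # two-step-ahead: uses the forecast, not a real value
```

Because `x_2` is built on the estimate `x_1`, its forecast error compounds, which is why multi-period forecasts are more uncertain than single-period ones.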
 Detection of and correction for autocorrelation in autoregressive models: the DW test used with trend models is not appropriate for AR models; instead, the following steps are followed to detect autocorrelation and make sure the AR model is correctly specified:
I. Estimate the AR(1) model.
II. Calculate the autocorrelations of the model's residuals.
III. t-test whether the autocorrelations are significantly different from zero: t = ρ / (1/√T), with df = T-2, where T is the number of observations.
IV. If any of the autocorrelations is significantly different from zero, the model is not correctly specified; add more lags to the model and repeat from step II until all residual autocorrelations are insignificant.
 Mean reversion and random walks: for a time series to be covariance stationary, it must have a constant and finite mean-reverting level, which is the value the time series tends to move toward. For an AR(1) model, once this level is reached it's expected that xt+1 = xt ⟹ mean-reverting level = b0 / (1 - b1). For b1 = 1 (called a unit root) the model does not have a finite mean-reverting level and thus is not covariance stationary. This happens when the model follows a random walk process, which is classified as:
o Random walk without a drift: xt = x(t-1) + εt
o Random walk with a drift: xt = b0 + x(t-1) + εt
 Unit root detection: as testing b1 = 1 cannot be performed directly, use the Dickey-Fuller (DF) test by transforming the AR(1) model: subtract x(t-1) from both sides to get xt - x(t-1) = b0 + (b1 - 1)·x(t-1) + εt, then test whether the transformed coefficient g = b1 - 1 is different from zero using a modified t-test. With H0: g = 0, if we fail to reject, we conclude that the series has a unit root.
 Unit root correction: use first differencing to transform the data into a covariance stationary time series. Construct yt = xt - x(t-1) and model yt = b0 + b1·y(t-1) + εt; for a random walk the differenced series has b0 = b1 = 0 ⟹ mean-reverting level = b0 / (1 - b1) = 0 (a finite value). If the data has a linear trend, first-difference the data; if the data has an exponential trend, first-difference the natural log of the data.
 Seasonality detection: a pattern that tends to repeat from year to year. Not accounting for seasonality, when it's present, makes the AR model misspecified and unreliable for forecasting purposes. Seasonality can be detected by observing that the residual autocorrelation at the lag corresponding to the same period in the previous year (month 12 or quarter 4) is significantly different from zero.
 Seasonality correction: add an additional lag corresponding to the same period in the previous year to the original model.
 In-sample forecasts are made within the range of data used to estimate the model.
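The mean-reverting level and first differencing above are direct computations (coefficients and series are hypothetical):

```python
# Mean-reverting level of an AR(1): b0 / (1 - b1), finite only when b1 != 1
b0, b1 = 2.0, 0.5
mrl = b0 / (1 - b1)

# First differencing: y_t = x_t - x_{t-1} turns a trending level series
# into a (shorter) series of period-over-period changes
series = [10.0, 12.0, 15.0, 19.0]  # hypothetical level data with a trend
diffed = [series[t] - series[t - 1] for t in range(1, len(series))]
```

Note the differenced series has one fewer observation than the original, consistent with the sample-size rule for AR models.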
Out-of-sample forecasts are made outside the sample period, to assess the predictive power of the model. Given two models, to assess which one is better, apply the root mean squared error (RMSE) criterion (the square root of the average of the squared errors) to out-of-sample data; the model with the lowest RMSE is the most accurate.
 As financial and economic environments are dynamic and frequently subject to structural shifts, there is a trade-off between the increased statistical reliability of longer time-series periods and the increased stability of the estimates from shorter periods.
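The RMSE criterion for comparing two models on out-of-sample data can be sketched as follows (forecasts below are hypothetical):

```python
import math

def rmse(actual, forecast):
    """Root mean squared error over out-of-sample observations."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

actual  = [1.0, 2.0, 3.0]
model_a = [1.1, 1.9, 3.2]  # hypothetical forecasts from model A
model_b = [1.5, 2.5, 2.0]  # hypothetical forecasts from model B
```

Here model A has the lower out-of-sample RMSE, so under this criterion it is preferred.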
 Autoregressive Conditional Heteroskedasticity (ARCH) model: ARCH exists if the variance of the residuals in one period depends on the variance of the residuals in a previous period. The ARCH(1) model is expressed as: ε̂²t = a0 + a1·ε̂²(t-1) + μt. If the coefficient a1 is statistically different from zero, the error terms exhibit heteroskedasticity: a positive a1 means the variance increases over time, a negative a1 means it decreases. In either case the time series is ARCH(1) and, according to our need, we can either:
o Correct the model using procedures that correct for heteroskedasticity, such as generalized least squares.
o Predict the variance of the residuals in future periods.
 Considerations for using two time-series variables in a linear regression: test for covariance stationarity (by detecting the presence of autocorrelation or a unit root), with the following possibilities for whether the data can be used:
1. Both time series are covariance stationary ⟹ Yes
2. Only one variable is covariance stationary ⟹ No
3. Neither time series is covariance stationary:
3.1. The two series are cointegrated ⟹ Yes
3.2. The two series are not cointegrated ⟹ No
 Cointegration: means the two time series are economically linked or follow the same trend, and that relationship is not expected to change. To test for cointegration, regress one variable on the other, yt = b0 + b1·xt + εt, then test the residuals for a unit root using the DF test with critical t-values calculated by Engle and Granger (i.e. the DF-EG test). If the test rejects the null hypothesis of a unit root, we conclude that the error terms generated by the two time series are covariance stationary and the two series are cointegrated.
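The second use of a fitted ARCH(1) model, predicting next period's residual variance, is a one-line calculation (coefficients and the current residual are hypothetical):

```python
a0, a1 = 0.2, 0.3  # hypothetical fitted ARCH(1) coefficients
eps_t = 1.5        # current period's residual

# one-period-ahead forecast of the residual variance:
# sigma^2_{t+1} = a0 + a1 * eps_t^2
var_next = a0 + a1 * eps_t ** 2
```

With a positive a1, a larger residual today mechanically raises the forecast variance for tomorrow, which is the volatility-clustering behavior ARCH is built to capture.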