Econometric Methodology:
Choosing the Right Regressors
PIDE Nurturing Minds Seminar 13th Sept 2017
Asad Zaman, VC PIDE
Major Misunderstandings of Econometric
Methodology
Three Papers Submitted to PDR:
(1): GDP = a + b1 X1 + … + bn Xn + c IPR (IPR = Intellectual Property Rights)
(2): GDP = a + b1 X1 + … + bn Xn + c EXPORTS (Export-Led Growth)
(3): GDP = a + b1 X1 + … + bn Xn + c1 FDI + c2 Lit (FDI + Human Capital)
Question: Can all three papers be right?
Axiom of Correct Specification
Edward Leamer
All INCLUDED Regressors MUST BE determinants.
All EXCLUDED Regressors MUST NOT BE DETERMINANTS
Both inclusions and exclusions MUST be correctly specified for the model to be valid.
THE THREE REGRESSIONS CANNOT ALL BE VALID:
If IPR is a determinant, then the other two are wrong, and similarly for the other two.
SO at MOST one of the three can be correct.
BUT is one of the three correct? There are more than 60 variables AVAILABLE in the WDI
data sets. So even if there is ONLY one determinant, we have 60 possible regressions.
HOW CAN WE FIND OUT WHICH ONE IS CORRECT?
R-squared, AIC, BIC (Schwarz), etc.
Use Model Selection Criteria:
 This method does not work very well.
KEY PROBLEM: HOW TO ENSURE THAT THERE ARE NO MISSING SIGNIFICANT
VARIABLES – these cause EXTREME BIAS and WRONG AND MISLEADING RESULTS.
INCLUDING EXTRA VARIABLES CAUSES LOSS OF EFFICIENCY, BUT DOES NOT CAUSE
INFERENCE FAILURE, BECAUSE a 0 coefficient is a possible value.
BIG MODEL MUST ENCOMPASS TRUE MODEL, if it is to produce VALID INFERENCE.
How Much Trouble Can a Missing Variable
Cause?
CONS = Final consumption expenditure, etc. (constant 2010 US$) Pakistan
 GDP = GDP (constant 2010 US$) Pakistan
 CONS(Pak) = 4.12 + 0.883 * GDP(Pak) + ε (R2 = 0.998)
(0.51) (0.006) (2.56)
Standard Keynesian Consumption Function. Likely to have autocorrelation and other
MIS-SPECIFIED SHORT TERM DYNAMICS. Other small missing variables.
Super-Consistency Holds – Because of strong trends in CONS and GDP,
misspecifications will not matter. Bias due to omitted stationary variables will VANISH.
But omitted MAJOR variable can make a big difference.
What happens if we regress CONS(Pak)
on randomly chosen WDI Vars ???
SUR = Survival to age 65, female (% of cohort)
CO2 = CO2 emissions from gaseous fuel consumption (% of total)
 Obviously, these variables have no relation to Consumption. Nonetheless, the OLS
regression yields the following results:
(E) CONS = -268.7 + 6.78 SUR – 1.82 CO2 + ε (R2 = 0.84)
(25.9) (0.73) (0.65) (20.0)
Both SUR and CO2 are HIGHLY SIGNIFICANT DETERMINANTS OF CONSUMPTION!
This is called a NONSENSE REGRESSION.
IRRELEVANT VARIABLES BECOME SIGNIFICANT AS PROXIES FOR MISSING VARS.
ADD Relevant Regressor GDP:
(F) CONS = 15.64 + 0.902 GDP – 2.60 SUR + 67.8 CO2 + ε (R2 = 0.99)
(0.54) (0.014) (1.40) (80.3) (2.28)
Both SUR and CO2 become insignificant
Nonsense Regressions are caused by MISSING DETERMINANTS
WRONG DIAGNOSIS MADE BY ECONOMETRICIANS:
Nonsense Regressions are caused by NON-STATIONARITY. The immense
literature on Integration and Co-Integration built on this diagnosis is COMPLETELY USELESS.
Discovery due to Atiqur Rahman – He is supervising a thesis on this theme.
LESSON: Omitting a Significant Regressor
Leads to NONSENSE REGRESSIONS
EXAMPLE:
(G) CONS(Pakistan) = -13.44 + 11.07 GDP(Honduras) + ε (R2 = 0.99)
(1.15) (0.12) (4.34)
Consumption of Pakistan as a function of the GDP of Honduras !!
Standard Diagnosis – this is because CONS and GDP are not stationary. WRONG –
CONS and GDP do NOT have to be integrated variables. All we need is a MISSING
IMPORTANT DETERMINANT from the regression.
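The proxy effect can be reproduced on simulated data. The following is a minimal sketch, not the author's computation: the coefficients, sample size, seed and the small `ols` helper are all assumptions made for illustration. The data are deliberately stationary (independent draws), so non-stationarity cannot be what makes SUR look significant.

```python
import random


def ols(y, cols):
    """OLS of y on cols (lists of equal length; pass a column of ones
    for the intercept). Returns (coefficients, standard_errors)."""
    n, k = len(y), len(cols)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [row + [float(i == j) for j in range(k)] for i, row in enumerate(xtx)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(aug[r][i]))  # partial pivoting
        aug[i], aug[p] = aug[p], aug[i]
        piv = aug[i][i]
        aug[i] = [v / piv for v in aug[i]]
        for r in range(k):
            if r != i:
                f = aug[r][i]
                aug[r] = [vr - f * vi for vr, vi in zip(aug[r], aug[i])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yt - sum(b * c[t] for b, c in zip(beta, cols)) for t, yt in enumerate(y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance estimate
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


random.seed(0)  # illustrative seed
n = 200
gdp = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical SUR: correlated with GDP, but absent from the CONS equation.
sur = [0.8 * g + 0.6 * random.gauss(0, 1) for g in gdp]
cons = [1.0 + 0.9 * g + 0.3 * random.gauss(0, 1) for g in gdp]
ones = [1.0] * n

b_short, se_short = ols(cons, [ones, sur])       # GDP omitted
b_long, se_long = ols(cons, [ones, gdp, sur])    # GDP included
t_sur_short = b_short[1] / se_short[1]
t_sur_long = b_long[2] / se_long[2]
print("t(SUR), GDP omitted: ", round(t_sur_short, 2))
print("t(SUR), GDP included:", round(t_sur_long, 2))
```

With GDP omitted, SUR picks up the effect of the missing determinant; once GDP is added, the t-statistic on SUR should fall sharply.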
ORIGINAL QUESTION:
Are models (1), (2), (3) Correct?
 HOW do we know if important variables are missing or not?
 Maybe IPR is significant because it proxies for FDI or for Exports or for ANY of the
other 61 Variables available in WDI Data Set?
Edward Leamer: Fragility of Inference
First we study Leamer’s Solution – Extreme Bounds Analysis
Edward Leamer: Specification Search
The Truth about (regression) models
 Models are NOT used to DISCOVER truth.
 We start out KNOWING what we want to show.
 We manipulate data into proving this
 Example: Hundreds of papers proving free trade is beneficial
 Rodrik: A Skeptic’s Guide.
 Similar fact holds of models of economic theory.
Leamer: SPECIFICATION SEARCHES
 By varying sets of variables, we can get ANY RESULT we like
 Look at W as determinant of Y. Choose X1, X2, …, Xn
 By choosing the right set of variables, we can make coefficient of W positive or
negative, significant or insignificant.
 The PROCESS of REGRESSION is a specification search, where the econometrician
looks for the right collection of regressors to prove his favorite hypothesis.
How to test if W is a significant determinant
of Y?
Choose FIXED RELEVANT VARIABLES (guaranteed to be important from THEORETICAL
considerations, known a priori): X1, X2, …, Xn
Focus Variable: W
Potential Determinants: V1, V2, …, Vm
Regress Y on X1,…,Xn, W, and some combination of the Vi.
If W is significant regardless of which combination of the Vi is included, then W is significant.
Look at the range of possible values of estimated coefficient of W. This is called
EXTREME BOUNDS ANALYSIS.
Typical conclusion: NO VARIABLE IS SIGNIFICANT. ALL INFERENCE IS FRAGILE.
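The procedure can be sketched in a few lines. This is an illustrative sketch on simulated data, not Leamer's own code: the function name `extreme_bounds`, the helper `ols`, the data-generating process and the convention of taking the bounds as the estimate plus or minus two standard errors are assumptions made here.

```python
import itertools
import random


def ols(y, cols):
    """OLS of y on cols (pass a column of ones for the intercept).
    Returns (coefficients, standard_errors)."""
    n, k = len(y), len(cols)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    # Invert X'X by Gauss-Jordan elimination on [X'X | I].
    aug = [row + [float(i == j) for j in range(k)] for i, row in enumerate(xtx)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(aug[r][i]))
        aug[i], aug[p] = aug[p], aug[i]
        piv = aug[i][i]
        aug[i] = [v / piv for v in aug[i]]
        for r in range(k):
            if r != i:
                f = aug[r][i]
                aug[r] = [vr - f * vi for vr, vi in zip(aug[r], aug[i])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yt - sum(b * c[t] for b, c in zip(beta, cols)) for t, yt in enumerate(y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


def extreme_bounds(y, fixed, w, doubtful, size):
    """Regress y on fixed + [w] + every `size`-subset of the doubtful columns.
    Returns (lower, upper, n_regressions): the smallest b - 2*se and the
    largest b + 2*se found for the coefficient b on w."""
    lows, highs = [], []
    pos = len(fixed)  # index of w's coefficient
    for combo in itertools.combinations(doubtful, size):
        beta, se = ols(y, fixed + [w] + list(combo))
        lows.append(beta[pos] - 2 * se[pos])
        highs.append(beta[pos] + 2 * se[pos])
    return min(lows), max(highs), len(lows)


random.seed(1)  # illustrative seed
n = 100
ones = [1.0] * n
x1 = [random.gauss(0, 1) for _ in range(n)]                      # fixed variable
w = [random.gauss(0, 1) for _ in range(n)]                       # focus variable
v = [[random.gauss(0, 1) for _ in range(n)] for _ in range(4)]   # doubtful Vi
y = [1.0 + x1[t] + 1.0 * w[t] + 0.5 * random.gauss(0, 1) for t in range(n)]

lower, upper, count = extreme_bounds(y, [ones, x1], w, v, size=2)
robust = lower > 0 or upper < 0  # bounds share a sign: inference is not fragile
print(count, round(lower, 3), round(upper, 3), robust)
```

In this toy setup W is a genuine, uncorrelated determinant, so the bounds stay on one side of zero; with correlated doubtful variables the interval widens and often straddles zero.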
Sala-i-Martin: I ran two million regressions
VARIANT of Leamer’s EBA
Start with 62 Variables in WDI data set. Set Three as Essential Determinants:
GDP60, LE60, PSE60 (1960 GDP, Life Expectancy and Primary School Enrolment; Barro).
That leaves X1,…,X59. Choose any one of them as W, and choose ANY THREE OTHERS to
run:
Growth = a + b1 GDP60 + b2 LE60 + b3 PSE60 + c W + c1 Xi + c2 Xj + c3 Xk
VARY (i,j,k) over all possible sets of three regressors: 58x57x56 = 185,136 ordered triples (30,856 distinct sets).
If W is significant in 95% of these regressions, then count W as significant.
I count 10 million regressions here (59 x 185,136 ordered triples; about 1.8 million distinct regressions).
RESULT: 22 Variables out of the 59 are significant. Conclusion: EBA is too extreme.
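The regression count depends on whether (i,j,k) is treated as an ordered triple or as a set. A short check, using only the numbers on the slide (the variable names are illustrative):

```python
from math import comb

candidates = 59          # variables left after fixing GDP60, LE60, PSE60
others = candidates - 1  # 58 remain once the focus variable W is chosen

ordered_triples = others * (others - 1) * (others - 2)  # order of (i, j, k) counted
distinct_sets = comb(others, 3)                         # order ignored

total_ordered = candidates * ordered_triples
total_distinct = candidates * distinct_sets
print(ordered_triples, distinct_sets)   # 185136 30856
print(total_ordered, total_distinct)    # 10923024 1820504
```

Reordering the regressors does not change a regression, so the distinct count of roughly 1.8 million is the one that corresponds to "two million regressions".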
What is wrong with Sala-i-Martin?
 Analysis is self-contradictory
 If 22 variables are significant then ALL regressions with fewer than 22 regressors have
SIGNIFICANT OMITTED VARIABLES.
 It follows that all of his two million regressions are nonsense regressions.
 CAN we get sensible results by running two million nonsense regressions?
 Answer: NO. This can be established by a simulation study, done later by Hoover and
Perez.
 The Sala-i-Martin strategy has high Type I and II error probabilities. It can include
irrelevant regressors and exclude significant ones. It tends to include TOO MANY
variables as being relevant when they are NOT.
Pure Bayesian Approach
Fernandez, Ley, Steel (2001)
Take 41 regressors from the Sala-i-Martin data set on which complete data is available. All
possible subsets give 2^41 (about two trillion) models. Assign priors to them.
Each regressor has prior probability 50% of being included in the model.
Compute posterior probabilities.
Regressors with HIGH posterior probabilities have high probabilities of being
determinants.
Strongest determinants are: Confucian%, GDP60, EquipInv, LE60
Many other determinants.
Good models have 0.1 % probability.
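In symbols, the setup above can be written with the standard Bayesian model averaging definitions. This is a sketch: the notation (K, M_j, y) is introduced here, and the marginal likelihood p(y | M_j) depends on the coefficient priors, which the slides do not spell out.

```latex
% K = 41 candidate regressors, each included independently with prior probability 1/2
\[
P(M_j) = \left(\tfrac{1}{2}\right)^{K} = 2^{-41}, \qquad j = 1, \dots, 2^{41},
\]
\[
P(M_j \mid y) = \frac{p(y \mid M_j)\, P(M_j)}{\sum_{l=1}^{2^{41}} p(y \mid M_l)\, P(M_l)},
\qquad
P(x_i \text{ included} \mid y) = \sum_{j \,:\, x_i \in M_j} P(M_j \mid y).
\]
```

The last expression is the posterior inclusion probability of a regressor: the total posterior weight of all models that contain it.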
Model Averaging VERSUS Selection
 Selection focuses on finding TRUE model and CORRECT regressors.
 BMA aims to USE all models, assign them weights, and come up with combined
forecast.
 DEBATE AND CONTROVERSY:
 Can we average over wrong models and get right result?
 RESOLUTION – There are DIFFERENT GOALS, and each procedure is well suited to
its OWN goal.
 SELECTION involves putting all eggs in one basket. HIGHER RISK.
 FORECASTING involves avoiding selection and getting insurance against bad
choices.
Hoover Perez Simulations
 BMA fails to find the right regressors, BUT does well at forecasting.
 So when it comes to CHOOSING the right set of regressors, the right strategy comes
from ENCOMPASSING, using the Hendry Methodology
Hendry Methodology
 Conventional Methodology leads to conflicting, contradictory theories and models
 T1: IPR -- T2: FDI -- T3: Exports and many others – ALL theories describe
determinants of growth. They are in conflict with each other.
 Papers exist which prove ELG (export-led growth), GLE (growth-led exports), BOTH, NEITHER
 Everybody runs a new regression and puts down a new brick in a different place.
 There is NO CUMULATION OF KNOWLEDGE.
SOLUTION: ENCOMPASSING
 Given T1, T2, T3, etc. New Researcher is NOT ALLOWED to put down T(J)
 New Research MUST BUILD ON EXISTING RESEARCH.
 FIRST of ALL do a LIT REVIEW – that is, COVER, and BE AWARE of ALL PRIOR
EXISTING LITERATURE ON YOUR TOPIC.
 NEXT, explain the gap: What are the DEFECTS in existing theories?
 NEXT, FILL the gap: Explain how and why T(J) is SUPERIOR to all existing theories.
 At the END there should be ONLY ONE BEST THEORY – Encompassing shows that
our new theory COVERS all previous theories and IMPROVES UPON them. NEXT
researcher has to BEAT T(J) to produce T(J+1).
How to do this for Choice of Regressors
 GUM: General Unrestricted Model
 ADD ALL RELEVANT Variables
 In our example, form model with Exports, IPR, FDI, and include ALL regressors used
by ALL the researchers. The GUM NESTS T1 T2 T3 as special cases:
GUM:
GDP = a + b1 IPR + b2 FDI + b3 Exports + c1 X1 + … + ck Xk
T1 says that b2=0 and b3=0, T2 says that b1=0, b3=0, T3 says b1=b2=0
We can test these hypotheses using F-test for joint significance of multiple regressors.
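For reference, the usual F statistic for such nested restrictions, under the classical regression assumptions, can be written as follows. The symbols RSS_R, RSS_U, q, n and p are notation introduced here, not taken from the slides.

```latex
% RSS_U: residual sum of squares of the GUM (unrestricted)
% RSS_R: residual sum of squares of the restricted model (e.g. T1 with b2 = b3 = 0)
% q: number of restrictions (q = 2 for each of T1, T2, T3)
% n: observations, p: number of coefficients in the GUM (here p = k + 4)
\[
F = \frac{(RSS_R - RSS_U)/q}{RSS_U/(n - p)} \;\sim\; F_{q,\, n-p} \quad \text{under the restrictions.}
\]
```

A large F rejects the restricted model: the excluded regressors carry explanatory power that the theory in question leaves out.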
Conventional Methodology
Simple-To-General
Start with C = a + b GNP + error -- Start with Simple Model,
If there is a FLAW, THEN look for additional regressors – Make it more complicated if
necessary.
What are FLAWS? Failures of standard assumptions
Heteroskedasticity (can usually be fixed by taking LOGS)
AUTOCORRELATION: Can be fixed by adding DYNAMICS to static equation
GUM Strategy for Autocorrelation
Suppose C = a + b Y + e has autocorrelated errors.
Then: e(t) = C(t) – a – b Y(t). ALSO: e(t-1) = C(t-1) – a – b Y(t-1)
AUTOCORRELATED MODEL IS e(t) = u(t) + r e(t-1)
C(t) = a + b Y(t) + u(t) + r e(t-1)
= a + b Y(t) + r C(t-1) –ra – rb Y(t-1) + u(t)
= (a - ra) + b Y(t) + r C(t-1) – rb Y(t-1) + u(t)
Consider the GENERAL ARDL model – This is General UNRESTRICTED Model
C(t) = a* + b Y(t) + c C(t-1) + d Y(t-1) + u(t)
AR-1 model is the special case with c = r, d = -bc, a* = a(1-r)
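The restriction can be checked numerically. The following is a minimal sketch on simulated data: the parameter values, seed, sample size and the `ols` helper are assumptions made for illustration. It generates a static consumption function with AR-1 errors, fits the unrestricted ARDL model, and compares the estimated d with -bc.

```python
import random


def ols(y, cols):
    """OLS of y on cols (pass a column of ones for the intercept).
    Returns (coefficients, standard_errors)."""
    n, k = len(y), len(cols)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    # Invert X'X by Gauss-Jordan elimination on [X'X | I].
    aug = [row + [float(i == j) for j in range(k)] for i, row in enumerate(xtx)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(aug[r][i]))
        aug[i], aug[p] = aug[p], aug[i]
        piv = aug[i][i]
        aug[i] = [v / piv for v in aug[i]]
        for r in range(k):
            if r != i:
                f = aug[r][i]
                aug[r] = [vr - f * vi for vr, vi in zip(aug[r], aug[i])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yt - sum(b * c[t] for b, c in zip(beta, cols)) for t, yt in enumerate(y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


random.seed(2)               # illustrative seed
n = 3000
a, b, r = 2.0, 0.8, 0.5      # assumed true values
inc = [random.gauss(0, 1) for _ in range(n)]   # income Y(t)
err = [0.0] * n
for t in range(1, n):
    err[t] = r * err[t - 1] + random.gauss(0, 1)   # e(t) = r e(t-1) + u(t)
cons = [a + b * inc[t] + err[t] for t in range(n)]

# Unrestricted ARDL: C(t) on 1, Y(t), C(t-1), Y(t-1)
ones = [1.0] * (n - 1)
beta, se = ols(cons[1:], [ones, inc[1:], cons[:-1], inc[:-1]])
a_star, b_hat, c_hat, d_hat = beta
print("c_hat:", round(c_hat, 3), " d_hat:", round(d_hat, 3),
      " -b_hat*c_hat:", round(-b_hat * c_hat, 3))
```

Because the data satisfy the AR-1 model, the unrestricted estimate of d should lie close to -bc. On real data the restriction may fail, which is why it is tested rather than imposed.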
Flaws of Simple to General Strategy
 If regression equation does not forecast well (Y=a+bX) add relevant variable W.
 Then W may appear significant because it is proxy for some other missing variable.
This will DECEIVE the econometrician.
 If we add AR-1 restriction, we SET d = - bc. GeTS says ALLOW UNRESTRICTED d,
and THEN TEST RESTRICTION.
GeTS: General-To-Simple Modeling
 GeTS: Build the largest possible model. INCLUDE ALL POTENTIALLY RELEVANT
REGRESSORS. Now no regressor can be significant because of OMITTED
VARIABLES, because you have included them ALL.
 Assuming we have data on ALL relevant variables
 In the Sala-i-Martin data, run regression on ALL 61 variables.
 THEN DROP insignificant Variables.
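The simplest version of this idea is a single-path backward elimination. The sketch below is an illustration on simulated data, not the algorithm of any particular package: the function name `general_to_simple`, the |t| threshold of 2, the helper `ols` and the data-generating process are all assumptions.

```python
import random


def ols(y, cols):
    """OLS of y on cols (pass a column of ones for the intercept).
    Returns (coefficients, standard_errors)."""
    n, k = len(y), len(cols)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    # Invert X'X by Gauss-Jordan elimination on [X'X | I].
    aug = [row + [float(i == j) for j in range(k)] for i, row in enumerate(xtx)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(aug[r][i]))
        aug[i], aug[p] = aug[p], aug[i]
        piv = aug[i][i]
        aug[i] = [v / piv for v in aug[i]]
        for r in range(k):
            if r != i:
                f = aug[r][i]
                aug[r] = [vr - f * vi for vr, vi in zip(aug[r], aug[i])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yt - sum(b * c[t] for b, c in zip(beta, cols)) for t, yt in enumerate(y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


def general_to_simple(y, candidates, keep=("const",), t_crit=2.0):
    """Start from the model with ALL candidates, then repeatedly drop the
    regressor with the smallest |t| until every remaining one (outside
    `keep`) has |t| >= t_crit. Returns the names of retained regressors."""
    model = dict(candidates)
    while True:
        names = list(model)
        beta, se = ols(y, [model[nm] for nm in names])
        tvals = {nm: abs(bb / ss) for nm, bb, ss in zip(names, beta, se)}
        weak = [nm for nm in names if nm not in keep and tvals[nm] < t_crit]
        if not weak:
            return names
        del model[min(weak, key=tvals.get)]


random.seed(3)  # illustrative seed
n = 200
cols = {"const": [1.0] * n}
for j in range(8):
    cols[f"x{j}"] = [random.gauss(0, 1) for _ in range(n)]
# Only x0 and x3 are true determinants in this toy example.
y = [1.0 + 2.0 * cols["x0"][t] - 1.5 * cols["x3"][t] + random.gauss(0, 1)
     for t in range(n)]

selected = general_to_simple(y, cols)
print(selected)
```

Because the starting model contains every candidate, x0 and x3 cannot be made to look irrelevant by an omitted variable; an irrelevant regressor may still survive by chance.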
Multiple Objections to GeTS
 With lots of regressors, we have MULTICOLLINEARITY problems.
 Many important regressors will fail to be significant.
 NOISE can exceed SIGNAL. Bad Regressors can drive out Good ones.
MANY PROBLEMS HAVE BEEN RESOLVED.
MUCH PROGRESS HAS BEEN MADE
EXISTING ALGORITHMS GIVE fairly GOOD probabilities of FINDING A MODEL WHICH
ENCOMPASSES the true model.
That is around 80% chances of picking up all relevant regressors, plus one or two
extras. (depending on configurations of model and regressors)
GeTS is NOT a mechanical procedure.
MUST be guided by KNOWLEDGE
 IDEAL CASE for GeTS
ALL regressors are ORTHOGONAL, that is, UNCORRELATED with each other.
Then each regressor can be treated separately; they do not INTERFERE with each other.
Arrange all the t-stats for significance in decreasing ORDER. DROP all regressors with t-stats LESS than
some critical value.
There is no model selection problem! That is, significance will NOT be affected by
MODEL selection in this situation.
Much more difficult with CORRELATED
regressors
 Y = a + b X + c W1 + d W2
IDEAL situation: the Good Regressor, when PRESENT, makes the Bad Regressor INSIGNIFICANT.
In the first regression of Pak Cons on Female Survival and CO2 Emissions, if we put in Pak
GDP, it makes the other two variables INSIGNIFICANT.
THEORETICALLY, this will ALWAYS happen ASYMPTOTICALLY – as we get larger and
larger amounts of DATA (and the MODEL does not CHANGE), Good Variables WILL DRIVE
out bad Variables.
PRACTICALLY THIS IS NOT GUARANTEED. We are often working with small data sets. EVEN
with BIG DATA, if the model changes from time to time then all data sets are small.
COMPLICATIONS
Pak Cons = a + b Pak GNP + c Honduras GNP + error
Honduras GNP REMAINS highly significant in this regression.
What does this mean? Does Honduras GNP matter for Pakistani Consumption?
No – it is acting as a proxy for some OTHER missing variable
This KNOWLEDGE comes from our knowledge of the real world. THAT is why model
selection cannot be mechanical.
Software Package PC-GeTS
Automatic Model Selection
Automatic GeTS is implemented in the PC-GeTS package, which is available and USEFUL.
It reaches correct models with high probability when sufficient data is available to
discriminate.
Regressors with HIGH VARIATION are quickly spotted; those with low variation MAY BE MISSED.
The higher the correlation among regressors, the greater the possibility of ERROR: the wrong variable can be
chosen instead of the right one.
Human Guided Model Selection
MODELLING CORRECTION: Start by ensuring a GOOD GUM – that is, fulfill
assumptions of regression model. Choose right functional forms (log or other).
LINEARIZE relationships, and run a lot of different types of FIXES to put initial model
into GOOD Shape BEFORE starting selection.
TRY TO ORTHOGONALIZE REGRESSORS:
C = a GNP(t) + b GNP(t-1) can be rewritten as C = (a+b) GNP(t) - b ∆GNP(t), where ∆GNP(t) = GNP(t) - GNP(t-1)
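A small numerical illustration of why this helps, on a simulated trending series. The series, the seed and the coefficient values are assumptions for illustration only.

```python
import random


def corr(x, z):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxz = sum((p - mx) * (q - mz) for p, q in zip(x, z))
    sxx = sum((p - mx) ** 2 for p in x)
    szz = sum((q - mz) ** 2 for q in z)
    return sxz / (sxx * szz) ** 0.5


random.seed(4)  # illustrative seed
gnp = [100.0]
for _ in range(100):
    gnp.append(gnp[-1] + 1.0 + random.gauss(0, 0.5))  # trending series

level, lag = gnp[1:], gnp[:-1]                   # GNP(t) and GNP(t-1)
diff = [g1 - g0 for g1, g0 in zip(level, lag)]   # first difference of GNP

r_levels = corr(level, lag)     # GNP(t) vs GNP(t-1): nearly collinear
r_reparam = corr(level, diff)   # GNP(t) vs its first difference: much weaker

# The two parametrizations describe the same fitted values.
a, b = 0.6, 0.3
lhs = [a * g1 + b * g0 for g1, g0 in zip(level, lag)]
rhs = [(a + b) * g1 - b * d for g1, d in zip(level, diff)]
print(round(r_levels, 4), round(r_reparam, 4))
```

The reparametrized regressors span the same space, so the fit is unchanged, but they are far less correlated and their t-statistics are easier to read separately.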
Multiple Searches
Y = a0 + a1 X1 + … + a60 X60
CAN Test EVEN when regressors exceed observations !
MAIN IDEA: drop ALL INSIGNIFICANT REGRESSORS. This works if regressors are
independent, but not if they are correlated. In that case, a Good regressor may be
insignificant, and a Bad Regressor may be significant. What to DO?
CHOOSE USING THEORY OVER EMPIRICS: Retain Theoretically Important Variables.
Take the TEN least significant variables. DROP them ONE at a TIME. This creates 10
DIFFERENT searches. Each variable is retained in at least one of the ten searches.
Compare TERMINAL models using BIC
 Continue EACH of ten searches by eliminating the LEAST significant regressors.
MAY BE GUIDED BY THEORY AT EACH STAGE
 Choose among collection of FINAL models.
 This selection need not be mechanical
 Can also use PRINCIPAL COMPONENTS to extract a small number of highly
variable regressors from a large set. But problems arise in INTERPRETATION.
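For comparing terminal models, one common form of BIC for a linear regression with Gaussian errors (up to an additive constant) is given below. The notation is introduced here, and the slides do not state which exact form is used.

```latex
% n: observations, p: number of estimated coefficients, RSS: residual sum of squares
\[
\mathrm{BIC} = n \ln\!\left(\frac{RSS}{n}\right) + p \ln n
\]
```

The terminal model with the smallest BIC is preferred: the first term rewards fit, the second penalizes each extra regressor by ln n.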
FINAL REMARKS
 TWO STEPS: Building a Good Regression Model (not being deceived by
ACCIDENTAL CORRELATIONS and SPURIOUS and NONSENSE REGRESSIONS)
 Picking out GENUINE, STABLE CORRELATIONS DOES NOT JUSTIFY CAUSAL
INFERENCE.
 GUM IDENTIFIES % CONFUCIAN as a KEY DETERMINANT of growth
 WHY?
 Because of CHINA. NOT A CAUSAL RELATIONSHIP
To view 70m video-talk based
on these slides, a brief
summary, and link to full paper,
see:
http://bit.do/azreg

Mais conteúdo relacionado

Mais procurados

Data Analysison Regression
Data Analysison RegressionData Analysison Regression
Data Analysison Regression
jamuga gitulho
 
Dummy variable
Dummy variableDummy variable
Dummy variable
Akram Ali
 

Mais procurados (20)

Intro to Quant Trading Strategies (Lecture 3 of 10)
Intro to Quant Trading Strategies (Lecture 3 of 10)Intro to Quant Trading Strategies (Lecture 3 of 10)
Intro to Quant Trading Strategies (Lecture 3 of 10)
 
Quantitative Methods for Lawyers - Class #22 - Regression Analysis - Part 5
Quantitative Methods for Lawyers - Class #22 - Regression Analysis - Part 5Quantitative Methods for Lawyers - Class #22 - Regression Analysis - Part 5
Quantitative Methods for Lawyers - Class #22 - Regression Analysis - Part 5
 
Intro to Quant Trading Strategies (Lecture 5 of 10)
Intro to Quant Trading Strategies (Lecture 5 of 10)Intro to Quant Trading Strategies (Lecture 5 of 10)
Intro to Quant Trading Strategies (Lecture 5 of 10)
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
 
Data Analysison Regression
Data Analysison RegressionData Analysison Regression
Data Analysison Regression
 
Ali, Redescending M-estimator
Ali, Redescending M-estimator Ali, Redescending M-estimator
Ali, Redescending M-estimator
 
Csis 5420 week 8 homework answers (13 jul 05)
Csis 5420 week 8 homework   answers (13 jul 05)Csis 5420 week 8 homework   answers (13 jul 05)
Csis 5420 week 8 homework answers (13 jul 05)
 
Quantitative Methods for Lawyers - Class #21 - Regression Analysis - Part 4
Quantitative Methods for Lawyers - Class #21 - Regression Analysis - Part 4Quantitative Methods for Lawyers - Class #21 - Regression Analysis - Part 4
Quantitative Methods for Lawyers - Class #21 - Regression Analysis - Part 4
 
Intro to Quantitative Investment (Lecture 3 of 6)
Intro to Quantitative Investment (Lecture 3 of 6)Intro to Quantitative Investment (Lecture 3 of 6)
Intro to Quantitative Investment (Lecture 3 of 6)
 
165662191 chapter-03-answers-1
165662191 chapter-03-answers-1165662191 chapter-03-answers-1
165662191 chapter-03-answers-1
 
Dummy variable
Dummy variableDummy variable
Dummy variable
 
352735346 rsh-qam11-tif-16-doc
352735346 rsh-qam11-tif-16-doc352735346 rsh-qam11-tif-16-doc
352735346 rsh-qam11-tif-16-doc
 
352735347 rsh-qam11-tif-14-doc
352735347 rsh-qam11-tif-14-doc352735347 rsh-qam11-tif-14-doc
352735347 rsh-qam11-tif-14-doc
 
Les5e ppt 10
Les5e ppt 10Les5e ppt 10
Les5e ppt 10
 
200844797 rsh-qam11-tif-01-doc
200844797 rsh-qam11-tif-01-doc200844797 rsh-qam11-tif-01-doc
200844797 rsh-qam11-tif-01-doc
 
Selection & Making Decisions in c
Selection & Making Decisions in cSelection & Making Decisions in c
Selection & Making Decisions in c
 
7734376
77343767734376
7734376
 
Les5e ppt 09
Les5e ppt 09Les5e ppt 09
Les5e ppt 09
 
Quantitative Methods for Lawyers - Class #17 - Scatter Plots, Covariance, Cor...
Quantitative Methods for Lawyers - Class #17 - Scatter Plots, Covariance, Cor...Quantitative Methods for Lawyers - Class #17 - Scatter Plots, Covariance, Cor...
Quantitative Methods for Lawyers - Class #17 - Scatter Plots, Covariance, Cor...
 
Decision Tree Analysis
Decision Tree AnalysisDecision Tree Analysis
Decision Tree Analysis
 

Semelhante a Choosing the Right Regressors

Hypothesis Testing techniques in social research.ppt
Hypothesis Testing techniques in social research.pptHypothesis Testing techniques in social research.ppt
Hypothesis Testing techniques in social research.ppt
Solomonkiplimo
 
Marketing Engineering Notes
Marketing Engineering NotesMarketing Engineering Notes
Marketing Engineering Notes
Felipe Affonso
 
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
QuantInsti
 

Semelhante a Choosing the Right Regressors (20)

Dummy variables
Dummy variablesDummy variables
Dummy variables
 
Hypothesis Testing techniques in social research.ppt
Hypothesis Testing techniques in social research.pptHypothesis Testing techniques in social research.ppt
Hypothesis Testing techniques in social research.ppt
 
Econometric (Indonesia's Economy).pptx
Econometric (Indonesia's Economy).pptxEconometric (Indonesia's Economy).pptx
Econometric (Indonesia's Economy).pptx
 
Econometrics
EconometricsEconometrics
Econometrics
 
Advanced Econometrics L1-2.pptx
Advanced Econometrics L1-2.pptxAdvanced Econometrics L1-2.pptx
Advanced Econometrics L1-2.pptx
 
report
reportreport
report
 
Classification methods and assessment.pdf
Classification methods and assessment.pdfClassification methods and assessment.pdf
Classification methods and assessment.pdf
 
Corrleation and regression
Corrleation and regressionCorrleation and regression
Corrleation and regression
 
200844797 rsh-qam11-tif-01-doc
200844797 rsh-qam11-tif-01-doc200844797 rsh-qam11-tif-01-doc
200844797 rsh-qam11-tif-01-doc
 
Marketing Engineering Notes
Marketing Engineering NotesMarketing Engineering Notes
Marketing Engineering Notes
 
Intro to Quant Trading Strategies (Lecture 10 of 10)
Intro to Quant Trading Strategies (Lecture 10 of 10)Intro to Quant Trading Strategies (Lecture 10 of 10)
Intro to Quant Trading Strategies (Lecture 10 of 10)
 
Advanced Econometrics L3-4.pptx
Advanced Econometrics L3-4.pptxAdvanced Econometrics L3-4.pptx
Advanced Econometrics L3-4.pptx
 
Autocorrelation
AutocorrelationAutocorrelation
Autocorrelation
 
Decision theory
Decision theoryDecision theory
Decision theory
 
Risk notes ch12
Risk notes ch12Risk notes ch12
Risk notes ch12
 
Classification modelling review
Classification modelling reviewClassification modelling review
Classification modelling review
 
Operations Management VTU BE Mechanical 2015 Solved paper
Operations Management VTU BE Mechanical 2015 Solved paperOperations Management VTU BE Mechanical 2015 Solved paper
Operations Management VTU BE Mechanical 2015 Solved paper
 
7. logistics regression using spss
7. logistics regression using spss7. logistics regression using spss
7. logistics regression using spss
 
Cb36469472
Cb36469472Cb36469472
Cb36469472
 
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
 

Mais de Asad Zaman

Mais de Asad Zaman (20)

Decolonization of Education: Islamic Perspective - HBKU
Decolonization of Education: Islamic Perspective - HBKUDecolonization of Education: Islamic Perspective - HBKU
Decolonization of Education: Islamic Perspective - HBKU
 
Decolonization of Education: Islamic Perspective
Decolonization of Education: Islamic PerspectiveDecolonization of Education: Islamic Perspective
Decolonization of Education: Islamic Perspective
 
The Ghazali Project: Countering Enlightenment Epistemology
The Ghazali Project: Countering Enlightenment EpistemologyThe Ghazali Project: Countering Enlightenment Epistemology
The Ghazali Project: Countering Enlightenment Epistemology
 
First, Second, and Third Generation Islamic Economicss
First, Second, and Third Generation Islamic EconomicssFirst, Second, and Third Generation Islamic Economicss
First, Second, and Third Generation Islamic Economicss
 
Teaching Third Generation Islamic Economics
Teaching Third Generation Islamic EconomicsTeaching Third Generation Islamic Economics
Teaching Third Generation Islamic Economics
 
Reversing the Great Transformation
Reversing the Great TransformationReversing the Great Transformation
Reversing the Great Transformation
 
Three Generations of Islamic Economics
Three Generations of Islamic EconomicsThree Generations of Islamic Economics
Three Generations of Islamic Economics
 
Challenge & Opportunity for 3rd Gen Islamic Economists (7/7)
Challenge & Opportunity for 3rd Gen Islamic Economists (7/7)Challenge & Opportunity for 3rd Gen Islamic Economists (7/7)
Challenge & Opportunity for 3rd Gen Islamic Economists (7/7)
 
Pragmatic Objections To Visionary Islamic Economics (6/7)
Pragmatic Objections To Visionary Islamic Economics (6/7)Pragmatic Objections To Visionary Islamic Economics (6/7)
Pragmatic Objections To Visionary Islamic Economics (6/7)
 
The Revolutionary Islamic Alternative (5/7)
The Revolutionary Islamic Alternative (5/7)The Revolutionary Islamic Alternative (5/7)
The Revolutionary Islamic Alternative (5/7)
 
Two Puzzles About Islamic Economics (4/7)
Two Puzzles About Islamic Economics (4/7)Two Puzzles About Islamic Economics (4/7)
Two Puzzles About Islamic Economics (4/7)
 
Four Flaws of Modern Economics (3/7)
Four Flaws of Modern Economics (3/7)Four Flaws of Modern Economics (3/7)
Four Flaws of Modern Economics (3/7)
 
Crisis in 2nd Generation Islamic Economics (2/7)
Crisis in 2nd Generation Islamic Economics (2/7)Crisis in 2nd Generation Islamic Economics (2/7)
Crisis in 2nd Generation Islamic Economics (2/7)
 
Islamization of Knowledge (1/7)
Islamization of Knowledge (1/7)Islamization of Knowledge (1/7)
Islamization of Knowledge (1/7)
 
Basics of Money: Seeing Through Deceptions
Basics of Money: Seeing Through DeceptionsBasics of Money: Seeing Through Deceptions
Basics of Money: Seeing Through Deceptions
 
Capitalism vs Islamic Economics
Capitalism vs Islamic EconomicsCapitalism vs Islamic Economics
Capitalism vs Islamic Economics
 
Complete and Perfect Guidance from God
Complete and Perfect Guidance from GodComplete and Perfect Guidance from God
Complete and Perfect Guidance from God
 
Imposing Eurocentric Patterns on Islamic Societies
Imposing Eurocentric Patterns on Islamic SocietiesImposing Eurocentric Patterns on Islamic Societies
Imposing Eurocentric Patterns on Islamic Societies
 
IIIE.pptx
IIIE.pptxIIIE.pptx
IIIE.pptx
 
Origins of Modern Money: Insufficiency of Gold
Origins of Modern Money: Insufficiency of GoldOrigins of Modern Money: Insufficiency of Gold
Origins of Modern Money: Insufficiency of Gold
 

Último

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 

Último (20)

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 

Choosing the Right Regressors

  • 1. Econometric Methodology: Choosing the Right Regressors PIDE Nurturing Minds Seminar 13th Sept 2017 Asad Zaman, VC PIDE
  • 2. Major Misunderstandings of Econometric Methodology. Three papers submitted to PDR:
       (1): GDP = a + b1 X1 + … + bn Xn + c IPR (IPR = Intellectual Property Rights)
       (2): GDP = a + b1 X1 + … + bn Xn + c EXPORTS (Export Led Growth)
       (3): GDP = a + b1 X1 + … + bn Xn + c1 FDI + c2 Lit (FDI + Human Capital)
       Question: Can all three papers be right?
  • 3. Axiom of Correct Specification (Edward Leamer). All INCLUDED regressors MUST BE determinants. All EXCLUDED regressors MUST NOT BE determinants. Both inclusions and exclusions MUST be correctly specified for the model to be valid. ALL THREE REGRESSIONS CANNOT BE VALID: if IPR is a determinant then the other two are wrong, and similarly for the other two. So AT MOST one of the three can be correct. BUT is one of the three correct? There are more than 60 variables AVAILABLE in the WDI data sets, so even if there is ONLY one determinant, we have 60 possible regressions. HOW CAN WE FIND OUT WHICH ONE IS CORRECT?
  • 4. R-squared, AIC, BIC, Schwarz etc. Use Model Selection Criteria: this method does not work very well. KEY PROBLEM: HOW TO ENSURE THAT THERE ARE NO MISSING SIGNIFICANT VARIABLES. These cause EXTREME BIAS and WRONG AND MISLEADING RESULTS. INCLUDING EXTRA VARIABLES CAUSES LOSS OF EFFICIENCY, BUT DOES NOT CAUSE INFERENCE FAILURE, BECAUSE a 0 coefficient is a possible value. The BIG MODEL MUST ENCOMPASS the TRUE MODEL if it is to produce VALID INFERENCE.
  • 5. How Much Trouble Can a Missing Variable Cause? CONS = Final consumption expenditure, etc. (constant 2010 US$), Pakistan. GDP = GDP (constant 2010 US$), Pakistan.
       CONS(Pak) = 4.12 + 0.883 * GDP(Pak) + error (R2 = 0.998)
       Standard errors, in order: (0.51) (0.006) (2.56), the last for the error term.
       Standard Keynesian Consumption Function. Likely to have autocorrelation and other MIS-SPECIFIED SHORT-TERM DYNAMICS, plus other small missing variables. Super-Consistency Holds: because of the strong trends in CONS and GDP, these misspecifications will not matter. Bias due to omitted stationary variables will VANISH. But an omitted MAJOR variable can make a big difference.
  • 6. What happens if we regress CONS(Pak) on randomly chosen WDI variables? SUR = Survival to age 65, female (% of cohort). CO2 = CO2 emissions from gaseous fuel consumption (% of total). Obviously, these variables have no relation to consumption. Nonetheless, the OLS regression yields the following results:
       (E) CONS = -268.7 + 6.78 SUR - 1.82 CO2 + error (R2 = 0.84)
       Standard errors, in order: (25.9) (0.73) (0.65) (20.0), the last for the error term.
       Both SUR and CO2 are HIGHLY SIGNIFICANT DETERMINANTS OF CONSUMPTION! This is called a NONSENSE REGRESSION. IRRELEVANT VARIABLES BECOME SIGNIFICANT AS PROXIES FOR MISSING VARIABLES.
  • 7. ADD the Relevant Regressor GDP:
       (F) CONS = 15.64 + 0.902 GDP - 2.60 SUR + 67.8 CO2 + error (R2 = 0.99)
       Standard errors, in order: (0.54) (0.014) (1.40) (80.3) (2.28), the last for the error term.
       Both SUR and CO2 become insignificant. Nonsense Regressions are caused by MISSING DETERMINANTS. WRONG DIAGNOSIS MADE BY ECONOMETRICIANS: Nonsense Regressions are caused by NON-STATIONARITY. The immense literature on Integration and Co-Integration built on that diagnosis is COMPLETELY USELESS. Discovery due to Atiqur Rahman, who is supervising a thesis on this theme.
  • 8. LESSON: Omitting a Significant Regressor Leads to NONSENSE REGRESSIONS. EXAMPLE:
       (G) CONS(Pakistan) = -13.44 + 11.07 GDP(Honduras) + error (R2 = 0.99)
       Standard errors, in order: (1.15) (0.12) (4.34), the last for the error term.
       Consumption of Pakistan as a function of the GDP of Honduras! Standard diagnosis: this is because CONS and GDP are not stationary. WRONG: CONS and GDP do NOT have to be integrated variables. All we need is an IMPORTANT DETERMINANT MISSING from the regression.
  • 9. ORIGINAL QUESTION: Are models (1), (2), (3) correct? HOW do we know if important variables are missing or not? Maybe IPR is significant because it proxies for FDI, or for Exports, or for ANY of the other 61 variables available in the WDI data set. Edward Leamer: Fragility of Inference. First we study Leamer’s solution: Extreme Bounds Analysis.
  • 10. Edward Leamer: Specification Search. The truth about (regression) models: Models are NOT used to DISCOVER truth. We start out KNOWING what we want to show. We manipulate the data into proving this. Example: hundreds of papers proving free trade is beneficial. See Rodrik: A Skeptic’s Guide. A similar fact holds for models of economic theory.
  • 11. Leamer: SPECIFICATION SEARCHES. By varying the set of variables, we can get ANY RESULT we like. Look at W as a determinant of Y and choose controls X1, X2, …, Xn. By choosing the right set of variables, we can make the coefficient of W positive or negative, significant or insignificant. The PROCESS of REGRESSION is a specification search, where the econometrician looks for the right collection of regressors to prove his favorite hypothesis.
  • 12. How to test if W is a significant determinant of Y? Choose FIXED RELEVANT VARIABLES X1, X2, …, Xn (guaranteed to be important from THEORETICAL considerations, known a priori). Focus variable: W. Potential determinants: V1, V2, …, Vm. Regress Y on X1, …, Xn, W, and some combination of the Vi. If W is significant regardless of which combination of Vi is put in, then W is significant. Look at the range of possible values of the estimated coefficient of W. This is called EXTREME BOUNDS ANALYSIS. Typical conclusion: NO VARIABLE IS SIGNIFICANT. ALL INFERENCE IS FRAGILE.
  • 13. Sala-i-Martin: I ran two million regressions. A VARIANT of Leamer’s EBA. Start with 62 variables in the WDI data set. Set three as essential determinants: GDP60, LE60, PSE60 (initial GDP, Life Expectancy and Primary School Enrolment, following Barro). That leaves X1, …, X59. Choose any one of them as W, and choose ANY THREE OTHERS to run:
       Growth = a + b1 GDP60 + b2 LE60 + b3 PSE60 + d W + c1 Xi + c2 Xj + c3 Xk
       VARY (i,j,k) over all possible sets of three regressors: 58 x 57 x 56 = 185,136 ordered triples for each W, which are 30,856 distinct sets, since each set is counted 3! = 6 times. If W is significant in 95% of these regressions, then count W as significant. I count 10 million regressions here (59 x 185,136 is about 10.9 million ordered triples; counting distinct sets gives 59 x 30,856, about 1.8 million, which matches the two million of the title). RESULT: 22 variables out of the 59 are significant. Conclusion: EBA is too extreme.
  • 14. What is wrong with Sala-i-Martin? The analysis is self-contradictory. If 22 variables are significant, then ALL regressions with fewer than 22 regressors have SIGNIFICANT OMITTED VARIABLES. It follows that all of his two million regressions are nonsense regressions. CAN we get sensible results by running two million nonsense regressions? Answer: NO. This can be established by a simulation study, done later by Hoover and Perez. The Sala-i-Martin strategy has high Type I and Type II error probabilities: it can include irrelevant regressors and exclude significant ones. It tends to include TOO MANY variables as relevant when they are NOT.
  • 15. Pure Bayesian Approach: Fernandez, Ley, Steel (2001). Take the 41 regressors from the Sala-i-Martin data set on which complete data is available. Consider all possible 2^41 models, about two trillion. Assign priors to them: each regressor has prior probability 50% of being included in the model. Compute posterior probabilities. Regressors with HIGH posterior probabilities have high probabilities of being determinants. Strongest determinants are: Confucian%, GDP60, EquipInv, LE60. There are many other determinants. Even good models have only about 0.1% probability.
  • 16. Model Averaging VERSUS Selection. Selection focuses on finding the TRUE model and the CORRECT regressors. BMA (Bayesian Model Averaging) aims to USE all models, assign them weights, and come up with a combined forecast. DEBATE AND CONTROVERSY: Can we average over wrong models and get the right result? RESOLUTION: there are DIFFERENT GOALS, and each procedure is well suited to its OWN goal. SELECTION involves putting all eggs in one basket: HIGHER RISK. FORECASTING involves avoiding selection and getting insurance against bad choices.
  • 17. Hoover-Perez Simulations. BMA fails to find the right regressors, BUT does well at forecasting. So when it comes to CHOOSING the right set of regressors, the right strategy comes from ENCOMPASSING, using the Hendry Methodology.
  • 18. Hendry Methodology. Conventional methodology leads to conflicting, contradictory theories and models. T1: IPR, T2: FDI, T3: Exports, and many others. ALL of these theories describe determinants of growth, and they are in conflict with each other. Papers exist which prove ELG (export-led growth), GLE (growth-led exports), BOTH, and NEITHER. Everybody runs a new regression and puts down a new brick in a different place. There is NO CUMULATION OF KNOWLEDGE.
  • 19. SOLUTION: ENCOMPASSING. Given T1, T2, T3, etc., a new researcher is NOT ALLOWED to simply put down another theory T(J) next to them. New research MUST BUILD ON EXISTING RESEARCH. FIRST of ALL, do a LIT REVIEW: that is, COVER and BE AWARE of ALL PRIOR EXISTING LITERATURE ON YOUR TOPIC. NEXT, explain the gap: what are the DEFECTS in existing theories? NEXT, FILL the gap: explain how and why T(J) is SUPERIOR to all existing theories. At the END there should be ONLY ONE BEST THEORY. Encompassing shows that our new theory COVERS all previous theories and IMPROVES UPON them. The NEXT researcher has to BEAT T(J) to produce T(J+1).
  • 20. How to do this for Choice of Regressors. GUM: General Unrestricted Model. ADD ALL RELEVANT variables. In our example, form a model with Exports, IPR, FDI, and include ALL regressors used by ALL the researchers. The GUM NESTS T1, T2, T3 as special cases:
       GUM: GDP = a + b1 IPR + b2 FDI + b3 Exports + c1 X1 + … + ck Xk
       T1 says that b2 = 0 and b3 = 0; T2 says that b1 = 0 and b3 = 0; T3 says that b1 = 0 and b2 = 0.
       We can test these hypotheses using the F-test for joint significance of multiple regressors.
  • 21. Conventional Methodology: Simple-To-General. Start with a simple model, C = a + b GNP + error. If there is a FLAW, THEN look for additional regressors; make the model more complicated only if necessary. What are FLAWS? Failures of standard assumptions. HETEROSKEDASTICITY: can usually be fixed by taking LOGS. AUTOCORRELATION: can be fixed by adding DYNAMICS to the static equation.
  • 22. GUM Strategy for Autocorrelation. Suppose C = a + b Y + e has autocorrelated errors. Then e(t) = C(t) - a - b Y(t), and ALSO e(t-1) = C(t-1) - a - b Y(t-1). The AUTOCORRELATED MODEL is e(t) = u(t) + r e(t-1), so:
       C(t) = a + b Y(t) + u(t) + r e(t-1)
            = a + b Y(t) + r C(t-1) - ra - rb Y(t-1) + u(t)
            = (a - ra) + b Y(t) + r C(t-1) - rb Y(t-1) + u(t)
       Consider the GENERAL ARDL model, which is the General UNRESTRICTED Model:
       C = a* + b Y + c C(-1) + d Y(-1) + e
       The AR-1 model is the special case with c = r, d = -bc, a* = a(1 - r).
  • 23. Flaws of the Simple-to-General Strategy. If the regression equation (Y = a + bX) does not forecast well, the strategy adds a relevant variable W. But W may appear significant only because it is a proxy for some other missing variable. This will DECEIVE the econometrician. If we impose the AR-1 restriction, we SET d = -bc. GeTS says ALLOW UNRESTRICTED d, and THEN TEST the RESTRICTION.
  • 24. GeTS: General-To-Simple Modeling. Build the largest possible model: INCLUDE ALL POTENTIALLY RELEVANT REGRESSORS. Now no regressor can be significant because of OMITTED VARIABLES, because you have included them ALL (assuming we have data on ALL relevant variables). In the Sala-i-Martin data, run the regression on ALL 61 variables. THEN DROP the insignificant variables.
  • 25. Multiple Objections to GeTS. With lots of regressors, we have MULTICOLLINEARITY problems. Many important regressors will fail to be significant. NOISE can exceed SIGNAL. Bad regressors can drive out good ones. MANY PROBLEMS HAVE BEEN RESOLVED. MUCH PROGRESS HAS BEEN MADE. EXISTING ALGORITHMS GIVE fairly GOOD probabilities of FINDING A MODEL WHICH ENCOMPASSES the true model: around an 80% chance of picking up all relevant regressors, plus one or two extras (depending on the configuration of model and regressors).
  • 26. GeTS is NOT a mechanical procedure. It MUST be guided by KNOWLEDGE. IDEAL CASE for GeTS: ALL regressors are ORTHOGONAL, that is, INDEPENDENT. Then each regressor can be treated separately, since they do not INTERFERE with each other. Arrange all the t-stats for significance in decreasing ORDER. DROP all regressors with t-stats LESS than some critical value. There is no model selection problem! That is, significance will NOT be affected by MODEL selection in this situation.
  • 27. Much more difficult with CORRELATED regressors. Y = a + b X + c W1 + d W2. IDEAL situation: the Good Regressor, when PRESENT, makes the Bad Regressors INSIGNIFICANT. In the first regression, of Pak CONS on female survival and CO2 emissions, putting in Pak GDP makes the other two variables INSIGNIFICANT. THEORETICALLY, this will ALWAYS happen ASYMPTOTICALLY: as we get larger and larger amounts of DATA (and the MODEL does not CHANGE), Good Variables WILL DRIVE out Bad Variables. PRACTICALLY, THIS IS NOT GUARANTEED. We are often working with small data sets. EVEN with BIG DATA, if the model changes from time to time, then all data sets are small.
  • 28. COMPLICATIONS. Pak Cons = a + b Pak GDP + c Honduras GDP + error. Honduras GDP REMAINS highly significant in this regression. What does this mean? Does Honduras GDP matter for Pakistani consumption? No: it is acting as a proxy for some OTHER missing variable. This KNOWLEDGE comes from our knowledge of the real world. THAT is why model selection cannot be mechanical.
  • 29. Software Package PC-GeTS: Automatic Model Selection. Automatic GeTS is implemented in the PC-GeTS package, which is available and USEFUL. It reaches correct models with high probability when sufficient data is available to discriminate between them. Regressors with HIGH variation are quickly spotted; those with low variation MAY BE MISSED. The higher the correlation among regressors, the greater the possibility of ERROR: the wrong variable can be chosen instead of the right one.
  • 30. Human Guided Model Selection. MODELLING CORRECTION: Start by ensuring a GOOD GUM, that is, one which fulfills the assumptions of the regression model. Choose the right functional forms (log or other). LINEARIZE relationships, and run a lot of different types of FIXES to put the initial model into GOOD shape BEFORE starting selection. TRY TO ORTHOGONALIZE REGRESSORS: C = a GNP(t) + b GNP(t-1) can be changed to C = a* GNP(t) + b* ∆GNP(t), where ∆GNP(t) = GNP(t) - GNP(t-1), a* = a + b and b* = -b.
  • 31. Multiple Searches. Y = a0 + a1 X1 + … + a60 X60. CAN test EVEN when regressors exceed observations! MAIN IDEA: drop ALL INSIGNIFICANT REGRESSORS. This works if regressors are independent, but not if they are correlated. In that case, a Good Regressor may be insignificant, and a Bad Regressor may be significant. What to DO? CHOOSE USING THEORY OVER EMPIRICS: retain theoretically important variables. Take the TEN least significant variables. DROP them ONE at a TIME. This creates 10 DIFFERENT searches. Each variable is retained in at least one of the ten searches.
  • 32. Compare TERMINAL models using BIC. Continue EACH of the ten searches by eliminating the LEAST significant regressors. This MAY BE GUIDED BY THEORY AT EACH STAGE. Choose among the collection of FINAL models. This selection need not be mechanical. One can also use PRINCIPAL COMPONENTS to extract a small number of highly variable regressors from a large set, but problems arise in INTERPRETATION.
  • 33. FINAL REMARKS. TWO STEPS: Building a Good Regression Model (not being deceived by ACCIDENTAL CORRELATIONS and SPURIOUS and NONSENSE REGRESSIONS). Picking out GENUINE, STABLE CORRELATIONS DOES NOT JUSTIFY CAUSAL INFERENCE. GUM IDENTIFIES % CONFUCIAN as a KEY DETERMINANT of growth. WHY? Because of CHINA. This is NOT A CAUSAL RELATIONSHIP.
  • 34. To view a 70-minute video talk based on these slides, a brief summary, and a link to the full paper, see: http://bit.do/azreg
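
Slides 6 to 8 claim that an irrelevant variable turns significant when it proxies for a missing determinant, and that no trend or non-stationarity is needed for this. A minimal simulation sketch of that claim follows. The data are synthetic, and the names z, w, y, the coefficients and the sample size are illustrative assumptions, not the WDI series from the slides.

```python
import numpy as np

def ols(y, X):
    """OLS with an added intercept; returns coefficients and t-statistics (intercept first)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)                             # true determinant: stationary, i.i.d.
w = 0.8 * z + rng.normal(scale=0.6, size=n)        # irrelevant variable, correlated with z
y = 1.0 + 2.0 * z + rng.normal(scale=0.5, size=n)  # w plays no role in y

_, t_short = ols(y, w)                             # z omitted: w acts as its proxy
_, t_full = ols(y, np.column_stack([z, w]))        # z included

print("t-stat of w, z omitted :", round(float(t_short[1]), 2))
print("t-stat of w, z included:", round(float(t_full[2]), 2))
```

All three series are i.i.d. draws, so any significance of w in the short regression comes from the omitted z alone, which is the mechanism the slides attribute to regressions (E) and (G).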
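
Slide 20 tests each theory as a pair of zero restrictions on the GUM with a joint F-test. The sketch below shows that computation on synthetic data in which, by assumption, only FDI matters; the data-generating process, the single control x1 and the sample size are hypothetical. Each F statistic would be compared with the F(2, n - k) critical value, roughly 3.1 at the 5% level for these dimensions.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from OLS of y on X (X already contains the constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(1)
n = 120
common = rng.normal(size=n)                  # shared factor makes the regressors correlated
ipr = common + rng.normal(size=n)
fdi = common + rng.normal(size=n)
exports = common + rng.normal(size=n)
x1 = rng.normal(size=n)                      # one control used by all researchers
gdp = 1.0 + 1.5 * fdi + 0.5 * x1 + rng.normal(size=n)  # assumed truth: only FDI matters

const = np.ones(n)
focus = {"IPR": ipr, "FDI": fdi, "Exports": exports}
gum = np.column_stack([const, ipr, fdi, exports, x1])  # nests T1, T2, T3
rss_u, k = rss(gdp, gum), gum.shape[1]

theories = {"T1": "IPR", "T2": "FDI", "T3": "Exports"}  # variable each theory keeps
q = 2                                                   # each theory sets two b's to zero
F = {}
for name, kept in theories.items():
    restricted = np.column_stack([const, focus[kept], x1])
    F[name] = ((rss(gdp, restricted) - rss_u) / q) / (rss_u / (n - k))
    print(name, "keeps", kept, "F =", round(F[name], 2))
```

A large F rejects the theory's exclusions; a small F means the GUM gives no evidence against dropping the other two variables.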
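
Slide 22 shows that the static model with AR-1 errors is the unrestricted ARDL model under the restriction d = -bc. As a rough numerical check, the sketch below simulates the AR-1 model and fits the ARDL equation without imposing anything. The parameter values and sample size are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, a, b, r = 2000, 1.0, 0.9, 0.6     # assumed values for the static model and AR-1 parameter

Y = rng.normal(size=n)
u = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = r * e[t - 1] + u[t]       # e(t) = u(t) + r e(t-1)
C = a + b * Y + e                    # static model with autocorrelated errors

# Unrestricted ARDL (the GUM): C(t) = a* + b Y(t) + c C(t-1) + d Y(t-1) + error
X = np.column_stack([np.ones(n - 1), Y[1:], C[:-1], Y[:-1]])
coef, *_ = np.linalg.lstsq(X, C[1:], rcond=None)
a_star, b_hat, c_hat, d_hat = coef

print("c_hat:", round(float(c_hat), 3), "vs r =", r)
print("d_hat:", round(float(d_hat), 3), "vs -b_hat*c_hat =", round(float(-b_hat * c_hat), 3))
print("a_star:", round(float(a_star), 3), "vs a(1 - r) =", a * (1 - r))
```

Because the data really are AR-1, the unrestricted estimates should land close to the restriction. On real data the GeTS advice from slide 23 applies: estimate d freely first, then test d = -bc.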