Climate Change Models
to Estimate and Forecast Temperature
Gaetan Lion
September 2021
Content
1. Introduction
2. Data
3. Baseline trend models
4. CO2 models
5. Out-of-sample forecasts
6. Replicating IPCC scenarios
7. Granger Causality, VAR, IRFs
8. VAR Forecast
1. Introduction
This presentation describes models of global temperature* as associated with, or caused
by, a rising concentration of CO2 measured in parts per million (ppm). Other variables will
also be explored and tested for inclusion in these Climate Change models.
The objectives are:
a) To assess the information that CO2 concentration imparts to a model
estimating and predicting temperature;
b) To test the accuracy of such models in fitting the historical temperature data and in
forecasting temperature within an out-of-sample testing framework;
c) To replicate the most recent IPCC scenarios;
d) To better understand the relationship between CO2 concentration and
temperature, and to attempt to demonstrate causality of CO2 -> temperature.
* Measured as temperature anomaly over the 1850 – 1900 average global temperature.
2. Data
[Figure: NOAA vs NASA Temperature anomaly in degree Celsius, 1880 to 2020. Series: NOAA, NASA.]
Annual temperature history from 1880 to 2020. The two overlapping series are from NASA and NOAA.
We will use the average of the two series.
Temperature records are captured as “temperature anomaly.” The latter represents the
difference in temperature between a specific year and the average over the 1850 – 1900
period, when the level of industrialization and the CO2 concentration were much lower.
Temperature anomaly is measured in degrees Celsius (equivalently, Centigrade).
Annual CO2 concentration in parts per million from 1880 to 2020.
Data from 1880 to 1958 is derived from ice core analysis, constructed through a
cooperative effort between three scientific teams from Australia and France.
Data from 1958 to 2020 is from NOAA.
We understand that comparing two level variables, without detrending them, can lead to
spurious correlations and regressions.
However, when two level variables are cointegrated, this caveat no longer applies. We will
disclose cointegration testing for these two variables later.
As observed, this scatter plot shows a pretty strong correlation between the two variables.
The relationship between CO2 and temperature can be split over two periods. The first one (1880 – 1970), with CO2
concentration ranging from 290 to 325 ppm, is associated with a relatively weak linear relationship between the two
variables. The second one (1971 – 2020), with CO2 concentration ranging from 325 to close to 420 ppm, is associated with a very
strong linear relationship. For the purpose of our modeling, we will not split the data, as the related regression parameters
are pretty stable (intercept and slope of the regression equations shown on the scatter plots).
Checking the Autocorrelation of the Residuals of the Ordinary Least Squares (OLS)
Cointegration Regression: Temperature ~ CO2
Given that we are using level variables, the residual autocorrelation levels, as
captured by the ACF and PACF graphs, are reasonably low. At the outset, this
suggests that these two variables (CO2 and temperature) may indeed be
cointegrated.
The PACF graph at the bottom is the one used to select the number of yearly lags
for the unit root testing that confirms that these residuals
are indeed stationary (do not have a unit root).
Even though within the PACF graph only lag 1 crosses the line of statistical
significance ( > 0.2), we will use up to lag 4 to be more conservative.
Testing the residuals of the OLS Cointegration Regression
Temperature ~ CO2 for stationarity
Test              p-value   Interpretation confirming residuals are stationary
ADF test          0.01      Reject the null hypothesis that residuals are nonstationary
Phillips Perron   0.01      Reject the null hypothesis that residuals are nonstationary
KPSS              > 0.1     Fail to reject the null hypothesis that residuals are stationary
We used 4 lags for each of the above unit root tests. In each case, the respective unit root test confirmed that the
Cointegration Regression residuals were stationary. This confirmation allows us to proceed in modeling the
relationship between CO2 and temperature using level variables, knowing that these two variables are
cointegrated.
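The two-step logic above (fit the levels regression, then test its residuals for a unit root) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the slides: the data are synthetic stand-ins, the helper names are hypothetical, and the unit root regression uses zero lags rather than the 4 lags used above. The resulting t-statistic has to be compared with cointegration-specific critical values (roughly -3.3 at the 5% level for two variables), not with standard t tables.

```python
import math
import random

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def dickey_fuller_t(resid):
    """t-statistic of rho in: delta_e[t] = rho * e[t-1] + error (no lags, no constant)."""
    lagged = resid[:-1]
    delta = [b - a for a, b in zip(resid[:-1], resid[1:])]
    sxx = sum(e * e for e in lagged)
    rho = sum(e * d for e, d in zip(lagged, delta)) / sxx
    errors = [d - rho * e for e, d in zip(lagged, delta)]
    sigma2 = sum(u * u for u in errors) / (len(delta) - 1)
    return rho / math.sqrt(sigma2 / sxx)

# Synthetic stand-ins for the 141 yearly observations (NOT the real data).
random.seed(0)
co2 = [290.0]
for _ in range(140):
    co2.append(co2[-1] + abs(random.gauss(0.9, 0.5)))  # hypothetical rising series
temperature = [-3.2 + 0.010 * c + random.gauss(0, 0.1) for c in co2]

# Step 1: levels regression Temperature ~ CO2.
intercept, slope = ols_simple(co2, temperature)
residuals = [t - (intercept + slope * c) for c, t in zip(co2, temperature)]

# Step 2: unit root test on the residuals; a strongly negative value points to stationarity.
t_stat = dickey_fuller_t(residuals)
print(f"slope={slope:.4f}  DF t-stat on residuals={t_stat:.2f}")
```

In practice a statistics package would also handle lag selection and p-values; the sketch only shows where the test statistic comes from.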
Further residual model testing often includes testing for autocorrelation, heteroskedasticity, and normal distribution.
However, any related residual issues do not bias the regression coefficients. They may affect the reliability
of the regression coefficients’ confidence intervals and their statistical significance. However, if such regression
coefficients are associated with t-stats > 2.5 or 3.0, statistical significance is typically not an issue (even after
adjusting with Robust Standard Errors). Additionally, in some cases, as we’ll see, we are not explicitly concerned with
levels of statistical significance, as long as the variable makes good sense in terms of explaining how the climate
system works, and the variable’s regression coefficient has the appropriate sign.
3. Baseline trend models
Within this section we will develop models that do not use CO2 as an exogenous variable but only trend
variables (counting 1, 2, 3, 4, …). The aim is to test whether the mere passing of time, rather than
CO2 as a causal factor, is driving the trend.
This is a pretty good test of whether your original level-based model is truly valid and not another example of a
spurious regression using level variables.
[Figure: Temperature Anomaly. Historical Fit of a simple Trend Model, 1880 to 2020. Series: Actual, Trend.]
This model uses a single Trend variable (counting 1, 2, 3, etc.) to estimate the temperature over
time. It does not use any exogenous information.
As shown, this Trend model is pretty terrible. Notice how it greatly underestimates temperatures at
the onset from 1880 to 1900 and at the end from 2005 to 2020. In between, from 1901 to 2004, it
typically overestimates temperatures.
The Trend Model residuals are pretty awful looking
[Figures: Trend Model residuals plotted against CO2 concentration (ppm), and against the Model Estimate.]
A good model should have a residual curve (red dashed line) that is flat, straight, and sits at the 0.00 level. This would
indicate residuals that are stationary and mean reverting around the 0.00 level. These residuals are far away from
meeting that standard. They are clearly nonstationary.
[Figure: Temperature Anomaly. Hist. Fit of a polynomial Trend Model, 1880 to 2020. Series: Actual, Trend 2.]
If we simply add a second Trend variable that represents the square of the Trend
variable, we actually get a surprisingly good historical fit of the data in the absence
of any information from CO2.
While the Trend variable starts with 1, 2, 3, etc., the Trend square variable starts
with 1, 4, 9, etc.
The combination of those two variables makes for a very good polynomial
regression equation that fits the J shape curve of the data very well.
We call this model Trend 2.
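A minimal sketch of how such a Trend 2 regression could be set up, assuming a plain least-squares fit on an intercept, the Trend, and the Trend squared. The helper functions are hypothetical, and the series being fitted is a noise-free made-up curve used only to check that the fit recovers its coefficients; it is not the temperature data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients for y ~ rows via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

trend = list(range(1, 142))                    # 1, 2, 3, ... one value per year, 1880 to 2020
design = [[1.0, t, t * t] for t in trend]      # intercept, Trend, Trend squared

# Hypothetical noise-free J-shaped series; only its shape matters here.
true_coefs = (-0.2, -0.004, 0.000085)
y = [true_coefs[0] + true_coefs[1] * t + true_coefs[2] * t * t for t in trend]

coefs = fit_ols(design, y)
print([round(c, 6) for c in coefs])
```

Because the squared trend grows to 141 x 141 = 19,881, its coefficient is necessarily tiny, which is also why a numerically careful solver (or rescaled variables) matters in a real fit.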
The Trend 2 Model residuals are far better looking
The residual trend line (red) relative to Model Estimates on the x-axis within the right-hand graph is now perfectly flat,
straight, and on the 0.00 line as it should be. Even that same residual trend line when using CO2 concentration on the x-axis
is actually reasonably flat. It looks like this model appears to capture a good deal of the information imparted by the CO2
variable. We now have a pretty competitive Baseline Trend model to assess the validity of our upcoming CO2 models.
Description of the Trend 2 model
The square of the trend (trend2) reaches very large values (up to 141 x 141 = 19,881), so the resulting
regression coefficient is very small: 0.000085.
All the Goodness-of-fit measures are very high. And, the resulting model errors are
pretty low. This is kind of amazing given that we have just used trend variables to fit the
temperature history starting back in 1880.
4. CO2 Models
We will introduce two CO2 based models to estimate and forecast temperature.
The first one will be our simple linear OLS Cointegration Regression using just CO2 as our standalone exogenous
variable.
The second one will be a more complete model that will also include the influence on temperature from the
Pacific Decadal Oscillation, with warm years due to El Nino and cold years due to La Nina. This model will also
include another intervention variable covering the years from 1940 to 1970, before sulfate aerosols were heavily
regulated. Sulfates have a lowering effect on temperature that partly counters the rising effect of CO2.
CO2 model description
Notice the extremely high t-stat of
the CO2 coefficient, leaving no
doubt as to the statistical
significance of this variable.
The CO2 model has very good looking residuals (flat red lines)
The more complete CO2 based model
The El Nino variable has a p-value of 0.155, which is not statistically significant at the Alpha < 0.10 level.
However, within a sports betting market, this same p-value would correspond to one team
being favored with odds close to 6-to-1 of winning. That would be a pretty good bet.
In view of the above, we are comfortable including the El Nino variable in our model.
It also makes sense to include both the years that have a positive impact on temperature
(El Nino) and the ones that have a negative impact (La Nina).
The complete Model residuals are still reasonably good looking (fairly flat red curves)
Model Competition regarding the fit of Temperature history
                     CO2 model   Model   Trend 2
Adjusted R Square    0.891       0.917   0.888
Predicted R Square   0.890       0.913   0.885
RMSE                 0.117       0.102   0.119
MAE                  0.095       0.082   0.095
Whether looking at measures of variance explanation (Adjusted R Square), one-observation prediction (Predicted R
Square) or model errors (RMSE and Mean Absolute Error), the three models are very close.
The CO2 model and the Trend 2 model are just about dead even on all counts. The Model that includes the other
variables such as El Nino and La Nina is fractionally more accurate.
If we stopped our analysis now, one could prematurely conclude that the trend (including the trend square variable)
just about explains everything regarding the progressive increase in temperature from 1880 to 2020. And, that the
two CO2 based models really do not add much information, if any, above just capturing this trend. This could lead
one to assess our CO2 based models as “spurious.” Additional analysis will show otherwise: including a CO2
variable far improves the prediction accuracy of such a model. Fitting the historical data is one thing.
Making reasonably accurate predictions is far more challenging and useful.
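For readers who want to reproduce this kind of comparison, here is a minimal sketch of three of the measures in the table (Adjusted R Square, RMSE, MAE). The function name and the tiny data set are made up for illustration. It assumes RMSE divides by the number of observations; some packages divide by the residual degrees of freedom instead. Predicted R Square, which needs leave-one-out refits, is not covered.

```python
import math

def fit_metrics(actual, fitted, n_predictors):
    """Return adjusted R-squared, RMSE and MAE for a fitted model."""
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [a - f for a, f in zip(actual, fitted)]
    sse = sum(e * e for e in errors)                      # sum of squared errors
    sst = sum((a - mean_actual) ** 2 for a in actual)     # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n
    return {"adj_r2": adj_r2, "rmse": rmse, "mae": mae}

# Tiny made-up example (not the temperature data).
actual = [0.1, 0.3, 0.2, 0.5, 0.7, 0.6]
fitted = [0.15, 0.25, 0.3, 0.45, 0.6, 0.7]
metrics = fit_metrics(actual, fitted, n_predictors=1)
print({k: round(v, 3) for k, v in metrics.items()})
```

RMSE penalizes large misses more than MAE does, so looking at both gives a sense of whether a model's errors are evenly spread or driven by a few years.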
[Figure: Temperature Anomaly. Historical Fit, 1880 to 2020. Series: Actual, CO2 model, Model, Trend 2.]
When you visually compare the historical fit of the three models, they all fit the
underlying J curve long term trend of temperature increase over the 1880 to
2020 period.
The more complete Model that includes the El Nino and La Nina variables does match
some of the volatility or oscillations in the annual temperature data
much better than the other two models. Otherwise, again, there is really not much
difference between the three models.
[Figure: Temperature Anomaly. Historical Fit since 1990. Series: Actual, CO2 model, Model, Trend 2.]
Focusing on the more recent period from 1990 until 2020, we
can observe a similar pretty good fit between the three models.
The more complete Model has a slightly better fit by better
capturing the temperature oscillations associated with El
Nino/La Nina.
However, notice how the Trend 2 model starts to underestimate the
temperature level starting in 2014. This may be the first
indication that CO2 does impart some valuable information to this
temperature model.
5. Out-of-sample forecasts
Fitting historical data is one thing. And, one way or another, it is often relatively easy, even in the case of fitting
historical temperature levels from 1880 to 2020, as we have seen. Predicting observations using out-of-sample
forecasts, also called Hold Out testing, is far more difficult and is a far more relevant test of a model’s predictive
accuracy.
With such models, you often run into a situation where a model fits the historical data really well, but predicts
really poorly (in Hold Out testing). This is a classic situation of model overfitting. It happens all the time.
Within this section we will test whether our models are overfit, or if instead they do provide predictive
information.
Cross Validation test
Mean Absolute Error
            History   Cross-val.   C.V./History
CO2 model   0.095     0.099        1.05
Model       0.082     0.089        1.09
Trend 2     0.095     0.104        1.10
Cross validation is a rigorous form of out-of-sample forecast testing. In our case, we removed 14 observations from the
data to create a 14-year prediction window. And, we did this exercise 10 times to cover the 141 yearly observations
within our complete data set.
So, the first prediction window went from 1880 to 1893. We used a model with history from 1894 to 2020 to attempt
to predict the 1880 – 1893 years.
The second prediction window was from 1894 to 1907. We used a model with history in all other years outside the
prediction window. And, we continued this process until using the most recent 14 years as our prediction window.
The table compares the Mean Absolute Error (MAE) of each of our three models. First, it shows the MAE when we
used the entire data set to fit the history. Next, it discloses the average MAE of the 10 cross validation
prediction windows. Finally, it shows the ratio, or multiple, of the cross validation MAE divided by the MAE during
history. The cross validation MAE is expected to be higher than the MAE during history. If that multiple is greater than
1.5, you may be dealing with a model that is overfit. As shown above, all three of our models perform well on this count
with very little deterioration during the cross validation test. Again the complete Model is a bit better than the other
two. And, at the margin our CO2 model did a bit better than the Trend 2 model during cross validation.
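The windowed procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind the table: the data are synthetic stand-ins, the function names are hypothetical, and since 10 windows of 14 years cover only 140 of the 141 observations, the sketch lets the last window absorb the extra year (the slides do not say how this was handled).

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def blocked_cv_mae(x, y, window=14, n_windows=10):
    """Average MAE over consecutive prediction windows, each fitted on all other years."""
    n = len(x)
    maes = []
    for w in range(n_windows):
        start = w * window
        stop = n if w == n_windows - 1 else start + window   # last window takes the remainder
        train_x, train_y = x[:start] + x[stop:], y[:start] + y[stop:]
        a, b = fit_line(train_x, train_y)
        maes.append(mean_abs_error(y[start:stop], [a + b * v for v in x[start:stop]]))
    return sum(maes) / len(maes)

# Synthetic stand-in data (NOT the real series): 141 yearly points with a deterministic wiggle.
co2 = [290.0 + 0.9 * i for i in range(141)]
temperature = [-3.2 + 0.010 * c + 0.1 * ((i * 37) % 11 - 5) / 5 for i, c in enumerate(co2)]

a, b = fit_line(co2, temperature)
history_mae = mean_abs_error(temperature, [a + b * c for c in co2])
cv_mae = blocked_cv_mae(co2, temperature)
ratio = cv_mae / history_mae
print(f"history MAE={history_mae:.3f}  cross-val MAE={cv_mae:.3f}  ratio={ratio:.2f}")
```

Removing whole blocks of consecutive years, rather than random years, keeps the test honest for time series: the model cannot lean on the immediate neighbors of the years it is asked to predict.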
2006 – 2020 Out-of-sample Hold Out Test
Temperature Anomaly estimate 2006 - 2020
                Actual   CO2 model   Model   Trend2
2005            0.68     0.68        0.68    0.68
2006            0.64     0.63        0.64    0.56
2007            0.65     0.65        0.62    0.58
2008            0.54     0.67        0.54    0.59
2009            0.65     0.68        0.69    0.61
2010            0.72     0.71        0.68    0.63
2011            0.60     0.73        0.60    0.64
2012            0.65     0.75        0.72    0.66
2013            0.68     0.78        0.75    0.67
2014            0.75     0.80        0.81    0.69
2015            0.92     0.82        0.69    0.71
2016            1.01     0.85        0.82    0.72
2017            0.92     0.88        0.85    0.74
2018            0.84     0.90        0.87    0.76
2019            0.97     0.93        0.90    0.78
2020            1.00     0.95        0.92    0.79
Temp increase   0.33     0.28        0.25    0.12
MAE                      0.07        0.06    0.11
[Figure: Temperature Anomaly. Hold Out 2006 - 2020. Series: Actual, CO2 model, Model, Trend2.]
When we attempt to forecast the recent period (2006 – 2020) using historical data (1880 – 2005), the Trend 2 model way
underestimates the increase in temperature over the recent period ( + 0.12 vs. + 0.33 for actuals). The two CO2 based
models do a lot better with respective temperature increase ranging from + 0.25 to + 0.28.
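As a quick check, the Hold Out MAE for the CO2 model and the Trend 2 model can be recomputed from the rounded values in the table. This is only a sketch using the published two-decimal figures; begin-to-end increases computed this way come out about 0.01 below the slide's figures, presumably because the slide worked from unrounded values (an assumption on our part).

```python
# Hold Out values for 2006 - 2020, copied from the table (two-decimal rounding as published).
actual = [0.64, 0.65, 0.54, 0.65, 0.72, 0.60, 0.65, 0.68, 0.75, 0.92, 1.01, 0.92, 0.84, 0.97, 1.00]
co2_model = [0.63, 0.65, 0.67, 0.68, 0.71, 0.73, 0.75, 0.78, 0.80, 0.82, 0.85, 0.88, 0.90, 0.93, 0.95]
trend2 = [0.56, 0.58, 0.59, 0.61, 0.63, 0.64, 0.66, 0.67, 0.69, 0.71, 0.72, 0.74, 0.76, 0.78, 0.79]
base_2005 = 0.68  # all series share the same 2005 starting value

def mean_abs_error(observed, predicted):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

mae_co2 = mean_abs_error(actual, co2_model)
mae_trend2 = mean_abs_error(actual, trend2)

# Begin-to-end point increase: 2020 value minus the shared 2005 value.
rise_actual = actual[-1] - base_2005
rise_co2 = co2_model[-1] - base_2005
rise_trend2 = trend2[-1] - base_2005

print(f"MAE  CO2 model={mae_co2:.2f}  Trend2={mae_trend2:.2f}")
print(f"Rise actual={rise_actual:.2f}  CO2 model={rise_co2:.2f}  Trend2={rise_trend2:.2f}")
```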
1990 – 2005 Out-of-sample Hold Out Test
[Figure: Temperature Anomaly. Hold Out 1990 - 2005. Series: Actual, CO2 Model, Model, Trend2.]
This is the exact same pattern as the prior Hold Out Test. On a begin-to-end point basis, the
Trend 2 model greatly underestimates the temperature increase over the 1990 – 2005
period.
Notice how the simpler CO2 model does better than the more complete Model on a
begin-to-end point basis. This was also true in the Hold Out test on the previous slide.
The repeated relative failure of the Trend 2 model is not so surprising. Polynomial
regressions are notoriously good at fitting historical data, but often not so good the
minute you do some out-of-sample testing.
1982 – 2020 Out-of-sample Hold Out Test
[Figure: Temperature Anomaly. Hold Out 1982 - 2020. Series: Actual, CO2 Model, Model, Trend 2.]
This is an unusually long Hold Out test where we removed the 39 most recent
years of the data (1982 – 2020).
Just by knowing the CO2 concentration level, we would have come up with an
excellent begin-to-end point estimation of the overall temperature increase over
this 39 year period (CO2 Model). And, that estimation is far superior to the
estimations from the other two models.
Why is the complete Model a distant second to the simpler CO2 based model?
It is because the Pacific Decadal Oscillation that captures the El Nino (+) and La Nina (-) is not so decadal. It is
very volatile and captured as a 3-month moving average that can often fluctuate between an El Nino (+) and a La
Nina (-) phenomenon within the same year. Therefore, the yearly based capture of those phenomena is highly
inaccurate.
Attempt to improve Hold Out with a Robust Quantile Regression
Linear regressions such as our CO2 models can be affected by outliers in either the Y variable
(temperature), due to variables not included in the model (El Nino/La Nina, influence of other greenhouse
gases, etc.), or the X variable (nonlinear changes or random jumps in the CO2 concentration variable).
To remedy this issue of regression coefficients being influenced or distorted by outliers in the
historical data, we use robust regressions that are more resistant to the influence of such outliers. A common
robust regression method is Quantile Regression, which regresses to the Median instead of the Mean and
therefore much reduces the influence of outliers.
However, as shown, such a Robust Quantile Regression did not improve the Hold Out performance of our
regular CO2 model. To the contrary, the regular CO2 model generated a better set of predictions over this Hold Out
period. This gives us some comfort that this CO2 model is pretty well specified, not overly influenced by outliers within its
historical data, and able to make pretty good predictions over a 39 year period. Prediction success over such a long
period (just assuming we know the accurate value of CO2 concentration) is very rare for such time series models.
[Figure: Temperature Anomaly. Hold Out 1982 - 2020. Series: Actual, CO2 Model, Robust Model.]
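To give a feel for what regressing to the Median does, here is a small sketch that approximates a least-absolute-deviation (median) line by iteratively reweighted least squares. This is a swapped-in, simplified technique chosen for illustration; statistical packages typically solve quantile regression with linear programming. The function names and the toy data with one planted outlier are hypothetical.

```python
def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope for y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def median_regression(x, y, iterations=100, eps=1e-8):
    """Approximate least-absolute-deviation fit by iteratively reweighted least squares."""
    w = [1.0] * len(x)                        # the first pass is plain OLS
    for _ in range(iterations):
        a, b = weighted_line(x, y, w)
        # Points with large residuals get small weights on the next pass.
        w = [1.0 / max(abs(yi - (a + b * xi)), eps) for xi, yi in zip(x, y)]
    return a, b

# Toy data: an exact line y = 1 + 2x with one large outlier planted at the end.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[9] += 50.0

ols_a, ols_b = weighted_line(x, y, [1.0] * len(x))
lad_a, lad_b = median_regression(x, y)
print(f"OLS slope={ols_b:.2f}  median-regression slope={lad_b:.2f}")
```

In the toy data the mean-based slope is pulled far from 2 by the single outlier while the median-based slope is not. When the two fits barely differ on real data, as reported above, that suggests outliers are not driving the regular model.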
6. Replicating IPCC scenarios
IPCC Scenarios
Within its most recent assessment, the IPCC has developed 5 different scenarios. The most benign one is called
SSP1-1.9, whereby CO2 concentration would remain relatively flat between 400 and 450 ppm, and the temperature
anomaly would remain close to + 1.5 degree Celsius. The most severe one is called SSP5-8.5, whereby CO2 concentration
would continue increasing rapidly to 1100 ppm by the end of the century, and the temperature anomaly would reach
about + 4.4 degree Celsius.
Source: IPCC Technical Summary 2021. The large gray letters are part of the following statement: “accepted version
subject to final editing.”
Replicating IPCC Scenarios
              CO2 model   LN(CO2) model
Intercept     -3.2        -19.8
Coefficient   0.010       3.43
Temperature anomaly estimates (deg. Celsius)
CO2 ppm   CO2 model   LN(CO2) model
300       -0.20       -0.21
400       0.80        0.78
500       1.81        1.54
600       2.81        2.17
700       3.82        2.70
800       4.82        3.15
900       5.83        3.56
1000      6.84        3.92
1100      7.84        4.25
1200      8.85        4.55
[Figure: Replicating IPCC Scenarios. Temperature Anomaly in deg. Celsius vs. CO2 Concentration (ppm), 300 to 1200 ppm. Series: CO2 model, LN(CO2) model.]
Attempting to replicate the IPCC scenarios
Our CO2 linear model appears to greatly overshoot IPCC scenarios when using true out-of-sample CO2 concentrations
that are far higher than what the model was trained on (much greater than 420 ppm and going up to 1200 ppm).
However, using a very similar model structure and simply using the LN(CO2) generates a curve that looks like it may
very well replicate the IPCC scenarios. We will look at that in greater detail on the next slide.
Note how the two models are very close when using CO2 concentrations that the linear CO2 model was trained on,
ranging from 300 to 400 ppm.
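The two curves can be approximately reproduced from the intercepts and coefficients shown on the slide. This is a sketch using the rounded coefficients as published, so the output differs from the slide's table by a few hundredths of a degree; the function names are ours.

```python
import math

# Rounded coefficients as shown on the slide; its table was presumably built from unrounded ones.
LINEAR = {"intercept": -3.2, "coef": 0.010}
LOG = {"intercept": -19.8, "coef": 3.43}

def linear_estimate(co2_ppm):
    """Temperature anomaly from the linear CO2 model."""
    return LINEAR["intercept"] + LINEAR["coef"] * co2_ppm

def log_estimate(co2_ppm):
    """Temperature anomaly from the LN(CO2) model (natural logarithm)."""
    return LOG["intercept"] + LOG["coef"] * math.log(co2_ppm)

print("CO2 ppm  linear  LN(CO2)")
for ppm in range(300, 1201, 100):
    print(f"{ppm:7d}  {linear_estimate(ppm):6.2f}  {log_estimate(ppm):7.2f}")
```

The linear model adds about 1 degree for every additional 100 ppm, whereas the logarithmic model adds a fixed amount for every doubling of concentration, which is why the two diverge so sharply beyond the range the models were trained on.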
As shown on the graph, the LN(CO2) model temperature estimates with CO2 concentration
up to 1200 ppm come very close to the ones generated by the IPCC scenarios.
The graph highlights the temperature estimates for the most benign IPCC scenario, SSP1-1.9, and
the most severe one, SSP5-8.5. The model slightly underestimates the
former and is pretty much right on the money for the latter (the most
severe scenario).
Why did we not use LN(CO2) instead of CO2 to estimate and forecast
temperature earlier? It is for a simple reason. When CO2 is < 420 ppm,
historically there is a very strong linear relationship between CO2 and temperature. That linear
relationship is much stronger and better fitting than a
logarithmic relationship between the two variables.
We tested a logarithmic model with LN(CO2). It was pretty good, but it came a distant second to the linear
CO2 model when conducting out-of-sample Hold Out testing.
Over the longer term, going forward, and with true out-of-sample CO2 concentration levels (much above 420
ppm), the scientific community within the IPCC assesses that the CO2 vs. temperature relationship
follows a logarithmic curve. That’s a very good thing. If
the relationship were to continue to be linear, our
survival would become increasingly unlikely.
Description of the CO2 Model vs. the LN(CO2) one
As shown below, regarding the historical fit of the temperature data, both models are very close. The Adjusted R
Squares are nearly even at 0.89. And, the respective model Standard Errors, at 0.117 and 0.118 degree
Celsius, are also very close.
1982 – 2020 Out-of-sample Hold Out Test. CO2 Model vs. LN(CO2) Model
As shown on our long out-of-sample Hold Out test (1982 – 2020), the CO2
model performs much better than the LN(CO2) model. This is especially true if
we look at it from a begin-to-endpoint perspective.
The CO2 model just about meets the endpoint in 2020 when the temperature
anomaly is + 1.00 degree Celsius. Meanwhile, the LN(CO2) model misses it
by almost 0.2 degree Celsius.
[Figure: Temperature Anomaly. Hold Out 1982 - 2020. Series: Actual, CO2 Model, LN(CO2) Model.]
7. Granger Causality, VAR, IRFs
We will use the mentioned statistical methods to attempt to assess the causality of the CO2 concentration on
temperature. Based on the disclosed work so far, we already know there is a very strong association, or
correlation, between the two. But, is this association truly causal? Demonstrating causality in any such models
is most often extremely challenging. Often one can’t demonstrate true causality or even Granger causality (a
less absolute definition of causality that merely entails that one variable is the chronological predecessor of
another without necessarily causing the other).
The steps to evaluate Granger Causality in this particular case
1) Does CO2 Granger cause temperature? Run Granger Causality test: CO2 -> temperature.
2) Test in which direction this causality manifests itself. Run Granger Causality test in the reverse causal direction:
temperature -> CO2. This sounds absurd but there may be ecosystem explanations supporting why this may be so.
The math is agnostic on such matters. Granger Causality just checks if A causes B more than B causes A to confirm
the causality direction.
3) What sign direction does this causality have? Obviously, we want CO2 concentration to cause rising temperatures, not
declining ones. To check that, we will observe the directional signs of the CO2 variables’ regression coefficients
embedded in the underlying Vector Autoregression (VAR) model. If they sum up to a strong positive value, you
have confirmed your hypothesis that CO2 causes rising temperatures. Otherwise, you have not.
4) Next, check out the Impulse Response Function (IRF) graphs to visualize how an unanticipated shock in CO2
concentration reverberates on temperature increase over the next 10 years.
5) Next, explore the Forecast Error Variance Decomposition (FEVD) to evaluate how much information CO2 does truly
impart to these VAR models.
Only once you have completed all five steps will you have drawn a complete picture of the Granger causality
between two variables. Many practitioners stop after the very first step in a hurry to confirm their hypothesis, while
being less than enthusiastic about pursuing the next steps that may not confirm their hypothesis.
Does CO2 Granger cause Temperature?
Yes, it does
We ran a set of Granger Causality tests. You start with a baseline autoregressive model that just includes 1 yearly lag
of the temperature to estimate the temperature history. Next, you develop a second model by adding the 1 year lag
of CO2 to also estimate the temperature history. Finally, you test with an F test and a Chi Square test whether the
residuals of the second model including the CO2 lag are much lower than the residuals of the baseline
autoregressive model. If they are indeed lower at a statistically significant level, you conclude that CO2 does
Granger cause temperature.
You repeat this procedure up to including 4 yearly lags (we did not contemplate using more lags; beyond 4 yearly
lags, we would likely start overfitting the model on the autoregressive properties of the respective time series). As
shown in the table below, both the series of F tests and Chi Square tests using models with up to 4 lags all confirm that CO2
clearly Granger causes temperature. Indeed, in all cases the resulting p-values are essentially zero, allowing us to
reject the null hypothesis that there is no statistically significant difference between the two sets of residuals
(baseline autoregressive model vs. model including the CO2 lags).
CO2 Granger causes Temperature testing
            F test            Chi Square test
# of lags   Value   p-value   Value   p-value
1           39.9    0.00      40.7    0.00
2           21.4    0.00      44.4    0.00
3           10.0    0.00      31.6    0.00
4           5.7     0.00      24.4    0.00
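The F test described above compares a restricted model (own lags only) with an unrestricted one (own lags plus lags of the other variable). A minimal sketch of that statistic follows, on synthetic series where one variable is built to drive the other. The helper names are hypothetical, the data are not the climate series, and converting the statistic to a p-value (which needs the F distribution) is left out.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_rss(rows, y):
    """Residual sum of squares of the least-squares fit y ~ rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

def granger_f(cause, effect, lags=1):
    """F statistic: effect ~ own lags, versus effect ~ own lags + lags of cause."""
    target = effect[lags:]
    restricted, unrestricted = [], []
    for t in range(lags, len(effect)):
        own = [effect[t - j] for j in range(1, lags + 1)]
        other = [cause[t - j] for j in range(1, lags + 1)]
        restricted.append([1.0] + own)
        unrestricted.append([1.0] + own + other)
    rss_r = fit_rss(restricted, target)
    rss_u = fit_rss(unrestricted, target)
    dof = len(target) - (2 * lags + 1)          # observations minus unrestricted parameters
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic series (NOT the real data): x drives y with a one-period delay.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(141)]
y = [0.0]
for t in range(1, 141):
    y.append(0.5 * y[-1] + 0.8 * x[t - 1] + random.gauss(0, 0.3))

f_xy = granger_f(x, y)   # does x help predict y?
f_yx = granger_f(y, x)   # does y help predict x?
print(f"F (x -> y) = {f_xy:.1f}   F (y -> x) = {f_yx:.1f}")
```

Running the test in both directions, as in the next slide, is what distinguishes a directional result from two series that simply move together.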
Does CO2 Granger cause Temperature… more than
Temperature Granger causing CO2? Yes it does
CO2 Granger causes Temperature testing
            F test            Chi Square test
# of lags   Value   p-value   Value   p-value
1           39.9    0.00      40.7    0.00
2           21.4    0.00      44.4    0.00
3           10.0    0.00      31.6    0.00
4           5.7     0.00      24.4    0.00
Temperature Granger causes CO2 testing
            F test            Chi Square test
# of lags   Value   p-value   Value   p-value
1           1.9     0.17      1.9     0.17
2           3.9     0.02      8.2     0.02
3           2.9     0.04      9.2     0.03
4           1.8     0.14      7.6     0.11
When you run all the Granger causality tests in the other direction, all the
F test and Chi Square test values are a lot lower, and the resulting p-values are
much higher. In several of the Granger causality tests, we can’t reject the
null hypothesis that any difference in residuals between the baseline
autoregressive model and the model that adds the temperature lags is just due to
randomness.
# of lags selection for the VAR models using Information Criteria
The models described earlier that include lags of both CO2 and temperature to establish causality in either
direction are essentially unrestricted Vector Autoregression (VAR) models. When used for other purposes, on a
stand alone basis, such models are also called Autoregressive Distributed Lag (ARDL) models, a popular model
structure in social sciences and econometrics.
As a side note, when using level variables one should typically use other forms of VAR (not unrestricted). But,
given that the residuals of our unrestricted VAR models are uncorrelated, we should be ok to proceed as is.
To select the best number of lags for our VAR models, we will check the output of information criteria generated
by an R function. The lower the information criterion value, the better the model fit and specification.
                # of Lags
Info Criteria   1         2         3         4
AIC             -6.66     -6.85     -6.80     -6.87
HQ              -6.61     -6.76     -6.68     -6.72
SC              -6.54     -6.63     -6.51     -6.49
FPE             0.00128   0.00106   0.00111   0.00104
As shown above, two of the information criteria select the VAR model with 2 lags. And, the other two select the VAR
model with 4 lags. But, notice that all four models (with lags ranging from 1 up to 4 yearly lags) have very close
information criteria values. In essence, they are very competitive with each other. So, we will often look at all four
models.
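The selection rule (pick the lag count with the lowest value for each criterion) can be sketched directly from the table. Note one assumption: the HQ and SC rows are scrambled in the transcript of the slide, so the column placement of their values below is our reconstruction, chosen to be consistent with the statement that two criteria select 2 lags and two select 4 lags.

```python
# Information criteria by number of lags (positions 0..3 correspond to 1..4 lags).
# The HQ and SC column placement is a reconstruction (an assumption), see the note above.
criteria = {
    "AIC": [-6.66, -6.85, -6.80, -6.87],
    "HQ": [-6.61, -6.76, -6.68, -6.72],
    "SC": [-6.54, -6.63, -6.51, -6.49],
    "FPE": [0.00128, 0.00106, 0.00111, 0.00104],
}

# For every criterion, the preferred model is the one with the lowest value.
best_lag = {name: values.index(min(values)) + 1 for name, values in criteria.items()}

# Spread between best and worst value per criterion, to show how close the models are.
spread = {name: max(values) - min(values) for name, values in criteria.items()}

print(best_lag)   # {'AIC': 4, 'HQ': 2, 'SC': 2, 'FPE': 4}
```

HQ and SC penalize extra parameters more heavily than AIC and FPE, which is a common reason for them to prefer the shorter lag structure.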
Does the CO2 vs. Temperature causal relationship have the
appropriate positive sign? … well here it gets a bit foggy
Model equation causal direction: CO2 causes temperature
Model           CO2 Lags    Coefficient   t stat   p-value
VAR w/ 1 lag    CO2 lag 1   0.005         6.32     0.00
VAR w/ 2 lags   CO2 lag 1   -0.049        -2.17    0.03
                CO2 lag 2   0.055         2.40     0.02
                Sum         0.006
VAR w/ 3 lags   CO2 lag 1   -0.045        -1.79    0.07
                CO2 lag 2   0.051         1.22     0.22
                CO2 lag 3   0.000         0.01     1.00
                Sum         0.006
VAR w/ 4 lags   CO2 lag 1   -0.044        -1.72    0.09
                CO2 lag 2   0.058         1.35     0.18
                CO2 lag 3   -0.016        -0.37    0.71
                CO2 lag 4   0.008         0.29     0.77
                Sum         0.006
Observing the signs of the CO2 lags’ regression coefficients leads us to answer the above question
with much nuance.
The VAR models with 2 and 3 lags both have one CO2 coefficient with the wrong negative sign. The VAR with
4 lags has two coefficients with the wrong sign.
In some cases, we can accept coefficients with the wrong sign, considering that the CO2 -> temperature
relationship may have some mean-reverting properties that would cause this reversal in coefficient signs.
Yet, when we look at the overall Granger causality effect of CO2 on temperature (associated with an unexpected
upward shock in CO2), this net effect seems very small at around 0.005 to 0.006 regardless of the VAR we use. We
derive this net effect by summing the CO2 lags’ regression coefficients. But, at least this net effect is positive.
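The net effect and the count of wrong-signed coefficients can be checked directly from the coefficients in the table. This is a small arithmetic sketch using the published rounded values.

```python
# CO2 lag coefficients in the temperature equation of each VAR, copied from the table.
co2_lag_coefficients = {
    "VAR w/ 1 lag": [0.005],
    "VAR w/ 2 lags": [-0.049, 0.055],
    "VAR w/ 3 lags": [-0.045, 0.051, 0.000],
    "VAR w/ 4 lags": [-0.044, 0.058, -0.016, 0.008],
}

# Net effect: the sum of the CO2 lag coefficients, rounded like the table.
net_effect = {model: round(sum(coefs), 3) for model, coefs in co2_lag_coefficients.items()}

# Number of coefficients per model with a negative ("wrong") sign.
wrong_sign = {model: sum(1 for c in coefs if c < 0) for model, coefs in co2_lag_coefficients.items()}

for model in co2_lag_coefficients:
    print(f"{model}: net effect = {net_effect[model]:.3f}, negative coefficients = {wrong_sign[model]}")
```

The individual coefficients are roughly ten times larger in magnitude than their sum, so the small positive net effect is the difference between two much larger offsetting terms.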
Impulse Response Functions
The cumulative Impulse Response Function over the next 10 yearly periods, describing the impact on temperature in response
to an unanticipated upward shock of a one unit increase in CO2 concentration, is rather unsettling. When using a VAR
model with only 1 lag, the IRF graph makes much sense, as it illustrates CO2 having a positive impact on temperature level
(left graph). But, the graph on the right that describes the same IRF for a VAR with 2 lags suggests that an upward shock in
CO2 would have a negative impact on temperature level. The IRF graphs for VAR with 3 and 4 lags looked nearly identical to
the VAR with 2 lags IRF graph (right hand graph) with the negative sign.
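For intuition on where such an IRF comes from, here is a minimal sketch for a two-variable VAR with 1 lag: the response at horizon h to a unit shock is read off the h-th power of the coefficient matrix. Only the 0.005 CO2 lag coefficient comes from the slides; the other three matrix entries are hypothetical placeholders, and the sketch computes plain (non-orthogonalized) responses, which may differ from the method used for the graphs.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def impulse_responses(coef, shock_index, response_index, horizons=10):
    """Responses at horizons 0..horizons of one variable to a unit shock in another, VAR(1)."""
    size = len(coef)
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]  # coef^0
    responses = []
    for _ in range(horizons + 1):
        responses.append(power[response_index][shock_index])
        power = mat_mul(coef, power)          # move on to the next power of coef
    return responses

# Variable order: [temperature, co2]. Only A[0][1] = 0.005 is taken from the slides;
# the remaining entries are hypothetical placeholders.
A = [[0.60, 0.005],
     [0.00, 0.990]]

irf = impulse_responses(A, shock_index=1, response_index=0)

# Cumulative IRF: running sum of the period-by-period responses.
cumulative = []
running = 0.0
for value in irf:
    running += value
    cumulative.append(running)

print([round(v, 4) for v in cumulative])
```

With more than one lag, the same idea applies to the stacked (companion) form of the VAR, which is where offsetting negative and positive CO2 lag coefficients can flip the sign of the response.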
Forecast Error Variance Decomposition (FEVD)
For the VAR with 1 lag model fitting temperature, the table
indicates that the autoregressive lag of temperature provides
the vast majority of the information to fit temperature as the Y
dependent variable. And, that the exogenous CO2 lag 1
variable provides very little information to the model.
The FEVD profile for all the other VAR models with up to 4 lags
had the exact same FEVD profile with the lags of the
temperature variable providing over 99% of the information to
the model; and, the exogenous CO2 lags providing very little
information to these VAR models.
Forecast Error Variance Decomposition (FEVD)
VAR with just 1 lag
CO2 causes temperature

Period   temperature   co2
1        1.000         0.000
2        1.000         0.000
3        0.999         0.001
4        0.998         0.002
5        0.997         0.003
6        0.996         0.004
7        0.995         0.005
8        0.994         0.006
9        0.992         0.008
10       0.991         0.009
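For readers who want to see how such an FEVD table is built, the sketch below computes the decomposition for the temperature equation of a two-variable VAR with 1 lag. It assumes a Cholesky ordering with temperature first (which is consistent with the first row of the table being 1.000 and 0.000). The coefficient matrix `A` and the residual covariance `sigma` are hypothetical, apart from the 0.005 CO2 coefficient taken from this section, so the output will not reproduce the table exactly.

```python
import math


def cholesky_2x2(sigma):
    """Lower-triangular Cholesky factor of a 2x2 covariance matrix."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]


def mat_mul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def fevd_temperature(A, sigma, periods):
    """Share of temperature forecast error variance due to each shock."""
    P = cholesky_2x2(sigma)
    power = [[1.0, 0.0], [0.0, 1.0]]  # A^0
    contrib = [0.0, 0.0]
    shares = []
    for _ in range(periods):
        theta = mat_mul(power, P)  # orthogonalized responses at this horizon
        contrib = [contrib[j] + theta[0][j] ** 2 for j in range(2)]
        total = sum(contrib)
        shares.append((contrib[0] / total, contrib[1] / total))
        power = mat_mul(A, power)  # move on to the next power of A
    return shares


# Variable order: [temperature, CO2]. All values hypothetical except 0.005.
A = [[0.90, 0.005],
     [0.00, 0.95]]
sigma = [[0.01, 0.00],
         [0.00, 0.25]]

shares = fevd_temperature(A, sigma, periods=10)
for period, (temperature, co2) in enumerate(shares, start=1):
    print(f"{period:>2}  {temperature:.3f}  {co2:.3f}")
```

When the temperature own-lag coefficient is large and the CO2 coefficient is small, the temperature column stays close to 1.000 over all 10 periods, which is the pattern the table shows.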
Why did some of the later steps of our Granger Causality Analysis
show ambivalent results?
The first couple of steps showed pretty convincing mathematical results that CO2 does Granger cause
temperature. However, as shown, the later steps ranged from ambivalent to disproving.
The above is probably due to a couple of phenomena.
The first one is generic to these types of analysis. It is common to confirm Granger causality through the first
couple of steps of such an analysis. But confirmation through all 5 steps is much less common.
The second phenomenon, potentially specific to this modeling exercise, is that the temperature level variable has a
very high level of autocorrelation. Within VAR models, this strong autocorrelation of temperature has probably
much reduced the explanatory impact of CO2. Thus, the temperature lags partly crowded out the CO2 ones in
terms of estimating temperature levels with VAR models. More specifically, the temperature autocorrelation at lag 1
is 0.9518, which is a bit higher than the CO2 vs. temperature correlation at lag 1 of 0.9453. One would think we could
resolve this situation by detrending the variables and dealing with yearly changes in temperature and CO2
concentration. But there is too much volatility in the yearly change variables to demonstrate any explicit
relationship between the two variables. I had done such an exercise years ago. It would only serve as a
means to demonstrate that there is no Granger causal relationship between the two variables.
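The two lag 1 correlations compared above, and the yearly changes mentioned as a detrending option, can be computed as sketched below. The short `temperature` and `co2` lists are made-up numbers used only to make the example run; they are not the NASA/NOAA or CO2 data used in this presentation, and the function names are illustrative.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(predictor, target, lag=1):
    """Correlation between predictor at year t - lag and target at year t."""
    return pearson(predictor[:-lag], target[lag:])


def yearly_changes(series):
    """First differences, one way to detrend a level variable."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical illustrative data (not the actual series).
temperature = [0.10, 0.12, 0.18, 0.17, 0.25, 0.31, 0.30, 0.38]
co2 = [340.0, 342.0, 343.0, 345.0, 347.0, 349.0, 351.0, 353.0]

temp_autocorr = lagged_correlation(temperature, temperature, lag=1)
co2_temp_corr = lagged_correlation(co2, temperature, lag=1)
change_corr = lagged_correlation(yearly_changes(co2), yearly_changes(temperature), lag=1)

print(f"temperature autocorrelation, lag 1: {temp_autocorr:.4f}")
print(f"CO2 vs. temperature correlation, lag 1: {co2_temp_corr:.4f}")
print(f"same correlation on yearly changes: {change_corr:.4f}")
```

Comparing the first two numbers shows how much of the lag 1 information in CO2 is already carried by the lag of temperature itself.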
8. VAR Forecast
Here we will revisit forecasting the temperature anomaly over the 1982 – 2020 period using a model trained on
1880 – 1981 data. But, using VAR structures, we will now attempt to conduct this forecast with no information
whatsoever about the forecast period (no info regarding prospective CO2 concentration levels).
This type of forecast testing is so challenging that it borders on the absurd. Imagine actually forecasting a time
series variable (S&P 500, GDP, CPI, etc.) over the next 39 years without any exogenous information over those
prospective years. That would probably be close to impossible.
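What "no information whatsoever" means in practice is that a VAR with 1 lag is iterated forward from the last training year, with each forecast fed back in as the next input. The sketch below shows that recursion in plain Python. The intercepts, the coefficient matrix, and the 1981 starting values are all hypothetical placeholders, not the estimated model.

```python
def forecast_var1(intercept, A, last_observation, steps):
    """Iterate y[t+1] = intercept + A y[t], feeding each forecast back in.

    No observed value from the forecast period is ever used.
    """
    state = list(last_observation)
    path = []
    for _ in range(steps):
        state = [c + sum(a * s for a, s in zip(row, state))
                 for c, row in zip(intercept, A)]
        path.append(state)
    return path


# Variable order: [temperature anomaly, LN(CO2)]. All numbers hypothetical.
intercept = [-10.374, 0.004]
A = [[0.60, 1.80],
     [0.00, 1.00]]
last_observation = [0.30, 5.83]  # assumed 1981 values; ln(340) is about 5.83

steps = 2020 - 1982 + 1  # 39 yearly forecasts
forecast = forecast_var1(intercept, A, last_observation, steps)

for year, (temperature, ln_co2) in zip(range(1982, 2021), forecast):
    print(f"{year}: temperature = {temperature:.3f}, LN(CO2) = {ln_co2:.3f}")
```

Because the model also has to forecast LN(CO2) itself, any error in the CO2 path feeds directly into the temperature path.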
Revisiting our best 1982 – 2020 forecast with the CO2 Model
This was our best temperature
anomaly forecast so far over the 1982
– 2020 period using data from 1880
to 1981 to train our CO2 based
model.
As shown, this is a remarkably good
forecast. It implies that if you could
have known CO2 concentration over
this period (1982 – 2020), you could
have generated a pretty good
estimate of the temperature anomaly
over this same period (1982 – 2020).
Notice that all the CO2 model
estimates of the temperature
anomaly fall well within the 95%
Prediction Interval. This is a rather
unusually good situation.
[Figure: Temperature Anomaly. Hold Out 1982 - 2020. With 95% Prediction Interval. Series: Actual, CO2 Model, Lower, Upper. X-axis: years 1982 to 2020; y-axis: temperature anomaly from 0.0 to 1.2.]
A VAR model w/ 1 lag using LN(CO2) can predict with no info whatsoever!
[Figure: Temperature Anomaly. VAR w/ LN(CO2) forecast 1 lag, 1982 - 2020. P.I. 95%. Series: Actual, VAR fcst, Lower, Upper. X-axis: years 1982 to 2020; y-axis: temperature anomaly from 0.0 to 1.2.]
Just using LN(CO2) instead of CO2 as
our second Z variable within a VAR
model with 1 lag generates a
surprisingly good forecast of the
temperature anomaly over the 1982
– 2020 period with no information
whatsoever regarding this period!
This is rather astonishing.
As shown, the VAR forecast does
overestimate temperature by just
about 0.1 degree Celsius at the onset
in 1982 and in 2020. That’s a very
small error given the model is not fed
any information.
Comparing our CO2 Model vs. VAR (with LN(CO2)) forecasts
[Figure: Temperature Anomaly. Hold Out 1982 - 2020. Series: Actual, CO2 Model, VAR. X-axis: years 1982 to 2020; y-axis: temperature anomaly from 0.0 to 1.2.]
Temperature anomaly over the 1982 - 2020 period

           Actual   CO2 Model   VAR
Average    0.537    0.561       0.617
Median     0.555    0.533       0.581
Max        1.005    0.975       1.100
Min        0.140    0.225       0.295
Range      0.865    0.751       0.804
Ok, the VAR model does
overestimate the temperature
anomaly a bit relative to the OLS
Cointegration Regression (CO2
Model). But, the VAR
overestimation is really pretty
small when considering the VAR
model generated a 39 year forecast
with no info whatsoever. By
contrast, the CO2 model was fed
the precise CO2 concentration level
over that entire period. That is a
huge difference.
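The size of the overestimation can be read directly off the summary table above. The short sketch below just subtracts the Actual column from each model column; the dictionary layout and the `bias_vs_actual` helper are illustrative names, while the numbers are the ones reported in the table.

```python
# Summary statistics copied from the table above
# (temperature anomaly in degrees Celsius, 1982 - 2020).
stats = {
    "Actual":    {"Average": 0.537, "Median": 0.555, "Max": 1.005, "Min": 0.140},
    "CO2 Model": {"Average": 0.561, "Median": 0.533, "Max": 0.975, "Min": 0.225},
    "VAR":       {"Average": 0.617, "Median": 0.581, "Max": 1.100, "Min": 0.295},
}


def bias_vs_actual(model, statistic="Average"):
    """Model statistic minus the actual statistic (positive = overestimate)."""
    return round(stats[model][statistic] - stats["Actual"][statistic], 3)


for model in ("CO2 Model", "VAR"):
    line = ", ".join(
        f"{statistic}: {bias_vs_actual(model, statistic):+.3f}"
        for statistic in ("Average", "Median", "Max", "Min")
    )
    print(f"{model} vs. Actual -> {line}")
```

On average the VAR overestimates by about 0.08 degree Celsius versus about 0.02 for the CO2 Model, which was given the actual CO2 concentration over the whole period.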
Why did the VAR (w/ LN(CO2)) overestimate temperature?
[Figure: CO2 (ppm). VAR (w/ LN(CO2)) forecast 1 lag, 1982 - 2020 with P.I. 95%. Series: Actual, VAR fcst, lower, upper. X-axis: years 1982 to 2020; y-axis: CO2 concentration from 340 to 440 ppm.]
This question is a little perplexing because
we observed earlier that using LN(CO2)
instead of CO2 within our earlier OLS
regressions resulted in the LN(CO2) model
underestimating temperature over the Hold
Out (1982 – 2020) by quite a bit.
But, when we use this same LN(CO2)
variable within this VAR model, instead of
underestimating temperature, it actually
overestimates it by a little bit.
Part of the reason is that this same VAR
model does overestimate CO2
concentration.
Remember in the former Hold Out tests with the standard OLS regressions, these models were fed with CO2
concentration over the 1982 – 2020 period; while the models were trained over the 1880 – 1981 period. With this VAR
model, we are dealing with a rather extraordinary situation where it was trained over the 1880 – 1981 period; and, it was
not provided any information over the Hold Out period (1982 – 2020). Yet, it was asked to forecast temperature over that
same period. That’s a very challenging situation.
Conclusion
Using CO2 concentration to estimate and forecast temperature anomaly levels was on many counts
surprisingly successful.
More complex models using additional variables associated with the Pacific Decadal Oscillation (El Nino (+); La
Nina (-)) proved not so successful. They could fit the historical data. But, they turned out inferior in
forecasting compared to the simpler model just using CO2 concentration.
Using the natural log of CO2 as an independent variable was surprisingly successful for replicating the IPCC
scenarios and also in forecasting the temperature anomaly over the 1982 – 2020 period with no info
whatsoever using a VAR model with one lag.
When it came to a full-fledged Granger causality analysis, our results were much humbler. We could confirm
Granger causality through the first two steps (Granger causality and its relationship direction). But, the
subsequent steps turned out to be rather ambivalent (VAR regression coefficients signs, IRFs, FEVD).
Climate Change Model

  • 1. Climate Change Models to Estimate and Forecast Temperature Gaetan Lion September 2021 1
  • 2. 2 Content 1. Introduction 2. Data 3. Baseline trend models 4. CO2 models 5. Out-of-sample forecasts 6. Replicating IPCC scenarios 7. Granger Causality, VAR, IRFs 8. VAR Forecast
  • 3. 3 1. Introduction This presentation discloses the modeling of global temperature* associated, or caused, by a rising concentration in CO2 in parts per million (ppm). Other variables will also be explored and tested to include within these Climate Change models. The above is: a) To assess the information imparted by CO2 concentration into this model estimating and predicting temperature; b) To test the accuracy of such models to fit the historical temperature data and to forecast temperature within out-of-sample testing framework; c) To replicate the most recent IPCC scenarios; d) To better understand the relationship between CO2 concentration and temperature and to attempt to demonstrate causality of CO2 -> temperature. * Measured as temperature anomaly over the 1850 – 1900 average global temperature.
  • 5. -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1880 1884 1888 1892 1896 1900 1904 1908 1912 1916 1920 1924 1928 1932 1936 1940 1944 1948 1952 1956 1960 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004 2008 2012 2016 2020 NOAA vs NASA Temperature anomaly in degree Celsius NOAA NASA Temperature annual history going back to 1880 to 2020. The two overlapping series are from NASA and the NOAA. We will use the average of the two series. 5 Temperature records are captured as “temperature anomaly.” The latter represents the difference in temperature between a specific year and the average over the 1850 – 1900 period when the industrialization, level and CO2 concentration were much lower. Temperature anomaly is measured in degree Celsius (or Centigrade that are equivalent).
  • 6. Annual CO2 concentration in parts per million from 1880 to 2020. Data from 1880 to 1958 is derived from a cooperative effort between three different scientific teams from Australia and France constructing the data derived from ice core analysis. Data from 1958 to 2020 is from the NOAA. 6
  • 7. We understand that comparing two levels variables, without detrending them, can lead to spurious correlations and regressions. However, when two level variables are cointegrated the above caveat is nullified. We will disclose later cointegration testing for these two variables. As observed this scatter plot shows a pretty strong correlation between the two variables. 7
  • 8. The relationship between CO2 and temperature can be split over two periods. The first one (1880 – 1970) with CO2 concentration ranging from 290 to 325 ppm is associated with a not so strong linear relationship between the two variables. The second one (1971 – 2020) with CO2 concentration ranging from 325 to close to 420 is associated with a very strong linear relationship. For the purpose of our modeling, we will not split the data as the related regression parameters are pretty stable (intercept and slope of the regression equations shown on the scatter plots). 8
  • 9. Checking the Autocorrelation of the Residuals of the Ordinary Least Square (OLS) Cointegration Regression: Temperature ~ CO2 9 Given that we are using level variables, the residuals autocorrelation levels as captured by the ACF and PACF graphs is reasonably low. And, at the onset suggests that these two variables (CO2 and temperature) may be indeed cointegrated. The PACF graph at the bottom is the one used to select the number of yearly lags we should select to conduct our unit root testing to confirm that these residuals are indeed stationary (do not have a unit root). Even though within the PACF graph, only lag 1 crosses the line of statistical significance ( > 0.2), we will use up to lag 4 to be more conservative.
  • 10. 10 Testing the residuals of the OLS Cointegration Regression Temperature ~ CO2 for stationarity Test p-value Interpretation confirming residuals are stationary ADF test 0.01 Reject the null hypothesis that residuals are nonstationary Phillips Perron 0.01 Reject the null hypothesis that residuals are nonstationary KPSS > 0.1 Accept the null hypothesis that residuals are stationary We used 4 lags for each of the above unit root test. In each case, the respective unit root tests confirmed that the Cointegration Regression residuals were stationary. This confirmation allows us to proceed in modeling the relationship between CO2 and temperature using level variables knowing that these two variables are explicitly cointegrated. Further residual model testing often includes testing for autocorrelation, heteroskedasticity, and normal distribution. However, any related residual issues do not affect the regression coefficients biasness. They may affect the reliability of regression coefficients confidence intervals and their statistical significance. However, if such regression coefficients are associated with t-stats > 2.5 or 3.0, statistical significance is typically not an issue (even after adjusting with Robust Standard Errors). Additionally, in some cases as we’ll see we are not explicitly concerned with levels of statistical significance, as long as the variable make good sense in terms of explaining how the climate system works, and that the variable regression coefficient has the appropriate sign.
  • 11. 11 3. Baseline trend models Within this section we will develop models that do not use CO2 as an exogenous variable but simply various trend variables (counting 1, 2, 3, 4,…). This is just to test whether just the passing of time is the driving trend and not so much CO2 as a causal factor. This is a pretty good test on whether your level-based original model is truly valid and not another example of a spurious regression using level variables.
  • 12. -0.4 -0.6 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1880 1884 1888 1892 1896 1900 1904 1908 1912 1916 1920 1924 1928 1932 1936 1940 1944 1948 1952 1956 1960 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004 2008 2012 2016 2020 Temperature Anomaly. Historical Fit of a simple Trend Model Actual Trend This model uses a single Trend variable (counting 1, 2, 3, etc.) to estimate the temperature over time. It does not use any exogenous information. 12 As shown, this Trend model is pretty terrible. Notice how it way underestimates temperatures at the onset from 1880 to 1900 and at the end from 2005 to 2020. In between from 1901 to 2004 it typically overestimates temperatures.
  • 13. 13 The Trend Model residuals are pretty awful looking -0.40 -0.30 -0.20 -0.10 0.00 0.10 0.20 0.30 0.40 0.50 Residual 280 300 320 340 360 380 400 420 CO2 concentration (ppm) Trend Model residuals -0.40 -0.30 -0.20 -0.10 0.00 0.10 0.20 0.30 0.40 0.50 -0.60 -0.40 -0.20 0.40 0.60 Residual 0.00 0.20 Model Estimate Trend Model residuals A good model should have a residual curve (red dashed line) that is flat, straight, and sits at the 0.00 level. This would indicate residuals that are stationary and mean reverting around the 0.00 level. These residuals are far away from meeting that standard. They are clearly nonstationary.
  • 14. [Figure: Temperature Anomaly. Historical Fit of a polynomial Trend Model. Series: Actual, Trend 2. x-axis: year, 1880 to 2020.]
    If we simply add a second Trend variable that represents the square of the Trend variable, we actually get a surprisingly good historical fit of the data in the absence of any information from CO2. While the Trend variable starts with 1, 2, 3, etc., the Trend square variable starts with 1, 4, 9, etc. The combination of those two variables makes for a very good polynomial regression equation that fits the J-shaped curve of the data very well. We call this model Trend 2.
  • 15. The Trend 2 Model residuals are far better looking
    The residual trend line (red) relative to Model Estimates on the x-axis within the right-hand graph is now perfectly flat, straight, and on the 0.00 line, as it should be. Even that same residual trend line when using CO2 concentration on the x-axis is reasonably flat. This model appears to capture a good deal of the information imparted by the CO2 variable. We now have a pretty competitive Baseline Trend model to assess the validity of our upcoming CO2 models.
  • 16. Description of the Trend 2 model
    The square of the trend (trend2) is a very large number, so the resulting regression coefficient is very small: 0.000085.
    All the Goodness-of-fit measures are very high. And, the resulting model errors are pretty low. This is kind of amazing given that we have just used trend variables to fit the temperature history starting back in 1880.
  • 17. 4. CO2 Models
    We will introduce two CO2 based models to estimate and forecast temperature. The first one will be our simple linear OLS Cointegration Regression using CO2 as the only exogenous variable. The second one will be a more complete model that also includes the influence on temperature of the Pacific Decadal Oscillation, with warm years due to El Nino and cold years due to La Nina. This model will also include an intervention variable covering the years from 1940 to 1970, before sulfate aerosols were heavily regulated. Sulfates have a lowering effect on temperature that partly counters the rising effect of CO2.
  • 18. CO2 model description
    Notice the extremely high t-stat of the CO2 coefficient, leaving no doubt as to the statistical significance of this variable.
  • 19. The CO2 model has very good looking residuals (flat red lines)
  • 20. The more complete CO2 based model
    The El Nino variable has a p-value of 0.155, which is not statistically significant at the Alpha < 0.10 level. However, within a sports betting market, this same p-value would correspond to one team being favored with odds of about 5.5-to-1 of winning. That would be a pretty good bet. In view of the above, we are comfortable including the El Nino variable in our model. It also makes sense to include both the years that have a positive impact on temperature (El Nino) and the ones that have a negative impact (La Nina).
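The betting analogy can be made explicit with a little arithmetic. This is a minimal sketch, assuming the slide informally reads 1 - p as the probability that the El Nino effect is real (a loose reading, not a strict frequentist one). The function name `odds_from_p_value` is hypothetical and is not part of the original analysis.

```python
def odds_from_p_value(p_value):
    """Odds in favor implied by treating (1 - p) as a win probability.

    Assumption: this mirrors the slide's informal betting analogy only.
    """
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must be strictly between 0 and 1")
    return (1.0 - p_value) / p_value


# p-value of the El Nino variable reported on the slide
el_nino_odds = odds_from_p_value(0.155)
print(f"{el_nino_odds:.2f}-to-1")  # prints 5.45-to-1
```

For comparison, the conventional Alpha = 0.10 threshold corresponds to odds of 9-to-1 under the same informal reading.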
  • 21. The complete Model residuals are still reasonably good looking (fairly flat red curves)
  • 22. Model Competition regarding the fit of Temperature history
    Measure | CO2 model | Model | Trend 2
    Adjusted R Square | 0.891 | 0.917 | 0.888
    Predicted R Square | 0.890 | 0.913 | 0.885
    RMSE | 0.117 | 0.102 | 0.119
    MAE | 0.095 | 0.082 | 0.095
    Whether looking at variance explained (Adjusted R Square), one-observation prediction (Predicted R Square) or model errors (RMSE and Mean Absolute Error), the three models are very close. The CO2 model and the Trend 2 model are just about dead even on all counts. The Model that includes the other variables such as El Nino and La Nina is fractionally more accurate.
    If we stopped our analysis now, one could prematurely conclude that the trend (including the trend square variable) explains just about everything regarding the progressive increase in temperature from 1880 to 2020, and that the two CO2 based models add little if any information beyond capturing this trend. This could lead one to assess our CO2 based models as “spurious.” Additional analysis will show otherwise: including a CO2 variable greatly improves the prediction accuracy of such models. Fitting the historical data is one thing. Making reasonably accurate predictions is far more challenging and useful.
  • 23. [Figure: Temperature Anomaly. Historical Fit. Series: Actual, CO2 model, Model, Trend 2. x-axis: year, 1880 to 2020.]
    When you visually compare the historical fit of the three models, they all fit the underlying J curve long term trend of temperature increase over the 1880 to 2020 period.
    The more complete Model that includes the El Nino and La Nina variables does match some of the volatility or oscillations in the annual temperature data much better than the other two models. Otherwise, again, there is really not much difference between the three models.
  • 24. [Figure: Temperature Anomaly. Historical Fit since 1990. Series: Actual, CO2 model, Model, Trend 2.]
    Focusing on the more recent period from 1990 until 2020, we can observe a similarly good fit for the three models.
    The more complete Model has a slightly better fit because it better captures the temperature oscillations associated with El Nino/La Nina. However, notice how the Trend 2 model starts to underestimate the temperature level starting in 2014. This may be the first indication that CO2 does impart some valuable information to this temperature model.
  • 25. 5. Out-of-sample forecasts
    Fitting historical data is one thing. And, one way or another, it is often relatively easy, even in the case of fitting historical temperature levels from 1880 to 2020, as we have seen. Predicting observations using out-of-sample forecasts, also called Hold Out testing, is far more difficult and is a far more relevant test of a model's predictive accuracy. With such models, you often run into a situation where a model fits the historical data really well but predicts really poorly (in Hold Out testing). This is a classic situation of model overfitting. It happens all the time. Within this section we will test whether our models are overfit, or whether they do provide predictive information.
  • 26. Cross Validation test
    Mean Absolute Error | History | Cross-val. | C.V./History
    CO2 model | 0.095 | 0.099 | 1.05
    Model | 0.082 | 0.089 | 1.09
    Trend 2 | 0.095 | 0.104 | 1.10
    Cross validation is a rigorous form of out-of-sample forecast testing. In our case, we removed 14 observations from the data to create a 14-year prediction window. And, we did this exercise 10 times to cover the 141 yearly observations within our complete data set. So, the first prediction window went from 1880 to 1893. We used a model with history from 1894 to 2020 to attempt to predict the 1880 – 1893 years. The second prediction window was from 1894 to 1907. We used a model with history in all other years outside the prediction window. And, we continued this process until using the most recent 14 years as our prediction window.
    The table first shows the Mean Absolute Error (MAE) of each of our three models when fitted on the entire data set (History). Next, it shows the average MAE of the 10 cross validation prediction windows. Finally, it shows the ratio of the cross validation MAE to the historical MAE. The cross validation MAE is expected to be higher than the MAE during history. If that multiple is greater than 1.5, you may be dealing with a model that is overfit.
    As shown above, all three models perform well on this count, with very little deterioration during the cross validation test. Again the complete Model is a bit better than the other two. And, at the margin, our CO2 model did a bit better than the Trend 2 model during cross validation.
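The windowed cross validation described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the data are synthetic (not the NOAA/NASA or CO2 series), the helper names (`fit_line`, `mae`, `blocked_cv_mae`) are hypothetical, and the model is a one-variable OLS. The slide does not say how the 141st observation is handled, so this sketch simply never holds out the last observation.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def blocked_cv_mae(xs, ys, window=14):
    """Average MAE over contiguous hold-out windows of `window` observations.

    Each window is predicted by a model fitted on all observations outside it.
    Returns (average MAE, number of windows).
    """
    fold_maes = []
    for start in range(0, len(xs) - window + 1, window):
        stop = start + window
        a, b = fit_line(xs[:start] + xs[stop:], ys[:start] + ys[stop:])
        preds = [a + b * x for x in xs[start:stop]]
        fold_maes.append(mae(ys[start:stop], preds))
    return sum(fold_maes) / len(fold_maes), len(fold_maes)


# Synthetic stand-in data: 141 "years" of a driver and a noisy linear response.
rng = random.Random(0)
driver = [290.0 + 125.0 * (i / 140.0) ** 2 for i in range(141)]
response = [-3.0 + 0.01 * d + rng.gauss(0.0, 0.1) for d in driver]

a, b = fit_line(driver, response)
history_mae = mae(response, [a + b * d for d in driver])
cv_mae, n_windows = blocked_cv_mae(driver, response, window=14)
ratio = cv_mae / history_mae  # the slide flags ratios above 1.5 as possible overfitting
print(n_windows, round(history_mae, 3), round(cv_mae, 3), round(ratio, 2))
```

Using contiguous windows rather than randomly sampled observations keeps each test closer to a real forecasting situation, where a whole stretch of years is unknown.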
  • 27. 2006 – 2020 Out-of-sample Hold Out Test
    Temperature Anomaly estimate 2006 - 2020
    Year | Actual | CO2 model | Model | Trend2
    2005 | 0.68 | 0.68 | 0.68 | 0.68
    2006 | 0.64 | 0.63 | 0.64 | 0.56
    2007 | 0.65 | 0.65 | 0.62 | 0.58
    2008 | 0.54 | 0.67 | 0.54 | 0.59
    2009 | 0.65 | 0.68 | 0.69 | 0.61
    2010 | 0.72 | 0.71 | 0.68 | 0.63
    2011 | 0.60 | 0.73 | 0.60 | 0.64
    2012 | 0.65 | 0.75 | 0.72 | 0.66
    2013 | 0.68 | 0.78 | 0.75 | 0.67
    2014 | 0.75 | 0.80 | 0.81 | 0.69
    2015 | 0.92 | 0.82 | 0.69 | 0.71
    2016 | 1.01 | 0.85 | 0.82 | 0.72
    2017 | 0.92 | 0.88 | 0.85 | 0.74
    2018 | 0.84 | 0.90 | 0.87 | 0.76
    2019 | 0.97 | 0.93 | 0.90 | 0.78
    2020 | 1.00 | 0.95 | 0.92 | 0.79
    Temp increase | 0.33 | 0.28 | 0.25 | 0.12
    MAE | | 0.07 | 0.06 | 0.11
    [Figure: Temperature Anomaly. Hold Out 2006 - 2020. Series: Actual, CO2 model, Model, Trend2.]
    When we attempt to forecast the recent period (2006 – 2020) using historical data (1880 – 2005), the Trend 2 model way underestimates the increase in temperature over the recent period (+ 0.12 vs. + 0.33 for actuals). The two CO2 based models do a lot better, with respective temperature increases ranging from + 0.25 to + 0.28.
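The MAE row of the hold-out table can be approximately recomputed from the yearly values shown on the slide. This is a minimal sketch; the variable names are hypothetical, and because the inputs are already rounded to two decimals, the results can differ from the slide's figures by about 0.01.

```python
# 2006 to 2020 values transcribed from the slide's hold-out table.
actual = [0.64, 0.65, 0.54, 0.65, 0.72, 0.60, 0.65, 0.68,
          0.75, 0.92, 1.01, 0.92, 0.84, 0.97, 1.00]
forecasts = {
    "CO2 model": [0.63, 0.65, 0.67, 0.68, 0.71, 0.73, 0.75, 0.78,
                  0.80, 0.82, 0.85, 0.88, 0.90, 0.93, 0.95],
    "Model": [0.64, 0.62, 0.54, 0.69, 0.68, 0.60, 0.72, 0.75,
              0.81, 0.69, 0.82, 0.85, 0.87, 0.90, 0.92],
    "Trend2": [0.56, 0.58, 0.59, 0.61, 0.63, 0.64, 0.66, 0.67,
               0.69, 0.71, 0.72, 0.74, 0.76, 0.78, 0.79],
}


def mean_absolute_error(observed, predicted):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


hold_out_mae = {name: mean_absolute_error(actual, values)
                for name, values in forecasts.items()}

# Begin-to-end increase, measured from the common 2005 starting value of 0.68.
begin_to_end = {name: values[-1] - 0.68 for name, values in forecasts.items()}

for name in forecasts:
    print(name, round(hold_out_mae[name], 3), round(begin_to_end[name], 2))
```

The ordering matches the slide: the two CO2 based models have clearly lower errors than Trend2 over this window.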
  • 28. 1990 – 2005 Out-of-sample Hold Out Test
    [Figure: Temperature Anomaly. Hold Out 1990 - 2005. Series: Actual, CO2 Model, Model, Trend2.]
    This is the exact same pattern as the prior Hold Out Test. On a begin-to-end point basis, the Trend 2 model greatly underestimates the temperature increase over the 1990 – 2005 period.
    Notice how the simpler CO2 model does better than the more complete Model on a begin-to-end point basis. This was also true in the Hold Out test on the previous slide.
    The repeated relative failure of the Trend 2 model is not so surprising. Polynomial regressions are notoriously good at fitting historical data, but often not so good the minute you do some out-of-sample testing.
  • 29. 1982 – 2020 Out-of-sample Hold Out Test
    [Figure: Temperature Anomaly. Hold Out 1982 - 2020. Series: Actual, CO2 Model, Model, Trend 2.]
    This is an unusually long Hold Out test where we removed the 39 most recent years of the data (1982 – 2020).
    Just by knowing the CO2 concentration level, we would have come up with an excellent begin-to-end point estimate of the overall temperature increase over this 39 year period (CO2 Model). And, that estimate is far superior to the estimates from the other two models.
  • 30. Why is the complete Model a distant second to the simpler CO2 based model?
    It is because the Pacific Decadal Oscillation that captures the El Nino (+) and La Nina (-) is not so decadal. It is very volatile and is captured as a 3-month moving average that can often fluctuate between an El Nino (+) and a La Nina (-) phenomenon within the same year. Therefore, capturing those phenomena on a yearly basis is highly inaccurate.
  • 31. Attempt to improve Hold Out with a Robust Quantile Regression
    [Figure: Temperature Anomaly. Hold Out 1982 - 2020. Series: Actual, CO2 Model, Robust Model.]
    Linear regressions such as our CO2 models can be affected by outliers in either the Y variable (temperature), due to variables not included in the model (El Nino/La Nina, influence of other greenhouse gases, etc.), or the X variable (nonlinear changes or random jumps in the CO2 concentration variable).
    To remedy the issue of regression coefficients being influenced or distorted by outliers in the historical data, we use robust regressions that are more resistant to the influence of such outliers. A common robust regression method is Quantile Regression, which regresses to the Median instead of the Mean and therefore much reduces the influence of outliers.
    However, as shown, such a Robust Quantile Regression did not improve the Hold Out performance of our regular CO2 model. To the contrary, the regular CO2 model generated a better set of predictions over this Hold Out period. This gives us some comfort that this CO2 model is pretty well specified, not overly influenced by outliers within its historical data, and able to make pretty good predictions over a 39 year period. Prediction success over such a long period (just assuming we know the accurate value of CO2 concentration) is very rare for such time series models.
  • 32. 6. Replicating IPCC scenarios
  • 33. IPCC Scenarios
    Within its most recent assessment, the IPCC has developed 5 different scenarios. The most benign one is called SSP1-1.9, whereby CO2 concentration would remain relatively flat between 400 and 450 ppm and the temperature anomaly would remain close to + 1.5 degree Celsius. The most severe one is called SSP5-8.5, whereby CO2 concentration would continue increasing rapidly to 1100 ppm by the end of the century and the temperature anomaly would reach about + 4.4 degree Celsius.
    Source: IPCC Technical Summary 2021. The large gray letters are part of the following statement: ”accepted version subject to final editing.”
  • 34. Attempting to replicate the IPCC scenarios
    Parameter | CO2 model | LN(CO2) model
    Intercept | -3.2 | -19.8
    Coefficient | 0.010 | 3.43
    Temperature anomaly estimates
    CO2 ppm | CO2 model | LN(CO2) model
    300 | -0.20 | -0.21
    400 | 0.80 | 0.78
    500 | 1.81 | 1.54
    600 | 2.81 | 2.17
    700 | 3.82 | 2.70
    800 | 4.82 | 3.15
    900 | 5.83 | 3.56
    1000 | 6.84 | 3.92
    1100 | 7.84 | 4.25
    1200 | 8.85 | 4.55
    [Figure: Replicating IPCC Scenarios. y-axis: Temperature Anomaly in deg. Celsius; x-axis: CO2 Concentration (ppm), 300 to 1200. Series: CO2 model, LN(CO2) model.]
    Our CO2 linear model appears to way overshoot IPCC scenarios when using true out-of-sample CO2 concentrations that are way higher than what the model was trained on (much greater than 420 ppm and going up to 1200 ppm). However, using a very similar model structure and simply using LN(CO2) generates a curve that looks like it may very well replicate the IPCC scenarios. We will look at that in greater detail on the next slide.
    Note how the two models are very close when using CO2 concentrations that the linear CO2 model was trained on, ranging from 300 to 400 ppm.
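The two curves on this slide can be approximately reproduced from the intercepts and coefficients shown. This is a minimal sketch: the function names are hypothetical, and because the slide's coefficients are rounded (-3.2 and 0.010; -19.8 and 3.43), the results differ from the slide's table by a few hundredths of a degree.

```python
import math


def linear_co2_estimate(ppm, intercept=-3.2, slope=0.010):
    """Temperature anomaly from the linear model: intercept + slope * CO2."""
    return intercept + slope * ppm


def log_co2_estimate(ppm, intercept=-19.8, slope=3.43):
    """Temperature anomaly from the log model: intercept + slope * LN(CO2)."""
    return intercept + slope * math.log(ppm)


estimates = {ppm: (linear_co2_estimate(ppm), log_co2_estimate(ppm))
             for ppm in range(300, 1300, 100)}

for ppm, (linear, logarithmic) in estimates.items():
    print(f"{ppm:>5} ppm  linear {linear:5.2f}  log {logarithmic:5.2f}")
```

The output shows the same pattern as the slide: the two models nearly coincide between 300 and 400 ppm, then the linear model climbs much faster than the logarithmic one.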
  • 35. As shown on the graph, the LN(CO2) model temperature estimates with CO2 concentration up to 1200 ppm come very close to the ones generated by the IPCC scenarios. The graph highlights the temperature estimates for the most benign IPCC scenario, SSP1-1.9, and the most severe one, SSP5-8.5. The model slightly underestimates the former and is pretty much right on the money for the latter (the most severe scenario).
  • 36. Why did we not use LN(CO2) instead of CO2 to estimate and forecast temperature earlier?
    It is for a simple reason. When CO2 is < 420 ppm, historically there is a very strong linear relationship between CO2 and temperature. That linear relationship is stronger and better fitting than a logarithmic relationship between the two variables.
    We tested a logarithmic model with LN(CO2). It was pretty good, but it came a distant second to the linear CO2 model when conducting out-of-sample Hold Out testing.
    Over the longer term, going forward, and with true out-of-sample CO2 concentration levels (much above 420 ppm), the scientific community within the IPCC assesses that the CO2 vs. temperature relationship follows a logarithmic curve. That’s a very good thing. If the relationship were to continue to be linear, our survival would become increasingly unlikely.
  • 37. Description of the CO2 Model vs. LN(CO2) one
    As shown below, regarding the historical fit of the temperature data, both models are very close. The Adjusted R Squares are nearly even at 0.89. And, the respective model Standard Errors, between 0.117 and 0.118 degree Celsius, are also very close.
  • 38. 1982 – 2020 Out-of-sample Hold Out Test. CO2 Model vs. LN(CO2) Model
    As shown on our long out-of-sample Hold Out test (1982 – 2020), the CO2 model performs much better than the LN(CO2) model. This is especially true if we look at it from a begin-to-endpoint perspective. The CO2 model just about meets the endpoint in 2020, when the temperature anomaly is + 1.00 degree Celsius. Meanwhile, the LN(CO2) model misses it by almost 0.2 degree Celsius.
    [Figure: Temperature Anomaly. Hold Out 1982 - 2020. Series: Actual, CO2 Model, LN(CO2) Model.]
  • 39. 7. Granger Causality, VAR, IRFs
    We will use the mentioned statistical methods to attempt to assess the causality of the CO2 concentration on temperature. Based on the disclosed work so far, we already know there is a very strong association, or correlation, between the two. But, is this association truly causal?
    Demonstrating causality in any such models is most often extremely challenging. Often one can’t demonstrate true causality or even Granger causality (a less absolute definition of causality that merely entails that one variable is the chronological predecessor of another without necessarily causing the other).
  • 40. The steps to evaluate Granger Causality in this particular case
    1) Does CO2 Granger cause temperature? Run the Granger Causality test: CO2 -> temperature.
    2) Test in which direction this causality manifests itself. Run the Granger Causality test in the reverse causal direction: temperature -> CO2. This sounds absurd, but there may be ecosystem explanations supporting why this may be so. The math is agnostic on stuff like that. Granger Causality just checks if A causes B more than B causes A to confirm the causality direction.
    3) What is the sign of this causality? Obviously, we want CO2 concentration to cause rising temperatures, not declining ones. To check that, we will observe the signs of the CO2 variables' regression coefficients embedded in the underlying Vector Autoregression (VAR) model. If they sum up to a strong positive value, you have confirmed your hypothesis that CO2 causes rising temperatures. Otherwise, you have not.
    4) Next, check out the Impulse Response Function (IRF) graphs to visualize how an unanticipated shock in CO2 concentration reverberates on temperature over the next 10 years.
    5) Next, explore the Forecast Error Variance Decomposition (FEVD) to evaluate how much information CO2 truly imparts to these VAR models.
    Only once you have completed all five steps will you have drawn a complete picture of the Granger causality between two variables. Many practitioners stop after the very first step in a hurry to confirm their hypothesis, while being less than enthusiastic about pursuing the next steps that may not confirm their hypothesis.
  • 41. Does CO2 Granger cause Temperature? Yes, it does
    CO2 Granger causes Temperature testing
    # of lags | F test Value | p-value | Chi Square test Value | p-value
    1 | 39.9 | 0.00 | 40.7 | 0.00
    2 | 21.4 | 0.00 | 44.4 | 0.00
    3 | 10.0 | 0.00 | 31.6 | 0.00
    4 | 5.7 | 0.00 | 24.4 | 0.00
    We ran a set of Granger Causality tests. You start with a baseline autoregressive model that includes just 1 yearly lag of the temperature to estimate the temperature history. Next, you develop a second model by adding the 1 year lag of CO2. Finally, you test with an F test and a Chi Square test whether the residuals of the second model, including the CO2 lag, are much lower than the residuals of the baseline autoregressive model. If they are indeed lower at a statistically significant level, you conclude that CO2 does Granger cause temperature. You repeat this procedure with up to 4 yearly lags (we did not contemplate using more lags; beyond 4 yearly lags, we would likely start overfitting the model on the autoregressive properties of the respective time series).
    As shown above, both the series of F tests and Chi Square tests using models with up to 4 lags all confirm that CO2 clearly Granger causes temperature. Indeed, in all cases the resulting p-values are essentially zero, allowing us to reject the null hypothesis that there is no statistically significant difference between the two sets of residuals (baseline autoregressive model vs. model including the CO2 lags).
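The procedure described on this slide (a baseline autoregressive model compared with a model that adds lags of the candidate cause) can be sketched as an F statistic on the two residual sums of squares. This is an illustration under stated assumptions, not the author's code: the data are synthetic, the helper names (`ols_rss`, `granger_f_stat`) are hypothetical, and the sketch stops at the F statistic because the standard library has no F distribution for the p-value. The Chi Square variant shown on the slide is not reproduced.

```python
import random


def ols_rss(rows, ys):
    """Least-squares fit via the normal equations; returns the residual sum of squares."""
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, k):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution for the coefficients.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (aug[i][k] - tail) / aug[i][i]
    return sum((y - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, y in zip(rows, ys))


def granger_f_stat(cause, effect, lags=1):
    """F statistic: AR(lags) model of `effect` vs. the same model plus lags of `cause`."""
    times = range(lags, len(effect))
    ys = [effect[t] for t in times]
    restricted = [[1.0] + [effect[t - lag] for lag in range(1, lags + 1)]
                  for t in times]
    unrestricted = [row + [cause[t - lag] for lag in range(1, lags + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = ols_rss(restricted, ys)
    rss_u = ols_rss(unrestricted, ys)
    n, k = len(ys), len(unrestricted[0])
    return ((rss_r - rss_u) / lags) / (rss_u / (n - k))


# Synthetic series in which x leads y, but y does not lead x.
rng = random.Random(42)
x, y = [0.0], [0.0]
for _ in range(199):
    x_prev, y_prev = x[-1], y[-1]
    x.append(0.9 * x_prev + rng.gauss(0.0, 1.0))
    y.append(0.5 * y_prev + 0.8 * x_prev + rng.gauss(0.0, 0.5))

f_x_to_y = granger_f_stat(x, y, lags=1)
f_y_to_x = granger_f_stat(y, x, lags=1)
print(round(f_x_to_y, 1), round(f_y_to_x, 1))
```

Running the test in both directions, as done here, is the same comparison the next slide makes between CO2 -> temperature and temperature -> CO2.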
  • 42. Does CO2 Granger cause Temperature… more than Temperature Granger causing CO2? Yes it does
    CO2 Granger causes Temperature testing
    # of lags | F test Value | p-value | Chi Square test Value | p-value
    1 | 39.9 | 0.00 | 40.7 | 0.00
    2 | 21.4 | 0.00 | 44.4 | 0.00
    3 | 10.0 | 0.00 | 31.6 | 0.00
    4 | 5.7 | 0.00 | 24.4 | 0.00
    Temperature Granger causes CO2 testing
    # of lags | F test Value | p-value | Chi Square test Value | p-value
    1 | 1.9 | 0.17 | 1.9 | 0.17
    2 | 3.9 | 0.02 | 8.2 | 0.02
    3 | 2.9 | 0.04 | 9.2 | 0.03
    4 | 1.8 | 0.14 | 7.6 | 0.11
    When you run all the Granger causality tests in the other direction, all the F test and Chi Square test values are a lot lower, and the resulting p-values are much higher. In several of these Granger causality tests, we can’t reject the null hypothesis that any difference in residuals between the baseline autoregressive model and the model that adds the lags of the other variable is just due to randomness.
  • 43. # of lags selection for the VAR models using Information Criteria
    The models described earlier that include lags of both CO2 and temperature to establish causality in either direction are essentially unrestricted Vector Autoregression (VAR) models. When used for other purposes, on a standalone basis, such models are also called Autoregressive Distributed Lag (ARDL) models, a popular model structure in social sciences and econometrics.
    As a side note, when using level variables one should typically use other forms of VAR (not unrestricted). But, given that the residuals of our unrestricted VAR models are uncorrelated, we should be ok to proceed as is.
    To select the best number of lags for our VAR models, we will check the output of information criteria generated by an R function. The lower the information criterion value, the better the model fit and specification.
    Info Criteria | 1 lag | 2 lags | 3 lags | 4 lags
    AIC | -6.66 | -6.85 | -6.80 | -6.87
    HQ | -6.61 | -6.76 | -6.68 | -6.72
    SC | -6.54 | -6.63 | -6.51 | -6.49
    FPE | 0.00128 | 0.00106 | 0.00111 | 0.00104
    As shown above, two of the information criteria (HQ and SC) select the VAR model with 2 lags. And, the other two (AIC and FPE) select the VAR model with 4 lags. But, notice that all four models (with lags ranging from 1 up to 4 yearly lags) have very close information criteria values. In essence, they are very competitive with each other. So, we will often look at all four models.
  • 44. Does the CO2 vs. Temperature causal relationship have the appropriate positive sign? … well here it gets a bit foggy
    Model equation causal direction: CO2 causes temperature
    Model | CO2 Lags | Coefficient | t stat | p-value
    VAR w/ 1 lag | CO2 lag 1 | 0.005 | 6.32 | 0.00
    VAR w/ 2 lags | CO2 lag 1 | -0.049 | -2.17 | 0.03
    VAR w/ 2 lags | CO2 lag 2 | 0.055 | 2.40 | 0.02
    VAR w/ 2 lags | Sum | 0.006 | |
    VAR w/ 3 lags | CO2 lag 1 | -0.045 | -1.79 | 0.07
    VAR w/ 3 lags | CO2 lag 2 | 0.051 | 1.22 | 0.22
    VAR w/ 3 lags | CO2 lag 3 | 0.000 | 0.01 | 1.00
    VAR w/ 3 lags | Sum | 0.006 | |
    VAR w/ 4 lags | CO2 lag 1 | -0.044 | -1.72 | 0.09
    VAR w/ 4 lags | CO2 lag 2 | 0.058 | 1.35 | 0.18
    VAR w/ 4 lags | CO2 lag 3 | -0.016 | -0.37 | 0.71
    VAR w/ 4 lags | CO2 lag 4 | 0.008 | 0.29 | 0.77
    VAR w/ 4 lags | Sum | 0.006 | |
    Observing the signs of the CO2 lags regression coefficients leads us to answer the above question with much nuance. The VAR models with 2 and 3 lags both have one CO2 coefficient with the wrong negative sign. The VAR with 4 lags has two coefficients with the wrong sign.
    In some cases, we can accept coefficients with the wrong sign, considering that the CO2 -> temperature relationship may have some mean-reverting properties that would cause this reversal in coefficient signs.
    Yet, when we look at the overall Granger causality effect of CO2 on temperature (associated with an unexpected upward shock in CO2), this net effect seems very small at around 0.005 to 0.006 regardless of the VAR we use. We derive this net effect by summing the CO2 lags regression coefficients. But, at least this net effect is positive.
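The net effect and the wrong-sign counts quoted on this slide follow directly from the coefficient table. A minimal sketch of that arithmetic, using the coefficients as printed on the slide (the variable names are hypothetical):

```python
# CO2 lag coefficients in the temperature equation, keyed by number of VAR lags.
co2_lag_coefficients = {
    1: [0.005],
    2: [-0.049, 0.055],
    3: [-0.045, 0.051, 0.000],
    4: [-0.044, 0.058, -0.016, 0.008],
}

# Net effect of CO2 on temperature: the sum of the CO2 lag coefficients.
net_effects = {lags: round(sum(coefs), 3)
               for lags, coefs in co2_lag_coefficients.items()}

# Number of coefficients with the "wrong" (negative) sign in each model.
wrong_sign_counts = {lags: sum(1 for c in coefs if c < 0)
                     for lags, coefs in co2_lag_coefficients.items()}

for lags in co2_lag_coefficients:
    print(f"VAR w/ {lags} lag(s): net effect {net_effects[lags]:.3f}, "
          f"negative coefficients {wrong_sign_counts[lags]}")
```

The large offsetting lag 1 and lag 2 coefficients (around -0.05 and +0.05) net out to about the same small positive value that the 1-lag model estimates directly.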
  • 45. Impulse Response Functions
    The cumulative Impulse Response Function over the next 10 yearly periods, describing the impact on temperature of an unanticipated upward shock of a one unit increase in CO2 concentration, is rather unsettling. When using a VAR model with only 1 lag, the IRF graph makes much sense, as it illustrates CO2 having a positive impact on temperature level (left graph). But, the graph on the right, which describes the same IRF for a VAR with 2 lags, suggests that an upward shock in CO2 would have a negative impact on temperature level. The IRF graphs for the VARs with 3 and 4 lags looked nearly identical to the IRF graph for the VAR with 2 lags (right-hand graph), with the negative sign.
  • 46. Forecast Error Variance Decomposition (FEVD)
    For the VAR with 1 lag model fitting temperature, the table indicates that the autoregressive lag of temperature provides the vast majority of the information to fit temperature as the Y dependent variable, and that the exogenous CO2 lag 1 variable provides very little information to the model. All the other VAR models with up to 4 lags had the same FEVD profile, with the lags of the temperature variable providing over 99% of the information to the model and the exogenous CO2 lags providing very little information.
    Forecast Error Variance Decomposition (FEVD). VAR with just 1 lag. CO2 causes temperature
    Period | temperature | co2
    1 | 1.000 | 0.000
    2 | 1.000 | 0.000
    3 | 0.999 | 0.001
    4 | 0.998 | 0.002
    5 | 0.997 | 0.003
    6 | 0.996 | 0.004
    7 | 0.995 | 0.005
    8 | 0.994 | 0.006
    9 | 0.992 | 0.008
    10 | 0.991 | 0.009
  • 47. Why did some of the later steps of our Granger Causality Analysis show ambivalent results?
    The first couple of steps showed pretty convincing mathematical results that CO2 does Granger cause temperature. However, as shown, the later steps ranged from ambivalent to disproving.
    The above is probably due to a couple of phenomena. The first one is generic to these types of analysis. It is common to confirm Granger causality through the first couple of steps of such an analysis. But confirmation through all 5 steps is much less common.
    The second phenomenon, potentially specific to this modeling exercise, is that the temperature level variable has a very high level of autocorrelation. Within VAR models, this strong autocorrelation of temperature has probably much reduced the explanatory impact of CO2. Thus, the temperature lags partly crowded out the CO2 ones in terms of estimating temperature levels with VAR models. More specifically, the temperature autocorrelation at lag 1 is 0.9518, a bit higher than the CO2 vs. temperature correlation at lag 1 of 0.9453.
    One would think we could resolve this situation by detrending the variables and dealing with yearly changes in temperature and CO2 concentration. But, there is too much volatility in the yearly change variables to demonstrate any explicit relationship between the two variables. I had done such an exercise years ago. And, it would only serve as a means to demonstrate that there is no Granger causal relationship between the two variables.
  • 48. 8. VAR Forecast
    Here we will revisit forecasting the temperature anomaly over the 1982 – 2020 period using a model trained on 1880 – 1981 data. But, using VAR structures, we will now attempt to conduct this forecast with no information whatsoever (no info regarding prospective CO2 concentration levels). This type of forecast testing is so challenging that it borders on the absurd. Imagine actually forecasting a time series variable (S&P 500, GDP, CPI, etc.) over the next 39 years without any exogenous information over those prospective years. That would probably be close to impossible.
  • 49. Revisiting our best 1982 – 2020 forecast with the CO2 Model
    This was our best temperature anomaly forecast so far over the 1982 – 2020 period, using data from 1880 to 1981 to train our CO2 based model. As shown, this is a remarkably good forecast. It entails that if you could have known CO2 concentration over this period (1982 – 2020), you could have generated a pretty good estimate of the temperature anomaly over this same period. Notice that all the CO2 model estimates of the temperature anomaly fall well within the 95% Prediction Interval. This is a rather unusually good situation.
    [Figure: Temperature Anomaly. Hold Out 1982 - 2020. With 95% Prediction Interval. Series: Actual, CO2 Model, Lower, Upper.]
  • 50. A VAR model w/ 1 lag using LN(CO2) can predict with no info whatsoever!
    [Figure: Temperature Anomaly. VAR w/ LN(CO2) forecast, 1 lag, 1982 - 2020. P.I. 95%. Series: Actual, VAR fcst, Lower, Upper.]
    Just using LN(CO2) instead of CO2 as our second Z variable within a VAR model with 1 lag generates a surprisingly good forecast of the temperature anomaly over the 1982 – 2020 period with no information whatsoever regarding this period!
    This is rather astonishing. As shown, the VAR forecast does overestimate temperature by just about 0.1 degree Celsius at the onset in 1982 and in 2020. That’s a very small error given the model is not fed any information.
  • 51. Comparing our CO2 Model vs. VAR (with LN(CO2)) forecasts
    [Figure: Temperature Anomaly. Hold Out 1982 - 2020. Series: Actual, CO2 Model, VAR.]
    Temperature anomaly over the 1982 - 2020 period
    Statistic | Actual | CO2 Model | VAR
    Average | 0.537 | 0.561 | 0.617
    Median | 0.555 | 0.533 | 0.581
    Max | 1.005 | 0.975 | 1.100
    Min | 0.140 | 0.225 | 0.295
    Range | 0.865 | 0.751 | 0.804
    Ok, the VAR model does overestimate the temperature anomaly a bit relative to the OLS Cointegration Regression (CO2 Model). But, the VAR overestimation is really pretty small when considering that the VAR model generated a 39 year forecast with no info whatsoever. By contrast, the CO2 model was fed the precise CO2 concentration level over that entire period. That is a huge difference.
  • 52. Why did the VAR (w/ LN(CO2)) overestimate temperature?
    [Figure: CO2 (ppm). VAR (w/ LN(CO2)) forecast, 1 lag, 1982 - 2020 with P.I. 95%. Series: Actual, VAR fcst, lower, upper.]
    This question is a little perplexing because we observed earlier that using LN(CO2) instead of CO2 within our OLS regressions resulted in the LN(CO2) model underestimating temperature over the Hold Out (1982 – 2020) by quite a bit. But, when we use this same LN(CO2) variable within this VAR model, instead of underestimating temperature, it actually overestimates it by a little bit. Part of the reason is that this same VAR model does overestimate CO2 concentration.
    Remember that in the former Hold Out tests with the standard OLS regressions, the models were fed the CO2 concentration over the 1982 – 2020 period, while they were trained over the 1880 – 1981 period.
    With this VAR model, we are dealing with a rather extraordinary situation where it was trained over the 1880 – 1981 period and was not provided any information over the Hold Out period (1982 – 2020). Yet, it was asked to forecast temperature over that same period. That’s a very challenging situation.
  • 53. Conclusion
    Using CO2 concentration to estimate and forecast temperature anomaly levels was on many counts surprisingly successful.
    More complex models using additional variables associated with the Pacific Decadal Oscillation (El Nino (+); La Nina (-)) proved not so successful. They could fit the historical data. But, they turned out inferior in forecasting compared to the simpler model using just CO2 concentration.
    Using the natural log of CO2 as an independent variable was surprisingly successful for replicating the IPCC scenarios and also for forecasting the temperature anomaly over the 1982 – 2020 period with no info whatsoever using a VAR model with one lag.
    When it came to a full-fledged Granger causality analysis, our results were much humbler. We could confirm Granger causality through the first two steps (Granger causality and its relationship direction). But, the subsequent steps turned out to be rather ambivalent (VAR regression coefficient signs, IRFs, FEVD).