Can You Deep Learn the Stock Market?
Gaetan Lion, March 20, 2022
Introduction
Objectives:
We will test whether:
a) Sequential Deep Neural Networks (DNNs) can predict the stock market (S&P 500) better than OLS regression;
b) DNNs using smooth Rectified Linear activation functions perform better than ones using Sigmoid (Logit) activation functions.
Data:
Quarterly data from 1959 Q2 to 2021 Q3. All variables are fully detrended, either as quarterly % changes or as first differences in percentage points (for the interest-rate variables). The models use standardized variables, and predictions are converted back into quarterly % changes.
Data sources: FRED for the economic variables, and the Federal Reserve H.15 release for interest rates.
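These transformations are straightforward to reproduce. Here is a minimal sketch in Python (the deck's own models were built in R; the function names and toy values below are illustrative, not the author's):

```python
import numpy as np

def pct_change(series):
    """Quarterly % change: detrends level variables such as the S&P 500."""
    s = np.asarray(series, dtype=float)
    return (s[1:] - s[:-1]) / s[:-1] * 100.0

def first_difference(series):
    """First difference, in percentage points: used for interest-rate variables."""
    s = np.asarray(series, dtype=float)
    return s[1:] - s[:-1]

def standardize(x):
    """Z-score standardization, applied to all model inputs."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Toy series: an index level and an interest rate
sp500_level = [100.0, 102.0, 99.0, 103.0]
fed_funds = [5.25, 5.00, 4.50, 4.75]

sp500_qoq = pct_change(sp500_level)      # first value: (102-100)/100*100 = 2.0
rate_diff = first_difference(fed_funds)  # first value: 5.00 - 5.25 = -0.25
```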
Software used for DNNs:
The R neuralnet package, with a customized function inserted to use a smooth ReLU (SoftPlus) activation function.
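The neuralnet package accepts a user-supplied activation function (its act.fct argument), which is how the customization would be wired in. The function itself would look something like the following, sketched here in Python rather than R; the numerically stable form is an implementation detail I am assuming, not taken from the deck:

```python
import numpy as np

def softplus(x):
    """Smooth ReLU: ln(1 + e^x), written in a numerically stable form
    that avoids overflow for large positive x."""
    x = np.asarray(x, dtype=float)
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def softplus_grad(x):
    """Derivative of SoftPlus, which is exactly the sigmoid function."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))
```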
The best underlying OLS Regression model
After testing many macroeconomic variables (interest rates, monetary policy (QE), fiscal variables, and many others), the best OLS regression included the following variables, in order of predominant selection:
a) Consumer Sentiment (U of Michigan);
b) Housing Starts;
c) Yield Curve, the difference between the 5 Year Treasury and the Federal Funds rate;
d) Real GDP growth.
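As a sketch of the estimation step, here is how such a regression can be fit by ordinary least squares in Python. The data below is simulated with placeholder coefficients; the real inputs would be the four standardized series named above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 250  # roughly the number of quarters from 1959 Q2 to 2021 Q3

# Placeholder standardized regressors: sentiment, housing, yield curve, RGDP
X = rng.standard_normal((n, 4))
beta_true = np.array([0.4, 0.3, 0.1, 0.1])  # illustrative weights only
y = X @ beta_true + rng.standard_normal(n)  # noisy S&P 500 quarterly % change

# OLS with an intercept, solved by least squares
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
```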
Explanatory logic of OLS Regression to estimate and predict the S&P 500 level
Consumer Sentiment is by far the most predominant variable. This is supported by the behavioral finance literature (Richard Thaler).
Housing Starts (the 2nd variable) are supported by the research of Edward E. Leamer, who advances that the housing sector is a leading indicator of overall economic activity, which in turn impacts the stock market.
Next, the Yield Curve (5 Year Treasury minus Fed Funds) and economic activity (RGDP growth) are well-established exogenous variables that influence the stock market. Neither is quite statistically significant, and their influence is much smaller than that of the first two variables. Nevertheless, they add much explanatory logic to our OLS regression fitting the S&P 500.
Scatter Plot Matrix of Variables
The Yield Curve has a surprisingly low correlation with the S&P 500 quarterly % change. The other three independent variables have a material correlation with it.
There is no multicollinearity among the X variables, as their pairwise correlations are well below standard multicollinearity thresholds.
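That screen amounts to inspecting the off-diagonal entries of the correlation matrix of the X variables. A minimal sketch; the 0.8 cutoff is a common rule of thumb and my assumption, not a figure from the deck:

```python
import numpy as np

def max_pairwise_correlation(X):
    """Largest absolute off-diagonal correlation among the columns of X."""
    corr = np.corrcoef(X, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return np.abs(off_diag).max()

# Toy data: four columns, one pair mildly (not problematically) correlated
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4))
X[:, 1] += 0.3 * X[:, 0]

flag = max_pairwise_correlation(X) > 0.8  # rule-of-thumb threshold (assumed)
```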
A closer look: Consumer Sentiment, Housing Starts
Both variables have a correlation of around 0.4 with the S&P 500 quarterly % change. As shown, a 0.4 correlation still leaves much randomness: the data points show a wide dispersion around the estimated regression trend line.
A closer look: Yield Curve, and RGDP
Same comment as on the previous slide. You can also see that the relationship between the S&P 500 and the Yield Curve (on the left) is the weakest, as the slope of the regression trend line is almost flat (close to zero).
Common DNNs Activation Functions
Until around 2017, the preferred DNN activation function was the Sigmoid (Logistic) one, as it gives an implicit probabilistic weight, between 0 and 1, to the loading of a node or neuron. Soon after, however, the Rectified Linear Unit (ReLU) became the preferred DNN activation function. We will advance that SoftPlus, also called smooth ReLU, should be considered a superior alternative to ReLU. See further explanation on the next slide.
The Sigmoid or Logistic Activation Function
There is nothing wrong with the Sigmoid function per se. The problem occurs when you take its first derivative, which is bounded between 0 and 0.25. In DNN models, the output of one hidden layer becomes the input of the next, so the backpropagated gradient picks up one such derivative factor per layer. Multiplying these small factors from one layer to the next can generate gradient values that converge close to zero. This problem is called the "vanishing gradient." We will see that, in our situation, this problem is not material.
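The mechanics are easy to demonstrate numerically. Even in the best case, inputs at 0, where the sigmoid derivative peaks at 0.25, the gradient factor shrinks geometrically with depth:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # bounded above by 0.25, attained at x = 0

# Backpropagation multiplies one derivative factor per layer, so even the
# best case (x = 0 everywhere) shrinks the gradient geometrically:
gradient_after = {depth: sigmoid_grad(0.0) ** depth for depth in (1, 2, 5, 10)}
# depth 10 -> 0.25**10, under 1e-6: the gradient has effectively vanished
```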
ReLU and smooth ReLU or SoftPlus Activation Functions
SoftPlus appears superior to ReLU because it captures the weights of many more neurons' features: it does not zero out features with an input value < 0. Also, it generates a continuous range of derivative values between 0 and 1 (the derivative of SoftPlus is the Sigmoid function), whereas the ReLU derivative is limited to a binary outcome (0 or 1).
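The contrast shows up clearly for negative inputs, where ReLU produces a "dead" neuron (zero activation and zero gradient) while SoftPlus keeps both alive. A small sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    return (np.asarray(x) > 0).astype(float)  # only ever 0 or 1

def softplus(x):
    x = np.asarray(x, dtype=float)
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def softplus_grad(x):
    # Derivative of SoftPlus is the sigmoid: continuous, strictly in (0, 1)
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

x = -2.0
# ReLU zeroes out both the activation and the gradient for negative inputs...
dead = (relu(x) == 0.0) and (relu_grad(x) == 0.0)
# ...while SoftPlus keeps a small positive activation and gradient
alive = (softplus(x) > 0.0) and (0.0 < softplus_grad(x) < 1.0)
```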
The DNNs structure
• One input layer with 4 independent variables: Consumer Sentiment, Housing Starts, Yield Curve, and RGDP.
• Two hidden layers, the first with 3 nodes and the second with 2 nodes. The activation function for the two hidden layers is SoftPlus in the 1st DNN model and Sigmoid in the 2nd.
• One output layer with one node: the dependent variable, the S&P 500 quarterly % change. The output layer has a linear activation function.
• The DNN loss function minimizes the sum of squared errors (SSE), the same criterion as OLS.
The balance of the DNN structure is appropriate. It is recommended that the hidden layers have fewer nodes than the input layer, and more nodes than the output layer. Given that, the choice of nodes at each layer is just about predetermined. More extensive DNNs would not have worked anyway: as structured, the DNNs already had trouble converging toward a solution within an acceptable error threshold.
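For concreteness, the 4-3-2-1 structure amounts to the following forward pass, sketched here in Python with random placeholder weights (the actual models were trained with R's neuralnet):

```python
import numpy as np

def softplus(x):
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def forward(x, params, act=softplus):
    """Forward pass through the 4-3-2-1 structure described above."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = act(W1 @ x + b1)        # hidden layer 1: 3 nodes
    h2 = act(W2 @ h1 + b2)       # hidden layer 2: 2 nodes
    return W3 @ h2 + b3          # linear output: S&P 500 quarterly % change

def sse(y_true, y_pred):
    """The training loss: sum of squared errors, same criterion as OLS."""
    return float(np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

rng = np.random.default_rng(0)
params = [(rng.standard_normal((3, 4)), np.zeros(3)),
          (rng.standard_normal((2, 3)), np.zeros(2)),
          (rng.standard_normal((1, 2)), np.zeros(1))]

x = np.array([0.5, -0.2, 0.1, 0.3])   # standardized inputs for one quarter
y_hat = forward(x, params)
```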
The 3 Models’ fit of the historical data
Despite the mentioned limitation of the Sigmoid activation function, the SoftPlus and Sigmoid DNN models perform virtually identically. And they both fit the complete historical data quite a bit better than the OLS regression model.
However, as we will soon see, none of the models fit the historical data particularly well.
The three models’ fit of the historical data: scatter plots
Visually, you can’t distinguish any difference in tightness of fit between the two DNNs (SoftPlus on the left,
Sigmoid in middle). As mentioned, the Sigmoid “vanishing gradient descent” problem did not materialize.
R Square 0.415 R Square 0.27
R Square 0.412
The DNN models' fit of the historical data: time series plots
Again, you can't visually distinguish between the SoftPlus (top) and the Sigmoid (bottom) models.
The OLS Regression model fit of the historical data: time series plots
The OLS Regression model fit is weaker than that of the two DNNs. This is almost by construction: the DNNs use so many non-linear segmentations of the variables' relationships that they are bound to generate a superior fit of the historical data. As we will see, the DNNs' superior fit does not translate into superior out-of-sample predictions.
Same visual data as on previous slide but disaggregated
The DNN models capture a bit more of the volatility in the S&P 500 quarterly % change. The standard deviation of
Actuals is 7.4%; for the DNNs it is about 4.8%; and for the OLS regression it is 3.8%.
How do the models fit abrupt changes in the S&P 500, defined as absolute changes of > 14%?
The models do not do a very good job of picking up these outliers. The performance of the two DNNs is indistinguishable, and it is only incrementally better than that of the OLS Regression model.
Testing the 3 models
Can these 3 models predict?
By predicting, we mean whether they can generate decent S&P 500 quarterly % change estimates based on "new data" not included in the training of the models.
Three different Testing Periods
Each testing period is 12 quarters long, and it is a true Hold-Out, or out-of-sample, test. The training data consists of all the earlier data, from 1959 Q2 up to the onset of the Hold-Out period. Thus, for the Dot.com period, the training data runs from 1959 Q2 to 2000 Q1.
The quarters highlighted in orange denote recessions. We call the three periods the Dot.com, Great Recession, and COVID periods, as each covers the mentioned event.
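The expanding-window hold-out scheme can be sketched as follows; the quarter labels are generated here, not read from the actual dataset:

```python
def holdout_split(quarters, test_start, horizon=12):
    """Training data is everything before test_start; the hold-out set is
    the 12-quarter window beginning at test_start."""
    i = quarters.index(test_start)
    return quarters[:i], quarters[i:i + horizon]

# Quarter labels covering 1959 Q2 through 2021 Q3
quarters = [f"{y} Q{q}" for y in range(1959, 2022) for q in range(1, 5)][1:-1]

# Dot.com test: train through 2000 Q1, predict the next 12 quarters
train, hold_out = holdout_split(quarters, "2000 Q2")
```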
Testing Performance Part 1: Dot.com period
The performance of all 3 models during the
Dot.com period is really bad. None of them
captured the severe market downturn over
this entire period.
But, at the margin notice that the OLS model
performed best.
We show the model predictions on an indexed basis, where Period 0 (2000 Q1, the quarter preceding the forecast) is set equal to 100.
The next 12 quarters represent the 12 quarterly periods of the forecast within this Hold-Out test.
Testing Performance Part 2: Dot.com period
Here we show the annual % change in the S&P 500 in the 1st, 2nd, and 3rd years of the projections. And we aggregate the predictions by model, so we can see what the "skyline" looks like for each model.
As shown, the predictions of all 3 models are really pretty bad. None of the models captured the Dot.com protracted market correction.
Testing Performance Part 3: Dot.com period
This is the same visual data as shown on the previous slide, except that the data is clustered by year instead of by model. The conclusion is the same: all three models predicted poorly over the Dot.com period.
Testing Performance Part 4: Dot.com period
This compares the Goodness-of-fit metrics for the Training model vs. the same metrics for the 12-quarter Testing period, consisting of new data.
Surprisingly, in this case the R Square is often higher
during the Testing period vs. the Training one. This
is unusual. Yet, despite those occasional higher R
Squares during the Testing periods, the predictions
were rather dismal.
Focusing on the OLS Regression is interesting. It has a surprisingly high R Square of 0.76, so it picked up the directional changes of the S&P 500 reasonably well. However, it grossly overestimated the average quarterly change: +1.3% vs. an Actual of -2.7% during this Dot.com period. As a result, despite a surprisingly high R Square, the OLS Regression generated a really poor prediction. Yet, it was still better than the DNNs.
Testing Performance Part 5: Dot.com period
Here we are comparing the R Square and the Mean
Absolute Error (MAE) during the Training period vs.
the Testing one. By doing so, we derive an Overfit
multiple. If this Overfit multiple is > 1, it means a
model may be overfit, otherwise not.
Surprisingly, when looking at R Squares, none of the models suffer from any material overfitting. When we look at MAEs, the Overfit multiples are > 1, which suggests that on this count the models could be considered overfit. However, this may simply be due to the greater data volatility during the Testing period.
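The Overfit multiple can be written down directly. The direction of each ratio is my reading of the deck's definition: training over testing for R Square, and testing over training for an error metric such as MAE, so that > 1 flags deterioration on new data either way. The numbers below are illustrative, not the deck's actual metrics:

```python
def overfit_multiple_r2(r2_train, r2_test):
    """R Square ratio: > 1 means the fit degrades on new data."""
    return r2_train / r2_test

def overfit_multiple_mae(mae_train, mae_test):
    """MAE ratio, inverted: > 1 means the error grows on new data."""
    return mae_test / mae_train

# Illustrative values: R Square holds up, but the error grows on new data
r2_ratio = overfit_multiple_r2(0.41, 0.76)   # < 1: no overfitting signal
mae_ratio = overfit_multiple_mae(3.0, 5.1)   # > 1: possible overfitting
```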
The main takeaway is that the DNNs, despite their greater complexity, performed worse than the OLS Regression.
Testing Performance Part 1: Great Recession period
The models’ projections look quite a
bit better than during the Dot.com
period.
At least they are directionally correct.
All three models convey a market
downturn during the Great Recession.
Testing Performance Part 2: Great Recession period
The “skylines” are quite a bit better
for this Great Recession period
than the ones for the Dot.com
period.
The skylines of the Sigmoid and OLS Regression models are more convergent with Actuals than that of the SoftPlus model.
Testing Performance Part 4: Great Recession period
Focusing on the Testing period, the R Square and MAE both show fairly material deterioration. This is to be expected, since the models have not been trained on the new data, as specified.
However, the projections are better than during the Dot.com period, because the models' predicted average quarterly % changes in the S&P 500 are at least of the same sign as the Actual data.
The performance of the DNNs is not readily
differentiable from the OLS one. Again, no gain
from the added complexity.
Note that the SoftPlus model, with the supposedly better activation function, has the worst R Square and MAE.
Testing Performance Part 5: Great Recession period
Now, we see rather stronger cases of
model overfitting. And, the overfitting
is typically more pronounced for the
DNNs, just as we expected.
Testing Performance Part 1: COVID period
The SoftPlus model exaggerated the market downturn in 2020 Q1. As a result, its predictions out to 2021 Q3 ended up way too low.
The Sigmoid model pretty much missed all the market turns, but it ended up generating the best begin-point to end-point prediction.
The OLS model tracked Actuals best up to 2020 Q1.
But, thereafter it missed much of the strength of
the spectacular Bull market over the remaining
quarters.
On a relative basis, these projections are not quite as good as during the Great Recession period, but they are better than during the Dot.com period.
Testing Performance Part 2: COVID period
Looking at these skylines, none of them looks visually convergent with Actuals.
Testing Performance Part 4: COVID period
During the Testing period, all models underestimate the average pace of the market. They all underestimate, by a wide margin, the bull market strength during the 3rd year.
Testing Performance Part 5: COVID period
Not much overfitting, as specified. But, as expected, what overfitting there is, is smaller for the OLS Regression than for the DNNs.
Testing Performance just looking at Averages
None of the models does that well on this count. As mentioned elsewhere, the simpler OLS Regression model is typically competitive with the more complex DNN models.
Testing Performance looking at Averages and Standard Deviation
Given the DNNs' structure, you would expect them to better capture the volatility (standard deviation) of the Y variable than the OLS Regression. But this is not always the case.
The models do not fit the historical data well enough to predict well
The models' weak historical fit is due to the variables' relationships being very unstable
The graphs show rolling 12-quarter correlations between the Y and X variables. The correlations are very volatile; they often flip sign.
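Rolling 12-quarter correlations of this kind can be computed as below. The toy series is constructed so that its relationship flips sign halfway through, mimicking the instability the graphs show:

```python
import numpy as np

def rolling_correlation(y, x, window=12):
    """Correlation between y and x over trailing 12-quarter windows."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    out = []
    for i in range(window, len(y) + 1):
        out.append(np.corrcoef(y[i - window:i], x[i - window:i])[0, 1])
    return np.array(out)

# Toy series whose relationship flips sign halfway through
t = np.arange(48)
x = np.sin(t / 3.0)
y = np.where(t < 24, x, -x) + 0.01 * np.random.default_rng(2).standard_normal(48)

corr = rolling_correlation(y, x)
sign_flips = np.sum(np.sign(corr[1:]) != np.sign(corr[:-1]))
```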
Correlations during Training and Testing are very different
Correlations between the Y and X variables are very different during the respective Training and Testing periods. Given that, the models have no chance of predicting reasonably accurately.
Considerations
• Macroeconomic relationships are way too unstable to facilitate the development of effective predictive models.
• Even fitting the historical data is already challenging.
• DNNs provide no advantage whatsoever over the simpler OLS Regression. The DNNs' promoted capacity for capturing non-linear relationships makes them more likely to overfit on randomness.
• These models' inability to predict the stock market is probably not due to any missing confounding variables, but rather to unstable variable relationships and pervasive data randomness.
• More complex DNNs, with more variables, more hidden layers, and more nodes, would probably not perform better. They may not even be feasible: the presented DNNs already had challenges converging toward a solution.