Summary Of Thesis
Dissertation summary by Francesco Azzena
Neural Networks for Predicting Financial Series. A Case Study: the S&P Mib Index.
The subject of this thesis is the study of so-called neural networks for forecasting purposes. These networks consist of a system of mathematical-statistical models capable of describing the relationship between one or more output variables and a set of inputs. This type of modelling is distinctive in that no direct relationship is established between inputs and outputs: one or more hidden layers are set between them, containing computing units called neurons, which create fictitious variables.
These models have experienced fluctuating fortunes over the years: the first attempts appeared in the 1940s as simple linear combinations of inputs. Slowly, neural networks were refined, from the famous Perceptron models to multi-layer models with nonlinear activation functions. The construction of these models involved many phases of training, whose purpose was to make the network learn the relationship between the variables under consideration. Since proper computational tools had not yet been conceived, neural networks slowly fell out of the attention of statisticians, above all because of their great complexity.
A second breakthrough came at the end of the 1980s, almost unexpectedly: George Cybenko, an American professor of mathematics, proved through the study of the properties of the sigmoid function that, if well constructed, neural networks with such activation functions and one hidden layer could approximate any nonlinear data-generating process, and this with an arbitrarily small margin of error. Of course, this conclusion received great prominence: working on non-linear generating processes is always problematic, precisely because of the difficulty of establishing a good model to use.
Neural networks can approximate, in theory at least, any type of non-linearity in the series through the learning system and the non-linearity of the activation functions between the layers.
For this reason, neural networks have made a comeback in many scientific fields, from economics
to medicine, biology, meteorology and many others.
We thought therefore that it would be interesting to investigate in our work the theoretical basis
which underpin this tool, so much in vogue nowadays.
With this objective in mind, we decided to produce a case study that would improve our knowledge of the issues related to the construction of a neural network. This required the use of particular computer tools: notably, we chose to write a program in the Matlab language instead of using ready-made statistical packages.
As this thesis was written within the framework of a degree profile in "Banking and Finance", we decided to focus on the most important index of the Milan stock exchange, the Standard & Poor's Mib (S&P Mib), and to attempt to make good predictions. The choice may seem bold: according to the theory of the weak efficiency of the markets, we should not be able to provide a proper model for such a variable, since it should follow a random walk and therefore have completely random variations, just like a white noise. We nonetheless decided to make the attempt, hoping not only to learn about neural networks, but also to make interesting predictions or, in the worst-case scenario, to demonstrate the correctness of the theory under consideration.
More specifically, the problem we tried to solve was how to predict the variations at closing time of the Italian stock exchange by using the information available before opening time. The variables taken into account were the latest available closing values of the Tokyo and New York stock exchanges, together with the exchange rates of the Euro against the Yen and the Dollar published by the European Central Bank. If such a model worked, a hypothetical operator on the Milan stock exchange would be able to anticipate the market successfully, speculating by taking "short" or "long" positions on securities highly correlated with the market.
The structure of the thesis is rather simple: after a brief introduction in the first chapter, we describe, in the following chapter, the history of the ideas related to neural networks, with a short digression on the biological phenomenon of the same name.
We then analyze the historical process by which, from the first simple neural networks, more and more complex networks came to be used. The structure of neural networks has indeed evolved over time: initially, there was the model of the physiologists McCulloch and Pitts (MCP), which was simply a weighted sum compared with a threshold value, allowing for a binary response comparable to the activity of the brain, which works through electrical impulses. As this methodology is suitable only for linearly separable problems, Rosenblatt proposed the Perceptron model in 1958: for the first time, the idea of a hidden layer between inputs and outputs appeared. The combination of the inputs remained a simple weighted sum, but by increasing the number of neurons, and thus of summations, it became possible to define a precise region of the Cartesian plane and thereby solve problems that are not linearly separable.
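The step from the single MCP unit to a layered arrangement can be illustrated with a minimal sketch (written in Python for brevity here, although the thesis works in Matlab): one threshold unit reproduces AND and OR, but XOR, which is not linearly separable, requires a hidden layer of threshold units.

```python
# A McCulloch-Pitts (MCP) unit: a weighted sum of binary inputs
# compared with a threshold, producing a binary response.

def mcp_unit(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Logical AND: both inputs must be active to reach the threshold.
def AND(x1, x2):
    return mcp_unit([x1, x2], weights=[1, 1], threshold=2)

# Logical OR: a single active input suffices.
def OR(x1, x2):
    return mcp_unit([x1, x2], weights=[1, 1], threshold=1)

# XOR cannot be realized by any single MCP unit; with a hidden
# "layer" of two MCP units feeding a third one, it becomes expressible.
def XOR(x1, x2):
    h1 = mcp_unit([x1, x2], weights=[1, -1], threshold=1)   # x1 AND NOT x2
    h2 = mcp_unit([x1, x2], weights=[-1, 1], threshold=1)   # x2 AND NOT x1
    return OR(h1, h2)
```

The hidden units each carve out one half-plane; their disjunction defines a region no single linear threshold could delimit.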
The Perceptron idea opened the way for increasingly complex solutions. Today, the number of hidden layers and of neurons in each of them depends only on the structural choices made by the user. Moreover, in every neuron we can now apply to the weighted sum of inputs from the previous layer a function chosen at our discretion. This high degree of customization has allowed neural networks to adapt easily to many types of studies, thus finding applications in many branches of science.
In the next part of the thesis, we explain the characteristics which distinguish the main types of neural networks: the presence or absence of a teacher (which allows the model to learn) distinguishes supervised networks (the most common type, and the one on which we focussed our attention) from unsupervised ones, which are less efficient but have the advantage of operating in real time without any human intervention.
After an overview of these models, we analyze in detail the various components of a common supervised neural network, with particular attention to the activation functions and the weights. When we create a model of this type, indeed, the first decisions we have to take pertain to the structure: the high level of customization allows us to create the network which we believe is best suited to the series under study. Once the number of layers and of hidden neurons in each of them has been determined, another important choice concerns the activation functions. As already mentioned, these are the functions that we apply to the weighted sum of inputs in each neuron of the network. There exist various types of such functions, of which we give a general picture. The most important to date is the sigmoid, since a fundamental study concerning it has already been produced.
Neural networks had almost fallen into disuse when the mathematician George Cybenko demonstrated, in an article focussing on the properties of the sigmoid, that a neural network with this type of activation function, if well structured and with the right choice of variables, is a universal approximator. Neural networks are therefore able to approximate a non-linear function generating the data with a margin of error smaller than an arbitrarily chosen epsilon.
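Cybenko's result concerns finite sums of the form G(x) = Σ_j α_j σ(w_j x + b_j), where σ is the sigmoid. A small sketch (in Python; the parameter values below are our own hand-picked illustration, not taken from the thesis) shows how even two scaled, shifted sigmoids already produce a markedly non-linear shape:

```python
import math

def sigmoid(t):
    """The sigmoid activation function, sigma(t) = 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def G(x, alphas, ws, bs):
    """Cybenko's approximant: a finite sum of scaled, shifted sigmoids."""
    return sum(a * sigmoid(w * x + b) for a, w, b in zip(alphas, ws, bs))

# Two steep sigmoids build an approximate "bump" on the interval (0, 1):
# G(x) is close to 1 inside the interval and close to 0 outside it,
# a shape no linear combination of the raw inputs can produce.
alphas, ws, bs = [1.0, -1.0], [50.0, 50.0], [0.0, -50.0]
```

The theorem says that, by taking enough such terms with well-chosen parameters, the error against any continuous target on a compact set can be pushed below any chosen epsilon.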
At the end of the second chapter of the thesis, we explain the methods for training the weights of the network: when the network is created, the weights of the sum in each neuron are chosen randomly. Subsequently, using the comparison between the estimated values and the values of the teacher, the weights are trained toward their best estimates.
Starting from the definition of the error, we then examine how it is possible to use its gradient to move towards the minimum point of the error curve by changing the weights of the model. Subsequently, we discuss the backpropagation algorithm, a method that enables the transmission of the error to the hidden layers of the model, although it is calculated solely on the final outputs, the only observable ones.
We also pay attention to the methods most commonly used to reduce the time needed to compute the weights and to improve performance. More particularly, we focus on methods based on the learning rate, both constant and variable; the latter was subsequently used in practice, in the form proposed by Silva and Almeida.
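The training scheme just described can be sketched as a toy program (Python here; the thesis's actual code is in Matlab, and we use a constant learning rate rather than the Silva-Almeida variable-rate scheme): random initial weights, a forward pass through one hidden sigmoid layer, and gradient-descent updates in which the output error is backpropagated through the chain rule.

```python
import math
import random

def sigmoid(t):
    # Clamped to avoid floating-point overflow for extreme arguments.
    if t < -60: return 0.0
    if t > 60: return 1.0
    return 1.0 / (1.0 + math.exp(-t))

def train(samples, hidden=5, lr=0.1, epochs=2000, seed=0):
    """Train a 1-input, one-hidden-layer, 1-output network by gradient descent."""
    rng = random.Random(seed)
    # Weights are initialised at random, as described in the text.
    w = [rng.uniform(-1, 1) for _ in range(hidden)]   # input -> hidden
    b = [rng.uniform(-1, 1) for _ in range(hidden)]   # hidden biases
    a = [rng.uniform(-1, 1) for _ in range(hidden)]   # hidden -> output
    c = 0.0                                           # output bias

    def forward(x):
        h = [sigmoid(w[j] * x + b[j]) for j in range(hidden)]
        return h, c + sum(a[j] * h[j] for j in range(hidden))

    for _ in range(epochs):
        for x, y in samples:
            h, yhat = forward(x)
            err = yhat - y   # derivative of the squared error, up to a factor
            # Backpropagation: the error, observed only at the output,
            # is sent back to the hidden weights via the chain rule.
            for j in range(hidden):
                grad_a = err * h[j]
                grad_h = err * a[j] * h[j] * (1.0 - h[j])  # sigmoid derivative
                a[j] -= lr * grad_a
                w[j] -= lr * grad_h * x
                b[j] -= lr * grad_h
            c -= lr * err
    return lambda x: forward(x)[1]

def mse(model, samples):
    """Mean square error of the model over a set of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in samples) / len(samples)
```

For instance, fitting the nonlinear target y = x² on [-1, 1] with this loop drives the MSE well below that of the untrained random network.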
At the end of this chapter, we also discuss the most popular way to avoid overfitting, i.e. the model's excessive adaptation to the sample, which makes it almost useless out of sample. This is one of the greatest risks in using a model characterized by the ability to learn. The method we propose and use is the division of the sample into three different groups: the training set, used to train the model; the cross-validation set, used to avoid overfitting through the comparison between the error curves on the training set and on the cross-validation set; and the test set, the part of the sample used to test the forecasting ability once training has ended and the weights are fixed.
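The three-way division can be sketched as follows (Python; the 60/20/20 proportions are our illustrative assumption, not necessarily those used in the thesis). For a time series the split is chronological, without shuffling:

```python
def split_sample(series, train_pct=60, cv_pct=20):
    """Split a series chronologically into training, cross-validation
    and test sets: training estimates the weights, cross-validation
    monitors overfitting during training, and the test set measures
    out-of-sample forecasting ability with the weights fixed."""
    n = len(series)
    i = n * train_pct // 100
    j = n * (train_pct + cv_pct) // 100
    return series[:i], series[i:j], series[j:]
```

Training stops when the error on the cross-validation set starts rising while the training-set error keeps falling, the signature of overfitting.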
In the third chapter of the thesis, we turn to practice. First, we analyze the data chosen to test neural networks for forecasting financial series: our choice fell on the daily percentage changes of the S&P Mib. We tried to explain its trend by using the data available before the start of trading: more particularly, we used the same-morning closing value of the Nikkei 225 index of the Tokyo stock exchange (available thanks to the time-zone difference) and that of the previous day, the Dow Jones Industrial Average of New York for the two previous days, and the Euro/Yen and Euro/Dollar exchange rates for the two previous days.
The thesis also contains a section devoted to the theory of the weak efficiency of the markets, which would seem to imply the unpredictability of the series under consideration, since it would behave as a purely random variable.
We then move on to the implementation of our ideas in the Matlab language. We tried to create a small guide to the implementation of a neural network with this program, focusing on the commands which enact the options described in the theoretical part.
We created fifty models for each type of one-hidden-layer neural network, the types being distinguished by the number of neurons (from one to fifteen, for a total of 750 estimated models). We then selected the best model of each type and compared them on the prediction of the test set: the objective was to find a network performing better than a white noise; failing to do so would confirm the theory of the weak efficiency of the markets.
In the last chapter of the thesis, we review the work done, drawing conclusions about the practical work in light of the theory discussed previously.
When we chose the S&P Mib for the practical application, we expected to meet many difficulties, and so it proved: the estimated models unfortunately did not give satisfactory results. Using the mean square error (MSE) to assess predictive performance, the best result was obtained by treating the series as a white noise, i.e. by setting the predicted values constantly equal to zero: the theory of the weak efficiency of the financial markets therefore seems fully confirmed. However, in some cases the results were interesting, especially concerning the ability to approximate well the combination of sign and magnitude of the daily change of the stock market index.
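The evaluation just described can be sketched as follows (Python; the numbers in the test are invented for illustration, not the thesis's data): a model's MSE on the daily percentage changes is set against the MSE of the white-noise benchmark, which predicts a zero change every day.

```python
def mse(predictions, actuals):
    """Mean square error between predicted and actual daily changes."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)

def beats_white_noise(predictions, actuals):
    """True if the model outperforms the white-noise benchmark,
    which forecasts a zero percentage change every day."""
    zero_baseline = [0.0] * len(actuals)
    return mse(predictions, actuals) < mse(zero_baseline, actuals)
```

Under weak-form market efficiency, no model should systematically pass this test; the thesis's 750 estimated networks indeed failed to do so.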
Therefore, although they cannot be considered a tool that makes certain predictions, such models could be useful in the hands of an experienced operator: indeed, used in conjunction with other information circulating in the markets, and with the awareness and intuition derived from experience, they could find a profitable use.
To operate on an aggregate as broad and varied as a stock exchange index, it is first necessary to engage properly with the economic theories on which it is based. We tried to identify the most appropriate inputs but, obviously, a financial analyst with greater knowledge of the mechanisms of the market could make a better choice: Cybenko's theorem could deliver on its promise with a good choice not only of the structure but also of the inputs to the network.