CHAPTER 4 Resolutions
Chapter 1
1.
a) Please see text in Section 1.2.
b) Please see text in Section 1.2.
2. Please see text in Section
3. This is a very simple exercise.
a) The net input is just $net = 0.5$. The corresponding output is $\dfrac{1}{1+e^{-0.5}} = 0.622$.
b) To obtain the input patterns you can use the Matlab function inp=randn(10,2). You can also obtain these values by downloading the file inp.mat. The Matlab function required is in singneur.m.
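Since singneur.m is not listed here, the following minimal sketch (with assumed weight and bias values, not the ones used in singneur.m) illustrates what such a single sigmoid neuron computation looks like in Matlab:

% Minimal single sigmoid neuron applied to random input patterns.
% The weight vector w and bias b below are illustrative assumptions.
inp = randn(10,2);            % 10 input patterns with 2 inputs each
w   = [1; 1];                 % assumed weight vector
b   = -0.5;                   % assumed bias
net = inp*w + b;              % net input for each pattern
y   = 1./(1+exp(-net));       % sigmoid output
disp([net y]);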
4. Consider $f_i(C_i, \sigma_i) = \exp\left(-\dfrac{\sum_{k=1}^{n}(C_{i,k}-x_k)^2}{2\sigma_i^2}\right)$.
a) The derivative, $\dfrac{\partial f_i}{\partial \sigma_i}$, of this activation function, with respect to the standard deviation, $\sigma_i$, is:
$$\frac{\partial f_i}{\partial \sigma_i} = e^{-\frac{\sum_{k=1}^{n}(C_{i,k}-x_k)^2}{2\sigma_i^2}}\cdot\left(-\frac{\sum_{k=1}^{n}(C_{i,k}-x_k)^2}{2\sigma_i^2}\right)' = -f_i\cdot\frac{\sum_{k=1}^{n}(C_{i,k}-x_k)^2}{2}\cdot\left(\sigma_i^{-2}\right)' = -f_i\cdot\frac{\sum_{k=1}^{n}(C_{i,k}-x_k)^2}{2}\cdot\left(-2\sigma_i^{-3}\right) = f_i\cdot\frac{\sum_{k=1}^{n}(C_{i,k}-x_k)^2}{\sigma_i^3}$$
b) The derivative, $\dfrac{\partial f_i}{\partial C_{i,k}}$, of this activation function, with respect to its kth centre, $C_{i,k}$, is:
$$\frac{\partial f_i}{\partial C_{i,k}} = \left(e^{-\frac{\sum_{k=1}^{n}(C_{i,k}-x_k)^2}{2\sigma_i^2}}\right)' = -f_i\cdot\left(\frac{\sum_{k=1}^{n}(C_{i,k}-x_k)^2}{2\sigma_i^2}\right)' = -f_i\cdot\frac{1}{2\sigma_i^2}\cdot 2(C_{i,k}-x_k) = -f_i\cdot\frac{C_{i,k}-x_k}{\sigma_i^2}$$
c) The derivative, $\dfrac{\partial f_i}{\partial x_k}$, of this activation function, with respect to the kth input, $x_k$, is:
$$\frac{\partial f_i}{\partial x_k} = -f_i\cdot\left(\frac{\sum_{k=1}^{n}(C_{i,k}-x_k)^2}{2\sigma_i^2}\right)' = -f_i\cdot\frac{1}{2\sigma_i^2}\cdot(-2)(C_{i,k}-x_k) = f_i\cdot\frac{C_{i,k}-x_k}{\sigma_i^2}$$
5. The recursive definition of a B-spline function is:
$$N_k^j(x) = \left(\frac{x-\lambda_{j-k}}{\lambda_{j-1}-\lambda_{j-k}}\right)N_{k-1}^{j-1}(x) + \left(\frac{\lambda_j-x}{\lambda_j-\lambda_{j-k+1}}\right)N_{k-1}^{j}(x)$$
a) By definition, $N_1^j(x) = \begin{cases}1, & x\in I_j\\ 0, & \text{otherwise}\end{cases}$.
b)
$$N_2^j(x) = \left(\frac{x-\lambda_{j-2}}{\lambda_{j-1}-\lambda_{j-2}}\right)N_1^{j-1}(x) + \left(\frac{\lambda_j-x}{\lambda_j-\lambda_{j-1}}\right)N_1^{j}(x)$$
We now have to determine the values of $N_1^{j-1}(x)$ and $N_1^{j}(x)$.
If $x\in I_j$, then $N_1^{j-1}(x)=0$ and $N_1^{j}(x)=1$, and $N_2^j(x)=\dfrac{\lambda_j-x}{\lambda_j-\lambda_{j-1}}$.
If $x\in I_{j-1}$, then $N_1^{j-1}(x)=1$ and $N_1^{j}(x)=0$, and $N_2^j(x)=\dfrac{x-\lambda_{j-2}}{\lambda_{j-1}-\lambda_{j-2}}$.
Splines of order 2 can be seen in fig. 1.13 b).
c)
$$N_3^j(x) = \left(\frac{x-\lambda_{j-3}}{\lambda_{j-1}-\lambda_{j-3}}\right)N_2^{j-1}(x) + \left(\frac{\lambda_j-x}{\lambda_j-\lambda_{j-2}}\right)N_2^{j}(x)$$
We now have to find out the values of $N_2^{j-1}(x)$ and $N_2^{j}(x)$. We have done that above, and:
$$N_2^{j-1}(x) = \begin{cases}\dfrac{\lambda_{j-1}-x}{\lambda_{j-1}-\lambda_{j-2}}, & x\in I_{j-1}\\[2mm] \dfrac{x-\lambda_{j-3}}{\lambda_{j-2}-\lambda_{j-3}}, & x\in I_{j-2}\end{cases}
\qquad
N_2^{j}(x) = \begin{cases}\dfrac{\lambda_{j}-x}{\lambda_{j}-\lambda_{j-1}}, & x\in I_{j}\\[2mm] \dfrac{x-\lambda_{j-2}}{\lambda_{j-1}-\lambda_{j-2}}, & x\in I_{j-1}\end{cases}$$
Replacing the last two equations, we have:
$$N_3^j(x) = \begin{cases}\dfrac{x-\lambda_{j-3}}{\lambda_{j-1}-\lambda_{j-3}}\cdot\dfrac{x-\lambda_{j-3}}{\lambda_{j-2}-\lambda_{j-3}}, & x\in I_{j-2}\\[2mm]
\dfrac{x-\lambda_{j-3}}{\lambda_{j-1}-\lambda_{j-3}}\cdot\dfrac{\lambda_{j-1}-x}{\lambda_{j-1}-\lambda_{j-2}} + \dfrac{\lambda_j-x}{\lambda_j-\lambda_{j-2}}\cdot\dfrac{x-\lambda_{j-2}}{\lambda_{j-1}-\lambda_{j-2}}, & x\in I_{j-1}\\[2mm]
\dfrac{\lambda_j-x}{\lambda_j-\lambda_{j-2}}\cdot\dfrac{\lambda_j-x}{\lambda_j-\lambda_{j-1}}, & x\in I_{j}\end{cases}$$
In a more compact form, we have:
$$N_3^j(x) = \begin{cases}\dfrac{(x-\lambda_{j-3})^2}{(\lambda_{j-1}-\lambda_{j-3})(\lambda_{j-2}-\lambda_{j-3})}, & x\in I_{j-2}\\[2mm]
\dfrac{x-\lambda_{j-3}}{\lambda_{j-1}-\lambda_{j-3}}\cdot\dfrac{\lambda_{j-1}-x}{\lambda_{j-1}-\lambda_{j-2}} + \dfrac{\lambda_j-x}{\lambda_j-\lambda_{j-2}}\cdot\dfrac{x-\lambda_{j-2}}{\lambda_{j-1}-\lambda_{j-2}}, & x\in I_{j-1}\\[2mm]
\dfrac{(\lambda_j-x)^2}{(\lambda_j-\lambda_{j-2})(\lambda_j-\lambda_{j-1})}, & x\in I_{j}\end{cases}$$
Assuming that the knots are equidistant, and that every interval is denoted by $\Delta$, we can have:
$$N_3^j(x) = \begin{cases}\dfrac{(x-\lambda_{j-3})^2}{2\Delta^2}, & x\in I_{j-2}\\[2mm]
\dfrac{(x-\lambda_{j-3})(\lambda_{j-1}-x) + (\lambda_j-x)(x-\lambda_{j-2})}{2\Delta^2}, & x\in I_{j-1}\\[2mm]
\dfrac{(\lambda_j-x)^2}{2\Delta^2}, & x\in I_{j}\end{cases}$$
Splines of order 3 can be seen in fig. 1.13 c).
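The recursion above can be evaluated with a short recursive Matlab function. The sketch below is an assumption-laden illustration: the knot vector lam stores lambda_0 ... lambda_r with the index shift lam(i+1) = lambda_i, so that Matlab's 1-based indexing can be used, and I_j is taken as [lambda_{j-1}, lambda_j).

% bspline.m - recursive evaluation of N_k^j(x) as defined above.
% lam(i+1) = lambda_i (indexing assumption); valid for j >= k.
function N = bspline(k, j, x, lam)
    if k == 1
        N = double(x >= lam(j) & x < lam(j+1));          % indicator of I_j
    else
        N = (x - lam(j-k+1))./(lam(j) - lam(j-k+1)) .* bspline(k-1, j-1, x, lam) ...
          + (lam(j+1) - x)./(lam(j+1) - lam(j-k+2)) .* bspline(k-1, j, x, lam);
    end
end
% Example usage (in a separate script), with equidistant knots lambda_i = i:
%   lam = 0:5;  x = linspace(0, 5, 501);
%   plot(x, bspline(3, 3, x, lam));   % an order-3 (quadratic) spline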
6. Please see text in Section 1.3.3.
7. The input vector is $x = [-1, -0.5, 0, 0.5, 1]$ and the desired target vector is $t = [1, 0.25, 0, 0.25, 1]$.
a) The training criterion is $\Omega = \dfrac{\sum_{i=1}^{5}e^2[i]}{2}$, where $e = t - y$. The output vector, y, is, in this case, $y = w_1 + w_2 x$, and therefore:
$$\Omega = \big((1-(w_1-w_2))^2 + (0.25-(w_1-0.5w_2))^2 + (0-w_1)^2 + (0.25-(w_1+0.5w_2))^2 + (1-(w_1+w_2))^2\big)/2$$
The gradient vector, in general form, is:
$$g = \begin{bmatrix}\dfrac{\partial\Omega}{\partial w_1}\\[2mm] \dfrac{\partial\Omega}{\partial w_2}\end{bmatrix} = \begin{bmatrix}-e_1-e_2-e_3-e_4-e_5\\ e_1+0.5e_2-0.5e_4-e_5\end{bmatrix}$$
For the point [0, 0], it is:
$$g\Big|_{w=[0,0]} = \begin{bmatrix}-(1+0.25+0+0.25+1)\\ 1+0.5\cdot 0.25-0.5\cdot 0.25-1\end{bmatrix} = \begin{bmatrix}-2.5\\ 0\end{bmatrix}$$
b) For each pattern p, the correlation matrix is $W = I_p T_p^T$. For the 5 input patterns, we have (we have just 1 output, and therefore a weight vector):
$$W = \begin{bmatrix}1\\-1\end{bmatrix}\cdot 1 + \begin{bmatrix}1\\-0.5\end{bmatrix}\cdot 0.25 + \begin{bmatrix}1\\0\end{bmatrix}\cdot 0 + \begin{bmatrix}1\\0.5\end{bmatrix}\cdot 0.25 + \begin{bmatrix}1\\1\end{bmatrix}\cdot 1 = \begin{bmatrix}2.5\\ 0\end{bmatrix}$$
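These computations can be verified numerically with a few lines of Matlab (the variable names are arbitrary):

% Gradient of the criterion at w = [0 0], for y = w1 + w2*x and the data above.
x = [-1 -0.5 0 0.5 1]';
t = [ 1 0.25 0 0.25 1]';
w = [0; 0];
e = t - (w(1) + w(2)*x);
g = [-sum(e); -sum(e.*x)]        % gradient of sum(e.^2)/2 -> [-2.5; 0]
% Correlation (Hebbian) weight vector with augmented inputs [1; x_p]:
W = [ones(5,1) x]' * t           % -> [2.5; 0]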
Chapter 2
8. A decision surface which is a hyperplane, such as the one represented in the next figure, separates data into two classes:
[Figure: two classes, C1 and C2, separated by the hyperplane w1x1 + w2x2 - θ = 0]
If it is possible to define a hyperplane that separates the data into two classes (i.e., if it is possible to determine a weight vector w that accomplishes this), then the data is said to be linearly separable.
[Figure: the two classes of the XOR problem, represented by circles and crosses]
The above figure illustrates the 2 classes of an XOR problem. There is no straight
line that can separate the circles from the crosses. Therefore, the XOR problem is not
linearly separable.
9. In an Adaline, the input and output variables are bipolar {-1, +1}, while in a Perceptron the inputs and outputs are 0 or 1. The major difference, however, lies in the learning algorithm, which in the case of the Adaline is the LMS algorithm, and in the Perceptron is the Perceptron Learning Rule. Also, in an Adaline the error is computed at the net input (neti), and not at the output as in the Perceptron. Therefore, in an Adaline, the error is not limited to the discrete values {-1, 0, 1} as in the normal perceptron, but can take any real value.
10. Consider the figure below:
[Figure: the two classes of the AND function in the (x1, x2) plane, and the separating line w1x1 + w2x2 - θ = 0]
The AND function has the following truth table:
Table 4.172 - AND truth table
I1 | I2 | AND
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
This means that if we design a line passing through the points (0,1) and (1,0), and translate this line so that it stays in the middle of these points and the point (1,1), we have a decision boundary that is able to classify data according to the AND function. The line that passes through (0,1) and (1,0) is given by
$$x_1 + x_2 - 1 = 0,$$
which means that $w_1 = w_2 = \theta = 1$. Any value of $\theta$ satisfying $1 < \theta < 2$ will do the job.
11. Please see Ex. 2.4.
12. The exclusive OR function can be implemented as:
$$X \oplus Y = \bar{X}Y + X\bar{Y}$$
Therefore, we need two AND functions, and one OR function. To implement the first AND function, if the sign of the 1st weight is changed, and the 3rd weight is changed to $\theta + w_1$, then the original AND function implements the function $\bar{x}y$ (please see Ex. 2.4).
Using the same reasoning, if the sign of weight 2 is changed, and the 3rd weight is changed to $\theta + w_2$, then the original AND function implements the function $x\bar{y}$.
Finally, if the perceptron implementing the OR function is employed, with the outputs of the previous perceptrons as inputs, the XOR problem is solved.
Then, the implementation of the function $f(x_1, x_2, x_3) = x_1 \wedge (x_2 \oplus x_3)$ uses just the Adaline that implements the AND function, with inputs $x_1$ and the output of the XOR function.
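A quick Matlab check of this kind of composition is sketched below. The thresholds used for the two modified units are illustrative choices (satisfying the construction), not the exact weights of Ex. 2.4:

% Threshold units implementing AND, the two modified units and their OR,
% composed into XOR. theta = 1.5 is one value satisfying 1 < theta < 2.
step = @(v) double(v > 0);
X    = [0 0; 0 1; 1 0; 1 1];            % all input combinations
ANDf = step( X(:,1) + X(:,2) - 1.5);    % w1 = w2 = 1, theta = 1.5
u1   = step(-X(:,1) + X(:,2) - 0.5);    % implements ~x AND y
u2   = step( X(:,1) - X(:,2) - 0.5);    % implements x AND ~y
XORf = step(u1 + u2 - 0.5);             % OR of the two previous outputs
disp([X ANDf XORf]);                    % last two columns: AND and XOR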
13. Please see Section 2.1.2.2.
14. Assume that you have a network with just one hidden layer (the proof can be easily
extended to more than 1 hidden layer).
The output of the first hidden layer, for pattern p, can be given as $O_{p,\cdot}^{(2)} = W^{(1)} O_{p,\cdot}^{(1)}$, as the activation functions are linear.
In the same way, the output of the network is given by $O_{p,\cdot}^{(3)} = W^{(2)} O_{p,\cdot}^{(2)}$. Combining the two equations, we stay with:
$$O_{p,\cdot}^{(3)} = W^{(2)} W^{(1)} O_{p,\cdot}^{(1)} = W O_{p,\cdot}^{(1)}.$$
Therefore, a one-hidden-layer network with linear activation functions is equivalent to a neural network with no hidden layers.
15. Let us consider $\Omega_l = \dfrac{\|t - Aw\|_2^2}{2}$. Let us compute the square:
$$\Omega_l = \frac{(t-Aw)^T(t-Aw)}{2} = \frac{t^Tt - t^TAw - w^TA^Tt + w^TA^TAw}{2} = \frac{t^Tt - 2t^TAw + w^TA^TAw}{2}$$
Please note that all the terms in the numerator of the two last fractions are scalar.
a) Let us compute $g_l = \dfrac{d\Omega_l}{dw^T}$. The derivative of the first term in the numerator of the last equation is null, as it does not depend on w. $t^TA$ is a row vector, and so the next term in the numerator is a dot product (if we denote $t^TA$ as $x^T$, the dot product is $t^TAw = x_1w_1 + x_2w_2 + \ldots + x_nw_n$).
Therefore
$$\frac{d}{dw^T}\,t^TAw = \begin{bmatrix}\dfrac{d}{dw_1}\,t^TAw\\ \vdots\\ \dfrac{d}{dw_n}\,t^TAw\end{bmatrix} = \begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix} = (t^TA)^T = A^Tt.$$
Concentrating now on the derivative of the last term, $A^TA$ is a square symmetric matrix. Let us consider a 2*2 matrix denoted as C:
$$w^TA^TAw = w^TCw = \begin{bmatrix}w_1 & w_2\end{bmatrix}\begin{bmatrix}C_{1,1} & C_{1,2}\\ C_{2,1} & C_{2,2}\end{bmatrix}\begin{bmatrix}w_1\\ w_2\end{bmatrix} = \begin{bmatrix}w_1C_{1,1}+w_2C_{2,1} & w_1C_{1,2}+w_2C_{2,2}\end{bmatrix}\begin{bmatrix}w_1\\ w_2\end{bmatrix} = w_1^2C_{1,1} + w_1w_2C_{2,1} + w_2w_1C_{1,2} + w_2^2C_{2,2}$$
Then the derivative is just:
$$\frac{d}{dw^T}\,w^TA^TAw = \begin{bmatrix}2w_1C_{1,1} + w_2C_{2,1} + w_2C_{1,2}\\ w_1C_{2,1} + w_1C_{1,2} + 2w_2C_{2,2}\end{bmatrix}$$
As $C_{1,2} = C_{2,1}$, we finally have:
$$\frac{d}{dw^T}\,w^TA^TAw = \begin{bmatrix}2w_1C_{1,1} + 2w_2C_{2,1}\\ 2w_1C_{2,1} + 2w_2C_{2,2}\end{bmatrix} = 2Cw$$
Putting all together, $g_l = \dfrac{d\Omega_l}{dw^T}$ is:
$$g_l = -A^Tt + A^TAw = -A^Tt + A^Ty = -A^Te.$$
b) The minimum of $\Omega_l$ is given by $g = 0$. Doing that, we stay with:
$$0 = -A^Tt + A^TA\hat{w} \;\Rightarrow\; \hat{w} = (A^TA)^{-1}A^Tt.$$
c) Consider an augmented matrix A, $\tilde{A} = \begin{bmatrix}A\\ \sqrt{\lambda}I\end{bmatrix}$, and an augmented vector t, $\tilde{t} = \begin{bmatrix}t\\ 0\end{bmatrix}$. Then:
$$\frac{\|\tilde{t}-\tilde{A}w\|^2}{2} = \frac{\tilde{t}^T\tilde{t} - 2\tilde{t}^T\tilde{A}w + w^T\tilde{A}^T\tilde{A}w}{2} = \frac{t^Tt - 2t^TAw + w^T(A^TA+\lambda I)w}{2} = \frac{t^Tt - 2t^TAw + w^TA^TAw + \lambda w^Tw}{2} = \frac{\|t-Aw\|^2 + \lambda\|w\|^2}{2} = \varphi_l$$
Then, all the results above can be employed, by replacing A with $\tilde{A}$ and t with $\tilde{t}$. Therefore, the gradient is:
$$g_{\varphi_l} = -\tilde{A}^T\tilde{t} + \tilde{A}^T\tilde{A}w = -A^Tt + (A^TA+\lambda I)w = -A^Tt + A^TAw + \lambda w = g_l + \lambda w.$$
Notice that the gradient can also be formulated as the negative of the product of the transpose of the Jacobean and the error vector:
$$g_{\varphi_l} = -\tilde{A}^T\tilde{t} + \tilde{A}^T\tilde{A}w = -\tilde{A}^T\tilde{t} + \tilde{A}^T\tilde{y} = -\tilde{A}^T\tilde{e},$$
where $\tilde{e} = \tilde{t} - \tilde{y}$ and $\tilde{y} = \tilde{A}w = \begin{bmatrix}A\\ \sqrt{\lambda}I\end{bmatrix}w = \begin{bmatrix}y\\ \sqrt{\lambda}w\end{bmatrix}$.
The optimum is therefore:
$$0 = -A^Tt + (A^TA+\lambda I)\hat{w} \;\Rightarrow\; \hat{w} = (A^TA+\lambda I)^{-1}A^Tt.$$
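A small Matlab sketch, with arbitrary random data, confirming that the regularized solution coincides with the ordinary least-squares solution of the augmented problem:

% Check that (A'A + lambda*I)^-1 A' t equals the LS solution of the
% augmented system [A; sqrt(lambda)*I] w = [t; 0].
A      = randn(20,3);
t      = randn(20,1);
lambda = 0.1;
w_reg  = (A'*A + lambda*eye(3)) \ (A'*t);
A_aug  = [A; sqrt(lambda)*eye(3)];
t_aug  = [t; zeros(3,1)];
w_aug  = A_aug \ t_aug;            % ordinary least squares on the augmented data
disp(norm(w_reg - w_aug));         % should be ~0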
16.
a) Please see Section 2.1.3.1.
b) The error back-propagation is a computationally efficient algorithm, but, since it implements a steepest descent method, it is unreliable and can have a very slow rate of convergence. It is also difficult to select appropriate values of the learning parameter. For more details please see Section 2.1.3.3 and Section 2.1.3.4.
The problem related with lack of convergence can be solved by incorporating a line-search algorithm, to guarantee that the training criterion does not increase in any iteration. To have a faster convergence rate, second-order methods can be used. It is proved in Section 2.1.3.5 that the Levenberg-Marquardt algorithm is the best technique to use, which does not employ a learning rate parameter.
17. The sigmoid function, $f_1(x)$, is covered in Section 1.3.1.2.4 and the hyperbolic tangent function, $f_2(x)$, in Section 1.3.1.2.5. Notice that these functions are related as $f_2(x) = 2f_1(2x) - 1$, and that, if we consider $f(x) = \tanh(x)$, then $f'(x) = 1 - \tanh^2(x) = 1 - f(x)^2$. The advantages of using a hyperbolic tangent function over a sigmoid function are:
1. The hyperbolic function generates a better conditioned model. Notice that an MLP with a linear function in the output layer always has a column of ones in the Jacobean matrix (related with the output bias). As the Jacobean columns related with the weights from the last hidden layer to the output layer are a linear function of the outputs of the last hidden layer, and as the mean of a hyperbolic tangent function is 0, while the mean of a sigmoid function is 1/2, in this latter case those Jacobean columns are more correlated with the Jacobean column related with the output bias;
2. The derivative of the sigmoid function lies within $[0, 0.25]$, and its expected value, considering a uniform probability density function at the output of the node, is 1/6. For a hyperbolic tangent function, its derivative lies within $[0, 1]$ and its expected value is 2/3. When we compute the Jacobean matrix, one of the factors involved in the computation is $\dfrac{\partial O_{i,\cdot}^{(z+1)}}{\partial Net_{i,\cdot}^{(z+1)}}$ (see (2.42)). Therefore, in comparison with the weights related with the linear output layer, the columns of the Jacobean matrix related with the nonlinear layers appear "squashed" by a mean factor of 1/6, for the sigmoid function, and by a factor of 2/3, for the hyperbolic tangent function. This "squashing" is translated into smaller eigenvalues, which itself is translated into a slow rate of convergence, as the rate of convergence is related with the smaller eigenvalues of the normal equation matrix (see Section 2.1.3.3.2). As this "squashing" is smaller for the hyperbolic tangent function, a network with these activation functions has potentially a faster rate of convergence.
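A short Matlab check of the relation between the two functions, and of the expected values of the derivatives under the uniform-output assumption used above:

% f2(x) = 2*f1(2x) - 1, and expected values of the derivatives expressed
% as a function of the node output (assumed uniform over the output range).
f1 = @(x) 1./(1+exp(-x));
x  = linspace(-5,5,1001);
disp(max(abs(tanh(x) - (2*f1(2*x) - 1))));   % ~0: tanh(x) = 2*f1(2x) - 1
y1 = linspace(0,1,1e5);  disp(mean(y1.*(1-y1)));  % sigmoid derivative, E = 1/6
y2 = linspace(-1,1,1e5); disp(mean(1-y2.^2));     % tanh derivative,    E = 2/3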
18. We shall start by the pH problem. Using the same topology ([4 4 1]) and the same
initial values, the only difference in the code is to change, in the Matlab file
ThreeLay.m, the instructions:
Y1=ones(np,NNP(1))./(1+exp(-X1));
Der1=Y1.*(1-Y1);
Y2=ones(np,NNP(2))./(1+exp(-X2));
Der2=Y2.*(1-Y2);
by the following instructions:
Y1=tanh(X1);
Der1=1-Y1.^2;
Y2=tanh(X2);
Der2=1-Y2.^2;
then, the following results are obtained using BP.m:
[Figure: error norm versus iteration (0-100) for the pH problem, trained with BP.m, for neta=0.005 and neta=0.001]
Comparing these results with the ones shown in fig. 2.18, it can be seen that a better accuracy has been obtained.
Addressing now the Inverse Coordinate problem, using the same topology ([5 1]) and the same initial values, and changing only the instructions related with layer 1 (see above) in TwoLayer.m, the following results are obtained:
[Figure: error norm versus iteration (0-100) for the Inverse Coordinate problem, for neta=0.005 and neta=0.001]
Again, better accuracy results are obtained using the hyperbolic tangent function (compare this figure with fig. 2.23). It should be mentioned that smaller learning rates than the ones used with the sigmoid function had to be applied, as the training process diverged otherwise.
19. The error back-propagation is a computationally efficient algorithm, but, since it implements a steepest descent method, it is unreliable and can have a very slow rate of convergence. It is also difficult to select appropriate values of the learning parameter. The Levenberg-Marquardt method is the "state-of-the-art" technique in non-linear least-squares problems. It guarantees convergence to a local minimum, and usually the rate of convergence is second-order. Also, it does not require any user-defined parameter, such as a learning rate. Its disadvantage is that, computationally, it is a more demanding algorithm.
20. Please see text in Section 2.1.3.6.
21. Please see text in Section 2.1.3.4.
22. Use the following Matlab code:
x=randn(10,5); % Matrix with 10*5 random elements following a normal distribution
cond(x); % The condition number of the original matrix
Now use the following code:
alfa(1)=10;
for i=1:3
x1=[x(:,1:4)/alfa(i) x(:,5)]; % The first four columns are divided by alfa(i)
c(i)=cond(x1);
alfa(i+1)=alfa(i)*10; % alfa will take the values 10, 100 and 1000
end
If now, we compare the ratio of the condition numbers obtained ( c(2)/c(1) and c(3)/
c(2) ), we shall see that they are 9.98 and 9.97, very close to the factor 10 that was
used.
23. Use the following Matlab code:
for i=1:100
[W,Ki,Li,Ko,Lo,IPS,TPS,cg,ErrorN,G]=MLP_initial_par([5 1],InpPat,TarPat,2);
E(i)=ErrorN(2);
c(i)=cond(G);
end
This will generate 100 different initializations of the weight vector, with the weights
in the linear output layer computed as random values.
Afterwards use the following Matlab code:
for i=1:100
[W,Ki,Li,Ko,Lo,IPS,TPS,cg,ErrorN,G]=MLP_initial_par([5 1],InpPat,TarPat,1);
E1(i)=ErrorN(2);
c1(i)=cond(G);
end
This will generate 100 different initializations of the weight vector, with the weights
in the linear output layer computed as the least-square values. Finally, use the Matlab
code:
for i=1:100
W=randn(1,21);
[Y,G,E,c]=TwoLayer(InpPat,TarPat,W,[5 1]);
E2(i)=norm(TarPat-Y);
c2(i)=cond(G);
end
The mean results obtained are summarized in the following table:
Table 4.1 - Mean values of the initial Jacobean condition number and error norm
Method | Jacobean Condition Number | Initial Error Norm
MLP_init (random values for linear weights) | 1.6x10^6 | 15.28
MLP_init (optimal values for linear weights) | 3.9x10^6 | 1.87
random values | 2.8x10^10 | 24.11
24. Let us consider first the input. Determining the net input of the first hidden layer:
$$Net^{(2)} = \begin{bmatrix}IP_s \mid I_{(m\times 1)}\end{bmatrix}W^{(1)} = \begin{bmatrix}IP\cdot k_i + I_{(m\times k_1)}\cdot l_i \mid I_{(m\times 1)}\end{bmatrix}\begin{bmatrix}W_{1\ldots K_1,\cdot}^{(1)}\\ W_{K_1+1,\cdot}^{(1)}\end{bmatrix} = IP\cdot k_i\cdot W_{1\ldots K_1,\cdot}^{(1)} + I_{(m\times k_1)}\cdot l_i\cdot W_{1\ldots K_1,\cdot}^{(1)} + I_{(m\times 1)}\cdot W_{K_1+1,\cdot}^{(1)}$$
This way, each row within the first k1 lines of W(1) appears multiplied by each element of the diagonal of ki, while to each element of the last row (related with the bias) a quantity is added, which is the dot product of the diagonal elements of li and each column of the first k1 lines of W(1).
Let us address now the output.
$$O_s = O\cdot k_o + I\cdot l_o = \begin{bmatrix}Net^{(q-1)} \mid I\end{bmatrix}\begin{bmatrix}w_{1\ldots k_{q-1}}^{(q-1)}\\ w_{k_{q-1}+1}^{(q-1)}\end{bmatrix}
\;\Rightarrow\;
O = \frac{1}{k_o}\left(Net^{(q-1)}w_{1\ldots k_{q-1}}^{(q-1)} + I\,w_{k_{q-1}+1}^{(q-1)} - I\,l_o\right) = Net^{(q-1)}\cdot\frac{w_{1\ldots k_{q-1}}^{(q-1)}}{k_o} + I\left(\frac{w_{k_{q-1}+1}^{(q-1)} - l_o}{k_o}\right)$$
That is, the weights connecting the last hidden neurons with the output neuron appear divided by ko, and the bias is first subtracted by lo, and afterwards divided by ko.
25. The results presented below should take into account that in each iteration of Train_MLPs.m a new set of initial weight values is generated, and therefore no run is equal. Those results were obtained using the Levenberg-Marquardt method, minimizing the new criterion. For the early-stopping method, a percentage of 30% for the validation set was employed.
In terms of the pH problem, employing a termination criterion of 10^-3, the following results were obtained:
Table 4.2 - Results for the pH problem
Regularization Parameter | Error Norm | Linear Weight Norm | Number of Iterations | Error Norm (Validation Set)
0 | 0.021 | 80 | 20 | 0.003
10^-6 | 0.016 | 5.3 | 17 | 0.015
10^-4 | 0.033 | 7.3 | 15 | 0.026
10^-2 | 0.034 | 9.4 | 38 | 0.018
early-stopping | 0.028 | 21 | 24 | 0.02
In terms of the Coordinate Transformation problem, a termination criterion of 10^-5 was employed. The following results were obtained:
Table 4.3 - Results for the Coordinate Transformation problem
Regularization Parameter | Error Norm | Linear Weight Norm | Number of Iterations | Error Norm (Validation Set)
0 | 0.41 | 17.5 | 49 | 0.39
10^-6 | 0.99 | 2.3 | 45 | 0.91
10^-4 | 1.28 | 2.5 | 20 | 0.93
10^-2 | 0.5 | 10.6 | 141 | 0.39
early-stopping | 0.38 | 40 | 119 | 0.24
The results presented above show that only in the 2nd case does the early-stopping technique achieve better generalization results than the standard technique, with or without regularization. Again, care should be taken in the interpretation of the results, as in every case different initial values were employed.
26. For both cases we shall use a termination criterion of 10^-3. The Matlab files can be extracted from Const.zip.
The results for the pH problem can be seen in the following figure:
[Figure: error norm versus number of nonlinear neurons (1-10) for the pH problem, constructive method]
There is no noticeable decrease in the error norm after 5 hidden neurons. Networks with more than 5 neurons exhibit the phenomenon of overmodelling. If an MLP with 10 neurons is constructed using the Matlab function Train_MLPs.m, the error norm obtained is 0.086, while with the constructive method we obtain 0.042.
The results for the Inverse Coordinate Problem can be seen in the following figure:
[Figure: error norm versus number of nonlinear neurons (1-10) for the Inverse Coordinate problem, constructive method]
As can be seen, after the 7th neuron there is no noticeable improvement in the accuracy. For this particular case, models with more than 7 neurons exhibit the phenomenon of overmodelling. If an MLP with 10 neurons is constructed using the Matlab function Train_MLPs.m, the error norm obtained is 0.086, while with the constructive method we obtain 0.097.
It should be mentioned that the strategy employed in this constructive method leads to bad initial models for a number of neurons greater than, let us say, 5.
27. The instantaneous autocorrelation matrix is given by $R[k] = a[k]a'[k]$. The eigenvalues and eigenvectors of $R[k]$ satisfy the equation $Re = \lambda e$. Replacing the previous equation in the last one, we have:
$$a[k]a'[k]e = \lambda e$$
As the product $a'[k]e$ is a scalar, it corresponds to the eigenvalue, and $a[k]$ is the eigenvector.
28. After adaptation with the LMS rule, the a posteriori output of the network, $\bar{y}[k]$, is
given by:
$$\bar{y}[k] = a^T[k]w[k] = a^T[k]\big(w[k-1] + \eta e[k]a[k]\big) = y[k] + \eta e[k]\|a[k]\|^2 = \hat{y}[k]\,\eta\|a[k]\|^2 + \big(1-\eta\|a[k]\|^2\big)y[k],$$
where the a posteriori error, $\bar{e}[k]$, is defined as:
$$\bar{e}[k] = \hat{y}[k] - \bar{y}[k] = \big(1-\eta\|a[k]\|^2\big)e[k].$$
For a non-null error, the following relations apply:
$$|\bar{e}[k]| > |e[k]| \quad\text{if } \eta\notin[0,\,2/\|a[k]\|^2]$$
$$|\bar{e}[k]| = |e[k]| \quad\text{if } \eta=0 \text{ or } \eta=2/\|a[k]\|^2$$
$$|\bar{e}[k]| < |e[k]| \quad\text{if } \eta\in(0,\,2/\|a[k]\|^2)$$
$$\bar{e}[k] = 0 \quad\text{if } \eta=1/\|a[k]\|^2$$
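A numerical illustration of these relations, for random data and several values of the learning rate (a minimal sketch, not the book's code):

% A posteriori error after one LMS update: ebar = (1 - eta*||a||^2)*e.
a   = randn(3,1);  w = randn(3,1);  tgt = randn;
e   = tgt - a'*w;                        % a priori error
for eta = [0, 0.5/(a'*a), 1/(a'*a), 2/(a'*a), 3/(a'*a)]
    w_new = w + eta*e*a;                 % LMS update
    ebar  = tgt - a'*w_new;              % a posteriori error
    fprintf('eta*||a||^2 = %4.2f   |ebar|/|e| = %5.3f\n', eta*(a'*a), abs(ebar/e));
end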
29. The following figure illustrates the results obtained with the NLMS rule, for the
Coordinate Inversion problem, when $\eta\in\{0.1, 1, 1.9\}$.
[Figure: MSE versus iterations (0-1200) for the NLMS rule, with eta = 0.1, 1 and 1.9]
Learning is stable in all cases, and the rate of convergence is almost independent of the learning rate employed. If we employ a learning rate (2.001) slightly larger than the stable domain, we obtain unstable learning:
[Figure: MSE versus iterations (0-1200) for the NLMS rule with eta = 2.001, showing divergence]
Using now the standard LMS rule, the following results were obtained with $\eta\in\{0.1, 0.5\}$. Values higher than 0.5 result in unstable learning.
[Figure: MSE versus iterations (0-1200) for the standard LMS rule, with eta = 0.1 and 0.5]
In terms of convergence rate, the methods produce similar results. The NLMS rule enables us to guarantee convergence within the domain $\eta\in[0, 2]$.
30. We shall consider first the pH problem. The average absolute error, after off-line training (using the parameters stored in Initial_on-pH.mat), is $\varsigma = E[\,|e_n[k]|\,] = 0.0014$. Using this value in
$$e_d[k] = \begin{cases}0, & |e[k]|\le\varsigma\\ e[k]+\varsigma, & e[k]<-\varsigma\\ e[k]-\varsigma, & e[k]>\varsigma\end{cases}$$
the next figure shows the MSE value, for the last 10 (out of 20 passes) of adaptation,
using the NLMS, with $\eta=1$:
[Figure: MSE over the last 10 passes of adaptation (iterations 1000-2200) for the pH problem, NLMS with eta = 1, with and without dead-zone]
The results obtained with the LMS rule, with $\eta=0.5$, are shown in the next figure.
[Figure: MSE over the last 10 passes of adaptation (iterations 1000-2200) for the pH problem, LMS with eta = 0.5, with and without dead-zone]
Considering now the Coordinate Transformation problem, the average absolute error after off-line training is $\varsigma = E[\,|e_n[k]|\,] = 0.027$. Using this value, the NLMS rule, with $\eta=1$, produces the following results:
[Figure: MSE over the last pass of adaptation (iterations 1900-2020) for the Coordinate Transformation problem, NLMS with eta = 1, with and without dead-zone]
The above figure shows the MSE in the last pass (out of 20) of adaptation.
Using now the LMS rule, with $\eta=0.5$, we obtain:
[Figure: MSE over the last pass of adaptation (iterations 1900-2020) for the Coordinate Transformation problem, LMS with eta = 0.5, with and without dead-zone]
For all the cases, better results are obtained with the inclusion of an output dead-zone in the adaptation algorithm. The main problem is to determine the dead-zone parameter in real situations.
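A minimal Matlab sketch of the dead-zone operation applied to the error (using, as an example, the dead-zone parameter estimated above for the pH problem):

% Dead-zone applied to the error before the weight update:
% errors smaller than zeta (in absolute value) are ignored.
dead_zone = @(e, zeta) (abs(e) > zeta) .* (e - sign(e)*zeta);
e = [-0.01 -0.001 0 0.002 0.01];      % example errors
disp(dead_zone(e, 0.0014));           % zeta = 0.0014 (pH problem value above)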
31. Considering the Coordinate Inverse problem, using the NLMS rule with $\eta=1$, the results obtained with a worse conditioned model (weights in Initial_on_CT_bc.m), compared with the results obtained with a better conditioned model (weights in Initial_on_CT.m), are represented in the next figure:
[Figure: MSE versus iterations (0-2500) for the Coordinate Inverse problem, worse conditioned versus better conditioned model]
Regarding now the pH problem, using the NLMS rule with $\eta=1$, the results obtained with a worse conditioned model (weights in Initial_on_pH_bc.m), compared with the results obtained with a better conditioned model (weights in Initial_on_pH.m), are represented in the next figure:
[Figure: MSE versus iterations (0-2500) for the pH problem, worse conditioned versus better conditioned model]
It is obvious that a better conditioned model achieves a better adaptation rate.
32. We shall use the NLMS rule, with $\eta=1$, in the conditions of Ex. 2.6. We shall start the adaptation from 4 different initial conditions: $w_i=\pm 10$, $i=1,2$. The next figure illustrates the evolution of the adaptation, with 10 passes over the training set. The 4 different adaptations converge to a small area, indicated by the green colour in the
figure.
If we zoom in on this small area, we can see that $w_1\in[0, 0.75]$ and $w_2\in[-0.07, 1]$. In the first example ($w[1]=[10, 10]$), it enters this area in iteration 203; in the second example ($w[1]=[10, -10]$), it enters this area in iteration 170; in the third case ($w[1]=[-10, -10]$), it enters this area in iteration 204; and in the fourth case ($w[1]=[-10, 10]$), it enters this domain in iteration 170. This is shown in the next figure.
This domain, where, after being entered, the weights never leave and never settle, is called the minimal capture zone.
If we compare the evolution of the weight vector, starting from $w[1]=[10, -10]$, with or without dead-zone, we obtain the following results:
[Figure: evolution of w1 and w2 over the iterations (0-2500), starting from w[1] = [10, -10], with and without dead-zone]
The optimal values of the weight parameters, in the least squares sense, are given by $\hat{w} = [x\;\; 1]^{+}y = [0,\; 0.3367]$, where x and y are the input and target data, obtaining an optimal MSE of 0.09. The dead-zone parameter employed was $\varsigma = \max|e_n[k]| = 0.663$.
33. Assuming an interpolation scheme, the number of basis functions is equal to the number of patterns. This way, your network has 100 neurons in the hidden layer. The centres of the network are placed at the input training points, so, if the matrix of the centres is denoted as C, then C=X. With respect to the spreads, as nothing is mentioned, you can employ the most standard scheme, which is equal spreads of value $\sigma = \dfrac{d_{max}}{\sqrt{2m_1}}$, where $d_{max}$ is the maximum distance between the centres. With respect to the linear output weights, they are the optimal values, in the least squares sense, that is, $\hat{w} = G^{+}t$, where G is the matrix of the outputs of the hidden neurons.
The main problem with the last scheme is that the network grows as the training set grows. This results in ill-conditioning of matrix G, or even singularity. For this reason, an approximation scheme, with the number of neurons strictly less than the number of patterns, is the option usually taken.
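A sketch of this interpolation scheme, for assumed training data X and t (the actual data files of the exercises are not used here):

% RBF interpolation: one Gaussian per training pattern, centres at the
% training points, equal spreads dmax/sqrt(2*m1), weights by least squares.
X    = rand(100,2);  t = sin(2*pi*X(:,1)) + X(:,2);   % assumed data
C    = X;                                  % centres placed at the input points
m1   = size(C,1);
D2   = sum(X.^2,2) + sum(C.^2,2)' - 2*X*C';  % squared distances ||x_i - c_j||^2
dmax = sqrt(max(D2(:)));
sigma = dmax/sqrt(2*m1);
G    = exp(-D2/(2*sigma^2));               % hidden-layer output matrix
w    = pinv(G)*t;                          % linear output weights
disp(norm(G*w - t));                       % ~0 (interpolation), G may be ill-conditioned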
34. The k-means clustering algorithm places the centres in regions where a significant number of examples is presented. The algorithm is:
1. Initialization - Choose random values for the centres; they must be all different.
2. For j=1 to n
   2.1. Sampling - Find a sample vector x from the input matrix.
   2.2. Similarity matching - Find the centre closest to x. Let its index be k(x):
        $$k(x) = \arg\min_{j}\|x(k) - c_j[i]\|^2 \tag{4.4}$$
   2.3. Updating - Adjust the centres of the radial basis functions according to:
        $$c_j[i+1] = \begin{cases}c_j[i] + \eta\,(x(k) - c_j[i]), & j = k(x)\\ c_j[i], & \text{otherwise}\end{cases} \tag{4.5}$$
   2.4. j=j+1
end
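A minimal Matlab sketch of this online update (the data, the number of centres and the learning rate below are assumptions):

% Online k-means: move the closest centre towards each sampled input.
X   = randn(200,2);                   % assumed input data (one pattern per row)
nc  = 5;                              % assumed number of centres
eta = 0.1;                            % assumed learning rate
c   = X(randperm(size(X,1),nc),:);    % initial centres: distinct random patterns
for i = 1:size(X,1)
    x = X(i,:);                                  % sampling
    [~,k] = min(sum((c - x).^2, 2));             % similarity matching (4.4)
    c(k,:) = c(k,:) + eta*(x - c(k,:));          % updating (4.5)
end
disp(c);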
35. We shall use, as initial values, the data stored in winiph_opt.m and winict_opt.m, for the pH and the Coordinate Inverse problems, respectively. The new criterion and the Levenberg-Marquardt method will be used in all these problems.
a) With respect to the pH problem, the application of the termination criterion ($\tau=10^{-4}$) is expressed in the next table:
Table 4.6 - Standard Termination
Method | Number of Iterations | Error Norm | Linear Weight Norm | Condition of Basis Functions
LM (New Criterion) | 5 | 0.0133 | 7.8x10^4 | 1.4x10^7
With respect to the Coordinate Inverse problem, the application of the termination criterion ($\tau=10^{-3}$) is expressed in the next table:
Table 4.7 - Standard Termination
Method | Number of Iterations | Error Norm | Linear Weight Norm | Condition of Basis Functions
LM (New Criterion) | 18 | 0.19 | 5.2x10^8 | 2.5x10^16
b) With respect to the pH problem, the application of the termination criterion ($\tau=10^{-4}$), to the LM method, minimizing the new criterion, and using an early-stopping method (the Matlab function gen_set.m was applied with a percentage of 30%), gives the following results:
Table 4.8 - Early-stopping Method
Method | Number of Iterations | Error Norm (Est. set) | Error Norm (Val. set) | Linear Weight Norm | Condition of Basis Functions
Early-Stopping | 6 | 0.0075 | 0.0031 | 8.7x10^4 | 1.7x10^7
 | 19 | 0.0044 | 0.0029 | 15.4 | 1.6x10^3
The second line represents the results obtained, for the same estimation and validation sets, using the parameters found by the application of the regularization technique (unitary matrix, $\lambda=10^{-6}$) to all the training set. It can be seen that the same accuracy is obtained for the validation set, with better results in the estimation set.
With respect to the Coordinate Inverse problem, the application of the termination criterion ($\tau=10^{-3}$), to the LM method, minimizing the new criterion, and using an early-stopping method (the Matlab function gen_set.m was applied with a percentage of 30%), gives the following results:
Table 4.9 - Early-stopping Method
Method | Number of Iterations | Error Norm (Est. set) | Error Norm (Val. set) | Linear Weight Norm | Condition of Basis Functions
Early-Stopping | 77 | 0.1141 | 0.1336 | 1.3x10^8 | 4.6x10^14
 | 19 | 0.1693 | 0.1047 | 168 | 3.7x10^14
The results presented above show that, using all the training data with the regularization method ($\lambda=10^{-6}$), a better result was obtained for the validation set, although a worse result was obtained for the estimation set.
c) With respect to the pH problem, the application of the termination criterion ($\tau=10^{-4}$), to the LM method, minimizing the new criterion, is expressed in the next table:
Table 4.10 - Explicit Regularization (I)
Method | Number of Iterations | Error Norm | Linear Weight Norm | Condition of Basis Functions
λ = 10^-6 | 19 | 0.0053 | 15.4 | 1.6x10^3
λ = 10^-4 | 25 | 0.0247 | 9.75 | 3.8x10^4
λ = 10^-2 | 83 | 0.044 | 2.07 | 4.5x10^3
With respect to the Coordinate Inverse problem, the application of the termination criterion ($\tau=10^{-3}$), to the LM method, minimizing the new criterion, is expressed in the next table:
Table 4.11 - Explicit Regularization (I)
Method | Number of Iterations | Error Norm | Linear Weight Norm | Condition of Basis Functions
λ = 10^-6 | 100 | 0.199 | 168 | 3.7x10^14
λ = 10^-4 | 100 | 0.4039 | 29 | 1.2x10^15
λ = 10^-2 | 100 | 0.9913 | 9.9 | 3.8x10^17
d) With respect to the pH problem, the application of the termination criterion ($\tau=10^{-4}$), to the LM method, minimizing the new criterion, is expressed in the next table:
Table 4.12 - Explicit Regularization (G0)
Method | Number of Iterations | Error Norm | Linear Weight Norm | Condition of Basis Functions
λ = 10^-6 | 43 | 0.0132 | 3.6x10^4 | 5.6x10^6
λ = 10^-4 | 17 | 0.0196 | 633 | 1.2x10^5
λ = 10^-2 | 150 | 0.0539 | 38 | 1.3x10^6
With respect to the Coordinate Inverse problem, the application of the termination criterion ($\tau=10^{-3}$), to the LM method, minimizing the new criterion, is expressed in the next table:
Table 4.13 - Explicit Regularization (G0)
Method | Number of Iterations | Error Norm | Linear Weight Norm | Condition of Basis Functions
λ = 10^-6 | 100 | 0.49 | 91 | 4x10^15
λ = 10^-4 | 100 | 0.3544 | 27 | 5.2x10^15
λ = 10^-2 | 100 | 1.229 | 11.5 | 2.4x10^18
36. The generalization parameter is 2, so there are 2 overlays.
a)
[FIGURE 4.66 - Overlay diagram, with ρ = 2: the input lattice, a 1st overlay with displacement d1=(1,1) containing basis functions a1-a9, and a 2nd overlay with displacement d2=(2,2) containing basis functions a10-a18]
There are
$$p' = \prod_{i=1}^{n}(r_i+1) = 5^2 = 25$$
cells within the lattice. There are 18 basis functions within the network. At any moment, only 2 basis functions are active in the network.
b) Analysing fig. 4.66, we can see that as the input moves along the lattice one cell parallel to an input axis, the number of basis functions dropped from, and introduced to, the output calculation is a constant (1) and does not depend on the input.
c) A CMAC is said to be well defined if the generalization parameter satisfies:
$$1 \le \rho \le \max_i(r_i+1) \tag{4.14}$$
37. The decomposition of the basis functions into overlays demonstrates that the number of basis functions increases exponentially with the input dimension. The total number of basis functions is the sum of the basis functions in each overlay. This number, in turn, is the product of the number of univariate basis functions on each axis. These have a bounded support, and therefore there are at least two defined on each axis. Therefore, a lower bound for the number of basis functions for each overlay, and subsequently for the AMN, is 2^n. These networks suffer therefore from the curse of dimensionality. In B-splines, this problem can be alleviated by decomposing a multidimensional network into a network composed of additive sub-networks of smaller dimensions. An algorithm to perform this task is the ASMOD algorithm.
38. The network has 4 inputs and 1 output.
a) The network can be described as $f(x) = f_1(x_1) + f_2(x_2) + f_3(x_3) + f_4(x_3, x_4)$. The number of basis functions for each sub-network is given by $p = \prod_{i=1}^{n}(r_i + k_i)$. Therefore, we have
$$(5+2) + (4+2) + (3+2) + (4+3)^2 = 18 + 49 = 67$$
basis functions for the overall network. In terms of active basis functions, we have $p'' = \sum_{i=1}^{n}\prod_{j=1}^{n_i}k_{j,i}$, where n is the number of sub-networks, $n_i$ is the number of inputs for sub-network i, and $k_{j,i}$ is the B-spline order for the jth dimension of the ith sub-network. For this case, $p'' = 2 + 2 + 2 + 3\times 3 = 15$.
b) The ASMOD algorithm can be described as:
Algorithm 4.1 - ASMOD algorithm
m_0 = Initial Model; i = 1;
termination criterion = FALSE;
WHILE NOT(termination criterion)
    Generate a set of candidate networks M_i;
    Estimate the parameters for each candidate network;
    Determine the best candidate, m_i, according to some criterion J;
    IF J(m_i) >= J(m_{i-1}), termination criterion = TRUE; END;
    i = i + 1;
END
Each main part of the algorithm will be detailed below.
• Candidate models are generated by the application of a refinement step, where the complexity of the current model is increased, and a pruning step, where the current model is simplified, in an attempt to determine a simpler model that performs as well as the current model. Note that, in the majority of the cases, the latter step does not generate candidates that are eager to proceed to the next iteration. Because of this, this step is often applied after a certain number of refinement steps, or just applied to the optimal model resulting from the ASMOD algorithm with refinement steps only.
Three methods are considered for model growing:
1. For every input variable not present in the current network, a new univariate sub-model is introduced in the network. The spline order and the number of interior knots are specified by the user, and usually 0 or 1 interior knots are applied;
2. For every combination of sub-models present in the current model, combine them in a multivariate network with the same knot vector and spline order. Care must be taken in this step to ensure that the complexity (in terms of weights) of the final model does not overpass the size of the training set;
3. For every sub-model in the current network, for every dimension in each sub-model, split every interval in two, creating therefore candidate models with a complexity higher by 1.
For network pruning, also three possibilities are considered:
1. For all univariate sub-models with no knots in the interior, replace them by a spline of order k-1, also with no interior knots. If k-1=1, remove this sub-model from the network, as it is just a constant;
2. For every multivariate (n inputs) sub-model in the current network, split it into n sub-models with n-1 inputs;
3. For every sub-model in the current network, for every dimension in each sub-model, remove each interior knot, creating therefore candidate models with a complexity smaller by 1.
39. Recall Exercise 1.3. Consider that no interior knots are employed. Therefore, a B-spline of order 1 is given by:
$$N_1^1(x) = \begin{cases}1, & x\in I_1\\ 0, & x\notin I_1\end{cases} \tag{4.15}$$
The output corresponding to this basis function is therefore:
$$y(N_1^1(x)) = \begin{cases}w_1, & x\in I_1\\ 0, & x\notin I_1\end{cases} \tag{4.16}$$
which means that with a sub-model which is a B-spline of order 1, any constant term can be obtained.
Consider now a spline of order 2. It is defined as:
$$N_2^j(x) = \left(\frac{x-\lambda_{j-2}}{\lambda_{j-1}-\lambda_{j-2}}\right)N_1^{j-1}(x) + \left(\frac{\lambda_j-x}{\lambda_j-\lambda_{j-1}}\right)N_1^{j}(x), \qquad j=1,2 \tag{4.17}$$
It is easy to see that
$$N_2^1(x) = \frac{\lambda_1-x}{\lambda_1-\lambda_0}, \quad x\in I_1; \qquad N_2^2(x) = \frac{x-\lambda_0}{\lambda_1-\lambda_0}, \quad x\in I_1 \tag{4.18}$$
For our case,
$$N_2^1(x_1) = 1-x_1; \quad N_2^2(x_1) = x_1; \quad N_2^1(x_2) = \frac{1-x_2}{2}; \quad N_2^2(x_2) = \frac{x_2+1}{2}; \qquad x_1, x_2\in I_1 \tag{4.19}$$
The outputs corresponding to these basis functions are simply:
$$\begin{aligned}
y(N_2^1(x_1)) &= w_2(1-x_1)\\
y(N_2^2(x_1)) &= w_3x_1\\
y(N_2^1(x_2)) &= w_4\frac{1-x_2}{2}\\
y(N_2^2(x_2)) &= w_5\frac{x_2+1}{2}
\end{aligned} \tag{4.20}$$
Therefore, we can construct the functions 4x1 and -2x2 just by setting w2=0, w3=4, and w4=4, w5=0. Note that this is not the only solution. Using this solution, note that $y(N_2^1(x_2)) = 2-2x_2$, which means that we must subtract 2 in order to get -2x2.
Consider now a bivariate sub-model, of order 2. As we know, bivariate B-splines are constructed from univariate B-splines using:
$$N_{\mathbf{k}}^{\mathbf{j}}(\mathbf{x}) = \prod_{i=1}^{n}N_{k_i}^{j_i}(x_i) \tag{4.21}$$
We have now 4 basis functions (for $x_1\in I_1$, $x_2\in I_1$):
$$\begin{aligned}
N_{2,2}^1(x_1,x_2) &= (1-x_1)\frac{1-x_2}{2}\\
N_{2,2}^2(x_1,x_2) &= x_1\frac{x_2+1}{2}\\
N_{2,2}^3(x_1,x_2) &= x_1\frac{1-x_2}{2}\\
N_{2,2}^4(x_1,x_2) &= (1-x_1)\frac{x_2+1}{2}
\end{aligned} \tag{4.22}$$
These are equal to:
$$\begin{aligned}
N_{2,2}^1(x_1,x_2) &= \frac{1-x_1-x_2+x_1x_2}{2}\\
N_{2,2}^2(x_1,x_2) &= \frac{x_1x_2+x_1}{2}\\
N_{2,2}^3(x_1,x_2) &= \frac{x_1-x_1x_2}{2}\\
N_{2,2}^4(x_1,x_2) &= \frac{1-x_1+x_2-x_1x_2}{2}
\end{aligned} \tag{4.23}$$
Therefore, the corresponding output is:
$$\begin{aligned}
y(N_{2,2}^1(x_1,x_2)) &= w_6\frac{1-x_1-x_2+x_1x_2}{2}\\
y(N_{2,2}^2(x_1,x_2)) &= w_7\frac{x_1x_2+x_1}{2}\\
y(N_{2,2}^3(x_1,x_2)) &= w_8\frac{x_1-x_1x_2}{2}\\
y(N_{2,2}^4(x_1,x_2)) &= w_9\frac{1-x_1+x_2-x_1x_2}{2}
\end{aligned} \tag{4.24}$$
The function 0.5x1x2 can be constructed in many ways. Consider w6=w8=w9=0 and w7=1. Therefore $y(N_{2,2}^2(x_1,x_2)) = \dfrac{x_1x_2+x_1}{2}$, which means that we must subtract x1/2 from the output to get 0.5x1x2. This means that we should not design 4x1, but (7/2)x1, therefore setting w3=7/2.
To summarize, we can design a network implementing the function $f(x_1,x_2) = 3 + 4x_1 - 2x_2 + 0.5x_1x_2$ by employing 4 sub-networks, all with zero interior knots:
1. A univariate sub-network (input x1 or x2, it does not matter) of order 1, with w1=1;
2. A univariate sub-network with input x1, order 2, with w2=0 and w3=7/2;
3. A univariate sub-network with input x2, order 2, with w4=4 and w5=0;
4. A bivariate sub-network with inputs x1 and x2, order 2, with w6=w8=w9=0 and w7=1.
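A quick Matlab check that this choice of weights reproduces f(x1,x2) = 3 + 4x1 - 2x2 + 0.5x1x2, assuming the domains x1 in [0,1] and x2 in [-1,1] implied by (4.19):

% Evaluate the four sub-networks with the weights chosen above and compare
% with the target function on a grid.
[x1, x2] = meshgrid(linspace(0,1,21), linspace(-1,1,21));
w1 = 1;  w2 = 0;  w3 = 7/2;  w4 = 4;  w5 = 0;            % univariate weights
w6 = 0;  w7 = 1;  w8 = 0;    w9 = 0;                     % bivariate weights
y = w1 ...                                               % order-1 sub-network (constant)
  + w2*(1-x1) + w3*x1 ...                                % order-2 sub-network in x1
  + w4*(1-x2)/2 + w5*(x2+1)/2 ...                        % order-2 sub-network in x2
  + w6*(1-x1).*(1-x2)/2 + w7*x1.*(x2+1)/2 ...            % bivariate sub-network
  + w8*x1.*(1-x2)/2 + w9*(1-x1).*(x2+1)/2;
f = 3 + 4*x1 - 2*x2 + 0.5*x1.*x2;
disp(max(abs(y(:) - f(:))));                             % ~0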
40.
a) The Matlab functions in Asmod.zip were employed to solve this problem. First, gen_set.m was employed, to split the training set between estimation and validation sets, with a percentage of 30% for the latter. Then Asmod was employed, with the termination criterion formulated as: the training stopped if the MSE for the validation set increased constantly in the last 4 iterations, or the standard ASMOD termination was found. In the following tables, the first row illustrates the results obtained with this approach. The second row illustrates the application of the model obtained with the standard ASMOD, trained using all the training set, to the estimation and validation sets used in the other approach.
Concerning the pH problem, the following results were obtained:
Table 4.25 - ASMOD Results - Early-Stopping versus complete training (pH problem)
MSEe | MSREe | MSEv | MREv | Compl. | Wei. N.
8.6x10^-9 | 5.9x10^-7 | 4.3x10^-6 | 0.034 | 42 | 3.7
1.4x10^-31 | 8.5x10^-31 | 1.5x10^-31 | 2.2x10^-30 | 101 | 5.8
Concerning the Coordinate Transformation problem, the following results were obtained:
Table 4.26 - ASMOD Results - Early-Stopping versus complete training (CT problem)
MSEe | MSREe | MSEv | MREv | Compl. | Wei. N.
2.7x10^-4 | 7.7x10^9 | 15x10^-3 | 9.7x10^12 | 36 | 5.5
1.4x10^-5 | 4.9x10^5 | 1.6x10^-5 | 2.3x10^5 | 65 | 9.6
For both cases, the MSE for the validation set is much lower if the training is performed using all the data.
b) The Matlab functions in Asmod.zip were employed to solve this problem. Different values of the regularization parameter were employed (λ = 0, 10^-2, 10^-4 and 10^-6).
Concerning the pH problem, the following table summarizes the results obtained:
Table 4.27 - ASMOD Results - Different regularization values (pH problem)
Reg. factor | MSE | Criterion | Complexity | Weight Norm | N. Candidates | N. Iterations
0 | 1.4x10^-31 | -6,705 | 101 | 5.82 | 9945 | 101
10^-2 | 3.2x10^-6 | -1,190 | 17 | 2.25 | 341 | 19
10^-4 | 3.9x10^-9 | -1,673 | 61 | 4.36 | 3569 | 61
10^-6 | 3.6x10^-13 | -2,440 | 98 | 5.71 | 11056 | 107
Concerning the Coordinate Transformation problem, the following table summarizes the results obtained:
Table 4.28 - ASMOD Results - Different regularization values (CT problem)
Reg. factor | MSE | Criterion | Complexity | Weight Norm | N. Candidates | N. Iterations
0 | 1.5x10^-5 | -916 | 65 | 9.6 | 1043 | 30
10^-2 | 3.7x10^-5 | -831.7 | 64 | 4.18 | 921 | 26
10^-4 | 1.7x10^-5 | -918.5 | 61 | 5.1 | 1264 | 33
10^-6 | 1.5x10^-5 | -915.7 | 65 | 9.1 | 1067 | 30
For both cases, an increase in the regularization parameter increases the MSE, and decreases the complexity and the linear weight norm.
c) To minimize the MSRE, we can apply the following strategy: the training criterion can be changed to $\sum_{i=1}^{n}\left(\dfrac{t_i-y_i}{t_i}\right)^2$, $t_i\ne 0$. This is equivalent to $\sum_{i=1}^{n}\left(1-\dfrac{y_i}{t_i}\right)^2$, $t_i\ne 0$
, or, in matrix form, $\|1 - T^{-1}y\|^2$, where T is a diagonal matrix with the values of the target vector in the diagonal, and 1 is a vector of ones. As y is a linear combination of the outputs of the basis functions, A, we can employ, to determine the optimal weights, $\hat{w} = (T^{-1}A)^{+}1$. Using this strategy, we compare the results obtained by the ASMOD algorithm, in terms of the MSE and MSRE, using regularization or not, with the standard criterion. The first 4 rows show the results obtained by the ASMOD algorithm in terms of the MSE criterion, and the last four rows the MSRE. The Matlab functions in Asmod.zip were employed to solve this problem.
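A small Matlab sketch of this weighted least-squares estimate, with arbitrary illustrative data:

% Relative-error least squares: minimise ||1 - inv(T)*A*w||^2, with T = diag(t).
A = [ones(50,1) rand(50,3)];           % assumed basis-function output matrix
t = A*[2;1;1;1] + 0.05*randn(50,1);    % assumed (positive) target vector
T = diag(t);
w = pinv(T\A)*ones(50,1);              % w_hat = (T^-1 A)^+ * 1
disp(mean(((t - A*w)./t).^2));         % resulting mean squared relative error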
Concerning the pH problem, the following table summarizes the results obtained:
Table 4.29 - ASMOD Results - MSE versus MSRE (pH problem)
Reg. factor | MSE | MSRE | Criterion | Complexity | Weight Norm | N. Cand. | N. Iterations
0 | 1.4x10^-31 | 1.2x10^-30 | -6,705 | 101 | 5.82 | 9945 | 101
10^-2 | 3.2x10^-6 | 2.7x10^-5 | -1,190 | 17 | 2.25 | 341 | 19
10^-4 | 3.9x10^-9 | 4.2x10^-7 | -1,673 | 61 | 4.36 | 3569 | 61
10^-6 | 3.6x10^-13 | 1.5x10^-12 | -2,440 | 98 | 5.71 | 11056 | 107
0 | 1.9x10^-10 | 1.1x10^-30 | -6,427 | 101 | 5.81 | 9699 | 99
10^-2 | 9.5x10^-7 | 2.1x10^-6 | -1,173 | 29 | 2.54 | 989 | 34
10^-4 | 1.3x10^-9 | 1.7x10^-9 | -1,652 | 79 | 4.75 | 6723 | 84
10^-6 | 2.2x10^-10 | 2.2x10^-13 | -2,462 | 98 | 5.71 | 11222 | 108
Concerning the Coordinate Transformation problem, the following table summarizes the results obtained:
Table 4.30 - ASMOD Results - MSE versus MSRE (CT problem)
Reg. factor | MSE | MSRE | Criterion | Complexity | Weight Norm | N. Cand. | N. Iterations
0 | 1.5x10^-5 | 6.2x10^5 | -916 | 65 | 9.6 | 1043 | 30
10^-2 | 3.7x10^-5 | 5.1x10^9 | -831.7 | 64 | 4.2 | 921 | 26
10^-4 | 1.7x10^-5 | 9.9x10^7 | -918.5 | 61 | 5.1 | 1264 | 33
10^-6 | 1.5x10^-5 | 6.7x10^5 | -915.7 | 65 | 9.1 | 1067 | 30
0 | 2.5x10^-5 | 3.2x10^-6 | -869 | 111 | 43.14 | 1273 | 38
10^-2 | 4.2x10^-5 | 9.9x10^-6 | -924 | 65 | 3.7 | 495 | 19
10^-4 | 2.4x10^-6 | 3.6x10^-7 | -1,114 | 110 | 4 | 594 | 22
10^-6 | 1.4x10^-6 | 1.8x10^-7 | -1,182 | 110 | 4.4 | 829 | 27
We can observe that, as expected, the use of the MSRE criterion achieves better results in terms of the final MSRE, and often better results also in terms of the MSE. The difference in terms of MSRE is more significant for the Coordinate Inverse problem, as it has significantly smaller values of the target data than the pH problem.
d) We shall compare here the results of early-stopping methods, with the two criteria, with no regularization or with different values of the regularization parameter.
First we shall use the MSE criterion. The first four rows were obtained using an early-stopping method, where 30% of the data were used for validation. The last four rows illustrate the results obtained, for the same estimation and validation data, but with the model trained on all the data. The Matlab function gen_set.m and the files in Asmod.zip were used for this problem. The termination criterion for the early-stopping method was formulated as: the training stopped if the MSE for the validation set increased constantly in the last 4 iterations, or the standard ASMOD termination was found. This can be inspected by comparing the column It. Min with N. It: if they are equal, the standard termination criterion was found first.
Concerning the pH problem, the results obtained are in the table below:
Table 4.31 - ASMOD Results - Early Stopping versus complete training; MSE (pH problem)
Reg. factor | MSEe | MSREe | MSEv | MREv | It Min | Crit. | Comp | W. N. | N. C. | N It
0 | 8.6x10^-9 | 5.9x10^-7 | 4.3x10^-6 | 0.034 | 41 | -1139 | 42 | 3.7 | 1853 | 45
10^-2 | 4.3x10^-6 | 2.8x10^-4 | 7.5x10^-6 | 0.034 | 19 | -808 | 16 | 2.2 | 285 | 19
10^-4 | 7.6x10^-9 | 5.6x10^-7 | 4.3x10^-6 | 0.034 | 47 | -1139 | 44 | 3.7 | 2098 | 47
10^-6 | 9.9x10^-9 | 6x10^-7 | 4.4x10^-6 | 0.034 | 41 | -1133 | 41 | 3.7 | 1853 | 45
0 | 1.4x10^-31 | 8.5x10^-31 | 1.5x10^-31 | 2.2x10^-30 | --- | -6705 | 101 | 5.82 | 9945 | 101
10^-2 | 3.4x10^-6 | 2.8x10^-5 | 2.7x10^-6 | 2.8x10^-5 | --- | -1190 | 17 | 2.25 | 341 | 19
10^-4 | 4.3x10^-9 | 9.5x10^-8 | 3x10^-9 | 5.6x10^-7 | --- | -1673 | 61 | 4.36 | 3569 | 61
10^-6 | 3.4x10^-13 | 1.5x10^-12 | 3.3x10^-9 | 1.0x10^-6 | --- | -2440 | 98 | 5.71 | 11056 | 107
Concerning the Coordinate Transformation problem, the initial model consisted of 2 univariate sub-models and 1 bivariate sub-model, all with 0 interior knots. The following table summarizes the results obtained:
Table 4.32 - ASMOD Results - Early Stopping versus complete training; MSE (CT problem)
Reg. factor | MSEe | MSREe | MSEv | MREv | It Min | Crit. | Comp | W. N. | N. C. | N It
0 | 2.7x10^-4 | 7.7x10^9 | 15x10^-3 | 9.7x10^12 | 10 | -476 | 36 | 5.5 | 499 | 101
10^-2 | 4.5x10^-5 | 7.0x10^9 | 12x10^-3 | 10^13 | 31 | -553 | 50 | 3.9 | 1103 | 31
10^-4 | 9x10^-6 | 4.3x10^7 | 13x10^-3 | 10^13 | 31 | -680 | 49 | 5.3 | 1135 | 32
10^-6 | 7.5x10^-7 | 412 | 11x10^-3 | 3.7x10^12 | 40 | -751 | 77 | 5.2 | 1670 | 40
0 | 1.4x10^-5 | 4.9x10^5 | 1.6x10^-5 | 2.4x10^5 | --- | -916 | 65 | 9.6 | 1043 | 30
10^-2 | 3.7x10^-5 | 5.5x10^9 | 2.6x10^-6 | 4.2x10^9 | --- | -831.7 | 64 | 4.18 | 921 | 26
10^-4 | 1.6x10^-5 | 1.4x10^8 | 2.2x10^-5 | 1.1x10^7 | --- | -918.5 | 61 | 5.1 | 1264 | 33
10^-6 | 1.4x10^-5 | 7.3x10^5 | 1.6x10^-5 | 5.9x10^3 | --- | -915.7 | 65 | 9.1 | 1067 | 30
Then we shall employ the MSRE criterion. Concerning the pH problem, the results obtained are in the table below:
Table 4.33 - ASMOD Results - Early Stopping versus complete training; MSRE (pH problem)
Reg. factor | MSEe | MSREe | MSEv | MREv | It Min | Crit. | Comp | W. N. | N. C. | N It
0 | 1.3x10^-8 | 2.3x10^-8 | 4.3x10^-6 | 0.034 | 49 | -1038 | 49 | 3.6 | 2605 | 53
10^-2 | 1.2x10^-6 | 2.5x10^-6 | 4.8x10^-6 | 0.034 | 31 | -805 | 26 | 2.4 | 821 | 31
10^-4 | 9.7x10^-9 | 1.9x10^-8 | 4.4x10^-6 | 0.034 | 49 | -1047 | 50 | 3.6 | 2605 | 53
10^-6 | 9.2x10^-9 | 1.8x10^-8 | 4.3x10^-6 | 0.034 | 49 | -1050 | 50 | 3.6 | 2605 | 53
0 | 5.8x10^-31 | 1.4x10^-30 | 6.5x10^-10 | 6.3x10^-31 | --- | -6,427 | 101 | 5.81 | 9699 | 99
10^-2 | 1x10^-6 | 2x10^-6 | 7.9x10^-7 | 2.2x10^-6 | --- | -1,173 | 29 | 2.54 | 989 | 34
10^-4 | 1.2x10^-9 | 1.7x10^-9 | 1.8x10^-9 | 1.7x10^-9 | --- | -1,652 | 79 | 4.75 | 6723 | 84
10^-6 | 1.5x10^-13 | 2x10^-13 | 7.5x10^-10 | 2.6x10^-13 | --- | -2,462 | 98 | 5.71 | 11222 | 108
Concerning the Coordinate Inverse problem, the initial model consisted of 2 univariate sub-models and 1 bivariate sub-model, all with 2 interior knots. The following table summarizes the results obtained:
Table 4.34 - ASMOD Results - Early Stopping versus complete training; MSRE (CT problem)
Reg. factor | MSEe | MSREe | MSEv | MREv | It Min | Crit. | Comp | W. N. | N. C. | N It
0 | 4.9x10^-6 | 8.2x10^-7 | 0.014 | 1.2x10^13 | 15 | -787 | 67 | 4.9 | 624 | 19
10^-2 | 5.7x10^-5 | 1.3x10^-5 | 0.014 | 10^13 | 18 | -631 | 54 | 3.7 | 519 | 18
10^-4 | 1.9x10^-6 | 4.2x10^-7 | 0.014 | 1.1x10^13 | 24 | -805 | 74 | 5.4 | 1037 | 24
10^-6 | 5x10^-6 | 8.3x10^-7 | 0.017 | 1.2x10^13 | 15 | -787 | 67 | 4.9 | 626 | 19
0 | 2.8x10^-5 | 3.6x10^-6 | 1.9x10^-5 | 2.3x10^-6 | --- | -869 | 111 | 43.14 | 1273 | 38
10^-2 | 4.7x10^-5 | 1.1x10^-5 | 3.2x10^-5 | 7.4x10^-6 | --- | -924 | 65 | 3.7 | 495 | 19
10^-4 | 2.5x10^-6 | 3.6x10^-7 | 2.2x10^-6 | 3.6x10^-7 | --- | -1,114 | 110 | 4 | 594 | 22
10^-6 | 1.5x10^-6 | 1.9x10^-7 | 1.2x10^-6 | 1.7x10^-7 | --- | -1,182 | 110 | 4.4 | 829 | 27
For all cases, the MSE or the MSRE for the validation set is much lower if the training is performed using all the data. This is more evident for the CT problem, employing the MSRE criterion.
Mais conteúdo relacionado

Mais procurados (20)

Sub1567
Sub1567Sub1567
Sub1567
 
Control assignment#3
Control assignment#3Control assignment#3
Control assignment#3
 
Inner product space
Inner product spaceInner product space
Inner product space
 
Mech MA6351 tpde_notes
Mech MA6351 tpde_notes Mech MA6351 tpde_notes
Mech MA6351 tpde_notes
 
maths ppt.pdf
maths ppt.pdfmaths ppt.pdf
maths ppt.pdf
 
C0531519
C0531519C0531519
C0531519
 
Algebra 2 Section 4-1
Algebra 2 Section 4-1Algebra 2 Section 4-1
Algebra 2 Section 4-1
 
Anti-Synchronizing Backstepping Control Design for Arneodo Chaotic System
Anti-Synchronizing Backstepping Control Design for Arneodo Chaotic SystemAnti-Synchronizing Backstepping Control Design for Arneodo Chaotic System
Anti-Synchronizing Backstepping Control Design for Arneodo Chaotic System
 
Chain Rule
Chain RuleChain Rule
Chain Rule
 
B010310813
B010310813B010310813
B010310813
 
• 5. (p. 177) For equidistant knots of spacing $\Delta$:
$N_3^j(x) = \dfrac{(x-\lambda_{j-3})^2}{(2\Delta)^2}$ if $x \in I_{j-2}$; $N_3^j(x) = \dfrac{(x-\lambda_{j-3})(\lambda_{j-1}-x) + (\lambda_j-x)(x-\lambda_{j-2})}{(2\Delta)^2}$ if $x \in I_{j-1}$; $N_3^j(x) = \dfrac{(\lambda_j-x)^2}{(2\Delta)^2}$ if $x \in I_j$.
Splines of order 3 can be seen in fig. 1.13 c).
6. Please see the text in Section 1.3.3.
7. The input vector is $x = [-1, -0.5, 0, 0.5, 1]$ and the desired target vector is $t = [1, 0.25, 0, 0.25, 1]$.
a) The training criterion is $\Omega = \frac{1}{2}\sum_{i=1}^{5} e^2[i]$, where $e = t - y$. The output vector, y, is in this case $y = w_1 + w_2 x$, and therefore
$\Omega = \frac{1}{2}\big[(1-(w_1-w_2))^2 + (0.25-(w_1-0.5w_2))^2 + (0-w_1)^2 + (0.25-(w_1+0.5w_2))^2 + (1-(w_1+w_2))^2\big]$.
The gradient vector, in general form, is
$g = \big[\tfrac{\partial\Omega}{\partial w_1},\ \tfrac{\partial\Omega}{\partial w_2}\big]^T = \big[{-(e_1+e_2+e_3+e_4+e_5)},\ \ e_1+0.5e_2-0.5e_4-e_5\big]^T$.
For the point [0,0], it is:
• 6. (p. 178) Since for $w = [0,0]$ the output is zero and the error equals the target vector,
$g = \big[{-(1+0.25+0+0.25+1)},\ \ 1+0.5\cdot0.25-0.5\cdot0.25-1\big]^T = [-2.5,\ 0]^T$.
b) For each pattern p, the correlation matrix is $W = I_p T_p^T$. For the 5 input patterns, with augmented inputs $I_p = [1, x_p]^T$, we have (we have just 1 output, and therefore a weight vector):
$W = [1,-1]^T\cdot 1 + [1,-0.5]^T\cdot 0.25 + [1,0]^T\cdot 0 + [1,0.5]^T\cdot 0.25 + [1,1]^T\cdot 1 = [2.5,\ 0]^T$.
Chapter 2
8. A decision surface which is a hyperplane, such as the one represented in the next figure, separates data into two classes.
[Figure: two classes, C1 and C2, separated by the hyperplane $w_1x_1 + w_2x_2 - \theta = 0$.]
If it is possible to define a hyperplane that separates the data into two classes (i.e., if it is possible to determine a weight vector w that accomplishes this), then the data are said to be linearly separable.
• 7. (p. 179) [Figure: the two classes of the XOR problem, C1 (circles) and C2 (crosses), together with a candidate line $w_1x_1 + w_2x_2 - \theta = 0$.]
The above figure illustrates the 2 classes of an XOR problem. There is no straight line that can separate the circles from the crosses. Therefore, the XOR problem is not linearly separable.
9. In an Adaline, the input and output variables are bipolar {-1,+1}, while in a Perceptron the inputs and outputs are 0 or 1. The major difference, however, lies in the learning algorithm, which in the case of the Adaline is the LMS algorithm, and in the Perceptron is the Perceptron learning rule. Also, in an Adaline the error is computed at the net input ($net_i$), and not at the output, as in a Perceptron. Therefore, in an Adaline the error is not limited to the discrete values {-1, 0, 1}, as in the normal perceptron, but can take any real value.
10. Consider the figure below. [Figure: the AND function in the input plane, with a decision line $w_1x_1 + w_2x_2 - \theta = 0$.] The AND function has the following truth table:
• 8. (p. 180)
Table 4.172 - AND truth table
I1  I2  AND
0   0   0
0   1   0
1   0   0
1   1   1
This means that if we design a line passing through the points (0,1) and (1,0), and translate this line so that it stays in the middle of these points and the point (1,1), we have a decision boundary that is able to classify data according to the AND function. The line that passes through (0,1) and (1,0) is given by $x_1 + x_2 - 1 = 0$, which means that $w_1 = w_2 = \theta = 1$. Any value of $\theta$ satisfying $1 < \theta < 2$ will do the job.
11. Please see Ex. 2.4.
12. The exclusive OR function can be implemented as $X \oplus Y = \bar{X}Y + X\bar{Y}$. Therefore, we need two AND functions and one OR function. To implement the first AND function, if the sign of the 1st weight is changed, and the 3rd weight is changed to $\theta + w_1$, then the original AND function implements the function $\bar{x}y$ (please see Ex. 2.4). Using the same reasoning, if the sign of weight 2 is changed, and the 3rd weight is changed to $\theta + w_2$, then the original AND function implements the function $x\bar{y}$. Finally, if the perceptron implementing the OR function is employed, with the outputs of the previous perceptrons as inputs, the XOR problem is solved (a numerical check of this construction is sketched below). Then, the implementation of the function $f(x_1,x_2,x_3) = x_1 \wedge (x_2 \oplus x_3)$ uses just the Adaline that implements the AND function, with inputs $x_1$ and the output of the XOR function.
13. Please see Section 2.1.2.2.
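As a quick check of this construction, the sketch below evaluates hard-limit perceptrons for the AND function, the two "masked" AND functions and the final OR over all binary input patterns. The helper name perc and the specific threshold values (picked directly inside suitable intervals, rather than via the θ+w rule described above) are illustrative assumptions, not code from the book's files.

% Hard-limit perceptron: fires (output 1) when w'x >= theta (assumption: ties fire)
perc = @(x, w, theta) double(x*w(:) - theta >= 0);

X = [0 0; 0 1; 1 0; 1 1];            % all binary input patterns
and_out  = perc(X, [ 1  1], 1.5);    % AND: w1 = w2 = 1, theta inside (1,2)
nx_and_y = perc(X, [-1  1], 0.5);    % ~x1 AND x2: sign of the 1st weight changed
x_and_ny = perc(X, [ 1 -1], 0.5);    % x1 AND ~x2: sign of the 2nd weight changed
xor_out  = perc([nx_and_y x_and_ny], [1 1], 0.5);   % OR of the two partial terms
disp([X and_out xor_out])            % 4th column is AND, 5th column is XOR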
• 9. (p. 181)
14. Assume that you have a network with just one hidden layer (the proof can be easily extended to more than one hidden layer). The output of the first hidden layer, for pattern p, can be given as $O_p^{(2)} = W^{(1)} O_p^{(1)}$, as the activation functions are linear. In the same way, the output of the network is given by $O_p^{(3)} = W^{(2)} O_p^{(2)}$. Combining the two equations, we are left with $O_p^{(3)} = W^{(2)} W^{(1)} O_p^{(1)} = W O_p^{(1)}$. Therefore, a one-hidden-layer network with linear activation functions is equivalent to a neural network with no hidden layers.
15. Let us consider $\Omega_l = \dfrac{\|t - Aw\|_2^2}{2}$. Let us compute the square:
$\Omega_l = \dfrac{(t-Aw)^T(t-Aw)}{2} = \dfrac{t^Tt - t^TAw - w^TA^Tt + w^TA^TAw}{2} = \dfrac{t^Tt - 2t^TAw + w^TA^TAw}{2}$.
Please note that all the terms in the numerator of the two last fractions are scalar.
a) Let us compute $g_l = \dfrac{d\Omega_l}{dw^T}$. The derivative of the first term in the numerator of the last equation is null, as it does not depend on w. $t^TA$ is a row vector, and so the next term in the numerator is a dot product (if we denote $t^TA$ as $x^T$, the dot product is $x_1w_1 + x_2w_2 + \ldots + x_nw_n$). Therefore
$\dfrac{d(t^TAw)}{dw^T} = \big[\tfrac{\partial}{\partial w_1}(t^TAw), \ldots, \tfrac{\partial}{\partial w_n}(t^TAw)\big]^T = [x_1, \ldots, x_n]^T = (t^TA)^T = A^Tt$.
• 10. (p. 182) Concentrating now on the derivative of the last term, $A^TA$ is a square symmetric matrix. Let us consider a 2*2 matrix denoted as C:
$w^TCw = [w_1\ w_2]\begin{bmatrix} C_{1,1} & C_{1,2} \\ C_{2,1} & C_{2,2}\end{bmatrix}\begin{bmatrix} w_1 \\ w_2\end{bmatrix} = w_1^2C_{1,1} + w_1w_2C_{2,1} + w_2w_1C_{1,2} + w_2^2C_{2,2}$.
Then the derivative is just
$\dfrac{d(w^TCw)}{dw^T} = \begin{bmatrix} 2w_1C_{1,1} + w_2C_{2,1} + w_2C_{1,2} \\ w_1C_{2,1} + w_1C_{1,2} + 2w_2C_{2,2}\end{bmatrix}$.
As $C_{1,2} = C_{2,1}$, we finally have
$\dfrac{d(w^TA^TAw)}{dw^T} = \begin{bmatrix} 2w_1C_{1,1} + 2w_2C_{2,1} \\ 2w_1C_{2,1} + 2w_2C_{2,2}\end{bmatrix} = 2Cw$.
Putting it all together, $g_l = \dfrac{d\Omega_l}{dw^T}$ is
$g_l = -A^Tt + A^TAw = -A^Tt + A^Ty = -A^Te$.
b) The minimum of $\Omega_l$ is given by $g = 0$. Doing that, we are left with
$0 = -A^Tt + A^TA\hat{w} \Rightarrow \hat{w} = (A^TA)^{-1}A^Tt$.
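The gradient and the optimum derived above are easy to verify numerically. The sketch below uses a small random problem and a central-difference approximation; all data and names are illustrative, not part of the book's files.

% Check g = A'*(A*w - t) = -A'*e and the least-squares optimum w = (A'*A)\(A'*t)
A = randn(20, 3);  t = randn(20, 1);  w = randn(3, 1);
Omega = @(w) 0.5*norm(t - A*w)^2;      % the training criterion above
g = A'*(A*w - t);                      % analytical gradient
gfd = zeros(3, 1);  h = 1e-6;
for i = 1:3
    ei = zeros(3, 1);  ei(i) = h;
    gfd(i) = (Omega(w + ei) - Omega(w - ei))/(2*h);   % central difference
end
max(abs(g - gfd))                      % ~1e-8: analytical and numerical gradients agree
w_opt = (A'*A)\(A'*t);
A'*(A*w_opt - t)                       % ~0: the gradient vanishes at the optimum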
• 11. (p. 183) c) Consider an augmented matrix $\tilde{A} = \begin{bmatrix} A \\ \sqrt{\lambda}\, I\end{bmatrix}$ and an augmented vector $\tilde{t} = \begin{bmatrix} t \\ 0\end{bmatrix}$. Then:
$\varphi_l = \dfrac{\|\tilde{t} - \tilde{A}w\|_2^2}{2} = \dfrac{t^Tt - 2t^TAw + w^T(A^TA + \lambda I)w}{2} = \dfrac{t^Tt - 2t^TAw + w^TA^TAw + \lambda w^Tw}{2} = \dfrac{\|t - Aw\|_2^2 + \lambda\|w\|_2^2}{2}$.
Then, all the results above can be employed, by replacing A with $\tilde{A}$ and t with $\tilde{t}$. Therefore, the gradient is
$g_\varphi = -A^Tt + (A^TA + \lambda I)w = -A^Tt + A^TAw + \lambda w = g_l + \lambda w$.
Notice that the gradient can also be formulated as the negative of the product of the transpose of the Jacobean and the error vector, $g_\varphi = -\tilde{A}^T\tilde{e}$, where $\tilde{e} = \tilde{t} - \tilde{y}$ and $\tilde{y} = \tilde{A}w = \begin{bmatrix} Aw \\ \sqrt{\lambda}\, w\end{bmatrix} = \begin{bmatrix} y \\ \sqrt{\lambda}\, w\end{bmatrix}$.
The optimum is therefore
$0 = -A^Tt + (A^TA + \lambda I)\hat{w} \Rightarrow \hat{w} = (A^TA + \lambda I)^{-1}A^Tt$.
16. a) Please see Section 2.1.3.1.
b) The error back-propagation is a computationally efficient algorithm but, since it implements a steepest descent method, it is unreliable and can have a very slow rate of convergence. Also, it is difficult to select appropriate values of the learning parameter. For more details please see Section 2.1.3.3 and Section 2.1.3.4. The problem related with lack of convergence can be solved by incorporating a line-search algorithm, to guarantee that the training criterion does not increase in any iteration. To have a faster convergence rate, second-order methods can be used. It is proved in Section 2.1.3.5 that the Levenberg-Marquardt algorithm is the best technique to use, and it does not employ a learning rate parameter.
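The equivalence used in part c) between the augmented least-squares problem and the regularised closed-form solution can also be checked in a few lines; the data and the value of λ below are illustrative.

% Ordinary least squares on the augmented system reproduces the regularised optimum
A = randn(20, 3);  t = randn(20, 1);  lambda = 0.1;
At = [A; sqrt(lambda)*eye(3)];  tt = [t; zeros(3, 1)];
w1 = (At'*At)\(At'*tt);                 % least squares on the augmented data
w2 = (A'*A + lambda*eye(3))\(A'*t);     % closed-form regularised solution
max(abs(w1 - w2))                       % ~1e-15: the two solutions coincide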
• 12. (p. 184)
17. The sigmoid function is covered in Section 1.3.1.2.4 and the hyperbolic tangent function in Section 1.3.1.2.5. Notice that, denoting the sigmoid by $f_1(x)$ and the hyperbolic tangent by $f_2(x)$, these functions are related as $f_2(x) = 2f_1(2x) - 1$; for $f(x) = \tanh(x)$, $f'(x) = 1 - \tanh^2(x) = 1 - f(x)^2$. The advantages of using a hyperbolic tangent function over a sigmoid function are:
1. The hyperbolic function generates a better conditioned model. Notice that an MLP with a linear function in the output layer always has a column of ones in the Jacobean matrix (related with the output bias). As the Jacobean columns related with the weights from the last hidden layer to the output layer are a linear function of the outputs of the last hidden layer, and as the mean of a hyperbolic tangent function is 0, while the mean of a sigmoid function is 1/2, in this latter case those Jacobean columns are more correlated with the Jacobean column related with the output bias;
2. The derivative of the sigmoid function lies within [0, 0.25]; its expected value, considering a uniform probability density function at the output of the node, is 1/6. For a hyperbolic tangent function, the derivative lies within [0, 1] and its expected value is 2/3. When we compute the Jacobean matrix, one of the factors involved in the computation is $\partial O_{i,.}^{(z+1)}/\partial Net_{i,.}^{(z+1)}$ (see (2.42)). Therefore, in comparison with the weights related with the linear output layer, the columns of the Jacobean matrix related with the nonlinear layers appear "squashed" by a mean factor of 1/6, for the sigmoid function, and by a factor of 2/3, for the hyperbolic tangent function. This "squashing" is translated into smaller eigenvalues, which in turn translates into a slow rate of convergence, as the rate of convergence is related with the smaller eigenvalues of the normal equation matrix (see Section 2.1.3.3.2). As this "squashing" is smaller for the hyperbolic tangent function, a network with these activation functions potentially has a faster rate of convergence.
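Both of these claims can be verified numerically. The sketch below (illustrative code, not from the book's files) checks the relation between the two activation functions and the expected values 1/6 and 2/3 of their derivatives under a uniform distribution of the node output.

sig = @(x) 1./(1 + exp(-x));
x = linspace(-6, 6, 1001);
max(abs(tanh(x) - (2*sig(2*x) - 1)))   % ~1e-16: f2(x) = 2*f1(2*x) - 1
o = rand(1e6, 1);                      % sigmoid output, uniform in [0, 1]
mean(o.*(1 - o))                       % ~1/6: expected sigmoid derivative
o = 2*rand(1e6, 1) - 1;                % tanh output, uniform in [-1, 1]
mean(1 - o.^2)                         % ~2/3: expected tanh derivative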
• 13. (p. 185)
18. We shall start with the pH problem. Using the same topology ([4 4 1]) and the same initial values, the only difference in the code is to change, in the Matlab file ThreeLay.m, the instructions
Y1=ones(np,NNP(1))./(1+exp(-X1));
Der1=Y1.*(1-Y1);
Y2=ones(np,NNP(2))./(1+exp(-X2));
Der2=Y2.*(1-Y2);
to the following instructions:
Y1=tanh(X1);
Der1=1-Y1.^2;
Y2=tanh(X2);
Der2=1-Y2.^2;
Then the following results are obtained using BP.m.
[Figure: error norm versus iteration (0 to 100) for the pH problem, for learning rates neta=0.005 and neta=0.001.]
Comparing these results with the ones shown in fig. 2.18, it can be seen that a better accuracy has been obtained.
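For context, a minimal sketch of how these activation and derivative lines fit into the evaluation of one nonlinear layer is given below; the variable and weight names are illustrative assumptions and do not reproduce ThreeLay.m or TwoLayer.m.

% One hidden layer evaluated with both activation choices; the derivatives are
% the quantities used by error back-propagation.
InpPat = randn(100, 2);                 % 100 patterns, 2 inputs (illustrative data)
W1 = 0.1*randn(2, 4);  b1 = zeros(1, 4);
X1 = InpPat*W1 + repmat(b1, 100, 1);    % net input of the hidden layer
Y1_sig  = 1./(1 + exp(-X1));   Der1_sig  = Y1_sig.*(1 - Y1_sig);   % sigmoid
Y1_tanh = tanh(X1);            Der1_tanh = 1 - Y1_tanh.^2;         % hyperbolic tangent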
• 14. (p. 186) Addressing now the Inverse Coordinate problem, using the same topology ([5 1]) and the same initial values, and changing only the instructions related with layer 1 (see above) in TwoLayer.m, the following results are obtained:
[Figure: error norm versus iteration (0 to 100) for the Inverse Coordinate problem, for learning rates neta=0.005 and neta=0.001.]
Again, better accuracy results are obtained using the hyperbolic tangent function (compare this figure with fig. 2.23). It should be mentioned that smaller learning rates than the ones used with the sigmoid function had to be applied, as the training process diverged.
19. The error back-propagation is a computationally efficient algorithm but, since it implements a steepest descent method, it is unreliable and can have a very slow rate of convergence. Also, it is difficult to select appropriate values of the learning parameter. The Levenberg-Marquardt method is the "state-of-the-art" technique in non-linear least-squares problems. It guarantees convergence to a local minimum, and usually the rate of convergence is second-order. Also, it does not require any user-defined parameter, such as a learning rate. Its disadvantage is that, computationally, it is a more demanding algorithm.
20. Please see text in Section 2.1.3.6.
21. Please see text in Section 2.1.3.4.
22. Use the following Matlab code:
x=randn(10,5);  % matrix with 10*5 random elements following a normal distribution
c0=cond(x);     % the condition number of the original matrix
• 15. (p. 187) Now use the following code:
alfa(1)=10;
for i=1:3
   x1=[x(:,1:4)/alfa(i) x(:,5)];   % the first four columns are divided by alfa(i)
   c(i)=cond(x1);
   alfa(i+1)=alfa(i)*10;           % alfa will take the values 10, 100 and 1000
end
If we now compare the ratios of the condition numbers obtained (c(2)/c(1) and c(3)/c(2)), we shall see that they are 9.98 and 9.97, very close to the factor of 10 that was used.
23. Use the following Matlab code:
for i=1:100
   [W,Ki,Li,Ko,Lo,IPS,TPS,cg,ErrorN,G]=MLP_initial_par([5 1],InpPat,TarPat,2);
   E(i)=ErrorN(2);
   c(i)=cond(G);
end
This will generate 100 different initializations of the weight vector, with the weights in the linear output layer computed as random values. Afterwards use the following Matlab code:
for i=1:100
   [W,Ki,Li,Ko,Lo,IPS,TPS,cg,ErrorN,G]=MLP_initial_par([5 1],InpPat,TarPat,1);
   E1(i)=ErrorN(2);
   c1(i)=cond(G);
end
This will generate 100 different initializations of the weight vector, with the weights in the linear output layer computed as the least-squares values. Finally, use the Matlab code:
• 16. (p. 188)
for i=1:100
   W=randn(1,21);
   [Y,G,E,c]=TwoLayer(InpPat,TarPat,W,[5 1]);
   E2(i)=norm(TarPat-Y);
   c2(i)=cond(G);
end
The mean results obtained are summarized in the following table:
Table 4.1 - Mean values of the initial Jacobean condition number and error norm
Method                                       | Jacobean condition number | Initial error norm
MLP_init (random values for linear weights)  | 1.6x10^6                  | 15.28
MLP_init (optimal values for linear weights) | 3.9x10^6                  | 1.87
random values                                | 2.8x10^10                 | 24.11
24. Let us consider first the input. Determining the net input of the first hidden layer:
$Net^{(2)} = [IP_s\ |\ \mathbf{1}_{(m\times1)}]\,W^{(1)} = [IP\,k_i + \mathbf{1}_{(m\times k_1)}l_i\ |\ \mathbf{1}_{(m\times1)}]\begin{bmatrix} W^{(1)}_{1\ldots K_1,.} \\ W^{(1)}_{K_1+1,.}\end{bmatrix} = IP\,k_iW^{(1)}_{1\ldots K_1,.} + \mathbf{1}_{(m\times k_1)}l_iW^{(1)}_{1\ldots K_1,.} + \mathbf{1}_{(m\times1)}W^{(1)}_{K_1+1,.}$
This way, each row within the first k1 lines of W(1) appears multiplied by the corresponding element of the diagonal of ki, while to each element of the last row (related with the bias) a quantity is added, which is the dot product of the diagonal elements of li and each column of the first k1 lines of W(1).
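This input-scaling result can be checked numerically. In the sketch below, ki holds the diagonal scaling factors, li holds the corresponding offsets (written as a row vector of the diagonal elements), and the data, dimensions and weight values are illustrative assumptions.

% Absorbing the input scaling into the first-layer weights leaves the net input unchanged
IP  = randn(50, 3);                           % raw input patterns (3 inputs)
ki  = diag(1./std(IP));  li = -mean(IP)*ki;   % scaling: IPs = IP*ki + ones*li
IPs = IP*ki + repmat(li, 50, 1);
W1s = randn(4, 5);                            % weights for scaled inputs (bias = last row)
W1  = [ki*W1s(1:3,:); li*W1s(1:3,:) + W1s(4,:)];   % equivalent weights for raw inputs
n1  = [IPs ones(50,1)]*W1s;  n2 = [IP ones(50,1)]*W1;
max(abs(n1(:) - n2(:)))                       % ~1e-15: identical net inputs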
• 17. (p. 189) Let us address now the output:
$O_s = O\,k_o + \mathbf{1}\,l_o = [Net^{(q-1)}\ |\ \mathbf{1}]\begin{bmatrix} w^{(q-1)}_{1\ldots k_{q-1}} \\ w^{(q-1)}_{k_{q-1}+1}\end{bmatrix} \ \Rightarrow\ O = \frac{1}{k_o}\big(Net^{(q-1)}w^{(q-1)}_{1\ldots k_{q-1}} + \mathbf{1}\,w^{(q-1)}_{k_{q-1}+1} - \mathbf{1}\,l_o\big) = Net^{(q-1)}\frac{w^{(q-1)}_{1\ldots k_{q-1}}}{k_o} + \mathbf{1}\left(\frac{w^{(q-1)}_{k_{q-1}+1} - l_o}{k_o}\right)$
That is, the weights connecting the last hidden neurons to the output neuron appear divided by ko, and for the bias, lo is first subtracted and the result is then divided by ko.
25. The results presented below should take into account that in each iteration of Train_MLPs.m a new set of initial weight values is generated, and therefore no two runs are equal. These results were obtained using the Levenberg-Marquardt method, minimizing the new criterion. For the early-stopping method, a percentage of 30% for the validation set was employed.
In terms of the pH problem, employing a termination criterion of 10^-3, the following results were obtained:
Table 4.2 - Results for the pH problem
Regularization parameter | Error norm | Linear weight norm | Number of iterations | Error norm (validation set)
0              | 0.021 | 80  | 20 | 0.003
10^-6          | 0.016 | 5.3 | 17 | 0.015
10^-4          | 0.033 | 7.3 | 15 | 0.026
10^-2          | 0.034 | 9.4 | 38 | 0.018
early-stopping | 0.028 | 21  | 24 | 0.02
In terms of the Coordinate Transformation problem, a termination criterion of 10^-5 was employed. The following results were obtained:
Table 4.3 - Results for the Coordinate Transformation problem
Regularization parameter | Error norm | Linear weight norm | Number of iterations | Error norm (validation set)
0              | 0.41 | 17.5 | 49 | 0.39
10^-6          | 0.99 | 2.3  | 45 | 0.91
• 18. (p. 190)
Table 4.3 (continued)
10^-4          | 1.28 | 2.5  | 20  | 0.93
10^-2          | 0.5  | 10.6 | 141 | 0.39
early-stopping | 0.38 | 40   | 119 | 0.24
The results presented above show that only in the 2nd case does the early-stopping technique achieve better generalization results than the standard technique, with or without regularization. Again, care should be taken in the interpretation of the results, as in every case different initial values were employed.
26. For both cases we shall use a termination criterion of 10^-3. The Matlab files can be extracted from Const.zip. The results for the pH problem can be seen in the following figure:
[Figure: error norm versus number of nonlinear neurons (1 to 10), for the pH problem.]
There is no noticeable decrease in the error norm after 5 hidden neurons. Networks with more than 5 neurons exhibit the phenomenon of overmodelling. If an MLP with 10 neurons is constructed using the Matlab function Train_MLPs.m, the error norm obtained is 0.086, while with the constructive method we obtain 0.042. The results for the Inverse Coordinate problem can be seen in the following figure:
• 19. (p. 191)
[Figure: error norm versus number of nonlinear neurons (1 to 10), for the Inverse Coordinate problem.]
As can be seen, after the 7th neuron there is no noticeable improvement in the accuracy. For this particular case, models with more than 7 neurons exhibit the phenomenon of overmodelling. If an MLP with 10 neurons is constructed using the Matlab function Train_MLPs.m, the error norm obtained is 0.086, while with the constructive method we obtain 0.097. It should be mentioned that the strategy employed in this constructive method leads to bad initial models when the number of neurons is greater than, let us say, 5.
27. The instantaneous autocorrelation matrix is given by $R[k] = a[k]a'[k]$. The eigenvalues and eigenvectors of $R[k]$ satisfy the equation $Re = \lambda e$. Replacing the previous equation in the last one, we have $a[k]a'[k]e = \lambda e$. As the product $a'[k]e$ is a scalar, this scalar corresponds to the eigenvalue and $a[k]$ is the eigenvector: choosing $e = a[k]$ gives $R[k]a[k] = \|a[k]\|^2 a[k]$.
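A two-line numerical check of this eigen-relation, on an illustrative vector:

a = randn(4, 1);
R = a*a';                 % instantaneous autocorrelation matrix
R*a - (a'*a)*a            % ~0: a is an eigenvector of R, with eigenvalue ||a||^2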
• 20. (p. 192)
28. After adaptation with the LMS rule, $w[k] = w[k-1] + \eta e[k]a[k]$, the a posteriori output of the network, $\hat{y}[k]$, is given by
$\hat{y}[k] = a^T[k]w[k] = a^T[k]w[k-1] + \eta e[k]a^T[k]a[k] = y[k] + \eta\|a[k]\|^2 e[k]$,
where the a posteriori error, $\bar{e}[k] = t[k] - \hat{y}[k]$, is therefore
$\bar{e}[k] = (1 - \eta\|a[k]\|^2)e[k]$.
For a non-null error, the following relations apply:
$|\bar{e}[k]| > |e[k]|$ if $\eta \notin [0,\ 2/\|a[k]\|^2]$;
$|\bar{e}[k]| = |e[k]|$ if $\eta = 0$ or $\eta = 2/\|a[k]\|^2$;
$|\bar{e}[k]| < |e[k]|$ if $\eta \in (0,\ 2/\|a[k]\|^2)$;
$\bar{e}[k] = 0$ if $\eta = 1/\|a[k]\|^2$.
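A single-step numerical check of the a posteriori error relation, on illustrative data (the special case η = 1/||a[k]||² gives a zero a posteriori error):

a = randn(5, 1);  w = randn(5, 1);  t = 1.3;  eta = 0.7;
e  = t - a'*w;                     % a priori error
w1 = w + eta*e*a;                  % LMS update
ep = t - a'*w1;                    % a posteriori error
[ep, (1 - eta*(a'*a))*e]           % the two values coincide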
• 21. (p. 193)
29. The following figure illustrates the results obtained with the NLMS rule, for the Coordinate Inversion problem, when $\eta \in \{0.1, 1, 1.9\}$.
[Figure: MSE versus iterations (0 to 1200) for the three learning rates.]
Learning is stable in all cases, and the rate of convergence is almost independent of the learning rate employed. If we employ a learning rate (2.001) slightly larger than the stable domain, we obtain unstable learning:
[Figure: MSE versus iterations (0 to 1200), showing divergence.]
• 22. (p. 194) Using now the standard LMS rule, the following results were obtained with $\eta \in \{0.1, 0.5\}$; values higher than 0.5 result in unstable learning.
[Figure: MSE versus iterations (0 to 1200) for the standard LMS rule.]
In terms of convergence rate, the two methods produce similar results. The NLMS rule, however, guarantees convergence within the domain $\eta \in [0, 2]$.
30. We shall consider first the pH problem. The average absolute error, after off-line training (using the parameters stored in Initial_on-pH.mat), is $\varsigma = E[|e_n[k]|] = 0.0014$. This value is used in the output dead-zone
$e_d[k] = 0$ if $|e[k]| \le \varsigma$; $\ e_d[k] = e[k] + \varsigma$ if $e[k] < -\varsigma$; $\ e_d[k] = e[k] - \varsigma$ if $e[k] > \varsigma$.
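A possible Matlab sketch of this output dead-zone (the anonymous function below is an assumption, not code from the book's files): errors smaller than the zone parameter are ignored, larger errors are shrunk towards zero by the same amount.

dead_zone = @(e, zeta) (abs(e) > zeta).*(e - sign(e).*zeta);
dead_zone([-0.003 -0.001 0 0.001 0.003], 0.0014)   % -> [-0.0016 0 0 0 0.0016]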
• 23. (p. 195) Using this dead-zone, the next figure shows the MSE value for the last 10 (out of 20) passes of adaptation, using the NLMS rule with $\eta = 1$.
[Figure: MSE over the last 10 passes (iterations 1000 to 2200), with and without dead-zone.]
The results obtained with the LMS rule, with $\eta = 0.5$, are shown in the next figure.
[Figure: MSE over the last 10 passes (iterations 1000 to 2200), with and without dead-zone.]
• 24. (p. 196) Considering now the Coordinate Transformation problem, the average absolute error after off-line training is $\varsigma = E[|e_n[k]|] = 0.027$. Using this value, the NLMS rule, with $\eta = 1$, produces the following results:
[Figure: MSE over the last pass (iterations 1900 to 2020), with and without dead-zone.]
The above figure shows the MSE in the last pass (out of 20) of adaptation. Using now the LMS rule, with $\eta = 0.5$, we obtain:
[Figure: MSE over the last pass (iterations 1900 to 2020), with and without dead-zone.]
• 25. (p. 197) For all the cases, better results are obtained with the inclusion of an output dead-zone in the adaptation algorithm. The main problem, in real situations, is to determine the dead-zone parameter.
31. Considering the Coordinate Inverse problem, using the NLMS rule with $\eta = 1$, the results obtained with a worse conditioned model (weights in Initial_on_CT_bc.m), compared with the results obtained with a better conditioned model (weights in Initial_on_CT.m), are represented in the next figure:
[Figure: MSE versus iterations (0 to 2500) for the worse and the better conditioned models.]
Regarding now the pH problem, using the NLMS rule with $\eta = 1$, the results obtained with a worse conditioned model (weights in Initial_on_pH_bc.m),
• 26. (p. 198) compared with the results obtained with a better conditioned model (weights in Initial_on_pH.m), are represented in the next figure:
[Figure: MSE versus iterations (0 to 2500) for the worse and the better conditioned models.]
It is obvious that a better conditioned model achieves a better adaptation rate.
32. We shall use the NLMS rule, with $\eta = 1$, in the conditions of Ex. 2.6. We shall start the adaptation from 4 different initial conditions, $w_i = \pm 10$, $i = 1, 2$. The next figure illustrates the evolution of the adaptation, with 10 passes over the training set. The 4 different adaptations converge to a small area, indicated by the green colour in the
• 27. (p. 199) figure.
[Figure: trajectories of the weight vector in the (w1, w2) plane for the four initial conditions.]
If we zoom into this small area, we can see that $w_1 \in [0, 0.75]$ and $w_2 \in [-0.07, 1]$. In the first example ($w[1] = [10, 10]$) this area is entered in iteration 203; in the second example ($w[1] = [10, -10]$) it is entered in iteration 170; in the third case ($w[1] = [-10, -10]$) in iteration 204; and in the fourth case ($w[1] = [-10, 10]$) in iteration 170. This is shown in the next figure.
[Figure: zoom of the capture area in the (w1, w2) plane.]
• 28. (p. 200) This domain, where, after being entered, the weights never leave and never settle, is called the minimal capture zone.
If we compare the evolution of the weight vector, starting from $w[1] = [10, -10]$, with and without dead-zone, we obtain the following results:
[Figure: evolution of w1 and w2 over the iterations (0 to 2500), with and without dead-zone.]
The optimal values of the weight parameters, in the least-squares sense, are given by $\hat{w} = [x\ \ \mathbf{1}]^{+}y = [0,\ 0.3367]^T$, where x and y are the input and target data, giving an optimal MSE of 0.09. The dead-zone parameter employed was $\varsigma = \max|e_n[k]| = 0.663$.
33. Assuming an interpolation scheme, the number of basis functions is equal to the number of patterns. This way, the network has 100 neurons in the hidden layer. The centres of the network are placed at the input training points, so, if the matrix of the centres is denoted as C, then C = X. With respect to the spreads, as nothing is mentioned, the most standard scheme can be employed, which is equal spreads of value $\sigma = d_{max}/\sqrt{2m_1}$, where $d_{max}$ is the maximum distance between the centres and $m_1$ is the number of centres. With respect to the linear output weights, they are the optimal values, in the least-squares sense, that is, $\hat{w} = G^{+}t$, where G is the matrix of the outputs of the hidden neurons.
The main problem with this scheme is that the network grows as the training set grows. This results in ill-conditioning of the matrix G, or even singularity. For this reason, an approximation scheme, with the number of neurons strictly smaller than the number of patterns, is the option usually taken.
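A minimal sketch of this exact-interpolation design, on illustrative 1-D data rather than the exercise's 100-pattern set:

X = linspace(0, 1, 20)';  t = sin(2*pi*X);             % illustrative training data
C = X;                                                  % one centre per pattern
d = max(max(abs(repmat(X,1,20) - repmat(X',20,1))));    % maximum distance between centres
m1 = size(C, 1);  sigma = d/sqrt(2*m1);                 % common spread, dmax/sqrt(2*m1)
G = exp(-(repmat(X,1,m1) - repmat(C',m1,1)).^2/(2*sigma^2));   % hidden-layer outputs
w = pinv(G)*t;                                          % least-squares output weights
norm(G*w - t)                                           % ~0: the training set is interpolated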
• 29. (p. 201)
34. The k-means clustering algorithm places the centres in regions where a significant number of examples is present. The algorithm is as follows (a Matlab sketch of this update loop is given below):
1. Initialization - choose random values for the centres; they must all be different.
2. For j = 1 to n:
2.1. Sampling - draw a sample vector x from the input matrix.
2.2. Similarity matching - find the centre closest to x; let its index be k(x):
$k(x) = \arg\min_j \|x(k) - c_j[i]\|^2, \quad j = 1, \ldots$   (4.4)
2.3. Updating - adjust the centres of the radial basis functions according to:
$c_j[i+1] = c_j[i] + \eta(x(k) - c_j[i])$ if $j = k(x)$; $\ c_j[i+1] = c_j[i]$ otherwise.   (4.5)
2.4. j = j + 1
end
35. We shall use, as initial values, the data stored in winiph_opt.m and winict_opt.m, for the pH and the Coordinate Inverse problems, respectively. The new criterion and the Levenberg-Marquardt method will be used in all these problems.
a) With respect to the pH problem, the application of the termination criterion ($\tau = 10^{-4}$) is expressed in the next table:
Table 4.6 - Standard termination
Method             | Number of iterations | Error norm | Linear weight norm | Condition of basis functions
LM (new criterion) | 5                    | 0.0133     | 7.8x10^4           | 1.4x10^7
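The sequential centre-update loop of Ex. 34 can be sketched as follows; the data, the number of centres and the fixed learning rate are illustrative assumptions.

X = [randn(50, 2); randn(50, 2) + 4];     % two clusters of 2-D samples
p = randperm(size(X, 1));  c = X(p(1:3), :);   % 3 centres initialised on distinct samples
eta = 0.05;
for j = 1:size(X, 1)
    x = X(j, :);                                              % sampling
    [dmin, k] = min(sum((c - repmat(x, size(c,1), 1)).^2, 2)); % similarity matching
    c(k, :) = c(k, :) + eta*(x - c(k, :));                     % update the winning centre only
end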
• 30. (p. 202) With respect to the Coordinate Inverse problem, the application of the termination criterion ($\tau = 10^{-3}$) is expressed in the next table:
Table 4.7 - Standard termination
Method             | Number of iterations | Error norm | Linear weight norm | Condition of basis functions
LM (new criterion) | 18                   | 0.19       | 5.2x10^8           | 2.5x10^16
b) With respect to the pH problem, the application of the termination criterion ($\tau = 10^{-4}$) to the LM method, minimizing the new criterion and using an early-stopping method (the Matlab function gen_set.m was applied with a percentage of 30%), gives the following results:
Table 4.8 - Early-stopping method
Method         | Number of iterations | Error norm (est. set) | Error norm (val. set) | Linear weight norm | Condition of basis functions
Early-stopping | 6                    | 0.0075                | 0.0031                | 8.7x10^4           | 1.7x10^7
λ = 10^-6      | 19                   | 0.0044                | 0.0029                | 15.4               | 1.6x10^3
The second line represents the results obtained, for the same estimation and validation sets, using the parameters found by applying the regularization technique ($\lambda = 10^{-6}$, unitary matrix) to the whole training set. It can be seen that the same accuracy is obtained for the validation set, with better results in the estimation set.
With respect to the Coordinate Inverse problem, the application of the termination criterion ($\tau = 10^{-3}$) to the LM method, minimizing the new criterion and using an early-stopping method (gen_set.m with a percentage of 30%), gives the following results:
Table 4.9 - Early-stopping method
Method         | Number of iterations | Error norm (est. set) | Error norm (val. set) | Linear weight norm | Condition of basis functions
Early-stopping | 77                   | 0.1141                | 0.1336                | 1.3x10^8           | 4.6x10^14
λ = 10^-6      | 19                   | 0.1693                | 0.1047                | 168                | 3.7x10^14
These results show that, using all the training data with the regularization method, a better result was obtained for the validation set, although a worse result was obtained for the estimation set.
• 31. (p. 203) c) With respect to the pH problem, the application of the termination criterion ($\tau = 10^{-4}$) to the LM method, minimizing the new criterion, is expressed in the next table:
Table 4.10 - Explicit regularization (I)
Method    | Number of iterations | Error norm | Linear weight norm | Condition of basis functions
λ = 10^-6 | 19                   | 0.0053     | 15.4               | 1.6x10^3
λ = 10^-4 | 25                   | 0.0247     | 9.75               | 3.8x10^4
λ = 10^-2 | 83                   | 0.044      | 2.07               | 4.5x10^3
With respect to the Coordinate Inverse problem, the application of the termination criterion ($\tau = 10^{-3}$) to the LM method, minimizing the new criterion, is expressed in the next table:
Table 4.11 - Explicit regularization (I)
Method    | Number of iterations | Error norm | Linear weight norm | Condition of basis functions
λ = 10^-6 | 100                  | 0.199      | 168                | 3.7x10^14
λ = 10^-4 | 100                  | 0.4039     | 29                 | 1.2x10^15
λ = 10^-2 | 100                  | 0.9913     | 9.9                | 3.8x10^17
d) With respect to the pH problem, the application of the termination criterion ($\tau = 10^{-4}$) to the LM method, minimizing the new criterion, is expressed in the next table:
Table 4.12 - Explicit regularization (G0)
Method    | Number of iterations | Error norm | Linear weight norm | Condition of basis functions
λ = 10^-6 | 43                   | 0.0132     | 3.6x10^4           | 5.6x10^6
λ = 10^-4 | 17                   | 0.0196     | 633                | 1.2x10^5
λ = 10^-2 | 150                  | 0.0539     | 38                 | 1.3x10^6
• 32. (p. 204) With respect to the Coordinate Inverse problem, the application of the termination criterion ($\tau = 10^{-3}$) to the LM method, minimizing the new criterion, is expressed in the next table:
Table 4.13 - Explicit regularization (G0)
Method    | Number of iterations | Error norm | Linear weight norm | Condition of basis functions
λ = 10^-6 | 100                  | 0.49       | 91                 | 4x10^15
λ = 10^-4 | 100                  | 0.3544     | 27                 | 5.2x10^15
λ = 10^-2 | 100                  | 1.229      | 11.5               | 2.4x10^18
36. The generalization parameter is 2, so there are 2 overlays.
a) [FIGURE 4.66 - Overlay diagram, with ρ = 2: the input lattice, the 1st overlay with displacement d1 = (1,1) and the 2nd overlay with displacement d2 = (2,2), containing the basis functions a1 to a18.]
There are $p' = \prod_{i=1}^{n}(r_i + 1) = 5^2 = 25$ cells within the lattice. There are 18 basis functions within the network. At any moment, only 2 basis functions are active in the network.
• 33. (p. 205) b) Analysing fig. 4.66, we can see that, as the input moves along the lattice one cell parallel to an input axis, the number of basis functions dropped from, and introduced to, the output calculation is a constant (1) and does not depend on the input.
c) A CMAC is said to be well defined if the generalization parameter satisfies
$\rho \le \max_i(r_i + 1)$.   (4.14)
37. The decomposition of the basis functions into overlays demonstrates that the number of basis functions increases exponentially with the input dimension. The total number of basis functions is the sum of the basis functions in each overlay. This number, in turn, is the product of the number of univariate basis functions on each axis. These have a bounded support, and therefore there are at least two defined on each axis. Therefore, a lower bound for the number of basis functions for each overlay, and consequently for the AMN, is $2^n$. These networks therefore suffer from the curse of dimensionality. In B-splines, this problem can be alleviated by decomposing a multidimensional network into a network composed of additive sub-networks of smaller dimensions. An algorithm to perform this task is the ASMOD algorithm.
38. The network has 4 inputs and 1 output.
a) The network can be described as $f(x) = f_1(x_1) + f_2(x_2) + f_3(x_3) + f_4(x_3, x_4)$. The number of basis functions for each sub-network is given by $p = \prod_{i=1}^{n}(r_i + k_i)$. Therefore, we have $(5+2) + (4+2) + (3+2) + (4+3)^2 = 18 + 49 = 67$ basis functions for the overall network. In terms of active basis functions, we have $p'' = \sum_{i=1}^{n}\prod_{j=1}^{n_i} k_{j,i}$, where n is the number of sub-networks, $n_i$ is the number of inputs of sub-network i, and $k_{j,i}$ is the B-spline order for the jth dimension of the ith sub-network. For this case, $p'' = 2 + 2 + 2 + 3\times3 = 15$.
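These counts can be reproduced with a few lines; the cell arrays below simply encode the interior knots and spline orders of the four sub-networks assumed above.

% r = interior knots, k = spline orders, one cell entry per sub-network
r = {5, 4, 3, [4 4]};  k = {2, 2, 2, [3 3]};
total = 0;  active = 0;
for i = 1:numel(r)
    total  = total  + prod(r{i} + k{i});   % basis functions of sub-network i
    active = active + prod(k{i});          % active basis functions of sub-network i
end
[total active]                             % -> 67 and 15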
• 34. (p. 206) b) The ASMOD algorithm can be described as:
Algorithm 4.1 - ASMOD algorithm
m_0 = initial model; i = 1; termination criterion = FALSE;
WHILE NOT(termination criterion)
  Generate a set of candidate networks M_i;
  Estimate the parameters for each candidate network;
  Determine the best candidate, m_i, according to some criterion J;
  IF J(m_i) >= J(m_{i-1})
    termination criterion = TRUE;
  END
  i = i + 1;
END
Each main part of the algorithm will be detailed below.
• Candidate models are generated by the application of a refinement step, where the complexity of the current model is increased, and a pruning step, where the current model is simplified, in an attempt to determine a simpler model that performs as well as the current one. Note that, in the majority of the cases, the latter step does not generate candidates that are selected for the next iteration. Because of this, this step is often applied only after a certain number of refinement steps, or just applied to the optimal model resulting from the ASMOD algorithm with refinement steps only.
Three methods are considered for model growing:
1. For every input variable not present in the current network, a new univariate sub-model is introduced in the network. The spline order and the number of interior knots are specified by the user, and usually 0 or 1 interior knots are applied;
2. For every combination of sub-models present in the current model, combine them in a multivariate network with the same knot vector and spline order. Care must be taken in this step to ensure that the complexity (in terms of weights) of the final model does not exceed the size of the training set;
3. For every sub-model in the current network, for every dimension in each sub-model, split every interval in two, creating therefore candidate models with a complexity higher by 1.
For network pruning, three possibilities are also considered:
1. For all univariate sub-models with no interior knots, replace them by a spline of order k-1, also with no interior knots. If k-1 = 1, remove this sub-model from the network, as it is just a constant;
• 35. (p. 207) 2. For every multivariate (n-input) sub-model in the current network, split it into n sub-models with n-1 inputs;
3. For every sub-model in the current network, for every dimension in each sub-model, remove each interior knot, creating therefore candidate models with a complexity smaller by 1.
39. Recall Exercise 1.3. Consider that no interior knots are employed. Therefore, a B-spline of order 1 is given by
$N_1^1(x) = 1$ if $x \in I_1$, $0$ otherwise.   (4.15)
The output corresponding to this basis function is therefore
$y(N_1^1(x)) = w_1$ if $x \in I_1$, $0$ otherwise,   (4.16)
which means that, with a sub-model which is a B-spline of order 1, any constant term can be obtained.
Consider now a spline of order 2. It is defined as
$N_2^j(x) = \left(\dfrac{x - \lambda_{j-2}}{\lambda_{j-1} - \lambda_{j-2}}\right)N_1^{j-1}(x) + \left(\dfrac{\lambda_j - x}{\lambda_j - \lambda_{j-1}}\right)N_1^j(x), \quad j = 1, 2.$   (4.17)
It is easy to see that
$N_2^1(x) = \dfrac{\lambda_1 - x}{\lambda_1 - \lambda_0}$, $\ N_2^2(x) = \dfrac{x - \lambda_0}{\lambda_1 - \lambda_0}$, for $x \in I_1$.   (4.18)
For our case,
$N_2^1(x_1) = 1 - x_1$, $N_2^2(x_1) = x_1$, for $x_1 \in I_1$; $\ N_2^1(x_2) = \dfrac{1 - x_2}{2}$, $N_2^2(x_2) = \dfrac{x_2 + 1}{2}$, for $x_2 \in I_1$.   (4.19)
The outputs corresponding to these basis functions are simply:
• 36. (p. 208)
$y(N_2^1(x_1)) = w_2(1 - x_1)$, $\ y(N_2^2(x_1)) = w_3 x_1$, $\ y(N_2^1(x_2)) = w_4\dfrac{1 - x_2}{2}$, $\ y(N_2^2(x_2)) = w_5\dfrac{x_2 + 1}{2}$.   (4.20)
Therefore, we can construct the functions $4x_1$ and $-2x_2$ just by setting $w_2 = 0$, $w_3 = 4$, and $w_4 = 4$, $w_5 = 0$. Note that this is not the only solution. Using this solution, note that $y(N_2^1(x_2)) = 2 - 2x_2$, which means that we must subtract 2 in order to get $-2x_2$.
Consider now a bivariate sub-model, of order 2. As we know, bivariate B-splines are constructed from univariate B-splines using
$N_k^j(x) = \prod_{i=1}^{n} N_{k_i,i}^{j_i}(x_i).$   (4.21)
We have now 4 basis functions:
$N_{2,2}^1(x_1,x_2) = (1 - x_1)\dfrac{1 - x_2}{2}$, $\ N_{2,2}^2(x_1,x_2) = x_1\dfrac{x_2 + 1}{2}$, $\ N_{2,2}^3(x_1,x_2) = x_1\dfrac{1 - x_2}{2}$, $\ N_{2,2}^4(x_1,x_2) = (1 - x_1)\dfrac{x_2 + 1}{2}$.   (4.22)
These are equal to:
$N_{2,2}^1 = \dfrac{1 - x_1 - x_2 + x_1x_2}{2}$, $\ N_{2,2}^2 = \dfrac{x_1x_2 + x_1}{2}$, $\ N_{2,2}^3 = \dfrac{x_1 - x_1x_2}{2}$, $\ N_{2,2}^4 = \dfrac{1 - x_1 + x_2 - x_1x_2}{2}$.   (4.23)
Therefore, the corresponding output is:
• 37. (p. 209)
$y(N_{2,2}^1) = w_6\dfrac{1 - x_1 - x_2 + x_1x_2}{2}$, $\ y(N_{2,2}^2) = w_7\dfrac{x_1x_2 + x_1}{2}$, $\ y(N_{2,2}^3) = w_8\dfrac{x_1 - x_1x_2}{2}$, $\ y(N_{2,2}^4) = w_9\dfrac{1 - x_1 + x_2 - x_1x_2}{2}$.   (4.24)
The function $0.5x_1x_2$ can be constructed in many ways. Consider $w_6 = w_8 = w_9 = 0$ and $w_7 = 1$. Therefore $y(N_{2,2}^2) = \dfrac{x_1x_2 + x_1}{2}$, which means that we must subtract $x_1/2$ from the output to get $0.5x_1x_2$. This means that we should not design $4x_1$ but $\tfrac{7}{2}x_1$, therefore setting $w_3 = 7/2$.
To summarize, we can design a network implementing the function $f(x_1,x_2) = 3 + 4x_1 - 2x_2 + 0.5x_1x_2$ by employing 4 sub-networks, all with zero interior knots:
1. A univariate sub-network (input x1 or x2, it does not matter) of order 1, with w1 = 1;
2. A univariate sub-network with input x1, order 2, with w2 = 0 and w3 = 7/2;
3. A univariate sub-network with input x2, order 2, with w4 = 4 and w5 = 0;
4. A bivariate sub-network with inputs x1 and x2, order 2, with w6 = w8 = w9 = 0 and w7 = 1.
40. a) The Matlab functions in Asmod.zip were employed to solve this problem. First, gen_set.m was employed to split the training set between estimation and validation sets, with a percentage of 30% for the latter. Then ASMOD was employed, with the termination criterion formulated as: training stops if the MSE for the validation set has increased constantly over the last 4 iterations, or if the standard ASMOD termination is met. In the following tables, the first row illustrates the results obtained with this approach. The second row illustrates the application of the model obtained with the standard ASMOD, trained using all the training data, to the estimation and validation sets used in the other approach.
Concerning the pH problem, the following results were obtained:
Table 4.25 - ASMOD results - early-stopping versus complete training (pH problem)
MSEe       | MSREe      | MSEv       | MREv       | Compl. | Wei. N.
8.6x10^-9  | 5.9x10^-7  | 4.3x10^-6  | 0.034      | 42     | 3.7
1.4x10^-31 | 8.5x10^-31 | 1.5x10^-31 | 2.2x10^-30 | 101    | 5.8
Concerning the Coordinate Transformation problem, the following results were obtained:

Table 4.26 - ASMOD Results - Early-Stopping versus complete training (CT problem)
MSEe | MSREe | MSEv | MREv | Compl. | Wei. N.
2.7×10^-4 | 7.7×10^9 | 15×10^-3 | 9.7×10^12 | 36 | 5.5
1.4×10^-5 | 4.9×10^5 | 1.6×10^-5 | 2.3×10^5 | 65 | 9.6

For both cases, the MSE for the validation set is much lower if the training is performed using all the data.

b) The Matlab functions in Asmod.zip were employed to solve this problem. Different values of the regularization parameter were employed (λ = 0, 10^-2, 10^-4 and 10^-6).

Concerning the pH problem, the following table summarizes the results obtained:

Table 4.27 - ASMOD Results - Different regularization values (pH problem)
Reg. factor | MSE | Criterion | Complexity | Weight Norm | N. Candidates | N. Iterations
λ = 0      | 1.4×10^-31 | -6705 | 101 | 5.82 | 9945  | 101
λ = 10^-2  | 3.2×10^-6  | -1190 | 17  | 2.25 | 341   | 19
λ = 10^-4  | 3.9×10^-9  | -1673 | 61  | 4.36 | 3569  | 61
λ = 10^-6  | 3.6×10^-13 | -2440 | 98  | 5.71 | 11056 | 107

Concerning the Coordinate Transformation problem, the following table summarizes the results obtained:

Table 4.28 - ASMOD Results - Different regularization values (CT problem)
Reg. factor | MSE | Criterion | Complexity | Weight Norm | N. Candidates | N. Iterations
λ = 0      | 1.5×10^-5 | -916   | 65 | 9.6  | 1043 | 30
λ = 10^-2  | 3.7×10^-5 | -831.7 | 64 | 4.18 | 921  | 26
λ = 10^-4  | 1.7×10^-5 | -918.5 | 61 | 5.1  | 1264 | 33
λ = 10^-6  | 1.5×10^-5 | -915.7 | 65 | 9.1  | 1067 | 30

For both cases, an increase in the regularization parameter increases the MSE, but decreases the model complexity and the linear weight norm.
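To illustrate the role of the regularization parameter, a minimal sketch is given below. It assumes that the linear weights are obtained by a ridge-type (zero-order regularized) least-squares solution on the basis-function output matrix; this is only an illustration with dummy data, not the actual procedure in Asmod.zip.

  % Illustrative only: effect of the regularization parameter on the linear
  % weights, assuming a ridge-type penalty on the weight vector.
  rng(0);                                      % reproducible dummy data
  A = [ones(50,1) rand(50,3)];                 % dummy basis-function output matrix
  t = A*[3; 4; -2; 0.5] + 0.01*randn(50,1);    % dummy target vector
  for lambda = [0 1e-2 1e-4 1e-6]
      w = (A'*A + lambda*eye(4)) \ (A'*t);     % regularized normal equations
      fprintf('lambda = %g: MSE = %.3g, ||w|| = %.3g\n', ...
              lambda, mean((t - A*w).^2), norm(w));
  end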
c) To minimize the MSRE, we can apply the following strategy: the training criterion can be changed to:

$\sum_{i=1}^{n} \left( \frac{t_i - y_i}{t_i} \right)^2$, $t_i \neq 0$

This is equivalent to:

$\sum_{i=1}^{n} \left( 1 - \frac{y_i}{t_i} \right)^2$, $t_i \neq 0$

or, in matrix form, $\left\| \mathbf{1} - T^{-1}\mathbf{y} \right\|^2$, where T is a diagonal matrix with the values of the target vector on its diagonal, and 1 is a vector of ones. As y is a linear combination of the outputs of the basis functions, A, the optimal weights can be determined as:

$\hat{\mathbf{w}} = \left( T^{-1} A \right)^{+} \mathbf{1}$

(a minimal sketch of this computation is given after Table 4.30 below).

Using this strategy, we compare the results obtained by the ASMOD algorithm, in terms of both the MSE and the MSRE, with and without regularization, against those obtained with the standard MSE criterion. In the following tables, the first 4 rows show the results obtained with the standard MSE criterion, and the last 4 rows those obtained with the MSRE criterion. The Matlab functions in Asmod.zip were employed to solve this problem.

Concerning the pH problem, the following table summarizes the results obtained:

Table 4.29 - ASMOD Results - MSE versus MSRE (pH problem)
Reg. factor | MSE | MSRE | Criterion | Complexity | Weight Norm | N. Cand. | N. Iterations
λ = 0      | 1.4×10^-31 | 1.2×10^-30 | -6705 | 101 | 5.82 | 9945  | 101
λ = 10^-2  | 3.2×10^-6  | 2.7×10^-5  | -1190 | 17  | 2.25 | 341   | 19
λ = 10^-4  | 3.9×10^-9  | 4.2×10^-7  | -1673 | 61  | 4.36 | 3569  | 61
λ = 10^-6  | 3.6×10^-13 | 1.5×10^-12 | -2440 | 98  | 5.71 | 11056 | 107
λ = 0      | 1.9×10^-10 | 1.1×10^-30 | -6427 | 101 | 5.81 | 9699  | 99
λ = 10^-2  | 9.5×10^-7  | 2.1×10^-6  | -1173 | 29  | 2.54 | 989   | 34
λ = 10^-4  | 1.3×10^-9  | 1.7×10^-9  | -1652 | 79  | 4.75 | 6723  | 84
λ = 10^-6  | 2.2×10^-10 | 2.2×10^-13 | -2462 | 98  | 5.71 | 11222 | 108

Concerning the Coordinate Transformation problem, the following table summarizes the results obtained:

Table 4.30 - ASMOD Results - MSE versus MSRE (CT problem)
Reg. factor | MSE | MSRE | Criterion | Complexity | Weight Norm | N. Cand. | N. Iterations
λ = 0      | 1.5×10^-5 | 6.2×10^5  | -916   | 65  | 9.6   | 1043 | 30
λ = 10^-2  | 3.7×10^-5 | 5.1×10^9  | -831.7 | 64  | 4.2   | 921  | 26
λ = 10^-4  | 1.7×10^-5 | 9.9×10^7  | -918.5 | 61  | 5.1   | 1264 | 33
λ = 10^-6  | 1.5×10^-5 | 6.7×10^5  | -915.7 | 65  | 9.1   | 1067 | 30
λ = 0      | 2.5×10^-5 | 3.2×10^-6 | -869   | 111 | 43.14 | 1273 | 38
λ = 10^-2  | 4.2×10^-5 | 9.9×10^-6 | -924   | 65  | 3.7   | 495  | 19
λ = 10^-4  | 2.4×10^-6 | 3.6×10^-7 | -1114  | 110 | 4     | 594  | 22
λ = 10^-6  | 1.4×10^-6 | 1.8×10^-7 | -1182  | 110 | 4.4   | 829  | 27
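A minimal sketch of the MSRE-optimal weight computation referred to above, w = (T^-1 A)^+ 1, is given below, using dummy data in place of the actual basis-function outputs and targets.

  % Illustrative only: linear weights minimizing ||1 - T^-1*A*w||^2, T = diag(t).
  rng(1);
  A = [ones(50,1) rand(50,3)];              % dummy basis-function output matrix
  t = A*[3; 4; -2; 0.5];                    % dummy (strictly positive) target vector
  T = diag(t);
  w_msre = pinv(T\A) * ones(length(t),1);   % pseudo-inverse solution, w = (T^-1 A)^+ 1
  msre   = mean(((t - A*w_msre)./t).^2);    % resulting mean square relative error (~0 here)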
We can observe that, as expected, the use of the MSRE criterion achieves better results in terms of the final MSRE, and often better results also in terms of the MSE. The difference in terms of the MSRE is more significant for the Coordinate Transformation problem, since its target data take significantly smaller values than those of the pH problem.

d) We shall compare here the results of the early-stopping method, using the two criteria, with no regularization or with different values of the regularization parameter. First we shall use the MSE criterion. In the following tables, the first four rows were obtained using the early-stopping method, where 30% of the data were used for validation. The last four rows illustrate the results obtained, for the same estimation and validation data, with the model trained on all the data. The Matlab function gen_set.m and the files in Asmod.zip were used for this problem. The termination criterion for the early-stopping method was formulated as: training stopped if the MSE for the validation set increased constantly over the last 4 iterations, or if the standard ASMOD termination criterion was met. This can be inspected by comparing the columns It Min and N It: if they are equal, the standard termination criterion was met first.

Concerning the pH problem, the results obtained are in the table below:

Table 4.31 - ASMOD Results - Early Stopping versus complete training; MSE (pH problem)
Reg. factor | MSEe | MSREe | MSEv | MREv | It Min | Crit. | Comp | W. N. | N. C. | N It
λ = 0      | 8.6×10^-9  | 5.9×10^-7  | 4.3×10^-6  | 0.034      | 41  | -1139 | 42  | 3.7  | 1853  | 45
λ = 10^-2  | 4.3×10^-6  | 2.8×10^-4  | 7.5×10^-6  | 0.034      | 19  | -808  | 16  | 2.2  | 285   | 19
λ = 10^-4  | 7.6×10^-9  | 5.6×10^-7  | 4.3×10^-6  | 0.034      | 47  | -1139 | 44  | 3.7  | 2098  | 47
λ = 10^-6  | 9.9×10^-9  | 6×10^-7    | 4.4×10^-6  | 0.034      | 41  | -1133 | 41  | 3.7  | 1853  | 45
λ = 0      | 1.4×10^-31 | 8.5×10^-31 | 1.5×10^-31 | 2.2×10^-30 | --- | -6705 | 101 | 5.82 | 9945  | 101
λ = 10^-2  | 3.4×10^-6  | 2.8×10^-5  | 2.7×10^-6  | 2.8×10^-5  | --- | -1190 | 17  | 2.25 | 341   | 19
λ = 10^-4  | 4.3×10^-9  | 9.5×10^-8  | 3×10^-9    | 5.6×10^-7  | --- | -1673 | 61  | 4.36 | 3569  | 61
λ = 10^-6  | 3.4×10^-13 | 1.5×10^-12 | 3.3×10^-9  | 1.0×10^-6  | --- | -2440 | 98  | 5.71 | 11056 | 107
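The 30% estimation/validation split mentioned above can be sketched as follows. This is only a minimal sketch in the spirit of what gen_set.m is described to do (a random split of the training data); the actual interface of gen_set.m is not reproduced here.

  % Illustrative only: random 70/30 split into estimation and validation sets.
  rng(2);
  X = rand(100, 2);  t = sum(X, 2);     % dummy training data (inputs and targets)
  n   = size(X, 1);
  idx = randperm(n);
  nv  = round(0.3*n);                   % 30% of the patterns for validation
  Xval = X(idx(1:nv), :);      t_val = t(idx(1:nv));
  Xest = X(idx(nv+1:end), :);  t_est = t(idx(nv+1:end));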
Concerning the Coordinate Transformation problem, the initial model consisted of 2 univariate sub-models and 1 bivariate sub-model, all with 0 interior knots. The following table summarizes the results obtained:

Table 4.32 - ASMOD Results - Early Stopping versus complete training; MSE (CT problem)
Reg. factor | MSEe | MSREe | MSEv | MREv | It Min | Crit. | Comp | W. N. | N. C. | N It
λ = 0      | 2.7×10^-4 | 7.7×10^9 | 15×10^-3  | 9.7×10^12 | 10  | -476   | 36 | 5.5  | 499  | 101
λ = 10^-2  | 4.5×10^-5 | 7.0×10^9 | 12×10^-3  | 10^13     | 31  | -553   | 50 | 3.9  | 1103 | 31
λ = 10^-4  | 9×10^-6   | 4.3×10^7 | 13×10^-3  | 10^13     | 31  | -680   | 49 | 5.3  | 1135 | 32
λ = 10^-6  | 7.5×10^-7 | 412      | 11×10^-3  | 3.7×10^12 | 40  | -751   | 77 | 5.2  | 1670 | 40
λ = 0      | 1.4×10^-5 | 4.9×10^5 | 1.6×10^-5 | 2.4×10^5  | --- | -916   | 65 | 9.6  | 1043 | 30
λ = 10^-2  | 3.7×10^-5 | 5.5×10^9 | 2.6×10^-6 | 4.2×10^9  | --- | -831.7 | 64 | 4.18 | 921  | 26
λ = 10^-4  | 1.6×10^-5 | 1.4×10^8 | 2.2×10^-5 | 1.1×10^7  | --- | -918.5 | 61 | 5.1  | 1264 | 33
λ = 10^-6  | 1.4×10^-5 | 7.3×10^5 | 1.6×10^-5 | 5.9×10^3  | --- | -915.7 | 65 | 9.1  | 1067 | 30

Then we shall employ the MSRE criterion. Concerning the pH problem, the results obtained are in the table below:

Table 4.33 - ASMOD Results - Early Stopping versus complete training; MSRE (pH problem)
Reg. factor | MSEe | MSREe | MSEv | MREv | It Min | Crit. | Comp | W. N. | N. C. | N It
λ = 0      | 1.3×10^-8  | 2.3×10^-8  | 4.3×10^-6  | 0.034      | 49  | -1038 | 49  | 3.6  | 2605  | 53
λ = 10^-2  | 1.2×10^-6  | 2.5×10^-6  | 4.8×10^-6  | 0.034      | 31  | -805  | 26  | 2.4  | 821   | 31
λ = 10^-4  | 9.7×10^-9  | 1.9×10^-8  | 4.4×10^-6  | 0.034      | 49  | -1047 | 50  | 3.6  | 2605  | 53
λ = 10^-6  | 9.2×10^-9  | 1.8×10^-8  | 4.3×10^-6  | 0.034      | 49  | -1050 | 50  | 3.6  | 2605  | 53
λ = 0      | 5.8×10^-31 | 1.4×10^-30 | 6.5×10^-10 | 6.3×10^-31 | --- | -6427 | 101 | 5.81 | 9699  | 99
λ = 10^-2  | 1×10^-6    | 2×10^-6    | 7.9×10^-7  | 2.2×10^-6  | --- | -1173 | 29  | 2.54 | 989   | 34
λ = 10^-4  | 1.2×10^-9  | 1.7×10^-9  | 1.8×10^-9  | 1.7×10^-9  | --- | -1652 | 79  | 4.75 | 6723  | 84
λ = 10^-6  | 1.5×10^-13 | 2×10^-13   | 7.5×10^-10 | 2.6×10^-13 | --- | -2462 | 98  | 5.71 | 11222 | 108
Concerning the Coordinate Transformation problem, the initial model consisted of 2 univariate sub-models and 1 bivariate sub-model, all with 2 interior knots. The following table summarizes the results obtained:

Table 4.34 - ASMOD Results - Early Stopping versus complete training; MSRE (CT problem)
Reg. factor | MSEe | MSREe | MSEv | MREv | It Min | Crit. | Comp | W. N. | N. C. | N It
λ = 0      | 4.9×10^-6 | 8.2×10^-7 | 0.014     | 1.2×10^13 | 15  | -787  | 67  | 4.9   | 624  | 19
λ = 10^-2  | 5.7×10^-5 | 1.3×10^-5 | 0.014     | 1×10^13   | 18  | -631  | 54  | 3.7   | 519  | 18
λ = 10^-4  | 1.9×10^-6 | 4.2×10^-7 | 0.014     | 1.1×10^13 | 24  | -805  | 74  | 5.4   | 1037 | 24
λ = 10^-6  | 5×10^-6   | 8.3×10^-7 | 0.017     | 1.2×10^13 | 15  | -787  | 67  | 4.9   | 626  | 19
λ = 0      | 2.8×10^-5 | 3.6×10^-6 | 1.9×10^-5 | 2.3×10^-6 | --- | -869  | 111 | 43.14 | 1273 | 38
λ = 10^-2  | 4.7×10^-5 | 1.1×10^-5 | 3.2×10^-5 | 7.4×10^-6 | --- | -924  | 65  | 3.7   | 495  | 19
λ = 10^-4  | 2.5×10^-6 | 3.6×10^-7 | 2.2×10^-6 | 3.6×10^-7 | --- | -1114 | 110 | 4     | 594  | 22
λ = 10^-6  | 1.5×10^-6 | 1.9×10^-7 | 1.2×10^-6 | 1.7×10^-7 | --- | -1182 | 110 | 4.4   | 829  | 27

For all cases, the MSE or the MSRE for the validation set is much lower if the training is performed using all the data. This is more evident for the CT problem when the MSRE criterion is employed.
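For reference, the error measures reported in these tables can be computed as in the minimal sketch below, where y and t stand for the model output and target vectors of the estimation or validation set (targets assumed non-zero for the relative measures). The reading of MRE as a mean absolute relative error is an assumption, not a definition taken from the text.

  % Illustrative only: error measures over a data set.
  y = [0.99; 2.02; 3.05];          % dummy model outputs
  t = [1; 2; 3];                   % dummy targets
  mse  = mean((t - y).^2);         % MSE
  msre = mean(((t - y)./t).^2);    % MSRE, as defined in c)
  mre  = mean(abs((t - y)./t));    % MRE (assumed definition)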