1. Neural Networks
Prof. Ruchi Sharma
Bharatividyapeeth College of Engineering
New Delhi
2. Plan
Requirements
Background
Brains and Computers
Computational Models
Learnability vs Programming
Representability vs Training Rules
Abstractions of Neurons
Abstractions of Networks
Completeness of 3-Level McCulloch-Pitts Neurons
Learnability in Perceptrons
3. Brains and Computers
What is computation?
Basic Computational Model
(like C, Pascal, C++, etc.)
(See the course on Models of Computation.)
Analysis of the problem and reduction to an algorithm.
Implementation of the algorithm by a programmer.
4. Computation
The brain also computes, but its programming seems different:
Flexible and fault tolerant (neurons die every day)
Automatic programming
Learns from examples
Generalizability
5. Brain vs. Computer
The brain works with slow components (10**-3 sec).
Computers use fast components (10**-9 sec).
The brain is more efficient (a few joules per operation), by a factor of about 10**10.
It uses a massively parallel mechanism.
Can we copy its secrets?
6. Brain vs Computer
Areas where the brain is better:
Sense recognition and integration
Working with incomplete information
Generalizing
Learning from examples
Fault tolerance (conventional programming is notoriously fragile)
7. AI vs. NNs
AI relates to cognitive psychology:
Chess, theorem proving, expert systems, intelligent agents (e.g. on the Internet)
NNs relate to neurophysiology:
Recognition tasks, associative memory, learning
8. How can NNs work?
Look at the brain:
About 10**10 neurons (10 gigabytes).
Up to 10**20 possible connections, with different numbers of dendrites (real-valued strengths).
Actually about 6 x 10**13 connections (i.e. roughly 60,000 hard discs for one snapshot of the contents of one brain!)
12. One Neuron
McCulloch-Pitts
This is very complicated. But abstracting away the details, we have:
[Figure: inputs x1, ..., xn arrive on connections with weights w1, ..., wn; the unit sums the weighted inputs (Σ, integrate) and applies a threshold]
The integrate-and-fire neuron.
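A minimal sketch of this abstraction in Python (the function name mcp_neuron and the particular weights and threshold below are illustrative assumptions, not taken from the slides): the unit integrates the weighted inputs and fires, i.e. outputs 1, exactly when the sum exceeds the threshold.

def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts style unit: integrate the weighted inputs, then threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Example: with weights (1, 1) and threshold 1.5 the unit behaves like AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), mcp_neuron((x1, x2), (1, 1), 1.5))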
13. Representability
What functions can be
represented by a network of
McCulloch-Pitts neurons?
Theorem: Every logic
function of an arbitrary
number of variables can be
represented by a three-level
network of neurons.
14. Proof
Show the simple functions: AND, OR, NOT, IMPLIES.
Recall that every logic function is representable in disjunctive normal form (DNF).
19. DNF and All Functions
Theorem
Any logic (Boolean) function of any number of variables can be represented by a network of McCulloch-Pitts neurons.
In fact, the depth of the network need only be three.
Proof: Write the function in DNF and use the AND, OR, NOT representation (negations in level one, one AND per DNF term in level two, a single OR in level three), as sketched below.
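A minimal Python sketch of the three-level DNF construction, using XOR (DNF: (x1 AND NOT x2) OR (NOT x1 AND x2)) as the example. The helper names and the particular weights and thresholds are illustrative choices, not from the slides; any values with the same threshold behaviour would do.

def unit(inputs, weights, threshold):
    """Threshold unit: fires iff the weighted sum exceeds the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def xor_three_level(x1, x2):
    # Level 1: the negated literals.
    n1 = unit((x1,), (-1,), -0.5)        # NOT x1
    n2 = unit((x2,), (-1,), -0.5)        # NOT x2
    # Level 2: one AND unit per DNF term.
    t1 = unit((x1, n2), (1, 1), 1.5)     # x1 AND NOT x2
    t2 = unit((n1, x2), (1, 1), 1.5)     # NOT x1 AND x2
    # Level 3: OR of the terms.
    return unit((t1, t2), (1, 1), 0.5)

print([xor_three_level(a, b) for a in (0, 1) for b in (0, 1)])   # [0, 1, 1, 0]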
20. Other Questions?
What if we allow REAL
numbers as inputs/outputs?
What real functions can be
represented?
What if we modify the threshold to some other function, so that the output is not restricted to {0,1}? What functions can then be represented?
22. Learnability and
Generalizability
The previous theorem tells
us that neural networks are
potentially powerful, but
doesn’t tell us how to use
them.
We desire simple networks
with uniform training rules.
23. One Neuron
(Perceptron)
What can be represented by
one neuron?
Is there an automatic way to
learn a function by
examples?
24. Perceptron Training
Rule
Loop:
Take an example and apply it to the network.
If the answer is correct, return to Loop.
If incorrect, go to Fix.
Fix:
Adjust the network weights by the input example.
Go to Loop.
28. Perceptron
Convergence
Theorem
If the concept is
representable in a
perceptron then the
perceptron learning rule will
converge in a finite amount
of time.
(We will make this precise and prove it below.)
29. What is a Neural
Network?
What is an abstract Neuron?
What is a Neural Network?
How are they computed?
What are the advantages?
Where can they be used?
Agenda: what to expect.
30. Perceptron Algorithm
Start: Choose an arbitrary value for the weight vector W.
Test: Choose an arbitrary example X.
If X is positive and W·X > 0, or X is negative and W·X <= 0, go to Test.
Otherwise go to Fix.
Fix:
If X is positive, W := W + X;
If X is negative, W := W − X;
Go to Test.
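A minimal Python sketch of this Start/Test/Fix loop. It assumes each input already carries the constant bias coordinate (as in the AND example and the bias/threshold discussion later); the function name perceptron_train and the max_passes cutoff are illustrative, since the slides' loop simply keeps returning to Test.

import random

def perceptron_train(examples, max_passes=1000):
    """examples: list of (x, label) pairs, where x is a tuple that already
    includes a constant bias coordinate and label is +1 for positive,
    -1 for negative examples."""
    w = [0.0] * len(examples[0][0])             # Start: arbitrary initial weights
    for _ in range(max_passes):
        x, label = random.choice(examples)      # Test: choose an arbitrary example
        score = sum(wi * xi for wi, xi in zip(w, x))
        if (label > 0 and score > 0) or (label < 0 and score <= 0):
            continue                            # correctly classified: back to Test
        # Fix: add the example if it is positive, subtract it if negative.
        w = [wi + label * xi for wi, xi in zip(w, x)]
    return w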
31. Perceptron Conv.
Thm.
Let F be a set of unit-length vectors. If there is a vector V* and a value δ > 0 such that V*·X > δ for all X in F, then the perceptron program goes to Fix only a finite number of times.
32. Applying Algorithm to
“And”
W0 = (0,0,1) or random
X1 = (0,0,1), desired result 0
X2 = (0,1,1), desired result 0
X3 = (1,0,1), desired result 0
X4 = (1,1,1), desired result 1
(The last coordinate of each X is the constant bias input.)
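Using the perceptron_train sketch above on exactly these examples (with the usual convention of label -1 for desired result 0 and +1 for desired result 1):

and_examples = [
    ((0, 0, 1), -1),   # desired result 0
    ((0, 1, 1), -1),
    ((1, 0, 1), -1),
    ((1, 1, 1), +1),   # desired result 1
]
w = perceptron_train(and_examples)
print(w)   # a weight vector whose weighted sum is positive only on input (1, 1, 1)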
35. Proof of Conv
Theorem
Note:
By hypothesis, there is a δ > 0 such that V*·X > δ for all X ∈ F.
1. We can eliminate the threshold by adding an additional dimension to the input:
W·(x,y,z) > threshold if and only if W'·(x,y,z,1) > 0, where W' = (W, −threshold).
2. We can assume all examples are positive ones (replace each negative example by its negated vector):
W·(x,y,z) < 0 if and only if W·(−x,−y,−z) > 0.
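A small Python sketch of these two reductions (the function names are illustrative): append a constant 1 to absorb the threshold into the weights, and negate the negative examples, so that afterwards every example should satisfy W·X > 0.

def augment(x):
    """Append the constant input 1, so the threshold becomes an ordinary weight."""
    return tuple(x) + (1,)

def make_all_positive(examples):
    """examples: list of (x, label) with label +1 or -1.
    Negating the negative examples turns every constraint into W.X > 0."""
    out = []
    for x, label in examples:
        x = augment(x)
        out.append(x if label > 0 else tuple(-xi for xi in x))
    return out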
36. Proof (cont).
Consider the quotient V*·W/|W|.
(Note: this is the multidimensional cosine between V* and W.)
Recall V* is a unit vector, so the quotient is <= 1.
37. Proof(cont)
Now each time Fix is visited, W changes via ADD:
V*·W(n+1) = V*·(W(n) + X)
= V*·W(n) + V*·X
>= V*·W(n) + δ
Hence
V*·W(n) >= n·δ    (*)
38. Proof (cont)
Now consider the denominator:
|W(n+1)|**2 = W(n+1)·W(n+1) = (W(n) + X)·(W(n) + X)
= |W(n)|**2 + 2·W(n)·X + 1
(recall |X| = 1, and W(n)·X < 0 since X is a positive example and we are in Fix)
< |W(n)|**2 + 1
So after n times
|W(n+1)|**2 < n    (**)
39. Proof (cont)
Putting (*) and (**) together:
Quotient = V*·W(n)/|W(n)| >= n·δ / sqrt(n) = sqrt(n)·δ
Since the Quotient is <= 1, this means sqrt(n)·δ <= 1, i.e. n <= (1/δ)**2.
This means we enter Fix only a bounded number of times.
Q.E.D.
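A quick empirical check of the bound (the data construction and names here are illustrative assumptions, not from the slides): on unit-length examples that a unit vector V* separates with margin δ, the number of visits to Fix should never exceed 1/δ**2.

import math, random

def count_fixes(examples, passes=20000):
    """Run the Test/Fix loop on (x, label) pairs and count the visits to Fix."""
    w = [0.0] * len(examples[0][0])
    fixes = 0
    for _ in range(passes):
        x, label = random.choice(examples)
        score = sum(wi * xi for wi, xi in zip(w, x))
        if (label > 0 and score > 0) or (label < 0 and score <= 0):
            continue
        w = [wi + label * xi for wi, xi in zip(w, x)]   # Fix
        fixes += 1
    return fixes

# Unit vectors in the plane, labelled positive/negative by the sign of the
# first coordinate, kept well away from the separating line so δ is not tiny.
random.seed(0)
pts = []
for _ in range(100):
    a = random.uniform(0.3, 1.2)
    pts.append(((math.cos(a), math.sin(a)), +1))
    b = random.uniform(math.pi - 1.2, math.pi - 0.3)
    pts.append(((math.cos(b), math.sin(b)), -1))
delta = min(abs(x[0]) for x, _ in pts)    # margin with respect to V* = (1, 0)
print(count_fixes(pts), "fixes; bound 1/δ**2 =", round(1 / delta ** 2, 1))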
44. Additional Facts
Note: If the X's are presented in a systematic way, then a solution W is always found.
Note: The solution found is not necessarily the same as V*.
Note: If F is not finite, we may not obtain a solution in finite time.
The algorithm can be modified in minor ways and remains valid (e.g. bounded rather than unit examples); the bounds on W(n) change accordingly.
45. Perceptron
Convergence
Theorem
If the concept is
representable in a
perceptron then the
perceptron learning rule will
converge in a finite amount
of time.
(Made precise and proved above.)
46. Important Points
The theorem only guarantees a result IF the concept is representable!
Usually we are not interested in just representing something but in its generalizability: how will it work on examples we haven't seen?
48. Generalizability
Typically we train a network on a sample set of examples,
then use it on the general class.
Training can be slow, but execution is fast.
49. Perceptron
[Figure: a threshold unit with weighted inputs]
Σ wᵢxᵢ > θ  ⇔  the letter A is in the receptive field
• Pattern identification
• (Note: the neuron is trained)
50. What won't work?
Example: Connectedness cannot be computed by a bounded-diameter perceptron.
Compare with convexity, which can be handled using sensors of order three.
51. Biases and
Thresholds
We can replace the threshold with a bias. A bias acts exactly as a weight on a connection from a unit whose activation is always 1:
Σ (i=1..n) wᵢxᵢ > θ
⇔ Σ (i=1..n) wᵢxᵢ − θ > 0
⇔ Σ (i=0..n) wᵢxᵢ > 0, with w₀ = −θ and x₀ = 1.
52. Perceptron
Loop:
Take an example and apply it to the network.
If the answer is correct, return to Loop.
If incorrect, go to Fix.
Fix:
Adjust the network weights by the input example.
Go to Loop.
53. Perceptron Algorithm
Let v be arbitrary.
Choose: choose an example x ∈ F.
Test:
If x ∈ F⁺ and v·x > 0, go to Choose.
If x ∈ F⁺ and v·x ≤ 0, go to Fix plus.
If x ∈ F⁻ and v·x ≤ 0, go to Choose.
If x ∈ F⁻ and v·x > 0, go to Fix minus.
Fix plus: v := v + x; go to Choose.
Fix minus: v := v − x; go to Choose.
54. Perceptron Algorithm
Conditions for the algorithm:
Condition no. 1: there exists v* such that
if x ∈ F⁺ then v*·x > 0,
if x ∈ F⁻ then v*·x ≤ 0.
Condition no. 2: We choose F to be a set of unit vectors.
56. Perceptron Algorithm
Based on these conditions, the number of times we enter Fix is finite.
Proof: The world of examples is F = F⁺ ∪ F⁻, where
F⁺ = { x : A*·x > 0 } are the positive examples and
F⁻ = { y : A*·y ≤ 0 } are the negative examples.
57. Perceptron Algorithm-
Proof
We replace the threshold with a bias.
We assume F is a set of unit vectors:
∀φ ∈ F, |φ| = 1
58. Perceptron Algorithm-
Proof
We reduce what we have to prove by eliminating all the negative examples and placing their negations in the positive examples:
A*·yᵢ ≤ 0  ⇔  A*·(−yᵢ) ≥ 0, so replace each yᵢ by −yᵢ and take F = F⁺.
Then ∀φ ∈ F: A*·φ > 0
⇓
∃δ > 0 such that ∀φ ∈ F: A*·φ > δ
59. Perceptron Algorithm-
Proof
G(A) = A*·A / |A| ≤ 1    (A* is a unit vector)
The numerator:
A*·A(l+1) = A*·(A(l) + φ) = A*·A(l) + A*·φ ≥ A*·A(l) + δ    (since A*·φ > δ)
After n changes:
A*·A(n) ≥ n·δ
60. Perceptron Algorithm-
Proof
The denominator:
|A(t+1)|² = A(t+1)·A(t+1) = (A(t) + φ)·(A(t) + φ)
= |A(t)|² + 2·A(t)·φ + |φ|² < |A(t)|² + 1    (since A(t)·φ < 0 and |φ|² = 1)
After n changes:
|A(n)|² < n
61. Perceptron Algorithm-
Proof
From the denominator: |A(n)|² < n
From the numerator: A*·A(n) ≥ n·δ
So
G(A) = A*·A(n) / |A(n)| > n·δ / √n = √n·δ, and G(A) ≤ 1
⇓
n < 1/δ²
so the final number of changes n is bounded.
64. Problem
For three such patterns, split the perceptron's sum into the contributions of the sensors that see only the left, middle, or right part (LHS, MHS, RHS):
LHS1 + MHS1 + RHS1 > 0
LHS2 + MHS2 + RHS2 < 0
LHS3 + MHS3 + RHS3 > 0
The middle parts are identical: MHS1 = MHS2 = MHS3, so equivalently
LHS1 + RHS1 > 0,  LHS2 + RHS2 < 0,  LHS3 + RHS3 > 0
Patterns 1 and 2 agree on the left (LHS1 = LHS2), and patterns 2 and 3 agree on the right (RHS2 = RHS3).
LHS2 should be small enough so that LHS2 + RHS2 < 0, and RHS2 should be small enough so that LHS2 + RHS2 < 0, yet LHS2 = LHS1 and RHS2 = RHS3 also have to be compatible with the two positive sums above.
Contradiction!
65. Linear Separation
Every perceptron determines a classification of its vector inputs, given by a hyperplane.
Two-dimensional examples (algebra below):
OR and AND are separable; XOR is not possible.
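Filling in the algebra for the XOR case (a standard argument; w1, w2 and θ denote the weights and threshold of a single perceptron on inputs x1, x2):
x1 = 0, x2 = 0 must give 0:  0 ≤ θ, so θ ≥ 0
x1 = 1, x2 = 0 must give 1:  w1 > θ
x1 = 0, x2 = 1 must give 1:  w2 > θ
x1 = 1, x2 = 1 must give 0:  w1 + w2 ≤ θ
Adding the two middle lines gives w1 + w2 > 2θ ≥ θ, contradicting the last line.
So no single perceptron (no line in the plane) computes XOR, while OR and AND cause no such conflict.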
66. Linear Separation in
Higher Dimensions
In higher dimensions it is still linear separation, but it is harder to tell.
Example: connectedness and convexity: which can be handled by a perceptron with local sensors, and which cannot?
Note: define local sensors.
68. Limitations of
Perceptron
Representability
Only concepts that are linearly separable can be represented.
Compare: Convex versus
connected
Examples: XOR vs OR