1. The University of Texas
Health Science Center at Houston
School of Health Information Sciences
Artificial Neural Networks and
Pattern Recognition
For students of HI 5323
“Image Processing”
Willy Wriggers, Ph.D.
School of Health Information Sciences
http://biomachina.org/courses/processing/13.html
4. Neurophysiological Background
• ~10 billion neurons in the human cortex
• ~60 trillion synapses
• In the first two years after birth, ~1 million synapses are formed per second
(Figure: pyramidal cell)
19. Main Types of ANN
Supervised Learning:
• Feed-forward ANN
- Multi-Layer Perceptron (with sigmoid hidden neurons; see the sketch after this list)
• Recurrent Networks
- Neurons are connected to themselves and to others
- Time delay of signal transfer
- Multidirectional information flow
Unsupervised Learning:
• Self-organizing ANN
- Kohonen Maps
- Vector Quantization
- Neural Gas
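For illustration only, here is a minimal sketch of the first architecture named above, a two-layer perceptron with sigmoid hidden neurons. The layer sizes and random weights are placeholder choices for this example, not part of the lecture material.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a two-layer perceptron with sigmoid hidden neurons."""
    h = sigmoid(W1 @ x + b1)   # hidden layer activations
    return W2 @ h + b2         # linear output layer

# toy dimensions: 3 inputs -> 5 hidden -> 2 outputs, random weights
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
W2, b2 = rng.normal(size=(2, 5)), np.zeros(2)
y = mlp_forward(rng.normal(size=3), W1, b1, W2, b2)
```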
64. Vector Quantization
Lloyd (1957); Linde, Buzo & Gray (1980): digital signal processing, speech and image compression.
Martinetz & Schulten (1993): neural gas.
Encode data (in ℜ^D) using a finite set {w_j} (j = 1, …, k) of codebook vectors.
The Delaunay triangulation divides ℜ^D into k Voronoi polyhedra (“receptive fields”):
V_i = { v ∈ ℜ^D : ‖v − w_i‖ ≤ ‖v − w_j‖ ∀ j }
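As a concrete illustration of the encoding step, the sketch below assigns each data vector to its nearest codebook vector, i.e. to the Voronoi cell it falls into. The toy data, codebook size, and function name are assumptions made for this example.

```python
import numpy as np

def quantize(data, codebook):
    """Map each data vector v to the index of its nearest codebook vector w_j."""
    # squared Euclidean distances between all data points and codebook vectors
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)   # index of the Voronoi cell containing each v

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))    # toy data in R^2
codebook = rng.normal(size=(8, 2))   # k = 8 codebook vectors
labels = quantize(data, codebook)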
66. k-Means a.k.a. Linde, Buzo & Gray (LBG)
Encoding distortion error (sum over data points v_i; j(i) is the index of the codebook vector closest to v_i):
E = Σ_i ‖v_i − w_{j(i)}‖²
Lower E({w_j(t)}) iteratively by gradient descent:
∀ r:  Δw_r(t) ≡ w_r(t) − w_r(t−1) = −(ε/2) · ∂E/∂w_r = ε · Σ_i δ_{r,j(i)} · (v_i − w_r),
where δ_{r,j(i)} is the Kronecker delta.
On-line (Monte Carlo) approach for a sequence v_i(t) selected at random according to the probability density function of the data:
Δw_r(t) ~ ε · δ_{r,j(i)} · (v_i(t) − w_r).
Advantage: fast, reasonable clustering.
Limitations: depends on the initial random positions; difficult to avoid getting trapped in the many local minima of E.
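A minimal sketch of the on-line update rule above, assuming a fixed learning rate ε and step count (both arbitrary choices here): at each step one data vector is drawn at random and only the winning codebook vector is moved toward it.

```python
import numpy as np

def online_kmeans(data, k, eps=0.05, steps=10000, seed=0):
    """On-line k-means / LBG: Δw_r = ε · δ_{r,j(i)} · (v_i − w_r)."""
    rng = np.random.default_rng(seed)
    # initialize the codebook with k randomly chosen data points
    w = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(steps):
        v = data[rng.integers(len(data))]          # draw v_i at random
        j = ((v - w) ** 2).sum(axis=1).argmin()    # winner j(i)
        w[j] += eps * (v - w[j])                   # move only the winner
    return w
```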
67. Neural Gas Revisited
Avoid the local minima traps of k-means by smoothing the energy function:
∀ r:  Δw_r(t) ~ ε · exp(−s_r(v_i(t), {w_j}) / λ) · (v_i(t) − w_r),
where s_r is the closeness rank:
‖v_i − w_{j_0}‖ ≤ ‖v_i − w_{j_1}‖ ≤ … ≤ ‖v_i − w_{j_(k−1)}‖,
with s_{j_0} = 0, s_{j_1} = 1, …, s_{j_(k−1)} = k − 1.
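A sketch of one update step under the rule above: every codebook vector moves toward the presented data vector, weighted by exp(−s_r/λ) of its closeness rank. The fixed ε and λ passed in here are placeholder values; the next slide motivates decreasing λ over time.

```python
import numpy as np

def neural_gas_step(w, v, eps, lam):
    """One neural gas update: Δw_r = ε · exp(−s_r/λ) · (v − w_r).

    w : (k, D) float array of codebook vectors (updated in place)
    v : (D,) data vector presented at this step
    """
    dist = ((v - w) ** 2).sum(axis=1)
    ranks = dist.argsort().argsort()   # s_r: 0 for the closest, k-1 for the farthest
    w += eps * np.exp(-ranks / lam)[:, None] * (v - w)
    return w
```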
68. Neural Gas Revisited
λ ≠ 0: not only the “winner” w_{j(i)}, but also the second, third, … closest codebook vectors are updated.
One can show that this corresponds to stochastic gradient descent on
Ẽ_λ({w_j}) = Σ_{r=1…k} Σ_i exp(−s_r/λ) · ‖v_i − w_r‖²  (sum over data points i).
λ → 0: Ẽ_λ → E (k-means).
λ → ∞: Ẽ_λ is parabolic (single minimum).
⇒ anneal λ(t) from large to small values during training.
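The schedule λ(t) is left open on the slide; one common choice (an assumption here, in the spirit of Martinetz & Schulten-style annealing) is an exponential decay from a large initial value λ_i to a small final value λ_f over the training run:

```python
def lam_schedule(t, t_max, lam_i=10.0, lam_f=0.01):
    """Exponential decay of the neighborhood range λ over t_max training steps."""
    return lam_i * (lam_f / lam_i) ** (t / t_max)
```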
69. Neural Gas Revisited
Q: How do we know that we have found the global minimum of E?
A: We don’t (in general).
But we can compute the statistical variability of the {w_j} by repeating the calculation with different seeds for the random number generator.
Codebook vector variability arises due to:
• statistical uncertainty,
• spread of local minima.
A small variability indicates good convergence behavior.
Optimum choice of the number of vectors k: where the variability is minimal.
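A sketch of this variability check: run the quantization several times with different random seeds (e.g. with the on-line k-means or neural gas sketches above) and measure how far the resulting codebooks spread. Nearest-neighbor pairing against a reference run is one simple, assumed way to match codebook vectors across runs.

```python
import numpy as np

def codebook_variability(codebooks):
    """Mean spread of codebook vectors across runs started from different seeds.

    codebooks : list of (k, D) arrays, one per run.
    """
    ref = codebooks[0]
    spreads = []
    for w in codebooks[1:]:
        # pair each reference vector with its nearest counterpart in this run
        d = np.linalg.norm(ref[:, None, :] - w[None, :, :], axis=2)
        spreads.append(d.min(axis=1).mean())
    return float(np.mean(spreads))
```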
90. Neural Network References
• Neural Networks: A Comprehensive Foundation, S. Haykin, Prentice Hall (1999)
• Neural Networks for Pattern Recognition, C. M. Bishop, Clarendon Press, Oxford (1997)
• Self-Organizing Maps, T. Kohonen, Springer (2001)
91. Some ANN Toolboxes
• Free software
- SNNS: Stuttgart Neural Network Simulator (and its Java version, JavaNNS)
- GNG (Growing Neural Gas) at Uni Bochum
• Matlab toolboxes
- Fuzzy Logic
- Artificial Neural Networks
- Signal Processing
92. Pattern Recognition / Vector Quantization References
Textbooks
Duda, Hart: Pattern Classification and Scene Analysis. J. Wiley & Sons, New York, 1982 (2nd edition 2000).
Fukunaga: Introduction to Statistical Pattern Recognition. Academic Press, 1990.
Bishop: Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1997.
Schlesinger, Hlaváč: Ten Lectures on Statistical and Structural Pattern Recognition. Kluwer Academic Publishers, 2002.