Tutorial on Neural
Networks
Prévotet Jean-Christophe
University of Paris VI
FRANCE
Biological inspirations
 Some numbers…
 The human brain contains about 10 billion nerve cells
(neurons)
 Each neuron is connected to other neurons through about 10,000 synapses
 Properties of the brain
 It can learn, reorganize itself from experience
 It adapts to the environment
 It is robust and fault tolerant
Biological neuron
 A neuron has
 A branching input (dendrites)
 A branching output (the axon)
 The information circulates from the dendrites to the axon
via the cell body
 Axon connects to dendrites via synapses
 Synapses vary in strength
 Synapses may be excitatory or inhibitory
[Figure: biological neuron — dendrites, cell body, nucleus, axon, synapse]
What is an artificial neuron ?
 Definition : Non linear, parameterized function
with restricted output range

$y = f\!\left(w_0 + \sum_{i=1}^{n-1} w_i x_i\right)$

[Figure: neuron with inputs x1, x2, x3, bias weight w0 and output y]
Activation functions
[Figure: plots of the three activation functions]

Linear : $y = x$

Logistic : $y = \dfrac{1}{1 + \exp(-x)}$

Hyperbolic tangent : $y = \dfrac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$
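For reference, a minimal NumPy sketch of the three activation functions above; the function names are illustrative, not taken from the slides.

```python
import numpy as np

def linear(x):
    # Linear activation: y = x
    return x

def logistic(x):
    # Logistic (sigmoid) activation: y = 1 / (1 + exp(-x)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def hyperbolic_tangent(x):
    # Hyperbolic tangent: y = (exp(x) - exp(-x)) / (exp(x) + exp(-x)), output in (-1, 1)
    return np.tanh(x)

x = np.linspace(-5.0, 5.0, 11)
print(linear(x), logistic(x), hyperbolic_tangent(x), sep="\n")
```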
Neural Networks
 A mathematical model to solve engineering problems
 Group of highly connected neurons to realize compositions of
non linear functions
 Tasks
 Classification
 Discrimination
 Estimation
 2 types of networks
 Feed forward Neural Networks
 Recurrent Neural Networks
Feed Forward Neural Networks
 The information is
propagated from the
inputs to the outputs
 Computation of No non-linear functions of n input variables by composition of Nc algebraic functions
 Time has no role (NO
cycle between outputs
and inputs)
[Figure: feed-forward network — inputs x1 … xn, 1st hidden layer, 2nd hidden layer, output layer]
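As a concrete illustration of this forward propagation, here is a minimal sketch of a two-hidden-layer feed-forward network in NumPy; the layer sizes, random weights and the use of tanh activations are illustrative choices, not values from the slides.

```python
import numpy as np

def forward(x, weights, biases):
    """Propagate an input vector through successive layers.
    weights/biases: one matrix and one vector per layer."""
    a = x
    for W, b in zip(weights, biases):
        a = np.tanh(W @ a + b)   # non-linear composition, layer by layer
    return a

rng = np.random.default_rng(0)
sizes = [3, 5, 4, 2]             # n inputs, two hidden layers, output layer
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
print(forward(np.array([0.1, -0.2, 0.7]), weights, biases))
```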
Recurrent Neural Networks
 Can have arbitrary topologies
 Can model systems with
internal states (dynamic ones)
 Delays are associated with a specific weight
 Training is more difficult
 Performance may be
problematic
 Stable Outputs may be more
difficult to evaluate
 Unexpected behavior
(oscillation, chaos, …)
[Figure: recurrent network over inputs x1, x2 with delayed feedback connections (delays 0 or 1)]
Learning
 The procedure that consists in estimating the parameters of neurons
so that the whole network can perform a specific task
 2 types of learning
 The supervised learning
 The unsupervised learning
 The Learning process (supervised)
 Present the network with a number of inputs and their corresponding outputs
 See how closely the actual outputs match the desired ones
 Modify the parameters to better approximate the desired outputs
Supervised learning
 The desired response of the neural
network as a function of particular inputs is
well known.
 A “Professor” may provide examples and
teach the neural network how to fulfill a
certain task
Unsupervised learning
 Idea : group typical input data according to resemblance criteria that are unknown a priori
 Data clustering
 No need of a professor
 The network finds the correlations between the data by itself
 Examples of such networks :
 Kohonen feature maps
Properties of Neural Networks
 Supervised networks are universal approximators (Non
recurrent networks)
 Theorem : Any bounded function can be approximated by a
neural network with a finite number of hidden neurons to
an arbitrary precision
 Type of Approximators
 Linear approximators : for a given precision, the number of
parameters grows exponentially with the number of variables
(polynomials)
 Non-linear approximators (NN) : the number of parameters grows
linearly with the number of variables
Other properties
 Adaptivity
 Adapt weights to the environment; can be retrained easily
 Generalization ability
 May compensate for a lack of data
 Fault tolerance
 Graceful degradation of performance if damaged => the information is distributed within the entire net.
Static modeling
 In practice, it is rare to approximate a known
function by a uniform function
 “black box” modeling : model of a process
 The output variable y depends on the input variable x : the learning set is $\{x^k, y_p^k\}$ with k = 1 to N
 Goal : Express this dependency by a function,
for example a neural network
 If the learning set results from measurements, noise intervenes
 Not an approximation but a fitting problem
 Regression function
 Approximation of the regression function : estimate the most probable value of yp for a given input x
 Cost function : $J(w) = \dfrac{1}{2} \sum_{k=1}^{N} \left( y_p^k - g(x^k, w) \right)^2$
 Goal : Minimize the cost function by determining the right function g
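A small sketch of this cost function for a toy model g(x, w) (here a single tanh neuron, chosen only for illustration; the data are synthetic):

```python
import numpy as np

def g(x, w):
    # Toy model: a single tanh neuron with weights w = (w0, w1)
    return np.tanh(w[0] + w[1] * x)

def cost(w, x, y_p):
    # J(w) = 1/2 * sum_k (y_p^k - g(x^k, w))^2
    residuals = y_p - g(x, w)
    return 0.5 * np.sum(residuals ** 2)

x = np.linspace(-1.0, 1.0, 20)
y_p = np.tanh(0.5 + 2.0 * x) + 0.05 * np.random.default_rng(0).normal(size=x.size)
print(cost(np.array([0.5, 2.0]), x, y_p))
```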
Example
Classification (Discrimination)
 Classify objects into defined categories
 Rough decision OR
 Estimation of the probability for a certain
object to belong to a specific class
Example : Data mining
 Applications : economy, speech and
pattern recognition, sociology, etc.
Example
Examples of handwritten postal codes
drawn from a database available from the US Postal service
What do we need to use NN ?
 Determination of pertinent inputs
 Collection of data for the learning and testing
phase of the neural network
 Finding the optimum number of hidden nodes
 Estimate the parameters (Learning)
 Evaluate the performances of the network
 If performance is not satisfactory, then review
all the preceding points
Classical neural architectures
 Perceptron
 Multi-Layer Perceptron
 Radial Basis Function (RBF)
 Kohonen feature maps
 Other architectures
An example : Shared weights neural networks
Perceptron
 Rosenblatt (1962)
 Linear separation
 Inputs : vector of real values
 Outputs : 1 or -1
Decision boundary : $c_0 + c_1 x_1 + c_2 x_2 = 0$

$v = c_0 + c_1 x_1 + c_2 x_2, \qquad y = \mathrm{sign}(v)$

[Figure: two classes of points in the $(x_1, x_2)$ plane separated by the line $c_0 + c_1 x_1 + c_2 x_2 = 0$; $y = +1$ on one side, $y = -1$ on the other]
Learning (The perceptron rule)
 Minimization of the cost function : $J(c) = -\sum_{k \in M} y_p^k\, v^k$
 J(c) is always >= 0 (M is the set of misclassified examples)
 $y_p^k$ is the target value
 Partial cost :
 If $x^k$ is not well classified ($y_p^k v^k \le 0$) : $J^k(c) = -\,y_p^k\, v^k$
 If $x^k$ is well classified : $J^k(c) = 0$
 Partial cost gradient : $\dfrac{\partial J^k(c)}{\partial c} = -\,y_p^k\, x^k$
 Perceptron algorithm :
 if $y_p^k v^k > 0$ ($x^k$ is well classified) : $c(k) = c(k-1)$
 if $y_p^k v^k \le 0$ ($x^k$ is not well classified) : $c(k) = c(k-1) + y_p^k\, x^k$
 The perceptron algorithm converges if
examples are linearly separable
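A minimal sketch of the perceptron rule from the previous slides, with the inputs augmented by a constant 1 so that the bias c0 is part of the weight vector c; the function name and toy data are illustrative.

```python
import numpy as np

def train_perceptron(X, y, epochs=100):
    """Perceptron rule: c <- c + y_p^k * x^k for misclassified examples.
    X: (N, d) inputs, y: targets in {-1, +1}."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend 1 for the bias c0
    c = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        updated = False
        for xk, yk in zip(Xb, y):
            if yk * (c @ xk) <= 0:      # x^k not well classified
                c = c + yk * xk         # c(k) = c(k-1) + y_p^k x^k
                updated = True
        if not updated:                 # all examples well classified
            break
    return c

# Linearly separable toy data
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
c = train_perceptron(X, y)
print(np.sign(np.hstack([np.ones((4, 1)), X]) @ c))  # classification of the training points
```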
Multi-Layer Perceptron
 One or more hidden
layers
 Sigmoid activation functions
[Figure: multi-layer perceptron — input data, 1st hidden layer, 2nd hidden layer, output layer]
Learning
 Back-propagation algorithm
For neuron j, with $net_j = \sum_{i=0}^{n} w_{ji}\, o_i$ and $o_j = f(net_j)$ :

$\Delta w_{ji} = -\eta\, \dfrac{\partial E}{\partial w_{ji}}, \qquad \dfrac{\partial E}{\partial w_{ji}} = \dfrac{\partial E}{\partial net_j}\,\dfrac{\partial net_j}{\partial w_{ji}} = \dfrac{\partial E}{\partial net_j}\, o_i, \qquad \delta_j \equiv -\dfrac{\partial E}{\partial net_j} = -\dfrac{\partial E}{\partial o_j}\, f'(net_j)$

If the jth node is an output unit : $E = \dfrac{1}{2}\,(t_j - o_j)^2$, so $\dfrac{\partial E}{\partial o_j} = -(t_j - o_j)$ and $\delta_j = (t_j - o_j)\, f'(net_j)$
Credit assignment : if the jth node is a hidden unit,

$\dfrac{\partial E}{\partial o_j} = \sum_k \dfrac{\partial E}{\partial net_k}\,\dfrac{\partial net_k}{\partial o_j} = -\sum_k \delta_k\, w_{kj}, \qquad \delta_j = f'(net_j)\, \sum_k \delta_k\, w_{kj}$

Weight update, with a momentum term to smooth the weight changes over time :

$\Delta w_{ji}(t) = \eta\, \delta_j\, o_i(t) + \alpha\, \Delta w_{ji}(t-1), \qquad w_{ji}(t+1) = w_{ji}(t) + \Delta w_{ji}(t)$
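To make these update rules concrete, here is a minimal sketch of on-line back-propagation with momentum for a one-hidden-layer MLP with sigmoid units, trained on XOR; the network size, learning rate and momentum value are illustrative choices, not taken from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, t, W1, W2, dW1, dW2, eta=0.1, alpha=0.9):
    """One back-propagation step with momentum for a 1-hidden-layer MLP.
    W1: (h, d+1), W2: (o, h+1); dW1/dW2 hold the previous weight changes."""
    # Forward pass (inputs augmented with 1 for the bias weight)
    a0 = np.append(x, 1.0)
    o1 = np.append(sigmoid(W1 @ a0), 1.0)
    o2 = sigmoid(W2 @ o1)

    # Output deltas: delta_j = (t_j - o_j) * f'(net_j), with f'(net) = o (1 - o) for the sigmoid
    d2 = (t - o2) * o2 * (1.0 - o2)
    # Hidden deltas (credit assignment): delta_j = f'(net_j) * sum_k delta_k w_kj
    d1 = o1[:-1] * (1.0 - o1[:-1]) * (W2[:, :-1].T @ d2)

    # Momentum update: dW(t) = eta * delta * o + alpha * dW(t-1)
    dW2_new = eta * np.outer(d2, o1) + alpha * dW2
    dW1_new = eta * np.outer(d1, a0) + alpha * dW1
    return W1 + dW1_new, W2 + dW2_new, dW1_new, dW2_new

# XOR toy problem
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])
W1, W2 = rng.normal(scale=0.5, size=(3, 3)), rng.normal(scale=0.5, size=(1, 4))
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
for epoch in range(5000):
    for x, t in zip(X, T):
        W1, W2, dW1, dW2 = train_step(x, t, W1, W2, dW1, dW2)
for x in X:  # outputs after training, one per XOR pattern
    print(sigmoid(W2 @ np.append(sigmoid(W1 @ np.append(x, 1.0)), 1.0)))
```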
Structure | Types of decision regions
Single-Layer | Half plane bounded by hyperplane
Two-Layer | Convex open or closed regions
Three-Layer | Arbitrary (complexity limited by the number of nodes)
[Figure: for each structure, example decision regions for the Exclusive-OR problem, classes with meshed regions, and the most general region shapes (classes A and B)]
Different non linearly separable
problems
Neural Networks – An Introduction Dr. Andrew Hunter
Radial Basis Functions (RBFs)
 Features
 One hidden layer
 The activation of a hidden unit is determined by the distance between
the input vector and a prototype vector
[Figure: RBF network — inputs, radial units, outputs]
 RBF hidden layer units have a receptive
field which has a centre
 Generally, the hidden unit function is
Gaussian
 The output Layer is linear
 Realized function : $s(x) = \sum_{j=1}^{K} W_j\, \phi_j(x)$, with $\phi_j(x) = \exp\!\left(-\dfrac{\lVert x - c_j \rVert^2}{2\,\sigma_j^2}\right)$
Learning
 The training is performed by deciding on
 How many hidden nodes there should be
 The centers and the sharpness of the Gaussians
 2 steps
 In the 1st stage, the input data set is used to
determine the parameters of the basis functions
 In the 2nd stage, functions are kept fixed while the
second layer weights are estimated ( Simple BP
algorithm like for MLPs)
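A small sketch of this two-stage procedure on a toy regression problem. Stage 1 here fixes the basis functions by drawing centers from the data with a common width; stage 2 estimates the (linear) output weights by least squares instead of the gradient-based estimation mentioned above — one common shortcut, since the second stage is a linear problem. All names and values are illustrative.

```python
import numpy as np

def rbf_design(X, centers, sigma):
    # phi_j(x) = exp(-||x - c_j||^2 / (2 sigma^2)), one column per hidden unit
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])                       # toy target function

# Stage 1: fix the basis functions (centers drawn from the data, common width)
centers = X[rng.choice(len(X), size=10, replace=False)]
sigma = 1.0

# Stage 2: basis functions kept fixed, output-layer weights estimated (linear fit)
Phi = rbf_design(X, centers, sigma)
W, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(np.abs(Phi @ W - y).max())          # training error of s(x) = sum_j W_j phi_j(x)
```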
MLPs versus RBFs
 Classification
 MLPs separate classes via
hyperplanes
 RBFs separate classes via
hyperspheres
 Learning
 MLPs use distributed learning
 RBFs use localized learning
 RBFs train faster
 Structure
 MLPs have one or more
hidden layers
 RBFs have only one layer
 RBFs require more hidden
neurons => curse of
dimensionality
[Figure: decision boundaries in the (X1, X2) plane — hyperplanes for the MLP, hyperspheres for the RBF]
Self organizing maps
 The purpose of SOM is to map a multidimensional input
space onto a topology preserving map of neurons
 Preserve the topological structure so that neighboring neurons respond to « similar » input patterns
 The topological structure is often a 2 or 3 dimensional space
 Each neuron is assigned a weight vector with the same
dimensionality as the input space
 Input patterns are compared to each weight vector and
the closest wins (Euclidean Distance)
 The activation of the
neuron is spread in its
direct neighborhood
=> neighbors become
sensitive to the same
input patterns
 Block distance
 The size of the neighborhood is initially large but reduces over time => specialization of the network
[Figure: first and 2nd neighborhoods around the winning neuron]
Adaptation
 During training, the “winner” neuron and its
neighborhood adapt to make their weight vectors
more similar to the input pattern that caused the
activation
 The neurons are moved
closer to the input pattern
 The magnitude of the
adaptation is controlled
via a learning parameter
which decays over time
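A minimal sketch of this adaptation step for a small 2-D map, using the block distance mentioned above; the linear decay of the learning rate and neighborhood radius is an illustrative choice, not a prescription from the slides.

```python
import numpy as np

def som_step(weights, x, t, n_steps, eta0=0.5, radius0=2.0):
    """One SOM update: find the winner and pull it and its neighbors toward x.
    weights: (rows, cols, d) grid of weight vectors; decay controlled by t / n_steps."""
    rows, cols, _ = weights.shape
    # Winner: neuron whose weight vector is closest to the input (Euclidean distance)
    dists = np.linalg.norm(weights - x, axis=2)
    wr, wc = np.unravel_index(np.argmin(dists), (rows, cols))

    # Learning rate and neighborhood size decay over time
    frac = 1.0 - t / n_steps
    eta, radius = eta0 * frac, max(radius0 * frac, 0.5)

    for r in range(rows):
        for c in range(cols):
            grid_dist = abs(r - wr) + abs(c - wc)      # block (Manhattan) distance
            if grid_dist <= radius:
                # Move the neuron's weights closer to the input pattern
                weights[r, c] += eta * (x - weights[r, c])
    return weights

rng = np.random.default_rng(0)
weights = rng.uniform(size=(5, 5, 2))      # 5x5 map, 2-dimensional inputs
data = rng.uniform(size=(500, 2))
for t, x in enumerate(data):
    weights = som_step(weights, x, t, len(data))
```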
Shared weights neural networks:
Time Delay Neural Networks (TDNNs)
 Introduced by Waibel in 1989
 Properties
 Local, shift invariant feature extraction
 Notion of receptive fields combining local information
into more abstract patterns at a higher level
 Weight sharing concept (All neurons in a feature
share the same weights)
 All neurons detect the same feature but in different position
 Principal Applications
 Speech recognition
 Image analysis
TDNNs (cont’d)
 Objects recognition in an
image
 Each hidden unit receives inputs only from a small
region of the input space : its receptive field
 Shared weights for all
receptive fields =>
translation invariance in
the response of the
network
[Figure: TDNN for object recognition — inputs, hidden layer 1, hidden layer 2, each unit connected to a local receptive field]
 Advantages
Reduced number of weights
 Require fewer examples in the training set
 Faster learning
Invariance under time or space translation
Faster execution of the net (in comparison with a fully connected MLP)
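To illustrate the weight-sharing idea, here is a sketch of a single 1-D feature detector applied to every receptive field of an input sequence, so the same weights are reused at every position; the kernel and input values are arbitrary examples.

```python
import numpy as np

def shared_weight_features(x, w, b):
    """Apply one feature detector with shared weights to every receptive field.
    x: input sequence, w: kernel weights shared by all positions, b: shared bias."""
    k = len(w)
    # One output per receptive field; the same (w, b) is reused at every position,
    # so a feature is detected identically wherever it occurs (shift invariance).
    return np.tanh(np.array([x[i:i + k] @ w + b for i in range(len(x) - k + 1)]))

x = np.array([0.0, 0.1, 0.9, 1.0, 0.9, 0.1, 0.0, 0.0])
w = np.array([-1.0, 2.0, -1.0])     # a simple "bump" detector
print(shared_weight_features(x, w, b=0.0))
```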
Neural Networks (Applications)
 Face recognition
 Time series prediction
 Process identification
 Process control
 Optical character recognition
 Adaptive filtering
 Etc…
Conclusion on Neural Networks
 Neural networks are utilized as statistical tools
 Adjust non linear functions to fulfill a task
 Need for multiple and representative examples, but fewer than in other
methods
 Neural networks make it possible to model complex static phenomena (FF) as
well as dynamic ones (RNN)
 NN are good classifiers BUT
 Good representations of data have to be formulated
 Training vectors must be statistically representative of the entire input
space
 Unsupervised techniques can help
 The use of NN needs a good comprehension of the problem
Preprocessing
Why Preprocessing ?
 The curse of Dimensionality
The quantity of training data grows
exponentially with the dimension of the input
space
In practice, we only have limited quantity of
input data
 Increasing the dimensionality of the problem leads
to a poor representation of the mapping
Preprocessing methods
 Normalization
Translate input values so that they can be
exploitable by the neural network
 Component reduction
Build new input variables in order to reduce
their number
Without losing information about their distribution
Character recognition example
 Image 256x256 pixels
 8 bits pixels values
(grey level)
 Necessary to extract
features
$2^{256 \times 256 \times 8} \approx 10^{158000}$ different images
Normalization
 Inputs of the neural net are often of
different types with different orders of
magnitude (E.g. Pressure, Temperature,
etc.)
 It is necessary to normalize the data so
that they have the same impact on the
model
 Center and reduce the variables

Average over all points : $\bar{x}_i = \dfrac{1}{N} \sum_{n=1}^{N} x_i^n$

Variance calculation : $\sigma_i^2 = \dfrac{1}{N-1} \sum_{n=1}^{N} \left( x_i^n - \bar{x}_i \right)^2$

Variable transformation : $x_i^{\prime\,n} = \dfrac{x_i^n - \bar{x}_i}{\sigma_i}$
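A small sketch of the center-and-reduce transformation above; the example inputs (pressure, temperature) just echo the slide's remark about variables with different orders of magnitude.

```python
import numpy as np

def center_and_reduce(X):
    """Standardize each input variable: subtract its mean and divide by its std.
    X: (N, d) data matrix; returns the transformed data plus the statistics."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)         # 1/(N-1) variance estimate, as in the slide
    return (X - mean) / std, mean, std

# Inputs with very different orders of magnitude (e.g. pressure in Pa, temperature in K)
X = np.array([[101300.0, 293.0], [99800.0, 310.0], [102500.0, 285.0]])
Xn, mean, std = center_and_reduce(X)
print(Xn.mean(axis=0), Xn.std(axis=0, ddof=1))   # ~0 mean, unit variance per variable
```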
Components reduction
 Sometimes, the number of inputs is too large to
be exploited
 The reduction of the input number simplifies the
construction of the model
 Goal : Better representation of the data in order
to get a more synthetic view without losing
relevant information
 Reduction methods (PCA, CCA, etc.)
Principal Components Analysis
(PCA)
 Principle
 Linear projection method to reduce the number of parameters
 Transform a set of correlated variables into a new set of
uncorrelated variables
 Map the data into a space of lower dimensionality
 Form of unsupervised learning
 Properties
 It can be viewed as a rotation of the existing axes to new
positions in the space defined by original variables
 New axes are orthogonal and represent the directions with
maximum variability
 Compute d dimensional mean
 Compute d*d covariance matrix
 Compute eigenvectors and Eigenvalues
 Choose k largest Eigenvalues
 K is the inherent dimensionality of the subspace governing the
signal
 Form a d*k matrix A whose columns are the k chosen eigenvectors
 The representation of the data consists of projecting the data into
a k-dimensional subspace by $x' = A^{t}\,(x - \mu)$
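A minimal sketch of the PCA recipe above (mean, covariance, eigen-decomposition, projection $x' = A^{t}(x - \mu)$); the synthetic data are purely illustrative.

```python
import numpy as np

def pca_project(X, k):
    """Project data onto the k principal components.
    X: (N, d) data; returns (N, k) projections and the projection matrix A (d, k)."""
    mu = X.mean(axis=0)                       # d-dimensional mean
    cov = np.cov(X - mu, rowvar=False)        # d*d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues/eigenvectors (ascending order)
    order = np.argsort(eigvals)[::-1][:k]     # keep the k largest eigenvalues
    A = eigvecs[:, order]                     # columns = chosen eigenvectors
    return (X - mu) @ A, A                    # x' = A^t (x - mu) for each sample

rng = np.random.default_rng(0)
# Essentially one-dimensional data embedded in 3 dimensions, plus a little noise
X = rng.normal(size=(500, 1)) @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(500, 3))
Z, A = pca_project(X, k=1)
print(Z.shape, A.shape)
```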
Example of data representation
using PCA
Limitations of PCA
 The reduction of dimensions for complex
distributions may need non linear
processing
Curvilinear Components
Analysis
 Non linear extension of the PCA
 Can be seen as a self organizing neural network
 Preserves the proximity between the points in
the input space i.e. local topology of the
distribution
 Enables some manifolds in the input data to be unfolded
 Keep the local topology
Example of data representation
using CCA
Non linear projection of a horseshoe
Non linear projection of a spiral
Other methods
 Neural pre-processing
Use a neural network to reduce the
dimensionality of the input space
Overcomes the limitation of PCA
Auto-associative mapping => form of
unsupervised training
[Figure: auto-associative network — inputs x1 … xd, bottleneck layer z1 … zM, outputs x1 … xd]
 Transformation of a d
dimensional input space
into a M dimensional
output space
 Non linear component
analysis
 The dimensionality of the
sub-space must be
decided in advance
(d-dimensional input space → M-dimensional sub-space → d-dimensional output space)
« Intelligent preprocessing »
 Use an “a priori” knowledge of the problem
to help the neural network in performing its
task
 Reduce manually the dimension of the
problem by extracting the relevant features
 More or less complex algorithms to
process the input data
Example in the H1 L2 neural
network trigger
 Principle
 Intelligent preprocessing
 extract physical values for the neural net (momentum, energy, particle
type)
 Combination of information from different sub-detectors
 Executed in 4 steps
Clustering : find regions of interest within a given detector layer
Matching : combination of clusters belonging to the same object
Ordering : sorting of objects by parameter
Post Processing : generates variables for the neural network
Conclusion on the preprocessing
 The preprocessing has a huge impact on
performances of neural networks
 The distinction between the preprocessing and
the neural net is not always clear
 The goal of preprocessing is to reduce the
number of parameters to face the challenge of
“curse of dimensionality”
 There exist many preprocessing algorithms and
methods
 Preprocessing with prior knowledge
 Preprocessing without
Implementation of neural
networks
Motivations and questions
 Which architectures should be used to implement Neural Networks in real-time ?
 What are the type and complexity of the network ?
 What are the timing constraints (latency, clock frequency, etc.)
 Do we need additional features (on-line learning, etc.)?
 Must the Neural network be implemented in a particular environment (
near sensors, embedded applications requiring less consumption etc.) ?
 When do we need the circuit ?
 Solutions
 Generic architectures
 Specific Neuro-Hardware
 Dedicated circuits
Generic hardware architectures
 Conventional microprocessors
Intel Pentium, Power PC, etc …
 Advantages
 High performances (clock frequency, etc)
 Cheap
 Software environment available (NN tools, etc)
 Drawbacks
 Too generic, not optimized for very fast neural
computations
Specific Neuro-hardware circuits
 Commercial chips CNAPS, Synapse, etc.
 Advantages
 Closer to the neural applications
 High performances in terms of speed
 Drawbacks
 Not optimized to specific applications
 Availability
 Development tools
 Remark
 These commercial chips tend to be out of production
Example :CNAPS Chip
64 x 64 x 1 in 8 µs
(8 bit inputs, 16 bit weights)
CNAPS 1064 chip
Adaptive Solutions,
Oregon
Dedicated circuits
 A system where the functionality is once and for
all tied up into the hard and soft-ware.
 Advantages
 Optimized for a specific application
 Higher performances than the other systems
 Drawbacks
 High development costs in terms of time and money
What type of hardware to be used
in dedicated circuits ?
 Custom circuits
 ASIC
 Necessity to have good knowledge of the hardware design
 Fixed architecture, hardly changeable
 Often expensive
 Programmable logic
 Valuable to implement real time systems
 Flexibility
 Low development costs
 Lower performance than an ASIC (frequency, etc.)
Programmable logic
 Field Programmable Gate Arrays (FPGAs)
Matrix of logic cells
Programmable interconnection
Additional features (internal memories +
embedded resources like multipliers, etc.)
Reconfigurability
 We can change the configurations as many times
as desired
FPGA Architecture
[Figure: FPGA architecture — I/O ports, block RAMs, programmable connections, programmable logic blocks, DLLs; detail of a Xilinx Virtex slice with LUTs, carry & control logic and flip-flops]
Real-Time Systems
Execution of applications with time constraints.
Two kinds : hard and soft real-time systems.
A hard real-time system can be the digital fly-by-wire control system of an aircraft:
no lateness is accepted, whatever the cost; the lives of people depend on
the correct working of the control system of the aircraft.
A soft real-time system can be a vending machine:
lower performance is accepted for lateness; it is not catastrophic
when deadlines are not met, it will simply take longer to handle one
client with the vending machine.
Typical real time processing
problems
 In instrumentation, diversity of real-time
problems with specific constraints
 Problem : Which architecture is adequate
for implementation of neural networks ?
 Is it worth spending time on it?
Some problems and dedicated
architectures
 ms scale real time system
Architecture to measure raindrops size and
velocity
Connectionist retina for image processing
 µs scale real time system
Level 1 trigger in a HEP experiment
Architecture to measure raindrops
size and velocity
 2 focalized beams on 2
photodiodes
 Diodes deliver a signal
according to the received
energy
 The height of the pulse
depends on the radius
 Tp depends on the speed
of the droplet
 Problematic
[Figure: photodiode signal showing the pulse height and the time Tp]
Input data
High level of noise
Significant variation of the current baseline
[Figure: example signals of a real droplet and of noise]
Feature extractors
[Figure: feature extractors applied to the input stream (windows of 10 samples)]
Proposed architecture
[Figure: proposed architecture — 20 input windows feeding feature extractors, two fully interconnected stages, and outputs for the presence of a droplet, its size and its velocity]
Performances
[Figure: estimated vs. actual radii (mm) and estimated vs. actual velocities (m/s)]
Hardware implementation
 10 KHz Sampling
 Previously, a neuro-hardware
accelerator was needed (Totem chip from Neuricam)
 Today, generic architectures are sufficient
to implement the neural network in real-
time
Connectionist Retina
 Integration of a neural
network in an artificial
retina
 Screen
 Matrix of Active Pixel
sensors
 CAN (8-bit A/D converter),
256 levels of grey
 Processing Architecture
 Parallel system where
neural networks are
implemented
[Figure: retina block diagram — pixel matrix, CAN converter and processing architecture]
Processing architecture: “The
maharaja” chip
Integrated Neural Networks :
WEIGHTED SUM : ∑i wi Xi
EUCLIDEAN : (A – B)²
MANHATTAN : |A – B|
MAHALANOBIS : (A – B)ᵀ ∑⁻¹ (A – B)
Radial Basis function [RBF]
Multilayer Perceptron [MLP]
The “Maharaja” chip
 Micro-controller
 Enable the steering of the
whole circuit
 Memory
 Store the network
parameters
 UNE
 Processors to compute the
neurons outputs
 Input/Output module
 Data acquisition and storage
of intermediate results
[Figure: Maharaja chip block diagram — micro-controller/sequencer, command and instruction buses, input/output unit, and four UNE processors (UNE-0 to UNE-3), each with its memory M]
Hardware Implementation
FPGA implementing the
Processing architecture
Matrix of Active Pixel Sensors
Performances
Neural Network | Latency (timing constraint) | Estimated execution time
MLP (High Energy Physics) (4-8-8-4) | 10 µs | 6.5 µs
RBF (Image processing) (4-10-256) | 40 ms | 473 µs (Manhattan), 23 ms (Mahalanobis)
Level 1 trigger in a HEP experiment
 Neural networks have provided interesting
results as triggers in HEP.
Level 2 : H1 experiment
Level 1 : Dirac experiment
 Goal : Transpose the complex processing
tasks of Level 2 into Level 1
 High timing constraints (in terms of latency
and data throughput)
Neural Network architecture
[Figure: network with 128 inputs, 64 hidden neurons and 4 outputs (electrons, tau, hadrons, jets)]
Execution time : ~500 ns, with data arriving every BC = 25 ns
Weights coded in 16 bits
States coded in 8 bits
Very fast architecture
 Matrix of n*m processing
elements
 Control unit
 I/O module
 TanH are stored in
LUTs
 1 matrix row
computes a neuron
 The results are fed back
to calculate the output
layer
[Figure: array of 256 PEs for a 128x64x4 network, with TanH look-up tables, accumulators (ACC), an I/O module and a control unit]
PE architecture
[Figure: PE datapath — weight memory addressed by an address generator, signed multiplier (8-bit input data x 16-bit weights), accumulator, control module, data in/out and command bus]
Technological Features
Inputs/Outputs : 4 input buses (data coded in 8 bits), 1 output bus (8 bits)
Processing Elements : signed multipliers 16x8 bits, accumulation (29 bits), weight memories (64x16 bits)
Look Up Tables : addresses in 8 bits, data in 8 bits
Internal speed : targeted to be 120 MHz
Neuro-hardware today
 Generic Real time applications
 Microprocessor technology is sufficient to implement most
neural applications in real time (ms or sometimes µs scale)
 This solution is cheap
 Very easy to manage
 Constrained Real time applications
 There still remain specific applications where powerful computations
are needed, e.g. particle physics
 There still remain applications where other constraints have to be
taken into consideration (consumption, proximity of sensors,
mixed integration, etc.)
Hardware specific applications
 Particle physics triggering (µs scale or
even ns scale)
Level 2 triggering (latency time ~10µs)
Level 1 triggering (latency time ~0.5µs)
 Data filtering (Astrophysics applications)
Select interesting features within a set of
images
For generic applications : trend of
clustering
 Idea : Combine performances of different
processors to perform massive parallel
computations
[Figure: cluster of machines linked by a high-speed connection]
Clustering(2)
 Advantages
Take advantage of the intrinsic parallelism of
neural networks
Utilization of systems already available
(university, Labs, offices, etc.)
High performances : Faster training of a
neural net
Very cheap compared to dedicated hardware
Clustering(3)
 Drawbacks
Communications load : Need of very fast links
between computers
Software environment for parallel processing
Not possible for embedded applications
Conclusion on the Hardware
Implementation
 Most real-time applications do not need dedicated
hardware implementation
 Conventional architectures are generally appropriate
 Clustering of generic architectures to combine performances
 Some specific applications require other solutions
 Strong Timing constraints
 Technology permits the use of FPGAs
 Flexibility
 Massive parallelism possible
 Other constraints (consumption, etc.)
 Custom or programmable circuits
