Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423603
(An Autonomous Institute Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Information Technology
(NBA Accredited)
Department:- Information Technology
Name of Subject:- Artificial Intelligence
Class:- TYIT
Subject Code:- IT313
Course Objectives:
1. To understand the basic principles of Artificial Intelligence
2. To provide an understanding of uninformed search strategies.
3. To provide an understanding of informed search strategies.
4. To study the concepts of Knowledge based system.
5. To learn and understand use of fuzzy logic and neural networks.
6. To learn and understand various application domains of Artificial Intelligence.
Planning in AI
 We require domain description, task specification,
and goal description for any planning system.
Planning in artificial intelligence is about decision-
making actions performed by robots or computer
programs to achieve a specific goal.
Execution of the plan is about choosing a sequence
of tasks with a high probability of accomplishing a
specific task.
 A plan is considered a sequence of actions, and each action has preconditions that must be satisfied before it can be applied, and some effects that can be positive or negative.
 Planning systems do the following:
 divide-and-conquer
 relax the requirement for sequential construction of solutions
At the basic level, we have Forward State Space Planning (FSSP) and Backward State Space Planning (BSSP).
Problem solving vs. Planning:
 States: data structures (problem solving) vs. logical sentences (planning)
 Actions: code vs. preconditions / outcomes
 Goal: code vs. logical sentences
 Plan: a sequence from S0 vs. constraints on actions
Types of Planning
1. Forward State Space Planning (FSSP)
FSSP behaves in the same way as forward state-space search. Given an initial state S in any domain, we apply the applicable actions and obtain a new state S' (which also contains some new terms); this step is called a progression. It continues until we reach the goal state. Actions must be applicable in this approach.
Disadvantage: large branching factor
Advantage: the algorithm is sound
2. Backward State Space Planning (BSSP)
BSSP behaves similarly to backward state-space search. Here we move from the goal state g back through sub-goals, tracing the action that would achieve each goal. This process is called regression (going back to the previous goal or sub-goal). These sub-goals should also be checked for consistency. Actions must be relevant in this approach.
Disadvantage: not a sound algorithm (inconsistencies can sometimes be introduced)
Advantage: small branching factor (much smaller than FSSP)
So for an efficient planning system, we need to combine the features of FSSP and BSSP.
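To make progression concrete, here is a minimal Python sketch of forward state-space search over precondition/add/delete actions; the Action tuple, the set-of-predicates state encoding and the breadth-first strategy are illustrative assumptions, not notation from the slides:

```python
# A minimal sketch of progression (FSSP-style) over STRIPS-like actions.
from collections import namedtuple, deque

Action = namedtuple("Action", ["name", "pre", "add", "delete"])

def applicable(state, action):
    """An action is applicable if all its preconditions hold in the state."""
    return action.pre <= state

def progress(state, action):
    """Forward step: S' = (S - Del) | Add."""
    return (state - action.delete) | action.add

def forward_search(initial, goal, actions):
    """Breadth-first progression from the initial state until the goal holds."""
    frontier = deque([(frozenset(initial), [])])
    visited = {frozenset(initial)}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for a in actions:
            if applicable(state, a):
                nxt = frozenset(progress(state, a))
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [a.name]))
    return None   # no plan found
```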
Block-world planning problem
When two sub-goals, G1 and G2, are given, a non-interleaved planner produces either a plan for G1 followed by a plan for G2, or vice versa.
In the block-world problem, three blocks labeled 'A', 'B', and 'C' are allowed to rest on a flat surface. The given condition is that only one block can be moved at a time to achieve the goal.
The start position and target position are shown in the following diagram.
 Components of the planning system:
The planning process includes the following important steps:
1. Choose the best rule to apply next, based on the best available heuristic information.
2. Apply the chosen rule to compute the new problem state.
3. Detect when a solution has been found.
4. Detect dead ends so that they can be discarded and the system's effort directed in more useful directions.
5. Detect when an almost-correct solution has been found.
 Goal stack plan:
1. It is one of the most important planning algorithms, used by STRIPS.
2. Stacks are used in the algorithm to capture the sub-goals and the actions needed to achieve them; a knowledge base holds the current situation and the actions.
3. A goal stack is similar to a node in a search tree, where branches are created by the choice of action.
The important steps of the algorithm are mentioned below:
1. Start by pushing the original goal onto the stack, and repeat until the stack is empty. If the stack top is a compound goal, push its unsatisfied sub-goals onto the stack.
2. If the stack top is a single unsatisfied goal, replace it with an action that achieves it and push the action's preconditions onto the stack.
3. If the stack top is an action, pop it off the stack, execute it, and update the knowledge base with the action's effects.
4. If the stack top is a satisfied goal, pop it off the stack.
 Non-linear Planning:
This planning uses a goal stack, and the search space includes all possible sub-goal orderings. It handles goal interactions by interleaving.
Advantage of non-linear planning:
Non-linear planning may produce an optimal solution with respect to plan length.
Disadvantage of non-linear planning:
It requires a larger search space, since all possible goal orderings are considered.
Algorithm:
1. Choose a goal 'g' from the goal set.
2. If 'g' does not hold in the current state, then
   i. Choose an operator 'o' whose add-list matches goal g
   ii. Push 'o' onto the OpStack
   iii. Add the preconditions of 'o' to the goal set
3. While all preconditions of the operator on top of the OpStack are met in the state:
   i. Pop operator o from the top of the OpStack
   ii. state = apply(o, state)
   iii. plan = [plan, o]
A Python sketch of this loop is given below.
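Below is a rough Python rendering of this loop, reusing the Action(name, pre, add, delete) tuples from the earlier FSSP sketch; operator selection and failure handling are deliberately naive, so treat it as a sketch of the control flow only:

```python
# A rough rendering of the goal-regression loop above (no failure handling).
def regression_plan(state, goals, operators):
    op_stack, plan = [], []
    goal_set = set(goals)
    while goal_set:
        g = goal_set.pop()                      # 1. choose a goal g
        if g in state:
            continue
        o = next(op for op in operators         # 2.i choose an operator whose
                 if g in op.add)                #     add-list matches g
        op_stack.append(o)                      # 2.ii push o on the OpStack
        goal_set |= (o.pre - state)             # 2.iii add its preconditions
        # 3. while the top operator's preconditions all hold, apply it
        while op_stack and op_stack[-1].pre <= state:
            o = op_stack.pop()
            state = (state - o.delete) | o.add  # state = apply(o, state)
            plan.append(o.name)
    return plan
```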
Block world problem using FOL
 In the block-world problem, a state is described by a set of predicates representing the facts that are true in that state.
 For every action, one must describe each of the changes it makes to the state description. In addition, a statement that everything else remains unchanged is also necessary.
The robot can perform four types of operations in the block-world environment. They are:
1. UNSTACK (X, Y) : [US (X, Y)] Pick up X from its current
position on block Y. The arm must be empty and X has no
block on top of it.
2. STACK (X, Y): [S (X, Y)] Place block X on block Y. The arm must be
holding X and the top of Y must be clear.
3. PICKUP (X): [PU (X) ] Pick up X from the table and hold it.
Initially the arm must be empty and top of X is clear.
4. PUTDOWN (X): [PD (X)] Put block X down on the table. The
arm must have been holding block X.
Along with the operations, some predicates are needed to describe the environment clearly. Those predicates are:
ON(X, Y) - Block X is on block Y.
ONT(X) - Block X is on the table.
CL(X) - The top of X is clear.
HOLD(X) - The robot arm is holding X.
AE - The robot arm is empty.
Logical statements true in this block world:
1. Holding X means the arm is not empty:
(∃X) HOLD (X) → ~ AE
2. X being on the table means X is not on top of any block:
(∀X) ONT (X) → ~ (∃Y) ON (X, Y)
3. Any block with no block on top of it has a clear top:
(∀X) (~ (∃Y) ON (Y, X)) → CL (X)
STRIPS
STRIPS stands for "STanford Research Institute Problem Solver." It was the planner used in Shakey, one of the first robots built using AI technology. STRIPS is an action-centric representation: for each action, it specifies the effect of that action.
 A STRIPS planning problem specifies;
 an initial state S,
 a goal G,
 a set of STRIPS actions.
The STRIPS representation for an action consists of three
lists,
1. Pre_Cond list contains predicates which have to be
true before operation.
2. ADD list contains those predicates which will be true
after operation.
3. DELETE list contains those predicates which are no longer true after the operation.
 Predicates not included in either of these lists are assumed to be unaffected by the operation.
 Frame axioms are specified implicitly in STRIPS, which greatly reduces the amount of information stored.
 Let us discuss the action lists for the operations of the block-world problem:
Stack (X, Y)
Pre: CL (Y), HOLD (X)
Del: CL (Y), HOLD (X)
Add: AE, ON (X, Y)
UnStack (X, Y)
Pre: ON (X, Y), CL (X), AE
Del: ON (X, Y), AE
Add: HOLD (X), CL (Y)
Pickup (X)
Pre: ONT (X), CL (X), AE
Del: ONT (X), AE
Add: HOLD (X)
Putdown (X)
Pre: HOLD (X)
Del: HOLD (X)
Add: ONT (X), AE
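The same four operators can be written down as data. The sketch below is one illustrative Python encoding (string predicates in plain dictionaries), not a standard library API:

```python
# The four block-world STRIPS operators above, written as Python data.
def strips_ops(X, Y):
    return {
        "Stack":   {"pre": {f"CL({Y})", f"HOLD({X})"},
                    "del": {f"CL({Y})", f"HOLD({X})"},
                    "add": {"AE", f"ON({X},{Y})"}},
        "UnStack": {"pre": {f"ON({X},{Y})", f"CL({X})", "AE"},
                    "del": {f"ON({X},{Y})", "AE"},
                    "add": {f"HOLD({X})", f"CL({Y})"}},
        "Pickup":  {"pre": {f"ONT({X})", f"CL({X})", "AE"},
                    "del": {f"ONT({X})", "AE"},
                    "add": {f"HOLD({X})"}},
        "Putdown": {"pre": {f"HOLD({X})"},
                    "del": {f"HOLD({X})"},
                    "add": {f"ONT({X})", "AE"}},
    }

# Example: the UnStack(B, C) operator instance
print(strips_ops("B", "C")["UnStack"])
```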
Goal Stack Planning
Goal Stack Planning (GSP) is one of the simplest planning algorithms designed to handle problems having compound goals.
It utilizes STRIPS as a formal language for specifying and manipulating the world with which it is working.
This approach uses a stack for plan generation. The stack can contain sub-goals and actions described using predicates.
 The sub-goals can be solved one by one, in any order.
Algorithm:
Push the goal state onto the stack
Push the individual predicates of the goal state onto the stack
Loop till the stack is empty
    Pop an element E from the stack
    IF E is a predicate
        IF E is true then
            Do nothing
        ELSE
            Push the relevant action onto the stack
            Push the individual predicates of the precondition of the action onto the stack
    ELSE IF E is an action
        Apply the action to the current state
        Add the action to the plan
A Python sketch of this loop follows.
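Below is a compact, illustrative Python sketch of the loop, reusing the string-predicate encoding and the strips_ops() helper sketched earlier; choose_action is a hypothetical helper supplied by the caller that returns an operator instance whose add-list achieves a given predicate. Goal interactions (e.g. the Sussman anomaly) are not handled.

```python
# A simplified Goal Stack Planning loop (no interaction handling).
def goal_stack_plan(state, goal, choose_action):
    """choose_action(pred, state) must return (name, {'pre','add','del'})
    for an operator instance whose add-list achieves `pred`."""
    stack = [set(goal)] + [g for g in goal]   # compound goal, then its predicates
    plan = []
    while stack:
        e = stack.pop()
        if isinstance(e, set):                # compound goal: re-check its parts
            stack.extend(p for p in e if p not in state)
        elif isinstance(e, str):              # a single predicate
            if e in state:
                continue                      # already true: do nothing
            name, op = choose_action(e, state)
            stack.append((name, op))          # push the action...
            stack.extend(op["pre"])           # ...then its preconditions on top
        else:                                 # an action: apply it
            name, op = e
            state = (state - op["del"]) | op["add"]
            plan.append(name)
    return plan, state
```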
Implementation using Goal Stack Planning
 Let's start with the example above: the initial state is the current description of our world.
 The goal state is what we have to achieve.
The following list of actions can be applied to the various situations in our problem:
Goal Stack Planning
1. The first step is to push the goal into the stack.
2. Next, push the individual predicates of the goal into the stack.
3. Now pop an element out from the stack.
The popped element is indicated with a strike-through in the above diagram. The element is ON(B,D), which is a predicate, and it is not true in our current world.
Goal Stack Planning
4. The next step is to push the relevant action that could achieve the sub-goal ON(B,D) into the stack.
5. Now again push the preconditions of the action STACK(B,D) into the stack.
HOLDING(B) is pushed first and CLEAR(D) is pushed next, so CLEAR(D) will be dealt with before HOLDING(B). This ordering matters because we are considering a block world with a single-arm robot, and everything we do depends on that robot arm.
Goal Stack Planning
6. POP an element out from the stack. After popping we see that CLEAR(D) is true in the current world model, so we don't have to do anything.
7. So again pop the stack.
i) The popped element is HOLDING(B), which is a predicate; note that it is not true in our current world.
ii) So we have to push the relevant action into the stack. In order to make HOLDING(B) true, there are possibly two actions that can achieve it.
iii) One is PICKUP(B) and the other is UNSTACK(B,y). In order to choose the best of the two available actions, we have to think ahead and possibly use heuristics.
iv) For instance, if we choose PICKUP(B), then block B must first be on the table. To get it there we would have to UNSTACK(B,C), which already achieves HOLDING(B), which is what we want; if we then used PICKUP we would need PUTDOWN(B), making HOLDING(B) false, and then PICKUP(B) again to achieve HOLDING(B), which UNSTACK achieves directly.
v) So the best action is UNSTACK(B,y), and it also brings the current situation closer to the goal state. The variable y indicates the block below B.
Goal Stack Planning
8. Let's push the action UNSTACK(B,C) into the stack.
9. Now push the individual preconditions of UNSTACK(B,C) into the stack.
10. POP the stack. Note that on popping we see that ON(B,C), CLEAR(B) and ARMEMPTY are true in our current world, so we don't do anything.
11. Now again pop the stack.
When we do that we will get an action, so just apply the action to the current world and add that action to the plan list. Plan = { UNSTACK(B,C) }
Goal Stack Planning
12. Again pop an element. Now it is STACK(B,D), which is an action, so apply it to the current state and add it to the PLAN. PLAN = { UNSTACK(B,C), STACK(B,D) }
13. Now the stack will look like the one given below and our current world is like the one above.
Goal Stack Planning
14. Again pop the stack. The popped element is a predicate and it is not true in our current world, so push the relevant action into the stack.
15. STACK(C,A) is now pushed into the stack; next push the individual preconditions of the action into the stack.
16. Now pop the stack. We will get CLEAR(A), and it is true in our current world, so do nothing. The next element that is popped is HOLDING(C), which is not true, so push the relevant action into the stack.
17. In order to achieve HOLDING(C) we have to push the action PICKUP(C) and its individual preconditions into the stack.
Goal Stack Planning
18. Now, on popping, we will get ONTABLE(C), which is true in our current world. Next CLEAR(C) is popped, and that also is achieved. Then PICKUP(C) is popped, which is an action, so apply it to the current world and add it to the PLAN. The world model and stack will look like below.
PLAN = { UNSTACK(B,C), STACK(B,D), PICKUP(C) }
19. Again POP the stack; we will get STACK(C,A), which is an action, so apply it to the world and insert it into the PLAN.
PLAN = { UNSTACK(B,C), STACK(B,D), PICKUP(C), STACK(C,A) }
Goal Stack Planning
20. Now pop the stack; we will get CLEAR(C), which is already achieved in our current situation, so we don't need to do anything.
At last, when we pop the remaining element, we get all three sub-goals, which are now true, and our PLAN contains all the necessary actions to achieve the goal.
PLAN = { UNSTACK(B,C), STACK(B,D), PICKUP(C), STACK(C,A) }
Artificial Neural Networks
The term "Artificial Neural Network" is derived from biological neural networks, which make up the structure of the human brain.
Similar to the human brain
that has neurons
interconnected to one another,
artificial neural networks also
have neurons that are
interconnected to one another
in various layers of the
networks. These neurons are
known as nodes.
Artificial Neural Networks
biological neuron (left) and a common mathematical model (right)
Artificial Neural Networks
 The basic unit of computation in a neural network is the neuron, often called a node or unit.
It receives input from some other nodes, or from an external source and computes an output.
 Each input has an associated weight (w), which is assigned on the basis of its relative
importance to other inputs. The node applies a function to the weighted sum of its inputs.
The idea is that the synaptic strengths (the weights w) are learnable and control the strength and direction of the influence of one neuron on another: excitatory (positive weight) or inhibitory (negative weight).
In the basic model, the dendrites carry the signal to the cell body where they all get
summed. If the final sum is above a certain threshold, the neuron can fire, sending a spike
along its axon.
 In the computational model, we assume that the precise timings of the spikes do not matter,
and that only the frequency of the firing communicates information.
 We model the firing rate of the neuron with an activation function (e.g. the sigmoid function), which represents the frequency of the spikes along the axon.
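As a concrete illustration of this computational model, here is a minimal Python sketch of a single neuron (weighted sum plus sigmoid); the numbers are made-up examples:

```python
# A single artificial neuron: weighted sum of inputs, then a sigmoid activation.
import math

def neuron(x, w, b):
    s = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted sum plus bias
    return 1.0 / (1.0 + math.exp(-s))              # sigmoid "firing rate"

print(neuron(x=[0.5, -1.0, 2.0], w=[0.4, 0.7, -0.2], b=0.1))
```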
Artificial Neural Networks and the Brain
Artificial neural networks do not work like our brain; an ANN is only a simple, crude analogy, and the connections between biological neurons are much more complex than those implemented by artificial neural network architectures.
Remember, our brain is much more complex, and there is more we need to learn from it. There are many things we don't know about our brain, and this also makes it hard to know how we should model an artificial brain to reason at a human level.
 Whenever we train a neural network, we want our model to learn;
 the optimal weights (w)
 that best predicts the desired outcome (y)
 given the input signals or information (x).
The architecture of an artificial neural network
To understand the concept of the architecture of an artificial neural network, we have to
understand what a neural network consists of.
 A neural network consists of a large number of artificial neurons, termed units, arranged in a sequence of layers.
The architecture of an artificial neural network
1. Input nodes (input layer): No computation is done within this layer; the nodes just pass the information to the next layer (most of the time a hidden layer). A block of nodes is also called a layer.
2. Hidden nodes (hidden layer): Hidden layers are where intermediate processing or computation is done; they perform computations and then transfer the weighted signals (information) from the input layer to the following layer (another hidden layer or the output layer). It is possible to have a neural network without a hidden layer.
3. Output Nodes (output layer): Here we finally use an activation function that maps to the desired
output format (e.g. softmax for classification).
4. Connections and weights: The network consists of connections, each connection transferring the output of a neuron i to the input of a neuron j. In this sense, i is the predecessor of j and j is the successor of i. Each connection is assigned a weight Wij.
The architecture of an artificial neural network
5. Activation function:
 The activation function of a node defines the output of that node given an input or set of inputs.
Eg: A standard computer chip circuit can be seen as a digital network of activation functions that can
be “ON” (1) or “OFF” (0), depending on input.
This is similar to the behavior of the linear perceptron in neural networks. However, it is the
nonlinear activation function that allows such networks to compute nontrivial problems using only a small
number of nodes. In artificial neural networks this function is also called the transfer function.
6. Learning rule: The learning rule is a rule or an algorithm which modifies the parameters of the
neural network, in order for a given input to the network to produce a favored output.
This learning process typically amounts to modifying the weights and thresholds.
Types of Neural Networks
1. Feedforward Neural Network:
A feedforward neural network is an artificial neural network where
connections between the units do not form a cycle.
In this network, the information moves in only one direction, forward, from
the input nodes, through the hidden nodes (if any) and to the output nodes.
There are no cycles or loops in the network.
We can distinguish three types of feedforward neural networks:
1.1. Single-layer Perceptron:
 This is the simplest feedforward neural Network and does not contain any
hidden layer, which means it only consists of a single layer of output nodes.
It is said to be single-layer because, when counting layers, we do not include the input layer; the reason is that no computation is done at the input layer, and the inputs are fed directly to the outputs via a series of weights.
Types of Neural Networks
1.2. Multi-layer perceptron (MLP):
This class of networks consists of multiple layers
of computational units, usually interconnected in a
feed-forward way.
 Each neuron in one layer has directed
connections to the neurons of the subsequent layer.
 In many applications the units of these networks
apply a sigmoid function as an activation function.
MLPs are much more useful, and one good reason is that they are able to learn non-linear representations.
Types of Neural Networks
1.3. Convolutional Neural Network (CNN):
Convolutional Neural Networks are very similar to
ordinary Neural Networks, they are made up of neurons
that have learnable weights and biases.
In convolutional neural network (CNN, or ConvNet or
shift invariant or space invariant) the unit connectivity
pattern is inspired by the organization of the visual
cortex, units respond to stimuli in a restricted region of
space known as the receptive field.
Receptive fields partially overlap, over-covering the
entire visual field. Unit response can be approximated
mathematically by a convolution operation. They are
variations of multilayer perceptrons that use minimal
preprocessing.
They are widely applied in image and video recognition, recommender systems, and natural language processing. CNNs require large amounts of data to train on.
Types of Neural Networks
2. Recurrent neural networks:
In recurrent neural network (RNN), connections between units form a directed cycle (they
propagate data forward, but also backwards, from later processing stages to earlier stages).
 This allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks,
RNNs can use their internal memory to process arbitrary sequences of inputs.
This makes them applicable to tasks such as unsegmented, connected handwriting
recognition, speech recognition and other general sequence processors.
Commonly used activation functions
 Every activation function (or non-linearity) takes a single number and performs a certain
fixed mathematical operation on it.
Activation functions, also known as transfer functions, are used to map inputs to outputs in a certain fashion.
They are used to impart non-linearity.
 Here are some activations functions you will often find in practice:
1. Sigmoid
2. Tanh
3. ReLU
4. Leaky ReLU
Commonly used activation functions
 Identity or linear activation function :-
→ F(x) = x
→ We will get the exact same curve.
→ Input maps to same output.
 Binary Step:-
→ Very useful in classifiers.
Commonly used activation functions
 Logistic or Sigmoid:-
→ Maps any sized inputs to outputs in range [0,1].
→ Useful in neural networks.
 Tanh:-
→ Maps input to output ranging in [-1,1].
→Similar to sigmoid function except it maps output
in [-1,1] whereas sigmoid maps output to [0,1].
Commonly used activation functions
Rectified Linear Unit (ReLU):-
→ It removes the negative part of the function (negative inputs map to zero).
Leaky ReLU:-
→ The only difference between ReLU and Leaky ReLU is that Leaky ReLU does not completely remove the negative part; it just lowers its magnitude.
Commonly used activation functions
Softmax:-
→ The softmax function is used to produce probabilities: when you have more than one output, it gives a probability distribution over the outputs.
→ Useful for finding the most probable output relative to the other outputs.
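For reference, here is a small NumPy sketch of the activation functions listed above; the vectorized style and the 0.01 slope for Leaky ReLU are common but arbitrary choices:

```python
# Common activation functions, vectorized with NumPy.
import numpy as np

def identity(x):    return x
def binary_step(x): return np.where(x >= 0, 1, 0)
def sigmoid(x):     return 1.0 / (1.0 + np.exp(-x))
def tanh(x):        return np.tanh(x)
def relu(x):        return np.maximum(0, x)
def leaky_relu(x, alpha=0.01): return np.where(x > 0, x, alpha * x)
def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```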
Representation of ANN
To make things clearer, let's understand ANNs using a simple example:
A bank wants to assess whether to approve a customer's loan application, so it wants to predict whether the customer is likely to default on the loan. It has data like the following;
Key Points related to the architecture
1. The network architecture has an input layer, hidden layer (there can be more than 1) and the output layer.
It is also called MLP (Multi Layer Perceptron) because of the multiple layers.
2. The hidden layer can be seen as a "distillation layer" that distills some of the important patterns from the inputs and passes them on to the next layer. It makes the network faster and more efficient by identifying only the important information from the inputs and leaving out the redundant information.
3. The activation function serves two notable purposes:
- It captures non-linear relationship between the inputs
- It helps convert the input into a more useful output.
In the above example, the activation function used is sigmoid;
O1 = 1 / (1+exp(-F))
Where F = W1*X1 + W2*X2 + W3*X3
Sigmoid activation function creates an output with values between 0 and 1. There can be other activation
functions like Tanh, softmax and RELU.
Key Points related to the architecture
4. Similarly, the hidden layer leads to the final prediction at the output layer:
O3 = 1 / (1+exp(-F1))
Where F1 = W7*H1 + W8*H2
Here, the output value (O3) is between 0 and 1. A value closer to 1 (e.g. 0.75) indicates a higher likelihood of the customer defaulting.
5. The weights W represent the importance associated with the inputs. If W1 is 0.56 and W2 is 0.92, then higher importance is attached to X2 (Debt Ratio) than to X1 (Age) in predicting H1.
6. The above network architecture is called a "feed-forward network", as you can see that input signals flow in only one direction (from inputs to outputs). We can also create "feedback networks", where signals flow in both directions.
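To tie the formulas together, here is a hedged forward-pass sketch for the loan example; every numeric value (inputs and weights) is an invented placeholder, and the second hidden unit's weights W4-W6 are assumed by analogy with the slides' H1 formula:

```python
# A forward pass matching O1 = sigmoid(F) and O3 = sigmoid(F1) above.
import math

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

X1, X2, X3 = 35.0, 0.42, 1.0          # e.g. Age, Debt Ratio, a third input (made up)
W1, W2, W3 = 0.56, 0.92, -0.30        # weights into hidden unit H1 (made up)
W4, W5, W6 = 0.10, -0.45, 0.77        # weights into hidden unit H2 (assumed)
W7, W8     = 1.20, -0.80              # weights from H1, H2 to the output O3

H1 = sigmoid(W1*X1 + W2*X2 + W3*X3)   # the O1 of the slides
H2 = sigmoid(W4*X1 + W5*X2 + W6*X3)
O3 = sigmoid(W7*H1 + W8*H2)           # value in (0,1): likelihood of default
print(O3)
```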
Key Points related to the architecture
7. A good model with high accuracy gives predictions that are very close to the actual values.
So, in the table above, Column X values should be very close to Column W values. The error in
prediction is the difference between column W and column X;
Key Points related to the architecture
8. The key to getting a good model with accurate predictions is to find the optimal values of the weights W that minimize the prediction error. This is achieved by the back-propagation algorithm, and it is what makes an ANN a learning algorithm: by learning from its errors, the model improves.
9. The most common optimization method is "gradient descent", where different values of W are tried iteratively and the prediction errors are assessed. To get the optimal W, the values of W are changed in small amounts and the impact on the prediction error is assessed. Finally, those values of W are chosen as optimal for which further changes in W no longer reduce the error.
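As a toy illustration of the idea, here is a minimal Python sketch of gradient descent on a single weight w with a squared-error loss; the data, learning rate and step count are made-up values, not the bank example itself:

```python
# Gradient descent on one weight w, minimizing sum((w*x - t)^2).
def gradient_descent(xs, ts, w=0.0, lr=0.01, steps=200):
    for _ in range(steps):
        # derivative of the squared error with respect to w
        grad = sum(2 * (w * x - t) * x for x, t in zip(xs, ts))
        w -= lr * grad          # move a small step against the gradient
    return w

print(gradient_descent(xs=[1, 2, 3], ts=[2, 4, 6]))   # converges near w = 2
```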
Perceptron Learning Rule
Perceptron learning rule – Network starts its learning by assigning a random value to each
weight.
i) Each connection in a neural network has an associated weight, which changes in the course of learning. In this example of supervised learning, the network starts its learning by assigning a random value to each weight.
ii) Calculate the output value on the basis of a set of records for which we know the expected output value. This set of records is called the learning sample.
iii) The network then compares the calculated output value with the expected value, and then calculates an error function ∈, which can be the sum of squares of the errors occurring for each individual record in the learning sample.
Case of binary classification in Perceptron
 Imagine we have a binary classification problem at
hand, and we want to use a perceptron to learn this task.
 So, the perceptron can produce 2 values: +1 / -1 where
+1 means that the input example belongs to the + class,
and -1 means the input example belongs to the – class.
Obviously, as we have 2 classes, we would want to learn
the weight vector of our perceptron in such a way that,
for every training example (depending on whether it
belongs to the + / – class), the perceptron would produce
the correct +1 / -1.
NOTE: We define which class is + and which is -! Moreover, we can train the perceptron and find a weight vector that
produced +1 for – class and -1 for + class! It doesn’t really matter, as long as the perceptron can generate
2 different outputs for the instances that belong to class + / -. This is how you can measure the separating and classification power of the perceptron.
Working of Perceptron learning algorithm
1. Consider supervised learning here, which means that we know the true class labels for every
training example in our training set. As a result, in the perceptron training rule, we would initialize
the weights at random and then feed the training examples into our perceptron and look at the produced output, which can be either +1 or -1.
2. So, we would want the perceptron to produce +1 for one class and -1 for the other. After
observing the output for a given training example, we will NOT modify the weights unless the
produced output was wrong!
3. For example, if we want to produce +1 for + class and -1 for the – class, and if we fed an instance
of the – class and the perceptron returned +1, then it means that we need to modify the
parameters of our network, i.e., the weights.
4. We will keep repeating this process, iterating through the training set, until the perceptron classifies all the training examples correctly.
How do we update the weights?
 At every step of feeding a training example, when the perceptron fails to produce the correct +1/-1, we revise every weight wi associated with every input xi according to the following rule: wi = wi + Δwi, where Δwi = η(t – o)xi
The variables here are described as follows:
1. Δwi : How much we should change the value of the weight. In other words, this is the amount added to the old value of wi to update it. It can be positive or negative, meaning we might increase or decrease wi.
2. η : The learning rate, or step size. We tend to choose a small value for this: if it is too big we will never converge, and if it is too small we will take forever to converge to the correct weight vector and a decent classifier. The step size simply moderates the weight updates so that they do not make an aggressive change to the old values of the weights.
3. t : The ground-truth label that we have for every training example in our training set. For this classification task, as our perceptron can produce either +1 or -1, we take t to be +1 for the positive examples and -1 for the negative examples. We then train our classifier to produce the correct +1 and -1 for the + and – examples (we determine which class is + and which class is –).
4. o : The output of our model, which in this case can be either +1 or -1.
5. xi : The i-th component of our input training example, which is connected to the weight wi.
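A quick numeric illustration of one update, with made-up values for η, t, o, xi and the old weight:

```python
# One application of the rule Δwi = η(t − o)·xi with made-up numbers:
# target t = +1, predicted o = −1, learning rate η = 0.1, input xi = 0.5.
eta, t, o, xi = 0.1, 1, -1, 0.5
wi = 0.3                       # old weight (made up)
delta = eta * (t - o) * xi     # = 0.1 * 2 * 0.5 = 0.1
wi = wi + delta                # new weight = 0.4
print(delta, wi)
```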
The Intuition Behind the Perceptron Training Rule
Suppose our perceptron correctly classified a training example! Then clearly, we know that we will not
need to change the weights of our perceptron! But does our learning rule confirm this as well?
 If the example has been classified correctly, then it means that (t – o) is 0! Why? Because when an
example is classified correctly, the output of our perceptron is for sure equal to our ground truth, i.e., o = t!
 Now let’s say the correct class was indeed the positive class where t =1, but our perceptron predicted the
negative class, that is the output is -1, o = -1.
So, looking at the figure of our perceptron, and knowing that for this particular example our perceptron
has made a mistake, we realize that we need to change the weights in such a way that the output o would
get closer to t. This means that we need to increase the value of the output, o.
So, it seems that we need to increase the weights in such a way that w.x would increase! This way, if our
input data are all positive xi > 0, then for sure increasing wi will bring the perceptron closer to correctly
classifying this particular training example!
Now, would you say our training rule would also follow our logic? Meaning, would it increase the wi? Well,
in this case (t – o), η, and xi are all positive, so Δwi is also positive, which means that we are increasing the
old value of wi positively.
Perceptron Learning Algorithm
 Steps for binary classification problem:
1. Add an extra component with the value 1 to each input
vector. This is the bias term.
2. Pull the training samples, and run each one through the
classifier.
3. If the output is correct, leave the weights alone.
4. If the output is incorrect, and a false negative (gives 0
when should give 1), add the input vector to the weights
vector.
5. If the output is incorrect, and a false positive (gives 1
when it should give 0), subtract the input vector from
the weights vector.
In the perceptron model, inputs can be real numbers, while the output from the model is still binary {0, 1}. The perceptron takes the input x; if the weighted sum of the inputs is greater than the threshold b, the output is 1, otherwise the output is 0.
A sketch of this training procedure is given below.
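Below is a small Python sketch of the {0,1} training procedure listed above; the AND-gate data set, the epoch count and the NumPy encoding are illustrative choices, not part of the slides:

```python
# Perceptron training with the add/subtract rule described above.
import numpy as np

def train_perceptron(X, y, epochs=20):
    X = np.hstack([X, np.ones((len(X), 1))])   # step 1: append the bias component 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):                 # step 2: run each sample through
            o = 1 if np.dot(w, x) > 0 else 0
            if o == t:                         # step 3: correct output, leave weights
                continue
            w += x if t == 1 else -x           # steps 4/5: add or subtract the input
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                     # AND gate (made-up example)
print(train_perceptron(X, y))
```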
Advantages of Neural Networks
1) Store information on the entire network
Unlike traditional programming, information is stored on the whole network rather than in a database. If a few pieces of information disappear from one place, it does not stop the whole network from functioning.
2) The ability to work with insufficient knowledge:
After training, an ANN may produce output even when the input data is incomplete or insufficient. The importance of the missing information determines how much performance is lost.
3) Good fault tolerance:
The output generation is not affected by the corruption of one or more than one cell of
artificial neural network. This makes the networks better at tolerating faults.
Advantages of Neural Networks
4) Distributed memory:
For an artificial neural network to become able to learn, it is necessary to outline the examples and to
teach it according to the output that is desired by showing those examples to the network. The
progress of the network is directly proportional to the instances that are selected.
5) Gradual corruption:
A network degrades and slows down gradually over time; it does not corrode immediately.
6) Ability to train machine:
ANN learn from events and make decisions through commenting on similar events.
7) The ability of parallel processing:
These networks have numerical strength which makes them capable of performing more than one
function at a time.
Applications of Neural Networks
Handwriting Recognition
Neural networks are used to convert handwritten characters into digital characters that a machine can
recognize.
Stock-Exchange prediction
The stock exchange is affected by many different factors, making it difficult to track and difficult to
understand. However, a neural network can examine many of these factors and predict the prices daily,
which would help stockbrokers.
Traveling salesman problem
This application refers to finding an optimal path to travel between cities in a given area. Neural networks
help solve the problem of providing higher revenue at minimal costs.
Image compression
The idea behind neural-network data compression is to store, encode, and recreate the actual image. Therefore, we can optimize the size of our data using image-compression neural networks, making this an ideal application for saving and optimizing memory.
Types of Neuron Connection architecture
There exist five basic types of neuron connection architecture :
1. Single-layer feed-forward network
2. Multilayer feed-forward network
3. Single node with its own feedback
4. Single-layer recurrent network
5. Multilayer recurrent network
Single-layer feed-forward network
In this type of network, we have only two layers, the input layer and the output layer, but the input layer does not count because no computation is performed in it.
The output layer is formed when different weights are applied to the input nodes and the cumulative effect per node is taken.
After this, the neurons of the output layer collectively compute the output signals.
Multilayer feed-forward network
This network also has a hidden layer that is internal to the network and has no direct contact with the external layer.
The existence of one or more hidden layers makes the network computationally stronger. It is a feed-forward network because information flows through the input function and the intermediate computations used to define the output Z.
There are no feedback connections in
which outputs of the model are fed back
into itself.
Single node with its own feedback
When outputs can be directed back as
inputs to the same layer or preceding
layer nodes, then it results in feedback
networks.
 Recurrent networks are feedback
networks with closed loops.
The figure shows a single recurrent
network having a single neuron with
feedback to itself.
Single-layer recurrent network
The network is a single-layer network with
a feedback connection in which the
processing element’s output can be directed
back to itself or to another processing
element or both.
A recurrent neural network is a class of
artificial neural networks where connections
between nodes form a directed graph along a
sequence.
This allows it to exhibit dynamic temporal
behavior for a time sequence. Unlike
feedforward neural networks, RNNs can use
their internal state (memory) to process
sequences of inputs.
Multilayer recurrent network
 In this type of network, a processing element's output can be directed to processing elements in the same layer and in the preceding layer, forming a multilayer recurrent network.
They perform the same task for every
element of a sequence, with the output being
dependent on the previous computations.
Inputs are not needed at each time step.
The main feature of a Recurrent Neural
Network is its hidden state, which captures
some information about a sequence.
Multilayer Perceptron Example
Given a set of features X = (x1, x2, ...) and a target y, a Multi Layer Perceptron can learn the relationship between
the features and the target, for either classification or regression.
Let's take an example to understand Multi Layer Perceptrons better. Suppose we have the following student-marks dataset;
i) The two input columns show the number of hours the
student has studied and the mid term marks obtained by the
student.
ii) The Final Result column can have two values 1 or 0 indicating
whether the student passed in the final term. For example,
we can see that if the student studied 35 hours and had
obtained 67 marks in the mid term, he / she ended up
passing the final term.
iii) Now, suppose, we want to predict whether a student
studying 25 hours and having 70 marks in the mid term will
pass the final term.
Multilayer Perceptron Example
Training our MLP: The Back-Propagation Algorithm:
The process by which a Multi Layer Perceptron learns is called the Backpropagation algorithm. BackProp
is like "learning from mistakes". The supervisor corrects the ANN whenever it makes mistakes.
BackProp Algorithm:
1. Initially all the edge weights are randomly assigned. For every input in the training dataset, the ANN is
activated and its output is observed.
2. This output is compared with the desired output that we already know, and the error is "propagated"
back to the previous layer.
3. This error is noted and the weights are "adjusted" accordingly. This process is repeated until the
output error is below a predetermined threshold.
4. Once the above algorithm terminates, we have a "learned" ANN, which we consider ready to work with "new" inputs. This ANN is said to have learned from several examples (labeled data) and from its mistakes (error propagation).
A minimal sketch of this procedure is given below.
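The following is a minimal NumPy sketch of back-propagation for a tiny 2-2-1 network in the spirit of the student-marks example; the data values, labels, architecture, learning rate and the omission of bias terms are all simplifying assumptions made for illustration, not the slides' exact setup:

```python
# A minimal 2-2-1 MLP trained with back-propagation (mean-squared error).
import numpy as np

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

X = np.array([[35, 67], [12, 75], [16, 89], [45, 56], [10, 90]]) / 100.0  # scaled inputs (made up)
y = np.array([[1], [0], [1], [1], [0]])                                   # made-up pass/fail labels

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))   # random initial weights

for _ in range(5000):
    # forward pass
    H = sigmoid(X @ W1)
    O = sigmoid(H @ W2)
    # backward pass: propagate the error and adjust the weights
    dO = (O - y) * O * (1 - O)
    dH = (dO @ W2.T) * H * (1 - H)
    W2 -= 0.5 * H.T @ dO
    W1 -= 0.5 * X.T @ dH

print(np.round(O, 2))                                          # predictions after training
print(sigmoid(sigmoid(np.array([[0.25, 0.70]]) @ W1) @ W2))    # student: 25 hours, 70 marks
```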
Thank you
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 

Unit 5 Introduction to Planning and ANN.pptx

  • 1. Sanjivani Rural Education Society’s Sanjivani College of Engineering, Kopargaon-423603 (AnAutonomous Institute Affiliated to Savitribai Phule Pune University, Pune) NAAC ‘A’GradeAccredited, ISO 9001:2015 Certified Department of Information Technology (NBA Accredited) Department:- Information Technology Name of Subject:- Artificial Intelligence Class:- TYIT Subject Code:- IT313 Sanjivani College of Engineering, Kopargaon Dept of Information Technology
  • 2. Course Objectives: 1. To understand the basic principles of Artificial Intelligence 2. To provide an understanding of uninformed search strategies. 3. To provide an understanding of informed search strategies. 4. To study the concepts of Knowledge based system. 5. To learn and understand use of fuzzy logic and neural networks. 6. To learn and understand various application domain of Artificial Intelligence. Sanjivani College of Engineering, Kopargaon Dept of Information Technology
  • 3. Planning in AI
 We require a domain description, a task specification, and a goal description for any planning system. Planning in artificial intelligence is about the decision-making actions performed by robots or computer programs to achieve a specific goal.
 Execution of the plan is about choosing a sequence of tasks with a high probability of accomplishing a specific task.
 A plan is a sequence of actions, and each action has preconditions that must be satisfied before it can be executed, and effects that can be positive or negative.
 Planning systems do the following:
  divide-and-conquer;
  relax the requirement for sequential construction of solutions.
 We have Forward State Space Planning (FSSP) and Backward State Space Planning (BSSP) at the basic level.
 Problem solving vs. Planning:
 States - Problem solving: data structures; Planning: logical sentences
 Actions - Problem solving: code; Planning: preconditions/outcomes
 Goal - Problem solving: code; Planning: logical sentences
 Plan - Problem solving: sequence from S0; Planning: constraints on actions
  • 4. Types of Planning
 1. Forward State Space Planning (FSSP): FSSP behaves in the same way as forward state-space search. Given an initial state S in any domain, we perform the necessary actions and obtain a new state S' (which also contains some new terms), called a progression. This continues until we reach the target state. The actions applied must be applicable in the current state.
 Disadvantage: large branching factor.
 Advantage: the algorithm is sound.
 2. Backward State Space Planning (BSSP): BSSP behaves similarly to backward state-space search. Here we move from the target state g to sub-goals, tracing back the action needed to achieve each goal. This process is called regression (going back to the previous goal or sub-goal). These sub-goals should also be checked for consistency. The actions chosen must be relevant in this case.
 Disadvantage: the algorithm is not sound (inconsistencies can sometimes be found).
 Advantage: small branching factor (much smaller than FSSP).
 So, for an efficient planning system, we need to combine the features of FSSP and BSSP.
  • 5. Block-world planning problem
 When two sub-goals, G1 and G2, are given, a non-interleaved planner either produces a plan for G1 that is combined with a plan for G2, or vice versa.
 In the block-world problem, three blocks labelled 'A', 'B' and 'C' are allowed to rest on a flat surface. The given condition is that only one block can be moved at a time to achieve the goal. The start position and target position are shown in the following diagram.
 Components of the planning system. The plan includes the following important steps:
 1. Choose the best rule to apply next, based on the best available guess.
 2. Apply the chosen rule to compute the new problem state.
 3. Detect when a solution has been found.
 4. Detect dead ends so they can be discarded and the system's effort directed in more useful directions.
 5. Detect when an almost-correct solution has been found.
 Target stack plan:
 1. It is one of the most important planning algorithms, used by STRIPS.
 2. Stacks are used in the algorithm to capture the actions and complete the goal. A knowledge base holds the current situation and the actions.
 3. A target stack is similar to a node in a search tree, where branches are created by the choice of action.
  • 6. The important steps of the goal-stack algorithm are mentioned below:
 1. Start by pushing the original goal onto the stack. Repeat the following until the stack is empty. If the stack top is a compound goal, push its unsatisfied sub-goals onto the stack.
 2. If the stack top is a single unsatisfied goal, replace it with an action and push the action's preconditions onto the stack so that they can be satisfied.
 3. If the stack top is an action, pop it off the stack, execute it, and update the knowledge base with the action's effects. If the stack top is a satisfied goal, simply pop it off the stack.
 Non-linear Planning: This planning is used to set a goal stack and is included in the search space of all possible sub-goal orderings. It handles goal interactions by the interleaving method.
 Advantage of non-linear planning: it may produce an optimal solution with respect to plan length.
 Disadvantage of non-linear planning: it requires a larger search space, since all possible goal orderings are considered.
 Algorithm:
 1. Choose a goal 'g' from the goal set.
 2. If 'g' does not match the state, then
    i. Choose an operator 'o' whose add-list matches goal 'g'
    ii. Push 'o' onto the OpStack
    iii. Add the preconditions of 'o' to the goal set
 3. While all preconditions of the operator on top of the OpStack are met in the state:
    i. Pop operator 'o' from the top of the OpStack
    ii. state = apply(o, state)
    iii. plan = [plan, o]
  • 7. Block world problem using FOL
 In the block world problem, a state is described by a set of predicates representing the facts that are true in that state.
 For every action, one must describe each of the changes it makes to the state description; in addition, a statement that everything else remains unchanged is also necessary.
 The robot can perform four types of operations in the block world environment:
 1. UNSTACK (X, Y) [US (X, Y)]: Pick up X from its current position on block Y. The arm must be empty and X must have no block on top of it.
 2. STACK (X, Y) [S (X, Y)]: Place block X on block Y. The arm must be holding X and the top of Y must be clear.
 3. PICKUP (X) [PU (X)]: Pick up X from the table and hold it. Initially the arm must be empty and the top of X must be clear.
 4. PUTDOWN (X) [PD (X)]: Put block X down on the table. The arm must have been holding block X.
 Along with the operations, some predicates are used to describe the environment clearly:
 ON(X, Y) - block X is on block Y.
 ONT(X) - block X is on the table.
 CL(X) - the top of X is clear.
 HOLD(X) - the robot arm is holding X.
 AE - the robot arm is empty.
 Logical statements true in this block world:
 1. Holding X means the arm is not empty: (∃X) HOLD(X) → ~AE
 2. X being on the table means that X is not on top of any block: (∀X) ONT(X) → ~(∃Y) ON(X, Y)
 3. Any block with no block on it has a clear top: (∀X) (~(∃Y) ON(Y, X)) → CL(X)
  • 8. STRIPS
 STRIPS stands for "STanford Research Institute Problem Solver". It was the planner used in Shakey, one of the first robots built using AI technology. STRIPS is an action-centric representation: for each action it specifies the effects of that action.
 A STRIPS planning problem specifies:
  an initial state S,
  a goal G,
  a set of STRIPS actions.
 The STRIPS representation of an action consists of three lists:
 1. The precondition list contains the predicates which must be true before the operation.
 2. The ADD list contains the predicates which become true after the operation.
 3. The DELETE list contains the predicates which are no longer true after the operation.
 Predicates not included in either the ADD or DELETE list are assumed to be unaffected by the operation. Frame axioms are therefore specified implicitly in STRIPS, which greatly reduces the amount of information that must be stored.
 The action lists for the block-world operations are:
 Stack(X, Y)   - Pre: CL(Y), HOLD(X)        Del: CL(Y), HOLD(X)   Add: AE, ON(X, Y)
 UnStack(X, Y) - Pre: ON(X, Y), CL(X), AE   Del: ON(X, Y), AE     Add: HOLD(X), CL(Y)
 Pickup(X)     - Pre: ONT(X), CL(X), AE     Del: ONT(X), AE       Add: HOLD(X)
 Putdown(X)    - Pre: HOLD(X)               Del: HOLD(X)          Add: ONT(X), AE
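 To make this representation concrete, here is a minimal Python sketch (not part of the original slides; the helper names stack, unstack, pickup, putdown, applicable and apply_action are illustrative) that encodes the four block-world operators as precondition/add/delete sets and applies an action to a state represented as a set of ground predicates.

```python
# A state is a set of ground predicates, e.g. {("ON", "B", "C"), ("ONT", "C"), ("CL", "B"), ("AE",)}.
# Each operator is a dict with precondition, add and delete sets of predicates.

def stack(x, y):
    return {"name": ("STACK", x, y),
            "pre": {("CL", y), ("HOLD", x)},
            "del": {("CL", y), ("HOLD", x)},
            "add": {("AE",), ("ON", x, y)}}

def unstack(x, y):
    return {"name": ("UNSTACK", x, y),
            "pre": {("ON", x, y), ("CL", x), ("AE",)},
            "del": {("ON", x, y), ("AE",)},
            "add": {("HOLD", x), ("CL", y)}}

def pickup(x):
    return {"name": ("PICKUP", x),
            "pre": {("ONT", x), ("CL", x), ("AE",)},
            "del": {("ONT", x), ("AE",)},
            "add": {("HOLD", x)}}

def putdown(x):
    return {"name": ("PUTDOWN", x),
            "pre": {("HOLD", x)},
            "del": {("HOLD", x)},
            "add": {("ONT", x), ("AE",)}}

def applicable(action, state):
    return action["pre"] <= state                      # all preconditions hold

def apply_action(action, state):
    assert applicable(action, state)
    return (state - action["del"]) | action["add"]     # the frame axiom is implicit

# Example: unstack B from C, then stack B on D.
state = {("ON", "B", "C"), ("ONT", "C"), ("ONT", "D"), ("CL", "B"), ("CL", "D"), ("AE",)}
state = apply_action(unstack("B", "C"), state)
state = apply_action(stack("B", "D"), state)
print(("ON", "B", "D") in state)   # True
```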
  • 9. Goal Stack Planning
 Goal Stack Planning (GSP) is one of the simplest planning algorithms designed to handle problems with compound goals. It uses STRIPS as the formal language for specifying and manipulating the world it works with. The approach uses a stack for plan generation; the stack can contain sub-goals and actions described using predicates. The sub-goals can be solved one by one, in any order.
 Algorithm:
 Push the goal state onto the stack
 Push the individual predicates of the goal state onto the stack
 Loop until the stack is empty
   Pop an element E from the stack
   IF E is a predicate
     IF E is true in the current state, do nothing
     ELSE
       Push the relevant action onto the stack
       Push the individual predicates of the action's precondition onto the stack
   ELSE IF E is an action
     Apply the action to the current state
     Add the action to the plan
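 Building on the operator sketch above (it assumes stack, unstack, pickup, putdown, applicable and apply_action are in scope), the following simplified Python sketch follows this pseudocode for the block world. The "fewest unmet preconditions" heuristic stands in for the look-ahead reasoning discussed on the next slides, and re-pushing a clobbered action is a simplification; it carries no general guarantee of termination or of shortest plans, but it reproduces the worked example that follows.

```python
from itertools import permutations

def ground_actions(blocks):
    """All ground block-world actions, built with the operator helpers above."""
    acts = []
    for x in blocks:
        acts += [pickup(x), putdown(x)]
    for x, y in permutations(blocks, 2):
        acts += [stack(x, y), unstack(x, y)]
    return acts

def choose_achiever(pred, state, actions):
    """Pick an action whose add-list contains pred, preferring the fewest unmet preconditions."""
    cands = [a for a in actions if pred in a["add"]]
    return min(cands, key=lambda a: len(a["pre"] - state))

def goal_stack_plan(state, goals, blocks):
    actions = ground_actions(blocks)
    plan = []
    # Compound goal at the bottom (re-checked last), then the individual predicates;
    # the first goal in `goals` ends up on top and is handled first.
    gstack = [frozenset(goals)] + list(reversed(goals))
    while gstack:
        e = gstack.pop()
        if isinstance(e, frozenset):                  # compound goal: re-check all its parts
            if not e <= state:
                gstack += [e] + [p for p in e if p not in state]
        elif isinstance(e, tuple):                    # single predicate
            if e not in state:
                act = choose_achiever(e, state, actions)
                gstack.append(act)                    # the action sits under its preconditions
                gstack += list(act["pre"])
        elif applicable(e, state):                    # action whose preconditions still hold
            state = apply_action(e, state)
            plan.append(e["name"])
        else:                                         # preconditions were clobbered: retry
            gstack += [e] + [p for p in e["pre"] if p not in state]
    return plan, state

init = {("ON", "B", "C"), ("ONT", "A"), ("ONT", "C"), ("ONT", "D"),
        ("CL", "A"), ("CL", "B"), ("CL", "D"), ("AE",)}
goals = [("ON", "B", "D"), ("ON", "C", "A")]
plan, final = goal_stack_plan(init, goals, ["A", "B", "C", "D"])
# A valid plan such as [('UNSTACK','B','C'), ('STACK','B','D'), ('PICKUP','C'), ('STACK','C','A')];
# the exact plan can vary between runs because Python set ordering varies.
print(plan)
```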
  • 10. Implementation using Goal Stack Planning
 Let's start with the example above: the initial state is our current description of the world, and the goal state is what we have to achieve. The following list of actions can be applied to the various situations in our problem.
  • 11. Goal Stack Planning
 1. The first step is to push the goal onto the stack.
 2. Next, push the individual predicates of the goal onto the stack.
 3. Now pop an element from the stack. The popped element (indicated with a strike-through in the diagram) is ON(B,D), which is a predicate, and it is not true in our current world.
  • 12. Goal Stack Planning
 4. The next step is to push the relevant action that could achieve the sub-goal ON(B,D), namely STACK(B,D), onto the stack.
 5. Now push the preconditions of the action STACK(B,D) onto the stack. HOLDING(B) is pushed first and CLEAR(D) is pushed next, indicating that the HOLDING sub-goal is dealt with after CLEAR; this is because we are considering a block world with a single-arm robot, and everything done here depends on the robotic arm.
  • 13. Goal Stack Planning
 6. Pop an element from the stack. After popping, we see that CLEAR(D) is true in the current world model, so we don't have to do anything.
 7. So pop the stack again.
 i) The popped element is HOLDING(B), which is a predicate, and note that it is not true in our current world.
 ii) So we have to push the relevant action onto the stack. To make HOLDING(B) true there are two possible actions that can achieve it: one is PICKUP(B) and the other is UNSTACK(B,y).
 iii) To choose the best of the two available actions, we have to think ahead and, where possible, use heuristics.
 iv) For instance, if we choose PICKUP(B), then block B must first be on the table. To get it there we would have to UNSTACK(B,y), which by itself already achieves HOLDING(B), the very thing we want; using PICKUP would mean doing PUTDOWN(B), making HOLDING(B) false again, and then applying PICKUP(B) to achieve HOLDING(B) once more, which can be achieved directly by UNSTACK.
 v) So the best action is UNSTACK(B,y), and it also brings the current situation closer to the goal state. The variable y indicates the block below B.
  • 14. Goal Stack Planning
 8. Let's push the action UNSTACK(B,C) onto the stack.
 9. Now push the individual preconditions of UNSTACK(B,C) onto the stack.
 10. Pop the stack. On popping we see that ON(B,C), CLEAR(B) and ARMEMPTY are true in our current world, so we do nothing.
 11. Now pop the stack again. This time we get an action, so we apply the action to the current world and add it to the plan list.
 Plan = { UNSTACK(B,C) }
  • 15. Goal Stack Planning
 12. Again pop an element. Now it is STACK(B,D), which is an action, so apply it to the current state and add it to the plan.
 PLAN = { UNSTACK(B,C), STACK(B,D) }
 13. Now the stack looks like the one shown in the diagram and our current world is as shown above.
  • 16. Goal Stack Planning
 14. Again pop the stack. The popped element (ON(C,A)) is a predicate and it is not true in our current world, so push the relevant action onto the stack.
 15. STACK(C,A) is now pushed onto the stack, and then the individual preconditions of the action are pushed onto the stack.
 16. Now pop the stack. We get CLEAR(A), which is true in our current world, so do nothing. The next element popped is HOLDING(C), which is not true, so push the relevant action onto the stack.
 17. In order to achieve HOLDING(C), we push the action PICKUP(C) and its individual preconditions onto the stack.
  • 17. Goal Stack Planning
 18. On popping we get ONTABLE(C), which is true in our current world. Next CLEAR(C) is popped, and that is also already achieved. Then PICKUP(C) is popped; it is an action, so apply it to the current world and add it to the plan. The world model and the stack now look as shown below.
 PLAN = { UNSTACK(B,C), STACK(B,D), PICKUP(C) }
 19. Again pop the stack. We get STACK(C,A), which is an action; apply it to the world and add it to the plan.
 PLAN = { UNSTACK(B,C), STACK(B,D), PICKUP(C), STACK(C,A) }
  • 18. Goal Stack Planning
 20. Now pop the stack; we get CLEAR(C), which is already achieved in our current situation, so we don't need to do anything. Finally, when we pop the last element, we find that all three sub-goals are now true, and our plan contains all the actions necessary to achieve the goal.
 PLAN = { UNSTACK(B,C), STACK(B,D), PICKUP(C), STACK(C,A) }
  • 19. Artificial Neural Networks
 The term "Artificial Neural Network" is derived from biological neural networks, which make up the structure of the human brain. Just as the human brain has neurons interconnected with one another, artificial neural networks also have neurons, interconnected with one another in the various layers of the network. These neurons are known as nodes.
  • 20. Artificial Neural Networks
 Figure: a biological neuron (left) and a common mathematical model (right).
  • 21. Artificial Neural Networks
  The basic unit of computation in a neural network is the neuron, often called a node or unit. It receives input from some other nodes, or from an external source, and computes an output.
  Each input has an associated weight (w), which is assigned on the basis of its relative importance to the other inputs. The node applies a function to the weighted sum of its inputs. The idea is that the synaptic strengths (the weights w) are learnable and control the strength and direction of influence, excitatory (positive weight) or inhibitory (negative weight), of one neuron on another. In the basic model, the dendrites carry the signal to the cell body, where the signals are summed. If the final sum is above a certain threshold, the neuron fires, sending a spike along its axon.
  In the computational model, we assume that the precise timings of the spikes do not matter and that only the frequency of firing communicates information.
  We model the firing rate of the neuron with an activation function (e.g. the sigmoid function), which represents the frequency of the spikes along the axon.
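 As a minimal illustration of this computation, the following Python sketch implements a single neuron with a sigmoid activation; the weights, bias and inputs are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Illustrative values: two excitatory inputs and one inhibitory input.
print(neuron_output(inputs=[0.5, 0.9, 0.2], weights=[0.8, 0.4, -0.6], bias=0.1))
```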
  • 22. Artificial Neural Networks and the Brain
 Artificial neural networks do not work like our brain; an ANN is a simple, crude comparison. The connections in biological networks are much more complex than those implemented by artificial neural network architectures. Our brain is far more complex, and there is much more we need to learn from it. There are many things we do not know about the brain, and this also makes it hard to know how we should model an artificial brain that reasons at a human level.
  Whenever we train a neural network, we want our model to learn the optimal weights (w) that best predict the desired outcome (y) given the input signals or information (x).
  • 23. The architecture of an artificial neural network
 To understand the architecture of an artificial neural network, we have to understand what a neural network consists of: a large number of artificial neurons, termed units, arranged in a sequence of layers.
  • 24. The architecture of an artificial neural network
 1. Input nodes (input layer): No computation is done within this layer; the nodes just pass the information on to the next layer (most of the time a hidden layer). A block of nodes is also called a layer.
 2. Hidden nodes (hidden layer): The hidden layers are where intermediate processing or computation is done; they perform computations and then transfer the weighted signals (information) from the input layer to the following layer (another hidden layer or the output layer). It is possible to have a neural network without a hidden layer.
 3. Output nodes (output layer): Here we finally use an activation function that maps to the desired output format (e.g. softmax for classification).
 4. Connections and weights: The network consists of connections, each connection transferring the output of a neuron i to the input of a neuron j. In this sense i is the predecessor of j and j is the successor of i. Each connection is assigned a weight Wij.
  • 25. The architecture of an artificial neural network
 5. Activation function: The activation function of a node defines the output of that node given an input or set of inputs. For example, a standard computer chip circuit can be seen as a digital network of activation functions that are "ON" (1) or "OFF" (0) depending on the input. This is similar to the behaviour of the linear perceptron in neural networks. However, it is the nonlinear activation function that allows such networks to compute nontrivial problems using only a small number of nodes. In artificial neural networks this function is also called the transfer function.
 6. Learning rule: The learning rule is a rule or an algorithm which modifies the parameters of the neural network so that a given input to the network produces a favoured output. This learning process typically amounts to modifying the weights and thresholds.
  • 26. Types of Neural Networks
 1. Feedforward Neural Network: A feedforward neural network is an artificial neural network in which connections between the units do not form a cycle. In this network the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any), to the output nodes. There are no cycles or loops in the network. We can distinguish three types of feedforward neural networks:
 1.1 Single-layer Perceptron: This is the simplest feedforward neural network and does not contain any hidden layer, which means it consists of only a single layer of output nodes. It is called single-layer because when we count the layers we do not include the input layer; the reason is that no computation is done at the input layer, and the inputs are fed directly to the outputs via a series of weights.
  • 27. Types of Neural Networks
 1.2 Multi-layer Perceptron (MLP): This class of networks consists of multiple layers of computational units, usually interconnected in a feed-forward way. Each neuron in one layer has directed connections to the neurons of the subsequent layer. In many applications the units of these networks apply a sigmoid function as an activation function. MLPs are much more useful, and one good reason is that they are able to learn non-linear representations.
  • 28. Types of Neural Networks
 1.3 Convolutional Neural Network (CNN): Convolutional neural networks are very similar to ordinary neural networks; they are made up of neurons that have learnable weights and biases. In a convolutional neural network (CNN, ConvNet, or shift-invariant/space-invariant network) the unit connectivity pattern is inspired by the organization of the visual cortex: units respond to stimuli in a restricted region of space known as the receptive field. Receptive fields partially overlap, covering the entire visual field. The unit response can be approximated mathematically by a convolution operation. CNNs are variations of multilayer perceptrons that use minimal preprocessing. Their main applications are in image and video recognition, recommender systems and natural language processing. CNNs require large amounts of data to train on.
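 As a small illustration of the convolution operation mentioned above, here is a plain-Python sketch of a valid 2-D convolution (strictly, the cross-correlation commonly used in CNN layers); the 4x4 input and the 2x2 kernel are made up.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over the image and sum element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out

# Illustrative 4x4 "image" and a 2x2 edge-like kernel.
image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 2, 1, 0],
         [1, 0, 0, 2]]
kernel = [[1, -1],
          [1, -1]]
print(conv2d(image, kernel))   # 3x3 feature map
```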
  • 29. Types of Neural Networks
 2. Recurrent Neural Networks: In a recurrent neural network (RNN), connections between units form a directed cycle (they propagate data forward, but also backwards, from later processing stages to earlier stages).
  This allows the network to exhibit dynamic temporal behaviour. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition and other general sequence-processing tasks.
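 A minimal sketch of the recurrence that gives an RNN its internal memory is shown below; a single hidden unit with scalar inputs keeps the arithmetic readable, and all weights are made up.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent step: the new hidden state mixes the current input with the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

# Process a short input sequence; the hidden state carries information forward in time.
h = 0.0
for x in [0.5, -1.0, 0.25, 0.9]:
    h = rnn_step(x, h, w_x=0.7, w_h=0.4, b=0.1)
    print(round(h, 4))
```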
  • 30. Commonly used activation functions
  Every activation function (or non-linearity) takes a single number and performs a certain fixed mathematical operation on it. Activation functions, also known as transfer functions, are used to map input nodes to output nodes in a certain fashion; they are used to impart non-linearity.
  Here are some activation functions you will often find in practice:
 1. Sigmoid
 2. Tanh
 3. ReLU
 4. Leaky ReLU
  • 31. Commonly used activation functions
  Identity or linear activation function:
 → F(x) = x
 → We get exactly the same curve: the input maps to the same output.
  Binary step:
 → Very useful in classifiers.
  • 32. Commonly used activation functions
  Logistic or sigmoid:
 → Maps inputs of any magnitude to outputs in the range [0, 1].
 → Useful in neural networks.
  Tanh:
 → Maps the input to an output in the range [-1, 1].
 → Similar to the sigmoid function, except that it maps the output to [-1, 1] whereas the sigmoid maps the output to [0, 1].
  • 33. Commonly used activation functions
 Rectified Linear Unit (ReLU):
 → It removes the negative part of the function.
 Leaky ReLU:
 → The only difference between ReLU and Leaky ReLU is that Leaky ReLU does not completely remove the negative part; it just lowers its magnitude.
  • 34. Commonly used activation functions
 Softmax:
 → The softmax function is used to obtain probabilities when there is more than one output: it gives a probability distribution over the outputs.
 → Useful for finding the most probable output with respect to the other outputs.
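 For reference, the activation functions listed on the last few slides can be written in a few lines of Python (a sketch; the max-subtraction in softmax is a standard numerical-stability trick, not something the slides require):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))          # output in (0, 1)

def tanh(z):
    return math.tanh(z)                         # output in (-1, 1)

def relu(z):
    return max(0.0, z)                          # negative part removed

def leaky_relu(z, alpha=0.01):
    return z if z > 0 else alpha * z            # negative part kept, but scaled down

def softmax(zs):
    exps = [math.exp(z - max(zs)) for z in zs]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]            # a probability distribution over the outputs

print(sigmoid(0.5), tanh(0.5), relu(-2.0), leaky_relu(-2.0))
print(softmax([2.0, 1.0, 0.1]))
```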
  • 35. Representation of ANN
 To make things clearer, let's understand ANNs using a simple example: a bank wants to decide whether to approve a customer's loan application, so it wants to predict whether the customer is likely to default on the loan. It has data such as the following.
  • 36. Representation of ANN
  • 37. Key Points related to the architecture
 1. The network architecture has an input layer, a hidden layer (there can be more than one) and an output layer. It is also called an MLP (Multi Layer Perceptron) because of the multiple layers.
 2. The hidden layer can be seen as a "distillation layer" that distils some of the important patterns from the inputs and passes them on to the next layer. It makes the network faster and more efficient by identifying only the important information from the inputs and leaving out the redundant information.
 3. The activation function serves two notable purposes: it captures the non-linear relationship between the inputs, and it helps convert the input into a more useful output. In the above example, the activation function used is the sigmoid:
 O1 = 1 / (1 + exp(-F)), where F = W1*X1 + W2*X2 + W3*X3
 The sigmoid activation function produces an output with values between 0 and 1. There can be other activation functions, such as tanh, softmax and ReLU.
  • 38. Key Points related to the architecture
 4. Similarly, the hidden layer leads to the final prediction at the output layer:
 O3 = 1 / (1 + exp(-F1)), where F1 = W7*H1 + W8*H2
 Here, the output value (O3) is between 0 and 1. A value closer to 1 (e.g. 0.75) indicates a stronger indication that the customer will default.
 5. The weights W represent the importance associated with the inputs. If W1 is 0.56 and W2 is 0.92, then higher importance is attached to X2 (Debt Ratio) than to X1 (Age) in predicting H1.
 6. The above network architecture is called a "feed-forward network", since the input signals flow in only one direction (from inputs to outputs). We can also create "feedback" networks, where signals flow in both directions.
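 A small numeric sketch of the two formulas above is given below. Apart from W1 = 0.56 and W2 = 0.92, which the slide mentions, every value (the inputs, W3, the second hidden unit's weights, W7 and W8) is made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative customer features: age, debt ratio, income (scaled); most weights are hypothetical.
x1, x2, x3 = 0.35, 0.80, 0.60
w1, w2, w3 = 0.56, 0.92, -0.40                 # W1, W2 as quoted on the slide; W3 made up
h1 = sigmoid(w1 * x1 + w2 * x2 + w3 * x3)      # O1 = 1 / (1 + exp(-F)), F = W1*X1 + W2*X2 + W3*X3

# A second hidden unit with its own (made-up) weights, then the output layer.
h2 = sigmoid(0.10 * x1 - 0.55 * x2 + 0.70 * x3)
w7, w8 = 1.3, -0.8
o3 = sigmoid(w7 * h1 + w8 * h2)                # O3 = 1 / (1 + exp(-F1)), F1 = W7*H1 + W8*H2
print(round(o3, 3))   # closer to 1 means a stronger indication of default
```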
  • 39. Key Points related to the architecture
 7. A good model with high accuracy gives predictions that are very close to the actual values. So, in the table above, the Column X values should be very close to the Column W values. The error in prediction is the difference between column W and column X.
  • 40. Key Points related to the architecture
 8. The key to getting a good model with accurate predictions is to find the "optimal values of W", the weights that minimise the prediction error. This is achieved by the back-propagation algorithm, and this is what makes an ANN a learning algorithm: by learning from its errors, the model is improved.
 9. The most common optimisation algorithm is called "gradient descent", where different values of W are tried iteratively and the prediction errors assessed. To get the optimal W, the values of W are changed in small amounts and the impact on the prediction error is assessed. Finally, those values of W are chosen as optimal for which further changes in W no longer reduce the error.
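 As a minimal illustration of gradient descent, the sketch below fits a single weight w so that w*x approximates y on a made-up dataset; the learning rate and the data are illustrative.

```python
# Minimal gradient-descent sketch: repeatedly nudge w against the gradient of the squared error.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]    # made-up (x, y) pairs, roughly y = 2x

w, lr = 0.0, 0.05                               # initial weight and learning rate
for epoch in range(200):
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)   # d(error)/dw
    w -= lr * grad                              # small step in the direction that reduces the error
print(round(w, 3))                              # close to 2.0
```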
  • 41. Perceptron Learning Rule
 Perceptron learning rule: the network starts its learning by assigning a random value to each weight.
 i) Each connection in a neural network has an associated weight, which changes in the course of learning. The perceptron learning rule is an example of supervised learning; according to it, the network starts its learning by assigning a random value to each weight.
 ii) Calculate the output value on the basis of a set of records for which the expected output value is known. Because this set defines the task, it is called the learning sample.
 iii) The network then compares the calculated output value with the expected value and computes an error function ∈, which can be the sum of the squares of the errors occurring for each individual in the learning sample.
  • 42. Case of binary classification in Perceptron
  Imagine we have a binary classification problem at hand, and we want to use a perceptron to learn this task.
  The perceptron can produce two values, +1 / -1, where +1 means that the input example belongs to the + class and -1 means that it belongs to the – class. As we have two classes, we want to learn the weight vector of our perceptron in such a way that, for every training example (depending on whether it belongs to the + or – class), the perceptron produces the correct +1 / -1.
 NOTE: We define which class is + and which is –. Moreover, we could equally well train the perceptron to find a weight vector that produces +1 for the – class and -1 for the + class; it does not really matter, as long as the perceptron generates two different outputs for instances of the two classes. This is how the separating and classification power of the perceptron is measured.
  • 43. Working of Perceptron learning algorithm
 1. We consider supervised learning here, which means that we know the true class labels for every training example in our training set. In the perceptron training rule we initialise the weights at random, feed the training examples into our perceptron and look at the produced output, which can be either +1 or -1.
 2. We want the perceptron to produce +1 for one class and -1 for the other. After observing the output for a given training example, we do NOT modify the weights unless the produced output was wrong.
 3. For example, if we want to produce +1 for the + class and -1 for the – class, and we feed in an instance of the – class and the perceptron returns +1, then we need to modify the parameters of our network, i.e. the weights.
 4. We keep repeating this process, iterating through the training set, until the perceptron classifies all the training examples correctly.
  • 44. How do we update the weights?
  At every step of feeding a training example, when the perceptron fails to produce the correct +1/-1, we revise every weight wi associated with every input xi according to the following rule:
 wi = wi + Δwi, where Δwi = η(t – o)xi
 The variables here are described as follows:
 1. Δwi: how much we should change the value of the weight; in other words, the amount added to the old value of wi to update it. This can be positive or negative, meaning we might increase or decrease wi.
 2. η: the learning rate, or step size. We tend to choose a small value for it: if it is too big we will never converge, and if it is too small we will take forever to converge to the correct weight vector and obtain a decent classifier. The step size simply moderates the weight updates so that they do not make an aggressive change to the old values of the weights.
 3. t: the ground-truth label that we have for every training example in our training set. For a classification task, since our perceptron can produce either +1 or -1, we take t to be +1 for the positive examples and -1 for the negative examples, and we train the classifier to produce the correct +1 and -1 for the + and – examples (we determine which class is + and which class is –).
 4. o: the output of our model, which in this case can be either +1 or -1.
 5. xi: the i-th component of the input training example, which is connected to the weight wi.
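 The update rule translates directly into Python; in the sketch below the sign-based prediction, the learning rate and the sample numbers are illustrative.

```python
def predict(weights, x, bias):
    """Perceptron output: +1 if the weighted sum exceeds 0, otherwise -1."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else -1

def update(weights, bias, x, t, eta=0.1):
    """Apply wi <- wi + eta * (t - o) * xi to every weight (and to the bias, with xi = 1)."""
    o = predict(weights, x, bias)
    weights = [w + eta * (t - o) * xi for w, xi in zip(weights, x)]
    bias = bias + eta * (t - o)
    return weights, bias

w, b = [0.0, 0.0], 0.0
w, b = update(w, b, x=[2.0, 1.0], t=+1)   # a misclassified positive example moves the weights toward it
print(w, b)
```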
  • 45. The Intuition Behind the Perceptron Training Rule
 Suppose our perceptron correctly classified a training example. Then clearly we will not need to change its weights. But does our learning rule confirm this as well?
  If the example has been classified correctly, then (t – o) is 0. Why? Because when an example is classified correctly, the output of our perceptron is equal to the ground truth, i.e. o = t.
  Now suppose the correct class was the positive class, t = +1, but our perceptron predicted the negative class, o = -1. Looking at the figure of our perceptron, and knowing that for this particular example it has made a mistake, we realise that we need to change the weights in such a way that the output o gets closer to t. This means that we need to increase the value of the output o, so it seems we need to increase the weights in such a way that w.x increases. If our input values are all positive, xi > 0, then increasing wi will certainly bring the perceptron closer to correctly classifying this particular training example.
  Would our training rule follow this logic and increase wi? In this case (t – o), η and xi are all positive, so Δwi is also positive, which means that we are indeed increasing the old value of wi.
  • 46. Perceptron Learning Algorithm
  Steps for a binary classification problem:
 1. Add an extra component with the value 1 to each input vector; this is the bias term.
 2. Take the training samples and run each one through the classifier.
 3. If the output is correct, leave the weights alone.
 4. If the output is incorrect and is a false negative (gives 0 when it should give 1), add the input vector to the weight vector.
 5. If the output is incorrect and is a false positive (gives 1 when it should give 0), subtract the input vector from the weight vector.
 In the perceptron model the inputs can be real numbers, while the output of the model is still binary {0, 1}: given input x, if the weighted sum of the inputs is greater than the threshold b, the output is 1; otherwise the output is 0. A short sketch of this procedure is given below.
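 Here is a minimal sketch of these steps on a made-up, linearly separable dataset (the logical AND function); because the data are separable, the perceptron converges after a few passes.

```python
def predict(w, x):
    """Output 1 if w.x > 0, else 0 (x already includes the bias component 1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# Logical AND as a toy training set; each input gets an extra 1 as the bias component.
data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = [0.0, 0.0, 0.0]

for _ in range(20):                       # a few passes over the data are enough here
    for x, target in data:
        out = predict(w, x)
        if out == 0 and target == 1:      # false negative: add the input vector
            w = [wi + xi for wi, xi in zip(w, x)]
        elif out == 1 and target == 0:    # false positive: subtract the input vector
            w = [wi - xi for wi, xi in zip(w, x)]

print(w, [predict(w, x) for x, _ in data])   # learned weights and the (now correct) outputs 0,0,0,1
```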
  • 47. Advantages of Neural Networks
 1) Information is stored on the entire network: Unlike traditional programming, where information is stored in a database, in an ANN the information is distributed over the whole network. If a few pieces of information disappear from one place, the whole network does not stop functioning.
 2) The ability to work with insufficient knowledge: After training, an ANN can produce output even when the input data are incomplete or insufficient. How much performance is lost depends on the importance of the missing information.
 3) Good fault tolerance: Output generation is not affected by the corruption of one or more cells of the artificial neural network. This makes the network better at tolerating faults.
  • 48. Advantages of Neural Networks
 4) Distributed memory: For an artificial neural network to be able to learn, it is necessary to outline the examples and to teach the network according to the desired output by showing those examples to it. The progress of the network is directly proportional to the instances that are selected.
 5) Gradual corruption: A network degrades gradually and slows down over time; damage does not immediately corrode the whole network.
 6) Ability to train the machine: ANNs learn from events and make decisions by commenting on similar events.
 7) Parallel processing ability: These networks have numerical strength, which makes them capable of performing more than one function at a time.
  • 49. Applications of Neural Networks
 Handwriting recognition: Neural networks are used to convert handwritten characters into digital characters that a machine can recognise.
 Stock-exchange prediction: The stock exchange is affected by many different factors, making it difficult to track and understand. However, a neural network can examine many of these factors and predict prices daily, which can help stockbrokers.
 Travelling problem of sales professionals: This application refers to finding an optimal path for travelling between cities in a given area. Neural networks help solve this problem, providing higher revenue at minimal cost.
 Image compression: The idea behind neural-network data compression is to store, encrypt and recreate the actual image. We can therefore optimise the size of our data using image-compression neural networks; it is an ideal application for saving and optimising memory.
  • 50. Types of Neuron Connection architecture
 There exist five basic types of neuron connection architecture:
 1. Single-layer feed-forward network
 2. Multilayer feed-forward network
 3. Single node with its own feedback
 4. Single-layer recurrent network
 5. Multilayer recurrent network
  • 51. Single-layer feed-forward network
 In this type of network we have only two layers, the input layer and the output layer, but the input layer does not count because no computation is performed in it. The output layer is formed when different weights are applied to the input nodes and the cumulative effect per node is taken. After this, the neurons of the output layer collectively compute the output signals.
  • 52. Multilayer feed-forward network
 This network also has a hidden layer that is internal to the network and has no direct contact with the external layer. The existence of one or more hidden layers makes the network computationally stronger. It is a feed-forward network because information flows through the input function and the intermediate computations used to define the output Z. There are no feedback connections in which outputs of the model are fed back into the model itself.
  • 53. Single node with its own feedback
 When outputs can be directed back as inputs to nodes in the same layer or a preceding layer, the result is a feedback network. Recurrent networks are feedback networks with closed loops. The figure shows a single recurrent network having a single neuron with feedback to itself.
  • 54. Single-layer recurrent network
 This is a single-layer network with a feedback connection, in which a processing element's output can be directed back to itself, to another processing element, or to both. A recurrent neural network is a class of artificial neural networks in which connections between nodes form a directed graph along a sequence. This allows the network to exhibit dynamic temporal behaviour for a time sequence. Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs.
  • 55. Multilayer recurrent network
  In this type of network, a processing element's output can be directed to processing elements in the same layer and in the preceding layer, forming a multilayer recurrent network. Such networks perform the same task for every element of a sequence, with the output depending on the previous computations; inputs are not needed at every time step. The main feature of a recurrent neural network is its hidden state, which captures some information about the sequence.
  • 56. Multilayer Perceptron Example
 Given a set of features X = (x1, x2, ...) and a target y, a Multi Layer Perceptron can learn the relationship between the features and the target, for either classification or regression. Let's take an example to understand multi-layer perceptrons better. Suppose we have the following student-marks dataset:
 i) The two input columns show the number of hours the student has studied and the mid-term marks obtained by the student.
 ii) The Final Result column can have two values, 1 or 0, indicating whether the student passed the final term. For example, we can see that if a student studied 35 hours and obtained 67 marks in the mid-term, he or she ended up passing the final term.
 iii) Now suppose we want to predict whether a student studying 25 hours and having 70 marks in the mid-term will pass the final term.
  • 57. Multilayer Perceptron Example
 Training our MLP: the Back-Propagation Algorithm. The process by which a multi-layer perceptron learns is called the back-propagation algorithm. BackProp is like "learning from mistakes": the supervisor corrects the ANN whenever it makes a mistake.
 BackProp algorithm:
 1. Initially all the edge weights are randomly assigned. For every input in the training dataset, the ANN is activated and its output is observed.
 2. This output is compared with the desired output that we already know, and the error is "propagated" back to the previous layers.
 3. The error is noted and the weights are "adjusted" accordingly. This process is repeated until the output error is below a predetermined threshold.
 4. Once the above algorithm terminates, we have a "learned" ANN, which we consider ready to work with "new" inputs. This ANN is said to have learned from several examples (labelled data) and from its mistakes (error propagation). A small numeric sketch of this procedure is given below.
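 Below is a small numeric sketch of back-propagation for a 2-2-1 network in plain Python, trained on a made-up dataset in the spirit of the student example (hours studied and mid-term marks, scaled to [0, 1]); the architecture, data and learning rate are illustrative, not taken from the slides.

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up training data: (hours studied, mid-term marks), both scaled to [0, 1] -> pass (1) / fail (0).
data = [((0.20, 0.30), 0), ((0.35, 0.67), 1), ((0.12, 0.75), 0),
        ((0.70, 0.80), 1), ((0.45, 0.20), 0), ((0.90, 0.60), 1)]

random.seed(0)
# Two hidden units and one output unit; weights start random (step 1 of the algorithm).
w_h = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]   # per hidden unit: w1, w2, bias
w_o = [random.uniform(-0.5, 0.5) for _ in range(3)]                        # output unit: v1, v2, bias
lr = 0.5

for epoch in range(5000):
    for (x1, x2), t in data:
        # Forward pass
        h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_h]
        o = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
        # Backward pass: propagate the error from the output back to the hidden layer
        delta_o = (o - t) * o * (1 - o)
        delta_h = [delta_o * w_o[i] * h[i] * (1 - h[i]) for i in range(2)]
        # Weight adjustments (gradient descent on the squared error)
        w_o = [w_o[0] - lr * delta_o * h[0],
               w_o[1] - lr * delta_o * h[1],
               w_o[2] - lr * delta_o]
        for i in range(2):
            w_h[i] = [w_h[i][0] - lr * delta_h[i] * x1,
                      w_h[i][1] - lr * delta_h[i] * x2,
                      w_h[i][2] - lr * delta_h[i]]

# Predict for a new student (25 hours ~ 0.25, 70 marks ~ 0.70): an output near 1 suggests "pass";
# the exact value depends on the made-up data and the random initialisation.
h = [sigmoid(w[0] * 0.25 + w[1] * 0.70 + w[2]) for w in w_h]
print(round(sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2]), 3))
```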
  • 58. References
 • "Introduction to Artificial Neural Systems", Jacek M. Zurada, Jaico
  • 59. Thank you