An Incremental Machine Learning Mechanism Applied to
Robot Navigation
Nawwaf N Kharma, Majd Alwan, and Peter Y K Cheung
Department of Electrical Engineering, Imperial College of Science, Technology and Medicine,
London SW7 2BT, U.K. Fax: +44 171 581 4419
Abstract
In this paper we apply an incremental machine learning algorithm to the problem of robot navigation. The
learning algorithm is applied to a simple robot simulation to automatically induce a list of declarative rules. The
rules are pruned in order to remove the rules that are operationally useless. The final set is initially used to
control the robot navigating an obstacle-free path planned in a polygonal environment with satisfactory results.
Crisp conditions used in the rules are then replaced by fuzzy conditions fashioned by a human expert. The new
set of rules is shown to produce better results.
Keywords
incremental machine learning, tripartite rules, schema, robot navigation.
1. Introduction

Both Classifier Systems and Q-Learning techniques [1] share two major deficiencies. These are:

A. The Two-Part Rule Form

Production Rules (PR), a category that includes both classifiers and Q-Learning rules, have only two parts: a left side representing the conditions that have to be met before the right side, the action, is taken. PR in this form are therefore situation-action rules. The information in schemas (see 2.2) can be coded in a two-part PR syntax. However, this form is unsuitable for several reasons:

1. There is evidence from animal ethology [2,3] indicating that animals learn an action-result association, and that this association, as a unit, is then linked with the context.

2. The number of rules in PR systems is determined by the number of combinations of contexts, actions, and results that could make up a rule. This can result in a very large number of rules. In contrast, the schemas used in this paper are built incrementally and hence require less memory.

B. Implicit Representation of Result Values

1. Both Q-Learning and Classifier Systems assign a strength to each rule that implicitly expresses the operational value of the rule. This contrasts with schemas, which have explicit declarative result components.

2. The rule selection mechanism in PR systems chooses high-strength rules, or rules that are in the vicinity of high-strength ones. This means that learning only takes place along the fringe of the state-space that has already been connected to a goal, whereas animats should be allowed to seek knowledge that may not have immediate use for the goal at hand.

The Learning Mechanism, on the other hand, is a system that learns incrementally (i.e. every rule is built in a number of steps) using explicit units of representation (schemas). The algorithm aims to enable robots to acquire and use sensorimotor knowledge autonomously, without a priori knowledge of the skills concerned. This algorithm is based on the Schema Mechanism developed by Drescher [4]. It was altered and amended significantly in three main ways to make it more suitable for sensorimotor robotic applications. Our learning mechanism:

* Aims at the automatic induction of a list of declarative rules that describe the interaction between an agent (e.g. a robot) and its environment.

* Is simplified and made practically useful for real-time applications.

* Is amended to take into account the richness and the inherent uncertainties of the real world.

This paper has four main sections: the first presents the basic assumptions and terms that are needed to make the algorithm work; the second describes the main algorithm itself; the third outlines the specific problem and the experiments carried out; and the fourth shows and discusses the results obtained.

2. Assumptions and Terms

2.1 Basic Assumptions

The following basic assumptions are made in order for the main algorithm to work in line with expectations:

* All learned information may be put in the form of a list of declarative rules.
* The nature of the environment is static: the set of laws that govern the environment does not change over time.
* There are no hidden states; the relevant aspects of the environment are all detectable through the robot's sensors.
* Crisp conditions are initially sufficient for learning.
* Disjunctions of conjunctions of conditions are enough to characterise any state.
* The Temporal Credit Allocation problem [5] may be overlooked.
* Actions are taken serially; they are finite in duration and do not destabilise the system.
* Relevant results can be pre-defined in terms of a combination of conditions.
* There may be any number of agents in the environment; any one of them may be monitored by the learning algorithm.

2.2 Definition of Terms

• Schema and Schema Space
Fig. 1 A schema (its main body: context, action and result, together with its extended structures).

The main structure of the learning mechanism is the Schema (or rule) Space. The Schema Space is the collection of all schemas. At any time, the Schema Space of the robot represents all its knowledge. The job of the learning algorithm is simply to create, modify, delete, and possibly link schemas.

A schema representation is made of two main structures: a Main Body (which comprises a context, an action and a result) and the extended structures (see Fig. 1). The main body is a tripartite rule representing a counterfactual assertion. The extended structures keep information that is mainly used for creating (or spinning off) new schemas.

A schema has both declarative and procedural aspects. Declaratively, it is a unit of information about an interaction between the robot and the environment. Procedurally, a schema represents a possible action to be taken in situations where its context is fulfilled and its result is desired.

The components of a schema are:

• Main Body:

- Conditions: A condition may be viewed as a function representing the degree of membership (or D.O.M.) of a sensor's output in a set representing that condition. In the crisp D.O.M. case, a condition can only be either true or false.

- Context, Result and Action: A Context (and similarly a Result) is a conjunction of one or more conditions (and their negations). A Result may be either predefined or created at run-time; the contexts of reliable schemas are automatically added, at run-time, to the set of results. An Action represents a command to an effector. If an Action is taken, its command is executed.

• Extended Structures:

Each schema has extended structures that contain two main sets of correlation statistics. These statistics are necessary for the development of schemas. The first set contains the Positive Transition Correlation (PTC) and the Negative Transition Correlation (NTC), which are used to find relevant results of an action. A relevant result of an action is a result that has empirically been shown to follow the execution of that action significantly more often than it follows other actions from the robot's repertoire. The PTC discovers positive results while the NTC discovers negative ones. The second set contains the Positive Success Correlation (PSC) and the Negative Success Correlation (NSC). The PSC is used to find conditions that, when included in the context of a schema, make its result follow more reliably than before they were added. The NSC has the same function as the PSC, except that it is used to find conditions that need to be excluded from the context of a schema to make its result follow more reliably. The reliability of a schema is measured by the ratio of the number of times its action is executed, in the right context, and leads to the fulfilment of its result, to the total number of times its action is executed in the right context.

• Configuration Parameters and Others

- Result spin-off: a new schema made out of a copy of a previous one, by adding a condition to the result side.

- Context spin-off: a new schema made out of a copy of a previous one, by adding a condition to the context side.

- θ1: the relevance threshold, used for producing result spin-offs.

- N1: the total number of experiments that need to be taken before a result spin-off is allowed.

- θ2: the reliability threshold, used for producing context spin-offs.

- N2: the number of activations that a result spin-off schema needs to go through before it is allowed to produce a context spin-off.

3. The Main Algorithm

The learning mechanism is best described by explaining the main algorithm that it embodies. This algorithm goes through the following main steps:

1. Randomly select an action and execute it.
2. Use the data collected before, during and after taking the action of the schema in its context to update the two sets of correlation statistics.
3. Based on the statistics in step 2, the rule base may be updated as detailed in the algorithmic notation below.
4. Repeat steps 1 to 3 until the predetermined number of experiments is met.

The two phases of the rule base update are best described in the following algorithmic notation:

If (no. of experiments > N1
    AND PTC/NTC(Result_i) >= θ1
    AND Result_i not used before)
then Result spin-off

When the update is completed, and once the PSC and NSC are known, a context spin-off takes place according to:

If (no. of activations > N2
    AND PSC/NSC(Condition_i) >= θ2
    AND Condition_i not used before)
then Context spin-off

4. Problem and Experiments

The learning algorithm is now applied to the problem of robot navigation. The goals of this application are to:

* Show that the algorithm is capable of deducing a list of rules that (if properly pruned) can control the navigational behaviour of the robot along an obstacle-free path planned in a given environment.

* Investigate the effects of fuzzifying the context/result conditions on the execution of the deduced rule base.
4.1 The Robot Simulation and the Task to Learn

The robot has a cylindrical body and a differential drive with two independently driven motorised wheels that perform both the driving and the steering. Four castors support the mobile base on the floor (see Fig. 2).

Fig. 2 A schematic of the Mobile Robot's Base (driving wheels DW L and DW R, with four castors C).

Steering results from driving the left and right wheels at different speeds. This arrangement enables the robot to turn around its centre.

The robot is equipped with an on-board electronic compass, and with odometry, for localisation.

The robot requires two commands: linear speed and change of direction. These are separated into individual rotational speed commands for the two driving motors, which are put under closed-loop velocity control. The global position control loop is closed by the feedback coming from the localisation system.

The task we want to learn is navigating the robot along a path consisting of straight line segments. The learnt navigation rule base should be able to control the robot so that it traverses the planned path smoothly.

A simulation of the kinematics and dynamics of the described mobile robot base was used for testing the learnt control rules. The robot simulation links to the FuzzyTECH 3.1 development environment [6], where the rules and the input/output membership functions (including crisp ones) can be graphically edited.

4.2 Experimental Set-up for Learning

The learning algorithm goes through two runs: one to discover the block of rules relevant to orientation control, and a second to discover the block of rules that control the linear velocity of the robot.

The learning algorithm is configured as follows: θ1 := 2, N1 := the total number of experiments taken, θ2 := 1, N2 := 3. Negative spin-off mechanisms are disabled.

The sets of conditions and actions used are:
DirDif = {right_big, right_small, centre, left_small, left_big},
Dist = {very_near, near, medium, far},
SpIn = {zero, slow, medium, high},
DirOut = {left_far, left_slight, straight, right_slight, right_far},
SpOut = {zero, slow, medium, high}.

5. Results

5.1 Learning Algorithm Results

A series of experiments is fed to the learning algorithm. These experiments were chosen such that they cover, on a uniformly random basis, the context space of the actions concerned. The learning algorithm is run and a series of rules is produced. If the direction control action left_slight is taken as an example, the following rules are produced:

IF right_big ^ left_slight THEN right_small
IF right_small ^ left_slight THEN centre
IF centre ^ left_slight THEN left_small
IF left_small ^ left_slight THEN left_big

They were found with different reliability values, depending on the specific series of experiments taken. The rules produced by the learning algorithm are then pruned using the criteria of:

1. Relevance to the goal (heading towards the goal),
2. High reliability.

The above rules become:

IF right_big ^ left_slight THEN right_small
IF right_small ^ left_slight THEN centre

With respect to rule block 1, the final list becomes (put in operational form):

IF DirDif: right_big THEN DirOut: left_far
IF right_big THEN left_slight
IF right_small THEN left_far
IF right_small THEN left_slight
IF centre THEN straight
IF left_small THEN right_slight
IF left_small THEN right_far
IF left_big THEN right_slight
IF left_big THEN right_far

For the second block of rules, those concerned with the control of the linear velocity of the robot when heading towards a goal, a number of constraints is placed on the learning algorithm:

1. Due to inertia, the robot is prevented from taking an experiment in which the speed changes suddenly, from slow to high or from high to zero; speed can only change gradually. This corresponds to real robots with dynamics, as opposed to mere kinematic simulations.

2. We prune the first list of rules according to different criteria (from those used with the orientation block). These criteria are:

A. Highest reliability.
B. Maximum distance traversed at each step.
C. Zero speed at the goal.

This gives us the following list of rules:

IF Dist: X ^ SpIn: zero THEN SpOut: slow
IF medium ^ medium THEN medium
IF far ^ medium THEN high
IF far ^ slow THEN medium
IF far ^ high THEN high
IF medium ^ high THEN medium
IF medium ^ slow THEN medium
IF near ^ slow THEN slow
IF very_near ^ slow THEN zero
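The pruned speed rules above behave as a crisp lookup from the (Dist, SpIn) pair to SpOut. The sketch below is our own illustration (not part of the paper's FuzzyTECH implementation); the wildcard X in the first rule is handled as a catch-all on SpIn = zero.

```python
# Pruned crisp speed rules: (Dist, SpIn) -> SpOut.
SPEED_RULES = {
    ("medium", "medium"): "medium",
    ("far", "medium"): "high",
    ("far", "slow"): "medium",
    ("far", "high"): "high",
    ("medium", "high"): "medium",
    ("medium", "slow"): "medium",
    ("near", "slow"): "slow",
    ("very_near", "slow"): "zero",
}

def speed_output(dist, sp_in):
    """Crisp speed rule base: at any distance, SpIn == zero maps to slow
    (the wildcard rule), so the robot always starts off gradually."""
    if sp_in == "zero":
        return "slow"
    return SPEED_RULES.get((dist, sp_in))
```

Starting from rest far from the goal, successive lookups give slow, then medium, then high, and the output falls back through medium and slow to zero as Dist shrinks, which respects the gradual-change constraint stated above.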
Since the two blocks of rules are learned separately, their separation is also enforced during execution. This is done by adding another block of rules which makes sure that the speed rules are only active when the robot is heading towards the goal. This special block is:

IF DirOut: right_far THEN SpOut: zero
IF left_far THEN zero
IF right_slight THEN zero
IF left_slight THEN zero

This means that, at execution time, the robot first executes the first block, making sure that it is heading in the right direction, and only then does the second block start executing.

5.2 Simulation Results

Fig. 3 Navigation using Crisp Conditions.

Fig. 3 shows the robot navigating a planned path using the learnt rules with crisp membership functions for the context conditions. It is clear that the robot's centre moved off the straight path segments, due to the lack of overlap between the straight membership function and the contiguous ones, and to the width of the straight function. This is unsuitable in cluttered environments (e.g. a narrow corridor). Had the straight membership function been narrower, the robot would have swung right and left of the path in a zigzag, because exactly the same rules are activated, regardless of the required amount of direction change, whenever a change of direction is required.

Fig. 4 Navigation using Fuzzy Conditions.

However, when appropriate fuzzy membership functions replace the crisp ones, the performance of the learnt navigation rules improves significantly, as Fig. 4 shows. This is because, as the robot comes closer to the direction of the goal, the final output of the orientation control rules is reduced according to the degree of fulfilment.

6. Conclusions and Recommendations

The learning algorithm succeeded in finding the declarative rules that represent, in their totality, the interaction between the robot and the environment. Many of these rules were operationally useless and had to be pruned (according to the criteria mentioned previously). Once pruned, the resulting rules (in both crisp and fuzzy forms) were effective in controlling the robot during navigation.

We have shown that the performance of the learnt schemas improves as the context conditions are fuzzified. Hence, our future work will be to make the learning schema mechanism a fuzzy one, which would be more general and capable of learning tasks in the continuous real world. Our learning mechanism, as presented in this paper, readily allows this extension.

7. References

[1] Dorigo M. et al. (1994) "A comparison of Q-learning and classifier systems." In From Animals to Animats 3, edited by D. Cliff et al. MIT Press, Cambridge, MA.
[2] Rescorla R. (1990) "Evidence for an association between the discriminative stimulus and the response-outcome association in instrumental learning." Journal of Experimental Psychology: Animal Behavior Processes, 16, 326-334.
[3] Roitblat H. (1994) "Mechanism and process in animal behavior: models of animals, animals as models." In From Animals to Animats 3, edited by D. Cliff et al. MIT Press, Cambridge, MA.
[4] Drescher G. (1990) "Made-up Minds: A Constructivist Approach to Artificial Intelligence." MIT Press, Cambridge, MA.
[5] Holland J. H. (1992) "Adaptation in Natural and Artificial Systems." MIT Press, Cambridge, MA.
[6] FuzzyTECH 3.1 Software Manuals.