Lesson 12: Reinforcement Learning for Critterbot Science 8

Reinforcement Learning
Science 8 Unit B: Cells and Systems (Nature of Science Emphasis)

Introduction
> What does it mean to have a behaviour
reinforced?
> Let’s look at a famous example first...

Introduction
Ivan Pavlov (1849-1936)
> Born in Russia in 1849, Ivan Pavlov abandoned
a religious career for which he had been
preparing, and instead went into science.
> His study of the mechanisms underlying the
digestive system in mammals had a great impact
on the field of physiology (the study of the
mechanical, physical, and biochemical functions
of living organisms).
Source: Nobelprize.org

Introduction
> Pavlov was awarded the Nobel Prize in
Physiology or Medicine in 1904. He then
turned to studying reflexes, in particular with
dogs.
> His discoveries led to the science of
behaviour.

Introduction
> Pavlov became interested in
studying reflexes when he
noticed that dogs sometimes
drooled even without food
being shown to them.
> Although no food was in sight,
their saliva still dribbled. It
turned out that the dogs were
reacting to lab coats.

Introduction
> Every time the dogs were served
food, the person who served the food
was wearing a lab coat.
> The lab coats became a “stimulus”.

Introduction
> A stimulus is anything capable of
evoking a response in an
organism.
> Examples of stimuli include
sights, sounds, heat, cold, smells,
or other sensations.
> For Pavlov’s dogs, the lab coat was such a
stimulus: they reacted as if food was on its
way whenever they saw one.

Introduction
> In a series of experiments, Pavlov
then tried to figure out why this was
happening.
> For example, he struck a bell when
the dogs were fed. If the bell was
sounded close to meal time, the
dogs learnt to associate the sound of
the bell with food.
> After a while, the sound of the bell alone
caused them to drool.

More on Pavlov's Dog
> You can read more about Pavlov’s dog and
see if you can train a dog to drool on command
online at the Nobel Prize website.

Reinforcement Learning
> Dogs are often trained through a method of
reinforcement.
> For example, if a dog hears the word “sit”, sits,
and receives a treat, he or she will learn that
sitting earns a treat.
> In fact, almost all animals can learn through
reinforcement.

Definition:
– Reinforcement occurs when an event following a
response causes an increase in the probability of
that response occurring in the future.
> So when a dog hears “sit”, sits (response), and
receives a treat (event), the dog will be more
likely to sit in the future in hopes of receiving
another treat.
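
The definition above can be tried out with a few lines of Python. This is only a minimal sketch: the function name `reinforce`, the starting chance of 20%, and the step size of 0.1 are all made-up values for illustration, not measurements from real dogs.

```python
def reinforce(probability, rewarded, step=0.1):
    """Return the new chance of a response after one trial.

    If the response was followed by a reward, move the chance a
    fraction `step` of the way toward 1.0. Otherwise leave it unchanged.
    (The update rule and the step size are assumptions for illustration.)
    """
    if rewarded:
        return probability + step * (1.0 - probability)
    return probability


# Hypothetical dog: sits on command 20% of the time at first
# and gets a treat every time it does sit.
p_sit = 0.2
for trial in range(5):
    p_sit = reinforce(p_sit, rewarded=True)
    print(f"after treat {trial + 1}: chance of sitting = {p_sit:.2f}")
```

Each treat raises the chance of sitting a little, which is what "an increase in the probability of that response" means.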

> If animals (including humans) can learn by
reinforcement, can a machine also learn
through reinforcement?
> Computing Scientists at the Centre for
Machine Learning believe so, and they are
building a robot that learns through
reinforcement.

> The robot is called
“Critterbot”.
> The robot responds to
stimuli in the environment.
> For lessons on Critterbot
see Critterbot for Physics
30 and Critterbot for
Science 8.

How can a Machine be Reinforced?
> In reinforcement learning (a type of Machine
Learning, which is itself a type of artificial
intelligence) the “learner” is a computer that
learns by trying to obtain a maximum reward.
> So what does a computer or robot want as a
reward?
– Just a number.

> A positive reward is the number “1”.
> A neutral reward is the number “0”.
> A negative reward is the number “-1”.
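
Because the reward is just a number, it is easy to write down in code. The short Python sketch below assumes three made-up outcome labels ("good", "neutral", "bad"); the function name `reward_for` is also hypothetical and is not taken from the Critterbot software.

```python
def reward_for(outcome):
    """Translate what happened into the number the learner receives."""
    rewards = {"good": 1, "neutral": 0, "bad": -1}  # assumed labels
    return rewards[outcome]


# Hypothetical example: four things that happen to a robot in a row.
events = ["good", "neutral", "bad", "good"]
total = sum(reward_for(event) for event in events)
print("total reward:", total)
```

The learner's goal is to make the total as large as it can.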

> What separates Reinforcement Learning
from other forms of artificial intelligence is
that the learner is never told what actions to
take.
> The learner searches by trial and error: if an
action earns a positive reward, the learner
will repeat that action.
> But if it receives a negative reward, it will
learn to avoid that action.
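
Trial-and-error learning can be sketched in Python as well. This is one simple way to do it, not how Critterbot is actually programmed: the action names, the rewards in `wall_rewards`, and the settings `trials` and `explore` are all assumptions chosen for illustration. The learner is never told which action is best. It only sees the reward number after each try.

```python
import random


def learn_by_trial_and_error(reward_of, actions, trials=200, explore=0.1, seed=0):
    """Try actions, track the average reward of each, and prefer the best."""
    rng = random.Random(seed)
    average = {a: 0.0 for a in actions}  # average reward seen per action
    tries = {a: 0 for a in actions}      # how often each action was tried
    for t in range(trials):
        if t < len(actions):
            action = actions[t]                     # first, try everything once
        elif rng.random() < explore:
            action = rng.choice(actions)            # sometimes try at random
        else:
            action = max(actions, key=average.get)  # usually repeat the best
        reward = reward_of(action)                  # the only feedback: a number
        tries[action] += 1
        average[action] += (reward - average[action]) / tries[action]
    return average


# Hypothetical robot standing next to a wall.
def wall_rewards(action):
    return {"drive_into_wall": -1, "stand_still": 0, "turn_away": 1}[action]


actions = ["drive_into_wall", "stand_still", "turn_away"]
learned = learn_by_trial_and_error(wall_rewards, actions)
best = max(actions, key=learned.get)
print("the robot prefers:", best)
```

After a few tries the learner avoids the action that earned "-1" and keeps choosing the action that earned "1".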

Questions
1. How is a robot that uses Machine Learning
different from a robot that is programmed for
specific tasks?
– Answer: In Machine Learning, the robot is not told
what actions to take. It learns by trial and error.

Questions
2. A robot in a car factory is designed to build
cars at a fast rate. Would Machine Learning
be a good application for a car building
machine? Why or why not?
Answer: No, probably not.
Robots that build cars follow
specific designs to ensure
that they build exactly as
they are told.

Questions
3. Are dogs the only animals that respond to a
stimulus by salivating? For example, what
happens to you when you are just about to put
a pickle in your mouth? Or mustard? Or a
sour candy?
– Answer: No. Humans also respond to visual stimuli
and will salivate at the sight of some foods.

Questions
4. Critterbot was designed to respond to stimuli
(the plural of stimulus). Imagine that you had to
design a robot that will automatically shovel
snow from your driveway every winter.
– The robot cannot have any human assistance; it has
to be autonomous (work on its own).
– First, come up with a ‘cool’ name for your robot.
– Use drawings and written descriptions to write up a
one page explanation of how your robot would work.
continued...

Question 4 continued.
– What types of sensors would it need to have to work
without your assistance? Remember, it is only going
to shovel your driveway, and not wander down the
street shovelling every driveway.
– Animals require energy and use special systems to
convert food into energy. For example, the digestive
system takes in food and digests it to extract energy and
nutrients.
– How will your robot get its energy? Remember, it
has to work in winter conditions, most often when it
is snowing.

Centre for Mathematics Science and Technology Education (CMASTE)
382 Education South
University of Alberta
Edmonton AB T6G 2G5
www.CMASTE.ca
To download: select Outreach, Alberta Ingenuity Resources and Centre for Machine Learning
Filename: AICML6BrainTumourAnalysis
Centre for Machine Learning
Department of Computing Science
University of Alberta
2-21 Athabasca Hall
Edmonton AB T6G 2E8
(780) 492-4828
www.machinelearningcentre.ca
Alberta Ingenuity
2410 Manulife Place, 10180-101 Street
Edmonton AB T5J 3S4
(780) 423-5735
www.albertaingenuity.ca
