Reinforcement Learning
Science 8 Unit B: Cells and Systems (Nature of Science Emphasis)
Introduction
> What does it mean to have a behaviour
reinforced?
> Let’s look at a famous example first...
Introduction
Ivan Pavlov (1849-1936)
> Born in Russia in 1849, Ivan Pavlov abandoned
a religious career for which he had been
preparing, and instead went into science.
> His study of the mechanisms underlying the
digestive system in mammals had a great impact
on the field of physiology (the study of the
mechanical, physical, and biochemical functions
of living organisms).
Source: Nobelprize.org
Introduction
> Pavlov was awarded the Nobel Prize in
Physiology or Medicine in 1904. He then
turned to studying reflexes, in particular with
dogs.
> His discoveries led to the science of
behaviour.
Source: Nobelprize.org
Introduction
> Pavlov became interested in
studying reflexes when he
noticed that dogs sometimes
drooled even without food
being shown to them.
> Although no food was in sight,
their saliva still dribbled. It
turned out that the dogs were
reacting to lab coats.
Source: Nobelprize.org
Introduction
> Every time the dogs were served
food, the person who served the food
was wearing a lab coat.
> The lab coats became a “stimulus”.
Source: Nobelprize.org
Introduction
> A stimulus is anything capable of
evoking a response in an
organism.
> Examples of stimuli include
sights, sounds, heat, cold, smells,
or other sensations.
> Therefore, the dogs reacted as if
food was on its way whenever
they saw a lab coat.
Source: Nobelprize.org
Introduction
> In a series of experiments, Pavlov
then tried to figure out why this was
happening.
> For example, he struck a bell when
the dogs were fed. If the bell was
sounded close to meal time, the
dogs learnt to associate the sound of
the bell with food.
> After a while, the stimulus of the bell
caused them to drool.
Source: Nobelprize.org
More on Pavlov's Dog
> You can read more about Pavlov’s dog and
see if you can train a dog to drool on command
online at the Nobel Prize website.
Reinforcement Learning
> Dogs are often trained through a method of
reinforcement.
> For example, if a dog hears the word “sit”, sits,
and then receives a treat, he or she will learn that
“sitting” provides a treat.
> In fact, almost all animals can learn through
reinforcement.
Reinforcement Learning
Definition:
– Reinforcement occurs when an event following a
response causes an increase in the probability of
that response occurring in the future.
> So when a dog hears “sit”, sits (response), and
receives a treat (event), the dog will be more
likely to sit in the future in hopes of receiving
another treat.
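> The definition above can be sketched as a small calculation. The starting chance of 0.2, the step size of 0.25, and the update rule itself are assumptions chosen only for illustration; they are not measurements and do not come from the lesson.

```python
def reinforce(probability, step=0.25):
    """Move the chance of a response part of the way toward 1.0 after a treat."""
    return probability + step * (1.0 - probability)


# Assumed starting chance that the dog sits when it hears "sit".
p_sit = 0.2
history = [p_sit]

# Three times in a row the dog sits (response) and gets a treat (event).
for _ in range(3):
    p_sit = reinforce(p_sit)
    history.append(p_sit)

print(history)  # each value is larger than the one before it
```

> Each treat raises the chance of sitting, but the chance never goes above 1.0, because each step covers only part of the remaining distance.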
Reinforcement Learning
> If animals (including humans) can learn by
reinforcement, can a machine also learn
through reinforcement?
> Computing Scientists at the Centre for
Machine Learning believe so, and they are
building a robot that learns through
reinforcement.
Reinforcement Learning
> The robot is called
“Critterbot”.
> The robot responds to
stimuli in the environment.
> For lessons on Critterbot
see Critterbot for Physics
30 and Critterbot for
Science 8.
How can a Machine be Reinforced?
> In Reinforcement Learning (a kind of Machine
Learning, which is a type of artificial intelligence)
the “learner” is a computer that learns by trying
to obtain the maximum reward.
> So what does a computer or robot want as a
reward?
– Just a number, such as -1, 0, or 1.
How can a Machine be Reinforced?
> A positive reward is given as a “1”
> A neutral reward is given as a “0”
> A negative reward is given as a “-1”
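> The three reward values above could be written in code roughly as follows. The outcome labels "good", "neutral", and "bad" are hypothetical names used only for this sketch.

```python
def reward(outcome):
    """Translate an outcome label into the number the learner receives."""
    rewards = {"good": 1, "neutral": 0, "bad": -1}
    return rewards[outcome]


# A made-up run of four outcomes and the total reward they add up to.
outcomes = ["good", "neutral", "bad", "good"]
total = sum(reward(o) for o in outcomes)
print(total)  # 1
```

> The learner never sees the labels, only the numbers, and it tries to make the total as large as possible.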
How can a Machine be Reinforced?
> What separates Reinforcement Learning
from other forms of artificial intelligence is
that the learner is never told what actions to
take.
> The learner searches by trial and error. If an
action earns a positive reward, the learner will
tend to repeat that action.
> But if it receives a negative reward, it will
learn to avoid that action.
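> A minimal sketch of trial-and-error learning, under stated assumptions: the three actions, their rewards, the 20% chance of a random trial, and the running-average bookkeeping are all invented for illustration and are not Critterbot's actual program.

```python
import random

# Hypothetical actions for an imaginary robot and the reward each one earns.
REWARDS = {"drive into wall": -1, "sit still": 0, "reach charger": 1}


def learn(trials=50, explore_chance=0.2, seed=0):
    """Try actions, remember the average reward of each, and favour the best."""
    rng = random.Random(seed)
    actions = list(REWARDS)
    estimate = {a: 0.0 for a in actions}  # average reward seen so far
    tries = {a: 0 for a in actions}       # how often each action was tried
    for t in range(trials):
        if t < len(actions):
            action = actions[t]            # try every action once first
        elif rng.random() < explore_chance:
            action = rng.choice(actions)   # occasional random trial
        else:
            action = max(actions, key=estimate.get)  # repeat the best so far
        r = REWARDS[action]                # the environment answers with a number
        tries[action] += 1
        estimate[action] += (r - estimate[action]) / tries[action]
    return estimate, tries


estimate, tries = learn()
favourite = max(estimate, key=estimate.get)
print(favourite)  # reach charger
```

> Nothing in the loop tells the learner which action is correct. It only receives numbers, and the action with the highest average reward becomes its favourite.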
Questions
1. How is a robot that uses Machine Learning
different from a robot that is programmed for
specific tasks?
– Answer: In Machine Learning, the robot is not told
what actions to take. It learns by trial and error.
Questions
2. A robot in a car factory is designed to build
cars at a fast rate. Would Machine Learning
be a good application for a car building
machine? Why or why not?
– Answer: No, probably not. Robots that build use
specific designs to ensure they build exactly as
they are told.
Questions
3. Are dogs the only animals that respond to a
stimulus by salivating? For example, what
happens to you when you are just about to put
a pickle in your mouth? Or mustard? Or a
sour candy?
– Answer: Humans also respond to visual stimuli and
will salivate at the sight of some stimuli.
Questions
4. Critterbot was designed to respond to stimuli
(the plural of stimulus). Imagine that you had to
design a robot that will automatically shovel
snow from your driveway every winter.
– The robot cannot have any human assistance; it has
to be autonomous (work on its own).
– First, come up with a ‘cool’ name for your robot.
– Use drawings and written descriptions to write up a
one-page explanation of how your robot would work.
continued...
Question 4 continued.
– What types of sensors would it need to have to work
without your assistance? Remember, it is only going
to shovel your driveway, and not wander down the
street shovelling every driveway.
– Animals require energy and use special systems to
convert food into energy. For example, the digestive
system takes in food and digests it to extract energy
and nutrients.
– How will your robot get its energy? Remember, it
has to work in winter conditions, most often when it
is snowing.
Centre for Mathematics Science and Technology Education (CMASTE)
382 Education South
University of Alberta
Edmonton AB T6G 2G5
www.CMASTE.ca
To download: select Outreach, Alberta Ingenuity Resources and Centre for Machine Learning
Filename: AICML6BrainTumourAnalysis
Centre for Machine Learning
Department of Computing Science
University of Alberta
2-21 Athabasca Hall
Edmonton AB T6G 2E8
(780) 492-4828
www.machinelearningcentre.ca
Alberta Ingenuity
2410 Manulife Place, 10180-101 Street
Edmonton AB T5J 3S4
(780) 423-5735
www.albertaingenuity.ca
