SlideShare uma empresa Scribd logo
1 de 36
Lisa Torrey University of Wisconsin – Madison HAMLET 2009 Reinforcement Learning
Reinforcement learning What is it and why is it important in machine learning? What machine learning algorithms exist for it? Q-learning in theory How does it work? How can it be improved? Q-learning in practice What are the challenges? What are the applications? Link with psychology Do people use similar mechanisms? Do people use other methods that could inspire algorithms? Resources for future reference Outline
Reinforcement learning What is it and why is it important in machine learning? What machine learning algorithms exist for it? Q-learning in theory How does it work? How can it be improved? Q-learning in practice What are the challenges? What are the applications? Link with psychology Do people use similar mechanisms? Do people use other methods that could inspire algorithms? Resources for future reference Outline
Machine Learning Classification:  where AI meets statistics Given Training data Learn A model for making a single prediction or decision xnew Classification Algorithm Training Data (x1, y1) (x2, y2) (x3, y3) … Model ynew
Animal/Human Learning Memorization x1 y1 Classification xnew ynew Procedural decision Other? environment
Learning how to act to accomplish goals Given Environment that contains rewards Learn A policy for acting Important differences from classification You don’t get examples of correct answers You have to try things in order to learn Procedural Learning
A Good Policy
Do you know your environment? The effects of actions The rewards If yes, you can use Dynamic Programming More like planning than learning Value Iteration and Policy Iteration If no, you can use Reinforcement Learning (RL) Acting and observing in the environment What You Know Matters
RL shapes behavior using reinforcement Agent takes actions in an environment (in episodes) Those actions change the state and trigger rewards Through experience, an agent learns a policy for acting Given a state, choose an action Maximize cumulative reward during an episode Interesting things about this problem Requires solving credit assignment What action(s) are responsible for a reward? Requires both exploring and exploiting Do what looks best, or see if something else is really best? RL as Operant Conditioning
Search-based:  evolution directly on a policy E.g. genetic algorithms Model-based:  build a model of the environment Then you can use dynamic programming Memory-intensive learning method Model-free:  learn a policy without any model Temporal difference methods (TD) Requires limited episodic memory (though more helps) Types of Reinforcement Learning
Actor-critic learning The TD version of Policy Iteration Q-learning The TD version of Value Iteration This is the most widely used RL algorithm Types of Model-Free RL
Reinforcement learning What is it and why is it important in machine learning? What machine learning algorithms exist for it? Q-learning in theory How does it work? How can it be improved? Q-learning in practice What are the challenges? What are the applications? Link with psychology Do people use similar mechanisms? Do people use other methods that could inspire algorithms? Resources for future reference Outline
Current state:   s Current action:   a Transition function:   δ(s, a) = sʹ Reward function:   r(s, a) Є R Policy π(s) = a Q(s, a) ≈ value of taking action a from state s Q-Learning:  Definitions Markov property: this is independent of previous states given current state In classification we’d have examples (s, π(s)) to learn from
Q(s, a) estimates the discounted cumulative reward Starting in state s Taking action a Following the current policy thereafter Suppose we have the optimal Q-function What’s the optimal policy in state s? The action argmaxb Q(s, b) But we don’t have the optimal Q-function at first Let’s act as if we do And updates it after each step so it’s closer to optimal Eventually it will be optimal! The Q-function
Q-Learning:  The Procedure Agent Q(s1, a) = 0 π(s1) = a1 Q(s1, a1)  Q(s1, a1) + Δ π(s2) = a2 s2 s3 a1 a2 r2 r3 s1 Environment δ(s2, a2) = s3 r(s2, a2) = r3 δ(s1, a1) = s2 r(s1, a1) = r2
Q-Learning:  Updates ,[object Object]
With a discount factor to give later rewards less impact
With a learning rate for non-deterministic worlds,[object Object]
Q-Learning:  Update Example 1 2 3 4 5 6 7 8 9 10 11
Q-Learning:  Update Example 1 2 3 4 5 6 7 8 9 10 11
The Need for Exploration 1 2 3 Explore! 4 5 6 7 8 9 10 11
Can’t always choose the action with highest Q-value The Q-function is initially unreliable Need to explore until it is optimal Most common method:  ε-greedy Take a random action in a small fraction of steps (ε) Decay ε over time There is some work on optimizing exploration  Kearns & Singh, ML 1998 But people usually use this simple method Explore/Exploit Tradeoff
Under certain conditions, Q-learning will converge to the correct Q-function The environment model doesn’t change States and actions are finite Rewards are bounded Learning rate decays with visits to state-action pairs Exploration method would guarantee infinite visits to every state-action pair over an infinite training period Q-Learning:  Convergence
Extensions:  SARSA ,[object Object]
Use the action actually chosen in updatesRegular: PIT! SARSA:
Extensions:  Look-ahead ,[object Object]
Use some episodic memory to speed credit assignment1 2 3 4 5 6 7 8 9 10 11 TD(λ):  a weighted combination of look-ahead distances The parameter λ controls the weighting
Eligibility traces:  Lookahead with less memory Visiting a state leaves a trace that decays Update multiple states at once States get credit according to their trace Extensions:  Eligibility Traces 3 1 2 4 5 6 9 7 8 10 11
Options:  Create higher-level actions Extensions:  Options and Hierarchies ,[object Object],Whole Maze Room A Room B
Extensions:  Function Approximation Function approximation:  allow complex environments The Q-function table could be too big (or infinitely big!) Describe a state by a feature vector f = (f1 , f2 , … , fn) Then the Q-function can be any regression model E.g. linear regression:   Q(s, a) = w1 f1  + w2 f2  + … + wn fn Cost:  convergence goes away in theory, though often not in practice Benefit:  generalization over similar states Easiest if the approximator can be updated incrementally, like neural networks with gradient descent, but you can also do this in batches
Reinforcement learning What is it and why is it important in machine learning? What machine learning algorithms exist for it? Q-learning in theory How does it work? How can it be improved? Q-learning in practice What are the challenges? What are the applications? Link with psychology Do people use similar mechanisms? Do people use other methods that could inspire algorithms? Resources for future reference Outline
Feature/reward design can be very involved Online learning (no time for tuning) Continuous features(handled by tiling) Delayed rewards (handled by shaping) Parameters can have large effects on learning speed Tuning has just one effect: slowing it down Realistic environments can have partial observability Realistic environments can be non-stationary There may be multiple agents Challenges in Reinforcement Learning
Tesauro 1995:  Backgammon Crites & Barto 1996:  Elevator scheduling Kaelbling et al. 1996:  Packaging task Singh & Bertsekas 1997:  Cell phone channel allocation Nevmyvaka et al. 2006:  Stock investment decisions Ipek et al. 2008:  Memory control in hardware Kosorok 2009:  Chemotherapy treatment decisions No textbook “killer app” Just behind the times? Too much design and tuning required? Training too long or expensive? Too much focus on toy domains in research? Applications of Reinforcement Learning
Reinforcement learning What is it and why is it important in machine learning? What machine learning algorithms exist for it? Q-learning in theory How does it work? How can it be improved? Q-learning in practice What are the challenges? What are the applications? Link with psychology Do people use similar mechanisms? Do people use other methods that could inspire algorithms? Resources for future reference Outline
Should machine learning researchers care? Planes don’t fly the way birds do; should machines learn the way people do? But why not look for inspiration? Psychological research does show neuron activity associated with rewards Really prediction error:  actual – expected Primarily in the striatum Do Brains Perform RL?
Schönberg et al., J. Neuroscience 2007 Good learners have stronger signals in the striatum than bad learners Frank et al., Science 2004 Parkinson’s patients learn better from negatives On dopamine medication, they learn better from positives Bayer & Glimcher, Neuron 2005 Average firing rate corresponds to positive prediction errors Interestingly, not to negative ones Cohen & Ranganath, J. Neuroscience 2007 ERP magnitude predicts whether subjects change behavior after losing Support for Reward Systems

Mais conteúdo relacionado

Mais procurados

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningEng Teong Cheah
 
Reinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialReinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialOmar Enayet
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning Melaku Eneayehu
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learningJie-Han Chen
 
Supervised Unsupervised and Reinforcement Learning
Supervised Unsupervised and Reinforcement Learning Supervised Unsupervised and Reinforcement Learning
Supervised Unsupervised and Reinforcement Learning Aakash Chotrani
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual IntroductionLukas Masuch
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-LearningKuppusamy P
 
Semi-Supervised Learning
Semi-Supervised LearningSemi-Supervised Learning
Semi-Supervised LearningLukas Tencer
 
Actor critic algorithm
Actor critic algorithmActor critic algorithm
Actor critic algorithmJie-Han Chen
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement LearningUsman Qayyum
 
Artificial Intelligence Searching Techniques
Artificial Intelligence Searching TechniquesArtificial Intelligence Searching Techniques
Artificial Intelligence Searching TechniquesDr. C.V. Suresh Babu
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsBill Liu
 
Reinforcement Learning for Self Driving Cars
Reinforcement Learning for Self Driving CarsReinforcement Learning for Self Driving Cars
Reinforcement Learning for Self Driving CarsSneha Ravikumar
 
Intro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationIntro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationAnkit Gupta
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningOswald Campesato
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
Machine Learning
Machine LearningMachine Learning
Machine LearningRahul Kumar
 

Mais procurados (20)

Machine Can Think
Machine Can ThinkMachine Can Think
Machine Can Think
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Reinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialReinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners Tutorial
 
AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learning
 
Supervised Unsupervised and Reinforcement Learning
Supervised Unsupervised and Reinforcement Learning Supervised Unsupervised and Reinforcement Learning
Supervised Unsupervised and Reinforcement Learning
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
Semi-Supervised Learning
Semi-Supervised LearningSemi-Supervised Learning
Semi-Supervised Learning
 
Actor critic algorithm
Actor critic algorithmActor critic algorithm
Actor critic algorithm
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Artificial Intelligence Searching Techniques
Artificial Intelligence Searching TechniquesArtificial Intelligence Searching Techniques
Artificial Intelligence Searching Techniques
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its Applications
 
Reinforcement Learning for Self Driving Cars
Reinforcement Learning for Self Driving CarsReinforcement Learning for Self Driving Cars
Reinforcement Learning for Self Driving Cars
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Intro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationIntro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning Presentation
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 

Destaque

Introduction to Reinforcement Learning
Introduction to Reinforcement LearningIntroduction to Reinforcement Learning
Introduction to Reinforcement LearningEdward Balaban
 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)pauldix
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningJungyeol
 
[Dl輪読会]introduction of reinforcement learning
[Dl輪読会]introduction of reinforcement learning[Dl輪読会]introduction of reinforcement learning
[Dl輪読会]introduction of reinforcement learningDeep Learning JP
 
Teaching parents how to use puzzle temp
Teaching parents how to use  puzzle tempTeaching parents how to use  puzzle temp
Teaching parents how to use puzzle tempMelisa G. Trent
 
Argumentation persuasion
Argumentation persuasionArgumentation persuasion
Argumentation persuasionNikki Wilkinson
 
Effects of Reinforcement in the Classroom
Effects of Reinforcement in the ClassroomEffects of Reinforcement in the Classroom
Effects of Reinforcement in the ClassroomAMaciocia
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIJack Clark
 
Using Reinforcement in the Classroom
Using Reinforcement in the ClassroomUsing Reinforcement in the Classroom
Using Reinforcement in the Classroomsworaac
 
Reinforcement & Punishment
Reinforcement & PunishmentReinforcement & Punishment
Reinforcement & Punishmentcaseylashaek
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningRyo Iwaki
 
Quantum Information Technology
Quantum Information TechnologyQuantum Information Technology
Quantum Information TechnologyFenny Thakrar
 
Deep Reinforcement Learning An Introduction
Deep Reinforcement Learning An IntroductionDeep Reinforcement Learning An Introduction
Deep Reinforcement Learning An IntroductionXiaohu ZHU
 
Reinforcement theory observational learning theory
Reinforcement theory   observational learning theoryReinforcement theory   observational learning theory
Reinforcement theory observational learning theoryHazel Dacaldacal
 
Reinforcement (Behavioral Learning)
Reinforcement (Behavioral Learning)Reinforcement (Behavioral Learning)
Reinforcement (Behavioral Learning)Emman Chavez
 

Destaque (20)

Introduction to Reinforcement Learning
Introduction to Reinforcement LearningIntroduction to Reinforcement Learning
Introduction to Reinforcement Learning
 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
[Dl輪読会]introduction of reinforcement learning
[Dl輪読会]introduction of reinforcement learning[Dl輪読会]introduction of reinforcement learning
[Dl輪読会]introduction of reinforcement learning
 
Teaching parents how to use puzzle temp
Teaching parents how to use  puzzle tempTeaching parents how to use  puzzle temp
Teaching parents how to use puzzle temp
 
Argumentation persuasion
Argumentation persuasionArgumentation persuasion
Argumentation persuasion
 
Effects of Reinforcement in the Classroom
Effects of Reinforcement in the ClassroomEffects of Reinforcement in the Classroom
Effects of Reinforcement in the Classroom
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 
Using Reinforcement in the Classroom
Using Reinforcement in the ClassroomUsing Reinforcement in the Classroom
Using Reinforcement in the Classroom
 
Argumentation 111312
Argumentation 111312Argumentation 111312
Argumentation 111312
 
Reinforcement & Punishment
Reinforcement & PunishmentReinforcement & Punishment
Reinforcement & Punishment
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
 
Quantum Information Technology
Quantum Information TechnologyQuantum Information Technology
Quantum Information Technology
 
Deep Reinforcement Learning An Introduction
Deep Reinforcement Learning An IntroductionDeep Reinforcement Learning An Introduction
Deep Reinforcement Learning An Introduction
 
Quantum computing
Quantum computingQuantum computing
Quantum computing
 
Reinforcement theory observational learning theory
Reinforcement theory   observational learning theoryReinforcement theory   observational learning theory
Reinforcement theory observational learning theory
 
Planning
PlanningPlanning
Planning
 
Reward & Punishment
Reward & PunishmentReward & Punishment
Reward & Punishment
 
Reinforcement
ReinforcementReinforcement
Reinforcement
 
Reinforcement (Behavioral Learning)
Reinforcement (Behavioral Learning)Reinforcement (Behavioral Learning)
Reinforcement (Behavioral Learning)
 

Semelhante a Reinforcement Learning

Decision support systems
Decision support systemsDecision support systems
Decision support systemsMR Z
 
cs330_2021_lifelong_learning.pdf
cs330_2021_lifelong_learning.pdfcs330_2021_lifelong_learning.pdf
cs330_2021_lifelong_learning.pdfKuan-Tsae Huang
 
Machine learning (domingo's paper)
Machine learning (domingo's paper)Machine learning (domingo's paper)
Machine learning (domingo's paper)Akhilesh Joshi
 
Machine Learning Interview Questions and Answers
Machine Learning Interview Questions and AnswersMachine Learning Interview Questions and Answers
Machine Learning Interview Questions and AnswersSatyam Jaiswal
 
mlrev.ppt
mlrev.pptmlrev.ppt
mlrev.pptbutest
 
reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.pptcharusharma165
 
acai01-updated.ppt
acai01-updated.pptacai01-updated.ppt
acai01-updated.pptbutest
 
Genetic Algorithms.ppt
Genetic Algorithms.pptGenetic Algorithms.ppt
Genetic Algorithms.pptRohanBorgalli
 
A Review on Introduction to Reinforcement Learning
A Review on Introduction to Reinforcement LearningA Review on Introduction to Reinforcement Learning
A Review on Introduction to Reinforcement Learningijtsrd
 

Semelhante a Reinforcement Learning (20)

Machine_Learning.pptx
Machine_Learning.pptxMachine_Learning.pptx
Machine_Learning.pptx
 
Decision support systems
Decision support systemsDecision support systems
Decision support systems
 
cs330_2021_lifelong_learning.pdf
cs330_2021_lifelong_learning.pdfcs330_2021_lifelong_learning.pdf
cs330_2021_lifelong_learning.pdf
 
Machine learning
Machine learningMachine learning
Machine learning
 
Joseph Jay Williams - WESST - Bridging Research via MOOClets and Collaborativ...
Joseph Jay Williams - WESST - Bridging Research via MOOClets and Collaborativ...Joseph Jay Williams - WESST - Bridging Research via MOOClets and Collaborativ...
Joseph Jay Williams - WESST - Bridging Research via MOOClets and Collaborativ...
 
Joseph Jay Williams - WESST - Bridging Research and Practice via MOOClets & C...
Joseph Jay Williams - WESST - Bridging Research and Practice via MOOClets & C...Joseph Jay Williams - WESST - Bridging Research and Practice via MOOClets & C...
Joseph Jay Williams - WESST - Bridging Research and Practice via MOOClets & C...
 
Machine learning (domingo's paper)
Machine learning (domingo's paper)Machine learning (domingo's paper)
Machine learning (domingo's paper)
 
Chapter 5 of 1
Chapter 5 of 1Chapter 5 of 1
Chapter 5 of 1
 
My experiment
My experimentMy experiment
My experiment
 
Lec0
Lec0Lec0
Lec0
 
Machine Learning Interview Questions and Answers
Machine Learning Interview Questions and AnswersMachine Learning Interview Questions and Answers
Machine Learning Interview Questions and Answers
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
mlrev.ppt
mlrev.pptmlrev.ppt
mlrev.ppt
 
reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.ppt
 
S10
S10S10
S10
 
S10
S10S10
S10
 
acai01-updated.ppt
acai01-updated.pptacai01-updated.ppt
acai01-updated.ppt
 
Genetic Algorithms.ppt
Genetic Algorithms.pptGenetic Algorithms.ppt
Genetic Algorithms.ppt
 
AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
 
A Review on Introduction to Reinforcement Learning
A Review on Introduction to Reinforcement LearningA Review on Introduction to Reinforcement Learning
A Review on Introduction to Reinforcement Learning
 

Mais de butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

Mais de butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Reinforcement Learning

  • 1. Lisa Torrey University of Wisconsin – Madison HAMLET 2009 Reinforcement Learning
  • 2. Reinforcement learning What is it and why is it important in machine learning? What machine learning algorithms exist for it? Q-learning in theory How does it work? How can it be improved? Q-learning in practice What are the challenges? What are the applications? Link with psychology Do people use similar mechanisms? Do people use other methods that could inspire algorithms? Resources for future reference Outline
  • 3. Reinforcement learning What is it and why is it important in machine learning? What machine learning algorithms exist for it? Q-learning in theory How does it work? How can it be improved? Q-learning in practice What are the challenges? What are the applications? Link with psychology Do people use similar mechanisms? Do people use other methods that could inspire algorithms? Resources for future reference Outline
  • 4. Machine Learning Classification: where AI meets statistics Given Training data Learn A model for making a single prediction or decision xnew Classification Algorithm Training Data (x1, y1) (x2, y2) (x3, y3) … Model ynew
  • 5. Animal/Human Learning Memorization x1 y1 Classification xnew ynew Procedural decision Other? environment
  • 6. Learning how to act to accomplish goals Given Environment that contains rewards Learn A policy for acting Important differences from classification You don’t get examples of correct answers You have to try things in order to learn Procedural Learning
  • 8. Do you know your environment? The effects of actions The rewards If yes, you can use Dynamic Programming More like planning than learning Value Iteration and Policy Iteration If no, you can use Reinforcement Learning (RL) Acting and observing in the environment What You Know Matters
  • 9. RL shapes behavior using reinforcement Agent takes actions in an environment (in episodes) Those actions change the state and trigger rewards Through experience, an agent learns a policy for acting Given a state, choose an action Maximize cumulative reward during an episode Interesting things about this problem Requires solving credit assignment What action(s) are responsible for a reward? Requires both exploring and exploiting Do what looks best, or see if something else is really best? RL as Operant Conditioning
  • 10. Search-based: evolution directly on a policy E.g. genetic algorithms Model-based: build a model of the environment Then you can use dynamic programming Memory-intensive learning method Model-free: learn a policy without any model Temporal difference methods (TD) Requires limited episodic memory (though more helps) Types of Reinforcement Learning
  • 11. Actor-critic learning The TD version of Policy Iteration Q-learning The TD version of Value Iteration This is the most widely used RL algorithm Types of Model-Free RL
  • 12. Reinforcement learning What is it and why is it important in machine learning? What machine learning algorithms exist for it? Q-learning in theory How does it work? How can it be improved? Q-learning in practice What are the challenges? What are the applications? Link with psychology Do people use similar mechanisms? Do people use other methods that could inspire algorithms? Resources for future reference Outline
  • 13. Current state: s Current action: a Transition function: δ(s, a) = sʹ Reward function: r(s, a) Є R Policy π(s) = a Q(s, a) ≈ value of taking action a from state s Q-Learning: Definitions Markov property: this is independent of previous states given current state In classification we’d have examples (s, π(s)) to learn from
  • 14. Q(s, a) estimates the discounted cumulative reward Starting in state s Taking action a Following the current policy thereafter Suppose we have the optimal Q-function What’s the optimal policy in state s? The action argmaxb Q(s, b) But we don’t have the optimal Q-function at first Let’s act as if we do And updates it after each step so it’s closer to optimal Eventually it will be optimal! The Q-function
  • 15. Q-Learning: The Procedure Agent Q(s1, a) = 0 π(s1) = a1 Q(s1, a1)  Q(s1, a1) + Δ π(s2) = a2 s2 s3 a1 a2 r2 r3 s1 Environment δ(s2, a2) = s3 r(s2, a2) = r3 δ(s1, a1) = s2 r(s1, a1) = r2
  • 16.
  • 17. With a discount factor to give later rewards less impact
  • 18.
  • 19. Q-Learning: Update Example 1 2 3 4 5 6 7 8 9 10 11
  • 20. Q-Learning: Update Example 1 2 3 4 5 6 7 8 9 10 11
  • 21. The Need for Exploration 1 2 3 Explore! 4 5 6 7 8 9 10 11
  • 22. Can’t always choose the action with highest Q-value The Q-function is initially unreliable Need to explore until it is optimal Most common method: ε-greedy Take a random action in a small fraction of steps (ε) Decay ε over time There is some work on optimizing exploration Kearns & Singh, ML 1998 But people usually use this simple method Explore/Exploit Tradeoff
  • 23. Under certain conditions, Q-learning will converge to the correct Q-function The environment model doesn’t change States and actions are finite Rewards are bounded Learning rate decays with visits to state-action pairs Exploration method would guarantee infinite visits to every state-action pair over an infinite training period Q-Learning: Convergence
  • 24.
  • 25. Use the action actually chosen in updatesRegular: PIT! SARSA:
  • 26.
  • 27. Use some episodic memory to speed credit assignment1 2 3 4 5 6 7 8 9 10 11 TD(λ): a weighted combination of look-ahead distances The parameter λ controls the weighting
  • 28. Eligibility traces: Lookahead with less memory Visiting a state leaves a trace that decays Update multiple states at once States get credit according to their trace Extensions: Eligibility Traces 3 1 2 4 5 6 9 7 8 10 11
  • 29.
  • 30. Extensions: Function Approximation Function approximation: allow complex environments The Q-function table could be too big (or infinitely big!) Describe a state by a feature vector f = (f1 , f2 , … , fn) Then the Q-function can be any regression model E.g. linear regression: Q(s, a) = w1 f1 + w2 f2 + … + wn fn Cost: convergence goes away in theory, though often not in practice Benefit: generalization over similar states Easiest if the approximator can be updated incrementally, like neural networks with gradient descent, but you can also do this in batches
  • 31. Reinforcement learning What is it and why is it important in machine learning? What machine learning algorithms exist for it? Q-learning in theory How does it work? How can it be improved? Q-learning in practice What are the challenges? What are the applications? Link with psychology Do people use similar mechanisms? Do people use other methods that could inspire algorithms? Resources for future reference Outline
  • 32. Feature/reward design can be very involved Online learning (no time for tuning) Continuous features(handled by tiling) Delayed rewards (handled by shaping) Parameters can have large effects on learning speed Tuning has just one effect: slowing it down Realistic environments can have partial observability Realistic environments can be non-stationary There may be multiple agents Challenges in Reinforcement Learning
  • 33. Tesauro 1995: Backgammon Crites & Barto 1996: Elevator scheduling Kaelbling et al. 1996: Packaging task Singh & Bertsekas 1997: Cell phone channel allocation Nevmyvaka et al. 2006: Stock investment decisions Ipek et al. 2008: Memory control in hardware Kosorok 2009: Chemotherapy treatment decisions No textbook “killer app” Just behind the times? Too much design and tuning required? Training too long or expensive? Too much focus on toy domains in research? Applications of Reinforcement Learning
  • 34. Reinforcement learning What is it and why is it important in machine learning? What machine learning algorithms exist for it? Q-learning in theory How does it work? How can it be improved? Q-learning in practice What are the challenges? What are the applications? Link with psychology Do people use similar mechanisms? Do people use other methods that could inspire algorithms? Resources for future reference Outline
  • 35. Should machine learning researchers care? Planes don’t fly the way birds do; should machines learn the way people do? But why not look for inspiration? Psychological research does show neuron activity associated with rewards Really prediction error: actual – expected Primarily in the striatum Do Brains Perform RL?
  • 36. Schönberg et al., J. Neuroscience 2007 Good learners have stronger signals in the striatum than bad learners Frank et al., Science 2004 Parkinson’s patients learn better from negatives On dopamine medication, they learn better from positives Bayer & Glimcher, Neuron 2005 Average firing rate corresponds to positive prediction errors Interestingly, not to negative ones Cohen & Ranganath, J. Neuroscience 2007 ERP magnitude predicts whether subjects change behavior after losing Support for Reward Systems
  • 37. Various results in animals support different algorithms Montague et al., J. Neuroscience 1996: TD O’Doherty et al., Science 2004: Actor-critic Daw, Nature 2005: Parallel model-free and model-based Morris et al., Nature 2006: SARSA Roesch et al., Nature 2007: Q-learning Other results support extensions Bogacz et al., Brain Research 2005: Eligibility traces Daw, Nature 2006: Novelty bonuses to promote exploration Mixed results on reward discounting (short vs. long term) Ainslie 2001: people are more impulsive than algorithms McClure et al., Science 2004: Two parallel systems Frank et al., PNAS 2007: Controlled by genetic differences Schweighofer et al., J. Neuroscience 2008: Influenced by serotonin Support for Specific Mechanisms
  • 38. Parallelism Separate systems for positive/negative errors Multiple algorithms running simultaneously Use of RL in combination with other systems Planning: Reasoning about why things do or don’t work Advice: Someone to imitate or correct us Transfer: Knowledge about similar tasks More impulsivity Is this necessarily better? The goal for machine learning: Take inspiration from humans without being limited by their shortcomings What People Do Better My work
  • 39. Reinforcement LearningSutton & Barto, MIT Press 1998 The standard reference book on computational RL Reinforcement LearningDayan, Encyclopedia of Cognitive Science 2001 A briefer introduction that still touches on many computational issues Reinforcement learning: the good, the bad, and the uglyDayan & Niv, Current Opinions in Neurobiology 2008 A comprehensive survey of work on RL in the human brain Resources on Reinforcement Learning