Transfer Learning. Lisa Torrey, University of Wisconsin – Madison, CS 540.
Transfer Learning in Humans. Education (hierarchical curriculum): learning tasks share common stimulus-response elements. Abstract problem-solving: learning tasks share general underlying principles. Multilingualism: knowing one language affects learning in another. Transfer can be both positive and negative.
Transfer Learning in AI. Given: Task S (the source task). Learn: Task T (the target task).
Goals of Transfer Learning. [Plot: performance vs. training.] Higher start, higher slope, higher asymptote.
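The three goals can be read off a pair of learning curves. The sketch below shows one possible way to quantify them, assuming each curve is a list of performance values sampled at the same training points. The function name `transfer_metrics`, the average-slope definition, and the `tail` window are illustrative assumptions, not taken from the slides.

```python
def transfer_metrics(with_transfer, without_transfer, tail=3):
    """Compare two learning curves sampled at the same training points.

    Hypothetical helper: the slides name the three goals but do not
    define how to measure them.
    """
    if len(with_transfer) != len(without_transfer) or len(with_transfer) < 2:
        raise ValueError("curves must have equal length of at least 2")
    tail = min(tail, len(with_transfer))

    def avg_slope(curve):
        # Average improvement per training step over the whole curve.
        return (curve[-1] - curve[0]) / (len(curve) - 1)

    def asymptote(curve):
        # Mean of the last `tail` points as a rough final-performance estimate.
        return sum(curve[-tail:]) / tail

    return {
        "start_gain": with_transfer[0] - without_transfer[0],
        "slope_gain": avg_slope(with_transfer) - avg_slope(without_transfer),
        "asymptote_gain": asymptote(with_transfer) - asymptote(without_transfer),
    }
```

A positive value for any of the three entries would indicate that transfer helped on that goal; negative values would indicate negative transfer.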
Inductive Learning. [Diagram: search within the allowed hypotheses, a subset of all hypotheses.]
Transfer in Inductive Learning. [Diagram: search within the allowed hypotheses, a subset of all hypotheses.] Thrun and Mitchell 1995: transfer slopes for gradient descent.
Transfer in Inductive Learning: Bayesian methods. Bayesian learning: prior distribution + data = posterior distribution. Bayesian transfer, Raina et al. 2006: transfer a Gaussian prior.
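To make "prior distribution + data = posterior distribution" concrete, here is a minimal sketch of the textbook conjugate update for the mean of a Gaussian with known noise variance. It is not the method of Raina et al.; it only illustrates how a prior carried over from a source task can shift the posterior when target data is scarce. The function name and all numbers are assumptions.

```python
def gaussian_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior over an unknown mean, given a Gaussian prior and
    Gaussian observations with known variance `noise_var`."""
    prior_precision = 1.0 / prior_var
    data_precision = len(data) / noise_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_mean * prior_precision + sum(data) / noise_var)
    return post_mean, post_var


if __name__ == "__main__":
    target_data = [2.0, 2.0]  # only two target-task observations
    # Uninformative prior (no transfer) vs. a confident prior from a source task.
    print(gaussian_posterior(0.0, 1.0, target_data, noise_var=1.0))
    print(gaussian_posterior(2.0, 0.1, target_data, noise_var=1.0))
```

With the transferred prior, the posterior is both closer to the data mean and narrower, which is the effect a good source task is hoped to have.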
Transfer in Inductive Learning: Hierarchical methods. [Diagram: concept hierarchy with Pipe, Surface, Circle, Line, Curve.] Stracuzzi 2006: learn Boolean concepts that can depend on each other.
Transfer in Inductive Learning: Dealing with missing data or labels. [Diagram: Task S, Task T.] Shi et al. 2008: transfer via active learning.
Reinforcement Learning. [Diagram: agent-environment loop.] Agent: Q(s1, a) = 0; π(s1) = a1; Q(s1, a1) ← Q(s1, a1) + Δ; π(s2) = a2. Environment: δ(s1, a1) = s2, r(s1, a1) = r2; δ(s2, a2) = s3, r(s2, a2) = r3.
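The agent-environment loop above can be sketched as tabular Q-learning. The slide only writes the update as "+ Δ"; taking Δ to be the standard Q-learning step is an assumption here, as are the tiny four-state chain environment (`chain_delta`, `chain_reward`) and every parameter value.

```python
import random
from collections import defaultdict


def chain_delta(s, a):
    """Hypothetical transition function: states 0..3 on a line."""
    return min(s + 1, 3) if a == "right" else max(s - 1, 0)


def chain_reward(s, a):
    """Hypothetical reward: 1 for stepping into state 3, else 0."""
    return 1.0 if chain_delta(s, a) == 3 else 0.0


def q_learning(delta, reward, actions, start, terminal, episodes=200,
               alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a) = 0 for every pair at the start
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            if s in terminal:
                break
            if rng.random() < epsilon:
                a = rng.choice(actions)  # explore
            else:
                best = max(q[(s, b)] for b in actions)
                # Greedy policy pi(s), breaking ties at random.
                a = rng.choice([b for b in actions if q[(s, b)] == best])
            s_next, r = delta(s, a), reward(s, a)
            future = 0.0 if s_next in terminal else max(q[(s_next, b)] for b in actions)
            # Q(s, a) <- Q(s, a) + Delta, with the usual Q-learning Delta.
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = s_next
    return q


if __name__ == "__main__":
    q = q_learning(chain_delta, chain_reward, ("left", "right"), 0, {3})
    for s in range(3):
        print(s, max(("left", "right"), key=lambda a: q[(s, a)]))
```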
Transfer in Reinforcement Learning: starting-point methods, hierarchical methods, alteration methods, imitation methods, new RL algorithms.
Transfer in Reinforcement Learning: Starting-point methods. Initial Q-table transfer from the source task. [Plot: target-task training, transfer vs. no transfer.] Taylor et al. 2005: value-function transfer.
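A starting-point method can be sketched as copying source Q-values into the target task's initial Q-table. This is a simplified tabular illustration, not the implementation of Taylor et al.; the hand-written `state_map` and `action_map` and all state and action names are hypothetical.

```python
def init_q_from_source(source_q, target_states, target_actions,
                       state_map, action_map, default=0.0):
    """Build an initial target Q-table from a learned source Q-table.

    state_map / action_map send each target state or action to its
    source counterpart; anything unmapped falls back to `default`,
    which matches learning without transfer for that entry.
    """
    target_q = {}
    for s in target_states:
        for a in target_actions:
            source_key = (state_map.get(s), action_map.get(a))
            target_q[(s, a)] = source_q.get(source_key, default)
    return target_q


if __name__ == "__main__":
    source_q = {("near", "pass"): 0.8, ("far", "pass"): 0.2}
    q0 = init_q_from_source(
        source_q,
        target_states=["near_goal", "midfield"],
        target_actions=["pass", "shoot"],
        state_map={"near_goal": "near", "midfield": "far"},
        action_map={"pass": "pass"},  # "shoot" has no source counterpart
    )
    print(q0)
```

Target-task training would then start from `q0` instead of an all-zero table.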
Transfer in Reinforcement Learning: Hierarchical methods. [Diagram: task hierarchy with Soccer, Pass, Shoot, Run, Kick.] Mehta et al. 2008: transfer a learned hierarchy.
Transfer in Reinforcement Learning: Alteration methods. [Diagram: Task S; original states, original actions, and original rewards become new states, new actions, and new rewards.] Walsh et al. 2006: transfer aggregate states.
Transfer in Reinforcement Learning: New RL algorithms. [Diagram: agent-environment loop.] Agent: Q(s1, a) = 0; π(s1) = a1; Q(s1, a1) ← Q(s1, a1) + Δ; π(s2) = a2. Environment: δ(s1, a1) = s2, r(s1, a1) = r2; δ(s2, a2) = s3, r(s2, a2) = r3. Torrey et al. 2006: transfer advice about skills.
Transfer in Reinforcement Learning: Imitation methods. [Diagram: source policy used during target training.] Torrey et al. 2007: demonstrate a strategy.
My Research. Categories: starting-point methods, hierarchical methods, alteration methods, imitation methods, new RL algorithms. Highlighted work: Skill Transfer, Macro Transfer.
RoboCup Domain: 3-on-2 KeepAway, 3-on-2 BreakAway, 2-on-1 BreakAway, 3-on-2 MoveDownfield.
Inductive Logic Programming. IF [ ] THEN pass(Teammate); IF distance(Teammate) ≤ 5 THEN pass(Teammate); IF distance(Teammate) ≤ 10 THEN pass(Teammate); … ; IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate); IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate).
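The sequence of rules above grows from an empty body by adding one condition at a time. The sketch below imitates that with a greedy, propositional search over a fixed list of candidate conditions. Real ILP systems search first-order clauses with variables, so this is only a loose analogy; the scoring function (covered positives minus covered negatives) and the toy examples are assumptions.

```python
def covers(rule, example):
    """A rule is a list of (name, test) conditions, all of which must hold."""
    return all(test(example) for _, test in rule)


def score(rule, positives, negatives):
    # Assumed heuristic: covered positives minus covered negatives.
    return (sum(covers(rule, e) for e in positives)
            - sum(covers(rule, e) for e in negatives))


def refine(candidates, positives, negatives, max_conditions=3):
    """Start from IF [ ] THEN ... and greedily add the best condition."""
    rule = []
    best_score = score(rule, positives, negatives)
    for _ in range(max_conditions):
        best_candidate = None
        for cand in candidates:
            if cand in rule:
                continue
            s = score(rule + [cand], positives, negatives)
            if s > best_score:  # keep only strict improvements
                best_score, best_candidate = s, cand
        if best_candidate is None:
            break
        rule.append(best_candidate)
    return [name for name, _ in rule]


CANDIDATES = [
    ("distance(Teammate) <= 5", lambda e: e["distance"] <= 5),
    ("distance(Teammate) <= 10", lambda e: e["distance"] <= 10),
    ("angle(Teammate, Opponent) >= 15", lambda e: e["angle"] >= 15),
    ("angle(Teammate, Opponent) >= 30", lambda e: e["angle"] >= 30),
]

# Toy data: states where passing was a good idea (positive) or not (negative).
POSITIVES = [{"distance": 3, "angle": 35}, {"distance": 4, "angle": 40},
             {"distance": 5, "angle": 30}]
NEGATIVES = [{"distance": 3, "angle": 10}, {"distance": 8, "angle": 35},
             {"distance": 12, "angle": 40}, {"distance": 4, "angle": 20}]

if __name__ == "__main__":
    body = refine(CANDIDATES, POSITIVES, NEGATIVES)
    print("IF", " AND ".join(body), "THEN pass(Teammate)")
```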
Advice Taking. Batch Reinforcement Learning via Support Vector Regression (RL-SVR). [Diagram: agent and environment produce Batch 1, Batch 2, …; Q-functions are computed after each batch.] Find Q-functions that minimize: ModelSize + C × DataMisfit.
Advice Taking. Batch Reinforcement Learning with Advice (KBKR). [Diagram: agent and environment produce Batch 1, Batch 2, …; Q-functions are computed after each batch, now using advice as well.] Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit.
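The objective above can be written out for a linear Q-function. The sketch below is a simplified reading, not the KBKR formulation itself: it assumes ModelSize is the L1 norm of the weights, DataMisfit is the summed absolute error on training pairs, and advice is given as sample points where Q should be at least some value. All names and numbers are hypothetical.

```python
def objective(weights, bias, data, advice, C=1.0, mu=1.0):
    """ModelSize + C * DataMisfit + mu * AdviceMisfit for a linear Q.

    data:   list of (features, target_q) pairs
    advice: list of (features, lower_bound) pairs, read as
            "in this situation, Q should be at least lower_bound"
    """
    def q(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias

    model_size = sum(abs(w) for w in weights)  # bias left unpenalized
    data_misfit = sum(abs(q(x) - y) for x, y in data)
    # Advice is soft: only violations are penalized.
    advice_misfit = sum(max(0.0, lower - q(x)) for x, lower in advice)
    return model_size + C * data_misfit + mu * advice_misfit


if __name__ == "__main__":
    data = [([1.0, 0.0], 1.0), ([2.0, 0.0], 1.0)]
    advice = [([0.5, 0.0], 1.0)]
    print(objective([1.0, 0.0], 0.0, data, advice, C=2.0, mu=3.0))
    print(objective([1.0, 0.0], 0.0, data, advice, C=2.0, mu=0.0))
```

Setting `mu` to zero recovers the advice-free objective, so the same function covers both slides.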
Skill Transfer Algorithm. Source → ILP → Mapping → Advice Taking → Target, plus [Human advice]. Example rule from ILP: IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate).
Selected Results. Skill transfer to 3-on-2 BreakAway from several tasks.
Macro-Operators. [Diagram: a macro whose nodes are pass(Teammate), move(Direction), shoot(goalRight), shoot(goalLeft), annotated with rules.] IF [ ... ] THEN pass(Teammate); IF [ ... ] THEN move(ahead); IF [ ... ] THEN shoot(goalRight); IF [ ... ] THEN shoot(goalLeft); IF [ ... ] THEN pass(Teammate); IF [ ... ] THEN move(left); IF [ ... ] THEN shoot(goalRight); IF [ ... ] THEN shoot(goalRight).
Demonstration. An imitation method. [Diagram: source policy used during target training.]
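Demonstration can be sketched as a wrapper that lets the source policy choose actions for part of target training. The slide does not say how the hand-over happens; the hard switch after a fixed number of episodes, and the names `demonstration_policy` and `demo_episodes`, are assumptions.

```python
def demonstration_policy(source_policy, target_policy, demo_episodes):
    """Return an action chooser that demonstrates the source policy
    for the first `demo_episodes` episodes, then defers to the
    policy being learned in the target task."""
    def act(episode, state):
        if episode < demo_episodes:
            return source_policy(state)
        return target_policy(state)
    return act


if __name__ == "__main__":
    act = demonstration_policy(
        source_policy=lambda state: "pass",   # stand-in for the transferred strategy
        target_policy=lambda state: "shoot",  # stand-in for the learned policy
        demo_episodes=2,
    )
    print([act(episode, None) for episode in range(4)])
```

In a full learner, the experience gathered during the demonstration episodes would also be used to update the target Q-function, so that learning continues from a useful starting point once the switch happens.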
Macro Transfer Algorithm. Source → ILP → Demonstration → Target.
Macro Transfer Algorithm: Learning structures. Positive: BreakAway games that score. Negative: BreakAway games that didn’t score. ILP: IF actionTaken(Game, StateA, pass(Teammate), StateB) AND actionTaken(Game, StateB, move(Direction), StateC) AND actionTaken(Game, StateC, shoot(goalRight), StateD) AND actionTaken(Game, StateD, shoot(goalLeft), StateE) THEN isaGoodGame(Game).
Macro Transfer Algorithm: Learning rules for arcs. Positive: states in good games that took the arc. Negative: states in good games that could have taken the arc but didn’t. ILP, for the nodes pass(Teammate) and shoot(goalRight): IF [ … ] THEN loop(State, Teammate); IF [ … ] THEN enter(State).
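A learned macro can be pictured as a small state machine: a chain of action nodes, each with a rule for entering it and a rule for looping in it. The sketch below executes such a chain over a sequence of states. The rule bodies are elided in the slides, so the `dist_goal` feature, the thresholds, and the choice to try the next node's enter rule before the current node's loop rule are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class MacroNode:
    action: str
    enter: Callable[[State], bool]  # stands in for IF [...] THEN enter(State)
    loop: Callable[[State], bool]   # stands in for IF [...] THEN loop(State, ...)


def run_macro(nodes: List[MacroNode], states: List[State]) -> List[str]:
    """Follow the macro through `states`, returning the actions taken.
    Execution stops when neither an enter rule nor a loop rule fires."""
    actions = []
    current = -1  # not yet inside the macro
    for state in states:
        nxt = current + 1
        if nxt < len(nodes) and nodes[nxt].enter(state):
            current = nxt  # take the arc to the next node
        elif current >= 0 and nodes[current].loop(state):
            pass           # stay in the current node
        else:
            break          # abandon the macro
        actions.append(nodes[current].action)
    return actions


# Hypothetical three-node macro with made-up rule bodies.
EXAMPLE_MACRO = [
    MacroNode("pass(Teammate)", lambda s: True, lambda s: False),
    MacroNode("move(ahead)", lambda s: s["dist_goal"] < 20, lambda s: s["dist_goal"] >= 10),
    MacroNode("shoot(goalRight)", lambda s: s["dist_goal"] < 10, lambda s: False),
]

if __name__ == "__main__":
    trajectory = [{"dist_goal": d} for d in (30, 18, 14, 8, 8)]
    print(run_macro(EXAMPLE_MACRO, trajectory))
```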
Selected Results. Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway.
Summary. Machine learning is often designed for standalone tasks. Transfer is a natural learning ability that we would like to incorporate into machine learners. There are some successes, but challenges remain, such as avoiding negative transfer and automating mapping.
