Robot Motor Skill Coordination with EM-based Reinforcement Learning. Petar Kormushev, Sylvain Calinon, Darwin G. Caldwell. Italian Institute of Technology, Advanced Robotics dept., http://www.iit.it. IROS 2010, October 20, 2010.
Motivation: How to learn complex motor skills which also require variable stiffness? How to demonstrate the required stiffness/compliance? How to teach highly dynamic tasks? Petar Kormushev, Italian Institute of Technology.
Background: Learning adaptive stiffness by extracting variability and correlation information from multiple demonstrations (Sylvain Calinon et al., IROS 2010).
Robot Motor Skill Learning: Motion capture; Kinesthetic teaching; Imitation learning; Reinforcement learning; Shared representation (encoding).
Skill representation (encoding): Time independent; Trajectory-based; Via-points; DMP; GMM/GMR; DS-based.
Dynamic Movement Primitives (DMP): Ijspeert, Nakanishi, Schaal, IROS 2001. Demonstrated trajectory; Sequence of attractors.
Extended DMP to include coordination: Stiffness gain (scalar); Coordination matrix (full stiffness matrix). Advantages: reduce number of primitives. Proposal: use reinforcement learning to learn the coordination matrices.
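The difference between a scalar stiffness gain and a full coordination matrix can be illustrated with a small sketch. It assumes that the desired acceleration is a weighted sum of spring-damper systems, one per attractor, with normalized Gaussian activation weights; the slide's own equations were not preserved, so this form, the function names, and all numeric values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def activation_weights(t, centers, width):
    """Normalized Gaussian weights h_i(t), one per attractor (assumed form)."""
    g = np.exp(-0.5 * ((t - centers) / width) ** 2)
    return g / g.sum()

def desired_acceleration(t, x, dx, mus, Ks, kv, centers, width):
    """Weighted sum of spring-damper systems, one per attractor.

    mus: attractor positions, shape (n_attractors, dim)
    Ks:  coordination matrices, shape (n_attractors, dim, dim)
    kv:  scalar damping gain
    """
    h = activation_weights(t, centers, width)
    acc = np.zeros_like(x)
    for h_i, mu_i, K_i in zip(h, mus, Ks):
        acc += h_i * (K_i @ (mu_i - x) - kv * dx)
    return acc

# Toy 2-D case with a single attractor at (1, 0); all values are hypothetical.
centers = np.array([0.5])
mus = np.array([[1.0, 0.0]])
x, dx = np.zeros(2), np.zeros(2)

K_diag = np.array([[[100.0, 0.0], [0.0, 100.0]]])   # no coupling between axes
K_full = np.array([[[100.0, 40.0], [40.0, 100.0]]])  # off-diagonal coupling

acc_diag = desired_acceleration(0.5, x, dx, mus, K_diag, 20.0, centers, 0.1)
acc_full = desired_acceleration(0.5, x, dx, mus, K_full, 20.0, centers, 0.1)
print(acc_diag, acc_full)
```

With the diagonal matrix, a position error along the first axis produces acceleration only along that axis. With the full matrix, the same error also drives the second axis, which is the kind of coupling a coordination matrix can encode.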
Example: Reaching task with obstacle. Using full coordination matrices; Using diagonal matrices. Expected return: 0.61; Expected return: 0.73. Reward function: [equation not recovered].
EM-based Reinforcement Learning (RL): PoWER algorithm (Policy learning by Weighting Exploration with the Returns), Jens Kober and Jan Peters, NIPS 2009. Advantages over policy-gradient-based RL: no need for a learning rate; can use importance sampling; a single rollout is enough to update the policy.
RL implementation: Policy parameters: full coordination matrices and attractor vectors. Policy update rule: [equation not recovered]. Importance sampling uses the best σ rollouts so far.
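A return-weighted update with importance sampling over the best σ rollouts might look like the sketch below. The exact update rule from the slide was not preserved, so this uses a simplified reward-weighted form often associated with PoWER, in which the new parameters are the current ones plus the return-weighted average of the exploration offsets. The function name `power_update`, the flat parameter vector, and the toy numbers are assumptions for illustration.

```python
import numpy as np

def power_update(theta, rollouts, sigma):
    """One simplified, PoWER-style policy update (illustrative sketch).

    theta:    current policy parameters, 1-D array
    rollouts: list of (theta_k, return_k) pairs collected so far,
              with non-negative returns
    sigma:    number of best rollouts reused (importance sampling)
    """
    # Keep only the sigma rollouts with the highest return.
    best = sorted(rollouts, key=lambda r: r[1], reverse=True)[:sigma]
    numerator = sum(R * (theta_k - theta) for theta_k, R in best)
    denominator = sum(R for _, R in best)
    if denominator <= 0:
        return theta.copy()  # nothing informative to learn from
    # No learning rate: the step size comes from the returns themselves.
    return theta + numerator / denominator

# Toy example with hypothetical parameters and returns.
theta = np.zeros(2)
rollouts = [
    (np.array([1.0, 0.0]), 0.8),
    (np.array([0.0, 1.0]), 0.2),
    (np.array([-1.0, -1.0]), 0.1),
]
theta_new = power_update(theta, rollouts, sigma=2)
print(theta_new)
```

Because the weights are the returns, rollouts that scored well pull the parameters toward themselves more strongly, and the low-return third rollout is ignored when σ = 2.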
Pancake flipping: Experimental setup. Barrett WAM 7-DOF robot; artificial pancake with 4 passive markers (more robust to occlusions); frying pan mounted on the end-effector.
Evaluation: Tracking of the pancake. NaturalPoint OptiTrack motion capture system: 12 cameras; 100 Hz camera frame rate; 40 Hz real-time capturing.
Reward function: Cumulative return of a rollout, with terms for orientation, position, and height.
Kinesthetic demonstration of the task.
Learning by trial and error.
Finally learned skill.
Motion capture to evaluate rollouts.
Captured pancake trajectory: 90° flip; 180° flip.
Performance.
Reproduction control strategy: Gravity compensation; Task execution.

Conclusion: Combining imitation learning and RL to learn motor skills with variable stiffness. Imitation is used to initialize the policy; RL is used to learn the coordination matrices; the learned variable stiffness is applied during reproduction. Future work: other representations; other RL algorithms.
Thanks for your attention!

Editor's Notes

  1. Some tasks can be learned very efficiently through this kinesthetic teaching mechanism. Other tasks, however, are more difficult to learn this way. When generalizing a skill from several observations, it can be difficult to extract the important information from the observed motion. For highly dynamic tasks such as pancake flipping, the controller learned by imitation does not generalize the task correctly. The skill can, however, be refined through reinforcement learning. After about 50 trials, the robot learns that the first part of the task requires stiff behavior to throw the pancake in the air, while the second part requires the hand to be more compliant to catch the pancake without letting it bounce off the pan.
  2. A gravity compensation controller is used as a user-friendly means of transferring a skill through kinesthetic teaching, and for physical human-robot interaction tasks where safety issues need to be considered.