A Barrett WAM robot learns to flip pancakes by reinforcement learning.
The motion is encoded in a mixture of basis force fields through an extension of Dynamic Movement Primitives (DMP) that represents the synergies across the different variables through stiffness matrices. An Inverse Dynamics controller with variable stiffness is used for reproduction.
The skill is first demonstrated via kinesthetic teaching, and then refined with the PoWER algorithm (Policy learning by Weighting Exploration with the Returns). After 50 trials, the robot learns that the first part of the task requires a stiff behavior to throw the pancake in the air, while the second part requires the hand to be compliant in order to catch the pancake without letting it bounce off the pan.
Robot Motor Skill Coordination with EM-based Reinforcement Learning
1. Robot Motor Skill Coordination with EM-based Reinforcement Learning. Petar Kormushev, Sylvain Calinon, Darwin G. Caldwell. Italian Institute of Technology, Advanced Robotics dept., http://www.iit.it. October 20, 2010, IROS 2010
2. Motivation How to learn complex motor skills which also require variable stiffness? How to demonstrate the required stiffness/compliance? How to teach highly-dynamic tasks? Petar Kormushev, Italian Institute of Technology 2/22
3. Background: Learning adaptive stiffness by extracting variability and correlation information from multiple demonstrations. Sylvain Calinon et al., IROS 2010
4. Robot Motor Skill Learning: motion capture, kinesthetic teaching, imitation learning, reinforcement learning, shared representation (encoding)
7. Dynamic Movement Primitives (DMP). Ijspeert, Nakanishi, Schaal, IROS 2001. A demonstrated trajectory is encoded as a sequence of attractors.
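The attractor-based encoding can be illustrated with a minimal sketch: a point mass pulled toward each attractor in turn by critically damped spring-damper dynamics. The gains, time steps, and attractor points below are illustrative assumptions, not the values used in the presented work.

```python
import numpy as np

def dmp_rollout(attractors, durations, K=25.0, D=10.0, dt=0.01):
    """Drive a point mass through a sequence of spring-damper attractors."""
    x = np.array(attractors[0], dtype=float)
    v = np.zeros_like(x)
    traj = [x.copy()]
    for target, T in zip(attractors, durations):
        target = np.asarray(target, dtype=float)
        for _ in range(int(T / dt)):
            a = K * (target - x) - D * v  # spring-damper pull toward the current attractor
            v = v + a * dt                # explicit Euler integration
            x = x + v * dt
            traj.append(x.copy())
    return np.array(traj)

# Three attractors held for 1 s each; the mass settles near the last one.
traj = dmp_rollout([[0.0, 0.0], [1.0, 0.5], [1.0, 1.0]], [1.0, 1.0, 1.0])
```

With K = 25 and D = 10 the system is critically damped (D = 2√K), so each attractor is approached smoothly without overshoot.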
9. Proposal: use reinforcement learning to learn the coordination matrices and reduce the number of primitives.
10. Example: reaching task with an obstacle. Using full coordination matrices: expected return 0.61. Using diagonal matrices: expected return 0.73. Reward function: [equation on slide]
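The distinction between full and diagonal coordination matrices can be made concrete with a toy 2-DOF example (the stiffness values below are hypothetical): a full matrix couples the degrees of freedom, so an error along one DOF also produces a corrective force along the other, while a diagonal matrix treats each DOF independently.

```python
import numpy as np

# Hypothetical stiffness values for a 2-DOF system.
K_full = np.array([[40.0, -15.0],
                   [-15.0, 40.0]])  # off-diagonal terms encode coordination across DOFs
K_diag = np.diag(np.diag(K_full))   # diagonal-only variant: DOFs act independently

err = np.array([0.1, 0.0])          # tracking error along the first DOF only
f_full = K_full @ err               # corrective force also acts on the second DOF
f_diag = K_diag @ err               # corrective force stays on the first DOF
```

Here `f_full` has a nonzero second component while `f_diag` does not, which is exactly the coordination behavior the full matrices are meant to capture.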
11. EM-based Reinforcement Learning (RL): the PoWER algorithm, Policy learning by Weighting Exploration with the Returns. Advantages over policy-gradient RL: no learning rate is needed; importance sampling can be used; a single rollout is enough to update the policy. Jens Kober and Jan Peters, NIPS 2009
12. RL implementation. Policy parameters: the full coordination matrices and the attractor vectors. Policy update rule: importance sampling uses the best σ rollouts so far.
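The PoWER-style update can be sketched on a toy problem: policy parameters are perturbed with Gaussian exploration, the best σ rollouts seen so far are retained (importance sampling), and the next parameters are a reward-weighted average of those elite rollouts. The return function, target, exploration noise, and iteration count below are assumptions for illustration; the real task optimizes the pancake-flipping reward over coordination matrices and attractor vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

TARGET = np.array([0.5, -0.3, 0.8])  # assumed optimum of the toy return

def toy_return(theta):
    # Hypothetical return in (0, 1]; stands in for the task reward.
    return float(np.exp(-np.sum((theta - TARGET) ** 2)))

theta = np.zeros(3)   # initial policy parameters (e.g. from imitation)
sigma = 5             # importance sampling keeps the best sigma rollouts
memory = []           # (return, parameters) of past rollouts

for _ in range(200):
    theta_try = theta + rng.normal(0.0, 0.1, size=theta.shape)  # Gaussian exploration
    memory.append((toy_return(theta_try), theta_try))
    memory = sorted(memory, key=lambda p: p[0], reverse=True)[:sigma]
    R = np.array([r for r, _ in memory])
    P = np.array([p for _, p in memory])
    # EM-style update: reward-weighted average of the exploration offsets.
    theta = theta + (R[:, None] * (P - theta)).sum(axis=0) / R.sum()
```

Note that no learning rate appears anywhere: the step size comes entirely from the reward weighting, which is one of the advantages over policy-gradient methods listed above.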
13. Pancake flipping: experimental setup. Barrett WAM 7-DOF robot; artificial pancake with 4 passive markers (more robust to occlusions); frying pan mounted on the end-effector.
14. Evaluation: tracking of the pancake. NaturalPoint OptiTrack motion capture system: 12 cameras at 100 fps, real-time capture at 40 Hz.
22. Reproduction control strategy: gravity compensation and task execution.
23. Conclusion. Combining imitation learning and RL to learn motor skills with variable stiffness: imitation is used to initialize the policy, and RL learns the coordination matrices; variable stiffness is learned for reproduction. Future work: other representations, other RL algorithms.
24. Thanks for your attention!
Editor's Notes
Some tasks can be learned very efficiently through this kinesthetic teaching mechanism, but others are more difficult to learn this way. When generalizing a skill from several observations, it can be hard to extract the important information from motion observation alone. For highly dynamic tasks such as pancake flipping, the controller learned by imitation does not correctly generalize the flipping motion. The skill can, however, be refined through reinforcement learning. After about 50 trials, the robot learns that the first part of the task requires a stiff behavior to throw the pancake in the air, while the second part requires the hand to be more compliant to catch the pancake without letting it bounce off the pan.
A gravity compensation controller is used as a user-friendly means of transferring a skill through kinesthetic teaching, and for physical human-robot interaction tasks where safety must be considered.