A Barrett WAM robot learns to flip pancakes by reinforcement learning.
The motion is encoded in a mixture of basis force fields through an extension of Dynamic Movement Primitives (DMP) that represents the synergies across the different variables through stiffness matrices. An Inverse Dynamics controller with variable stiffness is used for reproduction.
The skill is first demonstrated via kinesthetic teaching, and then refined with the PoWER algorithm (Policy learning by Weighting Exploration with the Returns). After 50 trials, the robot learns that the first part of the task requires a stiff behavior to throw the pancake in the air, while the second part requires the hand to be compliant in order to catch the pancake without letting it bounce off the pan.
Robot Motor Skill Coordination with EM-based Reinforcement Learning
1. Robot Motor Skill Coordination with EM-based Reinforcement Learning. Petar Kormushev, Sylvain Calinon, Darwin G. Caldwell. Italian Institute of Technology, Advanced Robotics dept., http://www.iit.it. October 20, 2010, IROS 2010
2. Motivation How to learn complex motor skills which also require variable stiffness? How to demonstrate the required stiffness/compliance? How to teach highly-dynamic tasks? Petar Kormushev, Italian Institute of Technology 2/22
3. Background: Learning adaptive stiffness by extracting variability and correlation information from multiple demonstrations. Sylvain Calinon et al., IROS 2010
4. Robot Motor Skill Learning: motion capture, kinesthetic teaching, imitation learning, reinforcement learning, shared representation (encoding)
7. Dynamic Movement Primitives (DMP). Ijspeert, Nakanishi, Schaal, IROS 2001. A demonstrated trajectory is encoded as a sequence of attractors.
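The attractor-based encoding can be illustrated with a minimal sketch: a point mass pulled toward each attractor in turn by critically damped spring-damper dynamics. The gains, time steps, and attractor points below are illustrative assumptions, not the values used in the presented work.

```python
import numpy as np

def dmp_rollout(attractors, durations, K=25.0, D=10.0, dt=0.01):
    """Drive a point mass through a sequence of spring-damper attractors."""
    x = np.array(attractors[0], dtype=float)
    v = np.zeros_like(x)
    traj = [x.copy()]
    for target, T in zip(attractors, durations):
        target = np.asarray(target, dtype=float)
        for _ in range(int(T / dt)):
            a = K * (target - x) - D * v  # spring-damper pull toward the current attractor
            v = v + a * dt                # explicit Euler integration
            x = x + v * dt
            traj.append(x.copy())
    return np.array(traj)

# Three attractors held for 1 s each; the mass settles near the last one.
traj = dmp_rollout([[0.0, 0.0], [1.0, 0.5], [1.0, 1.0]], [1.0, 1.0, 1.0])
```

With K = 25 and D = 10 the system is critically damped (D = 2√K), so each attractor is approached smoothly without overshoot.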
9. Proposal: use reinforcement learning to learn the coordination matrices and reduce the number of primitives.
10. Example: reaching task with an obstacle. Using full coordination matrices: expected return 0.61. Using diagonal matrices: expected return 0.73. Reward function: [equation on slide]
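The distinction between full and diagonal coordination matrices can be made concrete with a toy 2-DOF example (the stiffness values below are hypothetical): a full matrix couples the degrees of freedom, so an error along one DOF also produces a corrective force along the other, while a diagonal matrix treats each DOF independently.

```python
import numpy as np

# Hypothetical stiffness values for a 2-DOF system.
K_full = np.array([[40.0, -15.0],
                   [-15.0, 40.0]])  # off-diagonal terms encode coordination across DOFs
K_diag = np.diag(np.diag(K_full))   # diagonal-only variant: DOFs act independently

err = np.array([0.1, 0.0])          # tracking error along the first DOF only
f_full = K_full @ err               # corrective force also acts on the second DOF
f_diag = K_diag @ err               # corrective force stays on the first DOF
```

Here `f_full` has a nonzero second component while `f_diag` does not, which is exactly the coordination behavior the full matrices are meant to capture.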
11. EM-based Reinforcement Learning (RL): the PoWER algorithm, Policy learning by Weighting Exploration with the Returns. Advantages over policy-gradient RL: no learning rate is needed; importance sampling can be used; a single rollout is enough to update the policy. Jens Kober and Jan Peters, NIPS 2009
12. RL implementation. Policy parameters: the full coordination matrices and the attractor vectors. Policy update rule: importance sampling uses the best σ rollouts so far.
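The PoWER-style update can be sketched on a toy problem: policy parameters are perturbed with Gaussian exploration, the best σ rollouts seen so far are retained (importance sampling), and the next parameters are a reward-weighted average of those elite rollouts. The return function, target, exploration noise, and iteration count below are assumptions for illustration; the real task optimizes the pancake-flipping reward over coordination matrices and attractor vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

TARGET = np.array([0.5, -0.3, 0.8])  # assumed optimum of the toy return

def toy_return(theta):
    # Hypothetical return in (0, 1]; stands in for the task reward.
    return float(np.exp(-np.sum((theta - TARGET) ** 2)))

theta = np.zeros(3)   # initial policy parameters (e.g. from imitation)
sigma = 5             # importance sampling keeps the best sigma rollouts
memory = []           # (return, parameters) of past rollouts

for _ in range(200):
    theta_try = theta + rng.normal(0.0, 0.1, size=theta.shape)  # Gaussian exploration
    memory.append((toy_return(theta_try), theta_try))
    memory = sorted(memory, key=lambda p: p[0], reverse=True)[:sigma]
    R = np.array([r for r, _ in memory])
    P = np.array([p for _, p in memory])
    # EM-style update: reward-weighted average of the exploration offsets.
    theta = theta + (R[:, None] * (P - theta)).sum(axis=0) / R.sum()
```

Note that no learning rate appears anywhere: the step size comes entirely from the reward weighting, which is one of the advantages over policy-gradient methods listed above.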
13. Pancake flipping: experimental setup. Barrett WAM 7-DOF robot; artificial pancake with 4 passive markers (more robust to occlusions); frying pan mounted on the end-effector.
14. Evaluation: tracking of the pancake. NaturalPoint OptiTrack motion capture system: 12 cameras at 100 fps, real-time capture at 40 Hz.
22. Reproduction control strategy: gravity compensation and task execution.
23. Conclusion. Combining imitation learning and RL to learn motor skills with variable stiffness: imitation is used to initialize the policy, and RL learns the coordination matrices; variable stiffness is learned for reproduction. Future work: other representations, other RL algorithms.
24. Thanks for your attention!
Editor's Notes
Some tasks can be learned very efficiently through this kinesthetic teaching mechanism, but others are more difficult to learn this way. When generalizing a skill from several observations, it can be hard to extract the important information from motion observation alone. For highly dynamic tasks such as pancake flipping, the controller learned by imitation does not correctly generalize the flipping motion. The skill can, however, be refined through reinforcement learning. After about 50 trials, the robot learns that the first part of the task requires a stiff behavior to throw the pancake in the air, while the second part requires the hand to be more compliant to catch the pancake without letting it bounce off the pan.
A gravity compensation controller is used as a user-friendly means of transferring a skill through kinesthetic teaching, and for physical human-robot interaction tasks where safety must be considered.