[DL輪読会]Zero shot visual imitation

•

2 likes•1,162 views

Deep Learning JP

2018/2/2 Deep Learning JP: http://deeplearning.jp/seminar-2/

Technology

Zero-Shot
Visual Imitation
(ICLR 2018)
DEEPLEARNINGJP
[DLPapers]
http://deeplearning.jp/
“Zero-Shot Visual Imitation (ICLR2018)”
Iori Yanokura, JSK Lab
1

• ICLR 2018 accepted
• Project page: https://sites.google.com/view/zero-shot-visual-imitation/home
• Reviews: 8 (confidence: 4), 8 (confidence: 3),
7 (confidence: 5) = rank of 986 / 1000 = top 2%
• : Deepak Pathak, Parsa Mahmoudieh, Michael Luo, Pulkit Agrawal, Dian
Chen,Fred Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor
Darrell
•
•
•
2

•
•
•
•
3
(Sutton&Barto, 1998)
Reward( )
Reward Engineering...

• One-shot imitation learning (Duan et al., 2017)
- 1
6

(Visual Demonstration)
•
•
•
(Nair et al., 2017) Sermanet et al., 2016) (Liu et al., 2017)
7

V Policy goal
• Universal value functions(Schaul et al., 2015)
- V(s;θ)
• Hindsight experience replay (Andrychowicz et al., 2017)
-
9

3. LEARNING TO IMITATE WITHOUT EXPERT
SUPERVISION
•
•
•
•
•
10

• Imitation Learning
-
• Visual Demonstration
-
• Forward and Inverse Dynamics
- Dynamics model
multiple steps policy
• Goal Conditioning
- zero-shot
11

3.1 LEARNING THE PARAMETRIC SKILL
FUNCTION
•
•
•
•
•
13

• One-step (st, at, st+1)
• Cross-entropy loss SGD
•
Ground-truth
14
3.1 LEARNING THE PARAMETRIC SKILL
FUNCTION

3.2 MODELING ACTION DISTRIBUTION VIA
FORWARD CONSISTENCY
• State
•
RNN state forward dynamics
model-free action
15

3.2 MODELING ACTION DISTRIBUTION VIA
FORWARD CONSISTENCY
•
forward dynamics f
θh, θf
16

4.1 NAVIGATION IN INDOOR OFFICE
ENVIRONMENT
19

4.1 NAVIGATION IN INDOOR OFFICE
ENVIRONMENT (Result)
20
https://www.youtube.com/watch?v=ynfVRM27YFU
https://www.youtube.com/watch?time_continue=3&v=OwvnqjgUqc8

4.2 VISION - BASED ROPE MANIPULATION
Task: ( )
State: RGB
Action:
(Nair et al., 2017)
21

22
4.2 VISION - BASED ROPE
MANIPULATION(Result)
https://www.youtube.com/watch?v=YlaojV
XHagM

5 CONCLUSION
•
•
•
• (Andrychowicz et al., 2017)
23

Reference
Ashvin Nair, Dian Chen, Pulkit Agrawal, Phillip Isola, Pieter Abbeel, Jitendra Malik, and Sergey Levine. Combining self-
supervised learning and imitation for vision-based rope manipulation. ICRA, 2017.
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised
prediction. In ICML, 2017.
Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin,
Pieter Abbeel, and Wojciech Zaremba. Hindsight experience replay. arXiv preprint arXiv:1707.01495, 2017.
Tom Schaul, Daniel Horgan, Karol Gregor, and David Silver. Universal value function approxima- tors. In Proceedings of the
32nd International Conference on Machine Learning (ICML-15), pp. 1312–1320, 2015.
Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever,Pieter Abbeel, and Wojciech
Zaremba. One-shot imitation learning. arXiv preprint arXiv:1703.07326, 2017.
Pierre Sermanet, Kelvin Xu, and Sergey Levine. Unsupervised perceptual rewards for imitation learning. arXiv preprint
arXiv:1612.06699, 2016.
Bradly C Stadie, Pieter Abbeel, and Ilya Sutskever. Third-person imitation learning. arXiv preprint arXiv:1703.01703, 2017.
YuXuan Liu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. Imitation from observation: Learn- ing to imitate behaviors
from raw video via context translation. arXiv preprint arXiv:1707.03374,2017.
Manuel Watter, Jost Springenberg, Joschka Boedecker, and Martin Riedmiller. Embed to control: A locally linear
latent dynamics model for control from raw images. In NIPS, 2015.
Dean A Pomerleau. Alvinn: An autonomous land vehicle in a neural network. In Advances in neural information
processing systems, pp. 305–313, 1989.
Sutton, Richard S., and Andrew G. Barto. Introduction to reinforcement learning. Vol. 135. Cambridge: MIT press,
1998.
25

What's hot

【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"Deep Learning JP

【DL輪読会】Transformers are Sample Efficient World ModelsDeep Learning JP

TensorFlowで逆強化学習Mitsuhisa Ohta

完全自動運転実現のための信頼度付き自己位置推定の提案Naoki Akai

強化学習と逆強化学習を組み合わせた模倣学習Eiji Uchibe

SSII2022 [TS2] 自律移動ロボットのためのロボットビジョン〜オープンソースの自動運転ソフトAutowareを解説〜SSII

機械学習モデルの判断根拠の説明Satoshi Hara

Reinforcement Learning @ NeurIPS2018佑甲野

myCobotがある生活Ryo Kabutan

【DL輪読会】マルチエージェント強化学習における近年の協調的方策学習アルゴリズムの発展Deep Learning JP

近年のHierarchical Vision TransformerYusuke Uchida

[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...Deep Learning JP

3次元計測とフィルタリングNorishige Fukushima

【DL輪読会】論文解説：Offline Reinforcement Learning as One Big Sequence Modeling ProblemDeep Learning JP

強化学習 DQNからPPOまでharmonylab

【DL輪読会】DayDreamer: World Models for Physical Robot LearningDeep Learning JP

これからの Vision & Language ～ Acadexit した4つの理由Yoshitaka Ushiku

四脚ロボットによるつくばチャレンジへの取り組みkiyoshiiriemon

[DL輪読会] マルチエージェント強化学習と心の理論Deep Learning JP

ポーカーAIの最新動向 20171031Jun Okumura

What's hot (20)

【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"

【DL輪読会】Transformers are Sample Efficient World Models

TensorFlowで逆強化学習

完全自動運転実現のための信頼度付き自己位置推定の提案

強化学習と逆強化学習を組み合わせた模倣学習

SSII2022 [TS2] 自律移動ロボットのためのロボットビジョン〜オープンソースの自動運転ソフトAutowareを解説〜

機械学習モデルの判断根拠の説明

Reinforcement Learning @ NeurIPS2018

myCobotがある生活

【DL輪読会】マルチエージェント強化学習における近年の協調的方策学習アルゴリズムの発展

近年のHierarchical Vision Transformer

[DL輪読会]“SimPLe”,“Improved Dynamics Model”,“PlaNet” 近年のVAEベース系列モデルの進展とそのモデルベース...

3次元計測とフィルタリング

【DL輪読会】論文解説：Offline Reinforcement Learning as One Big Sequence Modeling Problem

強化学習 DQNからPPOまで

【DL輪読会】DayDreamer: World Models for Physical Robot Learning

これからの Vision & Language ～ Acadexit した4つの理由

四脚ロボットによるつくばチャレンジへの取り組み

[DL輪読会] マルチエージェント強化学習と心の理論

ポーカーAIの最新動向 20171031

Similar to [DL輪読会]Zero shot visual imitation

MixTaiwan 20170222 清大電機孫民 AI The Next Big ThingMix Taiwan

Dr. Ajay Kumar SinghDr. Ajay Kumar Singh

Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Universitat Politècnica de Catalunya

Top Cited Articles in Computer Graphics and Animationijcga

Progress Reprot.pptxrahulverma136219

Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...Universitat Politècnica de Catalunya

Shot-Net: A Convolutional Neural Network for Classifying Different Cricket ShotsMohammad Shakirul islam

Multimodal Multi-sensory Interaction for Mixed RealityMark Billinghurst

Recurrent Neural NetworksRakuten Group, Inc.

Learning by Design: Bringing Poster Carousels to Life Through Augmented Reali...Parisa Mehran

Learning with Unpaired DataGoergen Institute for Data Science

Effects of virtual reality and augmented reality on visitor experiences in mu...International Federation for Information Technologies in Travel and Tourism (IFITT)

Image Classification on ImageNet (D1L3 Insight@DCU Machine Learning Workshop ...Universitat Politècnica de Catalunya

IRJET- Development of a Face Recognition System with Deep Learning and Py...IRJET Journal

Magic fingerWen Qian Peng

Is Augmented Reality Robot as Effective as Physical Robot in Motivating Stude...Oka Kurniawan

Learning Analytics Tracking LearningErik Duval

Exploring EEG for object detection and retrievalUniversitat Politècnica de Catalunya

Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016Universitat Politècnica de Catalunya

Predicting Media Memorability with Audio, Video, and Text representationsAlison Reboud

Similar to [DL輪読会]Zero shot visual imitation (20)

MixTaiwan 20170222 清大電機孫民 AI The Next Big Thing

Dr. Ajay Kumar Singh

Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...

Top Cited Articles in Computer Graphics and Animation

Progress Reprot.pptx

Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...

Shot-Net: A Convolutional Neural Network for Classifying Different Cricket Shots

Multimodal Multi-sensory Interaction for Mixed Reality

Recurrent Neural Networks

Learning by Design: Bringing Poster Carousels to Life Through Augmented Reali...

Learning with Unpaired Data

Effects of virtual reality and augmented reality on visitor experiences in mu...

Image Classification on ImageNet (D1L3 Insight@DCU Machine Learning Workshop ...

IRJET- Development of a Face Recognition System with Deep Learning and Py...

Magic finger

Is Augmented Reality Robot as Effective as Physical Robot in Motivating Stude...

Learning Analytics Tracking Learning

Exploring EEG for object detection and retrieval

Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Predicting Media Memorability with Audio, Video, and Text representations

Recently uploaded

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software

MS Copilot expands with MS Graph connectorsNanddeep Nachan

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz

AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin

Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1

Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea

How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes

DBX First Quarter 2024 Investor PresentationDropbox

presentation ICT roal in 21st century educationjfdjdjcjdnsjd

MINDCTI Revenue Release Quarter One 2024MIND CTI

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood

Why Teams call analytics are critical to your entire businesspanagenda

Manulife - Insurer Transformation Award 2024The Digital Insurer

Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede

FWD Group - Insurer Innovation Award 2024The Digital Insurer

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz

Recently uploaded (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

MS Copilot expands with MS Graph connectors

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...

AWS Community Day CPH - Three problems of Terraform

Boost Fertility New Invention Ups Success Rates.pdf

Finding Java's Hidden Performance Traps @ DevoxxUK 2024

How to Troubleshoot Apps for the Modern Connected Worker

DBX First Quarter 2024 Investor Presentation

presentation ICT roal in 21st century education

MINDCTI Revenue Release Quarter One 2024

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...

Why Teams call analytics are critical to your entire business

Manulife - Insurer Transformation Award 2024

Spring Boot vs Quarkus the ultimate battle - DevoxxUK

FWD Group - Insurer Innovation Award 2024

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...

[DL輪読会]Zero shot visual imitation

1. Zero-Shot Visual Imitation (ICLR 2018) DEEPLEARNINGJP [DLPapers] http://deeplearning.jp/ “Zero-Shot Visual Imitation (ICLR2018)” Iori Yanokura, JSK Lab 1

2. • ICLR 2018 accepted • Project page: https://sites.google.com/view/zero-shot-visual-imitation/home • Reviews: 8 (confidence: 4), 8 (confidence: 3), 7 (confidence: 5) = rank of 986 / 1000 = top 2% • : Deepak Pathak, Parsa Mahmoudieh, Michael Luo, Pulkit Agrawal, Dian Chen,Fred Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor Darrell • • • 2

3. • • • • 3 (Sutton&Barto, 1998) Reward( ) Reward Engineering...

4. (Imitation Learning) 2 • • 4

5. • • • • 5

6. • One-shot imitation learning (Duan et al., 2017) - 1 6

7. (Visual Demonstration) • • • (Nair et al., 2017) Sermanet et al., 2016) (Liu et al., 2017) 7

8. • • 8

9. V Policy goal • Universal value functions(Schaul et al., 2015) - V(s;θ) • Hindsight experience replay (Andrychowicz et al., 2017) - 9

10. 3. LEARNING TO IMITATE WITHOUT EXPERT SUPERVISION • • • • • 10

11. • Imitation Learning - • Visual Demonstration - • Forward and Inverse Dynamics - Dynamics model multiple steps policy • Goal Conditioning - zero-shot 11

12. Forward dynamics 12

13. 3.1 LEARNING THE PARAMETRIC SKILL FUNCTION • • • • • 13

14. • One-step (st, at, st+1) • Cross-entropy loss SGD • Ground-truth 14 3.1 LEARNING THE PARAMETRIC SKILL FUNCTION

15. 3.2 MODELING ACTION DISTRIBUTION VIA FORWARD CONSISTENCY • State • RNN state forward dynamics model-free action 15

16. 3.2 MODELING ACTION DISTRIBUTION VIA FORWARD CONSISTENCY • forward dynamics f θh, θf 16

17. 3.3 GOAL SATISFACTION • • • 17

18. 4 Results • Navigation • • • PSF 18

19. 4.1 NAVIGATION IN INDOOR OFFICE ENVIRONMENT 19

20. 4.1 NAVIGATION IN INDOOR OFFICE ENVIRONMENT (Result) 20 https://www.youtube.com/watch?v=ynfVRM27YFU https://www.youtube.com/watch?time_continue=3&v=OwvnqjgUqc8

21. 4.2 VISION - BASED ROPE MANIPULATION Task: ( ) State: RGB Action: (Nair et al., 2017) 21

22. 22 4.2 VISION - BASED ROPE MANIPULATION(Result) https://www.youtube.com/watch?v=YlaojV XHagM

23. 5 CONCLUSION • • • • (Andrychowicz et al., 2017) 23

24. • RL • • 24

25. Reference Ashvin Nair, Dian Chen, Pulkit Agrawal, Phillip Isola, Pieter Abbeel, Jitendra Malik, and Sergey Levine. Combining self- supervised learning and imitation for vision-based rope manipulation. ICRA, 2017. Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In ICML, 2017. Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight experience replay. arXiv preprint arXiv:1707.01495, 2017. Tom Schaul, Daniel Horgan, Karol Gregor, and David Silver. Universal value function approxima- tors. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 1312–1320, 2015. Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever,Pieter Abbeel, and Wojciech Zaremba. One-shot imitation learning. arXiv preprint arXiv:1703.07326, 2017. Pierre Sermanet, Kelvin Xu, and Sergey Levine. Unsupervised perceptual rewards for imitation learning. arXiv preprint arXiv:1612.06699, 2016. Bradly C Stadie, Pieter Abbeel, and Ilya Sutskever. Third-person imitation learning. arXiv preprint arXiv:1703.01703, 2017. YuXuan Liu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. Imitation from observation: Learn- ing to imitate behaviors from raw video via context translation. arXiv preprint arXiv:1707.03374,2017. Manuel Watter, Jost Springenberg, Joschka Boedecker, and Martin Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. In NIPS, 2015. Dean A Pomerleau. Alvinn: An autonomous land vehicle in a neural network. In Advances in neural information processing systems, pp. 305–313, 1989. Sutton, Richard S., and Andrew G. Barto. Introduction to reinforcement learning. Vol. 135. Cambridge: MIT press, 1998. 25

[DL輪読会]Zero shot visual imitation

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to [DL輪読会]Zero shot visual imitation

Similar to [DL輪読会]Zero shot visual imitation (20)

More from Deep Learning JP

More from Deep Learning JP (20)

Recently uploaded

Recently uploaded (20)

[DL輪読会]Zero shot visual imitation