[DL Reading Group] SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient


Deep Learning JP

2016/9/30 Deep Learning JP: http://deeplearning.jp/seminar-2/


SeqGAN: Sequence Generative
Adversarial Nets with Policy
Gradient
Lantao Yu† , Weinan Zhang† , Jun Wang‡ , Yong Yu†
†Shanghai Jiao Tong University, ‡University College London
{yulantao,wnzhang,yyu}@apex.sjtu.edu.cn, j.wang@cs.ucl.ac.uk
2016/9/30
Presenter: 金子貴輝

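Contents
• In a GAN, the gradient passed from D to G vanishes for discrete sequences, so G is replaced with a stochastic model that can be adjusted in small steps, and the learning signal is delivered by policy gradient
• G also uses an LSTM
• Q is not approximated with a parametric model; it is computed each time by averaging over MC search
• Using RL internally has precedent in prior work
– A VAE can be reinterpreted as guided policy search, with the VRNN encoder as the guide and the decoder as the true policy
• Experiments cover two settings: synthetic data and real data
– The learning curves are distinctive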

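Where GANs struggle
• A GAN is trained with two networks, a Generator and a Discriminator
• GAN objective
– $\arg\min_\theta \mathbb{E}_{p(z)}\left[\log D_\phi(G_\theta(z))\right]$
$\text{s.t. } \phi = \arg\min_\phi \mathbb{E}_{p^*(x)}\left[\log D_\phi(x)\right] + \mathbb{E}_{p_\theta(x)}\left[\log(1 - D_\phi(x))\right]$
– With both problems written as minimizations, $D_\phi(x)$ here plays the role of the probability that $x$ was generated
• The error is backpropagated through the generated values
– This makes generative models of discrete values hard to train
To build a generative model of discrete sequences, change the Generator and the way the gradient is taken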
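The following is a minimal sketch of why backpropagating through generated values fails for discrete outputs. The names `discriminator`, `generate_token`, and the score table are hypothetical stand-ins, not part of the slides. The discriminator's output is piecewise constant in the generator's logits, so a finite-difference gradient comes out as zero.

```python
import numpy as np

def discriminator(token_id: int) -> float:
    """Hypothetical fixed discriminator: one score in (0, 1) per token id."""
    scores = np.array([0.9, 0.2, 0.6])
    return float(scores[token_id])

def generate_token(logits: np.ndarray) -> int:
    """Discrete generator output: the index of the largest logit."""
    return int(np.argmax(logits))

def finite_difference_grad(logits: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Approximate d D(G(logits)) / d logits by bumping one logit at a time."""
    base = discriminator(generate_token(logits))
    grad = np.zeros_like(logits)
    for i in range(len(logits)):
        bumped = logits.copy()
        bumped[i] += eps
        grad[i] = (discriminator(generate_token(bumped)) - base) / eps
    return grad

logits = np.array([1.0, 2.0, 0.5])
grad = finite_difference_grad(logits)
print(grad)  # small changes to the logits never change the chosen token
```

Because the selected token does not move under small parameter changes, the discriminator gives the generator no usable direction, which is the motivation for switching to a sampled, probabilistic generator.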

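How policy gradient differs from other gradient methods
• Policy gradient differentiates the distribution
• In exchange, the quantity inside the expectation is not differentiated
• The signal is propagated as a scalar (e.g. an action value)
$\frac{\nabla_\theta p_\theta(x)}{p_\theta(x)} = \nabla_\theta \log p_\theta(x)$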
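As a hedged illustration of the identity above, the sketch below estimates the gradient of an expected scalar reward under a softmax distribution using only samples and $\nabla_\theta \log p_\theta(x)$, then compares it with the analytic gradient. The reward vector, logits, and function names are assumptions chosen for the example.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def exact_gradient(logits: np.ndarray, reward: np.ndarray) -> np.ndarray:
    """Analytic gradient of E_{x~p}[reward[x]] with respect to the logits."""
    p = softmax(logits)
    # d/d logit_j of sum_x p_x r_x equals p_j * (r_j - sum_x p_x r_x)
    return p * (reward - p @ reward)

def score_function_gradient(logits, reward, num_samples, rng):
    """Monte Carlo estimate: mean of reward(x) * grad log p(x) over samples."""
    p = softmax(logits)
    samples = rng.choice(len(p), size=num_samples, p=p)
    one_hot = np.eye(len(p))[samples]
    grad_log_p = one_hot - p  # gradient of log softmax w.r.t. the logits
    # The reward enters only as a scalar weight; it is never differentiated.
    return (reward[samples][:, None] * grad_log_p).mean(axis=0)

rng = np.random.default_rng(0)
logits = np.array([0.5, -0.3, 1.2])
reward = np.array([1.0, 0.0, 0.5])
exact = exact_gradient(logits, reward)
estimate = score_function_gradient(logits, reward, 200_000, rng)
print(exact, estimate)
```

The reward function is treated as a black box here, which is why the same estimator still applies when the reward comes from a discriminator evaluated on discrete samples.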

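The SeqGAN algorithm
• The Discriminator is trained as before (left in the figure)
• The Generator is replaced with reinforcement learning (right in the figure)
– From $\arg\min_\theta \mathbb{E}_{p(z)}\left[\log D_\phi(G_\theta(z))\right]$
to $\arg\min_\theta \mathbb{E}_{p(z)}\mathbb{E}_{y \sim G_\theta(z)}\left[D_\phi(y)\right]$
– The tokens of the discrete sequence are sampled one at a time, in order
– The action value Q is obtained by averaging over MC search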
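The action-value step can be sketched as follows, assuming a toy setting: a fixed Markov policy stands in for the LSTM generator, and `discriminator_score` is a hypothetical scoring function, neither taken from the slides. For a prefix and a candidate next token, the remaining tokens are sampled from the current policy several times and the discriminator scores of the completed sequences are averaged.

```python
import numpy as np

SEQ_LEN = 4
VOCAB = 3
# Hypothetical generator: first-token and next-token probabilities.
START = np.array([0.5, 0.3, 0.2])
TRANSITION = np.array([[0.6, 0.3, 0.1],
                       [0.2, 0.5, 0.3],
                       [0.3, 0.3, 0.4]])

def policy_probs(prefix):
    """Next-token distribution given the tokens generated so far."""
    return START if len(prefix) == 0 else TRANSITION[prefix[-1]]

def discriminator_score(sequence):
    """Hypothetical discriminator: fraction of tokens equal to 0, in [0, 1]."""
    return sum(1 for t in sequence if t == 0) / len(sequence)

def rollout(prefix, rng):
    """Complete a partial sequence by sampling from the current policy."""
    seq = list(prefix)
    while len(seq) < SEQ_LEN:
        seq.append(int(rng.choice(VOCAB, p=policy_probs(seq))))
    return seq

def mc_action_value(prefix, action, num_rollouts, rng):
    """Estimate Q(prefix, action) as the mean score over sampled completions."""
    start = list(prefix) + [action]
    scores = [discriminator_score(rollout(start, rng)) for _ in range(num_rollouts)]
    return float(np.mean(scores))

rng = np.random.default_rng(0)
q_values = [mc_action_value([0, 1], a, 500, rng) for a in range(VOCAB)]
print(q_values)
```

No value network is fitted in this sketch: every estimate is recomputed from fresh rollouts, which trades extra sampling cost for not having to train a separate approximator of Q.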

Experiments
• Synthetic data generated by a randomly initialized LSTM
• Chinese poems, Obama speeches, Nottingham (MIDI music)

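Learning curves
• Negative log-likelihood (NLL) over the course of training on the synthetic data
– After MLE training (pre-training) ends, RL training improves the NLL substantially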
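To make the plotted quantity concrete, here is a minimal sketch of scoring generated sequences by their negative log-likelihood under a known data-generating model. A small Markov chain stands in for the random LSTM oracle as an assumption of this example. The probability tables and function names are hypothetical.

```python
import numpy as np

# Hypothetical oracle: first-token and next-token probabilities.
ORACLE_START = np.array([0.7, 0.2, 0.1])
ORACLE_TRANSITION = np.array([[0.8, 0.1, 0.1],
                              [0.1, 0.8, 0.1],
                              [0.1, 0.1, 0.8]])

def oracle_nll(sequence):
    """Negative log-likelihood of one token sequence under the oracle."""
    nll = -np.log(ORACLE_START[sequence[0]])
    for prev, cur in zip(sequence[:-1], sequence[1:]):
        nll -= np.log(ORACLE_TRANSITION[prev, cur])
    return float(nll)

def mean_oracle_nll(sequences):
    """Average NLL over a batch of generated sequences (lower is better)."""
    return float(np.mean([oracle_nll(s) for s in sequences]))

likely = [[0, 0, 0, 0], [0, 0, 0, 0]]      # sequences the oracle favors
unlikely = [[2, 1, 0, 2], [1, 2, 0, 1]]    # sequences the oracle rarely emits
print(mean_oracle_nll(likely), mean_oracle_nll(unlikely))
```

With synthetic data the true distribution is available, so generated samples can be scored exactly in this way, which is not possible for the real-data tasks.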

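Prior work using RL for sequence generation
• An extended guided policy search formulation subsumes sequence VAE models
– The guide is the encoder, which can observe the input
– The policy being learned is the decoder
• Because the reparameterization trick is used, the algorithm comes out the same even when viewed in the RL framework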
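For contrast with the score-function form used for discrete outputs, the sketch below estimates the same gradient two ways for a continuous Gaussian variable. The objective $f(x) = x^2$ and all function names are assumptions chosen so that the true gradient, $2\mu$, is known. The reparameterized estimator differentiates through the sample, which is only possible because the sample is a differentiable function of the parameter.

```python
import numpy as np

def objective(x):
    return x ** 2

def objective_grad(x):
    return 2.0 * x

def score_function_estimate(mu, num_samples, rng):
    """Gradient of E[f(x)], x ~ N(mu, 1), using f(x) * grad_mu log p(x)."""
    x = mu + rng.standard_normal(num_samples)
    return float(np.mean(objective(x) * (x - mu)))  # grad_mu log N(x; mu, 1) = x - mu

def reparameterized_estimate(mu, num_samples, rng):
    """Same gradient via x = mu + eps, differentiating f through the sample."""
    eps = rng.standard_normal(num_samples)
    x = mu + eps
    return float(np.mean(objective_grad(x)))  # dx/dmu = 1

rng = np.random.default_rng(0)
mu = 1.5
sf = score_function_estimate(mu, 400_000, rng)
rp = reparameterized_estimate(mu, 400_000, rng)
print(sf, rp, 2.0 * mu)
```

Both estimators target the same quantity, but the reparameterized one typically has much lower variance; it has no direct counterpart when the sampled value is a discrete token.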
