Enviar pesquisa
Carregar
[DL輪読会]Bridging the Gap Between Value and Policy Based Reinforcement Learning
•
14 gostaram
•
2,976 visualizações
Deep Learning JP
Seguir
2017/3/10 Deep Learning JP: http://deeplearning.jp/seminar-2/
Leia menos
Leia mais
Tecnologia
Denunciar
Compartilhar
Denunciar
Compartilhar
1 de 36
Baixar agora
Baixar para ler offline
Recomendados
[DL輪読会]Unsupervised Cross-Domain Image Generation
[DL輪読会]Unsupervised Cross-Domain Image Generation
Deep Learning JP
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
Deep Learning JP
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて
Deep Learning JP
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
Deep Learning JP
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
Deep Learning JP
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
Deep Learning JP
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM
Deep Learning JP
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
Deep Learning JP
Recomendados
[DL輪読会]Unsupervised Cross-Domain Image Generation
[DL輪読会]Unsupervised Cross-Domain Image Generation
Deep Learning JP
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
Deep Learning JP
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて
Deep Learning JP
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
Deep Learning JP
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
Deep Learning JP
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
Deep Learning JP
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM
Deep Learning JP
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
Deep Learning JP
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
Deep Learning JP
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?
Deep Learning JP
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
Deep Learning JP
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
Deep Learning JP
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
Deep Learning JP
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
Deep Learning JP
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
Deep Learning JP
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
Deep Learning JP
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
Deep Learning JP
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
Deep Learning JP
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
Deep Learning JP
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
Deep Learning JP
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
Deep Learning JP
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
Deep Learning JP
【DL輪読会】Deep Transformers without Shortcuts: Modifying Self-attention for Fait...
【DL輪読会】Deep Transformers without Shortcuts: Modifying Self-attention for Fait...
Deep Learning JP
【DL輪読会】マルチモーダル 基盤モデル
【DL輪読会】マルチモーダル 基盤モデル
Deep Learning JP
【DL輪読会】TrOCR: Transformer-based Optical Character Recognition with Pre-traine...
【DL輪読会】TrOCR: Transformer-based Optical Character Recognition with Pre-traine...
Deep Learning JP
【DL輪読会】HyperDiffusion: Generating Implicit Neural Fields withWeight-Space Dif...
【DL輪読会】HyperDiffusion: Generating Implicit Neural Fields withWeight-Space Dif...
Deep Learning JP
【DL輪読会】大量API・ツールの扱いに特化したLLM
【DL輪読会】大量API・ツールの扱いに特化したLLM
Deep Learning JP
【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision
【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision
Deep Learning JP
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
The Digital Insurer
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
DianaGray10
Mais conteúdo relacionado
Mais de Deep Learning JP
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
Deep Learning JP
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?
Deep Learning JP
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
Deep Learning JP
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
Deep Learning JP
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
Deep Learning JP
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
Deep Learning JP
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
Deep Learning JP
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
Deep Learning JP
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
Deep Learning JP
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
Deep Learning JP
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
Deep Learning JP
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
Deep Learning JP
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
Deep Learning JP
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
Deep Learning JP
【DL輪読会】Deep Transformers without Shortcuts: Modifying Self-attention for Fait...
【DL輪読会】Deep Transformers without Shortcuts: Modifying Self-attention for Fait...
Deep Learning JP
【DL輪読会】マルチモーダル 基盤モデル
【DL輪読会】マルチモーダル 基盤モデル
Deep Learning JP
【DL輪読会】TrOCR: Transformer-based Optical Character Recognition with Pre-traine...
【DL輪読会】TrOCR: Transformer-based Optical Character Recognition with Pre-traine...
Deep Learning JP
【DL輪読会】HyperDiffusion: Generating Implicit Neural Fields withWeight-Space Dif...
【DL輪読会】HyperDiffusion: Generating Implicit Neural Fields withWeight-Space Dif...
Deep Learning JP
【DL輪読会】大量API・ツールの扱いに特化したLLM
【DL輪読会】大量API・ツールの扱いに特化したLLM
Deep Learning JP
【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision
【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision
Deep Learning JP
Mais de Deep Learning JP
(20)
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
【DL輪読会】Deep Transformers without Shortcuts: Modifying Self-attention for Fait...
【DL輪読会】Deep Transformers without Shortcuts: Modifying Self-attention for Fait...
【DL輪読会】マルチモーダル 基盤モデル
【DL輪読会】マルチモーダル 基盤モデル
【DL輪読会】TrOCR: Transformer-based Optical Character Recognition with Pre-traine...
【DL輪読会】TrOCR: Transformer-based Optical Character Recognition with Pre-traine...
【DL輪読会】HyperDiffusion: Generating Implicit Neural Fields withWeight-Space Dif...
【DL輪読会】HyperDiffusion: Generating Implicit Neural Fields withWeight-Space Dif...
【DL輪読会】大量API・ツールの扱いに特化したLLM
【DL輪読会】大量API・ツールの扱いに特化したLLM
【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision
【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision
Último
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
The Digital Insurer
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
DianaGray10
presentation ICT roal in 21st century education
presentation ICT roal in 21st century education
jfdjdjcjdnsjd
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Jago de Vreede
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Orbitshub
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Jeffrey Haguewood
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
danishmna97
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
The Digital Insurer
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
Dropbox
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Angeliki Cooney
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Orbitshub
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
Khushali Kathiriya
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
Product Anonymous
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
sudhanshuwaghmare1
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
apidays
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
ThousandEyes
Architecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
Último
(20)
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
presentation ICT roal in 21st century education
presentation ICT roal in 21st century education
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Architecting Cloud Native Applications
Architecting Cloud Native Applications
[DL輪読会]Bridging the Gap Between Value and Policy Based Reinforcement Learning
1.
2.
3.
4.
5.
6.
7.
Qπ (s,a) = r(s,a)+γ
Eπ [Qπ (s',a')] Qπ L = (r(s,a)+γ Qθ π (s',a')−Qθ π (s,a))2
8.
9.
10.
Qo (s,a) = r(s,a)+γ
max a' Qo (s',a') Qo L = (r(s,a)+γ max a' Qθ o (s',a')−Qθ o (s,a))2
11.
๏ ๏
12.
๏ ๏
13.
Qπ (s,a) = r(s,a)+γ
Eπ [Qπ (s',a')] L = ( γ i r(si ,ai ) i=0 n−1 ∑ +γ n Qθ π (sn,an )−Qθ π (s0,a0 ))2
14.
๏ ๏
15.
L = (r(s,a)+γ
max a' Qθ o (s',a')−Qθ o (s,a))2
16.
17.
18.
19.
Q∗ (s,a) = r(s,a)+γτ
log exp(Q∗ (s',a') /τ )a'∑ Q∗
20.
τ log exp(Q∗ (s',a')
/τ )a'∑ = τ log(exp(Q∗ (sM ,aM ) /τ ) exp((Q∗ (s',a')−Q∗ (sM ,aM )) /τ )a'∑ ) = max a' Q∗ (s',a')+τ log( exp((Q∗ (s',a')−Q∗ (sM ,aM )) /τ )a'∑ )
21.
V∗ (s) = −τ
logπ∗ (a | s)+ r(s,a)+γV∗ (s')
22.
s0,v0 {a1,...,an} {v1,...,vn} {s1,...,sn} OMR (π) =
π(ai )(ri +γ vi o ) i=1 n ∑ v0 o = OMR (πo ) = max i (ri +γ vi o )
23.
OENT (π) =
π(ai )(ri +γ vi ∗ −τ logπ(ai )) i=1 n ∑ OENT (π) = −τ π(ai )log π(ai ) exp((ri +γ vi ∗ ) /τ ) / Si=1 n ∑ +τS π∗ (ai ) = exp((ri +γ vi ∗ ) /τ ) exp((ri' +γ vi' ∗ ) /τ ) i'=1 n ∑
24.
v0 ∗ = OENT (π∗ )
= τ log exp((ri +γ vi ∗ ) /τ ) i=1 n ∑ π∗ (ai ) = exp((ri +γ vi ∗ ) /τ ) exp((ri' +γ vi' ∗ ) /τ ) i'=1 n ∑ v0 ∗ = −τ logπ∗ (ai )+ r(si ,ai )+γ vi ∗
25.
V∗ (s) = −τ
logπ∗ (a | s)+ r(s,a)+γV∗ (s') −V∗ (s1)+γ t−1 V∗ (st )+ R(s1:t )−τG(s1:t ,π∗ ) = 0 R(sm:n ) = γ i r(sm+i ,am+i ) i=0 n−m−1 ∑ G(sm:n,π) = γ i logπ(am+i | sm+i ) i=0 n−m−1 ∑
26.
Cθ,φ (s1:t )
= −Vφ (s1)+γ t−1 Vφ (st )+ R(s1:t )−τG(s1:t ,πθ ) Δθ ∝Cθ,φ (s1:t )∇θG(s1:t ,πθ ) Δφ ∝Cθ,φ (s1:t )(∇φVφ (s1)− ∇φγ t−1 Vφ (st ))
27.
28.
Aθ,φ (s1:d+1) =
−Vφ (s1)+γ d Vφ (sd+1)+ R(s1:d+1) Δθ ∝ Es0:T [ Aθ,φ (si:i+d )∇θ logπθ (ai | si ) i=0 T −1 ∑ ] Δφ ∝ Es0:T [ Aθ,φ (si:i+d )∇φVφ (si ) i=0 T −1 ∑ ] Cθ,φ (s1:t ) = −Vφ (s1)+γ t−1 Vφ (st )+ R(s1:t )−τG(s1:t ,πθ )
Baixar agora