SlideShare uma empresa Scribd logo
1 de 20
Baixar para ler offline
A Deeper Look at Experience Replay
(17.12)
Seungjae Ryan Lee
Online Learning
• Learn directly from experience
• Highly correlated data
New transition t
Experience Replay
• Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵
• Use batch 𝐵 to train the agent
(S, A, R, S)
(S, A, R, S)
(S, A, R, S)
Replay Buffer D
Transition t Batch B
Effectiveness of Experience Replay
• Only method that can generate uncorrelated data for online RL
• Except using multiple workers (A3C)
• Significantly improves data efficiency
• Norm in many deep RL algorithms
• Deep Q-Networks (DQN)
• Deep Deterministic Policy Gradient (DDPG)
• Hindsight Experience Replay (HER)
Problem with Experience Replay
• There has been default capacity of 106 used for:
• Different algorithms (DQN, PG, etc.)
• Different environments (retro games, continuous control, etc.)
• Different neural network architectures
Result 1
Replay buffer capacity can have significant negative impact on
performance if too low or too high.
Combined Experience Replay (CER)
• Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵
• Use batch 𝐵 to and online transition 𝑡 to train the agent
(S, A, R, S)
(S, A, R, S)
(S, A, R, S)
Replay Buffer D
Batch B
Transition t
Combined Experience Replay (CER)
Result 2
CER can remedy the negative influence of a large replay buffer with
𝑂 1 computation.
CER vs. Prioritized Experience Replay (PER)
• Prioritized Experience Replay (PER)
• Stochastic replay method
• Designed to replay the buffer more efficiently
• Always expected to improve performance
• 𝑂(𝑁 log 𝑁)
• Combined Experience Replay (CER)
• Guaranteed to use newest transition
• Designed to remedy negative influence of a large replay buffer
• Does not improve performance for good replay buffer sizes
• 𝑂(1)
Test agents
1. Online-Q
• Q-learning with online transitions 𝑡
2. Buffer-Q
• Q-learning with the replay buffer 𝐵
3. Combined-Q
• Q-learning with both the replay buffer 𝐵 and online transitions 𝑡
Testbed Environments
• 3 environments for 3 methods
• Tabular, Linear and Nonlinear approximations
• Introduce “timeout” to all tasks
• Episode ends automatically after 𝑇 timesteps (large enough for each task)
• Prevent episode being arbitrarily long
• Used partial-episode-bootstrap (PEB) to minimize negative side-effects
Testbed: Gridworld
• Represent tabular methods
• Agent starts in 𝑆 and has a goal state 𝐺
• Agent can move left, right, up, down
• Reward is -1 until goal is reached
• If the agent bumps into the wall (black),
it remains in the same position
Gridworld Results (Tabular)
• Online-Q solves task very slowly
• Buffer-Q shows worse performance / speed for larger buffers
• Combined-Q shows slightly faster speed for larger buffers
Gridworld Results (Nonlinear)
• Online-Q fails to learn
• Combined-Q significantly speeds up learning
Testbed: Lunar Lander
• Represent linear approximation methods (with tile coding)
• Agent tries to land a shuttle on
the moon
• State space: 𝑅8
• 4 discrete actions
Lunar Lander Results (Linear)
• Buffer-Q shows worse learning speed for larger buffers
• Combined-Q is robust for varying buffer size
Lunar Lander Results (Nonlinear)
• Online-Q achieves best performance
• Combined-Q shows marginal improvement to Buffer-Q
• Buffer-Q and Combined-Q overfits after some time
Testbed: Pong
• Represent nonlinear approximation methods
• RAM states used instead of raw
pixels
• More accurate state representation
• State space: 0, … , 255 128
• 6 discrete actions
Pong Results (Nonlinear)
• All 3 agents fail to learn with a simple 1-hidden-layer network
• CER does not improve performance or speed
Limitations of Experience Replay
• Important transitions have delayed effects
• Partially mitigated with PER, but has a cost of 𝑂(𝑁 log 𝑁)
• Partially mitigated with correct buffer size or CER
• Both are workarounds, not solutions
• Experience Replay itself is flawed
• Focus should be on replacing experience replay
Thank you!
Original Paper: https://arxiv.org/abs/1712.01275
Paper Recommendations:
• Prioritized Experience Replay
• Hindsight Experience Replay
• Asynchronous Methods for Deep Reinforcement Learning
You can find more content in www.endtoend.ai/slides

Mais conteúdo relacionado

Último

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Último (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Destaque

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Destaque (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

[1712.01275] A Deeper Look at Experience Replay

  • 1. A Deeper Look at Experience Replay (17.12) Seungjae Ryan Lee
  • 2. Online Learning • Learn directly from experience • Highly correlated data New transition t
  • 3. Experience Replay • Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵 • Use batch 𝐵 to train the agent (S, A, R, S) (S, A, R, S) (S, A, R, S) Replay Buffer D Transition t Batch B
  • 4. Effectiveness of Experience Replay • Only method that can generate uncorrelated data for online RL • Except using multiple workers (A3C) • Significantly improves data efficiency • Norm in many deep RL algorithms • Deep Q-Networks (DQN) • Deep Deterministic Policy Gradient (DDPG) • Hindsight Experience Replay (HER)
  • 5. Problem with Experience Replay • There has been default capacity of 106 used for: • Different algorithms (DQN, PG, etc.) • Different environments (retro games, continuous control, etc.) • Different neural network architectures Result 1 Replay buffer capacity can have significant negative impact on performance if too low or too high.
  • 6. Combined Experience Replay (CER) • Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵 • Use batch 𝐵 to and online transition 𝑡 to train the agent (S, A, R, S) (S, A, R, S) (S, A, R, S) Replay Buffer D Batch B Transition t
  • 7. Combined Experience Replay (CER) Result 2 CER can remedy the negative influence of a large replay buffer with 𝑂 1 computation.
  • 8. CER vs. Prioritized Experience Replay (PER) • Prioritized Experience Replay (PER) • Stochastic replay method • Designed to replay the buffer more efficiently • Always expected to improve performance • 𝑂(𝑁 log 𝑁) • Combined Experience Replay (CER) • Guaranteed to use newest transition • Designed to remedy negative influence of a large replay buffer • Does not improve performance for good replay buffer sizes • 𝑂(1)
  • 9. Test agents 1. Online-Q • Q-learning with online transitions 𝑡 2. Buffer-Q • Q-learning with the replay buffer 𝐵 3. Combined-Q • Q-learning with both the replay buffer 𝐵 and online transitions 𝑡
  • 10. Testbed Environments • 3 environments for 3 methods • Tabular, Linear and Nonlinear approximations • Introduce “timeout” to all tasks • Episode ends automatically after 𝑇 timesteps (large enough for each task) • Prevent episode being arbitrarily long • Used partial-episode-bootstrap (PEB) to minimize negative side-effects
  • 11. Testbed: Gridworld • Represent tabular methods • Agent starts in 𝑆 and has a goal state 𝐺 • Agent can move left, right, up, down • Reward is -1 until goal is reached • If the agent bumps into the wall (black), it remains in the same position
  • 12. Gridworld Results (Tabular) • Online-Q solves task very slowly • Buffer-Q shows worse performance / speed for larger buffers • Combined-Q shows slightly faster speed for larger buffers
  • 13. Gridworld Results (Nonlinear) • Online-Q fails to learn • Combined-Q significantly speeds up learning
  • 14. Testbed: Lunar Lander • Represent linear approximation methods (with tile coding) • Agent tries to land a shuttle on the moon • State space: 𝑅8 • 4 discrete actions
  • 15. Lunar Lander Results (Linear) • Buffer-Q shows worse learning speed for larger buffers • Combined-Q is robust for varying buffer size
  • 16. Lunar Lander Results (Nonlinear) • Online-Q achieves best performance • Combined-Q shows marginal improvement to Buffer-Q • Buffer-Q and Combined-Q overfits after some time
  • 17. Testbed: Pong • Represent nonlinear approximation methods • RAM states used instead of raw pixels • More accurate state representation • State space: 0, … , 255 128 • 6 discrete actions
  • 18. Pong Results (Nonlinear) • All 3 agents fail to learn with a simple 1-hidden-layer network • CER does not improve performance or speed
  • 19. Limitations of Experience Replay • Important transitions have delayed effects • Partially mitigated with PER, but has a cost of 𝑂(𝑁 log 𝑁) • Partially mitigated with correct buffer size or CER • Both are workarounds, not solutions • Experience Replay itself is flawed • Focus should be on replacing experience replay
  • 20. Thank you! Original Paper: https://arxiv.org/abs/1712.01275 Paper Recommendations: • Prioritized Experience Replay • Hindsight Experience Replay • Asynchronous Methods for Deep Reinforcement Learning You can find more content in www.endtoend.ai/slides