Reinforcement Learning and deep reinforcement learning


Sources: Reinforcement Learning: An Introduction; Imitation Learning Lecture Slides from CMU Deep Reinforcement Learning Course.

We want a reinforcement learning agent to earn lots of reward. To obtain reward, the agent must prefer actions it has tried in the past and found to be effective: it must exploit what it already knows. To discover reward-producing actions, it must also select untested actions: it must explore in order to make better action selections in the future. This is the trade-off between exploration and exploitation.

Reinforcement learning systems have four main elements: a policy, a reward signal, a value function and, optionally, a model of the environment.

… Networks), Policy Gradient Methods (Finite Difference Policy Gradient, REINFORCE, Actor-Critic), Asynchronous Reinforcement Learning.

The reward signal defines the goal. On each time step, the environment sends a single number, called the reward, to the reinforcement learning agent. The agent’s objective is to maximise the total reward that it receives over the long run. The reward signal is used to alter the policy.

Values are used to make and evaluate decisions: action choices are made based on value judgements. The agent should prefer actions that bring about states of highest value rather than highest immediate reward. Rewards are given directly by the environment, whereas values must continually be re-estimated from the sequence of observations that an agent makes over its lifetime.

A model of the environment allows inferences to be made about how the environment will behave. For example, given a state and an action to be taken in that state, the model could predict the next state and the next reward. Models are used for planning, which means deciding on a course of action by considering possible future situations before they are experienced.

Model-based methods use models and planning; think of this as modelling the dynamics p(s’ | s, a). Model-free methods learn exclusively from trial and error (i.e. without modelling the environment). This presentation focuses on model-free methods.
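The exploration/exploitation trade-off, and the way values are re-estimated from the sequence of observations, can be sketched with an ε-greedy agent on a multi-armed bandit. This is a minimal illustration, not code from the slides: the three arm means below are hypothetical, rewards are assumed to be Gaussian with unit variance, and each value estimate is updated with the incremental sample average Q ← Q + (R − Q)/n.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy agent on a bandit whose arm i pays Gaussian(true_means[i], 1)."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k          # current value estimate for each arm
    n = [0] * k            # how many times each arm has been pulled
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                    # explore: try a random arm
        else:
            a = max(range(k), key=lambda i: q[i])   # exploit: best arm so far
        r = rng.gauss(true_means[a], 1.0)           # reward sent by the environment
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                   # incremental sample average
        total_reward += r
    return q, n, total_reward

q, n, total = epsilon_greedy_bandit([0.2, 1.0, 0.5])   # hypothetical arm means
print("estimates:", [round(v, 2) for v in q])
print("pulls per arm:", n)
```

With ε = 0 the agent can lock onto whichever arm happened to look best first; the small fraction of random pulls is what keeps every estimate improving.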


Introduction to Reinforcement Learning
Chapter 1 – Reinforcement Learning: An Introduction
Imitation Learning Lecture Slides from CMU Deep Reinforcement Learning Course

Finite Markov Decision Processes
Chapter 3 – Reinforcement Learning: An Introduction
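The discounted Bellman expectation equation, v_π(s) = Σ_a π(a|s) Σ_{s′,r} p(s′, r | s, a) [r + γ v_π(s′)], and its optimality counterpart (a max over actions in place of the average under π) can be checked numerically on a toy problem. The two-state MDP below (states 0 and 1, actions "stay" and "switch", reward 1 for landing in state 1) is a hypothetical example chosen so the answers have closed forms: with γ = 0.9, each state is worth 0.5/(1 − γ) = 5 under the uniform random policy and 1/(1 − γ) = 10 under the optimal policy.

```python
# Hypothetical two-state MDP: DYNAMICS[s][a] lists (probability, next_state, reward),
# i.e. the entries of p(s', r | s, a).
DYNAMICS = {
    0: {"stay": [(1.0, 0, 0.0)], "switch": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 1.0)], "switch": [(1.0, 0, 0.0)]},
}

def backup(v, s, a, gamma):
    """Expected one-step return: sum over (s', r) of p(s', r | s, a) * (r + gamma * v(s'))."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in DYNAMICS[s][a])

def policy_evaluation(policy, gamma=0.9, tol=1e-10):
    """Sweep the Bellman expectation equation until v stops changing."""
    v = {s: 0.0 for s in DYNAMICS}
    while True:
        delta = 0.0
        for s in DYNAMICS:
            new_v = sum(prob * backup(v, s, a, gamma) for a, prob in policy[s].items())
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v

def value_iteration(gamma=0.9, tol=1e-10):
    """Sweep the Bellman optimality equation: v(s) = max over a of backup(v, s, a)."""
    v = {s: 0.0 for s in DYNAMICS}
    while True:
        delta = 0.0
        for s in DYNAMICS:
            new_v = max(backup(v, s, a, gamma) for a in DYNAMICS[s])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v

uniform = {s: {"stay": 0.5, "switch": 0.5} for s in DYNAMICS}
print(policy_evaluation(uniform))   # both states close to 0.5 / (1 - 0.9) = 5
print(value_iteration())            # both states close to 1 / (1 - 0.9) = 10
```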

Temporal-Difference Learning
Chapter 6 – Reinforcement Learning: An Introduction
Playing Atari with Deep Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning
David Silver’s Tutorial on Deep Reinforcement Learning
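The two tabular TD control methods covered in this part, SARSA and Q-learning, differ only in their bootstrap target. The sketch below runs both on a hypothetical five-state chain (start in state 0, reward +1 on reaching state 4, which ends the episode) with an ε-greedy behaviour policy; the environment and hyperparameters are assumptions for illustration. Q-learning bootstraps from r + γ max_a′ Q(s′, a′) (off-policy), while SARSA bootstraps from r + γ Q(s′, a′) for the action a′ it will actually take next (on-policy).

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = left, 1 = right

def step(s, a):
    """Deterministic chain dynamics: +1 reward only on reaching the right end."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def epsilon_greedy(q, s, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[s])
    return rng.choice([a for a in ACTIONS if q[s][a] == best])   # random tie-breaking

def td_control(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, sarsa=False, seed=0):
    """Tabular one-step Q-learning (default) or SARSA (sarsa=True)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q, s, epsilon, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(q, s2, epsilon, rng)   # next action from the behaviour policy
            if done:
                target = r                             # no bootstrap from a terminal state
            elif sarsa:
                target = r + gamma * q[s2][a2]         # SARSA: the action actually taken next
            else:
                target = r + gamma * max(q[s2])        # Q-learning: the greedy action
            q[s][a] += alpha * (target - q[s][a])
            s, a = s2, a2
    return q

for name, flag in (("Q-learning", False), ("SARSA", True)):
    q = td_control(sarsa=flag)
    print(name, [[round(x, 3) for x in row] for row in q[:-1]])
```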

Policy Gradient Methods
Chapter 13 – Reinforcement Learning: An Introduction
Policy Gradient Lecture Slides from David Silver’s Reinforcement Learning Course
David Silver’s Tutorial on Deep Reinforcement Learning
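REINFORCE (Monte Carlo policy gradient) can be sketched with a tabular softmax policy on a hypothetical five-state chain (start in state 0, reward +1 on reaching state 4). After each sampled episode it computes the returns G_t backwards and applies θ ← θ + α γ^t G_t ∇ ln π(A_t | S_t, θ), the episodic form given in Chapter 13 of Reinforcement Learning: An Introduction. For a softmax over action preferences, the derivative with respect to the preference of action b in state S_t is 1[b = A_t] − π(b | S_t). The environment, the step cap and the hyperparameters are illustrative assumptions.

```python
import math
import random

N_STATES = 5   # hypothetical chain: start at 0, reward +1 on reaching terminal state 4

def softmax(prefs):
    m = max(prefs)                               # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def run_episode(theta, rng, max_steps=100):
    """Sample one episode with the current policy; returns (state, action, reward) triples."""
    s, trajectory = 0, []
    for _ in range(max_steps):
        probs = softmax(theta[s])
        a = 0 if rng.random() < probs[0] else 1      # 0 = left, 1 = right
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        trajectory.append((s, a, r))
        if r > 0:                                    # reached the terminal state
            break
        s = s2
    return trajectory

def reinforce(episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]    # tabular action preferences
    for _ in range(episodes):
        traj = run_episode(theta, rng)
        g, returns = 0.0, []
        for _, _, r in reversed(traj):               # G_t = R_{t+1} + gamma * G_{t+1}
            g = r + gamma * g
            returns.append(g)
        returns.reverse()
        for t, ((s, a, _), g_t) in enumerate(zip(traj, returns)):
            probs = softmax(theta[s])
            for b in range(2):
                grad_log = (1.0 if b == a else 0.0) - probs[b]   # d ln pi(a|s) / d theta[s][b]
                theta[s][b] += alpha * (gamma ** t) * g_t * grad_log
    return theta

theta = reinforce()
print([round(softmax(theta[s])[1], 3) for s in range(N_STATES - 1)])   # P(right) per state
```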

Asynchronous Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning

What is Asynchronous Reinforcement Learning?

Asynchronous one-step Q-learning Algorithm
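A tabular stand-in for asynchronous one-step Q-learning is sketched below: several threads act in their own copy of a hypothetical five-state chain, share one Q table (playing the role of θ) and a periodically synchronised target table (θ⁻), and update the shared table without locking it. This is a simplification, not the paper's implementation: there is no neural network, every step applies its update immediately instead of accumulating gradients, and all threads use the same fixed ε. Only the global step counter T is protected by a lock.

```python
import random
import threading

N_STATES, ACTIONS = 5, (0, 1)            # hypothetical chain; 0 = left, 1 = right
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1
TARGET_EVERY, MAX_STEPS = 200, 20000     # target sync interval and global step budget T_max

q = [[0.0, 0.0] for _ in range(N_STATES)]   # shared table (stand-in for theta)
q_target = [row[:] for row in q]            # shared target table (stand-in for theta^-)
counter = {"T": 0}                          # global step counter shared by all threads
lock = threading.Lock()                     # protects only the counter and target sync

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def worker(seed):
    rng = random.Random(seed)               # each thread explores with its own RNG
    s = 0
    while True:
        with lock:
            if counter["T"] >= MAX_STEPS:
                return
            counter["T"] += 1
            if counter["T"] % TARGET_EVERY == 0:
                for i in range(N_STATES):
                    q_target[i] = q[i][:]   # periodically copy theta into theta^-
        row = q[s][:]                       # snapshot so other threads cannot change it mid-choice
        if rng.random() < EPSILON:
            a = rng.choice(ACTIONS)
        else:
            best = max(row)
            a = rng.choice([b for b in ACTIONS if row[b] == best])
        s2, r, done = step(s, a)
        y = r if done else r + GAMMA * max(q_target[s2])   # one-step target from theta^-
        q[s][a] += ALPHA * (y - q[s][a])    # lock-free update of the shared table
        s = 0 if done else s2               # start a new episode after termination

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print([[round(x, 3) for x in row] for row in q[:-1]])
```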

Asynchronous n-step Q-learning Algorithm
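In the n-step variant, each thread acts for up to t_max steps and then computes its targets backwards from the last state reached: R starts at 0 if that state is terminal and at max_a Q(s_t, a; θ⁻) otherwise, and each earlier step uses R ← r_i + γR. The helper names below are hypothetical, and the Q-table update is a tabular stand-in for the accumulated gradient step.

```python
def n_step_targets(rewards, bootstrap_value, gamma):
    """Backward recursion R <- r_i + gamma * R, starting from R = bootstrap_value
    (0 for a terminal state, otherwise max_a Q(s_t, a; theta^-))."""
    targets = []
    R = bootstrap_value
    for r in reversed(rewards):
        R = r + gamma * R
        targets.append(R)
    targets.reverse()                 # targets[i] belongs to step i of the segment
    return targets

def apply_segment(q, states, actions, targets, alpha=0.1):
    """Move each Q(s_i, a_i) toward its n-step target (tabular stand-in for the gradient)."""
    for s, a, R in zip(states, actions, targets):
        q[s][a] += alpha * (R - q[s][a])

# Segment that ends in a terminal state: bootstrap value is 0.
print(n_step_targets([0.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.9))   # approximately [0.81, 0.9, 1.0]
# Segment cut off after t_max = 2 steps: bootstrap from a hypothetical target estimate.
print(n_step_targets([0.0, 0.0], bootstrap_value=0.5, gamma=0.9))        # approximately [0.405, 0.45]
```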


1. Reinforcement Learning

2. Overview

3. Introduction to Reinforcement Learning (sources: Chapter 1 – Reinforcement Learning: An Introduction; Imitation Learning Lecture Slides from CMU Deep Reinforcement Learning Course)

4. What is Reinforcement Learning?

5. Exploration versus Exploitation

6. Reinforcement Learning Systems

7. Policy

8. Reward Signal

9. Value Function (1)

10. Value Function (2)

11. Model-free versus Model-based

12. On-policy versus Off-policy

13. Credit Assignment Problem

14. Reward Design

15. What is Deep Reinforcement Learning?

16. Finite Markov Decision Processes (source: Chapter 3 – Reinforcement Learning: An Introduction)

17. Markov Decision Process (MDP)

18. Time Discounting

19. Agent-Environment Interaction (1)

20. Agent-Environment Interaction (2)

21. Action Selection

22. MDP Dynamics

23. State Transition Probabilities

24. Expected Rewards

25. State-Value Function (1)

26. State-Value Function (2)

27. Action-Value Function

28. Bellman Equation (1)

29. Bellman Equation (2)

30. Optimality

31. Temporal-Difference Learning (sources: Chapter 6 – Reinforcement Learning: An Introduction; Playing Atari with Deep Reinforcement Learning; Asynchronous Methods for Deep Reinforcement Learning; David Silver’s Tutorial on Deep Reinforcement Learning)

32. What is TD learning?

33. Value-based Reinforcement Learning

34. Update Rule for TD(0)

35. Update Rule Intuition

36. Tabular TD(0) Algorithm

37. SARSA – On-policy TD Control

38. SARSA Update Rule

39. SARSA Algorithm

40. Q-learning – Off-policy TD Control

41. One-step Q-learning Algorithm

42. Epsilon-greedy Policy

43. Deep Q-Networks (DQN)

44. Q-Networks

45. Experience Replay

46. State Representation

47. Q-Network Training

48. Loss Function Gradient Derivation

49. DQN Algorithm

50. Comments

51. Policy Gradient Methods (sources: Chapter 13 – Reinforcement Learning: An Introduction; Policy Gradient Lecture Slides from David Silver’s Reinforcement Learning Course; David Silver’s Tutorial on Deep Reinforcement Learning)

52. What are Policy Gradient Methods?

53. Policy-based Reinforcement Learning

54. Notation

55. Policy Approximation

56. Types of Policy Gradient Method

57. Finite Difference Policy Gradient

58. REINFORCE: Monte Carlo Policy Gradient

59. REINFORCE Properties

60. REINFORCE Algorithm

61. Actor-Critic Methods

62. One-step Actor-Critic Update Rules

63. One-step Actor-Critic Algorithm
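The one-step actor-critic replaces the Monte Carlo return with the TD error δ = R + γ v̂(S′) − v̂(S), where v̂ of a terminal state is taken as 0. The same δ updates the critic (w ← w + α^w δ ∇v̂(S)) and, scaled by I = γ^t, the actor (θ ← θ + α^θ I δ ∇ ln π(A | S, θ)). The sketch below follows that scheme with a tabular critic and a tabular softmax actor on a hypothetical five-state chain; the environment, the per-episode step cap and the step sizes are assumptions.

```python
import math
import random

N_STATES = 5   # hypothetical chain: start at 0, +1 reward on reaching terminal state 4

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def one_step_actor_critic(episodes=2000, alpha_theta=0.1, alpha_w=0.1, gamma=0.9,
                          max_steps=200, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]   # actor: tabular softmax preferences
    v = [0.0] * N_STATES                            # critic: tabular state values
    for _ in range(episodes):
        s, discount = 0, 1.0                        # discount plays the role of I = gamma^t
        for _ in range(max_steps):                  # step cap is an assumption of this sketch
            probs = softmax(theta[s])
            a = 0 if rng.random() < probs[0] else 1  # 0 = left, 1 = right
            s2 = max(0, s - 1) if a == 0 else s + 1
            done = s2 == N_STATES - 1
            r = 1.0 if done else 0.0
            # TD error; the value of the terminal state is taken to be 0
            delta = r + (0.0 if done else gamma * v[s2]) - v[s]
            v[s] += alpha_w * delta                  # critic update
            for b in range(2):
                grad_log = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += alpha_theta * discount * delta * grad_log   # actor update
            if done:
                break
            discount *= gamma
            s = s2
    return theta, v

theta, v = one_step_actor_critic()
print("P(right):", [round(softmax(theta[s])[1], 3) for s in range(N_STATES - 1)])
print("v:", [round(x, 3) for x in v[:-1]])
```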