References
[1] Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998.
[2] Littman, M. L. Reinforcement learning improves behaviour from evaluative feedback. Nature, 521(7553), 445-451, 2015.
[3] Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229–256. 1992.
[4] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533, 2015. http://dx.doi.org/10.1038/nature14236 (https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)
[5] Hessel, M., et al. Rainbow: Combining Improvements in Deep Reinforcement Learning. arXiv preprint arXiv:1710.02298, 2017.
[6] Mnih, V., Mirza, M., Graves, A., Harley, T., Lillicrap, T. P., and Silver, D. Asynchronous Methods for Deep Reinforcement Learning. International Conference on Machine Learning (ICML 2016), 2016.
[7] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Hubert, T., et al. Mastering the Game of Go without Human Knowledge. Nature, 550(7676), 354-359, 2017.
[8] Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D., and Kavukcuoglu, K. Reinforcement learning with unsupervised auxiliary tasks. International Conference on Learning Representations (ICLR 2017), 2017.
[9] Teh, Y. W., Bapst, V., Czarnecki, W. M., Quan, J., Kirkpatrick, J., Hadsell, R., Pascanu, R., et al. Distral: Robust Multitask Reinforcement Learning. NIPS 2017.
[10] Bellemare, M. G., Schaul, T., Saxton, D., and Ostrovski, G. Unifying Count-Based Exploration and Intrinsic Motivation. NIPS 2016.
[11] Ostrovski, G., Bellemare, M. G., van den Oord, A., and Munos, R. Count-Based Exploration with Neural Density Models. ICML 2017.
[12] Weber, T., Racanière, S., Reichert, D. P., Buesing, L., et al. Imagination-Augmented Agents for Deep Reinforcement Learning. NIPS 2017. https://arxiv.org/pdf/1707.06203.pdf
[13] Vezhnevets, A., Mnih, V., Osindero, S., Graves, A., Vinyals, O., Agapiou, J., et al. Strategic attentive writer for learning macro-actions. In Advances in Neural Information Processing Systems, pp. 3486-3494, 2016.
[FuN: Vezhnevets, 2017] Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., and Kavukcuoglu, K. FeUdal Networks for Hierarchical Reinforcement Learning. 2017. (http://arxiv.org/abs/1703.01161)
[14] Bacon, P.-L., Harb, J., Precup, D. The option-critic architecture. Proceedings of AAAI, 1726–1734, 2017.
[15] Vezhnevets, A., Mnih, V., Agapiou, J., Osindero, S., Graves, A., Vinyals, O., Kavukcuoglu, K. Strategic Attentive Writer for Learning Macro-Actions. arXiv preprint arXiv:1606.04695, 2016.
[16] Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., and Kavukcuoglu, K. FeUdal Networks for Hierarchical Reinforcement Learning. arXiv preprint arXiv:1703.01161, 2017.
[17] Kulkarni, T. D., Narasimhan, K., Saeedi, A., Tenenbaum, J. B. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), 2016.
[18] Florensa, C., Duan, Y., Abbeel, P. Stochastic Neural Networks for Hierarchical Reinforcement Learning. Proceedings of the International Conference on Learning Representations (ICLR 2017), 2017.
[19] Vezhnevets, A., Mnih, V., Agapiou, J., Osindero, S., Graves, A., Vinyals, O., Kavukcuoglu, K. Strategic Attentive Writer for Learning Macro-Actions. arXiv preprint arXiv:1606.04695, 2016.
[20] Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., Zaremba, W. Hindsight Experience Replay. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), 2017.
[21] Rauber, P., Ummadisingu, A., Mutz, F., Schmidhuber, J. Hindsight Policy Gradients. Proceedings of the Hierarchical Reinforcement Learning Workshop at the 31st Conference on Neural Information Processing Systems (HRL@NIPS 2017), 2017.
[22] Barreto, A., Dabney, W., Munos, R., Hunt, J. J., Schaul, T., Silver, D., and van Hasselt, H. P. Successor Features for Transfer in Reinforcement Learning. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), 2017.
[23] Racanière, S., Weber, T., Reichert, D. P., Buesing, L., Guez, A., Rezende, D. J., Badia, A. P., Vinyals, O., Heess, N., Li, Y., Pascanu, R., Battaglia, P., Hassabis, D., Silver, D., and Wierstra, D. Imagination-Augmented Agents for Deep Reinforcement Learning. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), 2017.
[24] Huang, V., Ley, T., Vlachou-Konchylaki, M., and Hu, W. Enhanced Experience Replay Generation for Efficient Reinforcement Learning. arXiv preprint arXiv:1705.08245, 2017.
[25] Fu, J., Co-Reyes, J., and Levine, S. EX2: Exploration with Exemplar Models for Deep Reinforcement Learning. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), 2017.
[26] Tang, H., Houthooft, R., Foote, D., Stooke, A., Chen, X., Duan, Y., Schulman, J., De Turck, F., and Abbeel, P. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), 2017.
[27] Xu, Z., Modayil, J., van Hasselt, H. P., Barreto, A., Silver, D., and Schaul, T. Natural Value Approximators: Learning when to Trust Past Estimates. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), 2017.
[28] Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
[29] Wang, Z., Bapst, V., Heess, N., Mnih, V., Munos, R., Kavukcuoglu, K., and de Freitas, N. Sample efficient actor-critic with experience replay. International Conference on Learning Representations (ICLR 2017), 2017.
[30] O’Donoghue, B., Munos, R., Kavukcuoglu, K., and Mnih, V. PGQ: Combining policy gradient and Q-learning. arXiv preprint arXiv:1611.01626, 2016.
[31] Nachum, O., Norouzi, M., Xu, K. and Schuurmans, D. Bridging the gap between value and policy based reinforcement learning. arXiv preprint arXiv:1702.08892, 2017.
[32] Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. International Conference on Machine Learning (ICML 2018), 2018.
[33] Ha, D. and Schmidhuber, J. World Models. 2018. https://arxiv.org/abs/1803.10122
[34] Ho, J. and Ermon, S. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems (NIPS 2016), pp. 4565–4573. 2016.
[35] Peng, X. B., Abbeel, P., Levine, S., and van de Panne, M. DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills. arXiv preprint arXiv:1804.02717, 2018.
[36] Sukhbaatar, S., et al. Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play. International Conference on Learning Representations (ICLR 2018), 2018. https://openreview.net/forum?id=SkT5Yg-RZ
[37] Horgan, D., Quan, J., Budden, D., Barth-Maron, G., Hessel, M., van Hasselt, H., and Silver, D. Distributed prioritized experience replay. International Conference on Learning Representations (ICLR 2018), 2018. https://openreview.net/forum?id=H1Dy---0Z
[38] Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521-3526, 2017.
[39] Pritzel, A., Uria, B., Srinivasan, S., Puigdomènech Badia, A., Vinyals, O., Hassabis, D., Wierstra, D., and Blundell, C. Neural Episodic Control. International Conference on Machine Learning (ICML 2017), 2017.
[41] Graves, A., Wayne, G., and Danihelka, I. Neural Turing Machines. arXiv preprint arXiv:1410.5401, 2014.
[42] Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., et al. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471-476, 2016.
[43] https://www.kdnuggets.com/2017/03/next-challenges-reinforcement-learning.html
[44] Reinforcement Learning never worked, and 'deep' only helped a bit. February 23, 2018. (http://bit.ly/2MdnMoV)