Sample-Efficient Deep Reinforcement Learning for Continuous Control
Authors
Gu, Shixiang
Advisors
Turner, Richard E.
Ghahramani, Zoubin
Schölkopf, Bernhard
Date
2019-11-01
Awarding Institution
University of Cambridge
Author Affiliation
Department of Engineering
Qualification
Doctor of Philosophy (PhD)
Language
English
Type
Thesis
Citation
Gu, S. (2019). Sample-Efficient Deep Reinforcement Learning for Continuous Control (Doctoral thesis). https://doi.org/10.17863/CAM.45105
Abstract
Reinforcement learning (RL) is a powerful, generic approach to discovering optimal policies in complex sequential decision-making problems. Recently, with flexible function approximators such as neural networks, RL has greatly expanded its realm of applications, from playing computer games with pixel inputs, to mastering the game of Go, to learning parkour movements with simulated humanoids. However, common RL approaches are known to be sample-intensive, making them difficult to apply to real-world problems such as robotics. This thesis makes several contributions toward developing RL algorithms for learning in the wild, where sample efficiency and stability are critical. The key contributions include Normalized Advantage Functions (NAF), which extend Q-learning to continuous-action problems; Interpolated Policy Gradient (IPG), which unifies prior policy gradient algorithm variants through theoretical analyses of bias and variance; and Temporal Difference Models (TDM), which interpret a parameterized Q-function as a generalized dynamics model for novel temporally abstracted model-based planning. Importantly, this thesis highlights that these algorithms can be seen as bridging gaps between branches of RL – model-based with model-free, and on-policy with off-policy. The proposed algorithms not only achieve substantial improvements over prior approaches, but also provide novel perspectives on how to mix different branches of RL effectively to gain the best of both worlds. NAF has subsequently been shown to train two 7-DoF robot arms to open doors using only 2.5 hours of real-world experience, making it one of the first demonstrations of deep RL on real robots.
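To make the NAF idea named in the abstract concrete: NAF extends Q-learning to continuous actions by decomposing the Q-function as Q(s,a) = V(s) + A(s,a), with a quadratic advantage A(s,a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)) whose maximizer is mu(s), so the argmax over actions needed by Q-learning is available in closed form. The following is a minimal sketch of that decomposition, assuming PyTorch; the class name, layer sizes, and network structure are illustrative assumptions, not the thesis implementation.

```python
# Illustrative sketch of the NAF Q-function decomposition (not the thesis code).
import torch
import torch.nn as nn

class NAFQFunction(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        self.action_dim = action_dim
        self.base = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)        # V(s)
        self.mu = nn.Linear(hidden, action_dim)  # greedy action mu(s)
        # Entries of a lower-triangular L(s), with P(s) = L(s) L(s)^T.
        self.l_entries = nn.Linear(hidden, action_dim * (action_dim + 1) // 2)

    def forward(self, state, action):
        h = self.base(state)
        V = self.value(h)
        mu = self.mu(h)
        # Build L(s): lower triangular, with a positive diagonal via exp,
        # so that P(s) = L L^T is positive definite.
        L = torch.zeros(state.shape[0], self.action_dim, self.action_dim)
        tril = torch.tril_indices(self.action_dim, self.action_dim)
        L[:, tril[0], tril[1]] = self.l_entries(h)
        diag = torch.arange(self.action_dim)
        L[:, diag, diag] = L[:, diag, diag].exp()
        P = L @ L.transpose(1, 2)
        d = (action - mu).unsqueeze(-1)
        # A(s,a) = -1/2 (a - mu)^T P (a - mu); maximized at a = mu(s).
        A = -0.5 * (d.transpose(1, 2) @ P @ d).squeeze(-1)
        return V + A  # Q(s,a) = V(s) + A(s,a)
```

Given a batch of states and actions, calling this module returns Q(s,a), and mu(s) directly gives the greedy action, which is what lets Q-learning run on continuous action spaces without a separate actor or an inner-loop action optimization.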
Keywords
Reinforcement Learning, Continuous Control, Robotics, Deep Learning, Machine Learning, Model-based Planning, Model-free Reinforcement Learning
Sponsorship
- Cambridge-Tübingen PhD Fellowship in Machine Learning
- Google Focused Research Award
- NSERC
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.45105
Rights
All Rights Reserved
Licence URL: https://www.rioxx.net/licenses/all-rights-reserved/