Sample-Efficient Deep Reinforcement Learning for Continuous Control
Authors
Gu, Shixiang
Advisors
Turner, Richard E.
Ghahramani, Zoubin
Schölkopf, Bernhard
Date
2019-11-01
Awarding Institution
University of Cambridge
Author Affiliation
Department of Engineering
Qualification
Doctor of Philosophy (PhD)
Language
English
Type
Thesis
Citation
Gu, S. (2019). Sample-Efficient Deep Reinforcement Learning for Continuous Control (Doctoral thesis). https://doi.org/10.17863/CAM.45105
Abstract
Reinforcement learning (RL) is a powerful, generic approach to discovering optimal policies in complex sequential decision-making problems. Recently, with flexible function approximators such as neural networks, RL has greatly expanded its realm of applications, from playing computer games with pixel inputs, to mastering the game of Go, to learning parkour movements with simulated humanoids. However, common RL approaches are known to be sample-intensive, making them difficult to apply to real-world problems such as robotics. This thesis makes several contributions toward developing RL algorithms for learning in the wild, where sample efficiency and stability are critical. The key contributions include Normalized Advantage Functions (NAF), which extend Q-learning to continuous-action problems; Interpolated Policy Gradient (IPG), which unifies prior policy gradient algorithm variants through theoretical analyses of bias and variance; and Temporal Difference Models (TDM), which interpret a parameterized Q-function as a generalized dynamics model for novel temporally abstracted model-based planning. Importantly, this thesis highlights that these algorithms can be seen as bridging gaps between branches of RL – model-based with model-free, and on-policy with off-policy. The proposed algorithms not only achieve substantial improvements over prior approaches, but also provide novel perspectives on how to mix different branches of RL effectively to gain the best of both worlds. NAF has subsequently been shown to train two 7-DoF robot arms to open doors using only 2.5 hours of real-world experience, making it one of the first demonstrations of deep RL on real robots.
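To make the NAF idea named in the abstract concrete: NAF extends Q-learning to continuous actions by decomposing the Q-function as Q(s,a) = V(s) + A(s,a), with a quadratic advantage A(s,a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)) whose maximizer is mu(s), so the argmax over actions needed by Q-learning is available in closed form. The following is a minimal sketch of that decomposition, assuming PyTorch; the class name, layer sizes, and network structure are illustrative assumptions, not the thesis implementation.

```python
# Illustrative sketch of the NAF Q-function decomposition (not the thesis code).
import torch
import torch.nn as nn

class NAFQFunction(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        self.action_dim = action_dim
        self.base = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)        # V(s)
        self.mu = nn.Linear(hidden, action_dim)  # greedy action mu(s)
        # Entries of a lower-triangular L(s), with P(s) = L(s) L(s)^T.
        self.l_entries = nn.Linear(hidden, action_dim * (action_dim + 1) // 2)

    def forward(self, state, action):
        h = self.base(state)
        V = self.value(h)
        mu = self.mu(h)
        # Build L(s): lower triangular, with a positive diagonal via exp,
        # so that P(s) = L L^T is positive definite.
        L = torch.zeros(state.shape[0], self.action_dim, self.action_dim)
        tril = torch.tril_indices(self.action_dim, self.action_dim)
        L[:, tril[0], tril[1]] = self.l_entries(h)
        diag = torch.arange(self.action_dim)
        L[:, diag, diag] = L[:, diag, diag].exp()
        P = L @ L.transpose(1, 2)
        d = (action - mu).unsqueeze(-1)
        # A(s,a) = -1/2 (a - mu)^T P (a - mu); maximized at a = mu(s).
        A = -0.5 * (d.transpose(1, 2) @ P @ d).squeeze(-1)
        return V + A  # Q(s,a) = V(s) + A(s,a)
```

Given a batch of states and actions, calling this module returns Q(s,a), and mu(s) directly gives the greedy action, which is what lets Q-learning run on continuous action spaces without a separate actor or an inner-loop action optimization.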
Keywords
Reinforcement Learning, Continuous Control, Robotics, Deep Learning, Machine Learning, Model-based Planning, Model-free Reinforcement Learning
Sponsorship
- Cambridge-Tübingen PhD Fellowship in Machine Learning
- Google Focused Research Award
- NSERC
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.45105
Rights
All Rights Reserved
Licence URL: https://www.rioxx.net/licenses/all-rights-reserved/