Data-efficient reinforcement learning in continuous state-action Gaussian-POMDPs
Publication Date
2017-01-01
Journal Title
Advances in Neural Information Processing Systems
ISSN
1049-5258
Volume
2017-December
Pages
2041-2050
Type
Conference Object
This Version
AM
Citation
McAllister, R., & Rasmussen, C. (2017). Data-efficient reinforcement learning in continuous state-action Gaussian-POMDPs. Advances in Neural Information Processing Systems, 2017-December 2041-2050. https://doi.org/10.17863/CAM.21273
Abstract
We present a data-efficient reinforcement learning method for continuous state-action systems under significant observation noise. Data-efficient solutions exist for the small-noise setting, such as PILCO, which learns the cartpole swing-up task in 30s. PILCO evaluates policies by planning state trajectories using a dynamics model. However, PILCO applies policies to the observed state and therefore plans in observation space. We extend PILCO with filtering so that it instead plans in belief space, consistent with partially observable Markov decision process (POMDP) planning. This enables data-efficient learning under significant observation noise, outperforming more naive methods such as post-hoc application of a filter to policies optimised by the original (unfiltered) PILCO algorithm. We test our method on the cartpole swing-up task, which involves nonlinear dynamics and requires nonlinear control.
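The following is a minimal, hypothetical Python sketch of the belief-space idea described in the abstract: the policy acts on a filtered Gaussian belief over the latent state rather than on the raw noisy observation. It substitutes a linear-Gaussian model and a linear feedback policy for the paper's learned GP dynamics and moment-matched belief propagation; the names `belief_step` and `K_policy` are illustrative and not taken from the paper.

```python
# Not the authors' code: a toy stand-in for one filtered rollout step,
# where the policy is applied to the belief mean and the belief is then
# predicted through the dynamics and updated with the noisy observation.
import numpy as np

def belief_step(mu, Sigma, A, B, Q, C, R, K_policy, y):
    """One belief-space step under assumed linear-Gaussian models.

    mu, Sigma : current Gaussian belief over the latent state
    A, B, Q   : assumed dynamics x' = A x + B u + w,  w ~ N(0, Q)
    C, R      : assumed observation model y = C x + v, v ~ N(0, R)
    K_policy  : assumed linear feedback gain, u = -K_policy @ mu
    y         : noisy observation received after acting
    """
    u = -K_policy @ mu                        # policy acts on the belief, not on y
    mu_pred = A @ mu + B @ u                  # predict belief through the dynamics
    S_pred = A @ Sigma @ A.T + Q
    S_y = C @ S_pred @ C.T + R                # innovation covariance
    K = S_pred @ C.T @ np.linalg.inv(S_y)     # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)  # update with the observation
    Sigma_new = (np.eye(len(mu)) - K @ C) @ S_pred
    return mu_new, Sigma_new, u

# Toy usage on a 2-D latent state with a noisy 2-D observation:
rng = np.random.default_rng(0)
mu, Sigma = np.zeros(2), np.eye(2)
A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])
Q, C, R = 0.01 * np.eye(2), np.eye(2), 0.25 * np.eye(2)
K_policy = np.array([[1.0, 0.5]])
y = rng.normal(size=2)                        # simulated noisy observation
mu, Sigma, u = belief_step(mu, Sigma, A, B, Q, C, R, K_policy, y)
```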
Sponsorship
Alan Turing Institute (unknown)
EPSRC (EP/J012300/1)
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.21273
This record's URL: https://www.repository.cam.ac.uk/handle/1810/274184
Rights
Licence:
http://www.rioxx.net/licenses/all-rights-reserved