PIPPS: Flexible model-based policy search robust to the curse of chaos
Authors
Parmas, P
Rasmussen, CE
Peters, J
Doya, K
Publication Date
2018
Journal Title
35th International Conference on Machine Learning, ICML 2018
Conference Name
International Conference on Machine Learning
ISSN
2640-3498
ISBN
9781510867963
Volume
9
Pages
6463-6472
Type
Conference Object
Citation
Parmas, P., Rasmussen, C., Peters, J., & Doya, K. (2018). PIPPS: Flexible model-based policy search robust to the curse of chaos. 35th International Conference on Machine Learning, ICML 2018, 9, 6463-6472. https://doi.org/10.17863/CAM.27143
Abstract
Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning imply that the problem is not just a numerical issue, but it may be caused by a fundamental chaos-like nature of long chains of nonlinear computations. Not only do the magnitudes of the gradients become large, the direction of the gradients becomes essentially random. We show that reparameterization gradients suffer from the problem, while likelihood ratio gradients are robust. Using our insights, we develop a model-based policy search framework, Probabilistic Inference for Particle-Based Policy Search (PIPPS), which is easily extensible, and allows for almost arbitrary models and policies, while simultaneously matching the performance of previous data-efficient learning algorithms. Finally, we invent the total propagation algorithm, which efficiently computes a union over all pathwise derivative depths during a single backwards pass, automatically giving greater weight to estimators with lower variance, sometimes improving over reparameterization gradients by 10^6 times.
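
The contrast between the two gradient estimators named in the abstract can be illustrated on a toy problem. The sketch below is not the paper's total propagation algorithm or its experimental setup. It assumes a one-dimensional objective f(x) = sin(k*x) with x ~ N(mu, sigma^2), where a large k stands in for a rapidly varying, chaos-like computation. It also assumes that the two estimates are combined by inverse-variance weighting, which is one common way to give greater weight to the lower-variance estimator. The function names, the objective, and all parameter values are hypothetical.

```python
import math
import random
import statistics


def gradient_samples(mu, sigma, k, n, rng):
    """Per-sample estimates of d/dmu E[sin(k*x)] for x ~ N(mu, sigma^2).

    Returns two lists: reparameterization (RP) samples and
    likelihood ratio (LR) samples, computed from the same noise draws.
    """
    rp, lr = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = mu + sigma * eps
        # RP (pathwise): differentiate through x = mu + sigma*eps, so dx/dmu = 1.
        rp.append(k * math.cos(k * x))
        # LR (score function): f(x) * d/dmu log N(x; mu, sigma^2) = f(x) * eps / sigma.
        lr.append(math.sin(k * x) * eps / sigma)
    return rp, lr


def inverse_variance_combine(rp, lr):
    """Weight the RP and LR sample means by their inverse sample variances."""
    var_rp = statistics.variance(rp)
    var_lr = statistics.variance(lr)
    w_lr = var_rp / (var_rp + var_lr)  # weight on LR grows as RP gets noisier
    combined = w_lr * statistics.fmean(lr) + (1.0 - w_lr) * statistics.fmean(rp)
    return combined, w_lr, var_rp, var_lr


rng = random.Random(0)  # fixed seed for repeatability
rp, lr = gradient_samples(mu=0.3, sigma=1.0, k=20.0, n=5000, rng=rng)
combined, w_lr, var_rp, var_lr = inverse_variance_combine(rp, lr)
print(f"var RP = {var_rp:.2f}, var LR = {var_lr:.2f}, weight on LR = {w_lr:.4f}")
print(f"combined gradient estimate = {combined:.4f}")
```

In this toy setting the RP variance scales roughly with k^2, while the LR variance does not depend on the derivative of the objective, so the combined estimate leans almost entirely on LR when k is large. With a small k the weighting reverses and RP dominates.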
Sponsorship
Alan Turing Institute (unknown)
Identifiers
External DOI: https://doi.org/10.17863/CAM.27143
This record's URL: https://www.repository.cam.ac.uk/handle/1810/279773
Rights
Licence:
http://www.rioxx.net/licenses/all-rights-reserved