Sequential decision making with feature-linear models
View / Open Files
Authors
Janz, David
Advisors
Hernández-Lobato, José Miguel
Ghahramani, Zoubin
Date
2021-10-01Awarding Institution
University of Cambridge
Qualification
Doctor of Philosophy (PhD)
Type
Thesis
Metadata
Show full item recordCitation
Janz, D. (2021). Sequential decision making with feature-linear models (Doctoral thesis). https://doi.org/10.17863/CAM.84872
Abstract
This thesis is concerned with the problem of sequential decision making, where an agent interacts sequentially with an unknown environment and aims to maximise the sum of the rewards it receives. Our focus is on methods that model the reward as linear in some feature space. We consider a bandit problem, where the rewards are linear in a reproducing kernel Hilbert space, and a reinforcement learning setting with features given by a neural network. The thesis is split into two parts accordingly.
In part I, we introduce a new algorithm for the optimisation of continuous functions with a known number of derivatives under noisy bandit feedback. The algorithm, Partitioned GP-UCB, combines ideas from classic kernel-based bandit algorithms and hierarchical optimisation methods to provide near-tight confidence intervals and strong guarantees on computational complexity. As part of our analysis, we develop a novel bound on the effective dimension of kernel ridge regression with the Matérn kernel with respect to the volume of the domain. Part I is mainly concerned with the derivation of this bound.
In part II we tackle practical exploration in deep reinforcement learning, with a focus on methods that combine linear uncertainty estimates with feature embeddings given by deep Q-function networks. We observe a flaw within previous such work: while these methods enforce a temporal difference relationship on the mean state-action values given by the linear model, they do not constrain its inputs---the neural network embeddings---and these determine the uncertainty estimates of linear models. We show that such embeddings need to follow a certain successor feature structure and develop a model based on this insight: Successor Uncertainties. We demonstrate that our model is capable of solving hard tabular exploration challenges and that it scales to the Atari Learning Environment, where it attains strong scores relative to competing methods.
Keywords
Kernel methods, Machine learning, Sequential decision making, Reinforcement learning, Bandit algorithms, Feature-linear models, Gaussian processes
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.84872
Statistics
Total file downloads (since January 2020). For more information on metrics see the
IRUS guide.
Recommended or similar items
The current recommendation prototype on the Apollo Repository will be turned off on 03 February 2023. Although the pilot has been fruitful for both parties, the service provider IKVA is focusing on horizon scanning products and so the recommender service can no longer be supported. We recognise the importance of recommender services in supporting research discovery and are evaluating offerings from other service providers. If you would like to offer feedback on this decision please contact us on: support@repository.cam.ac.uk