Deep Memory Models and Efficient Reinforcement Learning under Partial Observability


Abstract

Reinforcement learning is a framework for optimal decision making that considers the long-term consequences of actions. Deep variants of reinforcement learning have emerged as powerful tools for decision making in complex environments. Prior work has mastered complex games like Go and Atari, outperforming some of the best humans on Earth. Yet, these impressive feats are often constrained to games or simulations. What is keeping them from the real world? In this thesis, we address one major roadblock: limited and imperfect sensory information.

In many realistic tasks, sensory information is noisy or incomplete, breaking a core assumption of reinforcement learning. The solution to this challenge is actually well known -- the use of memory. Memory is the storage and recall of sensory information for use in decision making, similar to the function of memory in humans and many other organisms. Memory enables such organisms to build and update internal representations of the world, make educated guesses, and succeed in the face of uncertainty. What is not well understood is how to model memory reliably or in a tractable manner. In this thesis, we attempt to make memory modeling slightly less intractable and slightly more practical.

First, we propose a form of memory that utilizes prior knowledge we have about a task. Using this knowledge, we construct a graph of memories on the fly, improving data and parameter efficiency when compared with standard memory models. Next, we discuss a large-scale study of memory models. We design a variety of procedurally generated tasks, and then implement and evaluate an array of memory models on these tasks. Taking a practical approach, we determine which models show promise, saving time and computational cycles for future researchers. Then, we explore models of human memory as prescribed by computational psychologists. Using these principles, we develop a memory model that achieves better time and space efficiency than standard models. We go on to show that this method outperforms prior work, while also exhibiting interesting theoretical properties. Finally, we discover a unifying theoretical framework for efficient memory modeling, encompassing many existing memory models. Using this framework, we suggest a new way to train memory models, improving time, space, and data efficiency.

Date

2024-07-18

Advisors

Prorok, Amanda

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International (CC BY 4.0)

Sponsorship
Toshiba Europe Ltd