Parallel reward and punishment control in humans and robots: Safe reinforcement learning using the MaxPain algorithm

Accepted version
Peer-reviewed

Type

Article

Authors

Elfwing, S 

Abstract

An important issue in reinforcement learning (RL) systems for autonomous agents is whether it makes sense to have separate systems for predicting rewards and punishments. In robotics, learning and control are typically achieved by a single controller, with punishments coded as negative rewards. However, in biological systems, some evidence suggests that the brain has a separate system for punishment. Although this may in part be due to the biological constraints of implementing negative quantities, it raises the question of whether there is any computational rationale for keeping reward and punishment prediction operationally distinct. Here we outline a basic argument supporting this idea, based on the proposition that learning best-case predictions (as in Q-learning) does not always achieve the safest behaviour. We introduce a modified RL scheme involving a new algorithm, which we call 'MaxPain', that backs up worst-case predictions in parallel and then scales the two predictions in a multi-attribute RL policy, i.e. it independently learns 'what to do' as well as 'what not to do' and then combines this information. We show how this scheme can improve performance in benchmark RL environments, including a grid-world experiment and a delayed version of the mountain car experiment. In particular, we demonstrate how early exploration and learning are substantially improved, leading to much 'safer' behaviour. In conclusion, the results illustrate the importance of independent punishment prediction in RL, and provide a testable framework for better understanding punishment (such as pain) and avoidance in humans, in both health and disease.
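
The parallel scheme described in the abstract can be illustrated with a minimal tabular sketch. This is not the paper's implementation: the abstract does not give the update rules, so the function names, the non-negative `pain` signal, the learning rate `alpha`, the discount `gamma`, and the linear weighting `w` used to combine the two predictions are all assumptions made for illustration.

```python
import numpy as np


def maxpain_update(q_reward, q_pain, s, a, reward, pain, s_next,
                   alpha=0.1, gamma=0.9):
    """One tabular update of both predictors (illustrative assumption only).

    `pain` is assumed to be a non-negative punishment magnitude.
    """
    # Best-case backup for reward, as in standard Q-learning.
    q_reward[s, a] += alpha * (
        reward + gamma * q_reward[s_next].max() - q_reward[s, a]
    )
    # Worst-case backup for punishment: bootstrap from the most
    # painful action available in the next state.
    q_pain[s, a] += alpha * (
        pain + gamma * q_pain[s_next].max() - q_pain[s, a]
    )


def combined_greedy_action(q_reward, q_pain, s, w=0.5):
    """Greedy action under an assumed linear mix of the two predictions."""
    # High predicted reward raises the score; high predicted pain lowers it.
    score = w * q_reward[s] - (1.0 - w) * q_pain[s]
    return int(np.argmax(score))
```

In this sketch, `w = 1` would reduce action selection to ordinary greedy Q-learning, while smaller values give more weight to avoiding actions with high worst-case predicted pain.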

Keywords

46 Information and Computing Sciences, 4602 Artificial Intelligence, 4611 Machine Learning, Clinical Research, 1.2 Psychological and socioeconomic processes, 1 Underpinning research, Mental health, Generic health relevance

Journal Title

7th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, ICDL-EpiRob 2017

Conference Name

Journal ISSN

2161-9484

Volume Title

2018-January

Publisher

IEEE

Sponsorship

Wellcome Trust (097490/Z/11/Z)
Arthritis Research UK (21537)