Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information

Accepted version
Peer-reviewed


Abstract

Recent research on reinforcement learning (RL) has suggested that trained agents are vulnerable to maliciously crafted adversarial samples. In this work, we show how such samples can be generalised from white-box and grey-box attacks to a strong black-box case, in which the attacker has no knowledge of the agents, their training parameters, or their training methods. We use sequence-to-sequence models to predict a single action or a sequence of future actions that a trained agent will take. First, we show that our approximation model, based on time-series information from the agent, consistently predicts RL agents' future actions with high accuracy in a black-box setup across a wide range of games and RL algorithms. Second, we find that although adversarial samples transfer from the sequence-to-sequence model to our RL agents, they often outperform random Gaussian noise only marginally. Third, we propose a novel use for adversarial samples in black-box attacks on RL agents: they can be used to trigger a trained agent to misbehave after a specific time delay. This potentially enables an attacker to use devices controlled by RL agents as time bombs.
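
For illustration only, the following is a minimal, hypothetical sketch of the kind of sequence-to-sequence action approximator the abstract describes, assuming PyTorch, a discrete action space, and a GRU encoder-decoder. The architecture, hyperparameters, and names here are assumptions made for the example, not the authors' implementation.

# Hypothetical sketch: predict a black-box agent's next few actions from
# a window of its recent observations (sequence-to-sequence approximation).
import torch
import torch.nn as nn

class ActionSeq2Seq(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128, horizon=4):
        super().__init__()
        self.horizon = horizon
        self.n_actions = n_actions
        self.encoder = nn.GRU(obs_dim, hidden, batch_first=True)
        self.decoder = nn.GRUCell(n_actions, hidden)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq):
        # obs_seq: (batch, T, obs_dim) -- observed time series from the agent
        _, h = self.encoder(obs_seq)
        h = h.squeeze(0)                                  # (batch, hidden)
        inp = torch.zeros(obs_seq.size(0), self.n_actions)  # start token
        logits = []
        for _ in range(self.horizon):                     # roll out future actions
            h = self.decoder(inp, h)
            step = self.head(h)
            logits.append(step)
            inp = torch.softmax(step, dim=-1)             # feed prediction back
        return torch.stack(logits, dim=1)                 # (batch, horizon, n_actions)

# Training signal: cross-entropy against the actions the target agent was
# actually observed to take (no access to its weights or training method).
model = ActionSeq2Seq(obs_dim=16, n_actions=6)
obs = torch.randn(32, 10, 16)                 # batch of observation windows
true_actions = torch.randint(0, 6, (32, 4))   # observed future actions
loss = nn.CrossEntropyLoss()(model(obs).reshape(-1, 6), true_actions.reshape(-1))
loss.backward()

Once such an approximator predicts the agent's actions accurately, adversarial samples can be crafted against it and transferred to the black-box agent, as described in the abstract.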


Conference Name

2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Rights and licensing

Except where otherwise noted, this item's license is described as All rights reserved