Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information
Authors
Zhao, Y
Shumailov, I
Cui, H
Gao, X
Mullins, R
Anderson, R
Publication Date
2020
Journal Title
Proceedings - 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN-W 2020
Conference Name
2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)
ISBN
9781728172637
Publisher
IEEE
Pages
16-24
Type
Conference Object
This Version
AM (Accepted Manuscript)
Citation
Zhao, Y., Shumailov, I., Cui, H., Gao, X., Mullins, R., & Anderson, R. (2020). Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information. Proceedings - 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN-W 2020, 16-24. https://doi.org/10.1109/DSN-W50199.2020.00013
Abstract
Recent research on reinforcement learning (RL) has suggested that trained agents are vulnerable to maliciously crafted adversarial samples. In this work, we show how such samples can be generalised from White-box and Grey-box attacks to a strong Black-box case, where the attacker has no knowledge of the agents, their training parameters or their training methods. We use sequence-to-sequence models to predict a single action or a sequence of future actions that a trained agent will make. First, we show that our approximation model, based on time-series information from the agent, consistently predicts RL agents' future actions with high accuracy in a Black-box setup on a wide range of games and RL algorithms. Second, we find that although adversarial samples are transferable from the target model to our RL agents, they often outperform random Gaussian noise only marginally. This highlights a serious methodological deficiency in previous work on such agents; random jamming should have been taken as the baseline for evaluation. Third, we propose a novel use for adversarial samples in Black-box attacks on RL agents: they can be used to trigger a trained agent to misbehave after a specific time delay. This appears to be a genuinely new type of attack. It potentially enables an attacker to use devices controlled by RL agents as time bombs.
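As a rough illustration of the action-approximation idea described in the abstract, the sketch below (in PyTorch; not the authors' code, and all names, shapes and hyperparameters are illustrative assumptions) trains a recurrent model to predict a black-box agent's next action from a window of observed states.

# Minimal sketch of black-box action approximation, assuming the attacker
# can only watch the target agent play and record (state sequence, action)
# pairs. All class names and hyperparameters here are hypothetical.
import torch
import torch.nn as nn

class ActionPredictor(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        # Encode the observed time series of states with an LSTM.
        self.encoder = nn.LSTM(state_dim, hidden, batch_first=True)
        # Map the final hidden state to logits over the agent's actions.
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, states):            # states: (batch, T, state_dim)
        _, (h, _) = self.encoder(states)
        return self.head(h[-1])           # logits: (batch, n_actions)

def fit(model, loader, epochs=10):
    # loader yields (states, actions) batches harvested by observing
    # the target agent; actions are integer class labels.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for states, actions in loader:
            opt.zero_grad()
            loss = loss_fn(model(states), actions)
            loss.backward()
            opt.step()

A single-step classifier over a discrete action space suffices for this sketch; the paper's full setup uses sequence-to-sequence models that can also predict a sequence of future actions.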
Keywords
Reinforcement Learning, Adversarial Machine Learning
Identifiers
External DOI: https://doi.org/10.1109/DSN-W50199.2020.00013
This record's URL: https://www.repository.cam.ac.uk/handle/1810/313105
Rights
All rights reserved
Licence:
http://www.rioxx.net/licenses/all-rights-reserved