
Architectural Backdoors in Neural Networks

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Bober-Irizar, M 
Shumailov, I 
Zhao, Y 
Papernot, N 

Abstract

Machine learning is vulnerable to adversarial manipulation. Previous literature has demonstrated that, at the training stage, attackers can manipulate the data [14] and the data sampling procedure [29] to control model behaviour. A common attack goal is to plant backdoors, i.e. to force the victim model to learn to recognise a trigger known only to the adversary. In this paper, we introduce a new class of backdoor attacks that hide inside the model architecture, i.e. in the inductive bias of the functions used for training. These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture that others will reuse unknowingly. We demonstrate that architectural backdoors represent a real threat and, unlike other approaches, can survive complete re-training from scratch. We formalise the main construction principles behind architectural backdoors, such as a connection between the input and the output, and describe possible protections against them. We evaluate our attacks on computer vision benchmarks of different scales and demonstrate that the underlying vulnerability is pervasive in a variety of common training settings.
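
To make the construction principle named in the abstract (a connection between the input and the output) concrete, the sketch below shows one way an architectural backdoor of this kind could look in PyTorch. It is an illustrative assumption, not the construction from the paper: the BackdooredClassifier wrapper, the white-patch trigger, and the target_class parameter are all hypothetical, and inputs are assumed to be scaled to [0, 1]. The key point it illustrates is that the malicious pathway contains no trainable weights, so re-training the model from scratch does not remove it.

    import torch
    import torch.nn as nn

    class BackdooredClassifier(nn.Module):
        """Wraps an ordinary backbone with a parameter-free pathway from the
        input to the logits. Because the pathway has no trainable weights,
        it persists even if the model is re-trained from scratch.
        (Illustrative sketch; trigger and target class are hypothetical.)"""

        def __init__(self, backbone: nn.Module, num_classes: int, target_class: int = 0):
            super().__init__()
            self.backbone = backbone          # any standard classifier producing logits
            self.num_classes = num_classes
            self.target_class = target_class  # class forced when the trigger is present

        def trigger_score(self, x: torch.Tensor) -> torch.Tensor:
            # Parameter-free "detector": measures how bright the top-left 4x4
            # patch is (inputs assumed in [0, 1]). Natural images rarely
            # saturate it; pasting a white patch there activates the backdoor.
            patch = x[:, :, :4, :4]                      # (B, C, 4, 4)
            score = patch.mean(dim=(1, 2, 3))            # (B,)
            return torch.sigmoid(20.0 * (score - 0.95))  # ~0 normally, ~1 with trigger

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            logits = self.backbone(x)                     # (B, num_classes)
            gate = self.trigger_score(x).unsqueeze(1)     # (B, 1)
            override = torch.zeros_like(logits)
            override[:, self.target_class] = 10.0         # large logit for the target class
            # Blend: behaves normally without the trigger, misclassifies with it.
            return (1.0 - gate) * logits + gate * override

Because the gate is computed by fixed operations rather than learned parameters, an unsuspecting user who copies this architecture and trains it on clean data from scratch still inherits the trigger behaviour.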

Keywords

46 Information and Computing Sciences, 4611 Machine Learning, Networking and Information Technology R&D (NITRD)

Journal Title

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Conference Name

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Journal ISSN

1063-6919

Volume Title

2023-June

Publisher

IEEE