Towards Certifiable Adversarial Sample Detection
Accepted version
Peer-reviewed
Abstract
Convolutional Neural Networks (CNNs) are deployed in a growing number of
classification systems, but adversarial samples can be maliciously crafted to
trick them and are becoming a real threat. Various proposals aim to improve
the adversarial robustness of CNNs, but all of them suffer from performance
penalties or other limitations. In this paper, we present a new approach in the
form of a certifiable adversarial detection scheme, the Certifiable Taboo Trap
(CTT). The system can provide certifiable guarantees of detection of
adversarial inputs for certain