
Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness

Published version
Peer-reviewed

Type

Conference Object

Change log

Authors

Malinin, Andrey 

Abstract

Ensemble approaches for uncertainty estimation have recently been applied to the tasks of misclassification detection, out-of-distribution input detection and adversarial attack detection. Prior Networks have been proposed as an approach to efficiently emulate an ensemble of models for classification by parameterising a Dirichlet prior distribution over output distributions. These models have been shown to outperform alternative ensemble approaches, such as Monte-Carlo Dropout, on the task of out-of-distribution input detection. However, scaling Prior Networks to complex datasets with many classes is difficult using the training criteria originally proposed. This paper makes two contributions. First, we show that the appropriate training criterion for Prior Networks is the reverse KL-divergence between Dirichlet distributions. This addresses issues in the nature of the training data target distributions, enabling Prior Networks to be successfully trained on classification tasks with arbitrarily many classes, as well as improving out-of-distribution detection performance. Second, taking advantage of this new training criterion, this paper investigates using Prior Networks to detect adversarial attacks and proposes a generalized form of adversarial training. It is shown that the construction of successful adaptive whitebox attacks, which affect the prediction and evade detection, against Prior Networks trained on CIFAR-10 and CIFAR-100 using the proposed approach requires a greater amount of computational effort than against networks defended using standard adversarial training or MC-dropout.
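The reverse-KL criterion discussed in the abstract rests on the KL divergence between two Dirichlet distributions, which has a closed form. The sketch below is illustrative only (the function name `dirichlet_kl` and the example concentration parameters are not from the paper); it shows the closed-form divergence and its asymmetry, which is what makes the choice of direction (forward vs. reverse) matter during training.

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) ).

    Uses the standard identity:
      KL = ln G(a0) - sum ln G(a_k) - ln G(b0) + sum ln G(b_k)
           + sum (a_k - b_k) * (psi(a_k) - psi(a0))
    where a0 = sum a_k, G is the gamma function and psi the digamma.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    a0, b0 = alpha.sum(), beta.sum()
    return (gammaln(a0) - gammaln(alpha).sum()
            - gammaln(b0) + gammaln(beta).sum()
            + ((alpha - beta) * (digamma(alpha) - digamma(a0))).sum())

if __name__ == "__main__":
    sharp = [20.0, 1.0, 1.0]   # concentrated Dirichlet (confident prediction)
    flat = [1.0, 1.0, 1.0]     # uniform Dirichlet (maximal uncertainty)
    # KL is asymmetric, so KL(sharp || flat) differs from KL(flat || sharp);
    # the paper's contribution is arguing for the reverse direction.
    print(dirichlet_kl(sharp, flat), dirichlet_kl(flat, sharp))
```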

Description

Keywords

Journal Title

Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

Conference Name

NeurIPS

Journal ISSN

1049-5258

Volume Title

Publisher

Sponsorship
EPSRC
Cambridge Assessment