Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness
Published version
Peer-reviewed
Abstract
Ensemble approaches for uncertainty estimation have recently been applied to the tasks of misclassification detection, out-of-distribution input detection and adversarial attack detection. Prior Networks have been proposed as an approach to efficiently emulate an ensemble of models for classification by parameterising a Dirichlet prior distribution over output distributions. These models have been shown to outperform alternative ensemble approaches, such as Monte-Carlo Dropout, on the task of out-of-distribution input detection. However, scaling Prior Networks to complex datasets with many classes is difficult using the training criteria originally proposed. This paper makes two contributions. First, we show that the appropriate training criterion for Prior Networks is the reverse KL-divergence between Dirichlet distributions. This addresses issues in the nature of the training data target distributions, enabling Prior Networks to be successfully trained on classification tasks with arbitrarily many classes, as well as improving out-of-distribution detection performance. Second, taking advantage of this new training criterion, this paper investigates using Prior Networks to detect adversarial attacks and proposes a generalized form of adversarial training. It is shown that the construction of successful adaptive whitebox attacks, which affect the prediction and evade detection, against Prior Networks trained on CIFAR-10 and CIFAR-100 using the proposed approach requires a greater amount of computational effort than against networks defended using standard adversarial training or MC-dropout.
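The KL-divergence between two Dirichlet distributions has a well-known closed form, so the reverse-KL criterion the abstract refers to can be evaluated exactly. Below is a minimal sketch, assuming NumPy/SciPy; the function and variable names are illustrative and the target construction is simplified, not the authors' released code:

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) ).

    alpha, beta: positive concentration parameter vectors of equal length.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    a0, b0 = alpha.sum(), beta.sum()
    return (gammaln(a0) - gammaln(alpha).sum()
            - gammaln(b0) + gammaln(beta).sum()
            + ((alpha - beta) * (digamma(alpha) - digamma(a0))).sum())

# Forward vs. reverse ordering for a model output and a sharp target:
alpha_model = np.array([2.0, 2.0, 2.0])      # hypothetical predicted concentrations
alpha_target = np.array([101.0, 1.0, 1.0])   # hypothetical sharp target on class 0

forward_kl = dirichlet_kl(alpha_target, alpha_model)  # KL(target || model)
reverse_kl = dirichlet_kl(alpha_model, alpha_target)  # KL(model || target), the paper's ordering
```

KL-divergence is asymmetric, so the two orderings generally give different values and gradients; the paper's contribution is showing that the reverse ordering yields better-behaved targets for Prior Network training.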