Anomalous Inputs in Deep Learning: A Probabilistic Perspective
Abstract
Can neural networks recognize their own limitations? A crucial aspect of robustness is the ability to identify when an input falls outside the scope of one’s knowledge or training --- a task known as out-of-distribution (OOD) detection. For example, a dog breed classifier should ideally recognize a cat image as OOD and refrain from classifying it as a breed of dog.
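To make the task concrete, a widely used baseline (not the method developed in this thesis) scores each input by the classifier's maximum softmax probability and treats low-confidence inputs as OOD. The sketch below assumes a generic PyTorch classifier `model`, an input batch `x`, and an illustrative `threshold`; all names are hypothetical.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def msp_score(model, x):
    # Maximum softmax probability: the classifier's confidence in its top class.
    probs = F.softmax(model(x), dim=-1)
    return probs.max(dim=-1).values

def is_ood(model, x, threshold=0.5):
    # Flag inputs whose top-class confidence falls below a threshold
    # chosen on held-in validation data (threshold value is illustrative).
    return msp_score(model, x) < threshold
```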
Conversely, can we manipulate neural networks into making confident but incorrect classifications? This task, known as an adversarial attack, involves altering an input so that it is misclassified while its original semantic content is preserved.
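As a minimal illustration of a standard attack (again, not the attack proposed in this thesis), the fast gradient sign method perturbs each pixel slightly in the direction that increases the classification loss. The sketch assumes a differentiable PyTorch classifier `model`, an image batch `x` in [0, 1], true labels `y`, and an illustrative perturbation budget `epsilon`.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    # Single-step FGSM: nudge each pixel by epsilon in the sign of the
    # loss gradient, keeping the change small while pushing the input
    # toward misclassification.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # stay in the valid pixel range
```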
Both of these tasks are concerned with anomalous inputs to a neural network, but they have so far been addressed by two different bodies of literature, using different sets of tools. In this thesis I introduce a probabilistic framework that I call the `three distribution problem', which unifies both tasks. I use this framework to develop new methods for detecting OOD inputs and for creating adversarial attacks.
Furthermore, my three-distribution approach gives insight into how `semantics' is operationalized: I demonstrate how we can visualize the semantics implicit in off-the-shelf OOD detection algorithms; and my adversarial attack method allows the attacker to specify explicitly the semantics that are to be preserved by the attack.

