Analyzing Biases to Spurious Correlations in Text Classification Tasks
Accepted version
Peer-reviewed
Repository URI
Repository DOI
Change log
Authors
Abstract
Machine learning systems have shown impressive performance across a range of natural language tasks. However, it has been hypothesized that these systems are prone to learning spurious correlations that may be present in the training data. Though these correlations will not impact in-domain performance, they are unlikely to generalize well to out-of-domain data, limiting the applicability of systems. This work examines this phenomenon on text classification tasks. Rather than artificially injecting features into the data, we demonstrate that real spurious correlations can be exploited by current stateof-the-art deep-learning systems. Specifically, we show that even when only ‘stop’ words are available at the input stage, it is possible to predict the class significantly better than random. Though it is shown that these stop words are not required for good in-domain performance, they can degrade the ability of the system to generalize well to out-of-domain data