Dataset Bias in Deception Detection

Mambreyan, Ara 
Punskaya, Elena 

Conference Object

With advances in machine learning, lie detection technology has gained significant attention. In recent years, several multi-modal techniques have reported accuracy as high as 99% on the Real-life Trial dataset, which contains only 121 data points. This has led to considerable media hype and research interest in lie detection with machine learning. In this paper, we analyze the effect of dataset bias in deception detection. More specifically, we train a classifier to predict the sex of the person appearing in the video. On a test data point, we use the predicted sex as a proxy for deception, predicting lie for females and truth for males. This lie predictor simulates a classifier that uses nothing but dataset bias. Nevertheless, we find that the performance of this biased classifier is comparable to that of state-of-the-art methods. For example, when using IDT features, our biased classifier achieves 64.6% accuracy and 59.3% AUC, while a classifier trained normally on truth/lie labels achieves 57.4% accuracy and 69.3% AUC. We perform similar experiments on the Bag-of-Lies dataset and show that it too is biased with respect to sex. In addition, we apply state-of-the-art techniques to an unbiased dataset and show that their performance is no better than chance. Our experiments strongly suggest that the results of recent deception detection techniques can be explained by the bias inherent in the datasets.
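The proxy rule described in the abstract (predict lie for females, truth for males) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `proxy_deception_labels` and `accuracy` are hypothetical, and the six data points are made-up toy values, chosen only to show how a sex-skewed dataset lets the proxy score above chance.

```python
# Illustrative sketch only: all values below are made-up toy data,
# not taken from the Real-life Trial or Bag-of-Lies datasets.

def proxy_deception_labels(predicted_sex):
    """Map predicted sex to a deception label: 'female' -> 1 (lie), 'male' -> 0 (truth)."""
    return [1 if s == "female" else 0 for s in predicted_sex]

def accuracy(y_true, y_pred):
    """Fraction of positions where the two label lists agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical output of a sex classifier on six test videos.
predicted_sex = ["female", "female", "female", "male", "male", "male"]

# Hypothetical ground-truth deception labels (1 = lie, 0 = truth),
# deliberately skewed so that most females lie and most males tell the truth.
is_lie = [1, 1, 0, 0, 0, 1]

proxy_pred = proxy_deception_labels(predicted_sex)
proxy_acc = accuracy(is_lie, proxy_pred)
print(proxy_pred, round(proxy_acc, 3))
```

In this toy setup the proxy never looks at any deception cue, yet it agrees with four of the six labels purely because the labels correlate with sex.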

Journal Title
Proceedings of the 26th International Conference on Pattern Recognition
Engineering and Physical Sciences Research Council (EP/R030782/1)
European Commission Horizon 2020 (H2020) Societal Challenges (826232)