
Dataset Bias in Deception Detection.

Accepted version


Conference Object



Mambreyan, Ara 
Punskaya, Elena 


With advances in machine learning, lie detection technology has gained significant attention. In recent years, several multi-modal techniques have reported accuracy as high as 99% on the Real-life Trial dataset, which contains only 121 data points. This has led to considerable media hype and research interest in lie detection with machine learning. In this paper, we analyze the effect of dataset bias in deception detection. More specifically, we train a classifier to predict the sex of the person appearing in the video. On a test data point, we use the predicted sex as a proxy for deception, predicting lie for females and truth for males. This lie predictor simulates a classifier that uses nothing but dataset bias. Nevertheless, we find that the performance of this biased classifier is comparable to that reported in state-of-the-art papers. When using IDT features, our biased classifier achieves 64.6% accuracy and 59.3% AUC, while a classifier trained normally on truth/lie labels achieves 57.4% accuracy and 69.3% AUC. We perform similar experiments on the Bag-of-Lies dataset and show that it too is biased with respect to sex. In addition, we apply the state-of-the-art techniques to an unbiased dataset and show that their performance is no better than chance. Our experiments strongly suggest that the results of recent deception detection techniques can be explained by the bias inherent in the datasets.
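
The proxy idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `proxy_lie_predictions` and `accuracy`, the `"F"`/`"M"` encoding, and the eight toy data points are all hypothetical, and the sex predictions are assumed to come from a separately trained sex classifier that is not shown here.

```python
from typing import List


def proxy_lie_predictions(predicted_sex: List[str]) -> List[int]:
    """Map each predicted sex to a deception label.

    Following the rule described in the abstract: predict lie (1) for
    females ("F") and truth (0) for males ("M"). No truth/lie labels
    are used to make these predictions.
    """
    return [1 if sex == "F" else 0 for sex in predicted_sex]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data (NOT from the paper): the output of an assumed
# sex classifier and the ground-truth deception labels (1 = lie).
predicted_sex = ["F", "F", "F", "M", "M", "M", "F", "M"]
is_lie = [1, 1, 0, 0, 0, 1, 1, 0]

preds = proxy_lie_predictions(predicted_sex)
proxy_accuracy = accuracy(is_lie, preds)
print(f"proxy accuracy on toy data: {proxy_accuracy:.2f}")
```

In this toy set the labels are deliberately correlated with sex, so the proxy scores above chance despite never seeing a truth/lie label. On a dataset without that correlation the same rule would be expected to perform near chance.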





Conference Name

The 26th International Conference on Pattern Recognition



Engineering and Physical Sciences Research Council (EP/R030782/1)
European Commission Horizon 2020 (H2020) Societal Challenges (826232)