
Multimodal Deep Learning Framework for Mental Disorder Recognition

Accepted version


Conference Object



Zhang, Z 
Lin, W 
Liu, M 
Mahmoud, M 


Current methods for mental disorder recognition mostly depend on clinical interviews and self-reported scores, which can be highly subjective. An automatic recognition system can help with early detection of symptoms and provide insights into the biological markers for diagnosis. It is, however, a challenging task, as it requires taking into account indicators from different modalities, such as facial expressions, gestures, acoustic features and verbal content. To address this issue, we propose a general-purpose multimodal deep learning framework in which multiple modalities (acoustic, visual and textual features) are processed individually while the cross-modality correlation is taken into account. Specifically, a Multimodal Deep Denoising Autoencoder (multiDDAE) is designed to obtain multimodal representations of audio-visual features, followed by Fisher Vector encoding, which produces session-level descriptors. For the textual modality, a Paragraph Vector (PV) is proposed to embed the transcripts of interview sessions into document representations that capture cues related to mental disorders. Following an early fusion strategy, the audio-visual and textual features are fused before being fed to a Multitask Deep Neural Network (DNN) as the final classifier. Our framework is evaluated on the automatic detection of two mental disorders, bipolar disorder (BD) and depression, using two datasets: the Bipolar Disorder Corpus (BDC) and the Extended Distress Analysis Interview Corpus (E-DAIC), respectively. Our experimental evaluation showed performance comparable to the state of the art in BD and depression detection, demonstrating effective multimodal representation learning and the capability to generalise across different mental disorders.
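The early fusion step described in the abstract can be illustrated with a minimal sketch: session-level descriptors from each modality are joined along the feature axis before classification. This is not the authors' implementation. The function names, the array shapes, the random placeholder data and the per-feature standardisation step are all assumptions made for illustration only.

```python
import numpy as np


def standardise(x, eps=1e-8):
    """Scale each feature column to zero mean and (roughly) unit variance.

    Assumption: the abstract does not say how features are normalised;
    this step is included only so both modalities share a common scale.
    """
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)


def early_fusion(audio_visual, textual):
    """Concatenate per-session descriptors from two modalities.

    Both inputs are (n_sessions, n_features) arrays; the result has
    n_features_av + n_features_txt columns.
    """
    if audio_visual.shape[0] != textual.shape[0]:
        raise ValueError("both modalities must describe the same sessions")
    return np.concatenate(
        [standardise(audio_visual), standardise(textual)], axis=1
    )


# Hypothetical placeholder data: 4 sessions, 6-dim audio-visual
# descriptors and 3-dim document embeddings (sizes are arbitrary).
rng = np.random.default_rng(0)
av = rng.normal(size=(4, 6))
txt = rng.normal(size=(4, 3))

fused = early_fusion(av, txt)
print(fused.shape)
```

In this sketch the fused matrix would be the input to a downstream classifier; the multitask DNN itself is not shown.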



46 Information and Computing Sciences, 4608 Human-Centred Computing, 4603 Computer Vision and Multimedia Computation, Mental Illness, Behavioral and Social Science, Machine Learning and Artificial Intelligence, Brain Disorders, Bipolar Disorder, Mental Health, Depression, Serious Mental Illness, Networking and Information Technology R&D (NITRD), 3 Good Health and Well Being

Journal Title

Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020

Conference Name

2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)




All rights reserved