Multi-modal Human Behaviour Graph Representation Learning for Automatic Depression Assessment

Accepted version


Conference Object

Shen, Haotian 
Song, Siyang 


Automatic depression assessment (ADA) often relies on crucial cues embedded in human verbal and non-verbal behaviours, which exist across the video, audio, and text modalities. Although these modalities are typically represented as time series, current research offers limited exploration of the complex intra-modal temporal dynamics inherent to each modality, and thus fails to extract depression-related cues from a global view. While many methodologies attempt to exploit the multifaceted information encoded across modalities via decision-level or feature-level fusion, they often fall short in representing pairwise inter-modal relationships, which are key to exploiting the distinct complementary relationship between each modality pair. This paper presents a novel graph-based multimodal fusion approach that conveniently models both intra-modal and inter-modal dynamics within a single graph representation. It adopts undirected edges to link not only temporally consecutive, pre-extracted features within each modality, but also temporally aligned features across each pair of modalities. This ensures the seamless propagation of global information along the temporal dimension and helps capture the pairwise inter-modal dynamics. We conduct experiments on the E-DAIC dataset to demonstrate our approach's effectiveness, achieving an RMSE of 4.80 and a CCC of 0.563, which rival the top-performing method. We also experiment on the AFAR-BSFP dataset to show the generality of our approach. Our code will be made publicly available.
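As a rough illustration (not the authors' released implementation), the graph topology described in the abstract could be sketched as follows: each node is one modality's pre-extracted feature at one time step, undirected edges connect temporally consecutive nodes within each modality, and further undirected edges connect temporally aligned nodes across every modality pair. All function and variable names below are hypothetical.

```python
from itertools import combinations

def build_multimodal_graph_edges(num_steps, num_modalities=3):
    """Sketch of the undirected edge set for a multimodal temporal graph.

    Node (m, t) holds the pre-extracted feature of modality m
    (e.g. video, audio, text) at time step t; nodes are flattened
    to the index m * num_steps + t. Returns a set of undirected
    edges stored as (i, j) tuples with i < j.
    """
    index = {(m, t): m * num_steps + t
             for m in range(num_modalities) for t in range(num_steps)}
    edges = set()
    # Intra-modal edges: link temporally consecutive features of each modality.
    for m in range(num_modalities):
        for t in range(num_steps - 1):
            edges.add((index[(m, t)], index[(m, t + 1)]))
    # Inter-modal edges: link temporally aligned features across each modality pair.
    for m1, m2 in combinations(range(num_modalities), 2):
        for t in range(num_steps):
            a, b = index[(m1, t)], index[(m2, t)]
            edges.add((min(a, b), max(a, b)))
    return edges
```

For 3 modalities over T steps this yields 3(T-1) intra-modal edges plus 3T inter-modal edges; a graph neural network operating on this structure can then propagate depression-related cues both along time and across modality pairs.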



Conference Name

The 18th IEEE International Conference on Automatic Face and Gesture Recognition


Engineering and Physical Sciences Research Council (EP/R030782/1)