XHate-999: Analyzing and Detecting Abusive Language Across Domains and Languages

Published version
Peer-reviewed

Type

Conference Object

Authors

Glavas, Goran 
Karan, Mladen 
Vulic, Ivan 

Abstract

We present XHate-999, a multi-domain and multilingual evaluation data set for abusive language detection. By aligning test instances across six typologically diverse languages, XHate-999 for the first time allows for disentanglement of the domain transfer and language transfer effects in abusive language detection. We conduct a series of domain- and language-transfer experiments with state-of-the-art monolingual and multilingual transformer models, setting strong baseline results and profiling XHate-999 as a comprehensive evaluation resource for abusive language detection. Finally, we show that domain- and language-adaptation, via intermediate masked language modeling on abusive corpora in the target language, can lead to substantially improved abusive language detection in the target language in the zero-shot transfer setups.

Description

Keywords

Journal Title

Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020)

Conference Name

28th International Conference on Computational Linguistics (COLING 2020)

Journal ISSN

Volume Title

Publisher

International Committee on Computational Linguistics

Sponsorship

European Research Council (648909)