CAMsterdam at SemEval-2019 task 6: Neural and graph-based feature extraction for the identification of offensive tweets
Published version
Peer-reviewed
Abstract
We describe the CAMsterdam team entry to the SemEval-2019 Shared Task 6 on offensive language identification in Twitter data. Our proposed model learns to extract textual features using a multi-layer recurrent network, and then performs text classification using gradient-boosted decision trees (GBDT). A self-attention architecture enables the model to focus on the most relevant areas in the text. We additionally learn globally optimised embeddings for hashtags using node2vec, which are given as additional tweet features to the GBDT classifier. Our best model obtains 78.79% macro F1-score on detecting offensive language (subtask A), 66.32% on categorising offence types (targeted/untargeted; subtask B), and 55.36% on identifying the target of offence (subtask C).
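
The results above are reported as macro F1-scores, that is, the unweighted mean of the per-class F1 values, so a minority class counts as much as the majority class. The following is a minimal sketch of that metric in plain Python. It is not the task's official scorer. The function name `macro_f1`, the label strings `OFF` and `NOT`, and the toy predictions are assumptions made purely for illustration.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all classes seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("gold and pred must not be empty")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed

    scores = []
    for label in sorted(set(gold) | set(pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels (illustrative only, not data from the shared task).
gold = ["OFF", "OFF", "NOT", "NOT", "NOT", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT", "NOT", "OFF"]

score = macro_f1(gold, pred)  # per-class F1: OFF = 0.5, NOT = 0.75
print(f"macro F1 = {score:.3f}")  # prints "macro F1 = 0.625"
```

In this toy case accuracy is 4/6 (about 0.667) while macro F1 is 0.625: the weaker score on the smaller class pulls the average down, which is why macro averaging is a common choice for imbalanced label distributions.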