CAMsterdam at SemEval-2019 Task 6: Neural and graph-based feature extraction for the identification of offensive tweets

We describe the CAMsterdam team entry to the SemEval-2019 Shared Task 6 on offensive language identification in Twitter data. Our proposed model learns to extract textual features using a multi-layer recurrent network, and then performs text classification using gradient-boosted decision trees (GBDT). A self-attention architecture enables the model to focus on the most relevant areas in the text. We additionally learn globally optimised embeddings for hashtags using node2vec, which are given as additional tweet features to the GBDT classifier. Our best model obtains 78.79% macro F1-score on detecting offensive language (subtask A), 66.32% on categorising offence types (targeted/untargeted; subtask B), and 55.36% on identifying the target of offence (subtask C).
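
The results above are reported as macro F1-scores, meaning the F1-score is computed separately for each class and then averaged without weighting by class frequency. The following is a minimal sketch of that metric in plain Python. The function name `macro_f1`, the label strings, and the toy data are illustrative assumptions and are not taken from the paper or the shared task's evaluation script.

```python
def macro_f1(gold, pred):
    """Return the unweighted mean of per-class F1 scores."""
    # Consider every label seen in either the gold or the predicted labels.
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # Guard against division by zero when a class is never predicted
        # or never occurs in the gold labels.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy example with hypothetical labels (offensive vs. not offensive).
gold = ["OFF", "NOT", "NOT", "OFF", "NOT", "NOT"]
pred = ["OFF", "NOT", "OFF", "NOT", "NOT", "NOT"]
print(macro_f1(gold, pred))
```

Because each class contributes equally to the average, a system cannot reach a high macro F1-score by predicting only the majority class, which matters when offensive tweets are the minority.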

Journal Title

NAACL HLT 2019 - International Workshop on Semantic Evaluation, SemEval 2019, Proceedings of the 13th Workshop

Conference Name

In Proceedings of the NAACL International Workshop on Semantic Evaluation (SemEval 2019)

Publisher DOI

https://doi.org/10.17863/CAM.41797

Rights

Attribution 4.0 International

Sponsorship

Cambridge Assessment (unknown)

Collections

Cambridge University Research Outputs