SemEval-2020 Task 3: Graded Word Similarity in Context

Santos Armendariz, Carlos; Purver, Matthew; Pollak, Senja; Ljubesic, Nikola; Ulcar, Matej; Vulic, Ivan; Pilehvar, Mohammad Taher

Published version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/315096

Repository DOI

https://doi.org/10.17863/CAM.62203

Files

Published version (600.71 KB)

Type

Conference Object

Authors

Santos Armendariz, Carlos

Purver, Matthew

Pollak, Senja

Ljubesic, Nikola

Ulcar, Matej

Vulic, Ivan

Pilehvar, Mohammad Taher

Abstract

This paper presents the Graded Word Similarity in Context (GWSC) task, which asked participants to predict the effects of context on human perception of similarity in English, Croatian, Slovene, and Finnish. We received 15 submissions and 11 system description papers. A new dataset (CoSimLex) was created for evaluation in this task: it contains pairs of words, each annotated within two short text passages. Systems beat the baselines by significant margins, but few did well in more than one language or subtask. Almost every system employed a Transformer model, but with many variations in the details: WordNet sense embeddings, translation of contexts, TF-IDF weightings, and the automatic creation of datasets for fine-tuning were all used to good effect.

Journal Title

Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020)

Conference Name

14th International Workshop on Semantic Evaluation (SemEval 2020)

Publisher

International Committee for Computational Linguistics

Publisher DOI

https://doi.org/10.17863/CAM.62203

Rights

Attribution 4.0 International

Sponsorship

European Research Council (648909)

Collections

Cambridge University Research Outputs