Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Published version
Peer-reviewed

Type

Conference Object

Authors

Lauscher, Anne
Vulić, Ivan
Korhonen, Anna
Glavaš, Goran

Abstract

Unsupervised pretraining models have been shown to facilitate a wide range of downstream NLP applications. These models, however, retain some of the limitations of traditional static word embeddings. In particular, they encode only the distributional knowledge available in raw text corpora, incorporated through language modeling objectives. In this work, we complement such distributional knowledge with external lexical knowledge; that is, we integrate discrete knowledge about word-level semantic similarity into pretraining. To this end, we generalize the standard BERT model to a multi-task learning setting in which we couple BERT’s masked language modeling and next sentence prediction objectives with an auxiliary task of binary word relation classification. Our experiments suggest that our “Lexically Informed” BERT (LIBERT), specialized for word-level semantic similarity, yields better performance than the lexically blind “vanilla” BERT on several language understanding tasks. Concretely, LIBERT outperforms BERT on 9 out of 10 tasks of the GLUE benchmark and is on a par with BERT on the remaining one. Moreover, we show consistent gains on 3 benchmarks for lexical simplification, a task where knowledge about word-level semantic similarity is paramount, as well as large gains on lexical reasoning probes.
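The multi-task setup described in the abstract can be approximated in PyTorch roughly as sketched below: a shared encoder feeds three heads, and the masked LM, next sentence prediction, and binary word-relation losses are summed. All names, tensor shapes, and the equal loss weighting are illustrative assumptions, not the authors’ released implementation.

```python
# Hypothetical sketch of LIBERT's joint objective: MLM + NSP + binary
# word-relation classification over a shared BERT-style encoder.
# Shapes, head names, and equal loss weights are assumptions for illustration.
import torch
import torch.nn as nn

class MultiTaskLIBERT(nn.Module):
    def __init__(self, encoder: nn.Module, hidden: int, vocab: int):
        super().__init__()
        self.encoder = encoder                    # shared BERT-style encoder
        self.mlm_head = nn.Linear(hidden, vocab)  # masked-token logits
        self.nsp_head = nn.Linear(hidden, 2)      # next-sentence prediction
        self.rel_head = nn.Linear(hidden, 2)      # binary word-relation classifier

    def forward(self, lm_batch, rel_batch):
        # lm_batch: token ids for the standard LM objectives (B, T);
        # rel_batch: encoded word pairs for the auxiliary relation task (B, T).
        h_lm = self.encoder(lm_batch)             # (B, T, hidden)
        h_rel = self.encoder(rel_batch)           # (B, T, hidden)
        mlm_logits = self.mlm_head(h_lm)          # (B, T, vocab)
        nsp_logits = self.nsp_head(h_lm[:, 0])    # [CLS] representation
        rel_logits = self.rel_head(h_rel[:, 0])   # [CLS] of the word pair
        return mlm_logits, nsp_logits, rel_logits

def joint_loss(mlm_logits, nsp_logits, rel_logits,
               mlm_labels, nsp_labels, rel_labels):
    ce = nn.CrossEntropyLoss(ignore_index=-100)   # -100 masks unlabeled tokens
    # Sum of the three task losses; uniform weighting is an assumption.
    return (ce(mlm_logits.transpose(1, 2), mlm_labels)
            + ce(nsp_logits, nsp_labels)
            + ce(rel_logits, rel_labels))
```

The key design point the paper describes is that the auxiliary relation classifier shares the encoder with the language modeling objectives, so gradients from the lexical task shape the same representations used for the LM tasks.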

Journal Title

Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020)

Conference Name

28th International Conference on Computational Linguistics (COLING 2020)

Publisher

International Committee on Computational Linguistics

Sponsorship

European Research Council (648909)