Improving bilingual lexicon induction with unsupervised post-processing of monolingual word vector spaces

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Vulić, I 
Korhonen, A 
Glavaš, G 

Abstract

Work on projection-based induction of cross-lingual word embedding spaces (CLWEs) predominantly focuses on the improvement of the projection (i.e., mapping) mechanisms. In this work, in contrast, we show that a simple method for post-processing monolingual embedding spaces facilitates learning of the cross-lingual alignment and, in turn, substantially improves bilingual lexicon induction (BLI). The post-processing method we examine is grounded in the generalisation of first- and second-order monolingual similarities to the nth-order similarity. By post-processing monolingual spaces before the cross-lingual alignment, the method can be coupled with any projection-based method for inducing CLWE spaces. We demonstrate the effectiveness of this simple monolingual post-processing across a set of 15 typologically diverse languages (i.e., 15×14 BLI setups), and in combination with two different projection methods.
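
The abstract does not spell out the post-processing itself, so the following is only a minimal sketch of one way similarity order can be generalised, under stated assumptions: embeddings are rows of a matrix `X`, first-order similarity is the dot-product matrix `X Xᵀ`, second-order similarity is the similarity of similarity rows, and a single exponent `alpha` applied to the eigenvalues of `Xᵀ X` interpolates between them. The function name `nth_order_postprocess`, the parameter `alpha`, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def nth_order_postprocess(X, alpha):
    """Linearly transform embeddings X (vocab x dim) so that their
    dot-product similarities match a chosen similarity order.

    Assumption (not stated in the abstract): the transform is
    X @ Q @ diag(eigvals ** alpha), where X.T @ X = Q diag(eigvals) Q.T.
    alpha = 0.0 keeps first-order similarity; alpha = 0.5 gives second-order.
    """
    gram = X.T @ X                         # (dim x dim), symmetric PSD
    eigvals, Q = np.linalg.eigh(gram)      # eigendecomposition of the Gram matrix
    eigvals = np.clip(eigvals, 1e-12, None)  # guard against tiny negative values
    W = Q * eigvals ** alpha               # same as Q @ diag(eigvals ** alpha)
    return X @ W


# Toy monolingual space: 50 "words", 8 dimensions (random, illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))

first_order = X @ X.T                      # word-to-word dot products
second_order = first_order @ first_order.T # similarity of similarity rows

X_first = nth_order_postprocess(X, alpha=0.0)
X_second = nth_order_postprocess(X, alpha=0.5)

# The transformed spaces reproduce the corresponding similarity matrices.
first_ok = np.allclose(X_first @ X_first.T, first_order)
second_ok = np.allclose(X_second @ X_second.T, second_order)
```

Because such a transform acts on each monolingual space independently, the transformed source and target spaces could then be passed unchanged to any projection-based alignment method, which is consistent with the abstract's claim that the post-processing can be coupled with any such method.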

Journal Title

Proceedings of the Annual Meeting of the Association for Computational Linguistics

Conference Name

Proceedings of the 5th Workshop on Representation Learning for NLP (RepL4NLP, collocated with ACL 2020)

Journal ISSN

0736-587X

Rights

All rights reserved

Sponsorship

European Research Council (648909)