Are All Good Word Vector Spaces Isomorphic?

Published version
Peer-reviewed

Type

Conference Object

Authors

Vulic, Ivan 
Ruder, Sebastian 
Søgaard, Anders 

Abstract

Existing algorithms for aligning cross-lingual word vector spaces assume that vector spaces are approximately isomorphic. As a result, they perform poorly or fail completely on non-isomorphic spaces. Such non-isomorphism has been hypothesised to result from typological differences between languages. In this work, we ask whether non-isomorphism is also crucially a sign of degenerate word vector spaces. We present a series of experiments across diverse languages which show that variance in performance across language pairs is not only due to typological differences, but can mostly be attributed to the size of the monolingual resources available, and to the properties and duration of monolingual training (e.g. "under-training").

Journal Title

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

Conference Name

Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

Publisher

Association for Computational Linguistics

Rights

All rights reserved

Sponsorship

European Research Council (648909)