
Cross-lingual semantic specialization via lexical relation induction

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Ponti, EM 
Vulić, I 
Glavaš, G 
Reichart, R 
Korhonen, A 

Abstract

Semantic specialization integrates structured linguistic knowledge from external resources (such as lexical relations in WordNet) into pretrained distributional vectors in the form of constraints. However, this technique cannot be leveraged in many languages, because their structured external resources are typically incomplete or non-existent. To bridge this gap, we propose a novel method that transfers specialization from a resource-rich source language (English) to virtually any target language. Our specialization transfer comprises two crucial steps: 1) Inducing noisy constraints in the target language through automatic word translation; and 2) Filtering the noisy constraints via a state-of-the-art relation prediction model trained on the source language constraints. This allows us to specialize any set of distributional vectors in the target language with the refined constraints. We prove the effectiveness of our method through intrinsic word similarity evaluation in 8 languages, and with 3 downstream tasks in 5 languages: lexical simplification, dialog state tracking, and semantic textual similarity. The gains over the previous state-of-the-art specialization methods are substantial and consistent across languages. Our results also suggest that the transfer method is effective even for lexically distant source-target language pairs. Finally, as a by-product, our method produces lists of WordNet-style lexical relations in resource-poor languages.
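
The two steps named in the abstract (inducing noisy target-language constraints by word translation, then filtering them with a relation predictor) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy English-Italian dictionary, the made-up plausibility scores, and the 0.5 threshold are all assumptions introduced here for illustration. In particular, the lookup-table scorer is only a stand-in for the relation prediction model trained on source-language constraints.

```python
from typing import Callable, Dict, List, Tuple

# A lexical constraint: (word1, word2, relation), e.g. ("hot", "cold", "antonym").
Constraint = Tuple[str, str, str]


def induce_target_constraints(
    source_constraints: List[Constraint],
    dictionary: Dict[str, str],
) -> List[Constraint]:
    """Step 1 (sketch): translate both words of every source constraint.

    Pairs with a missing translation, or whose two words collapse to the
    same target word, are dropped. The result is still noisy.
    """
    noisy = []
    for w1, w2, relation in source_constraints:
        t1, t2 = dictionary.get(w1), dictionary.get(w2)
        if t1 is None or t2 is None or t1 == t2:
            continue
        noisy.append((t1, t2, relation))
    return noisy


def filter_constraints(
    noisy: List[Constraint],
    score: Callable[[str, str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Constraint]:
    """Step 2 (sketch): keep constraints the scorer considers plausible."""
    return [c for c in noisy if score(*c) >= threshold]


# Toy data, invented for this example only.
en_constraints = [
    ("cheap", "inexpensive", "synonym"),
    ("hot", "cold", "antonym"),
    ("car", "automobile", "synonym"),
    ("bright", "dim", "antonym"),
]
toy_dictionary = {
    "cheap": "economico",
    "inexpensive": "conveniente",
    "hot": "caldo",
    "cold": "freddo",
    "car": "macchina",
    "automobile": "macchina",
    "bright": "luminoso",
}
# Made-up scores standing in for a trained relation prediction model.
toy_scores = {
    ("economico", "conveniente", "synonym"): 0.4,
    ("caldo", "freddo", "antonym"): 0.9,
}


def toy_score(w1: str, w2: str, relation: str) -> float:
    return toy_scores.get((w1, w2, relation), 0.0)


noisy = induce_target_constraints(en_constraints, toy_dictionary)
refined = filter_constraints(noisy, toy_score)
print(noisy)
print(refined)
```

In this sketch the refined list would then serve as input to whatever specialization procedure is applied to the target-language vectors; that procedure is outside the scope of the example.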

Journal Title

EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference

Conference Name

2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing

Rights

All rights reserved

Sponsorship

European Research Council (648909)