Morph-fitting: Fine-tuning word vector spaces with simple language-specific rules
Publication Date
2017
Journal Title
ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Conference Name
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
ISBN
9781945626753
Publisher
Association for Computational Linguistics
Language
English
Type
Conference Object
This Version
Version of Record (VoR)
Citation
Vulić, I., Mrkšić, N., Reichart, R., Ó Séaghdha, D., Young, S., & Korhonen, A. (2017). Morph-fitting: Fine-tuning word vector spaces with simple language-specific rules. ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). https://doi.org/10.18653/v1/P17-1006
Abstract
Morphologically rich languages accentuate two properties of distributional vector space models: 1) the difficulty of inducing accurate representations for low-frequency word forms; and 2) insensitivity to distinct lexical relations that have similar distributional signatures. These effects are detrimental for language understanding systems, which may infer that inexpensive is a rephrasing for expensive or may not associate acquire with acquires. In this work, we propose a novel morph-fitting procedure which moves past the use of curated semantic lexicons for improving distributional vector spaces. Instead, our method injects morphological constraints generated using simple language-specific rules, pulling inflectional forms of the same word close together and pushing derivational antonyms far apart. In intrinsic evaluation over four languages, we show that our approach: 1) improves low-frequency word estimates; and 2) boosts the semantic quality of the entire word vector collection. Finally, we show that morph-fitted vectors yield large gains in the downstream task of dialogue state tracking, highlighting the importance of morphology for tackling long-tail phenomena in language understanding tasks.
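As a rough illustration of the constraint-generation idea in the abstract, the following minimal Python sketch derives "attract" pairs (inflectional forms to pull together) and "repel" pairs (derivational antonyms to push apart) from a vocabulary. The suffix and prefix lists, the function names `attract_pairs` and `repel_pairs`, and the toy vocabulary are assumptions made for this example; they are not the rule sets used in the paper, and the subsequent vector-space fine-tuning step is not shown.

```python
from itertools import combinations

# Illustrative English rules only (assumed for this sketch);
# the paper's actual language-specific rule sets are not reproduced here.
INFLECTION_SUFFIXES = ("s", "ed", "ing")
ANTONYM_PREFIXES = ("un", "in", "dis", "non")


def attract_pairs(vocab):
    """Pair a word with its suffixed forms found in the vocabulary (pull together)."""
    vocab = set(vocab)
    pairs = set()
    for word in vocab:
        # Keep only suffixed forms that actually occur in the vocabulary.
        forms = {word + suffix for suffix in INFLECTION_SUFFIXES} & vocab
        forms.add(word)
        for a, b in combinations(sorted(forms), 2):
            pairs.add((a, b))
    return pairs


def repel_pairs(vocab):
    """Pair a word with its prefixed form found in the vocabulary (push apart)."""
    vocab = set(vocab)
    return {
        (word, prefix + word)
        for word in vocab
        for prefix in ANTONYM_PREFIXES
        if prefix + word in vocab
    }


vocab = ["acquire", "acquires", "expensive", "inexpensive", "mature"]
print(sorted(attract_pairs(vocab)))
print(sorted(repel_pairs(vocab)))
```

Purely string-based rules like these can over-generate (for example, a prefix match between unrelated words), so any real rule set would need to be checked against the language in question.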
Keywords
Semantic specialisation, Morphologically complex languages, Vector space models, Dialogue state tracking, Word embeddings
Sponsorship
European Research Council (648909)
Embargo Lift Date
2100-01-01
Identifiers
External DOI: https://doi.org/10.18653/v1/P17-1006
This record's URL: https://www.repository.cam.ac.uk/handle/1810/264637
Rights
Attribution 4.0 International