A Joint Model for Word Embedding and Word Morphology

This paper presents a joint model that performs unsupervised morphological analysis on words and learns a character-level composition function from morphemes to word embeddings. The model splits each word into segments and weights each segment according to its ability to predict context words. Its morphological analysis is comparable to that of dedicated morphological analyzers at the task of morpheme boundary recovery, and its embeddings perform better than those of word-based embedding models at the task of syntactic analogy answering. Finally, we show that incorporating morphology explicitly into character-level models helps them produce embeddings for unseen words that correlate better with human judgments.
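
The segment-weighting idea in the abstract can be illustrated with a toy sketch. This is not the paper's architecture. It assumes, purely for illustration, that every contiguous substring of a word is a candidate segment, that each segment is scored by its dot product with a context-word vector, and that the scores are normalised with a softmax. The names `all_segments`, `toy_vector`, `compose`, and `DIM` are hypothetical, and the vectors are seeded pseudo-random stand-ins for learned parameters.

```python
import math
import random

DIM = 8  # hypothetical embedding size, chosen only for this sketch


def all_segments(word):
    """Return every contiguous substring of `word` as a candidate segment."""
    return [word[i:j] for i in range(len(word)) for j in range(i + 1, len(word) + 1)]


def toy_vector(key, dim=DIM):
    """Deterministic pseudo-random vector standing in for a learned embedding."""
    rng = random.Random(key)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose(word, context_word):
    """Build a word vector as a context-weighted sum of segment vectors."""
    segments = all_segments(word)
    seg_vecs = [toy_vector("seg:" + s) for s in segments]
    ctx = toy_vector("ctx:" + context_word)
    # Assumption: a segment's usefulness for predicting the context word
    # is approximated by a dot product with the context vector.
    scores = [sum(a * b for a, b in zip(vec, ctx)) for vec in seg_vecs]
    weights = softmax(scores)
    embedding = [
        sum(w * vec[d] for w, vec in zip(weights, seg_vecs)) for d in range(DIM)
    ]
    return segments, weights, embedding


segments, weights, embedding = compose("walking", "slowly")
# Show the three most heavily weighted segments for this toy context.
for seg, w in sorted(zip(segments, weights), key=lambda p: -p[1])[:3]:
    print(f"{seg!r}: {w:.3f}")
```

With trained parameters, segments that correspond to real morphemes (for example a stem and a suffix) would be expected to receive higher weights. With the random stand-in vectors used here, the weights carry no linguistic meaning.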

Keywords

cs.CL

Conference Name

Workshop on Representation Learning for NLP

Publisher DOI

https://doi.org/10.17863/CAM.21365

Rights and licensing

Except where otherwise noted, this item's license is described as http://www.rioxx.net/licenses/all-rights-reserved

Sponsorship

EPSRC (1510349)

Collections

University of Cambridge Research Outputs (Articles and Conferences)