Modeling Word Forms Using Latent Underlying Morphs and Phonology

Published version
Peer-reviewed

Type

Conference Object

Authors

Cotterell, Ryan 
Peng, Nanyun 
Eisner, Jason 

Abstract

The observed pronunciations or spellings of words are often explained as arising from the “underlying forms” of their morphemes. These forms are latent strings that linguists try to reconstruct by hand. We propose to reconstruct them automatically at scale, enabling generalization to new words. Given some surface word types of a concatenative language along with the abstract morpheme sequences that they express, we show how to recover consistent underlying forms for these morphemes, together with the (stochastic) phonology that maps each concatenation of underlying forms to a surface form. Our technique involves loopy belief propagation in a natural directed graphical model whose variables are unknown strings and whose conditional distributions are encoded as finite-state machines with trainable weights. We define training and evaluation paradigms for the task of surface word prediction, and report results on subsets of 7 languages.

Keywords

47 Language, Communication and Culture, 4704 Linguistics

Journal Title

Transactions of the Association for Computational Linguistics (TACL) 2015

Conference Name

Association for Computational Linguistics

Journal ISSN

2307-387X

Volume Title

3

Publisher

MIT Press