Leveraging a semantically annotated corpus to disambiguate prepositional phrase attachment


Abstract

Accurate parse ranking requires semantic information, since a sentence may have many candidate parses involving common syntactic constructions. In this paper, we propose a probabilistic framework for incorporating distributional semantic information into a maximum entropy parser. Furthermore, to better deal with sparse data, we use a modified version of Latent Dirichlet Allocation to smooth the probability estimates. This LDA model generates pairs of lemmas, representing the two arguments of a semantic relation, and can be trained, in an unsupervised manner, on a corpus annotated with semantic dependencies. To evaluate our framework in isolation from the rest of a parser, we consider the special case of prepositional phrase attachment ambiguity. The results show that our semantically-motivated feature is effective in this case; moreover, the LDA smoothing both produces semantically interpretable topics and improves performance over raw co-occurrence frequencies, demonstrating that it can successfully generalise patterns in the training data.
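As an illustrative sketch only (not the authors' implementation), the co-occurrence idea behind the abstract can be shown by comparing smoothed scores for the two candidate attachment heads of a preposition–object pair. All counts, lemmas, and function names below are hypothetical, and a simple add-alpha smoother stands in for the paper's LDA-based smoothing:

```python
from collections import Counter

# Hypothetical counts of (head, preposition, object) triples, as might be
# extracted from a corpus annotated with semantic dependencies.
triple_counts = Counter({
    ("eat", "with", "fork"): 8,
    ("pizza", "with", "fork"): 1,
    ("eat", "with", "anchovy"): 1,
    ("pizza", "with", "anchovy"): 6,
})

def attachment_score(head, prep, obj, alpha=0.5):
    """Smoothed relative frequency of (prep, obj) given the head.
    Add-alpha smoothing is a placeholder for the LDA-based smoothing
    described in the abstract."""
    head_total = sum(c for (h, _, _), c in triple_counts.items() if h == head)
    return (triple_counts[(head, prep, obj)] + alpha) / (head_total + alpha)

def attach(verb, noun1, prep, noun2):
    """Return 'verb' or 'noun': whichever candidate head scores higher
    with the preposition and its object."""
    if attachment_score(verb, prep, noun2) >= attachment_score(noun1, prep, noun2):
        return "verb"
    return "noun"
```

On the toy counts above, `attach("eat", "pizza", "with", "fork")` prefers verb attachment (eating with a fork), while `attach("eat", "pizza", "with", "anchovy")` prefers noun attachment (pizza with anchovy), mirroring the classic ambiguity the paper evaluates on.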

Description

Keywords

Journal Title

IWCS 2015: Proceedings of the 11th International Conference on Computational Semantics

Conference Name

Journal ISSN

Volume Title

Publisher

The Association for Computational Linguistics

Publisher DOI

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales