Dependency parsing of learner English

Huang, Yan; Murakami, Akira; Alexopoulou, Theodora; Korhonen, Anna

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/275806

Repository DOI

https://doi.org/10.17863/CAM.23072

Type

Article

Authors

Huang, Yan

https://orcid.org/0000-0002-6879-0446

Murakami, Akira

Alexopoulou, Theodora

Korhonen, Anna

Abstract

Syntactic annotation of large-scale learner corpora currently relies mainly on “standard parsers” trained on native language data. Understanding how these parsers perform on learner data is important for downstream research and applications involving learner language. This study evaluates the performance of multiple standard probabilistic parsers on learner English. Our contributions are threefold. First, we demonstrate that the common practice of constructing a gold standard by manually correcting the pre-annotation of a single parser can introduce bias into parser evaluation, and we propose an alternative annotation method that controls for this annotation bias. Second, we quantify the influence of learner errors on parsing errors and identify the learner errors that affect parsing most. Finally, we compare the performance of the parsers on learner English and native English. Our results have useful implications for selecting a standard parser for learner English.

Keywords

dependency parsing, learner English, annotation bias, parsing accuracy, learner error

Journal Title

International Journal of Corpus Linguistics

Journal ISSN

1384-6655
1569-9811

Volume Title

23

Publisher

John Benjamins Publishing Company

Publisher DOI

https://doi.org/10.1075/ijcl.16080.hua

Rights

http://www.rioxx.net/licenses/all-rights-reserved

Collections

Scholarly Works - Theoretical and Applied Linguistics