On the evaluation and application of neural language models for grammatical error detection


Abstract

Neural language models (NLMs) have become a core component in many downstream applications within the field of natural language processing, including the task of data-driven automatic grammatical error detection (GED). This thesis explores whether information from NLMs can positively transfer to GED within the domain of learning English as a second language (ESL), and examines whether NLMs encode and make use of linguistic signals that would facilitate robust and generalisable GED performance.

First, I investigate whether information from different types of neural language model can be transferred to models for GED. I evaluate five models against three publicly available ESL benchmarks, and report results showing positive transfer effects to the extent that fine-grained error detection using a single model is becoming viable. Second, I carry out a causal investigation to understand whether NLM-GED models make use of robust linguistic signals during inference – in theory, this would enable them to generalise across different data distributions. The results show a high degree of linear encoding of noun number within each model’s token-level contextual representations, but they also show markedly varying error detection performance across model types and across in- and out-of-domain datasets. Altogether, the results indicate that the models employ different strategies for error detection. Third, I re-frame the typically downstream GED task as an evaluation framework to test whether pre-trained NLMs implicitly encode information about grammatical errors as an artefact of their language modelling objective. I present results illustrating stark differences between masked language models and autoregressive language models – while the former seemingly encode much more information related to the detection of grammatical errors, the results also present evidence of a brittle encoding across different syntactic constructions.

Altogether, this thesis presents a holistic analysis of NLMs – how they might be applied to GED, whether they utilise linguistic information to enable robust inference, and whether their pre-training objective implicitly imbues them with knowledge about grammaticality.

Date

2023-09-22

Advisors

Buttery, Paula

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as All Rights Reserved

Sponsorship

EPSRC (1940766)