Modelling text meta-properties in automated text scoring for non-native English writing


Type

Thesis

Authors

Zhang, Meng 

Abstract

Automated text scoring (ATS) is the task of automatically scoring a text against given grading criteria. This thesis focuses on ATS in the context of free-text writing exams aimed at learners of English as a foreign language (EFL). The primary benefit of an ATS system is to provide instant and consistent feedback to language learners; reliability of the service is therefore also crucial. Building on previous work, we investigated meta-properties of text that have so far been only partially explored, and integrated them into a machine-learning-based ATS model across multiple datasets:

In most previous work, the proposed models implicitly assume that the texts a learner produces in an exam are written independently. This does not hold for exams in which learners are required to compose multiple texts. We therefore explicitly informed our model of which texts were written by the same learner, which boosted model performance in most cases.
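The abstract does not specify how this grouping is encoded; one plausible sketch, assuming a feature-based model with a NumPy feature matrix, is to append per-learner aggregate features so the model can see which texts share an author. The helper name `add_learner_context` and its arguments are hypothetical, for illustration only:

```python
import numpy as np
from collections import defaultdict

def add_learner_context(X, learner_ids):
    """Append to each text's feature vector the mean feature
    vector of all texts written by the same learner, exposing
    the fact that a learner's texts are not independent."""
    groups = defaultdict(list)
    for idx, lid in enumerate(learner_ids):
        groups[lid].append(idx)
    context = np.zeros_like(X)
    for idxs in groups.values():
        context[idxs] = X[idxs].mean(axis=0)  # learner-level average
    return np.hstack([X, context])
```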

We used three intra-exam properties (prompt, genre, and task) as a starting point, and showed that explicitly modelling these properties via frustratingly easy domain adaptation (FEDA) can improve model performance in some cases. Furthermore, modelling multiple intra-exam properties together outperforms modelling any single property, or none, on four out of five test sets.
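FEDA (Daumé III, 2007) works by feature augmentation: each feature vector is copied into a shared block plus one block per domain, so a linear model can weight shared and domain-specific evidence separately. A minimal sketch, with intra-exam properties such as prompt or genre playing the role of domains (`feda_augment` is a hypothetical helper name):

```python
import numpy as np

def feda_augment(X, domains, domain_list):
    """Frustratingly easy domain adaptation: the output holds a
    shared copy of the features, followed by one block per
    domain that is non-zero only for examples of that domain."""
    n, d = X.shape
    out = np.zeros((n, d * (1 + len(domain_list))))
    out[:, :d] = X  # shared copy, active for every example
    for i, dom in enumerate(domains):
        j = domain_list.index(dom)
        out[i, d * (1 + j):d * (2 + j)] = X[i]  # domain-specific copy
    return out
```

A model trained on the augmented matrix can then learn, per feature, how much evidence transfers across prompts, genres, or tasks and how much is specific to one of them.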

We studied how to utilise and combine learners' responses from multiple writing exams, and proposed a new variant of the transfer-learning ATS model that mitigates the drawbacks of previous work. This variant first builds a ranking model across multiple datasets via FEDA; the ranking score that this model predicts for each text is then used as an extra feature in the baseline model. The variant improves on the baseline model on the development sets in terms of root-mean-square error. Furthermore, the transfer-learning model utilising multiple datasets, tuned on each development set, is always better than the baseline model on the corresponding test set.
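The abstract does not give the ranking model's exact form; one common sketch under that uncertainty is a RankSVM-style pairwise ranker whose decision score is appended to the baseline features as a single extra column (`train_pairwise_ranker` and `ranking_scores` are illustrative names):

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def train_pairwise_ranker(X, y):
    """Train a linear ranker on pairwise differences: every pair
    of texts with different gold scores yields one training
    example labelled by the sign of the score difference."""
    diffs, labels = [], []
    for i, j in combinations(range(len(y)), 2):
        if y[i] != y[j]:
            diffs.append(X[i] - X[j])
            labels.append(1 if y[i] > y[j] else -1)
    return LinearSVC().fit(np.array(diffs), np.array(labels))

def ranking_scores(ranker, X):
    # w.x orders the texts; appended as one extra feature column,
    # it feeds the ranking signal into the baseline model.
    return np.hstack([X, (X @ ranker.coef_.ravel())[:, None]])
```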

We found that different datasets favour different meta-properties. We therefore combined the models targeting different meta-properties using ensemble learning. Compared to the baseline model, the combined model achieves a statistically significant improvement in root-mean-square error on all test sets, based on a permutation test.
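The permutation test referred to here is a standard paired comparison between two systems' per-text predictions; a minimal sketch, assuming arrays of predictions and gold scores (`paired_permutation_test` is an illustrative name):

```python
import numpy as np

def rmse(pred, gold):
    return np.sqrt(np.mean((pred - gold) ** 2))

def paired_permutation_test(pred_a, pred_b, gold, n_iter=10000, seed=0):
    """Two-sided paired permutation test on the RMSE gap:
    randomly swap the two systems' predictions per text and
    count how often the gap is at least as large as observed."""
    rng = np.random.default_rng(seed)
    observed = abs(rmse(pred_a, gold) - rmse(pred_b, gold))
    hits = 0
    for _ in range(n_iter):
        swap = rng.random(len(gold)) < 0.5
        a = np.where(swap, pred_b, pred_a)
        b = np.where(swap, pred_a, pred_b)
        if abs(rmse(a, gold) - rmse(b, gold)) >= observed:
            hits += 1
    return hits / n_iter  # estimated p-value
```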

Date

2018-05-31

Advisors

Briscoe, Ted

Keywords

Automated Text Scoring, Natural Language Processing, Machine Learning, Multi-domain Learning

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

The Institute for Automated Language Teaching and Assessment