
Annotation of L2 English Speech for Developing and Evaluating End-to-End Spoken Grammatical Error Correction

Accepted version
Peer-reviewed

Abstract

A challenge for automated spoken language assessment and feedback is the lack of high-quality, manually annotated L2 learner corpora, even for a common language like English. At the same time, the popularity of end-to-end systems, which integrate automatic speech recognition (ASR) with downstream tasks, has increased. This paper describes the annotation of a corpus that supports end-to-end system evaluation for Spoken Grammatical Error Correction (SGEC). This raises a number of challenges, which are further complicated because the annotation should preferably also support the evaluation and development of individual modules, such as ASR, disfluency detection and GEC, of combinations of these modules, and of the final end-to-end system. A detailed description is given of the process used to annotate data from the Linguaskill Speaking test, a multi-level test for candidates from CEFR levels below A1 to C1 and above. An example of how the corpus has been used to evaluate an advanced SGEC system is presented.

Conference Name

SLaTE 2023: 9th Workshop on Speech and Language Technology in Education

Publisher

International Speech Communication Association

Rights and licensing

Except where otherwise noted, this item's license is described as All Rights Reserved

Sponsorship

Cambridge Assessment (Unknown)