Assessment of L2 Oral Proficiency Using Self-Supervised Speech Representation Learning

Accepted version
Peer-reviewed

Abstract

A standard pipeline for automated spoken language assessment starts with an automatic speech recognition (ASR) system and derives features that exploit both the transcriptions and the audio. Although efficient, these approaches require ASR systems that work for second language (L2) speakers and are preferably tuned to the specific form of test being deployed. Recently, a self-supervised speech representation-based scheme requiring no ASR was proposed. This work extends the initial analysis to a large-scale proficiency test, Linguaskill. The performance of a self-supervised wav2vec 2.0 system is compared to a high-performance hand-crafted assessment system and a BERT-based system, both of which use ASR transcriptions. Though the wav2vec 2.0-based system is found to be sensitive to the nature of the response, it can be configured to yield performance comparable to systems requiring transcriptions, and it shows significant gains when appropriately combined with standard approaches.

Journal Title

9th Workshop on Speech and Language Technology in Education (SLaTE)

Conference Name

9th Workshop on Speech and Language Technology in Education (SLaTE)

Publisher

International Speech Communication Association

Rights and licensing

Except where otherwise noted, this item's license is described as All Rights Reserved

Sponsorship

Cambridge Assessment (Unknown)
Cambridge University Press & Assessment