
Assessment of L2 Oral Proficiency Using Self-Supervised Speech Representation Learning

Accepted version


Conference Object


Bannò, Stefano 
Knill, Katherine M 
Matassoni, Marco 
Raina, Vyas 
Gales, Mark 


A standard pipeline for automated spoken language assessment starts with an automatic speech recognition (ASR) system and derives features that exploit both transcriptions and audio. Although efficient, these approaches require ASR systems that can handle second language (L2) speakers and are preferably tuned to the specific form of test being deployed. Recently, a self-supervised speech representation-based scheme requiring no ASR was proposed. This work extends the initial analysis of that scheme to a large-scale proficiency test, Linguaskill. The performance of a self-supervised wav2vec 2.0 system is compared to a high-performance hand-crafted assessment system and a BERT-based system, both of which use ASR transcriptions. Though the wav2vec 2.0-based system is found to be sensitive to the nature of the response, it can be configured to yield performance comparable to systems requiring transcriptions, and it shows significant gains when appropriately combined with standard approaches.



Conference Name

9th Workshop on Speech and Language Technology in Education (SLaTE)


Cambridge University Press & Assessment