A deep learning approach to automatic characterisation of rhythm in non-native English speech

Kyriakopoulos, K; Knill, KM; Gales, MJF

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/296837

Repository DOI

https://doi.org/10.17863/CAM.43882

Files

Accepted version (241.64 KB)

Type

Conference Object

Authors

Kyriakopoulos, Konstantinos

https://orcid.org/0000-0002-7659-4541

Knill, KM

Gales, MJF

Abstract

A speaker's rhythm contributes to the intelligibility of their speech and can be characteristic of their language and accent. For non-native learners of a language, the extent to which they match its natural rhythm is an important predictor of their proficiency. As a learner improves, their rhythm is expected to become less similar to that of their L1 and more similar to that of the L2. Metrics based on the variability of the durations of vocalic and consonantal intervals have been shown to be effective at detecting language and accent. In this paper, pairwise variability (PVI, CCI) and variance (varcoV, varcoC) metrics are first used to predict the proficiency and L1 of non-native speakers taking a spoken English exam. A deep learning alternative that generalises these features is then presented, in the form of a tunable duration embedding based on attention over an RNN over durations. The RNN allows relationships beyond pairwise ones to be captured, while attention allows the model to weight durations according to their relative importance. The system is trained end-to-end for proficiency and L1 prediction and compared to the baseline. The values of both sets of features at different proficiency levels are then visualised and compared to native speech in the L1 and the L2.
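
For readers unfamiliar with the baseline rhythm metrics named in the abstract, the following is a minimal Python sketch of how pairwise variability and variance measures are commonly computed from a sequence of interval durations. It follows the widely used textbook definitions (raw and normalised PVI, and varco as the coefficient of variation scaled by 100) and is not taken from the paper, whose exact formulations may differ. The function names, the use of the population standard deviation, and the example durations are all illustrative assumptions.

```python
from statistics import mean, pstdev


def _successive_pairs(durations):
    """Return adjacent (d_k, d_k+1) pairs; at least two durations are needed."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two durations")
    return pairs


def raw_pvi(durations):
    """Raw PVI: mean absolute difference between successive durations."""
    return mean(abs(a - b) for a, b in _successive_pairs(durations))


def normalised_pvi(durations):
    """Normalised PVI: each difference is divided by the mean of its pair,
    then the average is scaled by 100."""
    return 100 * mean(
        abs(a - b) / ((a + b) / 2) for a, b in _successive_pairs(durations)
    )


def varco(durations):
    """Varco: standard deviation as a percentage of the mean duration.
    Assumption: population standard deviation (pstdev) is used here."""
    return 100 * pstdev(durations) / mean(durations)


# Hypothetical vocalic interval durations in seconds (made-up values).
vocalic = [0.1, 0.2, 0.1, 0.2]
print(raw_pvi(vocalic), normalised_pvi(vocalic), varco(vocalic))
```

Applying `varco` to vocalic intervals corresponds to varcoV and applying it to consonantal intervals to varcoC. All three functions look only at a sequence's overall spread or at adjacent pairs, which is the limitation the abstract's RNN-based embedding is intended to move beyond.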

Keywords

prosody, rhythm, CALL, speech recognition

Journal Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference Name

Interspeech 2019

Journal ISSN

2308-457X
1990-9772

Volume Title

2019-September

Publisher

ISCA

Publisher DOI

https://doi.org/10.21437/Interspeech.2019-3186

Sponsorship

Cambridge Assessment (unknown)

ALTA Institute

Collections

Cambridge University Research Outputs