A deep learning approach to automatic characterisation of rhythm in non-native English speech

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Kyriakopoulos, Konstantinos  https://orcid.org/0000-0002-7659-4541
Knill, KM 
Gales, MJF 

Abstract

A speaker's rhythm contributes to the intelligibility of their speech and can be characteristic of their language and accent. For non-native learners of a language, the extent to which they match its natural rhythm is an important predictor of their proficiency. As a learner improves, their rhythm is expected to become less similar to that of their L1 and more similar to that of the L2. Metrics based on the variability of the durations of vocalic and consonantal intervals have been shown to be effective at detecting language and accent. In this paper, pairwise variability (PVI, CCI) and variance (varcoV, varcoC) metrics are first used to predict the proficiency and L1 of non-native speakers taking a spoken English exam. A deep learning alternative that generalises these features is then presented, in the form of a tunable duration embedding based on attention over an RNN over durations. The RNN allows relationships beyond pairwise ones to be captured, while attention allows sensitivity to the differing relative importance of individual durations. The system is trained end-to-end for proficiency and L1 prediction and compared to the baseline. The values of both sets of features for different proficiency levels are then visualised and compared to native speech in the L1 and the L2.
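
For readers unfamiliar with the baseline metrics named in the abstract, the following is a minimal sketch of how pairwise variability and variance metrics are commonly computed from a sequence of interval durations. It is not taken from the paper: the function names, the use of the population standard deviation, and the example durations are assumptions made for illustration, and the CCI metric is omitted. Each function expects at least two durations, all in the same unit (for example seconds).

```python
from statistics import mean, pstdev


def raw_pvi(durations):
    """Raw PVI: mean absolute difference between successive durations."""
    pairs = zip(durations, durations[1:])
    return mean(abs(a - b) for a, b in pairs)


def normalised_pvi(durations):
    """Normalised PVI: each successive difference is divided by the mean
    of the pair, which reduces sensitivity to overall speaking rate."""
    pairs = zip(durations, durations[1:])
    return 100 * mean(abs(a - b) / ((a + b) / 2) for a, b in pairs)


def varco(durations):
    """Varco: standard deviation as a percentage of the mean duration.
    Assumption: population standard deviation; a sample estimate
    could be substituted."""
    return 100 * pstdev(durations) / mean(durations)


# Hypothetical vocalic interval durations in seconds (illustrative only).
vocalic = [0.10, 0.20, 0.10, 0.20]
print(raw_pvi(vocalic), normalised_pvi(vocalic), varco(vocalic))
```

Applying `varco` to vocalic intervals would correspond to varcoV and applying it to consonantal intervals to varcoC. The pairwise metrics only compare neighbouring intervals, which is the limitation the abstract says the RNN-based embedding is intended to address.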

Keywords

prosody, rhythm, CALL, speech recognition

Journal Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference Name

Interspeech 2019

Journal ISSN

2308-457X
1990-9772

Volume Title

2019-September

Publisher

ISCA

Rights

All rights reserved

Sponsorship

Cambridge Assessment (unknown)
ALTA Institute