Comparative Pronunciation Assessment and Feedback with Interpretable Speech Features

Pronunciation assessment and feedback models typically focus on detecting word- or phone-level errors, which can then be fed back to the learner together with the phonetic cues needed to correct them. However, annotators show low agreement on these kinds of errors, which limits the consistency of any model trained on their annotations as well as the usefulness of the resulting feedback. In this paper, we propose a combined pronunciation assessment and feedback system that uses interpretable speech features to align a learner's production with synthetic native and non-native speaker productions. We demonstrate that the overall alignment error correlates well with utterance-level pronunciation scores, and that peaks in the alignment error can be used to detect errors and give intuitive feedback over continuous stretches of speech, rather than being restricted to strict word or phone boundaries.
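The abstract does not name the alignment algorithm or the exact feature set. As a minimal sketch, assuming dynamic time warping (DTW) with Euclidean frame distance over per-frame feature vectors, the code below shows how one alignment can yield both an utterance-level error (the mean per-frame error) and contiguous high-error regions that are not tied to word or phone boundaries. The functions `dtw_path`, `frame_errors`, and `error_regions`, the toy one-dimensional features, and the threshold are all hypothetical illustrations, not the authors' implementation.

```python
import math
from statistics import mean


def dtw_path(learner, reference):
    """Classic DTW (an assumed choice) between two sequences of feature
    vectors; returns the warping path as (learner_idx, reference_idx) pairs."""
    n, m = len(learner), len(reference)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(learner[i - 1], reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the final cell, always stepping to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return path


def frame_errors(learner, reference):
    """Per-learner-frame error: mean distance to the reference frame(s)
    that frame is aligned to (hypothetical definition)."""
    buckets = [[] for _ in learner]
    for i, j in dtw_path(learner, reference):
        buckets[i].append(math.dist(learner[i], reference[j]))
    return [mean(b) for b in buckets]


def error_regions(errors, threshold):
    """Contiguous runs of frames whose error exceeds the threshold,
    returned as inclusive (start, end) frame indices."""
    regions, start = [], None
    for k, e in enumerate(errors):
        if e > threshold and start is None:
            start = k
        elif e <= threshold and start is not None:
            regions.append((start, k - 1))
            start = None
    if start is not None:
        regions.append((start, len(errors) - 1))
    return regions


# Toy usage with one-dimensional "features" (hypothetical values).
learner = [[0.0], [1.0], [5.0], [2.0]]
native = [[0.0], [1.0], [2.0]]
errors = frame_errors(learner, native)
print(errors)                                # [0.0, 0.0, 3.0, 0.0]
print(mean(errors))                          # 0.75 (utterance-level error)
print(error_regions(errors, threshold=1.0))  # [(2, 2)]
```

In this toy example the third learner frame (value 5.0) has no close counterpart in the reference, so it forms the only region above the threshold. In a comparative setup, the same errors could be computed against both a synthetic native and a synthetic non-native reference and then compared; how the authors combine the two is not stated in this abstract.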

Conference Name

10th Workshop on Speech and Language Technology in Education (SLaTE)

Publisher

International Speech Communication Association

Publisher DOI

https://doi.org/10.21437/slate.2025-8

Sponsorship

Cambridge Assessment

Supported by the Automated Language Teaching and Assessment (ALTA) group, sponsored by Cambridge University Press and Assessment.
