Comparative Pronunciation Assessment and Feedback with Interpretable Speech Features
Accepted version
Peer-reviewed
Abstract
Pronunciation assessment and feedback models typically focus on detecting word- or phone-level errors, which can then be fed back to the learner along with the phonetic cues required to fix them. However, annotators show low levels of agreement about these kinds of errors, which affects both the consistency of any model trained on these annotations and the utility of the feedback provided. In this paper, we propose a combined pronunciation assessment and feedback system which uses interpretable speech features to align a learner's production with synthetic native and non-native speaker productions. We demonstrate that the overall alignment error correlates well with utterance-level pronunciation scores, and that peaks in the alignment error can provide error detection and intuitive feedback over continuous stretches of speech, not limited to strict word or phone boundaries.
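
As a rough illustration of the idea in the abstract, the sketch below aligns a learner's feature sequence with a reference sequence and then looks for stretches of high alignment error. The abstract does not name the alignment algorithm, so dynamic time warping (DTW) with a Euclidean frame distance is an assumption made here, as are all function names, the toy one-dimensional features, and the threshold value. It should be read as a minimal sketch under those assumptions, not as the system proposed in the paper.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(learner, reference):
    """Align two non-empty frame sequences with plain DTW (assumed algorithm).

    Returns the total alignment cost and the warping path as a list of
    (learner_index, reference_index) pairs.
    """
    n, m = len(learner), len(reference)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(learner[i - 1], reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


def per_frame_error(learner, reference, path):
    """Mean local distance for each learner frame along the warping path."""
    totals = [0.0] * len(learner)
    counts = [0] * len(learner)
    for i, j in path:
        totals[i] += frame_distance(learner[i], reference[j])
        counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def high_error_stretches(errors, threshold):
    """Contiguous (start, end) learner-frame ranges whose error exceeds threshold."""
    stretches = []
    start = None
    for idx, err in enumerate(errors):
        if err > threshold and start is None:
            start = idx
        elif err <= threshold and start is not None:
            stretches.append((start, idx - 1))
            start = None
    if start is not None:
        stretches.append((start, len(errors) - 1))
    return stretches


# Toy one-dimensional "features"; real interpretable speech features
# would be multi-dimensional vectors per frame.
learner = [[0.0], [1.0], [5.0], [3.0]]
reference = [[0.0], [1.0], [2.0], [3.0]]

total_error, path = dtw_align(learner, reference)
errors = per_frame_error(learner, reference, path)
stretches = high_error_stretches(errors, threshold=1.0)
```

In this sketch, `total_error` plays the role of the overall alignment error that would be compared against utterance-level scores, while `stretches` marks frame ranges that need not coincide with word or phone boundaries. Running the same alignment against both a native-like and a non-native-like reference and comparing the two error profiles would be one way to approximate the comparative setup, though how the paper combines the two is not stated in the abstract.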
