Towards Acoustic-to-Articulatory Inversion for Pronunciation Training

Accepted version


Conference Object


McGhee, Charles 
Knill, Kate 
Gales, Mark 


Visual feedback of articulators using Electromagnetic Articulography (EMA) has been shown to aid acquisition of non-native speech sounds. Using physical EMA sensors is expensive and invasive, making it impractical for providing real-world pronunciation feedback. Our work focuses on using neural Acoustic-to-Articulatory Inversion (AAI) models to map speech directly to EMA sensor positions. Self-Supervised Learning (SSL) speech models, such as HuBERT, can produce representations of speech that have been shown to significantly improve performance on AAI tasks. Probing experiments have indicated that certain layers and iterations of SSL models produce representations that may yield better inversion performance than others. In this paper, we build on these probing results to create an AAI model that improves upon a state-of-the-art baseline inversion model and evaluate the model’s suitability for pronunciation training.



Conference Name

The 9th Workshop on Speech and Language Technology in Education


Cambridge Assessment (unknown)
EPSRC DTP and Vice Chancellor's Award.