
Semantic Cues Modulate Children's and Adults' Processing of Audio-Visual Face Mask Speech.

Published version



Schwarz, Julia 
Li, Katrina Kechun 
Sim, Jasper Hong 
Zhang, Yixin 
Buchanan-Worster, Elizabeth 


During the COVID-19 pandemic, questions have been raised about the impact of face masks on communication in classroom settings. However, it is unclear to what extent visual obstruction of the speaker's mouth or changes to the acoustic signal lead to speech processing difficulties, and whether these effects can be mitigated by semantic predictability, i.e., the availability of contextual information. The present study investigated the acoustic and visual effects of face masks on speech intelligibility and processing speed under varying semantic predictability. Twenty-six children (aged 8-12) and twenty-six adults performed an internet-based cued shadowing task, in which they had to repeat aloud the last word of sentences presented in audio-visual format. The results showed that children and adults made more mistakes and responded more slowly when listening to face mask speech compared to speech produced without a face mask. Adults were only significantly affected by face mask speech when both the acoustic and the visual signal were degraded. While acoustic mask effects were similar for children, removal of visual speech cues through the face mask affected children to a lesser degree. However, high semantic predictability reduced audio-visual mask effects, leading to full compensation of the acoustically degraded mask speech in the adult group. Even though children did not fully compensate for face mask speech with high semantic predictability, overall, they still profited from semantic cues in all conditions. Therefore, in classroom settings, strategies that increase contextual information, such as building on students' prior knowledge, using keywords, and providing visual aids, are likely to help overcome any adverse face mask effects.



Psychology, speech processing, face masks, cued shadowing, audio-visual integration, semantic prediction, language development, internet-based data collection, bottom-up vs. top-down

Journal Title

Front Psychol

Frontiers Media SA
ESRC (2159864)
This research was funded by a grant for a project entitled Speech Perception through Masks in School Contexts (PerMaSC) from the Cambridge Language Sciences Incubator Fund (Principal Investigator: KM; Lead Applicants: JSc and KKL) and a UKRI grant to JS [ES/J500033/1]. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) license to any Author Accepted Manuscript version arising. Additional anonymized data and scripts for the statistical analysis related to this publication are available at the OSF data repository: DOI 10.17605/OSF.IO/ETVDG.