Audio-driven Robot Upper-body Motion Synthesis
Accepted version
Peer-reviewed
Abstract
Body language is an important aspect of human communication, and an effective human-robot interaction interface should mimic it well. Human beings exchange information and convey their thoughts and feelings through gaze, facial expressions, body language and tone of voice along with spoken words, and infer 65% of the meaning of communicated messages from these nonverbal cues. Modern robotic platforms, however, are limited in their ability to automatically generate behaviours that align with their speech. In this paper, we develop a neural network-based system that takes audio from a user as input and generates the user's upper-body gestures, including head, hand and torso movements, on a humanoid robot, namely Softbank Robotics' Pepper. Our system was evaluated both quantitatively and qualitatively, using web surveys, when driven by natural speech and by synthetic speech. We compare the impact of generic and person-specific neural network models on the quality of the synthesised movements. We further investigate the relationships between the quantitative and qualitative evaluations and examine how the speaker's personality traits affect the synthesised movements.
Journal ISSN
2168-2275
Sponsorship
Engineering and Physical Sciences Research Council (EP/R030782/1)