Audio-driven Robot Upper-body Motion Synthesis

Accepted version
Peer-reviewed

Type

Article

Authors

Ondras, Jan 
Celiktutan, Oya 
Bremner, Paul 

Abstract

Body language is an important aspect of human communication, and an effective human-robot interaction interface should mimic it well. Human beings exchange information and convey their thoughts and feelings through gaze, facial expressions, body language and tone of voice along with spoken words, and infer 65% of the meaning of communicated messages from these nonverbal cues. Modern robotic platforms, however, are limited in their ability to automatically generate behaviours that align with their speech. In this paper, we develop a neural-network-based system that takes audio from a user as input and generates the user's upper-body gestures, including head, hand and torso movements, on a humanoid robot, namely SoftBank Robotics' Pepper. Our system was evaluated quantitatively as well as qualitatively using web surveys when driven by natural speech and synthetic speech. We compare the impact of generic and person-specific neural network models on the quality of synthesised movements. We further investigate the relationships between quantitative and qualitative evaluations and examine how the speaker's personality traits affect the synthesised movements.
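
The abstract outlines a speech-to-gesture pipeline: per-frame audio features are mapped by a neural network to upper-body joint trajectories that the Pepper robot replays. Below is a minimal, hypothetical sketch of such a regressor in PyTorch; the MFCC-style feature size, the LSTM encoder, and the ten-joint output are illustrative assumptions, not the paper's reported architecture.

import torch
import torch.nn as nn

class AudioToGestureNet(nn.Module):
    """Maps a sequence of per-frame audio features to joint-angle trajectories.

    Hypothetical stand-in for the paper's model: the feature size, hidden
    size, and joint count are illustrative assumptions.
    """
    def __init__(self, n_audio_features=26, hidden_size=128, n_joints=10):
        super().__init__()
        # Recurrent encoder over the audio feature sequence
        self.encoder = nn.LSTM(n_audio_features, hidden_size, batch_first=True)
        # Per-frame regression to upper-body joint angles
        # (e.g. Pepper's head, arm and torso joints)
        self.decoder = nn.Linear(hidden_size, n_joints)

    def forward(self, audio_features):
        # audio_features: (batch, time, n_audio_features)
        hidden, _ = self.encoder(audio_features)
        return self.decoder(hidden)  # (batch, time, n_joints)

# Toy usage: a short clip represented as 50 feature frames.
model = AudioToGestureNet()
features = torch.randn(1, 50, 26)        # placeholder audio features
joint_trajectory = model(features)       # angles to replay on the robot
print(joint_trajectory.shape)            # torch.Size([1, 50, 10])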

Keywords

Facial Expression, Gestures, Hand, Humans, Robotics, Speech

Journal Title

IEEE Transactions on Cybernetics

Journal ISSN

1083-4419
2168-2275

Publisher

Institute of Electrical and Electronics Engineers

Sponsorship

Engineering and Physical Sciences Research Council (EP/L00416X/1)
Engineering and Physical Sciences Research Council (EP/R030782/1)
EPSRC