The effect of using normalized models in statistical speech synthesis

Shannon, Matt; Zen, Heiga; Byrne, William

Repository URI

http://www.dspace.cam.ac.uk/handle/1810/244406

Files

shannon2011effect.pdf (801.4 KB)

Type

Conference Object

Authors

Shannon, Matt

Zen, Heiga

Byrne, William

Abstract

The standard approach to HMM-based speech synthesis is inconsistent in the enforcement of the deterministic constraints between static and dynamic features. The trajectory HMM and autoregressive HMM have been proposed as normalized models which rectify this inconsistency. This paper investigates the practical effects of using these normalized models, and examines the strengths and weaknesses of the different models as probabilistic models of speech. The most striking difference observed is that the standard approach greatly underestimates predictive variance. We argue that the normalized models have better predictive distributions than the standard approach, but that all the models we consider are still far from satisfactory probabilistic models of speech. We also present evidence that better intra-frame correlation modelling goes some way towards improving existing normalized models.

Keywords

HMM-based speech synthesis, acoustic modelling, autoregressive HMM, trajectory HMM, normalization

Journal Title

Proceedings of the 12th Annual Conference of the International Speech Communication Association

Publisher

ISCA (International Speech Communication Association)

Publisher URL

http://mi.eng.cam.ac.uk/~sms46/papers/shannon2011effect.pdf

Rights

Creative Commons Attribution 2.0 UK: England & Wales

Sponsorship

This work was partly supported by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement 213845 (EMIME).

Collections

Scholarly Works - Engineering - Information Engineering
Symplectic mapped items for data match