Autoregressive Models for Statistical Parametric Speech Synthesis

Shannon, Matt; Zen, Heiga; Byrne, William

doi:10.1109/tasl.2012.2227740

Repository URI

http://www.dspace.cam.ac.uk/handle/1810/244407

Files

Primary shannon2013autoregressive.pdf (627 KB)

Type

Article

Authors

Shannon, Matt

Zen, Heiga

Byrne, William

Abstract

We propose using the autoregressive hidden Markov model (HMM) for speech synthesis. The autoregressive HMM uses the same model for parameter estimation and synthesis in a consistent way, in contrast to the standard approach to statistical parametric speech synthesis. It supports easy and efficient parameter estimation using expectation maximization, in contrast to the trajectory HMM. At the same time, its similarities to the standard approach allow use of established high-quality synthesis algorithms such as speech parameter generation considering global variance. The autoregressive HMM also supports a speech parameter generation algorithm that is not available for the standard approach or the trajectory HMM and that has particular advantages in the domain of real-time, low-latency synthesis. We show how to do efficient parameter estimation and synthesis with the autoregressive HMM and look at some of the similarities and differences between the standard approach, the trajectory HMM and the autoregressive HMM. We compare the three approaches in subjective and objective evaluations. We also systematically investigate which choices of parameters, such as autoregressive order and number of states, are optimal for the autoregressive HMM.

Keywords

46 Information and Computing Sciences, 4602 Artificial Intelligence

Journal Title

IEEE Transactions on Audio, Speech, and Language Processing

Journal ISSN

2329-9290
1558-7924

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publisher DOI

https://doi.org/10.1109/tasl.2012.2227740

Rights and licensing

Except where otherwise noted, this item's license is described as http://www.rioxx.net/licenses/all-rights-reserved

Sponsorship

European Commission (213845)

This work was supported in part by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement 213845 (EMIME) and in part by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology).

Collections

Scholarly Works - Engineering - Information Engineering
Symplectic mapped items for data match
