A Pulse Model in Log-domain for a Uniform Synthesizer
Proceedings of the 9th ISCA Speech Synthesis Workshop
9th ISCA Speech Synthesis Workshop
International Speech Communication Association
Degottex, G., Lanchantin, P., & Gales, M. (2016). A Pulse Model in Log-domain for a Uniform Synthesizer. Proceedings of the 9th ISCA Speech Synthesis Workshop, 230-236. https://doi.org/10.17863/CAM.9734
The quality of the vocoder plays a crucial role in the performance of parametric speech synthesis systems. To improve vocoder quality, it is necessary to reconstruct as many of the perceived components of the speech signal as possible. In this paper, we first show that the noise component is currently not accurately modelled in the widely used STRAIGHT vocoder, thus limiting both the voice range that can be covered and the overall quality. To motivate a new, alternative approach to this issue, we present a new synthesizer that uses a uniform representation for voiced and unvoiced segments. This synthesizer also has the advantage of using a simpler signal model than other approaches, thus offering a convenient and controlled alternative for future developments. Experiments analysing the synthesis quality of the noise component show improved speech reconstruction with the suggested synthesizer compared to STRAIGHT. Additionally, an analysis/resynthesis experiment shows that the suggested synthesizer solves some of the issues of another uniform vocoder, Harmonic Model plus Phase Distortion (HMPD). In text-to-speech synthesis, it outperforms HMPD and exhibits quality similar to, or only slightly worse than, that of STRAIGHT, which is encouraging for a new vocoding approach.
parametric speech synthesis, vocoder, pulse model
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 655764. The research for this paper was also partly supported by EPSRC grant EP/I031022/1 (Natural Speech Technology).
This record's DOI: https://doi.org/10.17863/CAM.9734
This record's URL: https://www.repository.cam.ac.uk/handle/1810/264297