dc.contributor.author: Degottex, Gilles (en)
dc.contributor.author: Lanchantin, Pierre (en)
dc.contributor.author: Gales, Mark (en)
dc.date.accessioned: 2017-05-18T12:53:09Z
dc.date.available: 2017-05-18T12:53:09Z
dc.date.issued: 2016-09-15 (en)
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/264297
dc.description.abstract: The quality of the vocoder plays a crucial role in the performance of parametric speech synthesis systems. To improve vocoder quality, it is necessary to reconstruct as many of the perceived components of the speech signal as possible. In this paper, we first show that the noise component is currently not accurately modelled in the widely used STRAIGHT vocoder, which limits both the voice range that can be covered and the overall quality. To motivate an alternative approach to this issue, we present a new synthesizer that uses a uniform representation for voiced and unvoiced segments. This synthesizer also has the advantage of using a simple signal model compared to other approaches, thus offering a convenient and controlled alternative for future developments. Experiments analysing the synthesis quality of the noise component show improved speech reconstruction using the suggested synthesizer compared to STRAIGHT. Additionally, an analysis/resynthesis experiment shows that the suggested synthesizer solves some of the issues of another uniform vocoder, Harmonic Model plus Phase Distortion (HMPD). In text-to-speech synthesis, it outperforms HMPD and exhibits quality similar to, or only slightly worse than, that of STRAIGHT, which is encouraging for a new vocoding approach.
dc.description.sponsorship: This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 655764. The research for this paper was also partly supported by EPSRC grant EP/I031022/1 (Natural Speech Technology).
dc.language.iso: en (en)
dc.publisher: International Speech Communication Association
dc.subject: parametric speech synthesis (en)
dc.subject: vocoder (en)
dc.subject: pulse model (en)
dc.title: A Pulse Model in Log-domain for a Uniform Synthesizer (en)
dc.type: Conference Object
prism.endingPage: 236
prism.publicationDate: 2016 (en)
prism.publicationName: Proceedings of the 9th ISCA Speech Synthesis Workshop (en)
prism.startingPage: 230
dc.identifier.doi: 10.17863/CAM.9734
dcterms.dateAccepted: 2016-06-19 (en)
rioxxterms.version: VoR (en)
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved (en)
rioxxterms.licenseref.startdate: 2016-09-15 (en)
dc.contributor.orcid: Gales, Mark [0000-0002-5311-8219]
rioxxterms.type: Conference Paper/Proceeding/Abstract (en)
pubs.conference-name: 9th ISCA Speech Synthesis Workshop (en)
pubs.conference-start-date: 2016-09-13 (en)
cam.orpheus.success: Thu Nov 05 11:57:29 GMT 2020 - The item has an open VoR version. (*)
rioxxterms.freetoread.startdate: 2100-01-01

