
dc.contributor.author | Gerlach, Linda | en
dc.contributor.author | McDougall, Kirsty | en
dc.contributor.author | Kelly, Finnian | en
dc.contributor.author | Alexander, Anil | en
dc.contributor.author | Nolan, Francis | en
dc.date.accessioned | 2020-08-24T23:30:21Z
dc.date.available | 2020-08-24T23:30:21Z
dc.identifier.issn | 0167-6393
dc.identifier.uri | https://www.repository.cam.ac.uk/handle/1810/309562
dc.description.abstract | The present study investigates relationships between voice similarity ratings made by human listeners and comparison scores produced by an automatic speaker recognition system that includes phonetic, perceptually-relevant features in its modelling. The study analyses human voice similarity ratings of pairs of speech samples from unrelated speakers from an accent-controlled database (DyViS, Standard Southern British English) and the comparison scores from an i-vector-based automatic speaker recognition system using ‘auto-phonetic’ (automatically extracted phonetic) features. The voice similarity ratings were obtained from 106 listeners who each rated the voice similarity of pairings of ten speakers on a Likert scale via an online test. Correlation analysis and Multidimensional Scaling showed a positive relationship between listeners’ judgements and the automatic comparison scores. A separate analysis of the subsets of listener responses from English and German native speaker groups showed that a positive relationship was present for both groups, but that the correlation was higher for the English listener group. This work has key implications for forensic phonetics through highlighting the potential to automate part of the process of selecting foil voices in voice parade construction for which the collection and processing of human judgements is currently needed. Further, establishing that it is possible to use automatic voice comparisons using phonetic features to select similar-sounding voices has important applications in ‘voice casting’ (finding voices that are similar to a given voice) and ‘voice banking’ (saving one’s voice for future synthesis in case of an operation or degenerative disease).
dc.publisher | Elsevier
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title | Exploring the relationship between voice similarity estimates by listeners and by an automatic speaker recognition system incorporating phonetic features | en
dc.type | Article
prism.publicationName | Speech Communication | en
dc.identifier.doi | 10.17863/CAM.56656
dcterms.dateAccepted | 2020-08-11 | en
rioxxterms.versionofrecord | 10.1016/j.specom.2020.08.003 | en
rioxxterms.version | AM
rioxxterms.licenseref.uri | http://www.rioxx.net/licenses/all-rights-reserved | en
rioxxterms.licenseref.startdate | 2020-08-11 | en
dc.contributor.orcid | Nolan, Francis [0000-0002-8302-5726]
dc.identifier.eissn | 1872-7182
rioxxterms.type | Journal Article/Review | en
cam.issuedOnline | 2020-08-12 | en
cam.orpheus.success | Tue Sep 01 09:01:25 BST 2020 - Embargo updated*
rioxxterms.freetoread.startdate | 2022-02-28


Except where otherwise noted, this item's licence is described as Attribution-NonCommercial-NoDerivatives 4.0 International