Automatic assessment of voice similarity and its implications for forensic applications



Authors

Abstract

Speakers of varying degrees of similarity are required for forensic purposes, for instance officer or witness protection, relevant populations for forensic speaker recognition, or voice parades and earwitness assessments. In the past decade, the collection of speech has become much easier, offering the opportunity to assess the suitability of large quantities of speakers for such purposes. However, manual assessment of voices by a forensic phonetician or the appointment of lay listeners to judge voice similarity is time-consuming and costly, thus limiting the number of voices that can be processed in a case. Automatic similarity assessment of speakers could radically reduce processing time while expanding the voice search pool before additional verification by a trained phonetician. This thesis explores the selection of similar-sounding speakers based on perceptual judgements and automatically measured features for such forensic applications. The primary research questions addressed are whether automatic speaker recognition may be used to assess perceived voice similarity and how large databases may be filtered according to speaker similarity. The study employs various combinations of features, speaker modelling approaches, and distance measures and uses correlation analyses, clustering methods, and ranking to find subgroups of similar-sounding speakers. A listener experiment is conducted to gain a deeper understanding of the perception of extreme similarity among unrelated speakers. The research highlights that it is indeed possible to filter similar-sounding speakers from large databases in a semi-automatic manner to the level of hard-to-distinguish unrelated speakers. 
Applications in likelihood ratio-based forensic automatic speaker recognition, voice parades, and speech synthesis are explored to varying degrees. Key findings include that the perceived similarity of the relevant population to the questioned speaker may bias the strength of evidence, and that similarity of synthetic speech to a target speaker may not be assessed in the same way as natural speech. The findings of this dissertation have implications for future research in the field of speaker similarity and have practical applications in forensics.

Date

2024-05-27

Advisors

McDougall, Kirsty

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as All rights reserved

Sponsorship

Cambridge European & Selwyn Oxford Wave Research Studentship