Speaker diarisation and longitudinal linking in multi-genre broadcast data

Karanasou, P; Gales, MJF; Lanchantin, P; Liu, X; Qian, Y; Wang, L; Woodland, PC; Zhang, C

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/251270

Files

Accepted version (227.9 KB)

Type

Conference Object

Authors

Karanasou, P

Gales, MJF

Lanchantin, P

Liu, X

Qian, Y

Wang, L

Woodland, PC

Zhang, C

Abstract

This paper presents a multi-stage speaker diarisation system with longitudinal linking, developed on BBC multi-genre data for the 2015 Multi-Genre Broadcast (MGB) challenge. The basic speaker diarisation system draws on techniques from the Cambridge March 2005 system and adds a new deep neural network (DNN)-based speech/non-speech segmenter. A newly developed linking stage is then applied to the basic diarisation output to identify speakers across multiple episodes of the same series. The longitudinal constraint imposes incremental processing of the episodes: speaker labels for each episode may be obtained using only material from that episode and from episodes broadcast earlier. The nature of the data, together with the longitudinal linking constraint, makes this diarisation task a new and particularly challenging open research topic. Different linking clustering metrics are compared, and the lowest within-episode and cross-episode diarisation error rate (DER) scores are achieved on the MGB challenge evaluation set.

Keywords

speaker diarisation, speaker segmentation, agglomerative clustering, longitudinal linking

Journal Title

2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings

Conference Name

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Publisher

IEEE

Publisher DOI

https://doi.org/10.1109/ASRU.2015.7404859

Rights

http://www.rioxx.net/licenses/all-rights-reserved

Sponsorship

This work is in part supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). C. Zhang is also supported by a Cambridge International Scholarship from the Cambridge Commonwealth, European & International Trust.

Collections

Scholarly Works - Engineering