Rapid Nonlinear Speaker Adaptation for Large-Vocabulary Continuous Speech Recognition

Accepted version
Peer-reviewed

Abstract

Recently, kernel eigenvoices were revisited using kernel representations of distributions for rapid nonlinear speaker adaptation. These representations ensure the validity of the adapted distribution functions and enable expectation-maximisation training. Although gains in word error rate have been shown for rapid speaker adaptation, this approach increases decoding cost because the number of likelihood evaluations grows. The present paper addresses this issue by providing a coherent framework for systematic probabilistic approaches that reduce the recognition cost while still yielding equally powerful adapted models. The common denominator of such approaches is the use of probabilistic criteria, such as the Kullback-Leibler divergence. However, in the general case, the resulting adapted models have full covariance matrices. To overcome this issue, the paper investigates the use of predictive semi-tied transforms to yield diagonal covariances for decoding. Experimental results are presented on a large-vocabulary conversational telephone task.
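
As a small illustration of the probabilistic criterion named in the abstract, the sketch below computes the Kullback-Leibler divergence between two Gaussians and compares a full-covariance model with two diagonal-covariance approximations. It is not the paper's method: the predictive semi-tied transform is not implemented here, and the function name `kl_gaussians` and all numeric values are assumptions chosen only for illustration. It uses the standard result that, for a fixed mean, the diagonal covariance minimising KL(p || q) is the diagonal of the full covariance.

```python
import numpy as np

def kl_gaussians(mu_p, cov_p, mu_q, cov_q):
    """KL(p || q) between two multivariate Gaussians (closed form)."""
    d = mu_p.shape[0]
    diff = mu_q - mu_p
    cov_q_inv = np.linalg.inv(cov_q)
    # slogdet returns (sign, log|det|); both covariances are assumed
    # positive definite, so only the log-determinant is needed.
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - d
        + logdet_q
        - logdet_p
    )

# Hypothetical 2-D "adapted" Gaussian with a full covariance matrix.
mu = np.array([0.0, 1.0])
full_cov = np.array([[2.0, 0.8],
                     [0.8, 1.0]])

# Diagonal approximation that keeps only the per-dimension variances.
diag_cov = np.diag(np.diag(full_cov))

kl_same = kl_gaussians(mu, full_cov, mu, full_cov)            # identical models
kl_diag = kl_gaussians(mu, full_cov, mu, diag_cov)            # matched variances
kl_other = kl_gaussians(mu, full_cov, mu, np.diag([1.0, 1.0]))  # arbitrary diagonal

print(kl_same, kl_diag, kl_other)
```

The gap between `kl_diag` and zero is the information lost by discarding the off-diagonal covariance terms in the original feature space; a feature-space transform applied before diagonalisation, which is the role semi-tied transforms play, aims to shrink that gap.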

Journal Title

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3

Conference Name

Interspeech 2012 : 13th Annual Conference of the International Speech Communication Association

Volume Title

2

Rights and licensing

Except where otherwise noted, this item's license is described as All Rights Reserved