A hierarchical attention based model for off-topic spontaneous spoken response detection

Malinin, A; Knill, K; Gales, MJF

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/279179

Repository DOI

https://doi.org/10.17863/CAM.26559

Files

Accepted version (690.06 KB)

Type

Conference Object

Authors

Malinin, A

Knill, Katherine

https://orcid.org/0000-0003-1292-2769

Gales, MJF

Abstract

Automatic spoken language assessment and training systems are becoming increasingly popular to handle the growing demand to learn languages. However, current systems often assess only fluency and pronunciation, with limited content-based features being used. This paper examines one particular aspect of content assessment, off-topic response detection. This is important for deployed systems as it ensures that candidates understood the prompt and are able to generate an appropriate answer. Previously proposed approaches typically require a set of prompt-response training pairs, which limits flexibility as example responses are required whenever a new test prompt is introduced. Recently, the attention based neural topic model (ATM) was presented, which can assess the relevance of prompt-response pairs regardless of whether the prompt was seen in training. This model uses a bidirectional Recurrent Neural Network (BiRNN) embedding of the prompt combined with an attention mechanism to attend over the hidden states of a BiRNN embedding of the response to compute a fixed-length embedding used to predict relevance. Unfortunately, performance on prompts not seen in the training data is lower than on seen prompts. Thus, this paper adds the following contributions: several improvements to the ATM are examined; a hierarchical variant of the ATM (HATM) is proposed, which explicitly uses prompt similarity to further improve performance on unseen prompts by interpolating over prompts seen in training data given a prompt of interest via a second attention mechanism; an in-depth analysis of both models is conducted and the main failure mode is identified. On spontaneous spoken data, taken from BULATS tests, these systems are able to assess relevance to both seen and unseen prompts.
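
The two attention steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the BiRNN encoders are replaced by pre-computed vectors, and the dot-product attention scoring, the logistic relevance score, and all function names (`attend`, `relevance`) are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(query, states):
    """Dot-product attention (assumed form): weights and weighted sum of states."""
    weights = softmax([dot(query, h) for h in states])
    dim = len(states[0])
    pooled = [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]
    return weights, pooled


def relevance(prompt_vec, response_states, seen_prompt_vecs=None):
    """Score a prompt-response pair in (0, 1).

    prompt_vec stands in for the BiRNN prompt embedding and
    response_states for the BiRNN hidden states of the response.
    """
    if seen_prompt_vecs:
        # HATM-style step (assumed form): re-express the prompt as an
        # attention-weighted interpolation of prompts seen in training.
        _, prompt_vec = attend(prompt_vec, seen_prompt_vecs)
    # ATM-style step: the prompt attends over the response hidden states
    # to produce a fixed-length response embedding.
    weights, response_vec = attend(prompt_vec, response_states)
    # Assumed classifier: logistic function of the prompt-response dot product.
    score = 1.0 / (1.0 + math.exp(-dot(prompt_vec, response_vec)))
    return score, weights


if __name__ == "__main__":
    prompt = [1.0, 0.0]
    on_topic = [[0.9, 0.1], [0.8, 0.0]]
    off_topic = [[0.0, 1.0], [-0.5, 0.9]]
    print(relevance(prompt, on_topic)[0], relevance(prompt, off_topic)[0])
```

In this toy setting, response states that point in the same direction as the prompt vector receive a higher score than states that do not. Passing `seen_prompt_vecs` shows how an unseen prompt could be mapped onto a mixture of training prompts before the response is scored.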

Keywords

Spoken Language Assessment, Relevance Assessment, Deep Learning

Journal Title

2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings

Conference Name

2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Volume Title

2018-January

Publisher

IEEE

Publisher DOI

https://doi.org/10.1109/ASRU.2017.8268963

Rights

http://www.rioxx.net/licenses/all-rights-reserved

Sponsorship

Cambridge Assessment (unknown)
EPSRC (1464018)

Collections

Cambridge University Research Outputs