Efficient Uncertainty Estimation and Sequence Modelling
Abstract
Transformer-based autoregressive sequence models have revolutionised natural language processing and speech processing, achieving state-of-the-art performance on a wide range of tasks. However, their deployment in real-world scenarios, especially safety-critical applications such as autonomous systems or medical diagnosis, demands not only high accuracy but also robust and efficient methods for estimating the uncertainty associated with their predictions. Furthermore, extending these powerful, predominantly text-based models to new modalities such as audio, and addressing the inherent computational challenges posed by the quadratic complexity of self-attention over long sequences, remain significant research problems limiting their deployment in resource-constrained settings.
This thesis investigates various aspects of efficiency within deep sequence modelling, with a primary focus on reliable, efficient uncertainty estimation and sequence representation. The first part addresses the challenge of efficiently capturing predictive uncertainty, often derived from computationally expensive ensembles. We explore Ensemble Distribution Distillation (EDD) techniques to compress the distributional knowledge of an ensemble into a single, compact student model, introducing improved training objectives. We further propose Self-Distribution Distillation (S2D) and its hierarchical extension (H2D) as methods enabling a single model to implicitly capture ensemble-like diversity. Additionally, we introduce Non-Autoregressive Proxy (NAP) models – lightweight networks trained to directly predict sequence-level attributes (e.g., uncertainty, performance metrics) from encoder representations, bypassing the expensive autoregressive decoding process entirely and enabling efficient downstream applications.
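The distillation idea underlying EDD can be made concrete with a toy example: the student predicts the parameters of a Dirichlet distribution over the simplex, trained to assign high likelihood to the categorical predictions of the individual ensemble members. The function name `dirichlet_nll` and the toy ensemble below are illustrative, not taken from the thesis; this is a minimal sketch of the objective, not the full training pipeline.

```python
import math
import numpy as np

def dirichlet_nll(alpha, member_probs):
    """Negative Dirichlet log-likelihood of ensemble member predictions.

    alpha: (C,) student concentration parameters over C classes.
    member_probs: (M, C) categorical predictions of M ensemble members.
    """
    # log normaliser: log Gamma(sum alpha) - sum log Gamma(alpha_c)
    log_norm = math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
    # per-member log density, averaged over the ensemble
    log_lik = log_norm + ((alpha - 1.0) * np.log(member_probs)).sum(axis=1)
    return -log_lik.mean()

# Toy ensemble: three members broadly agreeing on class 0.
members = np.array([[0.80, 0.15, 0.05],
                    [0.70, 0.20, 0.10],
                    [0.75, 0.15, 0.10]])

sharp_alpha = np.array([15.0, 3.0, 2.0])  # concentrated near the ensemble mean
flat_alpha = np.array([1.0, 1.0, 1.0])    # uninformative Dirichlet

assert dirichlet_nll(sharp_alpha, members) < dirichlet_nll(flat_alpha, members)
```

Minimising this loss over many inputs pushes the single student to reproduce both the ensemble mean (the location of the Dirichlet) and the ensemble spread (its concentration), which is what allows one model to stand in for many at inference time.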
The second part extends uncertainty estimation to ranking, specifically the emerging use of Large Language Models (LLMs) for assessing NLG outputs. We introduce a generalised Product-of-Experts (PoE) framework that leverages pairwise LLM judgments, enabling robust ranking even with incomplete comparison data and mitigating the quadratic computational cost of exhaustive pairwise comparison. Within this framework, we derive novel uncertainty metrics that improve both the modelling and the efficiency of ranking NLG outputs, and show how to quantify confidence in the resulting rankings.
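One simple instantiation of a product of experts over pairwise judgments treats each comparison as a Gaussian expert on a score difference, so the combined posterior is maximised by a linear least-squares solve. The sketch below, with the hypothetical helper `poe_scores`, illustrates how a sparse, incomplete set of comparisons still yields a full ranking; the thesis's generalised PoE framework is richer than this toy version.

```python
import numpy as np

def poe_scores(n, comparisons):
    """Least-squares skill estimates from sparse pairwise judgments.

    comparisons: list of (i, j, p) with p = P(item i preferred over item j).
    Each judgment acts as a Gaussian expert on the score difference
    s_i - s_j, centred at logit(p); the product of experts is maximised
    by a linear least-squares solve. Scores are identified up to a shift,
    so the mean score is pinned to zero.
    """
    rows, targets = [], []
    for i, j, p in comparisons:
        row = np.zeros(n)
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
        targets.append(np.log(p / (1.0 - p)))  # logit of the preference
    rows.append(np.ones(n))   # remove the additive ambiguity
    targets.append(0.0)
    scores, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return scores

# Only 2 of the 3 possible comparisons are observed, yet all items are ranked.
scores = poe_scores(3, [(0, 1, 0.8), (1, 2, 0.7)])
assert scores[0] > scores[1] > scores[2]
```

Because every observed comparison simply adds one expert, the same solve handles any subset of the n(n-1)/2 possible pairs, which is what avoids the quadratic cost of comparing everything against everything.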
The final part addresses the computational bottleneck of self-attention over long sequences. We investigate structured recurrent models, proposing Multi-Head Structured State Space Models (MH-SSMs) with a novel inter-head gating mechanism to capture diverse temporal dynamics efficiently. Experimental results on large-scale ASR benchmarks demonstrate that these structured recurrent approaches achieve competitive and state-of-the-art performance while maintaining linear computational complexity, offering a powerful alternative for efficient long-sequence modelling.
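The linear-time property of structured state-space models comes from replacing pairwise attention with a recurrence that does constant work per step. The sketch below shows a diagonal SSM scan and a simplified multi-head combination; the sigmoid gate here is a stand-in illustration, not the inter-head gating mechanism proposed in the thesis, and the function names are invented for this example.

```python
import numpy as np

def diagonal_ssm_scan(x, a, b, c):
    """Linear-time diagonal state-space recurrence over a scalar sequence.

    x: (T,) input sequence; a, b, c: (N,) diagonal state transition,
    input and output projections. h_t = a * h_{t-1} + b * x_t, y_t = c @ h_t.
    """
    h = np.zeros_like(a)
    ys = []
    for x_t in x:            # one state update per step: O(T * N) overall
        h = a * h + b * x_t
        ys.append(c @ h)
    return np.array(ys)

def multi_head_ssm(x, heads, gate_w):
    """Run several diagonal SSM heads and mix them with a sigmoid gate.

    heads: list of (a, b, c) parameter triples, one per head;
    gate_w: (H,) mixing logits. Each head can use different decay rates a,
    letting the heads specialise to different temporal ranges.
    """
    outs = np.stack([diagonal_ssm_scan(x, *h) for h in heads])  # (H, T)
    gates = 1.0 / (1.0 + np.exp(-gate_w))                       # (H,)
    return (gates[:, None] * outs).sum(axis=0)
```

Contrast this with self-attention, which compares every step with every other step and therefore scales quadratically in T; the scan above touches each step once, which is why SSM-style models remain tractable on very long audio sequences.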

