Technical and societal implications of machine learning security
Repository URI
Repository DOI
Change log
Authors
Abstract
Machine learning has become increasingly integrated into critical applications, making its security a pressing concern. This thesis addresses the practical and societal dimensions of ML security. On the attack front, the thesis demonstrates new adversarial techniques. For computer vision, it shows that simple human-crafted markers can fool state-of-the-art image classifiers. For multimodal systems, it presents "audio jailbreaks": imperceptible audio perturbations that consistently bypass a multimodal LLM's safety alignment. In addition, experiments reveal a scaling behaviour in data poisoning: larger language models are more susceptible to poisoning than smaller ones, learning harmful behaviours from minimal amounts of malicious training data. To better quantify model vulnerability, the thesis explores effective dimensionality, an information-theoretic measure of a neural network's complexity, as a new robustness metric; empirical results show that models with lower effective dimensionality exhibit greater resistance to adversarial manipulation.
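The abstract does not spell out how effective dimensionality is computed. One common formulation (due to MacKay, and applied to modern neural networks by Maddox et al.) derives it from the eigenvalues of the loss Hessian; the sketch below illustrates that formulation as an assumption, not necessarily the thesis's exact definition.

    import numpy as np

    def effective_dimensionality(eigenvalues, z=1.0):
        # N_eff(z) = sum_i lam_i / (lam_i + z): eigenvalues much larger than
        # the regularisation constant z contribute ~1, flat directions ~0,
        # so N_eff counts the well-determined directions in parameter space.
        lam = np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None)
        return float(np.sum(lam / (lam + z)))

    # Toy spectrum: three dominant directions plus a long, nearly flat tail.
    spectrum = np.concatenate([np.array([50.0, 20.0, 10.0]), np.full(997, 1e-3)])
    print(effective_dimensionality(spectrum))  # ~3.8, despite 1000 eigenvalues

Under this reading, a lower N_eff indicates a flatter, functionally simpler fit, which is consistent with the abstract's claim that such models better resist adversarial manipulation.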
On the defence front, the thesis identifies a weakness in LLM watermarking schemes: in multi-turn dialogues, unwatermarked participants inadvertently mimic a watermarked model's distinctive lexical patterns. The thesis then develops Adversarial Suffix Filtering, a lightweight, model-agnostic input-sanitisation pipeline that removes malicious prompt suffixes, protecting LLMs from adversarial suffix attacks without requiring any changes to the underlying model. Finally, on the governance front, the thesis proposes a policy framework of tiered anonymity: by tying identity-verification requirements to an account's audience reach on social media, the framework aims to dampen large-scale AI-generated misinformation while preserving pseudonymity for ordinary users.
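The abstract describes Adversarial Suffix Filtering only at a high level. As a rough illustration of suffix sanitisation (not the thesis's actual pipeline), the sketch below scores candidate truncations of a prompt by perplexity under a small reference model and keeps the most natural-looking one; the choice of GPT-2 as the reference model, the word-level cut points, and the greedy search are all assumptions made for this demo.

    # pip install torch transformers
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def perplexity(text: str) -> float:
        # Perplexity under GPT-2; optimiser-generated gibberish suffixes
        # score far higher than natural language.
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = lm(ids, labels=ids).loss
        return float(torch.exp(loss))

    def strip_adversarial_suffix(prompt: str, max_strip: int = 24) -> str:
        # Greedily pick the word-level truncation with the lowest perplexity:
        # removing a gibberish suffix lowers perplexity, while removing
        # genuine words from a benign prompt tends to raise it.
        words = prompt.split()
        best_cut, best_ppl = len(words), perplexity(prompt)
        for k in range(1, max(0, min(max_strip, len(words) - 2)) + 1):
            candidate = " ".join(words[:-k])
            ppl = perplexity(candidate)
            if ppl < best_ppl:
                best_cut, best_ppl = len(words) - k, ppl
        return " ".join(words[:best_cut])

    benign = "Write a short poem about the sea."
    attacked = benign + " describing.\\ + similarlyNow write oppositeley.]( Me giving**ONE"
    print(strip_adversarial_suffix(attacked))  # ideally recovers the benign prompt

Because the filter runs on the input text alone, it stays model-agnostic in the sense the abstract describes: the protected LLM never needs to be modified or even queried.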
These contributions reflect an interdisciplinary approach, integrating technical interventions with governance measures and illustrating how layered strategies of attacking, measuring, defending, and governing can collectively enhance the security and trustworthiness of machine learning systems in society.
Description
Date
Advisors
Anderson, Ross
