Repository logo
 

Theses - Computer Science and Technology

Browse

Recent Submissions

Now showing 1 - 20 of 206
  • ItemOpen Access
    Graph Neural Networks for Multi-Agent Learning
    Kortvelesy, Ryan
    Over time, machine learning research has placed an increasing emphasis on utilising relational inductive biases. By focusing on the underlying relationships in graph structured data, it has become possible to create models with superior performance and generalisation. Given different graph topologies, these models can manifest in the form of CNNs (on grid graphs), RNNs (on line graphs), and Transformers (on fully connected graphs). However, all of these architectures can be subsumed as special cases under graph neural networks (GNNs), a framework for operating over any graph-structured data. Taking advantage of relational inductive biases, GNNs use local filters to learn functions that generalise over high-dimensional data. They are particularly useful in the context of multi-agent learning, where most data is structured as a graph (e.g., communication links in a multi-robot team generate a graph connectivity). In this thesis, we study the field of multi-agent learning. Recent advances in the domain are promising, but further innovation is required to tackle problems such as learning under partial observability and facilitating collaborative behaviour. GNNs provide a useful framework for solving these problems, as their decentralised architecture allows them to utilise global information while maintaining generalisation and sample-efficiency. Despite these benefits, existing GNN-based approaches to multi-agent problems have only been applied with a limited scope. In this thesis, we investigate the weaknesses of current GNN architectures, and propose extensions to improve their capabilities. Furthermore, we branch out into new learning paradigms, allowing GNN-based approaches to tackle new applications. We start this thesis by developing a modular framework to analyse existing GNN-based approaches to multi-robot tasks. By ablating over the submodules within our framework, we can draw conclusions about the best allocation of representational complexity within a GNN architecture. Our analysis highlights the need for mapping to a learned latent space prior to aggregation, allowing the network to preserve the most important information. To avoid the loss of information through naive aggregation, our subsequent work strives to find an architecture that allows the aggregator itself to be learned. We introduce a novel method that parametrises the space of all standard aggregators, and validate its performance in graph learning problems. Next, we expand our focus to the related problem of learning over sets. We introduce a novel set autoencoder, allowing a bijective mapping from sets to fixed-size embeddings to be learned in an unsupervised manner. To demonstrate the usefulness of this architecture, we use it to create a task-agnostic multi-agent communication strategy. In our final work, we use GNNs to tackle the credit assignment problem in multi-agent reinforcement learning. Leveraging the decentralised manner in which GNNs combine local and aggregated neighbouring information, we perform value factorisation with a GNN-based architecture. This approach maintains the ability to represent non-factorisable value functions, yet performs factorisation when it is possible. We conclude this thesis by reflecting over our contributions, which span the fields of supervised learning, unsupervised learning, and reinforcement learning. The new architectures that we have developed open up new avenues of research---not only in applications, but in extensions to our methods. For each research topic in our thesis, we propose future work that can further impact the field.
  • ItemOpen Access
    Decentralised protocol-independent automation in smart buildings
    Safronov, Vadim
    Providing automation and reacting in time are primary functions of a modern smart building. That is done thanks to sensors and actuators distributed across the architecture. Smart building devices employ a diverse set of communication protocols, and the number of such devices grows every year. A typical modern Building Automation System (BAS) is not prepared to effectively handle that growing amount of heterogeneous data and represents a single point of failure as well as a potential source of congestion and unacceptable delays. Most BAS platforms run automation workflows over IP-based networks, making non-IP protocols, such as Low-Power Wide Area Networks (LPWAN), incompatible with building automation services. Existing solutions for integrating non-IP wireless protocols into smart building automation are ineffective as reduce application performance, increase latency and drain sensor battery life. A centralised approach also assumes aggregation of sensor data from the entire building at a single point, potentially infringing occupants privacy. This dissertation proposes a decentralised, IP-agnostic approach for improving efficiency and robustness of smart building automation. By offloading BAS functions to a hierarchy of smart gateways (SGWs), the system manages device interoperation and provides seamless building automation in a decentralised manner, irrespective of the underlying communication protocols. This can be achieved as traditional gateways are underutilised and can provide automation and interoperation logic on top of their basic functions. The decentralised architecture, by its nature, also has a potential for improving occupant privacy and security through edge protection. The dissertation demonstrates the feasibility, efficiency, and robustness of the decentralised automation compared to the classic centralised BAS approach, by evaluating its prototype within realistic MQTT automation scenarios involving 300 LoRaWAN and Wi-Fi sensors and actuators. The dissertation also introduces tools and techniques for large-scale evaluation and presents a concrete example of SGW privacy-preserving threat protection, aimed at enhancing building occupants' security and privacy.
  • ItemEmbargo
    Deep learning-based medical image reconstruction for multi-contrast magnetic resonance imaging
    Yang, Junwei
    With the emergence of deep learning, their successful applications have been witnessed in various computer-vision tasks. As computational power has grown over the past decades, it has become possible to design deeper and more complex architectures to better extract features from the raw data, which can ultimately achieve better performance. However, medical images have different properties compared to natural images, and straightforward applications of deep networks for these tasks may not be feasible in clinical settings. Therefore, in this thesis, we propose deep learning-based frameworks that leverage additional information that is typically available in clinical conditions, with a focus on the modality of magnetic resonance imaging (MRI). Specifically, we investigate the inverse problem of MRI reconstruction, aiming to accelerate MRI acquisition by under-sampling while preserving the quality of acquired images. First, we address the issue of effectively leveraging highly correlated information across contrasts in sequentially acquired multi-contrast MR scans. We design a framework to optimise the under-sampling pattern of a target MRI contrast, by exploiting the com- plementary fully-sampled reference contrast through a novel synthesis-based information extraction strategy. Meanwhile, the reconstruction network is jointly optimised to provide guidance on the selection of sampled positions, allowing for better exploitation given limited measurements to be sampled. Second, we propose to leverage information from the original acquisition domain of MRI, known as the *k*-space, which is associated with, but different from the image domain. To allow for improved reconstruction with information in both domains, we present a dual-domain reconstruction framework with specifically designed regularisations to exploit the coupled information during optimisation. Moreover, the framework is extended for scenarios of multi-contrast MRI, with the focus on reducing the cross-contrast misalignment in both domains to better leverage the information from the reference contrast. Finally, we deal with a specific clinical scenario where it is impractical to acquire largefully-sampled dataset by introducing a scan-specific multi-contrast MRI reconstruction framework. To effectively learn from limited sparsely sampled *k*-space data of highly imbalanced distribution, the model can adaptively learn from such sparse data without additional regularisation on a coarse-to-fine basis, to better capture information of various frequency levels. Moreover, the technique of implicit neural representation learning is employed to enable reconstruction of a single subject without supervision of fully-sampled data. To maximise the assistance power from the reference contrast, a Siamese architecture is designed to simultaneously reconstruct both contrasts, which can also mitigate overfitting due to the lack of supervision. Through comprehensive experiments conducted across multiple datasets, we demon- strate the superiority of our proposed frameworks. These frameworks achieve improvements by leveraging diverse information types, including different contrasts, tasks, domains, and frequency bands from the data. Meanwhile, the developed methods covered both super- vised and unsupervised learning paradigms. The approaches explored in our research not only contribute to the current state of MRI reconstruction but also offer possible directions for future investigation in this field, potentially leading to further improvements in MRI reconstruction.
  • ItemOpen Access
    Exploring Neuroimaging-Specific Deep Learning Biases: Uncertainty, Dynamic Graphs, and Communities
    Campbell, Alexander
    The human brain is a complex dynamical system composed of numerous interacting regions. It has long been a subject of immense interest for neuroscientists seeking to understand brain function. Recent technological advancements have introduced non-invasive functional neuroimaging techniques, offering unparalleled insights into neural activity. Diverse imaging modalities, from electroencephalography's measurement of electrical signals to functional magnetic resonance imaging's visualisation of blood flow, have generated vast datasets that serve as a bedrock for quantitative neuroscience. Leveraging this data, new methods are emerging for investigating both normal brain function and pathological conditions. The use of machine learning (ML) in quantitative neuroscience has played a pivotal role in advancing understanding of brain function. Specifically, ML has been instrumental in discovering neuroimaging biomarkers that are fundamental to assessments of brain function. More recently, deep learning (DL), a subfield of ML, has become the forefront of this pursuit. To date, DL's accomplishments in the field of functional neuroimaging have predominantly centred on brain disorder classification, achieving performance levels on par with domain experts, albeit within mainly research settings. DL's success can be attributed, in part, to its unparalleled ability for representation learning, where highly complex non-linear patterns are extracted directly from raw neuroimaging data. This proficiency is further refined by the inductive biases incorporated in DL model designs, specifically crafted to fit the unique characteristics of the input data. This thesis embarks on a deeper exploration of DL methods applied to functional neuroimaging data, moving beyond conventional classification tasks. We introduce novel neural network architectures, influenced by neuroscientific-specific inductive biases, to enhance the representation learning process. These biases focus on efficient uncertainty quantification, dynamic brain graph structure learning, and dynamic brain graph community detection, all grounded in established neuroscientific findings. Through meticulous experiments, we prove these biases boost performance, enriching our understanding of model trustworthiness, robustness, and interpretability. The neuroimaging-focused DL architectures presented in this thesis offer the prospect of pioneering innovative data representations, driving transformative insights that stand to redefine the realm of functional neuroimaging research.
  • ItemOpen Access
    Federated self-supervised learning
    Gao, Yan
    Federated learning (FL) has garnered significant attention from both research and industrial communities due to its distinctive ability to facilitate collaborative learning from large-scale datasets without compromising users’ data privacy. However, current FL practices predominantly focus on supervised learning tasks, necessitating the availability of high-quality, domain-specific labels alongside the data. This prerequisite constrains the implementation of FL in numerous real-world applications where access to such labels at the edge is limited. Self-supervised learning (SSL) enables the acquisition of representations from unlabelled data, which can subsequently be employed to address various downstream tasks. Integrating SSL with FL presents considerable advantages beyond privacy-preserving training, including robust distributed representation learning, enhanced scalability, and resilience to noisy data. Despite its potential, research on SSL within the context of FL remains scarce. This thesis endeavours to bridge this research gap by illuminating the underlying challenges and proposing potential solutions to advance the training of SSL models in FL environments, specifically within the speech, video, and image domains. First, we present a systematic investigation into the feasibility and complexities of implementing speech SSL in FL contexts concerning hardware limitations and algorithmic aspects, and provide an elementary solution to the efficiency issue of training with short input sequences. Second, we delve into the unexplored area of video-SSL in FL and propose a novel FL framework, incorporating stochastic weighted averaging during aggregation and partial weights updating, which achieves new state-of-the-art performance on downstream tasks. Third, we examine the prevalent issue of model divergence, instigated by clients’ bias in the area of image-federated SSL. We introduce a novel aggregation scheme, designed to mitigate this problem by utilising angular divergence as a contributing coefficient for weighting clients’ models at the layer level. Finally, we revisit the efficiency challenge in FL-SSL and incorporate sparsification into federated SSL model training to accelerate the deployment of such models on FL edge devices. Overall, the original contributions of this thesis address the task of integrating SSL model training into FL environments across three pervasive domains (speech, video, and image). This work lays the groundwork for transferring SSL training to local edge devices for a wide array of real-world applications.
  • ItemOpen Access
    Enhancing Interpretability: The Role of Concept-based Explanations Across Data Types
    Kazhdan, Dmitry
    Deep Neural Networks (DNNs) have achieved remarkable performance on a range of tasks. Unfortunately, they have been shown to be black-boxes whose behaviour cannot be understood directly. Thus, a key step to further empowering DNN-based approaches is improving their explainability, which is addressed by the field of Explainable AI (XAI). Recently, Concept-based explanations (CbEs) have emerged as a powerful new XAI paradigm, providing model explanations in terms of human-understandable units, rather than individual features, pixels, or characters. Despite their numerous applications, existing literature is lacking precise answers to what concepts are, where they can be applied, and how they should be applied. In this thesis, we give more precise answers to these questions. Firstly, we unify and extend existing definitions of CbEs in Computer Vision using our Concept Decomposition (ConD) framework, and use it formulate our novel Concept-based Model Extraction (CME) framework. CME provides a way of extracting Concept-based Models from pre-trained vanilla Convolutional Neural Networks in a semi-supervised fashion, requiring vastly fewer concept annotations. Following our work on CME, we explore how similar approaches transfer to the temporal data modality. In particular, we introduce our Model Explanations via Model Extraction (MEME) framework, a first-of-its-kind temporal CbE framework capable of extracting Concept-based Models from Recurrent Neural Networks (RNNs). Using MEME, we demonstrate how we may harness the benefits of CbEs in the temporal data domain. Equipped with a better understanding of what CbEs are, we proceed to exploring their applications to graph data. In particular, we automate the verification and inspection process of graph concepts, by leveraging the inherent structure of the underlying graph data. In order to achieve this, we introduce the Graph Concept Interpretation (GCI) framework, which can be used to extract and quantitatively evaluate concepts from Graph Neural Networks (GNNs). Overall, this thesis introduces a range of approaches enabling: (i) novel concept definitions, (ii) novel concept applications, (iii) novel concept evaluations. Consequently, these approaches advance the field of concept-based XAI, bringing us closer to making DNNs more transparent, trustworthy, and interpretable.
  • ItemOpen Access
    DEliBots : Deliberation Enhancing Bots
    Karadzhov, Georgi
    Deliberation is the dialogue between a group of people that facilitates careful weighing of information, providing adequate speaking opportunities and bridging the differences among participants’ diverse ways of thinking and problem-solving. It is traditionally explored in the field of psychology, where researchers examine the conditions under which groups can make better decisions together. Previous research has shown that a small focus group can outperform wisdom-of-the-crowd. Further, researchers have shown that under some conditions, a group can outperform even the most knowledgeable individual within it, which is also known as the the assembly bonus effect. However, not all groups are equal, and not every group deliberates well together. In this thesis, I introduce the concept of Deliberation Enhancing Bots - DEliBots. The primary objective of a DEliBot is to improve the group's decision-making by providing a framework for good deliberation practices. To achieve this, we leverage conversational probing -- asking questions that stimulate further discussion and innovative thinking within a group setting. A DEliBot is an artificial moderator that models group dynamics and encourages effective deliberation without providing solutions. To achieve this, we address two significant challenges - identifying when to intervene and what should a DEliBot say in order to improve group deliberation. First, regarding timing, we recognise that a moderator should pick a good moment to intervene within a conversation, and intervening too often or too rarely can harm a group's decision-making. Secondly, while uttering a fluent response has already been addressed in dialogue systems research, generating responses that improve group deliberation has not been previously explored. Thus, we introduce a method that models both short-term and long-term improvement in deliberation. We perform a human evaluation study, integrating a DEliBot within a group of people solving a cognitive task. We show that a DEliBot can improve groups' decision-making by asking good questions and encouraging good deliberation practices.
  • ItemOpen Access
    Improving Photometric Camera Accuracy and Image Quality in High Dynamic Range Imaging
    Hanji, Param; Hanji, Param [0000-0002-7985-4177]
    Cameras have long grappled with the challenge of capturing the vast range of light intensities present in the real world and reproducing them on a medium with much lower dynamic range. This challenge persisted from the era of print photography and continued with the adoption of standard dynamic range (SDR) displays, such as CRT and LCD displays. However, the emergence of high dynamic range (HDR) displays, such as OLED and dual-modulation LCD-LED, has enabled more accurate reproduction of a wider range of light intensities. To fully leverage the capabilities of these HDR displays, it is essential to capture images that faithfully represent the range of light intensities in the real world. In this dissertation, I propose improvements to the HDR capture and reconstruction pipeline by developing noise-aware statistical estimation methods that use physically accurate parametric models to approximate the noise distribution of pixels. These methods address inaccuracies in existing merging algorithms and do not rely on camera-specific parameters that can be difficult to calibrate accurately. Accurate HDR reconstruction also requires accounting for small inaccuracies in the reported camera metadata caused by changes in lighting or rounding by the camera firmware. The proposed HDR reconstruction pipeline has the potential to enhance existing HDR datasets. To demonstrate its effectiveness, I evaluate the improved pipeline using two datasets designed for different applications. The reconstruction pipeline also plays a crucial role in an ultra-realistic capture-display-render system built to faithfully reproduce our 3D world. The psychophysical experiments conducted on this prototype display, which showed that the reconstructed 3D scenes are indistinguishable from their real counterparts, serve as the most comprehensive validation of the HDR pipeline, as they focus on reconstructing real-world scenes in a physically accurate manner. Additionally, I investigate the effectiveness of learning-based methods on the challenging task of predicting HDR information from a single 8-bit photograph. This problem requires the hallucination of underexposed and overexposed pixels, making it more difficult than merging an exposure stack. Through a controlled subjective experiment, I found that state-of-the-art deep convolutional networks are only partially successful at reconstructing accurate HDR pixels. I highlight the limitations of existing evaluation protocols, which explain the poor correlations between widely-used HDR image quality metrics and subjective data. Building upon these findings, I propose a simple correction that significantly improves the predictions of quality metrics for this task. One observation from the study is that the single-image HDR problem is treated as a one-to-one mapping by most existing works. In response to this, I present an alternative approach that utilises a generative network to model the one-to-many mapping for global operations. This approach offers several advantages, including the ability to generate multiple plausible HDR images from a single input photograph, thereby enhancing the accuracy of HDR reconstruction. By tackling the challenges associated with capturing and reconstructing HDR information from still scenes, as well as predicting HDR information from single images, this research has the potential to make significant advancements in the field of HDR imaging. Through these advancements, we can take a significant step forward in bridging the gap between the dynamic ranges observed in the real world and those present in our digital content.
  • ItemOpen Access
    Formalising Combinatorial Structures and Proof Techniques in Isabelle/HOL
    Edmonds, Chelsea; Edmonds, Chelsea [0000-0002-8559-9133]
    The formalisation of mathematics is an area of increasing interest, enabling us to verify correctness, gain deeper insight into proofs, and benefit from advances in automation and search. The remarkable growth over the last decade in proof assistant capabilities and advanced formal mathematical libraries has seen the field reach an exciting new stage. Formalisation has the potential to change how mathematical research is done, while from a computer science perspective, formalised mathematics provides a basis for the formal verification of many applications. Until recently combinatorics was a comparatively undeveloped field in formal mathematical libraries, despite its many critical computing applications. This thesis presents the first formalisations of numerous combinatorial structures and generalised formal proof techniques for combinatorics using the proof assistant Isabelle/HOL. These, in turn, explore the interplay between different mathematical fields in a formal environment. The thesis begins with an outline of the locale-centric approach for formalising complex hierarchies, inspired by Ballarin and further explored through this work. The approach and proof techniques are demonstrated through the formalisation of complex overlapping hierarchies of combinatorial structures, including aspects of graph theory, hypergraph theory, and combinatorial design theory. There is a particular focus on design theory, which had not previously been formalised in any system. The remaining chapters focus on the formalisation of three categories of proof techniques: traditional combinatorial techniques such as counting and generating functions, algebraic techniques, and probabilistic techniques. In each chapter, the thesis details the general formal techniques and frameworks developed, as well as their successful application to formalising numerous theorems on the construction and existence of combinatorial structures. Additionally, the thesis highlights my individual contributions to collaborations which resulted in the formalisation of several significant combinatorial theorems, namely Szemerédi's regularity lemma, Roth's theorem on arithmetic progressions, and the Balog-Szemerédi-Gowers theorem. Through this work, I aim to present a new approach to mathematical formalisation which focuses on developing general, modular formal proof techniques and modelling approaches, rather than purely verifying theorems. Ultimately, this research intends to improve the accessibility of formalising mathematics in Isabelle/HOL through significant modular, flexible, and extensible libraries and frameworks.
  • ItemOpen Access
    Near-Memory Processing for Low-precision Deep Neural Networks
    Miralaei, Aida
    Deep neural networks (DNNs) provide many application domains with state-of-the-art performance and accuracy. However, they are compute-heavy and data-intensive which makes deploying them on resource-constrained edge devices challenging. Transforming the real-valued parameters of a DNN into a minimised-bit-width approximated version through quantising or binarising lowers their accuracy to an extent but significantly reduces their computation complexity and storage space. Moreover, it makes them good candidates to be considered for near-memory processing due to these criteria. On the other hand, when it comes to designing a hardware module (either for near-memory processing or not), the more specialised the hardware the better the performance. The downside with over-specialised modules then becomes their inability to adapt, which can be problematic in a fast-evolving field such as deep learning. To this end, designing a module with reasonable performance, area overhead and energy consumption while maintaining a good balance between the physical limitations of the design and its flexibility is a challenge. The contribution of this thesis is to introduce a processing-near-memory module (PNM) for low-precision convolution neural networks. The placement of this module is near the main memory (DDR4 DRAM) and the memory controller, with a data layout that does not require rearrangement either before or after the convolution operations. Here, two distinct design modes for the PNM module are presented: mode S, which focuses on minimizing area overhead, and mode T, aimed at optimizing data transfers between the module and DRAM. The performance, area, and energy costs of these designs were thoroughly assessed through both analytical and practical analysis. These evaluations highlighted the impact of varying filters, hardware replicas, and bit-widths for each model on the stated criteria, leading to recommended design choices tailored to specific use cases' demands and constraints. An evaluation using a BCNN based on AlexNet indicates that for mode S, a configuration of one hardware replica with 16 filters per replica offers an optimal balance between area, runtime, and energy. In contrast, for mode T, the best configuration comprises 16 replicas and 32 filters. Comparatively, mode T surpasses mode S by a factor of 6.39 in performance, while mode S achieves greater savings in area and energy, by factors of 3.56 and 15.99, respectively.
  • ItemOpen Access
    Reliable and decentralised deep learning for physiological data
    Xia, Tong
    Physiological data encompass measurements from various bodily functions and processes. By employing machine learning to model these data, especially with the advancement of mobile sensing technologies, it becomes feasible to automatically and continually monitor and diagnose one's health status. This holds considerable promise for easing the burden on clinical resources and ensuring timely treatment for the wider population. Nonetheless, significant challenges related to the data and the modelling methods are yet to be resolved, obstructing the deployment of machine learning, especially deep learning, in real-world healthcare contexts. One challenge is that labelled physiological data for model development are usually insufficient and imbalanced, leading to models occasionally exhibiting bias and overconfidence in their predictions. This can result in unreliable diagnoses which yield expensive clinical costs. Moreover, deep learning research generally requires massive data on a centralised server, while privacy concerns hinder the aggregation of physiological data from individuals or hospitals. In order to tackle these challenges and pave the way for reliable deep learning-driven health diagnostics, this thesis proposes several novel solutions and makes the following contributions: Chapter 4 introduces an ensemble learning approach designed to handle data imbalance and model overconfidence for binary health screening. This method utilises balanced training sets derived from imbalanced physiological data, training multiple ensemble models. The predictions from these models are fused to reduce bias and calibrate confidence from a signal model, with model uncertainty measured by the inconsistency among multiple models. This approach effectively mitigates model overconfidence, thereby facilitating reliable automated diagnoses. In Chapter 5, an efficient uncertainty quantification approach is presented to improve the reliability of multi-class mobile health diagnostics. This approach incorporates the cutting-edge technique of evidential deep learning and introduces two novel mechanisms specifically designed to handle class imbalance. The quantified uncertainty enables accurate and efficient detection of misdiagnoses and out-of-training distributed inputs. Chapter 6 introduces a cross-device federated learning method to address privacy concerns arising from gathering physiological data for model development. This method allows physiological data to remain on personal mobile devices, with only locally trained models aggregated into a global health diagnostic model. To mitigate bias caused by data imbalance, a novel loss-weighted model aggregation method is proposed to enhance the performance of the global model. Chapter 7 illustrates a cross-silo federated learning method that enables multiple data holders such as hospitals to collaboratively train a model without exchanging raw data. The distributional heterogeneity of these physiological data silos poses a challenge to federated learning. To address this, a novel method based on feature sharing and augmentation is proposed to balance privacy protection and model performance. All proposed methods have been validated using real-world physiological datasets and commonly used machine learning benchmark data. Specific attention is given to clinical tasks, including the modelling of respiratory audio for respiratory health screening, ECG signals for predicting cardiovascular diseases, and dermoscopic images for detecting skin cancer. Extensive experiments demonstrate that these methods effectively address challenges posed by limited, imbalanced, and decentralised physiological data, thereby enabling reliable health diagnoses. These contributions have significant potential to advance the deployment of deep learning in real-world healthcare scenarios.
  • ItemOpen Access
    Towards Maintainable and Explainable AI Systems with Dataflow
    Paleyes, Andrei
    Machine learning is enjoying rapid growth both as a thriving academic discipline and as a technology that has the potential to transform many aspects of our everyday lives. We have already witnessed breakthroughs in speech generation, drug discovery, recommendation algorithms, and more, all achieved with the help of machine learning. It is vital to realise that any practical application of machine learning is not limited to just creating an accurate model based on a sanitised dataset. Such real-life applications are complex software systems, in which the model is only one, albeit important, component. A significant effort is also spent on creating data collection and cleaning pipelines, quality assurance, model updating workflows, monitoring and operational maintenance of these systems. The experience of numerous practitioners shows that the translation of a well-performing machine learning model to a well-performing machine learning system is not easy. This thesis embarks on a quest to understand the pain points of this translation process and explore software architecture paradigms well suited for the needs of modern data-driven systems. We begin by surveying existing reports on ML deployment and the difficulties they describe. The identified issues and concerns are matched against a typical ML deployment workflow, and we show that there is no single bottleneck, and the entire deployment pipeline is riddled with challenges. We argue that a lot of these challenges are caused by existing software infrastructure and that more data-oriented approaches to software architecture are needed to tackle them. This observation leads us to the second contribution of this thesis, in which we examine data-oriented architecture (DOA) as a promising software architecture paradigm that machine learning systems can benefit from. We focus on measuring the level of adoption of DOA in practical deployments of machine learning and show that even though the paradigm itself is relatively unknown, its principles widely permeate the modern engineering of ML systems. Specifically, we identify dataflow architecture as one of the patterns that realise all DOA principles. We proceed to evaluate the benefits of the dataflow for the deployment of machine learning. The evaluation is presented in two parts. In the first part, we compare the process of deploying an ML model within the functionally equivalent codebases of applications implemented with dataflow and service-oriented approaches, the latter being used as a baseline. We identify some benefits of dataflow, such as higher discoverability and simpler data collection in the system. We also identify the limitations of the paradigm. We then present Seldon Core v2, an open-source model inference platform we designed following the dataflow architecture. We present a detailed discussion on how DOA principles can be implemented in practice, discuss the data observability features of the platform, and quantify the performance trade-offs involved. The last contribution of the thesis points out another benefit of dataflow architecture for software development: a strong relationship between dataflow software and graphical causal models. We identify a connection between dataflow graphs and causal graphs and argue that this relationship allows a straightforward application of causal inference to dataflow software. We use fault localisation as a concrete example of this idea and showcase it in a variety of dataflow systems and scenarios. The thesis closes with a discussion on research avenues that can further develop the community's understanding and adoption of Data-Oriented Architectures and dataflow for machine learning systems.
  • ItemOpen Access
    Non-parametric modelling of signals on graphs
    Opolka, Felix
    Graphs are simple yet powerful data structures that describe entities and their relationships between each other using nodes and edges, making them popular candidates for modelling a wide variety of real-world objects, ranging from molecules to social or biological networks. As a result of their suitability for various modelling scenarios, machine learning on graph-shaped data has emerged as an important field of research in the last few years. While powerful when coupled with machine learning models, graphs pose unique challenges to those models, which need to be able to adapt to not only highly diverse data but also a highly diverse graph domain that may vary in size, connectivity patterns, and its interaction with node features, to name a few. In this work, I hypothesise that Gaussian processes, a class of Bayesian non-parametric models, are particularly well suited for modelling data on graph domains. To provide evidence for this hypothesis, I demonstrate the merits of Bayesian non-parametric modelling for graph data by deriving Gaussian process models for three of the most important tasks in graph machine learning: link prediction, graph-level prediction, and node-level prediction. The resulting models exhibit a number of strengths, including good model fit and robustness against overfitting due to their non-parametric nature, in addition to well calibrated uncertainty estimates. Moreover, the capability of Gaussian processes to optimise hyper-parameters allows designing models that adapt to a graph's particular characteristics, such as the smoothness and multi-scale structure of a graph signal or the locality of features. These strengths of the proposed models and in particular their competitive performance compared to a range of baseline models are confirmed in extensive experiments on a wide range of real-world data sets.
  • ItemOpen Access
    On the evaluation and application of neural language models for grammatical error detection
    Davis, Christopher; Davis, Christopher [0000-0003-4517-5851]
    Neural language models (NLM) have become a core component in many downstream applications within the field of natural language processing, including the task of data-driven automatic grammatical error detection (GED). This thesis explores whether information from NLMs can positively transfer to GED within the domain of learning English as a second language (ESL), and looks at whether NLMs encode and make use of linguistic signals that would facilitate robust and generalisable GED performance. First, I investigate whether information from different types of neural language model can be transferred to models for GED. I evaluate five models against three publicly available ESL benchmarks, and report results showing positive transfer effects to the extent that fine-grained error detection using a single model is becoming viable. Second, I carry out a causal investigation to understand whether NLM-GED models make use of robust linguistic signals during inference – in theory, this would enable them to generalise across different data distributions. The results show a high degree of linear encoding of noun-number within each model’s token-level contextual representations, but they also show markedly varying error detection performance across model types and across in- and out-of-domain datasets. Altogether, the results indicate models employ different strategies for error detection. Third, I re-frame the typically downstream GED task as an evaluation framework to test whether the pre-trained NLMs implicitly encode information about grammatical errors as an artefact of their language modelling objective. I present results illustrating stark differences between masked language models and autoregressive language models – while the former seemingly encodes much more information related to the detection of grammatical errors, the results also present evidence of a brittle encoding across different syntactic constructions. Altogether, this thesis presents a holistic analysis of NLMs – how they might be applied to GED, whether they utilise linguistic information to enable robust inference, and whether their pre-training objective implicitly imbues them with knowledge about grammaticality.
  • ItemOpen Access
    Deception and defense from machine learning to supply chains
    Boucher, Nicholas
    Broad classes of modern cyberattacks are dependent upon their ability to deceive human victims. Given the ubiquity of text across modern computational systems, we present and analyze a set of techniques that attack the encoding of text to produce deceptive inputs to critical systems. By targeting a core building block of modern systems, we can adversarially manipulate dependent applications ranging from natural language processing pipelines to search engines to code compilers. Left undefended, these vulnerabilities enable many ill effects including uncurtailed online hate speech, disinformation campaigns, and software supply chain attacks. We begin by generating adversarial examples for text-based machine learning systems. Due to the discrete nature of text, adversarial examples for text pipelines have traditionally involved conspicuous perturbations compared to the subtle changes of the more continuous visual and auditory domains. Instead, we propose imperceptible perturbations: techniques that manipulate text encodings without affecting the text in its rendered form. We use these techniques to craft the first set of adversarial examples for text-based machine learning systems that are human-indistinguishable from their unperturbed form, and demonstrate their efficacy against systems ranging from machine translation to toxic content detection. We also describe a set of defenses against these techniques. Next, we propose a new attack setting which we call adversarial search. In this setting, an adversary seeks to manipulate the results of search engines to surface certain results only and consistently when a hidden trigger is detected. We accomplish this by applying the encoding techniques of imperceptible perturbations to both indexed content and queries in major search engines. We demonstrate that imperceptibly encoded triggers can be used to manipulate the results of current commercial search engines, and then describe a social engineering attack exploiting this vulnerability that can be used to power disinformation campaigns. Again, we describe a set of defenses against these techniques. We then look to compilers and propose a different set of text perturbations which can be used to craft deceptive source code. We exploit the bidirectional nature of modern text standards to embed directionality control characters into comments and string literals. These control characters allow attackers to shuffle the sequence of tokens rendered in source code, and in doing so to implement programs that appear to do one thing when rendered to human code reviewers, but to do something different from the perspective of the compiler. We dub this technique the Trojan Source attack, and demonstrate the vulnerability of C, C++, C#, JavaScript, Java, Rust, Go, Python, SQL, Bash, Assembly, and Solidity. We also explore the applicability of this attack technique to launching supply chain attacks, and propose defenses that can be used to mitigate this risk. We also describe and analyze a 99-day coordinated disclosure that yielded patches to dozens of market-leading compilers, code editors, and code repositories. Finally, we propose a novel method of identifying software supply chain attacks that works not only for Trojan Source attacks, but for most forms of supply chain attacks. We describe an extension to compilers dubbed the Automated Bill of Materials, or ABOM, which embeds dependency metadata into compiled binaries. Specifically, hashes of each source code file consumed by a compiler are embedded into its emitted binary, and these hashes are included recursively into all downstream dependencies. They are stored in a highly space and time efficient probabilistic data structure that requires an expected value of just 2.1 bytes to represent each unique dependency source code file. With ABOMs, it becomes possible to detect all naturally occurring and most adversarially induced vulnerabilities used for supply chain attacks in downstream software by querying binaries for the presence of poisoned dependencies without the need to locate tangible indicators of compromise. In this thesis, we therefore demonstrate how weaknesses in a core building block of modern systems – text encodings – can cause failures in a wide range of domains including machine learning, search engines, and source code. We propose defenses against each variant of our attack, including a new tool to identify most generic software supply chain attacks. We believe that these techniques will be useful in securing software ecosystems against the next generation of attacks.
  • ItemOpen Access
    On the Optimality of the Lexicon
    Pimentel Martins Da Silva, Tiago
    The principle of least effort posits that a pressure towards communicative efficiency shapes natural languages. In this thesis, we investigate the existence, the nature, and the impact of such a pressure in natural languages' lexicons. We investigate the existence of this pressure by (i) estimating what optimal word lengths would be using coding theory, (ii) proposing pressure-free baselines and estimating their word lengths, and then (iii) comparing natural lexicons to both these optimal and pressure-free artificial lexicons. We investigate the nature of this pressure by comparing multiple ways in which communicative efficiency can be operationalised; we formalise it as either a pressure to shorten utterances, or a pressure to keep information rates as close as possible to an unknown communication channel capacity. Finally, we study the impact of this pressure on cross-linguistic differences in word lengths and on the ratio of homophones in natural languages. Overall, our results support a Zipfian view of communicative efficiency, in which lexicons are pressured towards having utterances that are as short as possible. Our results, however, also highlight the existence of competing constraints and pressures in how lexicons are structured: (i) a language's phonotactic complexity seems to bottleneck the extent to which economy of expression can optimise a lexicon, and (ii) a pressure for clarity seems to keep the ratio of homophones in a language close to chance.
  • ItemOpen Access
    Self-supervised learning for data-efficient human activity recognition
    Tang, Chi Ian
    Over the last decade, smart mobile devices have become ubiquitous, bringing about significant lifestyle changes worldwide. Mobile sensing, which involves obtaining and analysing data from mobile devices and the environment, has emerged as an active research area. It captures the unique opportunity for mobile devices to offer insight into user behaviours. Within mobile sensing, human activity recognition is a fundamental task that aims to identify users' physical actions. Motivated by advancements in deep learning, human activity recognition research has also widely adopted these methods. However, compared to other data modalities, human activity recognition models struggle with the limited availability of labels, due to the difficulty of ground-truth collection. These models often fail to generalise across different users, devices, and changing data distributions. This thesis tackles these challenges by developing and evaluating novel training paradigms. Our proposed paradigms leverage data from additional sources, including other devices and readily available unlabelled data that can be collected easily and often passively, to provide supervision for deep learning, enabling human activity recognition models to be more data-efficient. First, we proposed a new semi-supervised training pipeline that combines self-supervised learning and knowledge distillation to effectively leverage large-scale unlabelled datasets for human activity recognition. This helps models generalise better across different users by increasing the diversity of data that the model is trained on through augmentation and unlabelled data. Next, we designed a collaborative self-supervised learning technique that leverages unlabelled data from multiple devices carried by a user. This method is inspired by the insight that data from multiple devices capture the same physical activity from different viewpoints. The contrastive learning setup, which makes representations for samples from different devices to be similar, is used to extract high-quality features from the data. Finally, we developed continual learning methods motivated by observations that user behaviour often shifts over time due to lifestyle changes. These methods help models better adapt to changing data distributions and learn from new data. We first proposed a multi-task training method that allows models to have better flexibility in adapting to new tasks. Then, we developed a continual learning strategy that balances retaining prior knowledge and learning from new data. This strategy uses self-supervised learning for knowledge retention and a carefully designed loss function to balance different learning objectives. Through extensive evaluation on open datasets, the training paradigms proposed in this thesis provide evidence for and contribute to the development of data-efficient human activity recognition systems by leveraging readily-available data through self-supervised learning.
  • ItemOpen Access
    Formally verifying the security properties of a proof-of-stake blockchain protocol
    Worrasangasilpa, Kawin
    In 2009, Bitcoin was brought into existence, as the first real-world application of blockchain protocols, by its pseudonymous and mysterious creator, Satoshi Nakamoto. It was presented as a form of cryptocurrency, has become widely known and been recognised as the first successful E-cash since the introduction of the E-cash idea in 1983. Many more crytocurrencies including Ethereum and Tether have emerged following the success of Bitcoin. Bitcoin is secure, meaning that it satisfies persistence and liveness. Without the need of a trusted third party, it appears to prevent double-spending attacks. Bitcoin's security is obtained by the use of cryptographically secure chains of blocks for time-stamping (hence the name 'blockchain') and a technique, often called Nakamoto consensus, combined from the longest-chain rule and Proof-of-Work (PoW). Briefly, it allows and encourages all parties to participate in picking the longest chain in the system and solving a cryptographically difficult puzzle to declare the next block of transactions for that chain. PoW is sometimes referred to as a lottery system. PoW requires a majority of the computational power to be honest and it consumes a gigantic amount of energy, so it is not scalable. In the light of this problem, Proof-of-Stake (PoS) was suggested to replace PoW. PoS-based blockchain protocols, instead of using computational power, use in-system currency to agree on a new block to be added, but keep mostly everything else the same. Kiayias et al. was the first to propose a provably secure PoS-based blockchain protocol: Ouroboros. Ouroboros' security guarantees, persistence and liveness, can be verified through proving that Ouroboros satisfies three elementary properties for blockchains as proposed by Gary et al.: Common Prefix, Chain Growth, and Chain Quality. In this project, we attempt to formalise, in Isabelle/HOL, the combinatorial analysis used to prove that Ouroboros satisfies common prefix with near certainty. We cover the case of a static stake protocol under a few assumptions: the network is synchronous, the majority of the stake is honest, and the stake transfer between executions does not effect the lottery system.
  • ItemOpen Access
    Evaluating Natural Language Generation Tasks for Grammaticality, Faithfulness and Diversity
    Xie, Huiyuan
    Natural language generation (NLG) plays a vital role in many applications. Evaluating the quality of generated text is crucial for ensuring the effectiveness and user satisfaction of NLG systems. With the popularisation of deep learning in recent years, many models have been reported to achieve super-human performance on popular benchmarks. However, it has been observed that existing holistic benchmarks and evaluation metrics frequently fail to accurately assess specific evaluation factors that are of interest to the field. This thesis explores a diagnostic evaluation framework for assessing the grammaticality, faithfulness, and diversity (GFD) of generated text in NLG tasks. These three metrics are considered as essential linguistic qualities, which need to be present in the outputs of NLG models. Grammaticality is examined by analysing the parsability of a sentence with a well-defined formal grammar. Faithfulness is divided into two facets: grounding faithfulness and task faithfulness. These two facets investigate how well the model outputs align with both the information provided in the input, and the inherent requirements of the task. Diversity is further divided into word-level and parse-level diversity measures. In the proposed GFD framework, the evaluation of the three metrics does not require task-specific references to be constructed. By clearly defining and evaluating these generation qualities, this framework aims to provide insights into the strengths and limitations of NLG models. To demonstrate the versatility of the GFD evaluation framework, three different generation tasks are explored: synthetic image captioning, football highlight generation from match statistics, and topic-shift dialogue generation. These tasks are deliberately chosen to cover a diverse range of generation scenarios. Each task provides unique grounding information and constraints that influence the generation process, which in turn create diverse challenges for the evaluation of NLG models. Experiments on these tasks reveal the challenges in fine-grained NLG evaluation when the availability of ground truth representations diminishes or when there is a delicate balance between input groundings and task constraints. This thesis empirically demonstrates how the GFD evaluation framework, in combination with diagnostic datasets, can provide insights into model strengths and limitations to supplement standard evaluations.
  • ItemOpen Access
    Distributional and relational inductive biases for graph representation learning in biomedicine
    Scherer, Paul; Scherer, Paul [0000-0002-2240-7501]
    The immense complexity in which DNAs, RNAs, proteins and other biomolecules interact amongst themselves, with one another, and the environment to bring about life processes motivates the mass collection of biomolecular data and data-driven modelling to gain insights into physiological phenomena. Recent predictive modelling efforts have focused on deep representation learning methods which offer a flexible modelling paradigm to handling high dimensional data at scale and incorporating inductive biases. The emerging field of representation learning on graph structured data opens opportunities to leverage the abundance of structured biomedical knowledge and data to improve model performance. Grand international initiatives have been coordinated to organise and structure our growing knowledge about the interactions and putative functions of biomolecular entities using graphs and networks. This dissertation considers how we may use the inductive biases within recent graph representation learning methods to leverage these structures and incorporate biologically relevant relational priors into machine learning methods for biomedicine. We present contributions in two parts with the aim to foster research in this multidisciplinary domain and present novel methods that achieve strong performance through the use of distributional and relational inductive biases operating on graph-structured biomedical knowledge and data. The first part is concerned with consolidating and expanding the current ecosystem of practical frameworks dedicated to graph representation learning. Our first contribution presents Geo2DR, the first practical framework and software library for constructing methods capable of learning distributed representations of graphs. Our second contribution, Pytorch Geometric Temporal, is the first open source representation learning library for dynamic graphs, expanding the scope of research software on graph neural networks that were previously limited to static graphs. The second part presents three methods wherein each contribution tackles an active biomedical research problem using relational structures that exist within different aspects of the data. First we present a methodology for learning distributed representations of molecular graphs in the context of drug pair scoring. Next, we present a method for leveraging structured knowledge on the variables of gene expression profiles to automatically construct sparse neural models for cancer subtyping. Finally, we present a state-of-the-art cell deconvolution model for spatial transcriptomics data using the positional relationships between observations in the dataset.