Towards Maintainable and Explainable AI Systems with Dataflow

Paleyes, Andrei

doi:https://doi.org/10.17863/CAM.108344

Repository URI

https://www.repository.cam.ac.uk/handle/1810/367937

Repository DOI

https://doi.org/10.17863/CAM.108344

Files

Primary Thesis (2.73 MB)

Type

Thesis

Authors

Paleyes, Andrei

Abstract

Machine learning (ML) is enjoying rapid growth both as a thriving academic discipline and as a technology that has the potential to transform many aspects of our everyday lives. We have already witnessed breakthroughs in speech generation, drug discovery, recommendation algorithms, and more, all achieved with the help of machine learning. It is vital to realise that a practical application of machine learning involves more than creating an accurate model from a sanitised dataset. Such real-life applications are complex software systems, in which the model is only one, albeit important, component. Significant effort is also spent on creating data collection and cleaning pipelines, quality assurance, model updating workflows, monitoring and operational maintenance of these systems. The experience of numerous practitioners shows that translating a well-performing machine learning model into a well-performing machine learning system is not easy. This thesis sets out to understand the pain points of this translation process and to explore software architecture paradigms well suited to the needs of modern data-driven systems.

We begin by surveying existing reports on ML deployment and the difficulties they describe. The identified issues and concerns are matched against a typical ML deployment workflow, and we show that there is no single bottleneck: the entire deployment pipeline is riddled with challenges. We argue that many of these challenges are caused by existing software infrastructure and that more data-oriented approaches to software architecture are needed to tackle them. This observation leads us to the second contribution of this thesis, in which we examine data-oriented architecture (DOA) as a promising software architecture paradigm that machine learning systems can benefit from. We focus on measuring the level of adoption of DOA in practical deployments of machine learning and show that even though the paradigm itself is relatively unknown, its principles widely permeate the modern engineering of ML systems. Specifically, we identify dataflow architecture as one of the patterns that realise all DOA principles.

We proceed to evaluate the benefits of dataflow for the deployment of machine learning. The evaluation is presented in two parts. In the first part, we compare the process of deploying an ML model within functionally equivalent codebases of applications implemented with dataflow and service-oriented approaches, the latter being used as a baseline. We identify some benefits of dataflow, such as higher discoverability and simpler data collection in the system. We also identify the limitations of the paradigm. In the second part, we present Seldon Core v2, an open-source model inference platform we designed following the dataflow architecture. We discuss in detail how DOA principles can be implemented in practice, describe the data observability features of the platform, and quantify the performance trade-offs involved.
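
To illustrate why data collection can be simpler in a dataflow design, the following is a minimal sketch, not code from the thesis or from Seldon Core v2. It assumes a toy system in which every component reads from one named stream and writes to another, so that all intermediate data can be recorded in a single place. The class `DataflowGraph`, the stream names, and the three toy components are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class DataflowGraph:
    """Toy dataflow graph: nodes are functions, edges are named streams."""

    def __init__(self) -> None:
        # Each node: (name, function, input stream, output stream).
        self.nodes: List[Tuple[str, Callable[[float], float], str, str]] = []
        # Every value that travels over any stream is recorded here.
        self.log: Dict[str, List[float]] = defaultdict(list)

    def add_node(self, name: str, fn: Callable[[float], float],
                 in_stream: str, out_stream: str) -> None:
        self.nodes.append((name, fn, in_stream, out_stream))

    def run(self, source_stream: str, values: List[float]) -> Dict[str, List[float]]:
        streams: Dict[str, List[float]] = defaultdict(list)
        streams[source_stream] = list(values)
        self.log[source_stream].extend(values)
        # Assumption: nodes were added in topological order.
        for _name, fn, in_stream, out_stream in self.nodes:
            outputs = [fn(v) for v in streams[in_stream]]
            streams[out_stream].extend(outputs)
            self.log[out_stream].extend(outputs)
        return streams


# Hypothetical three-stage pipeline: cleaning, feature scaling, prediction.
graph = DataflowGraph()
graph.add_node("clean", lambda x: max(x, 0.0), "raw", "cleaned")
graph.add_node("scale", lambda x: x / 10.0, "cleaned", "features")
graph.add_node("predict", lambda x: 1.0 if x > 0.5 else 0.0, "features", "predictions")
graph.run("raw", [-3.0, 4.0, 8.0])

for stream, recorded in graph.log.items():
    print(stream, recorded)
```

In this sketch the components never call each other directly, so the record of every intermediate value comes from the graph itself rather than from instrumentation added to each component. In a service-oriented design the equivalent data would typically have to be gathered from each service separately.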

The last contribution of the thesis points out another benefit of dataflow architecture for software development: a strong relationship between dataflow software and graphical causal models. We identify a connection between dataflow graphs and causal graphs and argue that this relationship allows a straightforward application of causal inference to dataflow software. We use fault localisation as a concrete example of this idea and showcase it in a variety of dataflow systems and scenarios.
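
The idea of reading a dataflow graph as a causal graph can be illustrated with a deliberately simple sketch. It is not the causal inference procedure used in the thesis. It assumes that the dataflow graph is acyclic, that each node has already been flagged as anomalous or not by some monitoring step, and that a fault propagates only downstream. Under those assumptions, a plausible root cause is an anomalous node none of whose parents are anomalous. The function name and the example graph are hypothetical.

```python
from typing import Dict, List, Set


def root_cause_candidates(parents: Dict[str, List[str]],
                          anomalous: Set[str]) -> Set[str]:
    """Return anomalous nodes that have no anomalous parent.

    `parents` maps each node of the dataflow graph to the nodes that
    feed data into it. This is a heuristic for illustration only.
    """
    return {
        node
        for node in anomalous
        if not any(parent in anomalous for parent in parents.get(node, []))
    }


# Hypothetical linear pipeline: ingest -> clean -> features -> model -> report.
parents = {
    "ingest": [],
    "clean": ["ingest"],
    "features": ["clean"],
    "model": ["features"],
    "report": ["model"],
}
# Assumed monitoring output: everything from "features" onwards looks wrong.
anomalous = {"features", "model", "report"}

candidates = root_cause_candidates(parents, anomalous)
print(candidates)
```

Because the edges of the dataflow graph already state which component's output influences which other component, no separate causal structure has to be elicited before reasoning of this kind can start.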

The thesis closes with a discussion of research avenues that can further develop the community's understanding and adoption of data-oriented architectures and dataflow for machine learning systems.

Date

2023-10-27

Advisors

Lawrence, Neil

Keywords

causal inference, dataflow architecture, data-oriented architecture, flow-based programming, machine learning, service-oriented architecture

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

Alan Turing Institute (Unknown)

Collections

Theses - Computer Science and Technology
