Enhancing Interpretability: The Role of Concept-based Explanations Across Data Types
Repository URI
Repository DOI
Change log
Authors
Abstract
Deep Neural Networks (DNNs) have achieved remarkable performance on a range of tasks. Unfortunately, they have been shown to be black boxes whose behaviour cannot be understood directly. Thus, a key step to further empowering DNN-based approaches is improving their explainability, which is addressed by the field of Explainable AI (XAI). Recently, Concept-based Explanations (CbEs) have emerged as a powerful new XAI paradigm, providing model explanations in terms of human-understandable units, rather than individual features, pixels, or characters. Despite their numerous applications, the existing literature lacks precise answers to what concepts are, where they can be applied, and how they should be applied. In this thesis, we give more precise answers to these questions.
Firstly, we unify and extend existing definitions of CbEs in Computer Vision using our Concept Decomposition (ConD) framework, and use it to formulate our novel Concept-based Model Extraction (CME) framework. CME provides a way of extracting Concept-based Models from pre-trained vanilla Convolutional Neural Networks in a semi-supervised fashion, requiring far fewer concept annotations. Following our work on CME, we explore how similar approaches transfer to the temporal data modality. In particular, we introduce our Model Explanations via Model Extraction (MEME) framework, a first-of-its-kind temporal CbE framework capable of extracting Concept-based Models from Recurrent Neural Networks (RNNs). Using MEME, we demonstrate how the benefits of CbEs can be harnessed in the temporal domain. Equipped with a better understanding of what CbEs are, we proceed to explore their application to graph data. In particular, we automate the verification and inspection of graph concepts by leveraging the inherent structure of the underlying graph data. To achieve this, we introduce the Graph Concept Interpretation (GCI) framework, which can be used to extract and quantitatively evaluate concepts from Graph Neural Networks (GNNs).
Overall, this thesis introduces a range of approaches enabling: (i) novel concept definitions, (ii) novel concept applications, and (iii) novel concept evaluations. Consequently, these approaches advance the field of concept-based XAI, bringing us closer to making DNNs more transparent, trustworthy, and interpretable.
Description
Date
Advisors
Lio, Pietro
Keywords
Qualification
Awarding Institution
Sponsorship
Engineering and Physical Sciences Research Council (2281344)