Deep concept reasoning: beyond the accuracy-interpretability trade-off
Deep learning researchers stockpile ground-breaking achievements almost as fast as they find flaws in their models. Although deep learning models can achieve superhuman performance, explaining their decisions and mistakes is often impossible, even for "explainable AI" specialists, leading lawmakers to question the ethical and legal ramifications of deploying deep learning systems. For this reason, the key open problem in the field is to increase the transparency and trustworthiness of deep neural networks so that these technologies can be deployed safely.
The lack of human trust in deep learning stems from three key factors. First, the absence of a formal and comprehensive theory undermines the field of explainable AI. This leads to ill-posed questions, encourages the rediscovery of similar ideas, and makes it harder for researchers to approach the domain. Second, the explainable AI literature is dominated by methods that provide post-hoc, qualitative, and local explanations, which are often inaccurate and misleading. Finally, machine learning systems, including deep neural networks, struggle to strike a balance between task accuracy and interpretability: existing solutions either sacrifice model transparency for task accuracy or vice versa, making it difficult to optimize both objectives simultaneously.
This thesis includes four research works that address these challenges. The first work tackles the lack of a formal theory of explainable AI. It proposes the first-ever theory of explainable AI and concept learning, formalizing some of the fundamental ideas used in this field. The key innovation of this work is the use of categorical structures to formalize explainable AI notions and processes. Category theory is particularly well suited to this purpose because it provides a sound and abstract formalism for examining general structures and systems of structures, setting aside contingent details to focus on their fundamental essence. This theoretical foundation serves as a solid basis for the other chapters of the thesis. The second work aims to overcome the limitations of current explainable AI techniques that provide post-hoc, qualitative, and local explanations. To this end, it proposes Logic Explained Networks, a novel class of concept-based models that solve and explain classification problems simultaneously. The key innovation of Logic Explained Networks is a sparse attention layer that selects the most relevant concepts in neural concept-based models, which allows the model to learn simple logic explanations. The third work tackles the accuracy-interpretability trade-off, a major limitation of concept-based models. To address this issue, it proposes Concept Embedding Models. The key innovation of Concept Embedding Models is a fully supervised, high-dimensional concept representation. This representation allows Concept Embedding Models to overcome the information bottleneck of conventional concept-based models, achieving state-of-the-art accuracy without sacrificing model transparency. The fourth work addresses a limitation of Concept Embedding Models: they cannot provide concept-based logic explanations for their predictions.
To fill this gap, this work presents the Deep Concept Reasoner, the first interpretable concept-based model that uses concept embeddings. The key innovation of the Deep Concept Reasoner is the use of neural networks to generate interpretable rules, which are then executed symbolically to make task predictions. This enables the Deep Concept Reasoner to attain state-of-the-art performance on complex tasks while providing human-understandable and formal explanations for its predictions.
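The idea of generating a rule with neural networks and then executing it symbolically can be illustrated with a small Python sketch. This is a minimal, untrained toy under stated assumptions, not the thesis's implementation. It mixes two stand-in vectors per concept into one embedding in the style of Concept Embedding Models (real models produce these vectors from the input with learned networks). It then uses two hypothetical linear scorers with random weights to map each concept embedding to a polarity and a relevance degree. Finally, it executes the resulting conjunctive rule on the concept truth degrees using min/max fuzzy logic with negation 1 - x. The min/max semantics, the single-task setup, the concept labels, and all names (`ToyConceptReasoner`, `concept_embedding`, `rule_to_text`) are illustrative assumptions.

```python
import math
import random

# Hypothetical setup for illustration: three named concepts, 4-dimensional embeddings.
CONCEPTS = ["red", "round", "has_stem"]
EMB_DIM = 4


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights: list[float], bias: float, x: list[float]) -> float:
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def concept_embedding(p_true: float, emb_pos: list[float], emb_neg: list[float]) -> list[float]:
    """CEM-style mixture of the two stand-in vectors: p * c_plus + (1 - p) * c_minus."""
    return [p_true * a + (1.0 - p_true) * b for a, b in zip(emb_pos, emb_neg)]


class ToyConceptReasoner:
    """Untrained toy: linear scorers, shared across concepts, map each concept
    embedding to a polarity and a relevance degree in (0, 1)."""

    def __init__(self, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_pol = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        self.w_rel = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    def build_rule(self, embeddings):
        # Neural part: the rule (polarity, relevance per concept) comes from the embeddings.
        polarity = [sigmoid(linear(self.w_pol, 0.0, e)) for e in embeddings]
        relevance = [sigmoid(linear(self.w_rel, 0.0, e)) for e in embeddings]
        return polarity, relevance

    @staticmethod
    def execute(truth, polarity, relevance) -> float:
        # Symbolic part: evaluate AND_i (NOT r_i OR ((p_i AND c_i) OR (NOT p_i AND NOT c_i)))
        # on concept truth degrees, with AND = min, OR = max, NOT x = 1 - x.
        literals = []
        for c, p, r in zip(truth, polarity, relevance):
            matches = max(min(p, c), min(1.0 - p, 1.0 - c))
            literals.append(max(1.0 - r, matches))  # low relevance pushes the literal toward 1
        return min(literals)


def rule_to_text(names, polarity, relevance, threshold: float = 0.5) -> str:
    # Keep relevant concepts; prefix "~" to those with negative polarity.
    literals = [n if p >= threshold else "~" + n
                for n, p, r in zip(names, polarity, relevance) if r >= threshold]
    return " & ".join(literals) if literals else "True"


if __name__ == "__main__":
    rng = random.Random(1)
    truth = [0.9, 0.2, 0.7]  # hypothetical concept probabilities from a concept encoder
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)] for _ in CONCEPTS]
    neg = [[rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)] for _ in CONCEPTS]
    embs = [concept_embedding(p, a, b) for p, a, b in zip(truth, pos, neg)]
    reasoner = ToyConceptReasoner(EMB_DIM)
    pol, rel = reasoner.build_rule(embs)
    print("rule:", rule_to_text(CONCEPTS, pol, rel), "-> task")
    print("prediction:", round(reasoner.execute(truth, pol, rel), 3))
```

In this toy the embeddings only shape the rule, while the prediction is computed from the concept truth degrees alone, so the thresholded rule can be read as an explanation of that prediction. A trained model would learn the scorers jointly with the concept encoder instead of initializing them at random.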
Overall, this thesis makes significant contributions by introducing the first formal theory of explainable AI and by presenting novel deep learning techniques that go beyond the current accuracy-interpretability trade-off. Experimental results show that these innovations lead to a new generation of deep learning architectures that are both transparent and accurate. These techniques lay the groundwork for increasing the transparency and trustworthiness of deep learning, enabling the safe deployment of robust and controllable machine learning agents.