What is Interpretability?
Publication Date
2021
Journal Title
Philos Technol
ISSN
2210-5433
Publisher
Springer Science and Business Media LLC
Volume
34
Issue
4
Pages
833-862
Language
en
Type
Article
This Version
VoR (Version of Record)
Citation
Erasmus, A., Brunet, T. D., & Fisher, E. (2021). What is Interpretability?. Philos Technol, 34 (4), 833-862. https://doi.org/10.1007/s13347-020-00435-2
Description
Funder: Ernest Oppenheimer Memorial Trust (ZA)
Funder: Williamson, Rausing and Lipton HPS Trust Fund (GB)
Funder: Wellcome Trust; doi: https://doi.org/10.13039/100004440
Funder: Cambridge Commonwealth, European and International Trust; doi: https://doi.org/10.13039/501100003343
Funder: Cambridge Commonwealth Trust; doi: https://doi.org/10.13039/501100003342
Funder: Social Sciences and Humanities Research Council of Canada; doi: https://doi.org/10.13039/501100000155
Abstract
We argue that artificial networks are explainable and offer a novel theory of interpretability. Two sets of conceptual questions are prominent in theoretical engagements with artificial neural networks, especially in the context of medical artificial intelligence: (1) Are networks explainable, and if so, what does it mean to explain the output of a network? And (2) what does it mean for a network to be interpretable? We argue that accounts of "explanation" tailored specifically to neural networks have ineffectively reinvented the wheel. In response to (1), we show how four familiar accounts of explanation apply to neural networks as they would to any scientific phenomenon. We diagnose the confusion about explaining neural networks within the machine learning literature as an equivocation on "explainability," "understandability" and "interpretability." To remedy this, we distinguish between these notions, and answer (2) by offering a theory and typology of interpretation in machine learning. Interpretation is something one does to an explanation with the aim of producing another, more understandable, explanation. As with explanation, there are various concepts and methods involved in interpretation: Total or Partial, Global or Local, and Approximative or Isomorphic. Our account of "interpretability" is consistent with uses in the machine learning literature, in keeping with the philosophy of explanation and understanding, and pays special attention to medical artificial intelligence systems.
Keywords
Explainability, Interpretability, Medical AI, XAI
Identifiers
s13347-020-00435-2, 435
External DOI: https://doi.org/10.1007/s13347-020-00435-2
This record's URL: https://www.repository.cam.ac.uk/handle/1810/331818
Rights
Licence: Creative Commons Attribution 4.0 (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/