Do Concept Bottleneck Models Learn as Intended?
Publication Date
2021
Journal Title
CoRR
Conference Name
ICLR-21 Workshop on Responsible AI
Type
Article
This Version
AM (Accepted Manuscript)
Citation
Margeloiu, A., Ashman, M., Bhatt, U., Chen, Y., Jamnik, M., & Weller, A. (2021). Do Concept Bottleneck Models Learn as Intended? CoRR. https://doi.org/10.17863/CAM.80941
Abstract
Concept bottleneck models map from raw inputs to concepts, and then from
concepts to targets. Such models aim to incorporate pre-specified, high-level
concepts into the learning procedure, and have been motivated to meet three
desiderata: interpretability, predictability, and intervenability. However, we
find that concept bottleneck models struggle to meet these goals. Using post
hoc interpretability methods, we demonstrate that concepts do not correspond to
anything semantically meaningful in input space, thus calling into question the
usefulness of concept bottleneck models in their current form.
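
The architecture described in the abstract (raw inputs mapped to concepts, then concepts to targets) can be made concrete with a short sketch. The PyTorch code below is an illustrative assumption rather than the authors' implementation: the layer sizes, the MLP concept encoder, and the jointly trained concept-plus-target loss are hypothetical choices that only demonstrate the bottleneck structure.

import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """Sketch of a concept bottleneck model: x -> concepts -> target."""

    def __init__(self, input_dim: int, n_concepts: int, n_classes: int, hidden_dim: int = 128):
        super().__init__()
        # Input-to-concept network: one logit per pre-specified concept.
        self.input_to_concepts = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_concepts),
        )
        # Concept-to-target head: sees only the concept activations.
        self.concepts_to_target = nn.Linear(n_concepts, n_classes)

    def forward(self, x: torch.Tensor):
        concept_logits = self.input_to_concepts(x)
        # All task-relevant information must pass through the concept bottleneck.
        target_logits = self.concepts_to_target(torch.sigmoid(concept_logits))
        return concept_logits, target_logits


# Illustrative joint training step: concept loss plus target loss on a random batch.
model = ConceptBottleneckModel(input_dim=32, n_concepts=10, n_classes=5)
x = torch.randn(8, 32)                     # hypothetical raw inputs
c = torch.randint(0, 2, (8, 10)).float()   # hypothetical binary concept labels
y = torch.randint(0, 5, (8,))              # hypothetical class labels

concept_logits, target_logits = model(x)
loss = nn.BCEWithLogitsLoss()(concept_logits, c) + nn.CrossEntropyLoss()(target_logits, y)
loss.backward()

Whether the learned concept logits actually correspond to semantically meaningful features of the input is exactly the question the paper examines with post hoc interpretability methods.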
Keywords
cs.LG, cs.AI
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.80941
This record's URL: https://www.repository.cam.ac.uk/handle/1810/333521