Do Concept Bottleneck Models Learn as Intended?

Accepted version
Peer-reviewed

Type

Article

Authors

Margeloiu, Andrei 
Ashman, Matthew 
Bhatt, Umang 
Chen, Yanzhi 

Abstract

Concept bottleneck models map from raw inputs to concepts, and then from concepts to targets. Such models aim to incorporate pre-specified, high-level concepts into the learning procedure, and have been motivated to meet three desiderata: interpretability, predictability, and intervenability. However, we find that concept bottleneck models struggle to meet these goals. Using post hoc interpretability methods, we demonstrate that concepts do not correspond to anything semantically meaningful in input space, thus calling into question the usefulness of concept bottleneck models in their current form.
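To make the architecture described in the abstract concrete, below is a minimal, illustrative sketch (not the authors' implementation) of a concept bottleneck model: an encoder maps raw inputs to a small set of predicted concepts, and a separate head maps those concepts to the target. Layer sizes, the number of concepts, and the class names here are assumptions for illustration only.

```python
# Minimal sketch of a concept bottleneck model (illustrative, not the paper's code).
# g: raw input -> concept predictions (the "bottleneck"); f: concepts -> target.
import torch
import torch.nn as nn


class ConceptBottleneckModel(nn.Module):
    def __init__(self, input_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Concept encoder g: hidden width of 128 is an arbitrary choice.
        self.concept_encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_concepts),
        )
        # Target head f sees only the concepts, never the raw input.
        self.target_head = nn.Linear(n_concepts, n_classes)

    def forward(self, x: torch.Tensor):
        concept_logits = self.concept_encoder(x)    # trained against concept labels
        concepts = torch.sigmoid(concept_logits)    # each unit intended to encode one concept
        target_logits = self.target_head(concepts)  # target prediction uses concepts only
        return concept_logits, target_logits


if __name__ == "__main__":
    model = ConceptBottleneckModel(input_dim=64, n_concepts=10, n_classes=5)
    x = torch.randn(8, 64)
    concept_logits, target_logits = model(x)
    print(concept_logits.shape, target_logits.shape)  # (8, 10), (8, 5)
```

Because the target head depends only on the concept layer, intervening at test time amounts to replacing selected predicted concept values with ground-truth ones before passing them to the head, which is the intervenability property the abstract refers to.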

Keywords

cs.LG, cs.AI

Journal Title

CoRR

Conference Name

ICLR-21 Workshop on Responsible AI
