Do Concept Bottleneck Models Learn as Intended?
Publication Date
2021
Journal Title
CoRR
Conference Name
ICLR-21 Workshop on Responsible AI
Type
Article
This Version
AM (Accepted Manuscript)
Citation
Margeloiu, A., Ashman, M., Bhatt, U., Chen, Y., Jamnik, M., & Weller, A. (2021). Do Concept Bottleneck Models Learn as Intended? CoRR. https://doi.org/10.17863/CAM.80941
Abstract
Concept bottleneck models map from raw inputs to concepts, and then from
concepts to targets. Such models aim to incorporate pre-specified, high-level
concepts into the learning procedure, and have been motivated to meet three
desiderata: interpretability, predictability, and intervenability. However, we
find that concept bottleneck models struggle to meet these goals. Using post
hoc interpretability methods, we demonstrate that concepts do not correspond to
anything semantically meaningful in input space, thus calling into question the
usefulness of concept bottleneck models in their current form.
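
The architecture described in the abstract (raw inputs mapped to concepts, then concepts to targets) can be made concrete with a short sketch. The PyTorch code below is an illustrative assumption rather than the authors' implementation: the layer sizes, the MLP concept encoder, and the jointly trained concept-plus-target loss are hypothetical choices that only demonstrate the bottleneck structure.

import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """Sketch of a concept bottleneck model: x -> concepts -> target."""

    def __init__(self, input_dim: int, n_concepts: int, n_classes: int, hidden_dim: int = 128):
        super().__init__()
        # Input-to-concept network: one logit per pre-specified concept.
        self.input_to_concepts = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_concepts),
        )
        # Concept-to-target head: sees only the concept activations.
        self.concepts_to_target = nn.Linear(n_concepts, n_classes)

    def forward(self, x: torch.Tensor):
        concept_logits = self.input_to_concepts(x)
        # All task-relevant information must pass through the concept bottleneck.
        target_logits = self.concepts_to_target(torch.sigmoid(concept_logits))
        return concept_logits, target_logits


# Illustrative joint training step: concept loss plus target loss on a random batch.
model = ConceptBottleneckModel(input_dim=32, n_concepts=10, n_classes=5)
x = torch.randn(8, 32)                     # hypothetical raw inputs
c = torch.randint(0, 2, (8, 10)).float()   # hypothetical binary concept labels
y = torch.randint(0, 5, (8,))              # hypothetical class labels

concept_logits, target_logits = model(x)
loss = nn.BCEWithLogitsLoss()(concept_logits, c) + nn.CrossEntropyLoss()(target_logits, y)
loss.backward()

Whether the learned concept logits actually correspond to semantically meaningful features of the input is exactly the question the paper examines with post hoc interpretability methods.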
Keywords
cs.LG, cs.AI
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.80941
This record's URL: https://www.repository.cam.ac.uk/handle/1810/333521