
Deep convolutional neural networks, features, and categories perform similarly at explaining primate high-level visual representations

Published version
Peer-reviewed

Type

Conference Object

Authors

Jozwik, KM 
Kriegeskorte, Nikolaus 
Cichy, Radoslaw Martin 
Mur, Marieke 

Abstract

Deep convolutional neural networks (DNNs) are currently the best computational models for explaining image representations across the visual cortical hierarchy. However, it is unclear how the representations in DNNs relate to those of simpler “oracle” models of features and categories. We obtained DNN (AlexNet) representations for a set of 92 real-world object images. Human observers generated category and feature labels for the images. Category labels included subordinate, basic and superordinate categories; feature labels included object parts, colors, textures, and contours. We used the AlexNet representations and labels to explain brain representations of the images, measured with fMRI in humans and cell recordings in monkeys. For both human and monkey inferior temporal (IT) cortex, late AlexNet layers perform similarly to basic categories and object parts. Furthermore, late AlexNet layers can account for more than half of the variance that these labels explain in IT. Finally, while feature and category models predominantly explain image representations in high-level visual cortex, AlexNet layers explain representations across the entire visual cortical hierarchy. DNNs may provide a computationally explicit model of how features and categories are computed by the brain.
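The abstract does not spell out the analysis pipeline; as a minimal sketch, the comparison between network-layer and brain representations can be illustrated with representational dissimilarity matrices (RDMs). The snippet below assumes torchvision's pretrained AlexNet as a stand-in for the network used in the paper, and a hypothetical brain_rdm derived from fMRI or cell-recording data; layer_activations, rdm, and compare_rdms are illustrative helper names, not functions from the study's code.

import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from scipy.stats import spearmanr

# Pretrained AlexNet from torchvision (an assumed stand-in for the paper's network).
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def layer_activations(image_paths, layer_index):
    """Flattened activations from one stage of alexnet.features for each image."""
    feats = []
    with torch.no_grad():
        for path in image_paths:
            x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
            # alexnet.features is an nn.Sequential; slicing it runs the image
            # up to the requested stage (a hypothetical choice of "layer").
            h = alexnet.features[: layer_index + 1](x)
            feats.append(h.flatten().numpy())
    return np.stack(feats)  # shape: (n_images, n_units)

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson r between image pairs."""
    return 1.0 - np.corrcoef(features)

def compare_rdms(model_rdm, brain_rdm):
    """Spearman correlation over the lower triangle (the unique image pairs)."""
    idx = np.tril_indices_from(model_rdm, k=-1)
    return spearmanr(model_rdm[idx], brain_rdm[idx]).correlation

Correlating each layer's RDM with an IT RDM in this way yields the kind of layer-by-layer profile the abstract describes; the same comparison can be run with RDMs built from the category and feature labels in place of the network layers.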

Keywords

46 Information and Computing Sciences, 4611 Machine Learning, 44 Human Society, Eye Disease and Disorders of Vision, Bioengineering, Machine Learning and Artificial Intelligence

Journal Title

2018 Conference on Cognitive Computational Neuroscience

Conference Name

2018 Conference on Cognitive Computational Neuroscience

Publisher

Cognitive Computational Neuroscience

Sponsorship

Wellcome Trust (1360)

This work was funded by a Sir Henry Wellcome Postdoctoral Fellowship (206521/Z/17/Z) to KMJ, a DFG grant CI 241-1/1 (Emmy Noether Programme) to RMC, and a British Academy Postdoctoral Fellowship (PS 140117) to MM.