A convolutional neural network model of the neural responses of inferotemporal cortex to complex visual objects
BMC Neuroscience volume 12, Article number: P35 (2011)
We present a neural network model that replicates the response properties of the neurons in monkey inferior temporal cortex described in the studies of Tanaka and colleagues [1, 2]. A convolutional neural network (CNN) known for its visual pattern recognition capabilities is used for this purpose. The present work consists of two studies.
In the first study, we simulate the “image reduction method” of [1] in order to study the responses of tuned neurons to complex visual patterns. The CNN used in this study consists of 4 hidden layers and 12 output neurons, and accepts an input image of size 50 × 50. The first hidden layer has 5 sub-layers, each of size 46 × 46, and the third hidden layer has 12 sub-layers, each of size 20 × 20. The network is trained on 12 images selected from the original study [1]. Neurons of the penultimate layer that exhibit a distinct response to one image, as opposed to all other images, are selected as tuned neurons. When a reduced version of an image is presented, the corresponding tuned neurons preferentially show a drastic reduction in response; no such change is seen in the responses of a non-tuned neuron (Fig. 1).
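The layer sizes quoted above are consistent with a standard alternation of unpadded convolutions and 2 × 2 subsampling. The following is a minimal sketch of that arithmetic, not the authors' implementation: the kernel sizes (5 and 4), the subsampling factor (2), and all function and variable names are assumptions chosen only because they reproduce the stated 46 × 46 and 20 × 20 maps from a 50 × 50 input.

```python
def conv_out(size, kernel):
    """Side length after an unpadded, stride-1 convolution."""
    return size - kernel + 1


def subsample_out(size, factor):
    """Side length after non-overlapping subsampling by `factor`."""
    if size % factor != 0:
        raise ValueError("size must be divisible by the subsampling factor")
    return size // factor


def feature_map_sizes(input_size, layers):
    """Return the side length of the feature maps after each hidden layer."""
    sizes = []
    size = input_size
    for kind, param in layers:
        if kind == "conv":
            size = conv_out(size, param)
        elif kind == "sub":
            size = subsample_out(size, param)
        else:
            raise ValueError("unknown layer kind: " + kind)
        sizes.append(size)
    return sizes


# ASSUMPTION: kernel sizes and subsampling factors are not given in the
# abstract; these values are hypothetical and merely match the stated sizes.
ASSUMED_LAYERS = [("conv", 5), ("sub", 2), ("conv", 4), ("sub", 2)]

sizes = feature_map_sizes(50, ASSUMED_LAYERS)
print(sizes)  # hidden layers 1 and 3 are sizes[0] and sizes[2]
```

Under these assumed parameters, the first and third hidden layers come out at 46 and 20 pixels per side, matching the figures in the text; other parameter choices could give the same sizes.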
Next, we investigate the inherent hierarchy of categorical representations in model neuron responses, as in the experimental study of [2]. In this case, the CNN is trained on images of the 12 categories used in [2], each consisting of about 15 sample images. The network consists of 6 hidden layers and 12 output neurons, and accepts an input image of size 50 × 50. The first hidden layer has 5 sub-layers, each of size 46 × 46; the third hidden layer has 12 sub-layers, each of size 20 × 20; and the fifth hidden layer has 12 sub-layers, each of size 6 × 6. We perform hierarchical clustering on the response vectors of neurons in the penultimate layer. Selected nodes in the resulting tree are assigned one of the 12 low-level categories based on a score that is the average of two ratios: ratio1 = (number of category members under the node) / (number of all members in the category), and ratio2 = (number of category members under the node) / (number of all stimuli under the node). The scores given to the various categories in the model data bear a strong resemblance to the corresponding scores obtained in the experimental study.
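The node score described above can be written out directly. The sketch below is illustrative only: it assumes the clustering tree has already been built and that each node is given as a list of stimulus identifiers. The function names, the data layout, and the toy categories ("face", "car") are hypothetical and do not come from the study.

```python
def category_score(node_members, category_of, category):
    """Average of ratio1 and ratio2 for one category at one tree node.

    node_members: list of stimulus ids under the node
    category_of:  dict mapping every stimulus id to its category label
    """
    if not node_members:
        raise ValueError("node has no members")
    in_category = sum(1 for c in category_of.values() if c == category)
    if in_category == 0:
        raise ValueError("category has no members")
    in_node = sum(1 for s in node_members if category_of[s] == category)
    ratio1 = in_node / in_category        # share of the category captured by the node
    ratio2 = in_node / len(node_members)  # share of the node made up of the category
    return (ratio1 + ratio2) / 2


def best_category(node_members, category_of):
    """Return the category with the highest score at this node."""
    categories = sorted(set(category_of.values()))
    return max(categories, key=lambda c: category_score(node_members, category_of, c))


# Toy data (hypothetical): three "face" stimuli and two "car" stimuli.
category_of = {"face1": "face", "face2": "face", "face3": "face",
               "car1": "car", "car2": "car"}
node = ["face1", "face2", "car1"]

print(category_score(node, category_of, "face"))
print(best_category(node, category_of))
```

A node scores 1 for a category only when it contains every member of that category and nothing else, so the score rewards both completeness (ratio1) and purity (ratio2).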
Tanaka K: Mechanisms of visual object recognition studied in monkeys. Spatial Vision. 2000, 13: 147-163. 10.1163/156856800741171.
Kiani R, Esteky H, Mirpour K, Tanaka K: Object Category Structure in Response Patterns of Neuronal Population in Monkey Inferior Temporal Cortex. Journal of Neurophysiology. 2007, 97: 4296-4309. 10.1152/jn.00024.2007.
Cite this article
Rohit, S., Chakravarthy, S. A convolutional neural network model of the neural responses of inferotemporal cortex to complex visual objects. BMC Neurosci 12 (Suppl 1), P35 (2011). https://doi.org/10.1186/1471-2202-12-S1-P35