- Poster presentation
- Open access
Reinforcement learning on complex visual stimuli
BMC Neuroscience volume 10, Article number: P90 (2009)
Animals are confronted with the problem of initiating motor actions based on very complex sensory input. We have built a biologically plausible model that uses reinforcement learning on complex visual stimuli to direct an agent towards a target. This is made possible by first extracting a high-level representation of the scene with a hierarchical network and then applying a correlation-based reinforcement learning rule.
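The abstract does not spell out the learning rule, so the following is only a minimal sketch of what a correlation-based, reward-modulated rule can look like, not the authors' implementation. It assumes a generic three-factor update (reward surprise × motor exploration noise × input). As further assumptions, a hypothetical two-dimensional feature (the target position relative to the agent) stands in for the output of the hierarchical network, and the reward is the reduction in distance to the target per step. The names `train_policy` and `rollout` and all parameter values are illustrative.

```python
import numpy as np

def train_policy(n_steps=3000, eta=0.5, sigma=0.1, seed=0):
    """Learn a linear map from features to motor commands from reward alone."""
    rng = np.random.default_rng(seed)
    W = np.zeros((2, 2))   # motor weights, initially untrained
    baseline = 0.0         # running average of the reward
    for _ in range(n_steps):
        # Hypothetical stand-in for the network output:
        # target position relative to the agent.
        x = rng.uniform(-1.0, 1.0, size=2)
        noise = rng.normal(0.0, sigma, size=2)  # motor exploration
        action = W @ x + noise                  # displacement of the agent
        # Assumed reward: how much this step reduced the distance to the target.
        reward = np.linalg.norm(x) - np.linalg.norm(x - action)
        # Correlate the reward surprise with exploration noise and input.
        W += eta * (reward - baseline) * np.outer(noise, x)
        baseline += 0.05 * (reward - baseline)
    return W

def rollout(W, agent, target, n_steps=10):
    """Apply the learned policy without noise in a feedback loop.

    Returns the final distance between agent and target.
    """
    agent = np.asarray(agent, dtype=float).copy()
    target = np.asarray(target, dtype=float)
    for _ in range(n_steps):
        agent += W @ (target - agent)  # motor command updates the scene
    return np.linalg.norm(target - agent)

W = train_policy()
final_distance = rollout(W, agent=[0.0, 0.0], target=[0.8, -0.5])
```

Because the only teaching signal is a scalar reward, the weights can improve only through the correlation between random motor exploration and the reward that follows it. In this toy setting that correlation should push `W` towards a map that moves the agent straight at the target.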
The sensory input to the model consists of grayscale images of 155 × 155 pixels (see Figure 1). From this complex input, the model must extract the position and direction of the agent and the position of the target. This estimation is performed successfully by a multi-layer hierarchical network modeled after the visual system [1]. In each layer, we use Slow Feature Analysis (SFA) [2, 3] to efficiently extract higher-level features based on temporal structure. SFA has the advantage that learning is unsupervised: the model is simply fed image sequences. The high-level output of the hierarchical network is then used to learn the corresponding motor commands with a reinforcement learning algorithm. The reward signal is given by the distance to the target, which is the only supervision signal in the whole model (biologically, it could be interpreted as a scent of the target). The motor command output is then used to update the scene, so the model runs in a feedback loop. The resulting trajectories (Figure 1) show how the model directs the agent towards its target. Our model demonstrates that, with a division-of-labor strategy, simple learning rules can solve a rather difficult problem.
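To illustrate how SFA extracts features from time structure without supervision, here is a minimal sketch of linear SFA on a toy signal. It is a single linear node written with plain NumPy, not the hierarchical network used in the model and not the MDP toolkit cited as [3]. The function name `linear_sfa` and the two-source toy data are assumptions made for the example.

```python
import numpy as np

def linear_sfa(x, n_components=1):
    """Linear Slow Feature Analysis.

    x: array of shape (T, D), one sample per time step.
    Returns the input mean and a (D, n_components) projection whose
    outputs have unit variance and vary as slowly as possible.
    """
    mean = x.mean(axis=0)
    xc = x - mean
    # Step 1: whiten the input so every direction has unit variance.
    eigval, eigvec = np.linalg.eigh(np.cov(xc, rowvar=False))
    whitener = eigvec / np.sqrt(eigval)
    z = xc @ whitener
    # Step 2: find the directions in which the time derivative is smallest.
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / len(dz)
    _, dvec = np.linalg.eigh(dcov)  # eigenvalues ascending: slowest first
    return mean, whitener @ dvec[:, :n_components]

# Toy data: a slow and a fast source, observed only as linear mixtures.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
slow = np.sin(t)
fast = np.sin(11.0 * t)
x = np.column_stack([slow + fast, slow - 0.5 * fast])

mean, projection = linear_sfa(x, n_components=1)
y = (x - mean) @ projection  # extracted slow feature, shape (2000, 1)
```

No labels are involved: the slow source is recovered (up to sign) purely from the temporal structure of the mixed signals, which is the property the model exploits when it is trained on image sequences.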
References
Franzius M, Wilbert N, Wiskott L: Invariant object recognition with slow feature analysis. Proc 18th Int'l Conf on Artificial Neural Networks. Edited by: Kurková V, Neruda R, Koutník J. 2008, Springer-Verlag, 961-970.
Wiskott L, Sejnowski TJ: Slow feature analysis: Unsupervised learning of invariances. Neural Computation. 2002, 14: 715-770. 10.1162/089976602317318938.
Zito T, Wilbert N, Wiskott L, Berkes P: Modular toolkit for data processing (MDP): A Python data processing framework. Front Neuroinformatics. 2008, 2: 8.
Rights and permissions
Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Wilbert, N., Legenstein, R., Franzius, M. et al. Reinforcement learning on complex visual stimuli. BMC Neurosci 10 (Suppl 1), P90 (2009). https://doi.org/10.1186/1471-2202-10-S1-P90