- Poster presentation
- Open Access
Biologically plausible reinforcement learning of continuous actions
BMC Neuroscience volume 14, Article number: P28 (2013)
Humans and animals can perform very precise movements to obtain rewards. For instance, picking up a mug of coffee from your desk while you are working takes no effort. However, it is unknown how exactly the brain learns the non-linear mapping between sensory inputs (e.g. your mug on the retina) and the correct motor actions (e.g. a set of joint angles). Here we show how a biologically plausible learning scheme can learn to perform non-linear transformations from sensory inputs to continuous actions based on reinforcement learning.
To arrive at our novel scheme, we built on the idea of attention-gated reinforcement learning (AGREL) [1], a biologically plausible learning scheme that explains how networks of neurons can learn to perform non-linear transformations from sensory inputs to discrete actions (e.g. pressing a button) based on reinforcement learning [2]. We recently showed that the AGREL learning scheme can be generalized to perform multiple simultaneous discrete actions [3], and we now show how this scheme can be further generalized to continuous action spaces. The key idea is that motor areas have feedback connections to earlier processing layers, which inform the network about the selected action. Synaptic plasticity is constrained to those synapses that were involved in the decision, and it follows a simple Hebbian rule that is gated by a globally available neuromodulatory signal coding reward prediction errors. In our novel scheme, motor units are situated in a population coding layer that encodes the outcome of the decision process as a bump of activations [4]. This contrasts with our earlier work, where single motor units code for actions [1, 3]. We show that the synaptic updates perform stochastic gradient descent on the prediction error that results from the combined action-value prediction of all the motor units that encoded the decision. Unlike other reinforcement learning based approaches, e.g. [5], our reinforcement learning rule is powerful enough to learn tasks that require non-linear transformations. The distribution of population centers in the motor layer can also be automatically adapted to task demands, yielding more representational power when actions need to be precise.
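The update described above can be illustrated with a deliberately reduced sketch. It is not the model presented in this abstract: it has no hidden layer and no feedback connections, so it cannot learn non-linear transformations. It only shows how a bump of activations over a motor population can yield a combined action-value prediction, and how a Hebbian update gated by a global reward prediction error amounts to a gradient step on that error. All names (`bump`, `unit_values`, `combined_prediction`, `update`, `select_action`), the Gaussian bump shape, its normalization, the epsilon-greedy selection and the toy reward function are assumptions made for illustration.

```python
import math
import random


def bump(centers, chosen, width):
    """Gaussian bump of activations around the chosen action, normalized to sum to 1."""
    raw = [math.exp(-0.5 * ((c - chosen) / width) ** 2) for c in centers]
    total = sum(raw)
    return [a / total for a in raw]


def unit_values(weights, x):
    """Action-value estimate of each motor unit (assumed: linear readout of input x)."""
    return [sum(w * x_j for w, x_j in zip(row, x)) for row in weights]


def combined_prediction(weights, x, z):
    """Reward predicted jointly by the units in the bump (bump-weighted average)."""
    q = unit_values(weights, x)
    return sum(z_i * q_i for z_i, q_i in zip(z, q))


def update(weights, x, z, reward, lr):
    """Gated Hebbian step: weights[i][j] += lr * delta * z[i] * x[j].

    delta = reward - combined prediction plays the role of the global signal.
    The step is one gradient-descent step on 0.5 * delta ** 2.
    Returns delta as computed before the weights were changed.
    """
    delta = reward - combined_prediction(weights, x, z)
    for i, z_i in enumerate(z):
        for j, x_j in enumerate(x):
            weights[i][j] += lr * delta * z_i * x_j
    return delta


def select_action(weights, x, centers, epsilon, rng):
    """Epsilon-greedy choice of a population center (an assumed exploration rule)."""
    if rng.random() < epsilon:
        return rng.choice(centers)
    q = unit_values(weights, x)
    best = max(range(len(centers)), key=lambda i: q[i])
    return centers[best]


if __name__ == "__main__":
    rng = random.Random(0)
    centers = [i / 10 for i in range(11)]   # preferred actions of the motor units
    x = [1.0]                               # single constant input (toy setting)
    weights = [[0.0] for _ in centers]
    target = 0.7                            # hypothetical rewarded action
    for _ in range(2000):
        action = select_action(weights, x, centers, epsilon=0.2, rng=rng)
        reward = math.exp(-((action - target) / 0.2) ** 2)
        z = bump(centers, action, width=0.1)
        update(weights, x, z, reward, lr=0.2)
    print("greedy action:", select_action(weights, x, centers, 0.0, rng))
```

Because only units inside the bump have non-negligible `z[i]`, the weight change is effectively restricted to the units that encoded the decision, which mirrors the constraint on plasticity described in the text.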
We show that the novel scheme can learn to perform non-linear transformations from sensory inputs to motor outputs in a variety of direct reward tasks. The model can explain how visuomotor coordinate transforms might be learned by reinforcement learning instead of semi-supervised learning as used in [6]. It might also explain how humans learn to weigh the accuracy of their movement against the potential rewards and punishments for making inaccurate movements, as in the visually guided movement task described in [7].
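The trade-off between movement accuracy and potential rewards and punishments can be made concrete with a small one-dimensional sketch. It is not the task of [7] and not the model of this abstract; it only illustrates why the best aim point shifts away from a penalty region when movement endpoints are noisy. The geometry (reward region on [0, 1], penalty region on [-1, 0)), the Gaussian endpoint noise, the numeric values, and the names `normal_cdf`, `expected_gain` and `best_aim` are all assumptions.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def expected_gain(aim, sd, reward, penalty):
    """Expected payoff when the endpoint is aim plus Gaussian noise with std sd."""
    p_reward = normal_cdf(1.0, aim, sd) - normal_cdf(0.0, aim, sd)
    p_penalty = normal_cdf(0.0, aim, sd) - normal_cdf(-1.0, aim, sd)
    return reward * p_reward - penalty * p_penalty


def best_aim(sd, reward, penalty):
    """Grid search over aim points from -1 to 2 in steps of 0.01."""
    candidates = [i / 100.0 for i in range(-100, 201)]
    return max(candidates, key=lambda a: expected_gain(a, sd, reward, penalty))


if __name__ == "__main__":
    print("no penalty:  ", best_aim(sd=0.3, reward=1.0, penalty=0.0))
    print("with penalty:", best_aim(sd=0.3, reward=1.0, penalty=5.0))
```

Without a penalty the best aim is the center of the reward region; with a penalty next to it, the best aim moves away from the penalized side, at the cost of missing the reward region more often.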
1. Roelfsema PR, van Ooyen A: Attention-gated reinforcement learning of internal representations for classification. Neural Comp. 2005, 17: 2176-2214. 10.1162/0899766054615699.
2. Sutton RS, Barto AG: Reinforcement Learning: an introduction. 1998, MIT Press.
3. Rombouts JO, van Ooyen A, Roelfsema PR, Bohte SM: Biologically Plausible Multi-dimensional Reinforcement Learning in Neural Networks. ICANN. 2012, 443-450.
4. Zhang K: Representation of spatial orientation by the intrinsic dynamics of the head-direction cell ensemble: a theory. J Neurosci. 1996, 16: 2112-2126.
5. Ognibene D, Rega A, Baldassarre G: A model of reaching that integrates reinforcement learning and population encoding of postures. From Animals to Animats 9. 2006, 381-393.
6. Ghahramani Z, Wolpert DM, Jordan MI: Generalization to local remappings of the visuomotor coordinate transformation. J Neurosci. 1996, 16: 7085-7096.
7. Trommershäuser J, Maloney LT, Landy MS: Statistical decision theory and the selection of rapid, goal-directed movements. J Opt Soc Am A Opt Image Sci Vis. 2003, 20: 1419-1433. 10.1364/JOSAA.20.001419.