  • Poster presentation
  • Open access

A spiking temporal-difference learning model based on dopamine-modulated plasticity

Making predictions about future rewards and adapting the behavior accordingly is crucial for any higher organism. One theory specialized for prediction problems is temporal-difference (TD) learning. Experimental findings suggest that TD learning is implemented by the mammalian brain. In particular, the resemblance of dopaminergic activity to the TD error signal [1] and the modulation of corticostriatal plasticity by dopamine [2] lend support to this hypothesis. We recently proposed the first spiking neural network model to implement actor-critic TD learning [3], enabling it to solve a complex task with sparse rewards. However, this model calculates an approximation of the TD error signal in each synapse, rather than utilizing a neuromodulatory system.
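For orientation, the TD error referred to above can be written in its standard discrete-time textbook form. The symbols here (reward r, discount factor \gamma, value estimate V, learning rate \alpha) are common conventions assumed for illustration and are not taken from the abstract:

```latex
\delta_t = r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

A positive \delta_t means the outcome was better than predicted, a negative one that it was worse; this is the quantity that phasic dopaminergic activity is reported to resemble [1].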

Here, we propose a spiking neural network model that dynamically generates a dopamine signal based on the actor-critic architecture proposed by Houk et al. [4]. Acting as a third factor, this signal modulates the plasticity of the synapses encoding the value function and the policy. The proposed model simultaneously accounts for multiple experimental results, such as the generation of a TD-like dopaminergic signal with realistic firing rates in conditioning protocols [1], and the role of presynaptic activity, postsynaptic activity and dopamine in the plasticity of corticostriatal synapses [5]. The excellent agreement between the predictions of our synaptic plasticity rules and the experimental findings is particularly noteworthy, as the update rules were postulated using a purely top-down approach.
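The idea of a third factor can be illustrated with a generic rate-based sketch. This is not the plasticity rule of the proposed model, which the abstract does not specify; the function name, the parameters and the multiplicative form are all assumptions chosen only to show how a dopamine level relative to a baseline could gate a weight change that also depends on pre- and postsynaptic activity:

```python
def three_factor_update(weight, pre_trace, post_rate, dopamine, baseline, lr=0.01):
    """Return an updated weight under a generic three-factor rule.

    Hypothetical illustration only: the multiplicative form and all
    parameter names are assumptions, not the model's actual rule.
    """
    # Dopamine above baseline acts like a positive TD error,
    # dopamine below baseline like a negative one.
    modulation = dopamine - baseline
    # The weight changes only when all three factors are non-zero.
    return weight + lr * modulation * pre_trace * post_rate
```

In this sketch, dopamine at its baseline level leaves the weight unchanged regardless of how active the pre- and postsynaptic neurons are.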

We performed simulations in NEST [6] to test the learning behavior of the model in a two-dimensional grid-world task with a single rewarded state. The network learns to evaluate the states with respect to their proximity to the reward and adapts its policy accordingly. The learning speed and equilibrium performance are comparable to those of a discrete-time algorithmic TD learning implementation.
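As a point of comparison, a discrete-time algorithmic actor-critic TD learner on such a task might look like the minimal sketch below. It is not the implementation used in the study: the grid size, goal position, softmax policy, learning rates and all names are assumptions made for illustration.

```python
import math
import random


def run_actor_critic(size=5, goal=(4, 4), episodes=300, alpha=0.1, beta=0.1,
                     gamma=0.9, max_steps=200, seed=0):
    """Tabular actor-critic TD(0) on a size x size grid with one rewarded state.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    states = [(x, y) for x in range(size) for y in range(size)]
    value = {s: 0.0 for s in states}                 # critic: state values
    prefs = {s: [0.0] * len(moves) for s in states}  # actor: action preferences
    steps_per_episode = []
    for _ in range(episodes):
        state = rng.choice([s for s in states if s != goal])
        steps = 0
        while state != goal and steps < max_steps:
            # Softmax policy over the action preferences of the current state.
            top = max(prefs[state])
            weights = [math.exp(p - top) for p in prefs[state]]
            action = rng.choices(range(len(moves)), weights=weights)[0]
            dx, dy = moves[action]
            # Moves into a wall leave the agent where it is.
            nxt = (min(max(state[0] + dx, 0), size - 1),
                   min(max(state[1] + dy, 0), size - 1))
            reward = 1.0 if nxt == goal else 0.0
            # The rewarded state is terminal, so its value is treated as zero.
            bootstrap = 0.0 if nxt == goal else gamma * value[nxt]
            td_error = reward + bootstrap - value[state]
            value[state] += alpha * td_error          # critic update
            prefs[state][action] += beta * td_error   # actor update
            state = nxt
            steps += 1
        steps_per_episode.append(steps)
    return value, steps_per_episode
```

Calling `run_actor_critic()` returns the learned state values and the number of steps taken in each episode; states close to the rewarded state should end up with higher values than distant ones, which is the sense in which states are evaluated by reward proximity.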

The proposed model paves the way for investigations of the role of the dynamics of the dopaminergic system in reward-based learning. For example, we can use lesion studies to analyze the effects of dopamine treatment in Parkinson's patients. Finally, the experimentally constrained model can be used as the centerpiece of closed-loop functional models.

References

  1. Schultz W, Dayan P, Montague PR: A neural substrate of prediction and reward. Science. 1997, 275: 1593-1599. 10.1126/science.275.5306.1593.

  2. Reynolds JN, Hyland BI, Wickens JR: A cellular mechanism of reward-related learning. Nature. 2001, 413: 67-70. 10.1038/35092560.

  3. Potjans W, Morrison A, Diesmann M: A spiking neural network model of an actor-critic learning agent. Neural Computation. 2009, 21: 301-339. 10.1162/neco.2008.08-07-593.

  4. Houk JC, Adams JL, Barto AG: A model of how the basal ganglia generate and use neural signals that predict reinforcement. 1995, MIT Press, Cambridge, MA

  5. Reynolds JN, Hyland BI, Wickens JR: Dopamine-dependent plasticity of corticostriatal synapses. Neural Networks. 2002, 15: 507-521. 10.1016/S0893-6080(02)00045-X.

  6. Gewaltig M-O, Diesmann M: NEST (neural simulation tool). Scholarpedia. 2007, 2: 1430.

Acknowledgements

Partially funded by EU Grant 15879 (FACETS), BMBF Grant 01GQ0420 to BCCN Freiburg, Next-Generation Supercomputer Project of MEXT, Japan, and the Helmholtz Alliance on Systems Biology.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Potjans, W., Morrison, A. & Diesmann, M. A spiking temporal-difference learning model based on dopamine-modulated plasticity. BMC Neurosci 10 (Suppl 1), P140 (2009). https://doi.org/10.1186/1471-2202-10-S1-P140

  • DOI: https://doi.org/10.1186/1471-2202-10-S1-P140
