Which Temporal Difference learning algorithm best reproduces dopamine activity in a multi-choice task?

Bellot, Jean; Khamassi, Mehdi; Sigaud, Olivier; Girard, Benoît

doi:10.1186/1471-2202-14-S1-P144

Volume 14 Supplement 1

Abstracts from the Twenty Second Annual Computational Neuroscience Meeting: CNS*2013

Poster presentation
Open access
Published: 08 July 2013

BMC Neuroscience volume 14, Article number: P144 (2013)

The activity of dopaminergic (DA) neurons has been hypothesized to encode a reward prediction error (RPE) [1], which corresponds to the error signal in Temporal Difference (TD) learning algorithms [2]. This hypothesis has been reinforced by numerous studies showing the relevance of TD learning algorithms for describing the role of the basal ganglia in classical conditioning. However, most studies have recorded DA activity during Pavlovian conditioning, so the exact nature of the signal encoded by DA neurons during a choice remains unclear. In the reinforcement learning literature, different TD learning algorithms predict different RPEs during a choice: SARSA predicts an RPE based on the future choice, Q-learning predicts an RPE based on the action that maximizes the future amount of reward, and V-learning predicts an RPE based on an average of the values of the different available options.
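The three error signals can be written side by side to make the distinction concrete. The Python sketch below is illustrative only and is not taken from the simulations reported in this abstract: the function names, the discount factor `gamma`, the dictionary representation of option values, and the use of a policy-weighted average as the V-learning target are all assumptions made for the example.

```python
# Illustrative sketch (hypothetical names and values, not the authors' code).
# q_next maps each available option to its current value estimate;
# prediction is the value estimate held before the choice is revealed.

def sarsa_rpe(reward, q_next, chosen_next, prediction, gamma=1.0):
    """RPE based on the option the agent actually chooses next."""
    return reward + gamma * q_next[chosen_next] - prediction


def q_learning_rpe(reward, q_next, prediction, gamma=1.0):
    """RPE based on the best available option, whatever is chosen."""
    return reward + gamma * max(q_next.values()) - prediction


def v_learning_rpe(reward, q_next, policy_next, prediction, gamma=1.0):
    """RPE based on an average over options.

    Assumption: the state value is approximated by the average of the
    option values weighted by the choice probabilities in policy_next.
    """
    v_next = sum(policy_next[a] * q_next[a] for a in q_next)
    return reward + gamma * v_next - prediction


# Hypothetical two-option trial in which the worse option is chosen.
q_next = {"left": 1.0, "right": 0.2}
policy_next = {"left": 0.7, "right": 0.3}
prediction = 0.5

delta_sarsa = sarsa_rpe(0.0, q_next, "right", prediction)
delta_q = q_learning_rpe(0.0, q_next, prediction)
delta_v = v_learning_rpe(0.0, q_next, policy_next, prediction)
```

In this hypothetical trial the SARSA error is negative (about -0.3), because the lower-valued option was chosen, whereas the Q-learning error (about 0.5) and the V-learning error (about 0.26) are positive and do not depend on the choice. This is the kind of qualitative difference that recordings at the time of choice could in principle discriminate.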

Recent recordings of DA neurons during multi-choice tasks investigated this issue and led to contradictory interpretations of whether the DA RPE signal is action dependent [3] or not [4]. While the first study suggests that DA neurons encode an RPE compatible with SARSA, results from the second study are interpreted as more consistent with Q-learning [4]. However, these studies offered only a qualitative comparison of the ability of these TD learning algorithms to explain the observed patterns of activity. In this work, we simulated these algorithms and analyzed them in detail against previous electrophysiological recordings in a multi-choice task performed by rats [4]. We found that, when fitted to the behavior, the simulated algorithms predict a fast convergence of the RPE that is incompatible with the observed DA activity, suggesting an apparent dissociation between the signal encoded by dopamine neurons and the behavioral adaptation of the animals. Further analyses of the evolution of dopamine neuron activity across learning indicated that, in addition to the RPE, the value function fits the activity well. However, the value function cannot explain the inhibition of DA activity during reward omission or the global decrease of DA activity at the time of reward delivery over the course of a session. Thus, in this task, information about both RPE and value may be conveyed by dopamine activity.
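The link between the speed of behavioral adaptation and the speed of RPE convergence can be illustrated with a deliberately minimal example. The sketch below is not the model fitted in this work; it assumes a single cue followed by a fixed reward, a tabular value estimate, and two hypothetical learning rates (`alpha=0.5` and `alpha=0.05`) chosen only to contrast fast and slow learning.

```python
# Minimal sketch under stated assumptions (not the authors' simulation):
# one cue, one fixed reward, and a value estimate updated by a TD rule.

def rpe_at_reward(alpha, n_trials, reward=1.0):
    """Return the RPE at reward delivery on each successive trial."""
    value = 0.0
    rpes = []
    for _ in range(n_trials):
        # The trial ends at reward, so there is no future-value term.
        delta = reward - value
        rpes.append(delta)
        value += alpha * delta
    return rpes


# Hypothetical learning rates: a high rate, as might be required to
# reproduce rapid behavioral adaptation, and a low rate for comparison.
fast = rpe_at_reward(alpha=0.5, n_trials=10)
slow = rpe_at_reward(alpha=0.05, n_trials=10)
```

In this setting the RPE at reward shrinks by a factor of `1 - alpha` on every trial, so a learning rate high enough to capture rapid changes in behavior drives the simulated RPE close to zero within a handful of trials, while a low rate leaves a sizeable RPE after the same number of trials. A persistent DA response at reward time would therefore be hard to reconcile with an RPE produced by a fast learner.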

By quantitatively comparing the different TD learning algorithms, this work shows the limits of these algorithms in fitting both the behavior and the DA activity observed in a multi-choice task when DA activity is interpreted as an RPE only. Unexpectedly, we show that a value function fits DA activity better, suggesting that DA neurons recorded in this task may encode multiple types of information.

References

1. Schultz W, Dayan P, Montague PR: A neural substrate of prediction and reward. Science. 1997, 275 (5306): 1593-1599. 10.1126/science.275.5306.1593.
2. Sutton RS, Barto AG: Introduction to Reinforcement Learning. MIT Press. 1998.
3. Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H: Midbrain dopamine neurons encode decisions for future action. Nature Neuroscience. 2006, 9 (8): 1057-1063. 10.1038/nn1743.
4. Roesch MR, Calu DJ, Schoenbaum G: Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neuroscience. 2007, 10 (12): 1615-1624. 10.1038/nn2013.

Author information

Authors and Affiliations

Institut des Systèmes Intelligents et de Robotique, Université Pierre et Marie Curie, Paris, France
Jean Bellot, Mehdi Khamassi, Olivier Sigaud & Benoît Girard
Centre National de la Recherche Scientifique, UMR 7222, Paris, France
Jean Bellot, Mehdi Khamassi, Olivier Sigaud & Benoît Girard

Corresponding author

Correspondence to Jean Bellot.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Bellot, J., Khamassi, M., Sigaud, O. et al. Which Temporal Difference learning algorithm best reproduces dopamine activity in a multi-choice task? BMC Neurosci 14 (Suppl 1), P144 (2013). https://doi.org/10.1186/1471-2202-14-S1-P144
