Stable reinforcement learning via temporal competition between LTP and LTD traces

Huertas, Marco A; Schwettmann, Sarah; Kirkwood, Alfredo; Shouval, Harel

doi:10.1186/1471-2202-15-S1-O12

Volume 15 Supplement 1

Abstracts from the Twenty Third Annual Computational Neuroscience Meeting: CNS*2014

Oral presentation
Open access
Published: 21 July 2014

BMC Neuroscience volume 15, Article number: O12 (2014)

Neuronal systems that are involved in reinforcement learning must solve the temporal credit assignment problem, i.e., how is a stimulus associated with a reward that is delayed in time? Theoretical studies [1–3] have postulated that neural activity underlying learning ‘tags’ synapses with an ‘eligibility trace’, and that the subsequent arrival of a reward converts the eligibility traces into actual modification of synaptic efficacies. While eligibility traces provide one simple solution to the temporal credit assignment problem, they alone do not constitute a stable learning rule because there is no other mechanism indicating when learning should cease. In order to attain stability, rules involving eligibility traces often assume that once the association is learned, further learning is prevented via an inhibition of the reward stimulus [1, 3, 4].
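The instability described above can be illustrated with a minimal sketch. It is not the model from the cited studies: the function name `run_single_trace`, the exponential trace, and every parameter value (`delay`, `tau`, `lr`, `w_target`) are assumptions chosen only for illustration. With a constant reward the weight grows on every trial; with a reward signal that is cancelled as the weight approaches a target, learning stops.

```python
import math


def run_single_trace(n_trials, delay=1.0, tau=2.0, lr=0.5,
                     inhibit_reward=False, w_target=1.0):
    """Toy single-eligibility-trace rule (illustrative assumptions only).

    The stimulus tags the synapse with a trace of size 1 that decays
    exponentially with time constant `tau`; the reward arrives `delay`
    seconds later and converts what is left of the trace into a weight change.
    """
    w = 0.0
    history = []
    for _ in range(n_trials):
        # Value of the eligibility trace at the time of reward.
        trace = math.exp(-delay / tau)
        reward = 1.0
        if inhibit_reward:
            # Assumed form of reward inhibition: the effective reward
            # shrinks to zero as the weight approaches w_target.
            reward = max(0.0, 1.0 - w / w_target)
        w += lr * reward * trace
        history.append(w)
    return history


if __name__ == "__main__":
    free = run_single_trace(50)
    inhibited = run_single_trace(50, inhibit_reward=True)
    print("constant reward, final weight: ", free[-1])
    print("inhibited reward, final weight:", inhibited[-1])
```

In this sketch the first run increases without bound as trials accumulate, while the second settles at `w_target`, which is the role that reward inhibition plays in the rules cited above.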

Although synaptic plasticity is responsible for reinforcement learning in the brain, theories of reinforcement learning are generally abstract and involve neither neurons nor synapses. Furthermore, biophysical theories of synaptic plasticity typically model unsupervised learning and ignore the contribution of reinforcement. Here we describe a biophysically based theory of reinforcement-modulated synaptic plasticity and postulate the existence of two eligibility traces with different temporal profiles: one corresponding to the induction of LTP, and the other to the induction of LTD. The traces have different kinetics, and the difference in their magnitudes at the time of reward determines whether the synaptic modification corresponds to LTP or LTD. Because the traces decay at different rates, they compete temporally at the reward time and thus provide a mechanism for stable reinforcement learning without the need to inhibit reward. We test this novel reinforcement-learning rule on an experimentally motivated model of a recurrent cortical network [5], and compare the model results to experimental results at both the cellular and circuit levels. We further suggest that these eligibility traces are implemented via kinases and phosphatases, thus accounting for results at both the cellular and system levels.
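The idea of temporal competition can be sketched in a few lines of Python. The abstract gives no equations, so everything here is an assumption made for illustration: the exponential decay of both traces, the amplitudes and time constants (including which trace decays faster), the linear mapping from weight to activity duration, and the names `trace_difference`, `train`, and `fixed_point_gap`. The only feature taken from the text is that the weight change at reward time is set by the difference between an LTP trace and an LTD trace, with the reward itself never inhibited.

```python
import math


def trace_difference(gap, a_ltp=1.0, a_ltd=1.5, tau_ltp=1.0, tau_ltd=0.4):
    """LTP trace minus LTD trace, `gap` seconds after activity ends.

    Assumed kinetics: the LTD trace starts larger but decays faster,
    so LTD wins for short gaps and LTP wins for long gaps.
    """
    ltp = a_ltp * math.exp(-gap / tau_ltp)
    ltd = a_ltd * math.exp(-gap / tau_ltd)
    return ltp - ltd


def train(w0, n_trials=200, t_reward=1.0, lr=0.2):
    """Run repeated rewarded trials and return the final weight."""
    w = w0
    for _ in range(n_trials):
        # Hypothetical link between weight and network activity: a stronger
        # weight keeps the network active longer, up to the reward time.
        duration = min(max(w, 0.0), t_reward)
        gap = t_reward - duration
        reward = 1.0  # the reward is always delivered at full strength
        w += lr * reward * trace_difference(gap)
    return w


def fixed_point_gap(a_ltp=1.0, a_ltd=1.5, tau_ltp=1.0, tau_ltd=0.4):
    """Gap at which the two traces are equal, so the weight stops changing."""
    return math.log(a_ltd / a_ltp) / (1.0 / tau_ltd - 1.0 / tau_ltp)


if __name__ == "__main__":
    target = 1.0 - fixed_point_gap()
    print("predicted stable weight:", target)
    print("from a weak start:      ", train(0.1))
    print("from a strong start:    ", train(1.0))
```

Under these assumptions a weak synapse is potentiated and a strong one is depressed until both reach the weight at which the two traces cancel at the reward time, so learning stops even though the reward signal stays constant.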

References

1. Sutton RS, Barto AG: Reinforcement Learning. 1990, Cambridge, MA: MIT Press.
2. Izhikevich EM: Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex. 2007, 17 (10): 2443-2452. 10.1093/cercor/bhl152.
3. Gavornik JP, Shuler MG, Loewenstein Y, Bear MF, Shouval HZ: Learning reward timing in cortex through reward dependent expression of synaptic plasticity. Proc Natl Acad Sci U S A. 2009, 106 (16): 6826-6831. 10.1073/pnas.0901835106.
4. Rescorla RA, Wagner AR: A theory of Pavlovian conditioning: The effectiveness of reinforcement and non-reinforcement. Classical Conditioning II: Current Research and Theory. Edited by: AH Black & WF Prokasy. 1972, New York: Appleton-Century-Crofts, 64-69.
5. Shuler MG, Bear MF: Reward timing in the primary visual cortex. Science. 2006, 311 (5767): 1606-1609. 10.1126/science.1123513.

Author information

Authors and Affiliations

Dep. Neurobiology and Anatomy, University of Texas Medical School, Houston, TX, 77030, USA
Marco A Huertas, Sarah Schwettmann & Harel Shouval
Dep. Computational and Applied Mathematics, Rice University, Houston, TX, 77005, USA
Sarah Schwettmann
Mind/Brain Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
Alfredo Kirkwood

Corresponding author

Correspondence to Marco A Huertas.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Huertas, M.A., Schwettmann, S., Kirkwood, A. et al. Stable reinforcement learning via temporal competition between LTP and LTD traces. BMC Neurosci 15 (Suppl 1), O12 (2014). https://doi.org/10.1186/1471-2202-15-S1-O12
