Policy gradient rules for populations of spiking neurons
© Friedrich et al; licensee BioMed Central Ltd. 2011
Published: 18 July 2011
Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions despite high neuronal variability. Here, we present two general recipes for deriving learning rules from a policy gradient approach for different neural codes and decision-making networks: one based on partial integration across feature values, and one based on linear approximation around a target feature. The first technique leads to a tightly code-specific learning rule in which details of the code-irrelevant spiking information are integrated away and the code-specificity enters at the synaptic level. The second technique yields modular learning rules that can be weakly code-specific, with a spike-timing-dependent base synaptic plasticity rule that is modulated by a code-specific population and decision signal. Decisions can be binary, multi-valued, or even continuous-valued. For illustration, we consider a spike count code and a spike latency code. We test the rules on simple model problems and assess the superiority of tight over weak code-specificity with respect to performance. While code-specific rules increase performance only marginally when considering a single neuron, our tightly code-specific rule designed for population coding can strongly boost performance. Both code-specific learning rules improve in performance with increasing population size, as opposed to standard reinforcement learning. For mathematical clarity, we present the rules for an episodic learning scenario, but a biologically plausible implementation of a fully online scheme is also possible [2, 3].
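To make the baseline concrete, the following is a minimal sketch of standard episodic policy gradient learning (REINFORCE) in a population that makes a binary decision. It is not the code-specific rules of this abstract. Everything in it is an assumption made for illustration: each neuron is reduced to a Bernoulli-logistic unit that either spikes or stays silent in an episode, the population decision is a majority vote over spikes, the reward is +1 or -1, and the names `spike_probability`, `reinforce_update`, `population_decision`, `run_episode`, and `train` are hypothetical.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def spike_probability(w, x):
    # Assumed neuron model: spikes once per episode with probability sigmoid(w . x).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def reinforce_update(w, x, spiked, p, reward, eta):
    # For a Bernoulli-logistic unit, d/dw log P(spiked | x) = (spiked - p) * x.
    # The policy gradient step scales this score by the global reward.
    return [wi + eta * reward * (spiked - p) * xi for wi, xi in zip(w, x)]


def population_decision(spikes):
    # Majority vote: +1 if more than half of the neurons spiked, else -1.
    return 1 if 2 * sum(spikes) > len(spikes) else -1


def run_episode(weights, x, target, eta, rng):
    probs = [spike_probability(w, x) for w in weights]
    spikes = [1 if rng.random() < p else 0 for p in probs]
    reward = 1.0 if population_decision(spikes) == target else -1.0
    # Every neuron receives the same scalar reward, whatever its own contribution.
    new_weights = [
        reinforce_update(w, x, s, p, reward, eta)
        for w, s, p in zip(weights, spikes, probs)
    ]
    return new_weights, reward


def train(n_neurons=5, n_episodes=2000, eta=0.1, seed=0):
    rng = random.Random(seed)
    # Two toy input patterns with opposite target decisions (illustrative only).
    patterns = [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 1.0], -1)]
    weights = [[0.0, 0.0, 0.0] for _ in range(n_neurons)]
    rewards = []
    for _ in range(n_episodes):
        x, target = rng.choice(patterns)
        weights, reward = run_episode(weights, x, target, eta, rng)
        rewards.append(reward)
    return weights, rewards
```

Calling `train()` returns the learned weights and the per-episode rewards. Because every neuron is updated with the same global reward, a neuron that voted against a correct majority is still reinforced for its vote; rules that additionally use population and decision signals, such as those in [2, 3], are aimed at this credit assignment problem.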
1. Sprekeler H, Hennequin G, Gerstner W: Code-specific policy gradient rules for spiking neurons. Advances in Neural Information Processing Systems. Edited by: Bengio Y, Schuurmans D, Lafferty J, Williams CKI, Culotta A. 2009, 22: 1741-1749.
2. Urbanczik R, Senn W: Reinforcement learning in populations of spiking neurons. Nature Neuroscience. 2009, 12 (3): 250-252. 10.1038/nn.2264.
3. Friedrich J, Urbanczik R, Senn W: Learning spike-based population codes by reward and population feedback. Neural Computation. 2010, 22 (7): 1698-1717. 10.1162/neco.2010.05-09-1010.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.