Is self-control a learned strategy employed by a reward maximizing brain?
© Cleanthous and Christodoulou; licensee BioMed Central Ltd. 2009
Published: 13 July 2009
Self-control can be defined as choosing a large delayed reward over a small immediate reward. Brain-imaging studies have shown that such decisions involve two separate neural systems and that the resulting behavior reflects competition between them. In particular, parts of the limbic system are preferentially activated by decisions involving immediate rewards, whereas regions of the prefrontal cortex are engaged uniformly by intertemporal choices irrespective of delay. Moreover, the subjects' choices were directly linked to the relative activation of the two systems. As Kavka suggests, such inner conflicts may be resolved as if they were the result of strategic interaction among rational subagents.
We propose a computational model of interpersonal conflict in which two spiking neural networks, learning simultaneously but independently, compete as the two players of the Iterated Prisoner's Dilemma (IPD) game. One interpretation of the IPD is that it demonstrates interpersonal conflict, with the Cooperate-Cooperate (CC) outcome corresponding to self-controlled behavior. The outcome of each round of the game is determined by the relative output activation. The aim is for the system to learn to exhibit self-control through biologically plausible reinforcement learning. To the best of our knowledge, this work is the first to implement a game-theoretical view of self-control in a computational system that learns through biologically plausible algorithms.
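The round structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the payoff values (the conventional T=5, R=3, P=1, S=0), the use of one output unit per action, the example firing rates, and the rule that ties count as defection are all assumptions, and the spiking networks themselves are replaced by fixed output activations.

```python
from typing import Dict, Tuple

# Hypothetical payoff matrix: the abstract does not state its values, so the
# conventional Prisoner's Dilemma payoffs T=5 > R=3 > P=1 > S=0 are assumed.
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (3, 3),  # mutual cooperation (the self-control outcome)
    ("C", "D"): (0, 5),  # player 1 cooperates, player 2 defects
    ("D", "C"): (5, 0),  # player 1 defects, player 2 cooperates
    ("D", "D"): (1, 1),  # mutual defection
}


def choose_action(rate_c: float, rate_d: float) -> str:
    """Return "C" or "D" from the relative activation of two output units.

    Assumption: one output unit per action; ties are resolved as defection.
    """
    return "C" if rate_c > rate_d else "D"


def play_round(
    outputs_1: Tuple[float, float], outputs_2: Tuple[float, float]
) -> Tuple[str, Tuple[int, int]]:
    """Play one IPD round given each network's (rate_C, rate_D) outputs."""
    a1 = choose_action(*outputs_1)
    a2 = choose_action(*outputs_2)
    return a1 + a2, PAYOFFS[(a1, a2)]


# Example output firing rates (Hz), invented purely for illustration.
outcome, rewards = play_round((12.0, 4.0), (7.5, 9.0))
print(outcome, rewards)  # CD (0, 5)
```

In the model itself, each network's output activations would come from its spiking dynamics and change as learning proceeds; here they are fixed numbers so that only the mapping from relative activation to game outcome is shown.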
Learning in our system links behavior to the synaptic level by reinforcing stochastic synaptic transmission. The results show that the system maximized its reward by establishing strongly self-controlled behavior, reflected in the dominance of the CC outcome. Notably, the self-control outcome not only persisted through the final rounds of the games but also remained unchanged after the 100th round, because by then the system's dynamics had evolved to produce it consistently. This indicates that, after a certain point, the networks learned that it is in their own interest to compromise in order to maximize their long-term reward. Preliminary results suggest that the system's performance, especially its adaptability, is further enhanced when reinforcement learning through modulated Spike-Timing-Dependent Plasticity [6, 7] is integrated into the system. Overall, our results indicate that self-control is a learned strategy employed by a reward-maximizing brain in the presence of competing neural systems, and that it results in the regulated activation of the respective systems.
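The idea of reinforcing stochastic synaptic transmission can be illustrated with a heavily simplified, discrete-time sketch. It is loosely modeled on Seung's hedonistic-synapse idea but is not his exact rule (it drops spike timing, eligibility decay, and any dependence of release on presynaptic history). Here a synapse makes one release decision per trial with probability p = sigmoid(q), stores an eligibility equal to (released - p), and updates q by eta * reward * eligibility, which makes it a REINFORCE-style rule. The class name `StochasticSynapse`, the learning rate, and the +1/-1 reward scheme are hypothetical.

```python
import math
import random
from typing import Optional


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class StochasticSynapse:
    """Hypothetical synapse with stochastic, reward-reinforced vesicle release.

    Assumptions (not taken from the abstract): one release decision per
    trial, release probability p = sigmoid(q), eligibility e = released - p,
    and the update q <- q + eta * reward * e.
    """

    def __init__(self, q: float = 0.0, eta: float = 0.5,
                 rng: Optional[random.Random] = None) -> None:
        self.q = q              # parameter controlling release probability
        self.eta = eta          # learning rate (hypothetical value)
        self.rng = rng if rng is not None else random.Random()
        self.eligibility = 0.0  # credit assigned to the last release decision

    @property
    def p(self) -> float:
        """Current probability of vesicle release."""
        return sigmoid(self.q)

    def transmit(self) -> int:
        """Stochastically release (1) or fail (0), and record eligibility."""
        p = self.p
        released = 1 if self.rng.random() < p else 0
        self.eligibility = released - p  # (1 - p) on release, -p on failure
        return released

    def reinforce(self, reward: float) -> None:
        """Make positively rewarded outcomes more likely, punished ones less."""
        self.q += self.eta * reward * self.eligibility


# Hypothetical reward scheme: +1 when the synapse releases, -1 when it fails.
syn = StochasticSynapse(rng=random.Random(0))
for _ in range(2000):
    released = syn.transmit()
    syn.reinforce(1.0 if released else -1.0)
print(f"release probability after training: {syn.p:.3f}")
```

With this +1/-1 scheme both outcomes push q upward (by eta * (1 - p) after a release and by eta * p after a failure), so the release probability climbs toward 1. Under other reward schemes the same update follows the reward gradient only on average.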
We gratefully acknowledge the support of the University of Cyprus for a Small Size Internal Research Programme grant and the Cyprus Research Promotion Foundation as well as the European Union Structural Funds for grant PENEK/ENISX/0308/82.
- Rachlin H: The Science of Self-Control. 2000, Cambridge, MA: Harvard University Press.
- McClure SM, Laibson DI, Loewenstein G, Cohen JD: Separate neural systems value immediate and delayed monetary rewards. Science. 2004, 306: 503-507. 10.1126/science.1100907.
- Kavka G: Is individual choice less problematic than collective choice? Economics and Philosophy. 1991, 7: 143-165.
- Seung HS: Learning in spiking neural networks by reinforcement of synaptic transmission. Neuron. 2003, 40: 1063-1073. 10.1016/S0896-6273(03)00761-X.
- Christodoulou C, Banfield G, Cleanthous A: Self-control with spiking and non-spiking neural networks playing games. Journal of Physiology (Paris).
- Florian RV: Reinforcement learning through modulation of spike-timing dependent synaptic plasticity. Neural Computation. 2007, 19: 1468-1502. 10.1162/neco.2007.19.6.1468.
- Legenstein R, Pecevski D, Maass W: A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback. PLoS Computational Biology. 2008, 4: e1000180. 10.1371/journal.pcbi.1000180.