  • Poster presentation
  • Open access

Coordination of adaptive working memory and reinforcement learning systems explaining choice and reaction time in a human experiment

Contemporary behavioral learning theory provides a comprehensive description of how humans and other animals learn, and places behavioral flexibility and automaticity at the heart of adaptive behavior. However, to our knowledge, the computations supporting the interactions between deliberative and habitual decision-making systems are still poorly understood. Previous functional magnetic resonance imaging (fMRI) results suggest that the dorsal striatum hosts complementary computations that may differentially support deliberative and habitual processes [1], in the form of a dynamic interplay rather than a serial recruitment of strategies. Using the same instrumental task, we develop a dual-system computational model that predicts both performance (i.e., participant choices) and modulations in reaction times during learning. The instrumental task is a trial-and-error learning task requiring participants to find the correct associations between color stimuli and finger responses.

To model the habitual system, we use a simple Q-learning algorithm (QL) [2], which responds quickly but converges slowly. For the deliberative (i.e., goal-directed) system, we propose a new Bayesian Working Memory (BWM) model, which searches for information in the history of previous trials and stops as soon as the uncertainty about the action to perform falls below a given threshold. Currently, most models of system selection tend to control action selection concurrently, using either the deliberative or the habitual model according to uncertainty criteria [3, 4]. Only one model has investigated the relation between working memory and reinforcement learning [5], without, however, explicitly modeling the temporal aspect of memory manipulation. Finally, we propose a model for QL and BWM coordination, in which the two are merged such that the expensive memory manipulation is under the control of, among other factors, the level of convergence of habitual learning. Consequently, we also predict specific reaction times for each model, which can be compared with the evolution of reaction times in instrumental learning tasks.
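The two components can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the BWM equations, so the sketch assumes Shannon entropy as the uncertainty measure, a simple elimination rule for unrewarded actions, and five possible responses. The names `q_update`, `memory_search`, `N_ACTIONS`, and the parameter values are hypothetical. Because each trial has no successor state, the Q-learning update is written without a discounted next-state term.

```python
import math

N_ACTIONS = 5  # assumed number of finger responses; not stated in the abstract


def q_update(q, stimulus, action, reward, alpha=0.1):
    """One Q-learning step for a single-step task (no successor state)."""
    q[stimulus][action] += alpha * (reward - q[stimulus][action])


def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def normalize(weights):
    """Turn non-negative weights into probabilities (uniform if all zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def memory_search(history, stimulus, threshold=1.0):
    """Scan past trials, newest first, until action uncertainty is low enough.

    history: list of (stimulus, action, reward) tuples, oldest first.
    Returns the action distribution and the number of items inspected,
    which serves here as a crude proxy for deliberation time.
    """
    weights = [1.0] * N_ACTIONS  # start from a uniform belief over actions
    inspected = 0
    for past_stimulus, past_action, past_reward in reversed(history):
        if entropy(normalize(weights)) < threshold:
            break  # confident enough: stop the costly search
        inspected += 1
        if past_stimulus != stimulus:
            continue  # this memory item is about another stimulus
        if past_reward > 0:
            # assumption: a rewarded action identifies the correct response
            weights = [1.0 if a == past_action else 0.0
                       for a in range(N_ACTIONS)]
        else:
            weights[past_action] = 0.0  # assumption: eliminate failed action
    return normalize(weights), inspected
```

In this sketch, a larger `inspected` count stands in for a longer reaction time, whereas a lookup in the `q` table costs the same on every trial. A coordination scheme along the lines described above could, for instance, shorten the search as the Q-values converge; how the actual model does this is not detailed in the abstract.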

Models are optimized for each subject with the NSGA-2 multi-objective evolutionary algorithm. The first fitness function is the Bayesian Information Criterion for individual choices. The second fitness function is also likelihood-based and favors models whose reaction times are similar to those of the participants. We compare the ability of the new model to explain human behavior with that of QL or BWM alone, as well as with that of a combination of these models based on [4]; the comparison reveals that the proposed model is in general more accurate. To conclude, we suggest that a close combination of BWM and QL better explains both choices and reaction times for most participants.
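As a rough illustration of the two-objective setting, the sketch below computes the Bayesian Information Criterion from a choice log-likelihood and filters a set of candidate parameterizations down to its non-dominated (Pareto-optimal) subset, the notion of optimality on which NSGA-2 ranking is built. It is not a full NSGA-2 implementation (no crowding distance, selection, or variation operators). It assumes both objectives are expressed as quantities to minimize, and the function names and example scores are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_trials):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood


def dominates(a, b):
    """True if score vector a is no worse than b everywhere and better somewhere.

    Both vectors hold objectives to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the score vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (choice objective, reaction-time objective) scores
# for four candidate parameter sets of one subject.
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(candidates)
```

Here `(3.0, 3.0)` is dropped because `(2.0, 2.0)` is better on both objectives; the remaining three candidates represent different trade-offs between fitting choices and fitting reaction times.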


References

  1. Brovelli A, Nazarian B, Meunier M, Boussaoud D: Differential roles of caudate nucleus and putamen during instrumental learning. NeuroImage. 2011, 57 (4): 1580-1590. 10.1016/j.neuroimage.2011.05.059.

  2. Watkins C, Dayan P: Q-learning. Machine Learning. 1992, 8: 279-292.

  3. Daw ND, Niv Y, Dayan P: Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience. 2005, 8 (12): 1704-1711. 10.1038/nn1560.

  4. Keramati M, Dezfouli A, Piray P: Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Computational Biology. 2011, 7 (5): e1002055. 10.1371/journal.pcbi.1002055.

  5. Collins A, Frank MJ: How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational and neurogenetic analysis. European Journal of Neuroscience. 2012, 35 (7): 1024-1035. 10.1111/j.1460-9568.2011.07980.x.

Author information


Corresponding author

Correspondence to Guillaume D Viejo.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Viejo, G.D., Khamassi, M., Brovelli, A. et al. Coordination of adaptive working memory and reinforcement learning systems explaining choice and reaction time in a human experiment. BMC Neurosci 15 (Suppl 1), P156 (2014).
