Long-term addicts find themselves powerless to resist drugs, despite knowing that drug-taking may be harmful and despite an explicit motivation to quit. In controlled experiments, human addicts show a self-described mistake characterized by an inconsistency between their drug-seeking responses and their reported subjective value of the drug. We provide a unified computational theory for this inconsistency by showing how addictive drugs gradually produce a motivational bias toward drug-seeking in low-level habitual decision processes, even though the values held at abstract cognitive levels remain low. This pathology emerges within the hierarchical reinforcement learning (HRL) framework when chronic drug exposure pharmacologically hijacks the dopaminergic spirals that cascade the reinforcement signal down the ventro-dorsal cortico-striatal hierarchy.
Here, $r_t$ is the rewarding value of the outcome, be it a natural reward or an addictive drug. These equations show that, in order to compute the prediction error signal for updating the value ($Q$) of state-action pairs at the $n$-th level of the decision hierarchy, the value of the temporally advanced state ($s_{t+1}$) comes from one higher level of abstraction ($n+1$). This captures the role of the dopamine-dependent serial connectivity linking the ventral to the dorsal striatum (known as dopamine spirals), which is suggested to integrate information across the segregated cortico-basal ganglia loops, thereby allowing more abstract levels to tune the reinforcement signal used at more detailed levels. The pharmacological effect of addictive drugs, which increase the extracellular concentration of dopamine within the striatum, is incorporated into this model by adding a positive term $D$ to the prediction error signal. Simulation results (Figure 1) show that drug-induced dopamine release biases the transfer of the reinforcement signal from one level of abstraction to the next. The accumulation of these biases along the rostro-caudal axis progressively induces a significant discrepancy in the value of drug-seeking behaviors between the top and bottom extremes of the hierarchy, and thereby an inconsistency between cognitive plans and motor-level habits.
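As a rough illustration of how such a bias could accumulate across levels, the following toy sketch may help. It is not the authors' simulation: it assumes a single-step task in which the same drug-seeking choice is valued at every level, a top level that learns from the outcome with a standard prediction error (no drug term), and lower levels whose teaching target is simply the current value held one level above, plus the drug term. The names `simulate_hierarchy`, `drug_bias`, `alpha`, `n_levels`, and `n_trials` are hypothetical.

```python
def simulate_hierarchy(reward, drug_bias, n_levels=5, alpha=0.1, n_trials=500):
    """Toy cascade of prediction errors down a decision hierarchy.

    Returns the learned values of one choice at every level,
    with index 0 the most abstract level and the last index the
    most detailed (habitual) level.

    Assumptions (not taken from the source):
      * level 0 uses delta = reward - Q[0] (no drug term);
      * level n > 0 uses delta = Q[n-1] - Q[n] + drug_bias,
        i.e. its target is the value one level of abstraction up,
        shifted by the pharmacological term D (here `drug_bias`).
    """
    q = [0.0] * n_levels
    for _ in range(n_trials):
        prev = list(q)  # snapshot so every level updates from the same trial
        # Most abstract level: ordinary prediction error on the outcome.
        q[0] += alpha * (reward - prev[0])
        # Lower levels: reinforcement signal handed down from the level above.
        for n in range(1, n_levels):
            delta = prev[n - 1] - prev[n] + drug_bias
            q[n] += alpha * delta
    return q


if __name__ == "__main__":
    natural = simulate_hierarchy(reward=1.0, drug_bias=0.0)
    drug = simulate_hierarchy(reward=1.0, drug_bias=0.2)
    print("natural reward:", [round(v, 3) for v in natural])
    print("drug reward:   ", [round(v, 3) for v in drug])
```

Under these assumptions, a natural reward (`drug_bias=0.0`) leaves all levels in agreement, whereas a positive `drug_bias` makes each level settle one increment above the level that teaches it, so the gap between the most abstract and the most detailed level grows with the depth of the hierarchy.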
Besides this central phenomenon, our model also accounts for several behavioral and neurobiological aspects of addiction, such as the gradual insensitivity of drug-seeking to drug-associated punishments (compulsivity), the delayed development of cue-elicited dopamine efflux in addicts' dorsal striatum, and the occurrence of the blocking effect for drug rewards. It also suggests key testable predictions and, beyond that, sets the stage for a view of addiction as a pathology of hierarchical decision-making processes.
Haruno M, Kawato M: Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. Neural Networks. 2006, 19: 1242-1254. 10.1016/j.neunet.2006.06.007.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.