This study compared the behavioral reaction to rarefaction of large reinforcement in a pair of similar two-choice operant tasks, in which the preference of food-restricted rats for large versus small rewards is assessed. It should be underlined that the present animals were trained to consume their daily meal during the testing session and shortly after being placed back into the home cage. They thus faced a 23 h/day absence of food, and were hence motivated to work for food during operant testing. While undergoing this procedure, which is widely adopted in the literature, rats experience neither starvation nor malnutrition, since they fully express a normal behavioral repertoire in the home cage, including playful social behavior. The attributes of the food reward in the two tests share some similarities and some key differences. Namely, the small reward (SS) is always certain and delivered immediately, whereas delivery of the large reward may be either delayed (ID protocol) or uncertain (PD protocol). The specific values, to be run in each daily session, were selected after an a priori calculation of the correspondence between delays and odds/probability, aimed at producing a similar level of rarefaction of LL/L delivery in both protocols. This ensured that the striking behavioral differences generated by the two protocols could not be explained by gross differences in the maximal frequency of large rewards potentially obtained, and must instead be ascribed to other intrinsic key features. An a posteriori re-estimation of the equivalence between odds/probability and delays, which shifted the whole curve generated by the ID task towards the left, further strengthened the differences between the choice profiles in the two tasks.
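For readers wishing to verify the arithmetic behind such an equivalence, a minimal sketch is given below. It assumes only the standard odds-against definition used in this literature, odds = (1 − p)/p; the odds values are those discussed in the text, while the script itself is purely illustrative and not the actual a priori calculation performed in the study:

```python
# Correspondence between odds against reward and delivery probability:
# odds = (1 - p) / p  <=>  p = 1 / (odds + 1)

def probability_from_odds(odds: float) -> float:
    """Delivery probability p implied by a given odds-against value."""
    return 1.0 / (odds + 1.0)

def expected_trials_per_reward(odds: float) -> float:
    """Mean number of LLL choices per delivery (geometric distribution: 1/p)."""
    return odds + 1.0

for odds in [1, 2, 3, 4, 5, 6]:
    p = probability_from_odds(odds)
    print(f"odds = {odds}: p = {p:.3f}, "
          f"omission rate = {1 - p:.1%}, "
          f"choices per reward = {expected_trials_per_reward(odds):.0f}")
# At odds = 5 the omission rate is ~83%, consistent with the statement
# below that over 80% of nose-poking demands triggered no food delivery.
```

Note that at odds = 5 and 6 a delivery occurs, on average, only once in every six or seven large-reward choices, which is the sense in which these schedules "rarefy" the large reinforcer.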
Indeed, testing animals up to a delay of 150 s within the ID task, which would be substantially equivalent to the odds = 6 reached in the PD task, would likely produce a further enhancement of the dissociation observed between the two profiles.
The experimental observations reported here demonstrate that the profile shown in PD protocols may differ substantially from the classical intolerance/aversion observed in ID protocols. As classically reported, animals in the ID protocol developed a robust aversion for the longest delays, and showed a marked shift from LL to SS choice. At this point, animals have likely realized that there is nothing they can do to avoid the elapsing of the delay interval, and are therefore likely to develop an anti-economical intolerance (Evenden, 1999; Ho et al., 1999; Monterosso and Ainslie, 1999). Conversely, animals tested in the PD protocol did not show the same profile. As a reaction to an increasing proportion of omitted reinforcements, animals developed and maintained a robust preference for the LLL reward. Such behavior may in part be explained by the fact that chamber and magazine lights, required to signal the start and end of the time-out period, were also turned on when food delivery was omitted. These lights may indeed act as conditioned reinforcers, sustaining LLL-seeking behavior until the primary reinforcer eventually comes (as in second-order schedules, see Everitt and Robbins, 2000). These considerations help formulate a possible explanation, namely that training with the uncertainty challenge was gradual enough for rats to realize that large reinforcement eventually comes, despite the repeated and unpredictable omissions. Preference for LLL up to odds of 5 and 6 is striking, since over 80% of these nose-poking demands did not trigger any food delivery. It is intriguing that the average waiting time for the eventual LLL reward was 60 s or more, a delay which produced considerable aversion for the LL reward when assessed in the ID protocol.
It is noteworthy that animals faced with either delays or omissions reacted in two distinct ways. After a substantial similarity at the mildest levels of LL/L rarefaction, the choice curves began to diverge beyond odds = 1, which was substantially equivalent to a delay of 30 s. In our opinion, the crucial difference between LL and LLL rewards is that the former is delayed in a signaled and predictable manner, whereas the delay in obtaining the latter is completely unpredictable and luck-linked (Cardinal et al., 2000). Indeed, once triggered, the delay in the ID protocol must run its scheduled duration, and there is nothing the animals can do about it. This situation is likely to induce a state of frustration, thus generating intolerance/aversion for delays. Conversely, after each omitted large-reward delivery in the PD protocol, animals have the possibility to express another nose-poking choice. In general, there are more opportunities for nose-poking choices in the PD protocol, and this may help animals to feel that the situation is more "under control". A great number of studies have shown that the controllability of stress sources helps animals to cope with adverse contingencies, and hence uncertainty of the LLL reward should not necessarily be expected to produce aversion.
From our data, it seems that LLL rewards are actively preferred by rats. Animals apparently adopted the strategy of waiting for a "lucky" but "rare" event, rather than collecting many smaller reinforcements. There are two possible explanations for this finding. One is that animals developed a "habit"; the other is that rats were perhaps attracted by the "binge" reinforcement, unaffected by either its uncertainty or its rarefaction. As for the first possible explanation, the finding of minor effects on choice behavior in the PD protocol may indicate that the choice of nose-poking hole continued to occur somewhat independently of the outcome. This could suggest that rats developed a compulsive and perseverant hole preference, possibly due to the establishment of a behavioral habit. In other words, normal rats seem to express a sort of fixed habit-based responding, rather than being open to an evaluation of the actual outcome. Accordingly, recent findings (Adriani et al., 2004) raise the question of behavioral inflexibility (with a tendency to perseveration) shown during preference shifts, and suggest that some individuals may be less flexible than others (Evenden, 2002; Laviola et al., 2003). However, whilst PD rats were quite insensitive to LLL rarefaction, their ID-protocol siblings quickly displayed intolerance of large-reinforcement delay. ID subjects showed a shift to SS choice, a finding fully consistent with impulsivity-based responding, according to previous work (Evenden and Ryan, 1996, 1999). It seems unlikely that PD rats were characterized by a lack of flexibility when their ID siblings were able to shift their choice. Thus, we propose that the PD protocol generates a true "instinctive" preference for LLL rewards. Notably, our present findings unequivocally suggest that rats prefer a random-coming reward, delivered eventually and all at once, over a similar overall gain collected by slow accumulation of smaller unitary bits.
In PD rats, the salient cue for decision seems to reside in the attractiveness of the unitary reward's size, rather than in the frequency of its delivery or in the total amount gained (or lost) over time.
Classical definitions of impulsivity predict that a given factor (delay, uncertainty) will discount the large-reward value, and that a choice shift is then observed against the economic convenience of the outcome (Evenden, 1999; Ho et al., 1999; Monterosso and Ainslie, 1999). The phenomenon of uncertainty-produced discounting did not occur in our hands. It is likely that animals would eventually show a shift towards SS, at least for "p" values much lower than those presently tested. However, a shift to the SS reward would be economically convenient under these conditions, and therefore cannot be used as an index of impulsive behavior. Indeed, only anti-economical choices can be considered an unbiased and reliable index in the field of impulsive decision making (Evenden, 1999; Ho et al., 1999; Monterosso and Ainslie, 1999). Such considerations also apply to the interpretation of data by Mobini and colleagues (2000, 2002). These authors employed a "small"/"large" reward ratio of 0.5 (mathematical indifference point at "p" = 50%) and tested rats under a range of "p" values well beyond this point (Mobini et al., 2000, 2002). In our opinion, by promoting a preference shift according to (rather than against) economic convenience, these features bias the preference shift as a parameter for impulsive decision making. Abnormalities in the process of large-reward discounting, observed by these authors in rats after various manipulations, might well be discussed in terms of an altered perception of reward magnitudes and/or deficits in comparing the global payoff of either choice option. In other words, their protocol contingencies may be useful to evaluate abnormalities in the emergence of an economically-forced shift in choice behavior, but are conversely not suitable, by definition, to measure impulsive choice.
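The economic-convenience argument can be made explicit with a short sketch. With a small/large reward ratio r, the expected payoff of the uncertain large option (p × large) equals that of the small one at p = r; only shifts to SS occurring above this indifference point are anti-economical. The 0.5 ratio below is the one reported for Mobini and colleagues; the probe probabilities are illustrative values chosen for the example:

```python
def indifference_probability(small: float, large: float) -> float:
    """Probability at which the expected large payoff (p * large) equals small."""
    return small / large

def shift_to_small_is_economical(p: float, small: float, large: float) -> bool:
    """True when the certain small reward yields a higher expected payoff."""
    return small > p * large

# A small/large ratio of 0.5 places the indifference point at p = 50%.
print(indifference_probability(small=0.5, large=1.0))  # 0.5

# Testing below this point (e.g. p = 0.25) makes a shift to SS economically
# convenient, so such a shift cannot index impulsive (anti-economical) choice.
print(shift_to_small_is_economical(p=0.25, small=0.5, large=1.0))  # True
print(shift_to_small_is_economical(p=0.75, small=0.5, large=1.0))  # False
```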
Two considerations must be highlighted here regarding models of reward discounting. It is assumed that, when animals discount the value of a delayed or uncertain reward, the value of the SS reward is then compared against the discounted value of the LL/L reward to reach a decision. These considerations are put forward by also assuming that the perception of reward size is equivalent in both paradigms, and that odds against rewarding may be compared across the two protocols. A first consideration is that delay-induced discounting is apparently a more reliable phenomenon than uncertainty-induced discounting. In other words, rats poorly cope with delay and inescapable waiting constraints, whereas they accept unpredictable omissions of reward delivery. Moreover, under the present conditions, where the final net foraging was not affected, rats preferred to work for "binge wins", even at high levels of rarefaction, rather than shifting towards a SS-seeking strategy. The latter, consisting of nose-poking for a lower and constant outcome, would imply a flatter distribution of food gains. This observation again suggests that very similar temporal distributions, made of bouts of five pellets coming far apart from each other, are potentially generated in both tasks. However, such a distribution can be avoided when the distance between bouts is an inflexible constraint (ID task), and can conversely be preferred when its length varies in an unpredictable way (PD task). One possible explanation for the preference displayed in the latter task is that LLL-choice behaviour may be reinforced by the contrast between many no-food trials and the eventual five-pellet one. Such contrasts have been shown to increase dopaminergic activity in the midbrain (Fiorillo et al., 2003), and may support risk-taking behaviour (Van den Bos et al., 2005).
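The comparison assumed by such models can be sketched with the hyperbolic discounting functions of the multiplicative hyperbolic model (Ho et al., 1999): V = A/(1 + K·d) for a delay d, and V = A/(1 + H·θ) for odds-against θ. The five-pellet large reward follows the text above, whereas the one-pellet SS and the parameter values K and H are illustrative assumptions, not fitted estimates:

```python
def delay_discounted_value(amount: float, delay_s: float, K: float) -> float:
    """Hyperbolic delay discounting: V = A / (1 + K * d)."""
    return amount / (1.0 + K * delay_s)

def odds_discounted_value(amount: float, odds: float, H: float) -> float:
    """Hyperbolic odds discounting: V = A / (1 + H * theta)."""
    return amount / (1.0 + H * odds)

def prefers_large(small_value: float, large_discounted: float) -> bool:
    """The assumed decision rule: choose LL/L iff its discounted value exceeds SS."""
    return large_discounted > small_value

# Illustrative: 5-pellet LL/L versus 1-pellet SS, with arbitrary K = H = 0.2.
v_delayed = delay_discounted_value(amount=5.0, delay_s=60.0, K=0.2)  # 5/13, ~0.38
v_uncertain = odds_discounted_value(amount=5.0, odds=5.0, H=0.2)     # 5/2 = 2.5
print(prefers_large(1.0, v_delayed))    # False: 60 s delay discounts below SS
print(prefers_large(1.0, v_uncertain))  # True: odds = 5 does not
```

With these arbitrary but equal discounting rates, a 60 s delay drives the large-reward value below the SS value while odds = 5 does not, which is one way such models could accommodate the dissociation reported here.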
A second consideration is that there might be a pitfall in those models in which the criteria for decision reside in a comparison between the unitary value of the SS reward and the unitary discounted value of the LL/L reward. This assumption may hold for a sort of "instinctive" decision process, in which two alternative unitary values are first discounted and then compared. But more "evolved" decision processes may occur, in which animals may be capable of taking into account the global convenience of a choice strategy, by considering delivery failures and the food amount earned (or lost) across multiple choices. This kind of process requires the ability to consider, beyond each single choice, the estimated quantity of food gained by having access to many choices during a given time interval. Without taking this process into account, conceptual models of decision making may be limited. In this light, the PD task adopted here is suggested to unveil an attraction for "binge and rare" over "low and constant" reinforcement, when rarefaction is obtained by random omissions (rather than a constant delay) of reinforcer delivery. This behavioral drive, consisting of a willingness to seek a highly rewarding sensation despite its association with some negative features (such as random rarefaction in the present case), is not surprising. Much effort has been devoted to studying the determinants of risk-seeking drives, in terms of neurobiological substrates and adaptive function (Bardo et al., 1996; Laviola et al., 1999). The present observations may open new perspectives for studies in the field of peculiar (patho)physiological conditions, such as sensation seeking, reckless behavior, and gambling.
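The session-level computation that such an "evolved" process would require can be sketched as an expected-gain comparison across a block of choices. The trial count and pellet sizes below are illustrative assumptions (five-pellet large versus one-pellet small), with p = 1/(odds + 1) as before:

```python
def session_gain_small(n_trials: int, small_pellets: float) -> float:
    """Total gain from an all-SS strategy: a constant payoff on every trial."""
    return n_trials * small_pellets

def session_gain_large(n_trials: int, large_pellets: float, odds: float) -> float:
    """Expected gain from an all-LLL strategy: p * large per trial, p = 1/(odds+1)."""
    p = 1.0 / (odds + 1.0)
    return n_trials * p * large_pellets

# Illustrative session of 30 choices: 1-pellet SS versus 5-pellet LLL.
for odds in [1, 3, 4, 5]:
    print(f"odds = {odds}: SS strategy = {session_gain_small(30, 1.0):.1f}, "
          f"LLL strategy = {session_gain_large(30, 5.0, odds):.1f}")
# With these values the two strategies tie at odds = 4 (p = 1/5); beyond that,
# exclusive LLL-seeking is no longer the economically dominant option.
```

A model restricted to unitary-value comparisons has no place for this block-level quantity, which is the limitation argued above.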