WORKING PAPER Uploaded to bioRxiv and SSRN Reward-specific satiety affects subjective value signals in orbitofrontal cortex during multi- component economic choice

Sensitivity to satiety constitutes a basic requirement for neuronal coding of subjective reward value. Satiety from natural on-going consumption affects reward functions in learning and approach behavior. More specifically, satiety reduces the subjective economic value of individual rewards during choice between options that typically contain multiple reward components. The unconfounded assessment of economic reward value requires tests at choice indifference between two options, which is difficult to achieve with sated rewards. By conceptualizing choices between options with multiple reward components ('bundles'), Revealed Preference Theory may offer a solution. Despite satiety, choices against an unaltered reference bundle may remain indifferent when the reduced value of a sated bundle reward is compensated by larger amounts of an unsated reward of the same bundle; then the value loss of the sated reward is indicated by the amount of the added unsated reward. Here we show psychophysically titrated choice indifference in monkeys between bundles of differently sated rewards. Neuronal chosen value signals in orbitofrontal cortex (OFC) followed closely the subjective value change within recording periods of individual neurons. A neuronal classifier distinguishing the bundles and predicting choice substantiated the subjective value change. Choice between conventional single rewards confirmed the neuronal changes seen with two-reward bundles. Thus, reward-specific satiety reduces subjective reward value signals in OFC. With satiety being an important factor of subjective reward value, these results extend the notion of subjective economic reward value coding in OFC neurons. Significance On-going consumption reduces the subjective value of rewards to different degrees depending on their individual properties, a phenomenon referred to as sensory-specific satiety. Such value change should be manifested in economic choices, and neuronal signals for subjective economic reward value should be sensitive to reward-specific satiety. We tested monkeys during choice between two options that each contained two different rewards ('bundles'); the two rewards were prone to different degrees of satiety. On-going reward consumption affected choices in a way that indicated satiety-induced reward-specific change of subjective economic value. Neuronal responses in the monkey orbitofrontal cortex (OFC) followed the differential reduction of subjective economic value. These results satisfy a crucial requirement for subjective reward value coding in OFC neurons.


Introduction
There are no specific sensory receptors for rewards, and their value is determined by the needs of individual decision makers. Thus, rewards have subjective value rather than being solely characterized by physical measures such as molecular concentrations, milliliters of juice or units of money. Accordingly, neuronal signals in prime reward structures, such as orbitofrontal cortex (OFC) and dopamine neurons, code reward value on a subjective basis (1)(2)(3). One of the key factors determining subjective value is satiety that arises from on-going reward consumption. After drinking a cup of coffee, we may desire a glass of water. We feel sated on coffee while still seeking liquid. Apparently, the coffee has lost more value for us than water. Such value loss is often referred to as sensory-specific satiety (or, more appropriately here, reward-specific satiety) and contrasts with general satiety that refers indiscriminately to all rewards (4,5). Thus, reward-specific satiety is a key factor of subjective reward value, and any claim towards neuronal subjective reward value coding should include sensitivity to reward-specific satiety.
Reward-specific satiety in humans, monkeys and rodents affects approach behavior, goaldirected behavior, operant responding, learning and pleasantness associated with the specific reward. Lesioning, inactivation, and neural studies demonstrate the involvement of frontal cortex, and in particular OFC, in behavioral changes induced by general and reward-specific satiety. The studies assessed alterations of associative strength, cognitive representations for learning, approach behavior and goal-directed behavior (6-18) but did not address the appreciation and maximization of subjective economic reward value that constitute the central interest of economic decision theory (18)(19)(20) and current neuroeconomics research (1,21,22,23,24). Economic reward value cannot be measured directly but is inferred from observable choice (19)(20)(21). Value estimations are made at choice indifference between a test option and a reference option, which renders them immune to unselective general satiety and controls for confounding slope changes of choice functions (25). While choice indifference is possible with milder value differences from reward type, reward delay, reward risk or spontaneous fluctuations (1,3,26), it may fail with substantial satiety when animals categorically prefer non-sated alternatives (15)(16)(17). By contrast, choice indifference becomes feasible when an added amount of unsated reward can compensate for the value loss of the sated reward. Such tests require reward options with two reward components ('bundles'). Indeed, all choice options constitute bundles; they are either single rewards with multiple components, like the taste and fluid of a cup of coffee, or contain multiple rewards, like meat and vegetables of a meal we choose.
The rationale of our experiment rests on the tenet that candidate neuronal signals for subjective economic value need to be sensitive to reward-specific satiety. We used bundles whose multiple rewards sated differentially and allowed testing at choice indifference. Using strict concepts of Revealed Preference Theory (19,27,28), we had demonstrated that monkeys chose rationally between two-reward bundles by satisfying completeness, transitivity and independence of irrelevant alternatives (24). Using these validations, we now estimated the loss of economic value from reward consumption during on-going task performance. Using two-reward bundle options, instead of single-reward options, we tested choice indifference at specifically set, constant levels. We used multiple, equally preferred indifference points for constructing two-dimensional graphic indifference curves (IC) on which all bundles had by definition the same subjective value. The slopes of these ICs demonstrated subjective value relationships ('currency') between two bundle rewards. On-going reward consumption changed the IC slopes in characteristic ways that suggested reward-specific subjective value reduction. During full recording periods of individual OFC neurons, chosen value responses tracked the IC slope changes in a systematic way that suggested a neuronal correlate for reward-specific satiety. These data support and extend the claim of subjective economic reward value coding by OFC neurons.

Results
Design. This study assumed that subjective reward value can be inferred from observable economic choice, that altered choice indicates changed subjective value, and that reduction of subjective value with reward consumption reflects satiety. By definition, two options that are chosen with equal probability have the same subjective value (indifference at choice P = 0.5 each option). Testing at choice indifference is important, as testing at other positions on the sigmoid choice function cannot exclude confounds from slope changes of the choice function (25). We implemented the following design: (1) A standard economic method estimates subjective economic value at choice indifference against a constant reference reward. As animals may not choose a sated single reward when a nonsated alternative is available (15)(16)(17) and thus fail to achieve choice indifference, we studied bundles that each contained the same two rewards on which the animals sated to different degrees. Thus, satiety-induced stronger value reduction of one reward could be compensated by larger quantity of the other, less sated reward to maintain choice indifference. Although the animal sated on one reward much less than on its alternative, it chose each bundle on one half of the trials. The choice involved a single arm movement, and the rewards from the chosen bundle were paid out immediately. Thus, the task contained two discrete, mutually exclusive and collectively exhaustive options. Due to the statistical requirements of neuronal data analysis, we tested stochastic choice in repeated trials rather than single-shot choice.
(1a) Specifically, the animal chose between a Reference Bundle and a simultaneously presented Variable Bundle. In both bundles, Reward A was blackcurrant juice without or with added monosodium glutamate (MSG), on which the animals sated very little. Reward B was either grape juice, strawberry juice, mango juice, water, apple juice or grape juice with added inosine monophosphate (IMG) on which the animals sated substantially, or peach juice inducing less satiety than blackcurrant juice. The height of two bars within two visual stimuli indicated the quantities of Reward A and B of each bundle on a horizontal touch monitor (higher bar indicated more reward) (Fig. 1A, B). In the Reference Bundle, both rewards were set to specific test quantities. In the Variable Bundle, one reward was set to a current test quantity, and the quantity of the other reward was adjustable.
(2) Choice indifference indicates equal subjective value between options. At choice indifference against a constant bundle, a change in subjective value of a reward of the alternative bundle needs to be compensated by a change of the other reward of that alternative bundle.
(2a) Specifically, the compensatory change of Reward A of the Variable Bundle required for choice indifference against the constant Reference Bundle was a measure of subjective value loss of Reward B of the Variable Bundle (Fig. 1C). The changed IPs of Variable Bundles relative to the constant Reference Bundle are conveniently graphed on a two-dimensional map ( Fig 1D). As a common satiety-induced value loss with both bundle rewards might not be detectable as IP change, any IP change may hide non-selective satiety; therefore, an IP change would indicate a value change for a specific reward relative to the other bundle reward, rather than an absolute value change of that specific reward.
(3) A series of IPs aligns as an IC. Thus, an IP change that indicates subjective value change result in an IC change (Fig. 1E).
(4) As consequence of changed IPs and ICs indicating subjective value change, the original, physically unchanged bundles to which the animal was indifferent before satiety fail to match the IPs and ICs established during satiety. The mismatch depends on the reward being sated; it is smaller for less sated rewards and larger for more sated rewards. (5) The mismatch between IPs and ICs established before and during satiety represents the major measure of altered subjective reward value by on-going consumption and will be used for describing the altered neuronal coding of subjective reward.

Consumption-induced relative subjective value reduction.
A typical test started with a fixed Reference Bundle (0.6 ml blackcurrant juice, no grape juice) and a Variable Bundle (variable blackcurrant juice quantity; fixed 0.3 ml grape juice). We titrated Reward A of the Variable Bundle (common currency) to achieve choice indifference between the two bundles. Psychophysical estimations with Weibull-fitted IPs during blocks of 80 trials demonstrated choice indifference with 0.1 ml of blackcurrant juice in the Variable Bundle (Fig. 1C, leftmost green IP; P = 0.5 choice of each bundle). On a two-dimensional plot of bundle rewards (Fig. 1D), the connecting line between two bundles indicated equal subjective value: the Variable Bundle (green dot) had the same subjective value as the Reference Bundle (black dot); the two dots were IPs relative to each other. Thus, at choice indifference, the gain of 0.3 ml of grape juice in the Variable Bundle relative to the Reference Bundle had a common currency value of 0.5 ml of blackcurrant juice.
With on-going consumption of blackcurrant and grape juices, choice indifference required increasing blackcurrant juice, as indicated by successive IPs (Fig. 1C, from green via violet, blue and orange to red at P = 0.5). Apparently, the gained 0.3 ml of grape juice in the Variable Bundle had lost some of its value for the animal, and only larger blackcurrant juice quantities compensated for that loss. The two IPs within the green 95% confidence interval suggested initially maintained subjective value of grape juice, whereas continuing consumption moved the next IPs outside the CI, indicating progressive reduction of subjective grape juice value. The consumption-induced rightward IP progression (Fig. 1C) translated into an upward movement of IPs on a twodimensional graph (Fig. 1D, from green to red), thus flattening the slope between the two IPs. Thus, the additional blackcurrant juice required for choice indifference provided a common currency estimate of the subjective value loss of grape juice. Or, compared to the Reference Bundle, the animal gave up progressively less blackcurrant juice to gain the same quantity of grape juice (0.3 ml) (decreased Marginal Rate of Substitution, MRS, of blackcurrant juice for grape juice). These measures indicated increasing subjective value loss of grape juice relative to blackcurrant juice with on-going consumption.
Wider bundle variations demonstrated consistent consumption-induced subjective value changes (Fig. 1E). Starting with a Reference Bundle of 0.6 ml of blackcurrant juice alone, the first IP was obtained by setting the Variable Bundle to coordinates inside the x-y graph. Subsequently, the Reference Bundle was set to the first IP, and another IP was estimated from a newly set Variable Bundle. Repetition of this procedure, in pseudorandomly alternating directions to avoid local distortions (29), resulted in a series of IPs. The curvature of ICs fitted to these IPs (see Materials and Methods; Eq. 1) progressed from convex via near-linear to concave (Fig. 1E, green via blue to red). The concavity indicated that the animal required substantial grape juice levels before trading in much blackcurrant juice, possibly reflecting the liquid content of the grape juice, which suggested systematic and advancing subjective value loss of grape juice relative to blackcurrant juice. These slope and curvature changes of two-dimensional IC occurred during recording periods of individual neurons and constituted our test scheme for behavioral and neuronal correlates of reward-specific satiety.
As control, we inverted the test scheme by holding blackcurrant juice constant and varying grape juice psychophysically to obtain choice indifference. Here, instead of the common currency quantity of blackcurrant juice, the increasing quantity of grape juice at choice indifference quantified its subjective value decrease with on-going consumption. When using single-reward bundles with only one but different reward in each bundle having non-zero amount, consumption of both rewards increased IPs, flattened IC slopes and increased blackcurrant:grape juice ratios at IPs; when using two-reward bundles, IC curvature changed from convex to concave (Fig. S1A-D). The ICs with bundles containing water in Monkey B showed similar slope flattening and reduced convexity with this test scheme (Fig. S1E, F). Thus, the consumption-induced relative subjective value reduction of grape juice or water relative to blackcurrant juice was robust irrespective of test scheme.
CI of the initial, left-most choice function between blackcurrant juice and any reward (green in Figs. 1D, S1A and S1E); any IP outside this interval indicated subjective value reduction.
Before satiety , we used a total of 38,443 choices to estimate 56 IPs for fitting up to 5 ICs with  the bundle (blackcurrant juice, grape juice), 68 IPs for 4 ICs with bundle (blackcurrant juice,  strawberry juice) (Monkey A, unless otherwise noted), 58 IPs for 4 ICs with bundle (blackcurrant  juice, water), 38 IPs for 5 ICs with bundle (blackcurrant juice,  On-going reward consumption changed IC curvature from convex to linear and concave or flattened IC slopes, with all bundles tested in this study. The IC change suggested that both animals became more sated across consecutive trials on water, grape juice, strawberry juice, mango juice and apple juice (x-axis) as compared to blackcurrant juice (y-axis) (Figs. 2A-G; S1D, F-H). Nevertheless, the downward slopes of the ICs indicated remaining positive reward value of the sated liquids: increasing amounts of sated liquid (increase on x-axis) required less compensatory unsated blackcurrant juice (decrease on y-axis) for maintaining choice indifference. None of the sated juices had acquired negative reward value, which would have been indicated by an upward IC slope: increasing amounts of negatively valued sated liquid (increase on x-axis) would have required more blackcurrant juice (increase on y-axis) for compensation at IC, which was not the case with the currently used liquids but had been detected with lemon juice, yoghourt and saline (25). The only exception to this pattern of satiety was peach juice: IC steepness increased for this bundle, indicating less satiety than for blackcurrant juice (Fig. 2H). These IC changes demonstrate robust and systematic relative subjective value changes with natural, on-going liquid consumption across a variety of bundle types.

Control for other choice variables.
A logistic regression confirmed that bundle choice varied only with the bundle rewards but not with unrelated variables with on-going consumption, such as trial number within block of consecutive trials and spatial choice (Eq. 2). As before satiety (24), the probability of choosing the Variable Bundle continued to correlate positively with the quantities of its both rewards, and inversely with the quantities of both Reference Bundle rewards ( Fig. S1I; VA, VB vs. RA, RB). Further, choice probability for the Variable Bundle was anticorrelated with accumulated blackcurrant juice (MA) consumption and positively correlated with grape juice consumption (MB). This asymmetry is explained by the reward quantities at the IPs; as grape juice lost more subjective value than blackcurrant juice during satiety, the animal required, and thus consumed, more grape juice for less blackcurrant juice at the titrated IP. Trial number within individual trial blocks (CT) and spatial choice CL) did not explain the choice. Thus, even with ongoing consumption, the animals based their choice on the reward quantities of the bundles and the actually consumed rewards; unrelated variables kept having no significant influence.

Lick durations.
Licking provides a simple measure that could serve as mechanism-independent confirmation for the subjective value changes seen with choices. With on-going reward consumption of both juices, anticipatory licking between bundle stimuli and the first reward decreased gradually with grape juice but changed inconspicuously with blackcurrant juice (Fig. 3A,  B). This change suggested greater subjective value loss for grape juice than blackcurrant juice. In both animals, cumulative licking varied considerably between the different liquids and decreased significantly during satiety (green) compared to before satiety (pink) (Fig. 3C-G). Thus, the licking changes corresponded to the relative reward-specific subjective value changes inferred from bundle choices, IC slopes and curvatures (Fig. 2).
Liquid consumption. The IC flattening with on-going consumption indicated that the animal required increasing quantities of the more devalued Reward B for giving up the same quantity of the less devalued blackcurrant juice (Reward A) at IP (Fig. 1E). This change was also evident in the choice between the constant Reference Bundle containing only blackcurrant juice (Reward A) and the Variable Bundle containing only one of the other liquids (Reward B) (anchor trials; Fig. S1B; increase on x-axis). With on-going consumption, the animal gave up the same quantity of the less sated blackcurrant juice only if it received increasingly more of the sated Reward B at choice indifference. As the animal had no control over the constant Reference Bundle that defined the IP, it ended up consuming more of the devalued reward as the session advanced. For example, with ongoing consumption of the bundle (blackcurrant juice, water), consumption increased more rapidly within each day for water (on which the animal was more sated) than for blackcurrant juice (on which the animal was less sated) ( Fig. 3H; blue vs. red; P = 5.0979 x 10 -7 ; Kolmogorov-Smirnov test; n = 7,160 trials including bundles with two non-zero quantities). The differential consumption change with satiety resulted in decreases of blackcurrant:water consumption ratios at IP ( Fig. 3I; anchor trials only). These ratio changes indicated relative subjective value changes among the juices (change in common currency).
Similar or less pronounced changes of consumption and consumption ratios occurred with bundles containing blackcurrant juice and grape, mango or apple juice ( Fig. S2A-D), corresponding to higher satiety for these juices relative to blackcurrant juice ( Fig. 2A, D, F, G). However, both consumption and ratios increased for bundles combining blackcurrant juice with strawberry or peach juice (Fig. S2E, F), corresponding to slightly or noticeably higher satiety for blackcurrant relative to strawberry and peach juice (Fig. 2C, H). The ratio changes were consistent across successive weekdays while accelerating on Fridays ( Fig. S2G-J).
To control for the delivery of two different juices, we tested a bundle containing blackcurrant juice as both components and found no change in consumption ratio (Fig. S3A). Further, we reversed the juice sequence, delivering grape juice (Reward A) before blackcurrant juice (Reward B), and interchanged their x-y coordinates. On-going juice consumption consistently changed IC slope and curvature (Fig. S3B). The slope became substantially steeper, which corresponded to the flattened slopes with the regular sequence and x-y coordinates ( Fig. 2A). The curvature became partly concave, indicating that the animal gave up little grape juice for small blackcurrant juice quantities but much more grape juice for larger blackcurrant juice quantities. Together, these IC changes with reversed juice sequence suggested overall reduction of subjective grape juice value relative to blackcurrant juice that confirmed the satiety in the regular juice sequence and indicated grape juice satiety independent of delivery sequence. Further, temporal consumption profiles were similar for both juices (Fig. S3C top), whereas consumption ratios reflected well the reduced grape juice value relative to blackcurrant juice (Fig. 3C bottom) and corresponded to the inverse ratio changes in the opposite, regular juice sequence (Fig. S2A).
Thus, the consumption changes confirmed the relative reward-specific subjective value changes inferred from bundle choices.
Neuronal test design. Due to the IC change with on-going reward consumption, the original, physically unchanged bundles that were IPs before satiety failed to match the new ICs estimated during satiety. The mismatch depended on the degree of satiety: bundles with variation of less sated reward showed less mismatch, whereas bundles with variation of more sated reward showed more mismatch. To benefit from the IC scheme, we used two tests: variation of blackcurrant juice while holding grape juice constant, and variation of grape juice while holding blackcurrant juice constant. Comparison of IC maps between the pre-sated state ( Fig. 4A and B, left) and the sated state (C and D, left) shows that IC flattening with satiety moved bundle positions relative to ICs very little for blackcurrant juice variation (A vs. C) but substantially for grape juice variation (B vs. D).
We tested the influence of on-going reward consumption during the recording period of individual neurons, which allowed us to compare responses of the same neuron between non-sated vs. sated states, as defined by IPs inside vs. outside 95% confidence intervals, respectively (Fig. 1C, green zone). As these tests required several tens of minutes with each neuron, neurons not coding chosen value were not further investigated. All satiety-tested neuronal responses followed the basic scheme of ICs: monotonic increase with bundles placed on different ICs (testing bundles with different subjective value), and insignificant response variation with bundles positioned along same ICs (testing equally preferred bundles with equal subjective value) (24). We assessed these characteristics with a combination of multiple linear regression (Eq. 3), Spearman rank-correlation, and two-way Anova (see Materials and Methods). All tested responses belonged to the subgroup of previously studied OFC neurons (30) that were sensitive to multiple rewards and coded the value of the bundle the animal chose ('chosen value', as defined by Eqs. 4 and 5). To reduce visual confounds between the two choice options, we used similar visual stimuli that were not identifiable as distinct objects; therefore, we could not test object value or offer value coding as the other major OFC response category (1).
We subjected most neurons to two bundle tests: (i) choice over zero-reward bundle; both rewards were set to zero in one bundle, and the animal unfailingly chose the alternative, non-zero bundle; (ii) choice between two non-zero bundles; at least one reward was set to non-zero in both bundles, and the animal chose either bundle (Table 1). In addition, for comparison with previous studies on single-reward options, we tested all neurons with single-reward bundles in which only one reward was set to non-zero; usually the non-zero reward differed between the two bundles.
Single-neuron subjective value coding follows IC changes. At the beginning of daily testing, neuronal responses during choice over zero-reward bundle followed monotonically the increase of both bundle rewards (blackcurrant and grape juice), confirming value coding. Stimuli for bundles on higher ICs elicited significantly larger responses (Fig. 4A, B) (pre-stimulus increases due to the short inter-trial interval varied insignificantly and were unrelated to trial sequence; Fig. S4). With on-going consumption of both bundle rewards, increasing blackcurrant juice, on which the animal was less sated, remained positioned on different ICs. Correspondingly, neuronal responses continued to vary across ICs (here primarily between the top two ICs) ( Fig. 4C; red vs. blue-green). By contrast, increasing grape juice did not require much compensation by decrease of blackcurrant juice for maintaining choice indifference, as shown by the flattened and concave ICs (Fig. 4D left). The IC change indicated that more grape juice had only little more reward value to the animal. Accordingly, the neuron recorded during this test showed only a small and unmodulated response despite grape juice increase (Fig. 4D right; P = 0.22; across-bundle factor of two-way Anova); the response peak for the largest grape juice quantity dropped by 75%. Thus, on-going consumption of both juices reduced the subjective value variation of grape juice, and the neuronal responses reflected the reduced value variation, without change in coding rules or reward selectivity.
Similar consumption-induced neuronal changes occurred in choice between two non-zero bundles (Fig. S5). As bundles varying only in blackcurrant juice continued to occupy different ICs, OFC responses continued to increase with blackcurrant juice (Fig. S5A, C). By contrast, as original bundles varying only in grape juice were now positioned on lower and fewer ICs, neuronal responses decreased and became less differential ( Thus, OFC neurons continued to code reward value with on-going reward consumption. The responses continued to discriminate well the quantity of blackcurrant juice whose subjective value had changed relatively less (Figs. 4A vs. 4C, S5A vs. S5C) but were reduced for grape juice whose value had dropped more, despite unchanged physical grape juice quantities (Figs. 4B vs. 4D, S5B vs. S5D). In following the altered ICs, these OFC signals reflected reward-specific relative subjective value changes induced by on-going consumption.
Neuronal population. Out of 424 tested OFC neurons, 272 showed changes during task performance and were investigated during on-going reward consumption neurons. These neurons were located in area 13 at 30-38 mm anterior to the interaural line and lateral 0-19 mm from the midline; they were parts of the population reported previously (30). Responses in 98 of these 272 task-related OFC neurons (36%) coded chosen value (defined by Eqs. 4 and 5) and followed the IC scheme in any of the four task epochs (Bundle stimulus, Go, Choice or Reward) during choice over zero-reward bundle or choice between two non-zero bundles ( Table 1). Of the 98 chosen value neurons, 82 showed satiety-related changes with bundles composed of blackcurrant juice (Reward A) and grape juice, water or mango juice (Reward B) (Tables 2, S1).
Using the scheme of differential consumption-induced IC changes of Figs. 4 and S5, we found that averages of z-scored positive subjective value coding responses in 31 neurons (Table 2) continued to vary with blackcurrant juice quantity (despite peak reduction) (Reward A; Fig. 5A, B), whereas responses became insignificant with grape juice, water or mango juice in the same neurons (43% peak reduction) (Reward B; Fig. 5C, D). These contrasting, consumption-related neuronal response variations occurred individually in all four task epochs (Table S1).
Quantifications of individual response changes demonstrated consumption-induced significant response reduction with positive subjective value coding neurons and significant response increase with negative (inverse) coding neurons (lower response with higher subjective value) during choice over zero-reward bundle (Fig. S6A, B, red) and during choice between two non-zero bundles (Fig. S6C, D, red; Tables 2, S1). Fewer neurons showed inverse changes that were difficult to reconcile with subjective value coding (black in Fig. S6), or insignificant changes.
Thus, OFC neuronal population responses reflected consumption-induced changes in subjective value in a similar way as individual neurons.
Neuronal classifier performance. We used a neuronal classifier as additional means for demonstrating changes of neuronal reward coding by on-going reward consumption. We analyzed only neuronal responses that followed the IC scheme; they increased or decreased monotonically across ICs during any of the four task epochs but changed only insignificantly along ICs; further, the responses changed with on-going reward consumption as shown in Figs. 4, 5 and S5.
We first established the accuracy with which an ideal observer using neuronal responses distinguished bundles on different ICs. We trained a support vector machine (SVM) classifier on neuronal responses to randomly selected bundles positioned on the lowest and highest of three ICs (Bundle stimulus epoch). Before satiety, as defined by IPs inside the initial CI (Fig. 1C), the classifier showed decent bundle discrimination with as few as five neurons during choice over zeroreward bundle; classifier performance increased meaningfully with added neurons (Fig. 6A black). Then we used this classifier trained on pre-satiety responses and tested distinction of the same bundles using neuronal responses during consumption-induced satiety. We found a significant difference of accuracy (F(1, 34) = 68.02, p = 1.26653 x 10 -9 ; 1-way Anova); the accuracy increase with added neurons suggested maintained valid classification (Fig. 6A red). The satiety-related accuracy drop was also evident in the inverse test sequence: whereas accuracy was high with classifier training and testing during satiety, it was significantly lower when training on neuronal responses during satiety but testing using responses before satiety (F(1, 34) = 17.99, p = 0,0002).
The change in classifier accuracy occurred with choice over zero-reward bundle with neuronal responses to Bundle stimuli ( Fig. 6) and during the Go epoch (Fig. S7A), but not during Choice and Reward epochs (Fig. S7B, C). The changes were not explained by baseline changes during the 1 s Pretrial control epoch (Fig. S7D). Similar accuracy differences were seen in choice between two non-zero bundles during Bundle stimuli, Go epoch and Choice epoch, but not during the Reward epoch ( Fig. S7E-H), again unexplained by baseline changes (Fig. S7I). The accuracy differences were consistent across on-going consumption steps (Fig. S7J).
Finally, we analyzed activity from 265 unmodulated and unselected neurons, as before (27). From the total of 424 tested OFC neurons, we excluded the 98 neurons whose activity followed the IC scheme and further 61 neurons that coded only one of the two bundle rewards. Of the remaining 265 neurons, 113 showed task-related activity, whereas the other 152 neurons did not. The classification from the activities of the 265 neurons still showed mild differences between presatiety training responses and satiety test responses during the task epochs of Bundle stimuli, Go and Choice (Figs. S8, S9). These remaining differences may reflect sub-significance changes and suggest that subjective value coding reflects satiety mildly in a rather large OFC population.
Thus, the classifier results provide additional evidence for neuronal responses reflecting consumption-induced subjective value changes indicative of satiety.
Neuronal changes with single-reward bundles. Our use of choice options with two reward components differs from the conventional use of single reward options (1, 2) and thus requires controls and additional analyses. To do so, we used the same two visual component stimuli but set only one, but different, reward in each bundle to a non-zero quantity. The reward settings allowing choice indifference were derived from our use of two-reward bundles. The single-reward bundles were positioned graphically along the x-axis and y-axis but not inside the IC map (Fig. S1B) and thus were equivalent to single-reward choice options tested earlier (1,3,26). Trials with these anchor bundles interleaved with trials using two regular bundles of which at least one contained two non-zero rewards. These regular bundles provided the IPs at the axes required for analyzing anchor bundle choices.
First, we confirmed the results from two-reward bundles using the same IC formalism. We tested single-reward bundles on the same 82 neurons that had shown satiety-related changes with two-reward bundles. The neuronal responses of Fig. 7A, B distinguished both blackcurrant juice and water quantities during choice over zero-reward bundle before satiety. On-going consumption of both rewards flattened the ICs. The IC relationship of blackcurrant juice was preserved, and the neuron kept discriminating blackcurrant juice quantities (Fig. 7C). By contrast, the large water quantity was now positioned further below the top IC than before (Fig. 7D, red on x-axis) and close to the IC of the small blackcurrant quantity (blue on y-axis), and the small water quantity was now positioned below its original IC (blue on x-axis). Correspondingly, neuronal activity with large water quantity lost its peak and varied only weakly between the two water quantities (Fig. 7D, blue vs. red). These neuronal changes with single-reward bundles reflected the consumption-induced subjective value changes in a similar way as with two-reward bundles (Figs. 4, S5).
Next, we used single-reward bundles to compare neuronal response changes with behavioral changes. We established vector plots that displayed the ratio of reward weights (b's) for z-scored neuronal population responses (Eq. 3; Fig. 7E-I, red) and, separately, for behavioral choice (Eq. 1a; green). The inequality of subjective value between the two rewards was manifested as deviation of these vectors from the 45 deg diagonal line. On-going reward consumption increased the elevation angle of the behavioral vector, indicating subjective value loss for Reward B (grape juice, water or mango juice) relative to Reward A (blackcurrant juice). The neuronal vector changed correspondingly ( Fig. 7E-I, red). For example, during choice of the bundle (blackcurrant juice, grape juice) over zero-reward bundle, the behavioral vector angle increased from 40 deg before satiety to 65 deg during satiety, and the neuronal population vector increased from 35 deg to 62 deg (Fig. 7E, green, red). Similarly, during choice between two non-zero bundles, the behavioral vector increased from 40 deg to 52 deg while the neuronal vector increased from 38 deg to 45 deg (Fig.  7F). Further, the shorter neuronal vectors during satiety indicated general reduced responding, indicating additional general satiety (red). Bundles containing water or mango juice showed similar changes ( Fig. 7G-I). Thus, both before and during satiety, the neuronal vectors (red) were within the CIs of the behavioral vectors (green), indicating intact neuronal subjective value coding that reflected the value changes with on-going reward consumption.
In addition to the vector analysis, IC slopes confirmed the close neuronal-behavioral correspondence during satiety, with satiety being defined by IPs exceeding the initial, pre-sated psychophysical CIs (Fig. S1A, E). As estimated from regression coefficient ratios (-β 2 / β 1 ) (Eq. 3) and (-b / a) (Eq. 1), the slopes of the linear neuronal ICs of single-reward bundles correlated with the slopes of linear behavioral ICs (Fig. 7J). These results from single-reward bundles during ongoing reward consumption compared well with the results from the earlier OFC study on single rewards with spontaneously varying subjective reward value (1).
Taken together, the consumption-induced changes in single OFC neurons and in the OFC population corresponded well between single-reward and two-reward choice options. This similarity provides a sound foundation for the current findings on the more natural multi-component choice options.

Discussion
This study investigated whether the reward responses of OFC neurons reflected reductions of subjective economic reward value induced by satiety for specific rewards, using stringent tests based on formal concepts of economic decision theory. As reward-specific satiety constitutes a major contributor to subjective value, the observed changes present crucial support in favor of economic value coding in OFC neurons and extend the role of this cortical structure in economic decision mechanisms. The use of bundles containing two differently sated rewards assured choice of both options as essential requirement for assessing subjective reward value in a controlled and unequivocal manner at choice indifference. The value loss was captured by graphic ICs that were constructed from psychophysically estimated IPs and represented the subjective economic value of the bundles. The ICs changed in an orderly and characteristic manner with on-going reward consumption (Figs. 1, 2, S1); they flattened progressively and changed from convex via near-linear to concave, thus indicating gradual subjective value loss for all bundle rewards (plotted on the xaxis) relative to blackcurrant juice (y-axis) (except for lesser peach juice value loss). These IC changes suggested that the animal required increasing quantities of the less sated blackcurrant juice for compensating the increasing subjective value loss of the other, more sated bundle reward with on-going consumption of both rewards. The specific and asymmetric IC changes made alternative explanations unlikely, such as general satiety, motivation loss, passage of time, or proximity of return to home cage, all of which would affect all rewards non-differentially. Licking behavior and consumption supported the notion of reward-specific satiety in a mechanism-independent manner (Figs. 3, S2, S3). Taken together, the demonstrated sensitivity to reward-specific satiety contributes an important argument in favor of formal subjective economic value coding by OFC neurons inferred from other choice data.
Our preceding study had established neuronal chosen value responses in OFC that were sensitive to multiple rewards and followed the animal's rational choice of two-reward bundles, including completeness, transitivity and independence from option set size (24). The current study tested the effects of on-going reward consumption on these chosen value responses. We found that the neuronal responses matched the consumption-induced IC changes during recording periods of individual neurons. The responses became weaker for more sated rewards (Figs. 4, 5, S4, S5). Most impressively, neuronal responses failed to distinguish between bundles that were physically unchanged but came to be on the same ICs because of the ICs' curvature change to concave (Figs. 4D). Classifiers predicting bundle discrimination from neuronal responses confirmed accurate subjective reward value coding both before and during satiety and demonstrated the substantial nature of neuronal changes (Figs. 6, S7-S9). Neuronal response vectors of conventional singlereward choice options correlated well with behavioral choice vectors; the correlations were maintained during consumption-induced subjective value changes (Fig. 7). These results demonstrate that OFC responses followed the differential subjective value changes induced by on-going reward consumption. As the rewards did not change physically with satiety, these results satisfy a necessary requirement for the coding of subjective value by OFC neurons that had been suggested based on choices between single-reward options (1) and two-reward bundles (30).
The consumption increase of sated rewards like water, grape juice, apple juice and mango juice (Figs. 3H, S2A-D) seemed to contradict earlier findings and the general intuition that satiety would rather reduce consumption of rewards on which an animal is sated. Differences in study design might explain these discrepancies. In the earlier studies, monkeys chose between sated and non-sated rewards, or between accepting and not accepting sated rewards, and naturally preferred non-sated rewards (8,9,(15)(16)(17). As this tendency precluded choice indifference, we studied choice between bundles that each contained two differently sated rewards. Advancing satiety on Reward B required increasing quantities of this reward for choice indifference against bundles containing the less sated Reward A (see Figs. 1E, S1B). Thus, satiety increased consumption of the sated reward. By contrast, outright rejection of the more sated Reward B would have been represented by an upward sloped IC, which had been observed with lemon juice, yoghourt and saline (24) but not with the currently used rewards; such upward sloped ICs indicated that an animal needed to be 'bribed' with more reward for accepting these normally rejected sated rewards. By contrast, in the current study, the maintained downward IC slope indicated that the animal was not entirely averse to the sated reward.
The currently reported altered neuronal value coding with reward-specific satiety builds on previous studies on monkey OFC neurons that investigated satiety by clinical tests and behavioral observations. There, monkeys were presented with syringes or tubes containing various fruit juices; rating scales and go-nogo lick responses indicated behavioral acceptance or rejection of these juices (8,9). The studies report that OFC neurons lost responses only for the particularly sated juice. Spontaneous variations of single-reward choices likely reflected appetite variations over the course of daily experimentation and affected chosen value coding in monkey OFC (1). Stronger general satiety effects in ventromedial prefrontal cortex compared to OFC (11) suggest widespread satiety sensitivity in ventral frontal cortex. Human studies using pleasantness ratings demonstrate neuroimaging correlates of reward satiety in OFC (12)(13)(14). The presence of satiety effects in OFC contrast with their absence in the earlier gustatory system, including nucleus of the solitary tract, frontal opercular taste cortex and insular taste cortex (6,7,31). The neurophysiological results (1,8,9,11) correspond to the reward functions of frontal cortex that become deficient after lesion and inactivation, including associative strength (or, equivalently, incentive value), cognitive reward representation, and approach and goal-directed behavior (10,(15)(16)(17). Our present data on subjective value coding in a specific choice context extend these results into the domain of economic decisionmaking.
In addition to reward-specific satiety, on-going consumption induces also general satiety that may consist of a general loss of pleasure, arousal, attention and motivation. Our shorter neuronal population vectors with single-reward bundles may reflect general satiety, in addition to the changed vector angle that suggests reward-specific satiety ( Fig. 7E-I, red). General and unspecific satiety would lead to general subjective value reduction in a similar way as reduced physical reward quantity; both would be represented by parallel, or curvi-parallel, IC displacement towards the x-y graph origin ( Fig. 2A-H, left panels), whereas reward-specific satiety is indicated by changes in IC slope and curvature (right panels). General satiety is seen with reduced midbrain responses in humans during consumption of Swiss chocolate (12) and reduced dopamine responses in mice receiving food pellets for extended periods of time (32). Thus, general satiety cannot explain our asymmetric IC changes and the corresponding asymmetric neuronal changes, both of which reflect subjective value changes indicative of reward-specific satiety.

Materials and Methods
The study used the same 2 male adult rhesus monkeys as previously (24,30) and was licensed by the UK Home Office (for details, see Supplementary Information). The animals chose between two compound stimuli that were positioned at pseudorandomly alternating fixed left-right positions on a computer monitor and predicted bundles containing the same two rewards whose quantities varied pseudorandomly and without specific temporal order. We psychophysically estimated multiple choice indifference points (IP; Fig. 1C, S1A, E) to which we fitted ICs along which all bundles were equally preferred, using a hyperbolic function d: with y and x as milliliter quantity of Rewards A and B (Figs. 1D, E; S1B, D, F), a and b as weights of the influence of the two reward quantities, and c as curvature. Eq. 1 can be equivalently rewritten as regression in analogy to the regression used for analysing neuronal responses: with A and B as milliliter quantity of Reward A (plotted at y-axis) and Reward B (x-axis), respectively, b 0 as offset coefficient, b 1 and b 2 as behavioral regression coefficients, and e as compound of errors err 0 , err 1 , err 2 , err 3 for offset and regressors 1-3.
To test whether the animal's choice reflected the quantity of the bundle rewards during satiety, rather than other, unintended variables such as spatial bias, we used the logistic regression: with P ( Following behavioral training and surgical preparation for single neuron recording, we identified neuronal task relationships with the paired Wilcoxon-test. We identified changes of taskrelated neuronal responses across ICs with a linear regression: with y as neuronal response in any of the four task epochs (Bundle stimulus, Go, Choice, Reward), measured as impulses/s and z-scored normalized to the Pretrial control epoch of 1.0 s, A and B as milliliter quantity of Reward A (plotted at y-axis) and Reward B (x-axis), respectively, b 0 as offset coefficient, b 1 and b 2 as neuronal regression coefficients, and e as compound error. In addition, all significant neuronal response changes across ICs identified by Eq. 3 needed to be also significant in a Spearman rank-correlation test (P < 0.05).
To assess neuronal compliance with the two-dimensional IC scheme, we used a two-factor Anova on each task-related response that was significant for both regressors in Eq. 3. Neuronal responses following the IC scheme were significant across-ICs (factor 1: P < 0.05) but insignificant within-IC (factor 2).
Chosen value (CV) was defined as: weighting parameter k 1 served to adjust for differences in subjective value between rewards A and B, such that the quantity of Reward B entered the regression on a common-currency scale defined by Reward A. We assessed neuronal coding of chosen value in all neurons that followed the revealed preference scheme, using the following regression: with UCV as value of the unchosen option that was not further considered here, and e as compound error.      (C), the responses followed the satiety-induced IC change (D). This reduction of bundle stimulus response was seen in 30 neurons (Table S1).   (C) Despite IC flattening with on-going reward consumption, the two bundles with blackcurrant juice variation remained on the same two ICs, and the neuronal response variation remained significant (P = 4.7616 x 10 -8 , F(1,40) = 30.91; P = 6.9739 x 10 -13 , F(1,40) = 54.5), with some peak response reduction (red). Dotted ICs are from pre-sated state. (D) IC flattening with reward consumption indicates relative subjective value reduction of water. The two unchanged bundles with water variation were now located on and below the lower IC (dotted lines). Neuronal activity varied only weakly (red) (P = 8.9470 x 10 -8 , F(1,40) = 29.6; P = 0.0367, F(1,40) = 4.39). Further, the large-water bundle (dotted red line) elicited now a similar response as the low-blackcurrant bundle that was now on the same IC (solid blue line in (C)). Thus, while continuing to code subjective reward value (shown in (C)), the responses followed the satietyinduced IC change. This reduction of bundle stimulus response was seen in 30 neurons (Table S1).  General behavior. The animals were habituated during several months to sit in a primate chair (Crist Instruments) for a few hours each working day. They were trained in a specific, computercontrolled behavioral task in which they contacted visual stimuli on a horizontally mounted touchsensitive computer monitor (Elo) located 30 cm in front of them. The animal's eye position in the horizontal and vertical plane were monitored with a non-invasive infrared oculometer (Iscan). Matlab software (Mathworks) running on a Microsoft Windows XP computer controlled the behavior and collected, analyzed and presented the data on-line. A solenoid valve (ASCO, SCB262C068) controlled by the same Windows computer served to deliver specific liquid quantities. A Microsoft SQL Server 2008 Database served for Matlab off-line data analysis.

Reward-specific satiety affects subjective value signals in orbitofrontal cortex during multicomponent economic choice
Following task training for about 6 months, animals were surgically implanted with a recording chamber for electrophysiological recordings, which typically lasted for another 6-10 months.

Stimuli and rewards.
A computer touch monitor presented the subject with two visual stimuli (4 deg apart), each representing one choice option, called Reference Bundle and Variable Bundle (Fig.  1A). Each bundle contained two rewards that were represented separately by a colored rectangle: Reward A, represented by a violet rectangle, and Reward B, represented by a green rectangle. Reward quantities were set independently and indicated by the vertical position of a bar within each rectangle (higher was more). The Reference Bundle contained two preset Reward quantities that were fixed for a given block of trials. The Variable Bundle contained a specifically set quantity of one Reward and an experimentally varied quantity of the other reward. Reward A in all bundles was blackcurrant juice without or with added monosodium glutamate (MSG), Reward B was grape juice, strawberry juice, mango juice, water, apple juice, peach juice, or grape juice with added inosine monophosphate (IMG). While MSG and IMG are known taste enhancers, we did not test choice preferences between them on their own. Thus, we cannot state whether they had own reward value for the animals.
Task. Each trial began when the animal contacted a centrally located touch sensitive key for 1.0 s after a pseudorandom inter-trial interval of 1.6 ± 0.25 s. Then the two stimulus pairs representing the two bundles appeared at pseudorandomly alternating fixed left-right positions on a computer monitor in front of the animal (Fig. 1B). After 2.0 s, two blue spots appeared as GO stimulus underneath the bundle stimuli, upon which the animal released the touch key and touched the blue spot underneath the bundle of its choice within 2.0 s. The required action consisted of one arm movement and was constant across bundles and trials. After a hold time of 1.0 s, the blue spot underneath the chosen bundle turned green, and the blue spot underneath the unchosen bundle disappeared. Simultaneously a white frame around the chosen bundle appeared as feedback for successful choice. The computer-controlled liquid solenoid valve delivered Reward A at 1.0 s after the choice, followed 0.5 s later by Reward B (except when using peach juice as Reward B; here the sequence was reversed: Reward B was delivered first, then 0.5 s later Reward A, blackcurrant juice). Task training was initially restricted to one bundle type and was extended to other bundle types only when satisfactory behavioral performance was obtained. The longer delay for liquid B compared to liquid A likely generated asymmetric temporal discounting that affected the subjective value of each liquid. However, all delays were kept constant, which allowed the subjective value differences from different temporal discounting to be incorporated as a constant factor into the subjective value of each liquid. We choose this delay, rather than simultaneous delivery or pseudorandomly alternating single liquid delivery, to prevent more serious taste interactions between simultaneously delivered liquids and to avoid temporal uncertainty. Reaching for a target before appearance of the blue dots, or key release during required key touch or target-hold, were considered as errors and lead directly to the inter-trial interval without reward.
Estimation of behavioral ICs. The behavioral method for obtaining an IP from stochastic choice has been presented in full detail (1,2). With two bundle options, the animal chose between the preset Reference Bundle (left in Fig. 1A) and the Variable Bundle (right) in repeated trials. Thus, the constant Reference Bundle provided a stable reference against the changing bundle composition in the Variable Bundle. We set one reward in the Variable Bundle to one unit (> 0.1 ml) above the quantity of the same reward in the Reference Bundle, while pseudorandomly varying the quantity of the other reward of the Variable Bundle over the whole test range and in pseudorandom temporal order within each block of trials. The variation of the animal's repeated choice with that single, pseudorandomly varying reward allowed us to construct a full psychophysical function and estimate an IP from Weibull fitting (point of subjective equivalence; P = 0.5 choice of each bundle).
As in our previous study (1), we used the Matlab function GLMFIT for psychophysical fitting. This function returns a number called 'Deviance' between 0 and infinity that can be used to compare fitting between Weibull and logit. The Deviance is the difference between the loglikelihood of the fitted model and the maximum possible log-likelihood. Lower values are better. The estimated Deviance for psychophysics for the first 5,000 trials and 2 monkeys was 1.0415 for the Weibull model and 1.6009 for the logit model, suggesting that the Weibull fitted the data better. Hence, we used Weibull fitting for all psychophysical fitting.
We obtained each IP from a total of 80 trials (2 left-right stimulus positions with 5 equally spaced reward quantities in 8 trials). To avoid known adaptations in OFC neurons (3-6), we always tested the full reward range of the experiment.
To obtain an IC, we fit a series of IPs with a hyperbolic function d using weighted least mean squares: d = ay + bx + cxy [1] with y and x as milliliter quantity of Reward A (plotted on y-axis on 2D graph, Fig. 1D, E) and Reward B (plotted on x-axis), a and b as weights of the influence of the Reward quantities plotted on the y-and x-axes, respectively, and c as curvature. A potent reward that contributes strongly to the choice of the bundle would have a large weight (high coefficient a or b), whereas a less potent reward would have lower weight coefficients. Thus, with the potent (more weight) reward plotted on the x-axis, and the less potent (less weight) reward plotted on the y-axis, choice indifference between them would occur with smaller milliliter quantities on the x-axis compared to the y-axis. Hence, the IC slope would be steeper than the diagonal line (see Fig. 1D, E). By resolving Eq. 1 as y = -(b / a) * x, the IC slope would be the ratio of the coefficients that reflect the weights of the rewards: -b / a. With a higher potency of Reward B (x-axis) compared to Reward A (y-axis), the rectified IC slope would be larger than 1. Relatively stronger satiety for Reward B (x-axis) compared to Reward A (y-axis) would reduce the weight of Reward B, reduce the absolute value of the ratio -b / a, and flatten the IC slope. Thus, the IC slope -b / a describes the relative impact of the two bundle rewards (reflecting the value ratio between the two rewards), whereas the weights (a and b) describe the influence of the reward quantities. The hyperbolic function can be re-written in an equivalent form to the regression with interaction used for analysing neuronal responses (see Eq. 3 below): with A and B as milliliter quantity of Reward A (plotted at y-axis) and Reward B (x-axis), respectively, b 0 as offset coefficient, b 1 and b 2 as behavioral regression coefficients, and e as compound of errors err 0 , err 1 , err 2 , err 3 for offset and regressors 1-3.

Definition and criteria for pre-sated and sated states.
With on-going reward consumption, the changes of psychophysical choice functions exceeding the confidence intervals (CI) of initial tests suggested a changed subjective value relationship between the two bundle rewards suggestive of relative, reward-specific satiety (see Figs. 1D, S1A, S1E). More specifically, the gradual effect of satiety on choice preference was identified by tracking the IPs as consumption advanced across blocks of 80 trials. Importantly, these changes occurred fast enough to be studied during the recording durations of single neurons, thus allowing us to compare responses between non-sated and sated states in the same neuron. The Weibull-fitted IPs were obtained psychophysically for fixed and equally spaced quantities of Reward B. Changes in relative subjective value of the two bundle rewards were assessed with interleaved anchor trials in choices between bundles with only one non-zero reward: bundle (fixed non-zero blackcurrant juice; no Reward B) vs. bundle (no blackcurrant juice; variable non-zero Reward B), using any Reward B (Fig. S1B). To aggregate IP data across sessions and compensate for across-session variability, we normalized the reward quantity ratio to the first titration block in all sessions. We then compared the normalized distributions of IPs within the CI of the first block with the distributions of IPs exceeding the CI of the first block.

Control regressions for behavioral choice.
To test whether the animal's choice reflected the quantity of the bundle rewards during satiety, rather than other, unintended variables such as spatial bias, we used the logistic regression with P (V) as probability of choice of Variable Bundle, b 0 as offset coefficient, b 1 -b 7 as correlation strength (regression slope) coefficients indicating the influence of the respective regressor, CT as trial number within blocks of consecutive trials, RA as quantity of Reward A of Reference Bundle, RB as quantity of Reward B of Reference Bundle, VA as quantity of Reward A of Variable Bundle, VB as quantity of Reward B of Variable Bundle, CL as choice of any bundle stimulus presented at the left, MA as consumed quantity of Reward A, MB as consumed quantity of Reward B, and e as compound error for offset and all regressors. We used a binomial fit with logit link function to obtain standardized b coefficients. Choices over zero-reward bundles were excluded in the regression to avoid internal correlation between value and consumption.
Licking. Licking was monitored with an infrared optosensor positioned below the juice spout (V6AP; STM Sensors). Anticipatory lick durations were measured between the appearance of the bundle stimuli and delivery of the first reward liquid (approximately 5 -6 s duration) in bundles containing only one non-zero reward (single-reward bundles) within single working sessions. Licking data were collected with four bundle types, namely (blackcurrant juice, grape juice), (blackcurrant juice, water), (blackcurrant juice, strawberry juice) and (blackcurrant juice, mango juice). Surgical procedures and electrophysiology. As described before for the same animals (2), a headrestraining device and a recording chamber (40 x 40 mm, Gray Matter) were implanted on the skull under full general anesthesia and aseptic conditions. The stereotactic coordinates of the chamber enabled neuronal recordings of the orbitofrontal cortex (OFC) (7). We located the OFC from bone marks on coronal and sagittal radiographs taken with a guide cannula inserted at a known coordinate in reference to the implanted chamber, using a medio-lateral vertical and a 20º degree forward-directed approach aiming for area 13. Monkey A provided data from the left hemisphere and Monkey B from the right hemisphere via a craniotomy ranging from Anterior 30 to Anterior 38 and Lateral from 0 to 19. We conducted single-neuron electrophysiological recordings using both custom made glass-coated tungsten electrodes (8) and commercial electrodes (Alpha Omega) (impedance of about 1 MOhm at 1 kHz). Electrodes were inserted into the cortex with a multielectrode drive (NaN drive) with the same angled approach as used for the radiography. Neuronal signals were collected at 20 kHz, amplified using conventional differential amplifiers (CED 1902 Cambridge Electronics Design) and band-passed filtered (high: 300 Hz, low: 5 kHz). We used a Schmitt-trigger to digitize the analog neuronal signal online into a computer-compatible TTL signal. However, we did not use the Schmitt-trigger to separate simultaneous recordings from multiple neurons, in which case we searched for another recording from only a single neuron, or we stored occasionally the data in analog form for off-line separation by dedicated software (Plexon offline sorter). An infrared eye tracking system monitored eye position (ETL200; ISCAN), with temperature check on an experimenter's hand at the approximate position of the animal's head.
Definition of neurons following the revealed preference scheme. We analysed single-neuron activity during four task epochs vs. Pretrial control (1 s): visual Bundle stimulus (2 s), Go signal (1 s), Choice (1 s) and Reward (2 s, starting with Reward A, followed 0.5 s later by Reward B, except where noted, thus covering both rewards). To establish neuronal relationships to these task epochs, we compared the activity in each neuron during each task epoch separately against the Pretrial control epoch using the paired Wilcoxon test (P < 0.01). A neuron was considered task-related if its activity in at least one of the four task epochs differed significantly from the activity during the Pretrial control epoch.
Responses of individual neurons should follow the scheme of two-dimensional ICs that characterizes revealed behavioral preferences for two-dimensional bundles. Specifically, the responses should comply with three characteristics defined previously (2).
(Characteristic 1) Neuronal responses should change monotonically with increasing behavioral preference across behavioral ICs, irrespective to bundle composition. Such monotonic neuronal response changes should reflect increasing quantities of one or both bundle rewards, assuming a positive monotonic subjective value function on reward quantity.
(Characteristic 2) Neuronal responses should vary insignificantly for all equally preferred bundles positioned along a same behavioral IC, despite different physical bundle composition.
(Characteristic 3) Neuronal responses should follow the IC slope and the nonlinear curvature of behavioral ICs. The IC slope reflects the subjective value relationship between the two bundle rewards, and thus the subjective value of one reward (in our case Reward B) in the common currency of a reference reward (Reward A).
We used a combination of three statistical tests to assess these characteristics. Characteristic 1: To capture the change across ICs in the most conservative, assumption-free manner possible, we used a simple linear regression on each Wilcoxon-identified task-related response: with y as neuronal response in any of the four task epochs, measured as impulses/s and z-scored changes identified by Eq. 3 needed to be also significant in a Spearman rank-correlation test that assessed ordinal monotonicity of response change across ICs without assuming linearity and numeric scale (P < 0.05). Characteristics 1 and 2: To assess the two-dimensional across/along IC scheme in a direct and intuitive way, and without assuming monotonicity, linearity and numeric scale, we used a twofactor Anova on each Wilcoxon-identified task-related response that was also significant for both regressors in Eq. 3; the factors were across-IC (ascending rank order of behavioral ICs) and along-IC (same rank order of behavioral IC). To be a candidate for following the IC scheme of Revealed Preference Theory, changes across-ICs should be significant (P < 0.05), changes within-IC should be insignificant, and their interaction should be insignificant.
Characteristic 3: Whereas the regression defined by Eq. 3 estimated neuronal responses across ICs, a full estimation of neuronal ICs for comparison with behavioral ICs would require inclusion of the IC slope and curvature, both of which depended on both rewards. By simplifying Eq. 3 by setting to zero both the b 3 coefficient and the constant neuronal response along the IC, the neuronal IC slope would be the ratio of coefficients (-b 2 / b 1 ). Note the different meanings of the slope term: the neuronal IC slope (-b 2 / b 1 ) describes the relative coding strength of the two bundle rewards (reflecting the neuronal ratio of the two rewards), whereas each neuronal regression slope alone (b) describes the coding strength of neuronal response (correlation with the specific regressor). The neuronal IC curvature was estimated from the b 3 coefficient of the interaction term AB (all b's P < 0.05; t-test).
Neuronal chosen value coding. As stated before (2), chosen value (CV) was defined as the value of the option the animal was going to choose or had already chosen. As each option consisted of two components, we used a linear combination of the quantity of Reward A (blackcurrant juice) and Reward B (any of the other five rewards): Weighting parameter k 1 served to adjust for differences in subjective value between rewards A and B. We established parameter k 1 during neuronal recording sessions from behavioral choice IPs using quantitative psychophysics in anchor trials (80 trials per test, see above Trial types for neuronal tests), rather than reading it from fitted ICs. Thus, k 1 equals the ratio of coefficients b 2 / b 1 of Eq. 3.
We established a common-currency scale in ml for all tested rewards by defining blackcurrant juice or blackcurrant-MSG (Reward A) as reference (numeraire); the subjective value of any reward is expressed as real-number multiple k 1 of the quantity of the numeraire at choice indifference. Specifically, the animal chose between the Variable Bundle that contained a psychophysically varied quantity of blackcurrant juice and the Reference Bundle that contained a fixed quantity of blackcurrant juice. A k 1 of < 1 indicated that more quantity was required for choice indifference against blackcurrant juice; thus, k 1 < 1 suggested that the tested reward had lower subjective value than blackcurrant juice. By contrast, k 1 > 1 suggested higher subjective value, as less quantity was required for choice indifference.
We assessed the coding of chosen value and unchosen value in all neurons that followed the revealed preference scheme, using the following regression: with UCV as value of the unchosen option that was not further considered here, and e as compound error for offset and all regressors.
Vector plots of OFC reward sensitivity. The purpose of this analysis was to provide quantitative and graphic information about satiety-induced behavioral and neuronal changes that would allow comparison with previous OFC studies that had not used two-component choice options with individually varying reward quantities and therefore did not establish ICs (9). This simplified analysis addressed monotonic response increase or decrease with increasing quantities of bundle rewards across ICs (characteristic 1 above), but did not address other IC characteristics such as trade-off, slope and curvature (characteristics 2 and 3) that had not been investigated previously. We established 2D plots whose dots indicated the relative contribution of each of the two bundle rewards to the neuronal response. We then compared vectors of behavioral choices with vectors of averaged neuronal population responses before and during satiety.
For behavioral choices, we plotted vectors (with 95% confidence intervals) from averaged dot positions defined by reward quantity (distance from center: sqrt (b 1 2 + b 2 2 )) and relative weight (elevation angle: arctangent (b 1 / b 2 )); coefficient b 1 refers to Reward A (blackcurrant, y-axis), coefficient b 2 refers to any of the other rewards (x-axis) (Eq. 1a). The angle of the vector reflects the relative contribution the two bundle rewards to the choice, as estimated by the a and b coefficients (Eq. 1). A deviation of the alignment angle from the diagonal line indicates an unequal contribution weight to bundle choice, and thus a non-1:1 reward ratio.
For neuronal responses, each dot on the two-dimensional plot was defined by the two b regression coefficients for neuronal responses (Eq. 3; P < 0.01, t-test) for each of the two rewards in any of the four task epochs. The distance from center indicates the z-scored response magnitude (sqrt (b 1 2 + b 2 2 )), coding sign (positive or negative), and relative weight (elevation angle; arctangent (b 1 / b 2 )) of the two b coefficients. Coefficient b 1 refers to Reward A (blackcurrant, yaxis), coefficient b 2 refers to any of the other rewards (x-axis). Responses with negative (inverse) coding were rectified. Further IC characteristics such as systematic trade-off across multiple IPs and IC curvature played no role in these graphs. The alignment of the dots along the diagonal axis shows the relative coding strength for the two bundle rewards, as estimated by the b regression coefficients; a deviation from the diagonal line indicates an unequal influence of the two bundle rewards on the neuronal responses, reflecting a neuronal correlate of reward ratio. Neuronal decoding. We used a linear support vector machine (SVM) classifier to decode neuronal activity according to bundles presented at different behavioral ICs during choice over zero-reward bundle (bundle distinction) and, separately, according to the behavioral choice between two nonzero bundles located on different ICs (choice prediction). As in our main study on revealed preferences (2), we implemented the decoder with linear kernel using custom-written software with svmtrain and svmclassify procedures in Matlab R2015b (Mathworks). (our previous work had shown that use of nonlinear SVM kernels did not improve decoding) (10). The SVM decoder was trained to find the optimal linear hyperplane for the best separation between two neuronal populations relative to lower vs. higher ICs.
All analyses employed single-neuron data, consisting of single-trial impulse counts that had been z-normalised to the activity during the Pretrial epoch in all trials recorded with the neuron under study. The analysis included activity from all neurons whose responses followed the IC scheme of revealed preferences during any of the four task epochs, as identified by our three-test statistics, except where noted. The neurons were recorded one at a time; therefore, the analysis concerned aggregated pseudo-populations of neuronal responses.
The decoding analysis used 10 trials per neuron for each of two ICs (total of 20 trials). Extensive analysis suggested that higher inclusion of 15-20 trials per group did not provide significantly better decoding rates (while reducing the number of included neurons). For neurons that had been recorded with > 10 trials per IC, we selected randomly 10 trials from each neuron for each of the two ICs. We used a leave-one-out cross-validation method in which we removed one of the 20 trials and trained the SVM decoder on the remaining 19 trials. We then used the SVM decoder to assess whether it accurately detected the IC of the left-out trial. We repeated this procedure 20 times, every time leaving out another one of the 20 trials. These 20 repetitions resulted in a percentage of accurate decoding (% out of n = 20). The final percentage estimate of accurate decoding resulted from averaging the results from 150 iterations of this 20-trial random selection procedure. To distinguish from chance decoding, we randomly shuffled the assignment of neuronal responses to the tested ICs, which should result in chance decoding (accuracy of 50% correct). A significant decoding with the real, non-shuffled data would be expressed as statistically significant difference against the shuffled data (P < 0.01; Wilcoxon rank-sum test).

Fig. S1. Additional behavioral tests demonstrating reward-specific satiety by changes of indifference curves (IC).
(A) Psychophysical assessment of choice between single-reward bundles with grape juice variation (constant Reference Bundle: 0.4 ml blackcurrant juice, 0.0 ml grape juice; Variable bundle: 0.0 ml blackcurrant juice, varying grape juice). Green and violet curves inside green 95% confidence interval: initial choices; blue, orange and red curves: advancing consumption. The decrease in blackcurrant : grape juice ratio at IP was significant between the first IP and all IPs exceeding the first confidence interval (ratios of 1.9857 ± 0.0173, n = 139, green, vs. 1.0077 ± 0.02, orange and red; mean ± standard error of the mean, SEM; individual trial blocks: P = 9.6943 x 10 7 , Kolmogorov-Smirnov test; P = 2.336 x 10 -32 , Wilcoxon rank-sum test; P = 3.1712 x 10 -46 , t-test;       Table S1. Neuronal changes with on-going reward consumption during different task epochs. decrease with higher subjective value. Most neurons were tested both in choice over zero-reward bundle and in choice between two non-zero bundles. Changes during the Reward epoch may indiscriminately reflect changes in subjective reward value and consumption (mouth movements, sensory stimulation); no attempts were made to distinguish neuronal relationships between these factors.