Reduced neuronal value signals in monkey orbitofrontal cortex during relative reward-specific satiety

Acknowledgements: We thank Aled David for invaluable help with animal training, Dr. Polly Taylor for anaesthesia during implantation, Paul Cisek for sharing his SQL-Matlab toolbox (NeuroMath), and Charles R. Plott, Christopher Harris, Simone Ferrari-Toniolo and Fabian Grabenhorst for inspiration and insightful comments on experimental economics and neuronal data analysis. The Wellcome Trust (WT 095495, WT 204811), European Research Council (ERC; 293549) and US National Institute of Mental Health Caltech Conte Center (NIMH; P50MH094258) supported this work.


SUMMARY
Reward-specific satiety changes the subjective value of one reward relative to other rewards. Two-dimensional indifference curves (IC) capture the relative reward-specific values of two-component choice options according to Revealed Preference Theory. Any change of reward value would be captured by specific IC distortions. We estimated two-dimensional ICs from stochastic choice and found that natural, on-going consumption of two liquid rewards led to characteristic IC changes indicative of relative value reduction of specific rewards, suggesting reward-specific satiety. Licking changes confirmed the satiety in a mechanism-independent manner. Neuronal reward signals in monkey orbitofrontal cortex (OFC) followed the specific IC distortions and indicated value changes compatible with relative reward-specific satiety. A neuronal classifier reliably predicted the value changes inferred from the altered behavioral choices. These results demonstrate that neuronal signals in OFC reflect the altered subjective value of selectively sated rewards during economic choice.
bioRxiv preprint doi: https://doi.org/10.1101/2020.07.04.187518; this version posted July 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY 4.0 International license.

INTRODUCTION
An animal's internal state markedly influences subjective reward value (Cabanac, 1971; Rolls et al., 1983). A classic case is satiety, in which two processes come into play. General, non-differential satiety reduces the subjective value of all rewards and also involves changes in general arousal, attention and motivation. By contrast, reward-specific satiety reduces the subjective value of specific rewards relative to other rewards (sensory-specific satiety; Cabanac, 1971; Rolls et al., 1983; Reichelt et al., 2014); analogously, salt depletion makes salt attractive, indicating increased value of salt, while leaving the attraction of sugar unchanged (Robinson & Berridge, 2013). However, the neuronal mechanisms of reward-specific satiety are poorly understood in animals, partly because the accompanying general satiety impairs the task performance necessary for well-controlled tests. Existing data point to a role of the orbitofrontal cortex (OFC) (Rolls et al., 1989; Critchley and Rolls, 1996; Small et al., 2001; Kringelbach et al., 2003).
Testing reward-specific satiety requires comparison between a reward on which the animal is sated and at least one other reward on which the animal is less sated or not sated. This requirement matches the notion that choice options have multiple reward components. For example, a meal is composed of meat and vegetables, and the choice of the meal involves both rewards. This multicomponent nature is conceptualized in Revealed Preference Theory, whose two-dimensional indifference curves (IC) graphically display reward preferences and the subjective reward values revealed by measurable choice (Fisher, 1892; Samuelson, 1937; Samuelson, 1938). The preferences may be fixed, as the theory assumes, or they may be flexibly constructed on the fly at the time of choice; the distinction is debatable but not crucial for the current experiment (Payne, Bettman, & Schkade, 1999; Simonson, 2008; Dhar & Novemsky, 2008; Kivetz, Netzer & Schrift, 2008; Warren, McGraw & Van Boven, 2011). Our previous work established ICs in rhesus monkeys that represent subjective reward values in an orderly manner and fulfill necessary requirements for rationality, including completeness (preference for one or the other option, or indifference), transitivity, and independence of option set size (Pastor-Bernier et al., 2017). Similar ICs were empirically estimated in humans (Pastor-Bernier et al., 2020). The ICs represent the relative subjective values of the two bundle rewards; thus, importantly for the present study, IC changes would indicate changes in relative reward value. Responses of substantial fractions of OFC neurons follow the IC scheme, increasing with higher subjective value and remaining equal for differently composed but equally valued bundles (Pastor-Bernier et al., 2019). The feasibility of testing OFC neurons with two-reward bundles allowed us to investigate value changes indicative of relative, reward-specific satiety during on-going reward consumption.
The current study used the rigorous formalism of ICs, whose slope and curvature reflect, and change with, the subjective value of one bundle reward relative to the other bundle reward, to investigate the influence of on-going reward consumption on OFC neurons. We presented monkeys with bundles containing a common juice (blackcurrant) and one other liquid reward. We tested stochastic choices rather than single-shot choices for reasons of neuronal response statistics. Natural, on-going consumption of the reward bundles systematically altered the geometry and key parameters of the ICs, which suggested reward value changes reflecting reward-specific satiety. Neuronal signals in OFC coding the chosen value of multiple rewards followed the altered ICs that indicated reward-specific satiety. These data from a novel, concept-driven approach unequivocally demonstrate reward-specific satiety in OFC value neurons.

Figure 1. Design, task and behavior
(A) Test scheme: relative reward-specific satiety indicated by a decreasing trade-off. With on-going consumption of both juices, the animal gave up progressively less blackcurrant juice to obtain the same amount (0.3 ml) of grape juice while maintaining choice indifference between the black bundle and one of the colored bundles (from green to red). The two colored curves show indifference curves estimated from choices of bundles between the colored dots. These changes suggested subjective value loss of grape juice relative to blackcurrant juice. (B) Choice options. Each bundle contained two rewards (A, B) with independently set amounts indicated by the vertical bar position within each rectangle (higher was more). The Reference Bundle contained two preset reward amounts. The Variable Bundle contained a specific amount of one reward and an experimentally varied amount of the other reward. (C) Task sequence: in each trial the animal contacted a central touch key for 1.0 s; then the two choice options appeared on a computer monitor. After 2.0 s, two blue spots appeared on the monitor, and the animal touched one option within 2.0 s. After the animal had touched the target for 1.0 s, the blue spot underneath the chosen bundle turned green as feedback for successful selection, and the other blue spot disappeared. The computer-controlled solenoid liquid valve delivered reward A at 1.0 s after the choice and reward B 0.5 s later. (D) Psychophysical assessment of choice between the constant Reference Bundle (0.6 ml blackcurrant juice, 0.0 ml grape juice) and the Variable Bundle (varying blackcurrant juice, 0.3 ml grape juice) (same bundles as in C). Green and violet curves, inside the green 95% confidence interval: initial choices; blue, orange and red curves: on-going consumption. Each curve was estimated from 80 trials (Weibull fits). (E) Gradual changes in slope and curvature of choice indifference curves from the pre-sated state (green, violet) to increasing satiety (blue, orange, red).
The two-dimensional ICs represent choices of two-reward bundles in a convenient graphical manner (Figure 1A). In choice between two bundles, relative reward value can be inferred from the amount of one reward the animal gives up in order to obtain one unit of the other reward (the Marginal Rate of Substitution, MRS). The trade-off is measured at choice indifference between the new bundle and the old bundle (equal probability, P = 0.5, of choosing each of the two options). The positions of equally preferred bundles on the two-dimensional graph are called choice indifference points (IP). Any change in the relative value of the two components due to reward-specific satiety would manifest as a change in the trade-off amounts at the IP (MRS change).
At the onset of a daily experiment, the black and green bundles of Figure 1A were chosen with equal probability. When choosing the green bundle, the animal gave up 0.5 ml of blackcurrant juice (from 0.6 ml to 0.1 ml) to gain 0.3 ml of grape juice. With repeated choices, the animal consumed both juices, and the trade-off amounts changed: to gain the same 0.3 ml of grape juice, the animal gave up progressively less blackcurrant juice, from 0.45 ml via 0.38 ml and 0.25 ml to finally only 0.18 ml (upward arrow, from violet via blue and orange to red). Thus, the slope of the IC between the black and the colored bundles changed as the animal 'paid' progressively less blackcurrant juice for the same amount of grape juice. The IC also changed in shape: the curvature changed from initially convex (relative to the origin; green) to concave (red), indicating that the animal was reluctant to give up any blackcurrant juice unless it received substantial amounts of grape juice. Both changes indicated a reduction in the subjective value of grape juice relative to blackcurrant juice during on-going consumption of both juices, which suggested relative reward-specific satiety for grape juice. These IC changes constituted our test scheme for satiety.
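The trade-off arithmetic above can be made explicit. The sketch below is illustrative only: it takes the first four trade-off amounts quoted in the text at face value and treats the MRS as the simple ratio of blackcurrant juice given up to grape juice gained at a discrete IP (an assumption; the study's own estimator is defined in its Methods, not here). The falling ratio is the numeric counterpart of the IC flattening.

```python
# Trade-off at choice indifference: blackcurrant juice (reward A) given up for a fixed
# 0.3 ml gain of grape juice (reward B), using the first four amounts quoted in the text.
GRAPE_GAINED_ML = 0.3
BLACKCURRANT_GIVEN_UP_ML = {
    "green": 0.50,
    "violet": 0.45,
    "blue": 0.38,
    "orange": 0.25,
}


def marginal_rate_of_substitution(given_up_ml: float, gained_ml: float) -> float:
    """ml of reward A given up per ml of reward B gained, read off at an IP."""
    return given_up_ml / gained_ml


mrs = {
    state: marginal_rate_of_substitution(amount, GRAPE_GAINED_ML)
    for state, amount in BLACKCURRANT_GIVEN_UP_ML.items()
}
for state, value in mrs.items():
    print(f"{state:>6}: MRS = {value:.2f} ml blackcurrant per ml grape")
```

A decreasing MRS across states means each ml of grape juice is "worth" less blackcurrant juice, i.e. grape juice lost value relative to blackcurrant juice.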

On-going reward consumption affects subjective value represented by ICs
To establish ICs representing subjective reward value, we presented the monkey simultaneously with two composite stimuli on a horizontally mounted touch screen (binary choice task with two discrete, mutually exclusive and collectively exhaustive options; Figure 1B, C). Two rectangles in each stimulus represented a bundle with two reward components whose individual amounts were indicated by a vertical bar (higher was more). Reward A was blackcurrant juice, or blackcurrant juice with added monosodium glutamate (MSG), in all bundle types; reward B was grape juice, strawberry juice, mango juice, water, apple juice, peach juice, or grape juice with added inosine monophosphate (IMP).
We set both rewards in the Reference Bundle to specific amounts, psychophysically varied the amount of one reward in the Variable Bundle over the whole testing range, and estimated the amount at which both bundles were stochastically chosen with equal probability, using a Weibull fit to the choice function. These two amounts defined the IP on the two-dimensional graph. As schematized in Figure 1A, on-going juice consumption during these choices resulted in increasing amounts of blackcurrant juice being retained for gaining the same amount of grape juice at choice indifference (Figure 1D; rightward shifts of IPs from green via violet, blue and orange to red). The initial two IPs were close together (green and violet in the green zone), whereas the subsequent IPs changed substantially, suggesting that the relative subjective value of the two rewards was initially maintained until an outright drop occurred (blue, orange and red IPs). These changes indicated progressive value reduction of grape juice with on-going consumption.
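The IP estimation step can be sketched as follows. The text names only "a Weibull fit", so the details here are assumptions: the two-parameter Weibull psychometric form P(x) = 1 - exp(-(x/scale)^shape), a binomial maximum-likelihood criterion, and an exhaustive grid search (chosen to keep the example dependency-light rather than to mirror the authors' fitting routine). The choice counts are simulated from a known curve so that the recovered IP can be checked.

```python
import numpy as np


def weibull_choice_prob(amount, scale, shape):
    """Weibull psychometric function: P(choose Variable Bundle) vs. varied amount."""
    return 1.0 - np.exp(-(np.asarray(amount, dtype=float) / scale) ** shape)


def fit_weibull_grid(amounts, n_chosen, n_trials, scales, shapes):
    """Maximum-likelihood (binomial) fit by exhaustive grid search over (scale, shape)."""
    x = np.asarray(amounts, dtype=float)
    k = np.asarray(n_chosen, dtype=float)
    n = np.asarray(n_trials, dtype=float)
    best_scale, best_shape, best_ll = None, None, -np.inf
    for scale in scales:
        for shape in shapes:
            # Clip to avoid log(0) where the model predicts exactly 0 or 1.
            p = np.clip(weibull_choice_prob(x, scale, shape), 1e-9, 1.0 - 1e-9)
            ll = np.sum(k * np.log(p) + (n - k) * np.log(1.0 - p))
            if ll > best_ll:
                best_scale, best_shape, best_ll = scale, shape, ll
    return best_scale, best_shape


def indifference_amount(scale, shape):
    """Amount at which the fitted curve crosses P = 0.5 (the IP coordinate)."""
    return scale * np.log(2.0) ** (1.0 / shape)


# Hypothetical session: 7 tested amounts of the varied reward (ml), 1000 trials each,
# with choice counts generated from a known curve so the recovered IP can be checked.
amounts = np.linspace(0.0, 0.6, 7)
n_trials = np.full(amounts.shape, 1000)
true_scale, true_shape = 0.3, 3.0
n_chosen = np.round(n_trials * weibull_choice_prob(amounts, true_scale, true_shape))

scale_hat, shape_hat = fit_weibull_grid(
    amounts, n_chosen, n_trials,
    scales=np.linspace(0.05, 1.0, 191),
    shapes=np.linspace(0.5, 6.0, 56),
)
ip_hat = indifference_amount(scale_hat, shape_hat)
true_ip = indifference_amount(true_scale, true_shape)
print(f"estimated IP: {ip_hat:.3f} ml (true {true_ip:.3f} ml)")
```

The P = 0.5 crossing follows in closed form from the Weibull parameters, so any fitting method that returns (scale, shape) can reuse `indifference_amount`.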
The IPs were used to fit ICs along which all bundles were equally preferred (Figure 1E; see Methods, Eq. 1). For example, the green IC was fitted from bundles that were all equally preferred to each other (and equally preferred to the black bundle at top left, given previous transitivity tests; Pastor-Bernier et al., 2019). On-going juice consumption resulted in a well-ordered, monotonic change of IC slope from green to red and a concomitant transition from convex via linear to concave curvature, indicating relative reward-specific value reduction and satiety for grape juice.
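Eq. 1 is not reproduced in this section, so the sketch below deliberately uses a different, generic stand-in: a quadratic polynomial y = c0 + c1·x + c2·x² fitted to the IPs by least squares. For a downward-sloping IC, the average derivative summarizes the slope and the sign of c2 summarizes the curvature (c2 > 0 bows the curve toward the origin, i.e. convex). The IP coordinates and the tolerance for "linear" are hypothetical.

```python
import numpy as np


def fit_quadratic_ic(x_ips, y_ips):
    """Least-squares fit y = c0 + c1*x + c2*x**2 through indifference points."""
    c2, c1, c0 = np.polyfit(x_ips, y_ips, deg=2)  # polyfit returns highest power first
    return c0, c1, c2


def mean_slope(c1, c2, x_lo, x_hi):
    """Average of dy/dx = c1 + 2*c2*x over [x_lo, x_hi]."""
    return c1 + c2 * (x_lo + x_hi)


def curvature_label(c2, tol=0.05):
    """For a downward-sloping IC, c2 > 0 bows it toward the origin (convex)."""
    if c2 > tol:
        return "convex"
    if c2 < -tol:
        return "concave"
    return "linear"


# Hypothetical IPs (x: reward B in ml, y: reward A in ml) before and during satiety.
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
y_presated = 0.6 - 1.5 * x + 1.0 * x**2
y_sated = 0.6 - 0.2 * x - 1.0 * x**2
for label, y in (("pre-sated", y_presated), ("sated", y_sated)):
    c0, c1, c2 = fit_quadratic_ic(x, y)
    print(f"{label}: mean slope {mean_slope(c1, c2, 0.0, 0.5):.2f}, {curvature_label(c2)}")
```

In this toy example the IC becomes flatter (mean slope from -1.0 to -0.7) and switches from convex to concave, mirroring the qualitative pattern described above.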
Positioning single-component bundles along the x- and y-axes allowed numeric value assessment without liquid interaction within bundles. Conversely to the scheme above, we held blackcurrant juice constant and psychophysically estimated the trade-in amounts of grape juice at IPs (Figure S1A-C). With on-going juice consumption, the animal gave up the same constant amount of blackcurrant juice only when gaining monotonically increasing amounts of grape juice at IP, thus reducing the blackcurrant:grape juice ratio and confirming the relative value reduction of grape juice. The IC curvature changed in a similar way as with the original testing scheme (Figure S1D). The ICs of Monkey B showed similar changes (Figures S1E and S1F). These tests demonstrate robust value reduction of grape juice with on-going consumption irrespective of the test scheme employed.

Consistency across different bundles
Two rhesus monkeys performed 74,659 trials with the eight bundle types (Figure 2). Given that relative reward-specific satiety would change the ratio of reward amounts at IPs, and given the observation that the animals sated least on blackcurrant juice, we defined the boundary between pre-sated and sated states by the confidence interval of the initial, left-most choice function between blackcurrant juice and any other reward (green in Figures 1D, S1A and S1E); any IP outside this interval was assigned to the sated state. On-going consumption of all eight bundles by both animals produced asymmetric, satiety-related changes of IC shape (Figure 2). Stronger satiety for 7 of the 8 liquids (x-axis) relative to blackcurrant juice (y-axis) resulted in flattening of the ICs and a gradual transition from convexity via linearity to concavity. However, monkey B seemed to become less sated on peach juice than on blackcurrant juice, as suggested by steeper ICs (Figure 2H); with on-going consumption, the animal gave up more blackcurrant juice for gaining the same amount of peach juice, indicating value loss of blackcurrant juice relative to peach juice.
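The pre-sated/sated boundary can be expressed as a one-line rule. This sketch assumes (our assumption, not stated in the text) that the confidence interval of the initial choice function has already been converted into an interval on the IP coordinate; the class and function names and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceInterval:
    low: float
    high: float


def is_sated(ip_amount: float, presated_ci: ConfidenceInterval) -> bool:
    """True if an IP lies outside the confidence interval of the initial choice function."""
    return not (presated_ci.low <= ip_amount <= presated_ci.high)


# Hypothetical numbers: the initial IP interval and a sequence of later IPs (ml).
initial_ci = ConfidenceInterval(low=0.08, high=0.14)
later_ips = [0.10, 0.13, 0.22, 0.35, 0.42]
labels = ["sated" if is_sated(ip, initial_ci) else "pre-sated" for ip in later_ips]
print(list(zip(later_ips, labels)))
```

IPs on the interval boundary count as pre-sated here; whether the boundary is inclusive in the study is not specified.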
Numeric comparisons of IC parameters substantiated these findings. The IC slope relative to blackcurrant juice decreased significantly with on-going consumption of all rewards except peach juice and strawberry juice (Figure S1G; P = 0.0156, Wilcoxon paired test). The IC curvature flattened significantly and switched from convex to concave with five of the eight tested bundle types (Figures 2 and S1H; P = 0.0313). These IC changes demonstrated robust relative subjective value loss with on-going liquid consumption across a variety of bundle types.
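For small samples the exact two-sided Wilcoxon signed-rank P value has a floor of 2/2^n, reached when all paired differences share the same sign: 2/128 ≈ 0.0156 for n = 7 and 2/64 ≈ 0.0313 for n = 6, values that coincide with the P values reported above. The sketch below computes the exact test by enumerating all sign assignments of the ranks. It is an illustration with hypothetical slope differences, not a reconstruction of the published analysis, and for simplicity it rejects zero differences and tied magnitudes.

```python
from itertools import product


def wilcoxon_signed_rank_exact(diffs):
    """Exact two-sided Wilcoxon signed-rank test for paired differences.

    Enumerates all 2**n sign assignments of the ranks, so it is only practical for
    small n; zero differences and tied magnitudes are rejected for simplicity.
    """
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    magnitudes = sorted(abs(d) for d in diffs)
    if len(set(magnitudes)) != len(magnitudes):
        raise ValueError("tied magnitudes are not handled in this sketch")
    rank_of = {m: i + 1 for i, m in enumerate(magnitudes)}
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)

    n = len(diffs)
    ranks = range(1, n + 1)
    expected = n * (n + 1) / 4  # mean of W+ under the null hypothesis
    observed_dev = abs(w_plus - expected)
    as_extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(w - expected) >= observed_dev - 1e-12:
            as_extreme += 1
    return w_plus, as_extreme / 2**n


# Hypothetical pre-sated minus sated IC slopes for seven bundle types, all positive.
slope_drops = [0.9, 0.4, 1.2, 0.7, 0.3, 1.5, 0.6]
w_plus, p_value = wilcoxon_signed_rank_exact(slope_drops)
print(w_plus, p_value)
```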

Control for other choice variables
To confirm that bundle choice continued to vary only with the bundle rewards and did not reflect unrelated variables during satiety, we performed a logistic regression (Eq. 2). As before satiety (Pastor-Bernier et al., 2019), the probability of choosing the Variable Bundle correlated positively with the amounts of both of its rewards and inversely with the amounts of both Reference Bundle rewards (Figure S1I; VA, VB vs. RA, RB). Further, choice probability for the Variable Bundle was anticorrelated with the accumulated consumption of blackcurrant juice (MA) and positively correlated with grape juice consumption (MB). This asymmetry is explained by the trade-off at IPs: as grape juice lost more value than blackcurrant juice during satiety, the animal consumed more grape juice and gave up less blackcurrant juice. Trial number within individual trial blocks (CT) and spatial choice (CL) did not explain the choice. Thus, even with on-going consumption, the animals based their choice on the reward amounts of the bundles and on the actually consumed rewards, as intended by the experimental design; unrelated variables continued to have no significant influence.
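Eq. 2 is not shown in this section. A minimal sketch, assuming it is a standard logistic regression of P(choose Variable Bundle) on the eight named regressors plus an intercept, is given below; it fits the model by iteratively reweighted least squares (Newton's method) in plain NumPy. The simulated trials, the regressor scaling and the "true" weights are hypothetical and were chosen only to reproduce the signs described above.

```python
import numpy as np

REGRESSORS = ["VA", "VB", "RA", "RB", "MA", "MB", "CT", "CL"]


def fit_logistic_irls(X, y, n_iter=25):
    """Logistic regression by iteratively reweighted least squares (Newton's method).

    Returns [intercept, beta_1, ..., beta_k] for the columns of X.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))  # predicted P(choose Variable Bundle)
        weights = p * (1.0 - p)
        gradient = X1.T @ (y - p)
        hessian = X1.T @ (X1 * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# Simulated trials with hypothetical true weights chosen to mimic the reported signs.
rng = np.random.default_rng(0)
n = 4000
X = np.column_stack([
    rng.uniform(0.0, 0.6, n),  # VA: Variable Bundle, reward A (ml)
    rng.uniform(0.0, 0.6, n),  # VB: Variable Bundle, reward B (ml)
    rng.uniform(0.0, 0.6, n),  # RA: Reference Bundle, reward A (ml)
    rng.uniform(0.0, 0.6, n),  # RB: Reference Bundle, reward B (ml)
    rng.uniform(0.0, 1.0, n),  # MA: accumulated reward A consumption (normalized)
    rng.uniform(0.0, 1.0, n),  # MB: accumulated reward B consumption (normalized)
    rng.uniform(0.0, 1.0, n),  # CT: trial number within block (normalized)
    rng.integers(0, 2, n),     # CL: spatial variable, hypothetically coded left = 0, right = 1
])
true_beta = np.array([0.0, 6.0, 4.0, -6.0, -4.0, -1.0, 1.0, 0.0, 0.0])
p_true = 1.0 / (1.0 + np.exp(-(np.column_stack([np.ones(n), X]) @ true_beta)))
y = (rng.uniform(size=n) < p_true).astype(float)

beta_hat = fit_logistic_irls(X, y)
for name, b in zip(["intercept"] + REGRESSORS, beta_hat):
    print(f"{name:>9}: {b:+.2f}")
```

IRLS converges in a handful of iterations for well-conditioned designs like this one; a production analysis would also report standard errors (from the inverse Hessian) to judge significance.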

Licking and liquid consumption
Licking durations are a crude means of assessing subjective reward value but could provide a mechanism-independent confirmation of the value changes seen with the ICs. Trial-by-trial time courses of licking durations with on-going consumption showed gradual and asymmetric decreases for the bundle rewards. Licking remained nearly constant for blackcurrant juice (slope = -2.86 deg, R² = 0.56; linear regression) but decreased strongly for grape juice (slope = -20.6 deg, R² = 0.50), suggesting stronger value loss for grape juice than for blackcurrant juice (Figure 3A, B). Cumulative lick durations were significantly longer in the pre-sated state (green) than in the sated state (violet) with the main liquids tested in both monkeys (Figure 3C-G). The reward value changes inferred from lick durations corresponded to those inferred from IC slope and curvature changes. The lick durations also indicated some value reduction of blackcurrant juice, suggesting that the differential reward-specific value changes did not derive from a single bundle reward but were relative between the two rewards.
The IC flattening with on-going consumption indicated that the animal required increasing amounts of the more devalued reward B for giving up the same amount of the less devalued reward A at trade-off (Figures 1E, 2). This led to increasing consumption of the more devalued reward B, which seems paradoxical but can be explained by the choice properties of two-component bundles: at trade-off, the animal gave up some of the less sated reward only if it received more of the sated reward. As the animal had no control over the Reference Bundle that defined the IP, it ended up consuming relatively more of the devalued reward as the session advanced. For example, with the bundle (blackcurrant juice, water), the consumption of the devalued water increased relative to that of the less devalued blackcurrant juice (Figure 3H; blue vs. red; P = 5.0979 × 10⁻⁷; Kolmogorov-Smirnov test; N = 7,160 trials). Concomitantly, the blackcurrant:water ratio at IP decreased, following an exponential decay, which indicated that water had lost more subjective value than blackcurrant juice (Figure 3I). The correlation between this ratio and the combined consumption of bundles of blackcurrant juice with grape juice, water, strawberry juice and mango juice was highly significant (Rho = 0.3859; P = 0.0056; Pearson).
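The exponential decay of the IP ratio can be quantified with a simple log-linear fit. This is a sketch under stated assumptions: the model ratio(t) = a·exp(-k·t) with t counted in trial blocks, and fitting by linear least squares on log(ratio), which weights relative rather than absolute errors (a modelling choice, not necessarily the study's). The ratios below are hypothetical and noise-free so that the fit can be checked exactly.

```python
import numpy as np


def fit_exponential_decay(t, ratio):
    """Fit ratio(t) = a * exp(-k * t) by linear least squares on log(ratio).

    Returns (a, k). Working on the log scale weights relative rather than absolute
    errors, which is a modelling choice, not necessarily the one used in the study.
    """
    t = np.asarray(t, dtype=float)
    log_ratio = np.log(np.asarray(ratio, dtype=float))
    slope, intercept = np.polyfit(t, log_ratio, deg=1)
    return float(np.exp(intercept)), float(-slope)


# Hypothetical blackcurrant:water ratios at IP across 10 trial blocks.
blocks = np.arange(1, 11)
ratios = 2.5 * np.exp(-0.15 * blocks)
a_hat, k_hat = fit_exponential_decay(blocks, ratios)
print(f"a = {a_hat:.3f}, k = {k_hat:.3f} per block")
```

A positive decay constant k indicates that the ratio of the less devalued to the more devalued reward at IP falls as blocks advance.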
Thus, the licking changes confirmed, in a mechanism-independent manner, the relative reward-specific value changes inferred from IC choices.

Figure 3 (caption excerpt). (H) Cumulative consumption of water and blackcurrant juice during 10 advancing blocks and 7,160 anchor trials (each bundle contained only one non-zero liquid). For constant blackcurrant amounts (red), the animal consumed significantly more water than blackcurrant juice in gradually advancing trial blocks.

Neuronal test design
We used the IC changes with on-going reward consumption, observed in a large variety of bundles, to investigate altered value coding in OFC reward neurons. Given the shallower slopes and the less convex and more concave curvatures, we placed bundles on specific segments of the ICs that would change with on-going consumption, such that the physically unaltered bundles would end up on different ICs or IC parts. We tested neurons in either or both of two situations: (i) choice over zero-bundle, in which both rewards were set to zero in one bundle and the animal unfailingly chose the alternative, non-zero bundle; and (ii) choice between two non-zero bundles, in which at least one reward was non-zero in each bundle and the animal chose either bundle. All tested neuronal responses were sensitive to multiple rewards and coded the value of the bundle the animal chose (chosen value). The tested responses followed the basic scheme of ICs (Pastor-Bernier et al., 2019): monotonic increase with bundles placed on different ICs (testing bundles with different values), and insignificant response variation with bundles positioned along the same IC (testing equally preferred bundles with equal value). Our satiety test involved two bundle placements that took the IC properties into account: variation of blackcurrant juice while holding grape juice constant, and variation of grape juice while holding blackcurrant juice constant. Comparison of the x-y plots between the pre-sated state (Figure 4A and B) and the sated state (C and D) illustrates this test scheme. The IC flattening with satiety moved the bundle positions relative to the ICs substantially for grape juice variation (compare B with D) but very little for blackcurrant juice variation (compare A with C). Thus, tests following this design should be sensitive to neuronal changes with satiety.

Single-neuron value-coding follows IC changes
At the beginning of daily testing, neuronal responses increased monotonically with both bundle rewards, confirming value coding by the tested neuron (Figure 4A and B). With on-going reward consumption, the ICs changed; bundles aligned with increasing blackcurrant juice kept their positions on the ICs, and the neuronal responses continued to distinguish reward value during choice over zero-bundle (Figure 4C). By contrast, as the ICs flattened and became concave, the three physically unaltered bundles aligned with increasing grape juice now lay almost on the same IC (Figure 4D), which indicated similar reward value for these bundles. Correspondingly, the neuronal responses failed to vary with grape juice amounts, and the response peak for the largest grape juice amount dropped by 75%, as this bundle was now located on the second-highest IC instead of the highest IC (Figure 4D). This result is consistent with the stronger value reduction of grape juice compared with blackcurrant juice, as inferred from the flattened ICs.
The neuronal changes with on-going reward consumption also occurred in choices between two non-zero bundles (Figure S2). The bundles aligned with increasing blackcurrant juice remained on the same ICs as before, and the responses continued to code the value of the chosen option, as suggested by the intermediate responses to bundles on the intermediate IC (Figure S2A and C; blue; dotted line for hollow dot). By contrast, the three physically unaltered bundles aligned with varying grape juice were now distributed over a narrower and lower IC range, indicating smaller value differences at lower value, and the chosen value responses became correspondingly less differential and lower (Figure S2B vs. S2D; red, blue, green). Further, the responses to the physically unaltered bundle whose position had changed from the intermediate to the highest IC (hollow blue) now dominated all other responses (dotted blue line).
With all these changes, OFC neurons continued to code reward value with on-going reward consumption. Their responses continued to follow the amount of blackcurrant juice, whose value had changed less (Figures 4A and C, and S2A and C), but were substantially altered for grape juice, whose value had changed more (Figures 4B and D, and S2B and D). These OFC signals reflected reward-specific relative value change and satiety as inferred from the altered ICs.

Neuronal population
We investigated satiety in a total of 272 task-related OFC neurons in area 13, at 30-38 mm anterior to the interaural line and 0-19 mm lateral to the midline (part of the population reported previously; Pastor-Bernier et al., 2019). Responses in 98 of these OFC neurons followed the IC scheme in any of the four task epochs (Bundle stimulus, Go, Choice or Reward) during choice over zero-bundle or choice between two non-zero bundles (Table 1). Of the 98 tested neurons, 82 showed satiety-related changes with bundles composed of blackcurrant juice (component A) and grape juice, water or mango juice (component B) (Table 2).

We tested averaged z-scored neuronal population responses with the same scheme of bundle alignment on ICs as with single neurons. Bundles aligned with blackcurrant juice (component A) remained on the same three ICs during satiety; by contrast, with the satiety-induced IC flattening, bundles aligned with grape juice, water or mango juice (component B) that had been on different ICs before satiety were now very close to a single, intermediate IC with little value variation (see left x-y maps in Figure 4A-D). The population of 101 positive value coding responses in 31 neurons continued to vary with blackcurrant juice amount during satiety in any task epoch (Bundle stimulus, Go, Choice or Reward), although with a 12% peak reduction (Figure 5A, B); response variations with reward amounts of component B in the same neurons went from significant differences before satiety to insignificant differences during satiety, with a 43% peak reduction (Figure 5C, D). Thus, the neuronal population responses confirmed the satiety pattern seen in single neurons.
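The population analysis combines z-scoring, averaging across neurons and a peak comparison. The normalization details are not given in this section, so the sketch assumes that each neuron is z-scored with a mean and SD pooled over the pre-sated and sated states (z-scoring each state separately would erase the amplitude change being measured). Percent changes of z-scored peaks depend on this convention. The firing rates are hypothetical.

```python
import numpy as np


def zscore_across_states(pre, sated):
    """z-score each neuron with a mean and SD pooled over both states.

    pre, sated: arrays of shape (n_neurons, n_time_bins). Pooling keeps the
    satiety-related amplitude difference, which per-state z-scoring would erase.
    """
    pre = np.asarray(pre, dtype=float)
    sated = np.asarray(sated, dtype=float)
    both = np.concatenate([pre, sated], axis=1)
    mu = both.mean(axis=1, keepdims=True)
    sd = both.std(axis=1, ddof=1, keepdims=True)
    return (pre - mu) / sd, (sated - mu) / sd


def peak_reduction_percent(pre_trace, sated_trace):
    """Percent drop of the population peak from the pre-sated to the sated state."""
    pre_peak = float(np.max(pre_trace))
    return 100.0 * (pre_peak - float(np.max(sated_trace))) / pre_peak


# Hypothetical firing rates (spikes/s) for 3 neurons x 5 time bins.
pre = np.array([[5, 8, 20, 12, 6], [2, 4, 10, 6, 3], [10, 12, 30, 18, 11]])
sated = np.array([[5, 7, 12, 9, 6], [2, 3, 6, 4, 3], [10, 11, 18, 14, 11]])
z_pre, z_sated = zscore_across_states(pre, sated)
pop_pre, pop_sated = z_pre.mean(axis=0), z_sated.mean(axis=0)
print(f"peak reduction: {peak_reduction_percent(pop_pre, pop_sated):.0f}%")
```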

Figure 5. Population responses. (A)-(D) Averaged z-scored population responses from
Numeric quantification of individual responses demonstrated significant satiety-induced response reductions in positive value coding neurons and significant response increases in negative (inverse) coding neurons, both during choice over zero-bundle (Figure 5E and F, red) and during choice between two non-zero bundles (Figure 5G and H, red; Table 2). A minority of neurons showed either inverse changes that were difficult to reconcile with value coding (black in Figures 5E-H) or no significant changes at all.

Neuronal satiety changes indicated by classification accuracy
To confirm the changes in neuronal value coding with a different approach, we tested the extent to which a hypothetical observer could use the neuronal responses to distinguish bundles on different ICs before and during satiety. Specifically, how well could a decoder built from neuronal responses obtained before satiety distinguish the same bundles during satiety, and vice versa? If the neuronal bundle responses reflected the substantial IC changes, classification accuracy for the physically unchanged bundles should be low when the training and testing states differ. To this end, we trained a support vector machine (SVM) classifier on neuronal responses to randomly selected bundles positioned on the lowest and the highest of three ICs, respectively. Good classifier performance was evidenced by decent discrimination with as few as five neurons and by increasing accuracy with added neurons (Figure 6). The two tests showed similar accuracy drops. First, the classifier trained on neuronal responses to bundle stimuli before satiety distinguished bundles well before satiety during choice over zero-bundle, testifying to its accuracy. However, accuracy dropped dramatically when the classifier trained before satiety was tested on bundle distinction during satiety, despite a continuing accuracy increase with added neurons (Figure 6A).
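The cross-state decoding logic (train before satiety, test before vs. during satiety) can be illustrated on a simulated pseudo-population. The SVM implementation used in the study is not specified here; as a self-contained stand-in, this sketch trains a linear soft-margin SVM by full-batch subgradient descent on the regularized hinge loss in NumPy. The 20 neurons, the response means (high-IC bundles 1 SD above low-IC bundles before satiety, only 0.3 SD during satiety) and the noise level are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=1e-3, epochs=1000, lr0=0.1):
    """Soft-margin linear SVM via full-batch subgradient descent on the hinge loss.

    y must contain -1/+1 labels. Returns weight vector w and bias b.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for t in range(epochs):
        lr = lr0 / np.sqrt(t + 1.0)  # decaying step size for subgradient descent
        margins = y * (X @ w + b)
        viol = margins < 1.0  # samples inside the margin or misclassified
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def accuracy(w, b, X, y):
    return float(np.mean(np.sign(X @ w + b) == y))


def simulate(rng, n_per_class, n_neurons, low_mean, high_mean, noise_sd=1.0):
    """Pseudo-population responses to low-IC (-1) and high-IC (+1) bundles."""
    X_low = rng.normal(low_mean, noise_sd, size=(n_per_class, n_neurons))
    X_high = rng.normal(high_mean, noise_sd, size=(n_per_class, n_neurons))
    X = np.vstack([X_low, X_high])
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, y


rng = np.random.default_rng(1)
N_NEURONS = 20
X_pre_train, y_pre_train = simulate(rng, 200, N_NEURONS, 0.0, 1.0)
X_pre_test, y_pre_test = simulate(rng, 200, N_NEURONS, 0.0, 1.0)
X_sat_test, y_sat_test = simulate(rng, 200, N_NEURONS, 0.0, 0.3)  # compressed values

w, b = train_linear_svm(X_pre_train, y_pre_train)
acc_pre_pre = accuracy(w, b, X_pre_test, y_pre_test)
acc_pre_sat = accuracy(w, b, X_sat_test, y_sat_test)
print(f"train pre / test pre:   {acc_pre_pre:.2f}")
print(f"train pre / test sated: {acc_pre_sat:.2f}")
```

In this toy model the decision boundary learned before satiety sits between the pre-sated response levels, so compressed sated responses to high-IC bundles mostly fall on the "low" side, and cross-state accuracy drops toward chance.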
Second, in the reversed procedure, accuracy was high when the classifier was both trained and tested during satiety, but lower when it was trained during satiety and tested before satiety. These accuracy differences were seen during choice over zero-bundle with neuronal responses to the Bundle (Figure 6B) and Go stimuli, but not during the Choice and Reward epochs (Figure S3A-C). The changes were not explained by pretrial baseline changes (Figure S3D). Substantial accuracy differences were also seen in choice between two non-zero bundles during the Bundle stimulus, Go and Choice epochs but not during the Reward epoch (Figure S3E-H), again not explained by baseline changes (Figure S3I). The changes in accuracy were consistent across on-going consumption (Figure S3J).
Together, these data demonstrate that the neuronal responses dynamically followed the substantial IC changes that reflected the value changes and satiety from on-going reward consumption.

Figure 6. Classifier performance demonstrates substantial satiety-induced value change. (A) Classification by support vector machine (SVM) using neuronal responses to stimuli of bundles positioned on the lowest and highest indifference curve (IC), respectively (choice over zero-bundle). The left two maps show identical bundle positions on ICs changed by on-going juice consumption. Satiety-induced value change is inferred from the altered ICs (red). Right: results from the classifier trained before satiety and tested for bundle distinction between the two ICs before satiety (black) and during satiety (red). The higher accuracy of bundle distinction with increasing neuron numbers attests to classifier validity. Error bars indicate standard error of the mean (SEM). (B) As (A), but with the classifier trained during satiety using bundles positioned in relation to the satiety-altered ICs.

Neuronal satiety changes with single-reward bundles
Using choice options with two reward components differs in several ways from previous studies (Tremblay & Schultz, 1999; Padoa-Schioppa & Assad, 2006) and requires controls and additional analyses. We used the same two visual component stimuli but set only one reward in each bundle to a non-zero amount, with the non-zero reward differing between bundles; this positioned the bundles graphically along the x-axis and y-axis but not inside the IC map. The ICs had been estimated with conventional bundles containing two, mostly non-zero rewards varying over the whole test range.
First, we used single-reward bundles to confirm the results obtained with conventional bundles. Before satiety, the responses of the neuron shown in Figure 7A, B distinguished well between the amounts of both rewards during choice over the zero-reward bundle. With on-going consumption of both rewards, the ICs flattened. The blackcurrant juice bundles kept their positions on the ICs (Figure 7C), whereas the two physically unchanged water amounts changed their positions relative to the ICs (Figure 7D). The neuron kept discriminating between the blackcurrant juice amounts during satiety (Figure 7C). However, with the satiety-induced IC change, the large water amount was now positioned on a lower IC than before (Figure 7D, red on x-axis), which was about the same IC as that of the small blackcurrant juice amount (blue on y-axis). Correspondingly, the neuronal activity with the large water amount lost its peak (reduction by 50%) and was now very similar to the activity with the small blackcurrant juice amount (Figure 7C, D, red dotted vs. blue solid arrows). Further, the small water amount was now positioned below its original IC (blue on x-axis), and the neuron, having lost its response, failed to distinguish between the two water amounts. Thus, the neuronal changes with single-reward bundles followed the satiety-induced IC changes, indicating that the neuronal satiety changes reported above were not specific to multi-component bundles.

(C) Despite IC flattening after on-going reward consumption, the two bundles with blackcurrant juice variation remained on the same two ICs, the neuronal response variation remained significant (P = 0.002, F = 11.04, 40 trials), and the peak response was only slightly reduced (red). Dotted ICs are from the pre-sated state. (D) IC flattening after on-going reward consumption indicates relative value reduction and satiety for water. The two unchanged bundles with water variation were now located below and at the IC. The neuronal response was substantially reduced, by 50% (red), and had lost significant variation (P = 0.4337, F = 0.64, 40 trials). Further, the large-water bundle (dashed red line) now elicited a response similar to that of the low-blackcurrant bundle, which is now located on the same IC (solid blue line). Thus, while continuing to code reward value (C), the responses followed the satiety-induced IC change.

Next, we used single-reward bundles for further quantification. We plotted neuronal population vectors derived from the dots on polar plots, which showed the influence of each of the two rewards on the neuronal response (Figure 7E-I). The usually unequal value of the two rewards manifested as deviation from the diagonal, and the relative value change with on-going consumption was expressed as a change in the neuronal population vector. For example, in tests with the bundle (blackcurrant juice, grape juice), the elevation angle of the neuronal population vector increased from 35 deg before satiety to 62 deg during satiety in choice over the zero-reward bundle (Figure 7E, red), and from 38 deg to 45 deg in choice between two non-zero bundles (Figure 7F). This change indicated value reduction of grape juice (plotted on the x-axis) relative to blackcurrant juice (y-axis) with on-going consumption. Further, the shorter neuronal vectors during satiety indicated reduced overall responding (red). Similar changes indicated reduced value coding for water and mango juice (x-axis) relative to blackcurrant juice (y-axis) (Figure 7G-I). These neuronal changes were paralleled by changes of the behavioral vector (Figure 7E-I, green). Both before and during satiety, the neuronal vectors (red) lay within the confidence intervals of the behavioral vectors (green). An analysis of IC slopes during satiety confirmed the neuronal-behavioral correspondence seen with the vector plots. Estimated from the regression coefficient ratios $-b_2/b_1$ (Eq. 3) and $-b/a$ (Eq. 1), the slopes of the linear neuronal ICs of single-reward bundles correlated well with the slopes of the linear behavioral ICs (Figure 7J). Thus, the vector analysis of population responses confirmed and quantified the reward value changes with on-going consumption seen with the single-neuron responses to bundles aligned to ICs.
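The slope comparison of Figure 7J can be illustrated with a minimal Python sketch. It is not the authors' analysis code: it assumes the coefficients have already been estimated, that the "rho" of Figure 7J is Spearman's rank correlation, and that slopes are rectified by taking absolute values. Rho is computed as the Pearson correlation of average ranks, without a P value; all function names and coefficient values are hypothetical.

```python
import math

def ic_slope(weight_reward_a, weight_reward_b):
    """Linear IC slope: -b2/b1 for neurons (Eq. 3) or -b/a for behavior (Eq. 1).

    weight_reward_a: coefficient for reward A (y-axis); weight_reward_b: for reward B (x-axis).
    """
    return -weight_reward_b / weight_reward_a

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical (reward A weight, reward B weight) pairs, one per neuron or session.
neuronal = [abs(ic_slope(b1, b2)) for b1, b2 in [(1.0, 0.5), (0.8, 0.9), (1.2, 0.3)]]
behavioral = [abs(ic_slope(a, b)) for a, b in [(1.0, 0.6), (0.9, 1.1), (1.0, 0.2)]]
rho = spearman_rho(neuronal, behavioral)
```

Rank correlation is used here because it only asks whether neurons and behavior order the IC slopes in the same way, not whether the slope values match numerically.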

(E) Polar and vectorial population plots of neuronal responses for bundle (blackcurrant juice, grape juice) (black, red), and vector plots for behavioural choice over zero-bundle (green). Neuronal vector slopes were 35 deg before satiety and 62 deg during satiety, using all significantly positive and normalized negative (inverse) coding responses from all four task epochs; all included responses followed the IC scheme. Dots refer to neuronal responses; vectors represent averages from behavioral choices (green; dotted lines: 95% confidence intervals) and neuronal responses (red), based on Eqs. 1 and 3, respectively (see Methods). Neuronal regression coefficients (b's) on axes refer to Eq. 3. (F) As (E) but for choice between two non-zero bundles. Neuronal vector slopes were 38 deg before and 45 deg during satiety. (G), (H) As (E, F) but for bundle (blackcurrant juice, water). (I) As (E) but for bundle (blackcurrant juice, mango juice). (J) Correlation between rectified neuronal and behavioral IC slopes during satiety in all tested neurons (rho
Taken together, the results with single-reward bundles confirmed the findings with our conventional two-reward bundles: neuronal value responses changed with on-going consumption in good correlation with behavioral changes, indicating a neuronal correlate of relative, reward-specific satiety.

DISCUSSION
This study used bundles of two rewards and found changes in value coding of OFC neurons during on-going reward consumption that indicated relative reward-specific satiety. Behavioral choices were captured by graphic ICs that represented the relative subjective values of two juice rewards in a conceptually rigorous manner. The ICs changed with on-going reward consumption during individual experimental sessions in a characteristic manner that indicated an orderly change in reward value and suggested relative, reward-specific satiety (Figures 1 and 2). Changes in licking behavior provided mechanism-independent evidence for satiety (Figure 3). Specifically, on-going consumption of both bundle rewards resulted in progressive flattening of the ICs, which indicated value loss for one bundle reward relative to the other. Our preceding study had established neuronal chosen value responses in OFC that were sensitive to multiple rewards and followed the animal's rational choice of two-reward bundles, including completeness, transitivity and independence from option set size (Pastor-Bernier et al., 2019). The current study shows that such OFC value responses matched the IC changes during relative reward-specific satiety. Specifically, the responses were similar for all equally valued bundles on flattened ICs (Figures 4 and 5). Machine learning classifiers predicting bundle discrimination from neuronal responses confirmed accurate reward value coding both before and during satiety and demonstrated the substantial nature of the neuronal changes (Figure 6). Responses to conventional single rewards confirmed these satiety-induced changes (Figure 7). These data from a particularly sensitive reward value test demonstrate that neuronal responses in OFC follow the value alterations induced by reward-specific satiety.
The current demonstration of systematically altered reward value coding with reward-specific satiety builds on previous studies of monkey OFC neurons that investigated satiety in a more basic manner. Notable are the studies from Rolls' laboratory in which monkeys were presented with syringes or tubes containing various fruit juices; rating scales were used to assess behavioral acceptance or rejection of these juices after bolus injections or on-going consumption (Rolls et al. 1989; Critchley & Rolls 1996). These studies report OFC neurons that responded to several juices and lost the response only for the particular juice on which the animal was sated. The response reduction with sensory-specific satiety in OFC contrasts with Rolls' studies on earlier stages of the gustatory system, including the nucleus of the solitary tract, the frontal opercular taste cortex and the insular taste cortex, where no such satiety-related changes were found (Yaxley et al. 1985; Yaxley et al. 1988; Rolls et al. 1988). These studies were the first to describe neuronal correlates of sensory-specific satiety, although it is unknown whether the neurons coded subjective reward value inferred from choices in the absence of satiety, or whether they covaried with other crucial aspects of reward value, such as reward amount and behavioral preference, which formed the basis for our study. Another study found reward response increases with satiety in some OFC neurons (Pritchard et al. 2008), which might correspond to some of our results that were incompatible with reward value coding (satiety-induced response increases in positive value coding neurons, and satiety-induced response decreases in inverse value coding neurons; Figure 5E-H).
While reward-specific satiety affects subjective reward value, on-going consumption also induces a general reduction of arousal, attention and motivation. Such general satiety affects the processing of all rewards in an environment or context in which some satiation occurs, including both rewards on which the animal has been sated and those on which it has not. General satiety effects cannot be distinguished from reward-specific satiety when only a single reward is tested, and the effects may be attributed to motivation, as in the case of reduced dopamine responses in mice that received food pellets for extended periods of time (Rossi et al. 2013). Nevertheless, even with testing restricted to a single reward, dopamine reward signals may be susceptible to genuine satiety, as suggested by the reduction of human midbrain responses with on-going consumption of Swiss chocolate (Small et al., 201). In our results, the shorter neuronal population vectors might indicate an effect of general satiety on neuronal responses, in addition to the reward-specific satiety suggested by the changes in vector angle (Figure 7E-I). However, general satiety cannot explain our asymmetric behavioral and neuronal effects, which indicate relative reward-specific value changes in OFC.
The observed increase in consumption of sated liquids such as water (Figure 3H) seemed to contradict earlier findings and the general intuition that satiety would reduce consumption of rewards on which an animal is sated (Rolls et al. 1989; Critchley & Rolls 1996). Differences in study design might explain this discrepancy. When an animal has the choice between a sated and a non-sated reward, or between accepting and not accepting a reward, it would naturally prefer the non-sated reward, which by definition would have more value. This was the case in the cited earlier studies. By contrast, in our study, the animal chose between two bundles that each contained two rewards on which the animal was differently sated. As the animal was still interested in obtaining the less sated reward, it would inadvertently also receive the other, more sated bundle reward. The animal had no control over the setting of the Reference Bundle against which it would choose the alternative bundle. At the IP, the animal had the choice to give up some of the non-sated reward in order to receive more of the sated reward. If the animal was still interested in a less sated reward, it might give up a limited amount of it if it were to receive much more of the other reward as compensation (as long as it did not outright reject it, which was not the case). This trade-off was represented by the increasing concavity of the ICs with on-going consumption, which indicated that very large amounts of the more devalued reward B were required for giving up the less devalued reward A (Figures 1E, 2). Outright rejection of reward B would be represented not by a downward-sloped IC but by an upward-sloped IC, which was observed in our animals with lemon juice, yoghourt and saline (Pastor-Bernier et al., 2017) but not with the currently used rewards; such upward-sloped ICs indicate that an animal needed to be 'bribed' with more reward for accepting these normally rejected rewards.
By contrast, in the current satiety experiment, the animal inadvertently consumed more of the sated reward during satiety compared to before, and the maintained downward IC slope indicated that the animal was not entirely averse to the sated reward.

Animals
Two adult male macaque monkeys (Macaca mulatta; Monkey A, Monkey B), weighing 11.0 kg and 10.0 kg, respectively, were used in these experiments that had already yielded behavioral and neuronal data without satiety (Pastor-Bernier et al., 2017;Pastor-Bernier et al., 2019). Neither animal had been used in any other study.

Ethical approval
This research has been ethically reviewed, approved, regulated and supervised by the following institutions and individuals in the UK and at the University of Cambridge (UCam): the UK Home Office implementing the Animals (Scientific Procedures) Act 1986 with Amendment Regulations 2012, the local UK Home Office Inspector, the UK Animals in Science Committee (ASC), the UK National Centre for Replacement, Refinement and Reduction of Animal Experiments (NC3Rs), the UCam Animal Welfare and Ethical Review Body (AWERB), the Certificate Holder of the UCam Biomedical Service (UBS), the UCam Welfare Officer, the UCam Governance and Strategy Committee, the UCam Named Veterinary Surgeon (NVS), and the UCam Named Animal Care and Welfare Officer (NACWO).

General behavior
The animals were habituated over several months to sit in a primate chair (Crist Instruments) for a few hours each working day. They were trained in a specific, computer-controlled behavioral task in which they contacted visual stimuli on a horizontally mounted touch-sensitive computer monitor (Elo) located 30 cm in front of them. The animal's eye position in the horizontal and vertical planes was monitored with a non-invasive infrared oculometer (Iscan). Matlab software (Mathworks) running on a Microsoft Windows XP computer controlled the behavior and collected, analyzed and presented the data on-line. A solenoid valve (ASCO, SCB262C068) controlled by the same Windows computer delivered specific liquid amounts. A Microsoft SQL Server 2008 database served for Matlab off-line data analysis. Following task training for about 6 months, the animals were surgically implanted with a recording chamber for electrophysiological recordings, which typically lasted for another 6-10 months.

Stimuli, task and rewards
A computer touch monitor presented the subject with two visual stimuli (4º apart) representing two bundles, a Reference Bundle and a Variable Bundle (Figure 1A). Each bundle contained two rewards (component reward A: violet rectangle; component reward B: green rectangle) with independently set amounts indicated by the vertical bar position within each rectangle (higher was more). The Reference Bundle contained two preset reward amounts that were fixed for a given block of trials. The Variable Bundle contained a specifically set amount of one reward and an experimentally varied amount of the other reward. The task sequence (Figure 1B) has been described in detail (Pastor-Bernier et al., 2017; Pastor-Bernier et al., 2019) and is summarized below. Reward A in all bundles was blackcurrant juice or blackcurrant juice with added monosodium glutamate (MSG); reward B was grape juice, strawberry juice, mango juice, water, apple juice, peach juice, or grape juice with added inosine monophosphate (IMG).
Each trial began when the animal contacted a centrally located touch-sensitive key for 1.0 s after a pseudorandom inter-trial interval of 1.6 ± 0.25 s. Then two bundles appeared and remained on the screen for 2.0 s, after which two blue spots appeared underneath the bundles as GO stimulus, upon which the animal released the touch key and touched the blue spot of its choice within 2.0 s. After a hold time of 1.0 s, the chosen blue spot turned green and the unchosen blue spot disappeared. Simultaneously, a white frame appeared around the chosen bundle, providing feedback for successful choice. The computer-controlled liquid solenoid valve delivered liquid A 1.0 s after the choice, followed 0.5 s later by liquid B (except when peach juice was used as reward B; here the sequence was reversed: liquid B was delivered first, then 0.5 s later liquid A, blackcurrant juice).

Estimation of behavioral ICs
The behavioral method used to obtain an IP from stochastic choice has been presented in full detail (Pastor-Bernier et al., 2017; Pastor-Bernier et al., 2019). With two bundle options, the animal chose between the preset Reference Bundle (left in Figure 1A) and the Variable Bundle (right) in repeated trials. Thus, the constant Reference Bundle provided a stable reference against the changing composition of the Variable Bundle. We set one reward in the Variable Bundle to one unit (> 0.1 ml) above the amount of the same reward in the Reference Bundle, while pseudorandomly varying the amount of the other reward widely. The variation of the animal's repeated choice with that single varying reward allowed us to construct a full psychophysical function and estimate the IP from a Weibull fit (point of subjective equivalence; P = 0.5 choice of each bundle). We obtained each IP from a total of 80 trials (2 left-right stimulus positions × 5 equally spaced reward amounts × 8 trials). To avoid known adaptations in OFC neurons (Tremblay and Schultz, 1999; Padoa-Schioppa, 2009; Kobayashi et al., 2010; Rustichini et al., 2017), we always tested the full reward range of the experiment.
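The IP estimation can be illustrated with a minimal Python sketch. It assumes a two-parameter Weibull function of the varied reward amount x, P(choose Variable Bundle) = 1 - exp(-(x/λ)^k), without lapse or guess rates, fitted by maximum likelihood over a coarse parameter grid. This parametrization, the grid limits and the function names are assumptions for illustration, not the published fitting procedure. The IP is the amount at which the fitted curve crosses P = 0.5, i.e. x = λ (ln 2)^(1/k).

```python
import math

def weibull(x, lam, k):
    """Assumed psychometric function: P(choose Variable Bundle) at varied amount x (ml)."""
    return 1.0 - math.exp(-((x / lam) ** k))

def neg_log_likelihood(amounts, n_chosen, n_total, lam, k):
    """Binomial negative log-likelihood of the Variable Bundle choice counts."""
    nll = 0.0
    for x, c, n in zip(amounts, n_chosen, n_total):
        p = min(max(weibull(x, lam, k), 1e-9), 1.0 - 1e-9)  # keep log() finite
        nll -= c * math.log(p) + (n - c) * math.log(1.0 - p)
    return nll

def fit_indifference_point(amounts, n_chosen, n_total):
    """Coarse grid-search maximum-likelihood fit; returns (IP, lam, k)."""
    best = None
    for i in range(1, 201):              # scale: 1% to 200% of the largest tested amount
        lam = i * max(amounts) / 100
        for j in range(1, 101):          # shape: 0.1 to 10.0
            k = j / 10
            nll = neg_log_likelihood(amounts, n_chosen, n_total, lam, k)
            if best is None or nll < best[0]:
                best = (nll, lam, k)
    _, lam, k = best
    ip = lam * math.log(2) ** (1 / k)    # amount at which the fitted P = 0.5
    return ip, lam, k

# Synthetic example: 5 equally spaced amounts, 16 trials each (2 sides x 8 repetitions),
# with expected choice counts generated from lam = 0.3 ml and k = 3.
amounts = [0.1, 0.2, 0.3, 0.4, 0.5]
n_total = [16] * 5
n_chosen = [16 * weibull(x, 0.3, 3.0) for x in amounts]
ip, lam, k = fit_indifference_point(amounts, n_chosen, n_total)
```

A grid search keeps the sketch dependency-free; a gradient-based optimizer would normally replace it in practice.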
To obtain an IC, we fitted a series of IPs with a hyperbolic function using weighted least squares:

$d = ay + bx + cxy$ (Eq. 1)

with y and x as milliliter amounts of reward A (plotted on the y-axis of the 2D graph, Figure 1A and 1E) and reward B (plotted on the x-axis), a and b as weights of the influence of the reward amounts plotted on the y- and x-axes, respectively, and c as curvature. A potent reward that contributes strongly to the choice of the bundle would have a large weight (high coefficient a or b), whereas a less potent reward would have a lower weight coefficient. Thus, with the potent (more weight) reward plotted on the x-axis, and the less potent (less weight) reward plotted on the y-axis, choice indifference between them (IC) would occur with smaller milliliter amounts on the x-axis compared to the y-axis. Hence, the IC would be steeper than the diagonal line (see Figure 1A, D). Setting c = 0 and solving Eq. 1 for y gives $y = d/a - (b/a)\,x$, so the IC slope is the ratio of the coefficients that reflect the weights of the rewards: $-b/a$. With a higher potency of reward B (x-axis) compared to reward A (y-axis), the rectified IC slope would be larger than 1. Relatively stronger satiety for reward B (x-axis) compared to reward A (y-axis) would reduce the weight of reward B, reduce the absolute value of the ratio $-b/a$, and flatten the IC. Thus, the IC slope $-b/a$ describes the relative impact of the two bundle rewards (reflecting the value ratio between the two rewards), whereas the weights (a and b) describe the influence of the reward amounts. The hyperbolic function can be written in a form equivalent to the regression with interaction used for analysing neuronal responses ($y = b_0 + b_1 A + b_2 B + b_3 AB$; see Eq. 3 below).
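Eq. 1, d = ay + bx + cxy, can be solved for the amount of reward A on an IC as y = (d - bx)/(a + cx). The short Python sketch below, with hypothetical coefficient values and function names, evaluates this curve and the linear slope -b/a obtained when c = 0. Lowering b, as relatively stronger satiety for reward B would, flattens the IC.

```python
def ic_reward_a(x, a, b, c, d):
    """Reward A amount (y-axis) on the IC d = a*y + b*x + c*x*y at reward B amount x (x-axis)."""
    return (d - b * x) / (a + c * x)

def linear_ic_slope(a, b):
    """IC slope for c = 0, where Eq. 1 reduces to y = d/a - (b/a) * x."""
    return -b / a

# Hypothetical weights: reward B twice as potent as reward A before satiety.
before = linear_ic_slope(a=1.0, b=2.0)   # |slope| > 1: steeper than the diagonal
during = linear_ic_slope(a=1.0, b=1.0)   # smaller weight for reward B: flatter IC
points = [(x, ic_reward_a(x, a=1.0, b=2.0, c=0.0, d=2.0)) for x in (0.0, 0.5, 1.0)]
```

With c ≠ 0 the same function traces a curved IC, which is why the slope -b/a summarizes only its linear component.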

Definition and criteria for pre-sated and sated states
We detected satiety when psychophysical choice functions exceeded the confidence intervals (CI) of the initial tests (see Figures 1C, S1A and S1E); this measure indicated a changed value relation between the two bundle rewards. More specifically, we identified the gradual effect of satiety on choice preference by tracking the IPs as consumption advanced across blocks of 80 trials. The Weibull-fitted IPs were obtained psychophysically for fixed and equally spaced amounts of reward B. Changes in the relative value of the two bundle rewards were assessed with interleaved anchor trials in choices between bundles with only one non-zero reward each: bundle (non-zero blackcurrant juice; no reward B) vs. bundle (no blackcurrant juice; non-zero reward B), using any reward B. To aggregate IP data across sessions and compensate for across-session variability, we normalized the reward value ratio to the first titration block in all sessions. We then compared the normalized distributions of IPs within the CI of the first block with the distributions of IPs exceeding the CI of the first block.
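The block-wise bookkeeping described above can be sketched as follows. The sketch assumes that each 80-trial titration block has already yielded an IP (or value ratio) and that the CI of the first block is known; how that CI was computed is not specified here, so it is simply passed in. Function names and example numbers are hypothetical.

```python
def normalize_to_first_block(value_ratios):
    """Express each block's value ratio relative to the first titration block of the session."""
    first = value_ratios[0]
    return [r / first for r in value_ratios]

def label_blocks(ips, first_block_ci):
    """'pre-sated' if a block's IP lies within the first block's CI, otherwise 'sated'."""
    low, high = first_block_ci
    return ["pre-sated" if low <= ip <= high else "sated" for ip in ips]

# Hypothetical session: IPs (ml) from successive 80-trial blocks.
ips = [0.30, 0.32, 0.41, 0.47]
labels = label_blocks(ips, first_block_ci=(0.26, 0.34))
normalized = normalize_to_first_block(ips)
```

Normalizing to the first block removes session-to-session offsets, so the labelled blocks from different sessions can then be pooled into the two distributions being compared.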

Control regressions for behavioral choice
To test whether the animal's choice during satiety reflected the amounts of the bundle rewards, rather than other, unintended variables such as spatial bias, we used a logistic regression with P(V) as probability of choice of the Variable Bundle, $b_0$ as offset coefficient, $b_1$-$b_7$ as correlation strength (regression slope) coefficients indicating the influence of the respective regressors, CT as trial number within a block of consecutive trials, RA as amount of reward A of the Reference Bundle, RB as amount of reward B of the Reference Bundle, VA as amount of reward A of the Variable Bundle, VB as amount of reward B of the Variable Bundle, CL as choice of any bundle stimulus presented at the left, MA as consumed amount of reward A, MB as consumed amount of reward B, and e as error. We used a binomial fit with logit link function to obtain standardized b coefficients. Choices over zero-reward bundles were excluded from the regression to avoid internal correlation between value and consumption.

Licking
Licking was monitored with an infrared optosensor positioned below the juice spout (V6AP; STM Sensors). We measured anticipatory licking durations between the appearance of the bundle stimuli and the delivery of the first reward liquid (approximately 5-6 s) in trials with bundles containing only one non-zero component reward, as trials advanced toward satiety within single working sessions. Licking data were collected with four different bundles, namely (blackcurrant juice, grape juice), (blackcurrant juice, water), (blackcurrant juice, strawberry juice) and (blackcurrant juice, mango juice).
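Measuring anticipatory licking amounts to summing the time the optosensor beam was interrupted inside the window from bundle onset to first reward delivery. The minimal Python sketch below assumes lick events are available as (start, end) times in seconds relative to trial events, a hypothetical data layout. It also includes an ordinary least-squares trend across successive trials as one possible summary of the change toward satiety; the text does not state that such a trend was fitted.

```python
def anticipatory_lick_duration(licks, bundle_onset, first_reward_onset):
    """Total licking time (s) between bundle onset and first reward delivery.

    licks: list of (start, end) times (s) during which the optosensor beam was interrupted.
    """
    total = 0.0
    for start, end in licks:
        overlap = min(end, first_reward_onset) - max(start, bundle_onset)
        if overlap > 0:  # count only the part of each lick inside the window
            total += overlap
    return total

def linear_trend(values):
    """Least-squares slope of values against trial index (change per trial)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

For example, licks [(0.5, 1.5), (2.0, 3.0), (5.5, 6.5)] with a window from 1.0 s to 6.0 s contribute 0.5 + 1.0 + 0.5 = 2.0 s of anticipatory licking.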

Surgical procedures and electrophysiology
As described before for the same animals (Pastor-Bernier et al., 2019), a head-restraining device and a recording chamber (40 x 40 mm, Gray Matter) were implanted on the skull under full general anesthesia and aseptic conditions. The stereotactic coordinates of the chamber enabled neuronal recordings from the orbitofrontal cortex (OFC) (Paxinos et al., 2000). We located the OFC from bone marks on coronal and sagittal radiographs taken with a guide cannula inserted at a known coordinate in reference to the implanted chamber, using a medio-lateral vertical and a 20º forward-directed approach aiming for area 13. Monkey A provided data from the left hemisphere, Monkey B from the right hemisphere, via a craniotomy in each animal ranging from Anterior 30 to 38 and Lateral 0 to 19. We conducted single-neuron electrophysiological recordings using both custom-made glass-coated tungsten electrodes (Merrill & Ainsworth, 1972) and commercial electrodes (Alpha Omega, Israel) (impedance of about 1 MOhm at 1 kHz). Electrodes were inserted into the cortex with a multi-electrode drive (NaN drive, Israel) using the same angled approach as for the radiography. Neuronal signals were collected at 20 kHz, amplified using conventional differential amplifiers (CED 1902 Cambridge Electronics Design) and band-pass filtered (300 Hz to 5 kHz). We used a Schmitt trigger to digitize the analog neuronal signal online into a computer-compatible TTL signal. The Schmitt trigger was not used to separate simultaneous recordings from multiple neurons; in such cases, we either searched for another recording from a single neuron or occasionally stored the data in analog form for off-line separation by dedicated software (Plexon offline sorter). An infrared eye tracking system monitored eye position (ETL200; ISCAN), with temperature check on an experimenter's hand at the approximate position of the animal's head.

Definition for neurons following the revealed preference scheme
We analysed single-neuron activity during four task epochs vs. the Pretrial control epoch (1 s): visual Bundle stimulus (2 s), Go signal (1 s), Choice (1 s) and Reward (2 s, starting with reward A, followed 0.5 s later by reward B, thus covering both rewards). To establish neuronal relationships to these task epochs, we compared the activity of each neuron during each task epoch separately against the Pretrial control epoch using the paired Wilcoxon test (P < 0.01). A neuron was considered task-related if its activity in at least one of the four task epochs differed significantly from the activity during the Pretrial control epoch. Responses of individual neurons should follow the scheme of two-dimensional ICs that characterizes revealed behavioral preferences for two-dimensional bundles. Specifically, the responses should comply with three characteristics defined previously (Pastor-Bernier et al., 2019).
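The task-relatedness criterion can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' Matlab code: the signed-rank statistic uses a normal approximation without tie or continuity correction (zero differences are dropped), and per-trial rates are assumed to be paired trial by trial with the Pretrial epoch. In practice a library routine such as `scipy.stats.wilcoxon`, which can also give exact P values for small samples, would normally be used. The epoch names and function names are hypothetical.

```python
import math

def wilcoxon_signed_rank_p(x, y):
    """Two-sided paired Wilcoxon signed-rank test, normal approximation (no corrections)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign 1-based ranks, averaging over ties in |d|
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided P from the standard normal

EPOCHS = ("bundle", "go", "choice", "reward")

def is_task_related(epoch_rates, pretrial_rates, alpha=0.01):
    """True if any task epoch differs from the Pretrial epoch (paired, per trial)."""
    return any(wilcoxon_signed_rank_p(epoch_rates[e], pretrial_rates) < alpha for e in EPOCHS)
```

Testing each epoch separately and accepting a neuron if any epoch passes mirrors the "at least one of the four task epochs" rule; no correction for the four comparisons is applied in this sketch.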
(Characteristic 1) Neuronal responses should change monotonically with increasing behavioral preference across behavioral ICs, irrespective of bundle composition. Such monotonic neuronal response changes should reflect increasing amounts of one or both bundle rewards, assuming a positive monotonic subjective value function on reward amount.
(Characteristic 2) Neuronal responses should vary insignificantly for all equally preferred bundles positioned along the same behavioral IC, despite different physical bundle composition.
(Characteristic 3) Neuronal responses should follow the IC slope and the non-linear curvature of behavioral ICs. The IC slope reflects the value relationship between the two bundle rewards, indicating the revealed preference relation between the two rewards of a bundle, and thus the value of one reward relative to a common reference reward.
We used a combination of three statistical tests to assess these characteristics. Characteristic 1: To capture the change across ICs in the most conservative, assumption-free manner possible, we used a linear regression on each Wilcoxon-identified task-related response:

$y = b_0 + b_1 A + b_2 B + b_3 AB + e$ (Eq. 3)

with y as neuronal response in any of the four task epochs, measured as impulses/s and z-scored relative to the Pretrial control epoch of 1.0 s (z-scoring of neuronal responses applied to all regressions listed below), A and B as milliliter amounts of reward A (plotted on the y-axis) and reward B (x-axis), respectively, $b_0$ as offset coefficient, $b_1$ and $b_2$ as neuronal regression coefficients, $b_3$ as coefficient of the interaction term AB, and e as error consisting of the sum of the individual errors of each term ($err_0$, $err_1$, $err_2$, $err_3$ for the offset and regressors 1-3, respectively). The regression defined by Eq. 3 is equivalent to the hyperbolic model used for fitting behavioral ICs ($d = ay + bx + cxy$; Eq. 1). The coefficients $b_1$ and $b_2$ needed to be either both positive (positive neuronal relationship, higher neuronal activity reflecting more reward quantity) or both negative (inverse neuronal relationship) to reflect the additive nature of the individual bundle components giving rise to revealed preference (P < 0.05, unless otherwise stated; t-test).
This linear regression assessed the degree of linear monotonicity of neuronal response change across ICs (P < 0.05 for b coefficients; t-test). Further, all significant positive or negative response changes identified by Eq. 3 needed to be also significant in a Spearman rank-correlation test that assessed ordinal monotonicity of response change across ICs without assuming linearity and numeric scale (P < 0.05).
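Fitting Eq. 3 and applying the same-sign criterion can be sketched in pure Python. The sketch takes Eq. 3 as y = b0 + b1·A + b2·B + b3·AB + e, the form implied by its interaction term and error components, and assumes the responses are already z-scored. It estimates the coefficients by ordinary least squares via the normal equations but omits the t-tests on the coefficients and the Spearman check; names and example values are hypothetical, and a statistics package would normally be used instead.

```python
def solve_linear_system(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting (M square, non-singular)."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def fit_eq3(A, B, y):
    """OLS estimate of y = b0 + b1*A + b2*B + b3*A*B; returns [b0, b1, b2, b3]."""
    X = [[1.0, a, b, a * b] for a, b in zip(A, B)]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(4)] for i in range(4)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(4)]
    return solve_linear_system(XtX, Xty)

def same_sign_coding(coef):
    """'positive' if b1 and b2 are both > 0, 'inverse' if both < 0, otherwise None."""
    b1, b2 = coef[1], coef[2]
    if b1 > 0 and b2 > 0:
        return "positive"
    if b1 < 0 and b2 < 0:
        return "inverse"
    return None

# Hypothetical 3 x 3 grid of reward amounts (ml) and noise-free responses.
A = [a for a in (0.1, 0.3, 0.5) for _ in range(3)]
B = [b for _ in range(3) for b in (0.1, 0.3, 0.5)]
y = [1.0 + 2.0 * a + 3.0 * b + 0.5 * a * b for a, b in zip(A, B)]
coef = fit_eq3(A, B, y)
```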
Characteristics 1 and 2: To assess the two-dimensional across/along-IC scheme in a direct and intuitive way, and without assuming monotonicity, linearity and numeric scale, we used a two-factor ANOVA on each Wilcoxon-identified task-related response that was significant for both regressors in Eq. 3; the factors were across-IC (ascending rank order of behavioral ICs) and along-IC (same rank order of behavioral IC). To be a candidate for following the IC scheme of revealed preferences, changes across ICs should be significant (P < 0.05), whereas changes along ICs and their interaction should be insignificant.
Characteristic 3: Whereas the regression defined by Eq. 3 estimated neuronal responses across ICs, a full estimation of neuronal ICs for comparison with behavioral ICs would require inclusion of the IC slope and curvature, both of which depended on both rewards. When Eq. 3 is simplified by setting the $b_3$ coefficient to zero and holding the neuronal response constant along the IC, the neuronal IC slope becomes the ratio of coefficients $-b_2/b_1$. Note the different meanings of the slope term: the neuronal IC slope ($-b_2/b_1$) describes the relative coding strength of the two bundle rewards (reflecting the neuronal ratio of the two rewards), whereas each neuronal regression slope alone (b) describes the coding strength of the neuronal response (correlation with the specific regressor). The neuronal IC curvature was estimated from the $b_3$ coefficient of the interaction term AB (all b's P < 0.05; t-test).
Polar plot of OFC reward sensitivity. The purpose of this analysis was to provide quantitative and graphic information about satiety-induced behavioral and neuronal changes that would allow comparison with previous OFC studies that had not established ICs (Tremblay & Schultz 1999; Padoa-Schioppa & Assad 2006). The analysis concerned monotonic response increase or decrease with increasing amounts of bundle rewards across ICs (characteristic 1 above), but did not address other IC characteristics such as trade-off, slope and curvature (characteristics 2 and 3) that had not been investigated previously. We established 2D polar plots whose dots indicated the relative contribution of each of the two bundle rewards to the neuronal response. We constructed vectors by averaging these dots and then compared the vectors of averaged neuronal responses and averaged behavioral choices before and during satiety.
For the behavioral choices, we plotted vectors (with 95% confidence intervals) from averaged polar plot dot positions defined by magnitude (distance from center: $\sqrt{a^2 + b^2}$) and relative weight (elevation angle: $\arctan(a/b)$); coefficient a refers to reward A (blackcurrant juice, y-axis), coefficient b refers to any of the other rewards (x-axis) (Eq. 1). The angle of the vector reflected the relative contribution of the two bundle rewards to the choice, as estimated by the a and b coefficients (Eq. 1). A deviation of the vector angle from the diagonal line indicated an unequal contribution of the two rewards to bundle choice, and thus a reward ratio different from 1:1.
For the neuronal plots, each dot on the 2D plot was defined by the two b regression coefficients for neuronal responses (Eq. 3; P < 0.01, t-test) for the two rewards in any of the four task epochs. Each dot indicated the z-scored response magnitude (distance from center: $\sqrt{b_1^2 + b_2^2}$), the coding sign (positive or negative), and the relative weight of the two b coefficients (elevation angle: $\arctan(b_1/b_2)$). Coefficient $b_1$ referred to reward A (blackcurrant juice, y-axis), coefficient $b_2$ to any of the other rewards (x-axis). Responses with negative (inverse) coding were rectified. Further IC characteristics such as systematic trade-off across multiple IPs and IC curvature played no role in these graphs. The alignment of the dots relative to the diagonal axis showed the relative coding strength for the two bundle rewards, as estimated by the b regression coefficients; a deviation from the diagonal line indicated an unequal influence of the two bundle rewards on the neuronal responses, reflecting a neuronal correlate of the reward ratio.
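The polar-plot quantities can be computed as in the Python sketch below. It assumes that rectification means taking absolute values of both coefficients and that the population vector is the Cartesian mean of the rectified dots. Both are interpretations of the text rather than confirmed details, and the function names are hypothetical. The same function serves behavioral (a, b) and neuronal (b1, b2) coefficient pairs.

```python
import math

def polar_dot(coef_a, coef_b):
    """(magnitude, elevation angle in deg) of one dot.

    coef_a: coefficient for reward A (blackcurrant juice, y-axis), i.e. a (Eq. 1) or b1 (Eq. 3);
    coef_b: coefficient for the other reward (x-axis), i.e. b (Eq. 1) or b2 (Eq. 3).
    """
    ya, xb = abs(coef_a), abs(coef_b)  # rectify inverse (negative) coding
    return math.hypot(ya, xb), math.degrees(math.atan2(ya, xb))

def population_vector(coef_pairs):
    """Average the rectified dots in Cartesian coordinates, then return (length, angle in deg)."""
    xs = [abs(b) for _, b in coef_pairs]
    ys = [abs(a) for a, _ in coef_pairs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return math.hypot(mean_x, mean_y), math.degrees(math.atan2(mean_y, mean_x))
```

Using `atan2` equals arctan(b1/b2) for positive coefficients while avoiding division by zero when the x-axis coefficient is 0. An angle above 45 deg means that reward A weighs more, so an increase in angle with satiety, as in Figure 7E, reflects a smaller x-axis coefficient relative to the y-axis coefficient.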

Neuronal decoders
We used linear support vector machine (SVM) algorithms to decode neuronal activity according to bundles presented at different behavioral ICs during choice over the zero-reward bundle (bundle distinction) and, separately, according to the behavioral choice between two non-zero bundles located on different ICs (choice prediction). As in our main study on revealed preferences (Pastor-Bernier et al., 2019), we implemented both decoders as custom-written software in Matlab R2015b (Mathworks). The SVM decoder with linear kernel was implemented with the svmtrain and svmclassify procedures (our previous work had shown that nonlinear SVM kernels do not improve decoding; Tsutsui et al., 2016). The SVM decoder was trained to find the optimal linear hyperplane that best separated two neuronal populations corresponding to lower vs. higher ICs.
All analyses employed single-neuron data, consisting of single-trial impulse counts that had been z-normalised to the activity during the Pretrial epoch in all trials recorded with the neuron under study. The analysis included activity from all neurons whose responses followed the IC scheme of revealed preferences during any of the four task epochs, as identified by our three-test statistics, except where noted. The neurons were recorded one at a time; therefore, the analysis concerned aggregated pseudo-populations of neuronal responses.
The decoding analysis used 10 trials per neuron for each of two ICs (total of 20 trials). Extensive analysis suggested that including 15-20 trials per group did not provide significantly better decoding rates (while reducing the number of included neurons). For neurons that had been recorded with > 10 trials per IC, we randomly selected 10 trials from each neuron for each of the two ICs. We used a leave-one-out cross-validation method in which we removed one of the 20 trials and trained the SVM decoder on the remaining 19 trials. We then used the SVM decoder to assess whether it accurately detected the IC of the left-out trial. We repeated this procedure 20 times, each time leaving out a different one of the 20 trials. These 20 repetitions yielded a percentage of accurate decoding (% out of n = 20). The final estimate of decoding accuracy was the average over 150 iterations of this 20-trial random selection procedure. To distinguish from chance decoding, we randomly shuffled the assignment of neuronal responses to the tested ICs, which should result in chance decoding (50% correct). Significant decoding with the real, non-shuffled data was defined as a statistically significant difference from the shuffled data (P < 0.01; Wilcoxon rank-sum test).
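The cross-validation scheme can be sketched in Python as below. The original decoders used Matlab's svmtrain/svmclassify. This sketch swaps in a plain Pegasos sub-gradient solver for a linear SVM, with the bias handled as a regularized constant feature, so its solutions will differ somewhat from a standard SVM solver. It also shows only a single leave-one-out round and a label-shuffled control, not the 150 random trial selections and the rank-sum comparison. The data layout (rows are trials, columns are z-scored responses of the pseudo-population neurons), parameter values and function names are assumptions.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos sub-gradient descent on the L2-regularized hinge loss; labels y are +1/-1.

    A constant feature of 1.0 is appended so that the last weight acts as a bias.
    """
    rng = random.Random(seed)
    Xb = [list(row) + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):  # one pass in random order
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 penalty
            if margin < 1.0:  # hinge loss active: move toward correct side of this trial
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def predict(w, x):
    """Classify one trial as +1 (e.g. higher IC) or -1 (lower IC)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1

def loo_accuracy(X, y, **svm_kwargs):
    """Leave-one-out: train on all trials but one, classify the held-out trial; % correct."""
    correct = 0
    for i in range(len(X)):
        w = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **svm_kwargs)
        correct += predict(w, X[i]) == y[i]
    return 100.0 * correct / len(X)

def shuffled_accuracy(X, y, seed=1, **svm_kwargs):
    """Chance-level control: same procedure after randomly permuting the IC labels."""
    labels = list(y)
    random.Random(seed).shuffle(labels)
    return loo_accuracy(X, labels, **svm_kwargs)
```

Calling `loo_accuracy` and `shuffled_accuracy` on the same 20 trials gives one real and one shuffled accuracy; repeating this over random trial selections would produce the two distributions compared with the rank-sum test.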

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

(P = 0.0046, F = 9.7, 27 trials). The responses to the two blue bundles on the same IC differed insignificantly (P = 0.2622, F = 1.31, 29 trials). Same color labels as in (A). (C) Despite IC change indicating satiety, the neuronal response increase across ICs remained significant (P = 0.0014, F = 10.87, 17 trials). However, the two unchanged blue bundles were now on different ICs, and their responses varied significantly (P = 0.0028, F = 5.46, 40 trials). (D) With IC change from convex to concave indicating satiety, the three bundles with grape juice variation were now located within only two ICs. Although the neuronal response increase across ICs remained significant (P = 0.0144, F = 6.02, 35 trials), the peak response was reduced by 25% (from 40 to 30 imp/s, red) and the three responses were closer to each other. Further, the two unchanged blue bundles were now on different ICs, and their responses now differed significantly (P = 0.0201, F = 9.27, 52 trials). Thus, the changes of neuronal responses were consistent with the IC change indicating satiety.