Gambling disorder is a behavioral addiction that negatively impacts personal finances, work, relationships and mental health. In this pre-registered study (

Gambling disorder is a behavioral addiction that can have detrimental effects on quality of life including personal finances, work, relationships and overall mental health (

Recently, categorical definitions of mental illness have increasingly been called into question. The National Institute of Mental Health of the United States proposed the Research Domain Criteria (RDoC) to foster characterization of the dimensions underlying psychiatric disorders. According to this approach, research in cognitive science should focus on the identification of continuous neuro-cognitive dimensions that might go awry in disease, i.e. trans-diagnostic markers (

Steep discounting has been consistently observed in substance use disorders and gambling disorder (

Addiction is known to be under substantial contextual control. Addiction-related cues and environments are powerful triggers of subjective craving, drug use and relapse. Incentive sensitization theory (

Though rarely examined in naturalistic settings, contextual effects on trans-diagnostic dimensions of decision-making are of substantial clinical and scientific interest. Settings with high ecological validity might provide more informative insights into the central drivers of maladaptive behavior than laboratory-based studies (

The present pre-registered study thus had the following aims. First, we aimed to replicate the findings by Dixon et al. (

This study was preregistered via the Open Science Framework (

Second, to account for recent developments in computational modelling we also complemented the standard softmax model analysis with additional analyses of RT distributions via temporal discounting and reinforcement learning DDMs (

A-priori sample size was calculated based on results by Dixon et al. (

Participants were recruited via advertisements posted online and in local gambling venues. First, they were screened via a telephone interview to verify that they showed evidence of problematic gambling behavior, with electronic slot machines as their primary gambling mode. Further inclusion criteria were an age between nineteen and fifty, no illegal drug use, no current medication, and no history of neuropsychiatric disorders or cardiovascular disease. The ethics committee of the University of Cologne Medical Center approved all study procedures.

Forty-two participants were then invited to a first appointment, where they provided written informed consent and completed a questionnaire assessment and a set of working memory tasks (see section on

Participants were invited to three appointments. At the first appointment (^{2} of size. Testing occurred while the café was operating as usual, and experimenter and participant sat at a table next to a wall to ensure some privacy. The café was usually moderately busy, and testing occurred at the same spot for all participants, with only a few exceptions when this seat was taken. The gambling environment was a common slot-machine venue operated by a German gambling conglomerate. The experimenter and participant were seated at a table placed next to a wall in sight of the electronic gaming machines (EGMs). Four EGMs were in direct sight of the participant, and there were ten in the room in total (hidden by privacy screens). The density of gambling-related cues, such as people playing at EGMs and background sounds (e.g. sounds of winning or money dropping), depended on the venue's regular customers. However, in nearly all cases other people were playing EGMs in direct sight of the participants. The experimenter was granted permission to conduct research in two local gambling venues. Two chairs and a table for the experimental session were provided. In both locations, participants were seated such that neither the experimenter nor customers could view their screen. Both tasks ran on a 15-inch laptop using the Psychophysics Toolbox (

Participants filled out a battery of questionnaires regarding gambling related cognition (GRCS) (

We assessed working memory capacity using a set of four working memory paradigms. First, in an Operation Span Task (

Participants performed 140 trials of a temporal discounting task in which, on each trial, they chose between a smaller-but-sooner (SS) immediate reward and a larger-but-later (LL) reward delivered after a specific delay. SS and LL rewards were randomly displayed on the left and right sides of the screen, and participants were free to make their choice at any time. SS rewards were held constant at 20€, whereas LL rewards were computed as multiples of the SS reward (task version 1: 1.05, 1.055, 1.15, 1.25, 1.35, 1.45, 1.55, 1.65, 1.85, 2.05, 2.25, 2.55, 2.85, 3.05, 3.45, 3.85; task version 2: 1.025, 1.08, 1.2, 1.20, 1.33, 1.47, 1.5, 1.70, 1.83, 2.07, 2.3, 2.5, 2.80, 3.10, 3.5, 3.80). Each LL reward from one version was then combined with each delay option for that version (in days; task version 1: 1, 7, 13, 31, 58, 122; task version 2: 2, 6, 15, 29, 62, 118), yielding 140 trials in total. The mean LL magnitude was the same across task versions, and the order of versions was counterbalanced across subjects and sessions (neutral/gambling).

At the end of each session, one decision was randomly selected and paid out in the form of a gift certificate for a large online store, either immediately (in the case of an SS choice) or via email/text message after the respective delay (in the case of a LL choice).

Participants performed a slightly modified version of the 2-step task, a sequential reinforcement learning paradigm (^{st} stage (S1), participants chose between two fractals embedded in grey boxes. After taking an S1 action, participants transitioned to one of two possible 2^{nd} stages (S2) with fixed transition probabilities of 70% and 30%. In S2, participants chose between two new fractals each providing a reward outcome in points (between 0–100) that fluctuated over time. To achieve optimal performance, participants had to learn two aspects of the task. They had to learn the transition structure, that is, which S1 stimulus preferentially (70%) leads to which pair of S2 stimuli. Further, they had to infer the fluctuating reward magnitudes associated with each S2 stimulus.
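The transition structure described above can be illustrated with a minimal sketch. It assumes that each S1 action has one "common" S2 state reached with probability 0.7 and one "rare" state reached with probability 0.3; the function names and the mapping of actions to states are hypothetical and only serve the illustration.

```python
import random

COMMON_P = 0.7  # probability of the common transition, as stated in the task description


def sample_second_stage(s1_action, rng):
    """Return the S2 state (0 or 1) reached after S1 action 0 or 1.

    Assumption for illustration: action 0 commonly leads to state 0
    and action 1 commonly leads to state 1.
    """
    is_common = rng.random() < COMMON_P
    return s1_action if is_common else 1 - s1_action


def simulate_transitions(n_trials, seed=0):
    """Count common and rare transitions under random S1 choices."""
    rng = random.Random(seed)
    counts = {"common": 0, "rare": 0}
    for _ in range(n_trials):
        action = rng.randrange(2)
        state = sample_second_stage(action, rng)
        counts["common" if state == action else "rare"] += 1
    return counts


if __name__ == "__main__":
    print(simulate_transitions(10_000))
```

A learner that tracks which S2 state usually follows each S1 action can use this structure to plan, which is what the model-based component of the task is meant to capture.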

The two task versions differed in the S1 and S2 stimuli and in the fluctuating rewards in S2. However, the reward walks of both versions were equal in mean and variance, because version 2 walks were simply version 1 walks in reverse. Both versions were presented in counterbalanced order per session (neutral/gambling). Participants were instructed about the task structure and performed 40 practice trials (with different random walks and symbols) at the first appointment (

We applied a single-parameter hyperbolic discounting model to describe how subjective value changes as a function of LL reward height and delay (Mazur, 1987; Green and Myerson, 2004):

Here, _{t}_{t}_{t}_{k}
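As a minimal sketch of single-parameter hyperbolic discounting in its standard form, SV = A / (1 + kD), the following function computes the subjective value of a delayed reward. Estimating k in log-space and adding a context shift to log(k) on gambling-context trials are assumptions based on the way log(k) and its context shift are reported in the results; all function and argument names are hypothetical.

```python
import math


def hyperbolic_sv(amount, delay_days, log_k, shift=0.0, gambling=0):
    """Subjective value of a delayed reward under hyperbolic discounting.

    amount      -- reward magnitude of the LL option
    delay_days  -- delay until reward delivery, in days
    log_k       -- discount rate in log-space (baseline, neutral context)
    shift       -- assumed additive change in log(k) in the gambling context
    gambling    -- indicator: 1 for gambling-context trials, 0 otherwise
    """
    k = math.exp(log_k + shift * gambling)
    return amount / (1.0 + k * delay_days)


if __name__ == "__main__":
    # Hypothetical values: 40 units delayed by 10 days with k = 0.1
    print(hyperbolic_sv(40.0, 10, math.log(0.1)))
```

Under this form, a positive shift increases k and therefore lowers the subjective value of every delayed reward, which corresponds to steeper discounting.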

Softmax action selection models choice probabilities as a sigmoid function of value differences (Sutton and Barto, 1998):

Here, _{t}_{t}_{b}
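For two options, softmax action selection reduces to a logistic (sigmoid) function of the value difference. The sketch below illustrates this; the inverse-temperature name `beta` and the function name are assumptions made for illustration.

```python
import math


def p_choose_ll(sv_ll, sv_ss, beta):
    """Probability of choosing the LL option under a two-option softmax.

    Equivalent to exp(beta * sv_ll) / (exp(beta * sv_ll) + exp(beta * sv_ss)),
    written as a sigmoid of the scaled value difference and evaluated in a
    numerically stable way.
    """
    x = beta * (sv_ll - sv_ss)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical values: the LL option is worth slightly more than the SS option
    print(p_choose_ll(22.0, 20.0, beta=0.5))
```

With equal subjective values the choice probability is 0.5, and larger values of `beta` make choices more deterministic for a given value difference.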

To more comprehensively examine environmental effects on choice dynamics, we additionally replaced softmax action selection with a series of drift diffusion model (DDM)-based choice rules. In the DDM, choices arise from a noisy evidence accumulation process that terminates as soon as the accumulated evidence exceeds one of two response boundaries. In the present setting, the upper boundary was defined as selection of the LL option, whereas the lower boundary was defined as selection of the SS option.
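The accumulation process described above can be sketched with a simple Euler discretization. This is an illustrative simulation only, not the estimation procedure used in the study; the parameter names, the time step, and the coding of the starting point as a proportion of the boundary separation are assumptions.

```python
import math
import random


def simulate_ddm_trial(drift, boundary, bias=0.5, non_decision=0.3,
                       dt=0.001, noise_sd=1.0, max_time=10.0, rng=None):
    """Simulate one trial of a two-boundary drift diffusion process.

    Evidence starts at bias * boundary and accumulates with the given drift
    plus Gaussian noise until it reaches the upper boundary (LL choice) or
    the lower boundary at 0 (SS choice). Returns (choice, rt_in_seconds),
    or (None, None) if no boundary is reached within max_time.
    """
    rng = rng or random.Random()
    evidence = bias * boundary
    t = 0.0
    while t < max_time:
        evidence += drift * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if evidence >= boundary:
            return "LL", t + non_decision
        if evidence <= 0.0:
            return "SS", t + non_decision
    return None, None


def signed_rt(choice, rt):
    """Code lower-boundary (SS) responses with negative RTs."""
    return -rt if choice == "SS" else rt


if __name__ == "__main__":
    rng = random.Random(1)
    choice, rt = simulate_ddm_trial(drift=1.0, boundary=2.0, rng=rng)
    print(choice, signed_rt(choice, rt))
```

A positive drift favors the upper (LL) boundary and a negative drift the lower (SS) boundary, while a drift near zero produces slow and variable responses.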

RTs for choices of the SS option were multiplied by –1 prior to model fitting. We furthermore used a percentile-based cut-off, such that for each participant the fastest and slowest 2.5 percent of trials were excluded from the analysis. We then first examined a null model (DDM_{0}) without any value modulation. Here, the RT on each trial

The parameter α models the boundary separation (i.e. the amount of evidence required before committing to a decision), τ models the non-decision time (i.e., components of the RT related to motor preparation and stimulus processing), _{x}_{t}

As in previous work (_{lin}) (

Here, the drift rate on trial t is calculated as the scaled value difference between the subjective LL and SS rewards. Thus, we substituted the v+s_υ*I_t term within Eq. 3 with v_t (Eq. 4). As noted above, RTs for SS options were multiplied by –1 prior to model estimation, such that this formulation predicts more SS choices whenever SV(SS)>SV(LL) (the trial-wise drift rate is negative), and predicts longest RTs for trials with the highest decision-conflict (i.e., in the case of SV(SS)= SV(LL) the trial-wise drift rate is zero). We next examined a DDM with non-linear trial-wise drift rate scaling (DDM_{S}) that has recently been reported to account for the value-dependency of RTs better than the DDM_{lin} (_{max}

All parameters including _{coeff}_{max}_{x}_{t}

We first applied a slightly modified version of the hybrid RL model (_{mf}_{mb}

Here, _{21}, _{22}), _{1}, _{2}) and _{1} and _{2} denote the learning rate for S1 and S2, respectively. S2 MF _{2},_{t}_{S}_{2},_{t}

In addition, as proposed by Toyama et al. (_{decay} and centered to the mean of reward walks (0.5). A decay of

and

S1 action selection is then modelled via weighting S1 MF and MB

where:

_{MB}_{s}_{MB}_{MB}_{t}

_{MF}_{s}_{MF}_{MF}_{t}

_{s}_{t}

_{2}_{s}_{2} + _{2} * _{t}

As in our analysis of temporal discounting we replaced softmax action selection with a DDM choice rule (_{MF}, vcoeff_{MB}_{max}_{x}_{t}

Data were filtered using a percentile-based cut-off, such that for each participant the fastest and slowest 2.5 percent of RTs/trials were excluded from further analysis. In addition, trials with RTs < 150ms were excluded. We then first examined a null model (DDM_{0}; Eq. 3) without any value modulation followed by two value-informed models where the drift-rate (

and the drift rate in S2 is calculated as

For the non-linear version, the linear drift rate from equations 16 and 17 are additionally passed through a sigmoid:

where

_{MB}_{s}_{MB}_{v}_{MB}

_{MF}_{s}_{MF}_{v}_{MF}

_{S}_{2}_{s}_{S2}_{S}_{2} *

_{S}_{i}_{s}_{Si}_{S}_{i} *

Softmax models were fit to all trials from all participants using a hierarchical Bayesian modeling approach with separate group-level distributions for all baseline parameters for the neutral context and shift parameters (_{x}

For the intertemporal choice data, model estimation was performed using Markov Chain Monte Carlo (MCMC) sampling as implemented in the JAGS (Version 4.3) software package (_{x}

For the 2-step task, model estimation was performed using MCMC sampling as implemented in STAN (

For baseline group-level means, we used uniform and normal priors defined over numerically plausible parameter ranges (see code and data availability section for details). For all _{x}

For both tasks, relative model comparison was performed via the _{x}_{x}

We carried out posterior predictive checks to examine whether models reproduced key patterns in the data, in particular the value-dependency of RTs (_{S}, and separately for the neutral and gambling context. For each participant and context, we then plotted the mean observed RTs as a function of decision conflict, as well as the mean RTs across 10,000 data sets simulated from the posterior distributions of the DDM_{0}, DDM_{lin} and DDM_{S}. For the 2-step task, we extracted mean posterior parameter estimates and simulated 200 datasets in R (Version 4.0.3) using the RWiener package (Version 1.3.3). We then plotted observed RTs as a function of S2 reward difference, together with the mean RTs across the 200 simulated datasets for all DDMs. We further show that our models capture the relationship between S2 reward differences and optimal (max[reward]) choices.

As a model-agnostic measure of temporal discounting, we performed a logistic regression on choices with context (neutral vs. gambling) as a fixed effect and subject as a random effect. For the 2-step task, we likewise used a hierarchical generalized linear model (HGLM) and modeled 2nd-stage RTs as a function of transition (common vs. rare) and context (neutral vs. gambling) as fixed effects and subject as a random effect. In line with our modelling analyses, data were filtered so that implausibly fast RTs were excluded (see Methods). A standard analysis of stay probabilities (

On each testing day, participants rated their subjective craving (“How much do you desire to gamble right now?”) on a visual-analogue scale ranging from 0 to 100, both at the beginning of the testing session, and at the end following task completion. We then used paired t-tests to examine whether subjective craving differed between the testing environments (neutral vs. gambling).
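The paired comparison of craving ratings can be sketched as follows. The ratings in the example are hypothetical and are not data from this study; the function name is likewise an assumption.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t-test statistic for two equally long lists of ratings.

    Returns (t, degrees_of_freedom), computed on the within-participant
    differences x[i] - y[i].
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain one rating per participant")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Hypothetical craving ratings (0 to 100) for four participants
    neutral = [10, 20, 30, 40]
    gambling = [20, 25, 45, 50]
    print(paired_t(neutral, gambling))
```

With neutral ratings entered first, higher craving in the gambling environment yields a negative t value.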

Craving was assessed on a visual-analogue-scale before and after task performance. Due to technical problems, ratings of the first eight participants were lost. Another two participants did not complete post-task ratings. In the remaining n = 22 participants, craving was substantially higher in the gambling-related environment compared to the neutral environment (paired t-test pre-task: t_{23} = –3.13; p = 0.0048, Cohen’s _{21} = –4.32, p = 0.0003, Cohen’s

Subjective craving was assessed at the beginning

Raw proportions of larger-but-later (LL) choices are plotted in _{contex} = –0.52; z = –10.62, p < 0.0001) such that participants made more LL selections in the neutral vs. the gambling-related environment. Overall response time (RT) distributions are plotted in

Behavioral data from the temporal discounting task.

We first modeled the data using standard softmax action selection. This analysis revealed an overall context effect on log(k), such that discounting was substantially steeper in the gambling context compared to the neutral context (_{k}) was about 116 times more likely than a decrease (see

Softmax model; Posterior distributions of mean hyperparameter distributions for the neutral baseline context (blue) and the corresponding shift in the gambling context (pink). _{k});

Model comparison of temporal discounting DDMs revealed the same model ranking in each context (Supplemental Table S3) such that the data were best accounted for by a temporal discounting DDM with non-linear drift rate scaling. This model accounted for around 90% of decisions (Supplemental Table S4, Supplemental Figure S1) and posterior predictive checks confirmed that it reproduced individual-participant RTs (Supplemental Figure S2).

We next examined the posterior distributions of model parameters of the best-fitting TD-DDM model (DDMs with sigmoid drift rate scaling; we further report model comparison, binary choice predictions and posterior predictive checks in the corresponding

Temporal discounting drift diffusion model results: posterior distributions for hyperparameter means from the neutral context. _{max}, _{coeff},

Overview of overall context differences. For group comparisons we report Bayes Factors for directional effects for s_{x} hyperparameter distributions of s_{x} > 0 (gambling context > neutral context).

| MODEL PARAMETER (CHANGE IN GAMBLING CONTEXT) | SOFTMAX MODEL: ESTIMATE | SOFTMAX MODEL: dBF | DDM_{S}: ESTIMATE | DDM_{S}: dBF |
|---|---|---|---|---|
| s_{k} | 0.77 | 1688.53 | 0.40 | 54.20 |
| s_{β} | 0.025 | 2.27 | – | – |
| s_{vcoeff} | – | – | –0.012 | 0.25 |
| s_{τ} | – | – | –0.05 | 0.10 |
| s_{α} | – | – | 0.10 | 4.40 |
| s_{z} | – | – | 0.02 | 13.64 |
| s_{vmax} | – | – | 0.33 | 39490.71 |
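Directional Bayes Factors for shift parameters can be approximated from posterior samples. The sketch below assumes that the dBF is defined as the ratio of posterior mass above zero to posterior mass below zero, which matches the interpretation of an increase being a given number of times more likely than a decrease; the function name and the example samples are hypothetical.

```python
def directional_bf(samples):
    """Ratio of posterior samples above zero to samples below zero.

    Values above 1 favor an increase in the gambling context (s_x > 0),
    values below 1 favor a decrease. Returns infinity if no sample is
    below zero.
    """
    n_positive = sum(1 for s in samples if s > 0)
    n_negative = sum(1 for s in samples if s < 0)
    if n_negative == 0:
        return float("inf")
    return n_positive / n_negative


if __name__ == "__main__":
    # Hypothetical posterior samples of a shift parameter
    posterior = [0.4] * 99 + [-0.1]
    print(directional_bf(posterior))
```

Because the estimate depends on the number of samples in the smaller tail, very large dBF values obtained this way are only as precise as the number of posterior samples allows.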

Temporal discounting drift diffusion model results: posterior distributions for hyperparameter means for context shift (s_{x}) parameters modeling changes from the neutral to the gambling context. _{k}), _{max}, _{coeff}, _{z}

As in the softmax model (

As preregistered, we next examined whether the increase in discount rate s_{k} in the gambling context was associated with symptom severity or gambling-related cognition. We therefore computed a compound symptom severity _{k}). We thus modelled the gambling-context-related shift in the discount rate as a linear combination of both GRCS total scores and the gambling symptom severity compound score (see _{k}, the change in log(k) (95% HDI > 0; dBF = 37.81).

Participants earned significantly more points in the gambling context (t-test: t_{28} = –2.44,

An analysis of stay probabilities adapted to the present 2-step task version is shown in Supplemental Table S5. In each context, we observed main effects of reward (reflecting model-free RL) and reward x transition interaction (reflecting model-based RL). The reward x transition x context interaction was not significant.

We first examined a modified version of the hybrid model (

Hybrid model with softmax choice rule posterior distributions (top row: neutral context, bottom row: parameter changes in gambling context) of all group level means.

Overview of overall context differences. For context comparisons we report Bayes Factors for directional effects for s_{x} hyperparameter distributions of s_{x} > 0 (gambling context > neutral context).

| MODEL PARAMETER (SHIFT) | SOFTMAX MODEL: ESTIMATE | SOFTMAX MODEL: dBF | DDM_{S}: ESTIMATE | DDM_{S}: dBF |
|---|---|---|---|---|
| s_{ηS1} | 0.44 | 3.29 | 0.0801 | 1.186 |
| s_{ηS2} | 0.40 | 92.3 | 0.280 | 14.658 |
| s_{τS1} | – | – | 0.001 | 0.8454 |
| s_{τS2} | – | – | 0.001 | 1.161 |
| s_{ρ} | 0.04 | 1.946 | 0.05 | 2.365 |
| s_{αS1} | – | – | –0.002 | 0.9354 |
| s_{αS2} | – | – | 0.0149 | 2.026 |
| _{MF}/S_{vcoeffMF} | –1.14 | 0.010 | –0.93 | 0.083 |
| _{MB}/S_{vcoeffMB} | 1.08 | 4.00 | 4.01 | 169.62 |
| _{S2}/S_{vcoeffS2} | –0.44 | 0.428 | –0.64 | 0.271 |
| s_{vmaxS1} | – | – | –0.19 | 0.296 |
| s_{vmaxS2} | – | – | 0.41 | 15.83 |

We next combined the hybrid model with a DDM choice-rule (

Posterior distributions for the best-fitting RLDDM are shown in

RL-DDM. Posterior distributions of all hyperparameters for the neutral baseline condition. _{max}. _{MF}. _{MB}. _{S2}.

RL-DDM. Posterior distributions of all shift-hyperparameters modelling the change from the neutral to the gambling condition. _{max}. _{MF}. _{MB}. _{S2}.

As preregistered, we examined associations between ρ (perseveration) and gambling symptom severity (average

Here we comprehensively examined the contextual modulation of two putatively trans-diagnostic markers implicated in addiction, temporal discounting (

Theoretical accounts highlight the central role of addiction-related cues and environments in drug addiction (_{max}

In addition to temporal discounting, we included a 2-step sequential decision-making task designed to dissociate model-based (MB) from model-free (MF) contributions to behavior (

The latter result contrasts with our pre-registered hypothesis of

If the effects of gambling environments on 2-step task performance are (at least in part) driven by increases in DA, then the question arises why gamblers at the same time exhibited substantially increased temporal discounting. The literature on DA effects on temporal discounting is mixed (

Given that DA was neither measured nor directly manipulated here, these issues cannot be directly resolved. However, our data might nonetheless provide some insights. Effects of DA on decision-making might depend on both task and context (

Our results show that two prominent (potentially trans-diagnostic) computational processes, temporal discounting and MB control, are differentially modulated by addiction-related environments in regular slot machine gamblers. This provides a computational psychiatry perspective on factors that contribute to the understanding of this disorder. The substantial contextual effects on temporal discounting further highlight the potential clinical relevance of this process (

We also extended previous studies on this topic via a recent class of value-based decision models based on the DDM (

A number of limitations need to be acknowledged. First, as in the original study (

To conclude, here we show that two computational trans-diagnostic markers with high relevance for gambling disorder in particular and addiction more generally are modulated in opposite ways by exposure to real gambling environments. Gamblers showed increased temporal discounting in a gambling context, and this effect was modulated by maladaptive control beliefs. In contrast, MB control improved, a finding that poses a challenge for habit/compulsion theories of addiction. Ecologically valid testing settings such as those investigated here can thus yield novel insights into environmental drivers of maladaptive behavior underlying mental disorders.

Model code and raw choice data are available on the Open Science Framework:

The additional files for this article can be found as follows:

Supplemental Tables, Figures and Results. DOI:

Intertemporal Choice- and 2-Step Task datasets for all participants. DOI:

This work was funded by Deutsche Forschungsgemeinschaft (grant PE1627/5-1 to J.P.).

The authors have no competing interests to declare.

JP conceived the idea and acquired the funding. JP and BJW designed the study. BJW acquired the data. BJW analyzed the data and performed the modeling. DM contributed analytical tools/software. BJW wrote the first draft of the paper. BJW and JP wrote the paper. DM provided revisions. JP supervised the project.