Stimulus-Driven Affective Change: Evaluating Computational Models of Affect Dynamics in Conjunction with Input

Niels Vanhasbroeck; Tim Loossens; Nil Anarat; Sigert Ariens; Wolf Vanpaemel; Agnes Moors; Francis Tuerlinckx

doi:10.1007/s42761-022-00118-5

2022 Jul 14;3(3):559–576.



PMCID: PMC9537408 PMID: 36385907

Abstract

The way in which emotional experiences change over time can be studied through the use of computational models. An important question with regard to such models is which characteristics of the data a model should account for in order to adequately describe these data. Recently, attention has been drawn to the potential importance of nonlinearity as a characteristic of affect dynamics. However, this conclusion was reached using experience sampling data that contained no information about the context in which affect was measured, and affective stimuli may induce some or all of the observed nonlinearity. This raises the question of whether computational models of affect dynamics should account for nonlinearity, or whether they just need to account for the affective stimuli a person encounters. To investigate this question, we used a probabilistic reward task in which participants either won or lost money on each trial. A number of plausible ways in which the experimental stimuli could play a role were considered and applied to the nonlinear Affective Ising Model (AIM) and the linear Bounded Ornstein-Uhlenbeck (BOU) model. To reach a conclusion, we assessed the relative and absolute performance of these models. Results suggest that some of the observed nonlinearity could indeed be attributed to the experimental stimuli. However, not all nonlinearity was accounted for by these stimuli, suggesting that nonlinearity may be an inherent feature of affect dynamics. As such, nonlinearity should ideally be accounted for in computational models of affect dynamics.

Supplementary Information

The online version contains supplementary material available at 10.1007/s42761-022-00118-5.

Keywords: Affect, Affect dynamics, Computational modeling, Context, Input

A central topic in the study of affect (i.e., the subjective component of emotions) is its dynamics, or how it changes over time. Affect dynamics can be studied by repeatedly assessing individuals’ affective states, either in the laboratory or in daily life. Using this method, researchers were able to identify systematic individual differences in people’s baseline affective state, the variability around this baseline, and the strength of the regulation towards it (Congard et al., 2011; Kalokerinos et al., 2020; Kuppens et al., 2010; Wendt et al., 2020). These individual differences have been linked to, among others, the development and maintenance of mood disorders (Ebner-Priemer et al., 2015; Kuppens et al., 2012; Sperry et al., 2020; Trull et al., 2015), albeit with varying degrees of success (Dejonckheere et al., 2019).

There are currently two dominant approaches to studying affect dynamics (see also Loossens et al., 2020). A first approach uses summary statistics to characterize an individual’s affective time series (Dejonckheere et al., 2019; Wendt et al., 2020). This approach is especially useful for investigating specific characteristics of affective time series. For example, some studies have investigated group differences in person-specific affective means and standard deviations (e.g., Congard et al., 2011; Kalokerinos et al., 2019) while others have focused on more complicated affect dynamical characteristics, such as affective instability (Houben et al., 2021; Sperry et al., 2020). Despite this approach’s predominance in the literature, it has one major limitation: it cannot provide a full account of affect dynamics. While summary statistics are useful for describing specific data patterns, they cannot explain how these data come about, or what the data-generating mechanism looks like. Achieving these goals requires the use of a different approach, which consists of the development and application of computational models that formalize the fundamental dynamical features of an individual’s affective life. A central question within this approach concerns the principles on which computational models should be built in order to properly describe affect dynamics.
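To make the contrast concrete, the following Python sketch computes three person-level summary statistics for a single affective time series. Using the mean squared successive difference (MSSD) as the index of affective instability is our assumption for illustration, and the two series are hypothetical. They share the same mean and standard deviation but differ in instability, yet none of these numbers says how either series was generated.

```python
import numpy as np


def summary_statistics(series):
    """Person-level summary statistics of one affective time series."""
    x = np.asarray(series, dtype=float)
    return {
        "mean": x.mean(),
        "sd": x.std(ddof=1),               # sample standard deviation
        "mssd": np.mean(np.diff(x) ** 2),  # instability: mean squared successive difference
    }


# Hypothetical PA series: same values (hence same mean and SD), different order
smooth = summary_statistics([0.2, 0.3, 0.4, 0.5, 0.6])
jumpy = summary_statistics([0.2, 0.6, 0.3, 0.5, 0.4])
```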

The most commonly used class of computational models are the discrete-time (e.g., Adolf et al., 2017; Albers & Bringmann, 2020) and continuous-time (Boker & Nesselroade, 2002; Oravecz et al., 2011; Voelkle & Oud, 2013) autoregressive models, which postulate a linear relationship between current and past affective states. In spite of their popularity, however, they cannot accommodate several nonlinear features of affective data, including skew in affective measurements, an L-shaped relationship between positive and negative affect (PA and NA, respectively; Diener & Iran-Nejad, 1986; Kuppens et al., 2013; Loossens et al., 2020; Norris et al., 2010; Schimmack, 2001), and multiple baselines to which affect can be regulated, a feature sometimes referred to as multistability (Hollenstein, 2015; Bonsall et al., 2012). These features, two of which are visualized in Fig. 1, have been found to describe fundamental characteristics of affective time series (Loossens et al., 2020), indicating that there might be a need to move from linear to nonlinear models in order to capture relevant dynamical features of affect.
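The linear relationship these models postulate can be illustrated with a discrete-time first-order autoregressive (AR(1)) process, simulated in the sketch below. The parameter names mu, phi, and sigma are our own. Whatever its parameter values, such a process has a single baseline with symmetric Gaussian fluctuations around it, and nothing keeps it within the bounds of a rating scale.

```python
import numpy as np


def simulate_ar1(n, mu, phi, sigma, seed=0):
    """Simulate x_t = mu + phi * (x_{t-1} - mu) + e_t with e_t ~ N(0, sigma^2).

    The expected next state is a linear function of the current state, so
    there is one baseline (mu) and the fluctuations around it are Gaussian.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=n)
    x = np.empty(n)
    x[0] = mu
    for t in range(1, n):
        x[t] = mu + phi * (x[t - 1] - mu) + noise[t]
    return x


x = simulate_ar1(50_000, mu=0.5, phi=0.6, sigma=0.1)
# The stationary SD is sigma / sqrt(1 - phi**2) = 0.125
```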

Fig. 1 — Visualization of the two nonlinear features that will be central in this paper. The plots on the left show the distribution of PA and NA scores within the (bounded) PA-NA space. The plots on the right show (potential) time series that may accompany these kinds of distributions. The scatter plot and time series at the top visualize an L-shaped relationship between PA and NA. This means that when PA/NA is high, NA/PA will probably be low. Importantly, low affective states in both dimensions may also occur, resulting in an L-shaped distribution within the PA-NA space. The second nonlinear characteristic, visualized in the lower plots, is multistability. Multistability implies that an individual experiences discrete-like affective states, such as overall positive (high PA, low NA) and overall negative (low PA, high NA) states, with little to no transitioning period between these states. This is most easily seen in the plotted time series, which shows a sudden transition point at which the person switches from an overall negative to an overall positive state. Importantly, multistability may also involve faster transitions between discrete-like affective states than the ones plotted here.

Whereas the evidence of nonlinearity is starting to accumulate, the interpretation of this evidence is more contentious. While nonlinearity may be an inherent characteristic of affect, it can also be induced by external events, in which case the observed nonlinearity in affective time series is a reflection of the dynamics of the environment, and not an indicator of underlying nonlinearity in affect. Consider multistability as an example. It can arise as an inherent characteristic of one’s affect dynamics (e.g., in bipolar disorder; Holmes et al., 2016; Bonsall et al., 2012; for a modeling perspective, see Steinacher & Wright, 2013), but could also arise due to the occurrence of positive and negative events (see Fig. 2). To avoid drawing the wrong conclusions about the operation of the affective system, it is thus important to study affect dynamics in relation to its immediate environment (Asutay et al., 2020, 2021; Kuppens et al., 2012; Kuppens & Verduyn, 2017; Lapate & Heller, 2020; Rutledge et al., 2014; Villano et al., 2020).

Fig. 2 — Visualization of multistability that arises due to environmental stimuli (top) or due to environmentally unrelated nonlinearity of the affective system (bottom). Red dots denote the affective state during negative affective events, while green dots denote the affective state during positive affective events. If multistability is to be a feature of the affective system itself, then it should not be explained away by the environmental stimuli that a participant encounters, such as in the upper case
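The scenario in the upper panel can be mimicked with a purely linear model, as in the minimal sketch below. It assumes, hypothetically, that blocks of negative and positive events alternate and that each block simply moves the baseline of a linear AR(1) process to 0.25 or 0.75. The pooled scores pile up around two values, which looks like multistability even though the process has a single attractor at any moment. Once the baseline implied by the events is subtracted, a single mode remains.

```python
import numpy as np


def simulate_event_driven(n_blocks=20, block_len=200, phi=0.8, sigma=0.036, seed=1):
    """Linear AR(1) process whose baseline is set by the current block of events.

    Hypothetical design: negative-event blocks (baseline 0.25) alternate with
    positive-event blocks (baseline 0.75). The dynamics themselves are linear.
    """
    rng = np.random.default_rng(seed)
    baseline = np.repeat(np.tile([0.25, 0.75], n_blocks // 2), block_len)
    x = np.empty(baseline.size)
    x[0] = baseline[0]
    for t in range(1, x.size):
        x[t] = baseline[t] + phi * (x[t - 1] - baseline[t]) + rng.normal(0.0, sigma)
    return x, baseline


x, baseline = simulate_event_driven()
for centre in (0.25, 0.5, 0.75):
    # Share of scores within 0.05 of each value: high, near zero, high
    print(centre, np.mean(np.abs(x - centre) < 0.05))
# Once the event-implied baseline is removed, a single mode around 0 remains
residual = x - baseline
```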

In the current study, we investigate whether nonlinearity is inherent to the affective system or whether it is solely a consequence of not accounting for affective stimuli. To this end, we compared the nonlinear Affective Ising Model (AIM) with a linear competitor, namely the Bounded Ornstein-Uhlenbeck model (BOU; see Loossens et al., 2020), a bounded version of the more commonly used Ornstein-Uhlenbeck model (OU; Driver & Voelkle, 2018; Oravecz et al., 2011; Uhlenbeck & Ornstein, 1930; Voelkle & Oud, 2013). Given that both models account for the experimental stimuli, we can answer our question by assessing the performance of the models through a variety of evaluation tools. More specifically, we will evaluate the models in a relative manner (by comparing model fit and predictive accuracy) and in an absolute manner (by assessing how well the models can reproduce specific characteristics of the data). If we find that the AIM outperforms the BOU, this indicates that the experimental stimuli are not the sole source of nonlinearity in the affective data.

Method

Participants

A total of 178 individuals were recruited through Prolific, an online data collection platform (https://www.prolific.co/). The sample was well balanced with regard to gender (46% female) and educational background (31% high school, 38% bachelor, 25% master, 6% other). Participants were on average 27 years old (SD = 8, range = [18, 59]).

As participants completed a probabilistic reward task, they were told that they would receive their total earnings as the reward for participation. Unbeknownst to them, this total was predetermined, so that everyone received £8 after successfully completing the experiment, irrespective of their behavior during the task. On average, participants took 31 min 42 s to complete the task (SD = 11 min 35 s).

Materials

To assess participants’ momentary affective states, we used a modified version of the Evaluative Space Grid (ESG; Larsen et al., 2008). The ESG is a two-dimensional grid whose axes are formed by PA and NA, allowing participants to report mixed feelings (i.e., affective states that are both positive and negative). For our experiment, we slightly modified the ESG to be continuous rather than discrete, to have the labels “Positive” and “Negative” on its axes, and to have four qualitative labels in the corners of the grid (clockwise starting at the lower left: “neutral,” “bad,” “mixed,” and “good”). For our analyses, we scaled the participants’ momentary affective states to take on values between 0 and 1, based on the boundaries of the grid.
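The rescaling can be sketched as follows. We assume that each click is recorded in screen pixels and that the grid's pixel boundaries are known; the function and argument names are hypothetical. The orientation follows from the corner labels: "neutral" in the lower left and "good" in the lower right place PA on the horizontal axis and NA on the vertical axis. That screen y coordinates grow downwards is a further assumption.

```python
def _clip01(value):
    """Keep a coordinate inside [0, 1]."""
    return min(max(value, 0.0), 1.0)


def scale_esg_click(x_px, y_px, left, right, top, bottom):
    """Map a click on the grid (screen pixels) to (PA, NA) in [0, 1].

    Assumed orientation: PA increases from left to right, NA from bottom to
    top, and screen y coordinates increase downwards.
    """
    pa = (x_px - left) / (right - left)
    na = (bottom - y_px) / (bottom - top)
    return _clip01(pa), _clip01(na)


# Hypothetical 400 x 400 px grid spanning x in [100, 500] and y in [100, 500]
print(scale_esg_click(100, 500, 100, 500, 100, 500))  # lower left, "neutral": (0.0, 0.0)
print(scale_esg_click(500, 500, 100, 500, 100, 500))  # lower right, "good": (1.0, 0.0)
```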

As part of two associated (unpublished) master’s theses, we included several questionnaires that participants filled out at the end of the experiment in a pseudo-random order, based on the time at which they started the study. Given that the research goals of these master’s theses differ from those of this article, we will not discuss them further.

Procedure

After filling out an informed consent form, participants were asked to provide some basic demographic information. Participants then received instructions and started the experiment.

The experiment was a probabilistic reward task. On each trial, participants were shown four doors, each hiding either a monetary reward (win) or a monetary punishment (loss; see Fig. 3). Participants were told that two doors concealed a win, while the other two hid a loss, so that there was a presumed equal probability of winning or losing on each trial. Participants chose one of the four doors, after which their choice remained visible on screen for 500 ms and the door opened, showing the trial outcome for 2 s. One second after the trial outcome was shown, the net total, which was always visible at the top of the screen, was updated.

Fig. 3 — Structure of a trial of the probabilistic reward task. Participants were presented with four doors, behind two of which hid a monetary reward (win), and behind the other two a monetary punishment (loss). At the top of the screen, participants saw their current total winnings, denoted as trial total. The modified version of the Evaluative Space Grid was shown at the bottom of the screen

At the end of each trial, participants were asked to report their affective state on the ESG at the bottom of the screen. The grid kept track of participants’ temporary responses by placing a red dot at the previously clicked location. Participants confirmed their response by pressing the spacebar, and then moved on to the next trial after a 1-s delay. The red dot in the ESG disappeared with the start of each new trial to avoid imposing dependence between two subsequent trials. At no point during the study were participants limited by a time constraint; the experiment was self-paced.

Each participant received a starting capital of £3. To practice the trial structure, participants first went through 10 mandatory practice trials, after which their total was reset to £3. They then completed another 152 experimental trials.

Trial Generation

To ensure that each participant had a comparable trajectory of stimuli, we fixed the order in which wins and losses occurred for each participant. That is, each participant went through the same string of outcomes, which was fixed beforehand. This information was, however, not available to the participants. Instead, they were told that the probability of winning and losing was equal on each trial. One main advantage of this design is that individual differences in affect dynamics can be more easily picked up. Given that all stimuli are the same, individual differences in affect dynamics cannot be attributed to a difference in the encountered stimuli, but are due to how participants affectively react to these stimuli. One disadvantage, however, relates to the motivation of the participants. Participants may (wrongly) assume that there is a pattern to be learned about the outcomes associated with different doors, and they may become frustrated over time when their internally generated hypotheses are proven wrong.

We generated the trials in several steps. First, wins and losses were semi-randomly drawn from the intervals [0.5, 1.5] and [3, 4], and the losses were multiplied by −1.¹ Then, these wins and losses were adjusted so that participants would build up a total gain of £5 over the course of the experiment, which, together with the £3 starting capital, yields a total reward of £8. Lastly, we randomized the order of all outcomes and saved this order to be used for each participant.
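The three steps can be sketched in Python as follows. The text does not state how many wins and losses there were, how magnitudes were assigned to the low and high intervals, or how the outcomes were adjusted to reach the £5 gain. The equal split, the coin flip between intervals, and the rescaling of the wins below are therefore stand-in assumptions, and the rescaling may push a few wins slightly outside their original interval.

```python
import numpy as np


def generate_outcomes(n_trials=152, target_gain=5.0, seed=2022):
    """Hypothetical reconstruction of the trial-generation steps."""
    rng = np.random.default_rng(seed)
    # Step 1 (assumed split): half wins, half losses; each magnitude comes from
    # the low interval [0.5, 1.5] or the high interval [3, 4] with equal chance.
    is_win = np.arange(n_trials) < n_trials // 2
    is_high = rng.random(n_trials) < 0.5
    magnitude = np.where(is_high, rng.uniform(3.0, 4.0, n_trials),
                         rng.uniform(0.5, 1.5, n_trials))
    outcomes = np.where(is_win, magnitude, -magnitude)  # losses multiplied by -1
    # Step 2 (stand-in): rescale wins so that all outcomes sum to the target gain.
    total_loss = -outcomes[~is_win].sum()
    outcomes[is_win] *= (total_loss + target_gain) / outcomes[is_win].sum()
    # Step 3: shuffle once; the same fixed order is reused for every participant.
    return rng.permutation(outcomes)


outcomes = generate_outcomes()
balance = 3.0 + np.cumsum(outcomes)  # running total, starting from the £3 capital
```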

Computational Models

The main goal of our study was to evaluate whether nonlinearity is an important characteristic of affect dynamics, or whether it is an artifact of failing to account for affective stimuli. To this end, we assessed the performance of several computational models, each composed of two building blocks: a dynamical model and an input function. The former governs affective changes over time, while the latter describes how the experimental stimuli are related to affect, thus linking the stimuli to the dynamical model. In what follows, we discuss both building blocks in more detail.

Dynamical Models

The AIM and the BOU belong to the class of drift-diffusion models and consequently share several properties. First, the models assume that affective change happens continuously over time, which relates closely to how affective fluctuations are believed to be experienced (Boker & Nesselroade, 2002). Second, the models describe change in the affective system in terms of the current state, inherent stochastic fluctuations, and the presence of one (or more) attractors (Strogatz, 2018). An attractor can be interpreted as a baseline state to which the current affective state is regulated. In other words, every deviation from the attractor or baseline will be regulated back towards it. Due to the stochasticity of the models, variability around the attractor or baseline is to be expected, even in the absence of any immediate contextual influences.

To develop some intuition about how the dynamical models work, consider a marble that is released inside a bowl. Intuitively, the marble tends to roll down towards the bottom of the bowl, which serves as the attractor of the system. However, stochastic influences perturb the marble, occasionally flicking it uphill again. Given these down- and uphill movements, the marble stays in continuous motion, as is assumed to be the case with affect.
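The marble metaphor corresponds to a one-dimensional Ornstein-Uhlenbeck process, dX = θ(μ − X)dt + σ dW, which the sketch below simulates with the Euler–Maruyama scheme. Here μ is the bottom of the bowl (the attractor), θ is the steepness of its walls (the strength of regulation), and σ is the size of the random flicks. The symbols are ours, and the sketch only illustrates the generic drift-diffusion idea, not either model as estimated in this study.

```python
import numpy as np


def simulate_ou(n_steps, dt, mu, theta, sigma, x0, seed=0):
    """Euler-Maruyama simulation of dX = theta * (mu - X) dt + sigma dW.

    mu: attractor (bottom of the bowl); theta: strength of regulation
    (steepness of the bowl); sigma: size of the random perturbations.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        drift = theta * (mu - x[k])  # pull back towards the attractor
        x[k + 1] = x[k] + drift * dt + sigma * np.sqrt(dt) * noise[k]
    return x


# Released high up the wall of the bowl, the marble rolls down and then keeps moving
path = simulate_ou(200_000, dt=0.01, mu=0.0, theta=1.0, sigma=0.5, x0=3.0)
settled = path[100_000:]  # discard the initial descent
# Around the attractor, the SD approaches sigma / sqrt(2 * theta), about 0.35
```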

Modeling the dynamics of a process in terms of a baseline and variability around it (either in terms of regulation of perturbations, or in terms of oscillations) is common, both within and outside of psychology (see, e.g., Guastello et al., 2009; Boker & Laurenceau, 2006; Goldbeter, 2011), and both within a discrete- and continuous-time framework (e.g., VAR-models; Bringmann et al., 2018; Ariens et al., 2020). These models allow for capturing several affective phenomena, such as affective regulation, affective reactivity, and affective variability. Furthermore, these models allow for the identification of different regulatory patterns, each with their own defining trajectory (Strogatz, 2018; Vanhasbroeck et al., 2021).

In what follows, we zoom in on the AIM and the BOU separately. We provide an intuitive explanation of the models rather than a technical one. We refer the interested reader to the Supplementary Materials for a technical description of the models.

Affective Ising Model

The AIM is a continuous-time, nonlinear drift-diffusion model that captures affective fluctuations through an affective surface. This surface is a person-specific hilly landscape through which an individual’s affective state roams, in close resemblance to the marble metaphor above.

It is through this surface that the AIM displays its nonlinearities, two of which we describe in detail (visualized in Fig. 4). First, the AIM is able to create nonlinear “canyons” as seen from a bird’s-eye view. These canyons may, for example, form an L-shape, having their minimum at neutral states (low PA-low NA) with protrusions towards positive (high PA-low NA) and negative states (low PA-high NA; see Fig. 4). Affect is expected to move within this canyon, thus mostly displaying neutral and occasionally positive or negative states (although mixed states are not excluded). A second nonlinear feature is what we previously referred to as multistability: the presence of more than one (local) attractor or minimum in the affective surface. This means that while the affective state may initially be regulated towards one attractor, it may at some point jump towards another attractor, thus accounting for discrete jumps in affective states in a natural way (see again Fig. 1).
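Multistability can be illustrated in the marble picture with the sketch below, which simulates a one-dimensional double-well surface U(x) = x⁴/4 − x²/2 with two attractors at x = −1 and x = +1. This surface is a generic textbook example chosen for illustration. It is not the AIM’s two-dimensional surface, which is defined in the Supplementary Materials. Noise occasionally carries the state over the hill between the wells, producing the kind of sudden switch shown in Fig. 1.

```python
import numpy as np


def simulate_double_well(n_steps=100_000, dt=0.01, sigma=0.7, seed=3):
    """Euler-Maruyama for dX = -U'(X) dt + sigma dW with U(x) = x**4/4 - x**2/2.

    The drift x - x**3 has two attractors (x = -1 and x = +1) separated by a
    hill at x = 0; noise occasionally carries the state over that hill.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = -1.0
    for k in range(n_steps):
        x[k + 1] = x[k] + (x[k] - x[k] ** 3) * dt + sigma * np.sqrt(dt) * noise[k]
    return x


def count_switches(x, threshold=0.5):
    """Count transitions between wells, with hysteresis so that jitter
    around the hilltop is not counted as a switch."""
    state, switches = (1 if x[0] > 0 else -1), 0
    for value in x:
        if state == -1 and value > threshold:
            state, switches = 1, switches + 1
        elif state == 1 and value < -threshold:
            state, switches = -1, switches + 1
    return switches


path = simulate_double_well()
print(np.mean(path > 0), count_switches(path))  # time share in the upper well, switches
```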

Bounded Ornstein-Uhlenbeck Model

In contrast to the AIM, the BOU is a continuous-time, linear drift-diffusion model. The BOU’s affective surface is much simpler than the AIM’s, as it corresponds to a paraboloid bowl (see Fig. 4). This bowl may range from circular to elliptical, depending on the BOU’s parameters (in particular those that capture the relationship between PA and NA). Unlike the AIM, the BOU cannot accommodate several nonlinear features of affect dynamics, such as an L-shaped relationship between PA and NA or multistability in its affective surface.

As mentioned earlier, the BOU is a bounded version of the OU, in which reflecting boundaries are imposed on the model. Like the BOU, the OU is a linear drift-diffusion model, and it has received some attention in the literature. However, the OU is unbounded and may thus produce nonsensical predictions that fall outside of the data range. As a consequence, the OU is at an inherent disadvantage compared to the AIM, which is naturally shielded against such predictions. The reflecting boundaries of the BOU are imposed to alleviate this difficulty (see also Loossens et al., 2020).

Importantly, these boundaries make the BOU capable of accommodating skew in PA and NA, which stands in stark contrast to the strictly Gaussian patterns that the OU can create. By comparing the AIM with the BOU, we therefore investigate whether the additional nonlinear features of the AIM (i.e., the L shape and/or multistability) provide a better means of describing affect dynamics than solely accounting for skew in the affective measurements, as done by the BOU. For simplicity’s sake, we will refer to the BOU as a linear model throughout this text.
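The effect of reflecting boundaries on skew can be seen in a one-dimensional sketch: an Ornstein-Uhlenbeck process on [0, 1] whose steps are mirrored back whenever they overshoot a boundary. The reflection rule and the parameter values are our assumptions, not the BOU’s actual implementation. With the attractor near the lower boundary, the stationary distribution is a truncated Gaussian and becomes right-skewed, whereas an unbounded OU with the same parameters would stay symmetric.

```python
import numpy as np


def simulate_reflected_ou(n_steps, dt, mu, theta, sigma, x0, seed=0):
    """Euler-Maruyama for a 1D OU process with reflecting boundaries at 0 and 1.

    A step that overshoots a boundary is mirrored back inside; one reflection
    per boundary suffices because single steps are small here.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        step = x[k] + theta * (mu - x[k]) * dt + sigma * np.sqrt(dt) * noise[k]
        step = abs(step)              # mirror at the lower bound 0
        step = 1.0 - abs(1.0 - step)  # mirror at the upper bound 1
        x[k + 1] = step
    return x


def skewness(values):
    """Sample skewness (third standardized moment)."""
    centred = np.asarray(values, dtype=float) - np.mean(values)
    return np.mean(centred ** 3) / np.std(values) ** 3


# Attractor close to the lower bound: the bounded process is right-skewed
bounded = simulate_reflected_ou(100_000, dt=0.01, mu=0.1, theta=1.0, sigma=0.3, x0=0.1)
print(skewness(bounded))
```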

Input Functions

On each trial of the experiment, participants encountered two experimental stimuli that are treated here as input to the dynamical models: the trial outcome (win or loss) and the trial total (the updated total at a given trial). Their relationship to affect is captured by an input function, which links the input to the parameters of the dynamical models so that a change in the input produces a change in the affective surface and, ultimately, in the location (BOU) or strength (AIM) of the attractors (see Fig. 5). For both dynamical models, the parameters β₁ and β₂ represent the input for PA and NA, respectively. Higher values of these parameters generally represent greater contextual change in the attractors, while a value of 0 represents the absence of any contextual influence. More specifically, the zero vector [0, 0]^T represents the absence of any contextual influences, while the vectors [β₁, 0]^T, [0, β₂]^T, and [β₁, β₂]^T represent contextual influences on PA, NA, and PA and NA combined, respectively.

Fig. 5 — Visualization of the context-related changes to the AIM (left) and BOU (right). For the AIM, the affective surface is tilted in a specific direction. For the BOU, this surface is not tilted, but the attractor changes location more directly. Both manipulations lead to an increased probability of experiencing certain affective states, which is visualized in the probability plots

One crucial question, however, concerns the way in which the stimuli from the experiment determine the values of β₁ and β₂. To address this issue, we constructed seven exploratory input functions, which differed in two respects. First, they differed in which experimental stimuli they account for: only immediate trial outcomes, only the trial total (i.e., the total accumulated up to a specific trial), or both. Second, they differed in how the input plays a role, which was operationalized as a linear relationship between the parameters of the models and either the raw values of the variables or dummy-coded versions thereof. Dummy coding of the stimuli consisted of differentiating either between wins and losses (for trial outcomes) or between a positive and a negative total (for trial total). Furthermore, we assumed that positive/negative events changed only PA/NA (relative to the neutral state). For completeness, we also considered the case in which the stimuli do not influence affect. The input functions used in our analyses are displayed in Table 1 with their respective names.

Table 1.

Values of the parameters β₁ and β₂, as determined by the input functions. Importantly, β₁ primarily changes the attractors of the models in the PA direction (i.e., higher or lower PA), while β₂ is primarily concerned with NA. The values of the ω’s (for trial outcomes) and τ’s (for trial total) are free parameters to be estimated for each individual within each dynamical model, and determine the strength with which the β’s influence the attractors.

Input name	Code	Condition	β₁	β₂
Empty		Always	0	0
Dummy outcome	O₂	Win	ω₁	0
Dummy outcome	O₂	Loss	0	ω₂
Linear outcome	O_L	Win	ω₁ × |O_t|	0
Linear outcome	O_L	Loss	0	ω₂ × |O_t|
Dummy total	T₂	Positive total	τ₁	0
Dummy total	T₂	Negative total	0	τ₂
Dummy low and high outcome	O₄	Low win	ω₁₁	0
Dummy low and high outcome	O₄	High win	ω₁₂	0
Dummy low and high outcome	O₄	Low loss	0	ω₂₁
Dummy low and high outcome	O₄	High loss	0	ω₂₂
Dummy outcome, dummy total	O₂T₂	Win, positive total	ω₁ + τ₁	0
Dummy outcome, dummy total	O₂T₂	Win, negative total	ω₁	τ₂
Dummy outcome, dummy total	O₂T₂	Loss, positive total	τ₁	ω₂
Dummy outcome, dummy total	O₂T₂	Loss, negative total	0	ω₂ + τ₂
Linear outcome, dummy total	O_LT₂	Win, positive total	ω₁ × |O_t| + τ₁	0
Linear outcome, dummy total	O_LT₂	Win, negative total	ω₁ × |O_t|	τ₂
Linear outcome, dummy total	O_LT₂	Loss, positive total	τ₁	ω₂ × |O_t|
Linear outcome, dummy total	O_LT₂	Loss, negative total	0	ω₂ × |O_t| + τ₂
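Every combined input function in Table 1 is the sum of an outcome component and a total component, so the whole table can be written as one short function. The sketch below uses ASCII codes (O2 for O₂, OL for O_L, and so on) and the parameter keys w1, w2, w11, w12, w21, w22, t1, and t2 for ω₁, ω₂, ω₁₁, ω₁₂, ω₂₁, ω₂₂, τ₁, and τ₂. Two details are assumptions not stated in the text: a total counts as positive when it exceeds zero, and for O₄ an outcome with a magnitude above 2.25 (between the two generation intervals) counts as high.

```python
def input_vector(code, outcome, total, p, high_cutoff=2.25):
    """Return (beta_1, beta_2) for one trial, following Table 1.

    code: "Empty", "O2", "OL", "T2", "O4", "O2T2" or "OLT2" (ASCII labels)
    outcome: signed trial outcome O_t (wins > 0, losses < 0)
    total: trial total (assumed positive when > 0)
    p: dict of free parameters, e.g. {"w1": ..., "w2": ..., "t1": ..., "t2": ...}
    """
    b1, b2 = 0.0, 0.0
    win, size = outcome > 0, abs(outcome)

    # Outcome component: wins only move beta_1 (PA), losses only beta_2 (NA).
    if code in ("O2", "O2T2"):
        if win:
            b1 += p["w1"]
        else:
            b2 += p["w2"]
    elif code in ("OL", "OLT2"):
        if win:
            b1 += p["w1"] * size
        else:
            b2 += p["w2"] * size
    elif code == "O4":
        level = "2" if size > high_cutoff else "1"  # "1" = low, "2" = high
        if win:
            b1 += p["w1" + level]
        else:
            b2 += p["w2" + level]

    # Total component: a positive total moves beta_1, a negative total beta_2.
    if code in ("T2", "O2T2", "OLT2"):
        if total > 0:
            b1 += p["t1"]
        else:
            b2 += p["t2"]
    return b1, b2  # "Empty" falls through to (0.0, 0.0)


params = {"w1": 1.0, "w2": 2.0, "t1": 0.5, "t2": 0.25,
          "w11": 0.1, "w12": 0.2, "w21": 0.3, "w22": 0.4}  # hypothetical values
print(input_vector("O2T2", outcome=-1.0, total=2.0, p=params))  # (0.5, 2.0): loss, positive total
```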


Combining these input functions with the AIM and the BOU, a total of 14 models were compared to each other. In the remainder of this paper, we will refer to these combinations with a label that specifies the dynamical model and the input function (using the labels from Table 1). For example, an AIM coupled with an input function that accounts for wins and losses in a dummy-like way will be referred to as AIM O₂. We note that the letter O stands for the inclusion of the outcome and the letter T for the inclusion of the total. The subscript 2 denotes a dummy-like effect of these variables and the subscript L denotes a linear effect. The single exception to these rules is the input function O₄, which includes a dummy-like effect for the outcomes drawn from each separate outcome generation interval (see the “Trial Generation” section).
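The count of 14 follows from crossing the two dynamical models with the seven input functions of Table 1. The following lines make this explicit using the labeling convention just described.

```python
from itertools import product

DYNAMICAL_MODELS = ["AIM", "BOU"]
INPUT_FUNCTIONS = ["Empty", "O₂", "O_L", "T₂", "O₄", "O₂T₂", "O_LT₂"]

# Every dynamical model is paired with every input function: 2 x 7 = 14 models
labels = [f"{model} {inp}" for model, inp in product(DYNAMICAL_MODELS, INPUT_FUNCTIONS)]
print(len(labels), labels[1])  # 14 AIM O₂
```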

Statistical Analyses

Using the data, models were tested both in a relative and an absolute sense. We describe both strategies in turn, but first detail how parameters were estimated as an intermediate step.

Parameter Estimation

The parameters of the models were estimated for each individual separately with the GradientDiffusion package (Loossens et al., 2021). Estimates were obtained by minimizing the min-log-likelihood ℓ using the differential evolution heuristic (DE; Storn & Price, 1997; see also the Supporting Information of Loossens et al., 2020). The general procedure was as follows. First, an individual’s affective time series was divided into pairs of measurements, with each pair containing the affective states at trials t − 1 and t. Then, P parameter sets were randomly generated to serve as starting points for the optimization. At each step of the DE, the probability distribution of the affective state at trial t conditional on the affective state at trial t − 1 was computed for each parameter set. Importantly, these probability distributions were also conditional on the proposed parameters, the time that had passed since the previous affective measurement, and the contextual input at trial t. Next, ℓ was computed for all measurement pairs and summed, providing an overall measure of fit of the model to the data for a given parameter set. Finally, the parameter sets were combined to form P new parameter sets that were evaluated in the same way. The ℓ’s of this “new” generation (children) were compared to the ℓ’s of the “old” generation (parents). Whenever a child outperformed its parent, the child was retained; otherwise, the parent was carried over. The selected parameter sets were then used to create the next generation, and the cycle continued. After N iterations of this combination and evaluation procedure, the parameter set with the lowest ℓ was selected as the result of the estimation procedure.
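The logic of this estimation can be sketched with SciPy’s `differential_evolution`, which implements the DE heuristic of Storn and Price. To keep the example self-contained, it swaps in the one-dimensional, unbounded OU model, whose transition density is Gaussian in closed form. It does not use the GradientDiffusion package and ignores the input as well as the two-dimensional AIM and BOU, whose transition densities are generally not available in closed form. The measurement pairs, the dependence on the elapsed time between measurements, the summed ℓ, and the repeated runs keeping the lowest ℓ mirror the procedure described above; the parameter bounds are assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution


def ou_min_log_likelihood(params, x_prev, x_next, dt):
    """Summed min-log-likelihood ℓ of measurement pairs under a 1D OU model.

    Each pair (x_prev, x_next) is scored with the exact Gaussian transition
    density, conditional on the previous state and the elapsed time dt.
    """
    mu, theta, sigma = params
    decay = np.exp(-theta * dt)
    mean = mu + (x_prev - mu) * decay
    var = sigma ** 2 / (2.0 * theta) * (1.0 - decay ** 2)
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (x_next - mean) ** 2 / var)


def fit_ou(x, dt, n_runs=5, seed=0):
    """Run DE n_runs times and keep the parameter set with the lowest ℓ."""
    x_prev, x_next = x[:-1], x[1:]                   # pairs (t - 1, t)
    bounds = [(0.0, 1.0), (0.01, 5.0), (0.01, 2.0)]  # assumed ranges: mu, theta, sigma
    best = None
    for run in range(n_runs):
        result = differential_evolution(ou_min_log_likelihood, bounds,
                                        args=(x_prev, x_next, dt), seed=seed + run)
        if best is None or result.fun < best.fun:
            best = result
    return best.x, best.fun


# Usage on simulated data with irregular (self-paced) gaps between measurements
rng = np.random.default_rng(42)
times = np.cumsum(rng.uniform(0.5, 1.5, 400))
gaps = np.diff(times)
x = np.empty(400)
x[0] = 0.4
for t in range(1, 400):  # exact OU transitions with mu = 0.4, theta = 0.5, sigma = 0.2
    d = np.exp(-0.5 * gaps[t - 1])
    x[t] = 0.4 + (x[t - 1] - 0.4) * d + np.sqrt(0.04 * (1 - d ** 2)) * rng.normal()
estimates, ell = fit_ou(x, gaps, n_runs=2)
```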

As this procedure is a heuristic, it is not guaranteed to find the global minimum, which may lead to suboptimal parameter estimates (Myung, 2003). The procedure was therefore run five times, each with 2,500 iterations and 100 starting values. To make sure that the parameters of the contextualized models converged, we ran the parameter estimation an additional five times with 5,000 iterations and 100 starting values whenever the ℓ of a model failed a basic sanity check. This check compared the ℓ of a more complex model to the ℓ of a simpler, nested version of it. When two models are nested, the ℓ of the simpler model should be greater than or equal to the ℓ of the more complex model, as a more complex model can always fit the data at least as well as its restricted version. We therefore checked whether the ℓ of each more complex model was indeed no higher than that of its simpler counterpart. For input functions with two parameters, we compared their ℓ to the ℓ of the Empty input function. For input functions with four parameters, we compared their ℓ to that of the O₂ input function, which served as the nested reference for these more complex input functions.
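The sanity check amounts to comparing ℓ values within one person’s set of fits, as in the sketch below. The reference mapping follows the text (two-parameter input functions against Empty, four-parameter ones against O₂); the ASCII codes, the function name, the tolerance, and the ℓ values in the usage lines are our own, hypothetical choices.

```python
# Reference (simpler) input function for each more complex one, as described in the text
SIMPLER_VERSION = {
    "O2": "Empty", "OL": "Empty", "T2": "Empty",
    "O4": "O2", "O2T2": "O2", "OLT2": "O2",
}


def needs_refit(ell, tol=1e-8):
    """Return input functions whose ℓ exceeds that of their simpler reference.

    ell maps input-function codes to the min-log-likelihood of one person's
    fit. A richer model should never have a larger ℓ than its nested special
    case; if it does, the optimizer probably missed the minimum.
    """
    return sorted(code for code, simpler in SIMPLER_VERSION.items()
                  if code in ell and simpler in ell
                  and ell[code] > ell[simpler] + tol)


# Hypothetical ℓ values for one participant
ell = {"Empty": 100.0, "O2": 95.0, "OL": 101.0, "T2": 99.0,
       "O4": 94.0, "O2T2": 96.0, "OLT2": 90.0}
print(needs_refit(ell))  # ['O2T2', 'OL']: both fit worse than their reference
```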

In the Supplementary Materials, we report a recovery study, conducted on a subset of the models under investigation, in which we assessed whether the models’ parameters could be adequately recovered with this procedure. In short, we found evidence that this is indeed the case.

Relative Model Fit

All models were compared to each other with regard to their relative fit to the data. This comparison may reveal whether the added complexity of the nonlinear AIM is necessary to fit the data. If this is not the case — that is, if the linear BOU outperforms the AIM — then we conclude that the observed nonlinearity can be attributed to the stimuli of the experiment. However, if the AIM were to outperform the BOU, then this nonlinearity is not completely dependent on the experimental stimuli, and it may thus possibly reside within the affective system itself. In essence, this analysis thus shows whether nonlinearity must be accounted for when fitting models to the data.

We compared the relative fit of the models in two ways, namely by using the AIC as a general measure of model fit and by comparing the predictive performance of the models using a cross-validation procedure.

General Fit

As a measure of fit of a model to the data, we used Akaike’s Information Criterion (AIC; Akaike, 1974). This measure balances the fit of a model to the data and the complexity of the model by penalizing fit according to the number of parameters of the model. As such, the AIC gives an indication of goodness-of-fit while compensating for overfit.
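With ℓ denoting the min-log-likelihood, Akaike’s criterion is AIC = 2k + 2ℓ for a model with k free parameters, and the model with the lowest AIC is preferred. The sketch below encodes this formula. As a worked example, adding two parameters raises the penalty by 4, so a richer model must lower ℓ by more than 2 to obtain a lower AIC. The parameter counts and ℓ values in the usage lines are hypothetical.

```python
def aic(min_log_likelihood, n_params):
    """Akaike's Information Criterion: AIC = 2k - 2 ln L = 2k + 2ℓ."""
    return 2 * n_params + 2 * min_log_likelihood


# Hypothetical: a 3-parameter fit versus two 5-parameter fits
print(aic(100.0, 3))  # 206.0
print(aic(98.5, 5))   # 207.0: ℓ improved by 1.5 < 2, so AIC prefers the simpler model
print(aic(97.5, 5))   # 205.0: ℓ improved by 2.5 > 2, so AIC prefers the richer model
```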

To ensure that we can adequately recover the relative performance of the models by comparing the AIC, we performed a distinguishability study using the AIC as a measure of fit (see Supplementary Materials). The results of this study suggest that we can adequately distinguish between very similar models. However, for practical reasons, we could only assess the distinguishability of a subset of all models, so the results of the model comparison should be interpreted with this limitation in mind.

Predictive Performance

Predictive performance was assessed using a blocked 15-fold cross-validation procedure (see Arlot & Celisse, 2010; Hastie et al., 2009). First, we transformed an individual’s time series into pairs containing the affective states at times t − 1 and t (see also the “Parameter Estimation” section), ensuring relative independence between measurement pairs. Then, this transformed affective time series was repeatedly split into two parts: a training sample and a test sample. The assignment of data to samples was based on temporal proximity, so that data that lie close together in time are kept together (see Fig. 6), ensuring relative independence between training and test samples (Roberts et al., 2017). The parameters of the models were estimated on the training sample and subsequently used to predict the data of the test sample. As the data of the test sample are known, we could assess how accurately the models predicted these data. We used the ℓ of the test data given the conditional probability distributions as a measure of predictive accuracy.² This procedure was repeated until every data point had served once in the test sample. The measure of predictive accuracy was averaged across iterations to obtain an overall assessment of predictive performance.
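A blocked split can be sketched as follows. The indices of the measurement pairs are cut into k contiguous blocks, and each block serves once as the test sample while the remaining pairs form the training sample. The sketch leaves no buffer between training and test blocks, which is an assumption. `fit` and `score` are placeholders for the model-specific estimation and the test-set ℓ.

```python
import numpy as np


def blocked_kfold_indices(n_pairs, k=15):
    """Split pair indices into k contiguous blocks; each block is a test set once."""
    all_idx = np.arange(n_pairs)
    for test in np.array_split(all_idx, k):
        train = np.setdiff1d(all_idx, test)  # everything outside the test block
        yield train, test


def cross_validate(pairs, fit, score, k=15):
    """Average the test-set ℓ over the k folds.

    fit(train_pairs) -> params and score(params, test_pairs) -> ℓ are
    placeholders for the model-specific estimation and evaluation steps.
    """
    losses = [score(fit(pairs[train]), pairs[test])
              for train, test in blocked_kfold_indices(len(pairs), k)]
    return float(np.mean(losses))
```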

Fig. 6 — Visualization of the selection of training and test data within the blocked k-fold cross-validation procedure

Absolute Model Fit

While AIC and predictive accuracy indicate how well a model fits the data relative to other models, a well-fitting model may not necessarily be able to reproduce the characteristics of the data (Palminteri et al., 2017). This is important, as a computational model should be able to reproduce the data if it is to answer the question posed above, namely what the affective system looks like. Relative performance says nothing about whether a model is able to produce data that resemble the observed data, and thus about whether the model resembles the data-generating mechanism.

To assess the models’ ability to reproduce the data characteristics, we used a parametric bootstrap procedure.
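In general terms, a parametric bootstrap simulates many data sets from a fitted model and checks where a characteristic of the observed data falls within the distribution of that characteristic across the simulated data sets. The sketch below shows only this generic logic. The number of bootstrap samples, the one-sided proportion, and the function names are our assumptions rather than details of the procedure used here.

```python
import numpy as np


def parametric_bootstrap(statistic, simulate, observed, n_boot=500, seed=0):
    """Compare an observed data characteristic with its distribution under a fitted model.

    simulate(rng) -> one synthetic data set generated from the fitted model
    statistic(data) -> the characteristic of interest (e.g., skewness)
    Returns the bootstrap values and the proportion at least as large as
    the observed value.
    """
    rng = np.random.default_rng(seed)
    boot = np.array([statistic(simulate(rng)) for _ in range(n_boot)])
    return boot, float(np.mean(boot >= statistic(observed)))


def simulate_fitted(rng):
    """Stand-in for simulating one data set from a fitted model."""
    return rng.normal(0.0, 1.0, size=200)


boot, p_upper = parametric_bootstrap(np.mean, simulate_fitted, observed=np.full(200, 5.0))
print(p_upper)  # 0.0: no simulated data set has a mean as large as 5
```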