Abstract
We examined the role of working memory (WM) in dynamic decision-making by having participants perform decision-making tasks under single-task or dual-task conditions. In two experiments participants performed dynamic decision-making tasks where they chose one of two options on each trial. The Decreasing option always gave a larger immediate reward, but caused future rewards for both options to decrease. The Increasing option always gave a smaller immediate reward, but caused future rewards for both options to increase. In each experiment we manipulated the reward structure such that the Decreasing option was the optimal choice in one condition and the Increasing option was the optimal choice in the other condition. Behavioral results indicated that Dual-task participants selected the immediately rewarding Decreasing option more often, while Single-task participants selected the Increasing option more often, regardless of which option was optimal. Thus, Dual-task participants performed worse on one type of task but better on the other. Modeling results showed that Single-task participants’ data were most often best fit by a Win-Stay-Lose-Shift (WSLS) rule-based model that tracked differences in reward across trials, while Dual-task participants’ data were most often best fit by a Softmax reinforcement learning model that tracked recency-weighted average rewards for each option. This suggests that manipulating WM load affects the degree to which participants focus on the immediate versus delayed consequences of their actions and whether they employ a rule-based WSLS strategy, but it does not necessarily affect how well people weigh the immediate versus delayed benefits when determining the long-term utility of each option.
Working-Memory Load and Temporal Myopia in Dynamic Decision-Making

Decision-making is a recurring task in our everyday lives. Our decisions have both immediate and delayed consequences, and understanding the mechanisms that affect decision-making is of considerable importance. One thing required of the decision-maker is adequate cognitive resources to balance the strengths and weaknesses of each choice option. Many common decision-making paradigms used in the laboratory involve participants repeatedly choosing from more than one option and receiving rewards or punishments, often in the form of points or money gained or lost (e.g. Bechara, Damasio, Damasio, & Anderson, 1994; Worthy, Maddox, & Markman, 2007; Yechiam, Busemeyer, Stout, & Bechara, 2005; Otto, Markman, Gureckis, & Love, 2010; Rakow & Newell, 2010). In these paradigms, the decision-maker’s goal is typically to maximize gains and/or minimize losses. Each option must be assigned some expected reward value, and the decision-maker must consider not only the immediate benefit that results from selecting each option, but also each option’s delayed benefits when determining the long-term utility of each option.1
Some research suggests that development and maintenance of expected reward values for the various choice options requires WM resources (Dretsch & Tipples, 2007; Bechara & Martin, 2004; Hinson, Jameson, & Whitney, 2002; Curtis & Lee, 2010). Under this view, reduced WM resources might hinder one’s ability to develop and maintain expected reward values which are needed to make the best decisions. One decision making process that could be dependent on WM resources is simply remembering the rewards given when each option was last chosen. The decision-maker can gain valuable insight into the values of each option by simply considering their immediate payoffs.
However, another possibility is that people can implicitly learn which options lead to the largest immediate rewards without explicitly remembering the rewards associated with each option. There is much evidence that suggests that a prediction error, which is the difference between the reward received and the expected reward value for a given option, is tracked by the ventral striatum, a sub-cortical region implicated in implicit, procedural learning (Pagnoni, Zink, Montague, & Berns, 2000; Pessiglione, Seymour, Flandin, Dolan, & Frith, 2006; Hare, O’Doherty, Camerer, Schultz, & Rangel, 2008). In many popular reinforcement learning models these prediction errors are used to update the expected value for the option that was chosen on each trial (e.g. Sutton & Barto, 1998; Yechiam et al., 2005; Worthy et al., 2007). These expected values are essentially running averages of the immediate rewards associated with each option, with more weight given to the rewards received most recently. The ability of sub-cortical regions to track these expected reward values suggests that people have the ability to implicitly learn which option provides the highest immediate payoff on each trial, and that this ability may remain intact under WM load (e.g. Frank & Claus, 2006; Frank, O’Reilly, & Curran, 2006).
While the work cited above suggests that expected reward values based on the immediate rewards received are tracked by sub-cortical regions that are also associated with implicit, or procedural, learning, it is nevertheless very likely that expected reward values can be explicitly verbalized (e.g. “Option A seems to give larger immediate rewards than option B”). Thus, it is difficult to distinguish whether the dual working-memory task leads participants to make decisions based on a more implicit processing mode, or whether the dual task leads participants to use a simpler strategy that is overseen by a weakened explicit processing system.
While immediate payoffs are important to consider, consideration must also be given to each option’s delayed payoffs. Often the consequences of certain decisions are not realized immediately, but become manifest at some point in the future. This is similar to many real-world decisions like deciding whether to attend graduate school or join the work force immediately after graduating from college. Joining the work force may lead to more immediate benefits in the form of higher income than would be earned while in graduate school, but a graduate degree could very likely mean higher cumulative income over the course of one’s life. Thus a second consideration that must be kept in mind when making decisions is how current choices will influence future outcomes or possibilities. People must not be temporally ‘myopic’ in that they fail to see the delayed consequences of each option as well as the immediate consequences.
A final component of the decision-making process is to efficiently juxtapose the immediate and delayed benefits of each option in order to make the best decision. While it is very often the case that making decisions based on how those decisions will be rewarded in the future is the best strategy, there are also situations where the immediate benefits outweigh any potential delayed benefits. For example, someone who is nearing retirement age may have little to gain by quitting their job to attend graduate school; similarly, someone may be faced with such a great immediate job prospect that going to graduate or law school may be counterproductive. In some situations the immediate benefits of one option are so valuable that they outweigh any potential delayed benefits of other options (Otto, Markman, & Love, 2012; Worthy, Gorlick, Pacheco, Schnyer, & Maddox, 2011).
Our goal in the present work is to investigate the role of WM load in these three common components of decision-making:
Evaluation of the immediate benefits of each option.
Evaluation of the delayed benefits of each option.
Juxtaposition of the immediate and delayed benefits to maximize cumulative reward.
One hypothesis is that the amount of necessary WM resources will increase from components 1-3, with the evaluation of the immediate benefits requiring the fewest WM resources, and the juxtaposition of the immediate and delayed benefits requiring the most WM resources. To address this issue we use a dynamic decision-making paradigm that is ideal for examining the role of WM load in these three component processes of decision-making.
Figure 1 shows the reward structures we use in Experiment 1. We use similar reward structures in Experiment 2, but we vary some aspects of the task to broaden the scope of our findings. In particular, we use different surface features in Experiment 2 that are more similar to “Gambling” paradigms that have been extensively used to examine experience-based decision-making (e.g. Bechara et al., 1994; Yechiam et al., 2005; Worthy et al., 2007; Worthy, Maddox, & Markman, 2008). In these tasks there are two options that participants choose between, receiving a reward on each trial. One option, the Increasing option, gives a smaller reward on each trial, but selecting this option causes rewards for both options to increase on future trials. This can be seen on the x-axes of Figures 1a and 1b, where rewards are a direct function of the number of times the Increasing option has been selected over the previous ten trials. The other option, the Decreasing option, gives a larger reward on any given trial, but selecting this option causes future rewards for both options to decrease. Thus, selecting the Decreasing option always leads to a larger immediate reward, and selecting the Increasing option always leads to larger rewards for both options on future trials. Tendencies to select either of these options can indicate a preference or bias toward immediate versus delayed payoffs, and we can thus address how WM load influences these preferences.
Figure 1.
Rewards given for each task as a function of the number of Increasing option selections over the previous ten trials. (a.) Rewards given for the Increasing-optimal task. Selecting the Increasing option 10 consecutive times will lead to a reward of 80 points on each trial, whereas selecting the Decreasing option 10 consecutive times will lead to a reward of 40 points on each trial. (b.) Rewards given for the Decreasing-optimal task. Selecting the Decreasing option 10 consecutive times will lead to a reward of 65 points on each trial, whereas selecting the Increasing option 10 consecutive times will lead to a reward of 55 points on each trial.
To address how WM load affects the third component of decision-making listed above, the juxtaposition of the immediate and delayed benefits of each option, we alter which option is optimal between the reward structures in Figures 1a and 1b. Figure 1a shows the reward structure for Experiment 1’s Increasing-optimal task. Here the Increasing option is the optimal choice because repeatedly selecting it will lead to a larger reward on each trial (80 points) than the amount earned from repeatedly selecting the Decreasing option (40 points). In contrast, in the Decreasing-optimal task shown in Figure 1b, the Decreasing option gives a much larger reward on each trial than the Increasing option (60 points more). Here the amount earned from repeatedly selecting the Increasing option (55 points) is smaller than the amount earned from repeatedly selecting the Decreasing option (65 points), so selecting the Increasing option is futile and counterproductive even though it leads to more rewards for both options on future trials. To perform well on both of these tasks participants must not simply be biased toward the immediate or delayed benefits of the two options, but must instead effectively juxtapose the benefits of both options.
We examine the role of WM load by having participants make decisions with or without performing a concurrent numerical Stroop task that is designed to deplete available WM resources. This task has been successfully used in previous published work to examine the role of WM load during category-learning (Zeithamova & Maddox, 2006; Waldron & Ashby, 2001; Newell, Dunn, & Kalish, 2010; Miles & Minda, 2011). This research has found that depleting WM resources through the use of the numerical Stroop task results in impaired performance on explicit, rule-based tasks where a verbalizable rule can distinguish members of each category, but it does not impair performance on procedural, information-integration tasks where verbalizable rule use is difficult (but see Newell et al., 2010 and Nosofsky & Kruschke, 2002 for a counterargument, and Ashby & Ell’s (2002) reply to Nosofsky & Kruschke (2002)). The numerical Stroop task requires participants to maintain information about two different properties of the numbers presented in working memory while making each decision (the numerical value of the number as well as the font size used to display the number).
Predictions
We offer two possible hypotheses for how performing under dual-task versus single-task conditions will affect dynamic decision-making. One possibility is that participants performing under dual-task conditions will simply underperform on both Increasing-optimal and Decreasing-optimal tasks. Previous research has demonstrated poor choice performance under WM load for participants performing the Iowa Gambling task (Hinson et al., 2002; Dretsch & Tipples, 2007), and reduced WM resources from the dual task may simply attenuate participants’ ability to appropriately weigh the immediate and delayed benefits of each option. Additionally, some previous research suggests that WM load may simply lead to more random responding (Franco-Watkins, Pashler, & Rickard, 2006). However, this conclusion has been challenged by other findings that suggest that participants under WM load are not simply behaving randomly (e.g. Dretsch & Tipples, 2008).
A second possibility, which we view as more likely, is that the dual task will cause a type of temporal myopia where participants under WM load are unable to observe the long-term effects of selecting each option. This may result from the WM load causing Dual-task participants to learn the expected values for each option more implicitly, based on the immediate feedback received after each selection and the resultant prediction error. Some previous work suggests that WM load may actually increase implicit forms of learning due to a reduction in interference from the explicit system when the task requires an implicit strategy (e.g. Filoteo, Lauritzen, & Maddox, 2010; Maddox, Ashby, Ing, & Pickering, 2004). Single-task participants may rely less on immediate feedback and may use a heuristic-based strategy that focuses on comparing the rewards received across trials. Such a comparison would allow Single-task participants to observe the delayed effects of choosing each option. One possible strategy that Single-task participants may employ is a win-stay lose-shift (WSLS) strategy (Nowak & Sigmund, 1993; Steyvers, Lee, & Wagenmakers, 2009; Otto, Taylor, & Markman, 2011). This is a rule-based strategy that has been shown to be used in binary prediction tasks (e.g. Otto et al., 2011). Under this strategy, participants “stay” by picking the same option on the next trial if they were rewarded, and “shift” by picking the other option on the next trial if they were not rewarded.
Here, we examine whether participants use a more elaborate version of this strategy in the present tasks by examining whether participants “stay” by picking the same option on the next trial if the reward was equal to or larger than the reward received on the previous trial (a “win” trial), or “shift” by selecting the other option on the next trial if the reward received on the current trial was smaller than the reward received on the previous trial (a “lose” trial). After examining the behavioral data to determine whether there are differences in the proportion of times Single- and Dual-task participants select each option and how well they perform on the tasks, we then fit the data to two learning models: a Softmax reinforcement learning model that assumes that expected reward values are updated each time an option is chosen based on the rewards received immediately after each selection, and a WSLS model that assumes that participants adjust their behavior based on a comparison between the current reward and the reward received on the previous trial. These models both have long histories in the decision-making literature. The former model may mimic a more implicit or procedural decision-making mode, while the latter model likely mimics a more explicit, or rule-based, decision-making mode. A WSLS strategy also requires participants to remember the reward that was received on the previous trial, and the dual task will make doing so more difficult. Accordingly, we predict that behavior from Single-task participants will be better described by the WSLS model, while the behavior of Dual-task participants will be better described by the Softmax reinforcement learning model. The use of these different strategies should lead Single-task participants to select the Increasing option more often than Dual-task participants across all conditions.
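The win/lose classification used here can be sketched concretely. The following is a minimal illustration (the function name is our own); a reward at least as large as the previous trial's reward counts as a "win," and a smaller reward counts as a "lose":

```python
def classify_trials(rewards):
    """Label each trial (from the second onward) as a 'win' or 'lose' trial:
    a 'win' is a reward equal to or larger than the previous trial's reward."""
    return ["win" if curr >= prev else "lose"
            for prev, curr in zip(rewards, rewards[1:])]

# Example: rewards of [50, 55, 55, 40] yield ['win', 'win', 'lose']
```

Note that the first trial cannot be classified, since there is no preceding reward to compare against.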
In the next section we present the details of the WSLS and Softmax models and then present simulations of the tasks shown in Figure 1 for each model. These simulations provide predictions for how participants will perform if they are using WSLS or Softmax-based strategies.
Predictions from the WSLS and Softmax models
The WSLS and Softmax models have been used in many previous studies to characterize decision-making behavior, and the assumptions and mechanisms of each are very transparent (Otto et al., 2011; Worthy & Maddox, 2012; Worthy et al., 2007; Steyvers et al., 2009; Frank & Kong, 2008; Sutton & Barto, 1998; Lee, Zhang, Munro, & Steyvers, 2011).
The Softmax model assumes that participants develop Expected Values (EVs) for each option that represent the rewards they expect to receive upon selecting each option. EVs for all options are initialized at 0 at the beginning of the task, and updated only for the chosen option, i, according to the following updating rule:
$$EV_i(t+1) = EV_i(t) + \alpha\,[r(t) - EV_i(t)] \tag{1}$$
Learning is modulated by a learning rate, or recency, parameter (α), 0 ≤ α ≤ 1, that weights the degree to which the model updates the EVs for each option based on the prediction error between the reward received (r(t)) and the current EV on trial t. As α approaches 1, greater weight is given to the most recent rewards in updating EVs, indicative of more active updating of EVs on each trial, and as α approaches 0, rewards are given less weight in updating EVs. When α = 0 no learning takes place, and EVs are not updated from their initial starting points throughout the experiment.
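As a concrete sketch, the delta-rule update in Equation 1 can be written as follows (the function name is illustrative):

```python
def update_ev(ev, reward, alpha):
    """Equation 1: EV(t+1) = EV(t) + alpha * (r(t) - EV(t)),
    applied only to the option chosen on trial t."""
    return ev + alpha * (reward - ev)

# With alpha = .5, an initial EV of 0 and a reward of 80 yield an EV of 40;
# with alpha = 0 the EV never changes; with alpha = 1 the EV equals the
# most recent reward.
```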
The EVs for all j options are used to determine the model’s probability for selecting each option. Action selection probabilities for each option (a) are computed via a Softmax decision rule (Sutton & Barto, 1998):
$$P[a_i(t)] = \frac{e^{\gamma \cdot EV_i(t)}}{\sum_{j} e^{\gamma \cdot EV_j(t)}} \tag{2}$$
Here γ is an exploitation parameter that determines the degree to which the option with the highest EV is chosen. As γ approaches infinity the highest-valued option is chosen more often, and as γ approaches 0 all options are chosen equally often. This model has been used in a number of previous studies to characterize choice behavior (e.g. Daw et al., 2006; Otto, Markman, Gureckis, & Love, 2010; Worthy, Maddox, & Markman, 2007; Yechiam & Busemeyer, 2005).
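A minimal sketch of the Softmax decision rule in Equation 2 (the function name is illustrative; subtracting the maximum EV before exponentiating is a standard numerical-stability step that leaves the probabilities unchanged):

```python
import math

def softmax_probs(evs, gamma):
    """Equation 2: P(a_i) = exp(gamma * EV_i) / sum_j exp(gamma * EV_j)."""
    m = max(evs)  # subtract max for numerical stability
    weights = [math.exp(gamma * (v - m)) for v in evs]
    total = sum(weights)
    return [w / total for w in weights]

# With gamma = 0 all options are equally likely; as gamma grows, the
# highest-valued option dominates.
```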
The WSLS model has two free parameters: the probability of staying with the same option on the next trial if the reward received on the current trial is equal to or greater than the reward received on the previous trial, P(stay|win), and the probability of shifting to the other option on the next trial if the reward received on the current trial is less than the reward received on the previous trial, P(shift|loss).
To predict how participants would perform if they were relying on a WSLS versus a Softmax-based strategy we conducted 1000 simulations for each model for each of the two tasks shown in Figure 1. We selected reasonable parameter values to use in the simulations based on parameter values for each model that provided a good fit to data from published work in our labs (Worthy et al., 2011). The parameters used for the WSLS model were .90 for P(stay|win), and .5 for P(shift|loss), and the parameters used for the Softmax model were .50 for α and .03 for γ.
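A simulation of this kind can be sketched as below. The exact reward functions are not reported beyond their endpoints, so the linear payoffs used here are an assumption chosen only to match the Increasing-optimal endpoints in Figure 1a (40 points for all-Decreasing, 80 points for all-Increasing); the function names are our own.

```python
import math
import random
from collections import deque

def reward(choice, h):
    """Illustrative linear payoffs: h is the number of Increasing selections
    over the previous ten trials (0-10); choice 1 = Increasing, 0 = Decreasing.
    Only the endpoints (40 and 80) come from the text; the slopes and the
    Decreasing option's constant 10-point immediate advantage are assumptions."""
    return 30 + 5 * h if choice == 1 else 40 + 5 * h

def simulate_wsls(n_trials=250, p_stay_win=0.90, p_shift_loss=0.50, seed=0):
    """WSLS simulation using the modified win/lose definition in the text."""
    rng = random.Random(seed)
    window = deque([1] * 5 + [0] * 5, maxlen=10)  # start at mid-point h = 5
    choice, prev_r, n_inc = rng.randrange(2), None, 0
    for _ in range(n_trials):
        r = reward(choice, sum(window))
        window.append(choice)
        n_inc += choice
        if prev_r is not None:
            if r >= prev_r:                        # "win" trial
                stay = rng.random() < p_stay_win
            else:                                  # "lose" trial
                stay = rng.random() >= p_shift_loss
            choice = choice if stay else 1 - choice
        prev_r = r
    return n_inc / n_trials                        # proportion Increasing

def simulate_softmax(n_trials=250, alpha=0.50, gamma=0.03, seed=0):
    """Softmax simulation with delta-rule EV updating (Equations 1 and 2)."""
    rng = random.Random(seed)
    window = deque([1] * 5 + [0] * 5, maxlen=10)
    evs, n_inc = [0.0, 0.0], 0
    for _ in range(n_trials):
        m = max(evs)
        w = [math.exp(gamma * (v - m)) for v in evs]
        choice = 1 if rng.random() < w[1] / (w[0] + w[1]) else 0
        r = reward(choice, sum(window))
        evs[choice] += alpha * (r - evs[choice])   # Equation 1 update
        window.append(choice)
        n_inc += choice
    return n_inc / n_trials
```

Averaging each function over many runs (and repeating with a Decreasing-optimal payoff structure) reproduces the kind of model predictions plotted in Figure 2.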
Figure 2 shows the average proportion of trials on which each model selected the Increasing option in each task. The WSLS model selected the Increasing option more than the Softmax model in both tasks, and there was no difference between the two tasks. This is because the model’s decisions are based only on whether the current reward is greater or less than the reward given on the previous trial, regardless of the magnitude of the difference. The Softmax model, which is sensitive to the magnitude of the difference in average rewards given for each option, selected the Increasing option less often than the WSLS model in both tasks, but much less often in the Decreasing-optimal task, where the Decreasing option gives a substantially higher reward than the Increasing option on each trial. Based on these simulations we predict that Single-task participants, who are expected to rely on a WSLS strategy more than Dual-task participants, will select the Increasing option more often than Dual-task participants in both tasks. This will lead to better performance in the Increasing-optimal task, but worse performance than Dual-task participants in the Decreasing-optimal task.
Figure 2.
Average proportion of Increasing options selected from each model’s simulations.
Experiment 1
Method
Participants
Ninety-eight undergraduate students at Texas A&M University participated in the experiment for course credit. Participants were randomly assigned to one of four between-subjects conditions that consisted of the factorial combination of two WM load conditions (Single-task vs. Dual-task) and two reward-structure conditions (Increasing-optimal vs. Decreasing-optimal). There were 23 participants in one of the Dual-task conditions and 25 participants in each of the other three conditions.
Materials and Procedure
Participants performed the experiment on PCs using Psychtoolbox for Matlab (version 2.5).
The procedure for the Single-task conditions was nearly identical to the procedure used in a previous study in our labs (Worthy et al., 2011). Figure 3 shows a sample screen-shot from the experiment. Participants were given a cover story that they would be testing two extraction systems that farmed oxygen on Mars, and their goal was to extract as much oxygen as possible. A similar paradigm has been used elsewhere to examine other issues in dynamic decision-making (Gureckis & Love, 2009a; 2009b; Otto, Gureckis, Markman, & Love, 2009; Otto et al., in press). On each trial participants were told to “collect oxygen using one of the two systems” which appeared at the top of the screen. They were allowed as much time as they wished to make a response. After a delay of 500ms the amount of oxygen they received for that trial was indicated in the narrow tank labeled “Current”, and after another 1000ms the oxygen was emptied into the “Cumulative” tank. To roughly equate the experiment time between WM load conditions, there was a 1000ms ITI in the Single-task conditions during which a black screen came up and participants saw the phrase, “Please wait for the next trial…” The next trial would then begin.
Participants performed a total of 250 trials of the task. The rewards they received were based on the reward structures shown in Figure 1. Participants in the Increasing-optimal condition received the rewards plotted in Figure 1a and participants in the Decreasing-optimal condition received the rewards plotted in Figure 1b. The position on the x-axis was determined on each trial by summing the number of times participants had selected the Increasing option over the previous ten trials. Thus, there was a “moving window” that kept a count of the number of times the Increasing option had been selected. All participants began the experiment at the mid-point (5) on the x-axis.
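The moving-window mechanic described above can be sketched as follows (a minimal illustration with hypothetical names; the initialization at the mid-point follows the text):

```python
from collections import deque

def make_history_tracker(window_size=10, start=5):
    """Moving window over the last `window_size` choices; its sum gives the
    position on the x-axis of Figure 1. Initialized at the mid-point (5)."""
    window = deque([1] * start + [0] * (window_size - start),
                   maxlen=window_size)
    def record(chose_increasing):
        window.append(1 if chose_increasing else 0)
        return sum(window)   # current x-axis position (0-10)
    return record
```

Because the deque has a fixed maximum length, each new choice automatically evicts the choice made eleven trials earlier.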
A line on the larger tank corresponded to the amount of oxygen needed to sustain life on Mars. Participants were given the goal of trying to collect this amount of oxygen over the course of the experiment. The goal lines were set at the equivalent of 18,000 points for the Increasing-optimal task and 16,000 points for the Decreasing-optimal task. This corresponded to selecting the optimal choice in each task on roughly 80% of trials. Participants were told nothing about the rewards available for each option or the choice history-dependent structure of the rewards.
The procedure for participants in the Dual-task conditions was modified from the procedure for the Single-task conditions to include a numerical Stroop task used in previous work to investigate the role of WM load in perceptual category-learning (Zeithamova & Maddox, 2006; Waldron & Ashby, 2001). Participants had to perform a numerical analogue of the Stroop task (Stroop, 1935) concurrently with the decision-making task. The concurrent task required participants to remember which of two numbers was physically larger and which was larger in numerical value, and to hold that information in mind while deciding which oxygen system to choose on each trial. On each trial the picture of Mars, the two extraction systems, and the Current and Cumulative point meters were presented in the center of the screen. At the beginning of the trial the two numbers for the concurrent task were presented on each side of the screen, one number on each side, for 200ms. After 200ms a uniform white mask covered the numbers for 200ms, and participants were then allowed to choose from one of the two oxygen extraction systems; they were given feedback in an identical manner to that described above. A new screen then appeared that queried participants with either “VALUE” or “SIZE,” and participants selected from buttons labeled “Left” or “Right” to indicate which side had shown the number that was larger in numerical value or physical size, respectively. After they made their response they were told whether or not they were correct, and the next trial began immediately.
Following previous studies that have used the same concurrent task manipulation, participants were told that they were required to achieve at least 80% accuracy on the numerical Stroop task, and to focus first on achieving good performance on the numerical Stroop task and to “use what you have left over” for the decision-making task (e.g. Zeithamova & Maddox, 2006; Waldron & Ashby, 2001). Participants’ current accuracy on the numerical Stroop task was indicated at the top of the screen when they received feedback regarding their performance on the concurrent task on each trial. Their percentage correct score was listed in green if it was above 80% and red if it was below 80%. All of the participants in the study achieved at least 80% accuracy on the numerical Stroop task.2
Results
Behavioral Analyses
The effects of WM load on behavior can be seen by simply plotting the proportion of times participants selected the Increasing option over the course of the experiment (Figure 4a). A 2 (WM load) X 2 (Reward Structure) ANOVA revealed a significant effect of WM load, F(1,94)=17.46, p<.001, η2=.16. Participants in the Single-task condition (M=.49, SD=.21) selected the Increasing option much more often than participants in the Dual-task condition (M=.30, SD=.24). There was also a significant effect of reward structure, F(1,94)=7.73, p<.01, η2=.08. Participants who performed the Increasing-optimal task (M=.45, SD=.24) selected the Increasing option more often than participants who performed the Decreasing-optimal task (M=.33, SD=.23). The WM load X reward-structure interaction was not significant (F<1).
Figure 4.
(a.) Proportion of Increasing option selections for participants in each condition of Experiment 1. (b.) Proportion of the optimal cumulative payoff participants earned in the task. These proportions are roughly equivalent to the proportion of times participants selected the optimal option based on the reward structure.
To compare performance across tasks we derived a measure of the proportion of the optimal cumulative payoff that is commensurable across task structures. This measure was derived by computing: (Points earned – Minimum Possible Points)/Range of Possible Points Earned. Because points received are a function of the proportion of times a participant selects each option, the proportion of the optimal cumulative payoff value is an indirect measure of the proportion of times participants made the optimal choice for each reward structure. We used this measure, rather than points earned on the task, because it equated performance across the two tasks, which differed in their reward structures. Figure 4b plots the proportions of the optimal cumulative payoff for participants in each condition. A 2 (WM load) X 2 (Reward Structure) ANOVA revealed a significant effect of reward structure, F(1,94)=25.11, p<.001, η2=.21. Participants in the Decreasing-optimal reward structure conditions (M=.67, SD=.23) earned higher proportions of the optimal cumulative payoff than participants in the Increasing-optimal conditions (M=.45, SD=.24). There was also a significant WM load X Reward Structure interaction, F(1,94)=18.15, p<.001, η2=.16. We conducted pair-wise comparisons within each reward structure to examine the locus of this interaction. For the Increasing-optimal reward structure Single-task participants (M=.55, SD=.22) significantly outperformed Dual-task participants (M=.36, SD=.22), F(1,48)=9.60, p<.01, η2=.17. However, for the Decreasing-optimal reward structure Dual-task participants (M=.77, SD=.23) significantly outperformed Single-task participants (M=.58, SD=.20), F(1,46)=8.59, p<.01, η2=.16.
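The normalization above is a simple min-max rescaling; a minimal sketch (function name illustrative):

```python
def proportion_optimal(points, min_points, max_points):
    """(Points earned - minimum possible points) / range of possible points,
    making performance commensurable across the two reward structures."""
    return (points - min_points) / (max_points - min_points)

# A participant earning exactly the midpoint of the possible range scores .5;
# earning the maximum possible scores 1.0.
```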
Figure 3.
Sample screen shot from the experiment. Participants were given a cover story where they were asked to test two oxygen-extraction systems on the Martian landscape. The oxygen extracted on each trial was shown in the “Current” tank and then transferred to the “Cumulative” tank.
We also examined the mean response times for participants in each condition. These are shown in Figure 5. A 2 (WM load) X 2 (Reward Structure) ANOVA revealed a significant effect of WM load, F(1,94)=25.11, p<.001, η2=.21. Dual-task participants (M=1.44s, SD=.62) took significantly longer to make a decision than Single-task participants (M=.79s, SD=.38). One possibility for the longer response times for Dual-task participants is that they spent time rehearsing the information for the numerical Stroop task before making a decision on each trial.
Figure 5.
Average response times for participants in each condition.
Model-based Analyses
We fit each participant’s data individually with the Softmax and WSLS models. We also fit a Baseline or null model that assumes fixed choice probabilities (Worthy & Maddox, 2012; Yechiam et al., 2005; Gureckis & Love, 2009b). This model has one free parameter, p(Increasing), that represents the probability of selecting the Increasing option on any given trial. The probability of selecting the Decreasing option is 1-p(Increasing). While this model does not assume that participants learn from rewards given on each trial it can often provide a good fit to the data (Gureckis & Love, 2009b). Indeed, the optimal strategy in each task is to select the best option repeatedly, and data from a participant who exhibited such maximizing behavior would be well fit by the Baseline model. Thus, a good fit for the baseline model does not necessarily imply random responding.
The models were assessed based on their ability to predict each choice a participant would make on the next trial, by estimating parameter values that maximized the log-likelihood of each model given the participant’s choices. We used Akaike’s Information Criterion (AIC; Akaike, 1974) to examine the fits of the WSLS and Softmax models relative to the fit of the Baseline model. AIC penalizes models with more free parameters. For each model, i, AICi is defined as:
$$AIC_i = -2\ln L_i + 2V_i \tag{3}$$
where Li is the maximum likelihood for model i, and Vi is the number of free parameters in the model. Smaller AIC values indicate a better fit to the data. We compared the fits of the WSLS and Softmax models relative to fits of the Baseline model by subtracting the AIC of each model from the AIC of the Baseline model for each participant’s data (e.g. Gureckis & Love, 2009b):
Relative fiti = AICBaseline − AICi (4)
Positive values indicate a better fit of the learning model and negative values indicate a better fit of the Baseline model.
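Equations 3 and 4 can be computed directly from the maximized log-likelihoods. The sketch below illustrates the arithmetic; the log-likelihood values are hypothetical, not values from the experiment.

```python
def aic(log_likelihood, n_params):
    # Equation 3: AIC_i = -2 ln L_i + 2 V_i; smaller values = better fit
    return -2.0 * log_likelihood + 2.0 * n_params

def relative_fit(aic_baseline, aic_model):
    # Equation 4: positive values favor the learning model over Baseline
    return aic_baseline - aic_model

# Hypothetical maximized log-likelihoods for one participant (not real data):
# Baseline has 1 free parameter, WSLS has 2
ll_baseline, ll_wsls = -170.0, -120.0
rel_wsls = relative_fit(aic(ll_baseline, 1), aic(ll_wsls, 2))  # 342 - 244 = 98
```

Here the WSLS model's higher log-likelihood outweighs its extra-parameter penalty, so the relative fit is positive and the learning model is preferred.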
The relative fits for each model are listed in Table 1. Data from Single-task participants were better fit by the WSLS model, while data from Dual-task participants were better fit by the Softmax model. Overall, relative fit values were lower for Dual-task participants, although data from Single-task participants who performed the Decreasing-optimal task were fit no better by the Softmax model than by the Baseline model. This could suggest that Dual-task participants’ behavior was more random than that of Single-task participants (cf. Franco-Watkins et al., 2006), or that they followed a near-deterministic response process by selecting the Decreasing option on the majority of trials. However, average relative fits of the Softmax model for Dual-task participants were well above 0 in both tasks, which suggests that this model provided a good fit to a large number of Dual-task participants’ data. We also examined the proportion of data sets that were fit best by the Baseline model. The Baseline model provided the best fit for 20% of Dual-task participants in the Increasing-optimal task and 39% in the Decreasing-optimal task, compared to only 4% and 12% of Single-task participants in the Increasing- and Decreasing-optimal tasks, respectively. Both differences in the proportion of Single- and Dual-task participants best fit by the Baseline model were significant (p<.05 by binomial test). Thus, there was more evidence for Dual-task participants being best fit by the Baseline model relative to Single-task participants, but there was also considerable evidence that Dual-task participants were fit well by the Softmax model, which suggests that they were not simply behaving randomly.
Table 1. Relative fits of the WSLS and Softmax models to the Baseline model for Experiment 1.
| WSLS | Softmax | |
|---|---|---|
| Increasing-Optimal Task | ||
| Single-Task | 60.15 (12.56) | 29.51 (9.41) |
| Dual-Task | 4.32 (8.29) | 16.93 (6.99) |
| Decreasing-Optimal Task | ||
| Single-Task | 59.79 (14.23) | −.20 (3.57) |
| Dual-Task | 3.18 (7.57) | 13.75 (6.38) |
Note: Positive values indicate a better fit of each model than the Baseline model. Standard errors of the mean are listed in parentheses
Having established that participants were by and large well characterized by models that exhibit dependence on previous actions and outcomes rather than by a null model, we next compared the relative fit of the WSLS model to the Softmax model. To obtain a relative measure of the degree to which the WSLS model provided a better fit to the data than the Softmax model, we subtracted the log-likelihood of the Softmax model from the log-likelihood of the WSLS model (Relative fitWSLS = lnLWSLS − lnLSoftmax). Because these models have the same number of free parameters, their log-likelihood values can be compared directly. Positive Relative fitWSLS values indicate a better fit for the WSLS model, while negative Relative fitWSLS values indicate a better fit for the Softmax model.
Figure 6 shows the Relative fitWSLS values for participants in each condition. We performed non-parametric Mann-Whitney U-tests to examine the effects of WM load and reward structure as the Relative fitWSLS distributions were markedly non-normal. There was a significant effect of WM load, U=564.50, p<.001. Single-task participants (M=45.35, SD=66.36) had much higher Relative fitWSLS values than Dual-task participants (M=−11.63, SD=22.40). This suggests that Single-task participants were more likely to use a WSLS strategy while Dual-task participants were more likely to use a Softmax strategy. There was no significant effect of reward structure, U=1,316, p>.10.
Figure 6.
Average relative fit values of the WSLS model compared to the Softmax reinforcement learning model for data from Experiment 1. Higher values indicate a better fit for the WSLS model.
Parameter estimates
The average estimated parameter values for participants in each condition are listed in Table 2. We conducted a 2 (WM load) X 2 (Reward Structure) ANOVA on the average estimated recency parameter (α) values for participants in each condition. There was a marginally significant effect of WM load, F(1,94)=3.11, p<.10, η2=.03. Data from participants in the Single-task condition were best fit by higher recency parameter values (M=.39, SD=.45) than data from participants in the Dual-task condition (M=.24, SD=.36). There was no effect of Reward Structure and no WM Load X Reward Structure interaction (both F<1). The difference in recency parameter values between Single- and Dual-task participants is consistent with the results of Otto and colleagues (2011), who found higher estimated recency parameter values for Single-task participants who performed a simple binary prediction task. Otto and colleagues also found increased reliance on a WSLS strategy among Single-task participants, compared to Dual-task participants. One interpretation of this difference is that Single-task participants are more responsive to recent outcomes. One way in which the Softmax model may account for data from participants who are using a WSLS strategy is by fitting the data with higher estimated recency parameter values. This suggests that there may be an association between recency parameter values and Relative fitWSLS. We did indeed find such a correlation (r=.23, p<.05).3
Table 2. Average Best-fitting Parameter Values for Participants in Each Condition in Experiment 1.
| P(stay|win) | P(shift|loss) | α | γ | |
|---|---|---|---|---|
| Increasing-Optimal Task | ||||
| Single-Task | .82 (.02) | .41 (.04) | .36 (.09) | .12(.04) |
| Dual-Task | .73 (.03) | .51 (.02) | .21 (.07) | .19(.06) |
| Decreasing-Optimal Task | ||||
| Single-Task | .80 (.02) | .33 (.03) | .43(.09) | .12(.05) |
| Dual-Task | .82 (.41) | .41 (.03) | .27(.08) | .14(.05) |
Note: Standard errors of the mean are listed in parentheses.
A 2 (WM load) X 2 (Reward Structure) ANOVA on the average estimated exploitation parameter (γ) values for participants in each condition revealed no effect of WM Load or Reward Structure, and no interaction (all F<1). For the P(stay|win) parameter, a 2 (WM load) X 2 (Reward Structure) ANOVA showed no effect of WM Load, F(1,94)=1.28, p>.10, and no effect of Reward Structure, F(1,94)=1.08, p>.10. However, there was a significant WM Load X Reward Structure interaction, F(1,94)=4.12, p<.05, η2=.04. For the Increasing-optimal task, data from Single-task participants (M=.83, SD=.09) were best fit by significantly higher P(stay|win) parameter values than data from Dual-task participants (M=.74, SD=.17), F(1,94)=5.52, p<.05, η2=.10. However, there was no difference between Single- and Dual-task participants for the Decreasing-optimal task (F<1).
For the P(shift|loss) parameter, a 2 (WM load) X 2 (Reward Structure) ANOVA revealed a main effect of WM Load, F(1,94)=8.55, p<.01, η2=.08. Data from Single-task participants (M=.37, SD=.18), were best fit by lower P(shift|loss) parameter values than data from Dual-task participants (M=.46, SD=.12). There was also a main effect of Reward Structure, F(1,94)=9.95, p<.01, η2=.10. Estimated P(shift|loss) parameters were higher for data from participants who performed the Increasing-optimal task (M=.46, SD=.15), than participants who performed the Decreasing-optimal task (M=.37, SD=.15). The WM load X Reward Structure interaction was not significant.
Cross-fitting analysis
Recent work has demonstrated that model complexity cannot be fully accounted for by measures like AIC or the Bayesian Information Criterion (BIC; Schwarz, 1978) that penalize models for the number of free parameters (Myung & Pitt, 1997; Djuric, 1998). Models with the same number of free parameters can often differ in how flexible they are in accounting for data. For example, the WSLS model may be more flexible than the Softmax model in that it can account for a wider range of behavior in decision-making tasks, or the Softmax model may be more flexible in that it can account for data from participants who are using a WSLS strategy by estimating higher recency parameter values. To address this issue we used the Parametric Bootstrap Cross-fitting Method (PBCM) proposed by Wagenmakers and colleagues (Wagenmakers, Ratcliff, Gomez, & Iverson, 2004; see also Donkin, Brown, Heathcote, & Wagenmakers, 2011 for a similar approach). This method involves simulating a large number of data sets with each of two models, and then fitting each data set with each model. If neither model can mimic the other, then the model that generated the data should provide the best fit to the majority of data sets. However, if a large proportion of data sets generated by one model are fit better by the other model, this would suggest that the non-data-generating model is more flexible in its ability to mimic the true data-generating model.
We used sets of parameter values estimated from our participants’ data to generate the simulated data sets. For each task we generated 2,000 data sets with each model, using parameter combinations that were sampled with replacement from the best-fitting parameter distributions for participants in the experiment. Thus, for the Softmax model we randomly sampled the combination of α and γ that provided the best fit to one participant’s data and used those parameter values to perform one simulation of the task. We generated 2,000 simulated data sets in this manner, and performed the same simulation procedure with the WSLS model. We then fit each simulated data set with both models and determined the Relative fitWSLS value for each data set as outlined above.
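The cross-fitting procedure can be sketched as follows. This is a hedged, schematic version: the `simulate_*` and `fit_*` callables stand in for the task simulation and maximum-likelihood fitting routines described in the text, and the toy example in the usage note is purely illustrative, not the authors' code or data.

```python
import random

def pbcm(params_a, params_b, simulate_a, simulate_b, fit_a, fit_b,
         n_sets=2000, rng=random):
    """Parametric Bootstrap Cross-fitting Method (Wagenmakers et al., 2004):
    generate data sets from each model using parameter combinations resampled
    with replacement from participants' best-fitting values, fit every set
    with both models, and record the log-likelihood difference for each set.
    fit_a/fit_b return each model's maximized log-likelihood for a data set."""
    results = {"gen_a": [], "gen_b": []}
    for _ in range(n_sets):
        data = simulate_a(rng.choice(params_a))      # data generated by model A
        results["gen_a"].append(fit_a(data) - fit_b(data))
        data = simulate_b(rng.choice(params_b))      # data generated by model B
        results["gen_b"].append(fit_a(data) - fit_b(data))
    return results

def classification_rate(results):
    # Proportion of data sets best fit by their own generating model,
    # averaged across the two models (overall correct-classification rate)
    p_a = sum(d > 0 for d in results["gen_a"]) / len(results["gen_a"])
    p_b = sum(d < 0 for d in results["gen_b"]) / len(results["gen_b"])
    return (p_a + p_b) / 2.0
```

With perfectly separable toy models (e.g. `fit_a = lambda d: d`, `fit_b = lambda d: -d`, each simulator echoing its sampled parameter), every data set is classified correctly and the rate is 1.0; values below 1.0, as in the analyses reported here, indicate some degree of model mimicry.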
Figure 7 plots histograms of the Relative fitWSLS values for data generated by each model for the Increasing-optimal (Figure 7a) and Decreasing-optimal tasks (Figure 7b). These values should be less than zero for data generated by the Softmax model and greater than zero for data generated by the WSLS model if neither model is able to mimic the other. For the Increasing-optimal task, 85% of data sets generated by the Softmax model were also fit best by the Softmax model, while 98% of data sets generated by the WSLS model were also fit best by the WSLS model. This suggests that the WSLS model has a higher probability of mimicking the Softmax model than vice versa. The overall probability of correct classification of the data-generating model is (.85+.98)/2=.915 (Wagenmakers et al., 2004).
Figure 7.
Distributions of fits of the WSLS model relative to the Softmax model for data simulated by each model for the Increasing-optimal (a) and Decreasing-optimal tasks (b). Values greater than zero indicate a better fit for the WSLS model, values less than zero indicate a better fit for the Softmax model, and values equal to zero indicate an equal fit for both models.
For the Decreasing-optimal task 71% of data sets generated by the Softmax model were also fit best by the Softmax model, and 97% of data sets generated by the WSLS model were also fit best by the WSLS model. The overall probability of correct classification of the data-generating model is .84. These results suggest that a.) there is a high probability of correctly classifying the true data-generating model, and this probability is higher for the Increasing-optimal task than for the Decreasing-optimal task (.915 versus .840), and b.) the WSLS model is better able to mimic the Softmax model than vice versa.
However, it should be noted that the majority of data sets that were generated by the Softmax model but better fit by the WSLS model had Relative fitWSLS values only slightly greater than 0, indicating only a slightly better fit by the WSLS model. This is indicated by the large spikes around the zero point on the x-axis in Figures 7a and 7b. Thus, for many data sets the models provided essentially the same fit to the data. The average Relative fitWSLS values for Single-task participants in both tasks were extremely high (M=30.65 for the Increasing-optimal task and M=59.99 for the Decreasing-optimal task). Our cross-fitting analysis suggests that it is unlikely that these large Relative fitWSLS values for Single-task participants resulted from the WSLS model mimicking the Softmax model. Thus, while the WSLS model may be able to mimic the Softmax model to some extent, it is highly unlikely that Single-task participants who were best fit by the WSLS model were actually using a strategy consistent with the Softmax model.
Discussion
The data clearly indicate that WM load modulates the degree to which participants attend to the immediate versus delayed consequences of their actions, and the model-based analyses support this conclusion. We used an experimental paradigm that allowed us to examine this immediate-versus-delayed dichotomy by offering two choices: one that always led to a larger immediate reward (the Decreasing option), and one that always led to larger rewards for both options on future trials (the Increasing option), despite giving a smaller immediate payoff on each trial. The paradigm also allowed us to manipulate which of the two options was the optimal choice for the task, which let us examine whether WM load affected the preference for the immediately versus delayed rewarding option or whether it affected participants’ ability to juxtapose the benefits of each option. Both our behavioral and model-based analyses clearly show that WM load affected only the preference for the immediately versus delayed rewarding option, with participants in the Single-task conditions preferring the Increasing option, which led to larger delayed rewards, more than participants in the Dual-task conditions in both tasks. Thus, the WM load brought about by the concurrent dual task hurt performance in one case but helped it in the other.
The model-based analyses show a clear effect of WM load on the relative fit of two models that assume different decision-making strategies. Single-task participants’ behavior was best fit by the WSLS model, which assumes that participants maintain the reward received on the last trial in WM and use it as a benchmark for deciding whether to stay with the option just chosen or shift to the other option. A WSLS strategy allows participants to notice how each option causes future rewards to either increase or decrease. For example, if the participant repeatedly selects the Increasing option then they will repeatedly “win,” and if the participant repeatedly selects the Decreasing option then they will repeatedly “lose.”
Data from Dual-task participants were fit less well by the WSLS model, compared to the Softmax reinforcement learning model. This suggests that WM load led participants to update their expected values for each option based only on the rewards received immediately after each selection. The WM load manipulation may have prevented Dual-task participants from noticing how previous choices affected rewards on future trials. The modeling results also suggest that there may be separate implicit and explicit systems that operate in different ways. The implicit system may develop expected values based on immediate feedback and compare the expected values for each option to make decisions, while the explicit system may use verbalizable rules like “pick the same option if the reward was greater than or equal to the reward received on the previous trial, or switch options if the reward received was less than the reward received on the previous trial,” and make decisions based on the rule being used (e.g. Otto et al., 2011). This is similar to a prominent dual-systems view in category learning that posits the existence of an explicit, rule-based system and a procedural, information-integration system that relies on immediate feedback (e.g. Ashby, Alfonso-Reese, Turken, & Waldron, 1998; Ashby & Maddox, 2005; Maddox & Ashby, 2004). The results are also consistent with a long line of research that has posited the existence of dual learning systems, with a major distinction being between a WM-dependent explicit system and a non-WM-dependent implicit, associative system (Metcalfe & Mischel, 1999; Smith & DeCoster, 2000; Evans, 2008; Strack & Deutsch, 2008; Sloman, 1996; Poldrack & Packard, 2003).
While the two tasks used in Experiment 1 differed in the choice that was optimal, they also differed in how many points would be rewarded if one consistently picked the optimal versus the inferior option. In the Increasing-optimal task the participant earned 80 points for selecting the Increasing option repeatedly, whereas they earned only 40 points for selecting the Decreasing option repeatedly. In the Decreasing-optimal task they earned 65 points for selecting the Decreasing option repeatedly, whereas they earned only 55 points for selecting the Increasing option repeatedly. Thus what we will call “end-state separation” or the difference in points earned between repeatedly selecting the optimal option versus repeatedly selecting the sub-optimal option, differed between the two tasks, with the Increasing-optimal task having a larger end-state separation (40 points) than the Decreasing-optimal task (10 points).
In Experiment 2 we conceptually replicate the effect of WM load on decision-making and equate end-state separation between the Increasing-optimal and Decreasing-optimal tasks. Specifically, we have participants perform either the Increasing-optimal task from Experiment 1 or a Decreasing-optimal task with a large end-state separation (40 points, which is the end-state separation for the Increasing optimal task), under either single or dual-task conditions. These reward structures are shown in Figure 8.
Figure 8.
Rewards given for each task as a function of the number of Increasing option selections over the previous ten trials. (a.) Rewards given for the Increasing-optimal task. Selecting the Increasing option 10 consecutive times will lead to a reward of 80 points on each trial, whereas selecting the Decreasing option 10 consecutive times will lead to a reward of 40 points on each trial. (b.) Rewards given for the Decreasing-optimal task. Selecting the Decreasing option 10 consecutive times will lead to a reward of 95 points on each trial, whereas selecting the Increasing option 10 consecutive times will lead to a reward of 55 points on each trial.
Here, we also use a more traditional “Gambling task” paradigm rather than the Mars Farming cover story used in Experiment 1. Figure 9 shows a sample screen shot from the experiment. A main difference between the Mars Farming task and Gambling task paradigms, of potential relevance to the WM task manipulation, is the form of feedback given after each decision. In the Mars Farming task setup feedback is given in the form of the amount of oxygen shown in the narrow tank after each selection, and no numbers are shown to represent the amount of oxygen given. In the Gambling task setup numbers representing points or money are given as feedback following each selection. One possibility is that a numerical form of feedback may be easier to remember than a perceptual form of feedback like the amount of oxygen shown in the tank. Thus, Dual-task participants may be better able to remember past reward amounts if given concrete numerical reward information. For Single-task participants, numerical information about the rewards given on each trial may bias them away from selecting the increasing option in the Decreasing-optimal task. The Decreasing option gives 90 more points on each trial than the Increasing option in this task, which should be very noticeable to participants.
Figure 9.

Sample screen shot from the Gambling task used in Experiment 2. Participants received points after each selection, and had a goal of earning a certain amount of points over the course of the experiment.
However, we did not predict that changing the task setup would alter the differences between single and dual task participants that we found in Experiment 1. Nevertheless it is important to replicate the results in a Gambling task framework to broaden the scope of our findings. Additionally, a replication would suggest that the surface features of the task do not significantly affect decision-making behavior. To our knowledge this has not been directly tested. Such a finding might be useful for designing future work. For example, researchers might be able to examine differences in decision-making behavior in a within-subjects design by altering the surface features of the task to minimize effects of repeated testing.
Experiment 2
Method
Participants
Seventy-nine undergraduates from Texas A&M University participated in the experiment for course credit. Participants were randomly assigned to one of the four conditions that resulted from the factorial combination of 2 WM load (Single vs. Dual-task) and 2 Reward Structure (Increasing vs. Decreasing-optimal) conditions.
Materials and Procedure
The reward structure for the Increasing-optimal task was identical to that of the Increasing-optimal task in Experiment 1, shown again in Figure 8a, and the reward structure for the Decreasing-optimal task is shown in Figure 8b. Both reward structures had an end-state separation of 40 points. Figure 9 shows a sample screen-shot from the experiment. Two decks were presented on each trial, a red deck and a blue deck. In the Single-task condition participants selected one of the two decks, and the resulting reward value was revealed above the chosen deck. The point total was then incremented by the number of points earned on that trial. Participants were given a goal of earning 18,000 points in the Increasing-optimal task, and 22,000 points in the Decreasing-optimal task. To achieve these goals participants had to select the optimal option on about 80% of trials.
As in the first experiment, Dual-task participants also concurrently performed the numerical Stroop task. The procedure for the numerical Stroop task is identical to the procedure detailed in the Method section for Experiment 1 above. Participants performed a total of 250 trials, and were told whether they reached their goal at the end of the experiment.
Results
Figure 10a shows the proportion of Increasing option selections for participants in each condition. A 2 (WM load) X 2 (Reward Structure) ANOVA revealed a significant effect of WM load, F(1,75)=39.56, p<.001, η2=.35. Single-task participants (M=.54, SD=.27) selected the Increasing option more often than Dual-task participants (M=.24, SD=.17). There was also a significant effect of reward structure, F(1,75)=7.18, p<.01, η2=.09. Participants who performed the Increasing-optimal task (M=.45, SD=.27) selected the increasing option more often than participants who performed the Decreasing-optimal task (M=.32, SD=.25). The WM load X reward structure interaction was not significant (F<1).
Figure 10.
(a.) Proportion of Increasing options selections for participants in each condition of Experiment 2. (b.) Proportion of the optimal cumulative payoff participants earned in the task. These proportions are roughly equivalent to the proportion of times participants selected the optimal option based on the reward structure.
Figure 10b shows the proportion of the optimal cumulative payoff earned by participants in each condition. This was computed in the same way as in Experiment 1, and is roughly equivalent to the proportion of times they selected the optimal option for the task. A 2 (WM load) X 2 (Reward Structure) ANOVA revealed a main effect of reward structure, F(1,75)=25.46, p<.001, η2=.25, where participants who performed the Decreasing-optimal task (M=.69, SD=.24) earned a higher proportion of the optimal cumulative payoff than participants who performed the Increasing-optimal task (M=.45, SD=.27). The effect of WM load was non-significant (F<1), however, there was a significant WM load X reward structure interaction, F(1,75)=39.32, p<.001, η2=.34. Pairwise comparisons within each reward structure condition showed that Single-task participants (M=.62, SD=.25) earned a higher proportion of the optimal cumulative payoff than Dual-task participants (M=.28, SD=.17) in the Increasing-optimal task, F(1,38)=24.89, p<.001, η2=.40, while Dual-task participants (M=.81, SD=.16) earned a higher proportion of the optimal cumulative payoff than Single-task participants (M=.56, SD=.24) in the Decreasing-optimal task, F(1,38)=14.94, p<.001, η2=.25.
Figure 11 shows the average response times for participants in each condition. A 2 (WM load) X 2 (Reward Structure) ANOVA revealed a main effect of WM load, F(1,75)=32.50, p<.001, η2=.30. Single-task participants (M=.67, SD=.54) took significantly less time to respond than Dual-task participants (M=1.32, SD=.46). There was no effect of Reward Structure, F(1,75)=1.13, p>.10, and no WM load X Reward Structure interaction (F<1).
Figure 11.
Average response times for participants in each condition.
Model-based analyses
We used the same modeling approach as in Experiment 1 to examine the types of strategies participants used. Table 3 lists the fits of each learning model relative to the stochastic Baseline model; positive values indicate a better fit for the learning model. The results are similar to those from Experiment 1. Single-task participants were better fit by the WSLS model, while Dual-task participants were better fit by the Softmax model. Fits of the learning models relative to the Baseline model were also higher for Single-task participants. In the Increasing-optimal task 10% of data sets from Single-task participants were best fit by the Baseline model, compared to 40% of data sets from Dual-task participants (p<.01 by binomial test). However, in the Decreasing-optimal task the difference was much smaller: 5% of data sets from Single-task participants were fit best by the Baseline model, compared to 10% of data sets from Dual-task participants, which was not a significant difference (p>.10 by binomial test). Thus, Dual-task participants who performed the Decreasing-optimal task did not exhibit more stochastic behavior than Single-task participants who performed the same task.
Table 3. Relative fits of the WSLS and Softmax models to the Baseline model for Experiment 2.
| WSLS | Softmax | |
|---|---|---|
| Increasing-Optimal Task | ||
| Single-Task | 63.62 (12.47) | 39.51 (11.83) |
| Dual-Task | −6.13 (5.45) | 3.35 (3.07) |
| Decreasing-Optimal Task | ||
| Single-Task | 69.35 (12.62) | 4.21 (4.76) |
| Dual-Task | 5.12 (8.09) | 11.51 (3.15) |
Note: Positive values indicate a better fit of each model than the Baseline model. Standard errors of the mean are listed in parentheses.
Figure 12 shows the Relative fitWSLS values for participants in each condition. There was a significant effect of WM load, Mann-Whitney U=299.00, p<.001. Single-task participants (M=44.62, SD=58.29) had much higher Relative fitWSLS values than Dual-task participants (M=-11.63, SD=22.40). The effect of reward structure was non-significant, Mann-Whitney U= 937.00, p>.10. Thus, replicating our result from Experiment 1, Single-task participants were fit better by the WSLS model, while Dual-task participants were fit better by the Softmax model.
Figure 12.
Average relative fit values of the WSLS model compared to the Softmax reinforcement learning model for data from Experiment 2. Higher values indicate a better fit for the WSLS model.
Parameter Estimates
We next examined the best-fitting parameter estimates from each model (Table 4). We conducted a 2 (WM load) X 2 (Reward Structure) ANOVA on the average estimated recency parameter (α) values for participants in each condition. There was a marginally significant effect of WM load, F(1,75)=3.19, p<.10, η2=.04. Single-task participants’ data (M=.47, SD=.47) were best fit by higher recency parameter values than Dual-task participants’ data (M=.29, SD=.41). There was also a significant correlation between estimated recency parameter values and Relative fitWSLS values (r=.26, p<.05). This result is consistent with the results from Experiment 1 and Otto et al. (2011), and suggests that the Softmax model accounts for Single-task participants’ increased WSLS strategy use by estimating higher recency parameter values for Single-task participants than for Dual-task participants. A 2 (WM load) X 2 (Reward Structure) ANOVA on the average estimated exploitation parameter (γ) values for participants in each condition revealed no effect of WM load or Reward Structure (both F<1), and no WM load X Reward Structure interaction, F(1,75)=1.89, p>.10.
Table 4. Average Best-fitting Parameter Values for Participants in Each Condition in Experiment 2.
| P(stay|win) | P(shift|loss) | α | γ | |
|---|---|---|---|---|
| Increasing-Optimal Task | ||||
| Single-Task | .81 (.05) | .56 (.04) | .42 (.11) | .17 (.06) |
| Dual-Task | .73 (.04) | .52 (.02) | .29 (.07) | .07 (.01) |
| Decreasing-Optimal Task | ||||
| Single-Task | .81 (.05) | .43 (.03) | .53 (.11) | .07 (.05) |
| Dual-Task | .87 (.41) | .47 (.03) | .29 (.10) | .13 (.07) |
Note: Standard errors of the mean are listed in parentheses.
We next conducted a 2 (WM load) X 2 (Reward Structure) ANOVA on the average estimated P(stay|win) parameter values for participants in each condition. There was no effect of WM load (F<1), but there was a marginally significant effect of Reward Structure, F(1,75)=2.91, p<.10, η2=.04. P(stay|win) parameter values were higher for participants who performed the Decreasing-optimal task (M=.84, SD=.17) than for participants who performed the Increasing-optimal task (M=.77, SD=.19). There was also a marginally significant WM load X Reward Structure interaction, F(1,75)=2.79, p<.10, η2=.04. For the Increasing-optimal task, Single-task participants’ data (M=.81, SD=.21) were fit best by higher P(stay|win) parameter values than Dual-task participants’ data (M=.73, SD=.17), although this difference was non-significant, F(1,38)=1.65, p>.10. In contrast, for the Decreasing-optimal task, Single-task participants’ data (M=.81, SD=.87) were fit best by lower P(stay|win) parameter values than Dual-task participants’ data (M=.87, SD=.12), although this difference was also non-significant, F(1,37)=1.15, p>.10.
A 2 (WM load) X 2 (Reward Structure) ANOVA on the average estimated P(shift|loss) parameter values for participants in each condition showed no effect of WM load (F<1), but there was a significant effect of Reward Structure, F(1,75)=6.03, p<.05, η2=.07. P(shift|loss) parameter values were higher for participants who performed the Increasing-optimal task (M=.54, SD=.15) than for participants who performed the Decreasing-optimal task (M=.45, SD=.16). The WM load X Reward Structure interaction did not reach significance, F(1,75)=1.56, p>.10.
Cross-fitting analysis
We performed the same cross-fitting analysis as for the Experiment 1 data to examine the degree to which each model can mimic the other. Figure 13 plots histograms of the Relative fitWSLS values for data generated by each model for the Increasing-optimal (Figure 13a) and Decreasing-optimal tasks (Figure 13b). For the Increasing-optimal task 76% of data sets generated by the Softmax model were also fit best by the Softmax model, while 98% of data sets generated by the WSLS model were also fit best by the WSLS model. For the Decreasing-optimal task 81% of data sets generated by the Softmax model were also fit best by the Softmax model, and 99% of data sets generated by the WSLS model were also fit best by the WSLS model. Similar to the results from Experiment 1, a large number of data sets clustered around the 0 point on the x-axis, which suggests that the WSLS model can mimic the Softmax model only to the degree that it fits a portion of the Softmax-generated data sets roughly equally well, but not substantially better.
Figure 13.
Distributions of fits of the WSLS model relative to the Softmax model for data simulated by each model for the Increasing-optimal (a) and Decreasing-optimal tasks (b). Values greater than zero indicate a better fit for the WSLS model, values less than zero indicate a better fit for the Softmax model, and values equal to zero indicate an equal fit for both models.
Discussion
The pattern of results is identical to the pattern from Experiment 1. Single-task participants selected the Increasing option more often than Dual-task participants regardless of which option was optimal for the task. Dual-task participants also took longer to respond, which suggests that they were not simply making a quick decision to get to the numerical Stroop query, but may have engaged in effortful encoding of the numerical Stroop stimuli at the beginning of each trial before making a decision. Our model comparison revealed that Single-task participants were better fit by a model that assumes a rule-based WSLS strategy, while Dual-task participants were better fit by a Softmax reinforcement learning model that assumes a more associative strategy. When we compared the parameter estimates for participants in each condition we found that the Softmax model’s recency, or learning rate, parameter values were higher for Single-task participants than for Dual-task participants. This effect was also found in Experiment 1 and previously found by Otto and colleagues (2011) in a binary prediction task. Thus, participants who performed the decision-making task under WM load were less responsive to recent outcomes and showed almost no evidence of WSLS strategy use. Instead, Dual-task participants used a strategy consistent with the Softmax model whereby they tended to select the option that, on average, gave the highest immediate reward (the Decreasing option). It should also be noted that a substantial proportion of Dual-task participants’ data were fit best by the Baseline model (although this was not the case for the Decreasing-optimal condition in Experiment 2). This suggests that these participants may have used very little information from the rewards they received to adjust their response tendencies, and instead selected the Decreasing option on a large number of trials.
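As a concrete sketch of this distinction, the Softmax reinforcement learning account is standardly written as a delta-rule update of recency-weighted expected rewards combined with a Softmax choice rule. This is a generic formulation; the paper's exact equations are given in its model section:

```latex
E_{t+1}(a) = E_t(a) + \alpha \left[ r_t - E_t(a) \right], \qquad
P_t(a) = \frac{e^{\theta E_t(a)}}{\sum_{b} e^{\theta E_t(b)}}
```

Here the recency (learning rate) parameter \alpha controls how strongly the most recent rewards are weighted and \theta governs how deterministically the higher-valued option is chosen; the lower \alpha estimates for Dual-task participants correspond to the attenuated responsiveness to recent outcomes described above.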
This behavior seems plausible in that it was likely salient to the Dual-task subjects early in the task that the Decreasing option always gave a higher immediate reward, and so they simply decided to select this option on the majority of trials. The Baseline comparisons across both experiments do suggest that Dual-task participants exhibited attenuated learning in response to the rewards they received on each trial. However, we do not feel that our results suggest that Dual-task participants were simply behaving randomly, as they clearly showed a preference for the Decreasing option.
General Discussion
WM load is a common variable in everyday decision-making. People make decisions both while focusing solely on the decision at hand, and while keeping intrusive, extraneous information in mind (Burgess, 2000; Burgess, Veitch, Costello, & Shallice, 2001; Strayer & Drews, 2007). One straightforward prediction is that more available WM resources will lead to better decision-making. Our results suggest that this is not the case: WM load instead affects the degree to which participants focus on the immediate versus delayed benefits of each option, and whether people operate in an explicit, rule-based mode that utilizes heuristics like WSLS, or in a more implicit, associative mode that favors the option with the highest immediate reward. Participants who performed the task under Single-task conditions did not simply make better decisions, but instead made their decisions in a different manner than participants who were placed under WM load. Single-task participants made their decisions based on a comparison of the current reward to the reward received on the previous trial, while Dual-task participants made decisions based on which option would give the best immediate reward. Focusing on the differences in rewards received across trials was productive for one task (the Increasing-optimal task) but counterproductive for the other task (the Decreasing-optimal task).
When placed in the context of the three components of decision-making outlined above, our results suggest that WM load does not affect the third, and perhaps most important, component of decision-making, the juxtaposition of the immediate versus delayed consequences of selecting each option. Rather, WM load appears to directly affect the first two components. The results for participants in the Single-task conditions of each experiment are perhaps most interesting. Single-task participants exhibited sensitivity to the Increasing option’s effect on the rewards for both options on future trials, yet they seemed to pick this option without realizing that the immediate benefits of the Decreasing option outweighed the delayed benefits of the Increasing option in the Decreasing-optimal tasks. This pattern of results generalized across two separate experiments with different reward structures (small vs. large end state separation) and different surface features (Mars farming and Gambling). Single-task participants selected the Increasing option more often than Dual-task participants even when the Decreasing option gave 90 more points than the Increasing option on each trial, and where the difference in end state separation was 40 points (Experiment 2).
Moreover, the Decreasing-optimal task can be considered “easier” than the Increasing-optimal task because participants earned a higher proportion of the optimal cumulative payoff in the Decreasing-optimal conditions in both experiments. Thus, Single-task participants performed better on the “harder” Increasing-optimal tasks, but worse on the “easier” Decreasing-optimal tasks. These results could suggest a lack of meta-cognitive awareness among our participants, whereby they were unable to appropriately weigh the immediate and delayed benefits of each option to make the best decisions.
One caveat to this work is that we chose one working memory load manipulation (the numerical Stroop task) among many. Other working memory tasks, such as memorizing a digit span, keeping a running count of tones presented during the task, or generating a random number, have also been successfully used to manipulate WM load. While we believe that the numerical Stroop task successfully depleted working memory resources, there is also evidence that different working memory load manipulations can differentially affect working memory capacity (e.g. Jameson, Hinson, & Whitney, 2004; Miles & Minda, 2011). Thus, the possible differences in the effects of specific working memory load manipulations should be considered when designing future research.
Dual Learning Systems
The implications of these results should also be considered for dual-system theories of cognition. Many dual-system theories would likely predict that worse performance for Dual-task participants on the Increasing-optimal task was due to those participants using a more reflexive strategy, which would favor options with larger immediate rewards. However, one could also predict that Single-task participants, operating in a more reflective, or “cold-cognitive,” mode, should be capable of more rational, abstract reasoning and should be able to learn the optimal choice for the task (Evans, 2003).
Our modeling results suggest that, rather than behaving rationally, Single-task participants used their unoccupied WM resources to employ a simple rule-based strategy that was not always optimal. Future work should examine what affects this third component of decision-making. One possibility is that, for Single-task participants, both systems are operating simultaneously, and that the explicit system is dominant. This view of competition between the systems is a central feature of Ashby and colleagues’ COmpetition between Verbal and Implicit Systems (COVIS) theory of category learning (Ashby et al., 1998). Specifically, the implicit system may interfere with the explicit system under Single-task conditions, leading Single-task participants to utilize simple rules like WSLS rather than reflectively juxtaposing the strengths and weaknesses of each option. We admit that this line of reasoning is speculative, but future work could potentially develop testable predictions from theories that propose competition between implicit and explicit systems.
As we acknowledged in the introduction, the notion that the Softmax model mimics an implicit, or procedural, processing mode is consistent with work demonstrating that sub-cortical regions can develop and update expected reward values based on the immediate rewards received after selecting each option (Pagnoni, Zink, Montague, & Berns, 2002; Pessiglione, Seymour, Flandin, Dolan, & Frith, 2006; Hare, O’Doherty, Camerer, Schultz, & Rangel, 2008). However, the strategy, “select the option that has, on average, given the largest rewards on recent trials,” is obviously verbalizable and capable of being explicitly represented. This strategy is likely simpler than the WSLS strategy, and so the dual task may have simply led participants to use a simpler strategy that is also explicit in nature.
Related to the dual-systems issue is the role that affect played in guiding our participants’ behavior. Many dual-systems theories propose that the implicit system is the “hot” or affectively rich system (Evans, 2003; Sloman, 1996), and affect has been shown to play a role in gambling tasks like the ones our participants performed (Bechara et al., 1994; Bechara, Damasio, & Damasio, 2000). One possibility is that the hot, affective system favors options based on their immediate rewards and that the cold-cognitive system must override this tendency in order to select options that do not lead to the best immediate rewards. This notion of competition between implicit (affective) and explicit (cognitive) systems, whereby the explicit system must override the implicit system, is a central feature of many dual-system theories (e.g. Sloman, 1996; Cohen, 2005). In our experiments Dual-task participants’ compromised working memory resources may have prevented them from overriding the affect-based tendency to select the option that gave higher immediate rewards.
Implications and Future Directions
A central aim for future work should be to examine what processes affect the third component of decision-making outlined above, which could be described as making the optimal decision for the situation. One possibility is that WM capacity affects the juxtaposition of the immediate and long-term consequences of each action, but WM load does not. Our study only examined WM load, and did not measure individual differences in WM capacity. Future work should identify whether individual differences in WM capacity affect decision-making in situations similar to the experiments reported here, and also whether WM capacity interacts with WM load. Individuals with high WM capacity may outperform individuals with low WM capacity on both tasks; however, it is also possible that individual differences in WM capacity will have the same effect on preferences for immediate versus delayed outcomes that we found by manipulating WM load.
These results have important implications for everyday decision-making. Multitasking has become a ubiquitous part of modern life (e.g. Vestergren & Nilsson, 2011; Burgess et al., 2001). Here we demonstrated that the degree to which WM resources are divided between separate, simultaneous tasks appears to have very large effects on the strategies people employ in decision-making. Our results indicate that attenuating WM resources by performing a concurrent task can lead to a temporal myopia in decision-making where individuals are only able to focus on the immediate consequences of their actions. Intriguingly, our results also suggest that being free from a distracting secondary task is not always beneficial. The full resources available to our Single-task participants may have led them to employ a more WM-demanding strategy than was needed for a relatively easy task.
Acknowledgements
This research was supported by start-up funds from Texas A&M University to DAW, and NIMH grant MH077708 to WTM. ARO was supported by a Mike Hogg Endowment Fellowship. We thank Action Editor Andrew Heathcote and three anonymous reviewers for insightful comments and suggestions. We also thank Anna Anthony, Corey L. Dossey, Susan Garcia, Karla S. Gomez, Cara Henis, Abigail E. Moser, Crina Silasi-Mansat, Nicole A. Toomey, and Angelina M. Torres for valuable assistance in collecting the data.
Footnotes
In our paradigm, which involves decision-making under uncertainty, the term ‘expected reward value’ is used to represent an estimate of the reward that will be received upon selecting each option. This is distinct from the term ‘Expected Value’ which has often been used to represent an optimal objective representation of the benefit of an option, and is calculated by multiplying the reward associated with each option by the probability of receiving the reward, when both are known, as in description-based decision-making paradigms (e.g. Tversky & Kahneman, 1981).
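In that description-based sense, the objective Expected Value of an option with known outcome probabilities is:

```latex
EV(a) = \sum_{i} p_i \, r_i
```

For example, an option that pays 100 points with probability .5 (and 0 otherwise) has an Expected Value of 50 points.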
We did not observe any correlations between performance on the WM task and performance in the decision-making task in either experiment. Most participants performed well above 80%, and so there was little variance in performance on the WM task within the Dual-task condition.
A reanalysis of the data reported in Otto et al. (2011) found a strong relationship between estimated recency parameter values and the relative fit of the WSLS model used to fit their data (r=.87, p<.001).
References
- Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974;19:716–723. [Google Scholar]
- Ashby FG, Alfonso-Reese LA, Turken AU, Waldron EM. A neuropsychological theory of multiple systems in category learning. Psychological Review. 1998;105:442–481. doi: 10.1037/0033-295x.105.3.442. [DOI] [PubMed] [Google Scholar]
- Ashby FG, Ell SW. Single versus multiple systems of category learning: Reply to Nosofsky & Kruschke (2002). Psychonomic Bulletin & Review. 2002;9:175–180. doi: 10.3758/bf03196274. [DOI] [PubMed] [Google Scholar]
- Ashby FG, Maddox WT. Human category learning. Annual Review of Psychology. 2005;56:149–178. doi: 10.1146/annurev.psych.56.091103.070217. [DOI] [PubMed] [Google Scholar]
- Bechara A, Damasio AR, Damasio H. Emotion, decision making and the orbitofrontal cortex. Cerebral Cortex. 2000;10:295–307. doi: 10.1093/cercor/10.3.295. [DOI] [PubMed] [Google Scholar]
- Bechara A, Damasio AR, Damasio H, Anderson SW. Insensitivity to future consequences following damage to human prefrontal cortex. Cognition. 1994;50:7–15. doi: 10.1016/0010-0277(94)90018-3. [DOI] [PubMed] [Google Scholar]
- Bechara A, Martin EM. Impaired decision-making related to working memory deficits in individuals with substance addictions. Neuropsychology. 2004;18:152–162. doi: 10.1037/0894-4105.18.1.152. [DOI] [PubMed] [Google Scholar]
- Bogacz R, McClure SM, Li J, Cohen JD, Montague PR. Short-term memory traces for action bias in human reinforcement learning. Brain Research. 2007;1153:111–121. doi: 10.1016/j.brainres.2007.03.057. [DOI] [PubMed] [Google Scholar]
- Burgess PW, Veitch E, Costello ADL, Shallice T. The cognitive and neuroanatomical correlates of multitasking. Neuropsychologia. 2001;38:848–863. doi: 10.1016/s0028-3932(99)00134-7. [DOI] [PubMed] [Google Scholar]
- Burgess PW. Strategy application disorder: the role of the frontal lobes in human multitasking. Psychological Research. 2000;63:279–288. doi: 10.1007/s004269900006. [DOI] [PubMed] [Google Scholar]
- Cohen JD. The vulcanization of the human brain: A neural perspective on interactions between cognition and emotion. The Journal of Economic Perspectives. 2005;19:3–24. [Google Scholar]
- Curtis CE, Lee D. Beyond working memory: the role of persistent activity in decision making. Trends in Cognitive Sciences. 2010;14:216–222. doi: 10.1016/j.tics.2010.03.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Djuric PM. Asymptotic MAP criteria for model selection. IEEE Transactions on Signal Processing. 1998;46:2726–2735. [Google Scholar]
- Donkin C, Brown S, Heathcote A, Wagenmakers EJ. Diffusion versus linear ballistic accumulation: Different models but the same conclusions about psychological processes? Psychonomic Bulletin & Review. 2011;18:61–69. doi: 10.3758/s13423-010-0022-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dretsch MN, Tipples J. Working memory involved in predicting future outcomes based on past experiences. Brain and Cognition. 2008;66:83–90. doi: 10.1016/j.bandc.2007.05.006. [DOI] [PubMed] [Google Scholar]
- Evans J.St.B.T. In two minds: dual process accounts of reasoning. TRENDS in Cognitive Sciences. 2003;7:454–459. doi: 10.1016/j.tics.2003.08.012. [DOI] [PubMed] [Google Scholar]
- Evans J.St.B.T. Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology. 2008;59:255–278. doi: 10.1146/annurev.psych.59.103006.093629. [DOI] [PubMed] [Google Scholar]
- Franco-Watkins AM, Pashler H, Rickard TC. Does working memory load lead to greater impulsivity? Commentary on Hinson, Jameson, & Whitney’s (2003). Journal of Experimental Psychology: Learning, Memory, & Cognition. 2006;32:448–450. doi: 10.1037/0278-7393.32.2.443. [DOI] [PubMed] [Google Scholar]
- Filoteo JV, Lauritzen S, Maddox WT. Removing the frontal lobes: The effects of engaging executive functions on perceptual category learning. Psychological Science. 2010;21:415–423. doi: 10.1177/0956797610362646. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Frank MJ, Claus ED. Anatomy of a decision: Striato-orbitofrontal interactions in reinforcement learning, decision-making, and reversal. Psychological Review. 2006;113:300–326. doi: 10.1037/0033-295X.113.2.300. [DOI] [PubMed] [Google Scholar]
- Frank MJ, Kong L. Learning to avoid in older age. Psychology & Aging. 2008;23:392–398. doi: 10.1037/0882-7974.23.2.392. [DOI] [PubMed] [Google Scholar]
- Frank MJ, O’Reilly RC, Curran T. When memory fails, intuition reigns: Midazolam enhances implicit inference in humans. Psychological Science. 2006;17:700–707. doi: 10.1111/j.1467-9280.2006.01769.x. [DOI] [PubMed] [Google Scholar]
- Gross JJ, Carstensen LL, Pasupathi M, Tsai J, Skorpen CG, Hsu AYC. Emotion and aging: Experience, expression, and control. Psychology & Aging. 1997;12:590–599. doi: 10.1037//0882-7974.12.4.590. [DOI] [PubMed] [Google Scholar]
- Gureckis TM, Love BC. Short-term gains, long-term pains: How cues about state aid learning in dynamic environments. Cognition. 2009a;113:293–313. doi: 10.1016/j.cognition.2009.03.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gureckis TM, Love BC. Learning in noise: Dynamic decision-making in a variable environment. Journal of Mathematical Psychology. 2009b;53:180–193. doi: 10.1016/j.jmp.2009.02.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hare TA, O’Doherty J, Camerer CF, Schultz W, Rangel A. Dissociating the role of orbitofrontal cortex and the striatum in the computation of goal values and prediction errors. The Journal of Neuroscience. 2008;28:5623–5630. doi: 10.1523/JNEUROSCI.1309-08.2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hinson J, Jameson T, Whitney P. Somatic markers, working memory, and decision-making. Cognitive, Affective & Behavioral Neuroscience. 2002;2:341–353. doi: 10.3758/cabn.2.4.341. [DOI] [PubMed] [Google Scholar]
- Hinson J, Jameson T, Whitney P. Impulsive decision making and working memory. Journal of Experimental Psychology: Learning, Memory & Cognition. 2003;29:298–306. doi: 10.1037/0278-7393.29.2.298. [DOI] [PubMed] [Google Scholar]
- Jameson TL, Hinson JM, Whitney P. Components of working memory and somatic markers in decision making. Psychonomic Bulletin & Review. 2004;11:515–520. doi: 10.3758/bf03196604. [DOI] [PubMed] [Google Scholar]
- Lee MD, Zhang S, Munro M, Steyvers M. Psychological models of human and optimal performance in bandit problems. Cognitive Systems Research. 2011;12:164–174. [Google Scholar]
- Maddox WT, Ashby FG, Ing AD, Pickering AD. Disrupting feedback processing interferes with rule-based but not information-integration category learning. Memory & Cognition. 2004;32:582–591. doi: 10.3758/bf03195849. [DOI] [PubMed] [Google Scholar]
- Maddox WT, Ashby FG. Dissociating explicit and procedural-learning systems of perceptual category learning. Behavioral Processes. 2004;66:309–332. doi: 10.1016/j.beproc.2004.03.011. [DOI] [PubMed] [Google Scholar]
- McClure SM, Berns GS, Montague PR. Temporal prediction errors in a passive learning task activate human striatum. Neuron. 2003;38:339–346. doi: 10.1016/s0896-6273(03)00154-5. [DOI] [PubMed] [Google Scholar]
- Metcalfe J, Mischel W. A hot/cool system analysis of delay of gratification: Dynamics of willpower. Psychological Review. 1999;106:3–19. doi: 10.1037/0033-295x.106.1.3. [DOI] [PubMed] [Google Scholar]
- Miles SJ, Minda JP. The effect of concurrent verbal and visual tasks on category learning. Journal of Experimental Psychology: Learning, Memory & Cognition. 2011;37:588–607. doi: 10.1037/a0022309. [DOI] [PubMed] [Google Scholar]
- Myung IJ, Pitt MA. Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review. 1997;4:79–95. [Google Scholar]
- Newell BR, Dunn JC, Kalish M. The dimensionality of perceptual category learning: A state-trace analysis. Memory & Cognition. 2010;38:563–581. doi: 10.3758/MC.38.5.563. [DOI] [PubMed] [Google Scholar]
- Nosofsky RM, Kruschke JK. Single-system models and interference in category learning: Commentary on Waldron and Ashby (2001) Psychonomic Bulletin & Review. 2002;9:169–174. doi: 10.3758/bf03196274. [DOI] [PubMed] [Google Scholar]
- Nowak M, Sigmund K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature. 1993;364:56–58. doi: 10.1038/364056a0. [DOI] [PubMed] [Google Scholar]
- Otto AR, Gureckis TM, Markman AB, Love BC. Navigating through abstract decision spaces: Evaluating the role of state generalization in a dynamic decision-making task. Psychonomic Bulletin & Review. 2009;16:957–963. doi: 10.3758/PBR.16.5.957. [DOI] [PubMed] [Google Scholar]
- Otto AR, Markman AB, Gureckis TM, Love BC. Regulatory fit and systematic exploration in a dynamic decision-making environment. Journal of Experimental Psychology: Learning, Memory and Cognition. 2010;36:797–804. doi: 10.1037/a0018999. [DOI] [PubMed] [Google Scholar]
- Otto AR, Markman AB, Love BC. Taking More, Now: The Optimality of Impulsive Choice Hinges on Environment Structure. Social Psychological and Personality Science. 2012;3(2):131–138. doi: 10.1177/1948550611411311. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Otto AR, Taylor EG, Markman AB. There are at least two kinds of probability matching: Evidence from a secondary task. Cognition. 2011;118:274–279. doi: 10.1016/j.cognition.2010.11.009. [DOI] [PubMed] [Google Scholar]
- Pagnoni G, Zink CF, Montague PR, Berns GS. Activity in human ventral striatum locked to errors of reward prediction. Nature Neuroscience. 2002;5:97–98. doi: 10.1038/nn802. [DOI] [PubMed] [Google Scholar]
- Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD. Dopamine-dependent prediction errors underpin reward-seeking behavior in humans. Nature. 2006;442:1042–1045. doi: 10.1038/nature05051. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Poldrack RA, Packard MG. Competition among multiple memory systems: converging evidence from animal and human brain studies. Neuropsychologia. 2003;41:245–251. doi: 10.1016/s0028-3932(02)00157-4. [DOI] [PubMed] [Google Scholar]
- Rachlin H. Self-control: Beyond commitment. Behavioral and Brain Sciences. 1995;18:109–121. [Google Scholar]
- Rakow T, Newell BR. Degrees of uncertainty: An overview and framework for future research on experience-based choice. Journal of Behavioral Decision Making. 2010;23:1–14. [Google Scholar]
- Sloman SA. The empirical case for two systems of reasoning. Psychological Bulletin. 1996;119:3–22. [Google Scholar]
- Smith ER, DeCoster J. Dual-process models in social and cognitive psychology: conceptual integration and links to underlying memory systems. Personality and Social Psychology Review. 2000;4:108–31. [Google Scholar]
- Steyvers M, Lee MD, Wagenmakers EJ. A Bayesian analysis of human decision-making on bandit problems. Journal of Mathematical Psychology. 2009;53:168–179. [Google Scholar]
- Strack F, Deutsch R. Reflective and impulsive determinants of social behavior. Personality and Social Psychology Review. 2004;8:220–247. doi: 10.1207/s15327957pspr0803_1. [DOI] [PubMed] [Google Scholar]
- Strayer DL, Drews FA. Cell-phone-induced driver distraction. Current Directions in Psychological Science. 2007;16:128–131. [Google Scholar]
- Stroop JR. Studies of interference in serial verbal reactions. Journal of Experimental Psychology. 1935;18:643–662. [Google Scholar]
- Tversky A, Kahneman D. The framing of decisions and the psychology of choice. Science. 1981;211:453–458. doi: 10.1126/science.7455683. [DOI] [PubMed] [Google Scholar]
- Vestergren P, Nilsson LG. Perceived causes of everyday memory problems in a population-based sample aged 39-99. Applied Cognitive Psychology. 2011;25:641–646. [Google Scholar]
- Wagenmakers EJ, Ratcliff R, Gomez P, Iverson G. Assessing model mimicry using the parametric bootstrap. Journal of Mathematical Psychology. 2004;48:28–50. [Google Scholar]
- Waldron EM, Ashby FG. The effects of concurrent task interference on category learning: Evidence for multiple category learning systems. Psychonomic Bulletin & Review. 2001;8:168–176. doi: 10.3758/bf03196154. [DOI] [PubMed] [Google Scholar]
- Worthy DA, Gorlick MA, Pacheco JL, Schnyer DM, Maddox WT. With Age Comes Wisdom: Decision-Making in Younger and Older Adults. Psychological Science. 2011;22:1375–1380. doi: 10.1177/0956797611420301. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Worthy DA, Maddox WT. Age-based differences in strategy-use in choice tasks. Frontiers in Neuroscience. 2012;5(145):1–10. doi: 10.3389/fnins.2011.00145. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Worthy DA, Maddox WT, Markman AB. Regulatory fit effects in a choice task. Psychonomic Bulletin & Review. 2007;14:1125–1132. doi: 10.3758/bf03193101. [DOI] [PubMed] [Google Scholar]
- Worthy DA, Maddox WT, Markman AB. Ratio and difference comparisons of expected reward in decision-making tasks. Memory & Cognition. 2008;36:1460–1469. doi: 10.3758/MC.36.8.1460. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yechiam E, Busemeyer JR. Comparison of basic assumptions embedded in learning models for experience-based decision-making. Psychonomic Bulletin & Review. 2005;12:387–402. doi: 10.3758/bf03193783. [DOI] [PubMed] [Google Scholar]
- Yechiam E, Busemeyer JR, Stout JC, Bechara A. Using cognitive models to map relations between neuropsychological disorders and human decision making deficits. Psychological Science. 2005;16:973–978. doi: 10.1111/j.1467-9280.2005.01646.x. [DOI] [PubMed] [Google Scholar]
- Zeithamova D, Maddox WT. Dual-task interference in perceptual category learning. Memory & Cognition. 2006;34:387–398. doi: 10.3758/bf03193416. [DOI] [PubMed] [Google Scholar]