The Journal of Neuroscience. 2024 Jul 12;44(34):e2343232024. doi: 10.1523/JNEUROSCI.2343-23.2024

Dissociable Roles of the Dorsolateral and Ventromedial Prefrontal Cortex in Human Categorization

Matthew B Broschard 1,2, Brandon M Turner 3, Daniel Tranel 2,4, John H Freeman 2
PMCID: PMC11340282  PMID: 38997159

Abstract

Models of human categorization predict the prefrontal cortex (PFC) serves a central role in category learning. The dorsolateral prefrontal cortex (dlPFC) and ventromedial prefrontal cortex (vmPFC) have been implicated in categorization; however, it is unclear whether both are critical for categorization and whether they support unique functions. We administered three categorization tasks to patients with PFC lesions (mean age, 69.6 years; 5 men, 5 women) to examine how the prefrontal subregions contribute to categorization. These included a rule-based (RB) task that was solved via a unidimensional rule, an information integration (II) task that was solved by combining information from two stimulus dimensions, and a deterministic/probabilistic (DP) task with stimulus features that had varying amounts of category-predictive information. Compared with healthy comparison participants, both patient groups had impaired performance. Impairments in the dlPFC patients were largest during the RB task, whereas impairments in the vmPFC patients were largest during the DP task. A hierarchical model was fit to the participants’ data to assess learning deficits in the patient groups. PFC damage was correlated with a regularization term that limited updates to attention after each trial. Our results suggest that the PFC, as a whole, is important for learning to orient attention to relevant stimulus information. The dlPFC may be especially important for rule-based learning, whereas the vmPFC may be important for focusing attention on deterministic (highly diagnostic) features and ignoring less predictive features. These results support overarching functions of the dlPFC in executive functioning and the vmPFC in value-based decision-making.

Keywords: category learning, human patient, neuropsychology, prefrontal cortex, rule-based

Significance Statement

Category learning creates flexible memory representations that easily generalize to novel situations. Although it is generally established that the prefrontal cortex is central to categorization, it is unclear how different prefrontal subregions contribute to learning. Separate literatures have implicated both the dorsolateral prefrontal cortex (dlPFC) and the ventromedial prefrontal cortex (vmPFC) in categorization, but there has been little effort to bridge these literatures. The current study is the first to examine categorization in patients with lesions centered in the dlPFC and vmPFC. We found that, as a whole, the PFC orients attention to relevant stimulus information. The dlPFC is important for rule-based learning, whereas the vmPFC is important for focusing attention on highly diagnostic features and ignoring less predictive features.

Introduction

It is well established that the prefrontal cortex (PFC) plays a pivotal role in categorization (Freedman et al., 2001; Broschard et al., 2021; Reinert et al., 2021). Converging evidence suggests that the PFC is important for identifying stimulus features that are predictive of an object's category membership (Miller et al., 2002; Mack et al., 2020). Accordingly, computational models of categorization simulate the PFC using mechanisms of selective attention (Ashby et al., 1998; Weichart et al., 2022a). Selective attention maximizes the separability between categories by stretching representations along relevant stimulus dimensions and attenuating representations along irrelevant dimensions. This sort of “warping” is strongly integrated with the learning process (Zhang et al., 2023) and dramatically influences how new stimuli are encoded and categorized (Rehder and Hoffman, 2005).

A critical gap in this literature lies in differentiating the functions of PFC subregions. Both the dorsolateral prefrontal cortex (dlPFC; Freedman et al., 2001; Cromer et al., 2010) and the ventromedial prefrontal cortex (vmPFC; Zeithamova et al., 2012; Bowman and Zeithamova, 2018) have been implicated in categorization. Both subregions are generally involved in directing attention to relevant information, but the mechanisms that mediate this function may be different for each subregion. The dlPFC is associated with rule-based learning and may be important for testing and evaluating category rules (Bunge, 2004; Seger and Cincotta, 2006; Antzoulatos and Miller, 2011; Mok and Love, 2022). This is supported by multiple experiments demonstrating that single dlPFC neurons are selective for abstract rules (Wallis et al., 2001; Mian et al., 2012) and everyday categories (Freedman et al., 2001; Cromer et al., 2010). Conversely, the vmPFC is associated with forming abstract category representations called prototypes (Bowman et al., 2020; Park et al., 2020). This is accomplished by utilizing mechanisms that integrate overlapping memory traces with existing representations (Koscik and Tranel, 2012; Benoit et al., 2014; Schlichting and Preston, 2016; Spalding et al., 2018). The current experiment focused on parsing the contributions of these subregions by examining categorization behavior in patients with PFC lesions.

We administered three categorization tasks to PFC patients with lesions primarily in the dlPFC or the vmPFC. Each participant completed a rule-based task (RB; Ashby and Maddox, 2011) that was solved via a unidimensional rule, an information integration task (II; Ashby and Maddox, 2011) that was solved by combining information from two dimensions, and a deterministic/probabilistic task (DP; Deng and Sloutsky, 2016) that was solved by attending to fully diagnostic features in the presence of features that provided partial category information. Models of categorization predict that the lateral PFC is important for learning RB tasks but not for II tasks (Ashby and Maddox, 2011); therefore, we predicted that the dlPFC patients would be impaired on the RB task. Additionally, Mack et al. (2020) used similar stimuli as the DP task and found that the vmPFC was critical for attending to diagnostic features; therefore, we predicted that the vmPFC patients would be impaired on the DP task. We found that both patient groups showed impaired learning on the category tasks compared with healthy normal comparisons (HCs). The dlPFC patients had larger impairments during the RB task, whereas the vmPFC patients had larger impairments during the DP task.

A computational model was used to better understand how each subregion contributed to learning. Two mechanisms were included that constrained how the model learned to orient attention optimally. We found that the PFC lesions were strongly correlated with a regularization parameter that dampened updates to attention after each trial. These models required more extensive training to learn to attend to category-relevant features. Together, these results suggest that the PFC is important for learning to orient attention to relevant stimulus information. For the dlPFC, this is especially critical for rule-based learning. For the vmPFC, this is especially critical to focus attention on deterministic features and to ignore less predictive features.

Materials and Methods

Participants

Ten participants with PFC lesions (five centered in the dlPFC and five centered in the vmPFC; five males and five females; Fig. 1) were selected and recruited from the Iowa Neurological Patient Registry. Each participant completed a comprehensive neuropsychological exam at least 3 months following lesion onset. Ten HCs were recruited from the Iowa City area. All procedures were approved by the Institutional Review Board at the University of Iowa. All participants provided informed consent in accordance with the Declaration of Helsinki and were compensated financially for participating.

Figure 1.

Prefrontal patients. Lesion overlap for dlPFC patients (red; n = 5) and vmPFC patients (blue; n = 5). Darker colors indicate voxels with more overlap.

Each lesion was verified using magnetic resonance imaging or computerized tomography and was quantified using the “MAP-3” lesion method (Damasio and Frank, 1992; Frank et al., 1997). Multiple experts manually traced each lesion slice-by-slice onto a template brain, and each lesion trace was checked by an additional researcher to ensure consistency. The extent of damage was determined by dividing the number of voxels comprising the traced lesion by the number of voxels within the region of interest, bilaterally, yielding the proportion of damage to each region.

General overview

Each participant completed a battery of three categorization tasks using laptop computers (Dell, 14″ screen, 1,280 × 720 resolution). Participants were given a 20 min break in between tasks, and all three tasks were completed in ∼4 h. Each task included a training phase in which participants learned to sort abstract stimuli into two categories and a testing phase in which participants generalized to novel stimuli. On every trial, a single stimulus was shown on the computer screen. The participants decided the category membership of that stimulus by pressing one of two keys on the keyboard (Fig. 2A; i.e., “p” or “q”). Visual feedback (i.e., “Right!” or “Wrong!”) was presented following each decision. Intertrial intervals ranged from 0.8 to 1.2 s in 0.1 s intervals. Participants were not told any details about the stimuli or how to categorize them. All experimental procedures were controlled by custom-written scripts using MATLAB (MathWorks).

Figure 2.

Methodology. A, Trial sequence for each task. Each trial was initiated by pressing the spacebar. Then, a category stimulus was presented on the screen, and the participant decided the category membership of that stimulus by pressing either “q” or “p” on the keyboard. Visual feedback (“Right!” or “Wrong!”) was presented after each decision. B, Stimulus sets used to examine RB and II categorization. The stimuli of each set varied along two continuous dimensions (left, the spatial frequency and orientation of black and white gratings; right, the size of black rectangles and the density of green pixels within them). All dimensions were scaled so that they ranged from 0 to 100. C, RB and II category tasks were created by placing normal distributions onto the two stimulus spaces. Each distribution constituted a category, and each point represented a unique category stimulus. The RB task had distributions perpendicular to a stimulus axis (top), and the II task had distributions that were rotated 45° (bottom). D, Testing stimuli (RB tasks, top; II tasks, bottom) were organized into grids to examine generalization to novel portions of the stimulus space. The dotted ellipses indicate the positions of the training distributions. E, Example stimuli for the DP task. The center circle provided deterministic category information, whereas the outer circles provided probabilistic category information (i.e., 75%). F, Example testing stimuli to examine generalization. “Trained” stimuli were the same stimuli presented during training. “Prototype” stimuli were the category's prototypes. “OnlyD” stimuli had the outer, probabilistic features obscured. “OnlyP” stimuli had the center, deterministic feature obscured. “Incongruent” stimuli had the deterministic features of one category and the probabilistic features of the other category.

Statistical analysis

Accuracy and choice reaction time were analyzed using linear mixed effects modeling (R, version 3.4.2). Full models for the training phases included fixed effects for experimental group, training bin (i.e., twenty trials per bin), and a quadratic function across bins, as well as random effects for slope, intercept, and the quadratic function. Full models for the testing phases included fixed effects for experimental group, trial type, and a linear function across trial types, as well as random effects for slope, intercept, and the linear function. A model simplification procedure was used to find the simplest model that fit the data. Specifically, random effects were systematically removed from the full model until the estimates were significantly different from the larger model before it (Broschard et al., 2019).
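As a concrete illustration of this simplification procedure, the R sketch below fits one plausible full model and one reduced model with lme4 and compares them with a likelihood-ratio test. The column names (accuracy, group, bin, subject) and the exact fixed-effect structure are assumptions for illustration, not the paper's actual analysis scripts.

```r
# Hypothetical data frame acc_data: one row per participant x 20-trial bin,
# with columns accuracy (proportion correct), group (HC/dlPFC/vmPFC),
# bin (1, 2, ...), and subject (participant ID).
library(lme4)

# Full model: fixed effects of group, bin, and a quadratic trend across bins;
# random intercept, slope, and quadratic term per participant.
full_model <- lmer(accuracy ~ group * (bin + I(bin^2)) + (bin + I(bin^2) | subject),
                   data = acc_data)

# One step of the simplification procedure: drop the random quadratic term and
# keep the reduced model only if it does not fit significantly worse.
reduced_model <- lmer(accuracy ~ group * (bin + I(bin^2)) + (bin | subject),
                      data = acc_data)
anova(reduced_model, full_model)  # likelihood-ratio test between nested fits
```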

RB and II categorization

RB and II category tasks have become classic paradigms to investigate category learning (Ashby and Maddox, 2011; Smith et al., 2012; Broschard et al., 2019). In these tasks, participants learn to categorize distributions of visual stimuli that vary along two continuous dimensions. Multiple fMRI studies have shown increased BOLD activity in the PFC during RB and II category learning (Nomura et al., 2007; Carpenter et al., 2016). Multiple medial and lateral prefrontal regions have been implicated, and there is little consistency as to which prefrontal subregions are involved. By examining these tasks in PFC patients, we sought to identify which prefrontal subregions are necessary for RB and II categorization.

The RB and II tasks are created by positioning normal distributions onto a two-dimensional stimulus space (Fig. 2B,C). Each distribution constitutes a category, and each point within the distribution represents a category stimulus. In the RB task, the distributions are perpendicular to a stimulus axis (e.g., Dimension 1; Fig. 2C; µXA = 35.86; µXB = 64.14; σX = 4.04; µY = 50.00; σY = 18.86). This task is typically learned by creating a unidimensional rule according to that dimension. For the II task, the distributions are not aligned with either stimulus axis. This task is typically learned by combining information from both stimulus dimensions. In the current experiment, each participant learned to categorize both the RB task and the II task (160 trials for each task). To prevent interference between task types, we created two unique stimulus sets (see below). Therefore, each participant learned the RB task with one stimulus set and the II task with the other stimulus set. Stimulus set and task order were randomized across participants.
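To make the construction concrete, here is a small R sketch that samples RB categories from the normal distributions given above and derives an II-style structure by rotating them 45° about the center of the stimulus space; the rotation step is our assumption about how the II distributions were positioned, and the variable names are hypothetical.

```r
set.seed(1)
n_per_cat <- 80  # arbitrary sample size for illustration

# RB categories: distributions separated along dimension 1 (values from the text)
sample_rb <- function(n, mu_x, mu_y = 50, sd_x = 4.04, sd_y = 18.86) {
  data.frame(dim1 = rnorm(n, mu_x, sd_x), dim2 = rnorm(n, mu_y, sd_y))
}
rb_a <- sample_rb(n_per_cat, mu_x = 35.86)
rb_b <- sample_rb(n_per_cat, mu_x = 64.14)

# II categories: rotate the RB distributions 45 degrees around (50, 50)
rotate45 <- function(stim, center = 50) {
  theta <- pi / 4
  x <- stim$dim1 - center; y <- stim$dim2 - center
  data.frame(dim1 = center + x * cos(theta) - y * sin(theta),
             dim2 = center + x * sin(theta) + y * cos(theta))
}
ii_a <- rotate45(rb_a)
ii_b <- rotate45(rb_b)
```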

Each stimulus set contained stimuli that varied along two continuous dimensions (Fig. 2B). Stimulus set 1 consisted of Gabor patches containing black and white gratings that varied in their spatial frequency (0.2532–1.2232 cpd) and orientation (0–1.75 radians). Stimulus set 2 consisted of rectangles that varied in their overall size (200 pixels by 100 pixels to 400 pixels by 200 pixels) and the density of green pixels randomly dispersed within them (5% density to 30% density). These stimulus sets were selected from previous studies (Ashby and Maddox, 2011; Smith et al., 2013) and were determined to have roughly equal salience in a pilot experiment. Each stimulus dimension was linearly transformed so that they had a common range (i.e., 0–100).

Testing phases were administered to examine category generalization (160 trials per task). These stimuli were configured into grids that expanded the training distributions to include novel stimuli (Fig. 2D). Typically, stimulus generalization improves with increasing distance from the category boundary (Broschard et al., 2019). Nondifferential feedback was given to the novel stimuli to discourage further learning (Broschard et al., 2019).

Characterizing behavioral strategies

We used decision boundary modeling (Maddox and Ashby, 1993) as a model-based approach to characterize the behavioral strategies used by each participant. This approach assumes that participants categorize stimuli by placing a decision boundary somewhere on the stimulus space; stimuli on each side of the boundary are assigned a unique category label. The shape of the decision boundary informs the type of strategy used by the participant (e.g., a unidimensional strategy vs a bidimensional strategy). Typically, multiple boundary types are each fit to the participant's choice data, and the best fitting model infers the participant's strategy.

For the current experiment, we assessed three distinct strategy types. These included (1) unidimensional strategies (one for each stimulus dimension) that categorized stimuli according to one stimulus dimension, (2) a bidimensional strategy that combined information from both stimulus dimensions, and (3) a control model that assumed the participant was guessing randomly. The optimal strategy for the RB task was a unidimensional strategy, and the optimal strategy for the II task was the bidimensional strategy. All models were fit to the participants’ choice data using the MATLAB function fmincon (Hélie et al., 2017). Model fits were compared by calculating BIC values (Neath and Cavanaugh, 2011) for each model; the model with the smallest BIC value was assumed to be the best fitting model.
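For readers unfamiliar with decision boundary modeling, the sketch below shows how one of these models (a unidimensional boundary with Gaussian noise) could be fit by maximum likelihood and scored with BIC. The paper fit these models in MATLAB with fmincon; this R re-implementation with optim, and the inputs stim (a trials × 2 matrix) and choices (1 = category “A”, 0 = category “B”), are assumptions for illustration.

```r
# Negative log-likelihood of a unidimensional decision-bound model:
# the participant responds "A" when the stimulus falls on one side of a
# criterion on the chosen dimension, with Gaussian noise on the boundary.
neg_ll_unidim <- function(par, stim, choices, dim = 1) {
  criterion <- par[1]
  noise     <- exp(par[2])                          # keep the noise term positive
  p_a <- pnorm((criterion - stim[, dim]) / noise)   # P(respond "A")
  p_a <- pmin(pmax(p_a, 1e-6), 1 - 1e-6)            # guard against log(0)
  -sum(choices * log(p_a) + (1 - choices) * log(1 - p_a))
}

fit <- optim(c(50, log(5)), neg_ll_unidim, stim = stim, choices = choices)
bic_unidim <- 2 * fit$value + length(fit$par) * log(length(choices))
# Fit the other unidimensional, bidimensional, and guessing models the same way,
# then take the model with the smallest BIC as the inferred strategy.
```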

To complement this model-based approach, we asked participants to self-report their strategy after each task. Strategies were classified as “unidimensional” or “bidimensional” depending on the number of reported stimulus dimensions. A range of terminology was accepted for each stimulus dimension, assuming the answer generally described that dimension. For example, “spatial frequency” was commonly reported as “the number of lines” or “line thickness.” All unclear or task-irrelevant responses were classified as “random guessing.”

DP categorization

Stimuli in the DP category task contained a mixture of features that provide (1) perfect category information (i.e., deterministic features) and (2) imperfect category information (i.e., probabilistic features). By including probabilistic features, DP categories could be learned using a variety of strategies. For example, children typically distribute attention across the deterministic features and the probabilistic features, whereas adults focus on the deterministic features and ignore the probabilistic features (Deng and Sloutsky, 2016). The propensity to use selective attention in adulthood is attributed to a developed PFC. We tested this notion by examining DP categorization in PFC patients.

The stimuli used for this task were direct replications of the stimuli used in Castro et al. (2020), who examined category learning in human adults and pigeons. The training stimuli (n = 30) were organized in a hexagon configuration and contained seven circles (one circle at each node and one circle at the center; Fig. 2E). Each circle was assigned 1 of 14 unique colors. Each color was determined to be discriminable to humans in a previous report (Castro et al., 2020). The color of the center circle provided complete category information (deterministic; e.g., orange always referred to category “A” and blue always referred to category “B”). The colors of the outer six circles provided incomplete category information (probabilistic; each color was 75% predictive of one category and 25% predictive of the other category). Because the number of participants was limited, precluding full counterbalancing, the color used for the deterministic cue was always orange or blue (Fig. 2E).
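The sketch below illustrates one way DP-style training stimuli could be generated; the mapping of colors to positions (one color pair per outer position, plus orange/blue for the center) is an assumption made for illustration, since the actual stimulus lists replicated Castro et al. (2020).

```r
set.seed(2)

# Build one training stimulus for a given category: a deterministic center
# color plus six outer colors that are each 75% predictive of the category.
make_dp_stimulus <- function(category) {
  det_color <- ifelse(category == "A", "orange", "blue")   # deterministic center
  other     <- ifelse(category == "A", "B", "A")
  outer <- sapply(1:6, function(pos) {
    typical  <- paste0("color", pos, "_", category)
    atypical <- paste0("color", pos, "_", other)
    sample(c(typical, atypical), 1, prob = c(0.75, 0.25))  # probabilistic cue
  })
  c(center = det_color, outer)
}

# 30 training stimuli, 15 per category
training_set <- lapply(rep(c("A", "B"), each = 15), make_dp_stimulus)
```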

Participants then completed a testing phase (120 trials) which probed novel stimuli. These novel stimuli were created by removing or rearranging features of the training stimuli (Fig. 2F). Generalization to these stimuli (or lack thereof) could inform which feature(s) were attended to. For example, “OnlyD” stimuli obscured the probabilistic features, and “OnlyP” stimuli obscured the deterministic feature. “Incongruent” stimuli contained conflicting information (e.g., the deterministic feature of category “A” but the probabilistic features of category “B”). Each of these trial types were compared with the training stimuli (“Training”) and the category prototypes (“Prototype”). Nondifferential feedback was given for the novel stimuli to prevent further learning (Broschard et al., 2019).

Similar to the RB and II tasks, each participant was asked to self-report their behavioral strategy. For each participant, we recorded whether the self-reported strategy included (1) the deterministic feature and (2) any of the probabilistic features. To complement these self-reports, we derived a measure that estimated the proportion of attention directed toward the deterministic feature compared with the probabilistic features. Specifically:

$$\mathrm{DetRatio} = \frac{D_P}{D_P + D_A},$$

where $D_P$ is the participant's average accuracy for testing conditions in which the deterministic feature was present (i.e., Prototype, Trained, and OnlyD) and $D_A$ is the participant's average accuracy for testing conditions in which the deterministic feature was absent (i.e., OnlyP and Incongruent). Scores close to 1.0 or 0.0 suggest that attention was mainly directed toward the deterministic feature or the probabilistic features, respectively. Scores ∼0.5 suggest that attention was roughly equivalent between the deterministic and probabilistic features.
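In code, this measure is a one-liner; the sketch below assumes a named vector of mean accuracies per testing condition (the example values are made up).

```r
# DetRatio = D_P / (D_P + D_A), using mean accuracy when the deterministic
# feature was present (Prototype, Trained, OnlyD) vs absent (OnlyP, Incongruent).
det_ratio <- function(test_acc) {
  d_present <- mean(test_acc[c("Prototype", "Trained", "OnlyD")])
  d_absent  <- mean(test_acc[c("OnlyP", "Incongruent")])
  d_present / (d_present + d_absent)
}

det_ratio(c(Prototype = 0.95, Trained = 0.93, OnlyD = 0.90,
            OnlyP = 0.55, Incongruent = 0.40))  # illustrative accuracies
```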

Computational modeling

We fit a computational model to the choice data from all three tasks simultaneously. This was leveraged to identify learning deficits in the patient groups.

Associative learning mechanism

When learning about categories, the learner's task is to assign a D-dimensional stimulus vector $e^{(i)}$ presented on the $i$th trial to one of C categories (i.e., category “A” or category “B”). To model the learning process, we assume new episodic traces are added to the underlying representation after every trial, a procedure consistent with the instance theory of automatization (Logan, 1988, 1992, 2002). This process is mathematically equivalent to a type of associative learning that strengthens the category representation at the location of the current stimulus. To do this, we assume that after each experience with a stimulus $e^{(i)}$, an episodic trace $x^{(i)} = [x_1^{(i)}, x_2^{(i)}, \ldots, x_D^{(i)}]$ (i.e., an exemplar) is stored such that $x^{(i)} = e^{(i)}$. Every experience is associated with feedback about the correct category membership $f^{(i)} \in \{1, 2, \ldots, C\}$.

Selective attention

Although learning can occur purely from the associative process that binds stimulus information to the correct category label, properly explaining learning effects in humans requires a more rapid learning process through selective attention. Here, a subset of stimulus features is encoded, stored, and used to make decisions. Each stimulus feature is assigned a nonnegative attention weight $\alpha_j^{(i)}$, which reflects the learned importance of that feature. Stimulus information from features with larger attention weights is prioritized and contributes more to each category decision (Kruschke, 1992; Galdo et al., 2022).

Stimulus categorization

When making decisions about category membership, a key process is comparing the current stimulus to the set of exemplars stored in memory. We used the similarity kernel from the well-established generalized context model (Nosofsky, 1986), which defines the distance between stimulus $e^{(i)}$ and the $n$th exemplar on the $j$th feature as follows:

$$d_j\big(e^{(i)}, x^{(n)}\big) = \alpha_j^{(i)} \, \big|e_j^{(i)} - x_j^{(n)}\big|,$$

where $\alpha_j^{(i)}$ denotes the selective attention applied on trial $i$ to dimension $j$. Then, the psychological similarity between the stimulus and the exemplar is defined as follows:

$$s_j\big(e^{(i)}, x^{(n)}\big) = \exp\!\left[-d_j\big(e^{(i)}, x^{(n)}\big)\right].$$

The overall activation of the nth exemplar is determined by multiplying these similarities across features. Specifically, activation is defined as follows:

$$a\big(e^{(i)}, x^{(n)}\big) = \prod_j s_j\big(e^{(i)}, x^{(n)}\big).$$

The probability of assigning the current stimulus to category “c” is defined as the ratio of the summed activations of exemplars with that category label to the summed activations across all exemplars. Specifically:

$$P(c) = \frac{\sum_n a\big(e^{(i)}, x^{(n)}\big)\, I\big(f^{(n)} = c\big)}{\sum_n a\big(e^{(i)}, x^{(n)}\big)},$$

where I(a) is an indicator function returning a 1 if condition a is true and a 0 if condition a is not true.
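Putting the distance, similarity, activation, and choice equations together, the R sketch below computes the predicted probability of a category response from a set of stored exemplars; the function and variable names are ours, not the authors' code.

```r
# Exemplar-based choice probability: attention-weighted distance per feature,
# exponential similarity, multiplicative activation across features, and a
# ratio rule over stored exemplars.
p_category <- function(stimulus, exemplars, labels, attention, target = "A") {
  activation <- apply(exemplars, 1, function(x) {
    d <- attention * abs(stimulus - x)   # d_j = alpha_j * |e_j - x_j|
    prod(exp(-d))                        # a = prod_j exp(-d_j)
  })
  sum(activation[labels == target]) / sum(activation)
}

# Tiny usage example: two stored exemplars on two dimensions, equal attention
p_category(stimulus  = c(0.4, 0.6),
           exemplars = rbind(c(0.3, 0.7), c(0.9, 0.1)),
           labels    = c("A", "B"),
           attention = c(1, 1))
```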

Attention as an optimization problem

At the beginning of training, the attention vector α is equivalent for all stimulus features (i.e., distributed attention across all stimulus features). Like the ALCOVE model (Kruschke, 1992), we assume that attention is updated after each trial to maximize accuracy. This process is formalized by defining error as a function of α, denoted loss(α). Then, α is optimized using gradient descent. In the AARM model (Galdo et al., 2022; Weichart et al., 2022b), gradient descent would iteratively adjust α so as to minimize the loss function:

$$\alpha_j^{(i+1)} = \alpha_j^{(i)} - \gamma \, \nabla_{\alpha_j^{(i)}} \mathrm{loss}\big(\alpha_j^{(i)}\big),$$

where γ>0 is a learning rate parameter. To specify the loss function, ALCOVE used the humble teaching rule, which is a modified version of a sum of squared error function. In Galdo et al. (2022), we used a cross-entropy loss function, which is more widely used and connected to statistical and machine learning norms of classification (Goodfellow et al., 2018). When using a Luce choice rule (e.g., a variant of a softmax rule), the cross-entropy loss function for a single trial is simply the negative log likelihood of making the correct categorization decision on that trial (Goodfellow et al., 2018). Hence, the above equation can be modified so that α maximizes the probability of making correct responses:

$$\alpha_j^{(i+1)} = \alpha_j^{(i)} + \gamma \, \nabla_{\alpha_j^{(i)}} \log\big(P(\mathrm{correct})\big),$$

where P(correct) denotes the probability of making the correct category response (i.e., a response that is consistent with the feedback).
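One simple way to realize this update numerically is to approximate the gradient of log P(correct) with finite differences, as sketched below; this reuses the p_category sketch above and is only an illustration (AARM itself uses analytic gradients).

```r
# Finite-difference approximation of the gradient of log P(correct) with
# respect to each attention weight (correct = the feedback-consistent label).
grad_log_p_correct <- function(stimulus, exemplars, labels, attention, correct,
                               eps = 1e-5) {
  log_p <- function(a) log(p_category(stimulus, exemplars, labels, a, target = correct))
  sapply(seq_along(attention), function(j) {
    a_plus <- attention
    a_plus[j] <- a_plus[j] + eps
    (log_p(a_plus) - log_p(attention)) / eps
  })
}
```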

Regularization and inhibition

As discussed in Galdo et al. (2022), learners often use efficient learning mechanisms to help guide selective attention over time. These mechanisms are consequences of the notion that attention is a limited resource and must be allocated efficiently (Chun et al., 2011). Two such strategies we investigated here were (1) regularization λ and (2) inhibition β. Regularization is the tendency to simplify a learning problem by attending to as few features as possible. This is simulated by biasing the attention updates to zero. A large regularization term restricts attention to only features that strongly predict an object's category membership. Inhibition is the tendency for stimulus features to compete through lateral inhibition. For instance, increasing attention to one feature corresponds to a decrease in attention to another, less predictive feature. As shown in Galdo et al. (2022), we can implement these two mechanisms by adding them to the updates to each αj(i) as follows:

$$\alpha_j^{(i+1)} = \alpha_j^{(i)} + \gamma\left[\nabla_{\alpha_j^{(i)}} \log\big(P(\mathrm{correct})\big) - \lambda\right] - \beta\left[\sum_{k \neq j} \nabla_{\alpha_k^{(i)}} \log\big(P(\mathrm{correct})\big) - \lambda(D-1)\right],$$

where D is the total number of dimensions. Here we can see that LASSO regularization simply pushes down updates to attention, whereas inhibition allows the dimensions to compete for attention—dimensions with larger gradients will gain more attention and suppress dimensions with smaller gradients.
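The full update rule above can then be written in a few lines; grad is the gradient vector of log P(correct) with respect to the attention weights (e.g., from the finite-difference sketch earlier), and the final clamp to nonnegative values reflects the model's assumption that attention weights cannot fall below zero.

```r
# Attention update with LASSO-style regularization (lambda) and lateral
# inhibition (beta), following the equation above.
update_attention <- function(alpha, grad, gamma, lambda, beta) {
  D <- length(alpha)
  new_alpha <- sapply(seq_len(D), function(j) {
    own   <- grad[j] - lambda                  # regularized own-dimension term
    other <- sum(grad[-j]) - lambda * (D - 1)  # competing dimensions (k != j)
    alpha[j] + gamma * own - beta * other
  })
  pmax(new_alpha, 0)  # keep attention weights nonnegative
}
```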

The hierarchical model

We develop a hierarchical version of this model, which has parameters at two levels: parameters that are associated with each participant and parameters that are associated with the group-level distributions. At the participant level, we subscript each parameter with an $S$ indicating that it is the parameter associated with the $S$th participant. Each participant's model contained five free parameters: a learning rate parameter $\gamma_S$ that determined the speed of learning, an inhibition term $\beta_S$, a regularization term $\lambda_S$, and two initial attention starting point terms, $\alpha_S^{0,C}$ for continuous features and $\alpha_S^{0,D}$ for discrete features. The initial attention starting point parameters determined the initial prioritization of certain stimulus dimensions (i.e., discrete or continuous).

Letting $RC_S$ denote the response data for the $S$th participant, where $RC_{S,i}$ denotes the $S$th participant's response on the $i$th trial, the likelihood can be written as follows:

$$L(\Theta_S \mid RC_S) = \prod_i \mathrm{Bernoulli}\!\left(RC_{S,i} \mid P(RC_{S,i})\right),$$

where $\Theta_S$ contains all of the model parameters associated with the $S$th participant and $P(RC_{S,i})$ is the model's predicted probability of making the same response as the participant on trial $i$ (i.e., $RC_{S,i}$).
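Because $P(RC_{S,i})$ is already the probability of reproducing the participant's observed response, the likelihood reduces to a product of those probabilities. A minimal sketch, with p_match as a hypothetical vector of the model's per-trial predicted probabilities of the observed responses:

```r
# Participant-level log-likelihood: summed log-probability of the observed responses
log_likelihood <- function(p_match) {
  p_match <- pmin(pmax(p_match, 1e-10), 1 - 1e-10)  # guard against log(0)
  sum(log(p_match))
}
```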

At the group level, we make appropriate transformations for each parameter such that their support ranges over $(-\infty, \infty)$. On this scale, we can specify how the participant-level parameters are connected to the group-level parameters. We use a normal distribution with mean $\mu_a$ and standard deviation $\sigma_a$, denoted $N(\mu_a, \sigma_a)$, for each parameter and supply a corresponding subscript for each $\mu$ and $\sigma$. For example, the group-level mean for the learning rate parameter $\gamma$ is denoted $\mu_\gamma$, and the group-level standard deviation is $\sigma_\gamma$. Using this notation, the prior distributions for each participant-level parameter are as follows:

$$\log(\gamma_S) \sim N(\mu_\gamma, \sigma_\gamma),$$
$$\log(\beta_S) \sim N(\mu_\beta, \sigma_\beta),$$
$$\log(\lambda_S) \sim N(\mu_\lambda, \sigma_\lambda),$$
$$\alpha_S^{0,C} \sim N(\mu_{\alpha^{0,C}}, \sigma_{\alpha^{0,C}}),$$
$$\alpha_S^{0,D} \sim N(\mu_{\alpha^{0,D}}, \sigma_{\alpha^{0,D}}).$$

To complete the hierarchical model in a Bayesian framework, we must specify prior distributions for each of the group-level parameters. Although we did convey some information in the priors, we tried to remain agnostic because we wanted to avoid unduly controlling model parameters for patients from the lesion group. As such, we specified the following priors for the group-level means:

$$\mu_\gamma \sim N(\log(0.1), 5),$$
$$\mu_\beta \sim N(\log(0.001), 5),$$
$$\mu_\lambda \sim N(\log(0.001), 5),$$
$$\mu_{\alpha^{0,C}} \sim N(0, 5),$$
$$\mu_{\alpha^{0,D}} \sim N(0, 5).$$

For the group-level standard deviations, we simply used an inverse gamma distribution, denoted $\Gamma^{-1}(a, b)$ with shape $a$ and rate $b$, such that:

$$\sigma_\gamma, \sigma_\beta, \sigma_\lambda, \sigma_{\alpha^{0,C}}, \sigma_{\alpha^{0,D}} \sim \Gamma^{-1}(4, 30).$$

The choice of a normal prior distribution for the group means and an inverse gamma distribution for the standard deviations was made for convenience because these choices facilitate a Gibbs sampling algorithm that was used to estimate the joint posterior distribution.
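For concreteness, the sketch below shows the conjugate Gibbs updates for one group-level parameter, assuming (as is standard for this conjugacy) that the inverse-gamma prior sits on the variance; the hyperparameter names and this exact parameterization are our assumptions rather than the authors' code.

```r
# One Gibbs sweep for a single group-level parameter: sample the group mean
# given the current variance, then the variance given the new mean.
# theta_s: vector of (transformed) participant-level values for this parameter.
gibbs_group_update <- function(theta_s, mu_prior_mean, mu_prior_sd,
                               ig_shape, ig_rate, sigma2_current) {
  S <- length(theta_s)
  # 1. Normal-normal conjugacy for the group mean
  prec      <- 1 / mu_prior_sd^2 + S / sigma2_current
  mean_post <- (mu_prior_mean / mu_prior_sd^2 + sum(theta_s) / sigma2_current) / prec
  mu_new    <- rnorm(1, mean_post, sqrt(1 / prec))
  # 2. Normal-inverse-gamma conjugacy for the group variance
  shape_post <- ig_shape + S / 2
  rate_post  <- ig_rate + sum((theta_s - mu_new)^2) / 2
  sigma2_new <- 1 / rgamma(1, shape = shape_post, rate = rate_post)
  c(mu = mu_new, sigma2 = sigma2_new)
}
```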

Given the likelihood and the full specification of priors, we can combine all of these terms to specify the joint posterior distribution. Letting π(a) denote the prior distribution corresponding to a given parameter a, Φ denote the set of all group-level parameters, and Θ denote the matrix of all participant-level parameters, the full joint posterior distribution of all model parameters is the following:

$$\pi(\Phi, \Theta \mid RC) \propto \pi(\Phi) \prod_S \pi(\Theta_S \mid \Phi)\, L(\Theta_S \mid RC_S).$$

Estimating the posterior distribution

Although we have fully specified the hierarchical model in writing the joint posterior distribution, estimating that distribution requires some numerical estimation techniques. To do this, we used a Gibbs sampling scheme to first draw samples from the conditional distribution of the group-level parameters given the participant-level parameters (i.e., $\pi(\Phi \mid \Theta)$) and then draw samples from the conditional distribution of the participant-level parameters given the group-level parameters (i.e., $\pi(\Theta_S \mid \Phi)$). To draw samples in the first step, we derived an analytic solution for the conditional distribution because the choice of a normal prior for the group-level mean and an inverse gamma distribution for the standard deviation are conjugate priors in a Gibbs sampling scheme (i.e., the mean conditional on the standard deviation). This made estimating the joint posterior distribution dramatically more efficient. For the second step, we used differential evolution with Markov chain Monte Carlo (DE-MCMC; ter Braak, 2006; Turner et al., 2013) to estimate the conditional distribution of the participant-level parameters given the group-level parameters. We used 24 chains operating in parallel for 1,000 iterations and discarded the first 100 iterations based on visual inspection. Hence, we used 21,600 samples as an estimate of the joint posterior distribution. We also used the optim function in R to first obtain sensible parameter values from the likelihood function for each subject. We ran optim for only 500 iterations to avoid inflating the standard deviation terms at the group level, and then we initialized our 24 chains by randomly perturbing the initial estimates. The combination of using a short optim run and then perturbing chains around these estimates gave us sensible starting points and enough diversity in the distribution of the chains to allow the particle interactions within the DE-MCMC algorithm to effectively explore the parameter space.
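The key ingredient of DE-MCMC is its crossover proposal: each chain proposes a move along the difference of two other randomly chosen chains plus a small jitter. A minimal sketch of that step is below (the tuning constants and matrix layout are assumptions; the accept/reject step and likelihood evaluations are omitted).

```r
# chains: a (number of chains) x (number of parameters) matrix of current states.
# Proposal for chain i, following ter Braak (2006).
de_proposal <- function(chains, i,
                        gamma_de = 2.38 / sqrt(2 * ncol(chains)),
                        jitter = 1e-4) {
  others <- sample(setdiff(seq_len(nrow(chains)), i), 2)
  chains[i, ] +
    gamma_de * (chains[others[1], ] - chains[others[2], ]) +
    runif(ncol(chains), -jitter, jitter)
}
# The proposal is then accepted or rejected with the usual Metropolis ratio on
# the conditional posterior of the participant-level parameters.
```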

Results

We recruited 10 patients with brain lesions centered in the dlPFC (n = 5; 3 men, 2 women) or vmPFC (n = 5; 2 men, 3 women). Figure 1 shows the lesion overlap for the dlPFC patients and the vmPFC patients; the location of maximum lesion overlap matched their assigned group (i.e., dlPFC and vmPFC, respectively). Each patient completed a comprehensive neuropsychological exam at least 3 months after lesion onset (Table 1). HCs (n = 10, 6 women) were recruited from the Iowa City area and were group matched to the vmPFC and dlPFC patients on age, gender, and education (mean years of age = 67.80; SEM = 3.82; mean years of education = 14.10; SEM = 0.52). There were no significant differences on any demographic variable between the PFC patients and the HCs (all ps > 0.05). Generally, the patients with bilateral lesions had larger impairments than patients with unilateral lesions; however, there were not sufficient numbers of participants to conduct formal analyses of bilaterality versus unilaterality. Two vmPFC patients had lesions that extended into the dlPFC. These patients generally had larger impairments than the patients with lesions contained within the vmPFC; however, the inclusion of these patients did not appreciably change the accuracy curves. Lastly, the accuracy curves were roughly equal for men and women participants; there were not sufficient numbers of participants to conduct formal analyses of men versus women.

Table 1.

Neuropsychological test scores for each patient at least 3 months after lesion onset

Group | Etiology | Lesion onset | dlPFC damage | vmPFC damage | Gender | Age | Education | FSIQ | IGT (CD−AB) | WSCT Cat | Stroop interference | Token test | BDI
dlPFC | R ischemic stroke | 2007 | 0.04 | 0 | M | 75 | 13 | 109 | N/A | 1 | N/A | 42 | 6
dlPFC | Bi AVM resection | 2010 | 0.24 | 0 | W | 48 | 13 | 107 | 18 | 6 | 48 | 44 | 3
dlPFC | L ischemic stroke | 1957 | 0.03 | 0 | W | 75 | 13 | 107 | N/A | N/A | N/A | 43 | N/A
dlPFC | L ischemic stroke | 2003 | 0.02 | 0 | M | 77 | 16 | 108 | 52 | 6 | 43 | 44 | 6
dlPFC | Bi ischemic stroke | 2001 | 0.11 | 0 | M | 47 | 12 | 87 | N/A | 0 | N/A | N/A | 4
vmPFC | Bi benign tumor resection | 1999 | 0.24 | 0.66 | W | 75 | 13 | 118 | −44 | 6 | 52 | 43 | 4
vmPFC | Bi subarachnoid hemorrhage; aneurysm clipping | 1981 | 0 | 0.28 | W | 72 | 14 | 109 | −14 | 6 | 55 | 44 | 1
vmPFC | L subarachnoid hemorrhage; aneurysm clipping | 1998 | 0 | 0.29 | M | 71 | 16 | 104 | −14 | 6 | 38 | 40 | 3
vmPFC | Bi benign tumor resection | 1999 | 0.37 | 0.51 | M | 81 | 14 | 119 | −50 | 6 | 65 | 44 | 0
vmPFC | R benign tumor resection | 2008 | 0.03 | 0.18 | W | 75 | 16 | 120 | 72 | 6 | 41 | 44 | 0

FSIQ, Full Scale Intelligence Quotient from the WAIS-IV; IGT, Iowa Gambling Task Net Score from Advantageous decks minus Disadvantageous decks; WSCT, Wisconsin Card Sorting Task Number of Categories (out of six); BDI, Beck Depression Inventory-II.

RB and II categorization

The RB and II tasks contained distributions of visual stimuli that varied along two continuous dimensions (Fig. 2C; Ashby et al., 1998; Smith et al., 2012; Broschard et al., 2019). RB tasks are typically learned by creating a unidimensional rule according to one dimension. II tasks are typically learned by combining information from both dimensions (Ashby and Maddox, 2011).

RB and II category learning

Two stimulus sets were used in the current experiment to allow for a within-subjects design. Each participant learned the RB task with one stimulus set and the II task with the other stimulus set. The stimulus sets and task order were randomized across participants. These counterbalances did not affect performance (Stimulus set. RB task: t(8.00) = −0.69, p = 0.510; II task: t(8.00) = −2.03, p = 0.086. Task order. RB task: t(8.00) = 1.72, p = 0.123; II task: t(8.00) = 0.20, p = 0.846).

Across all participants, performance improved significantly across training trials for both task types, suggesting that the participants reliably learned the category tasks. Specifically, accuracy increased across training bins (Fig. 3A; RB tasks: t(154.41) = 3.55, p < 0.001; II tasks: t(155.03) = 2.63, p = 0.009), and choice reaction time decreased across training bins (RB tasks: t(157.04) = −3.31, p = 0.001; II tasks: t(156.96) = −2.97, p = 0.003). Performance was significantly better for the RB task compared with that for the II task (accuracy: t(322.91) = 52.10, p < 0.001; choice reaction time: t(322.23) = −42.80, p < 0.001). This learning advantage for the RB task has been observed in previous studies (Smith et al., 2012) and suggests that humans use simple category rules.

Figure 3.

RB and II categorization. A, Accuracy for the RB tasks (left) and the II tasks (right). The dlPFC patients and the vmPFC patients were impaired at learning both task types compared with the HCs. Impairments in the RB task were larger for the dlPFC patients compared with the vmPFC patients. B, Choice reaction time for the RB tasks (left) and the II tasks (right). The dlPFC patients, and not the vmPFC patients, had longer reaction times for both task types compared with the HCs. C, The proportion of category “A” responses for the testing stimuli at each distance along the axis perpendicular to the category boundary. The slopes of the generalization curves were shallower for the patient groups compared with the HCs. Similar to the training phase, impairments in the RB task were larger for the dlPFC patients compared with the vmPFC patients. All error bars indicate SEM. * indicates statistical significance (p < 0.05).

Accuracy was significantly lower in both patient groups compared with the HCs (Fig. 3A). Impairments were significant for both task types (RB tasks. dlPFC: t(151.31) = −11.21, p < 0.001; vmPFC: t(151.31) = −2.66, p = 0.009. II tasks. dlPFC: t(127.07) = −5.29, p < 0.001; vmPFC: t(127.07) = −3.74, p < 0.001), suggesting that both prefrontal subregions are important for RB and II category learning. Impairments in the RB task were significantly larger for the dlPFC patients compared with those for the vmPFC patients (t(151.73) = −7.99, p < 0.001), suggesting that the dlPFC is especially critical for rule-based learning. Differences between the patient groups during the II task were not significant (t(153.93) = −1.37, p = 0.172).

Choice reaction time was significantly longer for the dlPFC patients compared with that for the HCs (Fig. 3B). This was true for both task types (RB task: t(157.59) = 6.29, p < 0.001; II task: t(163.96) = 2.36, p = 0.019), suggesting that the dlPFC patients required more time to make each category decision. There were no significant differences in choice reaction time between the HCs and the vmPFC patients (RB task: t(157.59) = 1.80, p = 0.074; II task: t(163.96) = 1.05, p = 0.296). This suggests that the reaction time effects were specific to the dlPFC patients. Differences in reaction time between the patient groups were significant during the RB task (t(156.21) = −4.22, p < 0.001), but not the II task (t(155.18) = 1.27, p = 0.207). Together, these results suggest that both subregions are critical for RB and II category learning. The dlPFC may be especially important for rule-based learning as well as making category decisions.

RB and II category generalization

Testing blocks were included to examine category generalization in each task (Fig. 2D). The stimuli used were organized into a grid that expanded the training distributions to probe novel portions of the stimulus space. For each participant, we quantified the proportion of category “A” responses at each distance along the axis perpendicular to the category boundary (i.e., the category-relevant axis). As expected, the probability of choosing category “A” dramatically decreased along this axis (RB task: t(161.00) = 5.38, p < 0.001; II task: t(161.00) = 4.18, p < 0.001) with sharp inflection points at the category boundaries (Fig. 3C). Compared with the HCs, the slopes of the generalization curves were significantly lower for both patient groups and each task type (RB task: dlPFC, t(161.00) = −8.89, p < 0.001; vmPFC, t(161.00) = −3.61, p < 0.001. II task: dlPFC, t(161.00) = 4.30, p < 0.001; vmPFC, t(161.00) = 2.32, p = 0.022), suggesting that category generalization was impaired. Similar to training, the dlPFC patients had significantly larger impairments than the vmPFC patients during the RB task (t(161.00) = −4.51, p < 0.001), but not the II task (t(161.00) = 1.69, p = 0.093). Together, these results are consistent with the training results and suggest that both subregions are important for RB and II categorization. Additionally, the dlPFC may be especially critical for rule-based categorization.

Behavioral strategies during RB and II category learning

We used decision boundary modeling to characterize the behavioral strategies used during RB and II categorization (Hélie et al., 2017). This was accomplished by fitting three unique strategy types to each participant's training data: unidimensional models that sorted stimuli along one stimulus dimension, a bidimensional model that sorted stimuli along two dimensions, and a control model that assumed the participant was guessing randomly. The winning strategy for each participant was determined according to the model with the lowest BIC value.

Table 2 shows the best fitting strategy for each participant. The majority of the HCs used the tasks’ optimal strategies (i.e., a unidimensional strategy for the RB task and the bidimensional strategy for the II task). This was also true for the vmPFC patients [vmPFC patients vs HCs. RB task: χ2(1, N = 18) = 1.80, p = 0.180; II task: χ2(1, N = 18) = 0.00, p = 1.00]. The dlPFC patients were significantly less likely to use the optimal strategies compared with the HCs [RB task: χ2(1, N = 18) = 10.13, p = 0.001; II task: χ2(1, N = 18) = 4.00, p = 0.046]. These results suggest that learning impairments in the dlPFC group may be attributed to an inability to find and maintain the appropriate strategy.

Table 2.

Strategies used for each task determined by decision boundary modeling and self-report

Group | RB task: DBM | RB task: Self-report | II task: DBM | II task: Self-report | DP task: DBM | DP task: Det ratio
dlPFC | 1D R | 1D R | 1D | 1D | Center | 0.77
dlPFC | 1D I | 1D I | RGM | RGM | Center | 0.76
dlPFC | RGM^a | 2D^a | 2D | 2D | Center | 0.8
dlPFC | RGM | RGM | RGM | RGM | Outer and center | 0.65
dlPFC | RGM | RGM | 1D | 1D | Center | 0.71
vmPFC | 1D R | 1D R | 2D | 2D | Outer and center | 0.46
vmPFC | 1D R | 1D R | 1D | 1D | Outer | 0.41
vmPFC | RGM | RGM | 2D | 2D | Center | 0.8
vmPFC | 1D R | 1D R | 1D | 1D | Outer and center | 0.54
vmPFC | 1D R | 1D R | 2D | 2D | Center | 0.83
HC | 1D R | 1D R | 2D | 2D | Center | 0.77
HC | 1D R | 1D R | 2D | 2D | Center | 0.8
HC | 1D R | 1D R | 1D | 1D | Center | 0.77
HC | 1D R | 1D R | 1D^a | 2D^a | Outer and center | 0.56
HC | 1D R | 1D R | 2D | 2D | Outer and center | 0.67
HC | 1D R | 1D R | 2D | 2D | Center | 0.91
HC | 1D R | 1D R | 2D | 2D | Center | 0.85
HC | 1D R | 1D R | 1D^a | 2D^a | Outer and center | 0.61
HC | 1D R | 1D R | 2D | 2D | Center | 0.83
HC | 1D R | 1D R | 2D | 2D | Center | 0.8
HC | 1D I | 1D I | RGM^a | 1D^a | Center | 0.78
HC | RGM^a | 1D R^a | 2D | 2D | Center | 0.8

Gray boxes indicate the correct strategy. DBM, decision boundary modeling; RGM, random guessing model; 1D, unidimensional strategy; 2D, bidimensional strategy; 1D R, unidimensional strategy along the relevant stimulus dimension; 1D I, unidimensional strategy along the irrelevant stimulus dimension; Det, deterministic cue; Prob, probabilistic cues. Det ratio is the ratio of accuracy for conditions that the deterministic feature was present (i.e., Prototype, Trained, and Only D) and conditions that the deterministic feature was absent (i.e., Only P and Incongruent). Values closer to 1 indicate larger attention to the deterministic feature.

^a Indicates participants where the DBM results conflict with the self-report.

To support these results, we asked each participant to self-report their strategy after each task. Importantly, the self-reported strategies were strongly consistent with the results of the decision boundary modeling. The dlPFC patients (and not the vmPFC patients) were less likely to report the tasks’ optimal strategies compared with the HCs [Fig. 3D; RB task: dlPFC, χ2(1, N = 18) = 10.52, p < 0.001; vmPFC, χ2(1, N = 18) = 0.28, p = 0.596. II task: dlPFC, χ2(1, N = 18) = 6.78, p = 0.009; vmPFC, χ2(1, N = 18) = 1.80, p = 0.180]. This supports our finding that the dlPFC patients had difficulty finding the correct strategies.

DP categorization

Next, we examined performance on the DP task, which tested stimuli that contained a mixture of features that (1) always predicted category membership (i.e., deterministic) and (2) provided partial category information (i.e., probabilistic; Fig. 2E).

DP category learning

For all participants, performance improved across training bins (Fig. 4A,B; accuracy: t(113.91) = 3.20, p = 0.002; choice reaction time: t(113.26) = −5.01, p < 0.001), suggesting that the participants reliably learned the DP task. Compared with the HCs, performance was impaired for the vmPFC patients (accuracy: t(121.87) = −2.87, p = 0.005; choice reaction time: t(121.55) = 2.10, p = 0.027), but not the dlPFC patients (accuracy: t(121.87) = −1.65, p = 0.102; choice reaction time: t(121.55) = −0.61, p = 0.540). This suggests that the vmPFC, and not the dlPFC, is important for learning the DP task. Performance differences between the vmPFC patients and the dlPFC patients were not significant (accuracy: t(112.03) = 1.18, p = 0.242; choice reaction time: t(109.81) = −1.63, p = 0.106).

Figure 4.

DP categorization. A, The vmPFC patients, and not the dlPFC patients, were impaired at learning the DP task compared with the HCs. B, The vmPFC patients, and not the dlPFC patients, had longer choice reaction times during training compared with the HCs. C, Generalization for the HCs and dlPFC patients was consistent with a strategy that focused attention on the deterministic feature. Specifically, generalization was high when the deterministic feature was present (i.e., “Trained”, “Prototype”, “OnlyD”) and low when the deterministic feature was absent (i.e., “OnlyP”) or provided incongruent information (i.e., “Incongruent”). The vmPFC patients were more likely to distribute attention across all features. Specifically, the vmPFC patients had higher accuracy when generalization relied on the probabilistic features (i.e., “OnlyP” and “Incongruent”). All error bars indicate SEM. * indicates statistical significance (p < 0.05).

DP category generalization

Testing blocks examined category generalization by presenting novel stimuli that obscured or rearranged the feature information of the training stimuli (Fig. 2F). Specifically, “OnlyD” stimuli removed the probabilistic features, and “OnlyP” stimuli removed the deterministic feature. “Incongruent” stimuli contained conflicting information (e.g., the deterministic feature of category “A” and the probabilistic features of category “B”). These trial types were compared with the training stimuli (“Trained”) as well as the category prototypes (“Prototype”).

For the HCs, performance was consistent with the strategy to direct attention toward the deterministic feature (Fig. 4C). Specifically, generalization was high for stimuli that included the deterministic feature (i.e., “Trained” vs “Prototype”: t(100.57) = 0.60, p = 0.552; “Trained” vs “OnlyD”: t(100.57) = −0.15, p = 0.882), and generalization was low for stimuli in which the deterministic feature was absent (“Trained” vs “Only P”: t(100.57) = −5.08, p < 0.001; “Trained” vs “Incongruent”: t(100.57) = −10.26, p < 0.001). There were no significant differences between the HCs and the dlPFC patients (all p > 0.05), suggesting that the dlPFC patients also directed attention toward the deterministic feature.

Stimulus generalization for the vmPFC patients was consistent with a strategy that distributed attention across all stimulus features (Fig. 4C). Specifically, accuracy in the vmPFC patients was significantly higher than the other groups when (1) the deterministic feature was removed (i.e., “OnlyP” stimuli; vmPFC patients vs HCs: t(100.57) = 2.03, p = 0.041; vmPFC patients vs dlPFC patients: t(100.57) = −2.45, p = 0.034) and (2) the deterministic feature provided incongruent information (“Incongruent”: vmPFC vs HCs: t(100.57) = 3.03, p = 0.003; vmPFC vs dlPFC: t(100.57) = −3.04, p = 0.003). All other trial types were not significant (all p > 0.05). This suggests that impairments in the vmPFC patients may be attributed to an inability to focus attention to the deterministic feature.

Behavioral strategies used during the DP task

Table 2 shows the self-reported strategies during the DP task. All participants were able to verbalize the colors of the feature(s) they used to categorize the stimuli. We recorded whether each participant's strategy included a description of the deterministic features and/or the probabilistic features. As predicted, the majority of the HCs and the dlPFC patients only reported using the deterministic feature. Conversely, the vmPFC patients were more likely to report using both the deterministic feature and the probabilistic features.

To assess whether these self-reported strategies were related to the participants’ behavior, we derived a proxy measure that estimated the proportion of attention directed toward the deterministic feature. Scores close to 1 suggest the participants were focusing attention to the deterministic feature, whereas scores close to 0.5 suggest the participants were attending to both the deterministic and probabilistic features equally. To validate this measure, we first compared scores of participants with different strategy types. Scores were significantly smaller for participants that self-reported using both deterministic and probabilistic features than participants that self-reported using only the deterministic feature (t test; t(20) = 8.11, p < 0.001), suggesting that the participants’ self-reported strategies were strongly related to their behavior. The vmPFC patients, but not the dlPFC patients, had significantly lower scores than the HCs (vmPFC: t(15) = 2.19, p = 0.045; dlPFC: t(15) = 0.50, p = 0.622; vmPFC vs dlPFC: t(8) = 1.43, p = 0.191), suggesting that the vmPFC patients directed less attention toward the deterministic feature. These results support our interpretation that the impairments in the vmPFC group were attributed to an inability to focus attention to the deterministic feature.

Hierarchical modeling

We used hierarchical modeling to examine learning deficits in the patient groups. For each participant, the model was fit to data from all three categorization tasks. The model included five free parameters: a learning rate parameter, a regularization parameter, a competition parameter, and two parameters that set the starting values of the attention weights (i.e., one for continuous dimensions and one for discrete features).

Figure 5A shows the average model estimates from the training and testing data of each task. Our model successfully captured the participants’ behavior in all tasks. There were slight differences between the participants’ behavior and the model; however, the model replicated critical dissociations in the patient groups. Specifically, impairments for the dlPFC patients were largest during the RB task, and impairments for the vmPFC patients were largest during the DP task.

Figure 5.

Computational modeling. A, Average model estimates for the training and testing phases of each task. The hierarchical model successfully fit the participants’ behavior and captured dissociations between the dlPFC patients and the vmPFC patients. B, Pairwise correlations between the model's parameter estimates and the proportion of damage in the dlPFC and vmPFC. There were strong positive correlations between PFC damage and the model's regularization parameter, suggesting that the PFC damage limited the ability to update attention after every trial. Additionally, there were negative correlations between dlPFC damage and the initial attention weights for continuous dimensions as well as between vmPFC damage and the initial attention weights for discrete dimensions. Note: The parameter estimates from the healthy comparisons are shown in the figure only as a reference to allow for visual comparison to the other groups; the estimates were not used in the correlation analysis presented in the main text.

We next tested whether the model's parameter estimates were related to the size of each PFC lesion (Fig. 5B). For each model parameter, we calculated Pearson's correlation between the estimated parameter values and the amount of PFC damage. Correlations were calculated for the proportion of dlPFC damage and vmPFC damage separately, and the HCs were not included in the correlations. Each correlation included all 10 patients to maximize statistical power. By collapsing the patients across the assigned groups, we obtained a more generalized relationship between the parameters and each PFC subregion, as many patients had nonzero damage to both regions.
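In R, each of these tests is a single call; the sketch below assumes a hypothetical data frame patients with one row per patient, containing posterior-mean parameter estimates and proportional damage values.

```r
# Pearson correlations between the regularization estimates and lesion extent,
# computed separately for dlPFC and vmPFC damage across all 10 patients.
cor.test(patients$lambda_est, patients$dlpfc_damage, method = "pearson")
cor.test(patients$lambda_est, patients$vmpfc_damage, method = "pearson")
```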

There were strong positive correlations between the proportion of PFC damage and the regularization parameter. The regularization parameter dampened updates to attention after each trial. With strong regularization, it is difficult for the model to learn to attend to relevant stimulus information. Additionally, there were negative correlations between the proportion of PFC damage and the initial starting values of the attention weights. These effects were task specific, such that lower attention to the continuous dimensions was correlated with larger dlPFC damage and lower attention to the discrete features was correlated with larger vmPFC damage. These parameters determine the initial prioritization of certain dimensions. This suggests that the dlPFC may be specialized for learning about continuous dimensions and the vmPFC may be specialized for learning about discrete dimensions. All other parameters were not correlated with PFC damage (all ps > 0.05).

Discussion

It is generally accepted that the PFC is critical for category learning; however, there are few studies comparing the roles of different prefrontal subregions. To our knowledge, the current experiment is the second study to examine category learning in PFC patients (Schnyer et al., 2009) and the first to dissociate the prefrontal subregions. We administered a battery of categorization tasks to PFC patients with lesions centered in the dlPFC or vmPFC. Our results suggest that both subregions are critical for categorization. However, each subregion may be specialized to learn specific task structures. The dlPFC was especially important for rule-based learning, whereas the vmPFC was especially important for attending to fully diagnostic features while ignoring less predictive features. Hierarchical modeling of the participants’ choice data found that PFC damage was strongly correlated with an inability to learn to attend to category-relevant stimulus information.

RB and II categorization

During the RB and II tasks, participants learned to categorize distributions of visual stimuli according to a unidimensional rule (RB) or by combining information from two stimulus dimensions (II). The dlPFC and vmPFC patients had impaired performance in both task types compared with the HCs (Fig. 3).

Categorization models posit that the PFC is important for learning RB tasks (and not II tasks), because rule-based learning requires executive functions to select and apply category rules (Ashby and Maddox, 2011). This hypothesis is supported by multiple behavioral experiments showing that disrupting executive functions like working memory differentially affects RB learning (Maddox and Ashby, 2004; Ashby and Maddox, 2011). Additionally, Nomura et al. (2007) found that PFC BOLD activity was stronger for correct trials than that for incorrect trials in participants learning the RB tasks, but not the II tasks. In the current experiment, the dlPFC patients had much larger impairments during the RB tasks than the II tasks, supporting the prediction that the dlPFC is especially important for rule-based learning.

Conversely, neural dissociations between the RB and II tasks are not always definitive. Neuroimaging (Carpenter et al., 2016) and neuropsychological (Schnyer et al., 2009) experiments have shown that the PFC is also recruited to learn the II tasks. Additionally, Roark and Chandrasekaran (2023) showed that working memory capacity was positively correlated with accuracy for both tasks, suggesting that executive functions also benefit learning the II tasks. In the current experiment, there were multiple impairments that were consistent across task types. First, the dlPFC patients had slower reaction times, suggesting the dlPFC may serve a general role in category-based decision-making (Heffernan et al., 2021; Hutcherson and Tusche, 2022). Second, the dlPFC patients were less likely to use the tasks’ optimal strategies, supporting the role of the PFC in maintaining goal-directed strategies (Barraclough et al., 2004; Genovesio et al., 2005). Together, our results indicate that the dlPFC may be especially important for learning the RB tasks, but the dlPFC and vmPFC were both critical to learn the RB and II tasks.

DP categorization

The DP task contained stimuli with a mixture of deterministic features (i.e., perfect category information) and probabilistic features (i.e., imperfect category information). The vmPFC patients (and not the dlPFC patients) were impaired during this task compared with the HCs (Fig. 4). Accuracy was not significantly different between the dlPFC patients and the vmPFC patients, suggesting that this dissociation was not as strong as it was for the RB task.

Adding probabilistic features allowed for multiple viable learning strategies (e.g., selective attention toward the deterministic feature or distributed attention across all features). The testing phase of this task allowed us to differentiate between these strategies by presenting probe stimuli in which either the probabilistic or the deterministic features were removed. Generalization behavior for the HCs and the dlPFC patients was consistent with a strategy that focused attention on the deterministic feature: generalization was high when the deterministic feature was present and low when it was absent. Conversely, the vmPFC patients were more likely to distribute attention across all features. It has been hypothesized that using selective attention in the DP task depends on a mature PFC (Deng and Sloutsky, 2016; Castro et al., 2020). The current experiment provides the first definitive evidence supporting this hypothesis.

There are many potential reasons why the vmPFC patients did not maintain attention to the deterministic feature. One possibility is that these patients simply did not learn that the deterministic feature was category relevant. This seems unlikely, since most of the vmPFC patients included the deterministic feature in their self-reported strategy (Table 2). Instead, we propose that the vmPFC patients were unable to prioritize the deterministic feature over the probabilistic features. This could reflect a deficit in accurately estimating the relative value associated with each feature (Fellows and Farah, 2007; Gläscher et al., 2009). These patients could also have decision-making deficits, such that the attention weights were not correctly assigned to each feature (Bechara et al., 2000; Kroker et al., 2022).

Hierarchical modeling

Hierarchical modeling was used to further assess learning deficits in the patient groups. For each participant, we modeled behavior from all three tasks; therefore, parameter differences between the patients and HCs were likely attributable to underlying learning deficits rather than task-specific effects. The model successfully fit the participants’ behavior and captured critical dissociations between the patient groups (Fig. 5).
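As a brief illustration of this structure (a generic sketch of the hierarchical form; the symbols here are chosen for exposition rather than taken from the fitted model), each participant s is assigned a single parameter vector θ_s that governs behavior on all three tasks, and the participant-level parameters are assumed to be drawn from group-level distributions, for example θ_s ~ Normal(μ_group, σ_group²). Because one θ_s must account for all three tasks, differences in θ_s between patients and HCs reflect general learning tendencies rather than idiosyncrasies of any single task.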

This model included two mechanisms (i.e., regularization and competition) that constrained how attention was learned. The proportion of PFC damage was strongly correlated with the regularization parameter, which limited attention updating by biasing the attention weights toward zero. These results suggest that the impairments in the patient groups were likely attributable to an inability to learn to orient attention to category-predictive stimulus information. This is consistent with the general role of the PFC in directing behavior according to task-relevant goals (Freedman et al., 2001; Mack et al., 2020).
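To give a concrete sense of how these two constraints interact, the following minimal Python sketch implements a simplified attention update with a shrinkage (regularization) term and a normalization step standing in for competition; the parameter names and the specific normalization are illustrative assumptions rather than the authors’ implementation.

import numpy as np

def update_attention(w, step, eta=0.10, lam=0.05):
    # Error-driven step: move attention toward category-predictive dimensions.
    # The "- lam * w" term is the regularization described above; it biases the
    # weights back toward zero, so a larger lam makes attention harder to learn.
    w = w + eta * step - lam * w
    w = np.clip(w, 0.0, None)   # attention weights stay non-negative
    return w / w.sum()          # competition: dimensions share a fixed attention budget

# Example: feedback consistently favors dimension 0. With a larger lam,
# attention to dimension 0 grows more slowly across trials.
w = np.array([0.5, 0.5])        # equal starting attention to two dimensions
step = np.array([1.0, -1.0])    # hypothetical error-driven update direction
for _ in range(20):
    w = update_attention(w, step)
print(w)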

Damage to the dlPFC and damage to the vmPFC were both correlated with this regularization term, suggesting functional overlap between the subregions. This supports the findings of Weichart et al. (2022a), who argue that multiple brain structures work together to update attention. Specifically, they showed that trial-by-trial updates to a participant's attention weights were correlated with BOLD activity in a variety of brain regions, including the PFC, parietal cortex, hippocampus, and visual cortex. We suspect that the dlPFC and vmPFC both contribute to a broader network that orients attention to relevant information (e.g., the frontoparietal control network; Spreng et al., 2010; Zanto and Gazzaley, 2013).

Specialized roles for the dlPFC and vmPFC

Despite the functional overlap in the modeling results, there were dissociations between the patient groups suggesting that each subregion makes a specialized contribution to category learning (Gläscher et al., 2012). Generally, the dlPFC patients had larger impairments during the RB and II tasks, whereas the vmPFC patients had larger impairments during the DP task. This difference was captured in the computational model by the parameters that estimated the attention starting points (Fig. 5B). Lesion size in the dlPFC was negatively correlated with initial attention to continuous stimulus dimensions (used in the RB and II tasks), whereas lesion size in the vmPFC was negatively correlated with initial attention to discrete stimulus features (used in the DP task). Because these parameters determine the initial importance of each stimulus dimension, this pattern suggests that the dlPFC may prioritize learning about continuous dimensions and the vmPFC may prioritize learning about discrete features.

Specialized functioning of the dlPFC and vmPFC may depend on differences in connectivity with the rest of the brain. For instance, category learning models of the dlPFC often highlight interactions with the basal ganglia (Ashby et al., 1998), in which the basal ganglia learn simple associations and train the dlPFC to abstract overarching rules (Antzoulatos and Miller, 2011). Conversely, models of the vmPFC highlight interactions with the hippocampus (Love and Gureckis, 2007; Mack et al., 2018), in which the hippocampus stores memory representations and the vmPFC manipulates these representations according to task goals. Future experiments could examine these systems and test whether they communicate with separate prefrontal subregions.

Limitations

The current study had multiple limitations that should be considered when interpreting the results. Each analysis relied on a small number of PFC patients, in part because of the inherent difficulty of recruiting patients with lesions constrained to specific PFC subregions. A small sample size may limit how well the results generalize to the population; future experiments with larger samples will be important to replicate and validate these findings. Relatedly, the analyses that correlated the model parameter estimates with lesion size should be interpreted cautiously, because correlations from small samples are more susceptible to outliers. Additionally, although the participant self-reports did not indicate any perceptual issues, it is possible that impairments during the DP task were related to variability in participants’ ability to verbalize the colors. Finally, the computational modeling focused on mechanisms of attention learning, which we determined to be necessary to adequately describe the participants’ data; future modeling efforts could examine other learning mechanisms important for categorization.

Data Availability

All data will be made available upon reasonable request.

References

1. Antzoulatos EG, Miller EK (2011) Differences between neural activity in prefrontal cortex and striatum during learning of novel abstract categories. Neuron 71:243–249. 10.1016/j.neuron.2011.05.040
2. Ashby FG, Alfonso-Reese LA, Turken AU, Waldron EM (1998) A neuropsychological theory of multiple systems in category learning. Psychol Rev 105:442–481. 10.1037//0033-295X.105.3.442
3. Ashby FG, Maddox WT (2011) Human category learning 2.0. Ann N Y Acad Sci 1224:147–161. 10.1111/j.1749-6632.2010.05874
4. Barraclough D, Conroy M, Lee D (2004) Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci 7:404–410. 10.1038/nn1209
5. Bechara A, Tranel D, Damasio H (2000) Characterization of the decision-making deficit of patients with ventromedial prefrontal cortex lesions. Brain 123:2189–2202. 10.1093/brain/123.11.2189
6. Benoit RG, Szpunar KK, Schacter DL (2014) Ventromedial prefrontal cortex supports affective future simulation by integrating distributed knowledge. Proc Natl Acad Sci U S A 111:16550–16555. 10.1073/pnas.1419274111
7. Bowman CR, Iwashita T, Zeithamova D (2020) Tracking prototype and exemplar representations in the brain across learning. Elife 9:e59360. 10.7554/elife.59360
8. Bowman CR, Zeithamova D (2018) Abstract memory representations in the ventromedial prefrontal cortex and hippocampus support concept generalization. J Neurosci 38:2605–2614. 10.1523/jneurosci.2811-17.2018
9. Broschard MB, Kim J, Love BC, Wasserman EA, Freeman JH (2019) Selective attention in rat visual category learning. Learn Mem 26:84–92. 10.1101/lm.048942.118
10. Broschard MB, Kim J, Love BC, Wasserman EA, Freeman JH (2021) Prelimbic cortex maintains attention to category-relevant information and flexibly updates category representations. Neurobiol Learn Mem 185:107524. 10.1016/j.nlm.2021.107524
11. Bunge SA (2004) How we use rules to select actions: a review of evidence from cognitive neuroscience. Cogn Affect Behav Neurosci 4:564–579. 10.3758/cabn.4.4.564
12. Carpenter KL, Wills AJ, Benattayallah A, Milton F (2016) A comparison of the neural correlates that underlie rule-based and information-integration category learning. Hum Brain Mapp 37:3557–3574. 10.1002/hbm.23259
13. Castro L, Savic O, Navarro V, Sloutsky VM, Wasserman EA (2020) Selective and distributed attention in human and pigeon category learning. Cognition 204:104350. 10.1016/j.cognition.2020.104350
14. Chun MM, Golomb JD, Turk-Browne NB (2011) A taxonomy of external and internal attention. Annu Rev Psychol 62:73–101. 10.1146/annurev.psych.093008.100427
15. Cromer JA, Roy JE, Miller EK (2010) Representation of multiple, independent categories in the primate prefrontal cortex. Neuron 66:796–807. 10.1016/j.neuron.2010.05.005
16. Damasio H, Frank R (1992) Three-dimensional in vivo mapping of brain lesions in humans. Arch Neurol 49:137–143. 10.1001/archneur.1992.00530260037016
17. Deng WS, Sloutsky VM (2016) Selective attention, diffused attention, and the development of categorization. Cogn Psychol 91:24–62. 10.1016/j.cogpsych.2016.09.002
18. Fellows LK, Farah MJ (2007) The role of ventromedial prefrontal cortex in decision making: judgment under uncertainty or judgment per se? Cereb Cortex 17:2669–2674. 10.1093/cercor/bhl176
19. Frank RJ, Damasio H, Grabowski TJ (1997) Brainvox: an interactive, multimodal visualization and analysis system for neuroanatomical imaging. Neuroimage 5:13–30. 10.1006/nimg.1996.0250
20. Freedman DJ, Riesenhuber M, Poggio T, Miller EK (2001) Categorical representation of visual stimuli in the primate prefrontal cortex. Science 291:312–316. 10.1126/science.291.5502.312
21. Galdo M, Weichart ER, Sloutsky VM, Turner BM (2022) The quest for simplicity in human learning: identifying the constraints on attention. Cogn Psychol 138:101508. 10.1016/j.cogpsych.2022.101508
22. Genovesio A, Brasted PJ, Mitz AR, Wise SP (2005) Prefrontal cortex activity related to abstract response strategies. Neuron 47:307–320. 10.1016/j.neuron.2005.06.006
23. Gläscher J, Adolphs R, Damasio H, Bechara A, Rudrauf D, Calamia M, Paul LK, Tranel D (2012) Lesion mapping of cognitive control and value-based decision making in the prefrontal cortex. Proc Natl Acad Sci U S A 109:14681–14686. 10.1073/pnas.1206608109
24. Gläscher J, Hampton AN, O'Doherty JP (2009) Determining a role for ventromedial prefrontal cortex in encoding action-based value signals during reward-related decision making. Cereb Cortex 19:483–495. 10.1093/cercor/bhn098
25. Goodfellow I, Bengio Y, Courville A (2018) Deep learning. MITP.
26. Heffernan EM, Adema JD, Mack ML (2021) Identifying the neural dynamics of category decisions with computational model-based functional magnetic resonance imaging. Psychon Bull Rev 28:1638–1647. 10.3758/s13423-021-01939-4
27. Hélie S, Turner BO, Crossley MJ, Ell SW, Ashby FG (2017) Trial-by-trial identification of categorization strategy using iterative decision-bound modeling. Behav Res Methods 49:1146–1162. 10.3758/s13428-016-0774-5
28. Hutcherson CA, Tusche A (2022) Evidence accumulation, not ‘self-control’, explains dorsolateral prefrontal activation during normative choice. Elife 11:e65661. 10.7554/elife.65661
29. Koscik TR, Tranel D (2012) The human ventromedial prefrontal cortex is critical for transitive inference. J Cogn Neurosci 24:1191–1204. 10.1162/jocn_a_00203
30. Kroker T, et al. (2022) Noninvasive stimulation of the ventromedial prefrontal cortex modulates rationality of human decision-making. Sci Rep 12:20213. 10.1038/s41598-022-24526-6
31. Kruschke JK (1992) ALCOVE: an exemplar-based connectionist model of category learning. Psychol Rev 99:22–44. 10.1037/0033-295X.99.1.22
32. Logan GD (1988) Toward an instance theory of automatization. Psychol Rev 95:492–527. 10.1037/0033-295X.95.4.492
33. Logan GD (1992) Shapes of reaction-time distributions and shapes of learning curves: a test of the instance theory of automaticity. J Exp Psychol Learn Mem Cogn 18:883–914. 10.1037/0278-7393.18.5.883
34. Logan GD (2002) An instance theory of attention and memory. Psychol Rev 109:376–400. 10.1037/0033-295X.109.2.376
35. Love BC, Gureckis TM (2007) Models in search of a brain. Cogn Affect Behav Neurosci 7:90–108. 10.3758/cabn.7.2.90
36. Mack ML, Love BC, Preston AR (2018) Building concepts one episode at a time: the hippocampus and concept formation. Neurosci Lett 680:31–38. 10.1016/j.neulet.2017.07.061
37. Mack ML, Preston AR, Love BC (2020) Ventromedial prefrontal cortex compression during concept learning. Nat Commun 11:46. 10.1038/s41467-019-13930-8
38. Maddox WT, Ashby FG (1993) Comparing decision bound and exemplar models of categorization. Percept Psychophys 53:49–70. 10.3758/bf03211715
39. Maddox WT, Ashby FG (2004) Dissociating explicit and procedural-learning based systems of perceptual category learning. Behav Processes 66:309–332. 10.1016/j.beproc.2004.03.011
40. Mian MK, Sheth SA, Patel SR, Spiliopoulos K, Eskandar EN, Williams ZM (2012) Encoding of rules by neurons in the human dorsolateral prefrontal cortex. Cereb Cortex 24:807–816. 10.1093/cercor/bhs361
41. Miller EK, Freedman DJ, Wallis JD (2002) The prefrontal cortex: categories, concepts and cognition. Philos Trans R Soc Lond B Biol Sci 357:1123–1136. 10.1098/rstb.2002.1099
42. Mok RM, Love BC (2022) An abstract neural representation of category membership beyond information coding stimulus or response. J Cogn Neurosci 34:1719–1735. 10.1162/jocn_a_01651
43. Neath AA, Cavanaugh JE (2011) The Bayesian information criterion: background, derivation, and applications. WIREs Comp Stats 4:199–203. 10.1002/wics.199
44. Nomura E, Maddox W, Filoteo J, Ing A, Gitelman D, Parrish T, Mesulam M-M, Reber P (2007) Neural correlates of rule-based and information-integration visual category learning. Cereb Cortex 17:37–43. 10.1093/cercor/bhj122
45. Nosofsky RM (1986) Attention, similarity, and the identification-categorization relationship. J Exp Psychol Gen 115:39–57. 10.1037/0096-3445.115.1.39
46. Park SA, Miller DS, Nili H, Ranganath C, Boorman ED (2020) Map making: constructing, combining, and inferring on abstract cognitive maps. Neuron 107:1226–1238. 10.1016/j.neuron.2020.06.030
47. Rehder B, Hoffman AB (2005) Eyetracking and selective attention in category learning. Cogn Psychol 51:1–41. 10.1016/j.cogpsych.2004.11.001
48. Reinert S, Hübener M, Bonhoeffer T, Goltstein PM (2021) Mouse prefrontal cortex represents learned rules for categorization. Nature 593:411–417. 10.1038/s41586-021-03452-z
49. Roark CL, Chandrasekaran B (2023) Stable, flexible, common, and distinct behaviors support rule-based and information-integration category learning. npj Sci Learn 8:14. 10.1038/s41539-023-00163-0
50. Schlichting ML, Preston AR (2016) Hippocampal–medial prefrontal circuit supports memory updating during learning and post-encoding rest. Neurobiol Learn Mem 134:91–106. 10.1016/j.nlm.2015.11.005
51. Schnyer DM, Maddox WT, Ell S, Davis S, Pacheco J, Verfaellie M (2009) Prefrontal contributions to rule-based and information-integration category learning. Neuropsychologia 47:2995–3006. 10.1016/j.neuropsychologia.2009.07.011
52. Seger CA, Cincotta CM (2006) Dynamics of frontal, striatal, and hippocampal systems during rule learning. Cereb Cortex 16:1546–1555. 10.1093/cercor/bhj092
53. Smith JD, et al. (2012) Implicit and explicit categorization: a tale of four species. Neurosci Biobehav Rev 36:2355–2369. 10.1016/j.neubiorev.2012.09.003
54. Smith JD, Boomer J, Zakrzewski AC, Roeder JL, Church BA, Ashby FG (2013) Deferred feedback sharply dissociates implicit and explicit category learning. Psychol Sci 25:447–457. 10.1177/0956797613509112
55. Spalding KN, Schlichting ML, Zeithamova D, Preston AR, Tranel D, Duff MC, Warren DE (2018) Ventromedial prefrontal cortex is necessary for normal associative inference and memory integration. J Neurosci 38:3767–3775. 10.1523/jneurosci.2501-17.2018
56. Spreng RN, Stevens WD, Chamberlain JP, Gilmore AW, Schacter DL (2010) Default network activity, coupled with the frontoparietal control network, supports goal-directed cognition. Neuroimage 53:303–317. 10.1016/j.neuroimage.2010.06.016
57. ter Braak CJF (2006) A Markov chain Monte Carlo version of the genetic algorithm differential evolution: easy Bayesian computing for real parameter spaces. Stat Comput 16:239–249. 10.1007/s11222-006-8769-1
58. Turner BM, Sederberg PB, Brown S, Steyvers M (2013) A method for efficiently sampling from distributions with correlated dimensions. Psychol Methods 18:368–384. 10.1037/a0032222
59. Wallis JD, Anderson KC, Miller EK (2001) Single neurons in prefrontal cortex encode abstract rules. Nature 411:953–956. 10.1038/35082081
60. Weichart ER, Evans DG, Galdo M, Bahg G, Turner BM (2022a) Distributed neural systems support flexible attention updating during category learning. J Cogn Neurosci 34:1761–1779. 10.1162/jocn_a_01882
61. Weichart ER, Galdo M, Sloutsky VM, Turner BM (2022b) As within, so without; as above, so below: common mechanisms can support between- and within-trial learning dynamics. Psychol Rev 129:1104–1143. 10.1037/rev0000381
62. Zanto TP, Gazzaley A (2013) Fronto-parietal network: flexible hub of cognitive control. Trends Cogn Sci 17:602–603. 10.1016/j.tics.2013.10.001
63. Zeithamova D, Dominick AL, Preston AR (2012) Hippocampal and ventral medial prefrontal activation during retrieval-mediated learning supports novel inference. Neuron 75:168–179. 10.1016/j.neuron.2012.05.010
64. Zhang X, et al. (2023) Adaptive stretching of representations across brain regions and deep learning model layers. bioRxiv.
