Author manuscript; available in PMC: 2012 Jan 1.
Published in final edited form as: J Exp Psychol Learn Mem Cogn. 2011 Jan;37(1):1–27. doi: 10.1037/a0021330

Response-Time Tests of Logical-Rule Models of Categorization

Daniel R Little 1, Robert Nosofsky 2, Stephen E Denton 2
PMCID: PMC3081391  NIHMSID: NIHMS251061  PMID: 21058874

Abstract

A recent resurgence in logical-rule theories of categorization has motivated the development of a class of models that predict not only choice probabilities but also categorization response times (RTs; Fifić, Little & Nosofsky, 2010). The new models combine mental-architecture and random-walk approaches within an integrated framework, and predict detailed RT-distribution data at the level of individual subjects and individual stimuli. To date, however, tests of the models have been limited to validation tests in which subjects were provided with explicit instructions to adopt particular processing strategies for implementing the rules. The present research tests conditions in which categories are learned via induction over training exemplars and where subjects are free to adopt whatever classification strategy they choose. In addition, the research explores how variations in stimulus formats, involving either spatially separated or overlapping dimensions, influence processing modes in rule-based classification tasks. In conditions involving spatially separated dimensions, strong evidence is obtained for application of logical-rule strategies operating in a serial-self-terminating processing mode. In conditions involving spatially overlapping dimensions, preliminary evidence is obtained that a mixture of serial and parallel processing underlies the application of rule-based classification strategies. The logical-rule models fare considerably better than major extant alternative models in accounting for the categorization RTs.


A classic idea in the study of concept learning is that people learn and represent certain kinds of categories by forming simple, logical rules (Bourne, 1970; Bruner, Goodnow, & Austin, 1956; Levine, 1975; Trabasso & Bower, 1968). Through the years, however, a variety of alternative models of multidimensional categorization, including prototype (Posner & Keele, 1968; Reed, 1972), exemplar (Medin & Schaffer, 1978; Nosofsky, 1986), and decision-bound models (Ashby & Townsend, 1986) have come into prominence. Nevertheless, the idea that logical rules may underlie numerous types of category representations has certainly not disappeared, and rule-based models continue to be proposed, at least as components of fuller systems (e.g., Ashby, Alfonso-Reese, Turken, & Waldron, 1998; Erickson & Kruschke, 1998; Feldman, 2000; Goodman, Tenenbaum, Feldman, & Griffiths, 2008; Nosofsky, Palmeri, & McKinley, 1994). Until very recently, however, a major limitation of logical-rule models was that they did not provide detailed and rigorous accounts of categorization response times (RTs). By contrast, a variety of exemplar and decision-bound models have provided excellent accounts of such data (e.g., Ashby & Maddox, 1994; Cohen & Nosofsky, 2003; Lamberts, 1995, 1998, 2000; Maddox & Ashby, 1996; Nosofsky & Palmeri, 1997). This limitation of rule-based models is significant, because it is often extremely difficult to distinguish between categorization models based on analysis of choice-probability data alone (e.g., Nosofsky & Johansen, 2000).

Thus, an important recent direction is that several researchers have now formulated logical-rule accounts of categorization RTs (Bradmetz & Mathy, 2008; Fifić, Little & Nosofsky, 2010; Lafond, Lacouture & Cohen, 2009; for earlier related approaches, see Martin & Caramazza, 1980; Trabasso, Rollins, & Shaughnessy, 1971). Bradmetz and Mathy (2008) introduced a model that assumes that the overall classification RT associated with simple Boolean concepts (e.g., Feldman, 2000, 2006; Shepard, Hovland & Jenkins, 1961) is the time taken to “decompress” the concept into a series of component binary decisions involving its features. The decompression time is determined by the number of component decisions that are involved and the order in which these decisions are processed. They assume that decisions about the values of features are processed by “agents” that communicate decisions in either a fixed-order serial fashion or in a dynamic (mixed-order) fashion. Across several two-dimensional and three-dimensional Boolean concepts, the mean classification RTs were generally consistent with the assumption that each decision was carried out in a fixed serial order. However, in all cases, their RT measures were based on small numbers of observations for each stimulus averaged across subjects. In addition, for these category structures the predictions of the serial model were nearly identical to predictions derived from the exemplar-based generalized context model (GCM; Nosofsky, 1986). Hence, strong conclusions could not be reached about whether participants were actually engaging in logical-rule processing.

Serial-rule models such as those proposed by Bradmetz and Mathy (2008) are isomorphic to models that assume that choices are made by progressing through the nodes of an ordered decision tree. Building on earlier work of Trabasso et al. (1971), Lafond et al. (2009) recently developed and tested a class of such models. In their decision-tree models, choices are made by making decisions about the values of discrete features. Each feature is represented by a branch on the decision tree and free parameters are estimated to represent the decision time associated with individual branches. In general, rules combining a greater number of sequential decisions take longer to complete. Lafond et al. fitted data at the individual-subject level; however, like Bradmetz and Mathy, they fitted only mean RTs. Although members of the class of decision-tree models compared favorably with exemplar models in their study, there are some limitations in reaching very strong conclusions for the operation of rule-based processing. First, the model comparisons all relied on measures of quantitative fit, and there were no a priori qualitative predictions for contrasting the models. Second, to achieve good fits, Lafond et al. needed to “patch” the predictions from their sequential decision-tree model with other free parameters. Although they provided reasonable interpretations for the emergence of these parameters, a fully specified rule-based process model was not provided.

In contrast to the rule-based models just discussed, Fifić et al. (2010) have proposed a set of logical-rule models that are capable of explaining the time course of categorization not only at the level of mean RTs but at the level of full RT distributions. As described more fully in the next section, these models combine sequential-sampling and mental-architecture models of RT within an integrated framework (for related approaches in other domains, see Eidels, Donkin, Brown, & Heathcote, in press; Palmer & Mclean, 1995; Ratcliff, 1978; Thornton & Gilden, 2007). In brief, the models assume that subjects make independent decisions about the values that stimuli have along each of their dimensions. The time course of these independent decisions is governed by separate, individual random-walk processes (Busemeyer, 1985; Luce, 1986). The outcomes of the independent decisions are then combined via alternative mental architectures (Schweickert, 2002; Sternberg, 1969; Townsend, 1984, 1990) to determine whether or not the logical rule that defines a category structure has been satisfied. For example, as described more fully in the next section, the individual-dimension random walks may be executed in either a serial, parallel, or coactive fashion. Furthermore, the observer may use either exhaustive or self-terminating stopping rules in evaluating the evidence about whether a logical rule has been satisfied.

One of the key contributions of Fifić et al. (2010) was that they provided a set of qualitative contrasts for differentiating alternative processing versions of the rule-based models from one another. That is, within the framework of the paradigm they tested, each rule model led to its own unique RT signature. Furthermore, these signatures also allowed for strong qualitative contrasts between the predictions from the rule-based models and major alternative models of classification RT, such as exemplar and decision-bound models. Moreover, as will be seen later in this article, by considering the detailed shapes of individual-stimulus RT distributions, the rule-based models can be contrasted even with extremely general models that do not posit forms of multiple-stage, rule-based processing.

The aims of the current research are two-fold. First, the previous experiments conducted by Fific et al. (2010) were intended only as validation tests of the newly proposed logical-rule models. In particular, subjects were always provided with explicit knowledge of the rule-based structure of the categories. Furthermore, in most cases, subjects were provided with explicit instructions for use of a serial self-terminating strategy for implementing the logical rules. Not surprisingly, under those extreme conditions, the qualitative predictions from the serial self-terminating rule model were observed, and the model provided an excellent quantitative fit to the detailed RT-distribution data that were obtained in the experiments. In a nutshell, these previous experiments simply provided validation that the new models and experimental paradigm formed valuable tools: If subjects do indeed use logical rules as a basis for classification, then the behavior can be sharply identified through fits of the models to the RT data. The first purpose of the present research, therefore, is to put the tools to use. In Experiment 1, subjects are again tested in Fific et al.'s classification paradigm. Now, however, they receive no instructions regarding the rule-based category structure. Furthermore, they receive no instructions regarding the type of processing strategy that they should use. Instead, subjects are required to learn the categories from scratch by induction over individually presented training exemplars and are free to adopt whatever classification strategy they choose. The question is whether or not we will find evidence for use of any of the logical-rule strategies under these open-ended learning and performance conditions.

The second purpose of the new research, which we pursue in Experiment 2, involves the types of stimuli that subjects are required to classify. In the experiments conducted by Fific et al. (2010) and Lafond et al. (2009), the stimuli were composed of highly separable dimensions located in spatially separated regions. The reason for following this research strategy was that it made more likely the possibility that subjects might process the dimensions in a one-by-one serial fashion, and this form of serial examination of the dimensions seems conducive to logical rule-based classification strategies. As described earlier, however, a key idea is that evaluation of logical rules may not demand serial processing of dimensions. Instead, under certain experimental conditions, observers may use forms of parallel or coactive processing in evaluating whether or not a logical classification rule has been satisfied. To investigate this possibility, in Experiment 2 we used stimuli composed of spatially overlapping dimensions, making more plausible, for example, that forms of parallel processing of the dimensions might take place. Thus, the new research is aimed at investigating the variety of types of information processing that may underlie the application of logical-rule-based categorization strategies.

In the following section, we provide a more detailed review of the rule-based models of classification RT proposed by Fific et al. (2010), and a review of the experimental paradigm and sets of qualitative contrasts used for distinguishing among the models. We then evaluate the models in the new experiments, both by testing their a priori qualitative predictions and by quantitatively fitting the models to the detailed sets of individual-stimulus RT distributions obtained in the tasks.

Logical, rule-based models

Because a full presentation of the logical-rule models has already been provided by Fific et al. (2010), in this section we provide only a brief review. To explain the workings of the models, it is convenient to make reference to the experimental paradigm illustrated in Figure 1 (left panel), which underlies the work reported in this article. The stimuli vary along two continuous dimensions, x and y, with three values per dimension. The dimension values are combined orthogonally to yield a 9-member stimulus set. The stimuli in the upper-right quadrant of the stimulus space are members of the “target” category (A), whereas the remaining stimuli are members of the “contrast” category (B).

Figure 1.


Left panel: Schematic illustration of the category structure used for testing the logical-rule models. The stimuli are composed of two dimensions, x and y, with three values per dimension, combined orthogonally to produce the nine members of the stimulus set. The stimuli in the upper-right quadrant of the space are the members of the “target” category (A), whereas the remaining stimuli are the members of the “contrast” category (B). Right panel: Shorthand nomenclature for identifying the main stimulus types in the category structure. H and L refer to the high- and low-salience dimension values, respectively. R = redundant stimulus; I = interior stimulus; E = exterior stimulus.

When we say that a subject adopts a “logical-rule strategy,” we mean that the subject makes independent decisions about the values of a stimulus along each of its dimensions, and then combines those decisions using logical connectives such as AND, OR, and NOT to determine if the stimulus belongs to a given category (Ashby & Gott, 1988; Nosofsky, Clark, & Shin, 1989). In the Figure-1 example, a simple way of describing Category A is in terms of a conjunctive rule: A stimulus is a member of Category A if it has value greater than or equal to x1 on Dimension x AND value greater than or equal to y1 on Dimension y. Conversely, a stimulus is a member of Category B if it satisfies a complementary disjunctive rule, i.e., the stimulus has value less than x1 on Dimension x OR value less than y1 on Dimension y. According to the present rule models, the subject processes the stimuli so as to determine which rule has been satisfied and then emits the appropriate categorization response.

Specifically, the models assume that the subject establishes decision boundaries along each dimension, as illustrated in Figure 1, and determines to which side of the boundary each individual dimension value of a presented stimulus falls (Ashby & Townsend, 1986). The decision process along each individual dimension is modeled in terms of an elementary random-walk process. As illustrated in the top panel of Figure 2, there is a random walk counter with initial value zero. The subject establishes criteria with values +A and −B that determine how much evidence is needed to make an A or a B decision on each individual dimension. Along each of the dimensions, there is associated with each stimulus a normal distribution of percepts (Figure 2, bottom panel). On each step of the random walk, a percept is sampled from the stimulus's distribution. For Dimension x, if the percept falls to the right of the decision bound (i.e., in Region A), then the random walk takes a unit step in the direction of criterion +A, whereas if the percept falls to the left of the bound then the random walk takes a unit step in the direction of −B. The sampling process continues until either criterion +A or −B is reached. An analogous random-walk process takes place for making independent decisions on Dimension y. The decision time along each dimension is determined by the number of steps required to complete each individual random walk. Note that, in general, stimuli with values that lie far from the decision bound (e.g., value x2 in Figure 2) will lead to faster and more accurate independent decisions than those that lie close (e.g., value x1 in Figure 2), because the random walk will take more consistent steps toward the appropriate response criterion.
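The single-dimension process just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the percept means (1.0 for a value near the bound, 2.0 for a value far from it), the percept standard deviation (0.5), the bound location (0.5), and the criteria (3 steps each) are all hypothetical choices made for the example.

```python
import random
from statistics import NormalDist, mean


def step_probability(percept_mean, percept_sd, bound):
    """Probability that one sampled percept falls to the right of the bound (Region A)."""
    return 1.0 - NormalDist(percept_mean, percept_sd).cdf(bound)


def simulate_walk(percept_mean, percept_sd, bound, crit_a, crit_b, rng):
    """Run one random walk on a single dimension.

    The counter starts at zero and moves +1 for each percept in Region A
    and -1 otherwise, until it reaches +crit_a or -crit_b.
    Returns (decision, number_of_steps).
    """
    counter, steps = 0, 0
    while -crit_b < counter < crit_a:
        percept = rng.gauss(percept_mean, percept_sd)
        counter += 1 if percept > bound else -1
        steps += 1
    return ("A" if counter >= crit_a else "B"), steps


def mean_steps(percept_mean, n_trials=2000, seed=1):
    """Average number of steps over simulated walks (illustrative settings only)."""
    rng = random.Random(seed)
    return mean(
        simulate_walk(percept_mean, 0.5, 0.5, 3, 3, rng)[1] for _ in range(n_trials)
    )
```

Calling `mean_steps(2.0)` and `mean_steps(1.0)` compares a value far from the bound with one close to it; under these hypothetical settings the far value should need fewer steps on average, which is the pattern the paragraph describes.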

Figure 2.


Schematic illustration of the random-walk process that governs decision-making on each individual dimension. In the illustration, x1 is the presented stimulus value. Percepts sampled from distribution x1 that fall to the right side of the decision boundary lead the random walk on dimension x to take steps towards criterion +A.

To make an overall categorization response, the subject needs to determine which logical rule is satisfied. Thus, the results from the individual-dimension random walks must be combined. For example, an object is judged to be a member of the target category only if both the x and y random walks lead to Region-A decisions. The formal models make allowance for the possibility that different mental architectures and stopping rules are used for combining the outputs of the individual-dimension random walks. First, the random walks may operate in either serial (one at a time) or parallel (simultaneous) fashion.1 Second, the observer may use either a self-terminating or exhaustive stopping rule. For example, if the first completed random walk leads to a decision that the object lies in Region B along a dimension, then processing may self-terminate, because the disjunctive-rule defining Category B has already been satisfied. By contrast, if an exhaustive stopping rule is used, then the final classification response is not made until both random walks have been completed, regardless of the outcome of any earlier decisions. Importantly, throughout this article, when we refer to “self-terminating” models, we mean that processing self-terminates only when it has the logical option to do so. Of course, even self-terminating models presume that exhaustive processing takes place in those cases in which it is logically required. For example, suppose that an observer employs a self-terminating stopping rule and that stimulus x0y2 from the contrast category (B) is presented (see Figure 1). If the random-walk process on Dimension x finishes first and correctly determines that x0 falls in Region B, then processing will self-terminate, as explained above. 
But if the random-walk process on Dimension y finishes first, then there is insufficient information to decide if the stimulus belongs to Category A or B (i.e., both the target and contrast categories include stimuli with value y2 on Dimension y). In this case, the observer must also process Dimension x before he or she is able to make the classification response. Finally, note that regardless of whether an observer is using a self-terminating or exhaustive stopping rule, processing is always exhaustive in cases in which correct classification of a member of the target category (A) takes place. That is, for the present paradigm, exhaustive processing is a logical requirement to verify that the conjunctive rule that defines the target category has been satisfied.
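The way the two single-dimension outcomes are combined can be made concrete with a short sketch. It assumes the conjunctive rule for Category A and the disjunctive rule for Category B described above; the function name `combine_decisions` and its arguments are hypothetical, and decision times are passed in as plain numbers rather than simulated.

```python
def combine_decisions(dec_x, t_x, dec_y, t_y, architecture, self_terminating, first="x"):
    """Combine two single-dimension outcomes into (response, decision_time).

    dec_x, dec_y: "A" or "B", the region chosen by each random walk.
    t_x, t_y: time taken by each walk.
    first: which dimension a serial process examines first.
    """
    # Conjunctive rule for A; anything else satisfies the disjunctive rule for B.
    response = "A" if (dec_x == "A" and dec_y == "A") else "B"

    if architecture == "serial":
        if first == "x":
            (d1, t1), t2 = (dec_x, t_x), t_y
        else:
            (d1, t1), t2 = (dec_y, t_y), t_x
        # A Region-B decision on the first dimension already satisfies the B rule.
        if self_terminating and d1 == "B":
            return "B", t1
        return response, t1 + t2  # serial times add

    if architecture == "parallel":
        b_times = [t for d, t in ((dec_x, t_x), (dec_y, t_y)) if d == "B"]
        # Stop as soon as any walk reaches a Region-B decision.
        if self_terminating and b_times:
            return "B", min(b_times)
        return response, max(t_x, t_y)  # wait for the slower walk

    raise ValueError(f"unknown architecture: {architecture!r}")
```

For the x0y2 example in the text, `combine_decisions("B", 4, "A", 10, "serial", True, first="x")` stops after the first walk, whereas `first="y"` forces both dimensions to be processed. A correct target-category response is exhaustive under either stopping rule.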

Combining the possibilities outlined above, we consider models that assume either serial-self-terminating, serial-exhaustive, parallel-self-terminating, or parallel-exhaustive processing. We consider as well a fifth possibility, based on coactive processing (e.g., Miller, 1982), in which sampled perceptual information from Dimensions x and y is pooled into a single, common random walk (Fific et al., 2010). This model is discussed at greater length in the final section of our introduction.

To implement the models, we introduce various simplifying assumptions and parameter constraints. First, because the stimuli used in our experiments vary along highly separable dimensions (Shepard, 1964), we assume that the perceptual distributions along Dimensions x and y are independent for each individual stimulus. Furthermore, the mean and variance of each stimulus's perceptual distribution along Dimension x depend only on whether it is composed of dimension value x0, x1, or x2; and likewise for Dimension y. (These respective assumptions are referred to as “perceptual independence” and “perceptual separability” within the General Recognition Theory of Ashby & Townsend, 1986). Within each dimension, the variances of the perceptual distributions are assumed to be equal; however, to allow for the possibility that one dimension is more discriminable than the other, separate perceptual variance parameters are allowed for Dimension x ($\sigma_x^2$) and Dimension y ($\sigma_y^2$). Because we will be fitting RT distributions, we assume that there is a log-normal distribution of residual times with mean $\mu_R$ and variance $\sigma_R^2$ (corresponding, for example, to encoding and motor-execution stages).2 We also need to estimate a free scaling parameter k that specifies the time in ms for taking each individual random-walk step. Finally, the serial self-terminating model requires a free parameter $p_x$ indicating the proportion of times that the dimensions are processed in the order x-then-y (rather than y-then-x). In sum, the logical rule models use the following free parameters: the perceptual-variance parameters $\sigma_x^2$ and $\sigma_y^2$; decision-bound locations on Dimensions x and y ($D_x$ and $D_y$); random-walk criteria +A and −B; residual-stage parameters $\mu_R$ and $\sigma_R^2$; time-scaling parameter k; and, for the serial self-terminating model, the order-probability parameter $p_x$.

Given the stated assumptions, the experimental paradigm illustrated in Figure 1 is a highly diagnostic one for telling the models apart and for contrasting them with major extant alternatives in the field. Although we will ultimately be fitting all of the models to detailed RT-distribution data, it turns out that sharp distinctions exist among the models even at the level of mean RTs. In particular, each model yields its own unique signature of the predicted pattern of mean RTs across the target and contrast categories. (These predictions assume that error rates are fairly low, a requirement that will be satisfied in our ensuing experiments.)

To help illustrate the predictions, we make use of the notation shown in the right panel of Figure 1. For the target category (A), the stimuli are designated by their discriminability with respect to the category boundaries: the low-low (LL) stimulus is close to the boundary on both dimensions and hence has low discriminability on both dimensions. By contrast, the high-high (HH) stimulus has high discriminability on both dimensions. The LH and HL stimuli have low discriminability on one dimension but high discriminability on the other. Assuming that decision making is faster for stimuli with high rather than low discriminability, then it is expected that the HH stimulus will have the fastest RTs, the LH and HL stimuli intermediate RTs, and the LL stimulus the slowest RTs. A quantitative indicator of the pattern of mean RTs for the target-category stimuli is provided by a measure known as the mean-interaction-contrast (MIC):

$$\mathrm{MIC} = (RT_{LL} - RT_{LH}) - (RT_{HL} - RT_{HH}), \tag{1}$$

where RTij is the mean RT associated with target-stimulus ij (Townsend & Nozawa, 1995). As illustrated by examples in the left panels of Figure 3, when MIC=0 the mean RTs for the target-category stimuli show an additive pattern; when MIC<0 the mean RTs are underadditive; and when MIC>0 the mean RTs are overadditive.
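As a worked illustration, the mean interaction contrast of Townsend and Nozawa (1995) is the double difference (RT for LL minus RT for LH) minus (RT for HL minus RT for HH), and it can be computed directly from four mean RTs. The function names and the RT values used below are hypothetical and serve only to show the three sign patterns.

```python
def mean_interaction_contrast(rt_ll, rt_lh, rt_hl, rt_hh):
    """MIC = (RT_LL - RT_LH) - (RT_HL - RT_HH), computed from mean RTs."""
    return (rt_ll - rt_lh) - (rt_hl - rt_hh)


def additivity_pattern(mic, tolerance=0.0):
    """Label the sign of the MIC; `tolerance` is an illustrative dead zone around zero."""
    if mic > tolerance:
        return "overadditive"
    if mic < -tolerance:
        return "underadditive"
    return "additive"
```

For example, hypothetical mean RTs of 900, 800, 800, and 700 ms for LL, LH, HL, and HH give an MIC of zero (additive). Lowering the LL mean to 850 ms makes the MIC negative (underadditive), and raising it to 950 ms makes it positive (overadditive).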

Figure 3.


Summary predictions of mean response times (RTs) from the alternative logical-rule models of classification. Each row corresponds to one of the models. The left panels show the pattern of predictions for the target-category members, and the right panels show the pattern of predictions for the contrast-category members. As explained in the text, note that regardless of the stopping rule, correct classification of target-category members requires exhaustive processing. Left panels: L = low-discriminability dimension value; H = high-discriminability dimension value; D1 = Dimension 1; D2 = Dimension 2. Right panels: R = redundant stimulus, I = interior stimulus, E = exterior stimulus. “First-processed” and “second-processed” dimensions are as defined in the text.

The target-category structure forms part of what is known as the double-factorial paradigm in the information-processing literature. Under reasonable assumptions (see Townsend & Nozawa, 1995), which are satisfied by the rule-based models used in this research, the alternative mental architectures make clear-cut predictions of the pattern of mean RTs in this paradigm (see Figure 3, left panels). The serial-rule models predict an additive pattern of mean RTs (MIC=0); the parallel models predict an underadditive pattern (MIC<0); and the coactive model predicts an overadditive pattern (MIC>0). See Fific, Nosofsky, and Townsend (2008) and Fific et al. (2010) for intuitive explanations of the emergence of each of these mean RT patterns from the alternative models.

Whereas the pattern of target-category predictions is well known in the information-processing literature, Fific et al. (2010) noted that the pattern of mean RTs for the members of the contrast category (B) offers additional diagnostic evidence of the type of information-processing architecture that is involved. Again, we use the notation in the right panel of Figure 1 to facilitate the discussion. The contrast-category stimulus that satisfies the disjunctive rule on both dimensions is denoted the redundant (R) stimulus; the adjacent neighbors of the R stimulus are denoted the interior stimuli (Ix and Iy); and the stimuli at the far edges of the contrast category are denoted the exterior stimuli (Ex and Ey). In addition, in the case of serial processing, if, say, the subject tends to process the stimuli in the order x-then-y, then we will refer to Dimension x as the first-processed dimension and to Dimension y as the second-processed dimension. This same terminology is also used for the parallel and coactive models if processing tends to be faster or more accurate on one dimension than the other.

To illustrate the diagnosticity of the contrast category, consider, for example, the predictions from a fixed-order, serial-self-terminating rule strategy, with the subject always processing Dimension x first and Dimension y second. Assuming perfectly accurate responding, if presented with any of the three stimuli R, Ix, or Ex, then the subject will verify in the first stage of processing that the disjunctive rule defining the contrast category has been satisfied (see Figure 1, right panel). Thus, processing will self-terminate, and the subject can immediately emit a Category-B response. Therefore, the mean RTs for these three stimuli on the first-processed dimension will tend to be fast and approximately equal to one another. By contrast, if either of stimuli Iy or Ey is presented, the subject will need to engage in a second stage of processing. That is, after processing only Dimension x, there is insufficient information to determine whether the stimulus is a member of the contrast category or the target category (because both include members with x values that fall to the right of the decision boundary). Because the observer first processes Dimension x and then processes Dimension y, RTs for Iy and Ey will tend to be slower than for R, Ix, and Ex. Finally, the mean RT for the exterior stimulus Ey will be faster than the mean RT for the interior stimulus Iy. The reason is that, in the first stage of processing, Ey is farther from the x decision bound than is Iy, so Ey is processed faster during the first stage. In the second stage, the time to determine that these stimuli fall below the y decision boundary is the same for Iy and Ey. Because, for the serial model, the total decision time is just the sum of the individual-dimension decision times, the fixed-order serial-self-terminating model therefore predicts faster mean RTs for the exterior stimulus than for the interior stimulus on the second-processed dimension. 
A summary picture of this complete set of qualitative predictions for the fixed-order serial self-terminating model is presented in the top-right panel of Figure 3.

Similar forms of reasoning, verified through computer simulation, lead to the qualitative sets of predictions illustrated for all of the remaining models in the right panels of Figure 3. (See Fific et al., 2010, pp. 315-317, for a step-by-step exposition of the basis for the predictions for each of the models.)3 For example, according to the mixed-order serial self-terminating model (which assumes that the order in which the dimensions are tested is a probabilistic mixture across trials), the exterior stimuli will be classified more rapidly than the interior stimuli on both dimensions. (Also, assuming perfect accuracy, there should be a small redundant-stimulus advantage.) By contrast, if a parallel-self-terminating strategy is used, the RTs for the interior and exterior stimuli will be equal on both dimensions. And, to take a final example, if a coactive rule process is involved, then RTs for the exterior stimuli will be slower than RTs for the interior stimuli on both dimensions. Although these predictions are not completely parameter free, they hold over the vast range of plausible parameter settings from the models. (Extreme parameter settings that “undo” these predictions lead to other extreme consequences that constrain the models.) From inspection of Figure 3, the reader can verify that each of the individual rule-based models generates its own unique signature of the predicted pattern of mean RTs taken across both the target and contrast categories. Thus, the paradigm is a highly diagnostic one for telling apart the alternative rule models.
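The contrast-category reasoning for the serial self-terminating and parallel self-terminating models can be traced with a deliberately simplified sketch. It assumes perfect accuracy and replaces each single-dimension random walk with a fixed, hypothetical mean decision time that depends only on the dimension value (value 1 lies close to the bound and is slow; value 2 lies far from it and is fast). The mapping of R, Ix, Ex, Iy, and Ey to dimension values follows the description in the text, and all names and numbers are illustrative.

```python
# Hypothetical mean decision times (ms) for dimension values 0, 1, and 2.
DECISION_TIME = {0: 250.0, 1: 300.0, 2: 200.0}

# Contrast-category stimuli as (x value, y value); value 0 lies in Region B.
STIMULI = {"R": (0, 0), "Ix": (0, 1), "Ex": (0, 2), "Iy": (1, 0), "Ey": (2, 0)}


def serial_st_rt(stimulus, p_x, times=DECISION_TIME):
    """Mean decision time for serial self-terminating processing.

    p_x is the proportion of trials processed in the order x-then-y;
    p_x = 1.0 gives the fixed-order case with Dimension x first.
    """
    x, y = STIMULI[stimulus]
    t_x, t_y = times[x], times[y]
    x_first = t_x if x == 0 else t_x + t_y  # stop early if x already falls in Region B
    y_first = t_y if y == 0 else t_y + t_x  # stop early if y already falls in Region B
    return p_x * x_first + (1.0 - p_x) * y_first


def parallel_st_rt(stimulus, times=DECISION_TIME):
    """Decision time for parallel self-terminating processing with fixed times:
    the response is made when the first Region-B decision finishes."""
    x, y = STIMULI[stimulus]
    return min(times[v] for v in (x, y) if v == 0)
```

With `p_x = 1.0` the sketch reproduces the fixed-order pattern (R, Ix, and Ex equal and fast; Ey faster than Iy). With `p_x = 0.5` the exterior stimuli are faster than the interior stimuli on both dimensions, and R is fastest. Under the parallel version, interior and exterior times coincide. Because the decision times are fixed rather than random, this sketch cannot show effects that depend on RT variability.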

Comparison Models

As a source of comparison, we also consider three alternative models of classification RTs: the exemplar-based random walk (EBRW) model (Nosofsky & Palmeri, 1997); a random-walk version of multidimensional decision-boundary (RW-MDB) theory (Ashby, 2000; Nosofsky & Stanton, 2005); and a very general free-stimulus-drift-rate model (Fific et al., 2010). Because the EBRW and RW-MDB models have been discussed extensively in previous articles, we only briefly review them here. (See Fific et al. (2010) for a fuller statement and a listing of the free parameters.)

According to the EBRW model, people store individual exemplars of categories in memory. Presentations of test items lead the stored exemplars to be retrieved -- the greater the similarity of the exemplar to the test item, the higher is its retrieval probability. The retrieved exemplars drive a random-walk process for making classification decisions. If a retrieved exemplar belongs to Category A, then the random walk steps in the direction of criterion +A, whereas if the exemplar belongs to Category B then the random walk steps toward Criterion −B. The exemplar-retrieval process continues until one of the criteria has been reached. The qualitative predictions of mean RTs from the EBRW model are shown schematically in the bottom panels of Figure 3, and they are identical to the qualitative predictions from the coactive model. In general, the EBRW model predicts that classification RTs get faster as a stimulus's summed similarity to its own category grows larger, and as its summed similarity to the opposite category grows smaller. So, for example, for the members of the contrast category (B), summed similarity to the correct category grows larger as one approaches the lower-left corner where the redundant stimulus is located.
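The summed-similarity intuition can be illustrated numerically. The sketch below assumes an exponential similarity gradient over city-block distance and takes the probability of a step toward −B to be the ratio of summed similarity to Category B over total summed similarity, in the spirit of the EBRW; the equally spaced coordinates, the sensitivity value, and the function names are assumptions made for the example and are not taken from this section.

```python
import math

# Stimulus coordinates (x value, y value) on a hypothetical equally spaced grid.
CATEGORY_A = [(1, 1), (1, 2), (2, 1), (2, 2)]          # upper-right quadrant (target)
CATEGORY_B = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]  # remaining stimuli (contrast)


def similarity(item, exemplar, sensitivity=1.0):
    """Exponential decay of similarity with city-block distance (assumed form)."""
    distance = abs(item[0] - exemplar[0]) + abs(item[1] - exemplar[1])
    return math.exp(-sensitivity * distance)


def prob_step_toward_b(item, sensitivity=1.0):
    """Probability that a retrieved exemplar belongs to Category B."""
    s_a = sum(similarity(item, e, sensitivity) for e in CATEGORY_A)
    s_b = sum(similarity(item, e, sensitivity) for e in CATEGORY_B)
    return s_b / (s_a + s_b)
```

Under these assumptions, `prob_step_toward_b((0, 0))` for the redundant stimulus exceeds the value for the interior stimulus `(0, 1)`, which in turn exceeds the value for the exterior stimulus `(0, 2)`. More consistent steps toward −B imply faster responses, matching the ordering described above.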

According to the RW-MDB model, the observer establishes multidimensional decision boundaries to divide the stimulus space into category regions. For the present paradigm, each stimulus is associated with a bivariate normal distribution of percepts. Upon presentation of a stimulus, a percept is sampled from the bivariate distribution. If the percept falls in Region A defined by the multidimensional decision bound, then a random walk takes a step in the direction of criterion +A; whereas if the percept falls in region B, the random walk steps in the direction of criterion −B. In the same manner as the previous models, the perceptual sampling process continues until either the +A or −B criterion has been reached. (Note that the process here is different from the serial and parallel rule models, which assume that independent decisions are made along each dimension by a separate random walk, with the separate decisions then being combined.) The precise predictions from the RW-MDB model depend on the form of the multidimensional decision bound that is used for dividing the stimulus space into category regions. In the present experiments, the stimuli will vary along highly separable dimensions, so it is reasonable to assume that the multidimensional bound is simply the combination of two individual decision boundaries that are orthogonal to the coordinate axes (as illustrated in Figure 1). Likewise, we continue to assume perceptual independence and perceptual separability of the stimulus representations (Ashby & Townsend, 1986). Under these assumptions, the RW-MDB model is formally identical to the coactive model. Therefore, the coactive model will serve as our representative from the general class of multidimensional decision-bound models.
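With orthogonal bounds and perceptual independence, the probability that a sampled bivariate percept falls in Region A is simply the product of the two single-dimension probabilities. The following sketch shows that calculation; the percept means (0, 1, 2), the common standard deviation (0.5), the bound locations (0.5), and the function name are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def prob_step_toward_a(x_mean, y_mean, d_x=0.5, d_y=0.5, sd_x=0.5, sd_y=0.5):
    """Probability that one bivariate percept falls in Region A.

    Region A is the area to the right of bound d_x AND above bound d_y.
    Independence of the two dimensions lets the joint probability factor.
    """
    p_x = 1.0 - NormalDist(x_mean, sd_x).cdf(d_x)  # percept exceeds the x bound
    p_y = 1.0 - NormalDist(y_mean, sd_y).cdf(d_y)  # percept exceeds the y bound
    return p_x * p_y
```

Under these settings the step probability toward +A is higher for the HH target stimulus `(2, 2)` than for the LL stimulus `(1, 1)`. For contrast-category stimuli, the probability of stepping toward −B (one minus the value returned) is highest for the redundant stimulus `(0, 0)` and lower for the exterior stimulus `(0, 2)` than for the interior stimulus `(0, 1)`.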

Perhaps the most important comparison model is the free stimulus-drift-rate model. As is the case for the EBRW and coactive models, this model assumes that classification decision-making is governed by a single-channel random-walk process. That is, in this model, subjects do not make separate decisions about the values that stimuli have on each of their dimensions, and the outputs of separate random walks are not combined (as occurs in the serial and parallel rule-based models). Instead, there is just a single random-walk process that is associated with each individual stimulus. However, within this single-channel framework, the model is extremely general, because it simply allows each individual stimulus to have its own freely estimated drift-rate parameter. That is, for each individual stimulus, we estimate a separate free parameter that indicates the probability that the single random walk moves toward criterion +A on each step. (This approach is analogous to one that Ratcliff and colleagues often apply when fitting their continuous-time diffusion model to different experimental conditions, e.g., Ratcliff & Rouder, 1998.) For the category structure illustrated in Figure 1, the free stimulus-drift-rate model uses 14 free parameters for fitting the RT distributions. The first five play the same role as already described for the logical rule-based models: the random-walk criteria +A and −B; residual-distribution parameters $\mu_R$ and $\sigma_R^2$; and scaling constant k. The remaining 9 free parameters are the individual-stimulus drift rates. Note that the free stimulus-drift-rate model subsumes the EBRW and coactive models as special cases. Thus, it must provide at least as good an absolute fit to the data as do these other models. However, because our model-fit criterion will employ a penalty term for number of free parameters, it is not guaranteed to provide as good a penalty-corrected fit. Possibly, the EBRW and coactive models could provide more parsimonious accounts of the data.
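To make the role of the per-stimulus drift parameter concrete, the following sketch tallies the parameter count stated above and gives the probability that a simple random walk with unit steps reaches criterion +A before −B. The closed form is the standard gambler's-ruin result for a walk with fixed step probability; the function and variable names are illustrative assumptions, not part of the original model code.

```python
def prob_reach_a(p_step_a, crit_a, crit_b):
    """P(walk hits +A before -B) for unit steps with P(step toward A) = p_step_a.

    Standard gambler's-ruin result; assumes 0 < p_step_a <= 1 and
    integer criterion distances crit_a, crit_b from the start point.
    """
    if p_step_a == 0.5:
        return crit_b / (crit_a + crit_b)
    ratio = (1.0 - p_step_a) / p_step_a
    return (1.0 - ratio ** crit_b) / (1.0 - ratio ** (crit_a + crit_b))

# Parameter bookkeeping taken from the text.
shared_params = 5        # +A, -B, residual mean, residual variance, scaling constant k
stimulus_drift_rates = 9 # one freely estimated step probability per stimulus
total_free_params = shared_params + stimulus_drift_rates

# Hypothetical step probability for one stimulus, for illustration only.
p_correct = prob_reach_a(0.8, 3, 3)
```

Because each stimulus has its own step probability, both its predicted accuracy and its predicted mean RT can be set independently of the other stimuli, which is why this model can reproduce any pattern of mean RTs.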

Because of its generality, the free stimulus-drift-rate model can predict any pattern of mean RTs for the nine stimuli in the classification task. Therefore, the qualitative predictions of mean RTs shown in Figure 3 will not be useful for contrasting the logical-rule models with the free stimulus-drift-rate model. Nevertheless, as explained in depth in the Computational Modeling sections of our article, the present models also make different predictions of the shapes of individual-stimulus RT distributions. As will be seen, the RT-distribution data will be extremely useful for telling apart the logical-rule models from even this very general free stimulus-drift-rate model. To the extent that the logical-rule models provide better fits than does even the free stimulus-drift-rate model, this result would provide extremely convincing evidence of the utility of combining the mental-architecture and random-walk approaches within an integrated framework.

Experiment 1

In Experiment 1, we test four individual subjects using the Figure-1 category structure. The stimuli and category structure are the same as those used in Fific et al.'s (2010) Experiment 2. The crucial difference in the procedure is that the Fific et al. study served only as a validation test of the proposed rule models. In that previous experiment, all subjects were provided with explicit information regarding the details of the rule-based category structure. In addition, most were also provided with explicit instructions to use particular fixed-order serial-self-terminating procedures to apply the stated rules. By contrast, in the present experiment, subjects learn the categories via trial-by-trial induction over the individual training exemplars, and are free to apply whatever strategy they choose in order to classify the stimuli. The question is whether or not we will find evidence of rule-based classification under these open-ended learning and performance conditions.

Following Fific et al. (2010, Experiment 2), our Experiment-1 stimuli are composed of highly separable dimensions located in spatially separated regions (see the Method section for details). Given the nature of the stimuli, it is likely that subjects cannot focus attention simultaneously on the separate dimensions within the same “attentional beam”. Eye movements may even be needed in order for subjects to sharply encode the values on the separate dimensions. We continue to use these stimuli because they are likely to promote serial processing of the dimensions, and we judge that serial processing is conducive to allowing subjects to adopt rule-based classification strategies. (The idea in this initial experiment was to arrange conditions that might bolster the chances that rule-based strategies would be freely chosen and used.) Note, however, that serial encoding of individual dimensions does not force the application of the logical-rule strategies themselves. For example, serial processing may take place in an initial encoding stage, but then a host of alternative classification strategies may operate upon the fully encoded stimulus in a subsequent decision stage. Thus, unlike Fific et al. (2010), it is an open question whether or not we do indeed find evidence for application of the logical rules under the present conditions.

Method

Participants

Four Indiana University students completed Experiment 1. They received $9 per session and an extra $3 bonus per session for highly accurate performance. The participants had normal or corrected-to-normal vision. The participants were unaware of the issues under investigation in the research.

Stimuli

The stimuli were schematic drawings of lamps composed of four parts (see example in Figure 4): a top piece, which varied in the amount of curvature; a lamp shade, which varied in the angle at which the sides connected the bottom of the shade to the top of the shade; the design or body of the lamp, which varied in qualitatively different forms; and the base of the lamp, which varied in width. Overall, the lamp shade and body of the lamp (not including the top or the base) were 385 pixels tall and 244 pixels at the widest point (the bottom of the shade piece). The stimuli subtended a vertical visual angle of about 10.48 degrees and a horizontal visual angle of about 5.72 degrees at their widest point.

Figure 4.


Illustration of the “lamp” stimuli used in Experiment 1. Values on the irrelevant dimensions (shade and body) varied randomly from trial to trial. (The identifying labels were not present on the experimental stimuli.)

Only the top piece and the base were relevant for the categorization task (see Figure 4). The curvature of the top piece was varied in three levels by drawing an arc inside of a 60 pixel-wide rectangle with a variable height: 15, 17, or 24 pixels. Likewise, the width of the base was varied in three levels (95, 105, and 160 pixels). The height of the base was fixed at 20 pixels. All of the lamps that had either the top piece with the smallest curvature (15 pixels) or the narrowest base (95 pixels) formed the set of contrast-category lamps (Category B). All of the remaining lamps formed the set of target-category lamps (Category A).

The lamp shade and design dimensions also varied across trials (3 levels each). However, as explained more fully in the Procedure section, the lamp shade and design dimensions were irrelevant to the categorization task. Furthermore, for each individual subject, within each session of testing, there was no correlation between values on the relevant dimensions and values on the irrelevant ones, either within or between categories.

As shown schematically in Figure 1, note that the difference between the dimensional values close to the category boundary (e.g., the base width values of 95 and 105 pixels) was smaller than the difference between the middle dimensional value and the remaining value (e.g., the base width values of 105 and 160 pixels). In order to tell apart the predictions from the alternative rule models, the high-discrimination (H) dimension values must be processed more rapidly than the low-discrimination (L) values. The difficulty of the discriminations near the boundary was used to increase the magnitude of these desired speed-of-processing differences.

Procedure

In describing the procedure, we will refer to each of the 9 combinations of the relevant top and base dimensions as an item type. Specific stimuli composed of unique combinations of all four dimension values will be referred to as tokens. (Because each of the 4 dimensions had 3 values, there was a total of 81 tokens.) Participants completed 5 sessions across nearly consecutive days. In each session, participants first completed 27 practice trials (3 repetitions of each item type). This was followed by 810 experimental trials, grouped into 6 blocks of 15 presentations of each item type, with rest breaks between blocks. In the experimental trials of each session, each of the 81 tokens was presented 10 times. Thus, across the entire experiment, not including the practice trials, each of the 9 main item types was presented 90 times per session and 450 times in total. The order of presentation of the stimuli was randomized anew for each participant and session, within the constraints stated above. The reader may verify that this procedure ensures that there are no spurious correlations between values on the irrelevant dimensions and category membership.
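The trial counts in this design can be checked with simple arithmetic. The sketch below only restates numbers given in the text; the variable names are illustrative.

```python
# Trial bookkeeping for Experiment 1 (all values taken from the text).
levels_per_dimension = 3
relevant_dimensions = 2      # top piece and base
total_dimensions = 4         # top piece, shade, body, base
sessions = 5

item_types = levels_per_dimension ** relevant_dimensions   # 9 item types
tokens = levels_per_dimension ** total_dimensions          # 81 tokens
tokens_per_item_type = tokens // item_types                # 9 tokens per item type

blocks = 6
presentations_per_block = 15
trials_from_blocks = blocks * presentations_per_block * item_types   # 810
trials_from_tokens = tokens * 10                                     # 810

per_item_type_per_session = trials_from_tokens // item_types         # 90
per_item_type_total = per_item_type_per_session * sessions           # 450
```

Both ways of counting give the same 810 experimental trials per session, and because every token appears equally often, each value of each irrelevant dimension is paired equally often with each item type.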

Participants responded by pressing the left mouse button for group A and the right mouse button for group B. Participants were instructed to rest their left and right index fingers on the mouse buttons throughout the testing session. RTs were recorded from the onset of a stimulus display up to the time of a response. Each trial started with the presentation of a fixation cross for 1770 ms. After 1070 ms from the initial appearance of the fixation cross, a warning tone was sounded for 700 ms. The stimulus was then presented on the screen and remained visible until the participant's response was recorded.

The first session was used as a training session in which participants were required to learn the categories through trial and error. At the outset of the experiment, participants were shown three example stimuli that illustrated all three different top piece curvatures and all three different base widths. To speed learning, participants were instructed that the lamp shade and design were irrelevant for learning the categories. In the first session of Experiment 1, participants could take as long as they desired to respond, feedback was provided for both correct (e.g., “…CORRECT…”) and incorrect responses, and the stimulus remained onscreen while the feedback was presented to allow further inspection of the relevant dimensions and increase the speed of learning. In the later sessions, participants were given feedback (“…WRONG…”) only after incorrect responses or after responses that took longer than 5 s (“…TOO SLOW…”). A blank inter-trial interval of 1870 ms was inserted between each trial.

Because the contrasting qualitative predictions from the competing models assume high-accuracy responding, the instructions emphasized the need for subjects to be accurate. Subjects were informed, however, that their RTs were being recorded, so they needed to execute their response as soon as they had made their decision.

Results

For Experiment 1, we refer to the participants as L1 to L4 where the L designates the lamp-stimuli experiment. The initial training/practice session (Session 1) was excluded from the analyses. Trials with error responses were excluded from the RT analyses. Finally, for each individual participant and stimulus, trials with RTs less than 150 ms and trials with RTs greater than 3 standard deviations above the mean for that stimulus were removed. Less than 2% of the trials were removed by this method. As shown in Table 1, error rates were generally very low, with a couple of higher error rates always associated with slower stimuli. Therefore, we now turn to analyses of the RTs.
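The trimming rule can be sketched as follows for the correct RTs of one participant and one stimulus. The text does not say whether the mean and standard deviation were computed before or after removing the fast responses, nor which standard-deviation estimator was used; this sketch assumes both statistics are computed on the untrimmed data with the sample (n − 1) estimator, and the function name is hypothetical.

```python
import statistics

def trim_rts(rts_ms, floor_ms=150.0, n_sd=3.0):
    """Drop RTs below floor_ms or more than n_sd SDs above the mean.

    Assumption: mean and SD are computed once, on the untrimmed RTs.
    """
    mean_rt = statistics.mean(rts_ms)
    sd_rt = statistics.stdev(rts_ms)          # sample SD (n - 1 denominator)
    upper = mean_rt + n_sd * sd_rt
    return [rt for rt in rts_ms if floor_ms <= rt <= upper]

# Illustrative data: twenty ordinary RTs, one anticipation, one very slow response.
example = [500.0] * 20 + [100.0, 5000.0]
kept = trim_rts(example)
proportion_removed = 1.0 - len(kept) / len(example)
```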

Table 1.

Experiment 1: Mean Correct RTs (ms) and Error Rates for the Individual Stimuli, Along with the Modeling Predictions.

| Participant | Measure | HH | HL | LH | LL | EY | IY | EX | IX | R |
|---|---|---|---|---|---|---|---|---|---|---|
| L1 | RT Observed | 714.3 | 1317.2 | 1140.7 | 1703.7 | 1219.6 | 1523.0 | 1013.8 | 1218.5 | 1012.8 |
| L1 | RT Model | 710.3 | 1344.3 | 1109.0 | 1745.2 | 1224.3 | 1510.3 | 1028.4 | 1182.1 | 1030.6 |
| L1 | p(e) Observed | 0.00 | 0.07 | 0.03 | 0.13 | 0.02 | 0.08 | 0.04 | 0.02 | 0.00 |
| L1 | p(e) Model | 0.00 | 0.10 | 0.02 | 0.11 | 0.07 | 0.07 | 0.04 | 0.04 | 0.00 |
| L2 | RT Observed | 568.1 | 744.7 | 761.7 | 926.1 | 569.1 | 621.6 | 779.7 | 952.7 | 636.3 |
| L2 | RT Model | 583.1 | 756.8 | 748.4 | 917.3 | 588.0 | 598.8 | 804.3 | 953.1 | 607.1 |
| L2 | p(e) Observed | 0.00 | 0.04 | 0.03 | 0.07 | 0.03 | 0.03 | 0.06 | 0.13 | 0.02 |
| L2 | p(e) Model | 0.00 | 0.04 | 0.03 | 0.06 | 0.06 | 0.06 | 0.07 | 0.07 | 0.01 |
| L3 | RT Observed | 935.1 | 1133.2 | 1040.4 | 1262.4 | 1102.8 | 1236.4 | 970.5 | 1013.9 | 1044.6 |
| L3 | RT Model | 916.1 | 1184.9 | 1094.2 | 1344.8 | 1180.9 | 1323.5 | 989.7 | 1014.3 | 996.6 |
| L3 | p(e) Observed | 0.00 | 0.01 | 0.01 | 0.04 | 0.01 | 0.09 | 0.03 | 0.01 | 0.00 |
| L3 | p(e) Model | 0.00 | 0.02 | 0.00 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.00 |
| L4 | RT Observed | 689.7 | 1073.9 | 880.89 | 1312 | 1066.4 | 1178.4 | 552.02 | 558.74 | 610.22 |
| L4 | RT Model | 711.6 | 1066.5 | 914.6 | 1269.1 | 994.0 | 1184.5 | 570.9 | 567.6 | 570.9 |
| L4 | p(e) Observed | 0.00 | 0.08 | 0.03 | 0.11 | 0.03 | 0.06 | 0.02 | 0.03 | 0.00 |
| L4 | p(e) Model | 0.00 | 0.10 | 0.02 | 0.12 | 0.06 | 0.06 | 0.01 | 0.01 | 0.00 |

Note. Predictions are from the serial self-terminating attention-switch model, which is described in the text.

Mean-RT Analyses

The mean RTs for each individual subject and stimulus are reported in Table 1 and are illustrated graphically in Figure 5. The target-category results are shown in the left-hand panels and the contrast-category results in the right-hand panels. In terms of the big picture, comparing the observed data to the canonical prediction graphs in Figure 3, it can be seen that the results for Subjects L1-L3 closely resemble the qualitative predictions from the mixed-order serial-self-terminating rule model. In addition, the results for Subject L4 closely resemble the predictions from the fixed-order serial-self-terminating rule model. In particular, for all four subjects, the target-category RTs show the additive pattern predicted by the serial models. Likewise, for all four subjects, the contrast-category results show that the exterior stimulus is classified more rapidly than is the interior stimulus on the “second-processed” dimension. For Subjects L1-L3, this same pattern also tends to be seen on the “first-processed” dimension, which is indicative of a mixed-order serial-self-terminating process. For Subject L4, the RTs on the first-processed dimension are nearly flat, which is indicative of a fixed-order serial-self-terminating process. The only minor deviation of the observed data from the predictions of the serial-self-terminating models is that the redundant stimulus has somewhat slower RTs than expected. As explained in the Computational Modeling section, even this pattern is a possible outcome from the model when there is a non-zero probability of errors. Whereas the data for all of the subjects closely resemble the predictions from the serial self-terminating rule models, they obviously strongly violate subsets of the predictions from all of the competing models shown in Figure 3.

Figure 5.


Experiment 1: Observed mean response times (RTs) for the individual subjects and stimuli. Error bars represent ±1 SE. The left panels show the results for the target-category stimuli, and the right panels show the results for the contrast-category stimuli. Left panels: L = low-discriminability dimension value; H = high-discriminability dimension value; D1 = Dimension 1; D2 = Dimension 2. Right panels: R = redundant stimulus; I = interior stimulus; E = exterior stimulus. For ease in making comparisons to the prediction graphs in Figure 3, the contrast-category stimuli are labeled with respect to whether they are on the “first-processed” or “second-processed” dimension, as defined in the text.

We conducted statistical tests to confirm the description of the results provided above. For the target-category data, we conducted a 4×2×2 analysis of variance (ANOVA) of the RTs of each individual subject, using as factors Session (2-5), discrimination level of the base dimension (L or H), and discrimination level of the top-piece dimension (L or H). The results are presented in Table 2. For all participants, there was a main effect of session, reflecting a slight speeding up of responding over time. As is obvious from inspection of Figure 5, the main effects of stimulus-dimension discriminability (L or H) were of course highly significant and those results are not presented in the table. Most important for the present analysis, there was no interaction between the base and top-piece factors, supporting the claim of additivity (MIC=0) of the target-category RTs, and supporting the inference of serial processing of the dimensions. Furthermore, the Session × Base × Top Piece interaction was not significant for any of the subjects, suggesting that the processing strategy was stable across the course of the experiment. Although not reported in Table 2, there were some occasional interactions of session with the individual dimensions, reflecting minor changes in relative processing speed across the course of the experiment.
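As a purely descriptive complement to the ANOVA, the mean interaction contrast can be computed directly from the observed target-category mean RTs in Table 1. The sketch assumes the standard definition MIC = (RT_LL − RT_LH) − (RT_HL − RT_HH); it is not the interaction test itself, which is carried out on trial-level data, and the names used are illustrative.

```python
# Observed mean correct RTs (ms) for the target-category stimuli, from Table 1.
observed_mean_rt = {
    "L1": {"HH": 714.3, "HL": 1317.2, "LH": 1140.7, "LL": 1703.7},
    "L2": {"HH": 568.1, "HL": 744.7, "LH": 761.7, "LL": 926.1},
    "L3": {"HH": 935.1, "HL": 1133.2, "LH": 1040.4, "LL": 1262.4},
    "L4": {"HH": 689.7, "HL": 1073.9, "LH": 880.89, "LL": 1312.0},
}

def mean_interaction_contrast(rt):
    """MIC = (RT_LL - RT_LH) - (RT_HL - RT_HH), assuming the standard definition."""
    return (rt["LL"] - rt["LH"]) - (rt["HL"] - rt["HH"])

mic_by_subject = {s: mean_interaction_contrast(rt) for s, rt in observed_mean_rt.items()}
```

Under this definition the descriptive MIC values are roughly −40, −12, +24, and +47 ms for Subjects L1-L4, which are small relative to the main effects of discriminability, in line with the nonsignificant Base × Top Piece interactions.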

Table 2.

Experiment 1: Statistical Test Results for the Individual Subjects.

Target Category (ANOVA)

| Participant | Effect | df | F |
|---|---|---|---|
| L1 | Session | 3 | 28.07*** |
| L1 | Base × Top Piece | 1 | 0.58 |
| L1 | Session × Base × Top Piece | 1 | 1.32 |
| L1 | Error | 1300 | |
| L2 | Session | 3 | 81.04*** |
| L2 | Base × Top Piece | 1 | 0.64 |
| L2 | Session × Base × Top Piece | 1 | 0.23 |
| L2 | Error | 1338 | |
| L3 | Session | 3 | 324.22*** |
| L3 | Base × Top Piece | 1 | 2.17 |
| L3 | Session × Base × Top Piece | 1 | 1.57 |
| L3 | Error | 1378 | |
| L4 | Session | 3 | 227.10*** |
| L4 | Base × Top Piece | 1 | 1.88 |
| L4 | Session × Base × Top Piece | 1 | 1.56 |
| L4 | Error | 1307 | |

Contrast Category (t-tests)

| Participant | Comparison | M | t |
|---|---|---|---|
| L1 | E1 - I1 | −204.74 | −4.86*** |
| L1 | E2 - I2 | −303.41 | −6.88*** |
| L1 | E1 - R | 0.99 | 0.03 |
| L1 | I1 - R | 205.73 | 5.00*** |
| L1 | E2 - R | 206.76 | 6.01*** |
| L1 | I2 - R | 510.17 | 11.54*** |
| L2 | E1 - I1 | −52.42 | −3.50*** |
| L2 | E2 - I2 | −173.08 | −12.66*** |
| L2 | E1 - R | −67.16 | −4.65*** |
| L2 | I1 - R | −14.74 | −0.9 |
| L2 | E2 - R | 143.37 | 10.86*** |
| L2 | I2 - R | 316.45 | 18.99*** |
| L3 | E1 - I1 | −43.43 | −1.80 ± |
| L3 | E2 - I2 | −133.64 | −6.49*** |
| L3 | E1 - R | −74.14 | −3.22** |
| L3 | I1 - R | −30.71 | −1.31 |
| L3 | E2 - R | 58.20 | 2.78** |
| L3 | I2 - R | 191.84 | 8.67*** |
| L4 | E1 - I1 | −6.71 | −0.54 |
| L4 | E2 - I2 | −111.99 | −4.68*** |
| L4 | E1 - R | −58.19 | −3.83*** |
| L4 | I1 - R | −51.48 | −3.35*** |
| L4 | E2 - R | 456.23 | 23.49*** |
| L4 | I2 - R | 568.21 | 25.76*** |

Note. ± p < .10; * p < .05; ** p < .01; *** p < .001. The MIC test is the Base × Top Piece interaction (set in boldface in the original table).

For the contrast category, we conducted t-tests to compare various stimulus pairs of interest. As reported in the table, in the case of the second-processed dimension, all subjects classified the exterior stimulus significantly faster than the interior stimulus. In addition, for Subjects L1-L3, this same pattern is observed on the first-processed dimension (the result is marginally significant for Subject L3). For Subject L4, the difference between the exterior and interior stimulus on the first-processed dimension does not approach statistical significance. Finally, although the redundant stimulus is usually classified significantly more rapidly than are the other members of the contrast category, some exceptions arise, particularly in comparison to the exterior stimulus on the first-processed dimension. Despite these small deviations involving the redundant stimulus, when considered collectively the pattern of mean RTs for the individual subjects points strongly towards the serial self-terminating rule models (mixed-order for Subjects L1-L3 and fixed-order for Subject L4).

Computational Modeling

Following Fific et al. (2010), we fitted the models to the complete sets of correct RT-distribution and error-proportion data by using a variant of the method of quantile-based maximum likelihood estimation (QMLE; Heathcote, Brown & Mewhort, 2002).4 For each stimulus, predictions were generated of correct RTs for the following quantile bins: the fastest 10%, the next four 20% intervals and the slowest 10%. Because error rates were so low, we did not attempt to fit error-RT distributions. However, the error data still strongly constrain the models, because the models are required to simultaneously fit both the correct-RT distributions and the overall error rates for each stimulus. In particular, the fit of the models to the data was evaluated using the multinomial log-likelihood function:

$$\ln L = \sum_{i=1}^{n} \ln(N_i!) - \sum_{i=1}^{n}\sum_{j=1}^{m+1} \ln(f_{ij}!) + \sum_{i=1}^{n}\sum_{j=1}^{m+1} f_{ij}\ln(p_{ij}) \tag{2}$$

where $N_i$ is the number of observations of stimulus $i$ ($i = 1, \ldots, n$); $f_{ij}$ is the frequency with which stimulus $i$ had a correct RT in the $j$th quantile ($j = 1, \ldots, m$) or was associated with an error response ($j = m+1$); and $p_{ij}$ (which is a function of the model parameters) is the predicted probability that stimulus $i$ had a correct RT in the $j$th quantile or was associated with an error response. The log-likelihood values were then transformed to account for the number of free parameters used by each model. In particular, we used the Bayesian Information Criterion (BIC; Schwarz, 1978), which penalizes the log-likelihood based on the number of free parameters and the size of the sample being fit:

$$\mathrm{BIC} = -2\ln L + n_p \ln(M) \tag{3}$$

where $n_p$ is the number of free parameters in the model and $M$ is the total number of observations in the data set. The model that yields the smallest BIC is the preferred model.
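The two fit statistics can be written out as a short sketch. It assumes that, for each stimulus, the observed frequencies and predicted probabilities are supplied as lists over the m quantile bins plus one error bin; the function names are illustrative, and the code does not reproduce the simulation step that generates the predicted probabilities.

```python
import math

def multinomial_log_likelihood(freqs, probs):
    """ln L summed over stimuli, with factorials computed via lgamma.

    freqs[i][j]: observed frequency for stimulus i in bin j (last bin = errors)
    probs[i][j]: predicted probability for the same cell
    Assumption: every cell with a nonzero frequency has a nonzero probability.
    """
    total = 0.0
    for f_i, p_i in zip(freqs, probs):
        n_i = sum(f_i)
        total += math.lgamma(n_i + 1)                        # ln(N_i!)
        total -= sum(math.lgamma(f + 1) for f in f_i)        # sum of ln(f_ij!)
        total += sum(f * math.log(p) for f, p in zip(f_i, p_i) if f > 0)
    return total

def bic(log_likelihood, n_params, n_observations):
    """BIC = -2 ln L + n_p ln(M); smaller values indicate the preferred model."""
    return -2.0 * log_likelihood + n_params * math.log(n_observations)

# Tiny illustrative case: one stimulus, two bins, one observation in each.
ln_l = multinomial_log_likelihood([[1, 1]], [[0.5, 0.5]])
```

In the tiny case above the multinomial probability is 2!/(1!1!) × 0.5 × 0.5 = 0.5, so the function returns ln(0.5).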

Quantitative predictions of the RT-distribution and error-probability data were generated using 10,000 simulations for each stimulus (90,000 simulations for the entire set). To illustrate the classification-decision stage of the simulation procedure, consider the serial-self-terminating model and suppose that the test stimulus is $x_0y_2$ (see Figure 1). With probability $p_x$, the simulation will first check the stimulus's value on Dimension x. Thus, the random walk on Dimension x is simulated to produce a decision time ($T_x$) on this first-processed dimension. Assuming that the random-walk process leads to the correct decision, namely that $x_0$ lies in Region B of Dimension x, then the process self-terminates (because the disjunctive rule that defines the contrast category has been satisfied). In this case, the total decision time on this simulated trial is simply $T = T_x$. Alternatively, with probability $1 - p_x$, the simulation will first check the stimulus's value on Dimension y. The random walk on Dimension y is simulated, yielding a decision time $T_y$. Assuming a correct decision, namely that $y_2$ lies in Region A along Dimension y, there is insufficient information to make a classification response. Thus, the simulation now checks Dimension x, yielding a decision time $T_x$, so the total decision time on that simulated trial is $T = T_y + T_x$. The case just illustrated was for a member of the contrast category (B). The basic process is the same for the members of the target category (A). Note, however, that whereas for the contrast category the process will sometimes self-terminate after the first-processed dimension has been checked, correct classification for members of the target category always demands exhaustive processing of both dimensions (in order to verify that the conjunctive rule is satisfied). Finally, error responses arise from the same processes already described. For example, if a target-category stimulus is presented, and one of the random walks incorrectly decides that the stimulus lies in Region B along one of the dimensions, then the observer will emit an incorrect response. See Fific et al. (2010, pp. 311-317) for analogous discussion of the mechanics of each of the other logical-rule models.
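The decision stage just described can be sketched as a small simulation. This is a simplified illustration rather than the authors' code: each per-dimension random walk is reduced to a fixed probability of stepping toward Region A (instead of sampling percepts), decision time is counted in steps, and all function and parameter names are hypothetical.

```python
import random

def random_walk(p_step_a, crit_a, crit_b, rng):
    """One per-dimension walk; returns (region_decision, number_of_steps)."""
    position, steps = 0, 0
    while -crit_b < position < crit_a:
        position += 1 if rng.random() < p_step_a else -1
        steps += 1
    return ("A" if position >= crit_a else "B"), steps

def serial_self_terminating_trial(step_p_x, step_p_y, prob_x_first,
                                  crit_a, crit_b, rng):
    """Simulate the decision stage of one trial.

    step_p_x, step_p_y: probability that the walk on that dimension steps
                        toward Region A (a simplification of percept sampling)
    prob_x_first:       probability that Dimension x is checked first
    Returns (category_response, total_steps).
    """
    order = ("x", "y") if rng.random() < prob_x_first else ("y", "x")
    step_p = {"x": step_p_x, "y": step_p_y}
    total_steps = 0
    for dim in order:
        region, steps = random_walk(step_p[dim], crit_a, crit_b, rng)
        total_steps += steps
        if region == "B":
            # Disjunctive rule satisfied: respond B without checking further.
            return "B", total_steps
    # Both dimensions judged to lie in Region A: conjunctive rule satisfied.
    return "A", total_steps

rng = random.Random(3)
# A contrast-category stimulus: clearly in Region B on x, clearly in Region A on y.
trials = [serial_self_terminating_trial(0.1, 0.9, 0.5, 3, 3, rng) for _ in range(1000)]
```

In the full model the step count would be converted to time with the scaling constant k and added to the residual time. An incorrect "B" decision on either dimension of a target-category stimulus produces an error response here, as in the text.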

We used a modified Hooke and Jeeves (1961) parameter-search procedure starting from 100 different random starting configurations to find the set of best-fitting parameters for each model. In fitting the models, we assumed for simplicity that the means of the perceptual distributions along each dimension were given by the physically specified dimension values used for constructing the stimuli – see the Method section. (To place the x and y dimensions on roughly the same range, the base-width dimension was arbitrarily scaled by 0.10.)

The fits of the models to the individual-subject data from Experiment 1 are shown in Table 3. (For now, the reader should ignore the column labeled serial attention-switch model.) Inspection of the table reveals that for all four participants, the serial-self-terminating rule model yields, by far, the best BIC fit among the competing models. The superior quantitative fit of that model compared to the other rule models is not surprising given that the qualitative pattern of mean RTs for the individual stimuli, described in the previous section, pointed strongly in its direction. Note as well that the serial-rule model far outperforms both the EBRW model and the coactive model, with the latter being our representative from the class of random-walk, multidimensional decision-bound (RW-MDB) models. Thus, the serial rule model is far outperforming two of the leading extant models of multidimensional perceptual classification.

Perhaps of greatest interest, the serial-self-terminating rule model provides a better fit to the data than does even the free stimulus-drift-rate model. Indeed, even without imposing a penalty based on number of free parameters, the serial rule model still provides a better fit to the data than does the free drift-rate model, i.e., its absolute log-likelihood fit is better. Because the free-drift-rate model generalizes the class of RW-MDB models, it follows that no matter what the form of the multidimensional decision boundary, the serial-rule model would also outperform any model from that class.

Because the free stimulus-drift-rate model can fit any pattern of mean RTs, it is likely that the superior performance of the serial-rule-model resides in its ability to fit the detailed shapes of the individual-stimulus RT distributions. Before pursuing that point further, however, we first introduce an extension of the standard serial-self-terminating rule model that yields an even better account of the full set of data.

Serial Attention-Switch Model

According to the serial rule model, the observer first makes an independent decision about a stimulus's value along one dimension; then, if needed, the observer makes another decision along the second dimension. A likely important component that we have left out of the modeling is that, in cases that require processing of both dimensions, the observer needs to switch attention from one dimension to another. Furthermore, an attention-switch process may play a particularly important role for the present kinds of stimuli, in which the dimensions are located in spatially separated regions. For example, there is clear evidence from other domains of research that spatial shifts of attention take time (e.g., Sperling & Weichselgartner, 1995).

Thus, following Fific et al. (2010), we extend the serial rule model by also including an attention-switch stage, which we assume is log-normally distributed with mean $\mu_{AS}$ and variance $\sigma_{AS}^2$. Thus, in cases in which a subject must process both dimensions of a stimulus, the total RT would be the sum of the residual time, the times to make independent decisions on each of the dimensions, and the time to execute an attention switch. (For the contrast category, an attention switch occurs only on those trials in which both dimensions need to be processed to verify if the disjunctive rule is satisfied; it does not occur on trials in which the observer is able to self-terminate after completion of the first-processed dimension. Correct processing of the target-category stimuli always requires an attention switch, because, as discussed previously, verifying that the conjunctive rule is satisfied demands exhaustive processing of the dimensions.)
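The composition of the total RT can be sketched as follows. The sketch assumes that the stated mean and variance describe the attention-switch time itself (not its logarithm), and converts them to the parameters of the underlying normal distribution by moment matching; that interpretation, the function names, and the example values are assumptions for illustration only.

```python
import math
import random

def lognormal_params(mean, variance):
    """Convert a log-normal's mean and variance to (mu, sigma) of ln(time)."""
    sigma_sq = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)

def total_rt(residual_ms, decision_times_ms, mean_as, var_as, rng):
    """Residual time + per-dimension decision times (+ one attention switch).

    A switch time is added only when two dimensions were processed.
    """
    rt = residual_ms + sum(decision_times_ms)
    if len(decision_times_ms) == 2:
        mu, sigma = lognormal_params(mean_as, var_as)
        rt += rng.lognormvariate(mu, sigma)
    return rt

rng = random.Random(4)
# Hypothetical values: self-terminated trial vs. trial that required both dimensions.
rt_one_dim = total_rt(300.0, [250.0], 200.0, 2500.0, rng)
rt_two_dim = total_rt(300.0, [250.0, 250.0], 200.0, 2500.0, rng)
```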

The fits of the serial-attention-switch model are shown along with the other models in Table 3. Using the BIC statistic as the criterion of fit, this extended model yields dramatically improved fits compared to even the standard serial-self-terminating model for subjects L2-L4 (and approximately the same fit for Subject L1). Therefore, we focus on the performance of this model in the remainder of this section.

Table 3.

Experiment 1: Negative Log-Likelihood and BIC Fits for the Models.

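| Participant | Serial Self-Terminating −ln L | Serial Self-Terminating BIC | Parallel Self-Terminating −ln L | Parallel Self-Terminating BIC | Serial Exhaustive −ln L | Serial Exhaustive BIC | Parallel Exhaustive −ln L | Parallel Exhaustive BIC |
|---|---|---|---|---|---|---|---|---|
| L1 | 219 | 518 | 324 | 719 | 400 | 873 | 471 | 1015 |
| L2 | 383 | 847 | 646 | 1354 | 639 | 1350 | 744 | 1561 |
| L3 | 328 | 736 | 403 | 878 | 386 | 844 | 429 | 930 |
| L4 | 361 | 803 | 520 | 1112 | 830 | 1732 | 1163 | 2398 |
| Sum | 1291 | 2904 | 1893 | 4063 | 2255 | 4799 | 2807 | 5904 |

| Participant | Coactive −ln L | Coactive BIC | EBRW −ln L | EBRW BIC | Free Stimulus Drift Rate −ln L | Free Stimulus Drift Rate BIC | Serial Self-Terminating + Attention Switching −ln L | Serial Self-Terminating + Attention Switching BIC |
|---|---|---|---|---|---|---|---|---|
| L1 | 534 | 1140 | 621 | 1306 | 291 | 694 | 211 | 519 |
| L2 | 780 | 1633 | 852 | 1769 | 587 | 1287 | 291 | 678 |
| L3 | 443 | 957 | 451 | 966 | 373 | 858 | 244 | 596 |
| L4 | 649 | 1370 | 701 | 1466 | 509 | 1130 | 252 | 600 |
| Sum | 2406 | 5100 | 2625 | 5507 | 1760 | 3969 | 998 | 2393 |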

Note. −ln L = negative log-likelihood, BIC = Bayesian Information Criterion. Boldface type is used to indicate the best fits among the baseline models and also the fits for the serial-attention-switch model.

The predicted mean RTs and error rates from the serial-attention-switch model are reported for each individual stimulus in Table 1. Comparing the predictions to the observed data, it is clear that the model provides a very good overall account of the mean RTs and error rates. (The correlation between the predicted and observed mean RTs of the nine stimuli was r=.996, r=.991, r=.978, and r=.993 for Subjects L1-L4, respectively.) The model accounts for all of the aforementioned qualitative effects involving the mean RTs. To some extent, it is even able to account for the finding that the redundant stimulus is sometimes classified more slowly than the exterior stimulus on the first-processed dimension (although it clearly underestimates the magnitude of the effect). The explanation is as follows. On some small proportion of trials in which the redundant stimulus is presented, the subject's decision on the first-processed dimension will be in error. On those trials, the subject will undergo a second stage of processing in which the second dimension is examined. The most likely outcome is that the subject classifies the redundant stimulus correctly during that second stage of processing. Such trials will result in slow RTs, because two stages of processing were required to achieve the correct response. By contrast, if the subject makes an error on the interior or exterior stimulus during the first stage of processing, the final response is most likely to be an error (because these stimuli fall to the wrong side of the decision boundary on the second dimension). Thus, averaging across trials, the overall correct mean RT for the redundant stimulus may be slowed.

The fits of the serial-attention-switch model are illustrated in greater detail in Figure 6, which plots the predicted correct-RT distributions for each individual subject and stimulus against the observed correct-RT distributions. In these plots, the observed correct-RT distributions (open bars) are displayed as vincentile histograms (for a review, see Van Zandt, 2000, pp. 429-430). The predicted correct-RT distributions are shown as smoothed densities computed from the correct trials of the 10,000 simulated RTs for each individual stimulus (using a Gaussian kernel estimator -- see Van Zandt, 2000, p. 430). As can be seen, beyond accounting for the data at the level of mean RTs, the model generally does an excellent job of accounting for the detailed shapes of the individual-stimulus RT distributions. None of the other models came close to matching this degree of quantitative precision.

Figure 6.

Experiment 1: Fit (smooth curves) of the serial-self-terminating model (with attention switching) to the detailed response time (RT) distribution data (open bars) of the individual subjects. Each cell of each panel shows the RT distribution associated with an individual stimulus. Within each panel, the spatial layout of the stimuli is the same as in Figure 1. See text for further description of the derivation of the predicted and observed RT-distribution plots.

Survivor-Interaction-Contrast (SIC) Function Analyses

To obtain additional perspective on the model-fitting results, we analyze the RT-distribution data by computing what is known as the survivor-interaction-contrast (SIC) function associated with the target-category members (Townsend & Nozawa, 1995). As reviewed below, the SIC function provides highly diagnostic information for telling apart alternative mental architectures and stopping rules, and an analysis involving the SIC function provides some interesting insights regarding the present models.

The survivor function associated with a time-based random variable T is defined as the probability that the process takes greater than t time units to complete:

S(t)=P(T>t). (4)

The SIC function is defined analogously to the mean-interaction-contrast (MIC) described earlier in this article (Equation 1). For each time value t, one computes:

SIC(t) = [S_LL(t) − S_LH(t)] − [S_HL(t) − S_HH(t)], (5)

where S_ij(t) is the survivor function associated with target-category member ij. (Both the MIC and SIC statistics are computed with respect to the target-category stimuli only, not the contrast-category stimuli.) Because it will be useful in discussing the results below, we note here that the MIC is equal to the integral of the SIC. As illustrated schematically in Figure 7, the alternative mental architectures make differing predictions of the form of the SIC function (Townsend & Nozawa, 1995). (Recall that, in the present paradigm, correct responding to the members of the target category demands exhaustive processing of the dimensions. Therefore, we illustrate the predictions for only the exhaustive cases.) As can be seen, if the dimensions are processed in serial fashion, then the SIC function will be S-shaped, with the initial part of the function being negative, the latter part positive, and the areas subsumed by the negative and positive portions being equal to one another (MIC=0). If processing is parallel, then the SIC function will be negative everywhere (MIC<0). And if processing is coactive, then the function will have an initial negative blip followed by extended positivity (MIC>0).
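As a rough illustration of Equations 4 and 5, the following Python sketch estimates empirical survivor functions from lists of correct RTs and combines them into an SIC curve. The function names (`make_survivor`, `sic`, `mic`), the 1-ms time grid, and the toy RT values are assumptions introduced here for illustration; this is not the analysis code used in the article.

```python
from bisect import bisect_right

def make_survivor(rts):
    """Return the empirical survivor function S(t) = P(T > t) for a list of RTs."""
    ordered = sorted(rts)
    n = len(ordered)

    def survivor(t):
        # bisect_right counts observations <= t, so 1 - count/n is P(T > t).
        return 1.0 - bisect_right(ordered, t) / n

    return survivor

def sic(rt_ll, rt_lh, rt_hl, rt_hh, times):
    """SIC(t) = [S_LL(t) - S_LH(t)] - [S_HL(t) - S_HH(t)] on a grid of times."""
    s_ll, s_lh, s_hl, s_hh = (make_survivor(r) for r in (rt_ll, rt_lh, rt_hl, rt_hh))
    return [(s_ll(t) - s_lh(t)) - (s_hl(t) - s_hh(t)) for t in times]

def mic(rt_ll, rt_lh, rt_hl, rt_hh):
    """Mean interaction contrast, the same double difference applied to mean RTs."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(rt_ll) - mean(rt_lh)) - (mean(rt_hl) - mean(rt_hh))

# Hypothetical toy RTs (ms) for the four target-category stimuli.
rt_ll = [900, 1100]
rt_lh = [700, 800]
rt_hl = [750, 850]
rt_hh = [500, 700]

grid = range(0, 1200)                    # 1-ms steps covering all toy RTs
sic_curve = sic(rt_ll, rt_lh, rt_hl, rt_hh, grid)
area_under_sic = sum(sic_curve)          # Riemann sum with step size 1 ms
toy_mic = mic(rt_ll, rt_lh, rt_hl, rt_hh)
```

For these integer-valued toy RTs, the summed SIC and the MIC coincide, which mirrors the statement that the MIC equals the integral of the SIC.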

Figure 7.

Schematic illustration of the predicted survivor-interaction contrast (SIC) functions for the serial exhaustive, parallel exhaustive and coactive models.

The observed SIC functions for the four subjects in our experiment are shown in Figure 8. It is clear from inspection that they match the general S-shaped form of a serial processing model. Also plotted in Figure 8 (top panels) are the precise quantitative predictions of the SIC curves from the serial-attention-switch model. These predicted SIC functions are those derived when holding all parameters fixed from the previous analyses in which the model was fitted to the individual-stimulus RT distributions. For all four subjects, the model does a very good job of characterizing the precise form of the observed SIC functions. By way of comparison, in the bottom panels of Figure 8 we plot the derived SIC functions from the free stimulus-drift-rate model. It can be seen that the model misses systematically the observed functions. The cross-over point from negativity to positivity is always too late and the positive portion of the function has too long a tail. In addition, for some of the subjects, the area subsumed by the positive part of the function is too small relative to the area subsumed by the negative part of the function. (The positive-tail problem would be even worse if parameters were chosen to make the negative and positive regions equal in area.)

Figure 8.

Experiment 1: Predicted and observed survivor-interaction-contrast (SIC) functions computed over the target-category stimuli for each participant. Top panels: serial-self-terminating attention-switch model. Bottom panels: free stimulus-drift-rate model.

These SIC analyses provide insights into why the serial rule model is providing a better account of the shapes of the RT distributions than is the free stimulus-drift-rate model. Apparently, the free drift-rate model is predicting an RT distribution for the LL stimulus that, relative to the other target-category distributions, has too long a tail and is too positively skewed (see SIC definition in Equation 5). Because the mean RT for the LL stimulus is slower than for the other stimuli, the free drift-rate model must assign that stimulus a slower drift rate parameter. But, in the case of single-channel random-walk and diffusion models, the slower the drift rate, the more positively skewed will be the predicted distribution (e.g., Ratcliff & Smith, 2004). Although similar considerations apply to the serial-rule model, note that in that model the predicted decision-time distributions arise as sums of multiple component distributions (i.e., two stages of independent decisions plus an attention-switch stage). Thus, owing to considerations of the central limit theorem, the degree of positive skewing of the slow distributions is not as extreme as in a single-channel random-walk model, i.e., the distributions are more bell-shaped.5

In sum, the SIC analyses provide further strong support for the present logical-rule model account of classification RTs, namely that decision making for the target category involved serial processing of the component dimensions. Moreover, although previous researchers have reported SIC analyses to diagnose underlying mental architectures and stopping rules, to our knowledge the present analysis is the first to consider the quantitative match of parameterized models to such data. Finally, the analysis provides insights into the reason why the present serial-rule model is predicting the detailed shapes of the individual-stimulus RT distributions better than is the single-channel, free-stimulus-drift-rate model.

Best-Fitting Parameters

The best-fitting parameters from the serial attention-switch model are reported for each individual subject in Table 4. Overall, the parameter estimates are sensible and easy to interpret. For example, for each subject, the best-fitting decision-bound parameters (Dx and Dy) are located roughly midway between the means of the perceptual distributions of the contrast-category stimuli and the adjacent target-category stimuli. The residual-time means range between 353.3 and 648.0 ms, and the attention-switch means range between 104.4 and 259.8 ms, which also seem like reasonable estimates. According to the px estimates, Subject L4 always processed the dimensions in the order x-then-y, which agrees with our previous inference that Subject L4 was a fixed-order serial processor. Also in agreement with our previous inferences, the best-fitting px parameters indicate that Subjects L1 and L2 were mixed-order serial processors; and that Subject L3 was intermediate, most often processing in the order x-then-y, but processing in the reverse order on roughly 10% of the trials.

Table 4.

Experiment 1: Best-Fitting Parameters for the Serial-Self-Terminating Attention-Switch Model.

Parameters
Participant σx σy Dx Dy +A −B k μR σR μAS σAS px
L1 4.27 3.08 15.87 10.00 10 9 11.60 364.3 115.7 104.4 20.0 0.75
L2 4.52 2.17 16.04 10.05 7 10 5.41 492.0 534.4 165.7 82.4 0.47
L3 7.69 4.98 15.56 9.89 28 19 1.11 648.0 111.0 227.3 1168.9 0.92
L4 5.14 4.78 15.88 9.94 18 12 2.45 353.3 42.5 259.8 171.0 1.00

Note: σx, σy = perceptual standard deviations on dimensions x and y; Dx, Dy = decision boundary locations on dimensions x and y; +A, −B = random walk criteria; k = random-walk scaling constant; μR, σR = residual-time mean and standard deviation; μAS, σAS = attention-switch-time mean and standard deviation; px = probability that dimension x is processed before dimension y.

Discussion

In summary, the qualitative patterns of mean RTs for the target and contrast categories, the shapes of the derived SIC functions, and the detailed quantitative fits to the individual-stimulus RT distributions provide strong support for the serial-self-terminating logical-rule model of classification RTs. Although Fifić et al. (2010) reported previous data that pointed toward the serial self-terminating rule model, those previous data were collected under conditions in which subjects were provided with explicit knowledge of the rule-based category structures and where most were provided with explicit instructions to use a serial-self-terminating processing strategy to implement the logical rules. By contrast, the present results were obtained in conditions in which subjects needed to learn the categories via trial-by-trial induction over individual training exemplars and were free to use whatever classification strategy they chose. Thus, the data provide convincing evidence of conditions in which subjects freely choose to use logical-rule strategies in tasks of multidimensional perceptual classification, and provide strong support for the newly proposed logical-rule models of classification RT.

Experiment 2

Thus far, the evidence for logical-rule use has been found in situations in which the stimulus dimensions are located in spatially separated regions. This situation held not only in our Experiment 1 but also in the other recent studies that investigated logical-rule models of classification RT (Fific et al., 2010; Lafond et al., 2009). The motivating idea for testing spatially separated dimensions was to promote the possibility of serial processing of dimensions, which, intuitively, seems conducive to applying logical-rule strategies. In the present Experiment 2, the idea was to begin to investigate more general conditions in which logical-rule use might take place. We tested the same logical category structure as in Experiment 1. However, instead of using stimuli in which the dimensions are located in spatially separated regions, we now test stimuli in which the dimensions are spatially overlapping. Here, our expectation is that serial-self-terminating processing of the dimensions may be far less likely. Nevertheless, alternative information-processing architectures may still underlie the application of logical-rule strategies.

Because the manner in which subjects may process spatially overlapping dimensions in the present categorization tasks is unclear, and because we wished to study such processing in a context in which people do try to apply logical-rule strategies, we decided to reinstitute the research approach from Fific et al. (2010). In particular, subjects were provided with explicit instructions about the rule-based structure of the categories and were asked to classify the stimuli in accord with these logical rules; however, specific processing instructions (e.g., serial self-terminating) for individual dimensions were not provided. Once we obtain a better understanding of the expected performance patterns under such conditions, the stage will be set for later investigating speeded categorization with spatially overlapping stimulus dimensions under open-ended learning and performance conditions.

Method

Participants

Four Indiana University students completed Experiment 2 and received $9 per session (with an extra $3 bonus per session for accurate performance) for their participation. The subjects had normal or corrected-to-normal vision and were unaware of the issues under investigation in the research.

Stimuli

The stimuli were 225 × 150 pixel rectangles displayed in red (Munsell Hue 5R, brightness value 5) with a 10-pixel-wide black border and a 100 × 10 pixel interior vertical black line that extended from the lower-left corner of the rectangle (also used in Nosofsky & Little, in press). There were 9 stimuli composed of all combinations of 3 levels of saturation (Munsell-chroma levels 3.5, 5.5 and 18) and 3 positions of the vertical line (80, 64 and 20 pixels from the left-hand side of the rectangle). The colors were generated by converting them into RGB values with the Munsell Conversion Software v8 (http://wallkillcoor.com/). The stimuli subtended a vertical visual angle of about 3.82 degrees and a horizontal visual angle of about 5.72 degrees.

Similarity-ratings studies, described in previous work by Nosofsky and Little (in press), were used to derive a two-dimensional scaling solution for the stimuli. Some details regarding this scaling work and the derived coordinate parameters for the stimuli are reported in Appendix A of the present article. The structure of the derived scaling solution closely matched the schematic design illustrated in our Figure 1.

Procedure

The procedure of Experiment 2 was identical to that of Experiment 1 with the following exceptions. Prior to the start of the experiment, participants were shown a schematic diagram of the category space that displayed all nine of the stimuli along with the rule boundaries. The nature of the rules was explained to the subjects and they were instructed to classify the stimuli in accord with these rules. Initial pilot tests revealed that discriminating items at the category boundaries was difficult; hence, at the outset participants were given instructions about the difficulty of this discrimination and were allowed to inspect the stimulus dimensions separately. That is, participants were allowed to view rectangles displayed in the three levels of saturation but without the inset line. Likewise, participants viewed the position of the inset line but in a white-colored rectangle. The order in which each dimension was shown was counterbalanced across participants. The first session was used as a practice session for participants to learn to perceptually discriminate the diagnostic dimensional values and to practice the rules that were provided at the start of the experiment.

Results

In Experiment 2, we refer to the participants as O1 to O4 where the O designates the overlapping-dimensions experiment. Again, the initial training/practice session (Session 1) was excluded from the analyses. RT analyses were conducted for correct trials only. Finally, for each individual participant and stimulus, trials with RTs less than 150 ms and trials with RTs greater than 3 standard deviations above the mean for that stimulus were also removed. Less than 2% of the trials were removed by this method. As shown in Table 5, error rates were low; hence, we now turn to analyses of the mean RTs.
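The trimming rule just described can be sketched as follows. This is a minimal illustration, assuming that the mean and standard deviation are computed from all correct RTs for a given participant and stimulus before any trimming, and that the sample (n − 1) standard deviation is used; the article does not specify either detail. The function name `trim_rts` and the example data are hypothetical.

```python
from statistics import mean, stdev

def trim_rts(rts, floor_ms=150.0, n_sd=3.0):
    """Drop RTs below floor_ms and RTs more than n_sd SDs above the mean.

    Assumption: the mean and (sample) SD are taken over the untrimmed RTs
    of one participant/stimulus cell.
    """
    upper = mean(rts) + n_sd * stdev(rts)
    return [rt for rt in rts if floor_ms <= rt <= upper]

# Hypothetical cell: 18 typical RTs, one anticipation, one very slow response.
cell = [600] * 18 + [100, 5000]
kept = trim_rts(cell)
proportion_removed = 1 - len(kept) / len(cell)
```

Applying the function separately to each participant-by-stimulus cell matches the per-stimulus wording of the rule.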

Table 5.

Experiment 2: Mean Correct RTs (ms) and Error Rates for the Individual Stimuli, Along with the Modeling Predictions.

Item
Participant O1 HH HL LH LL EY IY EX IX R
RT Observed 807.3 1005.1 890.5 1101.5 905.5 961.9 862.9 865.2 800.8
RT Model 756.6 978.9 901.7 1129.5 937.4 989.4 874.3 885.1 873.7
p(e) Observed 0.00 0.02 0.01 0.01 0.02 0.03 0.04 0.03 0.00
p(e) Model 0.00 0.01 0.00 0.01 0.03 0.03 0.02 0.02 0.00

Participant O2 HH HL LH LL EY IY EX IX R

RT Observed 473.1 612.4 578.3 634.7 533.8 587.5 584.3 596.2 508.5
RT Model 481.1 610.7 580.8 636.4 557.5 563.7 593.0 590.2 521.5
p(e) Observed 0.00 0.01 0.00 0.01 0.00 0.01 0.05 0.02 0.00
p(e) Model 0.00 0.02 0.00 0.02 0.00 0.00 0.01 0.01 0.00

Participant O3 HH HL LH LL EY IY EX IX R

RT Observed 627.7 832.8 723.1 827.7 691.1 775.0 717.6 793.4 626.4
RT Model 622.3 810.4 764.9 907.0 707.4 734.7 725.5 787.6 656.9
p(e) Observed 0.01 0.07 0.02 0.05 0.01 0.01 0.04 0.01 0.00
p(e) Model 0.00 0.06 0.04 0.10 0.01 0.01 0.03 0.02 0.00

Participant O4 HH HL LH LL EY IY EX IX R

RT Observed 640.1 950.4 859.2 1102.2 942.5 1156.6 662.6 680.0 649.0
RT Model 636.8 932.9 865.4 1131.0 975.7 1142.5 672.5 699.4 691.8
p(e) Observed 0.00 0.00 0.00 0.00 0.01 0.02 0.01 0.00 0.00
p(e) Model 0.00 0.01 0.00 0.01 0.01 0.02 0.00 0.00 0.00

Note. Predictions are from the mixed serial/parallel model, which is described in the text.

Mean RT Analyses

The mean correct RTs for the individual subjects and stimuli are reported in Table 5 and are displayed graphically in Figure 9. As can be seen from inspection of the figure, the patterns of mean RTs are considerably more variable across subjects than was observed in Experiment 1. In addition, comparing the patterns to the canonical prediction graphs in Figure 3, no clear-cut winners seem to emerge from among the competing process models. The pattern of mean RTs for O1 comes close to matching the predictions from a fixed-order serial-self-terminating rule model. The main exception is the faster-than-expected mean RT for the redundant stimulus. For O2, the target-category RTs show a clearly under-additive pattern, suggesting parallel processing; however, for the contrast category, the mean RTs on the first-processed dimension are not flat, contradicting the predictions from a pure parallel model. For O3, again, the target-category RTs are highly under-additive, suggesting parallel processing; however, the pattern of contrast-category RTs strongly resembles the predictions from a mixed-order serial-self-terminating model. Finally, like O2 and O3, Subject O4 shows signs of both parallel processing (under-additive target-category RTs) and (fixed-order) serial-self-terminating processing for the contrast category.

Figure 9.

Experiment 2: Observed mean response times (RTs) for the individual subjects and stimuli. Error bars represent ±1 SE. The left panels show the results for the target-category stimuli, and the right panels show the results for the contrast-category stimuli. Left panels: L = low-discriminability dimension value; H = high-discriminability dimension value; D1 = Dimension 1; D2 = Dimension 2. Right panels: R = redundant stimulus; I = interior stimulus; E = exterior stimulus. For ease in making comparisons to the prediction graphs in Figure 3, the contrast-category stimuli are labeled with respect to whether they are on the “first-processed” or “second-processed” dimension, as defined in the text.

Clearly, the data pose a major challenge for all of the baseline logical-rule models. Nevertheless, to anticipate, we will argue below that a reasonable account of the data is provided by assuming that subjects applied logical-rule strategies, with the mental architecture for implementing the rules involving a mix of parallel and serial processing across trials. In our view, such an occurrence seems plausible for the present kinds of stimuli.

A full presentation of the statistical-test results for the four subjects is provided in Table 6. In general, the statistical-test results confirm the descriptions that we provided above. However, because of the variability across subjects, it is difficult to summarize the many statistical-test results in succinct fashion. Here, we provide a capsule summary of the results for only the target-category members. We conducted a 4 × 2 × 2 ANOVA of the target-category RTs of each subject, using as factors Session (2-5), discrimination level of the saturation dimension (L or H), and discrimination level of the vertical-bar dimension (L or H). For all participants, there was a main effect of session, reflecting a slight speeding of responding over time. Although not reported in the table, the main effects of stimulus-dimension discriminability were significant for all subjects, reflecting faster processing of the H values than of the L values. The main result of interest is that, for Subjects O2, O3, and O4, the interaction between discrimination levels of the bar and saturation dimensions was statistically significant, reflecting the under-additive pattern of mean RTs seen for these subjects in Figure 9. Also as expected from observation of Figure 9, this interaction was not statistically significant for Subject O1. Finally, we mention that there were occasional significant three-way interactions of Session × Saturation × Bar (see Table 6), suggesting some changes in the form of processing over the course of the experiment.
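The direction of the Bar × Saturation interaction can be checked descriptively from the observed target-category means in Table 5. The sketch below assumes that the MIC takes the same double-difference form as the SIC in Equation 5, applied to mean RTs; the dictionary layout and function name are illustrative only, and the computation says nothing about statistical significance.

```python
# Observed mean correct RTs (ms) for the target-category stimuli, from Table 5.
observed_mean_rt = {
    "O1": {"HH": 807.3, "HL": 1005.1, "LH": 890.5, "LL": 1101.5},
    "O2": {"HH": 473.1, "HL": 612.4, "LH": 578.3, "LL": 634.7},
    "O3": {"HH": 627.7, "HL": 832.8, "LH": 723.1, "LL": 827.7},
    "O4": {"HH": 640.1, "HL": 950.4, "LH": 859.2, "LL": 1102.2},
}

def mean_interaction_contrast(m):
    """MIC = (RT_LL - RT_LH) - (RT_HL - RT_HH); negative values are under-additive."""
    return (m["LL"] - m["LH"]) - (m["HL"] - m["HH"])

mic_by_subject = {
    subject: round(mean_interaction_contrast(means), 1)
    for subject, means in observed_mean_rt.items()
}
# O1: 13.2, O2: -82.9, O3: -100.5, O4: -67.3 (ms)
```

The near-zero value for O1 and the clearly negative values for O2-O4 agree with the pattern of significant and nonsignificant interactions summarized above.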

Table 6.

Experiment 2: Statistical-Test Results for the Individual Subjects.

Target Category
Contrast Category
Participant O1 df F M t
Session 3 36.52*** E1 - I1 −2.30 −0.09
Bar × Saturation 1 0.15 E2 - I2 −56.39 −2.05 *
Session × Bar × Saturation 1 0.39 E1 - R 62.02 2.50 *
Error 1374 I1 - R 64.32 2.59 ***
E2 - R 104.63 3.96***
I2 - R 161.02 6.17***

Target Category
Contrast Category
Participant O2 df F M t

Session 3 35.86*** E1 - I1 −11.84 −1.58
Bar × Saturation 1 88.35*** E2 - I2 −53.74 −7.04***
Session × Bar × Saturation 1 3.8 E1 - R 75.79 12.72***
Error 1385 I1 - R 87.63 12.29***
E2 - R 25.22 4.43***
I2 - R 78.97 10.46***

Target Category
Contrast Category
Participant O3 df F M t

Session 3 18.64*** E1 - I1 −75.81 −3.47***
Bar × Saturation 1 16.82*** E2 - I2 −83.87 −4.10***
Session × Bar × Saturation 1 0.19 E1 - R 91.19 5.59***
Error 1327 I1 - R 167.00 8.46***
E2 - R 64.76 4.08***
I2 - R 148.63 7.98***

Target Category
Contrast Category
Participant O4 df F M t

Session 3 10.96*** E1 - I1 −17.46 −1.14
Bar × Saturation 1 4.73* E2 - I2 −214.16 −8.08***
Session × Bar × Saturation 1 3.18* E1 - R 13.60 0.96
Error 1385 I1 - R 31.06 1.99 *
E2 - R 293.5 14.95***
I2 - R 507.66 22.30***

Note: ± p < .10; * p < .05; ** p < .01; *** p < .001.

The MIC test is highlighted in boldface.

Computational Modeling

The model-fitting procedure was identical to the one that we used in Experiment 1. In conducting the modeling, the means of the perceptual distributions for the dimensional values were set to equal the coordinate values derived from the two-dimensional scaling solution for the stimuli (see Appendix A). These values were 0.00, 0.37, and 1.69 for Dimension x (saturation); and 0.00, 0.47, and 2.63 for Dimension y (bar position).

In addition to the seven baseline models that we fitted to the data from Experiment 1, we fitted a model that assumed a mixture of serial and parallel processing across trials. Specifically, the model assumed that with probability ps, processing on a given trial is serial self-terminating; and with probability 1 - ps, processing on a given trial is parallel self-terminating. This mixture model was motivated by our observation of the mixed patterns of mean RTs described in the Results section. Interestingly, as will be seen, our formal analyses will lead us to obtain some independent evidence that further supports this hypothesis of a mixture of serial and parallel processing.

To reduce the number of free parameters necessary for fitting the mixed model, we made several simplifying assumptions. First, we assumed that identical decision-boundary positions, random-walk criteria, and residual-time distributions operated for both the serial and parallel processes. Second, to allow for different processing rates across trials in which serial versus parallel processing operated, the dimensional variances for the serial process were multiplied by a parameter, mσ, to yield the variances for the parallel process. Finally, we also allowed the scaling parameter for the parallel-process random walks (kp) to differ from that of the serial-process random walks (ks). In total, the mixed serial/parallel model had 13 free parameters (see listing in Table 7).

Table 7.

Experiment 2: Best-Fitting Parameters for the Mixed Serial/Parallel Model.

Parameters
Participant σx σy Dx Dy +A −B ks kp μR σR mσ px ps ps-C
O1 0.37 0.59 0.10 0.14 6 5 31.08 91.77 361.9 314.4 0.13 0.98 0.91 0.27
O2 0.20 0.25 0.12 0.22 27 16 5.22 0.79 434.8 43.6 6.24 1.00 0.08 -
O3 0.11 0.10 0.30 0.43 3 5 26.92 68.75 450.0 128.5 8.09 0.46 0.85 -
O4 0.26 0.82 0.29 0.27 8 12 7.82 73.05 471.8 110.0 0.37 0.88 0.91 -

Note: σx, σy = perceptual standard deviations on each dimension; Dx, Dy = decision boundary locations on each dimension; +A, −B = random walk criteria; ks, kp = scaling constants for serial and parallel random walks; μR, σR = residual-time mean and standard deviation; mσ = variance multiplier parameter; px = probability that dimension x is processed before dimension y; ps = probability of serial processing on each trial. For participant O1, ps-C = probability of serial processing for the contrast-category.

A complication arose in our initial fits of the mixed model to the data from Subject O1. In particular, although the model provided a good fit to the subject's RT-quantiles data, it systematically overestimated the mean RTs (due to predicting a small number of exceedingly long RTs in the slowest quantile). To remedy the problem for this participant, we added a parameter to allow for the possibility that the relative proportion of serial and parallel processing might differ between the target and contrast category. That is, when categorizing a target-category item, serial processing occurred with probability ps; but when categorizing a contrast-category item, serial processing occurred with a different probability, ps-C. Although the assumption is post hoc, we discuss plausible reasons for such an occurrence in our General Discussion.

The fits of all of the models are shown in Table 8. The BIC values indicate that the mixed serial/parallel model is strongly preferred compared to all of the alternative logical-rule models, the coactive model, and the EBRW model. This improvement in fit occurs despite the penalty assigned to the model due to its increased number of free parameters. The comparisons between the mixed model and the free stimulus-drift-rate model are not as clear-cut. For two subjects (O1 and O4), the mixed model still yields dramatically improved fits; but for the other two subjects (O2 and O3), the BIC values of those two models are in the same ballpark. Because the mixed model appears to provide the most general account of the full set of data, in the remainder of this section we focus our analyses on this model. However, we revisit the free stimulus-drift-rate model in the Discussion.

Table 8.

Experiment 2: Negative Log-Likelihood and BIC Fits for the Models.

Models (left to right): Serial Self-Terminating, Parallel Self-Terminating, Serial Exhaustive, Parallel Exhaustive

Participant −ln L BIC −ln L BIC −ln L BIC −ln L BIC
O1 316 713 326 725 348 768 356 784
O2 266 612 236 545 415 903 392 856
O3 292 664 265 602 303 678 306 685
O4 235 551 261 594 416 904 476 1024
Sum 1109 2540 1088 2466 1482 3253 1530 3349

Models (left to right): Coactive, EBRW, Free Stimulus Drift Rate, Mixed Serial-Parallel

Participant −ln L BIC −ln L BIC −ln L BIC −ln L BIC
O1 340 752 331 726 306 725 209 531
O2 427 926 473 1010 218 549 218 541
O3 321 715 302 669 205 523 217 539
O4 310 692 395 855 233 579 192 489
Sum 1398 3085 1501 3260 962 2376 836 2100

Note. −ln L = negative log-likelihood, BIC = Bayesian Information Criterion. Best fits are indicated by boldface type.
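Because the boldface marking of best fits may not survive in every rendering of Table 8, the comparison can be recovered directly from the tabled BIC values. The sketch below simply re-enters those values; the variable and function names are illustrative, and lower BIC is taken to indicate the preferred model.

```python
PARTICIPANTS = ["O1", "O2", "O3", "O4"]

# BIC values copied from Table 8, one list entry per participant (O1..O4).
BIC = {
    "Serial Self-Terminating":   [713, 612, 664, 551],
    "Parallel Self-Terminating": [725, 545, 602, 594],
    "Serial Exhaustive":         [768, 903, 678, 904],
    "Parallel Exhaustive":       [784, 856, 685, 1024],
    "Coactive":                  [752, 926, 715, 692],
    "EBRW":                      [726, 1010, 669, 855],
    "Free Stimulus Drift Rate":  [725, 549, 523, 579],
    "Mixed Serial-Parallel":     [531, 541, 539, 489],
}

def best_model(participant):
    """Return the model with the lowest BIC for one participant."""
    i = PARTICIPANTS.index(participant)
    return min(BIC, key=lambda model: BIC[model][i])

# Summed BIC per model, matching the "Sum" rows of Table 8.
totals = {model: sum(values) for model, values in BIC.items()}
winners = {p: best_model(p) for p in PARTICIPANTS}
overall_winner = min(totals, key=totals.get)
```

On these values, the mixed model has the lowest BIC for O1, O2, and O4 and the lowest summed BIC, whereas the free stimulus-drift-rate model has the lowest BIC for O3.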

The predicted mean RTs and error rates from the mixed model are shown along with the observed data in Table 5. We start by noting that, although the error rates were low, the model is in the right ballpark for the error data. More important, inspection of the table indicates that the model provides a reasonably good account of the intricate patterns of mean RTs. (The correlation between the predicted and observed mean RTs of the nine stimuli was r=.939, r=.969, r=.906, and r=.995 for Subjects O1-O4, respectively.) Note that, when there is a mixture of serial and parallel processing across trials, the target-category mean RTs would be expected to show an under-additive pattern overall. The reason is that, across trials, one would be averaging an additive pattern of RTs (from the serial process) with an under-additive pattern (from the parallel process), thereby yielding an under-additive final average. Likewise, the mixture of serial-self-terminating and parallel-self-terminating processing across trials would also tend to yield the configuration of mean RTs most often seen for the contrast category (see Figures 3 and 9). On trials in which a parallel-self-terminating process operated, the RTs for the interior and exterior stimulus on each dimension would be flat. But on trials in which a serial-self-terminating process operated, the RTs for the exterior stimulus would tend to be faster than for the interior stimulus (at least on the second-processed dimension). In the final average, the exterior stimulus would therefore tend to have a faster RT than the interior stimulus (at least on the second-processed dimension). This general pattern is again the main one seen in the observed data. Finally, assuming a mixture of serial-self-terminating and parallel-self-terminating processing across trials, one would also expect to observe an RT advantage for the redundant stimulus, which is also the main pattern in the observed data.
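The averaging argument for the target category can be illustrated with a deliberately simplified calculation. The sketch below is not the random-walk model fitted in the article: it assumes, purely for illustration, that each dimension's processing time is exponentially distributed, that serial exhaustive processing sums the two times, that parallel exhaustive processing takes their maximum, and that residual and attention-switch times are ignored. The stage means (600 ms for a low-discriminability value, 200 ms for a high-discriminability value) are hypothetical.

```python
def expected_rt(mean_x, mean_y, p_serial):
    """Expected exhaustive decision time under a trial-by-trial mixture.

    Assumes independent exponential stage times with the given means.
    Serial: E[X + Y] = mean_x + mean_y.
    Parallel: E[max(X, Y)] = mean_x + mean_y - 1 / (1/mean_x + 1/mean_y).
    """
    serial = mean_x + mean_y
    parallel = mean_x + mean_y - 1.0 / (1.0 / mean_x + 1.0 / mean_y)
    return p_serial * serial + (1.0 - p_serial) * parallel

def mixture_mic(p_serial, low=600.0, high=200.0):
    """MIC = (RT_LL - RT_LH) - (RT_HL - RT_HH) for the mixture."""
    rt = lambda x, y: expected_rt(x, y, p_serial)
    return (rt(low, low) - rt(low, high)) - (rt(high, low) - rt(high, high))

mic_pure_serial = mixture_mic(1.0)     # additive: 0 ms
mic_pure_parallel = mixture_mic(0.0)   # under-additive: about -100 ms
mic_half_mixture = mixture_mic(0.5)    # still under-additive: about -50 ms
```

Under these assumptions the MIC of the mixture is the weighted average of the component MICs, so any nonzero proportion of parallel trials yields an under-additive overall pattern.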

In Figure 10 we show plots of the predicted RT distributions for each individual subject and stimulus against the observed RT distributions. The same methods were used for generating these plots as already described in Experiment 1. Overall, the mixed serial/parallel model accounts well for the detailed shapes of the individual-stimulus RT distributions.

Figure 10.

Experiment 2: Fit (smooth curves) of the mixed serial/parallel model to the detailed response time (RT) distribution data (open bars) of the individual subjects. Each cell of each panel shows the RT distribution associated with an individual stimulus. Within each panel, the spatial layout of the stimuli is the same as in Figure 1. See text for further description of the derivation of the predicted and observed RT-distribution plots.

Further support for the mixed serial/parallel model is shown in its fits to the SIC functions associated with the target-category members, which we plot in Figure 11. Again, the predicted functions are derived by holding fixed all parameters that were used for fitting the model to the individual-stimulus RT distributions. The predicted SIC functions do a reasonably good job of following along with the observed curves.

Figure 11.

Experiment 2: Predicted and observed survivor-interaction-contrast (SIC) functions computed over the target-category stimuli for each participant. The predictions are from the mixed serial/parallel model.

Of particular interest in these plots are the results for Subject O4. The SIC is predominately negative, reflecting the under-additive MIC that we have already noted for this subject. However, the SIC switches from negative to slightly positive for large processing times t. This pattern is not characteristic of any of the “pure” mental architectures (i.e., compare to the schematic plots for the serial, parallel, and coactive models in Figure 7). It could be produced, however, if there were a mix between parallel and serial processing across trials. That is, as can be seen from consideration of Figure 7, an average of the SICs for the serial and parallel processes could produce an SIC with the form seen for Subject O4. Thus, our hypothesis of a mix of serial and parallel processing across trials receives independent supporting evidence from this hybrid SIC function.6
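The sense in which a mixture yields an "average" of the SICs can be written out explicitly. The following is a sketch under the assumption that the probability of serial processing, p_s, is the same for all four target-category stimuli; the superscripts ser, par, and mix are notation introduced here for illustration, not the article's.

```latex
% Survivor function of a trial-by-trial mixture for target-category member ij:
S^{\mathrm{mix}}_{ij}(t) = p_s\, S^{\mathrm{ser}}_{ij}(t) + (1 - p_s)\, S^{\mathrm{par}}_{ij}(t)

% Substituting into Equation 5 and collecting terms:
\mathrm{SIC}^{\mathrm{mix}}(t)
  = \bigl[S^{\mathrm{mix}}_{LL}(t) - S^{\mathrm{mix}}_{LH}(t)\bigr]
  - \bigl[S^{\mathrm{mix}}_{HL}(t) - S^{\mathrm{mix}}_{HH}(t)\bigr]
  = p_s\, \mathrm{SIC}^{\mathrm{ser}}(t) + (1 - p_s)\, \mathrm{SIC}^{\mathrm{par}}(t)
```

Because the parallel-exhaustive SIC is negative throughout whereas the serial-exhaustive SIC turns positive at longer times, the weighted sum can be mostly negative yet slightly positive in its tail, which is the qualitative form described for Subject O4.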

Best-Fitting Parameters

The best-fitting parameters for the mixed serial/parallel model are reported in Table 7. It is straightforward to interpret most of the parameter estimates. For example, in all cases, the decision bounds are located intermediate between the means of the contrast-category distributions and the adjacent target-category distributions, as seems sensible. In addition, across subjects, the mean processing times for the residual stage vary between 361.9 and 471.8 ms, which also seem like reasonable estimates. The estimates of the serial/parallel mixture parameter indicate that Subject O2 was primarily a parallel processor, Subject O4 primarily a serial processor, whereas O1 was a serial processor for the target category but mostly a parallel processor for the contrast category. (We explain the plausibility of the latter result in our General Discussion.) These mixture-parameter estimates seem sensible in light of the patterns of mean RTs already discussed for these subjects. (On the other hand, because Subject O3 had strongly contrasting signatures of mean RTs for the target and contrast categories, we were unsure what mixture-parameter estimate to expect for that subject.)

Interpretation of some of the other parameter estimates, including the serial versus parallel scaling constants, the variance multiplier, and the magnitude of the random-walk criteria, is more complicated. We suspect that some of these parameters trade off strongly with one another, e.g., an increased magnitude of the criteria +A and −B is compensated for with reduced values of the step-time scaling constants ks and kp. In addition, recall that for simplicity and to reduce the number of free parameters, we imposed some arbitrary constraints of parameter equality across the serial and parallel processes. Thus, interpretations regarding the values of these other free parameters should be made with caution.
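The suspected trade-off between the criteria and the step-time scaling constants can be sketched with a textbook random walk. This is an illustration under simplifying assumptions, not the fitted model: it assumes a single random walk with a fixed up-step probability p, absorbing criteria at +A and −B, and a mean decision time equal to a scaling constant k multiplied by the expected number of steps. The function names and the numerical values are hypothetical.

```python
def prob_hit_upper(A, B, p):
    """Probability that a simple random walk started at 0 reaches +A
    before -B, with up-step probability p (gambler's-ruin result)."""
    if p == 0.5:
        return B / (A + B)
    r = (1.0 - p) / p
    return (1.0 - r ** B) / (1.0 - r ** (A + B))

def expected_steps(A, B, p):
    """Expected number of steps until absorption at +A or -B.
    Uses Wald's identity: E[final position] = (2p - 1) * E[steps]."""
    if p == 0.5:
        return float(A * B)
    p_upper = prob_hit_upper(A, B, p)
    return (A * p_upper - B * (1.0 - p_upper)) / (2.0 * p - 1.0)

def mean_decision_time(A, B, p, k):
    """Mean decision time, assuming each step takes k time units on average."""
    return k * expected_steps(A, B, p)

# Hypothetical baseline: criteria of 3 steps and 100 ms per step.
P_UP = 0.7
K_BASE = 100.0
base_time = mean_decision_time(3, 3, P_UP, K_BASE)

# Larger criteria (4 steps) with a reduced step-time constant chosen
# to reproduce the same mean decision time.
K_COMPENSATED = K_BASE * expected_steps(3, 3, P_UP) / expected_steps(4, 4, P_UP)
compensated_time = mean_decision_time(4, 4, P_UP, K_COMPENSATED)

if __name__ == "__main__":
    print(f"baseline:    A=B=3, k={K_BASE:.1f}  mean time={base_time:.1f}")
    print(f"compensated: A=B=4, k={K_COMPENSATED:.1f}  mean time={compensated_time:.1f}")
```

In this simplified setting the two parameter settings predict the same mean decision time, although they differ in predicted accuracy and distribution shape, so the trade-off is only partial. This is consistent with the caution urged above about interpreting these parameters individually.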

Discussion

In the present conditions involving spatially overlapping stimulus dimensions, the patterns of classification RT data did not conform to the predictions of any of the baseline logical-rule models, despite the fact that subjects were provided with explicit knowledge of the rule-based category structures and with instructions to use the rules as a basis for classification. Nevertheless, our modeling analyses lead us to suggest the possibility that, although subjects attempted to apply the logical rules, the mental architecture for implementing the rules may have involved a mix of serial and parallel processing. A formalized version of a mixed serial/parallel processing model gave a good account of the mean RTs and error rates associated with the individual stimuli; a good account of the shapes of the individual-stimulus RT distributions; and a good account of the derived SIC functions associated with the target-category members. Furthermore, in our view, this possibility of a mix of serial and parallel processing across trials seems plausible for the present types of stimuli. Finally, this version of the logical-rule model far outperformed two of the major extant contenders in the field, namely the EBRW and RW-MDB models.

In light of the good fits achieved by the mixed serial/parallel model, we decided to revisit our results from Experiment 1 and check the model's performance in that experiment as well. Interestingly, although the BIC fit for the mixed model was worse than for the pure serial model for Subject L1, the mixed model yielded substantially improved BIC fits for Subjects L2-L4. However, the best-fitting parameter estimates indicated that the Experiment-1 subjects engaged in parallel processing on only .12 to .16 of the trials, whereas a couple of our Experiment-2 subjects engaged in parallel processing on the majority of the trials. Thus, a natural interpretation is that processing may have been mixed in both experiments, with the stimulus manipulation simply influencing the proportion of serial versus parallel processing that took place.

Finally, in the present Experiment 2, for two of the subjects the free stimulus-drift-rate model yielded fits to the data that were as good as those of the mixed serial/parallel rule model. Of course, a major limitation of the free stimulus-drift-rate model is that it does not specify an underlying process that gives rise to the drift rates. This limitation does not preclude, however, that a deeper process interpretation may eventually be forthcoming. Thus, under conditions involving spatially overlapping stimulus dimensions, it may be that for some subjects, classification processing is better described in terms of multiple-channel decision making (as in the present logical-rule models of RT), whereas for other subjects the process is better described in terms of single-channel forms of decision making (as in the free drift-rate model). Future research will be needed to develop paradigms that can sharply discriminate between these alternatives.

General Discussion

To summarize, a classic idea in cognitive psychology and cognitive science is that, in many situations, people may represent categories in terms of logical rules. Until very recently, however, researchers have not developed rigorous theories to predict what categorization RTs should look like if such rules are indeed being used. In recent theoretical work, Fific et al. (2010) proposed a class of logical-rule models of categorization RT to help fill this gap. The models combine mental-architecture and random-walk approaches within an integrated framework, and allow for the prediction of detailed RT distribution data and error rates at the level of individual subjects and individual stimuli. A key idea within the framework is that logical-rule strategies in categorization may be implemented via a variety of processing modes, and diagnostic paradigms are needed to tease apart the processing modes used in alternative tasks and situations.

In their previous work, Fific et al. provided validation tests of the proposed models by giving subjects explicit instructions to use particular logical rules as a basis for classification. In addition, in most cases, subjects were given explicit instructions to use a serial-self-terminating process to implement the rules. By contrast, in Experiment 1 of the present work, we tested the extent to which the models might provide a good account of human categorization performance under open-ended learning conditions and where the subjects were free to adopt whatever classification strategy they chose. Impressively, the serial-self-terminating logical-rule model again yielded an excellent account of the classification RT data of the individual subjects, and far outperformed a variety of competing models of classification RT. Not only did it predict well the patterns of mean RTs and error rates for the individual stimuli in the tasks, it also provided an excellent account of the detailed shapes of the individual-stimulus RT distributions.

In both the present Experiment 1 and other recent work that has investigated logical-rule models of classification RT (e.g., Fific et al., 2008, 2010; Lafond et al., 2009), a salient aspect of the designs was that the defining dimensions of the stimuli were located in spatially separated regions. The motivation for using this type of stimulus format was that it might promote serial processing of the dimensions, which seems conducive to implementing logical-rule strategies. However, logical-rule use in categorization may also be implemented within alternative mental architectures. To begin to explore that issue, in Experiment 2 we tested the same category structures, but now with an alternative set of stimuli in which the dimensions were located in spatially overlapping regions. With these types of stimuli, we expected that serial processing of the dimensions would be less likely, and that forms of parallel or coactive processing of the dimensions might operate instead. To investigate performance under controlled conditions, however, we decided to reinstitute part of the research approach from Fific et al. (2010) by informing subjects of the rule-based structure of the categories.

Under these latter conditions, the patterns of performance were more complicated than observed in Experiment 1, and the RT data did not conform to any single one of the “baseline” logical-rule models (nor to major extant alternatives in the field). Nevertheless, a good working hypothesis seems to be that the logical rules were implemented via a mixture of serial-self-terminating and parallel-self-terminating processing. A formalized mixture model along these lines provided a good account of the intricate patterns of individual-stimulus mean RTs and again captured well the shapes of the individual-stimulus RT distributions.

In our view, this hypothesis of a mixture of serial and parallel processing is a very reasonable one and it receives added plausibility from other domains of research. For example, consider the classic studies of Shiffrin and Schneider (1977) and Schneider and Shiffrin (1977) that investigated attention and information processing in the domains of visual and memory search. In conditions involving “varied mappings” of stimuli to responses, it appeared that subjects mostly engaged in serial-search processes. But in conditions involving pure “consistent mappings,” it appeared that automatic attention responses developed that made allowance for parallel processing.7 The varied-mapping and consistent-mapping conditions are endpoints of a continuum, and surely there are intermediate conditions in which both serial and parallel processes may be brought jointly into play. Likewise, Wolfe's (1994) influential Guided Search model of visual search involves parallel and serial components operating in concert. Cousineau and Shiffrin (2004) also recently suggested a general model that involved combinations of both serial and parallel processing to describe detailed forms of RT-distribution data for both hits and misses in tasks of visual search.

Clearly, however, the idea of a serial-parallel mixture is only one possibility, and future research will need to develop and test alternative accounts of our observed classification RT data. For example, one source of evidence that was consistent with the hypothesis of a mixture of serial and parallel processing was our observation of a hybrid form of the target-category SIC function for one of our participants. To review, we found that an SIC function that starts out strongly negative but then shifts to slightly positive was well modeled in terms of the mixed serial-parallel model. Interestingly, Fific, Townsend and Eidels (2008) recently reported simulations that showed that an “interactive-channels” parallel model (e.g., Mordkoff & Yantis, 1991; Townsend & Thomas, 1994; Townsend & Wenger, 2004) could also produce SIC functions of this form, if the individual channels of the parallel process are allowed to facilitate one another to the proper degree. Whether such a model could simultaneously handle our target-category SICs and also the intricate patterns of contrast-category RT data remains an open question, but it seems like an interesting research avenue to pursue.

Finally, although we found evidence in favor of rule-based classification in the present research, it is critical for future research to investigate the boundary conditions on applications of such strategies. Obviously, one critical factor in the present work was that the category structure that we tested could in fact be described in terms of simple logical rules, making the application of such strategies a feasible one. Even for such rule-based category structures, however, it is a wide open question when rule-based strategies are actually used. Based on the results from Lafond et al. (2009) and the present Experiment 1, it appears that use of stimuli with spatially separated dimensions may be an important contributing factor. Whether subjects will freely adopt logical rule-based strategies in cases involving spatially overlapping dimensions, or even integral-dimension stimuli, remains to be investigated. Likewise, we need to extend and test the present models in domains involving more complex rules defined over more than two relevant dimensions.

In a recent experiment conducted by Nosofsky and Little (in press), the same category structure was tested as in the present research, using the same overlapping-dimension stimuli as in the present Experiment 2. A critical manipulation within the experiment was that some stimuli received probabilistic feedback assignments. Despite the probabilistic feedback assignments, the optimal decision strategy in the design was one in which the rule-based boundaries illustrated in Figure 1 should be used (see Nosofsky & Little, in press, for details). Nevertheless, in this design, Nosofsky and Little obtained convincing evidence that most subjects did not use a rule-based strategy. Instead, the patterns of classification RT data were more in accord with the predictions from an exemplar-retrieval model of classification. (In particular, subjects showed significantly slower RTs for stimuli that received probabilistic feedback assignments compared to deterministic stimuli that were the same distance from the rule-based boundaries.) Clearly, a great deal of future research is needed to understand the experimental conditions that lead human observers to adopt alternative classification strategies. The rule-based RT models tested in the present work should serve as valuable tools in conducting such investigations.

Acknowledgements

This work was supported by Grants FA9550-08-1-0486 from the Air Force Office of Scientific Research and MH48494 from the National Institute of Mental Health to Robert Nosofsky.

Appendix A

Similarity-Scaling Results for the Stimuli Used in Experiment 2

As described in more detail in an on-line supplement that accompanies Nosofsky and Little (in press), similarity ratings were collected for all pairs of the present Experiment-2 stimuli. A two-dimensional scaling model was fitted to the averaged ratings by searching for the coordinate parameters that minimized the sum of squared deviations between predicted and observed ratings. The model assumed a decreasing linear relation between similarity and the city-block distance between points in the derived space. In a full version of the model, each stimulus was allowed to have its own freely estimated pair of x-y coordinates. In a constrained version of the scaling model, all stimuli with a common physical value on Dimension x were constrained to have the same x-value in the scaling solution, and likewise for Dimension y. This constrained scaling model required the estimation of only four freely varying coordinate parameters. That is, because distances in the space are translation invariant, the coordinate parameters associated with dimension values x0 and y0 could be held fixed at zero without loss of generality, thereby requiring coordinate-parameter estimates for only dimension values x1, y1, x2 and y2. A general linear test indicated that the constrained two-dimensional scaling model did not fit the data significantly worse than the full two-dimensional scaling version in which all coordinate parameters were free to vary. The constrained two-dimensional scaling solution accounted for 98.2% of the variance in the averaged similarity ratings. The best-fitting coordinate parameters were 0.00, 0.37, and 1.69 along Dimension x; and 0.00, 0.47, and 2.63 along Dimension y. The derived configuration matches closely the intended form of the schematic design illustrated in Figure 1.
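The constrained scaling solution can be made concrete with a short sketch that rebuilds the nine stimulus coordinates from the reported dimension values and computes the city-block distances between them. The coordinate values are those reported above. The intercept and slope of the linear similarity function are not reported here, so they appear only as explicitly hypothetical arguments, and the function names are likewise illustrative.

```python
from itertools import combinations, product

# Best-fitting coordinates reported for dimension values 0, 1, and 2.
X_COORDS = (0.00, 0.37, 1.69)
Y_COORDS = (0.00, 0.47, 2.63)

# Each stimulus is identified by its pair of dimension-value indices (i, j).
STIMULI = list(product(range(3), range(3)))

def city_block(stim_a, stim_b):
    """City-block distance between two stimuli in the constrained solution."""
    (xa, ya), (xb, yb) = stim_a, stim_b
    return abs(X_COORDS[xa] - X_COORDS[xb]) + abs(Y_COORDS[ya] - Y_COORDS[yb])

def predicted_similarity(distance, intercept, slope):
    """Decreasing linear relation between similarity and distance.
    intercept and slope are hypothetical; the article does not report them."""
    return intercept - slope * distance

# Distances for all 36 distinct stimulus pairs.
PAIR_DISTANCES = {pair: city_block(*pair) for pair in combinations(STIMULI, 2)}

if __name__ == "__main__":
    for (a, b), d in sorted(PAIR_DISTANCES.items(), key=lambda item: item[1]):
        print(f"x{a[0]}y{a[1]} - x{b[0]}y{b[1]}: distance = {d:.2f}")
```

Because only dimension values 1 and 2 have freely varying coordinates on each dimension, the whole configuration is determined by the four estimated parameters noted above.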

Footnotes

Publisher's Disclaimer: The following manuscript is the final accepted manuscript. It has not been subjected to the final copyediting, fact-checking, and proofreading required for formal publication. It is not the definitive, publisher-authenticated version. The American Psychological Association and its Council of Editors disclaim any responsibility or liabilities for errors or omissions of this manuscript version, any version derived from this manuscript by NIH, or other third parties. The published version is available at www.apa.org/pubs/journals/XLM

1

To allow strong contrasts, we assume independent, unlimited-capacity parallel models in this work. Fific et al. (2010) also made preliminary investigations of certain limited-capacity parallel models. An extremely wide variety of such models could be developed, however, so future work is needed to more fully investigate these possibilities.

2

The log-normal seemed a reasonable choice for the assumed distribution of residual times because it is continuous, non-negative, unimodal, and positively skewed; and it has minuscule probability density below a given time cutoff (with appropriate parameter settings). These properties are commonly observed for numerous types of empirically observed RT distributions (cf. Ulrich & Miller, 1993) and they seem plausible for latent residual-time distributions as well. Regardless, the exact form of the assumed residual-time distribution is unlikely to have any bearing on the main conclusions reached in this research. As will be seen, the alternative categorization models to be tested make strongly contrasting qualitative predictions, and these predictions are independent of the assumed residual-time distribution. To corroborate this point, we also fitted some of the subjects' data while assuming a uniform distribution of residual times instead of a log-normal. An identical pattern of model-selection results was observed and none of our conclusions was affected by the specific residual-time distribution that was assumed.

3

For the exhaustive models, the relative speed of the redundant stimulus compared to the interior ones depends on the precise placement of the decision bounds. This detailed issue is not relevant in the context of the present studies.

4

There is some debate in the literature about the merits of fitting methods that make use of variable-width quantile bins versus fixed-width bins with varying counts (Heathcote & Brown, 2004; Speckman & Rouder, 2004). Fific et al. (2010) fitted the present models using both methods, and they yielded identical model-selection results.

5

Fific et al. (2010) advanced a related argument to explain different amounts of skewing in individual-stimulus RT distributions associated with the contrast category. However, in that previous study, all subjects engaged in an instructed fixed order serial-self-terminating strategy, so the contrast-category distributions tended to have different shapes than in the present study.

6

In a previous related study, Fific et al. (2008, Experiment 1) conducted preliminary investigations of classification performance in a condition involving spatially overlapping separable dimensions. In that preliminary study, only target-category performance was examined, and no formal modeling of the RT data was involved. Nevertheless, it is interesting to note that both subjects who participated in the overlapping-dimensions condition displayed SIC functions with a form that is similar to the one seen for Subject O4 in the present experiment; see Fific et al. (2008), p. 364, Figure 7, “Overlapping Spatial Positions”. Thus, there is precedent for this type of hybrid SIC function under similar experimental conditions, and we now have an account of such performance in terms of the mixed serial/parallel processing model.

7

Interestingly, for Subject O1 in our Experiment 2, the evidence suggested greater parallel processing for contrast-category members than for target-category members. In the present experimental design, dimension values x0 and y0 are consistently mapped to the contrast category; but all remaining dimension values receive variable mappings. (Whether those remaining dimension values signal the target or the contrast category depends on how they are combined.) Thus, because the target-category stimuli are composed only of dimension values that receive variable mappings, serial processing may have been more likely for the members of that category. Note that a subject does not need to “know” at the beginning of a trial which response-category is involved in order for parallel processing to override serial processing, so the account is not circular. In particular, in the present design, consistent mappings are associated with particular stimulus values (x0 and y0). Just as an “automatic attention” response developed for consistently mapped stimuli in Shiffrin and Schneider's classic search experiments, a similar automatic process may develop for consistently mapped stimulus values in the present classification paradigm.

References

  1. Ashby FG. A stochastic version of general recognition theory. Journal of Mathematical Psychology. 2000;44:310–329. doi: 10.1006/jmps.1998.1249.
  2. Ashby FG, Alfonso-Reese LA, Turken AU, Waldron EM. A neuropsychological theory of multiple systems in category learning. Psychological Review. 1998;105:442–481. doi: 10.1037/0033-295x.105.3.442.
  3. Ashby FG, Gott RE. Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1988;14:33–53. doi: 10.1037//0278-7393.14.1.33.
  4. Ashby FG, Maddox WT. A response time theory of separability and integrality in speeded classification. Journal of Mathematical Psychology. 1994;38:423–466.
  5. Ashby FG, Townsend JT. Varieties of perceptual independence. Psychological Review. 1986;93:154–179.
  6. Bourne LE. Knowing and using concepts. Psychological Review. 1970;77:546–556.
  7. Bradmetz J, Mathy F. Response times seen as decompression time in Boolean concept use. Psychological Research. 2008;72:211–234. doi: 10.1007/s00426-006-0098-7.
  8. Bruner JS, Goodnow JJ, Austin GA. A study of thinking. Wiley; New York, NY: 1956.
  9. Busemeyer JR. Decision making under uncertainty: A comparison of simple scalability, fixed-sample, and sequential-sampling models. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1985;11:538–564. doi: 10.1037//0278-7393.11.3.538.
  10. Cohen AL, Nosofsky RM. An extension of the exemplar-based random-walk model to separable-dimension stimuli. Journal of Mathematical Psychology. 2003;47:150–165.
  11. Cousineau D, Shiffrin RM. Termination of a visual search with large display size effects. Spatial Vision. 2004;17:327–352. doi: 10.1163/1568568041920104.
  12. Eidels A, Donkin C, Brown SD, Heathcote A. Converging measures of workload capacity. Psychonomic Bulletin & Review. in press. doi: 10.3758/PBR.17.6.763.
  13. Erickson MA, Kruschke JK. Rules and exemplars in category learning. Journal of Experimental Psychology: General. 1998;127:107–140. doi: 10.1037//0096-3445.127.2.107.
  14. Feldman J. Minimization of Boolean complexity in human concept learning. Nature. 2000;407:630–633. doi: 10.1038/35036586.
  15. Feldman J. An algebra of human concept learning. Journal of Mathematical Psychology. 2006;50:339–368.
  16. Fifić M, Little DR, Nosofsky RM. Logical-rule models of classification response times: A synthesis of mental-architecture, random-walk, and decision-bound approaches. Psychological Review. 2010;117. doi: 10.1037/a0018526.
  17. Fifić M, Nosofsky RM, Townsend JT. Information-processing architectures in multidimensional classification: A validation test of the systems-factorial technology. Journal of Experimental Psychology: Human Perception and Performance. 2008;34:356–375. doi: 10.1037/0096-1523.34.2.356.
  18. Fifić M, Townsend JT, Eidels A. Studying visual search using systems factorial methodology with target–distractor similarity as the factor. Perception & Psychophysics. 2008;70:583–603. doi: 10.3758/pp.70.4.583.
  19. Goodman ND, Tenenbaum JB, Feldman J, Griffiths TL. A rational analysis of rule-based concept learning. Cognitive Science. 2008;32:108–154. doi: 10.1080/03640210701802071.
  20. Heathcote A, Brown S. Reply to Speckman and Rouder: A theoretical basis for QML. Psychonomic Bulletin & Review. 2004;11:577–578. doi: 10.3758/bf03196613.
  21. Heathcote A, Brown S, Mewhort DJK. Quantile maximum likelihood estimation of response time distributions. Psychonomic Bulletin & Review. 2002;9:394–401. doi: 10.3758/bf03196299.
  22. Hooke R, Jeeves TA. Direct search solution of numerical and statistical problems. Journal of the ACM. 1961;8:212–229.
  23. Hunt E. Concept learning: An information-processing approach. Wiley; New York, NY: 1962.
  24. Lafond D, Lacouture Y, Cohen AL. Decision-tree models of categorization response times, choice proportions, and typicality judgments. Psychological Review. 2009;116:833–855. doi: 10.1037/a0017188.
  25. Lamberts K. Categorization under time pressure. Journal of Experimental Psychology: General. 1995;124:161–180.
  26. Lamberts K. The time course of categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1998;24:695–711.
  27. Lamberts K. Information-accumulation theory of speeded categorization. Psychological Review. 2000;107:227–260. doi: 10.1037/0033-295x.107.2.227.
  28. Levine M. A cognitive theory of learning: Research on hypothesis testing. Erlbaum; Hillsdale, NJ: 1975.
  29. Luce RD. Response times: Their role in inferring elementary mental organization. Oxford University Press; New York: 1986.
  30. Maddox WT, Ashby FG. Perceptual separability, decisional separability, and the identification-speeded classification relationship. Journal of Experimental Psychology: Human Perception and Performance. 1996;22:795–817. doi: 10.1037//0096-1523.22.4.795.
  31. Martin RC, Caramazza A. Classification in well-defined and ill-defined categories: Evidence for common processing strategies. Journal of Experimental Psychology: General. 1980;109:320–353. doi: 10.1037//0096-3445.109.3.320.
  32. Medin DL, Schaffer MM. Context theory of classification learning. Psychological Review. 1978;85:207–238.
  33. Miller JO. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology. 1982;14:247–279. doi: 10.1016/0010-0285(82)90010-x.
  34. Mordkoff JT, Yantis S. An interactive race model of divided attention. Journal of Experimental Psychology: Human Perception and Performance. 1991;17:520–538. doi: 10.1037//0096-1523.17.2.520.
  35. Nosofsky RM. Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General. 1986;115:39–57. doi: 10.1037//0096-3445.115.1.39.
  36. Nosofsky RM, Clark SE, Shin HJ. Rules and exemplars in categorization, identification, and recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1989;15:282–304. doi: 10.1037//0278-7393.15.2.282.
  37. Nosofsky RM, Johansen MK. Exemplar-based accounts of multiple-system phenomena in perceptual categorization. Psychonomic Bulletin & Review. 2000;7:375–402.
  38. Nosofsky RM, Little DR. Classification response times in probabilistic rule-based category structures: Contrasting exemplar-retrieval and decision-bound models. Memory & Cognition. in press. doi: 10.3758/MC.38.7.916.
  39. Nosofsky RM, Palmeri TJ. An exemplar-based random walk model of speeded classification. Psychological Review. 1997;104:266–300. doi: 10.1037/0033-295x.104.2.266.
  40. Nosofsky RM, Palmeri TJ, McKinley SC. Rule-plus-exception model of classification learning. Psychological Review. 1994;101:53–79. doi: 10.1037/0033-295x.101.1.53.
  41. Nosofsky RM, Stanton RD. Speeded classification in a probabilistic category structure: Contrasting exemplar-retrieval, decision-boundary, and prototype models. Journal of Experimental Psychology: Human Perception and Performance. 2005;31:608–629. doi: 10.1037/0096-1523.31.3.608.
  42. Palmer J, McLean J. Imperfect, unlimited capacity, parallel search yields large set-size effects. Presentation at the 1995 Meeting of the Society for Mathematical Psychology; Irvine, CA. 1995.
  43. Posner MI, Keele SW. On the genesis of abstract ideas. Journal of Experimental Psychology. 1968;77:353–363. doi: 10.1037/h0025953.
  44. Ratcliff R. A theory of memory retrieval. Psychological Review. 1978;85:59–108.
  45. Ratcliff R, Rouder JN. Modeling response times for two-choice decisions. Psychological Science. 1998;9:347–356.
  46. Ratcliff R, Smith P. A comparison of sequential sampling models for two choice reaction time. Psychological Review. 2004;111:333–367. doi: 10.1037/0033-295X.111.2.333.
  47. Reed SK. Pattern recognition and categorization. Cognitive Psychology. 1972;4:194–206.
  48. Schneider W, Shiffrin RM. Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review. 1977;84:1–66.
  49. Schweickert R. Information, time, and the structure of mental events: A twenty-five year review. In: Meyer DE, Kornblum S, editors. Attention and performance: Vol. 14. Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience -- A silver jubilee. MIT Press; Cambridge, MA: 1992. pp. 535–566.
  50. Shepard RN. Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology. 1964;1:54–87.
  51. Shepard R, Hovland C, Jenkins H. Learning and memorization of classifications. Psychological Monographs: General and Applied. 1961;75:1–42.
  52. Shiffrin RM, Schneider W. Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review. 1977;84:127–190.
  53. Speckman PL, Rouder JN. A Comment on Heathcote, Brown, and Mewhort's QMLE method for response time distributions. Psychonomic Bulletin & Review. 2004;11:574–576. doi: 10.3758/bf03196613.
  54. Sperling G, Weichselgartner E. Episodic theory of the dynamics of spatial attention. Psychological Review. 1995;102:503–532.
  55. Sternberg S. Memory scanning: Mental processes revealed by reaction-time experiments. American Scientist. 1969;57:421–457.
  56. Thornton TL, Gilden DL. Parallel and serial processes in visual search. Psychological Review. 2007;114:71–103. doi: 10.1037/0033-295X.114.1.71.
  57. Townsend JT. Uncovering mental processes with factorial experiments. Journal of Mathematical Psychology. 1984;28:363–400.
  58. Townsend JT. Serial vs. parallel processing: Sometimes they look like tweedledum and tweedledee but they can (and should) be distinguished. Psychological Science. 1990;1:46–54.
  59. Townsend JT, Nozawa G. Spatio-temporal properties of elementary perception: An investigation of parallel, serial, and coactive theories. Journal of Mathematical Psychology. 1995;39:321–359.
  60. Townsend JT, Thomas R. Stochastic dependencies in parallel and serial models: Effects on systems factorial interactions. Journal of Mathematical Psychology. 1994;38:1–34.
  61. Townsend JT, Wenger MJ. A theory of interactive parallel processing: New capacity measures and predictions for a response time inequality series. Psychological Review. 2004;111:1003–1035. doi: 10.1037/0033-295X.111.4.1003.
  62. Trabasso T, Bower GH. Attention in learning: Theory and research. Wiley; New York: 1968.
  63. Trabasso T, Rollins H, Shaughnessy E. Storage and verification stages in processing concepts. Cognitive Psychology. 1971;2:239–289.
  64. Ulrich R, Miller JO. Information processing models generating lognormally distributed reaction times. Journal of Mathematical Psychology. 1993;37:513–525.
  65. Van Zandt T. How to fit a response time distribution. Psychonomic Bulletin & Review. 2000;7:424–465. doi: 10.3758/bf03214357.
  66. Wolfe JM. Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review. 1994;1:202–238. doi: 10.3758/BF03200774.
