Categorization refers to the ability to reduce the potentially unlimited number of objects an animal might encounter to a smaller number of discrete groups, or categories. Animals adaptively form categories to exploit similarities and differences between objects that aid communication, prediction, and decision-making. For example, a human must recognize a person as friend or stranger, a monkey must determine whether a conspecific is a foe, and a rodent must decide whether the large animal roaming nearby is a predator. The goal of categorization is therefore to abstract a decision rule that permits rapid recognition of category-relevant features while ignoring or suppressing category-irrelevant features.
To properly understand a theory of information processing such as categorization, Marr (1982) proposed that we must understand its operation at multiple levels of analysis: the abstract goal of the computation and why it is environmentally appropriate (computational level), how the computational theory is represented as an input–output process and the algorithms that perform this transformation (representational or algorithmic level), and how the representations and algorithms are embodied in the neural hardware—the physical realization of the process (implementational level). Most theories of categorization are proposed at the implementational level in the neurosciences, and at the algorithmic or computational level in the psychological literature. Marr's (1982) levels provide a natural framework to link theories proposed in different disciplines.
Across disciplines, categorization is most often studied at the algorithmic level in the visual domain. Visual categories are acquired through extraction of category-specific features following repeated exposure to a set of training items. Generalization is tested with novel items sampled from the same generating distribution (i.e., population) as the training stimuli. Successful generalization requires identification of category-relevant features in the novel visual items. Thus, categorization training and generalization testing are used to examine the capabilities of the visual system of animals by manipulating the visual detail of the training and test items, from simple, artificially constructed shapes to complex natural scenes. This general procedure has provided many insights, demonstrating, for instance, that visual processing in the nonhuman primate is a useful model for the human visual system. Relative to nonhuman primates, much less is known about higher-level visual processing and categorization in the rodent, even though rats and mice are the most easily accessible animal models in scientific research. Since categorization is necessary for adaptive functioning in many species, it is plausible that the underlying mechanisms are phylogenetically well preserved, suggesting the presence of universal categorization mechanisms.
In an article recently published in The Journal of Neuroscience, Vinken et al. (2014) studied the ability of rats to categorize natural movies as a proxy for higher-level visual processing of naturalistic stimuli in rodents. Six rats were trained in a two-alternative forced choice task to discriminate 5 s movies of rats of the same strain from distractor movies of various items (toy train, gloved hand, moving stuffed sock) that were rigorously matched on low-level visual properties (pixel intensity, root-mean-squared contrast, average change in pixels across frames). Five of the six rats learned to discriminate target movies from distractor movies to a criterion level of performance after 3.5–4.5 months of practice.
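As a rough illustration of what matching movies on such low-level properties involves, the sketch below computes three summary statistics for a movie. The definitions used here (RMS contrast as the standard deviation of pixel intensities within a frame, and frame-to-frame change as the mean absolute pixel difference between consecutive frames), the function name `low_level_stats`, and the toy data are assumptions made for illustration; they are not the stimulus-matching procedure of Vinken et al. (2014).

```python
from statistics import fmean, pstdev


def low_level_stats(movie):
    """Summarize a movie by three low-level statistics.

    movie: list of frames, each frame a flat list of pixel intensities in [0, 1].
    Returns (mean intensity, mean RMS contrast, mean frame-to-frame change).
    All three definitions are illustrative assumptions.
    """
    # Mean pixel intensity, averaged over frames.
    mean_intensity = fmean(fmean(frame) for frame in movie)
    # RMS contrast per frame = population SD of its pixel intensities.
    rms_contrast = fmean(pstdev(frame) for frame in movie)
    # Mean absolute pixel change between each pair of consecutive frames.
    frame_changes = [
        fmean(abs(a - b) for a, b in zip(prev, curr))
        for prev, curr in zip(movie, movie[1:])
    ]
    mean_change = fmean(frame_changes) if frame_changes else 0.0
    return mean_intensity, rms_contrast, mean_change


# Hypothetical two-pixel, two-frame "movies".
flickering = [[0.0, 1.0], [1.0, 0.0]]
static = [[0.2, 0.4], [0.2, 0.4]]
print(low_level_stats(flickering))
print(low_level_stats(static))
```

Two movies would count as matched under this sketch when all three values are approximately equal, so that any remaining difference between them lies in higher-level content.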
Generalization of categorization training was tested in three qualitatively distinct test sets, each with five novel movies that, relative to the training set, had (1) dissimilar low-level stimulus properties but qualitatively similar high-level content (similar amount of motion energy, same rat strain); (2) rats (target) and objects (distractor) that displayed less motion; and (3) rats with visually dissimilar markings. The rats generalized to the novel items in the three test sets, suggesting that they abstracted a decision rule from the training set that might involve integration of complex features of the visual stimuli. However, performance was poorer when the subject of the movie was less active (test set 2). Control conditions using deviant movies led Vinken et al. (2014) to conclude that the diminished performance was due to confounding stimulus properties and that motion energy was not a salient cue for categorization. There is reason to be cautious about this conclusion, however. The control conditions intended to show that motion was not the salient cue for generalization used modified movies from test set 1, presented at a reduced frame rate (¼ speed) or as single-frame snapshots. The potential problem lies in the sequential testing protocol: generalization testing began with test set 1, followed by test sets 2 and 3, and only then by the control conditions using modified movies from test set 1. Improved performance in the control conditions could be the result of previous exposure to these stimuli, since the experimental protocol cannot eliminate the possibility that learning continued throughout the generalization phase. Further testing is required; it remains possible that motion cues, when available, moderate categorization performance.
Vinken et al. (2014) presented empirical evidence for performance of the fundamental cognitive task of categorizing naturalistic stimuli in rats. These findings add to the body of research demonstrating categorization in humans, nonhuman primates, pigeons, and other species. Cross-species similarities in categorization performance allow development and empirical testing of theoretical accounts in a comparative cognition framework. Such analyses allow consideration of the outstanding question: how did the rats learn to categorize?
Vinken et al.'s (2014) results suggest that visual categorization might be fruitfully studied in implementational-level models (Marr, 1982) in lower-order animals. To unify theories of categorization, models proposed at the implementational level must generate flow-on predictions for models proposed at other levels. For instance, some primate models of object recognition assume a hierarchical feedforward architecture that builds progressive feature representations through modules in striate and extrastriate visual areas (Serre et al., 2007). However, it is unclear when these representations become cognitively accessible to the decision-maker in the form of an abstract decision rule. Vinken et al. (2014) suggested a role for contrast templates, a combination of contrast cues arising from low-level spatial frequencies of the stimulus, in the rodent ventral stream, but how did rats use or manipulate such templates to decide whether a movie contained a target or distractor?
Categorization decision rules have been studied in the psychological literature as algorithmic-level theories. Many algorithmic-level explanations assume that objects can be characterized on the basis of multiple features (e.g., dogs have fur, four legs, bark, etc.). Any particular object has defined values for each feature and can therefore be represented as a point in a multidimensional feature space with axes defined by the features. The generalized context model (GCM; Nosofsky, 1986), for example, formalizes an object's most likely classification as the category whose previously encountered exemplars, stored in memory as points in the multidimensional feature space, have the greatest summed similarity to the object.
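For concreteness, the core of the GCM can be written in three equations. The notation below is one common way of presenting the model and is chosen here for illustration: $x_{im}$ is the value of item $i$ on feature dimension $m$, $w_m$ is the attention weight on that dimension, $c$ is a sensitivity parameter, $r$ sets the distance metric (1 for city-block, 2 for Euclidean), $p$ sets the form of the similarity gradient (1 for exponential, 2 for Gaussian), and $b_A$ is the response bias toward category $A$.

```latex
\begin{align}
d_{ij} &= c \left( \sum_{m=1}^{M} w_m \, \lvert x_{im} - x_{jm} \rvert^{r} \right)^{1/r},
  \qquad w_m \ge 0,\ \sum_{m} w_m = 1, \\
\eta_{ij} &= \exp\!\left( -d_{ij}^{\,p} \right), \\
P(A \mid i) &= \frac{ b_A \sum_{j \in A} \eta_{ij} }{ \sum_{K} b_K \sum_{k \in K} \eta_{ik} } .
\end{align}
```

The first line gives the weighted distance between probe $i$ and stored exemplar $j$, the second converts distance into similarity, and the third states that the probability of assigning $i$ to category $A$ is the bias-weighted summed similarity to the exemplars of $A$, relative to the same quantity summed over all categories $K$.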
It may be that a feedforward hierarchy in the ventral stream subserves higher-level similarity comparisons among exemplars. However, without explicit and testable linking propositions, where cognitive states are hypothesized to map to measurable neural states (Schall, 2004), it is difficult to reconcile theories of categorization proposed at different levels of explanation. As an example in a related domain, the drift diffusion model (Ratcliff, 1978) assumes that speeded perceptual decision-making involves integration of noisy evidence from the environment until an evidence counter crosses a boundary, triggering a response. This algorithmic-level account has been related, via linking propositions, to the instantaneous firing rates of neurons in the lateral intraparietal area of Macaca mulatta (Roitman and Shadlen, 2002), among others, whose activity seems to behave like a diffusion process. This approach has led to mutual constraint on theories across algorithmic and implementational levels of analysis, and across disciplines.
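The accumulation-to-bound idea behind the drift diffusion model can be sketched as a simple simulation. The version below is a minimal discretized approximation with symmetric boundaries at plus and minus `boundary` and an unbiased starting point; the function name `simulate_ddm`, the parameter values, and the omission of non-decision time and trial-to-trial variability are simplifying assumptions, not the full model of Ratcliff (1978).

```python
import random


def simulate_ddm(drift, boundary, noise_sd=1.0, dt=0.001, max_time=5.0, rng=None):
    """Simulate one trial of a simplified drift diffusion process.

    Evidence starts at 0 and is updated in steps of size dt by a constant
    drift plus Gaussian noise until it reaches +boundary ("upper") or
    -boundary ("lower"). Returns (choice, decision_time); choice is None
    if no boundary is reached before max_time.
    """
    rng = rng or random.Random()
    evidence, t = 0.0, 0.0
    step_sd = noise_sd * dt ** 0.5  # noise scales with the square root of dt
    while t < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
        if evidence >= boundary:
            return "upper", t
        if evidence <= -boundary:
            return "lower", t
    return None, t


# Hypothetical parameters: positive drift favors the upper boundary.
rng = random.Random(1)
trials = [simulate_ddm(drift=2.0, boundary=1.0, rng=rng) for _ in range(200)]
upper = sum(1 for choice, _ in trials if choice == "upper")
print(f"proportion of upper-boundary responses: {upper / len(trials):.2f}")
```

In a linking proposition of the kind described above, the simulated evidence trace is the quantity hypothesized to be reflected in neural firing rates, and the boundary crossing corresponds to the commitment to a response.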
In visual categorization, a simple algorithmic-level model that draws on the results of Vinken et al. (2014) and the authors' previous work (Vermaercke and Op de Beeck, 2012) might categorize on the basis of the similarity between a novel object and the contrast templates of exemplars in memory, and motion energy when available. This could be instantiated within the GCM as a two-dimensional feature space with weights given to each dimension dependent on the bias of the animal. At an implementational level, one might hypothesize that observed activation in the ventral stream (e.g., inferotemporal cortex) and dorsal stream (e.g., middle temporal area) is proportional to the weight placed on the contrast template and motion energy dimensions of the GCM, respectively. Such concurrent investigations at the algorithmic and implementational levels, and linking propositions to combine the two, will move the field toward a unified theory of categorization.
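A minimal sketch of such a two-dimensional exemplar model is given below, assuming an exponential similarity gradient and a Euclidean metric. The feature values (a contrast-template match and a motion-energy value, each on an arbitrary 0 to 1 scale), the stored exemplars, and the function name `gcm_probability` are hypothetical and chosen only to show how shifting weight between the two dimensions changes the predicted classification of a low-motion rat movie.

```python
import math


def gcm_probability(probe, exemplars, weights, c=1.0, r=2.0, bias=None):
    """Return P(category | probe) for each category under a GCM-style rule.

    probe     -- tuple of feature values (contrast-template match, motion energy)
    exemplars -- dict mapping category label -> list of stored feature tuples
    weights   -- attention weights over dimensions (non-negative, summing to 1)
    c         -- sensitivity parameter scaling distance
    r         -- distance metric (1 = city-block, 2 = Euclidean)
    bias      -- optional dict of response biases per category (default: equal)
    """
    if bias is None:
        bias = {label: 1.0 for label in exemplars}
    evidence = {}
    for label, items in exemplars.items():
        summed_similarity = 0.0
        for item in items:
            # Attention-weighted distance between probe and stored exemplar.
            dist = sum(
                w * abs(p - x) ** r for w, p, x in zip(weights, probe, item)
            ) ** (1.0 / r)
            # Exponential decay of similarity with distance.
            summed_similarity += math.exp(-c * dist)
        evidence[label] = bias[label] * summed_similarity
    total = sum(evidence.values())
    return {label: value / total for label, value in evidence.items()}


# Hypothetical training exemplars: (contrast-template match, motion energy).
memory = {
    "target": [(0.8, 0.8), (0.9, 0.7)],
    "distractor": [(0.2, 0.3), (0.1, 0.2)],
}
# A rat movie with little motion: rat-like contrast pattern, low motion energy.
probe = (0.85, 0.25)

print(gcm_probability(probe, memory, weights=(1.0, 0.0), c=5.0))  # contrast only
print(gcm_probability(probe, memory, weights=(0.5, 0.5), c=5.0))  # both dimensions
print(gcm_probability(probe, memory, weights=(0.0, 1.0), c=5.0))  # motion only
```

Under these assumed values, the predicted probability of a target response falls as weight moves from the contrast-template dimension to the motion-energy dimension, which is the kind of quantitative prediction that could be compared against both behavior and regional activation.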
Finally, Vinken et al. (2014) pave the way for analyzing data in a principled and efficient manner, drawing all conclusions from hierarchical Bayesian analyses. Hierarchical Bayesian analysis models the dependencies between observations from the same subject, so that uncertainty in subject-level estimates appropriately informs population-level conclusions. Bayesian inferential tests are therefore not overconfident when the number of subjects is small, a problem that can arise with t tests that do not model the dependence between multiple observations from the same subject (Aarts et al., 2014). The Bayesian approach to data analysis is general and not restricted to behavioral or neural data, and the field will benefit from more widespread use of such rigorous analyses.
Footnotes
Editor's Note: These short, critical reviews of recent papers in the Journal, written exclusively by graduate students or postdoctoral fellows, are intended to summarize the important findings of the paper and provide additional insight and commentary. For more information on the format and purpose of the Journal Club, please see http://www.jneurosci.org/misc/ifa_features.shtml.
I thank B. U. Forstmann and S. A. Hiles for valuable discussions.
References
- Aarts E, Verhage M, Veenvliet JV, Dolan CV, van der Sluis S. A solution to dependency: using multilevel analysis to accommodate nested data. Nat Neurosci. 2014;17:491–496. doi: 10.1038/nn.3648.
- Marr D. Vision: a computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman; 1982.
- Nosofsky RM. Attention, similarity, and the identification-categorization relationship. J Exp Psychol Gen. 1986;115:39–57. doi: 10.1037/0096-3445.115.1.39.
- Ratcliff R. A theory of memory retrieval. Psychol Rev. 1978;85:59–108. doi: 10.1037/0033-295X.85.2.59.
- Roitman JD, Shadlen MN. Responses of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J Neurosci. 2002;22:9475–9489. doi: 10.1523/JNEUROSCI.22-21-09475.2002.
- Schall JD. On building a bridge between brain and behavior. Annu Rev Psychol. 2004;55:23–50. doi: 10.1146/annurev.psych.55.090902.141907.
- Serre T, Oliva A, Poggio T. A feedforward architecture accounts for rapid categorization. Proc Natl Acad Sci U S A. 2007;104:6424–6429. doi: 10.1073/pnas.0700622104.
- Vermaercke B, Op de Beeck HP. A multivariate approach reveals the behavioral templates underlying visual discrimination in rats. Curr Biol. 2012;22:50–55. doi: 10.1016/j.cub.2011.11.041.
- Vinken K, Vermaercke B, Op de Beeck HP. Visual categorization of natural movies by rats. J Neurosci. 2014;34:10645–10658. doi: 10.1523/JNEUROSCI.3663-13.2014.