Perceptual Learning Produces Perceptual Objects

Michael J Wenger; Stephanie E Rhoten

doi:10.1037/xlm0000735

Author manuscript; available in PMC: 2021 Mar 1.

Published in final edited form as: J Exp Psychol Learn Mem Cogn. 2019 Jun 20;46(3):455–475. doi: 10.1037/xlm0000735

PMCID: PMC6923621 NIHMSID: NIHMS1039354 PMID: 31219302

Abstract

In their seminal study of chess expertise, Simon and Chase (Simon & Chase, 1973; Chase & Simon, 1973) proposed that perceptual learning was a necessary component of skill acquisition. In their view, acquisition of skill results from the strategic use of learning at multiple levels in order to adaptively overcome inherent limitations. The knowledge acquired by way of perceptual learning that supported increasingly sophisticated perceptual discrimination processes, according to Simon and Chase, was referred to as a “chunk.” The “chunk” was conceptualized as a meaningful complex set of features that abstracted the notion of a perceptual object. Simon and Chase further suggested that meaningful combinations of chunks could be combined to form configurations (Simon & Chase, 1973, p. 399). The present study addresses this idea by framing the notion of a “chunk” in terms of two formal meta-theories, one that addresses representation (Ashby & Townsend, 1986), and one that addresses processing (Townsend & Nozawa, 1995), and tests the prediction that perceptual learning produces organized perceptual objects (“chunks”). Two experiments combine behavioral and electroencephalographic (EEG) measures to show that perceptual learning produces (a) a shift from perceptual independence and separability to violations of separability, and (b) shifts from limited-capacity serial processing to super-capacity parallel processing. The evidence from both experiments is strong and consistent: perceptual learning does indeed induce “chunking” (the production of perceptual objects), the foundation of perceptual expertise.

Keywords: perceptual learning, skill acquisition, expertise, chunking

In 1963, Eleanor Gibson (Gibson, 1963) noted that the study of perceptual learning had become a “healthily growing” (p. 50) field, having only three years earlier “acquired the status of an area worthy of separate review” (p. 29, in reference to Drever, 1960). Further, she noted that while perceptual learning must logically be a component of the acquisition of higher-order skills, “[v]ery little applied (sic) research on perceptual skills is to be found in psychological journals” (Gibson, 1963, p. 49). More than 50 years later, comprehensive reviews of the literature (Kellman & Massey, 2013; Dosher & Lu, 2017; Seitz, 2017; Watanabe & Sasaki, 2015) suggest that Gibson’s observations still hold true. The literature on perceptual learning is healthy. However, there is still relatively limited work on the explicit role of perceptual learning in the acquisition of higher-level expertise (Lu, Lin, & Dosher, 2016; Kellman & Garrigan, 2009; Kellman & Massey, 2013; Polat, 2016).

Perceptual learning and perceptual “chunks”

In their seminal study of chess, Simon and Chase (Simon & Chase, 1973; Chase & Simon, 1973) proposed that perceptual learning was a necessary component of skill acquisition. In their view, acquisition of skill results from the strategic use of learning at multiple levels in order to adaptively overcome inherent limitations. They proposed that “specific perceptual knowledge acquired through long experience, stored in long-term memory, and accessed by perceptual discrimination processes” (Simon & Chase, 1973, p. 394) was necessary to alleviate the constraints of a limited-capacity short-duration memory. The knowledge acquired by way of perceptual learning that supported increasingly sophisticated perceptual discrimination processes, according to Simon and Chase, was referred to as a “chunk.”

The “chunk” was conceptualized as a meaningful complex set of features that abstracted the notion of a perceptual object. Simon and Chase further suggested that meaningful combinations of chunks could be combined to form configurations (Simon & Chase, 1973, p. 399). Simon and Chase formalized this recursive structure in a computational model that possessed the critical characteristic of coding chunks by changing the weighted connections from input to output elements. The literature provides some support for the idea that perceptual learning of any stimulus involving more than one dimension produces effects that are consistent with the regularities of learning and attending to objects (as dependent conjunctions of features, e.g., W. Li, Piech, & Gilbert, 2004; R. W. Li, Levi, & Klein, 2004; Schyns, Goldstone, & Thibaut, 1998; Czerwinski, Lightfoot, & Shiffrin, 1992). Indeed, modern conceptualizations, such as Goldstone’s notion of perceptual unitization (Goldstone, 1998, 2000), along with computational (e.g., Schyns et al., 1998) and conceptual (e.g., Kellman & Garrigan, 2009; Kellman & Massey, 2013) accounts of the construct of chunking share many structural aspects with successful models of perceptual learning (e.g., Dosher, Jeter, Liu, & Lu, 2013; Petrov, Dosher, & Lu, 2005; Huang, Lu, & Dosher, 2012; Zhaoping, Herzog, & Dayan, 2003).

The hierarchical/recursive structure proposed by Simon and Chase (1973) implies a set of characteristics present in both the empirical and theoretical literatures on perceptual learning. First, their proposal implies that learning can occur at multiple levels of representation. This is most critically true if the first-order organization of features into perceptual objects is generalized to configurations (and potentially beyond). This is a characteristic of the major theoretical conceptions of perceptual learning (e.g., Dosher et al., 2013; Watanabe & Sasaki, 2015; Ahissar & Hochstein, 2004) and is suggested by some of the data on generalization and transfer (e.g., Wang et al., 2016; Wang, Zhang, Klein, Levi, & Yu, 2012, 2014; J.-Y. Zhang, Cong, Klein, Levi, & Yu, 2014). Second, the notion that learning can occur at multiple levels of representation suggests the computational need for changing both feed-forward and feed-back connections among levels of processing. This computational requirement has been acknowledged in multiple theoretical perspectives (see review in Schyns et al., 1998), has been shown to be critical in maintaining stability in networks of neurons (e.g., Moldakarimov, Bazhenov, & Sejnowski, 2014), and is supported by various sources of empirical evidence (e.g., Rauss, Schwartz, & Pourtois, 2011; Ruff & Driver, 2006). Third, implied in the original proposal by Simon and Chase, and elaborated more explicitly in later work (e.g., Feltovich, Prietula, & Ericsson, 2006; Chase & Ericsson, 1981), is the notion that ongoing perceptual organization, or the formation of new chunks, is dependent on and assisted by existing representations. 
Support for this idea can be found empirically in the “Eureka” phenomenon (Ahissar & Hochstein, 1997), as well as in effects due to frames of reference or context (e.g., Rauss et al., 2011; Pourtois, Grandjean, & Sander, 2004), and theoretically in concepts such as Gibson’s predifferentiation (Gibson & Walk, 1956) and the mnemonic encoding principle of skilled memory theory (Chase & Ericsson, 1981; Wenger & Payne, 1995). Fourth, according to Simon and Chase, the learning of perceptual objects with extensive practice allowed for the inherent limitations of declarative memory to be alleviated by a transition to procedural memory. They conceived of this as stimulus-response learning, idealized computationally as a production, a pairing of a condition and an action. On this issue, Simon and Chase were consistent with the earlier conception of the stages of skill acquisition described by Fitts and Posner (1967), in which the second stage was a transition to declarative retrieval. Repeated declarative retrieval provided the basis for the transition to the autonomous stage, sometimes referred to as automaticity (e.g., Shiffrin & Schneider, 1977; Schneider & Shiffrin, 1977; Logan, 1988; Wenger, 1999), a stage characterized by extremely efficient responding, absent the need for controlled attention. This conception is consistent with the empirical regularities associated with the learning and transfer of “pop-out” (e.g., Ahissar, Laiwand, Kozminsky, & Hochstein, 1998; Schoups & Orban, 1996) and with contemporary computational (e.g., Tenison & Anderson, 2016) and neurobiological (e.g., Ashby, Ennis, & Spiering, 2007) theories.

Defining and detecting perceptual objects

The approach taken here conceptualizes perceptual objects in terms of foundational aspects of stimulus encoding and processing (O’Toole, Wenger, & Townsend, 2001; Townsend & Wenger, 2015). With respect to encoded representations, the work relies on the theoretical definitions of dependency and lack-of-separability in general recognition theory (GRT; Ashby & Townsend, 1986; Townsend, Houpt, & Silbert, 2012). In that view, a perceptual object is one in which the encoded stimulus features show evidence of a violation of perceptual independence (PI), a violation of perceptual separability (PS), and/or a violation of decisional separability (DS). A violation of PI would imply that encoded information about the level of one dimension (e.g., the contrast of a Gabor patch) would be correlated with (not independent of) the encoded information about another dimension (e.g., the Gabor patch’s orientation) within an individual stimulus. A violation of PS would imply that a change in the level of one dimension (e.g., an increase in contrast) would change the level of encoded information about another dimension (e.g., an increase in perceived tilt), with this effect being obtained across stimuli. A violation of DS would imply that a change in the level of one dimension (e.g., an increase in contrast) would change the criterion for judging the state of another dimension (e.g., becoming more predisposed to judge the patch as tilted up, independent of the actual level of orientation), with this effect also being obtained across stimuli. With respect to the processing of encoded information, we rely on the theoretical characterization of processing architecture, stopping-rules, channel independence, and capacity developed in systems factorial theory (SFT; Townsend & Nozawa, 1995; Little, Altieri, Yang, & Fific, 2017). In that view, a perceptual object is one that is processed in a parallel, dependent, exhaustive manner, and which evidences unlimited- to super-capacity processing. 
The advantages of using these two conceptions include rigorous mathematical definitions of their central constructs, well-developed and well-understood experimental tasks, and highly-refined statistical methodologies for linking theory and data.
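The three GRT constructs described above can be stated compactly. The following is a sketch in standard GRT notation for a two-dimensional stimulus set; the symbols f, g, h, and c_A are notational assumptions introduced here for illustration and do not appear in the text above.

```latex
% Stimulus A_i B_j elicits a bivariate perceptual distribution f_{ij}(x, y),
% with marginal densities g_{ij}(x) and h_{ij}(y).

% Perceptual independence (PI) within stimulus A_i B_j:
f_{ij}(x, y) = g_{ij}(x)\, h_{ij}(y)

% Perceptual separability (PS) of dimension A from dimension B:
g_{i1}(x) = g_{i2}(x), \qquad i = 1, 2

% Decisional separability (DS) for dimension A: the response on A depends
% only on x, through a criterion c_A that does not vary with y:
\text{respond } a_2 \iff x > c_A
```

Under this notation, PI is a within-stimulus property, whereas PS and DS are defined across stimuli, which matches the verbal definitions given above.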

Learning perceptual objects: GRT.

With respect to encoded representations, the notion of creating a perceptual object from two or more features by way of perceptual learning implies a practice-dependent shift from independence or separability to violations of either or both. In order to test for this change, it is necessary to have a task that allows the perceptual state of the observer to be determined with respect to all of the features simultaneously (Ashby & Townsend, 1986; Kadlec & Townsend, 1992). The task that allows this is known as the complete identification (CID) paradigm. In this task, all possible combinations of the features are presented, with each combination requiring a unique response. For example, in the experiments below, two contrast-defined features each are either absent or present and at either a low or high level of contrast. In experiment 1, this results in four possible stimulus states with each assigned a unique response. The data from this task are summarized as an identification/confusion matrix, with each cell having a corresponding set of reaction times (RTs). The first of the critical measures used with the response frequencies is a test of marginal response invariance (MRI), which is defined in terms of marginal responses on each level of each dimension. MRI holds for a given level on a given dimension if the marginal probability of identifying that level on that dimension is the same across levels of the other dimension (Ashby & Townsend, 1986; Silbert & Hawkins, 2016). If MRI does not hold, then one infers a failure of PS, DS, or both. The second critical measure is a test of report independence (RI, originally referred to as sampling independence; see Ashby & Townsend, 1986). RI holds for a given stimulus if the probability of correctly identifying that stimulus is equal to the product of the marginal probabilities of accurately identifying the level of each component (Ashby & Townsend, 1986; Silbert & Hawkins, 2016). 
If RI does not hold, it suggests a violation of PI for that stimulus. More recently, Townsend and colleagues (2012) have developed RT-based tests of MRI and RI (referred to respectively as timed MRI [tMRI] and timed RI [tRI]). On the basis of the inferences suggested by these tests, multidimensional Gaussian models with these characteristics are fit to the data and compared (per Thomas, 2001) to both more- and less-restrictive models to determine whether the inferences drawn from the tests of MRI and RI are consistent with the best-fitting model.
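To make the two frequency-based quantities concrete, the following Python sketch computes the raw probability differences that underlie MRI and RI from a complete identification confusion matrix. The counts, the function names, and the coding of stimuli and responses as (top, bottom) tuples with 0 = absent and 1 = present are all hypothetical. Only the descriptive differences are computed; the formal statistical tests cited above are not implemented.

```python
# Hypothetical confusion counts: counts[stimulus][response].
# Stimuli and responses are coded as (top, bottom), 0 = absent, 1 = present.
counts = {
    (0, 0): {(0, 0): 70, (0, 1): 10, (1, 0): 15, (1, 1): 5},
    (0, 1): {(0, 0): 10, (0, 1): 70, (1, 0): 5, (1, 1): 15},
    (1, 0): {(0, 0): 15, (0, 1): 5, (1, 0): 70, (1, 1): 10},
    (1, 1): {(0, 0): 10, (0, 1): 10, (1, 0): 40, (1, 1): 40},
}

def marginal_prob(counts, stim, dim, level):
    """P(response on dimension `dim` equals `level` | stimulus `stim`)."""
    total = sum(counts[stim].values())
    hits = sum(n for resp, n in counts[stim].items() if resp[dim] == level)
    return hits / total

def mri_gap(counts, dim, level):
    """Difference in the marginal probability of reporting `level` on `dim`
    between the two levels of the other dimension. Zero is what MRI predicts."""
    other = 1 - dim
    probs = []
    for other_level in (0, 1):
        stim = [0, 0]
        stim[dim], stim[other] = level, other_level
        probs.append(marginal_prob(counts, tuple(stim), dim, level))
    return probs[0] - probs[1]

def ri_gap(counts, stim):
    """P(fully correct) minus the product of the two marginal probabilities
    of a correct report. Zero is what RI predicts."""
    total = sum(counts[stim].values())
    joint = counts[stim].get(stim, 0) / total
    product = (marginal_prob(counts, stim, 0, stim[0])
               * marginal_prob(counts, stim, 1, stim[1]))
    return joint - product

print(mri_gap(counts, dim=1, level=1))  # bottom "present", across levels of top
print(ri_gap(counts, (1, 1)))           # both features present
```

In these made-up data, a nonzero MRI difference for the bottom feature would be the kind of pattern that prompts a closer look at PS and DS, while the RI difference for the two-feature stimulus is zero.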

Learning perceptual objects: SFT.

With respect to foundational characteristics of information processing, the notion of creating a perceptual object from two or more features by way of perceptual learning implies a practice-dependent shift from serial or parallel independent, self-terminating, limited-capacity processing to parallel, dependent, exhaustive, unlimited- to super-capacity processing (O’Toole et al., 2001; Wenger & Townsend, 2001; Townsend & Wenger, 2015). In order to perform strong-inference tests for this change, it is necessary to use a task known as the double-factorial paradigm (DFP; Townsend & Nozawa, 1995; Townsend & Wenger, 2004a, 2004b). The prototypical version of this task involves two features, each of which can be present or absent (Townsend & Nozawa, 1995; Ingvalson & Wenger, 2005). In addition, when both features are present, their relative speeds of processing are manipulated so that each can be processed slowly or quickly. The response instruction used here is to give one response when both features are present; otherwise a second response is given. This is referred to as an AND task.

Inferences about processing architecture and stopping rule are drawn using RT interaction contrasts calculated for the four cells of the design in which both features are present. The interaction contrasts are calculated at the level of the mean, and then at the level of the survivor function of the RT distribution (the complement of the cumulative distribution function). The signs of these two interaction contrasts allow for unique inferences regarding architecture and stopping rule, and statistical tests are available to check the signs of both (Houpt & Townsend, 2010). Inferences regarding capacity are drawn on the basis of a capacity coefficient which, for the AND task, is defined as a ratio of the inverse cumulative hazard functions (Chechile, 2011) for single- and double-target trials (Townsend & Wenger, 2004b; Townsend & Eidels, 2011; Townsend, Wenger, & Houpt, 2018). Values of this coefficient that are equal to 1 indicate unlimited-capacity processing, values < 1 indicate limited-capacity processing, and values > 1 indicate super-capacity processing. Statistical tools have been developed to assess the reliability of any deviations from 1 (Houpt & Townsend, 2010). Inferences regarding independence are guided by the inferences regarding capacity, as it has been shown that limited or super-capacity processing typically results from channel dependences (Townsend & Wenger, 2004b; Wenger & Townsend, 2006).
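The interaction contrasts and the capacity coefficient can be sketched in a few lines of Python. The sketch assumes the conventional forms: the contrast (LL − LH) − (HL − HH) over the four both-present cells, where L and H denote the slow and fast levels of each feature, and an AND capacity coefficient built from K(t) = ln F(t), with F the empirical cumulative distribution function. The function names and the RT values are hypothetical, and no statistical testing is included.

```python
import math

def ecdf(rts, t):
    """Empirical P(RT <= t)."""
    return sum(rt <= t for rt in rts) / len(rts)

def mean(xs):
    return sum(xs) / len(xs)

def mic(ll, lh, hl, hh):
    """Mean interaction contrast over the four both-present cells."""
    return (mean(ll) - mean(lh)) - (mean(hl) - mean(hh))

def sic(ll, lh, hl, hh, t):
    """Survivor interaction contrast at time t, with S(t) = 1 - F(t)."""
    s = lambda rts: 1.0 - ecdf(rts, t)
    return (s(ll) - s(lh)) - (s(hl) - s(hh))

def and_capacity(single_a, single_b, double, t):
    """Assumed AND capacity coefficient: [K_A(t) + K_B(t)] / K_AB(t),
    with K(t) = ln F(t). Returns NaN where the ratio is undefined."""
    f_a, f_b, f_ab = (ecdf(x, t) for x in (single_a, single_b, double))
    if min(f_a, f_b, f_ab) <= 0.0 or f_ab >= 1.0:
        return float("nan")
    return (math.log(f_a) + math.log(f_b)) / math.log(f_ab)

# Hypothetical RTs (ms) for the four both-present cells.
ll, lh, hl, hh = [600, 700], [500, 600], [500, 600], [450, 550]
print(mic(ll, lh, hl, hh))
print(sic(ll, lh, hl, hh, t=520))

# Hypothetical single- and double-target RTs (ms).
single = [400, 500, 600, 700]
double = [450, 550, 600, 700]
print(and_capacity(single, single, double, t=500))
```

In practice the survivor contrast and the capacity coefficient are evaluated over a grid of time points rather than at a single value of t, and the coefficient is only informative where all three distribution functions lie strictly between 0 and 1.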

Converging evidence from the timing of neural events.

The logic and statistical practice relating data to theory in both GRT and SFT rely on converging sources of evidence (as in Bridgman, 1945). In an effort to strengthen our inferences, we sought to use a neurophysiological variable along with response frequencies and latencies. Following Schweickert and Mounts (1998), we sought to use a feature of the EEG signals that could be readily interpreted in terms consistent with the assumptions of the time-based measures in GRT and the distributional measures in SFT, in particular the assumption of selective influence. The most promising candidate in this regard was the lateralized readiness potential (LRP). The LRP is a negative-going waveform, measured in central electrodes contralateral to the motor response that it precedes, and is interpreted as being an indicator that sufficient processing has been completed in order to program the motor response (Coles, 1989; Hackley & Miller, 1995; Miller & Hackley, 1992; Mordkoff & Grosjean, 2001; Ray, Slobounov, Mordkoff, Johnston, & Simon, 2000). As well, the LRP should behave in accord with the assumptions regarding processing times and observable RTs in GRT and SFT. The LRP is estimated by subtracting the ipsilateral from the contralateral potential for each hand. Our focus will be on the onset time for the LRP, operationalized as the earliest time at which the LRP becomes reliably less than 0 (as in Kuefner, Jacques, Prieto, & Rossion, 2010; Von Der Heide, Wenger, Bittner, & Fitousi, 2018). We test the hypothesis that when the timing of the onset of the LRP is analyzed using the RT tests specified for GRT and SFT, the conclusions will be consistent with those drawn from the analyses of the RT data, providing an additional source of converging evidence for our inferences.
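The LRP computation described above can be illustrated with a minimal Python sketch. The text does not specify how "reliably less than 0" is operationalized, so the criterion used here (the wave stays below a fixed negative threshold for a minimum number of consecutive samples) is purely an assumption, as are the function names, parameter values, and example data.

```python
def lrp_wave(contra, ipsi):
    """Contralateral minus ipsilateral potential, sample by sample."""
    return [c - i for c, i in zip(contra, ipsi)]

def lrp_onset(times, wave, threshold=-0.5, min_run=3):
    """Earliest time at which the wave begins a run of `min_run` consecutive
    samples below `threshold` (an assumed stand-in for 'reliably < 0').
    Returns None if no such run exists."""
    run = 0
    for idx, value in enumerate(wave):
        run = run + 1 if value < threshold else 0
        if run == min_run:
            return times[idx - min_run + 1]
    return None

# Hypothetical averaged potentials (microvolts) sampled every 10 ms.
times = list(range(0, 100, 10))
contra = [0.0, 0.1, -0.1, -1.0, -1.2, -1.1, -1.3, -1.2, -1.0, -0.9]
ipsi = [0.0] * len(times)

wave = lrp_wave(contra, ipsi)
print(lrp_onset(times, wave))
```

Onset times obtained this way can then be treated like RTs, which is what allows the same timed tests to be applied to both measures.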

Experiment 1

Experiment 1 uses the theory and methods of GRT to test the hypothesis that perceptual learning results in a change in encoded representations such that they become perceptual objects. Specifically, we predict that prior to perceptual learning, observers will encode the features of a two-element stimulus as independent and/or separable, both perceptually and decisionally. After perceptual learning, observers will encode those features as dependent and/or non-separable, perceptually or decisionally. This will be consistent with the theoretical conception of an object from the GRT perspective. Critically, these inferences should be supported by the analyses of the response frequencies, RTs, and the onset times of the LRP.

Methods

Participants.

A total of four participants (all female, ages 21–24) were recruited from the University of Oklahoma community. All were right-handed, reported normal or corrected-to-normal vision, normal hearing, and unencumbered use of both hands. Participants were reimbursed by gift card at the completion of their participation. The protocol and procedures for this experiment were reviewed and approved by the Institutional Review Board of the University of Oklahoma (IRB approval number 3908).

Design.

Two tasks were used in experiment 1: a complete identification task before and after perceptual practice, and a three-down one-up staircase task augmented with catch trials for perceptual practice. The complete identification task was a feature-complete factorial design, involving the presence and absence of two contrast-defined features, presented just above or below fixation in the center of the stimulus. This resulted in four possible stimuli, each of which was assigned a unique response. These stimuli were presented at two levels of (Michelson) contrast (10% and 50%) before practice, and three levels of contrast (final threshold for each participant, 10%, and 50%) after practice.¹ The three-down one-up staircase task involved the three stimuli in which at least one of the contrast-defined targets was present. Separate staircases were run for each stimulus type.

Materials.

The stimuli were created from a digital version of the line drawing of the Cheshire Cat from the original edition of Alice in Wonderland (Carroll, 1865). Figure 1 presents examples of each type of stimulus. The units of the dimensions noted in Figure 1 are degrees of visual angle at a constrained viewing distance of 78 cm. The target-absent stimulus was created from a randomly-selected and arranged set of 15 × 15 pixels from the forehead of the cat. Each of the two targets was created from the left eye of the cat, rotated by 90°. Pattern masks were created by scrambling (in 15 × 15 pixel tiles) the stimulus presented on each trial. Gaussian noise (μ = 0, σ = 0.10) was added to all stimuli and masks. Contrast for all stimuli and masks ranged from 1% to 60%, in steps of 0.1%. All stimuli were presented on a 55 cm (diagonal) gray-scale CRT, and all responses were made using a four-button response box with button presses timed to ± 1 ms. Both the complete identification trials and staircase practice trials were presented and timed using E-Prime (Psychology Software Tools, Sharpsburg, PA).

Figure 1. Examples (here at 40% contrast) of the four stimulus types used in Experiments 1 and 2. Dimensions are in degrees of visual angle. (a) Target-absent, (b) bottom only, (c) top only, (d) top and bottom.

Procedure.

The experiment was performed across 14 days, and consisted of one day of a baseline (BL) complete identification task, followed by 10 days of practice using the three-down one-up modified staircase procedure, followed by an endline (EL) set of three days of sessions with the complete identification task. Concurrent EEG was collected during the BL and EL complete identification sessions. All testing took place in light- and sound-shielded chambers. Testing with concurrent EEG occurred in an identical chamber that was also electromagnetically shielded (i.e., a Faraday cage). All sessions began with a five minute period of dark adaptation and all testing occurred between 0800 and 1700 local time. Participants performed all trials with their heads in a chin rest positioned 78 cm from the monitor.

Each session of the complete identification task required approximately 90 min, including time for EEG setup and clean-up. The BL session involved 300 presentations of each test stimulus, each at two levels of contrast: 10% and 50%. The EL sessions each involved 300 presentations of each test stimulus, each at three levels of contrast: 10%, 50%, and each participant’s final level of contrast from the perceptual practice sessions. Each session was divided into blocks of 400 trials, with a short (5 min) break between blocks, during which impedances were checked and adjusted as needed. Order of trials was randomized by session for each participant.

Each trial of the complete identification task included the following events. A small dot signaled the beginning of each trial, which was self-initiated by the observer by pressing the key under the index finger of their dominant hand. A fixation cross was presented centrally for a random period on each trial, determined by an exponential distribution with a mean of 700 ms, censored at 400 and 1000 ms. This was followed by three randomly selected pattern masks, at the level of contrast for the test stimulus, each presented for 25 ms. The test stimulus was then presented for 25 ms, followed by a set of three randomly selected pattern masks. The participant then was allowed up to 2 s to respond. Participants were instructed to respond as quickly and as accurately as possible, and no feedback was provided.

Each session of perceptual practice required between 45 and 60 min; no concurrent EEG was collected. Each session of practice was divided into six blocks of 250 trials per block, with each block involving only one of the three target-present stimuli. A short (eyes closed) break occurred between blocks. All participants began the first block of practice with each stimulus type at 60% contrast. Each subsequent block of each stimulus type began at 110% of the threshold from the previous block with that stimulus type. Contrast decrements and increments during each staircase were at 10% of the current contrast level. Catch trials involved presentation of a target-absent stimulus at the current level of contrast. The sequence and timing of events were the same as those in the complete identification task. Participants indicated the perceived presence of a stimulus by pressing the button under the index finger of their dominant hand and indicated the perceived absence of a stimulus by pressing the button under the index finger of their non-dominant hand. No feedback was provided.
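The staircase logic can be sketched as follows. This is a minimal Python illustration that assumes "three-down one-up" means contrast is lowered after three consecutive detections and raised after a single miss, with steps of 10% of the current contrast, values rounded to the 0.1% resolution and clamped to the 1% to 60% range given in the Materials. Catch trials are omitted, and the function names and the deterministic observer are hypothetical.

```python
def run_staircase(respond, start=60.0, step_frac=0.10, n_trials=250,
                  lo=1.0, hi=60.0):
    """Run one block and return the contrast levels at which the staircase
    reversed direction. `respond(contrast)` returns True for a detection."""
    contrast = start
    streak = 0       # consecutive detections since the last step
    last_dir = 0     # -1 = last step was down, +1 = up, 0 = no step yet
    reversals = []
    for _ in range(n_trials):
        if respond(contrast):
            streak += 1
            if streak < 3:
                continue          # no step until three in a row
            streak = 0
            direction = -1
        else:
            streak = 0
            direction = +1
        if last_dir and direction != last_dir:
            reversals.append(contrast)
        last_dir = direction
        stepped = contrast * (1 + direction * step_frac)
        contrast = round(min(hi, max(lo, stepped)), 1)
    return reversals

# Hypothetical noise-free observer who detects anything at or above 20%.
reversals = run_staircase(lambda c: c >= 20.0)
print(len(reversals), reversals[:4])
```

With a real observer the `respond` callable would be replaced by the trial presentation and response collection, and a block's starting value would be set to 110% of the previous block's threshold for that stimulus type.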

EEG data were collected using a 128-channel EGI system (EGI Philips, Eugene OR). EEG was recorded continuously with hardware filters set from 0.1 to 100 Hz, a sampling rate of 100 Hz, and an online vertex reference. Impedances were kept ≤ 50 kΩ for the entire session. Continuous EEG was epoched around the cues (−800 to 1500 ms). Data were preprocessed and analyzed using EEGLab (Delorme & Makeig, 2004). Data were first inspected visually, and bad channels were deleted. Artifacts were then rejected and any additional bad channels were deleted on the basis of probability using EEGLab. The data were then band-pass filtered (1 to 90 Hz), with a notch filter centered at 60 Hz (59–61 Hz). The data were then epoched, and independent components analysis (ICA) was performed and components corresponding to artifacts were removed. The data were then low-pass filtered at 8 Hz in order to determine the LRP. A set of six electrodes, three to the left of midline and three to the right, were used to determine the LRP, following previously used procedures (Kuefner et al., 2010; Von Der Heide et al., 2018). The start-time for the LRP was defined as the time point at which the negative-going difference wave was reliably different from 0.

Results

Practice.

Thresholds in each block for each stimulus type were determined by calculating the geometric mean of the contrast level associated with the last 15 reversals in each block. False alarm rates were calculated for each block. To account for individual differences, thresholds were converted to relative thresholds for each observer by dividing the threshold in block n by that observer’s threshold in block 1. Reliability of change for both threshold and false alarm rates was assessed using linear regression on log-transformed values as a function of log-transformed block number. Plots of threshold and false alarm rates as a function of block are presented for each of the four observers in Figure 2 and the results of the regression analyses are presented in Table 1. Practice produced reliable decreases in threshold for all four observers for each stimulus type. Practice also produced reliable increases in false alarm rates for all four observers and for all stimulus types, with the exception of the single-target top stimulus for observer 2 and the single-target bottom stimulus for observer 4.
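The threshold pipeline described above (geometric mean of the last 15 reversals, normalization by block 1, and a log-log regression on block number) can be sketched in Python as follows. The use of natural logarithms is an assumption, since the base of the transformation is not stated, and the function names and example values are hypothetical.

```python
import math

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def block_threshold(reversals, n_last=15):
    """Geometric mean of the contrast at the last `n_last` reversals."""
    return geometric_mean(reversals[-n_last:])

def relative_thresholds(thresholds):
    """Each block's threshold divided by the block-1 threshold."""
    return [t / thresholds[0] for t in thresholds]

def loglog_fit(values):
    """Ordinary least squares of log(value) on log(block number), blocks
    numbered from 1. Returns (intercept, slope, r_squared). Values must be
    positive and not all identical."""
    xs = [math.log(b) for b in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy ** 2 / (sxx * syy)

# Hypothetical per-block thresholds (% contrast) following a power law.
thresholds = [40.0 * b ** -0.5 for b in range(1, 11)]
intercept, slope, r2 = loglog_fit(relative_thresholds(thresholds))
print(round(intercept, 3), round(slope, 3), round(r2, 3))
```

A negative slope corresponds to thresholds falling with practice; the same fit applied to false alarm rates would need some convention for blocks with a rate of zero, which the sketch does not address.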

Figure 2. Experiment 1: Relative thresholds (THR) and false alarm rates (FAR) for the three stimulus types for each of the four observers (Obs).

Table 1.

Results of the regression analyses on relative thresholds and false alarm rates from the practice sessions for all four observers and all three target-present stimuli in Experiment 1. Note that all $\hat{β}$ values were reliably different from 0 (p < .05) except for those marked with ⁺.

Observer	Stimulus	Relative threshold: Intercept	Relative threshold: $\hat{β}$	Relative threshold: R²	False alarm rate: Intercept	False alarm rate: $\hat{β}$	False alarm rate: R²
1	top	0.13	−1.25	0.93	−2.18	0.30	0.58
	bottom	−0.07	−1.30	0.91	−1.80	0.14	0.57
	both	−0.25	−1.17	0.90	−2.32	0.35	0.48
2	top	−0.71	−0.45	0.36	−2.34	0.12⁺	0.34
	bottom	−0.15	−1.21	0.80	−3.05	0.51	0.66
	both	−0.81	−0.59	0.46	−3.07	0.51	0.62
3	top	1.07	−1.82	0.91	−2.44	0.33	0.42
	bottom	0.38	−1.53	0.91	−2.03	0.18	0.17
	both	0.34	−1.44	0.92	−2.13	0.20	0.15
4	top	0.91	−1.59	0.87	−2.00	0.25	0.58
	bottom	0.79	−0.93	0.62	−1.70	0.10⁺	0.36
	both	1.22	−1.55	0.77	−2.07	0.26	0.48
$\bar{X}$				0.78			0.45

Baseline and endline identification.

Data analysis.

Since the inception of GRT (Ashby & Townsend, 1986), a range of approaches has been developed for analyzing data with respect to inferences regarding PI, PS, and DS, each with its own documented strengths and weaknesses (e.g., Silbert & Thomas, 2013; Thomas, 2001, 2003; Thomas & Silbert, 2014). A critical point, however, is that the literature on the analysis of complete identification data is vibrant and evolving, with regular additions (e.g., Silbert & Hawkins, 2016; Soto et al., 2015), including the extension of GRT to RTs (Townsend et al., 2012). The approach taken here is to combine a variety of approaches in order to take advantage of the strengths of each, with the goal of using the inferences drawn from each approach as sources of potentially-converging evidence (see, e.g., Cornes, Donnelly, Godwin, & Wenger, 2011; Von Der Heide et al., 2018, for a similar approach).

Figure 3 illustrates the manner in which our analyses proceeded. One path in this process was used for preliminary inferences regarding PS and DS, and a second was used for preliminary inferences regarding PI. The critical quantities for the first path were the following. The first is a test of marginal response invariance (MRI) for the response frequencies, and its corresponding test in RTs (timed MRI or tMRI; Townsend et al., 2012). MRI is defined in terms of marginal responses on each level of each dimension. MRI holds for a given level on a given dimension if the marginal probability of identifying that level on that dimension is the same across levels of the other dimension (Ashby & Townsend, 1986; Silbert & Hawkins, 2016). If MRI and tMRI do not hold, then this suggests a failure of PS, DS, or both. The second set of quantities are the marginal measures of sensitivity (d′) and criterion (c) as defined in signal detection theory (Green & Swets, 1966; Macmillan & Creelman, 2005). Equality of these marginal measures across the levels of the other dimension (e.g., marginal d′ for the top feature across the two levels of the bottom) is tested using 95% confidence intervals (Gourevitch & Galanter, 1967). Inequality of the marginal d′s suggests a possible failure of PS and inequality of the marginal cs suggests a possible failure of DS. The critical quantity on the second path is a test of report independence (RI, originally referred to as sampling independence; see Ashby & Townsend, 1986) and its corresponding test for RTs (timed RI or tRI; Townsend et al., 2012). RI holds for a given stimulus if the probability of correctly identifying that stimulus is equal to the product of the marginal probabilities of accurately identifying the level of each component (Ashby & Townsend, 1986; Silbert & Hawkins, 2016). If RI and tRI do not hold, it suggests a violation of PI for that stimulus. 
Here it should be noted that we applied the tests of tMRI and tRI to both the RT data and the start times of the LRPs.
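The marginal signal detection measures can be illustrated with a short Python sketch. It assumes the standard equal-variance definitions, d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2, where H and F are the marginal hit and false alarm rates for one feature at a fixed level of the other. The counts, the (top, bottom) coding, and the function names are hypothetical, and the confidence-interval comparison is not implemented.

```python
from statistics import NormalDist

# Hypothetical confusion counts: counts[stimulus][response], with stimuli
# and responses coded as (top, bottom), 0 = absent, 1 = present.
counts = {
    (0, 0): {(0, 0): 70, (0, 1): 10, (1, 0): 15, (1, 1): 5},
    (0, 1): {(0, 0): 10, (0, 1): 70, (1, 0): 5, (1, 1): 15},
    (1, 0): {(0, 0): 15, (0, 1): 5, (1, 0): 70, (1, 1): 10},
    (1, 1): {(0, 0): 10, (0, 1): 10, (1, 0): 40, (1, 1): 40},
}

def report_rate(counts, stim, dim):
    """P(feature `dim` reported present | stimulus `stim`)."""
    total = sum(counts[stim].values())
    return sum(n for resp, n in counts[stim].items() if resp[dim] == 1) / total

def marginal_dprime_c(counts, dim, other_level):
    """Marginal d' and c for feature `dim` with the other feature fixed at
    `other_level`. Rates of exactly 0 or 1 raise an error in inv_cdf."""
    other = 1 - dim
    present, absent = [0, 0], [0, 0]
    present[dim], present[other] = 1, other_level
    absent[dim], absent[other] = 0, other_level
    hit = report_rate(counts, tuple(present), dim)
    false_alarm = report_rate(counts, tuple(absent), dim)
    z = NormalDist().inv_cdf
    return z(hit) - z(false_alarm), -0.5 * (z(hit) + z(false_alarm))

# Marginal measures for the top feature at each level of the bottom feature.
for level in (0, 1):
    d_prime, criterion = marginal_dprime_c(counts, dim=0, other_level=level)
    print(level, round(d_prime, 3), round(criterion, 3))
```

Comparing the two printed rows is the descriptive analogue of the equality tests described above: differing d′ values would point toward a possible failure of PS, and differing c values toward a possible failure of DS.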

Prior to analysis, the RT data were censored at 200 and 2000 ms. Any LRP start times that were less than 100 ms before the observed RT were deleted prior to analysis. This eliminated less than 1% of the observations for each observer. Overall correlations between RTs and LRP start times for the four observers ranged from r = 0.78 to r = 0.89.

All of the preliminary inferences drawn to this point were then used to guide hierarchical model fitting (per Thomas, 2001). Specifically, the set of preliminary inferences regarding PI, PS, and DS suggested a hypothesized model. This hypothesized model was used to specify a hierarchy of possible models, starting with the simplest possible model (PI, PS, and DS all holding). The most complex model in any hierarchy was more complex than the hypothesized model, and was constructed by relaxing one assumption of the hypothesized model. In addition, in cases in which the hypothesized model contained either a violation of PI or a violation of DS, an alternative model was constructed using a violation of the alternative (Silbert & Thomas, 2013). In cases in which the hypothesized model contained both violations of PI and DS, alternative models in which only PI or only DS were violated were also fit. Finally, a completely unconstrained model (all parameters free) was also fit to the data. The best model (based on a χ² statistic calculated on the negative log likelihoods of each model, per Thomas, 2001) selected in this process was used for the final set of inferences.
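One common way to compare nested models on their negative log likelihoods is a likelihood-ratio test, sketched below in Python. Whether this matches the exact procedure used here is an assumption; the function name, the example likelihood values, and the restriction to a .05 criterion are hypothetical. The critical values are the standard χ² values for 1 to 3 degrees of freedom.

```python
# Standard chi-square critical values at alpha = .05, keyed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}

def likelihood_ratio_test(nll_restricted, nll_general, df):
    """Compare a restricted model to a more general nested model.

    nll_* are minimized negative log likelihoods; df is the difference in
    the number of free parameters. Returns the statistic and whether the
    restricted model is rejected at alpha = .05."""
    g2 = 2.0 * (nll_restricted - nll_general)
    return g2, g2 > CHI2_CRIT_05[df]

# Hypothetical fits: the general model frees one additional parameter.
statistic, reject_restricted = likelihood_ratio_test(1250.0, 1244.0, df=1)
print(statistic, reject_restricted)
```

Working up a hierarchy, the simpler model is retained whenever the test fails to reject it, so the selected model is the most restrictive one that is not reliably worse than its more general alternatives.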

Tests of MRI and tMRI.

Table 2 presents a summary of the inferences for all of the analyses of the data from Experiment 1. In order to conserve space, we present the results only for the cases in which either MRI or tMRI failed. For the tests of MRI, the critical test statistic is a χ² (Silbert & Hawkins, 2016) and for the tests of tMRI (using the RT and the LRP data), the critical test statistic is D, scaled by the number of observations in the cumulative distribution functions that are being compared (Townsend et al., 2012, p. 485). In Table 2, cells highlighted in gray indicate where the tests for MRI or tMRI fail. In all but one case, MRI and tMRI (for both RTs and LRPs) held for all observers at both contrast levels at baseline (BL, before perceptual practice). However, there were numerous failures of both MRI and tMRI at endline (EL, after perceptual practice), occurring most frequently at the two supra-threshold levels of contrast and far more frequently for the top rather than the bottom feature. This suggests that PS and DS may both have held prior to practice, but that either or both may have been violated after practice. In addition, the tests of tMRI on the RTs and LRPs showed a high level of consistency. Of the 40 possible inferences, the analyses of the RTs and LRPs agreed on 35 (88%).
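The statistic D compares two empirical cumulative distribution functions. A minimal sketch of the unscaled two-sample statistic (the largest absolute difference between the two empirical CDFs) is given below; the function name and sample values are hypothetical, and the scaling by the number of observations described by Townsend et al. (2012) is deliberately left out because its exact form is not given here.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between two empirical CDFs (unscaled D)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough to
    # evaluate the difference at every distinct observation.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Invented RTs (ms) for one level of the top feature at the two levels of
# the bottom feature.
rt_bottom_absent = [410.0, 450.0, 480.0, 520.0]
rt_bottom_present = [480.0, 520.0, 560.0, 600.0]
print(ks_statistic(rt_bottom_absent, rt_bottom_present))
```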

Table 2.

Experiment 1: Tests of MRI (response frequencies) and tMRI (RTs and start times of the LRPs) for all four observers. Cells shaded in gray indicate cases in which MRI or tMRI failed. Note: BL = baseline, EL = endline.

			Tests of MRI		Tests of tMRI
			Target	Level of other			D
Obs	Test	Contrast	Dimension	Dimension	χ²	Comparison	RT	LRP
1	EL	10	top	absent	5.617	Top, absent	0.136	0.183
				present	25.734	Top, present	0.136	0.163
			bottom	absent	1.539	Bottom, absent	0.053	0.245
				present	0.375	Bottom, present	0.028	0.193
		50	top	absent	0.000	Top, absent	0.070	0.206
				present	33.689	Top, present	0.205	0.128
2	EL	threshold	top	absent	2.347	Top, absent	0.079	0.072
				present	21.425	Top, present	0.138	0.024
		10	top	absent	0.137	Top, absent	0.065	0.229
				present	45.405	Top, present	0.242	0.390
			bottom	absent	0.852	Bottom, absent	0.041	0.187
				present	0.833	Bottom, present	0.326	0.235
		50	top	absent	6.005	Top, absent	0.219	0.193
				present	38.487	Top, present	0.362	0.259
			bottom	absent	0.791	Bottom, absent	0.091	0.284
				present	0.059	Bottom, present	0.163	0.127
3	BL	50	top	absent	0.084	Top, absent	0.120	0.128
	EL	threshold	top	absent	3.607	Top, absent	0.162	0.165
				present	0.012	Top, present	0.135	0.201
		10	top	absent	0.118	Top, absent	0.260	0.226
				present	3.350	Top, present	0.352	0.135
			bottom	absent	1.769	Bottom, absent	0.325	0.150
				present	0.450	Bottom, present	0.380	0.143
		50	top	present	21.007	Top, present	0.366	0.270
4	EL		bottom	absent	2.286	Bottom, absent	0.111	0.204
				present	0.142	Bottom, present	0.084	0.164
		10	top	absent	1.141	Top, absent	0.050	0.029
				present	3.334	Top, present	0.111	0.400
		50	top	absent	3.687	Top, absent	0.041	0.177
				present	23.525	Top, present	0.198	0.209


Tests of marginal signal detection measures.

Figure 4 plots the differences in the marginal hit and false alarm rates, before and after practice, at each level of contrast. The values plotted are the difference in each measure for one of the features (top, bottom) across the two levels of the other feature (present, absent). The points labeled as “Top” are the differences between the measure when the bottom feature was present minus the value of the measure when the bottom feature was absent. Figure 5 plots the differences in the marginal measures of sensitivity (d′) and criterion for the four observers at each level of contrast, before and after practice. Prior to practice, equality of both marginal measures held for all four observers. However, after practice, equality of the marginal d′s failed 10 times, with the majority of those failures occurring at the two supra-threshold levels of contrast for the top feature. In addition, after practice, equality of the marginal cs failed eight times, with at least one failure for every observer except observer 1. The majority of the failures of equality occurred for the top feature at the two supra-threshold levels of contrast. In all cases the failures of equality involved a more liberal criterion when the other feature was present relative to when it was absent. All of this suggests that perceptual practice resulted in possible failures of both PS and DS.
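The marginal measures compared here follow the standard signal detection definitions, d′ = z(H) − z(FA) and c = −[z(H) + z(FA)]/2. The sketch below computes the present-minus-absent differences for invented hit and false alarm rates; the function names and rates are hypothetical, and the confidence intervals of Gourevitch and Galanter (1967) used in the actual tests are not implemented.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hit_rate, fa_rate):
    """Sensitivity: z(H) - z(FA)."""
    return _z(hit_rate) - _z(fa_rate)


def criterion(hit_rate, fa_rate):
    """Criterion c: -0.5 * (z(H) + z(FA)); smaller values are more liberal."""
    return -0.5 * (_z(hit_rate) + _z(fa_rate))


# Invented marginal rates for the top feature at each level of the bottom.
bottom_absent = {"hit": 0.80, "fa": 0.20}
bottom_present = {"hit": 0.90, "fa": 0.30}

diff_d = (d_prime(bottom_present["hit"], bottom_present["fa"])
          - d_prime(bottom_absent["hit"], bottom_absent["fa"]))
diff_c = (criterion(bottom_present["hit"], bottom_present["fa"])
          - criterion(bottom_absent["hit"], bottom_absent["fa"]))
print(diff_d, diff_c)
```

With these invented rates the difference in c is negative, which corresponds to a more liberal criterion when the other feature is present.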

Figure 5. Experiment 1: Differences (Diff) in marginal sensitivity (d′, panels a-d) and criterion (c, panels e-h) at baseline (BL) and endline (EL) at each level of contrast.

Tests of RI and tRI.

Table 3 summarizes the tests of RI and tRI, listing only the cases in which RI or tRI failed. As can be seen, there were no failures of any of the tests of RI prior to practice and only a small number (five) of possible failures of RI or tRI after practice. For those five failures, four were observed at the two supra-threshold levels of contrast. In addition, in four of the five cases, the failures were observed for the stimulus in which both features were present. Finally, there was one observer (4) for whom there was no evidence of any failures of RI or tRI, at either stage of practice or at any level of contrast. The consistency among the three tests here was more limited than for the tests of MRI and tMRI. In four of the five cases, the tests on the response frequencies and RTs were in agreement with each other but in disagreement with the test on the LRP start times. The source of this inconsistency is not readily apparent. In sum, there is at best limited evidence for practice-induced violations of PI.
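The RI condition for a single stimulus can be sketched as the gap between the joint probability of a response and the product of its two marginal probabilities. The counts, coding, and function name below are hypothetical, and the sketch is descriptive only; it does not perform the χ² test reported in Table 3.

```python
def report_independence_gap(row, response):
    """Joint probability of `response` minus the product of the marginal
    probabilities of its top and bottom components, for one stimulus.

    `row` maps (top, bottom) responses to counts, with 1 = present and
    0 = absent. A gap of zero means RI holds exactly for that response.
    """
    total = sum(row.values())
    top_r, bottom_r = response
    p_joint = row[response] / total
    p_top = sum(n for (top, _b), n in row.items() if top == top_r) / total
    p_bottom = sum(n for (_t, bottom), n in row.items() if bottom == bottom_r) / total
    return p_joint - p_top * p_bottom


# Invented response counts for the stimulus with both features present.
independent_row = {(1, 1): 40, (1, 0): 40, (0, 1): 10, (0, 0): 10}
dependent_row = {(1, 1): 60, (1, 0): 20, (0, 1): 5, (0, 0): 15}

print(report_independence_gap(independent_row, (1, 1)))
print(report_independence_gap(dependent_row, (1, 1)))
```

The first row was constructed so that the joint probability equals the product of the marginals; the second shows a positive gap, meaning the two features are reported together more often than independence would predict.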

Table 3.

Experiment 1: Failures of tests of RI (response frequencies) or tRI (RTs and LRP start times). Cells shaded in gray indicate the test that suggested the failure.

					RI	tRI: D
Obs	Test	Contrast	Top	Bottom	χ²	RT	LRP
1	Endline	10	Present	Present	1.000	0.051	0.145
2	Baseline	50	Present	Present	2.500	0.016	0.289
3	Baseline	50	Present	Present	3.000	0.065	0.226
	Endline	threshold	Present	Absent	0.300	0.016	0.240
		10	Present	Present	9.400	0.111	0.273
				counts	0	1	5


Hierarchical model fits.

Table 4 summarizes the results of the hierarchical model fitting. Here it can be seen that, prior to practice, there was a uniform preservation of PI, PS, and DS, for all observers, at both levels of contrast. Practice produced three sets of changes. First, none of the possible violations of PI suggested by the tests of RI and tRI were supported by the model fitting. Second, in nine of the 12 possible cases, PS was violated, and in all of those cases the violation was for the top feature. This suggests that one of the effects of perceptual practice is to alter the representation of the top feature as a function of whether the bottom feature was present or absent. Third, in six of the 12 possible cases, the response criterion for the top feature was altered as a function of the state of the bottom feature, which can be interpreted as a violation of DS.

Table 4.

Experiment 1: Final inferences from the hierarchical model fitting to the data from each observer, at each time point and level of contrast.

			Preservation of
Test	Obs	Contrast	PI	PS	DS	Parms	ln(L)	AIC
Baseline	1	10	T	T	T	4	−1453.76	2915.51
		50	T	T	T	4	−1210.74	2429.48
	2	10	T	T	T	4	−1508.57	2915.51
		50	T	T	T	4	−1336.18	2680.35
	3	10	T	T	T	4	−1472.05	2952.11
		50	T	T	T	4	−1226.02	2460.05
	4	10	T	T	T	4	−1541.55	3091.11
		50	T	T	T	4	−1299.68	2607.36
		violations	0	0	0
Endline	1	threshold	T	T	T	4	−1502.66	3013.31
		10	T	F: Top	T	5	−1069.95	2149.90
		50	T	F: Top	T	5	−965.45	1940.90
	2	threshold	T	F: Top	T	5	−1469.53	2949.05
		10	T	F: Top	F: Top	6	−1289.52	2591.03
		50	T	F: Top	F: Top	6	−1047.88	2107.76
	3	threshold	T	T	T	4	−1245.90	2499.80
		10	T	F: Top	F: Top	6	−969.70	1957.41
		50	T	F: Top	F: Top	6	−653.46	1320.92
	4	threshold	T	T	T	4	−1541.55	3091.11
		10	T	F: Top	F: Top	6	−1205.35	2422.70
		50	T	F: Top	F: Top	6	−1205.33	2424.65
		violations	0	9	6


Discussion

Experiment 1 was intended to test the hypothesis that perceptual learning leads to the creation of perceptual objects, defined in terms of the GRT constructs of PI, PS, and DS. Specifically, we tested the prediction that perceptual learning for two arbitrary, contrast-defined features would produce a shift from separability and independence to violations of perceptual and/or decisional separability, and violations of perceptual independence. Detection thresholds reliably decreased for all observers (i.e., performance improved), with this being accompanied by small but reliable increases in false alarm rates. Prior to practice, there was consistent evidence for PS, DS, and PI. In contrast, after practice, there was reasonably consistent evidence for violations of PS, accompanied in a number of cases by violations of DS, suggesting that perceptual learning involves changes in both encoding and decision-making. There was, however, no strong evidence that perceptual learning produced violations of PI. Thus, the results can be interpreted in terms of creating perceptual representations in which the features are non-separable, rather than dependent.

Critically, these conclusions were reached on the basis of three sources of data: choice frequencies, choice RTs, and the onset time of the LRP. While frequencies and latencies have been used in conjunction in previous work with GRT (see Townsend et al., 2012), this is the first case to our knowledge that adds a neural variable in a theory-based way as a source of converging evidence. Although there were points of disagreement across these three variables, the consistency was generally high. This combination of evidence allowed us to efficiently identify a set of candidate models, and then use hierarchical model-fitting to adjudicate the final inferences.

Finally, we should note that this approach allowed us to identify individual differences as a function of learning (see also Fific, Nosofsky, & Townsend, 2008). This has long been a critical strength of the GRT approach, allowing for examination of individual differences in strategy and learning. In this case, the approach revealed that observers varied with respect to the prevalence of violations of both PS and DS, suggestive of variations in strategies across observers.

Experiment 2

Experiment 2 uses the theory and methods of SFT (Townsend & Nozawa, 1995; Townsend & Wenger, 2004a, 2004b) to test the hypothesis that perceptual learning results in a change in the processing of encoded representations such that they become perceptual objects. Specifically, we predict that prior to practice, observers will process the elements of a stimulus in ways that would not be associated with the processing of a set of features bound together as an object: in serial, exhaustively, independently, and with limited capacity. After practice, observers should process the elements quite differently, in a way that would be associated with a set of features bound together as an object: in parallel, exhaustively, non-independently, and with unlimited to super-capacity. We test this hypothesis using the same set of stimuli that were used in Experiment 1 with a new set of observers.