Reducing the low-prevalence effect with probe trials

Mark W Becker; Andrew Rodriguez; Derrek T Montalvo; Chad Peltier

doi:10.1186/s41235-025-00702-w

2026 Jan 8;11:5.



PMCID: PMC12779849 PMID: 41501239

Abstract

As targets become rare in visual search tasks, the likelihood of missing them increases—a phenomenon known as the low-prevalence effect (LPE). This has important implications for real-world searches, but reducing the LPE has proven challenging. In Experiment 1, we used a low-prevalence T-among-Ls task and found that distributing “probe” trials—trials with known targets and post-response feedback—reduced the LPE. In Experiment 2, participants searched for two low-prevalence targets (T and O among Ls and Qs), and we varied how often each appeared in probe trials. The probe benefit scaled with the frequency of the matching target, suggesting limited generalizability to non-probed targets. Experiment 3 used eye tracking to examine whether probes affected quitting thresholds, decision criteria, or guidance. Results showed that probes biased top-down guidance toward features of frequently probed targets, without affecting the number of items inspected or the decision criterion. In Experiment 4, we tested whether feedback was necessary for the probe benefit. Findings suggest that probes improve rare-target search by altering perceived prevalence, not through feedback alone. Overall, probes may reduce the LPE by increasing perceived prevalence and thereby increasing search guidance, but only when probe targets closely match actual search targets.

Keywords: Low-prevalence effect, Visual search, Misses in visual search, Eye tracking

Significance statement

When targets in a visual search task become rare, the likelihood of missing them increases greatly—the low-prevalence effect (LPE). Given that many important real-world search tasks (e.g., radiology, baggage screening) involve low-prevalence targets, finding ways to mitigate the LPE may have real-world consequences. Here we demonstrate that interspersing “probe” trials—trials with known targets and post-response feedback—reliably reduces the LPE. We also investigate the extent to which the probe targets must match the real targets and use eye tracking to determine the search mechanism impacted by probes. Our results suggest that the probe benefit is derived from increasing top-down guidance toward items that share features with frequently probed targets, suggesting that to be effective, probe targets would need to share features with targets in the real search task. Finally, we investigate whether the probe benefit results from the feedback they provide or because their inclusion increases the perceived prevalence rate of targets. Results suggest that the probe benefit results from a change in the perceived prevalence rate of targets. These findings may have implications for improving the detection of rare targets in real-world tasks and provide some insight into the types of probes that would be required for such an approach to be successful.

Introduction

While visual search tasks have been ubiquitous in attentional research, most of that work has used targets present on 50% or 100% of trials. Although research in radiology first raised the concern that target prevalence rates might impact search performance (Horowitz, 2017; Kundel, 1982), Wolfe and colleagues (Wolfe et al., 2005) were the first visual cognition researchers to explore this issue. They found that as targets in a task become rare, the likelihood of missing them (when they do occur) skyrockets, an effect known as the low-prevalence effect (LPE). The LPE is concerning because it may have implications for a number of important real-world tasks, such as mammography and baggage screening, in which target prevalence rates can be extremely low (Gur et al., 2004; Horowitz, 2017).

This concern is supported by findings suggesting that the LPE may influence target detection rates in real-world scenarios. For instance, trained radiologists (Evans et al., 2011) and TSA agents (Wolfe et al., 2013) are susceptible to the LPE, and drivers are less likely to notice road hazards when they become rare (Kosovicheva et al., 2023; Song & Wolfe, 2024). These potential consequences have driven research into methods for reducing the LPE and improving low-prevalence search.

However, much of this research has found that the LPE is particularly stubborn and difficult to mitigate (Van Wert et al., 2006; Wolfe et al., 2007). For instance, to reduce miss errors, researchers have attempted to compel observers to perform a more complete search before making a target absent response. One such method used an eye tracker to give participants real-time feedback about which sections of the search display had not yet been fixated. This real-time feedback did not help: it neither produced a more thorough search of the displays nor reduced the magnitude of the LPE (Drew & Williams, 2017; Peltier & Becker, 2017). While that result may be surprising, it is only one of a number of attempts to improve low-prevalence search that have failed to reduce the LPE. For instance, the LPE persists even when the search is shifted to a feature search (Rich et al., 2008); video recording participants during their search improves overall detection but does not reduce the LPE (Miyazaki, 2015); splitting displays so that only half the items appear in each of two epochs fails to reduce the LPE (Kunar et al., 2010); and imposing a delay before people can respond fails to eliminate the LPE (Wolfe et al., 2007). Evidence on having two people perform the search (known as “double reading”) is mixed, with some studies showing that it reduces the LPE (Kunar et al., 2021) and others showing that it does not (Wolfe et al., 2007). However, even if it does, implementing double reading in a real-world scenario may be prohibitively expensive.

By contrast, one recent method has been shown to eliminate the LPE. Taylor and colleagues (Taylor et al., 2022) altered the task instructions, changing from a target present/absent response to identifying which item in the search array most resembled the target. They found that this change in instructions completely eliminated the LPE, and in target present trials the selected item was frequently the actual target. While an interesting finding, from a practical perspective this change in instructions would be problematic, as it produces a selection akin to a false alarm on every trial without a target, which is the vast majority of trials in a low-prevalence search. To put it another way, you would not want to be in a busy TSA screening line that used this approach. Gillies and Kosovicheva (2025) noted this potential flaw and instructed participants to perform a two-stage task. In the first stage, participants selected the item that was most similar to the target, and in the second stage, they made a binary decision about whether the selected item was actually a target. Unfortunately, the LPE returned for this second binary decision, limiting the real-world viability of such an approach.

However, Wolfe and colleagues discovered an alternative approach that successfully reduces the LPE and appears to have real-world feasibility (Wolfe et al., 2007). Their approach (Experiment 7) was to insert an additional mini-block of trials into the middle of the LPE search task. The mini-block consisted of high-prevalence trials (50% prevalence rate) with post-response feedback: during these trials, the search array remained on the screen after the participant’s response, and the target was circled in red with text either praising the participant for finding the target (when correct) or telling them they had missed it. While feedback was given during this high-prevalence mini-block, targets were rare and feedback was absent in the other “true” trials. Even so, they showed that including the mini-block reduced the LPE. In theory, this approach could be implemented in real-world scenarios. For instance, in mammography it would amount to inserting a set of scans with known cancers into the middle of the radiologist’s workload and providing feedback after each of those known scans was evaluated.

Given the potential success of this method, we first sought to replicate their findings using a slightly modified method of presenting the additional “probe” trials. We reasoned that rather than including them in a mini-block, dispersing a set of “probe” target present trials throughout the task may also be effective. If so, it might confer some additional real-world advantages, because there would not be a prolonged period when the observer was off task, and it might allow agencies to track an individual’s performance over time to identify when performance dips and a break is needed.

Indeed, this approach is similar to the Threat Image Projection (TIP) system that many countries mandate as part of airline baggage screening (Catchpole et al., 2023). The TIP system projects a fictitious threat image (e.g., gun, explosive) into travelers’ bags as they are screened through the X-ray machines at security checkpoints. When the security screener detects the threat, they are provided with immediate feedback that the image was a TIP image. This system is used during training and to monitor the on-the-job performance of individual operators. If an operator’s monthly performance drops below a given standard, the operator must attend remedial training (U.S. Government Accountability Office, 2016). To date, the primary use of the program is to enhance training and monitor performance. However, if we find that these types of probes increase rare-target detection, it is possible that the program has a direct benefit for rare-target detection (Cutler & Paddock, 2009), rather than just a training benefit.

To foreshadow, Experiment 1 investigates this dispersed probe method, Experiment 2 then investigates whether the probe benefit generalizes to targets whose features do not match the probe, Experiment 3 uses eye tracking to attempt to determine the mechanism responsible for the probe benefit, and Experiment 4 investigates whether the feedback provided in probe trials is essential for their benefit or whether they provide benefit because they change the perceived prevalence of specific targets.

Experiment 1

The main goal of our first experiment was to replicate Wolfe et al.’s (2007) mini-block advantage and determine whether dispersing the mini-block’s “probe trials” throughout the trial sequence would confer a similar advantage.

Participants

A power analysis calculating the sample size required to evaluate the interaction term of a mixed-model ANOVA with an effect size of .18 (halfway between a small and moderate effect size) and power of .95 suggested an overall sample size of 104. Assuming some participants would be non-compliant, we ran a total of 111 participants. Data from nine participants were excluded from further analyses due to false alarm rates > 45%, leaving a final sample of 102 participants. False alarm rates for the remaining participants were extremely low (M = 1.1%, SE = .45%). All participants were undergraduate students recruited through our campus SONA system and participated for course credit or extra credit. All participants reported normal or corrected-to-normal vision. All methods were approved by Michigan State University’s IRB, and participants gave informed consent. We did not ask for age or gender information.
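The sample-size calculation above can be approximated in a few lines. This is a minimal sketch, not the authors' calculation. It assumes a noncentrality parameter of λ = f²·N·m/(1 − ρ) for a within-between interaction with two groups and m = 2 measurements, an assumed correlation of ρ = .5 between repeated measures (not reported above), Cohen's f = .18, and α = .05. To stay within the Python standard library, it replaces the exact noncentral F test on one numerator df with a large-sample normal approximation, which slightly understates the required N. The function names are hypothetical.

```python
from statistics import NormalDist


def approx_power(n_total, f=0.18, m=2, rho=0.5, alpha=0.05):
    """Approximate power for the 1-df Block x Intervention interaction.

    Assumed noncentrality: lambda = f^2 * N * m / (1 - rho). The noncentral
    F(1, df2) test is approximated by a two-sided z test with mean sqrt(lambda).
    """
    std_normal = NormalDist()
    lam = f ** 2 * n_total * m / (1 - rho)
    shift = lam ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the shifted statistic lands in either rejection tail.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


def required_n(target_power=0.95, n_groups=2, **power_kwargs):
    """Smallest total N reaching target_power, rounded up to equal group sizes."""
    n = n_groups
    while approx_power(n, **power_kwargs) < target_power:
        n += 1
    # Round up to a multiple of n_groups so both intervention groups are equal.
    return -(-n // n_groups) * n_groups


if __name__ == "__main__":
    print(required_n())
```

Under these assumptions, the search stops a little above 100 participants. Using the exact noncentral F, a different ρ, or different software defaults would move this number, so the result should not be read as reproducing the reported value of 104.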

Procedure

The experiment was programmed in E-Prime and run in sound-attenuated testing rooms on PCs with 24-inch monitors set to a resolution of 1024 × 768 at 60 Hz. Each participant completed a control block and an experimental block, with block order randomized. For roughly half the participants (n = 52), the experimental block included a “mini-block” of trials with feedback, similar to Wolfe’s approach. For the other participants (n = 50), the experimental block included “probe trials”: target present trials with feedback that were interleaved with the “real” trials throughout the block. Each condition began with 50 practice trials with 10% target prevalence and no feedback, to allow observers to set prevalence-appropriate quitting thresholds and decision criteria for a low-prevalence task (Ishibashi et al., 2012). The control block consisted of 250 trials with a 10% prevalence rate and no feedback. Both experimental blocks consisted of 250 “real” trials with a 10% prevalence rate and no feedback, plus an additional 50 trials with feedback. The mini-block intervention involved a set of 50 trials with a 50% prevalence rate and feedback, presented halfway through the block of real trials. The probe intervention involved 50 target present trials with feedback that were randomly dispersed throughout the real trials.
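The three block structures described above can be sketched as trial lists. This is an illustrative sketch, not the E-Prime implementation. The helper names are hypothetical, and we assume that a 10% prevalence rate means exactly 25 target present trials among the 250 real trials and that “randomly dispersed” means the probes are shuffled in with the real trials. The last function counts how often a target appears across all trials a participant sees, a quantity that differs by intervention.

```python
import random


def make_trials(n, n_present, feedback, rng):
    """n trials, n_present of them target present, in random order."""
    trials = [{"target": i < n_present, "feedback": feedback} for i in range(n)]
    rng.shuffle(trials)
    return trials


def control_block(rng, n_real=250, prevalence=0.10):
    # Real trials: low prevalence, no feedback.
    return make_trials(n_real, round(n_real * prevalence), False, rng)


def mini_block_block(rng, n_mini=50):
    real = control_block(rng)
    # 50% prevalence feedback trials inserted halfway through the real trials.
    mini = make_trials(n_mini, n_mini // 2, True, rng)
    half = len(real) // 2
    return real[:half] + mini + real[half:]


def probe_block(rng, n_probes=50):
    # Every probe is target present with feedback, dispersed among real trials.
    block = control_block(rng) + make_trials(n_probes, n_probes, True, rng)
    rng.shuffle(block)
    return block


def experienced_prevalence(block):
    """Proportion of all trials, real and feedback, that contain a target."""
    return sum(t["target"] for t in block) / len(block)


rng = random.Random(0)
for name, build in [("control", control_block), ("mini-block", mini_block_block),
                    ("probe", probe_block)]:
    block = build(rng)
    print(name, len(block), round(experienced_prevalence(block), 3))
```

Under these assumptions, targets appear on 25% of all trials in the probe block, about 16.7% in the mini-block block, and 10% in the control block. This difference in experienced prevalence is one way to read the trend toward better performance with probes.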

Each trial began with a central fixation cross (.5 s), followed by the search array (see Fig. 1). For real trials, the array remained on the screen until the participant made a target present/target absent button response, and then the next trial began with its fixation cross. All probe and mini-block trials were followed by feedback. In these trials, the participant’s button press did not erase the array. If the trial was a target present trial (all probe trials and half of the mini-block trials), a red circle appeared around the target and the words “The target was present. Take a moment to look at the image” appeared in the middle of the screen. If the trial was a target absent trial (only in the mini-block condition), the feedback consisted of the words “The target was absent. Take a moment to look at the image” appearing in the center of the screen. The feedback was displayed for 3 s, and then the next trial began with its fixation cross.
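The trial procedure above amounts to a short sequence of phases per trial. The following minimal sketch uses a hypothetical `trial_timeline` helper that only records phases and durations. It draws nothing on screen and is not the E-Prime script.

```python
FIXATION_S = 0.5  # central fixation cross
FEEDBACK_S = 3.0  # feedback duration after probe and mini-block trials


def trial_timeline(target_present, feedback, response_s):
    """Return (phase, duration_s, description) tuples for one trial."""
    phases = [
        ("fixation", FIXATION_S, "central fixation cross"),
        ("search", response_s, "array shown until present/absent response"),
    ]
    if feedback:
        # The response does not erase the array; feedback is added on top of it.
        if target_present:
            text = ("red circle around target + 'The target was present. "
                    "Take a moment to look at the image'")
        else:
            text = "'The target was absent. Take a moment to look at the image'"
        phases.append(("feedback", FEEDBACK_S, text))
    return phases


def trial_duration(phases):
    """Total time on screen for one trial, in seconds."""
    return sum(duration for _, duration, _ in phases)


print(trial_duration(trial_timeline(False, False, 2.0)))  # real trial
print(trial_duration(trial_timeline(True, True, 2.0)))    # probe trial
```

Each probe or mini-block trial therefore adds a fixed 3 s on top of search time.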

Displays

Target absent displays consisted of 12 black items on a white background. Each was an offset L (for the upright presentation: ~2.3° × ~2°, with the vertical component indented by ~.3° from the edge of the horizontal line, and a line width of ~.25°) randomly rotated to any of the four cardinal orientations. In target present displays, one of the offset Ls was replaced by a T, also in any of the four cardinal orientations. The screen (~52° × 29.5°) was divided into 70 possible locations (a 10 × 7 grid), and array items appeared in a randomly selected set of 12 locations.
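The display layout can be sketched as sampling 12 of the 70 grid cells. This sketch relies on assumptions the text does not specify: items sit at cell centers with no positional jitter, orientations are drawn independently and uniformly, and the names below are hypothetical.

```python
import random

GRID_COLS, GRID_ROWS = 10, 7        # 70 candidate locations
SCREEN_DEG = (52.0, 29.5)           # approximate screen width x height in degrees
ORIENTATIONS = (0, 90, 180, 270)    # four cardinal orientations
SET_SIZE = 12


def make_display(target_present, rng):
    """Return item dicts with shape, orientation, and cell-centre position (deg)."""
    cell_w = SCREEN_DEG[0] / GRID_COLS
    cell_h = SCREEN_DEG[1] / GRID_ROWS
    cells = rng.sample(range(GRID_COLS * GRID_ROWS), SET_SIZE)
    items = []
    for cell in cells:
        row, col = divmod(cell, GRID_COLS)
        items.append({
            "shape": "L",
            "orientation": rng.choice(ORIENTATIONS),
            # Assumed: items are centred in their cells, measured from the top-left.
            "x_deg": (col + 0.5) * cell_w,
            "y_deg": (row + 0.5) * cell_h,
        })
    if target_present:
        # One offset L is replaced by a T in a random cardinal orientation.
        target = rng.choice(items)
        target["shape"] = "T"
        target["orientation"] = rng.choice(ORIENTATIONS)
    return items


display = make_display(True, random.Random(42))
print(sum(item["shape"] == "T" for item in display), len(display))
```

Each grid cell spans about 5.2° × 4.2°, which comfortably exceeds the ~2.3° item size.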

Results

All analyses were performed only on the non-probe, “real” trials. A series of 2 × 2 mixed-model ANOVAs with Block (Control/Experimental) as a within-factor and Intervention Type (Mini-Block/Probe) as a between-factor were performed, with hit rate, false alarm rate, hit reaction time (RT), and target absent RT as dependent variables in separate analyses.
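Because both factors have two levels, the within-subject effects of this 2 × 2 mixed ANOVA can be computed from each participant's experimental-minus-control difference score. The sketch below illustrates that equivalence and is not the authors' analysis script. It assumes unweighted-means (Type III) tests, uses hypothetical function names, and omits the between-subjects main effect, which is a separate test on participants' mean scores. Its error df, N − 2, gives F(1, 100) for 102 participants, matching the degrees of freedom reported in the Results.

```python
from statistics import mean, variance


def mixed_2x2_within_effects(group_a, group_b):
    """F tests for Block and Block x Group in a 2 (within) x 2 (between) design.

    Each group is a list of (control_score, experimental_score) pairs, one per
    participant. Returns {effect: (F, df_effect, df_error)}.
    """
    diffs_a = [exp - ctl for ctl, exp in group_a]
    diffs_b = [exp - ctl for ctl, exp in group_b]
    n_a, n_b = len(diffs_a), len(diffs_b)
    df_error = n_a + n_b - 2
    # Pooled within-group variance of the difference scores.
    pooled_var = ((n_a - 1) * variance(diffs_a) + (n_b - 1) * variance(diffs_b)) / df_error
    se_gap = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    # Interaction: do the groups differ in their mean Block effect?
    t_interaction = (mean(diffs_a) - mean(diffs_b)) / se_gap
    # Block: is the unweighted average of the two group means different from 0?
    t_block = ((mean(diffs_a) + mean(diffs_b)) / 2) / (se_gap / 2)
    return {
        "Block": (t_block ** 2, 1, df_error),
        "Block x Group": (t_interaction ** 2, 1, df_error),
    }


# Toy (control, experimental) scores; not data from this study.
mini = [(0, 1), (0, 2), (0, 3)]
probe = [(0, 3), (0, 4), (0, 5)]
print(mixed_2x2_within_effects(mini, probe))
```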

Accuracy

For hits, there was a main effect of Block, F(1, 100) = 32.928, p < .001, η_p² = .248, with higher accuracy for the experimental blocks than the control blocks (see Fig. 2). The main effect of Intervention Type was not significant, F(1, 100) = 3.584, p = .061, η_p² = .035, although there was a trend toward better overall performance in the probe intervention than in the mini-block intervention. This trend may be due to the fact that there were more target present trials in the probe intervention, since 100% of the probe trials had targets, while only 50% of the mini-block trials had targets. The two factors did not interact, F(1, 100) = .085, p = .772, η_p² = .001. False alarms were rare; all conditions had a false alarm rate < 2.2%. Neither main effect nor the interaction approached significance, all F(1, 100) < 1, all p > .35, all η_p² < .01. Thus, the higher accuracy in the experimental conditions can be attributed to a shift in sensitivity rather than a change in criterion.¹
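The inference in the last sentence follows equal-variance signal detection theory. If hits rise while false alarms stay put, d′ increases while the decision criterion, measured from the mean of the noise distribution, stays where it was. The following is a minimal sketch. The counts in the example are hypothetical, not data from this study, and the log-linear correction (add 0.5 to each count and 1 to each trial total) is our choice for handling false alarm rates near zero.

```python
from statistics import NormalDist


def sdt_measures(hits, n_present, false_alarms, n_absent):
    """Equal-variance SDT: d' and the criterion location lambda.

    lambda = z(1 - FA) is the criterion measured from the noise mean, so it
    depends only on the false alarm rate. A log-linear correction keeps rates
    of 0 or 1 from producing infinite z scores.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (n_present + 1)
    fa_rate = (false_alarms + 0.5) / (n_absent + 1)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion_location = -z(fa_rate)
    return d_prime, criterion_location


# Hypothetical counts: 25 target present and 225 target absent real trials.
control = sdt_measures(hits=12, n_present=25, false_alarms=2, n_absent=225)
experimental = sdt_measures(hits=17, n_present=25, false_alarms=2, n_absent=225)
print(control, experimental)
```

With identical false alarm counts, the two criterion locations are equal while d′ is larger for the experimental counts.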

Fig. 2 — Mean hit accuracy for control and intervention blocks as a function of which intervention the subject received. Error bars are the standard error of the mean

Reaction time

One explanation that has been offered for the low-prevalence effect is that low prevalence causes a shift to a lower quitting threshold (Peltier & Becker, 2016; Wolfe & Van Wert, 2010; Wolfe et al., 2005): people search less of the display before responding target absent, thereby increasing the miss rate and producing faster target absent responses. Thus, one possibility is that the experimental interventions improve performance by raising the quitting threshold. To examine this possibility, we performed an ANOVA on target absent RTs (Fig. 3). Consistent with this view, there was a main effect of Block, F(1, 100) = 8.416, p = .005, η_p² = .078, with faster target absent RTs in the Control Block (M = 2757.29 ms, SE = 107.13) than in the Experimental Blocks (M = 2991.36 ms, SE = 95.01). There was no main effect of Intervention Type, F(1, 100) = .691, p = .408, η_p² = .007, nor an interaction, F(1, 100) = 2.628, p = .108, η_p² = .026.
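The quitting-threshold account ties misses and target absent RTs together. The toy sketch below uses hypothetical timing parameters and makes simplifying assumptions: items are inspected in random order without revisits, and every inspected target is recognized.

```python
def quitting_threshold_model(items_inspected, set_size=12, base_ms=500.0, ms_per_item=150.0):
    """Toy serial self-terminating search with a fixed quitting threshold.

    The observer inspects `items_inspected` of `set_size` items in random order
    and then responds target absent. A target is equally likely to be any item,
    so it is missed whenever it is not among the inspected items.
    """
    k = min(items_inspected, set_size)
    miss_rate = 1 - k / set_size
    target_absent_rt_ms = base_ms + k * ms_per_item
    return miss_rate, target_absent_rt_ms


for k in (12, 9, 6):
    miss, rt = quitting_threshold_model(k)
    print(f"inspect {k:2d}: miss rate {miss:.2f}, target absent RT {rt:.0f} ms")
```

Lowering the threshold raises the miss rate and shortens target absent RTs at the same time. This is why slower target absent RTs in the experimental blocks can be read as a higher quitting threshold.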

For completeness, we also performed an ANOVA on hit RTs (Fig. 3). Neither main effect approached significance, both F(1, 100) < 1, both p > .5, both η_p² < .005. However, there was a significant interaction, F(1, 100) = 6.305, p = .014, η_p² = .059. The interaction appears to arise because the intervention produced slower hit RTs than control in the mini-block condition but faster hit RTs than control in the probe condition. However, Bonferroni-corrected follow-up comparisons revealed that this difference was significant only for the mini-block intervention (p = .025) and did not reach significance for the probe condition (p = .20).

Discussion

Replicating Wolfe et al. (2007), we found that including a high-prevalence mini-block of trials that provide post-response feedback within an otherwise low-prevalence search task improved rare-target detection. In addition, our results suggest that distributing the probe trials throughout the block, rather than presenting them in a mini-block, is equally effective at increasing rare-target detections. Further, we found that both interventions produced slower target absent RTs than in the control block. This change in target absent RTs is consistent with the interventions increasing quitting thresholds, thereby leading to longer and more thorough searches before responding target absent.

Since it appears that both the probe and mini-block methods produce similar benefits, going forward we will test only the distributed probe method developed here. We do so because we believe the probe technique might have some practical advantages—distributed probes could allow one to monitor a searcher’s performance over time and the intervention involves a brief single trial at a time, rather than a prolonged mini-block of trials.

Given the success of these interventions in improving rare-target search, their implementation in real-world search contexts, such as TSA screening or radiology, might provide real-world benefit. However, before making such a suggestion, it is worth noting that in our experiments participants were searching for a single, well-defined target—in our case a letter “T”—and the probe trial targets perfectly matched the targets in the real trials. This raises the question of how closely the targets during probe trials have to match the actual targets to achieve a benefit. From a practical stance, a probe benefit that generalizes beyond targets that closely match the probed targets would be valuable.

The original Wolfe et al. (2007) experiment using mini-blocks provides some insight into this issue. The task required people to search for knives and guns; multiple exemplars of both types of targets were used, and the probe targets were not the exemplars used in the real trials. Thus, their data suggest that probes do not have to be an exact match to the real targets to result in a benefit. Still, it is likely that there was substantial overlap in the features of the probe targets and the real targets in their experiment. They also had 50% of their probe targets match each target category, so each target type was moderately probed. What would be more informative is to have probes match one category more frequently than the other and determine whether the probe benefit generalized to an infrequently probed target. This approach would provide more insight into how well probe benefits generalize to non-probed targets and might provide insight into the mechanisms responsible for the probe benefit.

Three mechanisms have been associated with miss errors in low-prevalence search. One is the trial-wide quitting threshold, a factor that determines how completely the search array is inspected before a target absent response is made (Chun & Wolfe, 1996; Peltier & Becker, 2016; Wolfe & Van Wert, 2010). If targets become rare, the number of items inspected before a target absent response may decrease, leading to more misses and faster target absent RTs (Peltier & Becker, 2016; Wolfe et al., 2005). If probes simply increase this trial-wide quitting threshold, a more thorough search may benefit all targets, even those that do not match the probed targets. A second potential mechanism is the criterion used to identify a target when evaluating whether a fixated item is a target or not (Hout et al., 2015; Peltier & Becker, 2016; Wolfe & Van Wert, 2010). If probes make this criterion more liberal for the probed target, then the probe benefit might be selective to targets that match the probes (and there might be more false alarms identifying distractors as this type of target). Finally, it is possible that probes increase top-down guidance (Chen & Zelinsky, 2006) toward items that share features with the frequently probed targets (Hout et al., 2014). If this were the case, the probe benefit would be selective to targets that matched the probed target. Of course, it is also possible that probes affect more than one of these factors. For instance, if an increase in the quitting threshold co-occurs with more top-down guidance toward the features associated with the frequently probed target, the benefit may still be confined primarily to targets that match or closely match the targets that appear during the probe trials. To investigate this issue in Experiment 2, we had subjects search for two very distinct targets and varied the proportion of probe trials that matched each.
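A small simulation makes the contrast between a general quitting-threshold change and feature-specific guidance concrete. This is a hypothetical toy model, not one the authors fit. Displays follow the Experiment 2 layout (six T-like and six O-like items), the observer always inspects six items before quitting, and a guidance weight multiplies the chance that a T-like item is inspected next. Holding the number of inspected items fixed isolates guidance; raising that number instead would help both targets.

```python
import random


def p_target_inspected(target, guidance_to_t, items_inspected=6,
                       n_trials=10000, seed=1):
    """Monte Carlo estimate of how often the target is among the inspected items.

    Items 0-5 are T-like (a T target or L distractors); items 6-11 are O-like
    (an O target or Q distractors). Each next item is drawn without replacement
    with probability proportional to its weight.
    """
    rng = random.Random(seed)
    weights = [guidance_to_t] * 6 + [1.0] * 6
    target_index = 0 if target == "T" else 6
    hits = 0
    for _ in range(n_trials):
        remaining = list(range(12))
        for _ in range(items_inspected):
            pick = rng.choices(remaining, weights=[weights[i] for i in remaining])[0]
            if pick == target_index:
                hits += 1
                break
            remaining.remove(pick)
    return hits / n_trials


for w in (1.0, 2.0, 4.0):
    print(w, p_target_inspected("T", w), p_target_inspected("O", w))
```

With no guidance (w = 1), each target is inspected on about half the trials. As w grows, T detection rises and O detection falls, which is the selective pattern the guidance account predicts.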

Experiment 2

To investigate whether the probe benefit relies on the probed targets being similar to the actual targets, we ran a series of experiments in which subjects simultaneously searched for either of two targets, a “T” or an “O,” among arrays of 12 items. On target absent trials, half the items were offset L distractors and half were Q distractors, providing a set of distractors that was visually similar to one of the targets (T and Ls; O and Qs) and dissimilar to the other (T and Qs; O and Ls). Each participant completed two blocks of trials, a control block and a probe block, with the order of presentation counterbalanced across subjects. The T and L stimuli were identical to those in Experiment 1; the O was an oval (~2° × ~2.3°), and the Q was the same oval with a “tail” (~.6°) attached. Each experiment began with 40 practice trials, with a 10% prevalence rate for each target. In the control block of 150 real trials, each target appeared on 15 trials, for a 10% prevalence rate for each target. When the O target appeared, it replaced a Q distractor in the display, and when a T target appeared, it replaced an L distractor, so every display had six O-like stimuli and six T-like stimuli. Target present trials never had both targets present. After the initial practice trials, the probe block consisted of the same 150 trials as the control block plus an additional 50 probe trials that were randomly distributed throughout the block. These probe trials were target present trials that presented the same post-response feedback as in Experiment 1. No feedback was provided in non-probe, real trials. There were three versions of the probe block that varied how frequently the targets in the probe trials matched one target versus the other. In one version, 100% of the probes matched one target, with none matching the other target. In a second version, one target appeared in 80% of the probe trials, with the other target appearing in the remaining 20%. In a final version, each target appeared in 50% of the probe trials. To avoid confusion, we will refer to this manipulation of the relative frequency of probes as “probe balance.” In all versions, the assignment of targets to probe frequency conditions was counterbalanced across participants. Participants responded in the same way as in Experiment 1: they pressed one button when they detected either target and a second button to indicate target absent.
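The probe-balance manipulation can be sketched as a trial list. The same sketch shows how probe balance shifts the prevalence of each target that a participant experiences. Names are hypothetical, the sketch assumes that “randomly distributed” means a full shuffle of probes and real trials, and it ignores the 40 practice trials.

```python
import random

# Share of the 50 probes that show the favored target in each version.
PROBE_BALANCE = {"100/0": 1.0, "80/20": 0.8, "50/50": 0.5}


def exp2_probe_block(balance, favored, rng, n_real=150, n_per_target=15, n_probes=50):
    """Real trials (15 T, 15 O, rest absent, no feedback) plus shuffled probes."""
    other = "O" if favored == "T" else "T"
    real = ([{"target": "T", "probe": False} for _ in range(n_per_target)]
            + [{"target": "O", "probe": False} for _ in range(n_per_target)]
            + [{"target": None, "probe": False}
               for _ in range(n_real - 2 * n_per_target)])
    n_favored = round(n_probes * PROBE_BALANCE[balance])
    probes = ([{"target": favored, "probe": True} for _ in range(n_favored)]
              + [{"target": other, "probe": True} for _ in range(n_probes - n_favored)])
    block = real + probes
    rng.shuffle(block)
    return block


def target_prevalence(block, target):
    """Proportion of all trials in the block showing the given target."""
    return sum(t["target"] == target for t in block) / len(block)


rng = random.Random(0)
for balance in PROBE_BALANCE:
    block = exp2_probe_block(balance, favored="T", rng=rng)
    print(balance, target_prevalence(block, "T"), target_prevalence(block, "O"))
```

Under these assumptions, the 100/0 version shows the favored target on 32.5% of all trials and the other target on only 7.5%. The 50/50 version shows each target on 20% of trials.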