Abstract.
Purpose: Experienced radiologists have enhanced global processing ability relative to novices, allowing experts to rapidly detect medical abnormalities without performing an exhaustive search. However, evidence for global processing models is primarily limited to two-dimensional image interpretation, and it is unclear whether these findings generalize to volumetric images, which are widely used in clinical practice. We examined whether radiologists searching volumetric images use methods consistent with global processing models of expertise. In addition, we investigated whether search strategy (scanning/drilling) differs with experience level.
Approach: Fifty radiologists with a wide range of experience evaluated chest computed-tomography scans for lung nodules while their eye movements and scrolling behaviors were tracked. Multiple linear regressions were used to determine: (1) how search behaviors differed with years of experience and the number of chest CTs evaluated per week and (2) which search behaviors predicted better performance.
Results: Contrary to global processing models based on 2D images, experience was unrelated to measures of global processing (saccadic amplitude, coverage, time to first fixation, search time, and depth passes) in this task. Drilling behavior was associated with better accuracy than scanning behavior when controlling for observer experience. Greater image coverage was a strong predictor of task accuracy.
Conclusions: Global processing ability may play a relatively small role in volumetric image interpretation, where global scene statistics are not available to radiologists in a single glance. Rather, in volumetric images, it may be more important to engage in search strategies that support a more thorough search of the image.
Keywords: medical image perception, gist processing, expertise, scanners and drillers, lung cancer detection
1. Introduction
Identifying an abnormality in a medical image is a critical step toward patient diagnosis and treatment. However, medical image interpretation is a difficult task, and research spanning the past several decades has consistently revealed missed abnormality rates of .1 Given the challenge of this task, one might expect abnormality detection to involve an exhaustive search of the image until an abnormality is located. However, radiologists frequently report sensing an abnormality is present before it is actually located and identified in the image. Consistent with these anecdotal reports, radiologists detect most abnormalities within the first second of interpretation, which is much less time than it would take to complete an exhaustive search of the image.2–4 In addition, radiologists can discriminate between normal and cancerous cases at a rate well-above chance after viewing medical images for only a fraction of a second.5–8 These findings demonstrate that radiologists can extract a remarkable amount of information about a medical image in only a single glance. This phenomenon is referred to as “gist” or “global” processing, and these enhanced perceptual abilities are considered to be a key distinguishing characteristic between experts and novices in radiology.9–11
Although radiologists would never view medical images for only a fraction of a second in clinical practice—accuracy greatly improves with an unlimited viewing time12—these findings provide important insight on how the development of perceptual expertise influences naturalistic search behavior in radiology. Since the early 1970s, researchers have observed both qualitative and quantitative differences in search patterns across radiologists with different levels of experience.13 More experienced radiologists have lower image coverage, make fewer fixations, have larger saccadic amplitude, and fixate on abnormalities more quickly (i.e., shorter time to first fixation) than both naïve and novice observers.10 These findings suggest that experienced radiologists are able to rely more on the global properties of the image for attentional guidance than novices. These enhanced perceptual abilities appear to emerge before expert decision-making abilities develop and without any explicit instruction on search strategy.13,14
The differences in search behavior between experts and novices have led to a number of medical image perception models, each of which posits a major role for global processing in medical image interpretation.15–17 The most recent of these models proposes a two-component visual search process with a non-selective (global) pathway and a selective (local) pathway that operate in parallel.15 The non-selective pathway enables radiologists to rapidly extract the global statistical properties of an image. Although the non-selective pathway helps guide attention to perturbations in the image, detailed information about the abnormalities appears to be limited relative to the selective pathway.6,7 In contrast, the selective pathway is limited in processing capacity but provides fine-grained information that supports the recognition and localization of abnormalities during a more foveal search. This two-pathway model originates in the visual search literature,18 where evidence suggests that global summary statistics (e.g., mean size19 and orientation20 of objects, scene category,21 or direction of motion22) can be extracted from scenes in a single glance, whereas only a limited number of objects can be recognized simultaneously due to limits of object-based attention.23 Global processing ability in radiology is thought to involve the same cognitive mechanisms that allow laypeople to categorize familiar types of scenes after brief image presentations.15,24,25 Through experience, radiologists develop a strong mental representation of a normal medical image, resulting in greater sensitivity to the statistical irregularities associated with an abnormal image. Thus more experienced radiologists are able to rely more on the non-selective pathway than novices, resulting in a search that relies more on information extracted from the periphery than an exhaustive search of the image.
Despite the prominent role of global processing in all major medical image perception models, some caution is warranted on the generalizability of these findings. These models were established using a relatively limited set of tasks: lung cancer detection using chest radiographs and breast cancer detection in mammography. Meanwhile, advancements in medical imaging technology have dramatically changed the size and complexity of medical images over the past several decades. In particular, there has been a shift from two-dimensional (2D) medical images, such as radiographs, to volumetric images, such as computed tomography (CT) scans, that better preserve the underlying three-dimensional (3D) structure of the human body. Volumetric medical images make up an increasingly large portion of radiologists’ workload,26,27 but it remains unclear how global processing ability might manifest in these images, where the global statistical information is embedded in a navigable volume rather than being available to the observer in a single glance.28
Recent studies have evaluated global processing in these new modalities by showing observers videos of volumetric medical images that rapidly transition through the image slices at a fixed rate.29,30 In these studies, observers were able to reliably discriminate between normal and abnormal cases after rapid image presentations and discrimination ability increased with observer experience. Although these studies provide evidence that global processing may play a role in volumetric image interpretation, we do not yet know how experience influences naturalistic search behavior. If more experienced radiologists use a global search strategy, eye tracking metrics associated with experience in 2D medical images, such as reduced image coverage and shorter time to first fixation, should replicate in volumetric image interpretation tasks. However, a recent review paper found that very few of these expertise-related differences in search behavior have been examined in comparable tasks using volumetric images.28
In addition to differences in scan patterns, global processing ability might also change how the observer scrolls through the depth of volumetric images. The global statistical properties of volumetric images are embedded throughout multiple stacked slices. Therefore, forming a global impression must involve some type of interaction with scrolling behavior. For example, an observer might establish a global impression of the image by frequently scrolling through the full depth of the image volume.31 In a recent longitudinal study, radiology residents spent less time conducting “full runs” through the stack toward the end of their training, suggesting that global impressions of the image are established more efficiently with experience.32 Similarly, experts adapted to faster image presentation speeds more easily than novices, which might reflect a shift toward a more global search strategy with experience.33 However, other studies have not found any differences in performance between experts and novices at different image presentation speeds, and very few studies have addressed this question while allowing radiologists to freely scroll through the image stack as they would in clinical practice.34,35
Although global processing ability explains much of the variation between experts and novices in 2D image interpretation tasks, volumetric images introduce other aspects of search behavior that may help explain individual differences in performance. For example, two different strategies have been identified for searching through the depth of chest CT stacks during a lung cancer detection task: scanning and drilling.36 Scanners search broadly across each slice of the CT scan while slowly moving through the image slices. In contrast, drillers keep their eyes relatively fixed in a single region of the lung at a time while rapidly scrolling through the depth of the stack. When given a fixed time limit for each case (3 min), drillers detected more lung nodules and had greater image coverage than scanners. These differences in performance are attributed to the fact that lung cancer nodules appear to flicker in and out of view when the observer scrolls through the image slices, which helps the observer differentiate the nodules from other structures, such as blood vessels, that persist throughout many slices of the image.37
It is not yet clear if the benefits of drilling generalize to tasks beyond lung cancer detection.38,39 However, volumetric images clearly have unique properties that are important to consider in models of perceptual expertise. For example, lung nodules may appear to flicker in and out of view as the observer scrolls through the depth of a CT stack, which may mimic abrupt motion onset cues that are thought to involuntarily capture attention.40 Although there does not appear to be a standard practice for how to instruct radiologists to search through volumetric medical images, search strategy might develop organically with experience. For example, a wider useful field of view (UFOV) might allow more experienced radiologists to take advantage of motion onset cues elicited in the periphery when scrolling through depth. Alternatively, search strategy might be passed on informally from mentor to mentee during training, or radiologists may simply learn that one strategy is more effective than another and begin to adopt it over time. In the original scanner/driller study, drillers reported reading more CT images in an average week than scanners, but radiologists in each group had similar years of experience.36 Although this preliminary evidence that drillers had more regular experience with CT images is promising, that study was not designed to look at experience-related effects on search strategy, requiring more work to fully disentangle the effects of experience versus search strategy on task performance.
In sum, knowledge of how expert search behavior develops in volumetric image interpretation is currently a substantial gap in the medical image perception literature. Here we sought to help fill this gap by characterizing expert search behavior in a large sample of radiologists (N = 50) with a wide range of experience. In this study, radiologists evaluated chest CT scans for lung cancer nodules. Because lung cancer detection is one of the most well-researched tasks in the medical image perception literature, these findings can be more easily compared to the previous research. The first aim of this study was to determine whether behavioral and eye tracking measures associated with global processing ability in 2D images (accuracy, search time, image coverage, saccadic amplitude, and time to first fixation) replicate in volumetric medical images. Although the search behaviors associated with global processing ability in volumetric images are not yet well understood, the measures associated with global processing ability in 2D images serve as a useful starting point for understanding expert search behavior in volumetric tasks. In addition, we investigated how radiologists might establish a global impression of the image using novel measures of scrolling behavior (number of depth passes and scrolling speed). The second aim of this study was to determine how overall search strategy changes with experience. Specifically, the goals were to: (1) replicate previous findings that drilling is a better strategy than scanning for lung cancer detection and (2) disentangle the effects of experience from search strategy. Together, these analyses help determine whether existing models of medical image perception can account for expert search behavior in volumetric image interpretation, as well as how they might be updated to account for scrolling behavior in volumetric images.
2. Method
A separate analysis of this dataset has been published previously.41
2.1. Participants
Fifty-six radiologists were recruited from the National Cancer Institute’s Perception Lab at a Radiological Society of North America (RSNA) meeting; a hospital in Salt Lake City, UT, United States; and a hospital in Sydney, NSW, Australia. To meet the minimum experience level for eligibility in our study, participants were required to be in the first year of a radiology residency program or higher. Five radiologists were excluded from the study prior to participation due to unsuccessful eye tracking calibration, and data from one radiologist were excluded from the analysis due to equipment failure. The final sample consisted of 50 radiologists with a wide range of experience: 25 radiology residents (4 first year, 5 second year, 7 third year, and 9 fourth year), 1 fellow, and 24 attending or practicing radiologists.
Participants at RSNA were entered into a raffle for a chance to win a $500 Amazon gift card, participants in Salt Lake City were compensated with $50, and participants in Sydney volunteered their time. The study procedures were approved by the University of Utah Institutional Review Board and the Macquarie University Human Research Ethics Committee. All participants provided informed written consent and were debriefed following the study.
2.2. Procedure
Participants first completed a questionnaire regarding their level of experience, area of expertise, and demographic information. Next, observers performed a lung cancer detection task using seven axial chest CT scans (one practice and six experimental) viewed in a typical lung window and level. Half of the cases were normal (no lung nodules) and the other half were abnormal (at least one lung nodule). Participants were instructed to identify nodules in diameter by clicking on the nodule’s center of mass with the mouse. Case completion time was unrestricted and participants clicked on a box to move on to the next case. Participants could freely scroll back and forth through the slices of the CT scan using the mouse scroll wheel. On average, there were 148 slices in each CT stack. Following each case, radiologists rated the difficulty of the case from 1 (not at all difficult) to 6 (very difficult).
Participants were positioned on a chinrest in front of a 17-in. monitor. Eye movements were recorded using an Eyelink 1000 Plus at a sampling rate of 1000 Hz. Participants underwent a nine-point calibration procedure at the beginning of the study, and recalibrations were performed throughout the task as necessary. To reconstruct eye movements through the volumetric space, the observer’s current position in depth was co-registered with each eye tracking sample and processed offline using custom MATLAB scripts.
2.3. Materials
The abnormal cases contained 9, 11, and 23 nodules, respectively. Five of the six experimental cases were obtained from the Lung Image Database Consortium (LIDC) and the final case was obtained from clinical practice at the University of Utah School of Medicine.42 For the LIDC cases, ground truth was established by four thoracic radiologists who independently marked nodule locations prior to reviewing the anonymized marks of the other three radiologists and rendering a final decision. For the Utah case, author W.A. marked the nodule locations.
2.4. Analysis Plan
The study’s sample size, data exclusion criteria, and primary predictions and analyses were preregistered prior to data collection.43 Some preregistered analyses have not yet been conducted as they are beyond the scope of this particular paper (e.g., similarity score and pupillometry). As preregistered, years of experience since graduating medical school and the average number of chest CTs evaluated each week were entered as predictors into a multiple linear regression for each of the dependent measures. In addition, in preregistered analyses, nodule detection rate was regressed on image coverage, search strategy (i.e., scanning/drilling), and scrolling speed to determine which search behaviors predicted better performance. To control for the effects of experience, years of experience and the number of chest CTs read per week were added as predictors in each regression model. The remaining regression analyses were exploratory and not included in the preregistration. We also added a quartile comparison in which we compared the bottom and top quartile of each quantitative scanner/driller measure using a between-participants t-test to determine how these methods compared to the subjective method of classifying search strategy.
In addition to the preregistered analyses, Bayes factors (BF10) were calculated to assist in the interpretation of null results and to help identify analyses that might have been underpowered. For the linear regressions, we used a JZS prior with the default scale. A BF10 greater than 3 indicates sufficient evidence for the alternative relative to the null hypothesis, a BF10 less than 1/3 indicates sufficient evidence for the null relative to the alternative hypothesis, and a BF10 between these two values indicates that more evidence is needed for a strong conclusion.44 For each multiple linear regression model, Bayes factors are reported for each predictor variable individually as well as the full model.
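The interpretation rule for Bayes factors can be sketched in a few lines. This is a minimal illustration, assuming the conventional symmetric cutoffs of 3 and 1/3; the function name `interpret_bf10` and the returned labels are hypothetical and are not taken from the study's analysis scripts.

```python
def interpret_bf10(bf10, threshold=3.0):
    """Label a Bayes factor (alternative over null) using a symmetric cutoff.

    The default threshold of 3 (and its reciprocal, 1/3) is an assumed
    convention, not a value confirmed by the study's scripts.
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive")
    if bf10 > threshold:
        return "evidence for alternative"
    if bf10 < 1.0 / threshold:
        return "evidence for null"
    return "inconclusive"
```

Under these assumed cutoffs, a model Bayes factor of 4.59 would be read as evidence for the alternative, 0.24 as evidence for the null, and 0.55 as inconclusive.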
3. Results
3.1. Observer Experience
Participants (19 females and 31 males) reported reading an average of 41 chest CT scans per week and had an average of 12 years of radiology experience since graduating medical school. On average, radiologists were 41 years old. Of these radiologists, 27 (54%) reported they were American Board of Radiology certified or their country’s equivalent. Twenty (40%) radiologists reported expertise in thoracic imaging. The relationship between experience and each of the dependent measures of search behavior is shown in Table 1.
Table 1.
Measure | Mean | SD | Years p value | Chest CTs p value | Intercept | Years slope | Chest CTs slope | Model p value | R² | BF10 |
---|---|---|---|---|---|---|---|---|---|---|
Sensitivity | 58% | 19% | 0.56 | 0.26 | 57 | −0.001 | 0.001 | 0.40 | 0.04 | 0.24 |
False alarms | 3.4 | 2.4 | 0.21 | 0.17 | 3.44 | −0.04 | 0.009 | 0.14 | 0.08 | 0.55 |
Search time | 137.9 s | 61.7 s | 0.23 | 0.89 | 149.1 | −0.87 | −0.02 | 0.49 | 0.03 | 0.20 |
Coverage | 38% | 13% | 0.17 | 0.54 | 42 | −0.002 | −0.0002 | 0.36 | 0.04 | 0.26 |
Saccadic amplitude | 2.15 deg | 0.77 deg | 0.06 | 0.26 | 1.85 | 0.02 | 0.002 | 0.12 | 0.09 | 0.61 |
Time to first fixation | 567 ms | 596 ms | 0.24 | 0.61 | 501.4 | 8.25 | −0.87 | 0.40 | 0.04 | 0.24 |
Depth passes | 2.3 | 1.7 | 0.07 | 0.22 | 3.03 | −0.04 | −0.006 | 0.11 | 0.09 | 0.65 |
Scrolling speed | 6 | 2 | 0.004 | 0.18 | 7.17 | −0.07 | −0.007 | 0.01 | 0.18 | 4.59 |
Refixation rate | 39% | 11% | 0.047 | 0.27 | 44 | −0.003 | −0.0003 | 0.10 | 0.10 | 0.73 |
Nodule dwell time | 3065.8 ms | 1531.7 ms | 0.60 | 0.38 | 3136 | 9.33 | −3.84 | 0.55 | 0.03 | 0.19 |
Eye movement index | 0.41 | 0.24 | 0.06 | 0.32 | 0.32 | 0.005 | 0.001 | 0.13 | 0.29 | 0.60 |
Change score | 63.74 | 64.31 | 0.02 | 0.15 | 31.26 | 1.79 | 0.26 | 0.03 | 0.14 | 2.04 |
3.2. Task Performance
3.2.1. Accuracy
On average, radiologists reported 58% (SD = 19%) of the lung cancer nodules. Contrary to our prediction, neither years of experience (p = 0.56) nor the number of chest CTs read per week (p = 0.26) predicted nodule detection rate (model p = 0.40) [Fig. 1(a) and Table 1]. Next, we calculated false alarms as the average number of clicks per case that were not within 50 pixels of a true nodule. The average number of false alarms per case was 3.4 (SD = 2.4). The number of false alarms was not predicted by years of experience (p = 0.21) or the number of chest CTs read per week (p = 0.17; model p = 0.14) [Fig. 1(b) and Table 1]. However, the Bayes factors suggest more evidence is needed to make a conclusion about whether false alarms differ across experience levels.
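The false alarm rule can be illustrated with a short sketch. It assumes clicks and nodule centers are given as in-plane pixel coordinates and ignores position in depth, which the text does not specify; the function name `count_false_alarms` is hypothetical.

```python
import math

def count_false_alarms(clicks, nodules, radius_px=50):
    """Count clicks that are not within radius_px of any true nodule.

    clicks and nodules are lists of (x, y) pixel coordinates.
    Depth (slice) is ignored here, which is a simplifying assumption.
    """
    return sum(
        1
        for cx, cy in clicks
        if all(math.hypot(cx - nx, cy - ny) > radius_px for nx, ny in nodules)
    )
```

On a normal case (an empty nodule list), every click counts as a false alarm under this rule.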
3.2.2. Error classification
Using the eye tracking data, miss errors were classified into recognition, search, or decision errors by calculating the cumulative dwell time on the lung nodules.45 Recognition errors were defined as unreported nodules fixated for less than 1000 ms, decision errors were defined as unreported nodules fixated for more than 1000 ms, and search errors were defined as unreported nodules that were never fixated at all. In addition, we performed a non-preregistered analysis to classify search errors into two different types: (1) the slice containing the abnormality was visited but the nodule was never fixated and (2) the slice containing the abnormality was never visited. However, the second type of search error was only observed in 1 of the 50 radiologists, and therefore search errors were collapsed for all subsequent analyses.
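The dwell-time classification of missed nodules can be sketched as follows. This is an illustration only: the function name `classify_miss` is hypothetical, and treating a dwell time of exactly 1000 ms as a recognition error is an assumption about the boundary case.

```python
def classify_miss(dwell_ms, threshold_ms=1000):
    """Classify an unreported nodule by cumulative dwell time (ms).

    Never fixated -> search error; fixated beyond the threshold ->
    decision error; otherwise -> recognition error. The handling of
    a dwell time exactly equal to the threshold is an assumption.
    """
    if dwell_ms < 0:
        raise ValueError("Dwell time cannot be negative")
    if dwell_ms == 0:
        return "search"
    if dwell_ms > threshold_ms:
        return "decision"
    return "recognition"
```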
Contrary to our prediction, cumulative dwell time on correctly identified nodules (M = 3065.8 ms and SD = 1531.7 ms) did not significantly decrease with years of experience (p = 0.60) or the number of chest CTs read per week (p = 0.38; model p = 0.55; Table 1). In previous research using a lung cancer detection task with chest radiographs,45 the distribution of miss errors was 45% decision, 30% search, and 25% recognition errors. In this task, there were 51% recognition errors, 39% search errors, and 10% decision errors. One possible reason for the shift from decision to recognition errors in this dataset is that nodules and normal structures (e.g., blood vessels) might be less confusable with each other in volumetric medical images. However, this proposal will need to be tested in future work by directly comparing 2D and volumetric image search when controlling for other characteristics (e.g., abnormality size and location and case difficulty). Although we predicted the number of search errors would differ with experience, years of experience and the number of chest CTs read per week did not predict a greater proportion of any error type.
3.2.3. Search time
On average, radiologists spent 137.9 s (SD = 61.7 s) evaluating each case. Abnormal trials were searched significantly longer than normal trials. In 2D images, search time would be expected to decrease with experience due to an increased reliance on the global properties of the image. However, in this volumetric image interpretation task, search time did not decrease with years of experience (p = 0.23) or the number of chest CTs read per week (p = 0.89; model p = 0.49) [Fig. 1(c) and Table 1]. This pattern of results was the same for both normal and abnormal cases. Controlling for experience using multiple linear regression, spending more time on each case was a strong predictor of increased nodule detection rate (p < 0.001) [Fig. 2(a) and Table 2].
Table 2.
Measure | Slope | p value | R² | BF10 |
---|---|---|---|---|
Search time | 0.002 | <0.001 | 0.31 | 207.94 |
Coverage | 1.04 | <0.001 | 0.32 | 252.06 |
Saccadic amplitude | −0.14 | 0.02 | 0.14 | 3.39 |
Depth passes | 0.05 | 0.008 | 0.18 | 6.98 |
Scrolling speed | 0.04 | 0.90 | <0.001 | 0.46 |
Eye movement index | −0.29 | 0.01 | 0.17 | 5.54 |
Change score | −0.0001 | 0.77 | 0.04 | 0.47 |
3.3. Eye Movements
3.3.1. Image coverage
To calculate the percentage of lung tissue searched (i.e., image coverage), each slice of the CT scans was converted to a black (non-lung tissue) and white (lung tissue) mask. Using the eye tracking sample data, which consisted of the x, y, and z eye position coordinates sampled once every millisecond, we converted the pixels within a 2.6-deg diameter UFOV of each set of coordinates to black. None of the results reported here substantively differ if coverage is calculated using the fixation data instead of the eye tracking sample data. Although a 5-deg diameter UFOV is commonly used for lung nodule detection tasks using chest radiographs,46 previous research demonstrated a 2.6-deg diameter UFOV is more appropriate for lung nodule detection using chest CT scans.47 Image coverage was calculated as [1 − (the number of white pixels in the final image/the number of white pixels in the original image)].
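The coverage calculation can be illustrated with a minimal sketch. It assumes the lung mask is stored as a set of (x, y) pixels per slice, that the UFOV radius has already been converted from degrees to pixels (the conversion depends on viewing geometry not restated here), and that coverage is pooled over all slices of a case. The function name `image_coverage` and the data layout are hypothetical; the original analysis used custom MATLAB scripts.

```python
def image_coverage(lung_mask, samples, radius_px):
    """Proportion of lung pixels falling within the UFOV of any gaze sample.

    lung_mask: dict mapping slice index -> set of (x, y) lung ("white") pixels
    samples:   iterable of (x, y, z) gaze samples, z being the slice index
    radius_px: UFOV radius in pixels (assumed to be precomputed)
    """
    total = sum(len(pixels) for pixels in lung_mask.values())
    if total == 0:
        raise ValueError("Lung mask contains no lung pixels")
    r = int(radius_px)
    # Integer offsets that fall inside a disc of the given radius.
    offsets = [
        (dx, dy)
        for dx in range(-r, r + 1)
        for dy in range(-r, r + 1)
        if dx * dx + dy * dy <= radius_px ** 2
    ]
    # Copy the mask so the caller's data is not modified.
    remaining = {z: set(pixels) for z, pixels in lung_mask.items()}
    for x, y, z in samples:
        if z not in remaining:
            continue  # sample on a slice without lung tissue
        cx, cy = round(x), round(y)
        for dx, dy in offsets:
            remaining[z].discard((cx + dx, cy + dy))  # turn pixel "black"
    left = sum(len(pixels) for pixels in remaining.values())
    return 1 - left / total
```

The return value corresponds to 1 − (remaining white pixels / original white pixels), as in the formula above.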
Consistent with previous research using volumetric medical images, overall image coverage was quite low.36,38,48,49 On average, only 38% (SD = 13%) of the total area of the CT scans was searched within a 2.6-deg diameter UFOV. We predicted that image coverage would decrease with observer experience, indicating an ability to rely more on information extracted from the periphery rather than a systematic search. Contrary to this prediction, image coverage did not decrease with years of experience (p = 0.17) or the number of chest CTs evaluated each week (p = 0.54; model p = 0.36) [Fig. 3(a) and Table 1]. Controlling for experience using multiple linear regression, searching the images more thoroughly strongly predicted increased nodule detection rate (p < 0.001) [Fig. 2(b) and Table 2].
3.3.2. Saccadic amplitude
A larger saccadic amplitude (i.e., the average distance between consecutive fixations expressed in degrees of visual angle) is thought to reflect a more global search strategy and was expected to increase with observer experience.10 On average, saccadic amplitude was 2.15 deg (SD = 0.77 deg). Contrary to our prediction, saccadic amplitude did not significantly increase with years of experience (p = 0.06) or the average number of chest CTs read per week (p = 0.26; model p = 0.12) [Fig. 3(b) and Table 1]. However, the Bayes factors suggest more evidence is needed before a strong conclusion can be made about the relationship between experience and saccadic amplitude. Controlling for experience using multiple linear regression, having a smaller saccadic amplitude predicted a higher nodule detection rate (p = 0.02; Table 2).
3.3.3. Time to first fixation
Time to first fixation on a detected abnormality in 2D medical images is thought to reflect a more global search strategy and typically decreases with experience.10 To adapt this measure to volumetric images, we calculated time to first fixation relative to the moment the abnormality first became visible when scrolling through the slices.50 If the nodule was not detected the first time it became visible (i.e., the radiologist moved to another position in depth without clicking on the nodule), time to first fixation was calculated relative to the moment the abnormality first reappeared prior to detection.
On average, radiologists took 567 ms (SD = 596 ms) to fixate on the nodules from the moment they first became visible. Contrary to our prediction, time to first fixation did not decrease with years of experience (p = 0.24) or the number of chest CTs read per week (p = 0.61; model p = 0.40) [Fig. 3(c) and Table 1]. Upon visual inspection [Fig. 3(c)], it became apparent that one participant was an outlier. However, the outcome of the multiple linear regression does not change if this outlier is removed.
3.3.4. Refixation rate
Refixation rate was calculated as the proportion of total fixations that were within UFOV (2.6 deg) of a previous fixation (i.e., proportion of fixations that were refixations).51 We predicted that more experienced radiologists would use more systematic search strategies to navigate through the image, resulting in fewer refixations.
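A minimal sketch of the refixation rate follows. It assumes fixations are given in temporal order as in-plane (x, y) coordinates, that position in depth is ignored, and that the distance criterion `max_dist` is supplied in the same units as the coordinates; whether the criterion corresponds to the UFOV diameter or radius is left to the caller. The function name `refixation_rate` is hypothetical.

```python
import math

def refixation_rate(fixations, max_dist):
    """Proportion of fixations landing within max_dist of any earlier fixation.

    fixations: list of (x, y) positions in temporal order
    max_dist:  distance criterion (same units as the coordinates)
    Depth is ignored here, which is a simplifying assumption.
    """
    if not fixations:
        raise ValueError("At least one fixation is required")
    refixations = 0
    for i, (x, y) in enumerate(fixations):
        earlier = fixations[:i]
        if any(math.hypot(x - px, y - py) <= max_dist for px, py in earlier):
            refixations += 1
    return refixations / len(fixations)
```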
On average, 39% (SD = 11%) of fixations were refixations. In partial support for our hypothesis, refixation rate decreased with years of experience (p = 0.047), but not the number of chest CTs read per week (p = 0.27; model p = 0.10; Table 1). However, the Bayes factors indicate that more evidence is needed to make a strong conclusion about the relationship between refixations and observer experience. Controlling for experience using multiple linear regression, higher refixation rates predicted better nodule detection performance. This result suggests that observers with higher refixation rates may benefit from additional opportunities to detect nodules that might have been missed during the first opportunity for detection. Consistent with this proposal, refixation rate was strongly correlated with search time, and higher refixation rates were no longer significantly associated with better performance when controlling for search time.
3.4. Scrolling Behavior
3.4.1. Depth passes
The number of passes through the depth of the CT scan has been proposed as a metric of global processing ability in volumetric images.31 If experienced observers rely more on a global search strategy, they may make more passes through the depth of the image in order to establish a global impression of the image. Alternatively, if more experienced observers are able to extract the global properties of the image more easily, they might be able to maintain high performance despite making fewer passes through the depth of the image. The number of passes through depth was defined as the number of times the radiologist scrolled through at least 80% of the depth of the full stack.
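One way to operationalize this definition is sketched below. It assumes a depth pass is a run of scrolling in a single direction whose extent is at least 80% of the number of slices in the stack; how brief reversals within a pass were handled in the original analysis is not stated, so this rule is an assumption. The function name `count_depth_passes` is hypothetical.

```python
def count_depth_passes(slices, n_slices, min_fraction=0.8):
    """Count single-direction scrolling runs spanning >= min_fraction of the stack.

    slices:   slice index at each time sample, in temporal order
    n_slices: total number of slices in the stack
    Treating any reversal as the end of a run is an assumption.
    """
    if not slices:
        return 0
    needed = min_fraction * n_slices
    passes = 0
    run_start = prev = slices[0]
    direction = 0  # +1 scrolling down the stack, -1 up, 0 not yet moving
    for cur in slices[1:]:
        step = (cur > prev) - (cur < prev)
        if step != 0:
            if direction != 0 and step != direction:
                # Direction reversed: close the current run.
                if abs(prev - run_start) >= needed:
                    passes += 1
                run_start = prev
            direction = step
        prev = cur
    # Close the final run.
    if abs(prev - run_start) >= needed:
        passes += 1
    return passes
```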
On average, radiologists made 2.3 (SD = 1.7) depth passes. Contrary to our prediction, the number of passes through depth was not significantly related to years of experience (p = 0.07) or the number of chest CTs read per week (p = 0.22; model p = 0.11) [Fig. 4(a) and Table 1]. However, Bayes factors suggest that more evidence is needed before making a strong conclusion about the relationship between the number of passes through depth and experience. Controlling for experience using multiple linear regression, making more passes through depth predicted increased nodule detection rate (p = 0.008) [Fig. 2(c) and Table 2].
3.4.2. Scrolling speed
On average, scrolling speed was 6 (SD = 2) slices per second. We predicted that more experienced observers would scroll through the stack more quickly than less experienced radiologists. However, contrary to this prediction, scrolling speed significantly decreased with years of experience (p = 0.004), but not the number of chest CTs read per week (p = 0.18; model p = 0.01) [Fig. 4(b) and Table 1]. Controlling for experience using multiple linear regression, scrolling speed did not predict differences in nodule detection rate (p = 0.90; Table 2).
3.5. Scanners and Drillers
Radiologists were first tentatively divided into scanners and drillers by analyzing the depth by time plots for each participant following the subjective method used in the previous studies [Fig. 5(a)].36,39 First, the observer’s position in depth was plotted on the y axis and time was plotted on the x axis. Next, each quadrant of the image was assigned a different color. At each time point, the observer’s eye position on the 2D plane was reduced to a single dimension by plotting each point in the color assigned to that quadrant. Using the depth by time plots, the lead author then made a subjective decision about whether each radiologist was a driller or a scanner according to the descriptions of search strategy outlined by Drew et al. (2013). Qualitatively, driller plots are characterized by spending prolonged time in one region of the lung (typically one quadrant or lobe) at a time while rapidly scrolling through the slices. In contrast, scanners search broadly across the 2D plane while slowly moving through the depth of the CT scan [Fig. 5(a)]. Although depth by time plots can reveal qualitative differences in search strategy, it is unclear how to best capture these differences in search behavior quantitatively. Here we compared two quantitative measures that have been used in the previous research: the eye movement index36,39 and the change score.38
In the original scanner/driller study, the authors’ subjective categorizations of search strategy were then tested using the eye movement index.36 On average, scanners should have larger saccadic amplitude and make fewer consecutive fixations in the same quadrant of the lung (i.e., fixation clusters) than drillers. Therefore, if mean saccadic amplitude is plotted against the average number of fixation clusters per second, scanners tend to cluster in the top-right of the figure [Fig. 5(b)]. These measures can then be combined into a single metric by normalizing each score from 0 to 1 and adding the two measures together [Fig. 5(b)].
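The combination step described above (normalize each measure from 0 to 1, then add) can be sketched as follows. The text does not state how the normalization was done, so min-max scaling across observers is an assumption here, as are the function names and the example values.

```python
def min_max(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All observers identical on this measure: no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def eye_movement_index(saccadic_amplitude, clusters_per_second):
    """Sum of the two normalized measures, one value per observer (range 0 to 2)."""
    amp = min_max(saccadic_amplitude)
    clu = min_max(clusters_per_second)
    return [a + c for a, c in zip(amp, clu)]


# Illustrative values only (not data from the study), one entry per observer.
amplitude = [2.0, 4.0, 6.0]
clusters = [0.5, 1.0, 1.5]
emi = eye_movement_index(amplitude, clusters)
```

Because both inputs are rescaled across the sample, an index computed this way is relative to the observers in a given dataset and is not directly comparable across studies.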
The eye movement index can help distinguish between scanners and drillers,36 but this metric does not directly take the observer’s movement through depth into account. If drilling is associated with better performance because it enables radiologists to take advantage of abrupt motion onset cues while scrolling through depth, this may be an important aspect of search behavior to quantify. To account for this possibility, scanning and drilling behavior has also been conceptualized as the [summed change in XY (i.e., scan path length)/the maximum change in Z] averaged across 5-s intervals.38 Within a set time period, drillers make more movements in Z than in XY compared to scanners, resulting in smaller change in XY/Z scores [Fig. 5(c)]. Another promising approach is to classify scanners and drillers based on the number of direction changes that occur during each case.52 However, this measure requires a fixed time limit for each CT scan, so we were not able to use this categorization method for the current dataset.
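A minimal sketch of the change in XY/Z score (the name used in the heading of Sec. 3.5.3) is given below. Several details are assumptions rather than the published procedure: the maximum change in depth is taken as the range of slice positions within each 5-s window, windows with no movement through depth are skipped to avoid division by zero, and the path segment that spans two windows is not counted.

```python
import math


def xy_z_score(samples, window=5.0):
    """Mean over time windows of (summed XY path length / range of Z).

    samples: list of (time_s, x, y, slice_index), sorted by time.
    Returns None if no window contains movement through depth.
    """
    if not samples:
        return None
    start = samples[0][0]
    bins = {}
    for s in samples:
        bins.setdefault(int((s[0] - start) // window), []).append(s)

    ratios = []
    for _, pts in sorted(bins.items()):
        # Scan path length: summed distance between consecutive gaze samples.
        path = sum(
            math.hypot(b[1] - a[1], b[2] - a[2])
            for a, b in zip(pts, pts[1:])
        )
        z_values = [p[3] for p in pts]
        z_range = max(z_values) - min(z_values)
        if z_range > 0:
            ratios.append(path / z_range)
    return sum(ratios) / len(ratios) if ratios else None


# Illustrative samples only (not data from the study).
samples = [
    (0.0, 0, 0, 1), (1.0, 3, 4, 2), (2.0, 3, 4, 5),  # first 5-s window
    (5.0, 0, 0, 5), (6.0, 6, 8, 7),                  # second 5-s window
]
score = xy_z_score(samples)
```

In this example the first window has a path length of 5 over a depth range of 4, and the second a path length of 10 over a depth range of 2, so the score is the mean of 1.25 and 5.0. Lower values indicate relatively more movement through depth, which is the drilling end of the measure.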
Both the eye movement index and change in XY/Z scores have been used in the previous research, but there is no consensus on which best captures the qualitative differences in search strategy observed in depth by time plots. Although there is some overlap in these measures, an observer can still score relatively high on one and relatively low on the other, suggesting they tap into distinct aspects of search behavior (Fig. 5).36,38 Furthermore, it is unclear if search strategy is dichotomous (e.g., scanners versus drillers), or whether it is more appropriate to consider continuous changes in these measures (e.g., more drilling versus less drilling behavior). Here we used the eye movement index and change in XY/Z scores as continuous predictors for each of the dependent variables using linear regression. In addition, we also divided radiologists into groups based on quartile rankings and compared these results to the subjective categorization method described above. The subjective categorization method and the change in XY/Z score regression analyses were preregistered,43 but the eye movement index and quartile analyses were exploratory.
3.5.1. Subjective categorization method
Using the subjective categorization method [Fig. 5(a)], 30% of radiologists were categorized as scanners and 70% were categorized as drillers. We first present the results using this separation and then examine the degree to which the different objective methods of quantifying search strategy impact the results.
Controlling for experience using multiple linear regression, drillers ( and ) detected more of the lung nodules than scanners ( and ), , , [Fig. 6(a)]. Drillers (, ) also made more false alarms per case than scanners (, ), , , , but it is possible that some true nodules may be unmarked in the LIDC database (see also Ref. 15) so we do not want to over-emphasize false alarms. Scanners (, ) made significantly more search errors than drillers (, ), , , , whereas drillers (, ) made significantly more recognition errors than scanners (, ), , , . There were no significant differences between scanners (, ) and drillers (, ) on decision errors, , , .
These large differences in hit rate between the search strategies were not associated with differences in years of experience, , , , nor the number of chest CTs read per week, , , [Fig. 6(a)]. What did seem to drive the improved hit rate was that drillers spent more time evaluating each case, , , , searched the images more thoroughly, , , , and made more passes through depth, , , , than scanners.
Using the subjective categorization, we then examined the eye movement index and change in XY/Z scores for the two groups. On average, scanners (, ) had a larger eye movement index than drillers (, ), , , ; and scanners (, ) had a larger change in XY/Z score than drillers (, ), , , .
We then examined the effect of using quantitative categorizations, repeating the above analyses with the eye movement index and change in XY/Z scores used (1) as continuous predictors of performance and (2) to classify radiologists into distinct groups of scanners and drillers using the top and bottom quartiles, respectively.
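The quartile split can be sketched as below. The exact rule used to form the quartiles is not given in the text, so taking the lowest and highest quarter of observers by rank, with ties broken by sort order, is an assumption, as are the function name and the example scores.

```python
def quartile_groups(scores):
    """Split observers into bottom and top quartiles of a score.

    scores: dict mapping observer id -> score (e.g., eye movement index).
    Returns (drillers, scanners): the lowest and highest quarter of
    observers by rank. Ties are broken by sort order (an assumption).
    """
    ranked = sorted(scores, key=scores.get)
    k = len(ranked) // 4
    return ranked[:k], ranked[len(ranked) - k:]


# Illustrative scores only (not data from the study).
scores = {"r1": 0.2, "r2": 1.7, "r3": 0.9, "r4": 1.1,
          "r5": 0.4, "r6": 1.5, "r7": 0.7, "r8": 1.3}
drillers, scanners = quartile_groups(scores)
```

With 50 observers, this rule yields 12 observers in each extreme group, which matches the group sizes reported for the quartile comparisons in this section.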
3.5.2. Eye movement index
First, we used the eye movement index [Fig. 5(b)] as a continuous predictor of performance in a linear regression analysis. Controlling for experience using multiple linear regression, having a smaller eye movement index (drilling) was associated with better nodule-detection rates than having a large eye movement index (scanning), , , , [Fig. 7(a) and Table 2]. Next, we sought to determine whether these measures could be used to establish an objective classification system by dividing radiologists into scanners and drillers using the top and bottom quartiles, respectively. Using this method, 12/12 radiologists in the bottom quartile matched our subjective “drilling” classification, and 11/12 radiologists in the top quartile matched our “scanning” classification [Fig. 5(b)]. Comparing the performance of these two quartile groups on nodule detection, the drillers (bottom quartile) detected 70% () of the nodules, on average, whereas the scanners (top quartile) detected only 50% () of the nodules, , , [Fig. 6(b)]. The distribution of error type follows the same pattern as the subjectively categorized results: scanners (, ) made significantly more search errors than drillers (, ), , , , whereas drillers (, ) made significantly more recognition errors than scanners (, ), , , . There were no significant differences between scanners (, ) and drillers (, ) on decision errors, , , .
Neither years of experience, , , (although note the insufficient evidence here), nor the number of chest CTs per week, , , , predicted the eye movement index, , [Fig. 7(a) and Table 1]. The bottom (, ) and top (, ) quartiles did not significantly differ in years of experience, , , [Fig. 6(b)], but the Bayes factor indicates that more evidence is needed to make a strong conclusion. Similarly, the bottom (, ) and top (, ) quartiles did not differ in the number of chest CTs read per week, , , [Fig. 6(b)].
Using the eye movement index as a continuous measure, we found that drilling was associated with longer search times, , , , greater image coverage, , , , and more depth passes, , , . As seen in the subjective classification method, the quartile analysis revealed that drillers (bottom quartile) spent more time evaluating each case, , , , searched the images more thoroughly, , , , and made more passes through depth, , , , than scanners [top quartile, Fig. 8(a)].
3.5.3. Change in XY/Z score
Next, we used the change in XY/Z scores as our key variable [Fig. 5(c)].38 Controlling for experience using multiple linear regression, change in XY/Z scores did not significantly predict nodule detection rate, , , , [Fig. 7(b) and Table 2]. Using the quartile method, 12/12 radiologists in the bottom quartile matched our subjective “drilling” classification, and 9/12 radiologists in the top quartile matched our “scanning” classification [Fig. 5(c)]. Drillers detected 65% () of the nodules, whereas scanners detected 56% () of the nodules. These differences were not statistically significant, , , [Fig. 6(c)], but the Bayes factors indicate there is insufficient evidence to interpret these null findings. For the change in XY/Z score, there were no significant differences in the type of miss errors between scanners and drillers, all .
Radiologists with a larger change in XY/Z score (scanners) tended to have more years of experience, , , , but there was no relationship between change in XY/Z score and the number of chest CTs read per week, , , ; , [Fig. 7(b) and Table 1]. Drillers (, ) had fewer years of experience than scanners (, ), , , , but the bottom (, ) and top (, ) quartiles did not differ in the number of chest CTs read per week, , , [Fig. 6(c)].
Using change in XY/Z scores as a continuous measure, drilling was associated with greater image coverage, , , , and more depth passes, , , but was not significantly related to search time, , , . Similarly, in the quartile analysis, drillers had longer search times, , , , greater image coverage, , , , and more depth passes, , , , than scanners [Fig. 8(b)].
3.5.4. Results summary
Across the three methods of classifying search behavior, our results largely replicate previous findings that drilling is a better strategy for lung nodule detection than scanning when controlling for the effects of experience. Both the subjective categorization method and the eye movement index revealed greater nodule detection for drilling than scanning. The change in XY/Z score did not significantly predict performance; however, the Bayes factors indicate that these analyses are not interpretable with this sample size. Critically, this study expands on the previous research by examining whether differences in experience level between the two groups can account for differences in performance. On average, drillers tended to have less experience than scanners [Fig. 7(b)], which is inconsistent with the idea that radiologists learn to adopt better search strategies with experience. However, these data should not be interpreted as evidence that more experienced observers are worse at the task overall. We do not see any evidence for a negative relationship between experience and detection rate in our dataset [Fig. 1(a)], and there are many additional factors that may explain variation in task performance beyond search strategy. Rather, these results demonstrate that drilling behavior predicts better performance above and beyond the effects of experience. Drillers may have performed better on the task because they engaged in a more systematic search of the images: regardless of how we classified the radiologists, drilling was associated with greater image coverage, making more passes through depth, and spending more time on each case.
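The phrase "controlling for experience using multiple linear regression" can be illustrated with a small ordinary least squares sketch. This is not the authors' analysis pipeline, which is not described in this section: the solver, the variable coding (driller as 0 or 1), and the synthetic data are assumptions chosen so that the example runs with the Python standard library alone.

```python
def ols(rows, y):
    """Ordinary least squares via the normal equations.

    rows: list of predictor lists (without the intercept column).
    Returns coefficients [intercept, b1, b2, ...]. Raises
    ZeroDivisionError if the predictors are perfectly collinear.
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Build the augmented system (X'X | X'y).
    A = [
        [sum(xi[i] * xi[j] for xi in X) for j in range(p)]
        + [sum(xi[i] * yi for xi, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic data only (not from the study): each row is
# [driller (0/1), years of experience, chest CTs per week].
rows = [[0, 2, 10], [1, 5, 20], [0, 12, 5],
        [1, 20, 15], [0, 8, 30], [1, 3, 25]]
# Detection rate constructed as an exact linear function of the predictors.
y = [0.5 + 0.15 * d + 0.002 * yrs - 0.001 * cts for d, yrs, cts in rows]
intercept, b_driller, b_years, b_cts = ols(rows, y)
```

The coefficient on the strategy predictor (`b_driller` here) estimates the difference in detection rate between strategies with both experience measures held constant, which is the sense in which strategy predicts performance above and beyond experience.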
4. Discussion
In this study, we examined how naturalistic search behavior differed across radiologists with varying levels of experience during lung cancer detection with volumetric images. This research makes two primary contributions to the literature. First, contrary to predictions based on findings from studies using 2D medical images, we did not find evidence in support of global processing-related changes in search behavior with experience. Importantly, we demonstrate evidence for the null using Bayes analyses. Null results were consistent across a number of measures that have been closely associated with expertise in 2D medical image interpretation (search time, image coverage, saccadic amplitude, and time to first fixation) as well as novel measures of scrolling behavior (depth passes and scrolling speed) that have been proposed as potential indices of expertise in volumetric image interpretation.31,33 Second, we identified several strong predictors of individual differences in task performance for lung cancer detection. Although experts tend to have better performance than novices in 2D interpretation tasks despite lower image coverage, we found that performance in our volumetric task was closely related to how many opportunities there were for abnormality detection. Specifically, better performance was predicted by spending more time on each case, searching the images more thoroughly, and making more passes through the depth of the CT scan. Observers who adopted a drilling search strategy detected more of the lung cancer nodules than scanners, which may be due to differences in how systematically the images were searched. Critically, these performance differences do not appear to be driven by differences in experience level. Drilling remained a significant predictor of performance when controlling for differences in experience, and there was limited evidence that drillers actually had fewer years of experience than scanners. Together, these findings have important implications for current models of perceptual expertise and may provide insight on how to train radiologists to evaluate volumetric images.
Although this research suggests a smaller role for global processing in volumetric image interpretation than in 2D images, these results need to be reconciled with recent reports that radiologists can reliably classify volumetric images as normal or abnormal after brief video presentations.29,30 The current study used lung cancer detection rather than the breast cancer and prostate cancer detection tasks used in the previous studies, suggesting differences in stimulus characteristics (e.g., abnormality size) might account for the different findings. In addition, the type of signal that supports abnormality detection in gist processing studies could be quite different in volumetric images, where a global “snapshot” of the image is not present. Instead, the abrupt motion onset cues elicited by abnormalities in the periphery as the videos transition through depth might be the key driver for performance, rather than sensitivity to global scene statistics per se. In future research, it may be fruitful to determine how different abnormality characteristics, such as their ability to elicit motion onset cues, relate to performance in gist processing studies. In previous flash-viewing studies, radiologists were able to detect cancerous “signals” in the breast opposite to the lesion, as well as images taken years before the development of a detectable mass.53,54 If the presence of a mass is also unnecessary for gist processing in volumetric images, it would suggest that the outcome of previous gist processing studies did not depend solely on motion onset cues that may have been generated by the abnormalities when the videos transitioned through depth.
A plausible explanation for global processing playing a smaller role in volumetric than in 2D image interpretation is that the global statistical properties of the image cannot be extracted in a single glance and must instead be acquired as the observer scrolls through the depth of the stack. If the gist of the image is not readily available, it might then become more important to rely on a more systematic, foveal search through the image, which is a characteristic of drilling behavior. Consistent with this interpretation, many of our current results show the opposite relationship between scan patterns and task performance than would be expected under global processing models. Specifically, nodule detection rate was strongly predicted by how thoroughly the images were searched, suggesting less information can be extracted from the periphery during volumetric image interpretation. Notably, this is consistent with the recent work demonstrating that UFOV is lower for lung cancer detection in volumetric medical images than UFOV estimates established for the same task using chest radiographs.47
Searching medical images more thoroughly might be particularly important in volumetric images due to their large size (when taking depth into account). Because volumetric images consist of hundreds of stacked 2D images,26,28 abnormalities represent a smaller fraction of the total image size and are only visible for a brief period of the overall search time.47 In addition, image coverage tends to be much lower in volumetric medical images than in 2D medical images, and recent work suggests using volumetric images may not be beneficial if abnormalities cannot be readily detected using peripheral vision.38,48,49 Image size is a particularly important consideration in light of findings from the visual search literature that demonstrate that memory for previously searched locations is very limited. At best, observers are able to remember their most recent 3 to 4 fixations or have a rough representation of their general scan path.55–58 At worst, observers are no better than chance at distinguishing their own scan patterns from someone else’s following a visual search task.59–61 Finally, memory appears to be impaired following brief interruptions in volumetric image interpretation tasks,51,62 suggesting memory representations might be easily disrupted in stacked images. Thus previous findings suggest that it might be particularly difficult to maintain a reliable representation of which regions of the image have already been searched in volumetric images, and the current work suggests that optimally deciding when to terminate search may be a strong predictor of individual differences in performance.
Although this discussion highlights the potential costs of searching through stacked images, this should not be considered criticism of using volumetric images in radiology. Volumetric medical images are associated with better overall diagnostic accuracy across a wide range of clinical tasks and allow radiologists to more easily envision the underlying 3D nature of anatomical structures and abnormalities.37,38,48,63,64 The current results demonstrate there may be an optimal strategy for evaluating volumetric medical images in clinical practice. In the previous research, rapidly drilling through the slices appeared to be a better strategy for lung nodule detection than scanning the 2D plane while slowly moving through depth in a time limited study (3 min per case).36 Here we replicated these findings while allowing observers an unlimited amount of time to evaluate each case, demonstrating that it is not simply that scanning is a less efficient method, but that it actually does lead to more miss errors. Most importantly, our findings demonstrate these effects were not driven by differences in level of experience: scanners and drillers had a similar number of years of experience and chest CTs evaluated per week. Moreover, drilling behavior remained a significant predictor of task performance when controlling for observer experience. Thus it may be beneficial for radiologists to adopt a drilling strategy when evaluating chest CT images for lung nodules.
Although the benefits of drilling may be due to the ability to elicit abrupt motion onset cues when rapidly scrolling through depth,37 there are other potential explanations for the observed differences in performance. For example, using this dataset alone, we cannot rule out that the drillers in our study might have been more conscientious or motivated than scanners, on average, resulting in both better performance and a more thorough search of the images. However, the previous research found that teaching radiology residents to use a drilling strategy improved task performance, which suggests the benefits of drilling cannot solely be a result of group-level differences between observers.65 Alternatively, drilling might be a more effective strategy because it reflects a more systematic approach to searching through volumetric images. Drillers tend to search through one lobe or quadrant of the lung before moving on to the next one, which appears to result in a more thorough search of the image. In contrast, scanners’ search patterns appear to have little organizational structure when the Z dimension is collapsed.36 Given the large size of image stacks relative to a single 2D image, engaging in any systematic search strategy that reduces memory load and improves image coverage might lead to better performance. In support of this proposal, making smaller eye movements on the 2D plane and searching in one quadrant of the lung at a time (i.e., saccadic amplitude and eye movement index) predicted better task performance, whereas measures of the rate of movement through depth (i.e., scrolling speed and change in XY/Z) did not. Although caution is warranted when interpreting null results with inconclusive Bayes factors (e.g., change in XY/Z), this pattern of results would be unexpected if drilling was a better strategy primarily because of abrupt motion onset cues.
This study has a number of limitations. First, although this study had a larger sample size than similar studies in the medical image perception literature (on average: 7.73 experts, 5.60 intermediates, and 8.36 novices per study10), the sample size was still smaller than ideal for an individual differences study due to the inherent difficulty of collecting large samples of expert radiologists. To address this concern, Bayes factors have been included for each analysis to help distinguish between results with sufficient power and those that will require more evidence. Many of these analyses reached the threshold for sufficient evidence in favor of either the null or alternative hypothesis44 (Tables 1 and 2). However, some of the critical analyses (e.g., the relationship between experience and saccadic amplitude) will require follow-up studies with larger sample sizes to make a strong conclusion. At minimum, given the relatively large sample size of this study, these results suggest experience-related changes in search behavior are likely much smaller in volumetric images than the effects observed in the previous studies with 2D medical images. In addition, the “ground truth” for abnormality presence or absence is difficult to establish when using real medical images. Although the LIDC database includes every nodule that was marked by at least one of the expert observers rather than expert consensus, it is still possible that some of the less conspicuous nodules were missed by all of the expert observers.66 In future research, it may be fruitful to replicate these findings using simulated nodules or to use a method of analysis that does not require an independent assessment of ground truth.67,68 Finally, in future studies, it may be beneficial to investigate expert search behavior in a more clinically valid context. For example, this study had a relatively high abnormality prevalence rate, which may have increased observer fatigue or shifted the observers’ decision criterion for marking an abnormality.
This study largely replicated previous findings that drilling is a better strategy for lung nodule detection than scanning when controlling for the effects of experience.36 However, there are significant challenges in how to classify radiologists as scanners or drillers. In this study, like others in the literature, we initially divided the radiologists into groups by analyzing their depth by time plots and subjectively categorizing them based on the scan patterns [Fig. 5(a)].36,39 There are clear and significant limitations to using a subjective approach for categorizing search strategy, but there is currently no consensus on how to best capture search strategy using quantitative metrics. In the original scanner/driller study, the subjective categorization method closely matched each observer’s eye movement index30 (see also Ref. 33). Here, using quartiles to objectively divide radiologists into groups based on this metric also closely (but imperfectly) matched the subjective groups [Fig. 5(b)] and independently predicted task performance as both a dichotomous [quartile analysis, Fig. 6(b)] and continuous [regression analysis, Fig. 7(a)] variable. Change in XY/Z score also roughly matched the subjective categorization of search strategy but did not match the groups as well as the eye movement index [Fig. 5(c)].38 Unlike the eye movement index, change in XY/Z did not independently predict performance in this task [Figs. 6(c) and 7(b)], suggesting these metrics may reflect different aspects of search behavior. This study is the first to compare all three methods of characterizing drilling behavior in relation to task performance, and these results suggest the eye movement index might be a suitable alternative to subjective categorization methods. In addition, future studies might use a data-driven approach (e.g., principal components analysis) or instruct observers to use a particular search strategy to determine which eye tracking metrics are able to best classify the groups. It was also previously unclear if it is more appropriate to use quantitative measures to separate radiologists into distinct categories (i.e., the quartile analysis) or as continuous predictors of search behavior (i.e., the regression analysis). Here the quartile analysis and the regression analysis showed the same pattern of results, suggesting it may be unnecessary to divide radiologists into distinct groups. Finally, we still need to establish the reliability of search strategy within an observer and between tasks to determine whether search strategy is internal to a radiologist or unique to specific task circumstances.
The current findings suggest that global processing plays a lesser role in volumetric image interpretation than in analogous 2D tasks, but alternative accounts for these results should be considered. Here we quantified experience in terms of both the overall number of years spent practicing radiology (i.e., years of experience) and the degree of routine experience with the task (i.e., chest CTs per week). However, the number of chest CTs read per week did not relate to any of our key variables, and it is unclear whether self-reported estimates of task experience are reliable. Critically, however, none of our results substantively differ if only years of experience are included in the regression models. In addition, the eye tracking and behavioral measures associated with global processing ability in 2D medical image interpretation may not tap into the same cognitive process in volumetric image interpretation. For example, saccadic amplitude is highly confounded with search strategy. By definition, scanners have a larger saccadic amplitude than drillers because they engage in larger, sweeping eye movements across the 2D plane. Similarly, it is unclear how to best classify miss errors in volumetric images. Although we categorized search errors in volumetric medical images in the same way as previous studies using 2D images, it is debatable whether the 1000-ms threshold used for distinguishing between recognition and decision errors in 2D images is appropriate for volumetric data. Prolonged nodule fixations might be less common for dynamic stimuli than 2D images, which could artificially inflate the number of recognition errors in volumetric images. In future research, it would be beneficial to use a data-driven approach with a larger stimulus set in order to identify an appropriate threshold.69 Finally, it is unclear how to best address the inherent differences in the time to first fixation measure between 2D and volumetric images. In volumetric images, radiologists have multiple opportunities to detect an abnormality when scrolling back and forth through depth. As a result, we calculated time to first fixation relative to the moment the abnormality becomes visible during the instance it was detected. Here, to avoid some of these concerns, we focused on a set of metrics that would reflect global processing ability rather than focusing heavily on the results of any single metric. Because enhanced global processing ability results in a wider UFOV, one would predict that a global search strategy would be associated with shorter search times, reduced time to first fixation, and smaller image coverage regardless of whether the images are 2D or volumetric. Across each of these measures, we did not find evidence that any of these global processing measures differed with experience in this task. However, as the global properties of volumetric medical images are not yet well-defined, there is ample opportunity for additional research in this area.
This research may ultimately provide insight on how radiology residents should be trained to search through volumetric medical images. In 2D interpretation tasks, translating expertise-related changes in search behavior into training techniques has proven to be quite difficult. Because experts’ enhanced perceptual abilities are closely linked to repeated exposure to medical images,70–72 efforts to train novices to adopt the search patterns of experts have been largely unsuccessful at improving diagnostic accuracy, and there are currently no known “shortcuts” to enhanced global processing ability in radiology.73–76 Here we did not find experience-related differences in search behavior that might reflect a more global search strategy. Instead, we found that individual differences in task performance were closely related to whether the observer drilled through the image slices and searched the images thoroughly. These results are intriguing because they suggest that instructing radiologists to engage in these search behaviors during training could translate to better diagnostic accuracy in clinical practice.65
5. Conclusion
Although research dating back to the early 1970s has demonstrated that experience improves global processing ability, this study is the first to test this prediction in a volumetric image interpretation task while allowing radiologists to freely scroll through the image slices. Across a wide range of measures that have been associated with experience in previous research, we found evidence that experience was not predictive of performance when searching volumetric medical images. These findings suggest the ability to extract the global statistical properties of an image might be more difficult in image stacks. Rather than individual differences in global processing ability, diagnostic performance was closely related to whether radiologists engaged in drilling versus scanning, with drilling being a more thorough, systematic search of the image that resulted in better detection. In future research, it may be fruitful to focus on whether instructing radiologists to use a drilling strategy improves image coverage and task performance. Overall, these findings demonstrate that existing models of perceptual expertise in radiology do not fully account for search behavior in volumetric images, and addressing this gap in the literature is a promising avenue for future research.
Acknowledgements
We would like to acknowledge the NCI Perception Laboratory at RSNA and David Alonso for their help with participant recruitment, as well as the radiologists who participated in our study. We are also grateful to the NIH (Grant No. 1R01CA225585-01 for T. D.) and the NSF Graduate Research Fellowship Program (Grant No. 1747505 for L. H. W.).
Biography
Biographies of the authors are not available.
Disclosures
The authors declare they have no competing interests.
Contributor Information
Lauren H. Williams, Email: l8williams@ucsd.edu.
Ann J. Carrigan, Email: ann.carrigan@mq.edu.au.
Megan Mills, Email: megan.mills@hsc.utah.edu.
William F. Auffermann, Email: william.auffermann@hsc.utah.edu.
Anina N. Rich, Email: anina.rich@mq.edu.au.
Trafton Drew, Email: trafton.drew@psych.utah.edu.
References
- 1. Berlin L., “Accuracy of diagnostic procedures: has it improved over the past five decades?” Am. J. Roentgenol. 188(5), 1173–1178 (2007). 10.2214/AJR.06.1270
- 2. Donovan T., Litchfield D., “Looking for cancer: expertise related differences in searching and decision making,” Appl. Cognit. Psychol. 27, 43–49 (2013). 10.1002/acp.2869
- 3. Kundel H. L., et al., “Holistic component of image perception in mammogram interpretation: gaze-tracking study,” Radiology 242, 396–402 (2007). 10.1148/radiol.2422051997
- 4. Kundel H. L., et al., “Using gaze-tracking data and mixture distribution analysis to support a holistic model for the detection of cancers on mammograms,” Acad. Radiol. 15, 881–886 (2008). 10.1016/j.acra.2008.01.023
- 5. Carmody D. P., Nodine C. F., Kundel H. L., “Finding lung nodules with and without comparative visual scanning,” Percept. Psychophys. 29, 594–598 (1981). 10.3758/BF03207377
- 6. Carrigan A. J., Wardle S. G., Rich A. N., “Finding cancer in mammograms: if you know it’s there, do you know where?” Cognit. Res. Princ. Implic. 3(1), 10 (2018). 10.1186/s41235-018-0096-5
- 7. Evans K. K., et al., “The gist of the abnormal: above-chance medical decision making in the blink of an eye,” Psychon. Bull. Rev. 20, 1170–1175 (2013). 10.3758/s13423-013-0459-3
- 8. Kundel H. L., Nodine C. F., “Interpreting chest radiographs without visual search,” Radiology 116, 527–532 (1975). 10.1148/116.3.527
- 9. Brams S., et al., “The relationship between gaze behavior, expertise, and performance: a systematic review,” Psychol. Bull. 145(10), 980–1027 (2019). 10.1037/bul0000207
- 10. Gegenfurtner A., Lehtinen E., Säljö R., “Expertise differences in the comprehension of visualizations: a meta-analysis of eye-tracking research in professional domains,” Educ. Psychol. Rev. 23, 523–552 (2011). 10.1007/s10648-011-9174-7
- 11. van der Gijp A., et al., “Interpretation of radiological images: towards a framework of knowledge and skills,” Adv. Health Sci. Educ. Theory Pract. 19(4), 565–580 (2014). 10.1007/s10459-013-9488-y
- 12. Oestmann J. W., “Lung lesions: correlation between viewing time and detection,” Radiology 166(2), 451–453 (1988). 10.1148/radiology.166.2.3336720
- 13. Kundel H. L., La Follette P. S., “Visual search patterns and experience with radiological images,” Radiology 103(3), 523–528 (1972). 10.1148/103.3.523
- 14. Kelly B. S., et al., “The development of expertise in radiology: in chest radiograph interpretation, “expert” search pattern may predate “expert” levels of diagnostic accuracy for pneumothorax identification,” Radiology 280(1), 252–260 (2016). 10.1148/radiol.2016150409
- 15. Drew T., et al., “Informatics in radiology: what can you see in a single glance and how might this guide visual search in medical images?” Radiographics 33, 263–274 (2013). 10.1148/rg.331125023
- 16. Nodine C. F., Kundel H. L., “The cognitive side of visual search in radiology,” in Eye Movements from Physiology to Cognition, O'Regan J. K., Levy-Schoen A., Eds., Elsevier (1987).
- 17. Swensson R. G., “A two-stage detection model applied to skilled visual search by radiologists,” Percept. Psychophys. 27, 11–16 (1980). 10.3758/BF03199899
- 18. Wolfe J. M., et al., “Visual search in scenes involves selective and nonselective pathways,” Trends Cognit. Sci. 15(2), 77–84 (2011). 10.1016/j.tics.2010.12.001
- 19. Chong S. C., Treisman A., “Representation of statistical properties,” Vision Res. 43(4), 393–404 (2003). 10.1016/S0042-6989(02)00596-5
- 20. Parkes L., et al., “Compulsory averaging of crowded orientation signals in human vision,” Nat. Neurosci. 4(7), 739–744 (2001). 10.1038/89532
- 21. Greene M. R., Oliva A., “Recognition of natural scenes from global properties: seeing the forest without representing the trees,” Cognit. Psychol. 58(2), 137–176 (2009). 10.1016/j.cogpsych.2008.06.001
- 22. Williams D. W., Sekuler R., “Coherent global motion percepts from stochastic local motions,” ACM SIGGRAPH Comput. Graphics 18(1), 24–24 (1984). 10.1145/988525.988533
- 23. Müller-Plath G., Elsner K., “Space-based and object-based capacity limitations in visual search,” Vis. Cognit. 15(5), 599–634 (2007). 10.1080/13506280600845572
- 24. Greene M. R., Oliva A., “The briefest of glances: the time course of natural scene understanding,” Psychol. Sci. 20(4), 464–472 (2009). 10.1111/j.1467-9280.2009.02316.x
- 25. Potter M. C., “Short-term conceptual memory for pictures,” J. Exp. Psychol. [Hum. Learn.] 2(5), 509–522 (1976). 10.1037/0278-7393.2.5.509
- 26. Andriole K. P., et al., “Optimizing analysis, visualization, and navigation of large image data sets: one 5000-section CT scan can ruin your whole day,” Radiology 259, 346–362 (2011). 10.1148/radiol.11091276
- 27. McDonald R. J., et al., “The effects of changes in utilization and technological advancements of cross-sectional imaging on radiologist workload,” Acad. Radiol. 22, 1191–1198 (2015). 10.1016/j.acra.2015.05.007
- 28. Williams L. H., Drew T., “What do we know about volumetric medical image interpretation?: a review of the basic science and medical image perception literatures,” Cognit. Res. Princ. Implic. 4(1), 21 (2019). 10.1186/s41235-019-0171-6
- 29. Treviño M., et al., “Rapid perceptual processing in two- and three-dimensional prostate images,” J. Med. Imaging 7(2), 022406 (2020). 10.1117/1.JMI.7.2.022406
- 30. Wu C.-C., et al., “Gist processing in digital breast tomosynthesis,” J. Med. Imaging 7(2), 022403 (2019). 10.1117/1.JMI.7.2.022403
- 31. Venjakob A., et al., “Radiologists’ eye gaze when reading cranial CT images,” Proc. SPIE 8318, 83180B (2012). 10.1117/12.913611
- 32. van Montfort D., et al., “Expertise development in volumetric image interpretation of radiology residents: what do longitudinal scroll data reveal?” Adv. Health Sci. Educ. 26, 1–30 (2020). 10.1007/s10459-020-09995-6
- 33. Bertram R., et al., “Eye movements of radiologists reflect expertise in CT study interpretation: a potential tool to measure resident development,” Radiology 281, 805–815 (2016). 10.1148/radiol.2016151255
- 34. Bertram R., et al., “The effect of expertise on eye movement behaviour in medical image perception,” PLoS One 8, e66169 (2013). 10.1371/journal.pone.0066169
- 35. Diaz I., et al., “Eye-tracking of nodule detection in lung CT volumetric data,” Med. Phys. 42, 2925–2932 (2015). 10.1118/1.4919849
- 36. Drew T., et al., “Scanners and drillers: characterizing expert visual search through volumetric images,” J. Vision 13, 3 (2013). 10.1167/13.10.3
- 37. Seltzer S. E., et al., “Spiral CT of the chest: comparison of cine and film-based viewing,” Radiology 197, 73–78 (1995). 10.1148/radiology.197.1.7568857
- 38. Aizenman A., et al., “Comparing search patterns in digital breast tomosynthesis and full-field digital mammography: an eye tracking study,” J. Med. Imaging 4(4), 045501 (2017). 10.1117/1.JMI.4.4.045501
- 39. Kelahan L. C., et al., “The radiologist’s gaze: mapping three-dimensional visual search in computed tomography of the abdomen and pelvis,” J. Digital Imaging 32(2), 234–240 (2019). 10.1007/s10278-018-0121-8
- 40. Abrams R. A., Christ S. E., “Motion onset captures attention,” Psychol. Sci. 14(5), 427–432 (2003). 10.1111/1467-9280.01458
- 41. Williams L., et al., “The invisible breast cancer: experience does not protect against inattentional blindness to clinically relevant findings in radiology,” Psychon. Bull. Rev. 28, 503–511 (2020). 10.3758/s13423-020-01826-4
- 42. Armato S. G., et al., “The Lung Image Database Consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans,” Med. Phys. 38(2), 915–931 (2011). 10.1118/1.3528204
- 43. Williams L., et al., What are the characteristics of expertise in volumetric medical image search? (2019).
- 44. Jeffreys H., The Theory of Probability, Oxford University Press, Oxford (1998).
- 45. Kundel H. L., Nodine C. F., Carmody D., “Visual scanning, pattern recognition and decision-making in pulmonary nodule detection,” Invest. Radiol. 13, 175–181 (1978). 10.1097/00004424-197805000-00001
- 46. Kundel H. L., et al., “Searching for lung nodules. A comparison of human performance with random and systematic scanning models,” Invest. Radiol. 22, 417–422 (1987). 10.1097/00004424-198705000-00010
- 47. Rubin G. D., “Lung nodule and cancer detection in computed tomography screening,” J. Thorac. Imaging 30, 130–138 (2015). 10.1097/RTI.0000000000000140
- 48. Adamo S. H., et al., “Mammography to tomosynthesis: examining the differences between two-dimensional and segmented-three-dimensional visual search,” Cognit. Res. Princ. Implic. 3, 17 (2018). 10.1186/s41235-018-0103-x
- 49. Lago M. A., et al., “Interactions of lesion detectability and size across single-slice DBT and 3D DBT,” Proc. SPIE 10577, 105770X (2018). 10.1117/12.2293873
- 50. Helbren E., et al., “Towards a framework for analysis of eye-tracking studies in the three dimensional environment: a study of visual search by experienced readers of endoluminal CT colonography,” Br. J. Radiol. 87, 20130614 (2014). 10.1259/bjr.20130614
- 51. Williams L. H., Drew T., “Distraction in diagnostic radiology: how is search through volumetric medical images affected by interruptions?” Cognit. Res. Princ. Implic. 2, 12 (2017). 10.1186/s41235-017-0050-y
- 52. Ba A., et al., “Search of low-contrast liver lesions in abdominal CT: the importance of scrolling behavior,” J. Med. Imaging 7(4), 045501 (2020). 10.1117/1.JMI.7.4.045501
- 53. Brennan P. C., et al., “Radiologists can detect the ‘gist’ of breast cancer before any overt signs of cancer appear,” Sci. Rep. 8(1), 1–12 (2018). 10.1038/s41598-018-26100-5
- 54. Evans K. K., et al., “A half-second glimpse often lets radiologists identify breast cancer cases even when viewing the mammogram of the opposite breast,” Proc. Natl. Acad. Sci. U. S. A. 113(37), 10292–10297 (2016). 10.1073/pnas.1606187113
- 55. Dickinson C. A., Zelinsky G. J., “Memory for the search path: evidence for a high-capacity representation of search history,” Vision Res. 47, 1745–1755 (2007). 10.1016/j.visres.2007.02.010
- 56. Godwin H. J., Benson V., Drieghe D., “Using interrupted visual displays to explore the capacity, time course, and format of fixation plans during visual search,” J. Exp. Psychol. Hum. Percept. Perform. 39, 1700–1712 (2013). 10.1037/a0032287
- 57. Klein R. M., MacInnes W. J., “Inhibition of return is a foraging facilitator in visual search,” Psychol. Sci. 10, 346–352 (1999). 10.1111/1467-9280.00166
- 58. Peterson M. S., Beck M. R., Vomela M., “Visual search is guided by prospective and retrospective memory,” Percept. Psychophys. 69, 123–135 (2007). 10.3758/BF03194459
- 59. Foulsham T., Kingstone A., “Where have eye been? Observers can recognise their own fixations,” Perception 42, 1085–1089 (2013). 10.1068/p7562
- 60. Võ M. L. H., Aizenman A. M., Wolfe J. M., “You think you know where you looked? You better look again,” J. Exp. Psychol. Hum. Percept. Perform. 42, 1477–1481 (2016). 10.1037/xhp0000264
- 61. Wermeskerken M., Litchfield D., Gog T., “What am I looking at? Interpreting dynamic and static gaze displays,” Cognit. Sci. 42, 220–252 (2018). 10.1111/cogs.12484
- 62. Drew T., et al., “Quantifying the costs of interruption during diagnostic radiology interpretation using mobile eye-tracking glasses,” J. Med. Imaging 5(3), 031406 (2018). 10.1117/1.JMI.5.3.031406
- 63. Alakhras M. M., et al., “Effect of radiologists’ experience on breast cancer detection and localization using digital breast tomosynthesis,” Eur. Radiol. 25, 402–409 (2015). 10.1007/s00330-014-3409-1
- 64. Blanchon T., et al., “Baseline results of the Depiscan study: a French randomized pilot trial of lung cancer screening comparing low dose CT scan (LDCT) and chest X-ray (CXR),” Lung Cancer 58, 50–58 (2007). 10.1016/j.lungcan.2007.05.009
- 65. van der Gijp A., et al., “The effect of teaching search strategies on perceptual performance,” Acad. Radiol. 24(6), 762–767 (2017). 10.1016/j.acra.2017.01.007
- 66. Eckstein M. P., et al., “Quantifying the limitations of the use of consensus expert committees in ROC studies,” Proc. SPIE 3340, 128–134 (1998). 10.1117/12.306177
- 67. Henkelman R. M., Kay I., Bronskill M. J., “Receiver operator characteristic (ROC) analysis without truth,” Med. Decis. Making 10(1), 24–29 (1990). 10.1177/0272989X9001000105
- 68. Kundel H. L., Polansky M., “Mixture distribution and receiver operating characteristic analysis of bedside chest imaging with screen-film and computed radiography,” Acad. Radiol. 4(1), 1–7 (1997). 10.1016/S1076-6332(97)80152-3
- 69. Cain M. S., Adamo S. H., Mitroff S. R., “A taxonomy of errors in multiple-target visual search,” Vis. Cognit. 21, 899–921 (2013). 10.1080/13506285.2013.843627
- 70. Chen W., et al., “Perceptual training to improve hip fracture identification in conventional radiographs,” PLoS One 12, e0189192 (2017). 10.1371/journal.pone.0189192
- 71. Mello-Thoms C., “How much agreement is there in the visual search strategy of experts reading mammograms?” Proc. SPIE 6917, 691704 (2008). 10.1117/12.768835
- 72. Sha L. Z., et al., “Perceptual learning in the identification of lung cancer in chest radiographs,” Cognit. Res. Princ. Implic. 5(1), 4 (2020). 10.1186/s41235-020-0208-x
- 73. Geel K., et al., “Teaching systematic viewing to final-year medical students improves systematicity but not coverage or detection of radiologic abnormalities,” J. Am. Coll. Radiol. 14, 235–241 (2017). 10.1016/j.jacr.2016.10.001
- 74. Gegenfurtner A., et al., “Effects of eye movement modeling examples on adaptive expertise in medical image diagnosis,” Comput. Educ. 113, 212–225 (2017). 10.1016/j.compedu.2017.06.001
- 75. Kok E. M., et al., “Systematic viewing in radiology: seeing more, missing less?” Adv. Health Sci. Educ. 21, 189–205 (2016). 10.1007/s10459-015-9624-y
- 76. Litchfield D., et al., “Viewing another person’s eye movements improves identification of pulmonary nodules in chest x-ray inspection,” J. Exp. Psychol. Appl. 16, 251–262 (2010). 10.1037/a0020082