Journal of Medical Imaging. 2024 Dec 31;12(1):015501. doi: 10.1117/1.JMI.12.1.015501

Examining the influence of digital phantom models in virtual imaging trials for tomographic breast imaging

Amar Kavuri a, Mini Das a,b,*
PMCID: PMC11686409  PMID: 39744152

Abstract.

Purpose

Digital phantoms are one of the key components of virtual imaging trials (VITs) that aim to assess and optimize new medical imaging systems and algorithms. However, these phantoms vary in their voxel resolution, appearance, and structural details. We investigate whether and how variations between digital phantoms influence system optimization with digital breast tomosynthesis (DBT) as a chosen modality.

Methods

We selected widely used and open-access digital breast phantoms created with different methods and generated an ensemble of DBT images to test acquisition strategies. Human observer performance was evaluated using localization receiver operating characteristic (LROC) studies for each phantom type. Noise power spectrum and gaze metrics were also employed to compare phantoms and generated images.

Results

Our LROC results show that the arc samplings for peak performance were 2.5 deg and 6 deg in Bakic and XCAT breast phantoms, respectively, for the 3-mm lesion detection task and indicate that system optimization outcomes from VITs can vary with phantom types and structural frequency components. In addition, a significant correlation (p < 0.01) between gaze metrics and diagnostic performance suggests that gaze analysis can be used to understand and evaluate task difficulty in VITs.

Conclusion

Our results point to the critical need to evaluate realism in digital phantoms and ensure sufficient structural variations at spatial frequencies relevant to the intended task. Standardizing phantom generation and validation tools may help reduce discrepancies among independently conducted VITs for system or algorithmic optimizations.

Keywords: virtual imaging trials, virtual imaging trial, digital breast tomosynthesis, tomosynthesis, optimization, structure variability, digital phantom, virtual phantom, localization receiver operating characteristic, gaze metrics

1. Introduction

Rapid advances in medical imaging technologies and methods make it impractical to evaluate and refine their design for optimal clinical use through clinical trials alone. Such trials can demand extensive resources and long durations. Virtual imaging trials (VITs) are an alternative approach to assess the potential of an imaging system, software, or specific system/software combinations.1–8 VITs are based on in silico methods, where digital phantoms replace patients, the software platform mimics the imaging process, and virtual interpreters represent the human readers.

As VITs are becoming more accurate, realistic digital phantom development has drawn much attention from the medical imaging community. Our group recently examined the contribution of anatomical and quantum noise in signal detection and performance for human observers in such imaging trials.9,10 Using VITs, we have also previously examined the development of novel visual search observer models to match humans,11–13 system and algorithm optimization questions,3,4,14–17 image texture features relevant to human observer performance,18,19 and radiomics variability.20 As in other groups’ work, all of these studies were conducted with one breast phantom type.

Here, we evaluate how changing the phantom type may affect the outcome of a VIT study. We once again take digital breast tomosynthesis (DBT) as the modality of choice to examine this critical question and use two widely accepted types of breast phantoms, both considered anthropomorphic. Over the last few years, many computational breast phantoms have been developed using different methods.21 Some commonly used breast phantoms include power-law-based phantoms,22–25 anthropomorphic phantoms,26–31 modified patient tomographic data,32–36 and mastectomy specimens.37–39 These phantoms vary in their resolution, model, structures, and details. Power spectrum analysis has been used as a method to assess phantom structures and realism.27–30,38,40,41 Cockmartin et al.41 observed in 2013 that none of the evaluated phantoms matched the patient data in terms of power-spectrum parameters. A few studies extended the realism assessment with ratings from trained humans (physicists and radiologists).27,30,32,34 Badano et al.,42 however, argued that realism is subjective and that simulating the properties relevant to the task is sufficient. Still, it is unclear which properties are relevant and what level of realism is sufficient for the task of tomographic imaging system/algorithmic optimizations.

Past work by various groups has used different phantom types for evaluating DBT system configurations.22,37,43,44 Because of the differences in phantoms, configurations, and interpreters among the studies, there is little agreement on the optimal configuration. Specifically, the contribution of phantom differences to these inconsistencies is not well understood. In preliminary analyses, Park et al.45 and Zhao et al.46 observed that phantom differences (uniform versus structured) influence the optimal configurations of tomographic systems. However, differences resulting from the various available structured phantoms (when each is considered realistic by the research community) have never been explored.

We explore this critical question and examine whether and how phantom structure variability influences the outcome of a VIT study, such as one optimizing DBT acquisition parameters. In addition, we investigate the influence of task difficulty due to changes in the phantom structures by analyzing observer gaze patterns and interpretation processes when viewing images generated with different phantoms in the VIT studies. We examine how variations in phantom design modulate visual search strategies and cognitive processes during image interpretation. Another key objective of this research is to assess whether specific gaze metrics can serve as reliable indicators of task difficulty in virtual trials. We present results from our study examining the influence of breast phantom types in VIT for DBT when all other aspects remain the same. To the best of our knowledge, ours is the first study to evaluate this critical aspect.

To accomplish this, we selected two widely used and open-access breast phantom types which were generated using different procedures. Further, human observers analyzed simulated in-plane DBT images of the selected phantoms for similar configurations. Our results show a comparison of predicted system optimizations between VITs using these two phantom types.

2. Materials and Methods

The study methods are summarized below. First, the selected phantoms are listed. Second, our simulation methodology to generate both abnormal and normal cases of DBT images is described. Next, the procedure to characterize phantom structures using a power spectrum is described. Finally, the experimental method to estimate human observer performance and gaze pattern is discussed.

2.1. Phantom Selection

We selected two types of digital breast phantoms for this study: three-dimensional anthropomorphic breast models generated by Bakic et al.26 at the University of Pennsylvania (hereafter referred to as Bakic phantoms) and XCAT breast phantoms generated at Duke University using compressed volumes of patient breast CT data33 (hereafter referred to as XCAT breast phantoms). Bakic phantoms are based on mathematical models that define the breast structural variability. These phantoms can be manipulated easily through the model configurations to simulate changes in anatomy. XCAT breast phantoms are based on patient breast CT images. These phantoms may appear more realistic, but changing their anatomical variations or resolution is difficult.7 For our analysis, six digital phantoms of 5-cm thickness were selected for each type. The phantoms of each type were categorized by volumetric glandular fraction (VGF): three with 25% density and the other three with 50% density. Bakic and XCAT breast phantoms have a voxel resolution of (0.2 mm)³ and (0.25 mm)³, respectively. The selected case numbers of the XCAT phantoms are CTA1608, CTA0357, and CTA1326 for 25% density and CTB6013, CTA1284, and CTA1285 for 50% density.

Figure 1 shows sample slices of both types of phantoms on the top. The transitions in XCAT breast phantoms from 100% glandular to adipose tissue have intermediately dense (25%, 50%, and 75%) voxels, whereas these transitions are sharp (100% dense to adipose) in Bakic phantoms. The XCAT breast phantoms lack small structures such as Cooper’s ligaments. Erickson et al.33 noted that the “lack of very fine-detail structures similar to Cooper’s ligaments can negatively impact the realism of digital phantoms.” References 34, 35, and 47 tried different methods to improve XCAT breast phantoms, but these phantoms were not released publicly.

Fig. 1.

Sample slice of 25% VGF Bakic (top left) and XCAT breast (top right) phantoms and the corresponding reconstructed DBT slices with a 3-mm spherical lesion on the bottom.

2.2. Image Generation

The DBT images used in this study were generated using a simulation platform based on a serial cascade model.48,49 The simulation platform modeled generic DBT systems with both source and detector rotating geometry, which was detailed in our previous work9,49,50 and is described briefly here. The X-ray spectrum modeled a 30-kVp molybdenum anode source with a 0.7-mm-thick Al filter, and the X-ray fluence was scaled to provide a 1.5-mGy mean glandular dose to a 5-cm-thick compressed breast. This total dose was evenly distributed across the projection angles, and the X-ray fluence per projection was determined based on the breast dosimetry data (DgN coefficients) generated by a Monte Carlo simulator.51 Focal-spot blurring with a 0.1-mm focal-spot size was modeled using a Gaussian modulation transfer function (MTF). The detector was modeled as a 0.1-mm-thick CsI-based a-Si flat panel detector with a 0.1-mm pixel size. The scintillator blurring was modeled using an empirically measured pre-sampling MTF. Quantum noise was modeled by a Poisson distribution for each keV (at the absorption of X-rays within the scintillator), whereas additive electronic noise followed a Gaussian process with a standard deviation of 2200 electrons. Scatter was not modeled in this simulation.

The lesion targets were homogeneous spherical masses with a 3-mm diameter. Although this is smaller than the average lesion sizes detected by current DBT systems, VITs are also aimed at evaluating future imaging system designs. This signal size of 3 mm was also chosen to achieve sufficient challenge without artificially altering the attenuation values of the signal and to capture the human observer performance trends more accurately for a range of breast densities. An abnormal case was generated by inserting the lesion at a randomly selected location in the glandular region prior to projection imaging. The lesion contrast or local glandularity was not matched between phantom types when selecting lesion locations, as absolute performance was not relevant for this study. The simulated mass was assumed to have the energy-dependent attenuation coefficient of invasive ductal carcinoma as reported by Johns and Yaffe.52 Eight abnormal cases and one normal case were formed for each phantom. The projections were acquired using Siddon’s ray tracing method53 to model X-ray transmission through the breast. Two different sets of projections were generated, one for each phantom type. To filter the random noise, an adaptive Wiener filter-based denoising algorithm was applied to each projection.54,55 The denoising algorithm was shown to reduce the noise effectively in our prior work.9 Each data set was acquired over an angular span of 60 deg with a projection number P ∈ {3, 7, 11, 15, 19, 21, 25, 31, 35, 41, 45}, keeping the total dose fixed at 1.5 mGy for each DBT acquisition. The Feldkamp filtered back-projection algorithm56 was used for image reconstruction. A three-dimensional Butterworth filter with a cutoff frequency of 0.25 cycles/pixel was applied to the reconstructed volumes. In-plane DBT images of 1-mm thickness were produced by boxcar averaging. Eight lesion-present (abnormal) and eight lesion-absent (normal) images were created for each phantom for the human observer studies as described in our earlier work.4 A set of 96 images was produced from the six phantoms of each phantom type for a given projection number. Figure 1 shows sample 1-mm abnormal DBT slices of both types of phantoms on the bottom.
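The even dose split and the detector noise model described in this section can be illustrated with a short sketch. It is not the simulation platform used in the study: the function names, the array layout, and especially the per-energy-bin conversion gain `electrons_per_quantum` are assumptions introduced here for illustration. Only the 1.5-mGy total dose, the Poisson and Gaussian noise types, and the 2200-electron standard deviation are taken from the text.

```python
import numpy as np

TOTAL_DOSE_MGY = 1.5            # total mean glandular dose per DBT acquisition (from the text)
ELECTRONIC_NOISE_SD = 2200.0    # electrons, Gaussian electronic noise (from the text)


def dose_per_projection(n_projections, total_dose_mgy=TOTAL_DOSE_MGY):
    """Split the total dose evenly across all projections."""
    return total_dose_mgy / n_projections


def noisy_projection(mean_quanta, electrons_per_quantum, rng):
    """Add quantum and electronic noise to one noiseless projection.

    mean_quanta: array of shape (n_energy_bins, ny, nx) holding the expected
        number of absorbed X-ray quanta per detector pixel and energy bin.
    electrons_per_quantum: array of shape (n_energy_bins,), a HYPOTHETICAL
        conversion gain per energy bin (not specified in the text).
    """
    # Poisson quantum noise, sampled independently for each energy bin.
    quanta = rng.poisson(mean_quanta)
    # Convert quanta to electrons and sum over energy bins -> (ny, nx).
    signal = np.tensordot(electrons_per_quantum, quanta, axes=(0, 0))
    # Additive Gaussian electronic noise.
    return signal + rng.normal(0.0, ELECTRONIC_NOISE_SD, size=signal.shape)
```

Because the total dose is fixed, the expected quanta per projection fall as the number of projections grows, so the fixed electronic noise term weighs more heavily on each projection at high projection counts.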

2.3. Noise Power Spectrum

Noise power spectrum (NPS) based analysis has been used in the literature to quantify the similarity between phantom structures and clinical data.28,41 Our NPS estimation methods are briefly summarized here and can be found in detail elsewhere.9 We selected multiple regions of interest (ROIs) of size 2.43 cm × 2.43 cm from the breast region of each lesion-absent DBT slice. The mean of each ROI was subtracted from that ROI, and a Hann tapering window was applied to each ROI to reduce edge artifacts. The 2D NPS was calculated by ensemble averaging the squared magnitude of the discrete Fourier transform of each tapered ROI, and radial averaging of the 2D NPS was performed, resulting in a 1D NPS. A linear regression fit was applied to the natural logarithm of the 1D NPS, with the fitting range (starting between 0.1 and 0.15 mm⁻¹ and ending between 0.4 and 0.7 mm⁻¹) chosen to maximize the coefficient of determination (R²). The NPS parameter β is estimated as the slope of the linear fit.
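The NPS pipeline summarized above (mean subtraction, Hann taper, ensemble-averaged squared DFT magnitude, radial averaging, log-log fit) can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation. The function names, the NPS normalization, the grid search over fitting ranges, and the convention β = −slope with the regression taken against the logarithm of frequency are all assumptions made here; the last one is chosen so that a falling power-law spectrum yields a positive β, consistent with the positive values reported in the Results.

```python
import numpy as np


def nps_2d(rois, pixel_mm):
    """Ensemble-averaged 2D NPS from a stack of ROIs, shape (n_rois, ny, nx)."""
    rois = np.asarray(rois, dtype=float)
    n, ny, nx = rois.shape
    window = np.outer(np.hanning(ny), np.hanning(nx))   # 2D Hann taper
    acc = np.zeros((ny, nx))
    for roi in rois:
        tapered = (roi - roi.mean()) * window            # remove mean, then taper
        acc += np.abs(np.fft.fftshift(np.fft.fft2(tapered))) ** 2
    # Common normalization (assumed); window energy correction is omitted,
    # which rescales the NPS but does not change the fitted slope.
    return acc / n * pixel_mm ** 2 / (nx * ny)


def radial_average(nps2d, pixel_mm):
    """Collapse a centered 2D NPS to 1D by averaging over rings of equal radius."""
    ny, nx = nps2d.shape
    fy = np.fft.fftshift(np.fft.fftfreq(ny, d=pixel_mm))
    fx = np.fft.fftshift(np.fft.fftfreq(nx, d=pixel_mm))
    fr = np.hypot(*np.meshgrid(fx, fy))
    df = 1.0 / (max(nx, ny) * pixel_mm)                  # ring width in mm^-1
    bins = np.round(fr / df).astype(int)
    sums = np.bincount(bins.ravel(), weights=nps2d.ravel())
    counts = np.maximum(np.bincount(bins.ravel()), 1)
    freqs = np.arange(len(sums)) * df
    return freqs[1:], (sums / counts)[1:]                # drop the DC ring


def fit_beta(freqs, nps1d, f_lo, f_hi):
    """Linear fit of ln(NPS) against ln(f) on [f_lo, f_hi]; returns (beta, R^2)."""
    mask = (freqs >= f_lo) & (freqs <= f_hi) & (nps1d > 0)
    x, y = np.log(freqs[mask]), np.log(nps1d[mask])
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return -slope, 1.0 - resid.var() / y.var()


def best_beta(freqs, nps1d, lo_range=(0.1, 0.15), hi_range=(0.4, 0.7), steps=6):
    """Search the fitting range that maximizes R^2; returns (beta, R^2, f_lo, f_hi)."""
    best = None
    for f_lo in np.linspace(lo_range[0], lo_range[1], steps):
        for f_hi in np.linspace(hi_range[0], hi_range[1], steps):
            beta, r2 = fit_beta(freqs, nps1d, f_lo, f_hi)
            if best is None or r2 > best[1]:
                best = (beta, r2, f_lo, f_hi)
    return best
```

A typical call would be `freqs, nps1d = radial_average(nps_2d(rois, pixel_mm), pixel_mm)` followed by `best_beta(freqs, nps1d)`, where `rois` and `pixel_mm` are placeholders for the extracted ROIs and the in-plane pixel size in millimeters.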

2.4. Human Observer Study

Three non-radiologists took part in the localization ROC (LROC) experiments. An LROC study entails both detection and localization of a 3-mm spherical lesion. The task in our study is the search and localization/detection of a spherical mass in simulated in-plane DBT images, which justifies the use of non-radiologists as observers. The physicists and engineers who participated as observers had the same level of experience in reading simulated images. The human observer experimental method was the same as that described in our prior work3,4,11,12 and is summarized briefly here. In this study, we evaluated in-plane DBT slices of the Bakic phantom set for five acquisition protocols reconstructed with projection numbers P of 7, 11, 19, 25, and 35 and of the XCAT breast phantom set for six acquisition protocols reconstructed with projection numbers P of 3, 7, 11, 19, 25, and 35. The 96 images per set (48 pairs of abnormal/normal images) were divided into 72 test images (36 pairs) and 24 training images (12 pairs). All sets included an initial training session followed by a test session. Each observer thus read 10 image sets. Observers were asked to select the lesion location, and a four-point ordinal scale was used to collect the confidence rating. Localizations were considered correct when the observer selected a location within a 2-mm radius of the lesion center (the 1.5-mm radius of the spherical lesion plus an additional 0.5 mm to allow for human selection error). The observers’ performance was quantified with the area under the LROC curve (AUC). The estimate of AUC for a given observer and protocol was obtained with a Wilcoxon-based non-parametric ranking method. To assess the consistency of the AUC values between the observers, the intraclass correlation coefficient (ICC) was calculated using Python (v3.7.6) software.57
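The scoring rule described above can be illustrated with a small sketch. The text states only that a Wilcoxon-based non-parametric ranking method was used. The estimator below is one common Wilcoxon-type form for LROC data, in which an abnormal/normal pair contributes only when the lesion was correctly localized. The function names, the half credit for tied ratings, and the data layout are assumptions made for illustration.

```python
from math import hypot


def is_correct_localization(click_mm, lesion_mm, radius_mm=2.0):
    """True when the selected point lies within radius_mm of the lesion center."""
    dx = click_mm[0] - lesion_mm[0]
    dy = click_mm[1] - lesion_mm[1]
    return hypot(dx, dy) <= radius_mm


def lroc_auc(abnormal, normal_ratings):
    """Wilcoxon-type estimate of the area under the LROC curve.

    abnormal: list of (rating, correctly_localized) for lesion-present images.
    normal_ratings: list of ratings for lesion-absent images.
    """
    total = 0.0
    for rating, correct in abnormal:
        if not correct:
            continue                      # mislocalized cases score zero
        for r_normal in normal_ratings:
            if rating > r_normal:
                total += 1.0
            elif rating == r_normal:
                total += 0.5              # assumed tie handling
    return total / (len(abnormal) * len(normal_ratings))


# Hypothetical ratings on the four-point scale, for illustration only.
abnormal_cases = [
    (4, is_correct_localization((10.2, 5.1), (10.0, 5.0))),   # correct
    (3, is_correct_localization((20.0, 5.0), (10.0, 5.0))),   # 10 mm off
    (2, is_correct_localization((10.0, 6.5), (10.0, 5.0))),   # correct
]
auc = lroc_auc(abnormal_cases, [1, 2, 3])
```

With this form, an observer who never localizes the lesion correctly obtains an AUC of zero regardless of the ratings, which matches the statement in the Results that LROC AUC values above zero are better than guessing.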

2.5. Eye Gaze Collection

Numerous studies have demonstrated that gaze metrics correlate with diagnostic performance and can reveal human observers’ interpretation process.58–61 To evaluate the differences in the interpretation process due to the change of phantom type, we collected gaze data using a screen-based eye-tracking system (Tobii Pro X3-120 with EPU). An additional three non-radiologists participated in the eye-tracking study. The eye gaze data were collected for a single acquisition protocol of 35 projections for both types of phantoms. Both the presentation of stimuli and eye-tracking were controlled by a Lenovo ThinkPad P52s laptop using in-house Python software. Calibration and validation procedures were performed before starting a session. The images were displayed on a standard Dell 23.8-in. LCD monitor with a resolution of 1920 × 1080. Once the software collected and stored the gaze logs, post-processing was applied to estimate fixation locations, fixation durations, and saccade durations using an I-VT filter based on the Tobii Pro white paper.62 Figure 2 shows a sample gaze pattern of an observer for the task of searching for and locating a 3-mm lesion. Each vertex indicates a fixation center, with numbers indicating fixation order and lines indicating the saccade paths and lengths.

Fig. 2.

Example gaze pattern with fixations and saccades on DBT slices of 25% dense (top) and 50% dense (bottom) phantoms.

We computed five gaze metrics to characterize the gaze pattern: the total time spent on each image, the total number of fixations made on each image, the time taken to first fixate on the lesion region (first hit time), the number of fixations on the lesion region, and the accumulated lesion dwell time. These gaze metrics were estimated using MATLAB R2018a (The MathWorks, Inc., Natick, United States) with in-house scripts. Spearman rank correlation was estimated between the average values of each gaze metric and the corresponding AUC values of the six observers.
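As an illustration of how the five metrics listed above could be derived from a list of detected fixations, consider the sketch below. It is written in Python rather than MATLAB and is not the authors' script. The `Fixation` record, the 2-mm lesion-region radius, and the definition of total time as the interval from the start of the first fixation to the end of the last one are assumptions introduced here.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Fixation:
    x_mm: float        # fixation center, image coordinates
    y_mm: float
    start_s: float     # onset time relative to image display
    duration_s: float


def gaze_metrics(fixations, lesion_xy_mm=None, region_radius_mm=2.0):
    """Per-image gaze metrics; lesion metrics are added only for lesion-present images."""
    fx = sorted(fixations, key=lambda f: f.start_s)
    metrics = {
        # Assumed definition: first fixation onset to last fixation offset.
        "total_time_s": (fx[-1].start_s + fx[-1].duration_s - fx[0].start_s) if fx else 0.0,
        "n_fixations": len(fx),
    }
    if lesion_xy_mm is not None:
        on_lesion = [
            f for f in fx
            if hypot(f.x_mm - lesion_xy_mm[0], f.y_mm - lesion_xy_mm[1]) <= region_radius_mm
        ]
        # None signals that the lesion region was never fixated.
        metrics["first_hit_time_s"] = (on_lesion[0].start_s - fx[0].start_s) if on_lesion else None
        metrics["lesion_fixations"] = len(on_lesion)
        metrics["lesion_dwell_s"] = sum(f.duration_s for f in on_lesion)
    return metrics


# Hypothetical fixation sequence for one lesion-present image.
example = [
    Fixation(0.0, 0.0, 0.0, 0.2),
    Fixation(10.0, 10.0, 0.3, 0.3),
    Fixation(10.5, 10.0, 0.7, 0.4),
    Fixation(30.0, 5.0, 1.2, 0.1),
]
result = gaze_metrics(example, lesion_xy_mm=(10.0, 10.0))
```

Averaging each metric over all images of a given phantom type and density, per observer, would then give the values that enter the correlation analysis.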

3. Results

3.1. Power Spectrum Analysis

To characterize the structural variations of the selected phantom types, we estimated the NPS of the simulated DBT images. To compare the phantoms’ structures, we used the NPS of in-plane DBT images acquired with 35 projections over a 60-deg arc span. Figure 3 shows the averaged NPS of both phantom types on a log-log scale along with linear fits. These plots show that Bakic phantom backgrounds have higher spectral densities at higher frequencies (considered the anatomical region) than XCAT breast phantom backgrounds. We note that this high anatomic noise is not reflected in the calculated β values. The estimated values of β were 2.52 and 3.32 for Bakic and XCAT breast phantom backgrounds, respectively. This result contradicts the popular belief that lower β values indicate lower anatomical noise and agrees with observations made in our prior work.9

Fig. 3.

Power spectrum analysis of DBT slices of both types of backgrounds for a sample acquisition protocol of 35 projections over a 60-deg arc span. The results suggest that the lack of small and sharp structures in XCAT breast phantoms resulted in lower spectral density at higher frequencies than that of Bakic phantoms.

3.2. Human Observer LROC Results

In our studies, each human observer was tasked with the search and localization of a 3-mm lesion within the displayed DBT image slice (similar to the ones shown in Fig. 1). Figure 4 shows sample regions of DBT slices with a lesion at the center of the region to illustrate how the visibility of the signal changes with the number of projections in both types of phantom backgrounds. The signal is more clearly visible in the Bakic phantom background for the acquisition configuration of 35 projections, whereas it is more clearly visible in the XCAT breast background for acquisitions of 11 to 35 projections. Figure 5 presents LROC plots for the three observers, for a sample acquisition of 35 projections over a 60-deg arc span, in both phantom backgrounds. The y-axis of the LROC represents the joint probability of correctly localizing a lesion in a case reported as positive. Therefore, the curve reaches up to the percentage of cases with correct lesion localization. LROC AUC values above zero are considered better than guessing, as the likelihood of guessing the lesion’s location is zero. Figure 6 shows the average performance of the three observers in both Bakic and XCAT breast phantom images. Error bar lengths indicate twice the standard error of the three observers’ AUC values. We observed a greater improvement in performance up to 11 projections in XCAT breast phantom backgrounds and steady performance thereafter. This corresponds to an arc sampling of approximately 6 deg between adjacent projections for peak performance. In Bakic phantom backgrounds, observers’ performance improved up to 25 projections, requiring a finer arc sampling of 2.5 deg to achieve peak performance. We also plotted the detection performance separately for 25% dense and 50% dense slices, where 25% dense indicates an easy level of task difficulty and 50% dense indicates a higher level of task difficulty. Regardless of phantom type, both levels of task difficulty show similar trends and suggest that optimization may not change with the task difficulty, which is in accordance with earlier observations by Zeng et al.44 and Mackenzie et al.63 We also noticed in Figs. 5 and 6 that observers had slightly lower overall performance in XCAT breast backgrounds than in Bakic backgrounds, in particular for 25% dense images. The inter-observer agreement in AUC values, quantified using the ICC, ranged from 0.92 to 0.95, suggesting strong agreement between the observers.
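The arc samplings quoted above are consistent with P projections spaced evenly over the 60-deg span with a projection at each end of the arc, which gives P − 1 angular intervals. That spacing convention is an inference from the reported values rather than something stated explicitly in the text. Under it, the worked arithmetic is:

```latex
\Delta\theta = \frac{\Theta}{P - 1}, \qquad \Theta = 60~\text{deg}

P = 11:\quad \Delta\theta = \frac{60}{10} = 6~\text{deg}
\qquad
P = 25:\quad \Delta\theta = \frac{60}{24} = 2.5~\text{deg}
```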

Fig. 4.

Sample lesion-present regions of DBT slices of both phantoms acquired with different numbers of projections. The black arrows indicate the lesion position.

Fig. 5.

Human observer performance plotted as LROC curves for a sample acquisition of 35 projections over 60 deg in a 3-mm mass detection study in Bakic phantom (top) and XCAT breast phantom (bottom) backgrounds.

Fig. 6.

Human observer performance plotted as the area under LROC (AUC) against the number of projections in a 3-mm mass detection study in Bakic phantom (top) and XCAT breast phantom (bottom) backgrounds. The results suggest the optimal configurations do not change with the task difficulty but change with the background structure type.

3.3. Gaze Analysis

From the eye-tracking data, the total time spent as well as the total number of fixations made on images (both lesion-present and lesion-absent images) were estimated. For lesion-present images, we also estimated the first hit time (on the lesion), lesion dwell time, and the number of fixations on the lesion. Each of these gaze metrics was averaged across all images for a given phantom type and breast density for each observer. Figure 7 shows the influence of phantom type on two of the gaze metrics plotted separately for lesion-absent and lesion-present images. Observers spent a longer time and made more fixations to diagnose lesion-absent images than lesion-present images, which is in line with previous findings.64,65 The most striking result to emerge from these data is that observers spent a longer time and made more fixations on images with Bakic phantom backgrounds than on those with XCAT breast backgrounds. This difference is significant (p-value < 0.01) in lesion-absent images only.

Fig. 7.

Average amount of time spent and the average number of fixations made on images (includes lesion absent and lesion present images) plotted for both phantom types. Observers spent a longer time and made more fixations to make decisions on images with Bakic phantom backgrounds in comparison to those with XCAT breast backgrounds.

The Spearman rank correlation coefficient was computed between the average values of each gaze metric and the AUC values of both phantom types and densities. Table 1 shows that all of the estimated gaze metrics have a good correlation with diagnostic performance. Our observations of first hit time, lesion dwell time, and number of fixations on the lesion showed that the observers took longer to first fixate on the lesion, spent less cumulative time on the lesion, and had fewer fixations on the lesion as the task difficulty increased due to higher breast density. No significant difference was observed in these gaze metrics due to a change of phantom type in lesion-present images. First hit time showed a strong negative correlation (ρ = −0.85) with the AUC values, as shown in Fig. 8. This result suggests that the quicker an observer fixates on the lesion, the better the diagnostic performance, which is in accordance with the observation made by Kundel et al.60 The positive correlation of lesion fixations and dwell time with AUC suggests that observers fixate on the lesion longer and multiple times to locate it accurately. All five gaze metrics showed a significant correlation (p-value < 0.01) with AUC values.
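For readers unfamiliar with the statistic, a Spearman coefficient is the Pearson correlation computed on ranks, with tied values given their average rank. The dependency-free sketch below shows that calculation. The function names and the example numbers are hypothetical and are not the study data, and the significance test behind the reported p-values is not reproduced here.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1             # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical values: longer first hit times paired with lower AUC.
first_hit_s = [0.8, 1.1, 1.5, 2.4]
auc_values = [0.90, 0.82, 0.75, 0.60]
rho = spearman_rho(first_hit_s, auc_values)
```

A negative coefficient, as in this toy example, means that the metric tends to decrease as AUC increases.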

Table 1.

Spearman rank correlation between the AUC values of the six observers and the corresponding average values of the five gaze metrics. All five metrics show a good and significant correlation with the diagnostic performance.

Gaze metric                   Correlation coefficient (ρ)   p-value
Total number of fixations     −0.52                         0.0085
Total time                    −0.6                          0.0019
Lesion dwell time             0.67                          0.0003
Lesion number of fixations    0.68                          0.0003
First hit time                −0.85                         1.2 × 10⁻⁷

Fig. 8.

Average values of first hit time, lesion dwell time, and number of fixations on the lesion plotted against the AUC value of each observer (six in total) for both phantom types and two densities.

4. Discussion and Conclusion

This work evaluated the influence of variations between digital breast phantoms on DBT optimization and the interpretation process for a small lesion localization task. Our results indicate that, for sufficient realism, the phantoms should have adequate structures at spatial frequencies that are relevant to the signal size and the intended task. We observed that the optimal number of projections for peak detection performance could change with the structural complexity of the phantoms. In addition, we observed that the number of projections required to achieve maximum performance is smaller for XCAT breast phantoms than for Bakic phantoms. Our power spectrum analysis revealed that the complex structures in Bakic phantoms contribute to high frequencies, whereas the high-frequency content of XCAT breast phantoms showed lower amplitudes. These high-frequency structures resulted in more aliasing artifacts in Bakic phantom backgrounds compared with XCAT breast phantom backgrounds under similar sparse sampling conditions. Hence, more projections were required to resolve these high-frequency structures in Bakic phantom backgrounds. The difference in observers’ performance between the two phantom backgrounds, in particular for 25% dense images (see Figs. 5 and 6), cannot necessarily be attributed to phantom structures, as the local densities in the region where lesions are inserted could also influence the LROC AUC values. The magnitude of the AUC values is less relevant in this particular study, as only relative changes in AUC values with changing system parameters (such as the number of projections in this study) were used to deduce the final conclusions.

Our gaze analysis revealed intriguing insights into the influence of anatomical variations on visual search strategies. Observers spent significantly more time and made more fixations when inspecting images with Bakic phantom backgrounds compared with XCAT breast backgrounds. A possible explanation for this difference is that the greater anatomical noise of Bakic phantom backgrounds relative to XCAT breast backgrounds (see Fig. 3) made observers less confident, so they spent a longer time making decisions. Notably, this effect of anatomical complexity was most pronounced in lesion-absent cases, where observers had to rely solely on the background characteristics to guide their search. By contrast, when a lesion was present, the visibility and salience of the target became the dominant factor driving gaze patterns, overriding the influence of background complexity. Observers exhibited longer first hit times, reduced cumulative lesion dwell times, and fewer lesion-directed fixations as the breast density, and hence task difficulty, increased. This suggests that lesion visibility or task difficulty plays a pivotal role in shaping visual attention allocation when a target is present, whereas anatomical complexity exerts a greater influence on search strategies in the absence of a defined target. All five gaze metrics showed a good correlation with diagnostic performance, which is in accordance with the observations made by Voisin et al.61 in mammography. This result indicates that gaze metrics and interpretation processes were influenced by the task difficulty. Hence, these gaze metrics can be used to understand the task difficulty in VIT imaging.

In this study, we evaluated the optimal number of projections in an arc span of 60 deg, which is wider than the arc ranges used in clinical DBT systems. The arc samplings between adjacent projections for peak performance were 2.5 deg and 6 deg in Bakic and XCAT breast phantoms, respectively, for the 3-mm lesion detection task. For other arc spans, we expect a similar amount of aliasing artifacts for similar arc sampling. Thus, our conclusions should hold for other arc spans used in clinical DBT prototypes that have not been examined here in our VIT study.

The difference in the optimal arc sampling from VIT conducted with two widely used phantom types raises the need to standardize and unify the frequency contents and complexity of digital phantoms if results published from multiple groups need to be compared. The goal of our study was not to determine if one of the two phantoms is favorable or better than the other for use in VIT for breast. It is likely that the perceived realism and correlation with observer performance for VIT with actual clinical imaging trials would show results in favor of one of these phantoms based on the chosen task and the signal type.

One of the limitations of our study is that only three observers participated in estimating AUC trends. This number is smaller than the number of observers who participated in DBT optimization studies in the literature.44,66–68 However, the observers had high agreement with each other, as supported by our estimated ICC values of greater than 0.92. In addition, we also estimated the correlation between the average AUC trends of Bakic and XCAT breast phantoms. The Spearman rank correlation of 0.2 suggests that the two trends are not similar.

Furthermore, the study was conducted only for the task of searching for and locating a 3-mm spherical lesion, as conducting human observer studies for different signal sizes and types is time-consuming and expensive. Prior studies suggest that smaller (high frequency) lesions dictate the optimal number of projections because their visibility is affected more by both aliasing artifacts (at a lower number of projections) and random noise (mainly at a higher number of projections).23,43 Many studies also used lesions of 3 mm22,23,37 and similar sizes44,67–69 as signal targets in optimization studies. We believe that for larger signals the differences in optimal arc sampling due to differences in the selected phantom structures may be less evident than for smaller signals. We chose a spherical lesion as a target because detecting a spherical lesion is a relatively easy task compared with detecting a complex spiculated lesion and hence requires minimal training. In addition, the performance of non-radiologists and radiologists was shown to be similar for relatively easy tasks,70 and multiple studies have used spherical lesions in VITs.22,23,37,43,44 On the other hand, for high-contrast smaller signals such as micro-calcifications, random noise was shown to be the dominating factor in determining the optimal configuration rather than the background structures.43 Future studies will include signals of different sizes and types.

Another limitation of our study is that only six phantoms were selected for each phantom type (each used to generate a much larger number of independent cases in each study set), and this sample may not represent the entire population of each phantom type. Our goal was not to compare Bakic and XCAT phantoms but to examine how optimization estimates change with phantoms that were generated differently. The use of six phantoms is in line with the literature, in which studies evaluating optimal configurations chose around two to nine phantoms.11,15,66–68 If newly selected phantoms have similar NPS properties, we anticipate similar differences in the optimal number of projections. If, instead, the selected phantoms have different NPS properties and result in different optimal numbers of projections, this would strengthen our argument that phantom structures influence the estimation of optimal system configurations. The selected phantoms (25% and 50% dense) are in the higher range of breast densities observed in clinical data. The average AUC trends (see Fig. 6) did not change with breast density, suggesting that AUC trends may remain the same for phantoms of other densities. These results also point to the need to include more phantoms with variations in VIT studies.
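One simple way to check whether two sets of phantoms have "similar NPS properties" is to summarize each radially averaged NPS with a power-law model, NPS(f) = K / f^β, and compare the fitted exponents. The sketch below assumes this model and a log-log least-squares fit. It is an illustration only, not the comparison performed in this study, and the frequency samples and NPS values are synthetic placeholders.

```python
import math


def fit_power_law(freqs, nps):
    """Fit log(NPS) = log(K) - beta * log(f) by least squares.

    Returns (K, beta). All frequencies and NPS values must be positive.
    """
    xs = [math.log(f) for f in freqs]
    ys = [math.log(p) for p in nps]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope


# Synthetic radially averaged NPS samples (frequency in cycles/mm,
# NPS in arbitrary units); not measured from any phantom.
freqs = [0.1, 0.2, 0.4, 0.8, 1.6]
nps_a = [2.0 / f ** 3.0 for f in freqs]
nps_b = [2.0 / f ** 2.0 for f in freqs]

for label, nps in (("phantom set A", nps_a), ("phantom set B", nps_b)):
    k, beta = fit_power_law(freqs, nps)
    print(f"{label}: K = {k:.3f}, beta = {beta:.3f}")
```

A single exponent summarizes only the overall slope of the spectrum. Differences confined to the frequency band relevant to the detection task could be missed, so comparing NPS magnitudes within that band would be a useful complement.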

In conclusion, our results indicate that both the structural complexity and the relevant spatial frequency magnitudes in digital phantom structures can influence the estimation of optimal system configurations. Our results highlight the importance of modeling phantoms accurately so that they resemble patient anatomy and of assessing their realism (for the chosen tasks) before use in VITs. These results can be generalized to digital phantoms used for multiple imaging modalities. As a final note, our goal in this particular study was not to discuss the superiority of one phantom type over the other based on the results shown here. The key point is for the VIT and medical physics community to be mindful that optimization or other results obtained using one “realistic” breast phantom may not always agree with results obtained using another “realistic” phantom unless additional standardization efforts are pursued.

Acknowledgments

This work was partially supported by funding from the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health (NIH) (Grant No. R01 EB029761), the US Department of Defense (DOD) Congressionally Directed Medical Research Program (CDMRP) Breakthrough Award (Award No. BC151607), and the National Science Foundation CAREER Award (Award No. 1652892). We would also like to thank both the UPenn and Duke groups for making the digital phantoms freely available for research.

Biographies

Amar Kavuri received his PhD in biomedical engineering from the University of Houston in 2022. He is currently working as a postdoctoral associate at Duke University. His current research interests include medical imaging, image processing, image analysis, image perception, visual attention, and virtual imaging trials. He is a member of SPIE – the international society for optics and photonics.

Mini Das is a Moores professor of physics, electrical engineering, and biomedical engineering at the University of Houston. Her research interests span broad areas of applied physics, optical physics, system development, quantitative imaging, inverse problems, detector physics, image science, and image perception to solve outstanding challenges in medicine and biology. She is a fellow of SPIE – the international society for optics and photonics, and is keen to develop educational and outreach activities to advance and promote interdisciplinary research.

Contributor Information

Amar Kavuri, Email: k.amareswararao@gmail.com.

Mini Das, Email: mdas@uh.edu.

Disclosures

The authors declare that the research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

Code and Data Availability

Both code and data could be made available upon written request to the authors.

References

  • 1. Bakic P., et al., “Virtual tools for validation of X-ray breast imaging systems,” Med. Phys. 40(6Part23), 390–390 (2013). doi: 10.1118/1.4815218
  • 2. Badano A., et al., “In silico imaging clinical trials for regulatory evaluation: initial considerations for VICTRE, a demonstration study,” Proc. SPIE 10132, 1013220 (2017). doi: 10.1117/12.2255746
  • 3. Das M., Gifford H. C., “Comparison of model-observer and human-observer performance for breast tomosynthesis: effect of reconstruction and acquisition parameters,” Proc. SPIE 7961, 796118 (2011). doi: 10.1117/12.878826
  • 4. Das M., Liang Z., Gifford H. C., “Examining wide-arc digital breast tomosynthesis: optimization using a visual-search model observer,” Proc. SPIE 9412, 94121S (2015). doi: 10.1117/12.2082986
  • 5. Barufaldi B., et al., “OpenVCT: a GPU-accelerated virtual clinical trial pipeline for mammography and digital breast tomosynthesis,” Proc. SPIE 10573, 1057358 (2018). doi: 10.1117/12.2294935
  • 6. Sharma D., et al., “In silico imaging tools from the VICTRE clinical trial,” Med. Phys. 46(9), 3924–3928 (2019). doi: 10.1002/mp.13674
  • 7. Abadi E., et al., “Virtual clinical trials in medical imaging: a review,” J. Med. Imag. 7(4), 042805 (2020). doi: 10.1117/1.JMI.7.4.042805
  • 8. Barufaldi B., et al., “Virtual clinical trials in medical imaging system evaluation and optimisation,” Radiat. Prot. Dosim. 195, 363–371 (2021). doi: 10.1093/rpd/ncab080
  • 9. Kavuri A., Das M., “Relative contributions of anatomical and quantum noise in signal detection and perception of tomographic digital breast images,” IEEE Trans. Med. Imag. 39(11), 3321–3330 (2020). doi: 10.1109/TMI.2020.2991295
  • 10. Kavuri A., Fredette N. R., Das M., “Interaction of anatomic and quantum noise in DBT power spectrum,” Proc. SPIE 10577, 105770G (2018). doi: 10.1117/12.2295218
  • 11. Gifford H. C., Liang Z., Das M., “Visual-search observers for assessing tomographic X-ray image quality,” Med. Phys. 43(3), 1563–1575 (2016). doi: 10.1118/1.4942485
  • 12. Lau B. A., Das M., Gifford H. C., “Towards visual-search model observers for mass detection in breast tomosynthesis,” Proc. SPIE 8668, 86680X (2013). doi: 10.1117/12.2008503
  • 13. Jiang Z., Das M., Gifford H. C., “Analyzing visual-search observers using eye-tracking data for digital breast tomosynthesis images,” J. Opt. Soc. Am. A 34(6), 838–845 (2017). doi: 10.1364/JOSAA.34.000838
  • 14. Gifford H., et al., “Optimizing breast-tomosynthesis acquisition parameters with scanning model observers,” Proc. SPIE 6917, 69170S (2008). doi: 10.1117/12.771018
  • 15. Das M., et al., “Evaluation of a variable dose acquisition methodology for breast tomosynthesis,” Proc. SPIE 6913, 691319 (2008). doi: 10.1117/12.773106
  • 16. Das M., et al., “Evaluation of a variable dose acquisition technique for microcalcification and mass detection in digital breast tomosynthesis,” Med. Phys. 36(6Part1), 1976–1984 (2009). doi: 10.1118/1.3116902
  • 17. Gifford H. C., Das M., “Assessment of DBT acquisition parameters for 2D and 3D search tasks,” Proc. SPIE 10577, 105770I (2018). doi: 10.1117/12.2294984
  • 18. Nisbett W. H., Kavuri A., Das M., “On the correlation between second order texture features and human observer detection performance in digital images,” Sci. Rep. 10, 13510 (2020). doi: 10.1038/s41598-020-69816-z
  • 19. Nisbett W. H., Kavuri A., Das M., “Investigating the contributions of anatomical variations and quantum noise to image texture in digital breast tomosynthesis,” Proc. SPIE 10573, 105730H (2018). doi: 10.1117/12.2294981
  • 20. Andrade D., Kavuri A., Das M., “Sources of image texture variation in tomographic breast imaging,” Proc. SPIE PC12035, PC120350A (2022). doi: 10.1117/12.2614334
  • 21. Glick S. J., Ikejimba L. C., “Advances in digital and physical anthropomorphic breast phantoms for X-ray imaging,” Med. Phys. 45(10), e870–e885 (2018). doi: 10.1002/mp.13110
  • 22. Sechopoulos I., Ghetti C., “Optimization of the acquisition geometry in digital tomosynthesis of the breast,” Med. Phys. 36(4), 1199–1207 (2009). doi: 10.1118/1.3090889
  • 23. Gang G., et al., “Anatomical background and generalized detectability in tomosynthesis and cone-beam CT,” Med. Phys. 37(5), 1948–1965 (2010). doi: 10.1118/1.3352586
  • 24. Lau B. A., et al., “A statistically defined anthropomorphic software breast phantom,” Med. Phys. 39(6Part1), 3375–3385 (2012). doi: 10.1118/1.4718576
  • 25. Baneva Y., et al., “Evaluation of a breast software model for 2D and 3D X-ray imaging studies of the breast,” Phys. Med. 41, 78–86 (2017). doi: 10.1016/j.ejmp.2017.04.024
  • 26. Bakic P. R., et al., “Mammogram synthesis using a 3D simulation. I. Breast tissue model and image acquisition simulation,” Med. Phys. 29(9), 2131–2139 (2002). doi: 10.1118/1.1501143
  • 27. Bliznakova K., et al., “Evaluation of an improved algorithm for producing realistic 3D breast software phantoms: application for mammography,” Med. Phys. 37(11), 5604–5617 (2010). doi: 10.1118/1.3491812
  • 28. Chen B., et al., “An anthropomorphic breast model for breast imaging simulation and optimization,” Acad. Radiol. 18(5), 536–546 (2011). doi: 10.1016/j.acra.2010.11.009
  • 29. Mahr D. M., Bhargava R., Insana M. F., “Three-dimensional in silico breast phantoms for multimodal image simulations,” IEEE Trans. Med. Imag. 31(3), 689–697 (2011). doi: 10.1109/TMI.2011.2175401
  • 30. Elangovan P., et al., “Design and validation of realistic breast models for use in multiple alternative forced choice virtual clinical trials,” Phys. Med. Biol. 62(7), 2778 (2017). doi: 10.1088/1361-6560/aa622c
  • 31. Carton A.-K., et al., “A virtual human breast phantom using surface meshes and geometric internal structures,” Lect. Notes Comput. Sci. 8539, 356–363 (2014). doi: 10.1007/978-3-319-07887-8_50
  • 32. Hsu C. M., et al., “Generation of a suite of 3D computer-generated breast phantoms from a limited set of human subject data,” Med. Phys. 40(4), 043703 (2013). doi: 10.1118/1.4794924
  • 33. Erickson D. W., et al., “Population of 224 realistic human subject-based computational breast phantoms,” Med. Phys. 43(1), 23–32 (2016). doi: 10.1118/1.4937597
  • 34. Sturgeon G. M., et al., “Eigenbreasts for statistical breast phantoms,” Proc. SPIE 9783, 97832B (2016). doi: 10.1117/12.2216398
  • 35. Chen X., et al., “High-resolution, anthropomorphic, computational breast phantom: fusion of rule-based structures with patient-based anatomy,” Proc. SPIE 10132, 101321W (2017). doi: 10.1117/12.2255913
  • 36. Sarno A., et al., “Dataset of patient-derived digital breast phantoms for in silico studies in breast computed tomography, digital breast tomosynthesis, and digital mammography,” Med. Phys. 48(5), 2682–2693 (2021). doi: 10.1002/mp.14826
  • 37. Chawla A. S., et al., “Optimized image acquisition for breast tomosynthesis in projection and reconstruction space,” Med. Phys. 36(11), 4859–4869 (2009). doi: 10.1118/1.3231814
  • 38. O’Connor J. M., et al., “Generation of voxelized breast phantoms from surgical mastectomy specimens,” Med. Phys. 40(4), 041915 (2013). doi: 10.1118/1.4795758
  • 39. O’Connor J. M., et al., “Development of an ensemble of digital breast object models,” Lect. Notes Comput. Sci. 6136, 54–61 (2010). doi: 10.1007/978-3-642-13666-5_8
  • 40. Bakic P. R., Zhang C., Maidment A. D., “Development and characterization of an anthropomorphic breast software phantom based upon region-growing algorithm,” Med. Phys. 38(6Part1), 3165–3176 (2011). doi: 10.1118/1.3590357
  • 41. Cockmartin L., Bosmans H., Marshall N., “Comparative power law analysis of structured breast phantom and patient images in digital mammography and breast tomosynthesis,” Med. Phys. 40(8), 081920 (2013). doi: 10.1118/1.4816309
  • 42. Badano A., “‘How much realism is needed?’—The wrong question in silico imagers have been asking,” Med. Phys. 44(5), 1607–1609 (2017). doi: 10.1002/mp.12187
  • 43. Reiser I., Nishikawa R., “Task-based assessment of breast tomosynthesis: effect of acquisition parameters and quantum noise,” Med. Phys. 37(4), 1591–1600 (2010). doi: 10.1118/1.3357288
  • 44. Zeng R., Badano A., Myers K. J., “Optimization of digital breast tomosynthesis (DBT) acquisition parameters for human observers: effect of reconstruction algorithms,” Phys. Med. Biol. 62(7), 2598 (2017). doi: 10.1088/1361-6560/aa5ddc
  • 45. Park S., et al., “A statistical, task-based evaluation method for three-dimensional X-ray breast imaging systems using variable-background phantoms,” Med. Phys. 37(12), 6253–6270 (2010). doi: 10.1118/1.3488910
  • 46. Zhao Z., Gang G., Siewerdsen J., “Noise, sampling, and the number of projections in cone-beam CT with a flat-panel detector,” Med. Phys. 41(6Part1), 061909 (2014). doi: 10.1118/1.4875688
  • 47. Rajagopal J., et al., “Evaluation of statistical breast phantoms with higher resolution,” Proc. SPIE 10573, 1057307 (2018). doi: 10.1117/12.2294049
  • 48. Siewerdsen J., et al., “Empirical and theoretical investigation of the noise performance of indirect detection, active matrix flat-panel imagers (AMFPIS) for diagnostic radiology,” Med. Phys. 24(1), 71–89 (1997). doi: 10.1118/1.597919
  • 49. Vedula A. A., Glick S. J., Gong X., “Computer simulation of CT mammography using a flat-panel imager,” Proc. SPIE 5030, 349–361 (2003). doi: 10.1117/12.480015
  • 50. Das M., et al., “Penalized maximum likelihood reconstruction for improved microcalcification detection in breast tomosynthesis,” IEEE Trans. Med. Imag. 30(4), 904–914 (2011). doi: 10.1109/TMI.2010.2089694
  • 51. Boone J. M., “Glandular breast dose for monoenergetic and high-energy X-ray beams: Monte Carlo assessment,” Radiology 213(1), 23–37 (1999). doi: 10.1148/radiology.213.1.r99oc3923
  • 52. Johns P. C., Yaffe M. J., “X-ray characterisation of normal and neoplastic breast tissues,” Phys. Med. Biol. 32(6), 675 (1987). doi: 10.1088/0031-9155/32/6/002
  • 53. Siddon R. L., “Fast calculation of the exact radiological path for a three-dimensional CT array,” Med. Phys. 12(2), 252–255 (1985). doi: 10.1118/1.595715
  • 54. Lim J. S., Two-Dimensional Signal and Image Processing, p. 710, Prentice Hall, Englewood Cliffs, New Jersey (1990).
  • 55. Vieira M. A. C., et al., “Effect of denoising on the quality of reconstructed images in digital breast tomosynthesis,” Proc. SPIE 8668, 86680C (2013). doi: 10.1117/12.2007804
  • 56. Feldkamp L. A., Davis L., Kress J. W., “Practical cone-beam algorithm,” J. Opt. Soc. Am. A 1(6), 612–619 (1984). doi: 10.1364/JOSAA.1.000612
  • 57. https://www.python.org.
  • 58. Brunyé T. T., et al., “A review of eye tracking for understanding and improving diagnostic interpretation,” Cognit. Res. Principles Implic. 4(1), 1–16 (2019). doi: 10.1186/s41235-019-0159-2
  • 59. Kundel H. L., Nodine C. F., Carmody D., “Visual scanning, pattern recognition and decision-making in pulmonary nodule detection,” Invest. Radiol. 13(3), 175–181 (1978). doi: 10.1097/00004424-197805000-00001
  • 60. Kundel H. L., et al., “Using gaze-tracking data and mixture distribution analysis to support a holistic model for the detection of cancers on mammograms,” Acad. Radiol. 15(7), 881–886 (2008). doi: 10.1016/j.acra.2008.01.023
  • 61. Voisin S., et al., “Investigating the association of eye gaze pattern and diagnostic error in mammography,” Proc. SPIE 8673, 867302 (2013). doi: 10.1117/12.2007908
  • 62. Olsen A., “The Tobii I-VT fixation filter,” Tobii Technol. 21, 4–19 (2012).
  • 63. Mackenzie A., et al., “Effect of glandularity on the detection of simulated cancers in planar, tomosynthesis, and synthetic 2D imaging of the breast using a hybrid virtual clinical trial,” Med. Phys. 48(11), 6859–6868 (2021). doi: 10.1002/mp.15216
  • 64. Timberg P., et al., “Investigation of viewing procedures for interpretation of breast tomosynthesis image volumes: a detection-task study with eye tracking,” Eur. Radiol. 23(4), 997–1005 (2013). doi: 10.1007/s00330-012-2675-z
  • 65. Suwa K., et al., “Analyzing the eye movement of dentists during their reading of CT images,” Odontology 89(1), 54–61 (2001). doi: 10.1007/s10266-001-8186
  • 66. Hadjipanteli A., et al., “The threshold detectable mass diameter for 2D-mammography and digital breast tomosynthesis,” Phys. Med. 57, 25–32 (2019). doi: 10.1016/j.ejmp.2018.11.014
  • 67. Goodsitt M. M., et al., “Digital breast tomosynthesis: studies of the effects of acquisition geometry on contrast-to-noise ratio and observer preference of low-contrast objects in breast phantom images,” Phys. Med. Biol. 59(19), 5883 (2014). doi: 10.1088/0031-9155/59/19/5883
  • 68. Vancoillie L., et al., “The impact on lesion detection via a multi-vendor study: a phantom-based comparison of digital mammography, digital breast tomosynthesis, and synthetic mammography,” Med. Phys. 48(10), 6270–6292 (2021). doi: 10.1002/mp.15171
  • 69. Ikejimba L. C., et al., “Assessment of task-based performance from five clinical DBT systems using an anthropomorphic breast phantom,” Med. Phys. 48(3), 1026–1038 (2021). doi: 10.1002/mp.14568
  • 70. Bertram R., et al., “The effect of expertise on eye movement behaviour in medical image perception,” PLoS One 8(6), e66169 (2013). doi: 10.1371/journal.pone.0066169

Articles from Journal of Medical Imaging are provided here courtesy of Society of Photo-Optical Instrumentation Engineers
