Abstract.
This study aims to characterize the effect of background tissue density and heterogeneity on the detection of irregular masses in breast tomosynthesis, while demonstrating the capability of the sophisticated tools that can be used in the design, implementation, and performance analysis of virtual clinical trials (VCTs). Twenty breast phantoms from the extended cardiac-torso (XCAT) family, generated based on dedicated breast computed tomography of human subjects, were used to extract a total of 2173 volumes of interest (VOIs) from simulated tomosynthesis images. Five different lesions, modeled after human subject tomosynthesis images, were embedded in the breasts and, combined with the lesion-absent condition, yielded a total of 6 × 2173 VOIs. Effects of background tissue density and heterogeneity on the detection of the lesions were studied by implementing a composite hypothesis signal detection paradigm with location known exactly, lesion known exactly or statistically, and background known statistically. As measured by the area under the receiver operating characteristic curve, detection performance deteriorated as density increased, a finding consistent with clinical studies. A human observer study was performed on a subset of the simulated tomosynthesis images, confirming the detection performance trends with respect to density and serving as a validation of the implemented detector. Performance of the implemented detector varied substantially across the 20 breasts. Furthermore, background tissue density and heterogeneity affected the log-likelihood ratio test statistic differently under lesion-absent and lesion-present conditions. Therefore, considering background tissue variability in tissue models can change the outcomes of a VCT and is hence of crucial importance. 
The XCAT breast phantoms have the potential to address this concern by offering realistic modeling of background tissue variability based on a wide range of human subjects, comprising various breast shapes, sizes, and densities.
Keywords: breast imaging, digital breast tomosynthesis, extended cardiac-torso breast phantoms, anthropomorphic lesion models, detection, performance evaluation, receiver operating characteristic curve analysis, doubly composite hypothesis testing, Monte Carlo integration, virtual clinical trials, tissue heterogeneity, breast density
1. Introduction
Advanced imaging techniques and systems are constantly under development and examination to improve the screening and diagnosis of breast cancer. The advent of digital X-ray detectors has facilitated three-dimensional (3-D) X-ray imaging of the breast. Digital breast tomosynthesis and dedicated breast computed tomography (CT) are two promising 3-D modalities that are rapidly growing in their development and application.1
Detecting a lesion in tomosynthesis images is confounded not only by several factors, such as system quantum noise, scattering, and other physical artifacts, but also by background tissue variation, which can lead to masking of an actual lesion as well as giving the misleading impression of a lesion where none exists. It is therefore important to consider all of these factors when studying lesion detection. One attribute of uncertain background tissue is its average density, the volume fraction of fibroglandular tissue, which can be considered as a first-order statistic. The background tissue average density is referred to as background tissue density for simplicity hereafter. Another important attribute of uncertain background tissue is its spatial distribution and texture, which reflects higher-order statistics and is referred to as heterogeneity hereafter.
The focus of this study is to characterize the effect of background tissue density and heterogeneity on the detection of irregular masses in digital breast tomosynthesis in a simulation study. The presented study expands upon initial results reported in Ref. 2 and was made possible through modeling the population, the imaging system, and the observers. The population was represented via several virtual breast phantoms from the extended cardiac-torso (XCAT) family,3–7 with lesion models extracted from human subject tomosynthesis images. The geometry, beam spectrum, and noise and scatter characteristics of the imaging system were modeled after a prototype tomosynthesis imaging system to simulate the image formation chain. Finally, a doubly composite hypothesis signal detection theory paradigm, where both the null and the alternative hypotheses have uncertain parameters,8 was devised and implemented to evaluate and characterize the performance of observers in signal known exactly (SKE) or signal known statistically (SKS) and background known statistically (BKS) detection paradigms. The implemented likelihood ratio detection paradigm, a Bayesian ideal observer, takes advantage of the volumetric data available in tomosynthesis images, incorporates both lesion and background uncertainty in decision-making, and uses an optimal decision metric with the help of Monte Carlo integration techniques.9 In a doubly composite hypothesis paradigm, the numerator and the denominator of the likelihood ratio are each typically modeled mathematically by joint probability density functions (pdfs) of the data, conditional on unknown parameters (the realizations of the ensemble), which are then weighted by the prior probabilities of the parameters and integrated. An aspect of our approach is that it obtains the ensemble of realizations needed for the Monte Carlo integration directly from the ensemble of virtual breast phantoms. 
Initial exploratory human observer studies were conducted and receiver operating characteristic curves (ROCs) were obtained to get insight into whether this might also model the detection performance of experienced human observers who have some memory of previous images and decision performance. This study considers the particular task of deciding the presence or absence of a lesion in which there is uncertainty in both the structure of the lesion as well as the background.
2. Methods
Detecting a lesion in tomosynthesis images in a location-known-exactly sense can be modeled as classifying a given volume of interest (VOI) as corresponding to only background tissue or to background tissue plus a lesion. This detection/classification problem can be formulated in terms of a hypothesis test, where the null hypothesis indicates the sole presence of noisy and uncertain background tissue and the alternative hypothesis indicates the presence of a lesion in noisy and uncertain background tissue in a VOI in the tomosynthesis image set. In other words, the detector, emulating a radiologist reader, is trying to decide whether a VOI in tomosynthesis images contains a lesion or is simply normal breast tissue. The readers are usually trained by studying images of numerous cases and their corresponding pathological appearance. For each modality, the readers have practically formed a training library of statistically known background tissue variations with and without lesions. In the case of tomosynthesis, the readers have access to slices through the reconstructed volume of a compressed breast, which they can scroll through, look at in cine, or look at via their maximum intensity projection rendering. In this study, we considered the detector to have access to the tomosynthesis reconstructed slices in a VOI, forming such a “training” library. The following sections elaborate on the design and implementation of this detection paradigm and its performance analysis.
2.1. Population Modeling
The XCAT breast phantoms were generated through processing and segmenting dedicated breast CT images of a large number of human subjects acquired at the University of California, Davis, as part of an institutional review board approved study using a prototype dedicated breast CT scanner.3–7 A group of 20 different breast phantoms from the XCAT family was selected to represent the population with different shapes, sizes, and densities. Each phantom voxel was assigned to adipose (0% dense) or three fibroglandular classes of 80%, 90%, and 100% to create a realistic “feathering effect” at the transitions between adipose and fibroglandular tissue. The mesh models for these breast phantoms were compressed to 4 cm thickness in the craniocaudal direction using a simplistic mathematical technique.3 The breast phantoms can also be compressed using finite-element compression techniques.10 Figure 1 shows the middle axial slice through the 4-cm-thick compressed volume of the 20 breast phantoms. The gray values correspond to different fibroglandular tissue classes based on density. The various anatomical distributions of the tissues, the different shapes, sizes, and densities of the breasts, can be appreciated in this figure.
Fig. 1.
Middle slice through the 4-cm-thick compressed volume of the 20 breast phantoms used in the study. The gray values correspond to different fibroglandular tissue classes based on density. Note the various shapes, sizes, and densities of the phantoms.
2.2. Lesion Modeling
The lesion models for this study were generated from the tomosynthesis images of five different human subjects with biopsy-proven cancers. These masses were segmented and used to inform the fitting of a mathematical volume with a Gaussian edge profile11 to be bound in a cube. Figure 2 depicts the middle slice through the tomosynthesis images of the human subjects and the corresponding models generated based on them.
Fig. 2.
Five lesion models: middle tomosynthesis slice through the lesions in human subject images (top), volume rendering of the generated models (middle), and the three fibroglandular classes after scaling the lesions to fit in a bounding box (bottom).
A grid with 200 positions 1 cm apart was designed to embed lesions in the central depth of each breast phantom. This arrangement was designed to take advantage of as much of the breast volume as possible. Only locations falling within the uniformly compressed 4-cm-thick region of each breast were used for fair comparison. Figure 3 depicts the lesion grid with one of the five lesion models in all of its 200 positions. Using this technique, a total of 2173 grid positions were identified for lesion placement within the 20 breast phantoms.
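The grid construction and the restriction to the uniformly compressed region can be illustrated with a minimal sketch. The 10 × 20 layout, the thickness map, the pixel size, and the function names below are assumptions for illustration only; the source specifies just 200 positions spaced 1 cm apart and the 4-cm-thick criterion. The sketch also checks only the grid point itself, not the full lesion extent.

```python
import numpy as np

def lesion_grid(n_rows=10, n_cols=20, spacing_cm=1.0):
    """Return (row, col) grid centers in cm; the 10 x 20 layout is an assumption."""
    rows, cols = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    return np.column_stack([rows.ravel(), cols.ravel()]) * spacing_cm

def usable_positions(grid_cm, thickness_cm, pixel_cm, full_cm=4.0, tol_cm=0.01):
    """Keep grid points located where the compressed thickness equals full_cm.

    thickness_cm is a hypothetical 2-D map of compressed breast thickness,
    sampled with pixel size pixel_cm and sharing its origin with the grid.
    """
    idx = np.rint(grid_cm / pixel_cm).astype(int)
    n_r, n_c = thickness_cm.shape
    inside = (idx[:, 0] < n_r) & (idx[:, 1] < n_c)
    keep = np.zeros(len(grid_cm), dtype=bool)
    local = thickness_cm[idx[inside, 0], idx[inside, 1]]
    keep[inside] = np.abs(local - full_cm) <= tol_cm
    return grid_cm[keep]
```

Applying such a filter to each phantom and pooling the surviving positions is one way to arrive at a per-population count of usable locations, as described above.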
Fig. 3.
Lesion grid with 200 positions 1 cm apart filled with the same lesion model.
2.3. Image Formation
The geometry, spectrum, and physical characteristics of a tomosynthesis unit (Siemens MAMMOMAT Inspiration, Erlangen, Germany) were used to simulate the tomosynthesis images. Twenty-five projection images spanning a 45-deg arc were simulated by ray tracing through the mesh models of the breast phantoms, implemented on a graphics processing unit (GPU) cluster. A conventional mammographic spectrum with W/Rh at 30 kVp was used for generating the projection images. Pixels were binned to size to limit computational complexity. Considering the spatial extent of the imaging features, this process was expected to have negligible effects on the results.
Noise and scatter were simulated and added to the projection images. The scatter contribution was generated by convolution of the scatter point spread function with the primary image,12 then scaled by the empirically measured scatter-to-primary ratio for this thickness and energy. The noise magnitude as a function of binned pixel value and the noise power spectrum (NPS) of the tomosynthesis system were estimated based on a 50% glandular, 4-cm uniform phantom (Computerized Imaging Reference Systems Inc., Norfolk, Virginia). To add the noise to the projection images, a Gaussian noise map, which is a good approximation of the Poisson noise in the projection images, was created and filtered by the square root of the NPS curve. This map was then multiplied by an intensity-to-noise magnitude map defined by the measured trends to give the overall noise pattern.13 The magnitude of the added noise corresponded to an average glandular dose of 1 mGy.
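The noise step described above (a Gaussian map shaped by the square root of the NPS, then scaled by an intensity-dependent magnitude) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the NPS array is assumed to be sampled on the FFT frequency grid and to be symmetric, `sigma_of_value` is a hypothetical stand-in for the measured intensity-to-noise-magnitude trend, and scatter is not included.

```python
import numpy as np

def radial_frequency(shape):
    """Radial spatial frequency (cycles/pixel) on the FFT grid."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.hypot(fy, fx)

def correlated_unit_noise(shape, nps, rng):
    """White Gaussian noise filtered by sqrt(NPS), normalized to unit variance."""
    white = rng.standard_normal(shape)
    shaping = np.sqrt(nps / nps.mean())  # mean-normalized so variance stays ~1
    return np.fft.ifft2(np.fft.fft2(white) * shaping).real

def add_noise(projection, nps, sigma_of_value, rng):
    """Add correlated noise whose magnitude depends on the local pixel value."""
    unit = correlated_unit_noise(projection.shape, nps, rng)
    return projection + unit * sigma_of_value(projection)
```

For example, `add_noise(image, nps, lambda v: np.sqrt(v), np.random.default_rng(0))` would apply a Poisson-like square-root magnitude law; the actual magnitude curve in the study was measured from a uniform phantom.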
A standard filtered back-projection algorithm with a cosine filter implemented on a GPU cluster was used to reconstruct the tomosynthesis slices at 1 mm slice spacing and in-plane resolution.14,15 To estimate the noise in the tomosynthesis reconstructed images, a requirement of our likelihood ratio calculations, projection images of a uniform phantom were simulated with added noise and scatter and reconstructed. The first-order pdf of the reconstructed noise was found to correspond to a zero-mean Gaussian noise, but presumed not to be white. Figure 4 shows simulated projection and reconstructed images of a breast phantom with various lesions embedded in its middle depth, with and without simulated noise and scatter.
Fig. 4.
Central projection image of a breast phantom with lesions embedded in its middle depth (a) without and (b) with noise and scatter. The middle slice through the tomosynthesis reconstructed volume of the same breast phantom (c) without and (d) with noise and scatter.
2.4. Observer Modeling
2.4.1. Detection paradigm
It is commonly known that dense parenchymal background tissue may obscure lesions, leading to degradation in lesion detection.16–20 At the same time, parenchymal tissue can mimic lesions, leading to a high number of false positives.16–20 In this study, we aimed to quantify the impact of background tissue heterogeneity on lesion detection and put that in comparison with background tissue density. As such, we evaluated the dependence of detection performance on both background tissue density and heterogeneity.
The detection problem can be formulated as a doubly composite signal detection theory paradigm8 where the null hypothesis indicates only noisy and uncertain background tissue, whereas the alternative hypothesis indicates the presence of a lesion in noisy and uncertain background tissue. The optimum detector for this situation forms the likelihood ratio, which incorporates the uncertainties in an optimal way. In the context of observer modeling, the likelihood ratio detector is sometimes called a Bayesian ideal observer since the ROC using this optimal approach provides a realistic upper bound on detection performance.
To reiterate, the detector is presented with a VOI in reconstructed tomosynthesis images and processes the information so as to decide whether there is a lesion in the VOI or if it is solely background tissue. Each VOI has a fixed number of voxels. The detector has access to a number of different realizations of a VOI of this size in its training dataset, which represent the uncertainty in the background tissue variations; these background-only VOIs form the first training set. The detector also has access to a copy of each of these VOIs with a given lesion model embedded in the same background tissue, for all lesion models; these lesion-present VOIs form the second training set. The detector training dataset therefore consists of both sets. The detector classifies a vectorized test VOI as belonging to either hypothesis,
| (1) |
where the additive term represents the noise. Note that noise refers to the aggregate deteriorating effects of system quantum noise, scatter, and other physical characteristics on reconstructed tomosynthesis images. To assess the effect of the background tissue on detection performance, noise was modeled as additive zero-mean white Gaussian noise with a standard deviation estimated from the tomosynthesis images.
The likelihood ratio is formed by calculating the joint pdf of a VOI conditional on each of the two hypotheses. In this work, a Monte Carlo integration technique is used to approximate the marginal pdfs. This results in a form of the likelihood ratio that uses the available realizations of a VOI in the training dataset.9 This direct use of actual data realizations is a major feature of this approach, as opposed to trying to capture the characteristics of the data through multivariate Gaussian modeling. Equations (2)–(4) show the derivation of the likelihood ratio for a given test VOI for the doubly composite hypothesis signal detection theory problem, namely where there is background uncertainty present under both hypotheses and lesion uncertainty under the alternative hypothesis. In particular, the likelihood ratio directly incorporates the uncertain background information by capturing its dependencies and correlations. It was also assumed that the training dataset does not include the exact background tissue of the test VOI, with or without a lesion embedded:
| (2) |
Substituting the conditional probability distribution functions of the test voxels under each hypothesis based on Eq. (1) in the likelihood ratio results in
| (3) |
The VOIs in the training dataset were assumed to be equiprobable. Therefore,
| (4) |
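As a reading aid, the structure that the text describes for Eqs. (1)–(4) can be sketched as follows. The notation is assumed here and is not taken from the source: $\mathbf{x}$ is the vectorized test VOI, $\mathbf{b}_i$ is the $i$'th of $N$ background-only training VOIs, $\mathbf{g}_{i,l}$ is the same background with lesion model $l$ of $L$ embedded, and $\mathbf{n}$ is zero-mean white Gaussian noise with standard deviation $\sigma$. The sums are understood to exclude the background of the test VOI.

```latex
% Assumed notation; a sketch of the described structure, not the authors' equations.
\begin{align}
H_0 &: \mathbf{x} = \mathbf{b} + \mathbf{n}, \qquad
H_1 : \mathbf{x} = \mathbf{g}_{l} + \mathbf{n}, \qquad
\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}) \\
\Lambda(\mathbf{x}) &= \frac{p(\mathbf{x} \mid H_1)}{p(\mathbf{x} \mid H_0)}
\approx
\frac{\frac{1}{NL} \sum_{l=1}^{L} \sum_{i=1}^{N}
      \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{g}_{i,l} \rVert^2}{2\sigma^2}\right)}
     {\frac{1}{N} \sum_{i=1}^{N}
      \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{b}_{i} \rVert^2}{2\sigma^2}\right)} \\
&=
\frac{\frac{1}{NL} \sum_{l=1}^{L} \sum_{i=1}^{N}
      \exp\!\left(\frac{\mathbf{x}^{\mathsf T} \mathbf{g}_{i,l}
      - \tfrac{1}{2} \lVert \mathbf{g}_{i,l} \rVert^2}{\sigma^2}\right)}
     {\frac{1}{N} \sum_{i=1}^{N}
      \exp\!\left(\frac{\mathbf{x}^{\mathsf T} \mathbf{b}_{i}
      - \tfrac{1}{2} \lVert \mathbf{b}_{i} \rVert^2}{\sigma^2}\right)}
\end{align}
```

The last line follows because the Gaussian normalization constants and the common factor $\exp(-\lVert \mathbf{x} \rVert^2 / 2\sigma^2)$ cancel between numerator and denominator, leaving a cross-correlation term and an energy term in each exponent.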
Note that the final results are joint pdfs of the VOI conditional on each of the hypotheses and are typically dependent and in general not multivariate Gaussian and hence not representable by a mean vector and covariance matrix. Furthermore, this likelihood ratio goes beyond simple cross correlation by including the signal energy terms in both its numerator and denominator. The likelihood ratio is calculated for every VOI in the test dataset. The likelihood ratio may then be thresholded to calculate the probability of detection (sensitivity) and the probability of false alarm (1 − specificity) to obtain the ROC, or the resultant area under the curve (AUC). AUC values and their associated standard error and 95% confidence interval were computed with ROCkit software (University of Chicago, Chicago, Illinois, 2011).
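A minimal sketch of such a Monte Carlo likelihood ratio detector is given below, under stated assumptions: white Gaussian noise with known `sigma`, equiprobable training VOIs, a single lesion model, and training arrays in which row `i` of both sets shares the same background (so that `exclude=i` leaves that background out). The function names are hypothetical. The AUC here is the empirical (Mann-Whitney) estimate, which is a substitute for, not a reproduction of, the ROCkit analysis used in the study.

```python
import numpy as np

def log_mean_exp(a):
    """Numerically stable log(mean(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))

def log_likelihood_ratio(x, train_h0, train_h1, sigma, exclude=None):
    """Monte Carlo log-likelihood ratio for one vectorized test VOI.

    x: (n_voxels,) test VOI.
    train_h0: (n, n_voxels) background-only training VOIs.
    train_h1: (n, n_voxels) the same backgrounds with the lesion embedded.
    exclude: optional index (or indices) of backgrounds to leave out.
    """
    keep = np.ones(len(train_h0), dtype=bool)
    if exclude is not None:
        keep[exclude] = False
    d0 = np.sum((train_h0[keep] - x) ** 2, axis=1)
    d1 = np.sum((train_h1[keep] - x) ** 2, axis=1)
    scale = 2.0 * sigma ** 2
    return log_mean_exp(-d1 / scale) - log_mean_exp(-d0 / scale)

def empirical_auc(scores_h0, scores_h1):
    """Fraction of (lesion, no-lesion) pairs ranked correctly; ties count 0.5."""
    s0 = np.asarray(scores_h0, dtype=float)[None, :]
    s1 = np.asarray(scores_h1, dtype=float)[:, None]
    return float(np.mean((s1 > s0) + 0.5 * (s1 == s0)))
```

Working in the log domain avoids underflow when the squared distances are large relative to `sigma ** 2`, which is likely for VOIs with many voxels.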
2.4.2. Test and training datasets
Two different sets of tomosynthesis images were generated to constitute the test and training data under the two hypotheses. Under the null hypothesis, 2173 VOIs of fixed dimensions were extracted from the images of the 20 breast phantoms, without inclusion of any lesions, at the locations that fall within the uniformly 4-cm-thick region of each breast. These VOIs constituted the background-only set. Under the alternative hypothesis, on the other hand, tomosynthesis images were generated with the inclusion of a lesion grid. A set of 2173 VOIs of the same dimensions was extracted from the same locations as in the background-only set for each of the five lesion models, resulting in a total of 5 × 2173 VOIs. Different subsets of these VOIs could be used to constitute data for problems with different null and alternative hypotheses. The lesion-present sets are indexed by the lesion model(s) embedded in their VOIs; for instance, the set for lesion model 1 consists of the 2173 VOIs with lesion model 1 embedded in them, and the sets for lesion models 2, 3, 4, and 5 consist of the 2173 VOIs with those lesion models embedded in them, respectively, constituting 4 × 2173 VOIs in total. Figure 5 shows sample VOIs from one breast with and without lesions. The test and training datasets were used in a leave-one-out fashion, meaning that each testing background was excluded from training. The density of every VOI was calculated from the corresponding region in the voxelized breast phantom as the fraction of fibroglandular tissue in the volume. Figure 6 shows the distribution of the VOI density across the 2173 VOIs. The mean VOI density per breast for the 20 breast phantoms was 0.23, in a range of 0.02 to 0.77.
Fig. 5.
Middle slice through adjacent tomosynthesis reconstructed VOIs from one breast (a) with a lesion model embedded in them and (b) without any lesions. Notice that the lesions are invisible in certain VOIs as a result of tissue superposition, noise, and scattering.
Fig. 6.
Histogram of the distribution of VOI densities across 2173 VOIs.
2.5. Detection Analysis
2.5.1. Effect of background tissue density uncertainty
To study the effect of background tissue density on detection performance, the 2173 background realizations were divided equally into five categories based on the density of the VOI, denoted by . The effect of background tissue density uncertainty was evaluated in two different paradigms described below. Note that under both paradigms, the detector had to classify a test VOI in a given density category as either background-only or background plus a known lesion. The background was only known statistically.
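One way to form such equal-count density categories is to rank the VOIs by density and split the ranking into five nearly equal parts. The sketch below is an assumption about the mechanics, not the authors' code; since 2173 is not divisible by five, the category sizes here would differ by at most one.

```python
import numpy as np

def equal_count_categories(density, n_categories=5):
    """Label each VOI with a density category holding (nearly) equal counts.

    Category 0 holds the lowest densities, category n_categories - 1 the highest.
    """
    density = np.asarray(density, dtype=float)
    order = np.argsort(density, kind="stable")
    labels = np.empty(len(density), dtype=int)
    for k, chunk in enumerate(np.array_split(order, n_categories)):
        labels[chunk] = k
    return labels
```

The density bounds of each category then follow from the minimum and maximum density among the VOIs carrying a given label.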
SKE-BKS: First, it was assumed that the detector is trained in all density categories but is tested in a certain density category. The test dataset in each density category contained one fifth of the cases in the training dataset. For each density category, performance was evaluated as a function of the test and training sample size.
SKE-BKS-density known statistically (SKE-BKS-DKS): Next, it was assumed that the detector knows the background tissue density category of the test VOI but not its particular background tissue density, which is part of the underlying uncertainty. The detector was both tested and trained in the same density category. This paradigm examined the effect of background uncertainty with the knowledge of background density category.
2.5.2. Effect of lesion uncertainty
To characterize the effect of lesion uncertainty on detection performance, two different paradigms were considered. The background was only known statistically.
SKE-BKS-across lesions: First, it was assumed that the detector exactly knows the lesion; this situation was implemented for every one of the five lesion models. This paradigm examines the effect of lesion variability on detection performance.
SKS-BKS: Next, the detector was assumed to only have a statistical knowledge of the lesion in the sense that it is trained with a finite number of similar lesion possibilities. The purpose of this paradigm was to examine the effect of lesion uncertainty in addition to background uncertainty when the detector is trained with several lesion models but tested on a new lesion model.
2.5.3. Effect of background tissue heterogeneity uncertainty
Up to this point, the data collected from the 20 different breast phantoms were combined to evaluate the detection performance. To evaluate the detection performance in an individual breast, the lesion known exactly, background known statistically paradigm was selected for each VOI and the detector assumed the VOIs for a given breast were statistically independent.
The trends in the log-likelihood ratio under the null and alternative hypotheses at the 2173 VOI locations across the 20 breast phantoms were compared with the trends in background tissue density to examine the contribution of background tissue density and heterogeneity to detection performance.
2.6. Human Observer Study
The implemented detector was validated with a human observer study. The observers were presented with images from the simulated tomosynthesis images one by one to score their confidence in the presence of a lesion in the images. The midslices of 200 tomosynthesis VOIs were presented to the observers in random order. These images were displayed to the scale at on a grayscale standard display function-calibrated, 5-megapixel display. The image set contained 20 VOIs from each of the five density categories, with and without a lesion embedded in them. The images included just one lesion model. The observers were comprised of two medical physicists (J.Y.L. and E.S.) with many years of experience evaluating image quality for tomosynthesis and five students familiar with medical imaging. There was no significant difference in their performance. Since the detection task was relatively simple and no clinical diagnosis was required, it was sufficient to rely upon physicists and students for this study. The performances of the observers were averaged in each density category, and interobserver error bars were calculated. Figure 7 shows sample images presented to the observers.
Fig. 7.
Sample midslice images of the VOIs in the five density categories (density increasing from top to bottom), (a) with embedded lesion, or (b) background-only.
3. Results
3.1. Human Observers Performance
Figure 8 shows the average ROC curve of all observers across the five density categories. The AUC values are also presented in a bar chart in Fig. 8. Detection performance changes with background density; as density increases, the detection performance deteriorates.
Fig. 8.
Average observer performance with respect to background density; error bars indicate interobserver variation. Note that the detection performance deteriorates as the background density increases.
3.2. Effect of Background Tissue Density Uncertainty
Figure 9 shows the ROC curve of the implemented detector under the SKE-BKS paradigm. Although the lesion is known exactly, the detector is faced with background uncertainty and does not know the density.
Fig. 9.
ROC curve for the SKE-BKS paradigm, where the lesion is known exactly but the background is only known statistically. The AUC is with 95% confidence interval [0.854, 0.876].
Under the same conditions, the ROC curves were generated for each density category separately to evaluate the effect of density uncertainty on the detection performance in each density category in the SKE-BKS paradigm [Figs. 10(a) and 10(b)]. As density increased across the five categories bounded by 0, 0.005, 0.049, 0.142, 0.332, and 1.0, the AUC decreased by 1%, 7%, 13%, and 19% at the successive category transitions. In other words, the detection performance deteriorated as the density increased. The concordance of these performance trends with those of human observers serves to validate the performance of the detector and the tools and models used in this study.
Fig. 10.
(a) ROC curves for different test density categories in the SKE-BKS paradigm, where the detector is trained on all density categories, and (b) AUC of each curve as a function of density. (c) ROC curves for different density categories in the SKE-BKS-DKS paradigm, where the detector is tested and trained on a specific density category, and (d) AUC of each curve as a function of density. The change in the density training paradigm increased the AUC by an average of 5% in each density category. Error bars denote standard error in AUC values.
With the test and training datasets divided into the five density categories in the SKE-BKS-DKS paradigm, the effect of more detailed knowledge of the density category on detection performance and the AUC are presented in Figs. 10(c) and 10(d). The results suggest that the knowledge of the density category improves the performance of the detector by an average of 5% increase in the AUC in each density category, compared to the SKE-BKS paradigm.
Under the SKE-BKS paradigm, in which the detector is trained on all density categories, the effect of varying the number of VOIs included in the test and training datasets on the AUC was studied to ensure the stability of the above findings. Note that when different sample sizes were used, the number of VOIs in each density category was not necessarily equalized. It appears that a minimum of 400 independent VOIs needed to be included in the training dataset for the results to approach a steady state. This finding ensured that the number of VOIs included in the study was sufficient.
3.3. Effect of Lesion Uncertainty
First, the effect of lesion variability on detection performance was examined in the SKE-BKS-across lesions paradigm. The five lesion models led to similar performance in terms of density dependence. The ROC curves corresponding to these paradigms are shown in Fig. 11(a). As the density increased over the same density categories as before, the AUC dropped. The AUC varied between 2% and 9% across the lesion models in the same density category. The standard deviation of the AUC values across the five lesions increased as the density increased over the same categories.
Fig. 11.
(a) ROC curves for the SKE-BKS across lesions paradigm and (b) the SKS-BKS paradigm, along with the SKE-BKS paradigm in solid lines for comparison.
Next, the ROC curves for the case of lesion and background known statistically in the SKS-BKS paradigm are shown in Fig. 11(b) for each density category. The AUC dropped between 2% and 7% in the same density categories in comparison with the SKE-BKS paradigm. These results are in accordance with the expectation that not knowing the exact characteristics of the lesion can make the detection harder. This last paradigm can be considered a preliminary result closer to the situation of a radiologist reader: the radiologist has seen many sample lesions but, as in this work, not the exact lesion in the current patient. Table 1 shows the AUC value for each of these paradigms along with its standard error and 95% confidence interval.
Table 1.
AUC values, standard errors, and 95% confidence intervals for different detection paradigms. Equalized density categories are denoted by their bounds on density.
| Paradigm | 0 to 0.005 | 0.005 to 0.049 | 0.049 to 0.142 | 0.142 to 0.332 | 0.332 to 1.0 |
|---|---|---|---|---|---|
| SKE-BKS (lesion 1) | [0.975, 0.991] | [0.957, 0.978] | [0.883, 0.925] | [0.753, 0.815] | [0.595, 0.663] |
| SKE-BKS (lesion 2) | [0.977, 0.992] | [0.961, 0.982] | [0.886, 0.924] | [0.752, 0.813] | [0.573, 0.647] |
| SKE-BKS (lesion 3) | [0.979, 0.993] | [0.964, 0.983] | [0.917, 0.947] | [0.780, 0.835] | [0.596, 0.670] |
| SKE-BKS (lesion 4) | [0.991, 0.999] | [0.984, 0.996] | [0.945, 0.972] | [0.838, 0.890] | [0.596, 0.716] |
| SKE-BKS (lesion 5) | [0.988, 0.998] | [0.979, 0.994] | [0.941, 0.966] | [0.829, 0.877] | [0.582, 0.656] |
| SKE-BKS-DKS (lesion 1) | [1, 1] | [0.987, 0.999] | [0.952, 0.977] | [0.815, 0.868] | [0.5876, 0.651] |
| SKS-BKS | [0.944, 0.968] | [0.908, 0.943] | [0.806, 0.858] | [0.699, 0.768] | [0.551, 0.625] |
Figure 12 shows the concordance of the most representative detection paradigm (SKE-BKS) with observer data as a function of density. The results indicate that the detector provides a density dependency similar, but not identical, to that of human observers. This difference remains an area for future exploration; however, it can potentially be attributed to the differences in the design of the detector and the human observer study. These data can nevertheless be used to calibrate the detector data to observer performance.
Fig. 12.
Scatter plot of AUC against density for the implemented detector and the human observers. The SKE-BKS paradigm, which most closely relates to the human observer study, was selected for this comparison. Each density category is presented by its mean value. The standard error bars are presented for the SKE-BKS data points.
3.4. Effect of Background Tissue Heterogeneity Uncertainty
Finally, performance across different VOIs is analyzed to determine the effect of tissue heterogeneity. As suggested by the AUC values for individual breasts (mean: 0.81; standard deviation: 0.148), detection performance varies substantially across different breasts and it tends to be closely related to the breast density.
To further visualize the location-wise dependence of detection performance within every breast, Fig. 13 shows the correspondence between background tissue density, background tissue heterogeneity, the log-likelihood ratio under the null hypothesis, and the log-likelihood ratio under the alternative hypothesis at every VOI location across the 20 breast phantoms. The corresponding log-likelihood ratios under the two hypotheses at the VOI locations, as calculated per Eqs. (2)–(4), are presented in the form of two heat maps. Comparing the same VOI location across the middle tomosynthesis slice and the heat maps suggests that, in general, the log-likelihood ratio under the null hypothesis follows the density trend and the log-likelihood ratio under the alternative hypothesis follows the opposite trend; that is, the former is generally higher where the density is higher, and the latter is higher where the density is lower. However, there are several locations for which the reverse is observed. This is not too surprising, since it is the uncertain heterogeneity of the background that is being modeled using actual realizations of tomosynthesis reconstructed images, for which the density is just one parameter that only partially characterizes the uncertainty in the background.
Fig. 13.
(a) Middle slice through the tomosynthesis reconstructed volume, (b) the corresponding heat map of extracted VOI density, and (c and d) maps of the log-likelihood ratio under the null and alternative hypotheses per Eqs. (2)–(4) for the 20 breast phantoms. The AUC of the SKE-BKS paradigm in a given breast varied considerably across the 20 breasts (mean: 0.81; standard deviation: 0.148).
Figure 14 shows scatter plots of the log-likelihood ratio under $H_0$ and the log-likelihood ratio under $H_1$ against density across all VOIs. The data suggest that the relationship between density and the log-likelihood ratio under $H_0$ or $H_1$ is not linear and can instead be approximated by a quadratic fit. As density varied from 0 to 1, the quadratic fit to the log-likelihood ratio under $H_0$ varied by 3.36, and the quadratic fit to the log-likelihood ratio under $H_1$ varied by 17.63. This suggests that density has a larger effect on the log-likelihood ratio under $H_1$ than on the log-likelihood ratio under $H_0$.
Fig. 14.
Distribution of the log-likelihood ratio under (a) $H_0$ and (b) $H_1$ per Eqs. (2)–(4) versus density, with the corresponding quadratic fits. Dotted vertical lines delimit the density categories, which contain equal numbers of samples, and solid vertical lines at the mid-density points indicate the standard deviation of the log-likelihood ratios in each density category.
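A quadratic fit of this kind, and the amount by which the fitted curve changes over the density range, can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the data are synthetic, and "varied by" is read here as the difference between the maximum and minimum of the fitted curve over densities from 0 to 1, which may differ from the definition used in the study.

```python
import numpy as np


def quadratic_fit_range(density, llr, n_grid=101):
    """Fit llr ~ c2*d**2 + c1*d + c0 and return (coeffs, max - min on [0, 1])."""
    density = np.asarray(density, dtype=float)
    llr = np.asarray(llr, dtype=float)
    coeffs = np.polyfit(density, llr, deg=2)  # highest power first
    grid = np.linspace(0.0, 1.0, n_grid)
    fitted = np.polyval(coeffs, grid)
    return coeffs, float(fitted.max() - fitted.min())


# Synthetic example (hypothetical values): a noisy, decreasing quadratic trend.
rng = np.random.default_rng(1)
d = rng.uniform(0.0, 1.0, 500)
llr = -10.0 * d**2 - 5.0 * d + 8.0 + rng.normal(0.0, 2.0, d.size)

coeffs, span = quadratic_fit_range(d, llr)
print("quadratic coefficients:", np.round(coeffs, 2))
print("change of the fitted curve over density 0 to 1:", round(span, 2))
```

Comparing this span between the two hypotheses gives a simple scalar measure of how strongly density alone drives each log-likelihood ratio.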
Furthermore, the fact that the same density results in multiple possible values for both the log-likelihood ratio under $H_0$ and the log-likelihood ratio under $H_1$ emphasizes that density is not the only factor playing a role in detection performance.
Background tissue heterogeneity, though yielding the same average VOI density, can significantly change the detection performance. As a result of background tissue heterogeneity, an average standard deviation of was measured in the log-likelihood ratio under and within each equalized density category, as separated by vertical dotted lines in Fig. 14. These results suggest that, based on the metrology of this study alone, background tissue heterogeneity can affect the log-likelihood ratio under times as much as background tissue density, while background tissue density seems to be affecting the log-likelihood ratio under slightly more than background tissue heterogeneity.
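The within-category spread described above can be sketched as follows: VOIs are sorted by density, split into categories with (nearly) equal numbers of samples, and the standard deviation of the log-likelihood ratio is computed within each category and then averaged. The number of categories, the function name, and the use of the sample standard deviation are assumptions made for illustration, not values taken from the study.

```python
import numpy as np


def within_category_std(density, llr, n_categories=4):
    """Std of llr inside equal-count density categories, plus their average."""
    density = np.asarray(density, dtype=float)
    llr = np.asarray(llr, dtype=float)
    order = np.argsort(density, kind="stable")     # VOI indices by density
    groups = np.array_split(order, n_categories)   # near-equal sample counts
    stds = np.array([llr[g].std(ddof=1) for g in groups])
    return stds, float(stds.mean())


# Synthetic example (hypothetical values): spread around a density trend.
rng = np.random.default_rng(2)
d = rng.uniform(0.0, 1.0, 400)
llr = -15.0 * d + rng.normal(0.0, 3.0, d.size)

stds, mean_std = within_category_std(d, llr, n_categories=4)
print("per-category standard deviations:", np.round(stds, 2))
print("average within-category standard deviation:", round(mean_std, 2))
```

Because density is nearly constant inside a narrow category, the remaining spread can be attributed mostly to factors other than average density, such as heterogeneity.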
4. Discussion
Given the recent advances in computation and modeling, virtual clinical trials (VCTs) can be carefully designed and carried out to inform, orient, or potentially replace clinical trials. VCTs involve simulation of the patient population, image formation, and the observers. In this study, we elaborated on the employment and advancement of the sophisticated tools and models that were developed in previous studies3–7 and can potentially be used in a VCT.
This study focused on the detection of irregular masses in uncertain heterogeneous background tissue as perceived from tomosynthesis reconstructed images. The detection task was defined as deciding whether a given VOI includes a lesion or not, under lesion known exactly or statistically and background known statistically paradigms. The detection performance analysis was carried out rigorously through sensitivity and specificity analyses, characterizing the effects of background tissue density and heterogeneity uncertainty and lesion uncertainty on the detection performance.
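To make the sensitivity and specificity analysis concrete, the sketch below sweeps a decision threshold over a set of test statistics, records the sensitivity and specificity at each threshold, and integrates the resulting ROC curve with the trapezoidal rule. The function names and the synthetic scores are hypothetical, and the rule "decide lesion present when the statistic is at least the threshold" is an assumed convention.

```python
import numpy as np


def roc_points(scores_absent, scores_present):
    """Sensitivity and specificity at every distinct threshold (descending)."""
    a = np.asarray(scores_absent, dtype=float)
    p = np.asarray(scores_present, dtype=float)
    distinct = np.unique(np.concatenate((a, p)))[::-1]
    thresholds = np.concatenate(([np.inf], distinct))
    # Decide "lesion present" when the score is >= threshold.
    sensitivity = np.array([(p >= t).mean() for t in thresholds])
    specificity = np.array([(a < t).mean() for t in thresholds])
    return thresholds, sensitivity, specificity


def trapezoid_auc(sensitivity, specificity):
    """Area under sensitivity versus (1 - specificity)."""
    fpr = 1.0 - np.asarray(specificity, dtype=float)
    tpr = np.asarray(sensitivity, dtype=float)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Synthetic test statistics (hypothetical values).
rng = np.random.default_rng(3)
absent_scores = rng.normal(0.0, 1.0, 200)
present_scores = rng.normal(1.2, 1.0, 200)

thr, sens, spec = roc_points(absent_scores, present_scores)
print("AUC:", round(trapezoid_auc(sens, spec), 3))
```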
The observer model used in this study surpasses the models used in previous studies by simultaneously considering several factors: (1) it took advantage of the 3-D data provided by tomosynthesis, (2) it incorporated the uncertainty resulting from realistic background tissue variations in the detection task, (3) it incorporated the uncertainty resulting from lesion variation in the detection task, (4) it took advantage of the realistic data realizations in evaluating the decision metric rather than constructed mathematical uncertainty, and (5) it is ideal in the sense of using likelihood ratios as its decision metric.
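A minimal sketch of a likelihood-ratio detector that averages over background and lesion realizations is given below. It is not a reproduction of Eqs. (2)–(4): it assumes additive white Gaussian noise with known standard deviation `sigma`, approximates each composite likelihood by a sample average over training realizations, and uses hypothetical function names and synthetic data.

```python
import numpy as np


def log_mean_exp(x):
    """Numerically stable log of the mean of exp(x)."""
    x = np.asarray(x, dtype=float)
    m = x.max()
    return float(m + np.log(np.mean(np.exp(x - m))))


def log_likelihood_ratio(voi, backgrounds, lesions, sigma):
    """Log-likelihood ratio of H1 (lesion present) versus H0 (lesion absent).

    voi: one test VOI; backgrounds: stack of training backgrounds;
    lesions: stack of lesion models (one model = lesion known exactly).
    Assumes white Gaussian noise; normalizing constants cancel in the ratio.
    """
    r = np.ravel(np.asarray(voi, dtype=float))
    b = np.asarray(backgrounds, dtype=float).reshape(len(backgrounds), -1)
    s = np.asarray(lesions, dtype=float).reshape(len(lesions), -1)
    # H0: VOI = background + noise, averaged over background realizations.
    ll0 = -np.sum((r - b) ** 2, axis=1) / (2.0 * sigma**2)
    # H1: VOI = background + lesion + noise, averaged over all pairs.
    resid = r[None, None, :] - b[:, None, :] - s[None, :, :]
    ll1 = -np.sum(resid**2, axis=2) / (2.0 * sigma**2)
    return log_mean_exp(ll1.ravel()) - log_mean_exp(ll0)


# Synthetic 3-D example (hypothetical data): 30 backgrounds, one lesion model.
rng = np.random.default_rng(4)
shape = (4, 4, 4)
backgrounds = rng.normal(0.0, 1.0, (30,) + shape)
lesion = np.zeros(shape)
lesion[1:3, 1:3, :] = 3.0
lesions = lesion[None, ...]
sigma = 1.0

voi_absent = backgrounds[0] + rng.normal(0.0, sigma, shape)
voi_present = backgrounds[0] + lesion + rng.normal(0.0, sigma, shape)
print("LLR, lesion absent :", log_likelihood_ratio(voi_absent, backgrounds, lesions, sigma))
print("LLR, lesion present:", log_likelihood_ratio(voi_present, backgrounds, lesions, sigma))
```

Passing several lesion models in `lesions` corresponds to treating the lesion as known statistically, whereas a single model corresponds to lesion known exactly.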
The performed human observer study served to validate the observer model used in this study. The human observers demonstrated deteriorating detection performance as the background density increased. The concordance of these trends with those of the implemented detector validated the application of the detector and the tools and models used to perform this simulation study.
Similar to the human observers, the detector performance results suggest that the detection performance deteriorates as the background tissue density increases. However, the knowledge of the background tissue density category seems to have an effect as well. The additional information about background tissue density category can somewhat improve the detection performance in the presence of uncertain background heterogeneity. The close agreement of these results with those reported previously16–20 may serve as a first-order validation of the simulation platform for possible VCTs.
The stability of the AUC values after inclusion of enough VOI realizations in the training dataset suggests that the total number of realizations used in the analysis was sufficient for drawing conclusions. Furthermore, the distribution of background tissue density was in agreement with reported average breast density in recent literature.21
When the same lesion model was used in the test and training datasets, the different lesion models resulted in similar performance trends with respect to density. When the lesion models in the training dataset differed from the lesion model in the test dataset, the AUC dropped in comparison with using the same lesion model in both datasets. These preliminary results are in accord with the expectation that not knowing the exact characteristics of the lesion can make detection harder and that including more lesion samples in the training dataset can help with detection.
Furthermore, it was hypothesized that detectability is affected not only by background tissue density uncertainty but also by background tissue heterogeneity uncertainty. In fact, density is only a partial characterization of the background uncertainty. The latter was observed by comparing the log-likelihood ratios under $H_0$ and $H_1$ with density. It was seen that not only did the likelihoods vary significantly across different breasts, but they also did not necessarily correspond to the background tissue density. This suggests that in addition to background tissue density, background tissue heterogeneity also plays a role in the detection performance. Although the test VOIs were not designed specifically to avoid background tissue density and heterogeneity interdependence, it can be deduced that both background tissue density and heterogeneity affect detection performance. Given the substantial role that uncertainty in the background tissue heterogeneity and density plays in the detection performance, it seems critical to include heterogeneous anatomical backgrounds with enough variation when performing VCTs.
The results reported in this study, though informative and promising, can be further refined by including a larger number of breast phantoms and lesion models, as well as including other lesion shapes, for better representation of the population, and by simulating higher-resolution images at the cost of computational resources. In that regard, it should be noted that both the breast tissue and lesion definitions included the inherent image properties of the original systems from which they were extracted (dedicated breast CT and digital breast tomosynthesis, respectively). Future work may include deblurring processes for better spatial definitions. The dose condition for the simulations targeted an average radiation condition corresponding to an average glandular dose of 1 mGy. However, given the density variability across the breast models, that condition corresponded to different glandular dose imaging conditions in different breasts. Future work may include a systematic evaluation of the effect of dose on detectability. More advanced modeling of the noise in projection images as well as reconstructed images can be carried out to account for underlying physical phenomena and noise correlations. Background tissue heterogeneity metrics can be devised and evaluated to better characterize the effects of background tissue density and heterogeneity with respect to each other. The effects of rotation, translation, and scale of training lesions can also be quantified. In the future, the performance of the devised detection paradigm can be compared with the performance of real observers.
5. Conclusion
This study focused on the detection of both certain and uncertain irregular masses in uncertain breast background tissue as perceived from tomosynthesis reconstructed images. Twenty breast models from the XCAT family with various shapes, sizes, and fibroglandular densities, representing a wide range of the population, were used to generate 2173 VOIs. Five irregular mass lesion models were generated from tomosynthesis images of human subjects and embedded in these VOIs. Tomosynthesis volumes were generated by reconstructing simulated projection images with added noise and scatter. The detection task was defined as identifying whether a given VOI includes a lesion or not, under lesion known exactly or statistically and background known statistically paradigms. The detection performance analysis was done rigorously through sensitivity and specificity analyses, which provide more insight into the performance than traditional figures of merit such as signal-to-noise ratio and contrast-to-noise ratio. Similarity of the trends in detection results with those of human observers served to validate the observer model. The differences between them, although they could be attributed to differences in study design, should remain an area of future exploration. The results suggested that both background tissue density uncertainty and higher-order heterogeneity uncertainties can affect detection performance. Given the significant role that uncertainty in the background tissue plays in the detection performance, it seems critical to include heterogeneous anatomical backgrounds with enough variation when performing VCTs. In the future, the framework presented in this work can be extended to real patient images to compute ROC maps that may aid diagnosis by human observers.
Acknowledgments
The authors wish to thank Jered Wells, Gregory Sturgeon, Adam Nolte, Yuan Lin, Matthew Reynolds, Kingshuk Roy Choudhury, and Maciej Mazurowski for their contributions and helpful discussions. This work was supported in part by Siemens Healthcare.
Biographies
Nooshin Kiarashi received her doctoral degree in electrical and computer engineering from Duke University while she was a member of the Carl E. Ravin Advanced Imaging Laboratories. Currently, she serves as a lead scientific reviewer at the Center for Devices and Radiological Health at the US FDA. Her research interests include development and application of advanced modeling and computing techniques to realize virtual clinical trials for optimization and evaluation of medical imaging systems.
Loren W. Nolte is a professor of electrical and computer engineering at Duke University. His research interests include developing Bayesian approaches to optimal signal detection, classification, localization and decision fusion in numerous applications. In collaboration with the Medical Physics Group at Duke, his research includes approaches that directly incorporate statistical information from real clinical data to improve sensitivity and selectivity in the detection of uncertain cancer tissue structures in the presence of background imaging uncertainties.
Joseph Y. Lo is a professor and associate vice chair for research of the Department of Radiology, Duke University School of Medicine. He also serves as director of the Carl E. Ravin Advanced Imaging Laboratories. His research focuses on development of anthropomorphic breast phantoms for virtual clinical trials, as well as radiogenomics for improved management of breast cancer.
W. Paul Segars is an associate professor of radiology and biomedical engineering and a member of the Carl E. Ravin Advanced Imaging Laboratories (RAILabs) at Duke University, Durham, North Carolina. He is among the leaders in the development of simulation tools for medical imaging research where he has applied state-of-the-art computer graphics techniques to develop realistic anatomical and physiological models.
Sujata V. Ghate is an associate professor of radiology specializing in mammography, tomosynthesis, breast US and MRI. She practices breast imaging in a university setting, teaching residents, fellows and medical students and collaborating on research projects with the Department of Medical Physics and Biomedical Engineering. She is currently a fellow of the Society of Breast Imaging, and on the advisory board for the CDC Breast and Cervical Cancer Early Detection and Control Advisory Committee.
Justin B. Solomon received his doctoral degree in medical physics from Duke University in 2016 and is currently a medical physicist in the Clinical Imaging Physics Group (CIPG) at Duke University Medical Center’s Radiology Department. His expertise is in x-ray computed tomography imaging and image quality assessment.
Ehsan Samei, DABR, FAAPM, FSPIE, is a tenured professor at Duke University and the director of the Duke Medical Physics Graduate Program and the Clinical Imaging Physics Program. His interests include clinically relevant metrology of imaging quality and safety for optimum interpretive and quantitative performance. He strives to bridge the gap between scientific scholarship and clinical practice by meaningful realization of translational research and the actualization of clinical processes that are informed by scientific evidence.
References
- 1. Kiarashi N., Samei E., “Digital breast tomosynthesis: a concise overview,” Imaging Med. 5(5), 467–476 (2013). 10.2217/iim.13.52
- 2. Kiarashi N., et al., “The impact of breast structure on lesion detection in breast tomosynthesis,” Proc. SPIE 9412, 941229 (2015). 10.1117/12.2082473
- 3. Li C. M., et al., “Methodology for generating a 3D computerized breast phantom from empirical data,” Med. Phys. 36(7), 3122–3131 (2009). 10.1118/1.3140588
- 4. Hsu C. M. L., et al., “An analysis of the mechanical parameters used for finite element compression of a high-resolution 3D breast phantom,” Med. Phys. 38(10), 5756–5770 (2011). 10.1118/1.3637500
- 5. Hsu C. M., et al., “Generation of a suite of 3D computer-generated breast phantoms from a limited set of human subject data,” Med. Phys. 40(4), 043703 (2013). 10.1118/1.4794924
- 6. Kiarashi N., et al., “Development and application of a suite of 4D virtual breast phantoms for optimization and evaluation of breast imaging systems,” IEEE Trans. Med. Imaging 33(7), 1401–1409 (2014). 10.1109/TMI.2014.2312733
- 7. Erickson D. W., et al., “Population of 224 realistic human subject-based computational breast phantoms,” Med. Phys. 43(23), 23–32 (2015). 10.1118/1.4937597
- 8. Lee S. C., Nolte L. W., Hatsell C. P., “A generalized likelihood ratio formula: arbitrary noise statistics for doubly composite hypotheses,” IEEE Trans. Inf. Theory 23, 637–640 (1977). 10.1109/TIT.1977.1055766
- 9. Shorey J. A., Nolte L. W., Krolik J. L., “Computationally efficient Monte Carlo estimation algorithms for matched field processing in uncertain ocean environments,” J. Comput. Acoust. 2(3), 285–314 (1994). 10.1142/S0218396X94000191
- 10. Sturgeon G. M., et al., “Finite-element modeling of compression and gravity on a population of breast phantoms for multimodality imaging simulation,” Med. Phys. 43(5), 2207–2217 (2016). 10.1118/1.4945275
- 11. Solomon J. B., Samei E., “A generic framework to simulate realistic lung, liver, and renal pathologies in CT imaging,” Phys. Med. Biol. 59(21), 6637 (2014). 10.1088/0031-9155/59/21/6637
- 12. Salvagnini E., et al., “Quantification of scattered radiation in projection mammography: four practical methods compared,” Med. Phys. 39(6), 3167–3180 (2012). 10.1118/1.4711754
- 13. Saunders R. S., Samei E., Hoeschen C., “Impact of resolution and noise characteristics of radiographic detectors on the detectability of lung nodules,” Med. Phys. 31(6), 1603–1613 (2004). 10.1118/1.1753112
- 14. Kak A. C., Slaney M., Principles of Computed Tomographic Imaging, SIAM, Philadelphia, Pennsylvania (2001).
- 15. Wu T., et al., “A comparison of reconstruction algorithms for breast tomosynthesis,” Med. Phys. 31(9), 2636–2647 (2004). 10.1118/1.1786692
- 16. Kerlikowske K., et al., “Effect of age, breast density, and family history on the sensitivity of first screening mammography,” JAMA 276(1), 33–38 (1996). 10.1001/jama.1996.03540010035027
- 17. Lehman C. D., et al., “Effect of age and breast density on screening mammography with false-positive findings,” Am. J. Roentgenol. 173(6), 1651–1655 (1999). 10.2214/ajr.173.6.10584815
- 18. Rafferty E. A., et al., “Diagnostic accuracy and recall rates for digital mammography and digital mammography combined with one-view and two-view tomosynthesis: results of an enriched reader study,” Am. J. Roentgenol. 202(2), 273–281 (2014). 10.2214/AJR.13.11240
- 19. Svahn T. M., et al., “Breast tomosynthesis and digital mammography: a comparison of diagnostic accuracy,” Br. J. Radiol. 85(1019), e1074–e1082 (2012). 10.1259/bjr/53282892
- 20. Mandelson M. T., et al., “Breast density as a predictor of mammographic detection: comparison of interval- and screen-detected cancers,” J. Natl. Cancer Inst. 92(13), 1081–1087 (2000). 10.1093/jnci/92.13.1081
- 21. Yaffe M. J., et al., “The myth of the 50-50 breast,” Med. Phys. 36(12), 5437–5443 (2009). 10.1118/1.3250863