Author manuscript; available in PMC: 2021 Apr 21.
Published in final edited form as: Med Phys. 2021 Jan 13;48(3):1054–1063. doi: 10.1002/mp.14657

Rapid measurement of the low contrast detectability of CT scanners

Akinyinka Omigbodun 1, J Y Vaishnav 2, Scott S Hsieh 3,4,a)
PMCID: PMC8058889  NIHMSID: NIHMS1692220  PMID: 33325033

Abstract

Purpose:

Low contrast detectability (LCD) is a metric of fundamental importance in computed tomography (CT) imaging. In spite of this, its measurement is challenging in the context of nonlinear data processing. We introduce a new framework for objectively characterizing LCD with a single scan of a special-purpose phantom and automated analysis software. The output of the analysis software is a “machine LCD” metric which is more representative of LCD than contrast-noise ratio (CNR). It is not intended to replace human observer or model observer studies.

Methods:

Following preliminary simulations, we fabricated a phantom containing hundreds of low-contrast beads. These beads are acrylic spheres (1.6 mm, net contrast ~10 HU) suspended and randomly dispersed in a background matrix of nylon pellets and isoattenuating saline. The task was to search for and localize the beads. A modified matched filter was used to automatically scan the reconstruction and select candidate bead localizations of varying confidence. These were compared to bead locations as determined from a high-dose reference scan to produce free-response ROC curves. We compared iterative reconstruction (IR) and filtered backprojection (FBP) at multiple dose levels between 40 and 240 mAs. The scans at 60, 120, and 180 mAs were performed three times each to estimate uncertainty.

Results:

Experimental scans demonstrated the feasibility of our technique. Our metric for machine LCD was the area under the exponential transform of the FROC curve (AUC). AUC increased monotonically from 0.21 at 40 mAs to 0.84 at 240 mAs. The sample standard deviation of AUC was approximately 0.02. This measurement uncertainty in AUC corresponded to a change in tube current of 4% to 8%. Surprisingly, we found that AUCs for IR were slightly worse than AUCs for FBP. While the phantom was sufficient for these experiments, it contained small air bubbles and alternative fabrication methods will be necessary for widespread utilization.

Conclusions:

It is feasible to measure machine LCD using a search task on a phantom with hundreds of beads and to obtain tight error bars using only a single scan. Our method could facilitate routine quality assurance or possibly enable comparisons between different protocols and scanners.

Keywords: dose reduction, low-contrast detectability, quality assurance

1. INTRODUCTION

How can we efficiently and objectively measure the low contrast detectability (LCD) in computed tomography (CT)? LCD captures the ability of a radiologist to detect a subtle lesion of clinical importance (such as metastatic cancer) for a fixed budget of radiation dose. Low contrast detectability is closely related to the more general system concept of dose efficiency. When the clinical task is the detection of subtle, low-contrast objects, improving LCD can be interpreted as improving dose efficiency. A host of technologies have appeared that attempt to improve LCD. Nonetheless, a simple and effective method for measuring LCD has not yet been devised. We will first describe four existing strategies for measuring LCD before introducing our work.

1.A. Existing strategies

One simple method for characterizing LCD is for a human reader to evaluate scans of a low contrast module of a quality assurance (QA) phantom. These phantoms contain lesions of varying contrast, and the reader can determine the minimum contrast or size for which detection remains possible. Commercial CT image quality phantoms, such as the ACR accreditation phantom (Gammex, Middleton, WI) or Catphan 600 (Phantom Laboratories, Salem, NY), are available for this purpose. However, because the locations of the signals in these phantoms are known to the reader, such evaluations are prone to subjective bias. The resulting LCD measurements are neither objective nor reproducible: two human readers can disagree about whether a particular insert is “detectable.” This subjective element means that studies cannot easily be standardized, nor their results precisely reproduced; even the same study, repeated with the same readers, could yield different results.

A second simple method for characterizing LCD is to measure contrast-to-noise ratio (CNR). CNR can be an adequate surrogate for LCD under certain limited conditions. ACR accreditation mandates that a minimum CNR be achieved for certain protocols. However, CNR cannot be used to compare across different reconstruction algorithms. Contrast-to-noise ratio can be inflated using a smoother kernel even if the LCD is not substantially improved. Contrast-to-noise ratio depends only on contrast and noise, while actual LCD depends on multiple other factors including signal size, shape, and density distribution; background level, variability, and correlation; the variance and covariance of measurement noise; spatial resolution; reconstruction kernel; and the observer and detection strategy used. Contrast-to-noise ratio is not a suitable metric to use for direct comparison of two different vendors’ scanners, and is a poor choice with iterative reconstruction (IR) algorithms that claim a higher apparent resolution on high contrast objects than on low contrast anatomy.1 Even more recent FBP algorithms may contain proprietary or nonlinear preprocessing.2 Contrast-to-noise ratio is an objective measure that is simple to calculate, but is otherwise a poor surrogate for LCD.

A third, more realistic method for characterizing LCD is to measure the performance of radiologists at detecting low contrast lesions in actual clinical images. These measurements are closest to clinical reality but are very difficult to obtain. In the context of comparing FBP to IR, clinical reader studies have been performed, including the detection of urinary tract stones3 and metastatic cancer.4 Such studies are time-consuming, expensive, and often have poor statistical power. If a new technology offers a substantial (e.g., 25%) dose reduction, a clinical study may be able to detect its effect. However, an improved detector or antiscatter grid may offer only a 5% or 10% improvement in quantum efficiency; while such a technology would incrementally improve a CT scanner, a clinical study may not offer sufficient statistical power to demonstrate its benefit. For example, Goenka et al. concluded in an 18-reader study studying detection of low contrast inserts in an anthropomorphic phantom that 75% exposure was noninferior to 100% exposure for both iterative reconstruction and FBP.5 From an imaging physics standpoint, this seems unlikely: reducing exposure should always degrade detectability, although the effect may be small. However, with the limited sample size of these studies, a conclusion of noninferiority is understandable. A study by Fletcher et al. analyzing real clinical images showed similar results,6 with 60% exposure noninferior to 100% exposure. The measured data suggest decreasing reader performance with decreasing exposure, as expected, but the noninferiority criteria remained valid until 60% exposure. While one lesson from these studies is that exposure can be safely reduced without substantially impacting reader performance, another lesson is that small differences in exposure are not easily detected in reader studies of moderate size. 
Also, while human observer studies provide valuable information, they cannot practically be deployed in routine clinical practice because of their expense and time requirements.

For these reasons, a fourth method has been developed for characterizing LCD: mathematical anthropomorphic model observers. These algorithms attempt to mimic human performance for certain clinical tasks, and hence we refer to them as “anthropomorphic” in this work. Task-based metrics of image quality, as measured via the use of anthropomorphic model observers, have been applied in clinical CT, including for optimization of radiation dose and scanning protocols7–10 as well as assessment of the effect of iterative reconstruction algorithms on LCD.11–14 A shortcoming of anthropomorphic model observers is that they tend to require tens to hundreds of images to generate statistically significant results, which poses an obstacle to their use in the context of quality assurance. Recent literature attempts to address this issue, via Bayesian estimation of figures of merit15 as well as the development of phantoms (21) with a simple, known design, and parametric observers specific to the phantom design; the method requires about 10 scans, and is not intended for images with random lumpy backgrounds or anatomical structures.

These four methods each have their strengths and weaknesses. Qualitative assessment is rapid but subjective. Contrast-to-noise ratio is useful under certain conditions but cannot be used to compare across scanners or different reconstruction algorithms. Human reader studies are valuable but resource intensive. Anthropomorphic model observers are a valuable adjunct but typically require tens to hundreds of scans and are not yet viable for routine quality assurance (QA). The ideal method for characterizing LCD would have four desirable characteristics: it would be objective, easy to run, useful for comparing across scanners or reconstruction algorithms, and predictive of human reader performance.

1.B. Proposed technique

We propose to build a phantom containing hundreds of low-contrast beads dispersed throughout its volume. The task is to search for and localize all of the low-contrast beads, which are at unknown locations. The search task was proposed by Popescu et al.16 for its increased statistical power. The clinical relevance of the search task is that many diagnoses rely on a search for a culprit lesion, such as a tumor or clot. We use hundreds of these beads in order to increase the statistical power of the exam.

The detection of the beads is performed by a computer algorithm rather than a human reader. Our algorithm is a modified matched filter, which is widely used for signal detection and is optimal under specific conditions. The bead localizations from the algorithm are compared against the ground truth to produce a summary statistic, which can be interpreted to be a measure of “machine LCD.” The summary statistic we used is area under a receiver operating characteristic curve (AUC of ROC).

Our proposed method for characterizing LCD satisfies three of our four aforementioned desirable characteristics: it is objective (producing a summary statistic that quantifies LCD), it is easy to run (requiring only one scan), and it is useful for comparing across scanners or reconstruction algorithms. It does not satisfy the final desirable characteristic of predicting human performance. Instead, it measures the ability of an idealized detector, the matched filter, to detect the low contrast beads. It may be possible to use an anthropomorphic model observer instead of the matched filter for localization. One option is a modified channelized Hotelling observer.17 However, that is outside the scope of this work.

We envision that our method can be used whenever a quantification of LCD is desired. Two specific examples include routine QA and protocol adjustment across a heterogeneous scanner fleet. Routine QA is appealing because it is performed on a daily or weekly basis and includes a measure of LCD, which is frequently CNR. Our method may be able to produce another metric that is more representative of LCD. Protocol adjustment across a heterogeneous scanner fleet is a challenge for many large institutions which include scanners from different vendors. One option that exists today is to attain a target CNR. However, the CNR depends on the reconstruction kernel, and a smooth kernel from one vendor may not match a smooth kernel from another vendor. Our method could be more robust to variations of kernel and could improve the uniformity of LCD across different scanners.

Many existing QA phantoms contain separate modules to measure resolution, low contrast, and CT number accuracy. We envision our approach as residing in a separate “machine LCD” module so that each time the QA phantom is scanned, the reconstruction of the machine LCD can be analyzed automatically with software. The LCD could thereby be easily quantified for the chosen protocol and scanner. However, for this exploratory work, instead we will build a proof-of-principle phantom using easily accessible components. Adoption into clinical practice would require a more robust, professionally developed phantom design.

2. MATERIALS AND METHODS

We simulated our methodology prior to phantom fabrication. These simulations do not mimic all aspects of our experimental methods but provide an illustrative reference for those who would like to reproduce our work, and the source code for the simulations is freely available. The simulations were used to validate the theoretical feasibility of using a search task with a dense arrangement of low contrast markers. Details of the simulations are described in Appendix A.

2.A. Phantom fabrication

To realize our methodology experimentally, we fabricated a phantom consisting of small acrylic spheres (“beads”) interspersed in nylon pellets (used in APSX-PIM injection molding) placed in a 6.67% saline solution, which approximates the attenuation of the nylon pellets. The nylon pellets together with the saline solution form a nearly uniform background, which we will refer to as the nylon-saline matrix. Note that the term “matrix” is used here to denote a surrounding medium, and not to imply that the nylon pellets are placed in a regular repeating pattern. Additionally, the acrylic beads are not fixed in place among the nylon pellets, and are free to move with agitation. As detailed in Section 2.B and depicted in Fig. 1, a high-dose scan is used to determine the ground truth of bead locations in the phantom. The phantom was not moved between the high-dose scan and subsequent lower dose scans. The beads (Clear Scratch- and UV-Resistant Acrylic Balls, McMaster-Carr Supply Company, Elmhurst, Illinois) were 1.6 mm in diameter and functioned as the low-contrast markers within the nylon-saline matrix. The nylon pellets were shaped as right cylinders of approximately 2.3 × 2.6 mm. This size was chosen to be comparable to the size of the acrylic spheres. The nominal densities of nylon and acrylic are 1.15 and 1.18 g/cm³, respectively. While there were 200 beads in all, we ignored beads close to the container boundary or large air bubbles, leaving 171 beads in our detectability analyses.

Fig. 1.


(Left) Picture of the phantom before scanning. The saline level is higher than the matrix and has a green hue due to the addition of detergent. Neither fills the jar completely. (Center) High-dose reference scan of the object scanned at 750 mAs and peak voltage of 140 kV. White dots correspond to low contrast beads (three white arrows). Some targets are in adjacent slices so are only faintly visible in this slice. Small black circles are air bubbles. (Right) Same object scanned at 120 mAs. Many of these beads are now difficult to see; only one of the three objects remains detectable (single white arrow).

A major difference between our work and the design proposed by Popescu16 and other authors is the decision to detect small spheres rather than rods. Popescu proposed embedding five low contrast rods into a water phantom. In each slice of the reconstructed volume, ROIs would be extracted containing the rod at different locations, but with the ROI offset so that the rod position would be randomized. In this work, we use hundreds of small spheres. Within a single slice, both spheres and rods present circular profiles, so naively, one may expect the difference between spheres and rods to be simply cosmetic. However, iterative reconstruction can exhibit a very narrow slice sensitivity profile18 for high contrast objects that may degrade for low contrast objects. One explanation for the dose reduction reported with IR is the presence of slice-to-slice correlations.19,20 To minimize these effects, we elected to use small spherical beads which can be contained within one slice. Another difference is that in our framework, the beads are positioned throughout the phantom, whereas the Popescu phantom has rods equidistantly positioned from the center. The equidistant arrangement improves the uniformity of the noise and resolution characteristics for the rods. The radial resolution of CT depends on the distance from center due to radial sampling patterns and the line focus effect, and the noise also varies due to differing attenuation path lengths through the object and bowtie filter.

The plastic container holding the saline solution was 17 cm in height and 9 cm in diameter. Air bubbles randomly form in our phantom during fabrication and are a contaminant. To reduce air bubble formation, we introduced a small quantity of liquid detergent (Dawn Professional Manual pot and pan detergent), a surfactant, into the saline solution to reduce the surface tension. We introduced 3 g of detergent during fabrication and 2 g shortly before scanning the phantom 1 week later, for a total of 5 g of detergent. While the surfactant causes foam to form at the surface, it simultaneously reduces the probability of bubble formation beneath the surface. Some air bubbles still remain in the phantom. These air bubbles produce negative contrast, so that they can be disambiguated from the acrylic beads, which produce positive contrast. Other investigators have been able to eliminate air bubble formation using other fabrication procedures.21

We scanned at a peak voltage of 140 kV in order to minimize the attenuation mismatch between the saline and nylon. The center panel of Fig. 1 shows a slight attenuation mismatch between the 6.67% saline solution and nylon in a high-dose scan. However, this is so slight that we do not believe that it affected our results.

By placing the plastic container in a larger holder and using beads of different sizes, our fabrication technique could be adapted to be a model of specific clinical tasks. If the container were placed into a larger 35 × 20 cm anthropomorphic phantom, and the beads increased to several mm in width, there would be resemblance to the detection of metastatic cancer in the liver. In these experiments, we scanned the plastic container by itself, with a size that is more comparable to a pediatric head scan. The detection of the small hyperdense spheres in this context is similar to the task of detecting small internal bleeds which may present after trauma, such as hemorrhagic parenchymal contusions. These bleeds appear as millimeter-sized dots of positive contrast which reflect the increased attenuation of blood relative to the brain background.22

2.B. Acquisition protocol

The phantom was scanned with an abdomen protocol on a Siemens Force scanner at settings from 40 to 240 mAs in 20 mAs increments. All scans were performed at a peak voltage of 140 kV, where we found the best match between saline and nylon. The scans at 60, 120, and 180 mAs were repeated three times to estimate the repeatability. The phantom was also scanned twice at 375 mAs; these scans were averaged together to serve as a high-dose reference scan, effectively at 750 mAs. The FBP reconstructions were performed using a B44 kernel, and IR reconstructions also included ADMIRE strength 3. We used a slice thickness of 3 mm with increments of 1.5 mm. This slice thickness ensured that the signal from an acrylic bead (1.6 mm) would be almost entirely contained in one slice, so that a 2D analysis could be applied directly to detect the beads. If the slice thickness had been comparable to the bead diameter and a 2D analysis had still been used, beads that were centered on a slice would have been easier to detect than beads that were split between two slices. While some evidence has pointed to the validity of 2D anthropomorphic model observers in a multi-slice environment,23 we felt that the simplest solution was to avoid this complication using a thicker slice.

2.C. Image processing and bead localization

The beads were localized using a modified matched filter. Figure 2 describes the major steps in our localization process.

Fig. 2.


Image processing workflow. The (bottom left) high-dose reference scan shows low contrast beads (two examples shown with white arrows) and an air bubble (black arrow). Starting from an (top left) axial slice of the reconstructed volume, we preprocess the image by segmenting out regions near the air or phantom edge, and we subtract the background. Next, we apply a matched filter (top center) by convolution with the template image. The template image is the averaged signal modified by the noise power spectrum (NPS) following the theory of the matched filter. Finally, we apply various thresholds. A high threshold (top right) selects only localizations of high confidence, but misses many beads. A low threshold (bottom right) detects more beads but also has more false positives.

The first step is the subtraction of the background. For an ideal CT scanner, the background would be uniform throughout the nylon-saline matrix. In practice, subtle shading artifacts related to helical interpolation, beam hardening, or scatter can occur. Each image was first clamped to a range between 70 and 100 HU. In particular, air voxels were set to 70 HU so that they do not unduly influence the estimate of the background. The nylon-saline matrix was approximately 85 HU, so the clamping process was not expected to distort the average value of the background. The clamped image was then convolved with a Gaussian filter with a standard deviation of 11 pixels. The background estimate was then subtracted from the initial image. The background changes from scan to scan because of varying gantry rotation start angle and associated helical interpolation artifacts. Therefore, a separate estimate of the background must be produced for each scan.
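The background-subtraction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Gaussian kernel is truncated at four standard deviations, and image borders are handled by edge replication, none of which is specified in the text. The clamp limits (70 and 100 HU) and the 11-pixel standard deviation are taken from the text.

```python
import numpy as np

def gaussian_kernel_1d(sigma, truncate=4.0):
    """Normalized 1D Gaussian kernel (truncation width is an assumption)."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur; borders use edge replication (an assumption)."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    out = np.asarray(image, dtype=float)
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded)
    return out

def subtract_background(image_hu, lo=70.0, hi=100.0, sigma_px=11.0):
    """Clamp to [lo, hi] HU, blur to estimate the background, then subtract."""
    clamped = np.clip(image_hu, lo, hi)          # air voxels become 70 HU
    background = gaussian_blur(clamped, sigma_px)
    return image_hu - background, background
```

Because the clamp is applied only when estimating the background, the subtraction itself still operates on the unclamped image, so bead contrast is preserved in the residual.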

The background-subtracted image was then convolved with a matched filter template image. The matched filter template image can be derived from an estimate both of the bead signal and the background noise. Beads were first detected in the 750 mAs reference scan by implementing a naïve-matched filter assuming white noise and a circular template three pixels in radius (2.0 mm in diameter). Because the beads do not present circular signals and the noise is not white, this naïve-matched filter was expected to have imperfect statistical qualities; however, the reference scan has sufficiently low noise that these assumptions appeared sufficient. A total of 171 beads were detected in the high-dose image, and the localizations were visually inspected to ensure that they were reasonable.

We calculated a separate matched filter from both the reference FBP reconstruction and IR reconstruction. We reasoned that IR might have different signal and noise properties, and that a matched filter derived from IR would perform better for localizing beads in IR images. In practice, we noticed very little difference, and we simply used the matched filter derived from FBP throughout. However, we did find that the average bead signal was weaker in IR. Figure 3 shows the average bead signal with both FBP and IR, calculated by averaging together a small region of interest (ROI) that surrounded all 171 beads. The peak value with FBP is approximately 9 HU, but the peak value with IR was slightly less, at about 7 HU. Note that the reconstructed slice is approximately twice as thick as the diameter of the acrylic sphere, and hence the intrinsic contrast of our beads is larger than these values.

Fig. 3.


(Top) Average bead signals for both FBP and iterative reconstruction (IR). Windowing is from 0 to 10 HU. (Bottom) The line profile compares the intensity along the central line (i.e., at y = 0).

Afterward, the ROIs surrounding the detected beads were averaged together to produce an estimate of the bead signal (“average signal” in Fig. 2). The background noise was estimated by sampling 2000 random background patches with replacement in the high-dose reference scan near the center of the object. Air pockets and beads were avoided in the sampling process. The noise power spectrum of each of these patches was calculated from the Fourier transform of the patch, and the spectra were averaged. Compared to a typical noise power spectrum, our empirical noise power spectrum has additional power at low frequencies which may stem from the “anatomical noise” of the mismatch between nylon and saline solution. To further reduce effects from artifacts or imperfect background subtraction, we limited the template image to be small, 19 by 19 pixels (6.4 × 6.4 mm). We expected that the bead signal would be local and did not anticipate performance loss from this choice. Under certain conditions (an unstructured noise power spectrum and a known bead signal), the matched filter provides a template image that, when convolved with the phantom, maximizes the detectability of the bead. If the noise were white, the template image would be identical to the averaged signal. Our noise power spectrum does not correspond perfectly to white noise, and hence the template image is different from the averaged signal, although the differences are not large as seen in Fig. 2. We derived the template image from the reference FBP scan. We considered that detections with IR might improve if the template image and noise pattern were instead derived from the reference IR scan, but when we tested this hypothesis, we observed virtually no advantage, and the detection over most reconstructions decreased slightly: the average AUC of FBP and IR detections decreased by 0.025 and 0.023, respectively.
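One common way to combine an averaged signal with a noise power spectrum is the prewhitening matched filter, in which the Fourier transform of the signal is divided by the NPS. The sketch below illustrates that construction; it is an assumption that this matches the authors' "modified" matched filter in detail. The function names are hypothetical, the patches are assumed to come from background-subtracted images of the same size as the template, the NPS is left unnormalized (which only rescales the template), and a small floor on the NPS is added to avoid division by near-zero values.

```python
import numpy as np

def estimate_nps(patches):
    """Average squared FFT magnitude over background patches.

    patches: array of shape (n_patches, ny, nx), assumed to be taken
    from background-subtracted images (unnormalized NPS).
    """
    patches = np.asarray(patches, dtype=float)
    return np.mean(np.abs(np.fft.fft2(patches)) ** 2, axis=0)

def matched_filter_template(avg_signal, nps, eps=1e-6):
    """Prewhitened template: FFT(signal) / NPS, returned in image space."""
    s_hat = np.fft.fft2(np.fft.ifftshift(avg_signal))
    floor = eps * nps.max()                      # guard against tiny NPS bins
    t_hat = s_hat / np.maximum(nps, floor)
    template = np.real(np.fft.ifft2(t_hat))
    return np.fft.fftshift(template)             # re-center the template
```

With a flat (white) NPS this returns the averaged signal up to a scale factor, consistent with the statement above that the template would be identical to the averaged signal if the noise were white.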

After convolution with the template image, the image was thresholded by a variable T. Regions of the image above T were processed with connected components analysis, and the center of mass of each region above T was calculated to produce a candidate localization. Areas of the image with high suspicion would lie above the threshold. T would be varied in the ROC analysis, to be described in the next section.
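The thresholding and connected-components step can be sketched in pure NumPy as below. This is an illustrative stand-in, not the authors' code: the function name `localize` is hypothetical, the labeling uses a breadth-first flood fill with 26-connectivity in three dimensions, and the localization is taken to be the unweighted centroid of each region rounded to the nearest voxel (the text says "center of mass" without stating whether it is intensity weighted, so that choice is an assumption).

```python
from collections import deque
from itertools import product
import numpy as np

def localize(filtered, threshold):
    """Rounded centroids of 26-connected regions where filtered > threshold."""
    mask = filtered > threshold
    visited = np.zeros(mask.shape, dtype=bool)
    # All 26 neighbor offsets: shared face, edge, or corner.
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    centers = []
    for seed in zip(*np.nonzero(mask)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue, voxels = deque([seed]), []
        while queue:                              # flood fill one region
            v = queue.popleft()
            voxels.append(v)
            for d in offsets:
                n = tuple(int(a + b) for a, b in zip(v, d))
                inside = all(0 <= c < s for c, s in zip(n, mask.shape))
                if inside and mask[n] and not visited[n]:
                    visited[n] = True
                    queue.append(n)
        centroid = np.array(voxels).mean(axis=0)  # unweighted (assumption)
        centers.append(tuple(int(c) for c in np.rint(centroid)))
    return centers
```

Sweeping `threshold` from high to low and calling `localize` at each value yields the sets of candidate localizations of varying confidence used in the ROC analysis.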

Finally, areas of the image that were contaminated with air bubbles were discarded. Air was detected in the reference scan as portions of the image below 50 HU. Any part of the scan within 30 pixels of air (10 mm) was considered possibly contaminated. This 30-pixel boundary was chosen to be large enough to also contain the edge of the phantom cylinder, which includes hyperdense plastic. For visualization purposes, Fig. 2 shows the air regions discarded in the preprocessing step, but in the actual implementation the air regions were discarded after the thresholding step. Some very small air bubbles do not reach below 50 HU because their size was much smaller than the spatial resolution of the scanner. These were ignored, as we expected their impact on localization would be small.

The matched filter was implemented on 2D axial images only. Because the volume was reconstructed with 3.0 mm slice thickness and 1.5 mm increment, nearly all of the signal of a 1.6 mm diameter bead was contained in at least one of the slices. In the worst case, a small spherical cap with a height of 0.1 mm would be absent, but this spherical cap contains approximately 1% of the total volume of the sphere. The connected component analysis operated in three dimensions (MATLAB function bwlabeln using 26-connectivity). After thresholding, any voxels that were touching each other by sharing a common face, edge, or corner were merged together as one entity. Groups of voxels that were above the threshold in multiple adjacent slices would thereby be merged together as one entity during the connected components process, and the final localization was the 3D center of mass. This was rounded to the nearest integer pixel and slice.
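The "approximately 1%" figure can be checked with the standard formula for the volume of a spherical cap, V = πh²(3r − h)/3, using the 0.1 mm cap height and 1.6 mm bead diameter quoted above. The function name is hypothetical; the numbers come from the text.

```python
import math

def cap_volume_fraction(diameter_mm, cap_height_mm):
    """Fraction of a sphere's volume contained in a cap of the given height."""
    r = diameter_mm / 2.0
    h = cap_height_mm
    cap = math.pi * h ** 2 * (3.0 * r - h) / 3.0
    sphere = 4.0 / 3.0 * math.pi * r ** 3
    return cap / sphere

fraction = cap_volume_fraction(1.6, 0.1)
print(f"{fraction:.4f}")  # prints 0.0112
```

The result, roughly 1.1% of the bead volume, agrees with the estimate in the text.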

Example localizations from the matched filter are shown in Fig. 4. The false positives in this figure often appear to be plausible locations for the real beads, whereas the false negatives show little indication of a missed bead.

Fig. 4.


Example ROIs of (top row) true positives, (middle row) false negatives, (bottom row) false positives from the localization process. The images are shown in pairs, with the left half of each pair from the 120 mAs FBP scan, and the right half showing the high-dose reference image. Each ROI is 14 by 14 mm.

2.D. Analysis of FROC and resampling error estimates

The FROC curve was produced by varying the threshold T. For each choice of T, the number of true positives, false positives, and false negatives was calculated. A localization was considered a true positive if it was found within 2 mm of a bead detected in the reference scan.
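A minimal sketch of the scoring rule follows, assuming candidate and reference positions are given as coordinates in millimeters. The function name is hypothetical, and two details are assumptions not stated in the text: each reference bead is credited at most once, and extra candidates near an already-found bead are not counted as false positives.

```python
import math

def score_localizations(candidates, beads, max_dist_mm=2.0):
    """Return (true positives, false positives, false negatives)."""
    found = set()
    false_pos = 0
    for c in candidates:
        # Reference beads within the 2 mm acceptance radius of this candidate.
        hits = [i for i, b in enumerate(beads)
                if math.dist(c, b) <= max_dist_mm]
        if hits:
            found.add(min(hits, key=lambda i: math.dist(c, beads[i])))
        else:
            false_pos += 1
    true_pos = len(found)
    return true_pos, false_pos, len(beads) - true_pos
```

Evaluating this function once per threshold T gives the operating points of the FROC curve.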

For increased statistical power, we transformed the FROC curve using an exponential function, yielding what has previously been described as the EFROC curve.24 In a FROC curve, the sensitivity is plotted against the number of false positives in the volume. However, there can be an arbitrarily high number of false positives in the volume. In EFROC curves, the sensitivity is instead plotted against 1 – exp (−ν), where ν is the ratio of false positive localizations to total number of beads present in the volume. This transforms the abscissa from being semi-infinite to being contained in the interval from 0 to 1. The area under the EFROC curve (AUC) is a summary statistic that parameterizes the LCD of the scan. EFROC was chosen because it has been shown to have slightly better statistical power when compared with other methodologies such as AFROC (alternative FROC).25
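The exponential transform and the area calculation can be illustrated as below. This is a sketch under stated assumptions rather than the published estimator: the function names are hypothetical, the area is computed with the trapezoidal rule between operating points, and the curve is extended horizontally from the last operating point to an abscissa of 1. The reference cited in the text should be consulted for the exact definition.

```python
import math

def efroc_points(operating_points, n_beads):
    """Map (true positives, false positives) pairs to EFROC coordinates."""
    pts = [(0.0, 0.0)]
    for tp, fp in operating_points:
        nu = fp / n_beads                        # false positives per bead
        pts.append((1.0 - math.exp(-nu), tp / n_beads))
    return sorted(pts)

def efroc_auc(points):
    """Trapezoidal area under the EFROC curve on the interval [0, 1]."""
    pts = list(points)
    if pts[-1][0] < 1.0:
        pts.append((1.0, pts[-1][1]))            # flat extension (assumption)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

For example, `efroc_auc(efroc_points([(171, 0)], 171))` evaluates to 1.0, since every bead is found with no false positives.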

Under certain conditions, the AUC of the EFROC curve has a statistical uncertainty that can be calculated analytically.24 Our experiments do not meet these conditions because of the presence of artifacts and imperfectly subtracted background. Instead, we estimate the uncertainty empirically using the sample standard deviation of three repeated measurements at 60, 120, and 180 mAs.

3. RESULTS

3.A. FROC performance analysis

Figure 5 shows the exponential-transform FROC (EFROC) curves. As expected, detectability, as measured by AUC, improves significantly with increasing dose. Detectability is slightly worse with iterative reconstruction, although the difference is small. Table I compares the AUC values and their uncertainties at 60, 120, and 180 mAs for both FBP and iterative reconstruction. In all cases, the sample standard deviations are small. Visual inspection of Fig. 5 shows that the AUC monotonically increases with tube current, and the AUCs fit well to a parabolic curve.

Fig. 5.


(Left) EFROC curves from 40 to 240 mAs in 20 mAs increments. The shading of the line corresponds to tube current, with low currents a lighter cyan and higher currents a darker red (color available online). All curves correspond to FBP reconstruction. (Right) Area under the EFROC curves (AUC) as a function of mAs for both FBP and IR reconstructions. Error bars at 60, 120, and 180 mAs are plus or minus 1 sample standard deviation. Dashed line is a parabola of best fit for FBP reconstruction.

Table I.

Area under the EFROC curve (AUC) at three different tube currents.

Tube current (mAs)    AUC, FBP (μ ± σ)    AUC, IR (μ ± σ)    dAUC/dI (mAs⁻¹)    Implied σI (mAs)
60                    0.30 ± 0.012        0.29 ± 0.016       0.0055             2.2
120                   0.61 ± 0.023        0.59 ± 0.023       0.0037             6.2
180                   0.76 ± 0.028        0.71 ± 0.026       0.0019             15

Each AUC measurement is estimated from three scans. μ and σ refer to sample mean and sample standard deviation of AUC, respectively. dAUC/dI is the estimated change in AUC with respect to an increase of 1 mAs according to the parabolic fit of the FBP data in Fig. 5. Implied σI is the uncertainty in AUC propagated to the uncertainty in tube current.

The AUC is consistently worse with iterative reconstruction. In the plots shown, the template image for the matched filter was derived from the high-dose reference FBP scan. We hypothesized that this might lead to bias because the signal and noise characteristics would not be matched to the IR image. However, when we derived the matched filter from IR and repeated the experiments, the results were essentially unchanged.

Figure 5 shows that improvements in AUC decrease at higher AUC, and improvements past an AUC of 0.80 are marginal. One explanation is imperfectly subtracted artifacts or the presence of background structure that might impede detection of some signals. For example, a faint positive streak artifact present in the image might lead to increased localizations along the streak. These artifacts or background noise characteristics would not be reduced by improvements in quantum statistics that stem from increased mAs. It seems that our method is more useful in the moderate mAs regime than the high mAs regime, because at high mAs further improvements in AUC are limited by the background noise structure or artifacts.

3.B. Sensitivity of AUC to changes in tube current

In Table I, we propagate the error in AUC into estimated errors in tube current. The change in AUC is estimated from the parabolic fit, and the error of AUC was taken pointwise at each measurement. The rightmost column, σI, represents the change in the tube current that corresponds to 1 standard deviation of measurement error in AUC. We found that σI ranges from 4% to 8% of the total tube current. Changes in tube current smaller than this will result in a statistically insignificant change in AUC.
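The propagation amounts to dividing the standard deviation of AUC by the local slope of the fitted curve, σI ≈ σAUC / |dAUC/dI|. The short Python sketch below applies this to the FBP values in Table I; the function and variable names are illustrative assumptions, while the numbers are copied from the table.

```python
# FBP values copied from Table I: mAs -> (sigma of AUC, dAUC/dI in 1/mAs).
table_i = {
    60: (0.012, 0.0055),
    120: (0.023, 0.0037),
    180: (0.028, 0.0019),
}


def implied_sigma_current(sigma_auc, slope):
    """First-order error propagation from AUC to tube current (mAs)."""
    return sigma_auc / abs(slope)


results = {}
for mas, (sigma_auc, slope) in table_i.items():
    sigma_i = implied_sigma_current(sigma_auc, slope)
    percent = 100.0 * sigma_i / mas  # uncertainty relative to the operating point
    results[mas] = (sigma_i, percent)
    print(f"{mas} mAs: sigma_I = {sigma_i:.1f} mAs ({percent:.0f}% of tube current)")
```

Because the slope of the fitted parabola shrinks at higher mAs while σAUC does not, the same AUC uncertainty maps to a larger uncertainty in tube current at 180 mAs than at 60 mAs.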

At face value, Table I suggests that the sensitivity of the technique is best at low mAs. We would caution against this interpretation because the uncertainties in Table I are calculated from only three points, and they themselves have substantial statistical error. The purpose of Table I was not primarily to identify trends in sensitivity but to ensure that our proposed technique can be applied across a wide operating range. Figure 5 similarly shows that the deviation of each AUC measurement from a smooth interpolating curve, the dashed parabola, is modest. The root mean square error from the interpolating curve is 0.024, which is consistent with the errors found in Table I. Together, Table I and Fig. 5 indicate that the proposed technique is sensitive to changes in tube current.

4. DISCUSSION

We have demonstrated the feasibility of estimating the LCD of CT scanners using a custom phantom with hundreds of low-contrast beads. The goal of this work was to supplement existing methods for measuring LCD which are either inappropriate for nonlinear reconstruction methods (e.g., measurement of CNR) or subjective (e.g., judgment of the visibility of low contrast inserts in known locations). Our technique is designed to be compatible with daily or weekly quality assurance and can be performed with a single scan of a phantom. At present, our technique is not intended to measure differences in human perception but rather differences in scanner efficiency. We use a matched filter for bead localization rather than anthropomorphic model observers that might better mimic human performance. Elements of perception such as the eye filter were not used in this work.

Our work is a proof of principle, and several obstacles will need to be overcome before this technique could be adopted for routine clinical use. The most important obstacle is perhaps the construction of the phantom. While our phantom fabrication method was accessible and eliminated the need for specialized equipment, it had several drawbacks. Air bubble contaminants were present that were reduced but not eliminated by the use of detergent as a surfactant. With greater control over the makeup of the low contrast objects and phantom stability in view, we believe a better fabrication approach may be multi-material 3D printing, which has already been used to create anthropomorphic phantoms.26 For widespread adoption, we believe that it will be necessary to employ commercial processes for QA phantom construction, which are carefully controlled to avoid nonidealities.

The phantom is designed to house hundreds of low-contrast beads in a uniform background; the beads are automatically localized using image processing software. While our phantom was a long cylinder, it could be adapted into a shorter, wider form factor that is typical of a module in a quality assurance phantom. We use localizations of varying confidence to build out EFROC curves and calculate an AUC summary statistic. We have found that the uncertainty in these AUC measurements is small compared to changes in dose. We estimate that by measuring AUC, the tube current can be determined with a standard deviation of about 5% (operating near 120 mAs). Multiple repeat scans can be used to further decrease stochastic variability if smaller error bars are desired.

To reduce artifactual localizations, we applied background subtraction and eliminated beads near air bubbles. Nonetheless, imperfect localizations may still remain. It will be important in future work to evaluate and improve the robustness of the image processing algorithms so that they fairly evaluate the LCD of the reconstruction without being confounded by the artifacts in the scan. We observed faint streaks and bands in the reconstruction, especially near the edges of the volume, which could be the result of helical interpolation artifacts.

Figure 5 shows that FBP consistently outperformed IR by a small margin. This difference is small and is more easily seen when FBP and IR are reconstructed from the same raw dataset rather than from different noise realizations at the same protocol, as is shown in Table I. The statistical significance of this effect is difficult to determine because the noise between FBP and IR is highly correlated when reconstructed from the same dataset. This can be contrasted with recent volumetric task-based assessments using human observers, which show a small advantage for IR of about 10%.5,27 One difference between our experiments and past work is that our objects (1.6 mm diameter) are thinner than the thickness of one slice (3 mm). Model observer studies provided by the vendors studying single thin slices of large objects have predicted large benefits from IR.28–31 However, it is known that the 3D regularization process in IR allows blending from adjacent slices, and under certain assumptions, iterative reconstruction can be modeled as a linear filter of FBP images.20 If this regularization improves the detectability of thick low contrast objects by increasing the effective slice thickness, it is reasonable to expect that it might simultaneously degrade detectability of very thin objects by the same mechanism. Consistent with this hypothesis, Fig. 3 shows that the beads appear approximately 20% dimmer under IR than they do under FBP. To our knowledge, IR has never been compared to FBP for the detection of very thin objects. While our work is suggestive, it is by no means conclusive because we used a matched filter rather than a human reader study or a human-mimicking model observer.

Other groups are also pursuing the goal of rapidly estimating LCD from a small number of scans. Anton et al.32 use a modified parametric model observer that is able to obtain a reasonably tight bound on AUC using about 10 scans. Ma et al. find that a channelized Hotelling observer can directly obtain sufficiently accurate estimates of AUC using as few as 10 scans with appropriate selection of the channels, compared to 80 scans if the channels are naively selected.33

In summary, we have described a method for rapidly estimating the LCD of a CT scanner. This method can be placed in the context of other tools to measure LCD. CNR is widely used today because of its simplicity and in spite of its known problems. Fourier domain metrics have been a staple of performance assessment in previous decades but may not be valid in the era of advanced reconstruction. Clinical performance remains a gold standard but is impractical to study outside of a controlled research scenario and cannot easily be powered to detect small improvements in LCD. Existing experimental protocols using anthropomorphic model observers are well validated but are currently cumbersome to use on a routine basis. We submit that our proposed method offers a useful compromise between expediency and accuracy and could be adapted to automatically run with every scan of a quality assurance phantom. Future developments in fabrication or validation, as we have outlined above, could further improve the accuracy of this phantom for estimating LCD.

ACKNOWLEDGMENTS

This research was supported by funding from the UCLA Radiological Sciences Exploratory Research Program. The authors also acknowledge support from the National Institutes of Health under award number U24EB028936.

APPENDIX A

SIMULATIONS

The source code for the simulations is released at https://github.com/scotthsiehucla/lcd-sim-framework. All code was written in MATLAB (The MathWorks, Natick, MA).

Our simulations assume an ideal, two-dimensional parallel-beam scanner. Several low-contrast circular inserts (“beads”) are randomly interspersed throughout the volume, with the restriction that each insert cannot be placed too close to an existing insert. Noise is injected into the sinogram and the reconstruction is performed either with FBP or IR. In IR, we assume that the algorithm has knowledge of the same system matrix that is used to generate the sinogram. This is an acceptable approximation for low contrast objects, but it is a simplification for high contrast objects, where a mismatch between the system matrix and physical reality can lead to high-frequency ringing artifacts in the reconstructed volume. Our IR algorithm was an adaptation of Thibault et al.18 in two dimensions and uses iterative coordinate descent to minimize the cost function with respect to each voxel sequentially. A Huber function was used in regularization. The transition from linear to quadratic was adjusted to avoid the cartoon-like noise patchiness typical of total variation reconstruction, and the strength of the Huber function was empirically tuned. The iterative algorithm was not designed for computational efficiency, and for this reason we restricted the simulations to a small matrix size of 129 × 129. Four iterations were used.
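To make the role of the Huber function concrete, the sketch below gives one common parametrization in Python, together with a roughness penalty summed over neighboring pixels. The transition point `delta`, the strength `beta`, the 4-neighbor penalty, and all function names are assumptions for illustration; they are not the settings or the code of the released MATLAB simulations.

```python
def huber(t, delta):
    """Huber function: quadratic for |t| <= delta, linear beyond."""
    a = abs(t)
    if a <= delta:
        return 0.5 * t * t
    return delta * (a - 0.5 * delta)


def huber_derivative(t, delta):
    """Derivative of the Huber function: t clipped to [-delta, delta]."""
    return max(-delta, min(delta, t))


def roughness_penalty(image, delta, beta):
    """beta times the sum of Huber penalties over horizontal and vertical neighbor pairs."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += huber(image[r][c + 1] - image[r][c], delta)
            if r + 1 < rows:
                total += huber(image[r + 1][c] - image[r][c], delta)
    return beta * total


# Hypothetical 2 x 2 image with one bright pixel; a large difference is
# penalized linearly rather than quadratically.
print(roughness_penalty([[0.0, 0.0], [0.0, 10.0]], delta=1.0, beta=1.0))
```

A small `delta` makes the penalty behave like total variation, which tends to produce patchy noise, whereas a large `delta` makes it behave like a quadratic penalty that smooths edges; tuning the transition trades off between the two.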

After reconstruction, we performed localization by convolution with a template image. In the other portions of the manuscript, a matched filter was used. However, in simulations, we simply convolved with a plain circle, which would be equal to the matched filter in the scenario of white noise. This is analogous to the non-prewhitening (NPW) observer. These localizations are tabulated against ground truth and used to produce free-response receiver operating characteristic (FROC) curves. We compared FBP and IR in two scenarios: asymmetric and uniform noise. In the asymmetric noise condition, we injected additional noise along one axis. This may resemble the noise between the shoulders, or the noise in an elliptical phantom in the absence of tube current modulation. The uniform noise condition assumes all noise statistics are the same. Because IR models the noise directly, we expect the improvement in IR to be larger in the asymmetric noise condition.
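The localization step can be illustrated with a small, self-contained Python sketch: a plain disk template is correlated with the image, and local maxima of the response are reported as candidate localizations ranked by score. Because the disk is symmetric, correlation and convolution give the same result. The image size, bead contrast, noise level, threshold, and all function names are hypothetical choices made for this example and do not reproduce the released MATLAB code.

```python
import math
import random


def disk_template(radius):
    """Binary disk of the given radius (in pixels), used as the search template."""
    r = int(math.ceil(radius))
    return [[1.0 if dx * dx + dy * dy <= radius * radius else 0.0
             for dx in range(-r, r + 1)] for dy in range(-r, r + 1)]


def correlate(image, template):
    """Zero-padded cross-correlation of the image with the template."""
    h, w = len(image), len(image[0])
    r = len(template) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ty, row in enumerate(template):
                yy = y + ty - r
                if 0 <= yy < h:
                    for tx, weight in enumerate(row):
                        xx = x + tx - r
                        if weight and 0 <= xx < w:
                            acc += weight * image[yy][xx]
            out[y][x] = acc
    return out


def local_maxima(response, min_score):
    """Return (score, row, col) for strict 8-neighbor maxima, highest score first."""
    h, w = len(response), len(response[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = response[y][x]
            if v < min_score:
                continue
            neighbors = [response[yy][xx]
                         for yy in range(max(0, y - 1), min(h, y + 2))
                         for xx in range(max(0, x - 1), min(w, x + 2))
                         if (yy, xx) != (y, x)]
            if all(v > n for n in neighbors):
                peaks.append((v, y, x))
    return sorted(peaks, reverse=True)


# Hypothetical test image: one bead (radius 3 px, contrast 10) plus white noise.
rng = random.Random(0)
size, center, radius = 21, (10, 10), 3.0
image = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
for y in range(size):
    for x in range(size):
        if (y - center[0]) ** 2 + (x - center[1]) ** 2 <= radius ** 2:
            image[y][x] += 10.0

response = correlate(image, disk_template(radius))
peaks = local_maxima(response, min_score=145.0)
print(peaks[0][1:])  # location of the highest-scoring candidate
```

Lowering `min_score` admits more candidates of lower confidence, which is how a range of operating points on an FROC curve would be traced out from a single response map.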

Figures 6 and 7 show results from our MATLAB simulations. The improvement in detectability from iterative reconstruction (IR) is greater under asymmetric noise conditions than uniform noise conditions.

Fig. 6.

Reconstructions from the simulation code. The original image is shown at the far left, with a variety of low contrast cylinders embedded into the volume. The datasets for both asymmetric (streaky) and uniform noise statistics are reconstructed in two different ways, using both FBP and IR.

Fig. 7.

Free-response receiver operating characteristic (FROC) curves using the model observer. The number of false positives (x-axis) refers to the number of false localizations across the entire image.

Footnotes

CONFLICT OF INTEREST

Jay Vaishnav is an employee of Canon Medical Systems. Akinyinka Omigbodun and Scott Hsieh have no conflict of interest to disclose.

Contributor Information

Akinyinka Omigbodun, Department of Radiological Sciences, UCLA, Los Angeles, CA 90024, USA.

J. Y. Vaishnav, Canon Medical Systems, U.S.A., Tustin, CA 92780, USA

Scott S. Hsieh, Department of Radiological Sciences, UCLA, Los Angeles, CA 90024, USA; Department of Radiology, Mayo Clinic, Rochester, MN 55902, USA.

REFERENCES

  • 1. Yu L, Vrieze TJ, Leng S, Fletcher JG, McCollough CH. Measuring contrast-and noise-dependent spatial resolution of an iterative reconstruction method in CT using ensemble averaging. Med Phys. 2015;42:2261–2267.
  • 2. Kachelrieß M, Watzke O, Kalender WA. Generalized multi-dimensional adaptive filtering for conventional and spiral single-slice, multi-slice, and cone-beam CT. Med Phys. 2001;28:475–490.
  • 3. Pooler BD, Lubner MG, Kim DH, et al. Prospective trial of the detection of urolithiasis on ultralow dose (sub mSv) noncontrast computerized tomography: direct comparison against routine low dose reference standard. J Urol. 2014;192:1433–1439.
  • 4. Pickhardt PJ, Lubner MG, Kim DH, et al. Abdominal CT with model-based iterative reconstruction (MBIR): initial results of a prospective trial comparing ultralow-dose with standard-dose imaging. AJR Am J Roentgenol. 2012;199:1266–1274.
  • 5. Goenka AH, Herts BR, Obuchowski NA, et al. Effect of reduced radiation exposure and iterative reconstruction on detection of low-contrast low-attenuation lesions in an anthropomorphic liver phantom: an 18-reader study. Radiology. 2014;272:154–163.
  • 6. Fletcher JG, Fidler JL, Venkatesh SK, et al. Observer performance with varying radiation dose and reconstruction methods for detection of hepatic metastases. Radiology. 2018;289:455–464.
  • 7. Richard S, Li X, Yadava G, Samei E. Predictive models for observer performance in CT: Applications in protocol optimization. 2011;7961:79610H.
  • 8. McCollough CH, Chen GH, Kalender W, et al. Achieving routine sub-millisievert CT scanning: report from the summit on management of radiation dose in CT. Radiology. 2012;264:567–580.
  • 9. Wunderlich A, Noo F. Image covariance and lesion detectability in direct fan-beam x-ray computed tomography. Phys Med Biol. 2008;53:2471.
  • 10. Racine D, Ryckx N, Ba A, et al. Task-based quantification of image quality using a model observer in abdominal CT: a multicentre study. Eur Radiol. 2018;28:5203–5210.
  • 11. Yu L, Leng S, Chen L, Kofler JM, Carter RE, McCollough CH. Prediction of human observer performance in a 2-alternative forced choice low-contrast detection task using channelized hotelling observer: Impact of radiation dose and reconstruction algorithms. Med Phys. 2013;40:041908.
  • 12. Vaishnav J, Jung W, Popescu L, Zeng R, Myers K. Objective assessment of image quality and dose reduction in CT iterative reconstruction. Med Phys. 2014;41:071904.
  • 13. Samei E, Bakalyar D, Boedeker KL, et al. Performance evaluation of computed tomography systems: summary of AAPM task group 233. Med Phys. 2019;46:e735–e756.
  • 14. Tseng H, Fan J, Kupinski MA, Sainath P, Hsieh J. Assessing image quality and dose reduction of a new x-ray computed tomography iterative reconstruction algorithm using model observers. Med Phys. 2014;41:071910.
  • 15. Reginatto M, Anton M, Elster C. Assessment of CT image quality using a Bayesian approach. Metrologia. 2017;54:S74.
  • 16. Popescu LM, Myers KJ. CT image assessment by low contrast signal detectability evaluation with unknown signal location. Med Phys. 2013;40:111908.
  • 17. Leng S, Yu L, Zhang Y, Carter R, Toledano AY, McCollough CH. Correlation between model observer and human observer performance in CT imaging when lesion location is uncertain. Med Phys. 2013;40:081908.
  • 18. Thibault JB, Sauer KD, Bouman CA, Hsieh J. A three-dimensional statistical approach to improved image quality for multislice helical CT. Med Phys. 2007;34:4526.
  • 19. Hsieh SS, Chesler DA, Fleischmann D, Pelc NJ. A limit on dose reduction possible with CT reconstruction algorithms without prior knowledge of the scan subject. Med Phys. 2016;43:1361–1368.
  • 20. Divel SE, Hsieh SS, Wang J, Pelc NJ. Can image-domain filtering of FBP CT reconstructions match low-contrast performance of iterative reconstructions? 10573:1057314; 2018.
  • 21. Cockmartin L, Marshall NW, Zhang G, et al. Design and application of a structured phantom for detection performance comparison between breast tomosynthesis and digital mammography. Phys Med Biol. 2017;62:758.
  • 22. Heit JJ, Iv M, Wintermark M. Imaging of intracranial hemorrhage. J Stroke. 2017;19:11–27.
  • 23. Yu L, Chen B, Kofler JM, et al. Correlation between a 2D channelized hotelling observer and human observers in a low-contrast detection task with multislice reading in CT. Med Phys. 2017;44:3990–3999.
  • 24. Popescu LM. Nonparametric signal detectability evaluation using an exponential transformation of the FROC curve. Med Phys. 2011;38:5690–5702.
  • 25. Chakraborty DP. Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data. Med Phys. 1989;16:561–568.
  • 26. Leng S, McGee K, Morris J, et al. Anatomic modeling using 3D printing: quality assurance and optimization. 3D Print Med. 2017;3:6.
  • 27. Solomon J, Marin D, Roy Choudhury K, Patel B, Samei E. Effect of radiation dose reduction and reconstruction algorithm on image noise, contrast, resolution, and detectability of subtle hypoattenuating liver lesions at multidetector CT: filtered back projection versus a commercial model–based iterative reconstruction algorithm. Radiology. 2017;284:777–787.
  • 28. Siemens Medical Solutions. 510(k) summary for SAFIRE. https://www.accessdata.fda.gov/cdrh_docs/pdf10/K103424.pdf. Updated 2010.
  • 29. Philips Medical Systems. 510(k) summary for iMR software application. https://www.accessdata.fda.gov/cdrh_docs/pdf12/K123576.pdf. Updated 2012.
  • 30. GE Healthcare. 510(k) summary for ASiR-V. https://www.accessdata.fda.gov/cdrh_docs/pdf13/K133640.pdf. Updated 2013.
  • 31. Canon Medical Systems. 510(k) summary for Aquilion ONE vision with FIRST 1.0. https://www.accessdata.fda.gov/cdrh_docs/pdf15/K151673.pdf. Updated 2015.
  • 32. Anton M, Khanin A, Kretz T, Reginatto M, Elster C. A simple parametric model observer for quality assurance in computer tomography. Phys Med Biol. 2018;63:075011.
  • 33. Ma C, Yu L, Chen B, Favazza C, Leng S, McCollough C. Impact of number of repeated scans on model observer performance for a low-contrast detection task in computed tomography. J Med Imaging. 2016;3:023504.
