Author manuscript; available in PMC: 2022 Apr 1.
Published in final edited form as: AJR Am J Roentgenol. 2021 Feb 10;216(4):894–902. doi: 10.2214/AJR.19.22429

The “Sweet Spot” Revisited: Optimal Recall Rates for Cancer Detection With 2D and 3D Digital Screening Mammography in the Metro Chicago Breast Cancer Registry

Garth H Rauscher 1, Anne Marie Murphy 2, Qiong Qiu 3, Therese A Dolecek 1, Katherine Tossas 4, Yanyang Liu 3, Nila H Alsheik 3
PMCID: PMC8087168  NIHMSID: NIHMS1694944  PMID: 33566635

Abstract

OBJECTIVE.

One central question pertaining to mammography quality relates to discerning the optimal recall rate to maximize cancer detection while minimizing unnecessary downstream diagnostic imaging and breast biopsies. We examined the trade-offs for higher recall rates in terms of biopsy recommendations and cancer detection in a single large health care organization.

MATERIALS AND METHODS.

We included 2D analog, 2D digital, and 3D digital (tomosynthesis) screening mammography examinations among women 40–79 years old performed between January 1, 2005, and December 31, 2017, with cancer follow-up through 2018. There were 36, 67, and 38 radiologists who read at least 1000 2D analog examinations, 2D digital examinations, and 3D tomosynthesis examinations, respectively, who were included in these analyses. Using logistic regression with marginal standardization, we estimated radiologist-specific mean recall (abnormal interpretations/1000 mammograms), biopsy recommendation, cancer detection (screening-detected in situ and invasive cancers/1000 mammograms), and minimally invasive cancer detection rates while adjusting for differences in patient characteristics.

RESULTS.

Among 1,060,655 screening mammograms, the mean recall rate was 10.7%, the cancer detection rate was 4.0/1000 mammograms, and the biopsy recommendation rate was 1.60%. Recall rates between 7% and 9% appeared to maximize cancer detection while minimizing unnecessary biopsies.

CONCLUSION.

The results of this investigation are in contrast to those of a recent study suggesting appropriateness of higher recall rates. The “sweet spot” for optimal cancer detection appears to be in the recall rate range of 7–9% for both 2D digital mammography and 3D tomosynthesis. Too many women are being called back for diagnostic imaging, and new benchmarks could be set to reduce this burden.

Keywords: breast cancer, mammography, quality improvement, recall rate, screening


Screening mammograms are one of the most commonly performed screening imaging studies, with approximately 65.3% of U.S. women 40 years old or older undergoing at least one between 2013 and 2015 [1]. The federal government has regulated the quality of the screening mammography process since 1992 with the creation of the Mammography Quality Standards Act (MQSA), which has since undergone several extensions [2]. In addition to the MQSA requirements, the American College of Radiology (ACR) has additional quality benchmarks on the recall rate (proportion of screening mammography examinations with abnormal findings) and screen-detection rate (cancer detection rate for screening mammography). Other screening mammography benchmarks include the PPV of cancer in mammograms interpreted as abnormal (PPV1), the PPV of biopsy recommendation (PPV2), the PPV of documented biopsy (PPV3), and detection of small and early-stage tumors [3, 4]. The goal of these benchmarks is to set an optimal recall rate that would maximize cancer detection while minimizing unnecessary downstream diagnostic imaging studies and breast biopsies. However, despite these efforts, the recall rates in the United States have remained remarkably high.

With any screening test, it is important to minimize overutilization of unnecessary downstream diagnostic procedures that come with potential physical and psychologic morbidity and additional financial costs to the patient and the health care system. Among the four types of cancer screenings recommended by the U.S. Preventive Services Task Force, breast cancer screening has a cancer detection rate of approximately 0.5%, which is one of the lowest when compared with the cancer detection rates of cervical cancer, lung cancer, and colon cancer screenings [5–7].

Multiple analyses of the U.S. Breast Cancer Surveillance Consortium (BCSC) based on film-screen mammography suggested that an optimal recall rate is between 5% and 12% [8, 9]. Recent data from the National Mammography Database suggest that, on average, approximately 10% of screening mammograms in the United States are interpreted as showing abnormal findings [10]. Another study, published in 2017, reviewed the performance of digital screening mammography across six BCSC registries composed of more than 1.68 million digital screening mammography examinations of 792,808 women and found that the abnormal interpretation rate (AIR) was around 11.6% with a cancer detection rate of 5.1/1000 mammograms [9]. However, the AIRs were higher than the recommended range for almost one-half of radiologists interpreting screening mammograms, and there were wide variations in AIRs across individuals and facilities [9]. In a study of 5.5 million mammograms from 1996 to 1999, recall rates were approximately twice as high in the United States as in the United Kingdom across all age groups, with similar cancer detection rates [11].

There is no consensus about what the optimal recall rate should be. A 2017 study by Grabler et al. [12], conducted in a single academic center in the era of digital mammography, suggested that the so-called “sweet spot” for optimal cancer detection is in the recall range of 12% to less than 14%, which is higher than the 5–12% reported by the BCSC [8, 9]. Interestingly, Grabler and colleagues also reported a linear correlation between recall rate and cancer detection between recall rates of 7% and 14%. In contrast, a 2019 U.K. study of 11.3 million screening mammograms conducted between 2009 and 2016 set the optimum incident screening recall rate to be around 3.1% (range, 2.6–4%) and reported a diminishing rate of return for higher recall rates for the detection of invasive and microinvasive cancers and high-grade ductal carcinoma in situ [13]. Nevertheless, international comparisons are difficult because of differences in screening requirements. For instance, in the United Kingdom, screening mammography is typically double-read and is performed at 3-year intervals.

Many studies on screening mammogram recall rates do not account for the potential negative trade-offs related to higher biopsy recommendation rates and instead focus solely on the relationship between recall rates and cancer detection rates. We conducted a retrospective analysis of 2D analog, 2D digital, and 3D tomosynthesis screening mammograms interpreted by community radiologists in a large community-based and geographically diverse health care organization. The goal was to compare the trade-offs between recall, biopsy recommendation, and cancer detection to identify a range of recall rates that might optimize cancer detection while minimizing additional diagnostic imaging and biopsy procedures.

Materials and Methods

This retrospective study has been described previously [14, 15]. The dataset for this institutional review board–approved HIPAA-compliant research was composed of 1,060,655 eligible screening mammograms with greater than 12-month follow-up performed at a single community-based health care network. Informed consent was waived by the institutional review board of Advocate Health Care. The mammograms were single-read and interpreted by a mix of general radiologists and dedicated breast imaging specialists.

The image dataset included bilateral 2D analog examinations (n = 205,817), 2D digital examinations (n = 682,352), and 3D digital tomosynthesis examinations (n = 172,486) performed between January 1, 2005, and December 31, 2017. In order to ensure reasonably stable cancer detection rates, an eligible radiologist was defined as one with documented evidence of at least 1000 screening mammography examination reads for one or more of the modalities of interest (i.e., 2D analog, 2D digital, and 3D tomosynthesis). Eligible patients were women who were 40–79 years old and did not have a history of breast cancer at the index screening examination.

The radiology dataset includes patient demographics, risk factor data, the procedure performed, imaging results with unique identifiers for the facility and the interpreting radiologist, and biopsy results for each screening and diagnostic procedure performed. Each mammography examination was interpreted by the reading radiologist and was assigned a score using the ACR’s BI-RADS. BI-RADS assessment categories for screening and diagnostic mammography range from 0 to 5 as follows: category 0, need additional imaging evaluation; 1, negative finding; 2, benign finding; 3, probably benign finding; 4, suspicious abnormality; and 5, finding highly suggestive of malignancy.

For the descriptive analysis, we estimated radiologist-specific mean recall rates (abnormal interpretations/1000 mammograms), cancer detection rates (screening-detected in situ and invasive cancers/1000 mammograms), and biopsy recommendation rate. Rates were examined by screening modality and calendar year. For all mammograms combined, we used logistic regression with generalized estimating equations (to account for multiple screening processes within women) to estimate factors associated with recall rate. These models included indicator variables for each radiologist interpreting screening mammograms, indicator variables for screening modality and calendar year, and patient-level variables (i.e., age, race and ethnicity, availability of comparison mammography study at interpretation, personal history of breast biopsy, family history of breast cancer, and breast density). We applied marginal standardization to the resulting model coefficients to estimate mean recall rates by radiologists after adjusting for their patient population. The purpose of marginal standardization was to use the information from the adjusted logistic regression model results to estimate adjusted recall rates, biopsy recommendation rates, and cancer detection rates in terms of probabilities or percentages rather than odds ratios. We repeated analyses with biopsy recommendation rate and cancer detection rate as dependent variables to estimate the mean cancer detection rate and biopsy recommendation rate per radiologist that were also adjusted for patient mix.
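The marginal standardization step described above can be illustrated with a minimal Python sketch. It assumes a logistic model has already been fitted; the coefficients, the covariate names, and the four-examination population below are hypothetical values chosen only for illustration and are not taken from the study. For each radiologist, the sketch predicts the recall probability of every examination in the pooled population as if that radiologist had read it, then averages the predictions.

```python
import math

def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted coefficients on the log-odds scale (not from the study).
INTERCEPT = -2.0
RADIOLOGIST_EFFECT = {"R1": 0.0, "R2": 0.4}  # R1 is the reference reader
COVARIATE_EFFECT = {"age_40_49": 0.5, "dense_breasts": 0.3, "no_comparison": 0.8}

# Hypothetical pooled patient population: one dict per screening examination.
EXAMS = [
    {"age_40_49": 1, "dense_breasts": 1, "no_comparison": 0},
    {"age_40_49": 0, "dense_breasts": 0, "no_comparison": 0},
    {"age_40_49": 0, "dense_breasts": 1, "no_comparison": 1},
    {"age_40_49": 1, "dense_breasts": 0, "no_comparison": 0},
]

def standardized_rate(radiologist, exams=EXAMS):
    """Mean predicted probability if every exam were read by `radiologist`."""
    total = 0.0
    for exam in exams:
        linear = INTERCEPT + RADIOLOGIST_EFFECT[radiologist]
        linear += sum(COVARIATE_EFFECT[name] * value for name, value in exam.items())
        total += expit(linear)
    return total / len(exams)

if __name__ == "__main__":
    for reader in RADIOLOGIST_EFFECT:
        rate = 100 * standardized_rate(reader)
        print(f"{reader}: adjusted recall rate {rate:.1f}%")
```

Because every radiologist is evaluated on the same pooled population, differences between the resulting rates reflect the radiologist terms rather than differences in patient mix, and the results are expressed as percentages rather than odds ratios.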

Our primary analysis assessed how cancer detection rates and biopsy recommendation rates changed with recall rates. In presenting our results, the focus was on creating a graphical depiction of the trade-offs associated with higher recall rates in terms of cancer detection and biopsy recommendation, while also presenting the variability in estimates across radiologists. We created scatterplots of radiologist-level adjusted cancer detection rates (y-axis) against the corresponding recall rates (x-axis) weighted by radiologist screening volume. The size of data points, or “bubbles,” in the plot was larger for radiologists with larger screening volumes.

Next, we used linear regression to flexibly model radiologist-level cancer detection rates against the corresponding recall rates using best-fitting fractional polynomial terms. Fractional polynomial powers were allowed to range from −2.5 to 2.5, inclusive. We repeated an identical analysis for the relation of biopsy recommendation rates with recall rates.
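As a rough illustration of this modeling step, the following Python sketch fits a volume-weighted regression with two fractional polynomial terms and keeps the pair of powers with the smallest weighted residual sum of squares. Several details are assumptions rather than the authors' procedure: the candidate power grid in steps of 0.5, the usual convention that power 0 denotes ln(x) and that a repeated power adds a ln(x) factor, selection by residual sum of squares instead of deviance, and all function names.

```python
import math
from itertools import combinations_with_replacement

# Assumed candidate powers between -2.5 and 2.5; power 0 denotes ln(x).
POWERS = [-2.5, -2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5]

def fp_term(x, power):
    """One fractional polynomial term for a positive x."""
    return math.log(x) if power == 0 else x ** power

def fp2_basis(x, p1, p2):
    """Design row [1, term1, term2]; a repeated power adds a ln(x) factor."""
    t1 = fp_term(x, p1)
    t2 = t1 * math.log(x) if p1 == p2 else fp_term(x, p2)
    return [1.0, t1, t2]

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def weighted_fit(rows, y, w):
    """Weighted least squares via the normal equations (X'WX) b = X'Wy."""
    k = len(rows[0])
    xtwx = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(wi * r[i] * yi for r, yi, wi in zip(rows, y, w)) for i in range(k)]
    beta = solve(xtwx, xtwy)
    sse = sum(wi * (yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi, wi in zip(rows, y, w))
    return beta, sse

def best_fp2(x, y, w):
    """Return (weighted SSE, (p1, p2), coefficients) for the best-fitting pair."""
    best = None
    for p1, p2 in combinations_with_replacement(POWERS, 2):
        rows = [fp2_basis(xi, p1, p2) for xi in x]
        try:
            beta, sse = weighted_fit(rows, y, w)
        except ZeroDivisionError:
            continue  # skip a singular design
        if best is None or sse < best[0]:
            best = (sse, (p1, p2), beta)
    return best
```

Here `x` would hold radiologist-level adjusted recall rates, `y` the adjusted cancer detection (or biopsy recommendation) rates, and `w` the screening volumes. Calling `weighted_fit` with rows of the form `[1.0, xi]` gives the corresponding straight-line fit for comparison.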

Finally, for each outcome (i.e., cancer detection and biopsy recommendation), scatterplots and corresponding curvilinear plots were overlaid onto a single figure for graphical depiction of the trade-offs associated with higher recall rates. We performed the analysis just described for all mammography examinations and separately for 2D analog, 2D digital, and 3D tomosynthesis examinations.

We conducted sensitivity analyses that modified our modeling assumptions to see how this might affect our interpretations. In these analyses, we forced a linear association of cancer detection and biopsy recommendation rates with recall rates (for comparison with best-fitting fractional polynomials) and repeated the analyses without weighting radiologist-specific values by screening volume (for comparison with weighted results). We conducted likelihood ratio tests (i.e., deviance testing) to statistically evaluate whether the addition of fractional polynomials improved the fit of biopsy recommendation and cancer detection when modeled in linear regression against recall rates for analyses weighted for radiologist volumes and also for unweighted analyses. Because the weighted models used a robust variance estimate, deviance testing may not be valid; however, because our primary analyses weighted for screening volume, we proceeded with deviance testing while acknowledging this limitation. Although we conducted likelihood ratio testing, we did not rely on these results for graphical depictions; rather, we chose a priori to present results weighted for radiologist screening volumes while incorporating the two best-fitting fractional polynomial terms in our models as described earlier.

Results

Among 1,060,655 screening mammograms, 71% were accompanied by a prior screening mammography study in the database; of these, the mean interval between screening examinations was 19 months. Overall, 84% of screening examinations were interpreted in the context of an available comparison study. The mean recall rate was 10.7%, cancer detection rate was 4.0/1000 mammograms, and biopsy recommendation rate was 1.60%. Recall rates and cancer detection rates varied with the method of screening mammography. The mean recall rate was highest for 2D analog examinations (11.2%), followed by 2D digital examinations (10.8%) and then 3D tomosynthesis examinations (9.4%). The mean biopsy recommendation rate was highest for 3D tomosynthesis examinations (1.93%), followed by 2D digital examinations (1.55%) and then 2D analog examinations (1.47%). Older patient age was associated with lower recall and biopsy recommendation rates and with higher cancer detection and minimally invasive cancer detection rates. The absence of a comparison study was associated with higher recall, biopsy recommendation, and cancer detection rates. Greater breast density was strongly associated with higher recall rate but was inversely associated with cancer detection rate (Table 1).

TABLE 1:

Characteristics of Screening Mammograms and Outcomes From a Large Health Care Organization With Multiple Facilities in the Greater Metropolitan Chicago Area (January 1, 2005–December 31, 2017)

Characteristic | Total No. of Screening Mammograms | Recall Rate (%) | Biopsy Recommendation Rate (%) | Cancer Detection Rate (%) | Minimally Invasive Cancer Detection Rate (%)
Modality
 2D Analog | 205,817 | 11.2 | 1.47 | 0.35 | 0.21
 2D Digital | 682,352 | 10.8 | 1.55 | 0.42 | 0.26
 3D Tomosynthesis | 172,486 | 9.4 | 1.93 | 0.40 | 0.27
Age at mammography (y)
 40–49 | 319,100 | 14.0 | 1.74 | 0.26 | 0.13
 50–59 | 346,562 | 10.4 | 1.58 | 0.37 | 0.23
 60–69 | 253,281 | 8.8 | 1.56 | 0.53 | 0.35
 70–79 | 141,712 | 7.4 | 1.38 | 0.59 | 0.40
Race or ethnicity
 Non-Latina White | 659,991 | 10.9 | 1.68 | 0.45 | 0.29
 Non-Latina Black | 226,395 | 9.9 | 1.38 | 0.35 | 0.20
 Latina | 90,365 | 10.8 | 1.36 | 0.24 | 0.14
 Other or data missing | 83,904 | 11.2 | 1.94 | 0.38 | 0.22
Comparison mammography study
 No | 167,614 | 19.5 | 2.95 | 0.54 | 0.31
 Yes | 893,041 | 9.0 | 1.35 | 0.38 | 0.24
Breast density
 Fatty | 92,663 | 8.3 | 1.52 | 0.50 | 0.35
 Scattered fibroglandular | 430,657 | 9.1 | 1.40 | 0.39 | 0.26
 Heterogeneously dense | 450,929 | 12.6 | 1.80 | 0.41 | 0.24
 Extremely dense | 86,406 | 11.2 | 1.59 | 0.38 | 0.19
Personal history of breast biopsy
 No | 871,347 | 10.8 | 1.51 | 0.37 | 0.23
 Yes | 189,308 | 10.3 | 1.99 | 0.55 | 0.35
Family history of breast cancer
 None | 710,373 | 10.7 | 1.55 | 0.37 | 0.24
 Second-degree relative only | 185,763 | 10.5 | 1.59 | 0.38 | 0.22
 First-degree relative | 114,150 | 10.5 | 1.75 | 0.54 | 0.32
 More than one first-degree relative or early onset in first-degree relative | 50,369 | 11.2 | 1.94 | 0.65 | 0.39

Note—Minimally invasive cancer was defined as American Joint Committee on Cancer, 7th edition, stages 1 and 2A combined.

There were 36, 67, and 38 radiologists who read at least 1000 2D analog examinations, 2D digital examinations, and 3D tomosynthesis examinations, respectively. In analyses weighted for radiologist volumes, p values for the likelihood ratio tests comparing linear versus fractional polynomial terms were .19, .25, and .43 for cancer detection by recall rate among radiologists interpreting 2D analog, 2D digital, and 3D tomosynthesis examinations, respectively (results not tabulated). Corresponding p values for biopsy recommendation by recall rate among radiologists interpreting 2D analog, 2D digital, and 3D tomosynthesis examinations were .37, .50, and .33, respectively. In unweighted analyses of 2D analog, 2D digital, and 3D tomosynthesis examinations, these likelihood ratio p values were .29, .02, and .22 for cancer detection and .38, .12, and .74 for biopsy recommendation, respectively.

In the primary analysis (Fig. 1), recall rates between 7% and 9% appeared to maximize cancer detection while minimizing unnecessary biopsies; recall rates above 8% were associated with disproportionately increased biopsy rates with little increase in cancer detection rates. There was minimal difference in results when stratified by screening modality or when reexamining without weighting analyses by radiologist volumes. Linear projections of these relations are shown in Figure 2 (weighted for volume) and Figures 3 and 4 (unweighted). Graphical depictions of trends with respect to how minimally invasive breast cancer detection rates changed with recall rate were very similar (results not shown).

Fig. 1—


Scatterplots of adjusted radiologist-level mean cancer detection (squares) and biopsy recommendation (circles) rates as function of recall rates. Larger circles and squares represent larger screening volumes. Curvilinear fit is based on two best-fitting fractional polynomials and weighted for radiologist screening volumes.

A–D, Scatterplots for all mammograms (A), 2D analog mammograms (B), 2D digital mammograms (C), and 3D tomosynthesis mammograms (D).

Fig. 2—


Sensitivity analysis. Scatterplots of adjusted radiologist-level mean cancer detection (squares) and biopsy recommendation (circles) rates as function of recall rates show best linear fit weighted for radiologist screening volumes. Larger circles and squares represent larger screening volumes.

A–D, Scatterplots for all mammograms (A), 2D analog mammograms (B), 2D digital mammograms (C), and 3D tomosynthesis mammograms (D).

Fig. 3—


Sensitivity analysis. Scatterplots of adjusted radiologist-level mean cancer detection (squares) and biopsy recommendation (circles) rates as function of recall rates show curvilinear fit based on two best-fitting fractional polynomials (unweighted). Larger circles and squares represent larger screening volumes.

A–D, Scatterplots for all mammograms (A), 2D analog mammograms (B), 2D digital mammograms (C), and 3D tomosynthesis mammograms (D).

Fig. 4—


Sensitivity analysis. Scatterplots of adjusted radiologist-level mean cancer detection (squares) and biopsy recommendation rates (circles) as function of recall rates show best linear fit (unweighted). Larger circles and squares represent larger screening volumes.

A–D, Scatterplots for all mammograms (A), 2D analog mammograms (B), 2D digital mammograms (C), and 3D tomosynthesis mammograms (D).

Discussion

Within this large, geographically and ethnically diverse, community-based sample of patients and radiologists in a major metropolitan area across 34 points of breast care, the optimal recall rate for cancer detection appears to be in the range 7–9% for both 2D mammography and 3D tomosynthesis. These results are in contrast to a recent U.S. study suggesting a higher recall rate, 12–14%, to be the range for the “sweet spot” [12]. These results are also at the lower end of the ideal recall rate range of 5–12% reported by the BCSC in the era of screening film mammography, which was based on accounting for the trade-offs of higher recall rates in terms of sensitivity, specificity, and amount of additional workup [8, 9].

The previous U.S.-based study [12] grouped recall rates of less than 10% together for most of the analysis, which made it difficult to evaluate the trade-offs at the lower end of the range. In the current study, the recall rates were not collapsed at either end of the distribution (Table 2). Of note, radiologists with recall rates of 20.0% or greater were excluded because these radiologists appeared to be outliers who could artificially influence the modeled trends. Another reason for the lower “sweet spot” than previously described is that our study includes biopsy rates as a harmful impact of higher recall rates. Although recall rates higher than 5% may still incrementally increase the sensitivity of cancer detection, the returns diminish. Therefore, recall rates as low as 5% could be considered optimal because, above that level, the biopsy recommendation rate increases faster than the cancer detection rate.

TABLE 2:

Association of Cancer Detection and Biopsy Recommendation Rates With Recall Rate Modeled as a Nominal Variable and as Groups Adjusted for Patient Mix

Recall Rate (%) | Mean Recall Rate (%) | No. of Radiologists | No. of Screening Mammograms(a) | Cancer Detection Rate (%) | Biopsy Recommendation Rate (%) | No. of Recalls per Cancer Detected | No. of Biopsy Recommendations per Cancer Detected
Recall rate modeled in groups as one unit ranges
 4.0–4.9 | 4.2 | 1 | 6785 | 0.03 (−0.01, 0.06) | 0.25 (0.14, 0.37) | 166 | 10.0
 5.0–5.9 | 5.6 | 8 | 129,381 | 0.33 (0.30, 0.37) | 1.22 (1.16, 1.28) | 17 | 3.7
 6.0–6.9 | 6.4 | 4 | 41,880 | 0.13 (0.10, 0.16) | 0.71 (0.63, 0.79) | 49 | 5.4
 7.0–7.9 | 7.7 | 7 | 42,843 | 0.39 (0.32, 0.45) | 1.57 (1.44, 1.69) | 20 | 4.1
 8.0–8.9 | 8.3 | 8 | 177,476 | 0.42 (0.39, 0.45) | 1.40 (1.35, 1.46) | 20 | 3.3
 9.0–9.9 | 9.7 | 6 | 75,184 | 0.37 (0.32, 0.41) | 1.50 (1.41, 1.59) | 27 | 4.1
 10.0–10.9 | 10.5 | 5 | 91,109 | 0.52 (0.47, 0.57) | 1.97 (1.87, 2.06) | 20 | 3.8
 11.0–11.9 | 11.5 | 8 | 142,624 | 0.42 (0.39, 0.46) | 1.61 (1.54, 1.68) | 27 | 3.8
 12.0–12.9 | 12.7 | 6 | 52,352 | 0.42 (0.36, 0.48) | 1.14 (1.05, 1.24) | 30 | 2.7
 13.0–13.9 | 13.5 | 7 | 65,579 | 0.39 (0.34, 0.43) | 1.47 (1.38, 1.56) | 35 | 3.8
 14.0–14.9 | 14.3 | 6 | 108,102 | 0.44 (0.40, 0.48) | 2.29 (2.20, 2.39) | 32 | 5.2
 15.0–15.9 | 15.4 | 4 | 16,991 | 0.56 (0.44, 0.67) | 2.50 (2.26, 2.74) | 28 | 4.5
 16.0–16.9 | 16.9 | 1 | 1736 | 0.18 (−0.02, 0.38) | 2.39 (1.63, 3.15) | 95 | 13.4
 17.0–17.9 | 17.4 | 3 | 39,573 | 0.63 (0.55, 0.70) | 2.24 (2.09, 2.39) | 28 | 3.6
 18.0–18.9 | 18.4 | 1 | 1460 | 0.45 (0.01, 0.90) | 2.69 (1.79, 3.58) | 41 | 6.0
 19.0–19.9 | 19.1 | 1 | 1630 | 0.06 (−0.06, 0.19) | 1.76 (1.08, 2.43) | 297 | 27.4
Recall rate modeled in groups
 5.0–6.9 | 5.8 | 12 | 171,261 | 0.28 (0.26, 0.31) | 1.11 (1.06, 1.16) | 21 | 3.9
 7.0–8.9 | 8.2 | 15 | 220,319 | 0.41 (0.39, 0.44) | 1.43 (1.38, 1.48) | 20 | 3.5
 9.0–11.9 | 10.8 | 19 | 308,917 | 0.43 (0.41, 0.46) | 1.67 (1.63, 1.72) | 25 | 3.8
 12.0–19.9 | 14.4 | 29 | 287,423 | 0.45 (0.43, 0.48) | 1.87 (1.82, 1.92) | 32 | 4.1

Note—Values in parentheses are 95% confidence limits.

(a) The numbers of screening mammograms in each section do not add up to 1,060,655 because mammograms interpreted by radiologists with recall rates of 20.0% or greater were excluded.
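The last two columns of Table 2 appear to be simple ratios: the recall rate (or biopsy recommendation rate) divided by the cancer detection rate. The short Python sketch below reproduces that arithmetic for the four grouped rows, using the rounded rates printed in the table. Because the published ratios were presumably computed from unrounded rates, the recomputed biopsy ratios can differ from the tabulated ones by about 0.1. The function name `per_cancer` is illustrative only.

```python
# Grouped rows from Table 2:
# (recall range %, mean recall %, cancer detection %, biopsy recommendation %)
GROUPS = [
    ("5.0-6.9", 5.8, 0.28, 1.11),
    ("7.0-8.9", 8.2, 0.41, 1.43),
    ("9.0-11.9", 10.8, 0.43, 1.67),
    ("12.0-19.9", 14.4, 0.45, 1.87),
]

def per_cancer(rate_pct, cancer_pct):
    """Events (recalls or biopsy recommendations) per cancer detected."""
    return rate_pct / cancer_pct

if __name__ == "__main__":
    for label, recall, cdr, biopsy in GROUPS:
        recalls = round(per_cancer(recall, cdr))
        biopsies = round(per_cancer(biopsy, cdr), 1)
        print(f"{label}%: {recalls} recalls and {biopsies} biopsy "
              f"recommendations per cancer detected")
```

Read this way, moving from the 7.0–8.9% group to the 12.0–19.9% group raises the number of recalls per cancer detected from about 20 to about 32, while the cancer detection rate changes only from 0.41% to 0.45%.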

Although the differences in recall rate and biopsy rate among the three mammography modalities were not statistically significant in this study, 3D tomosynthesis had the lowest recall rate and the highest biopsy rate. This observation is supported by recently published studies, which reported that 3D tomosynthesis was associated with greater specificity and cancer detection rate compared with 2D digital mammography [16, 17]. Thus, with the increased use of 3D tomosynthesis in the United States, the recall sweet spot will likely decrease as well.

Optimal recall rates might vary by radiologist training and specialization. Although we lack specific, detailed information on reader experience, we know that breast imaging specialization was rare in this group of radiologists and that few, if any, were dedicated breast imagers. Radiologists interpreted a median of 1600 screening mammograms per year during the study period, which is a fairly modest volume that reflects the responsibilities these radiologists had for interpreting examinations of other anatomic sites. Thus, our results should be generalizable to other community-based practices but may not be generalizable to academic and other settings that rely predominantly on breast imaging specialists.

Interestingly, the lower recommended recall rates are consistent with European recall rates. A recent population study in England reported a recall rate of 7.6%, and a study in Sweden reported a recall rate of 2.48%, both lower than that of the United States [11, 18]. Despite these large differences in recall rates between Europe and the United States, estimated cancer detection rates of 5.8/1000 mammograms in the United States compared with 6.3/1000 mammograms in the United Kingdom were similar. An additional harm associated with higher recall rates in the United States may be the higher rates of additional imaging and invasive procedures when compared with Norway and Spain [11]. However, international comparison of recall rates and cancer detection rates is notoriously difficult given the differences in reading methods and screening intervals. The typical screening interval is 1 year in the United States, 2 years in Norway and Spain, and 3 years in the United Kingdom, and these differences in screening interval alter the prescreening cancer prevalence and therefore impact recall rates and cancer detection rates.

The negative impact of a false-positive screening mammogram on subsequent compliance with screening mammography has also been well studied. In a retrospective review of 741,150 screening mammograms of 261,767 women, investigators reported that the median delay in returning to screening mammography was higher for women with false-positive examinations than for those with true-negative examinations (median delay, 13 vs 3 months, respectively; p < .001) [14]. Women with a true-negative result were 36% more likely to return to screening within the next 36 months than those who received a false-positive result (hazard ratio, 1.36 [95% CI, 1.35–1.37]). Additionally, a prior false-positive mammogram was associated with an increased risk of late-stage cancer at diagnosis compared with a prior true-negative mammogram (0.4% vs 0.3%; p < .001) [14]. These results are consistent with multiple other studies showing lower rescreening rates among women who have experienced false-positive screening mammograms as compared with true-negatives [19–23]. In addition to the negative impact on patient compliance, false-positive mammogram results have been shown to induce cancer-specific psychosocial distress for up to 3 years [24]. Together, these studies highlight the psychosocial, physical, and economic burdens of false-positive screening mammography. A reduction of recall rates is especially important in the United States: almost half of the radiologists within the BCSC have a recall rate higher than the current U.S. recommendation of 12%. A 2006 study suggested that women may be willing to make the trade-off of a higher recall rate if it means having a higher cancer detection rate [25]. Therefore, although lower recall rates may reduce the number of unnecessary biopsies overall, individual patients might hypothetically be willing to trade higher recall rates and higher biopsy rates for a slight increase in cancer detection if this were possible. Population-level preferences (if known) could hypothetically be taken into account when considering the appropriate benchmark for recall rates; regardless, we need to better understand and more precisely quantify the trade-offs involved to establish relevant benchmarks.

The disproportionately higher biopsy recommendation rates with increasing recall rates could be due, in part, to radiologists who are risk-averse and therefore recall more patients at screening and recommend more biopsies at diagnostic workup. However, the unit of analysis was the radiologist recommending recall at screening, who may not have been the same radiologist who made the recommendation for biopsy. Nonetheless, these results reflect the impact of radiologist recall rates at screening on subsequent biopsy recommendation and cancer detection rates, as typically occur in community practice settings.

Our study has many strengths. We accounted for differences in patient demographics and risk factors and included a diverse sample of community radiologists. A limitation of this study, and of most studies like it, is the small number of cancers detected per radiologist, which results in large statistical variation in cancer detection rates. We included the individual radiologist scatterplots in our figures to emphasize this point, which tends to be ignored in studies such as ours. The graphical depictions are intended to be applicable in general. Nevertheless, future, more extensive studies would be beneficial to generate estimates better suited for specific populations, for example, for patients seen at highly accredited facilities where they are regularly screened versus patients seen at underresourced sites where they are irregularly screened. Additionally, like many studies published on recall rates, this study did not link examinations to a population-based cancer registry but rather relied on cancers diagnosed within this health care organization and available through the hospital tumor registries.

Conclusion

The optimal recall rate for cancer detection appears to be in the range of 7–9% for both 2D mammography and 3D tomosynthesis. Too many women are currently being called back for additional diagnostic imaging, and it might be possible through a combination of specialized training, double-reading, or other intervention to reduce the burden of false-positives on patients and the health care system without sacrificing early detection of breast cancer. In addition, setting a lower benchmark for recall rate could serve to motivate changes that improve the effectiveness of mammography as a cancer screening tool.

Acknowledgments

We thank the facilities and patients who participated in the Metro Chicago Breast Cancer Registry and who provided data for these analyses. We also thank Hai Nguyen for his statistical support.

Supported by the Breast Cancer Surveillance Consortium program project (P01CA154292) and supported in part by the National Institutes of Health (U54CA163303), the Patient-Centered Outcomes Research Institute (PCS-1504-30370), and the Agency for Healthcare Research and Quality (R01 HS018366-01A1).

Footnotes

The authors declare that they have no disclosures relevant to the subject matter of this article.

Based on a presentation at the Society for Epidemiologic Research 2019 annual meeting, Minneapolis, MN.
