Radiology. 2008 Feb;246(2):376–383. doi: 10.1148/radiol.2461070200

Diagnostic Accuracy of Digital versus Film Mammography: Exploratory Analysis of Selected Population Subgroups in DMIST

Etta D Pisano, R Edward Hendrick, Martin J Yaffe, Janet K Baum, Suddhasatta Acharyya, Jean B Cormack, Lucy A Hanna, Emily F Conant, Laurie L Fajardo, Lawrence W Bassett, Carl J D'Orsi, Roberta A Jong, Murray Rebner, Anna N A Tosteson, Constantine A Gatsonis
PMCID: PMC2659550  PMID: 18227537

Abstract

Purpose: To retrospectively compare the accuracy of digital versus film mammography in population subgroups of the Digital Mammographic Imaging Screening Trial (DMIST) defined by combinations of age, menopausal status, and breast density, by using either biopsy results or follow-up information as the reference standard.

Materials and Methods: DMIST included women who underwent both digital and film screening mammography. Institutional review board approval at all participating sites and informed consent from all participating women in compliance with HIPAA were obtained for DMIST and this retrospective analysis. Areas under the receiver operating characteristic curve (AUCs) for each modality were compared within each subgroup evaluated (age < 50 vs 50–64 vs ≥ 65 years, dense vs nondense breasts at mammography, and pre- or perimenopausal vs postmenopausal status for the two younger age cohorts [10 new subgroups in toto]) while controlling for multiple comparisons (P < .002 indicated a significant difference). All DMIST cancers were evaluated with respect to mammographic detection method (digital vs film vs both vs neither), mammographic lesion type (mass, calcifications, or other), digital machine type, mammographic and pathologic size and diagnosis, existence of prior mammographic study at time of interpretation, months since prior mammographic study, and compressed breast thickness.

Results: Thirty-three centers enrolled 49 528 women. Breast cancer status was determined for 42 760 women, the group included in this study. Pre- or perimenopausal women younger than 50 years who had dense breasts at film mammography comprised the only subgroup for which digital mammography was significantly better than film (AUCs, 0.79 vs 0.54; P = .0015). Breast Imaging Reporting and Data System–based sensitivity in this subgroup was 0.59 for digital and 0.27 for film mammography. AUCs were not significantly different in any of the other subgroups. For women aged 65 years or older with fatty breasts, the AUC showed a nonsignificant tendency toward film being better than digital mammography (AUCs, 0.88 vs 0.70; P = .0025).

Conclusion: Digital mammography performed significantly better than film for pre- and perimenopausal women younger than 50 years with dense breasts, but film tended nonsignificantly to perform better for women aged 65 years or older with fatty breasts.

© RSNA, 2008


The American College of Radiology Imaging Network sponsored the Digital Mammographic Imaging Screening Trial (DMIST) to assess the diagnostic accuracy of digital mammography compared with film mammography for women presenting for breast cancer screening (1). DMIST revealed that digital and film mammography had statistically similar diagnostic accuracy for the overall screening population but that digital was significantly better, as measured by a greater area under the receiver operating characteristic curve (AUC), for women who were younger than 50 years, who were pre- or perimenopausal, or who had mammographically dense breasts (2). The difference in diagnostic accuracy (AUC value) between digital and film mammography in the affected populations was driven by the improved sensitivity of digital mammography compared with film mammography, without a difference in specificity (2).

These results have puzzled some readers (3), who have inferred that a significant difference in performance for digital mammography in one subset of the population with no difference in performance overall must mean that film mammography outperformed digital for some other subset. Because the results were not anticipated, the originally planned analysis did not attempt to dissect the effect of the three different factors defining the groups for which digital mammography performed better—that is, age, menopausal status, and breast density.

Breast density is known to affect mammographic accuracy (4) and is considered to be the most likely driving factor for the DMIST results (5). Age is correlated with density, with younger women tending to have mammographically denser breasts. Both of these factors were well classified in DMIST, with density defined by using the four-point Breast Imaging Reporting and Data System (BI-RADS) scale in use at the time the study was open to accrual. Menopausal status was defined in DMIST through self-report by each subject. A woman reporting regular menstruation was considered premenopausal. Women whose last menstrual period was less than 1 year prior to the study mammography and who stated they were no longer having regular menstruation were considered perimenopausal. Women whose last menstrual period was more than 1 year prior to the study mammogram were considered postmenopausal. All women who had undergone hysterectomy were defined as postmenopausal, regardless of their ovarian status and the date of their last menstrual period.
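The self-report rules for menopausal status described above can be written out as a small decision function. This is only an illustrative sketch: the function name, argument names, and the "unknown" fallback are hypothetical, and the handling of a last menstrual period exactly 1 year before the study mammogram is an assumption, because the text specifies only "less than" and "more than" 1 year.

```python
from typing import Optional


def menopausal_status(regular_menstruation: bool,
                      months_since_last_period: Optional[float],
                      hysterectomy: bool) -> str:
    """Classify self-reported menopausal status using the rules in the text."""
    # Hysterectomy overrides everything else, regardless of ovarian status
    # or the date of the last menstrual period.
    if hysterectomy:
        return "postmenopausal"
    # Regular menstruation means premenopausal.
    if regular_menstruation:
        return "premenopausal"
    # Hypothetical fallback: the text does not describe missing answers.
    if months_since_last_period is None:
        return "unknown"
    # Last period less than 1 year ago and no longer regular: perimenopausal.
    if months_since_last_period < 12:
        return "perimenopausal"
    # Assumption: exactly 12 months is grouped with "more than 1 year".
    return "postmenopausal"


print(menopausal_status(False, 6, False))   # perimenopausal
print(menopausal_status(True, 1, True))     # postmenopausal
```

Placing the hysterectomy check first mirrors the statement that all such women were defined as postmenopausal irrespective of other answers.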

To provide additional information from the DMIST study beyond the originally planned and reported subset analyses, our study retrospectively compared the accuracy of digital versus film mammography in subgroups defined by combinations of age, menopausal status, and breast density, by using either biopsy results or follow-up information as the reference standard.

MATERIALS AND METHODS

DMIST Study

Detailed descriptions of the methods used in DMIST are published elsewhere (1,2). Briefly, DMIST enrolled women at 33 sites in the United States and Canada during 25 consecutive months. Women underwent both digital and film mammography. Two different radiologists interpreted each subject's examinations—one reader for digital and one for film. Work-up proceeded if results of either examination were positive. Truth (the reference standard) regarding breast cancer status was determined with biopsy results for women who underwent biopsy or with imaging and/or clinical follow-up for at least 10 months after initial imaging (1,2). Breast density was determined by using the American College of Radiology BI-RADS four-point scale for density by the radiologist interpreting each woman's film mammogram.

DMIST, and this retrospective new analysis, had institutional review board approval at the American College of Radiology Imaging Network and at all participating sites (Appendix), participant informed consent, and compliance with the Health Insurance Portability and Accountability Act. No direct industrial support was provided for DMIST or for this study. Digital mammography machines (the Computed Radiography System for Mammography; Fuji Medical, Tokyo, Japan) that were not approved for purchase by the U.S. Food and Drug Administration at the time of DMIST were provided at the expense of the manufacturers. Coauthors who are not paid consultants or employees of the digital mammography machine manufacturers had control of inclusion of the data and the information that is included in this manuscript. (L.L.F. is a salaried member of the Board of Directors of and C.J.D. is a paid consultant to Hologic [Bedford, Mass]. C.J.D. is a paid consultant to GE Medical Systems [Milwaukee, Wis]. J.K.B. was paid as a consultant to Fischer Medical [Denver, Colo] during DMIST.)

Current Study Reference Standard

The reference standard status of each participant was defined as positive for malignancy if there was evidence of pathologically verified cancer within 455 days after initial study mammography and as negative for malignancy if the participant's status was not classified as positive and if their breast cancer status was determined to be negative at the enrolling institution 10 months or more after study entry, either through follow-up mammography (including subsequent work-up) or with other information. Participants whose status was neither positive nor negative were classified as having indeterminate status if they had undergone a breast biopsy with indeterminate results (defined as insufficient or nondiagnostic interpretations); had undergone a follow-up mammographic study that was interpreted as showing BI-RADS category 3, 4, or 5 findings and had no additional follow-up information; or had died during the follow-up period without a diagnosis of breast cancer. All other participants were classified as having an unknown reference standard.
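As a reading aid, the four reference-standard categories can be expressed as an ordered set of checks. The sketch below is not the trial's data-management code; every name in it is hypothetical, and treating "within 455 days" as inclusive of day 455 is an assumption.

```python
from typing import Optional


def reference_standard(days_to_verified_cancer: Optional[int],
                       negative_followup_months: Optional[float],
                       indeterminate_biopsy: bool = False,
                       unresolved_birads_3_4_5: bool = False,
                       died_without_diagnosis: bool = False) -> str:
    """Assign a reference-standard status following the order in the text."""
    # Positive: pathologically verified cancer within 455 days of the
    # initial study mammogram (inclusive bound is an assumption).
    if days_to_verified_cancer is not None and days_to_verified_cancer <= 455:
        return "positive"
    # Negative: not positive, and status determined to be negative at the
    # enrolling institution 10 months or more after study entry.
    if negative_followup_months is not None and negative_followup_months >= 10:
        return "negative"
    # Indeterminate: any of the three situations listed in the text.
    if indeterminate_biopsy or unresolved_birads_3_4_5 or died_without_diagnosis:
        return "indeterminate"
    # Everything else has an unknown reference standard.
    return "unknown"


print(reference_standard(300, None))   # positive
print(reference_standard(None, 12))    # negative
```

Only the "positive" and "negative" outcomes correspond to fully verified participants.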

Final Study Group

Participants with either positive or negative reference-standard status comprised the set of “fully verified” patients whose cancer status had been determined and who were used in the analysis as the final study group.

New Retrospective Subgroup Analyses

We undertook a new retrospective analysis of the accuracy of digital and film mammography for new population subsets defined by combinations of age, menopausal status, and mammographic density. Specifically, we compared digital and film mammography in 10 subgroups of women: pre- and perimenopausal women younger than 50 years with fatty breasts, pre- and perimenopausal women younger than 50 years with dense breasts, postmenopausal women younger than 50 years with fatty breasts, postmenopausal women younger than 50 years with dense breasts, pre- and perimenopausal women between 50 and 64 years of age with fatty breasts, pre- and perimenopausal women between 50 and 64 years of age with dense breasts, postmenopausal women between 50 and 64 years of age with fatty breasts, postmenopausal women between 50 and 64 years of age with dense breasts, women aged 65 years or older with fatty breasts, and women aged 65 years or older with dense breasts. Menopausal status was eliminated as a factor for women aged 65 years or older because there were so few pre- and perimenopausal women in this age group, and there is high likelihood that such women were misclassified. We chose to subdivide the group of women older than 50 years into two subgroups because of questions about the relative utility of the two mammographic technologies in the U.S. Medicare population, questions that are particularly relevant to the DMIST cost-effectiveness analysis, whose results will be reported elsewhere. All of these analyses were based on the original DMIST data, with interpretations performed by the on-site study radiologists.
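The way the three factors combine into exactly 10 subgroups can be checked with a short sketch. The function and label strings are hypothetical, and coding the four-point BI-RADS density scale as the integers 1 to 4 (with 3 and 4 as the two densest categories) is an assumption made for illustration.

```python
def subgroup(age: int, menopausal: str, birads_density: int) -> str:
    """Map a participant to one of the 10 analysis subgroups."""
    # Density is dichotomized: two densest categories versus two least dense.
    density = "dense" if birads_density >= 3 else "fatty"
    # Menopausal status is not used for women aged 65 years or older.
    if age >= 65:
        return f"65+ / {density}"
    band = "<50" if age < 50 else "50-64"
    # Pre- and perimenopausal women are analyzed as a single category.
    meno = ("postmenopausal" if menopausal == "postmenopausal"
            else "pre/perimenopausal")
    return f"{band} / {meno} / {density}"


# Enumerate every combination of the inputs and count distinct subgroups.
labels = {
    subgroup(age, meno, density)
    for age in (40, 55, 70)
    for meno in ("premenopausal", "perimenopausal", "postmenopausal")
    for density in (1, 2, 3, 4)
}
print(len(labels))  # 10
```

Two age bands with two menopausal categories and two density categories give 8 subgroups; the oldest age band contributes 2 more because menopausal status is dropped there.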

In addition, we evaluated all cancers with respect to mammographic detection method (digital vs film vs both vs neither), mammographic lesion type (mass, calcifications, or other), digital machine type, mammographic and pathologic size and diagnosis, existence of prior mammographic study at the time of interpretation, months since prior mammographic study, and compressed breast thickness. Digital machine types included in DMIST were the SenoScan (Fischer Medical), the Computed Radiography System for Mammography (Fuji Medical), the Senographe 2000D (GE Medical Systems), the Digital Mammography System (Hologic), and the Selenia Full Field Digital Mammography System (Hologic). Mammographic lesion type (mass, calcifications, or other, with masses plus calcifications classified as masses) and size of breast cancers were recorded by a single radiologist (E.D.P., with 23 years of experience in interpreting breast imaging studies), who reviewed film and digital mammograms of all studies known to contain cancer side by side, with knowledge of lesion location and diagnosis after the primary DMIST study was completed. Mammographic size was measured in millimeters by using the digital and film mammogram that best depicted the lesion according to the judgment of the single reader. Pathologic size was obtained from pathology reports acquired from the clinical sites. Compressed breast thickness was obtained from the Digital Imaging and Communications in Medicine headers of the digital mammograms.

Statistical Analysis

Receiver operating characteristic curves for digital and film mammography were estimated from the pooled data across the study by using the seven-point malignancy score assigned to each patient at the time of screening mammography and before further work-up (1). The AUCs were compared by using the bivariate binormal model, which accounts for the paired test design (6,7). A corroborating nonparametric AUC analysis was also performed (8,9). Although the analysis for our current study was exploratory and the subsets were not planned in the original study, we controlled for multiple comparisons by using a Bonferroni correction, and we consider the AUC subset comparisons reported as additional comparisons to the 15 reported in the primary DMIST study. We required a P value of less than .002 (.05/25) to declare significance of differences in our current study.
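The Bonferroni arithmetic, together with the general idea behind a nonparametric AUC, can be illustrated as follows. This sketch does not implement the bivariate binormal model used for the primary comparison, nor a paired test of two AUCs; it shows only the empirical (Mann-Whitney) AUC estimate for a single modality. The function names and the toy seven-point scores are invented for illustration and are not DMIST data.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


def empirical_auc(scores_cancer, scores_no_cancer) -> float:
    """Mann-Whitney AUC estimate: the probability that a randomly chosen
    cancer case scores higher than a randomly chosen noncancer case,
    counting ties as one half."""
    wins = 0.0
    for c in scores_cancer:
        for n in scores_no_cancer:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(scores_cancer) * len(scores_no_cancer))


# 15 comparisons in the primary study plus 10 new ones gives 25 in total.
threshold = bonferroni_threshold(0.05, 15 + 10)
print(round(threshold, 4))      # 0.002

# P values quoted in the abstract, compared against the threshold.
print(0.0015 < threshold)       # True: significant
print(0.0025 < threshold)       # False: not significant

# Toy seven-point malignancy scores (hypothetical).
print(empirical_auc([5, 6, 3, 7], [1, 2, 3, 4, 2]))  # 0.925
```

The threshold explains why a P value of .0025 is described as a nonsignificant tendency even though it would pass a conventional .05 cutoff.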

In addition to the exploration of the AUC comparisons, estimates of the sensitivity, specificity, and positive predictive value (the so-called PPV1, henceforth referred to as PPV) of the two mammographic modalities were computed on the basis of the BI-RADS scores assigned at initial screening interpretation dichotomized into negative (BI-RADS scores of 1, 2, and 3) and positive (BI-RADS scores of 0, 4, and 5) scores. Comparisons of estimates were performed by using the McNemar test. Additional work-up status was used to classify the breast cancer status of subjects in some parts of the analysis. A participant's status was classified as positive for additional work-up if a BI-RADS score of 0, 4, or 5 had been assigned during the initial digital or film mammographic study or if the initial screening test resulted in a recommendation for further imaging (additional mammographic views, ultrasonography, magnetic resonance imaging), physical examination, or biopsy. In accordance with the approach taken in the primary DMIST study, the comparisons of sensitivity, specificity, and PPV were treated as descriptive in the analysis.
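The dichotomization of BI-RADS scores and the resulting accuracy estimates can be sketched as below. The function names and the small example data set are hypothetical; returning NaN when a denominator is zero is an implementation choice made here, not something described in the source.

```python
# BI-RADS assessments treated as a positive screening result in the text.
POSITIVE_BIRADS = {0, 4, 5}


def is_positive(birads: int) -> bool:
    """Dichotomize a BI-RADS score: 0, 4, 5 positive; 1, 2, 3 negative."""
    return birads in POSITIVE_BIRADS


def _ratio(numerator: int, denominator: int) -> float:
    # Avoid division by zero for empty categories.
    return numerator / denominator if denominator else float("nan")


def accuracy_measures(birads_scores, has_cancer) -> dict:
    """Sensitivity, specificity, and PPV from paired scores and truth."""
    tp = fp = tn = fn = 0
    for score, cancer in zip(birads_scores, has_cancer):
        positive = is_positive(score)
        if positive and cancer:
            tp += 1
        elif positive and not cancer:
            fp += 1
        elif not positive and cancer:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
    }


# Hypothetical toy data: eight screening examinations for one modality.
scores = [0, 4, 1, 2, 5, 3, 1, 0]
truth = [True, True, False, False, False, True, False, False]
print(accuracy_measures(scores, truth))
```

In this toy example there are 2 true-positive, 2 false-positive, 3 true-negative, and 1 false-negative results, so sensitivity is 2/3, specificity is 3/5, and PPV is 2/4.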

Statistical software (SAS, version 9.1, SAS Institute, Cary, NC; and ROCKIT, version 0.9 beta [available from the Kurt Rossmann Laboratories for Radiologic Image Research at the University of Chicago, Chicago, Ill, at http://www-radiology.uchicago.edu/krl/index.htm]) was used in the statistical analysis.

RESULTS

Of the 49 528 women who were eligible for DMIST and had complete imaging analyses, only 42 760 had breast cancer truth status information (either biopsy results or findings at follow-up 10 months or later after study entry) (2). Figure 1 shows the screening results for these 42 760 women, with information on the BI-RADS interpretation of their digital and film screening mammograms.

Figure 1.

Chart shows screening mammography results for all women in DMIST included in current study. + = Positive, − = negative, DG = digital mammography, SF = screen-film mammography, Ref Std = reference standard.

Subgroups

The 42 760 women and their breast cancers (Table 1) were divided into 10 subgroups on the basis of age (younger than 50 years, between 50 and 64 years of age, and aged 65 years or older) and breast density (dichotomized between the two densest and two least dense BI-RADS categories). The youngest two age cohorts were also divided into pre- or perimenopausal and postmenopausal groups. There were only 19 women in the entire population who could not be classified because of missing breast density classifications. There were 7315 pre- and perimenopausal women younger than 50 years with dense breasts and 4600 with fatty breasts. There were 1107 postmenopausal women younger than 50 years with dense breasts and 1108 with fatty breasts. There were 1964 pre- and perimenopausal women aged 50–64 years with dense breasts and 1874 with fatty breasts. There were 6716 postmenopausal women aged 50–64 years with dense breasts and 9547 with fatty breasts. There were 2507 women aged 65 years or older with dense breasts and 5379 with fatty breasts.

Table 1.

Numbers of Women and Cancers in Each Age and Breast Density Subgroup

* Number of cancers divided by total number of cancers diagnosed in DMIST (ie, 335).

Number of subjects in that subgroup divided by the total number of subjects (ie, 42 760).

Percentages may not add up to 100% owing to rounding.

Subgroup Comparisons

Table 2 shows the AUCs (derived by using the DMIST seven-point scale), as well as the sensitivities, specificities, and PPVs (derived by using BI-RADS categories), with 95% confidence intervals and P values, for the 10 subgroups for both digital and film mammography. For this exploratory analysis involving multiple comparisons, with the application of a Bonferroni correction for 25 total subset comparisons of AUC (15 in the primary DMIST study, plus 10 in Table 2), a P value indicates significance only if it is less than .002. The data in Table 2 show that the subgroup comparisons between digital and film mammography that proved to yield statistically significant differences were the AUC, sensitivity, and PPV for pre- and perimenopausal women younger than 50 years with dense breasts (AUC for digital, 0.791; AUC for film, 0.544; difference in AUC, 0.247 [P = .0015]; sensitivity for digital, 0.591; sensitivity for film, 0.273 [P = .0013]; and PPV for digital, 0.033; PPV for film, 0.015 [P = .0005]). The only other comparisons that approached statistical significance were the AUCs and PPVs for women aged 65 years or older who had nondense breasts (AUC, 0.705 for digital and 0.877 for film; 95% confidence intervals, 0.578, 0.811 and 0.804, 0.929, respectively [P = .0025]; and PPV, 0.092 for digital and 0.127 for film; 95% confidence intervals, 0.064, 0.126 and 0.094, 0.168, respectively [P = .0055]). Interestingly, the trend for women aged 65 years or older with fatty breasts was in favor of improved diagnostic accuracy for film over digital mammography.

Table 2.

AUCs, Sensitivities, Specificities, and PPVs for Digital and Film Mammography in 10 Demographic Subgroups

Note.—Data in parentheses are 95% confidence intervals (which are exact for sensitivity and specificity). AUCs were achieved by using the seven-point DMIST malignancy scale; sensitivity, specificity, and PPV were achieved by using BI-RADS categories.

* For comparison of digital versus film mammography. P < .002 indicates a significant difference.

All mammographic lesion types in women younger than 50 years with dense breasts were more frequently detected with digital than with film mammography; conversely, all lesion types in women aged 65 years or older with fatty breasts were more frequently detected with film mammography (Table 3). Excluding the Hologic machines (because so few cancers were detected with those systems), all digital machines depicted more cancers than film systems for women younger than 50 years with dense breasts, and film systems depicted more cancers than all digital machines for women aged 65 years or older with fatty breasts.

Table 3.

Comparison of Characteristics of Cancers in Various DMIST Population Subgroups

Note.—Unless otherwise specified, data are numbers of participants, with percentages (based on a denominator of 106 total subjects) in parentheses.

In the subgroup of pre- and perimenopausal women younger than 50 years with dense breasts, 16 cancers were found with digital and missed with film mammography, while only two cancers were found with film and missed with digital mammography. On the other hand, in the subgroup of women aged 65 years or older with nondense breasts, 15 cancers were found with film and missed with digital mammography, while only four cancers were found with digital and missed with film mammography. Note that in this latter population, six of the 15 cancers found with film and missed with digital mammography were missed with the Fischer digital unit, and only 11 total cancers were imaged with the Fischer unit. This represents a higher percentage of tumors missed at digital mammography (six [55%] of 11 cancers) with the Fischer unit than with the other machine types in this population (two [18%] of 11 cancers were missed with the Fuji digital unit, and seven [18%] of 38 cancers were missed with the GE digital unit).
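The discordant counts and per-unit miss rates quoted above lend themselves to a quick arithmetic check. The source states that the McNemar test was used but not which variant; the sketch below assumes the exact two-sided (binomial) form, and the function names are hypothetical. With 16 versus 2 discordant cancers it yields approximately .0013, which is close to the P value reported for sensitivity in that subgroup, although the source does not confirm that the value was derived in this way.

```python
from math import comb


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the two discordant counts.

    Under the null hypothesis, each discordant pair is equally likely to
    favor either modality, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def miss_percent(missed: int, total: int) -> int:
    """Percentage of cancers missed, rounded to the nearest integer."""
    return round(100 * missed / total)


# Women younger than 50 years, pre/perimenopausal, dense breasts:
# 16 cancers found only with digital, 2 found only with film.
print(round(exact_mcnemar_p(16, 2), 4))   # 0.0013

# Women aged 65 years or older with nondense breasts:
# 15 cancers found only with film, 4 found only with digital.
print(round(exact_mcnemar_p(15, 4), 4))   # 0.0192

# Cancers missed at digital mammography, by unit, in the older subgroup.
print(miss_percent(6, 11), miss_percent(2, 11), miss_percent(7, 38))  # 55 18 18
```

Because only discordant pairs enter the test, cancers detected with both modalities or with neither do not influence the result.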

Also of interest is the presence of ductal carcinoma in situ and mammographically smaller lesions throughout both age groups. There were also no obvious trends regarding prior studies for comparison, in terms of either their availability or the length of time since the earlier study. In addition, we found no difference in compressed breast thickness that would explain the differences in performance between digital and film mammography in the two age groups.

DISCUSSION

The results reported here corroborate the trend in favor of improved diagnostic accuracy of digital mammography over film for pre- and perimenopausal women younger than 50 years with dense breasts. They also reveal a nonsignificant trend toward improved diagnostic accuracy of film over digital mammography for women aged 65 years or older with fatty breasts. For most groups evaluated, there was no significant difference between digital and film mammography, as in the primary DMIST analysis. This lack of significant differences may, of course, reflect the small sample sizes that result when subgroups are analyzed.

We evaluated the possible causes for these trends, but could not identify a definitive cause on the basis of the original DMIST data. Because digital mammography has improved image contrast with worse spatial resolution compared with film, we evaluated the types of lesions found with both modalities in each subset of the population. We saw no difference in the mammographic lesion size and the presence of ductal carcinoma in situ that could explain the detection difference between the two populations, suggesting that differences in spatial resolution are not responsible for the trend seen between younger and older women. The only factor that seemed to be correlated with the poorer performance of digital mammography in older women was the higher percentage of cancers missed when digital screening was performed by using the Fischer unit compared with the other machine types. Very few cancers were detected with each machine, however, so the importance of this factor is difficult to determine with certainty. Of course, DMIST was not designed to answer this specific question, and it is not surprising that the study power was insufficient for us to answer definitively many interesting questions that arise from the results of our primary DMIST analysis.

Another possible explanation for the source of variation in diagnostic accuracy between digital and film mammography found in DMIST is the variability of the interpretive performance of the readers who participated in the study. This factor was mitigated as much as possible by the study design, in that all readers read approximately equal numbers of digital and film studies.

We believe our study provides additional information for researchers contemplating further research studies of digital mammography. However, it does not provide a definitive answer to the interesting question of why digital mammography performed better than film mammography for women with dense breasts, women younger than 50 years, and pre- and perimenopausal women and why there was a tendency toward better performance for film mammography for women aged 65 years or older with fatty breasts. The results of these exploratory analyses are presented as hypothesis generating. They do not supplant the previously published results (2) but rather serve to provide more detailed information and, ultimately, contribute to greater understanding of the factors that may affect the comparison of digital and screen-film mammography.

To address some of the remaining questions, we are currently performing a study in which highly experienced readers will compare digital and film mammograms for all DMIST cancers. Perhaps the image processing used for dense and fatty breasts causes a difference in performance of digital versus film for the two populations that is most apparent when the population is highly skewed to women with either very dense or very fatty breasts, as it would be in the youngest and oldest age groups studied. We hope such a review will help determine whether image characteristics themselves may explain the differences in diagnostic accuracies for these two distinct populations of women.

ADVANCES IN KNOWLEDGE

  • Digital mammography was significantly better than film mammography in pre- and perimenopausal women younger than 50 years with dense breasts (P = .0015).

  • In all other population subgroups, there was no significant difference in diagnostic accuracy between digital and film mammography (P > .002).

  • For women aged 65 years or older with fatty breasts, there was a nonsignificant trend toward improved diagnostic accuracy of film over digital mammography (P = .0025).

IMPLICATION FOR PATIENT CARE

  • Radiologists should strongly consider using digital rather than film mammography for the population subsets in which digital mammography had greater diagnostic accuracy than film mammography (women younger than 50 years, women with dense breasts at mammography, and pre- or perimenopausal women).

Table 4.

Principal Investigators and Lead Physicists at DMIST Clinical Sites

Acknowledgments

The authors gratefully acknowledge the important contributions of all the many people involved at American College of Radiology Imaging Network headquarters and at the recruiting sites in the completion of this work. Special thanks go to all of the site principal investigators and lead physicists, whose names are listed above as the DMIST Investigators Group. In addition, the trial would not have been possible without the other radiologists and research assistants at all of the recruiting sites.

Abbreviations

  • AUC = area under the receiver operating characteristic curve

  • BI-RADS = Breast Imaging Reporting and Data System

  • DMIST = Digital Mammographic Imaging Screening Trial

  • PPV = positive predictive value

APPENDIX

The clinical sites of DMIST, as well as the principal investigators and lead physicists at each site, are listed in Table 4.

Guarantors of integrity of entire study, E.D.P., C.A.G.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, R.E.H., M.J.Y., C.A.G.; clinical studies, R.E.H., M.J.Y., J.K.B., E.F.C., L.L.F., L.W.B., C.J.D., R.A.J.; statistical analysis, S.A., J.B.C., L.A.H., C.A.G.; and manuscript editing, R.E.H., M.J.Y., J.K.B., S.A., J.B.C., E.F.C., L.L.F., C.J.D., R.A.J., M.R., A.N.A.T., C.A.G.

NIH funding: This research was supported by the National Institutes of Health [grant number 5 U01 CA080098-09].

References

  • 1. Pisano ED, Gatsonis C, Yaffe M, et al. American College of Radiology Imaging Network digital mammographic imaging screening trial: objectives and methodology. Radiology 2005;236(2):404–412.
  • 2. Pisano ED, Gatsonis C, Hendrick E, et al. Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med 2005;353(17):1773–1783. [Published correction appears in N Engl J Med 2006;355(17):1840.]
  • 3. Keen JD. Digital and film mammography. N Engl J Med 2006;354(7):765–767.
  • 4. Carney PA, Miglioretti DL, Yankaskas BC, et al. Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Ann Intern Med 2003;138(3):168–175. [Published correction appears in Ann Intern Med 2003;138(9):771.]
  • 5. Pisano ED. Digital mammography: what next? J Am Coll Radiol 2006;3(8):583–585.
  • 6. Metz C, Wang P, Kronman H. A new approach for testing the significance of differences between ROC curves measured from correlated data. In: Deconinck F, ed. Information processing in medical imaging. The Hague, the Netherlands: Nijhoff, 1984.
  • 7. Zhou XH, Obuchowski N, McClish D. Statistical methods in diagnostic medicine. New York, NY: Wiley, 2002.
  • 8. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a non-parametric approach. Biometrics 1988;44(3):837–845.
  • 9. Toledano AY, Gatsonis C. Generalized estimating equations for ordinal categorical data: arbitrary patterns of missing responses and missingness in a key covariate. Biometrics 1999;55(2):488–496.

Articles from Radiology are provided here courtesy of Radiological Society of North America
