2016 Dec 23;162(1):95–103. doi: 10.1007/s10549-016-4090-7

Volumetric breast density affects performance of digital screening mammography

Johanna O P Wanders 1, Katharina Holland 2, Wouter B Veldhuis 3, Ritse M Mann 2, Ruud M Pijnappel 3,4, Petra H M Peeters 1,5, Carla H van Gils 1, Nico Karssemeijer 2
PMCID: PMC5288416  PMID: 28012087

Abstract

Purpose

To determine to what extent automatically measured volumetric mammographic density influences screening performance when using digital mammography (DM).

Methods

We collected a consecutive series of 111,898 DM examinations (2003–2011) from one screening unit of the Dutch biennial screening program (age 50–75 years). Volumetric mammographic density was automatically assessed using Volpara. We determined screening performance measures for four density categories comparable to the American College of Radiology (ACR) breast density categories.

Results

Of all the examinations, 21.6% were categorized as density category 1 (‘almost entirely fatty’), and 41.5, 28.9, and 8.0% as categories 2, 3, and 4 (‘extremely dense’), respectively. We identified 667 screen-detected and 234 interval cancers. Interval cancer rates were 0.7, 1.9, 2.9, and 4.4‰ and false positive rates were 11.2, 15.1, 18.2, and 23.8‰ for categories 1–4, respectively (both p-trend < 0.001). The screening sensitivity, calculated as the proportion of screen-detected cancers among the total of screen-detected and interval cancers, decreased with increasing density: 85.7, 77.6, 69.5, and 61.0% for categories 1–4, respectively (p-trend < 0.001).

Conclusions

Volumetric mammographic density, automatically measured on digital mammograms, affects screening performance measures in the same way as has been established for visually assessed ACR breast density categories. Since fully automatic breast density measurement has much higher reproducibility than visual assessment, this automatic method could facilitate the implementation of density-based supplemental screening.

Keywords: Mammographic density, Breast cancer, Cancer screening, Mammography, Breast

Introduction

Breast density increases breast cancer risk [1, 2]. In addition, the sensitivity of screening mammography is lower in women with dense breasts because dense (fibroglandular) breast tissue can mask tumors [3, 4]. This has led to breast density legislation in 28 states of the United States of America (USA) to date, and has fueled ongoing discussions worldwide on the need for supplemental screening for women with dense breasts [5].

It was hoped that screening performance in women with dense breasts would improve when film-screen mammography (FSM) was replaced by digital mammography (DM). Unfortunately, screening sensitivity remained worse in women with dense compared to nondense breasts when DM was used [6–8]. Most large studies of the effect of breast density on screening performance used the Breast Imaging Reporting and Data System (BI-RADS) density categories, which are assessed visually by radiologists. This method, however, has only moderate inter-observer agreement [9–12].

With the advent of digital mammography, several fully automatic volumetric density assessment methods have been developed. Volpara is one of these methods and has been shown to correlate with BI-RADS density categories and with MRI breast density measurements [13–16].

The effect of automatically measured volumetric breast density on screening sensitivity has been studied only once [17]. That study, however, did not report its effect on other screening performance measures such as recall rates, false positive rates, and positive predictive values (PPV). Therefore, the aim of the present study was to examine to what extent automatically measured volumetric mammographic density affects screening sensitivity and other screening performance measures in a large Dutch population-based screening cohort with a consecutive series of digital screening mammograms and complete information on interval cancers.

Materials and methods

Study population

Data were acquired from a breast cancer screening unit (Preventicon screening unit 19, Utrecht, the Netherlands) of the Foundation of Population Screening Mid-West, one of the five screening regions of the Dutch breast cancer screening program. Women participating in this biennial screening program are aged 50–75 years. The program involves mammography only, and all mammograms are read by two certified screening radiologists. In the Dutch screening program, previous screening mammograms are usually available for comparison at subsequent screens.

In 2003, DM was introduced at the Preventicon screening unit [18–20]. Analog mammography systems were gradually replaced by digital ones, and by July 2007 almost all mammograms at this screening unit were digital [19].

By participating in the Dutch screening program, women consent to their data being used for evaluation and improvement of the screening, unless they have indicated otherwise.

Data collection

We prospectively collected all unprocessed DM examinations taken at the Preventicon screening unit between 2003 and 2011, with the exception of a 4-month period in 2009 when only processed data were archived. All mammograms were acquired using Lorad Selenia DM systems (Hologic, Danbury, Conn.). The first screening examination of a woman in the screening program always included the two standard views, craniocaudal (CC) and mediolateral oblique (MLO). At subsequent screening examinations, MLO was the routinely acquired view; during the study period, CC was additionally acquired by indication (e.g., high breast density, visible abnormality) in 57% of these cases. Recall and breast cancer detection information was obtained from the screening registration system. Interval cancers were identified through linkage with the Netherlands Cancer Registry.

Examinations were excluded when information about recall or final outcome was missing or when breast density could not be determined. Interval cancers diagnosed more than 24 months after the last screening mammogram were also excluded from analysis.

Tumor information such as maximum diameter, nodal status, and ICD-O codes was obtained from the screening registration system. Nodes were classified as negative when the sentinel lymph node, or the dissection specimen in case no sentinel lymph node procedure was performed, contained no or only isolated tumor cells. Nodes were considered positive if they contained micrometastases (0.2–2 mm) or metastases larger than 2 mm.

Volumetric mammographic density assessment

Percentage dense volume (PDV) was automatically assessed from unprocessed mammograms of the left and right breasts, and MLO and CC views using the commercially available Volpara Density software (version 1.5.0, Volpara Solutions, Wellington, New Zealand) [21].

The average PDV per screening examination was determined using the available views of both breasts. Volpara density grades (VDGs) were constructed based on this average PDV (VDG1: 0% ≤ PDV < 4.5%, VDG2: 4.5% ≤ PDV < 7.5%, VDG3: 7.5% ≤ PDV < 15.5%, VDG4: PDV ≥ 15.5%). The VDGs are designed to mimic the American College of Radiology (ACR) BI-RADS breast density categories (4th edition).
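As an illustration, the mapping from an examination's average PDV to a VDG can be sketched as follows. The cut-offs are those listed above; the function itself is our own sketch, not part of the Volpara software:

```python
def volpara_density_grade(pdv: float) -> int:
    """Map average percentage dense volume (PDV, in %) to a Volpara
    Density Grade using the cut-offs listed above."""
    if pdv < 4.5:
        return 1  # VDG1, 'almost entirely fatty'
    if pdv < 7.5:
        return 2  # VDG2
    if pdv < 15.5:
        return 3  # VDG3
    return 4      # VDG4, 'extremely dense'

volpara_density_grade(10.0)  # falls in 7.5% <= PDV < 15.5%, i.e., VDG3
```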

Statistical analysis

Examinations were grouped according to VDGs. Within these groups, we determined the following screening performance measures with accompanying 95% confidence intervals (CI): recall rate, false positive rate, screen-detected breast cancer rate, interval breast cancer rate, total breast cancer rate (all rates per 1000 screening examinations), sensitivity and specificity of the screening, and positive predictive value (PPV). Generalized estimating equations (GEE) with an ‘independence’ correlation structure were used to account for correlation between examinations of the same woman. For the screening sensitivity, we calculated Wilson’s 95% confidence intervals (see Table 1 for definitions of the screening performance measures). For comparison with American screening programs, we also determined interval cancer rates for the first year after a negative screening mammogram, since the screening interval in the USA is normally 1 year.
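As a sketch of the Wilson interval calculation (not the authors' code), the 95% CI for the VDG1 sensitivity can be computed from the screen-detected (96) and interval (16) cancer counts reported in Table 2:

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Sensitivity in VDG1: 96 screen-detected of 96 + 16 = 112 cancers
lo, hi = wilson_ci(96, 112)  # ~(0.781, 0.910), matching the 78.1-91.0% in Table 3
```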

Table 1.

Definitions of screening performance measures

| Measure | Definition |
|---|---|
| FN (interval breast cancer) | Breast cancers diagnosed within 24 months after a screening examination that did not lead to recall (negative mammogram), and before the next scheduled screening examination |
| TP (screen-detected breast cancer) | Breast cancers diagnosed after a recalled screening examination (positive mammogram) |
| FP | Screening examinations that led to a recall (positive mammogram), but not to a breast cancer diagnosis within 24 months after the examination, or before the next scheduled screening examination |
| TN | Screening examinations that did not lead to recall (negative mammogram) and without a breast cancer diagnosis within 24 months after the examination, or before the next scheduled screening examination |
| Sensitivity of screening | The number of screen-detected breast cancers divided by the total number of screen-detected plus interval breast cancers (TP/(TP + FN)) |
| Specificity of screening | The number of screening examinations that did not lead to recall (negative mammogram) and without a breast cancer diagnosis within 24 months, or before the next scheduled screening examination, divided by the total number of examinations without a breast cancer diagnosis within 24 months, or before the next scheduled screening examination (TN/(TN + FP)) |
| PPV | The number of screen-detected breast cancers divided by the total number of examinations that led to recall (TP/(TP + FP)) |

FN false negative, TP true positive, FP false positive, TN true negative, PPV positive predictive value
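Using the definitions in Table 1, the overall performance measures can be reproduced from the total counts in Table 2; a minimal arithmetic sketch (not the study code):

```python
# Totals from Table 2: screen-detected (TP), interval (FN),
# false positive (FP), and true negative (TN) examinations
TP, FN, FP, TN = 667, 234, 1774, 109_223

sensitivity = TP / (TP + FN)                     # ~0.740 -> 74.0%
specificity = TN / (TN + FP)                     # ~0.984 -> 98.4%
ppv = TP / (TP + FP)                             # ~0.273 -> 27.3%
interval_rate = 1000 * FN / (TP + FN + FP + TN)  # ~2.1 per 1000 examinations
```

These values reproduce the ‘Total’ column of Table 3.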

We performed several sensitivity analyses: (1) taking only invasive tumors into account (i.e., excluding the examinations leading to a true positive or false negative diagnosis of in situ carcinoma); (2) taking only subsequent screening rounds into account, since performance measures are expected to be different between first and subsequent rounds (in case of subsequent rounds, the prior mammogram could be analog or digital); (3) using VDGs based on the mean PDV of only the MLO views instead of using all available views.

We tested for linear trends across the four density categories in screening performance measures, the percentage of in situ cancers, and the percentage of positive lymph nodes using a Chi-square test for linear trend. In addition, we examined whether tumors diagnosed in dense breasts were larger than those in nondense breasts using the Jonckheere-Terpstra test, as we expected tumor size not to be normally distributed. All statistical tests were two-sided. Statistical analyses were performed in IBM SPSS Statistics, version 21, and in R, version 3.2.2, using the “geese” function from the “geepack” package.
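A Chi-square test for linear trend in proportions can be sketched as a Cochran-Armitage trend test. The implementation below, with equally spaced category scores, is our own illustration rather than the SPSS or R code used in the study; as an example, it is applied to the interval cancer counts per VDG from Table 2:

```python
from math import sqrt, erfc

def trend_test(events, totals):
    """Cochran-Armitage test for linear trend in proportions across
    ordered groups, using equally spaced scores 0, 1, 2, ..."""
    scores = range(len(events))
    n = sum(totals)
    p = sum(events) / n                               # pooled proportion
    swx = sum(w * x for w, x in zip(scores, events))  # score-weighted events
    swn = sum(w * g for w, g in zip(scores, totals))  # score-weighted totals
    var = p * (1 - p) * (sum(g * w * w for w, g in zip(scores, totals)) - swn ** 2 / n)
    z = (swx - p * swn) / sqrt(var)
    return z, erfc(abs(z) / sqrt(2))                  # z and two-sided p-value

# Interval cancers and examinations per VDG category (Table 2)
z, p_value = trend_test([16, 86, 93, 39], [24_210, 46_426, 32_330, 8_932])
```

For these counts the trend is strongly positive (p-value well below 0.001), consistent with the trend reported for interval cancer rates.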

Results

In total, 113,956 screening examinations were available. We excluded 50 examinations for which the screening outcome was unknown, 47 interval cancers diagnosed more than 24 months after the last screening examination, and 1961 examinations for which the VDG could not be assessed. This resulted in 111,898 examinations belonging to 53,239 women with a median age of 58 years (IQR 53–64 years). Of these examinations, 21.6% were categorized as density category 1 (‘almost entirely fatty’), and 41.5, 28.9, and 8.0% as categories 2, 3, and 4 (‘extremely dense’), respectively (Table 2). In total, 667 screen-detected breast cancers were identified based on a mammogram taken before January 1, 2012, and 234 interval cancers were identified within 24 months after a mammogram taken before January 1, 2012; of these, 79.5 and 97.9%, respectively, were invasive breast cancers (Tables 2, 4).

Table 2.

Number of mammography examinations in total and within Volpara Density Grade (VDG) categories (based on the available views)

| | Total | VDG 1 | VDG 2 | VDG 3 | VDG 4 |
|---|---|---|---|---|---|
| Total | | | | | |
| Screening examinations [N (%)] | 111,898 (100%) | 24,210 (21.6%) | 46,426 (41.5%) | 32,330 (28.9%) | 8932 (8.0%) |
| Screen-detected cancers (N) | 667 | 96 | 298 | 212 | 61 |
| Interval cancers (N) | 234 | 16 | 86 | 93 | 39 |
| False positives (N) | 1774 | 271 | 700 | 590 | 213 |
| True negatives (N) | 109,223 | 23,827 | 45,342 | 31,435 | 8619 |
| Only invasive tumors taken into account | | | | | |
| Screening examinations [N (%)] | 111,754 (100%) | 24,188 (21.6%) | 46,375 (41.5%) | 32,279 (28.9%) | 8912 (8.0%) |
| Screen-detected cancers (N) | 529 | 75 | 250 | 163 | 41 |
| Interval cancers (N) | 228 | 15 | 83 | 91 | 39 |
| False positives (N) | 1774 | 271 | 700 | 590 | 213 |
| True negatives (N) | 109,223 | 23,827 | 45,342 | 31,435 | 8619 |
| Only subsequent screening rounds | | | | | |
| Screening examinations [N (%)] | 94,665 (100%) | 22,146 (23.4%) | 40,664 (43.0%) | 25,777 (27.2%) | 6078 (6.4%) |
| Screen-detected cancers (N) | 521 | 86 | 249 | 152 | 34 |
| Interval cancers (N) | 203 | 16 | 81 | 80 | 26 |
| False positives (N) | 1170 | 214 | 491 | 366 | 99 |
| True negatives (N) | 92,771 | 21,830 | 39,843 | 25,179 | 5919 |

Table 4.

Tumor characteristics in total and within Volpara Density Grade (VDG) categories (based on the available views)

| | | Total | VDG 1 | VDG 2 | VDG 3 | VDG 4 | p trend |
|---|---|---|---|---|---|---|---|
| Proportion invasive tumors^a | | | | | | | |
| Total (N = 898) | Invasive [N (%)] | 757 (84.3%) | 90 (80.4%) | 333 (87.2%) | 254 (83.6%) | 80 (80.0%) | 0.49 |
| Screen-detected cancer (N = 665) | Invasive [N (%)] | 529 (79.5%) | 75 (78.1%) | 250 (84.2%) | 163 (77.3%) | 41 (67.2%) | 0.03 |
| Interval cancer (N = 233) | Invasive [N (%)] | 228 (97.9%) | 15 (93.8%) | 83 (97.6%) | 91 (97.8%) | 39 (100.0%) | 0.20 |
| pT (only invasive tumors)^b | | | | | | | |
| Total (N = 700) | T1 [N (%)] | 503 (71.9%) | 70 (81.4%) | 231 (73.6%) | 153 (66.8%) | 49 (69.0%) | 0.02^c |
| | T2 [N (%)] | 171 (24.4%) | 15 (17.4%) | 74 (23.6%) | 65 (28.4%) | 17 (23.9%) | |
| | T3 & T4 [N (%)] | 26 (3.7%) | 1 (1.2%) | 9 (2.9%) | 11 (4.8%) | 5 (7.0%) | |
| Screen-detected cancer (N = 511) | T1 [N (%)] | 404 (79.1%) | 63 (85.1%) | 195 (79.6%) | 116 (75.8%) | 30 (76.9%) | 0.14^c |
| | T2 [N (%)] | 97 (19.0%) | 11 (14.9%) | 46 (18.8%) | 33 (21.6%) | 7 (17.9%) | |
| | T3 & T4 [N (%)] | 10 (2.0%) | 0 (0.0%) | 4 (1.6%) | 4 (2.6%) | 2 (5.1%) | |
| Interval cancer (N = 189) | T1 [N (%)] | 99 (52.4%) | 7 (58.3%) | 36 (52.2%) | 37 (48.7%) | 19 (59.4%) | 0.87^c |
| | T2 [N (%)] | 74 (39.2%) | 4 (33.3%) | 28 (40.6%) | 32 (42.1%) | 10 (31.3%) | |
| | T3 & T4 [N (%)] | 16 (8.5%) | 1 (8.3%) | 5 (7.2%) | 7 (9.2%) | 3 (9.4%) | |
| Lymph node status (only invasive tumors)^d | | | | | | | |
| Total (N = 741) | Positive [N (%)] | 234 (31.6%) | 18 (20.2%) | 105 (32.3%) | 87 (35.2%) | 24 (30.0%) | 0.12 |
| Screen-detected cancer (N = 518) | Positive [N (%)] | 152 (29.3%) | 13 (17.6%) | 75 (30.7%) | 51 (32.1%) | 13 (31.7%) | 0.08 |
| Interval cancer (N = 223) | Positive [N (%)] | 82 (36.8%) | 5 (33.3%) | 30 (37.0%) | 36 (40.9%) | 11 (28.2%) | 0.68 |
| Tumor diameter (only invasive tumors)^e | | | | | | | |
| Total (N = 691) | Median (mm) (IQR) | 15 (10; 22) | 12 (8; 18) | 15 (10; 21) | 17 (11; 25) | 14 (10; 22) | 0.01 |
| Screen-detected cancer (N = 500) | Median (mm) (IQR) | 13 (9; 19) | 11 (8; 17) | 13 (10; 19) | 14 (10; 20) | 12 (8; 19) | 0.10 |
| Interval cancer (N = 191) | Median (mm) (IQR) | 20 (14; 30) | 20 (13; 33) | 19 (16; 30) | 21 (16; 31) | 16 (12; 25) | 0.34 |

aInformation on invasiveness is missing for 3 tumors (2 screen-detected and 1 interval tumors)

bInformation on pT status is missing for 57 tumors (18 screen-detected and 39 interval tumors)

c p-trend determined for T1 versus T2, T3, and T4

dInformation on lymph node status is missing for 16 tumors (11 screen-detected and 5 interval tumors)

eInformation on tumor diameter is missing for 66 tumors (29 screen-detected and 37 interval tumors)

Screening performance across volumetric density categories

Table 3 shows that total and interval breast cancer rates, recall rates, and false positive rates were higher in higher breast density categories compared to lower density categories, all with a significant linear trend (p-trend < 0.001). Screen-detected breast cancer rates were found to be lowest in the lowest breast density category (4.0 per 1000 examinations (‰)) and more comparable across the three highest breast density categories: 6.4, 6.6, and 6.8‰, respectively (p-trend < 0.001). The screening sensitivity was significantly lower (p-trend < 0.001) in higher breast density categories: 85.7, 77.6, 69.5, and 61.0% in VDG categories 1–4, respectively. No significant linear trend was found for PPV (p-trend = 0.12) (Table 3).

Table 3.

Screening performance measures in total and within Volpara Density Grade (VDG) categories (based on the available views)

| Screening performance measure (95% CI) | Total | VDG 1 | VDG 2 | VDG 3 | VDG 4 | p trend |
|---|---|---|---|---|---|---|
| Total | | | | | | |
| Recall/1000 | 21.8 (20.9; 22.7) | 15.2 (13.7; 16.8) | 21.7 (20.2; 22.9) | 24.8 (23.1; 26.6) | 30.7 (27.2; 34.5) | <0.001 |
| FP/1000 | 15.9 (15.1; 16.6) | 11.2 (9.9; 12.6) | 15.1 (14.0; 16.2) | 18.2 (16.8; 19.8) | 23.8 (20.8; 27.3) | <0.001 |
| Screen-detected cancer/1000 | 6.0 (5.5; 6.4) | 4.0 (3.2; 4.8) | 6.4 (5.7; 7.2) | 6.6 (5.7; 7.5) | 6.8 (5.3; 8.8) | <0.001 |
| Interval cancer/1000 | 2.1 (1.9; 2.4) | 0.7 (0.4; 1.1) | 1.9 (1.5; 2.3) | 2.9 (2.3; 3.5) | 4.4 (3.2; 6.0) | <0.001 |
| BC/1000 | 8.1 (7.6; 8.7) | 4.6 (3.8; 5.6) | 8.3 (7.5; 9.1) | 9.4 (8.4; 10.5) | 11.2 (9.2; 13.6) | <0.001 |
| Sensitivity of screening (%) | 74.0 (71.1; 76.7) | 85.7 (78.1; 91.0) | 77.6 (73.2; 81.5) | 69.5 (64.1; 74.4) | 61.0 (51.2; 70.0) | <0.001 |
| Specificity (%) | 98.4 (98.3; 98.5) | 98.9 (98.7; 99.0) | 98.5 (98.4; 98.6) | 98.2 (98.0; 98.3) | 97.6 (97.2; 97.9) | <0.001 |
| PPV (%) | 27.3 (25.6; 29.1) | 26.2 (21.9; 30.9) | 29.9 (27.1; 32.8) | 26.4 (23.5; 29.6) | 22.3 (17.7; 27.6) | 0.12 |
| Only invasive tumors taken into account | | | | | | |
| Recall/1000 | 20.6 (19.8; 21.4) | 14.3 (12.9; 15.9) | 20.5 (19.2; 21.8) | 23.3 (21.7; 25.1) | 28.5 (25.2; 32.3) | <0.001 |
| FP/1000 | 15.9 (15.1; 16.6) | 11.2 (9.9; 12.6) | 15.1 (14.0; 16.3) | 18.3 (16.9; 19.8) | 23.9 (20.9; 27.4) | <0.001 |
| Screen-detected cancer/1000 | 4.7 (4.3; 5.1) | 3.1 (2.5; 3.9) | 5.4 (4.8; 6.1) | 5.0 (4.3; 5.9) | 4.6 (3.4; 6.2) | 0.02 |
| Interval cancer/1000 | 2.1 (1.9; 2.4) | 0.6 (0.4; 1.0) | 1.8 (1.4; 2.2) | 2.8 (2.3; 3.5) | 4.4 (3.2; 6.0) | <0.001 |
| BC/1000 | 6.9 (6.4; 7.3) | 3.7 (3.0; 4.6) | 7.2 (6.5; 8.0) | 7.9 (7.0; 8.9) | 9.0 (7.2; 11.1) | <0.001 |
| Sensitivity of screening (%) | 69.1 (66.5; 73.0) | 83.3 (74.3; 89.6) | 74.4 (70.2; 79.4) | 62.9 (58.1; 69.8) | 50.6 (40.5; 61.9) | <0.001 |
| Specificity (%) | 98.4 (98.3; 98.5) | 98.9 (98.7; 99.0) | 98.5 (98.4; 98.6) | 98.2 (98.0; 98.3) | 97.6 (97.2; 97.9) | <0.001 |
| PPV (%) | 23.0 (21.3; 24.7) | 21.7 (17.6; 26.3) | 26.3 (23.6; 29.2) | 21.6 (18.9; 24.7) | 16.1 (12.1; 21.2) | 0.02 |
| Only subsequent screening rounds taken into account | | | | | | |
| Recall/1000 | 17.9 (17.0; 18.7) | 13.5 (12.1; 15.2) | 18.2 (16.9; 19.5) | 20.1 (18.4; 21.9) | 21.9 (18.5; 25.9) | <0.001 |
| FP/1000 | 12.4 (11.7; 13.1) | 9.7 (8.4; 11.0) | 12.1 (11.1; 13.2) | 14.2 (12.8; 15.7) | 16.3 (13.3; 19.9) | <0.001 |
| Screen-detected cancer/1000 | 5.5 (5.0; 6.0) | 3.9 (3.1; 4.8) | 6.1 (5.4; 6.9) | 5.9 (5.0; 6.9) | 5.6 (4.0; 7.8) | 0.02 |
| Interval cancer/1000 | 2.2 (1.9; 2.5) | 0.7 (0.4; 1.2) | 2.0 (1.6; 2.5) | 3.1 (2.5; 3.9) | 4.3 (2.9; 6.3) | <0.001 |
| BC/1000 | 7.7 (7.2; 8.3) | 4.6 (3.8; 5.6) | 8.1 (7.3; 9.0) | 9.0 (7.9; 10.2) | 9.9 (7.7; 12.7) | <0.001 |
| Sensitivity of screening (%) | 71.3 (68.6; 75.1) | 84.3 (76.0; 90.1) | 74.8 (70.1; 79.8) | 64.4 (59.2; 71.3) | 56.7 (44.1; 68.4) | <0.001 |
| Specificity (%) | 98.8 (98.7; 98.8) | 99.0 (98.9; 99.2) | 98.8 (98.7; 98.9) | 98.6 (98.4; 98.7) | 98.4 (98.0; 98.7) | <0.001 |
| PPV (%) | 30.8 (28.6; 33.0) | 28.7 (23.8; 34.0) | 33.6 (30.3; 37.1) | 29.3 (25.6; 33.4) | 25.6 (18.8; 33.7) | 0.35 |

FP false positive examinations, BC breast cancers, PPV positive predictive value. BC/1000 = (Screen-detected cancers/1000) + (Interval cancers/1000), Sensitivity of screening = screen-detected cancers/(screen-detected cancers + interval cancers), Specificity = true negative examinations/(true negative examinations + false positive examinations), PPV = screen-detected cancers/(screen-detected cancers + false positive examinations)

Overall trends for interval cancer rates, recall rates, false positive rates, screening sensitivity, and specificity were similar whether invasive cancers alone or both invasive and in situ cancers were taken into account. However, when restricting the analyses to invasive cancers only, the screening sensitivity in VDG4 decreased most notably compared with the analysis including both in situ and invasive breast cancers. When only subsequent screening rounds were taken into account, the overall trends were again similar to the analyses based on both first and subsequent screening examinations (Table 3). The results of the sensitivity analysis in which PDV was based on MLO views only did not differ from those based on all available views (data not shown).

In VDG category 1, 25% of the interval breast cancers were diagnosed in the first year after the screening examination; in VDG categories 2 and 3, this was 41%, and in VDG category 4, 67%. This resulted in first-year interval cancer rates of 0.2, 0.8, 1.2, and 2.9‰ (p-trend < 0.001) in VDG categories 1–4, respectively.
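The reported first-year rates follow directly from these fractions and the per-category counts in Table 2; a simple arithmetic check (not the study code):

```python
first_year_fraction = [0.25, 0.41, 0.41, 0.67]  # share of interval cancers found in year 1, VDG1-4
interval_cancers = [16, 86, 93, 39]             # interval cancers per VDG (Table 2)
examinations = [24_210, 46_426, 32_330, 8_932]  # examinations per VDG (Table 2)

rates_per_1000 = [round(f * c / n * 1000, 1)
                  for f, c, n in zip(first_year_fraction, interval_cancers, examinations)]
# rates_per_1000 == [0.2, 0.8, 1.2, 2.9], the reported first-year interval cancer rates
```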

Tumor characteristics across volumetric density categories

Of all tumors, 74.0% were screen-detected and 26.0% were interval cancers; 15.7% were in situ and 84.3% were invasive. Of the in situ tumors, 89.4% showed microcalcifications on the last screening mammogram. Among screen-detected tumors, the proportion of in situ tumors was highest in the highest density category (32.8% in VDG4) and lowest in density category 2 (15.8% in VDG2). A significant linear trend across breast density categories was observed for the proportion of invasive tumors among screen-detected tumors (p-trend = 0.03).

About 80% of the screen-detected and slightly over 50% of the interval invasive breast cancers were smaller than 20 mm (pT1) at diagnosis. No linear trend in screen-detected tumor size was found across the four density categories (p-trend = 0.10) (Table 4). Lymph nodes were positive in 29.3% of the screen-detected cancers and 36.8% of the interval cancers. For lymph node status among screen-detected breast cancers, no linear trend was found across the four breast density categories (p-trend = 0.08) (Table 4).

Discussion

We found that the sensitivity of a DM screening program was significantly lower in women with high volumetric breast density than in women with low volumetric breast density (61.0 and 85.7%, respectively; p-trend < 0.001). This was despite the higher recall rates in women with high compared to low breast density (30.7 and 15.2‰, respectively; p-trend < 0.001).

A recently published study by Destounis et al. also examined screening sensitivity across four automatically determined volumetric breast density categories. They found screening sensitivities of 95, 89, 83, and 65% in density categories 1–4, respectively. Additionally, they determined mammographic screening sensitivity across the visual BI-RADS categories and found sensitivities of 82% in the lowest and 66% in the highest breast density category [17].

Four other studies in which breast density was visually assessed on digital screening mammograms also found a negative influence of breast density on screening sensitivity [6–8, 22]; a fifth study did not [23]. A Canadian study showed a lower screening sensitivity for women with 75% or higher breast density (74.2%; 95% CI 67.2–80.4) than for women with less than 75% breast density (80.2%; 95% CI 78.4–81.9) when using direct radiography (DR) in a biennial screening program in which women considered to be at increased risk were screened annually [8]. In the American Digital Mammographic Imaging Screening Trial (DMIST), screening sensitivity was determined for women with dense and nondense breasts in several subgroups. Sensitivity appeared higher in nondense than in dense breasts for all subgroup comparisons, with the exception of postmenopausal women aged 50–64 years [6]. In a study using data from the Breast Cancer Surveillance Consortium (BCSC), Kerlikowske et al. found that in an annual screening program, DM screening sensitivity was also significantly lower in the higher BI-RADS breast density categories than in the lower categories for women aged 50–74 years [7]. However, in another paper by Kerlikowske et al., also using BCSC data, no significant differences in screening sensitivity between breast density categories were found when DM was used [23]. Finally, in a recently published study by Weigel et al. using data from the German biennial screening program, screening sensitivity was lower in the higher than in the lower breast density categories, with sensitivities of 100 and 50% for the lowest and highest density category, respectively [22].

Although the results in the above studies are not completely consistent, the majority of them showed that screening performance is still negatively influenced by breast density when DM is used instead of FSM. This is also found in the current study.

Four of the six above-mentioned studies were conducted in the USA [6, 7, 17, 23]. The only European study examining the influence of breast density on digital mammography screening performance was the recently published study by Weigel et al. [22]. Our study, however, is the first to determine the effect of automatically assessed volumetric mammographic density on DM screening performance in a European population-based screening setting. There are three notable differences between European and American screening programs: (1) recall rates are below 5–7% in Europe and around 8–10% in the USA [23–29]; (2) double reading, which was also used in this study, is common in European screening programs but not in the USA [30]; (3) the screening interval differs: biennial screening is common in European countries, while in the USA women are mostly screened yearly [30].

When looking at interval cancers diagnosed within the first year after a negative screening mammogram, we found that in the lower density categories only a small proportion of the interval cancers were found in the first year, with most found in the second year, whereas in women with extremely dense breasts the reverse was true. A one-year instead of a two-year screening interval would therefore probably increase program sensitivity in all density groups, but to a larger extent in women with fatty breasts than in those with extremely dense breasts, resulting in larger differences in screening sensitivity across density categories.

When only invasive cancers instead of both invasive and in situ cancers were taken into account, screening sensitivity decreased most notably in VDG4. This indicates that the detection of invasive breast cancers in DM screening is hampered to a larger extent than the detection of in situ breast cancers (Table 3). A possible explanation is that the visibility of microcalcifications, which are often the hallmark of ductal carcinoma in situ (DCIS) on mammography [20], is not reduced as much by dense tissue as the visibility of invasive breast cancers. In our study, 89.4% of the DCIS cases were accompanied by microcalcifications.

False positive rates were found to be higher in women with dense breasts compared to women with nondense breasts. Similar trends were found in two American studies using BCSC data [7, 31].

When looking at the tumor characteristics of screen-detected breast cancers, we observed a significant linear trend across breast density categories for the proportion of invasive tumors (p-trend = 0.03). In addition, the size of screen-detected cancers and the proportion of positive lymph nodes among screen-detected cancers appeared to be higher in denser breasts. However, no significant linear trend was found for screen-detected tumor size or for the proportion of positive lymph nodes across the four density categories (p-trend = 0.10 and 0.08, respectively).

It should be noted that the four density categories (VDGs) used in this study are comparable to the 4th edition BI-RADS density categories. Although the 5th edition of the BI-RADS density categories was introduced in 2013, we used VDG categories comparable to the 4th edition to enable better comparison with previous studies.

A limitation of this study is that during the study period, the MLO view was the standard view for subsequent screening rounds; CC views were only taken in addition to MLO during the first screening round or by indication during subsequent rounds. As a result, in our main analysis breast density was determined from MLO views only for some examinations and from both MLO and CC views for others. Volpara’s PDV measured on CC views tends to be somewhat higher than on MLO views [32]. As CC views are more often performed in women with dense breasts and in women with a suspicious region on their MLO view, measured breast density might be somewhat artificially elevated for these women. However, our sensitivity analysis using VDG categories based on PDV from MLO views only did not lead to different conclusions. Screening sensitivity is presumably higher when both MLO and CC views are available than with MLO views only. Standardly acquiring both views would therefore lead to higher sensitivity, particularly in women with fatty breasts, as they most often receive MLO views only; this would increase the differences in screening performance across breast density categories.

Strengths of this study are the large sample size and the fact that the digital mammograms were acquired in routine screening. In addition, we used a fully automatic method to determine PDV, which was possible because unprocessed image data were archived. In several studies, this automatic method (Volpara) has been shown to correlate with BI-RADS breast density and to yield breast cancer risk estimates comparable to those based on BI-RADS breast density [13, 15, 16]. It has also been validated against MRI [14]. Volpara gives objective and reproducible density measurements, representing the amount of dense tissue rather than the size of the dense tissue projection as measured by area-based methods.

In summary, in a large screening population in which DM was used for screening and a fully automatic method (Volpara) was used to determine PDV, breast density was found to significantly hamper the detection of breast tumors. This is shown by the lower screening sensitivity in women with dense compared to nondense breasts, which persisted despite a higher recall rate in women with dense breasts. These findings are in line with the results of most studies using visually assessed BI-RADS density on digital mammograms. Since fully automatic breast density measurement has higher reproducibility than visual assessment, this automatic method could facilitate more tailored screening, such as supplemental screening for women with dense breasts.

Acknowledgements

This study has received funding by the European Union’s Seventh Framework Programme FP7 (Grant number 306088), and the Dutch Cancer Society (Grant number KWF UU 2009-4348). Diana Miglioretti, PhD (Division of Biostatistics, Department of Public Health Sciences School of Medicine, University of California, Davis, USA), Karla Kerlikowske, MD, PhD (Department of Medicine and Epidemiology/Biostatistics, University of California, San Francisco, USA), and Rebecca Stellato, MSc (Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, The Netherlands) kindly provided statistical advice for this study. We also want to thank the Foundation of Population Screening Mid-West (The Netherlands) for providing data.

Abbreviations

DM: Digital mammography

ACR: American College of Radiology

USA: United States of America

BI-RADS: Breast imaging-reporting and data system

MRI: Magnetic resonance imaging

CC: Craniocaudal

MLO: Mediolateral oblique

ICD-O: International classification of diseases for oncology

PDV: Percentage dense volume

VDG: Volpara density grade

GEE: Generalized estimating equations

DR: Direct radiography

DMIST: Digital mammographic imaging screening trial

BCSC: Breast cancer surveillance consortium

FSM: Film-screen mammography

PPV: Positive predictive value

CI: Confidence interval

DCIS: Ductal carcinoma in situ

Compliance with ethical standards

Conflict of interest

J. Wanders, K. Holland, R. Mann, P. Peeters, C. van Gils, and N. Karssemeijer all report the same grant from the European Union’s Seventh Framework Programme (FP7) during the conduct of the study. N. Karssemeijer also reports being a co-founder of Volpara Solutions, which develops and markets the breast density measurement software Volpara used in this study. In addition, N. Karssemeijer has a patent pending and, alongside his professorship at the university, is a co-founder of two other companies in the field of breast imaging: Qview Medical (Los Altos, CA) and ScreenPoint Medical (Nijmegen, NL). These companies develop products for computer-aided detection of breast cancer, in whole-breast ultrasound and in mammography, respectively. C. van Gils also reports a personal grant from the Dutch Cancer Society during the conduct of the study, as well as a grant from Bayer Healthcare and non-financial support from Volpara Solutions outside the submitted work. In addition, R. Mann reports grants, personal fees, and non-financial support from Siemens Healthcare and grants and personal fees from Bayer Healthcare outside the submitted work. R. Mann also reports a research contract with Seno Medical and a position as scientific advisor for ScreenPoint Medical (Nijmegen, NL) outside the submitted work. W. Veldhuis and R. Pijnappel have nothing to disclose.

Ethical standards

The authors declare that this study complies with the current laws in The Netherlands.

Footnotes

Carla H. van Gils and Nico Karssemeijer share last authorship.

References

  • 1. McCormack VA, dos Santos Silva I. Breast density and parenchymal patterns as markers of breast cancer risk: a meta-analysis. Cancer Epidemiol Biomark Prev. 2006;15(6):1159–1169. doi:10.1158/1055-9965.EPI-06-0034
  • 2. Vachon CM, van Gils CH, Sellers TA, Ghosh K, Pruthi S, Brandt KR, et al. Mammographic density, breast cancer risk and risk prediction. Breast Cancer Res. 2007;9(6):217. doi:10.1186/bcr1829
  • 3. Boyd NF, Guo H, Martin LJ, Sun L, Stone J, Fishell E, et al. Mammographic density and the risk and detection of breast cancer. N Engl J Med. 2007;356(3):227–236. doi:10.1056/NEJMoa062790
  • 4. Kerlikowske K. The mammogram that cried Wolfe. N Engl J Med. 2007;356(3):297–300. doi:10.1056/NEJMe068244
  • 5. Are You Dense Advocacy website [cited 2016 October 29]. Available from: http://www.areyoudenseadvocacy.org
  • 6. Pisano ED, Hendrick RE, Yaffe MJ, Baum JK, Acharyya S, Cormack JB, et al. Diagnostic accuracy of digital versus film mammography: exploratory analysis of selected population subgroups in DMIST. Radiology. 2008;246(2):376–383. doi:10.1148/radiol.2461070200
  • 7. Kerlikowske K, Zhu W, Tosteson AN, Sprague BL, Tice JA, Lehman CD, et al. Identifying women with dense breasts at high risk for interval cancer: a cohort study. Ann Intern Med. 2015;162(10):673–681. doi:10.7326/M14-1465
  • 8. Prummel MV, Muradali D, Shumak R, Majpruz V, Brown P, Jiang H, et al. Digital compared with screen-film mammography: measures of diagnostic accuracy among women screened in the Ontario breast screening program. Radiology. 2015;150–733
  • 9. Ciatto S, Houssami N, Apruzzese A, Bassetti E, Brancato B, Carozzi F, et al. Categorizing breast mammographic density: intra- and interobserver reproducibility of BI-RADS density categories. Breast. 2005;14(4):269–275. doi:10.1016/j.breast.2004.12.004
  • 10. Gard CC, Aiello Bowles EJ, Miglioretti DL, Taplin SH, Rutter CM. Misclassification of breast imaging reporting and data system (BI-RADS) mammographic density and implications for breast density reporting legislation. Breast J. 2015;21(5):481–489. doi:10.1111/tbj.12443
  • 11. Gweon HM, Youk JH, Kim JA, Son EJ. Radiologist assessment of breast density by BI-RADS categories versus fully automated volumetric assessment. AJR Am J Roentgenol. 2013;201(3):692–697. doi:10.2214/AJR.12.10197
  • 12. Redondo A, Comas M, Macia F, Ferrer F, Murta-Nascimento C, Maristany MT, et al. Inter- and intraradiologist variability in the BI-RADS assessment and breast density categories for screening mammograms. Br J Radiol. 2012;85(1019):1465–1470. doi:10.1259/bjr/21256379
  • 13. Brandt KR, Scott CG, Ma L, Mahmoudzadeh AP, Jensen MR, Whaley DH, et al. Comparison of clinical and automated breast density measurements: implications for risk prediction and supplemental screening. Radiology. 2015;151–261
  • 14. Gubern-Merida A, Kallenberg M, Platel B, Mann RM, Marti R, Karssemeijer N. Volumetric breast density estimation from full-field digital mammograms: a validation study. PLoS ONE. 2014;9(1):e85952. doi:10.1371/journal.pone.0085952
  • 15. Seo JM, Ko ES, Han BK, Ko EY, Shin JH, Hahn SY. Automated volumetric breast density estimation: a comparison with visual assessment. Clin Radiol. 2013;68(7):690–695. doi:10.1016/j.crad.2013.01.011
  • 16. Lee HN, Sohn YM, Han KH. Comparison of mammographic density estimation by Volpara software with radiologists’ visual assessment: analysis of clinical-radiologic factors affecting discrepancy between them. Acta Radiol. 2015;56(9):1061–1068. doi:10.1177/0284185114554674
  • 17. Destounis S, Johnston L, Highnam R, Arieno A, Morgan R, Chan A. Using volumetric breast density to quantify the potential masking risk of mammographic density. AJR Am J Roentgenol. 2016;1–6
  • 18. Bluekens AM, Karssemeijer N, Beijerinck D, Deurenberg JJ, van Engen RE, Broeders MJ, et al. Consequences of digital mammography in population-based breast cancer screening: initial changes and long-term impact on referral rates. Eur Radiol. 2010;20(9):2067–2073. doi:10.1007/s00330-010-1786-7
  • 19. Karssemeijer N, Bluekens AM, Beijerinck D, Deurenberg JJ, Beekman M, Visser R, et al. Breast cancer screening results 5 years after introduction of digital mammography in a population-based screening program. Radiology. 2009;253(2):353–358. doi:10.1148/radiol.2532090225
  • 20. Bluekens AM, Holland R, Karssemeijer N, Broeders MJ, den Heeten GJ. Comparison of digital screening mammography and screen-film mammography in the early detection of clinically relevant cancers: a multicenter study. Radiology. 2012;265(3):707–714. doi:10.1148/radiol.12111461
  • 21. Highnam R, Brady M, Yaffe MJ, Karssemeijer N, Harvey J. Robust breast composition measurement: Volpara (TM). Lect Notes Comput Sci. 2010;6136:342–349. doi:10.1007/978-3-642-13666-5_46
  • 22. Weigel S, Heindel W, Heidrich J, Hense HW, Heidinger O. Digital mammography screening: sensitivity of the programme dependent on breast density. Eur Radiol. 2016
  • 23. Kerlikowske K, Hubbard RA, Miglioretti DL, Geller BM, Yankaskas BC, Lehman CD, et al. Comparative effectiveness of digital versus film-screen mammography in community practice in the United States: a cohort study. Ann Intern Med. 2011;155(8):493–502. doi:10.7326/0003-4819-155-8-201110180-00005
  • 24. Pisano ED, Gatsonis C, Hendrick E, Yaffe M, Baum JK, Acharyya S, et al. Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med. 2005;353(17):1773–1783. doi:10.1056/NEJMoa052911
  • 25. Hambly NM, McNicholas MM, Phelan N, Hargaden GC, O’Doherty A, Flanagan FL. Comparison of digital mammography and screen-film mammography in breast cancer screening: a review in the Irish breast screening program. AJR Am J Roentgenol. 2009;193(4):1010–1018. doi:10.2214/AJR.08.2157
  • 26. van Luijt PA, Fracheboud J, Heijnsdijk EA, den Heeten GJ, de Koning HJ. Nation-wide data on screening performance during the transition to digital mammography: observations in 6 million screens. Eur J Cancer. 2013;49(16):3517–3525. doi:10.1016/j.ejca.2013.06.020
  • 27. Henderson LM, Benefield T, Nyante SJ, Marsh MW, Greenwood-Hickman MA, Schroeder BF. Performance of digital screening mammography in a population-based cohort of black and white women. Cancer Causes Control. 2015;26(10):1495–1499. doi:10.1007/s10552-015-0631-3
  • 28. Kemp Jacobsen K, O’Meara ES, Key D, Buist DS, Kerlikowske K, Vejborg I, et al. Comparing sensitivity and specificity of screening mammography in the United States and Denmark. Int J Cancer. 2015;137(9):2198–2207. doi:10.1002/ijc.29593
  • 29. Lee CS, Bhargavan-Chatfield M, Burnside ES, Nagy P, Sickles EA. The national mammography database: preliminary data. AJR Am J Roentgenol. 2016;206(4):883–890. doi:10.2214/AJR.15.14312
  • 30. International Cancer Screening Network (ICSN) website [cited 2015 November 1]. Available from: http://healthcaredelivery.cancer.gov/icsn/
  • 31. Nelson HD, O’Meara ES, Kerlikowske K, Balch S, Miglioretti D. Factors associated with rates of false-positive and false-negative results from digital mammography screening: an analysis of registry data. Ann Intern Med. 2016
  • 32. Eng A, Gallant Z, Shepherd J, McCormack V, Li J, Dowsett M, et al. Digital mammographic density and breast cancer risk: a case-control study of six alternative density assessment methods. Breast Cancer Res. 2014;16(5):439. doi:10.1186/s13058-014-0439-1

Articles from Breast Cancer Research and Treatment are provided here courtesy of Springer
