Abstract
This study aimed to evaluate the performance of commercially available artificial intelligence (AI) software in unilateral mammograms simulating postmastectomy surveillance, compared with the same software applied to bilateral mammograms from the same women serving as controls. A retrospective database search identified consecutive women who underwent breast cancer surgery between January 2021 and December 2021. AI software was applied to the mammogram immediately preceding breast cancer diagnosis in two modes, bilateral analysis (the standard bilateral mammography dataset) and unilateral analysis (each breast's craniocaudal and mediolateral oblique views), and the outputs were reviewed. The sensitivity, specificity, and number of marks per breast were compared between the bilateral and unilateral analyses, with a −5% non-inferiority margin for the difference in sensitivity and specificity between the two modes. A total of 694 women (mean age, 55.2 ± 10.8 years) with unilateral or bilateral breast cancer contributed mammograms for analysis; each breast was then evaluated separately in the unilateral postmastectomy simulation (n = 1388 breasts, of which 730 [52.6%] had breast cancer; mean invasive size, 1.5 cm) and compared with the bilateral mammography analysis. The sensitivity of unilateral analysis was not inferior to that of bilateral analysis (78.6% vs. 76.7%), with a difference of 1.9%. The specificity of unilateral analysis was inferior to that of bilateral analysis (81.5% vs. 91.9%), with a difference of −10.5%, which was below the non-inferiority margin. The average number of AI marks per breast was 0.94 in both the unilateral (1298/1388) and bilateral (1306/1388) analyses. In simulated unilateral mammography analysis, AI software demonstrated non-inferior sensitivity and inferior specificity compared with bilateral mammography.
Keywords: Artificial intelligence, Mammography, Mastectomy, Non-inferiority
Introduction
Breast cancer survivors have a higher risk of subsequent breast cancer than women without a personal history of breast cancer (PHBC) [1]. In women with a previous diagnosis of breast cancer who have been treated with total mastectomy, surveillance mammography of the contralateral breast is recommended for the early detection of second breast cancers. However, the interval invasive cancer rate of mammography screening in women with PHBC within 5 years of primary breast cancer treatment was 2.9 per 1000 examinations [2]. According to the Breast Cancer Surveillance Consortium (BCSC) [3], digital mammography or tomosynthesis has a relatively low sensitivity of 70% (95% CI: 67%, 72%) and a high interval cancer rate (i.e., the number of breast cancers per 1000 screening examinations diagnosed within 1 year of a negative screening examination, owing to clinical symptoms or abnormalities detected with other imaging methods) of 3.6 per 1000 examinations (95% CI: 3.2, 3.9). Given the limitations of mammography, more intensive multimodality surveillance imaging is being tested to improve the early detection of second breast cancers [4].
Recent advances in artificial intelligence (AI) technology present additional possibilities for earlier breast cancer detection and lesion discrimination [5]. Its potential effectiveness in surveillance mammography for women who have undergone lumpectomy or unilateral mastectomy has been evaluated in a study comparing digital breast tomosynthesis and AI performance with digital mammography following breast-conserving surgery [6]. The results of this study revealed that adjunct digital breast tomosynthesis or AI reduced recall rates and improved accuracy in the ipsilateral and contralateral breasts compared to digital mammography alone [6]. However, since most current commercially available AI software is trained on standard bilateral mammography, it is unclear whether these tools can be effectively used for specific cohorts, including women with scarred or unilateral breasts [7]. Given the rising number of breast cancer cases (approximately 287,850 new invasive breast cancer cases and 51,400 cases of DCIS in 2022 among women in the United States), with one third of patients undergoing mastectomy [8], and an increasing number of survivors due to advances in screening and treatment, it may be necessary to develop software tailored for these cohorts. However, understanding how the performance of existing AI systems changes when contralateral breast information is removed could provide essential insights into how women treated with mastectomy might benefit from AI.
Thus, this study aimed to evaluate the performance of commercially available AI software in unilateral mammograms simulating postmastectomy surveillance, comparing it with AI software used in bilateral mammograms from the same women serving as controls.
Materials and Methods
Study Cohort
This retrospective study was approved by the institutional review board of Seoul National University Hospital (IRB No. 2304-083-1423), which waived the requirement for informed consent. Our institutional database was searched to gather data of consecutive women who underwent definitive surgical treatment for breast cancer between January 2021 and December 2021 and who underwent screening or diagnostic bilateral digital mammography with or without additional breast imaging within 2 months prior to breast cancer diagnosis. We included women with a first-time diagnosis of breast cancer of ductal carcinoma in situ (DCIS) or stage I–III invasive cancer according to the American Joint Committee on Cancer (AJCC) 8th edition guidelines [9] and those who had no history of breast cancer surgery or neoadjuvant chemotherapy. A total of 732 women met the inclusion criteria. Among them, 38 were subsequently excluded, of whom 17 had an insufficient follow-up period (< 330 days), 17 had excisional biopsy scars, three had injection granulomas, and one underwent augmentation mammoplasty. Consequently, 694 women with 730 breast cancers and 658 with normal or benign breasts were included in the final analysis (Fig. 1).
Fig. 1.
Flowchart of the patient selection process
Mammography Data Set Preparation
All imaging data were prospectively obtained between January 2021 and December 2021 as a part of routine clinical practice and stored in a picture archiving and communication system. All the mammographic imaging data were acquired using a full-field digital mammography unit (Selenia Dimensions; Hologic, Bedford, MA, USA). Standard mammography includes bilateral craniocaudal (CC) and mediolateral oblique (MLO) views. Mammographic reports were created at the time of interpretation. Subsequently, two datasets were developed to compare the performance of the AI software in unilateral mammography simulating postoperative surveillance with that of standard bilateral mammography. One was the original bilateral mammography dataset (bilateral analysis), and the other was unilateral mammography using each breast's CC and MLO views (unilateral analysis). In this unilateral analysis, each breast was treated independently as an individual case.
AI Software Application
For AI analysis, the commercially available AI software (Lunit INSIGHT for Mammography, ver. 1.1.7.1, Lunit, Inc.), validated in a multinational study, was used [10]. Details of the development and configuration of the commercially available AI software for breast cancer detection have been described previously [10]. The AI software was sequentially applied in bilateral and unilateral analyses (Fig. 2). The model is trained concurrently for two distinct tasks: image-level cancer detection, using a single-view image as input, and exam-level cancer detection, using four-view images [11]. Depending on the image composition, the model operates in two modes: Less-than-four-view (L4V) and Standard-four-view (S4V). In our unilateral analysis, L4V mode was used and the model outputs prediction scores for each view independently using the image-level classifier. In our bilateral analysis, S4V mode was used and the model combines features from all four view images and uses the exam-level classifier to provide prediction scores for each breast. To ensure consistency, prediction scores are linearly adjusted to achieve a sensitivity of 0.90 at a threshold of 0.1 on our internal validation set in the S4V mode only. Using this algorithm, the AI software provided a visual overlay of localizing lesions on each mammographic view while providing corresponding abnormality scores ranging from 0 to 100% for each CC and MLO view. For breasts with one lesion, the highest score from either view was used as the lesion-level and breast-level score. If more than one lesion was identified in a breast, the highest abnormality score across lesions was used as the final breast-level score [10]. In the AI software report, mammograms with maximum abnormality scores < 10% were considered negative, and those with ≥ 10% were considered positive [12].
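The scoring rule described above (highest abnormality score across marks and views, with a 10% cutoff) can be sketched as follows. This is a minimal illustration and not the vendor's implementation: the function names `breast_level_score` and `is_positive`, the flat list of per-mark scores, and the convention that a breast with no marks scores 0 are all assumptions made for the example.

```python
from typing import Iterable

# Cutoff stated in the text: maximum abnormality score >= 10% is positive.
POSITIVE_THRESHOLD = 10.0  # percent


def breast_level_score(mark_scores: Iterable[float]) -> float:
    """Highest abnormality score (0-100%) across all marks on the CC and MLO
    views of one breast. Returning 0.0 for a breast without marks is an
    assumption of this sketch, not a documented software behavior."""
    return max(mark_scores, default=0.0)


def is_positive(score: float, threshold: float = POSITIVE_THRESHOLD) -> bool:
    """Classify a breast as AI-positive when its score reaches the cutoff."""
    return score >= threshold


# Hypothetical breast with two marks scored 12% and 48%.
score = breast_level_score([12.0, 48.0])
print(score, is_positive(score))
```

A breast whose highest mark scores below 10% is reported as negative under this rule, regardless of how many marks it carries.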
Fig. 2.
Schematic diagram of the artificial intelligence (AI) application in bilateral and unilateral analyses. In the bilateral analysis, the AI software was applied to standard bilateral mammography. In the unilateral analysis, the AI software was first applied to the craniocaudal (CC) and mediolateral oblique (MLO) views of the right breast alone, and then applied separately to the CC and MLO views of the left breast alone
Data Collection and Reference Standard
In addition to the AI results, clinical and demographic data were abstracted from the medical records. Ground truth in terms of malignancy or benignity was confirmed with a histopathologic diagnosis or at least 1 year of follow-up. Lesions diagnosed as DCIS or invasive carcinoma on biopsy or during surgery were considered malignant. Data on the pathologic size of the invasive tumor, histologic type, TNM stage, estrogen receptor (ER) status, progesterone receptor (PR) status, and human epidermal growth factor receptor type 2 (HER2) status were collected for malignant lesions. Hormone receptor positivity was defined as an ER and/or PR positivity of ≥ 1% of nuclear staining using standard immunohistochemistry methods. HER2 positivity was defined as an HER2 score of 3+ or gene amplification by fluorescence in situ hybridization in tumors with an HER2 score of 2+.
High risk lesions, such as lobular carcinoma in situ (LCIS), atypical ductal hyperplasia (ADH), and atypical lobular hyperplasia (ALH), were considered benign. Lesions diagnosed as benign on biopsy or during surgery, as well as those that did not exhibit a change during a follow-up of at least 1 year, were considered benign.
Preoperative breast MRI, mammograms obtained after wire localization, or both were used to determine the reference location of the cancer. Two dedicated breast radiologists (S.M.H. and J.M.C., with 11 and 17 years of experience, respectively) reviewed the marks reported by AI software to confirm whether the marks correctly identified the lesions and assessed the lesion type on mammography. Maximum abnormality scores of ≥ 10% within the corresponding mammographic location of the cancer in at least one view were considered true-positive cases. Based on the reference standard, all markers in the AI software at other sites without cancer involvement were considered false-positive cases.
Statistical Analysis
The diagnostic performance of the AI algorithm was evaluated in terms of sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Sensitivity was defined as the proportion of examinations with a positive result in which the AI system correctly identified the corresponding mammographic location of the cancer with a maximum pixel-level abnormality score of ≥ 10%, among those diagnosed with cancer during the follow-up period. Specificity was defined as the proportion of examinations with a negative result in which the AI software did not detect a suspicious mammographic location among those without a cancer diagnosis during the follow-up period. PPV was defined as the percentage of positive examinations that resulted in a tissue diagnosis of cancer. NPV was defined as the percentage of negative examinations that resulted in a negative cancer diagnosis within the follow-up period.
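The four definitions above reduce to ratios of confusion-matrix counts. The sketch below is illustrative only; the `Counts` class and `diagnostic_metrics` function are names introduced for the example, and the counts are derived from the fractions reported in Table 2 (for example, 574/730 for unilateral sensitivity and 536/658 for unilateral specificity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Per-breast confusion-matrix counts (illustrative container)."""
    tp: int  # cancers correctly localized with score >= 10%
    fp: int  # non-cancer breasts with a mark scored >= 10%
    tn: int  # non-cancer breasts without such a mark
    fn: int  # cancers missed or not correctly localized


def diagnostic_metrics(c: Counts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV, and NPV as percentages."""
    return {
        "sensitivity": 100 * c.tp / (c.tp + c.fn),
        "specificity": 100 * c.tn / (c.tn + c.fp),
        "ppv": 100 * c.tp / (c.tp + c.fp),
        "npv": 100 * c.tn / (c.tn + c.fn),
    }


# Counts derived from the numerators and denominators in Table 2.
unilateral = Counts(tp=574, fp=122, tn=536, fn=156)
bilateral = Counts(tp=560, fp=53, tn=605, fn=170)

for name, counts in (("unilateral", unilateral), ("bilateral", bilateral)):
    rounded = {k: round(v, 1) for k, v in diagnostic_metrics(counts).items()}
    print(name, rounded)
```

Rounded to one decimal, these ratios reproduce the point estimates in Table 2; the confidence intervals and P values reported there come from the generalized estimating equation and are not reproduced by this sketch.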
This study hypothesized that the diagnostic performance of the AI software, in terms of sensitivity and specificity, in unilateral analysis is not inferior to that in bilateral analysis. In existing studies, the performance of AI shows a wide range of sensitivity, from 75.8% to 88.8%, and of specificity, from 63.1% to 95.6% [13]. In our study, we assumed both the sensitivity and specificity of the bilateral analysis to be 80%, and the non-inferiority margin for the difference in sensitivity and specificity between the two modes was set at −5% based on previous non-inferiority studies [14]. The non-inferiority margin is the maximum difference in diagnostic performance between the two modes that is considered acceptable for the unilateral analysis to be regarded as not worse than the bilateral analysis. A sample size of 524 pairs was needed to achieve a power of 80% (697 pairs for 90% power), with a one-sided type I error of 0.025 for the non-inferiority test with a non-inferiority margin of −5%, assuming a diagnosis proportion of 80% and a proportion of discordant pairs of 16% (within-subject correlation of 0.5) between the bilateral and unilateral mammography test results.
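The decision rule implied by the margin can be written as a comparison of the lower 95% CI limit of the difference (unilateral minus bilateral) against −5 percentage points. The sketch below is a simplified reading of that rule; the study's formal tests used P values from a generalized estimating equation, and the function names here are illustrative assumptions. The CI limits are those reported in Table 2.

```python
NON_INFERIORITY_MARGIN = -5.0  # percentage points, as set in the study


def is_non_inferior(ci_lower: float, margin: float = NON_INFERIORITY_MARGIN) -> bool:
    """Non-inferior when the lower 95% CI limit of the difference
    (unilateral - bilateral) lies above the margin."""
    return ci_lower > margin


def is_significantly_higher(ci_lower: float) -> bool:
    """Significantly higher when the lower CI limit also exceeds zero."""
    return ci_lower > 0.0


# Lower 95% CI limits of the differences reported in Table 2 (overall rows).
ci_lower_limits = {
    "sensitivity": 0.9,
    "specificity": -12.8,
    "ppv": -10.8,
    "npv": -1.6,
}

for measure, lower in ci_lower_limits.items():
    print(measure, is_non_inferior(lower), is_significantly_higher(lower))
```

Under this rule, sensitivity and NPV meet the non-inferiority criterion, whereas specificity and PPV do not, and only sensitivity has a lower limit above zero.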
The diagnostic performance was estimated and compared using a generalized estimating equation with a binomial distribution. The number of marks with an abnormality score of ≥ 10% on each breast was compared between the two modes using the Wilcoxon signed-rank test, separately for non-cancer and cancer cases. The agreement of the abnormality scores (the maximum abnormality scores for each of the CC and MLO views per breast) between the two modes for cancer cases was assessed using the intraclass correlation coefficient (ICC). All statistical analyses were performed using the SAS statistical software (version 9.4; SAS Institute, Inc., Cary, NC, USA). The non-inferiority tests were one-tailed, whereas the other tests were two-tailed. Statistical significance was indicated by a one-sided P-value of < 0.025 for the non-inferiority tests and a two-sided P-value of < 0.05 for the other tests.
Results
Patient and Lesion Characteristics
Table 1 summarizes the characteristics of the 694 women and 730 breast cancers. The ages of the women ranged from 29.0 to 87.0 years (median: 53.5; mean: 55.2 years; standard deviation [SD] = 10.8 years). The mean size of the invasive cancers was 1.5 cm (SD = 1.3 cm; range, 0–17 cm). Among the 658 normal or benign breasts, 38 lesions showed high-risk or benign histology according to biopsy results: ADH (n = 2), ALH (n = 2), LCIS (n = 2), fibroadenoma (n = 6), intraductal papilloma (n = 4), columnar cell changes (n = 5), fibrocystic changes (n = 8), complex sclerosing lesions (n = 1), sclerosing adenosis (n = 1), and usual ductal hyperplasia (n = 7).
Table 1.
Characteristics of the patient and breast cancer
| Characteristic | Study Cohort |
|---|---|
| Patient age (y)* | |
| Mean ± standard deviation | 55.2 ± 10.8 |
| Range | 29.0, 87.0 |
| First-degree family history of breast cancer (n = 694) | |
| Absent | 570 (82.1) |
| Present | 124 (17.9) |
| BI-RADS breast density (n = 694) | |
| Almost entirely fatty | 13 (1.9) |
| Scattered fibroglandular | 118 (17.0) |
| Heterogeneously dense | 312 (45.0) |
| Extremely dense | 251 (36.2) |
| Mode of detection (n = 694) | |
| Screen detected | 522 (75.2) |
| Clinically detected | 170 (24.5) |
| Unknown history | 2 (0.3) |
| Size of invasive cancer (cm) (n = 559) | |
| Mean ± standard deviation | 1.5 ± 1.3 |
| Range | 0.0, 17.0 |
| Stage of breast cancer (n = 730) | |
| Ductal carcinoma in situ, stage 0 | 171 (23.4) |
| Invasive cancer, stage I | 403 (55.2) |
| Invasive cancer, stage II | 147 (20.1) |
| Invasive cancer, stage III | 9 (1.2) |
| Lymph node metastasis (n = 730) | |
| No | 477 (65.3) |
| Yes | 94 (12.9) |
| Not submitted | 159 |
| Histology of breast cancer (n = 730) | |
| Ductal carcinoma in situ | 171 (23.4) |
| Invasive ductal carcinoma | 456 (62.4) |
| Invasive lobular carcinoma | 44 (6.0) |
| Mucinous carcinoma | 17 (2.3) |
| Microinvasive carcinoma | 28 (3.8) |
| Invasive mixed ductal and lobular carcinoma | 5 (0.7) |
| Solid papillary carcinoma | 3 (0.4) |
| Etc† | 6 (0.8) |
| Subtype of first breast cancer (n = 730) | |
| ER-positive or PR-positive | 613 (84.0) |
| ER-negative and PR-negative, HER2-positive | 52 (7.1) |
| Triple-negative | 59 (8.1) |
| Unknown | 6 (0.8) |
| Operation type (n = 730) | |
| Breast conserving surgery | 476 (65.2) |
| Total mastectomy | 254 (34.8) |
Except where indicated, data are numbers of examinations, with percentages in parentheses
Abbreviations. BI-RADS breast imaging reporting and data system, ER estrogen receptor, PR progesterone receptor, HER2 human epidermal growth factor receptor 2
*Data are presented as age (years)
†Invasive micropapillary carcinoma (n = 1), adenoid cystic carcinoma (n = 1), infiltrating mammary carcinoma (n = 1), large cell neuroendocrine carcinoma (n = 1)
Diagnostic Performances of AI in Bilateral and Unilateral Analyses
The sensitivity, specificity, PPV, and NPV of the AI software for the two analysis modes are presented in Table 2 and Fig. 3. The sensitivity was 78.6% (95% confidence interval [CI]: 75.5, 81.5) in the unilateral analysis and 76.7% (95% CI: 73.5, 79.6) in the bilateral analysis, with a difference of 1.9% (95% CI: 0.9, 3.0) (Fig. 4). The criteria for non-inferiority of unilateral mammography sensitivity were met, with the lower limit of the 95% CI of the difference being higher than both the non-inferiority margin of −5% and a difference of zero (P < 0.001 for the non-inferiority test and P < 0.001 for the difference test). In subgroup analysis, the sensitivity of cancer detection was significantly higher in the unilateral analysis than in the bilateral analysis for screen-detected cancers (77.3% vs. 75.0%, P < 0.001), stage 0 cancers (68.4% vs. 64.3%, P = 0.007), stage I cancers (77.7% vs. 76.2%, P = 0.013), cancers without lymph node metastasis (76.7% vs. 74.5%, P < 0.001), and ER- or PR-positive cancers (77.3% vs. 75.2%, P < 0.001). Regarding imaging findings, the unilateral analysis detected more cancers presenting with negative/benign findings (33.6% vs. 27.6%, P = 0.006) and with calcifications (93.3% vs. 91.7%, P = 0.043) (Table 3). The characteristics of patients and tumors showing discrepant findings on unilateral or bilateral analysis are listed in Table 4. The specificity of AI in the unilateral analysis (81.5%; 95% CI: 78.3, 84.2) was inferior to that in the bilateral analysis (91.9%; 95% CI: 89.6, 93.8) (Fig. 5), with the lower limit of the 95% CI of the difference (−10.5%; 95% CI: −12.8, −8.1) being lower than the non-inferiority margin (P > 0.999 for the non-inferiority test and P < 0.001 for the difference test, with specificity significantly higher in the bilateral analysis).
The PPV of the unilateral analysis (82.5%; 95% CI: 79.9, 84.8) was significantly lower than that of the bilateral analysis (91.4%; 95% CI: 89.0, 93.3), with the lower limit of the 95% CI of the difference (−8.9%; 95% CI: −10.8, −7.0) being lower than the non-inferiority margin (P > 0.999 for the non-inferiority test and P < 0.001 for the difference test). The NPV of the unilateral analysis (77.5%; 95% CI: 74.6, 80.1) was non-inferior to that of the bilateral analysis (78.1%; 95% CI: 75.4, 80.5), with the lower limit of the 95% CI of the difference (−0.6%; 95% CI: −1.6, 0.4) being higher than the non-inferiority margin of −5% (P < 0.001 for the non-inferiority test and P = 0.219 for the difference test).
Table 2.
Performance and outcomes of AI software in unilateral and bilateral analyses
| Performance Measures | Unilateral Analysis: N | Unilateral Analysis: Proportion (95% CI) | Bilateral Analysis: N | Bilateral Analysis: Proportion (95% CI) | Difference (95% CI) | P value: Non-inferiority | P value: Difference |
|---|---|---|---|---|---|---|---|
| Overall | |||||||
| Sensitivity (%) | 574/730 | 78.6 (75.5, 81.5) | 560/730 | 76.7 (73.5, 79.6) | 1.9 (0.9, 3.0) | < 0.001 | < 0.001 |
| Specificity (%) | 536/658 | 81.5 (78.3, 84.2) | 605/658 | 91.9 (89.6, 93.8) | −10.5 (−12.8, −8.1) | > 0.999 | < 0.001 |
| PPV (%) | 574/696 | 82.5 (79.9, 84.8) | 560/613 | 91.4 (89.0, 93.3) | −8.9 (−10.8, −7.0) | > 0.999 | < 0.001 |
| NPV (%) | 536/692 | 77.5 (74.6, 80.1) | 605/775 | 78.1 (75.4, 80.5) | −0.6 (−1.6, 0.4) | < 0.001 | 0.219 |
| Women with non-dense breasts | |||||||
| Sensitivity (%) | 120/135 | 88.9 (82.0, 93.4) | 119/135 | 88.1 (81.1, 92.8) | 0.7 (−0.7, 2.2) | < 0.001 | 0.316 |
| Specificity (%) | 112/127 | 88.2 (81.3, 92.8) | 123/127 | 96.9 (91.9, 98.8) | −8.7 (−13.6, −3.8) | 0.929 | 0.001 |
| PPV (%) | 120/135 | 88.9 (83.1, 92.9) | 119/123 | 96.7 (91.8, 98.8) | −7.9 (−12.0, −3.7) | 0.911 | < 0.001 |
| NPV (%) | 112/127 | 88.2 (81.4, 92.7) | 123/139 | 88.5 (82.2, 92.7) | −0.3 (−1.8, 1.2) | < 0.001 | 0.689 |
| Women with dense breasts | |||||||
| Sensitivity (%) | 454/595 | 76.3 (72.7, 79.5) | 441/595 | 74.1 (70.5, 77.5) | 2.2 (0.9, 3.4) | < 0.001 | 0.001 |
| Specificity (%) | 424/531 | 79.8 (76.2, 83.0) | 482/531 | 90.8 (88.0, 93.0) | −10.9 (−13.6, −8.3) | > 0.999 | < 0.001 |
| PPV (%) | 454/561 | 80.9 (78.0, 83.5) | 441/490 | 90.0 (87.2, 92.2) | −9.1 (−11.2, −6.9) | > 0.999 | < 0.001 |
| NPV (%) | 424/565 | 75.0 (71.9, 78.0) | 482/636 | 75.8 (72.8, 78.5) | −0.7 (−1.9, 0.4) | < 0.001 | 0.199 |
P-value from the generalized estimating equation
PPV positive predictive value, NPV negative predictive value
Fig. 3.
The diagnostic performance, including sensitivity and specificity, of the artificial intelligence (AI) software application in unilateral and bilateral analyses. The sensitivity of the AI application in the unilateral analysis was significantly higher than that in the bilateral analysis, with the lower limit of the 95% confidence interval (CI) of the difference (1.9% [0.9, 3.0]) being higher than a difference of zero (P < 0.001 for the difference test). The specificity of the unilateral analysis was inferior to that of the bilateral analysis, with the lower limit of the 95% CI of the difference (−10.5% [−12.8, −8.1]) being less than the non-inferiority margin (−5%) (P > 0.999 for the non-inferiority test). The positive predictive value (PPV) of the unilateral analysis was significantly lower than that of the bilateral analysis, with the lower limit of the 95% CI of the difference (−8.9%; 95% CI: −10.8, −7.0) being lower than the non-inferiority margin (P > 0.999 for the non-inferiority test and P < 0.001 for the difference test). The negative predictive value (NPV) of the unilateral analysis was non-inferior to that of the bilateral analysis, with the lower limit of the 95% CI of the difference (−0.6%; 95% CI: −1.6, 0.4) being higher than the non-inferiority margin of −5% (P < 0.001 for the non-inferiority test and P = 0.219 for the difference test)
Fig. 4.
Higher sensitivity in unilateral versus bilateral analysis. A 52-year-old woman with invasive lobular carcinoma of the right breast and ductal carcinoma in situ of the left breast. Mammography shows architectural distortion with amorphous calcifications in both upper breasts. (A) The artificial intelligence (AI) application in bilateral mammography revealed two marks, with the highest abnormality score being 98%, in the right breast (true-positive) and no mark in the left breast, which indicates a false-negative case in the left breast. (B) The AI application in unilateral mammography of the right breast revealed two marks, with the highest abnormality score being 97% (true-positive). (C) The AI application in unilateral mammography of the left breast detected a suspicious lesion with one corresponding mark, and it had an abnormality score of 33% (true-positive). (D) US image shows a 1.9-cm indistinct heterogeneous echoic mass (arrows) in the left breast. (E) Mediolateral oblique mammogram after US-guided wire localization shows good agreement with the mass detected using the AI software. The radiopaque round marker, which is attached on the skin to mark the wire insertion site, is visible
Table 3.
Comparison of the sensitivity in unilateral and bilateral analysis based on patients and cancer characteristics
| Characteristic | N | Cancer detected at Unilateral Analysis | Cancer detected at Bilateral Analysis | P-Value |
|---|---|---|---|---|
| Patient age (y) | ||||
| 29–39 | 35 | 24 (68.6) | 23 (65.7) | 0.311 |
| 40–49 | 215 | 151 (70.2) | 148 (68.8) | 0.081 |
| 50–59 | 228 | 179 (78.5) | 169 (74.1) | 0.003 |
| 60–69 | 173 | 149 (86.1) | 149 (86.1) | > 0.999 |
| ≥ 70 | 79 | 71 (89.9) | 71 (89.9) | > 0.999 |
| BI-RADS breast density | ||||
| Almost entirely fatty | 13 | 11 (84.6) | 11 (84.6) | > 0.999 |
| Scattered fibroglandular | 122 | 109 (89.3) | 108 (88.5) | 0.315 |
| Heterogeneously dense | 325 | 268 (82.5) | 265 (81.5) | 0.179 |
| Extremely dense | 270 | 186 (68.9) | 176 (65.2) | 0.001 |
| Mode of detection | ||||
| Screen detected | 556 | 430 (77.3) | 417 (75.0) | < 0.001 |
| Clinically detected | 170 | 140 (82.4) | 139 (81.8) | 0.316 |
| Unknown history | 4 | 4 (100) | 4 (100) | > 0.999 |
| Stage of breast cancer | ||||
| Ductal carcinoma in situ, stage 0 | 171 | 117 (68.4) | 110 (64.3) | 0.007 |
| Invasive cancer, stage I | 403 | 313 (77.7) | 307 (76.2) | 0.013 |
| Invasive cancer, stage II | 147 | 136 (92.5) | 135 (91.8) | 0.563 |
| Invasive cancer, stage III | 9 | 8 (88.9) | 8 (88.9) | > 0.999 |
| Lymph node metastasis | ||||
| No | 636 | 488 (76.7) | 474 (74.5) | < 0.001 |
| Yes | 94 | 86 (91.5) | 86 (91.5) | > 0.999 |
| Histology of breast cancer | ||||
| Ductal carcinoma in situ | 171 | 117 (68.4) | 110 (64.3) | 0.007 |
| Invasive ductal carcinoma | 456 | 376 (82.5) | 370 (81.1) | 0.033 |
| Invasive lobular carcinoma | 44 | 33 (75.0) | 33 (75.0) | > 0.999 |
| Mucinous carcinoma | 17 | 13 (76.5) | 12 (70.6) | 0.303 |
| Microinvasive carcinoma | 28 | 23 (82.1) | 23 (82.1) | > 0.999 |
| Invasive mixed ductal and lobular carcinoma | 5 | 4 (80.0) | 4 (80.0) | > 0.999 |
| Solid papillary carcinoma | 3 | 3 (100) | 3 (100) | > 0.999 |
| Etc† | 6 | 5 (83.3) | 5 (83.3) | > 0.999 |
| Subtype of first breast cancer | ||||
| ER-positive or PR-positive | 613 | 474 (77.3) | 461 (75.2) | < 0.001 |
| ER-negative and PR-negative, HER2-positive | 52 | 46 (88.5) | 45 (86.5) | 0.313 |
| Triple-negative | 59 | 51 (86.4) | 51 (86.4) | > 0.999 |
| Unknown | 6 | 3 (50.0) | 3 (50.0) | > 0.999 |
| Mammographic findings* | ||||
| Negative/Benign | 116 | 39 (33.6) | 32 (27.6) | 0.006 |
| Mass | 335 | 311 (92.8) | 309 (92.2) | 0.317 |
| Calcifications (suspicious) | 252 | 235 (93.3) | 231 (91.7) | 0.043 |
| Asymmetry | 145 | 103 (71.0) | 100 (69.0) | 0.078 |
| Architectural distortion | 26 | 25 (96.2) | 24 (92.3) | 0.289 |
| Skin or nipple change | 18 | 17 (94.4) | 17 (94.4) | > 0.999 |
*In patients with multiple mammographic findings, each was counted separately
Abbreviations. BI-RADS breast imaging reporting and data system, ER estrogen receptor, PR progesterone receptor, HER2 human epidermal growth factor receptor 2
P-value from the generalized estimating equation
Table 4.
Patients and tumor characteristics showing discrepant findings on unilateral or bilateral analysis
| No | Patient Age | BI-RADS breast density | Detection at Unilateral Analysis | Detection at Bilateral Analysis | Histology | Stage | Nodal Metastasis | ER/PR/HER2 status | Lesion Type at Mammography |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 38 | d | + | - | IDC | I | No | + / ± | Calcifications |
| 2 | 51 | d | + | - | DCIS | 0 | No | -/-/ + | Negative |
| 3 | 52 | d | + | - | IDC | I | No | + / ± | Negative |
| 4 | 54 | c | + | - | IDC | I | No | + / ± | Negative |
| 5 | 51 | d | + | - | IDC | I | No | + / ± | Calcifications, Asymmetry |
| 6 | 50 | c | + | - | DCIS | 0 | Not submitted | + / ± | Negative |
| 7 | 50 | d | + | - | IDC | II | Not submitted | + / ± | Negative |
| 8 | 43 | d | + | - | DCIS | 0 | No | + / ± | Negative |
| 9 | 51 | c | + | - | Mucinous carcinoma | II | Yes | + / ± | Mass |
| 10 | 52 | d | + | - | DCIS | 0 | Not submitted | + / ± | Calcifications, Architectural distortion |
| 11 | 41 | d | + | - | DCIS | 0 | No | + / ± | Calcifications, Asymmetry |
| 12 | 44 | c | + | - | DCIS | 0 | No | + / ± | Mass |
| 13 | 54 | b | + | - | DCIS | 0 | Not submitted | + / ± | Negative |
| 14 | 55 | d | + | - | IDC | I | No | + / ± | Mass |
| 15 | 50 | d | + | - | IDC | I | No | + / ± | Asymmetry |
| 16 | 56 | c | - | + | IDC | II | Yes | ± /- | Mass |
Abbreviations. BI-RADS breast imaging reporting and data system, IDC invasive ductal carcinoma, DCIS ductal carcinoma in situ, ER estrogen receptor, PR progesterone receptor, HER2 human epidermal growth factor receptor 2
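Because both modes were applied to the same 730 cancers, the difference in sensitivity is determined entirely by the discordant cases in Table 4. The short sketch below illustrates that arithmetic; the variable names are illustrative, and the counts (15 cancers detected only in the unilateral analysis, 1 detected only in the bilateral analysis) are read directly from the table.

```python
from collections import Counter

# (detected in unilateral analysis, detected in bilateral analysis)
# for the 16 discrepant cancers listed in Table 4.
discordant = [(True, False)] * 15 + [(False, True)] * 1
counts = Counter(discordant)

n_cancers = 730                 # total cancers analyzed in both modes
detected_bilateral = 560        # bilateral true positives (Table 2)

# Net gain of the unilateral analysis over the bilateral analysis.
net_gain = counts[(True, False)] - counts[(False, True)]
detected_unilateral = detected_bilateral + net_gain
difference_pct = 100 * net_gain / n_cancers

print(net_gain, detected_unilateral, round(difference_pct, 1))
```

The net gain of 14 cancers accounts for the 574 versus 560 detections in Table 2 and for the 1.9% difference in sensitivity.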
Fig. 5.
Lower specificity in unilateral versus bilateral analysis. A 50-year-old woman with right invasive ductal carcinoma. (A) The artificial intelligence (AI) application in bilateral mammography revealed two marks, with the highest abnormality score being 53% for the right breast (true-positive). (B) The AI application in unilateral mammography of the right breast shows two marks, with the highest abnormality score being 77% (true-positive). (C) The AI application in unilateral mammography of the left breast shows one mark, with an abnormality score of 48% (false-positive)
Agreement of Abnormality Scores and Comparison of Number of Marks
The ICC of the AI abnormality scores in the 730 cancers between the unilateral and bilateral analyses was 0.985 (95% CI: 0.983, 0.987), which indicates an excellent agreement between the two modes for assessing the AI abnormality score in cancer cases.
The total numbers of marks in the unilateral and bilateral analyses were 1298 and 1306, respectively, for 1388 breasts, corresponding to an average of 0.94 marks per breast in both modes. Among the 730 breasts with cancer, the number of marks in the unilateral analysis (average number of marks per breast = 1.55) was significantly lower than that in the bilateral analysis (average number of marks per breast = 1.67) (P < 0.001) (Table 5). In the 658 breasts without breast cancer, the number of marks in the unilateral analysis (0.25 per breast) was significantly higher than that in the bilateral analysis (0.13 per breast) (P < 0.001).
Table 5.
Comparison of mark number in unilateral and bilateral analyses based on the presence of breast cancer
| Number of marks | Non-cancer cases (n = 658): Unilateral Analysis | Non-cancer cases (n = 658): Bilateral Analysis | Cancer cases (n = 730): Unilateral Analysis | Cancer cases (n = 730): Bilateral Analysis |
|---|---|---|---|---|
| 0 | 536 (81.5) | 605 (91.9) | 151 (20.7) | 168 (23.0) |
| 1 | 86 (13.1) | 27 (4.1) | 137 (18.8) | 62 (8.5) |
| 2 | 27 (4.1) | 18 (2.7) | 355 (48.6) | 380 (52.1) |
| 3 | 9 (1.4) | 7 (1.1) | 65 (8.9) | 87 (11.9) |
| 4 | 0 (0) | 1 (0.2) | 21 (2.9) | 30 (4.1) |
| 5 | - | - | 1 (0.1) | 3 (0.4) |
| Total number of marks | 167 | 88 | 1131 | 1218 |
| Average number of marks (per breast) | 0.25 | 0.13 | 1.55 | 1.67 |
| P-value* | < 0.001 | < 0.001 | ||
Data are numbers of examinations, with percentages in parentheses
*P value from the Wilcoxon signed-rank test
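The totals and per-breast averages in Table 5 follow from the mark-count distributions. The sketch below recomputes them as a consistency check; the helper name `total_and_mean_marks` and the dictionary layout (number of marks mapped to number of breasts) are assumptions made for illustration, and the Wilcoxon signed-rank test itself is not reproduced because it requires the paired per-breast data.

```python
def total_and_mean_marks(distribution: dict[int, int]) -> tuple[int, float]:
    """Given {marks per breast: number of breasts}, return the total number
    of marks and the average number of marks per breast."""
    n_breasts = sum(distribution.values())
    total = sum(marks * n for marks, n in distribution.items())
    return total, total / n_breasts


# Distributions transcribed from Table 5.
non_cancer_unilateral = {0: 536, 1: 86, 2: 27, 3: 9, 4: 0}
non_cancer_bilateral = {0: 605, 1: 27, 2: 18, 3: 7, 4: 1}
cancer_unilateral = {0: 151, 1: 137, 2: 355, 3: 65, 4: 21, 5: 1}
cancer_bilateral = {0: 168, 1: 62, 2: 380, 3: 87, 4: 30, 5: 3}

for label, dist in (
    ("non-cancer, unilateral", non_cancer_unilateral),
    ("non-cancer, bilateral", non_cancer_bilateral),
    ("cancer, unilateral", cancer_unilateral),
    ("cancer, bilateral", cancer_bilateral),
):
    total, mean = total_and_mean_marks(dist)
    print(label, total, round(mean, 2))
```

Summing the non-cancer and cancer totals gives 1298 marks for the unilateral analysis and 1306 for the bilateral analysis, each about 0.94 marks per breast over 1388 breasts.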
Discussion
Deep learning surpasses traditional machine learning algorithms in various recognition tasks because it directly learns rich feature representations unrestricted by human-designed features from large-scale data [10]. Recent studies of standalone AI software demonstrate diagnostic performance for screening mammography interpretation that is equivalent to or surpasses that of individual breast radiologists or the outcomes of an average reader, with an overall sensitivity of 80.6% and an overall specificity of 85.7% [13]. While studies have demonstrated that AI can improve mammography performance [15, 16], few have evaluated how AI performs in post-treatment surveillance mammography settings. Notably, although approximately one-third of women with breast cancer undergo mastectomy for surgical treatment, data on AI’s ability to detect second breast cancers in the contralateral breast in this population are limited.
This study evaluated AI in unilateral mammography mimicking a surveillance setting, using a design in which the same women’s bilateral mammograms served as controls, and found that AI applied to simulated unilateral mammography had a sensitivity of 78.6% and a specificity of 81.5%. The sensitivity of the unilateral analysis was non-inferior to, and significantly higher than, that of the bilateral analysis (78.6% vs. 76.7%). However, the specificity of the unilateral analysis was lower than that of the bilateral analysis (81.5% vs. 91.9%).
In clinical practice, radiologists assess local properties when interpreting mammograms and also consider broader contextual information. Because the breasts are compared as symmetric organs, the search for asymmetries and differences is of great importance in mammography interpretation. Radiologists frequently incorporate global factors such as breast density, overall fibroglandular tissue presence, and associated parenchymal and nodular patterns into their assessments. Using the normal tissue pattern from the contralateral breast and a different view of the ipsilateral breast provides a measure of asymmetry [17], which significantly influences the level of suspicion for a specific lesion [18].
Similar to human reasoning, the AI software performed exam-level cancer detection in the bilateral analysis using the full image dataset, while the cancer detection task in the unilateral analysis was conducted on a single-image basis with missing data. In cancer detection tasks, having a comparison might be helpful, but multiple similar findings could lead to a true cancer being overlooked. Independently assessing abnormal findings in each view may increase the sensitivity of cancer diagnosis, but it could also result in many false positives. In our simulation study, applying commercially available AI software to unilateral mammography showed non-inferior and even higher sensitivity, but with a trade-off of decreased specificity. For the 658 non-cancer cases, the average number of marks in the unilateral analysis was 0.25 per breast, which was higher than the 0.13 per breast in the bilateral analysis. Additionally, 69 more false-positive cases were found in the unilateral analysis than in the bilateral analysis. These false-positive cases could be classified as normal if global properties and symmetry were considered. However, as shown by the smaller number of cancers missed by AI in the simulated unilateral analysis (156 vs. 170 in the bilateral analysis), this approach may offer advantages for patient cohorts with a higher likelihood of recurrence by detecting cancers with subtle suspicious features and thereby increasing sensitivity. Although the emphasis on sensitivity versus specificity may vary depending on the characteristics of the group in which AI is applied, it is crucial for future clinical use to maintain the high sensitivity achieved with current AI algorithms while improving the specificity for women undergoing unilateral mammography surveillance.
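The trade-off described above can be checked with simple arithmetic on the reported counts. The Python sketch below is illustrative only: it takes the missed-cancer counts (156 and 170) from the text and the zero-mark non-cancer counts (536 and 605) from Table 5, and assumes that a non-cancer breast with no AI mark counts as a true negative. The helper names are hypothetical. It compares point differences with the -5% margin; the formal non-inferiority conclusion depends on a confidence interval for the paired difference, which cannot be rebuilt from these aggregate counts.

```python
N_CANCER = 730        # breasts with cancer
N_NON_CANCER = 658    # breasts without cancer
MARGIN = -0.05        # non-inferiority margin (-5 percentage points)

# Counts taken from the text and Table 5.
COUNTS = {
    "unilateral": {"missed_cancers": 156, "unmarked_non_cancers": 536},
    "bilateral": {"missed_cancers": 170, "unmarked_non_cancers": 605},
}


def sensitivity(mode):
    """Fraction of cancer breasts detected by the AI software."""
    return (N_CANCER - COUNTS[mode]["missed_cancers"]) / N_CANCER


def specificity(mode):
    """Fraction of non-cancer breasts with no AI mark (assumed true negatives)."""
    return COUNTS[mode]["unmarked_non_cancers"] / N_NON_CANCER


def false_positives(mode):
    """Number of non-cancer breasts that received at least one AI mark."""
    return N_NON_CANCER - COUNTS[mode]["unmarked_non_cancers"]


def point_difference_within_margin(diff, margin=MARGIN):
    """Point-estimate check only; not a substitute for the interval-based test."""
    return diff > margin


if __name__ == "__main__":
    sens_diff = sensitivity("unilateral") - sensitivity("bilateral")
    spec_diff = specificity("unilateral") - specificity("bilateral")
    extra_fp = false_positives("unilateral") - false_positives("bilateral")
    print(f"sensitivity difference: {sens_diff:+.1%}")
    print(f"specificity difference: {spec_diff:+.1%}")
    print(f"additional false-positive breasts: {extra_fp}")
    print("sensitivity within margin:", point_difference_within_margin(sens_diff))
    print("specificity within margin:", point_difference_within_margin(spec_diff))
```

Under these assumptions, the sensitivity difference lies above the margin and the specificity difference lies below it, consistent with the non-inferior sensitivity and inferior specificity reported in the study.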
Our study has some limitations. First, this was a retrospective study conducted at a single institution. Second, we excluded women with excisional biopsy scars, injection granulomas, and augmentation mammoplasty because these alterations may affect AI performance. However, such patients also undergo routine mammography in our clinical practice and are not uncommon. Therefore, they should be included in future studies that use AI systems trained on this patient population, to increase generalizability and reflect real-world data. Third, this study used AI software from a single vendor; other AI software may exhibit different diagnostic performance when used in unilateral mammography. There is currently no commercially available or even experimental AI specifically designed for unilateral mammography, and it is uncertain when such development might occur. Using existing AI software for this unique group of postoperative patients could be a more efficient way to achieve equitable access and extend the benefits of AI to these individuals. In addition, to establish the practical viability of unilateral AI application in various patient cohorts, further retrospective and prospective investigations focusing on AI performance in surveillance mammography for postmastectomy patients are essential.
In conclusion, AI application in unilateral mammography demonstrated non-inferior sensitivity but inferior specificity for detecting breast cancer compared with AI application in bilateral mammography. With additional evidence from comparative studies of available AI software in larger and more diverse patient populations, AI could potentially be used in surveillance mammography for women who have undergone unilateral mastectomy, detecting contralateral second breast cancers with enhanced sensitivity, although improvement in specificity is warranted.
Acknowledgements
The authors gratefully acknowledge Gunhee Nam (Lunit, Inc.) for describing the software’s operational principles and Soyoung Yim (Department of Anatomy and Cell Biology, Seoul National University College of Medicine) for providing the medical illustration.
Abbreviations
- AI: Artificial intelligence
- BI-RADS: Breast Imaging Reporting and Data System
- DCIS: Ductal carcinoma in situ
- PPV: Positive predictive value
- NPV: Negative predictive value
Author Contributions
All authors contributed to the study conception and design. The first draft of the manuscript was written by Ji Yeong An and all authors commented on previous versions of the manuscript. Details are as follows: Ji Yeong An (Formal analysis, Writing – original draft preparation, Writing – review & editing), Janie M. Lee (Writing – review & editing), Myoung-jin Jang (Formal analysis, Writing – review & editing), Su Min Ha (Writing – review & editing), Jung Min Chang (Conceptualization, Formal analysis, Supervision, Writing – review & editing). All authors read and approved the final manuscript.
Funding
This research was supported by a Seoul National University Hospital grant no. 04–2023-2160.
Data availability
Additional documents related to this study are available on request to the corresponding author. However, the datasets from Seoul National University Hospital were used under license for the current study and are not publicly available.
Code availability
We used commercial software (Lunit INSIGHT MMG).
Declarations
Institutional Review Board Statement
The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Seoul National University Hospital (IRB No. 2304–083-1423).
Informed Consent Statement
Patient consent was waived due to the retrospective nature of the study.
Competing Interests
The authors have no relevant financial or non-financial interests to disclose.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Lawson MB, Herschorn SD, Sprague BL, Buist DS, Lee SJ, Newell MS, Lourenco AP, Lee JM. Imaging surveillance options for individuals with a personal history of breast cancer: AJR expert panel narrative review. AJR Am J Roentgenol 2022;219(6):854-868. 10.2214/AJR.22.27635
- 2. Lee JM, Abraham L, Lam DL, Buist DS, Kerlikowske K, Miglioretti DL, Houssami N, Lehman CD, Henderson LM, Hubbard RA. Cumulative risk distribution for interval invasive second breast cancers after negative surveillance mammography. J Clin Oncol 2018;36(20):2070. 10.1200/JCO.2017.76.8267
- 3. Lee JM, Ichikawa LE, Wernli KJ, Bowles E, Specht JM, Kerlikowske K, Miglioretti DL, Lowry KP, Tosteson ANA. Digital mammography and breast tomosynthesis performance in women with a personal history of breast cancer, 2007-2016. Radiology 2021;300(2):290-300. 10.1148/radiol.2021204581
- 4. Ha SM, Lee JM, Kim SO, Moon WK, Chang JM. Semiannual breast US or MRI screening in patients with a personal history of breast cancer. Radiology 2023;307(5):e221660. 10.1148/radiol.221660
- 5. Yoon JH, Kim EK. Deep learning-based artificial intelligence for mammography. Korean J Radiol 2021;22(8):1225-1239. 10.3348/kjr.2020.1210
- 6. Yoon JH, Kim EK, Kim GR, Han K, Moon HJ. Mammographic surveillance after breast-conserving therapy: impact of digital breast tomosynthesis and artificial intelligence–based computer-aided detection. AJR Am J Roentgenol 2022;218(1):42-51. 10.2214/AJR.21.26506
- 7. Allen B, Dreyer K, Stibolt Jr R, Agarwal S, Coombs L, Treml C, Elkholy M, Brink L, Wald C. Evaluation and real-world performance monitoring of artificial intelligence models in clinical practice: try it, buy it, check it. J Am Coll Radiol 2021;18(11):1489-1496. 10.1016/j.jacr.2021.08.022
- 8. Giaquinto AN, Sung H, Miller KD, Kramer JL, Newman LA, Minihan A, Jemal A, Siegel RL. Breast Cancer Statistics, 2022. CA Cancer J Clin 2022;72(6):524-541. 10.3322/caac.21754
- 9. Amin MB, Edge SB, Greene FL, Compton CC, Gershenwald JE, Brookland RK, Meyer L, Gress DM, Byrd DR, Winchester DP. The eighth edition AJCC cancer staging manual: continuing to build a bridge from a population-based to a more “personalized” approach to cancer staging. CA Cancer J Clin 2017;67(2):93-99. 10.3322/caac.21388
- 10. Kim HE, Kim HH, Han BK, Kim KH, Han K, Nam H, Lee EH, Kim EK. Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study. Lancet Digit Health 2020;2(3):e138-e148. 10.1016/S2589-7500(20)30003-0
- 11. Kim EK, Kim HE, Han K, Kang BJ, Sohn YM, Woo OH, Lee CW. Applying Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Preliminary Study. Sci Rep 2018;8(1):2762. 10.1038/s41598-018-21215-1
- 12. Lee SE, Hong H, Kim EK. Positive Predictive Values of Abnormality Scores From a Commercial Artificial Intelligence-Based Computer-Aided Diagnosis for Mammography. Korean J Radiol 2024;25(4):343-350. 10.3348/kjr.2023.0907
- 13. Yoon JH, Strand F, Baltzer PAT, Conant EF, Gilbert FJ, Lehman CD, Morris EA, Mullen LA, Nishikawa RM, Sharma N, Vejborg I, Moy L, Mann RM. Standalone AI for Breast Cancer Detection at Screening Digital Mammography and Digital Breast Tomosynthesis: A Systematic Review and Meta-Analysis. Radiology 2023;307(5):e222639. 10.1148/radiol.222639
- 14. Kim D, Hwang JE, Cho Y, Cho HW, Lee W, Lee JH, Oh IY, Baek S, Lee E, Kim J. A Retrospective Clinical Evaluation of an Artificial Intelligence Screening Method for Early Detection of STEMI in the Emergency Department. J Korean Med Sci 2022;37(10):e81. 10.3346/jkms.2022.37.e81
- 15. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, Broeders M, Gennaro G, Clauser P, Helbich TH, Chevalier M, Tan T, Mertelmeier T, Wallis MG, Andersson I, Zackrisson S, Mann RM, Sechopoulos I. Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J Natl Cancer Inst 2019;111(9):916-922. 10.1093/jnci/djy222
- 16. Lee JH, Kim KH, Lee EH, Ahn JS, Ryu JK, Park YM, Shin GW, Kim YJ, Choi HY. Improving the performance of radiologists using artificial intelligence-based detection support software for mammography: a multi-reader study. Korean J Radiol 2022;23(5):505-516. 10.3348/kjr.2021.0476
- 17. Hupse R, Karssemeijer N. Use of normal tissue context in computer-aided detection of masses in mammograms. IEEE Trans Med Imaging 2009;28(12):2033-2041. 10.1109/TMI.2009.2028611
- 18. Wu N, Huang Z, Shen Y, Park J, Phang J, Makino T, Kim SG, Cho K, Heacock L, Moy L, Geras KJ. Reducing false-positive biopsies using deep neural networks that utilize both local and global image context of screening mammograms. J Digit Imaging 2021;34(6):1414-1423. 10.1007/s10278-021-00530-6