Abstract
Purpose
To investigate the value of the combined diagnosis of multiparametric MRI-based deep learning models to differentiate triple-negative breast cancer (TNBC) from fibroadenoma magnetic resonance Breast Imaging-Reporting and Data System category 4 (BI-RADS 4) lesions and to evaluate whether the combined diagnosis of these models could improve the diagnostic performance of radiologists.
Methods
A total of 319 female patients with 319 pathologically confirmed BI-RADS 4 lesions were randomly divided into training, validation, and testing sets in this retrospective study. Three models were established based on contrast-enhanced T1-weighted imaging, diffusion-weighted imaging, and T2-weighted imaging using the training and validation sets. The artificial intelligence (AI) combination score was calculated from the results of the three models. The diagnostic performances of four radiologists with and without AI assistance were compared with that of the AI combination score on the testing set. The area under the curve (AUC), sensitivity, specificity, accuracy, and weighted kappa value were calculated to assess the performance.
Results
The AI combination score yielded an excellent performance (AUC = 0.944) on the testing set. With AI assistance, the AUC for the diagnosis of junior radiologist 1 (JR1) increased from 0.833 to 0.885, and that for JR2 increased from 0.823 to 0.876. The AUCs of senior radiologist 1 (SR1) and SR2 slightly increased from 0.901 and 0.950 to 0.925 and 0.975 after AI assistance, respectively.
Conclusion
Combined diagnosis of multiparametric MRI-based deep learning models to differentiate TNBC from fibroadenoma magnetic resonance BI-RADS 4 lesions can achieve comparable performance to that of SRs and improve the diagnostic performance of JRs.
Keywords: Breast cancer, Deep learning, Neural network, Breast MRI, Triple-negative breast cancer
Introduction
Breast cancer has the highest prevalence and mortality rate among all malignant tumours in women (Sung et al. 2021). Triple-negative breast cancer (TNBC) accounts for 10–20% of breast cancers diagnosed worldwide, with approximately 200,000 new cases each year (Kumar and Aggarwal 2016). TNBC is a heterogeneous subgroup of breast cancer that does not express the estrogen receptor or progesterone receptor or overexpress human epidermal growth factor-2 (Anders and Carey 2008). TNBC is characterized by aggressive clinical behavior, higher mortality, and poor prognosis (Dogan and Turnbull 2012; Kassam et al. 2009). Because these cancers grow quickly, TNBCs can show benign-like features on ultrasound, mammography, or magnetic resonance imaging (MRI) and often mimic fibroadenomas. On MRI, TNBC may present with mass enhancement, intratumoural high T2 signal intensity, a lobulated mass, smooth margins, and rim enhancement (Ryu et al. 2014; Sung et al. 2013; Uematsu et al. 2009). This benign-like appearance of TNBC might result in false-negative results, which would lead to delayed treatment and a worse clinical outcome (Costantini et al. 2016; Mersin et al. 2008).
MRI is a powerful tool for distinguishing benign and malignant breast masses. The fifth edition of the Breast Imaging-Reporting and Data System (BI-RADS) is the most widely used diagnostic system for reporting breast lesions. BI-RADS category 4 (BI-RADS 4) indicates lesions that do not fulfil the criteria for malignancy but are suspicious enough to consider a recommendation for biopsy (Leithner et al. 2017). BI-RADS 4 covers a wide range of malignancy probability, from more than 2% to less than 95%. Differentiating TNBC from fibroadenoma among BI-RADS 4 lesions can be a major challenge for less experienced radiologists in clinical practice (Clauser et al. 2021). Although experienced radiologists in large medical institutions achieve a high degree of concordance with pathological results (Truhn et al. 2019), the widespread use of breast MRI in some regions is limited by the lack of radiologists with extensive experience in breast MRI diagnosis, owing to the long training period needed.
In recent years, machine learning has attracted increasing attention given the exciting prospect of facilitating radiologists’ diagnoses (Liu et al. 2021; Sheth and Giger 2020). A convolutional neural network (CNN) is a type of deep learning model that is widely used in computer vision and is typically composed of convolution layers, pooling layers, and fully connected layers. CNNs can automatically and adaptively learn multi-level features from images without handcrafted feature engineering (Yamashita et al. 2018). Training a CNN requires a very large amount of data. However, the MRI datasets of TNBCs available at a single institution are usually limited, and this limited size prevents CNN models from achieving higher diagnostic performance. Transfer learning is a promising approach to solve this problem (Zhang et al. 2017), because the network weights are pretrained on large datasets and are often publicly available for download.
Accordingly, the purpose of this study was to develop multiparametric MRI-based CNN models with transfer learning and to investigate the value of the combined diagnosis of these models to differentiate TNBC from fibroadenoma magnetic resonance BI-RADS 4 lesions. Furthermore, we compared the performance of the combined diagnosis of these models with that of radiologists to determine whether the models can assist radiologists in diagnosis.
Materials and methods
Patients
This retrospective study was approved by the ethics committee of our hospital, and the requirement for informed consent was waived. We retrospectively searched breast MRI examinations performed between January 01, 2012, and April 31, 2019, from the picture archives and communication system of our hospital. The inclusion criteria for this study were as follows: (1) histologically confirmed TNBC or fibroadenoma masses; (2) patients who underwent preoperative dynamic contrast-enhanced MRI examination; and (3) patients who were classified as having BI-RADS 4 lesions in medical radiological reports. The exclusion criteria were as follows: (1) preoperative endocrine therapy, chemotherapy, or radiotherapy; (2) preoperative invasive breast operation; (3) no pathological results; (4) other pathological types; (5) non-mass lesions; and (6) obvious artifacts on magnetic resonance images. After screening, a total of 319 female patients with 319 BI-RADS 4 lesions (only the largest lesion was included in the analysis for patients with multicentric carcinoma or multiple fibroadenomas) were included in this study. Of the 319 BI-RADS 4 lesions (154 TNBCs and 165 fibroadenomas), there were 199 BI-RADS 4A, 66 BI-RADS 4B, and 54 BI-RADS 4C lesions according to the radiological reports. All lesions were confirmed by pathologic results of vacuum-assisted biopsy or surgical excision. Clinical data, including age, family history of breast cancer, and menopausal status, were collected. All patients were randomly divided into the training, validation, and testing sets.
Magnetic resonance image acquisition
All breast MRI examinations were performed using a 3.0 T superconducting magnetic resonance scanner (Verio, Siemens Medical Systems, Erlangen, Germany), with the patient in a prone position, using a dedicated breast surface coil (4-channel coils, Siemens, Erlangen). The turbo spin-echo T2W axial sequence was performed with the following parameters: repetition time (TR), 5200 ms; echo time (TE), 89 ms; field of view, 34 × 34 cm; matrix size, 360 × 360; slice thickness, 4 mm; gap, 0; number of excitations (NEX), 1. The diffusion-weighted imaging (DWI) axial sequence was performed at b values of 0 and 1000 s/mm² with the following parameters: TR, 4300 ms; TE, 80 ms; field of view, 34 × 34 cm; matrix size, 256 × 256; slice thickness, 5.0 mm; slice gap, 1; NEX, 5. Apparent diffusion coefficient (ADC) maps were generated on the workstation using the least-squares method with the images at b values of 0 and 1000 s/mm². After the T2W and DWI axial sequences, a dynamic contrast-enhanced (DCE)-MRI sequence was performed with six dynamic acquisitions, one before and five after elbow vein bolus injection of gadolinium-dimeglumine (GE Healthcare) equal to 0.1 mmol/kg body weight, followed by a 20 ml saline flush. The first postcontrast dynamic image acquisition started at the 30th second after bolus injection of the contrast agent. Each phase of the postcontrast dynamic acquisitions took 60 s. The DCE-MRI scan parameters were as follows: TR, 4.7 ms; TE, 1.61 ms; flip angle, 12°; field of view, 34 × 34 cm; matrix, 520 × 480; slice thickness, 1 mm; NEX, 1.
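With only two b values, a least-squares fit reduces to a closed-form expression. The following minimal sketch assumes the standard monoexponential signal model S(b) = S0 · exp(−b · ADC); the text states only that a least-squares method was used on the workstation, so the model, the function name `adc_two_point`, and the signal values are illustrative assumptions rather than the vendor implementation.

```python
import math


def adc_two_point(s_b0, s_b1000, b0=0.0, b1=1000.0):
    """Estimate ADC (mm^2/s) from signals measured at two b values (s/mm^2).

    Assumes monoexponential decay S(b) = S0 * exp(-b * ADC). With two points,
    the least-squares line through (b, ln S) passes exactly through both.
    """
    if s_b0 <= 0 or s_b1000 <= 0:
        raise ValueError("signals must be positive")
    return math.log(s_b0 / s_b1000) / (b1 - b0)


# Hypothetical voxel signals, chosen only for illustration.
adc = adc_two_point(1000.0, 1000.0 * math.exp(-1.1))
```

Applied voxel by voxel, this yields an ADC map in mm²/s.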
Regions of interest delineation
The axial T2WI, ADC, and contrast-enhanced T1-weighted (T1 + C) images were selected to develop the CNN models with transfer learning. The first axial postcontrast dynamic images were used as T1 + C because this phase of the DCE-MRI (the 90th second after the bolus injection) could best show the lesion boundary relative to adjacent tissues and contain rich distinguishing information. The ROIs of breast lesions on the axial T2WI, ADC, and T1 + C images were manually delineated by a radiologist with six years of experience who was blinded to the histological information. The ROIs were delineated slice by slice for the whole lesion volume. An example of the ROIs is shown in Fig. 1. The ROIs were then confirmed by a radiologist with over 10 years of experience. The delineation was performed using open-source software (3D Slicer; https://www.slicer.org/).
Fig. 1.
Delineation of regions of interest and image preprocessing, including data augmentation and block extraction
Data preprocessing
Data augmentation was applied to the image sets (Fig. 1), with random rotation from −20° to 20°, stretching by a factor of 0.9 to 1.1, and shifting from −10 to 10 pixels. After these geometric transformations, the size of the image datasets was increased fivefold. Then, according to the ROI of each breast lesion, a block with a size of 64 × 64 × 3 was cropped from the MR images. The extracted block was centered at the center of the lesion. All blocks were then enlarged to a size of 224 × 224 × 3 by zero padding.
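The block extraction and zero padding described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: it handles a single 2D channel, places the cropped block in the middle of the padded canvas (the text does not say where the block sits), and uses a hypothetical image and lesion center.

```python
def crop_block(image, center_row, center_col, size=64):
    """Crop a size x size block centred on (center_row, center_col).

    Pixels that fall outside the image are filled with zeros.
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    block = []
    for r in range(center_row - half, center_row + half):
        row = []
        for c in range(center_col - half, center_col + half):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(image[r][c] if inside else 0)
        block.append(row)
    return block


def zero_pad(block, target=224):
    """Place the block in the middle of a target x target canvas of zeros."""
    size = len(block)
    offset = (target - size) // 2  # assumption: symmetric padding
    canvas = [[0] * target for _ in range(target)]
    for r in range(size):
        canvas[offset + r][offset:offset + size] = block[r]
    return canvas


# Hypothetical 100 x 100 slice with the lesion centre at row 50, column 40.
image = [[r * 100 + c for c in range(100)] for r in range(100)]
block = crop_block(image, 50, 40)
padded = zero_pad(block)
```

Zero padding keeps the lesion at its native resolution, whereas interpolating the block up to 224 × 224 would change the apparent pixel size.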
Transfer learning model
All experiments were conducted in Python (version 3.7.0; Python Software Foundation, Wilmington, Del) using PyTorch (version 1.4.0) on a workstation equipped with a GeForce GTX 3080 GPU. After preprocessing, the images of the three sequences were used to establish three artificial intelligence (AI) models (Fig. 2). ResNet18 pretrained on ImageNet was chosen to limit overfitting, as the training dataset was not large. Several modifications were made to the pretrained ResNet18. The number of neurons in the fully connected layer was adjusted from 1000 to 2 to fit the classification task of our study. During training, the weights of the first two convolutional layers were kept frozen, while the weights of the fully connected layer and all other convolutional layers were updated using the images of the training set. All images were normalized before being fed into the network by subtracting the mean and dividing by the standard deviation. The Adam algorithm was used to minimize the cross-entropy loss, with a mini-batch size of 8. The initial learning rate was set to 0.001 and decayed by a factor of 10 after the 50th epoch. The network was trained for 100 epochs. Finally, the model with the highest validation accuracy was selected. The images of the testing set were fed into the trained model to output the probability of each class, and the class with the highest probability was chosen as the classification result.
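The training schedule, checkpoint selection, and final class decision described above can be illustrated without any deep learning framework. The sketch below is a simplified, framework-independent illustration; the function names are hypothetical, epochs are assumed to be 1-indexed, and ties in validation accuracy are resolved in favour of the earliest epoch, which the text does not specify.

```python
import math


def learning_rate(epoch, initial=0.001, decay_epoch=50, factor=10.0):
    """Learning rate for a 1-indexed epoch: constant, then divided by `factor`."""
    return initial if epoch <= decay_epoch else initial / factor


def best_epoch(val_accuracies):
    """Return the 1-indexed epoch with the highest validation accuracy.

    Ties go to the earliest epoch (an assumption).
    """
    best = max(range(len(val_accuracies)), key=lambda i: val_accuracies[i])
    return best + 1


def predict(logits):
    """Softmax over class logits; return (predicted class index, probabilities)."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs


# Hypothetical validation accuracies for four epochs and logits for one lesion.
chosen = best_epoch([0.70, 0.90, 0.90, 0.80])
label, probs = predict([0.0, 2.0])
```

In PyTorch, the same schedule would typically be expressed with an optimizer and a step-based learning-rate scheduler; the plain-Python version only makes the rule explicit.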
Fig. 2.
The conceptual architecture of the three deep learning models and artificial intelligence combination score in this study
Diagnosis of radiologists and AI
The testing set included 32 TNBCs and 35 fibroadenomas. Each lesion was evaluated as TNBC or fibroadenoma by each of the three AI diagnostic models. Each model that evaluated a lesion as TNBC contributed 1 point to that lesion's score. According to the total score (AI combination score; Fig. 2), lesions with 0 or 1 point, 2 points, and 3 points were classified as 4A, 4B, and 4C, respectively. Two junior radiologists (JRs) with 2 and 3 years of experience in breast MRI and two senior radiologists (SRs) with 11 and 17 years of experience in breast MRI were recruited to diagnose the lesions on the original magnetic resonance images following the fifth edition of the BI-RADS. Each lesion was classified as BI-RADS 4A, 4B, or 4C by all the radiologists without any pathological or clinical information. The radiologists were informed that the cases shown could be either fibroadenoma or TNBC. All the radiologists first independently diagnosed the 67 lesions of the testing set. After a washout period of 6 weeks, they diagnosed the testing set again with AI assistance, during which they were shown the results of the three AI models for each case.
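The scoring rule described above amounts to a vote count followed by a lookup. A minimal sketch follows; the function names and the example votes are hypothetical and serve only to make the mapping concrete.

```python
def ai_combination_score(votes):
    """Count how many of the three models called the lesion TNBC.

    `votes` maps a model name to True (TNBC) or False (fibroadenoma).
    """
    return sum(1 for is_tnbc in votes.values() if is_tnbc)


def score_to_category(score):
    """Map the 0-3 score to a BI-RADS 4 subcategory as described in the text."""
    if score not in (0, 1, 2, 3):
        raise ValueError("score must be 0, 1, 2, or 3")
    if score <= 1:
        return "4A"
    return "4B" if score == 2 else "4C"


# Hypothetical lesion: two of the three models vote TNBC.
votes = {"T1+C": True, "ADC": True, "T2WI": False}
score = ai_combination_score(votes)
category = score_to_category(score)
```

Because 4A is later treated as fibroadenoma and 4B or 4C as TNBC, the binary decision derived from this score is equivalent to a majority vote of the three models.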
Statistical analysis
All statistical analyses were performed using SPSS (IBM SPSS Statistics for Windows, v.25.0. Armonk, NY) and Python. We compared the diagnostic performance of the three models, AI combination score, SRs, and JRs (with and without AI assistance). The gold standard for the diagnosis of breast lesions was the histopathology result. When evaluating the performance of the radiologists compared with the pathological results, BI-RADS 4A lesions were considered fibroadenomas and 4B and 4C lesions were considered TNBCs. The area under the curve (AUC) and its 95% confidence interval (CI), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated. Significant differences between the AUCs were compared by DeLong’s test (DeLong et al. 1988). The diagnostic agreement of the radiologists was assessed using the Kappa value. For continuous variables, Welch's t test or Student's t test was used to test intergroup differences. For categorical variables, Pearson’s chi-square test or Fisher’s exact test was utilized to test intergroup differences. A two-sided P value < 0.05 was considered statistically significant.
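The reported metrics follow directly from the binarization rule above. The sketch below uses a small made-up set of readings, not study data; it also computes the AUC as the probability that a randomly chosen TNBC receives a higher rating than a randomly chosen fibroadenoma, treating 4A < 4B < 4C as an ordinal score. Whether the study computed AUCs on the three-level or the binarized ratings is not stated, so that choice is an assumption.

```python
def binarize(category):
    """Rule from the text: 4A -> fibroadenoma (0); 4B or 4C -> TNBC (1)."""
    return 0 if category == "4A" else 1


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def diagnostic_metrics(truth, predicted):
    """Sensitivity, specificity, PPV, NPV, and accuracy for labels with 1 = TNBC."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "accuracy": safe_div(tp + tn, len(pairs)),
    }


def auc(truth, scores):
    """Empirical AUC: P(positive score > negative score), ties counted as 0.5."""
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


# Made-up readings for eight lesions (1 = TNBC, 0 = fibroadenoma).
truth = [1, 1, 1, 1, 0, 0, 0, 0]
categories = ["4C", "4B", "4B", "4A", "4A", "4A", "4B", "4A"]
metrics = diagnostic_metrics(truth, [binarize(c) for c in categories])
ordinal = {"4A": 0, "4B": 1, "4C": 2}
auc_value = auc(truth, [ordinal[c] for c in categories])
```

Confidence intervals and DeLong's test for comparing correlated AUCs are not covered here and would normally come from a statistics package.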
Results
Clinicopathologic data
The clinicopathologic characteristics of the patients in the three datasets are summarized in Table 1. There was no significant difference in the clinicopathologic characteristics among the training, validation, and testing sets. The MRI features of the TNBC group and the fibroadenoma group, evaluated based on the radiological reports, are summarized in Table 2. There were statistically significant differences in shape, margin, T2 signal, internal enhancement, dark internal septation, time curve description, and mean ADC.
Table 1.
Clinicopathological characteristics of the patients
| Characteristic | Training set (n = 190) | Validation set (n = 62) | Testing set (n = 67) | P value |
|---|---|---|---|---|
| Age | 42 (27–75) | 45 (29–69) | 43 (32–72) | 0.820 |
| < 40 y | 59 (31.1) | 18 (29.0) | 22 (32.8) | |
| 40–49 y | 66 (34.7) | 21 (33.9) | 23 (34.3) | |
| 50–59 y | 46 (24.2) | 13 (21.0) | 17 (25.4) | |
| ≥ 60 y | 19 (10.0) | 10 (16.1) | 5 (7.5) | |
| Pathology | | | | 0.841 |
| Fibroadenoma | 100 (52.6) | 30 (48.4) | 35 (52.2) | |
| TNBC | 90 (47.4) | 32 (51.6) | 32 (47.8) | |
| Menopausal status | | | | 0.828 |
| Premenopausal | 118 (62.1) | 36 (58.1) | 42 (62.7) | |
| Postmenopausal | 72 (37.9) | 26 (41.9) | 25 (37.3) | |
| Family history | | | | 0.712 |
| No | 179 (94.2) | 60 (96.8) | 63 (94.0) | |
| Yes | 11 (5.8) | 2 (3.2) | 4 (6.0) | |
| Symptom | | | | 0.383 |
| Asymptomatic | 106 (55.8) | 34 (54.8) | 38 (56.7) | |
| Palpable | 76 (40.0) | 23 (37.1) | 22 (32.8) | |
| Pain | 8 (4.2) | 5 (8.1) | 7 (10.5) | |
| Tumor size | | | | 0.920 |
| ≤ 2.0 cm | 106 (55.8) | 33 (53.2) | 36 (53.7) | |
| 2.1–4.0 cm | 84 (44.2) | 29 (46.8) | 31 (46.3) | |
| Lesion position | | | | 0.695 |
| Right | 102 (53.8) | 34 (54.8) | 40 (59.7) | |
| Left | 88 (46.2) | 28 (45.2) | 27 (40.3) | |
TNBC triple-negative breast cancer
Table 2.
The MRI features of the TNBC group and fibroadenoma group
| Lesion characteristics | TNBC (n = 154) | Fibroadenoma (n = 165) | P value |
|---|---|---|---|
| Shape | | | 0.001 |
| Round or oval | 96 (62.3) | 131 (79.4) | |
| Irregular | 58 (37.7) | 34 (20.6) | |
| Margin | | | < 0.001 |
| Circumscribed | 67 (43.5) | 133 (80.6) | |
| Noncircumscribed | 87 (56.5) | 32 (19.4) | |
| T2 signal | | | < 0.001 |
| Hyperintensity | 93 (60.4) | 143 (86.7) | |
| Isointensity/hypointensity | 61 (39.6) | 22 (13.3) | |
| Internal enhancement | | | < 0.001 |
| Homogeneous | 27 (17.5) | 136 (82.4) | |
| Heterogeneous | 54 (35.1) | 29 (17.6) | |
| Rim enhancement | 73 (47.4) | 0 (0.0) | |
| Dark internal septation | 0 (0.0) | 120 (72.7) | < 0.001 |
| Time curve description | | | < 0.001 |
| Type 1 | 6 (3.9) | 126 (76.4) | |
| Type 2 | 38 (24.7) | 39 (23.6) | |
| Type 3 | 110 (71.4) | 0 (0.0) | |
| Mean ADC | 1.086 ± 0.195 | 1.337 ± 0.265 | < 0.001 |
TNBC triple-negative breast cancer; MRI magnetic resonance imaging; ADC apparent diffusion coefficient
Performance of the AI models
An overview of the performance of the AI models on the testing set is shown in Table 3 and Fig. 3. The training and validation curves of the AI models are shown in Fig. 4. The T1 + C model, ADC model, and T2WI model achieved their best validation accuracy at the 98th, 76th, and 59th epochs, respectively. The testing set was evaluated with the best model for each sequence. The T1 + C model yielded good performance, with an AUC of 0.885 (95% CI: 0.791, 0.979), a sensitivity of 0.867, a specificity of 0.892, a PPV of 0.867, an NPV of 0.892, and an accuracy of 0.881. The ADC model achieved an AUC of 0.835 (95% CI: 0.735, 0.935), a sensitivity of 0.833, a specificity of 0.811, a PPV of 0.781, an NPV of 0.857, and an accuracy of 0.821. The T2WI model demonstrated an AUC of 0.820 (95% CI: 0.715, 0.925), a sensitivity of 0.800, a specificity of 0.811, a PPV of 0.774, an NPV of 0.833, and an accuracy of 0.806. The T1 + C model performed significantly better than the T2WI model (P = 0.036), whereas its advantage over the ADC model did not reach statistical significance (P = 0.068). The ADC model had almost equivalent performance to the T2WI model (P = 0.795).
Table 3.
The performance characteristics of the AI combination score and the radiologists (with and without AI assistance)
| Model or reader | ACC | SEN | SPE | PPV | NPV | AUC (95% CI) |
|---|---|---|---|---|---|---|
| T1 + C | 0.881 | 0.867 | 0.892 | 0.867 | 0.892 | 0.885 (0.791–0.979) |
| ADC | 0.821 | 0.833 | 0.811 | 0.781 | 0.857 | 0.835 (0.735–0.935) |
| T2WI | 0.806 | 0.800 | 0.811 | 0.774 | 0.833 | 0.820 (0.715–0.925) |
| AI combination score | 0.940 | 0.926 | 0.950 | 0.926 | 0.950 | 0.944 (0.882–0.998) |
| SR1 | 0.881 | 0.926 | 0.850 | 0.806 | 0.944 | 0.901 (0.834–0.968) |
| SR1 with AI assistance | 0.910 | 0.963 | 0.875 | 0.839 | 0.972 | 0.925 (0.868–0.983) |
| SR2 | 0.940 | 0.963 | 0.925 | 0.897 | 0.974 | 0.950 (0.902–0.999) |
| SR2 with AI assistance | 0.970 | 1.000 | 0.950 | 0.931 | 1.000 | 0.975 (0.941–1.000) |
| JR1 | 0.851 | 0.852 | 0.850 | 0.793 | 0.894 | 0.833 (0.729–0.938) |
| JR1 with AI assistance | 0.881 | 0.963 | 0.825 | 0.788 | 0.970 | 0.885 (0.822–0.974) |
| JR2 | 0.791 | 0.741 | 0.825 | 0.741 | 0.825 | 0.823 (0.729–0.917) |
| JR2 with AI assistance | 0.866 | 0.926 | 0.825 | 0.781 | 0.943 | 0.876 (0.795–0.957) |
ACC accuracy; SEN sensitivity; SPE specificity; PPV positive predictive value; NPV negative predictive value; AUC area under the receiver operating characteristic curve; CI confidence interval; T1 + C first axial post contrast dynamic images; ADC apparent diffusion coefficient; AI artificial intelligence; JR junior radiologist; SR senior radiologist
Fig. 3.

ROC analysis of the contrast-enhanced T1-weighted imaging model, diffusion-weighted imaging model, and T2-weighted imaging model in the testing set
Fig. 4.
Accuracy curves and loss curves of the training and validation sets. The training process of the three models based on contrast-enhanced T1-weighted imaging (a, d), diffusion-weighted imaging (b, e), and T2-weighted imaging (c, f)
Performance of the AI combination score and the radiologists
The sensitivity, specificity, PPV, NPV, accuracy, and AUC of the AI combination score and the radiologists are summarized in Table 3. The results of the ROC analysis are displayed in Fig. 5. For identification of lesions in the testing set, the AI combination score yielded an excellent performance, with an AUC of 0.944 (95% CI: 0.882, 0.998), a sensitivity of 0.926, a specificity of 0.950, a PPV of 0.926, an NPV of 0.950, and an accuracy of 0.940. JR1 and JR2 showed sensitivities of 0.852 and 0.741, specificities of 0.850 and 0.825, PPVs of 0.793 and 0.741, NPVs of 0.894 and 0.825, and accuracies of 0.851 and 0.791, respectively. SR1 and SR2 achieved sensitivities of 0.926 and 0.963, specificities of 0.850 and 0.925, PPVs of 0.806 and 0.897, NPVs of 0.944 and 0.974, and accuracies of 0.881 and 0.940, respectively. The AI combination score had almost equivalent performance to SR2 (P = 0.846) and a slightly higher AUC than SR1, without a significant difference (P = 0.323). The AI combination score performed significantly better than JR2 (P = 0.032); its higher AUC relative to JR1 did not reach statistical significance (P = 0.073).
Fig. 5.

The ROC analysis of the artificial intelligence combination score and four radiologists with and without artificial intelligence assistance in the testing set
Improved performance was observed when all the radiologists had AI assistance in diagnosing the lesions. With AI assistance, the AUC for JR1 increased from 0.833 to 0.885 and that for JR2 increased from 0.823 to 0.876. The AUCs for SR1 and SR2 increased slightly from 0.901 and 0.950 to 0.925 and 0.975 with AI assistance, respectively.
Interobserver agreement of the radiologists
There was an obvious increase in the interobserver agreement among the radiologists with AI assistance. The weighted kappa of interobserver agreement between JR1 and SR1 increased from 0.420 (95% CI, 0.244–0.569) to 0.660 (95% CI, 0.482–0.813) after AI assistance, and that between JR2 and SR2 increased from 0.667 (95% CI, 0.483–0.823) to 0.758 (95% CI, 0.591–0.878) after AI assistance. In addition, there was an improvement between JR1 and JR2, with a weighted kappa of 0.480 (95% CI, 0.322–0.597) without AI assistance and 0.876 (95% CI, 0.796–0.935) with AI assistance.
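For readers unfamiliar with the statistic, a weighted kappa on the three ordered categories (4A, 4B, 4C) can be sketched as follows. The weighting scheme used in the study is not stated; this sketch assumes linear weights, and the ratings are made up for illustration.

```python
def weighted_kappa(ratings_a, ratings_b, categories=("4A", "4B", "4C")):
    """Linearly weighted Cohen's kappa for two readers on ordered categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (reader A category, reader B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted disagreement: observed versus expected under independence.
    num = 0.0
    den = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # assumption: linear weights
            num += weight * observed[i][j]
            den += weight * row[i] * col[j]
    if den == 0.0:
        raise ValueError("kappa is undefined when both readers use one category")
    return 1.0 - num / den


# Made-up ratings from two readers.
same = weighted_kappa(["4A", "4A", "4B", "4C"], ["4A", "4A", "4B", "4C"])
chance = weighted_kappa(["4A", "4A", "4C", "4C"], ["4A", "4C", "4A", "4C"])
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; disagreements between adjacent categories are penalized less than 4A versus 4C disagreements.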
Discussion
In this study, we established the T1 + C model, ADC model, and T2WI model to differentiate between TNBC and fibroadenoma BI-RADS 4 lesions. The T1 + C model performed better than the ADC model and the T2WI model, which may be because the T1 + C images can best show the details of the lesions. Although a single model can only achieve moderate performance, the AI combination score can yield better performance. Our results demonstrated the potential utility of the combined diagnosis of multiparametric MRI-based deep learning models for differentiating between TNBC and fibroadenoma BI-RADS 4 lesions. The performance of the AI combination score was comparable to that of SR1 and SR2, with a sensitivity of 0.926, a specificity of 0.950, a PPV of 0.926, an NPV of 0.950, an accuracy of 0.940, and an AUC of 0.944. In addition, the diagnostic performances of JR1 and JR2 were significantly improved with AI assistance. Our study indicated that the AI combination score can reach sufficiently high sensitivity and specificity and that it may be used as a concurrent reader to improve the diagnostic performance of JRs. Some radiomics studies based on ultrasound also presented high diagnostic performances in discriminating TNBCs and fibroadenomas (Lee et al. 2018; Moon et al. 2015), and the highest AUC reached 0.970. The AI combination score in our study maintained noninferior performance to these previous radiomics models, and it can be applied end to end under the supervision of radiologists, eliminating the time-consuming process of handcrafted radiomics features.
In our study, the improvement in the diagnostic performance of the JRs was more obvious than that of the SRs, but the diagnostic performance of the SRs did improve slightly. There is substantial overlap in imaging features in category 4 benign and malignant lesions, which complicates their differential diagnosis. Some breast cancers may also present benign findings on both morphology and enhancement kinetics, and a considerable proportion of hereditary breast cancers present benign findings with oval or round shapes, smooth margins, and homogenous enhancement (Schrading and Kuhl 2008). In a multi-institutional study, 45% of breast lesions histologically confirmed as malignant lesions presented persistent enhancement, which indicated a benign feature (Schnall et al. 2006). Thus, independent differential diagnosis of TNBC and fibroadenoma BI-RADS 4 lesions is a difficult task for less experienced JRs. Some studies have reported that combining computer-aided diagnosis with breast ultrasound leads to improved specificity and AUC for radiologists in the diagnosis of breast neoplasms, and computer-aided diagnosis is more useful for less experienced radiologists (Choi et al. 2018; Kim et al. 2017). A recent study that explored the potential benefit of AI assistance for mammography showed that radiologists with the least experience and training achieved the most improvement in performance with AI assistance, and an experienced radiologist achieved the least improvement in the breast cancer detection rate with AI assistance (Watanabe et al. 2019). These studies are consistent with our findings, and our result may be attributed to two reasons: (1) the SRs demonstrated very high performance with MRI, leaving only limited space for improvement, and (2) the JRs were more willing to accept the AI diagnosis suggestions.
Our study indicated that the interobserver agreement among radiologists can be improved with AI assistance. Although the fifth edition of the BI-RADS lexicon can be used for the standardized description of breast lesions (Marino et al. 2018), as previously reported (Milos et al. 2020), it does not provide guidance on how breast lesions that present with certain features should be managed, and empirical BI-RADS 4 category assignments do not follow objective rules in some high-risk patients. Thus, when less experienced radiologists diagnose breast lesions independently, there may be more variability in the assessment. AI assistance is appropriate only if the AI generalizes to populations other than the one on which it was trained and if radiologists do not follow the AI when it makes a mistake. When less experienced radiologists diagnose with AI assistance, better interobserver agreement may be more conducive to subsequent diagnosis and treatment decisions for each patient.
We trained AI models to differentiate between TNBC and fibroadenoma using transfer learning to alleviate the limitations of a small dataset. To date, only a few studies on the differentiation of malignant and benign breast lesions have applied deep learning to breast MRI (Hu et al. 2020; Truhn et al. 2019; Zhou et al. 2019). The limited availability of breast MRI data is the major reason. To address the problem of limited MRI data for CNN model training, transfer learning with fine-tuning can be a promising approach. Since many image features are composed of universal elements, a CNN initialized with pre-trained weights can perform better than a CNN trained from randomly initialized weights (Zhang et al. 2017), especially when the dataset available for training is small. Some studies have also applied the transfer learning strategy using a fine-tuned CNN with pre-trained weights and achieved higher accuracy for the differentiation of prostate and breast lesions (Byra et al. 2019; Yuan et al. 2019).
This study had several limitations. First, only four radiologists were recruited, and the research environment in which the radiologists diagnosed lesions differed from the real clinical working environment. Thus, prospective studies with more radiologists from more centres are warranted to validate the actual effect of AI assistance. Second, selection bias may be present, as the patients were not consecutive cases and the sample size was small. Third, our research used a single type of scanner with standardized parameters, and it is unclear how the AI models would perform on data from other types of scanners.
In summary, our preliminary study indicated that the combined diagnosis of multiparametric MRI-based deep learning models for differentiating TNBC from fibroadenoma magnetic resonance BI-RADS 4 lesions achieved comparable performance to that of SRs and can improve the diagnostic efficiency of JRs.
Author contributions
G-wL: Supervision, Writing, Reviewing and Editing, Conceptualization. Z-hX and H-hJ: Data curation. H-lY: Writing-Original draft preparation, Methodology. YJ: Formal analysis. All authors contributed to the article and approved the submitted version.
Funding
This work was supported by the National Natural Science Foundation of China (grant no. 81771816).
Data availability
Data are available in the article. All other data can be provided upon reasonable request to the corresponding authors.
Code availability
The code of the proposed method can be provided upon reasonable request to the corresponding authors.
Declarations
Conflict of interest
The authors of this manuscript declare no conflict of interest.
Ethics approval
This retrospective study was approved by The Institutional Review Board of Huadong Hospital, and the requirement for written informed consent was waived by The Institutional Review Board of Huadong Hospital.
References
- Anders C, Carey LA (2008) Understanding and treating triple-negative breast cancer. Oncology (Williston Park) 22:1233–1239 (discussion 1239–1240, 1243)
- Byra M, Galperin M, Ojeda-Fournier H, Olson L, O’Boyle M, Comstock C, Andre M (2019) Breast mass classification in sonography with transfer learning using a deep convolutional neural network and color conversion. Med Phys 46:746–755. https://doi.org/10.1002/mp.13361
- Choi JH, Kang BJ, Baek JE, Lee HS, Kim SH (2018) Application of computer-aided diagnosis in breast ultrasound interpretation: improvements in diagnostic performance according to reader experience. Ultrasonography 37:217–225. https://doi.org/10.14366/usg.17046
- Clauser P, Krug B, Bickel H, Dietzel M, Pinker K, Neuhaus VF, Marino MA, Moschetta M, Troiano N, Helbich TH, Baltzer PAT (2021) Diffusion-weighted imaging allows for downgrading MR BI-RADS 4 lesions in contrast-enhanced MRI of the breast to avoid unnecessary biopsy. Clin Cancer Res 27:1941–1948. https://doi.org/10.1158/1078-0432.CCR-20-3037
- Costantini M, Belli P, Bufi E, Asunis AM, Ferra E, Bitti GT (2016) Association between sonographic appearances of breast cancers and their histopathologic features and biomarkers. J Clin Ultrasound 44:26–33. https://doi.org/10.1002/jcu.22312
- DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44:837–845
- Dogan BE, Turnbull LW (2012) Imaging of triple-negative breast cancer. Ann Oncol 23(Suppl 6):vi23–29. https://doi.org/10.1093/annonc/mds191
- Hu Q, Whitney HM, Giger ML (2020) A deep learning methodology for improved breast cancer diagnosis using multiparametric MRI. Sci Rep. https://doi.org/10.1038/s41598-020-67441-4
- Kassam F, Enright K, Dent R, Dranitsaris G, Myers J, Flynn C, Fralick M, Kumar R, Clemons M (2009) Survival outcomes for patients with metastatic triple-negative breast cancer: implications for clinical practice and trial design. Clin Breast Cancer 9:29–33. https://doi.org/10.3816/CBC.2009.n.005
- Kim K, Song MK, Kim EK, Yoon JH (2017) Clinical application of S-detect to breast masses on ultrasonography: a study evaluating the diagnostic performance and agreement with a dedicated breast radiologist. Ultrasonography 36:3–9. https://doi.org/10.14366/usg.16012
- Kumar P, Aggarwal R (2016) An overview of triple-negative breast cancer. Arch Gynecol Obstet 293:247–269. https://doi.org/10.1007/s00404-015-3859-y
- Lee SE, Han K, Kwak JY, Lee E, Kim EK (2018) Radiomics of US texture features in differential diagnosis between triple-negative breast cancer and fibroadenoma. Sci Rep 8:13546. https://doi.org/10.1038/s41598-018-31906-4
- Leithner D, Wengert G, Helbich T, Morris E, Pinker K (2017) MRI in the assessment of BI-RADS® 4 lesions. Top Magn Reson Imaging 26:191–199. https://doi.org/10.1097/RMR.0000000000000138
- Liu H, Chen Y, Zhang Y, Wang L, Luo R, Wu H, Wu C, Zhang H, Tan W, Yin H, Wang D (2021) A deep learning model integrating mammography and clinical factors facilitates the malignancy prediction of BI-RADS 4 microcalcifications in breast cancer screening. Eur Radiol 31:5902–5912. https://doi.org/10.1007/s00330-020-07659-y
- Marino MA, Riedl CC, Bernathova M, Bernhart C, Baltzer PAT, Helbich TH, Pinker K (2018) Imaging phenotypes in women at high risk for breast cancer on mammography, ultrasound, and magnetic resonance imaging using the fifth edition of the breast imaging reporting and data system. Eur J Radiol 106:150–159. https://doi.org/10.1016/j.ejrad.2018.07.026
- Mersin H, Yildirim E, Berberoglu U, Gulben K (2008) The prognostic importance of triple negative breast carcinoma. Breast 17:341–346. https://doi.org/10.1016/j.breast.2007.11.031
- Milos RI, Pipan F, Kalovidouri A, Clauser P, Kapetas P, Bernathova M, Helbich TH, Baltzer PAT (2020) The Kaiser score reliably excludes malignancy in benign contrast-enhancing lesions classified as BI-RADS 4 on breast MRI high-risk screening exams. Eur Radiol 30:6052–6061. https://doi.org/10.1007/s00330-020-06945-z
- Moon WK, Huang YS, Lo CM, Huang CS, Bae MS, Kim WH, Chen JH, Chang RF (2015) Computer-aided diagnosis for distinguishing between triple-negative breast cancer and fibroadenomas based on ultrasound texture features. Med Phys 42:3024–3035. https://doi.org/10.1118/1.4921123
- Ryu EB, Chang JM, Seo M, Kim SA, Lim JH, Moon WK (2014) Tumour volume doubling time of molecular breast cancer subtypes assessed by serial breast ultrasound. Eur Radiol 24:2227–2235. https://doi.org/10.1007/s00330-014-3256-0
- Schnall MD, Blume J, Bluemke DA, DeAngelis GA, DeBruhl N, Harms S, Heywang-Kobrunner SH, Hylton N, Kuhl CK, Pisano ED, Causer P, Schnitt SJ, Thickman D, Stelling CB, Weatherall PT, Lehman C, Gatsonis CA (2006) Diagnostic architectural and dynamic features at breast MR imaging: multicenter study. Radiology 238:42–53. https://doi.org/10.1148/radiol.2381042117
- Schrading S, Kuhl CK (2008) Mammographic, US, and MR imaging phenotypes of familial breast cancer. Radiology 246:58–70. https://doi.org/10.1148/radiol.2461062173
- Sheth D, Giger ML (2020) Artificial intelligence in the interpretation of breast cancer on MRI. J Magn Reson Imaging 51:1310–1324. https://doi.org/10.1002/jmri.26878
- Sung JS, Jochelson MS, Brennan S, Joo S, Wen YH, Moskowitz C, Zheng J, Dershaw DD, Morris EA (2013) MR imaging features of triple-negative breast cancers. Breast J 19:643–649. https://doi.org/10.1111/tbj.12182
- Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F (2021) Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 71:209–249. https://doi.org/10.3322/caac.21660
- Truhn D, Schrading S, Haarburger C, Schneider H, Merhof D, Kuhl C (2019) Radiomic versus convolutional neural networks analysis for classification of contrast-enhancing lesions at multiparametric breast MRI. Radiology 290:290–297. https://doi.org/10.1148/radiol.2018181352
- Uematsu T, Kasami M, Yuen S (2009) Triple-negative breast cancer: correlation between MR imaging and pathologic findings. Radiology 250:638–647. https://doi.org/10.1148/radiol.2503081054
- Watanabe AT, Lim V, Vu HX, Chim R, Weise E, Liu J, Bradley WG, Comstock CE (2019) Improved cancer detection using artificial intelligence: a retrospective evaluation of missed cancers on mammography. J Digit Imaging 32:625–637. https://doi.org/10.1007/s10278-019-00192-5
- Yamashita R, Nishio M, Do RKG, Togashi K (2018) Convolutional neural networks: an overview and application in radiology. Insights Imaging 9:611–629. https://doi.org/10.1007/s13244-018-0639-9
- Yuan Y, Qin W, Buyyounouski M, Ibragimov B, Hancock S, Han B, Xing L (2019) Prostate cancer classification with multiparametric MRI transfer learning model. Med Phys 46:756–765. https://doi.org/10.1002/mp.13367
- Zhang L, Le L, Nogues I, Summers RM, Liu S, Yao J (2017) DeepPap: deep convolutional networks for cervical cell classification. IEEE J Biomed Health Inf 21:1633–1643. https://doi.org/10.1109/JBHI.2017.2705583
- Zhou J, Zhang Y, Chang KT, Lee KE, Wang O, Li J, Lin Y, Pan Z, Chang P, Chow D, Wang M, Su MY (2019) Diagnosis of benign and malignant breast lesions on DCE-MRI by using radiomics and deep learning with consideration of peritumor tissue. J Magn Reson Imaging 51:798–809. https://doi.org/10.1002/jmri.26981
Data Availability Statement
The data supporting the findings are available in the article. All other data, and the code for the proposed method, can be provided by the corresponding authors upon reasonable request.



