Author manuscript; available in PMC: 2021 Apr 29.
Published in final edited form as: Acad Radiol. 2018 Mar 8;26(2):196–201. doi: 10.1016/j.acra.2018.01.023

Breast Cancer Molecular Subtypes Prediction by Mammographic Radiomics Features

Wenjuan Ma a,b,c,d, Yumei Zhao a,b,c, Yu Ji a,b,c, Xinpeng Guo a,b,c, Xiqi Jian d, Peifang Liu a,b,c, Shandong Wu e
PMCID: PMC8082943  NIHMSID: NIHMS941573  PMID: 29526548

Abstract

Rationale and Objectives:

This study aimed to investigate whether quantitative radiomics features extracted from digital mammogram images are associated with molecular subtypes of breast cancer.

Materials and Methods:

In this institutional review board-approved retrospective study, we collected data from 331 Chinese women who were diagnosed with invasive breast cancer in 2015. This cohort included 29 triple-negative, 45 human epidermal growth factor receptor 2 (HER2)-enriched, 36 luminal A, and 221 luminal B lesions. A set of 39 quantitative radiomic features, including morphological, gray-scale statistic, and texture features, was extracted from the segmented lesion area. Three binary classifications of the subtypes were performed: triple-negative vs. non-triple-negative, HER2-enriched vs. non-HER2-enriched, and luminal (A+B) vs. non-luminal. The Naive Bayes machine learning scheme was employed for classification, and the least absolute shrinkage and selection operator (LASSO) method was used to select the most predictive features for the classifiers. Classification performance was evaluated by the area under the receiver operating characteristic curve (AUC) and accuracy.

Results:

The model combining the craniocaudal (CC) and mediolateral oblique (MLO) view images achieved better overall performance than either view alone, yielding an AUC (accuracy) of 0.865 (0.796) for triple-negative vs. non-triple-negative, 0.784 (0.748) for HER2-enriched vs. non-HER2-enriched, and 0.752 (0.788) for luminal vs. non-luminal subtypes. The LASSO method selected the twelve most predictive features, four of which (roundness, concavity, gray mean, and correlation) differed significantly (p<0.05) among the subtypes.

Conclusion:

Our study showed that quantitative radiomics imaging features of breast tumors extracted from digital mammograms are associated with breast cancer subtypes. Larger future studies are needed to further evaluate these findings.

Keywords: molecular subtypes, mammogram, radiomics, breast cancer

1. Introduction

Breast cancer subtyping has important therapeutic implications for the clinical management of the disease. The major breast cancer molecular subtypes are luminal A, luminal B, human epidermal growth factor receptor 2 (HER2)-enriched, and triple-negative (1). In general, luminal tumors represent the majority (70%) of invasive breast cancers and respond well to endocrine therapy. HER2-enriched tumors are better candidates for targeted antibody therapy (2). Triple-negative cancers can be more aggressive and difficult to treat, but some of these tumors may respond well to chemotherapy (2,3). Breast cancer subtypes can be determined by genetic array testing or approximated in routine clinical practice using immunohistochemistry (IHC) markers. IHC requires tissue specimens, typically obtained by needle biopsy. Because of the relatively small tissue sample and tumor heterogeneity, subtyping performed on a needle biopsy sample may not be representative of the entire tumor. Clinical imaging is a non-invasive approach that can capture a broader range of tumor heterogeneity than a single tissue sample (4). Recently, progress has been made in characterizing breast cancer subtypes from radiological images. For example, certain qualitative imaging characteristics assessed visually on breast magnetic resonance imaging (MRI), mammography, or ultrasound have been shown to be related to the molecular subtypes of breast cancer (5–8).

Radiomics refers to the extraction and analysis of large numbers of quantitative features from medical images. It has been studied extensively in recent literature, which has demonstrated predictive or prognostic associations between quantitative imaging features and medical outcomes (9,10). A few recent studies reported differences in quantitative radiomics features extracted from breast dynamic contrast-enhanced (DCE) MRI across the four breast cancer subtypes (11–13). However, breast MRI is an expensive examination and is currently not widely available, especially in less-developed countries. Digital mammography is less expensive and widely used for breast cancer screening and diagnosis, but unlike DCE-MRI, it cannot characterize certain biological/physiological properties of the breast tissue. Nevertheless, several studies have demonstrated that quantitative imaging features can distinguish malignant from benign lesions in digital mammograms (14–16). In this study, we employed machine learning techniques to investigate whether quantitative radiomics features extracted from digital mammograms are associated with breast cancer subtypes in a population of Chinese women.

2. Materials and Methods

2.1. Study cohort and imaging dataset

This retrospective study was approved by our institutional review board, and the requirement for informed consent was waived. We collected a cohort of 331 Chinese women diagnosed with pathologically confirmed invasive breast cancer between August 2015 and October 2015. All 331 cancers were detected on mammography, and 253 (76.4%) of them were palpable. The cohort included 29 triple-negative, 45 HER2-enriched, 36 luminal A, and 221 luminal B lesions. All mammograms were acquired with Hologic Lorad Selenia full-field digital mammography (FFDM) systems. The FFDM images had 14-bit quantization and a pixel size of 70×70 μm. Each patient had both craniocaudal (CC) and mediolateral oblique (MLO) view images, giving a total of 662 mammogram images for analysis.

2.2. Radiomics feature extraction

Radiomics features were extracted from the lesion area on each image, which first required outlining the lesion. An experienced breast radiologist (25 years’ experience) manually outlined the contour of the diagnosed breast tumor on each image of each patient; if a breast contained multiple lesions, the largest one was selected. Over the region enclosed by each contour, we applied existing automated computer programs to calculate a set of 39 quantitative radiomic features from the mammographic imaging data. The 39 features include: 1) morphological features, such as shape, size, area, perimeter, roundness, concavity, and Fourier coefficient descriptors; 2) gray-scale statistic features calculated from the histogram of tumor pixel intensities, such as mean, standard deviation, skewness, and kurtosis (17); and 3) Haralick texture features quantifying intra-tumor heterogeneity, calculated from the gray level co-occurrence matrix (GLCM), such as contrast, correlation, energy, entropy, homogeneity, inertia, and inverse difference moment (18). All 39 radiomic features were computed with standard mathematical algorithms or formulae reported in the previous literature. These features provide a quantitative way to capture important phenotypic information about the segmented lesions. All features were normalized to a standard range before being used in the machine learning models for breast cancer subtype classification.
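The gray-scale statistic and GLCM texture features listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the automated programs used in the study. The following are all assumptions: the number of gray levels (8), the single one-pixel horizontal offset, the use of a rectangular region of interest (a real pipeline would restrict the GLCM to pixels inside the lesion contour), the homogeneity and energy conventions, and min-max scaling as the "standard range" normalization. The function names are hypothetical.

```python
import numpy as np

def first_order_features(pixels):
    """Gray-scale histogram statistics of the pixel intensities inside a lesion."""
    x = np.asarray(pixels, dtype=float).ravel()
    mean, std = x.mean(), x.std()
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    return {"mean": float(mean), "std": float(std),
            "skewness": float((z ** 3).mean()),
            "kurtosis": float((z ** 4).mean() - 3.0)}  # excess kurtosis (assumed convention)

def glcm(roi, levels=8, offset=(0, 1)):
    """Symmetric, normalized gray level co-occurrence matrix for one pixel offset."""
    dr, dc = offset
    if dr < 0 or dc < 0:
        raise ValueError("this sketch only handles non-negative offsets")
    img = np.asarray(roi, dtype=float)
    # Quantize the raw (e.g. 14-bit) intensities into `levels` gray levels.
    lo, hi = img.min(), img.max()
    q = np.clip(np.floor((img - lo) / (hi - lo + 1e-12) * levels), 0, levels - 1).astype(int)
    rows, cols = q.shape
    ref, nbr = q[:rows - dr, :cols - dc], q[dr:, dc:]   # each pixel and its offset neighbour
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1)
    P = P + P.T                                         # count each pair in both directions
    return P / P.sum()

def haralick_features(P):
    """A subset of Haralick's texture features computed from a normalized GLCM."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * P).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * P).sum())
    cov = ((i - mu_i) * (j - mu_j) * P).sum()
    nz = P[P > 0]
    return {
        "contrast": float(((i - j) ** 2 * P).sum()),      # same formula is often called inertia
        "correlation": float(cov / (sd_i * sd_j)) if sd_i * sd_j > 0 else float("nan"),
        "energy": float((P ** 2).sum()),                  # angular second moment
        "entropy": float(-(nz * np.log2(nz)).sum()),
        "homogeneity": float((P / (1.0 + np.abs(i - j))).sum()),
        "inverse_difference_moment": float((P / (1.0 + (i - j) ** 2)).sum()),
    }

def minmax_normalize(X):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

# Usage: horizontal neighbours in a checkerboard always differ, so correlation is -1.
checkerboard = np.indices((4, 4)).sum(axis=0) % 2
print(haralick_features(glcm(checkerboard, levels=2)))
```

In a cross-validated setting, the minimum and maximum used by `minmax_normalize` would normally be taken from the training data only.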

2.3. Molecular subtype classification

We performed three binary classification tasks for subtype prediction: A) triple-negative vs. non-triple-negative, B) HER2-enriched vs. non-HER2-enriched, and C) luminal (A and B) vs. non-luminal. The Naive Bayes machine learning scheme was employed for classification (19). Because some of the 39 radiomics features may be correlated, we used the least absolute shrinkage and selection operator (LASSO) feature selection method (20) to identify the top-ranked, most predictive features before the classification experiments. LASSO is a regression method that improves the prediction accuracy and interpretability of a statistical model by modifying the fitting process so that only a subset of the provided variables is retained in the final model. It combines variable selection and regularization: an L1 penalty on the coefficients shrinks the coefficients of less useful variables to exactly zero, leaving the subset of variables that minimizes the prediction error of the outcome. Statistical significance of a feature was assessed by the Kruskal-Wallis test. Because the sample numbers are unbalanced for certain subtypes (e.g., 29 triple-negative vs. 153 non-triple-negative images), we adopted the synthetic minority oversampling technique (SMOTE) to balance them. SMOTE has been used in many previous studies to address similar data imbalance problems in image classification (21–23), and we followed its standard procedure. We used 10-fold cross-validation repeated ten times to calculate the average classification performance, which was evaluated by the area under the receiver operating characteristic curve (AUC) and accuracy. Finally, to evaluate the respective contributions of the MLO and CC view images to subtype classification, we conducted and compared classification experiments on the MLO view images alone, the CC view images alone, and their combination.
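The selection-and-classification pipeline described above can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions, not the authors' code. LASSO is solved by plain coordinate descent with a hand-picked penalty `alpha` (the text does not report how the penalty was chosen). Naive Bayes is assumed to be the Gaussian variant. SMOTE uses k = 5 neighbours and is applied to the training folds only. Accuracy uses a 0.5 probability threshold. All function names and the synthetic data in the usage lines are hypothetical.

```python
import numpy as np

def lasso_select(X, y, alpha=0.05, n_iter=200):
    """Indices of features with non-zero LASSO weights. Coordinate descent on
    standardized X and centered y minimizes (1/2n)||y - Xw||^2 + alpha*||w||_1."""
    sd = X.std(0)
    X = (X - X.mean(0)) / np.where(sd > 0, sd, 1.0)
    r = y - y.mean()                      # residual while all weights are zero
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(0) / n
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0:
                continue
            r += X[:, k] * w[k]           # partial residual without feature k
            rho = X[:, k] @ r / n
            w[k] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[k]  # soft-threshold
            r -= X[:, k] * w[k]
    return np.flatnonzero(w)

def smote(X_min, n_new, k=5, rng=None):
    """Synthetic minority samples interpolated towards one of the k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)           # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), n_new)
    neigh = nn[base, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))          # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

class GaussianNaiveBayes:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(0) for c in self.classes])
        self.var = np.array([X[y == c].var(0) for c in self.classes]) + 1e-9
        self.log_prior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict_proba(self, X):
        diff = X[:, None, :] - self.mu[None]
        jll = -0.5 * (np.log(2 * np.pi * self.var)[None] + diff ** 2 / self.var[None]).sum(-1)
        jll = jll + self.log_prior
        prob = np.exp(jll - jll.max(1, keepdims=True))   # numerically stable softmax
        return prob / prob.sum(1, keepdims=True)

def auc(y_true, score):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 1/2)."""
    pos, neg = score[y_true == 1], score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def repeated_cv(X, y, n_splits=10, n_repeats=10, seed=0):
    """Mean AUC and accuracy of SMOTE + Gaussian NB over repeated stratified k-fold CV."""
    rng = np.random.default_rng(seed)
    aucs, accs = [], []
    for _ in range(n_repeats):
        fold = np.empty(len(y), dtype=int)
        for c in (0, 1):                  # stratify: deal each class round-robin into folds
            idx = rng.permutation(np.flatnonzero(y == c))
            fold[idx] = np.arange(len(idx)) % n_splits
        for f in range(n_splits):
            Xtr, ytr, test = X[fold != f], y[fold != f], fold == f
            n_pos, n_neg = int((ytr == 1).sum()), int((ytr == 0).sum())
            minority = 1 if n_pos < n_neg else 0
            extra = smote(Xtr[ytr == minority], abs(n_pos - n_neg), rng=rng)  # training fold only
            Xtr = np.vstack([Xtr, extra])
            ytr = np.concatenate([ytr, np.full(len(extra), minority)])
            prob = GaussianNaiveBayes().fit(Xtr, ytr).predict_proba(X[test])[:, 1]
            aucs.append(auc(y[test], prob))
            accs.append(np.mean((prob >= 0.5) == y[test]))
    return float(np.mean(aucs)), float(np.mean(accs))

# Usage on synthetic data (hypothetical): 120 cases x 39 features in [0, 1],
# 24 positives whose first three features are shifted upward.
rng = np.random.default_rng(1)
X = rng.random((120, 39))
y = np.r_[np.ones(24, dtype=int), np.zeros(96, dtype=int)]
X[y == 1, :3] += 0.5
keep = lasso_select(X, y, alpha=0.05)
mean_auc, mean_acc = repeated_cv(X[:, keep], y)
print(keep, round(mean_auc, 3), round(mean_acc, 3))
```

Like the text, this sketch selects features once before cross-validation. Moving `lasso_select` inside the fold loop is a stricter variant that keeps the held-out folds out of feature selection.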

3. Results

Table 1 shows the characteristics of the study cohort and lesions. There was no statistically significant difference in age among the three subtype groups (all p>0.05). The lesions were almost evenly divided between masses (49.5%) and non-mass lesions (49.5%, including architectural distortion and asymmetrical density), with only 2 lesions presenting as microcalcifications alone. Of the 331 cancers, 305 (92.1%) were diagnosed as invasive ductal carcinoma of no special type and 26 (7.9%) as invasive carcinoma of special type (10 mucinous, 9 lobular, and 7 papillary carcinomas).

Table 1.

Characteristics of the study cohort and lesions.

Characteristics | Triple-negative | HER2-enriched | Luminal (A+B)
No. of patients | 29 | 45 | 108
Age (years), mean (range) | 52 (30–82) | 51 (31–72) | 53 (29–78)
Mass with microcalcifications | 1 (3.4%) | 2 (4.4%) | 6 (5.6%)
Mass without microcalcifications | 24 (82.8%) | 13 (28.9%) | 44 (40.7%)
Non-mass* with microcalcifications | 0 (0%) | 19 (42.2%) | 20 (18.5%)
Non-mass* without microcalcifications | 4 (13.8%) | 10 (22.2%) | 37 (34.3%)
Microcalcifications only | 0 (0%) | 1 (2.2%) | 1 (0.9%)
* Non-mass lesions include architectural distortion and asymmetrical density.

Table 2 compares the overall classification performance. Across all experiments, AUC ranged from 0.695 to 0.865 and accuracy from 0.634 to 0.796. In terms of AUC, the MLO view outperformed the CC view in all three binary classifications, and combining the CC and MLO views further increased AUC except for the luminal vs. non-luminal classification. In terms of accuracy, there was no consistent trend between the CC and MLO views across the three classifications; however, the combination of the two views outperformed either view alone in every task. Overall, the combination of the CC and MLO views yielded the best classification performance.

Table 2.

Overall classification performance (format: AUC / accuracy) of the three binary-classifications of breast cancer subtypes.

View(s) | Triple-negative vs. non-triple-negative | HER2-enriched vs. non-HER2-enriched | Luminal vs. non-luminal
CC view | 0.695 / 0.634 | 0.741 / 0.725 | 0.755 / 0.785
MLO view | 0.853 / 0.772 | 0.756 / 0.702 | 0.791 / 0.782
CC and MLO | 0.865 / 0.796 | 0.784 / 0.748 | 0.752 / 0.788

The LASSO method selected 11, 10, and 12 top-ranked features for the CC view, the MLO view, and their combination, respectively. Figure 1 shows boxplots of the 12 features selected from the CC and MLO view images when the two views were combined for classification: perimeter (CC), concavity (CC), correlation (CC), inverse difference moment (CC), roundness (MLO), concavity (MLO), 10th Fourier coefficient (MLO), 24th Fourier coefficient (MLO), gray mean (MLO), correlation (MLO), energy (MLO), and inverse difference moment (MLO). Three features (concavity, correlation, and inverse difference moment) were selected in both the CC and MLO views. Of the 12 features, four differed significantly among the subtypes (p<0.05): concavity (p=0.027), correlation (p=0.0015), roundness (p=0.00016), and gray mean (p=0.026). Specifically, concavity values of triple-negative lesions tended to be smaller than those of HER2-enriched and luminal lesions (Figure 1f and 1b); correlation values of luminal lesions tended to be larger than those of the other subtypes (Figure 1c); roundness values of triple-negative lesions were significantly larger than those of HER2-enriched and luminal lesions; and gray mean values of triple-negative lesions tended to be larger than those of the other subtypes (Figure 1i).
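The p-values above come from the Kruskal-Wallis test across the three subtype groups. The minimal pure-Python sketch below is not the authors' statistics software. It ranks all values jointly, averages ranks over ties, and applies the usual tie correction. It then relies on the fact that, with three groups, H is referred to a chi-square distribution with 2 degrees of freedom, whose survival function is exactly exp(-H/2). The function name `kruskal_wallis_3` is hypothetical.

```python
import math

def kruskal_wallis_3(*groups):
    """Kruskal-Wallis H test (tie-corrected) for exactly three groups.
    With 3 groups, H is referred to a chi-square distribution with 2 degrees
    of freedom, whose survival function is exactly exp(-H / 2)."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    values = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(values)
    ranks, tie_term, i = [0.0] * n, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1][0] == values[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1    # average 1-based rank of the tied block
        tie_term += (j - i + 1) ** 3 - (j - i + 1)
        i = j + 1
    rank_sums = [0.0] * 3
    for (_, g), r in zip(values, ranks):
        rank_sums[g] += r
    h = 12 / (n * (n + 1)) * sum(s ** 2 / len(grp) for s, grp in zip(rank_sums, groups)) - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)      # tie correction
    return h, math.exp(-h / 2)

# Usage with toy values standing in for one feature in the three subtype groups:
h, p = kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 3), round(p, 4))           # 7.2 0.0273
```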

Figure 1.

The top 12 ranked radiomic imaging features selected from the CC and MLO view images. Four of them (roundness, concavity, gray mean, and correlation) differ significantly among the subtypes.

4. Discussion

In this study, we employed a radiomics approach to investigate the potential association between breast cancer molecular subtypes and quantitative imaging features extracted from digital mammograms. Our results on the three binary classifications of subtypes (triple-negative vs. other types, HER2-enriched vs. other types, and luminal vs. other types) showed that a set of such quantitative radiomic features is predictive of the molecular subtypes of breast cancer.

While several previous studies have shown that radiomics features derived from breast MRI are associated with the subtypes (24–28), our study complements them by asking whether features extracted from digital mammograms show a similar association. Although mammography differs from MRI in the imaging traits it can capture, our study demonstrated the value of mammograms, which capture only morphological/anatomical properties of the breast, for assessing breast cancer subtypes from image-derived features.

Several previous studies have shown correlations between breast cancer subtypes and certain qualitative mammographic characteristics or reading experience. For example, triple-negative cancers may be more likely to manifest as an ill-defined mass, whereas non-triple-negative cancers more often present as a spiculated mass; HER2-enriched cancers are often characterized by architectural distortion, and luminal lesions more often appear oval on mammograms. Our findings are in line with these previous studies, and we further showed a relationship between the molecular subtypes and quantitative radiomics features extracted automatically from digital mammograms.

In our study, four features were identified as the most significant predictors of the subtypes: roundness, concavity, gray mean, and correlation. Roundness describes a lesion’s shape, while concavity reflects the irregularity of the lesion boundary. Gray mean represents the average brightness of a lesion. Correlation, a GLCM texture feature, measures the linear dependence between the gray levels of neighboring pixels; a larger value indicates a smoother, more gradual intensity pattern. Based on these interpretations, our results indicate that triple-negative breast cancers may be rounder, more regular, and brighter than other subtypes, and that luminal lesions may be smoother than HER2-enriched and triple-negative breast cancers. These findings, however, need further evaluation in larger studies. We also noted that some of the features identified here differ from those reported in previous breast MRI-based studies (24). This may be explained by the intrinsic differences between 2D mammograms and 3D DCE-MRI, which merit further analysis and comparison of the two modalities in future work. Because only a very small portion of the patients in our cohort had breast MRI available, we could not perform a comparative study of the two modalities for radiomics-based prediction of breast cancer subtypes.
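The interpretations of roundness and concavity above can be illustrated on two synthetic outlines: a near-circle (360 vertices) and a five-pointed star. The paper does not give its formulas, so this pure-Python sketch assumes common definitions. Roundness is taken as 4πA/P² for area A and perimeter P, which equals 1 for a circle. Concavity is taken as 1 - A/A_hull, the share of the convex hull not covered by the outline. The lesion is represented as a polygon of contour vertices, as a manually drawn outline would be. All helper names are hypothetical.

```python
import math

def polygon_area(pts):
    """Shoelace formula for a closed polygon given as a list of (x, y) vertices."""
    nxt = pts[1:] + pts[:1]
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, nxt))) / 2

def perimeter(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:] + pts[:1]))

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def roundness(pts):
    """4*pi*A / P**2: 1.0 for a circle, smaller for elongated or jagged outlines."""
    return 4 * math.pi * polygon_area(pts) / perimeter(pts) ** 2

def concavity(pts):
    """Share of the convex hull not covered by the outline: 0 for a convex shape."""
    return 1 - polygon_area(pts) / polygon_area(convex_hull(pts))

def outline(n, radius):
    """Hypothetical test shape: n vertices at angles 2*pi*k/n with radius radius(k)."""
    return [(radius(k) * math.cos(2 * math.pi * k / n),
             radius(k) * math.sin(2 * math.pi * k / n)) for k in range(n)]

circle = outline(360, lambda k: 1.0)
star = outline(10, lambda k: 1.0 if k % 2 == 0 else 0.4)   # jagged, spiculated-looking outline
print(round(roundness(circle), 3), round(roundness(star), 3))
print(round(concavity(star), 3))
```

Under these definitions the star scores lower roundness and higher concavity than the circle. This matches the reading that a rounder, more regular outline, as suggested for triple-negative lesions, has high roundness and low concavity.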

We compared the effects of using the MLO view, the CC view, and their combination in this subtype classification task. The finding that combining the two views generally achieved higher performance (the exception being AUC for luminal vs. non-luminal, where the MLO view alone scored higher) may be because the two views together provide more information than either view alone.
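The Results list the selected features with a view tag, such as "concavity (CC)" and "roundness (MLO)". This is consistent with combining the views at the feature level, although the text does not describe the fusion step. Under that assumption, combining places each patient's CC and MLO feature vectors side by side, so the candidate pool for LASSO grows from 39 to 78 features. The sketch below uses hypothetical names and placeholder zeros/ones rather than study data.

```python
import numpy as np

def combine_views(cc, mlo, names):
    """Concatenate per-patient CC and MLO feature matrices (patients x features)
    and tag each resulting column with its view."""
    cc, mlo = np.asarray(cc, dtype=float), np.asarray(mlo, dtype=float)
    if cc.shape != mlo.shape or cc.shape[1] != len(names):
        raise ValueError("expected matching CC and MLO matrices with one column per name")
    columns = [f"{n} (CC)" for n in names] + [f"{n} (MLO)" for n in names]
    return np.hstack([cc, mlo]), columns

names = ["roundness", "concavity", "gray mean", "correlation"]   # 4 of the 39 features
cc, mlo = np.zeros((2, 4)), np.ones((2, 4))                      # placeholders for 2 patients
X, columns = combine_views(cc, mlo, names)
print(X.shape, columns[1], columns[5])    # (2, 8) concavity (CC) concavity (MLO)
```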

Our study has several limitations. First, this was a retrospective analysis of single-vendor images acquired at a single institution. It will be critical to evaluate whether our findings generalize to images from other vendors and to external data; a future multi-center study may help address this question. Second, we used clinical immunohistochemical surrogate markers to categorize breast cancer subtypes. This is the routine approach in most such studies, because genetic assays for defining the subtypes are not available in standard clinical practice. Third, as noted above, we were not able to compare the performance of mammograms and DCE-MRI in this cohort because of the lack of MRI data; this will be an important follow-up study. Fourth, we did not separate the luminal A and luminal B subtypes in our classification, mainly because they are similar and differ only slightly in clinical management. Finally, it will be worthwhile to test and compare other machine learning techniques for subtype classification, which we plan to do in future work.

Mammography is the most widely available examination for breast cancer screening and diagnosis. If automated radiomics features such as those identified in this study are validated as predictive of molecular subtypes, they could provide additional information from the images to aid radiologists in mammographic reading and to better inform clinical diagnosis and decision-making. This would be especially valuable for patients who do not have access to breast MRI.

In summary, this pilot radiomics study showed that quantitative imaging features extracted from digital mammograms are associated with breast cancer subtypes. Larger future studies are needed to further evaluate these findings and to examine their relationship with features identified on breast MRI.

Acknowledgements

This work was supported by the Key Project of Tianjin Science and Technology Committee Foundation grant (12ZCDZSY16000) and the Tianjin Municipal Government of China (15JCQNJC14500). This work was also partially supported by a National Institutes of Health (NIH)/National Cancer Institute (NCI) R01 grant (#1R01CA193603) and an R01 supplement grant (#3R01CA193603–03S1).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Declarations of interest: None

References

  • 1. Goldhirsch A, Wood WC, Coates AS, et al. Strategies for subtypes—dealing with the diversity of breast cancer: highlights of the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2011. Ann Oncol 2013; 18(9):1133–1144.
  • 2. Lam SW, Jimenez CR, Boven E. Breast cancer classification by proteomic technologies: current state of knowledge. Cancer Treat Rev 2014; 40(1):129–138.
  • 3. Huber KE, Carey LA, Wazer DE. Breast cancer molecular subtypes in patients with locally advanced disease: impact on prognosis, patterns of recurrence, and response to therapy. Semin Radiat Oncol 2009; 19(4):204–210.
  • 4. Lambin P, Riosvelazquez E, Leijenaar R, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer 2012; 48(4):441–446.
  • 5. Çelebi F, Pilanci KN, Ordu Çetin, et al. The role of ultrasonographic findings to predict molecular subtype, histologic grade, and hormone receptor status of breast cancer. Diagn Interv Radiol 2015; 21(6):448–453.
  • 6. Uematsu T, Kasami M, Yuen S. Triple-negative breast cancer: correlation between MR imaging and pathologic findings. Radiology 2009; 250(3):638–647.
  • 7. Wu M, Jie M. Association between imaging characteristics and different molecular subtypes of breast cancer. Acad Radiol 2016; 24(4).
  • 8. Luck AA, Evans AJ, James JJ, et al. Breast carcinoma with basal phenotype: mammographic findings. AJR Am J Roentgenol 2008; 191(2):346–351.
  • 9. Kumar V, Gu Y, Basu S, et al. Radiomics: the process and the challenges. Magn Reson Imaging 2012; 30(9):1234–1248.
  • 10. Limkin EJ, Sun R, Dercle L, et al. Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology. Ann Oncol 2017; 28(6):1191–1206.
  • 11. Li H, Zhu Y, Burnside ES, et al. Quantitative MRI radiomics in the prediction of molecular classifications of breast cancer subtypes in the TCGA/TCIA data set. NPJ Breast Cancer 2016; 2:16012.
  • 12. Wu J, Sun X, Wang J, et al. Identifying relations between imaging phenotypes and molecular subtypes of breast cancer: model discovery and external validation. J Magn Reson Imaging 2017; 46(4):1017–1027.
  • 13. Sutton EJ, Dashevsky BZ, Oh JH, et al. Breast cancer molecular subtype classifier that incorporates MRI features. J Magn Reson Imaging 2016; 44(1):122–129.
  • 14. Tang J, Rangayyan RM, Xu J, et al. Computer-aided detection and diagnosis of breast cancer with mammography: recent advances. IEEE Trans Inf Technol Biomed 2009; 13(2):236–251.
  • 15. Baltzer PAT, Dietzel M, Kaiser WA. A simple and robust classification tree for differentiation between benign and malignant lesions in MR-mammography. Eur Radiol 2013; 23(8):2051–2060.
  • 16. Mu T, Nandi AK, Rangayyan RM. Classification of breast masses using selected shape, edge-sharpness, and texture features with linear and kernel-based classifiers. J Digit Imaging 2008; 21(2):153–169.
  • 17. Karsten R, Ewert B. A feature set for cytometry on digitized microscopic images. Anal Cell Pathol 2003; 25(1):1–36.
  • 18. Haralick R, Shanmugam K, Dinstein I. Texture parameters for image classification. IEEE Trans Syst Man Cybern 1973; 3:610–621.
  • 19. Peter H. Machine learning in action. Shelter Island, NY: Manning Publications, 2012.
  • 20. Tibshirani R. The lasso method for variable selection in the Cox model. Stat Med 1997; 16(4):385–395.
  • 21. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002; 16(1):321–357.
  • 22. Emaminejad N, Wang Y, Qian W, et al. Applying a radiomics approach to predict prognosis of lung cancer patients. In: Medical Imaging 2016: Computer-Aided Diagnosis, 2016: 97851E.
  • 23. Maciejewski T, Stefanowski J. Local neighbourhood extension of SMOTE for mining imbalanced data. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (IEEE, Paris, France, 2011), pp. 104–111.
  • 24. Li H, Zhu Y, Burnside ES, et al. Quantitative MRI radiomics in the prediction of molecular classifications of breast cancer subtypes in the TCGA/TCIA data set. NPJ Breast Cancer 2016; 2:16012.
  • 25. Mazurowski MA, Zhang J, Grimm LJ, et al. Radiogenomic analysis of breast cancer: luminal B molecular subtype is associated with enhancement dynamics at MR imaging. Radiology 2014; 273(2):365–372.
  • 26. Sutton EJ, Dashevsky BZ, Oh JH, et al. Breast cancer molecular subtype classifier that incorporates MRI features. J Magn Reson Imaging 2016; 44(1):122–129.
  • 27. Chang RF, Chen HH, Chang YC, et al. Quantification of breast tumor heterogeneity for ER status, HER2 status, and TN molecular subtype evaluation on DCE-MRI. Magn Reson Imaging 2016; 34(6):809–819.
  • 28. Fan M, Li H, Wang S, et al. Radiomic analysis reveals DCE-MRI features for prediction of molecular subtypes of breast cancer. PLoS One 2017; 12(2):e0171683.
