Abstract
Breast cancer is molecularly heterogeneous and is categorized into four molecular subtypes: Luminal-A, Luminal-B, HER2-amplified and Triple-negative. In this study, we aimed to apply an ensemble decision approach to identify the ultrasound and clinical features related to the molecular subtypes. We collected ultrasound and clinical features from 1,000 breast cancer patients and performed immunohistochemistry on these samples. We used the ensemble decision approach to select unique features and to construct decision models. The decision model for the Luminal-A subtype was based on the presence of an echogenic halo and post-acoustic shadowing or indifference. The decision model for the Luminal-B subtype was based on the absence of an echogenic halo and the presence of vascularity. The decision model for the HER2-amplified subtype was based on post-acoustic enhancement, calcification, vascularity and advanced age. The model for the Triple-negative subtype followed two rules: one was based on an irregular shape, a lobulate margin contour, the absence of calcification and hypovascularity, whereas the other was based on an oval shape, hypovascularity and a micro-lobulate margin contour. The accuracies of the four models were 83.8%, 77.4%, 87.9% and 92.7%, respectively. We identified specific features of each molecular subtype and, using these decision models, expanded the diagnostic scope of ultrasound.
Breast cancer is one of the major causes of death among women worldwide, and its incidence has been increasing1. The disease follows a diverse natural history and responds variably to treatment2. The limitations of traditional histological classification have led to the development of a molecular classification, which has demonstrated the existence of four main subtypes2: Luminal-A (LA), Luminal-B (LB), human epidermal growth factor receptor 2-amplified (HER2) and Triple-Negative (TN). The St. Gallen International Expert Consensus3 recommended that LA tumours with high Ki-67 expression (Ki-67 ≥ 14%) be classified as the LB subtype. Clinically, LA is the most common subtype, and several genomic tests are available to help predict patient outcomes under endocrine therapy. LB patients can benefit from neoadjuvant chemotherapy. HER2 patients generally show excellent clinical outcomes when treated with trastuzumab, an effective therapeutic that targets the HER2 receptor. For the TN subtype, chemotherapy is the only treatment option4,5,6.
Although pathological diagnosis is the “gold standard” for distinguishing the molecular subtypes of breast cancer, it is invasive and might cause physical and psychological discomfort in patients. A non-invasive method would therefore significantly improve the diagnostic procedure. The use of ultrasound for the diagnosis of breast lesions has increased over the past decade7. Because it is safe and inexpensive, ultrasound is becoming the preferred method for both physicians and patients. Reports have indicated that improvements in ultrasound technology might make it possible to differentiate malignant from benign solid breast masses with high sensitivity based on their ultrasound features8. A few studies have examined the correlation between ultrasound features and particular biological behaviours. Irshad et al. found that posterior shadowing is strongly associated with estrogen receptor-positive (ER+) and low-grade tumours, whereas posterior enhancement is strongly associated with high-grade tumours and a moderate likelihood of receptor negativity9. Wang et al. demonstrated that, compared with ER-negative HER2-negative tumours, ER-negative HER2-positive tumours were more likely to have spiculated margins with calcification and a higher cancer stage5. Ko et al. suggested that TN breast cancers more often have circumscribed margins, are hypoechoic, and exhibit less calcification and posterior shadowing10. These studies suggest that ultrasound imaging features may reflect molecular status, including hormone receptor and HER2 status and the TN phenotype. However, although the ultrasound features of breast cancer might correlate with the molecular subtypes identified by immunohistochemistry (IHC), characterizing the four molecular subtypes by ultrasound imaging and clinical features might be complex11.
As such, identifying the subtypes might require the assessment of a combination of characteristics, similar to those used for differentiating benign and malignant tumours.
To accurately detect the distinguishing features of breast cancer molecular subtypes, efficient statistical methods and computational algorithms are needed to analyse the large amount of available clinical data. Decision trees are among the most popular classification techniques for multi-feature data in data mining and machine learning12, and they can be converted to rule sets that are easy to interpret. However, previous attempts to classify gene expression data with decision trees have shown that single-tree algorithms do not provide sufficient accuracy and stability. In this study, we propose an ensemble decision approach that integrates multiple decision trees, based on ensemble decision theory, to select the characteristic features of each subtype13. We obtained multiple feature sets from the training set by resampling and integrated them into a combination of features for each subtype. The resulting models achieved high accuracy, suggesting that the ensemble decision approach might have significant utility for the ultrasound diagnosis of breast cancer.
Results
The general description and feature distribution of the four subtypes
Among the four breast cancer subtypes, the proportions of LA, LB, HER2 and TN were 37.8%, 36.8%, 12.5% and 12.9%, respectively (Fig. 1a). Figure 1b–e shows representative ultrasound images of the four molecular subtypes, which differ visibly. The frequency of each ultrasound feature in each subtype is summarized in Fig. 2.
The ensemble decision models of breast cancer subtypes
We constructed the models using the ensemble decision approach. We randomly selected 80% of the data from each sample category (LA, LB, HER2 and TN) to construct the training set, corresponding to 256, 249, 84 and 87 patients, for a total of 676 patients. The remaining 20% of the data were used as the test set, which comprised 170 patients.
Identifying LA breast cancer based on ultrasound features
From the training set, we randomly selected 80% of the LA data and 80% of the nLA data (the LB, HER2 and TN categories), for a total of 541 patients, and used these data to construct a decision tree. We repeated this step 1,000 times to build 1,000 decision trees. We extracted the features from each decision tree and calculated their relevance intensity (FV) across the 1,000 trees (Table 1). For each feature, we obtained the empirical null distribution by randomly permuting the category labels of the patients and determined the corresponding cutoff (Table 1). We then selected the features whose FV exceeded the cutoff (one-tailed): boundary, post acoustic and Adler. To obtain stable features, we repeated the above steps 1,000 times. The three features were selected 999, 999 and 117 times, respectively. We finally used the stable, frequently selected features, boundary and post acoustic, to construct the decision model (Fig. 3a).
Table 1. The FV, cutoff and frequencies for each subtype.
Feature | LA FV | LA cutoff | LA frequency | LA final selection | LB FV | LB cutoff | LB frequency | LB final selection | HER2 FV | HER2 cutoff | HER2 frequency | HER2 final selection | TN FV | TN cutoff | TN frequency | TN final selection |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Age | 0.792 | 1.000 | 0 | no | 0.978 | 1.000 | 0 | no | 0.972 | 0.779 | 998 | yes | 0.551 | 0.771 | 0 | no |
Size | 0.722 | 1.000 | 0 | no | 0.806 | 1.000 | 0 | no | 0.756 | 0.851 | 0 | no | 0.964 | 0.774 | 885 | no |
Shape | 0 | 0.956 | 0 | no | 0.034 | 0.952 | 0 | no | 0.011 | 0.428 | 0 | no | 0.999 | 0.463 | 992 | yes |
Orientation | 0.002 | 0.962 | 0 | no | 0.218 | 0.972 | 0 | no | 0.183 | 0.451 | 0 | no | 0.013 | 0.452 | 0 | no |
Margin border | 0 | 0.945 | 0 | no | 0.252 | 0.956 | 0 | no | 0 | 0.539 | 0 | no | 0.036 | 0.399 | 996 | yes |
Margin contour | 0.566 | 0.983 | 0 | no | 0.741 | 0.985 | 0 | no | 0.117 | 0.569 | 0 | no | 0.845 | 0.452 | 462 | no |
Post. acoustic | 1 | 0.987 | 999 | yes | 0.992 | 0.987 | 397 | no | 1 | 0.532 | 999 | yes | 0.545 | 0.496 | 662 | no |
Calcification | 0.245 | 0.966 | 0 | no | 0.608 | 0.966 | 0 | no | 1 | 0.437 | 999 | yes | 0.825 | 0.421 | 993 | yes |
Boundary | 1 | 0.926 | 999 | yes | 0.999 | 0.954 | 999 | yes | 0.026 | 0.402 | 0 | no | 0.461 | 0.432 | 0 | no |
Echogenicity | 0 | 0.913 | 0 | no | 0 | 0.937 | 0 | no | 0 | 0.100 | 0 | no | 0 | 0.117 | 0 | no |
Adler | 1 | 0.988 | 117 | no | 1 | 0.993 | 999 | yes | 0.567 | 0.503 | 934 | yes | 1 | 0.662 | 999 | yes |
BI-RADS | 0.363 | 0.956 | 0 | no | 0.824 | 0.966 | 0 | no | 0 | 0.441 | 0 | no | 0.209 | 0.428 | 0 | no |
In the training set, the majority of LA patients exhibited an echogenic halo (63.7%, n = 163/256) and post-acoustic shadowing or indifference (65.6% + 25.8%, n = 234/256); shadowing was the most common post-acoustic feature, followed by indifference.
IHC was performed on the 170 patients in the test set to evaluate the model. A total of 64 patients with Estrogen Receptor (ER)- and/or Progesterone Receptor (PR)-positive, HER2-negative tumours with Ki-67 < 14% were classified as the LA subtype. Of the 170 patients, 57 had an echogenic halo at the boundary, 43 of whom had the LA subtype. In particular, 31 patients showed an echogenic halo with post-acoustic shadowing, of whom 26 had the LA subtype, and 13 patients showed an echogenic halo with post-acoustic indifference, of whom 8 had the LA subtype. In two patients whose ultrasound showed an echogenic halo (Fig. 3b1) with post-acoustic shadowing (Fig. 3b1) or post-acoustic indifference (Fig. 3b2), the LA subtype was confirmed by IHC (Fig. 3c1). The model yielded an accuracy of 83.8%, a sensitivity of 77.3% and a specificity of 87.5%.
Together, these data suggest that an echogenic halo is a significant feature of the LA subtype, and that an echogenic halo combined with post-acoustic shadowing or indifference is important for distinguishing the LA subtype.
Identifying LB breast cancer based on ultrasound features
Likewise, we identified the ultrasound features of LB breast cancer, namely boundary and Adler; the FV, cutoff and frequencies are shown in Table 1. Based on these stable features, we constructed the decision model (Fig. 4a).
In contrast to LA breast cancer, the majority of LB patients did not display an echogenic halo (86.7%, n = 215/248). The Adler degree was II or III in 205 LB patients (82.7%, 205/248) and 0 or I in 43 (17.3%, 43/248), suggesting that vascularity could be used to characterize the LB subtype.
Next, we used IHC to validate the model. A total of 63 patients were classified as the LB subtype: their tumour cells were ER- and/or PR-positive and HER2-negative with Ki-67 ≥ 14%, or ER- and/or PR-positive with HER2 overexpression and/or amplification. Of the 170 patients, 114 showed boundaries without an echogenic halo, 53 of whom had the LB subtype. In particular, 79 patients lacked an echogenic halo and showed vascularity, of whom 60 had the LB subtype. In one tumour without an echogenic halo (Fig. 4b1) and with vascularity (Fig. 4b2), the LB subtype was confirmed by IHC (Fig. 4c). The model yielded an accuracy of 77.4%, a sensitivity of 75.9% and a specificity of 78.1%.
These results show that the absence of an echogenic halo combined with vascularity could help to distinguish the LB subtype, although additional factors need to be identified to improve the accuracy of this model.
Identifying HER2-amplified breast cancer based on ultrasound features
Using a similar approach, we chose the stable, frequently selected features (age, post acoustic, calcification and Adler) to construct the decision model (Fig. 5a). The FV, cutoff and frequencies are shown in Table 1.
The training set contained 84 HER2 subtype patients, of whom 52 (61.9%) showed post-acoustic enhancement, 66 (78.6%) displayed calcification and 49 (58.3%) were older than 52. In addition, the Adler degree was II or III in 70 patients (83.3%).
A total of 22 patients had the IHC features of the HER2 subtype (ER- and PR-negative tumour cells with HER2 overexpression and/or amplification). Of the 170 patients, 9 showed post-acoustic enhancement, calcification and vascularity (Adler II or III) and were older than 52; 8 of them had the HER2 subtype. In one patient older than 52 whose ultrasound showed post-acoustic enhancement (Fig. 5b1), calcification (Fig. 5b1) and abundant vascularity (Fig. 5b2), the HER2 subtype was confirmed by IHC (Fig. 5c). Evaluated against the IHC results, the model showed an accuracy of 87.9%, a sensitivity of 20% and a specificity of 97.2%.
These data suggest that the combination of post-acoustic enhancement, calcification, age older than 52 and vascularity distinguishes the HER2 subtype with high specificity but low sensitivity.
Identifying triple-negative breast cancer based on ultrasound features
For triple-negative breast cancer, we likewise chose the stable, frequently selected features (shape, margin contour, Adler and calcification; Table 1) to construct the corresponding decision model (Fig. 6a).
In the training set of 87 TN patients, 50 had an irregular shape (57.5%) and 37 had an oval shape (42.5%). A total of 25 patients showed a smooth margin contour (28.7%), 56 had a lobulate margin contour (64.4%) and 6 had angular or spiculate margin contours (6.9%). Only 23 patients had calcification (26.4%), and 57 patients (65.5%) showed Adler degree 0 or I. Thus, TN tumours presented with either an irregular or an oval shape and with smooth or lobulate margin contours, and they tended to be hypovascular and to lack calcification.
By IHC, the TN subtype was defined by ER-, PR- and HER2-negative tumour cells, and a total of 22 patients were identified. As the model shows, TN breast cancer fell into two categories. Fourteen patients had an irregular shape, a lobulate margin contour, no calcification and hypovascularity; 10 of them had the TN subtype. Four patients had an oval shape, hypovascularity and a lobulate margin contour, and all of them had the TN subtype. Furthermore, the lobulate margin contours were primarily micro-lobulate. Two patients, one with a type 1 tumour (irregular shape, lobulate margin contour, no calcification and hypovascularity; Fig. 6b1) and one with a type 2 tumour (oval shape, lobulate margin contour and hypovascularity; Fig. 6b2), were confirmed as the TN subtype by IHC (Fig. 6c1). Evaluated against the IHC results, the model yielded an accuracy of 92.7%, a sensitivity of 63.2% and a specificity of 98.1%.
Together, these results indicate that the combination of an irregular shape, a lobulate margin contour, absence of calcification and hypovascularity, or the combination of an oval shape, hypovascularity and a lobulate margin contour, distinguishes the TN subtype with high accuracy and specificity but low sensitivity.
Discussion
One challenge in breast cancer ultrasound research has been the development of a reliable decision rule for classifying patients into molecular subtypes. To address this, we proposed a new method, the ensemble decision approach, which not only identified the unique features of each molecular subtype but also generated models for distinguishing the subtypes. The models achieved high accuracy on the test set. These results show that no single feature could identify a molecular subtype; instead, combinations of features were valuable for molecular subtype diagnosis.
Echogenic halo and post-acoustic shadowing characterized the LA subtype. Previous studies have examined the relationship between ultrasound features and the pathological characteristics of breast cancer. It has been suggested that acoustic shadowing may arise from desmoplastic reactions, which are more common in low-grade tumours, and that it is caused by greater sound reflection or attenuation by the tumour than by the surrounding tissue. LA tumours have been confirmed to be mostly low grade14. Therefore, the LA subtype was mostly associated with post-acoustic shadowing9. The echogenic halo corresponds histopathologically to tumour cells invading fat tissue admixed with adipocytes and elastic fibers15, and it is thought to occur in low-grade and spiculated tumours8. Because the LA subtype is mostly low grade14 and ER-positive breast cancers account for the majority of tumours exhibiting an echogenic halo9, most LA cancers also exhibited an echogenic halo. Our findings are similar to those of Ko et al., who reported that ER-positive/PR-negative/HER2-negative breast cancers exhibited an echogenic halo more frequently than triple-negative and HER2 cancers10.
The absence of an echogenic halo and the presence of vascularity were characteristic of the LB subtype. The LB subtype is associated with an increased risk of relapse and is mostly high grade16, which may explain the lack of an echogenic halo. Previous studies have shown that HER2 overexpression is closely associated with increased angiogenesis and with the expression of vascular endothelial growth factor (VEGF), which mediates endothelial cell signaling and other functions17. There is also preclinical evidence supporting a role for angiogenesis in mediating downstream HER2 signaling18. Consistent with this, in the present study LB tumours overexpressing HER2 demonstrated vascularity in ultrasound images.
Post-acoustic enhancement, calcification, older age and vascularity were characteristic of the HER2 subtype. In contrast to the Luminal-A subtype, tumours with post-acoustic enhancement, which results from reduced attenuation of the ultrasound waves relative to the surrounding tissue, have been found to be more cellular and to tend to be high grade19. The HER2 subtype mainly includes high-grade tumours with adverse prognoses20, which is consistent with the post-acoustic enhancement observed here and with previous findings11. Studies by Sung et al. showed that HER2 oncogene expression was strongly correlated with the presence of calcification on ultrasound21,22, a result that has been reported by others23 and was also observed in this study. Overexpression of the HER2 oncogene in the HER2 subtype might therefore account for the frequent observation of calcification. We found that older age was characteristic of the HER2 subtype, which contrasts with results from other groups16; this discrepancy might reflect the sequential structure of the decision model. In accordance with previous studies, HER2 tumours appeared vascular in ultrasound images.
Triple-negative breast cancer is associated with aggressive histological features, unresponsiveness to standard endocrine treatment, a poor prognosis and shorter survival24,25,26. Because a TN breast cancer mass can appear benign, discriminating TN breast cancer is of critical importance. Ko et al. suggested that TN breast cancers more often have circumscribed margins, are hypoechoic and exhibit less calcification or posterior shadowing10. Dogan et al. found that TN breast cancers presented as masses without calcification, 32% of which had circumscribed margins27. On ultrasound, Wang et al. concluded that TN breast cancers (n = 20) were more likely to lack calcification and to present as hypoechoic (80%) masses with an irregular (54%) or lobulated (20%) shape and with distinct (40%), microlobulated (33%), or smooth or circumscribed (27%) margins5. Ko et al. found that TN breast cancers were likely to be irregular (83%) or oval (16%), with circumscribed (57%), angular (16%), indistinct (12%), microlobulated (9%) or spiculated (5%) margins10. Overall, our results showed two categories of imaging features for TN breast cancer: an irregular shape with lobulate margins, and an oval shape with micro-lobulate margins. Both categories lacked vascularity. These findings are broadly similar to the observations of the above studies. Wojcinski et al. described this smooth appearance as a pushing border associated with a non-infiltrative process caused by rapid tumour growth28.
The decision models have potential applications in clinical diagnosis. For an unknown breast mass, we extracted 12 ultrasound and clinical features (Table 2) and input them into the four decision models, which contain 18 rules in total (Supplemental Information). The predictions of the four models give the category of the mass, from which the subtype is identified. For example, assume that the twelve features of a patient were: age, 48 years; size, 23 cm; shape, irregular; orientation, parallel; margin border, indistinct; margin contour, angular/spiculate; post-acoustic, shadowing; calcification, absent; boundary, echogenic halo; echogenicity, hyper- or isoechoic; Adler, I; BI-RADS, V. The four decision models would then predict “YES” only for the LA subtype model and “NO” for the other three models (i.e., LB, HER2 and TN), so the patient would be classified as the LA subtype (for details see the Supplemental Information).
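The way the four models are applied to one patient can be illustrated with a short Python sketch. It is a simplification under stated assumptions: each model is reduced to the "YES" rule(s) stated in the Results (the full set of 18 rules in the Supplemental Information is not reproduced), and the dictionary keys and category strings are hypothetical encodings of the Table 2 categories. Applied to the example patient above, only the LA model returns True.

```python
# Simplified sketch: only the "YES" rules stated in the Results are encoded.
# Keys and category strings are hypothetical encodings of Table 2.
VASCULAR = {"II", "III"}    # Adler grades read as "vascularity"
HYPOVASCULAR = {"0", "I"}   # Adler grades read as "hypovascularity"


def is_luminal_a(p):
    # Echogenic halo plus post-acoustic shadowing or indifference.
    return p["boundary"] == "echogenic halo" and p["post_acoustic"] in {"shadowing", "indifferent"}


def is_luminal_b(p):
    # No echogenic halo (abrupt interface) plus vascularity.
    return p["boundary"] == "abrupt interface" and p["adler"] in VASCULAR


def is_her2(p):
    # Post-acoustic enhancement, calcification, vascularity and age older than 52.
    return (p["post_acoustic"] == "enhancement" and p["calcification"] == "present"
            and p["adler"] in VASCULAR and p["age"] > 52)


def is_triple_negative(p):
    # Rule 1: irregular shape, lobulate contour, no calcification, hypovascular.
    rule1 = (p["shape"] == "irregular" and p["margin_contour"] == "lobulate"
             and p["calcification"] == "absent" and p["adler"] in HYPOVASCULAR)
    # Rule 2: oval shape, lobulate (mainly micro-lobulate) contour, hypovascular.
    rule2 = (p["shape"] == "oval" and p["margin_contour"] == "lobulate"
             and p["adler"] in HYPOVASCULAR)
    return rule1 or rule2


MODELS = {"LA": is_luminal_a, "LB": is_luminal_b, "HER2": is_her2, "TN": is_triple_negative}


def predict(patient):
    """Run the four independent models and return {subtype: True/False}."""
    return {name: model(patient) for name, model in MODELS.items()}


# The example patient from the text (only the features the rules use).
example = {
    "age": 48, "shape": "irregular", "margin_contour": "angular/spiculate",
    "post_acoustic": "shadowing", "calcification": "absent",
    "boundary": "echogenic halo", "adler": "I",
}
print(predict(example))  # {'LA': True, 'LB': False, 'HER2': False, 'TN': False}
```

Because each model is evaluated independently, a mass can in principle satisfy more than one model; the text resolves the subtype when exactly one model answers "YES".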
Table 2. Definitions of the ultrasound feature criteria.
Variable | Category | Definition |
---|---|---|
Size | (continuous) | Maximum diameter of the tumour measured by ultrasound |
Shape | Oval, round | Oval, spherical or round |
Shape | Irregular | Neither round nor oval |
Orientation | Parallel | Long axis of the lesion parallel to the skin line |
Orientation | Not parallel | Long axis not oriented along the skin line |
Margin border | Circumscribed | Well-defined or sharp margin with an abrupt transition |
Margin border | Indistinct | No clear demarcation between the mass and its surrounding tissue |
Margin contour | Smooth | Smooth, even margin without any irregularity |
Margin contour | Lobulate | Short-cycle undulations impart a scalloped appearance to the margin of the mass |
Margin contour | Angular, spiculate | Margin formed or characterized by sharp lines projecting from the mass |
Post. acoustic | Indifferent | No shadowing or enhancement |
Post. acoustic | Enhancement | Increased posterior echo |
Post. acoustic | Shadowing | Decreased posterior echo, including the combined pattern |
Calcification | Absent | No punctate, strongly hyperechoic foci |
Calcification | Present | Punctate, strongly hyperechoic foci |
Boundary | Abrupt interface | No thin capsule or echogenic halo |
Boundary | Echogenic halo | Blurred, irregular hyperechoic rim around the lesion |
Echogenicity | Hyper-, isoechoic | Hyper- or isoechogenicity compared with fat, e.g., fibroglandular tissue |
Echogenicity | Complex, hypoechoic | Hypoechoic compared with fat tissue |
Adler | 0 | No vascularity |
Adler | I | 1–2 spot vessels with a diameter of less than 1 mm |
Adler | II | 1–2 vessels longer than the radius of the tumour |
Adler | III | More than 4 vessels |
BI-RADS | I | No lesion found |
BI-RADS | II | Benign finding |
BI-RADS | III | Probably benign finding |
BI-RADS | IV | Suspicious abnormality |
BI-RADS | V | Highly suggestive of malignancy |
In conclusion, the ensemble decision approach is a valuable way to identify the ultrasound and clinical features of breast cancer molecular subtypes. Distinguishing molecular subtypes with ultrasound feature-based classification models improves ultrasound diagnosis and could serve as an effective auxiliary diagnostic method to guide treatment in the clinical setting.
Materials and Methods
The study protocol was approved by the Ethics Committee of Harbin Medical University (2008–0022), and written informed consent was obtained from all participants. The methods were performed in accordance with the approved guidelines. Our study sample consisted of 1,000 consecutive patients with breast cancer, initially diagnosed by breast ultrasound, who underwent surgery and biopsy at the Second Affiliated Hospital of Harbin Medical University between Jan. 22, 2009 and Jan. 20, 2014. Patients who had received neo-adjuvant therapy, who did not undergo histological examination, or who had multiple breast cancers were excluded. A total of 864 women (mean age 46.31 ± 9.79 years; range 11–67 years) with definite histological results were evaluated in this study.
Ultrasound examination
All real-time scanning was performed by a radiologist with 4 years of experience in breast ultrasound, using a HITACHI Vision 900 system (Hitachi Medical System, Tokyo, Japan) equipped with a 6–13 MHz linear probe. Static images and cine clips from B-mode and Doppler ultrasound were saved in the database for double-blind analysis. Three breast radiologists with 7, 9 and 13 years of clinical experience, respectively, retrospectively and independently reviewed the ultrasound images, and a consensus interpretation was reached in cases of disagreement. The ultrasound criteria examined are defined in Table 2. The Adler degree grades the level of blood flow as a characterization of tumour vascularity29. BI-RADS is the assessment category of the breast tumour according to the Breast Imaging Reporting and Data System (BI-RADS), a standardized ultrasound lexicon published in 2003 by the American College of Radiology (ACR)30.
Histological examination
The experiments followed the reporting recommendations for tumour marker prognostic studies (REMARK)31. All tumours were excised and stained with hematoxylin-eosin (HE). The tissues were formalin-fixed and paraffin-embedded and subsequently used for immunohistochemistry (IHC) staining with appropriate antibodies. The cutoff for ER and PR positivity was 10%. HER2 status was scored as 0, 1+, 2+ or 3+; a score of 3+ was considered positive, and scores of 0 and 1+ were considered negative. Fluorescence in situ hybridization (FISH) was performed on all 2+ samples: samples with a HER2 gene amplification ratio < 2 were regarded as negative, and those with a ratio > 2 as positive for gene amplification24,32. Ki-67 was visually scored as the percentage of tumour cell nuclei with positive immunostaining above background; Ki-67 ≥ 14% was considered high expression and < 14% low expression33.
Molecular subtypes of breast cancer
Breast cancer molecular subtypes were categorized according to the immunohistochemistry results for ER, PR, HER2 and Ki-67, as recommended by the 12th St. Gallen International Breast Cancer Conference30, as follows:
Luminal A type (LA): ER and/or PR positive, HER2 negative and Ki-67 < 14%;
Luminal B type (LB): ER and/or PR positive, HER2 negative and Ki-67 ≥ 14%; or ER and/or PR positive and HER2 overexpressed and/or amplified;
HER2 amplified type (HER2): ER and PR negative and HER2 overexpressed and/or amplified;
Triple-Negative type (TN): ER, PR and HER2 negative.
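The subtype definitions above can be written as a small Python function. This is a sketch under stated assumptions: the 10% ER/PR cutoff from the Histological examination section is applied as "≥ 10%", a FISH ratio of exactly 2, which the text does not address, is treated as negative, and the function and argument names are hypothetical.

```python
def her2_positive(ihc_score, fish_ratio=None):
    """HER2 status: IHC 3+ positive, 0/1+ negative, 2+ resolved by FISH.

    Assumption: a FISH ratio of exactly 2.0 (not covered by the text) counts as negative.
    """
    if ihc_score == 3:
        return True
    if ihc_score in (0, 1):
        return False
    if ihc_score == 2:
        if fish_ratio is None:
            raise ValueError("an IHC 2+ tumour needs a FISH ratio")
        return fish_ratio > 2.0
    raise ValueError(f"unexpected IHC score: {ihc_score}")


def molecular_subtype(er_pct, pr_pct, her2_ihc, ki67_pct, fish_ratio=None):
    """Assign LA, LB, HER2 or TN from ER/PR (% positive cells), HER2 and Ki-67 (%)."""
    hormone_positive = er_pct >= 10 or pr_pct >= 10  # 10% cutoff applied as >= (assumption)
    her2 = her2_positive(her2_ihc, fish_ratio)
    if hormone_positive:
        # Luminal tumours: HER2 positivity or Ki-67 >= 14% makes them Luminal B.
        return "LB" if her2 or ki67_pct >= 14 else "LA"
    return "HER2" if her2 else "TN"


print(molecular_subtype(er_pct=80, pr_pct=40, her2_ihc=1, ki67_pct=8))  # LA
print(molecular_subtype(er_pct=0, pr_pct=0, her2_ihc=2, ki67_pct=30, fish_ratio=3.1))  # HER2
```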
Ensemble decision approach
We proposed an ensemble decision approach based on recursive partition trees, with the following basic procedure. First, a resampling technique was used to construct a training set and a test set for learning and testing, respectively. Second, a binary tree was grown by a recursive partition algorithm on an outer set, consisting of 80% of the data randomly selected from each sample category of the training set. This step was repeated 1,000 times to obtain 1,000 trees and the FV of each feature. Third, feature selection was optimized using a method we developed.
Construction of a features matrix
The ultrasound data can be represented as an $m \times n$ matrix $X = (x_{pq})$, in which the $m$ rows correspond to patients and the $n$ columns to features, so that $x_{pq}$ is the $q$th ultrasound feature of the $p$th sample. Each sample is thus described by a feature vector $X_p = (x_{p1}, \ldots, x_{pn})$. The samples were divided into four classes, LA, LB, HER2 and TN3.
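As an illustration of the feature matrix, the Python sketch below encodes a few categorical Table 2 features as ordinal codes so that each patient becomes one row $X_p$. The code values, the chosen subset of features and the `size_mm` field are hypothetical; the source does not state how categories were encoded, and tree implementations such as R's rpart can also split unordered factors directly.

```python
# Hypothetical ordinal codes for some Table 2 categories (assumption, not from the study).
CODES = {
    "shape": {"oval": 0, "irregular": 1},
    "post_acoustic": {"indifferent": 0, "enhancement": 1, "shadowing": 2},
    "boundary": {"abrupt interface": 0, "echogenic halo": 1},
    "adler": {"0": 0, "I": 1, "II": 2, "III": 3},
}
NUMERIC = ["age", "size_mm"]        # continuous features kept as numbers
FEATURES = NUMERIC + list(CODES)    # column order q = 0 .. n-1


def build_matrix(patients):
    """Return the m x n matrix X (a list of rows) with X[p][q] = feature q of patient p."""
    matrix = []
    for patient in patients:
        row = [float(patient[name]) for name in NUMERIC]
        row += [CODES[name][patient[name]] for name in CODES]
        matrix.append(row)
    return matrix


patients = [
    {"age": 48, "size_mm": 18, "shape": "irregular", "post_acoustic": "shadowing",
     "boundary": "echogenic halo", "adler": "I"},
    {"age": 60, "size_mm": 25, "shape": "oval", "post_acoustic": "enhancement",
     "boundary": "abrupt interface", "adler": "III"},
]
X = build_matrix(patients)
print(len(X), len(X[0]))  # 2 6
```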
Ensemble feature selection
The proposed ensemble selection is a data-mining method based on decision trees, which have previously been applied as an effective solution to classification and prediction problems12. A decision tree is a sequence of binary splits of the data that separates one class from the other classes as effectively as possible34. Recursive partitioning is one of the most effective methods for constructing decision trees35.
Construction of the training set and the test set
The data were divided into two sets: the training set was used to build the classification model, and the test set was used to validate it. First, we randomly selected 80% of the data from each sample category to construct the training set, and the remaining 20% were used as the test set. Then, from the training set, we randomly selected 80% of the data from one type and 80% of the data from the remaining types, together called the outer set, to construct a decision tree (the following description uses HER2 as an example).
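The two resampling steps, a stratified 80/20 split and the one-vs-rest outer set, can be sketched in Python with the standard `random` module. The function names and the rounding of 80% to whole patients are assumptions; HER2 is used as the target class, as in the text.

```python
import random


def stratified_split(labels, train_frac=0.8, seed=0):
    """Return (train, test) index lists, taking train_frac of every category for training."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for members in by_class.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        k = round(train_frac * len(shuffled))  # 80% of this category, rounded
        train += shuffled[:k]
        test += shuffled[k:]
    return sorted(train), sorted(test)


def outer_set(train_idx, labels, target, frac=0.8, rng=None):
    """Draw frac of the target class and frac of all other classes (one-vs-rest).

    Returns the chosen indices and binary labels (1 = target subtype, 0 = rest).
    """
    rng = rng or random.Random()
    pos = [i for i in train_idx if labels[i] == target]
    neg = [i for i in train_idx if labels[i] != target]
    chosen = rng.sample(pos, round(frac * len(pos))) + rng.sample(neg, round(frac * len(neg)))
    return chosen, [1 if labels[i] == target else 0 for i in chosen]


labels = ["LA"] * 10 + ["LB"] * 10 + ["HER2"] * 5 + ["TN"] * 5
train, test = stratified_split(labels)
idx, y = outer_set(train, labels, "HER2", rng=random.Random(1))
print(len(train), len(test), len(idx), sum(y))  # 24 6 19 3
```

Calling `outer_set` 1,000 times with a shared generator yields the 1,000 outer sets from which the trees are grown.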
Algorithm of the recursive partition tree
Each tree consisted of a root node, internal nodes and leaf nodes. A binary tree was grown from the outer set using a recursive partition algorithm: depending on whether a selected predictor was above a chosen cutoff value, the samples were divided into smaller and smaller groups. At each internal node, a feature and a threshold value (cutoff) were chosen such that class impurity was minimized when the branch was created by the induction rule13; that is, the ultrasound feature yielding the minimal impurity was selected at that node.
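A minimal Python sketch of the recursive partition step follows. It assumes Gini impurity as the impurity measure (the text does not name one; Gini is the default classification split criterion in R's rpart) and binary one-vs-rest labels. `features_used` returns the set of feature indices appearing at internal nodes, i.e. the per-tree feature subset; the depth and node-size limits are hypothetical stopping rules.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(X, y):
    """Return (feature index, cutoff, impurity) with the lowest weighted child impurity.

    Samples with X[i][q] <= cutoff go to the left branch, the rest to the right.
    Returns (None, None, parent impurity) if no split reduces impurity.
    """
    n = len(y)
    best = (None, None, gini(y))
    for q in range(len(X[0])):
        values = sorted({row[q] for row in X})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [y[i] for i in range(n) if X[i][q] <= cut]
            right = [y[i] for i in range(n) if X[i][q] > cut]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if impurity < best[2]:
                best = (q, cut, impurity)
    return best


def features_used(X, y, depth=0, max_depth=3, min_size=5):
    """Grow a tree recursively and return the feature indices used at its internal nodes."""
    if depth >= max_depth or len(y) < min_size or gini(y) == 0.0:
        return set()  # stop: this node becomes a leaf
    q, cut, _ = best_split(X, y)
    if q is None:
        return set()
    left = [i for i in range(len(y)) if X[i][q] <= cut]
    right = [i for i in range(len(y)) if X[i][q] > cut]
    used = {q}
    for idx in (left, right):
        used |= features_used([X[i] for i in idx], [y[i] for i in idx],
                              depth + 1, max_depth, min_size)
    return used


X = [[0, 5], [0, 3], [1, 5], [1, 3]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # (0, 0.5, 0.0)
```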
Selection of relevant ultrasound features
When tree growth stopped, we extracted the ultrasound features at the nodes. Each outer set $d$ thus yielded a subset of ultrasound features $F_d$, giving the collection $\{F_1, F_2, \ldots, F_D\}$ over the $D$ outer sets. We defined FV as the relevance intensity, which measures how strongly an ultrasound feature is related to a category:

$$FV(f_k) = \frac{\sum_{d=1}^{D} w_d \, I(f_k, F_d)}{\sum_{d=1}^{D} w_d}$$

where $f_k$ denotes a particular ultrasound feature, $w_d$ is the weight, a measure of the classification performance of $F_d$, and $I$ is an indicator function:

$$I(f_k, F_d) = \begin{cases} 1, & f_k \in F_d \\ 0, & f_k \notin F_d \end{cases}$$
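FV can be computed directly once the per-tree feature subsets $F_d$ and weights $w_d$ are known. The sketch below assumes FV is a weighted frequency normalized by the total weight, $\sum_d w_d I(f_k, F_d) / \sum_d w_d$, which is consistent with the FV values in Table 1 lying between 0 and 1. The choice of $w_d$ (for example, each tree's accuracy on held-out data) is left to the caller, since the source does not specify it.

```python
def feature_value(feature, feature_sets, weights):
    """Normalised weighted frequency FV(f_k) of one feature over D trees.

    feature_sets[d] is F_d, the set of features at the nodes of tree d;
    weights[d] is w_d, a measure of that tree's classification performance.
    """
    if len(feature_sets) != len(weights):
        raise ValueError("need one weight per feature set")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # I(f_k, F_d) is 1 when the feature appears in F_d, otherwise 0.
    hits = sum(w for f_set, w in zip(feature_sets, weights) if feature in f_set)
    return hits / total


trees = [{"boundary", "post_acoustic"}, {"boundary"}, {"boundary", "adler"}]
w = [0.9, 0.8, 0.7]
for f in ("boundary", "post_acoustic", "adler", "echogenicity"):
    print(f, round(feature_value(f, trees, w), 3))
```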
To build the null distribution of FV, we randomly permuted the category labels of the patients (the permutation approach) and recomputed $FV(f_k)$ on the permuted data, which gave the empirical null distribution $FV_0(f_k)$. The critical value $c_\beta(f_k)$, the upper $\beta$ quantile of $FV_0(f_k)$, was obtained for a specified significance level $\beta$ (e.g., 0.001). If $FV(f_k) > c_\beta(f_k)$ (one-tailed), the ultrasound feature was chosen. We repeated this step 1,000 times and selected the stable, highly frequent ultrasound features from the feature subset. The final decision tree was constructed from this feature group.
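The permutation test and the stability count can be sketched in Python as follows. `fv_fn` is a hypothetical caller-supplied function that rebuilds the tree ensemble for a given label vector and returns a dictionary of FV values; the toy version below simply measures agreement between a feature column and the labels. The quantile rule used for the cutoff (an order statistic with at most a fraction β of null values above it) is one reasonable choice among several, not necessarily the authors' exact rule.

```python
import math
import random
from collections import Counter


def empirical_cutoff(null_values, beta):
    """Order-statistic (1 - beta) quantile: at most beta of null values lie above it."""
    ordered = sorted(null_values)
    k = math.ceil((1 - beta) * len(ordered)) - 1
    return ordered[min(max(k, 0), len(ordered) - 1)]


def permutation_select(fv_fn, labels, features, beta=0.001, n_perm=1000, seed=0):
    """Keep features whose observed FV exceeds the permutation cutoff (one-tailed)."""
    rng = random.Random(seed)
    observed = fv_fn(labels)
    null = {f: [] for f in features}
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # break any feature-label association
        fv0 = fv_fn(shuffled)
        for f in features:
            null[f].append(fv0[f])
    cutoffs = {f: empirical_cutoff(null[f], beta) for f in features}
    selected = [f for f in features if observed[f] > cutoffs[f]]
    return selected, cutoffs


def stability_counts(select_once, n_repeats=1000):
    """Count how often each feature is selected over repeated selection runs."""
    counts = Counter()
    for run in range(n_repeats):
        counts.update(select_once(run))
    return counts


# Toy example: feature "a" copies the labels, feature "b" is unrelated to them.
labels = [0] * 10 + [1] * 10
columns = {"a": list(labels), "b": [0, 1] * 10}


def toy_fv(lab):
    # Hypothetical stand-in for FV: fraction of samples where the feature equals the label.
    return {f: sum(x == y for x, y in zip(col, lab)) / len(lab) for f, col in columns.items()}


selected, cutoffs = permutation_select(toy_fv, labels, ["a", "b"], beta=0.05, n_perm=200)
print(selected)  # ['a']
```

In the full procedure, `select_once` would rerun the resampling, tree growing and permutation test, and features selected in nearly all runs (such as boundary and post acoustic for LA, 999 of 1,000) would be kept.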
Evaluation of the ensemble decision model
We used the χ² statistic to assess the classification accuracy of the extracted ultrasound feature subset:
$$\chi^2 = \frac{n\,(n_{00}n_{11} - n_{01}n_{10})^2}{(n_{00}+n_{01})(n_{10}+n_{11})(n_{00}+n_{10})(n_{01}+n_{11})}$$
where n = n00 + n01 + n10 + n11, n00 was the frequency of true negatives, n01 the frequency of false positives, n10 the frequency of false negatives and n11 the frequency of true positives. This statistic asymptotically follows a χ² distribution with one degree of freedom.
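The evaluation step can be checked numerically with a short Python sketch. It assumes the standard Pearson χ² statistic for a 2×2 table of actual versus predicted class, χ² = n(n00·n11 − n01·n10)² / [(n00+n01)(n10+n11)(n00+n10)(n01+n11)], and converts it to a p-value with one degree of freedom using the identity P(χ²₁ > x) = erfc(√(x/2)). The counts in the usage note are hypothetical.

```python
import math

def chi_square_2x2(n00, n01, n10, n11):
    """Pearson chi-square for the 2x2 table of actual vs predicted class
    (n00 = true negatives, n01 = false positives,
     n10 = false negatives, n11 = true positives)."""
    n = n00 + n01 + n10 + n11
    den = (n00 + n01) * (n10 + n11) * (n00 + n10) * (n01 + n11)
    if den == 0:
        raise ValueError("a row or column total is zero; chi-square undefined")
    return n * (n00 * n11 - n01 * n10) ** 2 / den

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def accuracy(n00, n01, n10, n11):
    """Fraction of correctly classified patients."""
    return (n00 + n11) / (n00 + n01 + n10 + n11)
```

For hypothetical counts n00 = 40, n01 = 10, n10 = 5 and n11 = 45, the statistic is about 49.5 (p far below 0.001) and the accuracy is 0.85.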
All computations were implemented in R. The workflow is shown in Fig. 7.
Additional Information
How to cite this article: Zhang, L. et al. Identifying ultrasound and clinical features of breast cancer molecular subtypes by ensemble decision. Sci. Rep. 5, 11085; doi: 10.1038/srep11085 (2015).
Supplementary Material
Acknowledgments
This study was funded by four grants from the National Natural Science Foundation of China (Grant Nos. 81071216, 81101103, 81271647 and 81371632).
Footnotes
Author Contributions L.Z. and Y.X. designed the study, analyzed the data and wrote the manuscript; J.L. coordinated quality control of ultrasound data; H.C. coordinated review of clinical cases and organization of study groups. All authors contributed to the data interpretation and the final version of the manuscript, which they all approve.
References
- Drukker K., Gruszauskas N. P., Sennett C. A. & Giger M. L. Breast US computer-aided diagnosis workstation: performance with a large clinical diagnostic population. Radiology 248, 392–7 (2008).
- Perou C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747–52 (2000).
- Goldhirsch A. et al. Strategies for subtypes-dealing with the diversity of breast cancer: highlights of the St. Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2011. Ann Oncol 22, 1736–47 (2011).
- Jones R. L. et al. Relationship between oestrogen receptor status and proliferation in predicting response and long-term outcome to neoadjuvant chemotherapy for breast cancer. Breast Cancer Res Treat 119, 315–23 (2010).
- Wang Y. et al. Estrogen receptor-negative invasive breast cancer: imaging features of tumors with and without human epidermal growth factor receptor type 2 overexpression. Radiology 246, 367–75 (2008).
- Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
- Berg W. A. et al. Diagnostic accuracy of mammography, clinical examination, US, and MR imaging in preoperative assessment of breast cancer. Radiology 233, 830–49 (2004).
- Stavros A. T. et al. Solid breast nodules: use of sonography to distinguish between benign and malignant lesions. Radiology 196, 123–34 (1995).
- Irshad A. et al. Assessing the role of ultrasound in predicting the biological behavior of breast cancer. AJR Am J Roentgenol 200, 284–90 (2013).
- Ko E. S. et al. Triple-negative breast cancer: correlation between imaging and pathological findings. Eur Radiol 20, 1111–7 (2010).
- Au-Yong I. T. et al. Sonographic correlations with the new molecular classification of invasive breast cancer. Eur Radiol 19, 2342–8 (2009).
- Kantardzic M. in Data mining: concepts, models, methods, and algorithms, 2nd edn, 1–4 (Wiley, 2003).
- Li X., Rao S., Wang Y. & Gong B. Gene mining: a novel and powerful ensemble decision approach to hunting for disease genes using microarray expression profiling. Nucleic Acids Res 32, 2685–94 (2004).
- Zhu X., Ying J., Wang F., Wang J. & Yang H. Estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 status in invasive breast cancer: a 3,198 cases study at National Cancer Center, China. Breast Cancer Res Treat 147, 551–5 (2014).
- Japan Association of Breast and Thyroid Sonology. in Guideline for breast ultrasound: management and diagnosis, 2nd edn (Tokyo: Japanese, 2008).
- Wang Y. et al. A retrospective study of breast cancer subtypes: the risk of relapse and the relations with treatments. Breast Cancer Res Treat 130, 489–98 (2011).
- Kumar R. & Yarmand-Bagheri R. The role of HER2 in angiogenesis. Semin Oncol 28, 27–32 (2001).
- Alameddine R. S., Otrock Z. K., Awada A. & Shamseddine A. Crosstalk between HER2 signaling and angiogenesis in breast cancer: molecular basis, clinical applications and challenges. Curr Opin Oncol 25, 313–24 (2013).
- Lamb P. M., Perry N. M., Vinnicombe S. J. & Wells C. A. Correlation between ultrasound characteristics, mammographic findings and histological grade in patients with invasive ductal carcinoma of the breast. Clin Radiol 55, 40–4 (2000).
- Taucher S. et al. Do we need HER-2/neu testing for all patients with primary breast carcinoma? Cancer 98, 2547–53 (2003).
- Kim S. H. et al. Correlation of ultrasound findings with histology, tumor grade, and biological markers in breast cancer. Acta Oncol 47, 1531–8 (2008).
- Badra F. A. et al. Non-palpable breast carcinomas: correlation of mammographically detected malignant-appearing microcalcifications and epidermal growth factor receptor (EGFR) family expression. Cancer Lett 244, 34–41 (2006).
- Seo B. K. et al. Correlation of HER-2/neu overexpression with mammography and age distribution in primary breast carcinomas. Acad Radiol 13, 1211–8 (2006).
- Bauer K. R., Brown M., Cress R. D., Parise C. A. & Caggiano V. Descriptive analysis of estrogen receptor (ER)-negative, progesterone receptor (PR)-negative, and HER2-negative invasive breast cancer, the so-called triple-negative phenotype: a population-based study from the California Cancer Registry. Cancer 109, 1721–8 (2007).
- Rakha E. A., Reis-Filho J. S. & Ellis I. O. Basal-like breast cancer: a critical review. J Clin Oncol 26, 2568–81 (2008).
- Cheang M. C. et al. Basal-like breast cancer defined by five biomarkers has superior prognostic value than triple-negative phenotype. Clin Cancer Res 14, 1368–76 (2008).
- Dogan B. E., Gonzalez-Angulo A. M., Gilcrease M., Dryden M. J. & Yang W. T. Multimodality imaging of triple receptor-negative tumors with mammography, ultrasound, and MRI. AJR Am J Roentgenol 194, 1160–6 (2010).
- Wojcinski S. et al. Sonographic features of triple-negative and non-triple-negative breast cancer. J Ultrasound Med 31, 1531–41 (2012).
- Adler D. D., Carson P. L., Rubin J. M. & Quinn-Reid D. Doppler ultrasound color flow imaging in the study of breast cancer: preliminary findings. Ultrasound Med Biol 16, 553–9 (1990).
- American College of Radiology. in American College of Radiology Breast Imaging Reporting and Data System (BI-RADS), ultrasound, 4th edn (ACR, 2003).
- McShane L. M. et al. REporting recommendations for tumor MARKer prognostic studies (REMARK). Breast Cancer Res Treat 100, 229–35 (2006).
- Aksoy S., Dizdar O., Harputluoglu H. & Altundag K. Demographic, clinical, and pathological characteristics of Turkish triple-negative breast cancer patients: single center experience. Ann Oncol 18, 1904–6 (2007).
- Cheang M. C. et al. Ki67 index, HER2 status, and prognosis of patients with Luminal B breast cancer. J Natl Cancer Inst 101, 736–50 (2009).
- Bastos J. & Ramalho J. Nonparametric models of financial leverage decisions. Centre for Applied Mathematics and Economics, No. 1005 (2010).
- Zhang H., Yu C. Y., Singer B. & Xiong M. Recursive partitioning for tumor classification with gene expression microarray data. Proc Natl Acad Sci USA 98, 6730–5 (2001).