Highlights
- No unique genome signature or molecular therapy exists for inflammatory breast cancer (IBC), a highly aggressive breast cancer with a 5-year survival rate of less than 30%.
- We show that various gene lists proposed as molecular footprints of IBC have little to no overlap and show very limited predictive accuracy in identifying IBC samples.
- We observed that single-sample gene set enrichment analysis (ssGSEA) scores, combined with the position of samples along the epithelial-hybrid-mesenchymal spectrum, can help identify IBC.
- IBC samples robustly displayed a higher coefficient of variation in EMT scores than non-IBC samples.
- Higher heterogeneity along the epithelial-hybrid-mesenchymal spectrum can be regarded as a hallmark of IBC and a possibly useful biomarker.
Keywords: IBC, Gene expression signature, Tumor heterogeneity, Hybrid epithelial/mesenchymal
Abstract
Inflammatory breast cancer (IBC) is a highly aggressive breast cancer that metastasizes largely via tumor emboli and has a 5-year survival rate of less than 30%. No unique genomic signature has yet been identified for IBC, nor has any specific molecular therapeutic been developed to manage the disease. Thus, identifying gene expression signatures specific to IBC remains crucial. Here, we compare various gene lists that have been proposed as molecular footprints of IBC, each derived using different clinical samples as training and validation sets and independent training algorithms, and determine their accuracy in identifying IBC samples in three independent datasets. We show that these gene lists have little to no mutual overlap and have limited predictive accuracy in identifying IBC samples. Despite this inconsistency, single-sample gene set enrichment analysis (ssGSEA) scores of IBC samples correlate with their position on the epithelial-hybrid-mesenchymal spectrum. This positioning, together with ssGSEA scores, improves the accuracy of IBC identification across the three independent datasets. Finally, we observed that IBC samples robustly displayed a higher coefficient of variation in epithelial-mesenchymal transition (EMT) scores than non-IBC samples. Pending verification that this patient-to-patient variability extends to intratumor heterogeneity within a single patient, these results suggest that higher heterogeneity along the epithelial-hybrid-mesenchymal spectrum can be regarded as a hallmark of IBC and a possibly useful biomarker.
Introduction
Inflammatory breast cancer (IBC) is a rare (2–4% of breast cancer cases) but highly aggressive, locally advanced breast cancer with extremely poor prognosis and a 5-year survival rate of less than 30% [1]. At diagnosis, most IBC patients exhibit signs of lymph node metastasis; approximately 30% of patients have distant metastases as compared to 5% of patients in non-IBC breast cancers [2]. IBCs are often highly angiogenic and invasive, of high histological grade, and cause 7–10% of all breast cancer-associated deaths [1]. Histologically, IBC cells often do not present as a dominant mass, rather being diffused in clusters throughout the breast and skin, thereby leading to many false-negative imaging findings [3]. Decoding unique molecular underpinnings of this deadly disease remains an unmet clinical need.
The presence of tumor emboli in dermal-lymphatic vessels is a pathological hallmark of IBC. Consistent with this, IBC patients have a higher frequency and larger average size of clusters of circulating tumor cells (CTCs) than non-IBC patients [4]. These clusters have a strong association with poor survival [4], reminiscent of the disproportionately high metastatic fitness of CTC clusters [5]. Aside from this major difference, no unique genomic signature has yet been conclusively identified for IBC, suggesting that factors other than genetic events may be more important for IBC: phenotypic and/or epigenetic heterogeneity, and the interaction of malignant cells within emboli or with cells from the tumor microenvironment [1,3]. Because of these uncertainties, no specific molecular therapeutic approaches have yet been proposed to manage IBC.
Extensive efforts have been undertaken to identify unique gene expression signatures of IBC. In 2013, the IBC World Consortium identified a 79-gene signature that focused on the inhibition of TGFβ signaling as a molecular footprint of IBC [6]; this was consistent with experimental observations that weakened TGFβ signaling promotes collective cell invasion, a hallmark of IBC, while strong activation of TGFβ signaling promotes individual invasion consistent with a full-blown epithelial-mesenchymal transition (EMT) [7], [8], [9]. However, these differences were later found to arise from the difference in incidence of the HER2-positive subtype between the IBC and non-IBC samples used [1]. Further efforts involving micro-dissected tumors led to the identification of a 132-gene signature associated with poor outcome [10], but this signature was seen in approximately 25% of breast cancer samples in TCGA, which in fact contains very few IBC samples, thus highlighting the limited ability of this signature to identify IBC [1]. The only other study using micro-dissected tumors identified differences in gene expression in the stroma, instead of the tumor cells, in IBC vs. non-IBC cases [11]. Thus, a comprehensive gene expression signature of IBC and its association, if any, with a partial or complete EMT, remains to be identified.
EMT is a multi-dimensional process involving changes in molecular and morphological traits [12]. The deluge of transcriptomics data associated with EMT has driven the development of metrics to quantify the extent of EMT (also known as EMT scores) [13], [14], [15]. EMT scoring can thus enable a rigorous assessment of the relationship between the extent of EMT and other processes (in this case, IBC categorization).
Here, we compare the utility of multiple proposed IBC gene expression signatures by their ability to identify IBC vs. non-IBC cases. Our results reveal shortcomings in the consistency and predictive utility of these signatures. We subsequently show that single-sample gene set enrichment analysis (ssGSEA) scores of IBC samples correlate with their EMT scores as calculated via three independent metrics that quantify the spectrum of epithelial-hybrid-mesenchymal phenotypes. Next, we show that while mean EMT scores of IBC samples were not consistently high or low when compared with corresponding non-IBC samples, IBC samples robustly displayed a comparatively higher coefficient of variation in their EMT scores. These results suggest that higher heterogeneity along the epithelial-hybrid-mesenchymal spectrum can be regarded as a hallmark of IBC.
Results
Available IBC gene signatures do not distinguish robustly between IBC and non-IBC
First, we collated the four available gene lists identified to be unique to IBC. These four gene signatures showed high accuracy in distinguishing IBC samples from non-IBC (nIBC) samples in their respective studies. Each signature comprised a different number of genes/probes – 132, 78, 50 and 109 (denoted as 132 GES, 78 GES, 50 GES and 109 GES henceforth) [6,10,16,17]. The 109 GES signature consists of 109 different probe sets which uniquely mapped to 90 genes; in all other gene signatures, all probe sets mapped to an equal number of genes. These gene signatures were identified via distinct data-driven algorithms, each utilizing a single dataset and a variable number of samples in the respective training and validation sets. These signatures exhibited varied accuracy in identifying IBC cases (Fig. 1A). Examining the intersection of genes among the signatures revealed minimal or no overlap. Aside from one common gene identified by 132 GES and 50 GES, all other signature elements were unique (Fig. 1B).
Based on the high accuracy values (68%–88%) of these gene lists in identifying IBC in their corresponding datasets, we hypothesized that each gene list might be capable of correctly classifying IBC and nIBC cases in three datasets containing gene expression data of clinically annotated IBC and nIBC samples (see Methods). We first performed principal component analysis (PCA) on all genes present in these datasets. In each case, PCA showed no clear separation between IBC and nIBC (Fig. 2; Fig. S1A i; S1B, i), thus highlighting the high transcriptional heterogeneity observed in IBC and nIBC. Surprisingly, none of the four IBC gene lists (132 GES, 78 GES, 50 GES and 109 GES) performed consistently better than the all-genes approach in segregating IBC from non-IBC samples (Fig. 2, ii-v; Fig. S1A, B ii-iv), with the only possible exception being the performance of 132 GES in one of the three datasets (Fig. 2, iii). This segregation was not improved by using non-linear methods such as UMAP either (Fig. S2A-C), suggesting that the mRNA levels of these genes are unable to resolve IBC and nIBC.
ssGSEA scores of IBC gene signatures help in separation of IBC and nIBC samples
As both linear and non-linear combinations of all four gene lists failed to segregate the IBC from nIBC samples in these datasets, we next examined whether as a group, these genes are enriched in IBC samples or not. We used single-sample GSEA (ssGSEA), an extension of Gene Set Enrichment Analysis (GSEA), which calculates separate enrichment scores for each pair of a sample and a gene set. Each ssGSEA enrichment score represents the degree to which the genes in a particular gene set are cumulatively up- or down-regulated within a given sample [18]. We thus tested whether IBC gene signatures are relatively enriched in IBC samples.
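The per-sample enrichment idea described above can be illustrated with a simplified, standard-library Python sketch. It follows the general rank-weighted running-sum form of ssGSEA for a single sample and a single gene set; it is not the GSVA implementation used in this study, and the function name `ssgsea_score`, the weighting exponent `alpha = 0.25` and the toy gene names are assumptions made for illustration only.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score (illustrative sketch).

    expression: dict mapping gene -> expression value for ONE sample.
    gene_set:   iterable of gene names forming the signature.
    """
    members = set(gene_set) & set(expression)
    if not members or len(members) == len(expression):
        raise ValueError("gene set must cover some, but not all, measured genes")

    # Order genes from highest to lowest expression within this sample.
    ordered = sorted(expression, key=lambda g: expression[g], reverse=True)
    n = len(ordered)

    # Rank-based weights: the top gene has rank n, the bottom gene rank 1.
    weights = {g: (n - i) ** alpha for i, g in enumerate(ordered)}
    total_hit = sum(weights[g] for g in members)
    n_miss = n - len(members)

    # Walk down the ranked list, accumulating the difference between the
    # weighted "hit" distribution and the uniform "miss" distribution.
    p_hit = p_miss = score = 0.0
    for g in ordered:
        if g in members:
            p_hit += weights[g] / total_hit
        else:
            p_miss += 1.0 / n_miss
        score += p_hit - p_miss
    return score
```

In this sketch a positive score means the signature genes sit near the top of the sample's expression ranking, and a negative score means they sit near the bottom; production tools additionally normalize the scores across samples.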
We calculated ssGSEA enrichment scores for all four gene sets and compared these scores between IBC and nIBC samples across the three datasets. This comparison showed some instances of statistically significant differences in ssGSEA scores of IBC and nIBC samples: a) ssGSEA scores of 78 GES were higher for IBC samples than for nIBC samples in GSE22597, b) ssGSEA scores of 50 GES and 132 GES were relatively higher for IBC samples in GSE5847, and c) ssGSEA scores of 132 GES were comparatively higher for IBC samples in GSE45581 (Fig. 3A). However, none of the four gene sets showed consistent and statistically significant differences between IBC and nIBC across the three datasets.
Next, we performed a permutation test to check whether the statistical differences observed in the mean ssGSEA scores of IBC and non-IBC samples were specific to these gene lists. This test was performed for 132 GES, because it was significantly enriched in IBC samples vs. non-IBC ones (p<0.05) in two out of three datasets. We chose the same number of genes (132) at random from the entire list of available genes, calculated the ssGSEA scores of IBC and non-IBC samples for this random gene set, and tested whether the mean ssGSEA scores were significantly different between IBC and non-IBC samples. We generated 1000 such random gene sets and compared each of them to the ssGSEA scores obtained for 132 GES. This experiment showed that across the three datasets, for a large number of such randomly chosen gene lists, the difference in mean values of ssGSEA scores was not statistically significant (Fig. 3B). This analysis indicated that the predefined 132 GES is quite likely to distinguish between IBC and non-IBC better than a randomly chosen gene set, but the extent of utility of this signature needs further validation.
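A minimal Python sketch of this random-gene-set comparison is given below. To keep it self-contained, it uses the mean expression of the gene set as a stand-in for the ssGSEA score and an empirical p-value on the absolute difference in group means instead of a per-set t-test; both substitutions, as well as the names `set_score`, `group_difference` and `random_gene_set_test`, are assumptions for illustration and not the procedure used in the study.

```python
import random
from statistics import mean


def set_score(sample, genes):
    # Stand-in enrichment score (assumption): mean expression of the set.
    return mean(sample[g] for g in genes)


def group_difference(samples, is_ibc, genes, score_fn=set_score):
    """Absolute difference in mean score between IBC and non-IBC samples."""
    ibc = [score_fn(s, genes) for s, flag in zip(samples, is_ibc) if flag]
    non = [score_fn(s, genes) for s, flag in zip(samples, is_ibc) if not flag]
    return abs(mean(ibc) - mean(non))


def random_gene_set_test(samples, is_ibc, signature, n_random=1000, seed=0):
    """Compare a signature against random gene sets of the same size.

    samples: list of dicts (gene -> expression), one per sample.
    Returns the observed difference and an empirical p-value.
    """
    rng = random.Random(seed)
    all_genes = sorted(samples[0])
    observed = group_difference(samples, is_ibc, signature)
    null = [
        group_difference(samples, is_ibc, rng.sample(all_genes, len(signature)))
        for _ in range(n_random)
    ]
    # Fraction of random sets that separate the groups at least as well.
    exceed = sum(d >= observed for d in null)
    return observed, (exceed + 1) / (n_random + 1)
```

Passing a different `score_fn` to `group_difference` (for example, a full ssGSEA scorer) would bring the sketch closer to the analysis described in the text.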
After comparing ssGSEA scores, we next tested their ability to sort samples into two clusters and checked whether those two clusters corresponded to IBC and nIBC samples. We performed k-means (k = 2) clustering on each of the four ssGSEA scores to cluster the samples and measured the accuracy of clustering into IBC/nIBC based on the SR (sample ratio) values (Fig. 3C). A perfect clustering would mean that one of the two clusters identified via k-means contains all IBC samples in that dataset, and the other contains all nIBC samples. To quantify the effectiveness of clustering into IBC/nIBC, we first calculated, for each cluster, a cluster IBC score and a cluster nIBC score: the percentages of all IBC (respectively nIBC) samples in the dataset that were assigned to that cluster. We then calculated the sample ratio (SR) for the cluster by dividing the smaller of these two scores by the larger. The closer the SR value is to 0, the larger the difference between the two sample proportions, and therefore the closer that cluster is to containing only IBC or only nIBC samples. For GSE22597, clusters formed on the basis of 78 GES had the lowest SR values compared to clusters formed by 132 GES, 50 GES or 109 GES. For GSE45581, the SR values of 132 GES were the lowest; for GSE5847, 132 GES also performed the best (Fig. 3C).
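The SR calculation described above can be written as a short Python sketch. The function name `sample_ratio`, the label strings `"IBC"` and `"nIBC"`, and the handling of an empty cluster are assumptions made for illustration.

```python
def sample_ratio(labels, clusters, cluster_id):
    """Sample ratio (SR) of one cluster.

    labels:   list of "IBC" / "nIBC", one per sample.
    clusters: list of cluster ids (same order as labels).
    """
    n_ibc = labels.count("IBC")
    n_nibc = labels.count("nIBC")
    in_cluster = [lab for lab, c in zip(labels, clusters) if c == cluster_id]

    # Percentage of all IBC (nIBC) samples in the dataset that fall in this cluster.
    ibc_score = 100.0 * in_cluster.count("IBC") / n_ibc
    nibc_score = 100.0 * in_cluster.count("nIBC") / n_nibc

    larger = max(ibc_score, nibc_score)
    if larger == 0:
        return float("nan")  # empty cluster: SR is undefined (assumption)
    return min(ibc_score, nibc_score) / larger


labels = ["IBC"] * 4 + ["nIBC"] * 4
perfect = [0, 0, 0, 0, 1, 1, 1, 1]   # cluster 0 holds every IBC sample
mixed = [0, 0, 1, 1, 0, 0, 1, 1]     # both groups split evenly
print(sample_ratio(labels, perfect, 0))  # 0.0
print(sample_ratio(labels, mixed, 0))    # 1.0
```

An SR of 0 corresponds to a cluster containing samples of only one group, whereas an SR of 1 means the cluster holds equal proportions of both groups.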
We considered the actual sample groups and cluster numbers as two categorical variables. The statistical significance of the enrichment of IBC and nIBC samples across these two clusters was calculated for each dataset using Fisher's exact test. 78 GES was the best performer in GSE22597, 50 GES in GSE45581, and 132 GES in GSE5847 (Fig. 3C-D).
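For a 2 × 2 table of sample group (IBC/nIBC) against cluster (1/2), a two-sided Fisher's exact test can be sketched in Python from the hypergeometric distribution. The function name `fisher_exact_two_sided` and the cell layout are assumptions for illustration; the study itself reports using R.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: IBC, nIBC. Columns: cluster 1, cluster 2 (assumed layout).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x IBC samples in cluster 1,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)
```

For example, a perfect split of three IBC and three nIBC samples, `fisher_exact_two_sided(3, 0, 0, 3)`, gives p = 0.1, which shows how small sample sizes limit the significance attainable even with perfect clustering.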
Logistic regression iteratively identifies an IBC signature
After exploring all the available IBC gene signatures, we next tried to define an IBC signature by applying logistic regression (LR) to the three different datasets. LR is a machine learning-based classification scheme that can predict the probability that a given sample belongs to one of two (or more) categories. The LR approach has been used to bin samples into epithelial, mesenchymal or hybrid epithelial/mesenchymal categories [15]. Applying LR to each dataset individually, all transcripts were ranked based on their ability to distinguish between IBC and non-IBC samples in the corresponding dataset. The resulting dataset-specific IBC signatures yielded reasonable preliminary models of IBC vs. non-IBC (Fig. 4A). Predictor goodness-of-fit and predictive accuracies correlated within each dataset, with the highest values observed in GSE45581 and the lowest in GSE22597 (Table S1). First, we identified which transcripts were best able to resolve IBC from non-IBC samples in individual datasets. While the top transcripts varied across datasets in their ability to separate IBC and non-IBC samples, with deviances ranging from 18.3 to 89.7 in the top ten from each dataset, their predictive accuracies were quite comparable – 75%−80% (Table S1).
These results suggest that IBC can be reasonably identified, quite simply by using a single predictor, provided the analysis is restricted to a given dataset. On the other hand, generalizing across datasets is less straightforward. In an attempt to define an IBC signature based on the mutual intersection of the three datasets, we next looked for overlap in the top-ranked transcripts from each dataset; the top 200 transcripts revealed no common genes. Among the top 2000 transcripts, 13 were found to be common across the three datasets. Finally, we calculated the fold-change in levels of these 13 genes in IBC vs. non-IBC samples in these datasets. None of the 13 transcripts showed consistent upregulation or downregulation in IBC samples across the three datasets (Fig. 4B).
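The cross-dataset comparison described above amounts to intersecting the top-ranked transcripts of each dataset and then checking whether the shared transcripts change in the same direction everywhere. A minimal Python sketch follows; the function names, the toy rankings and the use of log fold-change signs are assumptions for illustration.

```python
def shared_top_transcripts(rankings, top_n):
    """Transcripts present in the top_n of every dataset.

    rankings: dict mapping dataset name -> list of transcripts, best first.
    """
    top_sets = [set(ranked[:top_n]) for ranked in rankings.values()]
    return set.intersection(*top_sets)


def consistent_direction(log_fold_changes):
    """True if a transcript is up in all datasets or down in all datasets."""
    return all(v > 0 for v in log_fold_changes) or all(v < 0 for v in log_fold_changes)


# Toy rankings for three hypothetical datasets.
rankings = {
    "dataset_A": ["g1", "g2", "g3", "g4"],
    "dataset_B": ["g3", "g1", "g5", "g2"],
    "dataset_C": ["g1", "g6", "g3", "g7"],
}
print(shared_top_transcripts(rankings, top_n=1))          # set()
print(sorted(shared_top_transcripts(rankings, top_n=3)))  # ['g1', 'g3']
```

As in the analysis above, widening the cutoff increases the chance of finding shared transcripts, but a shared transcript is only informative as a signature element if `consistent_direction` holds for its fold-changes across datasets.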
While our iterative LR approach represents the maximal resolvability achievable for each dataset with single predictors, the discrepancies across datasets seen in our earlier analysis do not completely disappear. For instance, a lack of resolvability between IBC and non-IBC was also seen when using the LR-derived gene lists in PCA across the three datasets (Fig. S3 B-D). Thus, increasing the predictive accuracy further may be a useful goal for future pursuits of a universal IBC signature. Such an approach is possible using multivariate LR models, but it requires significantly more training data for IBC and non-IBC cases to prevent overfitting.
Correlation between IBC gene signature ssGSEA scores and EMT scores
It has been proposed that IBC cells exhibit a partial EMT behavior, given the retention of E-cadherin levels and the trait of collective cell migration through tumor emboli [19]. Thus, following the assessment of IBC gene signatures, we quantified the EMT-ness of IBC and nIBC samples based on three different EMT scoring metrics – KS, 76GS and MLR [20].
These metrics score EMT on a continuum, based on the transcriptomics of individual samples. While KS and MLR score the samples on a scale of [−1, 1] and [0, 2] respectively, the 76GS metric has no pre-defined scale. The higher the MLR or KS score, the more mesenchymal the sample is; the higher the 76GS score, the more epithelial the sample is. Thus, KS and MLR scores of samples in a given dataset correlate positively with one another; both of them correlate negatively with 76GS scores, as seen across multiple datasets (Table S2).
Here, we used these metrics to estimate where IBC and nIBC samples lie on the epithelial-hybrid-mesenchymal spectrum, followed by an assessment of the correlation of EMT scores with their corresponding ssGSEA enrichment scores. Two ssGSEA enrichment scores (132 GES and 50 GES) showed consistently very high and significant correlations with all three EMT scoring metrics in two datasets (GSE45581, GSE5847). Overall, a higher enrichment in IBC signature is associated with a more EMT-like phenotype. These correlations were maintained across both IBC and nIBC samples (Fig. 5A, B, Fig. S5–S8). However, this consistency is lost when using 78 GES or 109 GES in these datasets, as well as in GSE22597 when using any of the four ssGSEA scores (132 GES, 50 GES, 78 GES, 109 GES) (Fig. S3, S4). Taken together, these results suggest that some of the IBC gene lists may enrich for mesenchymal samples instead of segregating IBC/nIBC. The heterogeneity of both IBC and nIBC samples along the epithelial-hybrid-mesenchymal spectrum may contribute to compromising the accuracy of these gene lists in identifying IBC.
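The sign conventions of the three metrics can be checked with a simple correlation, sketched below in Python. The type of correlation coefficient is not stated in this section, so Pearson's coefficient is assumed here; the function name `pearson` and all score values are hypothetical and serve only to illustrate the expected signs.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical per-sample scores, ordered from epithelial to mesenchymal.
ks_scores = [-0.6, -0.2, 0.1, 0.4]     # higher = more mesenchymal
gs76_scores = [9.0, 4.0, 1.0, -3.0]    # higher = more epithelial
enrichment = [0.05, 0.12, 0.20, 0.31]  # hypothetical ssGSEA scores

print(pearson(ks_scores, enrichment) > 0)    # True: tracks mesenchymal samples
print(pearson(gs76_scores, enrichment) < 0)  # True: opposite sign for 76GS
```

A signature whose enrichment score correlates positively with KS or MLR and negatively with 76GS is, in this sense, tracking the mesenchymal end of the spectrum.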
Combination of EMT score and ssGSEA score helps in better separation of IBC and nIBC samples
Next, we asked whether IBC and nIBC can be separated using a combination of two different dimensions - ssGSEA enrichment score (using one or more of the four gene lists – 78 GES, 132 GES, 50 GES, 109 GES) and EMT score (using one or more of the EMT scoring metrics).
To test the performance of different classifier combinations in sample segregation, we used different numbers and combinations of ssGSEA enrichment scores and EMT scores to perform k-means clustering. We used one, two, three, four, five, six, and seven different scores in all possible combinations to cluster the samples into two groups. The clustering accuracy was again measured based on both the SR value and Fisher's exact test. This exercise showed that a combination of EMT scores and ssGSEA scores performs best in terms of clustering IBC and nIBC into separate groups (Table S3). Again, the best-performing combinations of scores were not the same across datasets, but they always consisted of one or more ssGSEA scores together with one or more EMT scores. Based on the SR value and Fisher's exact test, the best performers in terms of clustering were (1) the combination of 78 GES, 109 GES, 50 GES and KS in GSE22597, (2) the combination of 50 GES, 132 GES, 78 GES and KS in GSE45581, and (3) the combination of 132 GES, MLR, and 78 GES in GSE5847 (Fig. 6A).
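The exhaustive search over score combinations can be sketched in Python as follows. With seven scores there are 2^7 − 1 = 127 non-empty combinations. The sketch uses a small deterministic two-cluster k-means and ranks combinations by agreement with the known labels; the evaluation criterion (agreement instead of SR and Fisher's exact test), the initialization, the absence of score scaling and all function names are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def kmeans_two(points, n_iter=50):
    """Two-cluster k-means on a list of tuples, started from the farthest pair."""
    centers = list(max(combinations(points, 2), key=lambda pair: dist(*pair)))
    assign = []
    for _ in range(n_iter):
        assign = [0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
                  for p in points]
        new_centers = []
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return assign


def best_score_combination(score_table, is_ibc):
    """Try every non-empty combination of scores; return (agreement, combination).

    score_table: dict mapping score name -> list of per-sample values.
    is_ibc:      list of booleans, one per sample.
    """
    names = sorted(score_table)
    n = len(is_ibc)
    best = None
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            points = [tuple(score_table[name][i] for name in combo)
                      for i in range(n)]
            assign = kmeans_two(points)
            agree = sum(a == int(flag) for a, flag in zip(assign, is_ibc))
            # Cluster ids are arbitrary, so accept either labelling.
            agreement = max(agree, n - agree) / n
            if best is None or agreement > best[0]:
                best = (agreement, combo)
    return best
```

Because ties are resolved in favor of the first combination encountered, the sketch prefers smaller combinations when several perform equally well; scores on very different scales would likely need standardizing before clustering.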
EMT score COV (coefficient of variation) is higher in IBC samples
Finally, after establishing the importance of EMT scores in the segregation of IBC and nIBC samples, we compared these scores across IBC and nIBC. There was no significant difference in mean scores of IBC vs. nIBC across the three datasets (Fig. S9), with overlapping variance between the two groups. However, the within-group coefficient of variation (COV), a better measure than variance for assessing the dispersion around the mean, was higher in IBC than in nIBC samples in 8 out of 9 (3 EMT scores × 3 datasets) cases (Fig. 6B). This result shows that IBC samples are more heterogeneous than nIBC samples in terms of their positioning on the EMT spectrum.
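The within-group comparison can be sketched in Python as the ratio of the sample standard deviation to the mean. Dividing by the absolute value of the mean is an assumption made here so that the measure stays non-negative for scores that can be negative; the function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / abs(m)


# Hypothetical EMT scores with similar means but different spread.
ibc_scores = [0.4, 1.0, 1.6]
nibc_scores = [0.9, 1.0, 1.1]
print(coefficient_of_variation(ibc_scores) > coefficient_of_variation(nibc_scores))  # True
```

The ratio becomes unstable when the group mean is close to zero, which may matter for a metric such as KS that is centered on 0.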
Discussion
Identifying unique genomic or transcriptomic signatures for IBC has been a challenge, and this lack of consensus limits potential molecular therapeutic approaches to treat this rare but deadly disease [1]. The term 'inflammatory' for IBC originated from its physical appearance, which mimics an acute inflammation of the breast [21]. However, a useful association between the inflammatory phenotype and the cell's omics has not yet been established.
An earlier study that attempted to characterize IBC based on clinical presentation as a distinct molecular entity concluded that “molecular subtype and inflammatory character are two independent features of breast cancers” [22], as the major molecular subtypes described for non-IBC were also found to exist within IBC. Nevertheless, multiple studies have aimed to define the molecular signature of IBC using genome-wide gene expression profiling. These studies used several different unsupervised and supervised methods to identify features related to IBC [6,10,16,17]. The resulting gene lists have little to no overlap, and none of them so far has been interpretable as one or more common biological processes or pathways. The main objective of our study was not to predict the status of a new sample as IBC or non-IBC, but to validate the previously defined IBC gene signatures using available datasets consisting of IBC and non-IBC samples. These gene signatures were defined in individual datasets containing both IBC and non-IBC samples using specific statistical models, and we investigated whether these gene lists are capable of identifying IBC and non-IBC in independent datasets. Here, we compare these previously defined gene signatures in their ability to distinguish IBC from non-IBC, based on an analysis of three separate microarray datasets, none of which were directly involved in identifying the four gene lists (132 GES, 109 GES, 78 GES, 50 GES). Additionally, we elucidate the EMT status of IBC samples using three different transcriptomics-based EMT scores. To the best of our understanding, this is the first study that contrasts different IBC gene signatures and EMT scoring across IBC datasets.
Mechanistic studies utilizing in vitro and in vivo models have revealed some markers for IBC, such as P-cadherin [23]. This molecule has also been proposed as a marker of the hybrid epithelial/mesenchymal (E/M) phenotype [24] due to its role in promoting collective cell migration and invasion [25,26] as well as tumor-initiating properties [27]; both of these properties are considered hallmarks of hybrid E/M phenotype(s) [28]. P-cadherin (CDH3) is also a transcriptional target of NP63α [29], another potential ‘phenotypic stability factor’ (PSF) for a hybrid E/M phenotype [19]. Similar to other PSFs [30], [31], [32], overexpression of P-cadherin associates with poor clinical outcome in invasive breast carcinomas [33]. Another pathway that has been reported to be enriched in IBC is the IL-6 pathway [19], which can promote Notch-JAG1 signaling [34]. JAG1 was reported as one of the top upregulated genes in collectively migrating cells [35], and its knockdown severely inhibited emboli formation in SUM149 IBC cells [34]. Moreover, given the role of IL-6 in mediating tumor-stroma crosstalk in IBC [36], it is possible that IL-6 mediates cell-cell communication both among IBC cells and with stroma. Despite these promising mechanistic insights, no accurate predictive signature for IBC exists.
Our results highlight that the proposed IBC gene lists so far (109 GES, 78 GES, 50 GES and 132 GES), despite showing a good accuracy in their corresponding validation datasets, show quite limited success in segregating IBC from non-IBC samples in independent cases. Various reasons may contribute to this result: inconsistency in the identification of IBC in different clinical samples, possible contamination by stromal cells in the samples investigated, technical aspects such as differences in the platforms used for RNA extraction and analysis, and/or lack of metrics other than gene enrichment. To further indicate this failure of transferability, we used logistic regression (LR) methods to identify predictors (genes) that can best segregate between IBC and non-IBC, but no common trend was seen even in the top 2000 predictors collated from each of the three clinical datasets investigated. Put together, there exists a need to examine alternative metrics to be able to accurately identify IBC samples and perhaps gene expression on its own is insufficient to distinguish between IBC and non-IBC. One potential way to overcome the existing limitation may be to apply multivariate LR, but more training data to identify IBC from non-IBC samples would be required there to prevent any overfitting. Another approach would be to incorporate additional modalities of data, e.g. proteomics, to distinguish between IBC and nIBC.
Given the extensive literature on the role of a partial or complete EMT in collective migration and metastasis in breast cancer [35,37,38], we investigated whether a more mesenchymal phenotype correlated with enrichment of ssGSEA scores for IBC gene lists in breast cancer samples. Indeed, ssGSEA scores for two IBC gene lists (50 GES, 132 GES) correlated significantly with more mesenchymal samples, irrespective of whether those samples belonged to IBC or non-IBC. However, the ssGSEA scores for the 78 GES list correlated with a more epithelial phenotype specifically for IBC samples, reminiscent of 78 GES being associated with attenuated TGFβ signaling that may drive collective migration. Put together, one way to interpret these results may be that an ‘intermediate’ EMT associates with IBC, but at large, this inconsistency demonstrates a complex relation between IBC and EMT-ness and strengthens the idea of EMT-related heterogeneity in IBC. It is worth noting that these gene lists have been identified based on primary tumors, and the expression signatures of primary tumors may contain little information about whether circulating tumor cells (CTCs) migrate individually or collectively [39]. Further characterization of emboli or clusters of CTCs in IBC will help deconvolute the contribution of EMT to IBC metastasis. Moreover, single-cell analysis of primary tumors and CTCs of IBC should help in identifying various immune cell subsets in IBC which may drive disease progression. The composition and/or spatial localization of immune cells is likely to yield better insights into the immune ecology of a highly aggressive disease such as IBC [40,41].
The EMT scores for IBC as well as non-IBC samples did not indicate a ‘full-blown’ EMT (i.e. MLR score > 1.5 or equivalently KS score > 0.6), thus strengthening our previous observations that a complete EMT is not required for metastasis [42]. Intriguingly, we did observe a higher heterogeneity in EMT scores for IBC samples as compared to non-IBC samples. It remains to be ascertained whether the inter-tumor heterogeneity in EMT scores in IBC is also reflected as high intra-tumor heterogeneity. If that turns out to be the case, a higher intra-tumor heterogeneity along the EMT spectrum can be considered a potential biomarker for IBC. Further, this heterogeneity may be a contributing factor to the lack of a unique gene expression signature that can be ascribed to IBC per se.
While genomic heterogeneity in tumors has been extensively studied, quantifying non-genetic (i.e. phenotypic) heterogeneity has been possible only recently through investigating cell-to-cell variability in isogenic populations [43], [44], [45], [46]. Higher phenotypic heterogeneity may encourage cancer invasion [47] as well as the evolution of therapy resistance [48]. It may arise from network topology features such as mutually inhibitory feedback loops [49], for instance, the loop between RKIP and BACH1 [50] or that between AMPK and AKT [51]. Our earlier attempts have highlighted network topology-based features such as hierarchical organization as a marker for IBC [42], suggesting that quantitative metrics for dissecting phenotypic heterogeneity in IBC may be better poised to highlight the hallmarks of IBC than searches for gene signatures.
Declaration of Competing Interest
The authors declare that there are no competing interests.
Acknowledgments
This work was supported by Ramanujan Fellowship (SB/S2/RJN-049/2018) awarded to MKJ by Science and Engineering Research Board (SERB), Department of Science and Technology (DST), Government of India.
Ethics statement
No ethics approval was needed because this study was based on data analysis of publicly available gene expression datasets GSE22597, GSE45581, and GSE5847 available on https://www.ncbi.nlm.nih.gov/geo/. No new datasets were generated here.
Author contributions
PC and JTG performed research, HL and WW analyzed data, MKJ designed and supervised research. All authors participated in writing and editing of the manuscript.
Methods
All analyses were performed in R version 3.4.4, and data were plotted using the ggplot2 package.
Datasets: Three separate IBC datasets were used in this study. The GSE5847, GSE22597 and GSE45581 datasets were downloaded from the NCBI GEO website.
Principal component analysis (PCA): PCA was performed using the prcomp function available in R and plotted using the factoextra R package.
ssGSEA analysis: ssGSEA for the different gene sets was performed using the GSVA R/Bioconductor package with the “ssgsea” option for the method argument.
k-means clustering: k-means clustering was performed using the R package “cluster”, with the number of centers set to two to obtain two separate clusters.
EMT scoring: Three different EMT scoring methods (KS, MLR and 76GS) were used to score samples separately in the three datasets [20].
Statistical analysis: The significance of all pairwise comparisons was tested using Student's t-test. The significance of the enrichment of IBC and nIBC samples across clusters was tested using Fisher's exact test.
Permutation/Randomization test
A permutation test was performed to assess the significance of a gene signature relative to random sets of genes. First, ssGSEA scores were calculated for the original gene signature. Then, the same number of genes was chosen at random from all available genes, and ssGSEA scores were calculated for this random gene set. This process was repeated 1000 times to generate a null distribution of ssGSEA scores. For each gene set, the ssGSEA scores were compared across the IBC and nIBC groups to obtain the difference in mean values and its significance based on a t-test. This test was used to show that the original signature yields a larger difference in mean ssGSEA score and a lower p-value than most of the random gene sets.
Resolution of IBC vs. nIBC via iterative logistic regression: Samples from GSE22597 were first identified and categorized into IBC and nIBC samples as previously reported, with all other samples omitted from analysis. The predictor set comprised all transcripts, and for each individual transcript a binomial logistic regression was fitted to the categorical IBC status. The output assigns to each transcript a generalized residual sum of squared errors, or deviance, with smaller values corresponding to a better fit. The transcripts were then sorted in increasing order of deviance. For each of the top ten predictors coding for genes, a leave-one-out assessment of predictive ability was performed. In each case, the logistic regression model was constructed again, this time on all but one sample. The corresponding statistical model was then used to predict the IBC status of the withheld sample. This procedure was repeated iteratively, withholding a distinct sample each time, and the results of the predictions were aggregated to estimate predictive accuracy.
This process was repeated for the two additional datasets (GSE45581 and GSE5847). The top ten predictors for each dataset, together with their deviance values and predictive accuracies, are listed in Fig. S3. A logistic regression (LR)-specific IBC signature was then defined by taking the mutual intersection of the top 2000 scoring transcripts from each dataset (Fig. 3).
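The single-predictor procedure described above (fit, deviance, leave-one-out prediction) can be sketched in standard-library Python. The fitting method shown here, gradient ascent on a standardized predictor with a fixed number of iterations, is an assumption chosen for brevity and is not the routine used in the study; the function names and the toy data are likewise hypothetical.

```python
from math import exp, log
from statistics import mean, pstdev


def _sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(x, y, n_iter=500, lr=0.5):
    """Fit P(IBC | x) for ONE transcript; returns a prediction function."""
    mu, sd = mean(x), (pstdev(x) or 1.0)
    z = [(v - mu) / sd for v in x]
    n = len(z)
    w = b = 0.0
    for _ in range(n_iter):
        resid = [yi - _sigmoid(w * zi + b) for yi, zi in zip(y, z)]
        # Gradient ascent on the mean log-likelihood.
        w += lr * sum(r * zi for r, zi in zip(resid, z)) / n
        b += lr * sum(resid) / n
    return lambda v: _sigmoid(w * (v - mu) / sd + b)


def deviance(x, y):
    """-2 x log-likelihood of the fitted model; smaller means better fit."""
    predict = fit_logistic(x, y)
    eps = 1e-12
    return -2.0 * sum(
        yi * log(max(predict(xi), eps)) + (1 - yi) * log(max(1 - predict(xi), eps))
        for xi, yi in zip(x, y))


def loo_accuracy(x, y):
    """Leave-one-out accuracy: refit without sample i, then predict sample i."""
    hits = 0
    for i in range(len(x)):
        predict = fit_logistic(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        hits += int((predict(x[i]) >= 0.5) == bool(y[i]))
    return hits / len(x)


# Toy example: 1 = IBC, 0 = nIBC; one informative and one uninformative transcript.
status = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
uninformative = [0.5, 1.0, 0.2, 1.3, 0.6, 0.9, 0.3, 1.2]
ranked = sorted([("informative", deviance(informative, status)),
                 ("uninformative", deviance(uninformative, status))],
                key=lambda item: item[1])
print([name for name, _ in ranked])
print(loo_accuracy(informative, status))
```

Sorting transcripts by increasing deviance and then running `loo_accuracy` on the top-ranked ones mirrors the two steps of the procedure; with perfectly separating predictors, an unpenalized fit does not converge, which is one reason the iteration count is capped in this sketch.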
Data availability
Data sharing is not applicable to this article, as no datasets were generated during the current study.
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.tranon.2021.101026.