Abstract
The claudin-low breast cancer subtype is defined by gene expression characteristics and encompasses a remarkably diverse range of breast tumors. Here, we investigate genomic, transcriptomic, and clinical features of claudin-low breast tumors. We show that claudin-low is not simply a subtype analogous to the intrinsic subtypes (basal-like, HER2-enriched, luminal A, luminal B and normal-like) as previously portrayed, but is a complex additional phenotype which may permeate breast tumors of various intrinsic subtypes. Claudin-low tumors are distinguished by low genomic instability, mutational burden and proliferation levels, and high levels of immune and stromal cell infiltration. In other aspects, claudin-low tumors reflect characteristics of their intrinsic subtype. Finally, we explore an alternative method for identifying claudin-low tumors and thereby uncover potential weaknesses in the established claudin-low classifier. In sum, these findings elucidate the heterogeneity in claudin-low breast tumors, and substantiate a re-definition of claudin-low as a cancer phenotype.
Subject terms: Cancer, Breast cancer, Classification and taxonomy, Diagnostic markers
In breast cancer, the claudin-low breast cancer subtype is remarkably diverse. Here, the authors propose that claudin-low is not a classical intrinsic breast cancer subtype, but rather a complex additional phenotype that can occur across intrinsic subtypes.
Introduction
The five breast cancer intrinsic subtypes were initially identified by hierarchical clustering of genes with significantly greater variation in expression between different breast tumors than between paired tumor samples pre- and post-chemotherapy1,2. Claudin-low breast tumors did not emerge as an independent group in this analysis. The claudin-low breast cancer subtype was discovered 7 years later in an integrated analysis of human and murine mammary tumors3. The subtype has since been observed in several independent breast cancer cohorts4–9, and an analogous claudin-low subtype has been identified in bladder cancer10,11.
The claudin-low breast cancer subtype is defined by gene expression characteristics, most prominently: Low expression of cell–cell adhesion genes, high expression of epithelial–mesenchymal transition (EMT) genes, and stem cell-like/less differentiated gene expression patterns12. Beyond these gene expression features, claudin-low tumors have marked immune and stromal cell infiltration9,12, but are in many other aspects remarkably heterogeneous. No specific genomic aberrations accurately delineate claudin-low tumors, and there is a greater variation in mutational burden and degree of copy number aberration (CNA) than in the other breast cancer subtypes13. Claudin-low tumors are, however, often genomically stable, potentially due to their less differentiated state and a protective effect mediated by the EMT-related transcription factor ZEB114,15. Claudin-low breast tumors are reported to be mostly estrogen receptor (ER)-negative, progesterone receptor (PR)-negative, and human epidermal growth factor receptor 2 (HER2)-negative (triple negative), and are associated with poor prognosis12,16. The prevalence of claudin-low breast cancer shows striking variability, ranging from 1.5% to 14% of tumors in breast cancer cohorts5,7,8,12.
An algorithm (predictor) for identifying claudin-low tumors was described with the original characterization of the subtype12. Briefly, nine claudin-low cell lines were identified by hierarchical clustering of gene expression values of 1906 breast cancer intrinsic genes17 in a cohort of 52 cell lines. Cell lines, rather than bulk tumor samples, were used to build the claudin-low predictor in order to minimize immune and stromal infiltration as confounding factors12. Two centroids were then defined: one for the cell lines with claudin-low gene expression features and one for all other breast cancer cell lines. A tumor is classified as claudin-low if its gene expression values correlate more strongly with the claudin-low centroid than with the other centroid. Importantly, the intrinsic subtypes (basal-like, HER2-enriched, luminal A, luminal B and normal-like) are first identified using the PAM50 predictor17, and claudin-low subtyping is subsequently performed as an isolated second step12. In published studies, claudin-low is treated as a sixth intrinsic subtype, and the subtype assigned by PAM50 is therefore overwritten in claudin-low tumors5,8,9,12. As a consequence, claudin-low breast tumors have thus far been characterized as a single group, without regard for the distribution of the underlying intrinsic subtypes among them8,9,12,13.
In this study, we aim to elucidate the heterogeneity observed in claudin-low breast cancer. By stratifying claudin-low tumors according to intrinsic subtype, we show that the characteristics of claudin-low tumors reflect the intrinsic subtype to which they are initially assigned. Further, we explore an alternative method for identifying claudin-low tumors, and demonstrate that the nine-cell line claudin-low predictor12 may be overly inclusive in classifying tumors with marked immune and stromal infiltration as claudin-low.
Results
Claudin-low breast tumors are delineated by intrinsic subtype
We identified 87 claudin-low tumors (4.6%) in the METABRIC cohort4,5 using the nine-cell line claudin-low predictor12,18 (Supplementary Data 1). By intrinsic subtype, most of these were classified as basal-like (51.7%, n = 45), normal-like (32.2%, n = 28) or luminal A (LumA; 10.3%, n = 9) (Fig. 1a, Table 1). Claudin-low tumors accounted for 14.6% and 15.3% of all basal-like and normal-like tumors, respectively. The three remaining subtypes were also represented in the set of claudin-low tumors, but at lower prevalence, with claudin-low tumors making up 0.6–1.3% of tumors in each of these subtypes. The distribution of intrinsic subtypes among claudin-low tumors differed significantly from that among non-claudin-low tumors (P < 0.001, χ2-test). Basal-like and normal-like tumors were significantly overrepresented in the set of claudin-low tumors, while the remaining intrinsic subtypes were significantly underrepresented (P = 0.001 for HER2-enriched, P < 0.001 for all others, Fisher’s exact test). Only two HER2-enriched and three luminal B (LumB) tumors were classified as claudin-low; these two subtypes were not analyzed further due to low sample numbers. Claudin-low tumors broadly showed histology similar to that of non-claudin-low tumors (Supplementary Data 1), with 70% of tumors classified as no special type (NST). The cohort contained one metaplastic tumor, which was classified as claudin-low.
Table 1.
Intrinsic subtype | Claudin-low (n) | Non-claudin-low (n) | Proportion claudin-low in subtype |
---|---|---|---|
Basal-like | 45 | 263 | 14.6% |
HER2-enriched | 2 | 231 | 0.9% |
LumA | 9 | 684 | 1.3% |
LumB | 3 | 466 | 0.6% |
Normal-like | 28 | 155 | 15.3% |
Source data are provided as a Source Data file.
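The proportions in Table 1, and the per-subtype over/underrepresentation tests, can be reproduced from the counts alone. The sketch below is our own stdlib-only illustration, not the authors' analysis code: it recomputes each proportion and runs a two-sided Fisher's exact test on a 2×2 table (claudin-low vs non-claudin-low × this subtype vs all others), summing the probabilities of all tables no more likely than the observed one, which is the same convention R's `fisher.test` uses. Exact P values may differ slightly from those reported if the original tables were set up differently.

```python
from math import comb

# Counts transcribed from Table 1: subtype -> (claudin-low, non-claudin-low).
TABLE_1 = {
    "Basal-like": (45, 263),
    "HER2-enriched": (2, 231),
    "LumA": (9, 684),
    "LumB": (3, 466),
    "Normal-like": (28, 155),
}


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the observed
    margins that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def pmf(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = pmf(a)
    lo, hi = max(0, row1 - (n - col1)), min(col1, row1)
    p = sum(q for q in (pmf(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


total_cl = sum(cl for cl, _ in TABLE_1.values())
total_non = sum(non for _, non in TABLE_1.values())

results = {}
for subtype, (cl, non) in TABLE_1.items():
    proportion = cl / (cl + non)
    # Rows: claudin-low vs non-claudin-low; columns: this subtype vs all others.
    p = fisher_two_sided(cl, total_cl - cl, non, total_non - non)
    results[subtype] = (round(100 * proportion, 1), p)
    print(f"{subtype}: {proportion:.1%} claudin-low, two-sided Fisher P = {p:.2g}")
```

The totals also recover the cohort figures given in the text: 87 claudin-low tumors out of 1886.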
The proportion of ER-positive tumors differed significantly between claudin-low tumors of different intrinsic subtypes (Fig. 1b; P < 0.001, χ2-test). Of basal-like, LumA and normal-like claudin-low tumors, 28.6%, 100% and 85.7%, respectively, were ER-positive, closely reflecting the pattern seen in non-claudin-low tumors (Fig. 1b). These findings indicate that ER expression in claudin-low tumors reflects their intrinsic subtype, and that characterizing claudin-low tumors as a triple-negative subgroup of breast cancer9,12 is an oversimplification.
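The χ2-test used here compares observed counts with those expected if ER status were independent of intrinsic subtype. The sketch below is a stdlib-only illustration; the counts are hypothetical, chosen only to match the reported percentages, and are not the METABRIC counts (denominators may differ because of missing ER status). It handles only even degrees of freedom, where the χ2 upper tail has a closed form, so it covers the 3×2 table here (df = 2) but not 2×2 tables (df = 1), for which R's `chisq.test` also applies Yates' continuity correction by default. With expected counts below 5 in some cells, as in the LumA row, the χ2 approximation is rough.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared distribution with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def chi2_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf_even_df(stat, df)


# Illustrative (ER-positive, ER-negative) counts per claudin-low subtype.
# Chosen only to match the reported percentages; NOT the METABRIC counts.
er_table = [
    [12, 30],  # basal-like, 28.6% ER-positive
    [9, 0],    # LumA, 100% ER-positive
    [24, 4],   # normal-like, 85.7% ER-positive
]
stat, df, p = chi2_independence(er_table)
print(f"chi2 = {stat:.1f}, df = {df}, P = {p:.1e}")
```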
Claudin-low tumors, as a whole, have previously been reported to have a low mutational burden and low level of genomic instability compared to the other subtypes13,14. Whole-genome copy number data and sequence data from a panel of 173 cancer-associated genes were available for the METABRIC cohort4,5. When stratified by intrinsic subtype, claudin-low tumors consistently showed lower mutational burden and genomic instability than their non-claudin-low counterparts (Fig. 1c, d), with the exception of genomic instability in LumA tumors. There were, however, also significant differences in mutational burden (P = 0.002, Kruskal–Wallis test) and genomic instability (P < 0.001, Kruskal–Wallis test) between claudin-low tumors of different intrinsic subtypes. Despite some subtype-specific variation, these findings point toward low mutational burden and low genomic instability as bona fide claudin-low characteristics.
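This section does not spell out how genomic instability is quantified. The sketch below assumes one simple formulation of a genomic instability index (GII): the fraction of the segmented genome whose copy number differs from the tumor's baseline ploidy, computed from copy number segments and ploidy (the data types listed in the Methods). The function name, segment layout and rounding rule are our assumptions; the definition used for METABRIC may differ, for example by weighting chromosomes equally.

```python
def genomic_instability_index(segments, ploidy):
    """Fraction of the segmented genome whose copy number differs from baseline.

    segments: iterable of (chromosome, start, end, copy_number) tuples.
    ploidy: estimated tumor ploidy; the baseline is its nearest integer
    (note that Python's round() rounds exact halves to the even integer).
    ASSUMPTION: this definition of GII is illustrative and may differ from
    the one used for the METABRIC analyses.
    """
    baseline = round(ploidy)
    total = altered = 0
    for _chrom, start, end, copy_number in segments:
        length = end - start
        total += length
        if round(copy_number) != baseline:
            altered += length
    return altered / total if total else 0.0


# Hypothetical segments (positions in arbitrary units).
segments = [
    ("1", 0, 100, 2.0),
    ("1", 100, 150, 3.1),  # gain relative to a diploid baseline
    ("8", 0, 50, 2.0),
    ("8", 50, 100, 0.9),   # loss
]
print(genomic_instability_index(segments, ploidy=2.1))
```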
Curtis et al.4 introduced breast cancer subtypes (IntClust) defined by patterns of CNA with cis correlation to gene expression. The genomically stable IntClust4 subtype showed overlap with claudin-low tumors4,5,14. In our analyses, 75% of all claudin-low tumors in the METABRIC cohort were classified as IntClust4. Stratified by intrinsic subtype, claudin-low tumors were consistently more likely to be classified as IntClust4 than non-claudin-low tumors of the same subtype (Fig. 1e). There was, however, significant variation in the proportion of claudin-low tumors classified as IntClust4 (P < 0.001, χ2-test), ranging from 60% of basal-like claudin-low tumors to 100% of normal-like claudin-low tumors. Further, IntClust4 tumors have been separated into ER-positive and ER-negative groups due to major differences in their biological and clinical characteristics, despite strong similarities in gene expression patterns and shared low levels of CNA4,5,19. Claudin-low tumors classified as IntClust4ER+ were predominantly LumA and normal-like, whereas claudin-low tumors classified as IntClust4ER− were predominantly basal-like (Supplementary Fig. 1a, b).
The high frequency of claudin-low tumors classified as IntClust4 supports the association between claudin-low gene expression characteristics and genomic stability. However, only 21% of all IntClust4 tumors in the METABRIC cohort were classified as claudin-low, and genomic instability index (GII) did not accurately predict correlation to the claudin-low centroid, as determined by the nine-cell line predictor12 (Supplementary Fig. 2). Thus, while most claudin-low tumors were genomically stable, only a subset of genomically stable tumors were claudin-low.
No putative driver20 mutations or CNAs were found at a significantly higher rate in claudin-low tumors, stratified by intrinsic subtype, than in non-claudin-low tumors of the same subtype (Fisher’s exact test, Bonferroni corrected; Supplementary Data 2). Rather, claudin-low tumors tended to exhibit patterns of mutation/CNA associated with their intrinsic subtype. Reflecting the lower levels of genomic instability and mutational burden, claudin-low tumors generally had lower incidences of potential driver aberrations compared to their non-claudin-low counterparts. To illustrate the relative frequencies of driver aberrations in claudin-low and non-claudin-low tumors, we selected four early genomic driver aberrations for further analysis: TP53 mutation, PIK3CA mutation, MYC gain (located on 8q24), and MDM4 gain (located on 1q32). Similar to the pattern observed for ER-positivity, the incidence of TP53 mutations in claudin-low tumors largely followed the incidence seen in the tumors’ intrinsic subtype (Fig. 1f). The differences in TP53 mutation rates between claudin-low tumors stratified by intrinsic subtype were statistically significant (P < 0.001, χ2-test). There were similar trends for the other three aberrations analyzed (Fig. 1g–i). Claudin-low tumors stratified by intrinsic subtype showed significantly different rates of MYC and MDM4 gain (P < 0.001 and P = 0.006, χ2-test), but not PIK3CA mutation (P = 0.19, χ2-test).
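The comparison above involves two steps: computing the incidence of each aberration within claudin-low and non-claudin-low tumors of a subtype, and correcting the resulting P values for the number of aberrations tested. A minimal sketch of both steps follows; the per-tumor calls and raw P values are hypothetical, and the number of tests performed in the study is not stated in this section.

```python
def incidence(calls):
    """Fraction of tumors carrying an aberration (calls: list of booleans)."""
    return sum(calls) / len(calls)


def bonferroni(p_values):
    """Bonferroni-adjust a dict of raw P values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Hypothetical per-tumor TP53 calls for two groups of the same intrinsic subtype.
claudin_low_tp53 = [True, False, False, False, True, False, False, False]
non_claudin_low_tp53 = [True, True, False, True, True, False, True, True]
print(incidence(claudin_low_tp53), incidence(non_claudin_low_tp53))

# Hypothetical raw P values for four aberrations tested within one subtype.
raw = {"TP53": 0.004, "PIK3CA": 0.19, "MYC": 0.02, "MDM4": 0.03}
adjusted = bonferroni(raw)
print({name: p for name, p in adjusted.items() if p < 0.05})
```

Bonferroni correction is conservative: with many aberrations tested, only large differences survive, which is one reason a lack of significant hits should not be read as evidence of identical aberration rates.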
Claudin-low tumors have previously been characterized as slower cycling, with proliferation levels lower than in basal-like tumors but higher than in LumA and normal-like tumors8,12. Ki-67, encoded by the MKI67 gene, is a commonly used proliferation marker. When claudin-low tumors were stratified by intrinsic subtype, MKI67 expression differed significantly between subtypes (Fig. 1j; P < 0.001, Kruskal–Wallis test), with basal-like claudin-low tumors showing significantly higher MKI67 expression than LumA claudin-low and normal-like claudin-low tumors (P < 0.001 for both, Wilcoxon rank-sum test). Claudin-low tumors did, however, also show significantly lower MKI67 expression than their non-claudin-low counterparts in all three intrinsic subtypes analyzed (Fig. 1j; basal-like P = 0.01, LumA P = 0.03, normal-like P < 0.001, Wilcoxon rank-sum test). Thus, MKI67 expression levels indicate that claudin-low tumors reflect the proliferation levels of their intrinsic subtype but are also slower cycling than their non-claudin-low counterparts.
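The Wilcoxon rank-sum test used throughout compares two groups by the ranks of their pooled values rather than the values themselves. The stdlib-only sketch below uses the normal approximation with a tie correction and no continuity correction; statistical software such as R's `wilcox.test` may instead use exact P values or a continuity correction, so results can differ slightly. The MKI67 values are hypothetical.

```python
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie correction, no continuity correction.
    Returns (U statistic for x, two-sided P value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average 1-based rank for a block of ties
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = (n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))) ** 0.5
    if sigma == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mu) / sigma
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical MKI67 z-scores within one intrinsic subtype.
claudin_low = [1.2, 0.8, 1.5, 0.9, 1.1]
non_claudin_low = [2.1, 1.9, 2.5, 1.7, 2.2, 2.0]
u, p = rank_sum_test(claudin_low, non_claudin_low)
print(f"U = {u}, P = {p:.4f}")
```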
Claudin-low tumors have previously been associated with poor prognosis8,12. This characterization was accurate when claudin-low tumors were viewed as a single group (Supplementary Fig. 1c). However, when the survival of patients with claudin-low tumors was stratified by intrinsic subtype, the survival patterns generally observed in non-claudin-low breast cancer2 re-emerged (Fig. 2a). Further, there were no significant differences in survival between patients with claudin-low and non-claudin-low tumors within each intrinsic subtype (Fig. 2b–d). Thus, we did not find evidence indicating that claudin-low status affects survival in breast cancer patients.
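Survival curves such as those in Fig. 2 are typically Kaplan–Meier estimates: at each time an event occurs, survival is multiplied by the fraction of at-risk patients who did not have the event. The sketch below is a minimal stdlib illustration with hypothetical follow-up data; for disease-specific survival, an event would be death from breast cancer, and other deaths would be censored. The log-rank test usually used to compare curves is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 if the endpoint occurred
    (e.g. death from breast cancer), 0 if the patient was censored.
    Returns [(t, S(t))] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients with the same time; events and censorings at t
        # are both counted in the risk set at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up times (years) and event indicators for one group.
times = [2, 3, 3, 5, 8, 9]
events = [1, 0, 1, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t = {t}: S = {s:.3f}")
```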
Claudin-low tumors have been reported to occur mostly in younger patients, with age at diagnosis slightly higher than for basal-like tumors but lower than for the remaining subtypes8,9. When claudin-low tumors were stratified by intrinsic subtype, however, there were significant differences in age at diagnosis (P = 0.01, Kruskal–Wallis test; Supplementary Fig. 1d), with basal-like claudin-low tumors diagnosed at a significantly lower age than LumA claudin-low and normal-like claudin-low tumors (P = 0.03 and 0.01, respectively, Wilcoxon rank-sum test). Claudin-low and non-claudin-low tumors of the same intrinsic subtype showed similar age at diagnosis (basal-like P = 0.67, LumA P = 0.53, normal-like P = 0.052, two-tailed Wilcoxon rank-sum test).
A condensed gene list refines claudin-low classification
Claudin-low tumors have been shown to exhibit high degrees of immune and stromal infiltration9,12. In the METABRIC cohort, claudin-low tumors also had higher immune and stromal cell infiltration than non-claudin-low tumors within each intrinsic subtype, as determined by ESTIMATE, a gene expression-based tool for inferring normal-cell infiltration in tumors21 (Supplementary Fig. 3a, b). The nine-cell line claudin-low predictor uses 807 genes, and Prat et al. acknowledge that it may inappropriately identify some tumors as claudin-low solely due to stromal infiltration12. This is supported by a strong correlation between a tumor’s inferred21 stromal infiltration and its closeness to the nine-cell line claudin-low centroid (R² = 0.76, linear regression; Supplementary Fig. 3c, d). A similar but weaker trend was observed for inferred21 immune cell infiltration (R² = 0.27, linear regression; Supplementary Fig. 3e, f). We therefore considered whether a more targeted gene list could be used for claudin-low classification, in order to isolate features intrinsic to claudin-low tumors more accurately.
We created a condensed claudin-low gene list (Supplementary Table 1), consisting of 19 genes representing only the pathognomonic gene expression characteristics of claudin-low tumors: Low expression of cell–cell adhesion genes, high expression of EMT genes, and gene expression patterns typical of stem cell-like/less differentiated cells3,8,9,12,14. In the METABRIC cohort, hierarchical clustering of gene expression values, using the condensed gene list, revealed a tumor cluster with gene expression characteristics in line with those previously described in claudin-low tumors (Fig. 3; P = 0.006, SigClust22). We refer to tumors in this cluster as core claudin-low (CoreCL), while claudin-low tumors (as defined by the nine-cell line predictor) outside the CoreCL cluster are referred to as other claudin-low (OtherCL). Individual inspection of gene expression values showed that OtherCL tumors displayed certain claudin-low characteristics, albeit to a lesser degree than CoreCL tumors (Supplementary Fig. 4).
The CoreCL cluster consisted of 79 tumors (4.2% of tumors in the cohort), of which 57 (72.2%) were identified as claudin-low by the nine-cell line predictor (Supplementary Data 1). While several intrinsic subtypes were prominently represented in the group of CoreCL tumors, the OtherCL (n = 30) tumors were predominantly basal-like (n = 23; Fig. 4a). Thus, our method for identifying claudin-low tumors primarily differed from the nine-cell line predictor by filtering out a set of basal-like tumors with high levels of stromal and immune infiltration (Supplementary Fig. 5a, b), but without pathognomonic claudin-low gene expression characteristics (Supplementary Fig. 6).
There were marked contrasts between the characteristics of basal-like CoreCL tumors (n = 25), basal-like OtherCL tumors (n = 23), and non-claudin-low basal-like tumors (n = 260). Basal-like CoreCL tumors carried significantly fewer mutations than basal-like OtherCL tumors and non-claudin-low basal-like tumors (Fig. 4b; P = 0.015 and P < 0.001, respectively, Wilcoxon rank-sum test). Basal-like CoreCL tumors also displayed significantly lower levels of genomic instability than basal-like OtherCL tumors and non-claudin-low basal-like tumors (Fig. 4c; P < 0.001 for both, Wilcoxon rank-sum test), whereas GII did not differ significantly between basal-like OtherCL tumors and non-claudin-low basal-like tumors (Fig. 4c; P = 0.082, Wilcoxon rank-sum test). A greater proportion of basal-like CoreCL tumors was classified as IntClust4 than of basal-like OtherCL and non-claudin-low basal-like tumors (Fig. 4d, Supplementary Fig. 5c, d): 80% of basal-like CoreCL tumors, compared with 43% of basal-like OtherCL tumors and 10% of basal-like non-claudin-low tumors. Basal-like CoreCL tumors also had lower rates of TP53 mutation, MYC gain and MDM4 gain than basal-like OtherCL and basal-like non-claudin-low tumors, reflecting their lower mutational burden and GII (Supplementary Fig. 5e–g). This trend was, however, not evident for PIK3CA mutation (Supplementary Fig. 5h). Basal-like CoreCL tumors expressed significantly lower levels of MKI67 than basal-like OtherCL and basal-like non-claudin-low tumors (Fig. 4e; P < 0.001 for both, Wilcoxon rank-sum test), whereas MKI67 expression did not differ significantly between basal-like OtherCL and basal-like non-claudin-low tumors (P = 0.63, Wilcoxon rank-sum test). In sum, the characteristics of basal-like OtherCL tumors are less concordant with those of claudin-low tumors than are the characteristics of basal-like CoreCL tumors.
It is therefore likely that OtherCL tumors are classified as claudin-low by the nine-cell line predictor due to their stromal infiltration (Supplementary Figs. 3c, 5b), and that the classification of these tumors as claudin-low may be dubious.
Despite differences in genomic and transcriptomic features, as well as in immune and stromal infiltration, there were no significant differences in survival between basal-like CoreCL, basal-like OtherCL and non-claudin-low basal-like tumors (Fig. 4f). These findings reinforce our observations indicating that claudin-low status is not a major determinant of survival in breast cancer patients.
There were few OtherCL samples not classified as basal-like (n = 1, 3, and 3 for LumA, LumB and normal-like tumors, respectively; Fig. 4a). The characteristics of normal-like CoreCL (n = 39) and LumA CoreCL (n = 13) tumors were similar to the characteristics of normal-like claudin-low and LumA claudin-low tumors identified by the nine-cell line predictor (Supplementary Figs. 7, 8). These findings indicate that the nine-cell line predictor’s promiscuous classification of stromally infiltrated tumors as claudin-low may mostly be of concern in basal-like tumors.
Validation cohorts reinforce key claudin-low characteristics
To validate our findings, we queried the Oslo2 cohort23, for which gene expression data and ER/HER2 status were available. The cohort contained 29 claudin-low tumors (7.6%), as defined by the nine-cell line predictor, most of which were classified as basal-like, LumA or normal-like (n = 7, 5 and 11, respectively; Supplementary Fig. 9a, Supplementary Data 3). Clustering with the condensed claudin-low gene list revealed a cluster with claudin-low gene expression characteristics and high levels of immune and stromal cell infiltration (Supplementary Fig. 9b; P < 0.001, SigClust22). Twenty-eight tumors in the cohort (7.3%) were located in this core claudin-low cluster (Supplementary Fig. 9c), of which 16 (57%) were identified as claudin-low by the nine-cell line predictor. Seven basal-like tumors were classified as claudin-low by the nine-cell line predictor; two of these were CoreCL, both classified as IntClust4, and the remaining five were OtherCL, none of which were IntClust4. Using IntClust4 as a surrogate marker for low levels of genomic instability4,19,24, these findings support the notion that the nine-cell line predictor may be overly inclusive in identifying basal-like tumors as claudin-low. The OtherCL tumors in the Oslo2 cohort were, however, more diverse than in the METABRIC cohort, with 7 of 12 OtherCL tumors being non-basal-like (n = 1, 4 and 2 for HER2-enriched, LumA and LumB, respectively). In total, 89% of CoreCL tumors in the Oslo2 cohort were classified as IntClust4, compared with 38% of OtherCL tumors and 20% of non-claudin-low tumors. Thus, the characteristics of claudin-low tumors in the Oslo2 cohort were largely consistent with those observed in the METABRIC cohort.
Finally, we explored the TCGA breast cancer cohort7,25. Of 1082 tumors, 32 (3.0%) were classified as claudin-low by the nine-cell line predictor (Supplementary Data 4); however, no core claudin-low cluster emerged (Supplementary Fig. 10). As noted above, non-tumor cell infiltration is a central characteristic of claudin-low tumors. An inclusion criterion in the TCGA protocol is a tumor cellularity above 60%7. The METABRIC cohort was originally divided into a discovery cohort with a cellularity cut-off of 40%, which had a claudin-low prevalence of 3.6%, and a validation cohort with no cellularity cut-off4, which had a claudin-low prevalence of 5.6%. The Oslo2 cohort23 had no cellularity cut-off and a claudin-low prevalence of 7.6%. Thus, stricter cellularity cut-offs may be associated with lower claudin-low prevalence (Fig. 5). This association strengthens the observation of non-tumor cell infiltration as a fundamental claudin-low characteristic and may explain the absence of a core claudin-low cluster in the TCGA-BRCA cohort.
Claudin-low tumors in the TCGA breast cancer cohort mostly showed histological features in line with those of non-claudin-low tumors (Supplementary Data 4). Six of the eight metaplastic tumors in the cohort were classified as claudin-low, consistent with previous reports that most metaplastic tumors are claudin-low26.
Discussion
Here, we have re-evaluated the characteristics of claudin-low breast tumors, from the perspective of claudin-low as a phenotype that may permeate the intrinsic subtypes. Through analyses of genomic, transcriptomic and clinical data, we have shown that the characteristics of claudin-low tumors reflect their intrinsic subtype. Characteristics that are associated with claudin-low status include marked immune and stromal cell infiltration, low levels of genomic instability and mutational burden, and reduced proliferation levels. Finally, we explored an alternative method for identifying claudin-low tumors, and thereby showed that a subset of tumors with pronounced immune and stromal infiltration may be inappropriately classified as claudin-low by the established claudin-low predictor12.
We stratified claudin-low tumors by intrinsic subtype and found differences between claudin-low tumors of different intrinsic subtypes in almost all aspects analyzed. Perhaps most surprisingly, we found no evidence indicating that claudin-low status affects disease-specific survival, contrasting with previous reports of claudin-low as a poor prognosis subtype8,12. These findings imply that a large subset of previously reported characteristics of claudin-low tumors are not bona fide claudin-low characteristics but are rather an average of the characteristics of several intrinsic subtypes. Thus, the established practice of analyzing claudin-low tumors as a single entity, without taking intrinsic subtype into consideration, may obscure the features that are attributable to claudin-low status.
Claudin-low breast cancer has previously been considered a single disease entity, analogous to the intrinsic breast cancer subtypes8,9,12,13 (Fig. 6a). Our findings, however, imply that a breast tumor is not claudin-low in place of the intrinsic subtype assigned by the PAM50 predictor; rather, it can carry a claudin-low phenotype in addition to its intrinsic subtype (Fig. 6b). Under this interpretation, claudin-low status measures a set of biological features distinct from those captured by the intrinsic subtypes.
We explored a method of identifying claudin-low tumors using a condensed gene list. The claudin-low tumors identified using this method (CoreCL) showed more consistent traits than the claudin-low tumors identified by the nine-cell line predictor. One interpretation is that OtherCL tumors are not genuine claudin-low tumors. OtherCL tumors did, however, display some genomic and transcriptomic traits consistent with the claudin-low phenotype, though to a lesser degree than CoreCL tumors. A compelling alternative interpretation may therefore be that claudin-low is a continuum (degree of “claudin-lowness”, Fig. 6c), rather than a binary feature (claudin-low vs. non-claudin-low, Fig. 6b). According to this interpretation, breast tumors might exist along a spectrum of claudin-lowness, in which they lie somewhere between: (1) non-claudin-low, fully concordant with an intrinsic subtype, (2) moderately claudin-low with marked imprint of an intrinsic subtype (exemplified by the average claudin-low tumor identified by the nine-cell line predictor), (3) extensively claudin-low, with limited imprint of an intrinsic subtype (exemplified by the average CoreCL tumor), or (4) purely claudin-low, with no imprint of intrinsic subtype (perhaps exemplified by special histological subtypes26,27). This model would be consistent with recent descriptions of partial EMT phenotypes in cancer28,29 and cellular pliancy as an etiological explanation for the claudin-low phenotype14,15.
Claudin-low tumors had high levels of non-tumor cell infiltration, and claudin-low tumors were less prevalent in cohorts with a cut-off for tumor cellularity. It is also known that EMT-like gene expression features in tumors are similar to the gene expression characteristics of stromal tissue29, and a subset of normal breast tissue samples show marked similarities to claudin-low-like gene expression patterns30,31. In light of these observations, it is pertinent to ask: How much of the claudin-low phenotype is a result of stromal infiltration, and could the claudin-low phenotype simply be an artifact of stromal infiltration? If the claudin-low phenotype were only a sampling artifact, irrelevant to a tumor’s biology, one would expect claudin-low tumors to occur at similar rates across the intrinsic subtypes. Claudin-low tumors were, however, overrepresented in basal-like and normal-like tumors, and underrepresented in the remaining intrinsic subtypes. Further, if claudin-lowness were mediated only by stromal infiltration, it should be possible to accurately identify claudin-low tumors solely on the basis of stromal infiltration. However, while almost all claudin-low tumors had high levels of stromal infiltration, only a minority of tumors with high levels of stromal infiltration were in fact classified as claudin-low (Supplementary Data 1, 3, and 4). Finally, numerous studies have identified features in claudin-low tumors (human, murine and cell-line) that are directly attributable to claudin-low tumor cells6,9,12,14,32. Non-tumor cell infiltration is thus undoubtedly an important feature of the claudin-low tumor microenvironment33–36, and may even be the feature that induces EMT in claudin-low tumor-initiating cells15, but the characteristics observed in claudin-low tumors cannot be attributed solely to non-tumor cell infiltration.
While we did not find evidence that claudin-low status affects survival, certain claudin-low characteristics may nonetheless be clinically relevant and/or actionable. For example, claudin-low tumors show high levels of immune cell infiltration8,12, express high levels of PD-L113, are immunosuppressed by T-regulatory cells33, and carry a low mutational burden13,14; these factors may all be relevant for the efficacy of immunotherapeutics in claudin-low tumors. The EMT phenotype in claudin-low tumors may in itself be a therapeutic target, and may also have implications for chemoresistance37. Due to the major influence of non-tumor-cell infiltration, immunocompetent animal models are likely to be of particular importance for functionally evaluating how these features can be therapeutically targeted3,13,38,39.
Several limitations of this study should be noted. Despite analyzing over three thousand breast tumors, we identified relatively few claudin-low tumors. The findings presented in this article must therefore be interpreted with a degree of caution. While the Kaplan–Meier curves for the METABRIC cohort (Fig. 2, Supplementary Fig. 8) show clear resemblance to those observed in non-claudin-low tumors2, the claudin-low cohort was not powered to detect statistically significant differences between groups. Further, it is difficult to ascertain the extent to which the observations from bulk tumor samples represent the characteristics of tumor cells or non-tumor cells29. It is therefore likely that single-cell transcriptomic analyses will be required in order to effectively disentangle the features of tumor cells and infiltrating immune and stromal cells. Finally, it must be highlighted that we deliberately chose a biased approach to building the condensed claudin-low gene list. This choice was motivated by our findings (Supplementary Fig. 3) and informed by contemporary studies of claudin-low tumors6,9,14,15,32. We therefore stress that there is no gold standard for identifying claudin-low tumors, and that the method presented here may lack external generalizability. Additional approaches to refining claudin-low classification, which could be used in conjunction with the nine-cell line predictor or the method presented here, might include: Immunohistochemical staining of EMT-related protein markers, implementing a cut-off for maximum permitted GII, or checking for overlap with IntClust4 status.
In summary, we have comprehensively analyzed claudin-low breast tumors, and through these analyses substantiated a re-definition of claudin-low as a breast cancer phenotype. Our findings explain the large degree of heterogeneity observed in claudin-low breast tumors, thereby enabling more accurate and nuanced investigations into this poorly understood form of cancer.
Methods
Cohorts
The METABRIC4,5, Oslo223, and TCGA-BRCA7,25 cohorts were analyzed in this study. Processed data from the METABRIC cohort were downloaded from cBioPortal40,41; queried data include hormone receptor status, IntClust subtype, disease-specific survival, mutation status in a panel of 173 sequenced genes5, gene-centric copy number status, and normalized gene expression values. Intrinsic subtypes (identified using the PAM50 predictor17) for the METABRIC cohort were retrieved from supplementary files in Curtis et al.4. Copy number segments and tumor ploidy were retrieved from the repository associated with Pereira et al.5. In total, 1886 tumors in the METABRIC cohort had all of the aforementioned data available. Centrally reviewed histological classifications were kindly made available by Dr. Elena Provenzano42; histological classification was available for 1575 tumors in the dataset.
For the Oslo2 cohort, normalized gene expression values, intrinsic subtypes (identified using PAM50), and hormone receptor status were downloaded from the Gene Expression Omnibus (GEO), accession GSE80999. All 381 samples from the cohort were included in the analyses. Analyses were carried out using the GEOquery43 and Biobase44 packages. Copy number data were available for only seven claudin-low tumors and were therefore not used in the analyses.
Normalized gene expression values, intrinsic subtype (identified using PAM50) and histological classification from tumors in the TCGA-BRCA cohort were downloaded from cBioportal7,25,40,41. All 1082 tumors from the TCGA-BRCA cohort were analyzed.
Transcriptomic analyses
The generation and pre-processing of gene expression data are described in the cohorts’ respective publications4,5,7,23,25. Gene expression values were mean-centered and scaled (z-scores). In the Oslo2 cohort, genes represented by multiple probes were reduced to a single expression value by taking the mean across all probes for that gene.
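As an illustration only, the probe-to-gene reduction and z-scoring described above can be sketched in Python (the study itself used R). The sketch assumes a probes-by-samples matrix, a hypothetical `probe_to_gene` list giving one gene symbol per probe row, and per-gene scaling across samples with the sample standard deviation, as R's `scale()` does. Genes with zero variance would need to be removed beforehand to avoid division by zero.

```python
import numpy as np


def collapse_probes(expr, probe_to_gene):
    """Average all probes that map to the same gene.

    expr: 2-D array, probes x samples.
    probe_to_gene: hypothetical list of gene symbols, one per probe row.
    Returns (genes, gene_expr) with genes sorted alphabetically.
    """
    labels = np.array(probe_to_gene)
    genes = sorted(set(probe_to_gene))
    gene_expr = np.vstack([expr[labels == g].mean(axis=0) for g in genes])
    return genes, gene_expr


def zscore_rows(x):
    """Mean-center each gene (row) and scale it to unit sample standard deviation."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mu) / sd
```

Collapsing before scaling means each gene contributes a single z-scored row regardless of how many probes represented it on the array.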
Claudin-low tumors were identified using the implementation of the nine-cell line claudin-low predictor12 in the Genefu18 package for R45, with Euclidean distance as the distance metric for claudin-low classification. IntClust subtypes in the Oslo2 and TCGA-BRCA cohorts were determined using a gene-expression-based IntClust classifier24 implemented in Genefu18. Immune and stromal infiltration was inferred from gene expression data using the ImmuneScore and StromalScore derived by ESTIMATE21.
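A minimal sketch of nearest-centroid assignment with Euclidean distance is shown below, assuming the predictor assigns each tumor to whichever of its centroids lies closest. It is not the Genefu implementation: the centroid matrix and class labels are hypothetical placeholders, and the input is assumed to be z-scored with genes in the same order as the centroids.

```python
import numpy as np


def nearest_centroid(samples, centroids, labels):
    """Assign each sample (column) to the closest centroid by Euclidean distance.

    samples: genes x samples matrix (z-scored, same gene order as centroids).
    centroids: genes x classes matrix (hypothetical values in this sketch).
    labels: class names, one per centroid column.
    Returns (calls, distances) where distances has shape samples x classes.
    """
    # Broadcast to samples x classes x genes, then take the L2 norm over genes.
    diff = samples.T[:, None, :] - centroids.T[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=2))
    calls = [labels[j] for j in distances.argmin(axis=1)]
    return calls, distances
```

Because the rule only compares distances, a tumor with strong stromal or immune admixture can be pulled toward the claudin-low centroid if the centroid genes overlap with stromal expression, which is the concern raised in the next paragraph.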
We observed that the nine-cell line claudin-low predictor was heavily influenced by non-tumor cell infiltration (Supplementary Fig. 3). This may reflect the marked stromal and immune infiltration in claudin-low tumors12, and the partial overlap in gene expression features between stromal tissue and tumors with an EMT phenotype29–31. Given these challenges, which arose from the unbiased approach used by Prat et al.12, and the progress made in understanding claudin-low tumors6,14,15,32, we chose to explore a biased approach to identifying claudin-low tumors. The reduced gene set used to identify core claudin-low tumors (Supplementary Table 1) was manually selected on the basis of published characterizations of claudin-low gene expression features and advances in understanding the etiological basis of claudin-low tumors3,6,8,9,12,14,15,32. We reasoned that the genes should capture the characteristics unique to claudin-low tumors: low expression of cell–cell adhesion genes, high expression of EMT genes, and a stem-cell-like/undifferentiated gene expression pattern. Further, we reasoned that the gene list should exclude characteristics that are not unique to claudin-low tumors, such as low expression of luminal epithelium-related genes, since including such genes would risk recapitulating the intrinsic subtypes. Hierarchical clustering using the reduced gene list was performed by complete linkage, with Euclidean distance as the distance metric. Clustering and visualization were performed using the ComplexHeatmap package46 for R. The significance of the core claudin-low cluster was evaluated using SigClust22.
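The clustering step can be sketched with SciPy as a stand-in for the R/ComplexHeatmap workflow used in the study. This assumes a genes-by-samples matrix already restricted to the reduced gene list; cutting the tree into two clusters (`n_clusters=2`) is an illustrative choice rather than a documented parameter, and the SigClust significance test is not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_tumors(expr, n_clusters=2):
    """Complete-linkage hierarchical clustering of tumors (columns).

    expr: genes x samples matrix restricted to the reduced gene list.
    Returns integer cluster labels (1..n_clusters), one per sample.
    """
    # Condensed vector of pairwise Euclidean distances between samples.
    dists = pdist(expr.T, metric="euclidean")
    tree = linkage(dists, method="complete")
    # Cut the dendrogram so that at most n_clusters flat clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Complete linkage merges clusters by their most distant members, which tends to yield compact groups; a small, tight core claudin-low cluster is therefore less likely to be absorbed by a neighboring group than under single linkage.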
Genomic analyses
GII was derived by dividing the number of copy number aberrant nucleotides by the total number of nucleotides in the genome. GII was ploidy-corrected by defining a segment as copy number aberrant if the copy number state deviated from the nearest integer value for ploidy. All GII values were ploidy-corrected.
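The ploidy-corrected GII can be illustrated with a short sketch. The `Segment` record, inclusive 1-based coordinates, integer copy number states, and the `genome_length` denominator are assumptions for illustration; rounding ploidy half-up to the nearest integer is also our choice, since the text does not state how ties are handled.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    chrom: str
    start: int        # first base, 1-based and inclusive (assumed convention)
    end: int          # last base, inclusive
    copy_number: int  # integer copy number state of the segment


def genomic_instability_index(segments, ploidy, genome_length):
    """Ploidy-corrected GII: fraction of the genome whose copy number state
    differs from the tumor ploidy rounded to the nearest integer."""
    baseline = math.floor(ploidy + 0.5)  # round half up, e.g. 2.5 -> 3
    aberrant = sum(
        seg.end - seg.start + 1
        for seg in segments
        if seg.copy_number != baseline
    )
    return aberrant / genome_length
```

For a near-triploid tumor, three-copy segments count as neutral and two-copy segments as aberrant, so the same segmentation can give a different GII than it would under a diploid baseline.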
Individually analyzed genomic aberrations were chosen according to the following criteria: (1) known function as early driver events20,47; (2) among the most frequently observed aberrations in breast cancer4,5; (3) significantly different incidence between intrinsic subtypes (χ2-test, P < 0.05)4,5; and (4) no overlap with other selected events (i.e., only one CNA located on 8q24 was included). On the basis of these criteria, TP53 mutation, PIK3CA mutation, MYC amplification (8q24), and MDM4 amplification (1q32) were selected for further analysis.
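Criterion (3) amounts to a χ2-test on a subtype-by-aberration contingency table. A hedged sketch using SciPy follows; the helper name is hypothetical and the counts mentioned below are invented for illustration, not taken from any cohort.

```python
import numpy as np
from scipy.stats import chi2_contingency


def differs_between_subtypes(counts, alpha=0.05):
    """Chi-squared test for different incidence of one aberration across subtypes.

    counts: rows are intrinsic subtypes; columns are
            [tumors with the aberration, tumors without it].
    Returns (is_significant, P value).
    """
    table = np.asarray(counts)
    chi2_stat, p_value, dof, expected = chi2_contingency(table)
    return bool(p_value < alpha), float(p_value)
```

For toy counts such as `[[80, 20], [10, 90], [50, 50]]`, the first element of the returned tuple is `True`. Note that for a 2 × 2 table SciPy applies Yates' continuity correction by default, as R's `chisq.test` does.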
Survival analyses
Survival analyses were performed using the Survival package48 for R, and Kaplan–Meier plots were generated using the Survminer package.
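The Kaplan–Meier estimator and the log-rank test can be sketched in NumPy/SciPy as a stand-in for the Survival and Survminer packages. The sketch assumes right-censored data with `event` coded 1 for a disease-specific death and 0 for censoring; it handles tied event times in the usual way but omits confidence intervals and plotting.

```python
import numpy as np
from scipy.stats import chi2


def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate at each distinct event time.

    Returns (event_times, survival), survival being S(t) just after each event time.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    survival, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(time >= t)            # still under follow-up at t
        n_events = np.sum((time == t) & event)   # events occurring exactly at t
        s *= 1.0 - n_events / n_at_risk
        survival.append(s)
    return event_times, np.array(survival)


def logrank_test(time, event, group):
    """Log-rank test across two or more groups; returns (statistic, P value)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group)
    labels = np.unique(group)
    k = len(labels)
    observed, expected = np.zeros(k), np.zeros(k)
    cov = np.zeros((k, k))
    for t in np.unique(time[event]):
        at_risk = time >= t
        died = (time == t) & event
        n, d = at_risk.sum(), died.sum()
        n_g = np.array([(at_risk & (group == g)).sum() for g in labels], dtype=float)
        d_g = np.array([(died & (group == g)).sum() for g in labels], dtype=float)
        observed += d_g
        expected += d * n_g / n
        if n > 1:
            # Hypergeometric variance/covariance of the event counts at time t.
            f = d * (n - d) / (n - 1)
            cov += f * (np.diag(n_g / n) - np.outer(n_g, n_g) / n ** 2)
    # O - E sums to zero across groups, so drop one group to get an invertible matrix.
    diff = (observed - expected)[:-1]
    stat = float(diff @ np.linalg.solve(cov[:-1, :-1], diff))
    return stat, float(chi2.sf(stat, k - 1))
```

The statistic is referred to a χ2 distribution with (number of groups − 1) degrees of freedom, so the same function covers a two-group claudin-low versus non-claudin-low comparison and comparisons across several subtypes.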
Statistical analyses
All significance tests (where applicable) were two-tailed. For continuous variables, Wilcoxon rank-sum test and Kruskal–Wallis test were used to test for differences between two or more than two groups, respectively. For categorical variables, Fisher’s exact test and χ2-test were used to test for differences between two or more than two groups, respectively. Significance in survival analyses was determined by log-rank tests. Adjustments were made for multiple hypothesis testing in the analyses detailed in Supplementary Data 2 (Bonferroni-correction); no other corrections were made for multiple hypothesis testing. Whiskers in box-and-whisker plots were generated using the Tukey method; individual data points were not plotted, as the imbalance in sample numbers between groups (Table 1) tended to obscure overall trends.
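The remaining tests map onto standard SciPy functions; the Wilcoxon rank-sum test is equivalent to the Mann–Whitney U test. The helper names below are hypothetical. The whisker function uses NumPy's default linear quantiles (equivalent to R's default `quantile()` type 7), whereas base R's `boxplot()` uses Tukey's hinges and may differ slightly for small groups.

```python
import numpy as np
from scipy.stats import fisher_exact, kruskal, mannwhitneyu


def two_group_continuous(a, b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test."""
    return mannwhitneyu(a, b, alternative="two-sided").pvalue


def multi_group_continuous(*groups):
    """Kruskal-Wallis test across more than two groups."""
    return kruskal(*groups).pvalue


def two_group_categorical(table_2x2):
    """Two-sided Fisher's exact test on a 2 x 2 table."""
    _, p = fisher_exact(table_2x2)
    return p


def bonferroni(pvalues):
    """Bonferroni-adjust P values (multiply by the number of tests, cap at 1)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def tukey_whiskers(values):
    """Whisker ends: most extreme observations within 1.5 x IQR of the quartiles."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower = x[x >= q1 - 1.5 * iqr].min()
    upper = x[x <= q3 + 1.5 * iqr].max()
    return float(lower), float(upper)
```

For example, for the values 1 to 9 plus an outlier of 100, the whiskers end at 1 and 9, and the outlier falls beyond the upper whisker.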
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
The authors thank Aleix Prat and Ole Christian Lingjærde for insightful discussions and critical reading of the manuscript. We are grateful to Elena Provenzano for providing us with centrally reviewed histological classifications of tumors in the METABRIC cohort, Hege G. Russnes for histopathological support, and the Oslo Breast Cancer Research Consortium (OSBREAC) for access to data from the Oslo2 cohort. C.F., H.B., and J.H.N. are supported by grants from the Norwegian Research Council (163027) and South-Eastern Norway Regional Health Authority (2012056) to T.S.
Author contributions
C.F. conceptualized and designed the study, and performed all analyses. C.F., H.B., J.H.N., and T.S. interpreted the results. J.H.N. and T.S. provided supervision. T.S. acquired funding. C.F. wrote the original manuscript draft. C.F., H.B., J.H.N., and T.S. reviewed and edited the manuscript.
Data availability
The data used in this study are available through cBioportal40,41 (METABRIC4,5, TCGA7,25), GSE8099923, supplementary tables 2 and 3 in Curtis et al.4 and the repository associated with Pereira et al.5. Histological classification of the METABRIC cohort may be available upon request to Mukherjee et al.42. Detailed instructions for gathering data can be found in the repository associated with this study. The source data underlying each figure are provided as a Source Data file. All other data are available within the Article, Supplementary Information files or available from the author upon reasonable request.
Code availability
All code used in the described analyses is available at https://github.com/clfougner/ClaudinLow.
Competing interests
The authors declare no competing interests.
Footnotes
Peer review information Nature Communications thanks Stephen Chanock and Brian Lehmann and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary information is available for this paper at 10.1038/s41467-020-15574-5.
References
- 1. Perou CM, et al. Molecular portraits of human breast tumours. Nature. 2000;406:747–752. doi: 10.1038/35021093.
- 2. Sørlie T, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl Acad. Sci. 2001;98:10869–10874. doi: 10.1073/pnas.191367098.
- 3. Herschkowitz JI, et al. Identification of conserved gene expression features between murine mammary carcinoma models and human breast tumors. Genome Biol. 2007;8:R76. doi: 10.1186/gb-2007-8-5-r76.
- 4. Curtis C, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature. 2012;486:346. doi: 10.1038/nature10983.
- 5. Pereira B, et al. The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Nat. Commun. 2016;7:11479. doi: 10.1038/ncomms11479.
- 6. Bruna A, et al. TGFβ induces the formation of tumour-initiating cells in claudin low breast cancer. Nat. Commun. 2012;3:1055. doi: 10.1038/ncomms2039.
- 7. The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490:61. doi: 10.1038/nature11412.
- 8. Sabatier R, et al. Claudin-low breast cancers: clinical, pathological, molecular and prognostic characterization. Mol. Cancer. 2014;13:228. doi: 10.1186/1476-4598-13-228.
- 9. Dias K, et al. Claudin-low breast cancer; clinical & pathological characteristics. PLoS ONE. 2017;12:e0168669. doi: 10.1371/journal.pone.0168669.
- 10. Damrauer JS, et al. Intrinsic subtypes of high-grade bladder cancer reflect the hallmarks of breast cancer biology. Proc. Natl Acad. Sci. 2014;111:3110–3115. doi: 10.1073/pnas.1318376111.
- 11. Kardos J, et al. Claudin-low bladder tumors are immune infiltrated and actively immune suppressed. JCI Insight. 2016;1:e85902. doi: 10.1172/jci.insight.85902.
- 12. Prat A, et al. Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res. 2010;12:R68. doi: 10.1186/bcr2635.
- 13. Fougner C, Bergholtz H, Kuiper R, Norum JH, Sørlie T. Claudin-low-like mouse mammary tumors show distinct transcriptomic patterns uncoupled from genomic drivers. Breast Cancer Res. 2019;21:85. doi: 10.1186/s13058-019-1170-8.
- 14. Morel AP, et al. A stemness-related ZEB1-MSRB3 axis governs cellular pliancy and breast cancer genome stability. Nat. Med. 2017;23:568–578. doi: 10.1038/nm.4323.
- 15. Puisieux A, Pommier RM, Morel A-P, Lavial F. Cellular pliancy and the multistep process of tumorigenesis. Cancer Cell. 2018;33:164–172. doi: 10.1016/j.ccell.2018.01.007.
- 16. Prat A, et al. Molecular characterization of basal-like and non-basal-like triple-negative breast cancer. Oncologist. 2013;18:123–133. doi: 10.1634/theoncologist.2012-0397.
- 17. Parker JS, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 2009;27:1160. doi: 10.1200/JCO.2008.18.1370.
- 18. Gendoo DMA, et al. Genefu: an R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics. 2015;32:1097–1099. doi: 10.1093/bioinformatics/btv693.
- 19. Russnes HG, Lingjaerde OC, Børresen-Dale A-L, Caldas C. Breast cancer molecular stratification: from intrinsic subtypes to integrative clusters. Am. J. Pathol. 2017;187:2152–2162. doi: 10.1016/j.ajpath.2017.04.022.
- 20. Sondka Z, et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer. 2018;18:696–705.
- 21. Yoshihara K, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 2013;4:2612. doi: 10.1038/ncomms3612.
- 22. Liu Y, Hayes DN, Nobel A, Marron JS. Statistical significance of clustering for high-dimension, low–sample size data. J. Am. Stat. Assoc. 2008;103:1281–1293. doi: 10.1198/016214508000000454.
- 23. Aure MR, et al. Integrative clustering reveals a novel split in the luminal A subtype of breast cancer with impact on outcome. Breast Cancer Res. 2017;19:44. doi: 10.1186/s13058-017-0812-y.
- 24. Ali HR, et al. Genome-driven integrated classification of breast cancer validated in over 7,500 samples. Genome Biol. 2014;15:431. doi: 10.1186/s13059-014-0431-1.
- 25. Hoadley KA, et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell. 2018;173:291–304. doi: 10.1016/j.cell.2018.03.022.
- 26. Weigelt B, et al. Metaplastic breast carcinomas display genomic and transcriptomic heterogeneity. Mod. Pathol. 2015;28:340. doi: 10.1038/modpathol.2014.142.
- 27. Vidal M, et al. Gene expression-based classifications of fibroadenomas and phyllodes tumours of the breast. Mol. Oncol. 2015;9:1081–1090. doi: 10.1016/j.molonc.2015.01.003.
- 28. McFaline-Figueroa JL, et al. A pooled single-cell genetic screen identifies regulatory checkpoints in the continuum of the epithelial-to-mesenchymal transition. Nat. Genet. 2019;51:1389–1398. doi: 10.1038/s41588-019-0489-5.
- 29. Williams ED, Gao D, Redfern A, Thompson EW. Controversies around epithelial–mesenchymal plasticity in cancer metastasis. Nat. Rev. Cancer. 2019;19:716–732.
- 30. Haakensen VD, et al. Gene expression profiles of breast biopsies from healthy women identify a group with claudin-low features. BMC Med. Genomics. 2011;4:77. doi: 10.1186/1755-8794-4-77.
- 31. Bergholtz H, et al. A longitudinal study of the association between mammographic density and gene expression in normal breast tissue. J. Mammary Gland Biol. Neoplasia. 2019;24:163–175.
- 32. Morel A-P, et al. EMT inducers catalyze malignant transformation of mammary epithelial cells and drive tumorigenesis towards claudin-low tumors in transgenic mice. PLoS Genet. 2012;8:e1002723. doi: 10.1371/journal.pgen.1002723.
- 33. Taylor NA, et al. Treg depletion potentiates checkpoint inhibition in claudin-low breast cancer. J. Clin. Invest. 2017;127:3472–3483. doi: 10.1172/JCI90499.
- 34. Alsuliman A, et al. Bidirectional crosstalk between PD-L1 expression and epithelial to mesenchymal transition: significance in claudin-low breast cancer cells. Mol. Cancer. 2015;14:149. doi: 10.1186/s12943-015-0421-2.
- 35. Chuck Harrell J, et al. Endothelial-like properties of claudin-low breast cancer cells promote tumor vascular permeability and metastasis. Clin. Exp. Metastasis. 2014;31:33–45. doi: 10.1007/s10585-013-9607-4.
- 36. Hanahan D, Coussens LM. Accessories to the crime: functions of cells recruited to the tumor microenvironment. Cancer Cell. 2012;21:309–322. doi: 10.1016/j.ccr.2012.02.022.
- 37. Nieto MA, Huang RY-J, Jackson RA, Thiery JP. EMT: 2016. Cell. 2016;166:21–45. doi: 10.1016/j.cell.2016.06.028.
- 38. Pfefferle AD, et al. Transcriptomic classification of genetically engineered mouse models of breast cancer identifies human subtype counterparts. Genome Biol. 2013;14:R125. doi: 10.1186/gb-2013-14-11-r125.
- 39. Norum JH, et al. GLI1 induced mammary gland tumours are transplantable and maintain major molecular features. Int. J. Cancer. 2019;146:1125–1138.
- 40. Cerami E, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discovery. 2012;2:401–404. doi: 10.1158/2159-8290.CD-12-0095.
- 41. Gao J, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 2013;6:pl1. doi: 10.1126/scisignal.2004088.
- 42. Mukherjee A, et al. Associations between genomic stratification of breast cancer and centrally reviewed tumour pathology in the METABRIC cohort. NPJ Breast Cancer. 2018;4:5. doi: 10.1038/s41523-018-0056-8.
- 43. Davis S, Meltzer PS. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics. 2007;23:1846–1847. doi: 10.1093/bioinformatics/btm254.
- 44. Huber W, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods. 2015;12:115. doi: 10.1038/nmeth.3252.
- 45. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2017. ISBN 3-900051-07-0.
- 46. Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–2849. doi: 10.1093/bioinformatics/btw313.
- 47. Gerstung M, et al. The evolutionary history of 2,658 cancers. Nature. 2020;578:122–128. doi: 10.1038/s41586-019-1907-7.
- 48. Therneau TM, Grambsch PM. Modeling Survival Data: Extending the Cox Model. Springer; 2000.