Abstract
Background:
Modified median and subgroup-specific gene centering are two essential pre-processing methods to assign breast cancer molecular subtypes by PAM50. We evaluated the PAM50 subtypes derived from both methods in a subset of Nurses’ Health Study (NHS) and NHSII participants; correlated tumor subtypes by PAM50 with immunohistochemistry (IHC) surrogates; and characterized the PAM50 subtype distribution, proliferation scores and risk of relapse with proliferation and tumor size weighted (ROR-PT) scores in the NHS/NHSII.
Methods:
PAM50 subtypes, proliferation scores and ROR-PT scores were calculated for 882 invasive breast tumors and 695 histologically normal tumor-adjacent tissues. Cox proportional hazard models evaluated the relationship between PAM50 subtypes or ROR-PT scores/groups with recurrence free survival (RFS) or distant RFS.
Results:
PAM50 subtypes were highly comparable between the two methods. The agreement between tumor subtypes by PAM50 and IHC surrogates improved to fair when Luminal subtypes were grouped together. Using the modified median method, our study consisted of 46% Luminal A, 18% Luminal B, 14% HER2-enriched, 15% Basal-like and 8% Normal-like subtypes; 53% of tumor-adjacent tissues were Normal-like. Women with the Basal-like subtype had a higher rate of relapse within five years. HER2-enriched subtypes had poorer outcomes prior to 1999.
Conclusion:
Either pre-processing method may be utilized to derive PAM50 subtypes for future studies. The majority of NHS/NHSII tumor and tumor-adjacent tissues were classified as Luminal A and Normal-like, respectively.
Impact:
Pre-processing methods are important for the accurate assignment of PAM50 subtypes. These data provide evidence that either pre-processing method can be used in epidemiological studies.
Keywords: breast cancer, gene expression, PAM50, tumor-adjacent, epidemiology, nurses’ health study
Introduction
Breast cancer is a heterogeneous disease at both morphological and molecular levels (1,2). Given this diversity, many approaches, such as Oncotype DX (3), MammaPrint (4) and PAM50 (5), have been developed to classify breast tumors to inform prognosis and guide treatment. PAM50 is a 50-gene signature that classifies breast cancer into five molecular intrinsic subtypes: Luminal A, Luminal B, human epidermal growth factor receptor 2 (HER2)-enriched, Basal-like and Normal-like (1,5). The five molecular subtypes vary in their biological properties and prognoses (6,7). Luminal A generally has the best prognosis; HER2-enriched and Basal-like are considered more aggressive diseases. Less common subtypes, such as Claudin-low, Interferon-rich and Molecular Apocrine, have also been identified using other gene expression profiling assays (8–11).
Molecular subtyping using the PAM50 gene signature can be performed using gene expression derived from microarrays, RNASeq or qRT-PCR. Until the recent development of Prosigna™, a rapid PAM50-based molecular subtype classifier using the NanoString nCounter Dx Analysis System (12), the complexities of using PAM50 and other gene signature assays for molecular subtyping have limited their use in clinical practice and led to the development of immunohistochemical (IHC) surrogate definitions to classify tumors into molecular subtypes (13,14). For example, the immunophenotypic surrogate profile for classifying a tumor as Basal-like is one that is estrogen receptor (ER), progesterone receptor (PR) and HER2 negative, with positive expression of cytokeratin 5/6 (CK 5/6) and/or epidermal growth factor receptor (EGFR) (15). However, studies have reported differences in tumor classification when comparing molecular assays and IHC (16,17). There are ongoing efforts to refine the IHC definitions to more closely approximate molecular subtypes (7,18–21).
In addition to the discrepancies between molecular subtyping using PAM50 and IHC, inaccurate pre-processing of gene expression data and the use of non-standard PAM50 algorithms can result in inconsistent and/or erroneous assignment of molecular subtypes (22–25). In particular, molecular subtype assignment by PAM50 may be affected when the clinicopathological distribution (e.g., ER status) of the intended research cohort differs from the original cohort used by Parker et al. to derive the PAM50 algorithm. The original cohort had an equal distribution of ER+ and ER- tumors (i.e., 50% ER+/50% ER-) (5). To address this problem, a modified median gene centering (MMGC) pre-processing method was developed (1,2). Later, Zhao et al. proposed a subgroup-specific gene centering (SSGC) pre-processing method (26).
Although PAM50 subtypes were initially developed to classify breast cancer, molecular subtypes can also be reflected in histologically normal tumor-adjacent tissues (henceforth referred to as “tumor-adjacent”). Each subtype is associated with a distinct physiological response in the tumor-adjacent tissue; and specific gene expression patterns in these tumor-adjacent regions may be associated with varying risk of recurrence and prognosis (27–30). Thus, these prior studies suggest the importance of studying tumor-adjacent tissues in breast cancer.
We have previously reported the tumor molecular subtypes using IHC surrogates for 5561 Nurses’ Health Study (NHS) and NHSII participants diagnosed with breast cancer (31). In this study, we describe the tumor and tumor-adjacent PAM50 molecular subtypes in a subset of 954 NHS/NHSII participants with gene expression data. Specifically, we:
- computed and compared breast cancer PAM50 molecular subtypes, proliferation scores and risk of relapse with proliferation and tumor size weighted (ROR-PT) scores derived from both the MMGC and SSGC pre-processing methods;
- determined the concordance of tumor molecular subtypes using PAM50 and IHC surrogates; and
- described the tumor PAM50 subtype distribution, proliferation scores and ROR-PT scores in the NHS/NHSII.
Materials and Methods
Study population
The Human Subjects Committee at Partners Healthcare System and Brigham and Women’s Hospital in Boston, MA approved this study. The NHS and NHSII cohorts are ongoing prospective studies of US female registered nurses followed biennially by questionnaires to query exposures and identify newly diagnosed diseases. NHS was established in 1976 with 121,700 participants between 30–55 years of age, and NHSII was established in 1989 (n=116,429, ages 25–42). Written permission was obtained from participants who were diagnosed with invasive breast cancer, or their next of kin, to review medical records for diagnosis confirmation, retrieval of cancer details, and to collect archival tissue specimens. Archival formalin-fixed paraffin embedded (FFPE) breast cancer tissue blocks were requested from respective hospitals (32).
Breast cancer recurrence
Local and distant recurrences were self-reported by NHS/NHSII participants; no medical record review was conducted for recurrences. Recurrence-free survival (RFS) is defined as time from diagnosis to reported breast cancer recurrence, diagnosis of cancer in common sites of recurrence (i.e., liver, lung, brain or bone) or death from breast cancer without reported recurrence. Distant recurrence-free survival (DRFS) is defined as time from initial diagnosis to diagnosis of cancer in common sites of recurrence (i.e., liver, lung, brain or bone) or death from breast cancer without reported recurrence.
Gene expression data
The protocol to obtain RNA from FFPE tissues was previously published (33). Gene expression data were obtained in two microarray batches: one run in 2012–2014 on the Glue Grant Human Transcriptome Arrays (HTA) 3.0 pre-release version (Affymetrix, Santa Clara, CA) (33), and one run in 2015–2018 on the HTA 2.0 (Affymetrix). Both batches were performed by the Molecular Biology Core Facilities, Dana-Farber Cancer Institute, Boston, MA. Gene expression data were normalized and summarized into Log2 values using Robust Multi-array Average, then annotated. All microarrays and sample information are available at the National Center for Biotechnology Information Gene Expression Omnibus (accession number: GSE115577).
Molecular subtyping by PAM50, proliferation scores and ROR-PT scores
Molecular subtyping by PAM50 was carried out separately for tumor and tumor-adjacent samples. Gene adjustment factors for tumor-adjacent samples were estimated from tumors. After adjusting the gene expression dataset using the MMGC or SSGC method, research-based PAM50 classification was performed. Proliferation scores and ROR-PT scores are additional measures that were subsequently developed to further characterize breast tumors and are automatically generated by the PAM50 algorithm (34). Thus, proliferation scores and ROR-PT scores were only reported for tumor tissues. Proliferation scores were computed using three methods: Log2 expression (no centering), MMGC-adjusted expression, and SSGC-adjusted expression. The ROR-PT score is calculated using PAM50 subtype, proliferation score, and pathological tumor size.
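The two steps described above, gene centering followed by centroid-based classification, can be sketched as follows. This is a minimal illustration and not the study pipeline: it assumes the centering factor is a per-gene median taken over a reference subset of samples (here simply assumed to be ER-balanced), and that a sample is assigned to the centroid with the highest Spearman correlation. The gene names, expression values and centroids are hypothetical toy data; the actual procedures are described in the Supplementary Methods.

```python
from statistics import median


def center_genes(expr, reference_samples):
    """Subtract, per gene, the median over a reference subset of samples."""
    centered = {}
    for gene, by_sample in expr.items():
        shift = median(by_sample[s] for s in reference_samples)
        centered[gene] = {s: v - shift for s, v in by_sample.items()}
    return centered


def _ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    ra, rb = _ranks(a), _ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def assign_subtype(profile, centroids):
    """Return the name of the centroid most correlated with the profile."""
    return max(centroids, key=lambda name: spearman(profile, centroids[name]))


# Toy data: 4 hypothetical genes x 4 samples (Log2 expression).
genes = ["g1", "g2", "g3", "g4"]
expr = {
    "g1": {"s1": 9.0, "s2": 7.0, "s3": 8.0, "s4": 6.0},
    "g2": {"s1": 5.0, "s2": 8.0, "s3": 6.0, "s4": 9.0},
    "g3": {"s1": 7.5, "s2": 7.0, "s3": 8.0, "s4": 6.5},
    "g4": {"s1": 4.0, "s2": 6.0, "s3": 5.0, "s4": 7.0},
}
reference = ["s1", "s2", "s3", "s4"]  # assumed ER-balanced subset
centroids = {  # hypothetical centroids, in the same order as `genes`
    "Luminal A": [1.0, -1.0, 0.5, -0.8],
    "Basal-like": [-1.0, 1.0, -0.5, 0.8],
}

centered = center_genes(expr, reference)
calls = {s: assign_subtype([centered[g][s] for g in genes], centroids)
         for s in reference}
print(calls)
```

Because the classification depends only on centered values, changing the reference subset changes the per-gene shift and can move borderline samples between subtypes, which is why the choice of pre-processing method matters.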
Molecular subtyping using IHC surrogates
IHC data were obtained from tissue microarrays (31,32,35). Missing IHCs for ER, PR and HER2 (n=144) were replaced with data from medical records. Tumors were classified into Luminal A, Luminal B, HER2-enriched and Basal-like as previously defined (14,31,36). For tumors missing Ki-67 IHC data (n=545), histologic grade was used as a proxy in classification.
Please refer to the Supplementary Methods for gene expression, pre-processing methods and IHC surrogate details.
Statistical methods
Confusion matrices were used to determine the concordance of PAM50 subtypes when gene expression data were pre-processed using the MMGC or SSGC method, the concordance of molecular subtypes classified using PAM50 and IHC in tumor tissues, and the concordance of subtypes in paired tumor and tumor-adjacent tissues (37). The confusion matrix computes summaries such as accuracy (the frequency of agreement), and Cohen’s kappa (a measure which accounts for the agreement expected to occur by random chance). Spearman’s rho was used to determine the correlation between the two methods used to derive proliferation scores and ROR-PT scores in tumor tissues.
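As a worked illustration of the two summaries named above, the sketch below computes accuracy and Cohen's kappa from a square confusion matrix. The 2×2 counts are hypothetical and chosen only to make the arithmetic easy to follow; the function name is likewise illustrative.

```python
def accuracy_and_kappa(matrix):
    """matrix[i][j] = number of samples labelled i by method 1 and j by method 2."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement: product of the marginal label frequencies.
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for two labels and 100 samples.
toy = [[40, 10],
       [5, 45]]
acc, kappa = accuracy_and_kappa(toy)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Here the observed agreement is 0.85 and the chance agreement is 0.50, so kappa is (0.85 − 0.50) / (1 − 0.50) = 0.70. Kappa is lower than accuracy whenever some agreement is expected by chance.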
RFS and DRFS were evaluated at five and ten years because these time points are commonly used in clinical studies. Crude and adjusted Cox proportional hazard models evaluated the relationship between PAM50 subtypes or ROR-PT scores/groups and RFS or DRFS within five and ten years in the NHS/NHSII. Individuals were censored at death from other causes or at the end of follow-up. Adjusted models included age and year of diagnosis, clinical grade, stage, type of surgery (lumpectomy, mastectomy, none and unknown) and type of treatment (chemotherapy, hormone therapy, radiotherapy, two or more types of therapies, none, and unknown). When evaluating tumor PAM50 subtypes in the Cox proportional hazard models, Luminal A was set as the reference group. The proportional hazards assumption was tested through evaluation of scaled Schoenfeld residuals (38). All tests of statistical significance were two-sided. Statistical significance was defined as a p-value <0.05. All analyses were conducted using R version 3.4.0. Kaplan-Meier curves were plotted using the survminer package (version 0.4.0) in R.
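The idea of evaluating outcomes "within five and ten years" can be sketched as administrative censoring at a fixed horizon followed by a Kaplan-Meier estimate. This is an illustration in Python rather than the R code used for the analyses, and censoring at the horizon is one reading of the text, not a procedure it spells out. The follow-up times and event indicators are hypothetical.

```python
def censor_at(times, events, horizon):
    """Truncate follow-up at `horizon` years; later events become censored."""
    new_times = [min(t, horizon) for t in times]
    new_events = [e if t <= horizon else 0 for t, e in zip(times, events)]
    return new_times, new_events


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        failed = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1 - failed / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in years; event 1 = recurrence, 0 = censored.
times = [2.0, 3.5, 4.0, 6.0, 7.5, 12.0]
events = [1, 0, 1, 1, 0, 0]

t5, e5 = censor_at(times, events, horizon=5.0)
curve5 = kaplan_meier(t5, e5)
print(curve5)
```

In this toy example the recurrence at 6.0 years counts as an event in a ten-year analysis but is censored in the five-year analysis, so the same participant can contribute differently to the two time horizons.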
Results
This analysis consisted of gene expression data from 954 women who contributed 882 tumors and 695 histologically normal tumor-adjacent samples. Of these, there were 623 paired samples. This subset of 954 women with gene expression data was generally representative of the NHS/NHSII population diagnosed with breast cancer (Supplementary Table S1). The majority of participants in this study had stage I disease, were clinical grade 2, ER+ and PR+, and HER2-. NHS women had more IHC HER2+ cases compared to NHSII (Supplementary Table S2). Amongst the 882 women who contributed tumor samples, RFS and DRFS data were unavailable for six women. At 10 years of follow-up, there were 112 recurrence and 85 distant recurrence events. ROR-PT scores were computed for 863 cases; scores could not be computed for the remaining 19 cases because tumor size was missing. After also excluding the six women without RFS/DRFS data, 857 women were included in the analyses of ROR-PT with RFS/DRFS.
Comparing PAM50 molecular subtypes, proliferation scores and ROR-PT scores derived from the two pre-processing methods
PAM50 subtypes derived by the two pre-processing methods were in high agreement. Figure 1 shows the concordance of PAM50 subtypes in tumor (accuracy = 0.86, kappa = 0.81) and tumor-adjacent tissue (accuracy = 0.82, kappa = 0.74). Most tumors were classified as Luminal A (46% using MMGC and 40% using SSGC; Figure 1A). Of the 695 tumor-adjacent tissues, 53% and 39% were classified as Normal-like by MMGC and SSGC, respectively (Figure 1B). More tumor samples were assigned as Luminal B or HER2-enriched using SSGC compared to MMGC, while MMGC assigned more tumor samples as Normal-like. We investigated why MMGC shifted some tumors from Luminal B to Luminal A (n=44) and produced more Normal-like calls (n=36). Proliferation scores computed using SSGC were slightly higher than those computed using MMGC, so these cases were classified into more aggressive molecular subtypes when the SSGC method was used (Supplementary Figures S1A-S1C). In general, proliferation scores of tumors were highly correlated between simple Log2 expression (no centering) and each pre-processing method (both p<0.01; Supplementary Figure S2). ROR-PT scores for tumors were highly correlated between MMGC and SSGC (Spearman’s rho=0.99, p<0.01).
Comparing PAM50 molecular subtypes derived from the two pre-processing methods and IHC surrogates
Figures 2A and 2B display the concordance between Luminal A, Luminal B, HER2-enriched and Basal-like as classified by PAM50 and IHC surrogates (MMGC: accuracy = 0.54, kappa = 0.32; and SSGC: accuracy = 0.53, kappa = 0.32). With kappa at 0.32, there is poor agreement between PAM50 and IHC. When the Luminal subtypes were grouped together, the agreement between PAM50 and IHC improved to fair (MMGC: accuracy = 0.81, kappa = 0.53, Figure 2C; SSGC: accuracy = 0.79, kappa = 0.49, Figure 2D). Very similar results were obtained when analyses were restricted to women with Ki-67 IHC data (n=337; Supplementary Data).
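Grouping the Luminal subtypes amounts to summing the corresponding rows and columns of the confusion matrix, so that Luminal A versus Luminal B disagreements move onto the diagonal. The sketch below shows this with hypothetical counts (not the study data); the function names and the merge map are illustrative.

```python
def collapse(matrix, labels, merge):
    """Sum rows and columns of a square confusion matrix per a label merge map."""
    new_labels = []
    for lab in labels:
        merged = merge.get(lab, lab)
        if merged not in new_labels:
            new_labels.append(merged)
    index = {lab: new_labels.index(merge.get(lab, lab)) for lab in labels}
    out = [[0] * len(new_labels) for _ in new_labels]
    for i, row_label in enumerate(labels):
        for j, col_label in enumerate(labels):
            out[index[row_label]][index[col_label]] += matrix[i][j]
    return new_labels, out


def accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    return sum(matrix[i][i] for i in range(len(matrix))) / sum(map(sum, matrix))


labels = ["Luminal A", "Luminal B", "HER2-enriched", "Basal-like"]
# Hypothetical counts: rows = PAM50 call, columns = IHC surrogate call.
toy = [
    [30, 20, 2, 1],
    [15, 10, 3, 1],
    [4, 3, 6, 2],
    [1, 1, 2, 9],
]
merge = {"Luminal A": "Luminal", "Luminal B": "Luminal"}

grouped_labels, grouped = collapse(toy, labels, merge)
print(accuracy(toy), accuracy(grouped))
```

In the toy matrix most disagreement sits between Luminal A and Luminal B, so accuracy rises from 0.50 to about 0.82 after grouping, mirroring the direction of the change reported above.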
Molecular subtypes in tumor and tumor-adjacent tissues
Amongst 623 paired samples, the most common pairing was Luminal A tumors and Normal-like tumor-adjacent tissues using both pre-processing methods (Figures 3A and 3B). Women with Luminal A or B tumors were more likely to have Normal-like tumor-adjacent subtype than women with HER2-enriched or Basal-like tumors. The agreement between paired tumor and tumor-adjacent subtypes was 30% using MMGC and 32% using SSGC.
Tumor PAM50 subtypes, ROR-PT scores and prognosis in the NHS/NHSII
Since there was high concordance in PAM50 subtypes between the two pre-processing methods, the main tables display results derived from the MMGC method while the supplementary tables display results from SSGC. Luminal A and B, and HER2-enriched were generally of clinical grade 2 while 38% of grade 3 tumors were of the Basal-like subtype (Table 1). Seventy-six percent of IHC ER+ tumors were Luminal A or B, while 54% of ER- tumors were Basal-like. Similarly, 77% of IHC PR+ tumors were Luminal subtypes and 49% of PR- tumors were Basal-like. In tumors classified as HER2-enriched, only 42% were IHC HER2+. The characteristics of NHS/NHSII participants by PAM50 subtype computed using the SSGC method are in Supplementary Table S3.
Table 1.
Characteristic | Luminal A | Luminal B | HER2-Enriched | Basal-like | Normal-like |
---|---|---|---|---|---|
n | 405 | 157 | 124 | 128 | 68 |
NHS Cohort, n (%) | |||||
NHS | 234 (57.8) | 93 (59.2) | 81 (65.3) | 84 (65.6) | 45 (66.2) |
NHSII | 171 (42.2) | 64 (40.8) | 43 (34.7) | 44 (34.4) | 23 (33.8) |
Tumor grade, n (%) | |||||
1: Predominantly well-differentiated | 143 (36.4) | 21 (13.5) | 14 (11.8) | 11 (9.4) | 22 (36.1) |
2: Moderately differentiated | 216 (55.0) | 86 (55.1) | 72 (60.5) | 33 (28.2) | 35 (57.4) |
3: Poorly differentiated | 34 (8.7) | 49 (31.4) | 33 (27.7) | 73 (62.4) | 4 (6.6) |
Stage, n (%) | |||||
I | 265 (65.6) | 89 (56.7) | 65 (52.4) | 62 (48.8) | 47 (69.1) |
II | 104 (25.7) | 53 (33.8) | 40 (32.3) | 59 (46.5) | 15 (22.1) |
III | 32 (7.9) | 14 (8.9) | 18 (14.5) | 6 (4.7) | 5 (7.4) |
IV | 3 (0.7) | 1 (0.6) | 1 (0.8) | 0 (0.0) | 1 (1.5) |
Tumor estrogen receptor, n (%) | |||||
Positive | 389 (96.5) | 150 (96.2) | 82 (66.7) | 36 (28.3) | 52 (76.5) |
Negative | 14 (3.5) | 6 (3.8) | 41 (33.3) | 91 (71.7) | 16 (23.5) |
Tumor progesterone receptor, n (%) | |||||
Positive | 382 (95.3) | 141 (90.4) | 76 (62.3) | 33 (26.2) | 51 (75.0) |
Negative | 19 (4.7) | 15 (9.6) | 46 (37.7) | 93 (73.8) | 17 (25.0) |
Tumor HER2, n (%) | |||||
Positive | 101 (26.5) | 40 (27.8) | 48 (42.1) | 21 (17.8) | 14 (23.0) |
Negative | 280 (73.5) | 104 (72.2) | 66 (57.9) | 97 (82.2) | 47 (77.0) |
Tumor Ki-67, n (%) | |||||
High | 36 (25.2) | 27 (44.3) | 19 (38.0) | 26 (49.1) | 5 (16.7) |
Low | 107 (74.8) | 34 (55.7) | 31 (62.0) | 27 (50.9) | 25 (83.3) |
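Table 1 reports percentages within each PAM50 subtype (column-wise), whereas several percentages quoted in the text are shares across subtypes within one row (for example, the share of ER+ tumors that are Luminal). The short sketch below recomputes three of those row-wise shares from the counts in Table 1; the helper name `share` is illustrative.

```python
SUBTYPES = ["Luminal A", "Luminal B", "HER2-enriched", "Basal-like", "Normal-like"]

# Counts copied from Table 1, in the order of SUBTYPES.
er_positive = [389, 150, 82, 36, 52]
er_negative = [14, 6, 41, 91, 16]
grade3 = [34, 49, 33, 73, 4]


def share(counts, names):
    """Percentage of a row's total that falls in the named subtypes."""
    selected = sum(c for s, c in zip(SUBTYPES, counts) if s in names)
    return 100 * selected / sum(counts)


luminal_among_er_pos = share(er_positive, {"Luminal A", "Luminal B"})
basal_among_er_neg = share(er_negative, {"Basal-like"})
basal_among_grade3 = share(grade3, {"Basal-like"})
print(round(luminal_among_er_pos), round(basal_among_er_neg), round(basal_among_grade3))
```

These reproduce the 76%, 54% and 38% quoted in the text.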
Women with the Basal-like subtype were significantly more likely to have poorer RFS outcomes within five years (Table 2A). Although women with HER2-enriched subtypes appear to have significantly poorer RFS and DRFS outcomes at both five and ten years compared to women with Luminal A subtypes (Tables 2A and 2B), further analyses showed that this finding is generally reflective of women diagnosed prior to the introduction of targeted therapy for HER2 (i.e., trastuzumab) in 1999 (Supplementary Table S4A). After 1999, there was no difference in RFS or DRFS rates among women with HER2-enriched subtypes compared to women with Luminal A subtypes (Supplementary Table S4B). The relationships between PAM50 subtypes and RFS or DRFS in the NHS/NHSII are illustrated in Supplementary Figures S3A-S3D.
Table 2A.
Subtype | 5-year Event n/Total n | 5-year HR | 5-year (95% CI) | 5-year P-Value | 10-year Event n/Total n | 10-year HR | 10-year (95% CI) | 10-year P-Value |
---|---|---|---|---|---|---|---|---|
A. Crude | ||||||||
Luminal A | 19/402 | 1.00 | ref | - | 36/402 | 1.00 | ref | - |
Luminal B | 13/157 | 1.81 | (0.90,3.67) | 0.10 | 22/157 | 1.63 | (0.96,2.76) | 0.07 |
HER2-enriched | 18/122 | 3.32 | (1.74,6.33) | <0.01 | 22/122 | 2.15 | (1.27,3.66) | <0.01 |
Basal-like | 13/127 | 2.26 | (1.12,4.58) | 0.02 | 21/127 | 1.93 | (1.13,3.31) | 0.02 |
Normal-like | 7/68 | 2.24 | (0.94,5.33) | 0.07 | 11/68 | 1.85 | (0.94,3.64) | 0.07 |
ROR-PT score* | 70/857 | 1.24 | (1.11,1.39) | <0.01 | 112/857 | 1.19 | (1.09,1.30) | <0.01 |
B. Adjusted Model# | ||||||||
Luminal A | 19/402 | 1.00 | ref | - | 36/402 | 1.00 | ref | - |
Luminal B | 13/157 | 1.61 | (0.77,3.38) | 0.21 | 22/157 | 1.46 | (0.83,2.56) | 0.19 |
HER2-enriched | 18/122 | 2.80 | (1.42,5.55) | <0.01 | 22/122 | 1.87 | (1.06,3.27) | 0.03 |
Basal-like | 13/127 | 2.42 | (1.06,5.49) | 0.03 | 21/127 | 1.80 | (0.96,3.38) | 0.07 |
Normal-like | 7/68 | 1.99 | (0.78,5.09) | 0.15 | 11/68 | 1.75 | (0.86,3.58) | 0.13 |
ROR-PT score* | 70/857 | 1.09 | (0.95,1.25) | 0.21 | 112/857 | 1.06 | (0.95,1.18) | 0.27 |
*ROR-PT was evaluated as a continuous variable per 10-unit change in 857 women with ROR-PT scores.
#Adjusted model included age and year of diagnosis, clinical grade, stage, type of surgery and type of treatment. HR, hazard ratio.
Table 2B.
Subtype | 5-year Event n/Total n | 5-year HR | 5-year (95% CI) | 5-year P-Value | 10-year Event n/Total n | 10-year HR | 10-year (95% CI) | 10-year P-Value |
---|---|---|---|---|---|---|---|---|
A. Crude | ||||||||
Luminal A | 17/402 | 1.00 | ref | - | 25/402 | 1.00 | ref | - |
Luminal B | 9/157 | 1.38 | (0.61,3.09) | 0.44 | 18/157 | 1.88 | (1.03,3.45) | 0.04 |
HER2-enriched | 16/122 | 3.29 | (1.66,6.51) | <0.01 | 18/122 | 2.52 | (1.38,4.62) | <0.01 |
Basal-like | 9/127 | 1.72 | (0.77,3.86) | 0.19 | 15/127 | 1.95 | (1.03,3.69) | 0.04 |
Normal-like | 6/68 | 2.14 | (0.84,5.43) | 0.11 | 9/68 | 2.17 | (1.02,4.66) | <0.05 |
ROR-PT score* | 57/857 | 1.29 | (1.14,1.46) | <0.01 | 85/857 | 1.23 | (1.11,1.36) | <0.01 |
B. Adjusted Model# | ||||||||
Luminal A | 17/402 | 1.00 | ref | - | 25/402 | 1.00 | ref | - |
Luminal B | 9/157 | 1.19 | (0.51,2.79) | 0.69 | 18/157 | 1.68 | (0.88,3.18) | 0.11 |
HER2-enriched | 16/122 | 2.74 | (1.31,5.70) | <0.01 | 18/122 | 2.18 | (1.15,4.14) | 0.02 |
Basal-like | 9/127 | 1.96 | (0.76,5.04) | 0.16 | 15/127 | 2.07 | (0.98,4.37) | 0.06 |
Normal-like | 6/68 | 1.85 | (0.66,5.16) | 0.24 | 9/68 | 1.93 | (0.86,4.34) | 0.11 |
ROR-PT score* | 57/857 | 1.15 | (0.98,1.34) | 0.08 | 85/857 | 1.10 | (0.97,1.23) | 0.13 |
*ROR-PT was evaluated as a continuous variable per 10-unit change in 857 women with ROR-PT scores.
#Adjusted model included age and year of diagnosis, clinical grade, stage, type of surgery and type of treatment. HR, hazard ratio.
The PAM50 algorithm automatically stratified ROR-PT scores into low, medium and high categories. These categories showed the expected relationships using both pre-processing methods: women predicted as “low” had the best RFS and DRFS outcomes (Figures 4A-4D). ROR-PT scores were also analyzed as a continuous variable per 10-unit change in the Cox proportional hazard model. In crude models for MMGC, every 10-unit increase in ROR-PT score corresponded to a 24% increase in risk of recurrence within five years (HR 1.24, 95% CI 1.11–1.39) and a 19% increase within ten years (HR 1.19, 95% CI 1.09–1.30) (Table 2A). The corresponding increases in risk of distant recurrence were 29% within five years (HR 1.29, 95% CI 1.14–1.46) and 23% within ten years (HR 1.23, 95% CI 1.11–1.36) (Table 2B). These associations were attenuated and no longer statistically significant in the adjusted models (Tables 2A and 2B). Results were very similar when PAM50 and ROR-PT were computed using SSGC (Supplementary Tables S5A-S5D).
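The percentage increases quoted above follow directly from the hazard ratios in Tables 2A and 2B: a hazard ratio of 1.24 per 10 units is a 24% higher hazard. The sketch below shows that conversion, plus how a hazard ratio rescales to a different unit change under the log-linear assumption of the Cox model. The function names are illustrative, and the 20-unit value is a derived illustration, not a result reported in the study.

```python
import math


def percent_increase(hr):
    """Percentage change in hazard implied by a hazard ratio."""
    return (hr - 1.0) * 100.0


def rescale_hr(hr, from_units, to_units):
    """Hazard ratio for a `to_units` change, given the HR for `from_units`.

    Assumes the log-hazard is linear in the score, as in a Cox model
    with the score entered as a continuous covariate.
    """
    return math.exp(math.log(hr) * to_units / from_units)


# Crude hazard ratios per 10-unit change in ROR-PT, from Tables 2A and 2B.
crude_hr = {"RFS 5y": 1.24, "RFS 10y": 1.19, "DRFS 5y": 1.29, "DRFS 10y": 1.23}
increases = {name: round(percent_increase(hr)) for name, hr in crude_hr.items()}
print(increases)

# Illustration only: the same 5-year RFS association per 20 units.
hr_per_20 = rescale_hr(1.24, from_units=10, to_units=20)
print(round(hr_per_20, 4))
```

Hazard ratios multiply rather than add across units, so a 20-unit change corresponds to 1.24 squared (about 1.54), not to a 48% increase.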
Discussion
The discovery of molecular subtypes has created a new tool for clinicians and researchers to further understand breast cancer biology (39), etiology and risk factors (40), and to evaluate response to treatment (34,41,42). Thus, the accurate assignment of molecular subtypes is important. The distributions of PAM50 subtypes, proliferation scores, and ROR-PT scores were highly comparable when computed using either the MMGC or SSGC pre-processing method. Furthermore, the agreement between PAM50 classification by gene expression and IHC was fair when Luminal A and B were considered as a single group. The majority of the NHS/NHSII participants had Luminal A subtype tumors. There was a higher rate of recurrence in women with Basal-like subtypes compared to Luminal A subtypes. ROR-PT scores were only prognostic in crude analyses.
Applying a pre-processing step to the gene expression data, and selecting a specific pre-processing method (i.e., MMGC or SSGC) prior to subtyping, are critical components of a reproducible informatics workflow for PAM50 classification. MMGC and SSGC generally yielded concordant subtypes and highly correlated proliferation scores. The associations of PAM50 subtypes and ROR-PT scores with prognosis were similar when subtypes were computed using either method. It remains unclear which pre-processing method should be considered superior as there is no gold standard measure to compare with. Both pre-processing methods have practical utility and either one may be employed to classify breast tumors. We decided to use the MMGC method to report our main results as this method was used by The Cancer Genome Atlas breast cancer study team (1,2). SSGC is an elegant alternative to MMGC that is useful as an additional check when performing PAM50 subtyping. Future data analyses should take note that proliferation estimates are generally higher when computed by the SSGC method compared to MMGC: tumors are more likely to be classified into the more aggressive molecular subtypes, and tumor-adjacent tissue is less likely to be classified as Normal-like.
The 2015 St Gallen International Expert Conference Report published recommended IHC definitions to more accurately reflect molecular subtypes (7). There are slight differences in the IHC definitions for the Luminal subtypes between our study and St Gallen’s recommendations. Our PR were manually graded as 0, >1% and >10% while St Gallen’s suggests using >20% to classify PR+. We graded Ki-67 as low (<14%) or high (>14%) while St Gallen categorizes Ki-67 into low (<14%), intermediate (14–19%) and high (≥20%). Ki-67 staining information was unavailable for about 60% of women; tumor grade was used as a proxy for these individuals. This may result in the misclassification of IHC subtypes for some individuals, and may in part explain the low agreement between molecular subtyping by PAM50 and IHC in our study. If IHC surrogate definitions are still to be used, further refinement is needed so that breast tumor classification will more closely approximate the PAM50 subtypes.
With technological advances in RNA extraction and the availability of the NanoString nCounter Dx Analysis System (12), more studies in the future should be able to obtain molecular subtypes derived from gene expression instead of relying on IHC surrogates. The difference in molecular subtyping by PAM50 and IHC is further demonstrated by this current study. The PAM50 distribution of NHS/NHSII participants in this study was 46% Luminal A, 18% Luminal B, 14% HER2-enriched, 15% Basal-like and 8% Normal-like. In contrast, our previous study, which utilized IHC surrogates to classify 5561 tumors, reported higher percentages of women classified as Luminal A (55%) and B (27%), and lower percentages of HER2-enriched (6%) and Basal-like (10%), with 2.9% unclassified tumors (31).
Gene expression data are only available for a subset of the breast cancer cases in NHS/NHSII, though this subset is generally representative of the overall NHS/NHSII breast cancer population. The majority of NHS participants are white postmenopausal women, while NHSII participants are mostly white premenopausal women. Our data showed that the prevalence of each PAM50 subtype did not differ by participant menopausal status. Given that IHC subtype surrogates have been shown to differ by race, future studies should investigate potential differences in the distributions of PAM50 subtypes in minority populations (43).
As expected, women with tumors of Basal-like subtypes had poorer RFS outcomes compared to women with Luminal A subtypes. Women with tumors of the HER2-enriched subtype had significantly poorer RFS outcomes at both five and ten years compared to women with Luminal A subtypes only when diagnosed before 1999. In contrast to other studies, we did not observe poorer prognosis for Luminal B tumors at ten years (41,44). This may be attributed to the small number of events among women with Luminal B tumors or to the different pre-processing method used for PAM50 subtyping.
Molecular subtyping was specifically developed to classify breast cancers. We applied the PAM50 algorithm to classify histologically normal tumor-adjacent tissue into molecular subtypes. The histologically normal tissue was classified as Normal-like for 39–53% of tumor-adjacent samples, depending on the pre-processing method used. This suggests that histologically normal tumor-adjacent tissue may not be biologically normal for all women. Expression of an estrogen response signature and in vivo triple-negative signature in tumor-adjacent tissue was found to differ across tumor PAM50 subtypes (30). Future work could identify novel molecular subtypes unique to tumor-adjacent tissue and determine whether these novel subtypes offer additional insights into therapy response and prognosis (28,29).
In summary, we used two pre-processing methods (MMGC and SSGC) to characterize the PAM50 breast cancer molecular subtypes of tumor and histologically normal tumor-adjacent samples. We have shown that either pre-processing method may be utilized to derive PAM50 subtypes for future studies. In the NHS/NHSII, the majority of tumor and tumor-adjacent tissues were classified as Luminal A and Normal-like, respectively. Women with Luminal A or B tumors were more likely to have Normal-like tumor-adjacent tissues than women with HER2-enriched or Basal-like tumors. Women with Basal-like subtypes had poorer prognoses compared to Luminal A subtypes. The identification of novel tumor-adjacent molecular subtypes in the future may provide new insights into breast cancer therapy response and prognosis.
Acknowledgements
We thank the participants and staff of the NHS and the NHSII for their valuable contributions as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, WY.
Financial Information
Funding for this project was provided by National Institutes of Health grants U19 CA148065 (D Hunter, R Tamimi), UM1 CA186107 (R Tamimi), P01 CA87969 (R Tamimi, AH Eliassen, Y Heng, A Stancu, M Pyle, G Baker), UM1 CA176726 (AH Eliassen, R Tamimi, Y Heng, A Stancu, M Pyle, G Baker), and R01 CA166666 (S Hankinson, Y Heng, A Stancu, M Pyle); Komen grant SAC110014 (S Hankinson, Y Heng, A Stancu); National Cancer Institute Predoctoral National Research Service Award F31CA192462 (K Kensler); National Cancer Institute Institutional National Research Service Award T32CA009001 (K Kensler); and the Klarman Family Foundation (Y Heng).
Abbreviations list:
- CK 5/6: cytokeratin 5/6
- DRFS: distant recurrence-free survival
- EGFR: epidermal growth factor receptor
- ER: estrogen receptor
- FFPE: formalin-fixed paraffin embedded
- HER2: human epidermal growth factor receptor 2
- HTA: Human Transcriptome Arrays
- IHC: immunohistochemistry
- MMGC: modified median gene centering
- NHS: Nurses’ Health Study
- PR: progesterone receptor
- RFS: recurrence free survival
- ROR-PT: risk of relapse with proliferation and tumor size weighted
- SSGC: subgroup-specific gene centering
Footnotes
Conflict of interest disclosure: JSP is a consultant for NanoString. All other authors declare no competing interests.
References
- 1. The Cancer Genome Atlas. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490(7418):61–70. doi: 10.1038/nature11412.
- 2. Heng YJ, Lester SC, Tse GMK, et al. The molecular basis of breast cancer pathological phenotypes. J Pathol. 2017;241(3):375–391. doi: 10.1002/path.4847.
- 3. Paik S, Shak S, Tang G, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351(27):2817–2826. doi: 10.1056/NEJMoa041588.
- 4. van de Vijver MJ, He YD, van’t Veer LJ, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002;347(25):1999–2009. doi: 10.1056/NEJMoa021967.
- 5. Parker JS, Mullins M, Cheang MCU, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27(8):1160–1167. doi: 10.1200/JCO.2008.18.1370.
- 6.Caan BJ, Sweeney C, Habel LA, et al. Intrinsic subtypes from the PAM50 gene expression assay in a population-based breast cancer survivor cohort: Prognostication of short- and long-term outcomes. Cancer Epidemiol Biomarkers Prev. 2014;23(5):725–734. doi: 10.1158/1055-9965.EPI-13-1017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Coates AS, Winer EP, Goldhirsch A, et al. Tailoring therapies-improving the management of early breast cancer: St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2015. Ann Oncol. 2015;26(8):1533–1546. doi: 10.1093/annonc/mdv221. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Prat A, Parker JS, Karginova O, et al. Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast cancer Res. 2010;12(5):R68. doi: 10.1186/bcr2635. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Sabatier R, Finetti P, Guille A, et al. Claudin-low breast cancers: clinical, pathological, molecular and prognostic characterization. Mol Cancer. 2014;13(1):228. doi: 10.1186/1476-4598-13-228.
- 10. Lehmann-Che J, Hamy A-S, Porcher R, et al. Molecular apocrine breast cancers are aggressive estrogen receptor negative tumors overexpressing either HER2 or GCDFP15. Breast Cancer Res. 2013;15(3):R37. doi: 10.1186/bcr3421.
- 11. Hu Z, Fan C, Oh DS, et al. The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics. 2006;7:96. doi: 10.1186/1471-2164-7-96.
- 12. Wallden B, Storhoff J, Nielsen T, et al. Development and verification of the PAM50-based Prosigna breast cancer gene signature assay. BMC Med Genomics. 2015;8:54. doi: 10.1186/s12920-015-0129-6.
- 13. Guiu S, Michiels S, André F, et al. Molecular subclasses of breast cancer: How do we define them? The IMPAKT 2012 working group statement. Ann Oncol. 2012;23(12):2997–3006. doi: 10.1093/annonc/mds586.
- 14. Tamimi RM, Colditz GA, Hazra A, et al. Traditional breast cancer risk factors in relation to molecular subtypes of breast cancer. Breast Cancer Res Treat. 2012;131(1):159–167. doi: 10.1007/s10549-011-1702-0.
- 15. Cheang MC, Voduc D, Bajdik C, et al. Basal-like breast cancer defined by five biomarkers has superior prognostic value than triple-negative phenotype. Clin Cancer Res. 2008;14(5):1368–1376. doi: 10.1158/1078-0432.CCR-07-1658.
- 16. de Ronde JJ, Hannemann J, Halfwerk H, et al. Concordance of clinical and molecular breast cancer subtyping in the context of preoperative chemotherapy response. Breast Cancer Res Treat. 2010;119(1):119–126. doi: 10.1007/s10549-009-0499-6.
- 17. Bastien RR, Rodríguez-Lescure Á, Ebbert MTW, et al. PAM50 breast cancer subtyping by RT-qPCR and concordance with standard clinical molecular markers. BMC Med Genomics. 2012;5:44. doi: 10.1186/1755-8794-5-44.
- 18. Maisonneuve P, Disalvatore D, Rotmensz N, et al. A revised clinico-pathological surrogate definition of Luminal A intrinsic breast cancer subtype. Breast Cancer Res. 2014;16(3):R65. doi: 10.1186/bcr3679.
- 19. Cheang MCU, Chia SK, Voduc D, et al. Ki67 index, HER2 status, and prognosis of patients with luminal B breast cancer. J Natl Cancer Inst. 2009;101(10):736–750. doi: 10.1093/jnci/djp082.
- 20. Prat A, Cheang MCU, Martín M, et al. Prognostic significance of progesterone receptor-positive tumor cells within immunohistochemically defined luminal A breast cancer. J Clin Oncol. 2013;31(2):203–209. doi: 10.1200/JCO.2012.43.4134.
- 21. Allott EH, Geradts J, Cohen SM, et al. Frequency of breast cancer subtypes among African American women in the AMBER consortium. Breast Cancer Res. 2018;20(1):12. doi: 10.1186/s13058-018-0939-5.
- 22. Lusa L, McShane LM, Reid JF, et al. Challenges in projecting clustering results across gene expression-profiling datasets. J Natl Cancer Inst. 2007;99(22):1715–1723. doi: 10.1093/jnci/djm216.
- 23. Gendoo DMA, Ratanasirigulchai N, Schröder MS, et al. Genefu: An R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics. 2016;32(7):1097–1099. doi: 10.1093/bioinformatics/btv693.
- 24. Patil P, Bachant-Winner PO, Haibe-Kains B, Leek JT. Test set bias affects reproducibility of gene signatures. Bioinformatics. 2015;31(14):2318–2323. doi: 10.1093/bioinformatics/btv157.
- 25. Curtis C, Shah SP, Chin S-F, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature. 2012;486(7403):346–352. doi: 10.1038/nature10983.
- 26. Zhao X, Rødland EA, Tibshirani R, Plevritis S. Molecular subtyping for clinically defined breast cancer subgroups. Breast Cancer Res. 2015;17(1):29. doi: 10.1186/s13058-015-0520-4.
- 27. Huang X, Stern DF, Zhao H. Transcriptional profiles from paired normal samples offer complementary information on cancer patient survival – evidence from TCGA pan-cancer data. Sci Rep. 2016;6:20567. doi: 10.1038/srep20567.
- 28. Roman-Perez E, Casbas-Hernandez P, Pirone JR, et al. Gene expression in extratumoral microenvironment predicts clinical outcome in breast cancer patients. Breast Cancer Res. 2012;14(2):R51. doi: 10.1186/bcr3152.
- 29. Troester MA, Hoadley KA, D’Arcy M, et al. DNA defects, epigenetics, and gene expression in cancer-adjacent breast: a study from The Cancer Genome Atlas. npj Breast Cancer. 2016;2:16007. doi: 10.1038/npjbcancer.2016.7.
- 30. Casbas-Hernandez P, Sun X, Roman-Perez E, et al. Tumor intrinsic subtype is reflected in cancer-adjacent tissue. Cancer Epidemiol Biomarkers Prev. 2015;24(2):406–414. doi: 10.1158/1055-9965.EPI-14-0934.
- 31. Sisti JS, Collins LC, Beck AH, Tamimi RM, Rosner BA, Eliassen AH. Reproductive risk factors in relation to molecular subtypes of breast cancer: Results from the nurses’ health studies. Int J Cancer. 2016;138(10):2346–2356. doi: 10.1002/ijc.29968.
- 32. Tamimi RM, Baer HJ, Marotti J, et al. Comparison of molecular phenotypes of ductal carcinoma in situ and invasive breast cancer. Breast Cancer Res. 2008;10(4):R67. doi: 10.1186/bcr2128.
- 33. Wang J, Heng YJ, Eliassen AH, et al. Alcohol consumption and breast tumor gene expression. Breast Cancer Res. 2017;19(1):108. doi: 10.1186/s13058-017-0901-y.
- 34. Nielsen TO, Parker JS, Leung S, et al. A comparison of PAM50 intrinsic subtyping with immunohistochemistry and clinical prognostic factors in tamoxifen-treated estrogen receptor-positive breast cancer. Clin Cancer Res. 2010;16(21):5222–5232. doi: 10.1158/1078-0432.CCR-10-1282.
- 35. Collins LC, Marotti JD, Baer HJ, Tamimi RM. Comparison of estrogen receptor results from pathology reports with results from central laboratory testing. J Natl Cancer Inst. 2008;100(3):218–221. doi: 10.1093/jnci/djm270.
- 36. Hirko KA, Chen WY, Willett WC, et al. Alcohol consumption and risk of breast cancer by molecular subtype: Prospective analysis of the nurses’ health study after 26 years of follow-up. Int J Cancer. 2016;138(5):1094–1101. doi: 10.1002/ijc.29861.
- 37. Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28(5):1–26. doi: 10.18637/jss.v028.i05.
- 38. Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika. 1994;81(3):515–526. doi: 10.1093/biomet/81.3.515.
- 39. Kwan ML, Kroenke CH, Sweeney C, et al. Association of high obesity with PAM50 breast cancer intrinsic subtypes and gene expression. BMC Cancer. 2015;15(1):278. doi: 10.1186/s12885-015-1263-4.
- 40. Barnard ME, Boeke CE, Tamimi RM. Established breast cancer risk factors and risk of intrinsic tumor subtypes. Biochim Biophys Acta. 2015;1856(1):73–85. doi: 10.1016/j.bbcan.2015.06.002.
- 41. Liu MC, Pitcher BN, Mardis ER, et al. PAM50 gene signatures and breast cancer prognosis with adjuvant anthracycline- and taxane-based chemotherapy: correlative analysis of C9741 (Alliance). npj Breast Cancer. 2016;2(1):15023. doi: 10.1038/npjbcancer.2015.23.
- 42. Prat A, Cheang MCU, Galván P, et al. Prognostic Value of Intrinsic Subtypes in Hormone Receptor-Positive Metastatic Breast Cancer Treated With Letrozole With or Without Lapatinib. JAMA Oncol. 2016;2(10):1287–1294. doi: 10.1001/jamaoncol.2016.0922.
- 43. Carey LA, Perou CM, Livasy CA, et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA. 2006;295(21):2492–2502. doi: 10.1001/jama.295.21.2492.
- 44. Chia SK, Bramwell VH, Tu D, et al. A 50-gene intrinsic subtype classifier for prognosis and prediction of benefit from adjuvant tamoxifen. Clin Cancer Res. 2012;18(16):4465–4472. doi: 10.1158/1078-0432.CCR-12-0286.