Summary
To correlate the variable clinical features of estrogen receptor positive (ER+) breast cancer with somatic alterations, we studied pre-treatment tumour biopsies accrued from patients in a study of neoadjuvant aromatase inhibitor (AI) therapy by massively parallel sequencing and analysis. Eighteen significantly mutated genes were identified, including five genes (RUNX1, CBFB, MYH9, MLL3 and SF3B1) previously linked to hematopoietic disorders. Mutant MAP3K1 was associated with Luminal A status, low grade histology and low proliferation rates, whereas mutant TP53 was associated with the opposite pattern. Moreover, mutant GATA3 correlated with suppression of proliferation upon AI treatment. Pathway analysis demonstrated that mutations in MAP2K4, a MAP3K1 substrate, produced perturbations similar to those of MAP3K1 loss. Distinct phenotypes in ER+ breast cancer are associated with specific patterns of somatic mutations that map into cellular pathways linked to tumor biology, but most recurrent mutations are relatively infrequent. Prospective clinical trials based on these findings will require comprehensive genome sequencing.
Introduction
Estrogen receptor positive (ER+) breast cancer exhibits highly variable prognosis, histological growth patterns and treatment outcomes. Neoadjuvant aromatase inhibitor (AI) treatment trials provide an opportunity to document ER+ breast cancer phenotypes in a setting where sample acquisition is facile, prospective consent for genomic analysis can be obtained and responsiveness to estrogen deprivation therapy is documented1. We therefore conducted massively parallel sequencing (MPS) on 77 samples accrued from two neoadjuvant aromatase inhibitor clinical trials2,3. Forty-six cases underwent whole genome sequencing (WGS) and 31 cases underwent exome sequencing, followed by extensive analysis for somatic alterations and their association with aromatase inhibitor response. Case selection for discovery was based on the tumour Ki67 level in the surgical specimen, since high cellular proliferation despite AI treatment identifies poor prognosis tumours exhibiting estrogen-independent growth4 (Supplementary Fig. 1). Twenty-nine samples displayed Ki67 above 10% (“AI resistant tumours”, median Ki67 21%, range 10.3–80%) and 48 were at or below 10% (“AI sensitive tumours”, median Ki67 1.2%, range 0–8%). Cases were also classified as luminal A or B by gene expression profiling3. We subsequently examined interactions between Ki67 biomarker change, histological categories, intrinsic subtype and mutation status in selected recurrently mutated genes in 310 cases overall. Pathway analysis was applied to contrast the signaling perturbations in AI sensitive versus resistant tumors.
Results
The mutation landscape of luminal-type breast cancer
Using paired-end MPS, 46 tumour and normal genomes were sequenced to at least 30-fold and 25-fold haploid coverage, respectively, with diploid coverage of at least 95% based on concordance with SNP array data (Supplementary Table 1). Candidate somatic events were identified using multiple algorithms5,6 and were then verified by hybridization capture-based validation that targeted all putative somatic single nucleotide variants (SNVs) and small insertions/deletions (indels) that overlap coding exons, splice sites, and RNA genes (tier 1), high-confidence SNVs and indels in non-coding conserved or regulatory regions (tier 2), as well as non-repetitive regions of the human genome (tier 3). In addition, somatic structural variants (SVs) and germline SVs that potentially affect coding sequences (Supplementary Information) were assessed. Digital sequencing data from captured target DNAs from the 46 tumour and normal pairs (Supplementary Table 2 and Supplementary Information) confirmed 81,858 mutations (point mutations and indels) and 773 somatic SVs. The average numbers of somatic mutations and SVs were 1,780 (range 44–11,619) and 16.8 (range 0–178) per case, respectively (Supplementary Table 3). Tier 1 point mutations and small indels predicted for all 46 cases also were validated using both 454 and Illumina sequencing (Supplementary Information). BRC25 was a clear outlier with only 44 validated tier 1–3 mutations at low allele frequencies (ranging from 5% to 26.8%). This sample likely had low tumour content despite histopathology assessment, but the data are included to avoid bias.
The overall mutation rate was 1.18 validated mutations per Mbp (tier 1: 1.05; tier 2: 1.14; tier 3: 1.20). The mutation rate for tier 1 was higher than observed for AML (0.18–0.23)6,7 but lower than reported for hepatocellular carcinoma (1.85)8, malignant melanoma (6.65)9 and lung cancers (3.05–8.93)10,11 (Supplementary Table 4). The background mutation rate (BMR) across the 21 AI resistant tumours was 1.62 per Mbp, nearly twice that of the 25 AI sensitive tumours at 0.824 per Mbp (P = 0.02, one-sided t-test). A trend for more somatic structural variations in the AI resistant group also was observed, as the validated somatic structural variation frequency in the 21 AI resistant tumour genomes averaged 21.69 versus 12.76 in the 25 AI sensitive tumours (P = 0.16, one-sided t-test) (Fig. 1). When the 10 TP53 mutated cases were excluded, the BMR still tended to be higher in the AI resistant group (P = 0.08). To demonstrate that a single tumour core biopsy produced representative genomic data, whole genome sequencing of two pre-treatment biopsies was conducted for 5 of the 46 cases. The frequency of mutations in the paired specimens showed high concordance in all cases (correlation coefficients ranged from 0.74 to 0.95) (Supplementary Fig. 2), and a somatic mutation was infrequently detected in only one of the two samples (4.65% overall).
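The per-case averages and the roughly twofold difference in background mutation rate quoted above follow directly from the reported totals. The short Python check below reproduces that arithmetic; the variable names are illustrative, and only figures stated in the text are used.

```python
# Totals reported for the 46 whole-genome cases.
n_cases = 46
validated_mutations = 81_858  # point mutations and indels
validated_svs = 773           # somatic structural variants

mean_mutations = validated_mutations / n_cases
mean_svs = validated_svs / n_cases

# Background mutation rates (mutations per Mbp) for the two Ki67-defined groups.
bmr_resistant = 1.62   # 21 AI resistant tumours
bmr_sensitive = 0.824  # 25 AI sensitive tumours
bmr_ratio = bmr_resistant / bmr_sensitive

print(f"mean mutations per case: {mean_mutations:.0f}")
print(f"mean SVs per case: {mean_svs:.1f}")
print(f"resistant/sensitive BMR ratio: {bmr_ratio:.2f}")
```

The ratio falls just under 2, consistent with the description of the resistant-group rate as nearly twice the sensitive-group rate.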
Significantly mutated genes in luminal-type breast cancer
The discovery effort was extended by studying 31 additional cases by exome sequencing, producing an additional 1,371 tier 1 mutations. In total the 77 cases yielded 3,355 tier 1 somatic mutations, including 3,208 point mutations, 1 dinucleotide mutation, and 146 indels, ranging from 1 to 28 nucleotides. The point mutations included 733 silent, 2,145 missense, 178 nonsense, 6 read-through, 69 splice-site mutations, and 77 in RNA genes (Supplementary Table 5). Of 2,145 missense mutations, 1,551 were predicted to be deleterious by SIFT14 and/or PolyPhen15. The MuSiC package (Dees et al., manuscript submitted) was applied to determine the significance of the difference between observed versus expected mutation events in each gene based on the background mutation rate. This identified 18 significantly mutated genes (SMGs) with a convolution FDR < 0.26 (Table 1 and Supplementary Table 6). The list contains genes previously identified as mutated in breast cancer (PIK3CA12, TP5313, GATA314, CDH115, RB116, MLL317, MAP3K118 and CDKN1B19) as well as genes not previously observed in clinical breast cancer samples, including TBX3, RUNX1, LDLRAP1, MYH9, AGTR2, STMN2, SF3B1, and CBFB.
Table 1.
Gene | Total | MS | NS | Indel | SS | P-value | FDR |
---|---|---|---|---|---|---|---|
MAP3K1 | 13 | 2 | 3 | 8 | 0 | 0 | 0 |
PIK3CA | 45 | 44 | 0 | 1 | 0 | 0 | 0 |
TP53 | 18 | 13 | 1 | 1 | 1 | 0 | 0 |
GATA3 | 8 | 1 | 0 | 4 | 3 | 1.15E-19 | 7.41E-16 |
CDH1 | 8 | 1 | 1 | 5 | 1 | 3.07E-15 | 1.59E-11 |
TBX3 | 4 | 0 | 0 | 3 | 0 | 2.58E-06 | 0.011 |
ATR | 6 | 6 | 0 | 0 | 0 | 3.73E-06 | 0.014 |
RUNX1 | 4 | 4 | 0 | 0 | 0 | 6.59E-06 | 0.021 |
ENSG00000212670 | 2 | 2 | 0 | 0 | 0 | 2.31E-05 | 0.066 |
RB1 | 4 | 2 | 1 | 0 | 1 | 2.76E-05 | 0.071 |
LDLRAP1 | 2 | 1 | 1 | 0 | 0 | 4.27E-05 | 0.092 |
STMN2 | 2 | 1 | 0 | 1 | 0 | 4.15E-05 | 0.092 |
MYH9 | 4 | 1 | 1 | 2 | 0 | 8.96E-05 | 0.178 |
MLL3 | 5 | 1 | 1 | 3 | 0 | 0 | 0.191 |
CDKN1B | 2 | 0 | 1 | 1 | 0 | 0 | 0.240 |
AGTR2 | 2 | 2 | 0 | 0 | 0 | 0 | 0.256 |
SF3B1 | 3 | 3 | 0 | 0 | 0 | 0 | 0.256 |
CBFB | 2 | 1 | 1 | 0 | 0 | 0 | 0.256 |
Footnote:
ENSG00000212670 is not in RefSeq release 50.
MS = Missense, NS = Nonsense, SS = Splice Site.
Thirteen mutations (3 nonsense, 6 frame-shift indels, 2 in-frame deletions and 2 missense) were identified in MAP3K1 (Table 1 and Fig. 2), a serine/threonine kinase that activates the ERK and JNK kinase pathways through phosphorylation of MAP2K1 and MAP2K420. Of interest, a missense (S184L) and a splice-region mutation (e2+3, likely affecting splicing) in MAP2K4 were observed in two tumours with no MAP3K1 mutation (Fig. 2). Single nonsynonymous mutations in MAP3K12, MAP3K4, MAP4K3, MAP4K4, MAPK15, and MAPK3 also were detected (Supplementary Table 5). TBX3 harbored three small indels (one insertion and two deletions). TBX3 affects expansion of breast cancer stem-like cells through regulation of FGFR21. Two truncating mutations in the tumor suppressor CDKN1B were identified19. Four missense RUNX1 mutations were observed, with three in the RUNT domain clustered within the 8 amino acid putative ATP-binding site (R166Q, G168E, and R169K). RUNX1 is a transcription factor affected by mutation and translocation in the M2 subtype of AML22 and is implicated in tethering ER to promoters independently of estrogen response elements23. Two mutations (N104S and N140*) also were identified in CBFB, the binding partner of RUNX1. Additional mutations included 3 missense mutations (2 K700E and 1 K666Q) in SF3B1, a splicing factor implicated in MDS24 and CLL25. One missense mutation, one nonsense mutation, and two indels were found in MYH9, a gene involved in hereditary macrothrombocytopenia26 as well as being observed in an ALK translocation in anaplastic large cell lymphoma27.
We also identified three SMGs (LDLRAP1, AGTR2, and STMN2) not previously implicated in cancer. A missense and a nonsense mutation were observed in LDLRAP1, a gene associated with familial hypercholesterolemia28. AGTR2, which encodes angiotensin II receptor type 2, harbored two missense mutations (V184I and R251H). Angiotensin signaling and ER intersect in models of tissue fibrosis29. STMN2, a gene activated by JNK family kinases30,31 and therefore regulated by MAP3K1 and MAP2K4, harbored one frameshift deletion and one missense mutation. Three deletions and one point mutation (Supplementary Fig. 3) were identified in a large, infrequently spliced non-coding (lnc) RNA gene, MALAT1 (metastasis associated lung adenocarcinoma transcript 1), that regulates alternative splicing by modulating the phosphorylation of SR splicing factors32. Translocations and point mutations of MALAT1 have been reported in sarcoma33 and colorectal cancer cell lines34. Five additional MALAT1 mutations were found in the recurrent screening set (Supplementary Table 5d). The locations of these mutations clustered in a region of species homology (F1 and 2 domains) that could mediate interactions with SRSF132 (Supplementary Fig. 4). Non-coding mutation clusters were found in ATR, GPR126, and NRG3 (Supplementary Information and Supplementary Table 7).
Correlations between mutations, AI response biomarkers, and histology
To study clinical correlations, mutation recurrence screening was conducted on an additional 240 cases (Supplementary Table 8 and Supplementary Fig. 1). By combining WGS, exome, and recurrence screening data, we determined the mutation frequency in PIK3CA to be 41.3% (131 of 317 tumours) (Supplementary Table 5a-d and Supplementary Fig. 3). TP53 was mutated in 51 of 317 tumours (16.1%) (Supplementary Table 5a-d and Supplementary Fig. 3). Additionally, 52 nonsynonymous MAP3K1 mutations in 39 tumours and 10 mutations in its substrate MAP2K4 were observed, representing a combined case frequency of 15.5% (Supplementary Table 5a-d and Fig. 3). Of note, 52 of the 62 non-silent mutations in MAP3K1 and MAP2K4 were scattered indels or other protein truncating events, strongly suggesting functional inactivation. In addition, 13 tumours harbored two non-silent MAP3K1 mutations, indicative of bi-allelic loss and reinforcing the conclusion that this gene is a tumour suppressor. Twenty-nine tumours harboured a total of 30 mutations in GATA3, consisting of 25 truncation events, one in-frame insertion, and 4 missense mutations including 3 recurrent mutations at M294K (Supplementary Table 5a-d and Supplementary Fig. 3). BRC8 harboured a chromosome 10 deletion that includes GATA3. CDH1 mutation data were available for 169 samples and, as expected, its mutation status was strongly associated with lobular breast cancer15 (Table 2). We applied a permutation-based approach in MuSiC (Dees et al., submitted) to ascertain relationships between mutated genes. Negative correlations were found between mutations in gene pairs such as GATA3 and PIK3CA (P = 0.0026), CDH1 and GATA3 (P = 0.015), and CDH1 and TP53 (P = 0.022). MAP3K1 and MAP2K4 mutations were mutually exclusive, albeit without reaching statistical significance (P = 0.3). In contrast, a positive correlation between MAP3K1/MAP2K4 and PIK3CA mutations was highly significant (P = 0.0002) (Supplementary Table 9).
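The case frequencies quoted in this paragraph can be recomputed from the stated counts. The sketch below does so in Python. The `percent` helper is a hypothetical name, and the combined MAP3K1/MAP2K4 figure assumes that the 10 MAP2K4 mutations fall in 10 distinct tumours that do not overlap the 39 MAP3K1-mutant tumours, which is in line with the reported mutual exclusivity but is not stated explicitly.

```python
def percent(mutant_cases: int, total_cases: int) -> float:
    """Return a case frequency as a percentage rounded to one decimal place."""
    return round(100 * mutant_cases / total_cases, 1)


total_tumours = 317  # WGS, exome and recurrence screening cases combined

pik3ca = percent(131, total_tumours)
tp53 = percent(51, total_tumours)

# Assumption: 39 MAP3K1-mutant tumours plus 10 non-overlapping MAP2K4-mutant tumours.
map3k1_map2k4 = percent(39 + 10, total_tumours)

print(pik3ca, tp53, map3k1_map2k4)
```

Under that assumption the three values agree with the 41.3%, 16.1% and 15.5% reported in the text.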
Table 2.
Gene | Expression/histopathology variable | Category | Mutation Frequency* | SET1 P† | SET2 P† | Whole Set FDR P¶ |
---|---|---|---|---|---|---|
TP53 | Luminal subtype | Luminal A | 9.3% (13/140) | 0.001 | 0.46 | 0.041 |
TP53 | Luminal subtype | Luminal B | 21.5% (38/177) | | | |
TP53 | Histological grade | I | 4.5% (3/66) | 0.050 | 0.067 | 0.020 |
TP53 | Histological grade | II/III | 19.2% (48/250) | | | |
MAP3K1 | Luminal subtype | Luminal A | 20.0% (28/140) | 0.018 | 0.028 | 0.005 |
MAP3K1 | Luminal subtype | Luminal B | 6.2% (11/177) | | | |
MAP3K1 | Histological grade | I | 25.8% (17/66) | 0.061 | 0.011 | 0.005 |
MAP3K1 | Histological grade | II/III | 8.8% (22/250) | | | |
CDH1 | Histological type | Ductal | 5.9% (10/169) | 0.41‡ | 2.8E-11 | 3.9E-10 |
CDH1 | Histological type | Lobular | 50.0% (20/40) | | | |
* Mutation percentage (mutant cases/total cases in a category); counts are based on all cases (SET1 and SET2 combined).
† Unadjusted P value from Fisher's exact test or chi-square test, as appropriate.
¶ Benjamini-Hochberg false discovery rate (FDR) adjusted P value using all cases (SET1 and SET2 combined).
‡ Only 77 cases in SET1 had CDH1 sequencing results.
Two independent mutation data sets from these clinical trial samples were analyzed separately and then in combination, with a false discovery rate (FDR) corrected P value to gauge the overall strength and consistency of genotype/phenotype relationships (Table 2 and Supplementary Fig. 1). TP53 mutations in both data sets correlated with significantly higher Ki67 levels, both at baseline (P = 0.0003) and at surgery (P = 0.001). Furthermore, TP53 mutations were significantly enriched in luminal B tumours (P = 0.04) and in higher histological grade tumours (P = 0.02). In contrast, MAP3K1 mutations were more frequent in luminal A tumours (P = 0.02), in grade 1 tumours (P=0.005) and in tumours with lower Ki67 at baseline (P = 0.001) with consistent findings across both data sets. GATA3 mutation did not influence baseline Ki67 levels but was enriched in samples exhibiting greater percentage Ki67 decline (P = 0.01). This finding requires further verification because it was significant in SET1 (uncorrected P value 0.003) but was a marginal finding in SET2 (P = 0.08). However, it suggests GATA3 mutation may be a positive predictive marker for AI response.
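To illustrate the kind of test behind these genotype/phenotype comparisons, the following Python sketch applies a two-sided Fisher's exact test to the combined MAP3K1 counts by luminal subtype from Table 2 (28 of 140 luminal A and 11 of 177 luminal B tumours mutated). It is a minimal standard-library implementation written for illustration, not the MuSiC code used in the study. The whole-set values reported in the table are FDR-adjusted, so the unadjusted result here is not expected to match them.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Rows: luminal A, luminal B; columns: MAP3K1 mutant, MAP3K1 wild type.
p_map3k1_subtype = fisher_exact_two_sided(28, 140 - 28, 11, 177 - 11)
print(f"unadjusted two-sided P = {p_map3k1_subtype:.2g}")
```

An exact test is a reasonable default here because several cells in the tables are small; for larger counts a chi-square test, which the table footnote also mentions, gives similar answers.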
Structural variation and DNA repair mechanisms
Analysis of copy number alterations (CNAs) revealed arm-level gains for 1q, 5p, 8q, 16p, 17q, 20p, and 20q and arm-level losses for 1p, 8p, 16q, and 17p in the 46 WGS tumour genomes (Supplementary Fig. 5). A total of 773 SVs (579 deletions, 189 translocations, and 5 inversions) identified by WGS were validated as somatic in 46 breast cancer genomes by capture validation. No recurrent translocations were detected but six in-frame fusion genes were validated by RT-PCR (Supplementary Information and Supplementary Tables 10–13). Seven tumours had multiple complex translocations with breakpoints suggestive of a catastrophic mitotic event (“chromothripsis”; Supplementary Table 11). Analysis of the SV genomic breakpoints shows the spectra of putative chromothripsis-related events are the same as seen for other somatic events, with the majority of SVs arising from non-homologous end-joining. We classified somatic (mitotic) and germline (meiotic) SVs into four groups: variable number tandem repeat (VNTR), non-allelic homologous recombination (NAHR), microhomology-mediated end joining (MMEJ), and non-homologous end joining (NHEJ), according to criteria described in Supplementary Information. The fraction of each classification is shown for germline and somatic (mitotic) events (Supplementary Table 14). There were significantly more somatic NHEJ events in tumour genomes than the other three types (P < 2.2e-16).
Pathways in luminal breast cancer relevant to AI response
PathScan35 analysis (Supplementary Table 15 and Supplementary Information) indicated that somatic mutations detected in the 77 discovery cases affect a number of pathways including caspase cascade/apoptosis, ErbB signaling, Akt/PI3K/mTOR signaling, TP53/RB signaling, and MAPK/JNK pathways (Figure 4a). To discern the pathways relevant to AI sensitivity, we conducted separate pathway analyses for AI sensitive versus AI resistant tumors. While the majority of top altered pathways (FDR <= 0.15) in each group are shared, several pathways were enriched in the AI resistant group, including the TP53 signaling pathway, DNA replication, and mismatch repair. Specifically, 38% of the AI resistant group (11 of 29 tumours) have mutations in the TP53 pathway, with three having double or triple hits involving TP53, ATR, APAF1, or THBS1. In contrast, only 16.6% (8 of 48 tumours) of the Ki67 low group had mutations in the TP53 signaling pathway, each with only a single hit in TP53, ATR, CCNE2, or IGF1 (Supplementary Table 16).
GeneGo pathway analysis of MetaCore interacting network objects was used to identify genes in the 77 luminal breast cancers with low-frequency mutations that cluster into pathway maps. Eight networks assembled from significant maps encompassed mutations from 71 (92%) of the tumours (Fig. 4b). Many of the network objects shared pathways with SMGs such as TP53, MAP3K1, PIK3CA, and CDH1. GeneGo analysis also revealed that several genes with low-frequency mutations were actually subunits of complexes, resulting in higher mutation rates for that object, e.g., the condensin complex (4 mutations in 4 genes) and the MRN complex (4 mutations on 3 genes). Several pathways without multiple SMGs, such as the apoptotic cascade, calcium/phospholipase signaling, and G-protein coupled receptors, were significantly affected by low-frequency mutations. Grouping tumours by SMGs and pathway mutation status showed that while 55 (71%) of the tumours contained SMGs in significant pathways, an additional 16 (21%) contained only non-SMGs in these pathways. Thus, tumours without a given SMG often had other mutations in the same relevant pathway (Fig. 4b, Supplementary Fig. 6, Supplementary Table 17, and Supplementary Information).
We also applied PARADIGM36 to infer pathway-informed gene activities using gene expression and copy number data to identify several “hubs” of activity (Supplementary Fig. 7, Supplementary Fig. 8 and Supplementary Information). As expected, ESR1 and FOXA1 were among the hubs activated cohort-wide while other hubs exhibited high but differential changes in AI resistant tumours including C-MYC, FOXM1, and C-MYB (Supplementary Fig. 8). The concordance among the 104 MetaCore maps from GeneGo analysis described above is significant, with 75 (72%) matching one of the PARADIGM subnetworks at the 0.05 significance level after multiple test correction (P < 4.4×10−6; Bonferroni-adjusted hypergeometric test) (Supplementary Fig. 9). We identified significant subnetworks associated with Ki67 biomarker status (Supplementary Fig. 10 and Supplemental Information) involving transcription factors controlling large regulons.
The PARADIGM-inferred pathway signatures were further used to derive a map of the genetic mechanisms that may underlie treatment response. A sub-network was constructed in which interactions were retained only if they connected two features with higher than average absolute association with Ki67 biomarker status (Supplementary Fig. 10 and 11 and Supplemental Information). Consistent with the PathScan results, among the largest of the hubs in the identified network were a central DNA Damage hub with the second highest connectivity (55 regulatory interactions; 1% of the network) and TP53 with the 14th highest connectivity (26 connections; 0.5% of the network). Additional highly connected hubs identified in order of connectivity were MYC with 79 connections (1.4%), FYN with 45 (0.8%), MAPK3 with 43, JUN with 40, HDAC1 with 40, SHC1 with 39, and HIF1A/ARNT complex with 39 (Supplementary Fig. 11).
To identify higher-level connections between mutations and clinical features, we compared the samples based on pathway-derived signatures. For each clinical attribute and each SMG, we dichotomized the discovery samples into a positive and negative group for pair-wise comparisons (see details in Supplementary Information). We then computed all pair-wise Pearson correlations between pathway signatures and clustered the resulting correlations (Fig. 5). The entire process was repeated using validated mutations and signatures derived from the validation set (Supplementary Fig. 12). In line with expectation, PIK3CA, MAP3K1, MAP2K4, and low risk preoperative endocrine prognostic index (PEPI) scores (PEPI is an index of recurrence risk post neoadjuvant AI therapy4) cluster with the luminal A subtypes and with each other, and are supported by the validation set analysis. The luminal B-like signatures included TP53, RB1, RUNX1 and MALAT1, which also associated with other poor outcome features such as high baseline and surgical Ki67 levels, high grade histology and high PEPI scores. The TP53 and MALAT1 associations in the discovery set also were supported by the validation set analysis.
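The pair-wise comparison step described above can be sketched in a few lines of Python. The signature vectors below are hypothetical placeholders (the study derived its signatures from PARADIGM-inferred pathway activities), and only the correlation step is shown; the subsequent clustering of the correlation matrix is omitted.

```python
from itertools import product
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical pathway signatures: one value per pathway feature, contrasting
# the positive and negative group for each attribute.
signatures = {
    "MAP3K1_mutant": [0.8, -0.2, 0.5, -0.6],
    "luminal_A": [0.7, -0.1, 0.4, -0.5],
    "TP53_mutant": [-0.9, 0.3, -0.4, 0.7],
}

# All pair-wise correlations, keyed by (attribute, attribute).
correlations = {
    (a, b): pearson(signatures[a], signatures[b])
    for a, b in product(signatures, repeat=2)
}

for (a, b), r in correlations.items():
    if a < b:
        print(f"{a} vs {b}: r = {r:+.2f}")
```

In this toy input the MAP3K1 and luminal A signatures correlate positively while the TP53 signature is anti-correlated with both, mirroring the direction of the groupings reported in the text without reproducing their values.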
Druggable gene analysis
We defined mutations in druggable tyrosine kinase domains including in ERBB2 (a V777L and a 755–759 LRENT in-frame deletion homologous to gefitinib-activating EGFR mutations in lung cancer37), as well as in DDR1 (A829V, R611C), DDR2 (E583D), CSF1R (D735H, M875L), and PDGFRA (E924K). In addition, pleckstrin homology domain mutations were observed in AKT1 (C77F) and AKT2 (S11F), and a kinase domain mutation was identified in RPS6KB1 (S375F) (Supplementary Table 18).
Discussion
The low frequency of many SMGs presents an enormous challenge for correlative analysis, but several statistically significant patterns were identified, including the relationship between MAP3K1 mutation, luminal A subtype, low tumour grade and low Ki67 proliferation index. On this basis, for patients with MAP3K1 mutant luminal tumors, neoadjuvant AI could provide a favorable option. In contrast, tumors with TP53 mutation, which are mostly AI resistant, would be more appropriately treated with other modalities. MAP3K1 activates the ERK family; thus, loss of ERK signaling could explain the indolent nature of MAP3K1 deficient tumours20. However, MAP3K1 also activates JNK through MAP2K4, which also can be mutated38. Loss of JNK signaling produces a defect in apoptosis in response to stress, which would hypothetically explain why these mutations accumulate39,40. PIK3CA was the most frequently mutated gene (41.3%) but was associated with neither clinical nor Ki67 response, confirming our earlier report41. However, the positive association between MAP3K1/MAP2K4 mutations and PIK3CA mutation at both the mutation and pathway levels suggests cooperativity (Fig. 4a).
The finding of multiple SMGs linked previously to benign and malignant haematopoietic disorders suggests that breast cancer, like leukemia, can be viewed as a stem cell disorder that produces indolent or aggressive tumours that display varying phenotypes depending on differentiation blocks generated by different mutation repertoires42. While only MLL3 showed statistical significance in the analysis of 46 WGS cases, multiple mutations in genes related to histone modification and chromatin remodeling are worth noting (Supplementary Table 19). An array of coding mutations and structural variations was discovered in methyltransferases (MLL2, MLL3, MLL4, and MLL5), demethylases (KDM6A, KDM4A, KDM5B, and KDM5C), and acetyltransferases (MYST1, MYST3, and MYST4). Furthermore, our analysis identified several adenine-thymine (AT)–rich interactive domain–containing protein genes (ARID1A, ARID2, ARID3B, and ARID4B) that harbored mutations and large deletions, reinforcing the role of members of the SWI/SNF family in breast cancer.
Pathway analysis enables the evaluation of mutations with low recurrence frequency where statistical comparisons conventionally are underpowered. For example, the eight samples with MAP2K4 mutations were sufficient to derive a reliable pathway-based gene signature in PARADIGM that aligns with MAP3K1. This approach also pointed to a putative connection between MALAT1 and the TP53 pathway. Finally, we provide evidence that transcriptional associations to Ki67 response reside in a connected network under the control of several key “hub” genes including MYC, FYN, and MAP kinases among others. Targeting these hubs in resistant tumours could produce therapeutic advances. In conclusion, the genomic information derived from unbiased sequencing is a logical new starting point for clinical investigation, where the mutation status of an individual patient is determined in advance and treatment decisions are driven by therapeutic hypotheses that stem from knowledge of the genomic sequence and its possible consequences. However, the accrual of large numbers of patients and the use of comprehensive sequencing and gene expression approaches will be required because of the extreme genomic heterogeneity documented by this investigation.
Methods summary
Clinical trial samples were accessed from the preoperative letrozole phase 2 study (NCT00084396)2, which investigated the effect of letrozole for 16 to 24 weeks on surgical outcomes, and from the American College of Surgeons Oncology Group (ACOSOG) Z1031 study (NCT00265759)3, which compared anastrozole with exemestane or letrozole for 16 to 18 weeks before surgery (REMARK flow charts, Supplementary Fig. 1). Baseline snap-frozen biopsy samples with greater than 70% tumour content (by nuclei) underwent DNA extraction and were paired with a peripheral blood DNA sample. Two formalin-fixed biopsies were obtained at baseline and at surgery, and were used to conduct ER and Ki67 immunohistochemistry as previously published4. Paired-end Illumina reads from tumours and normals were aligned to NCBI build 36 using BWA. Somatic point mutations were identified using SomaticSniper43, and indels were identified by combining results from a modified version of the Samtools indel caller (http://samtools.sourceforge.net/), GATK, and Pindel. Structural variations were identified using BreakDancer5 and SquareDancer (unpublished). All putative somatic events found in 46 cases were validated by targeted custom capture arrays (Nimblegen)/Illumina sequencing, and all tier 1 mutations for 46 WGS cases also were validated using PCR/454 sequencing. All statistical analyses, including SMG, mutation relation and clinical correlation analyses, were done using the MuSiC package (manuscript submitted) and/or by standard statistical tests (Supplementary Information). Pathway analysis was performed with PathScan, GeneGo MetaCore (http://www.genego.com/metacore.php), and PARADIGM. A complete description of the materials and methods used to generate this data set and results is provided in the Supplementary Methods section.
Supplementary Material
Table 3.
Gene | Ki67 Variable | Wildtype mean* | Mutant mean* | SET1 P† | SET2 P† | Whole Set FDR P¶ |
---|---|---|---|---|---|---|
TP53 | Baseline | 13.1 | 25.1 | 3.7E-05 | 0.012 | 0.0003 |
TP53 | Surgery | 1.40 | 4.00 | 0.0002 | 0.014 | 0.001 |
TP53 | % change | −89.2 | −84.3 | 0.09 | 0.28 | 0.24 |
MAP3K1 | Baseline | 15.8 | 8.1 | 0.049 | 0.001 | 0.002 |
MAP3K1 | Surgery | 1.86 | 0.75 | 0.11 | 0.10 | 0.05 |
MAP3K1 | % change | −88.3 | −90.5 | 0.49 | 0.65 | 0.55 |
GATA3 | Baseline | 14.8 | 11.5 | 0.13 | 0.95 | 0.56 |
GATA3 | Surgery | 1.95 | 0.38 | 0.001 | 0.23 | 0.012 |
GATA3 | % change | −86.8 | −96.9 | 0.003 | 0.08 | 0.012 |
* Geometric means are based on all cases (SET1 and SET2 combined).
† Unadjusted P value from Wilcoxon rank sum test.
¶ Benjamini-Hochberg false discovery rate (FDR) adjusted P value using all cases (SET1 and SET2 combined).
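For readers unfamiliar with the FDR adjustment named in the table footnotes, the Python sketch below implements the standard Benjamini-Hochberg step-up procedure. The input P values are hypothetical. The set of tests that entered the study's own adjustment is not listed here, so this is an illustration of the method rather than a reconstruction of the reported values.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted P values from four independent comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr_p = benjamini_hochberg(raw_p)
print(fdr_p)
```

Each adjusted value is the smallest FDR threshold at which the corresponding test would still be declared significant, which is why an adjusted value can be shared by neighbouring tests.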
Acknowledgments
This article is dedicated to the memory of Evelyn Lauder in recognition of her tireless efforts to eradicate breast cancer. We would like to thank the participating patients and their families, clinical investigators and their support staff and the Cancer Therapy Evaluation Program at the US National Cancer Institute. We would like to acknowledge the efforts of the following people and groups at The Genome Institute for their contributions to this manuscript: the Analysis Pipeline group for developing the automated analysis pipelines that generated alignments and somatic variants; the LIMS group for developing tools to manage validation array ordering, capture, and sequencing, and Joelle Veizer and Heather Schmidt for structural variant and recurrent screening analyses. We thank the many members of the Siteman Cancer Center at Washington University in St. Louis for support, and the committed members of the American College of Surgeons Oncology Group and their patients for contributing samples to the Z1031 trial. This work was funded by grants to R.K.W. (Richard K. Wilson) from the National Human Genome Research Institute (NHGRI U54 HG003079), grants to M.J.E. (Matthew J. Ellis) from the National Cancer Institute (NCI R01 CA095614, NCI U01 CA114722), the Susan G Komen Breast Cancer Foundation (BCTR0707808), and the Fashion Footwear Charitable Foundation, Inc., grant awards to ACOSOG included NCI U10 CA076001, the Breast Cancer Research Foundation, and clinical trial support from Novartis and Pfizer, and a Center grant (NCI P50 CA94056) to D.P.-W. (David Piwnica-Worms). We also acknowledge institutional support in the form of the Washington University Cancer Genome Initiative (R.K.W.), and a productive partnership with Illumina, Inc. The tissue procurement core was supported by an NCI core grant to the Siteman Cancer Center (NCI 3P50 CA68438). The BRIGHT Institute is supported in part by an ATT/Emerson gift to the Siteman Cancer Center.
Footnotes
Author Contributions
M.J.E. led the clinical investigations, biomarker analysis and chip-based genomics. E.R.M., M.J.E., L.D., R.S.F., T.J.L., and R.K.W. designed the experiments. L. D. and M.J.E. led data analysis. D.S., J.W.W., D.C.K., C.C.H., M.D.M., K.C., C.M., W.S., M.C.W., R.C. and C.K. performed data analysis. D.S., C.M., J.W.W., J.F.M., C. L., and L.D. prepared figures and tables. R.S.F., L.L.F., R.D., M.H., T.V., J.H., L.L., R.C. and J.S. performed laboratory experiments. L.E., G.U., J.M., G.V.B., P.K.M., J.M.G., M.L., K.H., and J.O. provided samples and clinical data. V.J.S., K.B., J.L., Y.T., and C.K. provided statistical and clinical correlation analysis. D.O. oversees the ACOSOG Operations Center that provides oversight and tracking for ACOSOG clinical trials. K.D., S.McD., D.C.A., and M.W. provided pathology analysis. B.V.T, J.W., R.J.G., A.E., D.P.-W., H.P.-W., J. M. S., T. C. G., S. N., C. K., and M.C.W. performed pathway analysis. L-W.C. and R.B. analyzed the druggable target mutation data. D.J.D. and B.O. provided informatics support. L.D., M. J. E., and E. R. M. wrote the manuscript. T.J.L., M.C.W., and R.K.W. critically read and commented on the manuscript.
References
- 1. Chia YH, Ellis MJ, Ma CX. Neoadjuvant endocrine therapy in primary breast cancer: indications and use as a research tool. Br J Cancer. 2010;103:759–764. doi: 10.1038/sj.bjc.6605845.
- 2. Olson JA, Jr, et al. Improved surgical outcomes for breast cancer patients receiving neoadjuvant aromatase inhibitor therapy: results from a multicenter phase II trial. J Am Coll Surg. 2009;208:906–914; discussion 915–916. doi: 10.1016/j.jamcollsurg.2009.01.035.
- 3. Ellis MJ, et al. Randomized Phase II Neoadjuvant Comparison Between Letrozole, Anastrozole, and Exemestane for Postmenopausal Women With Estrogen Receptor-Rich Stage 2 to 3 Breast Cancer: Clinical and Biomarker Outcomes and Predictive Value of the Baseline PAM50-Based Intrinsic Subtype--ACOSOG Z1031. J Clin Oncol. 2011. doi: 10.1200/JCO.2010.31.6950.
- 4. Ellis MJ, et al. Outcome prediction for estrogen receptor-positive breast cancer based on postneoadjuvant endocrine therapy tumor characteristics. J Natl Cancer Inst. 2008;100:1380–1388. doi: 10.1093/jnci/djn309.
- 5. Chen K, et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods. 2009;6:677–681. doi: 10.1038/nmeth.1363.
- 6. Mardis ER, et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N Engl J Med. 2009;361:1058–1066. doi: 10.1056/NEJMoa0903840.
- 7. Ley TJ, et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature. 2008;456:66–72. doi: 10.1038/nature07485.
- 8. Totoki Y, et al. High-resolution characterization of a hepatocellular carcinoma genome. Nat Genet. 2011;43:464–469. doi: 10.1038/ng.804.
- 9. Pleasance ED, et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature. 2010;463:191–196. doi: 10.1038/nature08658.
- 10. Pleasance ED, et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature. 2010;463:184–190. doi: 10.1038/nature08629.
- 11. Lee W, et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature. 2010;465:473–477. doi: 10.1038/nature09004.
- 12. Samuels Y, et al. High frequency of mutations of the PIK3CA gene in human cancers. Science. 2004;304:554. doi: 10.1126/science.1096502.
- 13. Prosser J, Thompson AM, Cranston G, Evans HJ. Evidence that p53 behaves as a tumour suppressor gene in sporadic breast tumours. Oncogene. 1990;5:1573–1579.
- 14. Usary J, et al. Mutation of GATA3 in human breast tumors. Oncogene. 2004;23:7669–7678. doi: 10.1038/sj.onc.1207966.
- 15. Berx G, et al. E-cadherin is a tumour/invasion suppressor gene mutated in human lobular breast cancers. EMBO J. 1995;14:6107–6115. doi: 10.1002/j.1460-2075.1995.tb00301.x.
- 16. T’Ang A, Varley JM, Chakraborty S, Murphree AL, Fung YK. Structural rearrangement of the retinoblastoma gene in human breast carcinoma. Science. 1988;242:263–266. doi: 10.1126/science.3175651.
- 17. Wang XX, et al. Somatic mutations of the mixed-lineage leukemia 3 (MLL3) gene in primary breast cancers. Pathol Oncol Res. 2011;17:429–433. doi: 10.1007/s12253-010-9316-0.
- 18. Kan Z, et al. Diverse somatic mutation patterns and pathway alterations in human cancers. Nature. 2010;466:869–873. doi: 10.1038/nature09208.
- 19. Spirin KS, et al. p27/Kip1 mutation found in breast cancer. Cancer Res. 1996;56:2400–2404.
- 20. Fanger GR, Johnson NL, Johnson GL. MEK kinases are regulated by EGF and selectively interact with Rac/Cdc42. EMBO J. 1997;16:4961–4972. doi: 10.1093/emboj/16.16.4961.
- 21. Fillmore CM, et al. Estrogen expands breast cancer stem-like cells through paracrine FGF/Tbx3 signaling. Proc Natl Acad Sci U S A. 2010;107:21737–21742. doi: 10.1073/pnas.1007863107.
- 22. Mao S, Frank RC, Zhang J, Miyazaki Y, Nimer SD. Functional and physical interactions between AML1 proteins and an ETS protein, MEF: implications for the pathogenesis of t(8;21)-positive leukemias. Mol Cell Biol. 1999;19:3635–3644. doi: 10.1128/mcb.19.5.3635.
- 23. Stender JD, et al. Genome-wide analysis of estrogen receptor alpha DNA binding and tethering mechanisms identifies Runx1 as a novel tethering factor in receptor-mediated transcriptional activation. Mol Cell Biol. 2010;30:3943–3955. doi: 10.1128/MCB.00118-10.
- 24. Papaemmanuil E, et al. Somatic SF3B1 mutation in myelodysplasia with ring sideroblasts. N Engl J Med. 2011;365:1384–1395. doi: 10.1056/NEJMoa1103283.
- 25. Wang L, et al. SF3B1 and other novel cancer genes in chronic lymphocytic leukemia. N Engl J Med. 2011;365:2497–2506. doi: 10.1056/NEJMoa1109016.
- 26. Chen Z, et al. The May-Hegglin anomaly gene MYH9 is a negative regulator of platelet biogenesis modulated by the Rho-ROCK pathway. Blood. 2007;110:171–179. doi: 10.1182/blood-2007-02-071589.
- 27. Lamant L, et al. Non-muscle myosin heavy chain (MYH9): a new partner fused to ALK in anaplastic large cell lymphoma. Genes Chromosomes Cancer. 2003;37:427–432. doi: 10.1002/gcc.10232.
- 28. Wilund KR, et al. Molecular mechanisms of autosomal recessive hypercholesterolemia. Hum Mol Genet. 2002;11:3019–3030. doi: 10.1093/hmg/11.24.3019.
- 29. Delle H, et al. Antifibrotic effect of tamoxifen in a model of progressive renal disease. J Am Soc Nephrol. 2012;23:37–48. doi: 10.1681/ASN.2011010046.
- 30. Tararuk T, et al. JNK1 phosphorylation of SCG10 determines microtubule dynamics and axodendritic length. J Cell Biol. 2006;173:265–277. doi: 10.1083/jcb.200511055.
- 31. Westerlund N, et al. Phosphorylation of SCG10/stathmin-2 determines multipolar stage exit and neuronal migration rate. Nat Neurosci. 2011;14:305–313. doi: 10.1038/nn.2755.
- 32. Tripathi V, et al. The nuclear-retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Mol Cell. 2010;39:925–938. doi: 10.1016/j.molcel.2010.08.011.
- 33. Rajaram V, Knezevich S, Bove KE, Perry A, Pfeifer JD. DNA sequence of the translocation breakpoints in undifferentiated embryonal sarcoma arising in mesenchymal hamartoma of the liver harboring the t(11;19)(q11;q13.4) translocation. Genes Chromosomes Cancer. 2007;46:508–513. doi: 10.1002/gcc.20437.
- 34. Xu C, Yang M, Tian J, Wang X, Li Z. MALAT-1: a long non-coding RNA and its important 3′ end functional motif in colorectal cancer metastasis. Int J Oncol. 2011;39:169–175. doi: 10.3892/ijo.2011.1007.
- 35. Wendl MC, et al. PathScan: A Tool for Discerning Mutational Significance in Groups of Putative Cancer Genes. Bioinformatics. 2011. doi: 10.1093/bioinformatics/btr193.
- 36. Vaske CJ, et al. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics. 2010;26:i237–245. doi: 10.1093/bioinformatics/btq182.
- 37. Lynch TJ, et al. Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N Engl J Med. 2004;350:2129–2139. doi: 10.1056/NEJMoa040938.
- 38. Johnson GL, Lapadat R. Mitogen-activated protein kinase pathways mediated by ERK, JNK, and p38 protein kinases. Science. 2002;298:1911–1912. doi: 10.1126/science.1072682.
- 39. Widmann C, Johnson NL, Gardner AM, Smith RJ, Johnson GL. Potentiation of apoptosis by low dose stress stimuli in cells expressing activated MEK kinase 1. Oncogene. 1997;15:2439–2447. doi: 10.1038/sj.onc.1201421.
- 40. Wagner EF, Nebreda AR. Signal integration by JNK and p38 MAPK pathways in cancer development. Nat Rev Cancer. 2009;9:537–549. doi: 10.1038/nrc2694.
- 41. Ellis MJ, et al. Phosphatidyl-inositol-3-kinase alpha catalytic subunit mutation and response to neoadjuvant endocrine therapy for estrogen receptor positive breast cancer. Breast Cancer Res Treat. 2010;119:379–390. doi: 10.1007/s10549-009-0575-y.
- 42. Prat A, Perou CM. Mammary development meets cancer genomics. Nat Med. 2009;15:842–844. doi: 10.1038/nm0809-842.
- 43. Larson DE, et al. SomaticSniper: Identification of Somatic Point Mutations in Whole Genome Sequencing Data. Bioinformatics. 2011. doi: 10.1093/bioinformatics/btr665.
- 44. Krzywinski M, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645. doi: 10.1101/gr.092759.109.