Significance
In this study, we provide the first detailed molecular characterization, to our knowledge, of a distinct cancer genomic configuration, the tandem duplicator phenotype (TDP), that is significantly enriched in the molecularly related triple-negative breast, serous ovarian, and endometrial carcinomas. We show here that TDP represents an oncogenic configuration featuring (i) genome-wide disruption of cancer genes, (ii) loss of cell cycle control and DNA damage repair, and (iii) increased sensitivity to cisplatin chemotherapy both in vitro and in vivo. Therefore, the TDP is a systems strategy to achieve a protumorigenic genomic configuration by altering a large number of oncogenes and tumor suppressors. The TDP arises in a molecular context of joint genomic instability and replicative drive, and is consequently associated with enhanced sensitivity to cisplatin.
Keywords: tandem duplications, cisplatin, triple-negative breast cancer, BRCA1, TP53
Abstract
Next-generation sequencing studies have revealed genome-wide structural variation patterns in cancer, such as chromothripsis and chromoplexy, that do not engage a single discernible driver mutation, and whose clinical relevance is unclear. We devised a robust genomic metric able to identify cancers with a chromotype called tandem duplicator phenotype (TDP) characterized by frequent and distributed tandem duplications (TDs). Enriched only in triple-negative breast cancer (TNBC) and in ovarian, endometrial, and liver cancers, TDP tumors conjointly exhibit tumor protein p53 (TP53) mutations, disruption of breast cancer 1 (BRCA1), and increased expression of DNA replication genes, pointing to rereplication in a defective checkpoint environment as a plausible causal mechanism. The resultant TDs in TDP augment global oncogene expression and disrupt tumor suppressor genes. Importantly, the TDP strongly correlates with cisplatin sensitivity in both TNBC cell lines and primary patient-derived xenografts. We conclude that the TDP is a common cancer chromotype that coordinately alters oncogene/tumor suppressor expression with potential as a marker for chemotherapeutic response.
Cancer evolution is generally thought to result from the progressive accumulation of genomic lesions affecting key regulatory components of physiological cellular functions (1, 2). Oncogenic changes can manifest as single-nucleotide mutations; copy number alterations, such as deletions or duplications; and balanced rearrangements, including chromosomal translocations and inversions (3).
More recently, the systematic application of whole-genome sequencing (WGS) to the study of human cancer genomes has uncovered more complex scenarios, where large portions of the genome are affected by a multitude of somatic structural variations, which either originate from a few unique catastrophic events [e.g., chromothripsis, chromoplexy (4–6)] or result from the derangement of key molecular mechanisms leading to specific mutator phenotypes (7, 8). Although not always associated with a discernible driver mutation, these genome-wide structural variation patterns have the potential to deregulate several oncogenic elements simultaneously, and have been clearly associated with malignant phenotypes (4, 5, 9, 10).
Despite their relevance to the tumorigenic process, the causes of these genome-wide chromotypes, the cancer-driving oncogenic elements induced by these structural changes, and the clinical implications of these configurations remain unclear. Although recent advances have been made in understanding the mechanisms underlying chromothripsis, no specific therapeutic intervention has yet been identified for chromothriptic cancers or for other chromotypes (11–14).
Here, we study one of these genomic configurations, the tandem duplicator phenotype (TDP), which is characterized by the presence of a large number of somatic head-to-tail DNA segmental duplications [i.e., tandem duplications (TDs)] homogeneously distributed throughout the cancer genome (10, 15). In a meta-analysis of over 3,000 cancer genomes, we identify the most prevalent genetic features associated with this phenotype and those genetic features that may be responsible for its tumorigenic drive. Furthermore, we show an association between the extent of TDP and sensitivity to platinum-based chemotherapy in cell and primary xenograft models of triple-negative breast cancer (TNBC), providing a first indication of the potential utility of the TDP chromotype as a predictive genomic biomarker in a clinical setting.
Results
Homogeneous Distribution of TDs Across Cancer Genomes as a Systematic Measure of the TDP.
To address the lack of a systematic approach to identify and score the TDP, we developed a reproducible metric of TD genomic distribution, which we refer to as the TDP score. For each tumor sample, we tally the total number of TDs mapped by breakpoint analysis, and compare the observed (Obs_i) and expected (Exp_i) numbers of TDs for each chromosome i:
where k equals the threshold value, which normalizes all values to the subsequently determined threshold for the TDP configuration (discussed below).
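The displayed equation is not present in this copy of the text. One form consistent with the surrounding definitions is sketched below; it is an assumption rather than a verbatim quotation, and the symbol TD (the total number of TDs in the sample, used here to normalize the summed deviation) is introduced for illustration:

```latex
\mathrm{TDP\ score} \;=\; -\,\frac{\sum_{i} \left|\mathrm{Obs}_i - \mathrm{Exp}_i\right|}{\mathrm{TD}} \;+\; k
```

Under this form, a sample whose TDs are spread across chromosomes in proportion to expectation has a summed deviation near 0 and therefore a score near k, whereas a sample whose TDs cluster on a few chromosomes receives a strongly negative score.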
This metric readily distinguishes a genomic configuration characterized by localized segmental amplifications with TDs from the TDP, in which TDs are evenly distributed across all chromosomes (Fig. 1A).
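As an illustration of how such a score separates clustered from evenly distributed TDs, the following minimal Python sketch computes a score of the assumed form −(Σ|Obs_i − Exp_i|)/TD + k. The function name, the use of chromosome length to derive the expected counts, and the default k = 0.71 (so that a raw cutoff of −0.71 maps to 0) are assumptions of this sketch, not details confirmed by the text above.

```python
def tdp_score(obs_by_chrom, chrom_lengths, k=0.71):
    """Score how evenly tandem duplications (TDs) are spread over chromosomes.

    Assumed form: -(sum_i |Obs_i - Exp_i|) / TD + k, where Exp_i is taken to be
    proportional to chromosome length (an assumption of this sketch).
    """
    total_tds = sum(obs_by_chrom.values())
    if total_tds == 0:
        raise ValueError("sample has no tandem duplications")
    genome_length = sum(chrom_lengths.values())
    deviation = 0.0
    for chrom, length in chrom_lengths.items():
        expected = total_tds * length / genome_length  # Exp_i
        observed = obs_by_chrom.get(chrom, 0)          # Obs_i
        deviation += abs(observed - expected)
    return -deviation / total_tds + k


# Toy genome with four equally sized chromosomes (hypothetical data).
lengths = {"chr1": 100, "chr2": 100, "chr3": 100, "chr4": 100}
even = {"chr1": 10, "chr2": 10, "chr3": 10, "chr4": 10}
clustered = {"chr1": 40}
print(tdp_score(even, lengths), tdp_score(clustered, lengths))
```

In this toy case the evenly spread sample scores above 0 and the clustered sample scores below 0, which mirrors the contrast described for Fig. 1A.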
Fig. 1.
TDP scoring and sample classification. (A) Circos plots showing structural variations of representative cancer genomes with different levels of TDP scores. For each plot, sample identification number, the TDP score, and number of TDs over the total number of detected rearrangements are indicated (top to bottom). Structural variations were classified based on the four basic discordant paired-end mappings as TDs (red), deletions (blue), unpaired inversions (green), or interchromosomal translocations (gray). (B) Trimodal distribution of the TDP score values across the 277 cancer samples examined.
To address the incidence and genomic properties of the TDP, we combined WGS data from 277 human genomes representing 11 cancer types, including 96 breast tumors and cancer cell lines (4, 9, 10, 15–22) (Dataset S1). We observed that the TDP score distribution in this dataset follows a trimodal pattern (Materials and Methods and Fig. S1A), suggesting that cancers can be separated into distinct groups based on their propensity for TD formation. Upon visual inspection of tumors within a range of TDP scores by Circos plots, those tumors with the highest scores show the characteristic TD distribution of the TDP (Fig. 1A). In order to derive an unbiased threshold for classifying TDP tumors, we identified the threshold as the score that corresponds to 2 SDs from the second modal peak (−0.71; Fig. S1A). To simplify data presentation, we then set the TDP score to 0 at this defining threshold (k), resulting in positive and negative scores for TDP and non-TDP tumors, respectively (Fig. 1B). Using this threshold, 18.1% of the tumors analyzed are classified as TDPs, each showing a high number of TDs (average number of TDs per sample = 112.2, range: 23–416, modal TDP score = 0.19) that are broadly distributed throughout the genome (Fig. 1A and Fig. S1 A and B). By contrast, non-TDP samples are either associated with an intermediate number of TDs (10 to ∼100, modal TDP score = −0.50) that are invariably clustered in specific genomic regions or have a low number of TDs altogether (<20) indicative of a more stable genome (Fig. 1A and Fig. S1 A and B).
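The thresholding step described above can be sketched as follows. This is a minimal illustration, assuming the cutoff is the center of the second mixture component plus 2 SDs and that scores are recentered by subtracting the cutoff. The component values used in the example (center −1.21, SD 0.25) are hypothetical; they were chosen only to be consistent with the reported cutoff (−0.71) and modal scores, and are not taken from Fig. S1A.

```python
def tdp_cutoff(second_mode_center, second_mode_sd, n_sd=2.0):
    """Cutoff placed n_sd standard deviations above the second mode."""
    return second_mode_center + n_sd * second_mode_sd


def classify(raw_scores, cutoff):
    """Recenter scores so the cutoff sits at 0, then label each sample."""
    results = []
    for raw in raw_scores:
        centered = raw - cutoff
        label = "TDP" if centered > 0 else "non-TDP"
        results.append((centered, label))
    return results


# Hypothetical component parameters and raw scores, for illustration only.
cutoff = tdp_cutoff(-1.21, 0.25)
for centered, label in classify([-0.52, -1.21], cutoff):
    print(round(centered, 2), label)
```

Fitting the three-component mixture itself is outside the scope of this sketch; only the cutoff arithmetic and the recentering are shown.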
Fig. S1.
Structural variation-based score distributions and TDP status assignment. (A) Trimodal distribution of TDP scores (n = 266 samples with detected TDs) and cutoff for TDP classification. We resolved the trimodal distribution of TDP scores using the normalmixEM function of the mixtools package in R. The fraction of samples belonging to each one of the three underlying normal distributions, as well as the median and SD values of each curve, is shown in the table. The cutoff value to classify TDP samples is set to −0.71, which corresponds to the median + 2 × SDs of the second distribution. For better visualization, TDP scores were then centered around 0, as shown in C. (B) Scatter plot of TDP scores and TD numbers across tumor types (n = 266 cancer genomes analyzed by WGS). A color code differentiates between tumor types that are TDP-enriched (red), TDP-depleted (blue), or with no significant TDP prevalence (gray), as indicated in Table 1. (C) Distribution of the four basic structural variation scores across all cancer samples (n = 277). A calculation analogous to the one used to compute TDP scores was applied to other structural variation types (deletion, inversion, and interchromosomal translocation). Only the distribution of TD scores (red) shows a clear sample subpopulation characterized by distinctively higher scores.
We applied a similar scoring method to the three other basic rearrangements (deletion, inversion, and interchromosomal translocation), but found no evidence of distinct groups manifesting as multimodal score distributions, as seen for TDs (Fig. S1C). This finding suggests that the TDP is not merely an indicator of genomic instability but, instead, represents a unique tumor subgroup with a distinct structural phenotype.
Previous evidence has suggested a higher frequency of the TDP in TNBC and ovarian (OV) cancer (19). Using our more precise and quantitative TDP measure based on only the WGS dataset, we confirmed that the TDP occurs significantly more frequently in TNBC (P = 2.16E-04), OV carcinoma (P = 4.95E-02), and hepatocellular carcinoma (P = 2.92E-02) but that it is significantly depleted in non-TNBC (P = 5.27E-02), glioblastoma (P = 4.10E-02), and prostate cancer (P = 1.77E-03) (Table 1). Indeed, we rarely observed TDP samples in prostate cancer, in which chromoplexy and chromothripsis appear to be the predominant whole-genome rearrangement patterns (4). This finding suggests that different mechanisms are active in different tumor types to produce specific dominant cancer genomic configurations.
Table 1.
Prevalence of the TDP among different tumor types
Cancer type | WGS: Total no. | WGS: TDP no. | WGS: % | WGS: P | WGS: Status | SNP array*: Total no. | SNP array*: TDP no. | SNP array*: % | SNP array*: P | SNP array*: Status |
TNBC | 40 | 17 | 42.5 | 2.16E-04 | E | 94 | 37 | 39.4 | 1.23E-08 | E |
Other breast cancers (non-TNBC) | 56 | 6 | 10.7 | 5.27E-02 | D | 594 | 22 | 3.7 | 2.41E-20 | D |
Colorectal adenocarcinoma | 14 | 0 | 0.0 | 6.11E-02 | ns | 545 | 6 | 1.1 | 3.36E-31 | D |
Glioblastoma | 16 | 0 | 0.0 | 4.10E-02 | D | 18 | 2 | 11.1 | 2.50E-01 | ns |
Hepatocellular carcinoma | 19 | 7 | 36.8 | 2.92E-02 | E | NA | — | — | — | |
Kidney renal clear cell carcinoma | 3 | 0 | 0.0 | 5.49E-01 | ns | 509 | 2 | 0.4 | 4.61E-34 | D |
Lung adenocarcinoma | 25 | 3 | 12.0 | 1.69E-01 | ns | NA | — | — | — | |
Lung squamous cell carcinoma | 18 | 5 | 27.8 | 1.24E-01 | ns | 364 | 31 | 8.5 | 3.43E-05 | D |
Multiple myeloma | 7 | 0 | 0.0 | 2.47E-01 | ns | NA | — | — | — | |
OV | 26 | 8 | 30.8 | 4.95E-02 | E | 382 | 236 | 61.8 | 4.16E-94 | E |
Prostate cancer | 43 | 1 | 2.3 | 1.77E-03 | D | NA | — | — | — | |
Endometrial carcinoma | 10 | 3 | 30.0 | 1.76E-01 | ns | 481 | 123 | 25.6 | 2.80E-09 | E |
Total | 277 | 50 | 18.1 | | | 2,987 | 459 | 15.4 | | |
TDP status was assigned based on either WGS data (n = 277 tumor samples) or Affymetrix SNP 6.0 array data (SNP array, n = 2,987 tumor samples). P values were computed using the binomial test. D, depletion; E, enrichment; ns, nonsignificant.
*Tumor samples were classified based on the stringent thresholds described in Fig. S2E.
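The binomial test behind the enrichment (E) and depletion (D) calls in Table 1 can be illustrated with a short Python sketch. It assumes one-sided tail probabilities against the pooled TDP prevalence, with a 0.05 significance level; the function names and these choices are assumptions of the sketch. With a background rate of 0.181, the lower tail for the zero-count rows is reproduced (e.g., 0 of 3 gives 0.819^3 ≈ 0.549), which is at least compatible with this reading.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tails(k, n, p):
    """Return (P[X >= k], P[X <= k]) for X ~ Binomial(n, p)."""
    upper = sum(binom_pmf(i, n, p) for i in range(k, n + 1))
    lower = sum(binom_pmf(i, n, p) for i in range(0, k + 1))
    return upper, lower


def enrichment_status(k, n, background, alpha=0.05):
    """Label a tumor type as enriched (E), depleted (D), or nonsignificant (ns)."""
    upper, lower = binom_tails(k, n, background)
    if upper < alpha:
        return "E", upper
    if lower < alpha:
        return "D", lower
    return "ns", min(upper, lower)


# Counts taken from the WGS columns of Table 1; background rate assumed to be 0.181.
print(enrichment_status(17, 40, 0.181))  # TNBC
print(enrichment_status(0, 16, 0.181))   # glioblastoma
print(enrichment_status(0, 3, 0.181))    # kidney renal clear cell carcinoma
```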
Whereas the TDP score is based on the identification of TDs through the assignment of breakpoints, and relies on the availability of WGS data, Ng et al. (15) estimated the prevalence of the TDP by counting the number of TD-like features from array-based copy number profiling in high-grade serous OV carcinoma. We wanted to compare the performance of our TDP scoring algorithm when applied to sequence- vs. array-based detection systems. We therefore analyzed Affymetrix SNP 6.0 array segmented copy number data from a subset of 81 tumor genomes profiled as part of The Cancer Genome Atlas (TCGA) project to compute copy number (array)-derived TDP scores and compare them with TDP scores obtained using paired-end WGS data (Fig. S2 A and B). Using SNP array copy number data alone, we could identify TDP samples with high specificity (0.95; Fig. S2 C and D) but lower sensitivity (0.57), likely due to the lower resolution of array data in detecting short segmental duplications. To increase the discrimination power of the SNP array-based TDP classification, we set a more stringent threshold to categorize non-TDP samples (Fig. S2E) and improve the sensitivity of the technology to 0.80 (Fig. S2F).
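For readers who want to reproduce this kind of comparison, the sketch below shows how sensitivity and specificity are computed when the WGS-based classification is treated as the reference. The function name and the toy labels are hypothetical and do not correspond to the 81 samples analyzed.

```python
def sensitivity_specificity(reference, predicted):
    """Compare predicted TDP calls against reference calls (both lists of bool)."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # TDP by both methods
    fn = sum(1 for r, p in pairs if r and not p)      # TDP missed by the array
    tn = sum(1 for r, p in pairs if not r and not p)  # non-TDP by both methods
    fp = sum(1 for r, p in pairs if not r and p)      # TDP called by the array only
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical calls for seven samples (True = TDP).
wgs_calls = [True, True, True, False, False, False, False]
array_calls = [True, True, False, False, False, False, True]
print(sensitivity_specificity(wgs_calls, array_calls))
```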
Fig. S2.
TDP status prediction using array-based copy number (CN) data. (A) TD-like segmental duplications were defined as CN segments ranging between 1 kb and 2 Mb in length, which showed an increase in CN compared with both of their neighboring segments (log2 CN ratio ≥ 0.3) in a genomic region of otherwise homogeneous CN (difference in log2 CN ratios between the two flanking segments ≤ 0.3). Scatter plots of the number of TDs (B) and of TDP scores (C) as predicted by WGS or SNP array CN analysis for each one of the 81 TCGA cancer samples for which both types of data were available. (D) Sensitivity and specificity of TDP predictions based on CN data. The TDP classification obtained based on WGS data is used as a reference. (E and F) More stringent differentiation between TDP and non-TDP samples improves the sensitivity (0.80) of TDP sample detection using SNP array data, while maintaining a high degree of specificity (0.94). TDP tumors are defined as samples whose TDP score is higher than 0, as previously defined for WG-sequenced genomes. However, non-TDP samples are identified relative to a non-TDP SNP array-based threshold computed based on the trimodal distribution of TDP scores across the entire SNP array dataset (n = 3,535 samples, threshold = −0.4).
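The definition of TD-like segments in panel A can be expressed as a simple filter over segmented CN data. The sketch below is one possible reading: it interprets the 0.3 criterion as a minimum log2 ratio gain over each neighbor, and it assumes segments are given as (start, end, log2 ratio) tuples sorted along a single chromosome. The function name and the example segments are hypothetical.

```python
KB, MB = 1_000, 1_000_000


def td_like_segments(segments, min_len=1 * KB, max_len=2 * MB,
                     min_gain=0.3, max_flank_diff=0.3):
    """Return segments that look like tandem duplications in CN data.

    segments: list of (start, end, log2_ratio), sorted along one chromosome.
    """
    calls = []
    # Walk over each segment together with its left and right neighbors.
    for left, seg, right in zip(segments, segments[1:], segments[2:]):
        start, end, ratio = seg
        length = end - start
        gain_left = ratio - left[2]
        gain_right = ratio - right[2]
        flank_diff = abs(left[2] - right[2])
        if (min_len <= length <= max_len
                and gain_left >= min_gain
                and gain_right >= min_gain
                and flank_diff <= max_flank_diff):
            calls.append(seg)
    return calls


# Hypothetical segmentation of one chromosome.
example = [
    (0, 5_000_000, 0.0),
    (5_000_000, 5_200_000, 0.5),   # 200-kb gain between similar flanks
    (5_200_000, 9_000_000, 0.05),
    (9_000_000, 14_000_000, 0.6),  # gain, but longer than 2 Mb
    (14_000_000, 20_000_000, 0.0),
]
print(td_like_segments(example))
```

Segments at the very start or end of a chromosome have only one neighbor and are never called by this sketch.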
The advantage of analyzing array-based data is the availability of a larger number of cancer samples. When we classified 2,987 primary tumors from several TCGA datasets profiled using the Affymetrix SNP 6.0 array, we were able to reproduce our previous findings that the TDP is significantly enriched in TNBC (P = 1.23E-08) and OV cancer (P = 4.16E-94), whereas it is depleted in non-TNBC (P = 2.41E-20) (Table 1 and Dataset S2). In addition, because of the greater number of available tumors in the TCGA array dataset, we found that uterine corpus endometrial carcinoma (UCEC) is also enriched in TDPs (P = 2.80E-09). Interestingly, most of the UCEC samples classified as TDPs belong to the recently described cluster 4 endometrial carcinoma subtype, which is characterized by an extensive degree of copy number variations (CNVs) and has been shown to share a similar molecular phenotype with TNBC and OV cancer (23). The consistent observation of TDP enrichment/depletion across alternative cancer datasets, generated via diverse genomic technologies and analysis protocols, suggests that our scoring approach is reproducible and generalizable.
TD Breakpoints Occur in Regions of Open Chromatin and Active Transcription.
To investigate possible molecular mechanisms for the generation of the TDP, we examined the genetic, epigenetic, and transcriptional configurations of the chromosomal coordinates affected by TD events in TDP genomes. We focused our analysis on breast cancer [TNB and non-TNB (NTNB) WGS datasets, n = 23 TDP tumor genomes], because this type of cancer was the best-represented tumor type in our WGS sample cohort, and therefore provided adequate statistical power. We first asked whether TDs in TDP occurred in functional regions of the genome enriched for genes. We observed a highly significant positive correlation between the number of TD breakpoints and the number of genes in local windows along the genome (R = 0.5, P = 1.8E-178; 10-Mb sliding windows, 1-Mb offset; Fig. S3A). Furthermore, TD breakpoints were biased to occur within gene bodies (exons + introns) as opposed to intergenically (P < 1.0E-04; Fig. S3B). We assessed the physiological expression levels of genes that are frequently affected by breast cancer TD breakpoints in normal breast tissue. Based on a collection of 106 normal breast epithelium samples from the TCGA breast cancer dataset, genes located at the boundaries of TDs show significantly higher levels of activity in the normal breast when compared with the entire gene population (P < 2.2E-16; Fig. S3C). This observation is consistent with the positioning of TD boundaries near genes with antioncogenic signals, which would subsequently be disrupted during TD formation. However, it also suggests that TD formation requires transcriptional activity. Indeed, we observed a significant enrichment of Pol2 binding sites as well as histone modification marks associated with an open chromatin configuration (H3K4me3, H3K4me1, and H3K27ac) in the proximity of TD breakpoints (P < 1.0E-04; Fig. S3 D–F). 
This observation agrees with recent reports describing a strong affinity of structural variation breakpoints for genomic regions characterized by protein binding and euchromatin (22). By contrast, H3K9me3 signals, which mark heterochromatin, were depleted from TD breakpoint regions (P < 1.0E-04; Fig. S3 E and F). Concordant results were obtained by testing different nonoverlapping symmetrical windows around the TD breakpoints, showing that significant associations between functionalized chromatin regions and TDs are maintained up to ∼200–500 kb from the TD breakpoints (Fig. S3G). Overall, these results concordantly indicate a significantly higher likelihood for TD breakpoints to affect transcriptionally active, easily accessible chromatin regions. The mechanistic underpinnings of this relationship are unclear. One possibility is that if TDs are related to replisome stalling, collisions between the replisome and ongoing transcription might be more common in highly transcribed genes. Alternatively, the TDs embedded within certain highly transcribed genes may be preferentially selected during tumor evolution (discussed below).
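The sliding-window correlation between TD breakpoint density and gene density (10-Mb windows, 1-Mb offset) can be sketched as below. The helper names, the use of gene start positions to represent gene density, and the toy coordinates are assumptions made for illustration.

```python
from math import sqrt


def window_counts(positions, chrom_length, window=10_000_000, step=1_000_000):
    """Count positions falling in each overlapping window along a chromosome."""
    counts = []
    start = 0
    while start + window <= chrom_length:
        end = start + window
        counts.append(sum(1 for p in positions if start <= p < end))
        start += step
    return counts


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical breakpoint and gene coordinates on a 12-Mb chromosome.
breakpoints = [500_000, 1_500_000, 11_500_000]
gene_starts = [400_000, 600_000, 1_200_000, 11_000_000]
bp_density = window_counts(breakpoints, 12_000_000)
gene_density = window_counts(gene_starts, 12_000_000)
print(bp_density, gene_density, pearson_r(bp_density, gene_density))
```

Because adjacent windows overlap by 9 Mb, their counts are not independent; a P value derived from this correlation should be interpreted with that in mind.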
Fig. S3.
Molecular features of the genomic regions affected by TD breakpoints in TDP cancer genomes. (A) TD breakpoints cluster in gene-dense regions. The scatter plot shows a positive correlation between gene density and TD breakpoint density, computed per 10-Mb overlapping windows (1-Mb offset) along the entire genome. The combined TD coordinate data corresponding to the total of 50 TDP tumor genomes identified via WGS (including all available tumor types) were used in this analysis. The Pearson correlation coefficient (R) and its corresponding P value are reported in the graphs. (B) TDs are more likely to engage gene bodies than intergenic regions. Histogram bars represent the fraction of TD breakpoints that map within gene bodies in TDP genomes. A red line indicates the overall fraction of the genome occupied by gene bodies (including coding and noncoding sequences). ***P < 0.0001, computed using the binomial test. (C) Genes that are frequently located at the boundaries of TDs in TDP breast cancer genomes are generally expressed at high levels in the normal breast epithelium. Density plots represent the distribution of gene expression levels in normal breast tissue samples from the TCGA dataset (n = 106). Median values for each distribution are indicated by dashed lines. A P value (vs. all RefSeq genes, n = 20,502) was computed using the Mann–Whitney U test. (D) Pol2 binding site enrichment in the proximity of breast cancer TD breakpoints. Histogram bars correspond to the average OR of 43 Pol2 ChIP-seq datasets. ***P < 0.0001. (E and F) Histone modification marks enrichment/depletion in the proximity of breast cancer TD breakpoints. The results shown correspond to ChIP-seq datasets generated from the HMEC (E) and the vHMEC (F) cell lines. ***P < 0.0001. (G) Enrichment ORs for different histone modification marks in the proximity of breast cancer TD breakpoints in TDP breast tumors (n = 23 tumors). 
ChIP-seq data for both the HMEC (Top) and the vHMEC (Bottom) cell lines are shown. Each bin on the horizontal axis represents a range of nonoverlapping distances (e.g., a mark between 10 kb and 20 kb corresponds to the enrichment in regions >10 kb but <20 kb from the nearest TD breakpoint).
Genomic Features of TDs in TDP and Non-TDP Tumors.
A comparison between the genomic properties of structural rearrangements occurring in TDP and non-TDP samples shows a striking difference in the per-sample median TD span size, with TDP samples having significantly smaller median spans (median span size = 89.9 kb for TDPs and 1,189.7 kb for non-TDPs; P = 7.23E-09; Fig. 2A). More specifically, by plotting the distribution of the collection of all individual TD spans for TDP and non-TDP genomes (WGS dataset; n = 50 and n = 227, respectively), we observed that whereas non-TDP tumors feature a continuous range of very large TDs reaching a plateau at around 1 Mb, TDP samples are characterized by two sharper TD span distribution modes at ∼10 kb and ∼150 kb (Fig. 2B). This finding suggests that the mechanism generating TDs in TDP tumors may differ from that in non-TDP tumors.
Fig. 2.
Genomic features of TDs in TDP and non-TDP tumors. (A) Correlation of TDP score and median TD span size across the 277 tumor genomes analyzed by WGS. Horizontal lines indicate the overall median span size for the TDP and non-TDP sample subgroups. A P value was computed using Student’s t test. (B) TD span distributions for the TDP and the non-TDP sample groups. TDP samples feature TDs with span peaks at ∼10 kb and ∼150 kb. Non-TDP samples feature a much larger TD span range, which homogeneously ranges from ∼1 to ∼10 Mb. A P value for the distance between the two empirical distributions was generated using the two-sample Kolmogorov–Smirnov test. (C) Sequence analysis of TD breakpoints across TDP (n = 4) and non-TDP (n = 7) TNBC cell line genomes. ORs and P values were computed using Fisher’s exact test. (D) Replication time (RT) of genes located inside or on the boundary of TDs in TDP and non-TDP samples based on the breast cancer dataset. RT is expressed on a scale of 100 (early) to 1,500 (late). P values were computed based on the Mann–Whitney U test.
We directly sequenced the rearrangement junctions of 122 TDs from 11 different TNBC cell lines of both TDP and non-TDP types, and analyzed the sequences at the breakpoint junctions for patterns indicative of specific DNA repair mechanisms (10, 21, 24). We classified the validated breakpoint junctions into those junctions characterized by the presence of short (<10 bp) or long insertions; short (<5 bp), long, or no microhomology (MH); or long-range imperfect homology (Fig. 2C). The large majority of TDs in TDP tumors (72%, range: 46–82%) show overlapping MH between the two DNA segments contributing to the rearrangement junction, suggesting that the underlying mechanism entails MH-mediated end-joining or MH-mediated break-induced replication (MMBIR) (13, 24). Significantly, only 40% (range: 27–86%) of TDs found in non-TDP tumors show a similar profile [odds ratio (OR) = 3.6, P = 6E-04; Fig. 2C and Dataset S3]. By contrast, TD rearrangements characterized by long-range imperfect homology, a signature indicative of nonallelic homologous recombination (NAHR) (24), are more prevalent in non-TDP tumors [23% (range: 0–50%) vs. 7% (range: 0–31%) in TDPs; OR = 0.25, P = 2E-02; Fig. 2C and Dataset S3]. These differences further support the idea that distinct DNA repair mechanisms may underlie the formation of TDs in TDP and non-TDP tumors.
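The ORs and P values quoted for the junction classes come from Fisher's exact test on 2 × 2 tables (junction class present or absent, in TDP vs. non-TDP lines). The sketch below shows the calculation in plain Python. The two-sided rule used here (summing all tables no more probable than the observed one) is a common convention and an assumption of this sketch, and the example counts are hypothetical because the per-class counts are reported in Dataset S3 rather than in the text.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical counts: rows = TDP / non-TDP lines, columns = MH / other junctions.
print(odds_ratio(3, 1, 1, 3), fisher_exact_two_sided(3, 1, 1, 3))
```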
Recent evidence has revealed meaningful correlations between DNA replication timing, genomic instability, and the emergence of DNA mutations (25, 26). Indeed, we found a significant association between TD-affected genes and replication timing (27). Genes truncated by TD boundaries are found in late replication regions, and genes spanned by TDs are enriched in early replicating regions (P < 2.2E-16 and P < 2.2E-16 for the TDP set; P = 9.3E-07 and P < 2.2E-16 for the non-TDP set; Fig. 2D). This specific pattern of replication timing is consistent across all samples (TDPs and non-TDPs), and it may reflect a shortage of DNA repair opportunities in late S phase, leading to an increased incidence of misrepaired double-strand breaks resulting in CNVs (25). However, given that DNA replication typically encompasses ∼400- to 800-kb chromosomal domains, it is plausible that the shorter TDs found in TDP genomes are generated within intrareplication timing domains, whereas the larger, non-TDP TDs are more likely to result from the spatial proximity of distinct replication domains through the three-dimensional looping of chromatin structures.
TDP Is Characterized by the Coordinated Perturbation of Several Cancer Genes.
One of the most direct consequences of DNA segmental duplication is the increased expression of the genes that are entirely comprised within the rearrangement, whose copy number is thus augmented. We hypothesize that a genomic configuration generating a large number of segmental duplications would represent a cancer genomic mechanism for the modulation of hundreds of potential oncogenic signals, providing a selective advantage for the TDP cancer cell. To assess this possibility, we first compared changes in gene expression between normal and tumor breast samples, with respect to the genes found to be most frequently affected by TDs in the TDP breast cancer WGS dataset (n = 23; Dataset S4). As hypothesized, genes that are frequently found inside TDs are generally overexpressed in breast cancers when compared with the normal breast epithelium (median log2-fold change = 0.17, P = 4.0E-16). In contrast, genes frequently located at the boundaries of TDs appear to be down-regulated in breast cancers (median log2-fold change = −0.3, P = 5.0E-05) (Fig. 3A). Moreover, genes frequently encompassed by TD segments are enriched in known oncogenes (P = 1.2E-02) and genes whose increased expression levels are associated with a poor prognosis for patients with breast cancer (P = 3.3E-05), whereas genes that map to TD boundaries are most significantly associated with known (P = 5.9E-05) and putative tumor suppressor genes (STOP genes, P = 5.1E-04; good prognosis genes, P = 4.6E-12; Fig. 3B). We confirmed these findings by identifying the genes affected by TD-like features predicted using SNP array data, which provided a significantly larger dataset (n = 418 TDP tumor samples; Fig. S4A). 
Indeed, well-known oncogenes, such as paired box 8 (PAX8), erb-b2 receptor tyrosine kinase 2 (ERBB2), and MYC, are among the most recurrent genes that are spanned by a TD across TDP samples, whereas known tumor suppressor genes, such as RAD51L1, PTEN, and RB1, populate the list of the top genes affected by TD breakpoints (Fig. S4B and Dataset S5).
Fig. 3.
TDP is characterized by the coordinated perturbation of several cancer genes. (A) Fold change (FC) in gene expression (breast tumor/normal breast) for genes frequently located inside or at the boundary of TDs in TDP tumors (P values determined by the Mann–Whitney U test). (B) Genes frequently affected by a TD breakpoint are enriched in anticancer genes (Left), whereas genes frequently spanned by a TD are enriched in procancer genes (Middle). (Right) Short-span TDs appear to interfere with the integrity of anticancer genes more frequently than with that of procancer genes (P values determined by Fisher’s exact test).
Fig. S4.
TD-like features specifically affect TSGs and oncogenes. (A) Data from 418 TDP genomes assessed by SNP array (TNB, NTNB, OV, and UCEC datasets). P values and ORs were computed using Fisher’s exact test. NS, not significant. (B) Histograms of frequencies for genes found at the boundaries (Left) or inside (Right) TD-like features in TDP tumors. Thresholds for frequency significance were defined based on 1,000 random gene samplings as described in Materials and Methods. Specific examples of oncogenes (red) and TSGs (blue) are indicated by arrows, together with the number of unique TDP tumors in which they are affected. (C) Heat map of co-occurrences for the top 25 genes found inside (red) and at the boundaries of (blue) TD-like features in TDP tumors. The top known cancer genes are indicated with the percentage of samples in which they are affected. (D) Co-occurrences are likely for genes that map within a short distance of each other, and are therefore affected by the same TDs. The top 25 TD-inside genes shown in C are clustered based on chromosomal location. (E) Overview of all TD-like features at specific chromosomal loci. TD-like features are color-coded based on their effect on the gene of interest depicted in each graph [i.e., PAX8 (Top), PTEN (Bottom)]: gray, no effect; red, gene duplication (the target gene is located inside the TD); blue, gene disruption [the target gene is located at the TD boundary (i.e., BP)]. BP, breakpoint.
This systems strategy for generating the cancer state supposes that many different combinations of oncogenic signals would suffice as opposed to a single dominant oncogenic cassette such as the cassette proposed for genes associated with ERBB2 amplification (9). To test this strategy, we examined the frequency of specific one-gene and multiple-gene combinations affected by TDs across 418 TDP genomes assessed using SNP array data (TNB, NTNB, OV, and UCEC datasets) and found that only up to a maximum of 15.5% of tumors share TD-like features affecting a single common tumor suppressor gene (RAD51L1 and, at lower frequencies, WWOX, NF1, RB1, PTEN, and BRCA1; Fig. S4B and Dataset S5), and, even less frequently, an oncogene (i.e., PAX8, duplicated in 10.5% of tumors, followed by ERBB2, ERBB3, TERC, STAT2, CDK2, and MYC; Fig. S4B and Dataset S5). In addition, multiple-gene combinations are relatively rare, with the top-scoring combinations being those genes that map within a short distance of each other, and are therefore affected by the same TDs (e.g., PAX8 and PSD4, which are coordinately duplicated in 8.9% of the tumors examined, or PAX8, CBWD2, and IL1RN, which are coordinately duplicated in 6% of the tumors examined; Fig. S4 C–E). Much rarer are two-gene combinations comprising frequent TD-boundary genes (Fig. S4C), arguing against the presence of a dominant TD-affected cancer gene or small gene set.
Intriguingly, we observed that the shorter span TDs seen exclusively in TDP (∼10 kb) do not cause the segmental duplication of full-length genes but disrupt gene body integrity. We found that 38.2% (1,181 of 3,086) of the short-span TDs (span < 100 kb) present in the 50 TDP cancer genomes analyzed by WGS are completely embedded within a gene body, often disrupting the intron/exon structure (P < 0.001; Fig. S5). Moreover, we observed that the genes affected by these short TDs are more likely to function as anticancer as opposed to procancer genes, because they are enriched in TSGs (P = 1.5E-03) and putative TSGs (P = 1.8E-11), while being depleted for oncogenes (known oncogenes, P = 7.3E-03; poor prognosis genes, P = 3.1E-04; Fig. 3B).
Fig. S5.
Short-span TDs cause TSG disruption. (A) Short-span TDs (<100 kb) are more likely to fall completely within gene bodies than expected by chance. Short-span TD genomic coordinates (n = 3,086, based on WGS data from 50 TDP cancer genomes) were randomly permuted 1,000 times, preserving their sizes. At each permutation, the percentage of TDs integrally falling within gene bodies was recorded to generate the expected distribution. A red vertical line indicates the observed percentage of gene-embedded TDs, which exceeds all of the 1,000 permuted values. (B) UCSC Genome Browser screen shot showing the location of two short-span TDs affecting the integrity of the PTEN TSG on chromosome (chr) 10.
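The size-preserving permutation test described in panel A can be sketched as follows. For brevity, the sketch treats the genome as a single linear coordinate axis and reports an empirical P value with the conventional add-one correction; both choices, as well as the function names and toy coordinates, are assumptions rather than details of the published analysis.

```python
import random


def fraction_embedded(tds, genes):
    """Fraction of TDs lying entirely within at least one gene body.

    tds, genes: lists of (start, end) intervals on one coordinate axis.
    """
    inside = sum(
        1 for start, end in tds
        if any(g_start <= start and end <= g_end for g_start, g_end in genes)
    )
    return inside / len(tds)


def permutation_p_value(tds, genes, genome_length, n_perm=1000, seed=0):
    """Empirical P value for the observed gene-embedded fraction."""
    rng = random.Random(seed)
    observed = fraction_embedded(tds, genes)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in tds:
            size = end - start  # preserve each TD's span
            new_start = rng.randrange(0, genome_length - size + 1)
            shuffled.append((new_start, new_start + size))
        if fraction_embedded(shuffled, genes) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical intervals on a 1-Mb toy genome.
genes = [(0, 100)]
tds = [(10, 20), (30, 50)]
print(permutation_p_value(tds, genes, 1_000_000))
```

With 1,000 permutations and no permuted value reaching the observed one, the add-one estimate is 1/1,001, which is compatible with reporting P < 0.001.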
Taken together, these results strongly suggest that the consequence of generating many TDs is a genome-wide mechanism that simultaneously augments (albeit moderately) the expression of many oncogenes and suppresses the expression of antioncogenes/tumor suppressors. In this model, there is no obvious genetic driver by virtue of levels of expression or the frequency of occurrence. Given these findings, and the fact that the TDP characteristic is presumably established in the preneoplastic cells prior to the generation of TDs, we searched for genetic alterations that might cause a cell to adopt a TDP.
Insights into the Molecular Background Favoring TDP Formation.
In the first analysis, we could not find enrichment of specific TDs in the TDP tumors that could explain the unique genomic features associated with the phenotype. This result suggested that there may be intrinsic molecular differences between TDP and non-TDP tumors that induce the TDP and that the changes in gene expression arising from tandem duplicons are a consequence of the TDP.
To identify factors that may correlate with the molecular mechanisms underlying the phenotype, we investigated the characteristics of TDP as compared with non-TDP samples within each of the three most highly TDP-enriched tumor datasets: TNB, OV, and UCEC. In addition, we extended our analysis to non-TNBCs (NTNB dataset), which, although depleted in TDPs as a cancer group, comprised a sufficient number of TDP and non-TDP samples to perform statistical comparisons. We first computed the overall mutation burden as the total number of genes per sample that are affected by at least one nonsilent mutation as assessed by exome sequencing (23, 28, 29). Although the TNB, NTNB, and, to a lesser extent, OV datasets showed a significantly higher mutation burden in the TDP subgroup (P = 3.7E-05, P = 9.4E-06, and P = 4.0E-02, respectively), this trend was not consistent in the remaining dataset (UCEC; Fig. S6).
Fig. S6.
TDP samples do not consistently show a higher mutation burden compared with non-TDP samples. Box plots represent distributions in the number of unique genes per sample that are affected by nonsilent somatic mutations. Although there is a significant increase in the overall number of mutations detected in TDP compared with non-TDP samples in the two breast cancer datasets analyzed, and with a more modest significant increase in the OV dataset, the trend was completely reversed in the UCEC dataset. TDP status was assigned based on SNP array data. P values were computed using the Mann–Whitney U test.
We therefore focused on individual gene mutations to search for genes that, when mutated, are associated with the TDP. For each cancer dataset analyzed, we compiled a list of frequently mutated genes (i.e., mutated in at least 15% of cases within either the TDP or non-TDP sample subgroup). Somatic mutation frequencies were then compared between TDP and non-TDP tumors using Fisher’s exact test, and significant differences were assessed across cancer datasets (Dataset S6). Of a total of 56 frequently mutated genes, the TP53 gene is the only one whose somatic mutation rate is recurrently higher in TDP relative to non-TDP samples across different tumor types, with all of the four examined datasets showing a significant enrichment (TNB, OR = 7.6; NTNB, OR = 4.6; OV, OR = 5.2; UCEC, OR = 60.4) (Fig. 4A and Dataset S6).
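The per-gene comparison described above (a 15% frequency filter followed by Fisher’s exact test) can be illustrated with a short sketch. The counts below are hypothetical, and the function names are introduced here for illustration only. The odds ratio returned is the simple sample odds ratio, which may differ slightly from the estimator used in the original analysis.

```python
from math import comb

def is_frequently_mutated(mut_tdp, n_tdp, mut_non, n_non, min_freq=0.15):
    """True if the gene is mutated in at least min_freq of either subgroup."""
    return mut_tdp / n_tdp >= min_freq or mut_non / n_non >= min_freq

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, P value). The P value sums the
    hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)

# Hypothetical counts for one gene: 30 of 40 TDP tumors mutated,
# 20 of 60 non-TDP tumors mutated.
mut_tdp, n_tdp, mut_non, n_non = 30, 40, 20, 60
if is_frequently_mutated(mut_tdp, n_tdp, mut_non, n_non):
    odds_ratio, p_value = fisher_exact_two_sided(
        mut_tdp, n_tdp - mut_tdp, mut_non, n_non - mut_non
    )
```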
Fig. 4.
Loss of the TP53 and BRCA1 tumor suppressor genes in the context of abnormal DNA replication may provide a permissive background for the insurgence of the TDP. (A) TP53 mutation rate is recurrently higher in TDP samples compared with non-TDP samples. ORs and corresponding P values refer to the enrichment of TDP samples for samples with gene disruption. Percentages of TDP and non-TDP samples carrying the gene disruption are indicated in purple and green, respectively. (B and C) DNA replication genes are consistently up-regulated in TDP vs. non-TDP samples. (B) Top 10 GO terms significantly enriched in up-regulated genes (TDP vs. non-TDP) across the four different datasets analyzed. (C) Heat map of individual gene expression levels. Tumor samples are sorted based on tumor type and increasing TDP score. Only the 23 DEGs closely involved in DNA replication are shown. (D) TDP samples are significantly enriched in BRCA1 low expressors across different tumor types. The threshold for low BRCA1 expression was defined based on the bimodal distribution of BRCA1 transcriptional levels in each individual dataset. Graph annotations are as in A. Expression levels of the BRCA1 gene in TDP (purple) and non-TDP (green) TNBC cell lines (E) and PDXs (F) are shown. TDP scores for these genomes were computed based on WGS data. The BRCA1 somatic mutational status is indicated in brackets. mt, mutated; na, not available; wt, wild type. Pearson correlation coefficients (R) and their corresponding P values are reported in each graph. (Right) Box plots of BRCA1 expression values for TDP and non-TDP sample groups, log2-fold changes and Student’s t test P values are shown. (G) TDP samples are enriched for BRCA1-deficient tumors in both the TNB and OV datasets. BRCA1 loss is defined by the presence of germline or somatic mutations, or promoter methylation.
We then asked whether TDP and non-TDP tumors show profiles of differential gene expression that distinguish these two states. Following the identification of differentially expressed genes (DEGs) between TDP and non-TDP tumors within each tumor-type dataset, we performed a gene ontology (GO) enrichment analysis of the lists of up- and down-DEGs to identify biological processes most commonly perturbed in association with the TDP. Up-regulation of genes engaged in biological processes relevant to cell proliferation and DNA replication appeared to be the most robustly and consistently enriched across all four analyzed datasets (Fig. 4B). This finding strongly suggests that TDP tumors are more prone to increased/perturbed DNA replication. Among the DNA replication genes most frequently up-regulated (in at least three of the four datasets examined), CCNE1 was the one with the highest cumulative fold change, followed by several critical DNA replication initiation factors, including CDT1, MCM2, MCM6, and MCM10 (30, 31) (Fig. 4C and Dataset S7).
Although no multigene cassettes engaged in specific biological processes appeared to be consistently down-modulated in the TDP datasets, we observed in the cancer subgroup of TNBC that the BRCA1 gene is among the most significantly down-regulated genes, with a greater than two-fold decrease in TDP vs. non-TDP tumors (P = 0.03; Fig. S7 A and B). Indeed, we found a highly significant enrichment for TDP tumors in BRCA1 low expressors (27% of all TDP samples compared with 0% of non-TDP TNBC samples; P = 3.9E-05). We validated the strong association between low BRCA1 expression and the TDP score in the NTNB (P = 1.3E-03) and OV (P = 3.4E-03) datasets (Fig. 4D), and in two other independent TNBC datasets (P = 0.027 and P = 0.05), all showing an overall negative correlation between BRCA1 expression level and TDP score (Fig. 4 E and F). Furthermore, we found a significant association between BRCA1 promoter methylation status and reduced BRCA1 expression levels in the TNB (R = −0.61, P = 2.3E-07) and OV (R = −0.74, P < 1.0E-05) datasets (Fig. S7C), pointing at epigenetic silencing as a key mechanism of transcriptional inactivation of BRCA1 in TDP tumors.
Fig. S7.
Loss of BRCA1, but not of BRCA2, in TDP tumors. (A) Box plot of BRCA1 expression values for the TNB dataset. The BRCA1 gene is significantly down-regulated in TDP compared with non-TDP samples. Adj., adjusted. (B) Bimodal distribution of BRCA1 expression values was resolved to identify low expressors. Low BRCA1 expressors are significantly enriched for TDP samples. (C) BRCA1 expression levels are inversely correlated with BRCA1 promoter methylation levels in the TNB and OV datasets (Pearson correlation: R = −0.61, P = 2.3E-07 for the TNB dataset; R = −0.74, P < 1.0E-05 for the OV dataset). The 10% most highly methylated samples at the BRCA1 promoter are indicated in red. (D) Contrary to the BRCA1 gene, the BRCA2 gene is more frequently mutated in non-TDP compared with TDP tumors across different tumor types. Only somatic mutations were analyzed for the UCEC dataset.
Whereas we did not find any enrichment in BRCA1 somatic mutations that distinguishes TDP, when we combined somatic and germline mutations and promoter hypermethylation, we did observe a significant increase in the frequency of BRCA1 disruption in TDP vs. non-TDP tumors in the TNB and OV datasets (OR = 9.8 and OR = 5.1, P = 8.0E-03 and P = 1.0E-03, respectively; Fig. 4G). On the contrary, BRCA2 mutation rates did not show any association with the TDP and, instead, appeared to be modestly but consistently higher in the non-TDP tumor sets (Fig. S7D), raising the hypothesis that the TDP is an exquisite feature of BRCA1 loss and not of BRCA2 loss.
When taken together, these results suggest that a combination of TP53 loss-of-function mutation, BRCA1 reduced expression/activity, and overexpression of DNA replication and cell cycle genes may be required for TDP generation.
We have established that certain multigene expression changes are strongly associated with TDP tumors. We asked whether these changes were a result of the TDs or preceded the induction of these structural mutations. Of the 23 DEGs involved in DNA replication and cell cycle associated with TDP, only four (CALR, CCNE1, RAD51, and TK1) were also found inside TDs in multiple TDP samples but at modest frequencies of <5% (Dataset S5). When we removed the TDP tumor samples harboring physical TDs spanning these four differentially expressed DNA replication and cell cycle genes, the association of these genes with the TDP remained statistically significant (P < 1.0E-04; Fig. S8). This observation suggests that their overexpression is likely to be engaged in the establishment of the TDP and is not simply a consequence of the phenotype.
Fig. S8.
TDP-associated overexpression of DNA replication genes does not depend on their duplication status. Frequently up-regulated DNA replication genes that are also often affected by TDs across TDP samples were tested to assess whether their expression levels could be explained by the presence of TDs that increased their CN status. For each gene, TDP samples with TDs spanning its entire length were removed from the analysis of differential gene expression. In all four cases, differences in expression levels between non-TDP and TDP tumors remained significant. ***P < 0.0001, Mann–Whitney U test.
TDP as a Genomic Marker for Drug Sensitivity.
We explored whether the TDP could represent a marker for drug sensitivity by searching the Genomics of Drug Sensitivity in Cancer database (32) for drugs and compounds that differed in their effect between the TDP and non-TDP breast cancer cell lines. Interestingly, cisplatin was among six drugs showing a significant negative correlation between TDP scores (computed based on available WGS data) and IC50 values (Dataset S8). Given the utility of platinum-based therapeutics as neoadjuvants in the clinical management of patients with TNBC (33–35), and the reported association between platinum-based treatment clinical success and a “BRCA-ness” molecular profile (36–38), we hypothesized that the TDP subset of TNBCs may be characterized by a better response to platinum-based chemotherapy. We therefore tested a total of 14 genomically characterized TNBC cell lines and found significant negative correlations between IC50 values relative to both cisplatin and carboplatin treatments and the TDP score (R = −0.57, P = 0.032 for cisplatin; R = −0.58, P = 0.029 for carboplatin) (Fig. 5A). By contrast, olaparib, an inhibitor of the poly ADP ribose polymerase (PARP) shown to have antitumor activity in patients with BRCA-mutated cancer (39, 40), did not show any significant association with the TDP score when tested on our panel of TNBC cell lines (Dataset S9), suggesting that the sensitivity of TDP tumors to cisplatin may not be exclusively related to the mutational status of BRCA1 or BRCA2.
Fig. 5.
TDP as a genomic marker for drug sensitivity. (A) TDP scores correlate with cisplatin or carboplatin sensitivities in TNBC cell lines. Pearson correlation coefficients (R) and their corresponding P values are reported in the graph. Ln, natural logarithm. (B) TDP scores associate with cisplatin sensitivity in vivo. Waterfall plots representing cisplatin response for eight TNB PDX models sorted by decreasing values of TDP scores are shown. Response calls are indicated underneath each bar and were computed based on adapted Response Evaluation Criteria in Solid Tumors (RECIST) criteria as described in SI Materials and Methods.
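The correlation between TDP score and ln-transformed IC50 values reported in Fig. 5A can be illustrated with a short sketch. The TDP scores and IC50 values below are hypothetical, and the helper names `pearson_r` and `t_statistic` are introduced here for illustration. The t statistic is the standard transformation used to test a Pearson coefficient against zero with n − 2 degrees of freedom.

```python
from math import log, sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom)."""
    return r * sqrt((n - 2) / (1 - r * r))

# Hypothetical TDP scores and IC50 values (in micromolar) for five cell lines.
tdp_scores = [-0.8, -0.3, 0.1, 0.4, 0.9]
ic50_um = [40.0, 22.0, 15.0, 6.0, 2.5]
r = pearson_r(tdp_scores, [log(v) for v in ic50_um])

# Worked check using the reported cisplatin values (R = -0.57, n = 14).
t_cisplatin = t_statistic(-0.57, 14)
```

For the reported cisplatin correlation, this transformation gives a t statistic of approximately −2.40, whose magnitude exceeds the two-sided 5% critical value (about 2.18) for 12 degrees of freedom.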
Remarkably, although the levels of BRCA1 expression correlate with the TDP score in the TNBC cell lines examined, we did not observe any significant association between BRCA1 levels and either cisplatin or carboplatin IC50 values (Dataset S9). This finding indicates that platinum sensitivity correlates more directly with the TDP score than with BRCA1 expression levels and that the TDP score, which is modulated by other genes in addition to BRCA1, may be a key genomic predictor of cisplatin sensitivity in TNBC.
We explored this hypothesis further by testing the in vivo response to cisplatin treatment in eight independent patient-derived xenograft (PDX) models of TNBC. Following a 3-wk-long cisplatin regimen, four of the five TDP PDX models showed remarkable levels of tumor shrinkage, including two complete responses (>80% average tumor shrinkage across all animals in the treatment arm, PDX3 and PDX6) and two partial responses (30–80% average tumor shrinkage, PDX7 and PDX10) (Fig. 5B). On the contrary, none of the three non-TDP models analyzed exhibited a reduction of the original tumor volume after 3 wk of treatment, and all three responses to the cisplatin regimen were classified as progressive disease (>20% average tumor growth; Fig. 5B). Thus, in both established cell lines and in vivo PDXs, TDP status is strongly associated with cisplatin sensitivity.
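The response calls above can be expressed as a simple threshold rule on the average percentage change in tumor volume. The sketch below is illustrative only: the function names are hypothetical, the handling of values falling exactly on a threshold is an assumption, and the "stable disease" label for changes between 30% shrinkage and 20% growth follows general RECIST terminology rather than a category stated in the text.

```python
def percent_change(volume_start, volume_end):
    """Percentage change in tumor volume (negative values indicate shrinkage)."""
    return 100.0 * (volume_end - volume_start) / volume_start

def classify_response(change):
    """Map an average percentage volume change to a response call."""
    if change < -80.0:
        return "complete response"      # >80% average tumor shrinkage
    if change <= -30.0:
        return "partial response"       # 30-80% average tumor shrinkage
    if change > 20.0:
        return "progressive disease"    # >20% average tumor growth
    return "stable disease"             # assumption: label not given in the text

# Hypothetical example: a tumor shrinking from 150 to 45 mm^3.
call = classify_response(percent_change(150.0, 45.0))
```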
Discussion
Recent studies have described previously unrecognized massive structural aberration events occurring on a genome-wide scale in human cancer (4, 5, 10, 41). A fundamental challenge is to define a quantitative metric to identify these global genomic configurations in cancer samples systematically and to investigate the role they play in tumorigenesis (42). Here, we devised a mathematical approach to the unbiased recognition of the TDP (9, 10, 15, 19). By applying this TDP scoring metric to a collection of ∼3,000 tumors with genomic data (WGS and/or SNP array), we provide statistical evidence that the TDP is enriched in specific tumor types, suggesting a distinct biological mechanism underlying this phenotype that cuts across histological subtypes (Fig. S9A).
Fig. S9.
Molecular and functional features discriminating between TDs found in TDP and non-TDP cancer genomes. (A) Graphic summary. (B) OncoPrints for the 90 TNBC samples for which RNA-seq, SNP array, and mutation data were available. BRCA1 down-regulation was defined as in Fig. S7B. CCNE1 and CDT1 up-regulation was defined as a greater than twofold increase in expression compared with the average gene expression level across all TNB non-TDP tumors. Thirteen of 33 TDP tumors show perturbation of three or four of the candidate genes, whereas only two of 57 non-TDP tumors do. OR = 17.2, P = 2.1E-05 (Fisher’s exact test).
The mechanisms for the generation of segmental TDs have been previously explored in Saccharomyces cerevisiae (43). Green et al. (43) have shown how defects in the molecular machinery responsible for preventing DNA rereplication can result in head-to-tail segmental duplications in yeast. Notably, the TDs in this study were mediated by NAHR between yeast transposon repetitive elements, a mechanism distinct from the MH-mediated mechanisms that dominate in the TDP tumors we have analyzed here. The authors proposed a mechanism by which the increase in copy number of chromosomal segments can result from the molecular repair of stalled rereplication bubble structures emerging in a permissive molecular background (e.g., following the deregulation of DNA replication proteins). They name this process rereplication-induced gene amplification (RRIGA) and speculate that it may play a critical role in oncogenesis (43, 44). Koszul et al. (45) and Koszul and coworkers (46) identified an MH-mediated POL32-dependent replicative mechanism underlying segmental TD formation in S. cerevisiae. A genetic analysis of MH-mediated TD formation in Escherichia coli by Slack et al. (47) implicated stalled replication as a trigger to the formation of these TDs. These observations are today collectively termed MMBIR (13, 14). Costantino et al. (48) demonstrated the significant enrichment of short-span copy number gains (<200 kb) in an artificial model of DNA replication stress induced through the ectopic overexpression of the CCNE1 gene in the U2OS human osteosarcoma cell line. Our work supports these observations in a spontaneous human cancer setting. Indeed, the size range of the DNA duplications generated via RRIGA in yeast and via CCNE1 overexpression in the U2OS cell line matches the size range of the TDs found in our TDP samples, which we have shown to be characterized by the significant overexpression of replication initiation genes, including CDT1 and CCNE1 (Fig. 4C). 
We therefore speculate that the mechanism of TD formation in the TDP chromotype may entail replicative mechanisms, such as MMBIR.
Whereas cancers with high amplification of a single locus in non-TDP tumors depend on a dominant driver oncogene, such as ERBB2 or MYC, the TDP is unusual in that there does not appear to be a discernable single cancer driver gene targeted by the TDP. Rather, different combinations of many potential drivers appear to be affected by the widespread genomic distribution of TDs. Indeed, in our analysis of genes perturbed by TDs in TDP, we could not find any individual gene that appears to be affected in more than 15.5% of the samples examined (Dataset S5). However, the TDP configuration generates changes that affect the expression and function of hundreds of genes in a distributed manner within each tumor. Thus, TDP tumors may derive selective growth advantage from a systemic process, namely, genome-wide segmental TD formation, which simultaneously targets many cancer genes distributed across the genome. In seeking to uncover the root genetic aberrations that may underlie the induction of the TDP, we looked at the gene expression and mutational profiles that are frequently found and most strongly associated with the TDP across a number of tumor types. Our findings suggest that the TDP is induced by specific combinations of gene perturbations that (i) cause the loss of genome integrity (i.e., loss of TP53 and BRCA1) and (ii) drive the augmented expression of cell cycle and DNA replication genes (e.g., increased activity of CCNE1, CDT1). In fact, combinations of these TDP-associated gene perturbations occur remarkably more frequently in TDP than in non-TDP TNBC tumors (OR = 17.2, P = 2.1E-05; Fig. S9B). Earlier reports have suggested a BRCA1-independent mechanism for the TDP, based on the absence of BRCA1 mutations in samples (breast and OV carcinomas) with a large number of TDs (15, 19). 
However, we observed a strong negative correlation between BRCA1 gene expression and the TDP score, as well as the enrichment for BRCA1-defective tumors (assessed by the presence of somatic or germline mutations, or promoter hypermethylation) in TNBC and OV cancer (Fig. 4 D–G). This finding strongly supports a previously unrecognized critical role for BRCA1 loss of function in the induction of the TDP.
Finally, we find that the quantitative assessment of the TDP may have clinical relevance. We describe an association between the extent of TDP and greater sensitivity to platinum-based chemotherapy both in cell lines and in PDXs. It has been reported that breast tumors with perturbations of BRCA1 respond better to cisplatin treatment (37). Although our observations in vitro suggest that cisplatin sensitivity is better correlated with the TDP score than BRCA1 levels or mutational status, we suggest that the TDP score integrates multiple genetic factors, such as TP53 status and select driver gene expression (e.g., CDT1, CCNE1), which may be the genetic components needed for the sensitivity phenotype. Whereas recent neoadjuvant studies suggest that the effectiveness of cisplatin in TNBC is associated with loss of BRCA1 by mutation or low expression (34, 36, 38), it may be that the TDP score is a more robust predictor of response to platinum-based chemotherapies independent of tumor type. Indeed, high TDP scores are enriched in TNBC, in OV cancer, and in the recently described cluster 4 endometrial carcinoma, which have been shown to share a similar transcriptional and molecular profile (23). Given the specific molecular determinants associated with the TDP across tumor types, it will be interesting to investigate the possible benefit of a cisplatin and PARP inhibitor combination in TDP tumors.
In summary, we envision that the TDP assessment may provide a unique genome sequence-based predictive marker for platinum-based drug sensitivity and allow for detailed interrogation of more precise mechanisms of cisplatin sensitivity.
Materials and Methods
WGS Datasets and TDP Classification.
A catalog of somatic structural variation data was compiled from a number of WGS studies, comprising a total of 277 tumor samples, as listed in Dataset S1 (4, 9, 10, 15–22). We manually curated the available structural variation information (relative orientation and mapping coordinates of the discordant mate-pair or paired-end read clusters) from every individual study to classify each reported somatic event into one of the four basic rearrangements: deletion, TD, inversion, or interchromosomal translocation (49). For studies that reported structural variation coordinates relative to the hg18 reference human genome, a lift over to hg19 was performed using the Galaxy Lift-Over tool (https://usegalaxy.org). Previous attempts at describing the genomic features of the TDP have relied on a basic TD count or on the proportion of TDs relative to the total number of structural variations in a cancer genome (10, 15). These approaches lack robustness, because they are prone to be influenced by observer and technical biases, such as sequencing coverage, and are not able to discriminate between the genome-wide TD prevalence that characterizes the TDP vs. abnormal TD accumulation in a few functional genomic loci, previously described in association with focal amplification (9, 16). Our proposed metric to calculate the TDP score is described in the main text. A visualization of the TDP score distribution density plot across all samples suggested a multimodal distribution (Fig. S1A). We used the normalmixEM function of the mixtools package in R to fit different numbers of mixture components (up to five) to the TDP score value distribution (50), using default estimates as the starting values for the iterative procedure. We compared the resulting mixture model estimates using the Bayesian information criterion and found that a trimodal distribution corresponded to the optimal fit.
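The model-selection step (fitting one to five mixture components and comparing fits by the Bayesian information criterion) can be sketched as follows. This is not the mixtools implementation used in the study: it is a small stand-in expectation-maximization routine written for illustration, with quantile-based starting values, a lower bound on component SDs, and simulated scores that are all assumptions of the sketch.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_loglik(data, weights, mus, sigmas):
    """Log-likelihood of the data under a 1-D Gaussian mixture."""
    total = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
        total += math.log(max(dens, 1e-300))
    return total

def fit_mixture(data, k, n_iter=200, min_sigma=1e-3):
    """Fit a k-component mixture by EM and return its log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    mean = sum(data) / n
    overall_sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    # Starting values (an assumption of this sketch): means at evenly
    # spaced quantiles, equal weights, and a shared SD.
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    sigmas = [max(overall_sd / k, min_sigma)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M step: update weights, means, and SDs from the responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    return mixture_loglik(data, weights, mus, sigmas)

def bic(loglik, k, n):
    """BIC with 3k - 1 free parameters (k means, k SDs, k - 1 weights)."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)

# Hypothetical scores drawn from three well-separated groups.
rng = random.Random(1)
scores = [rng.gauss(mu, 0.3) for mu in (-3.0, 0.0, 3.0) for _ in range(50)]
bic_by_k = {k: bic(fit_mixture(scores, k), k, len(scores)) for k in range(1, 6)}
best_k = min(bic_by_k, key=bic_by_k.get)
```

The model with the lowest BIC is retained. On simulated input of this kind, the three-component model should score better than the one- and two-component models, because the BIC penalty for the extra parameters is small relative to the gain in likelihood.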
TCGA Genomic Datasets.
Affymetrix SNP 6.0 CNV datasets for primary tumor tissues were downloaded from the TCGA Data Portal in the form of level 3 CNV data type (CNV segments). Primary tumor samples from the TCGA breast invasive carcinoma dataset were classified as TNBC (TNB) or non-TNBC (NTNB), according to TCGA clinical annotations (28) (https://tcga-data.nci.nih.gov/tcga/).
TCGA somatic mutation data for the TNB, NTNB, OV, and UCEC datasets were downloaded from the UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu) as gene-based somatic mutation calls generated by the TCGA PANCANCER Analysis Working Group. For each sample, any gene affected by at least one nonsilent somatic mutation (nonsense, missense, short insertion/deletion, splice site mutation, stop codon read-through) was considered somatically mutated.
RNA-sequencing (RNA-seq) gene expression data for the TNB, NTNB, OV, and UCEC datasets were downloaded from the TCGA Data Portal in the form of level 3 RSEM raw expression estimates, generated using the TCGA RNA Sequencing Version 2 analysis pipeline. Raw gene read counts were then scale-normalized using the trimmed mean of M-values normalization method before being converted into log counts per million with associated precision weights using the voom transformation included in the limma package in R (51).
Detection of TD-Like Features Based on Copy Number Profiling.
Based on the assumption that an isolated TD within any given genomic locus will result in a chromosomal segment with uniform, increased copy number compared with its two adjacent genomic regions, we scanned SNP array genomic data for CNV profiles indicative of TD-like features (i.e., copy number segments with a length ranging from 1 kb to 2 Mb, characterized by a copy number increase of one or more units and flanked by segments of equal copy number) (15) (Fig. S2A). The identified TD-like features were then used to compute TDP scores following the same metric and threshold applied for WGS data (as described in Results).
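The scan for TD-like features can be sketched as a pass over consecutive copy number segments. The example below assumes that segments on one chromosome are sorted, contiguous, and already expressed as absolute copy number. Real segment files typically report log2 ratios, so a conversion step would be needed first. The function name, the tuple layout, and the tolerance parameter are hypothetical.

```python
def find_td_like(segments, min_len=1_000, max_len=2_000_000, min_gain=1.0, tol=0.0):
    """Return segments that look like isolated TDs.

    segments: list of (start, end, copy_number) tuples for one chromosome,
    sorted by position. A segment qualifies when its length is within
    [min_len, max_len], its two flanking segments have equal copy number
    (within tol), and it exceeds the flanks by at least min_gain.
    """
    hits = []
    for left, mid, right in zip(segments, segments[1:], segments[2:]):
        length = mid[1] - mid[0]
        flanks_equal = abs(left[2] - right[2]) <= tol
        gained = mid[2] - left[2] >= min_gain
        if min_len <= length <= max_len and flanks_equal and gained:
            hits.append(mid)
    return hits

# Hypothetical segmented profile for one chromosome.
segments = [
    (0, 500_000, 2),
    (500_000, 600_000, 3),      # 100-kb gain flanked by equal copy number
    (600_000, 1_500_000, 2),
    (1_500_000, 1_500_500, 4),  # too short (500 bp)
    (1_500_500, 2_000_000, 2),
    (2_000_000, 2_300_000, 3),  # flanks differ (2 vs. 4)
    (2_300_000, 3_000_000, 4),
]
td_like = find_td_like(segments)
```

Only the 100-kb gain qualifies in this toy profile. The other candidates fail the length or equal-flank requirement.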
Analysis of Differential Gene Expression.
To identify DEGs between any two given groups of samples, the RNA-seq expression dataset was first filtered and only genes whose expression value was >1 in at least n − 1 samples [with n = number of samples in the smallest sample group (i.e., TDP, non-TDP)] were retained for further analysis. Sample group comparisons were carried out using the moderated t statistic of the limma package in R (51). A false discovery rate-adjusted P-value threshold of 0.05 was used to identify DEGs.
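Two steps of this procedure, the expression filter and the false discovery rate adjustment, are simple enough to sketch directly. The code below is illustrative: gene names and values are hypothetical, the moderated t statistic itself (computed with limma in the study) is not reproduced, and the Benjamini-Hochberg procedure is assumed as the false discovery rate adjustment.

```python
def filter_genes(expr, group_sizes, threshold=1.0):
    """Keep genes whose expression exceeds threshold in at least n - 1
    samples, where n is the size of the smallest sample group."""
    min_samples = min(group_sizes) - 1
    return {
        gene: values
        for gene, values in expr.items()
        if sum(v > threshold for v in values) >= min_samples
    }

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical expression values for seven samples (groups of 3 and 4).
expr = {
    "GENE_A": [5.0, 3.2, 0.4, 2.1, 8.0, 1.2, 0.9],
    "GENE_B": [0.2, 0.0, 1.5, 0.3, 0.1, 0.0, 0.6],
}
kept = filter_genes(expr, group_sizes=(3, 4))

# Hypothetical raw P values; genes with adjusted P < 0.05 would be called DEGs.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```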
GO Enrichment Analysis.
Gene enrichment analyses for GO terms were carried out using the topGO package in R (52). Briefly, predefined lists of interesting genes were tested for their enrichment in GO terms against the all-gene background using Fisher’s exact test as the test statistic and the eliminating genes (elim) algorithm as the method for GO graph structure. GO terms with fewer than 10 annotated genes were removed from the analysis.
Cell Culture and IC50 Determination.
All of the cell lines were purchased from the American Type Culture Collection. They were authenticated by short tandem repeat DNA profiling and regularly tested for Mycoplasma contamination using the MycoAlert PLUS Mycoplasma Detection Kit (Lonza). MB436, HCC38, HCC1187, HCC1395, MDA-MB231, HCC1937, HCC1599, HCC1143, HCC70, DU4475, MDA-MB157, and HCC1806 were maintained in RPMI with 10% (vol/vol) FBS. BT549 was maintained in DMEM with 10% (vol/vol) FBS, and Hs578T was maintained in DMEM with 10% (vol/vol) FBS and 0.01 mg/mL bovine insulin. IC50 value determinations were obtained by plating target cells in 96-well plates at a density of 1–5 × 10³ cells per well. After 24 h, cisplatin (Santa Cruz Biotechnology, Inc.) or carboplatin (Selleck Chemicals) was added in triplicate wells to the culture medium in half-log serial dilutions in the range of 3.3 nM to 100 μM. Cells were incubated for 72 h before assessing cell viability using a WST-8 assay (Dojindo Molecular Technologies, Inc.). Absorbance values were normalized to control wells (medium only), and IC50 values were calculated using the IC50 R package (53). Two independent replicate experiments were carried out for each cell line and each treatment, and the average IC50 value from the two experiments was used for the analysis.
WGS of TNBC Cell Lines.
Cell line genomic DNA was isolated from ∼1 × 10⁶ cells using a DNeasy Kit (Qiagen) and fragmented using Covaris E220 (Covaris) to a range of sizes centered on 500 bp. Paired-end DNA libraries were constructed using a NEBNext DNA Library Prep Master Mix set for Illumina (New England BioLabs), including a bead-based size selection to select for inserts with an average size of 500 bp and 10 cycles of PCR. The resulting libraries were quantified by quantitative PCR and pooled in groups of two before being sequenced on one lane of an Illumina HiSeq 2500 platform. Fastq files were paired and run through the next-generation sequencing (NGS) quality control (QC) Toolkit (version 2.3; IlluQC_PRLL.pl) with a quality control cutoff of 30, before alignment to the human reference genome (National Center for Biotechnology Information Build 37 from the 1000 Genomes Project) using bwa (version 0.7.4) and default parameters (bwa mem). The Hydra-Multi algorithm (54) was used to predict structural variation events. All datasets were analyzed at the same time, and structural variation events were filtered as described by Malhotra et al. (55). Only structural variations exclusive to individual datasets were considered for further analysis. WGS data are freely available from the Sequence Read Archive database (www.ncbi.nlm.nih.gov/sra) under project ID SRP057179.
Animal Work.
All animal procedures were approved by The Jackson Laboratory Institutional Animal Care and Use Committee (IACUC) under protocol number 12027.
Additional methods information can be found in SI Materials and Methods.
SI Materials and Methods
TD-Inside and TD-Boundary Genes.
A catalog of genes mapping to regions spanned by a TD [TD-inside genes (i.e., genes that are completely embedded inside a TD)] in breast cancer was generated by intersecting gene bodies’ coordinates (defined according to the TCGA General Annotation Files library) with TDs’ coordinates for the 23 TDP breast cancer samples analyzed, and requiring TDs to overlap 100% of each gene feature. The 5% largest TDs were removed from the analysis, because they are more likely to generate gene count biases, which resulted in a total number of 3,475 TDs, with a maximum span size of 4.1 Mb. Conversely, gene features that are only partially spanned by any given TD (i.e., genes whose bodies are interrupted by at least one TD breakpoint) were labeled as TD-boundary genes.
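The distinction between TD-inside and TD-boundary genes reduces to an interval comparison, sketched below for a single chromosome. The coordinates are hypothetical, the function name is introduced for illustration, and the treatment of breakpoints that coincide exactly with a gene end is an assumption of this sketch.

```python
def classify_gene(gene, tds):
    """Return the set of labels a gene receives from a list of TDs.

    gene and each TD are (start, end) tuples on the same chromosome.
    'TD-inside': the TD overlaps 100% of the gene body.
    'TD-boundary': at least one TD breakpoint falls within the gene body.
    """
    g_start, g_end = gene
    labels = set()
    for td_start, td_end in tds:
        if td_start <= g_start and g_end <= td_end:
            labels.add("TD-inside")
        elif g_start < td_start < g_end or g_start < td_end < g_end:
            labels.add("TD-boundary")
    return labels

# Hypothetical TDs and genes.
tds = [(1_000, 5_000), (20_000, 21_000)]
fully_spanned = classify_gene((2_000, 3_000), tds)      # gene lies within the first TD
partially_spanned = classify_gene((4_000, 6_000), tds)  # first TD ends inside the gene
td_within_gene = classify_gene((19_000, 25_000), tds)   # second TD lies inside the gene
unaffected = classify_gene((30_000, 31_000), tds)
```

Note that a short TD lying entirely within a gene body marks that gene as a TD-boundary gene, because both breakpoints interrupt the gene.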
To identify genes found inside or at the boundary of TDs at a statistically significant frequency (i.e., frequently TD-affected genes), we compared observed gene counts with expected values as estimated through 1,000 random gene samplings. For each sampling, we computed the number of TD-inside and TD-boundary genes and stored the value corresponding to the median gene count + 2 SDs to build empirical distributions of expected TD-inside and TD-boundary gene counts. Frequently TD-affected genes were then identified by setting a gene count threshold equal to the round-up integer of the maximum value obtained in the empirical distributions. According to this calculation, we considered as significantly frequent any gene characterized by a count equal to or higher than 2. The full lists of frequently TD-affected genes are provided in Dataset S4.
In a similar way, we computed lists of genes frequently affected by TD-like features using the TDs’ coordinates corresponding to a total of 418 TDP tumors analyzed using SNP array data (TNB, NTNB, OV, and UCEC datasets). Based on the gene count empirical distribution obtained by generating 1,000 random gene samplings, we set significance thresholds at 5 and 11 for genes frequently found at the TD boundaries and inside TDs, respectively (Fig. S4B and Dataset S5).
Cancer Gene Lists.
A list of 1,035 known tumor suppressor genes (TSGs) was generated as the union of (i) known recessive TSGs according to the Cancer Gene Census (56), (ii) homozygously inactivated genes observed by WGS in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (57), (iii) genes tagged by “Entrez Query: Tumour Suppressor” in the CancerGenes database (58) (genes that also matched the “Entrez Query: Oncogene” search were considered ambiguous and manually reassigned to the correct gene list in case of clear literature evidence or excluded from both lists in case of uncertainty), and (iv) human protein coding TSGs as described in the TSGene database (59). Of these genes, 1,020 matched gene symbols reported in the TCGA expression dataset and were used for enrichment analysis. A list of 962 known oncogenes was generated as the union of (i) genes tagged by “Entrez Query: Oncogene” in the CancerGenes database (58) (genes that also matched the “Entrez Query: Tumour Suppressor” search were considered ambiguous and manually reassigned to the correct gene list in case of clear literature evidence or excluded from both lists in case of uncertainty), (ii) genes amplified and overexpressed in cancer (60), or (iii) essential genes (61). Of these genes, 921 matched gene symbols reported in the TCGA expression dataset and were used for enrichment analysis.
STOP and GO genes were identified as genes that negatively and positively regulate cell proliferation, respectively, through a genome-wide shRNA screening by Solimini et al. (61). Of the 3,596 STOP and 1,127 GO genes identified in the study, 3,377 and 1,039 matched gene symbols reported in the TCGA expression dataset, respectively, and were used for enrichment analysis.
Genes associated with the prognosis data (good and poor prognosis genes) of patients with breast cancer were identified as previously described (9).
Pol2 and Histone Modification ChIP-Sequencing Data Retrieval and Enrichment Analysis.
Peaks corresponding to a total of 43 Pol2 ChIP-sequencing (ChIP-seq) experiments across a variety of cell lines were downloaded from the ENCODE January 2011 data freeze (62). Histone modification peaks relative to the human mammary epithelial cell line (HMEC) were downloaded from ENCODE, under accession number GSE29611. Histone modification data relative to the variant human mammary epithelial cell line (vHMEC) were obtained from the NIH Roadmap Epigenomics Project (accession number GSE16368) (63). Peaks were called using MACS2.09 software (64, 65) with the settings “macs2 callpeak --nomodel --shiftsize 100 --broad --keep-dup=1” and matching input ChIP-seq datasets.
The enrichment of Pol2 binding and histone modification marks in the vicinity of TD breakpoints was calculated as described elsewhere (22). Briefly, for each breast cancer TD breakpoint, we defined a symmetrical genomic window extending 200 kb upstream and downstream of the breakpoint coordinate. We then computed the fraction of Pol2 binding regions or histone modification peaks falling within the collection of TD breakpoint windows. Finally, we calculated ORs and z-scores of the enrichment/depletion of Pol2 protein binding or histone modification marks within the defined TD breakpoint windows.
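The window-based calculation can be sketched in Python as follows. The full procedure, including the OR calculation, is described in ref. 22 and is not reproduced here; this sketch only illustrates the windowing step and one possible z-score, computed against a null distribution of randomly placed breakpoints. The single-chromosome coordinates, the use of peak midpoints, the permutation null, and all function names are assumptions made for illustration.

```python
import bisect
import random
import statistics

HALF_WINDOW = 200_000  # 200 kb upstream and downstream of each breakpoint


def merge_windows(breakpoints, half_width=HALF_WINDOW):
    """Build symmetric windows around breakpoints and merge overlapping ones."""
    intervals = sorted((max(0, b - half_width), b + half_width) for b in breakpoints)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_in_windows(peaks, windows):
    """Fraction of peak midpoints falling inside any merged window."""
    starts = [w[0] for w in windows]
    hits = 0
    for p in peaks:
        i = bisect.bisect_right(starts, p) - 1
        if i >= 0 and p <= windows[i][1]:
            hits += 1
    return hits / len(peaks)


def window_enrichment(peaks, breakpoints, chrom_length, n_perm=200, seed=0):
    """Observed fraction and z-score versus randomly placed breakpoints."""
    rng = random.Random(seed)
    observed = fraction_in_windows(peaks, merge_windows(breakpoints))
    null = []
    for _ in range(n_perm):
        # Assumption: the null places breakpoints uniformly on the chromosome.
        shuffled = [rng.randrange(chrom_length) for _ in breakpoints]
        null.append(fraction_in_windows(peaks, merge_windows(shuffled)))
    sd = statistics.stdev(null)
    z = (observed - statistics.mean(null)) / sd if sd > 0 else float("nan")
    return observed, z
```

A positive z-score indicates that more peaks fall near TD breakpoints than expected under the assumed null, and a negative z-score indicates depletion.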
PDXs.
TNBC PDX models were established at The Jackson Laboratory campus (JAX-West) and tested for cisplatin sensitivity as previously published (66). Briefly, patient tumor material acquired from biopsy or surgical resection was implanted subcutaneously into the flank of NOD-scid IL-2r gamma-chain null female mice (8–10 weeks old). Models were considered “established” when log-phase growth in a second passage was evident. Individual tumor-bearing mice were randomized into treatment cohorts of at least six animals each on an accrual basis when tumors reached a volume of 150 mm³ (day 0), at which point each tumor model was assessed for its response to cisplatin treatment administered at a dose of 2 mg/kg of body weight and following a 3-wk regimen consisting of one i.v. injection per week. Changes in tumor volumes were measured twice a week for a full 4 wk from the beginning of the treatment or until tumor volumes reached the 1,500-mm³ end point. Treatment outcome was evaluated in terms of total response (percentage of tumor shrinkage/growth at day 20 relative to day 0). We analyzed the seven TNBC PDX models that were available as part of The Jackson Laboratory inventory of TNBC PDX live tumor-bearing mice. In combination with a high number of replicates per model (6–10 animals per treatment arm), we obtained enough power to observe the statistically significant effect of the TDP configuration on cisplatin response.
A fragment of the original engrafted tumors was used for DNA and RNA isolation. Nextera mate-pair genomic libraries were generated and sequenced on a HiSeq 2500 Illumina platform, as described elsewhere (22). Sequenced reads were analyzed through Xenome (67) against a combined human Hg19 and mouse Mm10 reference to identify and remove any mouse contaminant read pairs. Structural variations were then predicted using a custom structural variation pipeline that combines the Hydra-Multi (54, 55) and DELLY (68) algorithms. Structural variation data obtained from the peripheral blood lymphocyte DNA of four independent individuals (9) were used to remove germline variants.
RNA-seq libraries were generated following the Illumina TruSeq paired-end library preparation protocol and were sequenced on a HiSeq 2500 Illumina platform. Following the filtering of mouse reads using Xenome, human-specific paired-end reads were aligned to the hg19/GRCh37-based “UCSC” gene reference transcriptome using Bowtie2, and RSEM (69) was used to estimate the abundance of each individual gene. Upper quartile normalization was performed within each tumor sample after discarding genes with no counts. Finally, gene expression levels were adjusted using a percentile rank transformation.
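The two normalization steps can be illustrated with a minimal Python sketch operating on one sample at a time. The source does not specify whether the percentile rank transformation was applied within or across samples, nor how ties were handled; the within-sample ranking, the average-rank treatment of ties, and the function names below are therefore assumptions.

```python
import statistics


def upper_quartile_normalize(expr):
    """Drop genes with no counts, then divide by the sample's upper quartile."""
    nonzero = {g: v for g, v in expr.items() if v > 0}
    upper_quartile = statistics.quantiles(list(nonzero.values()), n=4)[2]
    return {g: v / upper_quartile for g, v in nonzero.items()}


def percentile_rank(values):
    """Map each gene to its percentile rank (0-100]; ties share the mean rank."""
    genes = sorted(values, key=values.get)
    n = len(genes)
    ranks = {}
    i = 0
    while i < n:
        j = i
        # Extend j over the run of genes tied with genes[i].
        while j + 1 < n and values[genes[j + 1]] == values[genes[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[genes[k]] = 100.0 * mean_rank / n
        i = j + 1
    return ranks
```

Because percentile ranks depend only on the ordering of values, the transformation yields expression levels on a common 0 to 100 scale that can be compared across samples regardless of sequencing depth.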
Response Calls.
The in vivo response to chemotherapy treatment was measured based on the percentage tumor volume change (ΔVol) at day 20 compared with its baseline (day 0). The criteria for response were adapted from Response Evaluation Criteria in Solid Tumors (RECIST) criteria (70) and defined as follows: complete response, ΔVol < −80%; partial response, −80% < ΔVol < −30%; stable disease, −30% < ΔVol < 20%; progressive disease, ΔVol > 20%.
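The response criteria above translate directly into a small Python sketch. The criteria use strict inequalities, so the category of a value lying exactly on a cutoff (−80%, −30%, or 20%) is not defined; the assignment of such boundary values to the less favorable category, as well as the function names, is an assumption of this sketch.

```python
def delta_vol(volume_day0, volume_day20):
    """Percentage tumor volume change at day 20 relative to day 0."""
    return 100.0 * (volume_day20 - volume_day0) / volume_day0


def response_call(dvol):
    """Classify a percentage volume change into a RECIST-adapted category."""
    # Assumption: values exactly on a cutoff fall into the less favorable class.
    if dvol < -80:
        return "complete response"
    if dvol < -30:
        return "partial response"
    if dvol < 20:
        return "stable disease"
    return "progressive disease"
```

For instance, a tumor shrinking from 150 mm³ at day 0 to 75 mm³ at day 20 has a ΔVol of −50% and would be called a partial response.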
OncoPrints.
OncoPrints were generated using the OncoPrinter tool from the cBioPortal website (71, 72).
Acknowledgments
WGS library preparation and sequence data analysis were performed by Scientific Services at The Jackson Laboratory, Bar Harbor, ME. Research reported in this publication was partially supported by the National Cancer Institute under Award P30CA034196. J.H.C. was supported by the National Human Genome Research Institute and National Cancer Institute of the NIH under Awards R21HG007554 and R21CA184851.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. T.W. is a guest editor invited by the Editorial Board.
Data deposition: Whole-genome sequencing data are freely available from the Sequence Read Archive (SRA) database, www.ncbi.nlm.nih.gov/sra (Project ID SRP057179).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1520010113/-/DCSupplemental.
References
- 1. Hanahan D, Weinberg RA. Hallmarks of cancer: The next generation. Cell. 2011;144(5):646–674. doi: 10.1016/j.cell.2011.02.013.
- 2. Yates LR, Campbell PJ. Evolution of the cancer genome. Nat Rev Genet. 2012;13(11):795–806. doi: 10.1038/nrg3317.
- 3. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature. 2009;458(7239):719–724. doi: 10.1038/nature07943.
- 4. Baca SC, et al. Punctuated evolution of prostate cancer genomes. Cell. 2013;153(3):666–677. doi: 10.1016/j.cell.2013.03.021.
- 5. Stephens PJ, et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell. 2011;144(1):27–40. doi: 10.1016/j.cell.2010.11.055.
- 6. Zhang CZ, Leibowitz ML, Pellman D. Chromothripsis and beyond: Rapid genome evolution from complex chromosomal rearrangements. Genes Dev. 2013;27(23):2513–2530. doi: 10.1101/gad.229559.113.
- 7. Zhang F, Carvalho CM, Lupski JR. Complex human chromosomal and genomic rearrangements. Trends Genet. 2009;25(7):298–307. doi: 10.1016/j.tig.2009.05.005.
- 8. Gisselsson D, et al. Chromosomal breakage-fusion-bridge events cause genetic intratumor heterogeneity. Proc Natl Acad Sci USA. 2000;97(10):5357–5362. doi: 10.1073/pnas.090013497.
- 9. Inaki K, et al. Systems consequences of amplicon formation in human breast cancer. Genome Res. 2014;24(10):1559–1571. doi: 10.1101/gr.164871.113.
- 10. Stephens PJ, et al. Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature. 2009;462(7276):1005–1010. doi: 10.1038/nature08645.
- 11. Crasta K, et al. DNA breaks and chromosome pulverization from errors in mitosis. Nature. 2012;482(7383):53–58. doi: 10.1038/nature10802.
- 12. Maciejowski J, Li Y, Bosco N, Campbell PJ, de Lange T. Chromothripsis and Kataegis Induced by Telomere Crisis. Cell. 2015;163(7):1641–1654. doi: 10.1016/j.cell.2015.11.054.
- 13. Willis NA, Rass E, Scully R. Deciphering the Code of the Cancer Genome: Mechanisms of Chromosome Rearrangement. Trends Cancer. 2015;1(4):217–230. doi: 10.1016/j.trecan.2015.10.007.
- 14. Zhang CZ, et al. Chromothripsis from DNA damage in micronuclei. Nature. 2015;522(7555):179–184. doi: 10.1038/nature14493.
- 15. Ng CK, et al. The role of tandem duplicator phenotype in tumour evolution in high-grade serous ovarian cancer. J Pathol. 2012;226(5):703–712. doi: 10.1002/path.3980.
- 16. Hillmer AM, et al. Comprehensive long-span paired-end-tag mapping reveals characteristic patterns of structural variations in epithelial cancer genomes. Genome Res. 2011;21(5):665–675. doi: 10.1101/gr.113555.110.
- 17. Natrajan R, et al. A whole-genome massively parallel sequencing analysis of BRCA1 mutant oestrogen receptor-negative and -positive breast cancers. J Pathol. 2012;227(1):29–41. doi: 10.1002/path.4003.
- 18. Nik-Zainal S, et al.; Breast Cancer Working Group of the International Cancer Genome Consortium. Mutational processes molding the genomes of 21 breast cancers. Cell. 2012;149(5):979–993. doi: 10.1016/j.cell.2012.04.024.
- 19. McBride DJ, et al. Tandem duplication of chromosomal segments is common in ovarian and breast cancer genomes. J Pathol. 2012;227(4):446–455. doi: 10.1002/path.4042.
- 20. Imielinski M, et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell. 2012;150(6):1107–1120. doi: 10.1016/j.cell.2012.08.029.
- 21. Yang L, et al. Diverse mechanisms of somatic structural variations in human cancer genomes. Cell. 2013;153(4):919–929. doi: 10.1016/j.cell.2013.04.010.
- 22. Grzeda KR, et al. Functional chromatin features are associated with structural mutations in cancer. BMC Genomics. 2014;15:1013. doi: 10.1186/1471-2164-15-1013.
- 23. Kandoth C, et al.; Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature. 2013;497(7447):67–73. doi: 10.1038/nature12113.
- 24. Hastings PJ, Lupski JR, Rosenberg SM, Ira G. Mechanisms of change in gene copy number. Nat Rev Genet. 2009;10(8):551–564. doi: 10.1038/nrg2593.
- 25. De S, Michor F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. Nat Biotechnol. 2011;29(12):1103–1108. doi: 10.1038/nbt.2030.
- 26. Sima J, Gilbert DM. Complex correlations: Replication timing and mutational landscapes during cancer and genome evolution. Curr Opin Genet Dev. 2014;25:93–100. doi: 10.1016/j.gde.2013.11.022.
- 27. Chen CL, et al. Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res. 2010;20(4):447–457. doi: 10.1101/gr.098947.109.
- 28. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490(7418):61–70. doi: 10.1038/nature11412.
- 29. Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474(7353):609–615. doi: 10.1038/nature10166.
- 30. Caillat C, Perrakis A. Cdt1 and geminin in DNA replication initiation. Subcell Biochem. 2012;62:71–87. doi: 10.1007/978-94-007-4572-8_5.
- 31. Powell SK, et al. Dynamic loading and redistribution of the Mcm2-7 helicase complex through the cell cycle. EMBO J. 2015;34(4):531–543. doi: 10.15252/embj.201488307.
- 32. Yang W, et al. Genomics of Drug Sensitivity in Cancer (GDSC): A resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013;41(Database issue):D955–D961. doi: 10.1093/nar/gks1111.
- 33. Sikov WM, et al. Impact of the addition of carboplatin and/or bevacizumab to neoadjuvant once-per-week paclitaxel followed by dose-dense doxorubicin and cyclophosphamide on pathologic complete response rates in stage II to III triple-negative breast cancer: CALGB 40603 (Alliance). J Clin Oncol. 2015;33(1):13–21. doi: 10.1200/JCO.2014.57.0572.
- 34. von Minckwitz G, Martin M. Neoadjuvant treatments for triple-negative breast cancer (TNBC). Ann Oncol. 2012;23(Suppl 6):vi35–vi39. doi: 10.1093/annonc/mds193.
- 35. von Minckwitz G, et al. Neoadjuvant carboplatin in patients with triple-negative and HER2-positive early breast cancer (GeparSixto; GBG 66): A randomised phase 2 trial. Lancet Oncol. 2014;15(7):747–756. doi: 10.1016/S1470-2045(14)70160-3.
- 36. Davis SL, Eckhardt SG, Tentler JJ, Diamond JR. Triple-negative breast cancer: Bridging the gap from cancer genomics to predictive biomarkers. Ther Adv Med Oncol. 2014;6(3):88–100. doi: 10.1177/1758834013519843.
- 37. Stefansson OA, Villanueva A, Vidal A, Martí L, Esteller M. BRCA1 epigenetic inactivation predicts sensitivity to platinum-based chemotherapy in breast and ovarian cancer. Epigenetics. 2012;7(11):1225–1229. doi: 10.4161/epi.22561.
- 38. Silver DP, et al. Efficacy of neoadjuvant Cisplatin in triple-negative breast cancer. J Clin Oncol. 2010;28(7):1145–1153. doi: 10.1200/JCO.2009.22.4725.
- 39. Fong PC, et al. Inhibition of poly(ADP-ribose) polymerase in tumors from BRCA mutation carriers. N Engl J Med. 2009;361(2):123–134. doi: 10.1056/NEJMoa0900212.
- 40. Farmer H, et al. Targeting the DNA repair defect in BRCA mutant cells as a therapeutic strategy. Nature. 2005;434(7035):917–921. doi: 10.1038/nature03445.
- 41. Liu P, et al. Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements. Cell. 2011;146(6):889–903. doi: 10.1016/j.cell.2011.07.042.
- 42. Korbel JO, Campbell PJ. Criteria for inference of chromothripsis in cancer genomes. Cell. 2013;152(6):1226–1236. doi: 10.1016/j.cell.2013.02.023.
- 43. Green BM, Finn KJ, Li JJ. Loss of DNA replication control is a potent inducer of gene amplification. Science. 2010;329(5994):943–946. doi: 10.1126/science.1190966.
- 44. Finn KJ, Li JJ. Single-stranded annealing induced by re-initiation of replication origins provides a novel and efficient mechanism for generating copy number expansion via non-allelic homologous recombination. PLoS Genet. 2013;9(1):e1003192. doi: 10.1371/journal.pgen.1003192.
- 45. Koszul R, Caburet S, Dujon B, Fischer G. Eucaryotic genome evolution through the spontaneous duplication of large chromosomal segments. EMBO J. 2004;23(1):234–243. doi: 10.1038/sj.emboj.7600024.
- 46. Payen C, Koszul R, Dujon B, Fischer G. Segmental duplications arise from Pol32-dependent repair of broken forks through two alternative replication-based mechanisms. PLoS Genet. 2008;4(9):e1000175. doi: 10.1371/journal.pgen.1000175.
- 47. Slack A, Thornton PC, Magner DB, Rosenberg SM, Hastings PJ. On the mechanism of gene amplification induced under stress in Escherichia coli. PLoS Genet. 2006;2(4):e48. doi: 10.1371/journal.pgen.0020048.
- 48. Costantino L, et al. Break-induced replication repair of damaged forks induces genomic duplications in human cells. Science. 2014;343(6166):88–91. doi: 10.1126/science.1243211.
- 49. Medvedev P, Stanciu M, Brudno M. Computational methods for discovering structural variation with next-generation sequencing. Nat Methods. 2009;6(11) Suppl:S13–S20. doi: 10.1038/nmeth.1374.
- 50. Benaglia T, Chauveau D, Hunter DR, Young DS. mixtools: An R Package for Analyzing Finite Mixture Models. J Stat Softw. 2009;32(6):1–29.
- 51. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:e3. doi: 10.2202/1544-6115.1027.
- 52. Alexa A, Rahnenführer J, Lengauer T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006;22(13):1600–1607. doi: 10.1093/bioinformatics/btl140.
- 53. Frommolt P, Thomas RK. Standardized high-throughput evaluation of cell-based compound screens. BMC Bioinformatics. 2008;9:475. doi: 10.1186/1471-2105-9-475.
- 54. Lindberg MR, Hall IM, Quinlan AR. Population-based structural variation discovery with Hydra-Multi. Bioinformatics. 2015;31(8):1286–1289. doi: 10.1093/bioinformatics/btu771.
- 55. Malhotra A, et al. Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms. Genome Res. 2013;23(5):762–776. doi: 10.1101/gr.143677.112.
- 56. Futreal PA, et al. A census of human cancer genes. Nat Rev Cancer. 2004;4(3):177–183. doi: 10.1038/nrc1299.
- 57. Forbes SA, et al. The Catalogue of Somatic Mutations in Cancer (COSMIC). Curr Protoc Hum Genet. 2008;Chapter 10:Unit 10.11.
- 58. Higgins ME, Claremont M, Major JE, Sander C, Lash AE. CancerGenes: A gene selection resource for cancer genome projects. Nucleic Acids Res. 2007;35(Database issue):D721–D726. doi: 10.1093/nar/gkl811.
- 59. Zhao M, Sun J, Zhao Z. TSGene: A web resource for tumor suppressor genes. Nucleic Acids Res. 2013;41(Database issue):D970–D976. doi: 10.1093/nar/gks937.
- 60. Santarius T, Shipley J, Brewer D, Stratton MR, Cooper CS. A census of amplified and overexpressed human cancer genes. Nat Rev Cancer. 2010;10(1):59–64. doi: 10.1038/nrc2771.
- 61. Solimini NL, et al. Recurrent hemizygous deletions in cancers may optimize proliferative potential. Science. 2012;337(6090):104–109. doi: 10.1126/science.1219580.
- 62. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74. doi: 10.1038/nature11247.
- 63. Bernstein BE, et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol. 2010;28(10):1045–1048. doi: 10.1038/nbt1010-1045.
- 64. Feng J, Liu T, Qin B, Zhang Y, Liu XS. Identifying ChIP-seq enrichment using MACS. Nat Protoc. 2012;7(9):1728–1740. doi: 10.1038/nprot.2012.101.
- 65. Zhang Y, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137. doi: 10.1186/gb-2008-9-9-r137.
- 66. Shultz LD, et al. Human cancer growth and therapy in immunodeficient mouse models. Cold Spring Harb Protoc. 2014;2014(7):694–708. doi: 10.1101/pdb.top073585.
- 67. Conway T, et al. Xenome--a tool for classifying reads from xenograft samples. Bioinformatics. 2012;28(12):i172–i178. doi: 10.1093/bioinformatics/bts236.
- 68. Rausch T, et al. DELLY: Structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28(18):i333–i339. doi: 10.1093/bioinformatics/bts378.
- 69. Li B, Dewey CN. RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323. doi: 10.1186/1471-2105-12-323.
- 70. Eisenhauer EA, et al. New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1). Eur J Cancer. 2009;45(2):228–247. doi: 10.1016/j.ejca.2008.10.026.
- 71. Cerami E, et al. The cBio cancer genomics portal: An open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2(5):401–404. doi: 10.1158/2159-8290.CD-12-0095.
- 72. Gao J, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6(269):pl1. doi: 10.1126/scisignal.2004088.