Abstract
Cancer cells defective in homologous recombination (HR) are responsive to DNA crosslinking chemotherapies, PARP inhibitors, and inhibitors of polymerase theta, a key mediator of the backup pathway alternative end-joining. Such cancers include those with pathogenic bi-allelic alterations in core HR genes and another cohort of cases that exhibit sensitivity to the same agents and harbor genomic hallmarks of HR deficiency (HRD). These HRD signatures include a single base substitution pattern, large rearrangements, characteristic tandem duplications, and small deletions. Here, we used what is now known about alternative end-joining (Alt-EJ) and its key factor, polymerase theta, to design and test novel signatures of theta-mediated end-joining (TMEJ) repair. We generated two novel signatures: one composed of small deletions with microhomology and another consisting of small, templated insertions. We find that templated insertions (TINS) consistent with TMEJ repair are highly specific to tumors with pathogenic bi-allelic mutations in BRCA2 and that high TINS genomic signature content in advanced ovarian cancers is associated with overall survival following treatment with platinum agents. Additionally, the combination of TINS with other HRD metrics significantly improves the association of platinum sensitivity with survival compared to current state-of-the-art signatures.
Keywords: homologous recombination deficiency, theta-mediated end-joining, ovarian cancer, templated insertions, genomic signatures
INTRODUCTION
Inherited germline mutations in BRCA1 and BRCA2 are associated with an elevated risk of breast, ovarian, pancreatic, and prostate cancer. During early carcinogenesis, the second allele often becomes defective due to loss of heterozygosity, leading to cancers with two defective copies of core components of the homologous recombination (HR) pathway. HR deficient cancers constitute a clinically important subgroup. They can be targeted with DNA damaging agents such as platinum salts (cisplatin and carboplatin), which cause interstrand DNA crosslinks repaired through the Fanconi anemia and HR pathways. For example, in a randomized phase III trial, patients with triple-negative breast cancer and germline BRCA1/2 mutations exhibited dramatically improved response rates to carboplatin relative to docetaxel (68% vs. 33%, p=0.03), unlike those with wildtype BRCA1/2 (1). Similarly, in prostate, pancreatic, and ovarian cancers, the response to platinum agents ranges from 65–95% for cases with BRCA1/2 mutations (2–5). PARP inhibitors are also used to target HRD cancers, exploiting the synthetic lethality between PARP inhibition and BRCA1/2 mutations (6).
In advanced epithelial ovarian cancers, the standard of care currently consists of optimal cytoreductive surgery followed by platinum-based chemotherapy (7). However, nearly all stage III and all stage IV cancers recur, and thus overall survival following initial surgery and platinum-based chemotherapy is considered highly determined by inherent platinum sensitivity (8,9). Additionally, extensive prospective data indicate that bi-allelic pathogenic BRCA1/2 mutated cancers respond substantially better to platinum, leading to improved progression-free (10) and overall survival (11–16) relative to BRCA1/2 wild-type cases.
Ovarian cancers without known alterations in HR genes can also exhibit platinum and PARPi sensitivity and the hallmark genomic signatures associated with BRCA1/2 alterations (6,17). As carcinogenesis proceeds over many cell divisions, genetic insults typically repaired through HR are instead shunted to backup repair pathways such as alternative end-joining and non-homologous end-joining, leaving behind characteristic genomic DNA repair scars. In 2012, three similar signatures were reported: loss of heterozygosity (LOH) (18), large-scale state transitions (LST) (19), and telomeric allelic imbalance (TAI) (20), each characterized by large megabase pair (Mbp) intra- and interchromosomal rearrangements. These three tests were combined into one genomic readout known commercially as the Myriad myChoice CDx HRD score (21), which is now FDA approved as a companion diagnostic test to select ovarian cancer patients eligible for two PARP inhibitors, Olaparib (22) and Niraparib (23). Another FDA-approved diagnostic test, FoundationOne CDx, uses LOH and BRCA status to determine patients eligible for treatment with Olaparib or Rucaparib (24).
Other HRD signatures subsequently discovered include a base substitution pattern (SBS3) characterized by an even distribution of substitutions without a contextual bias (25), small deletions with microhomology flanking the break site (26,27), and small and large tandem duplications and deletions (RefSig R3/R5) (28,29). A composite score of SBS3, HRD, indels with microhomology, RefSig R3, and RefSig R5, known as HRDetect, is highly predictive of cases with BRCA1/2 mutations in breast and ovarian cancer genomes (17,29,30). In addition, the small deletion signature was further refined into the ID6 signature, consisting of small deletions of ≥ 5 bp with small stretches of microhomology in their flanking sequences, and the ID8 signature, which is primarily associated with germline BRCA1 mutated cases and exhibits deletions of similar size but without microhomology (31).
The ID6 signature is consistent with repair via alternative end-joining and its predominant mediator polymerase theta (Pol θ, gene name POLQ). Repair through Pol θ is also termed theta-mediated end-joining (TMEJ), a pathway highly used in the absence of functional NHEJ or HR (32). Loss of POLQ is synthetically lethal with BRCA1 and BRCA2 loss, and Pol θ inhibitors are preferentially active in BRCA1 and BRCA2 deleted cell lines (32,33). Since the discovery of these HRD signatures, more has been learned about the nature of TMEJ. The enzyme typically searches for microhomology within 15 base pairs on either side of the break and utilizes mainly 3 or more base pairs of microhomology (34). Other pathways, including canonical non-homologous end-joining, can utilize up to 2 bp of MH, which means the features of NHEJ and TMEJ scars can overlap (32,34). Finally, Pol θ is also known to mediate small insertions representing an initial insufficient microhomology match followed by aborted synthesis, reannealing, and repair (34). The resected 3’ end can also snap back and anneal to itself, followed by polymerization, dissolution, and reannealing across the break, leaving behind an inverted templated insertion. These events are known as templated insertions and are associated with germline BRCA1/2 mutated breast cancer genomes (34).
In the current report, we sought to apply the preclinical, mechanistic model of TMEJ (and the predicted genomic products) to clinically annotated patient datasets to understand if a TMEJ signature could improve the association with platinum sensitivity in tumors harboring HRD.
MATERIALS AND METHODS
Driver calls:
Mutational signature analyses were conducted with data from The Pan-Cancer Analysis of Whole Genomes (PCAWG), a consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) (https://dcc.icgc.org/releases/PCAWG, accessed August 2020). PCAWG had 2,793 whole cancer genomes available for analysis. Of these 2,793 samples, 2,354 had cancer driver mutation calls and were publicly available (35). These include previously known driver gene single nucleotide variants, indels, structural variants and translocations, non-coding, and non-genic elements totaling 674 unique driver alterations. Only drivers with over ten samples were tested to ensure enough data for continued analysis, leaving 219 drivers and 2,275 samples available for study.
Known Mutational Signatures:
The 2,275 PCAWG genomes with driver calls were matched by sample ID with the single base substitution (SBS) and indel (ID) signature calls generated by Alexandrov et al. using SigProfiler (31). Proportions of SBS3, ID6, and ID8 signatures were calculated for each sample. To understand the portion of double-strand break repair events reflected in ID6 and ID8, we accounted for specific indel signatures known to reflect other DNA repair events. We excluded ID1 and ID2 mutations as they are caused by slippage events during DNA replication. Likewise, ID7 mutations caused by defective DNA mismatch repair and ID13 mutations resulting from DNA damage induced by UV light were omitted. ID11 and ID16 were found to be predominantly insertions and were also omitted.
Large-scale state transitions (LST), loss of heterozygosity (LOH), and telomeric allelic imbalance (TAI), all markers of HRD, were determined for all samples using modified versions of the calc.lst(), calc.loh(), and calc.ai() functions, respectively, from the Signature Tools Lib R package (29). HRD score was determined by computing the unweighted sum of LST, TAI, and LOH. HRDetect probabilities were calculated using the HRDetect pipeline from the Signature Tools Lib R package (29). HRDetect probabilities could not be computed for 42 samples that were missing BEDPE files; these samples were excluded from subsequent HRDetect analyses.
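The HRD score arithmetic described above can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of Signature Tools Lib. The default cutoff of 42 is the previously reported HRD score threshold used in this study's survival analysis.

```python
def hrd_score(lst, tai, loh):
    """Unweighted sum of the three genomic scar counts (LST + TAI + LOH)."""
    return lst + tai + loh


def is_hrd_by_score(score, threshold=42):
    """Call a sample HRD when its score meets the reported threshold (>= 42)."""
    return score >= threshold


# Illustrative values only, not taken from any PCAWG sample.
example_score = hrd_score(lst=20, tai=15, loh=10)
example_call = is_hrd_by_score(example_score)
```

Because the sum is unweighted, each of the three scar types contributes equally to the final call.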
Novel Mutational Signatures:
Indel profiles were generated consisting of insertion and deletion sizes, repeats, and microhomology lengths. VCF files of PCAWG samples were run through a combination of the indel caller within the HRDetect toolbox and previously developed tools to check the sequence context around indels and determine the presence of microhomology or repetitive regions (29,36). These profiles were used to define novel, TMEJ-specific signatures based on the known features of TMEJ repair (34). The TMEJ-specific deletion signatures consist of 1–30 bp deletions with ≥ 2 (TMEJ2), ≥ 3 (TMEJ3), or ≥ 4 (TMEJ4) bp of microhomology. The number of deletions meeting each signature definition was divided by the total number of indels in the sample to give a proportion comparable across samples.
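As an illustration of the TMEJ2 to TMEJ4 definitions, the following Python sketch classifies deletions by size and microhomology length and normalizes by total indel count. It is not the study's pipeline (which used the HRDetect indel caller and other tools). The function names are hypothetical, and two choices are assumptions: microhomology is taken as the longer of the matches at either junction, and deletions whose match spans the entire deleted sequence are treated as repeat-mediated and skipped.

```python
def microhomology_length(left_flank, deleted, right_flank):
    """Longest match between the deleted bases and the sequence across a junction."""
    # Leading deleted bases repeated at the start of the right flank.
    fwd = 0
    while (fwd < len(deleted) and fwd < len(right_flank)
           and deleted[fwd] == right_flank[fwd]):
        fwd += 1
    # Trailing deleted bases repeated at the end of the left flank.
    rev = 0
    while (rev < len(deleted) and rev < len(left_flank)
           and deleted[-1 - rev] == left_flank[-1 - rev]):
        rev += 1
    return max(fwd, rev)


def tmej_proportions(deletions, total_indels, min_mh_values=(2, 3, 4)):
    """Proportion of all indels that are 1-30 bp deletions with >= k bp of MH."""
    counts = {k: 0 for k in min_mh_values}
    for left, deleted, right in deletions:
        size = len(deleted)
        if not 1 <= size <= 30:
            continue
        mh = microhomology_length(left, deleted, right)
        if mh >= size:
            # Assumption: a full-length match indicates a repeat, not microhomology.
            continue
        for k in min_mh_values:
            if mh >= k:
                counts[k] += 1
    return {f"TMEJ{k}": (counts[k] / total_indels if total_indels else 0.0)
            for k in min_mh_values}


# Toy input: (left flank, deleted sequence, right flank) for each deletion.
toy_deletions = [
    ("GGGTTCA", "ACGTAGG", "ACGTCCC"),  # 4 bp of microhomology
    ("TTTTT", "CAGT", "CAAAA"),         # 2 bp of microhomology
    ("AAAA", "C", "GGGG"),              # no microhomology
]
toy_result = tmej_proportions(toy_deletions, total_indels=10)
```

Note that the signatures are nested: any deletion counted toward TMEJ4 is also counted toward TMEJ3 and TMEJ2.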
Templated insertions (TINS) were identified according to a protocol developed by Carvajal-Garcia et al., which scans for direct or inverted repeats within 50 bp on either side of insertions of length 5 or larger (34). To ensure no tandem repeats were included, insertions directly adjacent to their template, or a 0 bp distance between the insertion and templated sequence, were removed from further analysis. Human genome build hs37d5 was used as a reference genome (as used in PCAWG variant analyses) (35). Inverted and direct repeats were summed to create a total TINS count. The unique categories of inverted TINS (iTINS) and direct repeat TINS (drTINS) were tallied separately. Raw TINS, iTINS, and drTINS counts were normalized by the total indel count to create comparable frequency counts across samples. All signature values for each of the 2,275 samples can be found in Supplementary Table 1.
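The templated-insertion search described above can be sketched in Python as follows. This is an illustrative reading of the text, not the protocol of Carvajal-Garcia et al. The function names are hypothetical, and the sketch assumes that the entire inserted sequence must match its template, that the template lies wholly within the 50 bp window, and that an insertion is discarded when its nearest template is directly adjacent (0 bp away).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def nearest_template(flank, query, side):
    """Smallest distance (bp) from the insertion site to a copy of query, or None."""
    best = None
    start = flank.find(query)
    while start != -1:
        if side == "left":
            distance = len(flank) - (start + len(query))
        else:
            distance = start
        if best is None or distance < best:
            best = distance
        start = flank.find(query, start + 1)
    return best


def classify_insertion(left_flank, inserted, right_flank, window=50, min_len=5):
    """Return 'drTINS', 'iTINS', or None (too short, untemplated, or tandem)."""
    if len(inserted) < min_len:
        return None
    left = left_flank[-window:]
    right = right_flank[:window]
    candidates = (("drTINS", inserted), ("iTINS", reverse_complement(inserted)))
    for label, query in candidates:
        distances = [d for d in (nearest_template(left, query, "left"),
                                 nearest_template(right, query, "right"))
                     if d is not None]
        if distances:
            # Assumption: a directly adjacent template (0 bp) is removed as tandem.
            return None if min(distances) == 0 else label
    return None
```

Per-sample TINS, iTINS, and drTINS frequencies would then be the counts of each label divided by the sample's total indel count.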
Statistical Analysis:
Univariate Mann-Whitney U-tests were used to test each of 219 cancer driver mutations for enrichment in each of the previously defined mutational signatures by comparing mutated samples to wild-type samples for every driver mutation (Supp. Table 2). Significant driver hits (false discovery rate ≤ 0.05) were passed into the multivariate analyses. Multivariate analyses were performed using linear regression on each mutational signature with univariate significant driver hits and cancer type as covariates (Supp. Table 3). All analyses were performed in R (version 4.0.3). Statistical tests were considered significant at P ≤ 0.05 (or FDR ≤ 0.05). Asterisks denote significance as follows: * P ≤ 0.05, ** P ≤ 0.01, *** P ≤ 0.001, **** P ≤ 0.0001.
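The false discovery rate filter applied to the univariate results can be illustrated with a short Python sketch. The text does not name the correction method. The Benjamini-Hochberg procedure is assumed here because it is a common default, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values for four hypothetical drivers tested against one signature.
raw_p = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(fdr) if q <= 0.05]
```

In this toy example only the first driver passes the FDR ≤ 0.05 filter, even though three of the raw p-values are below 0.05.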
Survival Analysis:
Survival time is defined as the time interval from diagnosis to death or last follow-up. All survival analyses were performed on either PCAWG stage III and IV, platinum-treated ovarian cancer cases (106 samples) or TCGA stage III and IV, platinum-treated ovarian cancer cases (407 samples). Kaplan-Meier curves were generated and compared using log-rank tests for PCAWG stage III and IV, platinum-treated ovarian cancer cases for HRD and HR competent groups depending on mutational signatures. Univariate Cox proportional hazards regression models were fitted with each cancer driver as the predictor for PCAWG stage III and IV, platinum-treated ovarian cancer cases and TCGA stage III and IV, platinum-treated ovarian cancer cases. In addition, the relationship between mutational signatures and survival was examined by fitting a Cox proportional hazards model.
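For readers unfamiliar with the Kaplan-Meier estimator, the following self-contained Python sketch shows how a survival curve and median survival are derived from follow-up times and event indicators. It is a teaching example with hypothetical function names and toy data. The analyses in this study were performed in R.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    events: 1 for death, 0 for censoring at last follow-up.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy cohort: follow-up in months, with two censored patients.
toy_curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
toy_median = median_survival(toy_curve)
```

Censored patients leave the risk set without reducing the survival estimate, which is why the curve only steps down at death times.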
Mutational signatures were dichotomized to indicate HRD cases based on each signature. When available, known signature thresholds were used to create distinct cutoffs. HRDetect probability of ≥ 0.7 and HRD score ≥ 42 have been previously reported as acceptable HRD thresholds. There are no known thresholds for SBS3, ID6, or ID8, so any signature presence was considered an HRD threshold. For the novel signatures, the median value of the signature among PCAWG stage III and IV, platinum-treated ovarian cancer cases was used as the HRD cutoff. For the TMEJ signatures, these values were 0.03 for TMEJ2, 0.01 for TMEJ3, and 0.003 for TMEJ4. For the templated insertion signatures, they were 0.007 for TINS and 0.003 for both iTINS and drTINS.
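The three thresholding rules above can be summarized in a brief Python sketch. The dictionary keys and function name are hypothetical labels. Whether a value exactly equal to the median is called HRD is not stated in the text, so the strict comparison used here is an assumption.

```python
from statistics import median

# Previously reported cutoffs (HRD when value >= cutoff).
KNOWN_THRESHOLDS = {"HRDetect": 0.7, "HRD_score": 42}
# Signatures without an established cutoff: any presence counts as HRD.
PRESENCE_SIGNATURES = {"SBS3", "ID6", "ID8"}


def call_hrd(signature, values):
    """Dichotomize one signature across a cohort into HRD (True) or not (False)."""
    if signature in KNOWN_THRESHOLDS:
        cutoff = KNOWN_THRESHOLDS[signature]
        return [v >= cutoff for v in values]
    if signature in PRESENCE_SIGNATURES:
        return [v > 0 for v in values]
    # Novel signatures: cohort median as cutoff (strictly above is an assumption).
    cutoff = median(values)
    return [v > cutoff for v in values]


# Illustrative values only.
hrdetect_calls = call_hrd("HRDetect", [0.69, 0.7, 0.95])
id6_calls = call_hrd("ID6", [0, 0.02])
tins_calls = call_hrd("TINS", [0.001, 0.007, 0.02])
```

In the study the median is computed on the PCAWG stage III and IV, platinum-treated ovarian cohort, so `values` here should be restricted to that cohort when applying the median rule.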
Data Availability:
All data used in this study are available from PCAWG (accessed Sept. 2020). Controlled PCAWG data can be obtained after applying for access through ICGC DACO and dbGaP (https://docs.icgc.org/pcawg/data/).
Code Availability:
R code available at GitHub repository (https://github.com/HigginsonLab/InvertedTemplatedInsertions).
RESULTS
Known signatures do not demonstrate any novel associations with PCAWG driver mutations
Homologous recombination deficiency (HRD) in the face of genomic insults creates various genomic scars reflective of the DNA repair pathway used. Using whole genomes from The Pan-Cancer Analysis of Whole Genomes (PCAWG) project, we analyzed known HRD signatures, including base substitutions, large rearrangements, structural variants, small indels, and composite HRD scores (Fig. 1A). We next examined the relationship between these signatures and driver mutations previously called for each case in PCAWG, including single-gene mutations, copy number alterations, long non-coding RNAs, and other non-coding driver events (Fig. 1B, 1C, Supp. Fig. 1) (35). The dataset contains 219 testable drivers (present in at least 10 genomes) called in 2,275 cases.
Figure 1.
No consistent associations between genes and signatures other than BRCA1/2 indicating lack of signature specificity. A. Diagrams of current, known mutational signatures. B. Pipeline of data analysis from PCAWG WGS data to univariate and multivariate analysis of signatures and ovarian cancer survival data. C. LEFT: Heatmap of driver genes with significant associations to two or more signatures from the multivariate regression model. Only showing drivers with available survival data from either PCAWG or TCGA ovarian cancer data. Significance of the association is shown by the size of the dot. Color indicates the contribution to the model. Red is a positive coefficient and blue is negative. RIGHT: Cox regression hazard plots of stage III and IV, platinum-treated ovarian cancer cases from PCAWG and TCGA. D. Kaplan-Meier curves for stage III and IV, platinum-treated ovarian cancer cases from PCAWG for HRDetect, ID6, and ID8, shown with and without pathogenic, biallelic BRCA1/2 and RAD51B mutations. Groups are divided by known thresholds. Signature thresholds are defined as 0.7 or greater for HRDetect and greater than 0 for ID6 and ID8. P-values from log-rank test.
Using univariate Mann-Whitney U-tests, we determined the relationship between the presence of a driver and each HRD signature (Fig. 1B). Significant drivers in univariate analysis (FDR ≤ 0.05) for each signature were then entered into a multivariate linear regression model to account for the co-occurrence of driver genes. Cancer type was also included in the multivariate analysis, as in previous analyses, to account for inherently different baselines of genomic alteration. As expected, pathogenic, biallelic BRCA1 and BRCA2 driver mutations had the most significant associations with all HRD signatures (Fig. 1C, Supp. Fig. 1A). RB1 loss was also associated with four of the five tested signatures, albeit more weakly. The remaining tested drivers were inconsistently associated with each signature, suggesting non-specificity.
We then evaluated the HRD detection performance of driver mutations using patient survival data from stage III/IV platinum-treated ovarian cancers as a clinical surrogate of HR deficiency. Survival data matched to whole genomes in the PCAWG (n = 106) and exomes/whole genomes in the TCGA (n = 407) are shown. Of those testable drivers (significant in at least two multivariate analyses), only pathogenic, biallelic driver mutations in BRCA1 and BRCA2 had consistent, significant associations with the signatures and patient survival (Fig. 1C). While RB1 appeared to have a strong association with four of the five signatures, there was no significant difference in survival between patients with and without RB1 driver mutations (Fig. 1C, Supp. Fig. 2A). Similarly, there was not a significant association with survival among patients with BRD4 amplification (Fig. 1C) though there is a strong negative association with HRDetect. Therefore, it is not likely that other clearly identifiable drivers in non-HR genes would provide prognostic value for HR capacity and platinum sensitivity in this dataset. Other altered core HR genes beyond BRCA1/2 mutations, including PALB2 (8 cases), RAD51B (9 cases), RAD51C (3 cases), and RAD51D (2 cases) were rare amongst the 2,275 genomes.
We then determined the prognostic value of the signatures themselves, both in all cases and in cases without biallelic mutations in known HR-related genes using survival analyses per Kaplan-Meier survival curves and log-rank tests. In this ovarian cancer dataset, there were 3 monoallelic PALB2 mutations but, because it is generally believed that only biallelic mutations produce HR deficiency, these were considered to be in the BRCA1/2 wild-type cohort (37,38). There is also 1 biallelic RAD51B mutated genome and RAD51B mutations were recently associated with an increased HRD cancer predilection (39). We removed this case along with all of the biallelic BRCA1/2 mutated cases for our BRCA1/2 wild-type cohort. For HRD score and HRDetect analyses, we grouped patients into HR competent or deficient by using established thresholds (21,29). Significant differences in survival were observed between the two groups defined by HRDetect but not HRD score (Figure 1D and Supplemental Figure 1B). The SBS3, ID6, and ID8 signatures do not have established thresholds for determining HR competency vs. deficiency. Thus, we used signature presence or absence as a binary factor. In addition, we assessed a threshold set at the median SBS3, ID6, or ID8 contribution across all cases. ID6 and ID8 separated high and low-risk cohorts, including all cases, but only ID6 remained marginally significant in BRCA1/2 wild-type cases (Figure 1D and Supplementary Figure 1C).
Novel TMEJ deletion signature performs similarly to ID6
Given the limited utility of existing non-core HR gene driver mutations to predict HRD, we sought to develop a novel signature based on known markers caused by theta-mediated end-joining (TMEJ). TMEJ is characterized by the use of polymerase theta (Pol θ) to repair double-strand breaks, using small stretches of microhomology to align and join the broken ends, often resulting in small deletions (Fig. 2A) (34). After a double-strand break and resection of the 5’ ends, Pol θ utilizes microhomology (MH) preferentially within 15 bp on either side of the break to align and anneal the two strands (34). An analysis of deletion sizes showed that deletions in the 5–30 bp range were increased in pathogenic, biallelically mutated BRCA1/2 samples compared to wild-type PCAWG samples (Fig. 2B). Similarly, indels with 1–4 bp of microhomology were much more common among BRCA1 and BRCA2 mutated samples than wild-type (Fig. 2C). This is in line with prior studies, which reported that Pol θ preferentially uses 2–6 bp of microhomology (34). The ID6 signature represents small deletions of ≥ 5 bp with predominantly >2 bp of MH. In recognition of the typical size of TMEJ scars seen preclinically and the possibility of overlap of NHEJ and TMEJ scars with short MH stretches, we tested three TMEJ signatures, defined as deletions of 1–30 bp with minimum microhomology lengths of 2, 3, or 4 bp (TMEJ2–TMEJ4).
Figure 2.
Refining HRD signatures to match what is known of TMEJ. A. Diagram of TMEJ specific signatures developed. B. Comparison of pathogenic, biallelic mutated BRCA1, BRCA2, and WT deletion sizes as an average proportion of all indels in PCAWG data. C. Comparison of pathogenic, biallelic mutated BRCA1, BRCA2, and WT MH lengths as an average proportion of all indels in PCAWG data. D. Heatmap of driver gene significantly associated with a TMEJ signature. Significance of the association is shown by the size of the dot. Color indicates the contribution to the model. Red is a positive coefficient and blue is negative. E. Cox regression hazard plots for PCAWG stage III and IV, platinum-treated ovarian cancer samples for each of the TMEJ signatures, shown with and without pathogenic, biallelic BRCA1/2 and RAD51B mutations. Groups are divided by known and computed thresholds. Signature thresholds were defined as greater than 0 for ID6 and ID8 and as the median value of TMEJ signatures in PCAWG stage III and IV, platinum-treated ovarian cancer cases. F. Kaplan-Meier curves for TMEJ4 in PCAWG stage III and IV, platinum-treated ovarian cancer data, shown with and without pathogenic, biallelic BRCA1/2 and RAD51B mutations. Groups are divided by computed thresholds. Signature thresholds were defined as the median value of TMEJ signatures in PCAWG stage III and IV, platinum-treated ovarian cancer cases. P-values from log-rank test. G. TMEJ4 proportions in pathogenic, biallelic mutated BRCA1, BRCA2, and WT PCAWG breast, prostate, ovarian, and pancreas samples. P-values from Mann-Whitney U test.
We performed the same univariate and multivariate analysis approach with the 219 unique driver mutations and the novel TMEJ signatures. BRCA1 and BRCA2 mutations had the most significant associations with the novel signatures, with BRCA1 becoming gradually less significant as the microhomology size increased while BRCA2 remained consistently significantly associated (Fig. 2D). Using the median values of each signature as a threshold, we performed univariate Cox regression analysis and generated Kaplan-Meier survival curves with log-rank test comparisons in the PCAWG stage III and IV platinum-treated ovarian cancer cohort. We found that while TMEJ2 and TMEJ3 are not associated with improved survival, TMEJ4 performs similarly to ID6 in separating high and low-risk groups but does not reach statistical significance (Fig. 2E, 2F, Supp. Fig. 3A). We also considered 0 as a threshold to generate binary groups, any count of these events versus no counts, as performed in Fig. 1 for SBS3, ID6, and ID8. However, we noted that due to the high frequency of TMEJ2–3 events, few genomes were without at least one such deletion (Supp. Fig. 3B, 3C). In addition to being associated with improved survival, TMEJ4 was increased in pathogenic, biallelic mutated BRCA1 and BRCA2 compared to wild-type in breast, prostate, ovarian, and pancreas cancer samples (Fig. 2G). Thus, we conclude that TMEJ4 performs similarly to ID6, though the TMEJ4 criteria are closer to what is known pre-clinically about TMEJ repair.
Templated insertions (TINS) associate with BRCA2 mutations and overall survival in advanced ovarian cancers treated with platinum agents
Another unique characteristic of TMEJ is small, templated insertions (TINS) (34). These insertions fall into two categories: direct-repeat templated insertions (drTINS) and inverted templated insertions (iTINS) (Fig. 3A, 3B) (34). When there is insufficient microhomology between DNA strands on opposite sides of the break, 3’ overhangs can fold back on themselves and find microhomology to begin fill-in synthesis (Fig. 3A), followed by dissolution, re-annealing across the break, and repair. iTINS are thus insertions of ≥ 5 bp that are reverse complements of the neighboring sequence within 50 bp on either side of the insertion site. drTINS are insertions of ≥ 5 bp that are direct repeats of the adjacent sequence within 50 bp on either side of the insertion site. These events result from the initial microhomology annealing between the two DNA strands slipping and reannealing after fill-in synthesis has begun (Fig. 3B). We used the 5 bp cutoff to decrease the probability of finding these insertion events by chance and removed insertions at tandem repeats from the analysis, as shown previously (34). The templates for the insertions were predominantly located directly adjacent to the insertions themselves (Fig. 3C). Most TINS are 5–6 bp in length, but larger drTINS are also present (Fig. 3D).
Figure 3.
Templated insertions all predict survival as long as separated from replication slippage at tandem repeats. A. Diagram of inverted templated insertions (iTINS) foldback insertion mechanism. B. Diagram of direct repeat insertions (drTINS) direct insertion mechanism. C. Location of templated sequence relative to insertion site. D. Distribution of TINS size. E. Heatmap of driver genes significantly associated with at least two TINS signatures. Significance of the association is shown by the size of the dot. Color indicates the contribution to the model. Red is a positive coefficient and blue is negative. F. TINS, iTINS, and drTINS proportions in pathogenic, biallelic mutated BRCA1, BRCA2, and WT PCAWG breast, prostate, ovarian, and pancreas samples. P-values from Mann-Whitney U test. G. Cox regression hazard plots for all PCAWG stage III and IV ovarian cancer samples, shown with and without pathogenic, biallelic BRCA1/2 and RAD51B mutations. Groups are divided by computed thresholds. Signature thresholds are defined as the median value for TINS signatures in PCAWG stage III and IV, platinum-treated ovarian cancer cases. H. Kaplan-Meier curves for TINS, iTINS, and drTINS signatures in PCAWG stage III and IV, platinum-treated ovarian cancer data, shown with and without pathogenic, biallelic BRCA1/2 and RAD51B mutations. Groups are divided by computed thresholds. Signature thresholds are defined as the median value for TINS signatures in PCAWG stage III and IV, platinum-treated ovarian cancer cases. P-values from log-rank test.
We performed the same univariate and multivariate analysis approach on the three TINS signatures: iTINS, drTINS, and the combination of the two (total TINS). Both total TINS and drTINS had possibly non-specific associations with various driver mutations (Fig. 3E, Supp. Fig. 4A). All three signatures, TINS, iTINS, and drTINS, were significantly associated with BRCA2 driver mutations in a pan-cancer context and were increased in pathogenic, biallelic mutated BRCA2 compared to BRCA1 and wild type in breast, prostate, ovarian, and pancreas cancers (Fig. 3E, 3F).
We again defined a threshold between HRD and HR competent cases by the median values of each signature for the PCAWG stage III and IV platinum-treated ovarian cancer cohort. HRD cases defined by all three TINS signatures were significantly associated with improved survival in the complete PCAWG stage III and IV platinum-treated ovarian cancer cohort, even after removing BRCA1 and BRCA2 mutated cases (Fig. 3G). Log-rank tests showed that TINS, iTINS, and drTINS are prognostic signatures for survival in ovarian cancer (Fig. 3H). We noted that when using a threshold of 0, i.e., any signature measurement to define HRD cases, all three signatures remained significantly associated with survival regardless of BRCA status (Supp. Fig. 4B, 4C).
Combining TINS signature score with HRDetect improves classification of prognostic groups
The known and novel signatures discussed can be and are used to predict patient survival in platinum-treated advanced ovarian cancer. HRD score closely resembles the Myriad MyChoice test and LOH is used by the FoundationOne test, both of which are FDA-approved (21,40). While no other signatures are presently available in a clinical setting, HRDetect, ID6, TMEJ4, and all three TINS signatures are significantly associated with survival in this patient cohort. Comparing the hazard ratios of these signatures, using TINS as the total measurement of templated insertions, revealed that only TINS, HRDetect, and ID6 are significantly associated with survival in a BRCA1/2 wild-type context (Fig. 4A). TMEJ4 performs very similarly to ID6 but does not improve the signature’s ability to identify patients with HRD. Including these three signatures in a multivariate Cox regression model, TINS and HRDetect remain significantly associated with survival (Fig. 4B). In addition, TINS identified some samples as HRD which HRDetect did not and vice versa, supporting the integration of both metrics (Fig. 4C).
Figure 4.
TINS is comparable to HRDetect at identifying HRD cases. A. Cox regression hazard plots of known and novel signatures for PCAWG stage III and IV, platinum-treated ovarian cancer cases without pathogenic, biallelic BRCA1/2 and RAD51B mutations. Groups are divided by known and computed thresholds. Signature thresholds are defined as 0.7 or greater for HRDetect, greater than 0 for ID6, greater than 42 for HRD, and as the median value for TMEJ4 and TINS signatures in PCAWG stage III and IV, platinum-treated ovarian cancer cases. B. Multivariate Cox regression results for signatures with p-value ≤ 0.05 in survival models. C. Oncoprint of signatures for PCAWG stage III and IV, platinum-treated ovarian cancer cases. Groups are divided by known and computed thresholds. Signature thresholds are defined as 0.7 or greater for HRDetect and as the median value for TINS signatures in PCAWG stage III and IV, platinum-treated ovarian cancer cases. D. Kaplan-Meier curves of Cox regression significant signatures for PCAWG stage III and IV, platinum-treated ovarian cancer cases without pathogenic, biallelic BRCA1/2 and RAD51B mutations. Groups are divided by known and computed thresholds. Signature thresholds are defined as 0.7 or greater for HRDetect and as the median value for TINS signatures in PCAWG stage III and IV, platinum-treated ovarian cancer cases. P-values from log-rank test.
While cases with high TINS or high HRDetect are associated with improved survival individually, cases with both signatures exhibited dramatically different outcomes than those without either one, regardless of BRCA1/2 status (Fig. 4D). Thus, when cases without known BRCA mutations are grouped into those with both high TINS and high HRDetect, median survival improves to 49.7 months compared to 20.5 months in cases without either one (p=0.0016).
DISCUSSION
One HRD genomic signature has been FDA approved as a companion diagnostic test to select ovarian cancer patients for Olaparib and Niraparib, but there remains room for improvement (41). While the Myriad myChoice HRD test is currently used to predict PARP inhibitor sensitivity, there is a large overlap between platinum and PARP inhibitor sensitivity as both depend upon inactive HR pathways. Whole-genome sequencing-based tests, such as HRDetect and CHORD, are able to identify HR deficient breast (27,30) and ovarian cancers (17,27) such as those with known germline and somatic BRCA1/2 mutations. This analysis supports the inclusion of TINS as an additional feature associated with responses to platinum agents.
A challenge in evaluating TMEJ scars is the degree of overlap between features suggestive of NHEJ and those suggestive of TMEJ. Both NHEJ and TMEJ can utilize short stretches of microhomology up to 2bp, leaving behind identical deletions (34,36,42). Here we have further evaluated potential TMEJ signatures using whole genomes and criteria defined preclinically, including deletions size and MH features and templated insertion signatures previously associated with BRCA1/2 mutated breast cancer genomes (34). We find templated insertions to be specifically associated with bi-allelic BRCA2 mutated genomes in a pan-cancer analysis and overall survival in advanced stage III/IV ovarian cancer cases treated with platinum-based chemotherapy.
Interestingly, the TINS signature is associated with BRCA2 but not BRCA1 mutated genomes in this analysis. The deletion signatures ID6 and ID8 also demonstrate differential associations with BRCA1 and BRCA2: ID8 (an NHEJ-like scar) is most associated with BRCA1 mutant genomes, whereas ID6 (an Alt-EJ-like scar) is more represented in BRCA2 than BRCA1 mutant genomes (31,36). BRCA1 is understood to promote end resection (43), a first step in repair common to both HR and TMEJ. As such, BRCA1 may promote TMEJ rather than suppress it as BRCA2 does, at least at two-ended DSBs (36,44). However, polymerase theta inhibitors are active in BRCA1 deficient cell lines (32,33) and combined loss of BRCA1 and polymerase theta is synthetically lethal (32), implying unresolved complexity. There was also an association between the TINS signature and an IDH1 missense driver mutation. IDH1 mutations have been shown to be associated with HRD and PARPi sensitivity, providing a possible explanation for this correlation (45).
One possible confounding factor for TMEJ-mediated templated insertions is microsatellite instability or other processes mediating templated insertions, such as microhomology-mediated break-induced replication (MMBIR) (46). However, to our knowledge, TMEJ is the only known process that can mediate short inverted templated insertions, and iTINS alone shows the same association with BRCA2 mutations pan-cancer and with overall survival in the ovarian cohort as the collective TINS signature. We have used the term inverted templated insertions instead of ‘foldback’ insertions to be consistent with prior work (34) and to avoid confusion with larger foldback inversions (median 2,329bp) due to breakage-fusion-bridge cycles, which are negatively associated with BRCA1/2 mutant genomes and ovarian cancer survival (47).
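To make the distinction between direct and inverted templated insertions concrete, the following Python sketch labels an inserted sequence by whether it copies nearby flanking sequence in the same orientation or as its reverse complement. It is an illustration only and not the calling procedure used in this study: the function names, the minimum insertion length of 5 bp, and the choice to search each flank separately are all assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_insertion(inserted, left_flank, right_flank, min_len=5):
    """Label an insertion as 'direct', 'inverted', or 'untemplated'.

    `min_len` is a hypothetical cutoff; very short insertions match flanking
    sequence by chance too often to be called templated.
    """
    if len(inserted) < min_len:
        return "untemplated"
    flanks = (left_flank, right_flank)
    # Direct template: the inserted bases appear verbatim in a flank.
    if any(inserted in flank for flank in flanks):
        return "direct"
    # Inverted template: the reverse complement appears in a flank.
    rc = reverse_complement(inserted)
    if any(rc in flank for flank in flanks):
        return "inverted"
    return "untemplated"


# Made-up flanking sequences for illustration
left = "GGATCCTTAGCAAGT"
right = "CCGTTAACGGATTCA"
print(classify_insertion("TTAGCA", left, right))  # direct
print(classify_insertion("TGCTAA", left, right))  # inverted
print(classify_insertion("ACACAC", left, right))  # untemplated
```

An insertion that matches in both orientations is labeled direct in this sketch; a real pipeline would also need to bound the flank window and account for sequencing and alignment artifacts.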
Up to 70% of cases in the PCAWG ovarian cancer cohort exhibited either above-threshold HRDetect or high TINS, a higher percentage of HR deficiency than the commonly cited 50% estimate (48). However, only 6% of advanced ovarian cancers are primary refractory to cisplatin (49), and thus it is possible that the inclusion of the TINS-high cases accounts for some of the difference. In some cases identified as HRD without loss of key HR genes such as BRCA1/2, promoter hypermethylation of BRCA1 or RAD51C may be responsible for the HRD phenotype, as BRCA1 promoter methylation has been shown to contribute up to 22% of HRD cases in ovarian cancer (30). Unfortunately, these data were not available in the PCAWG dataset.
One possible explanation is that the TINS-positive, HRDetect-negative cohort represents cases with intermediate levels of HR capacity. The existence of a continuum of HR capacity is supported by the ARIEL3 trial, a randomization of rucaparib versus placebo in patients with relapsed high-grade serous or endometrioid ovarian, primary peritoneal, or fallopian tube carcinoma who had achieved at least a partial response to platinum therapy (50). PARP inhibition was associated with improved progression-free survival most clearly in the germline or somatic mutated BRCA1/2 cohort (hazard ratio 0.23). However, less substantial associations were also seen in patients with high genomic LOH (hazard ratio 0.44) or even low genomic LOH (hazard ratio 0.58). Another possibility is that the TINS-positive, HRDetect-negative cohort exhibits improved overall survival without a clear HR deficiency, as we have demonstrated the prognostic but not predictive value of the TINS signature with regard to platinum sensitivity. Yet another possible explanation is that the HRDetect algorithm is tuned to stringently identify cases that are BRCA1/2 deficient and may miss intermediate levels of HR deficiency (29,30).
The use of a TINS genomic signature comes with certain limitations shared with other whole genome-based signatures in terms of the clinical practicality of obtaining sufficient tumor tissue and accounting for tumor heterogeneity and tumor stroma. In addition, TINS are rare events, occurring a median of 11 times per genome in bi-allelic BRCA2 mutated cases. Thus, there is likely no substitute for whole-genome data to obtain this signature. Finally, ovarian cancer survival as employed here is a clinical surrogate for HR capacity, and additional prospective data are needed to validate the relationship between TINS and response to platinum and/or PARP inhibitors. As such, we propose the use of TINS as an addition to other composite methods of HRD detection, like HRDetect or Myriad myChoice, to improve upon their ability to identify BRCA1/2 wild-type HRD patients.
In conclusion, we have evaluated possible refined TMEJ signatures using pre-clinical criteria, demonstrated the specificity of templated insertions in terms of association with BRCA2 mutated cases, and shown an independent prognostic association with survival in advanced ovarian cancers treated with platinum-based chemotherapy.
Supplementary Material
Implications:
Small, templated insertions indicative of theta-mediated end-joining likely can be used in conjunction with other HRD mutational signatures as a prognostic tool for patient response to therapies targeting HR deficiency.
ACKNOWLEDGEMENTS
We acknowledge and thank the many investigators involved in the ICGC and TCGA projects, the PCAWG consortium, and the patients who contributed specimens. We thank Suleman Hussain for critical review of the manuscript. We apologize in advance for many other references we could not include due to journal limits.
We acknowledge funding from the National Cancer Institute (R33CA236670-01A1) and the Emerson Collective Cancer Research Fund supporting D. Higginson, R. Majumdar, and G. Moore. We further acknowledge funding from the Cancer Center Support Grant (CCSG, P30 CA08748) supporting all authors.
Footnotes
Conflict of interest: GM and DSH are listed as inventors on a provisional patent filed by their institution related to the templated insertions and their use as a molecular signature of response to platinum salts and PARP inhibitors. There are no licenses or royalties. DSH has a sponsored research contract with SQZBiotechnologies for an unrelated project and acknowledges travel funds from Biorad, Inc.
REFERENCES
- 1. Tutt A, Tovey H, Cheang MCU, Kernaghan S, Kilburn L, Gazinska P, et al. Carboplatin in BRCA1/2-mutated and triple-negative breast cancer BRCAness subgroups: the TNT Trial. Nat Med 2018;24:628–37
- 2. Park W, Chen J, Chou JF, Varghese AM, Yu KH, Wong W, et al. Genomic Methods Identify Homologous Recombination Deficiency in Pancreas Adenocarcinoma and Optimize Treatment Selection. Clin Cancer Res 2020;26:3239–47
- 3. Pomerantz MM, Spisak S, Jia L, Cronin AM, Csabai I, Ledet E, et al. The association between germline BRCA2 variants and sensitivity to platinum-based chemotherapy among men with metastatic prostate cancer. Cancer 2017;123:3532–9
- 4. Pennington KP, Walsh T, Harrell MI, Lee MK, Pennil CC, Rendi MH, et al. Germline and somatic mutations in homologous recombination genes predict platinum response and survival in ovarian, fallopian tube, and peritoneal carcinomas. Clin Cancer Res 2014;20:764–75
- 5. Pan Z, Xie X. BRCA mutations in the manifestation and treatment of ovarian cancer. Oncotarget 2017;8:97657–70
- 6. Dal Molin GZ, Omatsu K, Sood AK, Coleman RL. Rucaparib in ovarian cancer: an update on safety, efficacy and place in therapy. Ther Adv Med Oncol 2018;10:1758835918778483
- 7. National Comprehensive Cancer Network. October 12, 2021. Ovarian Cancer/Fallopian Tube Cancer/Primary Peritoneal Cancer (Version 3.2021). <https://www.nccn.org/professionals/physician_gls/pdf/ovarian.pdf>. October 12, 2021.
- 8. Markman M, Markman J, Webster K, Zanotti K, Kulp B, Peterson G, et al. Duration of response to second-line, platinum-based chemotherapy for ovarian cancer: implications for patient management and clinical trial design. J Clin Oncol 2004;22:3120–5
- 9. Bouberhan S, Pujade-Lauraine E, Cannistra SA. Advances in the Management of Platinum-Sensitive Relapsed Ovarian Cancer. J Clin Oncol 2019;37:2424–36
- 10. McLaughlin JR, Rosen B, Moody J, Pal T, Fan I, Shaw PA, et al. Long-term ovarian cancer survival associated with mutation in BRCA1 or BRCA2. J Natl Cancer Inst 2013;105:141–8
- 11. Boyd J, Sonoda Y, Federici MG, Bogomolniy F, Rhei E, Maresco DL, et al. Clinicopathologic features of BRCA-linked and sporadic ovarian cancer. JAMA 2000;283:2260–5
- 12. Cass I, Baldwin RL, Varkey T, Moslehi R, Narod SA, Karlan BY. Improved survival in women with BRCA-associated ovarian carcinoma. Cancer 2003;97:2187–95
- 13. Chetrit A, Hirsh-Yechezkel G, Ben-David Y, Lubin F, Friedman E, Sadetzki S. Effect of BRCA1/2 mutations on long-term survival of patients with invasive ovarian cancer: the national Israeli study of ovarian cancer. J Clin Oncol 2008;26:20–5
- 14. Yang D, Khan S, Sun Y, Hess K, Shmulevich I, Sood AK, et al. Association of BRCA1 and BRCA2 mutations with survival, chemotherapy sensitivity, and gene mutator phenotype in patients with ovarian cancer. JAMA 2011;306:1557–65
- 15. Hyman DM, Zhou Q, Iasonos A, Grisham RN, Arnold AG, Phillips MF, et al. Improved survival for BRCA2-associated serous ovarian cancer compared with both BRCA-negative and BRCA1-associated serous ovarian cancer. Cancer 2012;118:3703–9
- 16. Vencken P, Kriege M, Hoogwerf D, Beugelink S, van der Burg MEL, Hooning MJ, et al. Chemosensitivity and outcome of BRCA1- and BRCA2-associated ovarian cancer patients after first-line chemotherapy compared with sporadic ovarian cancer patients. Ann Oncol 2011;22:1346–52
- 17. Sztupinszki Z, Diossy M, Borcsok J, Prosz A, Cornelius N, Kjeldsen MK, et al. Comparative Assessment of Diagnostic Homologous Recombination Deficiency-Associated Mutational Signatures in Ovarian Cancer. Clin Cancer Res 2021
- 18. Abkevich V, Timms KM, Hennessy BT, Potter J, Carey MS, Meyer LA, et al. Patterns of genomic loss of heterozygosity predict homologous recombination repair defects in epithelial ovarian cancer. Br J Cancer 2012;107:1776–82
- 19. Popova T, Manie E, Rieunier G, Caux-Moncoutier V, Tirapo C, Dubois T, et al. Ploidy and large-scale genomic instability consistently identify basal-like breast carcinomas with BRCA1/2 inactivation. Cancer Res 2012;72:5454–62
- 20. Birkbak NJ, Wang ZC, Kim JY, Eklund AC, Li Q, Tian R, et al. Telomeric allelic imbalance indicates defective DNA repair and sensitivity to DNA-damaging agents. Cancer Discov 2012;2:366–75
- 21. Telli ML, Timms KM, Reid J, Hennessy B, Mills GB, Jensen KC, et al. Homologous Recombination Deficiency (HRD) Score Predicts Response to Platinum-Containing Neoadjuvant Chemotherapy in Patients with Triple-Negative Breast Cancer. Clin Cancer Res 2016;22:3764–73
- 22. Ray-Coquard I, Pautier P, Pignata S, Perol D, Gonzalez-Martin A, Berger R, et al. Olaparib plus Bevacizumab as First-Line Maintenance in Ovarian Cancer. N Engl J Med 2019;381:2416–28
- 23. Gonzalez-Martin A, Pothuri B, Vergote I, DePont Christensen R, Graybill W, Mirza MR, et al. Niraparib in Patients with Newly Diagnosed Advanced Ovarian Cancer. N Engl J Med 2019;381:2391–402
- 24. Swisher EM, Lin KK, Oza AM, Scott CL, Giordano H, Sun J, et al. Rucaparib in relapsed, platinum-sensitive high-grade ovarian carcinoma (ARIEL2 Part 1): an international, multicentre, open-label, phase 2 trial. Lancet Oncol 2017;18:75–87
- 25. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature 2013;500:415–21
- 26. Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, Raine K, et al. Mutational processes molding the genomes of 21 breast cancers. Cell 2012;149:979–93
- 27. Nguyen L, Martens JWM, Van Hoeck A, Cuppen E. Pan-cancer landscape of homologous recombination deficiency. Nat Commun 2020;11:5584.
- 28. Nik-Zainal S, Davies H, Staaf J, Ramakrishna M, Glodzik D, Zou X, et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 2016;534:47–54
- 29. Degasperi A, Amarante TD, Czarnecki J, Shooter S, Zou X, Glodzik D, et al. A practical framework and online tool for mutational signature analyses show inter-tissue variation and driver dependencies. Nat Cancer 2020;1:249–63
- 30. Davies H, Glodzik D, Morganella S, Yates LR, Staaf J, Zou X, et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat Med 2017;23:517–25
- 31. Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature 2020;578:94–101
- 32. Ramsden DA, Carvajal-Garcia J, Gupta GP. Mechanism, cellular functions and cancer roles of polymerase-theta-mediated DNA end joining. Nat Rev Mol Cell Biol 2021
- 33. Zhou J, Gelot C, Pantelidou C, Li A, Yucel H, Davis RE, et al. A first-in-class Polymerase Theta Inhibitor selectively targets Homologous-Recombination-Deficient Tumors. Nat Cancer 2021;2:598–610
- 34. Carvajal-Garcia J, Cho JE, Carvajal-Garcia P, Feng W, Wood RD, Sekelsky J, et al. Mechanistic basis for microhomology identification and genome scarring by polymerase theta. Proc Natl Acad Sci U S A 2020;117:8476–85
- 35. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 2020;578:82–93
- 36. Hussain SS, Majumdar R, Moore GM, Narang H, Buechelmaier ES, Bazil MJ, et al. Measuring nonhomologous end-joining, homologous recombination and alternative end-joining simultaneously at an endogenous locus in any transfectable human cell. Nucleic Acids Res 2021;49:e74.
- 37. Mutter RW, Riaz N, Ng CK, Delsite R, Piscuoglio S, Edelweiss M, et al. Bi-allelic alterations in DNA repair genes underpin homologous recombination DNA repair defects in breast cancer. J Pathol 2017;242:165–77
- 38. Riaz N, Blecua P, Lim RS, Shen R, Higginson DS, Weinhold N, et al. Pan-cancer analysis of bi-allelic alterations in homologous recombination DNA repair genes. Nat Commun 2017;8:857.
- 39. Setton J, Selenica P, Mukherjee S, Shah R, Pecorari I, McMillan B, et al. Germline RAD51B variants confer susceptibility to breast and ovarian cancers deficient in homologous recombination. NPJ Breast Cancer 2021;7:135.
- 40. Arora S, Balasubramaniam S, Zhang W, Zhang L, Sridhara R, Spillman D, et al. FDA Approval Summary: Pembrolizumab plus Lenvatinib for Endometrial Carcinoma, a Collaborative International Review under Project Orbis. Clin Cancer Res 2020;26:5062–7
- 41. Miller RE, Leary A, Scott CL, Serra V, Lord CJ, Bowtell D, et al. ESMO recommendations on predictive biomarker testing for homologous recombination deficiency and PARP inhibitor benefit in ovarian cancer. Ann Oncol 2020;31:1606–22
- 42. Kelso AA, Lopezcolorado FW, Bhargava R, Stark JM. Distinct roles of RAD52 and POLQ in chromosomal break repair and replication stress response. PLoS Genet 2019;15:e1008319
- 43. Tarsounas M, Sung P. The antitumorigenic roles of BRCA1-BARD1 in DNA repair and replication. Nat Rev Mol Cell Biol 2020;21:284–99
- 44. Yun MH, Hiom K. CtIP-BRCA1 modulates the choice of DNA double-strand-break repair pathway throughout the cell cycle. Nature 2009;459:460–3
- 45. Sulkowski PL, Corso CD, Robinson ND, Scanlon SE, Purshouse KR, Bai H, et al. 2-Hydroxyglutarate produced by neomorphic IDH mutations suppresses homologous recombination and induces PARP inhibitor sensitivity. Sci Transl Med 2017;9.
- 46. Osia B, Alsulaiman T, Jackson T, Kramara J, Oliveira S, Malkova A. Cancer cells are highly susceptible to accumulation of templated insertions linked to MMBIR. Nucleic Acids Res 2021;49:8714–31
- 47. Wang YK, Bashashati A, Anglesio MS, Cochrane DR, Grewal DS, Ha G, et al. Genomic consequences of aberrant DNA repair mechanisms stratify ovarian cancer histotypes. Nat Genet 2017;49:856–65
- 48. Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 2011;474:609–15
- 49. Morgan RD, McNeish IA, Cook AD, James EC, Lord R, Dark G, et al. Objective responses to first-line neoadjuvant carboplatin-paclitaxel regimens for ovarian, fallopian tube, or primary peritoneal carcinoma (ICON8): post-hoc exploratory analysis of a randomised, phase 3 trial. Lancet Oncol 2021;22:277–88
- 50. Coleman RL, Oza AM, Lorusso D, Aghajanian C, Oaknin A, Dean A, et al. Rucaparib maintenance treatment for recurrent ovarian carcinoma after response to platinum therapy (ARIEL3): a randomised, double-blind, placebo-controlled, phase 3 trial. Lancet 2017;390:1949–61
Data Availability Statement
All data used in this study are available from PCAWG (accessed Sept. 2020). Controlled PCAWG data can be obtained after applying for access through ICGC DACO and dbGaP (https://docs.icgc.org/pcawg/data/).