Highlights
• Homologous recombination deficiency is strongly dependent on tumor purity estimation.
• Further bioinformatic algorithms and parameter choice influence the score.
• For correct tumor purity determination implementation of digital pathology is advantageous.
Keywords: Homologous recombination deficiency, HRD, Tumor purity, Tumor cell content, Whole exome sequencing, WES
Abstract
Homologous recombination deficiency (HRD) is a predictive marker for response to poly (ADP-ribose) polymerase inhibitors (PARPi) in ovarian carcinoma. HRD scores have entered routine diagnostics, but the influence of algorithms, parameters and confounders has not been analyzed comprehensively.
A series of 100 poorly differentiated ovarian carcinoma samples was analyzed using whole exome sequencing (WES) and genotyping. Tumor purity was determined using conventional pathology, digital pathology, and two bioinformatic methods. HRD scores were calculated from copy number profiles determined by Sequenza and by Sclust either with or without fixed tumor purity. Tumor purity determination by digital pathology combined with a tumor purity informed variant of Sequenza served as reference method for HRD scoring.
Seven tumors had deleterious mutations in BRCA1/2, 12 tumors had deleterious mutations in other homologous recombination repair (HRR) genes, 18 tumors had variants of unknown significance (VUS) in BRCA1/2 or other HRR genes, while the remaining 63 tumors had no relevant alterations. Using the reference method for HRD scoring, 68 tumors were HRD-positive. HRDsum determined by WES correlated strongly with HRDsum determined by single nucleotide polymorphism (SNP) arrays (R = 0.85). Conventional pathology systematically overestimated tumor purity by 8% compared to digital pathology. All investigated methods agreed on classifying the deleterious BRCA1/2-mutated tumors as HRD-positive, but discrepancies were observed for some of the remaining tumors. Discordant HRD classification of 11% of the tumors was observed comparing the tumor purity uninformed default of Sequenza and the reference method.
In conclusion, tumor purity is a critical factor for the determination of HRD scores. Assistance by digital pathology helps to improve the accuracy and precision of its estimation.
Introduction
Homologous recombination deficiency (HRD) is highly prevalent in high-grade serous ovarian carcinoma (HGSC) and most commonly caused by germline or somatic mutations in BRCA1/2 or hypermethylation of the BRCA1 promoter, but also potentially by alterations of other genes in the homologous recombination repair (HRR) pathway [1]. In The Cancer Genome Atlas (TCGA) ovarian cancer cohort, 20% of the tumors had germline or somatic mutations in BRCA1/2, 16% of the tumors were BRCA1-hypermethylated, while 69% of the tumors had an elevated genomic instability score [2,3]. HRD-positive carcinomas respond favorably to platinum-based chemotherapy and to poly (ADP-ribose) polymerase inhibitors (PARPi). The introduction of PARPi into clinical practice has significantly changed the ovarian cancer treatment landscape in both first-line and relapse settings [4,5].
HRD status can be determined either by analysis of its "cause" or of the "consequence" [6]. The former approach is based on the detection of deleterious or likely deleterious mutations in BRCA1/2 and other genes involved in HRR. The latter approach is based on the detection of genomic scar patterns arising from the increased usage of the more error-prone nonhomologous end-joining when HRR is defective [7]. Three different genomic instability scores (also known as HRD scores), namely loss of heterozygosity (LOH), large-scale state transitions (LST), and telomeric allelic imbalance (TAI), have been described in independent studies [8], [9], [10]. Today, the combination HRDsum = LOH + LST + TAI with the cutpoint 42 is the most commonly used genomic instability score in clinical practice [11,12]. Clinical trials evaluating PARPi as maintenance therapy after first-line chemotherapy or after relapse demonstrated an incremental benefit across subgroups defined by HRD testing: The greatest benefit was observed in the BRCA1/2 mutation subgroup, followed by the HRD-score-positive but BRCA1/2-negative subgroup, while the smallest benefit was detected in the HRD-negative subgroup [13]. In 2020, both the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA) approved genomic instability scores as a predictive marker for olaparib plus bevacizumab as maintenance therapy after first-line chemotherapy plus bevacizumab, based on results of the PAOLA-1 trial [14]. A high level of agreement was observed among European experts that both BRCA1/2 and HRD score based HRD testing should be performed at primary diagnosis of ovarian cancer [15]. Thus, methods for reliable HRD detection are of critical importance to optimize the clinical benefit of PARPi.
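The HRDsum definition and its cutpoint can be written down in a few lines. The following is a minimal Python sketch for illustration, not the scoring code used in this study; the class name `ScarScores` and the example values are hypothetical, and treating HRDsum ≥ 42 as HRD-positive follows the convention used in the Results.

```python
from dataclasses import dataclass

HRD_CUTPOINT = 42  # cutpoint for HRDsum named in the text


@dataclass(frozen=True)
class ScarScores:
    """Three genomic scar counts for one tumor (hypothetical container)."""
    loh: int
    lst: int
    tai: int

    @property
    def hrd_sum(self) -> int:
        # HRDsum = LOH + LST + TAI
        return self.loh + self.lst + self.tai

    def is_hrd_positive(self, cutpoint: int = HRD_CUTPOINT) -> bool:
        # Scores at or above the cutpoint are called HRD-positive.
        return self.hrd_sum >= cutpoint


# Illustrative values only, not taken from the study cohort.
example = ScarScores(loh=15, lst=20, tai=10)
print(example.hrd_sum, example.is_hrd_positive())
```

Because the classification is a hard threshold on a sum, a shift of a few units in any single component can change the HRD status of a tumor that lies close to the cutpoint.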
Measurement of HRD scores strongly relies on accurate calling of allele-specific copy number alterations (CNA) in the cancer genome. In this context, admixture of normal cells in tumor tissue samples (influencing tumor purity) as well as tumor aneuploidy need to be taken into account for CNA calling. Publicly available and commonly used bioinformatic tools for CNA analysis include ASCAT for genotyping data as well as Sequenza and Sclust for whole exome sequencing (WES) data [16,17]. The three algorithms process paired tumor and germline DNA data into a segmentation of the tumor genome into segments with distinct copy numbers for each of the two alleles. ASCAT and Sequenza allow for tumor purity and tumor ploidy as manual input parameters, but also offer the possibility to estimate these parameters bioinformatically. Additional parameters of the segmentation algorithms that need to be prespecified include the gap penalty charged for each discontinuity in the copy number curve and the minimum number of SNPs required to support calling of a CNA [18]. However, the influence of tumor purity as well as other parameters and confounders on the accuracy of genomic instability scores has not been systematically investigated.
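To illustrate why normal cell admixture matters for CNA calling, the sketch below uses the standard two-population mixture model (tumor cells at purity ρ, diploid normal cells at 1 − ρ) to compute the expected depth ratio and B allele frequency of a segment. This is a textbook simplification assumed here for illustration, not the exact likelihood implemented in ASCAT, Sequenza or Sclust; all function and parameter names are hypothetical.

```python
def _check_purity(purity: float) -> None:
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")


def expected_depth_ratio(purity: float, total_cn: int, ploidy: float) -> float:
    """Expected tumor/normal depth ratio of a segment (assumed model)."""
    _check_purity(purity)
    # Average copies per cell in this segment: tumor cells plus diploid normal cells.
    segment_copies = purity * total_cn + (1.0 - purity) * 2.0
    # Genome-wide average copies per cell, given the average tumor ploidy.
    average_copies = purity * ploidy + (1.0 - purity) * 2.0
    return segment_copies / average_copies


def expected_baf(purity: float, total_cn: int, minor_cn: int) -> float:
    """Expected B allele frequency at a germline heterozygous SNP (assumed model)."""
    _check_purity(purity)
    if not 0 <= minor_cn <= total_cn:
        raise ValueError("minor_cn must be between 0 and total_cn")
    b_copies = purity * minor_cn + (1.0 - purity) * 1.0
    all_copies = purity * total_cn + (1.0 - purity) * 2.0
    if all_copies == 0.0:
        return float("nan")  # homozygous deletion in a pure tumor: no reads expected
    return b_copies / all_copies


# Copy-neutral LOH segment (two copies of one allele) at two assumed purities.
for rho in (0.28, 0.54):
    print(rho, round(expected_baf(rho, total_cn=2, minor_cn=0), 3))
```

Under this model, the same observed allele frequency is compatible with different copy number states depending on the assumed purity, which is why a wrong purity estimate can propagate into the segment calls and thus into LOH, LST, and TAI.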
To perform an in-depth analysis of these parameters, tumor purity measurements and HRD scores were analyzed for a series of 100 poorly differentiated ovarian carcinomas. The tumors were analyzed by conventional pathology, digital pathology, WES and SNP arrays. In a first step, we evaluated our reference implementation of HRD scoring, which was based on tumor purity estimation by digital pathology and CNA calling by Sequenza. In a second step, we analyzed the performance of three alternative implementations compared to the reference implementation. We also investigated the effect of the parameters 'gap penalty' (gamma) and 'minimum number of supporting SNPs' (kmin) on the resulting HRD scores. Our results show that accurate determination of tumor purity is crucial for the determination of reliable and valid HRD scores.
Material and methods
Study cohort
The study cohort comprised 100 formalin-fixed and paraffin-embedded (FFPE) poorly differentiated ovarian carcinoma samples diagnosed at the Institute of Pathology at the Heidelberg University Hospital (Suppl. Table S1). The majority of tumors were high-grade serous ovarian cancer (HGSOC, 93%). The cohort also included endometrioid cancer (EOV, 3%), mucinous ovarian cancer (MOV, 2%), clear cell ovarian cancer (CCOV, 1%), and ovarian carcinosarcoma (OCS, 1%). The majority of tumors had FIGO stage III-IV (67%), a single tumor had FIGO stage I (1%), and staging data were not available for the remaining tumors. The retrospective analysis of sequencing data was performed in line with the Declaration of Helsinki and the guidelines of the Ethics Committee of the Medical Faculty at the University of Heidelberg (ethics vote S-315/2020).
Whole exome sequencing of FFPE tissue samples
Libraries for tumor and corresponding normal tissues were prepared with the Agilent SureSelect XT HS2 system and incorporation of unique molecular indices (UMI). Target enrichment was performed with the Clinical Exome v2 bait set. In brief, tumor DNA extracted from FFPE tissue and DNA from total blood were manually sheared on a Covaris ME220 ultrasonicator to fragment sizes of 200–500 bp. 80–100 ng of DNA were used for end repair and adaptor ligation according to the manufacturer's instructions. Hybridization was performed overnight followed by bead purification. Final libraries were qualified on an Agilent D1000HS tape and fluorimetrically quantified using a Qubit instrument (Thermo Fisher). Four tumor/normal pairs (ratio 4:1) were pooled and sequenced together on a NovaSeq 6000 SP flow cell (Illumina) using 2 × 101 bp reads. Data analysis was performed on a local Illumina DRAGEN Bio-IT platform version 3.8 using the somatic tumor-normal and germline-only workflows with the genome assembly GRCh37.
Estimation of tumor purity
Four different methods were applied to determine tumor purity (Suppl. Fig. 1, Suppl. Table S2): 1. Tumor purity was determined assisted by digital image analysis of the scanned H&E-stained tissue slide as detailed below (digital pathology). This method was considered as the gold standard for the current study. 2. Tumor purity was determined by microscopic inspection of the H&E-stained tissue slide by an experienced pathologist (conventional pathology). For both pathological methods, tumor purity was defined as the number of vital tumor cells without necrosis divided by all cells in the marked area of the slide. The marked area of the H&E slide corresponded to the tissue sample that was analyzed by WES. 3. Tumor purity was estimated bioinformatically from the WES data using Sequenza [16]. 4. Tumor purity was estimated bioinformatically from the WES data using Sclust [19].
Digital image analysis was performed by an experienced pathologist in a semiautomatic manner using QuPath; a more detailed description can be found in the online methods [20,21].
Analysis of the mutations in the homologous recombination pathway
We adopted the tumor classification system developed before [3]: Class H1a comprised tumors with deleterious or likely deleterious BRCA1/2 alterations, class H1b comprised tumors with deleterious or likely deleterious alterations in other genes of the HRR pathway, class H2a comprised tumors with VUS in BRCA1/2, class H2b comprised tumors with VUS in other relevant genes, and class H3 comprised all the remaining tumors. Further details are in the online methods.
Homologous recombination deficiency scores
Calculation of HRD scores involved two steps: the determination of allele-specific copy numbers from WES data of paired tumor and normal tissue samples and the subsequent calculation of LOH, LST, TAI, and HRDsum. The first step was performed using four different methods: 1. Copy numbers were estimated using Sequenza with fixed tumor purity at the value determined by digital image analysis. This method was considered as reference. 2. Copy numbers were estimated using Sequenza with fixed tumor purity at the value determined by a pathologist using conventional light microscopy. 3. Copy numbers were estimated using Sequenza without input of a tumor purity estimate. 4. Copy numbers were estimated using Sclust without input of a tumor purity estimate. Variants needed for the Sclust workflow were obtained from the Illumina DRAGEN analysis pipeline. The second step was performed using a modified version of scarHRD [22].
Genotyping analysis
Genotyping was performed in a subcohort of 38 tumors with the Illumina Infinium CytoSNP-850K v1.2 BeadChip using an automated protocol according to the manufacturer's instructions (Illumina, San Diego, CA, USA). Data analysis was conducted using Genome Studio from Illumina to extract B allele frequency and log R ratio from raw data. More detailed information can be found in the online methods.
Analysis of TCGA data
Calculation of HRD scores and analysis of the mutations in the HRR pathway for the cohort of ovarian serous adenocarcinoma (TCGA-OV) were performed in the same way as for the in-house cohort and as described before [3].
Statistical analysis and visualization
Statistical analysis and graphics generation were performed using the statistical programming language R and Python with SciPy [23] and scikit-learn [24]. Graphics were created using Matplotlib [25] and seaborn [26]. 95% confidence intervals (CI) for AUC ROC were computed using the R package pROC [27].
The significance of differences in tumor purity and HRDsum score estimates between different methods was assessed using the Wilcoxon test. Association of HRD scores calculated by different methods was assessed using Pearson correlations and linear models with an intercept set to zero. The deviation between two correlations was assessed for significance by Steiger's method as implemented in the Python script CorrelationsStats [28].
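The two association measures named above can be sketched with the Python standard library alone. This is an illustrative re-implementation under simple assumptions, not the SciPy-based analysis code of the study; the function names and the toy data are hypothetical, and the Wilcoxon test and Steiger's method are not reproduced here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def slope_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of the model y = slope * x (intercept fixed at zero)."""
    if len(x) != len(y) or not x:
        raise ValueError("need two paired, non-empty samples")
    sxx = sum(a * a for a in x)
    if sxx == 0.0:
        raise ValueError("slope is undefined when all x are zero")
    return sum(a * b for a, b in zip(x, y)) / sxx


# Toy paired scores (hypothetical), e.g. one method on x and another on y.
x = [30.0, 45.0, 60.0, 72.0]
y = [28.0, 47.0, 61.0, 70.0]
print(pearson_r(x, y), slope_through_origin(x, y))
```

With the intercept fixed at zero, a slope close to 1 indicates that neither method is systematically shifted relative to the other, while the correlation coefficient captures how tightly the paired scores follow that line.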
We developed a novel kind of heatmap to visualize the level of HRD scores as a function of tumor purity and tumor ploidy. In the heatmaps, levels of Log-Posterior Probability (LPP) are shown as contour lines, a red box indicates the reference solution and a yellow star the default solution. Furthermore, we developed a special type of circos plots showing allele-specific copy numbers as well as the CNA contributing to LOH, TAI and LST using the R package BioCircos [29,30].
The systematic analysis varying the gap penalty (gamma) and the minimum number of supporting SNPs (kmin) demanded for a CNA call was performed in a subcohort of ten tumors.
Results
A series of 100 ovarian cancer tissue samples was analyzed using WES of paired tumor and normal DNA samples and evaluated for HRD (Suppl. Fig. 1). In the first part of the study, the samples were analyzed using our reference implementation for HRD scoring. For the reference method, tumor purity determined by digital image analysis of the scanned H&E-stained slides was used as input parameter to estimate the landscape of allele-specific copy numbers. This was done using Sequenza [16] by optimizing the LPP of the fitted model under the condition of fixed tumor purity. Then, LOH, LST, TAI, and HRDsum were calculated from the copy number profiles. In the second part of the study, the results from the reference implementation were used as a benchmark and we systematically analyzed different aspects of HRD measurement, including the impact of tumor purity, algorithms and parameter settings.
Part 1: reference implementation
Tumors were classified according to our classification system introduced before [3]: Class H1a (n = 7) included tumors with deleterious or likely deleterious somatic or germline alterations in BRCA1/2, while H1b (n = 12) included tumors with such alterations in other genes of the HRR pathway. Class H2a (n = 5) included tumors with somatic and/or germline VUS in BRCA1/2, while class H2b (n = 13) included tumors with such alterations in other relevant genes. Class H3 included tumors without any relevant alteration in HRR genes (n = 63). The median values of HRDsum in class H1a, H1b, H2a, H2b, and H3 were 72, 57, 56, 42, and 45, respectively (Fig. 1a). HRDsum was significantly higher in class H1a compared to class H3 (fold change=1.58, p = 0.0002), while no significant difference was detected in class H1b, H2a, and H2b compared to class H3.
Fig. 1.
Performance of HRD scores determined by WES of paired tumor and normal DNA in the study cohort (100 cases). HRD scores were calculated by the reference method (Sequenza with fixed tumor purity as determined by digital pathology). a Levels of HRDsum in the mutation class H1a (deleterious BRCA1/2 alterations), H1b (deleterious mutations in other HRR genes), H2a (BRCA1/2 VUS), H2b (VUS in other HRR genes), and H3 (no relevant alterations detected). b Strong correlation of HRDsum determined by WES and HRDsum determined by SNP arrays (38 cases). c Correlation analysis of TAI, LST, and LOH (all determined by WES).
HRDsum calculated from WES strongly correlated with HRDsum calculated from SNP arrays (R = 0.85, p = 1E-10, Fig. 1b). A linear fit indicated no significant bias between the HRDsum scores determined by the two methods (slope=1.0).
Significant pairwise correlations ranging from 0.25 to 0.66 were observed between TAI, LOH, and LST (Fig. 1c). Linear regression showed that TAI was systematically higher by 2% and LOH was systematically lower by 60% compared to LST. Similar percentages of 1% and 59% were observed in the analysis of the genotyping data (Suppl. Fig. 2). This result supports the notion that extraction of all three scores TAI, LOH, and LST from the WES data was feasible.
Part 2: alternative implementations
Comparison of methods to estimate tumor purity
Three alternative methods to estimate tumor purity were compared to the digital pathology approach that served as reference (Fig. 2a). Tumor purity estimated by conventional pathology correlated moderately with the reference (R = 0.48). Also, moderate correlations with the reference were observed for the bioinformatic tumor purity estimates obtained by Sequenza and by Sclust (R = 0.42 and R = 0.6). Comparing the two bioinformatic methods, the correlation with the reference observed for Sclust was significantly stronger than the one observed for Sequenza (p = 0.0023). Tumor purity estimates obtained by the two bioinformatic methods correlated strongly with each other (R = 0.76, Suppl. Fig. 3).
Fig. 2.
Comparison of four different methods to determine tumor purity (100 cases). Tumor purity was estimated starting from H&E-stained slides (i) using conventional pathology and (ii) using image analysis of digital slides (digital pathology) as well as solely based on the coverage data from WES (iii) using Sequenza and (iv) using Sclust. Each of the other methods was compared against digital pathology, which served as reference method. a Moderate correlation of the tumor purity estimates obtained by conventional pathology, Sequenza and Sclust with the reference method. b Systematic analysis of the differences (Δ) between tumor purity estimates obtained by conventional pathology, Sequenza as well as Sclust and estimates obtained by the reference method.
We systematically investigated the deviation of tumor purity estimates between the alternative methods and the reference method (Fig. 2b). Conventional pathology significantly overestimated tumor purity compared to the reference (median=8%, p = 6.8E-05), while a significant underestimation was observed for Sclust (median=−4%, p = 0.02) and a non-significant underestimation for Sequenza (median=−3%, p = 0.21). The comparison of tumor ploidy determined by the four methods is shown in Suppl. Fig. 4.
Tumor purity uninformed detection of HRD
The default implementation of Sequenza does not require tumor purity as input parameter, but relies on bioinformatic estimation of tumor purity from the WES data. Both the default and the reference implementation derive a segmentation of the tumor genome into segments of constant copy number from the WES data by maximizing the LPP. For the reference implementation, this maximization is done over a range of ploidies, while the tumor purity is kept fixed. For the default implementation, the maximization is done over a range of ploidies and all possible tumor purities.
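The difference between the two maximization strategies can be made concrete on a small grid. The following Python sketch is a toy illustration under stated assumptions: the LPP grid is represented as a plain dictionary with invented values, the function names are hypothetical, and no Sequenza API is used or implied.

```python
from typing import Dict, Tuple

# Keys are (tumor purity, tumor ploidy); values are log-posterior probabilities.
Grid = Dict[Tuple[float, float], float]


def default_solution(lpp: Grid) -> Tuple[float, float]:
    """Maximize the LPP over all purities and ploidies (purity uninformed)."""
    return max(lpp, key=lambda key: lpp[key])


def reference_solution(lpp: Grid, fixed_purity: float) -> Tuple[float, float]:
    """Maximize the LPP over ploidies only, at the grid purity closest to the
    externally determined value (purity informed)."""
    purities = sorted({purity for purity, _ in lpp})
    nearest = min(purities, key=lambda purity: abs(purity - fixed_purity))
    candidates = {key: value for key, value in lpp.items() if key[0] == nearest}
    return max(candidates, key=lambda key: candidates[key])


# Invented LPP values on a 2 x 2 grid, for illustration only.
toy_grid: Grid = {
    (0.3, 2.0): -120.0,
    (0.3, 4.0): -110.0,
    (0.5, 2.0): -100.0,
    (0.5, 4.0): -130.0,
}
print(default_solution(toy_grid))          # global optimum of the grid
print(reference_solution(toy_grid, 0.28))  # optimum within the purity slice
```

In this toy grid the two strategies select different purity and ploidy combinations, which is the situation in which the downstream copy number segmentation, and hence the HRD scores, can diverge.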
Comparing the default and reference implementation, a deviation of five or more of the HRDsum level was observed for 49 of the 100 analyzed tumors (Figs. 3a, Suppl. 5), while a change in classification was observed in 11 tumors: 3 tumors were classified falsely positive (HRDsum ≥ 42) by the default method, while 8 tumors were classified falsely negative (HRDsum < 42) by the default method.
Fig. 3.
Comparison of HRDsum scores using the default version of Sequenza (tumor purity uninformed) and the reference version of Sequenza (informed by digitally determined tumor purity). a Overview of 49 tumors (49%) for which the HRDsum scores obtained by the two methods differed by five or more. b Systematic analysis of the differences (Δ) between tumor purity estimates obtained by the default method compared to the reference method and their effect on the HRD scores. c Systematic analysis of the differences (Δ) between tumor ploidy estimates obtained by the default method compared to the reference method and of their effect on HRD scores.
Next, we looked for a systematic bias of HRD scores associated with under- or overestimation of tumor purity by the default method (Fig. 3b). Overestimation of tumor purity was associated with lower LOH, LST, TAI, and HRDsum. Overestimation of tumor ploidy was associated with lower LOH and HRDsum (Fig. 3c).
Visualization of HRD scores
Exemplary cases illustrate the effect of a tumor purity informed approach (reference implementation) compared to a tumor purity uninformed approach (default implementation) on the resulting HRD scores. Heatmaps show the dependency of HRDsum on tumor purity and tumor ploidy (Figs. 4a, Suppl. 6). Circos plots visualize the segmentation of the tumor genome into segments of constant copy number as well as the contribution of CNA to LOH, LST, and TAI, and thereby to HRDsum, for the default (Fig. 4b) and reference implementation (Fig. 4c), respectively.
Fig. 4.
Visualization of tumor purity, tumor ploidy and HRD score estimates in four exemplary cases. a Heatmaps showing HRDsum estimated by Sequenza in dependence of tumor purity and tumor ploidy. Contour lines visualize the log-posterior probability (LPP) of the copy number segmentation estimated from the WES coverage data. Darker colors encode higher LPP levels. A yellow star marks the Sequenza default solution as obtained by maximizing the LPP over the entire heatmap (default method). A red box marks the Sequenza reference solution that is obtained by maximizing the LPP under the condition of fixed tumor purity at the value obtained by digital pathology (reference method). b Circos plots show the genomic scars contributing to LOH, LST, and TAI as determined by the default method. The inner blue and red tracks visualize the estimated copy numbers of the major (CN A) and the minor allele (CN B). Chromosomes are ordered clockwise, starting with chromosome 1 at 12 o'clock. c Same as in b, but for the reference method.
For case OV15, tumor purity was overestimated by the default implementation (54% instead of 28%). As a consequence, a lower HRDsum score of 26 instead of 67 was obtained and the tumor was classified false-negatively. The discrepancy was mainly due to an underestimation of LST as revealed by the circos plot.
For case OV37, tumor purity was underestimated (41% instead of 60%). Tumor ploidy was estimated as 2.0 (diploid) using the default method, while tumor ploidy was estimated as 4.2 (tetraploid) using the reference method. As a consequence, a higher HRDsum score of 71 instead of 49 was obtained comparing the default to the reference method. The discrepancy was mainly due to the detection of a higher number of LOH regions.
For case OV66, tumor purity was overestimated by the default method (75% instead of 36%). As a consequence, an HRDsum score of 10 instead of 40 was derived. The discrepancy was mainly due to an underestimation of LST.
For case OV100, tumor purity was considerably underestimated by the default method (27% instead of 88%). As a consequence, an HRDsum score of 50 instead of 18 was derived and the tumor was classified false-positively. The discrepancy was mainly due to an overestimation of LOH and TAI.
Two of the four illustrative cases changed HRD status when different methods were used to derive HRD scores. In line with the systematic analysis presented in Fig. 3b, overestimation of tumor purity resulted in falsely negative HRD classification (OV15), while underestimation of tumor purity resulted in falsely positive HRD classification (OV100).
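The comparison of an alternative method against the reference can be tabulated as follows. This is a minimal Python sketch with a hypothetical function name; the HRDsum pairs are the values reported above for the four exemplary cases, the cutpoint of 42 and the deviation threshold of five units are the ones used in the text.

```python
from typing import Dict, Tuple

CUTPOINT = 42  # HRDsum >= 42 is called HRD-positive
MIN_SHIFT = 5  # deviations of five or more units are counted as shifted


def compare_to_reference(scores: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """scores maps case ID -> (HRDsum by default method, HRDsum by reference)."""
    summary = {"shifted": 0, "false_positive": 0, "false_negative": 0}
    for default_score, reference_score in scores.values():
        if abs(default_score - reference_score) >= MIN_SHIFT:
            summary["shifted"] += 1
        if default_score >= CUTPOINT and reference_score < CUTPOINT:
            summary["false_positive"] += 1
        elif default_score < CUTPOINT and reference_score >= CUTPOINT:
            summary["false_negative"] += 1
    return summary


# HRDsum (default, reference) for the four exemplary cases described in the text.
cases = {
    "OV15": (26, 67),
    "OV37": (71, 49),
    "OV66": (10, 40),
    "OV100": (50, 18),
}
print(compare_to_reference(cases))
```

For these four cases the tabulation yields one falsely negative case (OV15) and one falsely positive case (OV100), while OV37 and OV66 shift by more than five units without crossing the cutpoint.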
Influence of gap penalty and minimum number of supporting SNPs
In addition to tumor purity and tumor ploidy, the following two parameters need to be specified when fitting a copy number segmentation to sequencing coverages: the gap penalty (gamma) specifying the cost connected to each CNA and the minimum number of supporting SNPs (kmin) demanded for a CNA call. The variation of HRDsum was small when these parameters were varied over a wide range (Fig. 5). LOH and LST scores were slightly higher for lower gap penalty and for a lower number of supporting SNPs. TAI scores tended to be higher for lower gap penalty, but lower for a lower number of supporting SNPs. When combining LOH, LST, and TAI to HRDsum, some of these dependencies were eliminated, resulting in less variation in the combined score compared to the single scores.
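The percentage deviations visualized in Fig. 5 can be computed as sketched below. This Python example is an illustration only: the function name and the score values are hypothetical, and the default setting (gamma = 60, kmin = 50) is taken from the caption of Fig. 5.

```python
from typing import Dict, Tuple

Params = Tuple[int, int]  # (gamma, kmin)


def percent_deviation(
    scores: Dict[Params, float], default: Params = (60, 50)
) -> Dict[Params, float]:
    """Deviation of each score from the score at the default setting, in percent."""
    baseline = scores[default]
    if baseline == 0:
        raise ValueError("baseline score is zero; relative deviation is undefined")
    return {
        params: 100.0 * (value - baseline) / baseline
        for params, value in scores.items()
    }


# Invented HRDsum values of one tumor for three parameter settings.
toy_scores = {(60, 50): 50.0, (30, 50): 55.0, (60, 25): 48.0}
for params, delta in percent_deviation(toy_scores).items():
    print(params, f"{delta:+.1f}%")
```

Expressing the deviations relative to the default setting makes scores of different magnitude (LOH, LST, TAI, HRDsum) comparable within one heatmap.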
Fig. 5.
Systematic analysis of different choices of the parameters gap penalty (gamma) and minimum number of supporting SNPs (kmin) and their effect on LOH, TAI, LST, and HRDsum (10 cases). Deviations (Δ) of the HRD scores from the default results (in %) are visualized as heatmaps. A green box marks the default values of the parameters that are used by scarHRD (gamma = 60, kmin = 50).
Performance comparison of methods to detect HRD
Three alternative methods were compared to the reference implementation, in which Sequenza was combined with tumor purity estimation by digital pathology: Sequenza combined with conventional pathology, Sequenza with bioinformatic estimation of tumor purity, and Sclust with bioinformatic estimation of tumor purity. As Sclust did not converge for eleven of the 100 samples, this analysis was performed in a subcohort of 89 samples. Using HRDsum and the cutpoint of 42, all four methods classified all of the tumors in class H1a as HRD-positive (Fig. 6a). By contrast, higher percentages of 68%, 54%, and 61% of the tumors in class H3 were classified as HRD-positive by the Sequenza-based methods compared to 19% that were above the threshold using Sclust (all p<0.015).
Fig. 6.
Comparison of four different methods to derive HRD scores from WES data (89 cases): 1. Sequenza combined with tumor purity estimated by digital pathology (reference method), 2. Sequenza combined with tumor purity estimated by conventional pathology, 3. Sequenza with bioinformatically estimated tumor purity (default method), and 4. Sclust with bioinformatically estimated tumor purity. a Proportion of the HRD-positive (HRDsum≥42) tumors in the five mutation classes H1a, H1b, H2a, H2b, and H3. b ROC analysis to analyze the capability of HRDsum to separate between class H1a and class H3. c ROC analysis to analyze the capability of HRDsum to separate between class H1b and class H3. d Systematic analysis of the deviations (Δ) of HRDsum scores between the three alternative methods and the reference method.
In a ROC analysis, a significant separation of class H1a and class H3 was achieved by all four methods (Fig. 6b). Comparing the performance of the methods by area under the curve (AUC), Sclust performed numerically best (AUC=0.98, CI 0.86–0.99), followed by the reference method (AUC=0.93, CI 0.86–0.99), Sequenza with bioinformatic estimation of tumor purity (AUC=0.92, CI 0.85–0.99), and Sequenza combined with conventional pathology (AUC=0.89, CI 0.76–1), but the performance was not significantly different between the methods. A significant separation of class H1b from class H3 was reached using Sclust (AUC=0.80, CI 0.64–0.96, p = 0.0064), but not using the three other methods (Fig. 6c). In the corresponding analysis in TCGA-OV, Sequenza slightly outperformed Sclust in the separation of class H1a from class H3 (AUC=0.83 vs. AUC=0.79, p = 1.2E-14), while neither Sequenza nor Sclust significantly separated class H1b from class H3 (Suppl. Fig. 7).
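For readers less familiar with the AUC, the sketch below computes it in its rank-based form: the probability that a randomly chosen tumor from the positive class has a higher HRDsum than a randomly chosen tumor from the negative class, with ties counted as one half. This is a generic standard-library Python illustration with a hypothetical function name and invented scores; it is not the pROC-based analysis of the study and does not compute confidence intervals.

```python
from typing import Sequence


def auc_from_scores(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of the two classes."""
    if not positives or not negatives:
        raise ValueError("both classes must contain at least one score")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0   # positive case ranked above negative case
            elif pos == neg:
                wins += 0.5   # ties contribute one half
    return wins / (len(positives) * len(negatives))


# Invented HRDsum values, e.g. class H1a as positives and class H3 as negatives.
h1a_scores = [70.0, 60.0, 50.0]
h3_scores = [40.0, 55.0, 30.0]
print(auc_from_scores(h1a_scores, h3_scores))
```

Because the AUC depends only on the ranking of the scores, a method with a systematic downward shift of HRDsum can still reach a high AUC even though it classifies far fewer tumors as HRD-positive at the fixed cutpoint of 42.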
We systematically investigated the deviation of HRDsum estimates between the alternative methods and the reference method (Fig. 6d). The distribution of HRD scores obtained by the two Sequenza-based methods did not differ significantly from the reference method. By contrast, comparing Sclust and the reference method, a strong and significant negative bias of the HRD score was observed (median=−11, p = 1.6E-15) and more than 25% of the tumors were downshifted by 20 units or more.
Discussion
Over the last few years several complex predictive biomarkers (tumor mutational burden, microsatellite instability, and HRD) have been approved by the medical agencies and are increasingly being used in routine diagnostics influencing patient management. Reliable interrogation of complex biomarkers and an increasing number of single druggable targets for an increasing number of patients require more comprehensive sequencing approaches (large panels, WES, WGS). A number of successful comprehensive genomic profiling programs have been launched over the last years and it is reasonable to assume that further development in the field will shift these programs from research to clinical care [31], [32], [33], [34].
Obviously, harmonization and standardization, which guarantee homogeneous and reliable test results across institutions, networks and programs, will play an increasing role in this context. Since the underlying diagnostic processes including preanalytics, sequencing, and bioinformatic analysis become increasingly complex and require compliance with legal regulations (e.g. with the recently introduced In-vitro-Diagnostic Device Regulation (IVDR) in Europe [35,36]), a comprehensive understanding and control of influencing parameters is critical for management and to ensure reliable test results. While significant efforts have been made to standardize NGS based on specific quality parameters, the influence of pre-analytical steps involving determination of tumor purity as well as of bioinformatic tools used in both research and clinical programs is only partly recognized, particularly in the field of complex biomarkers. As HRD scores are summary measures of allele-specific CNAs in the tumor cells, CNA calling with integrated decomposition of the tumor tissue into tumor and normal cells employing tumor purity and ploidy as parameters is key for their determination.
Prompted by our recent in silico analysis, which revealed that HRD scores are significantly influenced by tumor purity [3], we set out to systematically investigate the influence of methods used for tumor purity measurement as well as algorithms commonly employed to determine copy number profiles that in turn are needed to calculate the components of the HRDsum score (LOH, TAI, LST). To this end and in a first step, we defined a benchmark method for determination of HRD scores from WES data: Tumor purity was determined assisted by image analysis of digitized H&E slides and used as input for the calculation of HRD scores using Sequenza and scarHRD. A strong correlation of R = 0.85 was observed when comparing the HRD scores obtained by this method with the results from genotyping using SNP arrays, the method originally used to extract HRD scores [8], [9], [10].
In the second step, the resulting benchmark data set was compared with HRD results using different methods, algorithms and parameter settings for tumor purity measurement and CNA calling. First, all four employed methods agreed in identifying the seven BRCA1/2-mutated tumors as HRD-positive. Second, a discordant HRD-classification was obtained for 11% of the patients when using a purely bioinformatic measurement of tumor purity instead of considering the accurate measurement of tumor purity by digital pathology. While large deviations were observed for only a minority of samples, drastically wrong bioinformatic estimations of the tumor purity with errors of 25% or more can occur and were observed in exemplary cases. Thus, accurate determination of tumor purity, ideally by digital image analysis, is critical for reliable determination of HRD scores.
Third, HRDsum scores determined by Sclust were systematically lower than those determined by the reference implementation, with a median deviation of −11. Using Sclust, only 29% of the tumors in the study cohort were classified as HRD-positive, while 64% of the tumors were classified as HRD-positive using the reference method. In the study cohort, the best separation of classes H1a and H1b from class H3 was accomplished using Sclust, but the advantage of Sclust over the other methods was not statistically significant. In the TCGA-OV data, Sequenza separated class H1a significantly better from class H3 than Sclust. As a limitation, Sclust could not call copy numbers for 11% of the samples, so HRD scores could not be determined for these samples. Further studies are warranted to compare the performance of Sclust and Sequenza for the calculation of HRD scores and for the prediction of response to PARPi.
Our data were obtained by WES, which will be increasingly used in molecular cancer profiling. While we have not analyzed panel sequencing or WGS data in the current study, our results strongly suggest that the parameters analyzed here also influence other types of NGS data, as similar bioinformatic pipelines are used for the calculation of CNA and of HRD scores. Additional studies are required to investigate this further. Another limitation of the current study is that clinical outcome data were not yet mature enough for evaluation.
In summary, our recently published in silico analysis [3] and the study results presented here show a significant influence of tumor purity and of the bioinformatic methods, algorithms and parameters used on HRD scores and HRD classification. Since these influences can shift HRD scores and change the HRD classification in ways that directly affect patient management, rigorous standardization is critical, and commercial providers of assays used in approval trials should disclose the corresponding preanalytical and bioinformatic settings. The same applies to clinical laboratories reporting molecular profiling results, including HRD, based on laboratory-developed tests. Our study identified tumor purity as a critical parameter for the determination of HRD scores, found considerable variability of tumor purity measurements across the applied methods, and recommends incorporating digital image analysis for a more accurate determination of tumor purity.
Funding sources
This work was funded by the state of Baden-Wuerttemberg within the network of Centers for Personalized Medicine Baden-Wuerttemberg (Zentren für Personalisierte Medizin, ZPM).
CRediT authorship contribution statement
Michael Menzel: Methodology, Software, Validation, Formal analysis, Investigation, Visualization, Writing – original draft, Writing – review & editing. Volker Endris: Methodology, Investigation, Writing – review & editing. Constantin Schwab: Investigation, Writing – review & editing. Klaus Kluck: Software, Formal analysis, Data curation, Writing – review & editing. Olaf Neumann: Investigation, Writing – review & editing. Susanne Beck: Software, Writing – review & editing. Markus Ball: Methodology, Writing – review & editing. Christian Schaaf: Writing – review & editing. Stefan Fröhling: Writing – review & editing. Peter Lichtner: Investigation, Writing – review & editing. Peter Schirmacher: Writing – review & editing. Daniel Kazdal: Methodology, Investigation, Writing – review & editing. Albrecht Stenzinger: Conceptualization, Methodology, Investigation, Writing – original draft, Writing – review & editing, Supervision, Funding acquisition. Jan Budczies: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Visualization, Writing – original draft, Writing – review & editing, Supervision, Funding acquisition.
Declaration of Competing Interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:
MM holds stock in Illumina. ON reports personal fees from Novartis outside the submitted work. SF received payment or honoraria for lectures from Illumina, Roche; travel grants from Illumina, Roche; participated on Data Safety Monitoring Board or Advisory Board for Illumina, Roche; received equipment, materials, drugs, medical writing, gifts or other services from AstraZeneca, Illumina, Pfizer, PharmaMar, Roche. PS received payment or honoraria for lectures from AstraZeneca, Incyte, Janssen; participated on Data Safety Monitoring Board or Advisory Board for Bristol Myers Squibb, MSD, AstraZeneca, Roche. DK received personal fees outside the submitted work from AstraZeneca, Bristol-Myers Squibb, Pfizer, Lilly, Agilent, Takeda. AS participated on Advisory Board/Speaker's Bureau for AstraZeneca, AGCT, Bayer, BMS, Eli Lilly, Illumina, Janssen, MSD, Novartis, Pfizer, Roche, Seattle Genetics, Takeda, Thermo Fisher; received grants from Bayer, BMS, Chugai, Incyte. JB received grants from German Cancer Aid; received personal fees outside the submitted work from MSD. VE, CS, KK, SB, MB, CS, PL declare no competing interests.
Footnotes
Supplementary material associated with this article can be found in the online version at doi:10.1016/j.tranon.2023.101706.
Contributor Information
Albrecht Stenzinger, Email: albrecht.stenzinger@med.uni-heidelberg.de.
Jan Budczies, Email: jan.budczies@med.uni-heidelberg.de.
Data availability
The molecular data of the 100 ovarian carcinomas were uploaded to the EBI BioStudies repository (S-BSST1077). Sequencing raw data are available upon request to the corresponding author.
References
- 1. Moschetta M., George A., Kaye S.B., Banerjee S. BRCA somatic mutations and epigenetic BRCA modifications in serous ovarian cancer. Ann. Oncol. 2016;27(8):1449–1455. doi: 10.1093/annonc/mdw142.
- 2. Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474(7353):609–615. doi: 10.1038/nature10166.
- 3. Rempel E., Kluck K., Beck S., Ourailidis I., Kazdal D., Neumann O., et al. Pan-cancer analysis of genomic scar patterns caused by homologous repair deficiency (HRD). NPJ Precis. Oncol. 2022;6(1):36. doi: 10.1038/s41698-022-00276-6.
- 4. Ruscito I., Bellati F., Ray-Coquard I., Mirza M.R., du Bois A., Gasparri M.L., et al. Incorporating PARP-inhibitors in primary and recurrent ovarian cancer: a meta-analysis of 12 phase II/III randomized controlled trials. Cancer Treat. Rev. 2020;87. doi: 10.1016/j.ctrv.2020.102040.
- 5. Tew W.P., Lacchetti C., Ellis A., Maxian K., Banerjee S., Bookman M., et al. PARP inhibitors in the management of ovarian cancer: ASCO guideline. J. Clin. Oncol. 2020;38(30):3468–3493. doi: 10.1200/JCO.20.01924.
- 6. Stewart M.D., Merino Vega D., Arend R.C., Baden J.F., Barbash O., Beaubier N., et al. Homologous recombination deficiency: concepts, definitions, and assays. Oncologist. 2022;27(3):167–174. doi: 10.1093/oncolo/oyab053.
- 7. Dietlein F., Thelen L., Reinhardt H.C. Cancer-specific defects in DNA repair pathways as targets for personalized therapeutic approaches. Trends Genet. 2014;30(8):326–339. doi: 10.1016/j.tig.2014.06.003.
- 8. Abkevich V., Timms K.M., Hennessy B.T., Potter J., Carey M.S., Meyer L.A., et al. Patterns of genomic loss of heterozygosity predict homologous recombination repair defects in epithelial ovarian cancer. Br. J. Cancer. 2012;107(10):1776–1782. doi: 10.1038/bjc.2012.451.
- 9. Birkbak N.J., Wang Z.C., Kim J.Y., Eklund A.C., Li Q., Tian R., et al. Telomeric allelic imbalance indicates defective DNA repair and sensitivity to DNA-damaging agents. Cancer Discov. 2012;2(4):366–375. doi: 10.1158/2159-8290.CD-11-0206.
- 10. Popova T., Manie E., Rieunier G., Caux-Moncoutier V., Tirapo C., Dubois T., et al. Ploidy and large-scale genomic instability consistently identify basal-like breast carcinomas with BRCA1/2 inactivation. Cancer Res. 2012;72(21):5454–5462. doi: 10.1158/0008-5472.CAN-12-1470.
- 11. Telli M.L., Timms K.M., Reid J., Hennessy B., Mills G.B., Jensen K.C., et al. Homologous recombination deficiency (HRD) score predicts response to platinum-containing neoadjuvant chemotherapy in patients with triple-negative breast cancer. Clin. Cancer Res. 2016;22(15):3764–3773. doi: 10.1158/1078-0432.CCR-15-2477.
- 12. Timms K.M., Abkevich V., Hughes E., Neff C., Reid J., Morris B., et al. Association of BRCA1/2 defects with genomic scores predictive of DNA damage repair deficiency among breast cancer subtypes. Breast Cancer Res. 2014;16(6):475. doi: 10.1186/s13058-014-0475-x.
- 13. Miller R.E., Leary A., Scott C.L., Serra V., Lord C.J., Bowtell D., et al. ESMO recommendations on predictive biomarker testing for homologous recombination deficiency and PARP inhibitor benefit in ovarian cancer. Ann. Oncol. 2020;31(12):1606–1622. doi: 10.1016/j.annonc.2020.08.2102.
- 14. Ray-Coquard I., Pautier P., Pignata S., Perol D., Gonzalez-Martin A., Berger R., et al. Olaparib plus bevacizumab as first-line maintenance in ovarian cancer. N. Engl. J. Med. 2019;381(25):2416–2428. doi: 10.1056/NEJMoa1911361.
- 15. Vergote I., Gonzalez-Martin A., Ray-Coquard I., Harter P., Colombo N., Pujol P., et al. European experts consensus: BRCA/homologous recombination deficiency testing in first-line ovarian cancer. Ann. Oncol. 2022;33(3):276–287. doi: 10.1016/j.annonc.2021.11.013.
- 16. Favero F., Joshi T., Marquard A.M., Birkbak N.J., Krzystanek M., Li Q., et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 2015;26(1):64–70. doi: 10.1093/annonc/mdu479.
- 17. Van Loo P., Nordgard S.H., Lingjaerde O.C., Russnes H.G., Rye I.H., Sun W., et al. Allele-specific copy number analysis of tumors. Proc. Natl. Acad. Sci. U. S. A. 2010;107(39):16910–16915. doi: 10.1073/pnas.1009843107.
- 18. Nilsen G., Liestol K., Van Loo P., Moen Vollan H.K., Eide M.B., Rueda O.M., et al. Copynumber: efficient algorithms for single- and multi-track copy number segmentation. BMC Genomics. 2012;13:591. doi: 10.1186/1471-2164-13-591.
- 19. Cun Y., Yang T.P., Achter V., Lang U., Peifer M. Copy-number analysis and inference of subclonal populations in cancer genomes using Sclust. Nat. Protoc. 2018;13(6):1488–1501. doi: 10.1038/nprot.2018.033.
- 20. Bankhead P., Loughrey M.B., Fernandez J.A., Dombrowski Y., McArt D.G., Dunne P.D., et al. QuPath: open source software for digital pathology image analysis. Sci. Rep. 2017;7(1):16878. doi: 10.1038/s41598-017-17204-5.
- 21. Kazdal D., Rempel E., Oliveira C., Allgauer M., Harms A., Singer K., et al. Conventional and semi-automatic histopathological analysis of tumor cell content for multigene sequencing of lung adenocarcinoma. Transl. Lung Cancer Res. 2021;10(4):1666–1678. doi: 10.21037/tlcr-20-1168.
- 22. Sztupinszki Z., Diossy M., Krzystanek M., Reiniger L., Csabai I., Favero F., et al. Migrating the SNP array-based homologous recombination deficiency measures to next generation sequencing data of breast cancer. NPJ Breast Cancer. 2018;4:16. doi: 10.1038/s41523-018-0066-6.
- 23. Virtanen P., Gommers R., Oliphant T.E., Haberland M., Reddy T., Cournapeau D., et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods. 2020;17(3):261–272. doi: 10.1038/s41592-019-0686-2.
- 24. Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830.
- 25. Hunter J.D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 2007;9(3):90–95. doi: 10.1109/mcse.2007.55.
- 26. Waskom M. seaborn: statistical data visualization. J. Open Source Software. 2021;6(60). doi: 10.21105/joss.03021.
- 27. Robin X., Turck N., Hainard A., Tiberti N., Lisacek F., Sanchez J.C., et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinf. 2011;12(1). doi: 10.1186/1471-2105-12-77.
- 28. Singer P. CorrelationStats. https://github.com/psinger/CorrelationStats. 2021-04-23 (accessed 2022-07-11).
- 29. Cui Y., Chen X., Luo H., Fan Z., Luo J., He S., et al. BioCircos.js: an interactive Circos JavaScript library for biological data visualization on web applications. Bioinformatics. 2016;32(11):1740–1742. doi: 10.1093/bioinformatics/btw041.
- 30. Vulliard L. Biocircos. https://github.com/lvulliard/BioCircos.R. 2019-05-19 (accessed 2022-07-11).
- 31. Horak P., Klink B., Heining C., Groschel S., Hutter B., Frohlich M., et al. Precision oncology based on omics data: the NCT Heidelberg experience. Int. J. Cancer. 2017;141(5):877–886. doi: 10.1002/ijc.30828.
- 32. Lih C.J., Harrington R.D., Sims D.J., Harper K.N., Bouk C.H., Datta V., et al. Analytical validation of the next-generation sequencing assay for a nationwide signal-finding clinical trial: molecular analysis for therapy choice clinical trial. J. Mol. Diagn. 2017;19(2):313–327. doi: 10.1016/j.jmoldx.2016.10.007.
- 33. Priestley P., Baber J., Lolkema M.P., Steeghs N., de Bruijn E., Shale C., et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature. 2019;575(7781):210–216. doi: 10.1038/s41586-019-1689-y.
- 34. Stenzinger A., Edsjo A., Ploeger C., Friedman M., Frohling S., Wirta V., et al. Trailblazing precision medicine in Europe: a joint view by Genomic Medicine Sweden and the Centers for Personalized Medicine, ZPM, in Germany. Semin. Cancer Biol. 2021. doi: 10.1016/j.semcancer.2021.05.026.
- 35. Kahles A., Goldschmid H., Volckmar A.L., Plöger C., Kazdal D., Penzel R., et al. Struktur und Inhalt der EU-IVDR. Die Pathol. 2022. doi: 10.1007/s00292-022-01077-1.
- 36. Vogeser M., Bruggemann M., Lennerz J., Stenzinger A., Gassner U.M. Laboratory-developed tests in the new European Union 2017/746 regulation: opportunities and risks. Clin. Chem. 2021;68(1):40–42. doi: 10.1093/clinchem/hvab215.