Abstract
High-grade serous ovarian cancer (HGSC) is the most lethal histotype of ovarian cancer, and the majority of cases present with metastasis and late-stage disease. Over the last few decades, the overall survival for patients has not significantly improved, and there are limited targeted treatment options. We aimed to better characterize the distinctions between primary and metastatic tumors based on short- or long-term survival. We characterized 39 matched primary and metastatic tumors by whole exome and RNA sequencing. Of these, 23 were short-term (ST) survivors (overall survival (OS) < 3.5 years) and 16 were long-term (LT) survivors (OS > 5 years). We compared somatic mutations, copy number alterations, mutational burden, differential gene expression, immune cell infiltration, and gene fusion predictions between the primary and metastatic tumors and between ST and LT survivor cohorts. There were few differences in RNA expression between paired primary and metastatic tumors, but significant differences between the transcriptomes of LT and ST survivors in both their primary and metastatic tumors. These findings will improve the understanding of the genetic variation in HGSC that exists between patients with different prognoses and better inform treatments by identifying new targets for drug development.
Subject terms: Cancer genomics, Ovarian cancer
Analysis of whole-exome and RNA-seq data from matched primary and metastatic tumor samples in a cohort of high-grade serous ovarian cancer patients reveals distinct genomic and transcriptomic alterations associated with survival outcomes.
Introduction
High-grade serous cancer (HGSC) of the ovary, fallopian tube, or peritoneum is the most lethal ovarian cancer histotype and the second most common gynecologic malignancy1,2. Over the past few decades, chemotherapy has been the standard of care, yet overall survival (OS) has not significantly improved3,4. PARP inhibitors, which target base-excision DNA repair mechanisms and cause synthetic lethality in tumors of patients harboring BRCA1 or BRCA2 mutations or homologous recombination deficiencies, can be used as maintenance therapies, but are only applicable to ~50% of HGSC patients5,6. The majority of cases, about 80%, present with late-stage disease, when the tumor has already metastasized3,4,7. These patients have only a 29.2% chance of surviving longer than 5 years4. Therefore, to improve patient survival, we sought to better characterize the genomic and transcriptomic landscapes of matched primary and metastatic ovarian cancers and to identify novel targets for drug development, especially in a more aggressive metastatic disease setting.
Large-scale tumor characterizations by consortia such as The Cancer Genome Atlas (TCGA) have established that primary HGSC tumors harbor ubiquitous TP53 mutations and copy number alterations, and a low prevalence of other recurrently mutated genes8. Prior genomic studies of ovarian cancers have only included large numbers of primary tumors rather than comparing matched primary and metastatic disease in the context of outcomes. A recent study by Yang et al. characterized clinical and genomic features from HGSC primary tumors that correlated with short-term (ST, OS < 2 years) and long-term (LT, OS > 10 years) survival9. While this study does provide evidence that there are clinical and genomic characteristics unique to patient survival in primary tumors, metastatic tumors were not included. Study design is further nuanced by the context of survival duration, since exceptional survivors (>10 years of survival) have a high prevalence of BRCA mutations and are known to respond well to standard therapy10. One study has examined genomic and transcriptomic sequencing from matched primary and metastatic tumors in the context of response to chemotherapy or surgical resection11. Another study identified a transcriptome signature that distinguished between primary and metastatic tumors but did not relate this to survival12.
Here, we sought to determine whether there are unique features in the genomes and transcriptomes of metastatic tumors from short-term survivors when compared to their matched primary tumors and/or to paired primary and metastatic tumors from long-term survivors. Our cohort design examines these differences between tumors from patients within the median survival time for ovarian cancer13. In this context, we compared somatic variants, copy number alterations, mutational burden, differential expression, immune cell infiltrates, and gene fusion predictions between chemo-naïve primary and metastatic tumors from 23 HGSC short-term (ST, OS < 3.5 years) survivors and 16 HGSC long-term (LT, OS > 5 years) survivors using whole-exome sequencing (WES) and RNA sequencing (RNA-seq).
Results
Characterization of genomic landscape
We compared somatic variants, copy number alterations, and mutational burden between the primary and metastatic tumors of the ST and LT survival groups. Our cohort of patient tumors exhibited characteristics typical of those seen in previously sequenced HGSC tumors, such as nearly ubiquitous TP53 mutations, high numbers of copy number alterations, and a low number of recurrently mutated genes (Fig. 1a, b). As was found in the Yang et al. study, our cohort of LT survivors also exhibited a significantly greater mutational burden than the ST survivors (Mann–Whitney–Wilcoxon test statistic = 611, p-value = 5.1e-6)9. There was no statistically significant difference between the mutational burden of paired primary and metastatic tumors (MW stat = 611, p-value = 0.4299) (Fig. 2c).
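To make the kind of group comparison reported above concrete, the following is a minimal sketch of a two-sided Mann–Whitney U test in plain Python. It is not the authors' analysis code: the function name `mann_whitney_u`, the use of a tie-corrected normal approximation (without continuity correction), and the burden values are all illustrative assumptions.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value).

    Uses midranks for ties and a tie-corrected normal approximation
    (an assumption; exact tests are preferable for very small samples).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    # Pool the values, remembering which group (0 = x, 1 = y) each came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0          # mean of 1-based ranks i+1 .. j
        t = j - i                            # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided

# Hypothetical mutational burden values (variants per megabase), not study data.
lt_burden = [9.1, 7.4, 8.8, 10.2]
st_burden = [4.2, 5.0, 3.9, 6.1, 4.7]
u, p = mann_whitney_u(lt_burden, st_burden)
print(f"U = {u:.1f}, two-sided p = {p:.4f}")
```

In practice an established statistics package would normally be used for this test; the sketch only spells out how the ranks and the statistic relate.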
Interestingly, in comparison to recurrently mutated genes identified in the TCGA cohort, we also observed that RB1 mutations were found only in primary and metastatic tumors of LT survivors. This is a finding consistent with studies analyzing exceptional HGSC survivors8,14. CDK12 was only mutated in the LT survivors. We confirmed previously published findings that tumors of LT survivors were more likely to have BRCA1 alterations and copy-altered segments when compared to ST survivors9. In total, we identified an average of 723 somatic mutations per LT survivor (10,124 SNVs total/14 patients) and an average of 591 somatic mutations per ST survivor (13,599 SNVs total/23 patients). Four patients exhibited somatic BRCA1 mutations with VAF > 30 (Supplementary Table 2). Six patients exhibited germline BRCA1 mutations and 1 LT survivor had a benign germline BRCA2 mutation (VAF > 30), all of which had mixed ClinVar-based clinical interpretations of pathogenic significance (Supplementary Tables 3 and 4)15.
We applied Classification of Ovarian Cancer signatures to our dataset and observed no statistically significant differences based on survivorship or tumor type among the signatures. The most prevalent signatures among our cohort were Mesenchymal and Differentiated (Fig. 1b)16. To compare the types of variants found in each of the tumors, we calculated the percentage of total variants identified in each patient that can be contextualized by the human cancer mutational signatures17,18. In our cohort, the most common signatures were for 5-methylcytosine deamination, mismatch repair, and double-strand break repair, along with a large number of mutations contributing to the unknown signature. There were no statistically significant differences between the signature percentages when comparing matched primary and metastatic tumors or ST and LT survivors. However, there was a higher percentage contribution to the mismatch repair signature in LT survivors compared to ST survivors, although this was not statistically significant (Supplementary Fig. 4). We also calculated percentages of the mutational nucleotide transitions and transversions. Transitions from C > T account for the largest percentage of total mutations in most of the tumor samples. The percentages of each mutation type remain relatively stable between the primary and metastatic tumors in the majority of patients, but large shifts in the mutational transitions and transversions can be seen in Patients 15, 20, 30 and 34. There were no statistically significant differences when comparing the mutational percentages of transitions and transversions between ST and LT survivors and primary and metastatic tumors (Supplementary Fig. 5).
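The transition and transversion percentages described above can be illustrated with a short sketch. It assumes SNVs are available as (reference, alternate) base pairs and collapses each one onto the pyrimidine strand, a common convention that yields six substitution classes; the function names and example SNVs are hypothetical and not taken from the study's pipeline.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PYRIMIDINES = {"C", "T"}
TRANSITIONS = {"C>T", "T>C"}  # the remaining four classes are transversions

def substitution_class(ref, alt):
    """Collapse an SNV onto the pyrimidine strand, e.g. G>A becomes C>T."""
    if ref not in PYRIMIDINES:
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"

def substitution_percentages(snvs):
    """Return the percentage of SNVs falling in each substitution class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}

def transition_percentage(snvs):
    """Percentage of SNVs that are transitions (C>T or T>C after collapsing)."""
    pct = substitution_percentages(snvs)
    return sum(v for cls, v in pct.items() if cls in TRANSITIONS)

# Hypothetical SNVs for one tumor sample.
example = [("C", "T"), ("G", "A"), ("C", "A"), ("T", "G")]
print(substitution_percentages(example))
print(transition_percentage(example))
```

Comparing these per-sample percentages between a patient's primary and metastatic tumor is one way the shifts noted for individual patients could be examined.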
ST survivors exhibited a higher percentage of shared variants
For each patient, we calculated the percentage of called variants that were unique to the primary or to the metastatic tumor, or were shared between the two tumors. We observed that there were higher percentages of shared variants between the primary and metastatic tumors for ST survivors compared to LT survivors, although this was not statistically significant (MW statistic = 117.0, p-value = 0.0866) (Fig. 2a, b). Of note, all of our LT survivor samples were FFPE whereas the ST survivors included FF specimens. There was no FFPE/FF specific variant filtering applied in our variant calling pipeline, but each sample did undergo quality control and log-likelihood ratio (LLR) filtering (https://github.com/genome/docker-somatic-llr-filter/blob/master/somatic_llr_filter.py). Thus, the differences seen in the number of shared variants of the ST and LT cohorts could be affected by FFPE artifacts from sample preparation.
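As a rough illustration of the per-patient calculation described above, the sketch below partitions a patient's variants into primary-only, metastasis-only, and shared sets. It assumes each variant is keyed by a (chromosome, position, reference, alternate) tuple and that percentages are taken over the union of both call sets; both choices, the function name, and the example variants are assumptions rather than details confirmed by the text.

```python
def variant_sharing(primary_variants, metastatic_variants):
    """Return percentages of variants that are primary-only, met-only, or shared.

    Percentages are computed over the union of both call sets (an assumption).
    """
    primary = set(primary_variants)
    metastatic = set(metastatic_variants)
    union = primary | metastatic
    if not union:
        return {"primary_only": 0.0, "metastatic_only": 0.0, "shared": 0.0}
    total = len(union)
    return {
        "primary_only": 100.0 * len(primary - metastatic) / total,
        "metastatic_only": 100.0 * len(metastatic - primary) / total,
        "shared": 100.0 * len(primary & metastatic) / total,
    }

# Hypothetical variant calls for one patient.
primary_calls = [("17", 100, "G", "A"), ("1", 200, "C", "T"), ("2", 300, "A", "G")]
metastatic_calls = [("1", 200, "C", "T"), ("2", 300, "A", "G"), ("3", 400, "T", "C")]
print(variant_sharing(primary_calls, metastatic_calls))
```

The per-patient "shared" percentages from the two survival groups could then be compared with a rank-based test such as the Mann–Whitney test reported above.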
Within our cohort, all but 2 patients had tumors that harbored TP53 variants (Supplementary Table 5). Of the 35 patients that carried TP53 mutations, all but one patient shared the same TP53 mutation between their primary and metastatic tumor and 4 patients carried multiple TP53 mutations. The majority of TP53 mutations were missense or frame-shift deletions within the DNA-binding domain. One known hotspot mutation, R273H, was present in 3 patients, 2 of whom were ST survivors. The 2 ST survivors that harbored this hotspot mutation had an overall survival ranging between 17 and 19.6 months, whereas the LT survivor lived more than 147 months after their diagnosis. In the TCGA-OV patient cohort, 2% (11/489) of tumor samples also had the TP53 R273H mutation, compared to 8% (3/37) in our cohort8,19,20.
LT survivors exhibited more copy number alterations
Concordant with findings from the TCGA-OV project, CNAs were abundant in these data, with CNAs observed in every sample, and the number of copy-altered segments ranged from 33 to 739, with segment lengths ranging from 3636 to 229,754,969 nucleotides. The average segment length was 3,997,903 nucleotides (Supplementary Fig. 1a–c)8. The LT survivors had a greater proportion of copy-altered segments (p = 0.03, 95% CI 0.004–0.11), driven by a greater proportion of amplifications (p = 0.01, 95% CI 0.01–0.1). There was no significant difference between primary tumor samples and metastases, nor were there significant differences in mean estimated ploidy between groups.
We identified recurrent CNAs in our cohort overall and in subsets of the ST survivors, LT survivors, primary tumors, and metastatic tumors (Supplementary Fig. 2a–c)21. Overall, our cohort exhibited a total of 254 recurrent copy-altered segments, including 85 amplified segments and 169 deleted segments with a 90% confidence interval. We identified 2333 genes within the amplified peaks and 4904 genes within the deleted segments. Region 20q13.12, previously identified in other ovarian tumors, was amplified in our cohort, along with other genes that have previously been found amplified in ovarian cancer such as CCNE1, ERBB2, RSF1 and deleted genes like BRCA18,22. Among the cohort and subset analyses, there were more recurrently deleted segments than amplified segments, and regions within 8q, 3q, and 19q were among the most recurrently amplified segments, while peaks on 9q, 15q, 16q, and 17q were among the most recurrently deleted segments. We correlated the CNA and our RNA-seq data for all samples in our cohort, utilizing the threshold values GISTIC2.0 calculates with their corresponding RNA-seq FPKM values for the genes involved in altered regions (Supplementary Fig. 2c). The relationship between copy number and expression is not simple, but the medians of the data suggest that more amplified genes trend toward having higher RNA expression.
Our GISTIC2.0 analysis comparing ST to LT survivors revealed that there were more recurrent copy-altered segments among the ST survivors (ST = 79 amplified, 198 deleted; LT = 60 amplified, 101 deleted). Comparing the primary and metastatic tumor analyses showed that the metastatic tumors had more recurrent segments (primary = 39 amplified, 135 deleted; metastatic = 63 amplified, 157 deleted). Consistent with published data9, CCNE1 was amplified in the ST survivor, primary tumor, and metastatic tumor sample subsets, but not among the LT survivor samples. Both the primary tumor and metastatic tumor subsets were significantly amplified at 19q12 and at 20q13.12, while the ST subset had 19q12 amplified and the LT subset had 20q13.12 significantly amplified.
Differentially expressed genes correlate with survival
We calculated differential expression (DE) of genes between the ST and LT survivors in both primary and metastatic tumors. Overall, there were distinct transcriptomes that correlated with ST or LT survival, both within the tumor cohorts separately and when combining all patients regardless of tumor type (Fig. 3a, b). Within the metastatic tumor cohort, there were 4792 DE genes (DEGs) between ST and LT survivors, with an FDR < 0.01, after selecting for only protein-coding genes. Genes such as SZRD1 and ERV3-1 were upregulated in the metastatic tumors of ST survivors, as previously reported in other solid cancer types such as cervical and colorectal cancer23,24.
In order to identify DEGs that were specifically associated with survival in the metastatic tumors, we filtered out any genes that were also differentially expressed between the ST and LT survivors’ primary tumors. This revealed 325 genes only DE in metastatic tumors, with 295 of these (90.7%) downregulated in ST survivors (Fig. 3a). The DEGs unique to metastatic tumors of ST survivors are enriched for several Biological Processes GO terms with an FDR < 0.05, such as “regulation of cellular biosynthetic process” (Supplementary Fig. 6)25–28. A GO enrichment analysis on the 30 upregulated DEGs unique to metastatic tumors showed enrichment with an FDR < 0.05 for several Molecular Function GO terms associated with DNA binding and transcriptional activity (Fig. 3c)25,27,28. This enrichment is most likely due to the 13/30 of those upregulated DEGs that are in the zinc finger family. We also used DAVID to find enrichment of the KEGG and Biocarta pathways within our DEGs, and although there were no significantly enriched Biocarta terms, there were 9 KEGG pathways enriched. Some of these included “Adherens junction” and “protein processing in endoplasmic reticulum” (Supplementary Fig. 7). Of note, FOXL2NB and PTCH2 have correlated with poor survival in other cancer types29,30. There is evidence that OGN plays a role in EMT, and PRDX1 has been studied as a prognostic marker in lung cancer31,32. We also calculated DE between genes of ST and LT survivors within the primary tumors. We found that there was a total of 4248 DE genes with FDR < 0.01. After filtering for protein-coding genes, we narrowed our list of DE genes to 3694, with 502 DEGs that were specifically differentially expressed only in primary tumors (Fig. 3b).
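The filtering step described above, in which genes that are also differentially expressed in primary tumors are removed from the metastatic DEG list, amounts to a set difference. The sketch below assumes each comparison's results are available as a mapping from gene to (log2 fold change, FDR), and that a positive log2 fold change means higher expression in ST survivors; that sign convention, the function name, and the placeholder gene names are illustrative assumptions.

```python
def metastasis_specific_degs(met_results, primary_results, fdr_cutoff=0.01):
    """Return (upregulated, downregulated) genes that are DE only in metastases.

    Each results dict maps gene -> (log2_fold_change, fdr) for the ST vs LT
    comparison. Positive fold change = higher in ST survivors (assumed).
    """
    met_sig = {g for g, (_, fdr) in met_results.items() if fdr < fdr_cutoff}
    primary_sig = {g for g, (_, fdr) in primary_results.items() if fdr < fdr_cutoff}
    unique_to_met = met_sig - primary_sig
    up = sorted(g for g in unique_to_met if met_results[g][0] > 0)
    down = sorted(g for g in unique_to_met if met_results[g][0] < 0)
    return up, down

# Placeholder results; gene names and values are invented for illustration.
met = {
    "GENE_A": (1.8, 0.001),    # significant in metastases only
    "GENE_B": (-2.1, 0.004),   # significant in metastases only
    "GENE_C": (-1.2, 0.002),   # significant in both, so filtered out
    "GENE_D": (0.5, 0.30),     # not significant
}
primary = {
    "GENE_A": (0.2, 0.60),
    "GENE_C": (-1.0, 0.005),
}
up, down = metastasis_specific_degs(met, primary)
print("up in ST:", up)
print("down in ST:", down)
```

The same function with the arguments swapped would give DEGs specific to primary tumors.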
When all tumors were included in the DE analysis, there were a total of 7304 protein-coding DEGs between ST and LT survivors (Supplementary Data 1). The top 50 upregulated and top 50 downregulated DEGs are included in Supplementary Fig. 8. Additionally, we calculated DEGs between primary and metastatic tumors and identified only 4 DEGs with an FDR < 0.01. When we relaxed the FDR threshold to <0.1, the number of DEGs increased to 15. Of those, 5 genes (WIPF3, STAR, SCUBE1, PEG3, CNTNAP2) were also found in the top 100 DEGs identified by Sallinen et al., which compared DEGs between 10 matched primary and metastatic ovarian tumors having an FDR < 0.112.
Differentially expressed lncRNAs correlate with survival
From the RNA-sequencing data, we identified several long noncoding RNA transcripts (lncRNAs) that were among the top differentially expressed transcripts between the ST and LT survivors in both metastatic and primary tumors. Within the metastatic tumors, we identified 11 lncRNAs that were differentially expressed, and all but one were upregulated in ST survivors (Fig. 3d). This set of lncRNAs included ARRDC1-AS1, which was shown to be a part of a potential lncRNA prognostic signature in breast cancer33. Among the primary tumors, we identified 36 lncRNAs, of which 35 were upregulated in ST survivors. Of these 36 lncRNAs, 9 (25%; FAR2P1, ARRDC1-AS1, MIRLET7BHG, OVCH1-AS1, C11orf72, FLJ22447, LACTB2-AS1, ALOX12-AS1, and C5orf56) overlapped with the lncRNAs identified in our metastatic tumor cohort.
Tumors from ST survivors harbored recurrent gene fusion predictions
A total of 1164 gene fusions were predicted among our tumor sample cohort, 35 of which were recurrent (seen in at least 2 samples) and unique to ST survivors (Supplementary Table 6). The higher number of gene fusions identified in ST survivors was likely due to higher-quality RNA-sequencing data, since this subgroup included FF tumor samples, whereas all LT survivor samples were FFPE.
INTEGRATE detected several ESR1 gene fusions in our tumor cohort, which have previously been implicated in breast and ovarian cancer9,34. In particular, the ESR1 > CCDC170 recurrent gene fusion identified by Yang et al. was present in 2 of our tumor samples, 1 ST primary tumor (5 reads) and 1 LT metastatic tumor (7 reads). Interestingly, we also noticed that a total of 33 gene fusions involved collagen genes, 32 of which were identified in ST survivors, 21 of which were in metastatic tumors, and 20 of which were in-frame fusions (Supplementary Table 7). Pathway analysis showed that the genes involved in recurrent gene fusions in our cohort were significantly enriched for terms related to “collagen chain trimerization” and “Collagen degradation” in the PANTHER reactome set, which is interesting given the known role of collagen in the ovarian cancer tumor microenvironment25–28.
Immune cell population abundances
We used the program Cibersort to estimate the fraction of immune cell types in our tumor samples (Fig. 4). The immune cell groups CD4 T-Cells, macrophages, and monocytes had the highest fractions in many of the tumor samples. There was much variability in immune cell type fractions across patients, but there were few significant differences in immune cell fractions between primary or metastatic tumors or between ST and LT survivors among the 22 immune cell types (Supplementary Fig. 9). Of note, CD4 naïve T-cells (higher fractions in ST), follicular helper T-cells (higher fractions in LT), regulatory T-cells (higher fractions in LT), and activated dendritic cells (higher fractions in LT) were significantly different between ST and LT survivors with Mann–Whitney statistical p-values < 0.05. Between the primary and metastatic tumors, the CD8 T-cells, activated CD4 memory T-cells, and neutrophils were significantly higher in metastatic tumors based on Wilcoxon statistical p-values < 0.05. A chart of the statistical differences between all of the subsets for the 22 immune cell fractions is in Supplementary Table 8.
Lee et al. found significant abundance differences of M2 macrophages and monocytes between their R0 and NACT patient groups, and a significant difference between the abundance of resting CD4 memory T cells between primary and metastatic tumors, but these patterns did not appear in our dataset11. Thorsson et al. performed immunogenomic analysis across cancer types in TCGA and identified six immune subtypes35. In their analysis, the ovarian cancer cohort correlated the most with their C2 IFN-γ dominant signature, which is defined by having high M1 and M2 macrophage polarization and strong CD8 signal. This is consistent with the higher fraction of macrophages we found in our Cibersort analysis. The ovarian cohort also had representation of their C1 wound healing and C4 lymphocyte depleted signatures, but did not have representation for their C3 inflammatory, C5 immunologically quiet, or C6 TGF-β dominant signatures. The lack of these signatures is consistent with our cohort’s low immune cell fractions for several lymphocytes and the variability between patient samples. Since our cohort included metastatic tumors that are not represented in TCGA, perhaps a more specific immunogenomic analysis with more metastatic tumors for ovarian cancer is necessary to better understand the immune landscape in these tumors35.
Discussion
HGSC can rapidly metastasize before patients experience symptoms; therefore, many patients are diagnosed at late stages and have limited treatment options. Despite many studies of the genetics of HGSC tumors, we have yet to fully characterize and identify genetic biomarkers of HGSC metastatic tumors, especially those with poor survival outcomes. In this study we built on previous studies to better characterize the genomic features of matched primary and metastatic tumors in the context of patient survival, so that we might identify unique features of ST metastatic tumors.
We found supporting data for RB1 mutations as a marker for long survivorship as previously discovered, since RB1 mutations were identified exclusively in our LT survivor cohort8,14. In our study we found that there was a higher percentage of shared variants between the primary and metastatic tumors of ST survivors compared to LT survivors. Although this difference was not significant, it may suggest that tumors from ST survivors are more clonal and genetically similar than tumors from LT survivors. This could mean that tumors from ST survivors are inherently more resistant to treatments, since both their primary and metastatic tumors are genetically similar. However, other studies of the clonality of HGSC tumors have yet to find a correlative pattern between clonality and survival1,7,36–38; hence, many more tumor samples will need to be analyzed to answer this question. Shared variants that are likely to be present in all clones of the tumor may be the best suited for targeted therapies. With the advent of single-cell sequencing, we may now be able to answer more questions about the heterogeneity and clonal development of HGSC tumors39.
TP53 mutations are a hallmark of high-grade serous ovarian cancer, and the TP53 gene has known hotspot mutations across cancer types. One of these hotspot mutations, R273H, was identified in 3 patients within our cohort, two of whom had an overall survival ranging from 17 to 34 months. In the TCGA Genomic Data Commons Portal, there are a total of 99 cases across cancer types that harbor a mutation at this position in TP53, 9 of which are in ovarian cancer samples. Recent functional studies have shown that this particular mutation results in a p53 gain-of-function that may promote metastasis in colorectal, esophageal, and breast cancers. Additionally, breast cancer cell lines with an R273H gain-of-function have been found to have improved response to a combination of PARPi and a DNA-damaging agent40–42. If this is also seen in ovarian cancer cells, it may lead to additional patients receiving PARP inhibitor and combination treatments in the future. However, further work in characterizing the therapeutic potential of this specific TP53 mutation in ovarian cancer is needed.
Yang et al. demonstrated genomic differences between HGSC primary tumors of ST and LT extreme survivors9. Our study focused on paired primary and metastatic tumors within the median survival range of ovarian cancer. Yang et al. demonstrated that more than 50% of tumors with BRCA mutations are from LT survivors with an OS > 10 years. This is consistent with evidence that HGSC patients with BRCA mutations respond better to chemotherapy10. Therefore, our study was better able to characterize the genomic features of tumors from patients with moderate to poor survival.
Recently, gene fusions have proven to be useful drug targets for cancer. For example, the identification of EML4 > ALK gene fusions in non-small cell lung cancer paved the way for the development of ALK inhibitors, and drugs targeting tumors of any cancer type with gene fusions involving NTRK genes have recently been approved by the FDA43,44. In our analyses, we identified an ESR1-CCDC170 gene fusion in our cohort as previously described by Yang et al.9. There is evidence that ESR1 gene fusions in estrogen receptor-positive breast cancer promote endocrine therapy resistance and metastasis; thus, ESR1 gene fusions may have a role in ovarian cancer progression34. Lei et al. demonstrated that CDK4/6 inhibitors were able to suppress growth that was driven by ESR1 gene fusions, indicating that gene fusion-driven cancers are treatable34. We found a higher number of gene fusion predictions in our tumors from ST survivors, and these could be a potential source for new drug development, but additional work will be needed to identify recurrent gene fusions that are targetable in ovarian cancer.
Previous studies have demonstrated that HGSC primary and metastatic tumors have similar transcriptomes. Two such studies using microarrays identified few differentially expressed genes between the HGSC primary and metastatic tumors45,46. In this study, we also identified few DEGs between primary and metastatic tumors. However, when we analyzed primary or metastatic tumors separately to find DEGs between ST and LT survivors, we found DEGs unique to metastatic tumors from patients with ST survival. This demonstrates that clinical outcome can be used to identify DEGs specific to metastatic tumors. We found several DEGs in the zinc finger family that were upregulated in the ST survivor metastatic tumors, suggesting that these tumors have more transcriptionally active genes that could be promoting metastasis or could be used as markers for poor prognosis, like FOXL2NB29 and PTCH230, which have correlated with poor survival in other cancer types. The large number of DEGs that we identified in our DE analyses is a resource for future biomarker studies, given their correlation with poor prognosis in ovarian and other cancer types and because we filtered for genes unique to the metastatic tumors in our cohort.
Additionally, we identified lncRNAs that were differentially expressed between survival groups. LncRNAs have only recently been studied for their role in cancer development and prognosis and have not yet been extensively studied in ovarian cancer47,48. There are some lncRNAs, such as RP11-190D6.2, that have been shown to be tumor suppressors or oncogenes in ovarian cancer cell lines48,49. Given that we found several lncRNAs having increased expression in ST survivors, they could serve as potential targets or biomarkers for future treatment development.
It should be noted that all of our LT survivor samples were FFPE, while the ST tumors included FF samples. Though we have applied rigorous quality control and filtering to our variant calling, we cannot exclude the possibility that sample preparation has some effect on the results. It is possible that the batch correction for FFPE and FF samples reduced the number of DEGs that could be identified in our cohort between the primary and metastatic tumors. The SVA batch correction may have overcorrected for unknown variation or may have introduced variation, but it was still necessary so that we could include all tumor samples in our DE analysis, regardless of RNA sample preparation. This dataset, like many using patient samples, has limitations but provides insights into the differences between HGSC primary and metastatic tumors in the context of moderate survival outcomes.
In conclusion, our research characterizes the exomes and transcriptomes of a unique dataset of matched primary and metastatic tumors in the context of patient survival. We were able to confirm many of the genomic features seen in previous studies8,9,11,12,16. We observed that the transcriptomes of primary and metastatic tumors were similar to each other, whereas the transcriptomes of tumors from ST and LT survivors differed by many more DEGs and DE lncRNAs. Our gene fusion analysis revealed fusions that have the potential to be new targets in HGSC and could warrant further functional studies. In short, our research improves the understanding of the genetic variation in HGSC metastases that exists between patients with different prognoses, which can better inform treatments and may identify new targets for drug development.
Methods
Patient cohort sample criteria
This study was approved by the Washington University in St. Louis Institutional Review Board (#201309075). Criteria for approval are met per 45 CFR 46.111 and/or 21 CFR 56.111 as applicable. Patients were included if they had FIGO stage III–IV ovarian cancer of serous (n = 38) or endometrioid (n = 1) histology and were undergoing primary cytoreductive surgery, and informed consent was obtained for all patients. All research conformed with the principles of the Declaration of Helsinki. We analyzed normal tissue, primary tumor, and metastatic tumor samples from a total of 39 patients.
Normal tissue samples consisted of adjacent non-malignant omentum or peritoneum. All tumors were collected during primary cytoreductive surgery, prior to any chemotherapy treatment, and were stored as either fresh frozen (FF) or formalin-fixed paraffin-embedded (FFPE). These patients were separated into two groups based on their overall survival. Patients who lived less than 3.5 years after their diagnosis were considered short-term (ST) survivors and patients who lived more than 5 years after diagnosis were considered long-term (LT) survivors (Table 1). Other clinical characteristics of patients are shown in Table 1. All patients received standard regimens of carboplatin and paclitaxel following cytoreductive surgery. More LT survivors received intraperitoneal (IP) chemotherapy than ST survivors (1 ST survivor, 5 LT survivors, p-value = 0.042). Otherwise, there were no differences in the use of bevacizumab or PARP inhibitor treatments between the two cohorts. All 23 ST patients and 12 LT patients had matched DNA and RNA extracted and sequenced. An additional 4 LT patients had tumor sequencing performed: Patients 031 and 035 had DNA from their matched primary and metastatic tumors sequenced, and Patients 032 and 040 had only RNA-sequencing from their matched primary and metastatic tumor tissue.
Table 1.
Characteristic | Short-term (ST) | Long-term (LT) |
---|---|---|
Patients (no.) | n = 23 | n = 16 |
Tumors | 46 | 32 |
Primary | 23 | 16 |
Metastatic | 23 | 16 |
Age (years) | 61.5 ± 19.5 | 57 ± 8.2 |
FIGO stage | | |
IIIA | 2 | 0 |
IIIC | 14 | 16 |
IV | 7 | 0 |
FIGO grade | | |
Moderately differentiated | 3 | 0 |
Poorly differentiated | 20 | 16 |
Histology | | |
Serous | 22 | 16 |
Endometrioid | 1 | 0 |
Median overall survival (OS) (months) | 21 (range: 0–41) | 111 (range: 82–195) |
Fresh frozen (FF) samples | | |
Primary | 17 | 0 |
Metastatic | 11 | 0 |
Formalin-fixed paraffin-embedded (FFPE) samples | | |
Primary | 6 | 16 |
Metastatic | 12 | 16 |
Whole-exome sequenced | 46 | 14 |
RNA-sequenced | 46 | 14 |
Clinical characteristics of patients diagnosed with HGSC at FIGO stage III–IV. Tumor samples were collected from patients during primary cytoreductive surgery at Washington University in St. Louis.
Exome and RNA sequencing
All tumors were examined by a pathologist to determine tumor cellularity and necrosis, and only samples of 60% tumor cellularity or higher with <20% necrosis were sequenced. DNA and RNA were extracted from FF or FFPE tissues using Qiagen’s DNeasy Blood & Tissue Kit and RNeasy kit. Whole-exome sequencing of DNA from matched primary tumor, metastatic tumor, and normal tissue samples was completed for 39 patients with the NimbleGen VCRome exome capture kit (NimbleGen Roche) according to the manufacturer’s protocol. Paired-end Illumina 151 bp reads were generated for normal samples to a minimum depth of 65x, while tumor samples were sequenced to a minimum of 139x, with an average coverage of ~300x. A coverage table provides per-sample coverage details (Supplementary Table 1). RNA sequencing of primary and metastatic tumor samples was performed using the Illumina TruSeq stranded Total RNA library kit following the manufacturer-recommended protocol. Paired-end Illumina sequencing of 151 bp read length yielded an average of approximately 125 million paired reads per sample and an average of approximately 134 million reads mapped per sample. Quality control metrics for the RNA-seq samples were generated using MultiQC and are reported in Supplementary Data 250.
Variant calling and genomic analysis
Exome sequencing data were aligned to human reference build GRCh37 using BWA-mem and deduplicated with Picard version 1.113 (refs. 51,52). Somatic variants were called from combined data using the Genome Modeling System pipeline53,54. In brief, variants were called from the union of 5 callers: Samtools version r932, Somatic Sniper version 1.0.4, VarScan version 2.3.6, Strelka version 1.0.11, and Mutect v1.1.4 (refs. 53,55–59). Indels were detected from the union of 4 callers: GATK somatic-indel version 5336, Pindel version 0.5, VarScan version 2.3.6, and Strelka version 1.0.11 (refs. 53,55,56,60,61). Further variant filtering was applied as described in Ghobadi et al.53. Briefly, SNVs and indels were discarded if they had below 20x coverage, appeared as artifacts in a panel of 905 normal exomes, or exceeded 0.1% frequency in the 1000 Genomes or NHLBI exome sequencing projects62,63. A Bayesian classifier (https://github.com/genome/genome/blob/master/lib/perl/Genome/Model/Tools/Validation/IdentifyOutliers.pm) was also applied, and variants classified as somatic with a binomial log-likelihood of at least 5 were retained53. All called variants compared in this study are provided in Supplementary Data 2. Mutational burden was calculated as the number of variants called per megabase, counting all variants that passed the QC filtering. The waterfall plot (Fig. 1a, b) depicting frequently mutated genes from TCGA-OV was generated using GenVisR8,64. Clinical significance of somatic and germline BRCA mutations was determined from ClinVar using its definitions of clinical significance terms (https://www.ncbi.nlm.nih.gov/clinvar/) (Supplementary Data 2, Supplementary Tables 3 and 4)15. Classification of Ovarian Cancer (CLOVAR) signatures were calculated according to parameters defined in Verhaak et al.16. Copy-altered segments were identified with VarScan (Supplementary Figs. 1 and 2)56.
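The filtering thresholds and the mutational burden calculation described above can be sketched as follows. This is a minimal illustration, not the Genome Modeling System implementation: the `Variant` record, its field names, and the `target_size_mb` argument (the size of the sequenced target region in megabases) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative only."""
    gene: str
    coverage: int            # read depth at the variant site
    in_normal_panel: bool    # flagged as an artifact in the panel of normal exomes
    population_freq: float   # population frequency as a fraction (0.001 == 0.1%)
    somatic_llr: float       # classifier log-likelihood supporting a somatic call


def passes_filters(v: Variant) -> bool:
    """Apply the thresholds stated in the text to a single variant."""
    return (
        v.coverage >= 20                 # discard below 20x coverage
        and not v.in_normal_panel        # discard panel-of-normals artifacts
        and v.population_freq <= 0.001   # discard if frequency exceeds 0.1%
        and v.somatic_llr >= 5           # retain log-likelihood of at least 5
    )


def mutational_burden(variants, target_size_mb: float) -> float:
    """Number of variants passing the filters per megabase of target region."""
    kept = [v for v in variants if passes_filters(v)]
    return len(kept) / target_size_mb


example_variants = [
    Variant("TP53", 150, False, 0.0, 12.0),     # passes all filters
    Variant("BRCA1", 15, False, 0.0, 9.0),      # fails: coverage below 20x
    Variant("NF1", 80, True, 0.0, 8.0),         # fails: panel-of-normals artifact
    Variant("RB1", 60, False, 0.0002, 6.5),     # passes all filters
]
```

Calling `mutational_burden(example_variants, target_size_mb=2.0)` counts the two passing variants over a 2 Mb target; the target size used for a real exome capture kit would need to be taken from the kit's design files.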
Significant copy-altered segments were identified for all tumors, all tumors from ST survivors, all tumors from LT survivors, metastatic tumors, primary tumors, and only the metastatic tumors of ST survivors using the GISTIC 2.0 version 6.15.28 module on the AWS GenePattern cloud (https://cloud.genepattern.org/gp). Default parameters and reference genome Human_Hg19.mat were used to run GISTIC 2.0 analyses21. We used the wide peak regions reported by GISTIC 2.0 to calculate the total number of genes amplified or deleted within those regions. The correlation between copy number alteration (CNA) and RNA-seq expression was assessed using the thresholded CNA values that GISTIC 2.0 calculated from each sample’s segment files21,65. The violin plot was created by binning the thresholded CNA values of every gene in every sample and plotting each bin against the corresponding log2(FPKM) values (Supplementary Fig. 2c).
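The binning step behind the violin plot can be illustrated with a short sketch. It assumes the GISTIC 2.0 thresholded values take the usual integer levels from -2 (deep deletion) to 2 (high-level amplification); the dictionary layout keyed by `(gene, sample)` and the `pseudocount` added before the log transform are assumptions made for this example, not details taken from the study.

```python
import math
from collections import defaultdict
from statistics import median


def bin_expression_by_cna(cna, fpkm, pseudocount=1.0):
    """Group log2-transformed FPKM values by thresholded CNA level.

    cna  : dict mapping (gene, sample) -> thresholded CNA level (-2..2)
    fpkm : dict mapping (gene, sample) -> FPKM value
    The pseudocount (an assumption here) avoids taking log2 of zero.
    """
    bins = defaultdict(list)
    for key, level in cna.items():
        if key in fpkm:  # only gene/sample pairs present in both data sets
            bins[level].append(math.log2(fpkm[key] + pseudocount))
    return dict(bins)


# Hypothetical toy values for two samples and three genes.
cna = {
    ("MYC", "S1"): 2, ("MYC", "S2"): 1,
    ("RB1", "S1"): -2, ("RB1", "S2"): 0,
    ("TP53", "S1"): 0,
}
fpkm = {
    ("MYC", "S1"): 15.0, ("MYC", "S2"): 7.0,
    ("RB1", "S1"): 0.0, ("RB1", "S2"): 3.0,
    ("TP53", "S1"): 3.0,
}

binned = bin_expression_by_cna(cna, fpkm)
medians = {level: median(values) for level, values in binned.items()}
```

Each list in `binned` corresponds to one violin; a positive trend in `medians` from level -2 to level 2 would be consistent with copy number driving expression.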
Differential expression analysis
Normalization and quality control
Transcript read counts were obtained using Kallisto v0.43.1, and gene-level read counts were calculated using GRCh37 in Ensembl66. Quality control and normalization of the raw count data were performed using the R/Bioconductor package edgeR version 3.28 (ref. 67). For our comparison of LT survivor samples to ST survivor samples, we removed genes with fewer than 1 count per million (CPM) mapped reads in at least half of the samples, so that a gene was retained if it was expressed in only one of the two groups. For our comparison of primary to metastatic tumors, genes with fewer than 1 CPM in at least half of the samples were likewise removed. Normalization factors were calculated using the Trimmed Mean of M-values normalization method in edgeR to account for compositional biases in libraries between each pair of samples.
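The low-expression filter described above can be sketched in a few lines. This is an illustrative reading of the stated rule (remove a gene when its CPM is below 1 in at least half of the samples), not the edgeR code used in the study; the function names and the toy count matrix are hypothetical.

```python
def counts_per_million(counts):
    """Convert raw counts to CPM using each sample's total library size.

    counts : dict mapping gene -> list of raw counts, one entry per sample
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts, min_cpm=1.0):
    """Drop genes whose CPM is below min_cpm in at least half of the samples."""
    cpm = counts_per_million(counts)
    n_samples = len(next(iter(counts.values())))
    kept = {}
    for gene, row in cpm.items():
        n_low = sum(1 for value in row if value < min_cpm)
        if n_low < n_samples / 2:   # low in fewer than half of samples: keep
            kept[gene] = counts[gene]
    return kept


# Toy matrix: four samples, each with a library size of exactly 1,000,000 reads.
toy_counts = {
    "A": [999_990, 999_990, 999_990, 999_990],
    "B": [10, 10, 10, 0],   # below 1 CPM in one sample only
    "C": [0, 0, 0, 10],     # below 1 CPM in three of four samples
}
filtered = filter_low_expression(toy_counts)
```

In this toy matrix gene `C` is removed and genes `A` and `B` are retained. Library sizes are computed before filtering, which mirrors the common practice of deriving CPM from the full count matrix.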
Removal of batch effects
Because the use of FFPE tissue introduces technical artifacts that can affect gene expression analyses, we performed batch effect correction prior to differential expression analysis for the comparison of LT to ST survival samples68,69. We used the SVA function of the R/Bioconductor package SVA version 3.34.0 to estimate and remove surrogate variables for unwanted and unknown batch effects and other sources of variation present in the data70. The SVA function estimated surrogate variables for each subset analysis, and these were adjusted for within the statistical model applied in the edgeR package in downstream analyses of differential gene expression. After batch effect correction, samples were analyzed by a Principal Component Analysis using the R function “dist” on regularized log-transformed (rlog) data to calculate the Euclidean distance between samples. Plotting of the first (PC1) and second (PC2) principal components revealed that expression values from the same patient are more closely related to one another than they are across groups (Supplementary Fig. 3). We also observed 4 potential outlier samples, which were removed from downstream analyses because of their distance from the other samples in the Principal Component Analysis plot after normalizing and batch correcting transcript counts. These 4 removed samples are highlighted in Supplementary Fig. 3a and were all collected within the same year; their exclusion may mean that some biological features of these tumor samples are not captured in our analysis.
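The sample-to-sample distance calculation mentioned above can be sketched in Python. The snippet computes pairwise Euclidean distances between expression profiles, which is what R's `dist` returns by default; the sample names and the two-gene profiles are hypothetical, and the values stand in for rlog-transformed expression.

```python
import math


def euclidean_distances(profiles):
    """Pairwise Euclidean distances between sample expression profiles.

    profiles : dict mapping sample name -> list of transformed expression values
    Returns a dict keyed by (sample_a, sample_b) for each unordered pair.
    """
    names = list(profiles)
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def nearest_neighbor(sample, profiles):
    """Name of the sample whose profile is closest to the given sample."""
    others = (name for name in profiles if name != sample)
    return min(others, key=lambda name: math.dist(profiles[sample], profiles[name]))


# Hypothetical profiles: two tumors from patient P1 and one from patient P2.
profiles = {
    "P1_primary": [0.0, 0.0],
    "P1_metastatic": [3.0, 4.0],
    "P2_primary": [30.0, 40.0],
}
distances = euclidean_distances(profiles)
closest = nearest_neighbor("P1_primary", profiles)
```

A check of this kind, asking whether each tumor's nearest neighbor is its matched sample from the same patient, is one simple way to quantify the within-patient similarity seen in a PCA plot.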
Differential gene expression (DGE) analysis
DGE analysis was performed using edgeR version 3.28.0, which implements a negative-binomial generalized linear model67. We performed 4 comparisons: ST survival samples versus LT survival samples for all tumors in the study; ST survival versus LT survival among metastatic tumors; ST survival versus LT survival among primary tumors; and primary tumors versus their matched metastatic tumors. The surrogate variables estimated with SVA were included in the model used for the LT versus ST survival comparison. To model gene-level variance, the biological coefficient of variation was calculated using Cox-Reid dispersion estimation for negative-binomial generalized linear models. The p-values of differential expression tests were corrected for multiple-hypothesis testing using Benjamini–Hochberg false-discovery rate (FDR) correction. The threshold for significance was set to FDR Q-value < 0.01. We further curated our differentially expressed genes (DEGs) by limiting them to protein-coding genes listed in BioMart under Ensembl Genes 100, Human genes (GRCh38.p13), with the protein_coding transcript type. All DEGs discussed in this paper are listed in Supplementary Data 1. Pathway analysis was applied to the DEG and gene fusion gene lists using the PANTHER classification system 16.0 (http://pantherdb.org/), with the organism set to ‘Homo sapiens’, by performing a statistical overrepresentation test using Fisher’s Exact test and calculating a False-Discovery Rate25. We used all Gene Ontology (GO) terms (Biological Processes, Molecular Function, and Cellular Components), PANTHER pathways, and Reactome pathways annotation sets26–28. We used DAVID to identify enrichment for KEGG and Biocarta pathways71,72.
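The Benjamini–Hochberg correction and the Q-value < 0.01 threshold described above can be written compactly. The following is a generic sketch of the standard procedure rather than the edgeR implementation; the function name and the example p-values are illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (Q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone Q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Illustrative p-values for five hypothetical genes.
example_p = [0.001, 0.008, 0.039, 0.041, 0.5]
example_q = benjamini_hochberg(example_p)

# Genes called significant at the FDR threshold used in the text.
significant = [i for i, q in enumerate(example_q) if q < 0.01]
```

With these five p-values only the first gene passes the Q-value < 0.01 cutoff, even though two raw p-values are below 0.01, which shows how the correction tightens the call set.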
Immune cell abundance estimates
We used Cibersort (https://cibersort.stanford.edu/) to estimate the abundance of infiltrating immune cell types from our tumor RNA-seq data73. We generated a mixture file for our cohort of tumor samples based on the gene abundance counts generated from the RNA-seq reads using Kallisto66. We used the LM22 gene signature, which distinguishes 22 immune cell types, to calculate immune cell fractions and ran our Cibersort analysis with 500 permutations in relative mode.
Gene fusion predictions
Gene fusion predictions for each tumor sample were produced using INTEGRATE v0.2.6 to analyze the tumor RNA-sequencing data74. Full-length raw reads and a set of reads trimmed to remove potentially low-quality bases were each aligned to human reference genome GRCh38 (r90) using STAR v2.5.3a with a minimum chimeric segment length of 18 and chimeric alignments output to a separate SAM file75. The chimeric alignments were then used as inputs for INTEGRATE fusion with default parameters for fusion discovery with tumor RNA-seq only. Fusion predictions from the full and trimmed reads were then merged and manually reviewed to ensure all fusion calls were valid. Since normalization of FFPE and FF tumor samples is more challenging for gene fusions, we characterized the predicted gene fusions as independent events regardless of sample preparation.
Statistics and reproducibility
We analyzed normal tissue, primary tumor, and metastatic tumor samples from a total of 39 patients. Statistical analysis and figure generation were performed in R 3.6.2 and Python 3.8.2. Differential expression results were considered significant at an FDR Q-value < 0.01. Comparisons between survival groups were performed using Mann–Whitney U tests for independent samples, and comparisons between tumor types were performed using Wilcoxon signed-rank tests for dependent samples, since the primary and metastatic tumors were matched. Enrichment for pathway analyses with our DEGs was assessed using Fisher’s Exact test with calculation of a false-discovery rate. Given the genetic heterogeneity of individuals and their tumors, it should be noted that we sequenced only one sample from each primary and metastatic tumor, which limits our ability to fully capture the genetic diversity within these tumors.
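The overrepresentation test used for pathway enrichment can be illustrated with a one-sided Fisher’s Exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below is a generic calculation, not the PANTHER or DAVID implementation; the function name and the gene counts in the example are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(k, n, K, N):
    """One-sided Fisher's exact p-value for pathway overrepresentation.

    N : number of genes in the reference (background) list
    K : number of reference genes annotated to the pathway
    n : number of genes in the query list (for example, the DEGs)
    k : number of query genes annotated to the pathway
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical example: 10 background genes, 5 in the pathway,
# a query list of 5 genes, all 5 of which fall in the pathway.
p_all_in_pathway = overrepresentation_pvalue(k=5, n=5, K=5, N=10)
```

The p-values from many pathways would then be adjusted for multiple testing with a false-discovery rate procedure, as stated in the text.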
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
The authors are grateful to all of the participants in this study as well as the surgeons and supporting staff who made sample collection and sequencing possible. We’d like to acknowledge Dr. John Edwards for providing insight, guidance, and comments for the duration of this project and manuscript preparation. We’d also like to thank Megan Richters for her insight, support, and advice with genomic analyses. Funding for this study was provided by the Foundation for Barnes-Jewish Hospital.
Author contributions
E.N.K. contributed to consolidating, analyzing, interpreting all genomics data, creating figures and writing and preparing the manuscript. M.M.M. contributed to interpretation of the data and collecting patient clinical data. N.C.S. contributed to analyzing and interpreting the SNV and copy number data. T.L. contributed to initial tumor sequencing processing and genomic data analysis. M.I. and J.Z. analyzed RNA-seq data with INTEGRATE for gene fusion analysis. F.M-R. contributed code and helped with DE analysis. I.S.H., C.K.M., P.H.T., A.R.H., M.A.P., and D.G.M. all contributed to surgically collecting tumors and patient data collection for the study. C.A.M. helped interpret gene fusion and genomic data. C.A.M., E.R.M., and G.L. contributed to advising on genomic analyses, interpreting genomic data, and revising this manuscript. D.K. contributed to revising this manuscript. K.C.F. is the senior author and contributed to the conception, design of the project, and preparation of the manuscript. The authors have read and approved of the final manuscript.
Peer review
Peer review information
This manuscript has been previously reviewed at another Nature Portfolio journal. The manuscript was considered suitable for publication without further review at Communications Biology. Communications Biology thanks Susana García, Kylie L Gorringe, Aritro Nath and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary handling editor: George Inglis. A peer review file is available.
Data availability
RNA-sequencing files have been deposited in the NCBI GEO database under GSE218939. WES data generated for this analysis have been deposited within the Sequence Read Archive under the accession PRJNA957243, and can be found at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA957243. Lists of SNVs, DEGs, lncRNAs, and gene fusions are provided in Supplementary Data 1 and 2. Source data for figures have been submitted in Supplementary Data 3.
Code availability
Code used to analyze genomic data is publicly available and custom code is deposited on GitHub (https://github.com/ekotnik/OC-Tumor-genomic-analyses; DOI: 10.5281/zenodo.7873762) (ref. 76).
Competing interests
The authors declare no competing interests.
Consent for publication
All patient information was anonymized and no identifying personal information was collected.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s42003-023-05026-3.
References
- 1. McPherson A, et al. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nat. Genet. 2016;48:758–767. doi: 10.1038/ng.3573.
- 2. Bowtell DD. The genesis and evolution of high-grade serous ovarian cancer. Nat. Rev. Cancer. 2010;10:803–808. doi: 10.1038/nrc2946.
- 3. Stewart C, Ralyea C, Lockwood S. Ovarian Cancer: An Integrated Review. Semin Oncol. Nurs. 2019;35:151–156. doi: 10.1016/j.soncn.2019.02.001.
- 4. Howlader N, N. A. et al. (eds). SEER Cancer Statistics Review (National Cancer Institute, 1975–2017).
- 5. McLachlan J, George A, Banerjee S. The current status of PARP inhibitors in ovarian cancer. Tumori. 2016;102:433–440. doi: 10.5301/tj.5000558.
- 6. Ledermann JA. PARP inhibitors in ovarian cancer. Ann. Oncol. 2016;27:i40–i44. doi: 10.1093/annonc/mdw094.
- 7. Testa, U., Petrucci, E., Pasquini, L., Castelli, G. & Pelosi, E. Ovarian cancers: genetic abnormalities, tumor heterogeneity and progression, clonal evolution and cancer stem cells. Medicines (Basel) 5, doi: 10.3390/medicines5010016 (2018).
- 8. Bell D, et al. Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474:609–615. doi: 10.1038/nature10166.
- 9. Yang SYC, et al. Landscape of genomic alterations in high-grade serous ovarian cancer from exceptional long- and short-term survivors. Genome Med. 2018;10:81–81. doi: 10.1186/s13073-018-0590-x.
- 10. Alsop K, et al. BRCA mutation frequency and patterns of treatment response in BRCA mutation-positive women with ovarian cancer: a report from the Australian Ovarian Cancer Study Group. J. Clin. Oncol. 2012;30:2654–2663. doi: 10.1200/JCO.2011.39.8545.
- 11. Lee S, et al. Molecular analysis of clinically defined subsets of high-grade serous ovarian cancer. Cell Rep. 2020;31:107502. doi: 10.1016/j.celrep.2020.03.066.
- 12. Sallinen H, et al. Comparative transcriptome analysis of matched primary and distant metastatic ovarian carcinoma. BMC Cancer. 2019;19:1121. doi: 10.1186/s12885-019-6339-0.
- 13. Knisely AT, et al. Trends in primary treatment and median survival among women with advanced-stage epithelial ovarian cancer in the US From 2004 to 2016. JAMA Netw. Open. 2020;3:e2017517. doi: 10.1001/jamanetworkopen.2020.17517.
- 14. Peng G, Mills GB. Surviving ovarian cancer: an affair between defective DNA repair and RB1. Clin. Cancer Res. 2018;24:508–510. doi: 10.1158/1078-0432.CCR-17-3022.
- 15. Landrum MJ, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46:D1062–D1067. doi: 10.1093/nar/gkx1153.
- 16. Verhaak RG, et al. Prognostically relevant gene signatures of high-grade serous ovarian carcinoma. J. Clin. Invest. 2013;123:517–525. doi: 10.1172/JCI65833.
- 17. Alexandrov LB, Nik-Zainal S, Wedge DC, Campbell PJ, Stratton MR. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 2013;3:246–259. doi: 10.1016/j.celrep.2012.12.008.
- 18. Alexandrov LB, et al. Signatures of mutational processes in human cancer. Nature. 2013;500:415–421. doi: 10.1038/nature12477.
- 19. Cerami E, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2:401–404. doi: 10.1158/2159-8290.CD-12-0095.
- 20. Gao J, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 2013;6:pl1. doi: 10.1126/scisignal.2004088.
- 21. Mermel CH, et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011;12:R41. doi: 10.1186/gb-2011-12-4-r41.
- 22. Etemadmoghadam D, et al. Integrated genome-wide DNA copy number and expression analysis identifies distinct mechanisms of primary chemoresistance in ovarian carcinomas. Clin. Cancer Res. 2009;15:1417–1427. doi: 10.1158/1078-0432.CCR-08-1564.
- 23. Wang-Johanning F, et al. Expression of human endogenous retrovirus k envelope transcripts in human breast cancer. Clin. Cancer Res. 2001;7:1553–1560.
- 24. Lee SH, et al. Elevation of human ERV3-1 env protein expression in colorectal cancer. J. Clin. Pathol. 2014;67:840–844. doi: 10.1136/jclinpath-2013-202089.
- 25. Mi H, et al. PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API. Nucleic Acids Res. 2020;49:D394–D403. doi: 10.1093/nar/gkaa1106.
- 26. Mi H, Thomas P. PANTHER pathway: an ontology-based pathway database coupled with data analysis tools. Methods Mol. Biol. 2009;563:123–140. doi: 10.1007/978-1-60761-175-2_7.
- 27. Ashburner M, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556.
- 28. Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 2021;49:D325–D334. doi: 10.1093/nar/gkaa1113.
- 29. Tian S, Meng G, Zhang W. A six-mRNA prognostic model to predict survival in head and neck squamous cell carcinoma. Cancer Manag Res. 2018;11:131–142. doi: 10.2147/CMAR.S185875.
- 30. Geng H, et al. Survival prediction for patients with lung adenocarcinoma: A prognostic risk model based on gene mutations. Cancer Biomark. 2020;27:525–532. doi: 10.3233/CBM-191204.
- 31. Song, C. et al. PRDX1 stimulates non-small-cell lung carcinoma to proliferate via the Wnt/β-Catenin signaling. Panminerva Med. doi: 10.23736/s0031-0808.20.03978-6 (2020).
- 32. Chen H, Yang L, Sun W. Elevated OGN expression correlates with the EMT signature and poor prognosis in ovarian carcinoma. Int. J. Clin. Exp. Pathol. 2019;12:584–589.
- 33. Liu H, et al. Long non-coding RNAs as prognostic markers in human breast cancer. Oncotarget. 2016;7:20584–20596. doi: 10.18632/oncotarget.7828.
- 34. Lei JT, et al. Functional annotation of ESR1 gene fusions in estrogen receptor-positive breast cancer. Cell Rep. 2018;24:1434–1444.e1437. doi: 10.1016/j.celrep.2018.07.009.
- 35. Thorsson V, et al. The immune landscape of cancer. Immunity. 2018;48:812–830.e814. doi: 10.1016/j.immuni.2018.03.023.
- 36. Lee JY, et al. Tumor evolution and intratumor heterogeneity of an epithelial ovarian cancer investigated using next-generation sequencing. BMC Cancer. 2015;15:85. doi: 10.1186/s12885-015-1077-4.
- 37. Zhang AW, et al. Interfaces of malignant and immunologic clonal dynamics in ovarian cancer. Cell. 2018;173:1755–1769.e1722. doi: 10.1016/j.cell.2018.03.073.
- 38. Rojas V, Hirshfield KM, Ganesan S, Rodriguez-Rodriguez L. Molecular characterization of epithelial ovarian cancer: implications for diagnosis and treatment. Int J. Mol. Sci. 2016;17:2113. doi: 10.3390/ijms17122113.
- 39. Hornburg, M. et al. Single-cell dissection of cellular components and interactions shaping the tumor immune phenotypes in ovarian cancer. Cancer Cell. doi: 10.1016/j.ccell.2021.04.004 (2021).
- 40. Xiao G, et al. Gain-of-function mutant p53 R273H interacts with replicating DNA and PARP1 in breast cancer. Cancer Res. 2020;80:394–405. doi: 10.1158/0008-5472.CAN-19-1036.
- 41. Qiu, W. G. et al. Identification, validation, and targeting of the mutant p53-PARP-MCM chromatin axis in triple negative breast cancer. NPJ Breast Cancer 3, doi: 10.1038/s41523-016-0001-7 (2017).
- 42. Polotskaia A, et al. Proteome-wide analysis of mutant p53 targets in breast cancer identifies new levels of gain-of-function that influence PARP, PCNA, and MCM4. Proc. Natl Acad. Sci. USA. 2015;112:E1220–E1229. doi: 10.1073/pnas.1416318112.
- 43. Koivunen JP, et al. EML4-ALK fusion gene and efficacy of an ALK kinase inhibitor in lung cancer. Clin. Cancer Res. 2008;14:4275–4283. doi: 10.1158/1078-0432.CCR-08-0168.
- 44. Cocco E, Scaltriti M, Drilon A. NTRK fusion-positive cancers and TRK inhibitor therapy. Nat. Rev. Clin. Oncol. 2018;15:731–747. doi: 10.1038/s41571-018-0113-0.
- 45. Adib TR, et al. Predicting biomarkers for ovarian cancer using gene-expression microarrays. Br. J. Cancer. 2004;90:686–692. doi: 10.1038/sj.bjc.6601603.
- 46. Hibbs K, et al. Differential gene expression in ovarian carcinoma: identification of potential biomarkers. Am. J. Pathol. 2004;165:397–414. doi: 10.1016/S0002-9440(10)63306-8.
- 47. Ma Y, Lu Y, Lu B. MicroRNA and long non-coding RNA in ovarian carcinoma: translational insights and potential clinical applications. Cancer Invest. 2016;34:465–476. doi: 10.1080/07357907.2016.1227446.
- 48. Wang JY, Lu AQ, Chen LJ. LncRNAs in ovarian cancer. Clin. Chim. Acta. 2019;490:17–27. doi: 10.1016/j.cca.2018.12.013.
- 49. Tong W, Yang L, Yu Q, Yao J, He A. A new tumor suppressor lncRNA RP11-190D6.2 inhibits the proliferation, migration, and invasion of epithelial ovarian cancer cells. Onco Targets Ther. 2017;10:1227–1235. doi: 10.2147/OTT.S125185.
- 50. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32:3047–3048. doi: 10.1093/bioinformatics/btw354.
- 51. “Picard Toolkit.” Broad Institute, GitHub Repository. https://broadinstitute.github.io/picard/; Broad Institute (2019).
- 52. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 53. Ghobadi A, et al. Shared cell of origin in a patient with Erdheim-Chester disease and acute myeloid leukemia. Haematologica. 2019;104:e373–e375. doi: 10.3324/haematol.2019.217794.
- 54. Griffith M, et al. Genome modeling system: a knowledge management platform for genomics. PLoS Comput Biol. 2015;11:e1004274. doi: 10.1371/journal.pcbi.1004274.
- 55. Saunders CT, et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics. 2012;28:1811–1817. doi: 10.1093/bioinformatics/bts271.
- 56. Koboldt DC, et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22:568–576. doi: 10.1101/gr.129684.111.
- 57. Larson DE, et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics. 2011;28:311–317. doi: 10.1093/bioinformatics/btr665.
- 58. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509.
- 59. Cibulskis K, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 2013;31:213–219. doi: 10.1038/nbt.2514.
- 60. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009;25:2865–2871. doi: 10.1093/bioinformatics/btp394.
- 61. DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 2011;43:491–498. doi: 10.1038/ng.806.
- 62. Fairley S, Lowy-Gallego E, Perry E, Flicek P. The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res. 2020;48:D941–D947. doi: 10.1093/nar/gkz836.
- 63. Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP), Seattle, WA (http://evs.gs.washington.edu/EVS/) (2023).
- 64. Skidmore ZL, et al. GenVisR: genomic visualizations in R. Bioinformatics. 2016;32:3012–3014. doi: 10.1093/bioinformatics/btw325.
- 65. Shao X, et al. Copy number variation is highly correlated with differential gene expression: a pan-cancer study. BMC Med. Genet. 2019;20:175. doi: 10.1186/s12881-019-0909-5.
- 66. Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 2016;34:525–527. doi: 10.1038/nbt.3519.
- 67. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25. doi: 10.1186/gb-2010-11-3-r25.
- 68. Parker HS, et al. Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction. Bioinformatics. 2014;30:2757–2763. doi: 10.1093/bioinformatics/btu375.
- 69. Viljoen KS, Blackburn JM. Quality assessment and data handling methods for Affymetrix Gene 1.0 ST arrays with variable RNA integrity. BMC Genomics. 2013;14:14. doi: 10.1186/1471-2164-14-14.
- 70. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–883. doi: 10.1093/bioinformatics/bts034.
- 71. Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211.
- 72. Sherman BT, et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022;50:W216–W221. doi: 10.1093/nar/gkac194.
- 73. Newman AM, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods. 2015;12:453–457. doi: 10.1038/nmeth.3337.
- 74. Zhang J, et al. INTEGRATE: gene fusion discovery using whole genome and transcriptome data. Genome Res. 2016;26:108–118. doi: 10.1101/gr.186114.114.
- 75. Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 76. Kotnik, E. OC-Tumor-Genomic-Analyses. Available from: https://github.com/ekotnik/OC-Tumor-genomic-analyses (2023).