Abstract
Background
Germline Structural Variants (SVs) represent an important source of genetic diversity, in large part due to their influence on gene transcription. It is necessary to systematically catalog germline SVs and their associated impacted genes across different cohorts and tissue and cellular contexts, including pediatric brain or Central Nervous System (CNS) tumors.
Methods
We combined RNA with whole genome sequencing across 1430 pediatric brain or CNS tumor patients from the Children’s Brain Tumor Network. We set out to systematically identify genes for which the proximity of germline SVs was recurrently and significantly associated with differential expression in the tumor sample across multiple patients.
Results
For hundreds of genes, recurrent and common germline SV breakpoints within 1 Mb were associated with higher or lower expression in tumors spanning various histologic types. Some germline SV-expression associations involved gene deletion or disruption, while others represented cis-regulatory alterations. Rare and singleton SVs disrupting DNA repair-related and mitochondrial-related genes collectively involved 2.7 and 4.7% of patients, respectively. Genes with germline SV breakpoint patterns and expression associated with patients of African ancestry included ACOT1 and CRYBB2P1. Genes with germline SV breakpoint patterns and expression associated with patient survival included ACTG1 and AHRR. Genes altered in association with both somatic and germline SVs included HGF and BCOR.
Conclusion
Our results capture a class of phenotypic variation at work in the setting of pediatric brain tumors, including genes with cancer roles.
Supplementary Information
The online version contains supplementary material available at 10.1186/s40478-025-02098-6.
Introduction
Structural Variants (SVs) are large-scale changes in the genome that include copy number variants and balanced rearrangements. SVs represent an important source of genetic diversity. Whereas, historically, our knowledge of genetic variation was limited mainly to heterochromatin polymorphisms and single nucleotide variants (SNVs), in recent years, the development and application of microarray technologies and Whole Genome Sequencing (WGS) have enabled SV detection to the level of base pair resolution [1, 2]. Genetic structural variation can directly or indirectly influence gene dosage through different mechanisms, influencing phenotypic variation and disease [3, 4]. Germline SVs have a widespread influence on gene expression variation, where most SVs affecting expression are noncoding variants in gene regulatory regions [5]. The impact of germline variants on the regulation of specific genes is highly dependent on tissue and cellular context [6, 7]. To better understand genetic diversity and phenotypic variation, germline SVs and their associated impacted genes should be systematically cataloged across different cohorts and tissue and cellular contexts, including human disease.
Only in recent years have combined WGS and RNA-sequencing (RNA-seq) data on sizable numbers of human tumors become available [8, 9], allowing for integrative analyses of SVs and gene expression. Somatic SVs greatly impact the cancer transcriptome, involving associated copy number alterations (CNAs) and cis-regulatory alterations, as explored by our group and others, including the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium [10–20]. The global influence of germline SVs on gene expression in human cells and tissues has been largely understudied, owing in large part to the historical reliance upon SNP genotyping arrays for expression quantitative trait loci (eQTL) analyses [5]. Germline SV eQTLs based on short-read WGS have been catalogued in at least two studies using the Genotype-Tissue Expression (GTEx) datasets, respectively involving diverse normal tissues from 147 and 613 individuals [5, 7], but few other combined datasets of germline SVs and gene expression for large numbers of individuals are currently available. Recently, we surveyed 1218 adult cancers from the PCAWG consortium to systematically catalog gene-level associations with differential tumor expression in conjunction with nearby germline SV breakpoints [21], using a gene-centric integrative analysis approach [18, 22]. Analogous to our studies of somatic SVs [13–18], we found hundreds of germline SV-expression associations that cut across tumors from various tissues of origin, while other germline SV-expression associations were tissue-specific. A subset of genes involving our germline SV-expression associations in adult cancers had established roles in disease and could conceivably contribute to cancer predisposition.
Our analytical approach to exploring germline SVs in adult cancers remained to be carried out for pediatric brain or central nervous system (CNS) tumors, which represent an entirely different disease entity and tissue and cellular context from that of the PCAWG cohort.
Brain and other CNS tumors are the leading cause of cancer-related morbidity and mortality in children [9, 23, 24]. Germline SVs could contribute to cancer susceptibility with moderate to high penetrance [25]. A genetic predisposition to tumor development can include up to 10% of pediatric patients with brain or CNS tumors [23]. The Children’s Brain Tumor Network (CBTN) is a multi-institutional research consortium with a publicly accessible multi-omics data repository on more than 1000 pediatric brain tumors to date [9, 26]. CBTN data includes combined WGS and RNA-seq, from which somatic SVs and their impact have been explored [15, 27]. Genomic studies of adult cancers are insufficient for understanding pediatric cancers, as different disease etiologies would be involved [10]. In our studies of somatic SVs in pediatric brain tumors [15, 20], we have found SV-expression associations in that disease entity to involve a different set of genes compared to adult cancers. The CBTN datasets provide a unique opportunity to study how gene expression appears altered in pediatric brain tumors in association with germline SVs, including which genes and patients are involved. This avenue of study would characterize a class of normal variation across individuals in the context of the CNS, with a subset of the genes involved having potential relevance to disease. Recently, we explored global associations of germline SVs with DNA methylation in the CBTN cohort [28], focusing on the small subset of genes having both expression and methylation associations with SVs, but with the vast majority of genes (~ 98%) remaining unexplored in terms of expression patterns. The CBTN data do not provide germline SVs on individuals without cancer, and so case–control studies to identify variants as candidates for susceptibility to brain tumor development would not be possible here.
In this present study, we combined germline SV data (from blood samples) with tumor RNA-seq data across 1430 patients in the CBTN cohort to systematically catalog gene-level associations with differential tumor expression in conjunction with nearby germline SV breakpoints. By design, our study considered both common and rare SVs. The identified germline SV-expression associations involved hundreds of genes and cut across pediatric tumors spanning various brain and CNS histologies. Most of these genes would not necessarily have specific roles in cancer but would instead reflect a class of phenotypic variation across brain and CNS tissues. However, we could identify rare and singleton SVs disrupting important gene classes such as DNA repair. While our present study was not a case–control study to identify candidate disease-causing variants, we leveraged patient survival and somatic mutation data to highlight genes and associated germline SV patterns with potential roles as disease contributors. For some genes, the associated germline SVs and differential tumor expression patterns align with the patient’s ancestry. For other genes, the germline SVs and associated differential expression could predict patient survival. Furthermore, a subset of genes with germline SV-expression associations also showed targeting by somatic alteration, including somatic SVs.
Results
Germline SV patterns across pediatric brain tumor patients
As an avenue for exploring germline structural variation, we referred to the CBTN datasets of germline SV calls representing 1430 pediatric brain or CNS tumor patients (Table S1 and Fig. 1a and b). For each patient, one tumor was profiled for mRNA expression by RNA-seq, which allowed us to explore germline SV associations with differential gene expression across tumors (Fig. 1b and c). Based on data from the blood normal sample, we assembled a merged germline SV call set based on detection using two algorithms (Methods). A median of 1978 germline SVs was identified per patient (with a standard deviation of 162.8), where the numbers of germline SVs detected did not vary on average according to tumor histologic type (Table S1). In contrast to the somatic SVs involving a tumor sample (profiled for 1375 of the 1430 patients), most germline SVs in the CBTN dataset were deletions (79%), followed by duplications and inversions (12% and 9%, respectively, Figs. 1a and S1a). Only ~ 2% of germline SV calls identified by two SV calling algorithms were chromosome translocations, which likely largely represent artifacts [29] and were therefore removed from the analysis. In contrast, 52% of somatic SVs involved in the CBTN cohort were translocations. For intrachromosomal SVs, the median distance between breakpoints was much smaller for germline versus somatic SVs, ~ 460 bp versus 39 kb, respectively (Figure S1b).
Fig. 1.
Overview of the study. a Diagrams of the more common germline SV types [60]. Other, complex structural variations exist [31], and SV type calls as made by the Manta algorithm [61] may later be found to resolve to other types. b We referred to the CBTN multi-omics dataset representing 1430 patients to explore germline structural variation in pediatric brain tumors. Germline SV calls were generated from the blood normal sample (using whole genome sequencing data). The CBTN generated somatic mutation calls and gene expression profiles based on the corresponding tumor sample. We combined tumor RNA data with germline SV data across the 1430 cancer patients to determine the extent to which germline SVs are associated with differential expression of nearby genes. c Overview of the main findings of the study. Germline deletion SVs spanning genes or involving predicted loss of gene function were frequently associated with lower expression in tumors. For specific genes, germline SV breakpoints near the gene were recurrently associated with differential expression in the tumor. For some of these genes, germline SV breakpoint patterns and associated differential expression were associated with pediatric brain tumor patient survival. For other genes, the germline SVs and differential expression patterns were associated with patient ancestry. Furthermore, a subset of genes with germline SV-expression associations also appeared targeted by somatic alteration, including somatic SVs involving differential expression in the tumors.
Most germline SVs in the CBTN dataset had been observed elsewhere, with 89.8% represented in the Database of Genomic Variants (DGV) curated from published studies [30] (Table S2 and Figures S1c and d). The frequency of germline SVs in the CBTN cohort broadly correlated with allele frequencies (AFs) as reported in gnomAD [31] (Fig. S1d), where some 96% of gnomAD SVs had AF < 1%. Of the CBTN SVs reported in gnomAD, a very small fraction—about 0.2%—were detected in > 10% of CBTN patients but with AF < 1% in both gnomAD and TOPMed [32] datasets (Table S2); however, we make no conclusions here regarding their possible disease relevance in the absence of future case–control studies. Germline SV events tended to be more frequent within specific genomic regions across patients. For each patient, we assessed cytoband-level enrichment of germline SVs, with a top set of 56 cytoband regions identified as significant (p < 0.0001 by chi-square test) for at least 20 patients (Figure S1d). All the above observations regarding germline SVs in the CBTN cohort reflected similar observations we previously reported for the PCAWG adult cancer cohort involving 1218 patients [21]. In particular, there was a highly significant overlap (p < 1E-16, one-sided Fisher’s exact test) between the top 56 enriched cytobands from the CBTN cohort and a top set of 79 enriched cytobands from the PCAWG cohort, involving 29 cytobands (Figure S1e). While the PCAWG cohort included some pediatric brain cancer patients with WGS data (from project PBCA-DE), none of these cases had RNA-seq data available as part of the combined PCAWG datasets.
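The cytoband overlap test described above can be sketched as a hypergeometric tail probability, which is what a one-sided Fisher's exact test for enrichment computes. This is an illustrative sketch only: the function name and the size of the cytoband universe (800) are assumptions not stated in the text, so the value it produces should not be read as reproducing the reported p-value.

```python
from math import comb


def overlap_p_value(universe: int, set_a: int, set_b: int, overlap: int) -> float:
    """One-sided (enrichment) Fisher's exact p-value: P(X >= overlap).

    X is the number of shared items when set_b items are drawn at random
    from a universe that contains set_a "marked" items.
    """
    total = comb(universe, set_b)
    upper = min(set_a, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# ASSUMPTION: 800 testable cytobands; the source does not state the universe size.
ASSUMED_CYTOBANDS = 800

# 56 enriched cytobands (CBTN), 79 (PCAWG), 29 shared, as reported in the text.
p_overlap = overlap_p_value(ASSUMED_CYTOBANDS, set_a=79, set_b=56, overlap=29)
print(f"overlap p-value under the assumed universe: {p_overlap:.3g}")
```

Because the result depends on the universe size, any reanalysis would need the actual number of cytobands tested.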
Germline SVs involving gene deletion or loss of function
We might expect that duplication and deletion SV events involving breakpoints spanning genes would be reflected in the corresponding gene copy number. However, the copy number landscape in tumors is heavily influenced by somatic SVs [18], as was also reflected in tumors in the CBTN cohort (Fig. 2a, left), which can largely overshadow most germline SV-related copy number associations. Still, when examining germline deletion SV events whereby the associated breakpoints spanned one or more genes (the gene boundaries being within both breakpoints), we found a significant number of these also to involve observed copy loss in the tumor sample (Fig. 2a, right, p < 1E-25, chi-square test). An analogous though modest association between germline duplication SVs spanning genes and gene amplification was also observed (Fig. 2a, right). When we systematically carried out a gene-level expression analysis, comparing CBTN patients with germline deletion SVs spanning the gene versus other patients, we identified a top set of 48 genes with significantly lower expression in the tumor in association with deletion SVs, with 19 of these genes located in the genomic region spanning 14q21.3–14q24.1 (Fig. 2b). The 48 genes included GSTT2B on chromosome 22 [33], with deletion SVs spanning the gene involving 1099 of the 1430 CBTN patients, and with these patients showing dramatically lower GSTT2B expression (Fig. 2c, fold change = 0.41). When surveying the 14q region, we identified germline deletion SVs spanning a very large region (> 28 Mb), involving some 158 genes and eight CBTN patients (Fig. 2d, left). For the eight patients, the average normalized expression of the 158 genes in their tumor samples was markedly lower than that for the other patients (Fig. 2d, right), illustrating how the expression of these genes as an entire group was lower in association with these large-scale germline deletion SVs.
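The comparison above rests on two simple operations: deciding whether a deletion spans a gene (the gene boundaries lying within both breakpoints) and contrasting expression between carriers and other patients. The sketch below illustrates both on toy data. All names, coordinates, and values are hypothetical, and expression is assumed to be on a linear (not log) scale so that a ratio of means is meaningful; the source does not specify how its fold changes were computed.

```python
from statistics import mean


def deletion_spans_gene(sv_start, sv_end, gene_start, gene_end):
    """True when the whole gene lies between the two deletion breakpoints."""
    left, right = sorted((sv_start, sv_end))
    return left <= gene_start and gene_end <= right


def fold_change(expr_by_patient, carriers):
    """Mean expression in carriers divided by mean expression in non-carriers."""
    with_sv = [v for p, v in expr_by_patient.items() if p in carriers]
    without_sv = [v for p, v in expr_by_patient.items() if p not in carriers]
    return mean(with_sv) / mean(without_sv)


# Toy data (hypothetical): one gene, three patients with a deletion call nearby.
gene_start, gene_end = 1000, 5000
deletions = {"P1": (900, 5200), "P2": (2000, 6000), "P3": (500, 5000)}
expression = {"P1": 2.0, "P2": 10.0, "P3": 4.0, "P4": 8.0, "P5": 12.0}

# P2's deletion starts inside the gene, so it does not span the gene.
carriers = {
    patient
    for patient, (start, end) in deletions.items()
    if deletion_spans_gene(start, end, gene_start, gene_end)
}
fc = fold_change(expression, carriers)
print(sorted(carriers), round(fc, 2))
```

In a full analysis, the carrier versus non-carrier contrast would be followed by a significance test (the figure legend reports a t-test), which is omitted here.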
Fig. 2.
Germline deletion SVs associated with lower gene expression in pediatric brain tumors. a For both germline SV (blood-based) and somatic SV (tumor-based) call sets, gene-SV associations with SV breakpoints spanning the gene for a specific sample (the gene boundaries being within both breakpoints) are broken down by gene-level copy number in the tumor (corrected for tumor ploidy; amplification, > 5 copies; copy gain, 3–5 copies; copy loss, 0–1.2 copies), for duplication SVs versus deletion SVs. Enrichment patterns (by chi-square test) include germline deletion SVs associating with gene copy deletion. X and Y chromosomes not included here. b Top set of genes for which deletion SV events spanning the gene include more than five patients and are associated with lower tumor expression (p < 0.01, t-test). Of the 48 top genes, 19 are in the genomic region spanning 14q21.3-14q24.1. c GSTT2B mRNA levels in brain tumors corresponding to germline deletion SV breakpoints spanning the gene (left). Boxplot (right) shows GSTT2B expression by patient tumor samples with germline SV breakpoints versus other patient samples. d For eight CBTN patients, germline deletion SVs on chromosome 14 span a large region (> 28 Mb) involving more than 150 genes. The average normalized expression of 158 genes in this region for the eight patients is represented in the scatterplot (left) and in the boxplot (right), with the deletion SVs associating with lower expression of these genes as a group. For parts c and d, boxplots represent 5%, 25%, 50%, 75%, and 95%, with p-values by t-test, and data point coloring indicates tumor type.
Rare and singleton germline SVs involving loss of gene function may confer risk to certain types of pediatric cancer [34]. In our CBTN cohort, we examined 10,788 germline SV deletion events involving 5973 rare (AF < 1%) or singleton SVs (i.e., found for just one patient) having a breakpoint within a gene. We considered whether each SV was predicted to cause loss of function of the gene and which tumors from patients with the SV had lower expression of the gene (< −0.5SD from the sample median). Of the 10,788 SV events, 420 (involving 332 genes and 420 patients; Table S2) had both predicted loss of gene function and lower gene expression, representing a significant overlap (p < 0.0001, one-sided Fisher’s exact test, Fig. 3a). Enriched gene categories [35] for the 332 genes included ‘mitochondrial part’, ‘DNA duplex unwinding’, ‘ATP hydrolysis activity’, ‘nucleotide binding’, ‘Golgi apparatus’, and ‘DNA repair’ (Fig. 3b). When examining 688 manually-curated Cancer Susceptibility Genes (CSGs) involving various cancer types [36], including pediatric cancers, we found 15 of these to be among the 332 genes with loss of function SVs, involving 31 of the 1430 patients in all, with most of the SVs being present in just one or two patients (Fig. 3c). These CSGs included PMS2 and NF2, which are already understood to represent CSGs in pediatric brain and CNS tumors [23]. Similarly, 38 CBTN patients harbored rare SVs impacting DNA duplex unwinding or DNA repair genes (Fig. 3c), consistent with previous findings of inherited pathogenic variants in DNA damage repair genes in particular conferring germline predisposition to certain pediatric cancers such as Ewing sarcoma [34, 37]. With genes involving the mitochondrion being highly enriched within our top 332 genes (Fig. 3b, p < 1E-5, one-sided Fisher’s exact test), rare SVs impacting 36 genes in this category involved 68 patients in all (Fig. 3c).
These rare and singleton SVs disrupting genes represent a different class of germline mutation from small mutations involved in known tumor predisposition syndromes [38] (Table S1).
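The selection of events described above combines three conditions: the SV is rare or singleton, it is predicted to cause loss of function, and the carrier's tumor expression of the gene falls more than 0.5 SD below the cohort median. A minimal sketch of that filter follows. The record fields, helper names, and toy values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import median, stdev


def is_low_expression(value, cohort_values, sd_cutoff=0.5):
    """True when value is more than sd_cutoff SDs below the cohort median."""
    return value < median(cohort_values) - sd_cutoff * stdev(cohort_values)


def is_rare_or_singleton(event, rare_af=0.01):
    """Rare (AF < 1%) or found in just one patient of the cohort."""
    af = event.get("allele_freq")
    return event["n_carriers"] == 1 or (af is not None and af < rare_af)


def candidate_events(events, expression):
    """Return (gene, patient) pairs meeting all three conditions."""
    hits = []
    for ev in events:
        per_patient = expression[ev["gene"]]
        value = per_patient[ev["patient"]]
        cohort = list(per_patient.values())
        if (
            ev["predicted_lof"]
            and is_rare_or_singleton(ev)
            and is_low_expression(value, cohort)
        ):
            hits.append((ev["gene"], ev["patient"]))
    return hits


# Toy data (hypothetical gene and patient identifiers).
expression = {"GENE_A": {"P1": 1.0, "P2": 5.0, "P3": 6.0, "P4": 7.0, "P5": 6.0}}
events = [
    {"gene": "GENE_A", "patient": "P1", "allele_freq": 0.002, "n_carriers": 2, "predicted_lof": True},
    {"gene": "GENE_A", "patient": "P2", "allele_freq": 0.002, "n_carriers": 2, "predicted_lof": True},
    {"gene": "GENE_A", "patient": "P3", "allele_freq": 0.2, "n_carriers": 300, "predicted_lof": True},
]
hits = candidate_events(events, expression)
print(hits)
```

Here only the first event passes: the second carrier's expression is not low enough, and the third SV is common.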
Fig. 3.
Rare and singleton germline SVs with predicted loss of function. a Across the CBTN cohort, 10,788 germline SV deletion events involved rare (AF < 1%) or singleton SVs having a breakpoint within a gene. Of these events, 420 (involving 332 genes and 420 patients) had both predicted loss of function for the gene [31] and lower gene expression in the patient tumor sample (< −0.5SD from sample median). b Selected significantly enriched GO terms [35] within the 332 genes from part a. Enrichment p-values and numbers of overlapping genes are indicated for each GO term. For parts a and b, significance of overlap by one-sided Fisher’s exact test. c For cancer susceptibility genes (ref [36], upper left panel), DNA duplex unwinding or DNA repair genes (from part b, lower left panel), and mitochondrial-related genes (from part b, right panel), the genes and patients involving an SV with predicted loss of function for the gene. For each gene listed, the same germline SV is involved for all patients with SV event indicated. For each SV event, higher or lower gene expression in the tumor sample (> 0.5SD or < −0.5SD, respectively) is indicated (yellow and blue, respectively).
Gene-level germline SV-associated expression alterations
We set out to systematically identify genes for which the proximity of germline SVs was recurrently and significantly associated with differential expression in the tumor sample across multiple patients, which may involve cis-regulatory alterations [10, 11, 16]. Using integration approaches between SVs and gene expression, previously demonstrated in tumors for somatic or germline SVs [13–18, 21, 22], we assessed gene-level associations between tumor expression and nearby germline SV breakpoints across the CBTN cohort. For each gene with expression data from the tumor, we assessed the pattern of nearby germline SV breakpoints within a given region window (e.g., 100 kb upstream of the gene). From the CBTN dataset, we assembled a data matrix of gene-level breakpoint patterns for 21,642 genes across 1430 tumors. We then assessed the association between expression and germline SV breakpoint pattern for each gene by linear models correcting for covariates, including tumor histologic type. In some respects, our overall approach parallels the concept of eQTLs [5, 7, 39], but with our approach being gene-relative region-specific rather than SV-specific. Genomic region windows considered here included larger windows spanning 1 Mb upstream or downstream of a given gene, as germline SVs often span several kilobases and have been previously reported to be involved in long-range gene regulation [40].
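The gene-level breakpoint pattern described above can be pictured as a present/absent indicator per gene, patient, and region window. The sketch below builds that indicator for one gene. It is illustrative only: the function names and coordinates are hypothetical, upstream and downstream are assumed to be defined relative to gene strand, and the 1 Mb window is assumed to cover the two flanks but not the gene body. The source does not state either convention.

```python
def region_windows(start, end, strand, near=100_000, far=1_000_000):
    """Return {window name: list of inclusive (lo, hi) intervals} for one gene."""
    left_flank = (start - near, start - 1)
    right_flank = (end + 1, end + near)
    # ASSUMPTION: upstream/downstream follow the gene's strand.
    upstream, downstream = (
        (left_flank, right_flank) if strand == "+" else (right_flank, left_flank)
    )
    return {
        "upstream_100kb": [upstream],
        "downstream_100kb": [downstream],
        "gene_body": [(start, end)],
        # ASSUMPTION: the 1 Mb window covers both flanks, excluding the gene body.
        "flank_1mb": [(start - far, start - 1), (end + 1, end + far)],
    }


def breakpoint_pattern(gene, breakpoints_by_patient):
    """gene: (chrom, start, end, strand); breakpoints: {patient: [(chrom, pos), ...]}.

    Returns {window name: {patient: 1 if any breakpoint falls in the window else 0}}.
    """
    chrom, start, end, strand = gene
    pattern = {}
    for name, intervals in region_windows(start, end, strand).items():
        pattern[name] = {
            patient: int(
                any(
                    c == chrom and lo <= pos <= hi
                    for c, pos in bps
                    for lo, hi in intervals
                )
            )
            for patient, bps in breakpoints_by_patient.items()
        }
    return pattern


# Toy data (hypothetical coordinates); each SV contributes two breakpoints.
gene = ("chr1", 2_000_000, 2_050_000, "+")
breakpoints = {
    "P1": [("chr1", 1_950_000), ("chr1", 1_960_000)],
    "P2": [("chr1", 2_010_000), ("chr1", 2_700_000)],
    "P3": [("chr2", 1_950_000)],
}
pattern = breakpoint_pattern(gene, breakpoints)
print(pattern["upstream_100kb"], pattern["flank_1mb"])
```

Repeating this over all genes gives one genes-by-patients matrix per window, which can then serve as the predictor in a per-gene linear model with covariates.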
By integrating transcriptomic data (from tumors) with germline SV data [22] (from blood normal), hundreds of genes showed significantly differential gene expression in relation to nearby SV breakpoints (relative to tumors without breakpoints). SV breakpoints associated with differential expression include breakpoints located downstream or upstream of genes or within the gene boundaries (Fig. 4a and Table S2). For regions 100 kb upstream of the gene, 100 kb downstream of the gene, within the gene body, or 1 Mb upstream or downstream of the gene, the numbers of significant genes at FDR < 10% were 116, 98, 93, and 136, respectively, after correction for tumor histologic type, tumor ploidy, and gene-level copy number (along with patient sex for genes on chromosomes X or Y). Interestingly, correcting for covariates such as tumor histologic type or gene copy number did not substantially change the numbers of statistically significant genes, suggesting that such covariates would not represent major confounders in the relationship between germline SVs and expression as observed in this tissue context. All major SV classes (duplication, inversion, deletion) were involved in the significant germline SV-expression associations, though associations involving higher expression were statistically significantly enriched for duplication SVs (Fig. 4b). A set of 308 genes was associated with differential expression in conjunction with nearby germline SV breakpoints (FDR < 10%, after correction for tissue type, gene-level copy number, and tumor ploidy, along with patient sex in the instance of X or Y genes) for any of the above four genomic region windows examined (Figs. 4c and d).
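The FDR < 10% threshold above implies a multiple-testing adjustment across the per-gene p-values for each region window. The sketch below shows one common way to do this, the Benjamini-Hochberg procedure. This is an assumption: the text does not name the FDR method used, and the gene labels and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy per-gene p-values for one region window (hypothetical).
gene_p = {"GENE_A": 0.001, "GENE_B": 0.04, "GENE_C": 0.03, "GENE_D": 0.5}
q_values = benjamini_hochberg(list(gene_p.values()))
significant = [g for g, q in zip(gene_p, q_values) if q < 0.10]
print(significant)
```

In the toy example, three of the four genes pass the 10% threshold. With over 21,000 genes tested per window, the adjustment is far more stringent.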
Fig. 4.
Genes with differential mRNA expression recurrently associated with nearby germline SV breakpoints across the CBTN cohort. a For each of four genomic region windows in relation to genes (100 kb upstream, 100 kb downstream, within the gene body, or 1 Mb upstream or downstream), the numbers of significant genes (FDR < 10%) with association between mRNA expression and nearby germline SV breakpoint. Numbers above and below the zero point of the y-axis denote positively and negatively correlated genes, respectively. Linear regression models evaluated significant associations with and without corrections for specific covariates, as indicated. b SV class breakdown (duplication, inversion, deletion) for the germline SV-gene associations with breakpoints 100 kb upstream of the gene involving higher or lower expression, respectively (p < 0.01 and expression greater than or less than the tumor median, respectively). Enrichment p-values by chi-square test. c Heatmap of significance patterns for 308 genes associated with differential expression in conjunction with nearby germline SV breakpoints (FDR < 10%, after correction for covariates) for any genomic region window. Genes listed are cancer-associated by COSMIC [62]. d Significance of genes with germline SV-expression associations (best gene FDR among the genomic region windows 100 kb upstream, 100 kb downstream, and within the gene), as plotted (y-axis) versus the percentage of samples with SV breakpoint in the genomic region window (x-axis). e Venn diagrams representing overlapping top genes between the CBTN-based germline SV-expression associations (p < 0.01, any of the four genomic region windows) and germline SV-expression associations from the PCAWG adult cancer cohort [21]. Left diagram is for positively associated genes in both cohorts; right diagram, for negatively associated genes. Significance of overlap by one-sided Fisher’s exact test. 
f ADGRG7 and TFG mRNA levels in pediatric brain tumors (orange and purple, respectively) corresponding to germline SV breakpoints (left) involving a previously identified TFG-ADGRG7 polymorphic gene fusion in healthy individuals [41]. Each SV for a patient sample has two breakpoints represented. Boxplot (right, representing 5%, 25%, 50%, 75%, and 95%, p-value by t-test) shows ADGRG7 expression by patient tumor samples with or without SV breakpoint. For parts b-e, SV-expression association p-values and FDRs correct for tumor type, gene-level copy, and tumor ploidy by linear modeling
A significant fraction of the germline SV-expression associations observed in the CBTN cohort was represented in other cohorts. Previously, we had generated germline SV-expression associations for the PCAWG cohort of 1218 tumors representing adult cancers from various tissues of origin [21]. Of the 466 genes with mRNAs positively correlated with germline SV breakpoints in the CBTN cohort (p < 0.01, linear model with covariates, any genomic region window), a significant number (36) were similarly positively associated between germline SVs and expression in the PCAWG cohort (overlap p < 1E-6, one-sided Fisher’s exact test; Fig. 4e, left). Similarly, 37 of the 469 genes with mRNAs negatively correlated with germline SV breakpoints in the CBTN cohort (p < 0.01) were also negatively correlated in the PCAWG cohort (overlap p < 1E-14; Fig. 4e, right). We also found that our germline SV-expression associations from the CBTN cohort shared significant overlaps with SV-eQTLs previously cataloged using normal brain tissues from the GTEx project [7] (Fig. S1g). At the same time, most of the germline SV-expression associations in the CBTN cohort appear unique to that cohort as compared to PCAWG or GTEx cohorts, which might be attributable to the unique tissue and cellular context represented by the CBTN cohort. Genes with germline SV-expression associations found for CBTN and PCAWG cohorts included ADGRG7, involved in a previously identified TFG-ADGRG7 polymorphic gene fusion in healthy individuals [41]. In the CBTN cohort, 30 patients harbored germline SV breakpoints corresponding to this TFG-ADGRG7 fusion, involving very high ADGRG7 expression levels in particular (Fig. 4f, fold change = 6.7). We also examined the CBTN germline SV-expression associations for any associations with gene regulatory elements [42], and while a fraction of these associations might have a possible explanation, e.g., in terms of regulatory element disruption or duplication of enhancers (Fig. S2), most associations did not. Elsewhere, we explored the impact of SV-associated altered DNA methylation on expression in pediatric brain tumors [28], involving a subset of genes.
The CBTN germline SV-expression associations reported above were significant after accounting for tumor histologic type, meaning that the extensive molecular differences involving tumor histology would not explain these associations. For the top SV-expression associations, including specific genes highlighted above and below, the associated SVs and differential expression patterns spanned multiple tumor types, similar to what we previously observed regarding somatic SV-associated expression patterns [15, 20]. The SV-expression associations as catalogued here represent a class of phenotypic variation, where almost all the genes involved are presumed not to represent cancer-predisposing genes. In general, cancer-predisposing genes tend to be cancer type specific [43]. We carried out a formal search among the 925 genes involved in the top SV-expression associations in either direction (p < 0.01, linear model with covariates, any region) for those with significant enrichment of SV breakpoint patterns (p < 0.001, chi-square test) in one of 24 tumor types (minimum of five patients). Only ten genes (ATRT: ERCC6L2, HCCS, JPX; CRANIO: PGS1, LYZL1; DIPG: TRIM45, TFB2M, UGT2B17; NFIB: TNNI3K; SCHW: ZC3H11A) met the above criteria, and the associated SV breakpoint patterns for each of these genes still spanned multiple tumor types (Table S3) and so did not fit the expected profile for strong cancer-predisposing gene candidates (e.g., having high penetrance).
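The tumor-type search described above amounts to testing, for each gene and tumor type, whether breakpoints are over-represented in that tumor type relative to the rest of the cohort. A minimal sketch using a Pearson chi-square test on a 2 x 2 table is shown below. The function name and counts are hypothetical, and the absence of a continuity correction is an assumption, as the text does not give those details. For one degree of freedom, the p-value follows exactly from the complementary error function.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


# Toy counts (hypothetical) for one gene and one tumor type:
# rows = tumor type of interest vs. all other patients,
# columns = breakpoint present vs. absent in the gene's region window.
in_type_with, in_type_without = 20, 20
others_with, others_without = 139, 1251

stat, p_value = chi_square_2x2(in_type_with, in_type_without, others_with, others_without)
enriched = (
    in_type_with >= 5  # minimum of five patients, as in the text
    and in_type_with / (in_type_with + in_type_without)
    > others_with / (others_with + others_without)
    and p_value < 0.001
)
print(round(stat, 1), enriched)
```

The direction check matters because the chi-square statistic alone does not distinguish enrichment from depletion.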
Germline SVs and associated genes involving patient ancestry
Widespread genetic differences are associated with ancestry [44]. In the CBTN cohort, we could identify germline SVs that distinguished the patients based on ancestry. In an unsupervised analysis, the gene-centric germline SV breakpoint patterns for the genes with SV-expression association (p < 0.01, linear model with covariates) showed an overall separation between patients of African ancestry and other patients, as well as some separation by Asian ancestry (Fig. 5a). We also observed similar overall germline SV breakpoint patterns by ancestry in the PCAWG adult cancer cohort (Fig. S3a). For each of the three major ancestral groups represented in the CBTN cohort (African, European, and Asian) and for each genomic region window (100 kb upstream, 100 kb downstream, within-gene, 1 Mb up- or downstream), hundreds more genes than expected by chance had gene-centric germline SV breakpoints enriched (p < 0.01, one-sided Fisher’s exact test) in that ancestral group compared to the rest of the patients (Table S4). For each ancestral group and gene region window, there was very high overlap between the top genes with enrichment of germline SV breakpoints as respectively observed in the CBTN and PCAWG cohorts (Fig. S3b–d), with close to half of the CBTN genes being similarly significant in the PCAWG cohort. In addition, for each major ancestral group, we found widespread gene expression differences (higher or lower) in tumors of that group versus tumors from the rest of the patients (Table S4).
Fig. 5.
Germline SV breakpoint patterns involving patient ancestry. a Principal Components Analysis (PCA) plot of the 1430 CBTN patients, based on the germline SV breakpoint patterns for the genes with significant SV-expression association (p < 0.01, linear model with covariates) for genomic region windows 100 kb upstream, 100 kb downstream, or within the gene (breakpoint present vs absent for the given gene, patient, and region window). Patients are colored by ancestry (AFR, African; EUR, European; ASN, Asian; AI/AN, American Indian or Alaskan Native; NH, Native Hawaiian). b For the genes with significant germline SV-expression association in the CBTN cohort (p < 0.01 for 100 kb upstream region window, linear model with covariates), significant numbers had the associated SV breakpoint patterns enriched in African ancestry versus other patients (p < 0.01, one-sided Fisher’s exact test) or were differentially expressed in African ancestry versus other patient tumors with the same direction of change (p < 0.05, linear model correcting for tumor type). Gene set overlap p-values by one-sided Fisher’s exact test. c Top set of 41 genes (including ACOT1 and CRYBB2P1) with both germline SV breakpoints and expression associated with patients of African ancestry. These genes have a germline SV-expression association (p < 0.01) for one of the four genomic region windows examined, with significant enrichment in African ancestry patients of SV breakpoints for that region window (p < 0.01) and with higher or lower expression in African ancestry patients (p < 0.05) in the same direction as the SV-expression association. d ACOT1 tumor mRNA levels corresponding to germline SVs located in the genomic region 100 kb downstream of the gene (left). Each SV for a given patient sample has two breakpoints represented. A deletion SV (“SV 1”) and a duplication SV (“SV 2”) respectively involve mostly African and European ancestries. 
Boxplot (right) shows ACOT1 expression by tumor samples by germline SV status (African-associated versus European-associated versus other). e Similar to part d, but for germline SV breakpoints within or upstream of CRYBB2P1, the latter associating with African ancestry patients and higher expression. For parts d and e, boxplots represent 5%, 25%, 50%, 75%, and 95%, with p-values by t-test
For the sets of genes with germline SV-expression associations in the CBTN cohort (p < 0.01, linear model with covariates), there were significant overlaps with the genes respectively differentially expressed in tumors from African patients (with the same direction of change as the SV-expression association) or having germline SV breakpoint patterns enriched in African patients (for the same genomic region window, Figs. 5b and S3e). We defined a top set of 41 genes with both germline SV breakpoints and expression associated with patients of African ancestry as described above, with the genes also having a germline SV-expression association (Fig. 5c). Similar sets of genes could be defined according to European or Asian ancestry, though with fewer top genes (Table S4). The top 41 African ancestry-associated genes included ACOT1 and CRYBB2P1. ACOT1 had two recurrent germline SVs with breakpoints 100 kb downstream of the gene, one mostly represented by patients of European ancestry and the other mostly represented by patients of African ancestry, with the latter in particular associated with higher expression in tumors (Fig. 5d, fold change = 1.7). ACOT1 encodes Acyl-CoA thioesterase 1, which catalyzes the hydrolysis of long-chain acyl-CoAs to free fatty acids and coenzyme A and is typically upregulated in obesity [45]. The genomic region involving CRYBB2P1 included three recurrent germline SVs, one with both breakpoints upstream of the gene, one with a breakpoint within the gene, and one with SV breakpoints spanning the gene. The germline SVs with at least one breakpoint upstream were over-represented for African ancestry patients and associated with higher CRYBB2P1 tumor expression (Fig. 5e, fold change = 1.4). In breast cancer, CRYBB2P1 expression is significantly higher in tumors of African ancestry patients relative to European patients and enhances tumorigenesis by promoting cell proliferation [46].
Germline SVs and associated genes involving patient survival
While most genes with germline SV-associated differential expression in the CBTN cohort would presumably have no role in pediatric cancer, a subset of these genes may have potential roles. Such genes may include those with established links to cancer in the literature, while data-driven approaches may reveal other genes of potential interest. Overall survival data were available for 1152 of the 1430 CBTN patients (Table S1). For each of the 21,642 genes in our compiled CBTN datasets, we associated the germline SV breakpoint pattern with survival (by Cox, accounting for tumor histologic type) for each of the SV breakpoint matrices respectively involving the four genomic region windows examined for SV-expression associations (Table S5). Using a strict FDR cutoff of 10%, 39, 15, 34, and 35 top genes were respectively associated with patient survival for gene-centric regions 1 Mb, within gene, 100 kb upstream, and 100 kb downstream (Table S5), and using a relaxed p-value cutoff of 0.05, about 75% more significant genes were found over chance expected due to multiple testing, indicative of information in the patient germline representing a factor in patient outcomes. In carrying out integrative analyses to identify top genes involving SV-associated survival patterns, we used a more relaxed p-value cutoff for each individual analysis to limit false negatives, consistent with approaches used elsewhere to identify enrichment patterns that would be missed using overly strict FDR cutoffs [13, 21, 47, 48]. In particular, for each of the genomic region windows 100 kb upstream of the gene, 100 kb downstream of the gene, and within the gene body, we observed significant overlap between genes with SV-expression association (p < 0.01 by linear model with covariates) and genes with germline SV breakpoint patterns positively associated with pediatric brain tumor patient survival (one-sided Cox p < 0.05, Fig. 6a). 
The enrichment patterns suggested potential roles involving some of the observed SV-expression associations with more aggressive pediatric brain tumors.
Fig. 6.
Germline SV breakpoint patterns involving pediatric brain tumor patient survival. a Venn diagrams of the overlap between genes with germline SV breakpoint patterns associated with pediatric brain tumor patient survival and genes with SV-expression association for each of the genomic region windows represented. Enrichment p-values by one-sided Fisher’s exact test. b Combining germline SV data with patient survival data and tumor expression data across the 1430 CBTN patients (1152 with available survival data), 85 genes had both a positive association between germline SV breakpoints near the gene and worse overall survival and a positive or negative germline SV-expression association, for the same genomic region window. c Association of the 85-gene signature from part b with patient survival in the CBTN cohort, based on scoring of the tumor mRNA expression profiles. The direction of each gene in the 85-gene signature, as applied here to the CBTN RNA-seq dataset, is based on the direction of the germline SV-expression association. Survival association p-values correct for tumor type. d ACTG1 tumor mRNA levels corresponding to germline SVs located in the genomic region 100 kb downstream of the gene (left). Each SV for a given patient sample has two breakpoints represented. Boxplot (right) shows ACTG1 expression by tumor samples with germline SV breakpoint downstream of the gene versus other tumors. e Similar to part d, but for germline SV breakpoints located 1 Mb upstream or downstream of AHRR. f Associations with worse patient survival for ACTG1- and AHRR-associated germline SV breakpoint patterns and for ACTG1 and AHRR expression in tumors. AHRR SV patterns involve breakpoints 500 kb upstream or 250 kb downstream (corresponding to the boxplots of part e, involving patients with survival data); ACTG1 SV patterns involve breakpoints 100 kb downstream. 
For parts a and b, germline SV breakpoint survival associations by one-sided p < 0.05 by Cox accounting for tumor type (minimum of 10 patients with breakpoint in region), and germline SV-expression associations by p < 0.01 by linear model correcting for covariates. For parts a and b, Cox p-values are one-sided; all other Cox p-values in the figure are two-sided. For part f, survival associations are significant (p < 0.05) by Cox correcting for tumor type (Table S5). For parts d and e, boxplots represent 5%, 25%, 50%, 75%, and 95%, with p-values by t-test
We identified a top set of 85 genes for which there was both a positive association (for any region examined) between germline SV breakpoints near the gene and worse overall survival and a positive or negative association between nearby germline SV breakpoints and differential gene expression (Fig. 6b, 43 genes with positive SV-expression association). As we had not used expression associations with survival to derive the 85 genes, we tested whether the 85-gene signature could collectively predict patient outcome using the CBTN expression dataset, whereby patients were significantly stratified into high-, low-, and intermediate-risk groups (Fig. 6c, Log-rank p = 0.009, accounting for tumor type). Several factors, not limited to SVs, may underlie the survival-associated expression patterns in tumors observed across the CBTN cohort. Genes in the 85-gene signature include ACTG1 and AHRR, both having nearby germline SV breakpoints associated with slightly elevated expression in tumors (Figs. 6d and e, fold change = 1.07 for both). ACTG1 encodes an actin isoform, and ACTG1 overexpression promotes growth and migration and is associated with greater metastatic potential and worse prognosis [49]. AHRR encodes a suppressor of Aryl Hydrocarbon Receptor (AhR), where AhR may act as either an oncogenic protein or a tumor suppressor [50]. For ACTG1 and AHRR, germline SV breakpoints and expression in tumors were each associated with worse patient survival (Fig. 6f), where other factors could influence the differential expression of these genes across tumors in addition to germline SVs. Similarly, 52 genes with SV breakpoints associated with better survival also involved expression correlates of survival (Fig. S4 and Table S5).
Germline SVs and associated genes involving somatic targeting
As another avenue for identifying potential disease-relevant genes among those with germline SV-associated differential expression, we considered genes appearing somatically targeted in the CBTN cohort. Of the genes with germline SV-expression associations in the CBTN cohort (p < 0.01, linear modeling with covariates, any region window), a significant number, 39 (p = 0.002, one-sided Fisher’s exact test), showed somatic copy number amplification in > 2% of tumors, and the associated amplification patterns were not limited to patients with a germline SV (Fig. 7a). Also, in parallel with the above analyses linking germline SV breakpoint patterns with differential expression of nearby genes, we carried out a similar set of analyses for the CBTN somatic call set involving 1375 of the 1430 patients (Table S6). For somatic SV breakpoints in regions 100 kb upstream of the gene, 100 kb downstream of the gene, within the gene body, or 1 Mb upstream or downstream of the gene, the numbers of significant genes at FDR < 10% were 254, 219, 134, and 354, respectively, after correction for tumor type, tumor ploidy, gene-level copy number, and patient sex (the last covariate for X/Y genes). Notably, there was little overlap in the significant genes between the germline and somatic SV result sets. A set of 37 genes had both germline and somatic SV breakpoints associated with differential expression (p < 0.01 for each by linear modeling with covariates) in the same direction and for any genomic region window (Fig. 7b).
Fig. 7.
Genes with germline SV-expression associations that also appear targeted by somatic alteration in pediatric brain tumors. a Of the genes with germline SV-expression associations in the CBTN cohort (p < 0.01), a significant number—39—showed somatic copy number amplification (> 4 copies, corrected for tumor ploidy) in > 2% of CBTN tumors (p-value by one-sided Fisher’s exact test). The heatmap off to the right represents gene amplification patterns for the 39 genes, involving 145 tumors with amplification for any gene. See part c for tumor type color bar scheme. b Heatmap of significance patterns for 37 genes with both germline and somatic SV breakpoints associated with differential expression (p < 0.01 for each by linear modeling correcting for covariates) in the same direction and for any genomic region window (100 kb upstream of the gene, 100 kb downstream of the gene, within the gene body, or 1 Mb upstream or downstream of the gene). c HGF mRNA levels in brain tumors corresponding to germline SV breakpoints (orange) and somatic SV breakpoints (purple) located upstream of the gene (left). Boxplot (right) shows HGF expression by patient tumor samples with somatic SV breakpoints versus samples with germline SV breakpoints versus other patient samples. d Similar to part c, but for germline and somatic SV breakpoints located upstream of or within BCOR. For parts a and b, SV-expression association p-values correct for tumor type, gene-level copy, and tumor ploidy by linear modeling. For parts c-d, boxplots represent 5%, 25%, 50%, 75%, and 95%, with p-values by t-test; for somatic SV breakpoints, scatterplots represent the breakpoint closest to the gene start
Genes of particular interest within the set of 37 with both germline SV-expression and somatic SV-expression associations included two genes with well-established cancer associations: HGF and BCOR (Fig. 7c and d). HGF signaling promotes tumor progression and angiogenesis in many cancers, including brain tumors [51]. Of the 1430 CBTN patients, nine had tumors harboring somatic SV breakpoints upstream of HGF, and an additional six patients had germline SV breakpoints upstream of HGF, with HGF expression notably higher in tumors from both patient groups (Fig. 7c, fold change = 2.4 for germline SVs). BCOR-associated somatic SVs appeared to represent the rare disease known as central nervous system tumor with BCOR internal tandem duplication [52]. BCOR-associated somatic SVs were located within the gene and involved ten patients, while most germline SVs associated with BCOR had breakpoints upstream and involved 46 patients, with BCOR expression notably higher in tumors from both patient groups (Fig. 7d, fold change = 1.3 for germline SVs).
Discussion
Here, we assembled a catalog of gene-level germline SV-expression associations in pediatric brain and CNS tumors. Some germline SV-expression associations involved gene deletion or disruption, while others represented cis-regulatory alterations. A significant fraction of the SV-expression associations from the CBTN pediatric brain tumor cohort was observable in our previous results defining germline SV-expression associations in adult cancers [21] and SV-expression analysis results across various normal tissues [7]. However, most CBTN associations were specific to this cohort, indicating the importance of tissue and cellular context in cataloging germline SV-expression associations. Of the individual genes highlighted in our study report, only ADGRG7, CRYBB2P1, and ACOT1 were included in our previous results set of SV-expression associations based on PCAWG cohort [21]. Our results capture a class of phenotypic variation where the vast majority of the genes involved would likely not be disease contributors. For most SV-expression associations not involving gene disruption or copy number changes, there was no clear explanation as to what specific gene regulatory elements might be involved, which may reflect a need for a deeper understanding of gene regulatory mechanisms in general [7]. The widespread associations reported here are non-random but in and of themselves do not demonstrate cause and effect in every instance, where some other element associated with a germline SV might be involved. Some genes with germline SV-expression associations involve subtle versus dramatic changes in expression levels. Whether small expression changes in cancer-relevant genes throughout a patient’s life could contribute to cancer remains an open question. 
Our molecular landscaping study mapping out SV-expression associations across pediatric brain tumors involves both exploratory and cataloging aspects, where the translational implications of this work may not be immediate (e.g., in terms of providing a genetic test for clinical application) but would be downstream and involving a subset of our SV-expression associations, as described further below.
In the CBTN cohort, we could identify germline SVs and associated differentially expressed genes that distinguished patients based on ancestry. A large portion of the ancestry-specific germline SV breakpoint patterns observed in the CBTN pediatric brain tumor cohort was also observed in the PCAWG adult cancer cohort, even if any associated expression differences were not always observable in both cohorts. These results reflect the need to include diverse patient populations in genetic and genomic studies [44, 53]. Presumably, most of the ancestry-associated genes highlighted in our study would not relate to human disease but instead reflect the phenotypic variation between patient populations. Still, a few genes uncovered in our study would have associations with disease in the literature (though not necessarily regarding pediatric brain tumors), including CRYBB2P1 [46], ACOT1 [45], and XIAP [54]. As larger genomic datasets with combined WGS and expression data become available, more refined catalogs of germline SV-expression associations across various normal and disease contexts can be established, where larger patient numbers would provide greater statistical power for establishing robust patterns for a given gene and uncovering additional genes of interest. Our present study relied on short-read WGS, and while emerging technologies like long-read WGS can identify additional SVs not detectable by short-read sequencing [55], our study’s strengths included the higher numbers of patients over what other studies may involve.
The germline SV-expression associations in the CBTN cohort would include genes relevant to cancer. Such genes may include those with established links to cancer in the literature, while other genes of potential interest were revealed here by incorporating patient survival and somatic mutation patterns. The subset of germline SVs involving genes relevant to cancer arising in our present study would represent strong candidates for further investigation to establish any potential cancer risk associations, e.g., involving Genome-Wide Association Studies (GWAS) and similar studies involving both cases and controls. Knowledge of genetic predisposition for brain and CNS tumors remains sparse [56]. Studies utilizing SNP genotyping arrays and next-generation sequencing have defined both pathogenic germline mutations in known cancer genes and common SNVs associated with pediatric brain tumors [56, 57], but analogous WGS-based case–control studies of SVs remain to be carried out. A recent study by Gilliani et al. [34] carried out a GWAS of germline SVs in Neuroblastoma and Ewing sarcoma, involving both cases and controls; however, no single germline SV was individually associated with the risk of these cancers. As a group, however, rare and singleton SVs that result in loss of function were shown to increase risk. Likewise, our present study cataloged such rare SVs, implicating processes of DNA repair and mitochondria. For future SV GWAS or case–control studies in pediatric tumors, our present study’s catalog of SV-expression associations represents an important resource, as any disease-associated SVs would need context to help interpret the association in terms of which nearby genes appear affected. Any established cancer risk variants could potentially aid genetic testing in the clinical setting. 
At the same time, the penetrance of many genes would be moderate, and any associated patient survival patterns may not be dramatic though statistically significant, so such genes may not be of much use in clinical decision-making but still provide insight into disease biology [58].
Our present study results include an intriguing observation of germline SV breakpoint patterns associated with patient survival. This observation mirrors a similar finding we previously reported for the PCAWG cohort [21], but involving different genes, as well as similar findings in CBTN regarding DNA methylation patterns [28]. As with previous results, the CBTN cohort results show gene-level survival associations exceeding the chance expected due to multiple gene testing. While for nominal p-value cutoffs it may be difficult for us to robustly associate a particular gene with survival, the top associated genes together represent information regarding germline influences on patient outcome. The patient survival associations provide us a means of highlighting genes and associated germline SV patterns with potential roles as disease contributors. Nominally significant survival associations at the germline level, combined with SV-expression association and other information such as survival associations at the expression-level can help prioritize genes for further investigation. The observed prognostic power associated with germline SVs would be weak compared to other clinical or molecular variables and, therefore, would not have utility in predicting survival in the clinical setting. Human beings are highly complex, and a subtle influence of the patient germline on disease outcome, present alongside other factors, is conceivable. As larger combined SV and expression datasets become available (as expected for the CBTN [9]), we can anticipate greater power in defining global patterns involving patient outcome [20, 59], which may include survival associations within specific tumor types or molecular subgroups (not possible here due to limited power). Our catalog of germline SV-expression associations would involve genes that would be informative about the biology of cancers and thereby could inform future studies targeting these genes or their associated pathways.
Methods
Patient cohort
Results are based on data generated by the CBTN. At the time of this study, 1430 patients had data available for both short-read WGS (at 60 × coverage for tumor, 30 × coverage for blood normal) and RNA-seq (tumor only, 30 × coverage). Tumor samples in this CBTN cohort spanned at least 33 different tumor histologic types (Table S1): APTAD, Adenoma; ATRT, Atypical Teratoid Rhabdoid Tumor; CHDM, Chordoma; CNC, Neurocytoma; CPC, Choroid plexus carcinoma; CPP, Choroid plexus papilloma; CRANIO, Craniopharyngioma; DIPG, Diffuse intrinsic pontine glioma; DNT, Dysembryoplastic neuroepithelial tumor (DNET); EPM, Subependymal Giant Cell Astrocytoma (SEGA); EPMT, Ependymoma; ES, Ewing's Sarcoma; GMN, Germinoma; GNBL, Ganglioneuroblastoma; GNG, Ganglioglioma; GNOS, Glial-neuronal tumor not otherwise specified (NOS); HMBL, Hemangioblastoma; LCH, Langerhans Cell histiocytosis; MBL, Medulloblastoma; MNG, Meningioma; MPNST, Malignant peripheral nerve sheath tumor; NBL, Neuroblastoma; NFIB, Neurofibroma/Plexiform; ODG, Oligodendroglioma; PBL, Pineoblastoma; PCNSL, Primary CNS lymphoma; PHGG, High-grade glioma/astrocytoma (WHO grade III/IV); PLGG, Low-grade glioma/astrocytoma (WHO grade I/II); PNET, Supratentorial or Spinal Cord primitive neuroectodermal; RMS, Rhabdomyosarcoma; SARCNOS, Sarcoma; SCHW, Schwannoma; TT, Teratoma; and Other/unspecified. The histologic designations of the tumors, as provided by the individual CBTN member institutions contributing the samples, were confirmed by independent pathology review at the CBTN centralized biorepository, with most contributing sites providing representative histology slides. The CBTN’s regulatory for CNS tumors intentionally allowed for the broad collection of abnormal cell growth, not necessarily specific to brain-specific cell types. Likewise, all patients with available CBTN data on combined RNA-seq and WGS formed the basis of our study. 
For some patients, multiple tumors were profiled for RNA-seq (e.g., recurrent or progressive and initial tumor); for these patients, one tumor was randomly selected for this present study. Tumor molecular profiling data were generated through informed consent as part of the CBTN efforts and analyzed here per the CBTN’s data use guidelines and restrictions.
Molecular profiling datasets
The somatic DNA workflow for DNA variant calling is available in the KidsFirst Github repository (https://github.com/kids-first/kf-somatic-workflow) and described previously [26]. The CBTN used the Manta SV v1.4.0 algorithm for somatic SV calls [61] based on WGS data. Germline SV calls were made using the Manta and SVABA (v1.1.0) [64] algorithms. The hg38 reference used for somatic SV calling was limited to canonical chromosome regions. We accessed the somatic SV VCF and gene-level copy number CNS files (both based on WGS data) through the CBTN Cavatica site (https://cbtn.org) during March and December of 2023 from the “CBTN” and “CBTN-X01” folders. We accessed the germline SV VCF files during January and February of 2024 through the Cavatica site, using a private link provided by the CBTN for protected patient genomic data. The Manta algorithm classified each SV call as one of the following: tandem duplication, insertion, deletion, inversion, or translocation. In the analyses, we used only germline and somatic SV calls that passed quality filters.
Germline SV calls from Manta and SVABA were pairwise joined based on SV position, allowing 200 bp slop at the breakpoints. For this merged germline SV call set, we used the Manta SV genomic coordinates throughout the study and required a minimum SV size of 10 bp. To minimize any potential batch effects due to the data for the first CBTN cohort being generated much earlier than the CBTN-X01 cohort [15], we filtered out of the merged germline SV call set any SVs that were found in only one of the two cohorts (for SVs involving at least six patients). A small percentage of germline SV calls represented translocations, which in practice are often false positives caused by misalignments of repeat expansions [29]. Therefore, we removed all germline translocation SV calls from the analyses.
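The merging rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: "pairwise joined" is interpreted here as keeping a Manta call only when an SVABA call on the same chromosome has both breakpoints within the 200 bp slop, SV size is taken as end minus start, and the SVCall type, the function names, and the "translocation" label are hypothetical. The cohort-consistency filter for recurrent SVs is not shown.

```python
from typing import NamedTuple


class SVCall(NamedTuple):
    """Hypothetical minimal SV record; real VCF records carry many more fields."""
    chrom: str
    start: int
    end: int
    svtype: str = "unknown"


def same_sv(a, b, slop=200):
    """True if both breakpoints of the two calls agree within `slop` bp."""
    return (a.chrom == b.chrom
            and abs(a.start - b.start) <= slop
            and abs(a.end - b.end) <= slop)


def merge_germline_calls(manta, svaba, slop=200, min_size=10):
    """Keep Manta calls supported by an SVABA call (assumed interpretation).

    Manta coordinates are retained, translocations are dropped, and calls
    shorter than `min_size` bp are excluded.
    """
    merged = []
    for call in manta:
        if call.svtype == "translocation":
            continue  # translocation calls are removed from the analyses
        if call.end - call.start < min_size:
            continue  # below the minimum SV size
        if any(same_sv(call, other, slop) for other in svaba):
            merged.append(call)
    return merged
```

For example, a Manta deletion at chr1:1000-5000 would be retained if SVABA reported a call at chr1:1100-4900, since both breakpoints differ by 100 bp.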
We obtained processed RNA-seq data for CBTN tumors from the CBTN Cavatica site on March 7, 2023, from the “CBTN” and “CBTN-X01” folders. As we observed extensive batch effects between the CBTN and CBTN-X01 RNA-seq datasets (due to the data for the CBTN dataset being generated much earlier than the CBTN-X01 dataset [15]), we previously generated a batch effect-corrected CBTN RNA-seq dataset [20] using the SVA package and ComBat algorithm in R Bioconductor [65, 66], using histology as the experimental group (with histologies of < 20 tumors consolidated into an “other” group), and removing genes with non-zero values in < 20 tumors from the batch-correction. The final batch-corrected dataset showed dominant global expression patterns according to histologic type rather than batch [20]. For representing expression values in scatterplots and boxplots, we used z-normalized expression (subtracting the median and dividing by standard deviation, using log2-transformed values).
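The z-normalization used for plotting can be illustrated with a short sketch. The pseudocount added before the log2 transform and the use of the sample (n - 1) standard deviation are assumptions made here for illustration; the text specifies only that the median is subtracted and the result divided by the standard deviation of log2-transformed values.

```python
import math
import statistics


def z_normalize(values, pseudocount=1.0):
    """Median-centered z-scores of log2-transformed expression for one gene.

    `pseudocount` (an assumption) avoids log2(0) for unexpressed samples.
    """
    logged = [math.log2(v + pseudocount) for v in values]
    center = statistics.median(logged)
    spread = statistics.stdev(logged)  # sample standard deviation (assumed)
    return [(x - center) / spread for x in logged]


# Hypothetical expression values for one gene across five tumors.
scores = z_normalize([0, 1, 3, 7, 15])
print([round(s, 3) for s in scores])
```

Centering on the median rather than the mean keeps the plotted zero line at the typical tumor even when a few samples have extreme expression.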
Using WGS data, the CBTN made gene-level copy number calls in the tumor sample based on consensus among Control-FREEC [67], CNVkit [68], and Manta [61] algorithms, as previously described [26]. Of the 1430 CBTN patients analyzed in this study, 55 did not have somatic WGS calls available. For these patients, we inferred copy number using the copyKAT algorithm (v1.1.0) as applied to the RNA-seq dataset [69]. The combined datasets (including RNA-seq and gene-level copy number calls) involved 21,642 genes with Entrez identifier. We estimated tumor ploidy for each patient tumor sample by taking the median copy number levels across all 21,642 genes. To make calls for gene amplification or gene copy loss, we generated a ploidy-corrected copy number dataset by dividing the copy number by the tumor ploidy and multiplying it by two.
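The ploidy correction described above amounts to simple arithmetic, sketched below. The function names are hypothetical, and the amplification threshold of more than four ploidy-corrected copies is taken from the Fig. 7 legend; no threshold for copy loss is shown because none is stated here.

```python
import statistics


def ploidy_corrected(copy_numbers):
    """Return ploidy-corrected copy number per gene for one tumor.

    Tumor ploidy is estimated as the median copy number across genes;
    each gene's copy number is divided by the ploidy and multiplied by two.
    """
    ploidy = statistics.median(copy_numbers.values())
    return {gene: cn / ploidy * 2 for gene, cn in copy_numbers.items()}


def amplified_genes(copy_numbers, threshold=4.0):
    """Genes with more than `threshold` ploidy-corrected copies."""
    corrected = ploidy_corrected(copy_numbers)
    return sorted(g for g, cn in corrected.items() if cn > threshold)


# Hypothetical near-tetraploid tumor: most genes at 4 copies, one gene at 10.
tumor = {"GENE_A": 4, "GENE_B": 4, "GENE_C": 4, "GENE_D": 10, "GENE_E": 2}
print(ploidy_corrected(tumor))
print(amplified_genes(tumor))
```

In this toy tumor, a gene at four raw copies is not called amplified because the whole genome sits at four copies, whereas the gene at ten raw copies corresponds to five corrected copies.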
Integrative analyses between SVs and expression
Rare and singleton SVs were defined in our call set as SVs with either AF < 1% (CBTN, gnomAD, and TOPMed cohorts) or occurring in a single CBTN patient. For the subset of deletion SVs for which one breakpoint fell within a gene boundary, we used SVAnnotate of the GATK-SV pipeline [31] to identify SVs with predicted loss of function (e.g., the deletion overlaps any coding sequence or the transcription start site). Of the original candidate SVs that were not deletion SVs, only three duplication SVs were called as involving predicted loss of function. In one analysis, we focused on 688 manually curated Cancer Susceptibility Genes involving various cancer types [36] (accessed at csgs.sequenxe.com in May 2024).
Using SVExpress [22], we systematically defined genes with differential RNA expression in the tumor being associated with nearby germline SV breakpoints. Relative to each gene, genomic region windows considered included 100 kb upstream of the gene (from the gene start), 100 kb downstream of the gene (from the gene end), within the gene, and 1 Mb upstream or downstream of the gene start. For the 100 kb upstream or downstream or within gene regions, SVExpress constructed a gene-to-sample matrix with entries as 1 if a breakpoint occurs in the specified region with respect to the given gene in the given sample and 0 if otherwise. For the 1 Mb region, SVExpress constructed a gene-to-sample matrix using the “relative distance metric” option [17], whereby breakpoints close to the gene have more numeric weight in identifying SV-expression associations, while breakpoints further away but within 1 Mb can still have some influence. For the 1 Mb region, if multiple breakpoints occur near the gene, the breakpoint closest to the gene start is used in the breakpoint matrix. The same genomic region windows used in our present study were previously utilized in multiple studies of ours involving somatic SV-expression analyses [13–15, 20–22]. Gene-level SV-expression association analyses included 21,642 genes with Entrez identifier. Using the geneXsample SV breakpoint matrix, SVExpress assessed the correlation between the expression of the gene and the presence of an SV breakpoint using a linear regression model (with log-transformed expression values). Linear regression models separately evaluated significant associations without any correction for covariates or when correcting for tumor tissue type or when correcting for tissue type, gene-level copy number (from the tumor sample), and tumor ploidy. For genes on chromosomes X or Y, we added patient sex as a model covariate. 
Genes significant by the third model—correcting for tissue type, gene-level copy number, and tumor ploidy (and patient sex for chromosome X/Y genes)—were carried forward in the downstream analyses.
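To make the binary breakpoint matrix concrete, the sketch below builds gene-by-sample entries for the 100 kb upstream, 100 kb downstream, and within-gene windows. It is an illustration only and not SVExpress code: the Gene type and function names are hypothetical, and treating "upstream" as strand-aware (toward higher coordinates for minus-strand genes) is an assumption. The relative distance weighting for the 1 Mb window and the regression step are not shown.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; start < end in genomic coordinates."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def window(gene, region, flank=100_000):
    """Return (lo, hi) bounds for "upstream", "downstream", or "within"."""
    if region == "within":
        return gene.start, gene.end
    # The window lies on the low-coordinate side for plus-strand upstream
    # and for minus-strand downstream (strand awareness is an assumption).
    low_side = (gene.strand == "+") == (region == "upstream")
    if low_side:
        return max(0, gene.start - flank), gene.start
    return gene.end, gene.end + flank


def breakpoint_matrix(genes, breakpoints_by_sample, region):
    """Entry is 1 if the sample has any breakpoint in the gene's window, else 0."""
    matrix = {}
    for gene in genes:
        lo, hi = window(gene, region)
        matrix[gene.name] = {
            sample: int(any(chrom == gene.chrom and lo <= pos <= hi
                            for chrom, pos in breakpoints))
            for sample, breakpoints in breakpoints_by_sample.items()
        }
    return matrix
```

Each row of the resulting matrix can then serve as the predictor in a per-gene regression of log-transformed expression on breakpoint status plus covariates.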
By SVExpress, a gene shows significant SV-expression associations if the expression and SV breakpoint patterns line up non-randomly with respect to each other across all samples after correction for covariates. For the analyses involving within-gene, 100 kb upstream, and 100 kb downstream gene regions, we only considered genes for which at least three patients had an SV breakpoint within the given region when estimating FDR by the Storey and Tibshirani method [70]. All genomic coordinates were based on the hg38 human reference genome. As for the germline SVs, we also generated SV-expression associations based on the somatic SVs. For the somatic SV-expression analysis, 1375 of the 1430 patients had a tumor with somatic SV calls by WGS. Where the patient tumor represented in the RNA-seq dataset had somatic SV calls, we used those calls in the analysis; otherwise, we used somatic calls for a different tumor from the same patient where available (Table S1).
Our analytical approach to associate SVs with gene expression was gene-centric rather than variant-centric. This aspect is in contrast with eQTL-based approaches focusing on variant-level associations [39]. In contrast to SNVs, there were far fewer germline SVs per patient, making our gene-centric approach tractable. Also, the gene-centric approach greatly reduces the number of tests performed, allowing more associations to survive multiple testing considerations. Our approach also allows for different variants within a region to collectively contribute to differential expression, which lends itself well to the study of somatic SVs in particular, as somatic SVs are usually not recurrent but can have breakpoints covering a sizeable genomic region in relation to genes [12]. In addition, some germline SV calls representing the same variant may differ slightly in genomic coordinates, e.g., due to sequencing or mapping issues, where all variant calls would be consistently captured by our approach. In contrast to somatic SVs, germline SVs tend to be highly recurrent and common, and in some cases, only certain germline SVs and not others within a given region would associate with differential expression of nearby genes.
We found that RNA sequencing center did not contribute to the significant SV-expression associations observed. For 1397 of the 1430 tumors with combined germline WGS and RNA-seq data in our present study, the sequencing center information (e.g., NantOmics, BGI@CHOP Genome Center, BGI) was available (85% of samples from NantOmics). Of the 466 genes significant at p < 0.01 for 1 Mb region, correcting for the covariates as described above, all genes remained significant (p < 0.05) when incorporating sequencing center as a covariate.
Associations with patient ancestry
We obtained self-reported patient ancestry from the PedCBioportal (https://pedcbioportal.kidsfirstdrc.org/) and, for the handful of patients not included in PedCBioportal, from the Cavatica website (https://cbtn.org). For the three most represented ancestral groups (African, European, Asian), we determined differential expression in each ancestral group relative to the rest of the patients by a linear model incorporating tumor histologic type as a covariate. Also, for each major ancestral group, we determined the enrichment of germline SV breakpoint patterns for each gene. For each genomic region window relative to genes considered (100 kb upstream of the gene, 100 kb downstream of the gene, within the gene, and 1 Mb upstream or downstream of the gene start), we counted the patients with a germline SV breakpoint relative to the gene in the region of interest, then determined enrichment of breakpoint occurrences by one-sided Fisher’s exact test. For each major ancestral group and gene region, we determined the sets of overlapping genes involving two or more of the following: germline SV-expression association across all patients (by linear model correcting for covariates), significant enrichment of breakpoint patterns for that ancestral group (by one-sided Fisher’s exact test), and higher or lower expression in patients of that ancestral group in the same direction as the SV-expression association (by linear model correcting for tumor type).
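The one-sided Fisher’s exact test for breakpoint enrichment can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the code used in the study: the function name and the 2×2 table layout (patients in the ancestral group versus all other patients, with versus without a breakpoint near the gene) are hypothetical, and the counts in the example are made up.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment (upper tail).

    Assumed 2x2 layout for one gene and one region window:
        a = patients in the ancestral group WITH a breakpoint
        b = patients in the ancestral group WITHOUT a breakpoint
        c = other patients WITH a breakpoint
        d = other patients WITHOUT a breakpoint
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    group_size = a + b          # patients in the ancestral group
    carriers = a + c            # patients with a breakpoint
    total = a + b + c + d
    k_max = min(group_size, carriers)
    # Sum hypergeometric probabilities for a or more carriers in the group.
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, group_size - k)
        for k in range(a, k_max + 1)
    )
    return numerator / comb(total, group_size)


# Toy example: 3 of 4 group patients carry a breakpoint vs 1 of 4 others.
p_enrichment = fisher_exact_greater(3, 1, 1, 3)   # 17/70
```

The test is one-sided because only over-representation of breakpoints in the ancestral group is of interest here; depletion is not counted as evidence.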
Survival analyses
We identified gene-level molecular correlates of patient survival associated with nearby germline SV breakpoints in the CBTN cohort, with 1152 of the 1430 patients having overall survival data. We obtained CBTN patient survival data from the PedCBioportal (https://pedcbioportal.kidsfirstdrc.org/) on May 8, 2024. For associating nearby SV breakpoints with patient outcome, we utilized the geneXsample relative distance breakpoint matrix, generated by SVExpress, for a given genomic region in relation to the gene (within the gene boundary, 100 kb upstream of the gene, 100 kb downstream of the gene, or within 1 Mb of the gene start, with relative distances involving the 1 Mb region being weighted using the “relative distance metric” option [17]). For each gene, we used a stratified Cox model (accounting for cancer type, using as.factor in R) to associate patient overall survival with the germline SV breakpoint patterns for that gene. For each region, FDR calculations were based on the respective set of genes with at least ten patients having a breakpoint in the given region (1 Mb region, 21,475 genes; within gene, 3243; 100 kb upstream, 7001; 100 kb downstream, 6950). We also associated mRNA expression of the gene with overall survival using a stratified Cox model (accounting for cancer type). We compared the set of genes significant for the SV breakpoint survival association analysis with the set of genes with SV-mRNA associations. When overlapping different result sets, we used more relaxed p-value cutoffs to limit false negatives, where the overlapping genes collectively yielded significant expression-based survival associations.
Combining germline SV breakpoint data with patient survival data and tumor expression data across the CBTN patients, 85 genes had both a positive association between germline SV breakpoints near the gene and worse overall survival (one-sided p < 0.05, Cox analysis incorporating tumor type as a covariate, and at least ten patients with a breakpoint), and a positive or negative germline SV-expression association (p < 0.01, linear modeling with covariates), for the same genomic region window, considering any one of the four region windows. For this analysis, we required a positive association between SV breakpoint pattern and survival, as the absence of a germline SV increasing risk of a shorter time to adverse event may be more difficult to interpret.
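The selection criteria above can be written out as a short Python sketch for a single region window; the same filter would be repeated for each of the four windows and the resulting genes pooled. The function name, dictionary keys, and gene names are hypothetical, and the input statistics are assumed to have been computed beforehand (the one-sided p-value being for breakpoints associated with worse survival).

```python
def select_signature_genes(survival_results, expression_results,
                           min_patients=10, survival_p=0.05, expression_p=0.01):
    """Return {gene: +1 or -1} for genes passing both filters in one window.

    survival_results:   gene -> {"one_sided_p": float, "n_patients": int}
    expression_results: gene -> {"p": float, "coef": float}
    The sign follows the direction of the SV-expression association.
    """
    selected = {}
    for gene, surv in survival_results.items():
        expr = expression_results.get(gene)
        if expr is None:
            continue  # gene not tested for SV-expression association
        passes_survival = (surv["n_patients"] >= min_patients
                           and surv["one_sided_p"] < survival_p)
        passes_expression = expr["p"] < expression_p and expr["coef"] != 0
        if passes_survival and passes_expression:
            selected[gene] = 1 if expr["coef"] > 0 else -1
    return selected


# Toy inputs with made-up values.
survival = {
    "GENE_A": {"one_sided_p": 0.01, "n_patients": 25},
    "GENE_B": {"one_sided_p": 0.01, "n_patients": 4},    # too few patients
    "GENE_C": {"one_sided_p": 0.20, "n_patients": 40},   # not survival-associated
    "GENE_D": {"one_sided_p": 0.03, "n_patients": 11},
}
expression = {
    "GENE_A": {"p": 0.001, "coef": 0.8},
    "GENE_B": {"p": 0.001, "coef": 0.5},
    "GENE_C": {"p": 0.001, "coef": -0.4},
    "GENE_D": {"p": 0.005, "coef": -1.2},
}
signature = select_signature_genes(survival, expression)
```

In this toy example, only GENE_A (direction +1) and GENE_D (direction -1) satisfy both criteria.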
We examined the association of the 85-gene signature with patient survival in the CBTN tumor mRNA profile dataset. The direction of each gene in the 85-gene signature, as applied to the CBTN RNA-seq dataset, was based on the direction of the germline SV-expression association. We scored patient profiles in the CBTN expression dataset using our previously described “t score” metric [71, 72]. This t score represents the two-sided t statistic when comparing, within each external differential expression profile, the average of the signature high genes with the average of the signature low genes. For example, the t score for a given sample profile is high when the genes high and low in the signature are respectively high and low on average in the external sample profile. We assessed the association of the gene signature score with patient outcome using Cox and log-rank tests (dividing the cases according to low, high, or intermediate signature scoring, accounting for tumor histologic type). Patient age (in years) was not significantly associated with overall survival by univariate Cox analyses [20], and multivariate Cox analyses incorporating other clinical variables were not carried out, as developing a prognostic signature that would add information on top of existing clinical variables, e.g., for potential applications in the clinical setting, was not within our study’s scope or ability. Whereas log-rank tests require binning of patients into groups for comparison (e.g., top third, bottom third, and middle third for the expression-based analyses), the Cox models treated expression values and signature scores as continuous measures without any binning.
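A minimal Python sketch of the t score for one sample profile follows. The function name and input layout are hypothetical, and the unequal-variance (Welch) form of the t statistic is an assumption, since the text does not state which form of the two-sample t statistic was used.

```python
from math import sqrt
from statistics import mean, variance


def signature_t_score(profile, high_genes, low_genes):
    """t statistic comparing signature-high vs signature-low genes in one sample.

    profile: gene -> expression value for a single sample (e.g., centered
             log-expression; the normalization is an assumption).
    Uses the Welch (unequal variance) t statistic, which is an assumption.
    """
    high = [profile[g] for g in high_genes if g in profile]
    low = [profile[g] for g in low_genes if g in profile]
    if len(high) < 2 or len(low) < 2:
        raise ValueError("need at least two genes in each signature arm")
    # Standard error of the difference in means (sample variances).
    standard_error = sqrt(variance(high) / len(high) + variance(low) / len(low))
    if standard_error == 0:
        raise ValueError("zero variance in both signature arms")
    return (mean(high) - mean(low)) / standard_error


# Toy profile with made-up gene names and values.
sample = {"H1": 2.0, "H2": 3.0, "H3": 4.0, "L1": -1.0, "L2": 0.0, "L3": 1.0}
score = signature_t_score(sample, ["H1", "H2", "H3"], ["L1", "L2", "L3"])
```

The score is positive when the signature-high genes are on average higher than the signature-low genes in the sample, and negative in the reverse case, so it can be used directly as a continuous covariate or binned into thirds.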
Statistical analyses
All p-values were two-sided unless otherwise specified. We relied on a stricter FDR cutoff [70] for defining top genes when carrying out gene-level global molecular associations for a single analysis (e.g., gene-level SV-mRNA associations). When overlapping different top-gene result sets (e.g., gene-level SV-expression associations involving ancestral correlates or gene-level SV-expression associations involving both germline and somatic SV analyses), we used a more relaxed p-value cutoff (e.g., p < 0.01) to limit false negatives, helping us identify significant overlap patterns. As reported, the degree of gene set overlap across independent analyses was often highly statistically significant. This practice is consistent with our previous studies identifying genes with demonstrated functional roles as originally identified using integrative analyses [73, 74], and with standard analytical methods like GSEA [47, 48] that can identify enrichment patterns that would be missed using overly strict FDR cutoffs. Visualization using heat maps was performed using JavaTreeview (version 1.1.6r4) [75] and matrix2png (version 1.2.1) [76]. Boxplots represent 5% (lower whisker), 25% (lower box), 50% (median), 75% (upper box), and 95% (upper whisker). Figures represent biological and not technical replicates.
Supplementary Information
Acknowledgements
This work was made possible through the resources and datasets made available by the Children’s Brain Tumor Network (CBTN). We thank the patients and their families for their participation in the CBTN project.
Author contributions
Conceptualization: C.J.C.; Methodology: C.J.C., Y.Z., F.C.; Formal Analysis: C.J.C., Y.Z., F.C., L.F.P., F.J.S.; Data Curation: C.J.C., L.F.P., F.J.S.; Visualization: C.J.C.; Writing: C.J.C.; Manuscript Review: Y.Z., F.C., F.J.S.; Supervision: C.J.C.
Funding
This work was supported by National Institutes of Health (NIH) grant P30CA125123 (C. Creighton).
Availability of data and materials
No datasets were generated or analysed during the current study.
Declarations
Ethics approval and consent to participate
Results are based on data generated by the CBTN. Patients were consented by one of 32 participating sites and enrolled on a local IRB-approved protocol which includes key language to enable prospective collection of, future research on, and sharing of, de-identified surgical specimens, patient demographics, medical history, diagnoses, treatments, and clinical imaging. Tumor molecular profiling data were generated through informed consent as part of CBTN efforts and analyzed here per CBTN’s data use guidelines and restrictions. CBTN Member institutions include the following: Akron Children’s Hospital, Ann & Robert H. Lurie Children’s Hospital of Chicago, Beijing Tiantan Hospital Neurosurgery Center, Children’s Healthcare of Atlanta, Children’s Hospital of Philadelphia, Children’s National Hospital, Children’s of Alabama, Dayton Children’s Hospital, Doernbecher Children’s Hospital, Hassenfeld Children’s Hospital at NYU Langone, Hudson Institute of Medical Research, Intermountain Primary Children’s Hospital, Johns Hopkins All Children’s Hospital, Johns Hopkins Medicine, Joseph M. Sanzari Children’s Hospital at Hackensack University Medical Center, Lucile Packard Children’s Hospital Stanford, Maria Fareri Children’s Hospital at Westchester Medical Center, Meyer Children’s Hospital, Michigan Medicine C.S. Mott Children’s Hospital, Nicklaus Children’s Hospital, Orlando Health Arnold Palmer Hospital for Children, Seattle Children’s Hospital, St. Louis Children’s Hospital, Sydney Children’s Hospital in Randwick, Texas Children’s Hospital, UCSF Benioff Children’s Hospital, University Children’s Hospital Zürich, University of Iowa Stead Family Children’s Hospital, UNC Chapel Hill—North Carolina Children’s Hospital, UPMC Children’s Hospital of Pittsburgh, Wake Forest Baptist Health, and Weill Cornell Medicine.
Consent for publication
Informed consent entailed enrollment on a local IRB-approved protocol which includes key language to enable prospective collection of, future research on, and sharing of, de-identified surgical specimens, patient demographics, medical history, diagnoses, treatments, and clinical imaging.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Fengju Chen and Yiqun Zhang have equally contributed to this work.
References
- 1. Yang L (2020) A practical guide for structural variation detection in human genome. Curr Protoc Hum Genet 107:e103
- 2. Stankiewicz P, Lupski J (2010) Structural variation in the human genome and its role in disease. Annu Rev Med 61:437–455
- 3. Feuk L, Carson A, Scherer S (2006) Structural variation in the human genome. Nat Rev Genet 7:85–97
- 4. Weischenfeldt J, Symmons O, Spitz F, Korbel J (2013) Phenotypic impact of genomic structural variation: insights from and for human disease. Nat Rev Genet 14:125–138
- 5. Chiang C, Scott A, Davis J, Tsang E, Li X, Kim Y, Hadzic T, Damani F, Ganel L, GTEx Consortium et al (2017) The impact of structural variation on human gene expression. Nat Genet 49:692–699
- 6. GTEx Consortium (2017) Genetic effects on gene expression across human tissues. Nature 550:204–213
- 7. Scott A, Chiang C, Hall I (2021) Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes. Genome Res 31:2249–2257
- 8. The ICGC-TCGA Pan-Cancer Analysis of Whole Genomes Network (2020) Pan-cancer analysis of whole genomes. Nature 578:82–93
- 9. Lilly J, Rokita J, Mason J, Patton T, Stefankiewiz S, Higgins D, Trooskin G, Larouci C, Arya K, Appert E et al (2023) The children’s brain tumor network (CBTN) - Accelerating research in pediatric central nervous system tumors through collaboration and open science. Neoplasia 35:100846
- 10. Gröbner S, Worst B, Weischenfeldt J, Buchhalter I, Kleinheinz K, Rudneva V, Johann P, Balasubramanian G, Segura-Wang M, Brabetz S et al (2018) The landscape of genomic alterations across childhood cancers. Nature 555:321–327
- 11. Weischenfeldt J, Dubash T, Drainas A, Mardin B, Chen Y, Stütz A, Waszak S, Bosco G, Halvorsen A, Raeder B et al (2017) Pan-cancer analysis of somatic copy-number alterations implicates IRS4 and IGF2 in enhancer hijacking. Nat Genet 49:65–74
- 12. Rheinbay E, Nielsen MM, Abascal F, Tiao G, Hornshøj H, Hess JM, Pedersen RI, Feuerbach L, Sabarinathan R, Madsen T et al (2020) Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578:102–111
- 13. Chen F, Zhang Y, Chandrashekar D, Varambally S, Creighton C (2023) Global impact of somatic structural variation on the cancer proteome. Nat Commun 14:5637
- 14. Zhang Y, Chen F, Pleasance E, Williamson L, Grisdale C, Titmuss E, Laskin J, Jones S, Cortes-Ciriano I, Marra M, Creighton C (2021) Rearrangement-mediated cis-regulatory alterations in advanced patient tumors reveal interactions with therapy. Cell Rep 37:110023
- 15. Zhang Y, Chen F, Donehower L, Scheurer M, Creighton C (2021) A pediatric brain tumor atlas of genes deregulated by somatic genomic rearrangement. Nat Commun 12:937
- 16. Zhang Y, Chen F, Fonseca N, He Y, Fujita M, Nakagawa H, Zhang Z, Brazma A, PCAWG Transcriptome Working Group, PCAWG Structural Variation Working Group, Creighton C (2020) High-coverage whole-genome analysis of 1220 cancers reveals hundreds of genes deregulated by rearrangement-mediated cis-regulatory alterations. Nat Commun 11:736
- 17. Zhang Y, Yang L, Kucherlapati M, Hadjipanayis A, Pantazi A, Bristow C, Lee E, Mahadeshwar H, Tang J, Zhang J et al (2019) Global impact of somatic structural variation on the DNA methylome of human cancers. Genome Biol 20:209
- 18. Zhang Y, Yang L, Kucherlapati M, Chen F, Hadjipanayis A, Pantazi A, Bristow C, Lee E, Mahadeshwar H, Tang J et al (2018) A pan-cancer compendium of genes deregulated by somatic genomic rearrangement across more than 1,400 cases. Cell Rep 24:515–527
- 19. Davis C, Ricketts C, Wang M, Yang L, Cherniack A, Shen H, Buhay C, Kang H, Kim S, Fahey C et al (2014) The somatic genomic landscape of chromophobe renal cell carcinoma. Cancer Cell 26:319–330
- 20. Chen F, Zhang Y, Shen L, Creighton C (2024) The DNA methylome of pediatric brain tumors appears shaped by structural variation and predicts survival. Nat Commun 15:6775
- 21. Chen F, Zhang Y, Sedlazeck F, Creighton C (2024) Germline structural variation globally impacts the cancer transcriptome including disease-relevant genes. In: Cell Rep Med E-pub Feb 26
- 22. Zhang Y, Chen F, Creighton C (2021) SVexpress: identifying gene features altered recurrently in expression with nearby structural variant breakpoints. BMC Bioinformatics 22:135
- 23. Hirsch S, Dikow N, Pfister S, Pajtler K (2021) Cancer predisposition in pediatric neuro-oncology-practical approaches and ethical considerations. Neurooncol Pract 8:526–538
- 24. Price M, Neff C, Nagarajan N, Kruchko C, Waite K, Cioffi G, Cordeiro B, Willmarth N, Penas-Prado M, Gilbert M et al (2024) CBTRUS statistical report: American Brain Tumor Association & NCI Neuro-Oncology Branch adolescent and young adult primary brain and other central nervous system tumors diagnosed in the United States in 2016–2020. Neuro Oncol 26:iii1–iii53
- 25. Thibodeau M, O’Neill K, Dixon K, Reisle C, Mungall K, Krzywinski M, Shen Y, Lim H, Cheng D, Tse K et al (2020) Improved structural variant interpretation for hereditary cancer susceptibility using long-read sequencing. Genet Med 22:1892–1897
- 26. Shapiro J, Gaonkar K, Spielman S, Savonen C, Bethell C, Jin R, Rathi K, Zhu Y, Egolf L, Farrow B et al (2023) OpenPBTA: The Open Pediatric Brain Tumor Atlas. Cell Genom 3:100340
- 27. Yang Y, Yang L (2023) Somatic structural variation signatures in pediatric brain tumors. Cell Rep 42:113276
- 28. Chen F, Zhang Y, Li W, Sedlazeck F, Shen L, Creighton C (2025) Global DNA methylation differences involving germline structural variation impact gene expression in pediatric brain tumors. Nat Commun 16:4713
- 29. Sedlazeck F, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, Schatz M (2018) Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods 15:461–468
- 30. MacDonald J, Ziman R, Yuen R, Feuk L, Scherer S (2014) The database of genomic variants: a curated collection of structural variation in the human genome. Nucleic Acids Res 42:D986–992
- 31. Collins R, Brand H, Karczewski K, Zhao X, Alföldi J, Francioli L, Khera A, Lowther C, Gauthier L, Wang H et al (2020) A structural variation reference for medical and population genetics. Nature 581:444–451
- 32. Jun G, English A, Metcalf G, Yang J, Chaisson M, Pankratz N, Menon V, Salerno W, Krasheninina O, Smith A et al (2023) Structural variation across 138,134 samples in the TOPMed consortium. In: Res Sq [Preprint] rs.3, rs-2515453
- 33. Zhao Y, Marotta M, Eichler E, Eng C, Tanaka H (2009) Linkage disequilibrium between two high-frequency deletion polymorphisms: implications for association studies involving the glutathione-S transferase (GST) genes. PLoS Genet 5:e1000472
- 34. Gillani R, Collins R, Crowdis J, Garza A, Jones J, Walker M, Sanchis-Juan A, Whelan C, Pierce-Hoffman E, Talkowski M et al (2025) Rare germline structural variants increase risk for pediatric solid tumors. Science 387:eadq0071
- 35. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT (2000) Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet 25:25–29
- 36. Shi X, Young S, Cai K, Yang J, Morahan G (2022) Cancer susceptibility genes: update and systematic perspectives. Innov (Camb) 3:100277
- 37. Gillani R, Camp S, Han S, Jones J, Chu H, O’Brien S, Young E, Hayes L, Mitchell G, Fowler T et al (2022) Germline predisposition to pediatric Ewing sarcoma is characterized by inherited pathogenic variants in DNA damage repair genes. Am J Hum Genet 109:1026–1037
- 38. Farouk Sait S, Walsh M, Karajannis M (2021) Genetic syndromes predisposing to pediatric brain tumors. Neurooncol Pract 8:375–390
- 39. Ongen H, Buil A, Brown A, Dermitzakis E, Delaneau O (2016) Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32:1479–1485
- 40. Shi X, Radhakrishnan S, Wen J, Chen J, Chen J, Lam B, Mills R, Stranger B, Lee C, Setlur S (2020) Association of CNVs with methylation variation. NPJ Genom Med 5:41
- 41. Chase A, Ernst T, Fiebig A, Collins A, Grand F, Erben P, Reiter A, Schreiber S, Cross N (2010) TFG, a target of chromosome translocations in lymphoma and soft tissue tumors, fuses to GPR128 in healthy individuals. Haematologica 95:20–26
- 42. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J et al (2015) Integrative analysis of 111 reference human epigenomes. Nature 518:317–330
- 43. Capellini A, Williams M, Onel K, Huang K (2021) The functional hallmarks of cancer predisposition genes. Cancer Manag Res 13:4351–4357
- 44. The All of Us Research Program Genomics Investigators (2024) Genomic data in the All of us research program. Nature 627:340–346
- 45. Heden T, Franklin M, Dailey C, Mashek M, Chen C, Mashek D (2023) Acot1 deficiency attenuates high-fat diet-induced fat mass gain by increasing energy expenditure. JCI Insight 8:e160987
- 46. Barrow M, Martin M, Coffey A, Andrews P, Jones G, Reaves D, Parker J, Troester M, Fleming J (2019) A functional role for the cancer disparity-linked genes, CRYβB2 and CRYβB2P1, in the promotion of breast cancer. Breast Cancer Res 21:105
- 47. Subramanian A, Tamayo P, Mootha V, Mukherjee S, Ebert B, Gillette M, Paulovich A, Pomeroy S, Golub T, Lander E, Mesirov J (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102:15545–15550
- 48. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34:267–273
- 49. Suresh R, Diaz R (2021) The remodelling of actin composition as a hallmark of cancer. Transl Oncol 14:101051
- 50. Perepechaeva M, Grishanova A (2020) The role of Aryl Hydrocarbon Receptor (AhR) in brain tumors. Int J Mol Sci 21:2863
- 51. Mulcahy E, Colón R, Abounader R (2020) HGF/MET signaling in malignant brain tumors. Int J Mol Sci 21:7546
- 52. Wang R, Guan W, Qiao M, Zhang Y, Zhang M, Wang K, Wang Y, Wang L (2022) CNS tumor with BCOR internal tandem duplication: clinicopathologic, molecular characteristics and prognosis factors. Pathol Res Pract 236:153995
- 53. Fatumo S, Chikowore T, Choudhury A, Ayub M, Martin A, Kuchenbaecker K (2022) A roadmap to increase diversity in genomic studies. Nat Med 28:243–250
- 54. Marsh R, Madden L, Kitchen B, Mody R, McClimon B, Jordan M, Bleesing J, Zhang K, Filipovich A (2010) XIAP deficiency: a unique primary immunodeficiency best classified as X-linked familial hemophagocytic lymphohistiocytosis and not as X-linked lymphoproliferative disease. Blood 116:1079–1082
- 55. Ebert P, Audano P, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder M, Sulovari A, Ebler J, Zhou W, Serra Mari R et al (2021) Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372:eabf7117
- 56. Foss-Skiftesvik J, Hagen C, Mathiasen R, Adamsen D, Bækvad-Hansen M, Børglum A, Nordentoft M, Werge T, Christiansen M, Schmiegelow K et al (2021) Genome-wide association study across pediatric central nervous system tumors implicates shared predisposition and points to 1q25.2 (PAPPA2) and 11p12 (LRRC4C) as novel candidate susceptibility loci. Childs Nerv Syst 37:819–830
- 57. Mack S, Northcott P (2017) Genomic analysis of childhood brain tumors: methods for genome-wide discovery and precision medicine become mainstream. J Clin Oncol 35:2346–2354
- 58. Kang D, Choi J (2021) Breast cancer-related low penetrance genes. Adv Exp Med Biol 1187:419–434
- 59. Chen F, Chandrashekar D, Scheurer M, Varambally S, Creighton C (2022) Global molecular alterations involving recurrence or progression of pediatric brain tumors. Neoplasia 24:22–33. 10.1016/j.neo.2021.11.014
- 60. Liu B, Conroy J, Morrison C, Odunsi A, Qin M, Wei L, Trump D, Johnson C, Liu S, Wang J (2015) Structural variation discovery in the cancer genome using next generation sequencing: computational solutions and perspectives. Oncotarget 6:5477–5489
- 61. Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, Cox A, Kruglyak S, Saunders C (2016) Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32:1220–1222
- 62. Forbes S, Beare D, Boutselakis H, Bamford S, Bindal N, Tate J, Cole C, Ward S, Dawson E, Ponting L et al (2017) COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res 45:D777–D783
- 63. The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74
- 64. Wala J, Bandopadhayay P, Greenwald N, O’Rourke R, Sharpe T, Stewart C, Schumacher S, Li Y, Weischenfeldt J, Yao X et al (2018) SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res 28:581–591
- 65. Leek J, Johnson W, Parker H, Jaffe A, Storey J (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28:882–883
- 66. Johnson W, Rabinovic A, Li C (2007) Adjusting batch effects in microarray expression data using empirical bayes methods. Biostatistics 8:118–127
- 67. Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Barillot E (2012) Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 28:423–425
- 68. Talevich E, Shain A, Botton T, Bastian B (2016) CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput Biol 12:e1004873
- 69. Gao R, Bai S, Henderson Y, Lin Y, Schalck A, Yan Y, Kumar T, Hu M, Sei E, Davis A et al (2021) Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nat Biotechnol 39:599–608
- 70. Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100:9440–9445. 10.1073/pnas.1530509100
- 71. The Cancer Genome Atlas Research Network (2013) Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499:43–49. 10.1038/nature12222
- 72. Cancer Genome Atlas Research Network (2011) Integrated genomic analyses of ovarian carcinoma. Nature 474:609–615
- 73. Grzeskowiak C, Kundu S, Mo X, Ivanov A, Zagorodna O, Lu H, Chapple R, Tsang Y, Moreno D, Mosqueda M et al (2018) In vivo screening identifies GATAD2B as a metastasis driver in KRAS-driven lung cancer. Nat Commun 9:273. 10.1038/s41467-018-04572-3
- 74. Monsivais D, Vasquez Y, Chen F, Zhang Y, Chandrashekar D, Faver J, Masand R, Scheurer M, Varambally S, Matzuk M, Creighton C (2021) Mass-spectrometry-based proteomic correlates of grade and stage reveal pathways and kinases associated with aggressive human cancers. Oncogene 40:2081–2095. 10.1038/s41388-021-01681-0
- 75. Saldanha AJ (2004) Java Treeview–extensible visualization of microarray data. Bioinformatics 20:3246–3248
- 76. Pavlidis P, Noble W (2003) Matrix2png: a utility for visualizing matrix data. Bioinformatics 19:295–296