American Journal of Human Genetics. 2024 Jan 10;111(2):242–258. doi: 10.1016/j.ajhg.2023.12.010

A comprehensive analysis of clinical and polygenic germline influences on somatic mutational burden

Kodi Taraszka 1,2, Stefan Groha 2,3, David King 4, Robert Tell 4, Kevin White 4, Elad Ziv 5, Noah Zaitlen 6, Alexander Gusev 2,3,∗∗
PMCID: PMC10870141  PMID: 38211585

Summary

Tumor mutational burden (TMB), the total number of somatic mutations in the tumor, and copy number burden (CNB), the corresponding measure of aneuploidy, are established fundamental somatic features and emerging biomarkers for immunotherapy. However, the genetic and non-genetic influences on TMB/CNB and, critically, the manner by which they influence patient outcomes remain poorly understood. Here, we present a large germline-somatic study of TMB/CNB with >23,000 individuals across 17 cancer types, of which 12,000 also have extensive clinical, treatment, and overall survival (OS) measurements available. We report dozens of clinical associations with TMB/CNB, observing older age and male sex to have a strong effect on TMB and weaker impact on CNB. We additionally identified significant germline influences on TMB/CNB, including fine-scale European ancestry and germline polygenic risk scores (PRSs) for smoking, tanning, white blood cell counts, and educational attainment. We quantify the causal effect of exposures on somatic mutational processes using Mendelian randomization. Many of the identified features associated with TMB/CNB were additionally associated with OS for individuals treated at a single tertiary cancer center. For individuals receiving immunotherapy, we observed a complex relationship between PRSs for educational attainment, self-reported college attainment, TMB, and survival, suggesting that the influence of this biomarker may be substantially modified by socioeconomic status. While the accumulation of somatic alterations is a stochastic process, our work demonstrates that it can be shaped by host characteristics including germline genetics.

Keywords: cancer, polygenic risk scores, somatic burden, fine-scale ancestry, tumor mutational burden, copy number burden, overall survival


We investigated how genetic and non-genetic host characteristics influence the accumulation of both somatic point mutations and copy number variants. We found fine-scale genetic ancestry and polygenic risk scores were associated with tumor mutational burden. We additionally found that many of these associations impacted overall survival.

Introduction

Cancer is a disease caused by germline polymorphisms, somatic alterations, and the interaction of the two, and the genetic architecture of cancer has been extensively studied through both lenses.1,2,3,4,5,6,7,8 These efforts have not only elucidated heritable cancer susceptibility5,6 and individual somatic drivers,7,8 but they have also identified tumor mutational burden (TMB) and aneuploidy (or copy number burden, CNB) as valuable indicators of cancer progression and prognosis. TMB has been shown to vary widely across cancer types9,10,11,12 and to have a parabolic relationship with overall survival (OS) with individuals at the extrema faring better than individuals with intermediate-range TMB.13 In fact, TMB is now an FDA-approved biomarker for immune checkpoint inhibitor (ICI) therapy eligibility, likely through the generation of neoantigens.11,14,15 Additionally, increased TMB has also been linked to exogenous risk factors such as tobacco smoke as well as endogenous defects in DNA mismatch repair and DNA replication.16,17,18 Separately, CNB has been gaining support for its independent prognostic value.19,20,21 Previous work has linked CNB with cancer outcomes, with focal somatic CNVs linked to proliferation while arm-level and chromosome-level aneuploidy were correlated with immune evasion.20 While substantial work has been done to understand the genetic underpinning of cancer, limited work has explored the interplay between germline variants and somatic burden. Here, we explore the impact of clinical features and polygenic germline features on TMB and CNB, as well as their downstream influence on patient survival.

Both demographics and germline/host variation have been previously shown to correlate with TMB, CNB, and patient outcomes. Age and sex have been linked to the somatic landscape of tumors, including mutational signatures and overall somatic burden.22,23,24,25 These findings suggest that individual-level factors may shape tumor evolution, e.g., via changes in DNA damage repair across the lifespan.26 Metastatic tumor biopsies exhibited higher TMB and CNB than primary tumors in multiple studies,19,27,28 indicative of heterogeneous evolutionary dynamics even within a single patient. Polygenic risk scores (PRSs) for cancer and immunological and other host characteristics have been associated with cancer subtypes,29 broad somatic mutational signatures,30 and TMB,31 implicating a potential germline-somatic interaction through hormone regulation and immune response after tumorigenesis. More generally, there is growing evidence that TMB may function in part as a polygenic, heritable phenotype,31 with implications for its impact as a biomarker. Importantly, as germline genetics are fixed at conception, the causal effect of environmental exposures can be estimated on TMB/CNB using Mendelian randomization without the concern of reverse causation.32

While previous studies have shown evidence that host features can influence the somatic profile of tumors, the vast majority of large germline-somatic studies have so far been constrained by using data from The Cancer Genome Atlas (TCGA).31,33,34 While TCGA has been a critical resource of germline and somatic data from hundreds of tumors across many common cancer types, it also presents fundamental challenges. Most specimens were collected prior to the immunotherapy era and thus reflect an outdated treatment landscape. Specimens were also typically selected for availability of tissue to conduct multi-modal measurements, and thus may not reflect a realistic cancer population. A particular challenge for germline-somatic analyses is the presence of systematic technical/batch effects in both the germline and somatic assays, which can lead to spurious associations.35,36,37,38 Separately, the availability of only a single such cohort not only limits the ability to replicate findings in independent data (the gold standard for demonstrating reproducibility) but also the ability to understand the generalizability to other clinical settings. In fact, generalizability is a concern even when findings replicate in a comparable clinical setting, as the association may depend on cohort features such as patient ascertainment, disease stage, treatment regimen, and time from diagnosis. While routine clinical sequencing of tumors has become common and led to large pan-cancer cohorts, genome-wide germline genotyping is rarely collected for the same individuals.39,40 Ultimately, TCGA, now over a decade old, remains one of the few cancer cohorts mined for common germline-somatic associations.

In this study, we generated a large germline-somatic cohort of >23,000 individuals from two different institutional settings, with distinct sequencing protocols, ascertainment strategies, and availability of clinical features; our study spans 17 common cancers and includes 1,415 individuals treated with immunotherapy (IO). We explored the impact of clinical features and polygenic germline features on TMB and CNB and identified dozens of significant associations. Using 11,973 individuals with treatment and survival data available, we showed that TMB/CNB as well as clinical features were associated with OS. In addition, we implicated four fine-scale genetic ancestry associations with OS in immunotherapy-naive (non-IO) individuals. Finally, interaction analyses identified modifiers of TMB and CNB-survival associations in IO individuals. Overall, we established that host factors, including fine-scale genetic ancestry and PRSs, shape the somatic landscape both pan-cancer and within specific cancers with implications for treatment outcomes.

Material and methods

Profile cohort

Individuals receiving routine treatment at Dana-Farber Cancer Institute may consent to participate in the Profile prospective clinical sequencing effort. Each consented tumor biopsy is assayed on one of three panel versions of the targeted capture platform (OncoPanel). The three panel versions target 275, 300, and 447 genes, respectively, and all samples must minimally have 30× coverage for 80% of targets. A clinical bioinformatics pipeline was used to call all somatic variation including single nucleotide variants and copy number variation, which were then manually reviewed by Brigham & Women’s Hospital pathologists.41 We performed germline imputation across all samples using STITCH imputation software42,43; the targeted sequencing panels generated ultralow-coverage genome-wide data which can be leveraged for germline imputation. Germline variants were restricted to those with imputation INFO > 0.4 and minor allele frequency > 0.01. Continental ancestry was computed using imputed dosages and the PLINK2 ‘--score’ function by projecting each sample into the reference principal component (PC) space generated by SNPweights tools in HapMap populations of European, West African (Yoruban), and East Asian (Chinese) ancestry (Figure S1A).44 We calculated the mean and standard deviation of both PCs for self-reported white individuals and retained all individuals within two standard deviations of the mean (PC1: 1.58 × 10⁻⁸ [±1.05 × 10⁻⁹], PC2: −8.14 × 10⁻⁹ [±2.32 × 10⁻⁹]). We then further restricted the European cohort to 17 cancers with 300 or more microsatellite stable (MSS) individuals for a total of 13,131 individuals (Figures 1C and S1C). We note that while there were many non-European samples, there were few cancer types with sufficient sample sizes and joint analyses were not possible due to ancestry-specific bias in TMB calling.
Finally, samples were selected and sequenced from individuals who were consented under institutional review board (IRB) approved protocol 11-104 from the Dana-Farber/Partners Cancer Care Office for the Protection of Research Subjects. Written informed consent was obtained from participants prior to inclusion in this study. Secondary analyses of previously collected data were performed with approval from the Dana-Farber IRB (DFCI IRB protocol 19-033 and 19-025; waiver of HIPAA authorization approved for both protocols).

Figure 1.

Overview of pipeline and cohorts

(A) Flowchart outlining the bioinformatics pipeline for off-target imputation and analysis. Each germline, clinical, and somatic feature is color coded according to its purpose: outcomes (blue), independent variables (green), and covariates (red).

(B) Distribution of somatic burden pan-cancer for both cohorts, Profile (red) and Tempus (blue). Tumor mutational burden (TMB) is shown on top and copy number burden (CNB) is depicted in the bottom panel.

(C) Final sample sizes for each cancer as well as pan-cancer in Profile and Tempus with a separate column for Tempus normal samples.

Tempus cohort

A second independent cohort was generated using a representative population selected from the Tempus de-identified genomic database. Each sample was sequenced on one of the three panel versions of the targeted Tempus xT next-generation sequencing platform, which respectively target the exons of 595, 596, and 648 genes.45 The germline imputation and continental ancestry projections were performed in a manner identical to those described above (Figure S1B). We again calculated the mean and standard deviation of both PCs for self-reported white individuals and retained individuals within two standard deviations of the mean (PC1 [tumor]: 1.57 × 10⁻⁸ [±1.32 × 10⁻⁹], PC1 [normal]: 1.54 × 10⁻⁸ [±1.09 × 10⁻⁹]; PC2 [tumor]: −9.07 × 10⁻⁹ [±2.88 × 10⁻⁹], PC2 [normal]: −9.77 × 10⁻⁹ [±2.35 × 10⁻⁹]). We then further restricted to the cancers analyzed in Profile, resulting in 14 cancers with 200 or more microsatellite-stable individuals. We note we used a less stringent threshold than used in Profile (≥300 individuals) in order to maximize the number of overlapping cancers without a significant loss in statistical power. In total, we have a curated cohort of 10,294 individuals, the majority of whom (78%) also had a corresponding normal tissue sample (Figures 1C and S1D). Data from the Tempus multimodal database included in this study were de-identified in accordance with HIPAA regulation so that it is proprietary, de-identified, real-world data that is no longer considered protected health information. Additionally, Tempus Labs, Inc. has been granted an IRB exemption (Advarra Pro00072742) permitting the use of de-identified clinical, molecular, and multimodal data in order to derive or capture results, insights, or discoveries.

TCGA cohort

The Cancer Genome Atlas (TCGA) is a well-studied, publicly available cohort that has thousands of individuals sequenced both on a germline assay and using whole-exome sequencing. We implemented the analysis pipeline used in Profile and Tempus as described above to compare our results to those previously published. The samples were imputed from the genotyping array using the Michigan imputation server with the Haplotype Reference Consortium reference panel.46 Once imputed, we calculated the mean and standard deviation of both PCs for self-reported white individuals and retained individuals within two standard deviations of the mean (PC1: 1.99 × 10⁻⁸ [±1.09 × 10⁻⁹], PC2: −7.93 × 10⁻⁹ [±2.23 × 10⁻⁹]). We determined microsatellite stability using the publicly reported MSIsensor score and retained individuals with a score < 4.47,48 We also used the consensus tumor purity previously published.49 We restricted to 11 cancers with 200 or more individuals.
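The ancestry filter applied in each cohort (retain individuals within two standard deviations of the self-reported white mean on the ancestry PCs) can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function name `within_two_sd`, the use of the sample standard deviation, and the requirement that both PCs pass jointly are assumptions.

```python
import numpy as np

def within_two_sd(pcs, is_reference, n_sd=2.0):
    """Boolean mask of samples whose PCs all lie within n_sd standard
    deviations of the reference-group mean (hypothetical helper)."""
    pcs = np.asarray(pcs, dtype=float)            # shape (n_samples, n_pcs)
    ref = pcs[np.asarray(is_reference, dtype=bool)]
    mean = ref.mean(axis=0)                       # per-PC mean in the reference group
    sd = ref.std(axis=0, ddof=1)                  # per-PC sample SD (assumption: ddof=1)
    # Assumption: a sample is retained only if every PC passes the filter.
    return np.all(np.abs(pcs - mean) <= n_sd * sd, axis=1)

# Toy example: the first three samples form the reference group.
pcs = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [10.0, 0.0], [0.5, -0.5]]
is_reference = [True, True, True, False, False]
mask = within_two_sd(pcs, is_reference)
```

In the toy example, the fourth sample lies far outside the reference cluster on PC1 and is excluded, while all other samples are retained.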

Description of somatic variant calling and outcome generation

The three outcomes assessed in this study are tumor mutational burden (TMB), which is the enumeration of somatic single nucleotide variants (SNVs), and two copy number burden definitions, which enumerate somatic copy number variants (CNVs). For TMB, we restrict to somatic SNVs called in the coding region of each gene and show the distribution is comparable between cohorts (Figure 1B). We additionally generate the phenotype TMB-H, which is an indicator for high TMB (TMB ≥ 10). When conducting logistic regressions using TMB-H as the phenotype, we consider only cancers with >10 individuals with TMB-H. For copy number burden, one definition considers only deep gains or losses while the other considers all CNVs. For Profile and TCGA, each CNV call indicates whether the alteration is deep or shallow; therefore, we generate the two outcomes using this information. The Tempus cohort did not provide an equivalent indicator of CNV depth. Instead, a CNV gain was defined as 8 or more copies detected in 4 consecutive regions or at least 20% of the gene regions while a CNV loss was defined as 0 copies detected in 4 consecutive regions or at least 20% of the gene regions; this is most comparable to the deep CNV calls in Profile (Figures 1B and S2–S4). For simplicity we will refer to the deep gains and losses definition solely as CNB and the other definition of copy number burden, which enumerates all gains and losses, as “All CNB.” We note that the two definitions are truly distinct, with a correlation of 0.32 in Profile and 0.36 in TCGA and only a negligible change in the correlation after normalization using a rank inverse normal transformation (RINT).

Association of demographic features with somatic burden

The role of clinical/demographic features on somatic burden was tested using a multivariate linear regression in each respective cancer as well as via a pan-cancer fixed effects meta-analysis. The independent variables of interest were age, sex, and metastatic status, and the outcomes TMB, CNB, All CNB, and TMB-H were considered separately. In addition to the features of interest, the model included panel version, tumor purity, and the first 5 in-sample principal components (PCs) as covariates, and in Tempus we also included an indicator for whether the tumor sample had a normal-matching sample. We note that in Profile metastatic status is an indicator variable based on the tumor site (local recurrent/primary versus metastatic) while Tempus approximates this feature by an indicator variable for whether the cancer description includes “metastatic.” The 5 in-sample PCs were generated using PLINK2 and are described in more detail below. We note that we included the metastatic status indicator in cancer of unknown primary, but we did not treat this indicator as a feature for discovery. It was included in the model to account for any noise caused by primary labeling (e.g., different pathologists). Additionally, we tested for bias in the distribution of metastatic status across ages and found that being metastatic was not associated with age in Profile while in Tempus metastatic status was associated with a 2.34-year decrease in age (p = 6.9 × 10⁻²⁸). For all associations including those described in subsequent sections, we used α = 0.05 level of significance and performed a Bonferroni correction to account for multiple testing (see Table S1).
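The pan-cancer fixed effects meta-analysis and the Bonferroni threshold can be illustrated with a short sketch. It assumes the standard inverse-variance-weighted estimator and a normal approximation for the p value; the text does not state which software or estimator was used, and the function names are hypothetical.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects estimate, its SE, and a
    two-sided p value from per-cancer betas and standard errors."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# Toy example: two cancers with equal precision average to the midpoint.
beta, se, p = fixed_effects_meta([0.1, 0.3], [0.1, 0.1])
threshold = bonferroni_threshold(0.05, 17)  # e.g., one test per cancer type
```

With equal standard errors the pooled estimate is the simple mean of the per-cancer effects; cancers with smaller standard errors would otherwise dominate the pooled estimate.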

Association of fine-scale genetic ancestry with somatic burden

We used an external reference panel designed to capture the principal components of within-Europe population structure, in particular to distinguish between Northwestern Europe, Southeastern Europe, and Ashkenazi Jewish ancestral populations; we note that the standardization of the NW-SE cline is within each cohort. We projected each sample into the corresponding PC space using PLINK2 ‘--score.’ As a sanity check, we confirmed that the AJ-nonAJ indicator was significantly associated with self-reported Jewish religion (ρ = 0.78, p < 2.2 × 10⁻¹⁶; Figure 2A), acknowledging that these are not expected to be perfect surrogates. We converted the AJ-nonAJ cline to an indicator with Ashkenazi Jewish ancestry corresponding to a PC value ≥ 1 × 10⁻⁸ in both Profile and Tempus. We only considered cancers for testing (and replication) if there were at least 10 individuals with Ashkenazi Jewish ancestry; while all cancers in Profile met this threshold, only 4 in Tempus did. We separately tested the effect of the NW-SE cline and the AJ-nonAJ indicator and excluded AJ individuals from the NW-SE cline regression. We used a linear regression that controlled for the effects of sex, age, metastatic status, panel version, and tumor purity; in Tempus, a covariate for whether the patient has a normal-matching sample was also included.

Figure 2.

Fine-scale ancestry is associated with TMB

(A) Inferred European ancestry in Profile, color coded by self-reported religion: non-Jewish religion (red), Jewish religion (blue), and unknown religious status (green). The x axis represents the Northwest-Southeastern cline and the y axis indicates non-Ashkenazi Jewish versus Ashkenazi Jewish ancestry with a horizontal line at y = 1.0 × 10⁻⁸ indicating the dichotomous variable threshold.

(B) Inferred European ancestry in Tempus, with all points shown in green as religion is unknown. The x and y axes and the dichotomous variable threshold are identical to (A).

(C) Forest plots of the two ancestry-TMB associations with the beta and the 95% confidence interval for each cancer and a pan-cancer meta-analysis for Profile (gray) and Tempus (gold). The left panel shows the Ashkenazi Jewish-non-Ashkenazi Jewish ancestry indicator results, and the Northwest-Southeastern cline results are in the right panel. ∗nominal significance; ∗∗∗Bonferroni significance; ∗∗significant meta-analysis; see Table S1.

(D) Bar graph indicating the proportion of individuals with TMB-H (TMB ≥ 10) in non-small cell lung cancer stratified by Ashkenazi Jewish ancestry with each cohort in a separate panel. Significant odds ratios and their p values are included.

(E) Violin plot overlaid with a boxplot of TMB in non-small cell lung cancer with Profile in the left panel and Tempus in the right panel. Each cohort is stratified by Ashkenazi Jewish ancestry.

Previous work observed continental ancestry bias in TMB estimates for tumor-only samples50; therefore, we explored the possibility of sub-continental ancestry bias within our European cohort. After accounting for the effect of age, sex, metastatic status, panel version, tumor purity, and cancer type on TMB, we tested for an association with ancestry, presence of normal-matching sample, and the interaction between the two (evaluated by ANOVA). No significant interaction between ancestry and the presence of normal-matching sample was observed, indicating that within-continental ancestry does not result in detectable TMB estimation biases (Table S2). We also replicated the Nassar et al.50 findings. To do so, all individuals included in the cohort based on European continental ancestry were labeled as European, with the rest labeled as non-European. We ran a regression and observed a significant interaction between continental ancestry and the presence of a normal-matching sample (Tempus: β = 0.16, p = 1.16 × 10⁻⁵).

Association of polygenic risk scores with somatic burden

In addition to the initial restriction on imputed SNPs (MAF > 0.01 and INFO > 0.4), we further restrict the germline variants to HapMap3 SNPs and then LD-prune this set using PLINK2 ‘--indep-pairwise 500 kb 0.5.’ These independent SNPs were used to calculate the in-sample principal components (PCs) using PLINK2 ‘--pca approx’ and polygenic risk scores (PRSs) using PLINK2 ‘--score.’ To generate the PRSs, we first chose 14 cancer-related outcomes from a number of large GWASs (Table S3). We selected both cancer susceptibility PRSs as well as PRSs for traits believed to be broadly associated with cancer risk (https://www.cancer.gov/about-cancer/causes-prevention/risk). As a sanity check, we compared all but one PRS to measured phenotypes available in the EHR, finding significant correlation for all (Table S3). We set eight p value thresholds in the original GWAS, beginning with all SNPs and ending with SNPs with a p value < 5 × 10⁻⁷, considering each order of magnitude between (i.e., 5 × 10⁻ˣ for x in {0, 1, ..., 7}). At each threshold, we use the intersection between the LD-pruned SNPs and retained discovery GWAS SNPs to generate the projection. Finally, we generated a centered genetic relatedness matrix (GRM) between individuals using GEMMA and, in each analysis, performed a linear mixed model between the somatic burden outcomes and each PRS controlling for age, sex, metastatic status, tumor purity, panel version, 5 in-sample PCs, the GRM, and whether the sample has a normal-matching sample (in Tempus). In order to reduce multiple testing correction, we selected the threshold with the most significant association between the PRS and somatic outcome per cancer and separately chose the threshold pan-cancer for each PRS. Using this refined list of PRSs, we use a Bonferroni correction for the number of cancers for each PRS and outcome pair within individual cancers and corrected for the number of PRSs pan-cancer (Table S1). After identifying a significantly associated PRS, the PRS at the same threshold was tested in Tempus, correcting for the number of significantly associated PRSs.
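The thresholding step can be sketched as below. This is an illustrative approximation of pruning-and-thresholding scoring, not the PLINK2 invocation itself: the function name, the dictionary layout of the GWAS summary statistics, and the use of a plain weighted sum of dosages (PLINK2 may scale the score differently) are assumptions, and allele alignment between dosages and GWAS effects is taken for granted.

```python
import numpy as np

# Eight thresholds 5 x 10^-x for x = 0..7; the first (5.0) keeps every SNP.
P_THRESHOLDS = [5.0 * 10.0 ** (-x) for x in range(8)]

def prs_at_threshold(dosages, snp_ids, gwas, pruned_ids, p_max):
    """Score individuals using LD-pruned SNPs with discovery GWAS p < p_max.

    dosages:    (n_individuals, n_snps) imputed effect-allele dosages
    snp_ids:    SNP identifiers matching the dosage columns
    gwas:       dict mapping SNP id -> (effect size, p value)
    pruned_ids: set of SNP ids that survived LD pruning
    """
    dosages = np.asarray(dosages, dtype=float)
    # Intersection of LD-pruned SNPs and GWAS SNPs passing the threshold.
    keep = [j for j, s in enumerate(snp_ids)
            if s in pruned_ids and s in gwas and gwas[s][1] < p_max]
    weights = np.array([gwas[snp_ids[j]][0] for j in keep], dtype=float)
    return dosages[:, keep] @ weights

# Toy data: rs3 is dropped by LD pruning, rs2 by the strictest threshold.
dosages = [[0, 1, 2], [2, 1, 0]]
snp_ids = ["rs1", "rs2", "rs3"]
gwas = {"rs1": (0.5, 1e-8), "rs2": (0.2, 0.01), "rs3": (-0.1, 0.3)}
pruned = {"rs1", "rs2"}
prs_all = prs_at_threshold(dosages, snp_ids, gwas, pruned, P_THRESHOLDS[0])
prs_strict = prs_at_threshold(dosages, snp_ids, gwas, pruned, P_THRESHOLDS[-1])
```

Each threshold yields one candidate score per trait; the text then selects, per cancer and pan-cancer, the threshold with the most significant association before applying the Bonferroni correction.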

We analyzed three significant PRS associations using the inverse-variance weighted (IVW) Mendelian randomization (MR) approach.51 This approach is robust to reverse causation as germline genetics always predate both the exposure and the tumor (in which TMB and CNB are quantified). For the cigarettes per day PRS, the smoking phenotype is defined in terms of bins of smoking frequency with the difference in means between neighboring bins corresponding to approximately ten cigarettes per day. For the tanning ability PRS, the tanning phenotype is a binary variable splitting individuals who tan very/moderately easily from those who mildly/never tan (i.e., burn). For both PRSs, we used the most restrictive threshold (5 × 10⁻⁷) and retained the peak SNP in each megabase region. We then conducted a genome-wide association study on the untransformed TMB using a linear mixed model controlling for age, sex, metastatic status, tumor purity, panel version, and the first 5 in-sample PCs. We retained the set of SNPs that intersected with the peak SNPs for each PRS. Finally, we used the MendelianRandomization R package to conduct an IVW MR, run MR-Egger, and also considered their penalized and robust models (Table S4).51,52
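For intuition, the IVW estimator combines per-SNP ratio estimates weighted by the precision of the SNP-outcome association. The sketch below is a hedged Python illustration of the fixed-effect form with first-order weights; it is not the MendelianRandomization R package call used in the analysis, the function name is hypothetical, and the package may apply a different variance model.

```python
import math

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and standard error.

    beta_exposure: SNP effects on the exposure (e.g., cigarettes per day)
    beta_outcome:  SNP effects on the outcome (e.g., TMB)
    se_outcome:    standard errors of the SNP-outcome effects
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)

# Toy example: every SNP implies the same ratio by/bx = 0.5.
estimate, se = ivw_mr([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
```

When all instruments imply the same ratio, as in the toy example, the IVW estimate equals that ratio; heterogeneity across instruments is what motivates the MR-Egger and robust variants mentioned above.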

Survival analyses

After conducting associations between the numerous independent features (e.g., age, ancestry, PRS) and TMB/CNB, we performed follow-up survival analyses on the significant associations using a Cox proportional hazards model. All models regardless of feature of interest include the covariates age, sex, metastatic status, panel version, and tumor purity; the first 5 PCs were also included except when we analyzed how ancestry impacted overall survival (OS). We separately analyzed individuals who received immunotherapy (IO) and those who did not (non-IO). We note that while tumor grade and stage are applicable covariates, their availability was limited; additionally, individuals receiving immunotherapy generally have advanced disease and are more homogeneous regarding these covariates.53 Individuals were left-censored/truncated until time of sequencing to mitigate immortal time bias. Association analyses were performed in the following stages: (1) the association of TMB/CNB with OS; (2) the association of clinical features, genetic ancestry, and PRS with OS; (3) for significant associations from (2) that were associated with TMB/CNB, the association was then retested with TMB/CNB, respectively, included as a covariate; and (4) interaction analyses between TMB/CNB and ancestry/PRS with the model OS ∼ b + g + bg, where OS is time-to-event overall survival, b is the burden feature, g is the genetic feature, and bg is the interaction of interest. In order to ensure the significant interaction analytical p values in IO-treated individuals were not inflated due to model violations, we conducted a permutation test using two thousand permutations and found the distribution of empirical p values to be approximately uniform (Figure S5). Additionally, the proportion of empirical p values smaller than the analytical p value was comparable to the analytical p value (TMB and EA PRS interaction in non-small cell lung cancer, empirical p = 1.4 × 10⁻², analytical p = 1.1 × 10⁻²; all CNB and smoker PRS interaction in melanoma, empirical p = 1.1 × 10⁻², analytical p = 8.7 × 10⁻³).
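Written in hazard form, the stage (4) interaction model corresponds to the following, where the notation is an assumption introduced here for illustration: h_0(t) is the baseline hazard, x collects the covariates listed above (age, sex, metastatic status, panel version, tumor purity, and PCs where included), and the interaction is assessed by testing whether the coefficient on the product term is zero.

```latex
h(t \mid b, g, \mathbf{x}) = h_0(t)\,\exp\!\left(\beta_b\, b + \beta_g\, g + \beta_{bg}\, b g + \boldsymbol{\gamma}^{\top}\mathbf{x}\right),
\qquad H_0:\ \beta_{bg} = 0
```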

Results

Overview of data

We investigated the impact of host features on the burden of somatic alterations in two independent pan-cancer cohorts leveraging targeted tumor sequencing of >23,000 individuals. Reflecting their practical real-world use, the two cohorts utilized different proprietary sequencing platforms and somatic variant calling pipelines and had varying availability of clinical features. Both cohorts, however, were processed using the same bioinformatics pipeline for off-target germline imputation and analysis (Figure 1A). The first cohort, Profile, consisted of tumors sequenced during the course of routine care at Dana-Farber Cancer Institute and had extensive availability of clinical features. The second cohort, Tempus, was generated in a commercial setting and contained tumors originating from multiple institutions on a variety of tumor-only and tumor-normal platforms (see material and methods). We excluded individuals with microsatellite instability and restricted to individuals with primarily European ancestry due to insufficient discovery power in the individual non-European populations (Figure S1). Recent work has noted biases in tumor-only TMB estimates when comparing across continental ancestries though not for fine-scale (within continent) ancestry.50 We investigated this phenomenon following the procedure established in Nassar et al.,50 using our fine-scale ancestry data, and confirmed that tumor-only TMB estimates were not biased by within-Europe ancestry, due to the generally much lower genetic distances within Europe and substantial representation of European ancestry in reference data (Table S2). We note that these associations only rule out ancestry-induced biases in misclassifying germline variants as somatic mutations. In total, we selected 17 cancers across 13,131 tumors in Profile and 14 cancers across 10,294 tumors in Tempus (78% of which have a matching normal sample; Figure S6) which had sufficient power for discovery (Figure 1C; material and methods).

We defined two measures of somatic burden (Figure 1B): (1) the total number of somatic single nucleotide variants (SNVs) per megabase, which we call “TMB” for tumor mutational burden, and (2) the total number of deep somatic copy number gains or losses per megabase, which we call “CNB” for copy number burden. In the Profile cohort, where more detailed copy number calling was available, we additionally explored a definition of CNB based on all gains or losses (rather than just deep events) which we call “All CNB” (Figures S2–S4). All somatic burdens were transformed using a rank inverse normal transformation (RINT) within each cancer type to adjust for their highly skewed distributions. The primary features, TMB and CNB, were not significantly correlated in either cohort (Profile: p = 4.15 × 10⁻¹; Tempus: p = 2.38 × 10⁻¹) while All CNB in Profile was significantly but weakly associated with TMB (ρ = 0.09; p = 8.98 × 10⁻²⁷) and moderately associated with CNB (ρ = 0.32; p = 7.57 × 10⁻³⁰⁹). We note that while CNB and All CNB were only moderately correlated, it is unclear whether the differences were biological; however, it is equally unclear which measurement better captured the biologically relevant definition of somatic copy number burden. Lastly, we created a binary variable “TMB-H” in addition to the continuous TMB phenotype, indicating whether a patient has TMB ≥ 10; this feature was based on the biomarker threshold for immune checkpoint inhibitor therapy.54,55 Somatic variation was called in each cohort using the respectively established variant calling workflows, quality control filtering, and variant sign-out (see material and methods), so as to reflect the same data provided in clinical reports.
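A rank inverse normal transformation replaces each value by the normal quantile of its rank, which removes skew while preserving order. The sketch below is a minimal illustration using only the Python standard library; the Blom offset (c = 3/8) and the averaging of tied ranks are assumptions, since the text does not specify these details, and the function name is hypothetical.

```python
from statistics import NormalDist

def rint(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform with average ranks for ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    inv_cdf = NormalDist().inv_cdf
    # Map ranks into (0, 1), then to standard normal quantiles.
    return [inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]

# Toy example: the median maps to 0 and the extremes are symmetric.
transformed = rint([3.0, 1.0, 2.0])
```

To mirror the within-cancer normalization described above, the function would be applied separately to the burden values of each cancer type rather than to the pooled cohort.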

In addition to the high exonic coverage provided by targeted panel sequencing, an abundance of off-target reads spanning the genome are generated which allow for accurate inference of common germline polymorphisms.42,43,56 Established pipelines for germline imputation from the 1000 Genomes Project reference panel were used,43 followed by integration into germline polygenic risk scores for a range of cancer-related phenotypes as well as estimated fine-scale genetic ancestry for each individual (see material and methods). A subset of the Tempus data was sequenced with normal-matching samples, enabling us to quantify the effects of tumor-based versus normal-based imputation. We conduct such sensitivity analyses throughout and note when discoveries are significantly influenced by tumor versus normal platform.

Demographic features are associated with somatic burden

We first explored the associations of TMB and CNB with the broad demographic features age, sex, and metastatic status, fit jointly in cancer-specific and pan-cancer analyses. These were designed to serve as a positive control and to re-examine these factors in a large, contemporary cohort (Tables S5 and S6). In total, we identified 42 significant associations in the Profile cohort after using a Bonferroni correction within each feature (Table S7), 11 of which have been previously reported in TCGA (Table S5). We were able to test 26/42 significant associations in the Tempus cohort, of which 15/26 were nominally significant (enrichment test p = 1.39 × 10⁻¹³) with 9/26 remaining significant after Bonferroni correction (Table S7). As the Bonferroni correction varies across cohorts and features, we report each threshold in Table S1.
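The text does not name the enrichment test. One plausible reading, assumed here, is a one-sided binomial test asking how likely it is to see at least 15 of 26 replication tests reach nominal significance if each had only a 5% chance of doing so under the null. The function name below is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0 ** i * (1.0 - p0) ** (n - i)
               for i in range(k, n + 1))

# 15 of 26 tested associations nominally significant at alpha = 0.05.
p_enrich = binomial_upper_tail(15, 26, 0.05)
```

Under this assumption the tail probability is approximately 1.39 × 10⁻¹³, in line with the reported value.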

We observed multiple highly significant associations between age and increased TMB across multiple cancer types in both cohorts. In the pan-cancer meta-analysis, increased age was significantly associated with increased TMB in both cohorts (Profile: β = 0.007, p = 3.68 × 10−25; Tempus: β = 0.006, p = 2.43 × 10−14; Figure 3A; Table S7). While the effect was highly statistically significant, the effect size in both cohorts indicated only a small increase in the normalized TMB with each additional year. Within individual cancers, we observed ten significant associations in Profile. For the 8/10 cancer types that were also present in Tempus, 7/8 of the age-TMB associations were nominally significant, of which three remained significant after Bonferroni correction (Figures 3A and S7). Five of the per-cancer age-TMB associations were unique to our work (bladder cancer, cancer of unknown primary, leukemia, melanoma, and non-small cell lung cancer); we tested 4/5 in Tempus and found 3/4 were also significantly associated (Figure S6). Overall, these results are consistent with long-standing links between older age and an increase in somatic mutations, both pan-cancer and in individual cancers.9,11,22

Figure 3.


Demographic features are associated with TMB

(A) Forest plot of the age-TMB effect size and 95% confidence interval for each cancer and pan-cancer meta-analysis. ∗nominal significance; ∗∗∗Bonferroni significance; ∗∗significant meta-analysis; see Table S1.

(B) Bar graph indicating the proportion of individuals with TMB-H (TMB ≥ 10) pan-cancer by age quintile. Significant odds ratios and their p values are included. Associations between the quintile extrema and the other quintiles are denoted by a red star.

(C) Bar graph showing the proportion of TMB-H individuals pan-cancer, stratified by sex with the corresponding significant odds ratio and p value.

(D) Bar graph of proportion of TMB-H split by metastatic status with the significant odds ratio and p value included.

We observed positive but generally weaker associations between age and CNB in both cohorts. In the pan-cancer meta-analysis, older individuals had more somatic CNVs in both cohorts (Profile: β = 0.003, p = 4.21 × 10−4; Tempus: β = 0.002, p = 4.33 × 10−2; Table S7). Additionally, age was significantly associated with CNB in five cancer types in the Profile cohort, with the positive associations in glioma and ovarian cancer also significant in the Tempus cohort (Figure S7). We note that CNB based only on deep events has not been previously explored, but the results were consistent with those for the alternative definition of somatic CNV burden (All CNB), which has been previously reported (Figure S7; Table S5).25

Interestingly, we generally observed more heterogeneity in the influence of sex on TMB/CNB across cohorts. The previously reported pan-cancer TMB decreasing effect of female sex was not significant in Profile but was significantly associated in Tempus (Profile: β = −0.03, p = 8.45 × 10−2; Tempus: β = −0.05, p = 2.16 × 10−2; Tables S5 and S7). For individual cancers, the well-established lower mutational rate in women in melanoma was significant in both cohorts (Profile: β = −0.30, p = 1.44 × 10−4; Tempus: β = −0.37, p = 2.77 × 10−4; Table S7).57,58 In the pan-cancer sex-CNB meta-analysis, female sex was significantly associated with lower CNB in the Profile cohort (β = −0.06, p = 4.04 × 10−3), whereas the association was highly significant but in the opposite direction in the Tempus cohort (β = 0.11, p = 1.76 × 10−6); this difference in direction of effect was also consistent within cancer types and may reflect an underlying difference in patient ascertainment between the cohorts (see Discussion; Table S7). Within individual cancers, we identified a significant association in esophagogastric cancer between both sex-CNB and sex-All CNB (Table S5).25

Metastatic status (i.e., whether the biopsy was a metastatic or primary tumor) was significantly associated with increased CNB/TMB pan-cancer for all three definitions of somatic burden (metastatic-TMB β = 0.07, p = 5.65 × 10−4; metastatic-CNB β = 0.16, p = 6.19 × 10−15; metastatic-All CNB β = 0.20, p = 1.43 × 10−24; Table S7). While the metastatic-TMB and metastatic-CNB associations were both tested in Tempus, neither was significantly associated. We note, however, that in Profile, metastasis is defined based on the biopsy site whereas in Tempus metastasis is defined based on the disease stage even when the primary tumor was sequenced (see material and methods). Within cancers, we observed two significant metastatic-TMB associations (breast carcinoma and non-small cell lung cancer), with the association in breast carcinoma being nominally significant in the Tempus cohort (Profile: β = 0.17, p = 9.15 × 10−4; Tempus: β = 0.12, p = 3.44 × 10−2). We also observed four cancers with metastatic-CNB associations and six metastatic-All CNB associations (Table S7). Overall, our findings indicate that metastatic tumors exhibit a higher mutational load, and that this association was a result of the tumor site itself rather than a consequence of disease stage.27

Finally, we conducted additional analyses of the significant age-TMB, sex-TMB, and metastatic-TMB associations using binary TMB-H (TMB ≥ 10) as the outcome, given its clinical relevance as the established biomarker cutoff. Pan-cancer, all three clinical features were significantly associated with TMB-H, with the sex and age associations significant in both cohorts (Figure 3; Table S8; we caution, though, that these tests may be partially overfit due to winner’s curse). Interestingly, while both increased age and male sex were associated with an increased probability of having TMB-H, age was more impactful. While a 61-year-old man would have an 8% probability of qualifying for immunotherapy based on the TMB-H threshold in Profile and an 18% probability in Tempus, a woman just one year older than the mean age (62) would have a 12% and 21% probability, respectively (probabilities based on logistic regression coefficients). This is a 45% increase in Profile and a 21% increase in Tempus. We additionally considered age quintiles and observed a TMB-H odds ratio (OR) of 1.37 for individuals 72–98 years of age compared to the other four age quintiles in Profile, as well as an OR of 0.85 for those aged 18–50 relative to the other four age quintiles in Tempus (Figure 3B; Table S9). In addition to the pan-cancer discoveries, we analyzed the seven significant individual cancer age-TMB associations (Table S8). Three cancers exhibited a significant age-TMB-H association in Profile, with the melanoma and soft tissue sarcoma associations also significant in Tempus (Table S9). The sex-TMB association in melanoma was also recapitulated for TMB-H in both cohorts (Profile: female OR = 0.52, p = 2.96 × 10−4; Tempus: female OR = 0.47, p = 9.04 × 10−4; Table S9). Overall, our findings indicate that sex and age have a clinically meaningful association with TMB-H and, to a lesser extent, CNB across multiple cancer types in both tested cohorts.
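The conversion from logistic regression coefficients to predicted TMB-H probabilities can be sketched as follows. The intercept and coefficients below are hypothetical placeholders chosen for illustration; they are not the fitted values from either cohort, and the covariate set is an assumption.

```python
from math import exp


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + exp(-x))


def predicted_probability(intercept, coefs, features):
    """Probability of TMB-H from a logistic model's linear predictor."""
    eta = intercept + sum(coefs[name] * features[name] for name in coefs)
    return inv_logit(eta)


def relative_increase(p_ref, p_new):
    """Relative change in probability, e.g. 0.10 -> 0.15 is +50%."""
    return (p_new - p_ref) / p_ref


# Hypothetical coefficients for illustration only (not the fitted values).
intercept = -3.5
coefs = {"age": 0.02, "female": -0.25}

p_younger = predicted_probability(intercept, coefs, {"age": 50, "female": 0})
p_older = predicted_probability(intercept, coefs, {"age": 75, "female": 0})
print(f"{p_younger:.3f} {p_older:.3f} {relative_increase(p_younger, p_older):+.0%}")
```

The exponentiated coefficient, exp(β), is the odds ratio per unit of the covariate, so the same model underlies both the reported odds ratios and the predicted probabilities.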

Fine-scale ancestry within Europe influences somatic burden

We next explored the association of TMB/CNB with fine-scale genetic ancestry within the European population using projected principal components (PCs) defined in a reference cohort (Table S10). We focus on Northwest/Southeast Europe (NW-SE) as a continuous cline and Ashkenazi/non-Ashkenazi Jewish (AJ-nonAJ) ancestry as a dichotomous feature which reflects the major components of European American ancestry59 (see material and methods; Figure 2A). We note that while ancestry was estimated based on germline variation, it additionally reflects lifestyle and other non-genetic factors relevant to cancer risk, all of which may influence the accumulation of somatic events. While the influence of continental ancestry on TMB has recently been studied,50,60 the effect of fine-scale genetic ancestry on somatic burden has yet to be established. A major finding of previous work is that continental ancestry causes biased TMB estimates for tumor-only samples, which we attempted to replicate50; however, no significant interaction between ancestry and the presence of a normal-matching sample was observed for fine-scale ancestry within European samples (Table S2).

We observed a significant decrease in TMB for individuals with AJ ancestry relative to non-AJ individuals in both cohorts. In the pan-cancer meta-analysis, the association with lower TMB was significant in both cohorts (Profile: β = −0.34, p = 3.19 × 10−28; Tempus: β = −0.16, p = 9.84 × 10−3; Table S11) though the smaller normal sample in Tempus was only borderline significant (p = 5.14 × 10−2). We note that this association is not impacted by the higher risk of certain cancers in Ashkenazi Jews (e.g., breast cancer) because it is a meta-analysis of within-cancer estimates. However, it may be indicative of a negative association between risk and TMB within cancers (for example, if higher germline risk requires fewer environmental exposures to develop cancer). In the Profile cohort, we additionally conducted sensitivity analyses by evaluating a number of potential covariates from the electronic health records (EHR) including ever smoking, current drinker, autoimmune disease diagnosis, body mass index, and white blood cell count that may reflect various lifestyle and other non-genetic factors and found the AJ effect remained significant (Table S3).

When considering individual cancers in Profile, there were four significant associations (Figure S8). Only the significant association in non-small cell lung cancer was testable in the Tempus cohort (due to having >10 AJ ancestry individuals), and we observed a significant association (Profile: β = −0.42, p = 9.77 × 10−10; Tempus: β = −0.41, p = 1.61 × 10−5; Figures 2 and S8; Table S11). We again re-analyzed the AJ-nonAJ-TMB association using the binary TMB-H indicator in non-small cell lung cancer and observed a significantly lower rate of TMB-H in both cohorts for individuals with Ashkenazi Jewish ancestry with comparable effect sizes (Profile: OR = 0.51, p = 7.38 × 10−3; Tempus: OR = 0.63, p = 3.62 × 10−2; Figures 2D and 2E; Table S8). There were no significant associations between AJ ancestry and CNB.

Turning to Southeast European ancestry (SE) along the NW-SE cline (after excluding AJ individuals), we observed an increase in TMB in Profile while Tempus generally lacked sufficient power for discovery (see below for further discussion). In the pan-cancer meta-analysis, we observed increased SE ancestry was significantly associated with increased TMB in both cohorts (Profile: β = 0.11, p = 5.8 × 10−35; Tempus: β = 0.03, p = 6.74 × 10−3; Table S11). When considering the effect of the NW-SE cline on TMB within individual cancers, there were ten significant associations. In contrast to the other nine cancers, only melanoma showed a decrease in TMB with increasing SE ancestry (β = −0.19, p = 3.21 × 10−5), potentially implicating the protective role of increased melanin.61 In order to further explore this hypothesis, we jointly modeled the effect of the NW-SE cline and an ease of tanning PRS on TMB and found both were significantly associated (NW-SE β = −0.128, p = 5.90 × 10−3 and ease of tanning PRS β = 0.193, p = 1.6 × 10−6) and note an interaction effect was also tested but not significant (p = 9.82 × 10−1). We note that while 9/10 cancer types were testable in Tempus, only ovarian cancer was nominally significant (p = 1.30 × 10−2; Table S11).

Lastly, SE ancestry was significantly associated with increased CNB in Profile pan-cancer (β = −0.02, p = 6.75 × 10−3; Figure S8; Table S11), but the finding was not significant in Tempus (p = 1.94 × 10−1). While the per-cancer ancestry associations in Profile were generally not significant in Tempus, the effect directions were consistent for the significant NW-SE cline discoveries (8/9, p = 2.00 × 10−2), suggesting that the limited overlap is driven by lack of statistical power (Table S11). Indeed, we observed a much lower density of individuals along the ancestry clines in Tempus, with variance along both clines in Tempus approximately half of the variance in Profile (Figure 2B) and only 4/14 cancers in Tempus meeting the threshold of ten or more individuals with Ashkenazi ancestry. These differences in the variance of the independent variable coupled with overall differences between the cohorts limited statistical power.

Polygenic germline variation influences somatic burden

We next examined the germline influence on somatic burden through polygenic risk scores (PRSs) for a variety of complex traits. Prior work has indicated limited power for discovery, so we selected 14 relevant phenotypes for exploration, including cigarettes per day, autoimmune disease diagnosis, and ease of tanning from publicly available GWASs (Table S3),62,63,64,65,66,67,68,69 and used a pruning and thresholding approach to evaluate their effect in Profile and then carry over to the Tempus cohort (see material and methods; Table S12).

We identified nine significant PRS-TMB associations in Profile after using a Bonferroni correction (Table S1). Three pan-cancer associations were in Profile—smoking (cigarettes per day), educational attainment (years of education, EA), and white blood cell count (Figure 4)—of which the PRS for cigarettes per day was also significant and positively associated with TMB in the Tempus cohort (Profile: β = 0.03, p = 1.06 × 10−3; Tempus: β = 0.02, p = 4.67 × 10−2; Table S13). The remaining six were significant within-cancer associations in Profile, of which three were also significant in Tempus: smoking (cigarettes per day) and educational attainment (years of education, EA) in non-small cell lung cancer and ease of tanning in melanoma (Table S13). The ease of tanning PRS was also significantly associated with TMB-H in melanoma, indicating that those with poor tanning ability were more likely to have high TMB (OR = 1.37, p = 9.28 × 10−4; Tables S8 and S9). While the association of smoking and UV exposure on TMB are well established, at the germline level only the pan-cancer EA association has been previously reported in TCGA (Table S5).31

Figure 4.


Polygenic risk scores are associated with somatic burden pan-cancer

Forest plots showing the estimated effect size for the PRS on TMB/CNB and the 95% confidence interval in each sub-figure.

All sub-figures contain significant pan-cancer associations in Profile and are stratified by cohort with Profile on the left and Tempus on the right. Tumor samples are in blue and normal samples in red. ∗nominal significance; ∗∗∗Bonferroni significance; ∗∗significant meta-analysis; see Table S1.

(A) Forest plot of cigarettes per day PRS-TMB associations.

(B) Forest plot of educational attainment PRS-TMB associations.

(C) Forest plot of white blood cell count PRS-TMB associations.

(D) Forest plot of autoimmune disease PRS-All CNB associations.

Turning to CNB, we identified a single association in Profile, EA PRS in melanoma, which was not significant in the Tempus cohort. For All CNB, there were five significant associations though they could not be tested in Tempus (differing CNB definitions; Table S13). In particular we note an intriguing pan-cancer association between the PRS for autoimmune disease and All CNB (β = −0.03, p = 1.41 × 10−3), which may indicate underlying immune processes (Figure 4).70 Overall, we identified 15 PRS-TMB/CNB associations in Profile; however, due to the risk of inflated p values, we assign high confidence only to the four which were also significantly associated in Tempus. Additionally, we showed the mean squared error was low between all regression betas to the chosen beta for the four high-confidence associations (Figure S9).

Finally, we conducted a fixed effects meta-analysis of three cohorts, the two main cohorts presented here and TCGA (see below and material and methods; Table S14). We concluded with this analysis to explore whether there was evidence of polygenic germline influences via PRSs on TMB/CNB that the current sample sizes were underpowered to identify. In total, we found 17 significant PRS associations after multiple testing correction (Table S1) with one significant association for PRS-CNB, two for PRS-All CNB, and 14 for PRS-TMB (Table S15). Notably, in non-small cell lung cancer, the PRS for smoking and the PRS for EA were significantly associated with TMB in all three cohorts, indicative of a robust association regardless of sampling time and source. Overall, 9 associations were discovered only via the meta-analysis, motivating further studies in larger cohorts.
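A fixed effects meta-analysis combines per-cohort estimates by weighting each by the inverse of its variance. The sketch below shows the standard inverse-variance form with a two-sided normal p value; the three effect sizes and standard errors are hypothetical inputs, since this section reports per-cohort effect sizes and p values but not standard errors.

```python
from math import erfc, sqrt


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed effects estimate, its SE, and a two-sided p value."""
    weights = [1.0 / se**2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    p = erfc(abs(beta / se) / sqrt(2.0))  # two-sided normal tail probability
    return beta, se, p


# Hypothetical per-cohort estimates (e.g. three cohorts), for illustration only.
beta_meta, se_meta, p_meta = fixed_effects_meta(
    betas=[0.03, 0.02, 0.04],
    ses=[0.01, 0.01, 0.02],
)
print(f"beta={beta_meta:.4f} se={se_meta:.4f} p={p_meta:.2e}")
```

Because the pooled standard error shrinks as cohorts are added, associations that are only suggestive in each cohort can cross a multiple-testing threshold in the combined analysis, which is the motivation stated above.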

Quantifying causal influences of exposures on TMB

We sought to estimate the causal effect of cigarettes per day on the number of somatic mutations exome-wide using the standard inverse-variance weighted (IVW) Mendelian randomization (MR) approach as well as MR-Egger32,51,52 with the raw (unnormalized) TMB phenotype in Profile (Table S4; see material and methods). In brief, MR contrasts the effects of germline variants on an exposure to their effects on a downstream outcome and is thus robust to confounding from non-genetic factors.32 Within non-small cell lung cancer, every ten additional cigarettes per day resulted in 77.3 (±19.5) additional exonic somatic mutations (causal MR p = 7.31 × 10−5) while the pan-cancer regression was not significant (p = 8.0 × 10−2). We note that this estimate is derived from a targeted gene panel but is consistent with prior non-causal estimates of 75 mutations per ten cigarettes per day genome-wide.71 We similarly explored the causal effect of tanning ability within melanoma and observed that a limited ability to tan (relative to the ability to tan well/moderately well) resulted in 365.1 (±115.5) additional exonic somatic mutations (causal MR p = 9.66 × 10−4). We note that while these analyses are in a sense a positive control due to the etiology being well established, Mendelian randomization has been underutilized for somatic events due to the previously limited availability of germline-somatic cohorts.
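For orientation, the inverse-variance weighted estimator can be written in a few lines. This is a minimal sketch of the textbook fixed-effect IVW form (a weighted regression of variant-outcome effects on variant-exposure effects through the origin), not the analysis code used here; the per-variant effect sizes are hypothetical, and MR-Egger, which additionally fits an intercept to detect directional pleiotropy, is not implemented.

```python
from math import erfc, sqrt


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate.

    beta_exposure: per-variant effects on the exposure (e.g. cigarettes per day)
    beta_outcome:  per-variant effects on the outcome (e.g. raw TMB)
    se_outcome:    standard errors of the outcome effects
    """
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    estimate = num / den
    se = sqrt(1.0 / den)
    p = erfc(abs(estimate / se) / sqrt(2.0))  # two-sided normal tail probability
    return estimate, se, p


# Hypothetical variants whose outcome effects are exactly twice their exposure effects.
bx = [0.10, 0.20, 0.05, 0.15]
by = [2.0 * b for b in bx]
se_y = [0.05, 0.05, 0.05, 0.05]
mr_estimate, mr_se, mr_p = ivw_mr(bx, by, se_y)
print(f"estimate={mr_estimate:.2f} se={mr_se:.3f} p={mr_p:.1e}")
```

The estimate is interpreted per unit of the exposure, so rescaling it (for example to ten cigarettes per day) is a simple multiplication of both the estimate and its standard error.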

We next investigated the EA PRS association with lower TMB, which was previously observed in a pan-cancer TCGA analysis with nominal significance.31 We again focused on non-small cell lung cancer, where this association was significant in both cohorts (Profile: β = −0.07, p = 1.35 × 10−3; Tempus: β = −0.06, p = 1.15 × 10−2; Table S13) and in TCGA (Table S15). As the EA PRS is likely to capture a complex set of factors related to educational attainment, socioeconomic status, and familial environment, the assumptions of MR were likely violated and the MR analysis was not conducted. Instead, we conducted a sensitivity analysis to test whether the genetic effect was robust to a direct measure of EA and we re-evaluated the EA PRS-TMB association after including a covariate for self-reported college attainment (SRCA, based on patient demographic data in the clinical record). In this joint model, lower TMB was significantly associated with SRCA and was no longer associated with the EA PRS. While the EA PRS was not marginally associated with TMB-H (p = 2.85 × 10−1), SRCA was significantly associated with a lower rate of TMB-H (OR = 0.70, p = 8.19 × 10−3). SRCA thus fully explained the effect of the EA PRS on TMB and had an independent relationship with TMB-H in this non-small cell lung cancer population.

We further quantified the influence of SRCA on TMB both within individual cancers as well as pan-cancer. SRCA was negatively associated with TMB both pan-cancer (β = −0.08, p = 1.20 × 10−5; Table S16) and in two individual cancers (non-small cell lung cancer: β = −0.26, p = 1.50 × 10−9; cancer of unknown primary: β = −0.26, p = 2.96 × 10−3), with the effect on TMB-H only significant in non-small cell lung cancer (OR = 0.71, p = 8.19 × 10−3). As this association is likely to be due in part to exposures related to socioeconomic status, including smoking and pollutants, we conducted a sensitivity analysis to test the effect of SRCA after accounting for self-reported ever smoking (SRS). For the pan-cancer effect, both SRCA and SRS were associated with TMB in a joint model (SRCA: β = −0.06, p = 9.06 × 10−3; SRS: β = 0.13, p = 4.48 × 10−8; Table S16). Similarly, in non-small cell lung cancer, both SRS and SRCA have a significant effect on TMB in a joint model (SRCA: β = −0.16, p = 7.15 × 10−4; SRS: β = 0.72, p = 5.50 × 10−29) while only SRS is significantly associated with TMB-H in a joint model (SRCA: p = 3.09 × 10−1; SRS: OR = 9.79, p = 1.01 × 10−7). In non-small cell lung cancer, SRCA thus captures aspects of socioeconomic status that influence the accumulation of somatic SNVs independent of SRS, but this effect did not impact eligibility for immunotherapy based on the TMB-H threshold.

Somatic burden and patient demographics are associated with survival

Previous work has linked both TMB/CNB and clinical features with cancer outcomes.14,19,20,21,72,73,74 Here, we sought to further investigate these effects in a large cohort as well as explore the interaction of germline PRS and TMB/CNB on overall survival (OS). These analyses included relevant covariates and were restricted to 11,973 individuals from the Profile cohort who had treatment and survival measurements readily available, of which 1,415 individuals received immune checkpoint inhibitor immunotherapy (IO) while the remaining 10,558 individuals were immunotherapy naive (non-IO) and used as a normative population.

We first explored the influence of TMB/CNB on OS, confirming previous findings in this large cohort as well as expanding the set of studied cancers (Table S17). Considering only IO recipients, we confirmed the significant pan-cancer protective effect of TMB on OS (HR = 0.90, p = 6.61 × 10−3; Table S18) but observed no relationship between CNB/All CNB and OS (in contrast to prior studies20,21). The effect of TMB-H was even more significant pan-cancer (HR = 0.65, p = 2.99 × 10−4) as well as nominally significant in melanoma (HR = 0.59, p = 2.92 × 10−2) and non-small cell lung cancer (HR = 0.63, p = 4.76 × 10−3), with both being well established.54

In contrast, in non-IO individuals both TMB and CNB were associated with increased mortality pan-cancer, emphasizing the broadly hazardous effect of these somatic burdens (TMB-OS: HR = 1.07, p = 1.52 × 10−6; CNB-OS: HR = 1.17, p = 5.19 × 10−31; All CNB-OS: HR = 1.18, p = 1.63 × 10−30; Figure 5; Table S18). In individual cancers, 19 associations were significant: five TMB-OS, six CNB-OS, and eight All CNB-OS. We highlight two notable associations. First, in prostate cancer, there is a highly significant hazardous effect of All CNB on OS (HR = 2.77, p = 1.32 × 10−13). As BRCA1 and BRCA2 have well-established relationships with prostate cancer prognosis,75 we re-analyzed this association and found the All CNB effect remained significant after accounting for whether the affected individual carried a somatic SNV in these genes and separately a somatic CNV. Second, in endometrial cancer, increased TMB had a protective effect on OS for non-IO individuals (HR = 0.74, p = 4.27 × 10−5; Table S18), in contrast to a hazardous effect in every other cancer where the association was significant. This protective effect may be consistent with prior observations in TCGA of better OS for individuals with POLE variants, which are likely to be hypermutable.76 Likewise, TMB-H was protective in endometrial cancer (HR = 0.37, p = 2.75 × 10−3; Table S18), even though no significant pan-cancer effect was observed for TMB-H. TMB/CNB are thus substantial contributors to poor outcomes within and across cancers for non-IO individuals, but protective (TMB) or null (CNB) for IO-treated individuals.

Figure 5.


Somatic burden is associated with overall survival

Forest plots showing the estimated effect sizes and the 95% confidence interval of somatic burden on overall survival (OS) in Profile, separately for both immunotherapy (IO) individuals and non-IO individuals.

Each panel corresponds to a different somatic burden definition (All CNB, CNB, and TMB, in respective order). Within each panel, IO recipients are purple and non-IO individuals are blue (о indicates not significant; • indicates nominal significance; ◾shows Bonferroni significance; ▲ represents a significant meta-analysis; see Table S1).

We next turned to the features significantly associated with TMB/CNB (above) to quantify their direct and indirect influence on OS (Table S19). Age, male sex, and metastatic status were associated with poorer survival within multiple individual cancers as well as pan-cancer, both in IO recipients (older age-OS: HR = 1.01, p = 1.21 × 10−3, female sex-OS: HR = 0.81, p = 5.46 × 10−3; metastatic site-OS: HR = 1.17, p = 3.09 × 10−2; Figure 6) and non-IO individuals (older age-OS: HR = 1.02, p = 7.50 × 10−49; female sex-OS: HR = 0.84, p = 8.23 × 10−8; metastatic site-OS: HR = 1.97, p = 2.17 × 10−109; Table S20; Figure 6). These associations were broadly consistent with prior work.72,73,74 We re-tested each association in a conditional analysis with TMB/CNB and observed the effects were robust, indicating that while increased age, male sex, and metastatic tumor sites were associated with increased TMB/CNB, their association with OS is independent of their relationship with TMB/CNB.

Figure 6.


Clinical features are associated with overall survival

Forest plots showing the estimated effect sizes and the 95% confidence interval of clinical features on overall survival (OS) in Profile for both immunotherapy (IO) individuals and non-IO individuals.

Each sub-figure corresponds to a different clinical feature. Within each sub-figure there is a panel for IO individuals and one for non-IO individuals. We condition on various somatic burden definitions and use a unique symbol for each definition. The significance of the regression is indicated with color. Shown are (A) age-OS association, (B) sex-OS association, (C) metastatic status-OS association.

The one exception to the age-OS association was ovarian cancer, where increased age was significantly protective for IO recipients (HR = 0.95, p = 1.08 × 10−3). We tested a number of sources of confounding on the protective effect of age in ovarian cancer among IO recipients. The association remained significant after controlling for the cancer subtype (HR = 0.96, p = 3.0 × 10−2) and accounting for heterogeneity in treatment history including line of treatment, concurrent treatment with chemotherapy, and whether the sequencing was performed after starting IO (HR = 0.93, p = 2.5 × 10−4). While immunotherapy is currently broadly approved for only ovarian cancers with mismatch repair deficiency or TMB-H, this surprising association with age may merit further analysis in randomized studies.

Germline factors associated with survival for immunotherapy recipients

We next considered how PRSs impacted survival, beginning with IO recipients (Table S21). No PRS was marginally associated with OS, and we therefore focused on potential modifiers of TMB/CNB through interaction analyses. We identified two significant associations after Bonferroni correction for the number of events tested per cancer. For individuals with non-small cell lung cancer, there was an interaction between TMB and the EA PRS (HRinter = 1.16, p = 1.13 × 10−2; Table S22). This interaction effect indicates that individuals with a higher EA PRS and higher TMB fared worse than would be expected from their EA PRS or TMB values alone. Surprisingly, the interaction was less significant when replacing the EA PRS with SRCA (SRCA HRinter = 1.31, p = 2.15 × 10−2). Including SRCA in the model with the EA PRS, while significant (SRCA HR = 0.72, p = 5.26 × 10−3), did not dampen but rather increased the significance of the EA PRS-TMB interaction (PRS HRinter = 1.20, p = 5.03 × 10−3). Additionally, the PRS interaction effect remained significant (HRinter = 1.17, p = 1.05 × 10−2) even when including SRS as an independent variable which did not have a significant marginal or interaction effect. In sum, the EA PRS and SRCA capture independent factors that modify the influence of TMB on survival.
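The meaning of an interaction hazard ratio can be made concrete with a small calculation. In a Cox model with terms b1·TMB + b2·PRS + b3·TMB·PRS, the log hazard ratio per unit of TMB at a given PRS value is b1 + b3·PRS. The sketch below uses the reported interaction (HRinter = 1.16) together with a hypothetical main-effect hazard ratio for TMB; it also assumes that the PRS is standardized, which is not stated in this section.

```python
from math import exp, log


def tmb_hr_at_prs(hr_tmb_main, hr_interaction, prs_z):
    """Hazard ratio per unit TMB at a given PRS value in a Cox model with a TMB x PRS term."""
    return exp(log(hr_tmb_main) + log(hr_interaction) * prs_z)


HR_INTERACTION = 1.16  # reported TMB x EA PRS interaction in non-small cell lung cancer
HR_TMB_MAIN = 0.80     # hypothetical main effect of TMB, for illustration only

for prs_z in (-1.0, 0.0, 1.0):
    print(prs_z, round(tmb_hr_at_prs(HR_TMB_MAIN, HR_INTERACTION, prs_z), 3))
```

With an interaction hazard ratio above 1, the per-unit TMB hazard ratio moves toward 1 as the PRS increases, which is the sense in which a higher EA PRS dampens the protective effect of TMB.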

The other significant interaction effect in IO recipients was observed in melanoma, where there was a protective interaction between higher All CNB and high PRS for smoker lung cancer risk (HR = 0.72, p = 8.74 × 10−3; Table S22). While neither All CNB nor the lung cancer PRS had a marginal effect on OS, this interaction indicates that individuals with higher somatic CNV burden and a higher PRS fare significantly better than expected. The interaction was not dampened when including either SRS or the smoking (cigarettes per day) PRS as an independent variable, implying that the interaction is not due to smoking cigarettes directly but rather the risk of developing lung cancer. While this surprising association merits replication, it indicates that some germline factors may dampen the influence of CNB on clinical outcomes.20,77 Interestingly, no significant PRS associations nor interactions were observed in the much larger population of non-IO individuals, suggesting that the influence of germline factors on survival may be particularly pronounced for immunotherapy.

Lastly, genetic ancestry was significantly associated with OS in multiple cancers for non-IO individuals. AJ ancestry had a significant protective effect (non-small cell lung cancer: HR = 0.72, p = 3.47 × 10−3; leukemia: HR = 0.34, p = 3.40 × 10−3; pan-cancer: HR = 0.83, p = 1.93 × 10−4; Table S20; Figure S10) even after conditioning on TMB. There were no significant ancestry-OS associations in the IO population nor were there significant interaction effects in either population.

Comparison to previous TCGA analyses

We examined the cross-replication of the 26 uniquely reported TMB/CNB associations testable in The Cancer Genome Atlas (TCGA) as well as 24 previously reported findings, of which 12/24 were significant discoveries in Profile (Table S5). As our analysis pipeline differs from previous work, we re-analyzed the previous discoveries. In particular, our analysis pipeline was restricted to individuals with European ancestry and those who were microsatellite stable to prevent confounding by continental ancestry,50 race-based biases from inequitable access to care, and spurious associations due to hypermutability. We reproduced 14/24 findings using our pipeline, 9 of which were among the 12 significant Profile findings, suggesting that most TCGA results not replicating in Profile were due to differences in analysis covariates. Of the 26 unique discoveries, we replicated 11 nominally and 5 remained significant after Bonferroni correction. We did not consider metastatic status and survival analyses due to the limited availability of metastatic tumors and survival measurements in TCGA, and we did not attempt to replicate the ancestry discoveries because TCGA faces the same limitations as Tempus while also having less than half of the sample size (Figure S11).

Discussion

Here, we conducted a study of host factors influencing TMB/CNB across two large cohorts and identified numerous associations between both demographic and polygenic germline features, as well as replicating results from many previous studies (Figure 7). Overall, we focused on pan-cancer relationships, but additionally reported a number of cancer type-specific associations. As both TMB and the effectiveness of immunotherapy vary by cancer type, understanding the host influence on TMB/CNB within each cancer type can be informative to clinical care. Age and sex had a strong effect on TMB and a weaker effect on CNB across many contexts. Metastatic specimens harbored increased TMB/CNB, and the effect was more pronounced in metastatic sites than when looking at all tumors from individuals with metastatic disease. Additionally, fine-scale genetic ancestry was associated with TMB pan-cancer, both through the Northwest-Southeast European cline and Ashkenazi Jewish ancestry, implicating potential environmental exposures or (in the case of AJ ancestry) founder effects. We note that these associations may have also captured other factors such as socioeconomic status or demographics. Smoking causally increased TMB by 57 exonic mutations per ten cigarettes (half a pack) per day in lung cancer and the limited ability to tan causally increased TMB by more than 380 exonic mutations in melanoma. The white blood cell count (WBC) PRS had a protective effect on TMB while directly measured WBC did not have an effect on TMB (WBC PRS: β = −0.026, p = 2.25 × 10−3; WBC: β = −0.001, p = 6.14 × 10−1), possibly implicating broader or pre-diagnosis immunological mechanisms captured by germline risk. In analyses of clinical outcomes, many host features were linked with overall survival. While these features were often significantly correlated with TMB/CNB, their effect on OS remained significant even after conditioning on somatic burden.

Figure 7. Schematic of discoveries

Overview schematic of the Bonferroni-significant Profile discoveries.

Each row shows a distinct independent variable organized by outcome: somatic burden (top) and overall survival (bottom) as well as category. Each column represents a cancer and within each cancer, the three somatic burdens: All CNB, CNB, and TMB. The significant positive associations are shown in green while significant negative associations are in blue. A red box indicates well known discoveries (particularly, those in TCGA) that failed to replicate in our study.

One of the more surprising and consistent associations we observed was with the PRS for educational attainment (EA) in non-small cell lung cancer, although previous work has linked EA with cancer incidence, particularly for smoking-related cancers.78 The EA PRS was significantly associated with TMB in all three cohorts (Profile, Tempus, and TCGA). However, this association with TMB was no longer significant after adjusting for self-reported college attainment, suggesting that college attainment itself (or correlated factors) fully explains the germline component captured by the PRS. In the immunotherapy-treated population, we observed a significant interaction between the EA PRS and TMB on overall survival (with higher EA dampening the protective effect of TMB). Unlike the association with TMB, this PRS interaction remained significant even after conditioning on self-reported college attainment and ever smoking; in fact, it became more significant. Taken together, these findings suggest that the EA PRS likely captures components of socioeconomic status and environment that modify the prognostic benefit of TMB. As the germline genetics underlying the PRS are fixed at conception and thus necessarily precede TMB, PRS associations may implicate modifiable risk factors for immunotherapy outcomes. We note, however, that these potentially modifiable risks are likely not educational attainment itself, but other components of environment and socioeconomic status that are correlated with the PRS. In the case of the EA PRS specifically, only 30% of the genetic effect has been shown to associate with EA directly,79 with the rest explained by genetic variation in parents or relatives,80 indicative of complex environmental interplay.
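To make the interaction concrete, the sketch below assumes a log-linear hazard model (for example a Cox model, which this section does not name) with terms for TMB, the EA PRS, and their product. Under that assumption, the hazard ratio for a one-unit TMB increase depends on the PRS value. The coefficients are hypothetical and chosen only to reproduce the direction of the reported effect, not its size.

```python
import math


def tmb_hazard_ratio(beta_tmb, beta_interaction, prs, tmb_increment=1.0):
    """Hazard ratio for a TMB increase at a fixed PRS value.

    Assumed model: log-hazard = beta_tmb*TMB + beta_prs*PRS
                                + beta_interaction*TMB*PRS + covariates.
    The PRS main effect cancels when two TMB levels are compared at the same PRS.
    """
    return math.exp((beta_tmb + beta_interaction * prs) * tmb_increment)


# Hypothetical coefficients: TMB is protective (negative), and a positive
# interaction means a higher EA PRS dampens that protection.
BETA_TMB = -0.30
BETA_INTERACTION = 0.10

hr_low_prs = tmb_hazard_ratio(BETA_TMB, BETA_INTERACTION, prs=-1.0)
hr_mean_prs = tmb_hazard_ratio(BETA_TMB, BETA_INTERACTION, prs=0.0)
hr_high_prs = tmb_hazard_ratio(BETA_TMB, BETA_INTERACTION, prs=1.0)
print(hr_low_prs, hr_mean_prs, hr_high_prs)
```

In this toy setting all three hazard ratios stay below 1, but they move toward 1 as the PRS increases, which is what "dampening the protective effect of TMB" means in model terms.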

While some of our findings have previously been reported across tens of publications, the majority of those discoveries utilized a single cohort, TCGA. By studying a distinct cohort from TCGA as well as analyzing two heterogeneous populations within our cohort, we now have a greater ability to determine which effects are consistent across institutional settings and treatment regimes. Our study also differed in its breadth from many of the previous studies. While Sun et al. explored the effect of various PRSs on TMB, their focus was solely pan-cancer, and they did not explore CNB or any downstream effects on survival31; separately, Namba et al. analyzed the effect of PRSs on age of onset and somatic burden within cancers, but evaluated only cancer risk PRSs.33 While recent studies outside of TCGA have identified continental ancestry effects on TMB,60 we focused our analyses on fine-scale genetic ancestries within Europeans. Nassar et al. showed ancestry-specific biases in TMB estimates across continental ancestries, which were not present in our cohorts when restricting to ancestries within Europe (Table S5).50 Previous work also found that TMB and CNB are associated with immunotherapy outcomes. While we replicate the effect of TMB on OS, our inability to replicate the CNB effect adds to the discussion of whether these findings are robust to different institutional settings as well as differing definitions of CNB.20,21 In sum, the increased sample size of our study allowed for a comprehensive analysis that demonstrated the wide range of factors, and interactions between factors, that significantly influence TMB and, to a lesser extent, CNB.

Our study has multiple limitations. While the heterogeneity between the Profile and Tempus cohorts is advantageous for identifying robust findings, it hinders replication, particularly for weaker effect sizes. One major source of heterogeneity is the difference in ascertainment: one cohort originates from a single tertiary cancer center while the other contains individuals from multiple institutions. To better understand the impact of differing patient populations, we analyzed the distribution of clinical features for each individual cancer and pan-cancer (Figures S12–S16). We observed consistency in age, both pan-cancer and within cancers, whereas metastatic status was consistent only pan-cancer (though we note that the two cohorts used different definitions). Tumor purity, which can influence somatic variant calling,43 was generally lower in the Tempus data, though the availability of matched-normal sampling (78% of samples) likely mitigates the influence on somatic calling (Figure S6).45 Finally, there were also differences in the somatic calling pipeline (Figures S2–S4). While our study shows which findings generalize over different calling strategies, standardized burden definitions would increase the power to discover associations across multiple cohorts.

In addition to the limitations within the study design, there were a number of broader considerations. We restricted our analyses to European samples due to sample size, so the generalizability of our findings to cohorts with different ancestral backgrounds is uncertain. Further work is needed to explore differences by continental ancestry, which has been linked with molecular differences within cancers and tissues.81 Another concern is that individuals were primarily sequenced while receiving treatments that can influence somatic burden, but granular treatment history was largely unavailable in Profile and entirely unavailable in Tempus. For immunotherapy recipients, our analyses cannot rule out reverse causation where individuals with TMB < 10 were approved for immunotherapy due to other mitigating circumstances, such as very advanced stage disease, progression while receiving other therapies, or clinical trials. In this case, the apparent impact of TMB-H on OS may reflect systematic differences in who is eligible for immunotherapy rather than a direct effect of TMB-H. Lastly, we explored overall survival only and did not consider treatment response or progression-free survival, as these measurements were not available.

The discoveries presented here uncover host determinants of the stochastic process of somatic variant accumulation. By understanding the influence of host features, we can move toward personalized oncology where treatments are recommended based on the patient, their tumor, and the interaction between the two. While our findings highlight this interplay, further work is needed to understand the clinical implications for specific treatment contexts and decisions. Previous work has suggested that the TMB-H threshold (TMB ≥ 10) may not be optimal for all cancers,82 and our work further points to host-level factors that may additionally be relevant for personalizing the TMB-H threshold. We separately note that the TMB-H incidence rate within cancer types varies between the two cohorts. In prior work, both panels’ TMB estimates were shown to be consistent with those obtained from whole-exome sequencing, suggesting that these differences may be a function of differences in ascertainment or demographic heterogeneity.45,83 In addition to the improvement to personalized care, our use of PRS and MR to precisely estimate the causal impact of environmental exposures on TMB can serve as a platform for broader causal inference in immuno-oncology.
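As a small illustration of how the TMB-H incidence rate can be compared between cohorts, the sketch below applies the TMB ≥ 10 cutoff quoted above to two lists of TMB values. The units are assumed to be mutations per megabase, the function name is hypothetical, and the cohort values are invented for the example rather than drawn from Profile or Tempus.

```python
def tmb_high_rate(tmb_values, threshold=10.0):
    """Fraction of tumors classified as TMB-H (TMB at or above the threshold)."""
    if not tmb_values:
        raise ValueError("tmb_values must not be empty")
    return sum(value >= threshold for value in tmb_values) / len(tmb_values)


# Hypothetical TMB values for one cancer type in two cohorts.
cohort_a = [2.0, 5.0, 10.0, 14.0]
cohort_b = [1.0, 3.0, 8.0, 12.0]

rate_a = tmb_high_rate(cohort_a)
rate_b = tmb_high_rate(cohort_b)
print(rate_a, rate_b)
```

Passing a different `threshold` per cancer type or per host subgroup is one way a personalized cutoff could be explored, though the text does not propose specific alternative values.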

Data and code availability

Individual-level data are protected due to patient privacy. All summary statistics are available within the supplementary tables. Analysis scripts are available at https://github.com/koditaraszka/pancancer/.

Acknowledgments

N.Z. and K.T. were supported by NIH grants K25HL121295, U01HG009080, R01HG006399, R01CA227237, R01ES029929, and R01HG011345; the DOD grant W81XWH-16-2-0018; and the Chan Zuckerberg Science Initiative. A.G. and K.T. were supported by NIH grants R01CA227237, R01CA244569, and R01CA262577. A.G. was also supported by awards from the Phi Beta Psi Sorority, the Doris Duke Charitable Foundation, and the Emerson Collective.

Declaration of interests

S.G. is an employee of GSK. D.K. and R.T. are employees of Tempus, Inc. K.W. is an employee of National University of Singapore and Scientific Advisor to Tempus, Inc.

Published: January 10, 2024

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2023.12.010.

Contributor Information

Kodi Taraszka, Email: kodi_taraszka@dfci.harvard.edu.

Alexander Gusev, Email: alexander_gusev@dfci.harvard.edu.

Supplemental information

Document S1. Figures S1–S16 and Tables S2, S4, S7–S9, S11, S13, S18, S20, and S22
mmc1.pdf (2.9MB, pdf)
Tables S1, S3, S5, S6, S10, S12, S14–S17, S19, and S21
mmc2.xlsx (609.9KB, xlsx)
Document S2. Article plus supplemental information
mmc3.pdf (7.2MB, pdf)

References

  • 1. Ponder B.A. Cancer genetics. Nature. 2001;411:336–341. doi: 10.1038/35077207.
  • 2. Pleasance E.D., Cheetham R.K., Stephens P.J., McBride D.J., Humphray S.J., Greenman C.D., Varela I., Lin M.-L., Ordóñez G.R., Bignell G.R., et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature. 2010;463:191–196. doi: 10.1038/nature08658.
  • 3. Greenman C., Stephens P., Smith R., Dalgliesh G.L., Hunter C., Bignell G., Davies H., Teague J., Butler A., Stevens C., et al. Patterns of somatic mutation in human cancer genomes. Nature. 2007;446:153–158. doi: 10.1038/nature05610.
  • 4. Rashkin S.R., Graff R.E., Kachuri L., Thai K.K., Alexeeff S.E., Blatchins M.A., Cavazos T.B., Corley D.A., Emami N.C., Hoffman J.D., et al. Pan-cancer study detects genetic risk variants and shared genetic basis in two large cohorts. Nat. Commun. 2020;11:4423. doi: 10.1038/s41467-020-18246-6.
  • 5. Chen S., Iversen E.S., Friebel T., Finkelstein D., Weber B.L., Eisen A., Peterson L.E., Schildkraut J.M., Isaacs C., Peshkin B.N., et al. Characterization of BRCA1 and BRCA2 mutations in a large United States sample. J. Clin. Oncol. 2006;24:863–871. doi: 10.1200/JCO.2005.03.6772.
  • 6. Goldgar D.E., Easton D.F., Cannon-Albright L.A., Skolnick M.H. Systematic population-based assessment of cancer risk in first-degree relatives of cancer probands. J. Natl. Cancer Inst. 1994;86:1600–1608. doi: 10.1093/jnci/86.21.1600.
  • 7. Rheinbay E., Nielsen M.M., Abascal F., Wala J.A., Shapira O., Tiao G., Hornshøj H., Hess J.M., Juul R.I., Lin Z., et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature. 2020;578:102–111. doi: 10.1038/s41586-020-1965-x.
  • 8. Vogelstein B., Papadopoulos N., Velculescu V.E., Zhou S., Diaz L.A., Kinzler K.W. Cancer Genome Landscapes. Science. 2013;339:1546–1558. doi: 10.1126/science.1235122.
  • 9. Alexandrov L.B., Nik-Zainal S., Wedge D.C., Aparicio S.A.J.R., Behjati S., Biankin A.V., Bignell G.R., Bolli N., Borg A., Børresen-Dale A.L., et al. Signatures of mutational processes in human cancer. Nature. 2013;500:415–421. doi: 10.1038/nature12477.
  • 10. Zehir A., Benayed R., Shah R.H., Syed A., Middha S., Kim H.R., Srinivasan P., Gao J., Chakravarty D., Devlin S.M., et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 2017;23:1004. doi: 10.1038/nm.4333.
  • 11. Chalmers Z.R., Connelly C.F., Fabrizio D., Gay L., Ali S.M., Ennis R., Schrock A., Campbell B., Shlien A., Chmielecki J., et al. Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome Med. 2017;9:34. doi: 10.1186/s13073-017-0424-2.
  • 12. Sha D., Jin Z., Budczies J., Kluck K., Stenzinger A., Sinicrope F.A. Tumor Mutational Burden as a Predictive Biomarker in Solid Tumors. Cancer Discov. 2020;10:1808–1825. doi: 10.1158/2159-8290.CD-20-0522.
  • 13. Riviere G., Okamura High Tumor Mutational Burden Correlates with Longer Survival in Immunotherapy-Naïve Patients with Diverse Cancers. Mol. Cancer Ther. 2020;19:2139–2145. doi: 10.1158/1535-7163.MCT-20-0161.
  • 14. Samstein R.M., Lee C.-H., Shoushtari A.N., Hellmann M.D., Shen R., Janjigian Y.Y., Barron D.A., Zehir A., Jordan E.J., Omuro A., et al. Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat. Genet. 2019;51:202–206. doi: 10.1038/s41588-018-0312-8.
  • 15. Turajlic S., Litchfield K., Xu H., Rosenthal R., McGranahan N., Reading J.L., Wong Y.N.S., Rowan A., Kanu N., Al Bakir M., et al. Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis. Lancet Oncol. 2017;18:1009–1021. doi: 10.1016/S1470-2045(17)30516-8.
  • 16. Wang X., Ricciuti B., Nguyen T., Li X., Rabin M.S., Awad M.M., Lin X., Johnson B.E., Christiani D.C. Association between Smoking History and Tumor Mutation Burden in Advanced Non-Small Cell Lung Cancer. Cancer Res. 2021;81:2566–2573. doi: 10.1158/0008-5472.CAN-20-3991.
  • 17. Dunlop M.G., Farrington S.M., Carothers A.D., Wyllie A.H., Sharp L., Burn J., Liu B., Kinzler K.W., Vogelstein B. Cancer risk associated with germline DNA mismatch repair gene mutations. Hum. Mol. Genet. 1997;6:105–110. doi: 10.1093/hmg/6.1.105.
  • 18. Chen S., Wang W., Lee S., Nafa K., Lee J., Romans K., Watson P., Gruber S.B., Euhus D., Kinzler K.W., et al. Prediction of germline mutations and cancer risk in the Lynch syndrome. JAMA. 2006;296:1479–1487. doi: 10.1001/jama.296.12.1479.
  • 19. Hieronymus H., Murali R., Tin A., Yadav K., Abida W., Moller H., Berney D., Scher H., Carver B., Scardino P., et al. Tumor copy number alteration burden is a pan-cancer prognostic factor associated with recurrence and death. Elife. 2018;7. doi: 10.7554/eLife.37294.
  • 20. Davoli T., Uno H., Wooten E.C., Elledge S.J. Tumor aneuploidy correlates with markers of immune evasion and with reduced response to immunotherapy. Science. 2017;355. doi: 10.1126/science.aaf8399.
  • 21. Spurr L.F., Weichselbaum R.R., Pitroda S.P. Tumor aneuploidy predicts survival following immunotherapy across multiple cancers. Nat. Genet. 2022;1–4. doi: 10.1038/s41588-022-01235-4.
  • 22. Li C.H., Haider S., Boutros P.C. Age influences on the molecular presentation of tumours. Nat. Commun. 2022;13:208. doi: 10.1038/s41467-021-27889-y.
  • 23. de Magalhães J.P. How ageing processes influence cancer. Nat. Rev. Cancer. 2013;13:357–365. doi: 10.1038/nrc3497.
  • 24. Li C.H., Prokopec S.D., Sun R.X., Yousif F., Schmitz N., PCAWG Tumour Subtypes and Clinical Translation, Boutros P.C., PCAWG Consortium. Sex differences in oncogenic mutational processes. Nat. Commun. 2020;11:4330. doi: 10.1038/s41467-020-17359-2.
  • 25. Li C.H., Haider S., Shiah Y.-J., Thai K., Boutros P.C. Sex Differences in Cancer Driver Genes and Biomarkers. Cancer Res. 2018;78:5527–5537. doi: 10.1158/0008-5472.CAN-18-0362.
  • 26. Hoeijmakers J.H.J. DNA damage, aging, and cancer. N. Engl. J. Med. 2009;361:1475–1485. doi: 10.1056/NEJMra0804615.
  • 27. Schnidrig D., Turajlic S., Litchfield K. Tumour mutational burden: primary versus metastatic tissue creates systematic bias. Immunooncol. Technol. 2019;4:8–14. doi: 10.1016/j.iotech.2019.11.003.
  • 28. Barroso-Sousa R., Jain E., Cohen O., Kim D., Buendia-Buendia J., Winer E., Lin N., Tolaney S.M., Wagle N. Prevalence and mutational determinants of high tumor mutation burden in breast cancer. Ann. Oncol. 2020;31:387–394. doi: 10.1016/j.annonc.2019.11.010.
  • 29. Porta-Pardo E., Sayaman R., Ziv E., Valencia A. 2020. The Landscape of Interactions between Cancer Polygenic Risk Scores and Somatic Alterations in Cancer Cells.
  • 30. Liu Y., Gusev A., Heng Y.J., Alexandrov L.B., Kraft P. Somatic mutational profiles and germline polygenic risk scores in human cancer. Genome Med. 2022;14:14. doi: 10.1186/s13073-022-01016-y.
  • 31. Sun X., Xue A., Qi T., Chen D., Shi D., Wu Y., Zheng Z., Zeng J., Yang J. Tumor Mutational Burden Is Polygenic and Genetically Associated with Complex Traits and Diseases. Cancer Res. 2021;81:1230–1239. doi: 10.1158/0008-5472.CAN-20-3459.
  • 32. Smith G.D., Ebrahim S. “Mendelian randomization”: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 2003;32:1–22. doi: 10.1093/ije/dyg070.
  • 33. Namba S., Saito Y., Kogure Y., Masuda T., Bondy M.L., Gharahkhani P., Gockel I., Heider D., Hillmer A., Jankowski J., et al. Common germline risk variants impact somatic alterations and clinical features across cancers. Cancer Res. 2022. doi: 10.1158/0008-5472.CAN-22-1492.
  • 34. Carter H., Marty R., Hofree M., Gross A.M., Jensen J., Fisch K.M., Wu X., DeBoever C., Van Nostrand E.L., Song Y., et al. Interaction Landscape of Inherited Polymorphisms with Somatic Events in Cancer. Cancer Discov. 2017;7:410–423. doi: 10.1158/2159-8290.CD-16-1045.
  • 35. Rasnic R., Brandes N., Zuk O., Linial M. Substantial batch effects in TCGA exome sequences undermine pan-cancer analysis of germline variants. BMC Cancer. 2019;19:783. doi: 10.1186/s12885-019-5994-5.
  • 36. Choi J.-H., Hong S.-E., Woo H.G. Pan-cancer analysis of systematic batch effects on somatic sequence variations. BMC Bioinf. 2017;18:211. doi: 10.1186/s12859-017-1627-7.
  • 37. Buckley A.R., Standish K.A., Bhutani K., Ideker T., Lasken R.S., Carter H., Harismendy O., Schork N.J. Pan-cancer analysis reveals technical artifacts in TCGA germline variant calls. BMC Genom. 2017;18:458. doi: 10.1186/s12864-017-3770-y.
  • 38. Koire A., Katsonis P., Lichtarge O. Repurposing germline exomes of The Cancer Genome Atlas demands a cautious approach and sample-specific variant filtering. Pac. Symp. Biocomput. 2016;21:207–218.
  • 39. André F., Arnedos M., Baras A.S., Baselga J., Bedard P.L., Berger M.F., Bierkens M., Calvo F., Cerami E., Chakravarty D., et al. AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov. 2017;7:818–831. doi: 10.1158/2159-8290.CD-17-0151.
  • 40. Cheng D.T., Mitchell T.N., Zehir A., Shah R.H., Benayed R., Syed A., Chandramohan R., Liu Z.Y., Won H.H., Scott S.N., et al. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): A Hybridization Capture-Based Next-Generation Sequencing Clinical Assay for Solid Tumor Molecular Oncology. J. Mol. Diagn. 2015;17:251–264. doi: 10.1016/j.jmoldx.2014.12.006.
  • 41. Garcia E.P., Minkovsky A., Jia Y., Ducar M.D., Shivdasani P., Gong X., Ligon A.H., Sholl L.M., Kuo F.C., MacConaill L.E., et al. Validation of OncoPanel: A Targeted Next-Generation Sequencing Assay for the Detection of Somatic Variants in Cancer. Arch. Pathol. Lab Med. 2017;141:751–758. doi: 10.5858/arpa.2016-0527-OA.
  • 42. Davies R.W., Flint J., Myers S., Mott R. Rapid genotype imputation from sequence without reference panels. Nat. Genet. 2016;48:965–969. doi: 10.1038/ng.3594.
  • 43. Gusev A., Groha S., Taraszka K., Semenov Y.R., Zaitlen N. Constructing germline research cohorts from the discarded reads of clinical tumor sequences. Genome Med. 2021;13:179. doi: 10.1186/s13073-021-00999-4.
  • 44. Chen C.-Y., Pollack S., Hunter D.J., Hirschhorn J.N., Kraft P., Price A.L. Improved ancestry inference using weights from external reference panels. Bioinformatics. 2013;29:1399–1406. doi: 10.1093/bioinformatics/btt144.
  • 45. Beaubier N., Bontrager M., Huether R., Igartua C., Lau D., Tell R., Bobe A.M., Bush S., Chang A.L., Hoskinson D.C., et al. Integrated genomic profiling expands clinical options for patients with cancer. Nat. Biotechnol. 2019;37:1351–1360. doi: 10.1038/s41587-019-0259-z.
  • 46. McCarthy S., Das S., Kretzschmar W., Delaneau O., Wood A.R., Teumer A., Kang H.M., Fuchsberger C., Danecek P., Sharp K., et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 2016;48:1279–1283. doi: 10.1038/ng.3643.
  • 47. Niu B., Ye K., Zhang Q., Lu C., Xie M., McLellan M.D., Wendl M.C., Ding L. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics. 2014;30:1015–1016. doi: 10.1093/bioinformatics/btt755.
  • 48. Ding L., Bailey M.H., Porta-Pardo E., Thorsson V., Colaprico A., Bertrand D., Gibbs D.L., Weerasinghe A., Huang K.-L., Tokheim C., et al. Perspective on Oncogenic Processes at the End of the Beginning of Cancer Genomics. Cell. 2018;173:305–320.e10. doi: 10.1016/j.cell.2018.03.033.
  • 49. Aran D., Sirota M., Butte A.J. Systematic pan-cancer analysis of tumour purity. Nat. Commun. 2015;6:8971. doi: 10.1038/ncomms9971.
  • 50. Nassar A.H., Adib E., Abou Alaiwi S., El Zarif T., Groha S., Akl E.W., Nuzzo P.V., Mouhieddine T.H., Perea-Chamblee T., Taraszka K., et al. Ancestry-driven recalibration of tumor mutational burden and disparate clinical outcomes in response to immune checkpoint inhibitors. Cancer Cell. 2022;40:1161–1172.e5. doi: 10.1016/j.ccell.2022.08.022.
  • 51. Burgess S., Butterworth A., Thompson S.G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet. Epidemiol. 2013;37:658–665. doi: 10.1002/gepi.21758.
  • 52. Bowden J., Davey Smith G., Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 2015;44:512–525. doi: 10.1093/ije/dyv080.
  • 53. Singal G., Miller P.G., Agarwala V., Li G., Kaushik G., Backenroth D., Gossai A., Frampton G.M., Torres A.Z., Lehnert E.M., et al. Association of Patient Characteristics and Tumor Genomics With Clinical Outcomes Among Patients With Non–Small Cell Lung Cancer Using a Clinicogenomic Database. JAMA. 2019;321:1391–1399. doi: 10.1001/jama.2019.3241.
  • 54. Marabelle A., Fakih M., Lopez J., Shah M., Shapira-Frommer R., Nakagawa K., Chung H.C., Kindler H.L., Lopez-Martin J.A., Miller W.H., Jr., et al. Association of tumour mutational burden with outcomes in patients with advanced solid tumours treated with pembrolizumab: prospective biomarker analysis of the multicohort, open-label, phase 2 KEYNOTE-158 study. Lancet Oncol. 2020;21:1353–1365. doi: 10.1016/S1470-2045(20)30445-9.
  • 55. Marcus L., Fashoyin-Aje L.A., Donoghue M., Yuan M., Rodriguez L., Gallagher P.S., Philip R., Ghosh S., Theoret M.R., Beaver J.A., et al. FDA Approval Summary: Pembrolizumab for the Treatment of Tumor Mutational Burden-High Solid Tumors. Clin. Cancer Res. 2021;27:4685–4689. doi: 10.1158/1078-0432.CCR-21-0327.
  • 56. 1000 Genomes Project Consortium. Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
  • 57. Gupta S., Artomov M., Goggins W., Daly M., Tsao H. Gender Disparity and Mutation Burden in Metastatic Melanoma. J. Natl. Cancer Inst. 2015;107. doi: 10.1093/jnci/djv221.
  • 58. Schwartz M.R., Luo L., Berwick M. Sex Differences in Melanoma. Curr. Epidemiol. Rep. 2019;6:112–118. doi: 10.1007/s40471-019-00192-7.
  • 59. Price A.L., Butler J., Patterson N., Capelli C., Pascali V.L., Scarnicci F., Ruiz-Linares A., Groop L., Saetta A.A., Korkolopoulou P., et al. Discerning the ancestry of European Americans in genetic association studies. PLoS Genet. 2008;4:e236. doi: 10.1371/journal.pgen.0030236.
  • 60. Arora K., Tran T.N., Kemel Y., Mehine M., Liu Y.L., Nandakumar S., Smith S.A., Brannon A.R., Ostrovnaya I., Stopsack K.H., et al. Genetic Ancestry Correlates with Somatic Differences in a Real-World Clinical Cancer Sequencing Cohort. Cancer Discov. 2022;12:2552–2565. doi: 10.1158/2159-8290.CD-22-0312.
  • 61. Brenner M., Hearing V.J. The protective role of melanin against UV damage in human skin. Photochem. Photobiol. 2008;84:539–549. doi: 10.1111/j.1751-1097.2007.00226.x.
  • 62. Zhang H., Ahearn T.U., Lecarpentier J., Barnes D., Beesley J., Qi G., Jiang X., O’Mara T.A., Zhao N., Bolla M.K., et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat. Genet. 2020;52:572–581. doi: 10.1038/s41588-020-0609-2.
  • 63. Melin B.S., Barnholtz-Sloan J.S., Wrensch M.R., Johansen C., Il’yasova D., Kinnersley B., Ostrom Q.T., Labreche K., Chen Y., Armstrong G., et al. Genome-wide association study of glioma subtypes identifies specific differences in genetic susceptibility to glioblastoma and non-glioblastoma tumors. Nat. Genet. 2017;49:789–794. doi: 10.1038/ng.3823.
  • 64. Liu M., Jiang Y., Wedow R., Li Y., Brazel D.M., Chen F., Datta G., Davila-Velderrain J., McGuire D., Tian C., et al. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat. Genet. 2019;51:237–244. doi: 10.1038/s41588-018-0307-5.
  • 65. Phelan C.M., Kuchenbaecker K.B., Tyrer J.P., Kar S.P., Lawrenson K., Winham S.J., Dennis J., Pirie A., Riggan M.J., Chornokur G., et al. Identification of 12 new susceptibility loci for different histotypes of epithelial ovarian cancer. Nat. Genet. 2017;49:680–691. doi: 10.1038/ng.3826.
  • 66. McKay J.D., Hung R.J., Han Y., Zong X., Carreras-Torres R., Christiani D.C., Caporaso N.E., Johansson M., Xiao X., Li Y., et al. Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat. Genet. 2017;49:1126–1132. doi: 10.1038/ng.3892.
  • 67. Loh P.-R., Kichaev G., Gazal S., Schoech A.P., Price A.L. Mixed-model association for biobank-scale datasets. Nat. Genet. 2018;50:906–908. doi: 10.1038/s41588-018-0144-6.
  • 68. Scelo G., Purdue M.P., Brown K.M., Johansson M., Wang Z., Eckel-Passow J.E., Ye Y., Hofmann J.N., Choi J., Foll M., et al. Genome-wide association study identifies multiple risk loci for renal cell carcinoma. Nat. Commun. 2017;8. doi: 10.1038/ncomms15724.
  • 69. Schumacher F.R., Al Olama A.A., Berndt S.I., Benlloch S., Ahmed M., Saunders E.J., Dadaev T., Leongamornlert D., Anokian E., Cieza-Borrella C., et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat. Genet. 2018;50:928–936. doi: 10.1038/s41588-018-0142-8.
  • 70. Giat E., Ehrenfeld M., Shoenfeld Y. Cancer and autoimmune diseases. Autoimmun. Rev. 2017;16:1049–1057. doi: 10.1016/j.autrev.2017.07.022.
  • 71. Alexandrov L.B., Ju Y.S., Haase K., Van Loo P., Martincorena I., Nik-Zainal S., Totoki Y., Fujimoto A., Nakagawa H., Shibata T., et al. Mutational signatures associated with tobacco smoking in human cancer. Science. 2016;354:618–622. doi: 10.1126/science.aag0299.
  • 72. Cook M.B., McGlynn K.A., Devesa S.S., Freedman N.D., Anderson W.F. Sex disparities in cancer mortality and survival. Cancer Epidemiol. Biomarkers Prev. 2011;20:1629–1637. doi: 10.1158/1055-9965.EPI-11-0246.
  • 73. Riihimäki M., Thomsen H., Hemminki A., Sundquist K., Hemminki K. Comparison of survival of patients with metastases from known versus unknown primaries: survival in metastatic cancer. BMC Cancer. 2013;13:36. doi: 10.1186/1471-2407-13-36.
  • 74. Van Herck Y., Feyaerts A., Alibhai S., Papamichael D., Decoster L., Lambrechts Y., Pinchuk M., Bechter O., Herrera-Caceres J., Bibeau F., et al. Is cancer biology different in older patients? Lancet. Healthy Longev. 2021;2:e663–e677. doi: 10.1016/S2666-7568(21)00179-3.
  • 75. Shah S., Rachmat R., Enyioma S., Ghose A., Revythis A., Boussios S. BRCA Mutations in Prostate Cancer: Assessment, Implications and Treatment Considerations. Int. J. Mol. Sci. 2021;22. doi: 10.3390/ijms222312628.
  • 76. Cancer Genome Atlas Research Network. Kandoth C., Schultz N., Cherniack A.D., Akbani R., Liu Y., Shen H., Robertson A.G., Pashtan I., Shen R., et al. Integrated genomic characterization of endometrial carcinoma. Nature. 2013;497:67–73. doi: 10.1038/nature12113.
  • 77. Vinay D.S., Ryan E.P., Pawelec G., Talib W.H., Stagg J., Elkord E., Lichtor T., Decker W.K., Whelan R.L., Kumara H.M.C.S., et al. Immune evasion in cancer: Mechanistic basis and therapeutic strategies. Semin. Cancer Biol. 2015;35:S185–S198. doi: 10.1016/j.semcancer.2015.03.004.
  • 78. Mouw T., Koster A., Wright M.E., Blank M.M., Moore S.C., Hollenbeck A., Schatzkin A. Education and risk of cancer in a large cohort of men and women in the United States. PLoS One. 2008;3:e3639. doi: 10.1371/journal.pone.0003639.
  • 79. Okbay A., Wu Y., Wang N., Jayashankar H., Bennett M., Nehzati S.M., Sidorenko J., Kweon H., Goldman G., Gjorgjieva T., et al. Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nat. Genet. 2022;54:437–449. doi: 10.1038/s41588-022-01016-z.
  • 80. Kong A., Thorleifsson G., Frigge M.L., Vilhjalmsson B.J., Young A.I., Thorgeirsson T.E., Benonisdottir S., Oddsson A., Halldorsson B.V., Masson G., et al. The nature of nurture: Effects of parental genotypes. Science. 2018;359:424–428. doi: 10.1126/science.aan6877.
  • 81. Carrot-Zhang J., Chambwe N., Damrauer J.S., Knijnenburg T.A., Robertson A.G., Yau C., Zhou W., Berger A.C., Huang K.-L., Newberg J.Y., et al. Comprehensive Analysis of Genetic Ancestry and Its Molecular Correlates in Cancer. Cancer Cell. 2020;37:639–654.e6. doi: 10.1016/j.ccell.2020.04.012.
  • 82. McGrail D.J., Pilié P.G., Rashid N.U., Voorwerk L., Slagter M., Kok M., Jonasch E., Khasraw M., Heimberger A.B., Lim B., et al. High tumor mutation burden fails to predict immune checkpoint blockade response across all cancer types. Ann. Oncol. 2021;32:661–672. doi: 10.1016/j.annonc.2021.02.006.
  • 83. Vega D.M., Yee L.M., McShane L.M., Williams P.M., Chen L., Vilimas T., Fabrizio D., Funari V., Newberg J., Bruce L.K., et al. Aligning tumor mutational burden (TMB) quantification across diagnostic platforms: phase II of the Friends of Cancer Research TMB Harmonization Project. Ann. Oncol. 2021;32:1626–1636. doi: 10.1016/j.annonc.2021.09.016.

Associated Data

Supplementary Materials

Document S1. Figures S1–S16 and Tables S2, S4, S7–S9, S11, S13, S18, S20, and S22 (mmc1.pdf, 2.9 MB)
Tables S1, S3, S5, S6, S10, S12, S14–S17, S19, and S21 (mmc2.xlsx, 609.9 KB)
Document S2. Article plus supplemental information (mmc3.pdf, 7.2 MB)

Data Availability Statement

Individual-level data are protected to preserve patient privacy. All summary statistics are available in the supplementary tables. Analysis scripts are available at https://github.com/koditaraszka/pancancer/.


Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
