Author manuscript; available in PMC: 2023 May 2.
Published in final edited form as: Cancer Discov. 2022 Nov 2;12(11):2552–2565. doi: 10.1158/2159-8290.CD-22-0312

Genetic Ancestry Correlates with Somatic Differences in a Real-World Clinical Cancer Sequencing Cohort

Kanika Arora 1,2, Thinh Ngoc Tran 2,6, Yelena Kemel 2,4, Miika Mehine 1,2, Ying L Liu 5, Subhiksha Nandakumar 2,3,6, Shaleigh A Smith 2,3,6,^, A Rose Brannon 1, Irina Ostrovnaya 6, Konrad H Stopsack 5, Pedram Razavi 5, Anton Safonov 5, Hira A Rizvi 3,#, Matthew D Hellmann 5,#, Joseph Vijai 4,5, Thomas C Reynolds 8, James A Fagin 3,5, Jian Carrot-Zhang 6, Kenneth Offit 4,5, David B Solit 2,3,5, Marc Ladanyi 1,3, Nikolaus Schultz 2,3,6, Ahmet Zehir 1,#, Carol L Brown 7,8, Zsofia K Stadler 4,5, Debyani Chakravarty 1,2, Chaitanya Bandlamudi 1,2, Michael F Berger 1,2,3
PMCID: PMC9633436  NIHMSID: NIHMS1835156  PMID: 36048199

Abstract

Accurate ancestry inference is critical for identifying genetic contributors of cancer disparities among populations. While methods to infer genetic ancestry have historically relied upon genome-wide markers, the adaptation to targeted clinical sequencing panels presents an opportunity to incorporate ancestry inference into routine diagnostic workflows. We show that global ancestral contributions and admixture of continental populations can be quantitatively inferred using markers captured by the MSK-IMPACT clinical panel. In a pan-cancer cohort of 45,157 patients, we observed differences by ancestry in the frequency of somatic alterations, recapitulating known and revealing novel associations. Despite comparable overall prevalence of driver alterations by ancestry group, the proportion of patients with clinically actionable alterations was lower for African (30%) compared to European (33%) ancestry. While this result is largely explained by population-specific cancer subtype differences, it reveals an inequity in the degree to which different populations are served by existing precision oncology interventions.

Keywords: Ancestry, MSK-IMPACT, Clinical Genomics

Introduction

Cancer health disparities between racial and ethnic groups remain a key public health challenge in the United States (1–3). Differences in cancer incidence and mortality are due to a complex interplay of non-genetic factors such as access to healthcare, socioeconomic status, diet and lifestyle, and genetic factors including ancestral differences between populations. While highly correlated with race, ancestry is a distinct attribute that specifically refers to inherited genetic variation correlated with human migration patterns. Moreover, unlike race, ancestry can be inferred quantitatively, including in recently admixed populations such as African Americans and Latin Americans where individuals exhibit a mixture of alleles inherited from multiple ancestral groups (4).

Genetic contributions to observed differences in cancer incidence and clinical outcomes in different ancestral populations have been identified in many cancer types. For example, genome-wide mapping studies have revealed specific risk loci (5) that may contribute to the higher incidence and mortality rates for prostate cancer among African American men compared to European Americans (6). Similarly, population-based studies have consistently identified higher incidence rates of triple negative breast cancer in women of African ancestry compared to Europeans (7,8). Population-specific differences in the observed rates of specific somatic alterations have also been reported, for instance the higher rate of EGFR mutations in patients of Asian descent with non-small cell lung cancer even after accounting for smoking history (9,10).

In recent years, data from large-scale cancer genomics efforts have provided the opportunity to understand the relation between genetic ancestry and somatic features across cancer (11,12). However, the limited number of patients of non-European ancestry in these studies has precluded a thorough evaluation of ancestry-specific associations across all cancer types. These studies have typically relied on genome-wide ancestry markers from broad genome-scale next-generation sequencing (NGS), typically only available in the research setting. As targeted NGS panels are increasingly utilized clinically, the adaptation of these methods to more focused sets of markers presents an opportunity to prospectively incorporate ancestry inference as part of routine clinical care (13). The application to large cohorts of clinically sequenced patients may enable the identification of additional novel associations of ancestry with molecular and clinical features (14). At Memorial Sloan Kettering Cancer Center (MSK), we have compiled an enterprise-scale resource of tumor and matched normal sequencing data from over 500 cancer histologies using MSK-IMPACT, a US Food and Drug Administration (FDA)-authorized targeted NGS panel encompassing up to 505 cancer-associated genes (15).

Here, we demonstrate that ancestral contributions of African (AFR), European (EUR), East Asian (EAS), Native American (NAM) and South Asian (SAS) populations can be robustly inferred using more than 3,000 common SNP markers captured by MSK-IMPACT. We also show that Ashkenazi Jewish (ASJ) ancestry can be inferred with high sensitivity and specificity using an additional set of ASJ ancestry-informative markers. Applying these methods to infer the genetic ancestry of 45,157 patients who underwent prospective sequencing at MSK, we report differences by ancestry in the frequency of somatic features and the prevalence of clinically actionable targets of FDA-approved therapies.

Results

Genetic ancestry inference from MSK-IMPACT data

We first sought to determine whether single nucleotide polymorphism (SNP) markers within captured regions of the targeted MSK-IMPACT panel were sufficient to accurately infer genetic ancestry. To this end, we selected a reference dataset of 2,129 non-admixed individuals from 24 geographical populations who were sequenced using whole genome sequencing (WGS) in the 1000 Genomes Project (1KGP) (16) representing five different continental groups: African (AFR), Native American (NAM), East Asian (EAS), European (EUR) and South Asian (SAS) (Supplementary Figure S1A–B, see Methods). Principal component analysis on the genome-wide SNP markers showed clear separation of the five continental ancestry groups (Supplementary Figure S1C). To evaluate the power of MSK-IMPACT-specific markers to infer ancestry, we chose an independent dataset of 279 samples from the Simons Genome Diversity Project (SGDP) (17) comprising whole genome sequences from diverse human populations around the world. We used ADMIXTURE (18), a program that estimates ancestry from large autosomal SNP genotype datasets by modeling the probability of the observed genotypes using ancestry proportions and population allele frequencies. With it, we estimated ancestry contributions from the five continental population groups in all SGDP samples using both the genome-wide SNP markers (n=716,011) and the MSK-IMPACT-specific SNP markers (n=3,331–5,378, depending on the version of the panel), and then evaluated the concordance of the global ancestry inferences and admixture estimates between the two marker sets.
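To make the modeling idea concrete, the sketch below evaluates a binomial admixture log-likelihood for one individual: each genotype is treated as two draws of an allele whose frequency is the ancestry-weighted mixture of population allele frequencies. This is only an illustration of the kind of model described above, not the ADMIXTURE program itself; the function name, the toy allele frequencies, and the genotypes are all hypothetical.

```python
import math

def admixture_log_likelihood(genotypes, q, freqs):
    """Log-likelihood of one individual's genotypes under a binomial admixture model.

    genotypes: allele counts (0, 1, or 2) per SNP
    q:         ancestry proportions, one per population (should sum to 1)
    freqs:     freqs[k][j] = frequency of the counted allele at SNP j in population k
    The constant binomial coefficient is omitted because it does not depend on q.
    """
    total = 0.0
    for j, g in enumerate(genotypes):
        # Expected allele frequency for this individual: mixture over populations.
        p = sum(q[k] * freqs[k][j] for k in range(len(q)))
        p = min(max(p, 1e-9), 1 - 1e-9)  # guard against log(0)
        total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total

# Hypothetical toy data: 2 populations, 4 SNPs.
freqs = [[0.9, 0.8, 0.1, 0.2],   # population A
         [0.1, 0.2, 0.9, 0.7]]   # population B
genotypes = [2, 2, 0, 0]          # genotype pattern typical of population A

ll_a = admixture_log_likelihood(genotypes, [1.0, 0.0], freqs)
ll_b = admixture_log_likelihood(genotypes, [0.0, 1.0], freqs)
print(ll_a > ll_b)  # the all-A ancestry vector fits these genotypes better
```

An optimizer would search over `q` (and, in unsupervised settings, over `freqs`) to maximize this quantity; that step is not shown here.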

Both the genome-wide and MSK-IMPACT-specific analyses revealed high proportions of NAM, AFR, EUR, EAS and SAS ancestries estimated for samples from America, Sub-Saharan Africa, West Eurasia, East Asia and South Asia, respectively (Figure 1A, Supplementary Figure S2A). We observed near-perfect concordance in the estimated ancestry proportion of admixed individuals between the two marker sets (Pearson’s correlation coefficient = 0.997, p<2.2e-16) (Figure 1B, Supplementary Figure S2B–E).
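The concordance check can be illustrated with a small Pearson correlation computation on paired ancestry fractions. The fractions below are made-up values for five imaginary samples, not SGDP data, and `pearson_r` is a hypothetical helper written for this sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical fraction of one ancestry component for the same five samples,
# estimated once from genome-wide markers and once from panel-only markers.
genome_wide = [0.02, 0.35, 0.50, 0.81, 0.99]
panel_only = [0.03, 0.33, 0.52, 0.80, 0.98]

r = pearson_r(genome_wide, panel_only)
print(f"r = {r:.4f}")
```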

Figure 1: ADMIXTURE analysis with MSK-IMPACT markers.


(A) ADMIXTURE results on Simons Genome Diversity Project (SGDP) data using n=5,378 MSK-IMPACT SNP markers chosen from the latest version of the MSK-IMPACT panel: IMPACT-505. (B) Comparison of ancestry fractions inferred for SGDP samples using genome-wide markers and IMPACT-505-specific markers. (C) Self-reported race, ethnicity, ADMIXTURE results, and assigned admixture labels in MSK-IMPACT cohort. (D) Distribution of inferred ancestral fractions (top) and assigned admixture labels (bottom) in self-reported Non-Hispanic/Latinx White, Hispanic/Latinx White, Non-Hispanic/Latinx Black, Hispanic/Latinx Black and Asian patients. ADM=admixed/other; AFR=African; EAS=East Asian; EUR=European; NAM=Native American; SAS=South Asian.

Having established that MSK-IMPACT markers are sufficient to accurately infer ancestry and admixture in the whole genome SGDP data, we repeated the same supervised ADMIXTURE analysis to infer ancestry contributions of AFR, EAS, EUR, NAM, and SAS populations in 45,157 individuals sequenced on MSK-IMPACT panels (Figure 1C, Supplementary Table S1). We assigned discrete ancestry labels to each patient where the contribution to ancestry of a single population was inferred to be ≥80%; the remaining were considered admixed or of other ancestry that is not represented in the reference panel. Altogether, 76% of patients were labeled EUR, 5% AFR, 6% EAS, 2% SAS, <1% NAM, and 11% Admixed/Other (ADM).
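The labeling rule described above (a single population contributing at least 80% of inferred ancestry, otherwise admixed/other) can be sketched as follows. The function name and the dictionary input format are assumptions made for illustration.

```python
def assign_ancestry_label(fractions, threshold=0.80):
    """Return the population label if one ancestry fraction meets the threshold,
    otherwise 'ADM' (admixed/other).

    fractions: dict mapping population code (e.g. 'AFR') to inferred fraction.
    """
    population, fraction = max(fractions.items(), key=lambda item: item[1])
    return population if fraction >= threshold else "ADM"

# Hypothetical ADMIXTURE outputs for two patients.
print(assign_ancestry_label({"AFR": 0.02, "EAS": 0.01, "EUR": 0.95, "NAM": 0.01, "SAS": 0.01}))
print(assign_ancestry_label({"AFR": 0.66, "EAS": 0.00, "EUR": 0.30, "NAM": 0.04, "SAS": 0.00}))
```

The first call yields a discrete label, while the second falls below the threshold for every population and is labeled ADM.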

Five different versions of MSK-IMPACT were used during the accumulation of this dataset, comprising 341, 400, 410, 468, or 505 genes. We compared inferred ancestry proportions from samples run on different MSK-IMPACT versions and found that they were highly concordant (Pearson’s correlation coefficient > 0.997, p<2.2e-16) across different versions and samples (Supplementary Figure S3A–D). We also demonstrate that admixture estimates can be accurately inferred from tumor-only sequencing (Pearson’s correlation coefficient = 1, p<2.2e-16, Supplementary Figure S3E–G).

As an independent comparator, both self-reported race and ethnicity were available for 40,321 individuals (89%). Of the self-reported non-Latinx White individuals, 96% had inferred EUR ancestry (Figure 1D). Of the self-reported non-Latinx Black/African American individuals, 68% had inferred AFR ancestry and 31% were labeled ADM, with a mean AFR admixture contribution of 66%. On the other hand, the vast majority of self-reported Hispanic/Latinx patients (72%) were ADM, consistent with known admixtures of European, African, and Native American ancestries in Latin American populations.

In some cases, the inferred genetic ancestry was able to provide a more granular level of detail than the categories available for self-identification of race and ethnicity. For instance, while our genetic analysis specifically distinguishes individuals with either South Asian or East Asian ancestry, individuals were only able to self-report as a single broad “Asian” category. Indeed, of those self-reported as Asians, 64% had EAS, 20% SAS and 14% ADM inferred ancestry (Supplementary Figure S4A–B). We were also able to assign ancestry to the 3,014 patients for whom self-identified race information was missing and who would therefore have been excluded from analyses relying on self-reported race (Supplementary Figure S4C).

Differences between the self-reported race and inferred genetic ancestry were observed in 2% of the patient cohort (739 patients) (Supplementary Figure S4D). Approximately 22% of these cases were individuals that self-reported as Hispanic/Latinx, compared to 6% of the entire cohort. Anticipating that a higher proportion of patients from such recently admixed populations may not self-identify with the available race categories, we focused on discrepancies in self-reported non-Hispanic/Latinx patients. We reasoned that these discrepancies may represent reporting or database errors. To further investigate this, we leveraged independent manually curated family histories collected for a subset of individuals as part of cancer predisposition testing. For 160 patients with discrepancies (27, 41, 43, 22, and 27 with EUR, AFR, EAS, SAS, and NAM inferred ancestry respectively), we identified compelling evidence in support of the inferred ancestry in 136/160 (85%) of cases, compelling evidence contradicting the inferred ancestry in 1/160 (<1%), and inconclusive data available for the remainder. 38/43 EAS individuals were identified through detailed family history as either Chinese, Filipino, Korean, Vietnamese or Singaporean. 16/22 SAS individuals were manually identified as either Indian or Pakistani, with the remaining patients identified as either from Guyana or Trinidad, where Indo-Caribbeans (individuals of Indian origin in the Caribbean) represent one of the largest ethnic groups. 10/15 patients predicted to be EUR and who self-reported as Asian originated from Middle Eastern countries and were expected to share more ancestry with Europeans than with South or East Asians. Taken together, these results reaffirm the accuracy of genetic ancestry inference from targeted panel data and demonstrate its utility and granularity when self-reported race may differ.

Ashkenazi Jewish ancestry inference

Ashkenazi Jews are of Eastern and Central European descent but tend to carry some genetic variants at a different frequency than other European populations (19). Among those variants are pathogenic germline mutations in certain cancer-predisposing genes, such as BRCA1 and BRCA2, which are associated with risk of breast, ovarian, prostate and pancreatic cancer (20). Before dedicated germline testing for pathogenic variants is obtained, ancestry inference may thus have value in clinical genetic risk assessment.

While the MSK-IMPACT markers were chosen to assess continental ancestries, to infer Ashkenazi Jewish (ASJ) ancestry we identified 282 additional SNP markers captured by MSK-IMPACT that had higher minor allele frequency (>1%) in ASJ compared to other populations (<0.1%). To evaluate the effectiveness of these markers for identifying ASJ individuals, we extracted genotypes at these marker sites from the MSK-IMPACT data in 8,217 patients for whom ASJ status was manually curated based on detailed family histories obtained through genetic counseling. Of the 1,257 patients whose ancestry was manually annotated as ASJ, 88% carried the minor allele of at least 3 out of the 282 ASJ ancestry-informative markers, compared to only 1% of the 6,960 patients manually annotated as nonASJ (Figure 2A), indicating an accuracy of 97% in ASJ ancestry inference. This threshold of 3 markers had the highest accuracy and was therefore used for subsequent analyses. Additionally, as with the assessment of continental ancestry, our inference of ASJ ancestry was concordant across MSK-IMPACT panel versions and between tumor and matched normal (Supplementary Figure S5A–D).
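The threshold rule and the reported accuracy can be checked with a short calculation. The sketch below assumes genotypes are coded as alternate-allele counts (0, 1, 2), which is an illustrative convention rather than the authors' data format, and it recomputes accuracy from the rounded percentages quoted above, so the result is approximate.

```python
def count_alt_markers(genotypes):
    """Count marker sites with a het (1) or hom-alt (2) genotype call."""
    return sum(1 for g in genotypes if g >= 1)

def is_asj(genotypes, min_markers=3):
    """Apply the 'at least 3 non-reference ASJ markers' rule."""
    return count_alt_markers(genotypes) >= min_markers

def accuracy(n_pos, sensitivity, n_neg, false_positive_rate):
    """Overall accuracy from group sizes and per-group rates."""
    true_pos = n_pos * sensitivity
    true_neg = n_neg * (1 - false_positive_rate)
    return (true_pos + true_neg) / (n_pos + n_neg)

# Figures quoted in the text: 1,257 annotated ASJ (88% pass the threshold),
# 6,960 annotated nonASJ (1% pass the threshold).
acc = accuracy(1257, 0.88, 6960, 0.01)
print(round(acc, 2))  # 0.97
```

Because nonASJ patients far outnumber ASJ patients in this validation set, overall accuracy is dominated by the low false-positive rate.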

Figure 2: Ashkenazi Jewish ancestry inference.


(A) Fraction of manually annotated Ashkenazi Jewish (n=1257) and non-Ashkenazi Jewish (n=6960) patients at different cutoffs of number of ASJ informative marker sites with het/hom-alt genotype calls. The triangles show the accuracy (secondary y-axis). Based on these results, patients with alternate alleles for at least 3 ASJ markers were labeled ASJ, and the remaining were labeled nonASJ. (B) Representation of self-reported religion in patients assigned nonASJ and ASJ labels. (C) Fraction of patients with BRCA1/2 mutations in patients assigned nonASJ and ASJ labels. (D) Fraction of patients with BRCA1/2 founder mutations and representation of self-reported religion binned by the number of het/hom-alt ASJ markers.

We inferred genotypes for the ASJ informative markers using MSK-IMPACT data on the entire cohort of 45,157 individuals and assigned ASJ ancestry to patients that met the threshold of 3 non-reference markers. Altogether, 16% of the cohort was identified as ASJ, 98% of whom were labeled EUR by ADMIXTURE. We therefore separated patients with EUR ancestry by ADMIXTURE into Ashkenazi Jewish European (hereafter referred to as ASJ) and non-Ashkenazi Jewish European (hereafter referred to as EUR) based on their inferred ASJ ancestry (Supplementary Table S1).

Of the patients with self-reported religion information, 92% of those with ASJ inferred ancestry were self-identified Jewish (Figure 2B), and 84% of self-reported Jewish individuals had ASJ inferred ancestry. Considering that not all Jewish individuals are of Ashkenazi descent and that admixture is common in this population, the observed concordance between self-reported religion and ASJ ancestry suggests that our method is accurate and informative.

As an independent indicator of accuracy, we examined the prevalence of cancer-predisposing BRCA1 (c.68_69delAG, c.5266dupC) and BRCA2 (c.5946delT) founder mutations in patients with ASJ and nonASJ inferred ancestry (Supplementary Table S2). These mutations were neither included in nor in linkage disequilibrium with the ASJ-informative marker set. We found that the frequency of observing at least one of these three germline mutations was significantly higher in patients with ASJ inferred ancestry compared to nonASJ (5.1% vs 0.2%, Fisher’s exact test p=8.97e-200) (Figure 2C). Moreover, the prevalence of these BRCA1/2 founder mutations increased with the number of heterozygous (het) or homozygous alternate (hom-alt) ASJ markers (Figure 2D). Overall, 83% of patients with at least one of the BRCA1/2 founder mutations had ASJ inferred ancestry by our analysis. A majority (64%) of these patients had at least six het/hom-alt genotyped ASJ marker sites. Of the BRCA1/2 founder mutation positive patients with ASJ inferred ancestry, 33% did not self-report as Jewish (3% reported a different religion, and for 30% there was no report of a religion) and therefore may not have been considered for germline risk screening.

Ancestral differences in cancer type prevalence and somatic mutation patterns

We next examined the ancestry distribution within each cancer type for our cohort of 45,157 patients. AFR patients were overrepresented among patients with breast cancer, endometrial cancer, gastrointestinal stromal tumors, and uterine sarcoma and underrepresented among patients with melanoma. We also observed an overrepresentation of EAS patients among patients with upper gastrointestinal cancers (esophagogastric and hepatobiliary), head and neck cancer, and salivary gland carcinoma (Figure 3A). These differences were frequently associated with differences in the prevalence of specific subtypes within each general cancer type (Supplementary Figure S6). For instance, while cutaneous melanoma represents by far the most common type of melanoma in patients with EUR ancestry (60%), it accounted for only 14% of melanomas in patients with AFR ancestry as opposed to acral (45%) and mucosal (36%) melanoma, consistent with prior reports (21). Additionally, uterine endometrioid carcinoma (UEC) accounted for over 56% of all endometrial cancers in patients with EUR ancestry but only 23% of patients with AFR ancestry, with the more aggressive serous (USC) and carcinosarcoma/mixed Mullerian (UCS) subtypes representing 35% and 24% respectively of all endometrial cancers in patients with AFR ancestry, compared to 14% and 10% respectively in patients with EUR ancestry.

Figure 3: Cancer types and somatic associations.


(A) Ancestry representation of patients in different cancer types. (B) Ancestry-gene associations in different cancer types. Associations with adjusted p-value <0.05 (after adjusting for sex, age and sample type) are highlighted. Samples with TMB score of > 20 or MSI score > 10 were excluded. (C) Gene alteration frequencies in Lung Adenocarcinoma. Two-tailed Fisher’s exact test was run to identify significant differences in gene alteration frequencies between EUR and other populations. Asterisks indicate significant p-values. ***: p-value < 0.001, **: p-value < 0.01, *: p-value < 0.05.

We next sought to identify associations between ancestry and somatic driver alterations. Recognizing that the different subtype frequencies might lead to spurious associations between ancestry and somatic mutations within general cancer types, we restricted our analyses to subtype-level comparisons, examining all cancer subtypes with 15 or more patients in at least one of the non-EUR populations. Since patients of EUR ancestry represent the majority of the cohort, we systematically compared the alteration frequencies of genes in EUR relative to other populations after adjusting for sex, age and disease status (primary vs. metastatic). Altogether, we identified 18 significant associations (FDR adjusted p-value < 0.05) (Figure 3B, Table 1, Supplementary Table S3).
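The text reports FDR-adjusted p-values without naming the procedure. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up adjustment, a common choice; the raw p-values are invented and `benjamini_hochberg` is a hypothetical helper.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for five gene-ancestry tests within one subtype.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(significant)  # [True, True, False, False, False]
```

Note that two tests with raw p-values below 0.05 are no longer significant after adjustment, which is the purpose of controlling the false discovery rate across many gene-by-ancestry comparisons.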

Table 1: Gene-ancestry associations.

18 significant differences (FDR adjusted p-value < 0.05) were identified between the EUR cohort and another ancestral group. Alteration frequencies are displayed for both groups.

| Cancer Subtype | Code | Gene | Ancestry Group 2 | Alteration frequency in EUR | Alteration frequency in Group 2 | Citations if previously reported |
|---|---|---|---|---|---|---|
| Breast Invasive Ductal Carcinoma | IDC | TP53 | AFR | 43.7% | 61.5% | (8,11,12,25,26) |
| Colon Adenocarcinoma | COAD | KRAS | AFR | 45.1% | 62.9% | (13,30) |
| Glioblastoma Multiforme | GBM | TERT | EAS | 87.4% | 56.5% | |
| Hepatocellular Carcinoma | HCC | RB1 | EAS | 3.4% | 23.5% | (31) |
| Hepatocellular Carcinoma | HCC | TERT | EAS | 67.2% | 35.3% | |
| Lung Adenocarcinoma | LUAD | EGFR | EAS | 23.2% | 67.5% | (9,10,23,45) |
| Lung Adenocarcinoma | LUAD | KRAS | EAS | 40.9% | 10.3% | (10,23,46) |
| Lung Adenocarcinoma | LUAD | EGFR | SAS | 23.2% | 65.5% | (22,45) |
| Lung Adenocarcinoma | LUAD | STK11 | EAS | 17.2% | 3.1% | (23) |
| Lung Adenocarcinoma | LUAD | EGFR | AFR | 23.2% | 44.7% | (47) |
| Lung Adenocarcinoma | LUAD | KEAP1 | EAS | 8.8% | 2.6% | (23) |
| Lung Adenocarcinoma | LUAD | EGFR | ASJ | 23.2% | 29.4% | |
| Lung Adenocarcinoma | LUAD | MDM2 | SAS | 5.9% | 16.4% | |
| Lung Adenocarcinoma | LUAD | KRAS | SAS | 40.9% | 12.7% | (46) |
| Lung Adenocarcinoma | LUAD | KRAS | AFR | 40.9% | 24.4% | (46) |
| Lung Squamous Cell Carcinoma | LUSC | TP53 | EAS | 88.7% | 60% | |
| Prostate Adenocarcinoma | PRAD | FOXA1 | EAS | 14.1% | 35.1% | (27,28) |
| Prostate Adenocarcinoma | PRAD | MYC | AFR | 6.9% | 12.7% | (29) |

We found a number of significant associations within lung adenocarcinoma (LUAD) including higher frequency of EGFR mutations in both EAS and SAS compared to EUR, consistent with prior reports (9,22). Other associations in LUAD include lower frequency of KRAS mutations in both EAS and SAS, and lower frequency of STK11 and KEAP1 mutations in EAS (23). Given the influence of smoking status on somatic mutation profiles of lung tumors (24) and differences in smoking status by ancestry observed in our cohort (Supplementary Figure S7A–C), we restricted our analysis to never-smokers (n=838 patients) and noted that while EGFR mutations remained significantly enriched in Asian populations, most other associations were no longer significant (Figure 3C). This remained true when including all patients in the analysis and adjusting for smoking status (Supplementary Figure S8). Among never-smokers, we also observed a higher frequency of EGFR mutations in AFR compared to EUR (69% vs 49%, Fisher’s exact test p=0.014), though we cannot exclude the possibility of ascertainment bias in our cohort. While tumor mutation burden (TMB) was significantly lower in ASJ, EAS and SAS patients with LUAD overall, these differences were not observed when restricting to only never-smokers (Supplementary Figure S7C).
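Pairwise frequency comparisons such as these rely on a two-sided Fisher's exact test on a 2x2 table (mutated vs. wild-type, by ancestry group). The sketch below is a minimal pure-Python version that sums the hypergeometric probabilities of all tables no more likely than the observed one; it is not the authors' code, and the counts in the usage line are invented because the underlying patient counts are not given here.

```python
import math

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(x):
        # Hypergeometric log-probability of x in the top-left cell, margins fixed.
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(total, col1)

    log_observed = log_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p = sum(math.exp(log_prob(x)) for x in range(lowest, highest + 1)
            if log_prob(x) <= log_observed + 1e-9)
    return min(p, 1.0)

# Hypothetical counts: 9 of 12 mutated in one group vs. 10 of 30 in another.
p_value = fisher_exact_two_sided(9, 3, 10, 20)
print(f"p = {p_value:.4f}")
```

Working in log space with `math.lgamma` keeps the calculation stable for cohort-sized counts, where the binomial coefficients themselves would be far too large to handle as floats.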

Additionally, we observed a higher frequency of TP53 mutations in women of AFR ancestry with breast invasive ductal carcinoma (61% vs 44% in EUR), as has been previously reported (25). This was driven in part by the higher incidence of triple-negative breast cancer (TNBC) in patients of AFR ancestry (32% vs 17% in EUR) (Supplementary Figure S9A) as previously shown (7), the majority of whom carried concomitant TP53 alterations. However, even within the HR+/HER2- group, TP53 mutations were more common in patients with AFR ancestry (39%) compared to those of EUR ancestry (24%) (Fisher’s exact test p=0.005) or other populations (Supplementary Figure S9B), as has been previously reported (26). This trend was also reflected in patients with admixed ancestry, with women exhibiting 20-50% AFR ancestry and 50-80% AFR ancestry harboring intermediate but increasing rates of TP53 alterations (Supplementary Figure S10A–B).

Consistent with previous studies, we found higher alteration frequency of FOXA1 in patients of EAS ancestry with prostate adenocarcinoma (PRAD) (27,28), MYC in patients of AFR ancestry with PRAD (27,29), KRAS in patients with AFR ancestry with colon adenocarcinoma (COAD) (30) and RB1 in patients with EAS ancestry with hepatocellular carcinoma (HCC) (31). We also observed lower frequency of TERT (primarily promoter) mutations in patients with EAS ancestry with glioblastoma multiforme (GBM) and in patients with EAS ancestry with HCC compared to those with EUR ancestry, which, to our knowledge, have not been described before.

Differences in prevalence of clinically actionable mutations by ancestry

Finally, we sought to compare the rates of clinically actionable alterations in different ancestral populations. While differences in actionable alterations partly reflect different cancer type and subtype distributions by ancestry, they nevertheless may contribute to cancer health disparities and reveal opportunities for targeted drug development. We annotated mutations using OncoKB (32), an in-house FDA-recognized clinical knowledge base, and considered mutations that were Level 1 (FDA recognized biomarker of FDA-approved treatment response), Level 2 (standard care biomarker of treatment response), and Level 3A (investigational treatment biomarker supported by compelling clinical evidence from the corresponding cancer type) to be actionable. Included in OncoKB Level 1 are second-order features that predict response to immune checkpoint inhibition: microsatellite instability (MSI) and tumor mutation burden (TMB) >10 mutations per megabase (TMB-High) (Supplementary Figure S11A–C).
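The actionability definition above can be expressed as a simple per-patient rule. The sketch below assumes each patient is represented by a list of OncoKB level strings plus optional TMB and MSI fields; that representation and the function name are hypothetical, and no OncoKB API is called.

```python
# Levels treated as actionable in the text: 1, 2, and 3A.
ACTIONABLE_LEVELS = {"1", "2", "3A"}

def is_actionable(alteration_levels, tmb=None, msi_high=False, tmb_cutoff=10.0):
    """Return True if a patient has at least one actionable finding.

    alteration_levels: OncoKB level strings for the patient's alterations
    tmb:               tumor mutation burden in mutations per megabase, if known
    msi_high:          True if the tumor shows microsatellite instability
    """
    # MSI and TMB-High (TMB above the cutoff) count as Level 1 features.
    if msi_high or (tmb is not None and tmb > tmb_cutoff):
        return True
    return any(level in ACTIONABLE_LEVELS for level in alteration_levels)

print(is_actionable(["3B"]))            # False: 3B alone is not counted
print(is_actionable(["3B", "1"]))       # True: a Level 1 alteration is present
print(is_actionable([], tmb=12.5))      # True: TMB-High
```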

The proportion of patients with solid tumors harboring at least one oncogenic driver alteration was nearly identical across EUR, ASJ, AFR, EAS and SAS populations (93.5% - 94.5%) and slightly less in NAM (90.6%), although we are underpowered to detect differences of less than 10% in NAM due to its small sample size. The number of total driver mutations (median:4.0-4.0; mean:4.13-4.36) and somatic variants of unknown significance (median:3.0-3.0; mean:4.18-4.64) were also consistent across the five main ancestral groups, excluding TMB-high and MSI-high tumors. However, the proportion of patients with clinically actionable somatic alterations in our large real-world cohort was lower in AFR (29.6%) compared to EUR (33.1%) (Fisher’s exact test p=1.06e-03) (Figure 4A). Whereas the rates of actionable mutations in ASJ (34.3%) and EAS (34.4%) were comparable to EUR, we also observed lower rates of actionable alterations in NAM (29.5%) and SAS (29.8%) patients, though these differences were not statistically significant due to fewer individuals in these groups. MSI was most common in NAM and SAS (5.4% and 3.6%) and least common in EAS (2.4%) (Supplementary Figure S11C).

Figure 4: Distribution of levels of actionability across ancestries and cancer types.


(A) Frequency of actionable mutations pan-cancer in the different populations. (B) Actionable mutations in different cancer types. Error bars show 95% confidence intervals.

The frequency of actionable alterations varied across cancer types, with population-specific differences (Figure 4B, Supplementary Figure S12). For example, melanoma, non-melanoma skin cancer, and endometrial cancer exhibited lower rates of actionable alterations in patients of AFR ancestry compared to those of EUR ancestry. In melanoma, only 12% of patients with AFR ancestry harbored actionable mutations (versus 73% of EUR) due to the much lower prevalence of cutaneous melanomas, where Level 1 BRAF V600E mutations and TMB-High signatures are enriched (Supplementary Figure S13A–C). Similarly, the low rate of actionable mutations in patients with AFR ancestry with non-melanoma skin cancer (17%, versus 48% in EUR) is explained by the very low prevalence of cutaneous squamous cell carcinoma (n=2), which exhibits TMB-High signatures in 76% of patients across all ancestries (Supplementary Figure S14A–C). However, these cancer types are relatively rare and account for only a small fraction of the difference in actionability observed between the EUR and AFR cohorts. In order to determine the contribution of cancer type and subtype differences to the higher rate of actionable mutations in patients with EUR ancestry, we estimated the frequency of actionable mutations in the EUR cohort if the distribution of cancer subtypes were adjusted to match the AFR cohort. In this case, the actionability rate decreased to 31.4%, almost exactly the midpoint between the two cohorts, suggesting that cancer subtype differences account for approximately half of the difference in actionability between EUR and AFR patients and that the remainder relates to differences in the somatic mutation profiles.
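The subtype adjustment described here resembles direct standardization: per-subtype actionability rates from one cohort are reweighted by the subtype mix of another. The sketch below illustrates that idea with invented subtype names, rates, and counts; it is not the authors' implementation. The final lines use only the three percentages quoted in the text to show why the adjusted rate implies that roughly half of the gap is attributable to subtype mix.

```python
def standardized_rate(rates_by_subtype, target_counts):
    """Rate obtained by applying one cohort's per-subtype rates to another
    cohort's subtype distribution (direct standardization)."""
    total = sum(target_counts.values())
    return sum(rates_by_subtype[s] * n for s, n in target_counts.items()) / total

# Hypothetical per-subtype actionability rates and subtype counts.
eur_rates = {"subtype_A": 0.6, "subtype_B": 0.2}
eur_counts = {"subtype_A": 80, "subtype_B": 20}
afr_counts = {"subtype_A": 40, "subtype_B": 60}

crude_eur = standardized_rate(eur_rates, eur_counts)      # EUR rates, EUR mix
adjusted_eur = standardized_rate(eur_rates, afr_counts)   # EUR rates, AFR mix
print(round(crude_eur, 2), round(adjusted_eur, 2))        # 0.52 0.36

# Share of the EUR-AFR gap explained by subtype mix, from the reported values:
# EUR 33.1%, AFR 29.6%, EUR adjusted to the AFR subtype mix 31.4%.
share_explained = (33.1 - 31.4) / (33.1 - 29.6)
print(round(share_explained, 2))  # 0.49
```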

After adjusting for cancer subtype distributions, the predominant contributors to the lower rate of actionable mutations among patients with AFR ancestry were breast invasive ductal carcinoma (IDC) and colon adenocarcinoma (COAD). In breast cancer overall, 47% of patients with AFR ancestry had actionable mutations compared to 58% with EUR ancestry. Moreover, the rate of actionable mutations in IDC (80% of breast cancers in AFR), especially Level 1 alterations such as ERBB2 and PIK3CA, was lower in AFR compared to EUR. This imbalance is largely driven by the higher rate of TNBC in AFR, as well as a significantly lower frequency of PIK3CA mutations in AFR (26% vs. 42% in EUR, Fisher’s exact test p=0.001) in HR+/HER2- IDC samples (Supplementary Figure S15A–F), as has been previously reported (21,22). In COAD, the imbalance is due to a lower prevalence of BRAF mutations (7% vs. 12%) and an MSI signature (5% vs. 13%) in the AFR cohort compared to the EUR cohort, whereas mutations in KRAS were significantly more common (Supplementary Figure S16A–D) (13).

In contrast to the above cancer types, thyroid cancer exhibited a greater proportion of actionable mutations in patients with AFR ancestry (63%, versus 35% in EUR, Fisher’s exact test p=0.003) that was primarily driven by an enrichment of Level 3A mutations in NRAS (Supplementary Figure S17A–C). BRAF mutations, by contrast, are Level 1 in anaplastic thyroid cancer (THAP) and Level 3B in all other subtypes and occurred more frequently in non-AFR populations. Recent studies have described that NRAS and BRAF mutant thyroid cancers appear to be etiologically different entities (33,34). In primary papillary thyroid cancer (THPA), mutations in NRAS and BRAF are usually mutually exclusive, with NRAS mutations enriched in the follicular variant of papillary thyroid carcinoma (FVPTC), whereas BRAF mutations are associated with the classical-type and tall cell variant of THPA (33). The dichotomous relationship between BRAF and NRAS is maintained in advanced poorly differentiated (THPD) and THAP cancers (34). A cross-sectional analysis demonstrated that FVPTC were more common among Black Americans than White Americans with papillary thyroid carcinoma (35). This observation led us to examine the relative rates of activating NRAS and BRAF mutations across all thyroid cancers. Patients of AFR ancestry exhibited the highest rate of NRAS mutations of any ancestral population and the lowest rate of BRAF mutations of any ancestral population in every subtype examined: papillary, anaplastic, and poorly differentiated thyroid carcinoma (Supplementary Figure S17D). Overall, NRAS mutations were enriched 2.9-fold, and BRAF mutations were depleted 2.5-fold, in patients of AFR ancestry compared to those of EUR. This trend was also observed when comparing self-reported White and self-reported Black patients from MSK, as well as from other prospectively sequenced clinical cohorts represented in AACR GENIE (36) (Supplementary Figure S17E).

Discussion

Here we show that genetic ancestry can be reliably inferred from targeted sequencing data using only a few thousand markers. We developed a framework for ancestry inference using the FDA-authorized MSK-IMPACT panel, the results of which remained consistent across different panel versions and demonstrated high concordance with self-reported race. Since targeted NGS panels continue to be used widely, especially in the clinical setting, this enables the interrogation of large prospective cancer cohorts (13). MSK-IMPACT alone is used to sequence more than 12,000 patients per year, representing an important and growing resource to study cancer disparities and explore potential genetic contributions to cancer prevalence and outcomes in diverse populations.

While genetically-inferred ancestry is not equivalent to race, nor does it capture the non-genetic environmental and socioeconomic factors that contribute to cancer health disparities, the high overall concordance between genetic ancestry and self-reported race in our cohort of 45,157 patients provides important validation of the accuracy of our method. Moreover, the direct inference of genetic ancestry from NGS data holds key advantages for enhancing population-based cancer research, as self-reported race and ethnicity can be incomplete or inaccurate in cancer registries. For example, in our real-world cohort, self-reported race was unknown or unavailable for 3,014 (7%) individuals. Categorical variables for reporting race also typically match census designations established by the U.S. Office of Management and Budget and can be overly broad (for example, ‘Asian’), whereas South Asian and East Asian ancestry are distinguishable by this method. Additionally, in contrast to self-reported race, ancestry is a continuous variable and can be used to quantitatively capture proportional contributions from different populations in admixed individuals.

Beyond continental-level ancestry, we also identified genetic markers that determine Ashkenazi Jewish (ASJ) ancestry with high accuracy. This is important clinically, as individuals of ASJ ancestry are at increased risk of developing several types of cancer, and determination of ASJ status is an integral feature of germline risk assessment. With the selection of specific markers and additional testing, we anticipate the inference of additional sub-populations, such as Chinese and Japanese ancestral groups within the East Asian population, as a future direction. As granular ancestry inference becomes more precise, the ethical and privacy issues around sharing this information with patients who may prefer not to know, especially when related to germline cancer risk, must be considered.

We also observed that ancestry inference was comparable between tumors and their matched normal samples, a finding with important implications for the majority of academic and commercial NGS laboratories that perform unmatched tumor testing. Standard strategies such as use of population databases to filter out known germline variants are less accurate for individuals of non-European ancestries (37,38). The inference of genetic ancestry from tumor-only sequencing could inform the probability that a given variant is somatic or germline in origin, enabling more accurate filters for diverse populations and guiding post-test genetic counseling and confirmatory testing when variants that may be germline in etiology are detected. Additionally, given bottlenecks associated with the clinical interpretation of germline variants of unknown significance (VUSs), which often incorporates knowledge of frequency differences in diverse populations, our uniform and reproducible approach to ancestry inference could help to accelerate variant interpretation and reclassification, especially in non-European patients wherein germline VUSs occur at a higher rate (39,40).

Due to the large and increasing number of patients receiving targeted clinical NGS testing, the inclusion of ancestry enables the identification of differences in somatic features by ancestry. In our cohort of 45,157 patients sequenced by MSK-IMPACT, we were able to recapitulate known gene-ancestry associations such as a higher frequency of somatic mutations in EGFR in patients of EAS and SAS ancestries with lung adenocarcinoma, FOXA1 in patients of EAS ancestry with prostate adenocarcinoma, and TP53 in patients of AFR ancestry with breast invasive ductal carcinoma. Additionally, we observed novel putative associations, such as a lower frequency of TERT mutations in patients of EAS ancestry with glioblastoma (GBM). We also observed fewer BRAF and more NRAS mutations in patients of AFR ancestry with thyroid cancers. While we report 18 significant somatic alterations associated with ancestry, we also find nearly 10 of them to be explained by clinical attributes such as receptor status (breast) and smoking (lung). Further studies are required to determine if some of the remaining associations could be explained by other clinical, demographic, and disease-specific covariates that are not available in our study cohort. For example, it remains to be seen if the lower frequency of TERT alterations in hepatocellular carcinomas of East Asian ancestry could be explained by factors such as hepatitis infection and alcohol consumption (41).

Using our institutional FDA-recognized knowledge base, OncoKB, we found that while the rates of oncogenic driver mutations were comparable across the different populations, the overall rate of therapeutically targetable mutations was lower in AFR, NAM and SAS (30% in each), compared to EAS (34%) and EUR (33%) populations. There are several factors that may contribute to this difference, including differences in cancer type and subtype distributions, stage at cancer diagnosis, environmental factors, and access to healthcare. Moreover, this difference in overall actionability is modest, and its implications for diagnosis are unclear. Nonetheless, these findings represent the real-world clinical experience at our cancer center over an 8-year period, highlighting an urgent need for the development and approval of drugs targeting genomic alterations more common among underrepresented populations.

One limitation of this study is the ascertainment bias of our cohort recruited from a tertiary cancer care center. While all patients at MSK with advanced disease are generally eligible to receive MSK-IMPACT testing, the composition of our cohort may not faithfully represent the distribution of cancer types or tumor stages observed in the broader community. Additionally, non-European populations are underrepresented in our cohort. Similar to TCGA, individuals with European ancestry (including ASJ) make up more than 76% of this study. Nonetheless, the absolute numbers of these underrepresented populations in this study (2,193 AFR, 2,505 EAS, 821 SAS, 160 NAM and 5,012 ADM) are greater than in TCGA and other studies. Moreover, this center-based patient cohort is growing at a rate of >12,000 patients per year, with community-based partnerships and ongoing initiatives to provide access to testing to underserved populations, which promise to increase the diversity of our institutional cohort.

Another limitation of this study is that we excluded data from patients with admixed ancestry for gene-cancer-subtype association and clinical actionability analyses. Admixed patients make up 11% of the total cohort. Moreover, 34% of self-reported Black/African-American patients and 72% of self-reported Hispanic/Latinx patients exhibited genetic admixture. Ultimately, accurate estimates of local (locus-specific) ancestry will be necessary to understand the contribution of genetic ancestry in these patients. Although local ancestry is typically inferred using genome-wide marker data, Carrot-Zhang et al. (10) have shown that local ancestry may be inferred from targeted panel data by utilizing data from off-target reads. Future efforts to infer local ancestry from MSK-IMPACT data promise to further elucidate genetic contributions to somatic processes, particularly in these admixed populations.

In summary, by establishing a workflow for accurate global ancestry inference from targeted NGS data and applying it to a large real-world cohort profiled using an FDA-authorized clinical sequencing test and annotated using an FDA-recognized clinical knowledge base, we have identified population-specific differences in somatic patterns and actionable genomic alterations. We anticipate that this dataset will serve as an important resource for cancer disparities research.

Methods

Marker selection

For ADMIXTURE analyses, we first selected over 12 million genome-wide autosomal bi-allelic SNP markers with minor allele frequency of greater than 1% in the 1000 genomes cohort using PLINK v1.9 (42). For selecting MSK-IMPACT markers, we further restricted the markers to IMPACT505 probe regions. This resulted in 10,013 markers (Supplementary Table S4).

For selecting markers for Ashkenazi Jewish (ASJ) ancestry determination, we selected bi-allelic autosomal SNPs from gnomAD (43) genomes r2.1.1 that were within regions covered by IMPACT468, and for which

  • AF_asj in gnomAD genome > 0.01

  • AF, AF_nfe, AF_eas, AF_afr in gnomAD genomes < 0.001

  • For markers seen with AF_asj > 0 in gnomAD exomes r2.1.1, AF_asj/(max(AF, AF_nfe, AF_eas, AF_afr, AF_sas)) > 2 in gnomAD exomes

where AF_asj, AF_nfe, AF_eas, AF_sas and AF_afr are the alternate allele frequencies in samples of Ashkenazi Jewish, Non-Finnish European, East Asian, South Asian and African ancestries, and AF is the overall alternate allele frequency. We then genotyped these markers in 1188 individuals with known Ashkenazi Jewish ancestry based on manually-curated family histories, and ran LD pruning using PLINK v1.9 (--indep-pairwise 1000 100 0.2). This resulted in 282 SNP markers (Supplementary Table S5).
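
The allele-frequency criteria above can be expressed as a per-SNP filter. The following is a minimal Python sketch, not the authors' code: the function name, the dictionary layout and the handling of missing exome frequencies are assumptions made for illustration. The exome ratio is computed here against the largest of the non-ASJ frequencies, which is one reading of the third criterion.

```python
from typing import Dict, Optional

OTHER_GENOME_KEYS = ("AF", "AF_nfe", "AF_eas", "AF_afr")
OTHER_EXOME_KEYS = ("AF", "AF_nfe", "AF_eas", "AF_afr", "AF_sas")


def passes_asj_filter(genome_af: Dict[str, float],
                      exome_af: Optional[Dict[str, float]] = None) -> bool:
    """Hypothetical helper: True if a SNP meets the three ASJ marker criteria."""
    # Criterion 1: alternate allele frequency above 1% in ASJ genomes.
    if genome_af["AF_asj"] <= 0.01:
        return False
    # Criterion 2: below 0.1% overall and in NFE, EAS and AFR genomes.
    if any(genome_af[key] >= 0.001 for key in OTHER_GENOME_KEYS):
        return False
    # Criterion 3 applies only to markers seen in exomes with AF_asj > 0.
    if exome_af is not None and exome_af.get("AF_asj", 0.0) > 0:
        max_other = max(exome_af.get(key, 0.0) for key in OTHER_EXOME_KEYS)
        # Assumption: a marker absent from every other group passes trivially.
        if max_other > 0 and exome_af["AF_asj"] / max_other <= 2:
            return False
    return True
```

Markers that pass this filter would still need the genotyping and LD-pruning steps described above before use.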

Reference sample selection

We used samples from the 1000 Genomes Project (1KGP) as reference. 1KGP comprises data from 26 population groups from 5 different continental populations: African (AFR), American (AMR), East Asian (EAS), European (EUR) and South Asian (SAS) (Supplementary Table S6). However, certain groups in this dataset include individuals with recent ancestral admixture. For example, AMR individuals represent recent admixtures of African, European and Native American populations. Similarly, African Caribbean in Barbados (ACB) and Americans of African Ancestry in Southwest USA (ASW) within the AFR super-population are recent admixtures of Europeans and Africans. In order to create a clean reference panel of individuals representing single ancestral populations, we identified and removed recently admixed individuals from the 1KGP dataset.

For this, we first ran unsupervised ADMIXTURE (18) v1.3 using 841,707 SNP markers that were a subset of the genome-wide SNP markers (see Marker Selection) in linkage equilibrium (PLINK v1.9 --indep-pairwise 1000 100 0.2), and with number of ancestral populations (K) set to 5.

The 5 populations (P1, P2, P3, P4 and P5) determined by ADMIXTURE captured the EAS, AFR, SAS, NAM and EUR ancestries, respectively (Supplementary Figure S1). Any sample from the EAS, AFR, SAS, admixed American (AMR) and EUR super-populations of 1KGP that was estimated by ADMIXTURE to have a fraction below 0.8 of the P1, P2, P3, P4 and P5 populations, respectively, was excluded from our reference for all subsequent supervised ADMIXTURE analyses.

The resulting 1KGP reference panel consisted of 2,129 samples from 24 populations. We also ran principal component analysis on the data and observed distinct separation among all 5 different ancestral groups (Supplementary Figure S1C).
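
The reference-panel filter described above can be sketched as follows. This is an illustrative Python outline under stated assumptions: the record layout (`id`, `superpop`, `fractions`) and the function name are hypothetical, and the component mapping simply follows the order given in the text.

```python
from typing import Dict, List

# P1..P5 captured EAS, AFR, SAS, NAM and EUR; AMR samples are compared
# against the NAM-like component (P4), as described in the text.
SUPERPOP_TO_COMPONENT: Dict[str, str] = {
    "EAS": "P1", "AFR": "P2", "SAS": "P3", "AMR": "P4", "EUR": "P5",
}


def select_reference_samples(samples: List[dict], threshold: float = 0.8) -> List[str]:
    """Keep samples whose matching ADMIXTURE component is at least `threshold`."""
    kept = []
    for sample in samples:
        component = SUPERPOP_TO_COMPONENT[sample["superpop"]]
        if sample["fractions"][component] >= threshold:
            kept.append(sample["id"])
    return kept
```

Under this scheme a recently admixed AMR or ASW individual drops out because no single component reaches 0.8.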

ADMIXTURE analysis on Simons Genome Diversity Project

We downloaded 279 VCF files from SGDP that contained genotype information at every single position. For each SGDP sample, we extracted genotypes for the selected genome-wide SNP marker sets (see Marker Selection) using Genome Analysis Toolkit (GATK) (44) v4.1.9.0 SelectVariants.

We ran GATK v4.1.9.0 Pileup on 100 random samples sequenced on different MSK-IMPACT versions to genotype the selected 10,013 IMPACT SNP markers. For this we required a minimum mapping quality (MQ) of 10, minimum base quality (BQ) of 20, and minimum read depth (for reads that met the MQ and BQ thresholds) of 10. For each MSK-IMPACT version, we selected markers that could be genotyped in at least 80% of the samples. We then extracted genotypes for each of these MSK-IMPACT-specific SNP sets using GATK’s SelectVariants.
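
The per-panel marker selection above amounts to a call-rate filter. A minimal Python sketch is shown below, assuming the pileup step has already been summarized into, for each marker, a list of per-sample read depths counted after the MQ and BQ thresholds; the data layout and function name are hypothetical.

```python
from typing import Dict, List


def select_markers(depths_by_marker: Dict[str, List[int]],
                   min_depth: int = 10,
                   min_call_rate: float = 0.8) -> List[str]:
    """Return markers genotyped (depth >= min_depth) in enough samples."""
    selected = []
    for marker, depths in depths_by_marker.items():
        if not depths:
            continue  # no samples evaluated for this marker
        called = sum(1 for depth in depths if depth >= min_depth)
        if called / len(depths) >= min_call_rate:
            selected.append(marker)
    return selected
```

Running this once per MSK-IMPACT version would give the version-specific SNP sets referred to in the text.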

We then merged genotype calls from SGDP and 1KGP using PLINK v1.9 and ran linkage disequilibrium pruning (--indep-pairwise 1000 100 0.2). This resulted in 716,011 genome-wide markers, 5,378 IMPACT-505 markers, 4,336 IMPACT-468 markers, 3,990 IMPACT-410 markers, 3,602 IMPACT-341 markers and 3,331 IMPACT-HEME markers (Supplementary Table S4).

We ran supervised ADMIXTURE with K=5 and selected reference samples from 1KGP to estimate ancestral fractions of EAS, AFR, SAS, NAM and EUR for all SGDP samples.

ADMIXTURE analysis on MSK-IMPACT data

We analyzed available MSK-IMPACT data from 45,157 patients who underwent prospective sequencing at MSK as part of their routine clinical care and whose data are also included in the AACR GENIE v10.1-public cohort (36). We used data from the matched normal sample, which was available for 99.2% of the patients in the cohort, and from the tumor sample for the remaining 357 patients. When a patient had multiple matched normal samples, we picked the most recent sample. Of all analyzed samples, 6% were sequenced on MSK’s IMPACT-HEME panel, 5% on IMPACT341, 19% on IMPACT410, 68% on IMPACT468 and 2% on IMPACT505.

For each sample, we ran GATK v4.1.9.0 Pileup to genotype the selected 10,013 IMPACT SNP markers. For this we required MQ≥10, BQ≥20, and a minimum read depth (for reads that met the MQ and BQ thresholds) of 10. We excluded markers that were not covered in the sample, merged genotype calls on the remaining markers with those from 1KGP reference samples, and performed linkage-disequilibrium pruning using PLINK v1.9 (--indep-pairwise 1000 100 0.2). We ran supervised ADMIXTURE to estimate ancestral proportions of AFR, EUR, EAS, NAM and SAS for the patients. Finally, we assigned ancestry labels to each patient. If the ancestral fraction of any of the populations was ≥0.8, the patient was assigned that population label. The ADMIXTURE run on the Simons Genome Diversity Project showed that individuals from populations not represented in the reference panel are inferred to be admixed. Therefore, if the ancestral fraction of all populations was less than 0.8, the patient was labeled Admixed/Other (ADM). However, we expect that the majority of such patients in the MSK-IMPACT cohort are admixed rather than from an unrepresented population.
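
The label assignment rule above is a simple threshold on the ADMIXTURE fractions. A short Python sketch follows; the function name and the dictionary input are illustrative assumptions, while the 0.8 cutoff and the ADM label come from the text.

```python
from typing import Dict


def assign_ancestry_label(fractions: Dict[str, float], threshold: float = 0.8) -> str:
    """Return the population whose fraction is >= threshold, else 'ADM'."""
    # With a 0.8 cutoff at most one population can qualify, so the
    # largest fraction is the only candidate.
    population, fraction = max(fractions.items(), key=lambda item: item[1])
    return population if fraction >= threshold else "ADM"
```

For example, fractions of 0.85 EUR and 0.15 AFR yield the label EUR, whereas 0.6 AFR and 0.4 EUR yield ADM.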

Ashkenazi Jewish ancestry determination

We ran GATK v4.1.9.0 Pileup to call genotypes on the 282 markers selected for ASJ ancestry determination. We counted the markers genotyped as heterozygous (het) or homozygous alternate (hom-alt) for each patient. If the number of het/hom-alt genotyped markers was ≥3, the patient was assigned an ASJ label.

Ancestry label assignment

The patients labeled EUR in the ADMIXTURE analysis were further divided into Ashkenazi Jewish European (ASJ) and non-Ashkenazi Jewish European (EUR) based on their ASJ labels. All other patients were assigned their ADMIXTURE analysis labels.
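
The ASJ call and the final label assignment can be combined as in the following Python sketch. The genotype encoding (alternate-allele count 0, 1 or 2, with `None` for an uncalled marker) and both function names are assumptions made for illustration.

```python
from typing import Iterable, Optional


def has_asj_label(genotypes: Iterable[Optional[int]], min_markers: int = 3) -> bool:
    """True if at least `min_markers` ASJ markers are het (1) or hom-alt (2)."""
    carried = sum(1 for genotype in genotypes if genotype in (1, 2))
    return carried >= min_markers


def final_ancestry_label(admixture_label: str, asj: bool) -> str:
    """Split EUR into ASJ and non-ASJ EUR; leave every other label unchanged."""
    if admixture_label == "EUR" and asj:
        return "ASJ"
    return admixture_label
```

Note that under this rule an ASJ call changes the label only for patients already labeled EUR by ADMIXTURE.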

Comparison of ADMIXTURE results

To compare two different runs of ADMIXTURE for the same patient (e.g. with different marker sets), we calculated cosine similarity (cossim) between the two results.

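$$\mathrm{cossim} = \frac{\sum_{i \in S} A_i B_i}{\sqrt{\sum_{i \in S} A_i^{2}}\,\sqrt{\sum_{i \in S} B_i^{2}}}$$

where

  • S = {AFR, EAS, EUR, NAM, SAS}

  • A_i: ancestral fraction of population i in ADMIXTURE run 1

  • B_i: ancestral fraction of population i in ADMIXTURE run 2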
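
As an illustration, the cosine similarity between two ADMIXTURE results for the same patient can be computed with the standard definition (dot product divided by the product of the vector norms). The Python sketch below assumes each run is stored as a dictionary keyed by population; the function and variable names are hypothetical.

```python
import math
from typing import Dict

POPULATIONS = ("AFR", "EAS", "EUR", "NAM", "SAS")


def cossim(run1: Dict[str, float], run2: Dict[str, float]) -> float:
    """Cosine similarity between two ancestry-fraction vectors."""
    dot = sum(run1[pop] * run2[pop] for pop in POPULATIONS)
    norm1 = math.sqrt(sum(run1[pop] ** 2 for pop in POPULATIONS))
    norm2 = math.sqrt(sum(run2[pop] ** 2 for pop in POPULATIONS))
    # Fractions from ADMIXTURE sum to 1, so neither norm is zero.
    return dot / (norm1 * norm2)
```

A value near 1 indicates that the two marker sets give nearly identical ancestry estimates, while a value of 0 means the two results share no ancestral component.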

Sample selection for subtype prevalence, gene-ancestry associations and actionable mutation analyses

There were sometimes multiple tumor samples from the same patient. In that case we used only one sample. For this, we preferred samples with highest tumor purity, primary sample over metastatic sample, and earlier timepoint sample over later timepoints.
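
One way to implement this preference is a single sort key, as in the Python sketch below. The text does not state how ties between the three criteria are resolved, so the lexicographic order used here (purity first, then primary over metastatic, then earlier collection date) is an assumption, as are the field names.

```python
from datetime import date
from typing import List


def pick_sample(samples: List[dict]) -> dict:
    """Choose one tumor sample per patient using an assumed priority order."""
    return min(
        samples,
        key=lambda s: (
            -s["purity"],                  # higher tumor purity first
            s["sample_type"] != "Primary",  # primary before metastatic
            s["collected"],                 # earlier timepoint first
        ),
    )
```

For instance, given two samples of equal purity, a primary collected on `date(2016, 5, 1)` would be chosen over a metastasis collected earlier.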

Gene-ancestry associations in different cancer subtypes

We used multivariate logistic regression models to identify genetic alterations associated with ancestry in each cancer subtype. The binary alteration status of a gene is defined as 1 when the gene has any somatic SNV, indel, focal CNA or fusion event, and 0 otherwise. For each cancer subtype, we excluded samples that were MSI-H (MSI score > 10), samples with TMB scores > 20, and samples labeled as ADM. Additionally, we excluded populations that had fewer than 15 samples for that cancer subtype.
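
The exclusion rules above can be sketched as a two-step filter applied within one cancer subtype. In this Python outline the field names are hypothetical, and counting the 15-sample minimum after the per-sample exclusions is an assumption, since the text does not specify the order.

```python
from collections import Counter
from typing import List


def filter_subtype_samples(samples: List[dict],
                           msi_cutoff: float = 10,
                           tmb_cutoff: float = 20,
                           min_per_population: int = 15) -> List[dict]:
    """Apply the per-sample and per-population exclusions for one subtype."""
    # Step 1: drop MSI-H, high-TMB and admixed (ADM) samples.
    kept = [
        s for s in samples
        if s["msi_score"] <= msi_cutoff
        and s["tmb"] <= tmb_cutoff
        and s["ancestry"] != "ADM"
    ]
    # Step 2 (assumed order): drop populations with too few remaining samples.
    counts = Counter(s["ancestry"] for s in kept)
    return [s for s in kept if counts[s["ancestry"]] >= min_per_population]
```

The binary alteration status for each gene would then be derived for the remaining samples before fitting the regression.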

In each cancer subtype, we tested the alteration status of each gene while controlling for age at diagnosis, disease status (primary or metastasis) and sex (where applicable). EUR served as the reference level (coded as 0) in all regressions. P-values were adjusted for multiple hypothesis testing with the Benjamini-Hochberg procedure (Supplementary Table S3).

We repeated the analysis, this time additionally controlling for smoking status for Lung Adenocarcinoma (LUAD) and Small Cell Lung Cancer (SCLC), and hormone receptor status for Breast Invasive Ductal Carcinoma (IDC), Breast Invasive Lobular Carcinoma (ILC) and Breast Mixed Ductal and Lobular Carcinoma (MDLC).
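
For readers unfamiliar with the Benjamini-Hochberg procedure used for the p-value adjustment above, the following standard-library Python sketch shows the usual step-up computation of adjusted p-values. It is a generic illustration, not the authors' analysis code, and the function name is arbitrary.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

Genes whose adjusted p-value falls below the chosen false discovery rate would be reported as significant.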

Clinically actionable mutation frequency analysis

We defined samples with clinically actionable mutations as those with at least one Level 1–3A mutation as defined in OncoKB (32). Solid tumor samples with high microsatellite instability (MSI-H) were considered to have Level 1 biomarkers. Similarly, solid tumor samples with a tumor mutation burden score of > 10 were considered to have Level 1 biomarkers, but were labeled as ‘Level 1 (TMB-H)’. For each sample, we used the highest OncoKB level for actionable mutation comparisons. We ran exact binomial tests to compute 95% confidence intervals for the rates of clinically actionable mutations in each population.
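
The confidence intervals mentioned above can be illustrated with the Clopper-Pearson construction, which is the interval usually reported alongside an exact binomial test; that this is the exact variant used in the study is an assumption. The Python sketch below uses only the standard library and solves for the bounds by bisection; the function names are hypothetical.

```python
from math import exp, lgamma, log
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def _solve_decreasing(f: Callable[[float], float], target: float) -> float:
    """Find p in (0, 1) with f(p) == target for a function decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for k successes in n."""
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2.0)
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2.0)
    return lower, upper
```

Calling `clopper_pearson(30, 100)` on illustrative counts returns an interval that contains the observed rate of 0.30.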

Data Availability

The human sequence raw data generated in this study are protected and not publicly available due to patient privacy requirements but are available upon reasonable request from the corresponding author subject to institutional approvals. Tumor somatic mutations and associated clinical data for all patients in this study are available through AACR GENIE (v10.1-public cohort). Other data generated in this study are available within the article and its supplementary data files.

Statement of Significance.

We performed a comprehensive analysis of ancestral associations with somatic mutations in a real-world pan-cancer cohort, including >5000 non-European individuals. Using an FDA-authorized tumor sequencing panel and an FDA-recognized oncology knowledgebase, we detected differences in the prevalence of clinically actionable alterations, potentially contributing to health care disparities affecting underrepresented populations.

Acknowledgments

We gratefully acknowledge members of the Marie Josée and Henry R. Kravis Center for Molecular Oncology, the Molecular Diagnostics Service in the Department of Pathology and Laboratory Medicine, and the Berger lab for their contributions. This work was supported by National Institutes of Health awards P30 CA008748 and R01 CA227534 (to M.F.B.) and the Sigrid Jusélius Foundation (to M.M.).

Conflicts of Interest

A.R. Brannon has stock ownership in Johnson & Johnson. P. Razavi has institutional grant/funding from Grail, Illumina, Novartis, Epic Sciences, ArcherDx and Consultation/Ad board/Honoraria from Novartis, Foundation Medicine, AstraZeneca, Epic Sciences, Inivata, Natera, and Tempus. M.D. Hellmann reports grants from BMS; and personal fees from Achilles; Adagene; Adicet; Arcus; AstraZeneca; Blueprint; BMS; DaVolterra; Eli Lilly; Genentech/Roche; Genzyme/Sanofi; Janssen; Immunai; Instil Bio; Mana Therapeutics; Merck; Mirati; Natera; Pact Pharma; Shattuck Labs; and Regeneron; as well as equity options from Factorial, Immunai, Shattuck Labs, Arcus, and Avail Bio. A patent filed by Memorial Sloan Kettering related to the use of tumor mutational burden to predict response to immunotherapy (PCT/US2015/062208) is pending and licensed by PGDx. J. Fagin has received grants from Eisai and personal fees from Loxo Oncology outside the submitted work. K. Offit reports being a founder and shareholder of AnaNeo Therapeutics Incorporated. M. Ladanyi has received advisory board compensation from Boehringer Ingelheim, AstraZeneca, Bristol-Myers Squibb, Takeda, Bayer, and Paige.AI, and research grants from Helsinn Healthcare, Elevation Oncology Inc., Merus, and LOXO Oncology unrelated to the current study. D.B. Solit has consulted with and received honoraria from Pfizer, Loxo/Lilly Oncology, Illumina, Vividion Therapeutics, Scorpion Therapeutics, Fore Biotherapeutics and BioBridge Pharma. Z.K. Stadler has an immediate family member who serves as a consultant in Ophthalmology for Alcon, Adverum, Gyroscope Therapeutics Ltd, Neurogene and RegenexBio, outside the submitted work. M.F. Berger reports receiving research funding from Grail and advisory board activities for Eli Lilly, AstraZeneca, and PetDx. Subsequent to the completion of this work, H.A. Rizvi, M.D. Hellmann, and A. Zehir began as employees at AstraZeneca, Inc. The remaining authors declare no competing interests.

References

  • 1.Zavala VA, Bracci PM, Carethers JM, Carvajal-Carmona L, Coggins NB, Cruz-Correa MR, et al. Cancer health disparities in racial/ethnic minorities in the United States. Br J Cancer 2021;124(2):315–32 doi 10.1038/s41416-020-01038-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Ward E, Jemal A, Cokkinides V, Singh GK, Cardinez C, Ghafoor A, et al. Cancer disparities by race/ethnicity and socioeconomic status. CA Cancer J Clin 2004;54(2):78–93 doi 10.3322/canjclin.54.2.78. [DOI] [PubMed] [Google Scholar]
  • 3.Ellis L, Canchola AJ, Spiegel D, Ladabaum U, Haile R, Gomez SL. Racial and Ethnic Disparities in Cancer Survival: The Contribution of Tumor, Sociodemographic, Institutional, and Neighborhood Characteristics. J Clin Oncol 2018;36(1):25–33 doi 10.1200/JCO.2017.74.2049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Mersha TB, Abebe T. Self-reported race/ethnicity in the age of genomic research: its potential impact on understanding health disparities. Hum Genomics 2015;9:1 doi 10.1186/s40246-014-0023-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Freedman ML, Haiman CA, Patterson N, McDonald GJ, Tandon A, Waliszewska A, et al. Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc Natl Acad Sci U S A 2006;103(38):14068–73 doi 10.1073/pnas.0605832103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2022. CA Cancer J Clin 2022;72(1):7–33 doi 10.3322/caac.21708. [DOI] [PubMed] [Google Scholar]
  • 7.Dietze EC, Sistrunk C, Miranda-Carboni G, O’Regan R, Seewaldt VL. Triple-negative breast cancer in African-American women: disparities versus biology. Nat Rev Cancer 2015;15(4):248–54 doi 10.1038/nrc3896. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Carey LA, Perou CM, Livasy CA, Dressler LG, Cowan D, Conway K, et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA 2006;295(21):2492–502 doi 10.1001/jama.295.21.2492. [DOI] [PubMed] [Google Scholar]
  • 9.Shigematsu H, Lin L, Takahashi T, Nomura M, Suzuki M, Wistuba, II, et al. Clinical and biological features associated with epidermal growth factor receptor gene mutations in lung cancers. J Natl Cancer Inst 2005;97(5):339–46 doi 10.1093/jnci/dji055. [DOI] [PubMed] [Google Scholar]
  • 10.Carrot-Zhang J, Soca-Chafre G, Patterson N, Thorner AR, Nag A, Watson J, et al. Genetic Ancestry Contributes to Somatic Mutations in Lung Cancers from Admixed Latin American Populations. Cancer Discov 2021;11(3):591–8 doi 10.1158/2159-8290.CD-20-1165. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Carrot-Zhang J, Chambwe N, Damrauer JS, Knijnenburg TA, Robertson AG, Yau C, et al. Comprehensive Analysis of Genetic Ancestry and Its Molecular Correlates in Cancer. Cancer Cell 2020;37(5):639–54 e6 doi 10.1016/j.ccell.2020.04.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Yuan J, Hu Z, Mahal BA, Zhao SD, Kensler KH, Pi J, et al. Integrated Analysis of Genetic Ancestry and Genomic Alterations across Cancers. Cancer Cell 2018;34(4):549–60 e9 doi 10.1016/j.ccell.2018.08.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Myer PA, Lee JK, Madison RW, Pradhan K, Newberg JY, Isasi CR, et al. The Genomics of Colorectal Cancer in Populations with African and European Ancestry. Cancer Discov 2022;12(5):1282–93 doi 10.1158/2159-8290.CD-21-0813. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Gusev A, Groha S, Taraszka K, Semenov YR, Zaitlen N. Constructing germline research cohorts from the discarded reads of clinical tumor sequences. Genome Med 2021;13(1):179 doi 10.1186/s13073-021-00999-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Zehir A, Benayed R, Shah RH, Syed A, Middha S, Kim HR, et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med 2017;23(6):703–13 doi 10.1038/nm.4333. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Genomes Project C, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature 2015;526(7571):68–74 doi 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 2016;538(7624):201–6 doi 10.1038/nature18964. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res 2009;19(9):1655–64 doi 10.1101/gr.094052.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Kenny EE, Pe’er I, Karban A, Ozelius L, Mitchell AA, Ng SM, et al. A genome-wide scan of Ashkenazi Jewish Crohn’s disease suggests novel susceptibility loci. PLoS Genet 2012;8(3):e1002559 doi 10.1371/journal.pgen.1002559. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Struewing JP, Hartge P, Wacholder S, Baker SM, Berlin M, McAdams M, et al. The risk of cancer associated with specific mutations of BRCA1 and BRCA2 among Ashkenazi Jews. N Engl J Med 1997;336(20):1401–8 doi 10.1056/NEJM199705153362001. [DOI] [PubMed] [Google Scholar]
  • 21.McLaughlin CC, Wu XC, Jemal A, Martin HJ, Roche LM, Chen VW. Incidence of noncutaneous melanomas in the U.S. Cancer 2005;103(5):1000–7 doi 10.1002/cncr.20866. [DOI] [PubMed] [Google Scholar]
  • 22.Chougule A, Prabhash K, Noronha V, Joshi A, Thavamani A, Chandrani P, et al. Frequency of EGFR mutations in 907 lung adenocarcioma patients of Indian ethnicity. PLoS One 2013;8(10):e76164 doi 10.1371/journal.pone.0076164. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Chen J, Yang H, Teo ASM, Amer LB, Sherbaf FG, Tan CQ, et al. Genomic landscape of lung adenocarcinoma in East Asians. Nat Genet 2020;52(2):177–86 doi 10.1038/s41588-019-0569-6. [DOI] [PubMed] [Google Scholar]
  • 24.Imielinski M, Berger AH, Hammerman PS, Hernandez B, Pugh TJ, Hodis E, et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 2012;150(6):1107–20 doi 10.1016/j.cell.2012.08.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Keenan T, Moy B, Mroz EA, Ross K, Niemierko A, Rocco JW, et al. Comparison of the Genomic Landscape Between Primary Breast Cancer in African American Versus White Women and the Association of Racial Differences With Tumor Recurrence. J Clin Oncol 2015;33(31):3621–7 doi 10.1200/JCO.2015.62.2126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Pitt JJ, Riester M, Zheng Y, Yoshimatsu TF, Sanni A, Oluwasola O, et al. Characterization of Nigerian breast cancer reveals prevalent homologous recombination deficiency and aggressive molecular features. Nat Commun 2018;9(1):4181 doi 10.1038/s41467-018-06616-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Stopsack KH, Nandakumar S, Arora K, Nguyen B, Vasselman SE, Nweji B, et al. Differences in Prostate Cancer Genomes by Self-reported Race: Contributions of Genetic Ancestry, Modifiable Cancer Risk Factors, and Clinical Factors. Clin Cancer Res 2022;28(2):318–26 doi 10.1158/1078-0432.CCR-21-2577. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Li J, Xu C, Lee HJ, Ren S, Zi X, Zhang Z, et al. A genomic and epigenomic atlas of prostate cancer in Asian populations. Nature 2020;580(7801):93–9 doi 10.1038/s41586-020-2135-x. [DOI] [PubMed] [Google Scholar]
  • 29.Koga Y, Song H, Chalmers ZR, Newberg J, Kim E, Carrot-Zhang J, et al. Genomic Profiling of Prostate Cancers from Men with African and European Ancestry. Clin Cancer Res 2020;26(17):4651–60 doi 10.1158/1078-0432.CCR-19-4112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Kang M, Shen XJ, Kim S, Araujo-Perez F, Galanko JA, Martin CF, et al. Somatic gene mutations in African Americans may predict worse outcomes in colorectal cancer. Cancer Biomark 2013;13(5):359–66 doi 10.3233/CBM-130366. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Yao S, Johnson C, Hu Q, Yan L, Liu B, Ambrosone CB, et al. Differences in somatic mutation landscape of hepatocellular carcinoma in Asian American and European American populations. Oncotarget 2016;7(26):40491–9 doi 10.18632/oncotarget.9636. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Chakravarty D, Gao J, Phillips SM, Kundra R, Zhang H, Wang J, et al. OncoKB: A Precision Oncology Knowledge Base. JCO Precis Oncol 2017; 1:PO.17.00011 doi 10.1200/PO.17.00011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Cancer Genome Atlas Research N. Integrated genomic characterization of papillary thyroid carcinoma. Cell 2014;159(3):676–90 doi 10.1016/j.cell.2014.09.050. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Landa I, Ibrahimpasic T, Boucai L, Sinha R, Knauf JA, Shah RH, et al. Genomic and transcriptomic hallmarks of poorly differentiated and anaplastic thyroid cancers. J Clin Invest 2016;126(3):1052–66 doi 10.1172/JCI85271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Tang J, Kong D, Cui Q, Wang K, Zhang D, Liao X, et al. Racial disparities of differentiated thyroid carcinoma: clinical behavior, treatments, and long-term outcomes. World J Surg Oncol 2018;16(1):45 doi 10.1186/s12957-018-1340-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Consortium APG. AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov 2017;7(8):818–31 doi 10.1158/2159-8290.CD-17-0151. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Halperin RF, Carpten JD, Manojlovic Z, Aldrich J, Keats J, Byron S, et al. A method to reduce ancestry related germline false positives in tumor only somatic variant calling. BMC Med Genomics 2017;10(1):61 doi 10.1186/s12920-017-0296-8.
  • 38. Manrai AK, Funke BH, Rehm HL, Olesen MS, Maron BA, Szolovits P, et al. Genetic Misdiagnoses and the Potential for Health Disparities. N Engl J Med 2016;375(7):655–65 doi 10.1056/NEJMsa1507092.
  • 39. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17(5):405–24 doi 10.1038/gim.2015.30.
  • 40. Popejoy AB, Crooks KR, Fullerton SM, Hindorff LA, Hooker GW, Koenig BA, et al. Clinical Genetics Lacks Standard Definitions and Protocols for the Collection and Use of Diversity Measures. Am J Hum Genet 2020;107(1):72–82 doi 10.1016/j.ajhg.2020.05.005.
  • 41. Nault JC, Mallet M, Pilati C, Calderaro J, Bioulac-Sage P, Laurent C, et al. High frequency of telomerase reverse-transcriptase promoter somatic mutations in hepatocellular carcinoma and preneoplastic lesions. Nat Commun 2013;4:2218 doi 10.1038/ncomms3218.
  • 42. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81(3):559–75 doi 10.1086/519795.
  • 43. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 2020;581(7809):434–43 doi 10.1038/s41586-020-2308-7.
  • 44. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010;20(9):1297–303 doi 10.1101/gr.107524.110.
  • 45. Midha A, Dearden S, McCormack R. EGFR mutation incidence in non-small-cell lung cancer of adenocarcinoma histology: a systematic review and global map by ethnicity (mutMapII). Am J Cancer Res 2015;5(9):2892–911.
  • 46. Steuer CE, Behera M, Berry L, Kim S, Rossi M, Sica G, et al. Role of race in oncogenic driver prevalence and outcomes in lung adenocarcinoma: Results from the Lung Cancer Mutation Consortium. Cancer 2016;122(5):766–72 doi 10.1002/cncr.29812.
  • 47. Benbrahim Z, Antonia T, Mellas N. EGFR mutation frequency in Middle East and African non-small cell lung cancer patients: a systematic review and meta-analysis. BMC Cancer 2018;18(1):891 doi 10.1186/s12885-018-4774-y.

Associated Data

Supplementary Materials

Supplementary file 1
Supplementary file 2

Data Availability Statement

The human sequence raw data generated in this study are protected and not publicly available due to patient privacy requirements but are available upon reasonable request from the corresponding author subject to institutional approvals. Tumor somatic mutations and associated clinical data for all patients in this study are available through AACR GENIE (v10.1-public cohort). Other data generated in this study are available within the article and its supplementary data files.
