Scientific Reports. 2026 Feb 13;16:8839. doi: 10.1038/s41598-026-39644-8

Association of germline variants with KRAS-mutation status in colorectal cancer

Nijole Pollock Tjader 1,2, Johnny Ramroop 3, Tanish Gandhi 4, Cara Dauch 1, Owen Meadows 1,5, Patrick Stevens 2, Rachel Pearlman 6, Heather Hampel 7, Elom K Aglago 8, Sonja I Berndt 9, Amanda Bloomer 10, Hermann Brenner 11,12,13, Daniel D Buchanan 14,15, Peter T Campbell 16, Yin Cao 17,18,19, Andrew T Chan 20,21,22, Iona Cheng 23, Niki Dimou 24, David A Drew 21,22, Amy J French 25, Peter Georgeson 14, Marios Giannakis 26,27,28, Graham G Giles 29,30,31, Maria Gomez 10,32, Stephen B Gruber 33, Michael Hoffmeister 11, Wen-Yi Huang 9, Meredith A J Hullar 34, Jeroen R Huyghe 34, Nicole Loroña 35, Victor Moreno 36,37,38,39, Christina C Newton 40, Jonathan A Nowak 41, Mireia Obón-Santacana 36,37,38, Shuji Ogino 27,41,42, Andrew Pellatt 43, Anita R Peoples 40, Jennifer B Permuth 10,32, Stephanie L Schmit 44,45, Robert E Schoen 46, Erin M Siegel 10, Robert S Steinfelder 34, Wei Sun 34, Jamie K Teer 47, Claire E Thomas 34, Quang M Trinh 48, Konstantinos Tsilidis 8,49, Tomotaka Ugai 41,42,50, Caroline Y Um 40, Bethany Van Guelpen 51,52, Syed H Zaidi 48, Jane Figueiredo 53,54, Ulrike Peters 34, Amanda I Phipps 34,55, Joseph Paul McElroy 56, Amanda Ewart Toland 1,6
PMCID: PMC12982500  PMID: 41688657

Abstract

Somatic mutations in KRAS are a common driver of colorectal cancer (CRC) and occur at different frequencies by race, ethnicity, sex, tumor site, and genetic similarity. Inherited germline variants may influence tumor somatic mutation frequency by altering mutational or DNA repair processes, or by altering cellular, immunological, and/or microenvironmental responses after a mutation occurs. We hypothesized that the germline genetic background modifies somatic KRAS mutation frequency in CRC. To test this, we performed a genome-wide association study (GWAS) in 7071 individuals with CRC, using KRAS mutation status as the phenotype. Single-nucleotide variants (SNVs) were chosen for validation analyses based on P values from the discovery GWAS, predicted in silico functional effects, and proximity to genes with potential cancer relevance. A validation analysis of 101 SNVs of interest was performed in 2482 individuals. No SNVs were significantly associated with KRAS-mutant CRC (Bonferroni-corrected threshold, P value < 0.0005). One variant, rs73067863-T, showed a non-significant exploratory association with fewer KRAS-mutant tumors in the combined sample (P value = 9.7 × 10⁻⁷, OR = 0.75). Follow-up studies are needed to determine whether these or other germline variants contribute to population differences in KRAS mutations in CRC.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-026-39644-8.

Keywords: Colorectal cancer, KRAS, GWAS, Somatic mutations, Germline variants

Subject terms: Cancer epidemiology, Cancer genomics, Colorectal cancer, Oncogenes, Genetic association study

Introduction

Somatic mutations in KRAS are a common driver observed in 30–50% of colorectal cancers (CRC), causing extended tumor cell survival, elevated proliferation, altered metabolism, and metastasis through multiple downstream effector pathways1. CRCs with KRAS mutations are more likely to metastasize2, show decreased overall survival3, and be resistant to EGFR inhibitors4. Although mutant KRAS was once considered undruggable, newer therapeutics that directly inhibit mutant KRAS have shown some success in treating CRC when combined with other drugs5,6.

Multiple studies have evaluated the frequency of somatic mutations in populations of different self-reported race/ethnicity (e.g. Black) and different genetic similarity (e.g. genetic ancestry). KRAS mutations in CRC are more common in Black individuals compared to Hispanic and non-Hispanic White individuals. In a study of KRAS mutations by self-reported race, non-Hispanic White individuals were 36% less likely to have KRAS-mutant CRC tumors compared to non-Hispanic Black individuals7. By genetic similarity, a cohort study of 18,761 individuals with CRC and available KRAS mutation data found that individuals with genetic similarity adjacent to African reference populations (AfA) were significantly more likely to have KRAS mutations compared to individuals with genetic similarity adjacent to European reference populations (EuA)8. KRAS mutations in CRC also vary in frequency by sex (more common in females9) and tumor location (more common in proximal CRCs10). Biological mechanisms resulting in the observed associations between KRAS mutations in CRC and race, genetic similarity, sex, and tumor location remain unknown.

Although most somatic driver mutations in cancer occur by chance, some inherited germline variants are associated with tumor mutation burden (TMB) and mutational signatures. For example, Lynch Syndrome is driven by pathogenic germline variants in DNA mismatch repair genes, leading to CRC tumors with high TMB and high microsatellite instability (MSI-H)11. After a somatic mutation event, germline effects on the cellular microenvironment and immune activity can influence whether the cell with the mutation continues to divide and whether it progresses to become a tumor12,13. A pan-cancer study of human leukocyte antigen (HLA) variants identified specific HLA genotypes that were predictive of tumor oncogenic mutations; this observation was likely driven by neoantigens and inherited differences in antigen presentation and immune destruction of cells after a mutation12,13. With the increasing availability of paired germline genomic and somatic mutation data, identification of interactions between germline variants and specific somatic mutations is now possible. Indeed, associations between germline variants and somatic TP53 mutations in breast cancer have been identified14, as well as associations between local genetic similarity and EGFR and KRAS mutations in lung cancers15, and other somatic mutations and cancers12,13,16,17.

The objective of our study was to identify inherited variants associated with somatic KRAS mutations in CRC. We hypothesized that germline variants associate with KRAS-mutant CRC and that the associated variants may have a higher risk-allele frequency, or a larger effect size, in individuals of AfA compared with individuals of EuA. To identify associated variants, we performed a discovery analysis in individuals of EuA followed by a multi-ancestry validation analysis. Studying the mechanisms underlying such associations may improve understanding of the events that follow a somatic mutation and lead to tumor development, and may yield new biological insights for preventing or treating CRC.

Results

Genome-wide discovery analysis

From the discovery datasets, 7071 individuals were defined as EuA through principal component analysis (PCA) of germline genotyping data (Fig. 1A). The dataset included 3328 females and 3743 males (Supplemental Table 1). Individuals with KRAS mutation status and non-EuA genetic similarity were set aside for validation analyses and combined with additional validation samples. PCA for genetic similarity in the validation analysis is shown in Fig. 1B. In EuA individuals included in the discovery analysis, KRAS mutations were present in 33% of tumors (Table 1). After variant filtering of 39,128,432 SNVs, 6,861,047 SNVs were included in the genome-wide analyses. No variants were associated with tumor KRAS mutation status at genome-wide significance (P value < 5 × 10⁻⁸) (Fig. 2). Three SNVs had P values < 1 × 10⁻⁶, and 50 had P values between 1 × 10⁻⁶ and 1 × 10⁻⁵ for association with KRAS mutation status (Supplemental Table 2).

Fig. 1.


PCA for genetic similarity analysis. Principal component analysis was performed to approximate genetic similarity to reference populations. (A) PCA was performed from genome-wide data in 10,124 individuals to identify individuals of EuA for discovery analyses (blue). (B) PCA was performed for subsequent validation analyses using genotypes from a targeted ancestry genotyping array of 95 SNVs in 2482 individuals to identify individuals of EuA and AfA. Self-reported race/ethnicity is indicated by color (Asian, red; African American/Black, green; Latino, light blue; South Asian, purple; and European/White, dark blue), and individuals without self-reported race/ethnicity information are in black. PC, principal component; EuA, European reference population adjacent; AfA, African reference population adjacent; PCA, principal component analysis; SNV, single nucleotide variant.

Table 1.

Study demographics and KRAS mutation status.

| Group | Discovery KRAS-mut, n (%) | Discovery total, n | Validation KRAS-mut, n (%) | Validation total, n | Combined KRAS-mut, n (%) | Combined total, n | P value |
|---|---|---|---|---|---|---|---|
| All individuals | 2352 (33) | 7071 | 858 (35) | 2482 | 3210 (34) | 9553 | |
| AfA | NA | NA | 73 (42) | 172 | 73 (42)** | 172 | 0.02** |
| EuA | 2352 (33) | 7071 | 702 (33) | 2109 | 3054 (33)** | 9180 | |
| H | NA | NA | 43 (46) | 94 | 43 (46) | 94 | |
| Other ancestry | NA | NA | 40 (37) | 107 | 40 (37) | 107 | |
| Females | 1099 (33) | 3328 | 411 (36) | 1142 | 1510 (34) | 4470 | |
| Males | 1253 (33) | 3743 | 447 (33) | 1340 | 1700 (33) | 5083 | |
| Proximal CRC | 850 (38) | 2252 | 281 (45) | 629 | 1131 (39)*† | 2881 | |
| Distal CRC | 825 (31) | 2652 | 228 (31) | 742 | 1053 (31)* | 3394 | < 0.0001* |
| Rectal CRC | 480 (32) | 1516 | 174 (30) | 587 | 654 (31)† | 2103 | < 0.0001† |
| Transverse colon | 113 (27) | 414 | 33 (32) | 104 | 146 (28) | 518 | |
| Unspecified site | 84 (35) | 237 | 142 (34) | 420 | 226 (34) | 657 | |

Fisher’s exact test was used to compare KRAS mutations by proximal versus distal tumor location (*) and proximal versus rectal tumor location (†), as well as KRAS mutations in AfA individuals versus EuA individuals (**). KRAS-mut, tumor with KRAS mutation; CRC, colorectal cancer; EuA, European reference population adjacent; AfA, African reference population adjacent; H, Hispanic reference population adjacent; n, number; %, percent.

Fig. 2.


Manhattan plot of discovery GWAS for CRC tumor KRAS mutation status. Discovery GWAS data comparing individuals of EuA with CRC and a KRAS mutation (n = 2352) and individuals without a KRAS mutation (n = 4719) are plotted as −log10(P value). Chromosome numbers are indicated. The dotted line indicates genome-wide significance (P value < 5 × 10⁻⁸). GWAS, genome-wide association study; CRC, colorectal cancer; EuA, European reference population adjacent.

KRAS mutation status by genetic similarity and tumor site

Consistent with other reports, we found KRAS mutations to be significantly more common in tumors from AfA individuals (42%) compared to EuA individuals (33%) (P value = 0.02) (Table 1). KRAS mutations were also more common in proximal colon tumors (39%) compared to distal colon and rectal tumors (31%) (P value < 0.0001) (Table 1). No significant differences in KRAS mutation were observed by sex. Specific amino acid changes were described in 630 of 3210 individuals with KRAS mutations, including 132 in the discovery and 498 in the validation analyses. The most common mutations were p.G12D (34%) and p.G12V (25%) (Supplemental Table 3).
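Table 1 compares proportions with Fisher's exact test. As a minimal illustration (not the authors' code), the test can be written in pure Python from the hypergeometric distribution. The sketch assumes the comparisons used the combined-data counts in Table 1 and defines the two-sided P value as the sum over tables that are equally or less likely than the observed one, a common convention; its output may differ slightly from the published values.

```python
from math import exp, lgamma, log


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the groups being compared (e.g. AfA vs EuA); columns are
    KRAS-mutant vs KRAS-WT tumor counts. The P value sums the hypergeometric
    probabilities of all tables with the same margins that are no more likely
    than the observed table.
    """
    row1 = a + b                      # size of the first group
    col1 = a + c                      # total number of KRAS-mutant tumors
    n = a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    log_denom = log_comb(n, col1)

    def log_pmf(x: int) -> float:
        # Log probability of x KRAS-mutant tumors in the first group under the null
        return log_comb(row1, x) + log_comb(n - row1, col1 - x) - log_denom

    observed = log_pmf(a)
    slack = log(1 + 1e-7)             # tolerance so numerically tied tables are counted
    total = sum(exp(lp) for lp in map(log_pmf, range(lo, hi + 1))
                if lp <= observed + slack)
    return min(1.0, total)


# Combined-data counts from Table 1: (KRAS-mutant, KRAS-WT) per group
p_ancestry = fisher_exact_two_sided(73, 172 - 73, 3054, 9180 - 3054)
p_site = fisher_exact_two_sided(1131, 2881 - 1131, 1053, 3394 - 1053)
print(f"AfA vs EuA: P = {p_ancestry:.2g}")
print(f"Proximal vs distal: P = {p_site:.2g}")
```

Working on the log scale with `lgamma` keeps the hypergeometric terms finite even for the combined sample of more than 9000 individuals.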

A total of 101 SNVs were prioritized for validation in a set of individuals of multiple ancestries with CRC (Supplemental Table 4), as described in a workflow chart (Fig. 3A) and a sectioned Manhattan plot (Fig. 3B). The prioritized SNVs included 20 SNVs with sex-specific risk effects, 15 SNVs with tumor location-specific risk effects, and 6 SNVs with both sex- and location-specific risk effects (P values < 1 × 10⁻⁴ for the specific covariate). Validation analysis of the 101 SNVs was performed in 2482 individuals (Supplemental Table 5). These included 1127 individuals with targeted array genotypes that passed overall quality control (QC) standards (Supplemental Table 6). Validation studies included individuals with genome-wide genotype data who were not classified as EuA by PCA in the discovery analysis, as well as individuals whose genotype or KRAS mutation data became available only after the initial discovery analyses. The availability of genotyping data differed by SNV depending on data source or array QC. Using PCA to measure genetic similarity, we defined 172 individuals as AfA, 2109 individuals as EuA, 94 individuals as having genetic similarity adjacent to a Hispanic reference population (H), and 107 as undetermined (Table 1). The H cluster showed high admixture. Findings in the AfA and H groups were considered exploratory due to small sample sizes.

Fig. 3.


Workflow for identifying variants for validation analysis. The filtering steps for identifying SNVs to include in the validation analysis from the discovery analysis results are depicted as a workflow (A) and as sections of the graphed discovery data (B). Filtering steps included P value for association, ORs, MAF, redundancy as determined by linkage disequilibrium, and in silico and other functional data. A final determination was whether a Fluidigm genotyping assay could be designed for the SNV. SNV, single nucleotide variant; MAF, minor allele frequency; EuA, European reference population adjacent; AfA, African reference population adjacent; LD, linkage disequilibrium; OR, odds ratio.

None of the SNVs in the multi-ancestry validation analyses were significantly associated with somatic KRAS mutation status after Bonferroni adjustment for multiple comparisons of 101 SNVs (P value < 0.0005). A heterogeneity analysis identified four validation SNVs (rs2298437, rs74030413, rs116195080, rs726800) with evidence of heterogeneity across genetic similarity groups (Cochran’s Q test P value < 0.05). Two SNVs showed suggestive associations of interest (P value < 0.005) (Table 2). These included rs73067863 in the EuA (P value = 0.0016, OR 0.67) and combined (P value = 0.0036, OR 0.73) analyses. rs2298437, which showed evidence of heterogeneity across genetic similarity groups (Cochran’s Q test P value < 0.003), showed evidence of association in AfA individuals (P value = 0.0014, OR 0.4) but not in other populations. In a combined analysis of both discovery and validation samples (n = 9553), neither rs73067863 (P value = 9.7 × 10⁻⁷, OR = 0.75) nor rs2298437 (P value = 7.9 × 10⁻⁴, OR = 1.11) was associated with KRAS mutations at genome-wide significance (P value < 5 × 10⁻⁸). Additional SNVs with P values < 0.05 were identified, including one in the combined analysis, six in AfA individuals, and one in H individuals (Table 2). Ten SNVs that were incompatible with the array design were analyzed for association with tumor KRAS mutation using existing genome-wide data (n = 1274 individuals). In this analysis, three tests yielded P values < 0.05: one in AfA individuals and two (in all individuals and in EuA individuals) for an SNV in LD with rs73067863. One SNV had a P value < 0.05 by tumor location. None of the SNVs tested showed sex-specific associations with tumor KRAS mutation.
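The heterogeneity results above rely on Cochran's Q. A minimal sketch, assuming fixed-effect inverse-variance pooling of per-group log odds ratios (the article does not state which meta-analysis model was used), computes Q and its chi-square P value with k − 1 degrees of freedom for k groups. The per-group estimates in the example are hypothetical; standard errors are backed out of 95% CIs as (ln upper − ln lower) / (2 × 1.96).

```python
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) plus a finite half-integer gamma series
    total = math.erfc(math.sqrt(y))
    for j in range((df - 1) // 2):
        total += math.exp(-y + (j + 0.5) * math.log(y) - math.lgamma(j + 1.5))
    return total


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate SE of a log OR from its 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * 1.96)


def cochran_q(log_ors: List[float], ses: List[float]) -> Tuple[float, float, float, float]:
    """Fixed-effect inverse-variance meta-analysis plus Cochran's Q test.

    Returns (pooled log OR, pooled SE, Q, heterogeneity P value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return pooled, pooled_se, q, chi2_sf(q, len(log_ors) - 1)


# Hypothetical per-group estimates (e.g. AfA, EuA, H); not values from this study
ors = [0.45, 1.10, 1.00]
cis = [(0.25, 0.80), (1.00, 1.21), (0.50, 2.00)]
pooled, pooled_se, q, p_het = cochran_q([math.log(o) for o in ors],
                                        [se_from_ci(lo, hi) for lo, hi in cis])
print(f"pooled OR = {math.exp(pooled):.2f}, Q = {q:.2f}, P_het = {p_het:.3g}")
```

The closed-form chi-square tail avoids a SciPy dependency; with SciPy available, `scipy.stats.chi2.sf` gives the same value.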

Table 2.

SNVs with evidence of association in combined and validation analyses.

| SNV | Chr | Position (GRCh38) | Ref | Alt | Pop | N (KRAS-mut) | OR (95% CI) | P value | Data type |
|---|---|---|---|---|---|---|---|---|---|
| rs73067863 | 3 | 36489472 | C | T | All | 2299 (793) | 0.726 (0.584–0.898) | 0.004 | Combined |
| rs73067863 | 3 | 36489472 | C | T | EuA | 2019 (676) | 0.672 (0.523–0.856) | 0.002 | Combined |
| rs2298437 | 21 | 30496968 | C | T | AfA | 160 (70) | 0.399 (0.221–0.687) | 0.001 | Combined |
| rs16966448 | 16 | 9830940 | C | T | All | 2252 (776) | 0.868 (0.762–0.989) | 0.033 | Combined |
| rs116195080 | 3 | 141700851 | A | G | AfA | 148 (69) | 0.306 (0.107–0.811) | 0.021 | Combined |
| rs197110 | 1 | 57183103 | G | T | AfA | 151 (64) | 0.478 (0.242–0.893) | 0.026 | Combined |
| rs74030413 | 16 | 80740298 | C | T | AfA | 160 (71) | 0.444 (0.209–0.905) | 0.028 | Combined |
| rs726800 | 17 | 13597287 | A | G | AfA | 140 (62) | 1.833 (1.076–3.214) | 0.029 | Combined |
| rs9892103 | 17 | 13563369 | A | G | AfA | 142 (63) | 0.486 (0.236–0.953) | 0.041 | Combined |
| rs8131342** | 21 | 30501326 | A | G | AfA | 162 (70) | 0.601 (0.359–0.984) | 0.047 | Combined |
| rs4798561 | 18 | 7366503 | A | G | Latino/Admixed | 91 (42) | 2.607 (1.277–5.742) | 0.012 | Combined |
| rs1814562* | 3 | 36453242 | C | T | All | 1246 (403) | 0.661 (0.487–0.886) | 0.007 | Genomic |
| rs1814562* | 3 | 36453242 | C | T | EuA | 1097 (348) | 0.599 (0.425–0.832) | 0.003 | Genomic |
| rs522098 | 11 | 116455969 | A | G | AfA | 71 (26) | 0.06 (0.003–0.408) | 0.014 | Genomic |

*In LD with rs73067863. **In LD (r² = 0.61) with rs2298437. † Not on validation array, GWAS data only. SNV, single nucleotide variant; Chr, chromosome; GRCh38, human genome assembly GRCh38 position; Ref, reference allele; Alt, alternate allele; Pop, population; N, number; EuA, European reference population adjacent; AfA, African reference population adjacent; KRAS-mut, KRAS-mutant tumor; OR, odds ratio; CI, confidence interval; FDR, false discovery rate; LD, linkage disequilibrium; Array, targeted genotyping; GWAS, genome-wide association study array.


Discussion

Here, we describe a GWAS to identify germline variants associated with the presence of somatic KRAS mutations in CRC, with the aim of determining whether germline variants may explain population differences in KRAS mutation frequency. Although this study included over 7000 individuals, no variants met genome-wide significance for association with KRAS mutation status in CRC. Two variants of interest were identified; neither met genome-wide significance, and both should be interpreted with caution as possible false-positive findings.

rs73067863 is located in intron 5 of STAC (SH3 and cysteine-rich domain)18,19. This SNV was identified as an eQTL of MLH1 in GTEx Version 8, although the more recent GTEx Version 10 does not indicate eQTL function for this SNV or any other SNV in linkage disequilibrium with rs7306786320. We found that the rs73067863-T allele was associated with a lower likelihood of a KRAS mutation, with an allele frequency of 0.07 in individuals with KRAS-mutant CRC compared to 0.10 in those with KRAS-WT CRC. rs73067863 allele frequencies differ by genetic similarity group, but there was no evidence of heterogeneity (Cochran’s Q = 0.37). The T allele has a higher frequency in individuals with genetic similarity adjacent to East Asian reference populations (EAsA; 0.29) than in individuals of EuA (0.09) or AfA (0.10)18. EAsA individuals were not included as a separate study population due to low numbers. Thus, future multi-ancestry studies of individuals with CRC are needed to determine whether rs73067863-T is similarly associated with KRAS-WT CRC in EAsA or other populations not evaluated here.
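As a rough consistency check (our illustration, not an analysis reported in the article), the allele frequencies above imply an unadjusted allelic odds ratio. It ignores study, sex, and the other covariates in the regression models, so it only approximates the model-based ORs.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Unadjusted allelic OR: allele odds in one group divided by allele odds in the other."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


# rs73067863-T: 0.07 in KRAS-mutant CRC vs 0.10 in KRAS-WT CRC
print(round(allelic_odds_ratio(0.07, 0.10), 2))  # 0.68
```

The result, about 0.68, is close to the covariate-adjusted ORs reported for this variant (0.67 to 0.75).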

No consistent predicted functional effects of rs73067863 were identified through in silico analyses. Additional in vitro studies, such as DNA accessibility, DNA-protein binding, splicing, and local methylation analyses, are needed to determine whether this SNV affects expression or regulation of MLH1 or another nearby gene. Of potential importance for interpreting these results, we excluded individuals with known MSI-H tumors for multiple reasons: the high overall somatic mutation rate in MSI-H tumors due to defective DNA repair, which could mask effects of low-impact variants on KRAS mutations; the low overall frequency of MSI-H CRCs (~15%); and the small proportion of MSI-H tumors with KRAS mutations (~5–14%)21,22. MSI-H tumors can be driven by deficiency of MLH1, which is essential to DNA mismatch repair23,24. MLH1 deficiency and MSI-H status are more common in KRAS-WT CRC than in KRAS-mutant CRC24. Further evaluation of this locus, and studies that include MSI-H CRCs resulting from MLH1 methylation or germline/somatic mutations, are needed to determine whether rs73067863 is associated with MSI in KRAS-WT CRC and/or alters MLH1 protein expression.

rs2298437 was not significantly associated with KRAS mutations in a combined analysis (OR = 1.1, raw P value = 7.9 × 10⁻⁴), but the rs2298437-T allele showed a suggestive protective effect for KRAS-mutant CRC in an exploratory analysis restricted to AfA individuals (OR = 0.4, raw P value = 0.0014). The rs2298437-C allele is more common in AfA individuals (allele frequency = 0.76) than in EuA individuals (allele frequency = 0.57)18. This SNV results in a p.Tyr48Cys missense change in KRTAP19-4, a keratin-associated protein18,19. Little is known about the functions of KRTAP19-4, but it is predominantly expressed in skin and hair20. Interestingly, the effect may differ by genetic similarity: an opposite directional effect was observed in EuA individuals in the discovery and validation analyses (OR = 1.15, raw P value = 8.7 × 10⁻⁵ and OR = 1.08, raw P value = 0.23, respectively) compared to AfA individuals (OR = 0.4), and the variant showed significant evidence of heterogeneity across genetic similarity groups by Cochran’s Q analysis. Risk alleles have previously been identified that differ in frequency by genetic similarity to reference populations or that confer risk only in individuals of certain ancestries25. However, the AfA sample size was small, so these findings should not be overinterpreted. Further studies of rs2298437 in a larger sample of AfA individuals with CRC are needed to determine whether the observed association and heterogeneity across genetic similarity groups are genuine.

To consider the larger context of race and colorectal cancer, Black individuals have 20% greater CRC incidence and 31–44% greater mortality compared with non-Hispanic White individuals26. In the United States, current CRC screening rates are similar in Black and non-Hispanic White individuals (65% vs 68%, respectively), although historical screening differences have been estimated to account for 19% of the CRC mortality disparity between these groups26,27. Social determinants of health and systemic racism are likely contributors to the elevated CRC mortality observed in Black individuals26,28. Nonetheless, the observed differences in KRAS mutation frequency by genetic similarity and race remain unexplained.

It is possible that the higher KRAS mutation frequency in AfA individuals with CRC is driven by germline genomic effects that were not measured by this study, such as copy number variants, rare SNVs, genomic regions not included on microarrays, or regions difficult to genotype (such as the HLA locus). Variants associated with KRAS mutation status only within specific tumor subtypes could also have been missed in this study; studies within tumor subtypes may provide additional insights. The discovery GWAS analysis was conducted only in EuA individuals because the numbers of individuals in other genetic similarity groups were too small to split into discovery and validation groups, and we wanted to maximize diversity in the validation analyses. Some variants important for KRAS mutation may be present or enriched only in certain populations; such variants would have been missed because they were not selected for our validation study. This study was likely underpowered to identify associations with small effect sizes and those enriched in non-EuA populations.
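To make the power limitation concrete, a rough sketch can estimate the power of a simple allelic test at genome-wide significance for the discovery sample sizes (2352 KRAS-mutant and 4719 KRAS-WT tumors). The sketch assumes Hardy-Weinberg equilibrium and a normal approximation and ignores covariates; it is not the log-additive logistic model used in the study, so the numbers are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def allelic_test_power(maf: float, odds_ratio: float, n_cases: int,
                       n_controls: int, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test (two alleles per person).

    maf is the allele frequency in controls (here KRAS-WT tumors); the case
    frequency follows from the allelic odds ratio. Only the tail in the
    direction of the true effect is counted, which loses little for small alpha.
    """
    p0 = maf
    odds1 = odds_ratio * p0 / (1 - p0)
    p1 = odds1 / (1 + odds1)
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(p1 - p0) / se - z_crit)


for maf in (0.05, 0.20):
    for odds_ratio in (1.1, 1.2, 1.5):
        power = allelic_test_power(maf, odds_ratio, n_cases=2352, n_controls=4719)
        print(f"MAF {maf:.2f}, OR {odds_ratio}: power ~ {power:.3f}")
```

Under these assumptions, variants with ORs near 1.1 to 1.2 have little chance of reaching P < 5 × 10⁻⁸ at this sample size, consistent with the statement above.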

We did not attempt to identify germline variants associated with specific mutational changes of KRAS or codon sites of mutation. This type of association is plausible if a germline variant increases risk of a mutational signature or alters the immune response to a specific neoantigen. However, we did not have access to specific mutational changes for many of the contributing studies and, as such, the overall sample size was underpowered for mutation or codon-specific association studies. The contributing studies also used different KRAS genotyping methods; it is possible that rare oncogenic mutations may have been missed, leading to some individuals being misclassified as wildtype for KRAS. Associations between germline variants and specific mutational changes of KRAS may be identified in future studies with a larger sample.

Somatic mutations in BRAF may be driven by germline effects, which could indirectly affect rates of KRAS mutations. Somatic BRAF mutations are twice as common in non-Hispanic White individuals with CRC (12%) as in non-Hispanic Black (6%) or Hispanic/Latino (7%) individuals, and tend to be mutually exclusive with KRAS mutations9. We did not include or exclude participants based on tumor BRAF mutation status because many participants lacked these data and because BRAF-mutant tumors are more likely to be MSI22, and MSI-H tumors were already excluded. Future studies that consider BRAF mutations may clarify the interactions between KRAS, BRAF, and germline variants.

This was a study of genetic factors, but non-genetic effects may also contribute to the observed population differences in KRAS mutation frequency. One possible non-genetic driver is tumor location, which also varies by population29. KRAS mutations are more common in proximal CRC and are most frequent in cecal tumors10,30,31. Another possible driver is gene-environment interactions between KRAS mutations and social determinants of health, such as diet32. Low levels of plasma adiponectin (an adipokine inversely linked with obesity and excess energy balance) are associated with increased incidence of KRAS-mutated (but not KRAS-wild-type) colorectal cancer33,34. Associations between neighborhood deprivation index and tumor genomics have been identified, such as epigenetic alterations in breast cancer35 and KRAS mutations in non-small cell lung cancer36, although these studies were based on small sample sizes. Improving enrollment of underrepresented populations in genomic studies and considering the effects of social determinants of health should be high priorities for cancer genomics research.

Conclusion

We conducted a GWAS of > 7000 individuals to identify inherited variants associated with KRAS mutations in CRC. No variants were associated with KRAS mutation status at genome-wide significance. One variant, rs73067863, showed suggestive evidence of association in meta-analyses and ancestry-specific analyses and therefore warrants additional study. It remains unknown whether and how germline variants influence KRAS mutations in CRC and contribute to the observed population differences in KRAS mutation frequency in CRC.

Methods

Study participants

All participants in the discovery GWAS (n = 7071) (Supplemental Table 1, Supplemental Methods) or validation studies (n = 2482) (Supplemental Table 5, Supplemental Methods) were diagnosed with invasive CRC. Individuals were eligible if they had existing germline genome-wide genotyping data or germline DNA for genotyping, as well as tumor KRAS mutation status or a tumor DNA sample available for KRAS mutational analysis. Individuals with a hereditary CRC syndrome, such as familial adenomatous polyposis or Lynch syndrome, were excluded, as somatic events may be affected by the germline pathogenic variant. Individuals with MSI-H tumors were also excluded, both because of the high somatic mutation burden and low rate of concurrent KRAS mutation in these tumors and because a subset of MSI-H tumors arise in individuals with Lynch syndrome, who could have other mechanisms for selection of tumors with somatic KRAS mutations. We did not exclude individuals with POLE, POLD, or MUTYH somatic or germline pathogenic variants, as these data were not available for most individuals and the prevalence of germline pathogenic variants in these genes was expected to be small (< 1%)37. DNA extraction and sequencing methods used to identify KRAS mutations are described in the Supplemental Methods.

Germline genomic data analysis

All analyses of genomic datasets were performed using PLINK (version 1.9, RRID:SCR_001757) (Purcell Lab, available at: https://www.cog-genomics.org/plink/)38. For the discovery analysis, participant genetic similarity was categorized through PCA. Participants with self-reported White, Black, or Asian race and reference samples from the 1000 Genomes Project were used to identify clusters for genetic similarity analysis39. All clusters available at the time of analysis were included, namely Yoruba in Ibadan (YRI) to estimate AfA, Northern Europeans from Utah (CEU) to estimate EuA, and Japanese in Tokyo (JPT) and Han Chinese in Beijing (CHB) to estimate similarity adjacent to East Asian reference populations (EAsA). SNVs in the reference samples were compared pairwise across similarity clusters by minor allele frequency (MAF). The 5000 most cluster-specific variants were compiled and used for genetic similarity PCA. The centroid of each of the 3 groups (AfA, EuA, and EAsA) was computed. Individuals were assigned to a cluster if they were within two standard deviations (Euclidean distance) of that group's centroid; individuals not assigned to any cluster were classified as Admixed. For individuals lacking genome-wide data, genetic similarity was determined using genotypes from 95 population-specific SNVs (Supplemental Tables 7 and 8) identified from our PCA. Genetic similarity analysis was completed in 10,124 individuals considered for the discovery analysis to identify participants of EuA.
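The cluster-assignment rule can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes that "within two standard deviations (Euclidean distance) of the centroid" means a Euclidean distance of at most 2 after scaling each principal component by that reference group's standard deviation, and that an individual within range of more than one group is assigned to the nearest one. The toy coordinates are invented.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def summarize(points: List[Point]) -> Tuple[List[float], List[float]]:
    """Per-PC centroid and sample standard deviation for one reference group."""
    dims = range(len(points[0]))
    centroid = [mean(p[i] for p in points) for i in dims]
    spread = [stdev([p[i] for p in points]) for i in dims]
    return centroid, spread


def scaled_distance(x: Point, centroid: List[float], spread: List[float]) -> float:
    """Euclidean distance after expressing each PC in units of the group's SD."""
    return math.sqrt(sum(((xi - c) / s) ** 2 for xi, c, s in zip(x, centroid, spread)))


def assign(x: Point, groups: Dict[str, Tuple[List[float], List[float]]],
           max_sd: float = 2.0) -> str:
    """Return the nearest group within max_sd SDs, or 'Admixed' if none qualifies."""
    label, best = "Admixed", float("inf")
    for name, (centroid, spread) in groups.items():
        d = scaled_distance(x, centroid, spread)
        if d <= max_sd and d < best:
            label, best = name, d
    return label


# Toy two-PC coordinates for reference samples (invented for illustration)
reference = {
    "EuA": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "AfA": [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)],
}
groups = {name: summarize(pts) for name, pts in reference.items()}
for person in [(0.5, 0.4), (10.6, 10.5), (5.0, 5.0)]:
    print(person, assign(person, groups))
```

In practice the coordinates would be the leading principal components from the 5000 cluster-specific variants rather than the two invented dimensions used here.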

In individuals of EuA, we evaluated the association of germline SNVs with tumor KRAS mutation status (yes or no) using logistic regression. Of the 39,128,432 SNVs available before filtering, SNVs were included if they were biallelic, had a MAF ≥ 1%, had an imputation score (info/R2) > 0.4, and had data missingness < 10%40. We performed log-additive analyses for each SNV and KRAS mutation status, with study of origin as a covariate. A stratified analysis of the discovery data was performed to identify SNVs that were associated with tumor KRAS mutation and had a different effect by patient sex or by tumor location (proximal colon, distal colon, or rectum).
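The variant filters and log-additive coding described above can be illustrated with a short sketch. The `Variant` record and its field names are hypothetical (in practice these values come from imputation and genotyping output files), and MAF is taken as the smaller of the two allele frequencies.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Variant:
    rsid: str
    alleles: Tuple[str, ...]   # observed alleles; more than two means multiallelic
    alt_freq: float            # alternate-allele frequency in the analysis sample
    info: float                # imputation quality score (info / R2)
    missing_rate: float        # fraction of individuals without a genotype call


def passes_qc(v: Variant, min_maf: float = 0.01, min_info: float = 0.4,
              max_missing: float = 0.10) -> bool:
    """Apply the discovery filters: biallelic, MAF >= 1%, info > 0.4, missingness < 10%."""
    maf = min(v.alt_freq, 1 - v.alt_freq)
    return (len(v.alleles) == 2 and maf >= min_maf
            and v.info > min_info and v.missing_rate < max_missing)


def additive_dosage(genotype: str, tested_allele: str) -> int:
    """Log-additive coding: count copies of the tested allele in a 'C/T'-style call."""
    return genotype.split("/").count(tested_allele)


variants = [
    Variant("snv_pass", ("C", "T"), 0.12, 0.95, 0.01),
    Variant("snv_rare", ("A", "G"), 0.004, 0.99, 0.00),
    Variant("snv_poor_imputation", ("G", "T"), 0.30, 0.35, 0.02),
    Variant("snv_multiallelic", ("A", "C", "G"), 0.20, 0.90, 0.01),
]
kept = [v.rsid for v in variants if passes_qc(v)]
print(kept)  # ['snv_pass']
print([additive_dosage(g, "T") for g in ("C/C", "C/T", "T/T")])  # [0, 1, 2]
```

Under log-additive coding, each extra copy of the tested allele multiplies the odds of a KRAS-mutant tumor by the same factor, which is why a single 0/1/2 dosage column suffices in the regression.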

Prioritization of variants for validation analysis

Variants of interest from the discovery analysis were prioritized for validation studies (Fig. 3A and B). In brief, all SNVs associated with tumor KRAS mutation with P values < 10⁻⁴ and a MAF ≥ 10% in gnomAD (version 3.0) (Broad Institute, available at: https://gnomad.broadinstitute.org/) were considered for inclusion (n = 472)41. These SNVs were further evaluated for linkage disequilibrium (LD) using LDLink software and the SNPClip tool (version 5.1) (National Cancer Institute, available at: https://ldlink.nih.gov/) for high-throughput evaluation42.

All SNVs that had a P value < 10⁻⁵ for association and met MAF criteria were considered high priority, and 1–2 SNVs per LD locus were included for validation (n = 21) (Supplemental Table 4). Within these highly associated LD loci, in silico findings were used to select the SNVs most likely to alter transcription of nearby genes, including overlap with previous GWAS findings in the NHGRI-EBI GWAS Catalog (version 1.0) (NHGRI-EBI, available at: https://www.ebi.ac.uk/gwas/)43. SNVs within 50 kb of genes were identified and prioritized using dbSNP (National Library of Medicine, available at: https://www.ncbi.nlm.nih.gov/snp/)44. We also prioritized SNVs based on predicted regulatory functions. The SNVs were evaluated for overlap with expression quantitative trait loci (eQTLs) identified in the Genotype-Tissue Expression (GTEx) project (version 8) (Broad Institute, available at: https://gtexportal.org/home/) transverse and sigmoid colon datasets and in BarcUVa-Seq, using the CoTrEx 2.0 browser (CoTrEx, available at: https://barcuvaseq.org/cotrex/)20,45,46. We prioritized SNVs that were possible eQTLs (nominal P value < 0.05) in any of these databases. Additional predicted regulatory functions were identified from Roadmap and ENCODE epigenomic data using three browsers: RegulomeDB, which includes eQTL and motif data (version 2.2) (Stanford University, available at: https://regulomedb.org/regulome-search/); HaploReg, which includes validated eQTL and GWAS hits (version 4.2) (Broad Institute, available at: https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php); and the Epilogos browser, which includes additional published epigenomic data (version 1.0) (Altius, available at: https://epilogos.net/)47–49. Variants were prioritized if they were ranked 5 or lower in RegulomeDB with a chromatin accessibility peak or transcription factor binding.
Variants that were within 25 kb of an enhancer or transcription start site, or that showed active transcription in any tissue in Epilogos, were also prioritized. We also prioritized SNVs with suggestive association in analyses within sex or tumor location (P values < 10⁻⁴).

SNVs with P values between 10⁻⁵ and 10⁻⁴ that met MAF criteria received lower priority and were included for validation only if they had at least one in silico finding suggesting potential biological significance. This was evaluated with the same tools described above for prioritizing SNVs in LD, and 1–2 SNVs per LD locus were included.
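The two-tier rule can be sketched as follows. This is an illustration under stated assumptions: the P-value cut-offs and the 1–2 SNVs per LD locus come from the text, but the ranking used to choose SNVs within a locus (tier first, then more in silico hits, then smaller P) and the helper names `priority_tier` and `select_per_locus` are hypothetical, not the authors' documented procedure.

```python
from collections import defaultdict
from typing import Iterable, Optional

HIGH_P = 1e-5  # discovery P below this: high priority
LOW_P = 1e-4   # discovery P in [1e-5, 1e-4): lower priority, needs in silico support


def priority_tier(p_value: float, meets_maf: bool, n_in_silico_hits: int) -> Optional[str]:
    """Return "high", "lower", or None (not selected) for one SNV."""
    if not meets_maf:
        return None
    if p_value < HIGH_P:
        return "high"
    if p_value < LOW_P and n_in_silico_hits >= 1:
        return "lower"
    return None


def select_per_locus(snvs: Iterable[tuple[str, str, float, bool, int]],
                     max_per_locus: int = 2) -> list[str]:
    """Pick up to `max_per_locus` SNVs per LD locus.

    Each input tuple is (rsid, ld_locus, p_value, meets_maf, n_in_silico_hits).
    The within-locus ranking (tier, then more in silico hits, then smaller P)
    is an illustrative assumption.
    """
    by_locus = defaultdict(list)
    for rsid, locus, p, maf_ok, hits in snvs:
        tier = priority_tier(p, maf_ok, hits)
        if tier is not None:
            by_locus[locus].append((0 if tier == "high" else 1, -hits, p, rsid))
    selected: list[str] = []
    for locus in sorted(by_locus):
        ranked = sorted(by_locus[locus])
        selected.extend(entry[3] for entry in ranked[:max_per_locus])
    return selected
```

In this sketch an SNV with P = 5 × 10⁻⁵ and no in silico support is dropped, whereas the same P with one supporting finding enters the lower tier.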

Multi-ancestry validation analysis

Additional genome-wide genotyping data came from multiple studies: The Cancer Genome Atlas (TCGA), the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) (n = 370), the Ohio Colorectal Cancer Prevention Initiative (OCCPI) (n = 904), and the Hispanic Colorectal Cancer Study (n = 81)50–52. Individuals were grouped by genetic similarity based on PCA of ancestry array SNVs, as previously described. To identify H participants, an additional 901 samples were included in the PCA but were excluded from the validation analysis because they lacked KRAS mutation status. We did not perform population-specific analyses for self-reported Asian or South Asian individuals because of small numbers (n = 19).
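The role of the 901 extra samples (they help define the principal-component axes but never enter the association test) can be illustrated with a small sketch. It assumes a generic PCA on unstandardized 0/1/2 allele dosages, with the first component computed by power iteration; this is a swapped-in textbook method, not the previously described pipeline, and `pc1_scores` and `validation_scores` are hypothetical helpers.

```python
import math
import random
from typing import Optional


def pc1_scores(dosages: list[list[float]], n_iter: int = 200, seed: int = 0) -> list[float]:
    """PC1 score per sample via power iteration on X^T X of the column-centered dosage matrix."""
    n, m = len(dosages), len(dosages[0])
    means = [sum(row[j] for row in dosages) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in dosages]
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(m)] for a in range(m)]
    rng = random.Random(seed)
    v = [rng.uniform(0.1, 1.0) for _ in range(m)]  # arbitrary nonzero start vector
    for _ in range(n_iter):
        w = [sum(xtx[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # no genotype variation, so PC1 is undefined
            break
        v = [c / norm for c in w]
    return [sum(x[i][j] * v[j] for j in range(m)) for i in range(n)]


def validation_scores(dosages: list[list[float]],
                      kras_mutant: list[Optional[bool]]) -> list[tuple[int, float]]:
    """Fit PCA on every sample, then keep only samples with known KRAS status."""
    scores = pc1_scores(dosages)
    return [(i, s) for i, s in enumerate(scores) if kras_mutant[i] is not None]
```

The sign of a principal component is arbitrary, so any downstream grouping rule should depend on relative positions along the axis rather than on whether a score is positive or negative.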

For targeted germline genotyping of 1127 individuals for whom genotype data had not previously been generated (Supplemental Table 5), two custom real-time PCR arrays were used: one array with ancestry-specific SNVs, comparable to one previously described14, and a second array with the KRAS-mutation-associated SNVs identified in the discovery analysis. SNVs that were incompatible with the array design or failed all array experiments were replaced with other SNVs in LD. SNVs that could not be directly replaced were analyzed only in samples with genome-wide data. Genotyping was performed by the GSR on a Biomark HD high-throughput real-time PCR system using Fluidigm 96 × 96 arrays. Array quality control was performed as previously described14.
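The replacement logic can be sketched as a small decision function. The three outcomes (genotype the original SNV, genotype an LD proxy, or fall back to samples with genome-wide data) follow the text; the r² ≥ 0.8 cut-off, the alphabetical tie-break, and the names `choose_proxy` and `plan_validation_snv` are assumptions for illustration only.

```python
from typing import Optional

MIN_R2 = 0.8  # hypothetical LD threshold for an acceptable proxy


def choose_proxy(target: str, ld_r2: dict[str, float], assayable: set[str],
                 min_r2: float = MIN_R2) -> Optional[str]:
    """Return the assayable SNV in strongest LD with `target`, or None."""
    candidates = [(r2, rsid) for rsid, r2 in ld_r2.items()
                  if rsid != target and rsid in assayable and r2 >= min_r2]
    if not candidates:
        return None
    # Highest r2 wins; ties are broken alphabetically so the choice is reproducible.
    return min(candidates, key=lambda c: (-c[0], c[1]))[1]


def plan_validation_snv(target: str, target_assay_ok: bool, ld_r2: dict[str, float],
                        assayable: set[str]) -> tuple[str, str]:
    """Return (snv_to_genotype, sample_scope) for one validation SNV."""
    if target_assay_ok:
        return target, "all_validation_samples"
    proxy = choose_proxy(target, ld_r2, assayable)
    if proxy is not None:
        return proxy, "all_validation_samples"
    # No usable proxy: analyze the original SNV only where genome-wide data exist.
    return target, "genome_wide_samples_only"
```

Here `target_assay_ok` stands for "compatible with the array design and passed at least one array experiment", and `assayable` is the set of candidate SNVs that satisfy the same condition.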

To determine whether any SNVs chosen for validation were associated with tumor KRAS mutation status, we used logistic regression adjusted for genetic similarity, study, sex, age, and tumor location. After Bonferroni adjustment for multiple comparisons, P values < 0.0005 were considered significant. For SNVs with sex-specific or side-specific effects in the discovery analysis (OR > 1.2 or < 0.7, P value < 1 × 10⁻⁴), we repeated the validation analysis stratified by sex or by tumor side within the colon.
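The significance threshold and the stratified follow-up rule can be checked with a few lines of arithmetic. This sketch assumes a family-wise α of 0.05 divided across the 101 SNVs in the validation set; under that assumption the Bonferroni threshold is 0.05/101 ≈ 4.95 × 10⁻⁴, which rounds to the 0.0005 reported above. The helper names are hypothetical.

```python
ALPHA = 0.05             # assumed family-wise error rate
N_VALIDATION_SNVS = 101  # number of SNVs carried into validation


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def needs_stratified_followup(odds_ratio: float, p_value: float) -> bool:
    """Discovery-stage rule for repeating the validation analysis by sex or tumor side."""
    return (odds_ratio > 1.2 or odds_ratio < 0.7) and p_value < 1e-4


threshold = bonferroni_threshold(ALPHA, N_VALIDATION_SNVS)
print(f"{threshold:.3g}")  # 0.000495, reported in the text as 0.0005
```

Note that the follow-up rule is asymmetric on the odds-ratio scale (1/0.7 ≈ 1.43, not 1.2), so a protective effect must be somewhat stronger than a risk effect to trigger stratified analysis.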

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (13.1KB, docx)
Supplementary Material 2 (944.8KB, xlsx)
Supplementary Material 3 (181.1KB, docx)

Acknowledgements

The Ohio State University Comprehensive Cancer Center (OSU CCC) Total Cancer Care from the Biospecimen Services Shared Resource provided mutational data on CRC cases and DNA samples for validation genotyping. The Ohio State Genomics Shared Resource performed Sanger sequencing analysis for KRAS mutation status and validation genotyping. The Ohio Colorectal Cancer Prevention Initiative (OCCPI) provided samples and clinical data for validation studies. The OCCPI was supported by a grant from Pelotonia, as well as P30 CA016058, and utilized the Biospecimen Services Shared Resource Biorepository of the OSU CCC. This study was funded in part by the National Cancer Institute (R01 CA215151-01 to Amanda E Toland). Johnny Ramroop was supported by a Pelotonia Postdoctoral Fellowship, Nijole Pollock Tjader was supported by a Pelotonia Graduate Research Fellowship, and Tanish Gandhi was supported by a Pelotonia Undergraduate Research Fellowship. The OSU CCC GSR and Total Cancer Care were funded in part by NCI grant P30 CA016058. We thank the GECCO Coordinating Center and those who conducted the participating studies, including: Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO): National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services (U01 CA137088, R01 CA176272). Genotyping/sequencing services were provided by the Center for Inherited Disease Research (CIDR), contract numbers HHSN268201700006I and HHSN268201200008I. This research was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA015704. Scientific Computing Infrastructure at Fred Hutch was funded by ORIP grant S10OD028685. The Colon Cancer Family Registry (CCFR): The CCFR (www.coloncfr.org) is supported in part by funding from the National Cancer Institute (NCI), National Institutes of Health (NIH) (award U01 CA167551).
Support for case ascertainment was provided in part from the Surveillance, Epidemiology, and End Results (SEER) Program and the following U.S. state cancer registries: AZ, CO, MN, NC, NH; and by the Victoria Cancer Registry (Australia) and Ontario Cancer Registry (Canada). The CCFR Set-1 (Illumina 1 M/1 M-Duo) and Set-2 (Illumina Omni1-Quad) scans were supported by NIH awards U01 CA122839 and R01 CA143237 (to Graham Casey). The CCFR Set-3 (Affymetrix Axiom CORECT Set array) was supported by NIH award U19 CA148107 and R01 CA81488 (to Stephen B Gruber). The CCFR Set-4 (Illumina OncoArray 600 K SNP array) was supported by NIH award U19 CA148107 (to Stephen B Gruber) and by the Center for Inherited Disease Research (CIDR), which is funded by the NIH to the Johns Hopkins University, contract number HHSN268201200008I. Additional funding for the OFCCR/ARCTIC was through award GL201-043 from the Ontario Research Fund (to Brent W Zanke), award 112746 from the Canadian Institutes of Health Research (to TJH), through a Cancer Risk Evaluation (CaRE) Program grant from the Canadian Cancer Society (to SG), and through generous support from the Ontario Ministry of Research and Innovation. The SFCCR Illumina HumanCytoSNP array was supported in part through NCI/NIH awards U01/U24 CA074794 and R01 CA076366 (to Polly A Newcomb). The content of this manuscript does not necessarily reflect the views or policies of the NCI, NIH or any of the collaborating centers in the Colon Cancer Family Registry (CCFR), nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government, any cancer registry, or the CCFR. The Colon CFR graciously thanks the generous contributions of their study participants, dedication of study staff, and the financial support from the U.S. National Cancer Institute, without which this important registry would not exist. 
The authors would like to thank the study participants and staff of the Seattle Colon Cancer Family Registry and the Hormones and Colon Cancer study (CORE Studies). Darmkrebs: Chancen der Verhütung durch Screening (DACHS): This work was supported by the German Research Council (BR 1704/6-1, BR 1704/6-3, BR 1704/6-4, CH 117/1-1, HO 5117/2-1, HE 5998/2-1, KL 2354/3-1, RO 2270/8-1 and BR 1704/17-1), the Interdisciplinary Research Program of the National Center for Tumor Diseases (NCT), Germany, and the German Federal Ministry of Education and Research (01KH0404, 01ER0814, 01ER0815, 01ER1505A, 01ER1505B, and 01KD2104A). We thank all participants and cooperating clinicians, and everyone who provided excellent technical assistance. Diet, Activity, and Lifestyle Study (DALS): National Institutes of Health (R01 CA048998 to M. L. Slattery). Melbourne Collaborative Cohort Study (MCCS): MCCS cohort recruitment was funded by VicHealth and Cancer Council Victoria. The MCCS was further supported by Australian NHMRC grants 509348, 209057, 251553 and 504711 and by infrastructure provided by Cancer Council Victoria. Cases and their vital status were ascertained through the Victorian Cancer Registry (VCR) and the Australian Institute of Health and Welfare (AIHW), including the National Death Index and the Australian Cancer Database. Harvard cohorts: Health Professionals Follow-up Study (HPFS) and Nurses’ Health Study (NHS): HPFS is supported by the National Institutes of Health (P01 CA055075, UM1 CA167552, U01 CA167552, R01 CA137178, R01 CA151993, R35 CA197735, and R35 CA253185), NHS by the National Institutes of Health (P01 CA087969, UM1 CA186107, R01 CA137178, R01 CA151993, R35 CA197735, and R35 CA253185), and NHS II by the National Institutes of Health (U01 CA176726, R35 CA197735, and R35 CA253185).
Additionally, work of Shuji Ogino and Tomotaka Ugai associated with the Harvard cohorts was supported in part by grants from the USA National Institutes of Health (R50 CA274122 to Tomotaka Ugai; R01 CA248857 to Shuji Ogino), the American Cancer Society Clinical Research Professor Award (CRP-24-1185864-01-PROF to Shuji Ogino), a grant from the Prevent Cancer Foundation (to Tomotaka Ugai), the Harvey V. Fineberg Cancer Prevention Fellowship (to Tomotaka Ugai), the Brigham and Women’s Hospital Faculty Career Development Award (to Tomotaka Ugai), and an Investigator Initiated Grant from the American Institute for Cancer Research (to Tomotaka Ugai). The study protocol was approved by the institutional review boards of the Brigham and Women’s Hospital and Harvard T.H. Chan School of Public Health, and those of participating registries as required. We acknowledge Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital as home of the NHS. The authors would like to acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention’s National Program of Cancer Registries (NPCR) and/or the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) Program. Central registries may also be supported by state agencies, universities, and cancer centers. Participating central cancer registries include the following: Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Indiana, Iowa, Kentucky, Louisiana, Massachusetts, Maine, Maryland, Michigan, Mississippi, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Puerto Rico, Rhode Island, Seattle SEER Registry, South Carolina, Tennessee, Texas, Utah, Virginia, West Virginia, Wyoming. 
The authors assume full responsibility for analyses and interpretation of these data. We thank the Broad Institute for generating high-quality sequence data supported by NHGRI funds (U54 HG003067) with Eric Lander as PI. The datasets used in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000722. We also thank the Nurses’ Health Study (NHS) and the Health Professionals Follow-up Study (HPFS) for sample collection. Northern Swedish Health and Disease Study (NSHDS): The research was supported by the Swedish Research Council (VR 2017-01737 and, through funding to Biobank Sweden, VR 2017-00650), the Swedish Cancer Society (21 0467 FE 01 H, 20 1154 PjF), Region Västerbotten, the Knut and Alice Wallenberg Foundation, the Cancer Research Foundation in Northern Sweden and the Lion’s Cancer Research Foundation in Northern Sweden. NSHDS investigators thank the Västerbotten Intervention Programme and the Northern Sweden MONICA study, the Section of Biobank and Registry Support, Department of Public Health and Clinical Medicine at Umeå University, and Biobanken Norr at Region Västerbotten for providing data and samples and acknowledge the contribution from Biobank Sweden, supported by the Swedish Research Council. Cancer Prevention Study II (CPSII): The American Cancer Society funds the creation, maintenance, and updating of the Cancer Prevention Study-II (CPS-II) cohort. The study protocol was approved by the institutional review boards of Emory University and those of participating registries as required. The authors express sincere appreciation to all Cancer Prevention Study-II participants, and to each member of the study and biospecimen management group.
The authors would like to acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention’s National Program of Cancer Registries and cancer registries supported by the National Cancer Institute’s Surveillance Epidemiology and End Results Program. The authors assume full responsibility for all analyses and interpretation of results. The views expressed here are those of the authors and do not necessarily represent the American Cancer Society or the American Cancer Society – Cancer Action Network. European Prospective Investigation into Cancer–Sweden (EPIC): The coordination of EPIC is financially supported by International Agency for Research on Cancer (IARC) and also by the Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London which has additional infrastructure support provided by the NIHR Imperial Biomedical Research Centre (BRC). The national cohorts are supported by: Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Générale de l’Education Nationale, Institut National de la Santé et de la Recherche Médicale (INSERM) (France); German Cancer Aid, German Cancer Research Center (DKFZ), German Institute of Human Nutrition Potsdam-Rehbruecke (DIfE), Federal Ministry of Education and Research (BMBF) (Germany); Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy, Compagnia di SanPaolo and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (The Netherlands); Health Research Fund (FIS) - Instituto de Salud Carlos III (ISCIII), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, and the Catalan Institute of Oncology - ICO (Spain); Swedish Cancer Society, Swedish Research Council and Region Skåne and Region Västerbotten (Sweden); Cancer Research UK (14136 to EPIC-Norfolk; C8221/A29017 to EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk; MR/M012190/1 to EPIC-Oxford) (United Kingdom). Early Detection Research Network (EDRN): EDRN is funded and supported by the NCI, EDRN Grant (U01-CA152753). We acknowledge all contributors to the development of the resource at the University of Pittsburgh School of Medicine, Division of Gastroenterology, Hepatology and Nutrition, Department of Pathology, and Biomedical Informatics. ISACC: The authors would like to thank all those at the ISACC Coordinating Center for helping bring together the data and people that made this project possible. TCGA: The results published here are in whole or part based upon data generated by The Cancer Genome Atlas (TCGA) Research Network managed by the NCI and NHGRI. Information about TCGA can be found at http://cancergenome.nih.gov. CRCGEN: The authors received funding from the Instituto de Salud Carlos III, co-funded by ERDF “A way of making Europe” through the Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), action Genrisk, and the “Programa FORTALECE del Ministerio de Ciencia e Innovación”, through the project number FORT23/00032. Funding was also received from the Spanish Association Against Cancer (AECC) Scientific Foundation (grant GCTRA18022MORE). We thank CERCA Programme, Generalitat de Catalunya for institutional support. Hispanic Colorectal Cancer Study: Supported by National Institutes of Health award R01CA155101 (to Jane C. Figueiredo). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health. Moffitt Total Cancer Care: The Moffitt Total Cancer Care Protocol (MCC #14690) is funded by Moffitt Cancer Center institutional funds and supported by the Tissue Core Shared Resource at the H. Lee Moffitt Cancer Center & Research Institute, an NCI-designated Comprehensive Cancer Center (P30-CA076292). The results published here are based in whole or in part upon data generated through the Oncology Research Information Exchange Network Avatar Project in collaboration with Aster Insights (formerly known as M2Gen). This work was also supported by funding from the NCI U01CA206110. This work was also supported in part by Gastrointestinal Oncology Transformative Initiatives for Translation (GOTIT): A Cloud-based Datamart for Rapid and Efficient Acquisition of Data to Enhance Clinical Care and Inform Research Advances. Where authors are identified as personnel of the International Agency for Research on Cancer / World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer / World Health Organization. This article is the result of the scientific work of Dr. Dimou while she was affiliated with IARC.

Author contributions

Nijole Pollock Tjader: conceptualization, funding acquisition, investigation, methodology, validation, visualization, writing–original draft, writing–review and editing. Johnny Ramroop: investigation, methodology, writing–review and editing. Tanish Gandhi: investigation, writing–review and editing. Cara Dauch: investigation, writing–review and editing. Owen Meadows: investigation, writing–review and editing. Patrick Stevens: data curation, writing–review and editing. Rachel Pearlman: resources, writing–review and editing. Heather Hampel: resources, writing–review and editing. Elom K Aglago: resources, writing–review and editing. Sonja I Berndt: resources, writing–review and editing. Amanda Bloomer: resources, writing–review and editing. Hermann Brenner: resources, writing–review and editing. Daniel D Buchanan: resources, writing–review and editing. Peter T Campbell: resources, writing–review and editing. Yin Cao: resources, writing–review and editing. Andrew T Chan: data curation, resources, supervision, writing–review and editing. Iona Cheng: resources, writing–review and editing. Niki Dimou: resources, writing–review and editing. David A Drew: resources, writing–review and editing. Amy J French: methodology, resources, writing–review and editing. Peter Georgeson: writing–review and editing. Marios Giannakis: resources, writing–review and editing. Graham G Giles: resources, writing–review and editing. Maria Gomez: resources, writing–review and editing. Stephen B Gruber: data curation, resources, writing–review and editing. Michael Hoffmeister: funding acquisition, data curation, resources, supervision, writing–review and editing. Wen-Yi Huang: resources, writing–review and editing. Meredith A J Hullar: writing–review and editing. Jeroen R Huyghe: resources, writing–review and editing. Nicole Loroña: writing–review and editing. Victor Moreno: resources, writing–review and editing. Christina C Newton: writing–review and editing. Jonathan A Nowak: resources, writing–review and editing. Mireia Obón-Santacana: resources, writing–review and editing. Shuji Ogino: resources, writing–review and editing. Andrew Pellatt: resources, writing–review and editing. Anita R Peoples: resources, writing–review and editing. Jennifer B Permuth: resources, writing–review and editing. Stephanie L Schmit: resources, writing–review and editing. Robert E Schoen: resources, writing–review and editing. Erin M Siegel: data curation, funding acquisition, resources, writing–review and editing. Robert S Steinfelder: resources, writing–review and editing. Wei Sun: resources, writing–review and editing. Jamie K Teer: data curation, validation, writing–review and editing. Claire E Thomas: resources, writing–review and editing. Quang M Trinh: data curation, resources, writing–review and editing. Konstantinos Tsilidis: resources, writing–review and editing. Tomotaka Ugai: resources, writing–review and editing. Caroline Y Um: resources, writing–review and editing. Bethany Van Guelpen: resources, writing–review and editing. Syed H Zaidi: resources, writing–review and editing. Jane Figueiredo: data curation, funding acquisition, resources, writing–review and editing. Ulrike Peters: data curation, funding acquisition, project administration, resources, supervision, writing–review and editing. Amanda I Phipps: project administration, writing–review and editing. Joseph Paul McElroy: conceptualization, data curation, formal analysis, writing–review and editing. Amanda Ewart Toland: conceptualization, supervision, funding acquisition, investigation, visualization, methodology, writing–review and editing, project administration.

Data availability

Data generated or analyzed during this study are included in this published article (Supplementary Tables) and/or in the data repositories listed below. The Cancer Genome Atlas (TCGA), HPFS, NHS, and some of the GECCO somatic mutation and germline genotyping data are available in dbGaP under accession numbers phs000178.v11.p8, phs001111.v1.p1, phs000722.v3.p2, and phs002050.v1.p1 (since updated to phs002050.v2.p1). Access to other data from international sites can be facilitated by the corresponding author upon request.

Declarations

Competing interests

Heather Hampel consults for LynSight, Exact Sciences, Genome Medical, GI OnDemand, and Carelon, and holds stock/stock options for Genome Medical and GI OnDemand. Jonathan Nowak receives research support from Natera and is a consultant for Leica Biosystems. Shuji Ogino served as a consultant for Sanofi Pasteur S.A. Ulrike Peters was a consultant with AbbVie and her husband holds individual stocks for the following companies: Amazon, ARM Holdings PLC, BioNTech, BYD Company Limited, Crowdstrike Holdings Inc, CureVac, Google/Alphabet, Microsoft Corp, NVIDIA Corp, and Stellantis. The other authors declare no conflicts of interest.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Kiel, C., Matallanas, D. & Kolch, W. The ins and outs of RAS effector complexes. Biomolecules 11, 2364 (2021).
2. Huang, D. et al. Mutations of key driver genes in colorectal cancer progression and metastasis. Cancer Metastasis Rev. 37, 173–187 (2018).
3. Wang, J., Song, J., Liu, Z., Zhang, T. & Liu, Y. High tumor mutation burden indicates better prognosis in colorectal cancer patients with KRAS mutations. Front. Oncol. 12, 1015308 (2022).
4. Siddiqui, A. D. & Piperdi, B. KRAS mutation in colon cancer: A marker of resistance to EGFR-I therapy. Ann. Surg. Oncol. 17, 1168–1176. 10.1245/s10434-009-0811-z (2010).
5. Feng, J. et al. Feedback activation of EGFR/wild-type RAS signaling axis limits KRAS(G12D) inhibitor efficacy in KRAS(G12D)-mutated colorectal cancer. Oncogene 42, 1620–1633 (2023).
6. Yun, J., Nakagawa, R. & Tham, K. KRAS-targeted therapy in the treatment of non-small cell lung cancer. J. Oncol. Pharm. Pract. 29, 422–430 (2023).
7. Staudacher, J. J. et al. Increased frequency of KRAS mutations in African Americans compared with Caucasians in sporadic colorectal cancer. Clin. Transl. Gastroenterol. 10.1038/ctg.2017.48 (2017).
8. Jiagge, E. et al. Tumor sequencing of African ancestry reveals differences in clinically relevant alterations across common cancers. Cancer Cell 41, 1963–1971.e3 (2023).
9. Booker, B. D. et al. Variation in KRAS/NRAS/BRAF-mutation status by age, sex, and race/ethnicity among a large cohort of patients with metastatic colorectal cancer (mCRC). J. Gastrointest. Cancer 55, 237–246 (2024).
10. Rosty, C. et al. Colorectal carcinomas with KRAS mutation are associated with distinctive morphological and molecular features. Mod. Pathol. 26, 825–834 (2013).
11. Valle, L., Vilar, E., Tavtigian, S. V. & Stoffel, E. M. Genetic predisposition to colorectal cancer: Syndromes, genes, classification of genetic variants and implications for precision medicine. J. Pathol. 247, 574–588 (2019).
12. Ramroop, J. R., Gerber, M. M. & Toland, A. E. Germline variants impact somatic events during tumorigenesis. Trends Genet. 10.1016/j.tig.2019.04.005 (2019).
13. Marty, R. et al. MHC-I genotype restricts the oncogenic mutational landscape. Cell 171, 1272–1283.e15 (2017).
14. Tjader, N. P. et al. Association of ESR1 germline variants with TP53 somatic variants in breast tumors in a genome-wide study. Cancer Res. Commun. 10.1158/2767-9764.CRC-24-0026 (2024).
15. Carrot-Zhang, J. et al. Genetic ancestry contributes to somatic mutations in lung cancers from admixed Latin American populations. Cancer Discov. 11, 591–598 (2021).
16. Puzone, R. & Pfeffer, U. SNP variants at the MAP3K1/SETD9 locus 5q11.2 associate with somatic PIK3CA variants in breast cancers. Eur. J. Hum. Genet. 25, 384–387 (2017).
17. Gerber, M. M. et al. Evaluation of allele-specific somatic changes of genome-wide association study susceptibility alleles in human colorectal cancers. PLoS ONE 7, e37672 (2012).
18. Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).
19. Raney, B. J. et al. The UCSC genome browser database: 2024 update. Nucleic Acids Res. 52, D1082 (2024).
20. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369 (6509), 1318–1330 (2020).
21. Martianov, A. S. et al. KRAS, NRAS, BRAF, HER2 and MSI status in a large consecutive series of colorectal carcinomas. Int. J. Mol. Sci. 24, 4868 (2023).
22. Lochhead, P. et al. Microsatellite instability and BRAF mutation testing in colorectal cancer prognostication. J. Natl. Cancer Inst. 105, 1151–1156 (2013).
23. Taieb, J. et al. Deficient mismatch repair/microsatellite unstable colorectal cancer: Diagnosis, prognosis and treatment. Eur. J. Cancer 175, 136–157. 10.1016/j.ejca.2022.07.020 (2022).
24. Poulsen, T. S. et al. Frequency and coexistence of KRAS, NRAS, BRAF and PIK3CA mutations and occurrence of MMR deficiency in Danish colorectal cancer patients. APMIS 129, 61–69 (2021).
25. Law, P. J. et al. Association analyses identify 31 new risk loci for colorectal cancer susceptibility. Nat. Commun. 10, 2154 (2019).
26. American Cancer Society. Cancer Facts & Figures for African American/Black People 2022–2024 (2022).
27. Lansdorp-Vogelaar, I. et al. Contribution of screening and survival differences to racial disparities in colorectal cancer rates. Cancer Epidemiol. Biomark. Prev. 21, 728–736 (2012).
28. Mitchell, E. et al. Cancer healthcare disparities among African Americans in the United States. J. Natl. Med. Assoc. 114, 236–250 (2022).
29. Rutter, C. M., Nascimento de Lima, P., Maerzluft, C. E., May, F. P. & Murphy, C. C. Black-White disparities in colorectal cancer outcomes: A simulation study of screening benefit. J. Natl. Cancer Inst. Monogr. 2023, 196–203 (2023).
30. Yamauchi, M. et al. Assessment of colorectal cancer molecular features along bowel subsites challenges the conception of distinct dichotomy of proximal versus distal colorectum. Gut 61, 847–854 (2012).
31. Ugai, T. et al. Molecular characteristics of early-onset colorectal cancer according to detailed anatomical locations: Comparison with later-onset cases. Am. J. Gastroenterol. 118, 712–726 (2023).
32. El Asri, A. et al. Associations between nutritional factors and KRAS mutations in colorectal cancer: A systematic review. BMC Cancer 10.1186/s12885-020-07189-2 (2020).
33. Inamura, K. et al. Prediagnosis plasma adiponectin in relation to colorectal cancer risk according to KRAS mutation status. J. Natl. Cancer Inst. 108, djv363 (2016).
34. Myte, R. et al. A longitudinal study of prediagnostic metabolic biomarkers and the risk of molecular subtypes of colorectal cancer. Sci. Rep. 10, 5336 (2020).
35. Jenkins, B. D. et al. Neighborhood deprivation and DNA methylation and expression of cancer genes in breast tumors. JAMA Netw. Open 10.1001/jamanetworkopen.2023.41651 (2023).
36. Wing, S. E. et al. Neighborhood disadvantage is associated with KRAS-mutated non-small cell lung cancer risk. J. Cancer Res. Clin. Oncol. 149, 5231–5240 (2023).
37. Mahmood, K. et al. Elucidating the risk of colorectal cancer for variants in hereditary colorectal cancer genes. Gastroenterology 165, 1070–1076. 10.1053/j.gastro.2023.06.032 (2023).
38. Purcell, S. et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
39. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74. 10.1038/nature15393 (2015).
40. Fernandez-Rozadilla, C. et al. Deciphering colorectal cancer genetics through multi-omic analysis of 100,204 cases and 154,587 controls of European and east Asian ancestries. Nat. Genet. 55, 89–99 (2023).
41. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
42. Machiela, M. J. & Chanock, S. J. LDlink: A web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
43. Sollis, E. et al. The NHGRI-EBI GWAS catalog: Knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).
44. Sherry, S. T. et al. dbSNP: The NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
45. Dampier, C. H. et al. Oncogenic features in histologically normal mucosa: Novel insights into field effect from a mega-analysis of colorectal transcriptomes. Clin. Transl. Gastroenterol. 11, e00210 (2020).
46. Díez-Obrero, V. et al. Genetic effects on transcriptome profiles in colon epithelium provide functional insights for genetic risk loci. Cell. Mol. Gastroenterol. Hepatol. 12, 181–197 (2021).
47. Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).
48. Dong, S. et al. Annotating and prioritizing human non-coding variants with RegulomeDB v.2. Nat. Genet. 55, 724–726. 10.1038/s41588-023-01365-3 (2023).
49. Ward, L. D. & Kellis, M. HaploReg: A resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 40, D930–D934 (2012).
50. Peters, U. et al. Identification of genetic susceptibility loci for colorectal tumors in a genome-wide meta-analysis. Gastroenterology 144, 799 (2013).
51. Pearlman, R. et al. Prevalence and spectrum of germline cancer susceptibility gene mutations among patients with early-onset colorectal cancer. JAMA Oncol. 10.1001/jamaoncol.2016.5194 (2017).
52. Ricker, C. N. et al. DNA mismatch repair deficiency and hereditary syndromes in Latino patients with colorectal cancer. Cancer 123, 3732–3743 (2017).

Associated Data

Supplementary Materials

Supplementary Material 1 (13.1KB, docx)
Supplementary Material 2 (944.8KB, xlsx)
Supplementary Material 3 (181.1KB, docx)

Data Availability Statement

Data generated or analyzed during this study are included in this published article (see Supplementary Tables) and/or in dbGaP. The Cancer Genome Atlas (TCGA), HPFS, NHS, and some of the GECCO somatic mutation and germline genotyping data are available in dbGaP under accession numbers phs000178.v11.p8, phs001111.v1.p1, phs000722.v3.p2, and phs002050.v1.p1 (since updated to phs002050.v2.p1). Access to other data from international sites can be facilitated by the corresponding author upon request.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
