International analysis of premalignant intestinal metaplasia identifies Helicobacter pylori variants, somatic drivers, mutational signatures, and clonal hematopoiesis as risk factors associated with progression to gastric cancer.
Abstract
Intestinal metaplasia (IM) is a premalignant condition associated with increased risk of gastric cancer—a deadly malignancy with varying geographic incidence. High-depth targeted sequencing of more than 1,500 IM samples from six countries identified 47 significantly mutated genes, including driver genes associated with high-risk populations and worse prognosis (ARID1A), KRAS/MAPK signaling (KRAS, BRAF, MAP2K1, MAP3K1, and MAP2K4), and altered mucosal immunity (PIGR). IM whole-genome sequencing and DNA methylation analysis revealed SBS17 as a specific mutational signature separating IMs from normal gastric tissues, associated with late DNA replication, genomic hypomethylation, and tobacco exposure. Beyond epithelial-derived somatic mutations, we observed elevated clonal hematopoiesis (CH) in patients with IM associated with age, smoking, and enhanced risk of progressing to gastric cancer. Patients with CH expansions exhibited co-occurring IM PIGR truncating mutations and greater colonization of the IM microenvironment by orally derived bacteria, suggesting that CH may promote IM progression by modulating host–microbe mucosal immunity.
Significance:
This international study identifies recurrent IM driver genes, IM-specific mutational signatures, and alterations in IM-associated immune landscapes and microbiomes. Our results highlight a role for nonepithelial somatic alterations (CH) in IM progression to gastric cancer, offering new translational opportunities for early cancer detection and interception.
Introduction
Gastric cancer is the fifth most common malignancy worldwide and the fourth leading cause of cancer death, accounting for 769,000 deaths globally in 2020 (1). Gastric cancer incidence is highly variable across geographies; in high-risk Asian countries such as Japan and Korea, gastric cancer incidence is three- to sevenfold higher compared with moderate-risk Asian countries (e.g., Singapore) and other low-risk areas (e.g., North America). Most gastric cancers are adenocarcinomas, typically categorized into two main histologic types: intestinal type and diffuse type. Intestinal-type gastric cancer, the predominant gastric cancer subtype, is characterized by well-differentiated glandular structures thought to develop via the Correa cascade, a multistep sequence starting from normal gastric mucosa, progressing to chronic atrophic gastritis (gastric atrophy), intestinal metaplasia (IM), dysplasia, and eventually gastric cancer (2). Among these lesions, IM is considered a premalignant condition with IM patients exhibiting an increased risk of subsequent gastric cancer (3).
Infection by Helicobacter pylori (Hp) is a major risk factor for gastric atrophy, IM, and gastric cancer. Hp infection accounts for more than 75% of all human gastric cancers (4) and is believed to drive gastric carcinogenesis by evoking chronic mucosal inflammation, leading to changes in gastric physiology and promoting IM (5). Besides Hp, gastric cancer incidence is also influenced by other environmental and host genetic factors. For example, genetic ALDH2 polymorphisms are prevalent in East Asian populations and associated with increased gastric cancer risk among current alcohol drinkers (6). Lifestyle factors such as tobacco smoking may also contribute to gastric cancer by activating nicotinic acetylcholine receptors leading to increased oxidative stress (7), and heavy alcohol consumption may increase gastric cancer risk due to acetaldehyde exposure (8). In addition to Hp, recent studies suggest that other oral bacteria may promote inflammation in premalignant gastric epithelia (9). Dissecting the complex interplay between intrinsic and extrinsic factors in IM and gastric cancer progression may reveal new opportunities for gastric cancer early detection, screening, and prevention.
Clonal hematopoiesis (CH) is an age-associated biological condition in which somatic mutations in hematopoietic stem cells cause the expansion of genetically distinct blood cell clones (10). CH-derived clones can outcompete and dominate other blood cell populations, leading to detectable mutations in the blood without overt hematologic malignancies. Although not believed to be directly carcinogenic in epithelial malignancies, CH has been associated with an increased risk of developing certain premalignant and malignant conditions in solid tissues, including lung, liver, and colon (11–13). One proposed mechanism is that CH-derived immune cells can alter inflammatory responses in resident tissues, potentially fostering a proinflammatory microenvironment. To date, the role of CH in IM and gastric cancer progression remains largely unexplored. Here, using high-depth targeted sequencing of >270 genes, we investigated patterns of somatic genetic diversity in a large cohort of IM samples (>1,500) from six different countries with varying gastric cancer incidence rates. Besides defining novel IM driver events and mutational signature exposures influencing IM development, our results highlight CH as a risk factor for IM progression to gastric cancer, possibly mediated through the modulation of host–microbe mucosal immunity. These findings may suggest novel prevention and therapeutic strategies for patients with IM at high risk of developing gastric cancer.
Results
Data Collection
We performed high-depth targeted DNA sequencing of 277 human and six Hp genes (14) on 1,582 IM samples (average sequencing depth, 1,108×). These included 463 antral IM samples (2022–ongoing) from recently recruited subjects in Singapore (218), South Korea (106), Hong Kong (62), United States (36), Japan (33), and Taiwan (8) along with paired germline samples (Fig. 1A). We also added 1,119 previously collected IMs (632 antrum and 487 body/cardia) and 98 normal gastric samples from Singapore sequenced on the same panel [TransGCEP1000 (15); 2000–2010; ref. 14]. Based on computational simulations and down-sampling analysis (see later), high-depth targeted sequencing was required to confidently identify mutations with low variant allele frequencies (VAF). All sequencing reads were processed with a unified pipeline to ensure consistency in the detection of somatic mutations. Selected samples with appreciable mutation VAFs were also analyzed by whole-genome sequencing (WGS; n = 20; average coverage of 60.5×). To evaluate relationships between IM genetic changes and epigenetic alterations, we further performed genome-wide enzymatic methylation sequencing (EM-seq) in a separate cohort of 14 patients with concurrent matched normal, dysplasia, and early gastric cancer samples (38 samples) from South Korea. Finally, we also incorporated oral microbiome data from saliva of patients with IM, including 32 samples from the US cohort (targeted DNA sequencing) and 173 samples from 154 patients in a local Singaporean cohort (shotgun metagenomics). These were compared against previously generated IM microbiome profiles (14). Supplementary Table S1 provides patient ancestries, epidemiologic information, country of origin, and clinical annotations, including age, tobacco smoking, IM severity, Operative Link on Gastric Intestinal Metaplasia Assessment (OLGIM) stages, family history, and development of dysplasia or early gastric neoplasia (EGN). 
Supplementary Table S1 also details which individual samples were used for specific analyses and figures. Supplementary Table S2 summarizes key findings observable across multiple geographies and ancestries.
Figure 1.
Multi-geographic genomic analysis of IM samples. A, Distribution of premalignant samples analyzed in this study (n = 1,680 samples; 1,582 IM and 98 normal samples). The barplot illustrates Hp positivity rates by sequencing and histology assessment. B, Phylogenetic distribution of Hp gene variants across Hp-positive samples. C, Comparison of Hp gene variants between clade 1 (Japan/Korea) and clade 2 (Singapore). D, Distribution of Hp variants mapped to the cagA gene. E, Structural model of the CagA–ASPP2 complex in two different orientations (top). The bottom highlights CagA residues interacting with ASPP2. Variable amino acids between Singaporean and Japanese/Korean strains are indicated in red, whereas CagA residues predicted to bind ASPP2 are in blue. F, Differential binding affinity of CagA variants with ASPP2. Exogenous CagA–ASPP2 interactions were assessed in HEK293T cells via reciprocal co-immunoprecipitation assays. V5-tagged ASPP2 (wild type, WT; or nonbinding mutant, Y754A/Y754C) or HA-tagged CagA variants were immunoprecipitated with V5 or HA antibodies, respectively, followed by immunoblot analysis of the co-immunoprecipitated CagA or ASPP2. The intermediate-risk region CagA variant (D106/K109/H228) shows reduced ASPP2 binding compared with the high-risk region CagA variant (E106/R109/N228). The nonbinding ASPP2 mutants (Y754A or Y754C) fail to interact with CagA. Quantitative analysis of relative CagA–ASPP2 binding affinity (normalized to total IP) was performed in four independent replicates. ASR, age-standardized rate.
Geographic Patterns of Hp Strains and Infection
Hp infection is the strongest risk factor for gastric cancer (4). We investigated the prevalence of Hp infection across the six countries by evaluating the coverage of Hp genes in our targeted panel. Genomic assessment of Hp infection was concordant with histologically derived Hp status (Fisher test OR, 1,404; P value < 2.2 × 10−16; Fig. 1A). Cases exhibiting high Hp abundance (more than 1× coverage) were more frequently observed in regions associated with high gastric cancer risk, specifically South Korea (24 of 106; 22.6%) and Japan (two of 33; 6.1%), compared with recently collected IMs from Singapore where gastric cancer incidence is moderate [Singapore samples (recent), three of 218; 1.4%; Fisher test OR, 16.4; P value 5.2 × 10−9]. No high-abundance Hp samples were identified in IMs from Hong Kong, United States or Taiwan (0/106). These differences may reflect the impact of Hp eradication efforts as IM samples from Singapore collected at earlier time periods (2000–2010) displayed higher Hp levels (34 of 632; 5.7%), likely reflecting a period when Hp eradication was less routinely practiced. Most subjects with IM (759 of 1,066; 71.2%) had previous histories of Hp infection as assessed from patient medical histories or Hp serology; however, once IM develops, the altered gastric mucosal environment becomes less hospitable for Hp colonization (16, 17). The small number of samples with detectable Hp infection thus likely reflects cases of (a) early-stage IM in which the gastric environment has not yet completely excluded Hp, (b) infection by Hp strains that are better adapted to persist in IM, or (c) individuals with very high initial levels of Hp colonization.
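As a rough check of the comparison above, the 2 × 2 table can be rebuilt from the reported counts (26 of 139 high-abundance samples in South Korea and Japan vs. three of 218 in recent Singapore samples). The sketch below is illustrative only. The function name is hypothetical, the odds ratio is the simple cross-product estimate (statistical packages often report a conditional estimate that differs slightly), and the two-sided P value is assumed to be the sum of hypergeometric probabilities no larger than that of the observed table.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the table [[a, b], [c, d]].

    Returns (cross-product odds ratio, two-sided P value).
    Hypothetical helper for illustration; not the authors' pipeline.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x "high" samples in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum all tables at least as extreme (probability <= observed)
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)

# Counts taken from the text: (high, not high) Hp abundance
high_risk = (24 + 2, (106 - 24) + (33 - 2))   # South Korea + Japan
singapore = (3, 218 - 3)                      # Singapore (recent)
odds_ratio, p_value = fisher_exact_2x2(*high_risk, *singapore)
print(f"OR = {odds_ratio:.1f}, P = {p_value:.1e}")
```

The cross-product odds ratio from these counts is about 16.5, close to the reported value of 16.4; the small difference is expected if the reported figure is a conditional estimate.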
To explore patterns of Hp genomic variation across geographic regions, we identified genetic variants across the six targeted Hp genes and performed phylogenetic analysis. We observed a clear phylogenetic separation between Hp strains from high-risk populations (Japan/South Korea) compared with intermediate-risk Singapore (Fisher test, P value 6.0 × 10−11; Fig. 1B). Further analysis revealed a significant association between this separation and three nonsynonymous coding variants (E106D, R109K, and N228H) in the N-terminus of the Hp cagA gene (Fig. 1C and D). We validated this finding using published Hp genomes from the Helicobacter pylori Genome Project (18). Specifically, cagA genes carrying the three variants were highly enriched in Southeast Asian countries (Singapore, Malaysia, Indonesia, and Vietnam; E106D, 23 of 51, R109K, 23 of 32, and N228H, 20 of 28) compared with Japan and South Korea (five of 71, six of 31, and one of 44; Fisher test P values 1.5 × 10−6, 4.0 × 10−5, and 1.8 × 10−10). cagA encodes a bacterial virulence factor that, when injected into gastric epithelial cells by Hp type IV secretion systems, can interact with host intracellular proteins in a pro-oncogenic manner (19). For example, the N-terminal (domain I) region of CagA has been reported to interact with tumor suppressors such as ASPP2 (TP53BP2) (20) and RUNX3 (21). Protein structure modeling using AlphaFold 3 suggests that CagA domain I forms a binding groove (residues N94, D102, V105, D113, Q117, T120, Q164, S165, G168, P201, G203, G204, W206, S212, F213, F215, and K217) that interacts closely with ASPP2’s proline-rich domain at amino acid positions 750 to 764 (QKLLYQRTTIAAMET; Fig. 1E). To experimentally validate the impact of these CagA variants on ASPP2 binding, we performed CagA–ASPP2 immunoprecipitation assays. CagA variants from high-risk regions (E106, R109, and N228) robustly interacted with ASPP2 (Fig. 1F), whereas CagA variants with E106D, R109K, and N228H substitutions (commonly found in Southeast Asia) exhibited significantly reduced binding. It is possible that the heightened affinity for ASPP2 by E106/R109/N228 CagA may inhibit ASPP2–TP53 tumor-suppressive functions (22). In summary, our findings reveal substantial differences in both Hp prevalence and genetic diversity in IM patients from different geographies. These results highlight a potential role for Hp genetic diversity in initiating the Correa cascade (23) and contributing to differences in regional gastric cancer risk.
Driver Gene Landscape of Trans-Geographic IM Samples
We utilized the IntOGen pipeline (24), which combines seven complementary algorithms for driver gene discovery, to identify candidate driver genes in the IM samples. A total of 2,100 driver mutations in 47 significantly mutated genes were identified (combination q-value < 0.005), including SOX9, ARID1A, ARID2, and FBXW7, as well as 25 genes not previously reported (SMAD3, PPP2R1A, MAP2K1, ATM, PPP6C, CDH1, FGFR2, RBM10, FOXQ1, LRP1B, GNAS, SMAD2, ZBTB16, APC, EGFR, NOTCH1, KMT2C, ARHGAP5, CHD4, PTPRT, MAP2K4, SPOP, ELF3, NF1, and PREX2). Figure 2A highlights 36 of the 47 driver genes identified by at least one IntOGen algorithm, with the complete list of all 47 driver genes listed in Supplementary Table S3. To support the robustness of our driver gene analysis, we undertook several validation steps. First, we profiled an independent cohort of 48 IM samples using the same targeted panel and observed similar protein-altering mutations in 74.5% (35 of 47) of the predicted driver genes. Second, we compared the IM driver gene mutations with three independent cohorts, including (a) a recent genomic study of nonmalignant gastric samples (25), (b) driver mutation lists from The Cancer Genome Atlas (TCGA) pan-gastrointestinal cancers (26), and (c) driver mutation lists for gastric cancer curated by IntOGen (24), the latter two given the close relationship between IM and gastric cancer. Reassuringly, of the 47 driver genes identified in our study, 37 (78.7%) were also listed as driver genes in at least one of the listed resources. These findings substantially increase the number of IM driver genes compared with previous studies (26 genes), showcasing the utility of analyzing samples from diverse geographies.
Figure 2.
Mutational landscape in multi-ancestry IM samples. A, Oncoprint plot displaying mutations in 36 driver genes across race and geographic sites (n = 1,095 antrum IM). Genes are ranked by IntOGen’s combined significance score. Only genes identified as significant by at least one driver gene algorithm are shown. The accompanying forest plot highlights genes enriched in patients with IM who progressed to EGN. B, Lollipop plot illustrating differential mutation rates in key driver genes between high-risk Japanese/Korean and intermediate-risk Chinese populations. C, Barplots comparing frequencies of ARID1A mutations in patients with IM who developed EGN compared with those who did not.
Focusing on the antrum, a combined analysis of 1,095 antral IMs revealed an average of 23 somatic mutations per sample with an average VAF of 2.9% (median, 1.7%; “Methods”). Mutation rates were higher in Japanese/Korean subjects than in Chinese subjects (median, 28.5 vs. 20; Wilcoxon test P value 4.1 × 10−11). We identified an elevated frequency of SOX9 mutations in antral IMs from Chinese populations (909 subjects: 836 Singapore, 52 Hong Kong, 13 USA, and eight Taiwan), whereas mutations in ARID1A, ARID2, ERBB3, KMT2D, KDM6A, CREBBP, and PREX2 were more prevalent in antral IMs from high-risk populations (33 Japanese and 109 Korean; Fig. 2B). The prevalence of SOX9 truncating mutations was particularly striking in the Chinese population (114 of 909 IMs, 94 truncating mutations), whereas few Korean or Japanese subjects in this study harbored similar mutations (four of 142 IMs; two truncating). Despite these differences, truncating ARID1A mutations were significantly associated across geographies with IM samples exhibiting eventual or concurrent EGN (combined dataset, Fisher test, OR, 6.2; P value 1.5 × 10−3; Fig. 2C). ARID1A encodes a core subunit of the SWI/SNF chromatin remodeling complex, establishing enhancer accessibility and lineage-specific transcription and identity (27). Given SWI/SNF’s role in enhancer regulation and lineage fidelity, ARID1A loss may facilitate transcriptional reprogramming and epithelial plasticity, predisposing IMs to subsequent neoplastic transformation.
Most samples in this study underwent high-depth targeted panel sequencing, which was necessary to achieve the sequencing depth required to confidently detect low-frequency somatic mutations [average coverage of 1,108× for targeted regions, compared with standard whole-exome sequencing (WES) of 100× or WGS of 60×]. Supporting this strategy, in samples profiled by both targeted panels and WGS, only a fraction of somatic mutations detected by targeted panels could be recovered by WGS (168 of 961; 17.5%), including protein-altering mutations in predicted driver genes (13 of 71; 18.3%). Down-sampling analyses (110×; 10% of the original coverage) simulating 100× WES coverage further showed that only 15.5% of somatic mutations (4,104 of 26,414) and 15.1% of protein-altering mutations in driver genes (318 of 2,100) would be recovered by WES. These findings underscore the importance of using high-depth targeted sequencing for maximizing mutation discovery and accurately characterizing the IM mutational landscape across diverse populations.
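The dependence of low-VAF detection on depth can be illustrated with a simple binomial read-sampling model. This is a minimal sketch under stated assumptions, not the simulation used in the study: each read is assumed to carry the variant independently with probability equal to the VAF, sequencing error is ignored, and the threshold of five supporting reads is an arbitrary illustrative choice. The function name is hypothetical.

```python
from math import comb

def detection_probability(depth, vaf, min_alt_reads=5):
    """P(at least `min_alt_reads` variant reads) under a binomial model.

    Assumes each read independently carries the variant with
    probability `vaf`; ignores sequencing error and mapping bias.
    `min_alt_reads=5` is an arbitrary illustrative threshold.
    """
    p_below = sum(comb(depth, k) * vaf**k * (1 - vaf)**(depth - k)
                  for k in range(min_alt_reads))
    return 1.0 - p_below

median_vaf = 0.017  # median VAF reported for antral IMs
for label, depth in [("targeted panel", 1108), ("WES", 100), ("WGS", 60)]:
    prob = detection_probability(depth, median_vaf)
    print(f"{label:>14} ({depth}x): {prob:.3f}")
```

Under these assumptions, a mutation at the median VAF is almost always sampled often enough at panel depth but only rarely at WES or WGS depth, which is qualitatively consistent with the low recovery fractions reported above.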
KRAS/MAPK Genetic Alterations Are Prevalent in IM
The MAPK signaling network involving the RAS–RAF–MEK–ERK cascade plays a pivotal role in regulating cell proliferation, differentiation, and survival in gastric epithelial cells (28). Persistent activation of this pathway may contribute to the development and progression of EGN. Therapeutically, multiple classes of chemical inhibitors have been developed to disrupt MAPK signaling, particularly targeting MEK and ERK kinases. We observed that several IM driver genes were involved in RAS–RAF–MEK–ERK signaling, such as KRAS, BRAF, MAP2K1, MAP3K1, MAP2K4, and NF1 (Fig. 3A). Mutations in these KRAS–MAPK components often comprised missense and oncogenic gain-of-function mutations, such as KRASG12D, BRAFD594G, and MAP2K1F53L. In contrast, negative regulators of the pathway such as MAP2K4, MAP3K1, and NF1 were often inactivated by loss-of-function mutations (29, 30). To examine the molecular consequences of these KRAS/MAPK pathway mutations, we analyzed bulk RNA sequencing (RNA-seq) data from 104 IM samples, of which 10 samples exhibited driver oncogenic mutations in KRAS, BRAF, MAP2K1, or NF1 (Fig. 3B). Comparing the 10 KRAS-/MAPK-mutated samples with IM samples with no detectable oncogenic KRAS/MAPK pathway mutations, we observed a significant upregulation of KRAS signatures [normalized enrichment score (NES), 1.7; FDR, 6.3 × 10−6] and ERK signatures (NES, 1.6; FDR, 1.1 × 10−4) utilizing a recently published gene expression set capturing KRAS and ERK signaling pathway activation (31). There was an even stronger upregulation when the two pathways were intersected (NES, 2.4; FDR, 9.3 × 10−18). To confirm that this result is not caused by group size imbalances, we performed down-sampling analyses comparing the 10 IM samples with KRAS/MAPK driver alterations with 10 randomly selected IM samples without KRAS/MAPK alterations. 
Across 1,000 iterations, IMs with KRAS/MAPK mutations consistently overexpressed genes upregulated in the KRAS (953 of 1,000), ERK (993 of 1,000), and KRAS–ERK (983 of 1,000) signaling pathways.
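The logic of the size-matched down-sampling check can be sketched as follows. This is a simplified illustration, not the analysis performed in the study: it compares mean pathway scores rather than rerunning gene set enrichment analysis in each iteration, the scores are synthetic, and the function name is hypothetical.

```python
import random
from statistics import mean

def downsample_consistency(mutant_scores, wildtype_scores,
                           n_iter=1000, seed=0):
    """Count iterations in which the mutant group scores higher than an
    equally sized random subset of the wild-type group.

    Simplification: compares mean pathway scores instead of rerunning
    gene set enrichment analysis in every iteration.
    """
    rng = random.Random(seed)
    k = len(mutant_scores)
    mutant_mean = mean(mutant_scores)
    hits = 0
    for _ in range(n_iter):
        subset = rng.sample(wildtype_scores, k)  # without replacement
        if mutant_mean > mean(subset):
            hits += 1
    return hits

# Synthetic pathway scores (hypothetical values, not study data)
rng = random.Random(1)
mutant = [rng.gauss(1.0, 1.0) for _ in range(10)]
wildtype = [rng.gauss(0.0, 1.0) for _ in range(94)]
hits = downsample_consistency(mutant, wildtype)
print(f"{hits} of 1,000 iterations favored the mutant group")
```

Matching the comparison group to 10 samples in every iteration removes the 10 vs. 94 imbalance, so a high hit count cannot be attributed to group size alone.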
Figure 3.
KRAS–MAPK mutations in IM. A, Lollipop plots showing activating (BRAF, KRAS, and MAP2K1) and inactivating (MAP2K4, MAP3K1, and NF1) mutations in KRAS–MAPK driver genes. Red text indicates oncogenic gain-of-function mutations and blue text indicates oncogenic loss-of-function mutations, classified using OncoKB. B, Gene set enrichment analysis of 10 IM samples with KRAS/MAPK driver mutations vs. 94 IM samples without KRAS/MAPK driver mutations (n = 104 IM samples profiled on both targeted DNA and bulk RNA-seq). C, Violin plot (top left) illustrating KRAS–ERK pathway scores from scRNA-seq samples across gastric/intestinal epithelial cell types. Stacked barplot showing the proportion of cells in different cell cycle phases (non-cycling G1 and cycling S and G2–M phases). Violin plot (right) shows KRAS–ERK pathway scores stratified by epithelial cell types and cell cycle phases. D, CDX2 immunostaining in representative normal stomach (left) and IM (right) organoids. E, Expression of stomach and intestinal lineage genes in normal stomach and IM organoids. Public datasets of normal stomach and normal colon were added for comparison. F, Gene set enrichment analysis of genes associated with KRAS–ERK signaling pathway, comparing three IM organoids derived from patients with severe IM histology with three mild IM organoids. G, Cell viability of IM and normal gastric organoids under 1 μmol/L pyrvinium treatment for 72 hours, measured by ATP levels relative to DMSO control (n = 4 biological replicates per group). IM organoids show significantly reduced viability compared with normal organoids (linear mixed-effects model, P < 0.0005).
Next, using the same pathway signatures, we interrogated single-cell RNA sequencing (scRNA-seq) data from antral IMs to identify cell types and cell states associated with heightened KRAS–ERK pathway activity (see “Methods”). Among the eight epithelial gastric or intestinal cell types, KRAS–ERK pathway signatures were most enriched in gastric stem cells marked by IQGAP3. This association may reflect the highly proliferative nature of gastric stem cells, as ERK signaling has been reported to promote the G1–S cell cycle transition. Supporting this hypothesis, most gastric stem cells were predicted to reside within the proliferative S or G2–M phases (referred to as cycling states, inferred by scRNA-seq). Moving beyond gastric lineages, we then examined the four intestinal cell lineages associated with IM (intestinal stem cells, intestinal transit–amplifying cells, intestinal goblet cells, and intestinal enterocytes). Of these, intestinal stem cells (NES, 2.39; FDR, 5.2 × 10−13) and intestinal transit–amplifying cells (NES, 2.95; FDR, 1.2 × 10−25) in cycling states exhibited higher KRAS–ERK pathway enrichment relative to noncycling cells (Fig. 3C). One interpretation of this finding is that KRAS–ERK pathway activation may drive the transition of quiescent IM stem cells into the active cell cycle, thereby enabling the emergence of intestinal cell lineages and IM. It is thus possible that therapies targeting the MEK/ERK pathway may reduce the proliferation and progression of aberrant IM lineages (32), as reported in preclinical models.
Separately, we also extended the scRNA-seq analysis to evaluate changes in immune cell populations between IM and normal gastric samples or between IM samples with different severities. Although we did not observe significant changes in the overall proportions of immune cell types, we found that the ratio of transit-amplifying cells to intestinal stem cells in IM samples positively correlated with the abundance of IgA+ plasma cells (Spearman rho = 0.58, P = 0.019), suggesting expansion of these plasma cell populations as the epithelium adopts a more proliferative state. Conversely, γδ T cells (Spearman rho = 0.54, P = 0.033) were correlated with intestinal stem cell populations. These findings suggest that the IM immune microenvironment becomes dynamically remodeled in concert with IM epithelial cell state transitions.
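For readers less familiar with the statistic used above, Spearman's rho is the Pearson correlation computed on ranks, with tied values given their average rank. The sketch below shows that computation on hypothetical per-sample values; the variable names and numbers are illustrative and are not study data.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-sample values (not study data)
ta_to_stem_ratio = [0.4, 0.9, 1.3, 0.7, 2.1, 1.8]
iga_plasma_fraction = [0.05, 0.08, 0.07, 0.06, 0.15, 0.11]
print(round(spearman_rho(ta_to_stem_ratio, iga_plasma_fraction), 2))
```

Because only ranks enter the calculation, the statistic captures monotonic rather than strictly linear relationships, which suits cell-proportion data with skewed distributions.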
Interestingly, IM cells with high KRAS–ERK activation also co-expressed other activated pathways such as IL2–STAT5 signaling (Supplementary Table S4), highlighting the possibility of combined pathway targeting (33). To test the possibility of combined MEK/STAT inhibition as an IM therapeutic option, we generated 10 human gastric organoids (six IM and four histologically normal stomach; Supplementary Table S5) and confirmed their IM identities by CDX2 immunostaining (Fig. 3D) and bulk RNA-seq (Fig. 3E). IM organoids exhibited transcriptional similarities with published IM gene expression profiles (34), marked by overexpression of intestinal markers such as REG4, FABP1, and CDX1, and downregulation of gastric markers SOX2, GKN1, and CLDN18. These hybrid profiles differentiated the IM organoids from in-house or published profiles of normal organoids derived from gastric or colonic tissues (Fig. 3E). Notably, KRAS–ERK signaling was significantly enriched in IM organoids derived from patients with severe IM compared with those from patients with histologically less severe IM (Fig. 3F). Treatment with pyrvinium pamoate, an inhibitor of both MEK/ERK and STAT signaling (33), revealed that IM organoids demonstrated greater growth sensitivity to pyrvinium compared with normal organoids (Fig. 3G; Supplementary Fig. S1A). In terms of clonogenic potential, normal and IM organoids exhibited comparable baseline colony-forming capacity. However, pyrvinium treatment significantly impaired colony-forming ability in IM organoids but had minimal impact on normal gastric organoids (Supplementary Fig. S1B). Western blot analysis confirmed that pyrvinium treatment caused a reduction in phosphorylated ERK (p-ERK) levels in all IM organoids, whereas phosphorylated STAT3 (p-STAT3) showed no consistent changes (Supplementary Fig. S1C), suggesting that pyrvinium primarily suppresses the MEK/ERK pathway in IM, as previously reported (33).
To further investigate the mechanistic basis of pyrvinium sensitivity in IM, we expanded the drug response analysis to include inhibitors targeting ERK and STAT3 signaling. However, when we evaluated the sensitivity of normal and IM organoids to ERK (ulixertinib) and STAT3 inhibitors (STAT3-IN-1), neither the single-pathway inhibitors nor their combination showed clear differences in sensitivity between IM and normal organoids, suggesting that additional mechanisms may contribute to the heightened sensitivity of IM organoids to pyrvinium (Supplementary Fig. S1D). To provide a more exploratory perspective, we performed gene set enrichment analysis using Gene Ontology and Hallmark pathway analysis (Supplementary Table S6). In KRAS-/MAPK-mutated IM compared with wild-type IM, we observed significant upregulation of Hallmark pathways such as mTORC1 signaling (adjusted P value, 8.3 × 10−18), whereas in the comparison between severe and mild IM organoids, Gene Ontology analysis revealed enrichment of mitochondrial gene expression (adjusted P value, 7.7 × 10−10), consistent with a metabolic shift toward higher oxidative phosphorylation (OXPHOS). Although previous studies have reported that pyrvinium induces cell death primarily in dysplastic organoids (33), our findings highlight its potential to selectively diminish cell viability in IM organoids from patients without cancer. In summary, our findings suggest that activating mutations in KRAS/MAPK pathway genes are present in a subset of IM lesions and associated with increased KRAS–ERK pathway activity. Functional studies in human IM organoids further support that drugs such as pyrvinium, which target KRAS/MEK/ERK signaling and other pathways, can selectively reduce IM cell viability, highlighting potential therapeutic interventions to inhibit IM progression.
SBS17 Is an IM-Associated Mutational Signature
Although WGS analyses of gastric cancers and normal gastric tissues have been widely reported (25, 35), few reports have analyzed IM WGS, highlighting a critical knowledge gap in this intermediate stage of gastric carcinogenesis. To identify specific mutational processes in IM, we performed signature analysis using SigProfilerAssignment (36) on 20 paired IM–normal WGS samples (12 Singapore, four Korea, three Japan, and one Hong Kong; Fig. 4A). We defined four predominant mutational single-base substitution (SBS) signatures: SBS1 (clock-like; average SBS exposure 17.0%), SBS5/40 (clock-like; 38.8%), SBS17 (unknown etiology; 27.1%), and SBS18 (oxidative stress; 17.0%). Notably, although all four signatures have been observed in gastric cancer (Fig. 4B), only three have been reported in normal stomach (SBS1, SBS5/40, and SBS18 but not SBS17; ref. 25), raising the possibility that SBS17 may reflect a mutagenic process operative in IM and gastric cancer but not normal gastric epithelium. SBS17-associated IM mutations tended to have lower VAFs than other SBS mutations (Wilcoxon test, P < 0.05; Fig. 4C), consistent with SBS17 processes causing sub-clonal mutational events in early stages of IM.
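Signature exposures such as those quoted above come from fitting an observed mutation spectrum as a non-negative mixture of reference signatures. The toy sketch below illustrates that idea with least-squares multiplicative updates. It is a simplified stand-in and not the algorithm implemented in SigProfilerAssignment; the three-channel signatures, the counts, and the function name are all hypothetical.

```python
def fit_exposures(spectrum, signatures, n_iter=500):
    """Estimate non-negative exposures h so that sum_k h[k]*signatures[k]
    approximates `spectrum` (least squares, multiplicative updates).

    Simplified stand-in for signature assignment; real tools use 96
    trinucleotide channels and reference catalogs such as COSMIC.
    """
    n_sig, n_chan = len(signatures), len(spectrum)
    h = [sum(spectrum) / n_sig] * n_sig  # start from an even split
    for _ in range(n_iter):
        fitted = [sum(h[k] * signatures[k][c] for k in range(n_sig))
                  for c in range(n_chan)]
        new_h = []
        for k in range(n_sig):
            num = sum(signatures[k][c] * spectrum[c] for c in range(n_chan))
            den = sum(signatures[k][c] * fitted[c] for c in range(n_chan))
            new_h.append(h[k] * num / den if den > 0 else 0.0)
        h = new_h
    return h

# Toy 3-channel signatures (hypothetical; each sums to 1)
sig_a = [0.7, 0.2, 0.1]
sig_b = [0.1, 0.3, 0.6]
# Spectrum built from 30 mutations of sig_a plus 70 of sig_b
spectrum = [30 * a + 70 * b for a, b in zip(sig_a, sig_b)]
exposures = fit_exposures(spectrum, [sig_a, sig_b])
fractions = [e / sum(exposures) for e in exposures]
print([round(f, 3) for f in fractions])
```

Dividing each fitted exposure by the total gives per-signature fractions, the same kind of quantity as the average exposure percentages reported for SBS1, SBS5/40, SBS17, and SBS18.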
Figure 4.
Mutational signatures in IM. A, Mutational signatures identified in 20 paired IM samples using WGS. Samples are ordered left to right based on mutation counts attributed to SBS17. B, Comparison of mutational signatures across 79 normal gastric samples (Wellcome Trust Sanger Institute), 20 IM samples (this study), and 122 gastric cancers (ICGC China). C, Median VAFs of autosomal chromosome mutational signatures in IM WGS samples. Samples are ordered by the median allele frequency of all SBS mutations. Asterisks denote significant differences between VAFs of SBS17 and non-SBS17 mutations. D, Analysis of mutational signatures in IM samples stratified by replication timing. E, Gene set enrichment analysis of genes associated with the Hallmark OXPHOS pathway, comparing six IM organoids with four normal organoids. F, OCR across time for normal and IM organoids. Dotted lines indicate the addition of mitochondrial inhibitors (oligomycin A, FCCP, and rotenone + antimycin A). P values were estimated using linear mixed-effects models. G, Boxplot showing 8-oxo-dG levels in IM and normal organoids. Each organoid was measured in six technical replicates. Replicate outliers, defined within each organoid as 8-oxo-dG levels exceeding the mean ± 1.5 SD, were excluded from analysis. P values were estimated using a linear mixed-effects model. H, Correlation of mutational signatures with patient age and tobacco smoking status. GC, gastric cancer.
To investigate the clonal architecture of IM, we used SciClone to estimate the number of distinct clones per sample. Eight IMs showed no evidence of subclonal architecture, whereas the remaining 12 displayed multiclonality (11 samples with two clones and one sample with three clones), particularly in IMs with higher overall VAFs (Supplementary Fig. S2A). Notably, the smallest subclone in these samples had an average VAF of approximately 9.9%, near the detection limit of our 60× WGS coverage. This suggests that IM samples appearing clonal in nature may harbor additional, undetected subclones at lower VAFs. To examine genomic landmarks associated with the SBS mutational signatures, we explored associations between DNA replication timing and SBS mutations using publicly available replication timing data (37). Dividing the human genome into quartiles by replication timing, we observed a modest increase in mutational frequency at late-replicating regions for SBS18 (5.0×), SBS1 (2.0×), and SBS5/40 (3.0×), consistent with higher mutation rates occurring at late-replicating regions due to reduced DNA repair activity. Far exceeding these modest increases, however, we observed a striking enrichment (14.5×) of SBS17 mutations in the same late-replicating regions (Fig. 4D). This observation raises the possibility that SBS17 may be linked to the damage of cellular nucleotide pools due to OXPHOS, a metabolic process strongly activated in IM stem cells (14). Damaged nucleotides such as 8-oxo-dG tend to be incorporated into late-replicating DNA, resulting in T>G mutations characteristic of SBS17 (38). Consistent with this hypothesis, IM organoids expressed higher levels of OXPHOS pathway signatures compared with normal organoids (Fig. 4E), and at the biochemical level Seahorse mitochondrial stress testing demonstrated that both basal and maximal OXPHOS activities, measured by oxygen consumption rates (OCR), were significantly elevated in IM organoids (Fig. 4F; basal respiration P = 0.035, maximal respiration P = 0.015). Using ELISAs, we confirmed 8-oxo-dG nucleoside levels in IM organoids (Fig. 4G), and to further establish an association between OXPHOS and DNA mutations, we treated a normal organoid with dichloroacetate (DCA), a compound that enhances OXPHOS via inhibition of pyruvate dehydrogenase kinase. DCA treatment increased OCR, elevated levels of reactive oxygen species (ROS), and caused DNA damage as assessed by γH2AX fluorescence (Supplementary Fig. S2B–S2D). However, we acknowledge that although our findings support an association between heightened OXPHOS activity, oxidative stress, and DNA damage, they do not definitively establish a mechanistic link between OXPHOS and SBS17 mutations in IM. This represents a potential avenue for future research (see “Discussion”).
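The replication-timing comparison described above can be illustrated with a minimal sketch. It assumes that genome bins are of equal size, that each mutation has already been annotated with the replication-timing value of its bin, and that larger values mean later replication (some public datasets use the opposite convention). The function names and toy values are hypothetical and do not reproduce the pipeline used in this study.

```python
from collections import Counter

def quartile_edges(bin_timings):
    """Return the three cut points that split genome bins into equal quartiles."""
    ordered = sorted(bin_timings)
    n = len(ordered)
    return [ordered[(n * k) // 4] for k in (1, 2, 3)]

def assign_quartile(timing, edges):
    """Quartile index 0..3; 0 = earliest, 3 = latest (assumed convention)."""
    return sum(timing >= e for e in edges)

def late_vs_early_fold(bin_timings, mutation_timings):
    """Ratio of mutation counts in the latest vs. earliest quartile of bins.

    Because the quartiles hold equal numbers of equal-sized bins, the ratio
    of counts equals the ratio of mutation densities.
    """
    edges = quartile_edges(bin_timings)
    counts = Counter(assign_quartile(t, edges) for t in mutation_timings)
    if counts[0] == 0:
        raise ValueError("no mutations in the earliest quartile")
    return counts[3] / counts[0]

# Toy example: 100 bins with timing values 0..99 and 8 mutations.
fold = late_vs_early_fold(list(range(100)), [5, 10, 60, 80, 85, 90, 95, 99])
print(fold)
```

Applying such a function separately to the mutations attributed to each signature would yield one fold value per signature, analogous to the per-signature comparison in Fig. 4D.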
To assess the distribution of SBS signatures across geographies, we then mapped the four SBS signatures onto the targeted sequencing data. Among the 1,095 antrum IMs, we identified 1,028 (93.9%), 1,013 (92.5%), 317 (28.9%), and 287 (26.2%) samples with SBS5/40, SBS1, SBS18, and SBS17 exposures, respectively. The lower prevalence of SBS18 (28.9% vs. 100%) and SBS17 (26.2% vs. 60%) mutations in our targeted panel is likely attributable to the preference for these mutations to occur at late-replicating noncoding regions, which are not captured by the sequencing panel because it focuses on coding regions. We also compared “meta-samples,” representing averaged mutational profiles between countries, using the targeted panel data (Supplementary Table S7). SBS17 was consistently detected in 26.2% of IMs across geographic regions (range: 12.5% in Taiwan to 51.5% in Japan). Comparing mutational profiles between high-risk and low-/intermediate-risk countries, we found that high-risk regions have significantly higher mutation rates across all detected signatures (overall mutation rate: 1.38×, P = 3.2 × 10−11; SBS1: 1.36×, P = 5.0 × 10−8; SBS5/40: 1.32×, P = 3.3 × 10−7; SBS17: 2.16×, P = 2.9 × 10−7; and SBS18: 1.56×, P = 1.2 × 10−4). SBS17 was notably more prevalent in high-risk countries (5.9% vs. 3.8%, 1.57-fold enrichment, P = 8.5 × 10−6). This meta-sample analysis supports the prevalence and reproducibility of all signatures, including SBS17, in IM samples from multiple countries.
To assess epidemiologic features associated with the SBS signatures, we tested their correlation with patient age. SBS1 (Pearson r = 0.27; P < 2.2 × 10−16), SBS5/40 (Pearson r = 0.22; P < 2.2 × 10−16), and SBS18 (Pearson r = 0.11; P = 4.0 × 10−4) were correlated with age, whereas SBS17 was not (Pearson r = 0.056; P = 0.064 and Spearman correlation = 0.029; P = 0.33), consistent with previous reports. To further discount the possibility of the data being skewed by outliers, we conducted the analysis both before and after removing outliers, in which outliers were defined as samples with mutation counts exceeding one IQR above the third quartile. Consistently, SBS1, SBS5/40, and SBS18 exhibited significant correlations with age across both methods and scenarios whereas no significant association was observed between SBS17 and age (Supplementary Table S8). In our dataset, we identified a significant positive association between SBS17 and tobacco smoking (Wilcoxon test, P = 6.2 × 10−3; Fig. 4H). These analyses highlight different mutational processes contributing to the mutational landscape of IM, with SBS1, SBS5/40, and SBS18 representing early, prevalent signatures, whereas SBS17 may reflect later, more specific exposures resulting from oxidative stress and tobacco smoking (39), the latter being a potentially modifiable and preventable exposure. In summary, our analyses highlight SBS17 as a specific mutational signature associated with the development of IM, distinct from other signatures that correlate with chronological age. This supports the potential of SBS17 as a biomarker of gastric cancer progression, rather than reflecting gradual accumulation of endogenous DNA damage over an individual’s lifespan.
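The outlier rule used in the age analysis (mutation counts exceeding one IQR above the third quartile) can be sketched as follows. This is an illustrative example only: the quartile estimator is an assumption (Python's default "exclusive" method, which may differ from the one used in the study), and the function names and toy data are hypothetical.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def drop_high_outliers(ages, counts, k=1.0):
    """Remove samples whose mutation count exceeds Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(counts, n=4)
    cutoff = q3 + k * (q3 - q1)
    kept = [(a, c) for a, c in zip(ages, counts) if c <= cutoff]
    return [a for a, _ in kept], [c for _, c in kept]

# Toy data: one sample with an extreme signature mutation count.
ages = [40, 50, 60, 70, 80, 55]
counts = [2, 3, 4, 5, 6, 100]

r_before = pearson_r(ages, counts)
ages_kept, counts_kept = drop_high_outliers(ages, counts)
r_after = pearson_r(ages_kept, counts_kept)
print(round(r_before, 3), round(r_after, 3))
```

Reporting the correlation both before and after filtering, as in the text, shows whether a single extreme sample drives or masks an association with age.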
Late-Replicating IM Mutations Coincide with DNA Hypomethylation in Developing Gastric Cancer
Alterations in DNA methylation have been reported to increase gastric cancer risk through “epigenetic field cancerization” (40). To investigate the influence of altered DNA methylation patterns on the IM genomic landscape, we then performed genome-wide EM-seq on a cohort of 14 patients, each of whom had patient-matched normal gastric tissues, dysplastic lesions, and early-stage gastric cancer samples (38 gastric samples). Gastric cancer samples were assigned to TCGA genomic subtypes based on WES data (14). Principal component analysis (PCA) and hierarchical clustering of the genome-wide methylation data revealed two clusters. One cluster comprised dysplastic/gastric cancer lesions enriched with features of IM-associated gastric cancer such as chromosomal instability (CIN) or microsatellite instability (MSI; 10 of 14), whereas the other cluster comprised normal or dysplastic/gastric cancer samples lacking genomic features of CIN, MSI, or Epstein–Barr virus (22 of 24; Fisher test P = 1.1 × 10−4; Supplementary Fig. S3A and S3B). We observed widespread genome-wide methylation alterations involving more than seven million CpG sites, the majority of which were hypomethylated [6,741,487 hypomethylated (35.7%) and 326,646 hypermethylated (1.7%) of total tested CpG dinucleotides; q-value < 0.001, methylation change >10%] in the CIN+/MSI+ cluster compared with the CIN−/MSI− cluster (Supplementary Fig. S3C). In a region-based (1 kb bin) analysis, 750,775 (38.4%) regions were hypomethylated whereas 11,273 (0.57%) regions were hypermethylated (q-value < 0.001, methylation change >10%). Consistent with findings from other cancer studies, hypermethylated CpGs and regions were enriched in CpG islands (CpGs 27.7% vs. 0.3%; regions 58.6% vs. 0.37%) and gene promoters (±2 kb from RefGene’s transcription start site; CpGs 20.7% vs. 3.1%; regions 38.3% vs. 3.6%) and were depleted from intergenic regions (CpGs 30.2% vs. 58.9%; regions 28.0% vs. 61.0%; Supplementary Fig. S3D).
Comparing CIN (n = 4) and MSI (n = 8) samples, we observed a larger proportion of shared hypomethylated CpG sites between CIN and MSI samples (44.7%) compared with hypermethylated sites (17.4%; Supplementary Fig. S3E). This pattern is consistent with hypomethylation changes representing core features of gastric cancer progression whereas hypermethylation changes are more specific to subtypes. MSI dysplasia/gastric cancer samples showed more hypermethylation changes, consistent with the acquisition of the MSI-linked CpG Island Methylator Phenotype (CIMP).
Of particular relevance to this study, hypomethylated regions showed a strong association with late-replicating regions, particularly for hypomethylated regions commonly observed across subtypes (Supplementary Fig. S3F). When compared with IM genome-wide mutation landscapes, we observed a substantial overlap between IM somatic mutations with gastric cancer/dysplasia hypomethylated regions (Supplementary Fig. S3G), particularly for SBS17 mutations which exhibited the highest specificity (75.7%) within these hypomethylated regions. When stratified by methylation regions, the presence of SBS17 mutations was further increased in late-replicating hypomethylated regions (Supplementary Fig. S3H and S3I). These observations strengthen the connection between replication timing and DNA damage in premalignant gastric cancer and suggest that hypomethylated regions in late-replicating genomic regions may be particularly vulnerable to acquiring mutations common in IM and gastric cancer.
Germline Analysis of IMs Reveals CH as a Risk Factor for Gastric Cancer Progression
Little is known about germline genetic variants contributing to IM (41). We proceeded to analyze the germline targeted sequencing data (from blood and saliva samples) for pathogenic or likely pathogenic germline variants. Examining germline variant patterns across the six countries, more than 30% of SNPs were shared across all cohorts, indicating substantial genetic overlap (Supplementary Fig. S4A–S4C). Among the 47 somatic driver genes, we identified protein-altering germline variants in 44 genes, including eight pathogenic/likely pathogenic variants (by ClinVar), affecting 11 individuals (Supplementary Table S9). Individuals with co-occurring germline and somatic variants in BCORL1 were at higher risk of either developing or exhibiting concurrent EGN and dysplasia (Fisher exact test, P-value = 0.033 and 0.0048, OR = 3.72 and 3.97, respectively). Additionally, individuals with co-occurring germline and somatic variants in BCOR and DDX3X were at higher risk of progressing to dysplasia (Fisher exact test, P-value = 0.039 and 0.029, OR = 2.71 and 4.00, respectively).
CH is a condition in which blood cell clones harboring somatic mutations can expand over time, and CH has been associated with increased risk of blood cancer and cardiovascular disease (42). Utilizing the availability of germline DNA information from blood or saliva samples, we adapted our variant detection workflow to detect CH variants by subtracting variants in normal samples from those in IM samples, enabling the detection of somatic mutations in the germline (this approach is similar to methods used in other studies; refs. 43, 44). We identified four significant CH driver genes (DNMT3A, TET2, ASXL1, and PPM1D), which are the most prevalent mutated genes in CH (Fig. 5A; ref. 45). Providing confidence in our variant detection workflow, the mutational spectrum of the driver genes aligned with published CH mutation patterns from a large pan-cancer dataset of 24,146 patients (predominantly White; matched tumor-blood MSK-IMPACT sequencing; ref. 43). Specifically, we observed recurrent DNMT3A missense mutations at codon R882, as well as truncating mutations in TET2, ASXL1, and the 3′ region of the PPM1D coding sequence (Fig. 5B). In total, we identified 286 mutations in reported CH genes across 225 IM subjects. Performing clinicopathologic associations, we found that CH carriers tended to be older (Wilcoxon test, P value 2.7 × 10−12) and exhibited higher IM somatic mutation rates (Wilcoxon test, P value 1.3 × 10−2; Fig. 5C), suggesting a potential link between the presence of CH and increased IM somatic mutation burden. Certain lifestyle factors, such as smoking, were also associated with specific CH mutations. For example, ASXL1 mutations were more frequent among smokers (4.9% former/current smoker vs. 0.9% never smoker; Fisher test, P value 1.7 × 10−4), consistent with previous reports (Fig. 5D; refs. 43, 46).
Figure 5.
Clonal hematopoiesis in IM samples. A, Oncoprint plot displaying CH mutations across 1,067 subjects with IM, ordered by increasing age. B, Lollipop plots comparing mutations in key CH genes with reported CH mutations in patients with cancer in the MSKCC cohort (GCEP: 1,067 subjects with IM, MSKCC: 24,146 subjects with cancer). C, Association of IM samples harboring CH mutations with patient age and somatic mutation rate (234 CHIP-positive IM samples and 861 CHIP-negative IM samples). D, Association of key CH genes with smoking status in patients with IM (1,067 subjects with IM). E, Univariable and multivariable logistic regression analysis of CH as a risk factor for progression to dysplasia/EGN (n = 765 unique patients).
We performed logistic regression to investigate associations between gastric cancer risk and age, lifestyle, CH with indeterminate potential [CHIP; defined in other studies as CH with VAF > 2% (47)], high CH (VAF > 5%), and other clinicopathologic features. In the combined dataset (n = 765 unique IM subjects), both CHIP (VAF > 2%; P = 0.032) and high CH (VAF > 5%; P = 9.2 × 10−4) were associated with progression to dysplasia along with mutation rate (P = 6.8 × 10−4), ARID1A truncations (P = 0.012), age (P = 8.6 × 10−3), male sex (P = 0.026), and OLGIM stage (stage III, P = 2.9 × 10−3; stage IV, P = 0.011). Multivariable analysis confirmed that high CH (P = 0.045), male sex (P = 0.049), and OLGIM stage III (P = 0.015) were independently associated with dysplasia. When restricted to EGN (high-grade dysplasia or gastric cancer), high CH (P = 6.4 × 10−3) and ARID1A truncations (P = 6.7 × 10−3) remained as significant independent factors associated with IM progression by multivariable analysis (Fig. 5E; Supplementary Table S10). When restricted to GCEP1000 samples with long-term follow-up (n = 312), high CH remained an independent predictor for EGN (Supplementary Table S10). As controls, we confirmed that sequencing coverage in both IM and germline samples was not significantly associated with the presence of dysplasia or EGN (Fig. 5E; Supplementary Table S10), and to ensure that variability in sequencing depth did not bias CH or IM mutation detection, we repeated all analyses after excluding samples with sequencing coverage outside ±1 SD from the mean (for either IM or germline). The results remained consistent, indicating that our findings are robust to potential variations in sequencing depth.
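The two VAF thresholds used as regression covariates are nested: any subject meeting the high-CH cutoff also meets the CHIP cutoff. A minimal sketch of how the two indicator variables could be derived is shown below. It assumes, hypothetically, that a subject is classified by the largest VAF among their CH driver variants; the function name and this per-subject rule are illustrative and are not taken from the study's workflow.

```python
def ch_flags(vafs, chip_cutoff=0.02, high_cutoff=0.05):
    """Derive CHIP and high-CH indicators from a subject's CH variant VAFs.

    vafs: VAFs (as fractions, e.g. 0.03 for 3%) of the subject's CH variants.
    Both thresholds are strict inequalities, matching VAF > 2% and VAF > 5%.
    """
    top = max(vafs, default=0.0)  # assumption: classify by the largest clone
    return {"chip": top > chip_cutoff, "high_ch": top > high_cutoff}

print(ch_flags([0.01, 0.03]))  # meets CHIP only
print(ch_flags([0.08]))        # meets both CHIP and high CH
print(ch_flags([]))            # no CH variants detected
```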
The absolute annual risk of progression from IM to gastric cancer is modest (between 0.18% and 0.25% per year; ref. 3), indicating that although IM increases gastric cancer susceptibility, most lesions do not progress. This low progression rate underscores a critical clinical challenge: although the majority of patients with IM will not develop gastric cancer, a small subset faces significantly elevated risk. We thus explored whether incorporating genomic information could improve risk prediction for EGN among patients with IM (Supplementary Fig. S5). In our cohort of 765 subjects with IM (OLGIM stage II–IV), 26 progressed to EGN. Using clinical variables alone (OLGIM stage, age, and sex), the predictive model achieved an AUC of 0.72. When genomic variables (mutation count, ARID1A mutations, and high CH) were added, the AUC increased to 0.773. Importantly, in the GCEP1000 subset with longer-term follow-up, the improvement was more pronounced: the AUC increased from 0.671 (clinical only) to 0.811 (clinical + genomic; Supplementary Fig. S5). These results indicate that combining molecular and clinical data may enhance risk stratification for EGN, offering a promising approach for more accurately identifying patients with IM who may benefit from targeted endoscopic surveillance. Taken together, our work underscores that although most patients with IM do not progress, there is a compelling need to identify the subgroup at heightened risk. By incorporating molecular markers into predictive algorithms, we can better pinpoint high-risk individuals for more targeted surveillance.
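For readers less familiar with the metric, the AUC compared above is the probability that a randomly chosen progressor receives a higher predicted risk than a randomly chosen non-progressor. The sketch below computes it directly from that definition. The risk scores are invented toy values, the function name is hypothetical, and the example does not reproduce the model fitting performed in this study.

```python
def auc(scores, labels):
    """AUC as the fraction of (progressor, non-progressor) pairs ranked correctly.

    scores: predicted risks; labels: 1 = progressed to EGN, 0 = did not.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one subject in each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy cohort: three non-progressors followed by two progressors.
labels = [0, 0, 0, 1, 1]
clinical_only = [0.1, 0.4, 0.6, 0.5, 0.7]          # hypothetical risks
clinical_plus_genomic = [0.1, 0.3, 0.4, 0.6, 0.9]  # hypothetical risks

print(auc(clinical_only, labels), auc(clinical_plus_genomic, labels))
```

With only 26 EGN events in the cohort, AUC estimates of this kind carry wide uncertainty, so the pairwise definition is a useful reminder that each progressor contributes heavily to the estimate.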
CH Expansions Are Associated with Altered IM Microbiome–Immune Landscapes
To explore how CH might contribute to IM somatic alterations and gastric carcinogenesis, we explored the IM driver genes and found that PIGR mutations were more frequent in patients with IM with high CH (Fig. 6A; Fisher test, OR = 2.3; P = 0.031). Specifically, high CH was significantly associated with PIGR truncating mutations (Fisher test, OR = 2.8; P value 0.017) whereas PIGR missense or in-frame indels did not show a significant association (Fig. 6B; Fisher test, OR = 1.7, P value 0.43). We did not observe a significant association between PIGR expression and PIGR mutational status (Wilcoxon test, P = 0.43). Among CH driver genes, PIGR truncating mutations were associated with TET2 mutations (Fisher test, OR = 2.9; P value 0.044; Fig. 6C).
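The association tests reported here are Fisher exact tests on 2 × 2 tables (for example, high CH vs. non-/low CH against PIGR truncating mutation present vs. absent). A minimal, standard-library sketch is given below. The counts are invented for illustration, the function name is hypothetical, and the odds ratio shown is the simple sample odds ratio, which can differ slightly from the conditional estimate reported by some statistical packages.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Sample odds ratio and two-sided Fisher exact P for [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Two-sided P: sum over all tables no more probable than the observed one.
    p_value = sum(p for p in map(prob, range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)

# Hypothetical counts: rows = high CH / other, columns = mutated / wild type.
odds_ratio, p_value = fisher_exact_2x2(3, 1, 1, 3)
print(odds_ratio, round(p_value, 4))
```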
Figure 6.
CH and immune–microbiome landscapes in IM. A, Forest plot illustrating IM driver genes associated with high CH. For activating driver mutations, we evaluated only missense/inframe mutations in IM samples. For loss-of-function drivers, we evaluated only truncating mutations. Ambiguous drivers were tested for all protein-altering somatic mutations. B, Types of PIGR mutations associated with CH. C, CH genes co-occurring with PIGR mutations in IM samples. D, Proposed model illustrating interactions between CH, the bacterial microbiome, and immune cells within the IM microenvironment. E, Distribution of immune cell types in IM samples harboring high (n = 3) and low (n = 10) CH, based on scRNA-seq data. The boxplot represents total immune cells in the IM microenvironment in CH-high (n = 3) and CH-low samples (n = 10). The barplot represents the log2 fold change (FC) of individual cell types in the microenvironment. F, Distribution of IgA+ plasma cells and mature T cells in IM samples harboring high (n = 20) and low (n = 94) CH, based on deconvoluted bulk RNA-seq data. Scatterplot represents correlation of IgA+ plasma cells with bacterial and human reads. G, Differential bacterial genera abundance between high- and low-CH patients with IM, based on bulk RNA-seq samples (n = 104 samples; 10 samples with bacterial abundance less than 0.02% were excluded from analysis). H, FISH detection of Streptococcus (genus; red) and Streptococcus anginosus (Sa; species; green) in human gastric cancer tissue with DAPI nuclear counterstain (blue). Representative regions from Streptococcus-negative and -positive gastric cancer samples are shown. Scale bar, 100 μm. I, Spatial location of Streptococcus infection in one gastric cancer sample, co-localized with CXCL8 gene expression.
PIGR, or polymeric immunoglobulin receptor, is a protein involved in the mucosal immune response; its main function is to transport polymeric immunoglobulins (mainly IgA) across epithelial cells to mucosal surfaces such as those lining the respiratory, gastrointestinal, and urogenital tracts (48). Once transported, these immunoglobulins are secreted as part of the body’s defense mechanism against pathogens, helping to maintain mucosal immunity and protection against infections. The association of PIGR mutations with CH raises the possibility that CH might promote IM progression by altering host mucosal immunity (49). We hypothesized that disruptions or mutations in PIGR, particularly truncating mutations, could initially impair mucosal immunity, leading to the overgrowth of pathogenic microbes. To test this model, we analyzed our targeted sequencing data to compare bacterial abundances in IM samples with and without PIGR mutations. We found that IM samples harboring PIGR truncating mutations exhibited a higher abundance of oral bacteria such as Streptococcus (one-sided Wilcoxon test, P = 0.034) but not Helicobacter (one-sided Wilcoxon test, P = 0.80; Supplementary Table S11). This suggests that PIGR alterations may be linked to the expansion of certain oral bacteria such as Streptococcus.
We hypothesized that CH-driven chronic inflammation could then further worsen bacterial infection, leading to chronic inflammation and IM progression (Fig. 6D). We further tested this model using four approaches. First, we asked whether high CH levels are associated with a disrupted inflammatory environment in the stomach. Specifically, we interrogated scRNA-seq samples (13 samples, three with high CH) and observed that subjects with IM with high CH showed an upregulation of immune cell types (Wilcoxon test, P = 0.049), driven by increased IgA+ plasma cells and mature T cells within the IM microenvironment (Fig. 6E). To validate this finding, we then analyzed our bulk RNA-seq data for evidence of changes in inflammation associated with CH. In 20 IM samples with high CH (VAF > 5%) compared with 94 non-/low-CH IM samples with targeted DNA and RNA-seq profiles, we again observed higher levels of IgA+ plasma cells (Wilcoxon test, P = 6.9 × 10−3) and mature T cells (Wilcoxon test, P = 3.4 × 10−3) in the CH IM microenvironment, consistent with the scRNA-seq results (Fig. 6F). To account for group size imbalances, we conducted down-sampling analysis, performing 1,000 iterations in which the 20 CH-high samples were compared with 20 randomly selected low-CH samples in each iteration. High-CH samples exhibited a consistently higher median IgA+ plasma cell abundance in 99.4% of iterations (proportion P value = 0.006).
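The down-sampling procedure described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions: the low-CH group is subsampled without replacement to the size of the high-CH group, and the statistic is the fraction of iterations in which the high-CH median exceeds the subsample median. The function name, seed, and toy abundances are hypothetical; the reported proportion P value appears to correspond to one minus this fraction.

```python
import random
import statistics

def downsample_fraction(high, low, iterations=1000, seed=0):
    """Fraction of iterations in which median(high) > median(random low subset).

    Each iteration draws len(high) low-group samples without replacement,
    so both groups are compared at equal size.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    high_median = statistics.median(high)
    hits = 0
    for _ in range(iterations):
        subset = rng.sample(low, len(high))
        if high_median > statistics.median(subset):
            hits += 1
    return hits / iterations

# Toy abundances: a small high-CH group and a larger low-CH group.
high_ch = [0.30, 0.35, 0.40, 0.28]
low_ch = [0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.33, 0.18, 0.11, 0.14]

fraction = downsample_fraction(high_ch, low_ch)
print(fraction, 1 - fraction)
```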
Second, consistent with inflammation being linked to microbial composition, the proportion of inferred IgA+ plasma cells correlated (Pearson r = 0.43; P value 1.1 × 10−5) with the abundance of bacterial reads but not the abundance of human reads (r = 0.056; P value 0.59). We noted increases of several bacterial genera, including Streptococcus (Wilcoxon test, P = 8.2 × 10−3), Neisseria (P = 1.2 × 10−2), Gemella (P = 1.1 × 10−3), Actinomyces (P = 0.028), Haemophilus (P = 0.031), Porphyromonas (P = 0.026), and Fusobacterium (P = 7.9 × 10−3; Fig. 6G). To further link microbial abundance to inflammation, we performed Gram staining on gastric cancer sections exhibiting high oral microbiome read counts and confirmed focal Gram-positive cocci in a subset of cases (four of 15). Consecutive sections confirmed neutrophil infiltration (MPO) with CXCL8- and CXCL2-positive immune cells at these bacterial loci (Supplementary Fig. S6A–S6F), indicative of inflammation.
Third, to confirm the oral cavity as a potential source for these bacteria, we analyzed paired targeted DNA sequencing data from IM and saliva (32 samples; USA cohort) and confirmed that these CH-expanded bacterial genera are among the most prevalent bacterial genera in the saliva (Supplementary Table S12). We independently confirmed these observations in a Singapore cohort using a shotgun metagenomics sequencing dataset of 173 saliva samples and found that these bacterial genera, normally abundant in the oral cavity, are often increased in patients with IM with high CH. Among the 20 most abundant genera in the saliva identified using shotgun metagenomics, 13 were also detected using IM RNA-seq. Of these 13 genera, all exhibited increased abundance in patients with IM with high CH, with nine genera showing statistically significant increases (Wilcoxon test, P value < 0.05; Supplementary Table S12).
Fourth and finally, to directly examine how oral bacteria might modulate host–microbe immune interactions in IM/gastric cancer, we noted a recent report implicating Streptococcus anginosus (Sa) in gastric cancer progression (9). Performing FISH experiments using genus-level Streptococcus and species-specific Sa probes in gastric cancer tissues with high bacterial reads, we confirmed the presence of Sa in gastric cancer tissues with good concordance between the two probes at bacterial loci (Fig. 6H; Supplementary Fig. S6G). To our knowledge, this is the first in situ detection of Sa reported in human gastric cancer tissues by FISH. We then performed spatial enhanced resolution omics sequencing (Stereo-seq) spatial transcriptomic analysis on four gastric cancer samples, using a methodology enabling detection of both human and bacterial sequences. In two gastric cancer samples, we detected spatial clusters of bacterial reads mapping to the Streptococcus genus (Supplementary Fig. S7). In one gastric cancer sample (D04613G6), these Streptococcus regions overlapped with areas exhibiting elevated expression of CXCL8 (Fig. 6I), a key mediator of neutrophil recruitment and inflammation, suggesting local activation of inflammatory pathways. In the other sample (Y01514A6), we detected bacterial reads mapping to Streptococcus overlapping intestinal-type epithelial regions (PIGR, REG4, and ANPEP) with concurrent IgA+ plasma cell signatures (IGHA1, IGKC, and JCHAIN), again consistent with increased inflammation (Supplementary Fig. S7). These preliminary observations support the hypothesis that CH may be associated with altered mucosal immunity and expansion of oral bacteria such as Streptococcus during IM progression, which may contribute to gastric cancer tumorigenesis by fostering a proinflammatory microenvironment.
Notably, given the limited sample size (Stereo-seq and scRNA-seq) and exploratory nature of these analyses, further studies in larger and more diverse cohorts are required to validate these associations. In summary, our analyses suggest that CH expansions, associated with PIGR mutations, are linked to shifts in the IM microbiome and increased inflammation, potentially accelerating progression along later stages of the Correa cascade.
Discussion
Here, we present a comprehensive genomic analysis of more than 1,500 IM samples from six different countries associated with varying gastric cancer risk profiles. By pooling samples from various international sources, we substantially increased the current number of known IM driver genes, reflecting the scientific utility of studying samples from diverse populations and geographies. Importantly, many of our results were observed in at least two countries, including Hp variants, driver mutation profiles, mutational signatures, and CH patterns. The majority of driver gene mutations were also observed in multiple internal and external validation datasets. Notably, we observed significant differences in driver gene frequencies in high-risk (e.g., ARID1A) and intermediate-risk (e.g., SOX9) populations, raising the possibility that some of these driver genes may contribute to regional differences in gastric cancer incidence. From a translational perspective, one notable finding was the identification of frequent mutations in IM samples of RAS–RAF–MEK–ERK pathway genes. Subsequent gene expression analysis confirmed elevated expression of KRAS–ERK signaling pathways in these mutated IM samples. This elevated signaling may facilitate the transition of normally quiescent IM cells into a proliferative state, consistent with observations that in adult mice, the expression of activated KrasG12D in gastric mucosa can induce metaplastic changes (50, 51). These insights provide a rationale for exploring therapeutic interventions targeting KRAS–ERK signaling to mitigate the risk of gastric cancer progression in patients with IM (32, 33).
Another interesting finding was the observation of SBS17 mutational signatures in IM but not in normal gastric samples, occurring predominantly as late subclonal events. Apart from IM and gastric cancer, SBS17 is also commonly observed in Barrett esophagus and esophageal adenocarcinoma (52, 53), suggesting a shared molecular mechanism in premalignant conditions of the upper gastrointestinal tract. The identification of SBS17 genomic hotspots in IM, associated with late-replicating regions and DNA hypomethylation, supports a potential role for intrinsic oxidative damage affecting nucleotide pools in SBS17 causation. Specifically, aberrant oxidation of guanine bases into 8-oxoguanine (8-oxoG) and subsequent incorporation of 8-oxoG into DNA can lead to 8-oxoG-adenine mismatches and T-to-G mutations (38), a prominent feature of SBS17. Low nucleotide availability can become increasingly exacerbated as replication progresses through the genome, potentially explaining the observed enrichment of SBS17 in late-replicating regions. In addition, DNA incorporation of mispaired 8-oxoG may also cause incomplete maintenance of methylation (54), which may facilitate global DNA hypomethylation in developed gastric cancer. Although the etiology of SBS17 remains unknown in the majority of tumors, some studies have linked SBS17 to 5-fluorouracil chemotherapy and oxidative damage caused by ROS, which causes depletion of nucleotide pools (55, 56). However, as our patients with IM had not undergone cancer therapy, the SBS17 signature in patients with untreated IM may be caused by other sources of endogenous oxidative stress, such as those generated through elevated OXPHOS observed in IM stem cells (14).
Beyond somatic changes in gastric epithelial cells, we leveraged our IM dataset to study patterns of CH and its interaction with the IM inflammatory environment. Specifically, by combining somatic mutations in the germline and IM, as well as clinical and epidemiologic data, we provide to our knowledge the first evidence that IM development and progression are shaped, at least in part, by an intricate interplay between IM host genetics, the microbiome, and the immune microenvironment. Patients with IM with high levels of CH often exhibited concurrent truncating PIGR mutations, which have been proposed to impair mucosal immunity and were associated (in our data) with a greater abundance of IgA+ plasma cells and bacterial reads, supporting the role of bacterial infection in triggering gastric inflammation (57). Recent studies provide compelling evidence that Hp eradication reduces the risk of gastric cancer (58, 59) although some risk remains after eradication (60). Our findings support the potential of considering antibiotic treatments targeting bacteria other than Hp (61), especially in the subset of patients with high CH or other immune factors that may contribute to gastric cancer.
Our finding that high CH is associated with higher gastric cancer risk is consistent with recent studies from the UK Biobank in which CH has been associated with several solid cancer types, including lung and kidney (62). In our study, we defined “high CH” as clonal events with VAF > 5% and CHIP as CH events with VAF > 2% in the absence of cytopenia or hematologic malignancy. Several studies have shown that CH, particularly involving loss-of-function mutations in DNMT3A or TET2, is linked to a hyperinflammatory state characterized by elevated levels of proinflammatory cytokines such as TNFα, IL6, IL1β, and IL8 (63–65). In the context of gastric cancer, we propose that CH-driven inflammatory states may amplify chronic inflammation in IM by impairing mucosal immunity and accelerating bacterial colonization, thereby driving the transition from IM to gastric cancer. Given that CH is a systemic feature which can be monitored from peripheral blood, CH could potentially serve as a valuable biomarker for identifying individuals at higher risk of gastric cancer. Further investigations, including large-scale longitudinal studies, are needed to validate CH as a risk factor in IM-to-gastric cancer progression.
In conclusion, our study offers an integrated perspective on distinct genetic, environmental, and immune factors shaping IM evolution, with direct implications for better understanding gastric cancer risk. Figure 7 integrates these findings into a broader framework, illustrating how pathogen genetic diversity, epithelial and nonepithelial somatic mutations, mutational signatures, and host–microbiome interactions might converge to influence IM evolution and progression. Figure 7 also highlights the translational opportunities derived from our analysis, including the development of biomarkers, such as ARID1A mutations, SBS17, and CH for early IM detection and gastric cancer risk stratification, as well as the therapeutic potential of targeting both the RAS–RAF–MEK–ERK pathway and the gastric microbiome to intercept gastric cancer progression. From a clinical perspective, routine surveillance for all IM cases is neither cost-effective nor practical—particularly in regions with low-to-intermediate gastric cancer incidence. This situation highlights the need for more precise stratification methods that focus on identifying the smaller subset of patients at truly high risk. Despite the relatively small EGN subgroup in our cohort, our study makes important contributions by identifying significant genetic and molecular risk factors for progression from IM, pointing toward actionable biomarkers for more targeted surveillance.
Figure 7.
Key findings from this study, illustrating how pathogen genetic diversity, somatic epithelial and nonepithelial genetic variants, immune alterations, and changes in microbiome composition interact at each stage of the Correa cascade in the progression of gastric carcinogenesis. Translational opportunities for intervention are highlighted in the bottom blue box.
Our study has limitations. First, although GCEP1000 comprises a well-established clinical dataset with at least 5 years of follow-up in the intermediate-risk Singapore population, the international samples from countries with different risk profiles were collected relatively recently, and for this study, clinical outcomes are largely limited to those recorded at the time of baseline enrollment. Ongoing efforts to collect and update clinical outcome data of these international samples may refine our understanding of factors driving IM progression. Second, our reliance on targeted sequencing data may miss noncoding or novel cancer mutations that could play a role in IM progression. Our study prioritized generating high-depth sequencing data (>1,000×) across 277 genes, an approach necessary due to the low clonality of IM driver mutations, which was then supplemented with WGS in a smaller set of samples. However, this design may limit the ability to detect potential driver mutations outside the targeted panel. Third, for our organoid experiments, we primarily relied on pharmacologic inhibition to interrogate pathway dependencies. Genetic perturbation approaches targeting key pathway components would provide more direct mechanistic insights. Fourth, in the mutational signature analysis, although IMs showed increased SBS17 mutations, OXPHOS gene expression, and oxidative stress, these data are correlative. Further studies involving long-term perturbation of OXPHOS with longitudinal mutational readouts may be needed to conclusively establish mechanistic links. Fifth, the microbiome analysis using targeted DNA sequencing is constrained because the targeted panel was neither specifically designed for Hp detection nor optimized to detect other bacterial genera.
Finally, for our scRNA-seq and spatial transcriptomics analyses, our sample size was limited because of the technical and logistic challenges associated with pre-identifying IM/gastric cancer samples enriched for specific features such as CH or distinctive microbiome compositions. These findings therefore require validation in larger, well-characterized cohorts.
Methods
Human Subjects
This study was approved by institutional ethics boards at Gastric Cancer Biomarker Discovery II, Domain Specific Review Board (DSRB) of the National Healthcare Group (ethics approval no: 2005/00440), DSRB of the National Healthcare Group (2000/00329 and 2019/00629), Institutional Review Board (IRB) of the National University of Singapore (H-19-070E), Centralized IRB of Singapore Health Services (2018/3222), IRB of Seoul National University Hospital (H-1906-083-1040), IRB of Yonsei University Wonju Severance Christian Hospital (CR319134), IRB of Nihon University School of Medicine (20191007), Clinical Research Ethics Committee of Joint Chinese University of Hong Kong-New Territories East Cluster (2019.517), IRB of Stanford University (45077), and IRB of National Taiwan University Hospital (201402061RINA). All study subjects provided written informed consent prior to their participation in the study. This study was conducted in accordance with the Declaration of Helsinki.
Targeted Panel Sequencing, WGS, and Analysis
Genomic DNA from frozen tissues and blood samples was extracted using the Wizard Genomic DNA Purification Kit (Promega) or QIAamp DNA Mini Kit (Qiagen) according to the manufacturer’s protocols. The target enrichment platform was the Agilent SureSelect XT HS2 DNA System with Pre-Capture Pooling (Agilent: G9985A, G9985B, G9985C, and G9985D) with a customized tier 2 design. Briefly, 100 ng of DNA from each sample was enzymatically fragmented (Agilent, 5191-4080) before performing end-repair, adapter ligation, and precapture amplification using unique dual indexing primer pairs. The yield and size distribution of each sample were checked using D1000 ScreenTapes (Agilent, 5067–5582). Sixteen samples were pooled in equal amounts to 1.5 μg per hybridization with the custom panel, and hybridized DNA samples were captured using streptavidin-coated beads before amplification. The yield and size distribution of the captured samples were analyzed on High Sensitivity ScreenTapes (Agilent, 5067–5584). Libraries were sequenced on Illumina NovaSeq 6000 sequencers (PE150 bp; RRID: SCR_016387).
WGS libraries were constructed using the New England Biolabs NEBNext Ultra II DNA Library Prep Kit. The genomic DNA was randomly sheared into short fragments, and the obtained fragments were end-repaired, A-tailed, and further ligated with Illumina adapters. The fragments with adapters were PCR amplified, size selected, and purified. The libraries were checked with Qubit (RRID: SCR_018095) and real-time PCR for quantification and on an Agilent bioanalyzer (RRID: SCR_018043) for size distribution detection. Quantified libraries were pooled and sequenced on the Illumina NovaSeq 6000 (PE150 bp) according to the manufacturer’s protocols.
Sequencing data were aligned to the hs37d5 human reference genome using BWA MEM (RRID: SCR_010910). Duplicates were removed with Agilent’s AGeNT tool using molecular barcode information (targeted sequencing) or Picard MarkDuplicates (WGS; RRID: SCR_006525). Aligned BAM files were further processed according to GATK Best Practices (RRID: SCR_001876) guidelines. We used HaplotypeCaller in GVCF mode to call germline mutations and Mutect2 to call somatic mutations.
For targeted deep sequencing, the Mutect2 options “--force-active true --pruning-lod-threshold -4 --max-reads-per-alignment-start 0” were used to improve sensitivity at the expense of runtime. To balance specificity, standard Mutect2 (RRID: SCR_000559) somatic variant filters were applied to remove background germline variations and sequencing artifacts using the Genome Aggregation Database (RRID: SCR_014964) germline resource and a panel of normals consisting of all germline samples profiled on the same panel. Additional filters included checking for cross-sample contamination (GATK4’s CalculateContamination) and filtering for possible read-orientation sequencing artifacts (GATK4’s CollectF1R2Counts and LearnReadOrientationModel). Finally, somatic variants with at least five variant-supporting reads and mutation allele frequencies of at least 1% were included as the final dataset of high-confidence calls. The 1% cutoff criterion was implemented to minimize the impact of sequencing depth on the number of mutations recovered.
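The final read-count and allele-frequency thresholds can be illustrated with a minimal Python sketch. The `Call` record, its field names, and the example values are hypothetical and are not part of the study's pipeline; only the two thresholds are taken from the text.

```python
from dataclasses import dataclass

MIN_ALT_READS = 5   # from the text: at least five variant-supporting reads
MIN_VAF = 0.01      # from the text: allele frequency of at least 1%


@dataclass
class Call:
    """Hypothetical record for one somatic call that passed Mutect2 filters."""
    gene: str
    alt_reads: int   # variant-supporting reads
    depth: int       # total reads covering the site

    @property
    def vaf(self) -> float:
        # Variant allele frequency; defined as 0 when the site is uncovered.
        return self.alt_reads / self.depth if self.depth else 0.0


def is_high_confidence(call: Call) -> bool:
    """Apply both thresholds; a call must satisfy each of them."""
    return call.alt_reads >= MIN_ALT_READS and call.vaf >= MIN_VAF


# Illustrative values only, not data from the study.
calls = [
    Call("ARID1A", alt_reads=24, depth=1200),  # VAF 2.0%, 24 reads: kept
    Call("KRAS", alt_reads=4, depth=300),      # only 4 supporting reads: dropped
    Call("PIGR", alt_reads=6, depth=1500),     # VAF 0.4%: dropped
]
kept = [c.gene for c in calls if is_high_confidence(c)]
```

At depths above 500×, any variant that reaches the 1% allele-frequency floor necessarily has more than five supporting reads, so in this sketch the allele-frequency floor, rather than the read count, decides inclusion for deeply covered sites.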
Hp Status and Phylogenetic Analysis
Hp coverage was calculated using GATK’s DepthOfCoverage (RRID: SCR_001876) tools on the six targeted Hp genes. Samples with at least 1× coverage were included for phylogenetic analysis. We applied HaplotypeCaller to call variants in the six Hp genes, yielding a total of 961 single-nucleotide variants for phylogenetic inference. The phylogenetic analysis was conducted using the GTRGAMMA model, implemented in RAxML (66), and the resulting phylogenetic tree was visualized using the iTOL platform (Interactive Tree of Life, https://itol.embl.de/; RRID: SCR_018174). Fisher exact tests with FDR correction were used to identify three gene variants (CagA E106D, R109K, and N228H) significantly separating Singapore and Japanese/Korean Hp strains (FDR < 0.05). For the Hp genome project data (18), CagA protein sequences were retrieved for all CagA+ Hp strains (688 sequences; 655 strains). Multiple sequence alignment was performed using ClustalW (RRID: SCR_002909) to identify amino acid positions corresponding to the three CagA variants (E106D, R109K, and N228H). We used the AlphaFold 3 (67) web server (https://deepmind.google/technologies/alphafold/alphafold-server/; RRID: SCR_025454) to predict structural interactions between CagA domain 1 (amino acids 1–251) and the ASPP2 protein proline-rich domain (amino acids 684–891). The resulting CIF (crystallography information file) was visualized using PyMOL (https://pymol.org/2; RRID: SCR_000305) for structural analysis. Two residues in the three-dimensional structure were considered to be in contact if they had at least one pair of atoms within a distance of 3.5 Å. All Hp analyses used the F57 Hp strain (NC_017367.1) as a reference.
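The residue contact criterion can be written as a short Python sketch. The function name and the example coordinates are hypothetical, and treating the 3.5 Å cutoff as inclusive is an assumption; the contacts reported in the study were derived from the structural analysis described above, not from this code.

```python
import math
from itertools import product

CONTACT_CUTOFF = 3.5  # angstroms, from the text


def residues_in_contact(atoms_a, atoms_b, cutoff=CONTACT_CUTOFF):
    """Return True if any atom of residue A lies within `cutoff` of any atom
    of residue B. Each residue is a list of (x, y, z) coordinates in angstroms.
    Assumption: the cutoff is inclusive (distance <= cutoff)."""
    return any(math.dist(a, b) <= cutoff for a, b in product(atoms_a, atoms_b))


# Hypothetical coordinates, for illustration only.
residue_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residue_b = [(4.5, 0.0, 0.0), (8.0, 0.0, 0.0)]   # closest atom pair is 3.0 A apart
residue_c = [(10.0, 0.0, 0.0)]                   # closest atom pair is 8.5 A apart

a_b = residues_in_contact(residue_a, residue_b)
a_c = residues_in_contact(residue_a, residue_c)
```

Checking every atom pair is quadratic in residue size, which is acceptable for individual residue pairs; a spatial index would be the usual choice for whole-interface scans.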
CagA–ASPP2 Co-Immunoprecipitation and Immunoblotting
Anti-HA (RRID: AB_2935603), anti-V5 rabbit polyclonal antibodies (RRID: AB_2878059), and Multi-rAb HRP-Goat Anti-Rabbit Recombinant Secondary Antibodies (H+L; RRID: AB_3094534) were obtained from Proteintech; anti-GAPDH-HRP mouse monoclonal antibodies were obtained from Santa Cruz Biotechnology (RRID: AB_627679). pCI-Neo was purchased from Promega. HEK293T cells (RRID: CVCL_0063) were obtained from ATCC and maintained in DMEM (Thermo Fisher Scientific, Gibco) containing 10% FBS (Gibco) and 100 U penicillin/streptomycin (Gibco). Cells were authenticated by short tandem repeat profiling and Mycoplasma testing was performed on July 31, 2024, using the MycoAlert Mycoplasma Detection Kit (Lonza), which was negative. Cells were transiently transfected using jetPRIME reagent (Polyplus Transfection), according to the manufacturer’s instructions and harvested for cell lysis at 48 hours after transfection.
For ectopic expression in mammalian cells, Hp CagA gene variants (intermediate-risk variant: D106/K109/H228 and high-risk variant: E106/R109/N228) were PCR amplified from the genomic DNA of Hp strains carrying these variants and cloned into the pCI-Neo vector with an HA-tag sequence at its 5′ terminus. Transfected cells were washed twice with ice-cold PBS and lysed in ice-cold nonionic detergent lysis buffer (50 mmol/L Tris.Cl, pH 7.4, 150 mmol/L NaCl, 1 mmol/L EDTA, 0.5% Nonidet-P40, and 0.5% Triton X-100) supplemented with the protease inhibitor cocktail (Roche). Cell lysates were cleared by centrifugation and the collected supernatants were saved for subsequent immunoprecipitation (IP) and immunoblotting (input) analysis. For immunoprecipitation, precleared lysates were first incubated with the indicated antibody, followed by precipitation of the immune complexes using equilibrated anti–rabbit IgG magnetic VHH agarose beads (Proteintech, RRID: AB_3697091) overnight at 4°C. The immune complex–bound beads were washed three times with ice-cold wash buffer (10 mmol/L Tris.Cl, pH 7.4, 150 mmol/L NaCl, 0.5 mmol/L EDTA, and 0.05% Nonidet-P40), followed by elution with 2× SDS sample buffer (Laemmli) and boiling to dissociate the immunocomplexes from the beads. The prepared protein samples were resolved by SDS-PAGE on 4% to 20% gels (Bio-Rad). Proteins were subsequently transferred onto polyvinylidene difluoride (PVDF) membranes (Bio-Rad). After incubation in blocking buffer [5% (w/v) nonfat milk in 0.1 mol/L TBS containing 0.1% Tween-20 (TBST)], the blots were incubated with the indicated primary antibodies overnight at 4°C, followed by anti-rabbit horseradish peroxidase (HRP)–conjugated secondary antibodies at their optimal dilutions. Protein bands were visualized with GE enhanced chemiluminescence Western blotting substrate on the ChemiDoc MP system (Bio-Rad, RRID: SCR_019037). All experimental data are presented as the mean ± SD.
Differences between groups were assessed using a two-tailed Student t test. Significance is indicated as P < 0.05 (∗), P < 0.01 (∗∗), and P < 0.001 (∗∗∗).
Bulk and scRNA-seq
Both bulk (n = 104) and scRNA-seq (n = 18) analyses were conducted by combining previously generated data (14) with new samples (10 bulk and two scRNA-seq samples from CH-high patients with IM). All bulk and single-cell sequencing data were processed using standardized pipelines to ensure comparability across samples (14).
Driver Gene and Transcriptome Analysis
Genes under positive selection were identified using the IntOGen pipeline (24), which combines seven complementary methods to detect signals of positive selection in mutational patterns. The pipeline begins by preprocessing somatic mutations, filtering out hypermutated samples, and gathering data needed for the driver detection methods. Outputs from each driver detection algorithm were integrated using a weighted voting system, in which weights reflect the reliability of each method. Finally, a postprocessing step removed spurious genes arising from known artifacts. The KRAS- and ERK-dependent transcriptome signatures were obtained from Klomp and colleagues (31). The fgsea tool (https://github.com/alserglab/fgsea; RRID: SCR_020938) was used to assess the expression of KRAS (top 200 genes), ERK (top 200 genes), and the intersected KRAS–ERK (top 200 genes) signaling pathways in IM bulk RNA-seq and scRNA-seq data. For scRNA-seq, cell cycle phases were assigned using Seurat’s (RRID: SCR_016341) CellCycleScoring function.
Human Organoid Culture and Organoid Assays
Organoids were cultured in medium containing the following components: 50% Wnt3A-conditioned medium, 10% R-Spondin-1–conditioned medium, 10 mmol/L HEPES (cat. #15630080, Gibco), 2 mmol/L GlutaMAX (cat. #35050061, Gibco), 1× B27 (cat. #17504044, Gibco), 1 mmol/L N-acetyl-L-cysteine (cat. #A9165-5G, Sigma-Aldrich), 50 ng/mL mouse recombinant EGF (cat. #PMG8043, Gibco), 100 ng/mL human recombinant FGF10 (cat. #100-26-250UG, PeproTech), 100 ng/mL mouse recombinant noggin (cat. #250-38-250, PeproTech), 1 nmol/L gastrin I (cat. #G9145-.1MG, Sigma-Aldrich), 2 μmol/L A83-01 (cat. #2939, Tocris), and 10 μmol/L Y-27632 (cat. #Y0503-5MG, Sigma-Aldrich).
Total RNA from organoids was extracted using the RNeasy Plus Mini Kit (cat. #74134, Qiagen) according to the manufacturer’s protocol. Library preparation was conducted using the SMART-Seq Stranded Kit (cat. #634443, Takara). Libraries were sequenced on a HiSeq 4000 sequencer using the paired-end 150 bp read option. Quality control (QC)–passed reads were aligned to the human reference genome GRCh38/hg38 using Hisat2 (https://github.com/DaehwanKimLab/hisat2; RRID: SCR_015530) and raw gene counts were quantified using featureCounts (https://github.com/ShiLab-Bioinformatics/subread; RRID: SCR_012919). Stomach and intestine lineage marker genes were extracted from differential gene expression analysis on organoid microarray public datasets (GSE74843 and GSE112369). Clustering for these marker genes was performed using the ComplexHeatmap package (https://github.com/jokergoo/ComplexHeatmap; RRID: SCR_017270) by integrating public datasets (GSE74843 and GSE112369) with our dataset, with batch correction applied via the ComBat function (https://github.com/jtleek/sva; RRID: SCR_010974). For drug treatment assays, organoids were dissociated into single cells, and 1,000 cells were seeded in a 5-μL Matrigel dome on a 96-well plate. Once the single cells had formed organoids, pyrvinium pamoate, STAT3-IN-1 (cat. #HY-100753, MedChemExpress), and ulixertinib (cat. #S7854, Selleckchem) were added at the concentrations indicated in Supplementary Fig. S1D. After a 3-day treatment, ATP levels were measured using the CellTiter-Glo 2.0 Cell Viability Assay (cat. #G9242, Promega). IC50 values were calculated using GraphPad Prism (v10.2.3; RRID: SCR_002798). Comparisons between IM and normal (N) organoids were performed using a linear mixed-effects model implemented with the lme4 (https://github.com/lme4/lme4; RRID: SCR_015654) and lmerTest (https://github.com/runehaubo/lmerTestR; RRID: SCR_015656) packages in R.
For organoid formation assays, organoids were dissociated into single cells, 500 cells were seeded in a 5-μL Matrigel dome on a 96-well plate, and 100 nmol/L pyrvinium pamoate was added. Once organoids had grown in the control groups, the number of organoids was counted. Organoid area was measured using ImageJ version 1.54g (NIH, RRID: SCR_003070). Comparisons between treated and control groups were performed using unpaired t tests with Welch correction. Multiple comparisons were adjusted using the FDR method (two-stage step-up, Benjamini, Krieger, and Yekutieli) with a desired FDR (Q) of 1%.
Immunoblot Analysis of Organoids
Proteins were extracted from organoids using RIPA buffer (cat. #89900, Thermo Fisher Scientific) supplemented with complete protease inhibitor cocktail (cat. #53765300, Roche). Equal amounts of protein were separated on 4% to 12% SDS-PAGE and transferred to PVDF membranes. Membranes were blocked for 1 hour with 5% fat-free dry milk in TBST and then incubated overnight with primary antibodies against p-ERK (cat. #4370, Cell Signaling Technology, 1:1,000, RRID: AB_2315112), total ERK (cat. #4695, Cell Signaling Technology, 1:1,000, RRID: AB_390779), p-STAT3 (Y705; cat. #9145S, Cell Signaling Technology, 1:2,000, RRID: AB_2491009), p-STAT3 (S727; cat. #9134, Cell Signaling Technology, 1:1,000, RRID: AB_331589), total STAT3 (cat. #4904S, Cell Signaling Technology, 1:2,000, RRID: AB_331269), and GAPDH (cat. #sc-47724 HRP, Santa Cruz Biotechnology, 1:2,000, RRID: AB_3716894). After washing, blots were incubated for 1 hour with HRP-conjugated secondary antibodies (anti-rabbit or anti-mouse; Santa Cruz Biotechnology, RRID: AB_628497), developed using enhanced chemiluminescence (cat. #RPN2232/2235, Amersham Biosciences), and quantified by densitometry using an iBright 1500 imaging system (Invitrogen, RRID: SCR_026565) and ImageJ v1.54g (NIH, RRID: SCR_003070).
Immunofluorescence Staining of Organoids
Immunofluorescence staining was performed to evaluate the expression of CDX2 in IM and N organoids using an anti-CDX2 antibody (cat. #12306, Cell Signaling Technology; RRID: AB_2797879) and an anti-EPCAM antibody (cat. #2929, Cell Signaling Technology, RRID: AB_2098657). Briefly, approximately 7,500 organoids were recovered from Matrigel using Cell Recovery Solution (cat. #354253, Corning) and embedded in iPGell (cat. #FNK-PG20-1, Genostaff) according to the manufacturer’s instructions. The organoids were then fixed in 10% neutral buffered formalin and embedded in paraffin, and 5 μm formalin-fixed, paraffin-embedded (FFPE) sections were prepared. Antigen retrieval was performed by incubating the sections in a water bath at 97°C for 10 minutes with either pH 6 (citrate) or pH 9 (Tris) buffer. The sections were incubated with the primary antibody at 4°C overnight, followed by incubation with secondary antibodies at room temperature for 1 hour. Finally, the sections were stained with DAPI (cat. #130-111-570, Miltenyi Biotec) for 1 minute before observation.
Mutational Signatures
We used SigProfiler (RRID: SCR_023121; ref. 36) to identify mutational signatures in the IM WGS samples. We also retrieved WGS mutation data from 122 Chinese gastric cancers from the International Cancer Genome Consortium (ICGC; ref. 68) and preprocessed mutational signatures from 79 normal gastric samples (25) for comparison with IM. We employed SigProfilerAssignment’s Analyze_Cosmic_Fit (RRID: SCR_026899) function to assign each mutation to known Catalogue of Somatic Mutations in Cancer (RRID: SCR_002260) mutational signatures. We next used the Analyze_Denovo_Fit function to fit targeted sequencing mutations from our IM cohort to the six mutational signatures (SBS1, SBS5, SBS17a, SBS17b, SBS18, and SBS40) derived from IM WGS data. For consistency with earlier studies (25), we consolidated these six signatures into four broader categories: SBS1, SBS5/40 (SBS5 + SBS40), SBS17 (SBS17a + SBS17b), and SBS18. Biochemical levels of 8-OH-dG nucleosides in organoids were quantified using ELISA assays (Abcam, RRID: AB_2904542).
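The consolidation of six signatures into four categories amounts to summing per-sample exposures within each group, as in the following minimal Python sketch. The function name and the example exposure values are hypothetical; only the grouping itself is taken from the text.

```python
# Grouping from the text: SBS5 + SBS40 -> SBS5/40, SBS17a + SBS17b -> SBS17.
SIGNATURE_GROUPS = {
    "SBS1": "SBS1",
    "SBS5": "SBS5/40",
    "SBS40": "SBS5/40",
    "SBS17a": "SBS17",
    "SBS17b": "SBS17",
    "SBS18": "SBS18",
}
CATEGORIES = ("SBS1", "SBS5/40", "SBS17", "SBS18")


def consolidate(exposures):
    """Sum per-signature mutation counts of one sample into the four
    broader categories. Raises KeyError for an unexpected signature name."""
    totals = {category: 0.0 for category in CATEGORIES}
    for signature, count in exposures.items():
        totals[SIGNATURE_GROUPS[signature]] += count
    return totals


# Illustrative exposures for one sample, not data from the study.
sample = {"SBS1": 10, "SBS5": 20, "SBS40": 5, "SBS17a": 3, "SBS17b": 7, "SBS18": 2}
grouped = consolidate(sample)
```

Because the grouping only adds exposures, the total mutation count per sample is unchanged by the consolidation.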
Mitochondrial Stress Test
Mitochondrial respiration was assessed using the Seahorse XF Cell Mito Stress Test with the Seahorse XFe96 Analyzer (Agilent Technologies, RRID: SCR_019545), following a modified protocol optimized for gastric organoids. IM (day 5–7) and N (day 6) organoids were harvested, counted, and embedded in 3 μL of Matrigel per well (approximately 150 organoids per well and average diameter ∼100 μm) in Seahorse XFe96 Cell Culture Microplates (cat. #103792-100, Agilent Technologies). After overnight culture, the medium was replaced with assay medium consisting of XF DMEM Base Medium (cat. #103575-100, Agilent Technologies) supplemented with 10 mmol/L glucose (cat. #68769, Sigma), 2 mmol/L glutamine (cat. #25030-081, Gibco), and 5 mmol/L sodium pyruvate (cat. #11360-070, Gibco), adjusted to pH 7.4. Organoids were preincubated in a non-CO2 incubator at 37°C for 60 minutes. The oxygen consumption rate (OCR) was measured at baseline and after sequential injections of 5 μmol/L oligomycin A (cat. #S1478, Selleckchem), 0.5 μmol/L carbonyl cyanide-4-(trifluoromethoxy)phenylhydrazone (FCCP; cat. #C2920, Sigma), and 2 μmol/L rotenone (cat. #R8875, Sigma)/2 μmol/L antimycin A (cat. #A8674, Sigma) according to Supplementary Table S13. For the DCA pretreatment experiment, organoids embedded in Matrigel in Seahorse microplates were treated with 5 mmol/L DCA (cat. #S8615, Selleckchem) for 24 hours prior to measurement. OCR was then measured according to Supplementary Table S14. For normalization, organoids were lysed in 0.5% (v/v) Triton X-100, and DNA was stained with DAPI (0.5 μg/mL, cat. #D9542, Sigma). Fluorescence was measured using an Infinite M200 plate reader (Tecan; RRID: SCR_019033), and DNA content was used to normalize OCR values across samples. Data were analyzed using Wave (Agilent, RRID: SCR_024491) and GraphPad Prism software (v10.2.3; RRID: SCR_002798).
Flow Cytometry
Organoid single cells treated with Sodium DCA (cat. #S8615, Selleckchem) at 5 mmol/L were harvested and stained with CellROX Oxidative Stress Reagents (cat. #C10444, Invitrogen) at a final concentration of 10 μmol/L in blank culture media. The cells were incubated for 30 minutes at 37°C. Afterward, the cells were washed and analyzed by flow cytometry (BD Fortessa). Simultaneously, the organoid single cells were also permeabilized with the Cytofix/Cytoperm Fixation/Permeabilization Kit (cat. #554714, BD Biosciences, RRID: AB_2869008) for 30 minutes in the dark at 4°C, followed by staining with anti-H2A.X Phospho (Ser139) Antibody (clone 2F3, cat. #613406, BioLegend, RRID: AB_2248011) for another 30 minutes in the dark at 4°C, before the cells were washed and analyzed by flow cytometry.
EM-seq and Analysis
A total of 200 ng of genomic DNA with diluted unmethylated Lambda control DNA and diluted CpG methylated pUC19 control DNA was sheared to an average fragment size of 300 bp using the Covaris S220 instrument (RRID: SCR_026427). EM-seq libraries were constructed using the NEBNext Enzymatic Methyl-seq Kit (New England BioLabs; #E7120S). Library fragment size was determined using the DNA High Sensitivity Kit on the Agilent Bioanalyzer (Agilent Technologies, RRID: SCR_018043). Libraries were quantified by qPCR, pooled, and sequenced paired-end 150 bp on the NovaSeq X Plus system (Illumina, RRID: SCR_024568). For data analysis, sequencing reads were aligned to the human genome using BWA-Meth (https://github.com/brentp/bwa-meth; RRID: SCR_025851), a BWA-based aligner optimized for bisulfite or EM-seq methylation data. Duplicate reads arising from PCR amplification were identified and marked with Picard MarkDuplicates (RRID: SCR_006525). Methylation status at CpG sites was extracted using MethylDackel (https://github.com/dpryan79/MethylDackel; RRID: SCR_025850), and the data were destranded to aggregate methylation information from both DNA strands. For quality control, the average methylation level for the Lambda negative control was 1.5%, whereas the pUC19 positive control showed an average methylation level of 97.5%, indicating that the assay performed as expected. Processed methylation data were analyzed for differentially methylated sites and regions (FDR < 0.001 and methylation difference >10%) using the methylKit (RRID: SCR_005177) R package (69). Hierarchical clustering of the methylation data was performed using the clusterSamples function in methylKit, applying Pearson correlation distance and the Ward.D2 clustering method, using the top 50% most variable CpG dinucleotides.
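Destranding can be illustrated with a minimal Python sketch that merges the counts from the two cytosines of each CpG dinucleotide. This is an illustration of the concept, not the implementation used by the tools named above. The record layout, function names, and example counts are hypothetical, and the sketch assumes 1-based coordinates in which the minus-strand cytosine of a CpG sits one base downstream of the plus-strand cytosine.

```python
from collections import defaultdict


def destrand(records):
    """Merge per-strand CpG counts into one entry per CpG dinucleotide.

    `records` is an iterable of (chrom, pos, strand, methylated, unmethylated).
    Assumption: positions are 1-based and the minus-strand cytosine of a CpG
    is at pos + 1 relative to the plus-strand cytosine, so minus-strand
    records are shifted back by one base before merging.
    """
    merged = defaultdict(lambda: [0, 0])
    for chrom, pos, strand, meth, unmeth in records:
        key = (chrom, pos if strand == "+" else pos - 1)
        merged[key][0] += meth
        merged[key][1] += unmeth
    return {key: (meth, unmeth) for key, (meth, unmeth) in merged.items()}


def methylation_fraction(meth, unmeth):
    """Fraction of reads supporting methylation; NaN when there is no coverage."""
    total = meth + unmeth
    return meth / total if total else float("nan")


# Illustrative counts, not data from the study.
records = [
    ("chr1", 100, "+", 8, 2),   # plus-strand C of a CpG
    ("chr1", 101, "-", 6, 4),   # minus-strand C of the same CpG
    ("chr1", 200, "+", 0, 10),  # CpG covered on one strand only
]
sites = destrand(records)
```

Merging strands roughly doubles the read support per CpG, which is the practical reason for destranding before testing for differential methylation.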
CH Variant Calling
CH variant calling was performed using an inverse approach compared with somatic IM mutation analysis. Preprocessed and aligned BAM files were generated as described for IM mutation calling. Variants were called using Mutect2 (RRID: SCR_000559) with the same parameter settings, treating blood or saliva samples as the “tumor” input and the IM samples as “control” with the normal–artifact filter turned off. CH variants were further filtered according to the following criteria: at least five variant-supporting reads, VAF in blood or saliva of at least 2%, and a blood or saliva VAF at least twice that observed in matched IM samples to account for possible leukocyte contamination in IM. Similar criteria have been used in other studies. For comparisons, we leveraged the Memorial Sloan Kettering Cancer Center dataset (43), which comprises a cohort of 24,126 patients with cancer spanning 56 types of primary nonhematologic cancers who underwent matched tumor and blood sequencing using the MSK-IMPACT panel before July 1, 2018. The cohort consists predominantly of White individuals (76.3%) and includes a substantial proportion of patients who have received cancer treatments (58.9% of those with available treatment history). Within this dataset, 11,076 unique CH mutations were identified in 7,216 individuals (30.0%).
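The three CH filtering criteria can be summarized in a short Python sketch. Function and parameter names are hypothetical and the example values are illustrative; the thresholds are those stated above, and the "high CH" cutoff (VAF > 5%) is the definition given in the Discussion.

```python
MIN_ALT_READS = 5        # at least five variant-supporting reads
MIN_BLOOD_VAF = 0.02     # VAF in blood or saliva of at least 2%
MIN_FOLD_OVER_IM = 2.0   # blood/saliva VAF at least twice the matched IM VAF
HIGH_CH_VAF = 0.05       # "high CH" is defined in the Discussion as VAF > 5%


def passes_ch_filter(alt_reads, blood_vaf, im_vaf):
    """Return True if a candidate variant meets all three CH criteria.
    `blood_vaf` is the VAF in blood or saliva; `im_vaf` is the VAF of the
    same variant in the matched IM sample (0.0 if not observed)."""
    return (
        alt_reads >= MIN_ALT_READS
        and blood_vaf >= MIN_BLOOD_VAF
        and blood_vaf >= MIN_FOLD_OVER_IM * im_vaf
    )


def is_high_ch(blood_vaf):
    """Return True for CH events above the 5% VAF cutoff."""
    return blood_vaf > HIGH_CH_VAF


# Illustrative candidates: (alt_reads, blood_vaf, im_vaf); not study data.
candidates = {
    "kept": (12, 0.04, 0.01),             # all three criteria met
    "im_too_high": (12, 0.04, 0.03),      # blood VAF < 2 x IM VAF
    "too_few_reads": (3, 0.05, 0.0),      # fewer than five supporting reads
    "vaf_too_low": (12, 0.015, 0.0),      # blood VAF below 2%
}
results = {name: passes_ch_filter(*args) for name, args in candidates.items()}
```

The twofold requirement is what separates a blood-derived clone from an epithelial mutation: leukocytes infiltrating the IM biopsy carry the CH variant at a diluted VAF, whereas a true IM mutation would show the opposite pattern.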
Clinical Association Analysis
Logistic regression analysis was performed to identify risk factors associated with dysplasia (low-grade dysplasia) and EGN [high-grade dysplasia (HGD) or adenocarcinoma]. We included the previously reported mutation count, clinical factors such as age, and genomic factors such as CH and ARID1A truncating mutations. Univariate and multivariate analyses were performed using the glm function in R (RRID: SCR_001905). Factors significant in univariate analysis were further evaluated in multivariate analysis. For the GCEP1000 cohort, which includes longitudinal samples with an extended follow-up period, the earliest antrum biopsy in each subject was used for clinical analysis. For recently collected local and international samples, the clinical outcome was restricted to baseline observations as longitudinal follow-up data were not yet available. This cohort had no history of HGD or gastric cancer and was classified under the OLGIM stage II to IV. To reduce confounding variables, IM subjects from the GCEP1000 cohort with a history of HGD or gastric cancer, as well as those classified as OLGIM stage I, were excluded from the analysis. A total of 765 unique patients, comprising 43 subjects with IM with dysplasia and 26 with EGN, were included for clinical association (312 patients in GCEP1000, with 20 dysplasia and nine EGN; and 453 patients from recently collected samples, with 23 dysplasia and 17 EGN).
Immune–Microbiome Analysis
For shotgun metagenomics sequencing, saliva samples (2 mL each) were collected from patients using the OMNIgene·ORAL Saliva Collection Kit (REF: OME-505, DNA Genotek Inc.) at National University Hospital according to the manufacturer’s protocol and immediately transported to the laboratory. Upon arrival, each sample was vigorously shaken for 10 seconds and incubated at 50°C for 60 minutes in a water bath. Subsequently, 500 μL of saliva was aliquoted into each cryovial and stored in a −80°C freezer until DNA extraction. Microbial DNA was extracted from 250 μL of saliva using the QIAamp PowerFecal Pro DNA Kit (REF: 51804, Qiagen), following the manufacturer’s protocol. Lysis was achieved through homogenization at 6.0 m/s for 40 seconds using a FastPrep-24 5G Bead Beating Grinder and lysis system (MP Biomedicals, RRID: SCR_018599). DNA concentration was quantified using a Qubit 4 fluorometer with a dsDNA HS Assay Kit (Life Technologies Corporation), purity was assessed with a NanoDrop One spectrophotometer (Thermo Fisher Scientific, RRID: SCR_023005), and DNA integrity was evaluated by agarose gel electrophoresis. Metagenomic library preparation was performed using the Rapid Plus DNA Lib Prep Kit for Illumina (ABclonal Technology Inc.). Libraries were sequenced on the NovaSeq X Plus Platform (Illumina Inc., RRID: SCR_024569) generating 150 bp paired-end reads, yielding approximately 6 GB of data per sample. The functional capacity of the microbial communities was analyzed using the bioBakery (RRID: SCR_016596) workflow on the Terra cloud computing platform (https://terra.bio; RRID: SCR_021648). Microbiome analysis was performed using PathSeq (RRID: SCR_005203; ref. 70) on DNA-targeted sequencing data from saliva (US cohort) and IM samples, as well as bulk RNA-seq data from a subset of IM samples.
Bacterial abundance at the genus or species level was estimated by normalizing the number of unambiguously mapped bacterial reads to the number of human reads and multiplying by 1,000,000. RNA-seq samples with very low bacterial reads (<0.02%) were excluded from further analysis as these samples are unlikely to provide a reliable estimate of bacterial composition (71). Comparisons between microbiomes in high- and low-CH IM samples were performed using lefser (Linear Discriminant Analysis Effect Size in R, https://github.com/waldronlab/lefser). To deconvolute the immune component in bulk IM RNA-seq, we used CibersortX (RRID: SCR_016955; ref. 72) in the relative mode to infer the fraction of the seven immune cell types identified in IM scRNA-seq data. ESTIMATE (RRID: SCR_026090; ref. 73) was used to infer the immune score in each bulk RNA-seq sample.
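The abundance normalization can be expressed as a minimal Python sketch. Function names and example read counts are hypothetical. The text does not state the denominator of the 0.02% exclusion threshold; the sketch assumes it is the sum of bacterial and human reads, and this assumption should be checked against the original analysis.

```python
MIN_BACTERIAL_FRACTION = 0.0002  # 0.02%, from the text


def bacterial_abundance(bacterial_reads, human_reads):
    """Unambiguously mapped bacterial reads per million human reads."""
    return bacterial_reads * 1_000_000 / human_reads


def has_enough_bacterial_reads(total_bacterial_reads, human_reads):
    """RNA-seq inclusion check.
    Assumption: the 0.02% threshold is taken relative to the sum of
    bacterial and human reads; the source does not specify the denominator."""
    fraction = total_bacterial_reads / (total_bacterial_reads + human_reads)
    return fraction >= MIN_BACTERIAL_FRACTION


# Illustrative read counts for one genus in one sample, not study data.
abundance = bacterial_abundance(bacterial_reads=250, human_reads=5_000_000)
low_sample_ok = has_enough_bacterial_reads(250, 5_000_000)     # about 0.005%
high_sample_ok = has_enough_bacterial_reads(2_000, 5_000_000)  # about 0.04%
```

Normalizing to human reads rather than total reads keeps the measure comparable across samples whose bacterial load differs, since the human read count serves as a proxy for the amount of host tissue sequenced.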
IHC
Three consecutive FFPE tissue sections were deparaffinized and rehydrated, followed by incubation with hydrogen peroxide to quench endogenous peroxidase activity. Antigen retrieval was then performed according to the manufacturer’s protocol. The primary antibodies used were MPO (Dako, clone A0398, 1:10,000 dilution, RRID: AB_2335676), CXCL8 (Proteintech, clone IL8, 1:100 dilution, RRID: AB_2861340), and CXCL2 (Invitrogen, clone 16H3L10, 1:50 dilution, RRID: AB_2532403), each applied to consecutive sections. Following incubation with the respective primary antibodies, sections were treated with a secondary antibody and counterstained with hematoxylin. MPO immunostaining was performed on the Ventana automated platform, whereas CXCL8 and CXCL2 IHC were carried out manually. Appropriate positive and negative controls were included for all markers.
Hotspots showing high expression of MPO, CXCL8, and CXCL2 were identified at low magnification (40×). Subsequently, the number of positive cells per high-power field was calculated. MPO positivity was observed in neutrophils; CXCL8 in neutrophils, monocytes, and macrophages; and CXCL2 in neutrophils, tumor cells, monocytes, and macrophages. For analysis, CXCL2 expression in tumor cells was excluded.
Gram Staining
Four-micron-thick FFPE tissue sections were deparaffinized and rehydrated in distilled water. Sections were then stained with crystal violet solution for 1 minute, followed by differentiation using a decolorizing solution until the tissue appeared gray-blue. After washing with water, sections were counterstained with safranin solution for 1 minute, then washed, dehydrated, and mounted. Appropriate positive and negative controls were included. Gram-positive bacteria appeared purple, whereas gram-negative bacteria appeared red.
Microbial FISH
6-FAM–labeled Sa probes (sequence: 5′-AGT TAA ACA GTT TCC AAA GCC TAC-3′; Integrated DNA Technologies) and Cy3-labeled Streptococcus probes (sequence: 5′-GTT AGC CGT CCC TTT CTG G-3′; Integrated DNA Technologies) were used to detect bacterial colonization in paraffin-embedded human gastric cancer tissue sections. After deparaffinization in xylene (cat. #534056, Sigma-Aldrich) and graded ethanol (cat. #E7023, Sigma-Aldrich), sections were rehydrated and treated sequentially with 0.2 N HCl (prepared from HCl, cat. #H1758, Sigma-Aldrich) and Proteinase K (cat. #10108057001, Roche Diagnostics) for 10 minutes each. Slides were incubated with blocking buffer containing BSA (cat. #A9647, Sigma-Aldrich) at 55°C for 2 hours. Hybridization was performed with the probe (1 μmol/L in 35% hybridization buffer, preheated at 88°C for 3 minutes) in a dark, humid chamber at 42°C overnight. After hybridization, slides were washed in Tris-NaCl buffer (20 mmol/L Tris-HCl, pH 7.2 and 40 mmol/L NaCl) and treated with the ReadyProbes Tissue Autofluorescence Quenching Kit (cat. #R37630, Invitrogen). Nuclei were counterstained with DAPI (cat. #D1306, Thermo Fisher Scientific) and mounted with ProLong Diamond Antifade Mountant (cat. #P36961, Invitrogen). Images were acquired using a fluorescence microscope (LSM 800, Zeiss, RRID: SCR_015963) equipped with 6-FAM, Cy3, and DAPI filter sets.
Stereo-seq
Stereo-seq was used to study microbiome species in gastric cancer tumor samples. We used the STOmics Stereo-seq Transcriptomics Set for FFPE, which uses random primers to capture and sequence RNAs in situ. Random primers capture total RNA, including microbial RNA, from FFPE specimens. The tissue on the chips was processed according to the manufacturer’s protocol for the Stereo-seq Transcriptomics N Kit. Briefly, 5-μm-thick sections were cut from FFPE gastric cancer tissue or adjacent normal gastric tissue blocks using a microtome (Leica, RRID: SCR_023950) and mounted onto a Stereo-seq chip. The tissue section on the chip was dried at 42°C for 3 hours and then overnight at 37°C. Subsequently, the tissue on the Stereo-seq chip was baked for 1 hour at 60°C, followed by deparaffinization using different percentages of ethanol. Then, the tissue on the Stereo-seq chip was stained with Qubit ssDNA reagent (Invitrogen), mounted with glycerol, and imaged with a fluorescence microscope (Leica). After imaging, the tissue underwent de-cross-linking using FFPE De-cross-linking reagent, followed by fixation using cold methanol. Next, the tissue sections were permeabilized using a Permeabilization Mix containing HCl (Sigma) for 30 minutes and taken immediately to the reverse transcription and ligation step. After reverse transcription and ligation overnight, cDNA products were released from the chips using cDNA Release Mix. The cDNA mix was collected, purified by 1.0× beads (AMPure) selection, amplified by cDNA Amplification Mix with FFPE cDNA primers mix, and purified again by 1.0× beads (AMPure) selection. cDNA quality was assessed using the Bioanalyzer High Sensitivity chip (Agilent, RRID: SCR_018043). Purified cDNA samples were used for cDNA library preparation following the Stereo-seq 16 Barcode Library Preparation Kit v1.0. The cDNA libraries were sequenced on MGI DNBSEQ-T7 Sequencers (RRID: SCR_024847).
The Stereo-seq raw data were processed using the BGI STOmics Analytical Workflow (SAW; RRID: SCR_025001), which combines data from the Stereo-seq sequencing platform with microscope images to generate spatial feature expression matrices. To identify potential microbial sequences, Kraken2 (RRID: SCR_005484; ref. 74) was run with SAW’s “–detect-microorganism” option enabled. The resulting outputs were visualized using StereoMap at bin50 resolution, which groups data into spatial bins, each corresponding to a 50 × 50 grid of underlying DNA nanoball spots. To reduce noise, a molecular identifier filter was applied, excluding bacterial reads with fewer than five counts within each bin. The SAW output tissue.gef files were also analyzed with Stereopy (75). Square bins with fewer than three detected genes, fewer than 20 total counts, or more than 5% mitochondrial transcripts were removed. Data were normalized by total counts and then log1p-transformed. Differentially expressed genes were summarized using PCA, and square bin clusters were identified using the Leiden algorithm.
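The binning, filtering, and normalization steps described above can be illustrated with a minimal, library-free Python sketch. It is not the SAW or Stereopy implementation: the input layout (tuples of spot coordinates, feature name, and count), the function names, the `MT-` prefix used to flag mitochondrial genes, and the `target_sum` scaling constant are all assumptions made for illustration. The thresholds (bin size 50, at least three genes, at least 20 counts, at most 5% mitochondrial transcripts, at least five bacterial counts per bin) are taken from the text, and the bacterial filter reflects one reading of the molecular identifier filter, applied per taxon within each bin.

```python
import math


def bin_spots(spots, bin_size=50):
    """Aggregate (x, y, feature, count) records into square bins.

    Each bin covers bin_size x bin_size spots (bin50 in the text).
    Returns {(bin_x, bin_y): {feature: summed_count}}.
    """
    bins = {}
    for x, y, feature, count in spots:
        key = (x // bin_size, y // bin_size)
        features = bins.setdefault(key, {})
        features[feature] = features.get(feature, 0) + count
    return bins


def passes_qc(gene_counts, min_genes=3, min_counts=20,
              max_mito_frac=0.05, mito_prefix="MT-"):
    """Return True if a bin meets the three thresholds stated in the text.

    mito_prefix is an assumption about how mitochondrial genes are named.
    """
    total = sum(gene_counts.values())
    n_genes = sum(1 for c in gene_counts.values() if c > 0)
    if n_genes < min_genes or total < min_counts:
        return False
    mito = sum(c for g, c in gene_counts.items() if g.startswith(mito_prefix))
    return mito / total <= max_mito_frac


def normalize(gene_counts, target_sum=10000.0):
    """Total-count scaling followed by log1p.

    target_sum is an assumed constant; the text does not state one.
    """
    total = sum(gene_counts.values())
    if total == 0:
        return {g: 0.0 for g in gene_counts}
    return {g: math.log1p(c / total * target_sum)
            for g, c in gene_counts.items()}


def filter_taxa(taxon_counts, min_count=5):
    """Drop taxa with fewer than min_count counts in a bin."""
    return {t: c for t, c in taxon_counts.items() if c >= min_count}


# Toy host-gene records: (x, y, gene, count)
spots = [
    (0, 0, "A", 10), (10, 49, "B", 8), (49, 49, "C", 5),   # bin (0, 0)
    (50, 0, "A", 30), (60, 10, "MT-CO1", 5),               # bin (1, 0)
    (70, 5, "B", 1), (75, 5, "C", 1),                      # bin (1, 0)
    (120, 0, "A", 2),                                      # bin (2, 0)
]
bins = bin_spots(spots)
kept = {key: normalize(genes) for key, genes in bins.items() if passes_qc(genes)}
```

In this toy input, only bin (0, 0) is retained: bin (1, 0) exceeds the mitochondrial threshold and bin (2, 0) has too few genes and counts. In practice these steps would be run through the Stereopy functions rather than reimplemented.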
Supplementary Material
Supplementary Table S1 details all IM and control samples with associated profiling platforms, including targeted DNA sequencing, WGS, single-cell RNA-seq, bulk RNA-seq, shotgun metagenomics, EM-seq, and Stereo-seq datasets. Supplementary Table S2 summarizes major genomic and molecular findings across different countries and ancestral backgrounds. Supplementary Table S3 lists significantly mutated genes identified in non-hypermutated intestinal metaplasia samples. Supplementary Table S4 shows pathways up-regulated in intestinal lineage cells exhibiting high KRAS/ERK expression. Supplementary Table S5 provides a catalog of established IM and normal gastric organoid lines with relevant metadata. Supplementary Table S6 reports Hallmark and GO pathway enrichment analyses comparing KRAS/MAPK-mutated versus wild-type IM, and severe versus mild IM organoids. Supplementary Table S7 displays country- and risk-group-specific proportions of SBS mutational signatures derived from targeted panel sequencing. Supplementary Table S8 shows correlations between mutational signature exposures and patient age in IM samples, including significance from Pearson’s and Spearman’s tests with and without outliers. Supplementary Table S9 lists germline variants occurring in known somatic driver genes among IM patients. Supplementary Table S10 summarizes univariate and multivariate logistic regression analyses linking clinical and molecular risk factors to dysplasia and early gastric neoplasia. Supplementary Table S11 shows associations between bacterial genera abundance and PIGR mutational status in IM samples. Supplementary Table S12 profiles the composition of the oral microbiome from saliva samples of IM patients. Supplementary Table S13 lists Seahorse mitochondrial stress test instrument parameters and assay conditions. Supplementary Table S14 provides Seahorse mitochondrial stress test settings specific to DCA treatment experiments.
Supplementary Figure S1 shows differential sensitivity of intestinal metaplasia (IM) and normal gastric organoids to pyrvinium and pathway inhibitors, highlighting reduced ERK phosphorylation and selective growth suppression of IM organoids. Supplementary Figure S2 depicts clonal structures in IM samples using SciClone analysis and the effects of DCA-induced oxidative and DNA damage responses in gastric organoids. Supplementary Figure S3 illustrates global DNA methylation alterations in early gastric cancer, including PCA/clustering results, subtype-specific methylation patterns, and associations with replication timing and mutational signatures. Supplementary Figure S4 shows shared germline variant profiles among IM samples from six countries, with UpSet plots and pairwise SNP comparisons illustrating inter-population genomic overlap. Supplementary Figure S5 demonstrates risk stratification for early gastric neoplasia among IM patients using combined clinical and genomic predictors, including ROC and PR curve analyses. Supplementary Figure S6 presents histologic, immunohistochemical, and FISH validation of bacterial presence in gastric cancer tissues, confirming localization of Streptococcus and S. anginosus within tumor regions. Supplementary Figure S7 displays spatial mapping of bacterial reads in gastric cancer tissues, showing host cell-state organization and bacterial localization inferred from Stereopy analysis.
Acknowledgments
This research was supported by the National Research Foundation, Singapore, and Singapore Ministry of Health’s National Medical Research Council under its Open Fund-Large Collaborative Grant (“OF-LCG”; MOH-OFLCG18May-0003), the National Medical Research Council STAR grant MOH-000967, the Ministry of Education, Singapore, under its MOE Academic Research Tier 3 (RIE2025) MOE-MOET32021-0004, subaward from the Cancer Science Institute of Singapore, National University of Singapore, Duke-NUS Core funding, and Agency for Science, Technology, and Research, Singapore. This research was also supported by the National University of Singapore, Yong Loo Lin School of Medicine Internal Grant (Ref: NUHSRO/2022/071/NUSMed/Microbiome/LOA). We thank all the patients and investigators at participating endoscopy centres, laboratories, and research institutes for donating samples to this study. We also thank the investigators from the Singapore Gastric Cancer Consortium, the NUHS Tissue Repository, and the Duke-NUS Genome Biology Facility for providing scientific advice and technical assistance, and MGI Tech Singapore’s STOmics team for providing Stereo-seq reagents and guidance on experiments [RCA 2022-2295 (Cancer and Stem Cell Biology)]. We thank Dr Shang Li (Duke-NUS) for HEK293T cells and Dr Alvin Kunyao Guo (Duke-NUS) for flow cytometry support.
Footnotes
Note: Supplementary data for this article are available at Cancer Discovery Online (http://cancerdiscovery.aacrjournals.org/).
Data Availability
The raw data generated in this study are available in the European Genome-Phenome Archive (EGA; RRID: SCR_004944) under the accession numbers EGAD50000001538 (EM-seq), EGAD50000001539 (WGS), EGAD50000001540 (targeted sequencing), and EGAD50000002010 (transcriptomic sequencing). Data for GCEP1000 are also available in EGA under the accession numbers EGAD00001010129 (targeted panel sequencing), EGAD00001010131 (bulk RNA-seq), EGAD00001010157 (WGS), and EGAD00001010166 (scRNA-seq). Access to these datasets can be requested from the Singapore Gastric Cancer Consortium Data Access Committee.
Authors’ Disclosures
P. Tan reports ownership of stock in Tempus AI, Inc. R.J. Huang reports grants from Genentech, Inc. and AI Medical Service, Inc. outside the submitted work. X. Lu reports grants from Ludwig Institute for Cancer Research and NIHR Oxford Biomedical Research Centre during the conduct of the study. K.G. Yeoh reports grants from Singapore Ministry of Health’s National Medical Research Council during the conduct of the study, as well as grants from MiRXES Lab Pte Ltd outside the submitted work. No disclosures were reported by the other authors.
Authors’ Contributions
K.K. Huang: Conceptualization, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. T. Hagihara: Resources, formal analysis, investigation, methodology. B.S.X. Lian: Investigation, methodology. Z.X. Ong: Formal analysis, investigation. S.K. Lim: Investigation. R.H.H. Chong: Data curation, investigation. S. Srivastava: Investigation. J.X. Kang: Formal analysis, visualization, methodology. M.Y. Lee: Formal analysis, visualization, methodology. A.L.-K. Tan: Project administration. M. Lee: Investigation, methodology. S.W.T. Ho: Resources. S. Aishah Binte Abdul Ghani: Investigation. C.S.Y. Ng: Investigation. R. Liang: Formal analysis. L. Liu: Investigation. S.T. Tay: Investigation. X. Ong: Formal analysis. F. Zhu: Project administration. H. Chen: Resources, methodology. Z. Li: Resources, methodology. T.L. Ang: Resources, investigation. T. Gotoda: Resources, investigation. R.J. Huang: Resources, investigation. C.J.L. Khor: Resources, investigation. H.-S. Kim: Resources, investigation. L.H.S. Lau: Resources, investigation. Y.-C. Lee: Resources, investigation. A. Takasu: Resources, investigation. M. Teh: Resources, investigation. M.Y. Thian: Resources, investigation. W.L. Tam: Supervision, methodology. X. Lu: Methodology. S.H. Wong: Supervision, methodology. J.B.Y. So: Resources. H. Chung: Resources, investigation. J. Lee: Resources, investigation, methodology. K.G. Yeoh: Conceptualization, resources, supervision, funding acquisition, investigation, methodology. P. Tan: Conceptualization, resources, data curation, supervision, funding acquisition, investigation, methodology, writing–original draft, project administration, writing–review and editing.
References
- 1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021;71:209–49.
- 2. Yeoh KG, Tan P. Mapping the genomic diaspora of gastric cancer. Nat Rev Cancer 2022;22:71–84.
- 3. de Vries AC, van Grieken NC, Looman CW, Casparie MK, de Vries E, Meijer GA, et al. Gastric cancer risk in patients with premalignant gastric lesions: a nationwide cohort study in the Netherlands. Gastroenterology 2008;134:945–52.
- 4. Polk DB, Peek RM Jr. Helicobacter pylori: gastric cancer and beyond. Nat Rev Cancer 2010;10:403–14.
- 5. Yao X, Smolka AJ. Gastric parietal cell physiology and Helicobacter pylori-induced disease. Gastroenterology 2019;156:2158–73.
- 6. Suzuki A, Katoh H, Komura D, Kakiuchi M, Tagashira A, Yamamoto S, et al. Defined lifestyle and germline factors predispose Asian populations to gastric cancer. Sci Adv 2020;6:eaav9778.
- 7. Grando SA. Connections of nicotine to cancer. Nat Rev Cancer 2014;14:419–29.
- 8. Na HK, Lee JY. Molecular basis of alcohol-related gastric and colon cancer. Int J Mol Sci 2017;18:1116.
- 9. Fu K, Cheung AHK, Wong CC, Liu W, Zhou Y, Wang F, et al. Streptococcus anginosus promotes gastric inflammation, atrophy, and tumorigenesis in mice. Cell 2024;187:882–96.e17.
- 10. Weeks LD, Ebert BL. Causes and consequences of clonal hematopoiesis. Blood 2023;142:2235–46.
- 11. Tian R, Wiley B, Liu J, Zong X, Truong B, Zhao S, et al. Clonal hematopoiesis and risk of incident lung cancer. J Clin Oncol 2023;41:1423–33.
- 12. Feng Y, Yuan Q, Newsome RC, Robinson T, Bowman RL, Zuniga AN, et al. Hematopoietic-specific heterozygous loss of Dnmt3a exacerbates colitis-associated colon cancer. J Exp Med 2023;220:e20230011.
- 13. Wong WJ, Emdin C, Bick AG, Zekavat SM, Niroula A, Pirruccello JP, et al. Clonal haematopoiesis and risk of chronic liver disease. Nature 2023;616:747–54.
- 14. Huang KK, Ma H, Chong RHH, Uchihara T, Lian BSX, Zhu F, et al. Spatiotemporal genomic profiling of intestinal metaplasia reveals clonal dynamics of gastric cancer progression. Cancer Cell 2023;41:2019–37.e8.
- 15. Lee JWJ, Zhu F, Srivastava S, Tsao SK, Khor C, Ho KY, et al. Severity of gastric intestinal metaplasia predicts the risk of gastric cancer: a prospective multicentre cohort study (GCEP). Gut 2022;71:854–63.
- 16. White JR, Banks M. Identifying the pre-malignant stomach: from guidelines to practice. Transl Gastroenterol Hepatol 2022;7:8.
- 17. Salama NR, Hartung ML, Müller A. Life in the human stomach: persistence strategies of the bacterial pathogen Helicobacter pylori. Nat Rev Microbiol 2013;11:385–99.
- 18. Thorell K, Muñoz-Ramírez ZY, Wang D, Sandoval-Motta S, Boscolo Agostini R, Ghirotto S, et al. The Helicobacter pylori Genome Project: insights into H. pylori population structure from analysis of a worldwide collection of complete genomes. Nat Commun 2023;14:8184.
- 19. Peek RM Jr, Blaser MJ. Helicobacter pylori and gastrointestinal tract adenocarcinomas. Nat Rev Cancer 2002;2:28–37.
- 20. Nešić D, Buti L, Lu X, Stebbins CE. Structure of the Helicobacter pylori CagA oncoprotein bound to the human tumor suppressor ASPP2. Proc Natl Acad Sci U S A 2014;111:1562–7.
- 21. Tsang YH, Lamb A, Romero-Gallo J, Huang B, Ito K, Peek RM Jr, et al. Helicobacter pylori CagA targets gastric tumor suppressor RUNX3 for proteasome-mediated degradation. Oncogene 2010;29:5643–50.
- 22. Buti L, Spooner E, Van der Veen AG, Rappuoli R, Covacci A, Ploegh HL. Helicobacter pylori cytotoxin-associated gene A (CagA) subverts the apoptosis-stimulating protein of p53 (ASPP2) tumor suppressor pathway of the host. Proc Natl Acad Sci U S A 2011;108:9238–43.
- 23. Correa P, Piazuelo MB. The gastric precancerous cascade. J Dig Dis 2012;13:2–9.
- 24. Martínez-Jiménez F, Muiños F, Sentís I, Deu-Pons J, Reyes-Salazar I, Arnedo-Pac C, et al. A compendium of mutational cancer driver genes. Nat Rev Cancer 2020;20:555–72.
- 25. Coorens THH, Collord G, Jung H, Wang Y, Moore L, Hooks Y, et al. The somatic mutation landscape of normal gastric epithelium. Nature 2025;640:418–26.
- 26. Liu Y, Sethi NS, Hinoue T, Schneider BG, Cherniack AD, Sanchez-Vega F, et al. Comparative molecular analysis of gastrointestinal adenocarcinomas. Cancer Cell 2018;33:721–35.e8.
- 27. Xu C, Huang KK, Law JH, Chua JS, Sheng T, Flores NM, et al. Comprehensive molecular phenotyping of ARID1A-deficient gastric cancer reveals pervasive epigenomic reprogramming and therapeutic opportunities. Gut 2023;72:1651–63.
- 28. Yuan W, Shi Y, Dai S, Deng M, Zhu K, Xu Y, et al. The role of MAPK pathway in gastric cancer: unveiling molecular crosstalk and therapeutic prospects. J Transl Med 2024;22:1142.
- 29. Xue Z, Vis DJ, Bruna A, Sustic T, van Wageningen S, Batra AS, et al. MAP3K1 and MAP2K4 mutations are associated with sensitivity to MEK inhibitors in multiple cancer models. Cell Res 2018;28:719–29.
- 30. Wang X, Min S, Liu H, Wu N, Liu X, Wang T, et al. Nf1 loss promotes Kras-driven lung adenocarcinoma and results in Psat1-mediated glutamate dependence. EMBO Mol Med 2019;11:e9856.
- 31. Klomp JA, Klomp JE, Stalnecker CA, Bryant KL, Edwards AC, Drizyte-Miller K, et al. Defining the KRAS- and ERK-dependent transcriptome in KRAS-mutant cancers. Science 2024;384:eadk0775.
- 32. Yang Q, Yasuda T, Choi E, Toyoda T, Roland JT, Uchida E, et al. MEK inhibitor reverses metaplasia and allows re-emergence of normal lineages in Helicobacter pylori-infected gerbils. Gastroenterology 2019;156:577–81.e4.
- 33. Kim H, Jang B, Zhang C, Caldwell B, Park DJ, Kong SH, et al. Targeting stem cells and dysplastic features with dual MEK/ERK and STAT3 suppression in gastric carcinogenesis. Gastroenterology 2024;166:117–31.
- 34. Nanki K, Toshimitsu K, Takano A, Fujii M, Shimokawa M, Ohta Y, et al. Divergent routes toward Wnt and R-spondin niche independency during human gastric carcinogenesis. Cell 2018;174:856–69.e17.
- 35. Wang K, Yuen ST, Xu J, Lee SP, Yan HH, Shi ST, et al. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat Genet 2014;46:573–82.
- 36. Díaz-Gay M, Vangara R, Barnes M, Wang X, Islam SMA, Vermes I, et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics 2023;39:btad756.
- 37. Ding Q, Edwards MM, Wang N, Zhu X, Bracci AN, Hulke ML, et al. The genetic architecture of DNA replication timing in human pluripotent stem cells. Nat Commun 2021;12:6746.
- 38. Poetsch AR. The genomics of oxidative DNA damage, repair, and resulting mutagenesis. Comput Struct Biotechnol J 2020;18:207–19.
- 39. Caliri AW, Tommasi S, Besaratinia A. Relationships among smoking, oxidative stress, inflammation, macromolecular damage, and cancer. Mutat Res Rev Mutat Res 2021;787:108365.
- 40. Takeshima H, Ushijima T. Accumulation of genetic and epigenetic alterations in normal cells and cancer risk. NPJ Precis Oncol 2019;3:7.
- 41. Zhu F, Loh M, Hill J, Lee S, Koh KX, Lai KW, et al. Genetic factors associated with intestinal metaplasia in a high risk Singapore-Chinese population: a cohort study. BMC Gastroenterol 2009;9:76.
- 42. Jaiswal S, Fontanillas P, Flannick J, Manning A, Grauman PV, Mar BG, et al. Age-related clonal hematopoiesis associated with adverse outcomes. N Engl J Med 2014;371:2488–98.
- 43. Bolton KL, Ptashkin RN, Gao T, Braunstein L, Devlin SM, Kelly D, et al. Cancer therapy shapes the fitness landscape of clonal hematopoiesis. Nat Genet 2020;52:1219–26.
- 44. Pich O, Reyes-Salazar I, Gonzalez-Perez A, Lopez-Bigas N. Discovering the drivers of clonal hematopoiesis. Nat Commun 2022;13:4267.
- 45. Genovese G, Kähler AK, Handsaker RE, Lindberg J, Rose SA, Bakhoum SF, et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N Engl J Med 2014;371:2477–87.
- 46. Dawoud AAZ, Tapper WJ, Cross NCP. Clonal myelopoiesis in the UK Biobank cohort: ASXL1 mutations are strongly associated with smoking. Leukemia 2020;34:2660–72.
- 47. Khoury JD, Solary E, Abla O, Akkari Y, Alaggio R, Apperley JF, et al. The 5th edition of the World Health Organization classification of haematolymphoid tumours: myeloid and histiocytic/dendritic neoplasms. Leukemia 2022;36:1703–19.
- 48. Mostov KE, Friedlander M, Blobel G. The receptor for transepithelial transport of IgA and IgM contains multiple immunoglobulin-like domains. Nature 1984;308:37–43.
- 49. Cobo I, Tanaka T, Glass CK, Yeang C. Clonal hematopoiesis driven by DNMT3A and TET2 mutations: role in monocyte and macrophage biology and atherosclerotic cardiovascular disease. Curr Opin Hematol 2022;29:1–7.
- 50. Choi E, Hendley AM, Bailey JM, Leach SD, Goldenring JR. Expression of activated ras in gastric chief cells of mice leads to the full spectrum of metaplastic lineage transitions. Gastroenterology 2016;150:918–30.e13.
- 51. Matkar SS, Durham A, Brice A, Wang TC, Rustgi AK, Hua X. Systemic activation of K-ras rapidly induces gastric hyperplasia and metaplasia in mice. Am J Cancer Res 2011;1:432–45.
- 52. Katz-Summercorn AC, Jammula S, Frangou A, Peneva I, O’Donovan M, Tripathi M, et al. Multi-omic cross-sectional cohort study of pre-malignant Barrett’s esophagus reveals early structural variation and retrotransposon activity. Nat Commun 2022;13:1407.
- 53. Secrier M, Li X, de Silva N, Eldridge MD, Contino G, Bornschein J, et al. Mutational signatures in esophageal adenocarcinoma define etiologically distinct subgroups with therapeutic relevance. Nat Genet 2016;48:1131–41.
- 54. Endicott JL, Nolte PA, Shen H, Laird PW. Cell division drives DNA methylation loss in late-replicating domains in primary human cells. Nat Commun 2022;13:6659.
- 55. Christensen S, Van der Roest B, Besselink N, Janssen R, Boymans S, Martens JWM, et al. 5-Fluorouracil treatment induces characteristic T>G mutations in human cancer. Nat Commun 2019;10:4571.
- 56. Pich O, Muiños F, Lolkema MP, Steeghs N, Gonzalez-Perez A, Lopez-Bigas N. The mutational footprints of cancer therapies. Nat Genet 2019;51:1732–40.
- 57. Malfertheiner P, Camargo MC, El-Omar E, Liou JM, Peek R, Schulz C, et al. Helicobacter pylori infection. Nat Rev Dis Primers 2023;9:19.
- 58. Pan KF, Li WQ, Zhang L, Liu WD, Ma JL, Zhang Y, et al. Gastric cancer prevention by community eradication of Helicobacter pylori: a cluster-randomized controlled trial. Nat Med 2024;30:3250–60.
- 59. Lee YC, Chen TH, Chiu HM, Shun CT, Chiang H, Liu TY, et al. The benefit of mass eradication of Helicobacter pylori infection: a community-based study of gastric cancer prevention. Gut 2013;62:676–82.
- 60. Fukase K, Kato M, Kikuchi S, Inoue K, Uemura N, Okamoto S, et al. Effect of eradication of Helicobacter pylori on incidence of metachronous gastric carcinoma after endoscopic resection of early gastric cancer: an open-label, randomised controlled trial. Lancet 2008;372:392–7.
- 61. Zeng R, Sung JJY, Yu J. New pathogen for gastric cancer: Streptococcus anginosus. Clin Transl Med 2024;14:e70104.
- 62. Kar SP, Quiros PM, Gu M, Jiang T, Mitchell J, Langdon R, et al. Genome-wide analyses of 200,453 individuals yield new insights into the causes and consequences of clonal hematopoiesis. Nat Genet 2022;54:1155–66.
- 63. Jaiswal S, Ebert BL. Clonal hematopoiesis in human aging and disease. Science 2019;366:eaan4673.
- 64. Cook EK, Luo M, Rauh MJ. Clonal hematopoiesis and inflammation: partners in leukemogenesis and comorbidity. Exp Hematol 2020;83:85–94.
- 65. Stein A, Metzeler K, Kubasch AS, Rommel KP, Desch S, Buettner P, et al. Clonal hematopoiesis and cardiovascular disease: deciphering interconnections. Basic Res Cardiol 2022;117:55.
- 66. Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics 2019;35:4453–5.
- 67. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024;630:493–500.
- 68. Zhang J, Bajari R, Andric D, Gerthoffert F, Lepsa A, Nahal-Bose H, et al. The international cancer genome Consortium data portal. Nat Biotechnol 2019;37:367–9.
- 69. Akalin A, Kormaksson M, Li S, Garrett-Bakelman FE, Figueroa ME, Melnick A, et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol 2012;13:R87.
- 70. Walker MA, Pedamallu CS, Ojesina AI, Bullman S, Sharpe T, Whelan CW, et al. GATK PathSeq: a customizable computational tool for the discovery and identification of microbial sequences in libraries from eukaryotic hosts. Bioinformatics 2018;34:4287–9.
- 71. Zhou R, Ng SK, Sung JJY, Goh WWB, Wong SH. Data pre-processing for analyzing microbiome data - a mini review. Comput Struct Biotechnol J 2023;21:4804–15.
- 72. Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 2019;37:773–82.
- 73. Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-Garcia W, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun 2013;4:2612.
- 74. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol 2019;20:257.
- 75. Fang S, Xu M, Cao L, Liu X, Bezulj M, Tan L, et al. Stereopy: modeling comparative and spatiotemporal cellular heterogeneity via multi-sample spatial transcriptomics. Nat Commun 2025;16:3741.