Abstract
Computational modeling of somatic evolution, a process shaped by ecology and impacting both host cells and microbial communities in the human body, can capture important dynamics driving carcinogenesis. Here we considered models for esophageal adenocarcinoma (EAC), a cancer that has dramatically increased in incidence over the past few decades in Western populations, with high case fatality rates due to late-stage diagnoses. Despite advancements in genomic analyses of the precursor Barrett’s esophagus (BE), prevention of late-stage EAC remains a significant clinical challenge. Previous microbiome studies in BE/EAC have focused on quantifying static microbial abundance differences rather than determining population dynamics. Using whole genome sequencing data from a total of 505 esophageal samples, we first applied a robust bioinformatics pipeline to extract non-host DNA reads, mapped these putative reads to microbial taxa, and retained those taxa with high genomic coverage. When applying mathematical models of demographic stochasticity to sequential stages of progression to EAC, we observed evidence of neutral dynamics in community assembly within normal esophageal tissue and BE, but not EAC. In a large case–control study of BE patients who progressed to EAC versus BE patients with non-cancer outcomes (NCO) during follow-up (mean = 10.5 years), we found that Helicobacter pylori deviated significantly from the neutral expectation in BE NCO only, suggesting that factors related to H. pylori or H. pylori infection itself may influence EAC risk. Additionally, stochastic simulations incorporating selection recapitulated non-neutral behaviors observed. Formally modeling dynamics during progression holds promise in clinical applications by offering a deeper understanding of microbial involvement in cancer development.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12864-026-12545-w.
Keywords: Esophageal adenocarcinoma, Cancer microbiome, Mathematical modeling, Community assembly, Barrett’s esophagus, Helicobacter pylori
Introduction
To improve our understanding of microbiome dynamics in certain environments, we must elucidate the mechanisms driving microbial community assembly. Most studies toward this aim focus on the relative contribution of niche- vs. neutral processes. Mathematical modeling can be used to quantify such dynamics in microbial populations but thus far has been underutilized in the microbiome analysis of human cancers. The assembly dynamics of microbes in relation to human cancer progression, as well as their potential to enhance predictive models of cancer progression and metastasis, remain largely unexplored. The involvement of microbes in human cancer development is well-supported, specifically that many cancers (such as breast, lung, and colon cancers) evolve in the presence of microbial communities [1–4]. While research on the cancer microbiome is increasing, many sequencing studies of large patient cohorts to date have primarily focused on describing microbial abundances at static time points rather than modeling the temporal dynamics of populations or communities explicitly. Development of these models is essential for hypothesis testing, and for supporting rigorous conclusions on expected versus observed community assembly characteristics over time in the human microbiome.
Esophageal adenocarcinoma (EAC) is a cancer that occurs in the distal esophagus and is nearly always fatal when diagnosed in the advanced stage. EAC develops through a slow stepwise progression through defined premalignant stages of disease, with microbiome involvement at each stage [5–7]. More specifically, EAC arises in a premalignant tissue called Barrett’s esophagus (BE), a relatively common condition in the US (1–2% prevalence with incidence continuing to rise) wherein the normal squamous epithelium of the distal esophagus near the gastroesophageal junction is replaced with specialized intestinal metaplasia [8]. BE is often the result of longstanding exposure to stomach acid as an adaptive response of the esophageal tissue in patients with symptoms of gastroesophageal reflux disease (GERD). In this way, a common pathway to progress to EAC from normal esophagus is through the stages of GERD then BE, and eventually EAC. Patients diagnosed with BE are recommended to undergo regular surveillance exams throughout their lifetime to look for potential early-stage EACs that are easier to treat than late-stage cancers [9]. Surveillance includes taking tissue biopsies, which allows a spatial and temporal map of EAC development to be studied. Although nearly all EAC develops from BE [8], the annual progression rate to EAC from BE is very low (< 0.5% per year [10]) and thus most patients with BE undergo unnecessary surveillance. There is currently an unmet clinical need for improved cancer risk stratification in BE, including the development of reliable biomarkers for future EAC progression that enable the tailoring of cancer prevention strategies in a more personalized way. Therefore, EAC provides a useful system for developing temporal models, and by doing so, fulfills a critical knowledge gap in EAC prevention and the role of microbiome in cancer formation.
Recent studies have quantified the microbial communities within the esophagus at sequential disease states on the pathway to EAC [5, 7, 11–14]. These studies have shown shifts in the relative abundances of microbial taxa, Gram-negative versus Gram-positive bacteria, as well as microbial community diversity across stages of EAC progression. Additionally, characteristics of the oral microbiome have been shown to be associated with stages of progression in BE independent of patient oral health [15]. One dominant bacterium that has been examined extensively for its relationship to EAC is Helicobacter pylori. Although H. pylori is a known carcinogen with an established causal link to non-cardia gastric cancer and mucosa-associated lymphoid tissue (MALT) lymphoma [16], some epidemiological studies have demonstrated an inverse association with EAC [17–19]. A few studies even suggest the increase of widespread antibiotics and lower prevalence of H. pylori may be contributing to the fivefold increase of EAC cases in the United States and other Western populations over the last few decades [6, 20]. Although the absence of H. pylori has been shown to be associated with aneuploidy [21] and risk of advanced stages of high-grade dysplasia (HGD) and EAC in patients with BE [5], a mechanistic role of H. pylori in EAC progression, and the population dynamics of H. pylori growth in BE, have yet to be elucidated. Furthermore, a recent population-based study did not find an association with H. pylori eradication and EAC risk [22]. Overall, there is biological plausibility that altered microbiome and dysbiosis in BE may promote progression to EAC, but underlying mechanisms need to be defined. Here we investigate the microbial community dynamics evident in the stages of progression to EAC using mathematical models.
Existing microbiome studies of EAC relied on traditional statistical methods to test for differences in microbial species abundance between patient groups from single time-point data. These methods are often not designed for temporal dynamics and cannot be used to account for the expected variation in population sizes as the communities change over time. This requires time-dependent models of evolving populations during the progression to cancer. Relevant models of community assembly dynamics have been successfully applied to microbiome data of various environments such as compost, seawater, and sediment, as well as organisms such as roundworms, mice, and jellyfish [23]. They have also been applied to data on microbial communities within the human body such as comparing healthy versus diseased lung [24] and comparing human skin from differing geographical locations which found microbiomes from urbanized cities likely evolve through niche-based processes [25, 26]. To similarly capture population dynamics of each microbial taxa in the community as well as continual migration of microbes into the esophagus, we developed models of community assembly specifically for EAC progression. By assuming certain growth and migration parameters for different species, we tested hypotheses such as the presence of predominantly neutral dynamics driven by demographic stochasticity versus selective forces when fitting models to occurrence-abundance patterns measured in metagenomic data. Notably, these models focus on species-level selection in evolving microbial populations within the host rather than explicitly including gene-level selection, partly due to the microbial sequence resolution of the available human datasets.
Here we analyzed whole genome sequencing (WGS) data generated from 505 esophageal tissue samples obtained previously from patient cohorts with diagnoses spanning the disease continuum from normal esophageal mucosa to EAC. Briefly, our approach uses a custom pipeline to extract non-host DNA from each sample, classifies reads as microbial taxa with stringent genomic coverage requirements, and then applies mathematical models of community assembly at each stage progression to EAC separately. We observed evidence for neutral dynamics (i.e., assuming ecological equivalence between species) in normal, GERD, and BE tissue, but not in EAC. Our findings suggest that modeling microbial community assembly can provide insights into cancer risk and support new strategies for EAC prevention.
Materials and methods
Patient cohort information
To perform microbial analysis of sequential stages of EAC progression, we first performed a literature search for available deep whole genome sequencing (WGS) datasets of esophageal tissue samples from cohorts of individuals without histologically confirmed GERD as normal controls, patients with GERD, patients with Barrett’s esophagus, and patients with EAC. We were able to access published WGS data for each phenotype of interest, as outlined briefly in Table 1 (expanded details provided in Supplementary Table S1), and then analyzed the samples with our computational pipelines described in detail below. Previous microbiome studies have compared the esophageal microbial communities in patients with normal esophageal tissue versus those with BE, and/or with EAC [11, 12, 14, 30–34]. Few microbiome studies have separately categorized BE with low-grade and/or high-grade dysplasia [7, 30], and we also do not examine this herein.
Table 1.
Description of patient datasets
| Disease stage | Total # of samples | Data/Sample type | Ref |
|---|---|---|---|
| Normal esophagus | 50 | Whole genome sequencing (WGS), esophageal brushing | [14] |
| GERD esophagus | 29 | ||
| BE Non-Cancer Outcome | 160 | WGS, isolated BE epithelium from esophageal tissue biopsy | [27] |
| BE Cancer Outcome | 160 | ||
| EAC: Cohort 1 | 83 | WGS, EAC tumor tissue | [28] |
| EAC: Cohort 2 | 23 | WGS, EAC tumor tissue | [29] |
| Total | 505 |
Disease stage characteristics, data and sample type, numbers of samples analyzed, and publication reference provided for each dataset (see Supplementary Table S1 for expanded details on the cohorts and samples included). BE = Barrett’s esophagus, EAC = esophageal adenocarcinoma, WGS = whole genome sequencing
In particular, the BE cohort analyzed in our study has the unique advantage of enabling the comparison of baseline whole genome sequencing data (WGS) data obtained from BE tissue samples in patients with BE who later progressed to EAC (‘cancer outcome’; CO) versus baseline WGS data from BE samples in patients with BE who did not progress to EAC (‘non-cancer outcome’; NCO) [27]. Eighty patients were included, each with two biopsy samples taken at two time points (320 total samples; see Supplementary Figure S1). The WGS data were obtained as part of a retrospective case–control study performed using the Seattle Barrett’s Esophagus Cohort (see [27] for details). The NCO patients remained cancer-free over median 17.47 years of active follow-up and across both time point 1 (TP-1) and time point 2 (TP-2). The CO patients had BE at (TP-1), and then progressed to EAC that was diagnosed (TP-2). For all patients at both TP-1 and TP-2, biopsy samples were taken at 2 different levels within their Barrett’s segment length (see [35] for exact distances from gastroesophageal junction and patient ages). Notably, all samples for this BE dataset were derived from DNA obtained from the isolated epithelium of the BE biopsies, rather than whole biopsy specimens. For 37 of 40 of patients with a cancer outcome and 29 of 40 of patients with no cancer outcome, extensive interviews were performed and data on whether acid suppression medication was used prior to or during the study was available.
To compare BE to the normal esophageal microbiome, we also analyzed WGS data obtained from esophageal brushings of histologically normal tissue (n = 50) from patients with gastrointestinal symptoms but without histologic evidence of reflux esophagitis. Additionally, we analyzed esophageal tissue with histologically confirmed reflux esophagitis from patients with gastroesophageal reflux disease (GERD; n = 29). GERD is of interest because it is a high-risk condition for BE and symptomatic GERD (erosive esophagitis) is a common precursor stage. The samples were labeled as normal or GERD based on a histological report of the patient from which the sample was taken during an upper gastrointestinal endoscopy [14].
Additionally, we analyzed EAC (n = 83) tissue samples from The International Cancer Genome Consortium (ICGC) to complete the histologically normal to cancer microbiome continuum (EAC cohort 1) [28]. Finally, to validate our findings we analyzed an additional EAC dataset (n = 23) from patients who had visible BE (EAC cohort 2) [29]. Components of this dataset are now also included in ICGC and so, to ensure independence, we removed any overlapping samples from analysis of the ICGC dataset. A full description of the 505 total samples analyzed in our study is provided in Supplementary Table S1. For BE, main results described below include the 80 samples from the 40 NCO patients at TP-1 (BE NCO cohort) versus the 80 samples from the 40 NCO patients at TP-1 (BE CO cohort) prior to their EAC diagnosis.
Computational host filtration pipeline and identification of microbial taxa
First, we removed human DNA from the WGS data using our validated computational pipeline (Fig. 1A, see [36] for details). Briefly, host filtering is performed using minimap2 (v.2.17), mapping reads against the GRCh38.p14 and T2T-CHM13v2.0 human genome assemblies [37]. Remaining reads are considered non-host, and were mapped to known microbial species using the SHOGUN [38] pipeline and the Web of Life version 1 database [39]. To avoid false positive from potentially host-contaminated microbial reference genomes, we applied Exhaustive [40] and Conterminator [41] with human references GRCh38.p14 [42], T2T-CHM13v2.0 [43], and the Human Pangenome Research Consortium (HPRC) pangenomes [44] against the Web of Life, subsequently identifying 236 microbial taxa with regions of their genomes aligning with human reads (as was done with Web of Life-clean [40]). Out of an abundance of caution, we dropped any occurrences of the 236 potentially contaminated taxa from our taxonomy tables. In this pipeline, host filtering steps are essential to avoid the potential erroneous assignment of human DNA to microbial taxa. We note that the procedure outlined above was performed across all datasets analyzed in main results (Supplementary Table S1) and for 377 buccal mucosa samples from the Human Microbiome Project that were used to evaluate Helicobacter pylori prevalence in oral communities [45, 46].
Fig. 1.
Neutral and non-neutral community assembly and model dynamics in the context of the esophagus microbiome. A Analysis workflow for included samples with available sequencing data: host DNA depletion, taxonomic classification, and modeling. B Model overview. C Neutral model with mixed microbes in the community and an equal birth and death rates among all microbial taxa. D Non-neutral model with potential ecological dominance of particular microbial taxa in the community enabled by different birth and death rates among microbial taxa
To reduce risk of false positives from short read multimapping, we applied a novel coverage dispersion filter called micov which eliminated taxa with low genomic coverage [47] (https://github.com/biocore/micov). Using micov, we determined coverage dispersion thresholds for microbial genomes that were represented or mismapped by testing on a Staphylococcus aureus monoculture [48] subset to various sequencing depths. We were able to distinguish Staphylococcus aureus and other similar species in the Staphylococcus genus from all other microbial mis-mappings with a sensitivity of 87.8% and specificity of 95.18%. Our criteria for determining if DNA reads were mis-mapped to a particular taxon when genome coverage was less than 10% across all samples (those taxa with > 10% coverage kept, as determined using Zebra [48]) were: A) there were fewer than 4 unique areas of the taxa with mapped reads, B) greater than 75% of reads mapped to the same 3 consecutive bins of the genome, or C) greater than 75% of reads mapping to the taxon are only mapped to a single consecutive region of the genome in any one sample (see Supplementary Methods for details). Taxa meeting any of the 3 criteria above were removed from downstream analyses. For the BE NCO and BE CO dataset, only criteria A was used for filtering due to the low microbial biomass of these samples. Within each dataset, samples were rarefied and samples with less than the rarefaction depth were removed (Supplementary Figure S2). All taxa were collapsed to species level unless otherwise stated.
Additionally, to confirm our modeling results were robust to varying microbial DNA classification approaches, we also analyzed the BE NCO and BE CO datasets using MetaPhlAn4 with the flags “-t rel_ab_w_read_stats –unclassified_estimation” (run using the Nextflow workflow available at https://github.com/FredHutch/metaphlan-nf commit 4a31444, and the reference database version mpa_vJan21_CHOCOPhlAnSGB_202103) [49]. Notably, MetaPhlAn was selected as a top performing tool in the Critical Assessment of Metagenome Interpretation Round I [50] and Round II [51]. Finally, we performed an additional analysis that restricted mapping exclusively to a list of previously documented human-associated microbes (see Supplementary Table S1A from [3]).
Statistical analysis of metagenomic data
All statistical analysis of the metagenomic abundances generated using the methods described above were performed using QIIME2 (v. 2023.2.0) [52] using the API in Python (v. 3.8.16). Beta-diversity was calculated using dimensionality reduction of the BIOM table with Gemelli’s Robust Principal Component Analysis (RPCA) function using DEICODE (v. 0.2.4) [52, 53] to create a distance matrix for PERMANOVA and RPCA-Principal Component Analysis plots.
Community assembly modeling of human esophageal microbiome
Figure 1B illustrates the model setup. The esophageal microbiome contains different microbial species undergoing growth dynamics described mathematically by stochastic birth–death-immigration processes. We assume there are approximately constant N microbial cells in an adult patient’s esophageal microbiome community and Ni represents the number of microbes of species i in the esophagus (thus N = ΣNi). When an individual microbial cell dies in the esophagus, it can be replaced by either another microbial cell that replicates in the present esophageal community or it can be replaced by a microbial cell entering the microenvironment from a source pool. Immigration of microbes from a source pool to the esophagus likely includes microbes from the oral cavity, proximal stomach, and the adjacent normal tissue in the case of BE microbiome. Immigration allows new microbes to enter the esophagus, eliminating the unrealistic fixation of a single microbial species with the above dynamics that includes stochastic extinction. Following the mathematical theory developed previously for these modeling assumptions [23, 54], we applied two different approaches for building the community assembly models based on the presence/absence of selection of certain microbial species within the community.
The first approach is built on neutral theory [55] and thus assumes the rates for birth (bi) and death (di) are the same for all microbes in a community (i.e., bi = b and di = d). Thus, in this model, no microbial species has a selective growth advantage over any other (Fig. 1C). Here, we are defining a selective advantage as an increased birth or decreased death rate for one microbe over others. The second more general approach is built on a non-neutral model called niche theory which assumes that some microbes may have a selective advantage over others. This model accounts for differences in proliferation and death rates across microbial species, allowing for certain microbes to differ in abundance and probability of extinction (Fig. 1D).
When a single microorganism dies, the probability of being replaced by a microbe outside the present esophageal community is represented by m, known as the immigration rate. Alternatively, the probability of being replaced by a species present in the esophagus is (1-m). The probability of a particular species of microbe being chosen for replacement with a birth event is based on the abundance of that microbe in the esophagus (Ni/N) and the source community pool probability for that microbe (pi). As discussed in Sieber et al. [23], we can then calculate the probability that the abundance of species i may increase, decrease, or remain unchanged following the death of a microbe in the local microorganism community using the transition probability master equations corresponding to Hubbell’s original neutral model [55].
If N is large, we can then assume the relative abundance of species i in the esophagus, xi = Ni/N, to be approximately continuous and use the Fokker–Planck equation to model the continuous approximation for the probability density function Φi(xi;N, pi,m). We can then approximate the long-term equilibrium solution of this equation by (see [23] for full derivation),
![]() |
1 |
To connect this to empirical data, we use nonlinear least squares minimization to fit the observed occurrence frequencies fi from the rarefied data to the following truncated cumulative probability density function for observing a particular species in a local community/sample:
![]() |
2 |
where N is the total number of species counts per sample and the relative abundance of species i in the source pool (pi) is approximated by the mean relative abundance of species i across all samples in a given dataset. Thus with pi and the total microbial population in the esophagus (N) defined, the only free parameter to fit to the data is the immigration rate m. Additionally, we calculated the standard coefficient of determination (R2 = 1—the ratio of the sum of squared residuals and the total sum of squares) to provide a measure of goodness of fit for best-fit neutral models in each dataset (see ref [23] for details).
Incorporation of selection into model structure and Gillespie simulations
In the non-neutral model, we assume that each microbial species i has a species-specific birth rate (bi) and death rate (di) that can vary from one another. We assume that the same population dynamics for immigration apply as above- when a microbe dies, it can either be replaced by a microbe inside the source pool or resident within the esophagus. In the non-neutral case, the probability of a microbe being chosen for replacement is not based solely on the relative abundance of the microbe but also on birth and death rates, so we modify the transition rates to take into account both the microbial taxa member that dies (j) and the microbial taxa which will increase in number (i) when calculating the abundance of microbes in the esophagus (N):
![]() |
3 |
In this way, the transition probabilities were modified to account for both the increase of species i, and the decrease of species j.
The probability distribution for the community composition in the non-neutral system can be analytically solved at equilibrium for the case of two different microbial species. However, in a realistic model for a local community considering potentially hundreds of microbial species identified in human tissue samples, the size of the system consisting of transition probabilities becomes too large to be computationally feasible, as discussed in Zapién-Campos et al. [54]. Therefore, we use the Gillespie algorithm to perform stochastic simulations of this model using transition probabilities in Eq. (3) in order to compute the mean relative abundances and occurrence frequencies of the fluctuating microbial populations needed to characterize the community dynamics [54]. When performing simulations, we used the number of hosts, taxa, and microbes per host derived from the dataset of interest with a fixed time simulation of t = 10. To set the initial conditions of the community assembly for each sample in the simulated data at a starting time point, we randomly drew N microbes according to the source pool distributions pi, where we again approximate this distribution by using the microbial relative abundances from the taxonomic table for that dataset. We are then able to test different hypotheses on selective growth advantages for certain microbial species and compare simulated results to real data.
Results
Microbial diversity differs across natural history of EAC progression
We first quantified the microbial species present in each tissue sample by mapping DNA sequencing reads to curated databases of human and microbial genomes (see Materials and methods), and then compared relative abundances between esophageal datasets (Supplementary Figure S3). We found similarities at the phylum level across disease states, although normal esophagus and GERD had higher levels of Firmicutes and Bacteroidetes compared to others. We found many shared taxa between datasets at the species level (Supplementary Figure S4). Alpha-diversity was measured within each disease-stage group using metrics for both richness (Fig. 2A) and Shannon entropy (Fig. 2B). Alpha-diversity measures the species diversity within a community on a local scale (i.e., within 1 sample). There is a relationship between richness (observed features) and Shannon entropy, wherein the equation for Shannon entropy combines information on both richness and species abundance distribution. Specifically, we calculate Shannon entropy
where pi is the relative abundance of species i (pi = count of species i divided by total counts) and S is the number of observed features (richness).
Fig. 2.
Alpha- and beta-diversity metrics across disease sets. A Microbial alpha-diversity represented by taxon richness (i.e., unique observed features) across disease stages. B Microbial alpha-diversity represented by Shannon entropy across disease stages. C Differences in community structure represented by the RPCA-PCoA between disease stages (PERMANOVA p = 0.0001, pseudo-F = 487). Shape corresponds to the disease stage: square = Deshpande et al., 2018, circle = Paulson et al., 2022, diamond = Ross-Innes et al., 2015 & International Cancer Genome Consortium (ICGC) et al., 2020
The median Shannon entropy metrics across disease states were: Normal- 4.82 (IQR = 4.54–5.21), GERD- 5.17 (IQR = 4.64–5.38), BE NCO- 3.79 (IQR = 3.71–3.98), BE CO- 3.76 (IQR = 3.67–3.85), EAC cohort 1- 2.48 (IQR = 2.26–2.81), EAC cohort 2- 2.90 (IQR = 2.84–2.92). We found statistically significant differences across all groups in alpha-diversity (Kruskal–Wallis p = 4.81 × 10–43; Shannon entropy) and then in post-hoc Dunn’s tests pairwise comparisons between disease stages of Normal/GERD vs. BE vs. EAC (p < 0.01, Shannon entropy). Non-significant differences in pairwise comparison Dunn’s tests (Shannon entropy) were found for Normal versus GERD, BE CO versus BE NCO, and EAC cohort 1 versus EAC cohort 2. For Normal/GERD samples, we found a high richness (median = 296 in Normal esophagus, 309.5 in GERD) and largest Shannon entropy metrics. This is potentially influenced by the use of esophageal brushings rather than esophageal biopsies to obtain these samples. Overall, we found decreasing Shannon entropy as stages progressed to EAC, which is consistent with previous findings [11, 12].
We also calculated beta-diversity with DEICODE [53] using a Robust principal component analysis (RPCA; Fig. 2C). Beta-diversity compares the differences in microbial composition between communities (i.e., between samples) and is useful for assessing how communities vary across conditions or environments. We observed clustering among disease stage groups: Normal and GERD samples, BE samples, and EAC samples. As with alpha-diversity, we observed using pairwise PERMANOVA (pseudo-F statistic) that all datasets were statistically significantly different (p = 0.0001). This holds across all pairwise comparisons (p = 0.0001) except for Normal versus GERD (p = 0.4801) and BE NCO versus BE CO (p = 0.5381). When performing the same analyses using batch-corrected metagenomic data for the two EAC cohorts (each cohort considered a batch of same phenotype of interest with tool ConQuR [56]), we found very similar results for alpha and beta diversity metrics (Supplementary Figure S5).
Neutral models generate occurrence-abundance patterns seen in development from normal esophagus to BE
Although comparing exact taxonomic differences between datasets may be influenced by sample collection methods and thus the microenvironments included in the sequencing reads (see Supplementary Table S1), comparing population dynamics predicted by community assembly models can better avoid such biases because conclusions are not based as strongly on specific taxa. We first applied the neutral model (where no bacterial species has a growth advantage over another) to all esophagus disease-stage subsets (Fig. 3, see Materials and methods). In general, we found high R2 values for samples from normal esophagus and GERD (R2 > 0.80) and BE (R2 > 0.65) suggesting that neutral dynamics, driven by demographic and environmental stochasticity, can explain most species populations in these cohorts. We found occurrence-abundance patterns measured in BE NCO and CO samples were consistent across time points, evidence that the total community structure is likely at steady state (Supplementary Figure S6).
Fig. 3.
Community assembly dynamics in progression to EAC. Plots are ordered in sequence of disease state progression from normal to EAC (see Supplementary Table S1 for patient/sample inclusion). The red curves represent the neutral model fit to data points representing unique taxa at the species level. The gray regions delineated by red dotted lines represent 95% bootstrap confidence intervals (obtained by resampling the hosts 100 times with replacement and refitting) and the R2 value represents goodness of fit to the neutral model. The color of the data point represents phyla. Helicobacter pylori is colored pink with a box outline. Vertical line indicates mean relative abundance = 10–4. H. pylori is found with the following occurrence frequency in each group: 19.0% in Normal, 15.4% in GERD, 14.3% in BE NCO, 5.1% in BE CO, 8.8% in EAC cohort 1, 0.0% in EAC cohort 2
Neutral model goodness-of-fit R2 values decreased as stages progressed from normal to EAC. In the case of EAC cohort 1, the neutral model provides a poor fit to the data (R2 = 0.14), suggesting potential evidence for non-neutral microbial dynamics in EAC patients. In particular, we see there may be some species undergoing more neutral dynamics (S-shaped curve of data points for EAC), but a large number of microbial species are found in higher abundance than expected by the neutral expectation. The EAC samples had the largest number of microbial taxa deviating from the neutral curve fit. Similar results for the EAC occurrence-abundance pattern were found in the Ross-Innes et al. EAC cohort 2 dataset (R2 = 0.34, Fig. 3). Interestingly, a subset of Proteobacteria in both EAC datasets appear to have a neutral S-shaped occurrence-abundance curve while many other taxa at higher mean relative abundance deviate significantly from this pattern.
In Fig. 3, the top five most common phyla found in the esophagus are denoted by different colors. H. pylori (pink color designation in each disease-stage panel), although predominantly recognized as a gastric mucosa colonizer, has been found previously in the distal esophagus and aggravates reflux injury [57, 58]. As noted above, H. pylori is of particular interest because of data reporting an inverse association with BE [19] and EAC development [59] in some studies, yet causal association in non-cardia gastric adenocarcinoma through an analogous intestinal metaplasia-dysplasia-adenocarcinoma sequence (also known as the Correa cascade) [16]. We identified H. pylori in samples from every disease stage except for EAC cohort 2 samples from Ross-Innes et al. [29]. The only instances that data for H. pylori deviated significantly from the neutral curve fit was in the GERD group and the BE NCO group. These results hold when sub-setting the cohorts by sex as an EAC risk factor (males have significantly higher risk of EAC than females [60]), except in the case of males with GERD where occurrence-abundance of H. pylori was close to the neutral expectation. Although the sample size for the male GERD patient subset was modest (n = 10), we also found H. pylori at a lower mean relative abundance (1 × 10–6) compared with female GERD patients (n = 19) who had a mean relative abundance of 3.4 × 10–5.
To validate these findings, we fit the neutral model to BE NCO and CO datasets after preprocessing with the MetaPhlAn4 pipeline (see Materials and methods). We identified fewer microbial species overall with this method, but H. pylori is still identified as likely neutral in only the BE CO cases (Supplementary Figure S7).
Finally, we restricted our microbial taxa to those previously documented as human-associated and still found that 1) normal esophagus, GERD and BE show evidence for neutral dynamics while EAC does not, and 2) GERD and BE NCO are the only cases where H. pylori is outside the 95% confidence interval for expected neutral dynamics (Supplementary Figure S8). The top 10 bacterial species with largest deviation from the 95% confidence intervals for a neutral model fit found in each of the BE NCO and CO datasets are provided in Supplementary Tables S2 and S3. Other non-neutral microbes beyond H. pylori included Klebsiella pneumoniae in the NCO BE dataset (nearly identical occurrence-abundance as H. pylori), Prevotella copri in the BE CO cohort, and Neisseria mucosa in both cohorts.
Helicobacter pylori exhibits positive selection in non-progressing BE
Because H. pylori specifically deviated from the neutral model fit in the BE NCO cohort (Fig. 3), we performed Gillespie simulations for neutral and non-neutral scenarios for this dataset. This allowed us to use the relative abundances of taxa observed in the BE NCO tissue samples but vary the birth and death rates of each microbe to evaluate what assumptions for dynamics would represent the data most closely. In the BE NCO neutral model simulation, we found that H. pylori was predicted to have occurrence-abundance data close to the neutral fit curve (Supplementary Figure S9A) unlike the actual data (Fig. 4A). Additionally, simulated H. pylori occurrence frequency was much higher than the occurrence frequency measured in the actual data when assuming a source pool prevalence for H. pylori equaling its mean relative abundance in the BE data (Supplementary Figure S9A). In Supplementary Figure S9B, we show different simulation results for H. pylori with varying birth and death rates and source pool conditions. We found consistent data on H. pylori occurrence-abundance in simulations compared to the empirical data (pink dot with box outline) when assuming a source pool prevalence probability of 1e-05, and thus we used this value for H. pylori for all further model simulations. To examine the oral microbiome as one contributor to the BE source pool, we also found that H. pylori was present in 39 of a total of 377 (15%) buccal mucosa samples from the Human Microbiome Project [45, 46] at a mean relative abundance of 1.13e-05 [45, 45], similar to our assumed value of 1e-05 based on fits to the data. Through varying source pool prevalence and growth rates simultaneously, this analysis also provided a plausible range of assumed immigration rate and relative birth/death rates versus those of other microbial taxa that could create simulated H. pylori occurrence-abundance data that recapitulates the empirical data.
Fig. 4.
Simulation results for BE patients with non-cancer outcome (NCO). A Same data as shown for BE NCO panel in Fig. 3, with neutral model fit for BE NCO patients. B Neutral simulation with data from the BE NCO patients using the adjusted source pool conditions for H. pylori. All (‘A’) microbes have an equal birth and death rate of 1. C Non-neutral simulation for BE NCO data with H. pylori birth and death rates equal to 1 and 0.5, respectively. For all other ('O') microbes, birth rates are drawn independently from a normal distribution (mean = 1, standard deviation = 0.1) and the death rate is set equal to 1. For A-C, pink designates data for H. pylori. For B-C, the source pool prevalence of the H. pylori taxa is set to 1e-5 (H. pylori SP = 1e-5, see Results and Supplementary Figure S9 for details). SP = source pool; HP = H. pylori
More specifically, even with the modified source pool assumption for H. pylori, simulating the full community assembly using the neutral model (assuming all microbes have equal birth rates and equal death rates) generated H. pylori occurrence-abundance data that does not resemble the empirical data (Fig. 4A vs 4B). Therefore, to test the effects of potential non-neutral dynamics, we modified the simulations by assuming that H. pylori has a selective advantage over other microbials species. We tested a variety of different parameter settings to compare to the data in Fig. 4A and found that assuming a lower death rate for H. pylori (dHP = 0.5) compared with other microbes (dA = 1) accurately reflected the actual data (Fig. 4C, Supplementary Figure S9C). We also experimented with variable birth rate and found increased birth rate gives a similar result and fit to the data (Supplementary Figure S9D). Overall, assuming higher expected growth rates (birth – death) for H. pylori compared to other species was necessary to reflect the real data, which aligns with their known predominance in bacterial populations of the stomach when present.
Neutral dynamics do not explain occurrence-abundance patterns in EAC tissue
Because we observed that the neutral model did not fit well to the data for EAC samples, we also performed stochastic simulations based on this dataset to test effects of differing assumptions for microbial population dynamics. With relative abundances from the EAC cohort 1 used as the source pool prevalence for each species, we ran neutral model simulations where the birth and death rates across all microbes were set equal to 1 and resulting occurrence-abundance did not reflect the data, as anticipated (Fig. 5A vs. 5B). We next created simulated data with a random birth rate ranging from 1 to 6 for each microbial species and a constant death rate equal to 1 for each species, and found R2 = 0.59, which more closely resembles the empirical data (Fig. 5C). We re-ran the simulations again with a random death rate ranging from 1.5 to 4 for each microbe and a constant birth rate equal to 4 for each species and found a similar pattern to the original data with R2 = 0.74 (Supplementary Figure S10A). We then varied the birth rate (normally distributed with mean = 1, standard deviation = 0.1) as well as varied the death rate (normally distributed with mean = 1, standard deviation = 0.1) and found that these models did not recapitulate the original data (Supplementary Figure S10 B,C).
Fig. 5.
ICGC EAC cohort 1 data with fits assuming neutral model at steady-state, neutral simulation, and non-neutral simulation. A Same data as shown in Fig. 3, neutral model fit to data from EAC cohort 1 patients in ICGC dataset. B Neutral model simulation with data from the EAC cohort 1 patients in ICGC dataset. All (‘A’) microbes have an equal birth and death rate of 1. C Non-neutral model simulation with all microbes having variable birth rate drawn independently from a uniform distribution ranging from 1 to 6, and a death rate of 1. ICGC = International Cancer Genome Consortium
Finally, when assuming the same growth rates used in Fig. 5C but varying the migration parameter m from 0.0001 to 1, we found that simulated data most closely resembled the original data when m = ~ 0.1 (Supplementary Figure S11). Although there may be multiple fits to the data with certain parameter choices (species-specific prevalence in in source pool[s], birth rate, death rate, and migration rate), our simulation results collectively suggest that substantial differences in taxa-specific growth rates are needed to recapitulate the empirical data for EAC in our model framework.
Discussion
With increasing cases of both BE and EAC in Western populations, new approaches to better understand contributions to EAC risk are essential. As was found with the established role of H. pylori in the formation of peptic ulcers and gastric cancers, and the role of a diverse oral microbiome in oral and systemic health [61], the esophagus also provides a microenvironment for microbial populations to thrive and potentially contribute to patient health and disease. The communities have been quantified previously in numerous studies, mainly using 16S rRNA sequencing, and may influence progression to BE and subsequently EAC but the expected population dynamics of these communities has not been previously explored. Therefore, our aim was to better understand the communities within the esophageal microbiome as a patient progresses from normal tissue to EAC.
To explore dynamics within the esophageal microbiome during EAC progression, we utilized whole genome sequencing (WGS) data from esophageal tissue samples and applied a rigorous host depletion pipeline to ensure we removed all human DNA reads [36] and mapped any potentially microbial reads to an extensively cleaned database (WOL-clean) as well as using a coverage filter called micov. Comparing histological subtypes at each disease stage using the ecological diversity measure of Shannon entropy, we found decreasing diversity from normal esophagus to GERD to BE to EAC.
Study limitations in these analyses include varying sampling techniques and centers (Supplementary Table S1), as well as a lack of reagent ‘blank’ controls and positive spiked controls run in parallel, which would increase confidence in minimizing contamination [4]. Due to limited availability of deep WGS datasets of pre-cancerous esophageal tissues, we analyzed single studies for each disease stage before EAC. For consistent comparison, analyzing datasets that include samples of multiple disease stages sequenced under the same conditions will help to mitigate any potential batch effects across studies. Additionally, we did not assess differences between more advanced premalignant stages within BE such as low-grade dysplasia and high-grade dysplasia.
The effects on our results introduced by these limitations are minimized when considering the community dynamics with rarefied data, where we independently assess data for each disease stage and compare observed occurrence-abundance patterns across stages of progression. We observed that patterns in earlier stages of progression (normal esophagus, GERD, and BE) can be recapitulated with assumption of a neutral model. Zapien-Campus et al. note that such patterns can in fact be recreated by specific assumptions for non-neutral dynamics [54] but the hypothesis of neutrality cannot be rejected for this data. In contrast, patterns in the data for EACs appear quite different, and only a subset of the microbial taxa appear to fit a neutral expectation in their occurrence-abundance patterns. This may imply that more of the EAC microbial populations are undergoing selection pressures in the tumor microenvironment. By simulating data using mathematical models of demographic stochasticity, we can quantify the parameter regimes of differential selection (as measured by population growth rates) that most closely match the empirical data.
Additionally, metagenomic data on microbial prevalence in source pools across the human body, such as the proximal stomach, is currently limited so we did not fully explore variations in the effective species-specific immigration rates beyond those included for H. pylori specifically. Ultimately, it would be ideal to infer all model parameters simultaneously for all species in the system. Recent methods using Bayesian inference that utilize equations for statistical moments of the microbiome composition show promise for this task [62] but these are still expected to work most effectively using longitudinal (time series) data tracking only a small number of species.
We specifically examined H. pylori due to its potentially protective role in EAC development. Across disease states, the mean occurrence frequency of H. pylori was ~ 10%, however, its mean relative abundance across samples was greatest in the NCO group (10–3) compared to other groups (< 10–4). This may suggest that H. pylori has an active role in preventing progression to EAC in NCO patients and BE in patients with GERD, or at least delaying the onset, but further mechanistic studies are needed, particularly since confounding cannot be ruled out here. The only other group that demonstrated non-neutral H. pylori was the GERD group. These patients have not yet crossed over the critical threshold to BE and further stages but may develop BE at a later time point.
Additionally, other microbes in the NCO BE dataset were found to be under possible selection including Klebsiella pneumoniae which had a very similar abundance and occurrence to H. pylori. K. pneumoniae is often found in the human gut but causes serious infections across other tissue types [63] and was linked to Acute Esophageal Necrosis in a case study [64]. There were a few disease-specific microbes that were identified as non-neutral in both cohorts, including Neisseria mucosa, commonly found in the oral and nasal mucosa but can cause upper respiratory infection [65]. Prevotella copri was only found to be non-neutral in the BE CO cohort and is often found in the human gastrointestinal microbiome with varying positive and negative effects [66], but studies have found correlations between P. copri abundance and exacerbated intestinal mucositis [67] as well as colitis [68] in mice. Additionally, P. copri, similar to H. pylori, is implicated in gastric carcinogenesis [69] and was associated with an increased risk for gastric cancer development in a Korean population [70].
Finally, when we modeled the assumption that H. pylori has a higher survival rate compared to other microbes in the esophagus, which was motivated by the observation that H. pylori unequivocally dominates the gastric microbiome when present, the data recapitulated what we observe in BE non-progressors. Although BE patients in our study with available medication data were all prescribed acid suppression medication while on study and/or prior to timepoint 1, higher gastric pH achieved through effective acid suppression should induce a replicative phase for H. pylori biologically, leading to higher abundance theoretically. However, validating this hypothesis would require active monitoring of pH and acid levels in the patient cohort, which was beyond the scope of care at that time. One further hypothesis is that the ecological dominance of H. pylori in the esophagus is protecting against the establishment of microbes that are detrimental to the esophagus and promote BE transformation to EAC.
Well-designed mechanistic and translational studies evaluating the impact of H. pylori virulence factors (e.g., CagA), host genetics, as well as the impact of gastric histology (e.g., gastric atrophy with or without metaplasia) and gastric pH, factors both influenced by H. pylori, on BE development and progression would be informative. Although further validation is needed, our results suggest microbes such as H. pylori in precursor stages of GERD and indolent BE, as well as substantial numbers of taxa in the EAC tissue microenvironment, likely undergo selective pressures that influence relative abundances measured throughout carcinogenesis.
Conclusion
In summary, there is evidence for neutral population dynamics for the majority of microbial species in normal esophagus, GERD, and Barrett’s esophagus tissue microenvironments. This implies that fluctuations in relative abundances measured in previous studies at various time points and disease stages may simply reflect such underlying dynamics. Our data also suggest that Helicobacter pylori in particular may be under positive selection in pre-cancer stages, and EAC microenvironments exhibit population dynamics that are largely non-neutral. Other analyses that incorporate source-sink dynamics, dynamic migration rates with variability across people and over time, as well as priority effects during microbiome assembly are important areas of future work. Overall, bacteria are present in precancerous stages of esophageal adenocarcinoma, and advanced bioinformatics and mathematical modeling are needed to characterize these evolving communities including their stochastic population dynamics from WGS data and clarify their role in cancer progression.
Supplementary Information
Acknowledgements
Figure 1 was made with BioRender.The computational analyses included in this manuscript have used the Triton Shared Computing Cluster at the San Diego Supercomputer Center of UCSD.
Author’s contributions
Conception: C.G., R.K., K.C. Design of the work: C.G., R.K., K.C. Acquisition of data: M.K., T. P., W. M. G., L. B. A., R.K., K.C. Analysis of data: C.G., A.G., S. S. M. Interpretation of data: C.G., J.P.S., S. C. S., S. S. M., R.K., K.C. Software creation: C.G., I.S., Y.W., D.M. Writing—original draft preparation: C.G., K.C. Writing—reviewing and editing: C.G., J.P.S, D.M., S. C. S., S.S.M, T. P., W.M.G., L.B.A., R.K., K.C. All authors reviewed the manuscript.
Funding
This work was supported by AGA Research Foundation (AGA Research Scholar Award AGA2022-13–05), NIH R01 CA270235, and UCSD Academic Senate Grant (RG103468) to K.C. The study was supported in part by the NIDDK-funded San Diego Digestive Diseases Research Center (P30 DK120515). Additionally, this work was supported by NIH grants (R01 CA241728, P30 CA023100, NIH/NIGMS T32GM007198, NCI U24CA248454) to R.K. J.P.S. is supported by grants SD IRACDA, Professors of the Future (5K12GM068524-17) and USDA NIFA (2019–67013-29137). T.G.P. is supported by NIH grants (U2CCA271902, R01 CA140657, R01 CA270235 and R21 CA259687). W.M.G. is supported by NIH grants (U2CCA271902, R01CA266386, R01 CA270235 and U54CA274374). S.C.S. receives funding from a Veterans Affairs Career Development Award (ICX002027A). The contents do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.
Data availability
All datasets used in our study have been previously published [14, 27–29], as outlined in Supplementary Table S1. Please see original publications for details on requesting data access. Code to recreate all results and apply methods to other datasets can be found on GitHub: [https://github.com/cguccione/comad_of_EAC_progression]. The code created for modeling analyses was adapted from [23] and [54]. The workflow used to run MetaPhlAn4 is available at [https://github.com/FredHutch/metaphlan-nf].
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
D.M. is a consultant for BiomeSense, Inc., has equity and receives income. The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. LBA is a co-founder, CSO, scientific advisory member, and consultant for io9, has equity and receives income. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. L.B.A. is a compensated member of the scientific advisory board of Inocras. L.B.A.’s spouse is an employee of Hologic, Inc. L.B.A. declares U.S. provisional applications with serial numbers: 63/289,601; 63/269,033; 63/366,392; 63/412,835 as well as international patent application PCT/US2023/010679. L.B.A. is also an inventor of a US Patent 10,776,718 for source identification by non-negative matrix factorization. R.K. is a scientific advisory board member, and consultant for BiomeSense, Inc., has equity and receives income. He is a scientific advisory board member and has equity in GenCirq. He is a board member of N=1 IBS advisory board and receives income. He has equity in and acts as a consultant for Cybele. He is a Vice President and board member of Microbiota Vault, Inc. He is a Senior Visiting Fellow of HKUST Jockey Club Institute for Advanced Study. The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. K.C. has research grant support from Phathom Pharmaceuticals. W.M.G. serves on scientific advisory board for Guardant Health, consults for Karius, and receives research support from Lucid Diagnostics.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Nejman D, Livyatan I, Fuks G, Gavert N, Zwang Y, Geller LT, et al. The human tumor microbiome is composed of tumor type-specific intracellular bacteria. Science. 2020;368:973–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Dohlman AB, Klug J, Mesko M, Gao IH, Lipkin SM, Shen X, et al. A pan-cancer mycobiome analysis reveals fungal involvement in gastrointestinal and lung tumors. Cell. 2022;185:3807-3822.e12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Battaglia TW, Mimpen IL, Traets JJH, van Hoeck A, Zeverijn LJ, Geurts BS, et al. A pan-cancer analysis of the microbiome in metastatic cancer. Cell. 2024;187:2324-2335.e19. [DOI] [PubMed] [Google Scholar]
- 4.Sepich-Poore GD, Guccione C, Laplane L, Pradeu T, Curtius K, Knight R. Cancer’s second genome: microbial cancer diagnostics and redefining clonal evolution as a multispecies process. BioEssays. 2022;44:e2100252. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Guccione C, Yadlapati R, Shah S, Knight R, Curtius K. Challenges in determining the role of microbiome evolution in Barrett’s esophagus and progression to esophageal adenocarcinoma. Microorganisms. 2021;9:2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Snider EJ, Freedberg DE, Abrams JA. Potential role of the microbiome in Barrett’s esophagus and esophageal adenocarcinoma. Dig Dis Sci. 2016;61:2217–25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Snider EJ, Compres G, Freedberg DE, Khiabanian H, Nobel YR, Stump S, et al. Alterations to the esophageal microbiome associated with progression from Barrett’s esophagus to esophageal adenocarcinoma. Cancer Epidemiol Biomarkers Prev. 2019;28:1687–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Curtius K, Rubenstein JH, Chak A, Inadomi JM. Computational modelling suggests that Barrett’s oesophagus may be the precursor of all oesophageal adenocarcinomas. Gut. 2020;70:1435–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Shaheen NJ, Falk GW, Iyer PG, Souza RF, Yadlapati RH, Sauer BG, et al. Diagnosis and management of Barrett’s esophagus: an updated ACG guideline. Am J Gastroenterol. 2022;117:559–87. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Desai TK, Krishnan K, Samala N, Singh J, Cluley J, Perla S, et al. The incidence of oesophageal adenocarcinoma in non-dysplastic Barrett’s oesophagus: a meta-analysis. Gut. 2012;61:970–6. [DOI] [PubMed] [Google Scholar]
- 11.Elliott DRF, Walker AW, O’Donovan M, Parkhill J, Fitzgerald RC. A non-endoscopic device to sample the oesophageal microbiota: a case-control study. Lancet Gastroenterol Hepatol. 2017;2:32–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Radani N, Metwaly A, Reitmeier S, Baumeister T, Ingermann J, Horstmann J, et al. Analysis of fecal, salivary, and tissue microbiome in Barrett’s esophagus, dysplasia, and esophageal adenocarcinoma. Gastro Hep Adv. 2022;1:755–66. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Yang L, Lu X, Nossa CW, Francois F, Peek RM, Pei Z. Inflammation and intestinal metaplasia of the distal esophagus are associated with alterations in the microbiome. Gastroenterology. 2009;137:588–97. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Deshpande NP, Riordan SM, Castaño-Rodríguez N, Wilkins MR, Kaakoush NO. Signatures within the esophageal microbiome are associated with host genetics, age, and disease. Microbiome. 2018;6:227. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Solfisburg QS, Baldini F, Baldwin-Hunter B, Austin GI, Lee HH, Park H, et al. The Salivary Microbiome and Predicted Metabolite Production are Associated with Progression from Barrett’s Esophagus and High-Grade Dysplasia or Adenocarcinoma. Cancer Epidemiol Biomarkers Prev. 2024 ;33:371-380. [DOI] [PMC free article] [PubMed]
- 16.Kusters JG, van Vliet AHM, Kuipers EJ. Pathogenesis of Helicobacter pylori infection. Clin Microbiol Rev. 2006;19:449–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Islami F, Kamangar F. Helicobacter pylori and esophageal cancer risk: a meta-analysis. Cancer Prev Res (Phila). 2008;1:329–38. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Erőss B, Farkas N, Vincze Á, Tinusz B, Szapáry L, Garami A, et al. Helicobacter pylori infection reduces the risk of Barrett’s esophagus: a meta-analysis and systematic review. Helicobacter. 2018;23:e12504. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Du Y-L, Duan R-Q, Duan L-P. Helicobacter pylori infection is associated with reduced risk of Barrett’s esophagus: a meta-analysis and systematic review. BMC Gastroenterol. 2021;21:459. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Kong CY, Kroep S, Curtius K, Hazelton WD, Jeon J, Meza R, et al. Exploring the recent trend in esophageal adenocarcinoma incidence and mortality using comparative simulation modeling. Cancer Epidemiol Biomarkers Prev. 2014;23:997–1006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Gall A, Fero J, McCoy C, Claywell BC, Sanchez CA, Blount PL, et al. Bacterial composition of the human upper gastrointestinal tract microbiome is dynamic and associated with genomic instability in a Barrett’s esophagus cohort. PLoS ONE. 2015;10:e0129055. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Wiklund A-K, Santoni G, Yan J, Radkiewicz C, Xie S, Birgisson H, et al. Risk of esophageal adenocarcinoma after helicobacter pylori eradication treatment in a population-based multinational cohort Study. Gastroenterology. 2024;167:485-492.e3. [DOI] [PubMed] [Google Scholar]
- 23.Sieber M, Pita L, Weiland-Bräuer N, Dirksen P, Wang J, Mortzfeld B, et al. Neutrality in the metaorganism. PLoS Biol. 2019;17:e3000298. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Venkataraman A, Bassis CM, Beck JM, Young VB, Curtis JL, Huffnagle GB, et al. Application of a neutral community model to assess structuring of the human lung microbiome. MBio. 2015. 10.1128/mBio.02284-14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Leung MHY, Tong X, Bastien P, Guinot F, Tenenhaus A, Appenzeller BMR, et al. Changes of the human skin microbiota upon chronic exposure to polycyclic aromatic hydrocarbon pollutants. Microbiome. 2020;8:100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Kim H-J, Kim H, Kim JJ, Myeong NR, Kim T, Park T, et al. Fragile skin microbiomes in megacities are assembled by a predominantly niche-based process. Sci Adv. 2018;4:e1701581. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Paulson TG, Galipeau PC, Oman KM, Sanchez CA, Kuhner MK, Smith LP, et al. Somatic whole genome dynamics of precancer in Barrett’s esophagus reveals features associated with disease progression. Nat Commun. 2022;13:2300. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020;578:82–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Ross-Innes CS, Becq J, Warren A, Cheetham RK, Northen H, O’Donovan M, et al. Whole-genome sequencing provides new insights into the clonal architecture of Barrett’s esophagus and esophageal adenocarcinoma. Nat Genet. 2015;47:1038–46. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Peter S, Pendergraft A, VanDerPol W, Wilcox CM, Kyanam Kabir Baig KR, Morrow C, et al. Mucosa-Associated Microbiota in Barrett’s Esophagus, Dysplasia, and Esophageal Adenocarcinoma Differ Similarly Compared With Healthy Controls. Clin Transl Gastroenterol. 2020;11: e00199. [DOI] [PMC free article] [PubMed]
- 31.Lopetuso LR, Severgnini M, Pecere S, Ponziani FR, Boskoski I, Larghi A, et al. Esophageal microbiome signature in patients with Barrett’s esophagus and esophageal adenocarcinoma. PLoS ONE. 2020;15:e0231789. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Zhou J, Shrestha P, Qiu Z, Harman DG, Teoh W-C, Al-Sohaily S, et al. Distinct microbiota dysbiosis in patients with non-erosive reflux disease and esophageal adenocarcinoma. J Clin Med. 2020. 10.3390/jcm9072162. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Liu N, Ando T, Ishiguro K, Maeda O, Watanabe O, Funasaka K, et al. Characterization of bacterial biota in the distal esophagus of Japanese patients with reflux esophagitis and Barrett’s esophagus. BMC Infect Dis. 2013;13:130. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Pei Z, Bini EJ, Yang L, Zhou M, Francois F, Blaser MJ. Bacterial biota in the human distal esophagus. Proc Natl Acad Sci U S A. 2004;101:4250–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Luebeck J, Ng AWT, Galipeau PC, Li X, Sanchez CA, Katz-Summercorn AC, et al. Extrachromosomal DNA in the cancerous transformation of Barrett’s oesophagus. Nature. 2023;616:798–805. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Guccione C, Patel L, Tomofuji Y, McDonald D, Gonzalez A, Sepich-Poore G, et al. Incomplete human reference genomes can drive false sex biases and expose patient-identifying information in metagenomic data. Nat Commun. 2025;16:825. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Armstrong G, Martino C, Morris J, Khaleghi B, Kang J, DeReus J, et al. Swapping metagenomics preprocessing pipeline components offers speed and sensitivity increases. mSystems. 2022;7:e0137821. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Hillmann B, Al-Ghalith GA, Shields-Cutler RR, Zhu Q, Knight R, Knights D. Shogun: a modular, accurate and scalable framework for microbiome quantification. Bioinformatics. 2020;36:4088–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Zhu Q, Mai U, Pfeiffer W, Janssen S, Asnicar F, Sanders JG, et al. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains bacteria and archaea. Nat Commun. 2019;10:5477. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Sepich-Poore GD, McDonald D, Kopylova E, Guccione C, Zhu Q, Austin G, et al. Robustness of cancer microbiome signals over a broad range of methodological variation. Oncogene. 2024;43:1127–48. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Steinegger M, Salzberg SL. Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 2020;21:115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen H-C, Kitts PA, et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 2017;27:849–64. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, et al. The complete sequence of a human genome. Science. 2022;376:44–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, et al. A draft human pangenome reference. Nature. 2023;617:312–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486:207–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Human Microbiome Project Consortium. A framework for human microbiome research. Nature. 2012;486:215–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Weng Y, Guccione C, McDonald D, Oles R, Devkota S, Kopylova E, Sepich-Poore GD, Salido RA, Din MO, Song SJ, Curtius K. Calculating fast differential genome coverages among metagenomic sources using micov. Communications Biology. 2025;8(1):1624. [DOI] [PMC free article] [PubMed]
- 48.Hakim D, Wandro S, Zengler K, Zaramela LS, Nowinski B, Swafford A, et al. Zebra: Static and dynamic genome cover thresholds with overlapping references. mSystems. 2022;7: e0075822. [DOI] [PMC free article] [PubMed]
- 49.Blanco-Míguez A, Beghini F, Cumbo F, McIver LJ, Thompson KN, Zolfo M, et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat Biotechnol. 2023;41:1633–44. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, et al. Critical assessment of metagenome interpretation-a benchmark of metagenomics software. Nat Methods. 2017;14:1063–71. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Meyer F, Fritz A, Deng Z-L, Koslicki D, Lesker TR, Gurevich A, et al. Critical assessment of metagenome interpretation: the second round of challenges. Nat Methods. 2022;19:429–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol. 2019;37:852–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Martino C, Morton JT, Marotz CA, Thompson LR, Tripathi A, Knight R, et al. A novel sparse compositional technique reveals microbial perturbations. mSystems. 2019. 10.1128/mSystems.00016-19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Zapién-Campos R, Sieber M, Traulsen A. The effect of microbial selection on the occurrence-abundance patterns of microbiomes. J R Soc Interface. 2022;19:20210717. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Hubbell SP. The Unified Neutral Theory of Biodiversity and Biogeography (MPB-32). Princeton University Press; 2001.
- 56.Ling W, Lu J, Zhao N, Lulla A, Plantinga AM, Fu W, et al. Batch effects removal for microbiome data via conditional quantile regression. Nat Commun. 2022;13:5418. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Yamamoto H, Bai YQ, Yuasa Y. Homeodomain protein CDX2 regulates goblet-specific MUC2 gene expression. Biochem Biophys Res Commun. 2003;300:813–8. [DOI] [PubMed] [Google Scholar]
- 58.Henihan RD, Stuart RC, Nolan N, Gorey TF, Hennessy TP, O’Morain CA. Barrett’s esophagus and the presence of Helicobacter pylori. Am J Gastroenterol. 1998;93:542–6. [DOI] [PubMed] [Google Scholar]
- 59.Xie F-J, Zhang Y-P, Zheng Q-Q, Jin H-C, Wang F-L, Chen M, et al. Helicobacter pylori infection and esophageal cancer risk: an updated meta-analysis. World J Gastroenterol. 2013;19:6098–107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Mathieu LN, Kanarek NF, Tsai H-L, Rudin CM, Brock MV. Age and sex differences in the incidence of esophageal adenocarcinoma: results from the surveillance, epidemiology, and end results (SEER) registry (1973–2008). Dis Esophagus. 2014;27:757–63. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Baker JL, Mark Welch JL, Kauffman KM, McLean JS, He X. The oral microbiome: diversity, biogeography and human health. Nat Rev Microbiol. 2024;22:89–104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Zapién-Campos R, Bansept F, Traulsen A. Stochastic models allow improved inference of microbiome interactions from time series data. PLoS Biol. 2024;22:e3002913. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Paczosa MK, Mecsas J. Klebsiella pneumoniae: going on the offense with a strong defense. Microbiol Mol Biol Rev. 2016;80. [DOI] [PMC free article] [PubMed]
- 64.Yadukumar L, Aslam H, Ahmed K, Iskander P, Sajid K, Syed O, et al. A rare case of acute esophageal necrosis precipitated by Klebsiella pneumoniae. Gastro Hep Adv. 2023;2:827–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Ren J-M, Zhang X-Y, Liu S-Y. Neisseria mucosa - a rare cause of peritoneal dialysis-related peritonitis: a case report. World J Clin Cases. 2023;11:3311–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Abdelsalam NA, Hegazy SM, Aziz RK. The curious case of Prevotella copri. Gut Microbes. 2023;15:2249152. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Yu C, Zhou B, Xia X, Chen S, Deng Y, Wang Y, et al. Prevotella copri is associated with carboplatin-induced gut toxicity. Cell Death Dis. 2019;10:714. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Scher JU, Sczesnak A, Longman RS, Segata N, Ubeda C, Bielski C, et al. Expansion of intestinal Prevotella copri correlates with enhanced susceptibility to arthritis. Elife. 2013;2:e01202. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Liu X, Li: LXS, Ji F, Mei Y, Cheng Y, Liu F, et al. Alterations of gastric mucosal microbiota across different stomach microhabitats in a cohort of 276 patients with gastric cancer. EBioMedicine. 2019;40: 336-348. [DOI] [PMC free article] [PubMed]
- 70.Gunathilake MN, Lee J, Choi IJ, Kim Y-I, Ahn Y, Park C, et al. Association between the relative abundance of gastric microbiota and the risk of gastric cancer: a case-control study. Sci Rep. 2019;9:13589. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
All datasets used in our study have been previously published [14, 27–29], as outlined in Supplementary Table S1. Please see original publications for details on requesting data access. Code to recreate all results and apply methods to other datasets can be found on GitHub: [https://github.com/cguccione/comad_of_EAC_progression]. The code created for modeling analyses was adapted from [23] and [54]. The workflow used to run MetaPhlAn4 is available at [https://github.com/FredHutch/metaphlan-nf].








