Abstract
Objective:
Colonic diverticulosis is a prevalent condition among older adults, marked by thin-walled pockets in the colon wall that can become inflamed or infected, hemorrhage, or rupture. We present a case-control genetic and transcriptomic study aimed at identifying the genetic and cellular determinants of this condition and its relationship with other gastrointestinal disorders.
Design:
We performed genome-wide genotyping and colonic-tissue RNA sequencing in 404 patients with (N=172) and without (N=232) diverticulosis. We investigated transcriptomic variation associated with diverticulosis and integrated it with single-cell RNA-seq data from the human intestine. We also integrated our expression quantitative trait loci (eQTLs) with genome-wide association study (GWAS) results using Mendelian randomization (MR). Finally, a polygenic risk score (PRS) analysis assessed associations between diverticulosis severity and genetic risk of other gastrointestinal disorders.
Results:
We identified 38 differentially expressed genes and 17 genes with differential transcript usage associated with diverticulosis, implicating tissue remodeling as a primary mechanism of diverticula formation. Diverticula formation was primarily linked to stromal and epithelial cells of the colon, including endothelial cells, myofibroblasts, fibroblasts, goblet cells, tuft cells, enterocytes, neurons, and glia. MR highlighted five genes (CCN3, CRISPLD2, ENTPD7, PHGR1, and TNFSF13) with potential causal effects on diverticulosis. Notably, ENTPD7 was also upregulated in diverticulosis cases. Additionally, diverticulosis severity was positively correlated with genetic predisposition to diverticulitis.
Conclusion:
Our results suggest that tissue remodeling is a primary mechanism of diverticula formation. Individuals with a greater genetic predisposition to diverticulitis have more diverticula on colonoscopy.
Keywords: RNA-seq, Transcriptome, Expression quantitative trait loci (eQTL), Mendelian randomization, Polygenic risk score
Introduction
Diverticulosis refers to the presence of acquired sac-like defects, known as pseudo-diverticula, that develop in the wall of the colon [2]. Its prevalence increases with age, affecting 24% of individuals aged 40–49 years, 33% of those aged 50–59 years, and 50% of those aged 60–69 years [3]. Patients with diverticula can develop diverticulitis (with or without perforation), arterial hemorrhage, or segmental colitis. Older age, male sex, and obesity are risk factors for the development of diverticulosis [3, 4]. Studies have also reported causal associations of higher body mass index, type 2 diabetes, and smoking with an increased risk of diverticular disease [5, 6, 7]. However, constipation and a low-fiber diet are not associated with diverticulosis [8, 9].
Genetic susceptibility may play a role in the development of diverticulosis. Twin studies estimate the heritability of diverticular disease at 40–53% [10, 11]. In addition, large-scale genome-wide association studies (GWAS), including those in the UK Biobank and the Michigan Genomics Initiative, have identified multiple variants associated with diverticular disease [12, 13, 14, 15]. GWAS have highlighted several molecular and biological mechanisms in the pathophysiology of diverticular disease, including epithelial dysfunction and altered immunity [13, 14]. Beyond these genetic associations, the molecular mechanisms underlying diverticular disease, and in particular those distinguishing diverticulosis from diverticulitis, remain obscure. A previous RNA-sequencing study examined diverticulitis only; however, its small sample size (N=25) limited its statistical power to comprehensively analyze the disease transcriptome [16]. Thus, there is a clear need to characterize the transcriptomic landscape in the tissue of interest and in cohorts with diverticulosis only. Unraveling the mechanisms of diverticula formation might facilitate strategies to prevent this condition and its severe complications.
In this study, we investigated the genetic and transcriptomic landscape of diverticulosis. By combining genotypes and bulk RNA-seq data from the colon tissue of well-characterized patients, we assessed differential gene expression, differential transcript usage, and expression quantitative trait loci (eQTLs) to characterize changes in gene expression and transcript usage between individuals with and without diverticulosis, as well as their potential regulatory genetic variants. These findings were further integrated with single-cell RNA-seq data from the human intestine to prioritize cell types. Additionally, we used a Mendelian randomization (MR) framework to systematically integrate these findings, treating single nucleotide polymorphisms (SNPs) as instrumental variables and gene expression as the exposure, to identify genes causally associated with diverticulosis.
Methods
Study subjects
The study recruited patients 30 years of age and older scheduled for their first colonoscopy between 2013 and 2015 at the University of North Carolina Hospital in Chapel Hill, North Carolina, United States. Inclusion criteria were an indication of colon cancer screening, a complete examination to the cecum, and satisfactory bowel preparation. The study excluded patients with a history of prior colonoscopy, familial polyposis syndrome, colon resection, colon cancer or adenomas, or with colitis on examination. The University of North Carolina School of Medicine Institutional Review Board approved this study, and all participants provided informed consent.
Prior to colonoscopy, participants completed a structured interview to obtain information on demographics, physical activity, tobacco and alcohol use, nonsteroidal anti-inflammatory drug and aspirin use, and diet. Dietary intake was collected using a 124-item food frequency questionnaire. Physical activity was assessed using a validated questionnaire. Race was self-reported. On the day of colonoscopy, a research assistant measured the participant's height and weight. A research assistant was also present during the colonoscopy and recorded procedural information on a data collection form. A gastroenterologist performed the colonoscopy and reported the number and characteristics of the diverticula in each segment of the colon. Biopsies were taken from normal-appearing mucosa in the sigmoid colon using standard (8 mm wing) disposable, fenestrated colonoscopy forceps (Olympus, Center Valley, PA).
Genotyping
Study participants were genotyped using the Illumina Infinium HTS Assay (Illumina, Inc., San Diego, USA) at the UNC Mammalian Genotyping Core. The Infinium HTS Assay surveys approximately 750,000 SNP and CNV markers per sample through whole-genome amplification coupled with direct, array-based capture and enzymatic scoring of the SNP loci. DNA was quantified in plates, amplified, and prepared for hybridization according to the manufacturer's protocol (https://www.illumina.com/products/by-type/microarray-kits/infinium-global-screening.html). In this cohort, 4 of 644 samples had insufficient material for analysis. Hybridization, staining, and imaging were performed according to the manufacturer's protocol. After imaging, data were imported into the Illumina GenomeStudio Genotyping Module and assessed plate by plate. Samples failing QA/QC were removed (N = 16), and individual SNPs failing quality metrics were masked. Data were exported as PLINK files.
RNA-seq library construction and processing
For mRNA-seq library construction, we used the KAPA Hyper RNA Prep kit (Roche), and libraries were bar-coded with individual tags following the manufacturer's instructions (Illumina, Inc., San Diego, CA). Libraries were prepared and pooled on automated liquid handling systems to minimize variance. Pools typically contained 38–92 samples, depending on available sequencer capacity. Quality control was performed at every step. Libraries were quantified for concentration, fragment size, and size distribution using a TapeStation system. As needed, pool balance and library quality were assessed by MiSeq Nano single-end 50 bp sequencing. QA/QC steps and the RNA-seq processing pipeline are described in the Supplementary Methods.
Differential gene expression, transcript usage and pathway enrichment analysis
Details on the differential gene expression, transcript usage, and pathway enrichment analyses are provided in the Supplementary Methods. Briefly, to estimate latent batch effects in the RNA-seq data, surrogate variables were obtained with the sva package [17]. Differential expression (DE) analysis was conducted using DESeq2 [18]. Individuals with at least one diverticulum were considered cases (N=172), and those without diverticula were considered controls (N=232). Covariates included sex, age, BMI, five surrogate variables, and ten genotype principal components. Genes with more than 5 counts in more than 10% of samples were included in the analysis. Differential transcript usage (DTU) analysis was performed on transcript abundances quantified by Salmon [19], using DRIMSeq [20], which is designed to efficiently handle large samples (N > 100). Pathway enrichment analysis for DE and DTU genes was performed using the pathfindR R package [21].
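The gene-inclusion rule above (more than 5 counts in more than 10% of samples) amounts to a boolean mask over a genes × samples count matrix. The following is a minimal NumPy sketch under that reading, not the authors' R code; the function name `filter_expressed_genes` is hypothetical, and treating both thresholds as strict inequalities is an assumption.

```python
import numpy as np


def filter_expressed_genes(counts, min_count=5, min_frac=0.10):
    """Return a boolean mask of genes (rows) to keep for DE analysis.

    A gene is kept if it has more than `min_count` reads in more than
    `min_frac` of the samples (columns). Hypothetical helper for illustration.
    """
    counts = np.asarray(counts)
    n_samples = counts.shape[1]
    # Number of samples in which each gene exceeds the count threshold
    n_expressed = (counts > min_count).sum(axis=1)
    return n_expressed > min_frac * n_samples


# Toy example: 3 genes x 10 samples (made-up counts)
counts = np.array([
    [10] * 10,          # >5 counts in all samples -> kept
    [6, 6] + [0] * 8,   # >5 counts in 2/10 samples (20%) -> kept
    [6] + [0] * 9,      # >5 counts in 1/10 samples (10%, not >10%) -> dropped
])
print(filter_expressed_genes(counts))  # [ True  True False]
```

The filtered count matrix would then be passed, with the covariates listed above, to the DE model.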
Expression quantitative trait loci (eQTL) mapping and integrative analysis of GWAS and eQTLs
Details on eQTL mapping are provided in the Supplementary Methods. Briefly, eQTL mapping was primarily conducted using the GTEx v8 pipeline [22]. Genes were included if they had at least 0.1 TPM in at least 20% of samples and at least 6 (unnormalized) reads in at least 20 samples. Read counts were normalized using the trimmed mean of M-values (TMM) method [23], and an inverse normal transformation was applied across samples for each gene. Cis-eQTL mapping was performed with TensorQTL, using a window of 1 Mb up- and downstream of the transcription start site (TSS) [24]. eQTL effects are reported with respect to the alternative allele.
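The gene-inclusion thresholds and the per-gene inverse normal transformation described above can be sketched as follows. This is an illustrative Python sketch, not the GTEx or TensorQTL code: the function names are hypothetical, TMM normalization is assumed to have been applied already, and mapping ranks to normal quantiles via rank/(n + 1) is one common convention assumed here.

```python
import numpy as np
from scipy.stats import norm, rankdata


def eqtl_gene_mask(tpm, counts, min_tpm=0.1, tpm_frac=0.2,
                   min_reads=6, min_samples=20):
    """Genes (rows) passing both expression thresholds stated in the text."""
    tpm_ok = (np.asarray(tpm) >= min_tpm).mean(axis=1) >= tpm_frac
    reads_ok = (np.asarray(counts) >= min_reads).sum(axis=1) >= min_samples
    return tpm_ok & reads_ok


def inverse_normal_transform(expr):
    """Rank-based inverse normal transform of each gene (row) across samples.

    Ranks (average ranks for ties) are mapped to standard normal quantiles
    via rank / (n + 1), an assumed convention.
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.shape[1]
    ranks = rankdata(expr, axis=1)
    return norm.ppf(ranks / (n + 1))


# Toy example: one gene with an outlying value across 4 samples
z = inverse_normal_transform([[3.0, 1.0, 2.0, 10.0]])
print(np.round(z, 3))
```

The transform keeps only the ordering of samples, so an extreme value such as 10.0 receives the same quantile it would at any other top rank, which limits the leverage of outliers in the eQTL regressions.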
The summary-data-based Mendelian randomization (SMR) method was used to integrate GWAS and cis-eQTLs [25]. UK Biobank diverticular disease GWAS results based on 27,311 cases and 334,783 controls, generated using SAIGE [26], were used to represent variant-phenotype associations. Conditionally independent cis-eQTLs were used to capture variant-gene expression associations. SMR was run with the following criteria: PeQTL < 5.0×10−8; MAF > 0.01; an LD-pruning threshold of r² < 0.2; and, for the heterogeneity in dependent instruments (HEIDI) test (PHEIDI < 0.05), exclusion of eQTLs in very high LD (r² > 0.9) or low-to-no LD (r² < 0.05) with the top associated eQTL. The significance threshold was determined by Bonferroni correction for the number of genes tested (N = 4,151; PMR < 0.05/4,151 = 1.2×10−5).
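For a single gene with one instrument, the SMR estimate is the ratio of the SNP's effect on the trait (b_zy, from GWAS) to its effect on expression (b_zx, from the eQTL), and significance is commonly assessed with the approximate 1-df chi-square statistic T_SMR = z_zy² z_zx² / (z_zy² + z_zx²) described for the SMR method. The Python sketch below computes both, plus the Bonferroni threshold used here; it omits the HEIDI test and all LD handling, the function name is hypothetical, and the inputs in the usage note are invented.

```python
from scipy.stats import chi2


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """SMR estimate and approximate p-value for one gene and its top cis-eQTL.

    b_zx, se_zx: SNP effect on gene expression (eQTL summary statistics).
    b_zy, se_zy: SNP effect on the trait (GWAS summary statistics).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # ratio estimate of the expression -> trait effect
    # Approximate chi-square (1 df) test statistic used by SMR
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)
    p_smr = chi2.sf(t_smr, df=1)
    return b_xy, t_smr, p_smr


n_genes = 4151
bonferroni = 0.05 / n_genes  # about 1.2e-05, matching the threshold in the text
```

For example, `smr_test(0.5, 0.05, 0.02, 0.004)` (an eQTL z-score of 10 and a GWAS z-score of 5, both made up) gives an estimate of 0.04 and T_SMR = 20, which passes the Bonferroni threshold. Because T_SMR is dominated by the weaker of the two z-scores, a strong eQTL cannot compensate for a weak GWAS signal.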
Phenome-wide association study (PheWAS)
Associations between eQTLs of DE genes and other traits were explored using Open Targets Genetics, which aggregates published trait-associated loci identified by GWAS and other functional genomics studies [27]. Variant-trait associations reported in the UK Biobank (http://www.nealelab.is/uk-biobank), FinnGen [28], and the GWAS Catalog [29] were investigated. All PheWAS results are provided in Supplementary Table S9.
Polygenic risk score analysis
To generate polygenic risk scores (PRS), we used the publicly accessible Polygenic Score (PGS) Catalog [30], an inventory of published PRS with metadata describing the methods used to construct each score and metrics of its performance. PRS for each trait were calculated for each individual using the PGS Catalog Calculator pipeline (pgsc_calc; https://github.com/PGScatalog/pgsc_calc). Traits tested in this analysis included diverticulosis, diverticulitis, ulcerative colitis, Crohn's disease, nicotine dependence, sedentary behavior, and processed meat intake. Details on the development, performance metrics, and evaluation of the PRS for these traits are provided in Supplementary Tables S1–S3.
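At its core, a PRS is a weighted sum of effect-allele dosages, with weights taken from a PGS Catalog scoring file. The sketch below shows that sum and the decile grouping used in the Results. It assumes dosages are already harmonized to the scoring file's effect alleles; pgsc_calc itself also handles variant matching and other steps that are omitted here, and both function names are hypothetical.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """PRS per individual: sum over variants of effect weight x effect-allele dosage.

    dosages: individuals x variants array of effect-allele counts (0-2).
    weights: per-variant effect weights from a scoring file.
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)


def to_deciles(scores):
    """Label each individual 1-10 by the rank of their score (ties broken by input order)."""
    scores = np.asarray(scores)
    order = np.argsort(scores, kind="stable")
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(len(scores))
    return ranks * 10 // len(scores) + 1


# Toy example: 3 individuals x 3 variants (made-up numbers)
dosages = np.array([[0, 1, 2], [2, 2, 2], [1, 0, 0]])
weights = np.array([0.1, 0.2, 0.3])
print(polygenic_score(dosages, weights))  # [0.8 1.2 0.1]
```

Rank-based deciles give groups of (nearly) equal size regardless of the score's scale, which makes deciles comparable across traits whose scoring files use different weight units.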
Statistical analysis
A negative binomial generalized linear model, fit with the glm.nb function in the MASS R package, was used to investigate the relationship between the number of diverticula and covariates, including dietary and lifestyle factors. Covariates included BMI, sex, age, age², 10 genotype principal components (PCs), smoking status (never vs. ever), dietary fiber intake (g/day), lean meat intake (oz/day), alcoholic drinks per day, total fat intake (g/day), and physical activity per day. To evaluate the relationship between the number of diverticula and PRS, a negative binomial generalized linear model was also used, with covariates including BMI, sex, age, age², 20 genotype PCs, smoking status (never vs. ever), dietary fiber intake (g/day), lean meat intake (oz/day), alcoholic drinks per day, total fat intake (g/day), physical activity per day, PRS, and the smoking status × PRS interaction.
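For readers working outside R, the model that glm.nb fits (a log-link negative binomial GLM with variance mu + mu²/theta) can be sketched by direct maximum likelihood in Python. This is an illustrative re-implementation, not the authors' code: it assumes the design matrix already contains an intercept in column 0 and all covariates (for the PRS model, including a smoking × PRS product column), and the function names are hypothetical. Dispersion is returned as theta = 1/alpha to match glm.nb's parameterization, and exp(β) is the multiplicative change in the expected number of diverticula per unit change in a covariate, holding the others fixed.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def nb2_negloglik(params, X, y):
    """Negative log-likelihood of a log-link NB2 model, Var(y) = mu + alpha * mu**2."""
    beta, log_alpha = params[:-1], params[-1]
    eta = X @ beta                      # linear predictor, log(mu)
    log_r = -log_alpha                  # r = 1 / alpha (theta in MASS::glm.nb)
    r = np.exp(log_r)
    log_r_plus_mu = np.logaddexp(log_r, eta)  # log(r + mu), computed stably
    ll = (gammaln(y + r) - gammaln(r) - gammaln(y + 1)
          + r * (log_r - log_r_plus_mu) + y * (eta - log_r_plus_mu))
    return -np.sum(ll)


def fit_nb2(X, y):
    """Maximum-likelihood fit; returns (beta, theta). Column 0 of X must be an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    start = np.zeros(X.shape[1] + 1)
    start[0] = np.log(y.mean() + 1e-8)  # start the intercept at log(mean count)
    res = minimize(nb2_negloglik, start, args=(X, y), method="BFGS")
    beta = res.x[:-1]
    theta = np.exp(-res.x[-1])
    return beta, theta
```

Standard errors (as reported by glm.nb) would come from the inverse Hessian at the optimum; they are omitted to keep the sketch short.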
Results
Demographic profile of the study cohort
In this study, we analyzed a cohort of 404 individuals who met our inclusion criteria and provided high-quality RNA and DNA samples (Methods). Participants had a mean age of 54.4 years, and women constituted 56.2% of the sample. With regard to race, 80% of participants self-identified as White and the remaining 20% as Black. Further details on participants' lifestyle and dietary habits, stratified by presence and number of diverticula, are provided in Table 1. Of the 404 participants, 172 (43%) had colonic diverticula and were classified as cases. The remaining 232 participants (57%) had no evidence of diverticula on colonoscopy and were designated as controls. Within the case group, 61% had diverticula only in the distal colon (splenic flexure, descending, or sigmoid), 35% had diverticula in both the distal and proximal colon, and 4% had diverticula only in the proximal colon (transverse, ascending, or cecum). The number of diverticula among cases varied: 36% (N=62) had 1–5, 21% (N=36) had 6–10, 19% (N=32) had 11–20, and 24% (N=42) had more than 20 diverticula.
Table 1.
Demographic and phenotypic information of study participants
| Characteristic | Control (N=232) | 1–5 diverticula (N=62) | 6–10 diverticula (N=36) | 11–20 diverticula (N=32) | >20 diverticula (N=42) | All cases (N=172) | Overall (N=404) |
|---|---|---|---|---|---|---|---|
| Age (years), mean (SD) | 53.5 (6.24) | 54.6 (7.55) | 55.3 (7.37) | 55.9 (6.45) | 57.5 (7.37) | 55.7 (7.30) | 54.4 (6.79) |
| Sex, n (%) | | | | | | | |
| Female | 136 (58.6%) | 38 (61.3%) | 17 (47.2%) | 14 (43.8%) | 22 (52.4%) | 91 (52.9%) | 227 (56.2%) |
| Male | 96 (41.4%) | 24 (38.7%) | 19 (52.8%) | 18 (56.3%) | 20 (47.6%) | 81 (47.1%) | 177 (43.8%) |
| Race, n (%) | | | | | | | |
| Black | 52 (22.4%) | 7 (11.3%) | 9 (25.0%) | 5 (15.6%) | 8 (19.0%) | 29 (16.9%) | 81 (20.0%) |
| White | 180 (77.6%) | 55 (88.7%) | 27 (75.0%) | 27 (84.4%) | 34 (81.0%) | 143 (83.1%) | 323 (80.0%) |
| Smoking, n (%) | | | | | | | |
| Ever | 138 (59.5%) | 38 (61.3%) | 20 (55.6%) | 17 (53.1%) | 27 (64.3%) | 102 (59.3%) | 240 (59.4%) |
| Never | 94 (40.5%) | 24 (38.7%) | 16 (44.4%) | 15 (46.9%) | 15 (35.7%) | 70 (40.7%) | 164 (40.6%) |
| Total fat intake (g/day), mean (SD) | 86.3 (36.5) | 91.2 (34.0) | 95.5 (44.5) | 82.0 (41.7) | 80.6 (33.8) | 87.7 (37.8) | 87.0 (37.0) |
| Dietary fiber (g/day), mean (SD) | 20.4 (9.06) | 20.1 (9.15) | 23.9 (11.3) | 20.6 (10.2) | 18.5 (6.88) | 20.6 (9.39) | 20.4 (9.19) |
| Lean meat (oz/day), mean (SD) | 1.34 (0.969) | 1.73 (1.30) | 1.47 (0.987) | 1.63 (0.992) | 1.59 (1.18) | 1.62 (1.15) | 1.47 (1.06) |
| Alcohol consumption (drinks/day), mean (SD) | 0.868 (1.91) | 0.794 (1.23) | 0.479 (0.881) | 1.29 (2.93) | 1.06 (1.32) | 0.886 (1.64) | 0.876 (1.79) |
| BMI (kg/m²), mean (SD) | 28.2 (5.86) | 27.8 (5.11) | 29.8 (7.77) | 30.5 (6.76) | 30.0 (6.37) | 29.3 (6.39) | 28.6 (6.11) |
| Physical activity (MET-min/day), mean (SD) | 3100 (3880) | 3380 (3970) | 4420 (6650) | 2810 (3890) | 3790 (5170) | 3590 (4890) | 3320 (4360) |
Distinct transcriptomic landscape, pathways, and cell types associated with diverticulosis
We analyzed RNA-seq data generated from colon tissue biopsies (Fig. 1A). We first used differential gene expression (DE) analysis to identify genes whose expression differed by case-control status. Of 14,186 protein-coding genes tested, 38 (0.27%) were differentially expressed between cases and controls at FDR < 0.05 (Fig. 1B). Compared with controls, 5 genes were downregulated and 33 genes were upregulated in cases (Supplementary Fig. S1, Supplementary Table S4); DE genes included TPPP3, TLR4, AQP3, ENTPD7, and ANGPTL1. Enrichment analyses broadly implicated NF-κB signaling, cortical actin cytoskeleton, extracellular matrix structural constituent, cadherin binding, and response to calcium ion (Supplementary Fig. S2).
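The FDR < 0.05 cutoff applies to Benjamini-Hochberg (BH) adjusted p-values, which is the adjustment DESeq2 reports by default. A minimal NumPy sketch of the BH adjustment follows; it is illustrative only (the function name is hypothetical), and it omits DESeq2's default independent filtering, which changes which genes enter the adjustment.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the i-th smallest p-value
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: running minimum from the largest p-value downwards
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


# Toy p-values (made up); genes with adjusted p < 0.05 would be called DE
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(padj < 0.05)
```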
Figure 1.

Transcriptomic landscape of diverticulosis. A) Schematic of the case-control study design used to reveal the genetic and transcriptomic underpinnings of diverticulosis. B) Scatter plot showing log2 fold changes in gene expression between cases and controls and the corresponding adjusted p-values (FDR: false discovery rate) on a −log10 scale. C) Venn diagram showing the overlap of genes identified by differential gene expression analysis, differential transcript usage analysis, and previous diverticulosis GWAS. D) Scatter boxplots showing the distribution of transcript proportions for T1 (ENST00000202917.10) and T2 (ENST00000553152.1), two transcripts of OAS1, in cases and controls. E) Dot chart displaying the enrichment of Gene Ontology (GO) molecular functions among genes identified through differential expression (DE) and differential transcript usage (DTU) analyses. F) Dot chart showing p-values of differential expression within specific clusters of stromal cells.
We repeated the DE analysis using two age thresholds, including only patients aged over 40 and over 50, respectively (Supplementary Table S4), to assess whether including younger patients in their 30s and 40s affected the pattern of gene expression differences. In the cohort over 40 years of age (394 individuals: 225 controls and 169 cases), 20 of the 28 genes identified at FDR < 0.05 were also found in the initial analysis. Similarly, in the cohort over 50 years of age (372 patients: 214 controls and 158 cases), 14 of the 18 identified genes were present in the initial analysis. The overlapping genes in both age cohorts had the same direction of log fold change as in the initial DE analysis. Notably, ENTPD7 remained significantly upregulated in both analyses. Both age-restricted DE gene lists overlapped significantly with the initial DE gene list (P = 2.3×10−48 for those over 40 and P = 1.9×10−34 for those over 50), indicating that the main findings were robust to the exclusion of younger patients.
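The text does not name the test behind these overlap P-values. A one-sided hypergeometric test is a common choice for gene-list overlap and is assumed in the sketch below, as is the use of the 14,186 tested genes as the universe; the output should therefore not be expected to reproduce the reported values exactly. The function name is hypothetical.

```python
from scipy.stats import hypergeom


def overlap_pvalue(n_universe, n_list_a, n_list_b, n_overlap):
    """P(overlap >= n_overlap) if list B were drawn at random from the universe.

    n_universe: genes tested; n_list_a: genes in the reference list;
    n_list_b: genes in the second list; n_overlap: genes shared by both lists.
    """
    # sf(k - 1) = P(X >= k) for X ~ Hypergeometric(M, n, N)
    return hypergeom.sf(n_overlap - 1, n_universe, n_list_a, n_list_b)


# Over-40 analysis: 20 of 28 genes shared with the initial 38 DE genes
p_over40 = overlap_pvalue(14186, 38, 28, 20)
# Over-50 analysis: 14 of 18 genes shared with the initial 38 DE genes
p_over50 = overlap_pvalue(14186, 38, 18, 14)
print(p_over40, p_over50)
```

With an expected overlap of well under one gene by chance (28 × 38 / 14,186 ≈ 0.08), both observed overlaps are extreme under any reasonable universe size.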
To detect possible splicing changes at the transcript level, we performed differential transcript usage (DTU) analysis, which assesses changes in the relative usage of a gene's transcripts in cases compared with controls [31, 32] (Methods). We identified 17 genes with differential usage of at least one transcript between cases and controls (Supplementary Table S5). We then performed pathway enrichment analysis on the combined set of genes identified by the DE and DTU analyses and observed enrichment of molecular functions related to extracellular structures and secretory organelles involved in cell-to-cell communication and tissue remodeling (Fig. 1E, Supplementary Fig. S3, Supplementary Table S6).
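DTU concerns the share each transcript contributes to its gene's total expression in each sample, the quantity shown for OAS1 in Fig. 1D. DRIMSeq tests for case-control differences in these proportions with a Dirichlet-multinomial model; the sketch below only computes the per-sample proportions from transcript-level counts and does not perform that test. The function name and toy gene labels are hypothetical.

```python
import numpy as np


def transcript_proportions(tx_counts, gene_of_tx):
    """Per-sample usage of each transcript within its gene.

    tx_counts: transcripts x samples array of estimated counts (e.g. from Salmon).
    gene_of_tx: gene label for each transcript row.
    Samples where a gene has zero total counts get NaN proportions.
    """
    tx_counts = np.asarray(tx_counts, dtype=float)
    gene_of_tx = np.asarray(gene_of_tx)
    props = np.full_like(tx_counts, np.nan)
    for gene in np.unique(gene_of_tx):
        rows = gene_of_tx == gene
        totals = tx_counts[rows].sum(axis=0)  # gene-level total per sample
        with np.errstate(invalid="ignore", divide="ignore"):
            props[rows] = tx_counts[rows] / totals
    return props


# Toy data: geneA has two transcripts whose usage flips between two samples
counts = [[30, 10], [10, 30], [5, 5]]
genes = ["geneA", "geneA", "geneB"]
print(transcript_proportions(counts, genes))
```

Note that geneA's total expression is identical in both toy samples, so a gene-level DE test would miss the switch that the proportions reveal; this is why DTU can flag genes that DE does not.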
To identify cell types associated with diverticula formation, we integrated our DE findings with single-cell RNA-seq data from the human intestine [33, 34] (Supplementary Methods). Examining, at single-cell resolution, how the diverticulosis-associated genes identified in bulk RNA-seq differ in expression between cell types may highlight the cell types that contribute to the observed bulk differences. Of the 38 DE genes, 17 were differentially expressed between cell types within the stromal or epithelial cells of the colon (Supplementary Table S7). Of these 17 genes, 13 showed differential expression across stromal cell types, including endothelial cells, myofibroblasts, fibroblasts, glia, and neurons (Fig. 1F). Among colonic epithelial cells, 7 of our DE genes were differentially expressed across several cell types, including goblet cells, tuft cells, enterocytes, and cycling transit-amplifying cells (Fig. 1F). We also conducted cell type-specific enrichment analyses for stromal and epithelial cells (Supplementary Methods). In stromal cells, we observed enrichment in glia, fibroblasts, neurons, pericytes, and myofibroblasts at the 1% FDR level (Supplementary Fig. S4). In epithelial cells, no cell types were significantly enriched at the 1% FDR level.
Expression quantitative trait loci mapping in colon tissue
To elucidate potential genetic determinants of diverticulosis, we conducted eQTL analyses using our colon-tissue RNA-seq data, controlling for sex, BMI, age, 60 PEER factors, and population stratification (genotype principal components). We focused on cis-eQTLs located within 1 megabase (Mb) of the transcription start site of their corresponding genes. Empirical P-values based on 10,000 permutations were used to assess significance (Methods). A total of 6,143 genes had at least one cis-eQTL (MAF > 0.01) at a false discovery rate (FDR) below 1%. Of the 38 DE genes, 13 (DPYSL2, KDELR3, PMP22, SPARC, ANGPTL1, TLR4, TACC1, GSN, WDR43, ING5, FOXA3, AGR3, and ENTPD7) were significantly associated with genetic variants (Figs. 2 and 3A, Supplementary Table S8). In addition, 10 of the 17 DTU genes (OAS1, GOSR2, PRPF19, CLU, MRPL34, COMTD1, MAPRE2, PITPNA, SERPINA1, and SUPT4H1) had cis-eQTLs (Supplementary Table S8).
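The logic of a permutation-based empirical P-value can be illustrated for a single variant-gene pair: shuffle expression across samples to break any link with genotype, recompute the statistic, and count how often the null statistic matches or exceeds the observed one. This Python sketch is a simplification and not TensorQTL's procedure: TensorQTL permutes per gene across all cis variants, adjusts for covariates, and fits a beta distribution to the permutation null, none of which is done here. The function name is hypothetical.

```python
import numpy as np


def permutation_pvalue(genotype, expression, n_perm=10_000, seed=0):
    """Empirical p-value for one variant-gene association.

    Uses |Pearson r| between dosage and expression as the statistic and
    permutes expression across samples to build the null distribution.
    """
    rng = np.random.default_rng(seed)
    g = np.asarray(genotype, dtype=float)
    e = np.asarray(expression, dtype=float)
    observed = abs(np.corrcoef(g, e)[0, 1])
    null = np.array([abs(np.corrcoef(g, rng.permutation(e))[0, 1])
                     for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

With 10,000 permutations the smallest attainable empirical P-value is 1/10,001, which is one reason pipelines such as TensorQTL extrapolate with a fitted beta distribution rather than rely on raw permutation counts.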
Figure 2.

The distribution of colonic cis-eQTLs. Circular Manhattan plot of −log10(P-values) of cis-eQTLs and their genomic coordinates across the entire genome. Genes differentially expressed between cases and controls (FDR < 0.05) are labeled if they have a significant cis-eQTL. −log10(P-values) were truncated at 100 for visualization.
Figure 3.

cis-eQTLs of differentially expressed genes. A) Scatter plot showing the log2 fold change in gene expression between cases and controls (FDR < 0.05) and the effect size of the corresponding top cis-eQTL. Error bars indicate 95% confidence intervals for the log2 fold change and for the eQTL effect size along their respective axes. B) P-values of traits from FinnGen, the UK Biobank, and the GWAS Catalog, analyzed by PheWAS, that are associated with the top cis-eQTLs of differentially expressed genes and genes with differential transcript usage. The y- and x-axes indicate trait categories and −log10(PGWAS), respectively.
To examine potential relationships between other phenotypes and the eQTLs of these 13 DE and 10 DTU genes, we performed a phenome-wide association study (PheWAS) using variant-phenotype associations reported in the UK Biobank (http://www.nealelab.is/uk-biobank), FinnGen [28], and the GWAS Catalog [29] (Methods; Fig. 3B; Supplementary Table S9). rs2078260, an eQTL for ENTPD7, is in strong linkage disequilibrium (LD; r² = 0.990) with variants previously associated with diverticular disease and diverticulosis in GWAS, including rs7098322 (P = 1.0×10−11) and rs7091203 (P = 2.4×10−11) [13]. Similarly, rs8103278, an eQTL for FOXA3, is in strong LD (r² = 0.959) with a GWAS variant for cholelithiasis, rs74821481 (P = 5.0×10−12) [35].
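The r² values quoted above measure how strongly two variants co-segregate. The text does not state how they were obtained (reference panels or phased haplotypes are typical); a common approximation from unphased genotype data is the squared Pearson correlation of allele dosages, sketched here with a hypothetical function name and toy genotypes.

```python
import numpy as np


def genotype_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two SNPs' allele dosages (0, 1, 2).

    Approximates haplotype-based LD r^2 without phasing.
    """
    a = np.asarray(dosage_a, dtype=float)
    b = np.asarray(dosage_b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1] ** 2)


# Toy genotypes for 6 individuals
snp1 = [0, 1, 2, 1, 0, 2]
snp2 = [0, 1, 2, 1, 1, 2]  # differs from snp1 in one individual
print(round(genotype_r2(snp1, snp2), 3))
```

Because r² ignores the sign of the correlation, two variants whose alleles are perfectly anti-correlated (dosage b = 2 − a) also have r² = 1; both are equally informative proxies for each other.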
Integrative analysis of GWAS and cis-eQTLs
To test the hypothesis that the genetic effect on diverticulosis may be mediated through gene expression, we integrated our colonic cis-eQTLs with diverticulosis GWAS results using two-sample summary-data-based Mendelian randomization (SMR) together with the heterogeneity in dependent instruments (HEIDI) test [25] (Methods). Briefly, independent cis-eQTLs identified by stepwise regression with 10,000 permutations were used to capture associations between variants and gene expression, and the UK Biobank diverticular disease GWAS was used to capture associations between variants and diverticulosis [22, 26]. Of 4,151 genes tested, five (CCN3, CRISPLD2, ENTPD7, PHGR1, and TNFSF13) showed putative causal effects on diverticulosis mediated through gene expression after Bonferroni correction (PMR < 1.2×10−5) (Fig. 4). Among these, ENTPD7 (PMR = 7.6×10−7) encodes ectonucleoside triphosphate diphosphohydrolase 7 (NTPDase 7), an enzyme that regulates purinergic signaling by hydrolyzing extracellular nucleotides such as ATP and ADP [36, 37]. Some diverticular disease genes implicated in previous GWAS, including CBY1, THEM4, and S100A10, showed suggestive significance in our data (Supplementary Table S10) and warrant follow-up with larger sample sizes.
Figure 4.

Integration of GWAS and colonic cis-eQTLs using Mendelian randomization. Forest plot of SMR effect sizes for genes used as exposures. Error bars indicate the 95% confidence interval of each gene's SMR effect size. Red and blue indicate genes with Bonferroni-corrected significance (PMR < 1.2×10−5) and suggestive significance (PMR < 3.0×10−4), respectively.
Polygenic risk score analysis and lifestyle risk factors for diverticulosis severity
We next examined whether disease severity, as indicated by the number of diverticula, was associated with polygenic risk scores (PRS) for various traits. Subjects were stratified by number of diverticula into five groups (Table 1), and associations were tested in a negative binomial regression framework (Methods). Higher diverticulosis PRS deciles were significantly associated with a greater number of diverticula (β(se) = 11.301(3.431), 95% CI [4.575, 18.026], P = 9.8×10−4), suggesting that greater genetic susceptibility to diverticulosis is linked to more diverticula on colonoscopy (Fig. 5). A greater number of diverticula was also significantly associated with higher PRS deciles for diverticulitis, the inflammatory complication of diverticulosis (β(se) = 4.730(2.268), 95% CI [0.282, 9.177], P = 0.0371), suggesting that individuals with more diverticula carry a greater genetic predisposition to diverticulitis (Fig. 5). Finally, we tested whether the number of diverticula was associated with PRS for gastrointestinal disorders previously linked to diverticular disease, such as ulcerative colitis, Crohn's disease, and colorectal cancer [38, 39]. No significant associations were found between the PRS for these conditions and the number of diverticula.
Figure 5.

The association between the number of diverticula and polygenic risk scores (PRS) for several traits. The x-axis shows PRS deciles, representing increasing genetic risk for each trait among study participants; the y-axis shows the number of diverticula. The distribution of diverticula is stratified by PRS deciles of A) diverticulosis, B) diverticulitis, and C) processed meat intake. For each decile, a dot marks the mean number of diverticula and an error bar its 95% confidence interval (CI).
To test whether diverticulosis is causally associated with diverticulitis, we performed MR using CAUSE [40], treating diverticulosis as the exposure and diverticulitis as the outcome (Supplementary Methods). As expected, we found significant evidence that diverticulosis has a causal effect on diverticulitis (P = 3.8×10−6) (Supplementary Table S11).
Because lifestyle factors may influence the presence of diverticula, we also tested the association between disease severity and known or suggested lifestyle risk factors, including dietary fiber and meat intake, smoking status, and alcohol consumption (Methods). After controlling for sex, age, and population stratification, the frequency of alcohol drinking was positively associated with the number of diverticula (β(se) = 0.30(0.08), 95% CI [0.147, 0.473], P = 1.85×10−4) (Supplementary Table S12). PRS deciles for processed meat intake were also positively associated with the number of diverticula (β(se) = 17.868(8.917), 95% CI [0.389, 35.346], P = 0.0451) (Fig. 5). Consistent with a previous study [9], we observed no significant association between dietary fiber intake and the number of diverticula.
We also found that three genes (TPPP, GSN, and TPPP3) were positively associated with the number of diverticula and four genes (FAM177B, AQP3, FOXA3, and KDELR3) were positively associated with BMI (Supplementary Table S13, Supplementary Methods). These genes showed increased expression in patients with diverticulosis (Supplementary Table S4).
Discussion
Here we report a large, colonoscopy-based, multi-omic study of diverticulosis that captured both transcriptional and genetic variation associated with this condition. Together, differential gene expression, differential transcript usage, and eQTL mapping enhance our understanding of diverticulosis at the gene level. Our study identified genes and transcripts directly linked to this condition, as well as eQTLs that enable causal inference within an MR framework. Using MR, we pinpointed several genes, notably ENTPD7, with evidence supporting a causal role in this condition. This was further supported by our DE analysis, in which ENTPD7 expression was significantly elevated in patients with diverticulosis compared with controls. In addition, integrating our DE genes with gene expression patterns captured at single-cell resolution helps characterize diverticulosis at the cellular level and underscores the potential involvement of stromal and epithelial cells of the colon. This analysis points to specific cell types linked both to pathways observed here, such as tissue remodeling, and to other processes relevant to diverticulosis, such as regulation of the enteric nervous system. Furthermore, our PRS analysis suggested that individuals with more diverticula have a greater genetic predisposition to diverticulitis.
In our data, ENTPD7 was differentially expressed between cases and controls. This locus has previously been associated with diverticulosis in GWAS, and our MR analysis provided evidence of a causal effect of this locus on the development of diverticulosis. ENTPD7 may contribute to remodeling of the colon wall through its effects on metalloproteinases (MPs), enzymes that regulate various signal transduction pathways [41]. Extracellular nucleotides, which are hydrolyzed by ENTPD7, have been shown to regulate the activity of matrix metalloproteinases (MMPs), a class of MPs that play a key role in remodeling the extracellular matrix (ECM), the complex network of proteins (such as collagen) and carbohydrates that surrounds cells and provides structural support for tissues [42]. A previous mRNA study suggested that colonic diverticulosis was linked to changes in collagen content and in levels of tissue inhibitors of metalloproteinases (TIMPs), which may contribute to reconstruction of the colon wall [43]. In addition, elevated levels of MMP-1 and MMP-2 were observed in intestinal segments affected by complicated diverticular disease, and these MMPs were present throughout the bowel wall, which could account for the structural alteration [44]. The upregulation of ENTPD7 in individuals with diverticula observed in our DE analysis is expected to increase the enzymatic hydrolysis of nucleotides, altering the signals that activate purinergic receptors on cells [45, 46]. This purinergic signal can activate the NF-κB signaling pathway, which was enriched in our pathway analysis, through the purinergic P2X7 receptor (P2X7R), which selectively targets NF-κB p65 [47]. There is evidence that NF-κB signaling can positively regulate the expression of MMPs that are essential for tissue remodeling [48, 49, 50].
Therefore, ENTPD7 may be involved in the development of diverticulosis through the regulation of NF-κB signaling, which mediates MMP activity and, by extension, colon wall remodeling. Furthermore, age-related alterations in MMP activity may be related to the higher prevalence of diverticulosis in aging populations [51].
Individuals with a greater number of diverticula are plausibly at higher risk of developing diverticulitis, given the larger number of sites vulnerable to inflammation. Our findings extend this observation, suggesting that such individuals may also be genetically predisposed to diverticulitis. Recognizing this dual vulnerability, both morphological and genetic, offers an opportunity to improve patient risk profiling. Clinicians could integrate this knowledge into diagnostic and management strategies so that patients with a larger number of diverticula, especially those with a genetic predisposition, receive closer monitoring and tailored interventions. Such a proactive approach could support more informed clinical decisions and improve patient outcomes.
Previous GWAS of diverticular disease, defined as diverticulosis, diverticulitis, or diverticular hemorrhage, identified fewer than 100 variants and their putative target genes [12, 13, 14]. Two of these previously reported genes were also implicated in our analyses, which were restricted to diverticulosis: ENTPD7 in the DE analysis and OAS1 in the DTU analysis. The lack of overlap for other genes between our study and prior GWAS may reflect differences in study design as well as genuine differences in the regulatory mechanisms underlying diverticula formation. A notable difference among the studies lies in the scope of case inclusion. Whereas prior investigations included cases with any manifestation of diverticular disease, our study focused specifically on a well-defined cohort with diverticulosis. This focus allows us to identify factors underlying the development of diverticulosis, which are likely distinct from those associated with the onset of inflammation or bleeding. In addition, whereas GWAS test for association between genotype and case-control status, our transcriptomic analyses focused on differences in mRNA abundance and transcript usage of a gene between cases and controls.
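The distinction drawn above between expression abundance (DE) and transcript usage (DTU) can be made concrete with a toy example. The sketch below uses invented counts for a single hypothetical two-transcript gene in which every sample has the same gene-level total (100), so the gene-level log2 fold change is zero, while the share of the first transcript falls from about 0.80 in controls to about 0.30 in cases. It computes only these summary quantities; it is not the negative binomial model of DESeq2 or the Dirichlet-multinomial model of DRIMSeq used in the study, and all names are illustrative.

```python
import math
from statistics import mean

# Invented normalized counts for one gene with two transcripts (tx1, tx2).
# Each tuple is one sample; values are illustrative only.
controls = [(80, 20), (75, 25), (85, 15)]
cases = [(30, 70), (25, 75), (35, 65)]


def gene_totals(samples):
    """Gene-level abundance per sample: the quantity a DE analysis compares."""
    return [sum(sample) for sample in samples]


def transcript_proportions(samples):
    """Within-gene share of each transcript per sample: the quantity a DTU analysis compares."""
    return [tuple(count / sum(sample) for count in sample) for sample in samples]


def log2_fold_change(case_values, control_values):
    """Log2 ratio of mean case abundance to mean control abundance."""
    return math.log2(mean(case_values) / mean(control_values))


def mean_proportion(samples, k):
    """Average proportion of transcript k across samples."""
    return mean(props[k] for props in transcript_proportions(samples))


if __name__ == "__main__":
    lfc = log2_fold_change(gene_totals(cases), gene_totals(controls))
    print(f"gene-level log2 fold change: {lfc:.2f}")
    print(f"tx1 share, controls: {mean_proportion(controls, 0):.2f}")
    print(f"tx1 share, cases:    {mean_proportion(cases, 0):.2f}")
```

A gene like this would be missed by a gene-level DE test but could be detected by DTU, which is why the two transcriptomic analyses can implicate different genes.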
Our study has several limitations. First, although our sample showed some ancestral diversity, most participants self-identified as White. Given differences in the prevalence of diverticulosis across self-identified race and ethnicity groups [3, 52, 53, 54, 55], future studies with much larger sample sizes should examine gene expression patterns across populations. Second, although our MR analysis pinpointed several putative causal genes for diverticulosis, including ENTPD7, the observed genes, cell types, and biological pathways require experimental validation. Comprehensive genetic perturbation and histological characterization of the primary candidates could strengthen the causal link between our findings and diverticulosis. Third, although integrating single-cell data with our bulk RNA-seq analysis identified cell types potentially relevant to diverticulosis, not all 38 differentially expressed genes showed variation across the tested cell types, which may be attributable to differences in participant characteristics (Supplementary Table S7; Supplementary Figure S5). Additional data will give a clearer view of the specific cell types associated with the condition. Fourth, although our sample size allowed us to identify DE genes and eQTLs, a larger sample would not only reinforce our findings but might also reveal biological signals that went undetected in this study. A larger sample would also enable a GWAS, which was not feasible with our current sample size. Integrating such a GWAS with existing eQTL data would allow additional MR analyses that could re-evaluate known links, and potentially uncover new ones, between genetic variation and gene expression.
In summary, we provide, to our knowledge, the first detailed transcriptomic landscape of diverticulosis, along with biological insights into the condition. Our study may also serve as a reference for understanding the mechanisms underlying the severe complications of diverticulosis and for comparisons with other gastrointestinal diseases.
Supplementary Material
Key messages.
What is already known on this topic:
GWAS have identified genetic variants associated with diverticular disease, offering insight into its biological mechanisms.
The genetic and transcriptomic factors that contribute specifically to the pathogenesis of colonic diverticulosis remain understudied.
What this study adds:
Changes in the transcriptome associated with diverticulosis suggest that tissue remodeling is a primary mechanism for diverticula formation, closely linked to the stromal and epithelial cell types in the colon.
Our study identified five genes, notably ENTPD7, with evidence of a causal effect on this condition and confirmed that ENTPD7 expression was elevated in patients with diverticulosis.
We found a novel association between increased genetic susceptibility to diverticulitis and a larger number of diverticula observed during colonoscopy.
How this study might affect research, practice, or policy:
Incorporating transcriptomic and genetic findings with knowledge of dietary and lifestyle factors is a critical step towards developing preventive strategies that avert the onset of the condition and devising effective treatments for patients who are susceptible to its complications.
Scalable genetic loss-of-function experiments assessing the impact of the putative causal genes are a promising direction for further elucidating the regulatory mechanisms of diverticulosis.
Acknowledgement
We acknowledge the help of Amber McCoy for sample preparation and the assistance of Amanda Gerringer of the Mammalian Genotyping Center at UNC and the team at the UNC High Throughput Sequencing facility.
Funding
The project described was supported by the National Institutes of Health, through Grant Award Numbers NIH R01DK094738 (RSS), R01DK132050 (AFP), P30 DK034987 (RSS) and T32HL129982 (contact PI: KEN) for JKS. The project was also supported in part by the University Cancer Research Fund to Lineberger Comprehensive Cancer Center.
List of abbreviations
- GWAS: Genome-Wide Association Study
- eQTL: Expression Quantitative Trait Loci
- PRS: Polygenic Risk Score
- RNA-seq: RNA sequencing
- DE: Differentially expressed
- DTU: Differential Transcript Usage
- MR: Mendelian Randomization
- ECM: Extracellular Matrix
- MPs: Metalloproteinases
- PCs: Principal Components
- TIMPs: Tissue Inhibitors of Metalloproteinases
Footnotes
Disclosures
The authors declare that they have no conflicts of interest.
Writing assistance
None
Patient and Public Involvement
Patients or the public were not involved in the design, or conduct, or reporting, or dissemination plans of our research.
Availability of data and materials
We are in the process of uploading the raw RNA-seq FASTQ files and genotype data from our study to the NCBI dbGaP database (https://www.ncbi.nlm.nih.gov/gap/) [1]. Gene-level quantification was conducted using Salmon (https://github.com/COMBINE-lab/salmon), with surrogate variables computed via SVA (https://github.com/jtleek/sva/tree/master). DESeq2 (https://github.com/thelovelab/DESeq2) was used for differential gene expression analysis, and DRIMSeq (https://github.com/gosianow/DRIMSeq) for differential transcript usage analysis. eQTL mapping was performed with tensorQTL (https://github.com/broadinstitute/tensorqtl); Mendelian randomization (MR) integrating eQTL and GWAS data was performed using SMR (https://yanglab.westlake.edu.cn/software/smr/#SMR&HEIDIanalysis), and additional MR between GWAS traits was conducted with CAUSE (https://jean997.github.io/cause/). Polygenic risk scores were calculated with pgsc_calc (https://github.com/PGScatalog/pgsc_calc). Single-cell RNA-seq analysis used Seurat (https://github.com/satijalab/seurat), and negative binomial generalized linear models were fitted using the MASS R package (https://github.com/cran/MASS/tree/master).
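The polygenic risk scores mentioned above are, at their core, weighted sums of effect-allele dosages. The sketch below is a minimal Python illustration under simplifying assumptions, not pgsc_calc itself: it uses a hypothetical scoring file (the variant IDs var1 to var3 and their weights are invented), counts effect-allele copies from unphased biallelic genotypes, and skips variants missing for an individual. pgsc_calc performs additional steps, such as matching scoring-file variants to the target genotypes and harmonizing alleles, that are not shown, and the downstream model relating scores to diverticula counts is also omitted.

```python
# Hypothetical scoring file: variant ID -> (effect allele, effect weight).
WEIGHTS = {
    "var1": ("A", 0.10),
    "var2": ("G", -0.05),
    "var3": ("T", 0.20),
}


def effect_allele_dosage(genotype, effect_allele):
    """Copies (0, 1 or 2) of the effect allele in an unphased biallelic genotype."""
    return sum(1 for allele in genotype if allele == effect_allele)


def polygenic_score(genotypes, weights):
    """Return (score, number of scored variants): sum of weight x effect-allele dosage."""
    score = 0.0
    n_scored = 0
    for variant_id, (effect_allele, weight) in weights.items():
        genotype = genotypes.get(variant_id)
        if genotype is None:
            continue  # missing variant is skipped, not imputed (a simplification)
        score += weight * effect_allele_dosage(genotype, effect_allele)
        n_scored += 1
    return score, n_scored


if __name__ == "__main__":
    # Hypothetical individual genotyped at var1 and var2 only
    person = {"var1": ("A", "A"), "var2": ("A", "G")}
    score, n = polygenic_score(person, WEIGHTS)
    print(f"PRS = {score:.3f} from {n} of {len(WEIGHTS)} variants")
```

Here the hypothetical individual carries two copies of the var1 effect allele and one of the var2 effect allele, giving 2 × 0.10 + 1 × (−0.05) = 0.15.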
References
- 1. Seo J, Liu H, Young Y, Zhang X, Keku TO, Jones CD, et al. Genetic and transcriptomic landscape of colonic diverticulosis. National Center for Biotechnology Information dbGaP 2023; data access request initiated.
- 2. Rezapour M, Ali S, Stollman N. Diverticular Disease: An Update on Pathogenesis and Management. Gut Liver 2018;12:125–32.
- 3. Peery AF, Keku TO, Galanko JA, Sandler RS. Sex and Race Disparities in Diverticulosis Prevalence. Clin Gastroenterol Hepatol 2020;18:1980–6.
- 4. Peery AF, Keil A, Jicha K, Galanko JA, Sandler RS. Association of Obesity With Colonic Diverticulosis in Women. Clin Gastroenterol Hepatol 2020;18:107–14.e1.
- 5. Yuan S, Larsson SC. Genetically Predicted Adiposity, Diabetes, and Lifestyle Factors in Relation to Diverticular Disease. Clin Gastroenterol Hepatol 2022;20:1077–84.
- 6. Larsson SC, Burgess S. Causal role of high body mass index in multiple chronic diseases: a systematic review and meta-analysis of Mendelian randomization studies. BMC Med 2021;19:320.
- 7. Chen J, Yuan S, Fu T, Ruan X, Qiao J, Wang X, et al. Gastrointestinal Consequences of Type 2 Diabetes Mellitus and Impaired Glycemic Homeostasis: A Mendelian Randomization Study. Diabetes Care 2023;46:828–35.
- 8. Peery AF, Sandler RS, Ahnen DJ, Galanko JA, Holm AN, Shaukat A, et al. Constipation and a low-fiber diet are not associated with diverticulosis. Clin Gastroenterol Hepatol 2013;11:1622–7.
- 9. Peery AF, Barrett PR, Park D, Rogers AJ, Galanko JA, Martin CF, et al. A high-fiber diet does not protect against asymptomatic diverticulosis. Gastroenterology 2012;142:266–72.e1.
- 10. Granlund J, Svensson T, Olén O, Hjern F, Pedersen NL, Magnusson PK, et al. The genetic influence on diverticular disease--a twin study. Aliment Pharmacol Ther 2012;35:1103–7.
- 11. Strate LL, Erichsen R, Baron JA, Mortensen J, Pedersen JK, Riis AH, et al. Heritability and familial aggregation of diverticular disease: a population-based study of twins and siblings. Gastroenterology 2013;144:736–42.e1; quiz e14.
- 12. Canela-Xandri O, Rawlik K, Tenesa A. An atlas of genetic associations in UK Biobank. Nat Genet 2018;50:1593–9.
- 13. Maguire LH, Handelman SK, Du X, Chen Y, Pers TH, Speliotes EK. Genome-wide association analyses identify 39 new susceptibility loci for diverticular disease. Nat Genet 2018;50:1359–65.
- 14. Schafmayer C, Harrison JW, Buch S, Lange C, Reichert MC, Hofer P, et al. Genome-wide association analysis of diverticular disease points towards neuromuscular, connective tissue and epithelial pathomechanisms. Gut 2019;68:854–65.
- 15. Wu Y, Goleva SB, Breidenbach LB, Kim M, MacGregor S, Gandal MJ, et al. 150 risk variants for diverticular disease of intestine prioritize cell types and enable polygenic prediction of disease susceptibility. Cell Genomics 2023:100326.
- 16. Schieffer KM, Choi CS, Emrich S, Harris L, Deiling S, Karamchandani DM, et al. RNA-seq implicates deregulation of the immune system in the pathogenesis of diverticulitis. Am J Physiol Gastrointest Liver Physiol 2017;313:G277–G84.
- 17. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 2012;28:882–3.
- 18. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 2014;15:550.
- 19. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 2017;14:417–9.
- 20. Nowicka M, Robinson MD. DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Res 2016;5:1356.
- 21. Ulgen E, Ozisik O, Sezerman OU. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front Genet 2019;10:858.
- 22. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 2020;369:1318–30.
- 23. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 2010;11:R25.
- 24. Taylor-Weiner A, Aguet F, Haradhvala NJ, Gosai S, Anand S, Kim J, et al. Scaling computational genomics to millions of individuals with GPUs. Genome Biol 2019;20:228.
- 25. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet 2016;48:481–7.
- 26. Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet 2018;50:1335–41.
- 27. Ghoussaini M, Mountjoy E, Carmona M, Peat G, Schmidt EM, Hercules A, et al. Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic Acids Res 2021;49:D1311–D20.
- 28. Kurki MI, Karjalainen J, Palta P, Sipilä TP, Kristiansson K, Donner KM, et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023;613:508–18.
- 29. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 2014;42:D1001–6.
- 30. Lambert SA, Gil L, Jupp S, Ritchie SC, Xu Y, Buniello A, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet 2021;53:420–5.
- 31. Love MI, Soneson C, Patro R. Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. F1000Res 2018;7:952.
- 32. Soneson C, Matthes KL, Nowicka M, Law CW, Robinson MD. Isoform prefiltering improves performance of count-based methods for analysis of differential transcript usage. Genome Biol 2016;17:12.
- 33. Hickey JW, Becker WR, Nevins SA, Horning A, Perez AE, Zhu C, et al. Organization of the human intestine at single-cell resolution. Nature 2023;619:572–84.
- 34. Becker W. scRNA data from: Organization of the human intestine at single cell resolution [Dataset]. Dryad 2023.
- 35. Sakaue S, Kanai M, Tanigawa Y, Karjalainen J, Kurki M, Koshiba S, et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 2021;53:1415–24.
- 36. Yegutkin GG. Nucleotide- and nucleoside-converting ectoenzymes: Important modulators of purinergic signalling cascade. Biochim Biophys Acta 2008;1783:673–94.
- 37. Burnstock G. Purinergic signalling. Br J Pharmacol 2006;147 Suppl 1:S172–81.
- 38. Zhang Y, Zhang H, Zhu J, He Y, Wang P, Li D, et al. Association between diverticular disease and colorectal cancer: a bidirectional mendelian randomization study. BMC Cancer 2023;23:137.
- 39. Peppercorn MA. The overlap of inflammatory bowel disease and diverticular disease. J Clin Gastroenterol 2004;38:S8–10.
- 40. Morrison J, Knoblauch N, Marcus JH, Stephens M, He X. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nat Genet 2020;52:740–7.
- 41. Khokha R, Murthy A, Weiss A. Metalloproteinases and their natural inhibitors in inflammation and immunity. Nat Rev Immunol 2013;13:649–65.
- 42. Visse R, Nagase H. Matrix metalloproteinases and tissue inhibitors of metalloproteinases: structure, function, and biochemistry. Circ Res 2003;92:827–39.
- 43. Mimura T, Bateman AC, Lee RL, Johnson PA, McDonald PJ, Talbot IC, et al. Up-regulation of collagen and tissue inhibitors of matrix metalloproteinase in colonic diverticular disease. Dis Colon Rectum 2004;47:371–8; discussion 8–9.
- 44. Rosemar A, Ivarsson ML, Börjesson L, Holmdahl L. Increased concentration of tissue-degrading matrix metalloproteinases and their inhibitor in complicated diverticular disease. Scand J Gastroenterol 2007;42:215–20.
- 45. Kusu T, Kayama H, Kinoshita M, Jeon SG, Ueda Y, Goto Y, et al. Ecto-nucleoside triphosphate diphosphohydrolase 7 controls Th17 cell responses through regulation of luminal ATP in the small intestine. J Immunol 2013;190:774–83.
- 46. Zimmermann H, Zebisch M, Sträter N. Cellular function and molecular structure of ecto-nucleotidases. Purinergic Signal 2012;8:437–502.
- 47. Ferrari D, Wesselborg S, Bauer MK, Schulze-Osthoff K. Extracellular ATP activates transcription factor NF-kappaB through the P2Z purinoreceptor by selectively targeting NF-kappaB p65. J Cell Biol 1997;139:1635–43.
- 48. Al-Sadi R, Engers J, Haque M, King S, Al-Omari D, Ma TY. Matrix Metalloproteinase-9 (MMP-9) induced disruption of intestinal epithelial tight junction barrier is mediated by NF-κB activation. PLoS One 2021;16:e0249544.
- 49. Yeh CB, Hsieh MJ, Hsieh YH, Chien MH, Chiou HL, Yang SF. Antimetastatic effects of norcantharidin on hepatocellular carcinoma by transcriptional inhibition of MMP-9 through modulation of NF-kB activity. PLoS One 2012;7:e31055.
- 50. Page-McCaw A, Ewald AJ, Werb Z. Matrix metalloproteinases and the regulation of tissue remodelling. Nat Rev Mol Cell Biol 2007;8:221–33.
- 51. Freitas-Rodríguez S, Folgueras AR, López-Otín C. The role of matrix metalloproteinases in aging: Tissue remodeling and beyond. Biochim Biophys Acta Mol Cell Res 2017;1864:2015–25.
- 52. Golder M, Ster IC, Babu P, Sharma A, Bayat M, Farah A. Demographic determinants of risk, colon distribution and density scores of diverticular disease. World J Gastroenterol 2011;17:1009–17.
- 53. Choe EK, Lee JE, Chung SJ, Yang SY, Kim YS, Shin ES, et al. Genome-wide association study of right-sided colonic diverticulosis in a Korean population. Sci Rep 2019;9:7360.
- 54. Song JH, Kim YS, Lee JH, Ok KS, Ryu SH, Lee JH, et al. Clinical characteristics of colonic diverticulosis in Korea: a prospective study. Korean J Intern Med 2010;25:140–6.
- 55. Hong W, Geng W, Wang C, Dong L, Pan S, Yang X, et al. Prevalence of colonic diverticulosis in mainland China from 2004 to 2014. Sci Rep 2016;6:26237.
