PLOS One. 2024 Dec 31;19(12):e0311495. doi: 10.1371/journal.pone.0311495

Identification of O-glycosylation related genes and subtypes in ulcerative colitis based on machine learning

Yue Lu 1,#, Yi Su 1,#, Nan Wang 1,#, Dongyue Li 1, Huichao Zhang 1, Hongyu Xu 1,*
Editor: Ashutosh Pandey 2
PMCID: PMC11687659  PMID: 39739658

Abstract

Ulcerative colitis (UC) is an immune-related inflammatory bowel disease whose underlying mechanisms are a central area of clinical research. O-GlcNAcylation plays a critical role in immune regulation and in the occurrence of inflammatory diseases and tumors. Yet the mechanism of O-GlcNAc-associated colitis remains to be elucidated. To this end, the transcriptional and clinical data of GSE75214 and GSE92415 from the GEO database were examined, and the genes MUC1, ADAMTS1, GXYLT2, and SEMA5A were found to be significantly related to O-GlcNAcylation using machine learning methods. Based on these four hub genes, two UC subtypes were defined. Notably, subtype B might be prone to developing colitis-associated colorectal cancer (CAC). This study examined the role of intestinal glycosylation changes, especially O-GlcNAcylation, and laid a foundation for further research on the occurrence and development of UC. Overall, understanding the role of O-GlcNAcylation in UC could have significant implications for diagnosis and treatment, offering valuable insights into the disease’s progression.

Introduction

Ulcerative colitis (UC), an inflammatory bowel disease, has been a persistent challenge for patients for decades. Elucidating the deeper and more precise mechanisms behind UC has been a key focus of clinical research. The immune response is of considerable significance in the occurrence and development of UC [1]. The pathogenesis of UC involves various components of intestinal immunoinflammatory pathways, including antigen recognition, immune response, the epithelial barrier, and the intestinal microbiota [1–3]. In addition, various types of immune cells, such as antigen-presenting cells (dendritic cells and macrophages), T helper cells, regulatory T cells, and natural killer T cells, play vital roles in the pathogenesis of UC by regulating, inhibiting, and maintaining inflammation [4–6]. Addressing immune-related issues is now a critical component of fundamental research on ulcerative colitis. Previous studies have demonstrated a strong link between glycosylation and both colon inflammation and the colonic immune response [7–9].

Glycosylation is a reversible post-translational modification that involves the enzymatic covalent attachment of monosaccharides or glycans to proteins [10]. As an essential modification of proteins, protein glycosylation mainly includes N-glycans, O-glycans, and other types [10, 11]. O-GlcNAcylation, also known as O-glycosylation, plays an essential role in regulating innate immune cell function, cell metabolism, and the occurrence of inflammatory diseases and tumors [12]. Research has revealed that the levels of O-GlcNAcylation on proteins change when innate immune cells are stimulated during inflammatory states [13–16]. Consequently, disruption of the O-GlcNAcylation balance in the body can lead to a range of diseases, including intestinal inflammatory disorders, diabetes, neurodegeneration, and even tumors [17–19]. The relationship between intestinal O-GlcNAcylation and ulcerative colitis has attracted increasing attention [9, 12, 20–22].

Intestinal mucosal injury is the most direct manifestation of ulcerative colitis. Most intestinal glycans are mucin-type O-glycans, which make up 80% of the mass of human MUC2, the most prevalent intestinal mucin [23]. Intestinal epithelial O-glycans can directly regulate interactions with microorganisms by providing ligands for bacterial adhesins and nutrients for bacterial metabolism [22]. Several lines of evidence support a strong connection between O-glycosylation and ulcerative colitis. Identifying glycosylation biomarkers and their expression changes is essential for diagnosing ulcerative colitis, predicting its progression, and assessing potential complications.

In this study, bioinformatics methods were employed to explore the role of O-GlcNAcylation in developing ulcerative colitis. Furthermore, multiple machine learning methods were used to classify UC into two subtypes based on key genes, guiding the choice of UC treatment and prognostic judgment of UC.

Materials and methods

Datasets and sample selection

UC microarray datasets were retrieved from the Gene Expression Omnibus (GEO) database (www.ncbi.nlm.nih.gov/geo/) using the following inclusion criteria: a) expression profiles of the two groups generated on the same platform within a dataset; b) human samples only; and c) a minimum of ten samples per group. Two datasets, GSE75214 (GPL6244 platform) and GSE92415 (GPL13158 platform), were ultimately included. The GSE75214 dataset contains intestinal mucosal biopsies obtained endoscopically from UC patients (n = 97) and healthy controls (n = 11), which were analyzed by microarray to assess gene expression. The GSE92415 dataset enrolled 21 healthy subjects and 162 UC patients, comprising baseline samples before treatment (n = 87) and post-treatment samples (n = 75), to evaluate the effect of golimumab (GLM) during induction treatment in moderate-to-severe UC. In this study, only the 87 untreated UC samples and the 21 healthy control samples from GSE92415 were selected.

Merge and deduplication of datasets

The gene expression data from the UC patients and healthy individuals in GSE75214 were merged with those from GSE92415 to form a combined dataset. The batch effect was then removed to minimize discrepancies between the datasets, using the R packages "limma" and "sva" [24, 25]. After combination, there were a total of 216 samples and 16467 genes (S1 File). S1A and S1B Fig present the data before and after the merger, respectively. To reduce the adverse effects of individual outlying samples, the merged dataset was normalized using the R package "preprocessCore". S1C and S1D Fig illustrate the data before and after normalization, respectively.
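The normalization step can be illustrated with a small Python sketch of plain quantile normalization. This rests on an assumption: the text names the package "preprocessCore" but not the routine, and quantile normalization is simply the operation that package is best known for. The matrix is toy data, ties are not treated specially, and the code is not the R script from S2 File.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows."""
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    # One list of values per sample (column-wise view).
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # Sort each sample independently, then average across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[r] for col in sorted_cols) / n_samples for r in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene_idx in enumerate(order):
            normalized[gene_idx][j] = rank_means[rank]
    return normalized


# Toy matrix: 4 genes (rows) x 3 samples (columns); values are made up.
toy = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.5, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(toy)
```

After this transformation every sample shares the same set of values, so differences in overall intensity distribution between arrays no longer dominate downstream comparisons.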

Identification of differentially expressed genes

Differentially expressed genes (DEGs) were identified using the "limma" package as genes with an adjusted P-value < 0.05 and a |log fold change (logFC)| above a fixed cutoff: 1.0 for the comparison between UC and healthy controls, and 0.5 for the comparison between the two UC subtypes. Volcano plots and heatmaps of the DEGs were drawn with the "ggplot2" and "pheatmap" packages, respectively.
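The thresholding rule can be sketched in Python as follows. The function name, the input layout (gene mapped to a logFC and adjusted P-value pair), and all values are hypothetical; the actual screening in this study was done on limma output in R.

```python
def select_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split genes into up- and down-regulated DEGs.

    results: dict mapping gene -> (logFC, adjusted p-value); hypothetical layout.
    """
    up, down = [], []
    for gene, (log_fc, padj) in results.items():
        if padj < padj_cutoff and abs(log_fc) > lfc_cutoff:
            (up if log_fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Made-up values for illustration only (not taken from the study).
toy_results = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.4, 0.020),
    "GENE_C": (0.7, 0.010),
    "GENE_D": (1.8, 0.200),
}
up, down = select_degs(toy_results)                           # |logFC| > 1
up_relaxed, down_relaxed = select_degs(toy_results, lfc_cutoff=0.5)
```

With the stricter cutoff only GENE_A and GENE_B pass; relaxing the cutoff to 0.5 additionally admits GENE_C, while GENE_D is excluded in both cases because its adjusted P-value is too high.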

Biological function and pathway enrichment analyses

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed with the R package "clusterProfiler" to identify the potential functions and signaling pathways associated with the DEGs. GO analysis covered the biological process (BP), cellular component (CC), and molecular function (MF) categories.

Identification and functional enrichment analysis of O-GlcNAcylation differential genes

The O-GlcNAcylation gene set was downloaded from the MSigDB database (https://www.gsea-msigdb.org/). Subsequently, the DEGs between the UC and healthy control groups were intersected with the O-GlcNAcylation gene set to identify O-GlcNAcylation-related differential genes. GO and KEGG enrichment analyses of these genes were then carried out and visualized with the R package "clusterProfiler" [26].

PPI

The STRING database (https://string-db.org/) was used to construct the protein-protein interaction (PPI) network of the proteins encoded by the 7 DEGs, representing the relationships among these 7 differential genes [27].

Machine learning

LASSO was performed to enhance the predictive accuracy and comprehensibility of the statistical models by employing a regression method for variable selection. Random Forest (RF) is a versatile computational method capable of predicting continuous variables. It is adaptable to various conditions and is known for its high accuracy and sensitivity [28]. Support vector machine (SVM) is a supervised machine learning (ML) method capable of learning from data and making decisions [29].

In this study, three machine learning methods, namely LASSO regression (R package glmnet), Random Forest (R package randomForest), and support vector machine (SVM; R package kernel), were used to screen essential differential genes from the seven candidate genes.

Core genes predicting the disease onset

Receiver operating characteristic (ROC) curves were drawn using the R package pROC to evaluate the sensitivity and specificity of the four core genes in predicting disease occurrence, with the X-axis indicating "specificity" and the Y-axis representing "sensitivity" [30, 31]. Each gene yields its own ROC curve, and the corresponding area under the curve (AUC) reflects the strength of that gene in predicting disease occurrence.
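To make the meaning of the AUC concrete, the Python sketch below computes it for a single gene as the probability that a randomly chosen UC sample has a higher expression value than a randomly chosen control, which is equivalent to the area under the ROC curve. The function name and the expression values are hypothetical, and this is an illustration rather than the pROC computation used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    labels: 1 for disease, 0 for control. Tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up expression values for one gene (not study data).
expression = [8.1, 7.4, 6.9, 7.8, 5.2, 6.1, 7.0, 5.8]
is_uc = [1, 1, 1, 1, 0, 0, 0, 0]
auc = roc_auc(expression, is_uc)
print(auc)  # 0.9375 (15 of the 16 case-control pairs are ordered correctly)
```

Under this convention a gene that is lower in disease gives a value below 0.5; one minus that value then describes its discriminative strength.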

Immune landscape of dataset

CIBERSORT (http://cibersort.stanford.edu/) was used to estimate the infiltration of immune cells in the GSE75214 and GSE92415 samples. Spearman’s method was then employed to assess the correlation between the expression of the four pivotal genes and the levels of immune cells in the dataset samples.
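Spearman’s correlation is the Pearson correlation computed on ranks, which the following Python sketch spells out. The helper names and the paired values (one gene's expression and one immune cell score across five samples) are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up paired values across five samples (not study data).
gene_expr = [5.1, 6.3, 7.0, 8.2, 9.4]
cell_score = [0.30, 0.25, 0.22, 0.10, 0.05]
rho = spearman(gene_expr, cell_score)
```

Because the toy cell score falls monotonically as expression rises, rho is -1 up to floating-point error; real data would give intermediate values whose sign corresponds to the positive or negative correlations reported for each gene and cell type.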

GSEA analysis of single genes

To characterize the potential functions of the four hub genes, the R clusterProfiler package was used to display the top 20 results of four single-gene GSEA analyses of Reactome [32]. The listed values denoted enrichment scores, with scores above zero indicating a positive correlation between the gene and the pathway, and scores below zero suggesting a negative correlation. The results were then ranked in descending order according to the absolute value of the normalized enrichment score (NES).
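The ranking rule can be sketched in Python as below. The pathway names and scores are placeholders, and the helper is illustrative rather than part of clusterProfiler.

```python
def top_by_abs_nes(results, n=20):
    """Sort (pathway, NES) pairs by |NES| in descending order and keep the first n."""
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)[:n]


# Hypothetical pathway names and scores, for illustration only.
toy_gsea = [
    ("PATHWAY_A", 1.9),
    ("PATHWAY_B", -2.4),
    ("PATHWAY_C", 0.8),
    ("PATHWAY_D", -1.2),
]
ranked = top_by_abs_nes(toy_gsea, n=3)
# The sign of the score gives the direction of the gene-pathway association.
directions = {
    name: ("positive" if nes > 0 else "negative") for name, nes in ranked
}
```

Ranking on the absolute value keeps strongly negative pathways (here PATHWAY_B) alongside strongly positive ones, while the sign is retained to report the direction of correlation.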

Unsupervised clustering of genes

Based on the four core genes, the R package "ConsensusClusterPlus" was used for unsupervised consensus cluster analysis, which identified two subtypes as optimal. The differential expression of the core genes between the subtypes was then compared, and a p-value less than 0.05 was considered significant. The R package "pheatmap" was used to draw a heatmap showing the expression differences of the four genes among the subtypes.

GSVA analysis of different types of pathways

KEGG and Reactome pathway gene sets were downloaded from the MSigDB database. The R package GSVA was used to score the pathways and to compare pathway activity between the two subtypes [33]. Subsequently, the R package “pheatmap” was used to draw a heatmap comparing the two groups.

Biological function and pathway enrichment analysis of two subtypes

A PCA plot was used to show the distribution of UC samples across the subtypes and the relationship between the two distinct subtypes. Differential analysis was then performed between the subtypes, and the resulting differential genes were subjected to GO and KEGG enrichment analyses. The R package clusterProfiler was used to visualize the enrichment results.

Prediction of miRNAs and transcription factors upstream of genes

The RegNetwork database (https://regnetworkweb.org/) was used to predict miRNAs and transcription factors (TFs) upstream of the genes. Cytoscape software was then used to construct the network, with red indicating the core genes [34].

The R language codes related to bioinformatics methods involved in this study have been uploaded as supplementary information (S2 File).

Results

Identification and functional enrichment analysis of DEGs

Upon the merging of the two datasets, namely GSE75214 (provided by the GPL6244 platform) and GSE92415 (provided by the GPL13158 platform), 184 UC samples and 32 healthy controls were ultimately obtained (Table 1).

Table 1. Sample numbers.

Dataset | Healthy Control | Ulcerative Colitis
GSE75214 | 11 | 97
GSE92415 | 21 | 87
Total | 32 | 184

The GSE75214 dataset contained UC patients (n = 97) and healthy controls (n = 11). The GSE92415 dataset enrolled 21 healthy subjects and 162 UC patients, comprising baseline samples before treatment (n = 87) and post-treatment samples (n = 75); the post-treatment samples were excluded from the present research.

The two datasets from the GEO database underwent normalization and merging (S1 Fig). Subsequently, the limma package in R was used for differential analysis between UC and controls, and DEGs were screened according to the criteria of |logFC| > 1 and adj.P.Val < 0.05. The results showed that 449 genes were co-upregulated and 233 were co-downregulated (Fig 1A). Heatmap analysis showed significant gene expression differences between the UC and healthy control groups. For example, the REG and MMP families were highly expressed in UC but showed low expression in healthy controls (Fig 1B).

Fig 1. Identification and functional enrichment analysis of DEGs.

(A) The volcano plot shows DEGs between UC and healthy controls from the two GEO datasets. (B) The heatmap shows the differential genes between UC and healthy controls. The screening criteria were set to |logFC| > 1 and adj.P.Val < 0.05. (C-E) The GO enrichment results, including BP, CC, and MF, reveal the underlying functions of the DEGs. (F) KEGG analysis shows the top twenty pathways enriched for the differential genes.

GO enrichment analysis involving BP, CC, and MF showed that the differential genes between UC and healthy controls were mainly enriched in leukocyte migration, neutrophil and granulocyte chemotaxis, and regulation of immune processes (Fig 1C–1E). In the KEGG analysis, the significantly enriched pathways were the TNF and IL-17 signaling pathways, rheumatoid arthritis, and the NF-κB and B-cell receptor signaling pathways (Fig 1F). Overall, the DEGs between UC and healthy controls were primarily enriched in immune and inflammatory pathways.

Screening and functional enrichment of O-GlcNAcylation-associated differential genes

The upregulated UC-associated differential genes and O-GlcNAcylation gene sets were extracted and overlapped. Upon overlapping analysis, four common differential genes were identified, including ADAMTS1, MUC1, ST3GAL1, and THBS2 (Fig 2A). The downregulated differential genes and O-GlcNAcylation gene sets between the UC group and the healthy control group were extracted for intersection analysis, and three common differential genes were identified, including SEMA5A, GXYLT2, and GALNT12, after overlapping analysis (Fig 2B).

Fig 2. Screening and functional enrichment of O-GlcNAcylation-associated differential genes.

(A) Venn diagram of the upregulated differential genes and the O-GlcNAcylation gene set. (B) Venn diagram of the downregulated differential genes and the O-GlcNAcylation gene set. (C) The enrichment analysis results of GO, including BP, CC, and MF. (D) The KEGG enrichment of DEGs. (E) Mapping between the top 5 KEGG pathways and three differential genes, with different colored lines corresponding to different KEGG pathways.

GO enrichment analysis revealed that the differential genes were mainly enriched in protein glycosylation and in the biosynthesis and metabolism of glycoproteins (Fig 2C). KEGG analysis showed significant enrichment in other glycosylation biosynthetic pathways, mucin-type O-glycan biosynthesis, and the PI3K-Akt signaling pathway (Fig 2D). Thus, the seven differential genes were mainly involved in glycosylation biosynthesis and metabolism pathways. Moreover, three genes, GXYLT2, GALNT12, and ST3GAL1, were enriched in the top 5 KEGG pathways (Table 2), and their corresponding relationships are labeled in Fig 2E.

Table 2. The KEGG enrichment analysis table of DEGs.

ID | Description | GeneRatio | BgRatio | pvalue | p.adjust | geneID
hsa00512 | Mucin type O-glycan biosynthesis | 2/5 | 36/9180 | 0.000148427 | 0.00152442 | ST3GAL1/GALNT12
hsa00514 | Other types of O-glycan biosynthesis | 2/5 | 47/9180 | 0.00025407 | 0.00152442 | GXYLT2/GALNT12
hsa00533 | Glycosaminoglycan biosynthesis - keratan sulfate | 1/5 | 14/9180 | 0.007603702 | 0.019548115 | ST3GAL1
hsa00603 | Glycosphingolipid biosynthesis - globo and isoglobo series | 1/5 | 15/9180 | 0.008145048 | 0.019548115 | ST3GAL1
hsa00604 | Glycosphingolipid biosynthesis - ganglio series | 1/5 | 15/9180 | 0.008145048 | 0.019548115 | ST3GAL1

Table 2 lists the top 5 KEGG pathways in ascending order of p-value, together with the relevant information: ID, Description, GeneRatio, BgRatio, pvalue, p.adjust, and geneID.
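The relationship between GeneRatio, BgRatio, and the p-value in Table 2 can be checked with a short Python sketch. It assumes the p-values come from a one-sided hypergeometric (over-representation) test, as is usual for this kind of enrichment analysis; the function name is hypothetical, and only the ratios are taken from the table.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: background genes, K: background genes in the pathway,
    n: query genes, k: query genes found in the pathway.
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# hsa00512: GeneRatio 2/5, BgRatio 36/9180 (ratios from Table 2).
p_mucin = enrichment_pvalue(k=2, n=5, K=36, N=9180)
# hsa00514: GeneRatio 2/5, BgRatio 47/9180.
p_other = enrichment_pvalue(k=2, n=5, K=47, N=9180)
```

Under that assumption the two values come out at roughly 1.48e-4 and 2.54e-4, in line with the tabulated p-values. This illustrates why two hits among only five annotated query genes are already significant for pathways this small.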

Expression and correlation of the hub DEGs

The expression levels of the seven differentially expressed genes identified through intersection analysis were compared between the UC and healthy control groups and visualized using a volcano plot and a heatmap (Fig 3A). The expression of ADAMTS1, MUC1, ST3GAL1, and THBS2 was significantly upregulated in UC, while that of SEMA5A, GXYLT2, and GALNT12 was considerably downregulated (Fig 3B). Using the STRING database (https://string-db.org/), a PPI network of the proteins encoded by the 7 DEGs was constructed. This interaction network, comprising 7 nodes and 18 edges, was visualized using Cytoscape software (Fig 3C).

Fig 3. Expression and correlation of the hub DEGs.

(A) Volcano plot with the seven differential genes presented separately. (B) Expression analysis of the 7 differential genes in UC and healthy controls (drawn with the ggplot2 package). ns p>0.05, *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001. (C) Interaction network of the proteins encoded by the 7 DEGs. The network nodes represent proteins, while the lines indicate predicted relationships: light blue represents auxiliary database evidence, purple laboratory proof, yellow text mining evidence, green gene similarity, red gene fusion, blue gene co-production, black gene co-expression, and gray protein homology.

Machine learning screening for key differential genes

Three machine learning algorithms were then used to select the most important features and to further screen the hub genes with the greatest guiding value from the 7 DEGs. The LASSO logistic regression, RF, and SVM algorithms were applied successively, and four key genes were selected based on the combined results.

LASSO analysis identified 5 tag genes, namely ADAMTS1, MUC1, ST3GAL1, SEMA5A, and GXYLT2 (Fig 4A). In the RF analysis, 5 tag genes were selected in order of relative importance, namely SEMA5A, ADAMTS1, MUC1, GXYLT2, and THBS2 (Fig 4B). SVM identified 6 tag genes, namely ADAMTS1, SEMA5A, MUC1, GXYLT2, THBS2, and GALNT12 (Fig 4C). Four core genes, GXYLT2, MUC1, ADAMTS1, and SEMA5A, were finally obtained from the intersection of the three algorithms (Fig 4D). Subsequently, the correlations among the 4 core genes screened by machine learning were evaluated, with red representing positive and green indicating negative correlations (Fig 4E).
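The intersection step can be reproduced directly from the three gene lists reported above. The short Python sketch below uses plain set operations; the variable names are illustrative.

```python
# Tag genes reported for each algorithm (Fig 4A-4C).
lasso_genes = {"ADAMTS1", "MUC1", "ST3GAL1", "SEMA5A", "GXYLT2"}
rf_genes = {"SEMA5A", "ADAMTS1", "MUC1", "GXYLT2", "THBS2"}
svm_genes = {"ADAMTS1", "SEMA5A", "MUC1", "GXYLT2", "THBS2", "GALNT12"}

# Genes selected by all three algorithms.
core_genes = sorted(lasso_genes & rf_genes & svm_genes)
print(core_genes)  # ['ADAMTS1', 'GXYLT2', 'MUC1', 'SEMA5A']

# Candidates dropped because at least one algorithm did not select them.
dropped = sorted((lasso_genes | rf_genes | svm_genes) - set(core_genes))
print(dropped)  # ['GALNT12', 'ST3GAL1', 'THBS2']
```

Requiring agreement across all three methods is a conservative choice: ST3GAL1 (LASSO only), THBS2 (RF and SVM), and GALNT12 (SVM only) are excluded even though individual algorithms ranked them as informative.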

Fig 4. Machine learning screening for key differential genes.

(A) LASSO regression screening of 5 genes. (B) RF selected 5 genes in order of importance. (C) SVM screened 6 genes. (D) Intersection obtained 4 core genes. (E) The correlation between the 4 core genes, with red representing a positive correlation and green indicating a negative correlation. (F) ROC curve of 4 genes predicting disease occurrence.

A significant correlation between SEMA5A, GXYLT2, and MUC1 expression was observed, with SEMA5A showing a strong positive correlation with GXYLT2 expression and a strong negative correlation with MUC1 expression. ADAMTS1 was negatively correlated with the expression of GXYLT2 but exhibited no significant correlation with the expression of MUC1. In addition, there was a significant negative correlation between MUC1 and GXYLT2 expression. As indicated by the ROC curves, the four hub genes demonstrated a strong predictive power for UC (GXYLT2 AUC = 0.923, MUC1 AUC = 0.898, ADAMTS1 AUC = 0.955, and SEMA5A AUC = 0.944) (Fig 4F).

Evaluation of the degree of immune cell infiltration

The relationships among immune cells in UC were also investigated. The results demonstrated positive correlations among activated B cells, activated CD4 and CD8 T cells, natural killer cells, and other immune cells. Type 17 helper cells, activated B cells, and activated CD8 T cells showed negative correlations, while most other immune cells were positively correlated with each other (Fig 5A). CIBERSORT was employed to further demonstrate the difference in immune cell infiltration between the UC and healthy control groups. The results showed that the levels of activated B cells, activated CD4 and CD8 T cells, natural killer cells, and other immune cells in UC patients were significantly higher than those in the healthy control group. In contrast, no significant difference in the levels of Type 17 helper cells and CD56dim natural killer cells was identified between the UC and healthy control groups (Fig 5B).

Fig 5. Evaluation of the degree of immune cell infiltration.

(A) Correlation analysis between immune cells. (B) Differences in immune cell infiltration between UC and healthy control group, ns p>0.05, *p<0.05, **p<0.01, ***p<0.001. (C) Correlation analysis between 4 core genes and immune cells.

Furthermore, the connection between the 4 core genes and immune cells was examined. The findings revealed a significant correlation between the expression of these core genes and activated CD4 T cells, natural killer cells, Type 17 helper cells, and CD56dim natural killer cells. Among them, ADAMTS1 was significantly positively correlated with natural killer T cells and activated CD4 T cells, while being significantly negatively correlated with Type 17 helper cells and CD56dim natural killer cells. Meanwhile, GXYLT2 and SEMA5A were significantly negatively correlated with activated dendritic cells, activated CD4 T cells, natural killer cells, Type 17 helper cells, and CD56dim natural killer cells. Moreover, there was a significant positive correlation between the expression of MUC1 and activated dendritic cells, natural killer cells, activated CD4 T cells, and Type 17 helper cells (Fig 5C).

Single gene enrichment analysis

Given the significant role of the four hub genes, genes whose expression correlated with that of ADAMTS1, GXYLT2, MUC1, and SEMA5A were analyzed. The heatmaps show the top 50 genes positively co-expressed with each of the four core genes (S2A–S2D Fig).

Single-gene GSEA was performed to characterize the potential functions of the four hub genes. The ridgeline plots display only the top 20 results (Fig 6). The values shown are enrichment scores: a value above 0 indicates a positive correlation between a gene and a pathway, while a value below 0 indicates a negative correlation.

Fig 6. Single gene enrichment analysis.

(A) GSEA analysis for ADAMTS1. (B) GSEA analysis for GXYLT2. (C) GSEA analysis for MUC1. (D) GSEA analysis for SEMA5A.

Almost all pathways identified were related to immunity and inflammation, including antigen processing-cross-presentation, signaling by interleukins, integrin cell surface interactions, and interferon signaling. Meanwhile, MUC1 showed a negative correlation with both asparagine N-linked glycosylation and O-linked glycosylation of mucins. Additionally, GXYLT2 and SEMA5A exhibited a negative correlation with collagen formation, whereas ADAMTS1 displayed a positive correlation with the same process (Fig 6A–6D).

Unsupervised consensus clustering analysis of gene expression profiles revealed two subtypes of UC

An unsupervised consensus clustering analysis was conducted based on the four hub genes, with all UC samples initially divided into k (k = 2–9) clusters. The cumulative distribution function (CDF) curves of the consensus score matrix indicated that the optimal number of clusters was k = 2. Consequently, two distinct subtypes of UC were identified (Fig 7A), involving 106 samples in subtype A and 78 in subtype B. The four genes exhibited remarkable differences in expression between the two subtypes (p<0.05). The expression of all other genes in subtype A was higher than that in subtype B, except for MUC1 (Fig 7B). Furthermore, a heatmap was drawn with the R package “pheatmap” to display more intuitively the expression differences of the four genes across the 184 samples from the two subtypes. GXYLT2, ADAMTS1, and SEMA5A were significantly upregulated in subtype B, while MUC1 was upregulated considerably in subtype A, further validating the presence of diverse subtypes in UC (Fig 7C).

Fig 7. Identification and validation of ulcerative colitis subtypes.

(A) Heatmap of sample clustering at consensus k = 2. (B) The expression status of four hub genes in the two subtypes, ***p<0.001. (C) Heatmap of four hub genes in the two subtypes.

GSVA of biological pathways between two subtypes

GSVA enrichment was performed to explore the differences in biological behavior and pathways between the two clusters. The analysis showed that the two subtypes varied significantly in the metabolism of various substances. The pathways were ranked in ascending order of P value, and the top 20 were displayed in a heatmap for further analysis.

The results of the KEGG analysis showed that subtype A was enriched in pathways of base excision repair and substance metabolism, including galactose, fructose, mannose, amino sugar, nucleotide sugar, and glycerolipid metabolism. In contrast, subtype B was frequently involved in cancer-related pathways, such as non-small cell lung cancer, colorectal cancer, chronic myeloid leukemia, and endometrial cancer, among others. It could therefore be speculated that subtype B of UC may be more likely to develop into ulcerative colitis-associated colorectal cancer (CAC) (Fig 8A).

Fig 8. The diversity of the underlying biological function characteristics between the two subtypes.

(A) The differences in KEGG pathway enrichment score between subtypes A and B. (B) The differences in Reactome pathway enrichment score between subtypes A and B.

Furthermore, the Reactome analysis indicated that subtype A was enriched in nucleotide catabolism and purine catabolism pathways. In contrast, subtype B was enriched in pathways including ESR-mediated signaling, signaling by nuclear receptors, IGF1R signaling, RUNX2 regulates osteoblast differentiation, RUNX2 regulates bone development, and glutamate and glutamine metabolism (Fig 8B).

Differential genes and enrichment analysis of the two subtypes

The principal component analysis (PCA) demonstrated that UC patients were well distributed into two clusters (Fig 9A). The PCA offered a holistic and clear visual representation, mapping all samples and highlighting the separation between groups. The substantial distance between subtypes A and B indicated pronounced distinctions between them.

Fig 9. Differential genes and enrichment analysis of the two subtypes.

(A) PCA demonstrating a distinct separation between the two clusters. (B) Volcano plot of the 229 DEGs. The thresholds for the volcano plot were |logFC| > 0.5 and adj.P.Val < 0.05. (C) GO enrichment analysis showing the BP, CC, and MF categories. (D) Bubble plot depicting the KEGG pathway enrichment analysis of DEGs. (E) The correspondence between the top five KEGG pathways and genes.

Differential expression analysis between the two subtypes yielded 229 DEGs, of which 105 were markedly upregulated and 124 significantly downregulated (Fig 9B). GO and KEGG analyses of the DEGs were then performed to further interpret the clustering results from the perspective of fundamental biological processes. The top ten results of the GO enrichment analyses are shown for BP, CC, and MF (Fig 9C). For BP, the DEGs were enriched in the regulation of peptidase activity and response to peptide hormone. For CC, the DEGs were primarily associated with the collagen-containing extracellular matrix, the apical part of the cell, and the apical plasma membrane. For MF, extracellular matrix structural constituent, receptor ligand activity, and signaling receptor activator activity were mainly enriched. Additionally, KEGG analysis showed that the DEGs were primarily involved in inflammation, immunity, and infectious diseases (Fig 9D).

According to KEGG enrichment analysis, the top 5 significant pathways of DEGs and related genes were identified, including Cytokine-cytokine receptor interaction, IL-17 signaling pathway, Viral protein interaction with cytokine and cytokine receptor, Amoebiasis and Pertussis (Fig 9E) (Table 3).

Table 3. The KEGG enrichment analysis table of the DEGs of the two subtypes.

ID | Description | GeneRatio | BgRatio | pvalue | p.adjust | geneID
hsa04657 | IL-17 signaling pathway | 9/110 | 94/9180 | 1.67E-06 | 0.000330056 | LCN2/MUC5B/S100A7/CCL2/CCL20/PTGS2/FOS/IL6/CXCL1
hsa04061 | Viral protein interaction with cytokine and cytokine receptor | 8/110 | 100/9180 | 2.45E-05 | 0.001604571 | CXCL12/IL22RA1/CCL28/CCL2/CXCR4/CCL20/IL6/CXCL1
hsa05146 | Amoebiasis | 8/110 | 102/9180 | 2.84E-05 | 0.001604571 | NOS2/FN1/IL1R2/LAMC2/SERPINB4/IL6/CXCL1/SERPINB3
hsa05133 | Pertussis | 7/110 | 76/9180 | 3.24E-05 | 0.001604571 | C4BPA/NOS2/CASP1/C4BPB/C1S/FOS/IL6
hsa04060 | Cytokine-cytokine receptor interaction | 13/110 | 295/9180 | 4.84E-05 | 0.00183806 | CXCL12/LIFR/IL1R2/CXCL17/IL22RA1/CCL28/GHR/CCL2/CXCR4/CCL20/TNFRSF11B/IL6/CXCL1

Table 3 lists the top 5 KEGG pathways in ascending order of p-value, together with the relevant information: ID, Description, GeneRatio, BgRatio, pvalue, p.adjust, and geneID.

Prediction of miRNAs and transcription factors

To determine the upstream TFs and miRNAs of the hub genes, 56 TFs and 49 miRNAs were obtained from the RegNetwork repository (https://regnetworkweb.org/), and a network presenting their co-regulatory patterns was constructed using Cytoscape (Fig 10). MUC1, at the core of the network, was regulated by 33 TFs and 27 miRNAs. SEMA5A and MUC1 shared the TFs SP1 and MEF2A and the miRNA hsa-miR-519e. MUC1 and ADAMTS1 had four common TFs: TFAP2A, STAT1, STAT3, and CTCF. GXYLT2 had only one upstream miRNA, hsa-miR-37, and two TFs, HNF4A and NR2F1.

Fig 10. TF–miRNA co-regulatory network analysis, with red nodes representing hub genes, and blue nodes indicating TFs and miRNAs.

Discussion

Glycosylation is the process of attaching various sugars to proteins through glycosidic bonds, representing the most prevalent post-translational modification across all cellular organisms. Glycosylation enhances the stability of proteins and primarily involves N-glycans and O-glycans, with enzymes overseeing the entire process [10]. O-GlcNAc transferase (OGT) adds O-linked β-N-acetylglucosamine (O-GlcNAc) monosaccharides to the serine or threonine residues of nuclear or cytoplasmic proteins [35]. O-GlcNAcase (OGA) can then remove the monosaccharide, making the modification reversible [36]. Most protein glycosylation occurs in the endoplasmic reticulum (ER) and Golgi. O-glycosylation, in particular, regulates immune cells’ development, homeostasis, and functions [37, 38].

As one type of IBD, UC is characterized by an abnormal immune response to the gut microbiota. The prevalence of UC is rising year by year [39]. Mucosal lesions usually originate in the rectum and may spread to the entire colon as the disease progresses [3]. Extra-intestinal manifestations, including anemia, arthropathy, metabolic bone disease, and hepatobiliary disease, also affect quality of life and can even cause disability [40]. Immune cells play an essential role in the occurrence and development of UC. Antigen-presenting cells (APCs), such as macrophages and dendritic cells (DCs), can recognize antigens and initiate the immune response by releasing cytokines like IL-12. IL-12 is instrumental in driving the differentiation of Th1 cells, which in turn secrete the pro-inflammatory cytokines TNF-α, IFN-γ, and IL-2 [41, 42].

The gut microbiota influences intestinal physiology, and bacterial OGAs, which hydrolyze O-GlcNAcylated proteins, have shown potential as a therapeutic strategy in colonic inflammation [43]. Moreover, Qian-Hui Sun et al. identified increased O-GlcNAc levels in the gut epithelium of AIEC LF82-infected mice and CD patients, linking the change to intestinal inflammation [17]. In addition, O-GlcNAcylation of colonic tissues was elevated in dextran sodium sulfate (DSS)-induced colitis and azoxymethane (AOM)/DSS-induced CAC mouse models. O-GlcNAcylation was also increased in human CAC tissues compared with normal colonic tissues [44]. Several studies have thus implicated O-GlcNAcylation as a contributing factor in chronic colonic inflammation.

However, the relationship between O-GlcNAcylation and UC has not been well studied, making it necessary to explore the specific molecular mechanism of O-GlcNAcylation in UC. Herein, efforts were made to determine the possible role of O-glycosylation in UC through bioinformatic analysis. Specifically, the GSE75214 and GSE92415 datasets downloaded from GEO were analyzed to identify DEGs in UC patients. The GO and KEGG analyses revealed that the DEGs were enriched in immunity, inflammation, and cytokine signaling pathways.

To explore the relationship between O-GlcNAcylation and UC, the DEGs in UC were intersected with 151 O-GlcNAcylation-related genes, yielding a total of seven overlapping DEGs. GO and KEGG analyses were then conducted for these seven DEGs. Through the LASSO, SVM-RFE, and RF algorithms, four O-glycosylation-related hub DEGs in UC were screened: MUC1, ADAMTS1, GXYLT2, and SEMA5A. The AUC values of the four hub genes were high, suggesting that they are potential target genes for treating UC through O-glycosylation. This offered a new direction for exploring the role of O-glycosylation in UC. Given the importance of immunity in UC, an immune infiltration analysis was further performed. The results revealed a significant difference between the normal group and UC patients. UC patients had higher levels of DCs, Th1 cells, Tregs, B cells, CD4+ T cells, CD8+ T cells, macrophages, and neutrophils than their normal counterparts. These results were highly consistent with those of previous studies, underscoring the importance of immune cells in the pathogenesis of UC.
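
The screening logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' R code: it assumes that the hub genes are those retained by all three methods (the text does not spell this out), the per-method selections and the `CANDIDATE_*` names are hypothetical placeholders, and the expression values are invented to show how an AUC is obtained from ranks.

```python
def rank_auc(case_values, control_values):
    """AUC as the probability that a random case exceeds a random control.

    Ties count as 0.5, which matches the Mann-Whitney formulation of the AUC.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical selections from the seven intersecting DEGs. CANDIDATE_5 to
# CANDIDATE_7 are placeholders for genes that were not retained.
lasso = {"MUC1", "ADAMTS1", "GXYLT2", "SEMA5A", "CANDIDATE_5"}
svm_rfe = {"MUC1", "ADAMTS1", "GXYLT2", "SEMA5A", "CANDIDATE_6"}
random_forest = {"MUC1", "ADAMTS1", "GXYLT2", "SEMA5A", "CANDIDATE_5", "CANDIDATE_7"}

# Assumption: a hub gene must be selected by every method.
hub_genes = sorted(lasso & svm_rfe & random_forest)

# Invented expression values for one gene in UC and control samples.
uc_expression = [8.1, 7.9, 8.4, 7.2, 8.8]
control_expression = [6.9, 7.0, 7.3, 6.5]
auc = rank_auc(uc_expression, control_expression)

print(hub_genes)
print(round(auc, 2))
```

An AUC close to 1 means that the gene's expression separates UC samples from controls almost perfectly, whereas 0.5 corresponds to no discrimination.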

MUC1, a member of the mucin family and a membrane-bound protein, is produced by goblet and absorptive cells of the intestinal epithelium and plays a role in the mucus layer [45]. It is highly expressed in the epithelial mucosa of the gastrointestinal tract. Mucins are O-glycosylated proteins that can form protective mucous barriers [46]. The membrane shift and overexpression of MUC1 affect the prognosis of related malignant tumors, like colon cancer [47, 48]. Besides, MUC1 also holds considerable significance in intracellular signaling and immune regulation, especially in colonic inflammation [49–51]. Increased MUC1 expression is often associated with a decrease in the beneficial gut microbiota [52]. The breakdown of the mucus barrier and dysregulation of intestinal microflora can exacerbate the incidence and progression of UC. Yet, the precise mechanism of MUC1’s involvement in UC requires additional research.

ADAMTS1, a disintegrin-like metalloprotease with a thrombospondin type 1 motif, is a protein-coding gene whose related pathways include diseases associated with O-glycosylation. It plays a vital role in inflammatory processes and the development of cancer [53, 54] and acts as an angiogenesis inhibitor [55]. Compared with the control group, the ADAMTS1 level in UC patients was found to be higher and to correlate with IL-17. IL-17 may damage the intestinal wall by promoting the expression of ADAMTS1 [56]. However, the mechanisms by which ADAMTS1 operates in UC are not fully understood and warrant additional research.

GXYLT2 (glucoside xylosyltransferase 2) encodes a xylosyltransferase, which catalyzes the addition of xylose to the O-glucose-modified residues of the EGF repeats of Notch proteins [57]. Compared to quiescent UC, the expression of GXYLT2 in active UC is elevated, which facilitates the assessment of disease activity [58]. Meanwhile, Barnicle et al. found that the methylation of GXYLT2 differed between inflamed tissues and their non-inflamed counterparts in UC patients, and that the Wnt signaling pathway was involved [59]. The dysregulation of the Wnt and Notch signaling pathways, which are associated with the proliferation and differentiation of intestinal stem cells (ISCs), induces cell overgrowth and malignant transformation. In UC, the inhibition of Wnt and overexpression of Notch reduce the number of Paneth cells, thereby leading to intestinal barrier damage [60].

The last of the four hub genes is SEMA5A, on which only a limited number of studies are currently available. Using differentially expressed lncRNAs to predict target genes, Benhai Xiong et al. investigated the effects of extracellular vesicles (EVs) on the expression of the Sema5a gene in DSS-treated mice. The results showed a considerable down-regulation of Sema5a expression [61]. Besides, the axon guidance cue Sema5a may induce the expression of pro-inflammatory genes (TNF-α and IL-8) [62].

Currently, UC classification is primarily based on the severity of the disease, categorized as mild, moderate, and severe. Classification at the genetic level, however, remains understudied. Therefore, UC was further grouped into two subtypes using unsupervised consensus clustering, based on the expression of the four hub genes identified by the machine learning methods. The expression of the four hub genes varied significantly between the two subtypes. The expression trends of the four hub genes in subtype B were consistent with those reported in previous research. Furthermore, GSVA enrichment analysis showed that subtype A was enriched in the metabolism of various substances, such as glucose, lipid, and amino acid metabolism. In contrast, subtype B was significantly enriched in cancer-related pathways, such as colorectal cancer. This finding suggested that subtype B of UC could potentially progress to CAC. Moreover, GO and KEGG enrichment analyses of the DEGs between subtypes A and B were conducted to further identify the differences between the two subtypes. The results, which were enriched in cytokine-related signaling pathways and immune-related diseases, underscore the significance of distinguishing between the two subtypes and call for further research.
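
To make the idea of consensus clustering concrete, the following Python sketch builds a consensus matrix: samples are clustered repeatedly on subsets of the data, and each entry records how often two samples fall into the same cluster when both are present. It is a simplified illustration, not the procedure used in the study. The expression values are invented, the clustering step is a plain 2-means with farthest-point initialization, every 4-sample subset is enumerated instead of drawing random subsamples, and the final subtypes are obtained by thresholding the consensus with the first sample rather than by hierarchical clustering.

```python
from itertools import combinations


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def two_means(points, iterations=10):
    """2-means with farthest-point initialisation; returns a 0/1 label per point."""
    first = points[0]
    second = max(points, key=lambda p: squared_distance(p, first))
    centers = [list(first), list(second)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            0 if squared_distance(p, centers[0]) <= squared_distance(p, centers[1]) else 1
            for p in points
        ]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, subsample_size):
    """Fraction of subsets in which two samples share a cluster, given both were drawn."""
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for subset in combinations(range(n), subsample_size):
        labels = two_means([points[i] for i in subset])
        for a, i in enumerate(subset):
            for b, j in enumerate(subset):
                sampled[i][j] += 1
                together[i][j] += int(labels[a] == labels[b])
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Invented expression values; columns stand for MUC1, ADAMTS1, GXYLT2, SEMA5A.
expression = [
    [2.0, 1.0, 1.1, 0.9],
    [2.2, 0.8, 1.0, 1.1],
    [1.9, 1.2, 0.9, 1.0],
    [0.8, 2.1, 2.0, 2.2],
    [1.0, 1.9, 2.3, 2.0],
    [0.9, 2.2, 2.1, 1.9],
]

consensus = consensus_matrix(expression, subsample_size=4)
# Simplification: samples that usually co-cluster with sample 0 form one subtype.
subtypes = ["A" if consensus[0][j] > 0.5 else "B" for j in range(len(expression))]
print(subtypes)
```

The labels "A" and "B" in the sketch are arbitrary and are not meant to correspond to the subtypes A and B reported in the study.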

In conclusion, two subtypes of UC were identified, each with distinct molecular features, biological behavior, and clinical characteristics. Overall, the classification provides a basis for further studies on the therapy and prognosis of UC. However, this study is still subject to several limitations. Firstly, a comparative analysis of survival curves for the two UC subtypes was not conducted. Secondly, the study relied solely on bioinformatic methods, and its findings have yet to be experimentally validated. Thirdly, the sampling method did not exclude the effects of age, gender, disease severity, complications, and therapeutic approaches. Further efforts could be made to compare the hub gene expression levels between DSS-treated mice and control mice, and between UC patients and healthy individuals, and to explore the prognosis of different subtypes of UC and their effects on CAC.

In summary, further in-depth research is warranted to clarify the mechanisms of O-GlcNAcylation, the expression of the hub genes, and the clinical significance of the two subtypes in UC.

Supporting information

S1 Fig. Merging and normalization of the two datasets.

(A-B) GSE75214 and GSE92415 were combined, and the R packages "limma" and "sva" were used to remove batch effects, resulting in 16,467 genes and 216 samples. A: before merging; B: after merging. (C-D) The dataset was homogenized with the R package preprocessCore. C: before homogenization; D: after homogenization.

(TIF)

pone.0311495.s001.tif (4.5MB, tif)
S2 Fig. The correlation analysis between the four hub genes and all other genes.

The positive correlation between the top 50 genes and 4 hub genes was displayed using heatmaps.

(TIF)

pone.0311495.s002.tif (7.1MB, tif)
S1 File. DEGs of GSE75214 and GSE92415.

The genes from both the UC patients and healthy individuals in the GSE75214 dataset were merged with those from GSE92415 to form a comprehensive dataset. The batch effect was then removed to minimize discrepancies between the datasets. A total of 16,467 genes remained after merging.

(CSV)

pone.0311495.s003.csv (1.8MB, csv)
S2 File. R code.

The R code used for the bioinformatic analyses in this study.

(DOCX)

pone.0311495.s004.docx (44.6KB, docx)
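
For readers without access to the R code, the homogenization step described for S1 Fig can be illustrated with a short Python sketch. It assumes that the homogenization corresponds to quantile normalization, the operation for which the preprocessCore package is best known; the supporting information does not name the exact function, so this is an assumption. The matrix values are invented and `quantile_normalize` is a hypothetical helper, not a port of the authors' script.

```python
def quantile_normalize(matrix):
    """Quantile-normalise the columns (samples) of a genes x samples matrix.

    Every sample ends up with the same distribution: the k-th smallest value in
    each column is replaced by the mean of the k-th smallest values across all
    columns. Ties are broken by row order, a simplification of this sketch.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_columns = [sorted(col) for col in columns]
    rank_means = [
        sum(col[k] for col in sorted_columns) / n_samples for k in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            normalized[g][s] = rank_means[rank]
    return normalized


# Invented expression matrix: 4 genes (rows) x 3 samples (columns).
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(raw)
for row in normalized:
    print([round(value, 2) for value in row])
```

After normalization, the sorted values of every sample are identical, which is why the box plots of all samples line up once the data have been homogenized.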

Acknowledgments

The authors would like to thank Shuyuan Zhang of the First Affiliated Hospital of Harbin Medical University for helpful discussions on topics related to this work. In addition, this work could not have been completed without the support of Dr. Yonghui Wu, who provided stimulating discussions and creative ideas.

Data Availability

The data underlying the results presented in the study are available from the GEO database (www.ncbi.nlm.nih.gov/geo/) at the following accession numbers: Accession Number GSE75214 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75214 Accession Number GSE92415 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92415

Funding Statement

This study was funded by the Natural Science Foundation of Heilongjiang Province, LH2020H037, Hongyu Xu.

References

  • 1.Zhang S. Z., Zhao X. H. & Zhang D. C. Cellular and molecular immunopathogenesis of ulcerative colitis. Cell Mol Immunol 3, 35–40 (2006).
  • 2.Lavelle A, Sokol H. Gut microbiota-derived metabolites as key actors in inflammatory bowel disease. Nat Rev Gastroenterol Hepatol. 2020; 17:223–37. doi: 10.1038/s41575-019-0258-z
  • 3.Le Berre C, Honap S, Peyrin-Biroulet L. Ulcerative colitis. Lancet. 2023; 402:571–84. doi: 10.1016/S0140-6736(23)00966-2
  • 4.Na YR, Stakenborg M, Seok SH, Matteoli G. Macrophages in intestinal inflammation and resolution: a potential therapeutic target in IBD. Nat Rev Gastroenterol Hepatol. 2019; 16:531–43. doi: 10.1038/s41575-019-0172-4
  • 5.Mudter J, Neurath MF. Il-6 signaling in inflammatory bowel disease: pathophysiological role and clinical relevance. Inflamm Bowel Dis. 2007; 13:1016–23. doi: 10.1002/ibd.20148
  • 6.Mitsialis V, Wall S, Liu P, Ordovas-Montanes J, Parmet T, Vukovic M, et al. Single-Cell Analyses of Colon and Blood Reveal Distinct Immune Cell Signatures of Ulcerative Colitis and Crohn’s Disease. Gastroenterology. 2020; 159:591–608.e10. doi: 10.1053/j.gastro.2020.04.074
  • 7.Brazil JC, Parkos CA. Finding the sweet spot: glycosylation mediated regulation of intestinal inflammation. Mucosal Immunol. 2022; 15:211–22. doi: 10.1038/s41385-021-00466-8
  • 8.Biermann MH, Griffante G, Podolska MJ, Boeltz S, Stürmer J, Muñoz LE, et al. Sweet but dangerous ‐ the role of immunoglobulin G glycosylation in autoimmunity and inflammation. Lupus. 2016; 25:934–42. doi: 10.1177/0961203316640368
  • 9.Theodoratou E, Campbell H, Ventham NT, Kolarich D, Pučić-Baković M, Zoldoš V, et al. The role of glycosylation in IBD. Nat Rev Gastroenterol Hepatol. 2014; 11:588–600. doi: 10.1038/nrgastro.2014.78
  • 10.Eichler J. Protein glycosylation. Curr Biol. 2019. 29(7): R229–R231. doi: 10.1016/j.cub.2019.01.003
  • 11.Torres CR, Hart GW. Topography and polypeptide distribution of terminal N-acetylglucosamine residues on the surfaces of intact lymphocytes. Evidence for O-linked GlcNAc. J Biol Chem. 1984; 259:3308–17.
  • 12.Magalhães A, Duarte HO, Reis CA. The role of O-glycosylation in human disease. Mol Aspects Med. 2021. 79: 100964. doi: 10.1016/j.mam.2021.100964
  • 13.Kearse KP, Hart GW. Lymphocyte activation induces rapid changes in nuclear and cytoplasmic glycoproteins. Proc Natl Acad Sci U S A. 1991; 88:1701–5. doi: 10.1073/pnas.88.5.1701
  • 14.Li T, Li X, Attri KS, Liu C, Li L, Herring LE, et al. O-GlcNAc Transferase Links Glucose Metabolism to MAVS-Mediated Antiviral Innate Immunity. Cell Host Microbe. 2018; 24:791–803.e6. doi: 10.1016/j.chom.2018.11.001
  • 15.Li X, Gong W, Wang H, Li T, Attri KS, Lewis RE, et al. O-GlcNAc Transferase Suppresses Inflammation and Necroptosis by Targeting Receptor-Interacting Serine/Threonine-Protein Kinase 3. Immunity. 2019; 50:576–90.e6. doi: 10.1016/j.immuni.2019.01.007
  • 16.Lund PJ, Elias JE, Davis MM. Global Analysis of O-GlcNAc Glycoproteins in Activated Human T Cells. J Immunol. 2016; 197:3086–98. doi: 10.4049/jimmunol.1502031
  • 17.Sun QH, Wang YS, Liu G, Zhou HL, Jian YP, Liu MD, et al. Enhanced O-linked Glcnacylation in Crohn’s disease promotes intestinal inflammation. EBioMedicine. 2020; 53:102693. doi: 10.1016/j.ebiom.2020.102693
  • 18.Li X, Zhang Z, Li L, Gong W, Lazenby AJ, Swanson BJ, et al. Myeloid-derived cullin 3 promotes STAT3 phosphorylation by inhibiting OGT expression and protects against intestinal inflammation. J Exp Med. 2017; 214:1093–109. doi: 10.1084/jem.20161105
  • 19.Liu F, Iqbal K, Grundke-Iqbal I, Hart GW, Gong CX. O-GlcNAcylation regulates phosphorylation of tau: a mechanism involved in Alzheimer’s disease. Proc Natl Acad Sci U S A. 2004; 101:10804–9. doi: 10.1073/pnas.0400348101
  • 20.Wei J, Chen C, Feng J, Zhou S, Feng X, Yang Z, et al. Muc2 mucin O-glycosylation interacts with enteropathogenic Escherichia coli to influence the development of ulcerative colitis based on the NF-kB signaling pathway. J Transl Med. 2023; 21:793. doi: 10.1186/s12967-023-04687-2
  • 21.Fu J, Wei B, Wen T, Johansson ME, Liu X, Bradford E, et al. Loss of intestinal core 1-derived O-glycans causes spontaneous colitis in mice. J Clin Invest. 2011; 121:1657–66. doi: 10.1172/JCI45538
  • 22.Kudelka MR, Stowell SR, Cummings RD, Neish AS. Intestinal epithelial glycosylation in homeostasis and gut microbiota interactions in IBD. Nat Rev Gastroenterol Hepatol. 2020. 17(10): 597–617. doi: 10.1038/s41575-020-0331-7
  • 23.Johansson ME, Larsson JM, Hansson GC. The two mucus layers of colon are organized by the MUC2 mucin, whereas the outer layer is a legislator of host-microbial interactions. Proc Natl Acad Sci U S A. 2011. 108 Suppl 1(Suppl 1): 4659–65. doi: 10.1073/pnas.1006451107
  • 24.Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015; 43:e47. doi: 10.1093/nar/gkv007
  • 25.Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012; 28:882–3. doi: 10.1093/bioinformatics/bts034
  • 26.Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012; 16:284–7. doi: 10.1089/omi.2011.0118
  • 27.Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 2017; 45:D362–D368. doi: 10.1093/nar/gkw937
  • 28.Ellis K, Kerr J, Godbole S, Lanckriet G, Wing D, Marshall S. A random forest classifier for the prediction of energy expenditure and type of physical activity from wrist and hip accelerometers. Physiol Meas. 2014; 35:2191–203. doi: 10.1088/0967-3334/35/11/2191
  • 29.Valkenborg D, Rousseau AJ, Geubbelmans M, Burzykowski T. Support vector machines. Am J Orthod Dentofacial Orthop. 2023; 164:754–7. doi: 10.1016/j.ajodo.2023.08.003
  • 30.Obuchowski NA, Bullen JA. Receiver operating characteristic (ROC) curves: review of methods with applications in diagnostic medicine. Phys Med Biol. 2018; 63:07TR01. doi: 10.1088/1361-6560/aab4b1
  • 31.Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011; 12:77. doi: 10.1186/1471-2105-12-77
  • 32.Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005; 102:15545–50. doi: 10.1073/pnas.0506580102
  • 33.Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics. 2013; 14:7. doi: 10.1186/1471-2105-14-7
  • 34.Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13:2498–504. doi: 10.1101/gr.1239303
  • 35.Hart GW, Slawson C, Ramirez-Correa G, Lagerlof O. Cross talk between O-GlcNAcylation and phosphorylation: roles in signaling, transcription, and chronic disease. Annu Rev Biochem. 2011; 80:825–58. doi: 10.1146/annurev-biochem-060608-102511
  • 36.Wani WY, Chatham JC, Darley-Usmar V, McMahon LL, Zhang J. O-GlcNAcylation and neurodegeneration. Brain Res Bull. 2017; 133:80–7. doi: 10.1016/j.brainresbull.2016.08.002
  • 37.Chang YH, Weng CL, Lin KI. O-GlcNAcylation and its role in the immune system. J Biomed Sci. 2020; 27:57. doi: 10.1186/s12929-020-00648-9
  • 38.Qiang A, Slawson C, Fields PE. The Role of O-GlcNAcylation in Immune Cell Activation. Front Endocrinol (Lausanne). 2021; 12:596617. doi: 10.3389/fendo.2021.596617
  • 39.Du L, Ha C. Epidemiology and Pathogenesis of Ulcerative Colitis. Gastroenterol Clin North Am. 2020; 49:643–54. doi: 10.1016/j.gtc.2020.07.005
  • 40.Magro F, Gionchetti P, Eliakim R, Ardizzone S, Armuzzi A, Barreiro-de Acosta M, et al. Third European Evidence-based Consensus on Diagnosis and Management of Ulcerative Colitis. Part 1: Definitions, Diagnosis, Extra-intestinal Manifestations, Pregnancy, Cancer Surveillance, Surgery, and Ileo-anal Pouch Disorders. J Crohns Colitis. 2017; 11:649–70. doi: 10.1093/ecco-jcc/jjx008
  • 41.Geremia A, Biancheri P, Allan P, Corazza GR, Di Sabatino A. Innate and adaptive immunity in inflammatory bowel disease. Autoimmun Rev. 2014; 13:3–10. doi: 10.1016/j.autrev.2013.06.004
  • 42.Saez A, Gomez-Bris R, Herrero-Fernandez B, Mingorance C, Rius C, Gonzalez-Granado JM. Innate Lymphoid Cells in Intestinal Homeostasis and Inflammatory Bowel Disease. Int J Mol Sci. 2021; 22. doi: 10.3390/ijms22147618
  • 43.He X, Gao J, Peng L, Hu T, Wan Y, Zhou M, et al. Bacterial O-GlcNAcase genes abundance decreases in ulcerative colitis patients and its administration ameliorates colitis in mice. Gut. 2021; 70:1872–83. doi: 10.1136/gutjnl-2020-322468
  • 44.Yang YR, Kim DH, Seo YK, Park D, Jang HJ, Choi SY, et al. Elevated O-GlcNAcylation promotes colonic inflammation and tumorigenesis by modulating NF-κB signaling. Oncotarget. 2015; 6:12529–42. doi: 10.18632/oncotarget.3725
  • 45.Vancamelbeke M, Vanuytsel T, Farré R, Verstockt S, Ferrante M, Van Assche G, et al. Genetic and Transcriptomic Bases of Intestinal Epithelial Barrier Dysfunction in Inflammatory Bowel Disease. Inflamm Bowel Dis. 2017; 23:1718–29. doi: 10.1097/MIB.0000000000001246
  • 46.Hattrup CL, Gendler SJ. Structure and function of the cell surface (tethered) mucins. Annu Rev Physiol. 2008; 70:431–57. doi: 10.1146/annurev.physiol.70.113006.100659
  • 47.Chen W, Zhang Z, Zhang S, Zhu P, Ko JK, Yung KK. MUC1: Structure, Function, and Clinic Application in Epithelial Cancers. Int J Mol Sci. 2021; 22. doi: 10.3390/ijms22126567
  • 48.Sun Y, Fan L, Mian W, Zhang F, Liu X, Tang Y, et al. Modified apple polysaccharide influences MUC-1 expression to prevent ICR mice from colitis-associated carcinogenesis. Int J Biol Macromol. 2018; 120:1387–95. doi: 10.1016/j.ijbiomac.2018.09.142
  • 49.Murwanti R, Denda-Nagai K, Sugiura D, Mogushi K, Gendler SJ, Irimura T. Prevention of Inflammation-Driven Colon Carcinogenesis in Human MUC1 Transgenic Mice by Vaccination with MUC1 DNA and Dendritic Cells. Cancers (Basel). 2023; 15. doi: 10.3390/cancers15061920
  • 50.Long L, Huang X, Yu S, Fan J, Li X, Xu R, et al. The research status and prospects of MUC1 in immunology. Hum Vaccin Immunother. 2023; 19:2172278. doi: 10.1080/21645515.2023.2172278
  • 51.Nishida A, Lau CW, Zhang M, Andoh A, Shi HN, Mizoguchi E, et al. The membrane-bound mucin Muc1 regulates T helper 17-cell responses and colitis in mice. Gastroenterology. 2012; 142:865–74.e2. doi: 10.1053/j.gastro.2011.12.036
  • 52.Xu S, Li X, Zhang S, Qi C, Zhang Z, Ma R, et al. Oxidative stress gene expression, DNA methylation, and gut microbiota interaction trigger Crohn’s disease: a multi-omics Mendelian randomization study. BMC Med. 2023; 21:179. doi: 10.1186/s12916-023-02878-8
  • 53.Kuno K, Kanada N, Nakashima E, Fujiki F, Ichimura F, Matsushima K. Molecular cloning of a gene encoding a new type of metalloproteinase-disintegrin family protein with thrombospondin motifs as an inflammation associated gene. J Biol Chem. 1997; 272:556–62. doi: 10.1074/jbc.272.1.556
  • 54.Tan Ide A, Ricciardelli C, Russell DL. The metalloproteinase ADAMTS1: a comprehensive review of its role in tumorigenic and metastatic pathways. Int J Cancer. 2013; 133:2263–76. doi: 10.1002/ijc.28127
  • 55.Schrimpf C, Xin C, Campanholle G, Gill SE, Stallcup W, Lin SL, et al. Pericyte TIMP3 and ADAMTS1 modulate vascular stability after kidney injury. J Am Soc Nephrol. 2012; 23:868–83. doi: 10.1681/ASN.2011080851
  • 56.Buran T, Batır MB, Çam FS, Kasap E, Çöllü F, Çelebi H, et al. Molecular analyses of ADAMTS-1, -4, -5, and IL-17 a cytokine relationship in patients with ulcerative colitis. BMC Gastroenterol. 2023; 23:345. doi: 10.1186/s12876-023-02985-z
  • 57.Sethi MK, Buettner FF, Krylov VB, Takeuchi H, Nifantiev NE, Haltiwanger RS, et al. Identification of glycosyltransferase 8 family members as xylosyltransferases acting on O-glucosylated notch epidermal growth factor repeats. J Biol Chem. 2010; 285:1582–6. doi: 10.1074/jbc.C109.065409
  • 58.Zeng Z, Mukherjee A, Zhang H. From Genetics to Epigenetics, Roles of Epigenetics in Inflammatory Bowel Disease. Front Genet. 2019; 10:1017. doi: 10.3389/fgene.2019.01017
  • 59.Barnicle A, Seoighe C, Greally JM, Golden A, Egan LJ. Inflammation-associated DNA methylation patterns in epithelium of ulcerative colitis. Epigenetics. 2017; 12:591–606. doi: 10.1080/15592294.2017.1334023
  • 60.Hou Q, Huang J, Ayansola H, Masatoshi H, Zhang B. Intestinal Stem Cells and Immune Cell Relationships: Potential Therapeutic Targets for Inflammatory Bowel Diseases. Front Immunol. 2020; 11:623691. doi: 10.3389/fimmu.2020.623691
  • 61.Du C, Wang K, Zhao Y, Nan X, Chen R, Quan S, et al. Supplementation with Milk-Derived Extracellular Vesicles Shapes the Gut Microbiota and Regulates the Transcriptomic Landscape in Experimental Colitis. Nutrients. 2022; 14. doi: 10.3390/nu14091808
  • 62.Sugimoto M, Fujikawa A, Womack JE, Sugimoto Y. Evidence that bovine forebrain embryonic zinc finger-like gene influences immune response associated with mastitis resistance. Proc Natl Acad Sci U S A. 2006; 103:6454–9. doi: 10.1073/pnas.0601015103

Decision Letter 0

Ashutosh Pandey

9 May 2024

PONE-D-24-08842

Identification of O-Glycosylation related genes and subtypes in Ulcerative Colitis based on machine learning

PLOS ONE

Dear Dr. Xu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 22 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ashutosh Pandey, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In the present manuscript, the authors attempted to demonstrate the role of O-Glycosylation-related genes and subtypes like MUC1, ADAMTS1, GXYLT2, and SEMA5A in Ulcerative Colitis using bioinformatics approaches. The manuscript is good but requires major revisions prior to acceptance.

1. The quality of the writing is extremely poor and lacks scientific explanation. The choice of words for discussing any information requires more background studies and detailed technical descriptions. The abstract needs to be reframed and the introduction also requires substantial corrections.

2. The background knowledge for the selection of the two different datasets for the study viz., GSE75214 and GSE92415 is not clear. The patient data from the selected population needed to be properly demonstrated. The authors are requested to follow https://doi.org/10.1371/journal.pone.0289064 for their reference. It was found that one of the datasets GSE92415 contained the data for UC patients treated with Golimumab, it is not mentioned anywhere in the manuscript and the rationale behind the selection of this particular study is not clear in the manuscript. It needed to be addressed properly.

3. The selection of DEGs from the two discrete datasets is not clear. The authors should either first select the common DEGs between both the datasets and then screen for the Upregulated and downregulated genes or clearly describe their criteria for the selection process. The statistical significance and relevance of the screening and functional enrichment needed to be justified properly. Similar conditions apply to the LASSO analysis.

4. The resolution of the figures needed to be modified along with the figure captions. The font and size of the text used in the figure are unclear and need to be modified.

5. Regarding the establishment of the role of the critical genes in CRC/CAC, the authors may explore the differential expression of these critical genes for human CRC using the TCGA data. The authors can follow https://doi.org/10.1016/j.humgen.2023.201189 and https://doi.org/10.3389/fgene.2021.608313 for their reference.

Reviewer #2: The study aims to investigate the role of altered intestinal glycosylation, particularly O-GlcNAcylation modification of proteins in the pathogenesis of Ulcerative colitis (UC) which could have potential implications for diagnosis and treatment. For this purpose, the study effectively integrates previously published transcriptional and clinical data by employing various machine learning methods. The article identifies MUC1, ADAMTS1, GXYLT2 and SEMA5A as significantly associated with UC-related O-GlcNAcylation. The authors propose two UC subtypes based on the expression of these four hub genes. Interestingly, subtype B (defined by elevated expression of ADAMTS1, GXYLT2 and SEMA5A) shows a potential predisposition to colitis-associated colorectal cancer (CAC), providing valuable insights into disease progression.

Strengths:

This study used a systemic data-driven approach to address an important question of the role of O-GlcNAcylation in UC pathogenesis and progression, offering a foundation for further research. The identification of two UC subtypes based on the four hub genes represents a significant contribution to the field.

Concerns:

I have a few concerns and suggestions regarding this article in its present form, most of which stem from the presentation of the results (and could be addressed, for example, by adding key details of the bioinformatic analysis to the text). Addressing these concerns will enhance the article’s clarity and enable the readers to better assess the study’s findings.

Major Issues:

1. Gene lists should be provided for differential expression analysis along with logFC values and p-values (e.g. for Figure 1 and 2).

2. For all figures, make sure to specify a legend for the scale used and any relevant cutoffs for fold change or p-value in the figure description (e.g for the heatmap in Figure 1). Furthermore, make sure to provide which figure you are referring to in the text (line 418).

3. There is some ambiguity in the description of analysis (e.g. lines 427 to 429). For figure 4A-C, specify which tag genes were identified by each of the algorithms in the figures. The number of genes identified is inconsistent with the in-text description of the figure (lines 128 to 131). Furthermore, Figure 4E could be described in more detail in the text (specify positive/negative correlation or the exact correlation value).

4. Relevant citations should be provided throughout the paper for all claims, (e.g. lines 25-30 and 43-45). Also provide references for packages and databases used (e.g. lines 152 and 427).

5. The discussion of the results in the context of prior literature on UC would be of immediate interest for biologists and clinicians. In this regard, further development of the implications of the results can strengthen the paper.

Minor Issues:

1. The abstract and introduction mention the etiology of UC and the role of O-GlcNAcylation in various inflammatory diseases. However, the context from prior literature that the authors already provide could be strengthened by providing further details about which immune cells are known to play a role in promoting mucosal immune and inflammatory responses in UC.

2. In the introduction, the authors mention that the level of protein O-GlcNAc changes (lines 32-35). Discuss these alterations and their implication further.

3. In the results section, the article should introduce the two datasets before jumping into the analysis in the results section. Explicitly state what the differential expression analysis is comparing (e.g. UC vs. healthy controls in line 51) to better guide the reader.

4. Make sure the nomenclature for human genes/transcripts/proteins is correct throughout the manuscript text and figures (e.g. in gene lists provided adjacent to heatmaps).

5. Discuss the limitations of the study and suggest future experiments that can validate the findings.

6. Providing a table of the pathways along with p-values for the pathway enrichment analysis would be helpful.

7. Some of the ambiguity in the analysis may be addressed by providing the code for replication by others.

8. Language can be improved for clarity (e.g. lines 23-25). Please also proofread for typos (e.g. line 26 and 29).

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Anukriti Singh

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Dec 31;19(12):e0311495. doi: 10.1371/journal.pone.0311495.r002

Author response to Decision Letter 0


19 Jun 2024

Dear Editor and Reviewer:

Thank you for your decision and constructive comments on our manuscript entitled “Identification of O-Glycosylation related genes and subtypes in Ulcerative Colitis based on machine learning”. These comments are all valuable and very helpful for revising and improving our paper. We have studied the comments carefully and have made corrections which we hope will meet with approval. The reviewers’ comments are laid out below in italicized font and specific concerns have been numbered. Our responses are given in normal font and changes/additions to the manuscript are given in red text.

Responses to Reviewers comments:

Reviewer #1:

1. The quality of the writing is extremely poor and lacks scientific explanation. The choice of words for discussing any information requires more background studies and detailed technical descriptions. The abstract needs to be reframed and the introduction also requires substantial corrections.

We apologize for the poor language of our manuscript. We have now worked on both language and readability, and really hope that the flow and language level have been substantially improved.

2. The background knowledge for the selection of the two different datasets for the study viz., GSE75214 and GSE92415 is not clear. The patient data from the selected population needed to be properly demonstrated. The authors are requested to follow https://doi.org/10.1371/journal.pone.0289064 for their reference. It was found that one of the datasets GSE92415 contained the data for UC patients treated with Golimumab, it is not mentioned anywhere in the manuscript and the rationale behind the selection of this particular study is not clear in the manuscript. It needed to be addressed properly.

We are sorry for our carelessness. In the methods section, we provide detailed background on the databases GSE75214 and GSE92415. In addition, the data for UC patients treated with Golimumab were not included in our study (Line 77-87).

This study incorporated two datasets, GSE75214 (provided by the GPL6244 platform) and GSE92415 (provided by the GPL13158 platform). The GSE75214 dataset contains intestinal mucosal biopsies obtained endoscopically from UC patients (n=97) and healthy controls (n=11), which were subsequently analyzed for gene expression by microarray. The GSE92415 dataset enrolled 21 healthy subjects and 162 UC patients, comprising pre-treatment baseline samples (n=87) and post-treatment samples (n=75), to evaluate the effect of golimumab (GLM) during induction treatment in moderate to severe UC. In this study, we selected only the 87 untreated UC samples and the 21 healthy control samples.
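As a quick check of the sample selection described above, the counts can be tallied in a minimal Python sketch. The dictionary layout and the function name `selected_samples` are illustrative assumptions, not part of the authors' R pipeline; only the numbers come from the response.

```python
# Sample counts as stated in the authors' response; keys are GEO accessions.
# The field names ("uc", "uc_baseline", ...) are hypothetical labels.
cohorts = {
    "GSE75214": {"uc": 97, "control": 11},
    "GSE92415": {"uc_baseline": 87, "uc_post_treatment": 75, "control": 21},
}


def selected_samples(cohorts):
    """Return (UC, control) totals, excluding golimumab-treated samples."""
    uc = cohorts["GSE75214"]["uc"] + cohorts["GSE92415"]["uc_baseline"]
    control = sum(c["control"] for c in cohorts.values())
    return uc, control


uc_total, control_total = selected_samples(cohorts)
print(uc_total, control_total)  # 184 32
```

Under these figures, the merged cohort would contain 184 UC samples and 32 healthy controls, with the 75 post-treatment samples left out.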

3. The selection of DEGs from the two discrete datasets is not clear. The authors should either first select the common DEGs between both the datasets and then screen for the Upregulated and downregulated genes or clearly describe their criteria for the selection process. The statistical significance and relevance of the screening and functional enrichment needed to be justified properly. Similar conditions apply to the LASSO analysis.

We apologize for not clearly describing the screening process for differentially expressed genes. To increase the sample size, we included multiple datasets. However, batch effects existed among these datasets owing to factors such as experimental time, batch, laboratory, and sample processing method. After merging the datasets, we removed these batch effects to minimize technical differences and ensure data reliability and comparability. We then performed differential expression analysis on the combined GSE75214 and GSE92415 data using the limma package in R. According to the criteria of |logFC| > 1 and adj.P.Val < 0.05, 449 genes were up-regulated and 233 genes were down-regulated.
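The thresholding step stated above can be illustrated with a minimal Python sketch. It is not the authors' code (their analysis used limma in R); the `GeneStat` type, the `classify_degs` function, and the toy gene names and values are hypothetical, and only the two cutoffs are taken from the response.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log_fc: float   # log fold change, UC vs. healthy control
    adj_p: float    # multiple-testing-adjusted p-value


def classify_degs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into up- and down-regulated lists.

    A gene is kept only if |logFC| > lfc_cutoff and adj_p < p_cutoff.
    """
    up, down = [], []
    for s in stats:
        if s.adj_p >= p_cutoff or abs(s.log_fc) <= lfc_cutoff:
            continue  # fails at least one criterion
        (up if s.log_fc > 0 else down).append(s.gene)
    return up, down


# Made-up rows for illustration only; these are not results from the study.
toy = [
    GeneStat("GENE_A", 2.3, 0.001),   # up-regulated
    GeneStat("GENE_B", -1.6, 0.01),   # down-regulated
    GeneStat("GENE_C", 0.4, 0.0001),  # significant but small effect
    GeneStat("GENE_D", 1.8, 0.2),     # large effect but not significant
]
up, down = classify_degs(toy)
print(up, down)  # ['GENE_A'] ['GENE_B']
```

Both conditions must hold at once, so a gene with a large fold change but a non-significant adjusted p-value is excluded, as is a highly significant gene with a small fold change.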

4. The resolution of the figures needed to be modified along with the figure captions. The font and size of the text used in the figure are unclear and need to be modified.

We modified the captions of several figures, such as Figure 4, to make them more accurate, and the resolutions have also been adjusted appropriately. If any figure is not up to standard, please point it out and we will carefully modify it until it meets PLOS ONE's standards. We sincerely thank the editor and all reviewers for the valuable feedback.

5. Regarding the establishment of the role of the critical genes in CRC/CAC, the authors may explore the differential expression of these critical genes for human CRC using the TCGA data. The authors can follow https://doi.org/10.1016/j.humgen.2023.201189 and https://doi.org/10.3389/fgene.2021.608313 for their reference.

We compared the expression of the four hub genes in the TCGA and GEPIA databases and found that their differential expression is not fully consistent with our study, as shown in the following pictures. For CRC, the expression levels of the four key genes also differed between the two databases. In the TCGA database, the expression levels of all four key genes in CRC were lower than those in the control group, whereas in the GEPIA database, MUC1, GXYLT2, and SEMA5A were highly expressed in CRC, among which SEMA5A showed an obvious difference, and ADAMTS1 was expressed at a low level. However, it is worth noting that the expression trends of the four core genes in subtype B were the same as those reported in previous research on UC, which we discuss in detail in the Discussion section.

Colorectal cancer (CRC) and colitis-associated colorectal cancer (CAC) have crucial differences. CAC occurs as a complication of chronic inflammatory disorders of the colon, while most CRCs arise from precancerous lesions or adenomatous polyps [1]. However, TCGA contains only CRC data and no CAC data, so we did not show the differential expression of these critical genes in the manuscript.

If the reviewers have better suggestions, we would welcome another opportunity to make modifications.

[1] Kasi A, Handa S, Bhatti S, Umar S, Bansal A, Sun W. Molecular Pathogenesis and Classification of Colorectal Carcinoma. Curr Colorectal Cancer Rep. 2020 Sep;16(5):97-106. doi: 10.1007/s11888-020-00458-z. Epub 2020 Aug 15. PMID: 32905465; PMCID: PMC7469945

Reviewer #2:

Thanks for your thoughtful review. In the results section, we do have a lot of shortcomings. We have made detailed revisions and added the details of bioinformatic analysis to make the logic more fluent and clearer.

1. Gene lists should be provided for differential expression analysis along with logFC values and p-values (e.g. for Figure 1 and 2).

Because the gene lists are too large to be displayed directly in the manuscript, they have been uploaded as supplementary tables along with logFC values and p-values. In addition, we added the KEGG enrichment analysis tables of DEGs to the manuscript, as shown in Tables 1 and 2.

2. For all figures, make sure to specify a legend for the scale used and any relevant cutoffs for fold change or p-value in the figure description (e.g for the heatmap in Figure 1). Furthermore, make sure to provide which figure you are referring to in the text (line 418).

We apologize for the lack of rigor. In the revised manuscript, we have added the logFC values and p-values, mainly to Figures 1 and 9. Regarding the section at line 418 of the original manuscript, we have reworded the text and added the figure reference to make the description clearer (Lines 97-100).

3. There is some ambiguity in the description of analysis (e.g. lines 427 to 429). For figure 4A-C, specify which tag genes were identified by each of the algorithms in the figures. The number of genes identified is inconsistent with the in-text description of the figure (lines 128 to 131). Furthermore, Figure 4E could be described in more detail in the text (specify positive/negative correlation or the exact correlation value).

In the section on methods and results of machine learning, we have provided a more detailed explanation (Line 128-135). For Figure 4A-C, the genes identified by each algorithm are listed in the revised manuscript. The explanation of Figures 4E and F has also been added (Line 307-319).

4. Relevant citations should be provided throughout the paper for all claims, (e.g. lines 25-30 and 43-45). Also provide references for packages and databases used (e.g. lines 152 and 427).

Based on your suggestions, we have checked the literature carefully and added more references on the definition of glycosylation and the relationship between O-glycosylation and disease to the introduction of the revised manuscript (Lines 40-64). In the Materials and methods section, we have added references for the R packages and the websites of the databases used, which we hope improves the credibility of the paper and meets the requirements of your journal. In addition, the key R packages have been uploaded as supporting information named “R packages”.

5. The discussion of the results in the context of prior literature on UC would be of immediate interest for biologists and clinicians. In this regard, further development of the implications of the results can strengthen the paper.

At the beginning of the discussion, background information on ulcerative colitis was added (Line 510-521). The intestinal symptoms and complications of ulcerative colitis can reduce the quality of life of patients, so it is necessary to explore its pathogenesis to guide treatment. We hope that the text added can arouse the interest of readers and look forward to further guidance from the reviewers.

Minor Issues:

1. The abstract and introduction mention the etiology of UC and the role of O-GlcNAcylation in various inflammatory diseases. However, the context from prior literature that the authors already provide could be strengthened by providing further details about which immune cells are known to play a role in promoting mucosal immune and inflammatory responses in UC.

According to the suggestions of the two reviewers, the abstract and introduction of this paper have been greatly improved. In the introduction and discussion sections, the correlation between immune cells and UC was supplemented (Line 31-35).

2. In the introduction, the authors mention that the level of protein O-GlcNAc changes (lines 32-35). Discuss these alterations and their implication further.

Thank you for the comment. The alterations and implications of O-GlcNAc changes are further described in lines 47-52.

3. In the results section, the article should introduce the two datasets before jumping into the analysis in the results section. Explicitly state what the differential expression analysis is comparing (e.g. UC vs. healthy controls in line 51) to better guide the reader.

We apologize for not providing a detailed introduction to the two datasets. Due to adjusting the order of methods and results, the background information of the datasets is presented in the methods section (Line 77-87), including methods and R packages for merging two datasets and screening for differentially expressed genes (Line 89-100).

4. Make sure the nomenclature for human genes/transcripts/proteins is correct throughout the manuscript text and figures (e.g. in gene lists provided adjacent to heatmaps).

Thank you for the reminder. We have checked the nomenclature of human genes/transcripts/proteins in the text and figures and confirmed that it is correct.

5. Discuss the limitations of the study and suggest future experiments that can validate the findings.

In the original manuscript, the limitations were indeed a little brief. Now we have further discussed the limitations and added what experimental methods can be further supplemented in the future (Line 626-635).

6. Providing a table of the pathways along with p-values for the pathway enrichment analysis would be helpful.

The tables of the pathways along with p-values for the pathway enrichment analysis have been uploaded.

7. Some of the ambiguity in the analysis may be addressed by providing the code for replication by others.

We have provided the code by which the steps of this article can be repeated.

8. Language can be improved for clarity (e.g. lines 23-25). Please also proofread for typos (e.g. line 26 and 29).

We sincerely thank the reviewer for the careful reading. In the resubmitted manuscript, the typos have been corrected and the grammar has been revised.

We have tried our best to improve the manuscript; the changes are shown in detail in the file “Revised Manuscript with Track Changes”. In addition, we would like to confirm the author order of the paper “Identification of O-Glycosylation related genes and subtypes in Ulcerative Colitis based on machine learning”. The following three authors, Yue Lu, Yi Su, and Nan Wang, contributed equally to this work and should be considered co-first authors. In addition, Dongyue Li was added as the 2nd set of equal contributors (e-mail: “lidongyue83@163.com”), and the author Shuyuan Zhang was removed. We sincerely appreciate the work of the editors and reviewers and hope that the corrections will meet with approval. Once again, thank you very much for your comments and suggestions.

I appreciate your consideration. I am looking forward to hearing from you.

Sincerely,

Hongyu Xu

Institution and address: the First Affiliated Hospital of Harbin Medical University, 23 You Zheng Street, Nangang District, Harbin City, Heilongjiang Province, China

Telephone: 86-13903656899

E-mail: xuhongyu@ldy.edu.rs

Attachment

Submitted filename: Response to Reviewers.pdf

pone.0311495.s005.pdf (1.2MB, pdf)

Decision Letter 1

Ashutosh Pandey

18 Jul 2024

PONE-D-24-08842R1

Identification of O-Glycosylation related genes and subtypes in Ulcerative Colitis based on machine learning

PLOS ONE

Dear Dr. Xu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that the revised version is improved but needs some revision to fully meet PLOS ONE’s publication criteria. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 01 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ashutosh Pandey, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: No

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: There are still some minor lapses in writing, such as poor word choice and grammar, which should be fixed prior to publication. For instance, typos like "For visualization, the DEGs," (line 104) and numerous others should be corrected. Phrases like “involving three low expressions.” should be fixed to be informative and grammatically correct. Also, the font changes give a sloppy appearance in "Identification of differential expressed genes".

Tables for GO analysis are not legible (exclude extraneous information) or shown as bar charts, with tables presented as supplementary information. Discussion has been thoroughly improved and comments sufficiently addressed.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Anukriti Singh

**********


PLoS One. 2024 Dec 31;19(12):e0311495. doi: 10.1371/journal.pone.0311495.r004

Author response to Decision Letter 1


29 Aug 2024

Dear Editor and Reviewer:

We thank the reviewer for the kind consideration and constructive comments on our manuscript. We have addressed your concerns in a point-by-point manner below, and hope that you will find the added information suitable and sufficient for publication. The reviewers’ comments are laid out below in italicized font and specific concerns have been numbered. Our response is given in normal font.

1. Typos like "For visualization, the DEGs," (line 104) and numerous others should be corrected.

Our wording lacked grammatical rigor, and we have revised it to be more precise (Lines 104-105).

2. Phrases like “involving three low expressions.” should be fixed to be informative and grammatically correct.

Our wording was not accurate or concise enough. Because Figure 3 already clearly shows the differential expression of genes between UC patients and controls, we deleted “involving three low expressions.”, which does not affect the reader's understanding.

3. The font changes give a sloppy appearance in "Identification of differential expressed genes".

In the paragraph "Identification of differential expressed genes", the font has been unified as “Times New Roman”, size 14.

4. Tables for GO analysis are not legible (exclude extraneous information) or shown as bar charts, with tables presented as supplementary information.

According to the reviewer's suggestion, “qvalue” and “count” were deleted from the KEGG tables to make them more concise.

We sincerely appreciate the time and effort invested by the reviewers in evaluating our manuscript. We are more than happy to make any further revisions to improve the paper and facilitate successful publication. Once again, thank you very much for your comments and suggestions.

Sincerely,

Hongyu Xu

Institution and address: the First Affiliated Hospital of Harbin Medical University, 23 You Zheng Street, Nangang District, Harbin City, Heilongjiang Province, China

Telephone: 86-13903656899

E-mail: xuhongyu@ldy.edu.rs

Attachment

Submitted filename: Response to Reviewers.docx

pone.0311495.s006.docx (14.5KB, docx)

Decision Letter 2

Ashutosh Pandey

18 Sep 2024

Identification of O-Glycosylation related genes and subtypes in Ulcerative Colitis based on machine learning

PONE-D-24-08842R2

Dear Dr. Xu,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ashutosh Pandey, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Ashutosh Pandey

6 Dec 2024

PONE-D-24-08842R2

PLOS ONE

Dear Dr. Xu,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Ashutosh Pandey

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. The two datasets underwent normalization and subsequent merging.

    (A-B) GSE75214 and GSE92415 were combined, and the R packages "limma" and "sva" were used to remove batch effects, yielding 16,467 genes and 216 samples. A: before merging; B: after merging. (C-D) The dataset was homogenized with the R package "preprocessCore". C: before homogenization; D: after homogenization.

    (TIF)

    pone.0311495.s001.tif (4.5MB, tif)
    S2 Fig. The correlation analysis between the four hub genes and all other genes.

    The positive correlation between the top 50 genes and 4 hub genes was displayed using heatmaps.

    (TIF)

    pone.0311495.s002.tif (7.1MB, tif)
    S1 File. DEGs of GSE75214 and GSE92415.

    The genes from the UC patients and healthy individuals in the GSE75214 dataset were merged with those from GSE92415 to form a combined dataset. Batch effects were then removed to minimize discrepancies between the datasets. A total of 16,467 genes remained after merging.

    (CSV)

    pone.0311495.s003.csv (1.8MB, csv)
    S2 File. R code.

    The R code used for the bioinformatics analyses in this study.

    (DOCX)

    pone.0311495.s004.docx (44.6KB, docx)
    Attachment

    Submitted filename: Response to Reviewers.pdf

    pone.0311495.s005.pdf (1.2MB, pdf)

    Data Availability Statement

    The data underlying the results presented in the study are available from the GEO database (www.ncbi.nlm.nih.gov/geo/) at the following accession numbers: Accession Number GSE75214 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75214 Accession Number GSE92415 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92415


    Articles from PLOS ONE are provided here courtesy of PLOS
