International Dental Journal. 2025 Nov 19;76(1):104028. doi: 10.1016/j.identj.2025.104028

Machine Learning-Based Transcriptomic Diagnosis of Periodontitis

Ya’nan Cheng 1, Haiqiong Yang 1, Hui Mo 1, Pu Xu 1
PMCID: PMC12666856  PMID: 41265166

Abstract

Background

Periodontitis, a prevalent chronic inflammatory disease, remains a global health challenge with conventional diagnostic methods hindered by subjectivity and low sensitivity. This study aimed to develop a machine learning (ML)-based diagnostic framework using transcriptomic data to enhance diagnostic accuracy and efficiency.

Methods

Transcriptomic datasets from 616 samples (452 periodontitis, 164 healthy controls) were retrieved from the Gene Expression Omnibus (GEO). Differentially expressed genes (DEGs) were identified, and functional enrichment, weighted gene co-expression network analysis (WGCNA), and immune infiltration profiling were performed. Key biomarkers were refined using the Boruta and Least Absolute Shrinkage and Selection Operator (LASSO) algorithms. Six independent ML models were constructed and validated. A nomogram for risk prediction, transcription factor networks, and drug-target interactions were also analysed.

Results

Five diagnostic biomarkers (CSF2RB, COL15A1, MME, NEFM, CYP24A1) were identified, with robust performance across datasets. The Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) achieved perfect classification in training and high accuracy in external validation. Immune infiltration analysis revealed significant correlations between biomarkers and immune cell populations (eg, dendritic cells, T cells). Transcription factor networks highlighted NFYA and SP1 as central regulators. Drug prediction identified re-purposable candidates with validated molecular docking affinity.

Conclusion

This study establishes an ML-driven diagnostic framework for periodontitis, integrating transcriptomic, immune, and regulatory network insights. These gene biomarkers may provide novel insight into periodontitis pathogenesis, while our diagnostic models show potential for clinical utility in personalised diagnosis, targeted intervention, and therapeutic development.

Key words: Periodontitis, Machine learning, Diagnostic models, Statistical, Artificial intelligence, Precision medicine

Plain Language Summary

Periodontitis is a common, serious condition often diagnosed too late using traditional methods that can be subjective. To improve detection, we developed a machine learning (ML) tool that analyses genetic activity in gum tissue. Using data from 616 patient samples, we identified five key genes (CSF2RB, COL15A1, MME, NEFM, CYP24A1) that act as biological ‘flags’ for gum disease. These genes are linked to immune responses that drive gum inflammation. Our ML models – especially two types called Random Forest and XGBoost – perfectly spotted gum disease in initial tests and remained highly accurate in new patient groups. We also created a simple scoring chart (nomogram) to predict individual risk. The genes we found interact with immune cells and vitamin D pathways, revealing new disease mechanisms. This work provides a faster, more objective way to diagnose gum disease and opens doors for personalised treatments.

Introduction

Periodontitis, a prevalent chronic inflammatory disease, affects over 40% of the adult population worldwide.1,2 It is initiated by dental plaque biofilm and leads to the destruction of gingival tissues and alveolar bone resorption, ultimately resulting in tooth mobility and loss.3,4 Early diagnosis and intervention are crucial for controlling disease progression and improving patient outcomes. However, traditional diagnostic methods for periodontitis primarily rely on clinical examinations and radiographic imaging, which are often limited by subjectivity and low sensitivity.5

Conventional diagnostic methods for periodontal diseases are defined as follows6,7: Clinical examination using a periodontal probe to assess probing pocket depth (PPD), clinical attachment level (CAL), bleeding on probing (BOP), and furcation involvement, combined with radiographic imaging (eg, intraoral X-rays, CBCT) to evaluate bone loss morphology and severity; these gold-standard measures detect inflammatory sequelae and structural damage but cannot predict future disease activity.

In recent years, machine learning (ML) techniques have revolutionised medical diagnostics by extracting complex patterns from multimodal datasets to enhance predictive accuracy and objectivity.8,9 Within periodontitis, ML models demonstrate significant potential to overcome limitations of conventional methods,10,11 including subjective clinical assessments and poor generalizability. Current applications span microbial signature analysis,12 salivary biomarker detection,13 automated probing depth measurements,14 and DNA methylation and RNA-seq analysis,15 achieving diagnostic precision comparable to or exceeding that of human experts.

The objective of this study is to develop a machine learning-based diagnostic model for periodontitis and evaluate its performance. We collected transcriptomic data for a total of 616 samples from the Gene Expression Omnibus (GEO) database, comprising 452 periodontitis samples and 164 healthy controls. Using these data, we trained and tested a nomogram model along with six machine learning algorithms: Random Forest (RF), k-Nearest Neighbours (KNN), Support Vector Machine (SVM), Generalized Linear Model (GLM), Partial Least Squares (PLS), and eXtreme Gradient Boosting (XGBoost). Model performance was assessed on both internal and external datasets.

We anticipate that the developed periodontitis diagnostic model will enhance diagnostic accuracy and efficiency, providing valuable guidance for periodontitis diagnosis and personalised treatment.

Methods

RNA expression data acquisition

RNA expression datasets related to periodontitis (PD) were retrieved from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo). A total of 616 samples were downloaded from 8 distinct datasets: GSE16134 (PD=241, Healthy=69, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16134), GSE10334 (PD=183, Healthy=64, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10334), GSE23586 (PD=3, Healthy=3, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE23586), GSE223328 (PD=4, Healthy=4, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223328), GSE223924 (PD=10, Healthy=10, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223924), GSE243173 (PD=6, Healthy=4, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE243173), GSE273165 (PD=4, Healthy=5, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE273165), and GSE27993 (PD=5, Healthy=5, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE27993) (Supplementary Table 1). The GSE16134 dataset was designated as the training dataset for subsequent analyses. Six of the remaining datasets (GSE23586, GSE223328, GSE223924, GSE243173, GSE273165, and GSE27993) were merged and batch-corrected to form the combined test dataset, integrating diverse cohorts while mitigating technical variation for robust downstream analysis (Supplementary Figure 1).

All of these GEO datasets used gingival or periodontal ligament tissue.

Identification of differentially expressed genes (DEGs)

Differential gene expression analysis was performed using the Wilcoxon test. The R package ‘limma’ was employed to identify differentially expressed genes (DEGs) between periodontitis patients and healthy controls. Genes with an absolute log2 fold change greater than 1 (|log2FC| > 1) and an adjusted p-value below 0.05 (reported as FDR < 0.05; adjustment method: Holm) were considered statistically significant DEGs.
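
The filtering rule above can be illustrated with a minimal sketch. It assumes that per-gene log2 fold changes and raw p-values are already available; the gene names and numbers below are hypothetical placeholders, and the code is a plain Python illustration of the Holm step-down adjustment, not the R workflow used in the study.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def select_degs(genes, log2fc, pvalues, fc_cutoff=1.0, alpha=0.05):
    """Keep genes with |log2FC| > fc_cutoff and Holm-adjusted p < alpha."""
    adjusted = holm_adjust(pvalues)
    return [
        gene
        for gene, fc, p_adj in zip(genes, log2fc, adjusted)
        if abs(fc) > fc_cutoff and p_adj < alpha
    ]


# Hypothetical toy input (not data from the study).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2fc = [1.8, -1.4, 0.3, 1.2]
pvals = [0.0001, 0.004, 0.0002, 0.06]
degs = select_degs(genes, log2fc, pvals)
print(degs)
```

In this toy input, GENE_C fails the fold-change cutoff and GENE_D fails the adjusted p-value cutoff, so only GENE_A and GENE_B are retained.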

Functional enrichment analysis: GO and KEGG

Gene Ontology (GO) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted using the R packages ‘org.Hs.eg.db’ and ‘clusterProfiler’. GO terms and KEGG pathways were deemed significantly enriched if the adjusted p-value was less than 0.05.

Gene set enrichment analysis (GSEA)

The R package ‘psych’ was utilised to calculate the correlation coefficients between each core gene and all other genes in the training set. Genes were subsequently ranked based on their correlation coefficients, generating a list of related genes for each core gene. Gene Set Enrichment Analysis (GSEA) was performed using the ranked gene lists, with the reference gene set obtained from the MSigDB database. The ‘clusterProfiler’ R package was employed for GSEA pathway enrichment analysis, with a significance threshold set at adjusted p-value < 0.05.

Weighted gene co-expression network analysis (WGCNA)

In the Weighted Gene Co-expression Network Analysis (WGCNA), genes from the training dataset (GSE16134) exhibiting a median absolute deviation (MAD) in the top 75% and an MAD value greater than 0.01 were selected for further analysis. An input matrix was constructed using these genes. To determine the optimal soft threshold, topological calculations were performed across a soft threshold range of 1 to 30. The correlation matrix was initially converted into an adjacency matrix, which was subsequently transformed into a topological overlap matrix (TOM). Using average linkage hierarchical clustering based on the TOM, genes were grouped into distinct modules. Adjacent modules were then merged to enhance the robustness of the analysis. Finally, Pearson’s correlation analysis was applied to evaluate the relationship between the merged modules and disease incidence. The modules demonstrating the strongest positive and negative correlations with the disease were designated as the core modules. Through this process, a total of 660 genes were identified as significant.
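
The conversion from correlation matrix to adjacency matrix to TOM can be sketched as follows. This is an illustration of the standard unsigned topological overlap formula in plain Python, not the WGCNA R package code; the unsigned network type, the 3-gene correlation matrix, and the toy power of 2 are assumptions made for the example (the study reports a soft-threshold power of 12).

```python
def soft_threshold_adjacency(cor, beta):
    """Unsigned adjacency: a_ij = |cor_ij| ** beta, with a zero diagonal."""
    n = len(cor)
    return [
        [0.0 if i == j else abs(cor[i][j]) ** beta for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]  # diagonal is zero
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j
            )
            value = (shared + adj[i][j]) / (
                min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            )
            tom[i][j] = tom[j][i] = value
    return tom


# Hypothetical correlation matrix for three genes.
cor = [
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.7],
    [0.8, 0.7, 1.0],
]
adj = soft_threshold_adjacency(cor, beta=2)
tom = topological_overlap(adj)
dissimilarity = [[1.0 - v for v in row] for row in tom]  # input to clustering
```

Hierarchical clustering is then run on `1 - TOM`, so that genes sharing many strong neighbours end up in the same module.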

Screening of early diagnostic biomarkers and construction of diagnostic models for periodontitis

To identify effective diagnostic biomarkers for periodontitis, we took the intersection of the differentially expressed genes (DEGs) from the training dataset, the DEGs from GSE10334, and the periodontitis-associated genes obtained through WGCNA. The intersecting genes were then further refined using the Boruta algorithm and 10-fold cross-validated Least Absolute Shrinkage and Selection Operator (LASSO) regression (random seed: 5201314).

Our ML workflow comprised two distinct phases: Phase I for biomarker selection using multiple algorithms, and Phase II for building final diagnostic models using the selected biomarkers.

Phase I: Six distinct machine learning models were employed to evaluate the performance of the candidate genes: Random Forest (RF, with ntree = 300), k-Nearest Neighbours (KNN, with tuneLength = 10), Support Vector Machine (SVM, tuned using the caret package’s default grid), Generalized Linear Model (GLM, fitted with default settings for binomial family regression), Partial Least Squares (PLS, with internal validation = 'CV'), and eXtreme Gradient Boosting (XGBoost, with nrounds = 100). To develop a more parsimonious and generalizable diagnostic model for periodontitis, key genes were selected based on the smallest residuals across all six algorithms. Specifically, genes consistently ranked within the top 50% of minimal residuals in every model were identified as potential early diagnostic biomarkers.16
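
The consensus rule in Phase I can be sketched as a set intersection. The example assumes that each model yields one residual-based score per gene; how those per-gene residuals are computed is not specified here, and the gene names, model subset, and values are hypothetical.

```python
import math


def top_half_smallest(residuals):
    """Return the genes in the half of the ranking with the smallest residuals."""
    ranked = sorted(residuals, key=residuals.get)
    keep = math.ceil(len(ranked) / 2)
    return set(ranked[:keep])


def consensus_genes(per_model_residuals):
    """Genes that fall in the top 50% (smallest residuals) of every model."""
    selected = [top_half_smallest(r) for r in per_model_residuals.values()]
    return set.intersection(*selected)


# Hypothetical per-gene residuals for three of the six models.
per_model = {
    "RF": {"G1": 0.10, "G2": 0.20, "G3": 0.30, "G4": 0.40},
    "SVM": {"G1": 0.15, "G2": 0.35, "G3": 0.12, "G4": 0.50},
    "GLM": {"G1": 0.05, "G2": 0.25, "G3": 0.45, "G4": 0.22},
}
biomarkers = consensus_genes(per_model)
print(biomarkers)
```

Only G1 is in the better half of all three rankings, which shows why the intersection shrinks quickly as more models are required to agree.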

Phase II: To further assess the diagnostic power of the identified biomarkers, the same six machine learning models (RF, KNN, SVM, GLM, PLS, and XGBoost) were reconstructed using these biomarkers on the training dataset, applying the same hyperparameter settings as described above. The performance of these models was subsequently validated on the external dataset GSE10334 and the combined test dataset. To assess the clinical utility of the biomarkers in predicting the probability of periodontitis, a nomogram was developed using the R package ‘rms’ based on the identified biomarkers in the training dataset. The nomogram model was further validated on the external datasets to assess its robustness and generalizability.
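
For readers unfamiliar with nomograms, the sketch below shows how a logistic regression model is typically turned into a points scale: each predictor contributes points proportional to its coefficient times its distance from a reference value, and the total maps back to a probability. The intercept, coefficients, expression ranges, and gene labels are hypothetical and are not values from the study; the code is a generic illustration, not the ‘rms’ implementation.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_nomogram(intercept, coefs, ranges):
    """Return (points, probability) functions for a logistic nomogram.

    coefs:  predictor -> regression coefficient
    ranges: predictor -> (min, max) of the axis shown on the nomogram
    """
    # The predictor with the largest effect across its range spans 0-100 points.
    max_effect = max(abs(coefs[g]) * (ranges[g][1] - ranges[g][0]) for g in coefs)
    # Zero points sit at the end of each axis with the lowest contribution.
    refs = {g: (ranges[g][0] if coefs[g] >= 0 else ranges[g][1]) for g in coefs}
    base = intercept + sum(coefs[g] * refs[g] for g in coefs)

    def points(predictor, value):
        return 100.0 * coefs[predictor] * (value - refs[predictor]) / max_effect

    def probability(total_points):
        return logistic(base + total_points * max_effect / 100.0)

    return points, probability


# Hypothetical model with one up-regulated and one down-regulated marker.
intercept = -3.0
coefs = {"UP_GENE": 1.2, "DOWN_GENE": -0.8}
ranges = {"UP_GENE": (4.0, 9.0), "DOWN_GENE": (3.0, 8.0)}
points, probability = build_nomogram(intercept, coefs, ranges)

sample = {"UP_GENE": 7.5, "DOWN_GENE": 4.0}
total = sum(points(g, v) for g, v in sample.items())
risk = probability(total)
```

Because the points are a linear rescaling of the model's linear predictor, reading the probability from the total points gives the same result as evaluating the logistic model directly.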

Immune infiltration analysis

We utilised the ‘CIBERSORT’ R package to conduct a comprehensive analysis of the immune microenvironment within the samples. A comparative analysis of immune cell composition was performed between periodontitis samples and healthy samples. The Spearman correlation coefficient was used to assess the correlations among differentially abundant immune cells.
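
As an illustration of the correlation measure used here, the following sketch computes a Spearman coefficient as the Pearson correlation of average ranks. It is written in plain Python for clarity and assumes two equal-length lists of estimated cell-type proportions; the input values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical proportions of two immune cell types across five samples.
cell_a = [0.02, 0.05, 0.11, 0.20, 0.31]
cell_b = [0.40, 0.33, 0.30, 0.12, 0.05]
rho = spearman(cell_a, cell_b)
```

Because only ranks are used, the coefficient captures any monotonic relationship and is less sensitive to outliers than a Pearson correlation on the raw proportions.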

Drug prediction and molecular docking

To investigate potential therapeutic agents targeting core genes for the treatment of periodontitis, we searched the DGIdb database (https://www.dgidb.org/) for candidate drugs associated with the 5 diagnostic genes. Among the 5 genes, BGN exhibited relatively high expression levels and was selected for further analysis.

Results

Differentially expressed genes in periodontitis

We identified 181 and 139 differentially expressed genes (DEGs) from the GEO datasets GSE16134 (training dataset) and GSE10334, respectively. In the training dataset, 128 genes were upregulated and 53 genes were downregulated (Figure 1A), while in GSE10334, 95 genes were upregulated and 44 genes were downregulated (Figure 1B). To ensure the reliability of the DEGs, we took the intersection of the DEGs from the training dataset and GSE10334, resulting in a total of 137 shared DEGs (Figure 1C).

Fig. 1.

Analysis of Differentially Expressed Genes (DEGs) across datasets and their functional characterization. A-B, Volcano plots illustrating DEGs in the A, GSE16134 and B, GSE10334 datasets. Red dots represent significantly up-regulated genes (log2 fold change > 1, p value < 0.05), while blue dots indicate significantly down-regulated genes (log2 fold change < −1, p value < 0.05). Gray dots represent non-significant genes. C, Venn diagram demonstrating the overlap of DEGs between the GSE16134 and GSE10334 datasets. Numbers in parentheses indicate the percentage of total DEGs in each dataset. D, Protein-Protein Interaction (PPI) network of overlapping DEGs constructed using the STRING database. Node colour intensity reflects the degree of protein interactions, with darker red indicating higher connectivity. E, KEGG pathway enrichment analysis of DEGs. Left panel displays gene expression patterns, where red and blue bars represent up- and down-regulated genes, respectively. Right panel shows the top 10 enriched KEGG pathways colour-coded by functional category, with significance indicated by -log10(p-value).

To explore the relationships among these genes, we utilised the STRING (https://cn.string-db.org/) database to construct a protein-protein interaction (PPI) network with a confidence threshold of 0.7. The results revealed a complex network of interactions among the DEGs (Figure 1D). Notably, genes such as CXCR4 and IL1B exhibited the strongest interactions with other genes, suggesting their potential central roles in the molecular mechanisms underlying periodontitis.

To further elucidate the biological functions of these DEGs, we performed Gene Ontology (GO) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathway enrichment analyses. The GO and KEGG analyses (Figure 1E, Supplementary Figure 2A) demonstrated that the DEGs were predominantly associated with immune cell-related processes and immune response pathways. These findings highlight the critical involvement of immune regulation in the pathogenesis of periodontitis.

Weighted gene co-expression network analysis (WGCNA)

The training periodontitis dataset was subjected to Weighted Gene Co-expression Network Analysis (WGCNA) to identify co-expressed gene modules associated with the disease. The sample dendrogram revealed no outliers in the dataset (Supplementary Figure 2B), and thus, no samples were excluded from the analysis. Following the WGCNA methodology, the optimal soft threshold power was determined to be 12, as this value achieved a scale-free topology model fit with a signed R² value close to 1 (Figure 2A). Based on the topological overlap matrix (TOM) and hierarchical clustering, 12 distinct gene modules were identified in the periodontitis dataset (Figure 2B).

Fig. 2.

Identification of Diagnostic Genes for Periodontitis. A, Selection of soft thresholds for the construction of gene co-expression networks. B, Hierarchical clustering trees of WGCNA modules for periodontitis, illustrating the modular structure of the networks. C, Correlation analysis between module eigengenes (MEs) and disease status, demonstrating module-trait associations. Each row corresponds to an ME, and each column represents a distinct group. D, Venn diagram depicting the intersection of disease-associated genes and DEGs. E, Boruta feature selection of periodontitis-related genes. Green indicates genes confirmed by the Boruta method. F-G, Least Absolute Shrinkage and Selection Operator (LASSO) regression. F, Each coloured line represents a unique gene, showing its coefficient profile along the LASSO regularization path. The optimal lambda (λ), determined by minimizing prediction error, is visually indicated, highlighting the regularization strength that achieves the best trade-off between bias and variance. G, The LASSO model optimized via 10-fold cross-validation. H, The upset plot shows the overlapping and unique features selected by the six models: RF, XGBoost, KNN, GLM, SVM, and PLS. Horizontal bars indicate the size of the feature set for each model, while vertical bars represent shared features across model combinations. Connecting dots show specific model intersections, highlighting consensus and unique features.

Subsequently, the correlations between these modules and clinical traits (periodontitis vs healthy) were evaluated. The ‘green’ module exhibited the strongest positive correlation with periodontitis (r = 0.618, p < 0.001), while the ‘greenyellow’ module showed the strongest negative correlation with the disease (r = −0.584, p < 0.001) (Figure 2C). These findings suggest that genes within the ‘green’ module may play a role in promoting periodontitis, whereas genes in the ‘greenyellow’ module may have protective or inhibitory effects.

Ultimately, a total of 660 genes were identified from the WGCNA analysis, providing a robust set of candidate genes for further investigation into the molecular mechanisms underlying periodontitis.

Identification of diagnostic biomarkers for periodontitis

We initially identified 94 overlapping genes from the intersection of the differentially expressed genes in the training and GSE10334 datasets with the WGCNA genes (Figure 2D). Subsequently, we employed the Boruta algorithm, which retained 59 significant genes (Figure 2E, Supplementary Figure 3A). Further refinement using LASSO regression yielded 12 key genes (Figure 2F-G). To establish more accurate and reliable diagnostic biomarkers, we intersected the genes ranked in the top 50% by minimal residuals across the 6 machine learning algorithms (KNN, GLM, PLS, RF, SVM, and XGBoost). This approach identified 5 genes (CSF2RB, COL15A1, MME, NEFM, CYP24A1) as diagnostic biomarkers for periodontitis (Figure 2H, Supplementary Figure 3B-C). The robustness of the biomarker selection was further supported by the feature importance rankings and root mean square error (RMSE) loss evaluations of the KNN, GLM, PLS, RF, SVM, and XGBoost models (Supplementary Figure 3). The consistent performance across multiple algorithms strengthens the reliability of the identified biomarkers for periodontitis diagnosis.

Notably, CSF2RB, COL15A1, MME, and CYP24A1 exhibited upregulated expression in periodontitis compared to healthy tissues, while NEFM showed downregulated expression (Figure 3A). Functional enrichment analysis revealed that these biomarkers are primarily involved in crucial biological pathways, including Cytokine-Cytokine Receptor Interaction, Cell Adhesion Molecules (CAMs), Focal Adhesion, Hematopoietic Cell Lineage, and Chemokine Signalling Pathway (Supplementary Figure 4). These findings suggest that the identified biomarkers play significant roles in the inflammatory and immune regulatory processes associated with periodontitis pathogenesis.

Fig. 3.

Comprehensive analysis of diagnostic biomarkers and machine learning models for periodontitis. A, Expression patterns of five candidate biomarkers (CSF2RB, COL15A1, MME, NEFM, CYP24A1) across healthy and periodontitis samples in GSE16134 dataset. Box plots show median expression levels with interquartile ranges. B-D, Receiver Operating Characteristic (ROC) curves demonstrating diagnostic performance of individual biomarkers in three independent datasets: Train Dataset Diagnostic (GSE16134), GSE10334 Diagnostic, Combined Test Diagnostic (GSE23586, GSE223328, GSE223924, GSE243173, GSE273165, and GSE27993). Area Under Curve (AUC) values are indicated for each biomarker. E-G, Comparative performance of six machine learning algorithms (KNN, GLM, PLS, RF, SVM, XGBoost) across three datasets, shown through ROC curves with corresponding AUC values. H, Nomogram prediction model. I, Calibration plot of the nomogram prediction model, showing agreement between predicted and observed probabilities. The Hosmer-Lemeshow test p-value indicates model fit. J, ROC curves demonstrating diagnostic performance of nomogram prediction model in three independent datasets. K, Decision curve analysis evaluating clinical utility of biomarker combinations across different risk thresholds. Net benefit is plotted against probability threshold. L, Cost-benefit analysis of different risk stratification strategies, showing the number of high-risk individuals identified at various cost-benefit ratios.

Development and validation of periodontitis diagnostic models

We initially evaluated the diagnostic performance of the 5 biomarker genes (CSF2RB, COL15A1, MME, NEFM, and CYP24A1) across 3 independent datasets (train dataset, GSE10334, and combined test dataset) (Figure 3B-D). The results demonstrated exceptional diagnostic capability, indicating robust discriminative power between periodontitis and healthy samples.

Subsequently, we constructed 6 distinct machine learning models (KNN, GLM, PLS, RF, SVM, and XGBoost) using these 5 biomarkers. The diagnostic models exhibited consistent and reliable performance across both training and validation sets (Figure 3E-G). Notably, the RF and XGBoost models achieved perfect classification in the training dataset, while maintaining high performance (area under the receiver operating characteristic curve, AUC > 0.85) in the external validation sets.
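
The AUC used to compare these models has a simple rank-based interpretation: it is the probability that a randomly chosen periodontitis sample receives a higher score than a randomly chosen healthy sample, with ties counted as one half. The sketch below computes it directly from that definition; the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: 1 = periodontitis, 0 = healthy.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Here five of the six positive-negative pairs are ordered correctly, giving an AUC of 5/6. An AUC of 1.0 on training data means every pair is ordered correctly, which is why performance on external datasets is the more informative figure.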

To facilitate clinical translation, we developed a comprehensive nomogram model for predicting periodontitis risk probability (Figure 3H). The model's predictive accuracy was rigorously assessed through calibration curves and receiver operating characteristic (ROC) analysis (Figure 3I-J). The nomogram demonstrated excellent calibration (mean absolute error = 0.028) and discrimination (AUC = 0.933 in train dataset, 0.908 in GSE10334, and 0.869 in combined test dataset), with Hosmer-Lemeshow test confirming good fit (p = 0.00153).

The model's performance was further evaluated across a range of classification thresholds (0.0 to 1.0), demonstrating robust predictive accuracy (Figure 3K). Optimal performance was achieved at a threshold of 0.6, effectively balancing sensitivity and specificity. Cost-benefit analysis (Figure 3L) revealed that this threshold maximised the identification of true high-risk cases while minimizing unnecessary interventions, with the number of high-risk individuals decreasing from 1000 to 200 as the threshold increased from 0.0 to 1.0.
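
Decision curve analysis plots net benefit against the probability threshold. A minimal sketch of the standard net benefit formula, NB = TP/n − (FP/n) × pt/(1 − pt), is given below; the labels and predicted probabilities are hypothetical, and the threshold of 0.6 is used only because it is the value discussed above.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(
        1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1
    )
    false_pos = sum(
        1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0
    )
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Hypothetical predictions for eight individuals (1 = periodontitis).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
probabilities = [0.9, 0.7, 0.65, 0.62, 0.3, 0.1, 0.4, 0.2]
nb = net_benefit(labels, probabilities, threshold=0.6)
print(round(nb, 4))
```

With three true positives and one false positive among eight individuals, the net benefit at 0.6 is 3/8 − (1/8) × 1.5 = 0.1875. Repeating the calculation over a grid of thresholds yields the decision curve.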

These findings collectively demonstrate the clinical utility of our diagnostic models, particularly the nomogram, which provides a practical tool for early risk stratification and personalised intervention in periodontitis management. The integration of multiple machine learning approaches and rigorous validation across independent datasets strengthens the reliability and translational potential of our diagnostic framework.

Immune infiltration analysis

To elucidate the immune microenvironment in periodontitis, we conducted comprehensive immune infiltration analysis. The expression levels of diagnostic biomarkers showed significant correlations with Immune Score, ESTIMATE Score, and Stromal Score (Figure 4A-C). Notably, CSF2RB, COL15A1, MME, and CYP24A1 exhibited strong positive correlations with all 3 scores, suggesting their potential roles in regulating immune cell infiltration and shaping the immune microenvironment. In contrast, NEFM expression demonstrated negative correlations with these scores.

Fig. 4.

Correlation between gene expression and immune score, ESTIMATE score, and stromal score. A-C, Scatter plots of log2TPM (gene expression level) versus A, ESTIMATE score, B, immune score, and C, stromal score for the indicated genes. Each data point represents a single sample. The solid line represents the best-fit line based on Spearman correlation analysis. R, Spearman correlation coefficient; p, p-value; n, number of samples. D, Proportions of immune cells in different samples. Bar plots show the average proportion of each immune cell type in Healthy (red) and periodontitis (PD) (dark red) groups. The colour bars represent different immune cell types. E, Box plots of estimated immune cell proportions. Box plots show the distribution of the estimated proportion of each immune cell type in PD (yellow) and Healthy (green) groups. F, Correlations between immune cell types with significant differences in abundance between PD and Healthy groups. Red indicates positive correlation, blue indicates negative correlation, and an asterisk (*) indicates p < 0.05. G, Correlations between the five diagnostic genes and the immune cell types with significant differences in abundance between PD and Healthy groups.

To further characterize the immune landscape in periodontitis, we compared immune cell composition between periodontitis and healthy tissues (Figure 4D-E). Significant differences were observed in several immune cell populations, particularly Dendritic cells resting, Plasma cells, Neutrophils, and T cells CD4 memory activated, indicating their potential involvement in periodontitis pathogenesis.

Correlation network analysis revealed complex interactions among these differentially expressed immune cells (Figure 4F). For instance, T cells gamma delta showed positive correlation with Mast cells resting but negative correlation with T cells CD8. Furthermore, significant associations were identified between diagnostic biomarkers and specific immune cell populations (Figure 4G), providing insights into potential regulatory mechanisms underlying periodontitis progression.

These findings collectively suggest that the identified biomarkers not only serve as diagnostic indicators but also play crucial roles in modulating the immune microenvironment in periodontitis. The intricate network of immune cell interactions and their associations with biomarker expression patterns provide valuable insights for understanding the immunological basis of periodontitis and developing targeted therapeutic strategies.

Transcription factor regulatory network analysis

Through systematic analysis of the transcription factor (TF) regulatory network associated with periodontitis key diagnostic genes, we identified several key transcription factors and their target genes (Figure 5A, Supplementary Table 2). For example, the CSF2RB gene is regulated by NFIC, which is involved in immune system signalling; the COL15A1 gene is regulated by LHX3 and ARID3A, which are associated with extracellular matrix structure; and the transcription factor NFYA regulates both COL15A1 and CYP24A1, the latter of which plays a critical role in vitamin D metabolism.

Fig. 5.

Transcriptional regulatory network and drug prediction analysis for periodontitis diagnostic biomarkers. A, Transcriptional regulatory network of periodontitis diagnostic biomarkers. Red nodes represent periodontitis diagnostic biomarkers (CYP24A1, MME, NEFM, CSF2RB, COL15A1), while blue nodes indicate transcription factors. Edge weights represent regulatory interaction strengths. B, Drug-gene interaction network for periodontitis diagnostic biomarkers. Red nodes denote periodontitis diagnostic biomarkers, and yellow nodes represent potential therapeutic drugs predicted using the DGIdb database. Edge thickness and colour intensity correspond to interaction confidence scores, with thicker, darker red lines indicating higher prediction scores. C-D, Molecular docking analysis between periodontitis diagnostic biomarkers and predicted therapeutic compounds. The heatmap displays Vina docking scores (kcal/mol), with more negative values indicating stronger predicted binding affinities. CSF2RB showed the strongest binding affinity with Sargramostim (Vina score: −7.1), while MME exhibited the highest affinity with Candoxatril (Vina score: −7.9).

Functional annotation of these transcription factors revealed their involvement in diverse biological processes. Specifically, NFIC and ELK1 play key roles in immune regulation by modulating the expression of immune-related genes; LHX3 and ARID3A influence extracellular matrix stability through the regulation of COL15A1; and NFYA and ELK1 are involved in vitamin D metabolism by regulating CYP24A1, highlighting the multifunctional roles of these transcription factors in the pathological processes of periodontitis.

Topological analysis of the transcription factor regulatory network identified NFYA and SP1 as central hubs, regulating multiple key genes. These transcription factors form a multi-layered regulatory network through complex interactions, significantly influencing the progression of periodontitis. The broad regulatory roles of NFYA and SP1 suggest their central importance in the development and progression of periodontitis.
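
Hub identification of this kind usually reduces to counting how many target genes each transcription factor regulates. The sketch below does this for the TF-gene pairs named explicitly in this section only; the full network, including the SP1 edges, is in Supplementary Table 2 and is not reproduced here, so SP1 does not appear in the toy edge list.

```python
from collections import defaultdict

# TF -> target pairs stated in the text of this section (partial network).
edges = [
    ("NFIC", "CSF2RB"),
    ("LHX3", "COL15A1"),
    ("ARID3A", "COL15A1"),
    ("NFYA", "COL15A1"),
    ("NFYA", "CYP24A1"),
    ("ELK1", "CYP24A1"),
]


def out_degree(tf_edges):
    """Count the distinct target genes regulated by each transcription factor."""
    targets = defaultdict(set)
    for tf, gene in tf_edges:
        targets[tf].add(gene)
    return {tf: len(genes) for tf, genes in targets.items()}


degree = out_degree(edges)
top = max(degree.values())
hubs = sorted(tf for tf, d in degree.items() if d == top)
print(hubs)
```

On this partial edge list NFYA has the highest out-degree, since it is the only factor described as regulating two of the diagnostic genes.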

The analysis of the transcription factor regulatory network unveils the complex regulatory mechanisms of periodontitis-related genes. NFYA and SP1, as core transcription factors, may impact the progression of periodontitis by regulating extracellular matrix remodelling and metabolic processes. Additionally, the immune regulatory functions of NFIC and ELK1 further emphasise the critical role of the immune system in periodontitis. These findings not only deepen our understanding of the molecular mechanisms underlying periodontitis but also provide potential molecular targets for future diagnostic and therapeutic strategies.

In summary, this study systematically elucidates the regulatory mechanisms of periodontitis-related genes through transcription factor network analysis, laying a solid theoretical foundation for subsequent functional studies and clinical applications.

Predicting potential drugs targeting diagnostic genes

To explore potential therapeutic agents for periodontitis, we queried the DGIdb database (https://www.dgidb.org/) for drugs targeting the five diagnostic marker genes. The analysis revealed potential interactions between these genes and various drugs (Figure 5B-D, Supplementary Table 3). Notably, the MME gene showed significant associations with the approved angiotensin receptor-neprilysin inhibitor SACUBITRIL/VALSARTAN (interaction score: 4.75) and the antidiarrhoeal neprilysin (enkephalinase) inhibitor RACECADOTRIL (score: 4.75). The CYP24A1 gene exhibited high interaction potential with the unapproved anti-inflammatory/anti-psoriatic agent CTA018 (score: 5.80) and the approved Vitamin D (score: 0.89). Furthermore, CSF2RB demonstrated a strong interaction with the marketed drug SARGRAMOSTIM (indication: Crohn’s disease, score: 13.05), suggesting its potential to modulate immune responses in periodontitis.

Molecular docking results (Figure 5C) further supported the predicted binding of some candidate drugs. For instance, CSF2RB showed a Vina Score of −7.1 with a candidate drug, indicating favourable predicted affinity for the target protein. Similarly, MME exhibited a docking score of −7.9 with CANDOXATRIL (Figure 5D), supporting its feasibility as a potential therapeutic agent. These findings highlight the repurposing potential of existing drugs, provide molecular-level theoretical support for the development of novel drugs, and underscore the promise of multi-target intervention strategies in the treatment of periodontitis.
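To give a rough sense of scale for the docking scores above, a score can be converted to an approximate dissociation constant through the standard relation ΔG = RT ln(Kd). The sketch below is illustrative only and rests on assumptions not stated in the text: that the Vina score is read as a binding free energy in kcal/mol, that the temperature is 298.15 K, and that the standard state is 1 M. Docking scores are empirical estimates, so the resulting values indicate an order of magnitude at best. The function name `approx_kd_molar` is hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def approx_kd_molar(delta_g_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy estimate to an approximate Kd in mol/L.

    Uses delta_G = R * T * ln(Kd), i.e. Kd = exp(delta_G / (R * T)).
    """
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


# Docking scores quoted in the paragraph above, treated as kcal/mol (assumption).
for label, score in [("CSF2RB candidate", -7.1), ("MME / CANDOXATRIL", -7.9)]:
    kd_micromolar = approx_kd_molar(score) * 1e6
    print(f"{label}: score {score} kcal/mol -> Kd of roughly {kd_micromolar:.1f} uM")
```

Under these assumptions both scores correspond to low-micromolar affinities, with the more negative score giving the smaller Kd.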

Discussion

In this study, we developed and validated a machine learning (ML)-based diagnostic model for periodontitis using transcriptomic data, immune infiltration analysis, and transcription factor regulatory networks. Our findings demonstrate the potential of ML algorithms to improve the accuracy and efficiency of periodontitis diagnosis, offering a complement to traditional diagnostic methods, which are often limited by subjectivity and low sensitivity. The identification of five key diagnostic biomarkers (CSF2RB, COL15A1, MME, NEFM, and CYP24A1) and the construction of robust ML models could provide a foundation for early detection and personalised treatment of periodontitis.

The differential expression analysis revealed 137 shared DEGs between periodontitis and healthy controls, with significant enrichment in immune-related pathways, underscoring the critical role of immune regulation in periodontitis pathogenesis. The WGCNA analysis further identified co-expressed gene modules strongly associated with periodontitis, particularly the ‘green’ module, which exhibited a positive correlation with the disease. These findings align with previous studies highlighting the involvement of immune and inflammatory processes in periodontitis progression.17, 18, 19

The five diagnostic biomarkers identified in this study – CSF2RB, COL15A1, MME, NEFM, and CYP24A1 – were consistently validated across multiple datasets and ML models, demonstrating their reliability and potential clinical utility. Notably, CSF2RB and COL15A1 were upregulated in periodontitis, suggesting their roles in promoting inflammatory responses,20,21 while NEFM was downregulated, possibly indicating a protective or inhibitory function.22 Functional enrichment analysis further linked these biomarkers to critical pathways such as cytokine-cytokine receptor interaction and chemokine signalling, which are known to be involved in periodontitis.23, 24, 25

While our models demonstrated robust performance across both training (GSE16134) and external validation datasets (GSE10334, combined test dataset), we acknowledge that these datasets share inherent similarities in patient populations. Only a limited number of datasets, all sourced from the GEO repository, were available, which may limit the generalizability of our findings to more diverse cohorts. Nevertheless, the consistent performance of the models, particularly the GLM and PLS algorithms achieving AUC > 0.85 in external validation, suggests that the identified biomarkers (CSF2RB, COL15A1, MME, NEFM, CYP24A1) capture core molecular signatures of periodontitis pathogenesis. To further validate generalizability, future studies should include independent cohorts with broader demographic and clinical heterogeneity, such as different ethnicities, disease stages, or comorbid conditions. The development of a nomogram for predicting periodontitis risk probability further enhances the clinical applicability of our findings. The nomogram demonstrated excellent calibration and discrimination, providing a practical tool for risk stratification and personalised intervention in periodontitis management. This aligns with the growing trend of integrating ML into healthcare for improved diagnostic accuracy and patient outcomes.26
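The AUC threshold quoted above has a direct interpretation: it is the probability that a randomly chosen periodontitis sample receives a higher model score than a randomly chosen healthy sample. The sketch below computes AUC this way in plain Python. It is an illustration under stated assumptions, not the authors' code: the function name `roc_auc` is hypothetical and the toy labels and scores are invented for the example, not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    labels: 1 for periodontitis, 0 for healthy; scores: model outputs.
    A tie between a case and a control counts as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 15 of 16 case-control pairs are ordered correctly
```

Because this pairwise definition depends only on the ranking of scores, it applies equally to any of the classifiers discussed, whatever scale their outputs use.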

Immune infiltration analysis revealed significant differences in immune cell composition between periodontitis and healthy tissues, particularly in dendritic cells, plasma cells, and T cells.17,27, 28, 29 The strong correlations between the diagnostic biomarkers and immune scores suggest that these biomarkers not only serve as diagnostic indicators but also play crucial roles in modulating the immune microenvironment.30, 31, 32, 33, 34 These findings provide valuable insights into the immunological basis of periodontitis and highlight potential targets for therapeutic intervention.

The transcription factor regulatory network analysis identified key regulators such as NFYA and SP1, which are involved in extracellular matrix remodelling and immune regulation.35 These transcription factors form a complex regulatory network that significantly influences periodontitis progression. The identification of these regulatory mechanisms deepens our understanding of the molecular basis of periodontitis and offers potential targets for future diagnostic and therapeutic strategies.

Finally, drug prediction and molecular docking analysis suggested potential therapeutic agents targeting the identified biomarkers. Notably, MME showed strong interactions with approved drugs such as SACUBITRIL/VALSARTAN,36 while CYP24A1 exhibited high binding potential with Vitamin D.37, 38, 39 These findings highlight the repurposing potential of existing drugs and provide molecular-level support for the development of novel multi-target intervention strategies in periodontitis treatment.

Limitations

Our study has three key limitations. (1) Geographic limitation: 92% of samples (567/616) originated from U.S. institutions, limiting generalizability to global populations. (2) Limited validation cohort diversity: samples from other regions were few, comprising 8 from France, 20 from South Korea, 6 from Japan, 9 from China, and 10 from Germany. Future validation efforts should prioritise cohorts from diverse geographic regions to enhance generalizability. (3) Lack of experimental validation: the current study is entirely computational and requires biological or clinical validation in the future. Additionally, prospective validation of biomarkers using longitudinal clinical data, the development of hybrid algorithms integrating transcriptomic signatures with clinical metrics (PPD/CAL), and exploration of point-of-care testing via gingival crevicular fluid represent critical next steps.

Conclusion

In conclusion, this study presents a comprehensive ML-based diagnostic framework for periodontitis, integrating transcriptomic data, immune infiltration analysis, and transcription factor regulatory networks. The identified biomarkers and machine learning models offer potential to complement traditional periodontitis diagnostics, enable personalised treatment, and accelerate therapeutic development. Future studies should focus on validating these findings in larger, diverse cohorts and exploring the therapeutic potential of the identified biomarkers and drugs in clinical settings.

Data availability statement

RNA expression datasets analysed in this study are publicly available in the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo) under accession numbers GSE16134 (PD=241, Healthy=69, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16134), GSE10334 (PD=183, Healthy=64, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10334), GSE23586 (PD=3, Healthy=3, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE23586), GSE223328 (PD=4, Healthy=4, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223328), GSE223924 (PD=10, Healthy=10, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223924), GSE243173 (PD=6, Healthy=4, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE243173), GSE273165 (PD=4, Healthy=5, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE273165), and GSE27993 (PD=5, Healthy=5, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE27993). All 616 samples originated from gingiva or periodontal ligament tissue.

Statistical analysis

Analyses were performed using R version 4.5.0 and Python version 3.12. More detailed information on R package versions and citation details can be found in Supplementary Table 4.

Author contributions

Ya’nan Cheng contributed to the conception and design of the study; Haiqiong Yang and Hui Mo analysed and interpreted the data; Ya’nan Cheng drafted the article; Pu Xu revised it critically for important intellectual content; all authors approved the final version to be published.

Ethical and legal declarations

The datasets analysed during the current study are available in the Gene Expression Omnibus (GEO database, https://www.ncbi.nlm.nih.gov/geo/).

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Funding

Hainan Provincial Natural Science Foundation High-level Talents Project of China (Grant No. 821RC725).

Conflict of interest

None disclosed.

Footnotes

Supplementary material associated with this article can be found in the online version at doi:10.1016/j.identj.2025.104028.

Appendix. Supplementary materials

Supplementary Figure 1. Batch-effect removal across GEO datasets. A: Principal component analysis of the GEO datasets before batch-effect removal. B: Principal component analysis of the GEO datasets after batch-effect removal.

mmc1.jpg (153.4KB, jpg)

Supplementary Figure 2. GO analysis and sample clustering. A: Gene Ontology (GO) enrichment analysis. B: WGCNA sample clustering to identify outlier samples.

mmc2.jpg (117.2KB, jpg)

Supplementary Figure 3. Machine learning-based gene selection and gene performance. A: Boruta feature selection. The Boruta algorithm was employed to identify the most relevant genes for classification. The plot shows the importance of each gene as determined by Boruta, highlighting the key genes that contribute most to the model's predictive power. This step reduces dimensionality and improves the interpretability of the machine learning model. B: Residuals of the machine learning models. C: RMSE loss after feature permutations for different machine learning models.

mmc3.jpg (206.3KB, jpg)

Supplementary Figure 4. GSEA enrichment plots of diagnostic genes. This heatmap illustrates the significant enrichment of diagnostic genes in KEGG pathways based on GSEA. The x-axis represents the names of the pathways, while the y-axis represents the normalised enrichment scores (NES). The colour scale indicates the -log10 p-values, with darker colours representing more significant enrichments.

mmc4.jpg (209.3KB, jpg)
mmc5.docx (12KB, docx)
mmc6.xlsx (28.5KB, xlsx)
mmc7.xlsx (11.2KB, xlsx)
mmc8.xlsx (26.4KB, xlsx)

References

  • 1. Bartold PM. Lifestyle and periodontitis: the emergence of personalized periodontics. Periodontol 2000. 2018;78(1):7–11. doi: 10.1111/prd.12237.
  • 2. Kwon T., Lamster I.B., Levin L. Current concepts in the management of periodontitis. Int Dent J. 2021;71(6):462–476. doi: 10.1111/idj.12630.
  • 3. Usui M., Onizuka S., Sato T., Kokabu S., Ariyoshi W., Nakashima K. Mechanism of alveolar bone destruction in periodontitis - periodontal bacteria and inflammation. Jpn Dent Sci Rev. 2021;57:201–208. doi: 10.1016/j.jdsr.2021.09.005.
  • 4. Valm AM. The structure of dental plaque microbial communities in the transition from health to dental caries and periodontal disease. J Mol Biol. 2019;431(16):2957–2969. doi: 10.1016/j.jmb.2019.05.016.
  • 5. Stavropoulos A., Bertl K., Spineli L.M., Sculean A., Cortellini P., Tonetti M. Medium- and long-term clinical benefits of periodontal regenerative/reconstructive procedures in intrabony defects: systematic review and network meta-analysis of randomized controlled clinical studies. J Clin Periodontol. 2021;48(3):410–430. doi: 10.1111/jcpe.13409.
  • 6. Tonetti M.S., Sanz M. Implementation of the new classification of periodontal diseases: decision-making algorithms for clinical practice and education. J Clin Periodontol. 2019;46(4):398–405. doi: 10.1111/jcpe.13104.
  • 7. Zitzmann N.U., Margolin M.D., Filippi A., Weiger R., Krastl G. Patient assessment and diagnosis in implant treatment. Aust Dent J. 2008;53(Suppl 1):S3–10. doi: 10.1111/j.1834-7819.2008.00036.x.
  • 8. Thalakiriyawa D.S., Dissanayaka WL. Advances in regenerative dentistry approaches: an update. Int Dent J. 2024;74(1):25–34. doi: 10.1016/j.identj.2023.07.008.
  • 9. Mao K., Thu K.M., Hung K.F., Yu O.Y., Hsung RT-C, Lam WY-H. Artificial intelligence in detecting periodontal disease from intraoral photographs: a systematic review. Int Dent J. 2025;75(5):100883. doi: 10.1016/j.identj.2025.100883.
  • 10. Samaranayake L., Tuygunov N., Schwendicke F., et al. The transformative role of artificial intelligence in dentistry: a comprehensive overview. part 1: fundamentals of ai, and its contemporary applications in dentistry. Int Dent J. 2025;75(2):383–396. doi: 10.1016/j.identj.2025.02.005.
  • 11. Tuygunov N., Samaranayake L., Khurshid Z., et al. The transformative role of artificial intelligence in dentistry: a comprehensive overview part 2: the promise and perils, and the international dental federation communique. Int Dent J. 2025;75(2):397–404. doi: 10.1016/j.identj.2025.02.006.
  • 12. Wang C.W., Hao Y., Di Gianfilippo R., et al. Machine learning-assisted immune profiling stratifies peri-implantitis patients with unique microbial colonization and clinical outcomes. Theranostics. 2021;11(14):6703–6716. doi: 10.7150/thno.57775.
  • 13. Adeoye J., Su YX. Artificial intelligence in salivary biomarker discovery and validation for oral diseases. Oral Dis. 2024;30(1):23–37. doi: 10.1111/odi.14641.
  • 14. Liu Y., Cheng Y., Song Y., Cai D., Zhang N. Oral screening of dental calculus, gingivitis and dental caries through segmentation on intraoral photographic images using deep learning. BMC Oral Health. 2024;24(1):1287. doi: 10.1186/s12903-024-05072-1.
  • 15. Natarajan P.M., Yadalam PK. Decoding epigenetic enhancer-promoter interactions in periodontitis via transformer-GAN: a deep learning framework for inflammatory gene regulation and biomarker discovery. Int Dent J. 2025;75(6) doi: 10.1016/j.identj.2025.103879.
  • 16. Li R., He S., Qin T., et al. Glycosylation gene expression profiles enable prognosis prediction for colorectal cancer. Sci Rep. 2025;15(1):798. doi: 10.1038/s41598-024-84300-8.
  • 17. Hajishengallis G. Periodontitis: from microbial immune subversion to systemic inflammation. Nat Rev Immunol. 2015;15(1):30–44. doi: 10.1038/nri3785.
  • 18. Pan W., Wang Q., Chen Q. The cytokine network involved in the host immune response to periodontitis. Int J Oral Sci. 2019;11(3):30. doi: 10.1038/s41368-019-0064-z.
  • 19. López-Valverde N., Quispe-López N., Blanco Rueda J.A. Inflammation and immune response in the development of periodontal disease: a narrative review. Front Cell Infect Microbiol. 2024;14 doi: 10.3389/fcimb.2024.1493818.
  • 20. Klemm F., Möckl A., Salamero-Boix A., et al. Compensatory CSF2-driven macrophage activation promotes adaptive resistance to CSF1R inhibition in breast-to-brain metastasis. Nat Cancer. 2021;2(10):1086–1101. doi: 10.1038/s43018-021-00254-0.
  • 21. Chen C., Xie Z., Ni Y., He Y. Screening immune-related blood biomarkers for DKD-related HCC using machine learning. Front Immunol. 2024;15 doi: 10.3389/fimmu.2024.1339373.
  • 22. Pang Y., Li L., Yang Y., Shen Y., Xu X., Li J. LncRNA-ANAPC2 and lncRNA-NEFM positively regulates the inflammatory response via the miR-451/npr2/hdac8 axis in grass carp. Fish Shellfish Immunol. 2022;128:1–6. doi: 10.1016/j.fsi.2022.07.014.
  • 23. Chen X., Lei H., Cheng Y., et al. CXCL8, MMP12, and MMP13 are common biomarkers of periodontitis and oral squamous cell carcinoma. Oral Dis. 2024;30(2):390–407. doi: 10.1111/odi.14419.
  • 24. Gao S., Lin M., Chen W., Chen X., Tian Z., Jia T., et al. Identification of potential diagnostic biomarkers associated with periodontitis by comprehensive bioinformatics analysis. Sci Rep. 2024;14(1):93. doi: 10.1038/s41598-023-50410-y.
  • 25. Xu X., Li T., Tang J., et al. CXCR4-mediated neutrophil dynamics in periodontitis. Cell Signal. 2024;120 doi: 10.1016/j.cellsig.2024.111212.
  • 26. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56. doi: 10.1038/s41591-018-0300-7.
  • 27. Ren Z., Xue Y., Zhang H., et al. Systemic immune-inflammation index and systemic inflammation response index are associated with periodontitis: evidence from NHANES 2009 to 2014. Int Dent J. 2024;74(5):1033–1043. doi: 10.1016/j.identj.2024.03.019.
  • 28. Kwon T., Lamster I.B., Levin L. Current Concepts in the Management of Periodontitis. Int Dent J. 2021;71(6):462–476. doi: 10.1111/idj.12630.
  • 29. Li X., Wang H., Yu X., Saha G., et al. Maladaptive innate immune training of myelopoiesis links inflammatory comorbidities. Cell. 2022;185(10):1709–1727.e18. doi: 10.1016/j.cell.2022.03.043.
  • 30. Jerby-Arnon L., Neftel C., Shore M.E., et al. Opposing immune and genetic mechanisms shape oncogenic programs in synovial sarcoma. Nat Med. 2021;27(2):289–300. doi: 10.1038/s41591-020-01212-6.
  • 31. Zakarya R., Chan Y.L., Rutting S., et al. BET proteins are associated with the induction of small airway fibrosis in COPD. Thorax. 2021;76(7):647–655. doi: 10.1136/thoraxjnl-2020-215092.
  • 32. Chuang L.S., Villaverde N., Hui K.Y., et al. A frameshift in CSF2RB predominant among Ashkenazi Jews increases risk for Crohn's disease and reduces monocyte signaling via GM-CSF. Gastroenterology. 2016;151(4):710–723.e2. doi: 10.1053/j.gastro.2016.06.045.
  • 33. Chen H., Peng L., Wang Z., He Y., Zhang X. Integrated machine learning and bioinformatic analyses constructed a network between mitochondrial dysfunction and immune microenvironment of periodontitis. Inflammation. 2023;46(5):1932–1951. doi: 10.1007/s10753-023-01851-0.
  • 34. Li D., Zhao W., Zhang X., Lv H., Li C., Sun L. NEFM DNA methylation correlates with immune infiltration and survival in breast cancer. Clin Epigenetics. 2021;13(1):112. doi: 10.1186/s13148-021-01096-4.
  • 35. Ventura I., Revert F., Revert-Ros F., Gómez-Tatay L., Prieto-Ruiz J.A., Hernández-Andreu JM. SP1 and NFY regulate the expression of PNPT1, a gene encoding a mitochondrial protein involved in cancer. Int J Mol Sci. 2022;23(19):11399. doi: 10.3390/ijms231911399.
  • 36. Krittanawong C., Kitai T. Pharmacogenomics of angiotensin receptor/neprilysin inhibitor and its long-term side effects. Cardiovasc Ther. 2017;35(4) doi: 10.1111/1755-5922.12272.
  • 37. Meyer M.B., Lee S.M., Towne J.M., et al. In vivo contribution of Cyp24a1 promoter vitamin d response elements. Endocrinology. 2024;165(11):bqae134. doi: 10.1210/endocr/bqae134.
  • 38. Milan K.L., Ramkumar KM. Regulatory mechanisms and pathological implications of CYP24A1 in vitamin D metabolism. Pathol Res Pr. 2024;264 doi: 10.1016/j.prp.2024.155684.
  • 39. Fuchs M.A., Grabner A., Shi M., et al. Intestinal Cyp24a1 regulates vitamin D locally independent of systemic regulation by renal Cyp24a1 in mice. J Clin Invest. 2024;135(4) doi: 10.1172/JCI179882.


Articles from International Dental Journal are provided here courtesy of Elsevier
