Abstract
Primary Sjögren’s syndrome (pSS) is a chronic, systemic autoimmune disease that mostly affects the exocrine glands. This debilitating condition is complex, and specific treatments remain unavailable. Novel diagnostic models for early screening are needed. Four gene expression profiling datasets were downloaded from the Gene Expression Omnibus database. The ‘limma’ software package was used to identify differentially expressed genes (DEGs). A random forest (RF) supervised classification algorithm was used to screen disease-specific genes, and three machine learning algorithms, artificial neural networks (ANN), RF, and support vector machines (SVM), were used to build pSS diagnostic models. Model performance was measured using the area under the receiver operating characteristic curve (AUC). Immune cell infiltration was investigated using the CIBERSORT algorithm. A total of 96 DEGs were identified. Using an RF classifier, a set of 14 signature genes that are pivotal in transcription regulation and disease progression in pSS was identified. Diagnostic models for pSS built with ANN, RF, and SVM achieved AUCs of 0.972, 1.00, and 0.9742, respectively, in the training set and 0.766, 0.8321, and 0.8223 in the validation set. The RF model produced the best prediction performance of the three models tested. As a result, an early predictive model for pSS was successfully developed with high diagnostic performance, providing a valuable resource for the screening and early diagnosis of pSS.
Subject terms: Computational biology and bioinformatics, Machine learning
Introduction
Primary Sjögren’s syndrome (pSS) is a chronic, systemic autoimmune disorder1,2 characterized by xerostomia and xerophthalmia, which are caused by lymphocytic infiltration of the salivary and lacrimal glands2. In addition, the extra-glandular manifestations of pSS can affect the joints, lungs, kidneys, liver, nervous system, and musculoskeletal system3. The prevalence of pSS is higher in females than in males, with an average female-to-male ratio of 9:1. Diagnosis of pSS is based on clinical signs and symptoms, serological tests for autoantibody biomarkers, and salivary gland histopathology4. Owing to disease heterogeneity and its complex clinical phenotypes, the underlying pathogenesis remains unclear. Therefore, identifying biomarkers and constructing novel diagnostic models for pSS are important in understanding disease progression.
Diagnostic models can be developed using machine learning algorithms such as random forest (RF), support vector machines (SVM), and artificial neural networks (ANN). In the absence of a priori assumptions, RF analysis can identify hidden factors that distinguish between case and control groups with a high level of predictive accuracy5. ANN-based algorithms, which underlie deep learning, can help identify patterns and features in large volumes of data6,7. ANNs learn to recognize patterns in data from examples without assuming anything about the nature or interrelationships of the data. In comparison with conventional models based on polynomials, linear regression, and statistics, ANNs are competitive8,9. An SVM is a machine-learning algorithm that uses multivariate statistical analysis to classify and predict individuals10. With SVM, high-dimensional data can be effectively handled, and classification results can be obtained without overfitting11. The identification of reliable and efficient biomarkers that assist in early diagnosis of pSS would be of great benefit in implementing effective interventions. Li et al.12 identified potential biomarkers for pSS disease progression using transcriptome sequencing and clinical data, and a diagnostic model for pSS has been constructed using circRNAs and clinical features (AUC = 0.93)13. Additionally, Nishikawa et al. reported that serological biomarkers may be potential therapeutic targets for pSS14. To date, machine-learning techniques have already been applied successfully in clinical settings for diagnosis and outcome prediction in a range of diseases15,16.
The central idea of genomic medicine is that outcomes are improved when genetic diagnoses and genotype-individualized treatments are augmented by symptom-based diagnostics. To develop a transcriptome diagnostic model for pSS, microarray data were gathered from the Gene Expression Omnibus (GEO). Through bioinformatic analysis, we identified genes that were differentially expressed in pSS patients by comparing pSS samples with samples from individuals without pSS. First, RF was used to find the genes that mattered most for classification. We then developed diagnostic models for pSS patients using three machine learning algorithms: ANN, RF, and SVM. Receiver operating characteristic (ROC) curves were used to evaluate the diagnostic performance of the chosen biomarkers. In addition, we validated the accuracy and reliability of the models using an external GEO cohort (see Fig. 1).
Figure 1.
Flow-chart illustrating the study protocol.
Materials and methods
Data download and processing
We downloaded microarray expression datasets from the National Center for Biotechnology Information Gene Expression Omnibus database (NCBI GEO; https://www.ncbi.nlm.nih.gov/geo/). As shown in Table 1, we selected four datasets containing patients with pSS and normal controls. To create a large training cohort, we merged GSE23117, GSE40611, and GSE84844 and used the ‘ComBat’ algorithm from the ‘sva’ R package (version 3.46.0) to remove batch effects between the training datasets17. Probe IDs were converted to gene symbols based on the annotation of the microarray platforms. Where multiple probes mapped to the same gene, the maximum mean expression value among those probes represented the gene's expression level. The final training dataset consisted of 57 pSS patients and 53 non-pSS samples. GSE66795 was used as the validation dataset.
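The probe-to-gene step can be illustrated with a short sketch. It assumes one reading of the rule above: for each gene, keep the probe whose mean expression across samples is highest. The function name `collapse_probes`, the probe IDs, and the toy values are hypothetical and are not taken from the study data.

```python
from statistics import mean

def collapse_probes(expr, probe_to_gene):
    """Keep, for each gene, the probe with the highest mean expression.

    expr: {probe_id: [expression value per sample]}
    probe_to_gene: {probe_id: gene symbol}; unannotated probes are dropped.
    """
    best = {}  # gene -> (mean expression, values) of the best probe seen so far
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe has no gene annotation on this platform
        probe_mean = mean(values)
        if gene not in best or probe_mean > best[gene][0]:
            best[gene] = (probe_mean, values)
    return {gene: values for gene, (_, values) in best.items()}

# Toy data: hypothetical probe IDs and values, for illustration only.
expr = {
    "p1": [5.0, 6.0, 7.0],
    "p2": [8.0, 9.0, 10.0],
    "p3": [2.0, 2.5, 3.0],
    "p4": [1.0, 1.0, 1.0],   # not annotated, so it is dropped
}
probe_to_gene = {"p1": "SAMD9", "p2": "SAMD9", "p3": "NFIC"}
gene_expr = collapse_probes(expr, probe_to_gene)
print(gene_expr)
```

Batch correction with ComBat is a separate step and is not reproduced here.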
Table 1.
Source of GEO datasets.
Screening for differentially expressed genes
In the training set, differentially expressed genes (DEGs) were identified using the R package ‘limma’ (version 3.54.2)18, with an adjusted P-value < 0.05 and |log2 fold-change (log2FC)| ≥ 1 as thresholds. Heatmap and volcano plot visualizations of the DEGs, including cluster analysis, were produced using the R packages ‘pheatmap’ (version 1.0.12) and ‘ggplot2’ (version 3.4.2), respectively.
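The thresholding rule can be sketched as follows. The text does not state which multiple-testing correction was applied; this sketch assumes the Benjamini-Hochberg procedure, which is the default adjustment in limma's `topTable`. The gene names, fold changes, and P-values are made-up toy inputs, and the moderated t-statistics that limma itself computes are not reproduced.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def classify_deg(log2fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Apply the thresholds from the text: adjusted P < 0.05 and |log2FC| >= 1."""
    if adj_p < p_cutoff and abs(log2fc) >= fc_cutoff:
        return "up" if log2fc > 0 else "down"
    return "ns"

# Toy inputs (hypothetical): gene -> (log2FC, raw P-value)
raw = {"geneA": (1.8, 0.001), "geneB": (-1.2, 0.01),
       "geneC": (0.6, 0.0001), "geneD": (2.5, 0.2)}
genes = list(raw)
adj = bh_adjust([raw[g][1] for g in genes])
calls = {g: classify_deg(raw[g][0], p) for g, p in zip(genes, adj)}
print(calls)
```

In this toy example geneC fails the fold-change cutoff despite a very small P-value, and geneD fails the significance cutoff despite a large fold change, so only geneA and geneB are called as DEGs.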
Functional enrichment analysis and construction of protein-protein interaction network
To better understand the biological significance of the DEGs, we conducted GO and KEGG enrichment analyses using the R package ‘clusterProfiler’ (version 4.7.1)19,20. Pathways with p < 0.05 and corrected p < 0.05 were considered significantly enriched. The STRING database (https://cn.string-db.org/) was used to analyze the protein-protein interaction (PPI) network, which was visualized using the ‘Cytoscape’ software package (v3.7).
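Over-representation analysis of the kind performed by clusterProfiler is based on the hypergeometric distribution. The sketch below computes the upper-tail probability for a single term; the function name `enrichment_p` and the example counts (background size, term size, overlap) are hypothetical, and only the DEG count of 96 comes from the text.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric P-value, P(X >= k).

    k: DEGs annotated to the term
    n: total number of DEGs
    K: genes annotated to the term in the background
    N: total number of background genes
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical example: 12 of 96 DEGs fall in a 200-gene term,
# against a background of 20,000 genes.
p = enrichment_p(k=12, n=96, K=200, N=20000)
print(p < 0.05)
```

With these illustrative counts only about one DEG would be expected in the term by chance (96 × 200 / 20,000 = 0.96), so an overlap of 12 gives a very small P-value. In practice the P-values of all tested terms would then be corrected for multiple testing.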
Screening for signature genes by random forest
To establish an RF model based on the DEGs, the R package ‘randomForest’ (version 4.7-1.1) was used21. Signature genes were selected based on the minimum cross-validation error. We set the number of decision trees to 500 and the random seed to 12345678. The importance of each gene in the RF model was scored using the Gini index, and genes with an importance score > 1 were selected as signature genes. The ‘Heatmap’ function in R was then used to cluster signature genes bidirectionally based on their expression profiles.
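To make the Gini-based importance score concrete, the sketch below computes the Gini impurity of a node, the impurity decrease produced by one split, and an average of such decreases across trees, and then applies the score > 1 cutoff from the text. Weighting each node by its sample count and the toy per-tree values are assumptions of this sketch; it does not grow a forest and is not the ‘randomForest’ implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Sample-weighted impurity decrease when `parent` is split into left/right."""
    return (len(parent) * gini(parent)
            - len(left) * gini(left)
            - len(right) * gini(right))

def mean_decrease(per_tree_decreases):
    """Average a gene's total impurity decrease over all trees."""
    return sum(per_tree_decreases) / len(per_tree_decreases)

parent = ["pSS"] * 4 + ["HC"] * 4
perfect = gini_decrease(parent, ["pSS"] * 4, ["HC"] * 4)           # classes fully separated
useless = gini_decrease(parent, ["pSS", "pSS", "HC", "HC"],
                        ["pSS", "pSS", "HC", "HC"])                # class mix unchanged

# Hypothetical per-tree decreases for two genes across three trees.
importance = {
    "geneA": mean_decrease([perfect, perfect, 0.0]),
    "geneB": mean_decrease([useless, 0.0, 0.5]),
}
signature = [gene for gene, score in importance.items() if score > 1]
print(importance, signature)
```

A gene that repeatedly produces clean splits accumulates a large mean decrease and passes the cutoff, whereas a gene whose splits leave the class mix unchanged does not.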
Construction of the diagnostic model using machine learning
To reduce residual batch effects between the pSS and normal groups, we converted the expression data of the signature genes into a ‘Gene Score’ using the min-max method. The procedure was as follows. First, the median expression of each gene across all samples was calculated. For an upregulated gene, expression in a sample was marked as 1 if it was greater than the median expression value of that gene and as 0 otherwise. Conversely, for a downregulated gene, expression was marked as 0 if it was greater than the median and as 1 otherwise. The resulting ‘Gene Score’ table was used for ANN analysis. The ANN model was implemented using the ‘neuralnet’ package in R (version 1.44.2)22. The neuralnet package builds feedforward neural networks with one or more hidden layers23 and includes several popular learning algorithms, such as backpropagation and resilient backpropagation. Additionally, learning rates and momentum can be customized. For smaller datasets, the neuralnet package provides fast and efficient performance24. The random seed was set to 12345678. The model consisted of three types of layers: the input layer, holding the ‘Gene Score’ of the signature genes; the hidden layers; and the output layer, with two nodes (control/pSS). Using the expression ‘GeneExpression’ × ‘NeuralNetworkWeight’, we constructed a pSS disease diagnostic model. We also built two further predictive models, RF and SVM. Based on the hub gene set, SVM classifiers were constructed using the R package ‘e1071’ (version 1.7-13). The ‘randomForest’ R package (version 4.7-1.1) was used to train the RF classifier model. In the training and validation sets, ROC curves were generated using the ‘pROC’ package25, and the AUC represented the diagnostic value.
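The ‘Gene Score’ conversion described above is a per-gene median binarization, which can be sketched in a few lines. The function name `gene_scores`, the expression values, and the weights are hypothetical; in particular, the weights stand in for values that would come from a trained network and are not the study's fitted weights.

```python
from statistics import median

def gene_scores(expr, direction):
    """Binarize expression around each gene's median across all samples.

    expr: {gene: [expression value per sample]}
    direction: {gene: "up" or "down"} (regulation of the gene in pSS)
    Upregulated genes score 1 above the median; downregulated genes score 1
    at or below it.
    """
    scores = {}
    for gene, values in expr.items():
        med = median(values)
        if direction[gene] == "up":
            scores[gene] = [1 if v > med else 0 for v in values]
        else:
            scores[gene] = [0 if v > med else 1 for v in values]
    return scores

# Toy expression values for four samples (hypothetical numbers).
expr = {"SAMD9": [2.0, 8.0, 5.0, 9.0], "NFIC": [7.0, 3.0, 6.0, 1.0]}
direction = {"SAMD9": "up", "NFIC": "down"}
scores = gene_scores(expr, direction)

# 'GeneExpression' x 'NeuralNetworkWeight': a weighted sum per sample,
# using placeholder weights.
weights = {"SAMD9": 1.5, "NFIC": 0.5}
sample_scores = [sum(scores[g][i] * weights[g] for g in scores) for i in range(4)]
print(scores, sample_scores)
```

Because each gene is reduced to 0 or 1 relative to its own median, the score depends on the rank of a sample within the cohort rather than on absolute intensity, which is one reason such a transformation is less sensitive to platform-specific scaling.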
Identification of immune cell infiltration
With the LM22 signature as a reference, CIBERSORT26 was used to characterize infiltrating immune cells in the pSS and normal groups of the training set. Spearman’s correlations among infiltrating immune cell types were calculated and visualized using the R package ‘corrplot’ (version 0.92).
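Spearman's correlation is the Pearson correlation of the ranks, with tied values sharing their average rank. A minimal sketch is given below; the function names and the cell-fraction vectors are hypothetical illustrations and are not CIBERSORT output from this study.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical estimated fractions of two cell types across five samples.
b_cells = [0.10, 0.15, 0.12, 0.20, 0.18]
t_cells = [0.30, 0.22, 0.28, 0.15, 0.19]
print(round(spearman(b_cells, t_cells), 3))
```

Applying `spearman` to every pair of cell types yields the matrix that a correlation plot displays.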
Results
Screening of DEGs and functional enrichment analysis
We combined the three datasets (GSE23117, GSE40611, and GSE84844) into a training cohort. The batch effect was mitigated after applying the ‘ComBat’ algorithm (Fig. 2A,B). In total, 96 DEGs were found between the pSS and normal samples using the ‘limma’ package, of which 85 were upregulated (SAMD9, GIMAP2, and DDX60, among many others) and 11 were downregulated (for example, MLXIP, WASF2, and NFIC). Supplementary Table 1 presents the list of DEGs. Gene heatmaps (Fig. 2C) and volcano plots (Fig. 2D) were used to represent the DEG distributions. In the GO functional classification, DEGs were mostly enriched in defense response to virus, the type I interferon signaling pathway, and cellular response to type I interferon. KEGG functional analysis revealed that the 96 DEGs were associated with the intestinal immune network for IgA production and the NOD-like receptor signaling pathway (Fig. 2E,F). Using the STRING online database to analyze the PPI network, we obtained 400 protein pairs (96 proteins in total). Pairs with a combined score of more than 0.6 were visualized using the ‘Cytoscape’ software. Generally, the higher the degree of a node, the more important it is. CXCL10, NDC80, ISG15, SAMD9L, and HERC5 were identified as hub genes of the network (Fig. 3).
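The degree-based notion of a hub gene can be sketched as a simple count of retained interactions per node. The edge list below uses placeholder gene names and made-up combined scores; only the 0.6 cutoff comes from the text.

```python
from collections import Counter

def node_degrees(edges, min_score=0.6):
    """Count interaction partners per node, keeping edges with score > min_score."""
    degree = Counter()
    for a, b, score in edges:
        if score > min_score:
            degree[a] += 1
            degree[b] += 1
    return degree

# Hypothetical PPI edges: (protein, protein, combined score)
edges = [
    ("geneA", "geneB", 0.90),
    ("geneA", "geneC", 0.80),
    ("geneB", "geneC", 0.95),
    ("geneA", "geneD", 0.70),
    ("geneE", "geneF", 0.50),  # below the cutoff, so it is ignored
]
degree = node_degrees(edges)
hubs = [gene for gene, _ in degree.most_common(1)]
print(dict(degree), hubs)
```

Ranking nodes by this count and taking the top entries gives the hub genes; other centrality measures could be used instead, but the text refers only to degree.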
Figure 2.
Analyses of DEGs in the training dataset. (A, B) Distribution and PCA before and after removing the batch effect. (C) Volcano plot of DEGs. (D) Heatmap of the 50 DEGs. (E) GO function enrichment analysis of the DEGs. (F) KEGG enrichment analysis of the DEGs.
Figure 3.
A network view of the pSS PPI network. Color is used to show the degree, with yellower genes indicating a higher degree, and bluer genes indicating a lower degree.
Random forest screening for signature genes
To obtain more reliable pSS signature genes, the 96 DEGs were input into the RF classifier. For 1 to 96 variables, a recurrent RF classification was carried out and used to calculate the average error rate of the model. Ultimately, 401 trees was selected as the final parameter by analyzing the relationship between the model error and the number of decision trees (Fig. 4A). The relative importance of each gene was determined based on MeanDecreaseGini (Fig. 4B). We selected 14 DEGs with MeanDecreaseGini > 1 as the pSS-signature genes for ANN analysis, 12 of which (SAMD9, DDX60, CXCL10, GIMAP2, NDC80, GMNN, CALHM6, TRIM22, SAMD9L, EVI2A, KBTBD8, and DDX60L) were upregulated and two of which (MLXIP and NFIC) were downregulated. Figure 4B shows that among the fourteen variables, SAMD9 and DDX60 were the most important, followed by CXCL10, GIMAP2, MLXIP, and NDC80. The heatmap (Fig. 4C) showed that the expression of the 14 pSS signature genes could distinguish pSS samples from normal samples.
Figure 4.
Random Forest analysis. (A) Correlation plot between the number of RF trees and model error. (B) Gene importance in the RF classifier based on the Gini index. The importance index is on the x-axis, and the gene is on the y-axis. (C) The heatmap of the fourteen key genes identified by RF.
Construction and validation of the Machine Learning model
The diagnostic models we developed for pSS were based on three machine learning algorithms. First, we converted the expression of the 14 pSS-signature genes into ‘Gene Score’ in order to perform an ANN analysis. The ANN consisted of three layers (input, hidden, and output). The numbers of nodes in the input and output layers were 14 (the number of input signature genes) and two (pSS or HC (non-pSS)), respectively (Fig. 5). The pSS-specific scoring model was formulated using the expression ‘GeneExpression’ × ‘NeuralNetworkWeight’. The area under the ROC curve was used to measure performance. In the training dataset, the AUC was 0.972, accuracy was 0.9812, precision was 1.00, recall was 0.9661, and F1-score was 0.9828 (Fig. 6A and Supplementary Table 2). In the test dataset, the AUC was 0.766, accuracy was 0.7714, precision was 0.9277, recall was 0.5878, and F1-score was 0.7196 (Fig. 6B and Supplementary Table 3).
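The AUC has a direct probabilistic reading: it is the probability that a randomly chosen pSS sample receives a higher model score than a randomly chosen control, with ties counted as one half. The sketch below computes it by comparing all case-control pairs; the labels and scores are toy values, and this is an illustration of the quantity rather than the ‘pROC’ implementation.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for pSS, 0 for control; scores: model output per sample.
    Tied pairs contribute 0.5.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: one pSS sample scores below one control.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

Here three of the four case-control pairs are ordered correctly, so the AUC is 0.75. The pairwise comparison is quadratic in sample size, which is adequate for cohorts of the size used here.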
Figure 5.
Results of artificial neural networks visualized.
Figure 6.
Evaluation of training and validation datasets using ROC curves and their AUC values. (A) ROC curve of ANN in training set. (B) ROC curve of ANN in testing set. (C) ROC curve of RF and SVM in training set. (D) ROC curve of RF and SVM in testing set.
In the training set, the RF model achieved perfect scores (values = 1) for AUC, accuracy, precision, recall, and F1-score, while the SVM model achieved a slightly lower AUC of 0.9742, with accuracy, precision, recall, and F1-score values of 0.9455, 0.9322, 0.9649, and 0.9483, respectively (Fig. 6C and Supplementary Tables 4 and 5). In the testing set, the RF model achieved an AUC of 0.8321, with accuracy, precision, recall, and F1-score values of 0.8188, 0.8188, 1.00, and 0.9003, respectively. Similarly, the SVM model achieved an AUC of 0.8223, with accuracy, precision, recall, and F1-score values of 0.8188, 0.8188, 1.00, and 0.9003, respectively (Fig. 6D and Supplementary Tables 6 and 7). These results indicate that the models may discriminate effectively between pSS and non-pSS samples. The RF model produced the best prediction performance of the three models tested. In the end, we constructed a diagnostic model based on 14 genes using RF.
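The reported F1-scores follow from precision and recall as their harmonic mean, which gives a quick consistency check on the values above. The sketch below recomputes F1 for two of the reported precision and recall pairs; the confusion-matrix counts in the second part are hypothetical and serve only to show how the four metrics are derived.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }

# (precision, recall, reported F1) as given in the text.
reported = {
    "SVM, training": (0.9322, 0.9649, 0.9483),
    "RF, testing": (0.8188, 1.00, 0.9003),
}
for name, (precision, recall, f1) in reported.items():
    print(name, round(f1_score(precision, recall), 4), "reported:", f1)

# Hypothetical counts: 8 true positives, 2 false positives,
# 9 true negatives, 1 false negative.
example = metrics(tp=8, fp=2, tn=9, fn=1)
print(example)
```

The recomputed values agree with the reported ones to within the rounding of the four-decimal inputs.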
Immune cell infiltration analysis
We used CIBERSORT to analyze 22 immune cell phenotypes in the training set to determine whether they were associated with the pSS and non-pSS groups and with immune infiltration. The following phenotypes were found to be relatively abundant in pSS: naïve and memory B cells; CD4 memory resting, CD4 memory activated, and γδ T cells; M0 and M2 macrophages; dendritic cells; and both activated and resting mast cells. Meanwhile, in HC, the following phenotypes were relatively abundant: plasma cells; CD8 and regulatory (Tregs) T cells; resting NK cells; monocytes; mast cells; and neutrophils (Fig. 7A). The measured correlation for immune cell infiltration is shown in Fig. 7B.
Figure 7.
A review of the immunological landscape of pSS. (A) Twenty-two immune-cell subtypes were compared between the HC and pSS groups. (B) Correlation analysis of infiltrating immune cells.
Discussion
Currently, pSS is diagnosed based on functional (Schirmer’s test), serological (anti-Ro/SSA), and histological (labial minor salivary gland or salivary gland) tests27,28. However, due to a combination of the heterogeneity of the disease, its complex clinical phenotypes, and the lack of effective biomarkers for early screening, most patients are diagnosed with an advanced form of the disease on presentation. Thus, it is crucial to develop effective screening tools and assess risk factors early.
We obtained four datasets (GSE23117, GSE40611, GSE84844, and GSE66795) from the GEO in order to build and validate a diagnostic model for pSS. We identified 96 genes that are differentially expressed between the pSS and HC groups; enrichment analysis indicated that these DEGs were mostly involved in immunological processes. The ‘defense response to viruses’ and ‘type I interferon signaling pathway’ were the most enriched GO terms. These results are consistent with previous studies that have shown a relationship between interferon signaling and pSS. Titers of anti-Ro and anti-La autoantibodies are positively associated with overexpression of type I interferon-inducible genes in pSS29,30. Type I interferons are important components of the innate immune system that facilitate inhibition of viral infections via adaptive immunity31. The intestinal immune network governing IgA production was observed to be the most enriched KEGG pathway in the pSS group. In normal physiology, host-gut microbiota interactions are complex and multifaceted. Exposure to gut microbes stimulates continuous diversification of B-cell repertoires and constant production of IgA antibodies, both T-dependent and T-independent32. Our analysis of GO and KEGG pathways revealed that these differentially expressed genes could be involved in the development of pSS.
Fourteen DEGs were identified by RF analysis: SAMD9, DDX60, CXCL10, GIMAP2, NDC80, GMNN, CALHM6, TRIM22, SAMD9L, EVI2A, KBTBD8, DDX60L, MLXIP, and NFIC. Our findings are consistent with those of previous studies. SAMD9 is a genetically regulated anti-inflammatory factor in patients with rheumatoid arthritis33.
It is estimated that DDX60L and DDX60 share 70% of their amino acid sequences34. The DDX60L gene is activated by interferons. In the innate immune system, DDX60L proteins recognize viral RNA molecules in order to protect against viral infections35. So far, there is little information available about the function of DDX60L. It has been shown that DDX60L is associated with HIV host factors36 and childhood obesity37. NDC80 encodes a component of the NDC80 kinetochore complex, which is responsible for organizing and stabilizing interactions between microtubules and kinetochores38. The GMNN gene regulates the cell cycle. By inhibiting DNA replication licensing and histone H4 acetylation, GMNN promotes cell proliferation39. It is thought that CALHM6 regulates infection-related immunity40. Apart from pSS, a number of other autoimmune diseases are thought to be influenced by CXCL10, which recruits immune cells to sites of inflammation41. The GIMAP family of proteins regulates lymphocyte apoptosis by acting as GTPases of immunity-associated proteins42. In lymphocytes, GIMAP2 heterodimerizes with the GIMAP7 protein to activate GIMAP7 function43,44. According to these studies, multiple GIMAP proteins contribute to the survival of T cells. Approximately 70% of pSS patients who meet the diagnostic criteria have serum autoantibodies against several intracellular proteins (e.g., TRIM21 (Ro52), La/SSB)45,46. Ro52/TRIM21 plays a crucial role in antibody-dependent pathogen neutralization47. A tumor suppressor, SAMD9L is repressed by the p53 pathway in breast and hepatocellular tissues48. In hematopoietic tissue, SAMD9L plays a crucial role in regulating cell proliferation49. It is possible that Evi2a is a lymphocyte-specific tumor suppressor, which could play a role in BCR activation50. KBTBD8 is a BBK protein that has been found in the Golgi apparatus and translocates to the forming spindle upon entry into mitosis51.
KBTBD8 has also been reported to be essential for the healthy function of ovarian epithelium52. MLXIP interacts with Max-like protein X (MLX) to activate transcription. Ovarian cancer cells migrate towards MLXLP, which was associated with a poor prognosis53. In mice, NFIC regulates the expression of PTEN/SENP8 and inhibits rheumatoid arthritis-induced inflammation54. Many of these genes have not yet been reported as being linked to pSS but have strong associations with other autoimmune disorders. A deeper understanding of the complex role these genes play in pSS requires further research.
We developed diagnostic prediction models for patients with pSS using three machine learning algorithms, namely ANN, RF, and SVM, based on 14 genes. The models achieved AUCs of 0.972, 1.00, and 0.9742, respectively, in the training dataset. However, the corresponding AUCs for the validation set were lower, at 0.766, 0.8321, and 0.8223. The prediction properties of our model were deemed satisfactory. Nevertheless, the sample size of our cohort was limited, and further studies with larger-scale cohorts are required to validate our findings.
In addition, we examined the immune microenvironment of pSS. Multiple studies have shown that B cells are associated with disease activity in pSS55, while CD4 + T cells in pSS undergo premature aging due to lymphopenia56. A significant increase in dendritic cells has been observed in patients with pSS, which is closely related to Type I interferons29; overexpression has also been observed in mast cells, which produce transforming growth factor β1 and promote tissue fibrosis57. Conversely, a major reduction in NKT-like cells has been observed in pSS, which may be contributing to the pathogenesis of the disease58. Researchers may be able to identify novel immunotherapies for pSS by further studying the host immune response.
This study has several limitations. First, large cohorts are needed for further validation of the diagnostic model. Second, the predictive performance of the different pSS diagnostic models needs to be compared and validated in a larger cohort.
Here, we proposed and externally verified a pSS diagnostic model. Our model is both specific and sensitive and shows great potential as a basis for the development of new diagnostic tools for pSS. We also explored the immune status of pSS, and our data provide the impetus for further analyses in order to gain a deeper understanding of the condition. Further research into the possible applications of our model in clinical settings is needed in order to improve patient outcomes.
Author contributions
Methodology, K.Y. and Q.W.; formal analysis, L.W. and Q.C.G.; writing (original draft preparation), K.Y.; writing (review and editing), S.T. All authors have read and agreed to the published version of the manuscript.
Data availability
The datasets analyzed during the current study are available in the GEO database (http://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE23117, GSE40611, GSE84844, and GSE66795.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-023-35864-4.
References
- 1.Psianou K, Panagoulias I, Papanastasiou AD, de Lastic AL, Rodi M, Spantidea PI, Degn SE, Georgiou P, Mouzaki A. Clinical and immunological parameters of Sjögren's syndrome. Autoimmun. Rev. 2018;17:1053–1064. doi: 10.1016/j.autrev.2018.05.005.
- 2.Nocturne G, Boudaoud S, Miceli-Richard C, Viengchareun S, Lazure T, Nititham J, Taylor KE, Ma A, Busato F, Melki J, et al. Germline and somatic genetic variations of TNFAIP3 in lymphoma complicating primary Sjogren's syndrome. Blood. 2013;122:4068–4076. doi: 10.1182/blood-2013-05-503383.
- 3.Stefanski AL, Tomiak C, Pleyer U, Dietrich T, Burmester GR, Dörner T. The diagnosis and treatment of Sjögren's syndrome. Dtsch. Arztebl. Int. 2017;114:354–361. doi: 10.3238/arztebl.2017.0354.
- 4.Negrini S, Emmi G, Greco M, Borro M, Sardanelli F, Murdaca G, Indiveri F, Puppo F. Sjögren's syndrome: A systemic autoimmune disease. Clin. Exp. Med. 2022;22:9–25. doi: 10.1007/s10238-021-00728-6.
- 5.Radice R, Ramsahai R, Grieve R, Kreif N, Sadique Z, Sekhon JS. Evaluating treatment effectiveness in patient subgroups: A comparison of propensity score methods with an automated matching approach. Int. J. Biostat. 2012;8:25. doi: 10.1515/1557-4679.1382.
- 6.Bahar E, Yoon H. Modeling and predicting the cell migration properties from scratch wound healing assay on cisplatin-resistant ovarian cancer cell lines using artificial neural network. Healthcare (Basel) 2021 doi: 10.3390/healthcare9070911.
- 7.Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85–117. doi: 10.1016/j.neunet.2014.09.003.
- 8.Shi HY, Lee KT, Lee HH, Ho WH, Sun DP, Wang JJ, Chiu CC. Comparison of artificial neural network and logistic regression models for predicting in-hospital mortality after primary liver cancer surgery. PLoS One. 2012;7:e35781. doi: 10.1371/journal.pone.0035781.
- 9.Harrison RF, Kennedy RL. Artificial neural network models for prediction of acute coronary syndromes using clinical data from the time of presentation. Ann. Emerg. Med. 2005;46:431–439. doi: 10.1016/j.annemergmed.2004.09.012.
- 10.Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Jr, Haussler D. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. U. S. A. 2000;97:262–267. doi: 10.1073/pnas.97.1.262.
- 11.Wu CC, Asgharzadeh S, Triche TJ, D'Argenio DZ. Prediction of human functional genetic networks from heterogeneous data using RVM-based ensemble learning. Bioinformatics. 2010;26:807–813. doi: 10.1093/bioinformatics/btq044.
- 12.Li N, Li L, Wu M, Li Y, Yang J, Wu Y, Xu H, Luo D, Gao Y, Fei X, et al. Integrated bioinformatics and validation reveal potential biomarkers associated with progression of primary Sjögren's syndrome. Front. Immunol. 2021;12:697157. doi: 10.3389/fimmu.2021.697157.
- 13.Li F, Liu Z, Zhang B, Jiang S, Wang Q, Du L, Xue H, Zhang Y, Jin M, Zhu X, et al. Circular RNA sequencing indicates circ-IQGAP2 and circ-ZC3H6 as noninvasive biomarkers of primary Sjögren's syndrome. Rheumatology (Oxford) 2020;59:2603–2615. doi: 10.1093/rheumatology/keaa163.
- 14.Nishikawa A, Suzuki K, Kassai Y, Gotou Y, Takiguchi M, Miyazaki T, Yoshimoto K, Yasuoka H, Yamaoka K, Morita R, et al. Identification of definitive serum biomarkers associated with disease activity in primary Sjögren's syndrome. Arthritis Res. Ther. 2016;18:106. doi: 10.1186/s13075-016-1006-1.
- 15.Shi M, Xu G. Development and validation of GMI signature based random survival forest prognosis model to predict clinical outcome in acute myeloid leukemia. BMC Med. Genomics. 2019;12:90. doi: 10.1186/s12920-019-0540-5.
- 16.Fidanza A, Stumpf PS, Ramachandran P, Tamagno S, Babtie A, Lopez-Yrigoyen M, Taylor AH, Easterbrook J, Henderson BEP, Axton R, et al. Single-cell analyses and machine learning define hematopoietic progenitor and HSC-like cells derived from human PSCs. Blood. 2020;136:2893–2904. doi: 10.1182/blood.2020006229.
- 17.Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–883. doi: 10.1093/bioinformatics/bts034.
- 18.Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47. doi: 10.1093/nar/gkv007.
- 19.Yu G, Wang LG, Han Y, He QY. clusterProfiler: An R package for comparing biological themes among gene clusters. OMICS. 2012;16:284–287. doi: 10.1089/omi.2011.0118.
- 20.Kanehisa M, Furumichi M, Sato Y, Kawashima M, Ishiguro-Watanabe M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 2023;51:D587–d592. doi: 10.1093/nar/gkac963.
- 21.Kursa MB. Robustness of random forest-based gene selection methods. BMC Bioinform. 2014;15:8. doi: 10.1186/1471-2105-15-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Beck MW. NeuralNetTools: Visualization and analysis tools for neural networks. J. Stat. Softw. 2018;85:1–20. doi: 10.18637/jss.v085.i11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Sinha R, Irulappan V, Patil BS, Reddy PCO, Ramegowda V, Mohan-Raju B, Rangappa K, Singh HK, Bhartiya S, Senthil-Kumar M. Low soil moisture predisposes field-grown chickpea plants to dry root rot disease: Evidence from simulation modeling and correlation analysis. Sci. Rep. 2021;11:6568. doi: 10.1038/s41598-021-85928-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Li DD, Chen T, Ling YL, Jiang Y, Li QG. A methylation diagnostic model based on random forests and neural networks for asthma identification. Comput. Math. Methods Med. 2022;2022:2679050. doi: 10.1155/2022/2679050. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Müller M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 2011;12:77. doi: 10.1186/1471-2105-12-77. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, Hoang CD, Diehn M, Alizadeh AA. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods. 2015;12:453–457. doi: 10.1038/nmeth.3337. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Fisher BA, Brown RM, Bowman SJ, Barone F. A review of salivary gland histopathology in primary Sjögren's syndrome with a focus on its potential as a clinical trials biomarker. Ann. Rheum. Dis. 2015;74:1645–1650. doi: 10.1136/annrheumdis-2015-207499. [DOI] [PubMed] [Google Scholar]
- 28.Guellec D, Cornec D, Jousse-Joulin S, Marhadour T, Marcorelles P, Pers JO, Saraux A, Devauchelle-Pensec V. Diagnostic value of labial minor salivary gland biopsy for Sjögren's syndrome: A systematic review. Autoimmun. Rev. 2013;12:416–420. doi: 10.1016/j.autrev.2012.08.001. [DOI] [PubMed] [Google Scholar]
- 29.Yao Y, Liu Z, Jallal B, Shen N, Rönnblom L. Type I interferons in Sjögren's syndrome. Autoimmun. Rev. 2013;12:558–566. doi: 10.1016/j.autrev.2012.10.006. [DOI] [PubMed] [Google Scholar]
- 30.Thorlacius GE, Wahren-Herlenius M, Rönnblom L. An update on the role of type I interferons in systemic lupus erythematosus and Sjögren's syndrome. Curr. Opin. Rheumatol. 2018;30:471–481. doi: 10.1097/bor.0000000000000524.
- 31.Winkler CW, Myers LM, Woods TA, Carmody AB, Taylor KG, Peterson KE. Lymphocytes have a role in protection, but not in pathogenesis, during La Crosse Virus infection in mice. J. Neuroinflamm. 2017;14:62. doi: 10.1186/s12974-017-0836-3.
- 32.Zhao Q, Elson CO. Adaptive immune education by gut microbiota antigens. Immunology. 2018;154:28–37. doi: 10.1111/imm.12896.
- 33.He P, Wu LF, Bing PF, Xia W, Wang L, Xie FF, Lu X, Lei SF, Deng FY. SAMD9 is a (epi-) genetically regulated anti-inflammatory factor activated in RA patients. Mol. Cell Biochem. 2019;456:135–144. doi: 10.1007/s11010-019-03499-7.
- 34.Schoggins JW, Wilson SJ, Panis M, Murphy MY, Jones CT, Bieniasz P, Rice CM. A diverse range of gene products are effectors of the type I interferon antiviral response. Nature. 2011;472:481–485. doi: 10.1038/nature09907.
- 35.Grünvogel O, Esser-Nobis K, Reustle A, Schult P, Müller B, Metz P, Trippler M, Windisch MP, Frese M, Binder M, et al. DDX60L is an interferon-stimulated gene product restricting hepatitis C virus replication in cell culture. J. Virol. 2015;89:10548–10568. doi: 10.1128/jvi.01297-15.
- 36.Zhou H, Xu M, Huang Q, Gates AT, Zhang XD, Castle JC, Stec E, Ferrer M, Strulovici B, Hazuda DJ, et al. Genome-scale RNAi screen for host factors required for HIV replication. Cell Host Microbe. 2008;4:495–504. doi: 10.1016/j.chom.2008.10.004.
- 37.Comuzzie AG, Cole SA, Laston SL, Voruganti VS, Haack K, Gibbs RA, Butte NF. Novel genetic loci identified for the pathophysiology of childhood obesity in the Hispanic population. PLoS One. 2012;7:e51954. doi: 10.1371/journal.pone.0051954.
- 38.Zheng Y, Liu L, Ye J. Identification of dysregulated modules based on network entropy in type 1 diabetes. Exp. Ther. Med. 2018;15:3211–3214. doi: 10.3892/etm.2018.5803.
- 39.Zhao X, Zhang X, Shao S, Yang Q, Shen C, Yang X, Jiao W, Liu J, Wang Y. High expression of GMNN predicts malignant progression and poor prognosis in ACC. Eur. J. Med. Res. 2022;27:301. doi: 10.1186/s40001-022-00950-2.
- 40.Dufek S, Cheshire C, Levine AP, Trompeter RS, Issler N, Stubbs M, Mozere M, Gupta S, Klootwijk E, Patel V, et al. Genetic identification of two novel loci associated with steroid-sensitive nephrotic syndrome. J. Am. Soc. Nephrol. 2019;30:1375–1384. doi: 10.1681/asn.2018101054.
- 41.Aota K, Yamanoi T, Kani K, Ono S, Momota Y, Azuma M. Inhibition of JAK-STAT signaling by baricitinib reduces interferon-γ-induced CXCL10 production in human salivary gland ductal cells. Inflammation. 2021;44:206–216. doi: 10.1007/s10753-020-01322-w.
- 42.Schwefel D, Daumke O. GTP-dependent scaffold formation in the GTPase of immunity associated protein family. Small GTPases. 2011;2:27–30. doi: 10.4161/sgtp.2.1.14938.
- 43.Schwefel D, Arasu BS, Marino SF, Lamprecht B, Köchert K, Rosenbaum E, Eichhorst J, Wiesner B, Behlke J, Rocks O, et al. Structural insights into the mechanism of GTPase activation in the GIMAP family. Structure. 2013;21:550–559. doi: 10.1016/j.str.2013.01.014.
- 44.Yano K, Carter C, Yoshida N, Abe T, Yamada A, Nitta T, Ishimaru N, Takada K, Butcher GW, Takahama Y. Gimap3 and Gimap5 cooperate to maintain T-cell numbers in the mouse. Eur. J. Immunol. 2014;44:561–572. doi: 10.1002/eji.201343750.
- 45.Li X, Xu B, Ma Y, Li X, Cheng Q, Wang X, Wang G, Qian L, Wei L. Clinical and laboratory profiles of primary Sjogren's syndrome in a Chinese population: A retrospective analysis of 315 patients. Int. J. Rheum. Dis. 2015;18:439–446. doi: 10.1111/1756-185x.12583.
- 46.Vitali C, Bombardieri S, Jonsson R, Moutsopoulos HM, Alexander EL, Carsons SE, Daniels TE, Fox PC, Fox RI, Kassan SS, et al. Classification criteria for Sjögren's syndrome: a revised version of the European criteria proposed by the American-European Consensus Group. Ann. Rheum. Dis. 2002;61:554–558. doi: 10.1136/ard.61.6.554.
- 47.Burbelo PD, Teos LY, Herche JL, Iadarola MJ, Alevizos I. Autoantibodies against the immunoglobulin-binding region of Ro52 link its autoantigenicity with pathogen neutralization. Sci. Rep. 2018;8:3345. doi: 10.1038/s41598-018-21522-7.
- 48.Gallant-Behm CL, Ramsey MR, Bensard CL, Nojek I, Tran J, Liu M, Ellisen LW, Espinosa JM. ΔNp63α represses anti-proliferative genes via H2A.Z deposition. Genes Dev. 2012;26:2325–2336. doi: 10.1101/gad.198069.112.
- 49.Nagamachi A, Matsui H, Asou H, Ozaki Y, Aki D, Kanai A, Takubo K, Suda T, Nakamura T, Wolff L, et al. Haploinsufficiency of SAMD9L, an endosome fusion facilitator, causes myeloid malignancies in mice mimicking human diseases with monosomy 7. Cancer Cell. 2013;24:305–317. doi: 10.1016/j.ccr.2013.08.011.
- 50.Li XW, Rees JS, Xue P, Zhang H, Hamaia SW, Sanderson B, Funk PE, Farndale RW, Lilley KS, Perrett S, et al. New insights into the DT40 B cell receptor cluster using a proteomic proximity labeling assay. J. Biol. Chem. 2014;289:14434–14447. doi: 10.1074/jbc.M113.529578.
- 51.Lührig S, Kolb S, Mellies N, Nolte J. The novel BTB-kelch protein, KBTBD8, is located in the Golgi apparatus and translocates to the spindle apparatus during mitosis. Cell Div. 2013;8:3. doi: 10.1186/1747-1028-8-3.
- 52.Du L, Li CR, He QF, Li XH, Yang LF, Zou Y, Yang ZX, Zhang D, Xing XW. Downregulation of the ubiquitin ligase KBTBD8 prevented epithelial ovarian cancer progression. Mol. Med. 2020;26:96. doi: 10.1186/s10020-020-00226-7.
- 53.Meunier L, Puiffe ML, Le Page C, Filali-Mouhim A, Chevrette M, Tonin PN, Provencher DM, Mes-Masson AM. Effect of ovarian cancer ascites on cell migration and gene expression in an epithelial ovarian cancer in vitro model. Transl. Oncol. 2010;3:230–238. doi: 10.1593/tlo.10103.
- 54.Jia P, Zhang W, Shi Y. NFIC attenuates rheumatoid arthritis-induced inflammatory response in mice by regulating PTEN/SENP8 transcription. Tissue Cell. 2023;81:102013. doi: 10.1016/j.tice.2023.102013.
- 55.Inamo J, Suzuki K, Takeshita M, Kassai Y, Takiguchi M, Kurisu R, Okuzono Y, Tasaki S, Yoshimura A, Takeuchi T. Identification of novel genes associated with dysregulation of B cells in patients with primary Sjögren's syndrome. Arthritis Res. Ther. 2020;22:153. doi: 10.1186/s13075-020-02248-2.
- 56.Fessler J, Fasching P, Raicht A, Hammerl S, Weber J, Lackner A, Hermann J, Dejaco C, Graninger WB, Schwinger W, et al. Lymphopenia in primary Sjögren's syndrome is associated with premature aging of naïve CD4+ T cells. Rheumatology (Oxford). 2021;60:588–597. doi: 10.1093/rheumatology/keaa105.
- 57.Kaieda S, Fujimoto K, Todoroki K, Abe Y, Kusukawa J, Hoshino T, Ida H. Mast cells can produce transforming growth factor β1 and promote tissue fibrosis during the development of Sjögren's syndrome-related sialadenitis. Mod. Rheumatol. 2022;32:761–769. doi: 10.1093/mr/roab051.
- 58.Zhou X, Li Q, Li Y, Fu J, Sun F, Li Y, Wang Y, Jia Y, Zhang Y, Jia R, et al. Diminished natural killer T-like cells correlates with aggravated primary Sjögren's syndrome. Clin. Rheumatol. 2022;41:1163–1168. doi: 10.1007/s10067-021-06011-z.
Data Availability Statement
The datasets analysed during the current study are available in the GEO database (http://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE23117, GSE40611, GSE84844, and GSE66795.







