Abstract
Common Variable Immunodeficiency (CVID) is a primary immunodeficiency characterized by reduced levels of specific immunoglobulins, resulting in frequent infections, autoimmune disorders, increased cancer risk, and diminished antibody production despite an adequate B cell count. Because its clinical manifestations are highly variable, the classification of CVID, including the widely recognized Freiburg classification, is primarily based on clinical symptoms and genetic variations. Our study aims to refine the classification of CVID by analyzing transcriptomics data to identify distinct disease subtypes. We used the GSE51405 dataset, examining transcriptomic profiles from 30 CVID patients without complications. We combined four clustering techniques (KMeans, hierarchical agglomerative clustering, spectral clustering, and Gaussian Mixture Models) with differential gene expression analysis using R’s limma package, integrated the molecular findings with demographic data (age and gender) through correlation analysis, and identified genes common to the cluster comparisons. KMeans, Agglomerative Clustering, and Gaussian Mixture Models each identified three distinct clusters of CVID patients, highlighting the disease’s heterogeneity. Differential expression analysis revealed 31 genes with variable expression levels across these clusters. Notably, nine genes (EIF5A, RPL21, ANP32A, DTX3L, NCF2, CDC42EP3, CHP1, FOLR3, and DEFA4) exhibited consistent differential expression across all clusters, independent of demographic factors. The study recommends categorizing patients based on four of these genes (NCF2, CHP1, FOLR3, and DEFA4), as they may assist in prognostic prediction. Together, the three expression-based clusters and the nine differentially expressed genes suggest potential biomarkers for CVID subtype classification.
These findings highlight the genetic heterogeneity of CVID and provide novel insights into disease classification and potential personalized treatment approaches.
Keywords: Common variable immunodeficiency, Transcriptomics data, Machine learning, Classification
Subject terms: Computational biology and bioinformatics, Genetics, Immunology, Molecular biology, Systems biology, Biomarkers, Diseases, Molecular medicine
Introduction
Common variable immunodeficiency (CVID), a primary immunodeficiency syndrome (PID), is the most common immunodeficiency disorder. It is characterized by a comparatively normal number of poorly functioning B cells in the blood, resulting in lower levels of important antibodies, a condition known as hypogammaglobulinemia [1,2]. Patients with CVID may encounter a broad spectrum of clinical manifestations, including microbial infections, chronic diseases such as autoimmune disorders, and certain types of cancer such as leukemia and lymphoma [3,4]. The disease affects men and women equally and is less common in African and Asian populations. Its prevalence in the Middle East and among Caucasians ranges from approximately 1 in 10,000 to 1 in 50,000 [5–8]. Previous studies have identified single-gene mutations, both dominant and recessive, as well as combinations of multiple genetic factors that can contribute to the development of CVID [9]. Because the disease is heterogeneous and complex, a basic classification method is required for its management [10].
In recent years, classification systems have been developed to establish diagnostic criteria based on patients’ immunological and clinical characteristics. For instance, the Freiburg classification categorized CVID patients into two main groups, I and II, and two subgroups, Ia and Ib, depending on the amount of CD21. According to this classification system, patients in subgroup Ia are more likely to have splenomegaly and cytopenia [11]. Additionally, the Paris classification categorized patients into three groups based on the memory B-cell phenotype, with one group showing a greater association with splenomegaly and some autoimmune disorders [12]. Subsequently, in 2008, EUROclass integrated the two previous approaches and added a population of transitional B-cells to produce a more precise understanding of the disease [13].
Transcriptomics is a subfield of biological sciences that evaluates gene expression products, such as RNAs, including microRNAs. It can give new insight into the regulation of gene expression in various disease conditions. Recent improvements in gene expression analysis have made it possible to measure gene expression levels in specific tissues, deepening our understanding of diseases [14]. For instance, several studies have investigated gene expression in cancer to identify biomarkers for diagnostic and therapeutic purposes [15–17]. In the context of CVID, studies of transcriptomic data have produced promising findings related to disease severity and treatment. For instance, microRNA can play a role in the immunoglobulin therapy of CVID patients [18]. The severity of the disease can also be related to increased levels of some microRNAs in CVID patients, such as microRNA-210 [19]. Another study used transcriptional profiling to find the underlying causes in patients with a history of inflammatory problems related to CVID and to choose the appropriate treatment for them [20]. Additionally, wide alterations related to gene regulation, such as DNA methylation, have been reported [21]. That study revealed that some genes in memory B-cells, such as CCL22, are downregulated while others, like CD22, are upregulated [21]. These alterations in gene expression may be involved in the pathogenesis of the disease.
With progress in the field of Artificial Intelligence (AI), the use of Machine Learning (ML) methods for classifying disease from available data has increased. Machine learning methods have gained particular attention for the analysis of gene expression data [22,23]. Previously, transcriptomic data and machine learning approaches were used to distinguish between two different types of Lower Urinary Tract Dysfunction (LUTD), to categorize COVID-19 patients, and to determine the primary tissue of origin for Cancer of unknown primary (CUP) [24–27]. The results of these studies can potentially improve clinical practices for diagnosis and prognosis [27].
In this article, we used machine learning to create a new classification system based on gene expression data in CVID patients. Clustering methods identified three distinct classes based on expression data, distinguishable through four specific genes. These findings suggest that these genes can serve as markers, offering a more refined classification that may inform diagnosis, prognosis, and potential treatment strategies.
Method and materials
In this study, the GSE51405 dataset was used to analyze transcriptomic profiles from 30 CVID patients without complications. The methods included clustering analysis with KMeans, hierarchical Agglomerative, Spectral, and Gaussian Mixture Models (GMM), differential gene expression analysis across the identified clusters, integration of demographic data, and subsequent feature reduction and linear discriminant analysis (LDA) to enhance genetic profiling (Fig. 1).
Fig. 1.
Flowchart depicting the methodology of the study on Common Variable Immunodeficiency (CVID) in 30 patients without complications. The process involves clustering analysis with multiple methods, differential gene expression analysis between clusters, integration with demographic data, and feature reduction for precision in genetic profiling. The GSE51405 dataset was used in the present study.
Study design and participant selection
This study used the GSE51405 dataset available in the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51405) [20], which comprises transcriptomic data from individuals diagnosed with Common Variable Immunodeficiency (CVID) and healthy controls. The dataset includes 83 cases categorized as healthy controls (n = 24), CVID patients with inflammatory complications such as hematologic or organ-specific autoimmunity, biopsy-proven granulomatous disease, interstitial lung disease, lymphoid hyperplasia with splenomegaly, or gastrointestinal inflammatory disease (n = 29), and CVID patients without the mentioned complications (n = 30) [20].
We focused specifically on the 30 CVID patients without complications to investigate their baseline gene expression patterns. We excluded patients with complications because their distinct transcriptional signatures could confound our analysis of CVID patients without complications, as demonstrated by the marked differences in up- and down-regulated patterns between CVID patients with and without inflammatory complications [20]. This subset was selected to identify potential genetic biomarkers that could differentiate subgroups within uncomplicated CVID patients, thereby contributing to a better understanding of CVID’s clinical heterogeneity and potentially aiding in the prediction and prevention of complications.
Data processing and clustering analysis
Transcriptomic data were standardized with `StandardScaler` from the `sklearn.preprocessing` module so that all features contributed equally. Following standardization, dimensionality reduction was performed using Principal Component Analysis (PCA), retaining the first two principal components for visualization and subsequent analysis. Four clustering algorithms were evaluated: KMeans, Agglomerative Clustering, Spectral Clustering, and Gaussian Mixture Models (GMM). For each algorithm, the number of clusters or components was tuned based on silhouette scores, and the resulting partitions were further validated using the Davies-Bouldin and Calinski-Harabasz indices. The algorithms were run using functions from `sklearn.cluster` (in scikit-learn, the Gaussian mixture implementation resides in `sklearn.mixture`), and the results were visualized with `matplotlib` and `seaborn` by plotting data points according to their cluster assignments, which allowed the clustering outcomes and the underlying patterns in the data to be inspected.
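The preprocessing and clustering steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors’ script: the input layout (samples in rows, genes in columns), the random seeds, the default algorithm settings, and the helper name `cluster_all` are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def cluster_all(expr, n_clusters=3, seed=0):
    """Standardize, project onto two PCs, and label samples with four algorithms.

    expr: array of shape (n_samples, n_genes); this layout is an assumption.
    Returns the 2D projection and a dict of label arrays keyed by method name.
    """
    scaled = StandardScaler().fit_transform(expr)
    pcs = PCA(n_components=2, random_state=seed).fit_transform(scaled)
    models = {
        "KMeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "Agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
        "Spectral": SpectralClustering(n_clusters=n_clusters, random_state=seed),
        "GMM": GaussianMixture(n_components=n_clusters, random_state=seed),
    }
    labels = {name: model.fit_predict(pcs) for name, model in models.items()}
    return pcs, labels


# Synthetic stand-in for a 30-sample expression matrix (not GSE51405 data).
rng = np.random.default_rng(0)
centers = rng.normal(scale=2.0, size=(3, 50))
expr = np.vstack([c + rng.normal(size=(10, 50)) for c in centers])
pcs, labels = cluster_all(expr)
```

Clustering on the two-component projection rather than on the full matrix follows the text’s statement that the first two components were retained for subsequent analysis; clustering in the full standardized space would be an equally plausible reading.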
Differential gene expression analysis
The limma package in R was used to analyze differential gene expression across the identified clusters, using its linear models designed for microarray data analysis. This statistical approach identified genes with significant differences in expression levels between clusters, which is crucial for uncovering the genetic foundations of CVID. Limma’s empirical Bayes smoothing improved the precision of our estimates, which is particularly beneficial given the relatively small sample sizes typical of such studies.
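To make the idea of pairwise cluster contrasts concrete, the sketch below uses a deliberately simpler stand-in for limma: an ordinary per-gene Welch t-test for each pair of clusters, followed by Benjamini-Hochberg adjustment. It does not perform limma’s empirical Bayes moderation, and the use of adjusted p-values is an assumption of this example (the text only states a 0.05 threshold). The function names and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_ind


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def pairwise_deg(expr, labels, alpha=0.05):
    """Welch t-test per gene for every pair of clusters (not limma).

    expr: (n_samples, n_genes) array; labels: one cluster label per sample.
    Returns {(a, b): indices of genes with adjusted p < alpha}.
    """
    labels = np.asarray(labels)
    hits = {}
    for a, b in combinations(sorted(set(labels.tolist())), 2):
        result = ttest_ind(expr[labels == a], expr[labels == b],
                           axis=0, equal_var=False)
        hits[(a, b)] = np.flatnonzero(benjamini_hochberg(result.pvalue) < alpha)
    return hits


# Synthetic data: 30 samples, 200 genes, three clusters of 10 samples each.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)
expr = rng.normal(size=(30, 200))
expr[labels == 1, 0] += 5.0  # gene 0 is shifted in cluster 1 only
hits = pairwise_deg(expr, labels)
```

In this toy setup, gene 0 is recovered in both contrasts that involve cluster 1, which mirrors how a gene can appear in two of the three pairwise comparisons.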
Correlation analysis of gene expression with demographic data
To investigate the potential influence of demographic factors on gene expression patterns in CVID patients, we conducted a correlation analysis between age, gender, and the identified gene clusters. Using the PerformanceAnalytics package in R, we calculated Spearman’s rank correlation coefficients to assess the strength and direction of associations between continuous variables (age and gene expression levels).
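For readers working in Python rather than R, the same rank-based correlation can be illustrated with `scipy.stats.spearmanr`. This is only a sketch on simulated values: the ages, the expression vectors, and the variable names are hypothetical and do not come from the dataset.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical per-patient values for 30 samples.
age = rng.integers(18, 70, size=30).astype(float)
gene_a = rng.normal(size=30)
# gene_b is constructed to track gene_a closely (strong co-expression).
gene_b = 2.0 * gene_a + rng.normal(scale=0.1, size=30)

# Spearman's rho works on ranks, so it captures monotonic association
# without assuming a linear relationship or normally distributed values.
rho_genes, p_genes = spearmanr(gene_a, gene_b)
rho_age, p_age = spearmanr(age, gene_a)

print(f"gene_a vs gene_b: rho={rho_genes:.2f}, p={p_genes:.3g}")
print(f"age vs gene_a:    rho={rho_age:.2f}, p={p_age:.3g}")
```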
Statistical considerations
All statistical tests were two-sided, and results were considered statistically significant when P-values were less than 0.05, the conventional threshold in biomedical research. The entire data analysis was carried out using R software (version 4.3.3) and Python (version 3.12). Venn diagrams were constructed using the online tool available at https://bioinformatics.psb.ugent.be/webtools/Venn/.
Results
Clustering analysis of CVID patients
This study performed a clustering analysis on a dataset of 30 CVID patients without complications, using four distinct clustering methods. Hyperparameter optimization for the number of clusters was initially conducted with the silhouette score as the evaluative metric. The optimal scores indicated that three clusters were most effective for the KMeans, Agglomerative, and Gaussian Mixture Model (GMM) algorithms, achieving a Silhouette Score of 0.615, a Davies-Bouldin Score of 0.532, and a Calinski-Harabasz Index of 60.562. In contrast, the Spectral clustering algorithm performed best with two clusters, with a Silhouette Score of 0.587, a Davies-Bouldin Score of 0.655, and a Calinski-Harabasz Index of 47.114. The KMeans, Agglomerative, and GMM methods thus identified three distinct patient clusters, highlighting the presence of potential subtypes within the CVID population. This differentiation suggests underlying differences among the patients, potentially influenced by distinct genetic or environmental factors, despite their external clinical similarities (Fig. 2).
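The selection of the number of clusters by silhouette score, with the two additional indices reported alongside, can be sketched as below. The example uses KMeans only and three synthetic, well-separated blobs as a placeholder for the PCA scores; the candidate range of k and the helper name `scan_k` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def scan_k(points, k_values=range(2, 7), seed=0):
    """Score KMeans partitions for each k and return (best_k, table)."""
    table = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(points)
        table[k] = {
            "silhouette": silhouette_score(points, labels),                # higher is better
            "davies_bouldin": davies_bouldin_score(points, labels),        # lower is better
            "calinski_harabasz": calinski_harabasz_score(points, labels),  # higher is better
        }
    # The silhouette score is the selection criterion; the others are reported.
    best_k = max(table, key=lambda k: table[k]["silhouette"])
    return best_k, table


# Three well-separated synthetic blobs in 2D (placeholder for PCA scores).
rng = np.random.default_rng(1)
offsets = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([o + rng.normal(scale=0.5, size=(10, 2)) for o in offsets])
best_k, table = scan_k(points)
```

Note the opposite directions of the indices: a good partition has a high silhouette score and Calinski-Harabasz index but a low Davies-Bouldin score, which matches the pattern of the three-cluster versus two-cluster values reported above.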
Fig. 2.
Comparison of Clustering Algorithms on a 2D Principal Component Analysis (PCA) Projection. Each panel represents the clusters formed by a different algorithm on the same dataset projected onto the first two principal components. The colors indicate the cluster assignments by each algorithm. (A) KMeans clustering showing three distinct groups. (B) Agglomerative clustering with a similar three-group distinction. (C) Spectral clustering which has identified two groups. (D) Gaussian Mixture Model (GMM) with a distribution that also suggests three groups. The study shows that utilizing various clustering methods such as KMeans, GMM, and Agglomerative effectively separates CVID samples into three distinct clusters.
Differential gene expression analysis
In this study, we investigated gene expression patterns in patients diagnosed with CVID, categorizing them into three distinct clusters. Our initial analysis identified differential expression of 40 genes across these clusters. A subsequent comparative analysis of the clusters revealed 14 differentially expressed genes (DEGs) between Clusters 0 and 1, 10 DEGs between Clusters 0 and 2, and 16 DEGs between Clusters 1 and 2 (Fig. 3). Notably, nine genes (EIF5A, RPL21, ANP32A, DTX3L, NCF2, CDC42EP3, CHP1, FOLR3, and DEFA4) consistently exhibited differential expression across all clusters, underscoring their potential importance in the pathophysiology of CVID. Further elucidation through Venn diagram analysis revealed overlaps between the cluster comparisons: seven genes (EIF5A, RPL21, ANP32A, DTX3L, NCF2, CDC42EP3, CHP1) were differentially expressed in both the 1 vs. 0 and 1 vs. 2 comparisons; FOLR3 was differentially expressed in both the 2 vs. 0 and 1 vs. 2 comparisons; and DEFA4 was differentially expressed in both the 2 vs. 0 and 1 vs. 0 comparisons. These findings highlight the heterogeneity of CVID and pinpoint specific genes as potential focal points for further research into their roles in modulating the disease’s complex mechanisms.
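The overlap logic behind the Venn diagram can be reproduced with plain Python sets. In the sketch below, only the nine named genes come from the text; the remaining members of each comparison are hypothetical placeholders added to reach the reported list sizes (14, 10, and 16), under the assumption that the unnamed genes are each specific to a single comparison.

```python
from itertools import combinations

# Genes the text reports as shared by the 1 vs. 0 and 1 vs. 2 comparisons.
shared_1v0_1v2 = {"EIF5A", "RPL21", "ANP32A", "DTX3L", "NCF2", "CDC42EP3", "CHP1"}


def placeholders(tag, count):
    """Hypothetical stand-ins for DEGs that the text does not name."""
    return {f"{tag}_only_{i}" for i in range(1, count + 1)}


deg = {
    "1vs0": shared_1v0_1v2 | {"DEFA4"} | placeholders("1vs0", 6),  # 14 genes
    "2vs0": {"FOLR3", "DEFA4"} | placeholders("2vs0", 8),           # 10 genes
    "1vs2": shared_1v0_1v2 | {"FOLR3"} | placeholders("1vs2", 8),  # 16 genes
}

# Sum of the three list sizes: genes found in two lists are counted twice.
pairwise_hits = sum(len(genes) for genes in deg.values())

# Distinct genes across all comparisons.
unique_genes = set().union(*deg.values())

# Genes appearing in at least two comparisons.
shared = {
    gene
    for a, b in combinations(deg, 2)
    for gene in deg[a] & deg[b]
}

print(pairwise_hits, len(unique_genes), len(shared))
```

Under these assumptions the three lists sum to 40 entries, of which nine are counted twice, leaving 31 distinct genes. This is one way the figure of 40 genes in this section and the figure of 31 genes in the abstract could both hold.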
Fig. 3.
Venn diagram illustrating the overlap between the three pairwise cluster comparisons. The blue circle (1Vs2) represents the genes differentially expressed between clusters 1 and 2, the red circle (1Vs0) those between clusters 1 and 0, and the green circle (2Vs0) those between clusters 2 and 0. The overlapping sections show the number of genes shared between comparisons, with the central region where all three circles intersect representing genes common to all three. Through Venn diagram analysis, nine genes were identified as shared between comparisons. These genes enable the differentiation of the clusters in CVID samples.
Genetic linkages in clusters
In our investigation of CVID, a distinctive pattern of gene correlations was discovered within the first patient cluster (Fig. 4). Seven genes (EIF5A, RPL21, ANP32A, DTX3L, NCF2, CDC42EP3, and CHP1) not only exhibited differential expression but also showed significant positive correlations with each other, suggesting a potential genetic linkage or shared biological pathway relevant to the clinical manifestation of CVID in this subgroup. Using a correlation matrix, we assessed the interplay between demographic variables (age and gender), cluster assignments, and expression levels for a select panel of genes. Several strong gene–gene correlations hint at co-regulation or shared roles in the disease’s pathogenesis, notably EIF5A and RPL21 (r = 0.93, P-value < 0.001), ANP32A and DTX3L (r = 0.92, P-value < 0.001), and NCF2 and CDC42EP3 (r = 0.93, P-value < 0.001). In contrast, demographic factors exhibited weak correlations with gene expression levels, indicating that the expression patterns are largely independent of age and gender within the cohort. The expression of certain genes, particularly RPL21 and ANP32A, varied significantly with cluster assignment, suggesting their potential utility in subclassifying CVID patients. Overall, the correlation matrix suggests that these genetic signatures predominantly reflect disease pathology rather than patient demographics. For feature reduction, FOLR3 and DEFA4 were chosen as discriminators of classes 2 and 0, respectively, and CHP1 and NCF2, which are highly correlated with each other, were selected as class 1 discriminators (Supplementary File 1 – Figs. S1 and S2).
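The idea of separating three classes with a four-gene panel can be illustrated with linear discriminant analysis on simulated data. The gene names come from the text, but the expression values, the direction and size of each shift, and the noise level are arbitrary placeholders chosen only to make the classes separable; they say nothing about the real expression patterns.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

genes = ["NCF2", "CHP1", "FOLR3", "DEFA4"]

# Hypothetical class means: class 0 marked by DEFA4, class 1 by NCF2 and
# CHP1, class 2 by FOLR3. Shifting values upward is an arbitrary choice.
means = np.array([
    [0.0, 0.0, 0.0, 3.0],  # class 0
    [3.0, 3.0, 0.0, 0.0],  # class 1
    [0.0, 0.0, 3.0, 0.0],  # class 2
])

rng = np.random.default_rng(0)
X = np.vstack([m + rng.normal(scale=0.5, size=(10, len(genes))) for m in means])
y = np.repeat([0, 1, 2], 10)

lda = LinearDiscriminantAnalysis().fit(X, y)
accuracy = lda.score(X, y)      # training accuracy on the simulated data
projection = lda.transform(X)   # at most (n_classes - 1) discriminant axes

print(f"training accuracy: {accuracy:.2f}, projection shape: {projection.shape}")
```

Training accuracy on simulated, well-separated data is only a sanity check; on real data a held-out or cross-validated estimate would be needed before drawing conclusions about the panel.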
Fig. 4.
Pairwise comparison matrix showcasing the relationships between multiple variables across different samples. Each cell on the diagonal presents a distribution histogram for a single variable, while the off-diagonal cells show scatter plots for the pairwise comparisons between variables. Correlation coefficients are indicated within the corresponding upper cells, with red stars denoting the significance levels of the correlations (* P-value < 0.05, ** P-value < 0.01, *** P-value < 0.001). The histograms along the diagonal include overlaid density plots for a clearer visualization of the distribution shapes. The correlogram indicated that the nine identified genes exhibited no correlation with gender and age. Thus, their expression is likely influenced by the condition of the CVID samples.
Discussion
The complex and heterogeneous nature of primary immunodeficiency disorders, such as common variable immunodeficiency (CVID), presents a challenge in understanding the underlying genetic and molecular mechanisms. Establishing refined classifications based on emerging biomarkers and techniques is crucial, as patients with similar clinical presentations may exhibit distinct pathophysiological profiles. This would enable clinicians to make more informed treatment decisions based on precise diagnostic criteria.
Previously, several groups developed classification methods based on B cell function differences or immunophenotypic analysis using flow cytometry, while others focused on developing clinical criteria through large patient cohorts [28]. However, these methods have several limitations [29]. Advancements in high-throughput sequencing technologies have opened new avenues for identifying biomarkers related to gene expression, DNA methylation, B cell receptor (BCR) and T cell receptor (TCR) repertoires, gut microbiome diversity, proteomics, and metabolomics in CVID patients [1,30]. Integrating machine learning with transcriptomics has also shown promise in this regard [27,31].
In this study, we proposed a novel classification method based on a comprehensive analysis of transcriptomics data from 30 patients with CVID who had no complications. We identified 40 differentially expressed genes independent of age and sex, and, using K-means, hierarchical agglomerative clustering, and Gaussian Mixture Models, we identified three distinct patient clusters. These findings support the concept of genetic heterogeneity within the CVID population and align with previous studies that have identified various CVID subtypes linked to specific gene mutations, such as ICOS-linked CVID1 [32], TACI-linked CVID2 [33], CD19-associated CVID3 [34], BAFFR-related CVID4 [35], CD20-mutated CVID5 [36], CD81-related CVID6 [37], CD21-caused CVID7 [38], LRBA-linked CVID8 [39], NFKB2-related CVID10 [40], IL21-linked CVID11 [41], NFKB1-associated CVID12 [42], IKZF1-related CVID13 [43], IRF2BP2-linked CVID14 [44], and a heterozygous mutation in SEC61A1 in CVID15 [45]. Because of these differences in gene expression, patients may have different prognoses despite their clinical similarities.
Additionally, our investigation of CVID revealed differential expression of nine genes (EIF5A, RPL21, ANP32A, DTX3L, NCF2, CDC42EP3, CHP1, FOLR3, and DEFA4) among the 40 identified genes across all clusters, which suggests a potential role for these genes in the pathophysiology of CVID. Most of the genes identified in our study were novel, and previous studies have not investigated alterations in the expression of these nine genes in CVID. Nevertheless, some studies have indicated a role for these genes in the immune system and proposed mechanisms that may be related to the pathophysiology of CVID [46–48]. DTX3L is involved in multiple processes, such as macromolecule metabolic regulation, protein localization, and ubiquitination, contributing to cell signaling, growth, differentiation, and apoptosis [46]. DTX3L may play a role in CVID, since a comprehensive multi-omics analysis of B cells from individuals with NFKB1 mutations reported lower levels of ubiquitination-related genes such as DTX3L in CD21-low B cells [47]. NCF2 encodes a subunit of the NADPH oxidase complex in neutrophils, and mutations in this gene are linked to chronic granulomatous disease. Previous studies have reported rare variants in NCF2 in whole genome sequencing of CVID-AcT, which may contribute to the pathophysiology of CVID [48]. The roles of CDC42EP3, ANP32A, DEFA4, CHP1, RPL21, and FOLR3 in CVID have not yet been investigated. Nonetheless, some studies have suggested multiple functions for these genes in the immune system. For instance, changes in the expression of CDC42EP3 may affect cell proliferation, apoptosis, migration, and phagocytosis, thereby contributing to malignancy, as observed in colorectal cancer [49]. Additionally, studies in mice have proposed a potential role for ANP32A in autoimmune disorders and/or immune deficiencies through the regulation of adaptive immune responses and the dysregulation of pathways [50].
CHP1 is also related to the immune system, as it contributes to calcium signaling in the CD4+ TCR pathway and to calcineurin-dependent NFAT signaling in lymphocytes [51,52]. The role of RPL21, which encodes ribosomal protein L21, in immunodeficiency disease is still unknown. However, some studies have reported immune-related mechanisms of RPL21 in colorectal cancer, Alzheimer’s disease, and infectious disease [53–55]. In Alzheimer’s disease, RPL21 has mainly been linked to M2 macrophages and gamma delta T cells [54]. Alterations of gamma delta T cells are also found in CVID, which may indicate a role for RPL21 in CVID through its effect on gamma delta T cells; further studies are needed to investigate the exact mechanism [56,57]. EIF5A encodes a protein that binds to the ribosome to assist in translating consecutive proline residues [58]. Several studies have mentioned the role of EIF5A in immune deficiency disease [59] and proposed several immunomodulatory effects, including anti-inflammatory effects in macrophages [60], mitochondrial respiration and activation [61], and cytokines in T cells [62]. DEFA4 protein is located in the azurophil granules of neutrophils and is also present in several other cell types, such as lymphocytes, monocytes, natural killer (NK) cells, and mucosal surface epithelium. This protein is involved in anti-microbial activities [63], anti-viral activities [64], inflammatory disease [65], and autoimmune disease [66]. Like DEFA4, altered expression of FOLR3 seems to be involved in immune-mediated chronic diseases due to the role of FOLR3 in the innate immune system [67–70]. Further studies are needed to understand the precise functions and pathways associated with the altered expression of the nine identified genes in CVID.
We performed linear separation to determine a limited set of genes sufficient for categorizing CVID. Remarkably, only four genes, namely NCF2, CHP1, FOLR3, and DEFA4, were enough for accurate classification. Classifying patients according to these four genes could improve cost-effectiveness and help identify individuals at risk of a poor prognosis.
Moreover, our findings suggested a unique gene association in the first cluster of CVID. We identified seven positively associated genes: EIF5A, RPL21, ANP32A, DTX3L, NCF2, CDC42EP3, and CHP1. This correlation indicated a potential linkage or shared biological pathway that could be responsible for the disease’s clinical manifestation. To support this idea, we performed a correlation matrix analysis, which showed strong positive correlations between specific gene pairs: EIF5A and RPL21, ANP32A and DTX3L, and NCF2 and CDC42EP3. These findings suggested the possibility of co-regulation or shared roles in disease pathology. Notably, NCF2 and CDC42EP3 share a common pathway, as both are involved in signaling by Rho GTPases [71]. However, the common roles or pathways shared by the other genes are still unknown. We further investigated the interplay between demographic variables (age and gender) and gene expression levels. Our correlation analysis revealed weak associations between these demographic factors and gene expression, suggesting that genetic alterations, rather than patient demographics, are the primary drivers of disease pathology in CVID.
Limitations
Our study has some limitations. First, the limited sample size necessitates validation in larger cohorts to ensure reproducibility and generalizability. Second, although age and gender were not confounding, other unmeasured variables may influence gene expression patterns. Third, the lack of available immunological and clinical data prevented an assessment of the variations in clinical manifestations and laboratory parameters among different subgroups. Consequently, we were unable to determine whether there is a relationship between gene expression and specific clinical characteristics or paraclinical parameters within each subgroup. Finally, our focus on uncomplicated CVID patients limits the applicability of our findings to those with complications.
Future prospects
Gene expression (transcriptome) data appear to be highly valuable for diagnosis, grouping patients, predicting disease outcomes within each group, and finding biomarkers for treatment [72–74]. Future studies should focus on these evaluations to validate our findings further and investigate the relationships between gene expression patterns and the clinical and immunological characteristics of CVID patients. Additionally, integrating data from various molecular layers, including genetics, epigenetics, and transcriptomics, is recommended to understand complex diseases such as CVID comprehensively. Determining disease subgroups by aggregating diverse data types is suggested to enhance our insights into the disease’s heterogeneity and improve patient stratification.
Conclusion
This study analyzed transcriptomics data from CVID patients without complications and identified three potential subcategories within CVID patients. We found the differential expression patterns of nine genes—EIF5A, RPL21, ANP32A, DTX3L, NCF2, CDC42EP3, CHP1, FOLR3, and DEFA4—across three clusters among CVID patients, independent of age and gender. Furthermore, our study suggests that classifying patients based on the expression of four genes—NCF2, CHP1, FOLR3, and DEFA4—could aid in predicting their potential prognoses. Our results provide valuable insights into CVID classification and highlight novel alterations in gene expression, contributing to a deeper understanding of CVID heterogeneity.
Further studies are needed to investigate the role of these genes in the pathogenesis of CVID and to identify the exact biological pathways involved in the disease. Additionally, future studies should investigate the potential linkage between these genes to better understand the disease’s pathogenesis and identify potential therapeutic targets. To validate our findings and to increase the statistical power of the analysis, further studies with a larger sample size should be conducted.
Abbreviations
- ANP32A: Acidic nuclear phosphoprotein 32 family member A
- BCR: B cell receptor
- CDC42EP3: CDC42 effector protein 3
- CHP1: Calcineurin-like EF-hand protein 1
- CVID: Common variable immunodeficiency
- DEFA4: Defensin alpha 4
- DEGs: Differentially expressed genes
- DTX3L: Deltex E3 ubiquitin ligase 3 like
- EIF5A: Eukaryotic translation initiation factor 5A
- FOLR3: Folate receptor 3
- GEO: Gene expression omnibus
- GMM: Gaussian mixture model
- LDA: Linear discriminant analysis
- LUTD: Lower urinary tract dysfunction
- NCF2: Neutrophil cytosolic factor 2
- PCA: Principal component analysis
- PID: Primary immunodeficiency syndrome
- RPL21: Ribosomal protein L21
- TCR: T cell receptor
Author contributions
Conceptualization: MZ, KK, and ZS; In Silico Data Collection and Analysis: MZ; Writing – Original Draft Preparation: ZM, MZ, NS; Writing – Review & Editing: ZS and KK; Supervision: KK and ZS. All Authors Approved the Final Version to be published; They All Agreed to be Accountable for All Aspects of the Work.
Funding
Not available.
Data availability
The datasets used in this study (GSE51405) can be found in online repositories (GEO database; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51405), the name of which can be found in the article. Other data presented in this study are available on request from the corresponding authors.
Declarations
Competing interests
The authors declare no competing interests.
Ethics declarations
Not available.
Consent for publication
Not available.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Zahra Salehi, Email: Zahra.salehi6463@yahoo.com, Email: zsalehi@sina.tums.ac.ir.
Kaveh Kavousi, Email: kkavousi@ut.ac.ir.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-024-74728-3.