Medicine. 2025 Oct 24;104(43):e45317. doi: 10.1097/MD.0000000000045317

ZNF143 as a diagnostic biomarker: Insights from gene expression and immune cell infiltration in COPD and asthma

Tianyi Yang a, Qiang Li a, Guannan Jin a, Songhao Du a, Yang Yu b, Baihua Jiang a,*
PMCID: PMC12558262  PMID: 41137279

Abstract

Chronic obstructive pulmonary disease (COPD) and asthma are common and serious respiratory diseases worldwide. Their clinical overlap and the lack of specificity of current biomarkers make early diagnosis challenging. To address this gap, this study aimed to identify common transcriptomic features and potential diagnostic biomarkers for the 2 diseases using an integrated bioinformatics approach. COPD microarray data were analyzed using differential expression analysis and weighted gene co-expression network analysis, identifying 375 key differential genes. Functional enrichment analysis was performed to assess the biological roles of these genes. Machine learning methods, including least absolute shrinkage and selection operator and random forest, were employed to identify 5 key biomarkers: MYO16, CHML, POLR3B, ZNF101, and ZNF143. The findings revealed that the identified genes were primarily associated with immune response and T cell-related inflammatory pathways. Among the biomarkers, ZNF143 was significantly upregulated in both COPD and asthma, with expression levels notably higher in COPD patients than in asthma patients. Expression analysis and receiver operating characteristic curve assessment validated ZNF143 as a potential diagnostic biomarker. Additionally, the CIBERSORT algorithm was used to evaluate immune cell infiltration, revealing a positive correlation between ZNF143 and CD8 T cells, M2 macrophages, and γ-δ T cells, and a negative correlation with activated memory CD4 T cells, plasma cells, and neutrophils. These findings suggest a potential role for ZNF143 in both COPD and asthma, supporting its candidacy as an early diagnostic biomarker. This research offers preliminary insights into the molecular mechanisms underlying these respiratory diseases and may inform future directions for diagnostic and therapeutic exploration.

Keywords: asthma, chronic obstructive pulmonary disease, immune infiltration, machine learning, ZNF143

1. Introduction

Chronic obstructive pulmonary disease (COPD) is a prevalent, preventable, and treatable condition. It is characterized by persistent respiratory symptoms and irreversible airflow obstruction, making it one of the most common chronic respiratory diseases globally. Its high incidence and mortality rates make it the third leading cause of death worldwide.[1] The pathogenesis of COPD is complex, involving various factors, particularly chronic airway inflammation and its impact on immune cells.[2] Although COPD primarily affects the lungs, recent studies have shown that it often coexists with other respiratory diseases such as asthma.[3] Asthma is a chronic disease characterized by airway inflammation and reversible airflow limitation, and its coexistence with COPD may affect patients’ clinical presentation and prognosis.[3] Current data indicate that a significant proportion of COPD patients also suffer from asthma, suggesting a potentially important shared pathological mechanism between the 2 diseases. While previous studies have explored the correlation between COPD and asthma, the specific common mechanisms remain unclear. This study aims to examine the interactions between COPD and asthma and their potential genetic connections in order to identify candidate diagnostic biomarkers. The goal is to improve understanding of the interplay between the 2 diseases and to provide a theoretical basis for optimizing treatment strategies and advancing clinical applications.

2. Methods

2.1. Data retrieval and download

In this study, we obtained multiple gene expression datasets related to COPD and asthma from the GEO database (https://www.ncbi.nlm.nih.gov/geo/), including GSE76925,[4] GSE38974,[5] and GSE147878.[6] The GSE76925 dataset comprises 111 COPD samples and 40 control samples, sourced from the GPL10558 platform using the Illumina HumanHT-12 V4.0 expression beadchip. The GSE38974 dataset includes 23 COPD samples and 9 control samples based on the GPL4133 platform using the Agilent-014850 Whole Human Genome Microarray 4 × 44K G4112F. Lastly, the GSE147878 dataset consists of 60 asthma samples and 13 control samples using the GPL10558 platform with the Illumina HumanHT-12 V4.0 expression beadchip. Through the analysis of datasets, we aimed to gain deeper insights into the molecular mechanisms of COPD, particularly the potential connections between COPD and asthma.

To reduce potential biases due to technical differences, platform heterogeneity, and variations in sample processing, this study adopted a multi-step data normalization and batch effect correction strategy. First, raw expression data were log2-transformed to stabilize the variance and minimize the impact of extreme values. Subsequently, the expression matrix was standardized using the z-score method to ensure comparability of expression levels across different samples. Finally, batch effect correction was performed using the ComBat function from the R package “sva” (https://bioconductor.org/packages/release/bioc/html/sva.html) with default parameters, while preserving biological differences.
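The first two steps of this strategy can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the pseudocount of 1 and the choice to standardize each gene (row) across samples are assumptions, since the text specifies neither, and the ComBat step is not reproduced.

```python
import math
from statistics import mean, pstdev


def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every value (the pseudocount is an assumption)."""
    return [[math.log2(x + pseudocount) for x in row] for row in matrix]


def zscore_rows(matrix):
    """Standardize each row (gene) to mean 0 and unit population standard deviation."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        # A constant row has sd == 0; map it to zeros instead of dividing by zero.
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


# Toy matrix with invented intensities: rows are genes, columns are samples.
raw = [[1.0, 3.0, 7.0], [15.0, 15.0, 15.0]]
logged = log2_transform(raw)
scaled = zscore_rows(logged)
```

After both steps each non-constant gene has mean 0 and unit variance across samples, which is what makes expression levels comparable before batch correction.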

Differential analysis of the GSE76925 dataset was conducted using the “limma” package (https://www.bioconductor.org/packages/release/bioc/html/limma.html).[7] We initially fitted a linear model using the lmFit function and assessed differential expression with the eBayes function. To filter for differentially expressed genes (DEGs), we set criteria of |logFC| > 0.5 and adj.P.Val < .05. Subsequently, we utilized the “WGCNA” package[8] (https://github.com/Bioinformatics-rookie/WGCNA) to evaluate key module genes associated with COPD and identified the genes shared with the DEG list. The “glmnet” (https://github.com/cran/glmnet) and “randomForest” (https://github.com/cran/randomForest) packages were also applied to implement least absolute shrinkage and selection operator (LASSO)[9] and random forest[10] algorithms, respectively, to identify key genes among the shared genes. Finally, receiver operating characteristic (ROC) analysis on the key genes was performed using the roc function from the “pROC” package[11] (https://github.com/xrobin/pROC) to evaluate their predictive capabilities. The GSE38974 dataset was used to validate the screening results. This approach was designed to identify potential biomarkers for COPD.
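The DEG filter described above amounts to a two-condition threshold on each gene. The sketch below shows it in Python on invented values; the gene names and numbers are hypothetical, and the model fitting performed by limma is not reproduced.

```python
def filter_degs(results, lfc_cutoff=0.5, padj_cutoff=0.05):
    """Return (up, down) gene lists passing |logFC| > cutoff and adjusted P < cutoff."""
    up, down = [], []
    for gene, log_fc, adj_p in results:
        if abs(log_fc) > lfc_cutoff and adj_p < padj_cutoff:
            (up if log_fc > 0 else down).append(gene)
    return up, down


# Hypothetical rows: (gene, logFC, adjusted P-value). All values are invented.
toy_results = [
    ("GENE_A", 0.82, 0.001),   # passes, upregulated
    ("GENE_B", -0.61, 0.020),  # passes, downregulated
    ("GENE_C", 0.40, 0.001),   # fails the fold-change cutoff
    ("GENE_D", 1.10, 0.200),   # fails the adjusted P cutoff
]
up, down = filter_degs(toy_results)
```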

2.2. Weighted gene co-expression network analysis

Co-expression network analysis of gene expression data from COPD patients was performed using the “WGCNA” package in R. Before constructing the network, the COPD and control group samples were clustered using the hclust function with a cut height of 140; samples exceeding this threshold were considered outliers and excluded. Subsequently, the automatic network construction feature was used to generate a co-expression network. The appropriate soft threshold was calculated using the pickSoftThreshold function to ensure the scale-free property of the network. Based on this threshold, gene co-expression similarity was calculated, and an adjacency matrix was constructed. Hierarchical clustering and dynamic tree cutting methods were applied to identify gene modules. Finally, the importance of genes and their module membership relationships were assessed, revealing key gene modules associated with COPD.
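To make the soft-thresholding step concrete, the sketch below builds an adjacency matrix by raising absolute Pearson correlations to a power β. It assumes an unsigned network (the text does not state the network type) and uses β = 4, the value reported in the Results; the expression values are invented.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (assumes non-zero variance)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def soft_threshold_adjacency(expr, beta=4):
    """Unsigned adjacency: a_ij = |cor(gene_i, gene_j)| ** beta."""
    n = len(expr)
    return [[abs(pearson(expr[i], expr[j])) ** beta for j in range(n)]
            for i in range(n)]


# Toy matrix: rows are genes, columns are samples.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 3.0, 2.0, 4.0],
]
adj = soft_threshold_adjacency(expr, beta=4)
```

Raising correlations to a power shrinks weak correlations much more than strong ones (here 0.8 becomes about 0.41), which is what pushes the network toward a scale-free topology.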

2.3. Functional enrichment analysis

Functional enrichment analysis of the identified shared genes was conducted using the R package “clusterProfiler” (https://github.com/YuLab-SMU/clusterProfiler).[12] First, the shared genes were converted into a character vector. Gene ontology (GO) enrichment analysis was performed using the enrichGO function, followed by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis with the enrichKEGG function. Both analyses calculated the significance of each GO term or KEGG pathway. To control the false discovery rate, the Benjamini–Hochberg method was applied for multiple testing correction. The results of the enrichment analyses were visualized using the “ggplot2” package (https://github.com/tidyverse/ggplot2).
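The significance of a single term can be illustrated with an over-representation test. The sketch below assumes the standard hypergeometric formulation, which the text does not spell out; the counts are invented and the function name is hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    drawn from a background of N genes of which K carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Invented example: background of 20 genes, 5 annotated to a term,
# gene list of 4 with 2 annotated genes.
p_term = hypergeom_pvalue(k=2, n=4, K=5, N=20)
```

One such P-value is obtained per term, and the resulting set is then corrected for multiple testing.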

2.4. Feature selection using LASSO and random forest

Univariate regression analysis was employed to screen overlapping genes and identify significant prognostic biomarkers. Initially, the R package “glmnet” was used to apply LASSO, a regularized regression method suitable for high-dimensional data. The input gene expression matrix was standardized, and a binomial logistic model was fitted. The optimal regularization parameter λ was selected via 10-fold cross-validation using the cv.glmnet function, with mean squared error as the evaluation criterion. Two lambda values were considered: lambda.min, which yields the minimum cross-validated error, and lambda.1se, which provides a more parsimonious model within 1 standard error of the minimum. The final feature set was derived using lambda.1se, resulting in 11 candidate genes.
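The difference between the two lambda values can be shown with a small sketch. It takes a cross-validation curve as given (the numbers are invented) and applies the selection rule only; it does not fit a LASSO model, and the function name is hypothetical.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validation error curve.

    lambda_min minimizes the mean CV error; lambda_1se is the largest lambda
    whose mean CV error is within one standard error of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se


# Invented CV curve: a larger lambda means a stronger penalty and fewer genes.
lambdas = [0.5, 0.2, 0.1, 0.05, 0.01]
cv_mean = [0.40, 0.26, 0.22, 0.20, 0.23]
cv_se = [0.03, 0.03, 0.03, 0.03, 0.03]
lam_min, lam_1se = select_lambdas(lambdas, cv_mean, cv_se)
```

Because lambda.1se is never smaller than lambda.min, it applies at least as strong a penalty and therefore tends to retain fewer genes.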

Simultaneously, we applied the random forest algorithm, a robust ensemble learning method, to identify genes significantly associated with disease status. A model was constructed using the R package randomForest, with 500 decision trees (ntree = 500) and a minimum node size of 1 (nodesize = 1) to allow for fine-grained splitting. Decision trees were generated through resampling techniques, and variable importance scores were calculated based on both Mean Decrease Accuracy and Mean Decrease Gini metrics. The top 30 genes from each metric were intersected to identify robust features, effectively managing high-dimensional data and mitigating the risk of overfitting. Finally, a comprehensive analysis was performed on the overlapping genes identified by LASSO and random forest. A two-tailed P-value < .05 was considered statistically significant. The expression of these genes was validated using the GSE38974 dataset to assess their potential as candidate diagnostic biomarkers.
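The intersection logic described here can be sketched with plain sets. The importance scores below are invented and the gene names hypothetical; the sketch uses k = 2 on a toy example in place of the top 30 used in the study, and it does not train a random forest.

```python
def top_k(scores, k):
    """Return the k highest-scoring gene names from a {gene: score} mapping."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def consensus_features(accuracy_scores, gini_scores, lasso_genes, k=30):
    """Intersect the top-k genes of both importance metrics, then with the LASSO set."""
    rf_genes = top_k(accuracy_scores, k) & top_k(gini_scores, k)
    return rf_genes, rf_genes & set(lasso_genes)


# Invented importance scores for four hypothetical genes.
mean_decrease_accuracy = {"G1": 0.90, "G2": 0.80, "G3": 0.10, "G4": 0.05}
mean_decrease_gini = {"G1": 5.0, "G3": 4.0, "G2": 3.5, "G4": 0.2}
lasso_selected = {"G1", "G4"}

rf_genes, shared = consensus_features(
    mean_decrease_accuracy, mean_decrease_gini, lasso_selected, k=2
)
```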

2.5. ROC analysis and expression of diagnostic biomarkers

The diagnostic performance of the 5 candidate biomarkers was evaluated using the publicly available dataset GSE38974. Initially, gene expression differences between the COPD and control groups were compared using the Wilcoxon rank-sum test. Subsequently, a logistic regression model was constructed, and the pROC package was utilized to calculate the ROC curve and the area under the curve (AUC) to assess the model’s diagnostic capability.
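As an illustration of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen case has a higher marker value than a randomly chosen control, which is the normalized Mann–Whitney (Wilcoxon rank-sum) statistic. It is not the pROC implementation, and the expression values are invented.

```python
def auc(case_scores, control_scores):
    """AUC as P(case > control) over all case-control pairs; ties count as 0.5."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented marker values for 4 cases and 3 controls.
copd = [2.1, 1.8, 2.5, 1.2]
controls = [1.0, 1.5, 1.9]
auc_value = auc(copd, controls)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so the rank-sum test and the ROC analysis summarize the same ordering of samples.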

2.6. Immune infiltration assessment

Immune infiltration analysis was performed using the R package “IOBR” (https://github.com/IOBR/IOBR),[13] which integrates 9 immune infiltration analysis algorithms, including CIBERSORT, to provide a comprehensive assessment of immune cell infiltration. Linear support vector regression was employed to deconvolute the large-scale gene expression matrix, aiming to reduce noise and enhance the accuracy of the analysis. This method enabled us to quantify the infiltration of 22 different types of immune cells in each COPD gene expression profile. During the analysis, only samples with a P-value < .05 from the CIBERSORT output were considered for further examination. Subsequently, the correlation between the candidate diagnostic biomarkers and significantly altered immune cells was assessed using Spearman correlation coefficients. Visualization of the data was performed using the “reshape2” (https://github.com/hadley/reshape) and “ggExtra” (https://github.com/daattali/ggExtra) packages in R. This analysis provided a deeper understanding of the immune microenvironment in COPD patients, aiding in the identification of potential biomarkers and therapeutic targets.
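The correlation step can be sketched as a Spearman coefficient between a marker's expression and an estimated cell fraction. The values below are invented, and the CIBERSORT deconvolution itself is not reproduced; the sketch only shows the rank-based correlation, with ties given average ranks.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Invented values: marker expression and an estimated cell fraction per sample.
marker_expression = [1.2, 2.3, 3.1, 4.8, 5.0]
cell_fraction = [0.30, 0.25, 0.20, 0.22, 0.10]
rho = spearman(marker_expression, cell_fraction)
```

A negative coefficient such as this one means the estimated cell fraction tends to fall as marker expression rises; it says nothing about causation.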

2.7. Evaluation and comparison of diagnostic biomarkers in asthma patients

The expression differences and diagnostic efficacy of the COPD biomarkers were assessed in asthma patients to explore their universality across different respiratory diseases. For this purpose, the asthma-related dataset GSE147878 was selected, and the expression matrix was extracted, log2-transformed, and standardized to ensure data consistency and comparability. Subsequently, the Wilcoxon rank-sum test was used to assess the differential expression of biomarkers between asthma and normal samples. Meanwhile, ROC curve analysis was performed to evaluate their diagnostic performance in asthma.

Additionally, to further compare the expression profiles of these biomarkers in COPD and asthma, we integrated the datasets GSE76925 and GSE147878, and batch effects were corrected using the ComBat function from the R package “sva.” The normalized data were used to analyze the expression differences of target molecules between both diseases, aiming to validate their expression consistency across different respiratory diseases. Besides, this analysis explored the potential value in disease subtype and mechanism research, serving as a reference for subsequent clinical application research.

2.8. Statistical analysis

All statistical analyses were performed using R (version 4.4.0). Differential expression and immune cell comparisons were performed using the Wilcoxon rank-sum test. Module–trait associations in weighted gene co-expression network analysis (WGCNA) were assessed using Pearson correlation. P-values were adjusted using the Benjamini–Hochberg method, with adjusted P < .05 considered statistically significant.
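The Benjamini–Hochberg adjustment can be sketched in a few lines. This is an illustrative Python version of the standard step-up procedure, not the R code used in the study; the P-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so each adjusted value is the minimum
    # of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```

In this toy example only the smallest P-value remains below .05 after adjustment, although three of the four raw values were below that threshold.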

3. Results

3.1. Identification of differential genes in COPD

Differential analysis between COPD and control samples identified 2845 DEGs, with the criteria of |logFC| > 0.5 and adj.P.Val < .05. Among them, 1554 genes were significantly upregulated, while 1291 genes were significantly downregulated. The expression patterns of the DEGs are presented as a volcano plot and a heatmap (Fig. 1A and B). Notably, most of the identified DEGs are involved in immune response, inflammatory signaling, and epithelial cell regulation, which are key biological processes (BP) implicated in COPD pathogenesis.

Figure 1.

Differential analyses between COPD and control samples. (A) Volcano plot of DEGs generated using fold change values and adjusted P-values based on 111 COPD samples and 40 control samples from the GSE76925 dataset. Yellow indicates upregulated genes; gray represents genes with no significant difference, and blue signifies downregulated genes. (B) Heatmap of DEGs, with different colors representing trends of gene expression across different tissues. COPD = chronic obstructive pulmonary disease, DEGs = differentially expressed genes.

3.2. Construction of weighted gene co-expression network and identification of key modules in COPD

To further explore key genes in COPD, we conducted a WGCNA to identify the gene modules most closely related to COPD. Based on scale independence and average connectivity, a soft threshold power of 4 was selected (Fig. 2A), generating 28 modules. The module clustering dendrogram is shown in Figure 2B. Additionally, we analyzed the correlation between COPD and the gene modules (Fig. 2C). The lightcyan module (202 genes, r = −0.42, P = 1e−04), blue module (1942 genes, r = 0.32, P = .004), and tan module (367 genes, r = 0.28, P = .01) were significantly associated with COPD. Based on these findings, the 3 modules were selected as key modules for subsequent analysis. Module membership and gene significance were significantly correlated within the lightcyan module (r = 0.64, P = 1.1e−24), blue module (r = 0.48, P = 1.8e−112), and tan module (r = 0.2, P = .00011; Fig. 2D). These modules represent co-expressed gene networks closely linked to COPD, offering insights into disease-related regulatory mechanisms. Ultimately, we identified 2511 key genes significantly associated with COPD across the 3 modules. Additionally, an intersection analysis between the COPD DEGs and the key genes identified through WGCNA revealed 375 common genes, which were used for further research (Fig. 2E).

Figure 2.

Identification of key module genes in COPD through WGCNA and Intersection with DEGs. (A) Determination of the optimal β value using the scale-free topology model, with β = 4 selected as the soft threshold based on average connectivity and scale independence. (B) Clustering dendrogram of co-expression network modules, sorted based on hierarchical gene clustering of the 1-TOM matrix. Each module has a unique color. (C) Heatmap reveals the relationship between module eigengenes and COPD status, displaying the correlation (top) and P-values (bottom) between module eigengenes and COPD status. (D) Correlation plot of gene significance versus module membership for genes in the lightcyan, blue, and tan modules. (E) Venn diagram showing the intersection of genes, identifying 375 common genes in COPD, derived from the intersection of key module genes and DEGs. COPD = chronic obstructive pulmonary disease, DEGs = differentially expressed genes, WGCNA = weighted gene co-expression network analysis.

3.3. Functional enrichment analysis of pathogenic genes associated with COPD

The results of the GO and KEGG enrichment analyses based on the 375 genes are shown in Figure 3. We visualized the top 5 significantly enriched GO terms for BP, cellular components (CC), and molecular functions (MF), along with the top 10 enriched KEGG pathways. In the GO analysis (Fig. 3A), T cell activation, α-β T cell activation, and positive regulation of T cell receptor signaling pathway showed the highest significance in BP. For CC, enriched terms mainly involved the α-β T cell receptor complex, external side of plasma membrane, and γ-δ T cell receptor complex, indicating a close association of these genes with T cell-related structural components. The top 5 MF terms included protein kinase binding and T cell receptor binding, highlighting the role of these genes in signal transduction. Figure 3B presents the top 10 significantly enriched pathways from the KEGG analysis, with T cell receptor signaling pathway being the most significantly enriched, followed by hematopoietic cell lineage and primary immunodeficiency, reflecting the crucial role of these genes in immune response and T cell-related pathways. Additionally, the MAPK signaling pathway, transcriptional dysregulation in cancer, and Th1 and Th2 cell differentiation were also significantly enriched. These enrichment results suggest that the intersecting genes are primarily involved in T cell-mediated immune regulation, highlighting their potential roles in COPD-related immunopathology.

Figure 3.

Functional enrichment analysis of 375 overlapped genes. (A) Bar chart for GO analysis displays the top 5 significantly enriched terms, with purple representing biological processes (BP), blue indicating cellular components (CC), and red signifying molecular functions (MF). (B) Bar chart for KEGG analysis shows the top 10 significantly enriched pathways, with the relevant genes associated with each pathway listed below the corresponding entry. BP = biological process, CC = cellular component, GO = gene ontology, MF = molecular function, KEGG = Kyoto Encyclopedia of Genes and Genomes.

3.4. Identification of key genes with diagnostic value through machine learning

Two different algorithms were employed to screen candidate diagnostic biomarkers. The random forest algorithm identified 16 important features from the DEGs (Fig. 4A), and 11 feature variables were selected using the LASSO logistic regression algorithm (Fig. 4B). The Venn diagram revealed 5 overlapping key genes identified by both algorithms (Fig. 4C): MYO16, CHML, POLR3B, ZNF101, and ZNF143. To enhance diagnostic and predictive performance, a nomogram was constructed based on these key genes through logistic regression analysis (Fig. 4D). Furthermore, decision curve analysis of the nomogram indicated that making decisions based on the nomogram model could be beneficial for the diagnosis of COPD (Fig. 4E).

Figure 4.

Identification of candidate biomarkers for COPD diagnosis through machine learning algorithms and construction of a diagnostic nomogram for performance validation. (A) Sixteen diagnostic biomarkers were identified using the RF algorithm. (B) Eleven diagnostic biomarkers were selected through the LASSO logistic regression algorithm. (C) A Venn diagram illustrates 5 overlapped diagnostic biomarkers identified by both RF and LASSO: MYO16, CHML, POLR3B, ZNF101, and ZNF143. (D) Nomogram for the 5 diagnostic biomarkers. (E) DCA of the nomogram model. The green line labeled “None” represents the net benefit when assuming no patients have COPD. The blue line labeled “All” indicates the net benefit when assuming all patients have COPD. The red line labeled “Nomogram” represents the net benefit of identifying relevant patients based on the predicted COPD diagnosis values from the nomogram model. (F–J) Expression differences of the 5 candidate diagnostic biomarkers in the dataset GSE76925. (F) MYO16; (G) CHML; (H) POLR3B; (I) ZNF101; and (J) ZNF143. COPD = chronic obstructive pulmonary disease, DCA = decision curve analysis, LASSO = least absolute shrinkage and selection operator, RF = random forest.

After identifying these 5 genes, we first analyzed their expression levels in the COPD dataset GSE76925 (Fig. 4F–J). The results indicated that MYO16 (P value = 1.3e−06), POLR3B (P value = 7.9e−07), ZNF101 (P value = 2.3e−08), and ZNF143 (P value = 3e−11) were significantly overexpressed in COPD, while CHML (P value = 3.6e−06) was significantly downregulated. Subsequently, we used the GSE38974 dataset as a validation cohort to confirm these findings and the expression of the 5 candidate diagnostic biomarkers. The results showed no significant differences in the expression of MYO16, CHML, POLR3B, and ZNF101 in COPD (Fig. 5A–D). However, ZNF143 was significantly overexpressed in COPD samples (P value = .0075; Fig. 5E). To further evaluate the diagnostic efficacy of ZNF143, we performed ROC validation using the GSE76925 and GSE38974 datasets. ROC analysis is a widely used approach for assessing diagnostic accuracy. The AUC for ZNF143 in the GSE76925 dataset was 0.855 (Fig. 5F), while in the GSE38974 dataset, the AUC was 0.802 (Fig. 5G), indicating that ZNF143 has diagnostic value. These results indicate that ZNF143, identified through both the random forest and LASSO algorithms, exhibits strong diagnostic performance, highlighting its potential as a candidate biomarker for COPD.

Figure 5.

Validation of expression differences and ROC diagnostic performance of 5 candidate biomarkers in the validation cohort. (A–E) Expression differences of the 5 candidate diagnostic biomarkers in the validation cohort GSE38974, which includes 23 COPD samples and 9 control samples. (A) MYO16; (B) CHML; (C) POLR3B; (D) ZNF101; and (E) ZNF143. (F–G) ROC validation of the diagnostic efficacy of gene ZNF143. (F) Diagnostic efficacy of ZNF143 validated in the cohorts GSE76925, AUC = 0.855 and (G) GSE38974, AUC = 0.802. AUC = area under the curve, COPD = chronic obstructive pulmonary disease, ROC = receiver operating characteristic.

3.5. Evaluation of immune cell changes in COPD through immune infiltration analysis

Figure 6A shows changes in the proportions of immune cell infiltration assessed by the CIBERSORT algorithm. Specifically, neutrophils, activated memory CD4 T cells, monocytes, and resting memory CD4 T cells were significantly decreased in COPD, while gamma delta T cells, M0 macrophages, and CD8 T cells were significantly increased (Fig. 6B). Additionally, the relationship between the 5 biomarkers, particularly ZNF143, and immune infiltration in COPD was analyzed using Spearman correlation. ZNF143 was significantly associated with various infiltrating immune cells, including M2 macrophages, neutrophils, plasma cells, activated memory CD4 T cells, CD8 T cells, and gamma delta T cells (Fig. 6C). Notably, CD8 T cells, M2 macrophages, and gamma delta T cells demonstrated significant positive correlations with ZNF143, while activated memory CD4 T cells, plasma cells, and neutrophils had significant negative correlations (Fig. 6D). These findings suggest that ZNF143 may influence COPD progression by modulating immune cell infiltration, particularly T cell- and macrophage-related responses.

Figure 6.

Assessment of immune cell infiltration in COPD using the GSE76925 dataset. (A) Stacked bar chart showing the proportion of infiltrating immune cells in COPD. (B) Boxplot illustrating the differences of 22 types of infiltrating immune cells in COPD samples. (C) Correlation between biomarkers and 22 types of infiltrating immune cells. (D) Lollipop chart displaying the Spearman correlation of ZNF143 expression levels with the 22 types of infiltrating immune cells. The size of the circles represents the absolute value of the correlation coefficient, while different colors indicate the P-values of correlations: red (*** < .001), yellow (** < .01), green (* < .05), and blue ( > .05). COPD = chronic obstructive pulmonary disease.

3.6. High ZNF143 expression is positively associated with asthma

To explore the potential role of ZNF143 in asthma, we analyzed its expression levels using the asthma-related dataset GSE147878. ZNF143 was found to be significantly elevated in asthma samples compared to controls (P value = .0058; Fig. 7A). Furthermore, the ROC curve analysis revealed an AUC value of 0.746 for ZNF143 (Fig. 7B), suggesting a favorable predictive performance in asthma diagnosis. To further investigate the expression differences of ZNF143 between COPD and asthma, we integrated the COPD dataset GSE76925 with the asthma dataset GSE147878 using the ComBat function from the “sva” package to correct for batch effects (Fig. 7C). The results showed that ZNF143 expression was significantly higher in COPD samples than in asthma samples (P value = 1.1e−12; Fig. 7D). ZNF143 displayed diagnostic significance in asthma. Meanwhile, its differential expression between COPD and asthma might provide new insights into the pathological mechanisms of both conditions. This suggests that ZNF143 may play a shared yet disease-specific role in airway inflammation, potentially contributing to divergent immune responses in COPD and asthma.

Figure 7.

Analysis of ZNF143 expression and diagnostic performance in asthma and COPD. (A) Expression differences of ZNF143 in the asthma dataset GSE147878, which includes 60 asthma samples and 13 control samples. (B) ROC analysis validating the diagnostic efficacy of ZNF143 for asthma. (C) Integration of asthma and COPD datasets with batch effect correction. (D) Expression differences of ZNF143 between COPD and asthma datasets. COPD = chronic obstructive pulmonary disease, ROC = receiver operating characteristic.

4. Discussion

COPD is characterized by persistent airflow limitation. It is primarily triggered by long-term smoking, air pollution, and occupational exposures.[14] According to the World Health Organization, this disease is now the third leading cause of death globally, significantly impacting patients’ quality of life. Recent advancements in biomarker identification and potential mechanisms, driven by high-throughput omics technologies, have opened new directions for early diagnosis and personalized treatment. However, the complexity of COPD and the absence of standardized biomarkers pose challenges for clinical translation.[15] Many patients are diagnosed in the later stages, missing the optimal treatment window. Future research must focus on in-depth biomarker screening, validation, and standardization to enhance early diagnosis and treatment outcomes.

COPD often coexists with other respiratory diseases, such as asthma, where airway inflammation and reversible airflow limitation may affect the clinical presentation of COPD patients. Data indicate that a significant proportion of COPD patients also have asthma.[3,16] This coexistence, referred to as asthma-COPD overlap, typically results in worse health outcomes than in patients with only one of the conditions.[16] Between 1999 and 2016, 6738 men and 12,028 women died from concurrent asthma and COPD.[16] The management of COPD involves not only treating pulmonary symptoms but also addressing systemic complications that may affect prognosis.[17] Understanding the interactions between asthma and COPD, and how to effectively manage these comorbidities, is crucial for improving patients’ quality of life and overall health.[18]

This study conducted differential expression analysis and WGCNA based on GEO datasets, identifying 2845 DEGs and 2511 module genes related to COPD, whose intersection yielded 375 key differential genes. Enrichment analysis revealed that these genes are primarily associated with immune response and T cell-related inflammatory pathways. To select candidate biomarkers for diagnosis, we employed LASSO and random forest algorithms and identified 11 and 16 important feature variables, respectively. Among them, MYO16, CHML, POLR3B, ZNF101, and ZNF143 were common to both algorithms.

To validate the expression of the 5 candidate genes and their potential as diagnostic markers, we performed logistic regression analysis and decision curve analysis. The findings suggest that these genes might aid in the diagnosis of COPD. In the GSE76925 dataset, MYO16 (P = 1.3e−06), POLR3B (P = 7.9e−07), ZNF101 (P = 2.3e−08), and ZNF143 (P = 3e−11) were significantly upregulated, while CHML (P = 3.6e−06) was significantly downregulated. In the GSE38974 dataset, there were no significant differences in the expression of MYO16, CHML, POLR3B, and ZNF101, while ZNF143 was significantly upregulated (P = .0075). ROC analysis and AUC statistics have been recommended for objectively assessing the performance of biomarkers in various diseases.[19] ZNF143 showed potential for COPD diagnosis, with an AUC of 0.855 in GSE76925 and 0.802 in GSE38974. Additionally, in the dataset GSE147878, ZNF143 (P = .0058) was significantly upregulated in asthma, with an AUC of 0.746. Furthermore, ZNF143 expression was significantly higher in COPD than in asthma (P = 1.1e−12). These results indicate that ZNF143 also has diagnostic relevance in asthma. Meanwhile, its differential expression between COPD and asthma may provide new insights into the pathophysiological mechanisms of both conditions.

In the immune infiltration analysis, ZNF143 expression was significantly positively correlated with CD8 T cells, M2 macrophages, and gamma delta T cells in COPD, and negatively correlated with activated memory CD4 T cells, plasma cells, and neutrophils. These findings suggest a potential association between ZNF143 and the abundance of specific immune cell populations. Notably, these results were derived from computational correlation analyses and do not establish direct causal relationships or functional regulatory mechanisms. The observed associations may reflect indirect interactions or shared upstream regulatory factors, rather than direct immunomodulatory effects of ZNF143. Although immune infiltration analysis provides valuable insights into the immune microenvironment of COPD, relying solely on correlation analysis to infer functional relationships has inherent limitations. The observed expression associations between ZNF143 and specific immune cell populations are insufficient to confirm its direct role in immune regulation or disease progression. To further validate the functional regulatory effects of ZNF143, systematic studies using experimental methods such as gene knockout and overexpression, combined with immune phenotype analysis, are required; this is an important direction for our future research.

ZNF143 is a key transcription factor in the C2H2 zinc finger protein family that primarily regulates gene expression by binding to specific DNA sequences.[20] Research indicates that ZNF143 plays a critical role in transcriptional regulation, particularly through interactions with other transcription factors such as Notch1 and THAP11.[21] ZNF143 can recognize and bind specific sequences in the promoter regions of multiple genes, influencing their expression.[21] Moreover, because the binding sites of ZNF143 overlap with those of other transcription factors, competitive binding may regulate shared target genes, which plays a significant role in cell proliferation and development.[21] ZNF143 not only binds directly to gene promoters but also interacts with chromatin at distal regulatory elements, guiding specific transcription processes.[22] Furthermore, ZNF143 exhibits cell-type-specific binding and is involved in regulating genes associated with cell proliferation and growth.[23] It is also involved in cell growth and differentiation, although its full range of functions is still being studied.[24] In hepatocellular carcinoma, overexpression of ZNF143 is associated with poor prognosis and promotes tumor cell proliferation via CDC6 activation.[23] Notably, ZNF143 is significantly upregulated in COPD and non-small cell lung cancer, where it correlates with immune cell infiltration. It may influence the tumor microenvironment,[25] enhancing cell survival and proliferation and affecting the efficacy of immunotherapy by regulating the immune checkpoints PD-L1/PD-L2.

In summary, through analysis of gene expression microarray data, this study found significant upregulation of ZNF143 in both COPD and asthma, with higher expression in patients with COPD than in those with asthma. The correlation between ZNF143 and immune cell infiltration suggests that it could be a potential target for more precise and personalized immunotherapy. ZNF143 represents a candidate diagnostic biomarker for COPD and asthma and may provide a molecular basis for drug development.

This study has several limitations, including insufficient exploration of the underlying mechanisms and a lack of experimental evidence. In addition, the sample size of the validation cohort GSE38974 is relatively small (23 COPD samples and 9 control samples), which may limit the statistical power to detect subtle expression differences and affect the robustness of the diagnostic performance evaluation. Larger independent cohorts are therefore necessary to further validate the diagnostic value of ZNF143. Furthermore, transcriptomic/microarray data from public databases and purely computational analysis methods have inherent disadvantages, even though such resources hold significant value for hypothesis generation and biomarker screening. First, public datasets may carry sample bias, such as uneven sample sources, inconsistent disease staging, and differences in diagnostic criteria, which may affect the representativeness and generalizability of the results. Second, some datasets provide insufficient clinical information, such as history of tobacco use, pulmonary function parameters, and treatment regimens, which prevented in-depth exploration of the relationship between gene expression and clinical features. This study also lacked experimental validation to confirm the specific functional mechanisms of the candidate genes in disease development. Future research should combine clinical samples with in vivo and in vitro experiments to biologically validate the computational findings, which would enhance the credibility and clinical translational value of the results.
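To give a rough sense of how a small validation cohort affects the precision of an AUC estimate, the sketch below applies the Hanley and McNeil (1982) approximation for the standard error of an AUC to the figures reported for GSE38974 (AUC 0.802, 23 COPD samples, 9 controls). This calculation is not part of the original analysis; the approximation assumes a particular score distribution, the normal-theory interval is crude at this sample size, and the function names are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_cases, n_controls):
    """Approximate standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_cases - 1) * (q1 - auc ** 2)
        + (n_controls - 1) * (q2 - auc ** 2)
    ) / (n_cases * n_controls)
    return math.sqrt(variance)


def approx_ci(auc, n_cases, n_controls, z=1.96):
    """Normal-approximation confidence interval, clipped to [0, 1]."""
    se = hanley_mcneil_se(auc, n_cases, n_controls)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Figures reported in the text for the GSE38974 validation cohort.
se = hanley_mcneil_se(0.802, 23, 9)
low, high = approx_ci(0.802, 23, 9)
print(f"SE = {se:.3f}, approximate 95% CI = {low:.2f} to {high:.2f}")
```

Under this approximation the interval is wide (roughly 0.65 to 0.96), which is consistent with the point above that larger independent cohorts are needed before the diagnostic performance can be considered robust.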

To address these limitations and further elucidate the role of ZNF143, future research should focus on leveraging advanced high-throughput omics technologies to investigate its cell-specific expression, immune regulatory functions, and involvement in disease progression. Experimental validation and mechanistic studies are essential to establish a more detailed and reliable foundation for precision therapy in COPD and asthma.

Author contributions

Conceptualization: Tianyi Yang, Guannan Jin, Songhao Du, Baihua Jiang.

Data curation: Tianyi Yang.

Formal analysis: Tianyi Yang, Qiang Li, Yang Yu.

Methodology: Tianyi Yang.

Software: Tianyi Yang.

Writing – original draft: Tianyi Yang, Qiang Li.

Writing – review & editing: Baihua Jiang.

Abbreviations:

AUC = area under the curve
BP = biological processes
COPD = chronic obstructive pulmonary disease
DEGs = differentially expressed genes
KEGG = Kyoto Encyclopedia of Genes and Genomes
LASSO = least absolute shrinkage and selection operator
ROC = receiver operating characteristic
WGCNA = weighted gene co-expression network analysis

How to cite this article: Yang T, Li Q, Jin G, Du S, Yu Y, Jiang B. ZNF143 as a diagnostic biomarker: Insights from gene expression and immune cell infiltration in COPD and asthma. Medicine 2025;104:43(e45317).

The authors have no funding and conflicts of interest to disclose.

The datasets generated and/or analyzed during the current study are available in the GEO repository under the accession numbers GSE76925, GSE38974, and GSE147878.

Contributor Information

Tianyi Yang, Email: yy177001617@126.com.

Qiang Li, Email: liqiangzyfb@163.com.

Guannan Jin, Email: jinguannan1@163.com.

Songhao Du, Email: dshadoctor@126.com.

Yang Yu, Email: yuyang5757@163.com.

References

  • [1] Qaiser M, Khan N, Jain A. Ultrasonographic assessment of diaphragmatic excursion and its correlation with spirometry in chronic obstructive pulmonary disease patients. Int J Appl Basic Med Res. 2020;10:256–9.
  • [2] Jia Y, He T, Wu D, et al. The treatment of Qibai Pingfei Capsule on chronic obstructive pulmonary disease may be mediated by Th17/Treg balance and gut-lung axis microbiota. J Transl Med. 2022;20:281.
  • [3] John C, Guyatt AL, Shrine N, et al. Genetic associations and architecture of asthma-COPD overlap. Chest. 2022;161:1155–66.
  • [4] Morrow JD, Zhou X, Lao T, et al. Functional interactors of three genome-wide association study genes are differentially expressed in severe chronic obstructive pulmonary disease lung tissue. Sci Rep. 2017;7:44232.
  • [5] Ezzie ME, Crawford M, Cho JH, et al. Gene expression networks in COPD: microRNA and mRNA regulation. Thorax. 2012;67:122–31.
  • [6] Sánchez-Ovando S, Pavlidis S, Kermani NZ, et al. Pathways linked to unresolved inflammation and airway remodelling characterize the transcriptome in two independent severe asthma cohorts. Respirology. 2022;27:730–8.
  • [7] Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.
  • [8] Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008;9:559.
  • [9] Tibshirani R. Regression shrinkage and selection via the LASSO. J R Stat Soc Series B Stat Methodol. 2018;58:267–88.
  • [10] Jiang T, Gradus JL, Lash TL, Fox MP. Addressing measurement error in random forests using quantitative bias analysis. Am J Epidemiol. 2021;190:1830–40.
  • [11] Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinf. 2011;12:77.
  • [12] Wu T, Hu E, Xu S, et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation (Camb). 2021;2:100141.
  • [13] Zeng D, Ye Z, Shen R, et al. IOBR: multi-omics immuno-oncology biological research to decode tumor microenvironment and signatures. Front Immunol. 2021;12:687975.
  • [14] Derom E, Brusselle GG, Joos GF. The once-daily fixed-dose combination of olodaterol and tiotropium in the management of COPD: current evidence and future prospects. Ther Adv Respir Dis. 2019;13:1753466619843426.
  • [15] Yahaya B. Understanding cellular mechanisms underlying airway epithelial repair: selecting the most appropriate animal models. ScientificWorldJournal. 2012;2012:961684.
  • [16] Dodd KE, Wood J, Mazurek JM. Mortality among persons with both asthma and chronic obstructive pulmonary disease aged ≥25 years, by industry and occupation - United States, 1999-2016. MMWR Morb Mortal Wkly Rep. 2020;69:670–9.
  • [17] Nussbaumer-Ochsner Y, Rabe KF. Systemic manifestations of COPD. Chest. 2011;139:165–73.
  • [18] Pleasants RA, Ohar JA, Croft JB, et al. Chronic obstructive pulmonary disease and asthma-patient characteristics and health impairment. COPD. 2014;11:256–66.
  • [19] Chen YW, Leung JM, Sin DD. A systematic review of diagnostic biomarkers of COPD exacerbation. PLoS One. 2016;11:e0158843.
  • [20] Izumi H, Wakasugi T, Shimajiri S, et al. Role of ZNF143 in tumor growth through transcriptional regulation of DNA replication and cell-cycle-associated genes. Cancer Sci. 2010;101:2538–45.
  • [21] Ngondo-Mbongo RP, Myslinski E, Aster JC, Carbon P. Modulation of gene expression via overlapping binding sites exerted by ZNF143, Notch1 and THAP11. Nucleic Acids Res. 2013;41:4000–14.
  • [22] Bailey SD, Zhang X, Desai K, et al. ZNF143 provides sequence specificity to secure chromatin interactions at gene promoters. Nat Commun. 2015;2:6186.
  • [23] Zhang L, Huo Q, Ge C, et al. ZNF143-mediated H3K9 trimethylation upregulates CDC6 by activating MDIG in hepatocellular carcinoma. Cancer Res. 2020;80:2599–611.
  • [24] Ni W, Perez AA, Schreiner S, Nicolet CM, Farnham PJ. Characterization of the ZFX family of transcription factors that bind downstream of the start site of CpG island promoters. Nucleic Acids Res. 2020;48:5986–6000.
  • [25] Feng Z, Yin Y, Liu B, et al. ZNF143 expression is associated with COPD and tumor microenvironment in non-small cell lung cancer. Int J Chron Obstruct Pulmon Dis. 2022;17:685–700.

Articles from Medicine are provided here courtesy of Wolters Kluwer Health
