Abstract
Thyroid cancer (THCA) remains a prevalent endocrine malignancy, with limited molecular markers for accurate diagnosis and targeted therapy. This study aimed to identify key biomarkers and therapeutic targets for THCA using integrative bioinformatics and experimental validation. We used differential gene expression analysis, gene enrichment analysis, protein-protein interaction (PPI) network construction, and machine learning algorithms to identify potential biomarkers. Receiver operating characteristic (ROC) curve analysis, immunohistochemical data from the Human Protein Atlas (HPA), quantitative real-time PCR (qPCR), and immune infiltration analysis were used for further validation. Additionally, ROC and multivariate Cox regression analyses were conducted to quantitatively evaluate the diagnostic and prognostic performance of candidate genes. Four candidate biomarkers (VEGFA, TYK2, NRP1, and C5AR1) were identified, but only C5AR1 showed statistical significance in ROC and prognostic analyses. Specifically, C5AR1 demonstrated a strong diagnostic value (AUC = 0.873, 95% CI: 0.822–0.924) and was significantly associated with poorer overall survival (hazard ratio [HR] = 2.41, 95% CI: 1.15–5.06, p = 0.021). Further qPCR and immune infiltration analyses confirmed that C5AR1 expression was associated with immune cell infiltration, potentially influencing THCA progression. This study identifies C5AR1 as a key biomarker in THCA, suggesting its role in immune-related tumor progression. These findings indicate statistical associations rather than causal mechanisms, highlighting the need for further experimental validation. The results provide a foundation for targeted immunotherapy strategies in THCA treatment.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-32205-5.
Keywords: Bioinformatics analysis, Thyroid cancer, DEGs, C5AR1, Immune infiltration, Machine learning
Subject terms: Endocrine cancer, Oncogenes
Introduction
Thyroid cancer (THCA) is one of the most common endocrine malignancies, accounting for more than 90% of all endocrine malignancies1. The incidence of this disease has steadily increased worldwide, reaching about 567,000 cases in 20242. Several factors contribute to THCA development, including obesity, smoking, and radiation exposure3. Currently, the therapeutic approaches include surgical resection, radioactive iodine ablation, chemotherapy, and molecular targeted therapy4. However, these conventional approaches still cannot completely cure THCA. Given this limitation, investigating genetic factors involved in THCA progression offers promising avenues for developing more effective treatments. An increasing body of evidence suggests that many genes contribute to THCA initiation and progression, partly through ion channel–related mechanisms5. Hence, the investigation of THCA at the molecular level remains an important research focus.
High-throughput gene profiling is an effective way to identify key disease-related genes. Next-generation sequencing (NGS), such as RNA sequencing (RNA-seq), has become the standard for unbiased transcriptome profiling, overcoming many limitations of older technologies like microarrays. Microarray technology, an early high-throughput approach, has been widely used for gene expression analysis6 and has been applied in many studies to identify molecular markers, therapeutic targets, and differentially expressed genes7–9. However, its sensitivity is limited by technical constraints, leading to less reliable detection of low-expression genes10. RNA-seq offers higher sensitivity and a broader dynamic range, enabling detection of low-abundance transcripts. Recent literature shows that combining machine learning with NGS data improves the accuracy of gene identification and functional prediction11.
Therefore, this study conducted a comprehensive analysis of THCA using microarray data, bioinformatics, and machine learning methods. To overcome the limitations of previous single-cohort or single-method studies, we adopted an integrative analytical strategy that combines multi-algorithm bioinformatics and independent experimental validation. This multi-level framework provides more reliable identification of immune-related biomarkers and offers THCA-specific insights into tumor progression and immune infiltration, potentially supporting the discovery of new therapeutic targets and improving clinical understanding.
Methods and materials
THCA dataset download and processing
The gene expression datasets were obtained from the Gene Expression Omnibus (GEO) database, a comprehensive repository for gene expression data (https://www.ncbi.nlm.nih.gov/geo/). GSE33630 and GSE65144 were used as the analysis set, and GSE29265 and GSE53157 were used as the validation set. All datasets were generated from thyroid tissue samples, including both thyroid carcinoma tissues and matched adjacent normal thyroid tissues that served as controls. Details of these datasets are provided in Supplementary Table 1. The “limma” package (version 3.52.4; R version 4.2.2) was used to analyse differences in mRNA expression between the sample groups12. Differentially expressed genes (DEGs) were defined using the criteria |log2FC| > 1 and adjusted p < 0.05, resulting in the identification of 6085 DEGs. The R package “EnhancedVolcano” (version 1.16.0) was used to draw volcano plots of differential expression. Subsequently, these DEGs were intersected with 1793 immune-related genes (IRGs) obtained from the ImmPort database, yielding 322 immune-associated DEGs for downstream prioritization and functional analysis.
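The DEG criteria and the intersection with the immune-related gene list can be illustrated with a short sketch. The authors worked in R with limma; the Python below is only a stand-in for the filtering logic, and the gene names, statistics, and helper names (`GeneStat`, `select_degs`, `immune_related`) are hypothetical placeholders rather than values or code from the study.

```python
from typing import Iterable, List, NamedTuple, Set


class GeneStat(NamedTuple):
    """Per-gene summary as it might come out of a differential expression test."""
    symbol: str
    log2_fc: float
    adj_p: float


def select_degs(stats: Iterable[GeneStat],
                lfc_cutoff: float = 1.0,
                p_cutoff: float = 0.05) -> Set[str]:
    """Keep genes with |log2FC| > lfc_cutoff and adjusted p < p_cutoff."""
    return {g.symbol for g in stats
            if abs(g.log2_fc) > lfc_cutoff and g.adj_p < p_cutoff}


def immune_related(degs: Set[str], immune_genes: Iterable[str]) -> Set[str]:
    """Intersect the DEG set with a curated immune-related gene list."""
    return degs & set(immune_genes)


# Toy values invented for illustration only (not study data).
toy_stats: List[GeneStat] = [
    GeneStat("GENE_A", 2.3, 0.001),
    GeneStat("GENE_B", -1.4, 0.020),
    GeneStat("GENE_C", 0.6, 0.001),  # fails the fold-change cutoff
    GeneStat("GENE_D", 1.8, 0.200),  # fails the adjusted p cutoff
]
toy_immune_list = ["GENE_B", "GENE_C", "GENE_E"]

degs = select_degs(toy_stats)
immune_degs = immune_related(degs, toy_immune_list)
print(sorted(degs), sorted(immune_degs))
```

Both up- and downregulated genes pass the filter because the cutoff is applied to the absolute log2 fold change.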
Functional enrichment analysis
Functional enrichment analysis was conducted on the identified genes. Gene Ontology (GO) is a widely used framework for assigning functional annotations to genes, covering molecular functions (MFs), biological processes (BPs), and cellular components (CCs). Kyoto Encyclopaedia of Genes and Genomes (KEGG) enrichment analysis was used to identify pathways associated with these genes13. GO terms and KEGG pathways were identified and visualised with the “clusterProfiler” (version 4.6.0) and “GOplot” (version 1.0.2) packages in R (version 4.2.2)14.
PPI network construction and core gene identification
The STRING database (http://string-db.org/) is a search tool for retrieving interacting genes and was used here for PPI analysis. To assess potential PPI associations, the DEGs identified earlier were mapped to the STRING database, and PPI pairs were extracted at a combined score threshold of 0.4. The PPI network was subsequently visualised with Cytoscape (www.Cytoscape.org/). Nodes with high degree and betweenness centrality tend to play crucial roles in maintaining the overall stability of a network. CytoHubba (version 2.1) and CytoNCA (version 2.1), two Cytoscape plugins, were used to quantify the degree and betweenness centrality of each protein node. Genes with a degree ≥ 6 and a betweenness centrality ≥ 860 were defined as core genes (Supplementary Table 2)15.
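The core-gene rule (degree and betweenness centrality above fixed cutoffs) can be sketched as follows. This is not the CytoHubba or CytoNCA implementation: it is a plain Python version of Brandes' algorithm for an unweighted, undirected graph, and it assumes that betweenness is reported as an unnormalised count of shortest paths, which may differ from the scaling used by the plugins. The function names and toy edges are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list (self-loops ignored)."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def betweenness(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Unnormalised betweenness centrality via Brandes' algorithm (BFS-based)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order: List[str] = []              # nodes in non-decreasing distance
        preds = {v: [] for v in adj}       # shortest-path predecessors
        sigma = {v: 0 for v in adj}        # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}      # dependency of s on each node
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair is visited from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}


def core_nodes(adj: Dict[str, Set[str]],
               min_degree: int,
               min_betweenness: float) -> List[str]:
    """Nodes meeting both the degree and the betweenness cutoffs."""
    bc = betweenness(adj)
    return sorted(v for v in adj
                  if len(adj[v]) >= min_degree and bc[v] >= min_betweenness)


# Toy network for illustration; the study applied cutoffs of 6 and 860.
toy = build_graph([("HUB", "X"), ("HUB", "Y"), ("HUB", "Z"), ("Z", "W")])
print(core_nodes(toy, min_degree=3, min_betweenness=3.0))
```

Requiring both cutoffs favours nodes that are well connected and also sit on many shortest paths, rather than nodes that satisfy only one of the two criteria.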
Identification of different genes
The genes identified above were further analysed with machine learning methods. The support vector machine (SVM) algorithm, implemented using the e1071 R package (version 1.7-10), is a supervised learning approach for classification or regression that requires a labelled training dataset16. Support vector machine recursive feature elimination (SVM-RFE) iteratively trains the classifier on shrinking feature subsets in order to reduce the feature set and identify the most predictive features. The random forest algorithm, implemented using the randomForest R package (version 4.7-1), is commonly used for gene ranking, with relative importance values above 0.25 generally regarded as indicating influential genes17. Key genes were identified using these two methods.
Single-sample gene set enrichment analysis
Single-sample gene set enrichment analysis (ssGSEA) extends the gene set enrichment analysis (GSEA) approach by computing an enrichment score for each combination of sample and gene set. The gene set variation analysis (GSVA) R package (version 4.7-1) was used to conduct ssGSEA of the designated key genes18.
Reconstructing a PPI network based on selected key genes
GeneMANIA is a web-based platform (http://www.genemania.org) that generates protein–protein interaction networks by predicting upstream genes associated with target genes. Its analyses cover physical interactions, coexpression, colocalisation, gene enrichment analysis, genetic connections, and site prediction. In the present study, it was used to construct a PPI network around the key genes and to identify important interacting genes19.
Receiver operating characteristic analysis
The “pROC” package (version 1.18.0) in R (version 4.2.2) was used to generate ROC curves, and the diagnostic value of the key genes in clinical settings was assessed by the area under the curve (AUC)20.
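For readers unfamiliar with the AUC, the quantity can be illustrated without any plotting: it equals the probability that a randomly chosen tumour sample has a higher expression value than a randomly chosen control, with ties counted as one half. The sketch below is not the pROC implementation; it assumes that higher expression indicates tumour, and the expression values are invented.

```python
from typing import Sequence


def roc_auc(case_values: Sequence[float], control_values: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A pair counts 1 if the case value is higher, 0.5 if tied, 0 otherwise.
    This equals the area under the empirical ROC curve.
    """
    if not case_values or not control_values:
        raise ValueError("both groups need at least one sample")
    score = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                score += 1.0
            elif case == control:
                score += 0.5
    return score / (len(case_values) * len(control_values))


# Invented expression values for illustration only.
tumour = [3.0, 4.0, 5.0]
normal = [1.0, 2.0, 3.0]
print(round(roc_auc(tumour, normal), 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups, which is the scale on which the values reported for the key genes should be read.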
Prognosis analysis
In this study, univariate Cox regression analysis was conducted, and the results were displayed as a forest plot drawn with the “forestplot” R package (R version 4.2.2). The output included p-values, hazard ratios (HRs), and 95% confidence intervals.
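The relationship between a hazard ratio, its 95% confidence interval, and the p-value can be made explicit with a small calculation. The sketch below assumes a Wald-type interval that is symmetric on the log scale, which is the usual form of Cox regression output but is not stated in the text; the function names are hypothetical. The example call uses the C5AR1 values quoted in the abstract (HR = 2.41, 95% CI: 1.15–5.06).

```python
import math
from typing import Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_hr(hr: float, ci_low: float, ci_high: float) -> Tuple[float, float, float, float]:
    """Recover the log-hazard coefficient, its SE, the z statistic and p-value.

    Assumes the interval was built as exp(beta +/- Z_95 * se).
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


def hr_with_ci(beta: float, se: float) -> Tuple[float, float, float]:
    """Hazard ratio and 95% confidence limits from a coefficient and its SE."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))


beta, se, z, p = wald_from_hr(2.41, 1.15, 5.06)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.3f}")
```

Under these assumptions the recovered p-value lies close to the reported p = 0.021, so the interval and the p-value given in the abstract are mutually consistent.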
Expression analysis of key genes in THCA
UALCAN is an interactive web portal for cancer data (http://ualcan.path.uab.edu/)21. Its data were used to examine the expression of the key genes in THCA.
Immune infiltration analysis
We employed several algorithms, including ESTIMATE22, QUANTISEQ23, xCELL24, CIBERSORT25, EPIC26, TIMER27, and IPS28, to investigate the correlations between the expression levels of specific genes and immune infiltration across all tumours in the Cancer Genome Atlas (TCGA) dataset.
Quantitative real-time RT-PCR
The recruitment period for this study began on June 30, 2023, and ended on December 30, 2023, involving a total of 100 adult inpatients diagnosed with THCA. Minor inpatients were not included. The inclusion criteria were: (1) histologically confirmed thyroid carcinoma; (2) no prior chemotherapy, radiotherapy, or targeted therapy before surgical resection; and (3) age ≥ 18 years with signed informed consent. The exclusion criteria were: (1) concurrent malignancies in other organs; (2) severe systemic diseases or active infections that might interfere with molecular analyses; and (3) incomplete clinical data or inadequate tissue samples. Among these patients, 42 were male and 58 were female, with a mean age of 47.6 ± 12.3 years (range: 21–72 years). According to the AJCC staging system, 26 cases were stage I, 39 cases stage II, 23 cases stage III, and 12 cases stage IV. Histologically, 82 cases were papillary thyroid carcinoma and 18 cases were follicular thyroid carcinoma. The research protocol strictly adhered to the principles outlined in the Declaration of Helsinki, ensuring ethical treatment of human tissues. Furthermore, the study received approval from the Clinical Research Ethics Committee of The First Hospital of Hebei Medical University (NO. S00618). Prior to their participation, each patient provided informed consent by signing the necessary forms.
Total RNA was extracted from the human thyroid tissues using an RNA isolation kit (RNAiso Plus, Cat. No. 9109, Takara Bio Inc., Shiga, Japan). In this experiment, the isolated RNA was dissolved in 20 mL of DEPC-treated water and then reverse-transcribed using a reverse transcription reagent kit (PrimeScript RT Reagent Kit with gDNA Eraser, Cat. No. RR047A, Takara Bio Inc.) and a thermal cycler (Mastercycler Nexus, Eppendorf, Hamburg, Germany). The resulting cDNA was used for the qPCR assay, and amplification curves were generated using SYBR Premix Ex Taq II (Cat. No. RR820A, Takara Bio Inc.).
Each qPCR reaction was performed in triplicate, and the relative mRNA expression levels were calculated using the 2^−ΔΔCt method. GAPDH was used as the internal housekeeping gene for normalization to ensure comparability between samples.
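The 2^−ΔΔCt calculation can be written out step by step. The sketch below assumes that triplicate Ct values are averaged before normalisation and that adjacent normal tissue serves as the calibrator; the Ct values and the function name are invented for illustration and are not study data.

```python
from statistics import mean
from typing import Sequence


def relative_expression(ct_target_sample: Sequence[float],
                        ct_ref_sample: Sequence[float],
                        ct_target_control: Sequence[float],
                        ct_ref_control: Sequence[float]) -> float:
    """Fold change of the target gene by the 2^-ddCt method.

    dCt  = mean Ct(target) - mean Ct(reference gene), computed per group
    ddCt = dCt(sample) - dCt(control)
    fold = 2 ** (-ddCt)
    """
    d_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Invented triplicate Ct values (target = C5AR1, reference = GAPDH).
fold = relative_expression(
    ct_target_sample=[24.0, 24.0, 24.0],
    ct_ref_sample=[18.0, 18.0, 18.0],
    ct_target_control=[26.0, 26.0, 26.0],
    ct_ref_control=[18.0, 18.0, 18.0],
)
print(fold)  # dCt 6 vs 8, ddCt = -2, fold change = 4.0
```

A lower Ct means earlier amplification, so a negative ΔΔCt corresponds to higher expression in the sample than in the calibrator.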
This experimental validation ensures the accuracy and reliability of the identified gene expression levels, reinforcing the robustness of our results. The primers used in the quantitative PCR were as follows:
GAPDH (Forward: 5′-AATGGACAACTGGTCGTGGAC-3′; Reverse: 5′-CCCTCCAGGGGATCTGTTTG-3′; Ta = 60 °C; amplicon size = 124 bp; RefSeq accession: NM_002046).
C5AR1 (Forward: 5′-CGCTTTCTGCTGGTGTTT-3′; Reverse: 5′-TTTGTCGTGGCTGTAGTCC-3′; Ta = 60 °C; amplicon size = 138 bp; RefSeq accession: NM_001736).
Immunohistochemical staining of C5AR1
The Human Protein Atlas (HPA) database (https://www.Proteinatlas.org/) provides detailed information on the distribution of proteins in human tissues and cells. We selected immunohistochemical images of C5AR1 in THCA and normal tissues from the HPA database. These images were used to detect the differential expression of C5AR1 at the protein level29.
Results
Screening and functional enrichment analysis of DEGs in THCA
After applying standard preprocessing procedures, we observed that the GSE33630 and GSE65144 datasets, which were initially non-overlapping in the UMAP distribution (Fig. 1A), became fully overlapping (Fig. 1B). This indicates that they can be analyzed as a unified dataset in downstream analysis. The preprocessing steps included background correction, quantile normalization, and log2 transformation. Based on statistical significance (P < 0.05, adjusted for multiple testing by the false discovery rate [FDR]) and a fold-change threshold (|log2FC| ≥ 1), a total of 6085 DEGs were identified. Their expression profiles were visualized using volcano plots (Fig. 1C). Subsequently, functional enrichment analysis was conducted. The results showed BP enrichment primarily in macromolecule biosynthesis, cellular protein metabolism, and nitrogen compound biosynthesis. The MF enrichment was mostly associated with catalytic activity, cation binding, and metal ion binding. The CC enrichment primarily pertained to the cytosol, the nucleus, and protein-containing complexes (Fig. 1D). KEGG pathway analysis indicated enrichment in herpes simplex virus 1 infection, cancer, and PI3K–Akt signaling pathways (Fig. 1E).
Fig. 1.
(A) UMAP distribution map before normalisation between samples. (B) UMAP distribution map after normalisation between samples. (C) Volcano plot of DEGs. (D) GO analysis of DEGs. (E) KEGG analysis of DEGs (www.kegg.jp/kegg/kegg1.html).
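The FDR adjustment applied to the DEG p-values in this subsection can be illustrated with the Benjamini–Hochberg step-up procedure. The text does not name the specific FDR method, so Benjamini–Hochberg (the default adjustment in limma's `topTable`) is an assumption here; the p-values and the function name are invented for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the p-value of rank r (1 = smallest) among m tests, the raw
    adjustment is p * m / r; a running minimum taken from the largest
    rank downwards keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented raw p-values for four hypothetical genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(a, 4) for a in adj])
```

Genes are then called differentially expressed only when the adjusted value, not the raw p-value, falls below 0.05.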
Selection and functional enrichment analysis of immune-related DEGs in THCA
To identify immune-related DEGs, we conducted a screening process using IRGs from the ImmPort database (1793 genes). After merging and intersecting these IRGs with DEGs, 322 immune-related DEGs were identified in THCA (Supplementary Fig. 1A). Subsequently, functional enrichment analysis was performed on these DEGs. The analysis showed that BP enrichment primarily involved the regulation of responses to chemicals, stimuli, and organic substances. MF enrichment was mainly related to signalling receptor binding as well as molecular function regulator and molecular transducer activity. CC enrichment was mainly related to the extracellular regions, vesicles, and endomembrane system (Supplementary Fig. 1B). KEGG analysis revealed significant enrichment in cancer pathways, cytokine–cytokine receptor interactions, and the PI3K–Akt signaling pathway (Supplementary Fig. 1C).
PPI network construction and core gene identification in THCA
To further investigate the screened DEGs, we used the STRING tool to predict protein–protein interactions among them and constructed a PPI network for analysis. The network consisted of 317 nodes and 987 edges (Supplementary Fig. 2). Subsequently, the degree and betweenness centrality of each protein node were computed using CytoHubba and CytoNCA within the Cytoscape platform (Fig. 2A and B). Genes with degree ≥ 6 and betweenness centrality ≥ 860 were considered core genes, resulting in 39 core genes (Supplementary Fig. 3A). Among them, C5AR1, VEGFA, NRP1, and TYK2 were consistently highlighted, suggesting that PPI network analysis not only narrowed the DEG list to biologically central candidates but also provided systems-level support for the subsequent selection of C5AR1. Functional enrichment analysis was then performed on the core genes. BP enrichment was mainly related to cell surface receptor signalling, chemical response, and intracellular signal transduction pathways. MF enrichment was mainly related to carbohydrate derivative binding, catalytic activity, and anion binding. CC enrichment was mainly related to vesicles, cytosol, and the endomembrane system (Supplementary Fig. 3B). KEGG pathway analysis identified cancer, Ras, and PI3K–Akt signaling as dominant pathways (Supplementary Fig. 3C).
Fig. 2.
(A) The network diagram calculated with CytoHubba. (B) The network diagram calculated with CytoNCA. (C) The expression of core genes, verified through Random Forest algorithm selection. (D, E) Selection and validation of the expression of core genes through the SVM RFE algorithm. (F) Key genes screened through both algorithms.
Selection of key genes in THCA
After identifying the core genes, two machine learning algorithms were used to further select key genes. The Random Forest algorithm was used to examine the relationship between the error rate and the number of classification trees and, after feature selection, to evaluate the essential predictive genes (Fig. 2C). The SVM-RFE method identified core predictive genes by considering statistically significant univariate features (Fig. 2D and E). A Venn diagram analysis revealed four overlapping key genes (VEGFA, TYK2, NRP1, and C5AR1) (Fig. 2F).
Single-sample gene set enrichment analysis of key genes in THCA
To better understand the role of key genes in THCA progression, we performed ssGSEA and divided THCA samples into high- and low-expression groups based on the median expression of each key gene. The analysis revealed that high-VEGFA samples were enriched in DNA replication, proteasome activity, and Escherichia coli infection pathways, while low-VEGFA samples showed enrichment in peroxisome function and circadian rhythm pathways (Fig. 3A). For TYK2, high-expression samples were enriched in immune-related pathways such as allograft rejection and graft-versus-host disease, whereas low-expression samples were enriched in metabolic processes such as oxidative phosphorylation (Fig. 3B). For NRP1, high expression correlated with proteasome and DNA replication pathways, while low expression correlated with water reabsorption pathways (Fig. 3C). For C5AR1, high expression was associated with immune and inflammatory pathways, including systemic lupus erythematosus and renin–angiotensin signaling, whereas low expression correlated with amino acid metabolism and GPI anchor biosynthesis (Fig. 3D).
Fig. 3.
(A) ssGSEA of VEGFA. (B) ssGSEA of TYK2. (C) ssGSEA of NRP1. (D) ssGSEA of C5AR1.
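The median-based grouping used before the ssGSEA comparison can be sketched as below. How samples exactly at the median were assigned is not stated in the text; placing them in the low-expression group is an assumption of this sketch, and the sample identifiers, values, and function name are invented.

```python
from statistics import median
from typing import Dict, List, Tuple


def split_by_median(expression: Dict[str, float]) -> Tuple[float, List[str], List[str]]:
    """Split samples into high and low groups at the median expression.

    Samples strictly above the median go to 'high'; all others, including
    ties at the median, go to 'low' (an assumption, see the lead-in).
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v <= cutoff)
    return cutoff, high, low


# Invented expression values of one key gene across four samples.
toy_expression = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 4.0}
cutoff, high_group, low_group = split_by_median(toy_expression)
print(cutoff, high_group, low_group)
```

The split is repeated independently for each key gene, so a given sample can fall in the high group for one gene and the low group for another.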
Reconstruction of a protein-protein interaction network based on the selected key genes in THCA
To explore upstream interactions, we used the GeneMANIA database to predict protein–protein interaction (PPI) networks of the key genes (Supplementary Fig. 4). GO and KEGG enrichment analyses were then performed. BP enrichment was mainly related to receptor signaling, chemical responses, and stimulus regulation, MF enrichment included receptor binding and transducer activity, and CC enrichment was associated with plasma membrane and extracellular regions (Fig. 4A). KEGG analysis identified PI3K–Akt, cancer, and JAK–STAT signaling as major pathways (Fig. 4B).
Fig. 4.
(A) GO analysis of key and upstream genes. (B) KEGG analysis of key and upstream genes (www.kegg.jp/kegg/kegg1.html). (C) ROC analysis of C5AR1, NRP1, TYK2 and VEGFA. (D) Prognostic analysis of C5AR1, NRP1, TYK2 and VEGFA.
ROC analysis and prognostic value of key genes in THCA
To assess the diagnostic potential of the key genes, we conducted ROC analysis. C5AR1 (AUC = 0.79), NRP1 (AUC = 0.86), TYK2 (AUC = 0.70), and VEGFA (AUC = 0.75) showed significant discriminatory power (Fig. 4C), indicating diagnostic relevance for THCA.
We further evaluated their prognostic value using a pan-cancer survival analysis based on TCGA, TARGET, and GTEx datasets from the UCSC Xena Browser. Data were uniformly processed, and samples with short follow-up or small group sizes were excluded. The analysis identified C5AR1 as a significant risk factor in THCA (P = 0.02), while NRP1, TYK2, and VEGFA were not statistically significant (P > 0.05) (Fig. 4D).
Expression analysis of C5AR1 in THCA
As C5AR1 showed significant diagnostic and prognostic relevance, we performed further analysis. Immunohistochemical results showed elevated C5AR1 protein expression in THCA tissues compared with normal tissues (Fig. 5A). qPCR confirmed significantly higher C5AR1 mRNA levels in THCA tissues (Fig. 5B). ESTIMATE algorithm analysis revealed a strong correlation between C5AR1 expression and immune infiltration (R = 0.56, P = 1.9 × 10^−43) (Fig. 5C). These results were validated using TCGA data, which also demonstrated significantly higher C5AR1 expression in THCA (Fig. 5D–F). In validation datasets GSE29265 and GSE53157, C5AR1 remained consistently upregulated (Fig. 5G,H). The AUC values in these datasets were 0.72 and 0.60, confirming diagnostic potential (Fig. 5I,J).
Fig. 5.
(A) Immunohistochemical staining of C5AR1. (B) Quantitative analysis of C5AR1 mRNA transcription levels in THCA. (C) ESTIMATE algorithm analysis of the immune invasion of C5AR1 in THCA. (D) The expressions of C5AR1 in THCA and normal tissue samples. (E) Composition of in normal tissue and THCA. Every dot represents one dataset, and the gray line represents samples from the same dataset. Paired Wilcoxon-ranked sum test, Bonferroni corrected. (F) The expression of C5AR1 in the progression of THCA. (G) The expression of C5AR1 between THCA and control group in GSE29265 validation set. (H) The expression of C5AR1 between THCA and control group in GSE53157 validation set. (I) The ROC curve of the diagnostic efficacy verification in GSE29265 validation set. (J) The ROC curve of the diagnostic efficacy verification in GSE53157 validation set.
Immunoinfiltration analysis of C5AR1 in THCA
Because C5AR1 is an immune-related DEG, we further examined its association with immune infiltration. QUANTISEQ analysis showed that higher C5AR1 expression correlated with increased infiltration of B cells, M1/M2 macrophages, neutrophils, NK cells, CD8⁺ T cells, and Tregs, while CD4⁺ T cells and dendritic cells were decreased (Fig. 6A). Other algorithms (xCELL, CIBERSORT, EPIC, TIMER) produced consistent overall trends, though with some variations in specific immune cell types (Fig. 6B–E). IPS analysis revealed a negative correlation between C5AR1 expression and immunotherapy response scores, indicating that elevated C5AR1 expression may suppress antitumor immunity and promote THCA progression (Fig. 6F).
Fig. 6.
(A) QUANTISEQ scores for C5AR1. (B) xCELL scores for C5AR1. (C) CIBERSORT scores for C5AR1. (D) EPIC scores for C5AR1. (E) TIMER scores for C5AR1. (F) IPS scores for C5AR1.
Discussion
Thyroid cancer (THCA) shows a rising global incidence but maintains a relatively low fatality rate30–33. However, it is prone to lymph node metastasis, which contributes to poor prognosis and reduced survival34–38. Effective THCA management depends on early detection, but reliable biomarkers remain insufficient39,40. This study aimed to identify differentially expressed genes (DEGs) that could serve as biomarkers and potential therapeutic candidates for THCA. Our integrative analysis identified C5AR1 as a key regulator potentially influencing immune responses within the tumor microenvironment, suggesting its value for early diagnosis and disease monitoring.
By integrating multiple immune deconvolution algorithms (ESTIMATE, QUANTISEQ, xCell, CIBERSORT, EPIC, TIMER, and IPS) with machine learning–based gene prioritization and qPCR validation, we consistently observed that C5AR1 expression correlates with immune infiltration patterns. These findings support the immunological and clinical relevance of C5AR1 specifically in THCA, distinguishing it from previously reported associations in other cancers.
Through comprehensive bioinformatic analysis, several DEGs linked to protein synthesis and cancer-related pathways were identified, among which C5AR1 emerged as the most robust candidate, showing significantly elevated expression in THCA tissues compared with normal controls. Pro-inflammatory mediators such as IL-6 and TNF-α were also upregulated, suggesting that C5AR1 may participate in inflammation-driven tumor progression. These results collectively indicate that C5AR1 contributes to an immunosuppressive microenvironment that promotes tumor growth.
C5AR1, a G protein–coupled receptor (GPCR) known to mediate inflammation and immune dysregulation41, has been implicated in multiple malignancies42–48. Our study confirms its significant upregulation in THCA. The PPI network analysis highlights its central role in immune and chemotaxis-related pathways, reinforcing its biological relevance. While our results demonstrate a strong association between C5AR1 expression and immune infiltration, this relationship should be interpreted as correlative rather than causal, and functional validation remains necessary.
Although consistent with findings in other cancers, our data emphasize the need for further mechanistic studies to clarify how C5AR1 modulates immune cell behavior in THCA. Future research should assess both local and systemic immune effects to establish a comprehensive understanding of its immunoregulatory role. In addition, other DEGs related to inflammation and metabolism were identified, but their biological significance requires further investigation.
Several limitations should be acknowledged. First, external independent datasets for survival validation were unavailable, limiting generalizability. Second, although validation cohorts showed significant AUC values, these were modest, possibly due to inter-cohort heterogeneity and sample size. Third, no in vivo or in vitro experiments were conducted, leaving the biological role of C5AR1 to be validated in future studies.
Ultimately, if validated experimentally, C5AR1 may serve as a potential immunoregulatory target in THCA, offering new directions for immunotherapy development and improved patient stratification.
Conclusions
Through integrated bioinformatics analysis, this study identified C5AR1 as a potential regulator of THCA progression via immune cell modulation. However, the underlying biological mechanisms remain to be clarified. This research provides an important foundation for understanding the immune-related role of C5AR1 in THCA, and the analytical framework used here may be extended to other cancers.
While our findings suggest diagnostic and prognostic potential, further research is needed to fully define C5AR1’s functional role in thyroid cancer. Importantly, C5AR1 should be regarded primarily as a biomarker reflecting immune activity and disease state, and future experimental validation could support its development as a therapeutic target.
Acknowledgements
We gratefully acknowledge The First Hospital of Hebei Medical University for providing conducive research spaces that have greatly facilitated our work and collaboration.
Author contributions
Conceptualisation, Y.S., B.L. and F.S.; software, X.M.; Methodology, L.X.; validation, H.W.; Project administration, X.M.; formal analysis, Z.L.; investigation, J.F.; data curation, T.J.; Funding acquisition, Y.S.; Resources, B.L.; Visualization, F.S.; writing—original draft preparation, Y.S., B.L. and F.S.; writing—review and editing, B.L. and F.S.; supervision, Q.W.
Funding
This study was supported by grants from the Health Innovation Project of Hebei Provincial Science and Technology Department (No. 22372409D), the Scientific Research Project of Health Department of Hebei Province (No. 20231063), the Scientific Research Project of Health Department of Hebei Province (No. 20160686), the Natural Science Foundation of Hebei Province (No. H2022206418), The First Hospital of Hebei Medical University Spark Program Outstanding Youth Fund (No. XH202212), the Hebei Province Young Top Talent Program (No. BJ2025056), and the Scientific Research Project of Health Department of Hebei Province (No. 20240643).
Data availability
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Ethical approval
All procedures involving human participants enrolled in this study were conducted in accordance with the ethical standards of the Clinical Research Ethics Committee of The First Hospital of Hebei Medical University (NO. S00618).
Consent for publication
The authors declared that they had reviewed, approved, and consented to the publication of the final version of the manuscript.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Yueyao Sun, Lei Xu and Xiaowen Ma contributed equally to this work.
Contributor Information
Bo Liu, Email: Lb123@hebmu.edu.cn.
Fangjian Shang, Email: Sfj123@hebmu.edu.cn.
Data Availability Statement
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.






