ABSTRACT
Gallbladder cancer (GBC) is the most common biliary tract neoplasm. Identifying biomarkers for GBC initiation and progression remains a challenge. This study aimed to identify GBC biomarkers using machine learning and bioinformatics. Differentially expressed genes (DEGs) were identified from two microarray datasets (GSE100363, GSE139682) from the GEO database. Gene Ontology and pathway analyses were performed using DAVID. A protein–protein interaction network was constructed using STRING, and hub genes were identified via three ranking algorithms (degree, MNC and closeness centrality). Feature selection methods (Pearson correlation, recursive feature elimination) were applied to extract key gene subsets. Machine learning models (SVM, NB and RF) were trained on GSE100363 and validated on GSE139682 to assess predictive performance. Biomarkers were further validated using the GEPIA database. A total of 146 DEGs were identified, including 39 upregulated and 107 downregulated genes. Eleven hub genes were identified, with SLIT3, COL7A1 and CLDN4 strongly correlated with GBC. Machine learning results confirmed their diagnostic potential. The study highlights NTRK2, COL14A1, SCN4B, ATP1A2, SLC17A7, SLIT3, COL7A1, CLDN4, CLEC3B, ADCYAP1R1 and MFAP4 as crucial genes associated with GBC. SLIT3, COL7A1 and CLDN4 serve as highly predictive biomarkers, and findings can improve early diagnosis and prognosis, aiding clinical decision‐making.
Keywords: bioinformatics, feature selection, gall bladder cancer, hub genes, machine learning, PPI network
This study applied bioinformatics and machine learning to identify biomarkers for gallbladder cancer (GBC) using gene expression data. A total of 146 differentially expressed genes were identified, with SLIT3, COL7A1 and CLDN4 emerging as key predictive biomarkers. Machine learning models confirmed their strong diagnostic potential. These findings could improve early detection and prognosis, aiding clinical decision‐making.

1. Introduction
Gallbladder cancer (GBC), the most common type of biliary tract cancer (BTC), has an unfavourable outcome and a high death rate [1, 2, 3, 4]. With an average survival of < 6 months and an overall 5‐year survival rate of under 5%, this malignancy is extremely deadly. Because the cancer spreads silently and is often diagnosed late, early detection is crucial. Among individuals undergoing routine cholecystectomy for suspected gallstone disease, 0.5%–1.5% were found to have GBC [5]. According to the eighth American Joint Committee on Cancer (AJCC) guideline [6], the most effective treatment for early‐stage GBC is surgical resection; for later‐stage GBC, chemotherapy, radiation therapy, immunotherapy and targeted therapy are advised. The highly aggressive and metastatic features of advanced GBC, such as local tumour growth, hepatic invasion and lymph node metastases, result in a minimal response to treatment and a poor outlook for patients [7, 8]. Although the molecular mechanisms of GBC remain unclear, several studies have revealed essential roles for numerous biological processes in cancer spread and invasion, including immune evasion [9], epithelial‐mesenchymal transition (EMT) [10, 11] and cancer stem cells [12]. It is therefore crucial to investigate new biomarkers connected to invasion and spread in order to improve diagnosis for GBC patients.
Transcriptome analysis using high‐throughput technologies, including microarrays and RNA sequencing, is an emerging technique in cancer research for identifying pathways and genes as possible predictive and diagnostic biomarkers [13, 14]. Bioinformatics is also now employed to uncover biomarkers linked to specific diseases, and such biomarkers could advance the prevention and treatment of GBC [15]. Relevant gene biomarkers for GBC are currently being identified by bioinformatic approaches using gene expression data, and recent research has shown the predictive importance of several differentially expressed genes (DEGs) in GBC. However, the outcomes of such research have been inconsistent, possibly because different statistical techniques were employed. Furthermore, the predictive significance of DEGs in GBC has rarely been examined using machine learning (ML), and the enriched pathways, Gene Ontology (GO) functions and interaction networks of the DEGs remain unclear. Combining ML and bioinformatic approaches can strengthen the learning and validation of GBC signatures and produce reliable results [16]. Although earlier studies such as Kulasingam and Diamandis [15] laid the foundation for biomarker discovery in gallbladder cancer, more recent investigations have leveraged transcriptomics and bioinformatics approaches to uncover key genes and therapeutic targets (Yang et al. [17]; Cao et al. [18]; Singh et al. [19]). These contemporary studies further support the need for integrative machine learning frameworks, as adopted in our work, to advance the early detection and prognosis of GBC.
This study is significant because it uses bioinformatics and machine learning to analyse crucial genes for GBC and validate their diagnostic usefulness. First, we gathered two GBC‐related microarray datasets from the GEO database and used bioinformatics to identify key DEGs in them. The DEGs were then used for functional analysis and PPI network construction. Next, we used the degree, MNC and closeness approaches to determine the top 15 hub genes under each ranking. The 11 genes at the intersection of the three rankings were regarded as the ‘real’ hub genes. In parallel, significant genes were identified using feature selection methods, namely Pearson correlation and recursive feature elimination. The 11 real hub genes and the significant genes identified by feature selection were then used to train ML models on the GSE100363 dataset using support vector machine (SVM), naive Bayes (NB) and random forest (RF) algorithms. The models were tested on the independent GSE139682 dataset to validate the biomarkers, which were additionally validated using the GEPIA database. Finally, the identified biomarkers were used to explore associated drugs, diseases and chemical compounds to aid future interpretation or to suggest analytical methods. Figure 1 shows the detailed procedure of this investigation.
FIGURE 1.

Flow diagram of the proposed methodology. First, two microarray datasets (GSE100363 and GSE139682) were downloaded from GEO. Second, differentially expressed genes (DEGs) were identified from those datasets. Next, Gene Ontology analysis and pathway analysis were performed on the identified DEGs to screen significant GO terms and pathways. After that, the protein–protein interaction (PPI) network was constructed. Subsequently, three ranking algorithms (degree, MNC and closeness centrality) were employed to identify the top 15 hub genes under each ranking; the three lists shared 11 genes, which were taken as the real hub genes. In parallel, two feature selection methods (Pearson correlation and recursive feature elimination) were employed to identify further significant gene subsets. Afterwards, the hub genes and the significant gene subsets were used to train machine learning models on the GSE100363 dataset using NB, SVM and RF algorithms. Finally, the models were tested on the independent GSE139682 dataset to validate the biomarkers. Additionally, the real hub genes were validated using the GEPIA web tool. The identified biomarkers were also used to explore associated drugs, diseases and chemical compounds.
1.1. Research Gap
Despite significant advancements in understanding gallbladder cancer (GBC), several gaps persist in previous research. Although numerous studies have highlighted the essential roles of various biological processes in cancer spread and invasion, the detailed molecular mechanisms of GBC remain elusive. Specifically, prior research has identified key biological processes such as immune evasion [9], epithelial‐mesenchymal transition (EMT) [10, 11] and cancer stem cells as critical in GBC progression, yet the precise pathways and interactions governing these processes are not fully understood. Current approaches have largely relied on traditional bioinformatics and high‐throughput methods, such as microarrays and RNA sequencing, to identify potential diagnostic and prognostic biomarkers. Machine learning has seldom been applied to examine the predictive significance of differentially expressed genes (DEGs) in GBC, and this lack of integration may hinder more reliable and reproducible outcomes. Moreover, although gene expression data have been extensively analysed to identify relevant biomarkers for GBC, there remains a significant gap in understanding the enrichment pathways, Gene Ontology (GO) functions and interaction networks of these DEGs. Previous studies have not thoroughly investigated these aspects, which are crucial for elucidating the underlying biological mechanisms and for developing effective therapeutic strategies. Addressing these gaps is essential for advancing the diagnosis, treatment and prognosis of GBC and for providing a more comprehensive understanding of this aggressive cancer.
1.2. Research Question
This work uses a combination of ML and bioinformatics tools to find and evaluate important biomarkers for gallbladder cancer (GBC). Specifically, it seeks to answer the following questions:
How can bioinformatics and machine learning approaches be used to identify biomarkers for GBC, and which method produces the optimal biomarkers?
Which differentially expressed genes (DEGs) are significant in distinguishing between GBC and healthy samples?
How can machine learning models be validated to accurately classify GBC samples based on these DEGs?
What is the potential diagnostic and prognostic value of the identified biomarkers in GBC?
How can the identified biomarkers be used to find associated drugs, diseases and chemical compounds that improve interpretation and guide analytical methods?
These questions arise from the need to identify biomarkers more precisely, which is crucial for the effective treatment and management of GBC. They also aim to bridge the gap between traditional clinical methods and modern computational techniques by leveraging high‐throughput gene expression data and advanced algorithms. Specifically, the study seeks to identify DEGs that can serve as reliable biomarkers for GBC, develop predictive models using these biomarkers and validate the effectiveness of these models in distinguishing between healthy and cancerous samples.
2. Methods
2.1. Data Collection
The National Center for Biotechnology Information's (NCBI) Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) [20] is a publicly accessible repository of high‐throughput genomic data, including chip, microarray and gene expression data. Created in 2000, the database holds high‐throughput gene expression data from research institutes throughout the world and provides access to research papers and the associated gene expression data [21]. To find gene expression datasets for gallbladder cancer in the GEO database, we used the search phrase ‘Gallbladder Cancer AND Homo sapiens’, refined by ‘expression profiling by array’. We obtained two datasets (GSE100363 and GSE139682) produced on the GPL20795 platform, HiSeq X Ten (Homo sapiens), containing gene expression data from gallbladder tumour and normal samples. The GSE100363 series comprises 8 samples: 4 gallbladder tumours and 4 normal samples. The GSE139682 series comprises 20 samples: 10 gallbladder tumours and 10 normal samples. Figure 2 shows the distribution of tumour and normal sample counts across the selected datasets.
FIGURE 2.

Distribution of the number of samples of both tumour and normal among selected datasets. The GSE139682 series comprises 20 samples: 10 gallbladder tumours and 10 normal samples. The GSE100363 series comprises 8 samples: 4 gallbladder tumours and 4 normal samples.
2.2. DEGs Identification
Genes whose expression levels differ notably across experimental conditions, such as distinct tissues, developmental stages or disease states, are referred to as differentially expressed genes (DEGs). Upregulated genes show higher expression under one condition than another; conversely, downregulated genes show lower expression under one condition than another. Upregulation can indicate the activation of specific biological processes, a response to environmental stimuli or involvement in disease pathways. Downregulation may indicate suppression of certain biological processes, adaptation to environmental changes or inhibition of disease pathways.
GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r), a statistical analysis tool included in the GEO database, allows interactive statistical analysis of gene expression profiles. It compares groups of samples in GEO series datasets to discover DEGs across experimental conditions [22]. We used GEO2R to identify DEGs between GBC tumour and normal samples and to obtain the log2 fold change (log2FC) value of each gene. The fold change of a gene is the ratio between its expression values in cancerous and normal samples. The sign of the log2FC value was used to separate upregulated (positive) from downregulated (negative) genes. Genes with an adjusted p‐value < 0.05 and log2FC > 1 were considered significantly overexpressed, and genes with an adjusted p‐value < 0.05 and log2FC < −1 were considered significantly downexpressed. DEGs were visualised using a volcano plot. The Venny web tool was used to make Venn diagrams (https://bioinfogp.cnb.csic.es/tools/venny/) [23]. Algorithm 1 presents the overall procedure for identifying upregulated and downregulated DEGs.
ALGORITHM 1. Algorithm of upregulated and downregulated DEG identification.
Input:
‐ GEO series datasets containing gene expression profiles.
‐ Define empty sets U = ∅ (upregulated DEGs) and D = ∅ (downregulated DEGs).
‐ Set adjusted p‐value threshold pth = 0.05
‐ Set log2 fold change threshold log2 FCth = 1
Output:
‐ Lists of upregulated and downregulated DEGs.
1: for each gene gi in the dataset do
2: Use GEO2R tool to obtain values of log2 FC and the adjusted p‐value padj
3: if padj(gi) < pth then
4: if log2 FC(gi) > log2 FCth then
5: U ← U ∪ {gi}
6: else if log2 FC(gi) < −log2 FCth then
7: D ← D ∪ {gi}
8: end if
9: end if
10: end for
11: return U, D
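The filtering in Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the GEO2R output has already been parsed into records holding a gene symbol, a log2FC value and an adjusted p‐value; the `GeneStat` type, the `split_degs` helper and the toy values are hypothetical and are not data from this study. The final set intersections mirror how DEGs common to two datasets could be obtained.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class GeneStat(NamedTuple):
    gene: str
    log2fc: float  # log2 fold change, as reported by a tool such as GEO2R
    padj: float    # adjusted p-value


def split_degs(stats: Iterable[GeneStat],
               p_th: float = 0.05,
               log2fc_th: float = 1.0) -> Tuple[Set[str], Set[str]]:
    """Return (upregulated, downregulated) gene sets following Algorithm 1."""
    up, down = set(), set()
    for s in stats:
        if s.padj >= p_th:
            continue  # not statistically significant
        if s.log2fc > log2fc_th:
            up.add(s.gene)
        elif s.log2fc < -log2fc_th:
            down.add(s.gene)
    return up, down


# Toy values for illustration only (hypothetical genes G1..G4).
toy_a = [GeneStat("G1", 2.3, 0.001), GeneStat("G2", -1.8, 0.01),
         GeneStat("G3", 0.4, 0.02), GeneStat("G4", 3.0, 0.2)]
toy_b = [GeneStat("G1", 1.5, 0.03), GeneStat("G2", -2.2, 0.004),
         GeneStat("G3", 1.9, 0.01)]

up_a, down_a = split_degs(toy_a)
up_b, down_b = split_degs(toy_b)

# DEGs shared by both datasets, split by direction of regulation.
common_up = up_a & up_b
common_down = down_a & down_b
```

Keeping the two directions in separate sets makes the later intersection step unambiguous: a gene that is upregulated in one dataset and downregulated in the other is not counted as a common DEG.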
2.3. Gene Ontology and Pathway Enrichment Analysis of DEGs
Gene Ontology (GO) analysis describes the biology of a gene or gene group in terms of molecular functions (MFs), biological processes (BPs) and cellular components (CCs). GO analysis has recently become an essential component of systems biology investigations. Pathway enrichment analysis is another tool that helps investigate the biological relationships among groups of genes based on a thorough genome‐scale analysis [24]. In this work, the GO repository [25] was used to investigate the Gene Ontology terms related to the DEGs, and the REACTOME [26] database was used for pathway analysis. The Database for Annotation, Visualisation and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov/) is an online biological information resource that combines biological data with analysis tools to provide comprehensive functional annotation of genes and proteins [27], allowing users to extract biological information. The DAVID online database was used to perform the functional analyses and examine how the DEGs work [25].
2.4. Construction of Protein–Protein Interaction Network
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING; http://string‐db.org) is a public database used for predicting protein–protein interaction (PPI) networks [28]. The STRING database holds data on over 3 billion interactions, 20 million proteins and 5000 species. These protein interactions encompass both co‐expression correlations and direct physical interactions. Known protein–protein interactions can be retrieved from STRING to gain more knowledge of the intricate regulatory networks seen in organisms. According to earlier research, examining the functional relationships between proteins can reveal fresh information on the origin or progression of disorders [29]. For this study, a PPI network of the DEGs was constructed using STRING. To include interactions of medium to high confidence, we selected a minimum confidence score threshold of 0.4. This threshold was chosen to maintain good network stability while ensuring sufficient coverage of interactions among the DEGs.
2.5. Hub Gene Identification
Hub genes are often used to narrow the DEGs down to the specific group that most effectively separates diseased samples from controls. The constructed PPI network was visualised using the Cytoscape software [30] (https://cytoscape.org/), and the Cytoscape CytoHubba plugin was used to determine hub genes within the PPI network using a variety of ranking techniques.
CytoHubba offers numerous node ranking techniques, both global and local [31]. A global method considers a node's relationship to the entire network, whereas a local method considers only the node's relationship to its immediate neighbours. Three ranking algorithms were employed to determine the hub genes: one global method (closeness centrality) and two local methods (degree and maximum neighbourhood component [MNC]). The degree of a node v is the number of nodes adjacent to it. The MNC of v is the size of the largest connected component of the subgraph induced by its neighbourhood N(v), the set of nodes adjacent to v, excluding v itself. Finally, closeness centrality is derived from the lengths of the shortest paths linking a node to all other nodes in the network and indicates how close that node is to the rest of the network. We selected the top 15 genes from each ranking method for use in building the machine learning models.
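The three rankings and their intersection can be illustrated on a toy graph. The sketch below is a plain‐Python approximation and not the CytoHubba implementation: the helper names and the toy edge list are hypothetical, and closeness is computed here as the sum of reciprocal shortest‐path lengths, which is one common formulation and is an assumption on our part.

```python
from collections import deque


def neighbours(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source, allowed=None):
    """Shortest-path lengths from source, optionally restricted to `allowed` nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist and (allowed is None or nxt in allowed):
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def degree(adj, v):
    """Number of nodes adjacent to v."""
    return len(adj[v])


def mnc(adj, v):
    """Size of the largest connected component among the neighbours of v."""
    nbrs, seen, best = adj[v], set(), 0
    for start in nbrs:
        if start not in seen:
            component = set(bfs_distances(adj, start, allowed=nbrs))
            seen |= component
            best = max(best, len(component))
    return best


def closeness(adj, v):
    """Sum of reciprocal shortest-path lengths from v to every reachable node."""
    return sum(1.0 / d for d in bfs_distances(adj, v).values() if d > 0)


def top_k(adj, score, k):
    """Top-k nodes by score; ties are broken alphabetically for reproducibility."""
    ranked = sorted(adj, key=lambda n: (-score(adj, n), n))
    return set(ranked[:k])


# Toy network (hypothetical nodes A..E), top 2 per ranking instead of top 15.
adj = neighbours([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")])
hubs = top_k(adj, degree, 2) & top_k(adj, mnc, 2) & top_k(adj, closeness, 2)
```

Taking the intersection rather than the union keeps only nodes that rank highly under both local and global criteria, which is the same idea as retaining the genes shared by the three top‐15 lists.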
2.6. Significant DEGs Identification Utilising Feature Selection Methods
To further identify the important DEGs that most effectively differentiate diseased samples from healthy ones, we used feature selection techniques, namely Pearson correlation and recursive feature elimination (RFE), which select the most significant attributes of high‐dimensional data.
2.6.1. Recursive Feature Elimination
In a nutshell, RFE ranks the features by importance and returns the top n features after eliminating the less important ones, where n is the number of features requested by the user. To do this, the data are first divided into training (80%) and validation (20%) sets. A linear support vector classifier is then used as the estimator. RFE is run for every feature count from 1 to the total number of available features. For each count, the model is trained and its accuracy is assessed using 5‐fold cross‐validation, and the number of features and the associated accuracy are recorded. Finally, the cross‐validated accuracy is plotted against the number of features to help determine the number of features that produces the best performance.
To obtain the feature set that yields the best performance, the estimator parameter is set to the classifier to be used, and the n_features_to_select parameter is set to the number of features to retain. After configuration, the fit() method is called to fit the model to the training dataset and select the features. In our research, we supplied the sample matrix containing all 146 DEGs to the RFE framework, using SVM as the estimator and setting n_features_to_select to the number of features that yielded the best performance.
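The RFE procedure described above can be sketched with scikit‐learn. This is an illustrative sketch under stated assumptions, not the study's code: the expression matrix is replaced by synthetic data from `make_classification`, the matrix size is arbitrary, and wrapping RFE and the classifier in a pipeline (so that selection is refitted inside each cross‐validation fold) is our choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes); not study data.
X, y = make_classification(n_samples=60, n_features=12, n_informative=4,
                           n_redundant=0, random_state=0)

# 80% training / 20% validation split, stratified by class label.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Evaluate every feature count from 1 to all features with 5-fold CV.
scores = {}
for n in range(1, X.shape[1] + 1):
    model = make_pipeline(
        RFE(SVC(kernel="linear"), n_features_to_select=n),
        SVC(kernel="linear"))
    scores[n] = cross_val_score(model, X_train, y_train, cv=5).mean()

# Refit RFE with the best-scoring feature count and read off the kept columns.
best_n = max(scores, key=scores.get)
selector = RFE(SVC(kernel="linear"), n_features_to_select=best_n)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.support_)
```

Plotting `scores` against the feature count gives the accuracy curve described in the text; `selected` holds the column indices of the retained features.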
2.6.2. Pearson Correlation
The Pearson correlation measures the linear relationship between two variables. Strongly correlated features are close to linearly dependent and therefore have a similar effect on the dependent variable, so one feature of such a pair can be removed as redundant. In our research, we entered the sample matrix containing all 146 DEGs as features into a Pearson correlation analysis and compared the pairwise feature correlations. Whenever two features had an absolute Pearson correlation higher than 0.75, one feature of the pair was eliminated as redundant. This correlation threshold is frequently applied in bioinformatics research to remove strongly correlated pairs while preserving a set of largely independent features.
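A minimal sketch of this correlation filter is given below. It assumes a greedy rule (keep the first feature encountered and drop any later feature that correlates with an already kept one above the threshold); the source does not state which member of a pair was dropped, so this rule, the helper names and the toy values are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.75):
    """Keep a feature only if |r| with every already kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy expression values for three hypothetical genes across five samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # close to 2 * geneA, so redundant
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_correlated(toy)
```

Because the rule is order dependent, a different feature ordering can retain a different member of each correlated pair; the number of retained features may also differ.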
2.7. Developing ML Models on GSE100363 Dataset
To distinguish GBC tumours from healthy samples, we created three ML models: random forest (RF), support vector machine (SVM) and naive Bayes (NB) classifiers. SVM is a supervised learning method used for regression and classification. For classification, the SVM constructs hyperplanes that maximise the separation between classes. Random forest [32, 33] is an ensemble classifier that builds numerous decision trees and outputs the class that is the mode of the individual trees' outputs. Each classification tree casts a vote for a class, and the algorithm chooses the class with the highest number of votes across all trees. Naive Bayes is a family of statistical classification algorithms based on Bayes' theorem, which calculates the probability of a class given a set of features. It operates under the assumption that the features are independent of each other, a simplification that makes the computations more manageable, hence the term ‘naive’ [34].
Moreover, we defined a hyperparameter search space for each machine learning model in order to determine the optimal set of parameters. For hyperparameter optimisation of RF, NB and SVM, a grid search was employed after a randomised search. For the randomised search, we generated a grid of hyperparameters and trained and tested our models on random hyperparameter combinations; the best‐performing combinations were then identified. Table 1 summarises the tested parameters and the selected parameters (in a separate column). We employed 4‐fold cross‐validation (CV) to assess the models. The StratifiedKFold method divides the data into four folds, ensuring that each fold has an equal proportion of positive and negative observations. Each fold is used once for testing while the remaining folds are used for training, so the process is performed four times. We computed the accuracy score for every fold and used these results to calculate the mean accuracy.
TABLE 1.
Search space parameters for RF, NB and SVM model optimisation.
| Model | Hyperparameter | Search space | Selected parameters |
|---|---|---|---|
| SVM | C | [100, 10, 1, 0.1] | 1, 0.1 |
| | Gamma | [1, 0.1, ‘auto’, ‘scale’] | ‘scale’ |
| | Kernel | [‘poly’, ‘rbf’, ‘linear’] | ‘rbf’, ‘linear’ |
| RF | ‘min_samples_leaf’ | [4, 2, 1] | 1 |
| | ‘max_depth’ | [30, 20, 10] | 10 |
| | ‘n_estimators’ | [200, 100, 50] | 50 |
| | ‘min_samples_split’ | [10, 5, 2] | 2 |
| NB | ‘var_smoothing’ | [1e‐9, 1e‐8, 1e‐7, 1e‐6, 1e‐5] | 1e‐9, 1e‐7 |
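The randomised search stage over the Table 1 search space, evaluated with 4‐fold stratified cross‐validation, can be sketched with scikit‐learn as follows. This is an illustration under stated assumptions: the data are synthetic, `GaussianNB` is assumed as the naive Bayes variant (it is the scikit‐learn variant that exposes `var_smoothing`), and `n_iter=5`, the shuffling and the random seeds are arbitrary choices. A subsequent grid search could be run in the same way with `GridSearchCV` over a narrowed grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the training matrix (samples x genes); not study data.
X, y = make_classification(n_samples=40, n_features=11, n_informative=5,
                           random_state=0)

# Search spaces as listed in Table 1.
search_spaces = {
    "SVM": (SVC(), {"C": [100, 10, 1, 0.1],
                    "gamma": [1, 0.1, "auto", "scale"],
                    "kernel": ["poly", "rbf", "linear"]}),
    "RF": (RandomForestClassifier(random_state=0),
           {"min_samples_leaf": [4, 2, 1],
            "max_depth": [30, 20, 10],
            "n_estimators": [200, 100, 50],
            "min_samples_split": [10, 5, 2]}),
    "NB": (GaussianNB(),
           {"var_smoothing": [1e-9, 1e-8, 1e-7, 1e-6, 1e-5]}),
}

# Four stratified folds keep the class proportions equal in every fold.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

results = {}
for name, (estimator, space) in search_spaces.items():
    search = RandomizedSearchCV(estimator, space, n_iter=5, cv=cv,
                                scoring="accuracy", random_state=0)
    search.fit(X, y)
    # best_score_ is the mean accuracy over the four folds.
    results[name] = (search.best_params_, search.best_score_)
```

With only a handful of samples per fold, as in an 8‐sample training set, fold accuracies are coarse, so the mean accuracy should be read with that limitation in mind.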
2.8. Evaluation Tools
Our proposed methodology was assessed using accuracy, precision, recall, F1‐score and the ROC curve, which are described below. Here, TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.
- Accuracy score: Measures the proportion of correct predictions among all predictions made by the model. It is calculated as:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
- Precision: Measures how well a classifier predicts the positive class. It is the ratio of true positive (TP) predictions to the sum of TP and false positive (FP) predictions. Here, a TP is a positive prediction for a sample whose real class is positive, and an FP is a positive prediction for a sample whose real class is negative. It is calculated as:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
- Recall: Measures the proportion of actual positive samples that the classifier correctly identifies. It is the ratio of TP predictions to the sum of TP and false negatives (FN), where an FN is a negative prediction for a sample whose real class is positive. It is calculated as:
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
- F1‐score: Summarises the model's performance with respect to both precision and recall as their harmonic mean:
$$\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
- ROC curve: The receiver operating characteristic (ROC) curve is a plot used to evaluate a binary classifier's performance across several thresholds. For each threshold setting, the sensitivity, or true positive rate, is plotted against the false positive rate (1 − specificity). Greater values along the y‐axis signify more TPs and fewer FNs, whereas smaller values along the x‐axis indicate fewer FPs and more TNs. The area under the ROC curve (AUC‐ROC), often used as a summary indicator of a classifier's overall efficacy, ranges from zero to one. A greater AUC‐ROC indicates a better ability to discriminate between classes.
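As a worked example, the four threshold‐dependent scores can be computed directly from confusion‐matrix counts. The function name and the counts below are hypothetical and chosen only for illustration; they are not results from this study.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive,
    so that no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 10 tumour samples (8 detected), 10 normal (9 detected).
m = classification_metrics(tp=8, tn=9, fp=1, fn=2)
# accuracy = 17/20, precision = 8/9, recall = 8/10, F1 = 16/19
```

The F1 value also equals 2TP / (2TP + FP + FN), which gives a quick cross‐check of the harmonic‐mean formula.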
2.9. Validation of Biomarkers Gene Expression
The expression levels of the biomarkers in GBC and normal tissues were verified using GEPIA2 (Gene Expression Profiling Interactive Analysis, http://gepia2.cancer‐pku.cn/), a resource built on data retrieved from the UCSC Xena database that comprises 9736 tumour samples and 8587 normal samples. Differences with p‐values below 0.05 were considered statistically significant [35].
2.10. Drugs, Disease and Chemicals Identification Associated With Biomarkers
To accelerate drug development and understand how different medications work, it is essential to identify the drug molecules linked to the biomarkers. Additionally, researchers can better understand the biological mechanisms behind a disease by examining the genes linked to it. The Drug Signatures Database (DSigDB), accessed through the Enrichr web tool (https://maayanlab.cloud/Enrichr), was used to predict candidate drugs. Disease‐associated genes were explored using the disease gene network (DisGeNET) database, also via Enrichr. The potential biomarkers served as the input for these analyses. Interactions between proteins (genes) and chemical compounds were also examined, which helps to understand how various chemicals (e.g., environmental toxins) interact with specific proteins.
3. Results
3.1. DEGs Identification
Initially, a total of 22,829 and 22,031 genes were obtained from the GSE139682 and GSE100363 datasets, respectively. After applying the adjusted p‐value and log2FC thresholds, 1800 and 432 DEGs remained, respectively (Table 2). Comparing the differential expression results between gallbladder tumour tissues and normal tissues across the two datasets yielded 146 common DEGs. Of these, 39 genes were significantly upregulated and 107 genes were downregulated in both GSE100363 and GSE139682 (Figure 3). The volcano plots of differential expression are shown in Figure 4.
TABLE 2.
Dataset analysis details before filtration and after log2FC and adjusted p‐value filtration.
FIGURE 3.

Venn intersection diagrams of the DEGs of the two datasets: (a) represents the common DEGs, (b) shows commonly upregulated genes and (c) shows commonly downregulated genes. Diagrams were generated using the Venny.
FIGURE 4.

Volcano plots of the DEGs. Blue dots indicate downregulated genes and red dots indicate upregulated genes. Thresholds of logFC > 1 for overexpression and logFC < −1 for downexpression were applied. Genes without differential expression are denoted by black dots. Results were generated using GEO2R.
3.2. Gene Ontology and Pathway Enrichment Analysis of DEGs
Using the DAVID online tool, Gene Ontology functional analysis and pathway enrichment analysis were carried out on the 146 common DEGs to learn more about their function. The analysis revealed significantly enriched Gene Ontology terms and pathways for the discovered DEGs. Based on the GO analysis, the overexpressed DEGs are primarily linked to biological processes such as ‘cell adhesion’, ‘epidermis development’, and ‘keratinisation’; cellular components such as the ‘plasma membrane’ and ‘integral membrane’; and molecular functions such as ‘structural molecule activity’ (Table 3). Likewise, the down‐expressed DEGs are primarily linked to ‘cell differentiation’, ‘nervous system development’ and ‘cell adhesion’ for biological process; ‘the plasma membrane’ and ‘integral component of membrane’ for cellular component; and ‘calcium ion binding’ and ‘heparin‐binding’ for molecular function (Table 4). To further explore the pathways enriched in the DEGs, a REACTOME pathway enrichment analysis was carried out. It found that the overexpressed DEGs are primarily enriched in the ‘signalling by receptor tyrosine kinases’, ‘formation of the cornified envelope’, ‘keratinisation’, and ‘extracellular matrix organisation’ pathways (Table 3). Furthermore, ‘signal transduction’, ‘extracellular matrix organisation’, and ‘hemostasis’ were the key pathways in which the downexpressed DEGs were enriched (Table 4). The enriched GO terms (BP/MF/CC) and REACTOME pathways were visualised as enrichment bubble plots for both the overexpressed DEGs (Figure 5) and the downexpressed DEGs (Figure 6).
TABLE 3.
Results of GO and REACTOME pathway analysis of the upregulated DEGs using the DAVID online tool.
| Category | Term | % | p‐value |
|---|---|---|---|
| BP | Epidermis development | 7.692308 | 0.006651 |
| BP | Keratinisation | 7.692308 | 0.007428 |
| BP | Endodermal cell differentiation | 5.128205 | 0.043789 |
| BP | Cell adhesion | 10.25641 | 0.046459 |
| BP | Establishment of skin barrier | 5.128205 | 0.04655 |
| BP | Intermediate filament organisation | 5.128205 | 0.093636 |
| CC | Ciliary membrane | 5.128205 | 0.085086 |
| CC | Cornified envelope | 5.128205 | 0.08375 |
| CC | Intermediate filament cytoskeleton | 5.128205 | 0.082411 |
| CC | Integral component of membrane | 33.33333 | 0.068537 |
| CC | Hippocampal mossy fibre to CA3 synapse | 5.128205 | 0.060735 |
| CC | Integral component of plasma membrane | 15.38462 | 0.054364 |
| CC | Cytoskeleton | 10.25641 | 0.044753 |
| CC | Occluding junction | 5.128205 | 0.037162 |
| CC | Apical plasma membrane | 10.25641 | 0.019177 |
| CC | Intermediate filament | 7.692308 | 0.017341 |
| CC | Plasma membrane | 38.46154 | 0.011103 |
| MF | Structural molecule activity | 10.25641 | 0.003175 |
| REACTOME | Signalling by receptor tyrosine kinases | 10.25641026 | 0.051975399 |
| REACTOME | Formation of the cornified envelope | 7.692307692 | 0.018876409 |
| REACTOME | Keratinisation | 7.692307692 | 0.047712584 |
| REACTOME | Extracellular matrix organisation | 7.692307692 | 0.085736649 |
| REACTOME | Anchoring fibril formation | 5.128205128 | 0.024375819 |
| REACTOME | LDL clearance | 5.128205128 | 0.030780624 |
| REACTOME | Laminin interactions | 5.128205128 | 0.048189704 |
| REACTOME | Plasma lipoprotein clearance | 5.128205128 | 0.059113989 |
| REACTOME | Nonintegrin membrane‐ECM interactions | 5.128205128 | 0.092681663 |
| REACTOME | Assembly of collagen fibrils and other multimeric structures | 5.128205128 | 0.095676523 |
TABLE 4.
Gene Ontology analysis of downregulated DEGs using the DAVID web tool.
| Category | Term | % | p‐value |
|---|---|---|---|
| BP | Cell differentiation | 12.38095 | 1.10E‐04 |
| BP | Nervous system development | 10.47619 | 3.16E‐05 |
| BP | Cell adhesion | 8.571429 | 0.005604 |
| BP | Cell‐cell adhesion | 6.666667 | 3.51E‐04 |
| BP | Extracellular matrix organisation | 5.714286 | 0.002013 |
| CC | Plasma membrane | 40.95238 | 1.40E‐04 |
| CC | Integral component of membrane | 37.14286 | 0.003353 |
| CC | Integral component of plasma membrane | 19.04762 | 3.49E‐05 |
| CC | Extracellular space | 19.04762 | 0.001877 |
| CC | Extracellular region | 17.14286 | 0.02544 |
| CC | Cell surface | 10.47619 | 0.001011 |
| CC | Axon | 8.571429 | 2.99E‐04 |
| CC | Dendrite | 7.619048 | 0.005228 |
| CC | External side of plasma membrane | 7.619048 | 0.00729 |
| MF | Calcium ion binding | 8.571429 | 0.027692 |
| MF | Heparin binding | 7.619048 | 2.39E‐05 |
| MF | Actin binding | 4.761905 | 0.081892 |
| MF | Structural molecule activity | 3.809524 | 0.066818 |
| MF | Carbohydrate binding | 3.809524 | 0.073484 |
| MF | Signalling receptor activity | 3.809524 | 0.094202 |
| REACTOME | Extracellular matrix organisation | 8.571429 | 1.14E‐04 |
| REACTOME | Collagen biosynthesis and modifying enzymes | 3.809524 | 0.004569 |
| REACTOME | Cell surface interactions at the vascular wall | 4.761905 | 0.004872 |
| REACTOME | Collagen formation | 3.809524 | 0.010334 |
| REACTOME | Muscle contraction | 4.761905 | 0.019309 |
| REACTOME | NGF‐independent TRKA activation | 1.904762 | 0.024852 |
| REACTOME | Cardiac conduction | 3.809524 | 0.028455 |
| REACTOME | Activation of TRKA receptors | 1.904762 | 0.029749 |
| REACTOME | Degradation of the extracellular matrix | 3.809524 | 0.03306 |
| REACTOME | Ligand‐receptor interactions | 1.904762 | 0.039471 |
| REACTOME | Neuronal system | 5.714286 | 0.054045 |
| REACTOME | Integrin cell surface interactions | 2.857143 | 0.067809 |
| REACTOME | Signal transduction | 18.09524 | 0.079522 |
| REACTOME | Activation of SMO | 1.904762 | 0.086664 |
| REACTOME | Hemostasis | 6.666667 | 0.089655 |
| REACTOME | Scavenging by class A receptors | 1.904762 | 0.091256 |
FIGURE 5.

Functional enrichment analyses: GO terms and REACTOME pathways for upregulated DEGs in this study. BP, biological process‐enriched upregulated DEGs; CC, cellular component‐enriched upregulated DEGs; MF, molecular function‐enriched upregulated DEGs. The bubble's size represents the enrichment outcome, whereas the colours denote the importance of the enrichment. Results were generated using the DAVID online Database, and graphs were generated using the SRplot tool.
FIGURE 6.

Functional enrichment analyses: GO terms and REACTOME pathways for downregulated DEGs in this study. BP, biological process‐enriched downregulated DEGs; CC, cellular component‐enriched downregulated DEGs; MF, molecular function‐enriched downregulated DEGs. The bubble's size represents the enrichment outcome, whereas the colours denote the importance of the enrichment. Results were generated using the DAVID online database, and graphs were generated using the SRplot tool.
3.3. Screening PPI Network and Hub Genes
A PPI network was constructed using the public STRING database and Cytoscape to investigate the protein networks linked to the identified genes (Figure 7). The network was then loaded into Cytoscape, and the CytoHubba plugin was used to visualise it and identify hub genes. We obtained the top 15 hub genes for each of three topological ranking algorithms: closeness centrality, maximum neighbourhood component (MNC) and degree (Figure 8, Table 5). Notably, only 11 hub genes were identified by all three ranking methods (Figure 8, Table 6). These 11 ‘real’ hub genes lie at the intersection of the hub gene sets produced by the three algorithms.
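To make the three ranking scores concrete, the following Python sketch computes them on a small toy graph. The definitions are assumptions based on how these scores are commonly described for CytoHubba: degree is the number of neighbours, closeness is the sum of reciprocal shortest‐path distances to all other reachable nodes, and MNC is the size of the largest connected component in the subgraph formed by a node's neighbours. The edge list is hypothetical and is not the STRING network from this study.

```python
from collections import deque

# Hypothetical toy network (not the STRING network from this study).
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph, node):
    """Number of direct neighbours."""
    return len(graph[node])


def closeness(graph, node):
    """Sum of 1/distance to every other reachable node (BFS distances)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nxt in graph[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return sum(1 / d for d in dist.values() if d > 0)


def mnc(graph, node):
    """Size of the largest connected component among the node's neighbours."""
    neighbours = graph[node]
    seen, best = set(), 0
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            current = stack.pop()
            size += 1
            # Only follow edges that stay inside the neighbourhood.
            for nxt in graph[current] & neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


graph = build_graph(EDGES)
scores = {n: (degree(graph, n), closeness(graph, n), mnc(graph, n)) for n in graph}
for name in sorted(scores):
    print(name, scores[name])
```

In this toy graph, node A has the highest degree and closeness, but its MNC is only 2 because its neighbour D is not connected to B or C. This shows why the three rankings can disagree and why taking their intersection is a stricter criterion.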
FIGURE 7.

PPI network of 146 promising target genes in gallbladder cancer based on the STRING database. Results were generated using the STRING Database.
FIGURE 8.

(a–c) Hub gene identification with the CytoHubba plugin in Cytoscape software. Three different ranking algorithms were used: degree, MNC and closeness. (d) Eleven common hub genes in gallbladder cancer were identified using a Venn diagram. Results of (a–c) were generated using the Cytoscape CytoHubba plugin, and results of (d) were generated using the Venny web tool.
TABLE 5.
Top 15 hub genes ranked by degree, closeness and MNC method.
| Degree: name | Degree: score | Closeness: name | Closeness: score | MNC: name | MNC: score |
|---|---|---|---|---|---|
| NTRK2 | 16 | NTRK2 | 49.36667 | NTRK2 | 15 |
| COL14A1 | 15 | COL14A1 | 47.8 | COL14A1 | 15 |
| CLDN4 | 13 | SCN4B | 46.38333 | SLC17A7 | 13 |
| SLC17A7 | 13 | ATP1A2 | 46.28333 | CLDN4 | 11 |
| ATP1A2 | 12 | SLC17A7 | 45.86667 | ATP1A2 | 9 |
| SCN4B | 12 | SLIT3 | 45.66667 | SLIT3 | 9 |
| COL7A1 | 11 | SFRP1 | 45.21667 | MFAP4 | 9 |
| SLIT3 | 11 | COL7A1 | 44.96667 | SCN4B | 8 |
| CLEC3B | 10 | CLDN4 | 44.25 | CLEC3B | 8 |
| MFAP4 | 10 | CLEC3B | 44.05 | ADCYAP1R1 | 7 |
| ADCYAP1R1 | 9 | ADCYAP1R1 | 43.31667 | NTRK3 | 7 |
| SFRP1 | 9 | SCNN1A | 42.55 | GRIK3 | 7 |
| SCNN1A | 8 | ASTN1 | 42.5 | COL7A1 | 7 |
| SPINT2 | 7 | MFAP4 | 42.48333 | OVOL2 | 6 |
| NTRK3 | 7 | JAM3 | 42.38333 | SPINT2 | 6 |
TABLE 6.
Identification of real hub genes by the PPI network and significant genes by feature selection methods.
| Methods | Identified genes |
|---|---|
| 11 ‘real’ hub genes identified at the intersection of hub genes produced by three ranking algorithms | NTRK2, COL14A1, SCN4B, ATP1A2, SLC17A7, SLIT3, CLDN4, COL7A1, CLEC3B, ADCYAP1R1, MFAP4 |
| Significant genes identified through the Pearson correlation FSM | ABCA8, ADAMTS3, AFAP1‐AS1, BOC, CACNA2D3, CCAT1, COL7A1, CRLF1, DDR1, FAM111B, GPR158, GRB7, KPNA7, KRT17, KRT222, KRT86, LINC00673, MYOM2, SLC22A4 |
| Significant genes identified through recursive feature elimination FSM | CHRDL1, COL14A1, COLEC12, DIXDC1, FENDRR, FHL1, JAM3, NCEH1, PTGIS, SFRP1, SMAD9, SPON1, TSPAN7 |
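The intersection reported in Table 6 can be checked directly from the three ranked lists in Table 5. The short Python sketch below transcribes the gene names from Table 5 and intersects them as sets. It is only a consistency check and does not reproduce the Venny workflow used for Figure 8.

```python
# Top-15 hub genes per ranking method, transcribed from Table 5.
degree_top15 = {
    "NTRK2", "COL14A1", "CLDN4", "SLC17A7", "ATP1A2", "SCN4B", "COL7A1",
    "SLIT3", "CLEC3B", "MFAP4", "ADCYAP1R1", "SFRP1", "SCNN1A", "SPINT2",
    "NTRK3",
}
closeness_top15 = {
    "NTRK2", "COL14A1", "SCN4B", "ATP1A2", "SLC17A7", "SLIT3", "SFRP1",
    "COL7A1", "CLDN4", "CLEC3B", "ADCYAP1R1", "SCNN1A", "ASTN1", "MFAP4",
    "JAM3",
}
mnc_top15 = {
    "NTRK2", "COL14A1", "SLC17A7", "CLDN4", "ATP1A2", "SLIT3", "MFAP4",
    "SCN4B", "CLEC3B", "ADCYAP1R1", "NTRK3", "GRIK3", "COL7A1", "OVOL2",
    "SPINT2",
}

# Genes ranked in the top 15 by all three methods.
real_hub_genes = degree_top15 & closeness_top15 & mnc_top15
print(len(real_hub_genes), sorted(real_hub_genes))
```

The intersection contains 11 genes, matching the first row of Table 6. Genes such as SFRP1 and SCNN1A are excluded because they do not appear in the MNC top 15.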
3.4. Identification of Significant Genes Using Feature Selection Methods
Feature selection techniques were applied to identify the subset of genes that contributes most to the prediction task and to determine the features required for constructing the ML models. Recursive feature elimination (RFE) and Pearson correlation feature selection were applied to the 146 DEGs. RFE identified the top 13 genes with an accuracy of 0.95 and an error rate of 0.05 (Figure 9). For the Pearson approach, we experimented with the threshold value to choose the most suitable group of features; retaining correlation scores above 0.75 yielded a group of 19 genes, whose correlation heatmap is shown in Figure 10. Table 6 lists the significant genes identified by both feature selection methods.
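A correlation‐based filter of this kind can be sketched in a few lines of Python. The sketch assumes that each gene is scored by the absolute Pearson correlation between its expression values and the binary class label, and that genes above the 0.75 threshold are kept; the text does not state exactly which pairs were correlated, so this reading is an assumption. The gene names and expression values are hypothetical placeholders.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


def select_by_correlation(expression, labels, threshold=0.75):
    """Keep genes whose |r| with the class label exceeds the threshold."""
    return [
        gene for gene, values in expression.items()
        if abs(pearson(values, labels)) > threshold
    ]


# Hypothetical toy data: 1 = tumour, 0 = normal; gene names are placeholders.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "GENE_UP": [8.1, 7.9, 8.4, 5.0, 5.2, 4.8],
    "GENE_DOWN": [3.0, 3.2, 2.9, 6.1, 6.0, 6.3],
    "GENE_NOISE": [5.0, 6.0, 5.5, 5.4, 6.1, 5.1],
}
selected = select_by_correlation(expression, labels)
print(selected)
```

Using the absolute value keeps both strongly upregulated and strongly downregulated genes. Raising the threshold shrinks the selected set, which is the trade‐off explored when tuning the threshold value.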
FIGURE 9.

Significant genes identification using RFE. (a) Screening the number of features and corresponding accuracy score. (b) Screening the number of features and corresponding error rates.
FIGURE 10.

The correlation heatmap of genes that are selected using the Pearson correlation matrix.
3.5. Analysing ML Models Performance on Independent Dataset
The predictive performance of the three machine learning models (RF, NB and SVM) was assessed on the independent dataset (GSE139682) using three distinct feature sets: the 11 ‘real’ hub genes and the genes selected by each of the two feature selection techniques. We therefore created three separate datasets, one per biomarker identification method. The first dataset contains the 11 real hub genes as features; the second and third contain the genes identified by the Pearson correlation and recursive feature elimination methods, respectively. These datasets were then classified to compare how well each feature set and each classifier separated the samples. The accuracy, precision, recall and F1‐scores are shown in Tables 7, 8 and 9, together with the AUC‐ROC value for each model. The ROC curves obtained on this dataset for the SVM, NB and RF models are displayed in Figure 11. Overall, the subset of real hub genes gave the most consistent results, with accuracies of 85.4%–90.2% across the three classifiers (Table 7); only the SVM trained on the RFE subset matched its best accuracy (90.2%, Table 8). The feature selection subsets were less stable: the RFE subset gave the lowest NB accuracy (66.8%, Table 8), and the Pearson subset gave the lowest SVM accuracy (54.6%, Table 9). These findings imply that classifiers built using hub genes were able to attain good prediction performance.
TABLE 7.
The mean accuracy (± SD), precision, recall, F1‐scores, AUC‐ROC values of support vector machine, random forest and naive Bayes classification model based on real hub genes identified at the intersection of three hub gene ranking algorithms.
| Models | Accuracy (%) (mean ± SD) | Precision (%), GBC | Precision (%), Normal | Recall (%), GBC | Recall (%), Normal | F1‐score (%), GBC | F1‐score (%), Normal | AUC‐ROC |
|---|---|---|---|---|---|---|---|---|
| SVM | 90.2 ± 1.5 | 90 | 90 | 90 | 90 | 90 | 90 | 0.9 |
| RF | 89.6 ± 1.2 | 100 | 83 | 80 | 100 | 89 | 91 | 0.9 |
| NB | 85.4 ± 2.1 | 77 | 100 | 100 | 70 | 87 | 82 | 0.85 |
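The per‐class metrics in Table 7 follow from a confusion matrix in the usual way. The Python sketch below uses a hypothetical confusion matrix for a test set of 10 GBC and 10 normal samples (the class sizes are an assumption, not stated in this section) in which 8 GBC samples are classified correctly and no normal sample is misclassified. Under that assumption, the resulting precision, recall and F1 values round to those reported for RF in Table 7.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical confusion matrix, GBC taken as the positive class:
# 8 GBC called GBC, 2 GBC called normal, 0 normal called GBC, 10 normal called normal.
tp, fn, fp, tn = 8, 2, 0, 10

gbc = class_metrics(tp=tp, fp=fp, fn=fn)
# For the normal class the roles swap: its hits are tn, its false alarms are fn.
normal = class_metrics(tp=tn, fp=fn, fn=fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

for name, (p, r, f) in (("GBC", gbc), ("Normal", normal)):
    print(f"{name}: precision={p:.0%} recall={r:.0%} F1={f:.0%}")
print(f"accuracy={accuracy:.0%}")
```

The single‐run accuracy of this hypothetical matrix is 90%, which is close to but not the same as the tabulated mean of 89.6 ± 1.2, since the table reports a mean over runs.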
TABLE 8.
The mean accuracy (± SD), precision, recall, F1‐scores, AUC‐ROC values of support vector machine, random forest and naive Bayes classification model based on significant genes identified by the recursive feature elimination method.
| Models | Accuracy (%) (mean ± SD) | Precision (%), GBC | Precision (%), Normal | Recall (%), GBC | Recall (%), Normal | F1‐score (%), GBC | F1‐score (%), Normal | AUC‐ROC |
|---|---|---|---|---|---|---|---|---|
| SVM | 90.2 ± 1.3 | 83 | 100 | 100 | 80 | 91 | 89 | 0.9 |
| RF | 81.4 ± 2.0 | 80 | 80 | 80 | 80 | 80 | 80 | 0.8 |
| NB | 66.8 ± 2.5 | 100 | 59 | 30 | 100 | 46 | 74 | 0.65 |
TABLE 9.
The mean accuracy (± SD), precision, recall, F1‐scores, AUC‐ROC values of support vector machine, random forest and naive Bayes classification model based on significant genes identified by the Pearson correlation method.
| Models | Accuracy (%) (mean ± SD) | Precision (%), GBC | Precision (%), Normal | Recall (%), GBC | Recall (%), Normal | F1‐score (%), GBC | F1‐score (%), Normal | AUC‐ROC |
|---|---|---|---|---|---|---|---|---|
| SVM | 54.6 ± 2.2 | 53 | 100 | 100 | 10 | 69 | 18 | 0.55 |
| RF | 79.8 ± 1.6 | 75 | 88 | 90 | 70 | 82 | 78 | 0.8 |
| NB | 80.1 ± 1.4 | 75 | 88 | 90 | 70 | 82 | 78 | 0.8 |
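When reading the AUC‐ROC column of Tables 7, 8 and 9, it helps to recall that a ROC AUC computed from hard 0/1 predictions (rather than continuous scores) reduces to the mean of the two class recalls. The Python sketch below checks this identity with a rank‐based AUC and compares it with the tabulated values. The class sizes of 10 GBC and 10 normal samples are an assumption, and whether the reported AUC‐ROC values were computed from hard predictions is not stated in the text; the sketch only shows that the numbers are consistent with that reading.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Rank-based ROC AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def hard_label_auc(recall_pos, recall_neg, n_pos=10, n_neg=10):
    """AUC when the scores are hard 0/1 predictions with the given recalls."""
    hits_pos = round(recall_pos * n_pos)
    hits_neg = round(recall_neg * n_neg)
    pos_scores = [1] * hits_pos + [0] * (n_pos - hits_pos)
    neg_scores = [0] * hits_neg + [1] * (n_neg - hits_neg)
    return auc_from_scores(pos_scores, neg_scores)


# (GBC recall %, normal recall %, reported AUC-ROC) from Tables 7, 8 and 9.
ROWS = {
    "hub/SVM": (90, 90, 0.90),
    "hub/RF": (80, 100, 0.90),
    "hub/NB": (100, 70, 0.85),
    "RFE/SVM": (100, 80, 0.90),
    "RFE/RF": (80, 80, 0.80),
    "RFE/NB": (30, 100, 0.65),
    "Pearson/SVM": (100, 10, 0.55),
    "Pearson/RF": (90, 70, 0.80),
    "Pearson/NB": (90, 70, 0.80),
}

for name, (rec_gbc, rec_norm, reported) in ROWS.items():
    auc = hard_label_auc(rec_gbc / 100, rec_norm / 100)
    balanced = (rec_gbc + rec_norm) / 200  # mean of the two recalls
    print(f"{name}: auc={auc:.2f} balanced={balanced:.2f} reported={reported:.2f}")
```

Under these assumptions, every row gives an AUC equal to the mean of its two recalls and to the tabulated AUC‐ROC value. A ROC curve built from continuous classifier scores could give different and usually more informative values.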
FIGURE 11.

The ROC curves of the support vector machine and random forest classification model based on real hub genes identified at the intersection of three hub gene ranking algorithms and genes identified by the Pearson correlation method and recursive feature elimination method.
3.6. Validation of Biomarkers Expression
The GEPIA database was used to confirm hub gene expression, with a p‐value < 0.05 and a Log2FC > 1 as criteria. GEPIA box plots showed that, among the hub genes, SLIT3, COL7A1 and CLDN4 were considerably upregulated in GBC patients (Figure 12). The other genes showed little statistical difference.
FIGURE 12.

Validation of hub gene expression: GEPIA files are used to create boxplots that display the hub gene expression in GBC patients and healthy controls.
3.7. Identification of Drugs and Diseases Associated With Biomarkers
During drug development, the biological and scientific aspects of candidate medications must be investigated. Using DSigDB through Enrichr, we found 10 possible drugs (vitinoin CTD 00007069, valproic acid CTD 00006977, progesterone CTD 00006624, bisphenol A CTD 00000312, N‐methyl‐D‐aspartic acid BOSS, (9beta,10alpha)‐17‐(Acetyloxy)‐6‐methylpregna‐4,6‐diene‐3,20‐dione CTD 00007266, trichostatin A CTD 00000660, sarin CTD 00006722, ginsenoside Rg3 CTD 00003265 and delsemidine CTD 00002328). The 11 real hub genes (potential biomarkers) were used as the input gene set. Table 10 and Figure 13 display the drug compounds associated with the 11 potential biomarkers according to the DSigDB database. Some compounds identified through CTD enrichment, such as sarin (CTD 00006722), are toxicants with no therapeutic relevance to gallbladder cancer; these associations likely reflect known gene–chemical interactions. Interactions between proteins (genes) and chemical compounds are also displayed in Figure 13, in which circular nodes represent the hub genes and square nodes represent chemical compounds.
TABLE 10.
Top 10 potential drugs that are projected to target hub genes ordered by adjusted p‐value.
| Drugs name | p‐value | Adjusted p‐value | Associated genes |
|---|---|---|---|
| Vitinoin CTD 00007069 | 6.08E‐04 | 0.118037 | NTRK2; MFAP4; COL14A1; SLIT3 |
| Valproic acid CTD 00006977 | 0.001049 | 0.118037 | NTRK2; MFAP4; CLDN4; ADCYAP1R1; COL14A1; COL7A1; ATP1A2; SLC17A7; SLIT3; SCN4B |
| Progesterone CTD 00006624 | 0.002257 | 0.151622 | MFAP4; CLDN4; CLEC3B; ADCYAP1R1; COL14A1 |
| Bisphenol A CTD 00000312 | 0.003637 | 0.151622 | CLDN4; CLEC3B; ATP1A2; SLIT3 |
| N‐methyl‐D‐aspartic acid BOSS | 0.005309 | 0.151622 | NTRK2; SLC17A7 |
| (9beta,10alpha)‐17‐(Acetyloxy)‐6‐methylpregna‐4,6‐diene‐3,20‐dione CTD 00007266 | 0.006035 | 0.151622 | CLDN4 |
| Trichostatin A CTD 00000660 | 0.006678 | 0.151622 | CLDN4; ADCYAP1R1; COL14A1; ATP1A2; SLIT3; SCN4B |
| Sarin CTD 00006722 | 0.006707 | 0.151622 | NTRK2; MFAP4 |
| Ginsenoside Rg3 CTD 00003265 | 0.007129 | 0.151622 | NTRK2 |
| Delsemidine CTD 00002328 | 0.009313 | 0.151622 | NTRK2 |
Note: Results were generated using the Enrichr platform based on gene set enrichment against the DSigDB database.
FIGURE 13.

(a) Top 10 potential drugs that are projected to target hub genes ordered by adjusted p‐values. (b) Protein–chemical interaction network consisting of potential biomarkers. (c) Top 10 diseases associated with potential biomarkers ordered by adjusted p‐values. Results were generated using the Enrichr platform.
Moreover, examining the genes linked to a disease can help researchers understand its molecular pathways. Hence, using DisGeNET, we found 10 diseases associated with the identified potential biomarkers (anxiety and fear, major depressive disorder, neuroendocrine tumours, bipolar disorder, post‐traumatic stress disorder, unipolar depression, prenatal alcohol exposure, symptoms of stress, atrophy of tongue and congenital localised absence of skin). The 11 real hub genes (potential biomarkers) were used as the input gene set. Table 11 and Figure 13 display the diseases associated with the potential biomarkers according to DisGeNET.
TABLE 11.
Top 10 diseases associated with potential biomarkers ordered by adjusted p‐value.
| Disease name | p‐value | Adjusted p‐value | Genes |
|---|---|---|---|
| Anxiety and fear | 3.29E‐05 | 0.016529 | ADCYAP1R1; ATP1A2 |
| Major depressive disorder | 1.22E‐04 | 0.030763 | NTRK2; ADCYAP1R1; SLC17A7; SLIT3 |
| Neuroendocrine tumours | 3.21E‐04 | 0.053755 | NTRK2; CLDN4; ADCYAP1R1 |
| Bipolar disorder | 7.94E‐04 | 0.072839 | NTRK2; ADCYAP1R1; ATP1A2; SLC17A7 |
| Post‐traumatic stress disorder | 0.001596 | 0.072839 | NTRK2; ADCYAP1R1 |
| Unipolar depression | 0.002427 | 0.072839 | NTRK2; ADCYAP1R1; SLIT3 |
| Prenatal alcohol exposure | 0.003296 | 0.072839 | NTRK2 |
| Symptoms of stress | 0.003296 | 0.072839 | ADCYAP1R1 |
| Atrophy of tongue | 0.003844 | 0.072839 | COL7A1 |
| Congenital localised absence of skin | 0.003844 | 0.072839 | COL7A1 |
Note: Results were generated using the Enrichr platform based on gene set enrichment against the DisGeNET database.
Only two disease associations (anxiety and fear, and major depressive disorder) and none of the drug associations achieved statistical significance after FDR correction (adjusted p < 0.05). The top‐ranked associations are nevertheless presented for exploratory purposes and may serve as a basis for future validation.
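The effect of FDR correction can be illustrated with a minimal Python sketch, assuming the correction is the Benjamini–Hochberg procedure. The p‐values below are hypothetical and are not taken from Tables 10 or 11. The tabulated adjusted p‐values cannot be recomputed from the top 10 rows alone, because the adjustment depends on the total number of terms tested.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values (illustration only).
raw = [0.001, 0.02, 0.03, 0.5]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

Because each raw p‐value is scaled by the number of tests divided by its rank, a term with a small raw p‐value can still miss the 0.05 threshold when many terms are tested, as seen for most rows of Tables 10 and 11.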
4. Discussion
GBC is the most common biliary tract neoplasm, accounting for 80%–95% of cases [36]. According to GLOBOCAN (Global Cancer Observatory) 2020 data [37], GBC accounted for around 115,949 newly diagnosed cases in 2020, making it the 24th most common cancer globally. Nearly 84,695 patients with GBC died that year, reflecting the severity of the disease. The overall global incidence has been rising over recent years, despite regional variations in incidence rates. This trend is expected to continue as risk factors grow more common among populations [38]. Identifying the molecular mechanisms and biomarkers associated with the onset and progression of gallbladder cancer has long been both a goal and a challenge for medical and scientific research. Such research has important implications for improving the assessment, treatment effectiveness and prognosis of GBC.
This work aims to comprehensively identify potential genes and pathways associated with gallbladder cancer through bioinformatics analysis and machine learning methods. Gene expression data (GSE100363 and GSE139682) were acquired from the GEO repository. A total of 432 and 1800 potential DEGs were obtained from the two datasets. Of these, 146 were common to both, comprising 39 upregulated and 107 downregulated genes associated with gallbladder cancer. Next, to explore the upregulated and downregulated genes, we conducted Gene Ontology enrichment analysis across the three categories (CC, MF and BP) and pathway analysis using the REACTOME database.
According to the GO and REACTOME analyses, the upregulated DEGs are mainly associated with biological processes such as ‘cell adhesion’, ‘epidermis development’ and ‘keratinisation’, cellular components such as ‘integral component of membrane’ and ‘plasma membrane’, molecular functions such as ‘structural molecule activity’ and pathways such as ‘signalling by receptor tyrosine kinases’. Epidermis development and keratinisation were significantly enriched among the upregulated genes, indicating epithelial differentiation and stratification abnormalities, which are commonly observed in GBC histology. Likewise, the downregulated DEGs are primarily linked to ‘cell differentiation’, ‘nervous system development’ and ‘cell adhesion’ for biological process, ‘plasma membrane’ and ‘integral component of membrane’ for cellular component, ‘calcium ion binding’ and ‘heparin binding’ for molecular function, and ‘signal transduction’ for pathways. The downregulation of genes involved in extracellular matrix (ECM) organisation and cell adhesion suggests degradation of tissue barriers and enhanced metastatic potential. Additionally, the enrichment of terms related to plasma membrane structure and nervous system development may reflect dysregulated cell signalling and neuroimmune interactions within the tumour microenvironment.
Although an initial p‐value threshold of 0.1 was chosen for exploratory Gene Ontology and pathway enrichment, we further concentrated on results with greater significance (p < 0.01) to maintain statistical rigour. Cell differentiation (p = 1.10E‐04), nervous system development (p = 3.16E‐05), keratinisation (p = 0.0074) and epidermis development (p = 0.0066) were all highly significant biological processes. Furthermore, a significant enrichment of the extracellular matrix organisation pathway was observed (p = 1.14E‐04). These findings are biologically consistent with the pathophysiology of gallbladder cancer, which includes epithelial change, modification of tissue architecture and increased metastatic potential. The enrichment of plasma membrane components and structural molecule activity further supports this picture of tumour cell behaviour in GBC.
In order to evaluate the interactional links, we also built a PPI network. After that, we obtained the top 15 hub genes for each of the three ranking algorithms: closeness centrality, MNC and degree. Surprisingly, we found that just 11 of the hub genes—which were regarded as ‘real’ hub genes—were recognised by all three ranking techniques.
Additionally, we employed feature selection methods, namely RFE and Pearson correlation, to identify the significant DEGs that most effectively separate diseased samples from healthy controls. Machine learning models based on the SVM, NB and RF algorithms were then trained on the GSE100363 dataset, using the real hub genes and the significant genes found by the feature selection methods as features. Finally, the models were validated on the independent GSE139682 dataset. The subset of real hub genes performed best overall, suggesting that these genes could be used as key biomarkers for GBC diagnosis. However, the small sample size of the training dataset (GSE100363, n = 8) limits statistical power and may contribute to overfitting, despite validation on an independent dataset. Although we used feature selection and cross‐validation to reduce this risk, the high accuracy should be interpreted cautiously. Additionally, no batch effect correction was applied, which may introduce technical bias. Future studies with larger, multi‐batch datasets and proper normalisation are needed to validate these findings.
The identified hub genes show diverse but convergent roles in cancer biology, particularly in processes relevant to gallbladder cancer progression:
NTRK2: NTRK2, a member of the neurotrophic tyrosine receptor kinase (NTRK) gene family, encodes the TrkB receptor, which is involved in cell growth, survival and differentiation. In the context of gallbladder cancer, the overexpression of NTRK2 or the presence of NTRK2 gene rearrangements contributes to early tumourigenesis by promoting cell proliferation and resistance to apoptosis [39].
COL14A1: COL14A1 (collagen type XIV alpha 1 chain) plays a significant role in the molecular landscape of early‐stage gallbladder cancer (GBC), particularly in the context of extracellular matrix (ECM) remodelling. A previous study [40] found that COL14A1, along with other ECM proteins (e.g., COL1A2, COL6A1, BGN and DCN), was significantly downregulated in tumour tissues from early‐stage GBC compared to gallstone disease (GSD) controls.
SCN4B: The SCN4B gene (sodium voltage‐gated channel beta subunit 4) plays an important role in distinguishing rare squamous cell carcinomas (SCCs), including those originating in the gallbladder, which are often aggressive and difficult to diagnose early. According to the study [41], gallbladder SCCs were grouped into a cluster (BBGPT) characterised by distinct molecular features such as upregulation of extracellular matrix (ECM) glycoproteins, fatty acid metabolism and inflammatory response pathways.
ATP1A2: ATP1A2, involved in bile acid transport and cellular ion regulation, plays a role in gallbladder cancer (GBC) by contributing to bile acid‐induced cellular stress and tumour progression. Its dysregulation, as observed in bile acid‐associated colorectal cancer, shows potential as a biomarker for early GBC detection through shared bile‐related carcinogenic pathways [42].
SLC17A7: SLC17A7, regulated by the exosomal circFMN2/miR‐1182 pathway, is part of a molecular network that influences tumour development and is detectable through liquid biopsy approaches. This makes it a promising candidate biomarker for early detection in gallbladder and colorectal cancers [43].
SLIT3: SLIT3 plays an important role in gallbladder cancer early detection by influencing tumour angiogenesis and correlating with markers of tumour invasion and stage [44]. Its expression, especially in tandem with ROBO1, can help identify aggressive tumours at earlier stages, making it a promising target for biomarker development and risk stratification.
COL7A1: COL7A1 contributes to early detection potential in gallbladder cancer because it is upregulated in bile duct‐related cancers, linked to metastasis and poor survival, and involved in tumour‐promoting signalling (PI3K/AKT). This makes it a valuable biomarker candidate, especially when analysed alongside other genes in noninvasive tissue or liquid biopsy screenings [45, 46].
CLDN4: CLDN4, regulated through m6A modification by IGF2BP3, drives gallbladder cancer progression via the NF‐kB and STAT3 signalling pathways. Its overexpression marks early tumourigenic changes, making it a promising biomarker for early detection and prognostication of gallbladder cancer [47].
CLEC3B: CLEC3B functions as a tumour suppressor in biliary tract cancers by inhibiting Wnt/β‐catenin signalling, EMT and cancer cell proliferation. Its low expression is an early indicator of poor prognosis, making CLEC3B a promising biomarker for early detection and prognosis prediction in gallbladder cancer [48].
ADCYAP1R1: ADCYAP1R1 plays a potential tumour‐suppressive role in gallbladder cancer by being involved in immune cell interactions [49].
MFAP4: MFAP4, a serum biomarker linked to fibrosis and liver disease, holds strong diagnostic potential for early‐stage gallbladder cancer, particularly in at‐risk populations with chronic gallbladder inflammation. Its noninvasive detectability and relevance in biliary tract fibrosis make it a promising candidate for early GBC screening and monitoring [50].
Additionally, validation of biomarker expression levels using GEPIA box plots showed that SLIT3, COL7A1 and CLDN4 were significantly elevated in GBC patients, whereas the other genes showed minimal statistical differences. Further, the associated drugs, diseases and chemical compounds were also reported in this study to help with further interpretation or suggest analytical approaches.
Recent advances in deep learning have substantially accelerated bioinformatics research, particularly in genomic data modelling and multimodal integration. For instance, recent studies have demonstrated the potential of deep learning to identify disease‐specific brain regions and white matter tracts with high diagnostic accuracy, as shown in PSP versus PD classification tasks using multimodal MRI data and neural networks [51]. Recent work in ophthalmology has demonstrated how hybrid deep learning architectures, such as combinations of vision transformers (ViT), swin transformers and CNNs, can achieve near‐perfect classification performance in complex clinical tasks like glaucoma detection, highlighting the growing utility of multimodal deep learning systems for precision diagnostics [52]. Although our current study is based on classical machine learning for gallbladder cancer biomarker identification, future work will explore deep neural network frameworks to integrate multi‐omics and imaging data, thereby improving predictive performance and biological insights through interpretable and generalisable models.
5. Conclusions
This study's analysis of gene expression data using bioinformatics and ML techniques may complement conventional medical diagnostic indicators, allowing physicians to treat gallbladder cancer more effectively and individually. To find biomarkers, we developed a computational approach that combines machine learning and bioinformatics with multiple hub gene ranking and feature selection techniques. In this study, 11 key genes or biomarkers with diagnostic and prognostic value in GBC were identified. This study also examined the associated pharmacological drugs, disease entities and other chemical substances, offering a foundation for more detailed interpretation and further analysis.
5.1. Limitations and Future Work
Our study relies solely on transcriptomic data (gene expression microarrays). Additional omics data (e.g., proteomics, methylation and single‐cell sequencing) could provide a more comprehensive biomarker profile, which we plan to incorporate in future work. Although we validated hub gene expression using the GEPIA2 database and tested machine learning models on an independent dataset, no experimental assays (e.g., qPCR, IHC and western blot) were performed. Our plans for the future include in vitro and in vivo validation, as well as prospective clinical studies.
Although this study identifies and computationally validates key biomarkers for gallbladder cancer (GBC), clinical translation requires additional steps. The next phase involves validating the expression and diagnostic utility of hub genes in larger, independent patient cohorts through quantitative PCR, immunohistochemistry (IHC) or ELISA‐based assays on biopsy or blood samples. Moreover, prospective observational studies and eventually controlled clinical trials will be essential to assess their predictive and prognostic performance in real‐world clinical settings. We plan to collaborate with clinical oncology departments to design and execute such translational studies, moving towards biomarker‐guided diagnostics and personalised therapy for GBC patients.
Author Contributions
Rabea Khatun: conceptualization, resources, formal analysis, methodology, software, writing – original draft. Wahia Tasnim: validation, writing – original draft. Maksuda Akter: investigation, writing – original draft. Md. Manowarul Islam: conceptualization, supervision, methodology, resources, software, writing – original draft, validation. Md. Ashraf Uddin: validation, writing – review and editing. Saurav Chandra Das: validation, writing – review and editing. Md. Zulfiker Mahmud: writing – original draft, writing – review and editing.
Ethics Statement
The authors have nothing to report.
Consent
The authors have nothing to report.
Conflicts of Interest
The authors declare no conflicts of interest.
Materials Availability
The authors have nothing to report.
Code Availability
The authors have nothing to report.
Handling Editor: Hao Wu
Funding: This research was supported by the ICT Division, Ministry of Telecommunications and Information Technology for Research Fellowship of 2022‐23 (2022‐23).
Data Availability Statement
The selected datasets are available from a free and open‐access source: https://www.ncbi.nlm.nih.gov/gds.
References
- 1. Hundal R. and Shaffer E. A., “Gallbladder Cancer: Epidemiology and Outcome,” Clinical Epidemiology (2014): 99–109, 10.2147/clep.s37357. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Raki´c M., Patrlj L., Kopljar M., et al., “Gallbladder Cancer,” Hepatobiliary Surgery and Nutrition 3, no. 5 (2014): 221. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Siegel R. L., Miller K. D., Fuchs H. E., Jemal A., et al., “Cancer Statistics, 2021,” Ca‐A Cancer Journal for Clinicians 71, no. 1 (2021): 7–33, 10.3322/caac.21654. [DOI] [PubMed] [Google Scholar]
- 4. Valle J. W., Kelley R. K., Nervi B., Oh D.‐Y., and Zhu A. X., “Biliary Tract Cancer,” Lancet 397, no. 10272 (2021): 428–444, 10.1016/s0140-6736(21)00153-7.
- 5. Gourgiotis S., Kocher H. M., Solaini L., Yarollahi A., Tsiambas E., and Salemis N. S., “Gallbladder Cancer,” American Journal of Surgery 196, no. 2 (2008): 252–264, 10.1016/j.amjsurg.2007.11.011.
- 6. Chun Y. S., Pawlik T. M., and Vauthey J.‐N., “8th Edition of the AJCC Cancer Staging Manual: Pancreas and Hepatobiliary Cancers,” Annals of Surgical Oncology 25, no. 4 (2018): 845–847, 10.1245/s10434-017-6025-x.
- 7. Mantripragada K. C., Hamid F., Shafqat H., and Olszewski A. J., “Adjuvant Therapy for Resected Gallbladder Cancer: Analysis of the National Cancer Data Base,” Journal of the National Cancer Institute 109, no. 2 (2017): 202, 10.1093/jnci/djw202.
- 8. Chen M., Cao J., Bai Y., et al., “Development and Validation of a Nomogram for Early Detection of Malignant Gallbladder Lesions,” Clinical and Translational Gastroenterology 10, no. 10 (2019): e00098, 10.14309/ctg.0000000000000098.
- 9. Tauriello D. V., Palomo‐Ponce S., Stork D., et al., “TGFβ Drives Immune Evasion in Genetically Reconstituted Colon Cancer Metastasis,” Nature 554, no. 7693 (2018): 538–543, 10.1038/nature25492.
- 10. Wu M.‐J., Chen Y.‐S., Kim M. R., et al., “Epithelial‐Mesenchymal Transition Directs Stem Cell Polarity via Regulation of Mitofusin,” Cell Metabolism 29, no. 4 (2019): 993–1002, 10.1016/j.cmet.2018.11.004.
- 11. Zheng L., Xu M., Xu J., et al., “ELF3 Promotes Epithelial–Mesenchymal Transition by Protecting ZEB1 From miR‐141‐3p‐Mediated Silencing in Hepatocellular Carcinoma,” Cell Death & Disease 9, no. 3 (2018): 387, 10.1038/s41419-018-0399-y.
- 12. Civenni G., Bosotti R., Timpanaro A., et al., “Epigenetic Control of Mitochondrial Fission Enables Self‐Renewal of Stem‐Like Tumor Cells in Human Prostate Cancer,” Cell Metabolism 30, no. 2 (2019): 303–318, 10.1016/j.cmet.2019.05.004.
- 13. Raphael B. J., Hruban R. H., Aguirre A. J., et al., “Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma,” Cancer Cell 32, no. 2 (2017): 185–203, 10.1016/j.ccell.2017.07.007.
- 14. Kinde I., Bettegowda C., Wang Y., et al., “Evaluation of DNA From the Papanicolaou Test to Detect Ovarian and Endometrial Cancers,” Science Translational Medicine 5, no. 167 (2013): 167ra4, 10.1126/scitranslmed.3004952.
- 15. Kulasingam V. and Diamandis E. P., “Strategies for Discovering Novel Cancer Biomarkers Through Utilization of Emerging Technologies,” Nature Clinical Practice Oncology 5, no. 10 (2008): 588–599, 10.1038/ncponc1187.
- 16. Auslander N., Gussow A. B., and Koonin E. V., “Incorporating Machine Learning Into Established Bioinformatics Frameworks,” International Journal of Molecular Sciences 22, no. 6 (2021): 2903, 10.3390/ijms22062903.
- 17. Yang Y.‐C., Chen Z.‐T., Wan D.‐L., Tang H., and Liu M.‐L., “Targeted Gene Sequencing and Bioinformatics Analysis of Patients With Gallbladder Neuroendocrine Carcinoma: A Case Report,” World Journal of Gastrointestinal Oncology 17, no. 1 (2025): 100757, 10.4251/wjgo.v17.i1.100757.
- 18. Cao J., Shao H., Hu J., et al., “Identification of Invasion‐Metastasis Associated MiRNAs in Gallbladder Cancer by Bioinformatics and Experimental Validation,” Journal of Translational Medicine 20, no. 1 (2022): 188, 10.1186/s12967-022-03394-8.
- 19. Singh N., Sharma R., and Bose S., “Meta‐Analysis of Transcriptomics Data Identifies Potential Biomarkers and Their Associated Regulatory Networks in Gallbladder Cancer,” Gastroenterology and Hepatology From Bed to Bench 15, no. 4 (2022): 311.
- 20. Barrett T., Wilhite S. E., Ledoux P., et al., “NCBI GEO: Archive for Functional Genomics Data Sets—Update,” Nucleic Acids Research 41, no. D1 (2012): 991–995, 10.1093/nar/gks1193.
- 21. Kong X., Wang C., Wu Q., et al., “Screening and Identification of Key Biomarkers of Depression Using Bioinformatics,” Scientific Reports 13, no. 1 (2023): 4180, 10.1038/s41598-023-31413-1.
- 22. Xu Z., Zhou Y., Cao Y., Dinh T. L. A., Wan J., and Zhao M., “Identification of Candidate Biomarkers and Analysis of Prognostic Values in Ovarian Cancer by Integrated Bioinformatics Analysis,” Medical Oncology 33, no. 11 (2016): 1–8, 10.1007/s12032-016-0840-y.
- 23. Oliveros J. C., “Venny. An Interactive Tool for Comparing Lists With Venn Diagrams,” (2007), http://bioinfogp.cnb.csic.es/tools/venny/index.html.
- 24. Reimand J., Isserlin R., Voisin V., et al., “Pathway Enrichment Analysis and Visualization of Omics Data Using g:Profiler, GSEA, Cytoscape and EnrichmentMap,” Nature Protocols 14, no. 2 (2019): 482–517, 10.1038/s41596-018-0103-9.
- 25. Ashburner M., Ball C. A., Blake J. A., et al., “Gene Ontology: Tool for the Unification of Biology,” Nature Genetics 25, no. 1 (2000): 25–29, 10.1038/75556.
- 26. Croft D., O’Kelly G., Wu G., et al., “Reactome: A Database of Reactions, Pathways and Biological Processes,” supplement, Nucleic Acids Research 39, no. S1 (2010): 691–697, 10.1093/nar/gkq1018.
- 27. Huang D. W., Sherman B. T., Tan Q., et al., “The DAVID Gene Functional Classification Tool: A Novel Biological Module‐Centric Algorithm to Functionally Analyze Large Gene Lists,” Genome Biology 8, no. 9 (2007): 1–16, 10.1186/gb-2007-8-9-r183.
- 28. Franceschini A., Szklarczyk D., Frankild S., et al., “STRING v9.1: Protein‐Protein Interaction Networks, With Increased Coverage and Integration,” Nucleic Acids Research 41, no. D1 (2012): 808–815, 10.1093/nar/gks1094.
- 29. Smoot M. E., Ono K., Ruscheinski J., Wang P.‐L., and Ideker T., “Cytoscape 2.8: New Features for Data Integration and Network Visualization,” Bioinformatics 27, no. 3 (2011): 431–432, 10.1093/bioinformatics/btq675.
- 30. Kohl M., Wiese S., and Warscheid B., “Cytoscape: Software for Visualization and Analysis of Biological Networks,” in Data Mining in Proteomics: From Standards to Applications, edited by Hamacher M., Eisenacher M., and Stephan C. (Humana Press, 2011), 291–303.
- 31. Chin C.‐H., Chen S.‐H., Wu H.‐H., Ho C.‐W., Ko M.‐T., and Lin C.‐Y., “CytoHubba: Identifying Hub Objects and Sub‐Networks From Complex Interactome,” BMC Systems Biology 8, no. 4 (2014): 1–7, 10.1186/1752-0509-8-s4-s11.
- 32. Breiman L., “Random Forests,” Machine Learning 45, no. 1 (2001): 5–32, 10.1023/a:1010933404324.
- 33. Khatun R., Akter M., Islam M. M., et al., “Cancer Classification Utilizing Voting Classifier With Ensemble Feature Selection Method and Transcriptomic Data,” Genes 14, no. 9 (2023): 1802, 10.3390/genes14091802.
- 34. Mazdadi M. I., Farmadi A., Kartini D., and Muliadi X., “Implementation of Particle Swarm Optimization Feature Selection on Naïve Bayes for Thoracic Surgery Classification,” Journal of Electronics, Electromedical Engineering, and Medical Informatics 5, no. 3 (2023): 150–158, 10.35882/jeemi.v5i3.305.
- 35. Tang Z., Li C., Kang B., Gao G., Li C., and Zhang Z., “GEPIA: A Web Server for Cancer and Normal Gene Expression Profiling and Interactive Analyses,” Nucleic Acids Research 45, no. W1 (2017): 98–102, 10.1093/nar/gkx247.
- 36. Huang J., Patel H. K., Boakye D., et al., “World‐Wide Distribution, Associated Factors, and Trends of Gallbladder Cancer: A Global Country‐Level Analysis,” Cancer Letters 521 (2021): 238–251, 10.1016/j.canlet.2021.09.004.
- 37. Khandelwal A., Malhotra A., Jain M., Vasquez K. M., and Jain A., “The Emerging Role of Long Non‐Coding RNA in Gallbladder Cancer Pathogenesis,” Biochimie 132 (2017): 152–160, 10.1016/j.biochi.2016.11.007.
- 38. Liu Y., Ding W., Yu W., Zhang Y., Ao X., and Wang J., “Long Non‐Coding RNAs: Biogenesis, Functions, and Clinical Significance in Gastric Cancer,” Molecular Therapy Oncolytics 23 (2021): 458–476, 10.1016/j.omto.2021.11.005.
- 39. Demols A., Rocq L., Perez‐Casanova L., et al., “A Two‐Step Diagnostic Approach for NTRK Gene Fusion Detection in Biliary Tract and Pancreatic Adenocarcinomas,” Oncologist 28, no. 7 (2023): 520–525, 10.1093/oncolo/oyad075.
- 40. Akhtar J., Jain V., Kansal R., et al., “Quantitative Tissue Proteome Profile Reveals Neutrophil Degranulation and Remodeling of Extracellular Matrix Proteins in Early Stage Gallbladder Cancer,” Frontiers in Oncology 12 (2023): 1046974, 10.3389/fonc.2022.1046974.
- 41. Song Q., Yang Y., Jiang D., et al., “Proteomic Analysis Reveals Key Differences Between Squamous Cell Carcinomas and Adenocarcinomas Across Multiple Tissues,” Nature Communications 13, no. 1 (2022): 4167, 10.1038/s41467-022-31719-0.
- 42. Yang X., Li P., Zhuang J., et al., “Identification of Molecular Targets of Bile Acids Acting on Colorectal Cancer and Their Correlation With Immunity,” Digestive Diseases and Sciences 69, no. 1 (2024): 123–134, 10.1007/s10620-023-08032-x.
- 43. Wang H., Zeng X., Zheng Y., Wang Y., and Zhou Y., “Exosomal circRNA in Digestive System Tumors: The Main Player or Coadjuvants?,” Frontiers in Oncology 11 (2021): 614462, 10.3389/fonc.2021.614462.
- 44. Shao Y., Zhou Y., Hou Y., et al., “Prognostic Implications of SLIT and ROBO1 Expression in Gallbladder Cancer,” Cell Biochemistry and Biophysics 70, no. 2 (2014): 747–758, 10.1007/s12013-014-9976-6.
- 45. Ma Y., Zhang Y., Chen F., et al., “The COL7A1/PI3K/AKT Axis Regulates the Progression of Cholangiocarcinoma,” Heliyon 10, no. 18 (2024): e37361, 10.1016/j.heliyon.2024.e37361.
- 46. Oh S. E., Oh M. Y., An J. Y., et al., “Prognostic Value of Highly Expressed Type VII Collagen (COL7A1) in Patients With Gastric Cancer,” Pathology and Oncology Research 27 (2021): 1609860, 10.3389/pore.2021.1609860.
- 47. Qin J., Cui Z., Zhou J., et al., “IGF2BP3 Drives Gallbladder Cancer Progression by m6A‐Modified CLDN4 and Inducing Macrophage Immunosuppressive Polarization,” Translational Oncology 37 (2023): 101764, 10.1016/j.tranon.2023.101764.
- 48. Wu S., Wang G., Xie Y., et al., “The CLEC3B Inhibits Cellular Proliferation and Metastasis of Cholangiocarcinoma Through Wnt/β‐Catenin Pathway,” PeerJ 12 (2024): 18497, 10.7717/peerj.18497.
- 49. Gao W. and Yang M., “Identification by Bioinformatics Analysis of Potential Key Genes Related to the Progression and Prognosis of Gastric Cancer,” Frontiers in Oncology 12 (2022): 881015, 10.3389/fonc.2022.881015.
- 50. Zhang A.‐h., Sun H., Yan G.‐L., Han Y., and Wang X.‐J., “Serum Proteomics in Biomedical Research: A Systematic Review,” Applied Biochemistry and Biotechnology 170, no. 4 (2013): 774–786, 10.1007/s12010-013-0238-7.
- 51. Volkmann H., Höglinger G. U., Grön G., et al., “MRI Classification of Progressive Supranuclear Palsy, Parkinson Disease and Controls Using Deep Learning and Machine Learning Algorithms for the Identification of Regions and Tracts of Interest as Potential Biomarkers,” Computers in Biology and Medicine 185 (2025): 109518, 10.1016/j.compbiomed.2024.109518.
- 52. Sivakumar R. and Penkova A., “Enhancing Glaucoma Detection Through Multi‐Modal Integration of Retinal Images and Clinical Biomarkers,” Engineering Applications of Artificial Intelligence 143 (2025): 110010, 10.1016/j.engappai.2025.110010.
