Scientific Reports. 2025 Oct 31;15:38124. doi: 10.1038/s41598-025-08865-8

Multistage feature selection and stacked generalization model for cancer detection

Sulekha Das 1, Avijit Kumar Chaudhuri 2, Sayak Das 3, Partha Ghosh 4
PMCID: PMC12579239  PMID: 41173962

Abstract

To address the issue of reliable cancer screening, this study proposes a novel approach that selects key features in conjunction with a stacking classifier. It reduces the number of features required while maintaining the same diagnostic accuracy. The experimental results demonstrate that the proposed method yields superior performance in terms of accuracy, sensitivity, precision, specificity, and AUC on each benchmark dataset. The stacked model, built from Logistic Regression, Naïve Bayes, and Decision Tree base classifiers with a Multilayer Perceptron as meta-classifier, achieves 100% accuracy, sensitivity, specificity, and AUC using the selected optimal feature subsets. The findings confirm that intelligent feature selection improves model performance and makes the models easier to apply to cancer identification.

Keywords: Cancer detection, Feature selection, Stacked generalization, Breast Cancer, Lung Cancer, Hybrid Filter-Wrapper, Stacked classifier

Subject terms: Cancer, Computational biology and bioinformatics, Developmental biology, Evolution

Introduction

In 2020, there were a staggering 18,094,716 reported cases of cancer worldwide (https://www.wcrf.org/cancer-trends/global-cancer-data-by-country/). The age-standardized rate for all cancers, excluding non-melanoma skin cancer, stood at 190 per 100,000 individuals. Notably, this rate was higher among men, at 206.9 per 100,000, than among women, at 178.1 per 100,000 individuals1.

This constant burden involving almost every country in the world is a clear indication of how complex the cancer issue is in the public health domain. It has affected so many people, proving why research should go on. Better treatment and a proper way of preventing the spread of this disease are essential. Remarkably, approximately 40% of cancer cases could be averted by addressing risk factors associated with diet, nutrition, and physical activity.

Healthcare Artificial Intelligence (AI) uses software, especially machine learning algorithms, to diagnose conditions and disseminate and interpret large amounts of health and medical information. AI can generate output depending on the input, making it quite valuable for diagnosis and policy formulation in the medical field2. The primary reason for using AI in healthcare is to analyze how different medical treatments are connected with the patient’s quality of life. Unlike traditional techniques, AI is outstanding in acquiring, evaluating, and making conclusions based on data. This is done using such methods as Deep Learning and Machine Learning (ML)3,4.

Based on data patterns, AI makes its reasoning and increases the chances of making correct predictions and recommendations. These capabilities are currently being leveraged in various areas of healthcare, including:

Diagnostic procedures

Deep learning, in particular, allows AI to read X-rays and Magnetic Resonance Imaging (MRI) scans to detect abnormalities and better assist radiologists in making a diagnosis5,6.

Medication development

It can also accurately predict how various chemicals interact with the intended proteins and accelerate the drug-creation process6.

Personalized medicine

AI can successfully be applied in individual treatment or primary disease prevention depending on genetic factors, additional behaviours, or other factors.

Patient monitoring

Continuous monitoring of patients’ vital signs and other health metrics through devices and applications allows AI to alert healthcare providers to potential health risks as they arise.

Treatment protocol formation

AI can also assimilate massive amounts of clinical data and then coordinate effective treatment plans, bringing the best available care to the patient. Integrating AI into the healthcare system therefore positively affects the quality, speed, and individualization of treatment and has great potential to improve patient health. For instance, in screening, AI systems can estimate the likelihood that coronary artery disease will develop. In gastroenterology, AI is used in endoscopic procedures, such as colonoscopy, to diagnose diseases and abnormalities faster. AI also appears helpful in infectious disease medicine, especially with the novel coronavirus1,2. The host response to the virus can also be determined with neural networks and mass spectrometry. Other uses of AI include detecting antibiotic resistance, diagnosing malaria from blood smears, and enhancing rapid diagnostic tests for Lyme disease. It is also being used for the diagnosis of tuberculosis, meningitis, and sepsis, as well as for the prognosis of complicated treatment in hepatitis C and B6.

Uses of machine learning in cancer prediction

Identifying treatment strategies for individual patients based on molecular, genetic, and tumour characteristics is at the heart of comprehensive cancer therapy and is amenable to AI-based solutions7,8. Oncology has received considerable attention in applying AI, and especially ML, to risk evaluation, diagnosis, drug discovery, and molecular profiling of tumours. These investigations demonstrate that applying ML improves cancer prediction and diagnostic proficiency compared with cognitive-based analysis of pathology micrographs and imaging studies alone, as well as the conversion of images into sequences of numbers. In January 2020, employing the Google DeepMind algorithm, doctors designed a new AI system to detect breast cancer more efficiently9–13. In July 2020, researchers at the University of Pittsburgh developed a new machine learning algorithm with 98% specificity and 98% sensitivity for prostate cancer diagnosis. Another study tested the ViT-Patch model on a public dataset to validate its feasibility in detecting malignancies and locating tumours.

Another study employed ML to categorize cancer information and to diagnose breast cancer with the help of classification models such as SVM classifiers, probabilistic neural networks, and KNN14. The overall test accuracy of the models developed with this classifier was the highest. Rana et al. (2015)15 applied ML classification algorithms, which analyse previous data to predict the categories of new inputs, and reported that, of all the algorithms, the random forest model emerged as the best, with an accuracy of 96% across various cancer detection tasks. This study laid the groundwork for the proposed AI system recommended for implementation14,15.

Another observational study compared the performance of SVM, artificial neural networks, the Naïve Bayes classifier, and AdaBoost tree models for breast cancer prognosis after employing Principal Component Analysis (PCA) for dimensionality reduction. The study concluded that the ANN was the best method for giving accurate real-time predictions and prognoses16–19.

Cancer has become a worrying issue among the many diseases facing humans worldwide. Detection in the early stages is crucial for reducing the overwhelmingly high mortality rates. Therefore, developing a quick, accurate, and understandable machine-learning model is the primary focus of this research. Simplifying the model decreases the computational burden and increases its interpretability. This research investigates a three-phase Hybrid Filter-Wrapper strategy for feature selection combined with a Stacked Classifier. Our evaluation spans the WBC dataset, with 30 features and one outcome variable, and the LCP dataset from the Kaggle Machine Learning repository, with 15 independent features and one dependent feature. In Phase 1, we apply a greedy stepwise search algorithm (as shown in Algorithm 1 and Fig. 1) that selects 9 features for the WBC dataset and 10 for the LCP dataset. The selected features are highly correlated with the class but not with one another. In Phase 2, best-first search combined with a logistic regression algorithm selects 6 features for the BC dataset and 8 for the LC dataset (as shown in Algorithm 2 and Fig. 1). In Phase 3, various classifiers, such as Logistic Regression (LR), Naïve Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM) with a polynomial kernel, Multilayer Perceptron (MLP), and a Stacked model (as depicted in Fig. 1), are used to distinguish patients with or without breast/lung cancer using the selected features. The Stacked model comprises LR, NB, and DT as base classifiers and an MLP as the meta-classifier.
This study uses data splitting, a variety of metrics, and statistical tests together with 10-fold cross-validation to allow a rigorous comparison. In particular, LR, NB, and DT show improved performance metrics when the features are reduced to 6/3. In a 50–50 split, SVM achieves 98.6% accuracy with 30 features for the WBC dataset and 25 features for the LCP dataset. However, MLP and LR with six/three features also cross the 98% threshold. Interestingly, the stacked model achieves 100% accuracy with six/five features in all splits (50–50, 66–34, and 80–20) and in 10-fold cross-validation. These results and feature reduction methods indicate a significant improvement over previous studies in cancer detection. The primary focus of our current study is the model’s performance. The authors also acknowledge the importance of helping clinicians and healthcare professionals understand and trust the model’s decisions. This work incorporates explainability techniques such as SHAP, Local Interpretable Model-agnostic Explanations (LIME), and saliency maps to provide insights into how the stacked model makes predictions. These methods enhance our approach’s clinical relevance and facilitate its integration into healthcare practice.
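As a concrete illustration, the stacked architecture just described can be sketched with scikit-learn. The dataset (scikit-learn's bundled copy of the WBC data), the 50–50 split, and all hyperparameters below are illustrative assumptions rather than the paper's exact settings:

```python
# Sketch of the stacked generalization model: LR, NB, and DT base
# classifiers whose out-of-fold predictions feed an MLP meta-classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42)  # 50-50 split

stack = StackingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=42)),
    ],
    # out-of-fold base predictions become the meta-classifier's inputs
    final_estimator=MLPClassifier(max_iter=2000, random_state=42),
    cv=10,
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```

scikit-learn's `StackingClassifier` generates the meta-classifier's training inputs from cross-validated base predictions, which matches the stacked generalization scheme described here.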

Fig. 1. Cancer Prediction Framework.

Combining stacked generalization with hybrid feature selection has improved prediction quality and produced explainable results in cancer detection tests. Stacked ensemble models connecting multiple base classifiers outperform individual classifiers because they extract distinct advantages from different algorithms. This research confirms their effectiveness: stacked models perform best when identifying breast and lung cancers. Multistage feature selection approaches that amalgamate filter-wrapper and embedded modules achieve better generalizability by cutting down dimensions while retaining essential independent features. Recent studies have demonstrated that these methods work effectively for cancer patients, since optimized feature selection helps physicians make better early cancer diagnoses and predictions of patient outcomes. Our research expands these advancements through a three-phase hybrid feature selection technique and stacked classification approach, which achieves 100% accuracy across various evaluation tests. Incorporating explainability techniques improves clinical adoption by overcoming essential obstacles to AI implementation. The primary challenges this work faces are dealing with inconsistent data and the impact of feature selection approaches on classification success. Performance observations from the WBC and LCP datasets show that the Hybrid Filter-Wrapper approach's model performance depends substantially on its feature reduction process. When the original feature sets were reduced to 9, 10, 6, and 8 features, significant improvements in accuracy, sensitivity, precision, specificity, AUC, and Kappa statistics were observed. The primary achievement of this research is showing that applying ensemble learning methods along with feature selection enables better classification outcomes.
The stacked model produces outstanding results by outperforming individual classifiers in every feature set and train-test split evaluation with 100% accuracy. The research findings confirm the need to optimize feature subsets because they enhance model performance yet demonstrate stacking as the superior ensemble method for achieving reliable classification outcomes.

Relevant literature

Tables 1 and 2 summarize accuracy-related literature research and comparative statistical analysis. They present an extensive review of the methodology while demonstrating the performance boost obtained from the proposed method.

Table 1.

WBC performance Comparison.

| Study | Method | Results (%) |
| --- | --- | --- |
| Abdulkareem & Abdulkareem, 2021 [20] | Random Forest, eXtreme Gradient Boosting | Accuracy 99.02 |
| Alshayeji et al., 2022 [21] | Artificial Neural Network | Accuracy 99.85; Sensitivity 100; Specificity 99.72 |
| Benbrahim et al., 2020 [22] | Naïve Bayes, Logistic Regression, Multilayer Perceptron | Accuracy 97 |
| Hernandez-Julio et al., 2023 [23] | Fuzzy Algorithms | Accuracy 99.3; Sensitivity 98.57; Specificity 99.69; Kappa 98.45 |
| Hossin et al., 2023 [24] | Logistic Regression, Random Forest, K-Nearest Neighbours, Decision Tree, AdaBoost, Support Vector Machine, Gradient Boosting, Gaussian Naïve Bayes | Accuracy 99.12; Sensitivity 97.73; Specificity 100 |
| Kadhim & Kamil, 2022 [25] | Decision Tree, Quadratic Discriminant Analysis, AdaBoost, Bagging meta-estimator, Extra Randomized Trees, Gaussian Process Classifier, Ridge, Gaussian Naïve Bayes, K-Nearest Neighbours, Multilayer Perceptron, Support Vector Classifier | Accuracy 97.7; Sensitivity 95.74; Specificity 98.50 |
| Mohammad et al., 2022 [26] | C4.5 Decision Tree, Artificial Neural Networks, Support Vector Machines, Naïve Bayes multinomial classifier, K-Nearest Neighbours | Accuracy 97.7; Sensitivity 96.9; Kappa 86.78 |
| Naji et al., 2021 [27] | Support Vector Machine, Random Forest, Logistic Regression, Decision Tree (C4.5), K-Nearest Neighbours | Accuracy 97.2; Sensitivity 94 |
| Umami & Sarno, 2020 [28] | Linear Regression, Logistic Regression, Decision Tree, Gradient Boosting Decision Tree, Support Vector Machine, Random Forest, K-Nearest Neighbour | Accuracy 99.4 |

Table 2.

LCP performance Comparison.

| Study | Method | Result (%) |
| --- | --- | --- |
| Ahmed et al., 2023 [29] | K-Nearest Neighbour and Support Vector Machine | Accuracy 99; Sensitivity 99 |
| Alshayeji & Abed, 2023 [30] | XGBoost | Accuracy 99.65; Sensitivity 99.64; Specificity 99.9 |
| Al-Tawalbeh et al., 2022 [31] | KNN, SVM, Naïve Bayes, narrow neural network | Accuracy 92.6 |
| Bharathy & Pavithra, 2022 [32] | Support Vector Machine, K-Nearest Neighbour, Decision Tree, Logistic Regression, Naïve Bayes, Random Forest | Accuracy 88.5 |
| Bushara et al., 2023 [33] | Visual Geometry Group–Capsule Network (VGG-CapsNet) | Accuracy 98.61; Sensitivity 99.07; Specificity 99.07 |
| Ingle et al., 2021 [34] | AdaBoost | Accuracy 90.74; Sensitivity 81.80; Specificity 93.99; Kappa 75.3 |
| Lakshmi et al., 2021 [35] | Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbours, Naïve Bayes | Accuracy 100 |
| Nam & Shin, 2019 [36] | Support Vector Machine, Two-Class Support Decision Jungle, Multiclass Decision Jungle | Accuracy 100; Sensitivity 100 |

Assessment of model performance

This research employs the WBC data from the UCI Machine Learning Online Repository. The collection contains information on 569 patients: 212 diagnosed as “malignant” and 357 as “benign,” an approximate distribution of 37.2% malignant and 62.7% benign samples. It includes 30 real-valued input predictors related to the diagnosis of benign or malignant tumours, summarized in Table 3. Several features of cell-nuclei images obtained from Fine-Needle Aspirates (FNA) of breast masses are quantified. However, looking only at characteristic nuclei values, the mean nucleus radius cannot distinguish between elongated and circular nuclei. The dataset therefore also contains the mean and Standard Error (SE) of the measures, as well as the worst value of each assessed feature, so that even rare anomalies are captured.
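The same 569-sample WDBC dataset ships with scikit-learn, which makes the counts quoted above easy to verify (assuming, as is standard, that the UCI file and scikit-learn's bundled copy are identical):

```python
# Verify the sample counts and feature dimensionality of the WBC data.
from collections import Counter
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
n_samples, n_features = data.data.shape          # 569 patients, 30 predictors
counts = Counter(data.target_names[t] for t in data.target)
print(n_samples, n_features, counts["malignant"], counts["benign"])  # 569 30 212 357
```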

Table 3.

Descriptive statistics of variables (WBC Dataset).

| Feature Name | Description | Minimum Value | Maximum Value |
| --- | --- | --- | --- |
| radius_mean | Mean of distances from the center to points on the perimeter | 6.981 | 28.11 |
| texture_mean | Standard deviation of gray-scale values (roughness of the cell) | 9.71 | 39.28 |
| perimeter_mean | Mean size of the perimeter of the cell nuclei | 43.79 | 188.5 |
| area_mean | Mean area of the cell nuclei | 143.5 | 2501 |
| smoothness_mean | Mean smoothness (local variation in radius lengths) | 0.05263 | 0.1634 |
| compactness_mean | Mean compactness (perimeter²/area − 1.0) | 0.01938 | 0.3454 |
| concavity_mean | Mean severity of concave portions of the contour | 0 | 0.4268 |
| concave points_mean | Mean number of concave points on the contour | 0 | 0.2012 |
| symmetry_mean | Mean symmetry of the cell nuclei | 0.106 | 0.304 |
| fractal_dimension_mean | Mean fractal dimension (‘coastline approximation’ − 1) | 0.04996 | 0.09744 |
| radius_se | Standard error for the radius | 0.1115 | 2.873 |
| texture_se | Standard error for the texture (gray-scale variation) | 0.3602 | 4.885 |
| perimeter_se | Standard error for the perimeter | 0.757 | 21.98 |
| area_se | Standard error for the area | 6.802 | 542.2 |
| smoothness_se | Standard error for smoothness | 0.001713 | 0.03113 |
| compactness_se | Standard error for compactness | 0.002252 | 0.1354 |
| concavity_se | Standard error for concavity | 0 | 0.396 |
| concave points_se | Standard error for concave points | 0 | 0.05279 |
| symmetry_se | Standard error for symmetry | 0.007882 | 0.07895 |
| fractal_dimension_se | Standard error for fractal dimension | 0.000895 | 0.02984 |
| radius_worst | Worst or largest value for the radius | 7.93 | 36.04 |
| texture_worst | Worst or largest value for the texture | 12.02 | 49.54 |
| perimeter_worst | Worst or largest value for the perimeter | 50.41 | 251.2 |
| area_worst | Worst or largest value for the area | 185.2 | 4254 |
| smoothness_worst | Worst or largest value for smoothness | 0.07117 | 0.2226 |
| compactness_worst | Worst or largest value for compactness | 0.02729 | 1.058 |
| concavity_worst | Worst or largest value for concavity | 0 | 1.252 |
| concave points_worst | Worst or largest value for concave points | 0 | 0.291 |
| symmetry_worst | Worst or largest value for symmetry | 0.1565 | 0.6638 |
| fractal_dimension_worst | Worst or largest value for fractal dimension | 0.05504 | 0.2075 |
| Outcome | Diagnosis (0 = benign, 1 = malignant) | 0 | 1 |

The second dataset, sourced from the Kaggle Machine Learning Online Repository, contains 23 features: Age, Gender, Air Pollution, Alcohol Use, Dust Allergy, Occupational Hazards, Genetic Risk, Chronic Lung Disease, Balanced Diet, Obesity, Smoking, Passive Smoker, Chest Pain, Coughing of Blood, Fatigue, Weight Loss, Shortness of Breath, Wheezing, Swallowing Difficulty, Clubbing of Fingernails, Frequent Cold, Dry Cough, and Snoring. These features and their relationships to risk-level classifications are outlined in Table 4. The data reveals significant associations between these features and the risk of lung cancer. Notably, individuals aged 35 to 40 are statistically more likely to develop lung cancer.

Table 4.

Descriptive statistics of variables (LCP Dataset).

| Feature | Count | Mean | Standard deviation | Min | 25% | 50% | 75% | Max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age | 1000 | 37.17 | 12.00 | 14 | 27.75 | – | 45 | 73 |
| Gender | 1000 | 1.40 | 0.49 | 1 | 1 | – | 2 | 2 |
| Air Pollution | 1000 | 3.84 | 2.03 | 1 | 2 | – | 6 | 8 |
| Alcohol use | 1000 | 4.56 | 2.62 | 1 | 2 | – | 7 | 8 |
| Dust Allergy | 1000 | 5.16 | 1.98 | 1 | 4 | 6 | 7 | 8 |
| Occupational Hazards | 1000 | 4.84 | 2.10 | 1 | 3 | 5 | 7 | 8 |
| Genetic Risk | 1000 | 4.58 | 2.12 | 1 | 2 | – | 7 | 7 |
| Chronic Lung Disease | 1000 | 4.38 | 1.84 | 1 | 3 | – | 6 | 7 |
| Balanced Diet | 1000 | 4.49 | 2.13 | 1 | 2 | – | 7 | 7 |
| Obesity | 1000 | 4.46 | 2.12 | 1 | 3 | – | 7 | 7 |
| Smoking | 1000 | 3.94 | 2.49 | 1 | 2 | – | 7 | 8 |
| Passive Smoker | 1000 | 4.19 | 2.31 | 1 | 2 | – | 7 | 8 |
| Chest Pain | 1000 | 4.43 | 2.28 | 1 | 2 | – | 7 | 9 |
| Coughing of Blood | 1000 | 4.85 | 2.42 | 1 | 3 | 4 | 7 | 9 |
| Fatigue | 1000 | 3.85 | 2.24 | 1 | 2 | 3 | 5 | 9 |
| Weight Loss | 1000 | 3.85 | 2.20 | 1 | 2 | 3 | 6 | 8 |
| Shortness of Breath | 1000 | 4.24 | 2.28 | 1 | 2 | 4 | 6 | 9 |
| Wheezing | 1000 | 3.77 | 2.04 | 1 | 2 | 4 | 5 | 8 |
| Swallowing Difficulty | 1000 | 3.74 | 2.27 | 1 | 2 | 4 | 5 | 8 |
| Clubbing of Fingernails | 1000 | 3.92 | 2.38 | 1 | 2 | 4 | 5 | 9 |
| Frequent Cold | 1000 | 3.53 | 1.83 | 1 | 2 | 3 | 5 | 7 |
| Dry Cough | 1000 | 3.85 | 2.03 | 1 | 2 | – | 6 | 7 |
| Snoring | 1000 | 2.92 | 1.47 | 1 | 2 | – | 4 | 7 |

– value not recoverable from the source.

Furthermore, higher levels of air pollution and alcohol consumption correlate with an increased risk of lung cancer. Conversely, lower alcohol consumption is associated with a reduced risk. Shortness of breath, wheezing, clubbing of fingernails, and snoring show a broader range of values corresponding to higher risks. Additional information can be viewed in Table 4.

The classification performance of the models is evaluated using the metrics summarized in Table 5.

Table 5.

Confusion matrix and performance evaluation metrics and statistical tests.

| S/N | Metric | Formula / Description |
| --- | --- | --- |
| 1 | Confusion Matrix | Rows: predicted class; columns: actual class. Predicted malignant, actual malignant: True Positive (TP); predicted malignant, actual benign: False Positive (FP); predicted benign, actual malignant: False Negative (FN); predicted benign, actual benign: True Negative (TN). |
| 2 | Accuracy | (TP + TN) / (TP + TN + FP + FN) |
| 3 | Precision | TP / (TP + FP) |
| 4 | AUC (Area under the curve) | A curve plotted between sensitivity and (1 − specificity) is called the receiver operating characteristic (ROC). AUC measures the degree to which the curve rises toward the north-west corner. |
| 5 | Kappa Statistic | (Pc − Pb) / (1 − Pb), where Pc is the complete agreement probability and Pb represents the likelihood of agreement ‘by chance’. Its range is (−1, 1). |

TP (True Positive) is the number of positive cases correctly classified as positive. FP (False Positive) is the number of negative instances wrongly classified as positive. TN (True Negative) is the number of negative instances correctly classified as negative. FN (False Negative) is the number of positive instances wrongly classified as negative.

Bagging (Bootstrap Aggregating)

Bagging trains multiple similar models using distinct parts of the training data obtained through random sampling with replacement. The final prediction consists of averaging predictions for regression tasks or performing voting for classification tasks.

Boosting

This sequential ensemble technique builds a strong learner by uniting multiple weak learners. Each successive model corrects the errors of its predecessors, devoting special attention to the misclassified training data points.
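AdaBoost is one concrete instance of this reweighting scheme; a brief sketch (dataset and ensemble size are illustrative):

```python
# AdaBoost: weak learners (decision stumps by default) are fitted
# sequentially, and each round up-weights the training points the
# previous round misclassified.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

boost = AdaBoostClassifier(n_estimators=100, random_state=0)
boost.fit(X_tr, y_tr)
boost_acc = boost.score(X_te, y_te)
```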

Cross validation

Cross-validation is an analytical method that assesses how a statistical model’s predictions generalize to independent data. The data is divided into k subsets (folds); models are trained on k − 1 folds and tested on the left-out fold, with each fold serving as the test set once across the k repetitions. This technique guards against overfitting and yields a more reliable assessment of model performance.
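The k-fold procedure can be sketched as follows, here with k = 10 stratified folds and an illustrative scaled logistic regression model:

```python
# 10-fold cross-validation: each fold is held out once while the model
# is refit on the remaining nine; one accuracy score per fold results.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(model, X, y, cv=cv)   # one accuracy per fold
mean_acc = scores.mean()
```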

High precision indicates that the method misclassifies only a few healthy patients as having cancer. Sensitivity is the percentage of actual cancer patients detected correctly; we should aim for high sensitivity and avoid misclassifying any cancer patient as healthy. The error from misclassifying a few healthy patients as having cancer can easily be corrected in further tests. Specificity is the percentage of patients correctly detected as not having cancer. AUC is the area under the Receiver Operating Characteristic (ROC) curve and directly correlates with a classifier’s overall accuracy. It takes values from 0 to 1: 0 implies an entirely inaccurate test, 1 a perfectly accurate test, and 0.5 no discrimination. Values of 0.7–0.8, 0.8–0.9, and 0.9–1 are considered acceptable, excellent, and outstanding, respectively. The Kappa value evaluates the observed accuracy against the expected accuracy (random chance); for example, an observed accuracy of 80% with an expected accuracy of 50% is preferable to an observed accuracy of 75% at the same chance level. A higher Kappa value is sought.
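These metrics can be computed directly from confusion-matrix counts; the labels below are invented toy data (1 = cancer, 0 = healthy), not results from this study:

```python
# Metrics from Table 5 computed from raw TP/FP/TN/FN counts.
from sklearn.metrics import cohen_kappa_score, confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)   # 0.8
sensitivity = tp / (tp + fn)                    # 0.75: 3 of 4 patients found
specificity = tn / (tn + fp)                    # 5 of 6 healthy kept healthy
precision   = tp / (tp + fp)                    # 0.75
kappa = cohen_kappa_score(y_true, y_pred)       # agreement beyond chance
```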

Methodology

The study implements classic machine learning classifiers (LR, SVM, NB, MLP, and HT) alongside a novel hybrid feature selection approach that improves both predictive performance and interpretability. The classifiers were chosen for their suitability in cancer prediction, as each uses a distinct mathematical method to classify patients effectively. A new stacking model serves as a component to enhance computational efficiency in this system. This research applies the classification algorithms to the features chosen by the selection method described in the sections below.

Logistic regression

Logistic regression is one of the most frequently used statistical methods for predicting cancer due to its capacity to model binary response variables, such as the presence or absence of cancer, from predictor variables. Several published papers have shown that it can predict cancer risk using clinical and demographic variables. For instance, the study by Tsoi et al. (2018)37 used logistic regression to determine the relationship between lifestyle factors and breast cancer risk, including age, family history of the disease, and BMI, to mention a few. Another application is predicting lung cancer through the use of logistic regression models that identify high-risk populations by smoking history and genetic indicators (Liao et al., 2020)38.

SVM

This algorithm’s crucial approach is to search for the hyperplane that gives the maximum margin, in a space whose dimensionality grows with the number of features. Although separating classes using two features is relatively simple, it becomes more challenging when working with many features. Increasing the margin makes the prediction outcomes more accurate39.
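A sketch of an SVM with the polynomial kernel used in this study; the C value, degree, split, and feature scaling are assumptions rather than the paper's exact settings:

```python
# Polynomial-kernel SVM on the WBC data (50-50 split, as in the study).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Scaling matters for kernel SVMs: the margin is computed in feature space.
svm = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, C=1.0))
svm.fit(X_tr, y_tr)
svm_acc = svm.score(X_te, y_te)
```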

NB

Naïve Bayes is based on the assumption that every variable has an independent and equivalent effect on the result; that is, the features are assumed to be non-interacting and to have an equal impact on the output40. In real-world applications this assumption may not hold, which can cause errors in the predictions. Some variants, notably Gaussian Naïve Bayes, further assume that the feature distributions are Gaussian and compute the conditional probabilities accordingly.
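A Gaussian Naïve Bayes sketch on the WBC data; the split and scikit-learn defaults are illustrative:

```python
# Each feature is modelled as an independent per-class Gaussian; the
# predicted class maximizes the product of the per-feature likelihoods.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)
nb_acc = nb.score(X_te, y_te)
```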

MLP

The Multilayer Perceptron (MLP) is a significant and widely recognized class of artificial neural networks (ANNs). While the concept dates to the early 1950s, the backpropagation algorithm for training ANNs was developed in the mid-1980s. This discovery boosted the importance of MLPs, which remain among the most popular technologies today41. An MLP is a feedforward neural network. Its architecture comprises three main layers: an input layer, one or more hidden layers, and an output layer. The hidden layers contain nodes with activation functions that define their behaviour. The inputs are fed into the input layer, whose number of nodes equals the number of input variables, and then passed to the first hidden layer42. In each hidden-layer node, every input is weighted, a bias is added, and the results are summed according to a specific formula; an activation function, such as Sigmoid or ReLU, then computes the node output passed to the next layer. The output layer, with an activation function chosen according to the output type, generates the final output. This series of operations is known as forward feeding. The error is then calculated from the difference between the expected target and the actual output; this error rate must be reduced as far as possible. Backpropagation updates the MLP weights according to the error rate of the previous epoch. The MLP is most suitable for problems that cannot be separated linearly. It is widely used for pattern recognition and is vital in predicting and diagnosing diseases43.
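A compact MLP sketch mirroring the forward-feeding and backpropagation steps described above; the single hidden layer of 32 ReLU units is an assumption, not the paper's architecture:

```python
# MLP with one hidden layer, trained by backpropagation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

mlp = make_pipeline(
    StandardScaler(),  # gradient-based training benefits from scaled inputs
    MLPClassifier(hidden_layer_sizes=(32,), activation="relu",
                  max_iter=2000, random_state=0),
)
mlp.fit(X_tr, y_tr)
mlp_acc = mlp.score(X_te, y_te)
```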

HT

Hoeffding trees are decision trees capable of learning from massive data streams, provided that the data distribution does not change over time. They use small samples that are often adequate to choose the optimal splitting attribute. The Hoeffding bound helps achieve this, as it can, within a prescribed precision, quantify the number of observations needed to estimate the suitability of an attribute. Hoeffding Trees have two parts: training and scoring. In training, supervised learning analyses a small sample of data with known outcomes and chooses the attribute for tree-node splitting. The trained tree is referenced in scoring, where each new instance is classified into the relevant class label. Various studies have indicated that HTs demonstrate strong performance44. The HT algorithm compares attributes better than other algorithms and has lower memory consumption and enhanced utilization with data sampling. However, it takes extensive time to inspect whether ties occur45. HTs have been used for classification46 and disease prediction, where they have been shown to perform well. Hoeffding Trees, a category of Decision Trees, are effective at handling nonlinear data. Tree-based classifiers take a nonlinear, hierarchical approach that makes them suitable for non-parametric and categorical data. They offer considerable flexibility in data analysis and reveal the hierarchical structure of the independent variables, which is advantageous for classification.
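The Hoeffding bound at the core of the algorithm is easy to state in code: after n observations of a quantity with range R, the true mean lies within ε of the sample mean with probability 1 − δ:

```python
# Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Deviation epsilon guaranteed with probability 1 - delta after n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# The bound tightens as more stream instances arrive, which is why a small
# sample often suffices to pick the best splitting attribute with confidence.
eps_100 = hoeffding_bound(1.0, 0.05, 100)
eps_10k = hoeffding_bound(1.0, 0.05, 10_000)
```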

Feature selection

Sequential feature selection algorithms are valuable methods in machine learning for dealing with big data. These algorithms are greedy search techniques that reduce the initial d-dimensional feature space to a manageable k-dimensional feature space, where k < d. This dimensionality reduction is essential, as it helps to identify the features valuable for a given problem while leaving out irrelevant and noisy ones. The main reasons for using feature selection algorithms are to improve the model’s quality and to accelerate computation. In high-dimensional datasets, some features are often redundant or irrelevant, slowing the training process and degrading the model’s performance. These algorithms reduce model complexity by choosing a subset of the most informative features, which enables faster computation and reduces the risk of overfitting47. Sequential feature selection can be divided into two main types: forward selection and backward elimination. In forward selection, the algorithm begins with an empty subset and adds, at each step, the feature with the highest correlation to the target variable until k features are reached. Backward elimination starts with all features and successively excludes the feature of least importance until k features are retained. Both methods rank features according to a criterion of choice, which can be accuracy, AUC, or another measure48,49. Another significant advantage of sequential feature selection is its simplicity compared with other methods; it can be implemented easily. However, it is inherently greedy: at each step it makes the locally optimal choice in the hope of finding a global optimum. This can sometimes lead to suboptimal feature subsets because the algorithm does not reconsider its earlier choices.
However, sequential feature selection is still widely employed due to its simplicity and computational efficiency50,51.

Wrapper feature selection is one of the most commonly used methods for choosing the best feature subset when constructing a model. It assesses different feature subsets by training and testing a model on each subset and then picks the one that provides the best result. Best-first search is the algorithm employed in this context to search the space of all possible feature subsets52–55.
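scikit-learn offers a greedy sequential wrapper rather than best-first search, so the following is only an approximation of the wrapper idea described here; the target of 6 features mirrors the WBC subset size, and the scaling and 3-fold CV are assumptions made to keep the sketch fast:

```python
# Greedy forward selection wrapped around a logistic regression scorer.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=500),
    n_features_to_select=6,   # stop once 6 features are chosen
    direction="forward",      # grow the subset from the empty set
    cv=3,                     # each candidate subset is scored by CV accuracy
)
sfs.fit(X_scaled, y)
mask = sfs.get_support()      # boolean mask over the 30 original features
```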

This paper proposes a hybrid Filter-Wrapper feature selection method. In Phase 1 it employs a filter-based approach, in which a Multivariate Feature Evaluator is combined with the greedy stepwise search algorithm. The goal of this stage is to quickly find a decent approximation of the essential features at low computational cost. In particular, features are ordered by their calculated relevance weights and are deleted or added iteratively according to the greedy criterion to maximize the evaluator’s score.

In Phase 2, the features identified in Phase 1 form the feature subspace for a wrapper-based approach. Because Phase 1 has already reduced the feature space, the computational effort of the wrapper in Phase 2 is much lower. The method selects features iteratively; its greedy nature keeps the computational cost low and makes it well suited to problems with many features. Here, the best-first search technique is applied in conjunction with a Logistic Regression classifier to evaluate candidate feature sets, iteratively searching for the combination that optimizes the predictor's performance at modest computational cost. Feature selection is encapsulated within the cross-validation framework: both Phase 1 and Phase 2 are performed separately in each fold of the 10-fold cross-validation, so no information leaks from the test fold into the feature selection process.
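To keep selection leakage-free as just described, the whole selector chain can live inside a cross-validation pipeline so that it is refitted on the training portion of every fold. In the sketch below, `SelectKBest` stands in for the Phase 1 filter and `RFE` with Logistic Regression for the Phase 2 wrapper; these are illustrative substitutes, not the exact evaluators used in the paper.

```python
# Feature selection nested inside each CV fold (no train/test leakage):
# the Pipeline refits both selection phases on every fold's training data.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("phase1_filter", SelectKBest(f_classif, k=9)),           # cheap filter pass
    ("phase2_wrapper", RFE(LogisticRegression(max_iter=1000),
                           n_features_to_select=6)),          # wrapper refinement
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=10)   # selection repeated in each fold
print("mean accuracy:", round(scores.mean(), 3))
```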

Phase 3 involves training different types of classifiers, including ensemble and stacked generalization models, on the final feature set to classify patients correctly. The stability of the entire pipeline is assessed with 10-fold cross-validation, and training and test partitions are kept strictly separate to ensure unbiased evaluation.

The experiments, which compare predictions obtained with

  1. all features,

  2. the feature subset obtained in Phase 1, and

  3. the feature subset obtained in Phase 2,

demonstrate the efficiency and effectiveness of the proposed hybrid feature selection method.


The Multistage Feature Selection algorithm is illustrated in Algorithm 1 and Fig. 2 below.

Fig. 2. Hybrid Filter-Wrapper Feature Selection.

Algorithm 1: multistage feature selection


Stacking

Bagging and boosting use homogeneous weak learners in the ensemble, while stacking often uses heterogeneous ones56–59. These weak learners are trained in parallel, and the final decision is made by a trained meta-learner. The meta-learner uses the predictions of the weak learners as input features and the dataset's targets as outputs (as illustrated in Fig. 3 and Algorithm 2). It learns how best to combine the input predictions to produce an improved final prediction.

Fig. 3. Proposed Stacking Model.

In a method like Random Forest, the model averages the decisions of the individually trained trees. The limitation of this approach is that it assigns equal weight to each model irrespective of its accuracy. A better approach is the weighted average ensemble, which weights individual models by their predictive accuracy; this usually improves on simple averaging60–62.

Stacking takes this idea one step further and replaces the linear weighted sum with another learning algorithm, such as Linear Regression for regression problems or Logistic Regression for classification problems. This combines the outputs of the sub-models more flexibly, and any learning algorithm can be used at the final combining stage. Stacking uses the predictions of the sub-models as input and learns how to combine them for a better overall prediction63–65.
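One way to realize the stacked design used in this paper (LR, NB, and a Decision Tree as base learners with an MLP meta-classifier) is scikit-learn's `StackingClassifier`. The hyperparameters and the use of the bundled Wisconsin data below are illustrative assumptions, not the paper's exact setup.

```python
# Stacking sketch: out-of-fold predictions of heterogeneous base learners
# become the training inputs of an MLP meta-learner.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scaled up front for simplicity

stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=MLPClassifier(max_iter=1000, random_state=0),
    cv=5,   # internal folds generate the base learners' out-of-fold predictions
)

scores = cross_val_score(stack, X, y, cv=3)
print("mean accuracy:", round(scores.mean(), 3))
```

The internal `cv` parameter matters: the meta-learner is trained on out-of-fold base predictions, so it never sees predictions the base learners made on their own training data.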

Algorithm 2: stacking with k-fold cross-validation


Results and discussion

The experiments were conducted in Python in the Google Colab environment. The features selected in Phase 1 and Phase 2 of the Hybrid Filter-Wrapper approach are listed in Tables 6 and 7.

Table 6.

Features selected using hybrid Filter-Wrapper approach (WBC Dataset).

| Approach | Selected features |
| --- | --- |
| Phase 1: Greedy stepwise & Filter (9 features) | perimeter_mean, area_mean, compactness_mean, symmetry_mean, fractal_dimension_mean, area_se, concavity_se, texture_worst, area_worst |
| Phase 2: Best-first & Wrapper (6 features) | compactness_mean, symmetry_mean, fractal_dimension_mean, area_se, concavity_se, area_worst |

Table 7.

Features selected using hybrid Filter-Wrapper approach (LCP Dataset).

| Approach | Selected features |
| --- | --- |
| Phase 1: Greedy stepwise & Filter (10 features) | Genetic Risk, Balanced Diet, Smoking, Chest Pain, Fatigue, Weight Loss, Wheezing, Swallowing Difficulty, Dry Cough, Snoring |
| Phase 2: Best-first & Wrapper (6 features) | compactness_mean, symmetry_mean, fractal_dimension_mean, area_se, concavity_se, area_worst |

Three experimental set-ups are used:

  1. the complete dataset with all 30 (WBC dataset) / 16 (LCP dataset) features,

  2. the dataset with only the 9 (WBC) / 10 (LCP) features obtained in Phase 1, and

  3. the dataset with only the 6 (WBC) / 8 (LCP) features obtained in Phase 2.

A comparative analysis of LR, NB, SVM, HT, MLP, and the Stacked model was performed in the three settings using 50–50, 66–34, and 80–20 train-test splits, and the results were validated with 10-fold cross-validation. The classification accuracy, sensitivity, precision, and specificity values are provided in Tables 8 and 9, Tables 10 and 11, Tables 12 and 13, and Tables 14 and 15, respectively. The AUC/ROC and kappa values are provided in Tables 16, 17, 18 and 19, respectively. All results were obtained in Python and are expressed as percentages. In each cell of the tables, the values of the performance metric for all features, the Phase 1 subset, and the Phase 2 subset are separated by commas.
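For reference, all of the reported metrics follow directly from a confusion matrix. The snippet below shows the computation on a toy 80–20 split of scikit-learn's bundled Wisconsin data with Logistic Regression, so the numbers it prints are illustrative, not those of the tables.

```python
# Deriving accuracy, sensitivity, precision, specificity, AUC and kappa
# from one train-test split (illustrative model and split, not the paper's).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)                 # a.k.a. recall
precision   = tp / (tp + fp)
specificity = tn / (tn + fp)
auc   = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
kappa = cohen_kappa_score(y_te, pred)

for name, m in [("accuracy", accuracy), ("sensitivity", sensitivity),
                ("precision", precision), ("specificity", specificity),
                ("AUC", auc), ("kappa", kappa)]:
    print(f"{name}: {100 * m:.1f}")
```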

Table 8.

Comparison of accuracies with all features, 9 features and 6 features (WBC Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 76, 80.6, 95.4 | 74.3, 80.6, 95.4 | 76.4, 79.5, 95.7 | 74.3, 80.6, 95.4 | 75, 86.9, 96.8 | 79.2, 87.6, 100 |
| 66–34 | 79.8, 83.9, 95.3 | 78.2, 80.8, 92.2 | 77.7, 82.9, 96.9 | 78.2, 80.8, 92.2 | 77.7, 91.7, 98.9 | 81.3, 91.7, 100 |
| 80–20 | 80.7, 82.4, 95.6 | 78, 82.4, 95.6 | 82.4, 80.7, 95.6 | 78, 82.4, 95.6 | 80.7, 91.2, 98.2 | 79.8, 90.3, 100 |
| 10-fold CV | 76.9, 80.6, 95.6 | 77.1, 79.6, 95.4 | 76.4, 80.1, 96.1 | 77.1, 78.7, 94.9 | 80.5, 91.9, 97.5 | 79.4, 90.7, 100 |

Table 9.

Comparison of accuracies with all features, 10 features and 8 features (LCP Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 75.8, 84.8, 100 | 76.2, 84.8, 100 | 75.3, 84.4, 100 | 76.2, 84.6, 100 | 80.3, 97.9, 100 | 87, 96, 100 |
| 66–34 | 77.2, 83.6, 100 | 76.4, 84.2, 100 | 76.1, 82.7, 100 | 76.4, 82.7, 100 | 84.5, 98.4, 100 | 92, 99, 100 |
| 80–20 | 75.1, 84, 100 | 76.1, 84.6, 100 | 75.6, 83.2, 100 | 76.1, 84, 100 | 84.7, 98.5, 100 | 90, 97, 100 |
| 10-fold CV | 75.1, 84.8, 100 | 74.6, 84.6, 100 | 74.2, 84.6, 100 | 74.6, 84.8, 100 | 86.4, 99.2, 100 | 91, 99, 100 |

Table 10.

Comparison of sensitivity with all features, 9 features and 6 features (WBC Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 76.1, 80.6, 95.4 | 74.3, 80.6, 95.4 | 76.4, 79.6, 95.8 | 74.3, 80.6, 95.4 | 75, 87, 96.8 | 79.2, 87.7, 100 |
| 66–34 | 79.8, 83.9, 95.3 | 78.2, 80.8, 92.2 | 77.7, 82.9, 96.9 | 78.2, 80.8, 92.2 | 77.7, 91.7, 99 | 81.3, 91.7, 100 |
| 80–20 | 80.7, 82.5, 95.6 | 78.1, 82.5, 95.6 | 82.5, 80.7, 95.6 | 78.1, 82.5, 95.6 | 80.7, 91.2, 98.2 | 79.8, 90.4, 100 |
| 10-fold CV | 77, 80.7, 95.6 | 77.2, 79.6, 95.4 | 76.4, 80.1, 96.1 | 77.2, 78.7, 94.9 | 80.5, 91.9, 97.5 | 79.4, 90.7, 100 |

Table 11.

Comparison of Sensitivity(recall) with all features, 10 features and 8 features (LCP Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 75.9, 84.9, 100 | 76.2, 84.8, 100 | 75.3, 84.5, 100 | 76.2, 84.6, 100 | 80.3, 97.9, 100 | 87, 96, 100 |
| 66–34 | 77.3, 83.6, 100 | 76.5, 84.2, 100 | 76.2, 82.7, 100 | 76.5, 82.7, 100 | 84.5, 98.4, 100 | 92, 99, 100 |
| 80–20 | 75.2, 84, 100 | 76.2, 84.7, 100 | 75.7, 83.2, 100 | 76.2, 84, 100 | 84.7, 98.5, 100 | 90, 97, 100 |
| 10-fold CV | 75.1, 84.8, 100 | 74.6, 84.6, 100 | 74.2, 84.7, 100 | 74.6, 84.9, 100 | 86.4, 99.2, 100 | 91, 99, 100 |

Table 12.

Comparison of precision with all features, 9 features and 6 features (WBC Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 76.7, 80.7, 95.3 | 74.4, 80.9, 95.7 | 76.7, 80, 96 | 74.4, 80.9, 95.7 | 75.1, 87.1, 96.8 | 79.3, 87.8, 100 |
| 66–34 | 79.8, 84.1, 95.9 | 78.2, 80.8, 92.9 | 77.8, 83.5, 97 | 78.2, 80.8, 92.9 | 77.8, 91.8, 99 | 81.3, 91.9, 100 |
| 80–20 | 80.7, 82.5, 95.8 | 78.1, 82.5, 95.8 | 82.5, 80.9, 95.8 | 78.1, 82.5, 95.8 | 80.7, 91.3, 98.3 | 83.6, 90.5, 100 |
| 10-fold CV | 76.9, 80.7, 95.5 | 77.2, 79.6, 95.7 | 76.4, 80.2, 96.3 | 77.2, 78.8, 95 | 80.5, 92, 97.6 | 79.4, 91, 100 |

Table 13.

Comparison of precision with all features, 10 features and 8 features (LCP Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 75.9, 84.6, 100 | 76.2, 84.2, 100 | 75.4, 84.1, 100 | 76.2, 84.3, 100 | 80.5, 97.9, 100 | 88, 96, 100 |
| 66–34 | 78.3, 83.5, 100 | 76.5, 83.7, 100 | 76.2, 82.4, 100 | 76.5, 82.4, 100 | 84.5, 98.4, 100 | 92, 99, 100 |
| 80–20 | 75.1, 83.7, 100 | 76.1, 84.2, 100 | 75.7, 82.7, 100 | 76.1, 83.5, 100 | 84.7, 98.5, 100 | 90, 97, 100 |
| 10-fold CV | 75.1, 84.6, 100 | 74.6, 84.1, 100 | 74.2, 84.4, 100 | 74.6, 84.4, 100 | 86.4, 99.2, 100 | 91, 99, 100 |

Table 14.

Comparison of specificity with all features, 9 features and 6 features (WBC Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 70.6, 82.9, 96.7 | 70.4, 84.4, 94.9 | 72, 84, 95.3 | 70.4, 84.4, 94.9 | 71.5, 89.1, 96.8 | 80.3, 85.8, 100 |
| 66–34 | 78.4, 82.4, 98.7 | 77, 81.5, 91.6 | 75, 80, 96.5 | 77, 81.5, 91.6 | 75, 90.8, 98.8 | 79.7, 89.3, 100 |
| 80–20 | 81.8, 81.9, 95.3 | 77.6, 81.9, 95.3 | 81.3, 79.7, 95.3 | 77.6, 81.9, 95.3 | 79.6, 93.7, 98 | 72.3, 88.7, 100 |
| 10-fold CV | 76, 80.6, 96.1 | 77.2, 80, 94.8 | 74.8, 79.8, 95.6 | 77.3, 77.7, 94.6 | 78, 90.2, 97.3 | 77.7, 87.5, 100 |
100

Table 15.

Comparison of specificity with all features, 10 features and 8 features (LCP Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 74.6, 71, 100 | 75, 73.7, 100 | 76.6, 70.6, 100 | 75, 70.7, 100 | 77.6, 95.7, 100 | 90, 92, 100 |
| 66–34 | 76, 69.4, 100 | 75.3, 73.1, 100 | 75, 68.4, 100 | 75.3, 68.4, 100 | 82.9, 96.3, 100 | 91, 98, 100 |
| 80–20 | 73.8, 72.3, 100 | 74.8, 76.4, 100 | 75.5, 72, 100 | 74.8, 78.8, 100 | 84, 99.3, 100 | 86, 97, 100 |
| 10-fold CV | 74.7, 70.8, 100 | 74.3, 73, 100 | 72.8, 71.2, 100 | 74.3, 73, 100 | 85.9, 99.3, 100 | 91, 98, 100 |

Table 16.

Comparison of AUC (ROC area) value with all features, 9 features and 6 features (WBC Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 84.9, 86.9, 93.4 | 83.2, 87.3, 93.1 | 76.5, 79.8, 85 | 83.2, 87.3, 93.1 | 82.7, 88.8, 88.4 | 84.8, 92.8, 100 |
| 66–34 | 86.3, 86.5, 97.2 | 85.2, 86.4, 97.9 | 77.7, 82.1, 89.3 | 85.2, 86.4, 97.9 | 85.5, 92.1, 95.8 | 88, 94.5, 100 |
| 80–20 | 88.6, 82.2, 95.8 | 87.7, 82.9, 95.4 | 82.5, 79.1, 79.2 | 87.7, 82.9, 95.4 | 86.1, 95.3, 90.4 | 89.3, 94.7, 100 |
| 10-fold CV | 85.3, 85.8, 95.7 | 85.3, 85.6, 95.5 | 76.2, 80.1, 88.2 | 85.3, 83, 95.3 | 86.1, 94.5, 90.6 | 86.5, 91.8, 100 |

Table 17.

Comparison of AUC (ROC area) value with all features, 10 features and 8 features (LCP Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 83.9, 91.5, 100 | 83.5, 91, 100 | 75, 78.2, 100 | 83.5, 91.7, 100 | 87.8, 98.2, 100 | 93, 99, 100 |
| 66–34 | 84.3, 90.9, 100 | 84.3, 90.7, 100 | 76, 76.7, 100 | 84.3, 87.7, 100 | 89.4, 98.5, 100 | 95, 99, 100 |
| 80–20 | 82.9, 91.2, 100 | 83.2, 91, 100 | 75, 76.8, 100 | 83.2, 90, 100 | 89.9, 98.1, 100 | 94, 96, 100 |
| 10-fold CV | 83.3, 91.7, 100 | 83.1, 91.6, 100 | 74.2, 78.6, 100 | 83.1, 91.2, 100 | 92.2, 98.7, 100 | 95, 99, 100 |

Table 18.

Comparison of kappa statistics with all features, 9 features and 6 features.

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 52.1, 61.1, 80.4 | 48.2, 61.3, 78.1 | 52.6, 59.2, 80 | 48.2, 61.3, 78.1 | 49.5, 73.8, 85.8 | 57.5, 75.1, 100 |
| 66–34 | 59.3, 67.3, 82.5 | 56.1, 61.2, 59.7 | 55.3, 65, 86.2 | 56.1, 61.2, 59.7 | 55.3, 83.2, 95.7 | 62.4, 83.1, 100 |
| 80–20 | 61.4, 63.3, 71.4 | 56.1, 63.4, 71.4 | 64.9, 59.5, 71.5 | 56.1, 63.4, 71.5 | 61.4, 82.1, 89.9 | 59.6, 79.9, 100 |
| 10-fold CV | 53.5, 61.3, 82.9 | 53.7, 59.2, 81.2 | 52.5, 60.2, 84.4 | 53.7, 57.4, 79.1 | 60.8, 83.8, 90.5 | 58.6, 81.3, 100 |

Table 19.

Comparison of kappa statistic with all features, 10 features and 8 features.

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 51.5, 59.1, 100 | 52.2, 57.3, 100 | 50.2, 57.8, 100 | 52.2, 58.3, 100 | 60.6, 94.5, 100 | 75, 90, 100 |
| 66–34 | 54.3, 57.2, 100 | 52.7, 57.3, 100 | 52.1, 54.4, 100 | 52.7, 54.4, 100 | 68.9, 95.9, 100 | 83, 97, 100 |
| 80–20 | 49.6, 58.7, 100 | 51.6, 59.2, 100 | 50.4, 55.8, 100 | 51.6, 55.8, 100 | 68.9, 96.2, 100 | 80, 92, 100 |
| 10-fold CV | 50.1, 59.3, 100 | 49, 57.3, 100 | 48.3, 58.5, 100 | 49, 58.3, 100 | 72.7, 97.9, 100 | 82, 98, 100 |

Analysis of accuracy: The STACKED model is the most accurate, achieving 100% accuracy in all splits and in cross-validation, which indicates that the stacking ensemble is superior to the individual models on the WBC and LCP datasets. The MLP classifier also achieves high accuracy across all splits, especially the 80–20 and 66–34 splits. The SVM classifier performed notably well, particularly in the 80–20 split. The average accuracies of LR, NB, and HT are lower than those of SVM, MLP, and the STACKED model. Overall, the STACKED model yields excellent results and in many cases reaches 100% accuracy on both datasets.

Tables 10 and 11 present a comparison of the sensitivities of the different classification models for three feature sets (all features, 9/10 features, and 6/8 features for the WBC/LCP datasets), train-test split ratios of 50–50, 66–34, and 80–20%, and 10-fold cross-validation. As the number of features decreases from all features to 9 or 10 and then to 6 or 8, sensitivity increases, especially for the stacked model. The stacked model generally yields the highest results, with sensitivity approaching 100% in most cases, and its performance is reasonably stable across the different train-test splits.

Table 11 presents the sensitivity of the same models on the LCP dataset, comparing all features, 10 features, and 8 features under the same train-test splits and 10-fold CV. As with the WBC dataset, when the number of features decreases, sensitivity is either maintained at a high level or improves slightly. The stacked model achieves 100% sensitivity across all feature subsets and train-test splits, making it the most accurate model in this setting. The MLP also performs strongly, mainly when few features are used; in most instances its sensitivity is close to 100%. Overall, the stacked model has the highest sensitivity across all feature sets and train-test splits for both datasets.

In several cases sensitivity reaches 100%, which shows that the model reliably identifies actual positives on both feature sets. Decreasing the number of features does not always hurt sensitivity and can even improve it for the MLP and stacked models. Notably, model performance is comparable under cross-validation and under the different train-test partitions.

Tables 12 and 13 present the precision of the classification models for the different feature combinations (all features, 9 or 10 features, and 6 or 8 features) under the three train-test split ratios (50–50%, 66–34%, and 80–20%) and 10-fold cross-validation. Overall, precision tends to be lowest when all features are used. On the WBC dataset, models using the 9 or 6 selected features achieve higher precision than those using all features, with a marked improvement for the MLP and stacked models. The highest precision is recorded when the optimal 6 or 8 features are selected, particularly for the stacked model, which reaches 100% precision in most of the train-test split ratios.

Analysing precision for both datasets across the classification models, precision increases as the number of features decreases, especially for the stacked and MLP models. The stacked model is consistently the most precise and sometimes achieves 100% precision, demonstrating its soundness. The results indicate that reducing the features improves precision, with the best results obtained for 6 or 8 features, depending on the dataset.

The specificity analysis in Tables 14 and 15 examines the specificity of the models (LR, NB, SVM, HT, MLP, and STACKED) on the two datasets (WBC and LCP) for the different feature sets and train-test splits. The proposed STACKED model shows high specificity on the WBC dataset across all feature sets and splits, reaching 100% with the optimal 6 features. On the LCP dataset, all classifiers, including STACKED, reach 100% specificity in every split with the minimal feature set, confirming the optimal outcome on both datasets.

The AUC (ROC area) values in Tables 16, 17, 20 and 21 indicate that the stacked model gives the best overall results, with perfect scores of 100% in all cases. For the other models, the improvement from reducing features is less consistent across train-test splits (for instance for LR, NB, and MLP), whereas it is consistent for the stacked model. SVM in particular becomes unstable with few features, and its AUC drops in the 80–20% split. In Table 17 the stacked model again produces perfect scores in all configurations, and for the remaining models decreasing the number of input features tends to raise the AUC. The stability of the AUC on the WBC and LCP datasets shows that the stacked model is sound across all feature sets and splits. In summary, the stacked model achieves the highest AUC/ROC on both datasets, although the effect of feature reduction varies across the classification algorithms studied.

Table 20.

Comparison of ROC value with all features, 9 features and 6 features (WBC Dataset) for 10 fold cross-validations for stacked model.

ROC = 86.5 (all features), ROC = 91.8 (9 features), ROC = 100 (6 features).

Table 21.

Comparison of AUC value with all features, 10 features and 8 features (LCP Dataset) for 10 fold cross-validations for stacked model.

ROC = 95 (all features), ROC = 99 (10 features), ROC = 100 (8 features).

As the results in Tables 18 and 19 demonstrate, the STACKED and MLP models outperform the other models irrespective of train-test split and feature set, achieving 100% kappa with the reduced feature sets (6 or 8 features). As both tables show, the performance of these models improves as the number of features decreases, which demonstrates their stability and efficiency with a small but adequate number of features. Other models, such as LR, NB, and SVM, show more fluctuating and generally lower kappa values, with only a slight boost from feature elimination, and remain less optimal than STACKED or MLP. This analysis of the impact of feature selection and model choice confirms that STACKED and MLP are the most accurate methods when used with limited features.

The hybrid feature selection method proved computationally efficient and practical, striking a good balance between precision, sensitivity, specificity, kappa, AUC, and overall accuracy. Its iterative approach, evaluated with a complete set of performance metrics, supports dependable cancer diagnosis while minimizing misclassifications and maximizing clinical utility.

Model interpretability plays a determining role in high-stakes areas such as healthcare, where AI prediction models, for example in cancer diagnosis, directly affect patient outcomes. Understandability and transparency are key to improving medical practitioners' trust and decision-making, easing the clinical acceptance of AI systems.

SHapley Additive exPlanations (SHAP) are made easy to understand through beeswarm plots, which visually summarize feature importance over many instances. This research presents SHAP summary plots for the LCP and WBC datasets in Fig. 4. The visualizations show how individual features influence the model's predictions: each dot represents the SHAP value of one feature for one instance, features are ranked by importance, and the colour scale encodes feature values from low to high. In the LCP dataset, the features 'Smoking', 'Chest Pain', and 'Genetic Risk' consistently rank among the most important, matching known risk factors for lung-related conditions. In the WBC dataset, 'area_worst' and 'perimeter_worst' rank highly, matching clinicians' view of tumour characteristics in breast cancer diagnosis. These extensive, expert-verified visualizations support the interpretability of the model.

Fig. 4. Beeswarm Plot Supporting Multistage Feature Selection and Feature Consistency.

Confidence intervals (CIs) matter greatly in cancer detection because they provide statistical ranges within which diagnostic measures such as sensitivity, specificity, and accuracy are expected to lie with 95% confidence. In cancer diagnostic testing, CIs give clinicians valuable information about test performance for making sound decisions, since false positives and false negatives carry critical risks. For example, a diagnostic model reporting 90% sensitivity together with an 85–94% confidence interval is more trustworthy than a single-point estimate.
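The quoted interval can be reproduced with the usual normal approximation for a proportion; the sample size n = 200 below is an illustrative assumption, since the interval width depends on it.

```python
# 95% CI for a proportion-type metric (sensitivity, accuracy, ...) via the
# normal approximation: p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n).
import math

def proportion_ci(p_hat, n, z=1.96):
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

low, high = proportion_ci(0.90, 200)   # 90% sensitivity on n = 200 cases
print(f"95% CI: ({100 * low:.1f}%, {100 * high:.1f}%)")  # 95% CI: (85.8%, 94.2%)
```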

Figures 5 and 6 present the confidence intervals for the accuracy values computed on the WBC dataset with its complete feature set, the selected 9 features, and the 6 features, and on the LCP dataset with all features, 10 features, and 8 features, respectively.

Fig. 5. Confidence interval of Accuracies with all features, 9 features and 6 features (WBC Dataset).

Fig. 6. Confidence interval of Accuracies with all features, 10 features and 8 features (LCP Dataset).

Interpretation of results (WBC dataset, Fig. 5)

LR (Logistic Regression): the confidence interval is (76.0, 80.7), indicating that we can be 95% confident that the true mean accuracy of the LR model lies within this range.
NB (Naive Bayes): the confidence interval is (80.6, 83.8), suggesting a high level of confidence in the model's performance.
SVM (Support Vector Machine): the confidence interval is (95.4, 95.6), indicating very high accuracy with a narrow range.
HT (Hoeffding Tree): the confidence interval is (74.3, 78.2), showing lower performance than the others.
MLP (Multilayer Perceptron): the confidence interval is (79.6, 82.4), indicating good performance.
STACKED: the confidence interval is (100.0, 100.0), indicating perfect accuracy.

Interpretation of results (LCP dataset, Fig. 6)

LR (Logistic Regression): the confidence interval is (75.1, 77.5), indicating that we can be 95% confident that the true mean accuracy of the LR model lies within this range.
NB (Naive Bayes): the confidence interval is (83.8, 85.2), suggesting a high level of confidence in the model's performance.
SVM (Support Vector Machine): the confidence interval is (100.0, 100.0), indicating perfect accuracy.
HT (Hoeffding Tree): the confidence interval is (75.5, 76.5), showing lower performance than the others.
MLP (Multilayer Perceptron): the confidence interval is (84.4, 84.8), indicating good performance.
STACKED: the confidence interval is (100.0, 100.0), indicating perfect accuracy.

Similarly, we computed confidence intervals for the AUC values achieved on the WBC dataset by ROC/AUC analysis on the complete feature set and on the reduced sets of 9 and 6 selected features. The 95% CIs of the AUC values demonstrate consistent performance across the classifiers: Logistic Regression (LR) was 85.83–92.27%, with Naive Bayes (NB) close behind at 85.47–92.11%. The Support Vector Machine (SVM) performed comparatively poorly, with a CI of 78.58–84.03%. The Hoeffding Tree (HT) range was 85.13–91.99%, and the Multi-Layer Perceptron (MLP) showed promising results with a CI of 87.03–91.99%. The Stacked Ensemble outperformed the individual classifiers, giving the highest AUC confidence interval of 89.96–97.10%, which implies strong generalization across the feature subsets.

Cross-validation methodology

Figure 7 reports the classification metrics of stratified 10-fold cross-validation, applied to the training dataset, at each fold. The evaluation provides precision, recall, f1-score, support, and accuracy for classes 0 and 1 in every cross-validation fold. This approach ensures that each data point serves in exactly one validation fold (and in training for the remaining folds), and that every fold preserves the original class distribution through stratification, yielding more trustworthy assessment metrics.
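The per-fold evaluation can be sketched as follows; the classifier and scikit-learn's bundled Wisconsin data are illustrative stand-ins for the paper's full pipeline.

```python
# Stratified 10-fold CV with per-fold, per-class metrics, mirroring the
# procedure behind Fig. 7 (model and data are illustrative assumptions).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for fold, (tr, te) in enumerate(skf.split(X, y), start=1):
    clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    report = classification_report(y[te], clf.predict(X[te]), output_dict=True)
    print(f"fold {fold}: accuracy={report['accuracy']:.3f}, "
          f"class-1 recall={report['1']['recall']:.3f}")
```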

Fig. 7. Comparison of Accuracies in individual folds in 10 fold Cross-Validations.

Key results from the cross-validation

The accuracy scores ranged between 0.84 and 1.00, with seven of the ten folds scoring above 0.98. The 4th validation fold performed worst (accuracy = 0.84), owing to natural distribution variance in the split. The consistently high precision, recall, and F1 scores across the data subsets demonstrate the model's strong generalization ability.

Our proposed hybrid feature selection method was further validated on the Framingham Heart Study dataset, drawn from a different medical domain. This dataset is grossly imbalanced: of its 4240 instances, 644 belong to the positive class and the remaining 3596 to the negative class. The Stacking classifier achieved 92.95% accuracy under 10-fold cross-validation, with an AUC/ROC of 96.3%, sensitivity and specificity of 93%, and a kappa value of 86%. After the first stage of feature selection (which reduced the features to 11), the performance of the Stacking classifier increased dramatically to 98.8% accuracy, 100% AUC/ROC, 99% sensitivity and specificity, and a kappa value of 97%. Performance improved with dimension reduction because irrelevant and redundant features were removed, yielding better generalization and robustness. All primary evaluation metrics verify the effectiveness of the selected features and the power of the proposed model, which remains reliable and consistent, retaining the important information across datasets of different sizes and distributions. The Framingham Heart Study dataset comprises 16 columns (15 attributes and a single outcome variable) and is used to forecast a patient's 10-year CHD risk. The DOI of the dataset is 10.34740/kaggle/dsv/3493583.

Conclusion

This research develops a novel cancer detection system based on an ensemble approach with a two-step Hybrid Filter-Wrapper feature selection framework, applied to the WBC and LCP cancer datasets. The framework consists of three phases.

Phase 1 (Filter-Based Feature Selection): A greedy stepwise search algorithm and a multivariate feature ranking tool minimize data dimensionality while preserving the key features that identify the target class (cancer presence or absence). Feature correlation is reduced during selection to obtain an informative subset of variables.

Phase 2 (Wrapper-Based Feature Refinement): The best-first search technique with logistic regression refines the features selected in Phase 1. This repeated assessment of the selected features improves model performance while identifying the best features for the classification task.

Phase 3 (Classifier Implementation): Six machine learning classifiers are applied, including LR, NB, DT, SVM with a polynomial kernel, and MLP. The Stacked model combines LR, NB, and DT base classifiers with MLP as the meta-classifier.

The robust evaluation procedure uses 10-fold cross-validation and train-test splits of 50/50%, 66/34%, and 80/20%, enabling a detailed assessment of performance for the selected features. The designed stacked model achieves outstanding classification metrics in every test, outclassing the baseline models by substantial margins. Hybrid filter-wrapper feature selection combined with a stacked model improves both diagnostic accuracy and interpretability, and the results show that combining diverse classifiers produces the best outcomes in complex classification problems such as cancer screening. The ensemble stacked model thus proves to be an effective, accurate predictive tool for clinical cancer detection applications.

Future scope

To showcase the practical applications of the model in clinical settings, it should be validated on real-world patient data through collaborations with hospitals and cancer research institutes, ensuring its effectiveness across diverse demographics. Integrating the model into electronic health record systems can assist oncologists in the early detection of cancer. Additionally, pilot studies with healthcare institutions can compare Artificial Intelligence (AI)-assisted screening methods with traditional diagnostic techniques. Furthermore, deploying the model in telemedicine and mobile health applications can enhance access to early screening, especially in under-served areas.

The model proved consistently effective at predicting cancer on the WBC dataset of 569 patients from the UCI Machine Learning Repository and the LCP dataset of 1,000 participants from Kaggle, indicating that it scales appropriately to moderately sized datasets. As the number of records grows, however, scalability may be constrained by increased computational cost in memory and time. Strategies for remaining efficient in low-resource settings, including data preprocessing, model pruning, and cloud computing resources, must therefore be considered. As datasets grow, future studies should optimize deployment approaches that maintain the model's accuracy and applicability in changing real-world environments.

Acknowledgements

Not Applicable.

Author contributions

Sulekha Das: Conceptualization, Writing- Original draft, Software, Investigation, Data Curation, Visualization. Avijit Kumar Chaudhuri: Methodology, Validation, Formal Analysis, Supervision, Writing- Review & Editing. Sayak Das: Supervision. Partha Ghosh: Supervision.

Funding information

Not Applicable.

Data availability

This study uses two datasets, both openly available on the Kaggle website and derived from sources in the public domain: https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data and https://www.kaggle.com/datasets/mysarahmadbhat/lung-cancer.

Declarations

Competing interests

The authors declare no competing interests.

Ethical statement

Although the datasets were obtained from publicly available repositories, we recognized the ethical concerns of potential dataset bias and of privacy in healthcare research. The datasets used herein follow applicable ethical rules, including data anonymization and de-identification, and we took all reasonable measures to mitigate bias in our analysis.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Rauf, F. et al. DenseIncepS115: a novel network-level fusion framework for alzheimer’s disease prediction using MRI images. Front. Oncol.14, 1501742 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Fatima, M. et al. Breast lesion segmentation and classification using U-Net saliency Estimation and explainable residual convolutional neural network. Fractals10, S0218348X24400607 (2024). [Google Scholar]
  • 3.Ertel, W. Introduction To Artificial Intelligence (Springer, 2018).
  • 4.Kalis, B., Collier, M. & Fu, R. 10 promising AI applications in health care. Harvard Business Rev. 2–5 (2018).
  • 5.Wang, H., Zu, Q., Chen, J., Yang, Z. & Ahmed MA. Application of artificial intelligence in acute coronary syndrome: a brief literature review. Adv. Therapy 1–9 (2021). [DOI] [PubMed]
  • 6.Majumder, J., Ghosh, S., Khang, A., Debnath, T. & Chaudhuri, A. K. Hepatitis C. Prediction using feature selection by machine learning technique. In Medical Robotics and AI-Assisted Diagnostics for a High-Tech Healthcare Industry, pp 195–204, IGI Global (2024).
  • 7.Rauf, F. et al. Artificial intelligence assisted common maternal fetal planes prediction from ultrasound images based on information fusion of customized convolutional neural networks. Front. Med.11, 1486995 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Ullah, M. S., Khan, M. A., Albarakati, H. M., Damaševičius, R. & Alsenan, S. Multimodal brain tumor segmentation and classification from MRI scans based on optimized DeepLabV3 + and interpreted networks information fusion empowered with explainable AI. Comput. Biol. Med.182, 109183 (2024). [DOI] [PubMed] [Google Scholar]
  • 9.Yu, C. & Helwig, E. J. The role of AI technology in prediction, diagnosis and treatment of colorectal cancer. Artif. Intell. Rev.55, 323–343 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Kumar, Y., Gupta, S., Singla, R. & Hu Y. C. A systematic review of artificial intelligence techniques in cancer prediction and diagnosis. Arch. Comput. Methods Eng.29, 2043–2070 (2022). [DOI] [PMC free article] [PubMed]
  • 11.McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature577, 89–94 (2020). [DOI] [PubMed]
  • 12.Majumder, A. & Sen D. Artificial intelligence in cancer diagnostics and therapy: current perspectives. Indian J. Cancer58, 481–492 (2021). [DOI] [PubMed]
  • 13.Pantanowitz, L. et al. An artificial intelligence algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies: a blinded clinical validation and deployment study. Lancet Digit. Health2, e407–e416 (2020). [DOI] [PubMed]
  • 14.Feng, H. et al. Identifying malignant breast ultrasound images using ViT-patch. Appl. Sci.13, 3489 (2023).
  • 15.Ray, A., Chen, M. & Gelogo, Y. Performance comparison of different machine learning algorithms for risk prediction and diagnosis of breast cancer. In Smart Technologies in Data Science and Communication: Proceedings of SMART-DSC 2019 71–76 Springer Singapore. (2020).
  • 16.Rana, M., Chandorkar, P., Dsouza, A. & Kazi, N. Breast cancer diagnosis and recurrence prediction using machine learning techniques. Int. J. Res. Eng. Technol.4, 372–376 (2015). [Google Scholar]
  • 17.Kharya, S., Dubey, D. & Soni, S. Predictive machine learning techniques for breast cancer detection. Int. J. Comput. Sci. Inform. Technol.4, 1023–1028 (2013). [Google Scholar]
  • 18.Liu, H. et al. Recent advances in pulse-coupled neural networks with applications in image processing. Electronics11 3264 (2022).
  • 19.Agrawal, S. & Agrawal, J. Neural network techniques for cancer prediction: a survey. Procedia Comput. Sci.60, 769–774 (2015). [Google Scholar]
  • 20.Abdulkareem, S. A. & Abdulkareem Z. O. An evaluation of the Wisconsin breast cancer dataset using ensemble classifiers and RFE feature selection. Int. J. Sci. Basic. Appl. Res.55, 67–80 (2021).
  • 21.Alshayeji, M. H., Ellethy, H. & Gupta R Computer-aided detection of breast cancer on the Wisconsin dataset: an artificial neural networks approach. Biomed. Signal Process. Control. 71, 103141 (2022). [Google Scholar]
  • 22.Benbrahim, H. & Hachimi, H. and Amine A. (Eds.). Comparative study of machine learning algorithms using the breast cancer dataset. In Advanced Intelligent Systems for Sustainable Development (AI2SD’2019) Volume 2-Advanced Intelligent Systems for Sustainable Development Applied To Agriculture and Health, pp 83–91, Springer International Publishing (2020).
  • 23.Hernández-Julio, Y. F., Díaz-Pertuz, L. A., Prieto-Guevara, M. J., Barrios-Barrios, M. A. & Nieto-Bernal, W. Intelligent fuzzy system to predict the Wisconsin breast cancer dataset. Int. J. Environ. Res. Public Health. 20, 5103 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Hossin, M. M. et al. Breast cancer detection: an effective comparison of different machine learning algorithms on the Wisconsin dataset. Bull. Electr. Eng. Inf.12, 2446–2456 (2023). [Google Scholar]
  • 25.Kadhim, R. R. & Kamil, M. Y. Comparison of breast cancer classification models on Wisconsin dataset. International Journal of Reconfigurable and Embedded Systems 2089 4864. (2022).
  • 26.Mohammad, W. T., Teete, R., Al-Aaraj, H., Rubbai, Y. S. Y. & Arabyat, M. M. Diagnosis of breast cancer pathology on the Wisconsin dataset with the help of data mining classification and clustering techniques. Applied Bionics and Biomechanics 2022 6187275. (2022). [DOI] [PMC free article] [PubMed] [Retracted]
  • 27.Naji, M. A. et al. Machine learning algorithms for breast cancer prediction and diagnosis. Procedia Comput. Sci.191, 487–492 (2021).
  • 28.Umami, R. F. & Sarno, R. Analysis of classification algorithm for Wisconsin diagnosis breast cancer data study. In 2020 International Seminar on Application for Technology of Information and Communication (iSemantic) 464–469 IEEE. (2020).
  • 29.Ahmed, S. et al. The deep learning resnet101 and ensemble Xgboost algorithm with hyperparameters optimization accurately predict the lung cancer. Appl. Artif. Intell.37, 2166222 (2023).
  • 30.Alshayeji, M. H. & Abed S.E. Lung cancer classification and identification framework with automatic nodule segmentation screening using machine learning. Appl. Intell.53, 19724–19741 (2023).
  • 31.Al-Tawalbeh, J. et al. Classification of lung cancer by using machine learning algorithms. In 2022 5th International Conference on Engineering Technology and its Applications (IICETA) 528–531 IEEE. (2022).
  • 32.Bharathy, S. & Pavithra, R. Lung cancer detection using machine learning. In 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC) 539–543 IEEE. (2022).
  • 33.Bushara, A. R., Kumar, R. V. & Kumar, S. S. An ensemble method for the detection and classification of lung cancer using computed tomography images utilizing a capsule network with visual geometry group. Biomed. Signal Process. Control. 85, 104930 (2023). [Google Scholar]
  • 34.Ingle, K., Chaskar, U. & Rathod, S. Lung cancer types prediction using machine learning approach. In 2021 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT) 01–06 IEEE. (2021).
  • 35.Lakshmi, S. V., Greeshma, B., Thanooj, M. J., Reddy, K. R. & Rakesh K. R. Lung cancer detection and stage classification using supervised algorithms. Turkish J. Physiotherapy Rehabilitation32, 3 (2021).
  • 36.Nam, Y. J. & Shin W. J. A study on comparison of lung cancer prediction using ensemble machine learning. Korea J. Artif. Intell.7, 19–24 (2019).
  • 37.Tsoi, K. K., Chan, F. C., Hirai, H. W. & Sung JJ Risk of Gastrointestinal bleeding and benefit from colorectal cancer reduction from long-term use of low-dose aspirin: a retrospective study of 612,509 patients. J. Gastroenterol. Hepatol.33, 1728–1736 (2018). [DOI] [PubMed] [Google Scholar]
  • 38.Liao, C. M., Huang, W. H., Kung, P. T., Chiu, L. T. & Tsai, W. C. Comparison of colorectal cancer screening between people with and without disability: a nationwide matched cohort study. BMC Public. Health. 21, 1034 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Chaudhuri, A. K., Banerjee, D. K. & Das A. A dataset-centric feature selection and stacked model to detect breast cancer. Int. J. Intell. Syst. Appl.13, 24 (2021).
  • 40.Chaudhuri, A. K. & Das, S. and Ray A. (Eds.). An improved random forest model for detecting heart disease. In Data-Centric AI Solutions and Emerging Technologies in the Healthcare Ecosystem, pp 143–164, CRC (2024).
  • 41.Safar, A. A., Salih, D. M. & Murshid, A. M. Pattern recognition using the multilayer perceptron (MLP) for medical disease: a survey. Int. J. Nonlinear Anal. Appl.14, 1989–1998 (2023). [Google Scholar]
  • 42.Javaid, S. & Saeed N. Neural networks for infectious diseases detection: prospects and challenges. Authorea Preprints (2023).
  • 43.Macukow, B. (Ed.). Neural networks–state of art, brief history, basic models and architecture. In Computer Information Systems and Industrial Management 3–14 Springer International Publishing (2016).
  • 44.Wang, X. Analysis and optimization for Hoeffding tree (2020).
  • 45.Mathew, T. E. Appositeness of Hoeffding tree models for breast cancer classification. J. Curr. Sci. Technol.12, 391–407 (2022). [Google Scholar]
  • 46.Elbasi, E. & Zreikat A.I. Heart disease classification for early diagnosis based on adaptive Hoeffding tree algorithm in IoMT data. Int. Arab. J. Inform. Technol.20, 38–48 (2023).
  • 47.Chaudhuri, A. K. & Das, A. Variable selection in genetic algorithm model with logistic regression for prediction of progression to diseases. In 2020 IEEE International Conference for Innovation in Technology (INOCON) 1–6 IEEE. (2020).
  • 48.Agarap, A. F. M. On breast cancer detection: an application of machine learning algorithms on the Wisconsin diagnostic dataset. In Proceedings of the 2nd International Conference on Machine Learning and Soft Computing 5–9. (2018).
  • 49.Chaudhuri, A. K., Sinha, D., Banerjee, D. K. & Das A. A novel enhanced decision tree model for detecting chronic kidney disease. Netw. Model. Anal. Health Inf. Bioinf.10, 1–22 (2021).
  • 50.Das, S. et al. A.K. A multifaceted approach to understanding mental health crises in the COVID-19 era: using AI algorithms and feature selection strategies. In AI-Driven Innovations in Digital Healthcare: Emerging Trends, Challenges, and Applications, pp. 97–119 IGI Global. (2024).
  • 51.Kar, S. P. et al. Identification of insecurity in COVID-19 using machine learning techniques. In Medical Robotics and AI- Assisted Diagnostics for a High-Tech Healthcare Industry 239–256 IGI Global. (2024).
  • 52.Cheng, Z. et al. Application of serum SERS technology based on thermally annealed silver nanoparticle composite substrate in breast cancer. Photodiagnosis and Photodynamic Therapy, 41, 103284. doi:https://doi.org/10.1016/j.pdpdt.2023.103284. (2023). [DOI] [PubMed]
  • 53.Zeng, Q. et al. Serum Raman spectroscopy combined with convolutional neural network for rapid diagnosis of HER2-positive and triple-negative breast cancer. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 286, 122000. doi: https://doi.org/10.1016/j.saa.2022.122000. (2023). [DOI] [PubMed]
  • 54.Pu, X., Sheng, S., Fu, Y., Yang, Y. & Xu, G. Construction of circRNA–miRNA–mRNA CeRNA regulatory network and screening of diagnostic targets for tuberculosis. Ann. Med.56 (1), 2416604. 10.1080/07853890.2024.2416604 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Duan, X. et al. A novel robotic bronchoscope system for navigation and biopsy of pulmonary lesions. Cyborg Bionic Syst.4, 0013. 10.34133/cbsystems.0013 (2023). [DOI] [PMC free article] [PubMed]
  • 56.Li, H. et al. UCFNNet: Ulcerative colitis evaluation based on fine-grained lesion learner and noise suppression gating. Computer Methods and Programs in Biomedicine, 247, 108080. doi:https://doi.org/10.1016/j.cmpb.2024.108080. (2024). [DOI] [PubMed]
  • 57.Liu, S. et al. Identification of a lncRNA/circRNA-miRNA-mRNA network in Nasopharyngeal Carcinoma by deep sequencing and bioinformatics analysis. Journal of Cancer, 15(7), 1916–1928. doi: 10.7150/jca.91546. (2024). [DOI] [PMC free article] [PubMed]
  • 58.Li, Y. et al. CircMYBL1 suppressed acquired resistance to osimertinib in non-small-cell lung cancer. Cancer Genetics,284–285, 34–42. doi: https://doi.org/10.1016/j.cancergen.2024.04.001. (2024). [DOI] [PubMed]
  • 59.Chen, S. et al. Evaluation of a three-gene methylation model for correlating lymph node metastasis in postoperative early gastric cancer adjacent samples. Frontiers in Oncology, 14, 1432869. doi: https://doi.org/10.3389/fonc.2024.1432869. (2024). [DOI] [PMC free article] [PubMed]
  • 60.Cao, Z., Zhu, J., Wang, Z., Peng, Y. & Zeng, L. Comprehensive pan-cancer analysis reveals ENC1 as a promising prognostic biomarker for tumor microenvironment and therapeutic responses. Sci. Rep.14 (1), 25331. 10.1038/s41598-024-76798-9 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Zhou, J. et al. Chrysotoxine regulates ferroptosis and the PI3K/AKT/mTOR pathway to prevent cervical cancer. Journal of Ethnopharmacology, 338, 119126. doi: https://doi.org/10.1016/j.jep.2024.119126. (2025). [DOI] [PubMed]
  • 62.Jiang, Z. et al. Low-frequency ultrasound sensitive Piezo1 channels regulate keloid-related characteristics of fibroblasts. Advanced Science, 11(14), 2305489. doi: https://doi.org/10.1002/advs.202305489. (2024). [DOI] [PMC free article] [PubMed]
  • 63.Saber, A., Elbedwehy, S., Awad, W. A. & Hassan, E. An optimized ensemble model based on meta-heuristic algorithms for effective detection and classification of breast tumors. Neural Comput. Appl.37 (6), 4881–4894 (2025). [Google Scholar]
  • 64.Elbedwehy, S., Hassan, E., Saber, A. & Elmonier, R. Integrating neural networks with advanced optimization techniques for accurate kidney disease diagnosis. Sci. Rep.14 (1), 21740 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Alnowaiser, K., Saber, A., Hassan, E. & Awad, W. A. An optimized model based on adaptive convolutional neural network and grey Wolf algorithm for breast cancer diagnosis. PloS One, 19(8), e0304868. (2024). [DOI] [PMC free article] [PubMed]


