Scientific Reports. 2025 Oct 31;15:38124. doi: 10.1038/s41598-025-08865-8

Multistage feature selection and stacked generalization model for cancer detection

Sulekha Das 1, Avijit Kumar Chaudhuri 2, Sayak Das 3, Partha Ghosh 4
PMCID: PMC12579239  PMID: 41173962

Abstract

To address the issue of reliable cancer screening, this study proposes a novel approach that selects key features in conjunction with a stacking classifier. It reduces the number of features required while maintaining the same diagnostic accuracy. The experimental results demonstrate that the proposed method yields superior performance in terms of accuracy, sensitivity, precision, specificity, and AUC on each benchmark dataset. The stacked model, built from Logistic Regression, Naïve Bayes, and Decision Tree base classifiers with a Multilayer Perceptron as meta-classifier, achieves 100% accuracy, sensitivity, specificity, and AUC using the selected optimal feature subsets. The findings confirm that intelligent feature selection improves model performance and makes the models easier to apply to cancer identification.

Keywords: Cancer detection, Feature selection, Stacked generalization, Breast Cancer, Lung Cancer, Hybrid Filter-Wrapper, Stacked classifier

Subject terms: Cancer, Computational biology and bioinformatics, Developmental biology, Evolution

Introduction

In 2020, there were a staggering 18,094,716 reported cases of cancer worldwide (https://www.wcrf.org/cancer-trends/global-cancer-data-by-country/). The age-standardized rate for all cancers, excluding non-melanoma skin cancer, stood at 190 per 100,000 individuals. Notably, this rate was higher among men, at 206.9 per 100,000, than among women, at 178.1 per 100,000 individuals1.

This constant burden involving almost every country in the world is a clear indication of how complex the cancer issue is in the public health domain. It has affected so many people, proving why research should go on. Better treatment and a proper way of preventing the spread of this disease are essential. Remarkably, approximately 40% of cancer cases could be averted by addressing risk factors associated with diet, nutrition, and physical activity.

Healthcare Artificial Intelligence (AI) uses software, especially machine learning algorithms, to diagnose conditions and disseminate and interpret large amounts of health and medical information. AI can generate output depending on the input, making it quite valuable for diagnosis and policy formulation in the medical field2. The primary reason for using AI in healthcare is to analyze how different medical treatments are connected with the patient’s quality of life. Unlike traditional techniques, AI is outstanding in acquiring, evaluating, and making conclusions based on data. This is done using such methods as Deep Learning and Machine Learning (ML)3,4.

Based on data patterns, AI makes its reasoning and increases the chances of making correct predictions and recommendations. These capabilities are currently being leveraged in various areas of healthcare, including:

Diagnostic procedures

Deep learning, in particular, allows AI to read X-rays and Magnetic Resonance Imaging (MRI) scans to detect abnormalities and better assist radiologists in making a diagnosis5,6.

Medication development

It can also accurately predict how various chemicals interact with the intended proteins and accelerate the drug-creation process6.

Personalized medicine

AI can successfully be applied in individual treatment or primary disease prevention depending on genetic factors, additional behaviours, or other factors.

Patient monitoring

Continuous monitoring of patients’ vital signs and other health metrics through devices and applications allows AI to alert healthcare providers to potential health risks as they arise.

Treatment protocol formation

AI can also assimilate massive amounts of clinical data and then coordinate effective treatment plans, bringing the best available care to the patient. Integrating AI into the healthcare system therefore positively affects the quality, speed, and individualization of treatment and has great potential to improve patient health. For instance, in screening, AI systems can estimate the likelihood that coronary artery disease will develop. In gastroenterology, AI is used in endoscopic procedures, such as colonoscopy, to diagnose diseases and abnormalities faster. AI also appears helpful in infectious disease medicine, especially with the novel coronavirus1,2. The host response to the virus can also be determined with neural networks and mass spectrometry. Other uses of AI include detecting antibiotic resistance, diagnosing malaria from blood smears, and enhancing rapid diagnostic tests for Lyme disease. It is also being used for the diagnosis of tuberculosis, meningitis, and sepsis, as well as for the prognosis of complicated treatment in hepatitis C and B6.

Uses of machine learning in cancer prediction

Identifying treatment strategies for individual patients based on molecular, genetic, and tumour characteristics is at the heart of comprehensive cancer therapy and is amenable to AI-based solutions7,8. Oncology has received considerable attention in applying AI, and especially ML, to risk evaluation, diagnosis, drug discovery, and molecular profiling of tumours. These investigations demonstrate that applying ML improves cancer prediction and diagnostic proficiency compared with cognitive-based analysis of pathology micrographs and imaging studies alone, as well as the conversion of images into sequences of numbers. In January 2020, employing the Google DeepMind algorithm, doctors designed a new AI system to detect breast cancer more efficiently9–13. In July 2020, researchers at the University of Pittsburgh developed a new machine learning algorithm with 98% specificity and 98% sensitivity for prostate cancer diagnosis. Another study tested the ViT-Patch model on a public dataset to validate its feasibility in detecting malignancies and locating tumours.

Another study employed ML to categorize cancer information and to diagnose breast cancer with the help of classification models such as SVM classifiers, probabilistic neural networks, and KNN14. The overall test accuracy of the models developed with this classifier was the highest. Rana et al. (2015)15 applied ML classification algorithms, which analyse previous data to predict the categories of new inputs, and reported that, of all the algorithms, the random forest model emerged as the best, with an accuracy of 96% across various cancer detection tasks. This study laid the groundwork for the proposed AI system recommended for implementation14,15.

Another observational study compared the performance of SVM, artificial neural networks, the Naïve Bayes classifier, and AdaBoost tree models for breast cancer prognosis after employing Principal Component Analysis (PCA) for dimensionality reduction. The study concluded that the ANN was the best method for giving accurate real-time predictions and prognoses16–19.

Cancer has become a worrying issue among the many diseases facing humans worldwide. Detection in the early stages is crucial for reducing the overwhelmingly high mortality rates. Therefore, developing a quick, accurate, and understandable machine-learning model is the primary focus of this research. Simplifying the model decreases the computational burden and increases its interpretability. This research investigates a three-phase Hybrid Filter-Wrapper strategy for feature selection combined with a Stacked Classifier. Our evaluation spans the WBC dataset, with 30 features and one outcome variable, and the LCP dataset from the Kaggle Machine Learning repository, with 15 independent features and one dependent feature. In Phase 1, we apply a greedy stepwise search algorithm (as shown in Algorithm 1 and Fig. 1) that selects 9 features for the WBC dataset and 10 for the LCP dataset. The selected features are highly correlated with the class but not with one another. In Phase 2, best-first search combined with a logistic regression algorithm selects 6 features for the BC dataset and 8 for the LC dataset (as shown in Algorithm 2 and Fig. 1). In Phase 3, various classifiers, such as Logistic Regression (LR), Naïve Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM) with a polynomial kernel, Multilayer Perceptron (MLP), and a Stacked model (as depicted in Fig. 1), are used to distinguish patients with or without breast/lung cancer using the selected features. The Stacked model comprises LR, NB, and DT as base classifiers and an MLP as the meta-classifier.
This study uses data splitting, a variety of metrics, and statistical tests together with 10-fold cross-validation to allow a rigorous comparison. In particular, LR, NB, and DT show improved performance metrics when the features are reduced to 6/3. In a 50–50 split, SVM achieves 98.6% accuracy with 30 features for the WBC dataset and 25 features for the LCP dataset. However, MLP and LR with six/three features also cross the 98% threshold. Interestingly, the stacked model achieves 100% accuracy with six/five features in all splits (50–50, 66–34, and 80–20) and in 10-fold cross-validation. These results and feature reduction methods indicate a significant improvement over previous studies in cancer detection. The primary focus of our current study is the model’s performance. The authors also acknowledge the importance of helping clinicians and healthcare professionals understand and trust the model’s decisions. This work incorporates explainability techniques such as SHAP, Local Interpretable Model-agnostic Explanations (LIME), and saliency maps to provide insights into how the stacked model makes predictions. These methods enhance our approach’s clinical relevance and facilitate its integration into healthcare practice.
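As a concrete illustration, the stacked architecture just described can be sketched with scikit-learn. The dataset (scikit-learn's bundled copy of the WBC data), the 50–50 split, and all hyperparameters below are illustrative assumptions rather than the paper's exact settings:

```python
# Sketch of the stacked generalization model: LR, NB, and DT base
# classifiers whose out-of-fold predictions feed an MLP meta-classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42)  # 50-50 split

stack = StackingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=42)),
    ],
    # out-of-fold base predictions become the meta-classifier's inputs
    final_estimator=MLPClassifier(max_iter=2000, random_state=42),
    cv=10,
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```

scikit-learn's `StackingClassifier` generates the meta-classifier's training inputs from cross-validated base predictions, which matches the stacked generalization scheme described here.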

Fig. 1. Cancer Prediction Framework.

Combining stacked generalization with hybrid feature selection has improved prediction quality and produced explainable results in cancer detection tests. Stacked ensemble models connecting multiple base classifiers outperform individual classifiers because they extract distinct advantages from different algorithms. This research confirms their effectiveness: stacked models perform best when identifying breast and lung cancers. Multistage feature selection approaches that amalgamate filter-wrapper and embedded modules achieve better generalizability by cutting down dimensions while retaining essential independent features. Recent studies have demonstrated that these methods work effectively for cancer patients, since optimized feature selection helps physicians make better early cancer diagnoses and predictions of patient outcomes. Our research expands these advancements through a three-phase hybrid feature selection technique and stacked classification approach, which achieves 100% accuracy across various evaluation tests. Incorporating explainability techniques improves clinical adoption by overcoming essential obstacles to AI implementation. The primary challenges this work faces are dealing with inconsistent data and the impact of feature selection approaches on classification success. Performance observations from the WBC and LCP datasets show that the Hybrid Filter-Wrapper approach's model performance depends substantially on its feature reduction process. When the original feature sets were reduced to 9, 10, 6, and 8 features, significant improvements in accuracy, sensitivity, precision, specificity, AUC, and Kappa statistics were observed. The primary achievement of this research is showing that applying ensemble learning methods along with feature selection enables better classification outcomes.
The stacked model produces outstanding results by outperforming individual classifiers in every feature set and train-test split evaluation with 100% accuracy. The research findings confirm the need to optimize feature subsets because they enhance model performance yet demonstrate stacking as the superior ensemble method for achieving reliable classification outcomes.

Relevant literature

Tables 1 and 2 summarize accuracy-related literature research and comparative statistical analysis. They present an extensive review of the methodology while demonstrating the performance boost obtained from the proposed method.

Table 1.

WBC performance Comparison.

| Study | Method | Results (%) |
| --- | --- | --- |
| Abdulkareem & Abdulkareem, 2021 [20] | Random Forest, eXtreme Gradient Boosting | Accuracy 99.02 |
| Alshayeji et al., 2022 [21] | Artificial Neural Network | Accuracy 99.85; Sensitivity 100; Specificity 99.72 |
| Benbrahim et al., 2020 [22] | Naïve Bayes, Logistic Regression, Multilayer Perceptron | Accuracy 97 |
| Hernandez-Julio et al., 2023 [23] | Fuzzy Algorithms | Accuracy 99.3; Sensitivity 98.57; Specificity 99.69; Kappa 98.45 |
| Hossin et al., 2023 [24] | Logistic Regression, Random Forest, K-Nearest Neighbours, Decision Tree, AdaBoost, Support Vector Machine, Gradient Boosting, Gaussian Naïve Bayes | Accuracy 99.12; Sensitivity 97.73; Specificity 100 |
| Kadhim & Kamil, 2022 [25] | Decision Tree, Quadratic Discriminant Analysis, AdaBoost, Bagging meta-estimator, Extra Randomized Trees, Gaussian Process Classifier, Ridge, Gaussian Naïve Bayes, K-Nearest Neighbours, Multilayer Perceptron, Support Vector Classifier | Accuracy 97.7; Sensitivity 95.74; Specificity 98.50 |
| Mohammad et al., 2022 [26] | C4.5 Decision Tree, Artificial Neural Networks, Support Vector Machines, Naïve Bayes multinomial classifier, K-Nearest Neighbours | Accuracy 97.7; Sensitivity 96.9; Kappa 86.78 |
| Naji et al., 2021 [27] | Support Vector Machine, Random Forest, Logistic Regression, Decision Tree (C4.5), K-Nearest Neighbours | Accuracy 97.2; Sensitivity 94 |
| Umami & Sarno, 2020 [28] | Linear Regression, Logistic Regression, Decision Tree, Gradient Boosting Decision Tree, Support Vector Machine, Random Forest, K-Nearest Neighbour | Accuracy 99.4 |

Table 2.

LCP performance Comparison.

| Study | Method | Result (%) |
| --- | --- | --- |
| Ahmed et al., 2023 [29] | K-Nearest Neighbour and Support Vector Machine | Accuracy 99; Sensitivity 99 |
| Alshayeji & Abed, 2023 [30] | XGBoost | Accuracy 99.65; Sensitivity 99.64; Specificity 99.9 |
| Al-Tawalbeh et al., 2022 [31] | KNN, SVM, Naïve Bayes, narrow neural network | Accuracy 92.6 |
| Bharathy & Pavithra, 2022 [32] | Support Vector Machine, K-Nearest Neighbour, Decision Tree, Logistic Regression, Naïve Bayes, Random Forest | Accuracy 88.5 |
| Bushara et al., 2023 [33] | Visual Geometry Group–Capsule Network (VGG-CapsNet) | Accuracy 98.61; Sensitivity 99.07; Specificity 99.07 |
| Ingle et al., 2021 [34] | AdaBoost | Accuracy 90.74; Sensitivity 81.80; Specificity 93.99; Kappa 75.3 |
| Lakshmi et al., 2021 [35] | Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbours, Naïve Bayes | Accuracy 100 |
| Nam & Shin, 2019 [36] | Support Vector Machine, Two-Class Support Decision Jungle, Multiclass Decision Jungle | Accuracy 100; Sensitivity 100 |

Assessment of model performance

This research employs the WBC data from the UCI Machine Learning Online Repository. The collection contains information on 569 patients: 212 diagnosed as “malignant” and 357 as “benign,” an approximate distribution of 37.2% malignant and 62.7% benign samples. It includes 30 real-valued input predictors related to the diagnosis of benign or malignant tumours, summarized in Table 3. Several features of cell-nuclei images obtained from Fine-Needle Aspirates (FNA) of breast masses are quantified. However, looking only at characteristic nuclei values, the mean nucleus radius cannot distinguish between elongated and circular nuclei. The dataset therefore also contains the mean and Standard Error (SE) of the measures, as well as the worst value of each assessed feature, so that even rare anomalies are captured.
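The same 569-sample WDBC dataset ships with scikit-learn, which makes the counts quoted above easy to verify (assuming, as is standard, that the UCI file and scikit-learn's bundled copy are identical):

```python
# Verify the sample counts and feature dimensionality of the WBC data.
from collections import Counter
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
n_samples, n_features = data.data.shape          # 569 patients, 30 predictors
counts = Counter(data.target_names[t] for t in data.target)
print(n_samples, n_features, counts["malignant"], counts["benign"])  # 569 30 212 357
```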

Table 3.

Descriptive statistics of variables (WBC Dataset).

| Feature Name | Description | Minimum Value | Maximum Value |
| --- | --- | --- | --- |
| radius_mean | Mean of distances from the center to points on the perimeter | 6.981 | 28.11 |
| texture_mean | Standard deviation of gray-scale values (roughness of the cell) | 9.71 | 39.28 |
| perimeter_mean | Mean size of the perimeter of the cell nuclei | 43.79 | 188.5 |
| area_mean | Mean area of the cell nuclei | 143.5 | 2501 |
| smoothness_mean | Mean smoothness (local variation in radius lengths) | 0.05263 | 0.1634 |
| compactness_mean | Mean compactness (perimeter²/area − 1.0) | 0.01938 | 0.3454 |
| concavity_mean | Mean severity of concave portions of the contour | 0 | 0.4268 |
| concave points_mean | Mean number of concave points on the contour | 0 | 0.2012 |
| symmetry_mean | Mean symmetry of the cell nuclei | 0.106 | 0.304 |
| fractal_dimension_mean | Mean fractal dimension (‘coastline approximation’ − 1) | 0.04996 | 0.09744 |
| radius_se | Standard error for the radius | 0.1115 | 2.873 |
| texture_se | Standard error for the texture (gray-scale variation) | 0.3602 | 4.885 |
| perimeter_se | Standard error for the perimeter | 0.757 | 21.98 |
| area_se | Standard error for the area | 6.802 | 542.2 |
| smoothness_se | Standard error for smoothness | 0.001713 | 0.03113 |
| compactness_se | Standard error for compactness | 0.002252 | 0.1354 |
| concavity_se | Standard error for concavity | 0 | 0.396 |
| concave points_se | Standard error for concave points | 0 | 0.05279 |
| symmetry_se | Standard error for symmetry | 0.007882 | 0.07895 |
| fractal_dimension_se | Standard error for fractal dimension | 0.000895 | 0.02984 |
| radius_worst | Worst or largest value for the radius | 7.93 | 36.04 |
| texture_worst | Worst or largest value for the texture | 12.02 | 49.54 |
| perimeter_worst | Worst or largest value for the perimeter | 50.41 | 251.2 |
| area_worst | Worst or largest value for the area | 185.2 | 4254 |
| smoothness_worst | Worst or largest value for smoothness | 0.07117 | 0.2226 |
| compactness_worst | Worst or largest value for compactness | 0.02729 | 1.058 |
| concavity_worst | Worst or largest value for concavity | 0 | 1.252 |
| concave points_worst | Worst or largest value for concave points | 0 | 0.291 |
| symmetry_worst | Worst or largest value for symmetry | 0.1565 | 0.6638 |
| fractal_dimension_worst | Worst or largest value for fractal dimension | 0.05504 | 0.2075 |
| Outcome | Diagnosis (0 = benign, 1 = malignant) | 0 | 1 |

The second dataset, sourced from the Kaggle Machine Learning Online Repository, contains 23 features: Age, Gender, Air Pollution, Alcohol Use, Dust Allergy, Occupational Hazards, Genetic Risk, Chronic Lung Disease, Balanced Diet, Obesity, Smoking, Passive Smoker, Chest Pain, Coughing of Blood, Fatigue, Weight Loss, Shortness of Breath, Wheezing, Swallowing Difficulty, Clubbing of Fingernails, Frequent Cold, Dry Cough, and Snoring. These features and their relationships to risk-level classifications are outlined in Table 4. The data reveals significant associations between these features and the risk of lung cancer. Notably, individuals aged 35 to 40 are statistically more likely to develop lung cancer.

Table 4.

Descriptive statistics of variables (LCP Dataset).

| Feature | Count | Mean | Standard deviation | Min | 25% | 50% | 75% | Max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age | 1000 | 37.17 | 12.00 | 14 | 27.75 | – | 45 | 73 |
| Gender | 1000 | 1.40 | 0.49 | 1 | 1 | – | 2 | 2 |
| Air Pollution | 1000 | 3.84 | 2.03 | 1 | 2 | – | 6 | 8 |
| Alcohol use | 1000 | 4.56 | 2.62 | 1 | 2 | – | 7 | 8 |
| Dust Allergy | 1000 | 5.16 | 1.98 | 1 | 4 | 6 | 7 | 8 |
| Occupational Hazards | 1000 | 4.84 | 2.10 | 1 | 3 | 5 | 7 | 8 |
| Genetic Risk | 1000 | 4.58 | 2.12 | 1 | 2 | – | 7 | 7 |
| Chronic Lung Disease | 1000 | 4.38 | 1.84 | 1 | 3 | – | 6 | 7 |
| Balanced Diet | 1000 | 4.49 | 2.13 | 1 | 2 | – | 7 | 7 |
| Obesity | 1000 | 4.46 | 2.12 | 1 | 3 | – | 7 | 7 |
| Smoking | 1000 | 3.94 | 2.49 | 1 | 2 | – | 7 | 8 |
| Passive Smoker | 1000 | 4.19 | 2.31 | 1 | 2 | – | 7 | 8 |
| Chest Pain | 1000 | 4.43 | 2.28 | 1 | 2 | – | 7 | 9 |
| Coughing of Blood | 1000 | 4.85 | 2.42 | 1 | 3 | 4 | 7 | 9 |
| Fatigue | 1000 | 3.85 | 2.24 | 1 | 2 | 3 | 5 | 9 |
| Weight Loss | 1000 | 3.85 | 2.20 | 1 | 2 | 3 | 6 | 8 |
| Shortness of Breath | 1000 | 4.24 | 2.28 | 1 | 2 | 4 | 6 | 9 |
| Wheezing | 1000 | 3.77 | 2.04 | 1 | 2 | 4 | 5 | 8 |
| Swallowing Difficulty | 1000 | 3.74 | 2.27 | 1 | 2 | 4 | 5 | 8 |
| Clubbing of Fingernails | 1000 | 3.92 | 2.38 | 1 | 2 | 4 | 5 | 9 |
| Frequent Cold | 1000 | 3.53 | 1.83 | 1 | 2 | 3 | 5 | 7 |
| Dry Cough | 1000 | 3.85 | 2.03 | 1 | 2 | – | 6 | 7 |
| Snoring | 1000 | 2.92 | 1.47 | 1 | 2 | – | 4 | 7 |

– value not recoverable from the source.

Furthermore, higher levels of air pollution and alcohol consumption correlate with an increased risk of lung cancer. Conversely, lower alcohol consumption is associated with a reduced risk. Shortness of breath, wheezing, clubbing of fingernails, and snoring show a broader range of values corresponding to higher risks. Additional information can be viewed in Table 4.

The classification performance of the models is evaluated using the metrics summarized in Table 5.

Table 5.

Confusion matrix and performance evaluation metrics and statistical tests.

| S/N | Metric | Formula / Description |
| --- | --- | --- |
| 1 | Confusion Matrix | Rows: predicted class; columns: actual class. Predicted malignant, actual malignant: True Positive (TP); predicted malignant, actual benign: False Positive (FP); predicted benign, actual malignant: False Negative (FN); predicted benign, actual benign: True Negative (TN). |
| 2 | Accuracy | (TP + TN) / (TP + TN + FP + FN) |
| 3 | Precision | TP / (TP + FP) |
| 4 | AUC (Area under the curve) | A curve plotted between sensitivity and (1 − specificity) is called the receiver operating characteristic (ROC). AUC measures the degree to which the curve rises toward the north-west corner. |
| 5 | Kappa Statistic | (Pc − Pb) / (1 − Pb), where Pc is the complete agreement probability and Pb represents the likelihood of agreement ‘by chance’. Its range is (−1, 1). |

TP (True Positive) is the number of positive cases correctly classified as positive. FP (False Positive) is the number of negative instances wrongly classified as positive. TN (True Negative) is the number of negative instances correctly classified as negative. FN (False Negative) is the number of positive instances wrongly classified as negative.

Bagging (Bootstrap Aggregating)

Bagging trains multiple similar models using distinct parts of the training data obtained through random sampling with replacement. The final prediction consists of averaging predictions for regression tasks or performing voting for classification tasks.

Boosting

This sequential ensemble technique builds a strong learner by uniting multiple weak learners. Each successive model corrects the errors of its predecessors, devoting special attention to the misclassified training data points.
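AdaBoost is one concrete instance of this reweighting scheme; a brief sketch (dataset and ensemble size are illustrative):

```python
# AdaBoost: weak learners (decision stumps by default) are fitted
# sequentially, and each round up-weights the training points the
# previous round misclassified.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

boost = AdaBoostClassifier(n_estimators=100, random_state=0)
boost.fit(X_tr, y_tr)
boost_acc = boost.score(X_te, y_te)
```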

Cross validation

Cross-validation is an analytical method that assesses how a statistical model’s predictions generalize to independent data. The data is divided into k subsets (folds); models are trained on k − 1 folds and tested on the left-out fold, with each fold serving as the test set once across the k repetitions. This technique guards against overfitting and yields a more reliable assessment of model performance.
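The k-fold procedure can be sketched as follows, here with k = 10 stratified folds and an illustrative scaled logistic regression model:

```python
# 10-fold cross-validation: each fold is held out once while the model
# is refit on the remaining nine; one accuracy score per fold results.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(model, X, y, cv=cv)   # one accuracy per fold
mean_acc = scores.mean()
```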

High precision indicates that the method misclassifies only a few healthy patients as having cancer. Sensitivity is the percentage of actual cancer patients detected correctly; we should aim for high sensitivity and avoid misclassifying any cancer patient as healthy. The error from misclassifying a few healthy patients as having cancer can easily be corrected in further tests. Specificity is the percentage of patients correctly detected as not having cancer. AUC is the area under the Receiver Operating Characteristic (ROC) curve and directly correlates with a classifier’s overall accuracy. It takes values from 0 to 1: 0 implies an entirely inaccurate test, 1 a perfectly accurate test, and 0.5 no discrimination. Values of 0.7–0.8, 0.8–0.9, and 0.9–1 are considered acceptable, excellent, and outstanding, respectively. The Kappa value evaluates the observed accuracy against the expected accuracy (random chance); for example, an observed accuracy of 80% with an expected accuracy of 50% is preferable to an observed accuracy of 75% at the same chance level. A higher Kappa value is sought.
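These metrics can be computed directly from confusion-matrix counts; the labels below are invented toy data (1 = cancer, 0 = healthy), not results from this study:

```python
# Metrics from Table 5 computed from raw TP/FP/TN/FN counts.
from sklearn.metrics import cohen_kappa_score, confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)   # 0.8
sensitivity = tp / (tp + fn)                    # 0.75: 3 of 4 patients found
specificity = tn / (tn + fp)                    # 5 of 6 healthy kept healthy
precision   = tp / (tp + fp)                    # 0.75
kappa = cohen_kappa_score(y_true, y_pred)       # agreement beyond chance
```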

Methodology

The study implements classic machine learning classifiers (LR, SVM, NB, MLP, and HT) alongside a novel hybrid feature selection approach that improves both predictive performance and interpretability. The classifiers were chosen for their suitability in cancer prediction, as each uses a distinct mathematical method to classify patients effectively. A new stacking model serves as a component to enhance computational efficiency in this system. This research applies the classification algorithms to the features chosen by the selection method described in the sections below.

Logistic regression

Logistic regression is one of the most frequently used statistical methods for predicting cancer due to its capacity to model binary response variables, such as the presence or absence of cancer, from predictor variables. Several published papers have shown that it can predict cancer risk using clinical and demographic variables. For instance, the study by Tsoi et al. (2018)37 used logistic regression to determine the relationship between lifestyle factors and breast cancer risk, including age, family history of the disease, and BMI, to mention a few. Another application is predicting lung cancer through the use of logistic regression models that identify high-risk populations by smoking history and genetic indicators (Liao et al., 2020)38.

SVM

This algorithm’s crucial approach is to search for the hyperplane that gives the maximum margin, in a space whose dimensionality grows with the number of features. Although separating classes using two features is relatively simple, it becomes more challenging when working with many features. Increasing the margin makes the prediction outcomes more accurate39.
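A sketch of an SVM with the polynomial kernel used in this study; the C value, degree, split, and feature scaling are assumptions rather than the paper's exact settings:

```python
# Polynomial-kernel SVM on the WBC data (50-50 split, as in the study).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Scaling matters for kernel SVMs: the margin is computed in feature space.
svm = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, C=1.0))
svm.fit(X_tr, y_tr)
svm_acc = svm.score(X_te, y_te)
```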

NB

Naïve Bayes is based on the assumption that every variable has an independent and equivalent effect on the result; that is, the features are assumed to be non-interacting and to have an equal impact on the output40. In real-world applications this assumption may not hold, which can cause errors in the predictions. Some variants, notably Gaussian Naïve Bayes, further assume that the feature distributions are Gaussian and compute the conditional probabilities accordingly.
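A Gaussian Naïve Bayes sketch on the WBC data; the split and scikit-learn defaults are illustrative:

```python
# Each feature is modelled as an independent per-class Gaussian; the
# predicted class maximizes the product of the per-feature likelihoods.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)
nb_acc = nb.score(X_te, y_te)
```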

MLP

The Multilayer Perceptron (MLP) is a significant and widely recognized class of artificial neural networks (ANNs). While the concept dates to the early 1950s, the backpropagation algorithm for training ANNs was developed in the mid-1980s. This discovery boosted the importance of MLPs, which remain among the most popular technologies today41. An MLP is a feedforward neural network. Its architecture comprises three main layers: an input layer, one or more hidden layers, and an output layer. The hidden layers contain nodes with activation functions that define their behaviour. The inputs are fed into the input layer, whose number of nodes equals the number of input variables, and then passed to the first hidden layer42. In each hidden-layer node, every input is weighted, a bias is added, and the results are summed according to a specific formula; an activation function, such as Sigmoid or ReLU, then computes the node output passed to the next layer. The output layer, with an activation function chosen according to the output type, generates the final output. This series of operations is known as forward feeding. The error is then calculated from the difference between the expected target and the actual output; this error rate must be reduced as far as possible. Backpropagation updates the MLP weights according to the error rate of the previous epoch. The MLP is most suitable for problems that cannot be separated linearly. It is widely used for pattern recognition and is vital in predicting and diagnosing diseases43.
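A compact MLP sketch mirroring the forward-feeding and backpropagation steps described above; the single hidden layer of 32 ReLU units is an assumption, not the paper's architecture:

```python
# MLP with one hidden layer, trained by backpropagation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

mlp = make_pipeline(
    StandardScaler(),  # gradient-based training benefits from scaled inputs
    MLPClassifier(hidden_layer_sizes=(32,), activation="relu",
                  max_iter=2000, random_state=0),
)
mlp.fit(X_tr, y_tr)
mlp_acc = mlp.score(X_te, y_te)
```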

HT

Hoeffding trees are decision trees capable of learning from massive data streams, provided that the data distribution does not change over time. They use small samples that are often adequate to choose the optimal splitting attribute. The Hoeffding bound helps achieve this, as it can, within a prescribed precision, quantify the number of observations needed to estimate the suitability of an attribute. Hoeffding Trees have two parts: training and scoring. In training, supervised learning analyses a small sample of data with known outcomes and chooses the attribute for tree-node splitting. The trained tree is referenced in scoring, where each new instance is classified into the relevant class label. Various studies have indicated that HTs demonstrate strong performance44. The HT algorithm compares attributes better than other algorithms and has lower memory consumption and enhanced utilization with data sampling. However, it takes extensive time to inspect whether ties occur45. HTs have been used for classification46 and disease prediction, where they have been shown to perform well. Hoeffding Trees, a category of Decision Trees, are effective at handling nonlinear data. Tree-based classifiers take a nonlinear, hierarchical approach that makes them suitable for non-parametric and categorical data. They offer considerable flexibility in data analysis and reveal the hierarchical structure of the independent variables, which is advantageous for classification.
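The Hoeffding bound at the core of the algorithm is easy to state in code: after n observations of a quantity with range R, the true mean lies within ε of the sample mean with probability 1 − δ:

```python
# Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Deviation epsilon guaranteed with probability 1 - delta after n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# The bound tightens as more stream instances arrive, which is why a small
# sample often suffices to pick the best splitting attribute with confidence.
eps_100 = hoeffding_bound(1.0, 0.05, 100)
eps_10k = hoeffding_bound(1.0, 0.05, 10_000)
```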

Feature selection

Sequential feature selection algorithms are valuable methods in machine learning for dealing with big data. These algorithms are greedy search techniques that reduce the initial d-dimensional feature space to a manageable k-dimensional feature space, where k < d. This dimensionality reduction is essential, as it helps to identify the features valuable for a given problem while leaving out irrelevant and noisy ones. The main reasons for using feature selection algorithms are to improve the model’s quality and to accelerate computation. In high-dimensional datasets, some features are often redundant or irrelevant, slowing the training process and degrading the model’s performance. These algorithms reduce model complexity by choosing a subset of the most informative features, which enables faster computation and reduces the risk of overfitting47. Sequential feature selection can be divided into two main types: forward selection and backward elimination. In forward selection, the algorithm begins with an empty subset and adds, at each step, the feature with the highest correlation to the target variable until k features are reached. Backward elimination starts with all features and successively excludes the feature of least importance until k features are retained. Both methods rank features according to a criterion of choice, which can be accuracy, AUC, or another measure48,49. Another significant advantage of sequential feature selection is its simplicity compared with other methods; it can be implemented easily. However, it is inherently greedy: at each step it makes the locally optimal choice in the hope of finding a global optimum. This can sometimes lead to suboptimal feature subsets because the algorithm does not reconsider its earlier choices.
However, sequential feature selection is still widely employed due to its simplicity and computational efficiency50,51.

Wrapper feature selection is one of the most commonly used methods for choosing the best feature subset when constructing a model. It assesses different feature subsets by training and testing a model on each subset and then picks the one that provides the best result. Best-first search is the algorithm employed in this context to search the space of all possible feature subsets52–55.
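scikit-learn offers a greedy sequential wrapper rather than best-first search, so the following is only an approximation of the wrapper idea described here; the target of 6 features mirrors the WBC subset size, and the scaling and 3-fold CV are assumptions made to keep the sketch fast:

```python
# Greedy forward selection wrapped around a logistic regression scorer.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=500),
    n_features_to_select=6,   # stop once 6 features are chosen
    direction="forward",      # grow the subset from the empty set
    cv=3,                     # each candidate subset is scored by CV accuracy
)
sfs.fit(X_scaled, y)
mask = sfs.get_support()      # boolean mask over the 30 original features
```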

This paper proposes a hybrid Filter-Wrapper feature selection method. In Phase 1 it employs a filter-based approach, in which a Multivariate Feature Evaluator is combined with the greedy stepwise search algorithm. The goal of this stage is to quickly find a decent approximation of the essential features at low computational cost. In particular, features are ordered by their calculated relevance weights and are deleted or added iteratively according to the greedy criterion to maximize the evaluator’s score.

In Phase 2, the features identified in Phase 1 form the feature subspace for a wrapper-based approach. Because Phase 1 has already reduced the feature space, the computational effort of the wrapper in Phase 2 is much lower. The method selects features iteratively; its greedy nature keeps the computational cost low and makes it well suited to problems with many features. Here, the best-first search technique is applied in conjunction with a Logistic Regression classifier to evaluate candidate feature sets, iteratively searching for the combination that optimizes the predictor's performance at modest computational cost. Feature selection is encapsulated within the cross-validation framework: both Phase 1 and Phase 2 are performed separately in each fold of the 10-fold cross-validation, so no information leaks from the test fold into the feature selection process.
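To keep selection leakage-free as just described, the whole selector chain can live inside a cross-validation pipeline so that it is refitted on the training portion of every fold. In the sketch below, `SelectKBest` stands in for the Phase 1 filter and `RFE` with Logistic Regression for the Phase 2 wrapper; these are illustrative substitutes, not the exact evaluators used in the paper.

```python
# Feature selection nested inside each CV fold (no train/test leakage):
# the Pipeline refits both selection phases on every fold's training data.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("phase1_filter", SelectKBest(f_classif, k=9)),           # cheap filter pass
    ("phase2_wrapper", RFE(LogisticRegression(max_iter=1000),
                           n_features_to_select=6)),          # wrapper refinement
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=10)   # selection repeated in each fold
print("mean accuracy:", round(scores.mean(), 3))
```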

Phase 3 involves training different types of classifiers, including ensemble and stacked generalization models, on the final feature set to classify patients correctly. The stability of the entire pipeline is assessed with 10-fold cross-validation, and training and test partitions are kept strictly separate to ensure unbiased evaluation.

The experiments, which compare predictions obtained with

  1. all features,

  2. the feature subset obtained in Phase 1, and

  3. the feature subset obtained in Phase 2,

demonstrate the efficiency and effectiveness of the proposed hybrid feature selection method.


The Multistage Feature Selection algorithm is illustrated in Algorithm 1 and Fig. 2 below.

Fig. 2. Hybrid Filter-Wrapper Feature Selection.

Algorithm 1: multistage feature selection


Stacking

Bagging and boosting use homogeneous weak learners in the ensemble, while stacking often uses heterogeneous ones56–59. These weak learners are trained in parallel, and the final decision is made by a trained meta-learner. The meta-learner uses the predictions of the weak learners as input features and the dataset's targets as outputs (as illustrated in Fig. 3 and Algorithm 2). It learns how best to combine the input predictions to produce an improved final prediction.

Fig. 3. Proposed Stacking Model.

In a method like Random Forest, the model averages the decisions of the individually trained trees. The limitation of this approach is that it assigns equal weight to each model irrespective of its accuracy. A better approach is the weighted average ensemble, which weights individual models by their predictive accuracy; this usually improves on simple averaging60–62.

Stacking takes this idea one step further and replaces the linear weighted sum with another learning algorithm, such as Linear Regression for regression problems or Logistic Regression for classification problems. This combines the outputs of the sub-models more flexibly, and any learning algorithm can be used at the final combining stage. Stacking uses the predictions of the sub-models as input and learns how to combine them for a better overall prediction63–65.
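One way to realize the stacked design used in this paper (LR, NB, and a Decision Tree as base learners with an MLP meta-classifier) is scikit-learn's `StackingClassifier`. The hyperparameters and the use of the bundled Wisconsin data below are illustrative assumptions, not the paper's exact setup.

```python
# Stacking sketch: out-of-fold predictions of heterogeneous base learners
# become the training inputs of an MLP meta-learner.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scaled up front for simplicity

stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=MLPClassifier(max_iter=1000, random_state=0),
    cv=5,   # internal folds generate the base learners' out-of-fold predictions
)

scores = cross_val_score(stack, X, y, cv=3)
print("mean accuracy:", round(scores.mean(), 3))
```

The internal `cv` parameter matters: the meta-learner is trained on out-of-fold base predictions, so it never sees predictions the base learners made on their own training data.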

Algorithm 2: stacking with k-fold cross-validation


Results and discussion

The experiments were conducted in Python in the Google Colab environment. The features selected in Phase 1 and Phase 2 of the Hybrid Filter-Wrapper approach are listed in Tables 6 and 7.

Table 6.

Features selected using hybrid Filter-Wrapper approach (WBC Dataset).

| Approach | Selected features |
| --- | --- |
| Phase 1: Greedy stepwise & Filter (9 features) | perimeter_mean, area_mean, compactness_mean, symmetry_mean, fractal_dimension_mean, area_se, concavity_se, texture_worst, area_worst |
| Phase 2: Best-first & Wrapper (6 features) | compactness_mean, symmetry_mean, fractal_dimension_mean, area_se, concavity_se, area_worst |

Table 7.

Features selected using hybrid Filter-Wrapper approach (LCP Dataset).

| Approach | Selected features |
| --- | --- |
| Phase 1: Greedy stepwise & Filter (10 features) | Genetic Risk, Balanced Diet, Smoking, Chest Pain, Fatigue, Weight Loss, Wheezing, Swallowing Difficulty, Dry Cough, Snoring |
| Phase 2: Best-first & Wrapper (6 features) | compactness_mean, symmetry_mean, fractal_dimension_mean, area_se, concavity_se, area_worst |

Three experimental set-ups are used:

  1. the complete dataset with all 30 (WBC dataset) / 16 (LCP dataset) features,

  2. the dataset with only the 9 (WBC) / 10 (LCP) features obtained in Phase 1, and

  3. the dataset with only the 6 (WBC) / 8 (LCP) features obtained in Phase 2.

A comparative analysis of LR, NB, SVM, HT, MLP, and the Stacked model was performed in the three settings using 50–50, 66–34, and 80–20 train-test splits, and the results were validated with 10-fold cross-validation. The classification accuracy, sensitivity, precision, and specificity values are provided in Tables 8 and 9, Tables 10 and 11, Tables 12 and 13, and Tables 14 and 15, respectively. The AUC/ROC and kappa values are provided in Tables 16, 17, 18 and 19, respectively. All results were obtained in Python and are expressed as percentages. In each cell of the tables, the values of the performance metric for all features, the Phase 1 subset, and the Phase 2 subset are separated by commas.
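For reference, all of the reported metrics follow directly from a confusion matrix. The snippet below shows the computation on a toy 80–20 split of scikit-learn's bundled Wisconsin data with Logistic Regression, so the numbers it prints are illustrative, not those of the tables.

```python
# Deriving accuracy, sensitivity, precision, specificity, AUC and kappa
# from one train-test split (illustrative model and split, not the paper's).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)                 # a.k.a. recall
precision   = tp / (tp + fp)
specificity = tn / (tn + fp)
auc   = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
kappa = cohen_kappa_score(y_te, pred)

for name, m in [("accuracy", accuracy), ("sensitivity", sensitivity),
                ("precision", precision), ("specificity", specificity),
                ("AUC", auc), ("kappa", kappa)]:
    print(f"{name}: {100 * m:.1f}")
```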

Table 8.

Comparison of accuracies with all features, 9 features and 6 features (WBC Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 76, 80.6, 95.4 | 74.3, 80.6, 95.4 | 76.4, 79.5, 95.7 | 74.3, 80.6, 95.4 | 75, 86.9, 96.8 | 79.2, 87.6, 100 |
| 66–34 | 79.8, 83.9, 95.3 | 78.2, 80.8, 92.2 | 77.7, 82.9, 96.9 | 78.2, 80.8, 92.2 | 77.7, 91.7, 98.9 | 81.3, 91.7, 100 |
| 80–20 | 80.7, 82.4, 95.6 | 78, 82.4, 95.6 | 82.4, 80.7, 95.6 | 78, 82.4, 95.6 | 80.7, 91.2, 98.2 | 79.8, 90.3, 100 |
| 10-fold CV | 76.9, 80.6, 95.6 | 77.1, 79.6, 95.4 | 76.4, 80.1, 96.1 | 77.1, 78.7, 94.9 | 80.5, 91.9, 97.5 | 79.4, 90.7, 100 |

Table 9.

Comparison of accuracies with all features, 10 features and 8 features (LCP Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 75.8, 84.8, 100 | 76.2, 84.8, 100 | 75.3, 84.4, 100 | 76.2, 84.6, 100 | 80.3, 97.9, 100 | 87, 96, 100 |
| 66–34 | 77.2, 83.6, 100 | 76.4, 84.2, 100 | 76.1, 82.7, 100 | 76.4, 82.7, 100 | 84.5, 98.4, 100 | 92, 99, 100 |
| 80–20 | 75.1, 84, 100 | 76.1, 84.6, 100 | 75.6, 83.2, 100 | 76.1, 84, 100 | 84.7, 98.5, 100 | 90, 97, 100 |
| 10-fold CV | 75.1, 84.8, 100 | 74.6, 84.6, 100 | 74.2, 84.6, 100 | 74.6, 84.8, 100 | 86.4, 99.2, 100 | 91, 99, 100 |

Table 10.

Comparison of sensitivity with all features, 9 features and 6 features (WBC Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 76.1, 80.6, 95.4 | 74.3, 80.6, 95.4 | 76.4, 79.6, 95.8 | 74.3, 80.6, 95.4 | 75, 87, 96.8 | 79.2, 87.7, 100 |
| 66–34 | 79.8, 83.9, 95.3 | 78.2, 80.8, 92.2 | 77.7, 82.9, 96.9 | 78.2, 80.8, 92.2 | 77.7, 91.7, 99 | 81.3, 91.7, 100 |
| 80–20 | 80.7, 82.5, 95.6 | 78.1, 82.5, 95.6 | 82.5, 80.7, 95.6 | 78.1, 82.5, 95.6 | 80.7, 91.2, 98.2 | 79.8, 90.4, 100 |
| 10-fold CV | 77, 80.7, 95.6 | 77.2, 79.6, 95.4 | 76.4, 80.1, 96.1 | 77.2, 78.7, 94.9 | 80.5, 91.9, 97.5 | 79.4, 90.7, 100 |

Table 11.

Comparison of Sensitivity(recall) with all features, 10 features and 8 features (LCP Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 75.9, 84.9, 100 | 76.2, 84.8, 100 | 75.3, 84.5, 100 | 76.2, 84.6, 100 | 80.3, 97.9, 100 | 87, 96, 100 |
| 66–34 | 77.3, 83.6, 100 | 76.5, 84.2, 100 | 76.2, 82.7, 100 | 76.5, 82.7, 100 | 84.5, 98.4, 100 | 92, 99, 100 |
| 80–20 | 75.2, 84, 100 | 76.2, 84.7, 100 | 75.7, 83.2, 100 | 76.2, 84, 100 | 84.7, 98.5, 100 | 90, 97, 100 |
| 10-fold CV | 75.1, 84.8, 100 | 74.6, 84.6, 100 | 74.2, 84.7, 100 | 74.6, 84.9, 100 | 86.4, 99.2, 100 | 91, 99, 100 |

Table 12.

Comparison of precision with all features, 9 features and 6 features (WBC Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 76.7, 80.7, 95.3 | 74.4, 80.9, 95.7 | 76.7, 80, 96 | 74.4, 80.9, 95.7 | 75.1, 87.1, 96.8 | 79.3, 87.8, 100 |
| 66–34 | 79.8, 84.1, 95.9 | 78.2, 80.8, 92.9 | 77.8, 83.5, 97 | 78.2, 80.8, 92.9 | 77.8, 91.8, 99 | 81.3, 91.9, 100 |
| 80–20 | 80.7, 82.5, 95.8 | 78.1, 82.5, 95.8 | 82.5, 80.9, 95.8 | 78.1, 82.5, 95.8 | 80.7, 91.3, 98.3 | 83.6, 90.5, 100 |
| 10-fold CV | 76.9, 80.7, 95.5 | 77.2, 79.6, 95.7 | 76.4, 80.2, 96.3 | 77.2, 78.8, 95 | 80.5, 92, 97.6 | 79.4, 91, 100 |

Table 13.

Comparison of precision with all features, 10 features and 8 features (LCP Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 75.9, 84.6, 100 | 76.2, 84.2, 100 | 75.4, 84.1, 100 | 76.2, 84.3, 100 | 80.5, 97.9, 100 | 88, 96, 100 |
| 66–34 | 78.3, 83.5, 100 | 76.5, 83.7, 100 | 76.2, 82.4, 100 | 76.5, 82.4, 100 | 84.5, 98.4, 100 | 92, 99, 100 |
| 80–20 | 75.1, 83.7, 100 | 76.1, 84.2, 100 | 75.7, 82.7, 100 | 76.1, 83.5, 100 | 84.7, 98.5, 100 | 90, 97, 100 |
| 10-fold CV | 75.1, 84.6, 100 | 74.6, 84.1, 100 | 74.2, 84.4, 100 | 74.6, 84.4, 100 | 86.4, 99.2, 100 | 91, 99, 100 |

Table 14.

Comparison of specificity with all features, 9 features and 6 features (WBC Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 70.6, 82.9, 96.7 | 70.4, 84.4, 94.9 | 72, 84, 95.3 | 70.4, 84.4, 94.9 | 71.5, 89.1, 96.8 | 80.3, 85.8, 100 |
| 66–34 | 78.4, 82.4, 98.7 | 77, 81.5, 91.6 | 75, 80, 96.5 | 77, 81.5, 91.6 | 75, 90.8, 98.8 | 79.7, 89.3, 100 |
| 80–20 | 81.8, 81.9, 95.3 | 77.6, 81.9, 95.3 | 81.3, 79.7, 95.3 | 77.6, 81.9, 95.3 | 79.6, 93.7, 98 | 72.3, 88.7, 100 |
| 10-fold CV | 76, 80.6, 96.1 | 77.2, 80, 94.8 | 74.8, 79.8, 95.6 | 77.3, 77.7, 94.6 | 78, 90.2, 97.3 | 77.7, 87.5, 100 |
100

Table 15.

Comparison of specificity with all features, 10 features and 8 features (LCP Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 74.6, 71, 100 | 75, 73.7, 100 | 76.6, 70.6, 100 | 75, 70.7, 100 | 77.6, 95.7, 100 | 90, 92, 100 |
| 66–34 | 76, 69.4, 100 | 75.3, 73.1, 100 | 75, 68.4, 100 | 75.3, 68.4, 100 | 82.9, 96.3, 100 | 91, 98, 100 |
| 80–20 | 73.8, 72.3, 100 | 74.8, 76.4, 100 | 75.5, 72, 100 | 74.8, 78.8, 100 | 84, 99.3, 100 | 86, 97, 100 |
| 10-fold CV | 74.7, 70.8, 100 | 74.3, 73, 100 | 72.8, 71.2, 100 | 74.3, 73, 100 | 85.9, 99.3, 100 | 91, 98, 100 |

Table 16.

Comparison of AUC (ROC area) value with all features, 9 features and 6 features (WBC Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 84.9, 86.9, 93.4 | 83.2, 87.3, 93.1 | 76.5, 79.8, 85 | 83.2, 87.3, 93.1 | 82.7, 88.8, 88.4 | 84.8, 92.8, 100 |
| 66–34 | 86.3, 86.5, 97.2 | 85.2, 86.4, 97.9 | 77.7, 82.1, 89.3 | 85.2, 86.4, 97.9 | 85.5, 92.1, 95.8 | 88, 94.5, 100 |
| 80–20 | 88.6, 82.2, 95.8 | 87.7, 82.9, 95.4 | 82.5, 79.1, 79.2 | 87.7, 82.9, 95.4 | 86.1, 95.3, 90.4 | 89.3, 94.7, 100 |
| 10-fold CV | 85.3, 85.8, 95.7 | 85.3, 85.6, 95.5 | 76.2, 80.1, 88.2 | 85.3, 83, 95.3 | 86.1, 94.5, 90.6 | 86.5, 91.8, 100 |

Table 17.

Comparison of AUC (ROC area) value with all features, 10 features and 8 features (LCP Dataset).

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 83.9, 91.5, 100 | 83.5, 91, 100 | 75, 78.2, 100 | 83.5, 91.7, 100 | 87.8, 98.2, 100 | 93, 99, 100 |
| 66–34 | 84.3, 90.9, 100 | 84.3, 90.7, 100 | 76, 76.7, 100 | 84.3, 87.7, 100 | 89.4, 98.5, 100 | 95, 99, 100 |
| 80–20 | 82.9, 91.2, 100 | 83.2, 91, 100 | 75, 76.8, 100 | 83.2, 90, 100 | 89.9, 98.1, 100 | 94, 96, 100 |
| 10-fold CV | 83.3, 91.7, 100 | 83.1, 91.6, 100 | 74.2, 78.6, 100 | 83.1, 91.2, 100 | 92.2, 98.7, 100 | 95, 99, 100 |

Table 18.

Comparison of kappa statistics with all features, 9 features and 6 features.

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 52.1, 61.1, 80.4 | 48.2, 61.3, 78.1 | 52.6, 59.2, 80 | 48.2, 61.3, 78.1 | 49.5, 73.8, 85.8 | 57.5, 75.1, 100 |
| 66–34 | 59.3, 67.3, 82.5 | 56.1, 61.2, 59.7 | 55.3, 65, 86.2 | 56.1, 61.2, 59.7 | 55.3, 83.2, 95.7 | 62.4, 83.1, 100 |
| 80–20 | 61.4, 63.3, 71.4 | 56.1, 63.4, 71.4 | 64.9, 59.5, 71.5 | 56.1, 63.4, 71.5 | 61.4, 82.1, 89.9 | 59.6, 79.9, 100 |
| 10-fold CV | 53.5, 61.3, 82.9 | 53.7, 59.2, 81.2 | 52.5, 60.2, 84.4 | 53.7, 57.4, 79.1 | 60.8, 83.8, 90.5 | 58.6, 81.3, 100 |

Table 19.

Comparison of kappa statistic with all features, 10 features and 8 features.

| Train-Test Split | LR | NB | SVM | HT | MLP | STACKED |
| --- | --- | --- | --- | --- | --- | --- |
| 50–50 | 51.5, 59.1, 100 | 52.2, 57.3, 100 | 50.2, 57.8, 100 | 52.2, 58.3, 100 | 60.6, 94.5, 100 | 75, 90, 100 |
| 66–34 | 54.3, 57.2, 100 | 52.7, 57.3, 100 | 52.1, 54.4, 100 | 52.7, 54.4, 100 | 68.9, 95.9, 100 | 83, 97, 100 |
| 80–20 | 49.6, 58.7, 100 | 51.6, 59.2, 100 | 50.4, 55.8, 100 | 51.6, 55.8, 100 | 68.9, 96.2, 100 | 80, 92, 100 |
| 10-fold CV | 50.1, 59.3, 100 | 49, 57.3, 100 | 48.3, 58.5, 100 | 49, 58.3, 100 | 72.7, 97.9, 100 | 82, 98, 100 |

Analysis of accuracy: The STACKED model is the most accurate, achieving 100% accuracy in all splits and in cross-validation, which indicates that the stacking ensemble is superior to the individual models on the WBC and LCP datasets. The MLP classifier also achieves high accuracy across all splits, especially the 80–20 and 66–34 splits. The SVM classifier performed notably well, particularly in the 80–20 split. The average accuracies of LR, NB, and HT are lower than those of SVM, MLP, and the STACKED model. Overall, the STACKED model yields excellent results and in many cases reaches 100% accuracy on both datasets.

Tables 10 and 11 present a comparison of the sensitivities of the different classification models for three feature sets (all features, 9/10 features, and 6/8 features for the WBC/LCP datasets), train-test split ratios of 50–50, 66–34, and 80–20%, and 10-fold cross-validation. As the number of features decreases from all features to 9 or 10 and then to 6 or 8, sensitivity increases, especially for the stacked model. The stacked model generally yields the highest results, with sensitivity approaching 100% in most cases, and its performance is reasonably stable across the different train-test splits.

Table 11 presents the sensitivity of the same models on the LCP dataset, comparing all features, 10 features, and 8 features under the same train-test splits and 10-fold CV. As with the WBC dataset, when the number of features decreases, sensitivity is either maintained at a high level or improves slightly. The stacked model achieves 100% sensitivity across all feature subsets and train-test splits, making it the most accurate model in this setting. The MLP also performs strongly, mainly when few features are used; in most instances its sensitivity is close to 100%. Overall, the stacked model has the highest sensitivity across all feature sets and train-test splits for both datasets.

In several cases sensitivity reaches 100%, which shows that the model reliably identifies actual positives on both feature sets. Decreasing the number of features does not always hurt sensitivity and can even improve it for the MLP and stacked models. Notably, model performance is comparable under cross-validation and under the different train-test partitions.

Tables 12 and 13 present the precision of the classification models for the different feature combinations (all features, 9 or 10 features, and 6 or 8 features) under the three train-test split ratios (50–50%, 66–34%, and 80–20%) and 10-fold cross-validation. Overall, precision tends to be lowest when all features are used. On the WBC dataset, models using the 9 or 6 selected features achieve higher precision than those using all features, with a marked improvement for the MLP and stacked models. The highest precision is recorded when the optimal 6 or 8 features are selected, particularly for the stacked model, which reaches 100% precision in most of the train-test split ratios.

Analysing precision for both datasets across the classification models, precision increases as the number of features decreases, especially for the stacked and MLP models. The stacked model is consistently the most precise and sometimes achieves 100% precision, demonstrating its soundness. The results indicate that reducing the features improves precision, with the best results obtained for 6 or 8 features, depending on the dataset.

The specificity analysis in Tables 14 and 15 examines the specificity of the models (LR, NB, SVM, HT, MLP, and STACKED) on the two datasets (WBC and LCP) for the different feature sets and train-test splits. The proposed STACKED model shows high specificity on the WBC dataset across all feature sets and splits, reaching 100% with the optimal 6 features. On the LCP dataset, all classifiers, including STACKED, reach 100% specificity in every split with the minimal feature set, confirming the optimal outcome on both datasets.

The AUC (ROC area) values in Tables 16, 17, 20 and 21 indicate that the stacked model gives the best overall results, with perfect scores of 100% in all cases. For the other models, the improvement from reducing features is less consistent across train-test splits (for instance for LR, NB, and MLP), whereas it is consistent for the stacked model. SVM in particular becomes unstable with few features, and its AUC drops in the 80–20% split. In Table 17 the stacked model again produces perfect scores in all configurations, and for the remaining models decreasing the number of input features tends to raise the AUC. The stability of the AUC on the WBC and LCP datasets shows that the stacked model is sound across all feature sets and splits. In summary, the stacked model achieves the highest AUC/ROC on both datasets, although the effect of feature reduction varies across the classification algorithms studied.

Table 20.

Comparison of ROC value with all features, 9 features and 6 features (WBC Dataset) for 10 fold cross-validations for stacked model.

ROC = 86.5 (all features), ROC = 91.8 (9 features), ROC = 100 (6 features).

Table 21.

Comparison of AUC value with all features, 10 features and 8 features (LCP Dataset) for 10 fold cross-validations for stacked model.

ROC = 95 (all features), ROC = 99 (10 features), ROC = 100 (8 features).

As the results in Tables 18 and 19 demonstrate, the STACKED and MLP models outperform the other models irrespective of train-test split and feature set, achieving 100% kappa with the reduced feature sets (6 or 8 features). As both tables show, the performance of these models improves as the number of features decreases, which demonstrates their stability and efficiency with a small but adequate number of features. Other models, such as LR, NB, and SVM, show more fluctuating and generally lower kappa values, with only a slight boost from feature elimination, and remain less optimal than STACKED or MLP. This analysis of the impact of feature selection and model choice confirms that STACKED and MLP are the most accurate methods when used with limited features.

The hybrid feature selection method proved computationally efficient and practical, striking a good balance between precision, sensitivity, specificity, kappa, AUC, and overall accuracy. Its iterative approach, evaluated with a complete set of performance metrics, supports dependable cancer diagnosis while minimizing misclassifications and maximizing clinical utility.

Model interpretability plays a determining role in high-stakes areas such as healthcare, where AI prediction models, for example in cancer diagnosis, directly affect patient outcomes. Understandability and transparency are key to improving medical practitioners' trust and decision-making, easing the clinical acceptance of AI systems.

SHapley Additive exPlanations (SHAP) are made easy to understand through beeswarm plots, which visually summarize feature importance over many instances. This research presents SHAP summary plots for the LCP and WBC datasets in Fig. 4. The visualizations show how individual features influence the model's predictions: each dot represents the SHAP value of one feature for one instance, features are ranked by importance, and the colour scale encodes feature values from low to high. In the LCP dataset, the features 'Smoking', 'Chest Pain', and 'Genetic Risk' consistently rank among the most important, matching known risk factors for lung-related conditions. In the WBC dataset, 'area_worst' and 'perimeter_worst' rank highly, matching clinicians' view of tumour characteristics in breast cancer diagnosis. These extensive, expert-verified visualizations support the interpretability of the model.

Fig. 4. Beeswarm Plot Supporting Multistage Feature Selection and Feature Consistency.

Confidence intervals (CIs) matter greatly in cancer detection because they provide statistical ranges within which diagnostic measures such as sensitivity, specificity, and accuracy are expected to lie with 95% confidence. In cancer diagnostic testing, CIs give clinicians valuable information about test performance for making sound decisions, since false positives and false negatives carry critical risks. For example, a diagnostic model reporting 90% sensitivity together with an 85–94% confidence interval is more trustworthy than a single-point estimate.
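The quoted interval can be reproduced with the usual normal approximation for a proportion; the sample size n = 200 below is an illustrative assumption, since the interval width depends on it.

```python
# 95% CI for a proportion-type metric (sensitivity, accuracy, ...) via the
# normal approximation: p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n).
import math

def proportion_ci(p_hat, n, z=1.96):
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

low, high = proportion_ci(0.90, 200)   # 90% sensitivity on n = 200 cases
print(f"95% CI: ({100 * low:.1f}%, {100 * high:.1f}%)")  # 95% CI: (85.8%, 94.2%)
```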

Figures 5 and 6 present the confidence intervals for the accuracy values computed on the WBC dataset with its complete feature set, the selected 9 features, and the 6 features, and on the LCP dataset with all features, 10 features, and 8 features, respectively.

Fig. 5. Confidence interval of Accuracies with all features, 9 features and 6 features (WBC Dataset).

Fig. 6. Confidence interval of Accuracies with all features, 10 features and 8 features (LCP Dataset).

Interpretation of results (WBC dataset, Fig. 5)

LR (Logistic Regression): the confidence interval is (76.0, 80.7), indicating that we can be 95% confident that the true mean accuracy of the LR model lies within this range.
NB (Naive Bayes): the confidence interval is (80.6, 83.8), suggesting a high level of confidence in the model's performance.
SVM (Support Vector Machine): the confidence interval is (95.4, 95.6), indicating very high accuracy with a narrow range.
HT (Hoeffding Tree): the confidence interval is (74.3, 78.2), showing lower performance than the others.
MLP (Multilayer Perceptron): the confidence interval is (79.6, 82.4), indicating good performance.
STACKED: the confidence interval is (100.0, 100.0), indicating perfect accuracy.

Interpretation of results (LCP dataset, Fig. 6)

LR (Logistic Regression): the confidence interval is (75.1, 77.5), indicating that we can be 95% confident that the true mean accuracy of the LR model lies within this range.
NB (Naive Bayes): the confidence interval is (83.8, 85.2), suggesting a high level of confidence in the model's performance.
SVM (Support Vector Machine): the confidence interval is (100.0, 100.0), indicating perfect accuracy.
HT (Hoeffding Tree): the confidence interval is (75.5, 76.5), showing lower performance than the others.
MLP (Multilayer Perceptron): the confidence interval is (84.4, 84.8), indicating good performance.
STACKED: the confidence interval is (100.0, 100.0), indicating perfect accuracy.

Similarly, we computed confidence intervals for the AUC values achieved on the WBC dataset by ROC/AUC analysis on the complete feature set and on the reduced sets of 9 and 6 selected features. The 95% CIs of the AUC values demonstrate consistent performance across the classifiers: Logistic Regression (LR) was 85.83–92.27%, with Naive Bayes (NB) close behind at 85.47–92.11%. The Support Vector Machine (SVM) performed comparatively poorly, with a CI of 78.58–84.03%. The Hoeffding Tree (HT) range was 85.13–91.99%, and the Multi-Layer Perceptron (MLP) showed promising results with a CI of 87.03–91.99%. The Stacked Ensemble outperformed the individual classifiers, giving the highest AUC confidence interval of 89.96–97.10%, which implies strong generalization across the feature subsets.

Cross-validation methodology

Figure 7 reports the classification metrics of stratified 10-fold cross-validation, applied to the training dataset, at each fold. The evaluation provides precision, recall, f1-score, support, and accuracy for classes 0 and 1 in every cross-validation fold. This approach ensures that each data point serves in exactly one validation fold (and in training for the remaining folds), and that every fold preserves the original class distribution through stratification, yielding more trustworthy assessment metrics.
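The per-fold evaluation can be sketched as follows; the classifier and scikit-learn's bundled Wisconsin data are illustrative stand-ins for the paper's full pipeline.

```python
# Stratified 10-fold CV with per-fold, per-class metrics, mirroring the
# procedure behind Fig. 7 (model and data are illustrative assumptions).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for fold, (tr, te) in enumerate(skf.split(X, y), start=1):
    clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    report = classification_report(y[te], clf.predict(X[te]), output_dict=True)
    print(f"fold {fold}: accuracy={report['accuracy']:.3f}, "
          f"class-1 recall={report['1']['recall']:.3f}")
```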

Fig. 7. Comparison of Accuracies in individual folds in 10 fold Cross-Validations.

Key results from the cross-validation

The accuracy scores ranged between 0.84 and 1.00, with seven of the ten folds scoring above 0.98. The 4th validation fold performed worst (accuracy = 0.84), owing to natural distribution variance in the split. The consistently high precision, recall, and F1 scores across the data subsets demonstrate the model's strong generalization ability.

Our proposed hybrid feature selection method was further validated on the Framingham Heart Study dataset, drawn from a different medical domain. This dataset is grossly imbalanced: of its 4240 instances, 644 belong to the positive class and the remaining 3596 to the negative class. The Stacking classifier achieved 92.95% accuracy under 10-fold cross-validation, with an AUC/ROC of 96.3%, sensitivity and specificity of 93%, and a kappa value of 86%. After the first stage of feature selection (which reduced the features to 11), the performance of the Stacking classifier increased dramatically to 98.8% accuracy, 100% AUC/ROC, 99% sensitivity and specificity, and a kappa value of 97%. Performance improved with dimension reduction because irrelevant and redundant features were removed, yielding better generalization and robustness. All primary evaluation metrics verify the effectiveness of the selected features and the power of the proposed model, which remains reliable and consistent, retaining the important information across datasets of different sizes and distributions. The Framingham Heart Study dataset comprises 16 columns (15 attributes and a single outcome variable) and is used to forecast a patient's 10-year CHD risk. The DOI of the dataset is 10.34740/kaggle/dsv/3493583.

Conclusion

This research develops a novel cancer detection system based on an ensemble approach with a two-step Hybrid Filter-Wrapper feature selection framework, applied to the WBC and LCP cancer datasets. The framework consists of three phases.

Phase 1 (Filter-Based Feature Selection): A greedy stepwise search algorithm and a multivariate feature ranking tool minimize data dimensionality while preserving the key features that identify the target class (cancer presence or absence). Feature correlation is reduced during selection to obtain an informative subset of variables.

Phase 2 (Wrapper-Based Feature Refinement): The best-first search technique with logistic regression refines the features selected in Phase 1. This repeated assessment of the selected features improves model performance while identifying the best features for the classification task.

Phase 3 (Classifier Implementation): Six machine learning classifiers are applied, including LR, NB, DT, SVM with a polynomial kernel, and MLP. The Stacked model combines LR, NB, and DT base classifiers with MLP as the meta-classifier.

The robust evaluation procedure uses 10-fold cross-validation and train-test splits of 50/50%, 66/34%, and 80/20%, enabling a detailed assessment of performance for the selected features. The designed stacked model achieves outstanding classification metrics in every test, outclassing the baseline models by substantial margins. Hybrid filter-wrapper feature selection combined with a stacked model improves both diagnostic accuracy and interpretability, and the results show that combining diverse classifiers produces the best outcomes in complex classification problems such as cancer screening. The ensemble stacked model thus proves to be an effective, accurate predictive tool for clinical cancer detection applications.

Future scope

To showcase the practical applications of the model in clinical settings, it should be validated on real-world patient data through collaborations with hospitals and cancer research institutes, ensuring its effectiveness across diverse demographics. Integrating the model into electronic health record systems can assist oncologists in the early detection of cancer. Additionally, pilot studies with healthcare institutions can compare Artificial Intelligence (AI)-assisted screening methods with traditional diagnostic techniques. Furthermore, deploying the model in telemedicine and mobile health applications can enhance access to early screening, especially in under-served areas.

The model proved consistently effective at predicting cancer on the WBC dataset of 569 patients from the UCI Machine Learning Repository and the LCP dataset of 1,000 participants from Kaggle, indicating that it scales appropriately to moderately sized datasets. As the number of records grows, however, scalability may be constrained by increased computational cost in memory and time. Strategies for remaining efficient in low-resource settings, including data preprocessing, model pruning, and cloud computing resources, must therefore be considered. As datasets grow, future studies should optimize deployment approaches that maintain the model's accuracy and applicability in changing real-world environments.

Acknowledgements

Not Applicable.

Author contributions

Sulekha Das: Conceptualization, Writing- Original draft, Software, Investigation, Data Curation, Visualization. Avijit Kumar Chaudhuri: Methodology, Validation, Formal Analysis, Supervision, Writing- Review & Editing. Sayak Das: Supervision. Partha Ghosh: Supervision.

Funding information

Not Applicable.

Data availability

This study uses two datasets, both openly available on the Kaggle website and derived from sources in the public domain: https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data and https://www.kaggle.com/datasets/mysarahmadbhat/lung-cancer.

Declarations

Competing interests

The authors declare no competing interests.

Ethical statement

Although the datasets were obtained from publicly available repositories, we recognized the ethical concerns of potential dataset bias and of privacy in healthcare research. The datasets used herein follow applicable ethical rules, including data anonymization and de-identification, and we took all reasonable measures to mitigate bias in our analysis.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Rauf, F. et al. DenseIncepS115: a novel network-level fusion framework for alzheimer’s disease prediction using MRI images. Front. Oncol.14, 1501742 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Fatima, M. et al. Breast lesion segmentation and classification using U-Net saliency Estimation and explainable residual convolutional neural network. Fractals10, S0218348X24400607 (2024). [Google Scholar]
  • 3.Ertel, W. Introduction To Artificial Intelligence (Springer, 2018).
  • 4.Kalis, B., Collier, M. & Fu, R. 10 promising AI applications in health care. Harvard Business Rev. 2–5 (2018).
  • 5.Wang, H., Zu, Q., Chen, J., Yang, Z. & Ahmed MA. Application of artificial intelligence in acute coronary syndrome: a brief literature review. Adv. Therapy 1–9 (2021). [DOI] [PubMed]
  • 6.Majumder, J., Ghosh, S., Khang, A., Debnath, T. & Chaudhuri, A. K. Hepatitis C. Prediction using feature selection by machine learning technique. In Medical Robotics and AI-Assisted Diagnostics for a High-Tech Healthcare Industry, pp 195–204, IGI Global (2024).
  • 7.Rauf, F. et al. Artificial intelligence assisted common maternal fetal planes prediction from ultrasound images based on information fusion of customized convolutional neural networks. Front. Med.11, 1486995 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Ullah, M. S., Khan, M. A., Albarakati, H. M., Damaševičius, R. & Alsenan, S. Multimodal brain tumor segmentation and classification from MRI scans based on optimized DeepLabV3 + and interpreted networks information fusion empowered with explainable AI. Comput. Biol. Med.182, 109183 (2024). [DOI] [PubMed] [Google Scholar]
  • 9.Yu, C. & Helwig, E. J. The role of AI technology in prediction, diagnosis and treatment of colorectal cancer. Artif. Intell. Rev.55, 323–343 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Kumar, Y., Gupta, S., Singla, R. & Hu Y. C. A systematic review of artificial intelligence techniques in cancer prediction and diagnosis. Arch. Comput. Methods Eng.29, 2043–2070 (2022). [DOI] [PMC free article] [PubMed]
  • 11.McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature577, 89–94 (2020). [DOI] [PubMed]
  • 12.Majumder, A. & Sen D. Artificial intelligence in cancer diagnostics and therapy: current perspectives. Indian J. Cancer58, 481–492 (2021). [DOI] [PubMed]
  • 13.Pantanowitz, L. et al. An artificial intelligence algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies: a blinded clinical validation and deployment study. Lancet Digit. Health2, e407–e416 (2020). [DOI] [PubMed]
  • 14.Feng, H. et al. Identifying malignant breast ultrasound images using ViT-patch. Appl. Sci.13, 3489 (2023).
  • 15.Ray, A., Chen, M. & Gelogo, Y. Performance comparison of different machine learning algorithms for risk prediction and diagnosis of breast cancer. In Smart Technologies in Data Science and Communication: Proceedings of SMART-DSC 2019 71–76 Springer Singapore. (2020).
  • 16.Rana, M., Chandorkar, P., Dsouza, A. & Kazi, N. Breast cancer diagnosis and recurrence prediction using machine learning techniques. Int. J. Res. Eng. Technol.4, 372–376 (2015). [Google Scholar]
  • 17.Kharya, S., Dubey, D. & Soni, S. Predictive machine learning techniques for breast cancer detection. Int. J. Comput. Sci. Inform. Technol.4, 1023–1028 (2013). [Google Scholar]
  • 18.Liu, H. et al. Recent advances in pulse-coupled neural networks with applications in image processing. Electronics11 3264 (2022).
  • 19.Agrawal, S. & Agrawal, J. Neural network techniques for cancer prediction: a survey. Procedia Comput. Sci.60, 769–774 (2015). [Google Scholar]
  • 20.Abdulkareem, S. A. & Abdulkareem Z. O. An evaluation of the Wisconsin breast cancer dataset using ensemble classifiers and RFE feature selection. Int. J. Sci. Basic. Appl. Res.55, 67–80 (2021).
  • 21.Alshayeji, M. H., Ellethy, H. & Gupta R Computer-aided detection of breast cancer on the Wisconsin dataset: an artificial neural networks approach. Biomed. Signal Process. Control. 71, 103141 (2022). [Google Scholar]
  • 22.Benbrahim, H. & Hachimi, H. and Amine A. (Eds.). Comparative study of machine learning algorithms using the breast cancer dataset. In Advanced Intelligent Systems for Sustainable Development (AI2SD’2019) Volume 2-Advanced Intelligent Systems for Sustainable Development Applied To Agriculture and Health, pp 83–91, Springer International Publishing (2020).
  • 23.Hernández-Julio, Y. F., Díaz-Pertuz, L. A., Prieto-Guevara, M. J., Barrios-Barrios, M. A. & Nieto-Bernal, W. Intelligent fuzzy system to predict the Wisconsin breast cancer dataset. Int. J. Environ. Res. Public Health. 20, 5103 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Hossin, M. M. et al. Breast cancer detection: an effective comparison of different machine learning algorithms on the Wisconsin dataset. Bull. Electr. Eng. Inf.12, 2446–2456 (2023). [Google Scholar]
  • 25.Kadhim, R. R. & Kamil, M. Y. Comparison of breast cancer classification models on Wisconsin dataset. International Journal of Reconfigurable and Embedded Systems 2089 4864. (2022).
  • 26.Mohammad, W. T., Teete, R., Al-Aaraj, H., Rubbai, Y. S. Y. & Arabyat, M. M. Diagnosis of breast cancer pathology on the Wisconsin dataset with the help of data mining classification and clustering techniques. Applied Bionics and Biomechanics 2022 6187275. (2022). [DOI] [PMC free article] [PubMed] [Retracted]
  • 27.Naji, M. A. et al. Machine learning algorithms for breast cancer prediction and diagnosis. Procedia Comput. Sci.191, 487–492 (2021).
  • 28.Umami, R. F. & Sarno, R. Analysis of classification algorithm for Wisconsin diagnosis breast cancer data study. In 2020 International Seminar on Application for Technology of Information and Communication (iSemantic) 464–469 IEEE. (2020).
  • 29.Ahmed, S. et al. The deep learning resnet101 and ensemble Xgboost algorithm with hyperparameters optimization accurately predict the lung cancer. Appl. Artif. Intell.37, 2166222 (2023).
  • 30.Alshayeji, M. H. & Abed S.E. Lung cancer classification and identification framework with automatic nodule segmentation screening using machine learning. Appl. Intell.53, 19724–19741 (2023).
  • 31.Al-Tawalbeh, J. et al. Classification of lung cancer by using machine learning algorithms. In 2022 5th International Conference on Engineering Technology and its Applications (IICETA) 528–531 IEEE. (2022).
  • 32.Bharathy, S. & Pavithra, R. Lung cancer detection using machine learning. In 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC) 539–543 IEEE. (2022).
  • 33.Bushara, A. R., Kumar, R. V. & Kumar, S. S. An ensemble method for the detection and classification of lung cancer using computed tomography images utilizing a capsule network with visual geometry group. Biomed. Signal Process. Control. 85, 104930 (2023). [Google Scholar]
  • 34.Ingle, K., Chaskar, U. & Rathod, S. Lung cancer types prediction using machine learning approach. In 2021 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT) 01–06 IEEE. (2021).
  • 35.Lakshmi, S. V., Greeshma, B., Thanooj, M. J., Reddy, K. R. & Rakesh K. R. Lung cancer detection and stage classification using supervised algorithms. Turkish J. Physiotherapy Rehabilitation32, 3 (2021).
  • 36.Nam, Y. J. & Shin W. J. A study on comparison of lung cancer prediction using ensemble machine learning. Korea J. Artif. Intell.7, 19–24 (2019).
  • 37.Tsoi, K. K., Chan, F. C., Hirai, H. W. & Sung JJ Risk of Gastrointestinal bleeding and benefit from colorectal cancer reduction from long-term use of low-dose aspirin: a retrospective study of 612,509 patients. J. Gastroenterol. Hepatol.33, 1728–1736 (2018). [DOI] [PubMed] [Google Scholar]
  • 38.Liao, C. M., Huang, W. H., Kung, P. T., Chiu, L. T. & Tsai, W. C. Comparison of colorectal cancer screening between people with and without disability: a nationwide matched cohort study. BMC Public. Health. 21, 1034 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Chaudhuri, A. K., Banerjee, D. K. & Das A. A dataset-centric feature selection and stacked model to detect breast cancer. Int. J. Intell. Syst. Appl.13, 24 (2021).
  • 40.Chaudhuri, A. K. & Das, S. and Ray A. (Eds.). An improved random forest model for detecting heart disease. In Data-Centric AI Solutions and Emerging Technologies in the Healthcare Ecosystem, pp 143–164, CRC (2024).
  • 41.Safar, A. A., Salih, D. M. & Murshid, A. M. Pattern recognition using the multilayer perceptron (MLP) for medical disease: a survey. Int. J. Nonlinear Anal. Appl.14, 1989–1998 (2023). [Google Scholar]
  • 42.Javaid, S. & Saeed N. Neural networks for infectious diseases detection: prospects and challenges. Authorea Preprints (2023).
  • 43.Macukow, B. (Ed.). Neural networks–state of art, brief history, basic models and architecture. In Computer Information Systems and Industrial Management 3–14 Springer International Publishing (2016).
  • 44.Wang, X. Analysis and optimization for Hoeffding tree (2020).
  • 45.Mathew, T. E. Appositeness of Hoeffding tree models for breast cancer classification. J. Curr. Sci. Technol.12, 391–407 (2022). [Google Scholar]
  • 46.Elbasi, E. & Zreikat A.I. Heart disease classification for early diagnosis based on adaptive Hoeffding tree algorithm in IoMT data. Int. Arab. J. Inform. Technol.20, 38–48 (2023).
  • 47.Chaudhuri, A. K. & Das, A. Variable selection in genetic algorithm model with logistic regression for prediction of progression to diseases. In 2020 IEEE International Conference for Innovation in Technology (INOCON) 1–6 IEEE. (2020).
  • 48.Agarap, A. F. M. On breast cancer detection: an application of machine learning algorithms on the Wisconsin diagnostic dataset. In Proceedings of the 2nd International Conference on Machine Learning and Soft Computing 5–9. (2018).
  • 49.Chaudhuri, A. K., Sinha, D., Banerjee, D. K. & Das A. A novel enhanced decision tree model for detecting chronic kidney disease. Netw. Model. Anal. Health Inf. Bioinf.10, 1–22 (2021).
  • 50.Das, S. et al. A.K. A multifaceted approach to understanding mental health crises in the COVID-19 era: using AI algorithms and feature selection strategies. In AI-Driven Innovations in Digital Healthcare: Emerging Trends, Challenges, and Applications, pp. 97–119 IGI Global. (2024).
  • 51.Kar, S. P. et al. Identification of insecurity in COVID-19 using machine learning techniques. In Medical Robotics and AI- Assisted Diagnostics for a High-Tech Healthcare Industry 239–256 IGI Global. (2024).
  • 52.Cheng, Z. et al. Application of serum SERS technology based on thermally annealed silver nanoparticle composite substrate in breast cancer. Photodiagnosis and Photodynamic Therapy, 41, 103284. doi:https://doi.org/10.1016/j.pdpdt.2023.103284. (2023). [DOI] [PubMed]
  • 53.Zeng, Q. et al. Serum Raman spectroscopy combined with convolutional neural network for rapid diagnosis of HER2-positive and triple-negative breast cancer. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 286, 122000. doi: https://doi.org/10.1016/j.saa.2022.122000. (2023). [DOI] [PubMed]
  • 54.Pu, X., Sheng, S., Fu, Y., Yang, Y. & Xu, G. Construction of circRNA–miRNA–mRNA CeRNA regulatory network and screening of diagnostic targets for tuberculosis. Ann. Med.56 (1), 2416604. 10.1080/07853890.2024.2416604 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Duan, X. et al. A novel robotic bronchoscope system for navigation and biopsy of pulmonary lesions. Cyborg Bionic Syst.4, 0013. 10.34133/cbsystems.0013 (2023). [DOI] [PMC free article] [PubMed]
  • 56.Li, H. et al. UCFNNet: Ulcerative colitis evaluation based on fine-grained lesion learner and noise suppression gating. Computer Methods and Programs in Biomedicine, 247, 108080. doi:https://doi.org/10.1016/j.cmpb.2024.108080. (2024). [DOI] [PubMed]
  • 57.Liu, S. et al. Identification of a lncRNA/circRNA-miRNA-mRNA network in Nasopharyngeal Carcinoma by deep sequencing and bioinformatics analysis. Journal of Cancer, 15(7), 1916–1928. doi: 10.7150/jca.91546. (2024). [DOI] [PMC free article] [PubMed]
  • 58.Li, Y. et al. CircMYBL1 suppressed acquired resistance to osimertinib in non-small-cell lung cancer. Cancer Genetics,284–285, 34–42. doi: https://doi.org/10.1016/j.cancergen.2024.04.001. (2024). [DOI] [PubMed]
  • 59.Chen, S. et al. Evaluation of a three-gene methylation model for correlating lymph node metastasis in postoperative early gastric cancer adjacent samples. Frontiers in Oncology, 14, 1432869. doi: https://doi.org/10.3389/fonc.2024.1432869. (2024). [DOI] [PMC free article] [PubMed]
  • 60.Cao, Z., Zhu, J., Wang, Z., Peng, Y. & Zeng, L. Comprehensive pan-cancer analysis reveals ENC1 as a promising prognostic biomarker for tumor microenvironment and therapeutic responses. Sci. Rep.14 (1), 25331. 10.1038/s41598-024-76798-9 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Zhou, J. et al. Chrysotoxine regulates ferroptosis and the PI3K/AKT/mTOR pathway to prevent cervical cancer. Journal of Ethnopharmacology, 338, 119126. doi: https://doi.org/10.1016/j.jep.2024.119126. (2025). [DOI] [PubMed]
  • 62.Jiang, Z. et al. Low-frequency ultrasound sensitive Piezo1 channels regulate keloid-related characteristics of fibroblasts. Advanced Science, 11(14), 2305489. doi: https://doi.org/10.1002/advs.202305489. (2024). [DOI] [PMC free article] [PubMed]
  • 63.Saber, A., Elbedwehy, S., Awad, W. A. & Hassan, E. An optimized ensemble model based on meta-heuristic algorithms for effective detection and classification of breast tumors. Neural Comput. Appl.37 (6), 4881–4894 (2025). [Google Scholar]
  • 64.Elbedwehy, S., Hassan, E., Saber, A. & Elmonier, R. Integrating neural networks with advanced optimization techniques for accurate kidney disease diagnosis. Sci. Rep.14 (1), 21740 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Alnowaiser, K., Saber, A., Hassan, E. & Awad, W. A. An optimized model based on adaptive convolutional neural network and grey Wolf algorithm for breast cancer diagnosis. PloS One, 19(8), e0304868. (2024). [DOI] [PMC free article] [PubMed]


