Algorithm 1: Ensemble feature selection with nested cross-validation

Input: miRNA expression dataset D ∈ ℝ^(N×F), where N is the number of samples and F is the number of miRNA features.
Output: minimal feature set F_minimal and best-performing model M*.

Step 1. Initialization
  F_minimal ← ∅; M* ← None
  Generate p = 10 non-overlapping folds {(T_i, V_i)}, i = 1, …, p (i.e., leave-6-out cross-validation when N = 60), where T_i ∈ ℝ^((N − N/p)×F) and V_i ∈ ℝ^((N/p)×F)

Step 2. Outer Cross-Validation (Leave-p-Out)
  for each i ∈ {1, 2, …, p} do
    Let T_i be the outer training set and V_i the outer validation set

Step 3. Inner Cross-Validation (Stratified k-Fold) and Feature Selection
    Partition T_i into k stratified inner folds (t_j, v_j)
    for each j ∈ {1, 2, …, k} do
      Feature selection on t_j: apply RFE, Random Forest importance, LASSO, and SelectKBest
      Model training: train classifiers (LR, RF, SVM, XGBoost, AdaBoost) on t_j
      Model evaluation: evaluate on v_j using Accuracy, Sensitivity, Specificity, F1 Score, and AUC
      Model selection: choose the best-performing model M_j
      Feature-set update: add features to F_minimal if selected in ≥ 3 inner folds
    Select the most frequently chosen model across inner folds:
      M* = argmax_{M_j} (frequency of selection in inner folds)

Step 4. Model Validation on Outer Fold
    Use M* and F_minimal to classify V_i
    Evaluate performance using Accuracy, Sensitivity, Specificity, F1 Score, and AUC

Return: F_minimal, M*
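The nested procedure above can be sketched with scikit-learn. This is a minimal illustration, not the paper's implementation: the data here are synthetic (the miRNA dataset is not reproduced), XGBoost and AdaBoost are omitted to keep the sketch dependency-free, features are scored by AUC for model selection, and all selector and classifier hyperparameters are assumed values.

```python
# Sketch of Algorithm 1: nested CV with ensemble feature selection.
# Assumptions: synthetic data, AUC as the inner selection metric,
# illustrative hyperparameters; XGBoost/AdaBoost left out.
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.svm import SVC


def select_features(X, y, n_keep=10):
    """Union of features picked by RFE, RF importance, LASSO, SelectKBest."""
    chosen = set()
    rfe = RFE(LogisticRegression(max_iter=1000),
              n_features_to_select=n_keep).fit(X, y)
    chosen |= set(np.flatnonzero(rfe.support_))
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    chosen |= set(np.argsort(rf.feature_importances_)[-n_keep:])
    lasso = Lasso(alpha=0.01).fit(X, y)          # labels used as 0/1 targets
    chosen |= set(np.flatnonzero(lasso.coef_))
    skb = SelectKBest(f_classif, k=n_keep).fit(X, y)
    chosen |= set(np.flatnonzero(skb.get_support()))
    return chosen


def nested_cv(X, y, p=10, k=5, vote_threshold=3):
    """Outer p-fold CV; inner stratified k-fold selects features and a model."""
    models = {
        "LR": lambda: LogisticRegression(max_iter=1000),
        "RF": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
        "SVM": lambda: SVC(probability=True, random_state=0),
    }
    outer = KFold(n_splits=p, shuffle=True, random_state=0)
    outer_scores = []
    for train_idx, val_idx in outer.split(X):
        Ti, Vi, yi, yv = X[train_idx], X[val_idx], y[train_idx], y[val_idx]
        feat_votes, model_votes = Counter(), Counter()
        inner = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
        for tr, va in inner.split(Ti, yi):
            feats = select_features(Ti[tr], yi[tr])
            feat_votes.update(feats)
            cols = sorted(feats)
            best_name, best_auc = None, -1.0
            for name, make in models.items():
                clf = make().fit(Ti[tr][:, cols], yi[tr])
                auc = roc_auc_score(
                    yi[va], clf.predict_proba(Ti[va][:, cols])[:, 1])
                if auc > best_auc:
                    best_name, best_auc = name, auc
            model_votes[best_name] += 1
        # Keep features selected in >= vote_threshold inner folds (Step 3).
        F_minimal = sorted(f for f, c in feat_votes.items()
                           if c >= vote_threshold)
        M_star = model_votes.most_common(1)[0][0]
        # Step 4: refit the winning model and score the outer validation fold.
        clf = models[M_star]().fit(Ti[:, F_minimal], yi)
        outer_scores.append(accuracy_score(yv, clf.predict(Vi[:, F_minimal])))
    # Returns the last outer fold's selections plus the mean outer accuracy.
    return F_minimal, M_star, float(np.mean(outer_scores))
```

With N = 60 and p = 10 outer folds, each outer validation set V_i holds 6 samples, matching the leave-6-out note in the algorithm; sensitivity, specificity, and F1 could be added to the evaluation in the same way as accuracy.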