2025 May 8;12(5):497. doi: 10.3390/bioengineering12050497
Algorithm 1: Ensemble feature selection with nested cross-validation
Input: miRNA expression dataset D ∈ ℝ^(N×F), where N is the number of samples and F is the number of miRNA features.
Output:
  • Minimal miRNA feature set Fminimal
  • Best-performing model M*
  • Mean performance metrics across all validation sets

Step 1. Initialization
             Fminimal ← ∅
             M* ← None
             Let p = N/10 be the number of samples held out per fold (i.e., Leave-6-Out Cross-Validation when N = 60)
             Generate N/p = 10 non-overlapping folds:
                  {(Ti, Vi)} for i = 1, …, 10, where Ti ∈ ℝ^((N−p)×F) and Vi ∈ ℝ^(p×F)
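
The fold generation in Step 1 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the helper name `make_folds` is hypothetical, and random shuffling with a fixed seed is an assumption, since the algorithm does not state how samples are assigned to the outer folds.

```python
import random

def make_folds(n_samples, held_out, seed=0):
    """Return (train_indices, validation_indices) pairs with disjoint validation sets.

    Hypothetical helper: shuffling with a fixed seed is an assumption,
    not something specified by the algorithm.
    """
    if n_samples % held_out != 0:
        raise ValueError("n_samples must be divisible by held_out")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = []
    for start in range(0, n_samples, held_out):
        # Each consecutive slice of the shuffled indices is one validation set.
        val = sorted(indices[start:start + held_out])
        val_set = set(val)
        # Every sample not held out belongs to the training set of this fold.
        train = [i for i in range(n_samples) if i not in val_set]
        folds.append((train, val))
    return folds

# N = 60 samples with p = 6 held out per fold gives 10 folds of 54 + 6 samples.
folds = make_folds(n_samples=60, held_out=6)
print(len(folds))                          # 10
print(len(folds[0][0]), len(folds[0][1]))  # 54 6
```

Every sample appears in exactly one validation set, so each Ti has N − p rows and each Vi has p rows.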
Step 2. Outer Cross-Validation (Leave-p-Out)
             for each i ∈ {1, 2, …, 10} do
                  Let Ti be the outer training set and Vi be the outer validation set
Step 3. Inner Cross-Validation (Stratified k-Fold) and Feature Selection
                  Split Ti into k stratified folds: {(tj, vj)} for j = 1, …, k
                  for each j ∈ {1, 2, …, k} do
                       Feature Selection on tj:
                            Apply RFE, Random Forest importance, LASSO, and SelectKBest
                       Model Training:
                            Train classifiers (LR, RF, SVM, XGBoost, AdaBoost) on tj
                       Model Evaluation:
                            Evaluate on vj using Accuracy, Sensitivity, Specificity, F1 Score, and AUC
                       Model Selection:
                            Choose best-performing model Mj
                       Update Feature Set:
                            Add features to Fminimal if selected in ≥ 3 inner folds
                  Select most frequent model across inner folds as:
                       M* = argmax over Mj (frequency of selection in inner folds)
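
The two voting rules at the end of Step 3 (keep features selected in at least 3 inner folds, and pick the most frequently chosen model) can be sketched with `collections.Counter`. The function names and the feature labels `miR-A` to `miR-D` are hypothetical placeholders, and the tie-breaking behavior is an assumption because the algorithm does not specify one.

```python
from collections import Counter

def vote_features(selected_per_fold, min_folds=3):
    """Keep features that were selected in at least `min_folds` inner folds."""
    # set(fold) ensures a feature is counted at most once per fold.
    counts = Counter(f for fold in selected_per_fold for f in set(fold))
    return {f for f, c in counts.items() if c >= min_folds}

def most_frequent_model(best_per_fold):
    """Return the model name chosen most often across inner folds.

    Assumption: ties go to the model that was encountered first,
    which is how Counter.most_common orders equal counts.
    """
    return Counter(best_per_fold).most_common(1)[0][0]

# Placeholder inputs for k = 5 inner folds (not data from the study).
selected_per_fold = [
    {"miR-A", "miR-B"},
    {"miR-A", "miR-C"},
    {"miR-A", "miR-B"},
    {"miR-B", "miR-D"},
    {"miR-A"},
]
best_per_fold = ["RF", "SVM", "RF", "LR", "RF"]

print(sorted(vote_features(selected_per_fold)))  # ['miR-A', 'miR-B']
print(most_frequent_model(best_per_fold))        # RF
```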
Step 4. Model Validation on Outer Fold
                  Use M* and Fminimal to classify Vi
                  Evaluate performance using Accuracy, Sensitivity, Specificity, F1 Score, and AUC
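
The five metrics named in Step 4 can be computed for a binary task as sketched below. This is an illustrative stdlib-only version with hypothetical function names. It assumes labels coded as 1 (positive) and 0 (negative), and it computes AUC as the fraction of positive-negative pairs ranked correctly, which is one standard definition. Returning NaN when a class is absent from the validation set is also an assumption; with only p = 6 samples per outer fold, that case can occur.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (e.g. no positives in the fold) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Placeholder validation fold of 6 samples (not data from the study).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

metrics = binary_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```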
Return:
  • Fminimal: final feature set
  • M*: best-performing model
  • Average performance metrics over all Vi, i = 1, 2, …, 10
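
The final averaging over the outer validation sets can be sketched as below. The function name `average_metrics` is hypothetical, the numbers are placeholders rather than results from the study, and skipping NaN entries (folds where a metric is undefined) is an assumption that the algorithm does not address.

```python
import math
from statistics import mean

def average_metrics(per_fold):
    """Average each metric over the outer folds, ignoring undefined (NaN) values."""
    averaged = {}
    for name in per_fold[0]:
        values = [m[name] for m in per_fold if not math.isnan(m[name])]
        # If a metric is undefined in every fold, keep it as NaN.
        averaged[name] = mean(values) if values else float("nan")
    return averaged

# Placeholder per-fold results for two outer folds.
per_fold = [
    {"accuracy": 1.0, "auc": 1.0},
    {"accuracy": 0.5, "auc": float("nan")},
]
avg = average_metrics(per_fold)
print(avg)  # {'accuracy': 0.75, 'auc': 1.0}
```

Reporting how many folds contributed to each average alongside the mean makes it visible when a metric was undefined in some folds.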