Nested cross-validation (CV) used to evaluate the classification models under study. In the outer loop, the dataset is split into K (= 10) folds, each serving once as the outer test set while the remaining folds form the outer train set. Within each outer fold, the outer train set is used to select the model hyperparameters that maximize the inner CV accuracy. The model is then retrained with the best hyperparameters on all samples in the outer train set and tested on the outer test set, yielding one of the K sets of performance scores (accuracy, precision, recall, F1 score, and AUC). The average of each score over the K folds is reported as the final result.
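The nested CV procedure described in this caption can be sketched with scikit-learn as follows. This is a minimal illustration, not the study's actual pipeline: the synthetic dataset, the logistic regression classifier, the hyperparameter grid, the five inner folds, and the use of stratified splits are all assumptions made for the example. Only K = 10 outer folds, accuracy as the inner selection criterion, and the five reported metrics come from the caption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate

# Placeholder binary classification data (assumption; stands in for the study's dataset).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

K = 10  # number of outer folds, as stated in the caption

# Stratified splits and 5 inner folds are assumptions for this sketch.
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=K, shuffle=True, random_state=1)

# Inner loop: choose the hyperparameters that maximize inner CV accuracy.
# With refit=True (the default), the best setting is retrained on the
# entire outer train set before it is scored on the outer test set.
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),  # hypothetical model choice
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},     # hypothetical grid
    scoring="accuracy",
    cv=inner_cv,
)

# Outer loop: for each of the K folds, run the inner search on the outer
# train set and evaluate the refit model on the held-out outer test set.
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
results = cross_validate(search, X, y, cv=outer_cv, scoring=scoring)

# Final result: the mean of each metric over the K outer folds.
mean_scores = {name: float(np.mean(results[f"test_{name}"])) for name in scoring}
for name, value in mean_scores.items():
    print(f"{name}: {value:.3f}")
```

Passing the `GridSearchCV` object itself to `cross_validate` is what keeps hyperparameter selection inside each outer train set, so the outer test folds never influence the choice of hyperparameters.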