2021 Jul 9;13(14):3450. doi: 10.3390/cancers13143450

Figure 1.

Schematic representation of the modeling pipeline. (I) Data preprocessing: categorical features are encoded as integer arrays, and the data are oversampled to convert the imbalanced dataset into a balanced one (i.e., the numbers of surviving and deceased patients are made nearly equal), which avoids bias by preventing the model from ignoring minority classes. (II) Hyperparameter optimization: a 3-fold cross-validation searches for the best subset of hyperparameters (k: hyperparameter subset index number; P: number of estimators; Q: maximum depth of each estimator; R: learning rate; S: subsample ratio; T: column sample ratio for each estimator), i.e., the subset that yields the best ROC–AUC score f(y_i, ŷ_i), the area under the receiver operating characteristic curve, during the 432 × 3 iterations over the hyperparameter space (this step enhances the predictive accuracy of the XAI models). (III) Testing: the predictive accuracy of the final AI models is tested after they are trained with the best subset of hyperparameters. (IV) Prediction: the AI models predict the probability of the clinical outcomes. (V) Explanation: a game theory-based XAI model explains the predicted outcomes to enhance the interpretability and explainability of the model predictions, identifies critical inflection (turning) points above or below which the ≥5-year survival rates increase, and assesses the conditional probability of ≥5-year survival rates from the range of TME factors determined by the inflection points.
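The preprocessing and search steps in the caption can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The function names, the use of random duplication for oversampling, and every grid value are assumptions made for illustration; the grid sizes (4 × 4 × 3 × 3 × 3) were chosen only so that the search space has 432 combinations, which matches the 432 × 3 iterations stated in the caption.

```python
import itertools
import random


def encode_categories(values):
    """Map each distinct category to an integer code (codes follow sorted order)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def oversample(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes are equal in size.

    Assumption: plain random oversampling; the caption does not name the method used.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical hyperparameter grid; the letters refer to the caption's symbols.
GRID = {
    "n_estimators": [100, 200, 300, 400],   # P
    "max_depth": [3, 4, 5, 6],              # Q
    "learning_rate": [0.01, 0.05, 0.1],     # R
    "subsample": [0.6, 0.8, 1.0],           # S
    "colsample": [0.6, 0.8, 1.0],           # T
}
N_FOLDS = 3

# Each index k in this list identifies one hyperparameter subset.
combos = [dict(zip(GRID, vals)) for vals in itertools.product(*GRID.values())]
n_fits = len(combos) * N_FOLDS  # one model fit per (subset, fold) pair
```

In a full pipeline, each of the `n_fits` fits would train a gradient-boosted model on two folds and score it on the third with `roc_auc`; the subset with the highest mean score would then be used to train the final model.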