. 2021 May 18;9(5):1223–1239. doi: 10.1007/s43390-021-00360-0

Fig. 2.


Flow chart of the general process of training, validating, and testing used to develop machine learning models. The original data are first divided into training and test datasets (usually an 80/20 split). The training data are then split again (generally 80/20) into a training set and a validation set, most often using a technique called cross-validation: the training data are randomly split 80/20 k times, so that in each round the model learns from the training set and its parameters are tuned on the validation set; the results are then averaged across the k rounds to select the optimal model. The resulting model is then evaluated on the distinct test set for final performance, usually reported as percent accuracy and area under the curve (AUC). The model can then be deployed to make predictions on new data.
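The workflow in this caption can be sketched in a few lines of Python. This is a minimal illustration and not the method of any particular study. It rests on several assumptions that are not in the original: the data are synthetic, the model is a simple nearest-neighbour classifier, the tuned parameter is the hypothetical `n_neighbors`, and only accuracy (not AUC) is reported. All function names are illustrative.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D binary data (assumption): label is 1 when x > 0.5, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

def split(data, frac, rng):
    """Shuffle a copy of data and split it into (first frac, remainder)."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * frac)
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    votes = sum(y for _, y in nearest)
    return int(2 * votes > n_neighbors)

def accuracy(train, evaluation, n_neighbors):
    """Fraction of evaluation points whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, n_neighbors) == y for x, y in evaluation)
    return hits / len(evaluation)

def cross_validate(train_data, n_neighbors, k, rng):
    """Mean validation accuracy over k random 80/20 splits of the training data."""
    scores = []
    for _ in range(k):
        fit_set, val_set = split(train_data, 0.8, rng)
        scores.append(accuracy(fit_set, val_set, n_neighbors))
    return sum(scores) / k

rng = random.Random(0)
data = make_data(500, rng)

# Step 1: hold out 20% of the original data as the final test set.
train_data, test_data = split(data, 0.8, rng)

# Step 2: tune the parameter using repeated 80/20 splits of the training data only.
candidates = [1, 5, 15]  # odd values avoid tied votes
cv_scores = {c: cross_validate(train_data, c, k=5, rng=rng) for c in candidates}
best = max(cv_scores, key=cv_scores.get)

# Step 3: evaluate the selected setting once on the untouched test set.
test_accuracy = accuracy(train_data, test_data, best)
print(f"best n_neighbors={best}, test accuracy={test_accuracy:.3f}")
```

The test set is never used during tuning, so the final accuracy is an estimate of performance on unseen data. In practice a library routine for k-fold cross-validation would normally replace the hand-written splitting shown here.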