Diagrams of inconsistent partitioning. Random features (R) based on
published radiomics data form the basis of our experimentation (atypical
of radiomics machine learning [ML] studies). (A) The
upper level (blue and yellow) illustrates consistent partitioning that
prevents information leakage, whereas the lower level (green) demonstrates
how using the entire data set for radiomics feature normalization,
feature selection, hyperparameter selection, model selection, and
performance reporting results in an unrealistically optimistic
assessment of the radiomics ML model. (B) Diagrams show
normalization strategies. Data set normalization (green) is an example
of inconsistent partitioning because the mean and SD used for scaling are
calculated from all samples in both the training and test sets. Train
normalization (right) and split normalization (bottom) are different
approaches to consistent partitioning (more details in
Appendix S1).
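
The three normalization strategies in (B) can be illustrated with a minimal sketch on a single random feature. This is not the authors' code: the sample size, the 70/30 split, and all variable names are hypothetical, and the sketch assumes that split normalization means scaling each partition with its own mean and SD. Appendix S1 remains the reference for the exact definitions.

```python
import random
from statistics import mean, pstdev

random.seed(0)

# Hypothetical random feature: 100 samples, 70 for training and 30 for testing.
feature = [random.gauss(3.0, 2.0) for _ in range(100)]
train, test = feature[:70], feature[70:]


def zscore(values, mu, sd):
    """Scale values to z scores with the supplied mean and SD."""
    return [(v - mu) / sd for v in values]


# Data set normalization (inconsistent partitioning): the statistics are
# computed from all samples, so test-set information reaches the training data.
mu_all, sd_all = mean(feature), pstdev(feature)
train_dataset_norm = zscore(train, mu_all, sd_all)
test_dataset_norm = zscore(test, mu_all, sd_all)

# Train normalization (consistent): the statistics come from the training
# set only and are then reused to scale the test set.
mu_train, sd_train = mean(train), pstdev(train)
train_train_norm = zscore(train, mu_train, sd_train)
test_train_norm = zscore(test, mu_train, sd_train)

# Split normalization (consistent, assumed definition): each partition is
# scaled with its own statistics, so nothing crosses the split.
mu_test, sd_test = mean(test), pstdev(test)
train_split_norm = train_train_norm
test_split_norm = zscore(test, mu_test, sd_test)
```

In the two consistent variants, no statistic that touches the training data is computed from test samples. In data set normalization, `mu_all` and `sd_all` depend on every test value.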