Pitfall 1: No predefined analysis protocol

Pitfall 2: Insufficient data for training and validation

Pitfall 3: No multiple test correction |
-
-
When a large number of features are statistically evaluated, often there is a chance of false “discovery”. Several statistical techniques, known as multiple testing corrections, are available to prevent that. No (or incorrect) multiple testing correction could induce erroneous statistical inferences and results.
|
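As a sketch, one common multiple-testing correction, the Benjamini-Hochberg procedure, can be implemented in a few lines. The function name and the illustrative p-values below are ours, not from the source:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of hypotheses rejected at false-discovery rate alpha."""
    m = len(p_values)
    # sort p-values ascending, remembering the original feature indices
    order = sorted(range(m), key=lambda i: p_values[i])
    # find the largest rank k with p_(k) <= (k / m) * alpha
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])

# illustrative example: 1000 features, the first 3 are real effects,
# the remaining 997 are noise with p-values spread over [0.04, 1)
p_values = [1e-6, 1e-5, 1e-4] + [0.04 + 0.00096 * i for i in range(997)]

rejected = benjamini_hochberg(p_values)          # only the 3 real effects survive
naive = [i for i, p in enumerate(p_values) if p < 0.05]  # also picks up noise features
```

With a naive per-feature threshold of p < 0.05, several pure-noise features would be "discovered" alongside the real effects; the correction removes them.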
Pitfall 4: No feature reduction

A large number of features increases the chance of false "discoveries". It can also induce multicollinearity and overfitting. Therefore, to avoid the curse of dimensionality, feature-reduction or feature-selection methods are required.

Pitfall 5: Overfitting

While an overfitted model performs extremely well on the initial training data set, its performance degrades substantially on other, independent data sets. Proper strategies should be applied to reduce the chance of overfitting on the training data.

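The effect can be demonstrated with a toy experiment (entirely our construction, not from the source): a 1-nearest-neighbour model memorizes noisy training labels and is perfect on the training set, while its accuracy drops on an independent set that a simpler model handles fine.

```python
import random

random.seed(0)

def true_label(x):
    """The underlying rule the model should learn."""
    return 1 if x > 0.5 else 0

# training set with noisy labels: roughly 20% of labels are flipped
train = []
for _ in range(50):
    x = random.random()
    y = true_label(x) if random.random() > 0.2 else 1 - true_label(x)
    train.append((x, y))

# independent validation set with clean labels
valid = [(x, true_label(x)) for x in (random.random() for _ in range(200))]

def predict_1nn(x):
    """Overfitted model: memorizes every (noisy) training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def predict_threshold(x):
    """Simple model: a single decision threshold."""
    return 1 if x > 0.5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

train_acc = accuracy(predict_1nn, train)    # perfect: the noise was memorized
valid_acc = accuracy(predict_1nn, valid)    # degrades on independent data
simple_acc = accuracy(predict_threshold, valid)
```

Regularization, cross-validation, and early stopping are among the standard strategies for limiting this gap.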
Pitfall 6: No locked data and information leakage

Often (parts of) the validation data are used within the training procedure. For example, features are selected on the complete data set, including the validation data, and these features are then used to train a model on the training data. Validating this model on the validation cohort would be incorrect: because the features were selected using the validation cohort, there is a possibility of information leakage and overfitting. Similarly, the validation data cannot be used for tuning hyperparameters. The validation data should always be locked and left untouched during training.

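The wrong and right orderings can be sketched as follows. The cohort, feature scorer, and split below are all hypothetical; the point is only that selection must see the training rows alone:

```python
import random

random.seed(1)

# toy cohort: 40 samples, 200 pure-noise features, random binary labels
n_samples, n_features = 40, 200
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [random.randint(0, 1) for _ in range(n_samples)]

def abs_corr(xs, ys):
    """Absolute Pearson correlation between a feature column and the labels."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = (sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)) ** 0.5
    return abs(num / den) if den else 0.0

def best_feature(rows, labels):
    """Index of the feature most correlated with the labels."""
    return max(range(n_features),
               key=lambda j: abs_corr([r[j] for r in rows], labels))

# WRONG: feature selected on ALL samples, validation cohort included --
# information from the validation cohort leaks into the model
leaky_choice = best_feature(X, y)

# RIGHT: lock the validation cohort away first, select on training rows only
train_rows, train_labels = X[:30], y[:30]
clean_choice = best_feature(train_rows, train_labels)
# the locked rows X[30:], y[30:] are touched only once, for the final evaluation
```

The same ordering applies to scaling, imputation, and hyperparameter tuning: every data-dependent step belongs inside the training side of the split.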
Pitfall 7: Not reporting appropriate performance metrics

A single performance metric is often insufficient for evaluating a model. For example, the accuracy of a classifier is sensitive to the event ratio (class distribution) of the population. A more complete evaluation is achieved by reporting multiple performance metrics: in classification studies, the AUC, sensitivity, specificity, PPV, and NPV should be reported along with the accuracy of the classifier.

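The listed metrics all follow from the confusion matrix. A minimal sketch (the function name and the imbalanced toy cohort are ours) shows how accuracy alone can mislead:

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix-based metrics for a binary classifier (labels 0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy":    (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # recall / TPR
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # TNR
        "ppv":         tp / (tp + fp) if tp + fp else float("nan"),  # precision
        "npv":         tn / (tn + fn) if tn + fn else float("nan"),
    }

# imbalanced cohort: 90 controls, 10 events; a classifier that always predicts 0
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

m = classification_metrics(y_true, y_pred)
# accuracy looks respectable (0.9), yet sensitivity is 0.0:
# the classifier never detects a single event
```

AUC additionally requires predicted scores rather than hard labels, so it is computed separately (e.g. by ranking the scores).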
Pitfall 8: Training performance reported as results

It is often observed that model performance on the training data is reported as the result of a study. Although this can provide information about the learning capability and convergence of a model, it gives no information about the model's generalizability and hence does not allow a valid evaluation. Results should emphasise model performance on independent validation cohorts only.