Pitfall 1: No predefined analysis protocol

Pitfall 2: Insufficient data for training and validation

Pitfall 3: No multiple test correction |
-
-
When a large number of features are statistically evaluated, often there is a chance of false “discovery”. Several statistical techniques, known as multiple testing corrections, are available to prevent that. No (or incorrect) multiple testing correction could induce erroneous statistical inferences and results.
|
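As a sketch, one common multiple-testing correction, the Benjamini-Hochberg procedure, can be implemented in a few lines. The function name and the illustrative p-values below are ours, not from the source:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of hypotheses rejected at false-discovery rate alpha."""
    m = len(p_values)
    # sort p-values ascending, remembering the original feature indices
    order = sorted(range(m), key=lambda i: p_values[i])
    # find the largest rank k with p_(k) <= (k / m) * alpha
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])

# illustrative example: 1000 features, the first 3 are real effects,
# the remaining 997 are noise with p-values spread over [0.04, 1)
p_values = [1e-6, 1e-5, 1e-4] + [0.04 + 0.00096 * i for i in range(997)]

rejected = benjamini_hochberg(p_values)          # only the 3 real effects survive
naive = [i for i, p in enumerate(p_values) if p < 0.05]  # also picks up noise features
```

With a naive per-feature threshold of p < 0.05, several pure-noise features would be "discovered" alongside the real effects; the correction removes them.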
Pitfall 4: No feature reduction

A large number of features increases the chance of false "discoveries". It can also induce multicollinearity and overfitting. Therefore, to avoid the curse of dimensionality, feature-reduction or feature-selection methods are required.

Pitfall 5: Overfitting

While an overfitted model performs extremely well on the initial training data set, its performance degrades substantially on other, independent data sets. Proper strategies should be applied to reduce the chance of overfitting on the training data.

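The effect can be demonstrated with a toy experiment (entirely our construction, not from the source): a 1-nearest-neighbour model memorizes noisy training labels and is perfect on the training set, while its accuracy drops on an independent set that a simpler model handles fine.

```python
import random

random.seed(0)

def true_label(x):
    """The underlying rule the model should learn."""
    return 1 if x > 0.5 else 0

# training set with noisy labels: roughly 20% of labels are flipped
train = []
for _ in range(50):
    x = random.random()
    y = true_label(x) if random.random() > 0.2 else 1 - true_label(x)
    train.append((x, y))

# independent validation set with clean labels
valid = [(x, true_label(x)) for x in (random.random() for _ in range(200))]

def predict_1nn(x):
    """Overfitted model: memorizes every (noisy) training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def predict_threshold(x):
    """Simple model: a single decision threshold."""
    return 1 if x > 0.5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

train_acc = accuracy(predict_1nn, train)    # perfect: the noise was memorized
valid_acc = accuracy(predict_1nn, valid)    # degrades on independent data
simple_acc = accuracy(predict_threshold, valid)
```

Regularization, cross-validation, and early stopping are among the standard strategies for limiting this gap.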
Pitfall 6: No locked data and information leakage

Often (parts of) the validation data are used within the training procedure. For example, features are selected on the complete data set, including the validation data, and these features are then used to train a model on the training data. Validating this model on the validation cohort would be incorrect: because the features were selected using the validation cohort, there is a possibility of information leakage and overfitting. Similarly, the validation data cannot be used for tuning hyperparameters. The validation data should always be locked and left untouched during training.

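The wrong and right orderings can be sketched as follows. The cohort, feature scorer, and split below are all hypothetical; the point is only that selection must see the training rows alone:

```python
import random

random.seed(1)

# toy cohort: 40 samples, 200 pure-noise features, random binary labels
n_samples, n_features = 40, 200
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [random.randint(0, 1) for _ in range(n_samples)]

def abs_corr(xs, ys):
    """Absolute Pearson correlation between a feature column and the labels."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = (sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)) ** 0.5
    return abs(num / den) if den else 0.0

def best_feature(rows, labels):
    """Index of the feature most correlated with the labels."""
    return max(range(n_features),
               key=lambda j: abs_corr([r[j] for r in rows], labels))

# WRONG: feature selected on ALL samples, validation cohort included --
# information from the validation cohort leaks into the model
leaky_choice = best_feature(X, y)

# RIGHT: lock the validation cohort away first, select on training rows only
train_rows, train_labels = X[:30], y[:30]
clean_choice = best_feature(train_rows, train_labels)
# the locked rows X[30:], y[30:] are touched only once, for the final evaluation
```

The same ordering applies to scaling, imputation, and hyperparameter tuning: every data-dependent step belongs inside the training side of the split.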
Pitfall 7: Not reporting appropriate performance metrics

A single performance metric is often insufficient for evaluating a model. For example, the accuracy of a classifier is sensitive to the event ratio (class distribution) of the population. A more complete evaluation is achieved by reporting multiple performance metrics: in classification studies, the AUC, sensitivity, specificity, PPV, and NPV should be reported along with the accuracy of the classifier.

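The listed metrics all follow from the confusion matrix. A minimal sketch (the function name and the imbalanced toy cohort are ours) shows how accuracy alone can mislead:

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix-based metrics for a binary classifier (labels 0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy":    (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # recall / TPR
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # TNR
        "ppv":         tp / (tp + fp) if tp + fp else float("nan"),  # precision
        "npv":         tn / (tn + fn) if tn + fn else float("nan"),
    }

# imbalanced cohort: 90 controls, 10 events; a classifier that always predicts 0
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

m = classification_metrics(y_true, y_pred)
# accuracy looks respectable (0.9), yet sensitivity is 0.0:
# the classifier never detects a single event
```

AUC additionally requires predicted scores rather than hard labels, so it is computed separately (e.g. by ranking the scores).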
Pitfall 8: Training performance reported as results

It is often observed that model performance on the training data is reported as the result of a study. Although this can provide information about the learning capability and convergence of a model, it gives no information about the model's generalizability and hence does not allow a valid evaluation. Results should emphasise model performance on independent validation cohorts only.