1 Designing the Study Plan
1.1 Describe the need for the application of machine learning to the dataset
1.2 Describe the objectives of the machine learning analysis
1.3 Define the study plan
1.4 Describe the summary statistics of the baseline data (see the example after this section)
1.5 Describe the overall steps of the machine learning workflow
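
The checklist does not prescribe any tooling for item 1.4; as a purely illustrative sketch, baseline summary statistics could be tabulated with pandas as below (the file name and the split between continuous and categorical columns are assumptions for the example).

```python
import pandas as pd

# Hypothetical baseline table; the file name is a placeholder.
baseline = pd.read_csv("baseline.csv")

# Continuous variables: count, mean, standard deviation, quartiles (item 1.4).
print(baseline.describe())

# Categorical variables: relative frequencies.
for col in baseline.select_dtypes(include="object").columns:
    print(baseline[col].value_counts(normalize=True))
```
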
2 Data Standardization, Feature Engineering, and Learning
2.1 Describe how the data were processed to make them clean, uniform, and consistent
2.2 Describe whether variables were normalized and, if so, how this was done (see the example after this section)
2.3 Provide details on the fraction of missing values (if any) and the imputation methods used
2.4 Describe any feature selection processes applied
2.5 Identify and describe the process used to handle outliers, if any
2.6 Describe whether class imbalance existed and which method was applied to deal with it
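
As one illustration of items 2.2-2.4 and 2.6, the sketch below assumes a Python/scikit-learn workflow; the synthetic data, the median imputation, the standard scaling, and the choice of k = 5 features are example assumptions rather than recommendations of the checklist.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the study data with roughly 5% missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[rng.random(X.shape) < 0.05] = np.nan
y = rng.integers(0, 2, size=200)

preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),         # imputation method (item 2.3)
    ("scale", StandardScaler()),                          # normalization (item 2.2)
    ("select", SelectKBest(score_func=f_classif, k=5)),   # feature selection (item 2.4)
])
X_prep = preprocess.fit_transform(X, y)

print("Fraction of missing values:", np.isnan(X).mean())  # report per item 2.3
print("Class balance:", np.bincount(y) / len(y))          # check for imbalance (item 2.6)
print("Shape after preprocessing:", X_prep.shape)
```
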
3 Selection of Machine Learning Models
3.1 Explicitly define the goal of the analysis (e.g., regression, classification, clustering)
3.2 Identify the learning method used (e.g., supervised or reinforcement learning) to address the problem
3.3 Provide explicit details on the use of simpler, complex, or ensemble models
3.4 Provide a comparison of complex models against simpler models, if possible (see the example after this section)
3.5 Define ensemble methods, if used
3.6 Provide details on whether the model is interpretable
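
For items 3.3-3.5, one possible comparison of a simpler, interpretable model against an ensemble is sketched below, again assuming scikit-learn; the two estimators, the synthetic data, and the 5-fold cross-validation with AUC scoring are illustrative choices only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the study data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "logistic regression (simpler, interpretable)": LogisticRegression(max_iter=1000),
    "random forest (ensemble)": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Item 3.4: keep the complex model only if it clearly outperforms the simpler one.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```
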
4 Model Assessment
4.1 Provide a clear description of the data used for training, validation, and testing
4.2 Describe how the model parameters were optimized, e.g., the optimization technique and the number of model parameters (see the example after this section)
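
Items 4.1 and 4.2 could be documented alongside code such as the following sketch, which assumes a scikit-learn grid search; the 80/20 split, the number of cross-validation folds, and the hyperparameter grid are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Item 4.1: stratified 80/20 split; cross-validation inside the grid search
# serves as the validation stage, and the held-out 20% is used only for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Item 4.2: optimization technique (grid search over the regularization strength C).
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

print("Best hyperparameters:", search.best_params_)
print("Test AUC:", search.score(X_test, y_test))
```
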
5 Model Evaluation
5.1 Provide the metric(s) used to evaluate the performance of the model (see the example after this section)
5.2 Define the prevalence of disease and the choice of the scoring rule used
5.3 Report any methods used to balance the numbers of subjects in each class
5.4 Discuss the risks associated with misclassification
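
A minimal sketch of items 5.1, 5.2, and 5.4 is given below, assuming scikit-learn metrics and predicted probabilities from an already fitted classifier; the ten hand-written labels and probabilities and the 0.5 decision threshold are purely illustrative.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, confusion_matrix, roc_auc_score

# Hypothetical test labels and predicted probabilities from a fitted model.
y_test = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.1, 0.3, 0.8, 0.2, 0.6, 0.9, 0.4, 0.1, 0.7, 0.5])

print("Prevalence:", y_test.mean())                      # item 5.2
print("ROC AUC:", roc_auc_score(y_test, y_prob))         # discrimination (item 5.1)
print("Brier score:", brier_score_loss(y_test, y_prob))  # proper scoring rule (item 5.2)

# Item 5.4: the confusion matrix makes the misclassification trade-off explicit.
y_pred = (y_prob >= 0.5).astype(int)
print(confusion_matrix(y_test, y_pred))
```
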
6 Best Practices for Model Replicability
6.1 Consider sharing code or scripts in a public repository, with appropriate copyright protection steps, for further development and non-commercial use
6.2 Release a data dictionary with an appropriate explanation of the variables
6.3 Document the versions of all software and external libraries used (see the example after this section)
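
For item 6.3, versions can be captured programmatically rather than by hand; the Python snippet below is one way to do so, and the listed packages are examples only.

```python
import platform
from importlib.metadata import version

# Item 6.3: record the interpreter and library versions used for the analysis.
print("Python:", platform.python_version())
for pkg in ["numpy", "pandas", "scikit-learn"]:  # example packages only
    print(pkg, version(pkg))
```

Exporting the full environment (for example with pip freeze or a conda environment file) alongside the shared code of item 6.1 serves the same purpose.
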
7 Reporting Limitations, Biases and Alternatives
7.1 Identify and report the relevant model assumptions and findings
7.2 If well-performing models were tested on a hold-out validation dataset, describe the data of that validation set with the same rigor as the training dataset (see Section 2 above and the example after this section)
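
As a sketch of item 7.2, the hold-out set can be summarized with the same statistics used for the training data (Section 2); the pandas comparison below assumes both splits are available as data frames, and the variable names and values are hypothetical.

```python
import pandas as pd

# train_df and holdout_df stand in for the training and hold-out splits;
# the variables (age, bmi) and their values are hypothetical.
train_df = pd.DataFrame({"age": [54, 61, 47, 70], "bmi": [24.1, 30.2, 27.5, 22.8]})
holdout_df = pd.DataFrame({"age": [58, 49, 66], "bmi": [26.0, 29.3, 23.4]})

# Item 7.2: report both splits with the same rigor, side by side.
summary = pd.concat(
    {"training": train_df.describe(), "hold-out": holdout_df.describe()}, axis=1
)
print(summary)
```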