Author manuscript; available in PMC: 2021 Mar 12.
Published in final edited form as: JACC Cardiovasc Imaging. 2020 Sep;13(9):2017–2035. doi: 10.1016/j.jcmg.2020.07.015

TABLE 1.

Checklist for Standardized Reporting of Machine Learning Investigations

Section | Checklist item
1 Designing the Study Plan
1.1 Describe the need for the application of machine learning to the dataset
1.2 Describe the objectives of the machine learning analysis
1.3 Define the study plan
1.4 Describe the summary statistics of baseline data
1.5 Describe the overall steps of the machine learning workflow
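Item 1.4 calls for summary statistics of the baseline data. A minimal sketch of what such a report might look like, using pandas; the dataset and column names (`age`, `lvef`, `outcome`) are invented for illustration:

```python
import pandas as pd

# Hypothetical baseline dataset; values and column names are illustrative only.
baseline = pd.DataFrame({
    "age": [54, 61, 47, 68, 59, 72],
    "lvef": [58, 42, 61, 35, 55, 40],   # left ventricular ejection fraction (%)
    "outcome": [0, 1, 0, 1, 0, 1],      # 1 = event occurred
})

# Item 1.4: summary statistics of the baseline data.
summary = baseline.describe().T[["mean", "std", "min", "max"]]
print(summary)
```

In a real study these descriptors would be stratified by outcome group and accompanied by counts and percentages for categorical variables.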
2 Data Standardization, Feature Engineering, and Learning
2.1 Describe how the data were processed to make them clean, uniform, and consistent
2.2 Describe whether variables were normalized and, if so, how this was done
2.3 Provide details on the fraction of missing values (if any) and imputation methods
2.4 Describe any feature selection processes applied
2.5 Identify and describe the process to handle outliers if any
2.6 Describe whether class imbalance existed, and which method was applied to deal with it
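Items 2.1–2.3 can be documented concretely by reporting the preprocessing pipeline itself. A minimal scikit-learn sketch; the toy feature matrix `X` is invented for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy feature matrix with one missing value, to illustrate items 2.1-2.3.
X = np.array([[1.0, 200.0],
              [2.0, np.nan],   # missing value to be imputed
              [3.0, 180.0],
              [4.0, 220.0]])

# Impute missing values with the median (item 2.3), then z-score
# normalize each variable (item 2.2).
prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
X_clean = prep.fit_transform(X)
```

For item 2.6, class imbalance could be addressed inside the same workflow, for example via a classifier's `class_weight="balanced"` option or by resampling the training data.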
3 Selection of Machine Learning Models
3.1 Explicitly define the goal of the analysis (e.g., regression, classification, or clustering)
3.2 Identify the appropriate learning method used to address the problem (e.g., supervised learning, reinforcement learning)
3.3 Provide explicit details on the use of simpler, complex, or ensemble models
3.4 Provide the comparison of complex models against simpler models if possible
3.5 Define ensemble methods, if used
3.6 Provide details on whether the model is interpretable
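Item 3.4 asks for a comparison of complex models against simpler ones where possible. A hedged sketch of such a comparison using scikit-learn; the synthetic dataset stands in for real study data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Item 3.4: complex ensemble vs. a simpler, interpretable model.
simple = LogisticRegression(max_iter=1000)
ensemble = RandomForestClassifier(n_estimators=100, random_state=0)

auc_simple = cross_val_score(simple, X, y, cv=5, scoring="roc_auc").mean()
auc_ensemble = cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean()
```

If the simpler model performs comparably, its interpretability (item 3.6) is a strong argument for preferring it.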
4 Model Assessment
4.1 Provide a clear description of data used for training, validation, and testing
4.2 Describe how the model parameters were optimized (e.g., optimization technique, number of model parameters)
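Items 4.1 and 4.2 together describe a held-out test split plus a documented parameter search. A minimal sketch with scikit-learn; the dataset and the grid over regularization strength `C` are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Item 4.1: hold out a test set that plays no role in model development.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Item 4.2: optimize model parameters by cross-validated grid search
# on the training data only.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5, scoring="roc_auc")
search.fit(X_train, y_train)
test_auc = search.score(X_test, y_test)
```

Reporting the grid searched and the selected values (`search.best_params_`) satisfies the transparency this item asks for.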
5 Model Evaluation
5.1 Provide the metric(s) used to evaluate the performance of the model
5.2 Define the prevalence of disease and the choice of the scoring rule used
5.3 Report any methods used to balance the numbers of subjects in each class
5.4 Discuss the risks associated with misclassification
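Items 5.1 and 5.2 cover the evaluation metric, disease prevalence, and the choice of scoring rule. A small sketch with invented labels and predicted probabilities, reporting discrimination (AUC) alongside a proper scoring rule (the Brier score):

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

# Hypothetical true labels and predicted probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 0, 0])
y_prob = np.array([0.1, 0.2, 0.15, 0.3, 0.8, 0.35, 0.4, 0.9, 0.05, 0.2])

prevalence = y_true.mean()                # item 5.2: disease prevalence
auc = roc_auc_score(y_true, y_prob)       # discrimination
brier = brier_score_loss(y_true, y_prob)  # proper scoring rule (calibration)
```

Because accuracy alone is misleading at low prevalence, pairing a discrimination metric with a proper scoring rule gives a more honest picture of performance.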
6 Best Practices for Model Replicability
6.1 Consider sharing code or scripts on a public repository with appropriate copyright protection steps for further development and non-commercial use
6.2 Release a data dictionary with appropriate explanation of the variables
6.3 Document the version of all software and external libraries used
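Item 6.3 can be met programmatically by emitting the software environment alongside the results. A minimal sketch assuming numpy and scikit-learn are the external libraries in use:

```python
import platform

import numpy
import sklearn

# Item 6.3: record the exact software environment used for the analysis.
environment = {
    "python": platform.python_version(),
    "numpy": numpy.__version__,
    "scikit-learn": sklearn.__version__,
}
for name, version in environment.items():
    print(f"{name}=={version}")
```

The same information can be captured for an entire environment with `pip freeze` and released with the code and data dictionary (items 6.1 and 6.2).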
7 Reporting Limitations, Biases and Alternatives
7.1 Identify and report the relevant model assumptions and findings
7.2 If well-performing models were tested on a hold-out validation dataset, describe that validation set with the same rigor as the training dataset (see section 2 above)