2020 Nov 19;31(6):3909–3922. doi: 10.1007/s00330-020-07417-0

Table 1.

Checklist of items to include when reporting ML studies

1. Which clinical problem is being solved?
  □ Which patients or disease does the study concern?
  □ How can ML improve upon existing diagnostic or prognostic approaches?
  □ Which stage of the diagnostic pathway is investigated?
2. Choice of ML model
  □ Which ML model is used?
  □ Which measures are taken to avoid overfitting?
3. Sample size motivation
  □ Is the sample size clearly motivated?
  □ Which considerations were used to prespecify a sample size?
  □ Is there a statistical analysis plan?
4. Specification of study design and training, validation, and testing datasets
  □ Is the study prospective or retrospective?
  □ What were the inclusion and exclusion criteria?
  □ How many patients were included for training, validation, and testing?
  □ Was the test dataset kept separate from the training and validation datasets?
  □ Was an external dataset used for validation?*
  □ Who performed external validation?
5. Standard of reference
  □ What was the standard of reference?
  □ Were existing labels used, or were labels newly created for the study?
  □ How many observers contributed to the standard of reference?
  □ Were observers blinded to the output of the ML algorithm and to labels of other observers?
6. Reporting of results
  □ Which measures are used to report diagnostic or prognostic accuracy?
  □ Which other measures are used to express agreement between the ML algorithm and the standard of reference?
  □ Are contingency tables given?
  □ Are confidence estimates given?
7. Are the results explainable?
  □ Is it clear how the ML algorithm came to a specific classification or recommendation?
  □ Which strategies were used to investigate the algorithm’s internal logic?
8. Can the results be applied in a clinical setting?
  □ Is the dataset representative of the clinical setting in which the model will be applied?
  □ What are significant sources of bias?
  □ For which patients can it be used clinically?
  □ Can the results be implemented at the point of care?
9. Is the performance reproducible and generalizable?
  □ Has reproducibility been studied?
  □ Has the ML algorithm been validated externally?
  □ Which sources of variation have been studied?
10. Is there any evidence that the model has an effect on patient outcomes?
  □ Has an effect on patient outcomes been demonstrated?
11. Is the code available?
  □ Is the software code available? Where is it stored?
  □ Is the fully trained ML model available or should the algorithm be retrained with new data?
  □ Is there a mechanism to study the algorithm’s results over time?

*Data from another institute or hospital
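
As an illustration of item 6 (contingency tables and confidence estimates), the following is a minimal sketch rather than a prescribed method. It assumes binary labels where 1 means disease present according to the standard of reference, and it uses the Wilson score interval as one possible choice of confidence estimate; the checklist does not mandate a specific interval. The function names and the example labels are hypothetical.

```python
import math


def contingency_table(y_true, y_pred):
    """Count the four cells of a 2x2 table for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 1 and pred == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical labels, for illustration only (not data from any study).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # standard of reference
algorithm = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # ML algorithm output

table = contingency_table(reference, algorithm)
sens_n = table["tp"] + table["fn"]   # all reference-positive cases
spec_n = table["tn"] + table["fp"]   # all reference-negative cases

print("Contingency table:", table)
print("Sensitivity:", table["tp"] / sens_n, wilson_interval(table["tp"], sens_n))
print("Specificity:", table["tn"] / spec_n, wilson_interval(table["tn"], spec_n))
```

Reporting the raw cell counts alongside the intervals lets readers recompute any derived measure. With small test sets the intervals are wide, which also bears on the sample size questions in item 3.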