J Nucl Med. 2022 Apr;63(4):500–510. doi: 10.2967/jnumed.121.262567

TABLE 2.

Summary of Recommendations

| Category | Topic | Recommendation |
| --- | --- | --- |
| Study design | Task definition | Collaborate with domain experts and stakeholders |
| | Study types | Identify publications as development studies or evaluation studies |
| | Risk assessment | Assess the degree of risk that the algorithm poses to patients and conduct the study accordingly |
| | Statistical plan | Preregister statistical analysis plans for prospective studies |
| Data collection | Bias anticipation | Collect data from classes or groups that are vulnerable to bias |
| | Training set size estimation | Estimate size on the basis of trial and error or prior similar studies |
| | Evaluation set size estimation* | Use statistical power analysis for guidance (see the power-analysis sketch after the table) |
| | Data decisions | Use justified, objective, and documented inclusion and exclusion criteria |
| Data labeling | Reference standard | Use labels that are regarded as sufficient standards of reference by the field |
| | Label quality | Justify label quality by application, study type, and clinical claim (Fig. 4) |
| | Labeling guide* | Produce a detailed guide for labelers in reader studies |
| | Quantity/quality tradeoff | Consider using multiple labelers per case (quality) rather than labeling a greater number of cases (quantity) |
| Model design | Model comparison* | Explore and compare different models for development studies |
| | Baseline comparison | Compare complex models with simpler models or the standard of care |
| | Model selection | Report model selection and hyperparameter tuning techniques |
| | Model stability | Use repeated training with random initialization when feasible (see the stability sketch after the table) |
| | Ablation study* | Perform ablation studies for development studies focusing on novel architectures |
| Model training | Cross-validation* | Use cross-validation for development studies; preserve the data distribution across splits (see the cross-validation sketch after the table) |
| | Data leakage | Avoid information leaks from the test set during model training |
| Model testing and interpretability | Test set | Use the same data and class distribution as the target population; use high-quality labels |
| | Target population | Explicitly define the target population |
| | External sets | Use external sets to evaluate model sensitivity to dataset shift |
| | Evaluation metric | Use multiple metrics when appropriate; visually inspect model outputs |
| | Model interpretability* | Use interpretability methods for clinical tasks |
| Reporting and dissemination | Reporting | Follow published reporting guidelines and checklists |
| | Sharing* | Make code and models from development studies accessible |
| | Transparency | Be forthcoming about failure modes and about population characteristics in training and evaluation sets |
| | Reproducibility checks | Ensure that materials submitted to journals are sufficient for replication |
| Evaluation† | | |

*Not all recommendations are applicable to all types of studies.

†Addressed in a separate report from the AI Task Force.
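
The recommendation to size the evaluation set with a statistical power analysis can be carried out with standard tools. Below is a minimal sketch assuming a binary classification task and a two-sample comparison of sensitivities; the performance figures, significance level, desired power, and the use of statsmodels are illustrative assumptions, not values from the article.

```python
# Hypothetical power analysis for sizing an evaluation set (binary task).
# Assumed scenario: detect an improvement in sensitivity from 0.80 to 0.88.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_sensitivity = 0.80   # assumed comparator performance
expected_sensitivity = 0.88   # assumed performance of the new model
alpha = 0.05                  # two-sided significance level
power = 0.80                  # desired statistical power

# Cohen's h effect size for the difference between two proportions.
effect_size = proportion_effectsize(expected_sensitivity, baseline_sensitivity)

# Positive cases needed per group for a two-sample comparison of proportions.
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power, alternative="two-sided"
)
print(f"~{n_per_group:.0f} positive cases per group")
```

Paired designs or other metrics (e.g., AUC) call for different tests and power calculations, so the chosen approach belongs in the preregistered statistical plan.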
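For the model-stability recommendation, one simple approach is to retrain the identical configuration under several random seeds and report the spread of the evaluation metric. The sketch below uses synthetic data and a small scikit-learn network purely as placeholders for a real pipeline.

```python
# Illustrative stability check: repeated training with different random
# initializations, reporting the mean and spread of the test-set metric.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

scores = []
for seed in range(5):  # same architecture, different random initialization
    model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed)
    model.fit(X_train, y_train)
    scores.append(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

print(f"AUC {np.mean(scores):.3f} +/- {np.std(scores):.3f} over {len(scores)} runs")
```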
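The cross-validation and data-leakage recommendations can be addressed together: stratified folds preserve the class distribution across splits, and wrapping preprocessing in a pipeline ensures it is fit only on each fold's training portion, so no information from held-out data leaks into model fitting. The dataset, model, and fold count below are illustrative assumptions.

```python
# Sketch of stratified k-fold cross-validation without preprocessing leakage.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.8, 0.2],
                           random_state=0)

# The scaler is refit on the training portion of each fold via the pipeline,
# so the held-out fold never influences preprocessing statistics.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # preserves class ratio
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("Per-fold AUC:", scores.round(3), "mean:", scores.mean().round(3))
```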