2024 Dec 21;184(1):98. doi: 10.1007/s00431-024-05925-5

Table 5.

Recommended elements to report in machine learning-based studies

Step | Rationale/general description
Preprocessing

Exploratory data analysis/descriptive statistics

- Cohort characteristics: Describe the cohort/sample used in detail. Present the initial participation rate and relevant background factors to allow evaluation of generalizability and representativeness

- Analysis subset characteristics and comparison: If a subset of the cohort/sample was used for the analysis, compare background factors between the full cohort/sample and the subset

- Correlation analysis: Report correlations between variables (e.g., through a correlation matrix). Understanding these relationships can guide feature engineering and model selection, as well as reveal potential (multi)collinearity issues

- Missingness: Visualize the degree of missingness across subjects and variables, preferably together with a measure of how missingness patterns co-occur between variables
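A minimal Python sketch of the exploratory reporting described above, combining a correlation matrix with per-variable missingness. Variable names and data are synthetic and purely illustrative; pandas is assumed to be available.

```python
import numpy as np
import pandas as pd

# Toy cohort data; column names are hypothetical examples
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(10, 3, 200),
    "bmi": rng.normal(18, 2, 200),
    "crp": rng.lognormal(1, 0.5, 200),
})
df.loc[rng.choice(200, 30, replace=False), "crp"] = np.nan  # inject missingness

# Rank correlations between variables (report e.g. as a heatmap/table)
corr = df.corr(method="spearman")

# Percent missing per variable, for the missingness report
missing_pct = df.isna().mean() * 100

print(corr.round(2))
print(missing_pct.round(1))
```

In a published study, the correlation matrix and missingness summary would typically be shown as figures or supplementary tables rather than console output.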

Outlier/invalid data management: Describe the presence/degree of outliers and/or data deemed invalid, and whether any processing of these was performed. Visualization is particularly useful, e.g., with simple box plots. If possible, provide code/syntax (in a repository)
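One common screening rule matching the box-plot visualization mentioned above is Tukey's 1.5 × IQR criterion; a minimal sketch with toy values follows. The threshold and data are illustrative only, and flagged values should be reviewed, not silently deleted.

```python
import numpy as np

# Toy lab values; the last entry is an implausibly extreme measurement
values = np.array([4.1, 4.5, 4.8, 5.0, 5.2, 5.5, 5.9, 19.0])

# Tukey/box-plot rule: flag values beyond 1.5 * IQR from the quartiles
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(outliers)  # flag for review; document any exclusion decision
```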

Management of missingness: Visualize/tabulate missingness and patterns thereof. Provide the rationale for using a particular imputation algorithm or other approach, including details (preferably including plots) on evaluation/validation of the imputation. If possible, provide code/syntax (in a repository)
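A hedged sketch of the imputation-validation idea above: mask known values, impute, and measure reconstruction error on the masked cells. Median imputation via scikit-learn's `SimpleImputer` is used purely as an example, not as a recommended algorithm.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Synthetic complete data, then artificially mask ~10% of values
rng = np.random.default_rng(1)
X = rng.normal(50, 10, size=(100, 3))
mask = rng.random(X.shape) < 0.1
X_missing = X.copy()
X_missing[mask] = np.nan

# Illustrative imputer choice; report the rationale for the real one
imputer = SimpleImputer(strategy="median")
X_imputed = imputer.fit_transform(X_missing)

# Validation: error on the cells whose true values are known
mae = np.abs(X_imputed[mask] - X[mask]).mean()
print(f"MAE on masked cells: {mae:.2f}")
```

The same masking scheme can be repeated for several candidate imputers to justify the final choice, as the table recommends.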

Feature selection: Provide explicit rationale for the variables used. Ideally, add a table (in the supplementary material) listing the reason(s) for inclusion/exclusion of each potentially relevant variable. Importantly, feature selection processes should be described in detail, including a narrative summary and the output of data-driven approaches and tables. If data-driven methods were used, provide code/syntax (in a repository)
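If a data-driven method is used, its per-variable output can be logged as suggested above. A minimal sketch with a univariate F-test screen (scikit-learn's `SelectKBest`); the data are synthetic and `k=3` is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for a clinical dataset
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           random_state=0)

# Univariate F-test screen; k is an illustrative choice
selector = SelectKBest(f_classif, k=3).fit(X, y)

# Log score and decision for every candidate variable (for the supplement)
for i, (score, keep) in enumerate(zip(selector.scores_,
                                      selector.get_support())):
    print(f"feature_{i}: F={score:.1f} kept={keep}")
```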

Feature scaling: Report whether the variables were entered as-is or whether any scaling was performed (preferably providing code/syntax)
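A minimal sketch of one common choice, z-score standardisation. In a real pipeline the scaler should be fitted on the training split only; the data here are toy values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data with very different variable ranges
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Fit on training data only in practice, then transform all splits
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

print(X_scaled.mean(axis=0), X_scaled.std(axis=0))  # ~0 mean, unit variance
```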

Dimensionality reduction: Report the tools and hyperparameters/settings used (preferably providing the actual code/syntax), together with details on the percentage of variance explained in the reduced subspace, loss, or other relevant information to assess the performance/representativeness of the reduced data
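For PCA, the variance-explained figure requested above comes directly from the fitted model; a short sketch with synthetic data (the choice of three components is arbitrary).

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a high-dimensional feature matrix
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 10))

# Illustrative reduction to 3 components
pca = PCA(n_components=3).fit(X)
explained = pca.explained_variance_ratio_

# Report per-component and cumulative variance explained
print(f"variance explained by 3 components: {explained.sum():.1%}")
```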

Model training / evaluation

Algorithm selection: Describe the rationale for the selection of models. Preferably, select at least two models to assess the robustness of the chosen solution

Model implementation: Explain in detail how the model(s) were implemented, which hyperparameter settings were tested, and the underlying rationale. Preferably, provide the actual implementation code/syntax (in a repository)
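A sketch of how a full hyperparameter log can be produced: scikit-learn's `GridSearchCV` records every tested setting in `cv_results_`, which can be exported as the transparency table the next item asks for. The model, grid values, and data are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary outcome data
X, y = make_classification(n_samples=200, random_state=0)

# Illustrative model and grid; report the rationale for the real ones
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.01, 0.1, 1.0, 10.0]},
                    cv=5, scoring="roc_auc")
grid.fit(X, y)

# Full log of every tested setting, suitable for a repository/supplement
log = pd.DataFrame(grid.cv_results_)[["param_C", "mean_test_score",
                                      "std_test_score"]]
print(log)
```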

Model evaluation: Provide a detailed log of the model(s) with different hyperparameters, so as to make the selection of the optimal solution transparent and clear for the reader. For example, if a cluster analysis was performed and the 3-, 4-, and 5-cluster solutions were assessed as the top three models, provide clinical characteristics and evaluation metrics for at least these (and preferably for all tested solutions)
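The cluster-analysis example above can be logged as sketched here: one evaluation metric (the silhouette score, as an illustration) recorded for every tested number of clusters. Data are synthetic blobs.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with a known group structure
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Log a metric for every tested solution, not only the winner
results = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    results[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette={results[k]:.3f}")
```

In a paper, this log would be paired with the clinical characteristics of each candidate solution, as the table recommends.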

Interpretation

Characteristics: Most relevant for unsupervised analyses. Provide rich details on the subgroups, including parameters both included in and omitted from the model (e.g., background factors, comorbidities, sociodemographic factors, etc.)

Influencing factors/explanation of the model: Provide as much detail as possible on how the model derived its output, e.g., feature importance
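One model-agnostic way to report feature importance, as mentioned above, is permutation importance; a minimal sketch with a synthetic dataset and an illustrative random-forest model follows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data; in practice use a held-out set for the importances
X, y = make_classification(n_samples=200, n_features=5, n_informative=2,
                           random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Drop in performance when each feature is shuffled = its importance
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, m in enumerate(imp.importances_mean):
    print(f"feature_{i}: {m:.3f}")
```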

Uncertainty in findings: Describe the uncertainty in the model (e.g., 95% confidence intervals of the predictions or subject characteristics)
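A common way to obtain the confidence intervals mentioned above is the percentile bootstrap; a short sketch for a performance metric (here simply the mean of toy per-fold scores, with 1,000 resamples).

```python
import numpy as np

# Toy per-fold performance scores (e.g. AUCs); synthetic values
rng = np.random.default_rng(3)
scores = rng.normal(0.80, 0.05, 100)

# Percentile bootstrap: resample with replacement, recompute the metric
boot = [rng.choice(scores, size=scores.size, replace=True).mean()
        for _ in range(1000)]
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"mean={scores.mean():.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```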

External validation: Provide an analysis of the generalizability of the results, preferably by externally validating the model in a different cohort/sample
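The core of external validation is that the model is fitted on the development cohort and then applied unchanged to a second cohort; a sketch below, where both "cohorts" are synthetic stand-ins (a real external sample would of course come from an independent source, and performance may drop substantially).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Development cohort and a synthetic stand-in for an external cohort
X_dev, y_dev = make_classification(n_samples=300, random_state=0)
X_ext, y_ext = make_classification(n_samples=300, random_state=1)

# Fit once on the development data; no refitting on the external data
model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)

# Report discrimination (and calibration) in the external cohort
auc_ext = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
print(f"external AUC: {auc_ext:.3f}")
```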

Limitations: Could the analyses have been done differently in an optimal setting? Transparently describe challenges and drawbacks/compromises