| Domain | Key items |
| --- | --- |
| Study information | Study identifier (last name of first author and publication year), citation |
| | Development with/without internal validation and/or external validation |
| Source of data | Cohort, case-control, randomised trial, registry, routine care data |
| | Retrospective/prospective data collection |
| Participants | Inclusion and exclusion criteria |
| | Recruitment method and details (location, number of centres, setting) |
| | Participant description (including disease duration, type of MS at prognostication, diagnostic criteria used) |
| | Details of treatments received |
| | Study dates |
| Outcomes to be predicted | Definition and method of measurement of outcome |
| | Category of outcome measure (conversion to definite MS, conversion to progressive MS, relapses, progression score, composite) |
| | Was the same outcome definition and method of measurement used in all participants? |
| | Was the outcome assessed without knowledge of the candidate predictors (i.e. blinded)? |
| | Were candidate predictors part of the outcome? |
| | Time of outcome occurrence or summary of duration of follow-up |
| Candidate predictors | Number and type of predictors (e.g. demographics, symptoms, scores, CSF, imaging, electrophysiological, -omics, disease type) |
| | Definition and method of measurement of candidate predictors |
| | Timing of predictor measurement (e.g. at patient presentation, at diagnosis, at predefined intervals) |
| | Were predictors assessed blinded to outcome (if relevant)? |
| | Handling of predictors in the modelling (e.g. continuous, linear, (fractional) polynomial/spline/non-linear transformations, categorisation) |
| Sample size | |
| Missing data | Number of participants with any missing value |
| | Number of participants with missing values for each predictor |
| | Method for handling missing data (e.g. complete-case analysis, single imputation, multiple imputation; see the imputation sketch after the table) |
| | Loss to follow-up discussed? |
| Model development | Modelling method (e.g. logistic, survival, neural network, specific machine learning technique) |
| | Modelling assumptions satisfied? |
| | Was it a longitudinal model? |
| | Method for selection of predictors for inclusion in multivariable modelling (e.g. all candidate predictors, pre-selection based on unadjusted association with the outcome) |
| | Method for selection of predictors during multivariable modelling (e.g. full model approach, backward selection, forward selection) |
| | Criteria used for selection of predictors during multivariable modelling (e.g. P value, AIC, BIC) |
| | Shrinkage of predictor weights/regression coefficients (e.g. no shrinkage, uniform shrinkage, penalized estimation; see the shrinkage sketch after the table) |
| Model performance | Measure and estimate of calibration, with confidence intervals (calibration plot, calibration slope, Hosmer-Lemeshow test) |
| | Measure and estimate of discrimination, with confidence intervals (C-statistic, D-statistic) |
| | Log-rank test used for discrimination (yes, no, not applicable) |
| | Measure and estimate of classification, with confidence intervals (sensitivity, specificity, positive and negative predictive value, net reclassification, accuracy ((TP + TN)/N), error rate (1 - accuracy)) |
| | Were a priori cut points used for classification measures? (yes, no, not reported, not applicable) |
| | Overall performance (R^2, Brier score; see the performance sketch after the table) |
| Model evaluation | For model development, whether model performance was tested on the development dataset only or on a separate external validation dataset |
| | For model development, method used for testing model performance on the development dataset (random split of the data, resampling methods such as bootstrapping or cross-validation, none; see the validation sketch after the table) |
| | In case of poor validation, was the model adjusted or updated (e.g. intercept recalibrated, predictor effects adjusted, new predictors added)? |
| Results | Multivariable model presented (e.g. basic, extended, simplified), including predictor weights/regression coefficients, intercept and baseline survival, with standard errors or confidence intervals |
| | Any alternative presentation of the final prediction models (e.g. sum score, nomogram, score chart, predictions for specific risk subgroups with performance) |
| | Details of how risk groups were created (if done) and the observed values at which they occur |
| | Comparison of the distribution of predictors (including missing data) between the development and validation datasets |
| | If validation, is the same model used as presented in development (same intercept and weights, no dropping of variables)? |
| Interpretation and discussion | Aim according to the authors (abstract, discussion) |
| | Was the primary aim prediction of individual patient outcomes? |
| | Are the models interpreted as confirmatory (model useful for practice) or exploratory (more research is needed)? |
| | Comparisons made with other studies, discussion of generalisability, strengths, and limitations |
| | Suggested improvements for the future |
| Model readiness for practical use | Sufficient explanation to allow for further use, availability of predictors, external validation, prediction intervals, timing |
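**Imputation sketch.** To make the multiple-imputation item concrete, the following is a minimal sketch using scikit-learn's `IterativeImputer` (a MICE-style chained-equations imputer). The data and variable names are invented for illustration and do not come from any included study; pooling of the analyses across imputed datasets (Rubin's rules) is not shown.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical predictor matrix with missing values (np.nan).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.1] = np.nan  # roughly 10% missing at random

# MICE-style chained-equations imputation; sample_posterior=True draws from
# the predictive distribution, as required for proper multiple imputation.
imputed_datasets = [
    IterativeImputer(sample_posterior=True, random_state=i).fit_transform(X)
    for i in range(5)  # five imputed datasets, each analysed, then pooled
]
```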
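**Shrinkage sketch.** The shrinkage item refers to techniques such as penalized estimation. Below is a minimal sketch of ridge (L2) penalized logistic regression with the penalty strength chosen by cross-validation, assuming scikit-learn and synthetic data; it illustrates the general technique, not any specific model from the review.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Hypothetical data: 300 patients, 8 candidate predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
logit = X[:, 0] - 0.5 * X[:, 1]               # only two predictors truly matter
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Penalized estimation: coefficients are shrunk toward zero to counter
# overfitting; the penalty is selected by 5-fold cross-validation.
model = LogisticRegressionCV(Cs=10, penalty="l2", cv=5, max_iter=1000).fit(X, y)
print(model.coef_)       # shrunken predictor weights
print(model.intercept_)  # model intercept
```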
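**Performance sketch.** The performance items (calibration slope, C-statistic, Brier score, classification measures) can all be computed from predicted probabilities and observed outcomes. The following is a minimal sketch assuming numpy, statsmodels, and scikit-learn, with invented data; it shows the standard definitions, not any study's actual analysis.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score, brier_score_loss

# Hypothetical predicted probabilities and observed binary outcomes.
rng = np.random.default_rng(0)
p_hat = rng.uniform(0.05, 0.95, 500)
y = rng.binomial(1, p_hat)

# Discrimination: C-statistic (for a binary outcome this equals the ROC AUC).
c_stat = roc_auc_score(y, p_hat)

# Calibration slope: regress the outcome on the linear predictor (logit of
# the predicted probabilities); a slope of 1 indicates good calibration.
lp = np.log(p_hat / (1 - p_hat))
slope = sm.Logit(y, sm.add_constant(lp)).fit(disp=0).params[1]

# Overall performance: Brier score (mean squared error of the probabilities).
brier = brier_score_loss(y, p_hat)

# Classification measures at an a priori cut point of 0.5.
pred = (p_hat >= 0.5).astype(int)
tp, tn = np.sum((pred == 1) & (y == 1)), np.sum((pred == 0) & (y == 0))
fp, fn = np.sum((pred == 1) & (y == 0)), np.sum((pred == 0) & (y == 1))
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv, npv = tp / (tp + fp), tn / (tn + fn)
accuracy = (tp + tn) / len(y)    # (TP + TN) / N
error_rate = 1 - accuracy        # 1 - accuracy
print(c_stat, slope, brier, sensitivity, specificity, ppv, npv, accuracy)
```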
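**Validation sketch.** For the internal-validation item, one common resampling method is Harrell-style bootstrap optimism correction: refit the model on bootstrap samples and subtract the average optimism from the apparent performance. This is a minimal sketch assuming scikit-learn and a logistic model; the function name is ours, chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def optimism_corrected_auc(X, y, n_boot=200, seed=0):
    """Harrell-style bootstrap internal validation of the C-statistic."""
    rng = np.random.default_rng(seed)
    apparent = roc_auc_score(
        y, LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    )
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))  # resample with replacement
        if len(np.unique(y[idx])) < 2:
            continue  # skip degenerate resamples with a single outcome class
        m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
        auc_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
        optimism.append(auc_boot - auc_orig)   # performance drop on original data
    return apparent - np.mean(optimism)        # optimism-corrected C-statistic
```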