The Cochrane Database of Systematic Reviews. 2023 Sep 8;2023(9):CD013606. doi: 10.1002/14651858.CD013606.pub2

Prognostic models for predicting clinical disease progression, worsening and activity in people with multiple sclerosis

Kelly Reeve 1, Begum Irmak On 2, Joachim Havla 3, Jacob Burns 2,4, Martina A Gosteli-Peter 5, Albraa Alabsawi 2, Zoheir Alayash 2,6, Andrea Götschi 1, Heidi Seibold 7, Ulrich Mansmann 2,4, Ulrike Held 1,
Editor: Cochrane Multiple Sclerosis and Rare Diseases of the CNS Group
PMCID: PMC10486189  PMID: 37681561

Abstract

Background

Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system that affects millions of people worldwide. The disease course varies greatly across individuals and many disease‐modifying treatments with different safety and efficacy profiles have been developed recently. Prognostic models evaluated and shown to be valid in different settings have the potential to support people with MS and their physicians during the decision‐making process for treatment or disease/life management, allow stratified and more precise interpretation of interventional trials, and provide insights into disease mechanisms. Many researchers have turned to prognostic models to help predict clinical outcomes in people with MS; however, to our knowledge, no widely accepted prognostic model for MS is being used in clinical practice yet.

Objectives

To identify and summarise multivariable prognostic models, and their validation studies for quantifying the risk of clinical disease progression, worsening, and activity in adults with MS.

Search methods

We searched MEDLINE, Embase, and the Cochrane Database of Systematic Reviews from January 1996 until July 2021. We also screened the reference lists of included studies and relevant reviews, and references citing the included studies.

Selection criteria

We included all statistically developed multivariable prognostic models aiming to predict clinical disease progression, worsening, and activity, as measured by disability, relapse, conversion to definite MS, conversion to progressive MS, or a composite of these in adult individuals with MS. We also included any studies evaluating the performance of (i.e. validating) these models. There were no restrictions based on language, data source, timing of prognostication, or timing of outcome.

Data collection and analysis

Pairs of review authors independently screened titles/abstracts and full texts, extracted data using a piloted form based on the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS), assessed risk of bias using the Prediction Model Risk Of Bias Assessment Tool (PROBAST), and assessed reporting deficiencies based on the checklist items in Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD). The characteristics of the included models and their validations are described narratively. We planned to meta‐analyse the discrimination and calibration of models with at least three external validations outside the model development study but no model met this criterion. We summarised between‐study heterogeneity narratively but again could not perform the planned meta‐regression.

Main results

We included 57 studies, from which we identified 75 model developments, 15 external validations corresponding to only 12 (16%) of the models, and six author‐reported validations. Only two models were externally validated multiple times. None of the identified external validations were performed by researchers independent of those who developed the model. The outcome was related to disease progression in 39 (41%), relapses in 8 (8%), conversion to definite MS in 17 (18%), and conversion to progressive MS in 27 (28%) of the 96 models or validations. The disease and treatment‐related characteristics of included participants, and definitions of considered predictors and outcome, were highly heterogeneous amongst the studies. Over time (by publication year), we observed an increase in the percentage of participants on treatment, diversification of the diagnostic criteria used, increased consideration of biomarkers or treatment as predictors, and increased use of machine learning methods.

Usability and reproducibility

All identified models contained at least one predictor requiring the skills of a medical specialist for measurement or assessment. Most of the models (44; 59%) contained predictors that require specialist equipment likely to be absent from primary care or standard hospital settings. Over half (52%) of the developed models were not accompanied by model coefficients, tools, or instructions, which hinders their application, independent validation or reproduction. The data used in model developments were made publicly available or reported to be available on request only in a few studies (two and six, respectively).

Risk of bias

We rated all but one of the model developments or validations as having high overall risk of bias. The main reason for this was the statistical methods used for the development or evaluation of prognostic models; we rated all but two of the included model developments or validations as having high risk of bias in the analysis domain. None of the model developments that were externally validated or these models' external validations had low risk of bias. There were concerns related to applicability of the models to our research question in over one‐third (38%) of the models or their validations.

Reporting deficiencies

Reporting was poor overall and there was no observable increase in the quality of reporting over time. The items that were unclearly reported or not reported at all for most of the included models or validations were related to sample size justification, blinding of outcome assessors, details of the full model or how to obtain predictions from it, amount of missing data, and treatments received by the participants. Reporting of preferred model performance measures of discrimination and calibration was suboptimal.

Authors' conclusions

The current evidence is not sufficient to recommend the use of any of the published prognostic prediction models for people with MS in routine clinical practice today, owing to the lack of independent external validations. The MS prognostic research community should adhere to the current reporting and methodological guidelines and conduct many more state‐of‐the‐art external validation studies for the existing or newly developed models.

Keywords: Adult, Humans, Disease Progression, Multiple Sclerosis, Prognosis, Reproducibility of Results, Systematic Reviews as Topic

Plain language summary

Which models exist for prediction of future disease outcomes in people with multiple sclerosis?

Why is it important to study multiple sclerosis?

Multiple sclerosis (MS) is a chronic disease of the brain, spine, and nerves. Millions of people worldwide suffer from this disease, but the disease and how it progresses can be very different from person to person. Although MS cannot be cured, different treatments are available that can help reduce symptoms and slow the worsening of the disease. These treatments work differently, with some having more severe side effects than others. Understanding the severity of an individual’s MS is important to patients and medical professionals.

Why are prognostic models important in the context of multiple sclerosis?

Prognostic models help patients and medical professionals understand how sick an individual is and will become. This understanding can support patients when making life and treatment choices. Prognostic models can also help medical professionals make decisions about how best to treat an individual, better understand the disease, or develop treatments. Prognostic models for MS might involve combining a range of different pieces of information about an individual to predict how their MS will continue to develop. Important pieces of information to include in a prognostic model could be, for example, information on personal characteristics (such as age, sex, body mass index), information on their behaviour (such as whether they smoke), and information about their MS (such as how long they have had the disease). Other clinical features or measurements may also be important.

What did we want to find out?

We wanted to search for and find all prognostic models that combine multiple pieces of information to predict how MS will continue to develop and worsen in adults.

What did we do?

We used different techniques to search for all studies that described prognostic models, which combine multiple pieces of information, developed in the context of MS. We were interested in studies showing how these prognostic models were developed, as well as studies evaluating how well they actually work in practice. Once we found all relevant studies, we summarised them and evaluated how well they reported their results and how well they were conducted.

What did we find?

We found 57 studies that described prognostic models combining multiple pieces of information to predict how MS will continue to develop and worsen in adults. These studies described the development of 75 different prognostic models. There were 15 instances in which the performance of specific prognostic models was evaluated.

We found that prognostic models focus on different outcomes; 41% looked at disease progression, 8% at relapses, 18% at moving from a first attack to definite MS, and 28% at moving from the early stages of MS to progressive MS. The prognostic models we found were very different from one another in many ways. The patients they used to develop the models, for example, were very different in terms of treatments. In addition, the pieces of information they used to predict the course of MS were very different from one another. We found that prognostic models have changed over time, for example in the criteria used to diagnose MS, the growing use of treatments, the kinds of information collected with newer measurement techniques, and the modelling approaches applied. We also found that using these prognostic models requires information about the individual that can only be gathered by a medical specialist, often with specialist equipment, both of which may not be available in many clinics and hospitals.

What are the limitations of the evidence?

We found problems with most studies, meaning that we may not be able to trust their results. Common problems involved data and statistical methods used across studies. Additionally, many of the studies report results that may be very different if the prognostic models are applied to a new set of people with MS. We also found that the studies did a poor job of describing their methods and reporting their findings.

What does this mean?

The studies we found show that the evidence on prognostic models for predicting how MS will continue to develop and worsen in adults is not yet well‐developed. New research is needed that focusses on using methods recommended in guidelines to develop prognostic models and evaluate their performance. This research should also focus on describing their methods and results well, so that other researchers and medical professionals can use them for research and clinical practice.

Summary of findings

Summary of findings 1. Summary of findings.

Population: adults with relapsing‐remitting multiple sclerosis
Setting: specialty clinical care
Model: models with more than one external validation
Outcome: conversion to progressive multiple sclerosis
Timing: prediction of time to outcome at disease onset
For each model, the external validations are listed with the validating cohort (and study, if different from the development study), number of participants, reported performance measure, and overall risk of bias assessment.

Manouchehrinia 2019
  • British Columbia cohort: 3967 participants; c‐statistic 0.77 (95% CI 0.76 to 0.78); high risk of bias due to use of a predictor measured at a time point after time of model use, and lack of calibration assessment
  • ACROSS trial: 175 participants; c‐statistic 0.77 (95% CI 0.70 to 0.85); risk of bias as above
  • FREEDOMS and FREEDOMS II trial extensions: 2355 participants; c‐statistic 0.87 (95% CI 0.84 to 0.89); risk of bias as above

Bayesian Risk Estimate for Multiple Sclerosis score
  • Italian cohort (Bergamaschi 2007): 535 participants; at cutoff 95%: sensitivity 0.17, specificity 0.99; high risk of bias due to lack of discrimination or calibration assessment
  • MSBase registry (Bergamaschi 2015): 1131 participants; at cutoff 50%: sensitivity 0.35, specificity 0.80; risk of bias as above

ACROSS: A CROSS‐Sectional Long‐term Follow‐up of Fingolimod Phase II Study Patients
CI: confidence interval
FREEDOMS: FTY720 Research Evaluating Effects of Daily Oral therapy in Multiple Sclerosis

Background

Description of the health condition and context

Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system (CNS) that usually begins in young adulthood and affects 2.8 million people worldwide (Adelman 2013; Thompson 2018a; Walton 2020). The course of MS varies greatly and is characterised by clinical, radiological, genetic, and pathological heterogeneity. The exact aetiology of MS is still unclear, even though there are convincing arguments for an (auto‐)immunopathogenesis (Hohlfeld 2016a; Hohlfeld 2016b), triggered or driven by exposure to environmental risk factors (Attfield 2022). These arguments include the neuropathological findings, the various analogies to the autoimmune animal models, and, above all, the response of MS to various immunosuppressive therapies (Thompson 2018a). Genetic research also supports the (auto‐)immunopathogenesis theory, implicating peripheral immune cells and microglia in susceptibility (Attfield 2022; Patsopoulos 2019; Sawcer 2011). Environmental factors such as vitamin D deficiency or Epstein‐Barr virus infection have been shown to have an influence on the development (Attfield 2022; Belbasis 2015; Bjornevik 2023) and course (Hempel 2017) of the disease. Modern imaging techniques, such as diffusion tensor imaging, as well as neuropathological investigations have shown that, in addition to demyelination in MS, significant damage occurs to axons (Thompson 2018a).

The current diagnosis of MS is based on the modified ‘McDonald criteria’ (Thompson 2018b), and further differentiation of disease course into subtypes is described by Lublin 2014. In relapsing MS, the disease may initially present as clinically isolated syndrome (CIS), a first disease attack of at least 24 hours with patient‐reported symptoms reflecting an inflammatory demyelinating event in the CNS without fever or infection. According to current diagnostic criteria, the first attack may already be definite relapsing‐remitting MS (RRMS) if there is temporal and spatial dissemination at the time of initial manifestation, as evidenced by magnetic resonance imaging (MRI), cerebrospinal fluid (CSF) diagnosis, and/or clinical presentation. RRMS is characterised by relapses and periods of remission with stable neurological disability (Thompson 2018b). According to natural history studies, 30% to 50% of untreated RRMS patients convert to secondary progressive MS (SPMS) within 10 to 15 years after disease onset (Weinshenker 1989a). Progressive MS is defined as a steadily increasing neurological disability without unequivocal recovery (Lorscheider 2016). About 15% of people with MS, however, have progressive disease from the start, the primary progressive MS (PPMS) subtype (Reich 2018). This classification was made at a time when few biomarkers were available and is still used in clinical practice, especially for communication with patients and the definition of study cohorts. However, in the current understanding of MS pathophysiology, both peripherally initiated and CNS compartmentalised inflammation processes are assumed to contribute to the disease progression. In addition, signs of neurodegeneration can be detected early in the disease. From a clinical point of view, this means that even people with relapsing MS may have a gradual progression independent of relapse activity in addition to accumulation of residual disability from relapses (relapse‐associated worsening). Similarly, people with progressive MS may continue experiencing relapses (Lublin 2014).

Although MS is still incurable, pharmacological treatment for MS, particularly RRMS, has developed with increasing speed since the introduction of the first interferon‐beta preparation more than 25 years ago. The arsenal of MS therapeutics includes various substances with different mechanisms of action. The main goals of treatment are reduction in relapse rate, delaying onset, and slowing or stopping confirmed disability progression (Wingerchuk 2016). The availability of highly effective therapeutic options has led to the expectation of no evidence of disease activity (NEDA) under immunotherapy. This is defined by the absence of relapses, disability progression, and active MRI lesions (Thompson 2018a). There are two established treatment strategies: the use of mild to moderately effective but safe medications from disease onset with escalation strategies as needed ('stepwise escalation'), or the use of higher efficacy medications from disease onset, which may be associated with higher risk of adverse events ('hit hard and early' concept) (Ontaneda 2019). Overtreatment and undertreatment should be avoided and the risk‐benefit balance should be considered; however, refraining from treatment is also an option.

The current guidelines usually classify the available therapies as first‐, second‐, or third‐line according to their efficacy and safety profiles and recommend selection of a therapy based on the patient’s disease activity and preferences, reserving efficacious but high‐risk second‐line medications for highly active disease (Hemmer 2021; Montalban 2018; Rae‐Grant 2018). The definition of highly active disease varies across the literature, however (Diaz 2019; Freedman 2016), and how to define benign MS is unclear (Correale 2012). With its broad spectrum of clinical manifestations and an armamentarium of therapeutic approaches with different risk profiles, MS is a prime example of a disease that requires individualised medicine.

Description of the prognostic models

Many potential prognostic factors have been identified for predicting disease progression, worsening, and activity in people with MS. These include but are not limited to age, sex, body mass index, smoking history, and disease duration (Briggs 2019). Various biomarkers for MS have also been proposed, with those measured by MRI being the most commonly investigated (Rotstein 2019). However, prediction typically requires a combination of prognostic factors (Steyerberg 2013), especially for multifactorial diseases such as MS. Researchers in this clinical field have noted the strong focus on prognostic factors as opposed to prognostic modelling and have expressed the need for models for estimation of ‘individualised’ risk (de Groot 2009; Wottschel 2015).

A prognostic model is an empirical model that combines the effects of two or more predictors in order to estimate the risk of future clinical outcomes in individual patients within a specified length of time (Steyerberg 2013; Steyerberg 2019). As with prognosis research more generally, these models can serve many purposes, including improving the study design and analysis of randomised clinical trials (Hernández 2004; Roozenbeek 2009). For instance, Sormani and colleagues suggest the use of their model for participant selection in MS clinical trials (Sormani 2007). Adjusting for baseline risk in network meta‐analyses (Chalkou 2021) and health service research (Jarman 2010) are other application areas.

Ideally, prognostic models are developed using large high‐quality datasets, with subjects representative of the population to which the model should later be applied. Large samples may generally be required for more complex modelling tasks, such as model development including data‐driven predictor selection from a large set of candidate predictors. Sufficiently large datasets reduce the potential for overfitting and ensure that the overall risk can be precisely estimated (Riley 2019). Outcomes and their timing should be important to people with the health condition of interest and, along with predictors, be well‐defined prior to their assessment. When selecting predictors to consider, basic variables known to be related to prognosis, such as disease duration and sex, should always be included, in addition to novel biomarkers that may provide added value (Steyerberg 2013).

Before a prognostic model is used in practice, it must be appropriately evaluated. This evaluation ideally has two components. One component, discrimination, assesses how well a prognostic model ranks individuals who experience the event above those who do not. The second component, calibration, assesses the prognostic model’s ability to estimate event probabilities that are close to those actually observed. Good discriminative power is important to all prognostic model applications and may even be sufficient for some applications (Justice 1999), such as patient stratification in randomised controlled trials and adjustment in comparative healthcare research. However, people with MS and the clinicians advising them are interested in the absolute probability of outcomes in these individuals, as opposed to comparing risks with other people; hence, model calibration is very important in this setting.
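To make these two components concrete, the following minimal sketch (an illustration only, not drawn from the review; the outcome and predicted‐risk vectors are hypothetical, and scikit‐learn and statsmodels are assumed to be available) computes the c‐statistic for discrimination and a logistic recalibration slope and intercept for calibration.

```python
# Minimal sketch (hypothetical data): quantifying discrimination and calibration
# of predicted risks against observed binary outcomes.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1])                                # hypothetical observed outcomes
y_pred = np.array([0.15, 0.40, 0.30, 0.55, 0.70, 0.45, 0.20, 0.85, 0.60, 0.35])  # hypothetical predicted risks

# Discrimination: c-statistic, equivalent to the area under the ROC curve (AUC)
c_statistic = roc_auc_score(y_true, y_pred)

# Calibration: logistic recalibration of the outcome on the log-odds of the predicted risk;
# a slope near 1 and an intercept near 0 indicate good calibration
logit_pred = np.log(y_pred / (1 - y_pred))
recalibration = sm.Logit(y_true, sm.add_constant(logit_pred)).fit(disp=0)
intercept, slope = recalibration.params

print(f"c-statistic: {c_statistic:.2f}")
print(f"calibration intercept: {intercept:.2f}, calibration slope: {slope:.2f}")
```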

The data used for evaluation determine its usefulness and generalisability, i.e. how the model is expected to perform in new patients (Justice 1999). Internal validation is the evaluation of a model in the sample in which it was developed. If the internal validation is performed directly in the development sample without any resampling techniques (apparent validation), the model accuracy is expected to be overestimated, i.e. overoptimistic (Harrell 2001; Moons 2019; Steyerberg 2019). Resampling techniques, such as cross‐validation and bootstrapping, allow us to assess overfitting and account for overoptimism. However, even with correct internal validation procedures, we only learn about the accuracy of the model as applied to people from an identical underlying population. Therefore, a further prerequisite before use of a prognostic model in practice is external validation, i.e. its evaluation in a group of patients independent of those used in the model’s development. Such independence may be based on many qualities, such as time, location, and participant spectrum (patients with different disease severities or belonging to different disease subtypes). In MS, historical transportability is important, for example, because disease severity is likely to have changed over time with changes in diagnostic criteria. It is important to assess whether models developed under older diagnostic criteria are still accurate when applied to patients today. Before any clinical application, a prognostic model needs to have good discrimination and calibration in many different external validations, preferably by researchers independent of those who developed the model.
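As an illustration of resampling‐based internal validation (a sketch on simulated data under assumed settings, not a method prescribed by any included study), the following outlines bootstrap optimism correction: the model is refitted in each bootstrap sample, the difference between its performance in the bootstrap sample and in the original sample estimates the optimism, and the average optimism is subtracted from the apparent performance.

```python
# Minimal sketch (simulated data): bootstrap-based internal validation with
# optimism correction of the apparent AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))                        # hypothetical predictors
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))    # hypothetical binary outcome

# Apparent validation: evaluating the model on the data it was fitted to (overoptimistic)
model = LogisticRegression().fit(X, y)
apparent_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])

# Bootstrap optimism: refit in each bootstrap sample and compare performance in the
# bootstrap sample with performance of the same refitted model in the original sample
optimism = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    refit = LogisticRegression().fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], refit.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, refit.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_orig)

corrected_auc = apparent_auc - np.mean(optimism)
print(f"apparent AUC: {apparent_auc:.3f}, optimism-corrected AUC: {corrected_auc:.3f}")
```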

To our knowledge, no widely accepted prognostic model for MS is being used in clinical practice yet. A systematic review is needed to understand the state of the MS prognostic modelling literature as a whole and whether any models are on the way towards translation into practice. In order to address this goal, the scope of our review will be broad in terms of outcomes, predictors, timing, and setting, as well as the form of MS addressed.

Health outcomes

According to a survey conducted by Day and colleagues, disability progression and relapses are the most important outcomes related to disease course for people with MS and clinical experts alike (Day 2018). Disease progression is characterised by a relapse‐independent accumulation of neurological deficits and usually manifests as a decrease in walking ability that occurs over varying time spans (Warnke 2019). Disease progression is most commonly measured by the ambulation functional score of the Expanded Disability Status Scale (EDSS). Neurological disability has also been operationalised by the Multiple Sclerosis Functional Composite (MSFC). The International Advisory Committee on Clinical Trials of MS has suggested that the term ‘progression’ should only be used for the progressive subtypes of the disease and that relapse‐related increases in disability be referred to as disease 'worsening' (Lublin 2014). Although a more consistent application of the Lublin Criteria and of the 'progression independent of relapse activity' and 'relapse‐associated worsening' terminology has become apparent in recent years, the terminology used in the literature may not exactly match these definitions. For the purposes of this review, increase in disability, either dependent or independent of relapses, is relevant, as it is ranked as the highest priority outcome by people with MS (Day 2018).

Relapses, another high‐priority clinical outcome indicative of disease activity, manifest as acute and transient episodes of neurological symptoms. Subacute episodes can lead to different neurological symptoms, which may remit completely in the course of the disease, but may also be accompanied by residual disability. Despite the fact that relapse rate is the primary outcome in most confirmatory clinical trials leading to market approval of RRMS therapies, it is not yet clear whether a reduction in relapse rate is associated with a better overall prognosis. For example, the extent to which reducing the relapse rate prevents long‐term disability progression remains controversial (Cree 2019).

Diagnostic transition to a more advanced disease stage, indicative of worsening and active disease, is also of interest to prognostic research in this field. For example, people initially diagnosed with CIS can meet the criteria of clinically definite MS by experiencing another relapse. The ability to predict whether or when the conversion to definite MS will occur might have substantial clinical impact on decisions to start or abstain from early treatment in people with CIS. Patients initially diagnosed with RRMS can be considered to have converted to a progressive course, SPMS, by retrospective assessment of sustained progression independent of relapses over a period of time, for example one year (Thompson 2018b).

As MS is a lifelong condition, we find the aforementioned outcomes to be relevant not only at various time points of prognostication during the disease course, but also for various prediction horizons. We also expect outcome definitions, timing, and measurement methods for clinical disease progression, worsening, and activity to be highly heterogeneous in the literature.

Why it is important to do this review

While there are more than 50 published Cochrane Reviews on interventions for MS or associated symptoms and more than 20 are ongoing, this is the first Cochrane Review of prognostic studies in MS (Cochrane 2021). Independent of the Cochrane network, Hempel and colleagues reviewed 59 studies of single modifiable prognostic factors in MS progression, such as vitamin D levels and smoking status (Hempel 2017). Also, Río and Ruiz‐Peña reviewed 45 studies that predict long‐term treatment response by short‐term response criteria, including both single factors and multivariable expert‐based algorithms (Río 2016). Both reviews found a wide variety of methods, timing, and outcome and prognostic factor definitions.

We aimed to conduct a systematic Cochrane Review of multivariable prognostic models for predicting future clinical outcomes indicative of disease progression, worsening, and activity in people with MS at any time point following diagnosis. The results from this review will provide a long‐sought comprehensive summary and assessment of the evidence base for all disease subtypes (not just RRMS) and across all statistical methodologies (not just machine learning (ML) studies). We aimed to thereby enhance the knowledge base described by the many non‐systematic reviews (Derfuss 2012; Gafson 2017; Miller 2008; Rotstein 2019) and focused systematic reviews (Brown 2020; Havas 2020; Seccia 2021) reported thus far. Identified models could potentially provide people with MS and their physicians with informative and clinically relevant tools for making decisions on disease management.

No review thus far presents changes in prognostic factors and methods over time, nor assessment of reporting deficiencies of the models in the literature. We also summarised the readiness of the models for translation into clinical practice in terms of external validation evidence, thereby identifying models that require further external validation or clinical impact assessment. This review forms a solid basis from which to make recommendations for future prognosis research in MS.

Objectives

To identify and summarise multivariable prognostic models for quantifying the risk of clinical disease progression, worsening, and activity in MS.

To this end we aimed to:

  • describe the characteristics of the identified multivariable prognostic models, including prognostic factors considered and evaluation measures used;

  • describe changes in outcome definitions, time frames, prognostic factors, and statistical methods over time;

  • summarise the validation performance of the models;

  • summarise model performance and synthesise across external validation studies via meta‐analysis, where possible;

  • investigate sources of heterogeneity between studies;

  • assess the risk of bias in the models;

  • evaluate moderating effects on model performance by meta‐regression, where possible; and

  • make recommendations for future MS prognostic research.

Methods

Criteria for considering studies for this review

Defining the eligibility criteria was an iterative process, which involved multiple discussions within the review team based on our previous knowledge, as well as several studies we knew should be included, and borderline cases that we knew should be excluded. These criteria are described by the PICOTS table below and in the following sections.

Population: Adults with MS, including all subtypes (CIS, RRMS, SPMS, PPMS)
Intervention: All multivariable prognostic models and their validation studies
Comparator: There are no comparators in this review
Outcome: Clinical disease progression, worsening, and activity, which are measured based on disability, relapses, conversion to a more advanced disease subtype (clinically definite, progressive), or a composite of these
Timing: The models are to be used any time following diagnosis for predicting future disease course
Setting: Any clinical setting where people with MS receive medical care
CIS: clinically isolated syndrome; MS: multiple sclerosis; PPMS: primary progressive MS; RRMS: relapsing‐remitting MS; SPMS: secondary progressive MS

Types of studies

We included studies that aimed to develop, validate, extend, or update multivariable prognostic models of future disease outcomes in people with MS.

  • Study design: We included prognostic modelling studies that used data collected retrospectively or prospectively from the following sources: routine care, disease/patient registries, cohort studies, case‐control studies, and randomised controlled trials.

  • Data source and setting: We included studies based on both primary and secondary use of data. We included models intended for use in any clinical setting where people with MS receive medical care. We excluded studies that did not contain prediction of future outcomes in individuals.

  • Statistical methods: We included models developed with either traditional statistical methods or machine learning (ML). For the purpose of this review, a method is considered ML if it has at least one tuning parameter, excluding Bayesian priors, for controlling its architecture and, as a result, its performance.

  • Validation: We included studies that evaluated a previously reported prognostic model in a different set of participants by reporting discrimination, calibration, or classification measures based on predictions from that model, even if the term ‘validation’ was not explicitly used. We also included studies that reported validation of a previously reported prognostic model, even if what was done did not constitute an external validation in its strictest sense (see 'Terms used for reporting' for details). Studies that did not meet the search or inclusion criteria themselves but described the development of models evaluated or validated in future eligible studies were also included in order to extract data from and assess the risk of bias in the model development.

Targeted population

We included studies on adult individuals, 18 years old or over, with a diagnosis of MS, irrespective of the subtype or treatment status. We included studies that did not specify the disease subtype of their sample and studies that included people with one or more MS subtypes of CIS, relapsing, progressive, or any other categories. When a study included people with a single episode of optic neuritis, we considered the event to comprise a CIS and considered the study eligible.

Types of prognostic models

Determining whether a study reporting a multivariable model is a prognostic model study for predicting future disease outcomes in individuals can be difficult (Kreuzberger 2020). In this review, a study was considered to develop a multivariable prognostic model if the aims, results, and discussion report on the model itself, and not just the individual predictors comprising the model or the methodology used. For example, we excluded studies that reported only adjusted predictor effect measures from a multivariable model and discussed these, but neither evaluated the predictive performance of the model using discrimination, calibration, or classification measures nor discussed the model as a whole.

Studies were not limited by their modelling method; i.e. inclusion did not depend on whether traditional statistical methods or ML methods were used for development. We excluded studies predicting outcomes only based on single prognostic factors. We also excluded studies reporting on models that aimed to predict treatment response, either beneficial or harmful. The use of treatment as a predictor in the model was by itself not considered to determine the aim of treatment response prediction. Rather, the reported aim of the study was the determining factor.

Types of outcomes to be predicted

We included clinical outcomes indicating disease progression, worsening, and activity. We accepted author definitions based on any of the following:

  • disability progression/worsening;

  • relapse/attack;

  • conversion to a more advanced disease subtype:

    • to definite MS; or

    • to progressive MS;

  • composite outcomes containing at least one of the above (such as NEDA).

We included studies with any of the above outcomes, including models validated for a different outcome than originally developed. We did not exclude studies based on the data type of the outcome, even though prognosis is usually interpreted as referring to the risk of an event, i.e. necessitating a binary outcome. We excluded models that predict only paraclinical outcomes, such as laboratory measurements or image findings, because their translation to patient‐relevant outcomes at the individual level is unclear and they are not prioritised by people with MS (Day 2018). We also excluded studies predicting only quality of life outcomes, due to the difficulty in interpreting their clinical meaning. We considered cognitive disability to constitute a domain of disability. Fatigue, depression, and falls did not fit any of the aforementioned outcome categories, and we considered them out of scope for this review, which aims to be relevant to clinical practice.

We did not exclude any studies based on time point of prognostication or the time horizon for which the prognostic models apply because our preliminary review of the prognostic literature in MS indicated very liberally defined (in years) and heterogeneous time points of prognostication, both in relation to diagnosis and start of treatment. Defining a time horizon was considered too restrictive for the review objective. For clinically meaningful outcomes, however, we expected disability progression/worsening and conversion from RRMS to SPMS to be measured in years. Relapses and conversion from CIS to RRMS were expected to be measured in months to a couple of years.

Search methods for identification of studies

Electronic searches

To identify eligible studies, we searched the following databases on 2 July 2021 (Appendix 1).

  • MEDLINE (Ovid SP) (1996 to 1 July 2021)

  • Embase (embase.com) (1996 to 2 July 2021)

  • Cochrane Database of Systematic Reviews (CDSR 2021, Issue 6) (searched 2 July 2021, via www.cochranelibrary.com)

  • Cochrane Central Register of Controlled Trials (CENTRAL 2021, Issue 6) (searched 2 July 2021, via www.cochranelibrary.com)

The Embase search above included conference proceedings from the following organisations.

  • European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS)

  • Americas Committee for Treatment and Research in Multiple Sclerosis (ACTRIMS)

  • American Academy of Neurology (AAN)

  • European Academy of Neurology (EAN)

We restricted the search to studies published since 1996, the year of publication of an important tutorial on multivariable prognostic models in Statistics in Medicine (Harrell 1996). Before this time, methods were rapidly being developed but at the same time concerns over the misuse of statistical modelling for prediction of health outcomes were being raised (Chatfield 1995; Concato 1993; Diamond 1989). We considered Harrell 1996 to be a turning point, after which many papers (Altman 2000), textbooks (Harrell 2001; Steyerberg 2019), and guidelines (the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) network and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement) addressing proper analysis and reporting became readily available (Collins 2015; Simera 2008). We did not impose language restrictions on the search.

We used a search strategy for systematic reviews of prognostic models based on that of Geersing 2012 and further refined for this review. We validated this strategy for our specific question by determining whether it could identify a list of 11 a priori defined studies of interest: six inclusions (Bejarano 2011; Bergamaschi 2007; Margaritella 2012; Pellegrini 2019; Tousignant 2019; Wottschel 2015) and five excluded studies (borderline studies) that required assessment at the full‐text stage (Bovis 2019; Cree 2016; Gauthier 2007; Kalincik 2017; Runmarker 1994). Also, we randomly selected 120 titles and abstracts from less stringent search criteria and screened them to prevent missing any relevant studies.

We split this modified filter into three sub‐searches: search terms specific for prediction or prognostic models (2a); terms for general models (2b); and the statistical search terms (2c).

The search comprised two main parts, with each combining either two or three main concepts, as follows:

  • MS (1) and specific prognostic models (2a); or

  • MS (1) and general models or general statistical terms (2b or 2c) and clinical outcomes (3).

The search strategy used combinations of thesaurus search terms (MeSH or Emtree) and free‐text search terms including synonyms in the title and abstract. Animal studies and studies in children were excluded from the search.

Searching other resources

We performed backward reference searching of all included studies and all MS prognosis reviews identified during screening using Web of Science. We tracked citations of all the included studies (forward reference searching) via Web of Science. We performed the search in Web of Science between 13 October 2020 and 25 October 2020 for the studies/reviews from the initial database search, and on 16 August 2021 for the studies/reviews from the update to the database search. We also contacted authors of all included studies for further information on unpublished or ongoing studies.

Data collection

Selection of studies

Aiming to refine the eligibility criteria and ensure a common understanding amongst the review authors, we conducted a pilot title and abstract screening with a random subset of 200 results produced by the draft search strategy. This was followed by full‐text screening of the eight titles marked for inclusion. We selected eligible studies from the search results using the criteria outlined in 'Criteria for considering studies for this review' via the Rayyan web application (Ouzzani 2016). We used the same platform to document the exclusion reasons at the full‐text screening stage.

Pairs of review authors (BIO, KAR, AA, ZA, AG) performed title and abstract screening independently, and we included all titles marked for inclusion by at least one author at this stage in the full‐text screening. We also performed, independently and in duplicate, assessment of full texts for their inclusion in the review or reasons for their exclusion. When the record corresponded to a conference abstract, we searched its title and/or authors online (www.google.com, onlinelibrary.ectrims‐congress.eu/ectrims; accessed between 22 April 2022 and 29 October 2021) for any related articles, poster, or video presentations and, if available, considered these additional sources of information during our assessment. In addition, if the full text did not meet all the inclusion criteria but also could not be excluded with the available reported information, the authors of the respective studies were contacted for clarification. We resolved disagreements by involving a third review author (BIO, KAR) and, when necessary, through group discussion.

If a conference abstract meeting our inclusion criteria did not have an associated publication like a peer‐reviewed or preprint article, the data needed to inform the review and risk of bias assessment could not be extracted. As stated by Kreuzberger and colleagues, this complicates assessment of both inclusion/exclusion and risk of bias (Kreuzberger 2020). Consultation with study authors would only provide sufficient information if an associated publication could also be supplied. Hence, we considered conference abstracts to be awaiting classification until a report with more information on them becomes available.

For the assessment of non‐English titles/abstracts, we used online translators (translate.google.com, www.deepl.com/en/translator) and included any record that seemed to be relevant for full‐text screening. At full‐text stage, we (BIO and KAR) consulted the assessment of native speakers of that language and retrieved the translation of the full‐text for our independent assessment in duplicate.

We summarised the study selection process with a flowchart adapted from the PRISMA statement (Page 2021), showing the number of records we identified, the number of reports we excluded with reasons, and the total number of studies included.

Details regarding selection of studies

Due to the recency of the relevant reporting guidelines (Collins 2015), poor labelling of prognostic prediction studies, and the novelty of this review type (Kreuzberger 2020), we had regular meetings to clarify the boundaries and application of the selection criteria both at the title/abstract and full‐text screening levels. For transparency, we report the details of the recurrent themes and the decisions below.

  • The distinction between reports of multivariable prognostic prediction models and reports assessing the value of a single prognostic factor or searching for independent prognostic factors by multivariable modelling was not always clear (Kreuzberger 2020). We included records if there was any hint of individual‐level predictions either verbally or by the measures they reported for the multivariable models. We considered mentioning the overall model performance measures (e.g. R2, Brier score), discrimination measures (e.g. Harrell’s c‐index, area under the receiver operating characteristic curve (AUC)), classification measures (e.g. sensitivity, accuracy), or the terms calibration or validation in the context of prognosis sufficient for being taken forward to full‐text screening. We excluded records that only reported effect estimates (e.g. hazard ratio, odds ratio), or performance measures for single factors or univariable models at the title/abstract level.

  • We applied exclusion based on the eligibility criterion of aiming to develop or validate prognostic prediction models only at the full‐text screening level in order to take into account the totality of reporting.

  • We excluded multivariable combinations other than statistically developed prognostic prediction models, such as diagnostic criteria or expert scoring rules, even when they were used for individual prognostic prediction. Despite their potential usefulness, the intentions behind their development are different.

  • Expecting prognostic prediction models to be based on statistical theory, we also excluded scores based on counts of prognostic factors selected via or simplified from multivariable models unless the full‐text report provided a reason for the simplification (e.g. all effect estimates being similar) or compared the prediction performance of the count score to the multivariable model generating it.

  • Our search also picked up records that reported prediction of treatment response either by multivariable models or scoring rules. We were aware that some of these reports were making static or dynamic prognostic predictions conditional on treatment (e.g. Kalincik 2017; Sormani 2013), rather than treatment response predictions (Kalincik 2018; Sormani 2017; Steyerberg 2018). We decided not to reinterpret the stated objective or the presented results and excluded such reports.

  • In order to assign a single agreed upon exclusion justification to a full‐text report that may fulfil multiple criteria, we used a hierarchy based on convenience of assessment. Higher‐level exclusion reasons were based on the headings, and we evaluated a study’s eligibility in the following order: study type or objective, population, outcome, model (intervention), and timing.

Data extraction and management

Pairs of review authors (KAR, BIO, AA, AG) independently extracted data from the included studies into a predefined, piloted electronic spreadsheet (see Appendix 2) based on the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist (Moons 2014) and TRIPOD guidelines (Collins 2015) and open disagreements were resolved jointly. If a study was associated with multiple reports and the data in them were inconsistent, we preferred to collect the data from:

  • journal article over other types of reports;

  • more recent journal article over an older one; and

  • main text of a journal article over its supplements/appendices.

Our data extraction form included the following items, with further explanation available in Appendix 3:

  • Article information (title, author, year, publication type).

  • Data sources (e.g. use of randomised trial/cohort/registry/case‐control data, primary/secondary data use).

  • Participants (e.g. inclusion/exclusion criteria, recruitment method, country, number of centres, setting, participant description, treatments received, MS subtype).

  • Outcomes (e.g. definitions and methods of measurement, categorisation into disability/relapse/conversion to clinically definite MS/conversion to SPMS/composite, duration of follow‐up or time of outcome assessment, blinding).

  • Candidate predictors (e.g. predictor definitions and method/timing of measurement, handling/transformations, categorisation into the following domains: demographics, symptoms, scores, CSF, imaging, electrophysiological, omics, environmental, non‐CSF samples, disease type, treatment, or other).

  • Sample size (e.g. number of participants, number of events, number of events per predictor).

  • Missing data (e.g. number of participants with missing predictor or outcome data, handling of missing data).

  • Model development (e.g. type of model, method for predictor consideration, model/predictor selection method, predictor selection criteria, tuning parameter details, data leakage prevention steps, shrinkage).

  • Model performance and evaluation (e.g. discrimination, calibration, and classification measures with standard errors or confidence intervals, internal or external validation).

  • Model presentation and interpretation (e.g. final models, alternative presentations, exploratory versus confirmatory research, comparison with other studies, generalisability, strengths, and limitations).

  • Factors related to model usability and reproducibility (sufficient explanation to allow for further use, skill and equipment specialisation required for predictor assessment, whether model/tool, code, and/or data were provided, whether absolute risks could be computed).

Assessment of reporting deficiencies

Deficiencies in methods and reporting in prognostic modelling studies are well‐known (Bouwmeester 2012; Brown 2020; Havas 2020; Kreuzberger 2020; Peat 2014). We described deficiencies in the MS prognostic modelling literature using Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guideline items (Collins 2015). We assessed 20 items from the Methods (source of data, participants, outcome, predictors (only for developments), sample size, missing data, statistical analysis methods (only for developments)), Results (participants, model development, model specification (only for developments), model performance), and Discussion (limitations) sections of the checklist provided in the guideline. We used the categories of reported, not reported, and unclearly/somehow reported. During this assessment, we only took into account the text present in the publications themselves or the publications explicitly referenced in them for data source, definitions, or methods (referenced as auxiliary references in Characteristics of included studies) and ignored the information provided by the study authors during follow‐up correspondence.

Assessment of risk of bias in included studies

We performed risk of bias assessments independently and in duplicate (KAR, BIO, AA, AG) using the Prediction Model Risk of Bias Assessment Tool (PROBAST) (Wolff 2019). The tool consists of signalling questions in four domains (participants, predictors, outcome, analysis), covering sources of possible bias due to data sources, definition or measurements of predictors and outcomes, sample size and analysis sets, model development, and model performance evaluation. We graded each domain as having low, high, or unclear risk of bias, which formed the basis for the overall risk of bias assessment (as described in Moons 2019). A third review author (KAR or BIO) reconciled the duplicate assessments and resolved any remaining disagreements at the item and model/validation level by joint discussion with the respective raters. When insufficient information was reported to allow for clear assessment, resulting in an unclear rating at the domain or study level, we contacted the study authors via email to request further information. In order to develop a common understanding of the form, two review authors (KAR, BIO) piloted the tool, discussed discrepancies in use, and agreed on rules for further use.

When multiple models were developed in a single study or development and external validation of a model were included in the same study, we assessed the quality of each model or external validation separately. We presented the risk of bias primarily at the analysis level in the Results and Discussion. However, in the Characteristics of included studies we presented the risk of bias at the study level for each domain. When domain‐level assessments differed across analyses within a single study, we assigned the most favourable rating amongst these analyses to that domain at the study level and noted the differences per analysis in the support for judgement.

In order to assess risk of bias, we needed to further refine the interpretation of the PROBAST items. The topics that required further refinement were related to the reporting in the literature, specifics of the disease area, and the application of the tool to studies employing non‐traditional prognostic modelling methods ‐ including ML and non‐binary outcomes. These issues were jointly discussed amongst review authors (BIO, KAR, HS, UM, UH, JH, JB) until consensus was reached. Additionally, this review was designed broadly in order to identify all prognostic prediction models of clinical outcomes in MS. This meant that studies may have been included that were not considered to be exactly applicable to the aim of the review, even though they met the selection criteria. We report our decisions regarding PROBAST interpretation and the assessment of applicability in Appendix 4.

Measures of association or predictive performance measures to be extracted

In our protocol we stated that predictor effect measures would be collected and standardised in order to describe changes in prognostic factors in the models over time (On Seker 2020). We extracted effect measures where possible; however, the variety of predictors and their definitions, in addition to the use of ML models for which effect measure reporting is unclear, made comparison of predictors based on effect measures impossible. Instead, we reported categories of predictors, both considered and included in the final models, and described changes in these categories over time (Differences between protocol and review).

We primarily extracted performance measures for discrimination and calibration, as well as their measures of uncertainty. Discrimination (e.g. c‐statistic, AUC, Harrell’s c‐index, Gonen and Heller’s concordance index, Royston and Sauerbrei’s D‐statistic) refers to a model’s ability to distinguish between participants developing and not developing the outcome of interest. We expected the c‐statistic (or equivalently the AUC) to be the most frequently reported measure of discrimination. It gives the proportion of randomly chosen pairs from the sample (one participant with the outcome and one without) in which the participant with the outcome has the higher predicted score/risk. A c‐statistic of 0.5 means that the model’s discriminative performance is no better than chance, while a value of 1.0 is considered perfect discrimination.
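The pairwise definition above can be illustrated with a short sketch on hypothetical data: concordant pairs are counted across all (outcome, non‐outcome) pairs, with ties counted as one half.

```python
# Illustrative sketch (hypothetical data): the c-statistic as the proportion of
# outcome/non-outcome pairs in which the participant with the outcome has the
# higher predicted risk; ties count as one half.
import numpy as np

def c_statistic(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    risks_with_outcome = y_pred[y_true == 1]
    risks_without_outcome = y_pred[y_true == 0]
    concordant = tied = 0
    for r1 in risks_with_outcome:
        for r0 in risks_without_outcome:
            if r1 > r0:
                concordant += 1
            elif r1 == r0:
                tied += 1
    n_pairs = len(risks_with_outcome) * len(risks_without_outcome)
    return (concordant + 0.5 * tied) / n_pairs

print(c_statistic([1, 0, 1, 0, 0], [0.9, 0.2, 0.6, 0.6, 0.1]))  # approximately 0.917
```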

Calibration (e.g. calibration slope, calibration‐in‐the‐large, observed‐to‐expected (O:E) ratio) refers to the extent to which the expected outcomes and observed outcomes agree. We expected calibration to be reported infrequently and therefore focused on possible extraction of the O:E ratio, which is strongly related to calibration‐in‐the‐large and is an average across the range of predicted risks (Debray 2017). Values close to 1 indicate a well‐calibrated model overall; however, this does not rule out poor calibration in some subgroups.
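As a hypothetical illustration (example data assumed, not taken from any included study), the O:E ratio is simply the number of observed events divided by the sum of the predicted risks:

```python
# Minimal sketch (hypothetical data): observed-to-expected (O:E) ratio as an
# overall summary of calibration; values near 1 suggest agreement on average.
import numpy as np

y_true = np.array([0, 1, 0, 0, 1, 1, 0, 0, 1, 0])                       # observed outcomes
y_pred = np.array([0.2, 0.7, 0.1, 0.3, 0.6, 0.8, 0.2, 0.4, 0.5, 0.3])   # predicted risks

observed = y_true.sum()        # number of observed events
expected = y_pred.sum()        # sum of predicted risks = expected number of events
oe_ratio = observed / expected

print(f"O:E ratio = {observed}/{expected:.1f} = {oe_ratio:.2f}")
# O:E > 1 suggests the model under-predicts risk on average; O:E < 1 suggests over-prediction.
```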

We also extracted data on classification measures like sensitivity, specificity, predictive values, or accuracy. Such classification measures are based on categorisation of predicted probabilities at some cutoff. A cutoff may be predetermined based on clinical relevance, arbitrarily defined as the middle (0.5) of the theoretical range of probability (0–1), or calculated post hoc in a data‐driven manner.
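A brief hypothetical sketch of such cutoff‐based classification measures, dichotomising predicted risks at the arbitrary midpoint of 0.5:

```python
# Minimal sketch (hypothetical data): classification measures obtained by
# dichotomising predicted risks at a cutoff, here the arbitrary midpoint 0.5.
import numpy as np

y_true = np.array([0, 1, 0, 0, 1, 1, 0, 1])
y_pred = np.array([0.2, 0.7, 0.4, 0.6, 0.8, 0.3, 0.1, 0.9])

cutoff = 0.5
y_class = (y_pred >= cutoff).astype(int)

tp = np.sum((y_class == 1) & (y_true == 1))   # true positives
tn = np.sum((y_class == 0) & (y_true == 0))   # true negatives
fp = np.sum((y_class == 1) & (y_true == 0))   # false positives
fn = np.sum((y_class == 0) & (y_true == 1))   # false negatives

sensitivity = tp / (tp + fn)      # proportion of events correctly classified
specificity = tn / (tn + fp)      # proportion of non-events correctly classified
ppv = tp / (tp + fp)              # positive predictive value
accuracy = (tp + tn) / len(y_true)

print(sensitivity, specificity, ppv, accuracy)
```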

The aforementioned performance measures can be evaluated in different contexts. Internal validation is the evaluation of a model’s performance in the same population used for development. External validation is the evaluation of a model’s performance in a population different from that of its development. The characteristics of the participants in a validation that make it external might be based on, for example, location (e.g. participants from different sites), time (e.g. temporal split of participants from a single site), or spectrum (e.g. participants with a different disease subtype or treatment status).

Dealing with missing data

We contacted the corresponding authors via email to request missing or unclear information required for study eligibility, basic study description, quantitative data synthesis, or risk of bias assessment. When the c‐statistic was provided without its standard error or confidence interval, we calculated its variance based on the combination of sample size and number of events, if available, according to the method of Hanley and McNeil and computed the corresponding confidence interval according to Newcombe and colleagues (Hanley 1982; Newcombe 2006 method 4).
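For illustration, the sketch below implements the Hanley and McNeil (1982) variance approximation from a reported c‐statistic and the numbers of events and non‐events; the confidence interval shown is a simple Wald‐type interval for demonstration only and does not reproduce the Newcombe 2006 (method 4) interval used in the review. The input values are hypothetical.

```python
# Sketch: Hanley and McNeil (1982) standard error of a c-statistic, computed from
# the reported AUC, the number of events, and the number of non-events.
import math

def hanley_mcneil_se(auc, n_events, n_nonevents):
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_events - 1) * (q1 - auc**2)
           + (n_nonevents - 1) * (q2 - auc**2)) / (n_events * n_nonevents)
    return math.sqrt(var)

auc, n_events, n_nonevents = 0.77, 120, 380   # hypothetical reported values
se = hanley_mcneil_se(auc, n_events, n_nonevents)
lower, upper = auc - 1.96 * se, auc + 1.96 * se   # simple Wald-type interval for illustration
print(f"AUC {auc:.2f}, SE {se:.3f}, approximate 95% CI ({lower:.2f} to {upper:.2f})")
```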

Terms used for reporting

A clarification of the terms used to differentiate between the various levels at which we report was necessary. We used the term 'record' to refer to the entries retrieved through our database searches and considered during title/abstract screening. All eligible records were associated with at least one scientific report, retrieved for full-text screening mostly from the publishers but also by contacting the study authors or searching the Internet. A comprehensive prognostic prediction exercise with a clear goal performed by the same set of authors was called a 'study'. A single study may be associated with more than one 'report'. A report may be one of the following types, ordered from most to least informative: journal article, preprint article, dissertation, poster or video presentation, conference abstract.

The main unit of interest in this review was called the 'analysis'. A single study (or report) may contain many analyses: multiple model developments or validations with different outcomes, timing, predictor sets, modelling methods, or participant subsets. Analyses may be of the type model 'development' or model 'validation'. Model developments are interchangeably referred to as models or developments.

Several of the included studies reported results from the development of more than one prognostic model, but only a subset of these studies aimed to present multiple final prognostic models. When multiple models in a study were reported in an almost equivalent manner, without any indication of a preferred one, we included all the models in our review. We extracted the data for each model separately and presented them individually. This decision was motivated by our aim of reviewing all prognostic models with potential clinical meaning in the disease area of MS.

When the reporting in an eligible study with multiple models indicated a preferred or selected model, we included only that model in our review. The studies' authors communicated model preference either directly (e.g. by discussing the superiority of one amongst the competing models) or indirectly (e.g. by using a bold font for selected results or presenting figures for a single model). The other models were considered to be by-products of the modelling process and not meant to be presented as final models. We always reported all validations of included models as separate analyses.

For the purpose of clear reporting, we made a distinction between internal, external, and other author-reported validations. 'Internal validation' is the evaluation method directly relevant to analyses of the type model development and is thus reported in that context. To call a validation external, we expected a faithful evaluation of the developed model in an independent set of participants. Even though authors who developed prognostic models and reported model evaluation measures using a different set of participants may have referred to their activities as "validation", we refrained from calling them 'external validation' if the set of participants was not independent of the development set, if the new set of participants was only used for model refitting, or if the model was improperly changed, e.g. predictors dropped without statistical re-estimation. These exceptional cases are referred to as other 'author-reported validations'. For the description of the overall literature evaluating prediction performance using a separate set of participants, we referred to the external validations and the other author-reported validations together as validations, unless a differentiation between them was deemed necessary. For example, when reporting or discussing clinical readiness, we concentrated only on external validations.

To differentiate between multiple analyses from a single study, we referred to them first by the study name (e.g. Zhao 2020). If multiple models were included from a single study, these models were differentiated from each other by the name/abbreviation the authors used or by a reference to what separates the models included from that study (e.g. the modelling method and considered set of predictors in Zhao 2020 XGB Common). Finally, if a model had a validation other than an internal one, we differentiated these separate analyses by adding 'Dev' for development, 'Ext Val' for external validations, and 'Val' for other author-reported validations (e.g. Zhao 2020 XGB Common Val).

Data synthesis

This broad review intended to identify all prognostic models of clinical disease progression, worsening, and activity across all types of MS. We expected to identify numerous model development studies, but only a few external validation studies overall. As per the protocol, we summarised all identified multivariable models and the prognostic factors included in these models in narrative, graphical, and tabular formats. We had planned to apply methods to derive missing performance measures (the c-statistic for discrimination and the O:E ratio for calibration) (Debray 2019); however, this was not possible due to limited reporting of alternative discrimination measures, of the linear predictor distribution, and of the expected number of events. We did not meta-analyse prognostic model performance statistics for single models externally validated in several independent samples because no single model had at least three independent external validation studies outside its development study. We also could not perform meta-regression due to heterogeneity in predictor and outcome definitions and the low number of studies with reported or derivable performance measures for a single outcome. Please see Differences between protocol and review for details.
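
Had any model accrued at least three independent external validations, the planned random-effects pooling of c-statistics on the logit scale could have proceeded along the lines of the following sketch (using, for example, the metafor package; all numbers are hypothetical). The same fit would also have provided the I² statistic for the heterogeneity assessment described in the next section.

    # Sketch of the planned random-effects pooling of c-statistics on the logit
    # scale, had three or more independent external validations been available.
    # All numbers are hypothetical; metafor is one package that could be used.
    library(metafor)

    auc     <- c(0.72, 0.68, 0.75)         # c-statistics from three validations
    auc_var <- c(0.0012, 0.0020, 0.0009)   # their variances

    logit_auc     <- qlogis(auc)
    logit_auc_var <- auc_var / (auc * (1 - auc))^2   # delta-method variance

    fit <- rma(yi = logit_auc, vi = logit_auc_var, method = "REML")
    plogis(c(fit$beta, fit$ci.lb, fit$ci.ub))        # pooled c-statistic, 95% CI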

Investigation of sources of heterogeneity between studies

We expected to find substantial heterogeneity, as diagnostic criteria for MS subtypes and available treatment options, as well as the technology used to assess disease activity, have evolved over time. Heterogeneity is also typically high in prognostic studies. We expected heterogeneity both between development studies and their corresponding validation studies for specific models and also between different development models for the same outcome. Potential sources of heterogeneity related to either or both of these include:

  • case mix (e.g. age, sex, disease duration, treatment status);

  • study design (e.g. follow‐up time, source of data, outcome definitions and prognostic factors); and

  • statistical analysis methods and reporting (e.g. number of prognostic factors included, traditional statistics versus ML, risk of bias, validation methods).

We aimed to extract relevant information and to include a narrative summary of these potential sources of heterogeneity. We had planned to further investigate heterogeneity statistically, using the I² statistic and meta-regression within random-effects models for the external validation performance measures of single models; however, as stated earlier, this was not possible (see Differences between protocol and review). Instead, as also discussed in the protocol, we described these potential sources of heterogeneity in the narrative text and, as far as possible, in tables and figures.

Synthesis

For synthesis, we used the median, interquartile range (IQR), or range to describe the quantitative measures reported for the included analyses. These consisted of participant characteristics (age, sex, disease duration, study timing), predictors (number considered, number included), sample size (number of participants, number of events), and performance measures (c-statistic). The samples in some analyses belonging to the same study or using the same data source might overlap or even be identical. However, we ignored this correlation for reporting purposes because (1) the description aims to give a sense of the state of the literature rather than precise estimates, (2) no single study or data source has undue influence over these measures, and (3) it is impossible to discern the extent of overlap between analyses utilising the same data source.

Tables were organised by model outcome, which aligns with the diagnostic subtype. Model outcomes were categorised into five groups: disability, relapse, conversion to definite MS, conversion to progressive MS, and composite outcomes. The models were summarised over several tables presenting certain aspects: study characteristics, participant characteristics, predictor domains, number of predictors, model development and validation, final model presentation, reporting items, and usability. External validation information was also included in these tables, where appropriate. Figures were organised by model outcome, development versus validation, or algorithm type, where appropriate. Algorithm type was categorised into two groups: traditional statistics and ML.

We used the statistical programming software R (version 4.0.1) and the following packages for all analyses: tidyverse (1.3.0), dmetar (0.0.9000).

Conclusions and summary of findings

A GRADE framework adaptation specific to prognostic model research, rather than to general prognosis (Iorio 2015) or prognostic factor research (Foroutan 2020), is still a topic of future work; hence, we did not apply GRADE to our conclusions. Our conclusions highlighted the biases in the current literature, the usability of the currently available models, and areas in need of improved reporting. We also made recommendations for future research.

Results

In this section, we start by reporting the results from the search and screening process, including the reasons for exclusion. This is followed by an in‐depth review of the models with more than one external validation. Then, we describe the data extracted from all the included studies, and the respective analyses as appropriate, in the order of the CHARMS checklist (Moons 2014): data source, participants, outcomes, predictors, sample size and missing data, model development, model performance and evaluation, model presentation, and interpretation. We finalise this section with our assessment of the analyses based on the extracted data: usability and reproducibility, risk of bias, and reporting deficiencies.

Description of studies

Results of the search

We identified 13,046 records via our database search (4757 from MEDLINE, 7706 from Embase, and 583 from the Cochrane Library ‐ search updated on 2 July 2021), as summarised in Figure 1. Our backward and forward citation tracking of the included studies and reviews on MS prognosis identified during title/abstract screening resulted in an additional 4727 records. Contact with the authors of included studies led us to a further 23 suggested records related to the topic. After deduplication of the 17,796 records from all sources, we screened the titles/abstracts of 12,258 unique records, of which 261 were found eligible for full‐text retrieval. We identified an additional 48 reports of the types preprint article, dissertation, and poster or video presentations related to the conference abstracts via searching the Internet or contacting the abstract authors. In total, we assessed 309 full‐text reports for eligibility.

Figure 1. Flow diagram based on PRISMA 2020 guideline

At the full-text screening stage, we excluded 180 reports (see Excluded studies). Furthermore, 21 reports of 11 studies were conference abstracts or presentations without any associated full-text publication. Despite attempts to contact the authors for more information, a final judgement on eligibility could not be reached for an additional eight reports corresponding to six studies due to limited information (see Characteristics of studies awaiting classification). Thus, we included 100 reports corresponding to 57 studies in our review. Bergamaschi 2001 did not report any predictions for individuals and Weinshenker 1991 was published before the dates covered by our search algorithm. These two studies were nevertheless included in our review because they described the development of models that were validated in later eligible studies (Bergamaschi 2007; Bergamaschi 2015; Weinshenker 1996).

Excluded studies

We excluded 180 reports after full-text screening for the following reasons, listed according to our hierarchy of exclusion reasons at the first level and by decreasing number of reports at the second level.

  • Wrong study type (113)

    • 112 reports that did not aim to develop or validate prognostic models

    • 1 report that was not an original study but a review

  • Wrong population (3)

    • 3 reports in which prognostication was applied to people without a diagnosis of MS

  • Wrong outcome (6)

    • 6 reports with outcomes other than disability, relapses, or conversion to a more advanced disease subtype

  • Wrong model (43)

    • 13 reports using multivariable combinations not derived from statistical prognostic models (e.g. diagnostic criteria, scoring rules)

    • 12 reports predicting treatment response

    • 10 reports that did not perform individual-level predictions

    • 8 reports containing predictions from a model not multivariable in nature

  • Wrong timing (15)

    • 15 reports predicting concurrent or cross-sectional outcomes

A representative selection of the excluded studies, with detailed reasons, is available in the section Characteristics of excluded studies.

Included studies

Of the 57 studies included in this review, 42 (74%) reported prognostic model development only (Aghdam 2021; Agosta 2006; Bendfeldt 2019; Bergamaschi 2001; Borras 2016; Brichetto 2020; De Brouwer 2021; de Groot 2009; Gout 2011; Kosa 2022; Kuceyeski 2018; Law 2019; Margaritella 2012; Martinelli 2017; Misicka 2020; Montolio 2021; Olesen 2019; Oprea 2020; Pellegrini 2019; Pinto 2020; Pisani 2021; Roca 2020; Rocca 2017; Rovaris 2006; Runia 2014; Seccia 2020; Skoog 2014; Sombekke 2010; Spelman 2017; Szilasiová 2020; Tacchella 2018; Tommasin 2021; Tousignant 2019; Vukusic 2004; Weinshenker 1991; Wottschel 2015; Wottschel 2019; Ye 2020; Yoo 2019; Yperman 2020; Zakharov 2013; Zhang 2019), and eight (14%) reported both development and external validation of prognostic models (Ahuja 2021; Calabrese 2013; Lejeune 2021; Malpas 2020; Mandrioli 2008; Manouchehrinia 2019; Sormani 2007; Vasconcelos 2020). Bergamaschi 2007 reported an external validation of a previously developed but not evaluated model (Bergamaschi 2001). Bejarano 2011 and probably Zhao 2020 replicated the modelling process in an independent set of participants instead of evaluating the final model derived from the development set. Hence, these two studies are considered to have reported development and other author‐reported validation of prognostic models.

The remaining four studies (7%) were a combination of the aforementioned types: the model developed in Bergamaschi 2001 (called Bayesian Risk Estimate for Multiple Sclerosis (BREMS) by its authors) was both externally validated and, after dropping of post‐onset predictors without a statistical justification (called BREMS onset (BREMSO) by its authors), validated for the original and a new outcome in Bergamaschi 2015. Gurevich 2009 developed two models of interest (called First Level Predictor (FLP) and Fine Tuning Predictor (FTP) by their authors) but externally validated only one of them (FLP). In Skoog 2019, a previously developed model (Skoog 2014) was both further internally evaluated in a subset of the development cohort and validated externally. Finally, Weinshenker 1996 reported both an external validation of a previously developed model (called Model 3 by its authors in Weinshenker 1991) and the development of a new model (short‐term outcome).

We contacted the authors of all 57 included studies to obtain missing information or clarification of the reported information. No response was received for 21 (37%) studies. For six (10%) studies, a response was received but without further information or clarification. The authors of the remaining 30 studies (53%) provided further details and clarifications.

Models with more than one external validation

We identified two models with more than one external validation (Table 1), both originally developed to predict time to conversion to progressive MS: the BREMS score (Bergamaschi 2001) and the survival model of Manouchehrinia 2019.

BREMS score

The BREMS score was developed using clinical data from 186 people with RRMS seen at a single MS clinic in Italy until December 1997. The mean follow-up time was 7.5 years, ranging from 3 to 25 years. Bergamaschi and colleagues defined the time of onset of the secondary progressive phase as the earliest date of observation of a progressive worsening severe enough to induce an increase of at least one EDSS point and persisting for at least six months after the progression onset. The score was developed using Bayesian methods to jointly model relapses, Kurtzke's Functional Systems scores, and EDSS up until the primary outcome, time to SPMS conversion. The presented sum score contained nine predictors: age at onset, female sex, sphincter onset, pure motor onset, motor-sensory onset, sequelae after onset, number of involved functional systems at onset, number of sphincter plus motor relapses, and EDSS greater than four outside relapse. This model was not internally validated in the development study, but was externally validated in two further studies (Bergamaschi 2007; Bergamaschi 2015). Additionally, the model was updated by dropping two predictors not available at onset (number of sphincter plus motor relapses and EDSS greater than four outside relapse) and renamed BREMSO (Bergamaschi 2015). This update did not, however, consist of refitting the model using the subset of original predictors, but rather presented the original coefficients without the two dropped ones. This updated model was evaluated for prediction of the original SPMS conversion outcome as well as of severe MS defined using the Multiple Sclerosis Severity Score. Because the BREMS model was only externally validated twice and no measures of discrimination or calibration were reported, we did not perform a meta-analysis to summarise the performance of this model.

Manouchehrinia 2019

Manouchehrinia and colleagues developed their model using data from 8825 participants with RRMS seen up until May 2016 in the Swedish national MS registry. The mean (standard deviation (SD)) follow‐up time was 12.5 (8.7) years. In this parametric survival model, time to SPMS was defined as the earliest recognised date of SPMS onset determined by a neurologist at a routine clinic visit according to the Lublin 1996 criteria. The model was presented as a nomogram for computing 10‐, 15‐, and 20‐year conversion probability and additionally as a web application (https://aliman.shinyapps.io/SPMSnom/). The final model included five predictors: year of birth, age at onset, sex, first EDSS, and age at first EDSS.

This model was internally validated using the bootstrap method, with both calibration and discrimination assessed. In the same publication as the model development study, it was reported that model validation was also performed using three external multi‐site datasets addressing temporal, geographic, and spectrum transportability. The British Columbia MS Cohort provided 3967 participants diagnosed with RRMS according to Poser 1983, who were enrolled between January 1980 and December 2004 and followed up for an average of 13.8 (SD 8.4) years. The second external validation analysis was performed using the 175 participants from the ACROSS (A CROSS‐Sectional Long‐term Follow‐up of Fingolimod Phase II Study Patients) randomised placebo‐controlled phase 2 trial of the disease‐modifying therapy fingolimod who returned for assessment at 10 years. The third external validation analysis used 2355 participants from the long‐term follow‐up extension study of the phase 3 trials FREEDOMS (FTY720 Research Evaluating Effects of Daily Oral therapy in Multiple Sclerosis) and FREEDOMS II, which also assessed fingolimod. RRMS diagnosis was made using the McDonald 2001 and 2005 criteria (Polman 2005) and mean follow‐up time was 18.6 (SD 7.9) years and 14 (SD 7.8) years, respectively, in ACROSS and FREEDOMS validation analyses.

The model development and external validations in Manouchehrinia 2019 were all found to be at high risk of bias. In the development, people with MS were included from registry data based on availability of EDSS score. It was unclear how standard the data collection was or whether the included sample may have differed from the general population with MS. The outcome, conversion to SPMS, was based on the Lublin 1996 criteria, which we considered to be subjective. The combination of retrospective use of registry data and a subjective outcome increases the risk of bias. During analysis, only complete cases were used, and it was unclear in which subset of participants the backward selection of predictors took place. This was followed by internal validation, which did not include the predictor selection process.
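
To make this point concrete, the following hypothetical sketch (simulated data, not the Manouchehrinia 2019 model) contrasts with that approach by repeating a backward selection step inside every bootstrap sample, so that the optimism estimate also reflects the selection process.

    # Hypothetical contrast (simulated data, not any included study's model):
    # bootstrap internal validation in which backward selection is repeated in
    # every bootstrap sample, so the optimism estimate reflects selection too.
    set.seed(1)
    n  <- 300
    df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
    df$y <- rbinom(n, 1, plogis(0.7 * df$x1 - 0.5 * df$x2))

    c_stat <- function(pred, outcome) {   # simple pairwise c-statistic
      cases <- pred[outcome == 1]; controls <- pred[outcome == 0]
      mean(outer(cases, controls, ">") + 0.5 * outer(cases, controls, "=="))
    }

    fit_with_selection <- function(data) {
      step(glm(y ~ x1 + x2 + x3, family = binomial, data = data),
           direction = "backward", trace = 0)
    }

    apparent   <- fit_with_selection(df)
    apparent_c <- c_stat(predict(apparent, type = "response"), df$y)

    optimism <- replicate(50, {
      boot <- df[sample(nrow(df), replace = TRUE), ]
      fit  <- fit_with_selection(boot)                      # selection repeated
      c_stat(predict(fit, type = "response"), boot$y) -
        c_stat(predict(fit, newdata = df, type = "response"), df$y)
    })

    apparent_c - mean(optimism)   # optimism-corrected c-statistic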

Age at onset, age at first recorded EDSS score, and the first recorded EDSS value were amongst the candidate predictors remaining in the final model. The study did not clearly define 'onset'; in this review, we interpreted 'onset' liberally as extending up to one year after the first symptoms. Given that the model was to be applied at onset, age at onset and age at first EDSS score should have been similar, if not equivalent. However, the first EDSS assessment occurred on average 6.5 years after onset, which means that the development included predictors available only after the intended time of model application. This use of unavailable predictors is even clearer here than in other analyses because of the use of survival analysis, in which the start and end of follow-up time are explicitly defined. Although EDSS may in general be available at onset, the estimated model would probably change due to the different range of EDSS scores actually seen at onset. The inclusion of predictors unavailable at the time of model application makes the model unusable, as such a predictor will, by definition, not be available when the model is applied to a future patient. It also inflates model performance, because a predictor measured more closely in time to the outcome is probably more strongly associated with the outcome (Moons 2019).

We rated the external validations at high risk of bias for similar reasons. Inclusion in the British Columbia validation analysis was based on data availability, rather than employing multiple imputation of missing values, and the observation frequency for outcome measurement varied across participants. Only participants with complete follow-up were included in the ACROSS validation analysis, even though a time-to-event analysis capable of dealing with censoring was employed, and this analysis included only 26 events, well below the recommended 100 events for validation studies. Only the FREEDOMS analysis used a clear definition of the time of conversion to SPMS, based on increased EDSS for at least six months. The three validation analyses assessed discrimination using Harrell's c-statistic but did not assess calibration, which is valuable to assess in external samples, not just in the development set.

Although the Manouchehrinia 2019 model was evaluated using three external datasets, all three of these validations were conducted by the same study team within the development publication. Sources of confusion related to timing in model development were propagated across all the validation analyses. We therefore did not consider these evaluations to be independent external validation studies and decided against performing a meta‐analysis.

Characteristics of included models

In total, we extracted data from 75 models developed in 54 studies (see Appendix 5 for details). Of these, 35 (47%) models were developed using traditional statistical methods and the remaining using ML methods. Of the studies that developed models, 42 (78%) contributed one model each, four (7%) contributed two models each (Gurevich 2009; Olesen 2019; Wottschel 2015; Ye 2020), seven (13%) contributed three models each (Bendfeldt 2019; de Groot 2009; Law 2019; Misicka 2020; Pinto 2020; Seccia 2020; Tacchella 2018), and Zhao 2020 contributed four models.

In the 12 studies from which multiple model developments were included, the models differed in the timing of the outcome measurement in five studies (42%) (Misicka 2020; Pinto 2020; Seccia 2020; Tacchella 2018; Wottschel 2015), in the modelling method in three (25%) (Gurevich 2009; Law 2019; Zhao 2020), in the outcome in two (17%) (de Groot 2009; Pinto 2020), in the considered predictors in four (33%) (Bendfeldt 2019; Olesen 2019; Ye 2020; Zhao 2020), and in the participant subset in one study (Bendfeldt 2019); in some studies, the models differed in more than one of these aspects.

We extracted data from 21 external or author‐reported validations in 15 studies. Of these studies, 11 (73%) contained one validation (Ahuja 2021; Bejarano 2011; Bergamaschi 2007; Calabrese 2013; Gurevich 2009; Lejeune 2021; Malpas 2020; Mandrioli 2008; Sormani 2007; Vasconcelos 2020; Weinshenker 1996), two (13%) contained two validations (Skoog 2019; Zhao 2020), and two (13%) contained three validations (Bergamaschi 2015; Manouchehrinia 2019).

Of all validations, 15 (71%) were external validations of 12 models (16%): 10 were externally validated once (Ahuja 2021 Dev; Calabrese 2013 Dev; Gurevich 2009 FLP Dev; Lejeune 2021 Dev; Malpas 2020 Dev; Mandrioli 2008 Dev; Skoog 2014 Dev; Sormani 2007 Dev; Vasconcelos 2020 Dev; Weinshenker 1991 M3 Dev), the model Bergamaschi 2001 BREMS Dev was externally validated twice in studies separate from the development but by the same research team, and the model Manouchehrinia 2019 Dev was externally validated three times in the same study of its development. The remaining six (29%) were other author‐reported validations (Bejarano 2011 Val; Bergamaschi 2015 BREMSO MSSS Val; Bergamaschi 2015 BREMSO SP Val; Skoog 2019 Val; Zhao 2020 LGBM Common Val; Zhao 2020 XGB Common Val). None of the validations were performed by researchers independent of that model’s development team.

Our main sources for data extraction were journal articles for 55 (96%) studies, a dissertation for Runia 2014, and a conference proceeding for Tousignant 2019. The number of published prognostic model studies, and of the analyses (both model developments and validations) they contain, has greatly increased in recent years (see Figure 2 left). Before 2001, two studies containing three analyses were published, whereas 36 studies containing 63 analyses have been published after 2015. Yet, there seems to be no discernible time trend in the number of published validations relative to the number of published developments. Recently, there has been an increase in the popularity of ML methods for prognostic prediction model development (see Figure 2 right).

Figure 2. Publication characteristics by year. Left: number of included studies (black outline) and the model developments (blue)/validations (orange) they contain by year of publication; right: number of included developments using traditional (dark blue) or machine learning (yellow) methods by year of publication. Data for the year 2021 are incomplete (only until July). ML: machine learning.

Data source

As a data source for model development or validation, 37 (39%) analyses used cohort studies, 18 (19%) used routine care sources, 14 (15%) used randomised trial participants, and 13 (14%) used disease registries. Four (4%) analyses used a combination of these: Ahuja 2021 used cohort study and routine care data (electronic health records), Kosa 2022 used cohort study and case-control study data, Bergamaschi 2001 used registry and routine care data, and Kuceyeski 2018 used cohort study, registry, and routine care data. The source of data was not reported or unclear for 10 (10%) analyses (Gurevich 2009 FLP Dev; Gurevich 2009 FLP Ext Val; Gurevich 2009 FTP; Sombekke 2010; Tommasin 2021; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val; Ye 2020 gene signature; Ye 2020 nomogram; Zakharov 2013).

The data used in the analyses were collected to conduct prognostic research in 27 (28%) analyses. In 61 (64%) analyses, data use was secondary, i.e. the data were collected for other reasons but were then repurposed. The data collection purpose for the remaining eight (8%) analyses was either unclear or not reported (Borras 2016; Gurevich 2009 FLP Dev; Gurevich 2009 FLP Ext Val; Gurevich 2009 FTP; Martinelli 2017; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val; Zakharov 2013).

Of the 21 validations, the participants in the validation differed from those in the model development in terms of location in six (29%) (Bejarano 2011 Val; Bergamaschi 2007 BREMS Ext Val; Malpas 2020 Ext Val; Weinshenker 1996 M3 Ext Val; Zhao 2020 LGBM Common Val; Zhao 2020 XGB Common Val), in terms of time in three (14%) (Calabrese 2013 Ext Val; Mandrioli 2008 Ext Val; Vasconcelos 2020 Ext Val), and in terms of patient spectrum in two (10%) (Ahuja 2021 Ext Val; Sormani 2007 Ext Val). The difference was in multiple dimensions in eight (38%) validations: in both location and time (Bergamaschi 2015 BREMS Ext Val; Bergamaschi 2015 BREMSO MSSS Val; Bergamaschi 2015 BREMSO SP Val; Skoog 2019 Ext Val), in location, time, and spectrum (Manouchehrinia 2019 Ext Val 1; Manouchehrinia 2019 Ext Val 2; Manouchehrinia 2019 Ext Val 3), and in location and spectrum (Lejeune 2021 Ext Val). The difference between the validation and derivation cohorts was not explicitly reported in Gurevich 2009 FLP Ext Val, and Skoog 2019 Val was a further evaluation of the model in a subset of the derivation cohort.

Participants

The participants were recruited from a single site in 54 (56%) analyses and from multiple sites in 40 (42%) analyses; the number of sites was not reported for the remaining analyses. Of the 90 (94%) analyses for which the country of participant recruitment could be extracted, recruitment was from centres in Europe in 68 (76%), in North America in 28 (31%), in Asia in 13 (14%), in South America in seven (8%), in Oceania in five (6%), and in Africa (South Africa) in a single analysis (Pellegrini 2019).

In the 86 (90%) analyses for which summary statistics on sex could be extracted (including studies, e.g. Zhao 2020, that did not report the characteristics of the included sample but provided references describing the source population), the proportion of females ranged from 50% (Rocca 2017; Rovaris 2006) to 100% (Vukusic 2004), with a median (IQR) of 69% (65% to 73%). The distribution of sex in the included analyses did not seem to vary by category of the predicted outcome (Figure 3 top left).

Figure 3. Participant characteristics in included analyses by outcome. Top left: percentage of females; middle left: measure of centre of disease duration in years; bottom left: measure of centre of age in years as reported at disease onset or at the time of analysis; top right: diagnostic criteria by publication year per outcome; middle right: diagnostic subtype by publication year; bottom right: percent treated by publication year measured at baseline or during follow‐up. Data for the year 2021 are incomplete (only until July). CDMS: conversion to clinically definite MS; CPMS: conversion to progressive MS.

The reported measure of centre (i.e. mean or median) for participant disease duration in 54 (56%) analyses ranged from 0.1 years in participants all diagnosed with CIS (Wottschel 2015) to 19 years in participants all diagnosed with RRMS (Seccia 2020). As expected, participants included in the analyses of conversion to progressive MS had been living with an MS diagnosis for longer than those included in the analyses of conversion to definite MS, who had had their first symptoms very recently (Figure 3 middle left).

In 87 (91%) analyses that reported age, the reported measure of centre ranged from 24.8 years (Bergamaschi 2007 BREMS Ext Val) to 51.3 years (Rocca 2017; Rovaris 2006). Included in this summary are 40 (42%) analyses that did not specify the time point of measurement, 32 (33%) analyses that reported age at disease onset, and 15 (16%) analyses that reported age at an unclear time or for the source population. Participants were older at the time of the analyses with disability-related or composite outcomes than in those with relapse or diagnostic conversion outcomes. The distribution of age at onset did not appear to vary by category of the predicted outcome (Figure 3 bottom left).

Of those 84 (88%) analyses that clearly reported the diagnostic subtype of the included participants, participants of a single subtype were recruited in 62 (74%): CIS in 17 (20%), RRMS in 40 (48%), PPMS in two (Rocca 2017; Rovaris 2006) analyses, and SPMS in three models from a single study (Law 2019 Ada; Law 2019 DT; Law 2019 RF). Participants with a mixture of the aforementioned diagnoses were included in eight (10%) analyses (Agosta 2006; Bejarano 2011 Val; Kosa 2022; Montolio 2021; Sombekke 2010; Szilasiová 2020; Vukusic 2004; Yperman 2020). The remaining 14 (17%) used a different diagnostic subtyping in describing the mixture of their participants. The models developed in participants with primary or secondary progressive subtypes were predicting disability outcomes. As expected, all models predicting conversion to definite MS were developed in participants with CIS and all models predicting conversion to progressive MS were developed in participants with RRMS (Figure 3 middle right).

Of those 68 (71%) analyses that clearly reported the diagnostic criteria at recruitment, 13 (19%) used a mixture of different criteria. Overall, 18 (26%) used Poser 1983, two (3%) used Thompson 2000, 41 (60%) used one or more versions of the McDonald criteria (17 used 2001 (McDonald 2001), 11 used 2005 (Polman 2005), 10 used 2010 (Polman 2011), six used 2017 (Thompson 2018b), and three used an unspecified version), seven (10%) analyses used their own definition (Bendfeldt 2019 Linear Placebo; Bendfeldt 2019 M7 Placebo; Bendfeldt 2019 M9 IFN; Law 2019 Ada; Law 2019 DT; Law 2019 RF; Runia 2014), and Olesen 2019 used criteria other than those mentioned above (Optic Neuritis Study Group 1991). The changes in diagnostic criteria are reflected in the diversification of the criteria used over time (Figure 3 top right). Although newer criteria are increasingly used, some studies published after 2015 were conducted in participants diagnosed with McDonald 2001 (Manouchehrinia 2019 Ext Val 2; Montolio 2021; Pellegrini 2019; Szilasiová 2020; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val; Ye 2020 gene signature; Ye 2020 nomogram) or even Poser 1983 (Manouchehrinia 2019 Ext Val 1; Skoog 2019 Ext Val; Skoog 2019 Val; Spelman 2017; Vasconcelos 2020 Dev; Vasconcelos 2020 Ext Val).

In the 45 (47%) analyses with clear reporting, the proportion of participants on treatment at recruitment ranged from 0% in 28 analyses to 100% (Calabrese 2013 Dev; Calabrese 2013 Ext Val; Pisani 2021), with a median (IQR) of 0% (0% to 10%); at least one participant was on treatment at recruitment in 17 (38%) analyses. In the 37 (39%) analyses with clear reporting, the proportion of participants on treatment during follow-up ranged from 0% in 11 analyses to 100% (Bendfeldt 2019 M9 IFN; Calabrese 2013 Dev; Calabrese 2013 Ext Val; Manouchehrinia 2019 Ext Val 2; Manouchehrinia 2019 Ext Val 3; Oprea 2020; Pisani 2021; Szilasiová 2020), with a median (IQR) of 35% (0% to 68%). When the analyses reporting treatment or its timing unclearly are included, the median (IQR, range) proportion of participants receiving treatment during follow-up becomes 50% (12% to 73%, 0% to 100%) in 58 analyses; at least one participant was on treatment in 47 (81%) of these. As expected, the proportion of participants receiving treatment was higher during follow-up than at recruitment. This trend is especially visible in the analyses published during the last 15 years. Regardless of the time point of measurement, the proportion treated increases with publication year (Figure 3 bottom right). It should be noted that some analyses were conducted on data from RCT arms, which partly explains proportions of 0% and 100% treated during follow-up.

Year of observation start ranged from 1972 (Weinshenker 1991 M3 Dev) to 2014 (Brichetto 2020; Olesen 2019 Candidate; Olesen 2019 Routine) with a median of 2003 in 55 (57%) analyses clearly reporting when data collection or recruitment started. Year of observation end ranged from 1984 (Weinshenker 1991 M3 Dev) to 2021 (Kosa 2022) with a median of 2013 in 50 (52%) analyses clearly reporting when data collection or recruitment ended.

In 46 (48%) analyses that clearly reported both of the items, the median (IQR, range) duration of data collection was 7 (3 to 12, 0 to 33) years.

Outcomes

Although definitions in individual analyses might differ, we categorised the outcomes into one of the following domains in line with our PICOTS: disability, relapse, conversion to clinically definite MS, and conversion to progressive MS. Composite outcomes containing any one of the above were also included and categorised separately.

Disability progression

Of the 96 analyses, 31 model developments and eight validations (41%) defined outcomes related to disability progression. Most of these, 33 (85%), operationalised it using the EDSS, two using the DSS (Weinshenker 1991 M3 Dev; Weinshenker 1996 M3 Ext Val), and two using the MS Severity Score (MSSS) (Bergamaschi 2015 BREMSO MSSS Val; Sombekke 2010), a measure derived from the EDSS. The most common EDSS-based outcome definition, used in nine analyses, was disability progression or clinical worsening (sometimes confirmed, sometimes not) based on an increase in EDSS (at least 1 point increase if EDSS < 6 and at least 0.5 point increase if EDSS > 5.5). Other outcomes defined by different levels of, or simply change in, EDSS included aggressive disease, severe MS, worsening, and residual disability after relapse. Apart from (E)DSS-based outcomes, two analyses used other measures of disability: Kuceyeski 2018 used the SDMT to measure cognitive disability and de Groot 2009 Dexterity used the 9-Hole Peg Test (9-HPT). Many of the analyses with outcomes based on disability, 22 (56%), were in participants with a mixture of diagnostic subtypes, 11 (28%) were only in RRMS participants, three (8%) were only in SPMS participants (Law 2019 Ada; Law 2019 DT; Law 2019 RF), two (5%) were only in PPMS participants (Rocca 2017; Rovaris 2006), and Roca 2020 did not report the diagnostic subtype of the participants. In those analyses that defined the timing of measurement, disease progression was measured at the earliest six months (as residual disability after relapse in Lejeune 2021) and at the latest 15 years (as EDSS score ≥ 5.0 in Szilasiová 2020) after the intended time of prognostication. In those analyses that did not specify the timing of measurement or had time-to-event outcomes, either the follow-up or the time of outcome occurrence was described as being at the earliest 5.25 years (as clinically worsened in Rovaris 2006) and at the latest 55 years (as mild MS in Bergamaschi 2015) after the intended time of prognostication.

Relapse

Six model developments and two validations (8%) defined outcomes based on relapses: the model developed in Sormani 2007 and its external validation were in participants with an RRMS diagnosis (Sormani 2007 Dev; Sormani 2007 Ext Val), the analyses in Gurevich 2009 and Ye 2020 used data from a mixture of participants with CIS and clinically definite MS (Gurevich 2009 FLP Dev; Gurevich 2009 FLP Ext Val; Gurevich 2009 FTP; Ye 2020 gene signature; Ye 2020 nomogram), and the model in Vukusic 2004 was developed in a mixture of participants with RRMS and SPMS. The relapse outcome in Vukusic 2004 had a fixed time of measurement: three months after the intended time of prognostication, which was child delivery. Relapse was conceptualised as a time-to-event outcome in the analyses other than Vukusic 2004, and the follow-up was described as being at the earliest 10 months (Sormani 2007 Dev) and at the latest 16 months (Sormani 2007 Ext Val) after the intended time of prognostication.

Conversion to a more advanced disease subtype

Seventeen (18%) model developments defined outcomes of conversion to definite MS in participants with CIS. When defining definite MS, five (29%) of these referred to McDonald 2010 (Polman 2011) (Aghdam 2021; Olesen 2019 Candidate; Olesen 2019 Routine; Zakharov 2013; Zhang 2019), Yoo 2019 referred to McDonald 2005 (Polman 2005), four (24%) referred to Poser 1983 (Gout 2011; Martinelli 2017; Runia 2014; Spelman 2017), and Bendfeldt 2019 referred to modified Poser criteria (Bakshi 2005). Of the remaining analyses, Borras 2016 provided a definition of definite MS that included the Barkhof criteria, whereas no criteria were cited in Wottschel 2015 or Wottschel 2019. In those analyses that defined the timing of measurement, conversion to definite MS was measured at the earliest one year (Wottschel 2015 one year; Wottschel 2019) and at the latest three years (Wottschel 2015 three years; Zhang 2019) after the intended time of prognostication. In those analyses that did not specify the timing of measurement or had time-to-event outcomes, either the follow-up or the time of outcome occurrence was described as being at the earliest 3.4 years (Olesen 2019) and at the latest 12.7 years (Gout 2011) after the intended time of prognostication.

Seventeen model developments and 10 validations (28%) defined outcomes of conversion to progressive MS in participants with RRMS, except for Brichetto 2020 in which the diagnostic subtype of the model development population is unclear. When describing the secondary progression outcome, seven (26%) of these analyses referred to Lublin 1996 (Manouchehrinia 2019 Dev; Manouchehrinia 2019 Ext Val 1; Manouchehrinia 2019 Ext Val 2; Pisani 2021; Skoog 2014 Dev; Skoog 2019 Ext Val; Skoog 2019 Val), 12 (44%) did not cite any criteria but provided an outcome definition based on EDSS, and eight (30%) neither cited criteria nor provided an operationalised definition of the outcome (Brichetto 2020; Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever; Pinto 2020 SP; Seccia 2020 180 days; Seccia 2020 360 days; Seccia 2020 720 days). In those analyses that defined the timing of measurement, conversion to progressive MS was measured at the earliest at six months (Seccia 2020 180 days; Tacchella 2018 180 days) and at the latest five years (Calabrese 2013) after the intended time of prognostication. In those analyses that did not specify the timing of measurement or had time‐to‐event outcomes, either the follow‐up or the time of outcome occurrence were described as being at the earliest 10 years (Misicka 2020 10 years) and at the latest 56.7 years (Skoog 2014) after the intended time of prognostication.

Composite

Finally, the remaining four model developments and one external validation (5%) had composite outcomes. Ahuja 2021 defined a relapse outcome as a clinical and/or a radiological event at one year in a participant group of mixed diagnostic subtypes. Kosa 2022 defined a model‐based outcome that included disability and imaging components in a participant group of mixed diagnostic subtypes, with no clear timing of measurement. de Groot 2009 defined a cognitive disability outcome within three years based on multiple clinical test results in a participant group of mixed diagnostic subtypes. Pellegrini 2019 defined a disability outcome based on multiple clinical scale or test results measured within two years in people with RRMS.

The numbers of developments and validations for the different outcome types by publication year are shown in Figure 4. There seems to be an increased interest in publishing models developed to predict diagnostic conversion to a more advanced disease state (definite MS or progressive MS) during the last decade. This might be related to the changing diagnostic criteria and a willingness to predict conversion as defined by the newly established criteria. Interestingly, while the models predicting conversion to progressive MS were validated the most in terms of their relative frequency, there were no validations of models predicting conversion to definite MS.

Figure 4. Outcomes in included analyses by year of publication. Left: categories in model developments; right: categories in model validations. Data for the year 2021 are incomplete (only until July). CDMS: conversion to clinically definite MS; CPMS: conversion to progressive MS.

Predictors
Predictor domains

Demographic predictors were considered for inclusion in 65 (87%) of the 75 model developments and finally included in 49 (65%) models. Predictors related to disability scores or tests were considered for inclusion in 56 (75%) and finally included in 38 (51%) models. Predictors related to symptoms (relapses) were considered for inclusion in 55 (73%) and finally included in 37 (49%) models. Predictors derived from analyses of MR images were considered for inclusion in 42 (56%) and finally included in 36 (48%) models. Of the 27 models developed in participants not confined to a single diagnostic subtype, 17 (63%) considered diagnostic categories as predictors and nine (33%) finally included them in the model. Predictors related to MS treatment were considered for inclusion in 15 (20%) and finally included in nine (12%) models. Predictors from molecular analysis of proteins, transcripts, or genes were considered for inclusion in 10 (13%) and finally included in all of those models. Predictors derived from cerebrospinal fluid (CSF) analysis were considered for inclusion in 10 (13%) and finally included in seven (9%) models. Electrophysiological predictors were considered for inclusion in five (7%) and finally included in four (5%) models. Serum 25-OH-vitamin D (Runia 2014) was the only laboratory parameter not derived from CSF to be considered, but it was not selected in the final model. Only Aghdam 2021 considered a predictor from the environmental domain, season of attack (spring versus other), for inclusion, but it was not selected in the final model either.

The proportion of model developments that considered each predictor domain, by publication year, is presented in Figure 5 (top). More recent models seem to increasingly consider para-clinical predictors, such as those derived from the analysis of imaging, CSF, omics, and electrophysiological tests. This may be related to increasing interest in these biomarkers as prognostic factors, which is sometimes the main focus of the included studies, and to the increased availability of technological means to collect and analyse them. The consideration of MS treatment in prognostic model developments also shows an expected increase over time as treatment options multiply and become widespread. Predictor domains considered and included in individual models are presented in Appendix 5.

Figure 5. Predictors in included models. Top: percent of models considering each predictor domain by year of publication; bottom left: number of models with selection considering (light blue) and including (dark blue) each predictor domain; bottom right: number of considered and included predictors (on log-2 scale) per modelling method by publication year. Shaded regions depict the predictor number range. Data for the year 2021 are incomplete (only until July). CSF: cerebrospinal fluid, ML: machine learning.

Most of the model developments (71%) considered between three and five of the 11 domains reported above. Figure 5 (bottom left) compares the frequency of consideration and inclusion of predictor domains in the 47 (63%) models that considered more than one domain for inclusion and involved predictor selection. When considered, para-clinical biomarkers from the domains of imaging, omics, CSF, and electrophysiology seem to be included more frequently than predictors from other domains. There are probably two explanations for this observation. First, authors considering these predictors in a prognostic model are likely to be interested in them and to select a final model that contains them (e.g. Martinelli 2017). Second, the number of possible predictors that can be derived from these measurements is high; predictors from these domains therefore tend to outnumber those from other domains and to survive a selection procedure (e.g. Gurevich 2009).

Other predictors

Predictors that were considered for inclusion in a total of 28 (37%) developments from 18 studies, but that do not fit any of the above categories, were: administrative (duration of follow‐up, seen at onset, annualised visit density, hospitalisation, scanner, study identifier, presence of specific medical assessments, country, MRI site), medical history related (co‐treatment, concomitant diseases, procedures), pregnancy and post‐partum related, patient‐reported outcomes or symptoms, disability not measured by scores or tests, and output of another predictive model. All of these predictors were considered in only single studies except follow‐up time (six studies) and pregnancy (two studies).

Number of predictors

The number of considered predictors (in degrees of freedom) ranged from two (Zakharov 2013) to 852,167 (Kosa 2022) with a median (IQR) of 23 (12.5 to 124) in 67 (89%) model developments (20 of which reported it unclearly). In seven (9%) model developments, neural network algorithms were used with raw/unsummarised imaging or longitudinal data (De Brouwer 2021; Roca 2020; Seccia 2020 180 days; Seccia 2020 360 days; Seccia 2020 720 days; Tousignant 2019; Yoo 2019), making predictor number counts irrelevant. In Bendfeldt 2019 Linear Placebo, the number of considered predictors in the support vector machine (SVM) model was unclear and is reported as the number of voxels in MRI images.

The number of predictors included in the final models (in degrees of freedom) ranged from two (Borras 2016; Sormani 2007 Dev; Zakharov 2013) to 703 (Kuceyeski 2018) with a median (IQR) of 6.5 (4 to 11.5) in 64 (85%) models (17 of which were unclear). For four (5%) model developments (Bendfeldt 2019 Linear Placebo; Pinto 2020 Severity 10 years; Pinto 2020 Severity 6 years; Pinto 2020 SP), there was insufficient information on the number of predictors in the final model.

The numbers of predictors considered for and included in the final models were both clear for 31 (41%) model developments. Of these, 10 (32%) had no predictor selection, so the numbers before and at the end of the modelling process were equal. In the remaining 21 developments, the difference between the number of considered and included predictors ranged from one (de Groot 2009 Cognitive; Olesen 2019 Routine) to 201 (Ye 2020 nomogram) with a median (IQR) of 14 (1 to 28), and the median (IQR) percent decrease in the number of predictors from considered to included was 77% (40% to 81%). The numbers of considered and included predictors by algorithm type are presented in Figure 5 on the log2 scale. As expected, and independent of time, models developed using ML methods seem to both consider and include higher numbers of predictors than those using traditional methods. There also seems to be a slight increase over time in the number of considered predictors for models developed with traditional statistics and in the number of included predictors in models developed with ML methods.

Bergamaschi 2015 was the only study in which the set of predictors was different in the validation than in the original model, BREMS, which had nine predictors (developed in Bergamaschi 2001 and initially evaluated in Bergamaschi 2007). Two predictors that were measured within one year of disease onset were dropped from the model without refitting, resulting in BREMSO with seven predictors.

Predictor handling

In four (5%) of the 75 models, at least one interaction between predictors was considered during development. In eight (11%) models, no interactions were considered during development. Modelling methods, e.g. random forests, that intrinsically accounted for interactions were used in 31 (41%) model developments. For the remaining 32 (43%) models, it was not reported if interactions were considered or not during development. During the development of 44 (59%) models, there was no evidence of categorisation of continuous predictors. During the development of 17 (23%) models at least one predictor was dichotomised or categorised. How the predictors were handled was unclear in 13 (17%) model developments. There was insufficient information to deduce how the predictors were handled during development in Zakharov 2013.

Timing of candidate predictor measurement was described as 'at disease onset' in 20 (27%) models. The predictors were measured at study baseline in 17 (23%) models using data from RCT or cohort studies. At least 13 (17%) models considered predictors measured at multiple visits, and at least the models in Misicka 2020 and Oprea 2020 were based on predictor and outcome data collected at a single time point (Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever; Oprea 2020).

Sample size and missing data

In model developments, sample size ranged from 33 participants (Olesen 2019) to 8825 participants (Manouchehrinia 2019), with a median size (IQR) of 186 (84 to 664) participants. There were six developments (8%, from four studies) with visits as the unit of analysis. In these studies, visits per participant over time were treated as independent observations. The number of visits ranged from 527 (Tacchella 2018) to 2502 (Yperman 2020). Event number was not relevant for seven (9%) developments with continuous outcomes (Bejarano 2011; Gurevich 2009; Kosa 2022; Kuceyeski 2018; Margaritella 2012; Roca 2020; Rocca 2017). The remaining developments analysed a median of 80 events (IQR 37 to 165 events, range 16 to 1953), but five values were unclearly reported.

There were three developments that considered raw/unsummarised imaging data: while Tousignant 2019 considered only imaging data, Roca 2020 and Yoo 2019 both also considered summary predictors, such as lesion load/volume, as well as patient demographics. For these studies, the maximum events per variable (EPV) was calculated excluding the raw/unsummarised imaging data. The EPV could not be computed for Tousignant 2019 due to the predictor type and for one development (Bendfeldt 2019 Linear Placebo) due to the missing number of considered predictors. The median EPV in the remaining 73 developments was 3.9 (IQR 1 to 9.9, range 0.0002 to 122.1); however, the precise EPV was unclear in 22 developments, for which the largest, i.e. most optimistic, EPV possible based on the reported information was used. Of the 73 developments for which the EPV could be computed, 17 (23%) had an EPV of 10 or greater and seven (10%) had an EPV of 20 or greater, respectively the older and the more recent rule-of-thumb thresholds for the minimum EPV needed for prediction model development.
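
For clarity, the EPV is simply the number of outcome events divided by the number of candidate predictor parameters (degrees of freedom), as in the following sketch with hypothetical numbers.

    # Events per variable (EPV): outcome events divided by candidate predictor
    # degrees of freedom. Numbers are hypothetical.
    n_events               <- 80
    n_candidate_parameters <- 23

    n_events / n_candidate_parameters   # about 3.5, below the rule of thumb of 10 to 20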

Model validations included from 10 (Gurevich 2009) to 14,211 participants (Bergamaschi 2015) with a median (IQR) of 217 participants (136 to 700 participants). The number of events was not reported in four validations (19%) and not relevant in one study due to a continuous outcome. Validation in Ahuja 2021 was at the observation level, with an unreported number of observations from 186 participants. The median number of events in model validation was 76 (IQR 33 to 130, range 19 to 3567), below the 100 event minimum suggested by PROBAST. Only seven (44%) of the 16 validations with clear reporting included at least the minimum recommended number of events.

The most common method for handling missing data, employed in 35 (36%) analyses, was to exclude participants from the study if data were missing for specific or any variables. Complete case analysis was used in 26 (27%) analyses. Predictors, instead of participants, with missing data were excluded from nine analyses (9%) found in two studies (Seccia 2020; Zhao 2020). Imputation was reported for 18 analyses (19%), but only five of these reported using multiple imputation. Multiple methods for dealing with missing data were often combined, as reported in 25 (26%) analyses. The method of handling missing data was not reported for 25 (26%) analyses. Although reporting on the number of participants with missing data was often unclear, it was clear that, when using routine care or registry data, hundreds, even thousands, of participants were excluded from analysis due to missing or error‐prone data. De Brouwer 2021, for example, describe the exclusion steps that brought the MSBase analysis set from 55,409 participants down to 6682 participants, an 88% drop.
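
As a hedged illustration of the multiple imputation approach that only five analyses reported, the following sketch imputes a simulated dataset with the mice package and pools the analysis results using Rubin's rules; it is not drawn from any included study.

    # Illustrative multiple imputation with the mice package on simulated data;
    # analyses are run in each imputed dataset and pooled with Rubin's rules.
    library(mice)

    set.seed(2)
    df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
    df$y <- rbinom(100, 1, plogis(0.6 * df$x1 - 0.4 * df$x2))
    df$x2[sample(100, 20)] <- NA            # introduce missing predictor values

    imp    <- mice(df, m = 5, printFlag = FALSE)   # five imputed datasets
    fits   <- with(imp, glm(y ~ x1 + x2, family = binomial))
    pooled <- pool(fits)                           # Rubin's rules
    summary(pooled)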

Model development

We identified 34 models developed using traditional statistical methods (45%), 40 models developed using ML methods (53%), and one model that selected predictors using ML but fit the final model using traditional statistical methods. The traditional statistical methods included 16 (46%) logistic regression models, 15 (43%) survival analyses (11 of them Cox models), one Bayesian model averaging model, and three (9%) linear regression models.

Of those using ML, three (8%) were developed using penalised regression (a LASSO penalty was applied to two logistic models and one Cox model), 10 (25%) using SVM, and 16 (40%) using tree‐based methods (two classification trees, nine random forests, and five using boosting). Of the random forest developments, one had a numeric outcome (Kosa 2022), and one a survival outcome (Pisani 2021). One model used partial least squares regression (Kuceyeski 2018). Another eight (20%) used neural networks and an additional two models were developed by combining ML methods.

The first identified development using ML was published in 2009. Gurevich 2009 used a multi‐class SVM to distinguish between three data‐driven categories of time until relapse. Two years later, Bejarano 2011 used a multilayer perceptron (a type of neural network) to predict change in EDSS. Since 2018, ML developments have been published in the literature every year and in increasing frequency (see Figure 2 right). As of the latest search in July 2021, only prediction model developments employing ML had been published in 2021. Please note that the decrease in number of identified prediction modelling developments in 2021 is at least partially due to the search covering only the first half of the year.

Univariable predictor selection was reported in 17 developments (23%), while this was unreported or unclear in four developments (5%). While 22 developments (29%) took a full model approach, multivariable predictor selection in the remaining developments took several forms. Of these, eight (11%) based selection on coefficient hypothesis testing, 18 (24%) employed stepwise selection, seven (9%) selected from several models with different predictor sets, two (3%) relied on the selection properties of LASSO penalised regression, and another four (5%; Montolio 2021; three models in Pinto 2020) used LASSO for selection but not for prediction. Other multivariable predictor selection methods were used in nine developments (12%), including Bayesian methods, variable importance ranking, minimal depth in tree‐based methods, frequency of selection within cross‐validation, and combinations of methods. For five developments (7%), the use of multivariable selection methods was unclear or not reported.

In one study, uniform shrinkage was applied to each of its three final developed models (de Groot 2009). Some amount of shrinkage was induced in 38 (51%) developments due to modelling methods, including Bayesian methods, penalised estimation, and other ML methods. No shrinkage was applied in 31 (41%) developments, and it was unclear for two (3%) developments.
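For readers unfamiliar with uniform shrinkage, the following is a minimal sketch, assuming a simple logistic model on simulated data: the estimated coefficients are multiplied by a shrinkage factor and the intercept is then re-estimated with the shrunken linear predictor fixed as an offset, so that the average predicted risk again matches the observed event rate. The data, model, and shrinkage factor are all hypothetical.

```python
# Hedged sketch of uniform shrinkage for a logistic prediction model.
# Data, model, and shrinkage factor are hypothetical.
from scipy.optimize import brentq
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=1)

fit = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)  # large C ~ unpenalised fit
shrinkage_factor = 0.85                                   # e.g. bootstrap- or heuristic-based
shrunk_coefs = shrinkage_factor * fit.coef_.ravel()

# Re-estimate the intercept with the shrunken linear predictor fixed as an offset:
# the maximum likelihood intercept solves mean(expit(b0 + offset)) == mean(y).
offset = X @ shrunk_coefs
new_intercept = brentq(lambda b0: expit(b0 + offset).mean() - y.mean(), -20, 20)

print("shrunken coefficients:", shrunk_coefs)
print("re-estimated intercept:", round(new_intercept, 3))
```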

Of the 41 developments involving tuning parameters, 19 (46%) mentioned specific tuning parameters and the method used to tune them. There were six (15%) models from two studies for which the use of software defaults was reported upon correspondence (Pinto 2020; Tacchella 2018). Details were unclear in 10 (24%) developments in which tuning for parameters unrelated to the ML algorithm was mentioned but algorithm‐specific tuning was not. There was no reporting related to tuning in six model developments (15%).

Model performance and evaluation
Internal validation methods

Of the 73 model performance evaluations using development data, two evaluations relate to a single model assessed using development data in both the development study and a later validation study (Skoog 2014; Skoog 2019). There was a single development study in which model performance was only evaluated on an external validation set (Ahuja 2021). There were an additional two development studies in which model performance was not evaluated (Bergamaschi 2001; Weinshenker 1991). These were the two studies included because their models were evaluated as prediction models in later studies. Apparent performance was reported in 16 (22%) internal validations and a single random split of the data was used in nine (12%) evaluations. Cross‐validation and bootstrap procedures, preferred approaches to internal validation, were conducted 34 (47%) and 10 (14%) times, respectively. Methods were unclear in four (5%) internal validations, in which bootstrap methods were used for some purpose during development, but not clearly for performance evaluation. The number of bootstrap samples ranged from 200 (Manouchehrinia 2019) to 1500 (Spelman 2017). Leave‐one‐out cross‐validation was reported 10 times, while k‐fold cross‐validation was reported 18 times, with the number of folds varying between 2 (Wottschel 2019) and 10 (Bejarano 2011; Law 2019; Montolio 2021; Pinto 2020; Zhao 2020). Additionally, Wottschel 2019 assessed the influence of cross‐validation methods on classification performance estimates by comparing 2‐fold, 5‐fold, 10‐fold, and leave‐one‐out cross‐validation. Cross‐validation based on leaving a percentage of the data out or based on shuffle split was reported in another five evaluations.
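To illustrate the bootstrap approach mentioned above, the following is a minimal sketch of optimism-corrected internal validation: the modelling procedure is repeated in each bootstrap sample, and the average optimism (bootstrap-sample performance minus performance of the bootstrap model on the original data) is subtracted from the apparent AUC. The data and model are simulated and purely illustrative, not taken from any included study.

```python
# Hedged sketch of bootstrap optimism correction for the AUC; data and model are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

def fit_and_auc(X_fit, y_fit, X_eval, y_eval):
    model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    return roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])

apparent_auc = fit_and_auc(X, y, X, y)

optimism = []
for _ in range(200):                       # number of bootstrap samples
    idx = rng.integers(0, len(y), len(y))  # resample participants with replacement
    auc_boot = fit_and_auc(X[idx], y[idx], X[idx], y[idx])  # bootstrap-apparent AUC
    auc_orig = fit_and_auc(X[idx], y[idx], X, y)            # bootstrap model on original data
    optimism.append(auc_boot - auc_orig)

corrected_auc = apparent_auc - np.mean(optimism)
print(f"apparent AUC = {apparent_auc:.3f}, optimism-corrected AUC = {corrected_auc:.3f}")
```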

Performance measures

There were a total of 93 model performance evaluations, 24 (26%) of which reported on model calibration, either with a plot, table, measure, or several of these. There were 16 evaluations with calibration plots, four with O:E tables, two with histograms or bar plots depicting differences between observed and predicted outcomes in some way, and one table of observed event frequencies across score levels. Calibration slopes were reported in four evaluations from two studies and the P value from the Hosmer‐Lemeshow test in five from four studies. The Gronnesby and Borgan test P value was reported for one model evaluation, and the mean squared error was reported once. The O:E ratio was reported twice in one study in order to provide a recalibration factor based on the development data and on external validation data (Skoog 2019).

We had intended to compute O:E ratios from reported information; however, the expected number of events or other expected outcomes were only rarely reported. Both of the evaluations rated at low risk of bias in the analysis domain assessed calibration in some way. Pellegrini 2019 presented bootstrap‐corrected calibration slopes of 1.08 (SE 0.17) and 0.97 (SE 0.15) for one‐ and two‐year composite disease progression outcomes. De Brouwer 2021 reported the use of Platt scaling on their deep learning model for EDSS‐based disease progression at two years. This procedure maps the output of the classification model to predicted probabilities via logistic regression (Platt 1999). Upon correspondence, the authors of De Brouwer 2021 provided a calibration plot with no evidence of major departures from calibration.
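To make the calibration summaries discussed above concrete, the following is a minimal sketch computing an O:E ratio and a calibration slope from simulated predicted risks and outcomes; none of the values relate to any included study.

```python
# Hedged sketch of two calibration summaries: the observed:expected (O:E) ratio and
# the calibration slope. Predicted risks and outcomes are simulated for illustration.
import numpy as np
import statsmodels.api as sm
from scipy.special import logit

rng = np.random.default_rng(1)
p_pred = rng.uniform(0.05, 0.95, size=500)              # predicted risks from some model
y_obs = rng.binomial(1, np.clip(p_pred * 1.1, 0, 1))    # simulated observed outcomes

# O:E ratio: observed events divided by the sum of predicted risks (expected events)
oe_ratio = y_obs.sum() / p_pred.sum()

# Calibration slope: logistic regression of the outcome on the linear predictor
lp = logit(p_pred)
slope_fit = sm.GLM(y_obs, sm.add_constant(lp), family=sm.families.Binomial()).fit()
calibration_slope = slope_fit.params[1]

print(f"O:E ratio = {oe_ratio:.2f}, calibration slope = {calibration_slope:.2f}")
```

Platt scaling, as used by De Brouwer 2021, likewise fits a logistic regression, but to a classifier's raw scores rather than to an existing linear predictor, in order to obtain calibrated probabilities.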

There were 85 evaluations for which discrimination and classification measures were applicable (models with survival or binary outcomes). A c‐statistic was reported in 47 (55%) of them, 31 (36%) with some measure of uncertainty. Reporting was unclear about the use of a c‐statistic for survival data in three evaluations (Runia 2014; Spelman 2017; Ye 2020 gene signature). Reported c‐statistics ranged from a minimum of 0.59 (Pellegrini 2019; Ye 2020) to a maximum of 0.92 (Pisani 2021; Tommasin 2021), with a median of 0.77 (IQR 0.71 to 0.82). Both of the evaluations rated at low risk of bias in the analysis domain reported c‐statistics below the median observed across the literature: Pellegrini 2019 reported the minimum c‐statistic of 0.59 and De Brouwer 2021 reported a c‐statistic of 0.66.

Classification measures were reported in 49 (58%) evaluations of survival or classification models. These evaluations reported accuracy or error measures 36 times, sensitivity or specificity 43 times, positive or negative predictive values 21 times, and other measures such as the F1 score eight times. Nine evaluations reported using 0.5 as the threshold value for estimating classification performance and seven reported classification measures for more than one threshold value. Another three used some percentile of the data and nine used data‐driven methods to identify an optimal threshold. Classification measures were also applied to models of continuous outcomes in five evaluations (Bejarano 2011 Dev; Bejarano 2011 Val; Gurevich 2009 FTP; Margaritella 2012; Rocca 2017); the threshold value was unclearly reported for two evaluations from one study (Bejarano 2011), and the other three evaluations used some window around the observed value to be predicted as a threshold.
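Because the threshold choice drives the classification measures reported above, the following is a minimal sketch contrasting sensitivity and specificity at a fixed 0.5 cut-off with a data-driven (Youden-index) cut-off; the outcomes and predicted risks are simulated and purely illustrative.

```python
# Hedged sketch: sensitivity/specificity at a fixed 0.5 threshold versus a
# data-driven (Youden index) threshold. Data are simulated for illustration.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(2)
y = rng.binomial(1, 0.3, size=500)
# Simulated predicted risks that are informative but imperfect
p = np.clip(0.3 + 0.3 * (y - 0.3) + rng.normal(0, 0.2, size=500), 0.01, 0.99)

def sens_spec(y_true, p_pred, threshold):
    pred_pos = p_pred >= threshold
    sens = (pred_pos & (y_true == 1)).sum() / (y_true == 1).sum()
    spec = (~pred_pos & (y_true == 0)).sum() / (y_true == 0).sum()
    return sens, spec

print("at 0.5:", sens_spec(y, p, 0.5))

# Youden index: the threshold maximising sensitivity + specificity - 1 on the same data
fpr, tpr, thresholds = roc_curve(y, p)
youden_threshold = thresholds[np.argmax(tpr - fpr)]
print("Youden threshold:", round(youden_threshold, 2), sens_spec(y, p, youden_threshold))
```

Note that choosing the threshold on the same data used for evaluation, as in this sketch, adds a further source of optimism.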

Model presentation

Models were presented in various ways across studies and modelling methods. For the 35 developments using traditional regression methods for fitting, eight (23%) full regression models with intercepts/baseline hazards were presented and another eight (23%) regression models were presented without intercepts/baseline hazards or some other model coefficients. Three (9%) regression models were simplified into sum scores (Gout 2011; Runia 2014; Vasconcelos 2020), two of which were unweighted sums, i.e. predictor counts (Runia 2014; Vasconcelos 2020). The tools presented included eight nomograms (Manouchehrinia 2019; three in Misicka 2020; two in Olesen 2019; Spelman 2017; Ye 2020), three score charts (all in de Groot 2009), one web application (Skoog 2014), and one heat map (Borras 2016). Manouchehrinia 2019 additionally presented their nomogram as a web application and the web application of Skoog 2014 was updated using the shrinkage factor estimated in Skoog 2019. Two developments were described by lists of included predictors. Malpas 2020 presented a chart of relative risks associated with various combinations of predictors from their simplified model. Two (6%) developments based on traditional methods did not present the final model in any way (Oprea 2020; Zakharov 2013).

Of the 40 models fit using ML, only five (12%) reported tools allowing other users to make predictions for new people with MS. Lejeune 2021 presented a web application, Aghdam 2021 presented a decision tree, and Pisani 2021 presented a tool based on the sum of a heat map‐derived value and a formula weighted by predictor random forest minimal depths. The other two studies provided model coefficients from penalised regression without intercepts/baseline hazards (Ahuja 2021; Ye 2020). Other presentations included a bar chart of predictor weights from a linear SVM, although a non‐linear SVM was fit (Bendfeldt 2019); a further eight ML developments presented the final model only as a list of included predictors. Ten ML developments did not present the final model in any way. Independent of the model presentation described above, a total of 19 ML developments reported some measure of variable importance.

Model interpretation

Of the 57 studies included, 26 (46%) primarily aimed to predict clinical outcomes in individual patients, as indicated by mentioning the intent to create or assess a model or tool in their abstract, introduction, and discussion. In another 21 studies (37%), outcome prediction was an aim of the study; however, the focus appeared to be on other aspects of the study, such as predictors and modelling methods. Outcome prognostication in individuals was not the primary aim in 10 studies (18%), all of which were instead mainly interested in predictor identification or the usefulness of specific predictors. Forty‐three studies (75%) were presented as exploratory research, indicating some need for further development or validation, while 14 studies (25%) were presented with confirmatory conclusions, eight of which were not associated with any external validation.

We assessed the presence of information on study strengths and limitations, generalisability of results, and comparisons with other modelling studies for the 57 included studies. Most studies discussed their strengths and limitations (49 (86%) and 51 (89%) studies, respectively), and just over half of the studies (31, 54%) discussed the generalisability of their results; however, only 16 (28%) studies mentioned other models in their discussions. These comparisons with other models focused on the predictors and modelling methods used, rather than comparing model performance with that of other MS prognostic models with similar outcomes. The most comprehensive comparison with other prognostic models, a table of performance measures for models from five other MS prognostic model studies with description of outcome definitions and timing, was presented by Montolio 2021.

Usability and reproducibility

Model usability and reproducibility, as defined in Appendix 3, were assessed for each of the 75 developed models and are summarised in Table 2 (ordered by outcome). Usability was assessed in terms of the skill and equipment specialisation required for predictor collection, model presentation, the ability of the presented model to estimate absolute risk, and the number of external validations performed for the model. Model reproducibility is summarised by the availability of the model/tool, code, and data.

1. Model usability and reproducibility.
Model | Outcome | Predictor timing | Equipment | Usability | Absolute risk | Ext. Val. | Reproducibility
Agosta 2006 | Disability (EDSS) | From study entry to 1 year after study entry | Specialty centre | Unclear | No | 0 | Unclear
Bejarano 2011 | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | Not risk model | 1 external refit | None
De Brouwer 2021 | Disability (EDSS) | From onset to index date (including trajectories during the 3 years prior to index date) | No special equipment | No model | No | 0 | Code
de Groot 2009 Dexterity | Disability (9HPT) | At Poser MS diagnosis (within 6 months) | Specialty centre | Tool + instructions | No | 0 | Tool
de Groot 2009 Walking | Disability (EDSS) | At Poser MS diagnosis (within 6 months) | Standard hospital | Tool + instructions | No | 0 | Tool
Kuceyeski 2018 | Disability (cognitive ‐ SDMT) | From disease onset (undefined ‐ RRMS?) to final follow‐up | Specialty centre | No model | Not risk model | 0 | None
Law 2019 Ada | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | No | 0 | None
Law 2019 DT | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | No | 0 | None
Law 2019 RF | Disability (EDSS) | From disease onset (MS) to study entry | Specialty centre | No model | No | 0 | None
Lejeune 2021 | Disability (EDSS) | From disease onset (undefined ‐ RRMS?) to index relapse | No special equipment | Tool + instructions | Yes | 1 | Tool, DOR
Malpas 2020 | Disability (EDSS) | From symptom onset to 1 year after onset | No special equipment | Tool + instructions | No | 1 | Tool
Mandrioli 2008 | Disability (EDSS) | From disease onset (first attack) to diagnosis (CDMS) | Specialty centre | Unclear | Yes | 1 | Unclear
Margaritella 2012 | Disability (EDSS) | From disease onset (MS) to 1 year prior to outcome | Specialty centre | Unclear | Not risk model | 0 | Unclear
Montolio 2021 | Disability (EDSS) | At study entry, year 1 visit and year 2 visit | Specialty centre | No model | No | 0 | None
Oprea 2020 Disability | Disability (EDSS) | At study entry | No special equipment | No model | No | 0 | None
Pinto 2020 Severity 10 years | Disability (EDSS) | From onset to 5 years post‐prognostication | Not reported | No model | No | 0 | None
Pinto 2020 Severity 6 years | Disability (EDSS) | From onset to 2 years post‐prognostication | Not reported | No model | No | 0 | None
Roca 2020 | Disability (EDSS) | At FLAIR imaging (anytime) | Specialty centre | No model | Not risk model | 0 | None
Rocca 2017 | Disability (EDSS) | From study entry to 15 months after study entry | Specialty centre | Model | Not risk model | 0 | Model
Rovaris 2006 | Disability (EDSS) | From study entry (anytime during PPMS) | Specialty centre | Unclear | No | 0 | Unclear
Sombekke 2010 | Disability (MSSS) | At disease onset (MS) | Specialty centre | Model | No | 0 | Model
Szilasiova 2020 | Disability (EDSS) | At study entry | Standard hospital | Unclear | | 0 | Unclear
Tommasin 2021 | Disability (EDSS) | At imaging visit | Specialty centre | No model | No | 0 | None
Tousignant 2019 | Disability (EDSS) | At imaging visit | Specialty centre | No model | No | 0 | None
Weinshenker 1991 M3 | Disability (DSS) | From disease onset (initial symptom) to assessment (not defined) | No special equipment | Model + instructions | Yes | 1 | Model
Weinshenker 1996 Short‐term | Disability (EDSS) | From disease onset (initial symptom) to outcome measurement | No special equipment | Model | Yes | 0 | Model
Yperman 2020 | Disability (EDSS) | At clinical visit (unclear: at any time during MS) | Specialty centre | No model | No | 0 | DOR
Zhao 2020 LGBM All | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 0 | Code, DOR
Zhao 2020 LGBM Common | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 1 (unclear if refit) | Code, DOR
Zhao 2020 XGB All | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 0 | Code, DOR
Zhao 2020 XGB Common | Disability (EDSS) | 2‐year observation window | Specialty centre | No model | No | 1 (unclear if refit) | Code, DOR
Gurevich 2009 FLP | Relapse | During CIS or CDMS | Specialty centre | No model | No | 1 | None
Gurevich 2009 FTP | Relapse | During CIS or CDMS | Specialty centre | No model | Not risk model | 0 | None
Sormani 2007 | Relapse | From 2 years prior to baseline measurement (during RRMS) | Standard hospital | Model + instructions | Yes | 1 | Model
Vukusic 2004 | Relapse | From disease onset (MS) to delivery | No special equipment | Model + instructions | Yes | 0 | Model
Ye 2020 gene signature | Relapse | At study entry | Specialty centre | Model + instructions | No | 0 | Model, Data
Ye 2020 nomogram | Relapse | At study entry | Specialty centre | Model | Yes | 0 | Tool, Data
Aghdam 2021 | Conversion to definite MS | At ON event | Standard hospital | Model | Yes | 0 | Tool
Bendfeldt 2019 Linear Placebo | Conversion to definite MS | At CIS onset (within 60 days) | Specialty centre | No model | No | 0 | None
Bendfeldt 2019 M7 Placebo | Conversion to definite MS | At CIS onset (within 60 days) | Specialty centre | No model | No | 0 | None
Bendfeldt 2019 M9 IFN | Conversion to definite MS | At CIS onset (within 60 days) | Specialty centre | No model | No | 0 | None
Borras 2016 | Conversion to definite MS | At disease onset (CIS, up to 126 days after onset) | Specialty centre | Tool + instructions | Yes | 0 | Tool
Gout 2011 | Conversion to definite MS | At CIS onset (admission for CIS event) | Standard hospital | Tool + instructions | No | 0 | Tool
Martinelli 2017 | Conversion to definite MS | At CIS onset (within 3 months) | Specialty centre | No model | No | 0 | None
Olesen 2019 Candidate | Conversion to definite MS | At disease onset (ON, up to 38 days after onset) | Specialty centre | Tool + instructions | Yes | 0 | Tool, DOR
Olesen 2019 Routine | Conversion to definite MS | At disease onset (ON, up to 38 days after onset) | Specialty centre | Tool + instructions | Yes | 0 | Tool, DOR
Runia 2014 | Conversion to definite MS | At disease onset (CIS) | Standard hospital | Tool + instructions | No | 0 | Tool
Spelman 2017 | Conversion to definite MS | At disease onset (within 12 months) | Specialty centre | Tool + instructions | Yes | 0 | Tool
Wottschel 2015 1 year | Conversion to definite MS | At CIS onset (within a mean of 6.15 weeks) | Specialty centre | No model | No | 0 | None
Wottschel 2015 3 years | Conversion to definite MS | At CIS onset (within a mean of 6.15 weeks) | Specialty centre | No model | No | 0 | None
Wottschel 2019 | Conversion to definite MS | At CIS onset (within 14 weeks) | Specialty centre | No model | No | 0 | None
Yoo 2019 | Conversion to definite MS | At CIS onset (within 180 days) | Specialty centre | No model | No | 0 | None
Zakharov 2013 | Conversion to definite MS | At first MRI after CIS onset | Specialty centre | No model | No | 0 | None
Zhang 2019 | Conversion to definite MS | At CIS onset (primary clinical work‐up for CIS) | Specialty centre | No model | No | 0 | None
Bergamaschi 2001 BREMS | Conversion to progressive MS | From disease onset (RRMS) to 1 year after disease onset | No special equipment | Unclear | No | 2, simplified: 2* | Unclear
Brichetto 2020 | Conversion to progressive MS | At visit of interest | Standard hospital | No model | No | 0 | None
Calabrese 2013 | Conversion to progressive MS | At study entry (during RRMS) | Specialty centre | Model + instructions | Yes | 1 | Model
Manouchehrinia 2019 | Conversion to progressive MS | From disease onset (unclear: RRMS?) up to first EDSS recorded (several years after onset) | No special equipment | Tool + instructions | Yes | 3 | Tool
Misicka 2020 10 years | Conversion to progressive MS | At study interview | Specialty centre | Tool + instructions | Yes | 0 | Tool
Misicka 2020 20 years | Conversion to progressive MS | At study interview | Specialty centre | Tool + instructions | Yes | 0 | Tool
Misicka 2020 Ever | Conversion to progressive MS | At study interview | Specialty centre | Tool + instructions | Yes | 0 | Tool
Pinto 2020 SP | Conversion to progressive MS | From onset to 2 years post‐prognostication | Not reported | No model | No | 0 | None
Pisani 2021 | Conversion to progressive MS | From RRMS onset to 2 years post‐onset | Specialty centre | Model | No | 0 | Tool, DOR
Seccia 2020 180 days | Conversion to progressive MS | Patient trajectories until index visit (during RRMS) | Standard hospital | No model | No | 0 | Data
Seccia 2020 360 days | Conversion to progressive MS | Patient trajectories until index visit (during RRMS) | Standard hospital | No model | No | 0 | Data
Seccia 2020 720 days | Conversion to progressive MS | Patient trajectories until index visit (during RRMS) | Standard hospital | No model | No | 0 | Data
Skoog 2014 | Conversion to progressive MS | From last relapse to index date, repeatedly | No special equipment | Tool + instructions | Yes | 1 | Tool
Tacchella 2018 180 days | Conversion to progressive MS | From disease onset to the index visit of interest | Standard hospital | No model | No | 0 | None
Tacchella 2018 360 days | Conversion to progressive MS | From disease onset to the index visit of interest | Standard hospital | No model | No | 0 | None
Tacchella 2018 720 days | Conversion to progressive MS | From disease onset to the index visit of interest | Standard hospital | No model | No | 0 | None
Vasconcelos 2020 | Conversion to progressive MS | From onset (unclear) to at least 2 years (unclear) | No special equipment | Unclear | No | 1 | Unclear
Ahuja 2021 | Composite (relapse) | From 12 months prior to index date | Standard hospital | Model | No | 1 | Model, Code, DOR
Kosa 2020 | Composite (EDSS, SNRS, T25FW, NDH‐9HPT) | At lumbar puncture | Specialty centre | No model | Not risk model | 0 | None
de Groot 2009 Cognitive | Composite (cognitive tests) | At Poser MS diagnosis (within 6 months) | Specialty centre | Tool + instructions | No | 0 | Tool
Pellegrini 2019 | Composite (EDSS, T25FW, 9HPT, PASAT, VFT) | From disease onset (MS) to study entry | Standard hospital | Model | No | 0 | Model

9HPT: 9‐hole peg test
Ada: adaptive boosting
BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset
CDMS: clinically definite multiple sclerosis
CIS: clinically isolated syndrome
DOR: data available on request (as reported in the publication)
DSS: Disability Status Scale
DT: decision tree
EDSS: Expanded Disability Status Scale
FLP: first level predictor
FTP: fine tuning predictor
IFN: interferon
LGBM: light gradient boosting machine
MRI: magnetic resonance imaging
MS: multiple sclerosis
MSSS: multiple sclerosis severity score
NDH‐9HPT: non‐dominant hand 9‐hole peg test
ON: optic neuritis
PASAT: Paced Auditory Serial Addition Test
PPMS: primary progressive multiple sclerosis
RF: random forest
RRMS: relapsing‐remitting multiple sclerosis
SDMT: symbol digit modalities test
SNRS: Scripps neurological rating scale
SP: secondary progressive
T25FW: timed 25‐foot walk
VFT: visual function test
XGB: extreme gradient boosting

The timing of predictor collection varied across the final models. There were 38 models (51%) using information available at a single time point (20 at disease onset and 18 at an arbitrary point) and seven models (9%) using information from a specific timeframe (between 12 and 24 months) relative to disease onset or study entry. Twenty‐five models (33%) used data available over time and another four models from two studies specifically used predictor data longitudinally (De Brouwer 2021; Seccia 2020). Predictor assessment timing was unclear in the model of Vasconcelos 2020.

The level of skill and equipment specialisation could not be assessed for three developments from one study due to lack of information on selected predictors (Pinto 2020). All 72 models reporting details on included predictors were found to require a specialist in order to measure or assess predictors, 35 of which contained EDSS. For this reason, level of skill is omitted from Table 2. Greater variability in rating was observed for level of equipment specialisation: 11 models (14.7%) required no special equipment, relying only on demographics, disease subtype, symptoms, and treatments. Predictors from 17 models (22.7%) could be measured in a standard hospital, and 44 models (59%) required specialised equipment related to advanced imaging, omics, CSF markers, and evoked potential markers.

We identified 39 (52%) developed models that were not accompanied by model coefficients, tools, or instructions. Eight (11%) models were reported with basic model information, five (7%) with a model and instructions, and 16 (21%) were given as simple tools with instructions or explanation of use. This measure of usability was rated as unclear for seven (9%) models: when model components not considered to be predictors (for example, coefficients for follow‐up duration adjustment) were not reported, when it was unclear whether model coefficients were missing, or when the coding of predictors was especially unclear. For two models, for example, the coding of the basic demographic predictor sex was unclear (Margaritella 2012; Szilasiová 2020).

There were seven developed models that did not aim to predict the risk of a clinical outcome, but rather the value of a continuous measure. Of the 34 models for future disease risk that reported a final model in some way, absolute risk can be estimated with the reported information from 18 (53%) of them, but not for 15 (44%). Clarification of predictor coding would enable estimation of absolute risk from one further model (Szilasiová 2020).

Analysis data were made publicly available for five (7%) models from two studies (Gurevich 2009; Seccia 2020) and analysis code was publicly available for six (8%) models from three studies (Ahuja 2021; De Brouwer 2021; Zhao 2020). One further study developing two models (Ye 2020) reused the data provided by Gurevich and colleagues (Gurevich 2009). Six (8%) studies explicitly stated that data were available upon request (Ahuja 2021; Lejeune 2021; Olesen 2019; Pisani 2021; Yperman 2020; Zhao 2020). For 28 (37%) models from 19 studies, no models, code, or data were provided (three traditional regression models and 25 ML models). None of the studies provided a model/tool, code, and data, or even just code and data.

Because multiple external validations should be performed before a model is deemed clinically useful, the number of external validations performed for each model is also given in Table 2. Of the identified models, only 12 (16%) were externally validated at least once. Of these, one model was externally validated twice (Bergamaschi 2001) and another three times (Manouchehrinia 2019).

Risk of bias

As depicted in Figure 6 (left), all but one of the 96 analyses (Pellegrini 2019) were found to have high risk of bias. This single study was co‐authored by a clinical prediction modelling methodologist from outside the MS field, who is part of the PROBAST group (Wolff 2019). The introduction of this study listed many of the substandard aspects of prediction model development in MS also identified in this review. It appeared to aim to demonstrate correct prognostic model development and internal validation steps for the MS community.

Figure 6. Risk of bias summary. Left: by domain; right: item‐wise for analysis domain.

The high risk of bias across the literature was driven mainly by the analysis domain, for which only two (2%) analyses, those of De Brouwer 2021 and Pellegrini 2019, were found to be at low risk of bias but 94 (98%) at high risk (see Figure 6 left), and, to a lesser extent, by the participants domain, for which 18 (19%) analyses were found to be at low risk but 78 at high or unclear risk of bias (59 (61%) and 19 (20%), respectively). Domain‐level risk of bias plots per analysis are provided in Appendix 6. Item‐level assessment details for the analysis domain are depicted in Figure 6 (right).

The high risk of bias related to the analysis domain was multi‐faceted but mainly driven by two PROBAST items: 80 (83%) analyses were found to have an insufficient number of participants and 81 (84%) analyses did not use relevant model performance measures, with most ignoring calibration. Besides the exclusion for missing data addressed in the participants domain, 32 (33%) analyses used other suboptimal methods for dealing with missing data. Predictor dichotomisation and univariable predictor selection are still used, as found in 13 (16%) analyses and 17 (23%) developments, respectively. A clear difference between studies using ML as opposed to traditional statistics can be seen in the reporting of final models. Of the 35 developments using traditional regression methods, 23 (66%) were reported in such a way that it was clear that the final model corresponded to the multivariable analysis. However, only three (8%) of the 40 ML developments reported final model details that correspond directly with the multivariable analysis. Most ML developments did not present tools or report enough information for understanding of the final models.

The two most common reasons for a high risk rating in the participants domain were the use of routine care or registry data (36 analyses, 38%) and inappropriate exclusion of participants (35 analyses, 36%). While registries are an important source of data for MS research, their quality and limitations should be reported and addressed. Data quality was rarely discussed, and the only reported method to deal with poor data quality was to exclude participants with missing/erroneous data. There was no mention of whether the excluded participants were otherwise similar to included participants with respect to observed covariates, and it was unclear whether study teams even had access to the excluded data in order to assess possible differences. Additionally, inappropriate inclusion of participants known to already have the outcome at baseline affected at least five developments from three studies (de Groot 2009; Malpas 2020; Szilasiová 2020). This is expected to result in overestimated performance estimates at internal validation (Moons 2019).

The use of problematic data sources also led to issues in the predictor and outcome domains. Combined with insufficient reporting, this made it difficult to judge whether predictors and outcomes were assessed uniformly across participants and whether each was blinded to the other. The registry datasets cover long time periods and multiple sites, which makes it unlikely that predictors and outcomes were uniformly measured, especially given the rapid changes in diagnostic criteria and the poor generalisability of imaging predictors across machines (Seccia 2021). An important, independent issue within the predictor domain relates to timing. We identified 11 (11%) analyses using predictors only available after the intended time of model use, which makes a model unusable in practice. The intended time of model use was generally unclear, making it difficult to understand when the model is meant to be used and how far into the future it is meant to predict outcomes.

Although our review question was broad in nature, we found only 36 (38%) analyses to be of low concern regarding applicability. The most common reason for concern related to participants was the inclusion of participants known to have the outcome of interest at the time the model was applied, jeopardising the categorisation of the model as a prognostic model. For example, one study defining disability as an EDSS of five or more at 15 years included an unknown number of participants who already had an EDSS of five or more at baseline (Szilasiová 2020). We found the most frequent concern regarding predictors to be the inclusion of only a single predictor type (e.g. only imaging or genetic predictors) without consideration of more basic, easier‐to‐collect predictors. Only Kosa 2022 was rated at high concern due to unclear interpretation of the outcome. Kosa and colleagues modelled the outcome MS‐DSS, which is itself the output of another model, making interpretation difficult. The most common reason for high concern regarding overall applicability was a primary aim other than development or validation of a prognostic model for individual prediction, which was determined for 12 (12%) analyses. For 27 (28%) analyses lacking final model/tool presentation in a way allowing for application to new individuals, we considered the applicability unclear. This concern was especially frequent amongst the ML studies.

Reporting deficiencies

Across the included analyses, 54% of the 20 mandatory TRIPOD items we evaluated were reported. When unclear or partial reporting was included, the amount of reporting increased to 69%. Of the 19 mandatory TRIPOD items applicable to developments, fewer were reported in those using ML methods (49%) compared to those using traditional statistics (60%). An item‐wise summary of reporting is shown in Figure 7 both overall for all analyses types and by algorithm type for model developments.

Figure 7. Summary of reporting deficiencies based on TRIPOD items. Top: overall; bottom: for developments by modelling method. Val: validation.

When we compared the percentage of reporting in the model developments using traditional statistics published before 2016 (the publication year of TRIPOD) with those published during or after 2016, we did not observe any meaningful difference (59% and 61%, respectively). Visual inspection did not indicate any time trends in median percentage reporting overall or in categories based on the algorithm or the analysis type (see Appendix 6).

When described analysis‐wise, the best reporting amongst developments was in all three models from de Groot 2009 with 84% of the 19 items reported, and the worst reporting was in Oprea 2020 with 16% reported. The best reporting amongst validations was at 73% of the 15 items in Lejeune 2021 Ext Val, Manouchehrinia 2019 Ext Val 2, and Manouchehrinia 2019 Ext Val 3, and the worst reporting was at 13% in Gurevich 2009 FLP Ext Val. Item‐wise reporting per analysis is displayed in Appendix 6.

Source of data and participants

At least one out of the five items related to source of data and participants was not reported in 70 (73%) of the analyses. The item with worst reporting under this heading was treatments (item 5c). Of the 96 included analyses, the treatments received by participants, either at baseline or during follow‐up, were somehow reported but not clearly in 25 (26%) analyses and not reported at all in 40 (42%) analyses. Of those that did not report treatments received by participants, eight (20%) were solely in people with CIS (Aghdam 2021; Olesen 2019 Candidate; Olesen 2019 Routine; Runia 2014; Wottschel 2019; Yoo 2019; Zakharov 2013; Zhang 2019). This item was reported less frequently in models developed with ML methods (20%) than with traditional statistics (46%).

The study start and end dates (item 4b) were the next most frequently unreported item. They were somehow reported but not clearly in three (3%) analyses (Bergamaschi 2001 BREMS Dev; Borras 2016; Roca 2020), and not reported at all in 52 (54%) of the 96 included analyses. Although reported relatively better than most of the other items, the most fundamental information on the study design or source of data (item 4a) was reported in an unclear manner in almost one in four analyses (21), and was totally missing from three (3%) analyses (Kuceyeski 2018; Oprea 2020; Yoo 2019).

Predictors and outcome

At least one out of the three items related to predictors and outcome was not reported in 92 (96%) analyses. Of the 96 included analyses, the outcome definition (item 6a) was missing from five (5%) analyses with conversion to progressive MS outcomes (Brichetto 2020; Pinto 2020 SP; Tacchella 2018 180 days; Tacchella 2018 360 days; Tacchella 2018 720 days). Outcomes were not clear in Bejarano 2011, which reported AUC measures for change in EDSS (modelled as continuous), and Oprea 2020, which reported keeping an EDSS score with unclear thresholds and time points. Blinding of the outcome assessment to predictors (item 6b) was reported in only four analyses (Kosa 2022; Olesen 2019 Candidate; Olesen 2019 Routine; Rovaris 2006) and not reported at all in the remaining.

Of the 75 model developments in which the reporting of predictor definitions (item 7a) was assessed, predictor definitions were somehow reported but not clearly in 24 (32%) developments and not reported at all in 12 (16%) developments.

Sample size and missing data

At least one out of the three items related to sample size and missing data was not reported in 82 (85%) analyses. Of the 96 analyses, none reported a sample size justification (item 8) aimed at reaching a given level of certainty in the reported effect sizes. The presentation closest to a sample size justification was in Yperman 2020, which used a random forest classifier in nested cross‐validation. They plotted a learning curve of AUC as a function of different training set sizes to discuss whether performance plateaued and whether the sample was therefore sufficient. The limitations posed by a small sample size were somewhat discussed in 24 (25%) analyses. Model developments with ML methods were more likely (45%) to discuss their limited sample size or the drawbacks posed by it compared to those with traditional methods (14%).

Only 23 (24%) of the analyses reported the amount of missing data handled (item 13b) during study design or analysis. This is despite the fact that we considered a study to have reported the amount of missing data when the only information provided on this topic was the number of excluded participants due to lack of a predictor domain measurement (e.g. missing MR images). Thirty‐six analyses (38%) reported the amount of missing data in an unclear or inconsistent manner. The method of dealing with missing data (item 9) was somehow reported but not clearly in 16 (17%), and not reported at all in 25 (26%) analyses.

Statistical analysis methods

At least one out of the two items related to the statistical analysis was not reported in 18 (24%) developments. The type of model, model‐building procedures (including predictor selection and tuning parameter optimisation, as relevant), and method for internal validation (item 10b) were reported to a limited extent for 21 (28%), and not reported at all for 13 (17%) of the 75 model developments. The model‐building steps, expected to be relatively simpler for traditional methods, were reported more frequently in the model developments utilising traditional statistics (74%) than those utilising ML methods (38%).

Results and discussion

At least one out of the seven items related to results and discussion was not reported in 79 (82%) analyses. Of the 96 analyses, the number of participants and the number of events (items 13a/14a) were reported in an unclear manner in 11 (11%), and not reported at all in seven (7%) (Ahuja 2021 Dev; Ahuja 2021 Ext Val; Bergamaschi 2015 BREMS Ext Val; Gurevich 2009 FLP Ext Val; Oprea 2020; Sormani 2007 Ext Val; Szilasiová 2020). Information on basic baseline participant characteristics (item 13b: age, sex, diagnostic subtype) was missing from 17 (18%) analyses.

A comparison of the distribution of important variables with the development data (item 13c) was missing from 11 (55%) of the 20 validations, excluding Skoog 2019 Val, which used a subset of the participants from the model development study (Skoog 2014 Dev). Also, none of the model developments that used a single random split for evaluation provided such a comparison.

The full prediction model, including the intercept or baseline survival to allow for calculation of absolute risk (item 15a), was reported more or less clearly in only 16 (21%) of the 75 developments. An explanation of how to make predictions or assign an individual to a risk group based on the developed model (item 15b) was provided for 22 (29%) models. Although neither item 15a nor item 15b was reported, the discussion sections of five (7%) model developments contained confirmatory language suggesting implementation of the models to support clinical decisions (Malpas 2020 Dev; Pisani 2021; Roca 2020; Tousignant 2019; Ye 2020 gene signature). Models developed with traditional statistical methods were much more likely to present the final full models (40%) or how to calculate predictions from them (60%) than those developed by ML methods (5% and 2%, respectively).
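To illustrate why item 15a matters for usability, the following is a minimal sketch computing an absolute risk from a fully reported logistic model; every coefficient, predictor name, and predictor value is hypothetical and chosen only for illustration.

```python
# Hedged illustration: computing an absolute risk from a fully reported logistic model.
# Without the intercept, only relative comparisons between individuals are possible.
# All coefficients, predictor names, and values below are hypothetical.
import math

intercept = -2.1                    # item 15a: must be reported to obtain absolute risk
coefs = {"age_per_10y": 0.30, "edss_baseline": 0.45, "relapses_prior_2y": 0.60}
patient = {"age_per_10y": 3.8, "edss_baseline": 2.0, "relapses_prior_2y": 1.0}

linear_predictor = intercept + sum(coefs[k] * patient[k] for k in coefs)
absolute_risk = 1 / (1 + math.exp(-linear_predictor))
print(f"predicted absolute risk = {absolute_risk:.2f}")
```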

A model performance measure (item 16, assessed by the presence of a discrimination measure) was reported in 33 (34%) analyses with its uncertainty and in 25 (26%) analyses without its uncertainty. Reporting of AUC in Szilasiová 2020 was unclear due to the inconsistency between the receiver operating characteristic figure associated with the AUC and the reported point sensitivity/specificity value. No discrimination or classification measures were reported in 10 (10%) analyses: three did not contain any evaluation of model performance and were included because of their validations (Ahuja 2021 Dev; Bergamaschi 2001 BREMS Dev; Weinshenker 1991 M3 Dev), three reported R2 (Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever), and the remaining four survival analyses reported Kaplan‐Meier or incidence plots for predicted risk groups (Gout 2011; Sormani 2007 Dev; Sormani 2007 Ext Val; Vasconcelos 2020 Dev). The discussion section of four (5%) models from two studies had confirmatory language, although no discrimination or classification measures were reported (Bergamaschi 2001 BREMS Dev; Misicka 2020 10 years; Misicka 2020 20 years; Misicka 2020 Ever).

Discussion

Summary of main results

The main objective of this review was to identify and summarise all multivariable prognostic model developments and validations for quantifying the risk of clinical disease progression, worsening, and activity in people with MS. We identified 75 models, 12 of which were externally validated at least once in a total of 15 validations by applying the model of interest as intended by its development to predict outcomes in new participants. Only a single model, Manouchehrinia 2019, was externally validated three times, all within the development study. There were six other author‐reported validations that did not meet our criteria for external validation. No external validation has yet occurred for the remaining 60 models, making them unsuitable for use in practice at this time.

Models with an external validation

Of the 12 models with any external validation, only two models were found to have been externally evaluated more than once. The BREMS score (Bergamaschi 2001) was evaluated with two external datasets by the development team before being simplified to the BREMSO score and being evaluated further using the second external development dataset. The model for conversion to progressive MS developed by Manouchehrinia and colleagues (Manouchehrinia 2019) was evaluated using three external datasets (one registry, two randomised controlled trials) within the development study. No validation studies were found to be performed by researchers external to the development team and almost all were part of the development publication. The lack of independent external validations hinders conclusions about the models’ generalisability. While these studies provide information on model performance when applied by the development authors to new people with MS, it is still unclear whether the reporting is sufficient to enable use by external researchers and clinicians. This is important because model performance will depend on the interpretation of unclearly reported models, predictors, and outcomes.

Models without an external validation

At 80%, models for which no external validation evidence of any kind exists make up the overwhelming majority of the MS prognostic modelling literature. However, this is not surprising considering that only 12 of these 60 models reported their full model or provided a tool and gave instructions on its use. Only one of these studies explicitly stated that external validation was not pursued, due to the poor discrimination of the developed model in the internal validation (Pellegrini 2019). It is worth noting that none of the identified models for conversion to definite MS were found to be externally validated. Because this outcome occurs early in the disease course, valid prognostic models addressing it in people with CIS have the potential to exert the greatest impact on treatment decisions and thereby on long‐term outcomes.

Before developing a new prognostic model, it is recommended to first review the literature in search of previously developed prognostic models predicting the outcome of interest (van Smeden 2018). This would ideally be followed by external validation of relevant existing models to test their generalisability. This validation process is meant to call attention to possible weaknesses of the model and to allow for an iterative process of improvement via model updating. When multiple models for the same outcome exist, it is important to compare their performance when applied to the population of interest, instead of just comparing the included predictors and modelling methods. These recommended steps would channel the efforts of the scientific community towards the common goal of delivering a generalisable and clinically useful prognostic prediction model. Our review shows that these recommended initial steps are omitted in MS prognostic research. Model developments dominate the literature, and the performance of newly developed models is not compared with that of other published prognostic models in a meaningful way. Some included studies, e.g. Seccia 2020, explicitly mentioned the lack of validated models for MS prognosis, and yet continued to develop new models without evaluating the performance of previously developed models for similar outcomes on their independent data.

The lack of external validations also points to the need for effective clinical data‐sharing. In order to make the best use of the resources allocated to medical research, independent researchers should be able to access existing datasets for external validation of published prognostic models (Völler 2017). Ideally, these datasets should be harmonised and provided through an infrastructure allowing individual patient level meta‐analysis (Snell 2020) of data from different sources to reach a sufficient sample size. While there are several large MS registries and even a network between many of them (Big Multiple Sclerosis Data Network), access provided to general researchers appears limited. The increasing use of large, long‐term registry data will also necessitate improved data quality measures and reporting across domains, especially participant characteristics. More attention to participant selection and how it affects model applicability will be needed.

Overall completeness, certainty of the evidence and study limitations of externally validated models

Overall completeness of the data

In the MS literature, reporting of prognostic prediction model studies was very poor, echoing the experience in other disease areas (Kreuzberger 2020; Wynants 2020). The situation was at least as dire in models developed with ML as in those developed with traditional methods, partially because the current EQUATOR Network guidelines do not seem directly applicable to these studies (Dhiman 2021). Although a reporting guideline for prediction modelling, TRIPOD (Collins 2015), has been available since 2015, most of the items that were poorly reported in the studies we included were also part of other reporting guidelines published earlier, like STROBE (von Elm 2007) or STARD (Bossuyt 2015). Additionally, most of the analyses (66%) in our review were published in or after 2016, and no temporal pattern could be observed in the proportion of the items reported. Across the included analyses, just over half (54%) of the 20 mandatory TRIPOD items we evaluated were clearly reported. The state of the literature makes us doubt whether the reporting guidelines are being required, or at least recommended, by peer reviewers or publishers/editors.

None of the studies justified their sample size. The failure to consider this aspect during study design jeopardises the efforts put into prognostic research (Steyerberg 2019). Only three cohort studies reported that the outcome assessors were blinded to at least a subset of the predictors (e.g. image readings or lab analyses) (Kosa 2022; Olesen 2019 Candidate; Olesen 2019 Routine; Rovaris 2006). Because the purpose of data collection was not prognostic modelling in most of the analyses (64%), i.e. secondary data use, blinding of predictors to the outcome was probably not even considered. Still, its presence (or absence) could have been reported, especially in a disease area like MS, in which the subjectivity and reliability of clinical outcome assessment is an ongoing issue (van Munster 2017).

Details of the full model and how to find the absolute or relative risk of an individual patient were either missing or unclear in most (79% and 71%, respectively) of the model developments. In our opinion, this indicates a failure to deliver on the objective of these studies. Reporting the performance of a newly developed prognostic prediction model serves no purpose unless the model is also reported in such a way as to enable future, preferably independent, external validations and further application to individual patients. Despite the anticipated difficulty of reporting models developed using ML methods, we consider this to be possible, e.g. by transporting model objects or developing web‐based applications/platforms to calculate predictions for individual patients (Boulesteix 2019). Failure to provide the model or a way to use it indicates research waste without any tangible (potential) benefits. This failure also precludes any discussion suggesting the need for further validation or clinical application.

Clear reporting of the amount of missing data (only in 24%) was another area that could be improved. This lack of reporting makes it difficult to assess potential bias due to overrepresentation of complete cases (Wynants 2017). Many studies also failed to report the disease‐modifying treatments received by the participants. Any treatment received at baseline is important for understanding the population to which the prognostic prediction is applicable. Treatment received during follow‐up poses another challenge to the prediction model because, as a post‐baseline factor, it is likely to change the outcome. The treatments received by study participants were reported clearly in only 32% of the analyses.

Finally, model performance measures were reported only to a limited extent and did not meet our expectations. Although we considered a single discrimination measure, e.g. a c‐statistic or AUC, with its associated uncertainty, to be sufficient for this reporting assessment item, only 34% of the model developments or validations fulfilled this criterion. The value of a model cannot be evaluated without an appropriate performance measure. The uncertainty around this measure is also critical for understanding whether the model actually performs better than random prediction, which corresponds to a c‐statistic of 0.5.

Certainty of the evidence

At the time of this review’s submission, no GRADE tool was available for prognostic models. Hence, the certainty of evidence is not rated in this review.

Study limitations of prognostic model development studies

We found all but one development to have a high risk of bias according to PROBAST. The principal drivers of this result came from the analysis domain: the use of routine care or registry data combined with suboptimal methods for dealing with missing data, insufficient sample sizes with respect to the number of predictors of interest, incomplete model performance evaluation due to lack of assessment of calibration and sometimes even discrimination, and failure to account for overfitting and optimism, especially with regard to accounting for all modelling steps.

Besides being of sufficient quality and representative of the population of interest, the data should also be sufficiently large in order to develop robust models that precisely estimate risk overall, as well as across the spectrum of predicted risk, both in the development set, and, more importantly, when applied to new people (Riley 2020). We found about three‐quarters of the developments to be high risk due to insufficient sample size, a point also mentioned in Pellegrini 2019 as a limitation of the MS prognosis research literature. In smaller datasets with large numbers of considered predictor parameters relative to the number of events and total size, the risk of model overfitting increases, without guarantee that further methods, such as shrinkage, can be applied to fully overcome the problem. Very few of the traditional statistical developments found here addressed overfitting and optimism by applying shrinkage.

Overfitting and optimism were also neglected in other ways. Although resampling procedures have been recommended as best practice in internal validation, around one‐third of the reviewed analyses conducted apparent validation or relied on a single random split of the data. Internal validations employing cross‐validation or bootstrapping methods, however, are not immune from overoptimism. The resampling schemes should include the entire modelling process, including, for example, predictor selection. Unfortunately, this is difficult to assess given the identified pitfalls and the increasingly complex analysis structure, especially in developments using ML.
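As a concrete illustration of including the entire modelling process in the resampling, the following minimal sketch wraps predictor selection and model fitting in a single pipeline so that both are repeated within every cross-validation fold; the data and pipeline components are hypothetical and not taken from any included study.

```python
# Hedged sketch: keeping predictor selection inside the cross-validation loop by
# wrapping it with the model in one pipeline. Data and components are hypothetical.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=300, n_features=50, n_informative=5, random_state=3)

pipeline = make_pipeline(
    SelectKBest(f_classif, k=10),       # predictor selection, refit in every fold
    LogisticRegression(max_iter=1000),  # model fitting, refit in every fold
)

cv_auc = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC = {cv_auc.mean():.3f} (+/- {cv_auc.std():.3f})")

# Selecting predictors on the full data *before* cross-validation would leak outcome
# information into every fold and overstate performance.
```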

Large sample sizes and proper validation methods need to be used in combination with clinically meaningful performance measures to contribute information on the suitability of the model for practice. Assessment of both discrimination and calibration is widely recommended; however, we found only about half of model evaluations to have reported discrimination and only one‐quarter to have assessed calibration. While uptake of the recommendation to assess calibration has been poor in general, it appears even poorer in publications using ML models. This may be partly due to a focus on classification rather than risk estimation, or to a lack of familiarity with model evaluation in the clinical setting. Due to the increasing popularity of ML in the field of clinical prediction modelling, it is ever more important to identify and address the reasons for this shortcoming.

Additionally, we found the importance of timing in prognostic modelling to be underappreciated in the MS literature. Many studies only implicitly stated the time at which the model is meant to be applied to participants and the prediction horizon of interest, making assessment of all PROBAST domains difficult. This information is needed to understand whether the included participants align with the population of interest, whether predictors are available at the intended time of model use, whether the time interval between predictor collection and outcome assessment is appropriate, and whether complexities in the data related to time have been properly accounted for. As discussed by Pellegrini and colleagues, defining a true baseline time point in MS is difficult (Pellegrini 2019); however, even clear clinical landmarks, such as before or after treatment initiation, have been ignored especially when using registry data. Several studies have used predictors assessed years after the implicit baseline timing of the model, which has complicated interpretation and assessment. Furthermore, handling of differing observation periods across participants by ‘adjusting’ for this confounding has been unclearly reported and possibly inappropriate for prediction tasks. These issues suggest possible confusion about the use of prognostic modelling and the types of conclusions that can be drawn from these analyses.

Usability

While all models were found to require an MS specialist for predictor assessment, equipment specialisation varied, with some models relying on basic clinical predictors but many relying on advanced imaging, omics, or electrophysiological data. While these models may currently be limited to MS research centres, improving access may expand the applicable settings of use over time. This can be seen most readily in the rapid development of diagnostic capabilities in recent years: imaging with high‐resolution 3 Tesla MRI scanners is now much more widely available than it was 10 years ago. New markers, such as measures of brain atrophy, are being added to standardised MS MRI protocols. Major advances have been observed in other areas as well. Laboratory chemistry can determine antibodies for differential diagnostic considerations that were widely unknown 10 years ago (e.g. aquaporin‐4 and myelin oligodendrocyte glycoprotein immunoglobulin G antibodies). New laboratory analyses are also on the horizon: for example, the possibility of detecting neurofilament light chain in blood is emerging; several years ago, this would have been possible only in CSF.

Besides the lack of external validation, the other greatest threat to usability is the lack of clear reporting of final models or tools and of instructions on their use: half of the identified models provided neither, while only approximately 20% reported both. The reporting of the remaining models contained inconsistencies, unclear predictor coding, or missing model components, making direct use on new participants difficult. Partial reporting also hampers the ability of future researchers to predict absolute risks with these models. These deficits in usability were not compensated for by measures supporting reproducibility, as only a handful of studies provided analysis data or code and none shared both. Sharing both data and code could improve model dissemination, especially for complex algorithms without simple representations; however, their use may require further specialised skills, and other forms of model presentation should be preferred when translation into clinical practice is the goal.

Potential bias in the review process

In order to reduce bias at the search stage, we searched three major databases. We also searched the conference proceedings of the main organisations in the MS disease area and tried to access more information on the eligible abstracts by Internet search and author contact. The measures we took to prevent bias in study selection and in data extraction/risk of bias assessment were 1) pre‐protocol pilots, 2) training of all contributors on these steps using the relevant methodological publications, the protocol, and internal guidance, 3) performing these steps independently in duplicate, 4) resolving any disagreements with at least one other co‐author in group discussions, and 5) contacting study authors for any missing or unclear data critical for risk of bias assessment or the planned analysis. Despite these measures, and owing to the novelty of this review type, some methodological decisions needed to be made or elaborated on either at the protocol stage or during the review, which may have introduced bias.

  • Database search: Our database search strategy was constructed to be sensitive. Our decision not to search trial registries is not expected to introduce bias: prognostic research studies, like diagnostic ones, are unlikely to be pre‐registered (Korevaar 2020; Peat 2014; Sekula 2016) despite calls to do so (Altman 2014). The restriction of the search to publications from 1996 onwards might have introduced bias. However, the fact that only two studies published before 2001 (Weinshenker 1996 and the related model development study Weinshenker 1991) were found eligible for this review suggests that this restriction is likely to have missed very few relevant publications, if any.

  • Reference search: A post‐protocol change that may have introduced bias concerns backward reference searching. For forward reference searching, we utilised the functionality of Web of Science, one of the available and commonly used platforms for this task (Briscoe 2020). Because Web of Science also offers an option for backward reference searching, we decided to access the titles/abstracts from the same platform instead of handsearching. This methodological change allowed us to screen the full titles/abstracts of the references rather than only titles, but it was less sensitive than handsearching because some references are not linked in Web of Science records. Still, we expect this limitation to introduce little bias, because most backward/forward references were linked to the records and only 5% of the included reports came from reference searching as opposed to database searches and other sources.

  • Study selection: The language used to report prognostic models varies across time and with the main speciality of the authors, e.g. clinicians versus methodologists. In order to be more representative of the literature, we considered a study's objectives to be expressed not only in the objectives section of the abstract or main text, but also in the focus of the Results and Discussion sections. This perspective led to the inclusion of studies whose primary objectives were other than prognostication in people with MS. We were also inclusive of various clinical outcome definitions (including their timing), data types, and statistical methods. We consider this aspect not a bias but rather a strength of our review, which, given the rarity of independent external validations of the developed models, was destined to become a systematic and comprehensive description of the state of the literature in this disease area. Expanding the outcomes of interest to fatigue, falls, or depression would have yielded only a few, if any, additional reports, but it would have made the results of our review, which aims to be relevant to clinical practice and patients alike, less interpretable.

  • Model selection: In a model development study, fitting many models may serve, amongst other purposes, as a means for selecting predictors, selecting the most predictable outcome, selecting the best algorithm type, or optimising tuning parameters. When the authors of a study indicated their preferred model in any way, we included only those models; otherwise, we refrained from making a selection. Although studies that reported multiple models and failed to select a final favourite for presentation might even be considered not to have prognostic model development as an aim, we decided not to exclude such studies. The number of models from a single study reached up to four (Zhao 2020), but fewer than one‐quarter of the studies contributed more than one model. This may have introduced bias in the descriptive quantitative measures we reported across the analyses, e.g. median percentage of females or median sample size, due to the overrepresentation of some studies or samples. Because our intention, and our capability with the available dataset, was not to summarise or analyse the models but simply to describe them, we consider it appropriate to treat these as separate analyses for a general overview of the literature.

  • Risk of bias: Although a detailed explanation and elaboration publication for PROBAST exists (Moons 2019), we had to interpret some items to fit the needs of our review. These interpretations were aimed at adapting the responses to the specifics of MS or to the variety of statistical methods not addressed by the current PROBAST, which focusses on binary outcomes modelled with traditional regression methods. These decisions are summarised in the Methods and detailed in Appendix 4. Irrespective of our interpretations, almost all analyses rated as having high risk of bias would still have been rated the same in the analysis domain, and hence overall, because they failed to assess model performance appropriately by reporting both discrimination and calibration while accounting for overoptimism. Thus, our interpretations are expected to introduce no substantial change in the overall risk of bias assessment, and any bias would be limited to the item/domain level.

  • Analysis: Because the number of independent external validations per model did not allow us to perform a meta‐analysis or meta‐regression, we had no analysis into which bias could be introduced. The only model with three external validations, which would in principle have allowed meta‐analysis of its performance (Manouchehrinia 2019), was not meta‐analysed due to the lack of independence of its external validations (Kreuzberger 2020), as well as inherent limitations by design causing high risk of bias in both its development and its validations. Any quantitative measures derived in this review relate to basic description of the studies. When participant characteristics were reported only for subgroups, we calculated pooled means and pooled SDs. When missing, the variance of c‐statistics was calculated and used to construct confidence intervals; an illustrative calculation is sketched below. We report these in the data tables or in Characteristics of included studies.
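
As an illustration of the last point, a missing standard error of a reported c-statistic can be approximated from the c-statistic and the numbers of events and non-events. The sketch below uses the Hanley–McNeil approximation as one common choice, with hypothetical numbers; it may differ from the exact formula applied in this review.

    import math

    def hanley_mcneil_se(auc, n_events, n_nonevents):
        """Approximate standard error of a c-statistic (Hanley & McNeil 1982)."""
        q1 = auc / (2 - auc)
        q2 = 2 * auc ** 2 / (1 + auc)
        var = (auc * (1 - auc)
               + (n_events - 1) * (q1 - auc ** 2)
               + (n_nonevents - 1) * (q2 - auc ** 2)) / (n_events * n_nonevents)
        return math.sqrt(var)

    # Hypothetical reported values: c-statistic 0.72 with 80 events and 220 non-events
    auc = 0.72
    se = hanley_mcneil_se(auc, 80, 220)
    print(f"95% CI: {auc - 1.96 * se:.3f} to {auc + 1.96 * se:.3f}")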

In this review, publication bias was not assessed due to the lack of methodological guidance on this topic.

Applicability of findings to clinical practice and policy

We identified 75 models, 63 of which were not externally validated; performance measures for these models were reported only in the same population used for model development. Although 11 of these models used language in their discussion sections suggesting applicability to, and implementation in, clinical care, they cannot be recommended for application without demonstrating generalisability through independent external validations of their performance. The same is true for the remaining 12 models: none were externally validated by independent teams, and only three were externally validated in separate studies (Weinshenker 1991 M3 in Weinshenker 1996; Bergamaschi 2001 (BREMS) in Bergamaschi 2007 and Bergamaschi 2015; and Skoog 2014 in Skoog 2019). Moreover, of those with any external validation, 10 have only a single one. The development and validations of the model with the maximum number of external validations (Manouchehrinia 2019, three in the same study) were rated as having high risk of bias in two to four of the four domains addressed by PROBAST. Hence, none of the identified models is yet applicable in clinical practice.

Moreover, the heterogeneity of definitions and populations in the findings of this review highlights the challenges of developing or validating a prognostic prediction model in people with MS, owing to the difficulty of defining the clinical need in terms of the relevant population, outcome, and available predictors. The literature in this disease area lacks uniformity in the definitions of patient‐relevant clinical outcomes and the time points at which to measure them. Also, the fast‐changing landscape of diagnostic subtypes and their criteria, e.g. the McDonald criteria published and then revised three times during the last 20 years, not only makes it difficult to extrapolate a model's applicability to future patient populations, but also renders diagnostic conversion outcomes less relevant to clinical practice as time passes. The lack of an objective, agreed‐upon, and standardised definition of secondary progression is another factor hindering any research aiming to support clinical decision‐making that targets this outcome. The increasing number of treatment options for all diagnostic subtypes of MS during the last 25 years, and hence the increasing proportion of people treated with them, also raises questions about the applicability of prognostic models developed using data from the pre‐treatment or first‐line treatment era to people with MS today or in the future. This is evident in many domains, including relapse rates, the detection of paraclinical disease activity, and the possibility of using advanced markers such as lesion volume and atrophy measurements on MRI or laboratory‐based biomarkers for prognostic assessment. With all of these changes, prognostic modelling in this field is truly like "chasing a moving target" (Chen 2017), and difficult even when applying the highest methodological standards (Pellegrini 2019). Any conclusion regarding the applicability of prognostic models in this disease area requires rigorous testing of the developed models in numerous and up‐to‐date external validation studies; these are currently lacking.

Implications of the rise of ML for research

The rising popularity of ML algorithms has also reached MS research. Since 2018, an increasing number of ML prognostic model developments have been published, and every development identified in the first half of 2021 employed ML techniques. The Radiology Editorial Board suggests that artificial intelligence and ML will have an impact on any medical application that uses imaging (Bluemke 2020). As MRI has been an important tool for depicting pathological features in MS since the 1980s (Ge 2006), this trend is not surprising. Although ML offers great potential for uncovering complex relationships in our ever‐growing data using fewer assumptions, this potential cannot be harnessed without greater attention to the needs of clinical practice and to good practice in prediction model development.

Because the use of ML for clinical prediction is still relatively new in MS, it is unsurprising that several publications are presented as pilot or proof‐of‐concept studies. As stated previously, many of the ML studies identified here also provide no model or tool for external use. There is, however, a looming threat of research waste if this trend continues: across several specialities, the discrepancy between the number of model developments and the number of tools used in practice has been noted. Studies that use clinical applications to showcase methodological accomplishments are important, but this type of research should not be conflated with, or replace, actual attempts to create prediction models for clinical practice (Mateen 2020). These differing aims may partly explain the high number of studies with no selected final model and the low number of validation studies; the models identified were meant to demonstrate methodological and technological advances rather than to provide individualised estimates of outcome risks.

Another possible reason for the lack of presented tools for prediction in new individuals may relate to the difference in cultures between ML and clinical research (Mateen 2020), and to the notion that clinical prediction modelling guidelines are not relevant to this body of ML research (Dhiman 2021). Mateen and colleagues argue that, for clinical practice to derive the greatest benefit from this work, greater collaboration between healthcare experts and the ML community is necessary. Our review suggests that these collaborations already exist, but that the guidelines put forth by clinical prediction modelling experts are still being ignored. We argue that all researchers interested in clinical prediction need not only to work together, but also to take responsibility for conducting research according to current best practices. This entails, at a minimum, adherence to the reporting guidelines set out in TRIPOD and, once available, TRIPOD‐AI. The brief guide on assessing radiological research using artificial intelligence published by the Radiology Editorial Board may also prove valuable to the MS research community (Bluemke 2020), although that document addresses a wider range of radiological studies involving ML.

Comparisons with other reviews

We are aware of several related prognosis reviews in MS, including Brown 2020, Havas 2020, and Seccia 2021. Although Kreuzberger 2020 is a systematic review of prognostic models in a different clinical field, it is also worth comparing our review to it as the first published Cochrane Review of prognostic models to date.

Brown 2020 differed from our review in that it focused specifically on prognostic models intended to be used at diagnosis of RRMS. While its population was more specific, its definition of prediction models was broader, including all models using multiple predictors in combination to determine the probability of an outcome. This led to the inclusion of several models that were not developed with the intent of predicting individual outcomes, but rather as explanatory models of disease aetiology, and which were therefore excluded from our review. This highlights the difficulty of distinguishing between studies aiming to develop prognostic models and those with other purposes, a problem encountered in our review as well as in Kreuzberger 2020. This point was also echoed in Havas 2020, which reported that almost half of the over 6000 studies screened were not prediction modelling studies, but rather other study types that used the words 'prediction' and 'association' interchangeably.

Unlike our review and that of Brown 2020, however, Havas 2020 included models predicting treatment response, such as the Magnetic Resonance Imaging in MS (MAGNIMS) score (Sormani 2016), and expert‐defined, rather than statistically derived, scoring systems such as the Rio score (Río 2009). Models predicting treatment response are very important to MS clinical practice; however, they were outside the scope of our review. Treatment response prediction, and causal prediction more generally, is an evolving field, and its model development and performance assessment methods are an area of active research; further methodological foundations are still needed to inform such a review. While Havas and colleagues called on MS researchers to establish a consensus on the definition, development, and validation of prognostic models, we would emphasise that this consensus already exists within the clinical prediction modelling literature and only needs to be adopted by the MS speciality.

Seccia 2021 reviewed recent ML models that considered clinical data in their feature sets, making the scope of that review much narrower than ours. All studies identified in Seccia 2021 were also included in our review, and we identified 19 additional ML models: at least seven were published too late for consideration by Seccia and colleagues, and several others were excluded from that review because of a strong focus on imaging or omics data over clinical data. The review authors highlighted the importance of sufficient data size as well as data quality, which is relevant to ML prognostic model studies and traditional regression studies alike. They also mentioned the problems inherent in subjective disease measures and the non‐generalisability of some predictor types, such as imaging data specific to a single device, which we also identified as problematic in the MS field. They further discussed issues specific to ML, including interpretable ML and the combination of tabular and non‐tabular data. They made a point of stating that no identified study had developed a prognostic model with performance suggestive of clinical usefulness. Given that none of these studies was truly externally validated, and that risk of bias was rated high in all of them due, at least in part, to small sample sizes and lack of calibration assessment, we would add that, even if the reported performance estimates had been substantially higher, these models would still not be ready for clinical practice. Additionally, these models were not reported in a way that allows external validation, as no tool or model fit was provided.

Unlike Kreuzberger 2020, our results focus on all studies identified, not just those with external validation. Given the rarity of multiple external validations for a single prognostic model in MS, we wanted to comprehensively describe and summarise the state of the field, not just the models most ready for translation into clinical practice. In light of this aim, we did not downgrade the applicability of a model based on the age of its data or the diagnostic criteria used. Kreuzberger and colleagues rated applicability as unclear if eligibility criteria or the recruitment period were not given, stating that they could not be sure whether the included individuals matched the review question and a current application of the model. This is certainly also an issue in MS, which has faced continual updating of diagnostic criteria since publication of the first McDonald criteria in 2001 (McDonald 2001). In fact, several included studies defined conversion to clinically definite MS using the Poser 1983 criteria while using components of the later published criteria as predictors. The aim of these studies may have been more in line with validation of newly published criteria than with development of a new prediction tool, again highlighting the need for better reporting. However, the models with multiple external validations in this review did not suffer from such problems.

Authors' conclusions

Implications for practice

The goal of prognostic modelling research must be to bring multivariable prognostic models for predicting future clinical outcomes into the routine clinical care of people with multiple sclerosis (MS). This is of particular interest because, although highly effective therapeutic options are available, they are associated with relevant risks to the patient. Given the high variability in disease worsening and progression, a needs‐based approach to therapy is imperative.

The currently available evidence is not sufficient for predicting MS prognosis in clinical routine, because the quality standards required for the development and, especially, the validation of such prediction scores have not yet been met. Ideally, prediction models are developed using large, high‐quality datasets with participants representative of the population to which the model will later be applied. However, our results do not exclude the possibility that existing prediction models could be transferred into clinical routine after successful external validation and demonstration of benefit in randomised controlled trials investigating their impact. Both the validation of currently available prediction models and the consistent application of quality standards in future studies are needed.

Implications for research

Our systematic review identified an abundance of models developed for the prediction of disability, relapse, conversion to definite MS, conversion to progressive MS, and composite outcomes based on these. As previously found within and beyond MS, these studies were generally not conducted or reported according to current standards and guidelines. We point the MS research community to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement and the Prediction Model Risk of Bias Assessment Tool (PROBAST) for such guidance, and to the upcoming artificial intelligence extensions of these tools for machine learning‐specific guidance.

Clinical prediction modelling studies should be conducted using longitudinal datasets collected for the purpose of prognostic research in order to minimise bias in terms of participants, predictors, and outcome. Researchers in this disease area should understand when predictor and outcome measurements can take place and how these choices affect the interpretation of a prognostic model. Appropriate methods should be used, in consultation with experts in clinical prediction modelling. Models should be reported in a manner that makes them usable by other researchers, because developed models should be externally validated, preferably by independent teams. Data sharing practices can support external validation efforts.

History

Protocol first published: Issue 5, 2020

Notes

Data and code availability

The dataset summarised in this review is available as tables in the Appendices and in Characteristics of included studies. The R code used for the statistical description is available upon request from the authors.

Role of sources of support

The funding sources did not have any influence on the planning, conduct, analysis, or reporting of this review.

Acknowledgements

We would like to thank Cochrane and several members for their support in the development of this review and those who conducted the editorial process for this review:

  • Graziella Filippini (Co‐ordinating Editor) and Ben Ridley (Managing Editor) of Cochrane Multiple Sclerosis and Rare Diseases of the CNS.

We would like to thank the following people who conducted the editorial process for this article:

  • Sign‐off Editor (final editorial decision): Robert Boyle, Imperial College London, Cochrane’s Editorial Board.

  • Managing Editor (selected peer reviewers, provided editorial guidance to authors, edited the article): Colleen Ovelman and Sam Hinsley, Central Editorial Service.

  • Editorial Assistant (conducted editorial policy checks, collated peer reviewer comments and supported editorial team): Lisa Wydrzynski, Central Editorial Service.

  • Copy Editor (copy editing and production): Jenny Bellorini, Cochrane Central Production Service; pre‐edit: Tori Capehart, Copy Editor, J&J Editorial; Margaret Silvers, Copy Editor, J&J Editorial; Sarah Hammond, Senior Copy Editor, J&J Editorial.

  • Peer reviewers (provided comments and recommended an editorial decision): Arman Eshaghi, University College London (clinical/content review), Bruce V Taylor, University of Tasmania (clinical/content review), Steve Simpson‐Yap, The University of Melbourne (clinical/content review), Iván Pérez‐Neri (consumer review), Nina Kreuzberger, Cochrane Haematology, Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany (methods review), Robin Featherstone, Cochrane Central Editorial Service (search review).

We would also like to thank Johanna AA Damen (Co‐ordinator) of the Cochrane Prognosis Methods Group for responding to our questions. We are thankful to Liliya Eugenevna Ziganshina for the support during the eligibility assessment of a publication in Russian (Zakharov 2013). For the full translation of the same article and support during its data extraction, we are grateful to Larissa German.

We thank all the primary study authors who replied to our requests for further information. We are very grateful for the conversations with and guidance from Beate Sick regarding deep learning methods. We are also thankful to Anja Friedrichs for the diligent proofreading of critical sections of the review text.

Appendices

Appendix 1. Electronic search strategies

Database: Ovid MEDLINE(R) and Epub Ahead of Print, In‐Process, In‐Data‐Review & Other Non‐Indexed Citations and Daily 1946 to 1 July 2021

Date search conducted: 1 July 2021

Strategy:

# Concept Searches Results
1 1
Multiple sclerosis
(exp Multiple Sclerosis/ OR ((multipl* OR disseminated OR insular) ADJ1 sclerosis).ti,ab.) NOT (animals NOT humans).sh. NOT (child NOT adult).sh. 80,108
2 2a
Prognostic/ prediction
(exp Prognosis/ AND (exp disease progression/ OR exp Remission, Spontaneous/ OR exp Recurrence/)) OR (predict OR prognos*).ti. OR ((predict* OR prognos*) ADJ3 (recurrence OR progression OR relaps* OR remission OR remitting OR 'multiple sclerosis' OR ms)).ti,ab. OR ((predict* OR prognos*) ADJ3 treat* ADJ3 response).ti,ab. OR ((predict* OR prognos*) ADJ3 disease ADJ3 activity).ti,ab. 357,355
3 2b
General models
((model* OR decision* OR identif*) ADJ3 (history OR variable* OR multicomponent* OR multivariable* OR multivariate* OR covariate* OR criteria OR criterion OR scor* OR characteristic* OR finding* OR factor* OR rule*)).ti,ab. OR (decision* ADJ6 model*).ti,ab. 420,767
4 2c
Statistical terms
((logistic OR statistic*) ADJ3 model*).ti,ab. OR (decision*.ti,ab. AND exp models, statistical/) OR (Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*).ti. OR ((Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*).ab. AND (prognos* OR predict*).ti,ab.) 976,860
5 3
Outcomes
((disease OR disability OR invalid* OR function* OR outcome OR impairment OR composite OR activity OR severity OR cognitive OR edss OR treatement OR ms OR 'multiple sclerosis' OR brems) ADJ6 (scor* OR scal* OR status OR assess* OR index OR classification)).ti,ab. OR (clinical ADJ3 (assess* OR activity)).ti,ab. OR ((disease OR disabilit* OR risk OR calculat*) ADJ3 (course OR progression)).ti,ab. OR (relaps* ADJ3 (rate OR frequen* OR time OR prognos* OR predict*)).ti,ab. OR (clinical* ADJ3 decision*).ti,ab. OR ((ms OR cdms OR 'multiple sclerosis') ADJ3 (develop* OR course OR progress* OR relaps* OR clinical*)).ti,ab. 1,094,627
6   (1 and 2) or (1 and (3 or 4) and 5) 5004
7   limit 6 to yr="1996 ‐Current" 4764

Database: EMBASE via embase.com 1974 to 2 July 2021

Date search conducted: 2 July 2021

Strategy:

# Concept Searches Results
1 1
Multiple sclerosis
('multiple sclerosis'/exp/mj OR (ms:ti AND 'multiple sclerosis'/exp) OR (((multipl* OR disseminated OR insular) NEAR/1 sclerosis):ti,ab)) NOT ([animals]/lim NOT [humans]/lim) NOT ('nonhuman'/exp NOT 'human'/exp) NOT 'animal model'/exp NOT ('child'/exp NOT 'adult'/exp) NOT [conference abstract]/lim 82,989
2 2a
Prognostic/ prediction
('predictive value'/exp AND 'model'/exp) OR ('prognosis'/exp AND ('disease exacerbation'/exp OR 'recurrent disease'/exp OR 'recurrence risk'/exp OR 'relapse'/exp OR 'remission'/exp)) OR predict*:ti OR prognos*:ti OR ((predict* OR prognos*) NEAR/3 (recurr* OR progress* OR relaps* OR remission OR remitting OR 'multiple sclerosis' OR ms)):ti,ab OR ((predict* OR prognos*) NEAR/3 treat* NEAR/3 response):ti,ab OR ((predict* OR prognos*) NEAR/3 disease NEAR/3 activity):ti,ab 848,229
3 2b
General models
((model* OR decision* OR identif*) NEAR/3 (history OR variable* OR multicomponent* OR multivariable* OR multivariate* OR covariate* OR criteria OR criterion OR scor* OR characteristic* OR finding* OR factor* OR rule*)):ti,ab OR (decision* NEAR/6 model*):ti,ab 596,768
4 2c
Statistical terms
((logistic OR statistic*) NEAR/3 model*):ti,ab OR (decision*:ti,ab AND 'statistical model'/exp) OR (Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ti OR ((Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ab AND (prognos* OR predict*):ti,ab) 1,397,586
5 3
Outcomes
((disease OR disability OR invalid* OR function* OR outcome OR impairment OR composite OR activity OR severity OR cognitive OR edss OR treatment OR ms OR 'multiple sclerosis' OR brems) NEAR/6 (scor* OR scal* OR status OR assess* OR index OR classification)):ti,ab OR (clinical NEAR/3 (assess* OR activity)):ti,ab OR ((disease OR disabilit* OR risk OR calculat*) NEAR/3 (course OR progression)):ti,ab OR (relaps* NEAR/3 (rate OR frequen* OR time OR prognos* OR predict*)):ti,ab OR (clinical* NEAR/3 decision*):ti,ab OR ((ms OR cdms OR 'multiple sclerosis') NEAR/3 (develop* OR course OR progress* OR relaps* OR clinical*)):ti,ab 1,680,033
6   #1 AND #2 OR (#1 AND (#3 OR #4) AND #5) 5041
7   (#1 AND #2 OR (#1 AND (#3 OR #4) AND #5)) AND [1996‐2021]/py 4976
8 Multiple sclerosis conference abstracts
('multiple sclerosis'/exp/mj OR (ms:ti AND 'multiple sclerosis'/exp) OR (((multipl* OR disseminated OR insular) NEAR/1 sclerosis):ti,ab)) NOT ([animals]/lim NOT [humans]/lim) NOT ('nonhuman'/exp NOT 'human'/exp) NOT 'animal model'/exp NOT ('child'/exp NOT 'adult'/exp) AND [conference abstract]/lim 33,377
9   #8 AND #2 OR (#8 AND (#3 OR #4) AND #5) 4077
10 Specific conference names
'european committee for treatment and research in multiple sclerosis':nc OR ectrims:nc OR 'americas committee for treatment and research in multiple sclerosis':nc OR actrims:nc OR 'american academy of neurology':nc OR aan:nc OR 'european academy of neurology':nc OR ean:nc 49,010
11   #9 AND #10 2730
12   #9 AND #10 AND [1996‐2021]/py 2730

Databases: Cochrane Database of Systematic Reviews (CDSR; 2021, Issue 6) and Cochrane Central Register of Controlled Trials (CENTRAL; 2021, Issue 6) via www.cochranelibrary.com

Date search conducted: 2 July 2021

Strategy:

# Concept Searches Results
1 1
Multiple sclerosis
((multipl* OR disseminated OR insular) NEAR/1 sclerosis):ti,ab,kw 10,654
2 2a
Prognostic/ prediction
predict*:ti OR prognos*:ti OR ((predict* OR prognos*) NEAR/3 (recurr* OR progress* OR relaps* OR remission OR remitting OR 'multiple sclerosis' OR ms)):ti,ab,kw OR ((predict* OR prognos*) NEAR/3 treat* NEAR/3 response):ti,ab,kw OR ((predict* OR prognos*) NEAR/3 disease NEAR/3 activity):ti,ab,kw 32,700
3 2b
General models
((model* OR decision* OR identif*) NEAR/3 (history OR variable* OR multicomponent* OR multivariable* OR multivariate* OR covariate* OR criteria OR criterion OR scor* OR characteristic* OR finding* OR factor* OR rule*)):ti,ab,kw OR (decision* NEAR/6 model*):ti,ab,kw 25,595
4 2c
Statistical terms
((logistic OR statistic*) NEAR/3 model*):ti,ab,kw OR (Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ti OR ((Stratification OR Discrimination OR Discriminate OR "c‐statistic" OR "c statistic" OR "Area under the curve" OR AUC OR Calibration OR Indices OR index OR Algorithm OR Multivariable* OR multivariate* OR covariate* OR valid*):ab AND (prognos* OR predict*):ti,ab,kw) 70,797
5 3
Outcomes
((disease OR disability OR invalid* OR function* OR outcome OR impairment OR composite OR activity OR severity OR cognitive OR edss OR treatment OR ms OR 'multiple sclerosis' OR brems) NEAR/6 (scor* OR scal* OR status OR assess* OR index OR classification)):ti,ab,kw OR (clinical NEAR/3 (assess* OR activity)):ti,ab,kw OR ((disease OR disabilit* OR risk OR calculat*) NEAR/3 (course OR progression)):ti,ab,kw OR (relaps* NEAR/3 (rate OR frequen* OR time OR prognos* OR predict*)):ti,ab,kw OR (clinical* NEAR/3 decision*):ti,ab,kw OR ((ms OR cdms OR 'multiple sclerosis') NEAR/3 (develop* OR course OR progress* OR relaps* OR clinical*)):ti,ab,kw 340,882
6   (#1 AND #2) OR (#1 AND (#3 OR #4) AND #5) 583
7   (#1 AND #2) OR (#1 AND (#3 OR #4) AND #5) with Cochrane Library publication date from Jan 1996 to Jul 2021 583

Appendix 2. Data extraction form

Adapted from CHARMS checklist of Moons 2014.

Domain Key items
Study information
  • Study identifier (last name of first author, publication year, and, if necessary, model/analysis name), citation

  • Development with/without internal validation and/or external validation

Source of data
  • Cohort, case‐control, randomised trial, registry, routine care data

  • Primary/secondary use of data

Participants
  • Inclusion and exclusion criteria

  • Recruitment method and details (location, number of centres, setting)

  • Participant description (including age, sex, disease duration, type of MS at prognostication, diagnostic criteria used, description of EDSS/relapse at entry)

  • Details of treatments received

  • Study dates

Outcomes to be predicted
  • Definition and method of measurement of outcome

  • Category of outcome measure (conversion to definite MS, conversion to progressive MS, relapse, disability, composite)

  • Was the same outcome definition and method of measurement used on all participants?

  • Was the outcome assessed without knowledge of candidate predictors (i.e. blinded)?

  • Were candidate predictors part of the outcome?

  • Time of outcome occurrence or summary of duration of follow‐up

Candidate predictors
  • Number and type of predictors (e.g. demographics, symptoms, scores, CSF, imaging, electrophysiological, omics, environmental, non‐CSF samples, disease type, treatment, other)

  • Definition and method for measurement of candidate predictors

  • Timing of predictor measurement (e.g. at patient presentation, as diagnosis, at predefined intervals)

  • Were predictors assessed blinded for outcome (if relevant)?

  • Handling of predictors in the modelling (e.g. transformations, categorisations)

Sample size
  • Number of participants and number of events

  • Number of events in relation to number of candidate predictors (EPV)

  • Power of study assessed

Missing data
  • Number of participants with any missing value

  • Number of participants with missing values for each predictor

  • Method for handling missing data (e.g. complete‐case analysis, single imputation, multiple imputation)

  • Loss to follow‐up discussed

Model development
  • Modelling method (e.g. logistic, survival, penalised regression, machine/deep learning methods)

  • Modelling assumptions satisfied

  • Method for selection of predictors for inclusion in multivariable modelling (e.g. all candidate predictors, pre‐selection based on unadjusted association with outcome, etc.)

  • Method for selection of predictors during multivariable modelling (e.g. full model approach, stepwise selection, significance, multiple models, other)

  • Criteria used for selection of predictors during multivariable modelling (e.g. P value, AIC, BIC)

  • Shrinkage of predictor weights/regression coefficients (e.g. no shrinkage, uniform shrinkage, shrinkage due to estimation method)

  • Tuning parameter selection details and information on preventing data leakage

Model performance
  • Measure and estimate of calibration with confidence intervals (calibration plot, calibration slope, Hosmer‐Lemeshow test)

  • Measure and estimate of discrimination with confidence intervals (c‐statistic, D‐statistic)

  • Log‐rank used for discrimination (yes, no, not applicable)

  • Measure and estimate of classification with confidence intervals (sensitivity, specificity, PPV, NPV, net reclassification, accuracy rate ((TP+TN)/N), error rate (1 − accuracy), other)

  • Were a priori cut points used for classification measures? (yes, no, not reported, not applicable)

  • Overall performance (R2, Brier score, etc.)

Model evaluation
  • If model development, model performance tested on development dataset only or on separate external validation

  • If model development, method used for testing model performance on development dataset (random split of data, resampling methods e.g. bootstrap or cross‐validation, none)

  • In case of poor validation, was model adjusted or updated (e.g. intercept recalibrated, predictor effects adjusted, new predictors added)?

Results
  • Multivariable model presented (e.g. basic, extended, simplified), including predictor weights/regression coefficients, intercept, and baseline survival, with standard errors or confidence intervals

  • Any alternative presentation of the final prediction models (e.g. sum score, nomogram, score chart, predictions for specific risk subgroups with performance)

  • Details on how risk groups were created, if done, and the observed values at which the group boundaries occur

  • Comparison of the distribution of predictors (including missing data) for development and validation datasets

  • If validation, is the same model used as presented in development (same intercept and weights, no dropping of variables, etc.)?

Interpretation and discussion
  • Aim according to authors (abstract, discussion)

  • Was the primary aim prediction of individual patient outcomes?

  • Are the models interpreted as confirmatory (model useful for practice) or exploratory (more research is needed)?

  • Comparisons made with other studies, discussion of generalisability, strengths, and limitations

  • Suggested improvements for the future

Usability and reproducibility of final model
  • Skill and specialisation of equipment required for predictor collection, sufficient explanation to allow for further use, whether absolute risk can be estimated with the presented tool

  • Model/tool, code, and/or data provided

AIC: Akaike information criterion
BIC: Bayesian information criterion
N: sample size
NPV: negative predictive value
PPV: positive predictive value
TP: true positive
TN: true negative

Appendix 3. Definitions used for data extraction

In order to ensure a uniform data extraction from included studies with various reporting styles, we had some working definitions. These are listed below:

Data sources

  • Cohort study: Many studies reported collecting data from a cohort of patients, although other details in their report implicitly or explicitly suggested data sources other than a cohort study. This suggests that the word 'cohort' is often used to refer to a group of patients for whom some longitudinal data are available, rather than to a longitudinal study with pre‐defined data collection times and items. After trying to resolve any unclarity with the study authors, we applied the following rules for practical purposes.

    • If words indicating other types of sources (e.g. a well‐known registry like MSBase, or health records) were also used in relation to the data source without explicit definition of a cohort study, we considered the data source to be the other type.

    • If no other words related to the data source were used while referring to the data, but no specific cohort study was referenced, we assumed the data source to be a cohort study, even when there were clues against this (e.g. irregular follow‐up times).

    • In both of the cases above, the reporting of the data source in Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) was marked as unclear.

  • Primary data use: In line with the suggestion by Wynants 2017, we refrained from using 'retrospective' or 'prospective' for describing the data source. The data source types for which prognostic prediction modelling could be considered as primary data use were case‐control or prospective cohort studies. When the data collection of an included study had vague objectives, like natural history (e.g. Weinshenker 1991) or research on certain predictor domains (e.g. Agosta 2006), we assumed the data collection purpose to be primary, unless it was explicitly reported to be a retrospective data collection (e.g. Wottschel 2015).

Participants

  • Participants description: When age, disease duration, or sex was reported for subsets of the included patients but not overall (e.g. by outcome or diagnosis type), we calculated and reported weighted averages and pooled standard deviations according to Cohen 1988.

  • Treatments received: We collected data on disease‐modifying therapies and ignored symptomatic treatments for relapses. If the reported eligibility criteria specified inclusion of only treatment‐naive patients or required a wash‐out period, irrespective of its length, then treatment received at recruitment was considered to be none. No assumptions were made based on the diagnostic subtype of the included population or the inclusion time with respect to disease onset. For instance, because there are proponents of treating people with clinically isolated syndrome (CIS) in the literature (e.g. Wiendl 2021), we preferred not to assume that they were treatment‐free unless this was explicitly reported.
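
A minimal sketch of the subgroup calculations mentioned above, with hypothetical numbers and assuming the usual degrees‐of‐freedom‐weighted pooling of within‐subgroup variances:

    import math

    # Hypothetical subgroups, e.g. by outcome status: (n, mean age, SD of age)
    subgroups = [(120, 34.2, 8.1), (80, 38.5, 9.4)]

    n_total = sum(n for n, _, _ in subgroups)
    weighted_mean = sum(n * m for n, m, _ in subgroups) / n_total

    # Pooled SD: within-subgroup variances combined, weighted by their degrees of freedom
    pooled_var = (sum((n - 1) * sd ** 2 for n, _, sd in subgroups)
                  / sum(n - 1 for n, _, _ in subgroups))

    print(f"weighted mean: {weighted_mean:.1f}, pooled SD: {math.sqrt(pooled_var):.1f}")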

Outcomes

  • Blinded assessment: Analyses utilising randomised trial participants as the data source usually referred to source articles reporting trials of drug interventions with blinding. However, this blinding concerned only the intervention under investigation, not the predictors considered in the prognostic prediction modelling study, which made secondary use of the data. Hence, outcome assessment was only considered blinded if it was explicitly reported to have been blinded to the baseline status of the study population.

Candidate predictors

  • Considered predictors were all predictors used in univariable or multivariable analyses of all the models with the outcome of interest; included predictors were all predictors presented as part of the final model.

  • In line with Moons 2019, predictors were counted in terms of degrees of freedom.

  • We assumed dummy coding of categorical predictors (instead of, e.g. one‐hot encoding) for all modelling methods unless the number of features or another type of coding was explicitly reported. Hence, when counting the degrees of freedom of the predictors considered or included in the models, we might have underestimated the number (see the brief sketch after this list).

  • The data on number of considered interactions was deemed irrelevant for the following modelling methods, which were assumed to intrinsically take interactions into account: tree‐based methods (e.g. boosting, random forests), neural networks, support vector machines with nonlinear kernels.
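
A brief sketch of the coding assumption above (hypothetical categorical predictor; pandas assumed): dummy coding of a predictor with k levels contributes k − 1 degrees of freedom, whereas one‐hot encoding produces k features.

    import pandas as pd

    # Hypothetical categorical predictor with four levels
    course = pd.Series(["RRMS", "SPMS", "PPMS", "CIS", "RRMS"], name="course")

    dummy = pd.get_dummies(course, drop_first=True)  # dummy coding: k - 1 = 3 columns
    onehot = pd.get_dummies(course)                  # one-hot encoding: k = 4 columns

    print(dummy.shape[1], "degrees of freedom counted under the dummy-coding assumption")
    print(onehot.shape[1], "features under one-hot encoding")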

Sample size

  • For continuous outcomes, we considered the number of events to be the number of observations and calculated the events per variable (EPV) accordingly. For models considering both tabular and non‐tabular (imaging) predictors, the EPV was computed using only the number of tabular predictors and the number of events; for models considering only non‐tabular predictors, EPV could not be computed. When models used longitudinal data, predictor trajectories were counted as single predictors. A minimal worked example follows.
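
A minimal worked example of this bookkeeping, with hypothetical counts:

    # Hypothetical development dataset: 60 events among 240 participants for a binary outcome,
    # 8 degrees of freedom from tabular candidate predictors, plus raw MRI images
    n_events = 60
    tabular_df = 8

    epv = n_events / tabular_df  # the imaging inputs are ignored in the denominator
    print(f"EPV = {epv:.1f}")    # 7.5; for a continuous outcome, n_events would be the number of observations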

Missing data

  • We considered exclusion of participants with missing data from the study to be a different method of handling missing data than complete case analysis, because such exclusions were listed in the study exclusion criteria, implying that differences between these participants and those included could not be explored.

Model development

  • Shrinkage: We extracted whether a specific shrinkage method was used (e.g. uniform shrinkage) or whether the modelling method induced some form of shrinkage. Penalised regression, support vector machines, random forests, boosting, and Bayesian methods were considered to induce shrinkage. Neural networks were considered to induce shrinkage only if dropout, early stopping, or other regularisation methods were mentioned.

Model usability and reproducibility

  • Skill required for predictor measurement was categorised into three levels: predictors a patient could assess alone, predictors for which a primary care clinician would be qualified to measure, and predictors requiring a specialist for measurement or interpretation.

  • Equipment specialisation required for predictor measurement was also categorised into three levels, based on equipment standards in Western Europe: predictors that required no special equipment or that could be measured with equipment found in a typical primary care clinic, equipment found in a standard hospital (for example, magnetic resonance imaging (MRI)), and equipment only found in a speciality centre (for example, optical coherence tomography or multimodal electrophysiological equipment)

  • We assessed whether a model can realistically be expected to be used in practice based on whether a model was reported, whether it was provided in a way enabling easy use for future people with multiple sclerosis (MS), and whether instructions for use were given. The categories are: neither model nor instructions; a model for prognostication only; a model plus instructions for use; and a tool enabling easy use plus instructions.

  • A model’s reproducibility was described by the components given: ‘None’ if no model, tool, data, or code was provided, ‘Model’ if a model was reported, ‘Tool’ if a tool was given (for example, a nomogram or web application), ‘Code’ if analysis code was provided, and ‘Data’ if data were provided (‘DOR’ if the authors explicitly stated that data are available on request).

If coefficients other than intercepts or baseline hazards were not reported or if it was unclear whether all necessary coefficients were reported, the usability and reproducibility measures were rated as unclear.

Appendix 4. Decisions related to risk of bias and applicability

Decisions related to risk of bias

Participants domain
  • Prediction Model Risk of Bias Assessment Tool (PROBAST) includes items (4.3, 4.4) on the handling of missing data in the analysis domain. However, studies often explicitly used availability of data as an eligibility criterion and excluded participants with missing data. Similarly to Kreuzberger 2020, we decided to address the exclusion of participants with missing data in the participants domain (PROBAST item 1.2) if a study mentioned it as part of the inclusion/exclusion criteria and we addressed it in the analysis domain otherwise. If selection criteria were based on complete examinations and further predictor‐level missing data was addressed in the analysis, ratings in both domains were affected.

  • We considered registry data sources as being at high risk of bias (PROBAST item 1.1) unless the study authors reported a specific cohort study within the registry. This was in line with the PROBAST tool, which considers a data source to be appropriate when defined methods are consistently applied for participant inclusion/exclusion, predictor assessment, and outcome determination. This is not expected to be true of registries receiving data from many clinics over long periods of time. There may also be issues related to data quality and availability. For instance, Kalincik 2017 describes the implementation of quality assessments for MSBase, a popular international multiple sclerosis (MS) registry. However, it is unclear whether these assessments are used to improve data quality in the database, or sampling from it, in any of the prognostic studies based on this registry; moreover, they do not address all limitations inherent in observational data.

Predictors and outcome domains
  • Objectively defining the diagnostic conversion from relapsing‐remitting MS (RRMS) to secondary progressive MS (SPMS) is difficult (Ferrazzano 2020). Currently, it is based on retrospective evaluation of gradual worsening in clinical and radiological assessments, independent of relapses (Lublin 2014). While some studies operationalised the definition of conversion to SPMS, e.g. using a priori defined changes in Expanded Disability Status Scale (EDSS), other studies left conversion unclearly defined. We considered the definition of conversion to SPMS using only clinical judgement to be subjective and therefore at high risk of bias, especially when used in studies relying on retrospective data collected across many sites and over long periods of time.

  • Based on the rationale that validated disability scales and scores (e.g. EDSS), relapses, or functional systems of symptoms are the most commonly used and accepted clinical parameters in MS practice and research, measurements or definitions based on these were generally considered to be objective and not greatly affected by interrater variability or blinding. Hence, these were considered to be at low risk of bias unless there was an indication to the contrary. The EDSS, for example, is a valid measure of MS severity and progression. Although this measure has documented drawbacks, such as greater interrater variability for lower scores, it is robust for measurements over long time periods and is internationally accepted as a primary endpoint in clinical trials (Meyer‐Moock 2014).

Analysis domain
  • Nonparametric techniques make fewer assumptions and therefore require more data. Machine learning (ML) modelling methods thus require at least as much data as traditional modelling methods, possibly over 200 events per predictor (van der Ploeg 2014). Clear guidance on this topic is lacking, so we used the current recommendation for PROBAST item 4.1, which requires at least 20 events per predictor. For learning methods using non‐tabular input without prior feature extraction, e.g. deep learning models taking raw images as input, events per variable (EPV) could not be defined, and we rated this item as 'no information' (NI), unless the sample size was clearly insufficient, as evidenced by the number of inputs and events.

  • Some studies dealt with heterogeneity in participant observation times by adjusting for follow‐up duration or numbers of visits during follow‐up, without specifying exactly how this was done. In these situations, we considered the study to be at high risk of bias regarding the methods for accounting for the complexity in the data (PROBAST item 4.6). However, when follow‐up duration was considered as a predictor, this was rather considered to be a predictor measured after time of intended prognostication and was addressed in PROBAST item 2.3.

  • Although an established outcome measure in MS, the EDSS is not without criticism. For example, the EDSS exhibits greater variability for lower scores than for higher scores, has unequal interval distances, and its rate of change depends on baseline values (Meyer‐Moock 2014). Outcome definitions addressing its weaknesses are recognised; however, not all studies within the review used these preferable outcomes. When the ordinal EDSS was predicted as a continuous outcome in a parametric linear regression, we also assessed baseline EDSS range and any use of interactions. If the range was large and interactions were not tested, we considered the study to be at high risk of bias due to insufficiently accounting for the complexity in the data (PROBAST item 4.6).

  • Calibration is just as important as discrimination in assessing prognostic models in medicine (Steyerberg 2019). For ML algorithms that output class assignments rather than probabilities, calibration measures may seem inappropriate compared to classification measures. However, many ML methods are known to produce poor predicted probabilities, making assessment even more important (Niculescu‐Mizil 2005; Zadrozny 2001). Calibrating ML models is possible using standard software, just as for traditional regression methods, and should be expected in the biomedical setting. Hence, we did not change the interpretation of item 4.7 of PROBAST for different modelling methods and judged studies lacking calibration assessment to be at high risk of bias.

  • Assessment of whether overfitting and performance optimism were accounted for, especially in ML studies, required information on data pre‐processing and tuning parameter selection, both of which can lead to data leakage. Data leakage is the use of information in model training that is not expected to be available at the time of prognostication, leading to overestimation of model performance (Kaufman 2011). Preprocessing steps such as predictor standardisation are performed to improve model fit and were therefore treated as a model tuning step. It is best practice to tune, select, and evaluate model performance on different data, as in, for example, a nested cross‐validation structure (Hastie 2009; Steyerberg 2019); a minimal sketch follows this list. We rated a model development as at high risk of bias (PROBAST item 4.8) if there was evidence of data leakage. This relates to the PROBAST guidance that all modelling steps must be accounted for appropriately during internal validation.

  • Many ML studies that reported aiming to develop a clinical prediction model stopped short of clearly selecting a final combination of tuning parameters, predictors, and algorithm and then fitting this combination to the full dataset. These studies instead focused on presenting the process of model development. In such cases, it was impossible to determine whether the final presented model corresponded to the results of the multivariable analysis, as a final presented model did not seem to exist. Accordingly, we considered there to be no information with which to respond to PROBAST item 4.9.
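
To make the leakage point concrete, the following minimal sketch (hypothetical data; scikit-learn assumed; an illustration of the principle rather than the analysis of any included study) keeps standardisation and tuning parameter selection inside the resampling structure, so that no information from the evaluation folds leaks into model fitting:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, cross_val_score

    rng = np.random.default_rng(2)
    X = rng.normal(size=(200, 15))                    # hypothetical predictors
    y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # hypothetical binary outcome

    # Standardisation is part of the pipeline, so it is re-estimated within every
    # training fold instead of being fitted once on the full dataset (no leakage)
    pipe = Pipeline([("scale", StandardScaler()),
                     ("model", LogisticRegression(max_iter=1000))])

    # Inner loop: tuning parameter selection; outer loop: performance estimation
    inner = GridSearchCV(pipe, {"model__C": [0.01, 0.1, 1, 10]}, cv=5, scoring="roc_auc")
    outer_auc = cross_val_score(inner, X, y, cv=5, scoring="roc_auc")

    print("Nested cross-validated c-statistic:", round(outer_auc.mean(), 3))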

Decisions related to applicability

We rated a study as having high concern regarding applicability if:

  • Participants domain: Participants with the outcome at baseline were included. This review is interested in prognostic models rather than diagnostic ones.

  • Outcomes domain: Outcomes did not have a clear clinical interpretation. This review included studies with clinical outcomes, which are relevant to people with MS and measure their symptoms, functioning and health status. For example, we considered the well‐known composite outcome no evidence of disease activity to be a simple interpretable combination of relapse, disease progression, and magnetic resonance imaging (MRI) activity. On the other hand, outcomes based on complex weighting of many clinical and paraclinical measures were considered to be difficult to interpret as it is not clear what a specific value of such an outcome means for people with MS.

  • Predictors domain: Only one type of predictor was considered. To be useful, clinical prediction models should use simple and cost‐effective predictors and add more complex predictors when they offer information above and beyond that offered by simple, available predictors such as demographics and disease characteristics (Steyerberg 2019). Studies using only MRI images, for example, are rated as at high concern for applicability. This review is interested in multivariable prediction models. While such studies may technically be multivariable, they ignore the prognostic value of other, possibly easier to collect, predictors.

  • Overall: The main objective according to the study report was not development or validation of a model for predicting future clinical outcomes in individuals with MS. The distinction between multivariable models used for prognostication and those used for other purposes can be unclear even when considering the full text, which makes it difficult to exclude all models developed for other purposes. When prognostication is not the main aim, the methods may not be optimal for this purpose.

Additionally, we rated study applicability as unclear if:

  • Overall: The study did not include sufficient details on a final model to allow for validation by unrelated researchers. Model coefficients, nomograms, scores, score charts, and web‐based tools and calculators were considered sufficient, whereas a list of important predictors was considered insufficient. Studies not reporting a final model are likely to be interested in the importance of methods or predictors, not in prognostication of outcomes in individuals. Some studies mentioned the word 'pipeline', but we did not consider a pipeline to be a complete model or tool directly usable by clinicians and people with MS.

  • Participants: No eligibility criteria other than a diagnosis of MS were reported for the study population. Having no eligibility criteria other than MS diagnosis seemed unreasonably broad and suggested that the actual criteria were underreported.

Appendix 5. Data tables

2. Study characteristics.

Analysis Outcome Study type Data source Recruitment from
Agosta 2006 Disability Development Cohort (primary use) Italy (single site)
Bejarano 2011 Dev Disability Development + validation Cohort (primary use) Spain (single site)
Bejarano 2011 Val Disability Development + validation (location): model refit Cohort (primary use) Italy (single site)
Bergamaschi 2015 BREMSO MSSS Val Disability Validation (location, time): predictors dropped and different outcome Registry (secondary use) Italy, Canada, Australia, Spain, Netherlands, Argentina, Iran, Kuwait, Turkey, Denmark, Czech Republic, Portugal, France, Belgium, UK, Germany, Cuba, Israel, Hungary, USA, India, Mexico, Malta, Macedonia, Romania, Brazil
De Brouwer 2021 Disability Development Registry (secondary use) ND (multisite)
de Groot 2009 Dexterity Disability Development Cohort (primary use) Netherlands (multisite)
de Groot 2009 Walking Disability Development Cohort (primary use) Netherlands (multisite)
Kuceyeski 2018 Disability Development Mixed: cohort, registry, routine care (secondary use) ND (ND site)
Law 2019 Ada Disability Development Randomised trial participants (secondary use) Canada, United Kingdom, Netherlands, Sweden, Denmark, Finland, Germany, Estonia, Latvia, Spain
Law 2019 DT Disability Development Randomised trial participants (secondary use) Canada, United Kingdom, Netherlands, Sweden, Denmark, Finland, Germany, Estonia, Latvia, Spain
Law 2019 RF Disability Development Randomised trial participants (secondary use) Canada, United Kingdom, Netherlands, Sweden, Denmark, Finland, Germany, Estonia, Latvia, Spain
Lejeune 2021 Dev Disability Development + external validation Randomised trial participants (secondary use) France (multisite)
Lejeune 2021 Ext Val Disability Development + external validation (location, spectrum) Routine care (secondary use) France (single site)
Malpas 2020 Dev Disability Development + external validation Registry (secondary use) ND (multisite)
Malpas 2020 Ext Val Disability Development + external validation (location) Registry (secondary use) Sweden (multisite)
Mandrioli 2008 Dev Disability Development + external validation Cohort (secondary use) Italy (single site)
Mandrioli 2008 Ext Val Disability Development + external validation (time) Cohort (secondary use) Italy (single site)
Margaritella 2012 Disability Development Routine care (secondary use) Italy (single site)
Montolio 2021 Disability Development Routine care (secondary use) Spain (single site)
Oprea 2020 Disability Development Routine care (secondary use) Romania (single site)
Pinto 2020 severity 10 years Disability Development Routine care (secondary use) Portugal (single site)
Pinto 2020 severity 6 years Disability Development Routine care (secondary use) Portugal (single site)
Roca 2020 Disability Development Registry (secondary use) France (multisite)
Rocca 2017 Disability Development Cohort (primary use) Italy (multisite)
Rovaris 2006 Disability Development Cohort (primary use) Italy (multisite)
Sombekke 2010 Disability Development Unclear (secondary use) Netherlands (single site)
Szilasiova 2020 Disability Development Cohort (secondary use) Slovak Republic (single site)
Tommasin 2021 Disability Development Unclear (secondary use) Italy (multisite)
Tousignant 2019 Disability Development Randomised trial participants (secondary use) ND (multisite)
Weinshenker 1991 M3 Dev Disability Development Cohort (primary use) Canada (single site)
Weinshenker 1996 M3 Ext Val Disability External validation (location) Routine care (secondary use) Canada (single site)
Weinshenker 1996 short‐term Disability Development Routine care (secondary use) Canada (single site)
Yperman 2020 Disability Development Routine care (secondary use) Belgium (single site)
Zhao 2020 LGBM All Disability Development + validation Cohort (primary use) USA (single site)
Zhao 2020 LGBM Common Disability Development + validation Cohort (primary use) USA (single site)
Zhao 2020 LGBM Common Val Disability Development + validation (location): unclear if model refit Cohort (primary use) USA (single site)
Zhao 2020 XGB All Disability Development + validation Cohort (primary use) USA (single site)
Zhao 2020 XGB Common Disability Development + validation Cohort (primary use) USA (single site)
Zhao 2020 XGB Common Val Disability Development + validation (location): unclear if model refit Cohort (primary use) USA (single site)
Gurevich 2009 FLP Dev Relapse Development + external validation Unclear (unclear use) Israel (single site)
Gurevich 2009 FLP Ext Val Relapse Development + external validation Unclear (unclear use) Israel (single site)
Gurevich 2009 FTP Relapse Development Unclear (unclear use) Israel (single site)
Sormani 2007 Dev Relapse Development + external validation Randomised trial participants (secondary use) Argentina, Australia, Austria, Belgium, Canada, Denmark, France, Germany, Hungary, Israel, Italy, Netherlands, New Zealand, Spain, Sweden, Switzerland, UK, USA
Sormani 2007 Ext Val Relapse Development + external validation (spectrum) Randomised trial participants (secondary use) Europe (undefined), Canada
Vukusic 2004 Relapse Development Cohort (primary use) Unclear (total PRIMS cohort): France, Austria, Belgium, Netherlands, Italy, Denmark, Spain, Germany, United Kingdom, Portugal, Switzerland, Ireland
Ye 2020 gene signature Relapse Development Unclear (secondary use) Israel (single site)
Ye 2020 nomogram Relapse Development Unclear (secondary use) Israel (single site)
Aghdam 2021 Conversion to definite MS Development Cohort (secondary use) Iran (single site)
Bendfeldt 2019 Linear Placebo Conversion to definite MS Development Randomised trial participants (secondary use) Austria, Belgium, Czech Republic, Denmark, France, Finland, Germany, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom, Israel, Canada
Bendfeldt 2019 M7 Placebo Conversion to definite MS Development Randomised trial participants (secondary use) Austria, Belgium, Czech Republic, Denmark, France, Finland, Germany, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom, Israel, Canada
Bendfeldt 2019 M9 IFN Conversion to definite MS Development Randomised trial participants (secondary use) Austria, Belgium, Czech Republic, Denmark, France, Finland, Germany, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom, Israel, Canada
Borras 2016 Conversion to definite MS Development Cohort (unclear use) Spain (single site)
Gout 2011 Conversion to definite MS Development Registry (secondary use) France (single site)
Martinelli 2017 Conversion to definite MS Development Routine care (unclear use) Italy (single site)
Olesen 2019 candidate Conversion to definite MS Development Cohort (primary use) Denmark (multisite)
Olesen 2019 routine Conversion to definite MS Development Cohort (primary use) Denmark (multisite)
Runia 2014 Conversion to definite MS Development Cohort (primary use) Netherlands (single site)
Spelman 2017 Conversion to definite MS Development Cohort (primary use) ND (multisite)
Wottschel 2015 1 year Conversion to definite MS Development Cohort (secondary use) UK (single site)
Wottschel 2015 3 year Conversion to definite MS Development Cohort (secondary use) UK (single site)
Wottschel 2019 Conversion to definite MS Development Cohort (secondary use) Spain, Denmark, Austria, UK, Italy
Yoo 2019 Conversion to definite MS Development Randomised trial participants (secondary use) Canada, United States
Zakharov 2013 Conversion to definite MS Development Unclear (unclear use) Russia (single site)
Zhang 2019 Conversion to definite MS Development Cohort (primary use) Germany (single site)
Bergamaschi 2001 BREMS Dev Conversion to progressive MS Development Mixed: registry, routine care (secondary use) Italy (single site)
Bergamaschi 2007 BREMS Ext Val Conversion to progressive MS External validation (location) Cohort (secondary use) Italy (multisite)
Bergamaschi 2015 BREMS Ext Val Conversion to progressive MS External validation (location, time) Registry (secondary use) Italy, Canada, Australia, Spain, Netherlands, Argentina, Iran, Kuwait, Turkey, Denmark, Czech Republic, Portugal, France, Belgium, UK, Germany, Cuba, Israel, Hungary, USA, India, Mexico, Malta, Macedonia, Romania, Brazil
Bergamaschi 2015 BREMSO SP Val Conversion to progressive MS Validation (location, time): predictors dropped Registry (secondary use) Italy, Canada, Australia, Spain, Netherlands, Argentina, Iran, Kuwait, Turkey, Denmark, Czech Republic, Portugal, France, Belgium, UK, Germany, Cuba, Israel, Hungary, USA, India, Mexico, Malta, Macedonia, Romania, Brazil
Brichetto 2020 Conversion to progressive MS Development Cohort (primary use) Italy (multisite)
Calabrese 2013 Dev Conversion to progressive MS Development + external validation Cohort (primary use) Italy (single site)
Calabrese 2013 Ext Val Conversion to progressive MS Development + external validation (time) Cohort (primary use) Italy (single site)
Manouchehrinia 2019 Dev Conversion to progressive MS Development + external validation Registry (secondary use) Sweden (multisite)
Manouchehrinia 2019 Ext Val 1 Conversion to progressive MS Development + external validation (location, time, spectrum) Cohort (secondary use) Canada (multisite)
Manouchehrinia 2019 Ext Val 2 Conversion to progressive MS Development + external validation (location, time, spectrum) Randomised trial participants (secondary use) Canada, Denmark, France, Germany, Italy, Poland, Portugal, Spain, Switzerland, United Kingdom
Manouchehrinia 2019 Ext Val 3 Conversion to progressive MS Development + external validation (location, time, spectrum) Randomised trial participants (secondary use) ND (multisite)
Misicka 2020 10 years Conversion to progressive MS Development Registry (secondary use) USA (multisite)
Misicka 2020 20 years Conversion to progressive MS Development Registry (secondary use) USA (multisite)
Misicka 2020 ever Conversion to progressive MS Development Registry (secondary use) USA (multisite)
Pinto 2020 SP Conversion to progressive MS Development Routine care (secondary use) Portugal (single site)
Pisani 2021 Conversion to progressive MS Development Cohort (secondary use) Italy (single site)
Seccia 2020 180 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Seccia 2020 360 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Seccia 2020 720 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Skoog 2014 Dev Conversion to progressive MS Development Cohort (primary use) Sweden (single site)
Skoog 2019 Ext Val Conversion to progressive MS External validation (location, time) Registry (secondary use) Sweden (single site)
Skoog 2019 Val Conversion to progressive MS Validation Cohort (primary use) Sweden (single site)
Tacchella 2018 180 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Tacchella 2018 360 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Tacchella 2018 720 days Conversion to progressive MS Development Routine care (secondary use) Italy (single site)
Vasconcelos 2020 Dev Conversion to progressive MS Development + external validation Unclear (unclear use) Brazil (single site)
Vasconcelos 2020 Ext Val Conversion to progressive MS Development + external validation (time) Unclear (unclear use) Brazil (single site)
Ahuja 2021 Dev Composite Development + external validation Mixed: routine care (electronic health records), cohort (secondary use) United States (single site)
Ahuja 2021 Ext Val Composite Development + external validation (spectrum) Routine care: electronic health records (secondary use) United States (multisite)
de Groot 2009 cognitive Composite Development Cohort (primary use) Netherlands (multisite)
Kosa 2022 Composite Development Mixed: case‐control, cohort (primary use) USA (ND site)
Pellegrini 2019 Composite Development Randomised trial participants (secondary use) Australia, Austria, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Canada, Chile, Colombia, Costa Rica, Croatia, Czech Republic, Estonia, France, Georgia, Germany, Greece, Guatemala, India, Ireland, Israel, Latvia, Mexico, Macedonia, Netherlands, Moldova, New Zealand, Peru, Poland, Romania, Russian Federation, Puerto Rico, Serbia, Slovakia, South Africa, Switzerland, Spain, Ukraine, United Kingdom, United States, Virgin Islands (USA)

Ada: adaptive boosting
BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset
Dev: development
DT: decision tree
Ext: external
FLP: first level predictor
FTP: fine tuning predictor
IFN: interferon
LGBM: light gradient boosting machine
MS: multiple sclerosis
MSSS: multiple sclerosis severity score
ND: no data available
PRIMS: pregnancy in multiple sclerosis
RF: random forest
SP: secondary progressive
Val: validation
XGB: extreme gradient boosting

3. Participant characteristics.

Analysis Outcome Females Age (years) Diagnosis (criteria) Disease duration (years) Treated Clinical description
Agosta 2006 Disability 70% Mean: 33.5 27.4% CIS, 46.6% RRMS, 26.0% SPMS (Lublin 1996; Poser 1983) Range: 0 to 25 Recruitment: 0%, follow‐up: 55% EDSS median (range): CIS 0.0 (0.0 to 1.5), RRMS 2.5 (1.0 to 5.5), SPMS 5.5 (3.5 to 6.5)
Bejarano 2011 Dev Disability 65% Mean: 35.1 31.4% CIS, 51.0% RRMS, 5.9% SPMS, 7.8% PPMS, 3.9% PRMS (McDonald 2005) Mean: 5.9, SD: 7.4 Recruitment: 55%, follow‐up: ND/unclear EDSS median (range): 2.0 (0 to 6), number of relapses in previous 2 years mean (SD): 1.29 (1.51)
Bejarano 2011 Val Disability 67% Mean: 37 88.5% RRMS, 11.5% SPMS (McDonald 2005) Mean: 9, SD: 6 Recruitment: ND/unclear, follow‐up: ND/unclear EDSS median (range): 1.5 (0 to 6.5)
Bergamaschi 2015 BREMSO MSSS Val Disability 71% Mean: 31.1 100% RRMS (McDonald 2001) ND/unclear Recruitment: ND/unclear, follow‐up: 72% ND
De Brouwer 2021 Disability 71% Mean (at onset): 32.2 85.6% RRMS, 4.9% SPMS, 3.3% PPMS, 1.4% PRMS, 4.8% unknown (Lublin 1996) Mean: 6.88, range: 3 to 25 Recruitment: ND/unclear, follow‐up: ND/unclear Prior 3‐year EDSS per patient mean (SD, range): 2.38 (1.48, 0 to 8.5)
de Groot 2009 dexterity Disability 64% Mean: 37.4 82% relapse onset, 18% non‐relapse onset (Poser 1983) Max: 0.5 Recruitment: 6%, follow‐up: 30% EDSS median (IQR): 2.5 (2.0 to 3.0)
de Groot 2009 walking Disability 64% Mean: 37.4 82% relapse onset, 18% non‐relapse onset (Poser 1983) Max: 0.5 Recruitment: 6%, follow‐up: 30% EDSS median (IQR): 2.5 (2.0 to 3.0)
Kuceyeski 2018 Disability 73% Mean (unclear when): 36.8 100% RRMS (McDonald 2010; McDonald 2017) Mean: 1.5, SD: 1.3 Recruitment: 95%, follow‐up: ND/unclear EDSS mean (SD): 1.1 (1.1)
Law 2019 Ada Disability 64% Mean: 50.9 100% SPMS (own definition) Mean: 9.3, SD: 5 Recruitment: ND/unclear, follow‐up: 50% EDSS median (IQR): 6.0 (4.5 to 6.5)
Law 2019 DT Disability 64% Mean: 50.9 100% SPMS (own definition) Mean: 9.3, SD: 5 Recruitment: ND/unclear, follow‐up: 50% EDSS median (IQR): 6.0 (4.5 to 6.5)
Law 2019 RF Disability 64% Mean: 50.9 100% SPMS (own definition) Mean: 9.3, SD: 5 Recruitment: ND/unclear, follow‐up: 50% EDSS median (IQR): 6.0 (4.5 to 6.5)
Lejeune 2021 Dev Disability 76% Mean (unclear when): 35.3 100% RRMS (McDonald 2005) Mean: 7.32, SD: 5.5 Recruitment: 55%, follow‐up: unclear, 32.8% therapeutic escalation, 59.1% no DMT change EDSS mean (SD): 3.45 (0.96)
Lejeune 2021 Ext Val Disability 77% Mean (unclear when): 36.2 100% RRMS (McDonald 2005) Mean: 7.62, SD: 6.56 Recruitment: 59%, follow‐up: unclear, 48% therapeutic escalation, 49.1% no DMT change EDSS Mean (SD): 2.93 (1.00)
Malpas 2020 Dev Disability 71% Mean (at onset): 31.7 100% RRMS (McDonald 2010) Mean: 0.33, SD: 0.3 Recruitment: unclear number of participants, mean percentage of time on treatment 1st year, first‐line 17.1%, second‐line 0.50%, follow‐up: unclear number of participants, mean percentage of time on treatment 10th year, 46% first‐line, 5.3% second‐line First year EDSS mean (SD): 1.78 (1.26), number of relapses mean (SD): 0.74 (0.93)
Malpas 2020 Ext Val Disability ND Mean (at onset): 33.4 100% RRMS (McDonald 2010) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear First year EDSS mean (SD): 1.51 (1.28)
Mandrioli 2008 Dev Disability 61% Mean (at onset): 27.6 100% RRMS (ND) ND/unclear Recruitment: ND/unclear, follow‐up: 70% EDSS at diagnosis mean (SD): BMS 1.76 (0.24), SMS 2.17 (0.18)
Mandrioli 2008 Ext Val Disability 62% Mean (at onset): 33 100% RRMS (ND) ND/unclear Recruitment: ND/unclear, follow‐up: 68% EDSS at diagnosis mean (SD): BMS 1.65 (0.10), SMS 2.45 (0.23)
Margaritella 2012 Disability 79% Mean (at onset): 28.6 89.7% RRMS, 3.4% PPMS, 6.9% Benign MS (McDonald 2001; McDonald 2005) Mean: 10.1, SD: 7.3 Recruitment: ND/unclear, follow‐up: ND/unclear EDSS mean (SD): 2.1 (1.5)
Montolio 2021 Disability 67% Mean: 42.4 92.7% RRMS, 6.1% SPMS, 1.2% PPMS (McDonald 2001) Mean: 10.1, pooled SD: 7.74 Recruitment: ND/unclear, follow‐up: 70% EDSS mean: 2.6 (SD between 1.27 to 2.02)
Oprea 2020 Disability 62% Mean: 40.3 Unclear: RRMS, PPMS (ND) Mean: 10.2 Recruitment: ND/unclear, follow‐up: 100% ND
Pinto 2020 severity 10 years Disability 78% Mean (at onset): 32.3 100% RRMS (McDonald (undefined)) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear ND
Pinto 2020 severity 6 years Disability 70% Mean (at onset): 30.3 100% RRMS (McDonald (undefined)) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear ND
Roca 2020 Disability ND ND/unclear ND (ND) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear ND
Rocca 2017 Disability 50% Mean: 51.3 100% PPMS (Thompson 2000) Median: 10, range: 2 to 26 Recruitment: 33%, follow‐up: 18% EDSS median (IQR): 6.0 (4.5 to 6.5)
Rovaris 2006 Disability 50% Mean: 51.3 100% PPMS (Thompson 2000) Median: 10, range: 2 to 26 Recruitment: 33%, follow‐up: 18% EDSS median (range): 5.5 (2.5 to 7.5)
Sombekke 2010 Disability 64% Mean (at onset): 32.4 51.2% RRMS, 31.4% SPMS, 17.4% PPMS (Poser 1983; McDonald 2006) Mean: 13.1, SD: 8.3 Recruitment: ND/unclear, follow‐up: ND/unclear EDSS median (IQR): 4.0 (3.5)
Szilasiova 2020 Disability 65% ND/unclear 63.5% RRMS, 29.4% SPMS, 7.1% PPMS (McDonald 2001) Mean: 6.7, range: 0.5 to 30 Recruitment: ND/unclear, follow‐up: 100% EDSS mean (SD, range): 3.03 (1.5, 1.0 to 7.0)
Tommasin 2021 Disability 64% Mean: 39.7 74.8% RRMS, 25.2% PMS (McDonald 2010; McDonald 2017) Mean: 9.9, SD: 8.06 Recruitment: ND/unclear, follow‐up: 72% EDSS median (range): 3.0 (0.0 to 7.5)
Tousignant 2019 Disability ND ND/unclear 100% RRMS (ND) ND/unclear Recruitment: 0%, follow‐up: 0% ND
Weinshenker 1991 M3 Dev Disability 66% Mean (at onset): 30.5 65.8% RRMS, 14.8% relapsing progressive, 18.7% chronically progressive, 0.9% unknown/83.3% diagnosis probable, 16.4% diagnosis possible (Poser 1983) Mean: 11.9, SE: 0.3 Recruitment: 0%, follow‐up: 0% ND
Weinshenker 1996 M3 Ext Val Disability 69% Mean: 44.1 84.3% RRMS, 2.0% relapsing progressive, 13.7% chronically progressive (ND) Mean: 12 Recruitment: ND/unclear, follow‐up: ND/unclear Unclear
Weinshenker 1996 short‐term Disability 69% Mean: 44.1 84.3% RRMS, 2.0% relapsing progressive, 13.7% chronically progressive (ND) Mean: 12 Recruitment: ND/unclear, follow‐up: ND/unclear Unclear
Yperman 2020 Disability 72% Mean: 45 CIS 1.7%, RRMS 53.2%, SPMS 10.7%, PPMS 2.9%, unknown 32.9% (unrecorded in the dataset) ND/unclear Recruitment: 74%, follow‐up: 79% EDSS mean (SD): 3.0 (1.8)
Zhao 2020 LGBM All Disability 76% Mean (unclear when): 39 Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) Median: 2, range: 0 to 44 Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5)
Zhao 2020 LGBM Common Disability 76% Mean (unclear when): 39 Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) Median: 2, range: 0 to 44 Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5)
Zhao 2020 LGBM Common Val Disability 69% Mean (unclear when): 42.5 Unclear: 15.9% CIS, 70.8% RRMS, 9.3% SPMS, 3.9% PPMS (ND) Median: 6, range: 0 to 45 Recruitment: unclear, for source cohort, 93.4% DMT first line, 0.7% DMT oral, 1.6% DMT high, 0.7% experimental, 0.5% immune, 1.6% steroid, 1.6% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 57.4% DMT first line, 19.8% DMT oral, 9.6% DMT high, 3% experimental, 0.5% immune, 5% steroid, 4.8% MS other, 15.1% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 3; 0 to 7)
Zhao 2020 XGB All Disability 76% Mean (unclear when): 39 Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) Median: 2, range: 0 to 44 Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5)
Zhao 2020 XGB Common Disability 76% Mean (unclear when): 39 Unclear: 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS (ND) Median: 2, range: 0 to 44 Recruitment: unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5)
Zhao 2020 XGB Common Val Disability 69% Mean (unclear when): 42.5 Unclear: 15.9% CIS, 70.8% RRMS, 9.3% SPMS, 3.9% PPMS (ND) Median: 6, range: 0 to 45 Recruitment: unclear, for source cohort, 93.4% DMT first line, 0.7% DMT oral, 1.6% DMT high, 0.7% experimental, 0.5% immune, 1.6% steroid, 1.6% MS other, follow‐up: unclear, for source cohort, treatment at last visit, 57.4% DMT first line, 19.8% DMT oral, 9.6% DMT high, 3% experimental, 0.5% immune, 5% steroid, 4.8% MS other, 15.1% never on treatment Unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 3; 0 to 7)
Gurevich 2009 FLP Dev Relapse 64% Mean (unclear when): 36.3 34.0% CIS, 66.0% CDMS (McDonald 2001) Mean: 5.67, pooled SD: 0.89 Recruitment: 0%, follow‐up: 35% EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)
Gurevich 2009 FLP Ext Val Relapse ND ND/unclear Unclear: CIS 60%, CDMS 40% (McDonald 2001) ND/unclear Recruitment: 0%, follow‐up: unclear, 9 on IMD Unclear, published inconsistencies, EDSS (unclear if mean and SD): CIS 2.58 (0.15) CDMS 5.3 (2.39), annualised relapse rate (unclear if mean and SD): CIS 6.1 (2.05) CDMS 1 (0.51)
Gurevich 2009 FTP Relapse 64% Mean (unclear when): 36.3 34.0% CIS, 66.0% CDMS (McDonald 2001) Mean: 5.67, pooled SD: 0.89 Recruitment: 0%, follow‐up: 35% EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)
Sormani 2007 Dev Relapse ND Median: 37 100% RRMS (Poser 1983) Median: 5.9, range: 0.6 to 30 Recruitment: 0%, follow‐up: 1% EDSS median (range): 2.0 (0.0 to 5.0), prior 2‐year number of relapses (range): 2 (1 to 11)
Sormani 2007 Ext Val Relapse ND Median: 34 100% RRMS (Poser 1983) Median: 3.8, range: 0.5 to 22 Recruitment: 0%, follow‐up: 0% EDSS median (range): 2.0 (0.0 to 4.0), prior 2‐year number of relapses (range): 2 (1 to 8)
Vukusic 2004 Relapse 100% Mean: 30 96% RRMS, 4% SPMS (Poser 1983) Mean: 6, SD: 4 Recruitment: 0%, follow‐up: 2% DSS at beginning of pregnancy mean (SD): 1.3 (1.4), annualised relapse rate during year before pregnancy (95% CI): 0.7 (0.6 to 0.8)
Ye 2020 gene signature Relapse 64% Mean (unclear when): 36.3 34.0% CIS, 66.0% CDMS (McDonald 2001) Mean: 5.67, pooled SD: 0.89 Recruitment: ND/unclear, follow‐up: 31% EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)
Ye 2020 nomogram Relapse 64% Mean (unclear when): 36.3 34.0% CIS, 66.0% CDMS (McDonald 2001) Mean: 5.67, pooled SD: 0.89 Recruitment: ND/unclear, follow‐up: 31% EDSS (unclear if mean and SD): CIS 0.9 (0.2) CDMS 2.4 (0.2), annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)
Aghdam 2021 Conversion to definite MS 74% Mean (unclear when): 40 100% CIS (McDonald 2010) ND/unclear Recruitment: 0%, follow‐up: ND/unclear History of ON: 11.9%
Bendfeldt 2019 linear placebo Conversion to definite MS 71% Mean: 30.8 100% CIS (own definition) Max: 0.16 Recruitment: 0%, follow‐up: 0% EDSS median (range): conv‐ 1.5 (1.0 to 2.0), conv+ 1.5 (1.0 to 2.0)
Bendfeldt 2019 M7 placebo Conversion to definite MS 70% Mean: 29.7 100% CIS (own definition) Max: 0.16 Recruitment: 0%, follow‐up: 0% EDSS median (range): conv‐ 1.0 (0.0 to 2.0), conv+ 1.5 (1.0 to 2.0)
Bendfeldt 2019 M9 IFN Conversion to definite MS 66% Mean: 29.6 100% CIS (own definition) Max: 0.16 Recruitment: 0%, follow‐up: 100% EDSS median (range): conv‐ 2.0 (1.0 to 2.0), conv+ 2.0 (1.0 to 2.5)
Borras 2016 Conversion to definite MS 66% Median (unclear when): 35.5 100% CIS (ND) Median: 0.22, range: 0.01 to 0.35 Recruitment: ND/unclear, follow‐up: 8% EDSS median (range): 1.5 (0 to 5)
Gout 2011 Conversion to definite MS 70% Median: 31 100% CIS (ND) ND/unclear Recruitment: 0%, follow‐up: 0% EDSS median (range): 2 (0 to 6)
Martinelli 2017 Conversion to definite MS 68% Mean: 32 100% CIS (ND) Max: 0.25 Recruitment: ND/unclear, follow‐up: 40% ND
Olesen 2019 candidate Conversion to definite MS 68% Median: 36 100% CIS (Optic Neuritis Study Group criteria 1991) ND/unclear Recruitment: 0%, follow‐up: ND/unclear ND
Olesen 2019 routine Conversion to definite MS 68% Median: 36 100% CIS (Optic Neuritis Study Group criteria 1991) ND/unclear Recruitment: 0%, follow‐up: ND/unclear ND
Runia 2014 Conversion to definite MS 73% ND/unclear 100% CIS (own definition) Max: 0.5 Recruitment: ND/unclear, follow‐up: ND/unclear ND
Spelman 2017 Conversion to definite MS 71% Median (at MS onset): 31.6 100% CIS (Poser 1983) Max: 1 Recruitment: ND/unclear, follow‐up: 28% EDSS median (IQR): 2 (1 to 2.5)
Wottschel 2015 1 year Conversion to definite MS 66% Mean: 33.1 100% CIS (ND) Mean: 0.12, SD: 0.07 Recruitment: 0%, follow‐up: 0% EDSS median (range): 1 (0 to 8)
Wottschel 2015 3 years Conversion to definite MS 67% Mean: 33.2 100% CIS (ND) Mean: 0.12, SD: 0.07 Recruitment: 0%, follow‐up: 0% EDSS median (range): 1 (0 to 8)
Wottschel 2019 Conversion to definite MS 66% Mean (at onset): 32.7 100% CIS (ND) Max: 0.27 Recruitment: ND/unclear, follow‐up: ND/unclear EDSS median (range): 2 (0 to 8)
Yoo 2019 Conversion to definite MS 69% Mean (at onset): 35.9 100% CIS (ND) Median: 0.23, range: 0.06 to 0.52 Recruitment: 0%, follow‐up: 50% EDSS median (range): 1.5 (0 to 4.5)
Zakharov 2013 Conversion to definite MS 70% Mean: 25.1 100% CIS (ND) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear EDSS ≤ 2
Zhang 2019 Conversion to definite MS 70% Mean (unclear when): 42.4 100% CIS (McDonald 2010) ND/unclear Recruitment: 1%, follow‐up: ND/unclear EDSS median: 1
Bergamaschi 2001 BREMS Dev Conversion to progressive MS 63% Mean: 28.5 100% RRMS (Poser 1983; Lublin 1996) ND/unclear Recruitment: 10%, follow‐up: 10% ND
Bergamaschi 2007 BREMS Ext Val Conversion to progressive MS 69% Median: 24.8 100% RRMS (Poser 1983) ND/unclear Recruitment: 3%, follow‐up: 57% ND
Bergamaschi 2015 BREMS Ext Val Conversion to progressive MS 71% Mean: 31.1 100% RRMS (McDonald 2001) ND/unclear Recruitment: ND/unclear, follow‐up: 72% ND
Bergamaschi 2015 BREMSO SP Val Conversion to progressive MS 71% Mean: 31.1 100% RRMS (McDonald 2001) ND/unclear Recruitment: ND/unclear, follow‐up: 72% ND
Brichetto 2020 Conversion to progressive MS ND ND/unclear Unclear: unclear, RRMS, SPMS (ND) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear ND
Calabrese 2013 Dev Conversion to progressive MS 67% Mean: 35.3 100% RRMS (McDonald 2001) Mean: 11.3, range: 5 to 23 Recruitment: 100%, follow‐up: 100% EDSS median (range): 2.5 (0 to 4.5)
Calabrese 2013 Ext Val Conversion to progressive MS 60% Mean: 34.5 100% RRMS (McDonald 2001) Mean: 10.5, range: 10 to 21 Recruitment: 100%, follow‐up: 100% ND
Manouchehrinia 2019 Dev Conversion to progressive MS 72% Mean (at onset): 31.5 100% RRMS (McDonald (undefined)) ND/unclear Recruitment: unclear, a minority, follow‐up: unclear number of participants, median duration of exposures first‐line 3, second‐line 0.8 First recorded EDSS median (IQR): 2 (1 to 3)
Manouchehrinia 2019 Ext Val 1 Conversion to progressive MS 74% Mean (at onset): 31.1 100% RRMS (Poser 1983) ND/unclear Recruitment: ND/unclear, follow‐up: unclear number of participants, median 0 First recorded EDSS median (IQR): 2 (1 to 3)
Manouchehrinia 2019 Ext Val 2 Conversion to progressive MS 67% Mean (at onset): 29.5 100% RRMS (McDonald 2001) ND/unclear Recruitment: 0%, follow‐up: 100% First recorded EDSS median (IQR): 2 (1.5 to 3)
Manouchehrinia 2019 Ext Val 3 Conversion to progressive MS 74% Mean (at onset): 29.9 100% RRMS (McDonald 2005) ND/unclear Recruitment: 0%, follow‐up: 100% First recorded EDSS median (IQR): 2 (1.5 to 3.5)
Misicka 2020 10 years Conversion to progressive MS 78% Median (at MS onset): 32 100% RRMS (McDonald 2005; McDonald 2010) Median: 11, IQR: 5 to 19 Recruitment: 0%, follow‐up: ND/unclear Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1%
Misicka 2020 20 years Conversion to progressive MS 78% Median (at MS onset): 32 100% RRMS (McDonald 2005; McDonald 2010) Median: 11, IQR: 5 to 19 Recruitment: 0%, follow‐up: ND/unclear Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1%
Misicka 2020 ever Conversion to progressive MS 78% Median (at onset): 32 100% RRMS (McDonald 2005; McDonald 2010) Median: 11, IQR: 5 to 19 Recruitment: 0%, follow‐up: ND/unclear Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1%
Pinto 2020 SP Conversion to progressive MS 73% Mean (at onset): 31.1 100% RRMS (McDonald (undefined)) ND/unclear Recruitment: ND/unclear, follow‐up: ND/unclear ND
Pisani 2021 Conversion to progressive MS 58% Mean: 33.5 100% RRMS (McDonald 2005) ND/unclear Recruitment: 100%, follow‐up: 100% EDSS median (range): 1.5 (0 to 3.5)
Seccia 2020 180 days Conversion to progressive MS 70% Mean (at onset): 29 100% RRMS (latest criteria at time of diagnosis) Mean: 19 Recruitment: ND/unclear, follow‐up: 73% ND
Seccia 2020 360 days Conversion to progressive MS 70% Mean (at onset): 29 100% RRMS (latest criteria at time of diagnosis) Mean: 19 Recruitment: ND/unclear, follow‐up: 73% ND
Seccia 2020 720 days Conversion to progressive MS 70% Mean (at onset): 29 100% RRMS (latest criteria at time of diagnosis) Mean: 19 Recruitment: ND/unclear, follow‐up: 73% ND
Skoog 2014 Dev Conversion to progressive MS 65% Mean: 33.5 100% RRMS (Poser 1983) Median: 2 Recruitment: 0%, follow‐up: 0% ND
Skoog 2019 Ext Val Conversion to progressive MS 76% Mean (at CDMS onset (2nd attack)): 33 100% RRMS (Poser 1983) ND/unclear Recruitment: 0%, follow‐up: unclear, few patients received first generation DMT (IFN‐beta or glatiramer acetate), 99 out of 1762 patient years ND
Skoog 2019 Val Conversion to progressive MS 65% Mean (at CDMS onset (2nd attack)): 33 100% RRMS (Poser 1983) Median: 2 Recruitment: 0%, follow‐up: 0% ND
Tacchella 2018 180 days Conversion to progressive MS ND ND/unclear 100% RRMS (McDonald 2017) ND/unclear Recruitment: ND/unclear, follow‐up: 89% ND
Tacchella 2018 360 days Conversion to progressive MS ND ND/unclear 100% RRMS (McDonald 2017) ND/unclear Recruitment: ND/unclear, follow‐up: 89% ND
Tacchella 2018 720 days Conversion to progressive MS ND ND/unclear 100% RRMS (McDonald 2017) ND/unclear Recruitment: ND/unclear, follow‐up: 89% ND
Vasconcelos 2020 Dev Conversion to progressive MS 76% Mean (at onset): 28.7 100% RRMS (Poser 1983; McDonald 2001) Mean: 16, SD: 9.42 Recruitment: ND/unclear, follow‐up: 58% Patients with more than one relapse at first year of disease: 74%
Vasconcelos 2020 Ext Val Conversion to progressive MS 78% Mean (at onset): 28.5 100% RRMS (Poser 1983; McDonald 2001) Mean: 13.22, SD: 9.72 Recruitment: ND/unclear, follow‐up: 77% Patients with more than one relapse at first year of disease: 74%
Ahuja 2021 Dev Composite 74% Median (unclear when): 43.3 Unclear: approximately 70% to 80% RRMS, approximately 10% PPMS, 10% to 20% SPMS (ND) Median: 5.12, IQR: 2.03 Recruitment: ND/unclear, follow‐up: 55% ND
Ahuja 2021 Ext Val Composite 74% Median (unclear when): 43.3 Unclear: approximately 70% to 80% RRMS, approximately 10% PPMS, 10% to 20% SPMS (ND) Median: 4.37, IQR: 2.82 Recruitment: ND/unclear, follow‐up: 55% ND
de Groot 2009 cognitive Composite 64% Mean: 37.4 82% relapse onset, 18% non‐relapse onset (Poser 1983) Max: 0.5 Recruitment: 6%, follow‐up: 30% EDSS median (IQR): 2.5 (2.0 to 3.0)
Kosa 2022 Composite 54% Mean: 49.6 30.8% RRMS, 24.2% SPMS, 44.9% PPMS (McDonald 2010; McDonald 2017) Mean: 12.2, pooled SD: 8.51 Recruitment: 0%, follow‐up: ND/unclear EDSS mean (SD): development set; RRMS 1.8 (1.2), SPMS 5.9 (1.2), PPMS 5.3 (1.6)/validation set RRMS 2.2 (1.6), SPMS 5.5 (1.5), PPMS 5.2 (1.6)
Pellegrini 2019 Composite 71% Mean: 37.1 100% RRMS (McDonald 2001; McDonald 2005) Mean: 7.5, SD: 6.5 Recruitment: 34%, follow‐up: 0% EDSS mean (SD): 2.5 (1.2), number of relapses 1 year prior to study entry mean (SD): 1.4 (0.7)

Ada: adaptive boosting
BREMS: Bayesian Risk Estimate for Multiple Sclerosis
BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset
CDMS: clinically definite multiple sclerosis
CI: confidence interval
CIS: clinically isolated syndrome
conv‐: did not convert to CDMS
conv+: converted to CDMS 
Dev: development
DMT: disease‐modifying treatment
DSS: disability status scale
DT: decision tree
EDSS: Expanded Disability Status Scale
Ext: external
FLP: first level predictor
FTP: fine tuning predictor
IFN: interferon
IMD: immunomodulatory drug
IQR: interquartile range
LGBM: light gradient boosting machine
Max: maximum
Min: minimum
MS: multiple sclerosis
ND: no data available
ON: optic neuritis
PPMS: primary progressive MS
PRMS: progressive‐relapsing MS
RF: random forest
RRMS: relapsing‐remitting MS
SD: standard deviation
SP: secondary progressive
SPMS: secondary progressive MS
Val: validation

4. Number of predictors.

Model Outcome Number considered Number included Timing
Agosta 2006 Disability 26 3 (2 or 3 (unclear if follow‐up duration included)) At study baseline (cohort entry at least 2 years after diagnosis of definite MS or 3 months after CIS), 12 months (± 10 days) after baseline, at final follow‐up (outcome measurement)
Bejarano 2011 Disability 23 (22 or 23 (unclear transformation)) 5 At study baseline (cohort entry)
De Brouwer 2021 Disability 24 + EDSS trajectories 19 predictors + EDSS trajectories At multiple visits, at least 6 in 3‐year period
de Groot 2009 dexterity Disability 5 5 At disease onset (definite MS) (study baseline within 6 months after diagnosis)
de Groot 2009 walking Disability 5 3 At disease onset (definite MS) (study baseline within 6 months after diagnosis)
Kuceyeski 2018 Disability 965 703 (703 predictors transformed to 6 principal components) At baseline image (early RRMS within 5 years of their first neurologic symptom), at follow‐up (unclear: outcome measurement)
Law 2019 Ada Disability 9 9 At study baseline (RCT)
Law 2019 DT Disability 9 9 At study baseline (RCT)
Law 2019 RF Disability 9 9 At study baseline (RCT)
Lejeune 2021 Disability 19 (≤ 19 and ≥ 14 (unclear transformations)) 6 (7 df) At study baseline (RCT, relapse) or retrospectively at screening
Malpas 2020 Disability 17 3 At symptom onset, at visits up to 1 year following symptom onset, and at final follow‐up
Mandrioli 2008 Disability 15 4 At disease onset (RRMS)
Margaritella 2012 Disability 8 (≥ 8 (unclear transformations)) 6 At multiple assessments consecutively for 3 years until 1 year prior to outcome
Montolio 2021 Disability 39 5 (4 of them longitudinal) At 3 visits over 2 years (not defined baseline and annual visits 1 and 2)
Oprea 2020 Disability 6 6 At a single time point during outcome determination
Pinto 2020 severity 10 years Disability 1306 Unclear which predictors make up the final model At multiple visits dependent on which 1‐year to 5‐year model
Pinto 2020 severity 6 years Disability 1306 Unclear which predictors make up the final model At multiple visits dependent on which 1‐year to 5‐year model
Roca 2020 Disability Unstructured data + 65 Unstructured data + 65 At FLAIR imaging (initial in the dataset)
Rocca 2017 Disability 26 5 At study baseline (cohort entry), at median 15 months after baseline, and at median 56 months (called 5 years) after baseline
Rovaris 2006 Disability 25 3 (2 or 3 (unclear if follow‐up time included)) At study baseline (cohort entry), at 15 months post‐baseline (follow‐up), at final follow‐up (outcome measurement)
Sombekke 2010 Disability 72 9 (13 df) At baseline (already available or retrospectively collected)
Szilasiova 2020 Disability 11 6 (7 df) At study baseline (cohort entry)
Tommasin 2021 Disability 16 4 At assessment (not defined), at follow‐up
Tousignant 2019 Disability Unstructured data Unstructured data At imaging
Weinshenker 1991 M3 Disability 13 (≥ 13 (unclear if complete list)) 7 At assessment (not defined), at follow‐up
Weinshenker 1996 short‐term Disability 5 (≥ 5 (unclear if complete list)) 5 At assessment (not defined), at follow‐up (unclear: outcome measurement)
Yperman 2020 Disability 5893 9 (≤ 9 (unclear subset)) At visit of interest
Zhao 2020 LGBM All Disability 198 198 At multiple assessments every 6 months from baseline (undefined) to year 2
Zhao 2020 LGBM Common Disability 105 (≤ 105 (unclear subset)) 105 (≤ 105 (unclear subset)) At multiple assessments every year from baseline (undefined) to year 2
Zhao 2020 XGB All Disability 198 198 At multiple assessments every 6 months from baseline (undefined) to year 2
Zhao 2020 XGB Common Disability 105 (≤ 105 (unclear subset)) 105 (≤ 105 (unclear subset)) At multiple assessments every year from baseline (undefined) to year 2
Gurevich 2009 FLP Relapse 10,602 10 (df unclear) At study baseline (cohort entry)
Gurevich 2009 FTP Relapse 10,602 9 (df unclear) At study baseline (cohort entry)
Sormani 2007 Relapse 12 (≥ 12 (unclear transformations)) 2 At study baseline (RCT, entry at least 1 year after disease onset)
Vukusic 2004 Relapse 11 3 At study baseline (cohort entry during pregnancy week 4 to 36), at examinations at 20, 28, 36 weeks of gestation, and also post‐partum
Ye 2020 gene signature Relapse 202 5 At study baseline (cohort entry)
Ye 2020 nomogram Relapse 206 5 At study baseline (cohort entry)
Aghdam 2021 Conversion to definite MS 10 (≥ 10 (unclear transformations)) 4 At presentation due to ON
Bendfeldt 2019 linear placebo Conversion to definite MS Number of voxels in the cortical GM mask NA At disease onset (CIS) (RCT baseline within 60 days after onset)
Bendfeldt 2019 M7 placebo Conversion to definite MS 301 25 (df unclear (reported predictors do not add up to 25)) At disease onset (CIS) (RCT baseline within 60 days after onset)
Bendfeldt 2019 M9 IFN Conversion to definite MS 301 15 (df unclear) At disease onset (CIS) (RCT baseline within 60 days after onset)
Borras 2016 Conversion to definite MS 32 (≤ 32 and ≥ 17 (discrepant lists)) 2 At disease onset (CIS) reported as 'first relapse' (4 to 126 days between CIS and lumbar puncture)
Gout 2011 Conversion to definite MS 15 (≥ 15 (unclear how many interactions tested)) 3 At disease onset (CIS) leading to admission
Martinelli 2017 Conversion to definite MS 36 (≥ 24 and ≤ 36 (unclear adjustments and transformations)) 7 (5 or 7 (unclear adjustment)) At disease onset (CIS) and up to 3 months after disease onset
Olesen 2019 candidate Conversion to definite MS 14 3 At disease onset (ON), from ON onset median (range): 14 days (2 days to 38 days)
Olesen 2019 routine Conversion to definite MS 4 3 At disease onset (ON), from ON onset median (range): 14 days (2 days to 38 days)
Runia 2014 Conversion to definite MS 21 (≥ 16 or 21 (unclear transformations)) 5 At disease onset (CIS) (at study baseline within 6 months after onset)
Spelman 2017 Conversion to definite MS 16 (≥ 16 (unclear how many interactions tested)) 7 (11 df) At disease onset (CIS) (up to 12 months after disease onset)
Wottschel 2015 1 year Conversion to definite MS 14 3 (df unclear) At disease onset (CIS) and up to a mean of 6.15 weeks (SD 3.4) after disease onset
Wottschel 2015 3 years Conversion to definite MS 14 6 (df unclear) At disease onset (CIS) and up to a mean of 6.15 weeks (SD 3.4) after disease onset
Wottschel 2019 Conversion to definite MS 214 36 (for 2‐fold CV) At disease onset (CIS) and up to 14 weeks after disease onset
Yoo 2019 Conversion to definite MS Unstructured data + 11 (user‐defined) Unstructured data + 11 At disease onset (CIS) (RCT baseline within 180 days after disease onset)
Zakharov 2013 Conversion to definite MS 2 (≥ 2 (unclear if complete list)) 2 Unclear, at first MRI after CIS onset (timing distribution unknown)
Zhang 2019 Conversion to definite MS 30 18 At disease onset (CIS) (during primary clinical work‐up for CIS)
Bergamaschi 2001 BREMS Conversion to progressive MS 9 (> 9 (unclear if complete list)) 9 At disease onset (RRMS) and regular visits up to 1 year after onset (baseline)
Brichetto 2020 Conversion to progressive MS 143 33 Unclear, at multiple assessments every 4 months
Calabrese 2013 Conversion to progressive MS 16 (≥ 16 (unclear df of initial symptoms)) 3 At study baseline (cohort entry at least 5 years after disease onset)
Manouchehrinia 2019 Conversion to progressive MS 20 5 (6 df) From disease onset (RRMS) to first EDSS recorded (median 2 years)
Misicka 2020 10 years Conversion to progressive MS 35 6 (7 df) At study interview (the same as the time of outcome reporting)
Misicka 2020 20 years Conversion to progressive MS 35 6 (7 df) At study interview (the same as the time of outcome reporting)
Misicka 2020 ever Conversion to progressive MS 35 6 (7 df) At study interview (the same as the time of outcome reporting)
Pinto 2020 SP Conversion to progressive MS 1306 Unclear which predictors make up the final model At multiple visits dependent on which 1‐year to 5‐year model
Pisani 2021 Conversion to progressive MS 13 (12 or 13 (unclear adjustment)) 7 At diagnosis (RRMS) and up to 2 years after diagnosis
Seccia 2020 180 days Conversion to progressive MS 21 predictor trajectories 18 predictor trajectories At multiple visits comprising patient history to the current visit of interest
Seccia 2020 360 days Conversion to progressive MS 21 predictor trajectories 18 predictor trajectories At multiple visits comprising patient history to the current visit of interest
Seccia 2020 720 days Conversion to progressive MS 21 predictor trajectories 18 predictor trajectories At multiple visits comprising patient history to the current visit of interest
Skoog 2014 Conversion to progressive MS 15 (≥ 15 (unclear transformations)) 3 (4 df) At last relapse, at time of prognostication
Tacchella 2018 180 days Conversion to progressive MS 46 46 At visit of interest
Tacchella 2018 360 days Conversion to progressive MS 46 46 At visit of interest
Tacchella 2018 720 days Conversion to progressive MS 46 46 At visit of interest
Vasconcelos 2020 Conversion to progressive MS 8 5 At multiple visits (unclear if CIS or RR onset) to at least 2 years post‐onset
Ahuja 2021 Composite 2730 114 (model 1: 111, model 2: 3) From 1 year ago to the index encounter (unspecified)
de Groot 2009 cognitive Composite 5 4 At disease onset (definite MS) (study baseline within 6 months after diagnosis)
Kosa 2022 Composite 852,167 (852,167 or 852,165 (unclear adjustment for age and sex)) 23 (23 or 21 (unclear if age and sex are predictors)) At lumbar puncture
Pellegrini 2019 Composite 23 3 At study baseline (RCT)

Ada: adaptive boosting
BREMS: Bayesian Risk Estimate for Multiple Sclerosis
CIS: clinically isolated syndrome
CV: cross‐validation
df: degrees of freedom
DT: decision tree
EDSS: Expanded Disability Status Scale
FLAIR: fluid‐attenuated inversion recovery
FLP: first level predictor
FTP: fine tuning predictor
IFN: interferon
LGBM: light gradient boosting machine
MRI: magnetic resonance imaging
MS: multiple sclerosis
NA: not applicable
ND: no data available
ON: optic neuritis
RCT: randomised controlled trial
RF: random forest
RR: relapsing‐remitting
RRMS: relapsing‐remitting MS
SD: standard deviation
SP: secondary progressive
XGB: extreme gradient boosting

5. Development and performance details.

Analysis Outcome Algorithm Sample size (number of events) EPV Evaluation details Number of external validations Calibration Discrimination Classification
Agosta 2006 Disability (EDSS) Logistic regression 70 (44) 1 Cross‐validation 0 ND ND Accuracy = 46/70, sensitivity = 30/41, specificity = 16/29
Bejarano 2011 Disability (EDSS) Neural network 51 (NA) 2 Cross‐validation 1 external refit ND AUC computed for continuous outcome Unclear how classification measures produced for numeric outcome, accuracy = 0.80 (SD 0.14), sensitivity = 0.92, specificity = 0.61, PPV = 0.80, NPV = 0.80
Bejarano 2011 Val Disability (EDSS) NA 96 (NA) 4 Validation; location NA ND NA Unclear how classification measures produced for numeric outcome, accuracy = 0.81
Bergamaschi 2015 BREMSO MSSS Val Disability (MSSS) NA 14,211 (3567) NA Validation; multiple (location, time); predictors dropped and different outcome NA ND ND Sensitivity = 0.36, specificity = 0.79
De Brouwer 2021 Disability (EDSS) Neural network 6682 (1114) 46 Cross‐validation 0 Calibration plot upon request 0.66 (0.64 to 0.68)B ND
de Groot 2009 dexterity Disability (9HPT) Logistic regression 146 (46) 9 Bootstrap 0 Calibration plot, calibration slope 0.85 0.77 (0.69 to 0.86) ND
de Groot 2009 walking Disability (EDSS) Logistic regression 146 (37) 7 Bootstrap 0 Calibration plot, calibration slope 0.93 0.89 (0.83 to 0.95) ND
Kuceyeski 2018 Disability (cognitive ‐ SDMT) Partial least squares regression 60 (NA) 10 Unclear 0 Calibration plot NA NA
Law 2019 Ada Disability (EDSS) Boosting 485 (115) 13 Cross‐validation 0 ND 0.6 (0.54 to 0.66)B Cutoff (0.527) identified by convex hull method, sensitivity = 53.0 (SD 4.7), specificity = 62.4 (SD 2.5), PPV = 30.5 (SD 1.6), NPV = 81.1 (1.9)
Law 2019 DT Disability (EDSS) Classification tree 485 (115) 13 Cross‐validation 0 ND 0.62 (0.56 to 0.68)B Cutoff (0.537) identified by convex hull method, sensitivity = 58.3 (SD 4.6), specificity = 62.2 (SD 2.5), PPV = 32.4 (SD 2.0), NPV = 82.7 (SD 1.8)
Law 2019 RF Disability (EDSS) Random forest 485 (115) 13 Cross‐validation 0 ND 0.61 (0.55 to 0.67)B Cutoff (0.531) identified by convex hull method, sensitivity = 59.1 (SD 4.6), specificity = 61.1 (SD 2.5), PPV = 32.1 (SD 2.1), NPV = 82.8 (SD 1.7)
Lejeune 2021 Disability (EDSS) Penalised regression 186 (53) 4 Bootstrap 1 ND 0.82 (0.73 to 0.91) Cutoff = 0.5, PPV 0.73 (95% CI 0.53 to 0.92), NPV 0.70 (95% CI 0.50 to 0.88)
Lejeune 2021 Ext Val Disability (EDSS) NA 175 (55) NA External validation; multiple (location, spectrum) NA Calibration plot, Hosmer‐Lemeshow test 0.71 (0.62 to 0.80) Cutoff = 0.5, PPV 0.83 (95% CI 0.76 to 0.92), NPV 0.74 (95% CI 0.67 to 0.81)
Malpas 2020 Disability (EDSS) Bayesian model averaging 2403 (145) 8 Apparent 1 ND 0.8 (0.75 to 0.84) Full model: cutoff = 0.05, sensitivity = 0.78, specificity = 0.71, PPV = 0.15, NPV = 0.98. Reduced model: cutoff = 0.06, sensitivity = 0.72, specificity = 0.73, PPV = 0.15, NPV = 0.98
Malpas 2020 Ext Val Disability (EDSS) NA 556 (34) NA External validation; location NA ND 0.75 (0.66 to 0.84) Cutoff determined in development set (0.06), PPV = 0.15, NPV = 0.97
Mandrioli 2008 Disability (EDSS) Logistic regression 64 (26) 2 Apparent 1 ND ND Error = 0.0937, sensitivity = 0.8846, specificity = 0.9211, PPV = 0.8846, NPV = 0.9211
Mandrioli 2008 Ext Val Disability (EDSS) NA 65 (20) NA External validation; time NA ND ND Error = 0.1231, sensitivity = 0.8000, specificity = 0.9111, PPV = 0.8000, NPV = 0.9111
Margaritella 2012 Disability (EDSS) Other regression 58 (NA) 22 Apparent 0 Histogram of differences between measured and predicted values NA Percent predictions within ± 0.5 of observed = 0.72
Montolio 2021 Disability (EDSS) Neural network 82 (37) 1 Cross‐validation 0 ND 0.82 (0.72 to 0.92)B Accuracy = 0.817, sensitivity = 0.811, specificity = 0.822, PPV = 0.789
Oprea 2020 Disability (EDSS) Logistic regression 151 (ND) 13 Cross‐validation 0 ND 0.82 NA Accuracy = 0.7662, sensitivity = 0.7775, PPV = 0.8145, F1 = 0.7806
Pinto 2020 severity 10 years Disability (EDSS) Support vector machine 67 (30) < 1 Cross‐validation 0 ND 0.85 (0.75 to 0.95)B Sensitivity = 0.77 (0.13), specificity = 0.79 (0.09), F1 score = 0.72 (0.09), geometric mean = 0.78 (0.08)
Pinto 2020 severity 6 years Disability (EDSS) Support vector machine 145 (38) < 1 Cross‐validation 0 ND 0.89 (0.83 to 0.95)B Sensitivity = 0.84 (0.11), specificity = 0.81 (0.05), F1 score = 0.53 (0.07), geometric mean = 0.82 (0.06)
Roca 2020 Disability (EDSS) ML combination 1427 (NA) 22A Random split 0 Other: plot of MSE per EDSS category, MSE 2.21 (validation), 3 (test) NA NA
Rocca 2017 Disability (EDSS) Other regression 49 (NA) 2 Cross‐validation 0 ND NA EDSS change precision within one point = 0.776
Rovaris 2006 Disability (EDSS) Logistic regression 52 (35) 1 Cross‐validation 0 ND ND Accuracy = 0.808, sensitivity = 31/35 = 0.89, specificity = 11/17 = 0.65
Sombekke 2010 Disability (MSSS) Logistic regression 605 (86) 1 Unclear 0 Hosmer‐Lemeshow test 0.78 (0.75 to 0.84) Sensitivity = 0.37, specificity = 0.953, LR+ = 7.9
Szilasiova 2020 Disability (EDSS) Logistic regression 85 (ND) 4 Apparent 0 ND Unclear due to mismatch between ROC curve and reported statistics: 0.94 (95% CI 0.89 to 0.98) Unclear because these points do not correspond to point on plot, sensitivity = 0.94, specificity = 0.89
Tommasin 2021 Disability (EDSS) Random forest 163 (58) 4 Cross‐validation 0 ND 0.92 (0.88 to 0.96)B Accuracy = 0.92, sensitivity = 0.92, specificity = 0.91
Tousignant 2019 Disability (EDSS) Neural network 1083 (103) NA Cross‐validation 0 ND 0.7 (0.64 to 0.76)B ND
Weinshenker 1991 M3 Disability (DSS) Survival analysis 1060 (498) 38 None 1 ND ND ND
Weinshenker 1996 M3 Ext Val Disability (DSS) NA 259 (66) NA External validation; location NA ND ND ND
Weinshenker 1996 short‐term Disability (EDSS) Logistic regression 174 (28) 9 Apparent 0 ND ND Cutoff = 0.5: accuracy = 0.75, sensitivity = 0.21, specificity = 0.93; cutoff = 0.3: accuracy = 0.67, sensitivity = 0.54, specificity = 0.72
Yperman 2020 Disability (EDSS) Random forest 2502 (275) < 1 Cross‐validation 0 Calibration plot upon request 0.75 (0.71 to 0.79)B ND
Zhao 2020 LGBM All Disability (EDSS) Boosting 724 (165) 1 Cross‐validation 0 ND 0.78 (0.74 to 0.82)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.77, sensitivity = 0.58, specificity = 0.82
Zhao 2020 LGBM Common Disability (EDSS) Boosting 724 (165) 2 Cross‐validation 1 (unclear if refit) ND 0.76 (0.72 to 0.8)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.64, sensitivity = 0.75, specificity = 0.61
Zhao 2020 LGBM Common Val Disability (EDSS) NA 400 (130) NA Validation; location NA ND 0.82 (0.78 to 0.86)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.73, sensitivity = 0.73, specificity = 0.73
Zhao 2020 XGB All Disability (EDSS) Boosting 724 (165) 1 Cross‐validation 0 ND 0.78 (0.74 to 0.82)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.74, sensitivity = 0.68, specificity = 0.76
Zhao 2020 XGB Common Disability (EDSS) Boosting 724 (165) 2 Cross‐validation 1 (unclear if refit) ND 0.76 (0.72 to 0.8)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.65, sensitivity = 0.75, specificity = 0.62
Zhao 2020 XGB Common Val Disability (EDSS) NA 400 (130) NA Validation; location NA ND 0.82 (0.78 to 0.86)B Cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.68, sensitivity = 0.85, specificity = 0.60
Gurevich 2009 FLP Relapse Support vector machine 94 (19) < 1 Cross‐validation 1 ND ND Categories determined in data, error = 0.079
Gurevich 2009 FLP Ext Val Relapse NA 10 (ND) NA External validation; ND NA ND ND Error = 0.25 (but 10 patients reported)
Gurevich 2009 FTP Relapse Other regression 40 (NA) < 1 Cross‐validation 0 Calibration plot NA Prediction more than 50 days from observed = 0.345
Sormani 2007 Relapse Survival analysis 539 (270) 22 Apparent 1 ND ND ND
Sormani 2007 Ext Val Relapse NA 117 (ND) NA External validation; spectrum NA ND ND ND
Vukusic 2004 Relapse Logistic regression 223 (63) 6 Apparent 0 ND 0.72 (0.64 to 0.8)B Cutoff = 0.5, accuracy = 160/223 = 0.72
Ye 2020 gene signature Relapse Penalised regression 94 (64) < 1 Random split 0 ND 0.73 (0.61 to 0.85)B ND
Ye 2020 nomogram Relapse Survival analysis 94 (64) < 1 Random split 0 ND 0.59 (0.47 to 0.71)B ND
Aghdam 2021 Conversion to definite MS Classification tree 277 (117) 12 Random split 0 ND ND Accuracy = 0.74, sensitivity = 0.71, specificity = 0.76, PPV = 0.65, NPV = 0.79
Bendfeldt 2019 linear placebo Conversion to definite MS Support vector machine 69 (25) NA Cross‐validation 0 ND ND Accuracy = 0.712 (95% CI 0.707 to 0.716), sensitivity = 0.64, specificity = 0.783
Bendfeldt 2019 M7 placebo Conversion to definite MS Support vector machine 61 (22) < 1 Cross‐validation 0 ND ND Balanced accuracy = 0.676 (95% CI 0.559 to 0.793)
Bendfeldt 2019 M9 IFN Conversion to definite MS Support vector machine 99 (49) < 1 Cross‐validation 0 ND ND Balanced accuracy = 0.704 (95% CI 0.614 to 0.794)
Borras 2016 Conversion to definite MS Logistic regression 49 (24) 1 Unclear 0 ND 0.79 (0.65 to 0.93)B Sensitivity = 0.84, specificity = 0.83
Gout 2011 Conversion to definite MS Survival analysis 208 (141) 9 Apparent 0 ND ND ND
Martinelli 2017 Conversion to definite MS Survival analysis 243 (108) 4 Apparent 0 Other: Gronnesby and Borgan statistic 0.7 (0.64 to 0.75) Categories defined as low: 0% to 33.3%, moderate: 33.3% to 66.7%, high: 66.6% to 100%, net reclassification improvement = 0.3
Olesen 2019 candidate Conversion to definite MS Logistic regression 33 (16) 1 Bootstrap 0 Calibration plot, Hosmer‐Lemeshow test 0.89 (0.77 to 1.00) ND
Olesen 2019 routine Conversion to definite MS Logistic regression 38 (16) 4 Bootstrap 0 Calibration plot, Hosmer‐Lemeshow test 0.86 (0.74 to 0.98) ND
Runia 2014 Conversion to definite MS Survival analysis 431 (109) 7 Bootstrap 0 ND 0.66 (0.6 to 0.72)B ND
Spelman 2017 Conversion to definite MS Survival analysis 3296 (1953) 122 Bootstrap 0 Calibration plot 0.81 (0.79 to 0.83)B ND
Wottschel 2015 1 year Conversion to definite MS Support vector machine 74 (22) 2 Cross‐validation 0 ND ND Accuracy = 0.714 (95% CI 0.58 to 0.84), sensitivity = 0.77, specificity = 0.66, PPV = 0.70, NPV = 0.74
Wottschel 2015 3 years Conversion to definite MS Support vector machine 70 (31) 2 Cross‐validation 0 ND ND Accuracy = 0.68 (95% CI 0.61 to 0.73), sensitivity = 0.60, specificity = 0.76, PPV = 0.72, NPV = 0.65
Wottschel 2019 Conversion to definite MS Support vector machine 400 (91) < 1 Cross‐validation 0 ND ND 2‐fold CV: accuracy = 0.648 (95% CI 0.646 to 0.651), sensitivity = 0.641, specificity = 0.656, also reported for 5‐fold, 10‐fold CV, and LOOCV
Yoo 2019 Conversion to definite MS Neural network 140 (80) 7A Cross‐validation 0 ND 0.75 (0.67 to 0.83)B Accuracy = 0.75 (SD = 0.113), sensitivity = 0.787 (SD = 0.122), specificity = 0.704 (SD = 0.154)
Zakharov 2013 Conversion to definite MS Logistic regression 102 (23) 12 Apparent 0 ND ND Sensitivity = 0.727, specificity = 0.345
Zhang 2019 Conversion to definite MS Random forest 84 (66) 1 Cross‐validation 0 ND ND Accuracy = 0.85 (95% CI 0.75 to 0.91), sensitivity = 0.94 (95% CI 0.85 to 0.98), specificity = 0.50 (95% CI 0.26 to 0.74), PPV = 0.87 (95% CI 0.81 to 0.91), NPV = 0.69 (95% CI 0.44 to 0.87), DOR = 15.50 (95% CI 3.93 to 60.98), balanced Accuracy = 0.72 (posterior probability interval 0.60 to 0.82)
Bergamaschi 2001 BREMS Conversion to progressive MS Survival analysis 186 (34) 4 None 2, simplified: 2 ND ND ND
Bergamaschi 2007 BREMS Ext Val Conversion to progressive MS NA 535 (87) NA External validation; location NA ND ND Cutoff at 95th percentile (score ≥ 2.0): sensitivity = 0.17, specificity = 0.99, PPV = 0.86, NPV = 0.83, cutoff at 5th percentile (score ≤ 0.63): sensitivity = 0.08, specificity = 1.00, PPV = 1.00, NPV = 0.18 (the event is defined as having secondary progression for 95th percentile cutoff but not having secondary progression for other)
Bergamaschi 2015 BREMS Ext Val Conversion to progressive MS NA 1131 (ND) NA External validation; multiple (location, time) NA ND ND Sensitivity = 0.35, specificity = 0.80
Bergamaschi 2015 BREMSO SP Val Conversion to progressive MS NA 14,211 (1954) NA Validation; multiple (location, time); predictors dropped NA ND ND Sensitivity = 0.28, specificity = 0.76
Brichetto 2020 Conversion to progressive MS ML combination 810 (1451) 10 Unclear 0 ND ND Accuracy FCA = 0.826, CCA = 0.860
Calabrese 2013 Conversion to progressive MS Logistic regression 334 (66) 4 Cross‐validation 1 ND ND Accuracy = 0.928, sensitivity = 0.878, specificity = 0.94
Calabrese 2013 Ext Val Conversion to progressive MS NA 83 (19) NA External validation; time NA ND ND Accuracy = 0.916, sensitivity = 0.842, specificity = 0.937
Manouchehrinia 2019 Conversion to progressive MS Survival analysis 8825 (1488) 74 Bootstrap 3 Calibration plot 0.84 (0.83 to 0.85) ND
Manouchehrinia 2019 Ext Val 1 Conversion to progressive MS NA 3967 (888) NA External validation; multiple (location, time, spectrum) NA ND 0.77 (0.76 to 0.78) ND
Manouchehrinia 2019 Ext Val 2 Conversion to progressive MS NA 175 (26) NA External validation; multiple (location, time, spectrum) NA ND 0.77 (0.70 to 0.85) ND
Manouchehrinia 2019 Ext Val 3 Conversion to progressive MS NA 2355 (126) NA External validation; multiple (location, time, spectrum) NA ND 0.87 (0.84 to 0.89) ND
Misicka 2020 10 years Conversion to progressive MS Survival analysis 1166 (55) 2 Apparent 0 ND ND ND
Misicka 2020 20 years Conversion to progressive MS Survival analysis 1166 (128) 4 Apparent 0 ND ND ND
Misicka 2020 ever Conversion to progressive MS Survival analysis 1166 (177) 5 Apparent 0 ND ND ND
Pinto 2020 SP Conversion to progressive MS Support vector machine 187 (21) < 1 Cross‐validation 0 ND 0.86 (0.78 to 0.94)B Sensitivity = 0.76 (0.14), specificity = 0.77 (0.05), F1 score = 0.20 (0.05), geometric mean = 0.76 (0.08)
Pisani 2021 Conversion to progressive MS Random survival forest 262 (69) 5 Random split 0 ND Reported for RF, not final model Cutoff = 17.7, accuracy = 0.88 (95% CI 0.75 to 0.96), sensitivity = 0.92 (95% CI 0.70 to 1.00), specificity = 0.87 (95% CI 0.70 to 0.96), PPV = 0.75 (95% CI 0.48 to 0.93), NPV = 0.96 (95% CI 0.81 to 1.00) from evaluation of final tool using random split
Seccia 2020 180 days Conversion to progressive MS Neural network 1515 (207) 10 Random split 0 ND ND Cutoff = 0.5, accuracy = 0.98, sensitivity = 0.385, specificity = 0.988, PPV = 0.308
Seccia 2020 360 days Conversion to progressive MS Neural network 1449 (207) 10 Random split 0 ND ND Cutoff = 0.5, accuracy = 0.975, sensitivity = 0.50, specificity = 0.982, PPV = 0.295
Seccia 2020 720 days Conversion to progressive MS Neural network 1375 (207) 10 Random split 0 ND ND Cutoff = 0.5, accuracy = 0.98, sensitivity = 0.673, specificity = 0.985, PPV = 0.427
Skoog 2014 Conversion to progressive MS Survival analysis 157 (118) 8 Apparent 1 O:E table ND ND
Skoog 2019 Ext Val Conversion to progressive MS NA 145 (54) NA External validation; multiple (location, time) NA Calibration plot, O:E table, O:E 0.599 ND ND
Skoog 2019 Val Conversion to progressive MS NA 144 (100) NA Apparent validation in new publication, some participants from the development excluded NA Calibration plot, O:E table, O:E 0.829 ND ND
Tacchella 2018 180 days Conversion to progressive MS Random forest 527 (65) 1 Cross‐validation 0 ND 0.71 (0.66 to 0.76) ND
Tacchella 2018 360 days Conversion to progressive MS Random forest 527 (125) 3 Cross‐validation 0 ND 0.67 (0.62 to 0.71) ND
Tacchella 2018 720 days Conversion to progressive MS Random forest 527 (211) 5 Cross‐validation 0 ND 0.68 (0.64 to 0.72) ND
Vasconcelos 2020 Conversion to progressive MS Survival analysis 287 (88) 11 Apparent 1 Other: events per score level ND ND
Vasconcelos 2020 Ext Val Conversion to progressive MS NA 142 (31) NA External validation; time NA O:E table (unclear), Hosmer‐Lemeshow test ND ND
Ahuja 2021 Composite (relapse) Penalised regression 1435 (ND) 1 None 1 ND ND ND
Ahuja 2021 Ext Val Composite (relapse) NA 186 (ND) NA External validation; spectrum NA Plots comparing observed and predicted relapse proportions stratified by disease duration and age separately 0.71 (0.69 to 0.71) Sensitivity = 0.499, specificity = 0.719, PPV = 0.223, NPV = 0.900, F1 = 0.307
de Groot 2009 cognitive Composite (cognitive tests) Logistic regression 146 (44) 9 Bootstrap 0 Calibration plot, calibration slope 0.88 0.74 (0.65 to 0.83) ND
Kosa 2022 Composite (EDSS, SNRS, T25FW, NDH‐9HPT) Random forest 227 (NA) < 1 Random split 0 Calibration plot NA NA
Pellegrini 2019 Composite (EDSS, T25FW, 9HPT, PASAT, VFT) Survival analysis 1582 (434) 19 Bootstrap 0 Calibration slope 1 year: 1.10 (bootstrap = 1.08, SE 0.17), 2 years: 1.00 (bootstrap = 0.97, SE 0.15) 0.59 (0.57 to 0.61)B ND

9HPT: 9‐hole peg test
AUC: area under the curve
Ada: adaptive boosting
BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset
CI: confidence interval
Dev: development
DOR: diagnostic odds ratio
DSS: Disability Status Scale
DT: decision tree
EDSS: Expanded Disability Status Scale
EPV: events per variable
Ext: external
FLP: first level predictor
FTP: fine tuning predictor
IFN: interferon
LGBM: light gradient boosting machine
LOOCV: leave‐one‐out cross‐validation
MS: multiple sclerosis
MSE: mean squared error
MSSS: multiple sclerosis severity score
NA: not applicable
ND: no data available
NDH‐9HPT: non‐dominant hand 9‐hole peg test
NPV: negative predictive value
O:E: observed to expected ratio
PASAT: Paced Auditory Serial Addition Test
PPV: positive predictive value
RF: random forest
ROC: receiver operating characteristic
SD: standard deviation
SE: standard error
SP: secondary progressive
T25FW: timed 25‐foot walk
Val: validation
VFT: visual function test
XGB: extreme gradient boosting

A Events per variable was computed using only tabular predictors, although non‐tabular predictors were also considered.
B Confidence interval was not reported but was computed based on the reported information.
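Footnote B indicates that several of the AUC (c‐statistic) confidence intervals above were not reported by the primary studies but were computed from the reported information. The exact method is not described here; the sketch below shows one common approximation under that assumption, the Hanley and McNeil (1982) standard error combined with a normal interval, which needs only the AUC, the number of events, and the number of non‐events. The function name and the example inputs (taken from the Zhao 2020 row: 724 participants, 165 events, AUC 0.76) are illustrative, and the result need not match the tabulated interval exactly.

```python
import math

def auc_confidence_interval(auc, n_events, n_nonevents):
    """Approximate 95% CI for an AUC (c-statistic) from its point estimate
    and the numbers of events and non-events, using the Hanley & McNeil
    (1982) standard error and a normal approximation. Illustrative only."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    se = math.sqrt(
        (auc * (1 - auc)
         + (n_events - 1) * (q1 - auc ** 2)
         + (n_nonevents - 1) * (q2 - auc ** 2))
        / (n_events * n_nonevents)
    )
    return auc - 1.96 * se, auc + 1.96 * se

# Example with the sample size reported for Zhao 2020 (724 participants, 165 events)
low, high = auc_confidence_interval(0.76, n_events=165, n_nonevents=724 - 165)
print(f"AUC 0.76, approximate 95% CI ({low:.2f} to {high:.2f})")
```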

Table 7. Final model and presentation.

Model Outcome Definition Timing Predictors Presentation
Agosta 2006 Disability (EDSS) Clinical worsening confirmed after a 3‐month, relapse‐free period (EDSS increase (for EDSS baseline) ≥ 1.0 (< 6.0), ≥ 0.5 (≥ 6.0)) Follow‐up median: 8 years, mean: 7.7 years Baseline GM histogram peak height, average lesion MTR percentage change after 12 months, follow‐up duration Regression coefficients without the intercept or coefficient for 'adjustment for follow‐up duration'
Bejarano 2011 Disability (EDSS) Change in EDSS 2 years Age, worst central motor conduction time of both arms, worst central motor conduction time of both legs, at least 1 abnormal MEP, motor score of EDSS at baseline ND
De Brouwer 2021 Disability (EDSS) Disability progression confirmed at least 6 months later (EDSS increase (baseline EDSS) ≥ 1.5 (0), 1 (≤ 5.5), or 0.5 (> 5.5)) 2 years Gender, age at onset, MS course at time t = 0 (RRMS, SPMS, PPMS, or CIS), disease duration at time t = 0, EDSS at t = 0, last used DMT at t = 0 (none, interferons, natalizumab, fingolimod, teriflunomide, dimethyl‐fumarate, glatiramer, alemtuzumab, rituximab, cladribine, ocrelizumab, other (contains stem cells therapy, siponimod and daclizumab)), EDSS trajectories ND
de Groot 2009 dexterity Disability (9HPT) Impaired dexterity (abnormal score (mean – 1.96 SD, healthy Dutch reference population) for the 9HPT) 3 years How well can you use your hands?, impairment of sensory tract, impairment of pyramidal tract, Impairment of cerebellar tract, T2‐weighted infratentorial lesion load Score chart
de Groot 2009 walking Disability (EDSS) Inability to walk 500 m (EDSS ≥ 4) 3 years How well can you walk? Impairment of cerebellar tract, number of lesions in spinal cord Score chart
Kuceyeski 2018 Disability (cognitive ‐ SDMT) Processing speed measured by Symbol Digits Modality Test Mean (SD): 28.6 months (10.3 months) Age, sex, disease duration, treatment duration, baseline SDMT, baseline EDSS, regional GM atrophy (86 regions), NEMO pairwise disconnection measures (610 considered), number of months between time points ND
Law 2019 Ada Disability (EDSS) Confirmed disability progression sustained for 6 months: EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), ≥ 0.5 (≥ 6) 2 years T25FW, 9HPT, PASAT, EDSS, disease duration, age, sex, T2LV, BPF ND
Law 2019 DT Disability (EDSS) Confirmed disability progression sustained for 6 months: EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), ≥ 0.5 (≥ 6) 2 years T25FW, 9HPT, PASAT, EDSS, disease duration, age, sex, T2LV, BPF ND
Law 2019 RF Disability (EDSS) Confirmed disability progression sustained for 6 months: EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), ≥ 0.5 (≥ 6) 2 years T25FW, 9HPT, PASAT, EDSS, disease duration, age, sex, T2LV, BPF ND
Lejeune 2021 Disability (EDSS) Residual disability after relapse (EDSS increase ≥ 1) 6 months Increased EDSS during relapse, pre‐relapse EDSS at 0, age, proprioceptive ataxia, subjective sensory disorder, disease duration Web app at https://shiny.idbc.fr/SMILE/
Malpas 2020 Disability (EDSS) Aggressive MS (EDSS ≥ 6 reached within 10 years of symptom onset, sustained over ≥ 6 months, and sustained until end of follow‐up) 10 years, time from onset to aggressive disease mean (SD, range): 6.05 years (2.79 years, 0 years to 9.89 years) Onset age, median EDSS in first year, pyramidal signs Relative risk of aggressive disease by number of positive signs in simplified model (dichotomised based on individual optimal thresholds)
Mandrioli 2008 Disability (EDSS) Severe MS (EDSS ≥ 4 by 10 years disease duration, EDSS progression confirmed in 2 consecutive examinations) Follow‐up from onset mean (SD): BMS 16.03 years (0.92 years), SMS 13.62 years (0.80 years), time between unclear CSF IgM OB presence, motor symptoms at onset, sensory symptoms at onset, time to second relapse in months Full regression model
Margaritella 2012 Disability (EDSS) EDSS 1 year after included mEPS and EDSS predictors EDSS, mEPS, age at onset, gender, benign course, PP course Full regression model
Montolio 2021 Disability (EDSS) Worsening (EDSS increase ≥ 1) 10 years (8 years) from baseline (last predictors) Disease duration, relapse in preceding year, EDSS, temporal RNFL thickness, superior RNFL thickness List of selected predictors
Oprea 2020 Disability (EDSS) Keeping EDSS score ≤ threshold (chosen threshold: 2.5) ND Gender, age at diagnosis, age, EDSS at onset, disease duration, number of treatments ND
Pinto 2020 severity 10 years Disability (EDSS) Severe disease (EDSS > 3 based on the mean EDSS from all clinical visits in prediction horizon year) 10 years from baseline (with predictors at 5 years from baseline) ND ND
Pinto 2020 severity 6 years Disability (EDSS) Severe disease (EDSS > 3 based on the mean EDSS from all clinical visits in prediction horizon year) 6 years from baseline (with predictors at 2 years from baseline) ND ND
Roca 2020 Disability (EDSS) EDSS 2 years from the initial imaging Unstructured: FLAIR images, lesion masks from white matter hyperintensities segmentation from FLAIR images, structured: 60 tracts of interest from the ICBM‐DTI 81 white matter labels and sensorimotor tracts atlases in MNI space, whole‐brain lesion load, volume of the lateral ventricles, age, gender, 3D/2D nature of FLAIR sequence ND
Rocca 2017 Disability (EDSS) EDSS change from baseline confirmed after 3 months 15 years, median (IQR): 15.1 years (13.9 years to 15.4 years) Baseline EDSS, 15‐month EDSS change, 15‐month new T1 hypointense lesions, percentage brain volume change, baseline grey matter mean diffusivity Regression coefficients without the intercept
Rovaris 2006 Disability (EDSS) Clinical worsening confirmed after 3 months (EDSS increase (for baseline EDSS) ≥ 1.0 (< 6.0), ≥ 0.5 (≥ 6.0)) Follow‐up median (range): 56.0 months (35 months to 63 months) Baseline EDSS, grey matter mean diffusivity, follow‐up Regression coefficients without intercept and follow‐up time
Sombekke 2010 Disability (MSSS) MSSS ≥ 2.5 ND Age at onset, male gender, progressive onset type, NOS2 level, PITPNCI level, IL2 level, CCL5 level, ILIRN level, PNMT level Regression coefficients without intercept
Szilasiova 2020 Disability (EDSS) EDSS ≥ 5.0 15 years Sex, age, MS form, EDSS, MS duration, P300 latency (ms) Full regression model
Tommasin 2021 Disability (EDSS) Disability progression (EDSS increase (for baseline EDSS) ≥ 1.5 (0), 1 (≤ 5.5), or 0.5 (> 5.5)) Follow‐up mean (SD, range): 3.93 years (0.95 years, 2 years to 6 years) T2 lesion load, cerebellar volume, thalamic volume, fractional anisotropy of normal appearing WM List of predictors (model selected)
Tousignant 2019 Disability (EDSS) EDSS increase (for baseline EDSS) ≥ 1.5 (0), ≥ 1 (0.5 to 5.5), ≥ 0.5 (≥ 6) sustained for ≥ 12 weeks 1 year MRI channels: volumes from T1‐weighted pre‐contrast, T1‐weighted post‐contrast, T2w, proton density‐weighted, FLAIR; T2 lesion masks; Gadolinium enhanced lesion masks List of predictors (no selection)
Weinshenker 1991 M3 Disability (DSS) Time to reach DSS 6 (EDSS 6.0 or 6.5) Follow‐up for 12 years Age at onset, seen at MS onset, motor (insidious), brainstem, cerebellar, cerebral, pyramidal Full regression model
Weinshenker 1996 short‐term Disability (EDSS) Short‐term progression (change in EDSS) Definition 1 year to 3 years, follow‐up summarised for 2 years Duration, EDSS, progression index, predicted time to DSS 6 from model 1, follow‐up Full regression model
Yperman 2020 Disability (EDSS) Disability progression (EDSS increase (for baseline EDSS) ≥ 1.0 (≤ 5.5), ≥ 0.5 (> 5.5)) Baseline to outcome EDSS median (IQR): 1.98 years (1.84 years to 2.08 years) (similar for baseline MEP) Selected predictors unclear, at least latencies, EDSS at T0, age ND
Zhao 2020 LGBM All Disability (EDSS) Worsening: EDSS increase ≥ 1.5 5 years Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 months, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time ND
Zhao 2020 LGBM Common Disability (EDSS) Worsening: EDSS increase ≥ 1.5 5 years Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 m, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time ND
Zhao 2020 XGB All Disability (EDSS) Worsening: EDSS increase ≥ 1.5 5 years Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 m, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time ND
Zhao 2020 XGB Common Disability (EDSS) Worsening: EDSS increase ≥ 1.5 5 years Unclear if it is the complete list. Common: age, ethnicity, longitudinal features, attack previous 6 m, attack previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time ND
Gurevich 2009 FLP Relapse Time from baseline gene expression analysis until next relapse (3 categories: < 500 days, 500 days to 1264 days, > 1264 days) Unclear reporting FLJ10201, PDCD2, IL24, MEFV, CA2, SLM1, CLCN4, SMARCA1, TRIM22, TGFB2 List of selected genes
Gurevich 2009 FTP Relapse Time from baseline gene expression analysis to next acute relapse (given this time < 500 days) Unclear reporting KIAA1043, LOC51145, PPFIA1, MGC8685, DNCH2, PCOLCE2, FPRL1, G3BP, RHBG List of selected genes
Sormani 2007 Relapse Time to first relapse (≥ 1 neurological symptoms (causing EDSS increase ≥ 0.5 or 1 grade in the score of 2 or more functional systems or 2 grades in 1 functional system) lasting at least 48 hours and preceded by a relatively stable or improving neurological state in the prior 30 days) Follow‐up median (range): 14 months (0.4 months to 16 months), time to outcome from study entry mean (SD): 47 weeks (0.9 weeks) Previous 2 years' relapses, number of enhancing lesions Regression model formula with survival probability for 6 months and 1 year
Vukusic 2004 Relapse Postpartum relapse 3 months after delivery Number of relapses in pre‐pregnancy year, Number of relapses during pregnancy, MS duration Full regression model
Ye 2020 gene signature Relapse Relapse‐free survival Follow‐up mean (SD): 1.97 years (1.3 years) FTH1, GBP2, MYL6, NCOA4, SRP9 Regression coefficients without baseline hazard
Ye 2020 nomogram Relapse Relapse‐free survival Follow‐up mean (SD): 1.97 years (1.3 years) Age, gender, disease type, DMT, risk score Nomogram
Aghdam 2021 Conversion to definite MS McDonald 2010 Follow‐up mean (SD): 5.30 years (2.94 years), no precise time point of measurement Presence of plaque in MRI, history of previous optic neuritis attack, type of optic neuritis, gender Decision tree
Bendfeldt 2019 linear placebo Conversion to definite MS Modified Poser diagnosis confirmed by a central committee (relapse with clinical evidence of at least one CNS lesion (distinct from lesion responsible for CIS presentation if monofocal) or EDSS increase ≥ 1.5 reaching total EDSS ≥ 2.5 and confirmed 3 months later) 2 years Cortical grey matter segmentation masks, age, sex, scanner ND
Bendfeldt 2019 M7 placebo Conversion to definite MS Modified Poser diagnosis confirmed by a central committee (relapse with clinical evidence of at least one CNS lesion (distinct from lesion responsible for CIS presentation if monofocal) or EDSS increase ≥ 1.5 reaching total EDSS ≥ 2.5 and confirmed 3 months later) 2 years Age, sex, EDSS, GM volume ratio, whole brain summaries (total, mean, median, minimum, maximum, SD) of lesion volume, surface area, and mean breadth, unclear if 3 more imaging predictors ND
Bendfeldt 2019 M9 IFN Conversion to definite MS Modified Poser diagnosis confirmed by a central committee (relapse with clinical evidence of at least one CNS lesion (distinct from lesion responsible for CIS presentation if monofocal) or EDSS increase ≥ 1.5 reaching total EDSS ≥ 2.5 and confirmed 3 months later) 2 years Age, sex, EDSS, GM volume ratio, lesion count, whole brain summaries (total, mean, SD) of lesion volume, surface area, and mean breadth, Euler‐Poincare characteristic ND
Borras 2016 Conversion to definite MS Presence of IgG oligoclonal bands and an abnormal brain MRI at baseline (2, 3, or 4 Barkhof criteria) Follow‐up median (SD): CIS 3.25 years (1.32 years), CDMS 4.08 years (2.48 years) CH3L1, CNDP1 Heat maps
Gout 2011 Conversion to definite MS Time to Poser 1983 diagnosis Follow‐up median (range): 3.5 years (1.0 year to 12.7 years), time to outcome in those who experience it median (range) 16.6 months (1.1 months to 112.5 months) Age (≤ 31 years), 3 to 4 positive MR Barkhof criteria, CSF white blood cell Count > 4 per cubic millimetre Sum score: 1 if age at onset ≤ 31 years, 3 if 3 to 4 Barkhof Criteria present, 1 if > 4 white blood cells per cubic millimetre in CSF
Martinelli 2017 Conversion to definite MS Time to Poser 1983 diagnosis Follow‐up median (IQR): 7.3 years (3.5 years to 10.2 years) 2010 DIS criteria fulfilled, 2010 DIT criteria fulfilled, age, T1 lesions, CSF oligoclonal bands (steroid use in 4 weeks prior to study, DMT use during follow‐up) List of selected predictors
Olesen 2019 candidate Conversion to definite MS McDonald 2010 Follow‐up median (range): 29.6 months (19 months to 41 months) IL‐10, NF‐L, CXCL13 Nomogram
Olesen 2019 routine Conversion to definite MS McDonald 2010 Follow‐up median (range): 29.6 months (19 months to 41 months) OCB, leukocytes, IgG index Nomogram
Runia 2014 Conversion to definite MS Time from start of first symptoms to CDMS (Poser 1983) Unclear DIS + DIT2010, corpus callosum lesions, oligoclonal bands, fatigue, abnormal MRI Unweighted sum score from 0 to 5
Spelman 2017 Conversion to definite MS Time to first relapse following CIS (Poser 1983) Follow‐up median (IQR): 1.92 years (0.90 years to 3.71 years) Sex, age, EDSS, first symptom location, T2 infratentorial lesions, T2 periventricular lesions, OCB in CSF Nomogram for 1‐year outcomes (nomograms for 6‐month, 2, 3, 4, and 5‐year outcomes)
Wottschel 2015 1 year Conversion to definite MS Occurrence of a second clinical attack attributable to demyelination of more than 24 hours in duration and at least 4 weeks from the initial attack 1 year Type of presentation, gender, lesion load List of selected predictors and kernel degree
Wottschel 2015 3 year Conversion to definite MS Occurrence of a second clinical attack attributable to demyelination of more than 24 hours in duration and at least 4 weeks from the initial attack 3 years Lesion count, average lesion PD intensity, average distance of lesions from the centre of the brain, shortest horizontal distance of a lesion from the vertical axis, age, EDSS at onset List of selected predictors and kernel degree
Wottschel 2019 Conversion to definite MS Occurrence of a second clinical episode 1 year Type of CIS, WM lesion load ‐ whole brain, WM lesion load ‐ frontal, WM lesion load ‐ limbic, WM lesion load ‐ temporal, WM lesion load ‐ dGM, WM lesion load ‐ WM, GM ‐ cerebellum, GM ‐ thalamus, GM ‐ frontal operculum, GM ‐ middle cingulate gyrus, GM ‐ precentral gyrus medial segment, GM ‐ posterior cingulate gyrus, GM ‐ praecuneus, GM ‐ parietal operculum, GM ‐ post‐central gyrus, GM ‐ planum polare, GM ‐ subcallosal area, GM ‐ supplementary motor cortex, GM ‐ superior occipital gyrus, cortical thickness ‐ central operculum, cortical thickness ‐ cuneus, cortical thickness ‐ fusiform gyrus, cortical thickness ‐ inferior temporal gyrus, cortical thickness ‐ middle occipital gyrus, cortical thickness ‐ post‐central gyrus medial segment, cortical thickness ‐ occipital pole, cortical thickness ‐ opercular part of the inferior frontal gyrus, cortical thickness ‐ orbital part of the inferior frontal gyrus, cortical thickness ‐ planum temporale, cortical thickness ‐ superior occipital gyrus, volume ‐ whole brain, volume ‐ ventral diencephalon, volume ‐ middle temporal gyrus, volume ‐ supramarginal gyrus, volume ‐ limbic List of selected predictors for peak accuracy when using 2‐fold CV
Yoo 2019 Conversion to definite MS McDonald 2005 2 years Unstructured: MRI mask images, structured: T2w lesion volume, brain parenchymal fraction, diffusely abnormal white matter, gender, initial CIS event cerebrum, initial CIS event optic nerve, initial CIS event cerebellum, initial CIS event brainstem, initial CIS event spinal cord, EDSS, CIS monofocal or multifocal type at onset ND
Zakharov 2013 Conversion to definite MS Development of CDMS (second attack) Follow‐up for 8 years Age at disease onset, size of the foci of demyelination ND
Zhang 2019 Conversion to definite MS Demonstration of dissemination in Time by clinical relapse or new MRI lesions 3 years Total lesion number, total lesion volume, minimum, maximum, mean, SD for surface area, sphericity, surface‐volume‐ratio, and volume of individual lesions ND
Bergamaschi 2001 BREMS Conversion to progressive MS Time to earliest date of observation of progressive worsening (EDSS increase ≥ 1) persistent for ≥ 6 months Follow‐up mean (SD, range): 7.5 years (5.7 years, 3 years to 25 years) Age at onset, female, sphincter onset, pure motor onset, motor‐sensory onset, sequelae after onset, number of involved FS at onset, number of sphincter plus motor relapses, EDSS ≥ 4 outside relapse Regression model without baseline hazard or proneness to failure
Brichetto 2020 Conversion to progressive MS ND Unclear reporting ABILHAND item 12, ABILHAND total, HADS item 7, HADS sub1, HADS sub2, HADS total, Life Satisfaction Index total, MFIS item 2, MFIS sub1, MFIS sub2, MFIS sub3, MFIS total, Overactive Bladder Questionnaire item 1, Overactive Bladder Questionnaire item 4, Overactive Bladder Questionnaire total, Functional Independence Measure item 10, Functional Independence Measure item 11, Functional Independence Measure item 12, Functional Independence Measure item 14, Functional Independence Measure sub3, Functional Independence Measure sub4, Functional Independence Measure sub5, Functional Independence Measure sub6, Functional Independence Measure total, Montreal Cognitive Assessment item 1, Montreal Cognitive Assessment item 9, Montreal Cognitive Assessment tot1, Montreal Cognitive Assessment tot2, PASAT, SDMT, years of education, height, weight List of selected predictors
Calabrese 2013 Conversion to progressive MS EDSS increase ≥ 1.0 EDSS not related to a relapse and confirmed at 6 months Up to 5 years, time to outcome median (range): 52 months (29 months to 64 months) Age, cortical lesion volume, cerebellar cortical volume Full regression model
Manouchehrinia 2019 Conversion to progressive MS Time to earliest recognised date of SPMS onset determined by neurologist at routine visit Follow‐up mean (SD): 12.5 years (8.7 years) Calendar year of birth, male sex, onset age, first‐recorded EDSS score, age at the first‐recorded EDSS score Nomograms for calculating probabilities of 10, 15, and 20 year risk (web app at https://aliman.shinyapps.io/SPMSnom/)
Misicka 2020 10 years Conversion to progressive MS Time from participant‐reported age of RRMS onset to participant‐reported age of SPMS onset Up to 10 years Age of MS onset, male sex, time to second relapse, cancer, brainstem/bulbar, HLA‐A*02:01 0.60 Nomogram
Misicka 2020 20 years Conversion to progressive MS Time from participant‐reported age of RRMS onset to participant‐reported age of SPMS onset Up to 20 years Age of MS onset, male sex, time to second relapse, obesity, neurological disorders, HLA‐A*02:01 0.56 Nomogram
Misicka 2020 ever Conversion to progressive MS Time from participant‐reported age of RRMS onset to participant‐reported age of SPMS onset ND Age of MS onset, male sex, time to second relapse, neurological disorders, spasticity, HLA‐A*02:01 Nomogram
Pinto 2020 SP Conversion to progressive MS SPMS diagnosis by clinician Unclear (with predictors at 2 years from baseline) ND ND
Pisani 2021 Conversion to progressive MS Time to occurrence of continuous disability accumulation independent of relapses confirmed 12 months later (transitory plateaus in the progressive course were allowed) Follow‐up mean (range): 9.55 years (6.8 years to 13.13 years) At onset: cortical lesion number, age, EDSS, white matter lesion number; difference (between 0 years and 2 years): global cortical thickness, cerebellar cortical volume, new cortical lesion number Combination of heat map value for 2 predictors plus other predictor values weighted by their minimal depth
Seccia 2020 180 days Conversion to progressive MS Assessed by treating clinician 180 days from the index visit Longitudinal trajectories of: age at onset, gender, age at visit, EDSS, number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs ND
Seccia 2020 360 days Conversion to progressive MS Assessed by treating clinician 360 days from the index visit Longitudinal trajectories of: age at onset, gender, age at visit, EDSS, Number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs ND
Seccia 2020 720 days Conversion to progressive MS Assessed by treating clinician 720 days from the index visit Longitudinal trajectories of: age at onset, gender, age at visit, EDSS, number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs ND
Skoog 2014 Conversion to progressive MS Time from RRMS onset to retrospectively‐determined continuous progression for at least 1 year without remission Time to outcome median (range): 11.5 years (0.7 years to 56.7 years) Age, attack grade, time since last relapse (interaction with attack grade) Web app at http://msprediction.com
Tacchella 2018 180 days Conversion to progressive MS Gradual worsening of RRMS course determined by change in EDSS independent of relapses over a period of at least 6 or 12 months 180 days after visit of interest Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, T25FW, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score ND
Tacchella 2018 360 days Conversion to progressive MS Gradual worsening of RRMS course determined by change in EDSS independent of relapses over a period of at least 6 or 12 months 360 days after visit of interest Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, T25FW, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score ND
Tacchella 2018 720 days Conversion to progressive MS Gradual worsening of RRMS course determined by change in EDSS independent of relapses over a period of at least 6 or 12 months 720 days after visit of interest Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, T25FW, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score ND
Vasconcelos 2020 Conversion to progressive MS Time until confirmed progressive and sustained worsening for at least 6 months (irreversible EDSS increase (for EDSS baseline) ≥ 1.0 (≤ 5.5), 0.5 point (> 5.5) independent of relapse) Time to outcome mean (SD): 13.70 years (8.88 years) Pyramidal and cerebellar impairment at onset of the disease, treatment before EDSS 3, age at disease onset, African descent, time between first and second relapses (unclear if the coefficient for 'recovery' is needed or the model fit without recovery is presented) Unweighted sum score from 0 to 5 (unclear if based on refit minus 'recovery,' found to be insignificant at multivariable analysis)
Ahuja 2021 Composite (relapse) Clinical/radiological relapse (radiological relapse: new T1‐enhancing lesion or new/enlarging T2‐FLAIR hyperintense lesion on brain, orbit, or spinal cord MRI) 1 year Step 1 model: 111 features (12 Current Procedural Terminology (CPT) codes, 60 CUIs derived from free‐text, and 35 PheCodes from ICD data, age, sex, race, disease duration), step 2 model: age, disease duration, relapse history estimated by model 1 Regression coefficients without intercept
de Groot 2009 cognitive Composite (cognitive ‐ see definition) Cognitive impairments: score of mean – SD for 1 or more subtests of a cognitive screening test (subscales of consistent long‐term retrieval and long‐term storage of the selective reminding test, 10/36 spatial recall test, symbol digit modalities test, PASAT, and word list generation) 3 years Age, gender, how well can you concentrate?, T2‐weighted supratentorial lesion load Score chart
Kosa 2022 Composite (see definition) MS‐DSS (model output based on measured CombiWISE (EDSS, SNRS, T25FW, NDH‐9HPT), therapy adjusted CombiWISE, COMRIS‐CTD (lesion/atrophy measures), time from disease onset to first therapy, difference between adjusted and unadjusted CombiWISE, age, and family history of MS) Follow‐up mean: 4.3 years SOMAmer ratios, age, sex ND
Pellegrini 2019 Composite (EDSS, T25FW, 9HPT, PASAT, VFT) Time to disability progression (EDSS increase (for EDSS baseline) ≥ 1 (≥ 1), 1.5 (< 1) or 20% worsening on either T25FW or 9HPT or PASAT or 10‐letter worsening on VFT) confirmed at 24 weeks 2 years PASAT, SF‐36 physical component summary, visual function test Regression coefficients without baseline hazard

2D: 2‐dimensional
3D: 3‐dimensional
9HPT: 9‐hole peg test
ABILHAND: interview‐based assessment of a patient‐reported measure of the perceived difficulty in using their hand to perform manual activities
Ada: adaptive boosting
BMS: benign MS
BPF: brain parenchymal fraction
BREMS: Bayesian Risk Estimate for Multiple Sclerosis
CDMS: clinically definite MS
CLCN4: chloride voltage‐gated channel 4
CombiWISE: Combinatorial Weight‐adjusted Disability Score
COMRIS‐CTD: Combinatorial MRI scale of CNS tissue destruction
CPT: current procedural terminology
CSF: cerebrospinal fluid
CUIs: concept unique identifiers
CXCL13: chemokine ligand 13
CV: cross‐validation
dGM: deep grey matter
DIS: dissemination in space
DIT: dissemination in time
DIT2010: dissemination in time according to McDonald 2010 criteria
DMT: disease‐modifying therapy
DSS: Disability Status Scale
DT: decision tree
EDSS: Expanded Disability Status Scale
FLAIR: fluid‐attenuated inversion recovery
FLJ10201: anti‐YEATS2 antibody
FLP: first level predictor
FTP: fine tuning predictor
GD: gadolinium
GM: grey matter
HADS: Hospital Anxiety and Depression Scale
HLA: human leukocyte antigen
ICBM‐DTI: International Consortium for Brain Mapping diffusion tensor imaging (white matter atlas)
ICD: International Classification of Diseases
IFN: interferon
IgG: immunoglobulin G
IL2: interleukin‐2
ILIRN: interleukin‐1 receptor antagonist
IQR: interquartile range
KIAA1043: a gene
LGBM: light gradient boosting machine
MEFV: Mediterranean fever gene
MEP/mEPS: motor evoked potentials
MFIS: Modified Fatigue Impact Scale
MNI: Montreal Neurological Institute
MR: magnetic resonance
MRI: magnetic resonance imaging
MS: multiple sclerosis
MS‐DSS: MS disease severity scale
MTR: magnetisation transfer ratio
ND: no data available
NDH‐9HPT: non‐dominant hand 9‐hole peg test
NEMO: network modification tool
NF‐L: neurofilament light chain
NOS2: nitric oxide synthase 2
OB: oligoclonal bands
OCB: oligoclonal bands
PASAT: Paced Auditory Serial Addition Test
PD: proton density
PDCD2: human programmed cell death protein 2
PITPNCI: phosphatidylinositol transfer protein
PNMT: phenylethanolamine‐N‐methyltransferase gene
PP: primary progressive
PPMS: primary progressive MS
RF: random forest
RNFL: retinal nerve fibre layer
RRMS: relapsing‐remitting multiple sclerosis
SD: standard deviation
SDMT: symbol digit modalities test
SF‐36: 36‐Item Short Form Health Survey
SLM1: a gene
SMARCA1: SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 1
SMS: severe multiple sclerosis
SNRS: Scripps neurologic rating scale
SOMAmer: slow off‐rate modified aptamer (a short, single‐stranded deoxyoligonucleotide)
SP: secondary progression
SPMS: secondary progressive MS
T25FW: timed 25‐foot walk
T2LV: T2 lesion volume
T2w: T2‐weighted
TGFB2: transforming growth factor beta 2
TRIM22: tripartite motif containing 22
VFT: visual function test
WM: white matter
XGB: extreme gradient boosting
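Several of the final models in the table above are presented as score charts or unweighted sum scores rather than full regression equations. As an illustration of how such a presentation is applied, the sketch below encodes the sum score described for Gout 2011 (1 point if age at onset ≤ 31 years, 3 points if 3 to 4 Barkhof criteria are present, 1 point if > 4 white blood cells per cubic millimetre in CSF). The function and argument names are illustrative assumptions; the score and any risk thresholds should be taken from the original publication before use.

```python
def gout_2011_sum_score(age_at_onset: float,
                        barkhof_criteria_met: int,
                        csf_wbc_per_mm3: float) -> int:
    """Illustrative re-implementation of the sum score described for
    Gout 2011 in the table above (possible range 0 to 5); higher scores
    indicate a higher risk of conversion to definite MS."""
    score = 0
    if age_at_onset <= 31:
        score += 1
    if barkhof_criteria_met in (3, 4):
        score += 3
    if csf_wbc_per_mm3 > 4:
        score += 1
    return score

# Example: a 28-year-old with 3 Barkhof criteria and 6 white blood cells per mm3
print(gout_2011_sum_score(28, 3, 6))  # prints 5
```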

Appendix 6. Additional figures

Figure 8. Tables of predictor domains included or considered in included models. Top: models developed with machine learning; bottom: models developed with traditional methods. CSF: cerebrospinal fluid, Dev: development, ND: no data available

Figure 9. Risk of bias assessments per analysis grouped by machine learning developments, traditional statistics developments, and all validations. Dev: development, ML: machine learning, Val: validation, y: year(s), d: days

Figure 10. Percent of study items reported over time by analysis type. Data for the year 2021 are incomplete (only until July). ML: machine learning.

Figure 11. Tables of TRIPOD items in included analyses. Top: models developed with machine learning; middle: models developed with traditional methods; bottom: model validations. The white box indicates that item 13c, which pertains to external validations, was not applicable for the Skoog 2019 Val analysis, which used the development participants. Dev: development, ML: machine learning, Val: validation, y: year(s), d: days

Characteristics of studies

Characteristics of included studies [ordered by study ID]

Aghdam 2021.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, secondary
Study type
Development
Participants Inclusion criteria
  • Optic neuritis patients with at least 2 of the following: pain with eye movements, decreased best corrected visual acuity, relative afferent pupillary defect, prolonged P100 latency of visual evoked potentials

  • At least 3 years' follow‐up


Exclusion criteria
  • Optic neuropathies proven to be the result of other aetiologies such as compressive, infiltrative, toxic, hereditary, and metabolic

  • Presence of any other ocular condition that could confound data gathering like diabetic retinopathy and retinal dystrophies

  • Failure to follow up patients during the study period

  • Any finding suggestive of the presence of underlying vasculitis or infectious cause for optic neuritis

  • Patients who underwent any preemptive treatment such as with interferon after ON

  • Missing data in medical records that could affect the result of this study

  • NMOSD diagnosis during follow‐up


Recruitment
Patients admitted to the ophthalmology and neurology departments of Rassoul Akram Hospital, a tertiary referral centre in Tehran, Iran
Age (years)
Mean 40.0 (at ON)
Sex (%F)
74.1
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
McDonald 2010 (Polman 2011)
Treatment
  • At recruitment, 0%

  • During follow‐up, not reported


Disease description
History of ON: 11.9%
Recruitment period
2008 to 2018
Predictors Considered predictors
Age (as continuous and/or dichotomised), gender, season of attack (spring vs other), best corrected visual acuity (as continuous or dichotomised as logMAR ≤ or > 1), optic disc swelling (type of ON), ocular pain, ON history, plaque positive (white matter lesions ≥ 3 mm in diameter in juxtacortical, periventricular, infratentorial or spinal cord regions), dissemination in space (hyperintense T2 lesions in ≥ 2 of juxtacortical, periventricular, infratentorial or spinal cord regions), treatment with prednisolone
Number of considered predictors
≥ 10
Timing of predictor measurement
At presentation due to ON
Predictor handling
Unclear, all might be dichotomised and/or continuous
Outcome Outcome definition
Conversion to definite MS (Polman 2011): CDMS based on 2010 revised McDonald criteria
Timing of outcome measurement
Follow‐up mean (SD): 5.30 years (2.94 years), no precise time point of measurement
Missing data Number of participants with any missing value
60
Missing data handling
Exclusion
Analysis Number of participants (number of events)
277 (117)
Modelling method
Classification tree
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, full model approach


Hyperparameter tuning
Not reported
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Random split: 70% training, 30% test
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 0.74, sensitivity = 0.71, specificity = 0.76, PPV = 0.65, NPV = 0.79
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Decision tree
Number of predictors in the model
4
Predictors in the model
Presence of plaque in MRI, history of previous optic neuritis attack, type of optic neuritis, gender
Effect measure estimates
Tree given
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To evaluate the predisposing factors of conversion to MS in the Iranian population with ON to organise a decision tree for predicting the probability of conversion to MS
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
A planned study, survival analysis, incorporation of other factors such as CSF components or serum vitamin D level, addition of visual outcomes, and use of the McDonald 2017 criteria (Thompson 2018b)
Notes Applicability overall
Low
Applicability overall rationale
Study authors confirmed that participants who had already experienced the outcome at baseline were not included in the development set.
 
Item Authors' judgement Support for judgement
Participants Unclear Participants were excluded for having missing data.
Predictors Yes The predictors were collected by fellows and are sufficiently standardised.
Outcome No Although some predictors could have been known while assessing parts of the outcome, we consider the outcome to be robust to such information. However, a specific time point for outcome assessment was not used and its variability is high. Also, dissemination in time and space are amongst the predictors, which form part of the outcome definition.
Analysis No The EPV was around 10 for the entire dataset. Predictors were dichotomised and selected prior to multivariable modelling. The differing outcome time was not addressed. Neither discrimination nor calibration was assessed. A random split was used for validation.
Overall No At least one domain is at high risk of bias.
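The analysis items above describe a classification tree evaluated on a 70%/30% random split and summarised only with classification metrics (accuracy, sensitivity, specificity, PPV, NPV), without discrimination or calibration. A minimal sketch of this type of workflow is given below using scikit‐learn and simulated placeholder data; it is not the study authors' code, and the simulated predictors and outcome are assumptions used only to make the example runnable.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
# Placeholder data standing in for dichotomised clinical predictors and a
# binary "conversion to definite MS" outcome (277 participants, 4 predictors).
X = rng.integers(0, 2, size=(277, 4))
y = rng.integers(0, 2, size=277)

# 70% training, 30% test random split, as described in the analysis section.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tn, fp, fn, tp = confusion_matrix(y_test, tree.predict(X_test)).ravel()

print("accuracy   ", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity", tp / (tp + fn))
print("specificity", tn / (tn + fp))
print("PPV        ", tp / (tp + fp))
print("NPV        ", tn / (tn + fn))
```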

Agosta 2006.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • Definite MS for at least 2 years, either RR or SP, or CIS within 3 months with paraclinical evidence of spatial disease dissemination (at least 4 focal abnormalities on T2 scans)

  • Participated in the short‐term follow‐up study by Rovaris (2003)

  • No immunosuppressive or immunomodulating treatments for at least 12 months prior to entry

  • No relapses or steroidal treatment during the 3 months preceding both baseline and follow‐up scans


Exclusion criteria
Not reported
Recruitment
All participants had been recruited previously by Filippi (2000) and followed up in the short term by Rovaris (2003) at the University Hospital San Raffaele in Milan, Italy (unclear; based on Ethics Committee approval)
Age (years)
Mean 33.5
Sex (%F)
69.9
Disease duration (years)
Range: 0 to 25
Diagnosis
27.4% CIS, 46.6% RRMS, 26.0% SPMS
Diagnostic criteria
Mixed: Poser 1983, Lublin 1996
Treatment
  • At recruitment, 0%

  • During follow‐up, 54.8% (no specific treatment details)


Disease description
EDSS median (range): CIS 0.0 (0.0 to 1.5), RRMS 2.5 (1.0 to 5.5), SPMS 5.5 (3.5 to 6.5)
Recruitment period
Not reported
Predictors Considered predictors
Age, disease duration, clinical phenotype (CIS+RR vs SP), baseline EDSS, baseline T2 lesion volume, baseline T1 lesion volume, baseline brain parenchymal fraction, baseline grey matter fraction, baseline white matter fraction, baseline average whole‐brain magnetisation transfer ratio, baseline average grey matter magnetisation transfer ratio, baseline average normal‐appearing white matter magnetisation transfer ratio, baseline average lesion magnetisation ratio, baseline whole‐brain magnetisation transfer ratio histogram peak height, baseline grey matter histogram peak height, baseline normal‐appearing white matter histogram peak height, brain parenchymal fraction percentage change, grey matter fraction percentage change, white matter fraction percentage change, average whole‐brain magnetisation transfer ratio percentage change, average grey matter magnetisation transfer ratio percentage change, average normal‐appearing white matter magnetisation transfer ratio percentage change, average lesion magnetisation transfer ratio percentage change, (unclear adjustment for follow‐up duration)
Number of considered predictors
26
Timing of predictor measurement
At study baseline (cohort entry at least 2 years after diagnosis of definite MS or 3 months after CIS), 12 months (± 10 days) after baseline, at final follow‐up (outcome measurement)
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): clinically worsened defined as an EDSS score increase ≥ 1.0, when baseline EDSS was < 6.0, or an EDSS score increase ≥ 0.5 when baseline EDSS was ≥ 6.0; EDSS changes had to be confirmed by a second visit after a 3‐month, relapse‐free period
Timing of outcome measurement
Follow‐up median: 8 years, mean: 7.7 years
Missing data Number of participants with any missing value
6, only missing outcome reported
Missing data handling
Mixed: last value carried forward, complete case
Analysis Number of participants (number of events)
70 (44)
Modelling method
Logistic regression
Predictor selection method
  • Univariable analysis

  • Mentions a final multivariable model but no selection at multivariable stage


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
No
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation: LOOCV
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 46/70, sensitivity = 30/41, specificity = 16/29
Overall performance
Nagelkerke's R2 = 0.28
Risk groups
Not reported
Model  Model presentation
Regression coefficients without the intercept or coefficient for "adjustment for follow‐up duration"
Number of predictors in the model
2 or 3 (unclear if follow‐up duration included)
Predictors in the model
Baseline GM histogram peak height, average lesion MTR percentage change after 12 months, follow‐up duration
Effect measure
OR (95% CI): baseline GM histogram peak height 0.97 (0.94 to 0.99), average lesion MTR percentage change after 12 months 0.88 (0.80 to 0.98), follow‐up duration (not reported)
Predictor influence
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To assess the value of MT MRI quantities and their short‐term changes in predicting the long‐term accumulation of disability in multiple sclerosis patients
Primary aim
The primary aim of this study is not the prediction of individual outcomes. Rather, the focus is on the usefulness of MRI measures.
Model interpretation
Exploratory
Suggested improvements
Not reported
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of new MRI measures.
Auxiliary references
Filippi M, Inglese M, Rovaris M, Sormani MP, Horsfield P, Iannucci PG, et al. Magnetization transfer imaging to monitor the evolution of MS: a 1‐year follow‐up study. Neurology 2000;55(7):940‐6.
Lechner‐Scott J, Kappos L, Hofman M, Polman CH, Ronner H, Montalban X, et al. Can the expanded disability status scale be assessed by telephone? Mult Scler 2003;9(2):154‐9.
Rovaris M, Agosta F, Sormani MP, Inglese M, Martinelli V, Comi G, et al. Conventional and magnetisation transfer MRI predictors of clinical multiple sclerosis evolution: a medium‐term follow‐up study. Brain 2003;126(Pt 10):2323‐32.
 
Item Authors' judgement Support for judgement
Participants Yes Patients were included probably from a prospectively designed cohort study with clear eligibility criteria.
Predictors Yes Even though there is no clear indication that predictors were collected in the same way across patients, the authors described deviations in how the outcome was assessed, suggesting they would also have described any differences in how the predictor variables were assessed.
Outcome Yes Although EDSS was determined differently either in person or by phone, EDSS assessment by phone has been shown to be valid. It is unclear if the outcome assessment was blinded to predictors, but we consider EDSS to be an objective measure.
Analysis No The EPV was far less than 10. Predictors were included based on univariable analyses. Calibration and discrimination were not assessed. Although cross‐validation was used, it is unclear whether the variable selection process was included within this procedure. It is unclear if all participants were analysed and how missing data were handled. Follow‐up time was added as a predictor instead of using methods to deal with different observation times.
Overall No At least one domain is at high risk of bias.
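The performance evaluation described above is leave‐one‐out cross‐validation of a logistic regression model, summarised as counts of correctly classified participants. The sketch below illustrates the mechanics on simulated placeholder data (70 participants, 44 events, two continuous predictors); it is not the study authors' analysis, it refits the model within every fold, and it omits variable selection entirely, which is the step the risk of bias assessment notes as unclear in the original procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
# Placeholder continuous predictors (standing in for baseline MRI quantities)
# and a binary worsening outcome for 70 participants with 44 events.
X = rng.normal(size=(70, 2))
y = np.array([1] * 44 + [0] * 26)

# Leave-one-out cross-validation: the model is refit 70 times, each time
# predicting the single held-out participant.
pred = cross_val_predict(LogisticRegression(), X, y, cv=LeaveOneOut())

print("accuracy   ", (pred == y).mean())
print("sensitivity", pred[y == 1].mean())
print("specificity", 1 - pred[y == 0].mean())
```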

Ahuja 2021.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
  • Dev: mixed (routine care‐electronic health records, cohort), secondary

  • Ext Val: routine care‐electronic health records, secondary


Study type
Development + external validation (spectrum)
Participants Inclusion criteria
  • Dev

    • ≥ 18 years of age

    • Neurologist‐confirmed MS diagnosis

    • Linked EHR data

  • Ext Val

    • MS

    • Neurological care at Mass General Brigham

    • Annotated relapse events


Exclusion criteria
  • Dev: not reported

  • Ext Val: not part of the training set


Recruitment
  • Dev: patients in the Comprehensive Longitudinal Investigation of Multiple Sclerosis at Brigham and Women's Hospital (CLIMB) cohort of Brigham Multiple Sclerosis Centre, USA

  • Ext Val: random selection from electronic health records of the Mass General Brigham Healthcare system, USA


Age (years)
Median 43.3 (at first MS code)
Sex (%F)
  • Dev: 73.9

  • Ext Val: 74.2


Disease duration (years)
  • Dev: median 5.1 (IQR: 2.03)

  • Ext Val: median 4.4 (IQR: 2.82)


Diagnosis
Approximately 70% to 80% RRMS, 10% PPMS, 10% to 20% SPMS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, not reported

  • During follow‐up, unclear; across all patients in the study, 55% were on treatment


Disease description
Not reported
Recruitment period
2006 to 2016
Predictors Considered predictors
  • Dev: age, sex, race, disease duration, relapse history (RH), unclear number of EHR predictors (ICD consolidated into PheCode, CPT consolidated into groupings, unclear concepts retrieved from free‐text)

  • Ext Val: not applicable


Number of considered predictors
  • Dev: 2730

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: from 1 year before the index encounter up to the index encounter (unspecified)

  • Ext Val: not applicable


Predictor handling
  • Dev: continuously (log(n + 1) transformation of counts)

  • Ext Val: not applicable

Outcome Outcome definition
Composite (relapse): a relapse event as a clinical and/or radiological relapse; clinical relapse: new or recurrence of neurological symptoms lasting persistently for ≥ 24 h without fever or infection; radiological relapse: either a new T1‐enhancing lesion and/or a new or enlarging T2‐FLAIR hyperintense lesion on brain, orbit, or spinal cord MRI on clinical radiology report.
Timing of outcome measurement
1 year
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
  • Dev: 1435 participants, unit of analysis is visits and its number is not reported (not reported)

  • Ext Val: 186 participants, unit of analysis is visits and its number is not reported (not reported)


Modelling method
  • Dev: logistic regression, LASSO

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, univariable analysis

    • During multivariable modelling, mixed

      • Several models

      • Modelling method

  • Ext Val: not applicable


Hyperparameter tuning
  • Dev: model step 1 only: 10‐fold CV for lambda to maximise the Spearman correlation between observed and predicted relapse count

  • Ext Val: not applicable


Shrinkage of predictor weights
  • Dev: modelling method

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: none

  • Ext Val: external validation


Performance evaluation method
Not applicable
Calibration estimate
  • Dev: not reported

  • Ext Val: plots comparing observed and predicted relapse proportions stratified by disease duration and age separately


Discrimination estimate
  • Dev: not reported

  • Ext Val: c‐statistic = 0.707 (95% CI 0.69 to 0.71)


Classification estimate
Sensitivity = 0.499, specificity = 0.719, PPV = 0.223, NPV = 0.900, F1 = 0.307
Overall performance
Not reported
Risk groups
  • Dev: not reported

  • Ext Val: 2 groups defined by a time‐dependent threshold equal to the observed prevalence within ± 1 year of a patient's time since first relapse

Model  Model presentation
  • Dev: regression coefficients without intercept

  • Ext Val: not applicable


Number of predictors in the model
  • Dev: 114 (model 1: 111, model 2: 3)

  • Ext Val: not applicable


Predictors in the model
  • Dev:

    • Step 1 model: 111 features (12 Current Procedural Terminology (CPT) codes, 60 CUIs derived from free‐text, and 35 PheCodes from ICD data, plus age, sex, race, and disease duration)

    • Step 2 model: age, disease duration, relapse history estimated by model 1

  • Ext Val: not applicable


Effect measure estimates
  • Dev:

    • Step 1 model: see Ahuja 2021 supplementary table 1

    • Step 2 model (log OR): age −0.019216, disease duration −0.033818, estimated relapse history 2.924231

  • Ext Val: not applicable


Predictor influence measure
  • Dev: not reported

  • Ext Val: not applicable


Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: none

Interpretation  Aim of the study
To develop and test a clinically deployable model for predicting 1‐year relapse risk in MS patients
Primary aim
The primary aim of this study is the prediction of individual outcomes
Model interpretation
Probably exploratory
Suggested improvements
To incorporate MRI features
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants No EHRs contain data collected for a different purpose; the data are collected without a protocol, and various users enter data into the records. Therefore, the data are extremely heterogeneous.
Predictors Yes
  • Dev: The basic set of predictors includes age, sex, race, and disease duration and the others are consolidated EHR codes. The model seems to be applicable at any visit.

  • Ext Val: The basic set of predictors includes age, sex, race, and disease duration, which should not be problematic. The other predictors are EHR codes, which are expected to be standardised. The model seems to be applicable at any time.

Outcome Yes
  • Dev: We rated this domain for this analysis as having a low risk of bias. Although it is unclear whether the outcome data come from the CLIMB dataset, they are expected to come from the CLIMB cohort, given the cumbersome work of deriving the outcome in the EHR dataset and its reported unreliability there.

  • Ext Val: We rated this domain for this analysis as having a high risk of bias. Relapse history is described as unreliable in the EHR, yet the outcome itself is based on relapse. Also, the definition of the outcome contains a radiological component, without any specification of the measurement method or standardisation between the sites.

Analysis No
  • Dev: The EPV was low. No information about missing data and how it was handled was given. The model was trained on correlated data (overlapping periods from patient visits to outcome assessment), which was addressed only by keeping all observations from a single patient together when creating folds. The uncertainty of the first model step was probably not carried through to the second step of the model. It was unclear if the initial univariable feature selection was performed in the total set or training only.

  • Ext Val: The number of events was unclear, but the relapse rate was reported to be low and only 186 participants were included. No information about missing data and how it was handled was given. It was unclear whether the external validation set contained correlated, even overlapping, data (multiple visits from the same patients), and whether this was accounted for.

Overall No At least one domain is at high risk of bias.

Bejarano 2011.

Study characteristics
General information Model name
  • Dev

  • Val


Primary source
Journal
Data source
Cohort, primary
Study type
Development + validation (model refit), location
Participants Inclusion criteria
  • Short‐medium disease duration (< 10 years)

  • Any disease subtype fulfilling MS criteria (CIS patients had to fulfil the criteria of spatio‐temporal dissemination)

  • No relapses in the month prior to inclusion


Exclusion criteria
  • Conditions that prevent patients from undergoing motor evoked potentials (MEP) or MRI studies

  • EDSS > 7.0


Recruitment
  • Dev: consecutive patients with MS at the University of Navarra, Spain

  • Val: Hospital San Raffaele (Milan), Italy


Age (years)
  • Dev: mean 35.1

  • Val: mean 37.0


Sex (%F)
  • Dev: 64.7

  • Val: 66.7


Disease duration (years)
  • Dev: mean 5.9 (SD: 7.4)

  • Val: mean 9 (SD: 6)


Diagnosis
  • Dev: 31.4% CIS, 51.0% RRMS, 5.9% SPMS, 7.8% PPMS, 3.9% PRMS

  • Val: 88.5% RRMS, 11.5% SPMS


Diagnostic criteria
McDonald 2005 (Polman 2005)
Treatment
  • Dev:

    • At recruitment, 54.9% on DMT

    • During follow‐up, not reported

  • Val: not reported


Disease description
  • Dev: EDSS median (range): 2.0 (0 to 6), number of relapses in previous 2 years mean (SD): 1.29 (1.51)

  • Val: EDSS median (range): 1.5 (0 to 6.5)


Recruitment period
Not reported
Predictors Considered predictors
  • Dev: disease subtype, sex, age, EDSS at study entry, motor function score of EDSS (MF), Multiple Sclerosis Functional Composite (MSFC), motor scores of MSFC (TWT and NHPT), use of disease‐modifying therapies, total lesion volume on T1 (unclear T2), gadolinium‐enhancing T1, GM and WM volumes, central motor conduction time (CMCT), Motor Evoked Potential (MEP) score, aggregated MEP score, worst Z score from the 4 limbs, abnormal MEP

  • Val: not applicable


Number of considered predictors
  • Dev: 22 or 23 (unclear transformation)

  • Val: not applicable


Timing of predictor measurement
  • Dev: at study baseline (cohort entry)

  • Val: not applicable


Predictor handling
  • Dev: MEP amplitude and latency dichotomised

  • Val: not applicable

Outcome Outcome definition
Disability (EDSS): change in EDSS as numeric delta
Timing of outcome measurement
At 2 years
Missing data Number of participants with any missing value
  • Dev: ≥ 8

  • Val: not reported


Missing data handling
  • Dev: single imputation (by worst latency) of missing MEP data, last value carried forward for outcomes

  • Val: not reported

Analysis Number of participants (number of events)
  • Dev: 51 (continuous outcome)

  • Val: 96 (continuous outcome)


Modelling method
  • Dev: neural network, multilayer perceptron (see the sketch below)

  • Val: not applicable
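
A multilayer perceptron for a continuous outcome such as 2‐year EDSS change, evaluated with repeated 10‐fold cross‐validation as reported further down in this table, could be sketched as below. The scikit‐learn estimator, the network size, and the toy data are illustrative assumptions, not the original implementation (which reportedly also used early stopping and wrapper‐based feature selection).

    # Illustrative sketch: multilayer perceptron regressor for a continuous EDSS-change outcome,
    # evaluated with 10 repeats of 10-fold cross-validation. Not the original implementation.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import RepeatedKFold, cross_val_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(51, 5))                      # 51 participants, 5 predictors (toy values)
    y = rng.normal(loc=0.5, scale=1.0, size=51)       # hypothetical 2-year change in EDSS

    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(8,), early_stopping=True, max_iter=2000, random_state=1),
    )
    cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=1)
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print("mean cross-validated MSE:", round(-scores.mean(), 3))
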


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, stepwise selection

      • Direction unclear, accuracy as criterion

  • Val: not applicable


Hyperparameter tuning
  • Dev: early stopping (minimum MSE in validation set) mentioned

  • Val: not applicable


Shrinkage of predictor weights
  • Dev: modelling method

  • Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Val: external validation


Performance evaluation method
  • Dev: cross‐validation, 10 times 10‐fold

  • Val: model refit to new data


Calibration estimate
Not reported
Discrimination estimate
  • Dev: unclear how the c‐statistic was produced for a numeric outcome; c‐statistic = 0.76 (SD 0.25)

  • Val: not applicable


Classification estimate
  • Dev: unclear how the classification measures were produced for a numeric outcome; accuracy = 0.80 (SD 0.14), sensitivity = 0.92, specificity = 0.61, PPV = 0.80, NPV = 0.80

  • Val: unclear how the classification measures were produced for a numeric outcome; accuracy = 0.81


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • Dev: not reported

  • Val: not applicable


Number of predictors in the model
  • Dev: 5

  • Val: not applicable


Predictors in the model
  • Dev: age, worst central motor conduction time of both arms, worst central motor conduction time of both legs, at least 1 abnormal MEP, motor score of EDSS at baseline

  • Val: not applicable


Effect measure estimates
  • Dev: not reported

  • Val: not applicable


Predictor influence measure
  • Dev: clinical predictors of the test MS cohort selected by the Wrapper algorithm

  • Val: not applicable


Validation model update or adjustment
  • Dev: not applicable

  • Val: model refit

Interpretation  Aim of the study
To evaluate the usefulness of clinical, imaging and neurophysiological variables for predicting short‐term disease outcomes in MS patients
Primary aim
The primary aim of this study is the prediction of individual outcomes
Model interpretation
Exploratory
Suggested improvements
Incorporating GM atrophy or other new MRI metrics
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants Yes A cohort was formed for prediction purposes, and inclusion/exclusion criteria seem appropriate.
Predictors Yes The predictors were collected in a prospective way at a single clinic.
Outcome Yes The outcome is considered objective, so risk of bias from knowledge of predictors is not expected. The outcome was conceptualised as a change in EDSS and treated as a score that can be subtracted although EDSS is an ordinal measure. Yet, the modelling method of neural networks can accommodate interactions amongst baseline predictors, including baseline EDSS.
Analysis No The number of participants was low. Calibration was not assessed.
Overall No At least one domain is at high risk of bias.

Bendfeldt 2019.

Study characteristics
General information Model name
  • M7 placebo

  • M9 IFN

  • Linear placebo


Primary source
Journal
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • CIS patients with a monofocal or multifocal presentation of the disease, a first demyelinating event suggestive of MS, and at least 2 clinically silent lesions of at least 3 mm on a T2‐weighted brain magnetic resonance imaging (MRI) scan, at least 1 of which was ovoid, periventricular, or infratentorial

  • Age between 18 years and 45 years

  • Baseline EDSS 0 to 5


Exclusion criteria
  • Patients with any disease other than MS explaining the signs and symptoms

  • Previous episode that could possibly be attributed to an acute demyelinating event

  • Complete transverse myelitis or bilateral optic neuritis

  • Patients who received prior immunosuppressive therapy


Recruitment
  • M7 placebo and linear placebo: placebo arm participants in the BENEFIT study, a multicentre RCT with a total of 98 centres from 20 countries

  • M9 IFN: IFNb arm participants in the BENEFIT study, the same multicentre RCT with 98 centres from 20 countries

  • Countries: Austria, Belgium, Czech Republic, Denmark, France, Finland, Germany, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom, Israel, Canada


Age (years)
  • M7 placebo: mean 29.7

  • M9 IFN: mean 29.6

  • Linear placebo: mean 30.8


Sex (%F)
  • M7 placebo: 70.5

  • M9 IFN: 65.7

  • Linear placebo: 71.0


Disease duration (years)
Up to 0.16
Diagnosis
100% CIS
Diagnostic criteria
Own definition
Treatment
  • At recruitment, 0%

  • During follow‐up:

    • M7 placebo and linear placebo: 0%

    • M9 IFN: 100% on IFN‐b


Disease description
  • M7 placebo: EDSS median (range): conv‐ 1.0 (0.0 to 2.0), conv+ 1.5 (1.0 to 2.0)

  • M9 IFN: EDSS median (range): conv‐ 2.0 (1.0 to 2.0), conv+ 2.0 (1.0 to 2.5)

  • Linear placebo: EDSS median (range): conv‐ 1.5 (1.0 to 2.0), conv+ 1.5 (1.0 to 2.0)


Recruitment period
2002 to 2005
Predictors Considered predictors
  • M7 placebo, and M9 IFN: age, sex, EDSS, GM volume, GM volume ratio, GM lesion count, Euler‐Poincare characteristic, whole brain summaries (total, mean, standard deviation) of lesion volume, surface area, and mean breadth, GM volume ratio by ROI, ROI summaries (total, mean, standard deviation) of lesion volume, surface area, and mean breadth

  • Linear placebo: cortical grey matter segmentation masks, age, sex, scanner


Number of considered predictors
  • M7 placebo and M9 IFN: 301

  • Linear placebo: unclear; the total number of dimensions was determined by the number of voxels within the cortical GM mask, and a kernel matrix was created from the images based on correlation, i.e. the similarity between each pair of participants


Timing of predictor measurement
At disease onset (CIS) (RCT baseline within 60 days after onset)
Predictor handling
  • M7 placebo and M9 IFN: continuously, transformation of volume (cubic root) and area (square root)

  • Linear placebo: unclear, probably continuously

Outcome Outcome definition
Conversion to definite MS (modified Poser): CDMS diagnosis within 2 years and confirmed by a central committee based on modified Poser; modified Poser defined as 1) a relapse with clinical evidence of at least 1 CNS lesion (distinct from the lesion responsible for the CIS presentation if the first presentation was monofocal), or 2) sustained progression by ≥ 1.5 points on the EDSS reaching a total EDSS score of ≥ 2.5 and confirmed at a consecutive visit 3 months later
Timing of outcome measurement
Median (IQR) in days
  • M7 placebo:

    • conv‐: 1780 (857 to 1805)

    • conv+: 249 (74 to 627)

  • M9 IFN:

    • conv‐: 1807 (1794 to 1820)

    • conv+: 432 (111 to 824)

  • Linear placebo:

    • conv‐: 1798 (1497 to 1808)

    • conv+: 303 (138 to 709)

Missing data Number of participants with any missing value
  • M7 placebo: 115; unclear exactly how many participants had any missing value

  • M9 IFN: 193; unclear exactly how many participants had any missing value

  • Linear placebo: 107; unclear exactly how many participants had any missing value


Missing data handling
Not reported
Analysis Number of participants (number of events)
  • M7 placebo: 61 (22)

  • M9 IFN: 99 (49)

  • Linear placebo: 69 (25)


Modelling method
Support vector machine, radial kernel
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, multiple models


Hyperparameter tuning
  • M7 placebo and M9 IFN: soft margin parameter and RBF parameter chosen based on a grid search during k‐fold CV nested within 10‐fold CV for evaluation.

  • Linear placebo: unclear, tuning parameters not specifically mentioned for linear SVM
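
The nested cross‐validation described for the M7 and M9 models (an inner grid search over the soft‐margin parameter C and the RBF parameter gamma, wrapped in an outer 10‐fold CV used only for evaluation) is a common pattern; a minimal scikit‐learn sketch follows. The parameter grids, toy data, and use of balanced accuracy as the scoring rule are assumptions for illustration rather than the study's exact settings.

    # Sketch of nested CV for an RBF-kernel SVM: inner grid search over C and gamma,
    # outer 10-fold CV reporting balanced accuracy. Illustrative assumptions only.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

    rng = np.random.default_rng(2)
    X = rng.normal(size=(61, 25))          # e.g. 61 participants, 25 imaging/clinical features (toy)
    y = rng.integers(0, 2, size=61)        # hypothetical conversion to CDMS (0/1)

    inner = GridSearchCV(
        make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        param_grid={"svc__C": [0.1, 1, 10, 100], "svc__gamma": [1e-3, 1e-2, 1e-1]},
        scoring="balanced_accuracy",
        cv=StratifiedKFold(n_splits=5),
    )
    scores = cross_val_score(inner, X, y, cv=StratifiedKFold(n_splits=10), scoring="balanced_accuracy")
    print("balanced accuracy: %.3f (SD %.3f)" % (scores.mean(), scores.std()))

Keeping the StandardScaler inside the pipeline confines standardisation to each resampling split, which is the issue raised in the risk‐of‐bias assessment at the end of this entry.
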


Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
  • M7 placebo, and M9 IFN: cross‐validation, nested 10‐fold CV

  • Linear placebo: LOOCV within 500 balanced bootstrap samples


Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • M7 placebo: balanced accuracy = 0.676 (95% CI 0.559 to 0.793)

  • M9 IFN: balanced accuracy = 0.704 (95% CI 0.614 to 0.794)

  • Linear placebo: accuracy = 0.712 (95% CI 0.707 to 0.716), sensitivity = 0.64, specificity = 0.783


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
  • M7 placebo: 25 (df unclear)

  • M9 IFN: 15 (df unclear)

  • Linear placebo: not reported


Predictors in the model
  • M7 placebo: age, sex, EDSS, GM volume ratio, whole brain summaries (total, mean, median, minimum, maximum, standard deviation) of lesion volume, surface area, and mean breadth

  • M9 IFN: age, sex, EDSS, GM volume ratio, whole brain summaries (total, mean, standard deviation) of lesion volume, surface area, and mean breadth, Euler‐Poincare characteristic

  • Linear placebo: cortical grey matter segmentation masks, age, sex, scanner


Effect measure estimates
Not reported
Predictor influence measure
  • M7 placebo and linear placebo: not reported

  • M9 IFN: predictor weights


Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To determine whether pattern classification using SVMs facilitates predicting conversion to clinically definite multiple sclerosis (CDMS) from clinically isolated syndrome (CIS)
Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on identifying advanced MRI features capable of improving prediction.
Model interpretation
Exploratory
Suggested improvements
Feature selection methods; larger, independent test data; ensembles of classifiers; other para‐clinical markers (synthesis of oligoclonal bands and genetic factors); other lesional or degenerative MRI features
Notes Applicability overall
High
Applicability overall rationale
Although this study contained models, the main aim was not to create a model for prediction of individual outcomes but rather to identify advanced imaging features capable of improving prediction.
Auxiliary references
Bakshi R, Dandamudi VS, Neema M, De C, Bermel RA. Measurement of brain and spinal cord atrophy by magnetic resonance imaging as a tool to monitor multiple sclerosis. J Neuroimaging 2005;15(4 Suppl):30s‐45s.
Barkhof F, Polman CH, Radue EW, Kappos L, Freedman MS, Edan G, et al. Magnetic resonance imaging effects of interferon beta‐1b in the BENEFIT study: integrated 2‐year results. Arch Neurol 2007;64(9):1292‐8.
Kappos L, Polman CH, Freedman MS, Edan G, Hartung HP, Miller DH, et al. Treatment with interferon beta‐1b delays conversion to clinically definite and McDonald MS in patients with clinically isolated syndromes. Neurology 2006;67(7):1242‐9.
 
Item Authors' judgement Support for judgement
Participants Yes The data source was RCT, expected to be high in quality and with a clear eligibility assessment. The number of participants reported in the Methods section and described in results did not match, yet no exclusion criteria were reported to explain the difference. Hence, this discrepancy was addressed in the analysis domain.
Predictors Yes The predictors were derived from an RCT, expected to have sufficient standardisation, and there is no reason to believe that the feature extraction/processing was different. Although some of the predictors were created after the outcome was assessed, the predictor creation is automated, and the risk of bias is considered to be low.
Outcome Yes Imaging was assessed at a central location, and the outcome should be robust to the knowledge of demographics and baseline EDSS. The outcome is not common, but it was probably pre‐specified.
Analysis No M7 placebo arm and M9 IFNb arm: The number of patients included in Table 2 was inconsistent with the number of trial participants and what was reported in the Introduction. There was no mention of missing data, making it likely that a complete‐case analysis was performed. The number of observations per variable was low. Accuracy was computed, but not discrimination or calibration. Standardisation was done on the full dataset instead of within the resampling structure. The performance measures were calculated at the same level at which model selection, but not tuning parameter selection, would probably occur. The presentation of a final selected model is unclear.
Linear SVM: The number of patients included in Table 2 was inconsistent with the number of trial participants and what was reported in the Introduction. There was no mention of missing data, making it likely that a complete‐case analysis was performed. The number of observations per variable was low. Accuracy was computed, but not discrimination or calibration. Standardisation was done on the full dataset instead of within the resampling structure. There was no mention of SVM tuning, so performance was probably evaluated in the same data as tuning. The presentation of a final selected model is unclear.
Overall No At least one domain is at high risk of bias.

Bergamaschi 2001.

Study characteristics
General information Model name
BREMS Dev
Primary source
Journal
Data source
Mixed (registry, routine care), secondary
Study type
Development
Participants Inclusion criteria
  • Diagnosis of CDMS (Poser 1983)

  • Initial RR course (Lublin 1996)

  • Disease duration ≥ 3 years

  • Prediagnosis interval (time between symptom onset and first examination at the Institute) ≤ 12 months


Exclusion criteria
Not reported
Recruitment
Patients at the Centre for Multiple Sclerosis of Fondazione C. Mondino (Pavia), the only facility for MS patients in the district, Italy
Age (years)
Mean 28.5
Sex (%F)
62.9
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Mixed: Poser 1983, Lublin 1996
Treatment
  • At recruitment, 5.4% copolymer, 3.7% betaIFN‐1b, 2.2% betaIFN‐1a

  • During follow‐up, 5.4% copolymer, 3.7% betaIFN‐1b, 2.2% betaIFN‐1a


Disease description
Not reported
Recruitment period
Until 1997
Predictors Considered predictors
Unclear whether this is the complete list: gender, age at onset, type of initially involved functional systems (FSs), number of initially involved FSs, whether initial relapse was followed by sequelae, interval between first 2 attacks, pre‐1 year relapse counts by type of involved FSs, maximum neurological score reached in each distinct FS, whether EDSS ≥ 4 during or outside of relapse in first year (intermediate predictors: relapses in each neurological system, FS‐specific impairment scores, EDSS evolution, use of preventive therapies)
Number of considered predictors
> 9
Timing of predictor measurement
At disease onset (RRMS) and regular visits up to 1 year after onset (baseline)
Predictor handling
EDSS dichotomised
Outcome Outcome definition
Conversion to progressive MS: time of onset of secondary progressive phase, defined as the earliest date of observation of a progressive worsening, severe enough to determine an increase of at least 1 point on the EDSS; the worsening had to persist for at least 6 months after the onset of progression in order to be confirmed
Timing of outcome measurement
Follow‐up mean (SD, range): 7.5 years (5.7 years, 3 years to 25 years)/follow‐up visit frequency every 6 months on average, but the frequency depended on the course of the disease: patients with 'active' relapsing disease were followed every 3 months, patients with 'stable' relapsing disease every 6 to 12 months
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
186 (34)
Modelling method
Survival, Bayesian joint survival model using Monte Carlo particle filtering
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, Monte Carlo particle filtering


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
None
Performance evaluation method
Not applicable
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Regression model without baseline hazard or proneness to failure
Number of predictors in the model
9
Predictors in the model
Age at onset, female, sphincter onset, pure motor onset, motor‐sensory onset, sequelae after onset, number of involved FS at onset, number of sphincters plus motor relapses, EDSS ≥ 4 outside relapse
Effect measure estimates
Local relative risks (95% credible interval): age at onset 1.05 (1.02 to 1.09), female 0.39 (0.17 to 0.78), sphincter onset 2.98 (1.10 to 6.10), pure motor onset 2.11 (0.90 to 4.20), motor‐sensory onset 2.4 (1.15 to 4.41), sequelae after onset 1.76 (1.04 to 2.88), number of involved FS at onset 1.39 (1.16 to 1.64), number of sphincter plus motor relapses 2.10 (1.56 to 2.89), EDSS ≥ 4 outside relapse 2.28 (0.40 to 6.50), to be understood as hazard ratios
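
To make these estimates concrete: one generic way a linear prognostic score can be assembled from such effect estimates is as a sum of log relative risks multiplied by the individual's predictor values, as sketched below. This is an illustration of the principle only, not the published BREMS scoring algorithm; the relative risks are copied from the entry above and the example patient is hypothetical.

    # Generic illustration: a linear prognostic score as a sum of log(relative risk) * predictor value.
    # NOT the published BREMS algorithm; relative risks copied from the entry above, patient values hypothetical.
    import math

    relative_risks = {
        "age_at_onset": 1.05,                    # per year
        "female": 0.39,
        "sphincter_onset": 2.98,
        "pure_motor_onset": 2.11,
        "motor_sensory_onset": 2.40,
        "sequelae_after_onset": 1.76,
        "n_involved_fs_at_onset": 1.39,          # per functional system
        "n_sphincter_plus_motor_relapses": 2.10,
        "edss_ge_4_outside_relapse": 2.28,
    }

    def linear_risk_score(patient):
        """Higher scores correspond to a higher modelled hazard of secondary progression."""
        return sum(math.log(rr) * patient.get(name, 0) for name, rr in relative_risks.items())

    hypothetical_patient = {"age_at_onset": 30, "female": 1, "pure_motor_onset": 1,
                            "n_involved_fs_at_onset": 2, "n_sphincter_plus_motor_relapses": 1}
    print(round(linear_risk_score(hypothetical_patient), 2))
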
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
With the aid of a Bayesian statistical model of the natural course of relapsing‐remitting MS, to identify short‐term clinical predictors of long‐term evolution of the disease, with particular focus on predicting onset of the secondary progressive course (the failure event) on the basis of patient information available at an early stage of disease.
Primary aim
The primary aim of this study is not the prediction of individual outcomes. Rather, the focus is on predictor identification.
Model interpretation
Probably confirmatory
Suggested improvements
Not reported
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model, the main aim was not to create a model for prediction of individual outcomes but rather to search for predictors.
 
Item Authors' judgement Support for judgement
Participants No This study used routine care data, which may introduce risk of bias.
Predictors Yes Because the data were collected at clinical visits, the predictors were probably collected without knowledge about the outcome. A group of neurologists saw the participants at the same clinic, so the predictors were probably assessed in a similar way for all patients. The proposed model only used the early predictors, which were collected within 1 year of disease onset.
Outcome No The frequency of visits depended on the disease course and was different for patients with active relapsing disease, every 3 to 6 months, vs patients with stable relapsing disease, every 6 to 12 months, which causes differential assessment of the outcome in different patients.
Analysis No The EPV was low, and the model or its optimism were not evaluated in any way. Many other details, including the method of dealing with missing data and overfitting, were not reported. EDSS was dichotomised.
Overall No At least one domain is at high risk of bias.

Bergamaschi 2007.

Study characteristics
General information Model name
BREMS
Primary source
Journal
Data source
Cohort, secondary
Study type
External validation (initial validation), location
Participants Inclusion criteria
  • Diagnosis of definite MS

  • Initial RR course

  • Disease duration ≥ 10 years

  • Interval from clinical onset to the first neurological examination ≤ 1 year


Exclusion criteria
Not reported
Recruitment
MS centres in Pavia (Northern Italy), Florence (Central Italy), Bari (Southern Italy)
Age (years)
Median 24.8
Sex (%F)
69.3
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Poser 1983
Treatment
  • At recruitment, 2.7% on DMT

  • During follow‐up, 52.9% on DMT and 1.4% on immunosuppressive, 2.7% since the beginning, 43% never treated


Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Not applicable
Number of considered predictors
Not applicable
Timing of predictor measurement
Not applicable
Predictor handling
Not applicable
Outcome Outcome definition
Conversion to progressive MS: time at which the patient reached the confirmed SP, defined as the earliest date of observation of a progressive worsening, severe enough to lead to an increase of at least 1 point on the EDSS, and confirmed at least 1 year after progression
Timing of outcome measurement
Follow‐up mean (SD, range): 17.1 years (2.1 years, 10 years to 48 years), time to endpoint median (range): 10.5 years (2 years to 44 years)/follow‐up visit frequency every 6 months on average, but the frequency depended on the course of the disease: patients with 'active' relapsing disease were followed every 3 months, patients with 'stable' relapsing disease every 6 to 12 months.
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
535 (87 within 10 years)
Modelling method
Not applicable
Predictor selection method
Not applicable
Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Not applicable
Performance evaluation dataset
External validation
Performance evaluation method
Not applicable
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • Cutoff at 95th percentile (score ≥ 2.0): sensitivity = 0.17, specificity = 0.99, PPV = 0.86, NPV = 0.83

  • Cutoff at 5th percentile (score ≤ −0.63): sensitivity = 0.08, specificity = 1.00, PPV = 1.00, NPV = 0.18

  • The event is defined as having secondary progression for the 95th percentile cutoff, but as not having secondary progression for the 5th percentile cutoff


Overall performance
Not reported
Risk groups
Very high risk: 95th percentile (score ≥ 2.0), very low risk: 5th percentile (score ≤ −0.63)
Model  Model presentation
Not applicable
Number of predictors in the model
Not applicable
Predictors in the model
Not applicable
Effect measure estimates
Not applicable
Predictor influence measure
Not applicable
Validation model update or adjustment
None
Interpretation  Aim of the study
To test the trustworthiness of the Bayesian risk score on the basis of a new and larger sample of patients
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Confirmatory
Suggested improvements
Incorporation of additional clinical aspects of disease (cognitive impairment and fatigue), genetic, neuroimmunological, neuroradiological, and neurophysiological findings
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants Yes This study probably used cohort data and the eligibility criteria were clear.
Predictors Yes Because the data were collected at clinical visits, the predictors were probably collected without knowledge about the outcome. A group of neurologists saw the participants at the same clinic, so the predictors were probably assessed in a similar way for all patients. The proposed model only used the early predictors, which were collected within 1 year of disease onset.
Outcome No The frequency of visits depended on the disease course and was different for patients with active relapsing disease, every 3 to 6 months, vs patients with stable relapsing disease, every 6 to 12 months, which causes differential assessment of the outcome in different patients.
Analysis No The subset on which performance measures were estimated contained fewer than 100 events. Only classification measures were addressed. Methods of dealing with missing data were not reported.
Overall No At least one domain is at high risk of bias.

Bergamaschi 2015.

Study characteristics
General information Model name
  • BREMS Ext Val

  • BREMSO SP Val

  • BREMSO MSSS Val


Primary source
Journal
Data source
Registry, secondary
Study type
  • BREMS Ext Val: external validation, multiple (location, time)

  • BREMSO SP Val: validation (predictors dropped), multiple (location, time)

  • BREMSO MSSS Val: validation (predictors dropped and different outcome), multiple (location, time)

Participants Inclusion criteria
  • RRMS

  • Disease duration ≥ 1 year


Exclusion criteria
  • PPMS

  • Missing or incorrect data for variables used in BREMSO


Recruitment
  • MS centres participating in MSBase registry from 26 countries

  • Italy, Canada, Australia, Spain, Netherlands, Argentina, Iran, Kuwait, Turkey, Denmark, Czech Republic, Portugal, France, Belgium, UK, Germany, Cuba, Israel, Hungary, USA, India, Mexico, Malta, Macedonia, Romania, Brazil


Age (years)
Mean 31.1
Sex (%F)
71.3
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2001
Treatment
Unclear timing, 72.2% on treatment
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
  • BREMS Ext Val: not applicable

  • BREMSO SP Val and BREMSO MSSS Val: age at onset, gender, sphincter onset, pure motor onset, motor‐sensory onset, number of functional systems involved at onset, sequelae after onset


Number of considered predictors
  • BREMS Ext Val: not applicable

  • BREMSO SP Val and BREMSO MSSS Val: 7


Timing of predictor measurement
  • BREMS Ext Val: not applicable

  • BREMSO SP Val and BREMSO MSSS Val: at disease onset (RRMS)


Predictor handling
  • Continuously

  • No interactions considered

Outcome Outcome definition
  • BREMS Ext Val and BREMSO SP Val:

    • Conversion to progressive MS: secondary progression (SP) defined as when a worsening, severe enough to lead to an EDSS score increase of at least 1 point, was confirmed at least 1 year after its initial observation

  • BREMSO MSSS Val:

    • Disability (MSSS): mild MS defined as MSSS < first quartile and severe MS defined as MSSS ≥ third quartile, calculated for each patient at the last observation or at the last observation before the introduction of DMTs, if used; MS Severity Score (MSSS) is an algorithm that adjusts EDSS according to the corresponding disease duration


Timing of outcome measurement
  • Never‐treated patients from disease onset to the last observation mean (SD) 8 years (9.9 years), median (IQR, range) 3.7 years (0.9 years to 12 years, 1 year to 55 years); 1148 patients observed for ≥ 10 years

  • Treated patients from disease onset to the start of treatment mean (SD) 5.7 years (6.6 years), median (IQR, range) 3.2 years (1 year to 8 years, 1 year to 52 years); 2021 patients observed for ≥ 10 years

Missing data Number of participants with any missing value
2965
Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • BREMS Ext Val: 1131 (not reported)

  • BREMSO SP Val: 14,211 (1954)

  • BREMSO MSSS Val: 14,211 (3567)


Modelling method
Not applicable
Predictor selection method
Not applicable
Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Not applicable
Performance evaluation dataset
External validation
Performance evaluation method
Not applicable
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • BREMS Ext Val: sensitivity = 0.35, specificity = 0.80

  • BREMSO SP Val: sensitivity = 0.28, specificity = 0.76

  • BREMSO MSSS Val: sensitivity = 0.36, specificity = 0.79


Overall performance
Not reported
Risk groups
Quartiles by risk score, the first (< −0.58) and third quartiles (> 0.52)
Model  Model presentation
  • BREMS Ext Val: not applicable

  • BREMSO SP Val and BREMSO MSSS Val: regression model without baseline hazard and a subset of fitted coefficients


Number of predictors in the model
  • BREMS Ext Val and BREMSO MSSS Val: not applicable

  • BREMSO SP Val: 7


Predictors in the model
Not applicable
Effect measure estimates
Not applicable
Predictor influence measure
Not applicable
Validation model update or adjustment
  • BREMS Ext Val: none

  • BREMSO SP Val and BREMSO MSSS Val: predictors removed

Interpretation  Aim of the study
To predict the natural course of MS using the Bayesian Risk Estimate for MS at Onset (BREMSO), which gives an individual risk score calculated from demographic and clinical variables collected at disease onset
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Confirmatory
Suggested improvements
Not reported
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants No This study used registry data, and exclusion was based on a substantial amount of missing data.
Predictors Yes The data were collected prospectively, and only early predictors that were collected within 1 year of disease onset were included in the model to be used early in the disease. It is a multicentre study but with well‐defined tools.
Outcome Yes BREMS Ext Val and BREMSO for SP: We rated this domain for these analyses as having an unclear risk of bias. Due to the data source, the outcome was probably assessed with knowledge of the predictors, but we consider the definition for conversion to secondary progressive MS based on EDSS to be a rather hard outcome. It is unclear if the frequency of visits at which the outcome was assessed differed from patient to patient, which is likely due to the nature of the data source.
BREMSO MSSS: We rated this domain for this analysis as having a low risk of bias. Due to the data source, the outcome was probably assessed with knowledge of the predictors, but we consider EDSS to be a rather hard outcome.
Analysis No BREMS Ext Val: This was a validation study without any reported discrimination or calibration measures, especially those for censored data. Exclusion for missing data was handled in the Participants section.
BREMSO for SP and BREMSO MSSS: This validation study did not assess discrimination or calibration. Variables were dropped from the developed model, and the coefficients for the rest of the predictors were used as if this did not occur.
Overall No At least one domain is at high risk of bias.

Borras 2016.

Study characteristics
General information Model name
CH3L1 + CNDP1
Primary source
Journal
Data source
Cohort, unclear
Study type
Development
Participants Inclusion criteria
  • Patients with CIS or OND


Exclusion criteria
Not reported
Recruitment
Hospital Ramon and Cajal in Madrid, Spain
Age (years)
Median 35.5 (unclear when)
Sex (%F)
66.0
Disease duration (years)
Median 0.22 (range: 0.01 to 0.35)
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Unclear timing, and no specific treatment details, 8% on treatment
Disease description
EDSS median (range): 1.5 (0 to 5)
Recruitment period
From 2001 onward
Predictors Considered predictors
CH3L1, CNDP1, CLUS, A1AG1, 2‐AACT, CNTN1, AACT, SEM7A, HPT, PGCB, 3‐AACT, OSTP, CMGA, SCG2, A2MG, A1AG1, TTHY
Number of considered predictors
Between 17 and 32 (discrepant lists)
Timing of predictor measurement
At disease onset (CIS), reported as 'first relapse' (4 to 126 days between CIS and lumbar puncture)
Predictor handling
Continuously (log2 transformed)
Outcome Outcome definition
Conversion to definite MS: conversion to CDMS defined as the presence of IgG oligoclonal bands and an abnormal brain MRI at baseline (2, 3, or 4 Barkhof criteria)
Timing of outcome measurement
Follow‐up median (SD): CIS 3.25 years (1.32 years), CDMS 4.08 years (2.48 years)
Missing data Number of participants with any missing value
1, only missing predictor reported
Missing data handling
Single value imputation of predictors (a minimum estimated log2‐transformed abundance for a given protein across runs)
Analysis Number of participants (number of events)
49 (24)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, iterative selection within CV folds followed by inclusion for that overall training sample if included in 2 of 4 of the CV training sets, final model based on most frequent combinations, AUC as criterion


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
No
Performance evaluation dataset
Development
Performance evaluation method
Unclear, point estimates for full dataset, plots also depict median performance and measure of uncertainty for subset of 500 repeats of training‐validation split
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.858, optimism‐corrected = 0.785
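
An 'optimism-corrected' c-statistic is typically obtained with a bootstrap procedure: refit the model in bootstrap samples, measure how much better each bootstrap model performs on its own sample than on the original data, and subtract the average of that difference (the optimism) from the apparent c-statistic. The sketch below shows this general recipe with scikit-learn on toy data; it is not the authors' implementation, and it simplifies by not repeating the study's predictor-selection step inside the bootstrap.

    # Bootstrap optimism correction for the c-statistic (AUC) of a logistic model.
    # Generic recipe on toy data; not the authors' code.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(3)
    n = 49
    X = rng.normal(size=(n, 2))        # e.g. log2 abundances of CH3L1 and CNDP1 (toy values)
    y = rng.integers(0, 2, size=n)     # hypothetical conversion to CDMS (0/1)

    def fit_auc(X_fit, y_fit, X_eval, y_eval):
        model = LogisticRegression().fit(X_fit, y_fit)
        return roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])

    apparent_auc = fit_auc(X, y, X, y)
    optimism = []
    for _ in range(500):
        idx = rng.integers(0, n, size=n)               # bootstrap resample with replacement
        Xb, yb = X[idx], y[idx]
        if len(np.unique(yb)) < 2:
            continue                                   # skip degenerate resamples with a single class
        optimism.append(fit_auc(Xb, yb, Xb, yb) - fit_auc(Xb, yb, X, y))

    corrected_auc = apparent_auc - np.mean(optimism)
    print("apparent AUC: %.3f, optimism-corrected AUC: %.3f" % (apparent_auc, corrected_auc))
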
Classification estimate
Sensitivity = 0.84, specificity = 0.83
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Heat maps
Number of predictors in the model
2
Predictors in the model
CH3L1, CNDP1
Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To establish a diagnostic molecular classifier with high sensitivity and specificity able to differentiate between clinically isolated syndrome patients with a high and a low risk of developing multiple sclerosis over time. To build a statistical model able to assign to each patient a precise probability of conversion to clinically defined multiple sclerosis.
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Not reported
Notes Applicability overall
High
Applicability overall rationale
The predictors used were proteins and no other predictor domain was considered for use in the model.
 
Item Authors' judgement Support for judgement
Participants Unclear No details were provided about the eligibility criteria other than the diagnostic subtype, and the data source is not clearly reported.
Predictors Yes The predictors are relatively objective to assess, available at the intended time of prognostication. Although it is unclear when the predictor assessment was done relative to outcome data collection, there is nothing to indicate different assessments for participants.
Outcome Unclear No exact timing of the outcome assessment was specified. Some patients were followed up for short periods and others for years.
Analysis No The number of participants was much lower than necessary, and EPV was less than 10. Only discrimination was addressed, but not calibration. A bootstrap procedure was used, but the variability in AUC only accounted for training samples that chose those predictors. The time for which predictions were to be made was never addressed; therefore, participants had different follow‐up times, and this was not accounted for. It is unclear whether the weights of the predictors corresponded to a final selected model or not. Although not all patients were included in the analysis, only a single patient was excluded, which is less than 5%.
Overall No At least one domain is at high risk of bias.

Brichetto 2020.

Study characteristics
General information Model name
Future course assignment
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
Unclear, as reported in auxiliary reference
  • MS diagnosis in patients with RRMS or SPMS with a minimum of 1 time point


Exclusion criteria
Unclear, as reported in auxiliary reference
  • Patients with progressive‐relapsing or primary progressive MS


Recruitment
Patients followed as outpatients or at‐home by Italian Multiple Sclerosis Society (AISM) Rehabilitation Centres of Genoa, Padua and Vicenza, Italy
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
Unclear, RRMS, SPMS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
Not reported
Recruitment period
2014 to 2017
Predictors Considered predictors
ABILHAND, Edinburgh Handedness Inventory, Hospital Anxiety and Depression Scale, Life Satisfaction Index, Modified Fatigue Impact Scale, Overactive Bladder Questionnaire, Functional Independence Measure, Montreal Cognitive Assessment, Paced Auditory Serial Addition Task, Symbol Digit Modalities Test, education (years), number of relapses in past 4 months, height, weight
Number of considered predictors
143
Timing of predictor measurement
Unclear, at multiple assessments every 4 months
Predictor handling
Unclear, probably continuously
Outcome Outcome definition
Conversion to progressive MS
Timing of outcome measurement
Unclear if next visit or 4 months
Missing data Number of participants with any missing value
Not reported
Missing data handling
Unclear, K‐nearest neighbour data imputing strategy
Analysis Number of participants (number of events)
≤ 3398 evaluations of 810 participants (unclear how many were used in the FCA model; 1451 evaluations were RR and 1947 were SP)
Modelling method
Unspecified ML techniques; according to the auxiliary reference, multitask elastic net (for prediction of disease descriptors) followed by gradient boosting (for classification based on the predicted descriptors)
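
The two‐stage design mentioned in the auxiliary reference (a multitask elastic net that predicts several future disease descriptors, followed by a gradient‐boosting classifier applied to those predicted descriptors) could look roughly like the sketch below. Everything here, from the toy data to the specific scikit‐learn estimators and their settings, is an assumption for illustration; it is not the authors' pipeline.

    # Rough sketch of a two-stage model: stage 1 predicts future disease descriptors with a
    # multitask elastic net; stage 2 classifies the future course from those predicted descriptors.
    # Illustrative assumptions only; not the authors' pipeline.
    import numpy as np
    from sklearn.linear_model import MultiTaskElasticNetCV
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(4)
    n, p, k = 810, 143, 5
    X = rng.normal(size=(n, p))              # current PRO/CAO assessments (toy values)
    Y_future = rng.normal(size=(n, k))       # k future disease descriptors (toy values)
    course = rng.integers(0, 2, size=n)      # 0 = relapsing-remitting, 1 = secondary progressive (toy)

    X_tr, X_te, Y_tr, Y_te, c_tr, c_te = train_test_split(X, Y_future, course, random_state=4)

    stage1 = MultiTaskElasticNetCV(cv=5).fit(X_tr, Y_tr)                    # predicts the descriptor vector
    stage2 = GradientBoostingClassifier().fit(stage1.predict(X_tr), c_tr)   # classifies the course

    accuracy = (stage2.predict(stage1.predict(X_te)) == c_te).mean()
    print("held-out accuracy:", round(accuracy, 3))

Note that chaining the stages as in this sketch does not propagate the uncertainty of stage 1 into stage 2, which is one of the concerns raised in the analysis judgement at the end of this entry.
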
Predictor selection method
  • For inclusion in the multivariable model, not reported

  • During multivariable modelling, unclear

    • Elastic net and recursive feature elimination during CCA model fitting mentioned in auxiliary reference


Hyperparameter tuning
Unclear, according to auxiliary reference, parameter tuning done using inner parameter optimisation via grid‐search in cross‐validation, modelling/tuning not reported in Brichetto 2020
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Unclear, methods not reported, unclear how much to rely on auxiliary reference
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy FCA = 0.826, CCA = 0.860
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of selected predictors
Number of predictors in the model
33
Predictors in the model
ABILHAND item 12, ABILHAND total, HADS item 7, HADS sub1, HADS sub2, HADS total, Life Satisfaction Index total, MFIS item 2, MFIS sub1, MFIS sub2, MFIS sub3, MFIS total, Overactive Bladder Questionnaire item 1, Overactive Bladder Questionnaire item 4, Overactive Bladder Questionnaire total, Functional Independence Measure item 10, Functional Independence Measure item 11, Functional Independence Measure item 12, Functional Independence Measure item 14, Functional Independence Measure sub3, Functional Independence Measure sub4, Functional Independence Measure sub5, Functional Independence Measure sub6, Functional Independence Measure total, Montreal Cognitive Assessment item 1, Montreal Cognitive Assessment item 9, Montreal Cognitive Assessment tot1, Montreal Cognitive Assessment tot2, PASAT, SDMT, years of education, height, weight
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To confirm the important role of applying ML to PROs and CAOs of people with relapsing‐remitting (RR) and secondary progressive (SP) form of multiple sclerosis (MS), to promptly identify information useful to predict disease progression
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on showing the relevancy of PRO and CAO to MS prediction.
Model interpretation
Exploratory
Suggested improvements
Including data on therapy and MRIs
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on outcomes and their assessment, applicability is unclear. Additionally, it is unclear whether some patients had already experienced the outcome at baseline.
Auxiliary references
Bebo BF Jr, Fox RJ, Lee K, Utz U, Thompson AJ. Landscape of MS patient cohorts and registries: recommendations for maximising impact. Mult Scler 2018;24(5):579‐86.
Fiorini S, Verri A, Barla A, Tacchino A, Brichetto G. Temporal prediction of multiple sclerosis evolution from patient‐centred outcomes. In: Proceedings of the 2nd Machine Learning for Healthcare Conference; 2017 August 18‐19; Boston MA. Boston MA: Proceedings of Machine Learning Research, 2017.
 
Item Authors' judgement Support for judgement
Participants Unclear The data were prospectively collected from a cohort. Although references cited in the article have some study inclusion/exclusion criteria, the number of patients used in this article does not match them. Thus, the inclusion/exclusion criteria used in the article are unclear.
Predictors No There are patient‐reported outcomes that could be influenced by the current diagnoses conveyed to patients by clinicians. It is clear that at 1 stage (CCA) of the FCA modelling, patients with different diagnoses (RR, SP) were included. It is unclear whether assessors of clinical predictors at the different clinics have the same level of experience.
Outcome Unclear The outcomes were not clearly defined and their assessments were not described.
Analysis No EPV of the compound FCA model was at most 10.1. There was no mention of the complexities and uncertainties of two‐stage modelling, or of the inclusion of different time points from the same patients in training/validation/test sets, being taken into account. Neither calibration nor discrimination measures were reported. It was unclear how the missing data were handled. The method of internal validation was unclear. Model selection and evaluation did not appear to be properly separated.
Overall No At least one domain is at high risk of bias.

Calabrese 2013.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
Cohort, primary
Study type
Development + external validation, time
Participants Inclusion criteria
  • Diagnosis of RRMS

  • At least 5 years of disease duration


Exclusion criteria
Not reported
Recruitment
Consecutive patients receiving medical care at the outpatient rooms of the MS Centre of Veneto Region–First Neurological Clinic at University Hospital of Padua, Italy
Age (years)
  • Dev: mean 35.3

  • Ext Val: mean 34.5


Sex (%F)
  • Dev: 66.7

  • Ext Val: 59.5


Disease duration (years)
  • Dev: mean 11.3 (range: 5 to 23)

  • Ext Val: mean 10.5 (range: 10 to 21)


Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2001
Treatment
  • At recruitment, 100% on DMT

  • During follow‐up, 100% on DMT


Disease description
  • Dev: EDSS median (range): 2.5 (0 to 4.5)

  • Ext Val: not reported


Recruitment period
  • Dev: during 2006

  • Ext Val: during 2007

Predictors Considered predictors
  • Dev: age, age at onset, gender, initial symptoms (?), EDSS score, relapse rate, Modified Fatigue Impact Scale score, T2 white matter lesion volume (T2WMLV), T2 white matter lesion number, global cortical thickness (CTh), cerebellar cortical volume (CCV), cortical lesion (CL) volume, cortical lesion (CL) number, contrast‐enhancing lesion (CEL) number, spinal cord lesion (SCL) number, patients with spinal cord lesions (SCL)

  • Ext Val: not applicable


Number of considered predictors
  • Dev: ≥ 16 (unclear predictor definition)

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at study baseline (cohort entry at least 5 years after disease onset)

  • Ext Val: not applicable


Predictor handling
  • Dev: continuously

  • Ext Val: not applicable

Outcome Outcome definition
Conversion to progressive MS: SPMS defined as an increase of at least 1.0 EDSS point compared to T0, not related to a relapse, observed at any time of the follow‐up and confirmed at 6 months; EDSS scored every 6 months and in case of a relapse
Timing of outcome measurement
  • Dev: up to 5 years (T5), time to outcome median (range): 52 months (29 months to 64 months)

  • Ext Val: up to 5 years (T5), time to outcome median (range): 54 months (30 months to 62 months)

Missing data Number of participants with any missing value
  • Dev: 11, only missing outcome reported

  • Ext Val: 1, only missing outcome reported


Missing data handling
Complete case
Analysis Number of participants (number of events)
  • Dev: 334 (66)

  • Ext Val: 83 (19)


Modelling method
  • Dev: logistic regression

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, significance

      • P value < 0.05

  • Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: none

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: cross‐validation, LOOCV

  • Ext Val: not applicable


Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • Dev: accuracy = 0.928, sensitivity = 0.878, specificity = 0.94

  • Ext Val: accuracy = 0.916, sensitivity = 0.842, specificity = 0.937


Overall performance
Not reported
Risk groups
  • Dev: 3 risk groups (low, intermediate, and high) created by an unsupervised cluster analysis, said to be confirmed by the logit model, but the cut points are unclear

  • Ext Val: not reported

Model  Model presentation
  • Dev:

    • Full regression model

    • Logistic curve (x‐axis: score, y‐axis: estimated progression probability) coloured by risk groups defined by hierarchical clustering

  • Ext Val: not applicable


Number of predictors in the model
  • Dev: 3

  • Ext Val: not applicable


Predictors in the model
  • Dev: age, cortical lesion volume, cerebellar cortical volume

  • Ext Val: not applicable


Effect measure estimates
  • Dev: log OR (SE): intercept −131.3, age 0.13 (0.046), cortical lesion volume 0.0053 (0.0011), cerebellar cortical volume −0.0013 (0.0003)

  • Ext Val: not applicable
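
These coefficients translate into an individual progression probability through the usual inverse‐logit formula, sketched below. The coefficients are copied from the entry above; the predictor units and scaling must match the source publication (they are not restated in this table), so no example patient values are supplied.

    # Inverse-logit prediction from the reported coefficients (illustration of the formula only).
    # Predictor units/scaling must match the source publication.
    import math

    def progression_probability(age, cortical_lesion_volume, cerebellar_cortical_volume):
        """p = 1 / (1 + exp(-(b0 + b_age*age + b_CL*CL_volume + b_CCV*CC_volume)))"""
        linear_predictor = (-131.3
                            + 0.13 * age
                            + 0.0053 * cortical_lesion_volume
                            - 0.0013 * cerebellar_cortical_volume)
        return 1.0 / (1.0 + math.exp(-linear_predictor))
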


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: none

Interpretation  Aim of the study
A prospective 5‐year longitudinal study to assess demographic, clinical, and magnetic resonance imaging (MRI) parameters that could predict the changing clinical course of MS.
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on predictor identification.
Model interpretation
Probably exploratory
Suggested improvements
Confirmation in different MS populations and with a longer follow‐up
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants Yes The data were reported to be from a cohort study with predefined data collection times.
Predictors Yes Because of the prospective nature of data collection, there is no reason to suspect assessment of predictors differently or with knowledge of outcome data. The predictors were collected by a small group of clinicians at a single centre, and only the variables at the time of baseline were used.
Outcome Yes The outcome is standard and well‐reported. The authors used a 6‐month confirmation period to ensure that the EDSS increase is stable, and found that the results were also stable at 12 months in all patients.
Analysis No Dev: The EPV was below 10 in the development. No calibration or discrimination measures were reported. The internal validation did not address the whole model selection procedure, but an external validation was done. However, the need for shrinkage was not addressed. A low percentage of patients were lost to follow‐up, and complete case analysis was done after the reason was reported, so we do not consider this a large source of possible bias. However, these patients could have been included if time‐to‐event data were used instead.
Ext Val: The number of events was extremely low in the validation. No calibration or discrimination measures were reported. 1 participant was excluded due to a missing outcome, but this is not considered a large possible source of bias.
Overall No At least one domain is at high risk of bias.

De Brouwer 2021.

Study characteristics
General information Model name
GRU‐ODE‐Bayes
Primary source
Journal
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
  • MS patients with at least 6 visits in the 3‐year observation period

  • At least one EDSS measurement after the 2‐year EDSS


Exclusion criteria
  • Patients with missing or invalid diagnosis dates

  • No EDSS value and with a date of visit before the onset date

  • Patients with visits before 1990 or with onset date before 1990

  • Patients without at least 1 observation between T1 and T3

  • All EDSS measurements occurring less than 1 month after a relapse in the test period

  • CIS patients


Recruitment
MSBase registry
Age (years)
Mean 32.2 (onset)
Sex (%F)
71.1
Disease duration (years)
Mean 6.9 (range: 3 to 25)
Diagnosis
85.6% RRMS, 4.9% SPMS, 3.3% PPMS, 1.4% PRMS, 4.8% unknown
Diagnostic criteria
Lublin 1996
Treatment
Not reported
Disease description
Prior 3‐year EDSS per patient mean (SD, range): 2.38 (1.48, 0 to 8.5)
Recruitment period
Not reported
Predictors Considered predictors
Gender, age at onset, MS course at time t = 0 (RRMS, SPMS, PPMS, or CIS), disease duration at time t = 0, EDSS at t = 0, Last used DMT at t = 0 (none, interferons, natalizumab, fingolimod, teriflunomide, dimethyl‐fumarate, glatiramer, alemtuzumab, rituximab, cladribine, ocrelizumab, other (contains stem cells therapy, siponimod and daclizumab)), RF only: EDSS closest to t‐3 (first EDSS in dataset), maximum EDSS within t‐3 to t0, difference between maximum and minimum EDSS between t‐3 and t0, number of visits between t‐3 and t0, number of relapses between t‐3 and t0, BPTF/NN: EDSS trajectories
Number of considered predictors
24+EDSS trajectories
Timing of predictor measurement
At multiple visits, at least 6 in 3‐year period
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): disability progression after 2 years, defined as a minimum increase in EDSS of 1.5 (baseline EDSS of 0), 1.0 (baseline EDSS ≤ 5.5), or 0.5 (baseline EDSS > 5.5); the increase needed to be confirmed at least 6 months later
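
Because the compact notation of this definition is easy to misread, the small sketch below spells out the baseline‐dependent threshold (an increase of 1.5 from a baseline EDSS of 0, 1.0 from a baseline of up to 5.5, and 0.5 from a baseline above 5.5). The function name is illustrative, and the 6‐month confirmation requirement is not implemented here.

    # Baseline-dependent EDSS progression rule as described above (the >= 6-month confirmation
    # that is part of the outcome definition is not implemented in this sketch).
    def edss_progression(baseline_edss, followup_edss):
        if baseline_edss == 0:
            required_increase = 1.5
        elif baseline_edss <= 5.5:
            required_increase = 1.0
        else:
            required_increase = 0.5
        return (followup_edss - baseline_edss) >= required_increase

    print(edss_progression(0.0, 1.5))   # True: increase of 1.5 from baseline 0
    print(edss_progression(4.0, 4.5))   # False: only 0.5 increase from a baseline <= 5.5
    print(edss_progression(6.0, 6.5))   # True: 0.5 increase from a baseline > 5.5
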
Timing of outcome measurement
Closest observation time to 2 years (t2*, between t1 and t3) and confirmed with a measurement after t2* (at least 6 months later), median (IQR) = 1.995 years (1.887 years to 2.112 years)
Missing data Number of participants with any missing value
≥ 48,520, unclear exactly how many participants had any missing value
Missing data handling
Exclusion
Analysis Number of participants (number of events)
6882 (1114)
Modelling method
Neural network, continuous‐time gated recurrent unit variant of recurrent neural network
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Validation set (separate from train and test) used for tuning parameter selection during 5‐fold CV optimising binary cross‐entropy
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 5‐fold (train/validation/test)
Calibration estimate
Calibration plot provided upon request (during author correspondence)
Discrimination estimate
c‐statistic = 0.66 (SD = 0.02)
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
19 + EDSS trajectories
Predictors in the model
Gender, age at onset, MS course at time t = 0 (RRMS, SPMS, PPMS, or CIS), disease duration at time t = 0, EDSS at t = 0, last used DMT at t = 0 (none, interferons, natalizumab, fingolimod, teriflunomide, dimethyl fumarate, glatiramer, alemtuzumab, rituximab, cladribine, ocrelizumab, other (includes stem cell therapy, siponimod, and daclizumab)), EDSS trajectories
Effect measure estimates
Not reported
Predictor influence measure
Average AUC degradation after random shuffling of each predictor's values
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To predict disability progression on the EDSS using longitudinal clinical patient data.
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on utilising patient trajectories.
Model interpretation
Exploratory
Suggested improvements
Not reported
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No The data source was a registry, and inclusion/exclusion criteria were based on predictor/outcome data availability.
Predictors No Predictors were probably collected prior to outcome assessment and were available when the model was used. Most predictors were basic enough that we are not concerned about them being assessed in different ways across patients. However, disease type was used as a predictor. This predictor was probably not measured in the same way across patients or across time, as the diagnostic criteria changed. Also, the category progressive‐relapsing was probably used heterogeneously.
Outcome Yes The outcome is standard and was assessed similarly across patients. It did not contain predictors. Although some predictors could have been known while assessing parts of the outcome, we consider the outcome to be robust to such information. The reported assessment time was 1 year to 3 years, but upon follow‐up with the author, it was stated that the IQR was 1.9 years to 2.1 years.
Analysis Yes Calibration was not explicitly assessed in the report, but the model was calibrated using Platt scaling, and a calibration plot was provided during correspondence. A final model/tool was not provided, but given the model reporting, there is no reason to believe the final model differs from the multivariable analysis.
Overall No At least one domain is at high risk of bias.
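
The predictor influence measure reported for this model, average AUC degradation after randomly shuffling each predictor's values, is a permutation‐importance scheme. The sketch below illustrates the general idea for any fitted binary classifier exposing `predict_proba`; it is not the authors' GRU‐ODE‐Bayes code, and `model`, `X`, and `y` are hypothetical placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_degradation(model, X, y, n_repeats=10, seed=0):
    """Average drop in AUC after randomly permuting each column of X."""
    rng = np.random.default_rng(seed)
    baseline = roc_auc_score(y, model.predict_proba(X)[:, 1])
    drops = {}
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores.append(roc_auc_score(y, model.predict_proba(X_perm)[:, 1]))
        drops[j] = baseline - np.mean(scores)  # larger drop = more influential predictor
    return drops
```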

de Groot 2009.

Study characteristics
General information Model name
  • Walking

  • Dexterity

  • Cognitive


Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • Aged 16 to 55 years

  • Diagnosed with MS recently (< 6 months)


Exclusion criteria
  • Patients with other neurologic disorders

  • Systemic diseases

  • Malignant neoplastic diseases


Recruitment
Consecutive patients visiting the outpatient clinics of 5 neurology departments in Amsterdam and Rotterdam, Netherlands
Age (years)
Mean 37.4
Sex (%F)
63.7
Disease duration (years)
Up to 6 months
Diagnosis
82% relapse onset, 18% non‐relapse onset
Diagnostic criteria
Poser 1983
Treatment
  • At recruitment, 6% on DMT

  • During follow‐up, 30% on DMT


Disease description
EDSS median (IQR): 2.5 (2.0 to 3.0)
Recruitment period
1998 to 2000
Predictors Considered predictors
  • Walking: Items of the Disability and Impact Profile: How well can you walk? (0 to 10), Are you easily tired? (0 to 10); Item of the Functional Systems of the EDSS: Impairment of pyramidal tract (0 to 6), Impairment of cerebellar tract (0 to 5); Number of lesions in spinal cord

  • Dexterity: Items of the Disability and Impact Profile: How well can you use your hands? (0 to 10); Item of the Functional Systems of the EDSS: Impairment of sensory tract (0 to 6), Impairment of pyramidal tract (0 to 6), Impairment of cerebellar tract (0 to 5); T2‐weighted infratentorial lesion load

  • Cognitive: age, gender; items of the Disability and Impact Profile: How good is your memory? (0 to 10), How well can you concentrate? (0 to 10); T2‐weighted supratentorial lesion load


Number of considered predictors
5
Timing of predictor measurement
At disease onset (definite MS) (study baseline within 6 months after diagnosis)
Predictor handling
  • Continuously

  • No interaction considered

Outcome Outcome definition
  • Walking: disability (EDSS): inability to walk 500 m defined as an EDSS score of 4 or higher

  • Dexterity: disability (9‐HPT): impaired dexterity defined as an abnormal score (mean – 1.96 SD, healthy Dutch reference population) for the 9‐HPT

  • Cognitive: composite (Consistent Long Term Retrieval and Long Term Storage of the Selective Reminding Test, 10/36 Spatial Recall Test, SDMT, PASAT, Word List Generation): cognitive impairments defined as a score of mean – SD for 1 or more subtests of a cognitive screening test, which include the subscales Consistent Long Term Retrieval and Long Term Storage of the Selective Reminding Test measuring verbal learning and memory, the 10/36 Spatial Recall Test measuring visuospatial learning and delayed recall, the Symbol Digit Modalities Test measuring sustained attention and concentration, the Paced Auditory Serial Addition Test measuring sustained attention and information processing speed, and the Word List Generation measuring verbal fluency


Timing of outcome measurement
3 years
Missing data Number of participants with any missing value
  • Walking: 25

  • Dexterity, and Cognitive: 23


Missing data handling
Mixed: complete case for outcome, multiple imputation (twice) for predictors
Analysis Number of participants (number of events)
  • Walking: 146 (37)

  • Dexterity: 146 (46)

  • Cognitive: 146 (44)


Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • P value < 0.5

    • Backward


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Uniform shrinkage
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap, B = 250
Calibration estimate
  • Walking: calibration plot, calibration slope = 0.93

  • Dexterity: calibration plot, calibration slope = 0.85

  • Cognitive: calibration plot, calibration slope = 0.88


Discrimination estimate
  • Walking: c‐statistic = 0.89 (95% CI 0.83 to 0.95)

  • Dexterity: c‐statistic = 0.77 (95% CI 0.69 to 0.86)

  • Cognitive: c‐statistic = 0.74 (95% CI 0.65 to 0.83)


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
3 risk categories: high (probability of adverse outcome > 75%), moderate (probability of adverse outcome 25% to 75%), and low (probability of adverse outcome < 25%)
Model  Model presentation
  • Score chart

  • Shrunken coefficients without the intercept, risk groups


Number of predictors in the model
  • Walking: 3

  • Dexterity: 5

  • Cognitive: 4


Predictors in the model
  • Walking: How well can you walk?, Impairment of cerebellar tract, Number of lesions in spinal cord

  • Dexterity: How well can you use your hands?, Impairment of sensory tract, Impairment of pyramidal tract, Impairment of cerebellar tract, T2‐weighted infratentorial lesion load

  • Cognitive: age, gender, How well can you concentrate?, T2‐weighted supratentorial lesion load


Effect measure estimates
  • Walking: shrunken log OR (P value): How well can you walk?: −0.57 (0.00), Impairment of cerebellar tract: 0.77 (0.00), Number of lesions in spinal cord: 0.16 (0.05)

  • Dexterity: shrunken log OR (P value): How well can you use your hands?: −0.16 (0.16), Impairment of sensory tract: 0.27 (0.17), Impairment of pyramidal tract: 0.25 (0.31), Impairment of cerebellar tract: 0.46 (0.03), T2‐weighted infratentorial lesion load: 0.97 (0.00)

  • Cognitive: shrunken log OR (P value): age: 0.03 (0.12), gender: 0.88 (0.02), How well can you concentrate?: −0.17 (0.07), T2‐weighted supratentorial lesion load: 0.06 (0.00)


Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To predict functioning after 3 years in patients with recently diagnosed multiple sclerosis (MS)
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
New cohort recruited in a different geographic area, at a different point in time, or assessed with different diagnostic criteria
Notes Applicability overall
High
Applicability overall rationale
This study included participants who had already experienced the outcome at baseline.
 
Item Authors' judgement Support for judgement
Participants No Participants known to already have the outcome at baseline were included.
Predictors Yes Predictors available early in disease were used. There is no reason to believe the predictor assessments were made with knowledge of outcome, as the collection was prospective.
Outcome Yes It is unclear if the outcome was determined blinded to predictors or not, but the outcomes are relatively objective, which reduces the risk of bias.
Analysis No Even though the number of predictors was limited and shrinkage was used, the EPV was below or around 10. More than 5% of the participants were removed due to missing outcomes. The bootstrap procedure did not include the predictor selection step.
Overall No At least one domain is at high risk of bias.
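
This study applied uniform shrinkage to the logistic regression coefficients and used a bootstrap (B = 250) for internal validation. One common way to obtain such a uniform shrinkage factor is the bootstrap‐estimated calibration slope; a minimal sketch under that assumption follows, with hypothetical arrays `X` and `y` and a large `C` used only to approximate an unpenalised fit in scikit‐learn. As the risk of bias comment notes, a fully honest version would also repeat the predictor selection step inside each bootstrap sample.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_uniform_shrinkage(X, y, n_boot=250, seed=0):
    """Average calibration slope of bootstrap models evaluated on the original
    data, used as a uniform shrinkage factor for the full-model coefficients
    (re-estimation of the intercept after shrinkage is not shown)."""
    rng = np.random.default_rng(seed)
    unpenalised = dict(C=1e6, max_iter=5000)  # large C approximates maximum likelihood
    full = LogisticRegression(**unpenalised).fit(X, y)
    slopes = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        boot = LogisticRegression(**unpenalised).fit(X[idx], y[idx])
        lp = X @ boot.coef_.ravel() + boot.intercept_[0]     # linear predictor on original data
        cal = LogisticRegression(**unpenalised).fit(lp.reshape(-1, 1), y)
        slopes.append(cal.coef_[0, 0])
    factor = float(np.mean(slopes))
    return factor, factor * full.coef_.ravel()
```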

Gout 2011.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
  • Age < 51 years

  • Patients with a first demyelinating event (CIS) leading to admission in the neurology department

  • At least a 1‐year follow‐up

  • Available results of both initial brain magnetic resonance imaging (MRI) and cerebrospinal fluid (CSF) cytology


Exclusion criteria
  • Other diagnoses


Recruitment
Consecutive patients admitted to the neurology department of the Foundation A de Rothschild in Paris, France
Age (years)
Median 31.0
Sex (%F)
70.2
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
0%
Disease description
EDSS median (range): 2 (0 to 6)
Recruitment period
1994 to 2006
Predictors Considered predictors
Gender, age, family history, previous symptoms suggestive of CNS involvement, initial involvement (optic nerve (ref), spinal cord, brainstem/cerebellum, polyregional/cerebrum), initial Expanded Disability Status Scale ≥ 2.5, ≥ 2 T2 lesions (MR), 3‐4+ Barkhof criteria, CSF white blood cell count > 4/mm3, IgG oligoclonal band, positive CSF (> 4 WBC/mm3 or IgG OB), ≥ 2 T2 lesions + IgG OB, McDonald DIS (3‐4+ BC or 2 T2 lesions + IgG OB)
Number of considered predictors
≥ 15 (unclear how many interactions tested)
Timing of predictor measurement
At disease onset (CIS) leading to admission
Predictor handling
  • All dichotomised (the cutoff levels chosen to be clinically meaningful or to maximise the power)

  • At least 1 interaction was considered

Outcome Outcome definition
Conversion to definite MS (Poser 1983): date of occurrence of a second demyelinating event defined as the occurrence of a symptom or symptoms of neurological dysfunction lasting more than 24 hours with objective confirmation at least 1 month after initial event, or the last follow‐up date in the case of patients remaining event‐free
Timing of outcome measurement
Follow‐up median (range): 3.5 years (1.0 year to 12.7 years), time to outcome in those who experience it median (range) 16.6 months (1.1 months to 112.5 months)
Missing data Number of participants with any missing value
213
Missing data handling
Exclusion
Analysis Number of participants (number of events)
208 (141)
Modelling method
Survival, Cox
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, significance

    • P value < 0.05


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
3 risk groups: low‐risk group (score = 0), intermediate‐risk group (0 < score < 5), and high‐risk group (score = 5), based on Kaplan‐Meier plots/estimates.
Model  Model presentation
  • Sum score: 1 point if age at onset ≤ 31 years, 3 points if 3‐4+ Barkhof criteria present, 1 point if > 4 WBC/mm3 in CSF

  • Regression coefficients, risk groups (and KM plots), KM plot of baseline hazard


Number of predictors in the model
3
Predictors in the model
Age (≤ 31 years), 3‐4+ MR Barkhof Criteria, CSF white blood cell count > 4/mm3
Effect measure estimates
HR (95% CI): age ≤ 31 years 1.44 (1.02 to 2.01), 3‐4+ MR Barkhof criteria 2.07 (1.47 to 2.91), CSF white blood cell count > 4/mm3 1.44 (1.03 to 2.02)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To assess whether CSF analysis at the time of a first demyelinating event is a useful tool to predict CDMS. Specifically: first, to assess the predictive value of CSF analysis independently of the other known prognostic factors, and, second, to provide a simple classification for predicting CDMS based on a multivariate Cox model.
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on the CSF analysis.
Model interpretation
Exploratory
Suggested improvements
Validation in another cohort
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants No The data were used secondarily, and study inclusion was based on the availability of data, specifically the availability of both MRI and CSF measures.
Predictors Yes The predictor assessments were probably performed before the outcome due to the prospective nature of data collection, and all predictors are expected to be collected at the onset of the disease, which is the time of intended use. It is a single‐centre study, so predictor collection and assessment should be similar in all patients.
Outcome Yes Although the blinding of outcome assessment was not reported, the outcome definition based on new symptoms is relatively objective. The new event had to be a month apart from the first event to ensure they were separate events.
Analysis No The EPV was less than 20. Predictors were selected based on univariable analysis. The predictors were dichotomised, sometimes using clinically meaningful cutoffs and sometimes at the sample median. No model performance measures were reported other than cumulative incidence plots per risk group in the development set. The multivariable model coefficients were rounded to simplify the model into a score, but the steps are clear and reproducible. There was no assessment of the model before it was simplified, and no examination of the need for shrinkage.
Overall No At least one domain is at high risk of bias.
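
The model above is a multivariable Cox model whose coefficients were rounded into a simple sum score with Kaplan‐Meier‐based risk groups. A minimal sketch of that workflow with the `lifelines` package is shown below; the data frame `df` and its column names are hypothetical, and the rounding is illustrative rather than a reproduction of the published 1/3/1 point weights.

```python
import pandas as pd
from lifelines import CoxPHFitter

# df is assumed to hold the three dichotomised predictors, the follow-up time
# in years ("time"), and the conversion-to-CDMS indicator ("cdms").
cph = CoxPHFitter()
cph.fit(df[["time", "cdms", "age_le_31", "barkhof_3_4", "csf_wbc_gt_4"]],
        duration_col="time", event_col="cdms")
print(cph.summary[["coef", "exp(coef)"]])  # log hazard ratios and hazard ratios

# Round the coefficients into integer points and derive risk groups from the
# resulting sum score (0 = low, maximum = high, anything in between = intermediate).
points = (cph.params_ / cph.params_.abs().min()).round().astype(int)
score = (df[points.index] * points).sum(axis=1)
risk_group = pd.cut(score, bins=[-1, 0, points.sum() - 1, points.sum()],
                    labels=["low", "intermediate", "high"])
```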

Gurevich 2009.

Study characteristics
General information Model name
  • FLP Dev

  • FTP

  • FLP Ext Val


Primary source
Journal
Data source
Unclear
Study type
  • FLP Dev and FLP Ext Val: development + external validation

  • FTP: development

Participants Inclusion criteria
  • FLP Dev, and FTP:

    • Diagnosed with definite MS or CIS

    • Free of steroids and immunomodulatory treatments for at least 30 days before blood withdrawal

    • At least 1 year after treatment with cyclophosphamide

  • FLP Ext Val: unclear


Exclusion criteria
  • Patients with neuromyelitis optica (NMO) according to the criteria of Wingerchuk


Recruitment
Unclear, Sheba Medical Centre, Israel
Age (years)
  • FLP Dev and FTP: mean, 36.3 (onset)

  • FLP Ext Val: unclear


Sex (%F)
  • FLP Dev, and FTP: 63.8

  • FLP Ext Val: not reported


Disease duration (years)
  • FLP Dev, and FTP: mean, 5.67 (pooled SD 0.89)

  • FLP Ext Val: unclear


Diagnosis
  • FLP Dev, and FTP: 34.0% CIS, 66.0% CDMS

  • FLP Ext Val: unclear, CIS 60%, CDMS 40%


Diagnostic criteria
McDonald 2001
Treatment
  • At recruitment, 0%

  • During follow‐up:

    • FLP Dev, and FTP: 5.3% on interferon β‐1a (Avonex), 2.1% on interferon β‐1b (Betaferon), 10.6% on interferon β‐1a (Rebif), 10.6% on glatiramer acetate (Copaxone), 6.4% on intravenous immunoglobulins (IVIg)

    • FLP Ext Val: unclear, 9 on IMD


Disease description
  • FLP Dev, and FTP: EDSS (unclear if mean and SD): CIS 0.9 (0.2), CDMS 2.4 (0.2); annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)

  • FLP Ext Val: unclear, published inconsistencies; EDSS (unclear if mean and SD): CIS 2.58 (0.15), CDMS 5.3 (2.39); annualised relapse rate (unclear if mean and SD): CIS 6.1 (2.05), CDMS 1 (0.51)


Recruitment period
Not reported
Predictors Considered predictors
  • FLP Dev, and FTP: PBMC RNA microarray analysis set of 22,215 gene‐transcripts averaged to 10,594 potential features (genes and annotated sequences), age, MS stage (CIS or definite), gender, annual relapse rate, EDSS at time of blood sampling, disease duration, age at onset, EDSS change in the last relapse

  • FLP Ext Val: not applicable


Number of considered predictors
  • FLP Dev, and FTP: 10,602

  • FLP Ext Val: not applicable


Timing of predictor measurement
  • FLP Dev, and FTP: at study baseline (cohort entry)

  • FLP Ext Val: not applicable


Predictor handling
  • FLP Dev, and FTP: unclear, probably continuously

  • FLP Ext Val: not applicable

Outcome Outcome definition
  • FLP Dev, and FLP Ext Val:

    • Relapse: time until next relapse broken down into 3 categories as less than 500 days, between 500 days and 1264 days, and more than 1264 days; relapse defined as the onset of new objective neurological symptoms/signs or worsening of existing neurological disability not accompanied by metabolic changes, fever or other signs of infection, and lasting for a period of at least 48 hours accompanied by objective change of at least 0.5 in the EDSS score

  • FTP:

    • Relapse: time from baseline gene expression analysis to next acute relapse, defined as the onset of new objective neurological symptoms/signs or worsening of existing neurological disability, not accompanied by metabolic changes, fever or other signs of infection, and lasting for a period of at least 48 hours accompanied by objective change of at least 0.5 in the EDSS score


Timing of outcome measurement
  • FLP Dev and FTP: unclear because of conflicting numbers when reported separately for CIS and CDMS, up to 1264 days (40 patients had the outcome in less than 500 days, 23 patients had the outcome in 500 days to 1264 days, 31 patients had not had the outcome by 1264 days)

  • FLP Ext Val: not reported

Missing data Number of participants with any missing value
  • FLP Dev: 6

  • FTP: ≤ 6, unclear exactly how many belonged to this subset

  • FLP Ext Val: not reported


Missing data handling
  • FLP Dev: complete case (participants with poor‐quality microarray results were excluded)

  • FTP: complete case, unclear (participants with poor‐quality microarray results were excluded)

  • FLP Ext Val: not reported

Analysis Number of participants (number of events)
  • FLP Dev: published inconsistencies, probably 94 but 79 when summing the CIS/CDMS numbers (number of events unclear, between 19 and 23)

  • FTP: published inconsistencies, probably 40 but 39 when summing the CIS/CDMS numbers (continuous outcome)

  • FLP Ext Val: published inconsistencies, 10 or 12 (not reported)


Modelling method
  • FLP Dev: support vector machine, multiclass classification

  • FTP: linear regression

  • FLP Ext Val: not applicable


Predictor selection method
  • FLP Dev, and FTP:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, stepwise selection

      • First, the predictor with the lowest leave‐20%‐out cross‐validation error was selected, together with all predictors whose error was not statistically higher (by P value). Then predictors were added iteratively until the error rate became significantly worse. The final selection is unclear

  • FLP Ext Val: not applicable


Hyperparameter tuning
  • FLP Dev: not reported

  • FTP and FLP Ext Val: not applicable


Shrinkage of predictor weights
  • FLP Dev: modelling method

  • FTP: none

  • FLP Ext Val: not applicable


Performance evaluation dataset
  • FLP Dev, and FTP: development

  • FLP Ext Val: external validation


Performance evaluation method
  • FLP Dev, and FTP: cross‐validation, repeated leave 20% out CV

  • FLP Ext Val: not applicable


Calibration estimate
  • FLP Dev, and FLP Ext Val: not reported

  • FTP: calibration plot


Discrimination estimate
  • FLP Dev, and FLP Ext Val: not reported

  • FTP: not applicable


Classification estimate
  • FLP Dev: categories determined in data, error = 0.079

  • FTP: proportion of predictions more than 50 days from the observed value = 0.345

  • FLP Ext Val: error = 0.25 (but only 10 patients reported)


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • FLP Dev and FTP: list of selected genes

  • FLP Ext Val: not applicable


Number of predictors in the model
  • FLP Dev: 10 (df unclear)

  • FTP: 9 (df unclear)

  • FLP Ext Val: not applicable


Predictors in the model
  • FLP Dev: FLJ10201, PDCD2, IL24, MEFV, CA2, SLM1, CLCN4, SMARCA1, TRIM22, TGFB2

  • FTP: KIAA1043, LOC51145, PPFIA1, MGC8685, DNCH2, PCOLCE2, FPRL1, G3BP, RHBG

  • FLP Ext Val: not applicable


Effect measure estimates
  • FLP Dev and FTP: not reported

  • FLP Ext Val: not applicable


Predictor influence measure
  • FLP Dev: not reported

  • FTP and FLP Ext Val: not applicable


Validation model update or adjustment
  • FLP Dev: not applicable

  • FLP Ext. Val: none

Interpretation  Aim of the study
To determine if subsets of genes can predict the time to the next acute relapse in patients with MS
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on the use of genetic information.
Model interpretation
Probably exploratory
Suggested improvements
To find sets of predictive genes that give significant results when their gene expression is measured by cheaper, small‐scale, technologies such as kinetic RT‐PCR, to predict radiological MRI lesions (that are possibly clinically silent) from gene expression in PBMC
Notes Applicability overall
Unclear
Applicability overall rationale
  • FLP Dev:

    • Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.

  • FTP:

    • Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. Additionally, the model was fit such that the outcome must be known in order to decide whether to use the model because participants known to have the shortest follow‐up time were used, as opposed to those predicted to have short follow‐up time based on the FLP model. Furthermore, it is unclear to whom the model applies.

  • FLP Ext Val:

    • Due to the lack of reporting on participants, applicability is unclear.

 
Item Authors' judgement Support for judgement
Participants Unclear FLP Dev and FTP: The data source was not clearly reported. 100 patients were sampled from a larger population of unclear source. Although 6 of the samples were dropped due to QC issues, missingness is expected to be at random.
FLP Ext Val: The recruitment of this additional cohort was not described at all.
Predictors Yes Although microarray analysis of the transcriptome can be affected by batch effects, there were efforts to circumvent them. Even though the microarray analysis might have occurred after the outcomes became known, the procedure was relatively automated and is not expected to be affected by this information. The intended time of model use relative to the patient's disease history is unclear, but it may be any time that blood was drawn.
Outcome Yes FLP Dev: We rated this domain for this analysis as having a high risk of bias. This outcome was defined after seeing the outcome information, so it is not standard.
FTP: We rated this domain for this analysis as having a low risk of bias. It is unclear which predictors were known at outcome assessment, but we consider the relapse definition to be robust.
FLP Ext Val: We rated this domain for this analysis as having a high risk of bias. This outcome was defined after seeing the outcome information in the development set, so it is not standard.
Analysis No FLP Dev and FTP: The authors chose to use multi‐class modelling for a time‐to‐event analysis, unnecessarily categorising the outcome. The outcome groups were based on the distribution of the development data. The number of observations and the number of events were low. Calibration and discrimination were not addressed. Patients were excluded for poor‐quality transcription data. The same data appear to have been used for predictor selection and model evaluation. The final model is unclear.
FLP Ext Val: The authors chose to use multi‐class modelling for a time‐to‐event analysis, unnecessarily categorising the outcome. The number of observations and the number of events were low. Calibration and discrimination were not addressed.
Overall No At least one domain is at high risk of bias.
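
The FLP model is a multiclass support vector machine evaluated by repeated leave‐20%‐out cross‐validation. The snippet below is a generic sketch of that evaluation scheme in scikit‐learn rather than the authors' gene‐selection pipeline; `X` (expression features) and `y` (the three time‐to‐relapse categories) are hypothetical, and in a faithful analysis the predictor selection would need to be nested inside the resampling loop to avoid the optimism noted in the analysis judgement.

```python
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Repeated leave-20%-out cross-validation of a multiclass (one-vs-one) linear SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
accuracy = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"misclassification error: {1 - accuracy.mean():.3f}")
```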

Kosa 2022.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Mixed (case‐control, cohort), primary
Study type
Development
Participants Inclusion criteria
  • CDMS diagnosis

  • Lumbar puncture within one year of a clinical visit that included all tests necessary to calculate CombiWISE (EDSS, SNRS, 9HPT, and 25FW)


Exclusion criteria
  • Participants in an MS exacerbation or on low‐efficacy therapies (Copaxone, interferon beta preparations, oral disease‐modifying treatments) within 3 months of lumbar puncture

  • Participants on high‐efficacy therapies (natalizumab, daclizumab, alemtuzumab, rituximab, ocrelizumab) within 6 months of lumbar puncture


Recruitment
Prospectively recruited in the study 'Comprehensive Multimodal Analysis of Neuroimmunological Diseases of the Central Nervous System' (NCT00794352), unclear which centre(s), USA
Age (years)
Mean 49.6
Sex (%F)
54.2
Disease duration (years)
Mean 12.2 (pooled SD: 8.51)
Diagnosis
30.8% RRMS, 24.2% SPMS, 44.9% PPMS
Diagnostic criteria
Mixed: McDonald 2010 (Polman 2011), McDonald 2017 (Thompson 2018b)
Treatment
  • At recruitment, 0%

  • During follow‐up, not reported


Disease description
EDSS mean (SD): development set; RRMS 1.8 (1.2), SPMS 5.9 (1.2), PPMS 5.3 (1.6)/validation set RRMS 2.2 (1.6), SPMS 5.5 (1.5), PPMS 5.2 (1.6)
Recruitment period
2004 to 2021
Predictors Considered predictors
All possible Somamer ratios from 1305 Somamers (unclear adjustment for age and sex) along with individual markers
Number of considered predictors
852,167 or 852,165 (unclear adjustment for age and sex)
Timing of predictor measurement
At lumbar puncture
Predictor handling
Continuously (transformed into ratios)
Outcome Outcome definition
Composite (EDSS, SNRS, T25FW, NDH‐9HPT): MS‐DSS, a model output based on measured CombiWISE (which contains EDSS, SNRS, T25FW, NDH‐9HPT), therapy adjusted CombiWISE (which includes a treatment efficacy model), COMRIS‐CTD (including several lesion and atrophy measures), time from disease onset to first therapy, difference between adjusted and unadjusted CombiWISE, age, and family history of MS
Timing of outcome measurement
Mean: 4.3 years
Missing data Number of participants with any missing value
Not reported
Missing data handling
Exclusion

Analysis Number of participants (number of events)
227 (continuous outcome)
Modelling method
Random forest, numeric outcome
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • Backward recursive feature elimination on 10 RFs, in which the predictors with lowest 10% variable importance removed, RFs refit, and process iterated until n predictors remaining (n chosen based on lowest out‐of‐bag error)


Hyperparameter tuning
Unclear, number of predictors to include chosen by out of bag error, random forest tuning parameters not reported
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Random split
Calibration estimate
Calibration plot
Discrimination estimate
Not applicable
Classification estimate
Not applicable
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
21 or 23 (unclear if age and sex are predictors)
Predictors in the model
Somamer ratios, age, sex
Effect measure estimates
R2 = 0.264
Predictor influence measure
Variable importance
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To test the hypothesis that CSF biomarker models provide insight into MS pathophysiology, identify molecular disease heterogeneity, and lead to an independent‐cohort validated prognostic test
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on the prognostic value of CSF biomarkers.
Model interpretation
Probably exploratory
Suggested improvements
Further mechanistic research
Notes Applicability overall
High
Applicability overall rationale
The outcome was not a clinical measure but rather a value produced by another model with unclear interpretation.
Auxiliary references
Calle ML, Urrea V, Boulesteix AL, Malats N. AUC‐RF: a new strategy for genomic profiling with random forest. Hum Hered 2011;72(2):121‐32.
Kosa P, Komori M, Waters R, Wu T, Cortese I, Ohayon J, et al. Novel composite MRI scale correlates highly with disability in multiple sclerosis patients. Mult Scler Relat Disord 2015;4(6):526‐35.
Roxburgh RH, Seaman SR, Masterman T, Hensiek AE, Sawcer SJ, Vukusic S, et al. Multiple Sclerosis Severity Score: using disability and disease duration to rate disease severity. Neurology 2005;64(7):1144‐51.
Weideman A M, Barbour C, Tapia‐Maltos MA, Tran T, Jackson K, Kosa P, et al. New multiple sclerosis disease severity scale predicts future accumulation of disability. Front Neurol 2017;8:598.
NCT00794352. Comprehensive multimodal analysis of neuroimmunological diseases of the central nervous system. https://clinicaltrials.gov/show/NCT00794352 (first received 20 November 2008).
 
Item Authors' judgement Support for judgement
Participants No Although the study categorised itself as a case‐control study, the model we are interested in used prospectively measured predictors and outcomes of interest. In addition, the inclusion criteria depended on the availability of some tests, which is likely to introduce risk of bias.
Predictors Yes Predictors were collected prospectively according to a standard operating procedure by investigators blinded to clinical and MRI outcomes. The predictors were available at the intended time of use, reported as first lumbar puncture.
Outcome No During the study, the calculation of the neurological scales changed from manual to app‐based, which is likely to introduce variability. The timing of the outcome was not well defined and, despite the prospective design of the study, the follow‐up time had high variability.
Analysis No The sample size was small. Participants with missing outcome data were excluded from the analysis via exclusion criteria. The model performance was assessed suboptimally in a random‐split sample. There was no indication of a final selected model that could be used by others.
Overall No At least one domain is at high risk of bias.
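
The predictor selection described above is a backward recursive feature elimination on random forests: the 10% of predictors with the lowest variable importance are dropped at each step, and the final number of predictors is chosen by out‐of‐bag error. A minimal sketch of that loop follows, using out‐of‐bag R² as the selection criterion for a numeric outcome; `X` and `y` are hypothetical, and the original analysis additionally averaged over 10 forests per step.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def backward_rfe(X, y, min_features=5, seed=0):
    """Iteratively drop the 10% least important features, keeping track of the
    feature subset with the best out-of-bag R^2."""
    features = np.arange(X.shape[1])
    best_subset, best_oob = features.copy(), -np.inf
    while len(features) >= min_features:
        rf = RandomForestRegressor(n_estimators=500, oob_score=True,
                                   random_state=seed).fit(X[:, features], y)
        if rf.oob_score_ > best_oob:
            best_subset, best_oob = features.copy(), rf.oob_score_
        n_drop = max(1, int(0.10 * len(features)))
        order = np.argsort(rf.feature_importances_)   # ascending importance
        features = features[order[n_drop:]]           # keep the more important ones
    return best_subset, best_oob
```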

Kuceyeski 2018.

Study characteristics
General information Model name
Pairwise disconnection and GM atrophy
Primary source
Journal
Data source
Mixed (cohort, registry, routine care), secondary
Study type
Development
Participants Inclusion criteria
  • Early RRMS patients


Exclusion criteria
Not reported
Recruitment
Not reported
Age (years)
Mean 36.8 (unclear when)
Sex (%F)
73.3
Disease duration (years)
Mean 1.5 (SD 1.3)
Diagnosis
100% RRMS
Diagnostic criteria
Mixed: McDonald 2010 (Polman 2011), McDonald 2017 (Thompson 2018b)
Treatment
  • At recruitment, 95% on DMT

  • During follow‐up, not reported


Disease description
EDSS mean (SD): 1.1 (1.1)
Recruitment period
Not reported
Predictors Considered predictors
Age, sex, disease duration, treatment duration, baseline SDMT, baseline EDSS, regional GM atrophy (86 regions), NEMO pairwise disconnection measures (610 of 3655 considered), (other models: JHU‐MNI atlas overlap (176 regions), regional disconnection (86 ChaCo scores)), number of months between time points
Number of considered predictors
965
Timing of predictor measurement
At the baseline image (early RRMS, within 5 years of the first neurologic symptom) and at follow‐up (unclear; the time of outcome measurement)
Predictor handling
Continuously
Outcome Outcome definition
Disability (cognitive, SDMT): future processing speed measured using Symbol Digits Modality Test (SDMT) scores
Timing of outcome measurement
Mean (SD): 28.6 months (10.3 months)
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
60 (continuous outcome)
Modelling method
Partial least squares regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, multiple models


Hyperparameter tuning
Ten‐fold cross‐validation to identify number of components that minimised predicted residual sum of squares
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Unclear (cross‐validation and bootstrap used for model selection/fitting)
Calibration estimate
Calibration plot
Discrimination estimate
Not applicable
Classification estimate
Not applicable
Overall performance
R2 = 0.79 (95% CI 0.80 to 0.97)
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
703 predictors transformed into 6 principal components
Predictors in the model
Age, sex, disease duration, treatment duration, baseline SDMT, baseline EDSS, regional GM atrophy (86 regions), NEMO pairwise disconnection measures (610 considered), number of months between time points
Effect measure estimates
Not reported
Predictor influence measure
Median and bootstrapped 95% confidence intervals for coefficients of statistically significant predictors
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
  1. To identify which of our 5 models (based on GM atrophy, global, regional and region‐pair disconnectivity and atlas overlap) has the best accuracy in predicting follow‐up processing speed

  2. To identify which of the global, regional, or pairwise disconnectivity; atrophy; and atlas overlap metrics in these models are significant predictors of future processing speed


Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on the usefulness of MRI measures and identifying promising MRI features.
Model interpretation
Exploratory
Suggested improvements
Increase sample size, the scores addressing SDMT domains as outcome measures, add WM damage measures
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of MRI‐based connectome measures.
 
Item Authors' judgement Support for judgement
Participants No A combination of data from a cohort study, a registry, and routine care was used. No information was reported about the eligibility criteria.
Predictors Yes The predictors are objective measures or scores. All except for number of months between time points could be collected at the time of the first image.
Outcome Yes It is unclear whether the outcome was blinded to the predictors, but we consider the outcome based on SDMT to be objective. SDMT is a validated measure of cognitive function measurement.
Analysis No Even when based on the number of principal components rather than the original predictors, the EPV was low. No information on missing data or how they were handled was provided. Details of the model were not reported. The post‐baseline variable 'number of months between time points' was included in the models. Although bootstrapping was used for confidence interval calculation, there was no indication that any optimism correction was performed.
Overall No At least one domain is at high risk of bias.
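
The model above is a partial least squares regression in which ten‐fold cross‐validation was used to pick the number of components minimising the predicted residual sum of squares (PRESS). A minimal sketch of that component selection is shown below; `X` (imaging and clinical predictors) and `y` (follow‐up SDMT score) are hypothetical placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict

def choose_n_components(X, y, max_components=10):
    """Return the number of PLS components that minimises the 10-fold
    cross-validated PRESS."""
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    press = []
    for k in range(1, max_components + 1):
        pred = cross_val_predict(PLSRegression(n_components=k), X, y, cv=cv)
        press.append(np.sum((y - pred.ravel()) ** 2))
    return int(np.argmin(press)) + 1
```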

Law 2019.

Study characteristics
General information Model name
  • DT

  • RF

  • Ada


Primary source
Journal
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 65 years

  • Patients meeting a stringent definition of SPMS (recently confirmed progression in EDSS in the absence of relapse)

  • Documented history of SPMS

  • Absence of relapse in the 3 months leading up to trial participation

  • EDSS score of 3.5 to 6.5

  • Kurtzke pyramidal or cerebellar system subscore ≥ 3


Exclusion criteria
  • Participants with multiple missing visits or data entries at any given visit, including participants that did not have a complete set of baseline clinical scores (EDSS, MSFC, 9HP, T25W, PASAT) or missing baseline T2LV or BPF

  • Diagnosis of PPMS

  • Previous treatment with MBP8298

  • History of malignancy

  • Steroid therapy within 30 days of study entry

  • Treatment with beta‐interferon

  • Glatiramer acetate within 3 months

  • Mitoxantrone, cyclophosphamide, methotrexate, azathioprine, or any other immunomodulating or immunosuppressive drugs or plasma exchange within 6 months prior to the first study‐specific test with the exception of corticosteroids or ACTH for relapse treatment

  • Initiation or discontinuation of 4‐AP or 3,4‐DAP at any time during the study

  • History of anaphylactic/anaphylactoid reactions to glatiramer acetate or Gd‐DTPA

  • Abnormal baseline results deemed clinically significant by the investigator

  • Any condition that could interfere with the performance of study‐specific procedures and any other condition that, in the investigator’s opinion, would make the individual unsuitable for participation.


Recruitment
  • MBP8298 RCT participants from 47 centres across 10 countries

  • Canada, United Kingdom, Netherlands, Sweden, Denmark, Finland, Germany, Estonia, Latvia, Spain


Age (years)
Mean 50.9
Sex (%F)
64.1
Disease duration (years)
Mean 9.3 (SD 5.0)
Diagnosis
100% SPMS
Diagnostic criteria
Own definition
Treatment
  • At recruitment, not reported

  • During follow‐up, 50% on MBP8298


Disease description
EDSS median (IQR): 6.0 (4.5 to 6.5)
Recruitment period
2004 to 2009
Predictors Considered predictors
Timed 25‐foot walk (T25W), 9HPT, Paced Auditory Serial Addition Test (PASAT), EDSS, disease duration, age, sex, T2 lesion volume (T2LV), brain parenchymal fraction (BPF)
Number of considered predictors
9
Timing of predictor measurement
At study baseline (RCT)
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): confirmed disability progression defined as an increase in EDSS (≥ 1.0 or ≥ 0.5 for baseline EDSS ≤ 5.5 or ≥ 6, respectively) sustained for 6 months
Timing of outcome measurement
At 2 years
Missing data Number of participants with any missing value
Unclear exactly how many participants had any missing value
  • 1 participant with a missing value (disease duration) had it imputed

  • 127 participants were excluded due to missing values


Missing data handling
Mixed: mean imputation for single patient's disease duration, exclusion
Analysis Number of participants (number of events)
485 (115)
Modelling method
  • DT: decision tree

  • RF: random forest

  • Ada: boosting, AdaBoost


Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
  • DT: nested 5‐fold cross‐validation for minimum size of each child node necessary for splitting a node (considered 5, 10, or 15%)

  • RF and Ada: nested 5‐fold cross‐validation for number of models to include in ensemble (considered 2, 5, or 10)


Shrinkage of predictor weights
  • DT: none

  • RF and Ada: modelling method


Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 10‐fold
Calibration estimate
Not reported
Discrimination estimate
  • c‐Statistic:

    • DT: 0.618 (SD 0.03)

    • RF: 0.607 (SD 0.031)

    • Ada: 0.602 (SD 0.031)


Classification estimate
  • DT: cutoff (0.537) identified by convex hull method, sensitivity = 58.3 (SD 4.6), specificity = 62.2 (SD 2.5), PPV = 32.4 (SD 2.0), NPV = 82.7 (SD 1.8)

  • RF: cutoff (0.531) identified by convex hull method, sensitivity = 59.1 (SD 4.6), specificity = 61.1 (SD 2.5), PPV = 32.1 (SD 2.1), NPV = 82.8 (SD 1.7)

  • Ada: cutoff (0.527) identified by convex hull method, sensitivity = 53.0 (SD 4.7), specificity = 62.4 (SD 2.5), PPV = 30.5 (SD 1.6), NPV = 81.1 (SD 1.9)


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
9
Predictors in the model
Timed 25‐foot walk (T25W), 9HPT, Paced Auditory Serial Addition Test (PASAT), EDSS, disease duration, age, sex, T2 lesion volume (T2LV), brain parenchymal fraction (BPF)
Effect measure estimates
Not reported
Predictor influence measure
Mean % feature contribution/importance
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To evaluate individual and ensemble model performance built using decision tree (DT)‐based algorithms compared to logistic regression (LR) and support vector machines (SVMs) for predicting SPMS disability progression
Primary aim
The primary aim of this study is somehow the prediction of individual outcomes. The focus is on modelling methods.
Model interpretation
Exploratory
Suggested improvements
Bigger samples, more predictors with non‐linear relationships with progression, using random trees instead of simple DTs in AdaBoost
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
Auxiliary references
Freedman MS, Bar‐Or A, Oger J, Traboulsee A, Patry D, Young C, et al. A phase III study evaluating the efficacy and safety of MBP8298 in secondary progressive MS. Neurology 2011;77(16):1551‐60.
 
Item Authors' judgement Support for judgement
Participants No Data from an RCT were used, but only complete cases were included. Around 10% of patients were excluded due to missing values, and it is unclear if the excluded patients differed from the included patients.
Predictors Yes The predictors were collected during an RCT; therefore, they are expected to be collected in the same way across all patients.
Outcome Yes The outcome was standard and was assessed during an RCT. We expect EDSS assessment to be objective and do not think that predictor knowledge influences results.
Analysis No The EPV was close to 10. Discrimination was addressed, but not calibration. The final model is unclear.
Overall No At least one domain is at high risk of bias.
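
Hyperparameters here were tuned by a nested 5‐fold cross‐validation inside a 10‐fold cross‐validation used to estimate the c‐statistic. A minimal sketch of that nested scheme for the random forest variant is given below; `X` and `y` are hypothetical, and the candidate ensemble sizes are the values (2, 5, 10) reported above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # hyperparameter tuning
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # performance estimation
tuned_rf = GridSearchCV(RandomForestClassifier(random_state=0),
                        param_grid={"n_estimators": [2, 5, 10]},
                        cv=inner, scoring="roc_auc")
auc = cross_val_score(tuned_rf, X, y, cv=outer, scoring="roc_auc")
print(f"c-statistic: {auc.mean():.3f} (SD {auc.std():.3f})")
```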

Lejeune 2021.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
  • Dev: randomised trial participants, secondary

  • Ext Val: routine care, secondary


Study type
Development + external validation, location
Participants Inclusion criteria
  • 18 to 55 years of age

  • RRMS diagnosis

  • Relapse with available data about its clinical presentation

  • EDSS ≤ 5 before relapse

  • Pre‐ and post‐relapse EDSS scores available


Exclusion criteria
  • Dev: use of natalizumab, mitoxantrone, cyclophosphamide

  • Ext Val: not reported


Recruitment
  • Dev: participants in the COPOUSEP (Corticothérapie Orale dans les Poussées de Sclérose en Plaques) trial, an RCT run in 14 centres (NCT00984984), France

  • Ext Val: Bordeaux University Hospital, France


Age (years)
  • Dev: mean 35.3 (unclear when)

  • Ext Val: mean 36.2 (unclear when)


Sex (%F)
  • Dev: 76.3

  • Ext Val: 76.6


Disease duration (years)
  • Dev: mean 7.32 (SD 5.5)

  • Ext Val: mean 7.62 (SD 6.56)


Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2005 (Polman 2005)
Treatment
  • Dev:

    • At recruitment, 51.1% first line, 3.8% second line, 45.2% no treatment

    • During follow‐up, unclear, 32.8% therapeutic escalation, 59.1% no DMT change

  • Ext Val:

    • At recruitment, 33.9% first line, 24.7% second line, 41.4% no treatment

    • During follow‐up, unclear, 48% therapeutic escalation, 49.1% no DMT change


Disease description
  • Dev: EDSS mean (SD): 3.45 (0.96)

  • Ext Val: EDSS mean (SD): 2.93 (1.00)


Recruitment period
  • Dev: 2008 to 2013

  • Ext Val: 2005 to 2016

Predictors Considered predictors
  • Dev: sex, age, disease duration, DMT (unclear if binary or none, first line, second line), EDSS 3 to 6 months prior to relapse, EDSS at relapse, difference between relapse EDSS and prior EDSS, relapse phenotype (at least a dummy for each of: 1) motor (motor disorders or isolated irritative pyramidal signs), 2) sensory (subjective sensory disturbances corresponding to paraesthesia, objective sensory disturbances corresponding to anaesthesia or hypoesthesia), 3) gait or balance disorder related to proprioceptive ataxia, 4) visual, 5) bladder/bowel, 6) cerebellar, 7) brainstem, 8) cognitive disorders, and 9) multifocal symptoms)

  • Ext Val: not applicable


Number of considered predictors
  • Dev: between 14 and 19 (unclear transformations)

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at study baseline (RCT, relapse) or retrospectively at screening

  • Ext Val: not applicable


Predictor handling
  • Dev:

    • Unclear, disease duration continuously (ln transformed), age and EDSS prior to relapse dichotomised or categorised, EDSS change categorised

    • No interactions considered

  • Ext Val: not applicable

Outcome Outcome definition
Disability (EDSS): residual disability at 6 months after relapse defined as an increase of at least 1 EDSS point compared with pre‐relapse EDSS
Timing of outcome measurement
At 6 months
Missing data Number of participants with any missing value
  • Dev: ≥ 29, unclear exactly how many participants had any missing value

  • Ext Val: ≥ 782, unclear exactly how many participants had any missing value


Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • Dev: 186 (53)

  • Ext Val: 175 (55)


Modelling method
  • Dev: logistic regression, LASSO

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, modelling method

  • Ext Val: not applicable


Hyperparameter tuning
  • Dev: penalty tuning parameter estimated by 5‐fold cross‐validation

  • Ext Val: not applicable


Shrinkage of predictor weights
  • Dev: modelling method

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: bootstrap, B = 1000

  • Ext Val: not applicable


Calibration estimate
  • Dev: not reported

  • Ext Val: calibration plot, Hosmer‐Lemeshow test


Discrimination estimate
  • Dev: c‐statistic = 0.82 (95% CI 0.73 to 0.91)

  • Ext Val: c‐statistic = 0.71 (95% CI 0.62 to 0.80)


Classification estimate
  • Dev: cutoff = 0.5, PPV 0.73 (95% CI 0.53 to 0.92), NPV 0.70 (95% CI 0.50 to 0.88)

  • Ext Val: cutoff = 0.5, PPV 0.83 (95% CI 0.76 to 0.92), NPV 0.74 (95% CI 0.67 to 0.81)


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation

Number of predictors in the model
  • Dev: 6 (7 df)

  • Ext Val: not applicable


Predictors in the model
  • Dev: increased EDSS during relapse, pre‐relapse EDSS at 0, age, proprioceptive ataxia, subjective sensory disorder, disease duration

  • Ext Val: not applicable


Effect measure estimates
  • Dev: OR (SD): increased EDSS during relapse from 1.5 points to 2.5 points 1.08 (1.58), increased EDSS during relapse of 3 or more 4.98 (9.23), pre‐relapse EDSS at 0 points 1.75 (0.29), age 1.29 (1.65), proprioceptive ataxia 1.05 (0.95), subjective sensory disorder 0.51 (0.17), natural log of disease duration 0.73 (0.12)

  • Ext Val: not applicable


Predictor influence measure
  • Dev: not reported

  • Ext Val: not applicable


Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: none

Interpretation  Aim of the study
To develop and validate a clinical‐based model for predicting the risk of residual disability at 6 months post‐relapse in MS
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Confirmatory
Suggested improvements
Not reported
Notes Applicability overall
Low
Auxiliary references
NCT00984984. Efficacy and safety of methylprednisolone per os versus IV for the treatment of multiple sclerosis (MS) relapses. https://ClinicalTrials.gov/show/NCT00984984 (first received 25 September 2009).
 
Item Authors' judgement Support for judgement
Participants No Dev: RCT data, considered to be a valid source, were used for modelling. However, > 5% of participants were excluded for missing data.
Ext Val: The use of data from routine clinical care may introduce bias. Of the 978 people in the registry, 781 were excluded for missing data. Due to the exclusion of this significant number of participants, it is unclear whether the model results are generalisable.
Predictors Yes Dev: Due to the RCT nature, predictors should have been defined and assessed in a similar way across participants. Due to the prospective nature of the RCT, predictors were collected without knowledge of the outcome. The authors specifically set out to create a prediction model in which all predictors were readily available at baseline.
Ext Val: There is no reason to suspect differential or post‐outcome assessment of the predictors in this routine hospital dataset from a single centre.
Outcome Yes We consider the outcome, which is based on EDSS, to be robust to sources of bias, such as knowledge of predictors at outcome assessment.
Analysis No Dev: The EPV was less than 10. Continuous predictors were dichotomised or possibly categorised without clear explanation. It was unclear how missing predictor data were handled, other than exclusion (handled in Participants section). Although calibration measures for the development set were not reported, they were reported for the external validation set of the same publication.
Ext Val: The number of events was fewer than 100. It was unclear how missing predictor data were handled other than exclusion, which was handled in the Participants section.
Overall No At least one domain is at high risk of bias.
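
The development model is a LASSO‐penalised logistic regression with the penalty tuned by 5‐fold cross‐validation. A minimal scikit‐learn sketch of that setup follows; `X` (baseline candidate predictors) and `y` (residual disability at 6 months) are hypothetical, and standardising the predictors before penalisation is an assumption of this sketch rather than a documented step of the original analysis.

```python
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# L1-penalised logistic regression; the penalty strength is chosen by 5-fold CV.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=50, cv=5,
                         scoring="roc_auc", max_iter=5000),
)
model.fit(X, y)
lasso = model[-1]
print(lasso.C_)      # selected penalty strength
print(lasso.coef_)   # predictors shrunk exactly to zero are effectively dropped
```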

Malpas 2020.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
Registry, secondary
Study type
Development + external validation, location
Participants Inclusion criteria
  • Diagnosis of clinically definite relapse‐onset MS

  • Age at onset ≥ 18 years

  • First EDSS recorded within 12 months of symptom onset

  • At least 2 recorded EDSS scores within 10 years of symptom onset

  • At least 10 years of observation time based on last recorded EDSS


Exclusion criteria
Not reported
Recruitment
  • Dev: participants from 139 clinical centres in 34 countries in the MSBase registry

  • Ext Val: participants in the Swedish MS Registry, Sweden


Age (years)
  • Dev: mean 31.7 (onset)

  • Ext Val: mean 33.4 (onset)


Sex (%F)
  • Dev: 71.2

  • Ext Val: not reported


Disease duration (years)
  • Dev: mean 0.33 (SD 0.3)

  • Ext Val: not reported


Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2010 (Polman 2011)
Treatment
  • Dev:

    • At recruitment, unclear number of participants, mean % time on treatment 1st year, first‐line 17.1%, second‐line 0.50%

    • During follow‐up, unclear number of participants, mean % time on treatment 10th year, 46% first‐line, 5.3% second‐line

  • Ext Val: not reported


Disease description
  • Dev: first year EDSS mean (SD): 1.78 (1.26), number of relapses mean (SD): 0.74 (0.93)

  • Ext Val: first year EDSS mean (SD): 1.51 (1.28)


Recruitment period
Not reported
Predictors Considered predictors
  • Dev: gender, age at symptom onset, within the first year of symptom onset: median EDSS, any hospitalisation associated with a relapse, any treatment with steroids, number of severe relapses, number of any relapses, pyramidal signs, bowel/bladder signs, cerebellar signs, incomplete recovery from a relapse, nuisance variables: disease duration at first visit, total observation time, proportion of time over the first year on first‐line therapy, proportion of time over the first year on second‐line therapy, proportion of time over the 10‐year observation period on first‐line therapy, proportion of time over the 10‐year observation period on second‐line therapy

  • Ext Val: not applicable


Number of considered predictors
  • Dev: 17

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at symptom onset, at visits up to 1 year following symptom onset, and at final follow‐up

  • Ext Val: not applicable


Predictor handling
  • Dev:

    • Continuously

    • No interactions considered

  • Ext Val: not applicable

Outcome Outcome definition
Disability (EDSS): aggressive MS defined as all of (i) EDSS ≥ 6 reached within 10 years of symptom onset, (ii) EDSS ≥ 6 confirmed and sustained over ≥ 6 months, and (iii) EDSS ≥ 6 sustained until the end of follow‐up (≥ 10 years)
Timing of outcome measurement
  • Dev: at 10 years after onset (adjustment for time on study), time from onset to meeting aggressive disease criteria mean (SD, range): 6.05 years (2.79 years, 0 years to 9.89 years)

  • Ext Val: at 10 years after onset

Missing data Number of participants with any missing value
  • Dev: 56,081, unclear exactly how many participants had any missing value

  • Ext Val: not reported


Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • Dev: 2403 (145)

  • Ext Val: 556 (34)


Modelling method
  • Dev: ensemble, Bayesian model averaging with binomial distribution

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, posterior inclusion probability > 0.5

  • Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: modelling method

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: apparent

  • Ext Val: not applicable


Calibration estimate
Not reported
Discrimination estimate
  • Dev:

    • Full model: c‐statistic = 0.82 (95% CI 0.78 to 0.85)

    • Final model: c‐statistic = 0.80 (95% CI 0.75 to 0.84)

  • Ext Val: c‐statistic = 0.75 (95% CI 0.66 to 0.84)


Classification estimate
  • Dev:

    • Full model: cutoff = 0.05, sensitivity = 0.78, specificity = 0.71, PPV = 0.15, NPV = 0.98

    • Reduced model: cutoff = 0.06, sensitivity = 0.72, specificity = 0.73, PPV = 0.15, NPV = 0.98 (see the worked example below)

  • Ext Val: cutoff determined in development set (0.06), PPV = 0.15, NPV = 0.97
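As an illustration of how the low PPV arises despite reasonable discrimination, the reported PPV and NPV can be approximately reproduced from the sensitivity, specificity, and the event prevalence in the development set (145 events among 2403 participants). A minimal sketch, using the rounded reduced‐model figures at the 0.06 cutoff as if they were exact:

```python
# Reproduce PPV and NPV from sensitivity, specificity and prevalence (Bayes' rule).
# Inputs are the rounded figures reported above, so the result is only approximate.
sensitivity = 0.72                 # reduced model at the 0.06 cutoff
specificity = 0.73
prevalence = 145 / 2403            # events / participants in the development set

ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
npv = (specificity * (1 - prevalence)) / (
    specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)
print(f"PPV ~ {ppv:.2f}, NPV ~ {npv:.2f}")   # approx. 0.15 and 0.98, matching the table
```

The low PPV mainly reflects the low outcome prevalence of about 6%, not poor discrimination.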


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • Dev:

    • Relative risk of aggressive disease by number of positive signs in simplified model (dichotomised based on individual optimal thresholds)

    • BMA coefficients for larger model without intercept and 17th coefficient

  • Ext Val: not applicable


Number of predictors in the model
  • Dev:

    • Reduced model: 3

    • Full model: 17

  • Ext Val: not applicable


Predictors in the model
  • Dev: onset age, median EDSS in first year, pyramidal signs

  • Ext Val: not applicable


Effect measure estimates
  • Dev: log OR (95% credible interval) for reduced model: intercept −3.54 (−3.85 to −3.24), onset age 0.06 (0.04 to 0.08), median EDSS in first year 0.47 (0.35 to 0.59), pyramidal signs 0.80 (0.40 to 1.20); these coefficients are applied in the sketch below

  • Ext Val: not applicable
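For illustration, the reduced‐model coefficients above can be combined into a predicted probability via the inverse logit. This is a minimal sketch only: it assumes the coefficients apply to onset age in years and to the raw first‐year median EDSS without any centring or scaling, which is not reported in this table, so absolute risks from this function should not be taken at face value.

```python
import math

# Reduced-model posterior-mean coefficients (log-odds scale) as reported above.
COEF = {"intercept": -3.54, "onset_age": 0.06, "median_edss_y1": 0.47, "pyramidal": 0.80}

def predicted_risk(onset_age, median_edss_y1, pyramidal_signs):
    """Predicted probability of aggressive MS for a hypothetical individual.
    Assumes uncentred, unscaled predictors; the original coding is not reported here."""
    lp = (COEF["intercept"]
          + COEF["onset_age"] * onset_age
          + COEF["median_edss_y1"] * median_edss_y1
          + COEF["pyramidal"] * (1.0 if pyramidal_signs else 0.0))
    return 1.0 / (1.0 + math.exp(-lp))   # inverse logit

# Hypothetical individual: onset at age 35, median first-year EDSS of 2, pyramidal signs present.
print(round(predicted_risk(35, 2, True), 3))
```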


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext. Val: none

Interpretation  Aim of the study
To evaluate whether patients who will develop aggressive multiple sclerosis can be identified based on early clinical markers
Primary aim
The primary aim of this study is in part the prediction of individual outcomes. The focus is on predictor identification.
Model interpretation
Probably confirmatory
Suggested improvements
Add MRI and CSF data
Notes Applicability overall
  • Dev: high

  • Ext Val: unclear


Applicability overall rationale
  • Dev: this study included participants who had already experienced the outcome at baseline.

  • Ext Val: it is unclear whether patients who had already experienced the outcome at baseline were included in the validation set.


Auxiliary references
Butzkueven H, Chapman J, Cristiano E, Grand'Maison F, Hoffmann M, Izquierdo G, et al. MSBase: an international, online registry and platform for collaborative outcomes research in multiple sclerosis. Mult Scler 2006;12(6):769‐74.
 
Item Authors' judgement Support for judgement
Participants No Dev: The data source was reported unclearly: it was called a registry but also a cohort study. Although the authors referred to a quality assurance paper for the data source, the cited article only described the quality assurance system in general terms. It was also not reported what happened when centres or observations deviated from the quality standards. Inappropriate inclusion of participants who had already experienced the outcome at baseline may bias the predictions and complicate their interpretation, regardless of whether the model estimated change. The sensitivity analysis only addressed whether the same predictors were included, not whether the predictions changed.
Ext Val: The data source was a registry with inclusion/exclusion depending on the length of follow‐up. Also, it is unclear whether participants had the outcome at baseline, as in the development.
Predictors Yes The final model is simple, and its predictors are available at the intended time of use. The included predictors are simple enough to be considered objective and were collected up to 1 year after symptom onset. We consider the period up to 1 year after onset to still be part of onset. For this reason, and because the logistic model, unlike a survival model, does not require a starting point, the predictors are considered to be available at the time of model application.
Outcome Yes The outcome was pre‐specified. It was based on EDSS, which we consider to be relatively robust, so we are not concerned about the possible lack of blinding of the outcome assessor to the patient history. Participants with the outcome at baseline were included, but this was addressed in the Participants section.
Analysis No Dev: The EPV was less than 10. Complete case analysis was performed. No calibration measures were reported, and only apparent discrimination was reported. External validation was done in the same paper, but also without calibration. No assessment of the need for shrinkage was done. The model coefficients were provided through correspondence with the authors, but in the paper the reduced model was presented only as a chart of relative risks for combinations of predictors.
Ext Val: The event number was fewer than 100. Complete case analysis was performed. No calibration measures were reported.
Overall No At least one domain is at high risk of bias.

Mandrioli 2008.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
Cohort, secondary
Study type
Development + external validation, time
Participants Inclusion criteria
  • RRMS course at onset

  • Neurological follow‐up and EDSS evaluation at least every 6 months for at least 10 years


Exclusion criteria
  • PPMS

  • No regular neurological follow‐up

  • No CSF sample available


Recruitment
Consecutive patients identified during regular follow‐ups at the Neurology Clinic of Modena University Hospital, Italy
Age (years)
  • Dev: mean 27.6 (onset)

  • Ext Val: mean 33.0 (onset)


Sex (%F)
  • Dev: 60.9

  • Ext Val: 61.5


Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, not reported

  • During follow‐up:

    • Dev: 34.4% IFN‐b, 60.9% azathioprine, 18.8% iv mitoxantrone, 29.7% never treated

    • Ext Val: 46.2% IFN‐b, 36.9% azathioprine, 16.9% iv mitoxantrone, 32.3% never treated


Disease description
  • Dev: EDSS at diagnosis mean (SD): BMS 1.76 (0.24), SMS 2.17 (0.18)

  • Ext Val: EDSS at diagnosis mean (SD): BMS 1.65 (0.10), SMS 2.45 (0.23)


Recruitment period
  • Dev: 2003 to 2004

  • Ext Val: not reported

Predictors Considered predictors
  • Dev: IgG OB presence or absence, IgM OB presence or absence, increased IgG index, increased IgM index, gender, age at onset (> 30, < 31), sensory symptoms at onset, motor symptoms at onset, optic neuritis at onset, brainstem or cerebellar symptoms at onset, time to second relapse (> 24, < 25 months), time to second relapse in months, EDSS score at diagnosis (unclear when these were dropped: IgG OB number, IgM OB number)

  • Ext Val: not applicable


Number of considered predictors
  • Dev: 15

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at disease onset (RRMS)

  • Ext Val: not applicable


Predictor handling
  • Dev: age at onset dichotomised, time to second relapse tested as dichotomised and continuously, IgM and IgG indices and their OB number as dichotomised (justified on the basis of its dependence on the varying laboratory method), EDSS score at diagnosis continuously

  • Ext Val: not applicable

Outcome Outcome definition
Disability (EDSS): severe MS (SMS) defined as an EDSS score of 4 or more after a disease duration of 10 years or less, benign MS (BMS) otherwise (Kurtzke 1977 criteria); progression to a new EDSS score had to be confirmed in 2 consecutive examinations
Timing of outcome measurement
  • Dev: unclear when the outcome was measured relative to study start, follow‐up from onset mean (SD): BMS 16.03 years (0.92 years), SMS 13.62 years (0.80 years)

  • Ext Val: unclear when the outcome was measured relative to study start, follow‐up from onset mean (SD): BMS 11.36 years (0.61 years), SMS 11.65 years (1.15 years)

Missing data Number of participants with any missing value
  • Dev: 29

  • Ext Val: 39


Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • Dev: 64 (26)

  • Ext Val: 65 (20)


Modelling method
  • Dev: logistic regression

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, univariable analysis

    • During multivariable modelling, significance

      • P value < 0.05

  • Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: none

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: apparent

  • Ext Val: not applicable


Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • Dev: error = 0.0937, sensitivity = 0.8846, specificity = 0.9211, PPV = 0.8846, NPV = 0.9211

  • Ext Val: error = 0.1231, sensitivity = 0.8000, specificity = 0.9111, PPV = 0.8000, NPV = 0.9111


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • Dev: full regression model

  • Ext Val: not applicable


Number of predictors in the model
  • Dev: 4

  • Ext Val: not applicable


Predictors in the model
  • Dev: CSF IgM OB presence, motor symptoms at onset, sensory symptoms at onset, time to second relapse in months

  • Ext Val: not applicable


Effect measure estimates
  • Dev:

    • OR (95% CI) IgM OB presence 0.02 (0.00 to 0.16), motor symptoms at onset 0.04 (0.00 to 0.43), sensory symptoms at onset 169.27 (6.95 to 4120.44), time to second relapse in months 0.96 (0.93 to 1.00)

    • Unclear/inconsistent reporting: linear predictor formula = 3.31 pyramidal – 5.13 sensory – 0.03 time + 3.86 IgMOB – 0.76 (compared with the reported ORs in the sketch below)

  • Ext Val: not applicable
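The inconsistency noted above (different signs between the OR table and the linear predictor formula) can be made explicit by converting the reported ORs to the log‐odds scale and comparing them with the formula coefficients. A minimal sketch; mapping "pyramidal" in the formula to "motor symptoms at onset" in the OR table is our assumption:

```python
import math

# Reported ORs converted to log odds ratios, set against the published
# linear-predictor coefficients; the magnitudes are similar but several signs flip.
reported_or = {"IgM OB presence": 0.02, "motor symptoms at onset": 0.04,
               "sensory symptoms at onset": 169.27, "time to 2nd relapse (months)": 0.96}
formula_coef = {"IgM OB presence": 3.86, "motor symptoms at onset": 3.31,
                "sensory symptoms at onset": -5.13, "time to 2nd relapse (months)": -0.03}

for name, or_value in reported_or.items():
    print(f"{name}: log(OR) = {math.log(or_value):+.2f}, "
          f"formula coefficient = {formula_coef[name]:+.2f}")
```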


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: unclear

Interpretation  Aim of the study
To create a multifactorial prognostic index (MPI) providing the probability of a severe MS course at diagnosis based on clinical and immunological CSF parameters
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
Include MRI data; validate in a large, prospective cohort
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants No Data were retrospectively collected and the data source is not clearly reported. Exclusion criteria included follow‐up and predictor availability.
Predictors Yes The study authors reported analysing immunological and clinical data blindly. CSF immunological assessments were performed twice by 2 neurologists. They specifically chose to include only predictors available at RRMS diagnosis in the final model.
Outcome Yes The outcome was defined in an independent study (Kurtzke 1977) and was based on EDSS, so we consider it a hard outcome with little risk of bias. Furthermore, the timing of 10 years is a reasonable amount of time for reaching EDSS 4.
Analysis No Dev: EPV was less than 10. Missing data were addressed by exclusion and handled in the Participants section. Neither discrimination nor calibration was addressed. Univariate analyses were used for variable selection. It was unclear whether the predictors and their assigned weights in the final model corresponded to the results from multivariable analysis because the OR measures provided in the results table had different signs than the model formula provided in the text. Although an external validation dataset was used in the same study, only classification measures related to it were reported. The need for shrinkage was not assessed.
Ext Val: The number of participants was fewer than 100. Neither discrimination nor calibration was addressed. Participants with missing data were excluded and handled in the Participants section.
Overall No At least one domain is at high risk of bias.

Manouchehrinia 2019.

Study characteristics
General information Model name
  • Dev

  • Ext Val 1

  • Ext Val 2

  • Ext Val 3


Primary source
Journal
Data source
  • Dev: registry, secondary

  • Ext Val 1: cohort, secondary

  • Ext Val 2 and Ext Val 3: randomised trial participants, secondary


Study type
Development + external validation, multiple (location, time, spectrum)
Participants Inclusion criteria
  • Dev:

    • Swedish MS registry (SMSreg)

    • Born between 1940 and 2000

    • Initial relapsing‐remitting disease course

    • At least 1 EDSS score recorded within the RRMS phase

  • Ext Val 1:

    • SMSreg

    • Enrolled in the database from 1 January 1980 to 31 December 2004

  • Ext Val 2:

    • SMSreg

    • Visit for assessment at year 10

  • Ext Val 3:

    • SMSreg


Exclusion criteria
Not reported
Recruitment
  • Dev: national MS registry containing data on about 80% of all prevalent cases of MS in Sweden

  • Ext Val 1: 4 original MS clinics in British Columbia, containing an estimated 80% of the MS population in the province, Canada

  • Ext Val 2: participants in the ACROSS study, a multicentre phase 2 RCT from 32 centres in 10 European countries (from clinicaltrials.gov: Denmark, France, Germany, Italy, Poland, Portugal, Spain, Switzerland, United Kingdom) and Canada

  • Ext Val 3: participants in the FREEDOMS and FREEDOMS II extension study, an open‐label, single‐arm, long‐term follow‐up extension study of the phase 3 trials FREEDOMS and FREEDOMS II run at 138 centres in 22 countries


Age (years)
  • Dev: mean 31.5 (onset)

  • Ext Val 1: mean 31.1 (onset)

  • Ext Val 2: mean 29.5 (onset)

  • Ext Val 3: mean 29.9 (onset)


Sex (%F)
  • Dev: 72

  • Ext Val 1 and Ext Val 3: 74

  • Ext Val 2: 67


Disease duration (years)
Unclear
Diagnosis
100% RRMS
Diagnostic criteria

Treatment
  • Dev:

    • At recruitment, unclear, a minority

    • During follow‐up, unclear number of participants, median duration of exposures first‐line 3, second‐line 0.8

  • Ext Val 1:

    • At recruitment, not reported

    • During follow‐up, unclear number of participants, median 0

  • Ext Val 2 and Ext Val 3:

    • At recruitment, 0%

    • During follow‐up, 100% on DMT


Disease description
  • Dev and Ext Val 1: first‐recorded EDSS median (IQR): 2 (1 to 3)

  • Ext Val 2: first‐recorded EDSS median (IQR): 2 (1.5 to 3)

  • Ext Val 3: first‐recorded EDSS median (IQR): 2 (1.5 to 3.5)


Recruitment period
  • Dev: up to 2016

  • Ext Val 1: 1980 to 2004

  • Ext Val 2: 2003 to 2015

  • Ext Val 3: 2006 to 2018

Predictors Considered predictors
  • Dev: calendar year of birth, sex, onset age, first‐recorded EDSS score (linear spline with a single knot at 4), age at the first‐recorded EDSS score, duration of exposure to first‐line DMTs, duration of exposure to second‐line DMTs, complete recovery from the first relapse, monofocal or polyfocal type of the first attack, sensory/sensory and motor/motor type of the first attack, relapse rate within the first 2 years from disease onset, relapse rate within the first 5 years from disease onset, relapse rate before the first EDSS score, total number of brain T2 lesions (ref 0/1 to 9/10 to 20/> 20), number of brain gadolinium–enhanced lesions (ref 0/1 to 2/> 2)

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Number of considered predictors
  • Dev: 20

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Timing of predictor measurement
  • Dev: from disease onset (RRMS) to first EDSS recorded (median 2 years)

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Predictor handling
  • Dev: first‐recorded EDSS score as linear spline with a single knot, total number of brain T2 lesions, and number of brain gadolinium–enhanced lesions as categorised

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable

Outcome Outcome definition
  • Dev and Ext Val 1:

    • Conversion to progressive MS (Lublin 1996): the earliest recognised date of SPMS onset determined by a neurologist at a routine clinic visit

  • Ext Val 2:

    • Conversion to progressive MS (Lublin 1996): the earliest recognised date of SPMS onset determined by a neurologist at a 10‐year follow‐up

  • Ext Val 3: conversion to progressive MS:

    • SPMS defined post hoc as a progressive increase in EDSS (by ≥ 1 from an initial score of 3 to 5 or by ≥ 0.5 for score ≥ 5.5) for at least 6 months duration in the absence or independent of relapses; SPMS not assigned to individuals below EDSS 3


Timing of outcome measurement
  • Dev: follow‐up mean (SD): 12.5 years (8.7 years)

  • Ext Val 1: follow‐up mean (SD): 13.8 years (8.4 years)

  • Ext Val 2: follow‐up mean (SD): 18.6 years (7.9 years)

  • Ext Val 3: follow‐up mean (SD): 14 years (7.8 years)

Missing data Number of participants with any missing value
  • Dev: ≥ 7684, unclear exactly how many participants have any missing

  • Ext Val 1: ≥ 106, unclear exactly how many participants have any missing

  • Ext Val 2 and Ext Val 3: not reported


Missing data handling
  • Dev and Ext Val 1

    • Mixed: complete case, exclusion

  • Ext Val 2, and Ext Val 3

    • Complete case

Analysis Number of participants (number of events)
  • Dev: 8825 (1488)

  • Ext Val 1: 3967 (888)

  • Ext Val 2: 175 (26)

  • Ext Val 3: 2355 (126)


Modelling method
  • Dev: survival, Gaussian

  • Ext Val 1, Ext Val 2 and Ext Val 3: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, stepwise selection:

      • P value < 0.05

      • backward

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: none

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val 1, Ext Val 2, and Ext Val 3: external validation


Performance evaluation method
  • Dev: bootstrap, B = 200

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Calibration estimate
  • Dev: calibration plot

  • Ext Val 1, Ext Val 2, and Ext Val 3: not reported


Discrimination estimate
  • Dev: Harrell's c‐statistic 0.84 (95% CI 0.83 to 0.85)

  • Ext Val 1: Harrell's c‐statistic 0.77 (95% CI 0.76 to 0.78)

  • Ext Val 2: Harrell's c‐statistic 0.77 (95% CI 0.70 to 0.85)

  • Ext Val 3: Harrell's c‐statistic 0.87 (95% CI 0.84 to 0.89)


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • Dev: nomogram

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Number of predictors in the model
  • Dev: 5 (6 df)

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Predictors in the model
  • Dev: calendar year of birth, male sex, onset age, first‐recorded EDSS score, age at the first‐recorded EDSS score

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable


Effect measure estimates
  • Dev: log HR (95% CI): calendar year of birth 0.21 (0.16 to 0.28), male sex −1.24 (−1.79 to −0.70), onset age −0.93 (−0.96 to −0.89), lower EDSS spline −2.42 (−2.66 to −2.18), upper EDSS spline 1.42 (0.89 to 1.96), age at the first‐recorded EDSS score 0.88 (0.82 to 0.94); the two EDSS spline terms are illustrated in the sketch below

  • Ext Val 1, Ext Val 2, and Ext Val 3: not applicable
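The first‐recorded EDSS score enters the model as a linear spline with a single knot at 4, which is why a lower and an upper EDSS spline coefficient are reported above. The sketch below shows one common parameterisation of such a spline basis; the exact basis used by the authors, and how the other predictors were coded or centred, are not reported here, so this is illustrative only.

```python
def edss_spline(edss, knot=4.0):
    """Linear spline basis with a single knot (one common parameterisation):
    a term capped at the knot and a term for the excess above it. The model
    estimates one coefficient per term, giving the 'lower' and 'upper'
    EDSS spline estimates reported above."""
    lower = min(edss, knot)
    upper = max(edss - knot, 0.0)
    return lower, upper

# Hypothetical first-recorded EDSS scores below, at, and above the knot.
for score in (1.5, 4.0, 6.5):
    print(score, edss_spline(score))
```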


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext Val 1, Ext Val 2, and Ext Val 3: none

Interpretation  Aim of the study
To design a nomogram, a prediction tool, to predict the individual’s risk of conversion to secondary progressive multiple sclerosis (SPMS) at the time of multiple sclerosis (MS) onset
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Not reported
Notes Applicability overall
Low
Auxiliary references
Calabresi PA, Radue EW, Goodin D, Jeffery D, Rammohan KW, Reder AT, et al. Safety and efficacy of fingolimod in patients with relapsing‐remitting multiple sclerosis (FREEDOMS II): a double‐blind, randomised, placebo‐controlled, phase 3 trial. Lancet Neurol 2014;13(6):545‐56.
Derfuss T, Sastre‐Garriga J, Montalban X, Rodegher M, Wuerfel J, Gaetano L, et al. The ACROSS study: long‐term efficacy of fingolimod in patients with relapsing‐remitting multiple sclerosis. Mult Scler J Exp Transl Clin 2020;6(1):2055217320907951.
Hillert J, Stawiarz L. The Swedish MS registry – clinical support tool and scientific resource. Acta Neurol Scand 2015;132(199):11‐9.
Kappos L, Antel J, Comi G, Montalban X, O'Connor P, Polman C H, et al. Oral fingolimod (FTY720) for relapsing multiple sclerosis. N Engl J Med 2006;355(11):1124‐40.
Kappos L, Radue EW, O'Connor P, Polman C, Hohlfeld R, Calabresi P, et al. A placebo‐controlled trial of oral fingolimod in relapsing multiple sclerosis. N Engl J Med 2010;362(5):387‐401.
NCT02307838. Long‐term follow‐up of fingolimod phase II study patients (ACROSS). https://clinicaltrials.gov/ct2/show/NCT02307838 (first received 4 December 2014).
 
Item Authors' judgement Support for judgement
Participants Yes Dev: We rated this domain for this analysis as having a high risk of bias. The data source was a nationwide registry; hence, it is expected to be heterogeneous. The inclusion criteria were based on the presence of a predictor.
Ext Val 1: We rated this domain for this analysis as having a high risk of bias. The inclusion criteria were based on the presence of a predictor. The data source is not very clear, although it was referred to as a cohort.
Ext Val 2: We rated this domain for this analysis as having a high risk of bias. The data source was a secondary use of an RCT. Although the initial inclusion/exclusion criteria were not reported, we expect them to be appropriate due to the inherent prospective nature. However, only participants with complete follow‐up were included for this analysis, even though survival analysis was used.
Ext Val 3: We rated this domain for this analysis as having a low risk of bias. The data source was a secondary use of an RCT. Although the initial inclusion/exclusion criteria were not reported, it is expected to be appropriate due to the inherent prospective nature. The number of patients completing the FREEDOMS studies and the number included here are the same.
Predictors No Onset age and age at the first‐recorded EDSS score were predictors in the final model. The intended time of model use was stated to be RRMS onset, but the age at the first‐recorded EDSS score was available only several years after onset, not at the time of intended model use.
Outcome Yes Dev: We rated this domain for this analysis as having a high risk of bias. The participants were seen 4 to 7 times in 5 years to 10 years, considered to be close to the expected frequency of yearly visits. However, the outcome is not clearly operationalised in the report or in the criteria referred to.
Ext Val 1: We rated this domain for this analysis as having a high risk of bias. The participants were seen 3 to 5 times in 5 years to 10 years, less than a visit per year. Since the outcome was time‐to‐event, the varying density of observations might introduce a bias. Furthermore, the outcome is not clearly operationalised in the report or in the criteria referred to.
Ext Val 2: We rated this domain for this analysis as having a high risk of bias. The outcome is not clearly operationalised in the report or in the criteria referred to.
Ext Val 3: We rated this domain for this analysis as having a low risk of bias. The outcome assessment was made outside the trial based on on‐trial EDSS information. The outcome assessment was EDSS‐based and therefore relatively robust to bias due to lack of blinding.
Analysis No Dev: Many candidate predictor values were missing, and it was not reported in which subset of patients the backward selection took place. Complete case analysis was used. The candidate predictors for the number of lesions were categorised, whereas EDSS was handled using linear splines. The bootstrap procedure for the performance measures did not include predictor selection, but external validation was done. However, the external validations did not address calibration.
Ext Val 1: Calibration was not assessed. The amount of missing data and how it was handled was not reported.
Ext Val 2: There were too few events in this validation set, and calibration was not assessed.
Ext Val 3: Calibration was not assessed. No information was reported on the handling of missing data.
Overall No At least one domain is at high risk of bias.

Margaritella 2012.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • Diagnosis of clinically definite MS according to research criteria at the time of assessment

  • 3 consecutive tests (multimodal sensory EP tests and simultaneous clinical and EDSS assessments)

  • Assessed yearly on 3 consecutive occasions


Exclusion criteria
  • Incomplete EP tests

  • Tests performed during a clinical relapse


Recruitment
Patients referred to a single MS centre in Milan, Italy
Age (years)
Mean 28.6 (onset)
Sex (%F)
79.3
Disease duration (years)
Mean 10.1 (SD 7.3)
Diagnosis
89.7% RRMS, 3.4% PPMS, 6.9% benign MS
Diagnostic criteria
Mixed: McDonald 2001, McDonald 2005 (Polman 2005)
Treatment
Not reported
Disease description
EDSS mean (SD): 2.1 (1.5)
Recruitment period
2005 to 2008
Predictors Considered predictors
mEPS (1 year lag), age, age at onset, gender, disease course type (RR, SP, PP, benign), EDSS (1 year lag)
Number of considered predictors
≥ 8 (unclear transformations)
Timing of predictor measurement
At multiple assessments consecutively for 3 years until 1 year prior to outcome
Predictor handling
Continuously, unclear: mEPS as square root
Outcome Outcome definition
Disability (EDSS): EDSS score
Timing of outcome measurement
At 1 year after the included mEPS and EDSS predictors (probably occurring over multiple yearly periods, up to 3 per patient)
Missing data Number of participants with any missing value
163
Missing data handling
Exclusion
Analysis Number of participants (number of events)
58 participants, ≤ 174 observations (continuous outcome)
Modelling method
Linear regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, unclear

    • AIC, BIC

    • One variable was excluded based on AIC/BIC during multivariable regression


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Histogram of differences between measured and predicted values
Discrimination estimate
Not applicable
Classification estimate
% predictions within ± 0.5 of observed = 0.72
Overall performance
R2 = 0.8
Risk groups
Not reported
Model  Model presentation
Full regression model
Number of predictors in the model
6
Predictors in the model
EDSS, mEPS, age at onset, gender, benign course, PP course
Effect measure estimates
Linear model coefficients (SE): EDSS 0.86 (0.589), mEPS 0.11 (0.038), age at onset −0.009 (0.014), gender 0.25 (0.201), benign course −0.26 (0.186), PP course −0.98 (0.594), intercept 19.86 (27.93)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To re‐evaluate the usefulness of mEP for short‐term prediction of the EDSS by considering mEP not as a single predictor but within a multivariate statistical approach derived from economics that can be easily implemented and tested
Primary aim
The primary aim of this study is in part the prediction of individual outcomes. The focus is on the prognostic value of multimodal EP.
Model interpretation
Probably confirmatory
Suggested improvements
Test the model on more heterogeneous patient groups and test its ability to predict beyond 1 year, including motor evoked potentials
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants No Exclusions were based on data availability because of the data source, routine clinical data.
Predictors Yes Predictors were collected according to recommended protocols at a single centre by a limited number of technicians. Even though post‐processing might have occurred after the outcome, the predictor definitions seem objective.
Outcome Yes The outcome was based on EDSS, which is considered to be an objective measure.
Analysis No Whether or not the sample size was sufficient could not be judged. There was no appropriate calibration plot. Overfitting and optimism were not addressed. EDSS score was treated as a continuous, normally distributed variable, although it is an ordinal measure. Participants with missing EP and EDSS data were excluded, but further missing data handling was not reported.
Overall No At least one domain is at high risk of bias.

Martinelli 2017.

Study characteristics
General information Model name
MRI criteria + all significant
Primary source
Journal
Data source
Routine care
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 55 years at the time of the first neurological episode

  • Diagnosis of CIS not attributable to any definitive disease

  • Underwent a comprehensive diagnostic workup

  • Hospitalisation within 3 months from symptom onset and all examinations (baseline CSF, EP, MRI)

  • Follow‐up of more than 2 years


Exclusion criteria
Not reported
Recruitment
Patients admitted to the MS centre at San Raffaele Hospital in Milan, Italy
Age (years)
Mean 32.0
Sex (%F)
67.9
Disease duration (years)
Up to 3 months
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, not reported

  • During follow‐up, 39.5% on DMT


Disease description
Not reported
Recruitment period
2000 to 2013
Predictors Considered predictors
2010 DIS criteria fulfilled, 2010 DIT criteria fulfilled; age at onset; sex; multifocal or monofocal type of onset; partial or complete recovery; brainstem, optic neuritis, spinal cord, or other type of CIS; binary T2 lesions; binary T1 lesions; binary Gd‐enhancing lesions; binary CSF cells; binary CSF proteins; CSF oligoclonal bands present or absent; binary Link Index; binary Tourtellotte Index; binary Reiber Index; binary blood‐brain barrier damage index; abnormal or normal visual evoked potentials; abnormal or normal auditory evoked potentials; abnormal or normal somatosensory evoked potentials; abnormal or normal motor evoked potentials; abnormal or normal overall evoked potential score (adjusted for steroids use in 4 weeks prior to examinations, DMDs during follow‐up)
Number of considered predictors
Between 24 and 36 (unclear adjustments and transformations)
Timing of predictor measurement
At disease onset (CIS) and up to 3 months after disease onset
Predictor handling
  • All both continuously and as dichotomised (cutoff points suggested in the literature)

  • At least 1 interaction considered

Outcome Outcome definition
Conversion to definite MS (Poser 1983): time‐to‐CDMS defined as interval between onset of the first neurological event and last neurological visit or CDMS, new symptoms or signs occurring after an interval of at least 1 month from the onset of CIS only when other diagnoses are excluded
Timing of outcome measurement
Follow‐up median (IQR): 7.3 years (3.5 years to 10.2 years)
Missing data Number of participants with any missing value
224
Missing data handling
Exclusion
Analysis Number of participants (number of events)
243 (108)
Modelling method
Survival, Cox
Predictor selection method
  • For inclusion in the multivariable model, unclear

  • During multivariable modelling, inclusion of predictors into the final model based on likelihood ratio test comparing the model with only MRI criteria to the model with individual predictors and MRI criteria


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Gronnesby and Borgan statistic
Discrimination estimate
Pencina's c‐statistic 5‐year: 0.695 (95% CI 0.635 to 0.753), 2‐year: 0.74 (95% CI 0.677 to 0.804)
Classification estimate
Categories defined as low: 0% to 33.3%, moderate: 33.3% to 66.7%, high: 66.7% to 100%, net reclassification improvement = 0.3
Overall performance
Not reported
Risk groups
3 risk groups low: 0% to 33.3%, moderate: 33.3% to 66.7%, high: 66.7% to 100%
Model  Model presentation
List of selected predictors
Number of predictors in the model
5 or 7 (unclear adjustment)
Predictors in the model
2010 DIS criteria fulfilled, 2010 DIT criteria fulfilled, age, T1 lesions, CSF oligoclonal bands (steroid use in 4 weeks prior to study, DMT use during follow‐up)
Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To determine whether multiple biomarkers improved the prediction of MS in patients with CIS in a real‐world clinical practice
Primary aim
The primary aim of this study is in part the prediction of individual outcomes. The focus is on the added value of considering multiple biomarkers as opposed to univariate prediction.
Model interpretation
Probably exploratory
Suggested improvements
Multicentre prospective studies, enrolling a larger number of patients with CIS and taking into consideration all possible biomarkers (e.g. comorbidities, spinal MRI) of CDMS risk
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No The data were from medical records, probably from routine care. According to the flow chart, the data were retrospectively identified from a database but described as a cohort in Table 1. Inclusion was based on the length of follow‐up (approximately 6% excluded for this reason) and availability of all routine workup measures (n = 195).
Predictors No The clinical predictors can be measured relatively objectively and were measured during the inclusion hospitalisation. The EP and MRI assessments were reported to be blinded to the follow‐up data and outcome. Although the time of intended model use is not explicit, the inclusion criteria indicated that it is 3 months from symptom onset, and all predictors were reported as measured at the baseline examinations. However, all models were adjusted for treatment during follow‐up, information that was not available at the time of prediction.
Outcome Yes Although the blinding of outcome assessment was not reported, the outcome definition based on new symptoms is relatively objective.
Analysis No The details of the model itself were not explicitly reported. The number of events per variable was low. Complete case analysis was used. Only P values were reported to address calibration. Assessment occurred only in the full development set.
Overall No At least one domain is at high risk of bias.

Misicka 2020.

Study characteristics
General information Model name
  • Ever

  • 10 years

  • 20 years


Primary source
Journal
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
  • Age of MS onset of ≥ 18 years

  • Neurologist confirmed MS diagnosis

  • RRMS at onset

  • Non‐Hispanic white


Exclusion criteria
  • Clinical or radiological evidence of stroke

  • History of meningitis, neoplastic, peripheral nervous system, primary muscle disease, other well‐characterised non‐demyelinating diseases of the nervous system

  • Blood‐borne pathogens

  • Allogenic bone marrow transplant

  • Weight < 37 pounds

  • More than 1 person per extended family


Recruitment
Participants in the Accelerated Cure Project for MS, a repository of biological samples and epidemiological data for persons with demyelinating diseases, recruited from the patient base or surrounding communities of 10 MS speciality clinics, USA
Age (years)
  • Ever: median 32.0 (onset)

  • 10 years and 20 years: median 32.0 (MS onset)


Sex (%F)
78.1
Disease duration (years)
Median 11.0 (IQR: 5 to 19)
Diagnosis
100% RRMS
Diagnostic criteria
Mixed: McDonald 2005 (Polman 2005) and McDonald 2010 (Polman 2011)
Treatment
  • At recruitment, probably 0%

  • During follow‐up, not reported


Disease description
Time to second relapse: 0 years to 1 year 48.1%, 2 years to 5 years 31.8%, ≥ 6 years 20.1%
Recruitment period
2006 to 2013
Predictors Considered predictors
Age of MS onset, sex, years of education, history of infectious mononucleosis prior MS onset, tobacco smoking within 5 years prior to MS onset, obesity, high cholesterol, high blood pressure, type II diabetes, cancer, neurological disease, physical disease, psychological disorders, other autoimmune diseases; y/n for impaired functional domains: motor, cerebellar, spasticity, optic nerve, facial (motor), facial (sensory), brainstem and bulbar, cognitive, sexual, bladder and bowel, affect mood, fatigue; time to second relapse (TT2R; ≤ 1 year, 2 years to 5 years, and ≥ 6 years), the number of relapses experienced in the first 2 years after MS onset (NR2Y; ≤ 1, 2 to 3, ≥ 4 relapses, and NA), HLA‐A*02:01 alleles (0, 1, 2, NA), HLA‐DRB1*15:01 alleles (0, 1, 2, NA), Genetic Risk Score
Number of considered predictors
35
Timing of predictor measurement
At study interview (the same as the time of outcome reporting)
Predictor handling
Continuously except time to second relapse and the number of relapses in the first 2 years after MS onset, which were categorised
Outcome Outcome definition
Conversion to progressive MS: time to SPMS defined as the difference between the participant‐reported age of RRMS onset (age at first symptom or exacerbation) and the age of SPMS onset
Timing of outcome measurement
  • Ever: not reported

  • 10 years: up to 10 years

  • 20 years: up to 20 years

Missing data Number of participants with any missing value
≥ 323, unclear exactly how many participants have any missing
Missing data handling
Mixed: complete case for genetic variables, single imputation with a forest for other predictors, single imputation with category NA for NR2Y
Analysis Number of participants (number of events)
  • Ever: 1166 (177)

  • 10 years: 1166 (55)

  • 20 years: 1166 (128)


Modelling method
Survival, Cox
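To illustrate the modelling approach named here together with the forward selection by AIC described under 'Predictor selection method' below, the sketch uses the lifelines package on entirely hypothetical data; the variable names, the simulated data, and the single selection step shown are assumptions, not the authors' analysis.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)

# Hypothetical data in the spirit of the study: time to SPMS (years), an event
# indicator, and a few candidate predictors.
n = 300
df = pd.DataFrame({
    "time": rng.exponential(15, n),
    "event": rng.integers(0, 2, n),
    "onset_age": rng.normal(32, 8, n),
    "male": rng.integers(0, 2, n),
    "tt2r_years": rng.exponential(3, n),
})

def cox_aic(predictors):
    """Fit a Cox model on the given predictors and return its AIC,
    computed as -2 * partial log-likelihood + 2 * number of coefficients."""
    cph = CoxPHFitter().fit(df[["time", "event"] + predictors],
                            duration_col="time", event_col="event")
    return -2 * cph.log_likelihood_ + 2 * len(predictors)

# One step of forward selection by AIC: add the single candidate predictor
# that yields the lowest AIC.
candidates = ["onset_age", "male", "tt2r_years"]
best = min(candidates, key=lambda p: cox_aic([p]))
print("first predictor entered:", best)
```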
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • AIC

    • Forward


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
  • Ever: R2 = 0.28 to 0.32 (with genetic predictors)

  • 10 years: R2 = 0.50 to 0.56 (with genetic predictors)

  • 20 years: R2 = 0.34 to 0.40 (with genetic predictors)


Risk groups
Not reported
Model  Model presentation
Nomogram
Number of predictors in the model
6 (7 df)
Predictors in the model
  • Ever: age of MS onset, male sex, time to second relapse, neurological disorders, spasticity, HLA‐A*02:01

  • 10 years: age of MS onset, male sex, time to second relapse, cancer, brainstem/bulbar, HLA‐A*02:01

  • 20 years: age of MS onset, male sex, time to second relapse, obesity, neurological disorders, HLA‐A*02:01


Effect measure estimates
  • Ever: HR (95% CI): age of MS onset 1.08 (1.06 to 1.09), male sex 1.84 (1.33 to 2.55), time to second relapse 2 to 5 years 1.07 (0.75 to 1.53), time to second relapse ≥ 6 years 0.59 (0.40 to 0.88), neurological disorders 0.62 (0.40 to 0.94), spasticity 0.61 (0.37 to 1.02), HLA‐A*02:01 0.73 (0.56 to 0.97), not reported/given upon request

  • 10 years: HR (95% CI): age of MS onset 1.06 (1.03 to 1.09), male sex 2.62 (1.51 to 4.55), time to second relapse 2 to 5 years 0.69 (0.38 to 1.25), time to second relapse ≥ 6 years 0.25 (0.09 to 0.65), cancer 2.59 (1.01 to 6.67), brainstem/bulbar 0.47 (0.23 to 0.98), HLA‐A*02:01 0.60 (0.35 to 1.04), not reported/given upon request

  • 20 years: HR (95% CI): age of MS onset 1.08 (1.06 to 1.10), male sex 1.66 (1.12 to 2.45), time to second relapse 2 to 5 years 0.86 (0.57 to 1.29), time to second relapse ≥ 6 years 0.49 (0.30 to 0.80), obesity 0.33 (0.12 to 0.89), neurological disorders 0.46 (0.26 to 0.79), HLA‐A*02:01 0.56 (0.39 to 0.80), not reported/given upon request


Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To construct prediction models for SPMS using sociodemographic and self‐reported clinical measures that would be available at or near MS onset, with specific considerations for MS genetic risk factors
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Not reported
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on outcomes and their definition, applicability is unclear.
Auxiliary references
Saroufim P, Zweig SA, Conway DS, Briggs FBS. Cardiovascular conditions in persons with multiple sclerosis, neuromyelitis optica and transverse myelitis. Mult Scler Relat Disord 2018;25:21‐5.
 
Item Authors' judgement Support for judgement
Participants No The clinical data were collected cross‐sectionally by asking the patients about their medical history. Therefore, there is a high chance of recall bias or length‐time bias.
Predictors No The nature of clinical data collection by medical history taken from patients introduces recall bias. For example, the patients who had SP might remember the details or the diseases they had more vividly. Alternatively, patients with a shorter disease duration at the time of the interview might remember the details at disease onset more accurately.
Outcome No The outcome was based on patient‐reported time of RRMS diagnosis, and while CDMS was confirmed by a neurologist, the authors did not report that the timing was also confirmed. A definition of what was considered SPMS was not given. This makes the outcome assessment non‐standard and non‐uniform. Also, the patients knew all their clinical history while reporting the outcome.
Analysis No The EPV was less than 10. Time to second relapse was categorised. Neither calibration nor discrimination was addressed. Evaluation occurred in the full development set only. Missing values for non‐genetic variables were handled with multiple imputation. Participants not contributing genetic data were excluded.
Overall No At least one domain is at high risk of bias.

Montolio 2021.

Study characteristics
General information Model name
Disability Course ‐ LSTM
Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • MS patients

  • Best corrected visual acuity (BCVA) of 20/40 or higher

  • Refractive error within ± 5.00 dioptres equivalent sphere and ± 2.00 dioptres astigmatism

  • Transparent ocular media (nuclear colour or opalescence, cortical or posterior subcapsular lens opacity < 1) according to the Lens Opacities Classification System III

  • Unclear, having 7 visits: baseline, 5 annual, 10‐year visit


Exclusion criteria
  • Prior intraocular surgery

  • Diabetes or other diseases affecting the visual field or nervous system

  • Ongoing use of medications that could affect visual function


Recruitment
Miguel Servet University Hospital in Zaragoza, Spain
Age (years)
Mean 42.4
Sex (%F)
67.1
Disease duration (years)
Mean 10.1 (pooled SD 7.74)
Diagnosis
92.7% RRMS, 6.1% SPMS, 1.2% PPMS
Diagnostic criteria
McDonald 2001
Treatment
Unclear timing, 40% on IFN, 30% on immunomodulators, 30% none
Disease description
EDSS mean 2.6 (SD between 1.27 and 2.02)
Recruitment period
Not reported
Predictors Considered predictors
Baseline visit: age, sex, MS duration, MS subtype, ON antecedent; at baseline and the following 2 annual visits: BCVA, relapse in past year, EDSS, peripapillary thickness, superior thickness, nasal thickness, inferior thickness, temporal thickness, foveal thickness
Number of considered predictors
39
Timing of predictor measurement
At 3 visits over 2 years (a baseline visit, not further defined, and annual visits 1 and 2)
Predictor handling
Continuously, one‐hot encoding for categories
Outcome Outcome definition
Disability (EDSS): worsening defined as at least a 1‐point increase in EDSS between the visit at year 2 and the 10‐year follow‐up
Timing of outcome measurement
Follow‐up for 10 years from baseline, 8 years from the last predictors
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
82 (37)
Modelling method
Long short‐term memory (LSTM) recurrent neural network
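As a rough illustration of the modelling method named here, the sketch below builds a small LSTM classifier over a sequence of three visits using TensorFlow/Keras. The shapes, the single LSTM layer with 30 units, and the training settings (30 epochs, mini‐batch size 20) are assumptions loosely based on the hyperparameter description below; the data are simulated and the authors' actual architecture and preprocessing are not reproduced.

```python
import numpy as np
import tensorflow as tf

# Hypothetical shapes: 82 patients, 3 visits, 9 per-visit features
# (e.g. BCVA, relapse in the past year, EDSS and several retinal thickness measures).
n_patients, n_visits, n_features = 82, 3, 9
X = np.random.rand(n_patients, n_visits, n_features).astype("float32")
y = np.random.randint(0, 2, size=n_patients)            # 1 = EDSS worsening at 10 years

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_visits, n_features)),        # sequence of 3 visits
    tf.keras.layers.LSTM(30),                             # hidden size of 30 (assumed)
    tf.keras.layers.Dense(1, activation="sigmoid"),       # probability of worsening
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.fit(X, y, epochs=30, batch_size=20, verbose=0)
```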
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, LASSO used to select predictors for actual prediction model


Hyperparameter tuning
Search for optimal number of hidden layers (30), epochs (30), and mini‐batch size (20) in cross validation
Shrinkage of predictor weights
Not reported
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 10‐fold
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.8165
Classification estimate
Accuracy = 0.817, sensitivity = 0.811, specificity = 0.822, PPV = 0.789
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of selected predictors
Number of predictors in the model
5 (4 of them longitudinal)
Predictors in the model
Disease duration, relapse in preceding year, EDSS, temporal RNFL thickness, superior RNFL thickness
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To improve the MS diagnosis and predict the long‐term course of disability in MS patients based on clinical data and retinal nerve fibre layer (RNFL) thickness, measured by optical coherence tomography (OCT)
Primary aim
The primary aim of this study is only partially about the prediction of individual outcomes. The focus is on OCT measures and machine learning.
Model interpretation
Probably exploratory
Suggested improvements
Use of OCT devices in combination with other techniques such as MRI, EP or CSF analysis, used in combination with clinical data, such as the EDSS
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No The data source was routine care even though there were clearly defined inclusion and exclusion criteria.
Predictors Yes Although the predictors were collected up to year 2 to predict the 10‐year outcome from baseline, the situation is clarified by reporting the prediction horizon as 8 years.
Outcome No The outcome was based on a 1‐point increase in EDSS. However, the meaning of a 1‐point change depends on the baseline value. This study included participants of different MS subtypes and a range of EDSS at baseline, which are expected to have different patterns of change due to disease. The outcome was not reported to be confirmed at a later point.
Analysis No The EPV was very low. Information on missing data and handling was not reported. Calibration was not assessed. Parameter tuning, modelling method selection, and final performance resulted from unnested CV. No model was provided.
Overall No At least one domain is at high risk of bias.

Olesen 2019.

Study characteristics
General information Model name
  • Routine

  • Candidate


Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • > 15 years of age

  • Acute ON diagnosed by independent neurological and ophthalmologic examination

  • Prior to treatment


Exclusion criteria
  • Previous diagnosis of MS

  • NMOSD

  • Another cause of optic neuropathy that was apparent at the time of referral (other intraocular pathologic conditions with symptoms mimicking ON such as high ocular pressure and vascular, traumatic, infectious, metabolic, neoplastic, or toxic causes)


Recruitment
3 hospital units with ophthalmology departments and 44 ophthalmologists in general practice (Primary Care Ophthalmology) in the administrative unit Region of Southern Denmark
Age (years)
Median 36.0
Sex (%F)
67.5
Disease duration (years)
Not reported
Diagnosis
100% CIS (isolated optic neuritis)
Diagnostic criteria
Optic Neuritis Study Group criteria 1991
Treatment
  • At recruitment, 0%

  • During follow‐up, not reported


Disease description
Not reported
Recruitment period
2014 to 2016
Predictors Considered predictors
  • Routine: CSF leukocyte count, oligoclonal band positivity, IgG index, albumin ratio

  • Candidate: CSF neurofilament light chain levels (NF‐L), serum IL‐10, serum IL‐6, serum IL‐17A, serum IL‐1beta, serum TRAIL, CSF IL‐10, CSF IL‐6, CSF IL‐17A, CSF IL‐1beta, CSF TRAIL, CSF CXCL13, CSF TNF‐alpha, serum TNF‐alpha


Number of considered predictors
  • Routine: 4

  • Candidate: 14


Timing of predictor measurement
At disease onset (ON), from ON onset median (range): 14 days (2 days to 38 days)
Predictor handling
Continuously
Outcome Outcome definition
Conversion to definite MS (McDonald 2010, Polman 2011): MS diagnosed according to McDonald 2010
Timing of outcome measurement
Follow‐up median (range): 29.6 months (19 months to 41 months)
Missing data Number of participants with any missing value
  • Routine: 2

  • Candidate: 7


Missing data handling
Not reported
Analysis Number of participants (number of events)
  • Routine: unclear if 38, 40 reported (≤ 16)

  • Candidate: unclear if 33, 40 reported (≤ 16)


Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, full model approach


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap, B = 500
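The optimism‐corrected c‐statistics reported below come from a bootstrap with B = 500. The sketch shows the standard optimism‐correction idea on hypothetical data, refitting a plain logistic model in each resample without any variable selection (consistent with the review's note that the resampling excluded the selection step); the data, predictors, and model are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def optimism_corrected_auc(X, y, B=500):
    """Bootstrap optimism correction for the c-statistic: the apparent AUC minus the
    average of (AUC in the bootstrap sample - AUC of that model on the original data)."""
    model = LogisticRegression(max_iter=1000).fit(X, y)
    apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])
    optimism = []
    n = len(y)
    for _ in range(B):
        idx = rng.integers(0, n, n)                    # resample with replacement
        if len(np.unique(y[idx])) < 2:                 # skip degenerate resamples
            continue
        boot = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        auc_boot = roc_auc_score(y[idx], boot.predict_proba(X[idx])[:, 1])
        auc_orig = roc_auc_score(y, boot.predict_proba(X)[:, 1])
        optimism.append(auc_boot - auc_orig)
    return apparent - float(np.mean(optimism))

# Hypothetical data roughly matching the study size: 40 participants, 3 predictors.
X = rng.normal(size=(40, 3))
y = rng.integers(0, 2, 40)
print(round(optimism_corrected_auc(X, y), 2))
```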
Calibration estimate
Calibration plot, Hosmer‐Lemeshow test
Discrimination estimate
c‐Statistic
  • Routine: 0.86 (95% CI 0.74 to 0.98), optimism‐corrected 0.83

  • Candidate: 0.89 (95% CI 0.77 to 1.00), optimism‐corrected 0.87


Classification estimate
Not reported
Overall performance
  • Routine: adjusted McFadden R2 = 0.16

  • Candidate: adjusted McFadden R2 = 0.15


Risk groups
Not reported
Model  Model presentation
Nomogram
Number of predictors in the model
3
Predictors in the model
  • Routine: OCB, leukocytes, IgG index

  • Candidate: IL‐10, NF‐L, CXCL13


Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
We propose that markers of inflammation and of neurodegeneration (a) may differ between patients with MS‐related ON and patients with ON unrelated to MS and (b) may predict development of MS in patients with acute ON.
Primary aim
The primary aim of this study is in part the prediction of individual outcomes. The focus is on the association of CSF markers with CDMS.
Model interpretation
Probably exploratory
Suggested improvements
Validation in larger, well‐designed cohorts including differential diagnoses and other ethnicities from multiple centres
Notes Applicability overall
High
Applicability overall rationale
The predictors used were CSF biomarkers and no other predictor domain was considered for use in the model.
Auxiliary references
Soelberg K, Jarius S, Skejoe H, Engberg H, Mehlsen JJ, Nilsson AC, et al. A population‐based prospective study of optic neuritis. Mult Scler 2017;23(14):1893‐901.
Soelberg K, Skejoe HPB, Grauslund J, Smith TJ, Lillevang ST, Jarius S, et al. Magnetic resonance imaging findings at the first episode of acute optic neuritis. Mult Scler Relat Disord 2018;20:30‐6.
 
Item Authors' judgement Support for judgement
Participants Unclear A prospectively collected population‐based cohort was used, but it is unclear if the participants who experienced the outcome were included. 12 participants were diagnosed with clinically definite MS in less than 2 months.
Predictors Yes Predictors were collected shortly after onset and were collected without knowledge of the outcome due to the prospective collection.
Outcome No From the 40 patients included and 16 events of clinically definite MS, 12 were diagnosed with MS at the acute stage of optic neuritis in less than 2 months, while the predictors, venous blood and CSF, from all included patients were collected within 38 days of ON onset (median, 14 days; range 2 to 38). Thus, the time difference between predictor collection and outcome seems to be too short.
Analysis No The number of participants was too low. Not all participants were included in the analysis, and a complete case analysis was probably applied. However, the proportion of patients with missing data was about 5% and was not expected to increase the risk of bias. Univariate analyses were used to select candidate predictors for multivariate analysis. Logistic regression was applied even though there was no defined timing of the outcome and follow‐up duration varied amongst participants. The resampling method excluded the variable selection process. Effect estimates were not reported, so it is unclear whether the final model corresponds to the multivariable analysis.
Overall No At least one domain is at high risk of bias.

Oprea 2020.

Study characteristics
General information Model name
Mixed treatment ‐ disability
Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • RRMS or PPMS

  • At least 2 neurological evaluations in the past 2 years


Exclusion criteria
Not reported
Recruitment
Neurology Department of the Bucharest Emergency University Hospital (BEUH), Romania
Age (years)
Mean 40.3
Sex (%F)
61.6
Disease duration (years)
Mean 10.2
Diagnosis
RRMS, PPMS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, not reported

  • During follow‐up, 30.5% IFN, 29.8% GA, 2.0% teriflunomide, 37.1% natalizumab, 0.7% unknown


Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Gender, age at diagnosis, age, EDSS at onset, disease duration, number of treatments
Number of considered predictors
6
Timing of predictor measurement
At a single time point during outcome determination
Predictor handling
Unclear, continuously or EDSS at onset categorised
Outcome Outcome definition
Disability (EDSS): maintaining an EDSS score less than or equal to a threshold (the chosen model used EDSS ≤ 2.5) at the final visit
Timing of outcome measurement
Not reported
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
151 (not reported)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 10,000 shuffle splits, train‐test ratio 14:1
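The internal validation described here (10,000 random shuffle splits with a 14:1 train‐to‐test ratio) can be approximated with scikit‐learn's ShuffleSplit. A minimal sketch on simulated data; the split count and ratio come from the row above, while the data, the logistic model, and the AUC summary are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(151, 6))           # 151 participants, 6 predictors (as reported)
y = rng.integers(0, 2, 151)             # hypothetical binary outcome (EDSS <= 2.5 kept)

# A 14:1 train-to-test ratio corresponds to a test fraction of 1/15.
splitter = ShuffleSplit(n_splits=10_000, test_size=1 / 15, random_state=0)
aucs = []
for train_idx, test_idx in splitter.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    if len(np.unique(y[test_idx])) < 2:  # AUC is undefined on single-class test sets
        continue
    aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))
print(round(float(np.mean(aucs)), 3))
```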
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.8221
Classification estimate
Accuracy = 0.7662, sensitivity = 0.7775, PPV = 0.8145, F1 = 0.7806
Overall performance
Brier score = 0.1754
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
6
Predictors in the model
Gender, age at diagnosis, age, EDSS at onset, disease duration, number of treatments
Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To develop a disability and outcome prediction algorithm in MS patients
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
More patients, more relevant predictors, online platform
Notes Applicability overall
Unclear
Applicability overall rationale
Due to ambiguities and the lack of reporting on participants, predictors, and outcomes, applicability is unclear.
 
Item Authors' judgement Support for judgement
Participants No The study data come from routine care and eligibility criteria are unclear.
Predictors No The timing of data collection for predictors and outcome measurement is the same. Hence, the predictors are probably not available at the time of intended model use.
Outcome No The outcome definition is unclear and does not mention confirmation. Also, the timing of the outcome assessment with respect to the prognostication is unclear, probably making the outcome highly variable for different patients with different periods between onset and assessment visit.
Analysis No The number of events was unclear, but even in the best case scenario, the number of events per predictor was lower than 15. No information on missing data and its handling was reported. Timing of predictor and outcome assessment was not considered. The final model was not presented. Although cross‐validation was used for internal validation, the need for shrinkage was not assessed.
Overall No At least one domain is at high risk of bias.
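The Oprea 2020 entry reports internal validation of a logistic regression by repeated shuffle-split cross-validation (10,000 splits, 14:1 train:test), with the c-statistic, accuracy, and Brier score as performance measures. Below is a minimal sketch of that general evaluation scheme in Python with scikit-learn, on synthetic data; the split count, predictors, and metric choices are illustrative and not taken from the study.

```python
# Illustrative sketch only: repeated shuffle-split internal validation of a
# logistic regression, loosely mirroring the set-up described in the entry
# above. Synthetic data; not the authors' code or data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_validate

# Synthetic stand-in for 151 participants and 6 candidate predictors
X, y = make_classification(n_samples=151, n_features=6, n_informative=4,
                           random_state=0)

# 14:1 train:test split, repeated many times (the study reports 10,000 shuffles)
cv = ShuffleSplit(n_splits=1000, test_size=1/15, random_state=0)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=cv,
                        scoring=["roc_auc", "accuracy", "neg_brier_score"])

# nanmean guards against the rare shuffle whose small test set contains one class only
print("mean c-statistic:", np.nanmean(scores["test_roc_auc"]).round(3))
print("mean accuracy:   ", np.nanmean(scores["test_accuracy"]).round(3))
print("mean Brier score:", (-np.nanmean(scores["test_neg_brier_score"])).round(3))
```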

Pellegrini 2019.

Study characteristics
General information Model name
Final model with 3 predictors
Primary source
Journal
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • Common to all

    • EDSS 0 to 5

  • (ADVANCE)

    • Age between 18 years and 65 years

    • RRMS diagnosis (McDonald 2005, Polman 2005)

    • At least 2 clinically documented relapses in the previous 3 years, with at least one having occurred within the past 12 months

  • (CONFIRM)

    • Age between 18 years and 55 years

    • Diagnosis of RRMS (McDonald 2005, Polman 2005)

    • At least 1 clinically documented relapse in the previous 12 months or at least 1 gadolinium‐enhancing lesion 0 weeks to 6 weeks before randomisation

  • (DEFINE)

    • Age between 18 years and 55 years

    • RRMS diagnosis (McDonald 2005, Polman 2005)

    • Disease activity as evidenced by at least 1 clinically documented relapse within 12 months before randomisation or a brain magnetic resonance imaging (MRI) scan, obtained within 6 weeks before randomisation, that showed at least 1 gadolinium‐enhancing lesion.

  • (AFFIRM)

    • Males and females between the ages of 18 years and 50 years

    • Diagnosis of RRMS (McDonald 2001)

    • At least 1 medically documented relapse within the 12 months before the study began

    • MRI showing lesions consistent with MS


Exclusion criteria
  • Common to all

    • Progressive forms of MS

  • (ADVANCE)

    • Prespecified laboratory abnormalities

    • Previous treatment with interferon for MS for more than 4 weeks or discontinuation less than 6 months before baseline

  • (CONFIRM)

    • Other clinically significant illness

    • Prespecified laboratory abnormalities

    • Prior exposure to glatiramer acetate or contraindicated medications

  • (DEFINE)

    • Another major disease that would preclude participation in a clinical trial

    • Abnormal results on prespecified laboratory tests

    • Recent exposure to contraindicated medications

  • (AFFIRM)

    • A relapse within 50 days before the administration of the first dose of the study drug

    • Treatment with cyclophosphamide or mitoxantrone within the previous year

    • Treatment with interferon beta, glatiramer acetate, cyclosporine, azathioprine, methotrexate, or intravenous immune globulin within the previous 6 months

    • Treatment with interferon beta, glatiramer acetate, or both for more than 6 months


Recruitment
  • Placebo arm participants in the ADVANCE, DEFINE, CONFIRM, and AFFIRM, multi‐site RCTs

  • Australia, Austria, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Canada, Chile, Colombia, Costa Rica, Croatia, Czech Republic, Estonia, France, Georgia, Germany, Greece, Guatemala, India, Ireland, Israel, Latvia, Mexico, Macedonia, Netherlands, Moldova, New Zealand, Peru, Poland, Romania, Russian Federation, Puerto Rico, Serbia, Slovakia, South Africa, Switzerland, Spain, Ukraine, United Kingdom, United States, Virgin Islands (USA)


Age (years)
Mean 37.1
Sex (%F)
71.0
Disease duration (years)
Mean 7.5 (SD 6.5)
Diagnosis
100% RRMS
Diagnostic criteria
Mixed: McDonald 2001, McDonald 2005
Treatment
  • Prior treatment, 34.5%

  • During follow‐up, 0%


Disease description
EDSS mean (SD): 2.5 (1.2), number of relapses 1 year prior to study entry mean (SD): 1.4 (0.7)
Recruitment period
Not reported
Predictors Considered predictors
Age (in years), gender (male vs female), ethnicity (white vs other), number of relapses 1 year prior to study entry, number of relapses 3 years prior to study entry, MS disease duration (in years), time since pre‐study relapse (in months), prior treatment (yes vs no), EDSS, T25FW, 9HPT, PASAT, VFT 2.5%, gadolinium‐enhancing lesion number, T1 lesion volume (log‐scale), T2 lesion volume (log‐scale), brain volume standardised Z‐score, brain parenchymal fraction, SF‐36 Physical Component Summary, SF‐36 Mental Component Summary, study identifier (as fixed term adjustment)
Number of considered predictors
23
Timing of predictor measurement
At study baseline (RCT)
Predictor handling
  • Continuously

  • Interactions tested by model based decision trees

Outcome Outcome definition
Composite (EDSS, T25FW, 9HPT, PASAT, VFT): time to disability progression confirmed at 24 weeks on either EDSS (≥ 1 point increase if baseline EDSS ≥ 1.0 or 1.5 point increase otherwise) or any of timed 25‐foot walk (T25FW) test, 9HPT, Paced Auditory Serial Addition Test (PASAT), and visual function test (VFT; 2.5% contrast level) components (20% worsening on either T25FW or 9HPT or PASAT or 10‐letter worsening on VFT)
Timing of outcome measurement
Up to 2 years
Missing data Number of participants with any missing value
Missing MRI data for 44% and 48% by design (DEFINE and CONFIRM)
Missing data handling
Multiple imputation, 10 MCMC‐based imputation sets
Analysis Number of participants (number of events)
1582 (434)
Modelling method
Survival, Cox
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, 3 best predictors (number selected based on bootstrapped c‐index) from median variable importance rank calculated from 6 modelling algorithms


Hyperparameter tuning
Parameter tuning of ML models leading to predictor selection well‐described in supplementary material
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap
Calibration estimate
Calibration slope 1 year = 1.10 (bootstrap = 1.08, SE 0.17), 2 years = 1.00 (bootstrap = 0.97, SE 0.15)
Discrimination estimate
  • Survival c‐statistic:

    • 1 year 0.59 (SE 0.02), bootstrap 0.59

    • 2 years 0.59 (SE 0.01), bootstrap 0.59


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Regression coefficients without baseline hazard
Number of predictors in the model
3
Predictors in the model
PASAT, SF‐36 physical component summary, visual function test
Effect measure estimates
HR (95% CI): PASAT 0.94 (0.90 to 0.98), SF‐36 physical component summary 0.92 (0.88 to 0.97), visual function test 0.95 (0.92 to 0.99)
Predictor influence measure
Relative importance ranking
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To compare the aforementioned regression and machine learning methods in their ability to assess the ranking of common prognostic factors for MS progression and to generate consistent risk predictions in clinical trial data settings
Primary aim
The primary aim of this study is the prediction of individual outcomes. The focus is on the factors.
Model interpretation
Exploratory
Suggested improvements
Explore alternative predictors and their change over time, sensitivity of the endpoints’ definition to a set of baseline characteristics using a multivariate (i.e. joint) endpoint assessment based on variance components
Notes Applicability overall
Low
Auxiliary references
Calabresi PA, Kieseier BC, Arnold DL, Balcer LJ, Boyko A, Pelletier J, et al. Pegylated interferon beta‐1a for relapsing‐remitting multiple sclerosis (ADVANCE): a randomised, phase 3, double‐blind study. Lancet Neurol 2014;13(7):657‐65.
Fox RJ, Miller DH, Phillips JT, Hutchinson M, Havrdova E, Kita M, et al. Placebo‐controlled phase 3 study of oral BG‐12 or glatiramer in multiple sclerosis. N Engl J Med 2012;367(12):1087‐97.
Gold R, Kappos L, Arnold DL, Bar‐Or A, Giovannoni G, Selmaj K, et al. Placebo‐controlled phase 3 study of oral BG‐12 for relapsing multiple sclerosis. N Engl J Med 2012;367(12):1098‐107.
Polman CH, O'Connor PW, Havrdova E, Hutchinson M, Kappos L, Miller DH, et al. A randomized, placebo‐controlled trial of natalizumab for relapsing multiple sclerosis. N Engl J Med 2006;354(9):899‐910.
NCT00906399. Efficacy and safety study of peginterferon beta‐1a in participants with relapsing multiple sclerosis (ADVANCE). https://clinicaltrials.gov/ct2/show/NCT00906399 (first received 21 May 2009).
NCT00027300. Safety and efficacy of natalizumab in the treatment of multiple sclerosis. https://clinicaltrials.gov/ct2/show/NCT00027300 (first received 3 December 2001).
NCT00420212. Efficacy and safety of oral BG00012 in relapsing‐remitting multiple sclerosis (DEFINE). https://clinicaltrials.gov/ct2/show/NCT00420212 (first received 11 January 2007).
NCT00451451. Efficacy and safety study of oral BG00012 with active reference in relapsing‐remitting multiple sclerosis (CONFIRM). https://clinicaltrials.gov/ct2/show/NCT00451451 (first received 23 March 2007).
 
Item Authors' judgement Support for judgement
Participants Yes Data from an RCT were used. Although the inclusion and exclusion criteria for the prediction study were not described, the number of patients per study matched up with the original RCT publications; hence, there is no reason to assume that there were additional eligibility criteria for the prediction study.
Predictors Yes The predictors were collected during an RCT; therefore, they are expected to be collected in the same way across all patients.
Outcome Yes The outcome was composite with clear components of clinical interest, which are considered to be objective measurements. Assessments occurred during RCTs and are expected to be standardised.
Analysis Yes The EPV was around 20. Overfitting and optimism were accounted for. Calibration and discrimination were assessed.
Overall Yes All domains are at low risk of bias.
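The Pellegrini 2019 entry reports bootstrap-based optimism correction of the survival c-statistic (and calibration slope) on the development data. The sketch below shows Harrell-style bootstrap optimism correction of the c-index for a Cox model using lifelines on synthetic data; the variable names, data-generating step, and number of resamples are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch: bootstrap optimism correction of Harrell's c-index
# for a Cox model (lifelines), on synthetic data. Not the authors' code.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "pasat": rng.normal(50, 10, n),
    "sf36_pcs": rng.normal(40, 9, n),
    "vft": rng.normal(30, 8, n),
})
risk = 0.03 * (50 - df["pasat"]) + 0.04 * (40 - df["sf36_pcs"])
df["time"] = rng.exponential(5 * np.exp(-risk))
df["event"] = rng.binomial(1, 0.7, n)

def c_index(model, data):
    # higher partial hazard means shorter survival, hence the minus sign
    return concordance_index(data["time"],
                             -model.predict_partial_hazard(data),
                             data["event"])

full = CoxPHFitter().fit(df, duration_col="time", event_col="event")
apparent = c_index(full, df)

optimism = []
for _ in range(200):                      # 200 bootstrap resamples (illustrative)
    boot = df.sample(n=len(df), replace=True)
    m = CoxPHFitter().fit(boot, duration_col="time", event_col="event")
    optimism.append(c_index(m, boot) - c_index(m, df))

print("apparent c-index:          ", round(apparent, 3))
print("optimism-corrected c-index:", round(apparent - np.mean(optimism), 3))
```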

Pinto 2020.

Study characteristics
General information Model name
  • SP

  • Severity 6 years

  • Severity 10 years


Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • SP:

    • Tracked since onset diagnosis or with SP diagnosis only after the fifth year of tracking

    • Minimum of 6 years of tracking

    • Minimum of 5 annotated visits

  • Severity 6 years and severity 10 years:

    • Tracked since onset diagnosis

    • Minimum of 10 years of tracking

    • Minimum of 5 annotated visits


Exclusion criteria
Patients with PPMS
Recruitment
Neurology Department of Centro Hospitalar e Universitario de Coimbra, Portugal
Age (years)
  • SP: mean 31.1 (onset)

  • Severity 6 years: mean 30.3 (onset)

  • Severity 10 years: mean 32.3 (onset)


Sex (%F)
  • SP: 72.7

  • Severity 6 years: 69.7

  • Severity 10 years: 77.6


Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
McDonald (undefined)
Treatment
Not reported
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
  • Identification (static): gender, age of onset, initial supratentorial manifestations, initial optic pathway manifestations, initial brainstem or cerebellum manifestations, initial spinal cord manifestations, clinical evidence in the MS initial manifestations, initial manifestations visualised in MRI, initial manifestations visualised in evoked potentials test, initial manifestations visualised in CSF

  • Mean, median, standard deviation, and mode, with 2 segmentation windows (normal and accumulative), for dynamic features

  • Visits (dynamic): routine visit; FS scores for pyramidal, cerebellar, brainstem, sensory, bowel and bladder, visual, mental, and ambulation; cerebellar weakness, visual symptoms, gait disturbances related to ataxia, dysaesthesiae, lower extremities ataxia, paresthesiae, perturbances in cognition, gait disturbances related to paresis, gait disturbances related to spasticity, muscular weakness in upper extremities, perturbances in micturition, fatigue, muscular weakness in lower extremities, mood perturbances, EDSS

  • Relapses (dynamic): impact on ADL functions, recovery, severity; manifestations related to pyramidal tract, brain stem, bowel and bladder, neuropsychological functions, cerebellum, visual functions, and sensory functions; hospitalisation, effect on ambulatory capacity


Number of considered predictors
1306
Timing of predictor measurement
At multiple visits dependent on which n‐year model (n = 1 to 5)
Predictor handling
  • Continuously

  • No interactions considered

Outcome Outcome definition
  • SP:

    • Conversion to progressive MS (not reported): the indication in the database of an SP course diagnosis by the clinicians

  • Severity 6 years:

    • Disability (EDSS): severe disease defined as EDSS > 3 by 6 years based on the mean EDSS from all visits to the clinic that happened in the 6th year; when the year did not contain any annotated visits for a given patient, 1 of 2 possible consecutive years was considered; the chosen model is the 2-years-from-onset model

  • Severity 10 years:

    • Disability (EDSS): severe disease defined as EDSS > 3 by 10 years based on the mean EDSS from all visits to the clinic that happened in the 10th year; when the year did not contain any annotated visits for a given patient, 1 of 2 possible consecutive years was considered; the chosen model is the 5-years-from-onset model


Timing of outcome measurement
  • SP: as long as available in the database; by year 2 from baseline (which formed the basis for the year-n models), 7 patients already had SP

  • Severity 6 years: at 6 years (for the chosen 2-years-from-onset model); by year 2 from baseline (which formed the basis for the year-n models), at least 19 patients had already met the definition of severe disease

  • Severity 10 years: at 10 years (for the chosen 5-years-from-onset model); by year 5 from baseline (which formed the basis for the year-n models), at least 15 patients had already met the definition of severe disease

Missing data Number of participants with any missing value
Unclear exactly how many participants have any missing
Missing data handling
Mixed: single imputation of the feature mean for predictors, exclusion for outcome
Analysis Number of participants (number of events)
  • SP: 187 (21)

  • Severity 6 years: 145 (38)

  • Severity 10 years: 67 (30)


Modelling method
Support vector machine
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, LASSO used to select predictors for other prediction models, tuning parameter chosen in order to yield at least 5 training samples per predictor


Hyperparameter tuning
Default parameters of the MATLAB function
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, (10 times) 10‐fold
Calibration estimate
Not reported
Discrimination estimate
  • c‐Statistic

    • SP: 0.86 (SD 0.07)

    • Severity 6 years: 0.89 (SD 0.03)

    • Severity 10 years: 0.85 (SD 0.07)


Classification estimate
  • SP: sensitivity = 0.76 (SD 0.14), specificity = 0.77 (SD 0.05), F1 score = 0.20 (SD 0.05), geometric mean = 0.76 (SD 0.08)

  • Severity 6 years: sensitivity = 0.84 (SD 0.11), specificity = 0.81 (SD 0.05), F1 score = 0.53 (SD 0.07), geometric mean = 0.82 (SD 0.06)

  • Severity 10 years: sensitivity = 0.77 (SD 0.13), specificity = 0.79 (SD 0.09), F1 score = 0.72 (SD 0.09), geometric mean = 0.78 (SD 0.08)


Overall performance
Not reported
Risk groups
Not applicable
Model  Model presentation
Not reported
Number of predictors in the model
Unclear which predictors make up the final model
Predictors in the model
Not reported
Effect measure estimates
Not reported
Predictor influence measure
Predictive power (% of iterations predictor selected in)
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To predict MS progression, based on the clinical characteristics of the first 5 years of the disease
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
Include more information such as MRI examination; use different disease severity criteria and compare; consider disease phenotypes as an interaction
Notes Applicability overall
High
Applicability overall rationale
Approximately half of the participants had already experienced the outcome before the measurement of predictors, which included the baseline measure of the outcome itself.
 
Item Authors' judgement Support for judgement
Participants No The data source was routine care, and inclusion was based on the availability of data.
Predictors No SP: The intended time of prediction is unclear and defined as availability in the dataset. Hence, the predictors used at 2 years appear to be unavailable at baseline. There is not enough information to judge whether predictors were defined and assessed in a similar way across patients, especially predictors related to instruments, since the timing of data collection is unclear.
Severity 6 years and severity 10 years: The prognosis was presented as a 6-year prediction while the predictors were from the second year, effectively shortening the prediction window. There is not enough information to judge whether predictors were defined and assessed in a similar way across patients, especially predictors related to instruments, since the timing of data collection is unclear.
Outcome No SP: The outcome was SPMS from a routine care database, which is expected to be neither standardised nor operationalised. Also, the timing of the outcome was not clearly defined but was limited to availability in the database. At 2 years, which was the chosen model, half of the events had already occurred.
Severity 6 years and severity 10 years: EDSS scores were included in the predictors, and almost half of the participants had already experienced the event (the definition of which is based on EDSS) by year 2. Also, the EDSS change was not reported to be confirmed.
Analysis No The amount of missing data was substantial and was handled by mean imputation within the cross-validation structure. The sample size was too small. Univariable predictor selection was used. Calibration was not assessed. Parameter tuning for the main chosen model, the SVM, was not reported; in correspondence, the defaults were reported to have been used. It is unclear whether model selection and evaluation were properly separated. The final model is unclear.
Overall No At least one domain is at high risk of bias.
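The judgement above questions whether model selection (univariable screening, LASSO selection, SVM tuning) and evaluation were properly separated. As an illustration only, the sketch below shows one way to keep imputation, predictor selection, and the classifier inside each fold of a repeated 10-fold cross-validation with scikit-learn, so the performance estimate is not contaminated by steps fitted on the full data set; the synthetic data, regularisation strength, and pipeline steps are assumptions, not the authors' pipeline.

```python
# Illustrative sketch: keeping imputation, LASSO-based predictor selection and
# the SVM inside the cross-validation loop. Synthetic data only; not the
# authors' pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 187 participants, many candidate features, rare outcome
X, y = make_classification(n_samples=187, n_features=100, n_informative=8,
                           weights=[0.89], random_state=0)
X[np.random.default_rng(0).random(X.shape) < 0.05] = np.nan  # some missing values

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),          # refit in every fold
    ("scale", StandardScaler()),
    ("select", SelectFromModel(                           # L1 selection in every fold
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5))),
    ("svm", SVC(kernel="rbf")),
])

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print("mean c-statistic over 10x10-fold CV:", auc.mean().round(3))
```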

Pisani 2021.

Study characteristics
General information Model name
SP‐RiSc
Primary source
Journal
Data source
Cohort, secondary
Study type
Development
Participants Inclusion criteria
  • RRMS patients


Exclusion criteria
Not reported
Recruitment
MS specialist centre of Verona University Hospital, Italy
Age (years)
Mean 33.5
Sex (%F)
58.4
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2005 (Polman 2005)
Treatment
  • At recruitment, 100% on first‐line DMT (IFN, GA)

  • During follow‐up, 43.5% switched to second‐line therapy (fingolimod, natalizumab)


Disease description
EDSS median (range): 1.5 (0 to 3.5)
Recruitment period
2005 to 2018
Predictors Considered predictors
  • At onset: age, sex, EDSS, cortical lesion number, white matter lesion number, spinal cord lesions

  • At 2 years: EDSS, number of relapses, new CL number, new WM lesion number

  • Difference (2 years to 0 years): global cortical thickness, cerebellar cortical volume (adjusted for switch to second‐line DMT during follow‐up)


Number of considered predictors
12 or 13 (unclear adjustment)
Timing of predictor measurement
At diagnosis (RRMS) and up to 2 years after diagnosis
Predictor handling
Continuously
Outcome Outcome definition
Conversion to progressive MS (Lublin 1996): time to the occurrence of continuous disability accumulation independently of relapses, confirmed 12 months later, transitory plateaus in the progressive course were allowed, steady progression was the rule
Timing of outcome measurement
Examination every 6 months or when a relapse occurred; mean (range) follow‐up: 9.55 years (6.8 years to 13.13 years)
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
262 (69)
Modelling method
Random survival forest
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, inclusion if minimal depth < mean minimal depth


Hyperparameter tuning
Parameters, but not tuning methods, mentioned in appendix
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Random split for tool performance; development out‐of‐bag for random forest performance
Calibration estimate
Not reported
Discrimination estimate
Harrell's c-index on development out-of-bag data for evaluating the random forest at:
  • 7 years 92.0%

  • 8.5 years 91.0%

  • 9.5 years 91.4%

  • 10.5 years 90%

  • 13.5 years 90.0%


Classification estimate
Cutoff = 17.7, accuracy = 0.88 (95% CI 0.75 to 0.96), sensitivity = 0.92 (95% CI 0.70 to 1.00), specificity = 0.87 (95% CI 0.70 to 0.96), PPV = 0.75 (95% CI 0.48 to 0.93), NPV = 0.96 (95% CI 0.81 to 1.00) from evaluation of final tool using random split
Overall performance
Brier score (95% CI) using development out‐of‐bag for evaluating RF at:
  • 7 years: 0.08 (0.06 to 0.10)

  • 8.5 years: 0.09 (0.08 to 0.11)

  • 9.5 years: 0.08 (0.06 to 0.11)

  • 10.5 years: 0.05 (0.03 to 0.06)

  • 13.5 years: 0.02 (0.01 to 0.03)


Risk groups
3 risk groups: high (ensemble mortality > third quartile), medium, and low (ensemble mortality < first quartile)
Model  Model presentation
Combination of heat map value for 2 predictors plus other predictor values weighted by their minimal depth
Number of predictors in the model
7
Predictors in the model
  • At onset: cortical lesion number, age, EDSS, white matter lesion number

  • Difference (2 years to 0 years): global cortical thickness, cerebellar cortical volume, new cortical lesion number


Effect measure estimates
Not reported
Predictor influence measure
Minimal depth
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To develop the secondary progressive risk score (SP‐RiSc), which integrates demographic, clinical, and MRI data collected from a cohort of RRMS patients during the first 2 years after the disease diagnosis
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
An additional validation, especially on a larger independent cohort with neuroimaging data from different field strength MRI scanners
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants Unclear Although the data source is reported to be a cohort study, there are no details related to eligibility criteria.
Predictors No Images were produced at a single centre and analysed by 2 experienced clinicians. The included predictors are relatively objective. The model is meant to be used at RRMS diagnosis, and the survival model counts time from diagnosis only. This means that the predictors measured at 2 years from the diagnosis should be considered unavailable at the intended moment of prediction.
Outcome No The secondary progression conversion outcome was clearly defined but it was not operationalised and hence the application of this definition might vary greatly based on assessors and experience level.
Analysis No The number of events was low. Missing data and their handling were not mentioned. Discrimination and overall performance of the original RF were evaluated internally with out-of-bag error, but the final model was only assessed with classification measures. There is no mention of parameter tuning. The final prediction tool does not correspond to the multivariable model.
Overall No At least one domain is at high risk of bias.
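The Pisani 2021 entry describes a random survival forest evaluated with Harrell's c-index. The sketch below fits a random survival forest with scikit-survival and computes Harrell's c on a held-out random split, using synthetic data; the study's minimal-depth predictor selection and out-of-bag evaluation are not reproduced here, and all parameter values are illustrative.

```python
# Illustrative sketch: random survival forest with Harrell's c-index on a
# held-out split (scikit-survival), on synthetic data. Not the authors' code.
import numpy as np
from sklearn.model_selection import train_test_split
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored
from sksurv.util import Surv

rng = np.random.default_rng(0)
n, p = 262, 12
X = rng.normal(size=(n, p))
time = rng.exponential(10 * np.exp(-0.5 * X[:, 0]))
event = rng.random(n) < 0.26                      # roughly 69/262 events
y = Surv.from_arrays(event=event, time=time)      # structured survival outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

rsf = RandomSurvivalForest(n_estimators=300, min_samples_leaf=10,
                           random_state=0).fit(X_tr, y_tr)

risk = rsf.predict(X_te)                          # higher value = higher predicted risk
cindex = concordance_index_censored(y_te["event"], y_te["time"], risk)[0]
print("Harrell's c-index on the held-out split:", round(cindex, 3))
```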

Roca 2020.

Study characteristics
General information Model name
Aggregated model
Primary source
Journal
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
  • Patients with MS with initial FLAIR MRI and EDSS score at 2 years


Exclusion criteria
Not reported
Recruitment
Subset of the OFSEP (Observatoire français de la sclérose en Plaques) registry from 37 institutions in 13 French cities, France
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
Not reported
Diagnostic criteria
Not applicable
Treatment
Unclear
Disease description
Not reported
Recruitment period
From 2008 onward
Predictors Considered predictors
  • Non‐tabular: FLAIR images, lesion masks from White Matter Hyperintensities segmentation from FLAIR images

  • Tabular: 60 tracts of interest from the ICBM‐DTI 81 white matter labels and sensorimotor tracts atlases in MNI space, whole‐brain lesion load, volume of the lateral ventricles, age, gender, 3D/2D nature of FLAIR sequence


Number of considered predictors
Non‐tabular data + 65
Timing of predictor measurement
At FLAIR imaging (initial in the dataset)
Predictor handling
Unclear, probably continuously
Outcome Outcome definition
Disability (EDSS): EDSS score
Timing of outcome measurement
At 2 years from the initial imaging
Missing data Number of participants with any missing value
19
Missing data handling
Not reported
Analysis Number of participants (number of events)
1427 (continuous outcome)
Modelling method
Ensemble: convolutional neural network (linear and non‐linear registration), random forest (single and dual), and manifold learning
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Not reported
Shrinkage of predictor weights
Unclear
Performance evaluation dataset
Development
Performance evaluation method
Random split of approximately 1/3 for test set, further random split of remaining data (90% training, 10% validation)
Calibration estimate
Plot of MSE per EDSS category, MSE 2.21 (validation), 3 (test)
Discrimination estimate
Not applicable
Classification estimate
Not applicable
Overall performance
Not reported
Risk groups
Not applicable
Model  Model presentation
Not reported
Number of predictors in the model
Unstructured data + 65
Predictors in the model
  • Non‐tabular: FLAIR images, lesion masks from White Matter Hyperintensities segmentation from FLAIR images

  • Tabular: 60 tracts of interest from the ICBM‐DTI 81 white matter labels and sensorimotor tracts atlases in MNI space, whole‐brain lesion load, volume of the lateral ventricles, age, gender, 3D/2D nature of FLAIR sequence


Effect measure estimates
Not reported
Predictor influence measure
Most informative features by RF variable importance
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To create an algorithm that combines multiple machine‐learning techniques to predict the expanded disability status scale (EDSS) score of patients with multiple sclerosis at 2 years solely based on age, sex and fluid‐attenuated inversion recovery (FLAIR) MRI data
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Using additional factors such as baseline EDSS score, including quantitative metrics coming from T1‐weighted‐based segmentation, a larger cohort or oversampling of high EDSS score examples or generating synthetic data, further validated on an external larger test cohort
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to provide a model/tool for the prediction of individual MS outcomes.
Auxiliary references
Vukusic S, Casey R, Rollot F, Brochet B, Pelletier J, Laplaud DA, et al. Observatoire Francais de la Sclerose en Plaques (OFSEP): a unique multimodal nationwide MS registry in France. Mult Scler 2020;26(1):118‐22.
NCT02889965. The French multiple sclerosis registry (OFSEP). https://clinicaltrials.gov/ct2/show/NCT02889965(first received 7 September 2016).
 
Item Authors' judgement Support for judgement
Participants No The data, although collected prospectively in a registry, were known to have inclusion biases. The full dataset (DS1, DS2, DS3) corresponded to all the MRI scans that were recorded in the OFSEP database, meaning inclusion was defined by availability of data.
Predictors No The final features are based on heterogeneously collected imaging data. It was unclear whether outcomes were known when features were created, but we did not believe this to be a source of bias.
Outcome Yes The outcome was based on EDSS, which we assume to be standard and robust to predictor knowledge. The visit frequency was reported to be about yearly.
Analysis No Calibration was not fully explored and reported (the bar chart and MSE did not allow for an understanding of the direction of the errors). Missing data were addressed in the Participant section. An additional 19 people were dropped due to data quality, but this number was very small (~1%) compared to the total amount. Random splits of the data were used for evaluation. Hyperparameter tuning details were unclear. The number of participants per predictor may be low given the complex modelling techniques used. The internal evaluation used the validation set to weight the models in the aggregate and then again to assess performance of this model. Presentation of the final model was unclear.
Overall No At least one domain is at high risk of bias.
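The Analysis judgement for Roca 2020 notes that reporting only the MSE per EDSS category does not reveal the direction of the errors. The short pandas sketch below summarises predicted-versus-observed EDSS errors per category, including the signed mean error; the category boundaries and synthetic predictions are illustrative, not the study's.

```python
# Illustrative sketch: summarising prediction errors per observed EDSS
# category, including the signed mean error (direction of miscalibration),
# not just the MSE. Synthetic predictions only.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
observed = np.clip(np.round(rng.gamma(2.0, 1.5, 500) * 2) / 2, 0, 9.5)   # EDSS in 0.5 steps
predicted = np.clip(observed + rng.normal(-0.3, 1.2, 500), 0, 10)        # slight underprediction

df = pd.DataFrame({"observed": observed, "error": predicted - observed})
summary = (df.groupby(pd.cut(df["observed"], bins=[0, 2, 4, 6, 10], right=False))
             .agg(n=("error", "size"),
                  mse=("error", lambda e: np.mean(e ** 2)),
                  mean_error=("error", "mean")))
print(summary)
```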

Rocca 2017.

Study characteristics
General information Model name
15‐month clinical and MR
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • PPMS or probable PPMS (negative CSF examination and positive MRI findings)

  • Patients participated in the study by Rovaris 2006


Exclusion criteria
Not reported
Recruitment
Consecutively from the outpatient populations attending MS clinics of the participating institutions, unclear which, Italy
Age (years)
Mean 51.3
Sex (%F)
50.0
Disease duration (years)
Median (range) 10 (2 to 26)
Diagnosis
100% PPMS
Diagnostic criteria
Thompson 2000
Treatment
  • At recruitment, 16.7% azathioprine, 7.4% mitoxantrone, 9.3% methotrexate, 66.7% no DMT

  • During follow‐up, unclear, at 15 years, 13% azathioprine, 3.7% mitoxantrone, 1.9% methotrexate, 81.5% no DMT


Disease description
EDSS median (IQR): 6.0 (4.5 to 6.5)
Recruitment period
Not reported
Predictors Considered predictors
Age, log disease duration, baseline EDSS, baseline MS severity score, change in EDSS at 15 months, log baseline T2 lesion volume, T2 lesion volume percentage change, log baseline T1 lesion volume, T1 lesion volume percentage change, number of new T2 lesions, number of new T1 lesions, normalised brain volume, percentage brain volume change, cervical cord cross‐sectional area, cervical cord cross‐sectional area percentage change, average lesion mean diffusivity, average lesion mean diffusivity percentage change, average lesion fractional anisotropy, average lesion fractional anisotropy percentage change, average normal‐appearing white matter mean diffusivity, average normal‐appearing white matter mean diffusivity percentage change, average normal‐appearing white matter fractional anisotropy, average normal‐appearing white matter fractional anisotropy percentage change, average grey matter mean diffusivity, average grey matter mean diffusivity percentage change, (in another model: change in EDSS at 5 years)
Number of considered predictors
26
Timing of predictor measurement
At study baseline (cohort entry), at median 15 months after baseline, and at median 56 months (called 5 years) after baseline
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): EDSS change between baseline and at 15‐year follow‐up; any EDSS change is always confirmed by a second visit after a further 3 months
Timing of outcome measurement
Median (IQR): 15.1 years (13.9 years to 15.4 years)
Missing data Number of participants with any missing value
5, only missing outcome reported
Missing data handling
  • Complete case

  • Not explicitly reported

Analysis Number of participants (number of events)
49 (continuous outcome)
Modelling method
Linear regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, hybrid stepwise selection using multiple models

    • P value < 0.1


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, LOOCV
Calibration estimate
Not reported
Discrimination estimate
Not applicable
Classification estimate
EDSS change precision within 1 point = 0.776
Overall performance
R2 = 0.61
Risk groups
Not reported
Model  Model presentation
Regression coefficients without the intercept
Number of predictors in the model
5
Predictors in the model
Baseline EDSS, 15‐month EDSS change, 15‐month new T1 hypointense lesions, percentage brain volume change, baseline grey matter mean diffusivity
Effect measure estimates
Linear model coefficients (P value): baseline EDSS −0.54 (< 0.001), 15‐month EDSS change 0.39 (0.09), 15‐month new T1 hypointense lesions 0.28 (0.003), percentage brain volume change −0.24 (0.05), baseline grey matter mean diffusivity 3.86 (0.03)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To investigate the added value of magnetic resonance imaging measures of brain and cervical cord damage in predicting long‐term clinical worsening of primary progressive multiple sclerosis compared to simple clinical assessment
Primary aim
The primary aim of this study is not the prediction of individual outcomes. Rather, the focus is on the usefulness of MRI measures.
Model interpretation
Exploratory
Suggested improvements
To widen clinical measures and include further MRI measures
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of new MRI measures.
Auxiliary references
Rovaris M, Gallo A, Valsasina P, Benedetti B, Caputo D, Ghezzi A, et al. Short‐term accrual of gray matter pathology in patients with progressive multiple sclerosis: an in vivo study using diffusion tensor MRI. Neuroimage 2005;24(4):1139‐46.
Rovaris M, Judica E, Gallo A, Benedetti B, Sormani MP, Caputo D, et al. Grey matter damage predicts the evolution of primary progressive multiple sclerosis at 5 years. Brain 2006;129(Pt 10):2628‐34.
 
Item Authors' judgement Support for judgement
Participants Yes The data source was a cohort study collected with the aim of searching for imaging predictors, but no eligibility criteria were mentioned other than the diagnosis. It was explicitly stated that inclusion did not depend on disease duration, progression rate, or disability level.
Predictors Unclear The stated interest was in predicting 15-year outcomes, but the data used were collected up to 15 months after baseline, which brought the prediction window down to less than 14 years. The intended time of model use is unclear.
Outcome Unclear The outcome was based on EDSS, which is considered to be measured objectively. It was conceptualised as a change in EDSS and treated as a score that can be subtracted, and the change was treated as a continuous outcome in a linear regression. However, EDSS is not a numeric scale. It is accepted to be an ordinal scale, and it is unclear if treating it as numeric is appropriate or not.
Analysis No The 5 participants (> 10% of the sample size) lost to follow-up were excluded from the analysis, without any mention of how they compared to other patients. The number of events per predictor was far lower than 10. Modelling the change in EDSS linearly, without any interaction terms, assumes it follows a normal distribution, although EDSS is considered an ordinal rather than a linear scale and it was not reported whether this assumption was violated. Although cross-validation was used, it is unclear whether the variable selection process was included within this procedure. There is no predicted versus observed plot or a similar measure of calibration.
Overall No At least one domain is at high risk of bias.
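The Rocca 2017 entry reports leave-one-out cross-validation of a linear model for EDSS change, with R² and the proportion of predictions within 1 EDSS point. Below is a minimal scikit-learn sketch of that evaluation scheme on synthetic data; the predictor selection step, which the judgement notes may not have been nested within the cross-validation, is omitted here.

```python
# Illustrative sketch: leave-one-out cross-validation of a linear regression
# for EDSS change, reporting R^2 and the proportion of predictions within
# 1 EDSS point. Synthetic data; not the authors' code.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n, p = 49, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(0, 1.0, n)   # stand-in for EDSS change

pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())

print("LOOCV R^2:                ", round(r2_score(y, pred), 2))
print("Proportion within 1 point:", round(float(np.mean(np.abs(pred - y) <= 1)), 3))
```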

Rovaris 2006.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • PPMS patients


Exclusion criteria
  • Other neurological conditions


Recruitment
Consecutively from the outpatient populations attending MS clinics of the participating institutions, unclear which, Italy
Age (years)
Mean 51.3
Sex (%F)
50.0
Disease duration (years)
Median 10 (range: 2 to 26)
Diagnosis
100% PPMS, 45 definite, 9 probable
Diagnostic criteria
Thompson 2000
Treatment
  • At recruitment, 16.7% azathioprine, 7.4% mitoxantrone, 9.3% methotrexate, and 66.7% no DMT

  • During follow‐up, unclear, at final follow‐up, 13% azathioprine, 3.7% mitoxantrone, 1.9% methotrexate, and 81.5% no DMT


Disease description
EDSS median (range): 5.5 (2.5 to 7.5)
Recruitment period
Not reported
Predictors Considered predictors
Age, gender, disease duration, EDSS, baseline T2 LV, T2 LV percent change, baseline T1 LV, T1 LV percent change, number of new T2 lesions, number of new T1 lesions, normalised brain volume, brain volume percent change, cervical cord cross-sectional area, cervical cord cross-sectional area percent change, average lesion mean diffusivity, average lesion MD percent change, average lesion fractional anisotropy, average lesion fractional anisotropy percent change, average normal-appearing white matter mean diffusivity, average normal-appearing white matter mean diffusivity percent change, average normal-appearing white matter fractional anisotropy, average normal-appearing white matter fractional anisotropy percent change, average grey matter mean diffusivity, average grey matter mean diffusivity percent change, (adjustment for follow-up time)
Number of considered predictors
25
Timing of predictor measurement
At study baseline (cohort entry), at 15 months post‐baseline (follow‐up), at final follow‐up (outcome measurement)
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): clinically worsened defined as an EDSS score increase ≥ 1.0, when baseline EDSS was < 6.0, or an EDSS score increase ≥ 0.5, when baseline EDSS was ≥ 6.0; confirmed by a second visit after a 3‐month interval
Timing of outcome measurement
Follow‐up median (range): 56.0 months (35 months to 63 months)
Missing data Number of participants with any missing value
≤ 11, unclear exactly how many participants have any missing
Missing data handling
Complete case, the details are not explicitly reported
Analysis Number of participants (number of events)
52 (35)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, significance

    • P value < 0.05


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, LOOCV
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 0.808, sensitivity = 31/35 = 0.89, specificity = 11/17 = 0.65
Overall performance
Nagelkerke's R2 = 0.44
Risk groups
Not reported
Model  Model presentation
Regression coefficients without intercept and follow‐up time
Number of predictors in the model
2 or 3 (unclear if follow‐up time included)
Predictors in the model
Baseline EDSS, grey matter mean diffusivity, follow‐up
Effect measure estimates
OR (95% CI): baseline EDSS 0.48 (0.26 to 0.91), average grey matter mean diffusivity 1.21 (1.06 to 1.38), follow‐up not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To investigate whether conventional and DT‐MRI‐derived measures can predict the long‐term clinical evolution of PP multiple sclerosis
Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on the usefulness of MRI measures.
Model interpretation
Exploratory
Suggested improvements
Not reported
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the usefulness of MRI measures.
Auxiliary references
Rovaris M, Gallo A, Valsasina P, Benedetti B, Caputo D, Ghezzi A, et al. Short‐term accrual of gray matter pathology in patients with progressive multiple sclerosis: an in vivo study using diffusion tensor MRI. Neuroimage 2005;24(4):1139‐46.
 
Item Authors' judgement Support for judgement
Participants Yes The data source was a cohort study, and the data seem to have been collected with the aim of searching for imaging predictors, but no eligibility criteria were mentioned other than the diagnosis. It was specifically stated that inclusion did not depend on disease duration, progression rate, or disability level.
Predictors Yes The predictors in the final model were based on baseline measurements. Due to the prospective nature of data collection and automated analysis of images, predictors are considered to be assessed without knowledge of outcome data. Both automated MR analysis and EDSS measurements are considered to be objective.
Outcome Yes Even though the same physician assessed EDSS, EDSS is considered to be an objective measure. There was approximately a full 2‐year range in outcome assessment time, but the clinical authors did not find this problematic.
Analysis No The EPV was very low. Predictor selection started with univariate analyses. Neither calibration nor discrimination was addressed for the final model. Cross‐validation was used, but it did not cover all modelling steps. Final model coefficients were provided for EDSS and average GM MD, but not for follow‐up time. At least 6 participants (> 10%) had missing data, and complete case analysis was probably used. The model was adjusted for follow‐up time, which we consider to be an inappropriate use of post‐baseline data.
Overall No At least one domain is at high risk of bias.
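The Rovaris 2006 entry reports Nagelkerke's R² as the overall performance measure of the logistic model. As a reminder of how this statistic is obtained from the fitted and null log-likelihoods, a short sketch using statsmodels on synthetic data; the data and predictors are illustrative only.

```python
# Illustrative sketch: Nagelkerke's R^2 for a logistic regression, computed
# from the log-likelihoods of the fitted and null models (statsmodels).
# Synthetic data; not the authors' code.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 52
X = sm.add_constant(rng.normal(size=(n, 2)))      # e.g. baseline EDSS, GM mean diffusivity
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + X[:, 1] - 0.8 * X[:, 2]))))

fit = sm.Logit(y, X).fit(disp=0)
ll_model, ll_null = fit.llf, fit.llnull

cox_snell = 1 - np.exp(2 / n * (ll_null - ll_model))   # Cox & Snell R^2
nagelkerke = cox_snell / (1 - np.exp(2 / n * ll_null)) # rescaled to a 0-1 maximum
print("Nagelkerke's R^2:", round(nagelkerke, 3))
```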

Runia 2014.

Study characteristics
General information Model name
Not applicable
Primary source
Dissertation
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • CIS suggestive of MS

  • Age between 18 years and 50 years

  • Included within 6 months after symptom onset

  • No serious comorbidity


Exclusion criteria
  • Alternative diagnoses


Recruitment
Consecutive patients at the Rotterdam MS Centre, Netherlands
Age (years)
Unclear
Sex (%F)
72.9
Disease duration (years)
Up to 0.5
Diagnosis
100% CIS
Diagnostic criteria
Own definition
Treatment
Not reported
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Age (unclear if linear or 3 categories), sex, optic nerve (binary), fatigue, presence of first‐ or second‐degree relatives with MS, abnormal MRI (1 or more lesions), number of T2 lesions (0 lesions/1 to 9 lesions/> 9 lesions), gadolinium enhancement, presence of a lesion in the corpus callosum, modified Barkhof criteria (at least 3 of 4 criteria fulfilled), Swanton criteria, DIS + DIT2010 (the baseline scan fulfils criteria for dissemination in time and place according to the 2010 revised McDonald criteria (Polman 2011)), IgG index, presence of oligoclonal bands, serum 25‐OH‐vitamin D (fatigue as continuous and localisation of first symptoms as optic nerve, spinal cord, brainstem, or other were chosen to be included otherwise due to 'discriminating ability')
Number of considered predictors
≥ 16 or 21 (unclear transformations)
Timing of predictor measurement
At disease onset (CIS) (at study baseline within 6 months after onset)
Predictor handling
All categorised or dichotomised in Table 2/FSS (justified by comparison to the continuous version based on discriminative ability) and number of T1 lesions dichotomised, number of T2 lesions categorised/unclear: age categorised, 25‐OH‐vitamin D dichotomised, IgG Index dichotomised
Outcome Outcome definition
Conversion to definite MS (Poser 1983): time from start of first symptoms to CDMS diagnosed in case of clinical evidence for dissemination in space and time
Timing of outcome measurement
Unclear, up to > 90 months
Missing data Number of participants with any missing value
≥ 356, unclear exactly how many participants have any missing
Missing data handling
Mixed: complete case for outcome, multiple imputation for predictors
Analysis Number of participants (number of events)
431 (109 by 2 years)
Modelling method
Survival, Cox
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, stepwise selection

    • Backward


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic:
  • Raw: 0.72

  • Optimism‐corrected: 0.71

  • Simple model with 3 groups: 0.66


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
3 risk categories from the sum score: low (0 to 1), intermediate (2 to 3), and high (4 to 5)
Model  Model presentation
  • Unweighted sum score from 0 to 5

  • Regression model without baseline survival, risk groups and KM plots


Number of predictors in the model
5
Predictors in the model
DIS + DIT2010, corpus callosum lesions, oligoclonal bands, fatigue, abnormal MRI
Effect measure estimates
HR (95% CI): DIS + DIT2010 2.2 (1.4 to 3.3), corpus callosum lesions 1.9 (1.2 to 2.9), oligoclonal bands 1.7 (1.1 to 2.6), fatigue 2.3 (1.4 to 3.9), abnormal MRI 2.3 (0.9 to 6.0)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To develop a simple and reliable prediction model for MS in patients with CIS
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
External validation
Notes Applicability overall
Low
 
Item Authors' judgement Support for judgement
Participants Yes The study used data collected for the Predicting the Outcome of a Demyelinating Event (PROUD) study protocol, and the inclusion/exclusion criteria were based on the baseline status.
Predictors Yes Predictors were collected before the outcome and therefore blinded, and all are available at onset.
Outcome Yes A standard definition of conversion to definite MS was used. The outcome was probably not blinded to predictors due to the clinical setting, but the outcome is considered relatively objective. The predictors dissemination in time (DIT) and dissemination in space (DIS) in McDonald include MRI, but the Poser does not, so the predictors were not included in the outcome definition.
Analysis No The EPV was low. Patients lost to follow‐up, with no reported reason or comparison with the remaining cohort, were excluded even though it was a survival analysis. Calibration was not assessed. Bootstrap methods were used to account for optimism but probably did not include the whole modelling process. Predictors were selected based on univariate analyses. Many continuous predictors seem to be categorised, although the reason is not very clear.
Overall No At least one domain is at high risk of bias.
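The Runia 2014 entry presents the model as an unweighted sum score (0 to 5) over five binary predictors, grouped into low, intermediate, and high risk with Kaplan-Meier curves. Below is a minimal sketch of deriving such a sum score and plotting survival by risk group with lifelines; the data are simulated and only the grouping cut-offs mirror the entry above.

```python
# Illustrative sketch: unweighted sum score over 5 binary predictors, grouped
# into low/intermediate/high risk, with Kaplan-Meier curves per group
# (lifelines). Synthetic data; not the authors' code.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(0)
n = 431
predictors = pd.DataFrame(rng.binomial(1, 0.4, size=(n, 5)),
                          columns=["dis_dit", "cc_lesion", "ocb", "fatigue", "abnormal_mri"])
score = predictors.sum(axis=1)                         # unweighted sum score, 0 to 5
group = pd.cut(score, bins=[-1, 1, 3, 5], labels=["low", "intermediate", "high"])

# simulate conversion times that shorten with a higher score
time = rng.exponential(8 * np.exp(-0.4 * score))
event = rng.random(n) < 0.6

ax = plt.subplot(111)
for g in ["low", "intermediate", "high"]:
    mask = (group == g).to_numpy()
    KaplanMeierFitter().fit(time[mask], event[mask], label=g).plot_survival_function(ax=ax)
plt.xlabel("Years since CIS onset")
plt.ylabel("Free of CDMS")
plt.show()
```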

Seccia 2020.

Study characteristics
General information Model name
  • 180 days

  • 360 days

  • 720 days


Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • Visits of patients with RR or SP as disease subtype


Exclusion criteria
  • Visits after the transition of the patient to the SP phase


Recruitment
Sant’Andrea University hospital in Rome, Italy
Age (years)
Mean 29.0 (onset)
Sex (%F)
69.8
Disease duration (years)
Mean 19.0
Diagnosis
100% RRMS
Diagnostic criteria
Latest criteria at time of diagnosis
Treatment
Unclear timing, 73% on DMT at some point
Disease description
Not reported
Recruitment period
1985 to 2018
Predictors Considered predictors
Longitudinal trajectories of age at onset, gender, age at visit, EDSS, number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs, feature saving: status T1, status T2, oligoclonal banding
Number of considered predictors
21 predictor trajectories
Timing of predictor measurement
At multiple visits comprising patient history to the current visit of interest
Predictor handling
Continuously
Outcome Outcome definition
Conversion to progressive MS: transition from the RR to the SP phase within 180 days as assessed by the treating clinician
Timing of outcome measurement
  • 180 days: 180 days from the index visit

  • 360 days: 360 days from the index visit

  • 720 days: 720 days from the index visit

Missing data Number of participants with any missing value
0
Missing data handling
Exclusion of 3 variables with missing values
Analysis Number of participants (number of events)
  • 180 days: 1515 participants, 14,923 records (207)

  • 360 days: 1449 participants, 14,238 records (207)

  • 720 days: 1375 participants, 13,178 records (207)


Modelling method
Long short‐term memory recurrent neural network
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Number of neurones chosen through trial and error procedure, dropout probability set to 0.2
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Random split, train‐test splits preserving outcome proportions with balance‐inducing bagging
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • 180 days: cutoff = 0.5, accuracy = 0.98, sensitivity = 0.385, specificity = 0.988, PPV = 0.308

  • 360 days: cutoff = 0.5, accuracy = 0.975, sensitivity = 0.50, specificity = 0.982, PPV = 0.295

  • 720 days: cutoff = 0.5, accuracy = 0.98, sensitivity = 0.673, specificity = 0.985, PPV = 0.427


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
18 predictor trajectories
Predictors in the model
Longitudinal trajectories of age at onset, gender, age at visit, EDSS, number relapses from last visit, pregnancy, relapses frequency, time from last relapse, spinal cord, supratentorial, optic pathway, brainstem‐cerebellum, relapse treatment drugs, first‐line DMT, immunosuppressant, MS symptomatic treatment drugs, second‐line DMT, other concomitant diagnosis drugs
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To explore the possibility of predicting whether a patient will pass from RR to SP phase in a given time window, using a real‐world dataset, built in close collaboration between computer experts and neurologists
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
Using the LSTM model with different endpoints that are less unbalanced, using large and well maintained clinical databases
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to provide a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No The data source was routine medical records, and there were no eligibility criteria other than the MS subtype.
Predictors No The data were collected between 1978 and 2018. The first patient entering analysis was seen in 1985, probably due to the missingness of predictors prior to that time. Due to changing diagnostic criteria and technology, predictors such as age at onset, T1/T2 status, and treatment options are expected to be heterogeneous over time.
Outcome No The outcome was SPMS from a routine care database, which is expected to be not standardised or operationalised. Given that the diagnostic criteria changed over time, the outcome definition is expected to be somewhat different over time.
Analysis No The sample size and number of events were low. No discrimination or calibration measures were assessed. Many participants were dropped in the feature‐saving analysis, but here we focused on the record‐saving analysis. For computational reasons, a random split was used for assessment. There was no separation of data used for parameter tuning and data used to estimate performance in future patients. A final model did not appear to be selected, fitted, and presented.
Overall No At least one domain is at high risk of bias.
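The Seccia 2020 entry describes a long short-term memory (LSTM) network trained on per-visit predictor trajectories to classify conversion to SPMS within a given window. Below is a minimal PyTorch sketch of this general architecture type, with padded visit sequences feeding an LSTM whose final hidden state drives a binary classifier; the layer sizes, class weighting, and training loop are illustrative assumptions, not the authors' configuration.

```python
# Illustrative sketch: an LSTM over padded per-visit feature sequences with a
# binary output, the general architecture type described in the entry above.
# Synthetic data; layer sizes and training details are not taken from the study.
import torch
import torch.nn as nn

class VisitLSTM(nn.Module):
    def __init__(self, n_features=21, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                         # x: (batch, n_visits, n_features)
        _, (h_n, _) = self.lstm(x)                # h_n: (1, batch, hidden)
        return self.head(h_n[-1]).squeeze(-1)     # logits, shape (batch,)

# Synthetic batch: 64 patients, up to 10 visits, 21 features per visit
x = torch.randn(64, 10, 21)
y = (torch.rand(64) < 0.1).float()                # rare outcome, as in the study

model = VisitLSTM()
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(9.0))  # crude imbalance handling
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(5):                                # a few illustrative training steps
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

prob = torch.sigmoid(model(x))
print("predicted probabilities (first 5):", prob[:5].detach().numpy().round(3))
```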

Skoog 2014.

Study characteristics
General information Model name
MSPS
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • RRMS patients with at least 1 distinct second attack that confirmed the diagnosis of MS according to the Poser criteria


Exclusion criteria
  • Patients with a single‐attack progressive MS or a second attack in the year of onset of SP


Recruitment
Medical records from the Sahlgrenska Neurology Department and outpatient clinic, the only neurological service in the Gothenburg area, Sweden
Age (years)
Mean 33.5
Sex (%F)
65.0
Disease duration (years)
Median 2
Diagnosis
100% RRMS
Diagnostic criteria
Poser 1983
Treatment
0%
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Age at onset attack, current age (spline), gender, time from the second attack, number of previous attacks, monofocal symptoms at onset attack, afferent symptoms at onset attack, complete remission from the onset attack, monofocal symptoms at the most recent attack, afferent symptoms at the most recent attack, complete remission from the most recent attack, time since the most recent attack, severity grade of attack (0 to 2, number of unfavourable 'no' responses to afferent symptoms and complete remission), interaction term between the attack grade and the interval between the most recent attack and current time
Number of considered predictors
≥ 15 (unclear transformations)
Timing of predictor measurement
At last relapse, at time of prognostication
Predictor handling
  • Continuously, unclear: as linear splines

  • At least one interaction was considered

Outcome Outcome definition
Conversion to progressive MS (Lublin 1996): continuous progression for at least 1 year without remission and detectable at time intervals of months or years, determined retrospectively after 1 year of observation and recorded the probable year of onset retrospectively; observation terminated at onset of secondary progression, at censoring due to competing causes of death, other disabling diseases, migration or the end of follow‐up; time since RRMS onset
Timing of outcome measurement
Time from the first relapse to censoring or outcome median (range): 11.5 years (0.7 years to 56.7 years)
Missing data Number of participants with any missing value
171 attacks; unclear exactly how many participants had any missing value
Missing data handling
Mixed, complete case for attacks, and regression methods for loss to follow‐up
Analysis Number of participants (number of events)
157 (118); unit of analysis is participants, with 749 attacks contributing (see the EPV sketch after this table)
Modelling method
Survival, Poisson
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, significance

    • P value < 0.05


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
O:E table
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Low‐risk periods: score < 0.04, high‐risk periods: score > 0.06
Model  Model presentation

Number of predictors in the model
3 (4 df)
Predictors in the model
Age, attack grade, time since last relapse (interaction with attack grade)
Effect measure estimates
log HR (SE): constant −11.5081 (4.0138), lower age predictor 0.3167 (0.1507), upper age predictor −0.0199 (0.0088), attack grade 0.7164 (0.1467), attack grade × time since last relapse −0.0457 (0.0158)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To search for independent demographic and clinical factors that contributed to the risk of transition to SP and to simplify these complex relationships into a continuous individualised prediction based on repeated assessments expressed as a clinically and scientifically useful score
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
Investigation and replication in an independent patient cohort, taking into account the therapy
Notes Applicability overall
Low
Auxiliary references
Runmarker B, Andersen O. Prognostic factors in a multiple sclerosis incidence cohort with twenty‐five years of follow‐up. Brain 1993;116 (Pt 1):117‐34.
Skoog B, Runmarker B, Winblad S, Ekholm S, Andersen O. A representative cohort of patients with non‐progressive multiple sclerosis at the age of normal life expectancy. Brain 2012;135(Pt 3):900‐11.
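The risk of bias assessment below notes a low events‐per‐variable (EPV) ratio for this model. A minimal worked check of that arithmetic, using only the counts reported above (118 events and at least 15 candidate predictor parameters); the threshold of 10 is a common rule of thumb rather than anything stated in this review:

```python
def events_per_variable(n_events: int, n_candidate_parameters: int) -> float:
    """Events-per-variable: number of events divided by candidate predictor parameters."""
    return n_events / n_candidate_parameters

# Counts from the table above; candidate parameters are reported as ">= 15",
# so the true EPV can only be lower than the value computed here.
epv = events_per_variable(n_events=118, n_candidate_parameters=15)
print(f"EPV <= {epv:.1f}")  # about 7.9, below the conventional threshold of 10
```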
 
Item Authors' judgement Support for judgement
Participants No The exclusion criteria included a second attack in the year of SP onset, but this criterion was not explained. The authors described the data source as a cohort, but the diagnoses and other categorisations probably had to be performed retrospectively; it is therefore unclear whether this was truly a cohort study.
Predictors Yes Although recruitment lasted 14 years and the median time to event/censoring exceeded 11 years, the data were from a single centre and the predictor definitions seem clear. Hence, assessment is considered to have been similar amongst patients. The predictor assessments might have been performed retrospectively, but the definitions seem clear, leaving no room for subjective judgement.
Outcome No The outcome was clearly defined but it is not an operationalised one. Assessors, experience level, and blinding were not explicitly reported, hence the application of this definition might vary greatly.
Analysis No The EPV was low whether assessed in terms of a binary outcome or continuous one. No information on missing data was provided. All the reported measures were evaluated in the full development set. Discrimination and optimism were not addressed.
Overall No At least one domain is at high risk of bias.

Skoog 2019.

Study characteristics
General information Model name
  • Val

  • Ext Val


Primary source
Journal
Data source
  • Val: cohort, primary

  • Ext Val: registry, secondary


Study type
  • Val: validation (internal validation ‐ some participants from the development excluded)

  • Ext Val: external validation, multiple (location, time)

Participants Inclusion criteria
  • All patients with RRMS that fulfilled the Poser criteria


Exclusion criteria
  • Patients with SP occurring before the second distinct attack


Recruitment
  • Val: medical records from the Sahlgrenska Neurology Department and outpatient clinic, the only neurological service in the Gothenburg area, Sweden

  • Ext Val: patients in the Swedish National MS Registry participating from the Uppsala University Neurology Department, Sweden


Age (years)
Mean 33.0 (CDMS onset, i.e. 2nd attack)
Sex (%F)
  • Val: 65.0

  • Ext Val: 76.0


Disease duration (years)
  • Val: median 2

  • Ext Val: not reported


Diagnosis
100% RRMS
Diagnostic criteria
Poser 1983
Treatment
  • Val: 0%

  • Ext Val:

    • At recruitment, 0%

    • During follow‐up, unclear, few patients received first‐generation DMT (IFN‐beta or glatiramer acetate), 99 out of 1762 patient‐years


Disease description
Not reported
Recruitment period
  • Val: not reported

  • Ext Val: up to 2000

Predictors Considered predictors
Not applicable
Number of considered predictors
Not applicable
Timing of predictor measurement
Not applicable
Predictor handling
Not applicable
Outcome Outcome definition
Conversion to progressive MS (Lublin 1996): continuous progression for at least 1 year, without remission, and detectable at time intervals of months or years, determined retrospectively after one year of observation and recorded the probable year of onset retrospectively
Timing of outcome measurement
  • Val: yearly for 25 years starting January 1st after the clinically defining attack; KM estimate of median time to outcome from the 2nd attack (95% CI): 11.5 years (9.2 years to 13.8 years)

  • Ext Val: yearly for 25 years starting January 1st after the clinically defining attack; KM estimate of median time to outcome from the 2nd attack (95% CI): 15.0 years (10.9 years to 19.1 years)

Missing data Number of participants with any missing value
  • Val: ≤ 12, unclear exactly how many participants had any missing value

  • Ext Val: ≤ 27, unclear exactly how many participants had any missing value


Missing data handling
Not reported
Analysis Number of participants (number of events)
  • Val: 144 (100)

  • Ext Val: 145 (54)


Modelling method
Not applicable
Predictor selection method
Not applicable
Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Not applicable
Performance evaluation dataset
  • Val: development

  • Ext Val: external validation


Performance evaluation method
  • Val: apparent, some development participants excluded

  • Ext Val: not applicable


Calibration estimate
  • Val: calibration plot, O:E table, O:E 0.829

  • Ext Val: calibration plot, O:E table, O:E 0.599


Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
For calibration measures: periods with predetermined MSPS strata < 0.025, 0.025 to 0.05, 0.05 to 0.075, 0.075 to 0.10, 0.10 to 0.125, > 0.125 (simplified to < 0.05, 0.05 to 0.075, 0.075 to 0.10, > 0.10)
Model  Model presentation
  • Val: (original model) × recalibration ratio (0.829)

  • Ext Val: (original model) × recalibration ratio (0.599) (see the recalibration sketch after this table)


Number of predictors in the model
Not applicable
Predictors in the model
Not applicable
Effect measure estimates
Not applicable
Predictor influence measure
Not applicable
Validation model update or adjustment
Recalibration
Interpretation  Aim of the study
To validate this model with an essentially untreated Swedish cohort
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
Demonstrating generalisability in non‐Swedish cohorts collected with different methods, considering DMT use
Notes Applicability overall
Low
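The calibration entries above are observed‐to‐expected (O:E) ratios, and the model presentation row describes the updated model as the original model multiplied by that ratio. Below is a minimal sketch of this multiplicative recalibration on made‐up numbers; the arrays are placeholders, not study data, and for a rate‐like score such as the MSPS the multiplication is direct, whereas for probabilities close to 1 it would only be approximate.

```python
import numpy as np

def oe_ratio(observed_events: float, predicted: np.ndarray) -> float:
    """Observed-to-expected ratio: observed events divided by the sum of model predictions."""
    return observed_events / predicted.sum()

# Placeholder predictions from an "original model" and a placeholder observed event count.
predicted = np.array([0.20, 0.35, 0.50, 0.65, 0.70])
observed = 2.0

ratio = oe_ratio(observed, predicted)   # < 1 means the original model over-predicts
recalibrated = predicted * ratio        # "(original model) x recalibration ratio"
print(round(ratio, 3), recalibrated.round(3))
```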
 
Item Authors' judgement Support for judgement
Participants No Val: The exclusion criteria included a second attack in the year of SP onset, but this criterion was not explained. The authors described the data source as a cohort, but the diagnoses and other categorisations probably had to be performed retrospectively; it is therefore unclear whether this was truly a cohort study.
Ext Val: The data source was a registry for which patient data were entered retrospectively. Also, the diagnoses and other categorisations probably needed to be performed retrospectively.
Predictors Yes The data were from a single centre and the predictor definitions seem to be clear. Hence, the assessment is considered to be similar amongst patients. The predictor assessments might have been performed retrospectively, but the definitions seem to be clear, leaving no space for subjective judgement.
Outcome No The outcome was clearly defined but it is not an operationalised one. Assessors, experience level, and blinding were not explicitly reported, hence the application of this definition might vary greatly.
Analysis No Val: Missing values and how they were treated were not clearly discussed. Discrimination was not addressed.
Ext Val: The number of events in the validation was low. Missing data were not clearly discussed. Discrimination was not addressed.
Overall No At least one domain is at high risk of bias.

Sombekke 2010.

Study characteristics
General information Model name
Outcome dichotomous MSSS, predictors clinical + genetics
Primary source
Journal
Data source
Unclear, secondary
Study type
Development
Participants Inclusion criteria
  • Confirmed diagnosis of MS

  • Availability of DNA and clinical assessment of disability

  • (Unclear) unrelated Dutch Caucasian


Exclusion criteria
Not reported
Recruitment
Natural history studies at the MS Centre of the VU University Medical Centre in Amsterdam, Netherlands
Age (years)
Mean 32.4 (onset)
Sex (%F)
63.8
Disease duration (years)
Mean 13.1 (SD 8.3)
Diagnosis
51.2% RRMS, 31.4% SPMS, 17.4% PPMS
Diagnostic criteria
Mixed: Poser 1983, McDonald 2005 (Polman 2005)
Treatment
Not reported
Disease description
EDSS median (IQR): 4.0 (3.5)
Recruitment period
Not reported
Predictors Considered predictors
Gender, onset type, age at onset, SNPs (69)
Number of considered predictors
72
Timing of predictor measurement
At baseline (already available or retrospectively collected)
Predictor handling
Age continuously, SNPs categorised
Outcome Outcome definition
Disability (MSSS): MSSS ≥ 2.5; MSSS denotes the speed of disability accumulation of an individual patient compared with a large patient cohort
Timing of outcome measurement
Not reported
Missing data Number of participants with any missing value
Not reported
Missing data handling
Exclusion
Analysis Number of participants (number of events)
605 (86)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, stepwise selection

    • Backward


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap (B = 500), unclear if for optimism correction or just confidence intervals
Calibration estimate
Hosmer‐Lemeshow test
Discrimination estimate
c‐Statistic = 0.78 (95% CI 0.75 to 0.84)
Classification estimate
Sensitivity = 0.37, specificity = 0.953, LR+ = 7.9
Overall performance
Nagelkerke's R2 = 0.219
Risk groups
Not reported
Model  Model presentation
Regression coefficients without intercept (see the sketch after this table)
Number of predictors in the model
9 (13 df)
Predictors in the model
Age at onset, male gender, progressive onset type, NOS2 level, PITPNC1 level, IL2 level, CCL5 level, IL1RN level, PNMT level
Effect measure estimates
OR (95% CI): age at onset 1.05 (1.02 to 1.08), male gender 2.02 (1.14 to 3.57), progressive onset type 4.69 (1.32 to 16.63), NOS2 level AG 0.53 (0.32 to 0.89), NOS2 level AA 0.24 (0.09 to 0.67), PITPNC1 level AG 0.45 (0.27 to 0.75), PITPNC1 GG 0.59 (0.18 to 1.95), IL2 level GT 0.39 (0.22 to 0.70), IL2 level TT 0.38 (0.17 to 0.84), CCL5 level CT 2.04 (1.12 to 3.70), CCL5 level TT 1.47 (0.38 to 5.67), IL1RN level CT/TT 0.60 (0.36 to 0.99), PNMT level GG 0.52 (0.29 to 0.92)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To evaluate the additional prognostic value of genetic information of a DNA chip, containing a set of candidate genes, previously correlated to MS (either susceptibility or phenotypes) over available demographics and clinical characteristics, aiming to improve the prediction of the expected disease severity for future patients
Primary aim
The primary aim of this study is, to some extent, the prediction of individual outcomes. The focus is on the prognostic value of genetic data.
Model interpretation
Exploratory
Suggested improvements
Test on patients with longer disease duration, use SNPs assessed during the GWAS era, include MRI parameters, yet‐to‐be‐discovered genes, environmental factors
Notes Applicability overall
Low
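The model above is presented as regression coefficients (odds ratios) without an intercept, as flagged in the model presentation row. The sketch below, using a few of the reported odds ratios and a made‐up intercept, shows the mechanics involved: the odds ratios fix only the relative part of a logistic model, so no absolute predicted probability can be computed until an intercept is supplied.

```python
import math

# A subset of the odds ratios reported above (age at onset is per year; the others are binary).
odds_ratios = {"age_at_onset": 1.05, "male_gender": 2.02, "progressive_onset": 4.69}
coefficients = {name: math.log(orr) for name, orr in odds_ratios.items()}

def predicted_probability(x: dict, intercept: float) -> float:
    """Logistic prediction p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    lp = intercept + sum(coefficients[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-lp))

patient = {"age_at_onset": 40, "male_gender": 1, "progressive_onset": 0}
# The study does not report an intercept; -3.0 below is a placeholder, not an estimate.
print(round(predicted_probability(patient, intercept=-3.0), 3))
```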
 
Item Authors' judgement Support for judgement
Participants No The data source on which the prediction model study relied is unclear. Some predictors were collected retrospectively, and the inclusion criteria required the availability of DNA and a clinical assessment of disability.
Predictors Yes Genetic data are not likely to be biased. Clinical data were simple and easy to collect. The predictors are objective measures and could be available at the time of model use.
Outcome Yes Assuming that MSSS is a relatively standard outcome, it accounts for the difference in time from disease onset in patients, and the outcome was collected at a single point in time.
Analysis No The EPV was less than 10. Only discrimination was assessed, and it is unclear how missing information was handled other than through the exclusion criteria addressed in the Participants section. Univariable analyses appear to have been used to select the predictors. It is unclear whether model overfitting and optimism in model performance were accounted for.
Overall No At least one domain is at high risk of bias.

Sormani 2007.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
Randomised trial participants, secondary
Study type
Development + external validation, spectrum
Participants Inclusion criteria
  • Age between 18 years and 50 years

  • Complete clinical and MRI data at baseline

  • Not treated with disease‐modifying agents during the study

  • Diagnosis of MS for at least 1 year

  • RR disease course

  • EDSS score of 0.0 to 5.0

  • At least one documented relapse in the preceding 2 years

  • At least one gadolinium‐enhancing lesion on their screening brain MRI

  • Relapse‐free and steroid‐free in the 30 days prior to inclusion into the study


Exclusion criteria
  • Dev:

    • Prior use of glatiramer acetate, oral myelin, cladribine, and total body irradiation or total lymphoid irradiation

    • Use of immunosuppressive drugs in the 12 months before study entry, or the use of interferons, intravenous immunoglobulins

    • More than 30 consecutive days of chronic steroid treatment, or participation in clinical studies of experimental drugs in the 6 months before study entry

    • Life‐threatening or unstable clinically significant disease, pregnant or lactating

    • Major current gastrointestinal disorders or use of medication that could cause major gastrointestinal disturbances

    • Medical or psychiatric conditions that could affect their ability to give informed consent

    • Sensitivity to gadolinium chelates or an inability to undergo MRI

  • Ext Val: not reported


Recruitment
  • Dev:

    • Placebo arm participants in the CORAL, an RCT run in 158 centres worldwide

    • Argentina, Australia, Austria, Belgium, Canada, Denmark, France, Germany, Hungary, Israel, Italy, Netherlands, New Zealand, Spain, Sweden, Switzerland, UK, USA

  • Ext Val:

    • Placebo arm participants in European/Canadian GA study from 29 centres

    • Europe (undefined), Canada


Age (years)
  • Dev: median 37.0

  • Ext Val: median 34.0


Sex (%F)
Not reported
Disease duration (years)
  • Dev: median 5.9 (range 0.6 to 30)

  • Ext Val: median 3.8 (range 0.5 to 22)


Diagnosis
100% RRMS
Diagnostic criteria
Poser 1983
Treatment
  • Dev:

    • At recruitment, 0%

    • During follow‐up, 0.6% on DMT

  • Ext Val:

    • 0%


Disease description
  • Dev: EDSS median (range): 2.0 (0.0 to 5.0), prior 2‐year number of relapses (range): 2 (1 to 11)

  • Ext Val: EDSS median (range): 2.0 (0.0 to 4.0), prior 2‐year number of relapses (range): 2 (1 to 8)


Recruitment period
  • Dev: 2000 to 2001

  • Ext Val: 1997 to 1998

Predictors Considered predictors
  • Dev: age at onset, disease duration, prior 2‐year relapses, EDSS, Gd‐enhancing lesions, Gd‐enhancing lesion volume, T2‐hyperintense lesion volume, T1‐hypointense lesion volume

  • Ext Val: not applicable


Number of considered predictors
  • Dev: ≥ 12 (unclear transformations)

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at study baseline (RCT, entry at least 1 year after disease onset)

  • Ext Val: not applicable


Predictor handling
  • Dev: continuously except testing square‐root and dichotomised transformations in the final model

  • Ext Val: not applicable

Outcome Outcome definition
Relapse: time of first relapse occurrence defined as appearance of one or more new neurological symptoms or the reappearance of one or more previously experienced neurological symptoms; neurological deterioration had to last at least 48 hours and be preceded by a relatively stable or improving neurological state in the prior 30 days; the symptoms had to be accompanied by objective changes in the neurological examination corresponding to an increase of at least 0.5 points on the EDSS, or one grade in the score of 2 or more functional systems or 2 grades in 1 functional system; deterioration associated with fever or infections that can cause transient, secondary impairment of neurological function or change in bowel, bladder, or cognitive function alone was not accepted as a relapse
Timing of outcome measurement
  • Dev: follow‐up median (range): 14 months (0.4 months to 16 months), time to outcome from study entry mean (SD): 47 weeks (0.9 weeks)

  • Ext Val: follow‐up median (range): 9 months (2.6 months to 10 months), time to outcome from study entry mean (SD): 26 weeks (1.4 weeks)

Missing data Number of participants with any missing value
9, not explicitly reported in this report
Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • Dev: 539 (unclear, approximately 270)

  • Ext Val: 117 (not reported)
Modelling method
  • Dev: survival, Cox

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, univariable analysis

    • During multivariable modelling, significance

      • P value < 0.01

  • Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: none

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: apparent

  • Ext Val: not applicable


Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
  • Dev: according to the score distribution low risk (score below the 95th percentile) and high risk (score above the 95th percentile) of relapse occurrence, score at 95th percentile = 1.84

  • Ext Val: according to cutoffs from development set (score = 1.84)

Model  Model presentation
  • Dev: regression model formula with survival probability for 6 months and 1 year

  • Ext Val: not applicable


Number of predictors in the model
  • Dev: 2

  • Ext Val: not applicable


Predictors in the model
  • Dev: previous 2 years relapses, number of enhancing lesions

  • Ext Val: not applicable


Effect measure estimates
  • Dev: log HR: square root of previous 2‐year relapses 0.64, square root of number of enhancing lesions 0.26; baseline survival: 0.92 at 6 months, 0.86 at 1 year (see the scoring sketch after this table)

  • Ext Val: not applicable


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: none

Interpretation  Aim of the study
To generate and validate a composite (clinical and MRI‐based) score able to identify individual patients with relapsing‐remitting multiple sclerosis (RRMS) with a high risk of experiencing relapses in the short term
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
A validation in natural history cohorts (but this is not feasible because current patients are treated)
Notes Applicability overall
Low
Auxiliary references
Comi G, Filippi M, Wolinsky JS. European/Canadian multicentre, double‐blind, randomized, placebo‐controlled study of the effects of glatiramer acetate on magnetic resonance imaging‐‐measured disease activity and burden in patients with relapsing multiple sclerosis. European/Canadian glatiramer acetate study group. Ann Neurol 2001;49(3):290‐7.
Filippi M, Wolinsky JS, Comi G. Effects of oral glatiramer acetate on clinical and MRI‐monitored disease activity in patients with relapsing multiple sclerosis: a multicentre, double‐blind, randomised, placebo‐controlled study. Lancet Neurol 2006;5(3):213‐20.
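The development model above combines two predictors through square‐root transformations, with baseline survival reported at 6 months (0.92) and 1 year (0.86) and a high‐risk cutoff at a score of 1.84. Below is a minimal sketch of how such a Cox‐based score is typically turned into a relapse‐free probability, assuming the reported baseline survival corresponds to a linear predictor of zero (the centering used in the original model is not stated in this table).

```python
import math

def sormani_style_score(relapses_2yr: int, gd_lesions: int) -> float:
    """Linear predictor: 0.64*sqrt(prior 2-year relapses) + 0.26*sqrt(Gd-enhancing lesions)."""
    return 0.64 * math.sqrt(relapses_2yr) + 0.26 * math.sqrt(gd_lesions)

def relapse_free_probability(lp: float, baseline_survival: float) -> float:
    """Standard Cox relation S(t | x) = S0(t) ** exp(lp), assuming S0 refers to lp = 0."""
    return baseline_survival ** math.exp(lp)

lp = sormani_style_score(relapses_2yr=2, gd_lesions=3)
print(round(lp, 2))                                  # about 1.36; 'high risk' only if > 1.84
print(round(relapse_free_probability(lp, 0.92), 2))  # about 0.72 relapse-free at 6 months
print(round(relapse_free_probability(lp, 0.86), 2))  # about 0.56 relapse-free at 1 year
```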
 
Item Authors' judgement Support for judgement
Participants Yes Although participants with missing predictor measurements were excluded from the current study, they probably comprised only a small percentage (< 5%) of the eligible population from the original RCT cohort.
Predictors Yes Dev: The predictors appear to be collected at baseline. The predictors were not explicitly named in the text but table 2 consists of predictors entering the univariable and multivariable analysis. The data source is an RCT, so assessment is assumed to be similar across patients.
Val: The data from this trial were collected using different MRI machines of various strengths, but contrast‐enhancing lesions should be robust to the use of different machines.
Outcome Yes The details of the outcome definition were not explicitly reported in the prediction model study but can be found in the RCT. We expect the outcome to be standardised and determined appropriately due to the data source. The outcome may or may not be determined with the knowledge of predictors, but the outcome is considered an objective one.
Analysis No Dev: Variable selection began with univariable analysis. No discrimination or calibration measures were reported. Although external validation was done, there was no indication of model shrinkage or other attempts at addressing overfitting and optimism.
Val: The number of events was not reported but was expected to be at most 56.5. No relevant performance measures were reported.
Overall No At least one domain is at high risk of bias.

Spelman 2017.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • MS patients with CIS

  • Onset less than 12 months from enrolment

  • Minimum data collection at each visit (EDSS, KFSS, relapse onset date, glucocorticoid therapy for relapse, initiation and discontinuation dates for DMTs collected at each visit)

  • Annual follow‐up

  • At least 1 EDSS recorded within 12 months of onset but not included within 30 days of onset

  • First brain MRI scan classification using Barkhof–Tintore criteria for lesion dissemination in space within 12 months of onset


Exclusion criteria
  • PPMS


Recruitment
Fifty MS clinics participating in MSBase Incident Study (MSBASIS), a substudy within MSBase registry
Age (years)
Median 31.6 (at MS onset)
Sex (%F)
70.5
Disease duration (years)
Up to 1 year
Diagnosis
100% CIS
Diagnostic criteria
Poser 1983
Treatment
  • At recruitment, not reported

  • During follow‐up, 11.8% IM‐IFNβ‐1a, 9.3% SC‐IFNβ‐1a, 5.1% IFNβ‐1b, 3.8% glatiramer acetate (adds up to 30% not 27.6%)


Disease description
EDSS median (IQR): 2 (1 to 2.5)
Recruitment period
From 2004 onward
Predictors Considered predictors
Sex, age at onset, EDSS, first symptom location (categorical with optic pathways as reference, supratentorial, brainstem or spinal cord), T1 gadolinium lesions (binary), T2 hyperintense lesions (3 levels), infratentorial lesions (binary), juxtacortical (binary), periventricular (3 levels), number of spinal T1 gadolinium lesions (binary), number of spinal T2 lesions (binary), oligoclonal bands (binary), (unclear adjustment for country)
Number of considered predictors
≥ 16 (unclear how many interactions tested)
Timing of predictor measurement
At disease onset (CIS) (up to 12 months after disease onset)
Predictor handling
  • Age and EDSS continuously, predictors based on number of lesions dichotomised or categorised and also continuously, (for unclear predictors) linearity tested by incorporating quadratic transformations into the model

  • At least one interaction considered

Outcome Outcome definition
Conversion to definite MS (Poser 1983): time to first relapse following CIS, i.e. CDMS, defined as examination evidence of a symptomatic second neurological episode attributable to demyelination of more than 24 hours duration and more than 4 weeks from the initial attack; follow‐up time was defined as the time that lapsed between the date of CIS onset (baseline) and either the date of first post‐CIS relapse or, where no subsequent post‐CIS relapse was observed, the date of the last recorded clinic visit
Timing of outcome measurement
Follow‐up median (IQR): 1.92 years (0.90 years to 3.71 years)
Missing data Number of participants with any missing value
≤ 1017; unclear how many of the exclusions were due to missing data
Missing data handling
Exclusion
Analysis Number of participants (number of events)
3296 (1953)
Modelling method
Survival, Cox
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, significance

    • P value < 0.05


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Unclear
Performance evaluation dataset
Development
Performance evaluation method
Bootstrap (B = 1500) (see the optimism‐correction sketch after this table)
Calibration estimate
Calibration plot
Discrimination estimate
  • At 6 months, c‐statistic = 0.76

  • At 1 year, c‐statistic = 0.81

  • At 2 years, c‐statistic = 0.81

  • At 3 years, c‐statistic = 0.82

  • At 4 years, c‐statistic = 0.83

  • At 5 years, c‐statistic = 0.83


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
  • Nomogram for 1‐year outcomes

  • Nomograms for 6‐month, 2, 3, 4, and 5‐year outcomes


Number of predictors in the model
7 (11 df)
Predictors in the model
Sex, age, EDSS, first symptom location, T2 Infratentorial lesions, T2 periventricular lesions, OCB in CSF
Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To examine determinants of second attack and validate a prognostic nomogram for individualised risk assessment of clinical conversion
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
External validation with a larger sample, more patients with 0 T2 lesions
Notes Applicability overall
Unclear
Applicability overall rationale
It is unclear whether some patients had already experienced the outcome at the time of predictor collection.
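Performance above was evaluated with bootstrap resampling (B = 1500), and the assessment below notes that it is unclear whether optimism was corrected. The sketch below illustrates the usual bootstrap optimism correction for a discrimination measure; it uses simulated data, an ordinary logistic model, and the standard c‐statistic as a stand‐in for the Cox model and time‐dependent c‐statistics reported in the study, so nothing in it reproduces the study's analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                      # simulated predictors (placeholder data)
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))    # outcome driven by the first predictor

model = LogisticRegression().fit(X, y)
apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])

# Harrell-style optimism: refit on each bootstrap sample, compare its apparent
# performance with its performance on the original data, and average the difference.
optimism = []
for _ in range(200):                               # 200 resamples for speed; the study used 1500
    idx = rng.integers(0, len(y), len(y))
    Xb, yb = X[idx], y[idx]
    if yb.min() == yb.max():                       # skip degenerate resamples with one class only
        continue
    mb = LogisticRegression().fit(Xb, yb)
    boot_apparent = roc_auc_score(yb, mb.predict_proba(Xb)[:, 1])
    boot_test = roc_auc_score(y, mb.predict_proba(X)[:, 1])
    optimism.append(boot_apparent - boot_test)

corrected = apparent - np.mean(optimism)
print(round(apparent, 3), round(corrected, 3))     # the corrected value is the honest estimate
```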
 
Item Authors' judgement Support for judgement
Participants No Although the data source is appropriate, the eligibility criteria required both baseline predictor measurement and regular follow‐up, which may introduce risk of bias.
Predictors No Due to the prospective collection of the data, predictors were probably assessed without knowledge of outcomes, and because only baseline variables were used, all predictors should be available at the intended time of prediction. No information is reported on whether predictors were defined and assessed in a similar way for all patients. In particular, the imaging predictors from the multiple centres in many countries participating in the MSBase registry are likely to introduce risk of bias.
Outcome Unclear We consider relapses to be a relatively objective outcome; therefore, we believe that assessment with knowledge of predictor information does not increase the risk of bias. However, the predictors were collected within 12 months of onset, and according to the survival curves, a substantial proportion of patients (between 0.2 and 0.7) may already have had the event at the time of predictor collection.
Analysis No Some continuous predictors were categorised with only 2 to 3 levels. Over 1000 enrolled patients were excluded from the study without any description of the reasons or of how they differed from those included; thus, it is unclear whether complete case analysis was appropriate. The authors mentioned adjusting for country, but the methods were not described, so it is unclear whether hierarchical models, a categorical predictor, or some other approach was used. The method of arriving at the weights in the nomogram, and whether any optimism correction was done, are unclear.
Overall No At least one domain is at high risk of bias.

Szilasiová 2020.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, secondary
Study type
Development
Participants Inclusion criteria
  • Diagnosis of MS based on the revised 2001 McDonald criteria (McDonald 2001)

  • Being older than 18 years

  • The ability to give written informed consent


Exclusion criteria
  • Major hypacusis or deafness

  • Relapse or corticosteroid use within 30 days preceding the study assessments


Recruitment
Department of Neurology of Louis Pasteur University Hospital in Kosice, Slovak Republic
Age (years)
Unclear
Sex (%F)
64.7
Disease duration (years)
Mean 6.7 (range 0.5 to 30)
Diagnosis
63.5% RRMS, 29.4% SPMS, 7.1% PPMS
Diagnostic criteria
McDonald 2001
Treatment
Reported for the original cohort of 110 patients, unclear timing: 64.7% interferon‐beta and 35.3% some DMT
Disease description
EDSS mean (SD, range): 3.03 (1.5, 1.0 to 7.0)
Recruitment period
2003 to 2018
Predictors Considered predictors
Age, sex, disease duration, EDSS, MS form (SP vs R or P), P300 latency, P300 amplitude, lesion load (# T2 lesions), education (primary, secondary, university)
Number of considered predictors
11
Timing of predictor measurement
At study baseline (cohort entry)
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): clinically worsened defined as an EDSS ≥ 5
Timing of outcome measurement
15 years
Missing data Number of participants with any missing value
25
Missing data handling
Complete case
Analysis Number of participants (number of events)
85 (not reported)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • Backward


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
Unclear due to mismatch between ROC curve and reported statistics: 0.94 (95% CI 0.889 to 0.984)
Classification estimate
Unclear because these values do not correspond to a point on the plot; sensitivity = 0.94, specificity = 0.89
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Full regression model
Number of predictors in the model
6 (7 df)
Predictors in the model
Sex, age, MS form, EDSS, MS duration, P300 latency (ms)
Effect measure estimates
OR (95% CI): sex: 0.17 (0.02 to 1.295), age: 0.87 (0.74 to 1.040), RRMS: 3,156,828,983.597 (0.000 to NA), PMS: 751,474,054.21 (0.000 to NA), EDSS: 3.06 (1.028 to 9.139), MS duration: 1.21 (1.007 to 1.451), P300 latency (ms): 1.06 (1.008 to 1.110), constant: 0.0 (NA to NA) (see the note on these extreme estimates after this table)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To determine whether ERPs (event‐related potentials) have prognostic significance for a patient’s future disability
Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on the usefulness of ERPs.
Model interpretation
Probably exploratory
Suggested improvements
Not reported
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to identify predictors. Additionally, this study included participants who had already experienced the outcome at baseline.
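Some odds ratios above are implausibly large (for example 3,156,828,983.597 with a confidence interval of 0.000 to NA), a pattern that is often produced by (quasi‐)complete separation: when a predictor splits the outcome classes perfectly, the likelihood keeps improving as its coefficient grows, so the maximum likelihood estimate has no finite optimum and software reports an arbitrarily huge odds ratio with an unusable confidence interval. The toy example below, unrelated to the study data, simply demonstrates that behaviour.

```python
import numpy as np

# Toy data with complete separation: x = 1 always has the outcome, x = 0 never does.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def log_likelihood(beta0: float, beta1: float) -> float:
    """Bernoulli log-likelihood of a logistic model with one binary predictor."""
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

for beta1 in (1, 5, 10, 20):
    # Under separation the log-likelihood keeps increasing as beta1 grows without bound.
    print(beta1, round(log_likelihood(-beta1 / 2, beta1), 4))
```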
 
Item Authors' judgement Support for judgement
Participants No The data source is a cohort study. However, according to Table 1, the EDSS range at study entry was 1.0 to 7.0, which means participants who had already reached the outcome at entry were included in the analysis.
Predictors Yes This was a single‐centre study with well described procedures for electrophysiological predictor collection. The other predictors were standard and/or easy to assess. Predictors were assessed at study entry.
Outcome Yes The outcome was based on an EDSS landmark and was assessed at 15‐year follow‐up. Predictor information was probably known, but we consider EDSS to be a robust outcome measure.
Analysis No The sample size was too low and the number of events was not reported. Participants lost to follow‐up were excluded from the analysis instead of being accounted for in time‐to‐event analysis. Calibration was not assessed. Shrinkage was not applied and only apparent performance measures were reported.
Overall No At least one domain is at high risk of bias.

Tacchella 2018.

Study characteristics
General information Model name
  • 180 days

  • 360 days

  • 720 days


Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • RRMS at the time of the visit(s) included in the database, with transition to the SP phase at some time point


Exclusion criteria
Not reported
Recruitment
Outpatients of the MS service of Sant'Andrea hospital in Rome, Italy
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2017 (Thompson 2018b)
Treatment
Unclear timing and distribution, 89.3% on DMTs, 43% on first‐line treatments, 57% on second‐line treatments
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, timed 25‐foot walk, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score
Number of considered predictors
46
Timing of predictor measurement
At visit of interest
Predictor handling
Continuously
Outcome Outcome definition
Conversion to progressive MS: SP stage defined as a history of gradual worsening following the initial RR course determined by objective measure of change of disability (EDSS score) independent of relapses over a period of at least 6 or 12 months
Timing of outcome measurement
  • 180 days: at 180 days after visit of interest

  • 360 days: at 360 days after visit of interest

  • 720 days: at 720 days after visit of interest

Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
  • 180 days: 527 (65); unit of analysis is visits, from 84 participants

  • 360 days: 527 (125); unit of analysis is visits, from 84 participants

  • 720 days: 527 (211); unit of analysis is visits, from 84 participants


Modelling method
Random forest
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Default parameters of SciKit library
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, (a form of) LOOCV (see the grouped cross‐validation sketch after this table)
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic:
  • 180 days: 0.71 (95% CI 0.66 to 0.76)

  • 360 days: 0.67 (95% CI 0.62 to 0.71)

  • 720 days: 0.68 (95% CI 0.64 to 0.72)


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
46
Predictors in the model
Age at onset, cognitive impairment, age at visit, visual impairment, routine visit, visual field deficits/scotoma, suspected relapse, diplopia (oculomotor nerve palsy), ambulation index, nystagmus, 9HPT (right), trigeminal nerve impairment, 9HPT (left), hemifacial spasm, PASAT, dysarthria, timed 25‐foot walk, dysphagia, impairment in daily living activities, facial nerve palsy, upper‐limb motor deficit, lower‐limb motor deficit, upper limb ataxia, lower limb ataxia, dysaesthesia, hypoesthesia, paraesthesia, Lhermitte's sign, urinary dysfunction, fatigue, bowel dysfunction, mood disorders, sexual dysfunction, tremor, ataxic gait, headache, paretic gait, spastic gait, EDSS, pyramidal functions, cerebellar functions, brainstem functions, sensory functions, bowel and bladder function, visual function, cerebral (or mental) functions, ambulation score
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To obtain predictions on the probability that MS patients in the RR phase will convert to a SP form within a certain time frame
Primary aim
The primary aim of this study is, to some extent, the prediction of individual outcomes. The focus is on collective intelligence rather than individual prediction.
Model interpretation
Exploratory
Suggested improvements
(For hybrid model) to investigate the best ways to combine predictions of different agents, to recruit more expert opinions
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
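The analysis above pools 527 visits from 84 participants and evaluates performance with a form of leave‐one‐out cross‐validation. One way to keep all visits of a participant on the same side of every split, and so avoid leakage between training and test folds, is grouped cross‐validation. The sketch below uses scikit‐learn (which the hyperparameter row suggests was the library used) with LeaveOneGroupOut and a random forest on simulated data; the data, feature count, and forest settings are placeholders, not the study's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict

rng = np.random.default_rng(1)
n_patients, visits_per_patient = 30, 6
groups = np.repeat(np.arange(n_patients), visits_per_patient)   # patient ID for every visit
X = rng.normal(size=(len(groups), 10))                           # placeholder visit-level features
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))                  # placeholder binary outcome

# Leave-one-patient-out: each fold holds out all visits of a single patient.
pred = cross_val_predict(
    RandomForestClassifier(n_estimators=200, random_state=0),
    X, y, cv=LeaveOneGroupOut(), groups=groups, method="predict_proba",
)[:, 1]
print(round(roc_auc_score(y, pred), 3))
```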
 
Item Authors' judgement Support for judgement
Participants No Routine care data were used with no reported inclusion/exclusion criteria other than diagnostic subtype.
Predictors Yes The study was conducted on data from a single centre, and data were collected according to international standards.
Outcome Yes The outcome was defined based on gradual increase in EDSS.
Analysis No 180 days: Calibration was not assessed. Parameter and model tuning was not described. The number of events was low. The final model is unclear.
360 days and 720 days: Calibration was not assessed. Parameter and model tuning was not described. The number of events was low. There is no indication that the model was fit to the entire dataset to produce a final model.
Overall No At least one domain is at high risk of bias.

Tommasin 2021.

Study characteristics
General information Model name
Radiological
Primary source
Journal
Data source
Unclear, secondary
Study type
Development
Participants Inclusion criteria
  • Diagnosis of MS according to McDonald's criteria (2010 (Polman 2011), 2017 (Thompson 2018b))

  • Between 18 years and 70 years of age

  • Clinical assessment and MRI examination not more than 1 month apart

  • Clinical follow‐up available after a minimum of 2 years from MRI examination


Exclusion criteria
Not reported
Recruitment
The Human Neuroscience Department of Sapienza University, the MS centre of the Federico II University, Italy
Age (years)
Mean 39.7
Sex (%F)
63.8
Disease duration (years)
Mean 9.9 (SD 8.06)
Diagnosis
74.8% RRMS, 25.2% PMS
Diagnostic criteria
Mixed: McDonald 2010 (Polman 2011), McDonald 2017 (Thompson 2018b)
Treatment
Unclear timing, 32.5% first line, 39.9% 2nd line, 27.6% none
Disease description
EDSS median (range): 3.0 (0.0 to 7.5)
Recruitment period
2003 to 2018
Predictors Considered predictors
Clinical: disease duration, age, sex, disease phenotype, EDSS at baseline, therapy, time‐to‐follow‐up; radiological: mean diffusivity of normal appearing WM, GM volume, WM volume, T2 lesion load, cerebellar volume, thalamic volume, fractional anisotropy of normal appearing WM, site, random feature
Number of considered predictors
16
Timing of predictor measurement
At assessment (not defined), at follow‐up
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): disability progression defined as a minimum increase in EDSS since baseline of 1.5 (baseline EDSS 0), 1.0 (baseline EDSS ≤ 5.5), or 0.5 (baseline EDSS > 5.5)
Timing of outcome measurement
Follow‐up mean (SD, range): 3.93 years (0.95 years, 2 years to 6 years)
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
163 (58)
Modelling method
Random forest
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, multiple models


Hyperparameter tuning
Not reported
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 1000 random splits (only those with accuracy difference < 0.02 between training and validation considered)
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.92
Classification estimate
Accuracy = 0.92, sensitivity = 0.92, specificity = 0.91
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of predictors (model selected)
Number of predictors in the model
4
Predictors in the model
T2 lesion load, cerebellar volume, thalamic volume, fractional anisotropy of normal appearing WM
Effect measure estimates
Not reported
Predictor influence measure
Feature importance (percentage of classifiers in which the predictor was more important than a random feature) (see the random‐probe sketch after this table)
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To evaluate the accuracy of a data‐driven approach, such as machine learning classification, in predicting disability progression in MS
Primary aim
The primary aim of this study is, to some extent, the prediction of individual outcomes. The focus is on imaging and machine learning.
Model interpretation
Exploratory
Suggested improvements
Prospective studies to evaluate other aspects of brain involvement, as well as other CNS structures (e.g. spinal cord) using additional techniques (e.g. fMRI, MTR, qMRI)
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
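Predictor influence above is summarised as the percentage of classifiers in which a predictor was more important than a deliberately added random feature. Below is a minimal sketch of that random‐probe idea on simulated data; the feature names, data, number of refits, and forest settings are placeholders, not the study's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n = 200
informative = rng.normal(size=(n, 2))                 # two features related to the outcome
noise = rng.normal(size=(n, 2))                       # two unrelated features
random_probe = rng.normal(size=(n, 1))                # the reference "random feature"
X = np.hstack([informative, noise, random_probe])
y = (informative[:, 0] + 0.5 * informative[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

names = ["informative1", "informative2", "noise1", "noise2"]
beats_probe = np.zeros(len(names))
n_fits = 50
for seed in range(n_fits):                            # repeated fits on bootstrap-style resamples
    idx = rng.integers(0, n, n)
    rf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X[idx], y[idx])
    imp = rf.feature_importances_
    beats_probe += imp[:-1] > imp[-1]                 # did each predictor outrank the probe?

for name, count in zip(names, beats_probe):
    print(name, f"{100 * count / n_fits:.0f}%")       # informative features should be near 100%
```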
 
Item Authors' judgement Support for judgement
Participants No The data source is unclear. Participants were included based on availability of follow‐up data at least 2 years later.
Predictors Unclear The final model contains radiological predictors that were assessed by trained experts at 2 centres at study entry. However, it is unclear if follow‐up time was included in the final model as a predictor.
Outcome No The timing of the outcome assessment was any time between 2 years and 6 years, making assessment different across patients.
Analysis No The sample size was small. No information on missing data was reported. It was unclear if the differing follow‐up time among the patients was appropriately accounted for. Only discrimination was assessed. It was not clear that the methods used optimally accounted for overfitting and optimism. A final model was not presented.
Overall No At least one domain is at high risk of bias.

Tousignant 2019.

Study characteristics
General information Model name
3D CNN + lesion masks
Primary source
Conference proceeding
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • RRMS

  • Placebo arm of trial


Exclusion criteria
  • Participants not completing study


Recruitment
Participants in 2 large proprietary, multi‐scanner, multi‐centre clinical trials (names not reported)
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
0%
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
MRI channels: volumes from T1‐weighted pre‐contrast (T1c), T1‐weighted post‐contrast (T1p), T2‐weighted (T2w), proton density‐weighted (PDw), fluid‐attenuated inversion recovery (FLAIR); T2‐weighted lesion masks; gadolinium‐enhanced lesion masks
Number of considered predictors
Non‐tabular data
Timing of predictor measurement
At imaging
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): increase in EDSS score within 1 year and sustained for ≥ 12 weeks (baseline EDSS of 0: increase of ≥ 1.5; baseline EDSS of 0.5 to 5.5: increase of ≥ 1; baseline EDSS of ≥ 6: increase of ≥ 0.5)
Timing of outcome measurement
1 year
Missing data Number of participants with any missing value
Not reported
Missing data handling
Exclusion
Analysis Number of participants (number of events)
1083 (103); unit of analysis is probably observations, from 465 participants
Modelling method
3D convolutional neural network (see the architecture sketch after this table)
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, multiple models


Hyperparameter tuning
Unclear, tuning parameters and cross‐validation mentioned, but not tuning details
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 4‐fold (75% training, 15% validation, 10% test)
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.701 (SD 0.027)
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of predictors (no selection)
Number of predictors in the model
Unstructured data
Predictors in the model
MRI channels: volumes from T1‐weighted pre‐contrast (T1c), T1‐weighted post‐contrast (T1p), T2‐weighted (T2w), proton density‐weighted (PDw), fluid‐attenuated inversion recovery (FLAIR); T2‐weighted lesion masks; gadolinium‐enhanced lesion masks
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To present the first automatic end‐to‐end deep learning framework for the prediction of future patient disability progression (1 year from baseline) based on multi‐modal brain magnetic resonance images (MRI) of patients with multiple sclerosis (MS)
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Alternative ways of quantifying uncertainty, adapting architecture to leverage longitudinal clinical information (e.g. age, disability stage)
Notes Applicability overall
High
Applicability overall rationale
The predictors used were imaging features and no other predictor domain was considered for use in the model.
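The model above is a 3D convolutional neural network taking seven image channels (five MRI contrasts plus two lesion masks). The sketch below shows a minimal 3D CNN of that input shape in PyTorch; the framework, layer sizes, and pooling choices are our assumptions for illustration and are not the architecture reported in the study.

```python
import torch
from torch import nn

class Tiny3DCNN(nn.Module):
    """Minimal 3D CNN: 7 input channels (T1c, T1p, T2w, PDw, FLAIR, T2 mask, Gd mask) -> 1 logit."""

    def __init__(self, in_channels: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                    # collapse the spatial dimensions
        )
        self.classifier = nn.Linear(32, 1)              # logit for 1-year EDSS worsening

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# Placeholder batch: 2 subjects, 7 channels, 32^3 voxels (real volumes would be larger).
logits = Tiny3DCNN()(torch.randn(2, 7, 32, 32, 32))
print(torch.sigmoid(logits).shape)                      # torch.Size([2, 1])
```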
 
Item Authors' judgement Support for judgement
Participants Unclear The data source comprised unspecified randomised clinical trials, but participants were excluded for incomplete follow‐up without any specification of the reasons or the number excluded.
Predictors Yes The predictors were collected during a clinical trial, so we expect them to be defined and assessed homogeneously. Several scanners across multiple sites were used, but standardisation across sites is mentioned. Expert raters were used in semi‐automated procedures.
Outcome Yes We consider standard EDSS outcomes rather objective. The outcome was assessed within clinical trials.
Analysis No The sample size was probably small (the highest possible number of events was 103, with 7 inputs and a very complex model). Discrimination was addressed but not calibration. It was unclear whether complexities in the data were appropriately addressed, as the analysis appeared to be at the visit level. It was unclear whether optimism in performance was accounted for, because the outcome assessment periods might overlap. No model was provided for future use.
Overall No At least one domain is at high risk of bias.

Vasconcelos 2020.

Study characteristics
General information Model name
  • Dev

  • Ext Val


Primary source
Journal
Data source
Unclear
Study type
Development + external validation, time
Participants Inclusion criteria
  • Definitive diagnosis of RRMS based on the Poser criteria for patients seen up to 2001 and the 2001 McDonald criteria for patients seen from 2002 onwards

  • Disease duration of at least 2 years

  • Available data on disease progression, with complete data provided in an equal interval of time


Exclusion criteria
  • Patients who had incomplete data on the longitudinal evolution of the disease


Recruitment
MS Centre of the Hospital da Lagoa in Rio de Janeiro, Brazil
Age (years)
  • Dev: mean 28.7 (onset)

  • Ext Val: mean 28.5 (onset)


Sex (%F)
  • Dev: 76.0

  • Ext Val: 78.5


Disease duration (years)
  • Dev: mean 16.0 (SD 9.42)

  • Ext Val: mean 13.22 (SD 9.72)


Diagnosis
100% RRMS
Diagnostic criteria
Mixed: Poser 1983, McDonald 2001
Treatment
  • Dev: unclear timing, 58% treated before EDSS 3

  • Ext Val: unclear timing, 77% treated before EDSS 3


Disease description
Patients with more than one relapse in the first year of disease: 74%
Recruitment period
1993 to 2017
Predictors Considered predictors
  • Dev: gender, > 1 relapse in the first year of the disease, pyramidal and cerebellar impairment at onset of the disease, treatment before reaching EDSS 3, < 30 years of age at onset, African descent, < 2 years between the first and second relapses, recovery after first relapse

  • Ext Val: not applicable


Number of considered predictors
  • Dev: 8

  • Ext Val: not applicable


Timing of predictor measurement
  • Dev: at multiple visits (unclear if CIS or RR onset) to at least 2 years post‐onset

  • Ext Val: not applicable


Predictor handling
  • Dev: all dichotomised

  • Ext Val: not applicable

Outcome Outcome definition
Conversion to progressive MS: time elapsed until the year of confirmed progressive and sustained worsening, lasting at least 6 months and not associated with an acute relapse, defined as an irreversible increase of at least 1.0 point in the EDSS when its value was ≤ 5.5, or 0.5 point when it was > 5.5 (independent of relapses and corticosteroid treatment)
Timing of outcome measurement
  • Dev: time to outcome mean (SD): 13.70 (8.88)

  • Ext Val: time to outcome mean (SD): 11.45 (7.40)

Missing data Number of participants with any missing value
  • Dev: 249

  • Ext Val: 250


Missing data handling
Exclusion
Analysis Number of participants (number of events)
  • Dev: 287 (88)

  • Ext Val: 142 (31)


Modelling method
  • Dev: survival, Cox

  • Ext Val: not applicable


Predictor selection method
  • Dev:

    • For inclusion in the multivariable model, univariable analysis

    • During multivariable modelling, significance

      • P value < 0.05

  • Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Dev: none

  • Ext Val: not applicable


Performance evaluation dataset
  • Dev: development

  • Ext Val: external validation


Performance evaluation method
  • Dev: apparent

  • Ext Val: not applicable


Calibration estimate
  • Dev: events per score level

  • Ext Val: O:E table (unclear), Hosmer‐Lemeshow test


Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
2 risk categories: high (> 2 points), low (≤ 2 points)
Model  Model presentation
  • Dev:

    • Unweighted sum score from 0 to 5 (unclear whether based on a refit without 'recovery', which was non‐significant in the multivariable analysis) (see the scoring sketch after this table)

    • Risk groups

  • Ext Val: not applicable


Number of predictors in the model
  • Dev:

    • 5, unclear if the coefficient for 'recovery' is needed or the model fit without 'recovery' is presented

  • Ext Val: not applicable


Predictors in the model
  • Dev: pyramidal and cerebellar impairment at onset of the disease, treatment before EDSS 3, age at disease onset, African descent, time between first and second relapses (unclear if the coefficient for 'recovery' is needed or the model fit without 'recovery' is presented)

  • Ext Val: not applicable


Effect measure estimates
  • Dev: HR (95% CI): pyramidal and cerebellar impairment at onset of the disease: 2.5 (1.2 to 5.1), treatment before EDSS 3 2.6 (1.6 to 4.2), age at disease onset 2.0 (1.2 to 3.1), African descent 1.8 (1.1 to 2.8), time between first and second relapses 1.9 (1.2 to 3.0), not results of entire model (unclear if the coefficient for 'recovery' is needed or the model fit without 'recovery' is presented)

  • Ext Val: not applicable


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Dev: not applicable

  • Ext Val: none

Interpretation  Aim of the study
To construct a clinical risk score for MS long‐term progression that could be easily applied in clinical practice
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Exploratory
Suggested improvements
Validation in different cohorts, especially those with greater diversity concerning the genetic background, and exploration of other factors capable of influencing disease progression (e.g. neuroimaging data)
Notes Applicability overall
Low
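The development model above is presented as an unweighted sum of five binary risk factors, with more than 2 points classed as high risk. Below is a minimal sketch of scoring a patient this way; the item names simply restate the predictors listed above, True is taken to mean the unfavourable level of each item, and whether a sixth 'recovery' item belongs in the score is unclear from the table and therefore omitted.

```python
def unweighted_sum_score(risk_factors: dict):
    """Count the unfavourable binary items; > 2 points is classed as high risk, otherwise low."""
    score = sum(int(bool(present)) for present in risk_factors.values())
    return score, "high" if score > 2 else "low"

patient = {
    "pyramidal_and_cerebellar_impairment_at_onset": True,
    "treatment_before_EDSS_3": False,
    "age_at_onset_factor": True,
    "african_descent": False,
    "short_interval_first_to_second_relapse": True,
}
print(unweighted_sum_score(patient))   # (3, 'high')
```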
 
Item Authors' judgement Support for judgement
Participants No The data source is unclear. Excluding participants without complete follow‐up may introduce selection bias. Also, at least 248 patients were excluded for missing data.
Predictors Unclear There is no reason to believe predictors were assessed differently across patients. Predictors were collected from onset up to at least 2 years later. It is not clearly stated at what point the model was applied and whether onset referred to CIS onset or RR onset.
Outcome Yes The outcome was based on observing an EDSS increase that was confirmed at a later time point. It is unclear whether the outcome was assessed blinded to the predictors, but we do not consider this to be problematic because EDSS assessment is relatively objective, and the definition required confirmation at 6 months. Participants have regular follow‐ups due to inclusion criteria, so assessment timing is likely homogenous.
Analysis No Dev: The EPV was 11, which is relatively low. Continuous variables such as age were treated as binary variables. Univariable predictor selection was used. Discrimination and calibration were not addressed properly. The statistical model was simplified into an unweighted sum score (by unclear rounding rules) without the performance of this simplified model being assessed. Besides the large number of participants excluded for irregularly timed data, only 1 participant was reported as being excluded after enrolment, which probably had little effect on results. Although an external validation set was reported, the need for shrinkage was not assessed.
Ext Val: The number of events was low, and discrimination was not addressed. Complete case analysis was used for enrolled participants, but this only led to a drop of 2 participants.
Overall No At least one domain is at high risk of bias.

Vukusic 2004.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • MS diagnosis for at least 1 year prior to conception

  • Pregnant for at least 4 weeks but less than 36 weeks at entry into the study

  • Full‐term delivery of live infant

  • First pregnancy in dataset (if multiple observed)

  • Follow‐up until at least delivery


Exclusion criteria
Not reported
Recruitment
  • PRIMS (The Pregnancy in Multiple Sclerosis) natural history study participants from multiple centres across Europe; 76% of the women were known to their neurologists before recruitment

  • Unclear (total PRIMS cohort): France, Austria, Belgium, Netherlands, Italy, Denmark, Spain, Germany, United Kingdom, Portugal, Switzerland, Ireland


Age (years)
Mean 30.0
Sex (%F)
100.0
Disease duration (years)
Mean 6 (SD 4)
Diagnosis
96% RRMS, 4% SPMS
Diagnostic criteria
Poser 1983
Treatment
  • At recruitment, 0%

  • During follow‐up, 1.8% azathioprine and 0.4% mitoxantrone


Disease description
DSS at beginning of pregnancy mean (SD): 1.3 (1.4), annualised relapse rate during the year before pregnancy (95% CI): 0.7 (0.6 to 0.8)
Recruitment period
1993 to 1995
Predictors Considered predictors
Number of relapses in pre‐pregnancy year, number of relapses during pregnancy, DSS at pregnancy onset, epidural analgesia (ref: no), breast‐feeding (ref: no), total number of relapses before pregnancy, disease duration, age at multiple sclerosis onset, age at pregnancy onset, number of previous pregnancies, child gender (ref: male)
Number of considered predictors
11
Timing of predictor measurement
At study baseline (cohort entry during pregnancy week 4 to 36), at examinations at 20, 28, 36 weeks of gestation, and also post‐partum
Predictor handling
Continuously
Outcome Outcome definition
Relapse: a post‐partum relapse, defined as the appearance, reappearance or worsening of symptoms of neurological dysfunction lasting > 24 hours; fatigue alone not considered as a relapse
Timing of outcome measurement
During 3 months after delivery
Missing data Number of participants with any missing value
≥ 17; unclear exactly how many participants had any missing value
Missing data handling
Complete case
Analysis Number of participants (number of events)
223 (63)
Modelling method
Logistic regression
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, unclear

    • Significance based on P value threshold implied


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.72
Classification estimate
Accuracy = 0.72 (cutoff = 0.5)
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Full regression model
Number of predictors in the model
3
Predictors in the model
Number of relapses in pre‐pregnancy year, number of relapses during pregnancy, MS duration
Effect measure estimates
OR (95% CI): number of relapses in pre‐pregnancy year 1.94 (1.32 to 2.80), number of relapses during pregnancy 1.87 (1.12 to 3.13), MS duration 1.11 (1.03 to 1.20)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To report the 2‐year post‐partum follow‐up and to analyse the factors predictive of relapse in the 3 months after delivery
Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on predictor identification.
Model interpretation
Exploratory
Suggested improvements
Not reported
Notes Applicability overall
Low
Auxiliary references
Confavreux C, Hutchinson M, Hours MM, Cortinovis‐Tourniaire P, Moreau T. Rate of pregnancy‐related relapse in multiple sclerosis. Pregnancy in multiple sclerosis group. N Engl J Med 1998;339(5):285‐91.
 
Item Authors' judgement Support for judgement
Participants Yes The study used cohort study data collected to assess the effect of pregnancy on MS courses, and the inclusion criteria are appropriate.
Predictors No Almost 50% of patients were not followed up prospectively, and nearly 25% were not known to their neurologists before recruitment. Hence, the number of relapses before pregnancy was probably collected non‐uniformly, either retrospectively or prospectively, from a mixture of patients and neurologists.
Outcome Yes The outcome is a relatively objective one, so even if the predictor information was available at the time of its assessment, it would not introduce risk of bias.
Analysis No The EPV was below 10. Calibration was not addressed, and only apparent validation was reported. Participants lost to follow‐up were excluded from the analysis. Reporting of missing data handling was ambiguous but probably based on complete case analysis.
Overall No At least one domain is at high risk of bias.
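Several risk of bias judgements in this review, including the one above, hinge on the events-per-variable (EPV) ratio. A minimal sketch of that arithmetic for this study, using the 63 post-partum relapses and 11 candidate predictors reported above; the threshold of 10 is the conventional rule of thumb, not a figure from the study.

```python
# EPV = number of outcome events / number of candidate predictors.
events = 63                 # post-partum relapses reported above
candidate_predictors = 11   # candidate predictors considered above
epv = events / candidate_predictors
print(f"EPV = {epv:.1f}")   # about 5.7, below the conventional rule of thumb of 10
```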

Weinshenker 1991.

Study characteristics
General information Model name
M3 Dev
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
MS
Exclusion criteria
Not reported
Recruitment
Consecutive patients referred to the MS Clinic at the University Hospital in London, Ontario, Canada
Age (years)
Mean 30.5 (onset)
Sex (%F)
65.7
Disease duration (years)
Mean 11.9 (SE 0.3)
Diagnosis
Other: 65.8% RRMS, 14.8% relapsing progressive, 18.7% chronically progressive, 0.9% unknown; 83.3% probable diagnosis, 16.4% possible diagnosis
Diagnostic criteria
Poser 1983
Treatment
0%
Disease description
Not reported
Recruitment period
1972 to 1984
Predictors Considered predictors
Unclear if it is the complete list, age at onset, sex, seen at onset of MS, initial symptoms ‐ motor, systems involved ‐ brainstem, systems involved ‐ cerebellar, systems involved ‐ cerebral, systems involved ‐ pyramidal, (in other models: initial symptoms ‐ limb ataxia and balance, remitting at onset, first interattack interval, number of attacks in first 2 years, DSS at 2 years, DSS at 5 years)
Number of considered predictors
≥ 13 (unclear if complete list)
Timing of predictor measurement
At assessment (not defined), at follow‐up
Predictor handling
Continuously
Outcome Outcome definition
Disability (DSS): time to reach DSS 6
Timing of outcome measurement
Follow‐up for 12 years
Missing data Number of participants with any missing value
38
Missing data handling
Complete case
Analysis Number of participants (number of events)
1060 (498)
Modelling method
Survival, Weibull
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • P value < 0.05

    • A form of forward selection


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
None
Performance evaluation method
Not applicable
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Full regression model
Number of predictors in the model
7
Predictors in the model
Age at onset, seen at MS onset, motor (insidious), brainstem, cerebellar, cerebral, pyramidal
Effect measure
Log HR (SE): intercept 4.25 (0.132), age at onset −0.030 (0.003), seen at MS onset −0.568 (0.104), motor (insidious) −0.224 (0.077), brainstem −0.184 (0.061), cerebellar −0.430 (0.073), cerebral −0.255 (0.100), pyramidal −0.230 (0.090), scale 0.648 (0.022)
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
A multivariate hierarchical analysis to assess the significance of several demographic and clinical factors in multiple sclerosis patients (analysis similar to multiple regression was used to generate predictive models which permit the calculation of the median time to DSS 6 for patients with a given set of covariates).
Primary aim
The primary aim of this study is only partly the prediction of individual outcomes. The focus is on the factors.
Model interpretation
Exploratory
Suggested improvements
Not reported
Notes Applicability overall
Low
Auxiliary references
Weinshenker BG, Bass B, Rice GP, Noseworthy J, Carriere W, Baskerville J, et al. The natural history of multiple sclerosis: a geographically based study. I. Clinical course and disability. Brain 1989;112 (Pt 1):133‐46.
 
Item Authors' judgement Support for judgement
Participants Yes The authors described collecting a clinical cohort that intended to include all MS patients in the geographical area. They followed up with the patients regularly, and the study data were separate from the routine clinical charts. No inclusion criteria were discussed explicitly, but the study aimed to include all patients with MS in the entire area and called itself a natural history study.
Predictors No Although this study is a population‐based cohort and standardised data fields were created with MS research in mind, almost 4/5 of the patients were not seen from onset onwards. Thus, data on predictors related to onset were collected retrospectively for some patients and prospectively for others.
Outcome Yes The outcome was probably defined with knowledge of the predictors because only a few clinicians saw the patients in a routine care setting. DSS 6 is a relatively 'hard' outcome in which patients become dependent on a walking aid, so we judge the risk of bias due to knowledge of predictors to be low.
Analysis No Although not all enrolled participants were included in the modelling due to complete case analysis, missing data affected less than 5% of patients and hence are not expected to introduce risk of bias. Neither calibration nor discrimination was addressed, nor was model optimism. The evaluation included only patients experiencing the outcome instead of using methods that account for censoring.
Overall No At least one domain is at high risk of bias.
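The stated aim of Weinshenker 1991 is to permit calculation of the median time to DSS 6 for a given set of covariates. Below is a minimal sketch of that calculation from the coefficients reported above, assuming they are on the log-time (accelerated failure time) scale of a Weibull model, as the reported intercept and scale parameter suggest; the patient values are hypothetical and this is not the authors' published tool.

```python
import math

# Reported coefficients (assumed to be on the log-time scale of a Weibull
# accelerated failure time model) and scale parameter.
coef = {
    "intercept": 4.25,
    "age_at_onset": -0.030,
    "seen_at_ms_onset": -0.568,
    "motor_insidious": -0.224,
    "brainstem": -0.184,
    "cerebellar": -0.430,
    "cerebral": -0.255,
    "pyramidal": -0.230,
}
scale = 0.648

def median_time_to_dss6(covariates: dict) -> float:
    """Median survival time of a Weibull AFT model:
    log(T_median) = linear predictor + scale * log(log 2)."""
    lp = coef["intercept"] + sum(coef[name] * value for name, value in covariates.items())
    return math.exp(lp + scale * math.log(math.log(2)))

# Hypothetical patient: onset at age 30, not seen at MS onset, cerebellar and
# pyramidal systems involved, no insidious motor onset, no brainstem or
# cerebral involvement.
patient = {
    "age_at_onset": 30,
    "seen_at_ms_onset": 0,
    "motor_insidious": 0,
    "brainstem": 0,
    "cerebellar": 1,
    "cerebral": 0,
    "pyramidal": 1,
}
print(f"Predicted median time to DSS 6: {median_time_to_dss6(patient):.1f} years")
```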

Weinshenker 1996.

Study characteristics
General information Model name
  • Short term

  • M3 Ext Val


Primary source
Journal
Data source
Routine care, secondary
Study type
  • Short term: development

  • M3 Ext Val: external validation, location

Participants Inclusion criteria
Not reported
Exclusion criteria
Not reported
Recruitment
Consecutive participants seen by first author at Ottawa Regional MS clinic, Canada
Age (years)
Mean 44.1
Sex (%F)
69.1
Disease duration (years)
Mean 12
Diagnosis
Other: 84.3% RRMS, 2.0% relapsing progressive, 13.7% chronically progressive
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
Unclear
Recruitment period
Not reported
Predictors Considered predictors
  • Short term: unclear if it is the complete list, disease duration (bin, ref: < 20 years), EDSS (difference from 4.5), progression index, predicted time to DSS 6 (model 1 from Weinshenker 1991), follow‐up time

  • M3 Ext Val: not applicable


Number of considered predictors
  • Short term:

    • ≥ 5 (unclear if complete list)

  • M3 Ext Val: not applicable


Timing of predictor measurement
  • Short term: at assessment (not defined), at follow‐up (unclear: outcome measurement)

  • M3 Ext Val: not applicable


Predictor handling
  • Short term: continuously except duration, which was dichotomised

  • M3 Ext Val: not applicable

Outcome Outcome definition
  • Short term:

    • Disability (EDSS): short‐term progression defined as change in EDSS over 1 year to 3 years of follow‐up

  • M3 Ext Val:

    • Disability (DSS): time to reach DSS 6 (equivalent to EDSS 6.0 or 6.5) defined as the point at which patients required a cane at all times when walking outside the home and the time at which the patient was barely able to walk half a block


Timing of outcome measurement
  • Short term: definition 1 year to 3 years, follow‐up summarised for 2 years

  • M3 Ext Val: time to outcome mean (SD): 20.7 years (0.90 years)

Missing data Number of participants with any missing value
  • Short term: not reported

  • M3 Ext Val: 10, only missing outcome reported


Missing data handling
Complete case
Analysis Number of participants (number of events)
  • Short term:

    • 84 or, probably, 174 (43 with worsening, 28 with +1‐point change or higher)

  • M3 Ext Val:

    • ≤ 259 (66)


Modelling method
  • Short term: logistic regression

  • M3 Ext Val: not applicable


Predictor selection method
  • Short term:

    • For inclusion in the multivariable model, not reported

    • During multivariable modelling, full model approach

  • M3 Ext Val: not applicable


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
  • Short term: none

  • M3 Ext Val: not applicable


Performance evaluation dataset
  • Short term: development

  • M3 Ext Val: external validation


Performance evaluation method
  • Short term: apparent

  • M3 Ext Val: not applicable


Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • Short term:

    • Cutoff = 0.5: accuracy = 0.75, sensitivity = 0.21, specificity = 0.93

    • Cutoff = 0.3: accuracy = 0.67, sensitivity = 0.54, specificity = 0.72

  • M3 Ext Val: not reported


Overall performance
  • Short term: not reported

  • M3 Ext Val: mean prediction error 5.25 (SD 4.58)


Risk groups
Not reported
Model  Model presentation
  • Short term: full regression model

  • M3 Ext Val: not applicable


Number of predictors in the model
  • Short term: 5

  • M3 Ext Val: not applicable


Predictors in the model
  • Short term: duration, EDSS, progression index, predicted time to DSS 6 from model 1, follow‐up

  • M3 Ext Val: not applicable


Effect measure
  • Short term:

    • log OR (SE): intercept −1.45, duration −1.70 (0.68), EDSS −0.65 (0.19), follow‐up 0.64 (0.26), progression index −0.16 (0.27), predicted time to DSS 6 from model 1 0.05 (0.04)

  • M3 Ext Val: not applicable


Predictor influence measure
Not applicable
Validation model update or adjustment
  • Short term: not applicable

  • M3 Ext Val: none

Interpretation  Aim of the study
  • Short term: to establish predictors of short‐term outcome of MS

  • M3 Ext Val: to validate previously published models predicting time to EDSS 6


Primary aim
  • Short term: the primary aim of this study is not the prediction of individual outcomes. Rather, the focus is on predictor identification.

  • M3 Ext Val: the primary aim of this study is the prediction of individual outcomes.


Model interpretation
Exploratory
Suggested improvements
Implicitly suggests that the model should be applied to the appropriate patient population (based on temporal course and baseline disability) rather than to any available patients
Notes Applicability overall
  • Short term: high

  • M3 Ext Val: low


Applicability overall rationale
  • Short term: although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to explore predictors for short‐term disease progression.

 
Item Authors' judgement Support for judgement
Participants No Although the authors discussed the probable lack of referral bias, the data were collected for reasons other than this study where no inclusion/exclusion criteria other than the diagnosis were reported.
Predictors No Short‐term: The stated interest was in predicting 15‐year outcomes, but the predictor data used were collected about 15 months after baseline, which shortened the effective prediction window to less than 14 years. The intended time of model use is unclear.
M3 Ext Val: The fact that 'seen at MS onset' was a predictor in the model makes it likely that the data on the participants were collected partly retrospectively and partly prospectively, just as in Weinshenker 1991.
Outcome Yes Short‐term: We rated this domain for this analysis as having a high risk of bias. Short‐term progression was not confirmed, even though the EDSS can fluctuate. Although this study probably pre‐dates the standard definition of progression, an outcome based on the EDSS is standard and considered to be objective.
M3 Ext Val: We rated this domain for this analysis as having a low risk of bias. Although the outcome was probably assessed with the knowledge of the predictors, DSS 6 can be considered a hard outcome; thus, knowledge of predictors introduces a little risk of bias.
Analysis No Short‐term: The EPV was low. Disease duration was dichotomised, justified by clinical knowledge, but nonlinearity could have been explored more thoroughly. Many participants were excluded without the reasons being reported or the excluded participants being compared with those included. Complete case analysis was used. Neither calibration nor discrimination was assessed, and classification measures were assessed in‐sample. Follow‐up time was added as a predictor instead of using methods that account for differing observation times.
M3 Ext Val: The number of events was far below 100, only complete case analysis was done, and the models were not evaluated using calibration and discrimination measures accounting for censoring.
Overall No At least one domain is at high risk of bias.
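The short-term model of Weinshenker 1996 is reported above as a full logistic regression. A minimal sketch of how such coefficients translate into a predicted probability, assuming the predictor coding described in the extraction (duration dichotomised at 20 years, EDSS expressed as the difference from 4.5); the patient values are hypothetical and the coding and units may differ from the original publication.

```python
import math

# Reported coefficients of the short-term logistic model (log odds ratios).
coef = {
    "intercept": -1.45,
    "duration_ge_20_years": -1.70,      # binary indicator, reference: < 20 years
    "edss_minus_4_5": -0.65,            # EDSS expressed as the difference from 4.5
    "follow_up_years": 0.64,
    "progression_index": -0.16,
    "predicted_time_to_dss6": 0.05,     # from model 1 of Weinshenker 1991
}

def predicted_probability(x: dict) -> float:
    """Inverse logit of the linear predictor."""
    lp = coef["intercept"] + sum(coef[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical patient; the units and coding are assumptions for illustration.
patient = {
    "duration_ge_20_years": 0,
    "edss_minus_4_5": 2.0 - 4.5,
    "follow_up_years": 2,
    "progression_index": 0.3,
    "predicted_time_to_dss6": 12,
}
print(f"Predicted probability of short-term worsening: {predicted_probability(patient):.2f}")
```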

Wottschel 2015.

Study characteristics
General information Model name
  • 1 year

  • 3 years


Primary source
Journal
Data source
Cohort, secondary
Study type
Development
Participants Inclusion criteria
  • At least one demyelinating lesion visible on baseline scans

  • Available scans, and corresponding lesion masks

  • Available clinical data at 1‐ and 3‐year follow‐up


Exclusion criteria
Not reported
Recruitment
UK
Age (years)
  • 1 year: mean 33.1

  • 3 years: mean 33.2


Sex (%F)
  • 1 year: 66.2

  • 3 years: 67.1


Disease duration (years)
Mean 0.1 (SD 0.07)
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
0%
Disease description
EDSS median (range): 1 (0 to 8)
Recruitment period
1995 to 2004
Predictors Considered predictors
Age, gender, type of CIS (brainstem/cerebellum, spinal cord, optic neuritis, other), EDSS, lesion count, lesion load, average lesion PD intensity, average lesion T2 intensity, average distance of lesions from the centre of the brain, presence of lesions in proximity of the centre of the brain, the shortest horizontal distance of a lesion from the vertical axis of the brain, lesion size profile
Number of considered predictors
14
Timing of predictor measurement
At disease onset (CIS) and up to a mean of 6.15 weeks (SD 3.4) after disease onset
Predictor handling
Continuously (polynomial kernel)
Outcome Outcome definition
Conversion to definite MS: clinical conversion to MS due to the occurrence of a second clinical attack attributable to demyelination of more than 24 hours in duration and at least 4 weeks from the initial attack
Timing of outcome measurement
  • 1 year: 1 year

  • 3 years: 3 years

Missing data Number of participants with any missing value
  • 1 year: 0

  • 3 years: ≥ 4; unclear exactly how many participants had any missing value


Missing data handling
  • 1 year

    • Exclusion

  • 3 years

    • Mixed: complete case, and exclusion

Analysis Number of participants (number of events)
1 year: 74 (22)
3 years: 70 (31)
Modelling method
Support vector machine, polynomial kernel
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • Forward selection based on bootstrap classification accuracy


Hyperparameter tuning
Several values for polynomial degree considered in cross‐validation, other tuning parameters not reported
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, LOOCV repeated on 100 balanced bootstrap samples
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
  • 1 year: accuracy = 0.714 (95% CI 0.58 to 0.84), sensitivity = 0.77, specificity = 0.66, PPV = 0.70, NPV = 0.74

  • 3 years: accuracy = 0.68 (95% CI 0.61 to 0.73), sensitivity = 0.60, specificity = 0.76, PPV = 0.72, NPV = 0.65


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of selected predictors and kernel degree
Number of predictors in the model
  • 1 year: 3 (df unclear)

  • 3 years: 6 (df unclear)


Predictors in the model
  • 1 year: type of presentation, gender, lesion load

  • 3 years: lesion count, average lesion PD intensity, average distance of lesions from the centre of the brain, shortest horizontal distance of a lesion from the vertical axis, age, EDSS at onset


Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To determine if machine learning techniques, such as support vector machines (SVMs), can predict the occurrence of a second clinical attack, which leads to the diagnosis of clinically definite multiple sclerosis (CDMS) in patients with a clinically isolated syndrome (CIS), on the basis of single patient's lesion features and clinical/demographic characteristics
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably exploratory
Suggested improvements
Use automatically derived features (instead of semi‐automated/manual features), features containing information on different aspects of imaging data (scale, directionality), imaging features not related to lesions (magnetism transfer imaging), other para‐clinical predictors (OCB, grey matter atrophy, genetic factors, spinal cord lesions, cortical lesions, Gd enhancing lesions), larger independent dataset, including temporal ordering of events, novel algorithms
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No The data source was a secondary use of a cohort study, but only participants with complete data were included.
Predictors Yes It is unclear if the neurologist circling the lesions was informed of the outcome. Still, we do not believe this would induce any considerable bias as imaging is considered to be an objective predictor. Other predictors are basic and objective.
Outcome Yes Although the outcome might have been measured with knowledge of the predictors, clinical attacks are considered objective.
Analysis No 1 year: The EPV was very low. Only classification measures were evaluated. Model selection and model evaluation occurred on the same data. The final model is unclear.
3 years: The EPV was very low. 4 participants were lost to follow‐up by 3 years, but this was only a small proportion of the total. Only classification measures were evaluated. Model selection and model evaluation occurred on the same data. The final model is unclear.
Overall No At least one domain is at high risk of bias.
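Wottschel 2015 used a polynomial-kernel support vector machine evaluated by leave-one-out cross-validation. A minimal sketch of that type of analysis with scikit-learn on synthetic data; the bootstrap balancing, forward feature selection, and degree tuning of the original study are not reproduced, and all numeric choices here are assumptions.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 74 participants and 3 selected predictors
# (type of presentation, gender, lesion load) of the 1-year model.
rng = np.random.default_rng(0)
X = rng.normal(size=(74, 3))
y = rng.integers(0, 2, size=74)   # conversion to MS within 1 year (synthetic labels)

# Polynomial-kernel SVM with scaling, assessed by leave-one-out cross-validation.
model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2, C=1.0))
accuracy = cross_val_score(model, X, y, cv=LeaveOneOut(), scoring="accuracy")
print(f"LOOCV accuracy: {accuracy.mean():.2f}")
```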

Wottschel 2019.

Study characteristics
General information Model name
BCGLMS
Primary source
Journal
Data source
Cohort, secondary
Study type
Development
Participants Inclusion criteria
  • Patients with a CIS examined within 3 months from symptoms onset

  • T1‐weighted MRI sequences of the brain obtained at onset, using standard‐of‐care local protocols

  • Demographic (age, sex) and clinical information (e.g. type of CIS) at baseline

  • The presence/absence of a second relapse at one year follow‐up available

  • Presence of T2‐hyperintense WM brain lesions as outlined in each centre on PD/T2‐weighted or FLAIR MRI by experienced researchers, resulting in binary lesion masks


Exclusion criteria
Not reported
Recruitment
  • 6 MAGNIMS network centres

  • Spain, Denmark, Austria, UK, and Italy


Age (years)
Mean 32.7 (onset)
Sex (%F)
66.3
Disease duration (years)
Up to 0.27
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
EDSS median (range): 2 (0 to 8)
Recruitment period
Not reported
Predictors Considered predictors
Global features: (whole‐brain measures) GM volume, WM volume, brain volume as a percentage of the intracranial volume, age, sex, CIS type (brainstem/optic nerve/spinal cord/other), EDSS; region of interest (ROI) features: 143 ROIs (excluding ROIs describing ventricles, skull and background) based on the Neuromorphometrics atlas, each ROI from the brain parcellation used to mask each patient's GM probability map, CT map, lesion segmentation and T1 scan (to estimate the volume); lobe features: (ROIs were merged into nine larger areas according to their anatomical location) limbic, insular, frontal, parietal, temporal, occipital, cerebellum, GM and WM, deep grey matter defined as thalamus, hippocampus, nucleus accumbens, amygdala, caudate nucleus, pallidum, putamen and basal ganglia
Number of considered predictors
214
Timing of predictor measurement
At disease onset (CIS) and up to 14 weeks after disease onset
Predictor handling
Continuously
Outcome Outcome definition
Conversion to definite MS: occurrence of a second clinical episode
Timing of outcome measurement
1 year
Missing data Number of participants with any missing value
Not reported
Missing data handling
Exclusion
Analysis Number of participants (number of events)
400 (91)
Modelling method
Support vector machine, linear kernel
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, stepwise selection

    • Backward recursive feature elimination removing 20% of predictors with bootstrap averaged SVM weights closest to zero and repeated until accuracy no longer improves


Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, k‐fold CV (k = 2, 5, 10 (where possible), LOO) repeated on 100 balanced bootstrap samples
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
5‐fold CV: accuracy = 0.685 (95% CI 0.683 to 0.687), sensitivity = 0.678, specificity = 0.693, LOOCV: accuracy = 0.708 (95% CI 0.706 to 0.71), sensitivity = 0.703, specificity = 0.713
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
List of selected predictors for peak accuracy when using 2‐fold CV
Number of predictors in the model
36 (for 2‐fold CV)
Predictors in the model
Type of CIS, WM lesion load ‐ whole brain, WM lesion load ‐ frontal, WM lesion load ‐ limbic, WM lesion load ‐ temporal, WM lesion load ‐ dGM, WM lesion load ‐ WM, GM ‐ cerebellum, GM ‐ thalamus, GM ‐ frontal operculum, GM ‐ middle cingulate gyrus, GM ‐ precentral gyrus medial segment, GM ‐ posterior cingulate gyrus, GM ‐ praecuneus, GM ‐ parietal operculum, GM ‐ post‐central gyrus, GM ‐ planum polare, GM ‐ subcallosal area, GM ‐ supplementary motor cortex, GM ‐ superior occipital gyrus, cortical thickness ‐ central operculum, cortical thickness ‐ cuneus, cortical thickness ‐ fusiform gyrus, cortical thickness ‐ inferior temporal gyrus, cortical thickness ‐ middle occipital gyrus, cortical thickness ‐ post‐central gyrus medial segment, cortical thickness ‐ occipital pole, cortical thickness ‐ opercular part of the inferior frontal gyrus, cortical thickness ‐ orbital part of the inferior frontal gyrus, cortical thickness ‐ planum temporale, cortical thickness ‐ superior occipital gyrus, volume ‐ whole brain, volume ‐ ventral diencephalon, volume ‐ middle temporal gyrus, volume ‐ supramarginal gyrus, volume ‐ limbic
Effect measure estimates
Not reported
Predictor influence measure
Not reported
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To distinguish CIS converters from non‐converters at onset of a CIS, using recursive feature elimination and weight averaging with support vector machines. Also, to assess the influence of cohort size and cross‐validation methods on the accuracy estimate of the classification
Primary aim
The primary aim of this study is only partly the prediction of individual outcomes. The focus is on the influence of sample size and CV methods on results.
Model interpretation
Probably exploratory
Suggested improvements
To compare 2 or more cross‐validation schemes to estimate potential biases when it is not possible to use completely distinct data sets for training and testing, advanced imaging techniques such as magnetisation transfer imaging (MTR) or double or phase‐shifted inversion recovery (DIR/PSIR), genetic or environmental predictors, larger cohort, longitudinal MRI data, prospective harmonised imaging protocols
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes.
 
Item Authors' judgement Support for judgement
Participants No A multicentre cohort was used, but patients with missing data were excluded.
Predictors Yes Even though the imaging predictors might have been the result of preprocessing performed after the outcome was known, we do not expect such information to affect an automated procedure. Imaging data from several sites were used, but they were all MAGNIMS sites and collaborated in defining imaging protocols for the field.
Outcome Yes Although the outcome might have been measured with knowledge of the predictors, clinical attacks are considered objective.
Analysis No The EPV was very low. Only classification measures were evaluated. Selection and assessment occurred at the same resampling level. The final model is unclear.
Overall No At least one domain is at high risk of bias.
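Wottschel 2019 selected predictors by backward recursive feature elimination with a linear SVM, removing 20% of predictors per step. A minimal sketch with scikit-learn's RFE on synthetic data; note that scikit-learn computes the step fraction from the initial feature count rather than from the remaining features, and the bootstrap weight averaging and accuracy-based stopping rule of the original study are not reproduced.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for the 400 participants and 214 imaging/clinical features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 214))
y = rng.integers(0, 2, size=400)  # second clinical episode within 1 year (synthetic labels)

selector = RFE(
    estimator=SVC(kernel="linear", C=1.0),  # linear SVM supplies the feature weights
    n_features_to_select=36,                # number retained in the reported 2-fold CV model
    step=0.2,                               # fraction of features removed per elimination step
)
selector.fit(X, y)
print("Number of retained features:", int(selector.support_.sum()))
```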

Ye 2020.

Study characteristics
General information Model name
  • 5‐gene signature

  • Nomogram


Primary source
Journal
Data source
Unclear, secondary
Study type
Development
Participants Inclusion criteria
  • Diagnosed with definite MS or CIS

  • Free of steroids and immunomodulatory treatments for at least 30 days before blood withdrawal

  • At least 1 year after treatment with cyclophosphamide


Exclusion criteria
  • Patients with neuromyelitis optica (NMO) according to the criteria of Wingerchuk


Recruitment
Unclear, Sheba Medical Centre, Israel
Age (years)
Mean 36.3 (unclear when)
Sex (%F)
63.8
Disease duration (years)
Mean 5.7 (pooled SD 0.89)
Diagnosis
34.0% CIS, 66.0% CDMS
Diagnostic criteria
McDonald 2001
Treatment
Unclear timing, 30.9% on DMT
Disease description
EDSS (unclear if mean and SD): CIS 0.9 (0.2), CDMS 2.4 (0.2); annualised relapse rate (unclear if mean and SD): CDMS 0.92 (0.1)
Recruitment period
Not reported
Predictors Considered predictors
  • 5‐gene signature: differentially expressed genes

  • Nomogram: age, gender, disease type (CIS vs MS), DMT, genetic risk score (based on the 5‐gene signature)


Number of considered predictors
  • 5‐gene signature: 202

  • Nomogram: 206


Timing of predictor measurement
At study baseline (cohort entry)
Predictor handling
  • 5‐gene signature: unclear, probably continuously

  • Nomogram: age dichotomised, genetic risk score continuously

Outcome Outcome definition
Relapse: relapse‐free survival (relapse defined as the onset of new objective neurological symptoms and signs or worsening of existing neurological disability not accompanied by metabolic changes, fever or other signs of infection, and lasting for a period of at least 48 hours accompanied by objective change of at least 0.5 in the EDSS score)
Timing of outcome measurement
Follow‐up mean (SD): 1.97 (1.3)
Missing data Number of participants with any missing value
Not reported
Missing data handling
Single imputation, using k‐nearest neighbours
Analysis Number of participants (number of events)
94 (64)
Modelling method
  • 5‐gene signature: survival, LASSO Cox

  • Nomogram: survival, Cox


Predictor selection method
  • 5‐gene signature

    • For inclusion in the multivariable model, univariable analysis

    • During multivariable modelling, modelling method

  • Nomogram

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, full model approach


Hyperparameter tuning
  • 5‐gene signature: not reported

  • Nomogram: not applicable


Shrinkage of predictor weights
  • 5‐gene signature: modelling method

  • Nomogram: none


Performance evaluation dataset
Development
Performance evaluation method
  • 5‐gene signature: random split, 2/3 training and 1/3 test

  • Nomogram: random split, 2/3 training and 1/3 test (B = 1000 bootstraps on training data)


Calibration estimate
Not reported
Discrimination estimate
  • 5‐gene signature

    • c‐Statistic: 1 year 0.518, 2 year 0.655, 3 year 0.729 (development set: 1 year 0.785, 2 year 0.86, 3 year 0.897)

  • Nomogram

    • Survival c‐statistic: 0.59 (development set: 0.67)


Classification estimate
Not reported
Overall performance
Not reported
Risk groups
  • 5‐gene signature: high‐ and low‐risk groups defined by median of risk score (predicted log hazard ratios in training set): 1.12

  • Nomogram: not reported

Model  Model presentation
  • 5‐gene signature: regression coefficients without baseline hazard

  • Nomogram: nomogram


Number of predictors in the model
5
Predictors in the model
  • 5‐gene signature: FTH1, GBP2, MYL6, NCOA4, SRP9

  • Nomogram: age, gender, disease type, DMT, risk score


Effect measure estimates
  • 5‐gene signature: HR (95% CI): FTH1 9.080 (2.31309 to 35.65), GBP2 0.155 (0.02757 to 0.88), MYL6 0.019 (0.00028 to 1.23), NCOA4 0.106 (0.02277 to 0.49), SRP9 23.045 (3.00729 to 176.60)

  • Nomogram: HR (95% CI): age 1.032 (0.442 to 2.411), gender 0.727 (0.371 to 1.425), disease type 1.657 (0.726 to 3.784), DMT 0.707 (0.307 to 1.628), risk score 1.159 (1.076 to 1.248)


Predictor influence measure
  • 5‐gene signature: not reported

  • Nomogram: not applicable


Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To develop and validate an effective and noninvasive prognostic gene signature for predicting the probability of relapse and remission period in MS patients via an integrated analysis of blood microarrays
Primary aim
The primary aim of this study is the prediction of individual outcomes.
Model interpretation
Probably confirmatory
Suggested improvements
Include Asian participants
Notes Applicability overall
  • 5‐gene signature: high

  • Nomogram: low


Applicability overall rationale
  • 5‐gene signature: the predictors used were genomic features and no other predictor domain was considered for use in the model.


Auxiliary references
Gurevich M, Tuller T, Rubinstein U, Or‐Bach R, Achiron A. Prediction of acute multiple sclerosis relapses by transcription levels of peripheral blood cells. BMC Medical Genomics [Electronic Resource] 2009;2:46.
 
Item Authors' judgement Support for judgement
Participants Unclear The data source was not clearly reported in this study, nor in the original study from which the data came.
Predictors Yes According to the study from which the data came, although microarray analysis of the transcriptome can be affected by batch effects, there were efforts to mitigate them. Even though the microarray analysis might have occurred after the outcomes became known, the procedure was relatively automated and is not expected to have been affected by this information. The intended time of model use with respect to the patient's disease history is unclear, but it may be any time at which blood is drawn.
Outcome Yes Information related to the outcome was retrieved from Gurevich 2009. A standard definition of relapse was used, which we considered robust to possible predictor knowledge.
Analysis No 5‐gene signature: The number of predictors was too large relative to the number of events. Univariable analysis was used for predictor selection. Although no information on missing data was reported, the number of included patients matches the publication on the data source. Only discrimination was assessed, and it did not appear to account for censored data. A random split was used for assessment. Parameter tuning was not discussed, and the plots corresponding to the Cox LASSO model selection do not correspond to the final model presented (the plots indicate an optimal number of 11 predictors, not 5).
Nomogram: The number of predictors was too large relative to the number of events. Although no information on missing data was reported, the number of included patients matches the publication on the data source. Only discrimination was assessed. A random split was used for assessment, in addition to a bootstrap procedure in the training set that did not correct for optimism.
Overall No At least one domain is at high risk of bias.
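Ye 2020 reports the 5-gene signature as hazard ratios and dichotomises the resulting risk score at the training-set median of 1.12. A minimal sketch of how such a score could be computed, with coefficients back-calculated as log(HR) from the values above; the expression values are hypothetical, and any standardisation used in the original study is not reproduced.

```python
import math

# Coefficients back-calculated as log(HR) from the hazard ratios reported above.
log_hr = {
    "FTH1": math.log(9.080),
    "GBP2": math.log(0.155),
    "MYL6": math.log(0.019),
    "NCOA4": math.log(0.106),
    "SRP9": math.log(23.045),
}

def risk_score(expression: dict) -> float:
    """Linear predictor (log relative hazard) of the Cox model."""
    return sum(log_hr[gene] * expression[gene] for gene in log_hr)

# Hypothetical (arbitrary) expression values for the five genes.
patient = {"FTH1": 0.8, "GBP2": 0.5, "MYL6": 0.4, "NCOA4": 0.6, "SRP9": 0.7}
score = risk_score(patient)
group = "high risk" if score > 1.12 else "low risk"  # 1.12 = reported training-set median
print(f"Risk score: {score:.2f} -> {group}")
```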

Yoo 2019.

Study characteristics
General information Model name
CNN EDT, pretraining, all user‐defined features
Primary source
Journal
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 60 years

  • Patients with onset of their first demyelinating symptoms within the previous 180 days

  • A minimum of 2 lesions that were at least 3 mm in diameter on a T2‐weighted (T2w) screening brain MRI (one had to be ovoid, periventricular or infratentorial)

  • For patients over the age of 50 years cerebrospinal fluid oligoclonal bands or spinal MRI changes typical of demyelination


Exclusion criteria
  • Better explanation for the event

  • Previous event reasonably attributable to demyelination

  • Meeting the 2005 McDonald criteria for MS (Polman 2005)


Recruitment
Minocycline RCT participants recruited from MS clinics in Canada and USA:
  • Cumming School of Medicine and the Hotchkiss Brain Institute, Calgary, AB

  • University of British Columbia, Vancouver, BC

  • University of Montreal, Montreal, QC

  • Tufts University, Boston, MA

  • Western University, London, ON, Fraser Health MS Clinic, Burnaby, BC

  • University of Ottawa and the Ottawa Hospital Research Institute, Ottawa, ON, Dalhousie University, Halifax, NS

  • University of Alberta, Edmonton, AB

  • University of Manitoba, Winnipeg, MB, Clinique Neuro Rive‐Sud, Greenfield Park, QC

  • University of Toronto, Toronto, ON, CHA‐Hôpital Enfant‐Jésus, Québec, QC


Age (years)
Mean 35.9 (onset)
Sex (%F)
69.0
Disease duration (years)
Median 0.2 (range 0.06 to 0.52)
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, 0%

  • During follow‐up, approximately 50% on minocycline


Disease description
EDSS median (range): 1.5 (0 to 4.5)
Recruitment period
2009 to 2013
Predictors Considered predictors
MRI mask images/user‐defined: T2w lesion volume, brain parenchymal fraction, diffusely abnormal white matter, gender, initial CIS event cerebrum, initial CIS event optic nerve, initial CIS event cerebellum, initial CIS event brainstem, initial CIS event spinal cord, EDSS, CIS monofocal or multifocal type at onset
Number of considered predictors
Non‐tabular data + 11 (user‐defined)
Timing of predictor measurement
At disease onset (CIS) (RCT baseline within 180 days after disease onset)
Predictor handling
Continuously except for DAWM, which was dichotomised (justified as binary being more reliable)
Outcome Outcome definition
Conversion to definite MS (McDonald 2005 (Polman 2005)): MS at the end of 2 years determined by new T2 lesions, new T1 gadolinium enhancing lesions and/or new clinical relapse
Timing of outcome measurement
2 years
Missing data Number of participants with any missing value
9, only missing outcome reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
140 (80)
Modelling method
Convolutional neural network
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, full model approach


Hyperparameter tuning
Empirically determined L1 and L2‐norm parameters (Montavon 2012), early stopping convergence target found by test error increase during cross‐validation, grid search over several values for replication and scale factors using cross‐validation accuracy
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 7‐fold
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.746 (SD 0.114)
Classification estimate
Accuracy = 0.75 (SD 0.113), sensitivity = 0.787 (SD 0.122), specificity = 0.704 (SD 0.154)
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
Non‐tabular data + 11
Predictors in the model
Non‐tabular: MRI mask images, tabular: T2w lesion volume, brain parenchymal fraction, diffusely abnormal white matter, gender, initial CIS event cerebrum, initial CIS event optic nerve, initial CIS event cerebellum, initial CIS event brainstem, initial CIS event spinal cord, EDSS, CIS monofocal or multifocal type at onset
Effect measure estimates
Not reported
Predictor influence measure
Average relative importance of the user‐defined features
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To determine whether deep learning can extract latent MS lesion features that, when combined with user‐defined radiological and clinical measurements, can predict conversion to MS ... in patients with early MS symptoms (clinically isolated syndrome), a prodromal stage of MS, more accurately than imaging biomarkers that have been used in clinical studies to evaluate overall disease state, such as lesion volume and brain volume
Primary aim
The primary aim of this study is not entirely the prediction of individual outcomes. The focus is on the ability of deep learning to extract latent features.
Model interpretation
Exploratory
Suggested improvements
Examine more sophisticated strategies such as augmenting input feature vectors with the squared values or by taking polynomial combinations of feature vectors to increase feature dynamic range and creating an augmented network that has the ability to learn higher order features.
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model, the main aim was not to create a model for the prediction of individual outcomes but rather to show the ability of deep learning algorithms to extract prognostic factors.
Auxiliary references
Metz LM, Li DKB, Traboulsee AL, Duquette P, Eliasziw M, Cerchiaro G, et al. Trial of minocycline in a clinically isolated syndrome of multiple sclerosis. N Engl J Med 2017;376(22):2122‐33.
 
Item Authors' judgement Support for judgement
Participants Unclear The data came from an RCT with well‐explained, appropriate inclusion/exclusion criteria. It is unclear why cerebrospinal fluid oligoclonal bands or spinal MRI changes typical of demyelination were required for participants over the age of 50 years; by more current diagnostic criteria, such patients would already be known to have the outcome. It is unclear whether this introduces any risk of bias.
Predictors Yes Even though the imaging predictors might have been the result of preprocessing performed after the outcome was known, we do not expect such information to affect an automated procedure. The seeds were set by a single expert and checked by another single expert. Other predictors were assessed by MS clinicians.
Outcome Yes Although the outcome might have been measured with knowledge of the predictors, the diagnostic criteria are considered objective.
Analysis No The number of participants and events relative to the number of tabular features was very low. It was unclear how missing data were handled, but 12‐month outcomes were used for 9 participants. Calibration was not assessed. Evaluation and tuning occurred at the same level, where there was no nested structure to resampling. No final model/tool was given.
Overall No At least one domain is at high risk of bias.
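Yoo 2019 combines latent features from lesion-mask images with 11 user-defined clinical and radiological features in a convolutional neural network. A minimal sketch of such an architecture in PyTorch; the network size, input dimensions, and all numeric choices here are assumptions for illustration and do not reproduce the authors' model or training procedure.

```python
import torch
import torch.nn as nn

class MaskPlusTabularNet(nn.Module):
    """Toy network: a small 3D CNN on a lesion-mask volume, concatenated with
    a vector of user-defined tabular features, followed by a classifier."""

    def __init__(self, n_tabular: int = 11):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(16 + n_tabular, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, mask, tabular):
        latent = self.cnn(mask)                        # (batch, 16) latent image features
        combined = torch.cat([latent, tabular], dim=1)
        return self.head(combined)                     # logit of conversion to MS

# Hypothetical batch: 4 lesion masks of 64 x 64 x 64 voxels plus 11 tabular features each.
model = MaskPlusTabularNet()
masks = torch.randn(4, 1, 64, 64, 64)
tabular = torch.randn(4, 11)
probabilities = torch.sigmoid(model(masks, tabular))
print(probabilities.shape)  # torch.Size([4, 1])
```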

Yperman 2020.

Study characteristics
General information Model name
RF literature + time series predictors
Primary source
Journal
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • MS patients with at least one EP measurement visit (complete MEP (left/right hands/feet))

  • 2‐year follow‐up


Exclusion criteria
  • Visits that do not contain all 4 (2 APB for the hands and 2 AH for the feet) evoked potential time series (EPTS)

  • Visits without 2‐year follow‐up

  • Motor evoked potentials (MEPs) with facilitation method

  • Any EPTS that have a spectral power above an empirically determined threshold at the starting segment (determined by the values of the latency of a healthy patient) of the measurement

  • Measurements of a duration differing from 100 ms


Recruitment
Rehabilitation & MS Centre in Overpelt, Belgium
Age (years)
Mean 45.0
Sex (%F)
71.8
Disease duration (years)
Not reported
Diagnosis
CIS 1.7%, RRMS 53.2%, SPMS 10.7%, PPMS 2.9%, unknown 32.9%
Diagnostic criteria
Unclear, unrecorded in the dataset
Treatment
  • At recruitment, 74% on DMT

  • During follow‐up, 79% on DMT at 2 years


Disease description
EDSS mean (SD): 3.0 (1.8)
Recruitment period
Not reported
Predictors Considered predictors
Latencies, EDSS at T0, age, peak‐to‐peak amplitude (L and R), gender, type of MS, around 5885 time series features extracted from the EPTS
Number of considered predictors
5893
Timing of predictor measurement
At visit of interest
Predictor handling
Continuously
Outcome Outcome definition
Disability (EDSS): disability progression defined as EDSS(T1) ‐ EDSS(T0) ≥ 1.0 for EDSS(T0) ≤ 5.5, or if EDSS(T1) ‐ EDSS(T0) ≥ 0.5 for EDSS(T0) > 5.5
Timing of outcome measurement
Time from EDSS_baseline measurement to EDSS_outcome median (IQR): 1.98 years (1.84 years to 2.08 years), time from MEP_baseline measurement to EDSS_outcome median (IQR): 1.99 years (1.87 years to 2.08 years)
Missing data Number of participants with any missing value
3717; unclear exactly how many participants had any missing value
Missing data handling
Mixed: exclusion, complete case (at visit level), and complete‐feature analysis
Analysis Number of participants (number of events)
2502 visits (unit of analysis is visit) of 419 participants (275)
Modelling method
Random forest
Predictor selection method
  • For inclusion in the multivariable model, univariable analysis

  • During multivariable modelling, choosing 1 predictor per cluster in hierarchical clustering, Boruta, top n features with variable importance and cross‐validated performance as criteria (mutual information at univariable level)


Hyperparameter tuning
Unclear, maximum number of features, number of trees and minimum samples for split chosen in cross‐validation
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 100 grouped stratified shuffle splits within 1000 grouped stratified shuffle splits
Calibration estimate
Calibration plot upon request
Discrimination estimate
c‐Statistic = 0.75 (SD 0.07)
Classification estimate
Not reported
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
≤ 9 (unclear subset)
Predictors in the model
Selected predictors unclear, at least latencies, EDSS at T0, age
Effect measure estimates
Not reported
Predictor influence measure
20 highest ranking features across all splits
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To investigate whether a machine learning approach that includes extra features from the EPTS can increase the predictive performance of EP in MS (progression in 2 years)
Primary aim
The primary aim of this study is not the prediction of individual outcomes. The focus is on the value of EP features.
Model interpretation
Exploratory
Suggested improvements
Data augmentation to expand the size of the training set, to stabilise the performance estimate, analysing the whole longitudinal trajectory of the patient, to use TS algorithms not included in HCTSA, using short timescale EPTS changes (e.g. 6 months) to predict EDSS changes on longer time‐scales or to detect non‐response to treatment, incorporating the left/right symmetry in a more advanced way, other variables such as MRI, cerebrospinal fluid, and genomic data, evaluation in larger datasets (preferably multicentre), VEP and SEP should be included in prediction process
Notes Applicability overall
High
Applicability overall rationale
Although this study contained a model and its assessment, the main aim was not to create a model for prediction of individual outcomes but was rather to show the added usefulness of EP with extra time series features using machine learning. Additionally, no final model was reported.
Auxiliary references
Fulcher BD, Little MA, Jones NS. Highly comparative time‐series analysis: the empirical structure of time series and their methods. J R Soc Interface 2013;10(83):20130048.
 
Item Authors' judgement Support for judgement
Participants No The data source was routine care, and the exclusion of visits/measurements was dependent on the quality of measurements and the availability of the outcome measurement.
Predictors No Two different machines were used; however, our clinical authors do not find this to be problematic. The predictors were probably assessed without outcome knowledge and are available at prediction model use. However, the disease type variable did not exist in the original data source and was inferred based on other variables only for a subset of patients.
Outcome No Progression was not confirmed. Table 1 of the paper suggested similar rates of worsening across disease subtypes, which is not expected and could be due to the lack of confirmation.
Analysis No The sample size was small relative to the large number of predictors. Exclusion of patients for missing data was addressed in the Participants section, but further exclusions due to missing data seem likely. The analysis was done at the visit level; grouped, stratified internal validation was used to address this, but it is unclear whether this was enough, and the correlation between observations was not addressed in the model fitting. It is also unclear how it would have been addressed had a final model been selected and fitted to the entire dataset. Univariable selection was used. Feature extraction and standardisation were done on the entire dataset instead of within cross‐validation, making data leakage possible. A calibration plot provided upon request showed severe miscalibration. No final model appears to have been selected, fitted, and presented.
Overall No At least one domain is at high risk of bias.
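Yperman 2020 evaluated a random forest at the visit level using grouped, stratified shuffle splits, and the analysis note above flags possible data leakage from standardising features on the entire dataset. A minimal sketch of grouped cross-validation with scikit-learn in which scaling happens inside each training fold via a Pipeline; the data are synthetic, the split is grouped but (unlike the study) not stratified, and the time-series feature extraction of the original study is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 2502 visits from 419 patients, with far fewer features
# than the roughly 5893 considered in the study.
rng = np.random.default_rng(0)
n_visits, n_features = 2502, 50
X = rng.normal(size=(n_visits, n_features))
y = rng.integers(0, 2, size=n_visits)          # disability progression within 2 years (synthetic)
groups = rng.integers(0, 419, size=n_visits)   # patient ID, so a patient never spans train and test

# Scaling lives inside the Pipeline, so it is refit on each training fold only,
# avoiding the leakage that whole-dataset standardisation would introduce.
model = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=200, random_state=0))

cv = GroupShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
auc = cross_val_score(model, X, y, cv=cv, groups=groups, scoring="roc_auc")
print(f"Grouped CV c-statistic: {auc.mean():.2f} (SD {auc.std():.2f})")
```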

Zakharov 2013.

Study characteristics
General information Model name
Not applicable
Primary source
Journal
Data source
Unclear
Study type
Development
Participants Inclusion criteria
Patients with monofocal CIS
Exclusion criteria
Not reported
Recruitment
Department of Neurology and Neurosurgery of the Samara State Medical University and at the Centre for MS at the Samara Regional Clinical Hospital, Russia
Age (years)
Mean 25.1
Sex (%F)
70.0
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
EDSS ≤ 2
Recruitment period
2004 to 2012
Predictors Considered predictors
Unclear if it is the complete list, age, number of foci, location of foci, size of the demyelination foci
Number of considered predictors
≥ 2 (unclear if complete list)
Timing of predictor measurement
Unclear, at first MRI after CIS onset (timing distribution unknown)
Predictor handling
Not reported
Outcome Outcome definition
Conversion to definite MS (McDonald 2010 (Polman 2011)): development of CDMS defined as the time of the onset of the second attack
Timing of outcome measurement
Follow‐up for 8 years
Missing data Number of participants with any missing value
Not reported
Missing data handling
Not reported
Analysis Number of participants (number of events)
102 (23)
Modelling method
Logistic regression
Predictor selection method
Not reported
Hyperparameter tuning
Not applicable
Shrinkage of predictor weights
None
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Sensitivity = 0.727, specificity = 0.345
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
2
Predictors in the model
Age at disease onset, size of the foci of demyelination
Effect measure estimates
Not reported
Predictor influence measure
Not applicable
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To study the clinical and instrumental parameters of the patient population with the first attack of the demyelinating process and involvement of only one functional system, most relevant to the term 'monofocal CIS'
Primary aim
The primary aim of this study is only partly the prediction of individual outcomes. The focus is on the predictors.
Model interpretation
Probably exploratory
Suggested improvements
Increase in the number of variables involved in the model, such variables as immunological indicators and data from neurophysiological methods – multimodal evoked potential
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on predictors and model timing, applicability is unclear.
 
Item Authors' judgement Support for judgement
Participants Unclear The data source is unclear without an indication of an associated study or registry. There were no detailed exclusion/inclusion criteria other than the diagnosis of monofocal CIS.
Predictors Unclear There are few details on how predictors were assessed or when they were assessed.
Outcome Unclear The definition of a second attack for CDMS is standard and is expected to be measured relatively objectively. It is unclear how much time passed between the predictor assessment and outcome determination because the timing of the predictor measurement is unclear. It is also unclear whether there were regular visits for outcome assessment or whether the outcome was assessed only when and if a patient presented.
Analysis No Although many details of the analysis were not reported, there are clear indicators to assess the risk of bias of this domain as high. EPV is at most 11.5, based on the number of variables in the final model, not the unknown number of variables considered. There was no information on missing data, including censoring, during the 8‐year follow‐up period. The only model performance measures reported were sensitivity and specificity evaluated in the development set. A final model is not presented.
Overall No At least one domain is at high risk of bias.

Zhang 2019.

Study characteristics
General information Model name
Shape
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • Patients who initially presented with CIS, i.e. showed symptoms suggestive of an inflammatory central nervous disease without fulfilling the 2010 McDonald criteria (Polman 2011) for MS

  • Patients with at least 3 years of follow‐up (or earlier diagnosis of conversion to MS)

  • The presence of a baseline MR scan, including a FLAIR and T1w image


Exclusion criteria
Not reported
Recruitment
Prospectively from a single centre (unclear), Germany
Age (years)
Mean 42.4 (unclear when)
Sex (%F)
69.9
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
McDonald 2010 (Polman 2011)
Treatment
  • At recruitment, 1.2% IFN‐b

  • During follow‐up, not reported


Disease description
EDSS median 1
Recruitment period
2009 to 2013
Predictors Considered predictors
Total lesion number, total lesion volume, minimum, maximum, mean, standard deviation for surface area, sphericity, surface‐volume‐ratio, and volume of individual lesions, (other models: minimum, maximum, mean, standard deviation for skewness, kurtosis, entropy of intensity histograms)
Number of considered predictors
30
Timing of predictor measurement
At disease onset (CIS) (during primary clinical work‐up for CIS)
Predictor handling
Continuously (by summary statistics of parameters from multiple lesions within single patients)
Outcome Outcome definition
Conversion to definite MS (McDonald 2010): demonstration of dissemination in time by a clinical relapse or the occurrence of new MRI lesions
Timing of outcome measurement
3 years
Missing data Number of participants with any missing value
2
Missing data handling
Exclusion
Analysis Number of participants (number of events)
84 (66)
Modelling method
Random forest, oblique ‐ linear multivariable model splitting
Predictor selection method
  • For inclusion in the multivariable model, all candidate predictors

  • During multivariable modelling, multiple models


Hyperparameter tuning
The number of variables considered at each node (candidate values: 3, sqrt(number of variables), and 7) and the number of trees (candidate values: 100, 200, and 300) were optimised on out‐of‐bag error during 3‐fold CV; an illustrative sketch of this kind of tuning follows this study's entry.
Shrinkage of predictor weights
Modelling method
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 3‐fold
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 0.85 (95% CI 0.75 to 0.91), sensitivity = 0.94 (95% CI 0.85 to 0.98), specificity = 0.50 (95% CI 0.26 to 0.74), PPV = 0.87 (95% CI 0.81 to 0.91), NPV = 0.69 (95% CI 0.44 to 0.87), DOR = 15.50 (95% CI 3.93 to 60.98), balanced accuracy = 0.72 (posterior probability interval 0.60 to 0.82)
Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
18
Predictors in the model
Total lesion number, total lesion volume, minimum, maximum, mean, standard deviation for surface area, sphericity, surface‐volume‐ratio, and volume of individual lesions
Effect measure estimates
Not reported
Predictor influence measure
Bootstrapped importance scores
Validation model update or adjustment
Not applicable
Interpretation  Aim of the study
To predict the conversion from CIS to multiple sclerosis (MS) based on the baseline MRI scan by studying image features of these lesions
Primary aim
The primary aim of this study is, at least in part, the prediction of individual outcomes; the focus is on the new MRI features.
Model interpretation
Probably exploratory
Suggested improvements
Independent validation, other features (texture, advanced deep learning, clinical, paraclinical), predict disease course not only conversion
Notes Applicability overall
High
Applicability overall rationale
The predictors used were imaging features and no other predictor domain was considered for use in the model.
Auxiliary references
Filippi M, Preziosa P, Meani A, Ciccarelli O, Mesaros S, Rovira A, et al. Prediction of a multiple sclerosis diagnosis in patients with clinically isolated syndrome using the 2016 MAGNIMS and 2010 McDonald criteria: a retrospective study. Lancet Neurol 2018;17(2):133‐42.
 
Item Authors' judgement Support for judgement
Participants No The data source was described as a cohort, and patients could not have had the outcome at baseline, based on the information collected during follow‐up. However, the required amount of follow‐up was an inclusion criterion, which may introduce a risk of bias.
Predictors Yes The predictors were collected at a single centre, and sensitivity to the lesion extraction method was examined.
Outcome Yes The outcome was based on well‐defined standard diagnostic criteria and we believe it is robust to knowledge of predictor information.
Analysis No The EPV was low. Neither calibration nor discrimination was addressed. Patients were included in the analysis based on the availability of follow‐up, but fewer than 5% of patients were excluded from the analysis because of missing outcome data, which is not expected to increase the risk of bias. The final model was unclear.
Overall No At least one domain is at high risk of bias.
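The tuning reported for Zhang 2019 amounts to a small grid search over the number of candidate variables per split (3, sqrt(p), or 7) and the number of trees (100, 200, or 300), judged by out‐of‐bag error and embedded in the 3‐fold cross‐validation used for performance evaluation. The sketch below illustrates this kind of grid with a standard axis‐aligned random forest from scikit‐learn and synthetic placeholder data; it is not the authors' oblique‐split implementation, and the names X and y are assumptions for illustration only.

```python
# Hedged sketch of the tuning grid described for Zhang 2019: choose the number
# of candidate features per split and the number of trees by out-of-bag (OOB)
# error. Standard scikit-learn forest, not the oblique variant used in the study.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=84, n_features=30, random_state=0)  # placeholder data

best = None
for max_features in [3, "sqrt", 7]:       # 3, sqrt(number of variables), or 7 per node
    for n_trees in [100, 200, 300]:       # forest sizes considered
        rf = RandomForestClassifier(
            n_estimators=n_trees,
            max_features=max_features,
            oob_score=True,               # out-of-bag accuracy on the training data
            random_state=0,
        ).fit(X, y)
        oob_error = 1.0 - rf.oob_score_
        if best is None or oob_error < best[0]:
            best = (oob_error, max_features, n_trees)

print("lowest OOB error %.3f with max_features=%s, n_estimators=%d" % best)
```

Because the same 3‐fold cross‐validation also supplied the reported sensitivity and specificity, a full reimplementation would need to nest this tuning inside each fold to avoid optimistic estimates.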

Zhao 2020.

Study characteristics
General information Model name
  • XGB All

  • XGB Common

  • LGBM All

  • LGBM Common

  • XGB Common Val

  • LGBM Common Val


Primary source
Journal
Data source
Cohort, primary
Study type
Development + validation at a different location (unclear if model refit)
Participants Inclusion criteria
  • XGB All, XGB Common, LGBM All, and LGBM Common:

    • Unclear, for the source population as reported in Gauthier 2006

    • Age ≥ 18 years

    • Definitive diagnosis of MS within the last 3 years whether treated or untreated

    • Adult with a diagnosis of MS meeting 2010 International Panel criteria from all CLIMB participants

    • Recruited into the QOL arm of the CLIMB study (enrolled between 4 May 2000 and 9 March 2013) or participants who had 10 years of follow‐up since first symptom

  • XGB Common Val and LGBM Common Val:

    • Unclear, for the source population as reported in Bove 2018

    • Age between 18 years and 65 years


Exclusion criteria
Not reported
Recruitment
  • XGB All, XGB Common, LGBM All, and LGBM Common: prospectively from the clinical practice of Brigham and Women’s Hospital forming the CLIMB cohort within the SUMMIT consortium, USA

  • XGB Common Val and LGBM Common Val: prospective observational research cohort from San Francisco MS Center at University of California, preferential recruitment of ambulatory participants and those with a recent onset of CDMS (2001 International Panel Diagnostic Criteria) or CIS, forming the EPIC cohort within the SUMMIT consortium, USA


Age (years)
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear, mean 39.0 (all participant data are for the source population of the selected cohort)

  • XGB Common Val and LGBM Common Val: unclear, mean 42.5 (all participant data are for the source population of the selected cohort)


Sex (%F)
  • XGB All, XGB Common, LGBM All, and LGBM Common: 76.1

  • XGB Common Val and LGBM Common Val: 68.7


Disease duration (years)
  • XGB All, XGB Common, LGBM All, and LGBM Common: median 2.0 (range: 0 to 44)

  • XGB Common Val and LGBM Common Val: median 6.0 (range: 0 to 45)


Diagnosis
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear, 16.4% CIS, 65.6% RRMS, 2.6% SPMS, 4.3% PPMS

  • XGB Common Val and LGBM Common Val: unclear, 15.9% CIS, 70.8% RRMS, 9.3% SPMS, 3.9% PPMS


Diagnostic criteria
Not reported
Treatment
  • XGB All, XGB Common, LGBM All, and LGBM Common:

    • At recruitment, unclear, for source cohort, 92.6% DMT first line, 0.8% DMT oral, 3.9% DMT high, 0.4% experimental, 2.3% MS other

    • During follow‐up, unclear, for source cohort, treatment at last visit, 34.6% DMT first line, 39.1% DMT oral, 9.6% DMT high, 3.1% experimental, 0.2% immune, 13.3% MS other, 4.5% never on treatment

  • XGB Common Val and LGBM Common Val:

    • At recruitment, unclear, for source cohort, 93.4% DMT first line, 0.7% DMT oral, 1.6% DMT high, 0.7% experimental, 0.5% immune, 1.6% steroid, 1.6% MS other

    • During follow‐up, unclear, for source cohort, treatment at last visit, 57.4% DMT first line, 19.8% DMT oral, 9.6% DMT high, 3% experimental, 0.5% immune, 5% steroid, 4.8% MS other, 15.1% never on treatment


Disease description
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 2; 0 to 7.5)

  • XGB Common Val and LGBM Common Val: unclear, for source cohort, EDSS median (IQR; range): 1.5 (1 to 3; 0 to 7)


Recruitment period
From 2000 onward
Predictors Considered predictors
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear if it is the complete list, Common: age, ethnicity, longitudinal features, attack in previous 6 months, attack in previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time

  • XGB Common Val and LGBM Common Val: not applicable


Number of considered predictors
  • XGB All and LGBM All: 198

  • XGB Common and LGBM Common: ≤ 105 (unclear subset)

  • XGB Common Val and LGBM Common Val: not applicable


Timing of predictor measurement
  • XGB All and LGBM All: at multiple assessments every 6 months from baseline (undefined) to year 2

  • XGB Common and LGBM Common: at multiple assessments every year from baseline (undefined) to year 2

  • XGB Common Val and LGBM Common Val: not applicable


Predictor handling
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear, probably continuously

  • XGB Common Val and LGBM Common Val: not applicable

Outcome Outcome definition
Disability (EDSS): worsening defined as an increase in EDSS ≥ 1.5
Timing of outcome measurement
Up to 5 years
Missing data Number of participants with any missing value
Not reported
Missing data handling
  • Mixed

    • Exclusion of variables with excessive missing values

    • (Unclear) missing values in the time series: numeric values interpolated/extrapolated linearly using the nearest data points

    • (Unclear) categorical values filled using the mode of existing values in the patient

Analysis Number of participants (number of events)
  • XGB All, XGB Common, LGBM All, and LGBM Common: 724 (165)

  • XGB Common Val and LGBM Common Val: 400 (130)


Modelling method
  • XGB All, and XGB Common: XGBoost

  • LGBM All, and LGBM Common: LightGBM

  • XGB Common Val and LGBM Common Val: not applicable


Predictor selection method
  • XGB All, XGB Common, LGBM All, and LGBM Common:

    • For inclusion in the multivariable model, all candidate predictors

    • During multivariable modelling, full model approach

  • XGB Common Val and LGBM Common Val:

    • For inclusion in the multivariable model, not applicable

    • During multivariable modelling, not applicable


Hyperparameter tuning
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear; the best cost‐sensitive learning weight was selected using a grid search over a list of weights, chosen on 5‐fold CV AUC; algorithm‐specific parameters were not discussed, although the code suggests some tuning (an illustrative sketch of this kind of nested tuning follows this study's entry)

  • XGB Common Val and LGBM Common Val: not applicable


Shrinkage of predictor weights
  • XGB All, XGB Common, LGBM All, and LGBM Common: modelling method

  • XGB Common Val and LGBM Common Val: not applicable


Performance evaluation dataset
  • XGB All, XGB Common, LGBM All, and LGBM Common: development

  • XGB Common Val and LGBM Common Val: external validation


Performance evaluation method
  • XGB All, XGB Common, LGBM All, and LGBM Common: cross‐validation, 10‐fold (nested)

  • XGB Common Val and LGBM Common Val: unclear if model refit to new data


Calibration estimate
Not reported
Discrimination estimate
  • XGB All, and LGBM All: c‐statistic = 0.78

  • XGB Common, and LGBM Common: c‐statistic = 0.76

  • XGB Common Val and LGBM Common Val: c‐statistic = 0.82


Classification estimate
  • XGB All: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.74, sensitivity = 0.68, specificity = 0.76

  • XGB Common: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.65, sensitivity = 0.75, specificity = 0.62

  • LGBM All: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.77, sensitivity = 0.58, specificity = 0.82

  • LGBM Common: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.64, sensitivity = 0.75, specificity = 0.61

  • XGB Common Val: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.68, sensitivity = 0.85, specificity = 0.60

  • LGBM Common Val: cutoff = 0.4 (results also reported for 0.5, 0.45, 0.35, and 0.3), accuracy = 0.73, sensitivity = 0.73, specificity = 0.73


Overall performance
Not reported
Risk groups
Not reported
Model  Model presentation
Not reported
Number of predictors in the model
  • XGB All and LGBM All: 198

  • XGB Common and LGBM Common: ≤ 105 (unclear subset)

  • XGB Common Val and LGBM Common Val: not applicable


Predictors in the model
  • XGB All, XGB Common, LGBM All, and LGBM Common: unclear if it is the complete list, Common: age, ethnicity, longitudinal features, attack in previous 6 months, attack in previous 2 years, bowel–bladder function, brainstem function, cerebellar function, gender, race, disease category, EDSS, lesion volume, mental function, pyramidal function, smoking history, sensory function, total GD, visual function, walk 25 ft time

  • XGB Common Val and LGBM Common Val: not applicable


Effect measure estimates
  • XGB All, XGB Common, LGBM All, and LGBM Common: not reported

  • XGB Common Val and LGBM Common Val: not applicable


Predictor influence measure
  • XGB All and LGBM All: top 10 predictive features

  • XGB Common and LGBM Common: not reported

  • XGB Common Val and LGBM Common Val: not applicable


Validation model update or adjustment
  • XGB All, XGB Common, LGBM All, and LGBM Common: not applicable

  • XGB Common Val and LGBM Common Val: model probably refit

Interpretation  Aim of the study
To apply machine learning techniques to predict the disability level of MS patients at the 5‐year time point using the first 2 years of clinical and neuroimaging longitudinal data
Primary aim
The primary aim of this study is, at least in part, the prediction of individual outcomes; the focus is on machine learning methods.
Model interpretation
Probably exploratory
Suggested improvements
Time series models to better capture the temporal dependencies, incorporate genetic information, and additional biomarkers
Notes Applicability overall
Unclear
Applicability overall rationale
Due to the lack of reporting on the final model or tool, it is unclear whether this study aimed to develop a model/tool for the prediction of individual MS outcomes. Additionally, it is unclear how many participants had already experienced the outcome by 2 years, at which time point the predictors were still being collected.
Auxiliary references
Bove R, Chitnis T, Cree BA, Tintoré M, Naegelin Y, Uitdehaag BM, et al. SUMMIT (serially unified multicenter multiple sclerosis investigation): creating a repository of deeply phenotyped contemporary multiple sclerosis cohorts. Mult Scler 2018;24(11):1485‐98.
Gauthier SA, Glanz B I, Mandel M, Weiner HL. A model for the comprehensive investigation of a chronic autoimmune disease: the multiple sclerosis CLIMB study. Autoimmun Rev 2006;5(8):532‐6.
 
Item Authors' judgement Support for judgement
Participants Unclear The data were prospectively collected from a cohort. Although the references cited in the article report inclusion/exclusion criteria, the number of patients analysed in this article does not match those sources; the inclusion/exclusion criteria actually applied are therefore unclear.
Predictors Unclear Because the data came from cohort studies, the predictors are expected to have been assessed similarly. The intended time of model use is unclear, and predictors from the first 2 years were used to predict the 5‐year outcome.
Outcome No A pre‐specified outcome was probably used, but the EDSS change was not confirmed. We are not concerned that the outcome assessment could be biased by knowledge of the predictors' values. However, it is unclear how many patients had already experienced the outcome by 2 years, while predictors were still being collected.
Analysis No XGB All, XGB Common, LGBM All, and LGBM Common: the number of events per variable was low relative to the unknown number of predictors considered. Calibration was not assessed. Parameter tuning was reported to occur over a grid of values within an inner CV loop, so optimism was probably addressed. The final model was unclear.
XGB Common Val and LGBM Common Val: calibration was not assessed. The model appears to have been refit in the external validation set.
Overall No At least one domain is at high risk of bias.
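The hyperparameter tuning reported for the Zhao 2020 models (a cost‐sensitive class weight chosen by grid search on 5‐fold cross‐validated AUC, with performance then estimated by a 10‐fold outer loop) can be sketched as nested cross‐validation. This is an illustration only: the synthetic data, the weight grid, and the use of xgboost's scale_pos_weight parameter are assumptions rather than the authors' code, and the LightGBM models would follow the same pattern with its equivalent parameter.

```python
# Hedged sketch of the nested tuning described for Zhao 2020: an inner 5-fold
# grid search over a cost-sensitive class weight scored by AUC, wrapped in an
# outer 10-fold loop that estimates the performance of the whole procedure.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from xgboost import XGBClassifier

# Placeholder feature matrix with roughly the reported class balance (165/724 events).
X, y = make_classification(n_samples=724, n_features=20, weights=[0.77, 0.23],
                           random_state=0)

inner = GridSearchCV(
    estimator=XGBClassifier(),
    param_grid={"scale_pos_weight": [1, 2, 3, 4, 5]},  # assumed list of weights
    scoring="roc_auc",
    cv=5,                                              # inner 5-fold CV on AUC
)

outer_auc = cross_val_score(inner, X, y, scoring="roc_auc", cv=10)  # outer 10-fold CV
print("nested-CV c-statistic: %.2f (+/- %.2f)" % (outer_auc.mean(), outer_auc.std()))
```

Keeping the weight selection inside the outer folds, as above, is what keeps the reported cross‐validated c‐statistics largely free of tuning optimism.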

25FW (also seen as T25FW): timed 25‐foot walk
2D: 2‐dimensional
3D: 3‐dimensional
3:4‐DAP: 3,4‐diaminopyridine
4‐AP: 4‐aminopyridine
9‐HPT (also seen as 9HPT): 9‐hole peg test
ABILHAND: interview‐based assessment of a patient‐reported measure of the perceived difficulty in using their hand to perform manual activities
ACTH: adrenocorticotropic hormone
Ada (AdaBoost): adaptive boosting
ADL: activities of daily living
AH: abductor hallucis
AIC: Akaike information criterion
AISM: Italian Multiple Sclerosis Society
APB: abductor pollicis brevis
AUC: area under the curve
BCVA: best corrected visual acuity
BENEFIT: Betaferon/Betaseron in Newly Emerging MS for Initial Treatment
BIC: Bayesian information criterion
BMA: Bayesian model averaging
BMS: benign MS
BPF: brain parenchymal fraction
BPTF: Bayesian probabilistic tensor factorisation
BREMS: Bayesian Risk Estimate for Multiple Sclerosis
BREMSO: Bayesian Risk Estimate for Multiple Sclerosis at Onset
CAO: clinician‐assessed outcomes
CCA: current course assignment
CCV: cerebral cortical volume
CDMS: clinically definite multiple sclerosis
CEL: contrast‐enhancing lesion
CI: confidence interval
CIS: clinically isolated syndrome
CL: cortical lesion
CLCN4: chloride voltage‐gated channel 4
CLIMB: Comprehensive Longitudinal Investigation of Multiple Sclerosis at Brigham and Women's
CMCT: central motor conduction time
CombiWISE: Combinatorial Weight‐adjusted Disability Score
COMRIS‐CTD: Combinatorial MRI scale of CNS tissue destruction
CNN: convolutional neural network
CNS: central nervous system
COPOUSEP: Corticothérapie Orale dans les Poussées de Sclérose en Plaques
CPT: current procedural terminology
CSF: cerebrospinal fluid
CT: computed tomography
CTh: cortical thickness
CUIs: concept unique identifiers
CXCL13: chemokine ligand 13
CV: cross‐validation
DAWM: diffusely abnormal white matter 
Dev: development
df: degrees of freedom
Dgm: deep grey matter
DIR: double inversion recovery
DIS: dissemination in space
DIT2010: dissemination in time according to McDonald 2010 criteria
DMD: disease‐modifying drug
DMT: disease‐modifying treatment
DNA: deoxyribonucleic acid
DSS: Disability Status Scale
DT: decision time; decision tree
EDSS: expanded disability status scale
EDT: Euclidean distance transform
EHR: electronic health record
EP: evoked potential
EPIC: expression, proteomics, imaging, clinical
EPTS: evoked potential time series
EPV: events per variable
Ext: external
F: female
F1: F‐score
FCA: future course assignment
FLAIR: fluid‐attenuated inversion recovery
FLP: first level predictor
FREEDOMS: FTY720 Research Evaluating Effects of Daily Oral therapy in Multiple Sclerosis
FS: functional systems
FTP: fine tuning predictor
GA: glatiramer acetate
Gd (also seen as “GD”): gadolinium
Gd‐DTPA: gadolinium diethylenetriamine penta‐acetic acid
GM: grey matter
GRU‐ODE‐Bayes: Gated Recurrent Unit‐Ordinary Differential Equation‐Bayes
HLA: human leukocyte antigen
HR: hazard ratio
ICBM‐DTI: International Consortium of Brain Mapping diffusion tensor imaging
ICD: International Classification of Disease
IFN: interferon
IgG: immunoglobulin G
IL2: interleukin‐2
ILIRN: interleukin‐1 receptor antagonist
IQR: interquartile range
JHU‐MNI: Johns Hopkins University‐Montreal Neurological Institute
KFSS: Kurtzke Functional Systems Scores
LASSO: least absolute shrinkage and selection operator
LGBM: light gradient‐boosting machine
logMAR: logarithm of the minimum angle of resolution
LOO: leave‐one‐out
LOOCV: leave‐one‐out cross‐validation
LR: logistic regression
LSTM: long short‐term memory
MAGNIMS: Magnetic Resonance Imaging in MS
MEP (also seen as “mEPS”): motor evoked potentials
mEPS: motor evoked potentials
MF: motor function
MFIS: Modified Fatigue Impact Scale
ML: machine learning
MNI: Montreal Neurological Institute
MPI: multifactorial prognostic index
MR: magnetic resonance
MRI: magnetic resonance imaging
MS: multiple sclerosis
MSBASIS: MSBase Incident Study
MS‐DSS: MS disease severity scale
MSE: mean squared error
MSFC: multiple sclerosis functional composite
MSPS: multiple sclerosis prediction score
MSSS: MS severity score
MT/MTR: magnetisation transfer imaging
NA: not applicable
NDH‐9HPT: non‐dominant hand 9‐hole peg test
NEMO: network modification tool
NF‐L: neurofilament light chain level
NHPT: nine‐hole peg test
NMO: neuromyelitis optica
NMOSD: neuromyelitis optica spectrum disorder
NN: neural network
NPV: negative predictive value
NR: not reported
NR2Y: number of relapses experienced in the first 2 years after MS onset
O:E: observed to expected ratio
OB: oligoclonal bands
OCB: oligoclonal bands
OCT: optical coherence tomography
OFSEP: Observatoire Français de la Sclérose en Plaques
ON: optic neuritis
OND: other neurologic disease
OR: odds ratio
PASAT: Paced Auditory Serial Addition Test
PBMC: peripheral blood mononuclear cells
PD: patient‐determined
PDCD2: human programmed cell death‐2 gene
Pdw: proton density‐weighted
PP: primary progressive
PPMS: primary progressive MS
PPV: positive predictive value
PRIMS: pregnancy in MS
PRO: patient‐reported outcome
PSIR: phase‐shifted inversion recovery
QOL: quality of life
RCT: randomised controlled trial
RF: random forest
RH: relapse history
RNA: ribonucleic acid
RNRL: retinal nerve fibre layer
ROC: receiver operating characteristic
ROI: region of interest
RR: relapsing‐remitting
RRMS: relapsing‐remitting multiple sclerosis
RT‐PCR: reverse transcription polymerase chain reaction
SCL: spinal CL
SD: standard deviation
SDMT: symbol digits modality test
SE: standard error
SF‐36: 36‐Item Short Form Survey
SMS: severe multiple sclerosis
SMSreg: Swedish MS registry
SNP: single nucleotide polymorphism
SNRS: Scripps neurological rating scale
SP: secondary progression
SPMS: secondary progressive multiple sclerosis
SUMMIT: Serially Unified Multicenter Multiple Sclerosis Investigation
SVM: support vector machine
T1c: T1‐weighted pre‐contrast
T1p: T1‐weighted post‐contrast
T2LV (also seen as “T2 LV”): T2 lesion volume
T2w: T2‐weighted
TT2R: time to second relapse
TWT: timed walk test
Val: validation
VFT: visual function test
WBC: white blood cell
WM: white matter
XGB: extreme gradient boosting

Characteristics of excluded studies [ordered by study ID]

Study Reason for exclusion
Achiron 2006 Ineligible model: there is no multivariable prognostic prediction model in this study; rather, it describes EDSS evolution in a cohort and compares it with that of new patients.
Ahlbrecht 2016 Ineligible study type: the objective of this study is to assess associations between microRNAs detected in the CSF and conversion from CIS to RRMS. No model is developed or validated for prognostic prediction.
Andersen 2015 Ineligible study type: the aim of this study is not to create a prognostic prediction model but to describe the natural history of the disease.
Azevedo 2019 Ineligible study type: the aim of this conference abstract is to identify minimum clinically meaningful differences in brain atrophy rather than using multivariable models for prediction of future outcomes in individuals.
Barkhof 1997 Ineligible model: this is a count‐score study. Multivariable logistic regression is used only for predictor selection; the abnormal variables are then counted, and a univariable logistic regression on that count is used to derive the predicted risk.
Brettschneider 2006 Ineligible model: this study aims to assess whether cerebrospinal fluid biomarkers can improve diagnostic criteria for prediction of conversion from CIS to CDMS. However, no statistically developed multivariable models are used for predicting future conversion in individual patients; rather, diagnostic criteria are combined.
Bsteh 2021 Ineligible model: the models aim to predict outcomes after treatment withdrawal, which can be considered treatment response.
Castellaro 2015 Ineligible study type: based on the presented aim, results, and conclusion, the aim of this conference abstract is to show that specific brain measures are predictive of conversion.
Chalkou 2021 Ineligible model: the objective of this study is treatment effect prediction, and the prognostic model is only a step towards that goal.
Costa 2017 Ineligible study type: this poster presents a prognostic factor study that aims to investigate the prognostic role of different biomarkers.
Cutter 2014 Ineligible study type: the objective of this poster presentation is not prognostic prediction but an indirect comparison of different treatment regimens.
Damasceno 2019 Ineligible study type: the aim of this study is to analyse cognitive trajectories using longitudinal models. Hence, there is no prediction of outcomes in individuals.
Daumer 2007 Ineligible model: there is no prediction in this study. A matching algorithm based on similarity is used, which is followed by a description of the data.
Dekker 2019 Ineligible study type: the objective of this study is to show the predictive value of brain measures. The multivariable models fit in the study are not used for predictions and are not interpreted as prognostic models in the discussion section.
Esposito 2011 Ineligible outcome: the outcome, classification of lesions as normal or abnormal, is not a clinical outcome.
Filippi 2010 Ineligible study type: the objective of this conference abstract is to develop diagnostic criteria for MS.
Filippi 2013 Ineligible study type: the objective of this study is to identify MRI predictors. Although random forests are used, it is to assess the importance of predictors for future outcomes.
Fuchs 2021 Ineligible study type: the aim of this study is to compare the use of imaging features extracted from routine clinical data with modified methods to those collected according to research standards.
Gasperini 2021 Ineligible model: the developed score is not statistically derived, and it is unclear whether the aim of the study is the prediction of treatment responses.
Gomez‐Gonzalez 2010 Ineligible study type: the aim of this study is to demonstrate the use of an automated tool for oligoclonal band analysis and to show that the information extracted relates to patient subgroups.
Hakansson 2017 Ineligible study type: the objective is to search for prognostic markers in CSF, and there is no individual‐level prediction with a multivariable model.
Ho 2013 Ineligible population: this study aims to predict the risk of MS diagnosis in the general population, not the risk of future MS outcomes.
Ignatova 2018 Ineligible study type: the aim of this study is to find predictors of progression.
Invernizzi 2011 Ineligible model: this study investigates the prognostic value of an evoked potentials score, which is not a multivariable model for prognostic prediction.
Jackson 2020 Ineligible timing: the outcome in this study is based on cross‐sectionally collected data, precluding prognostic prediction.
Kalincik 2013 Ineligible study type: the stated aim of this study is to evaluate associations between genetic susceptibility markers and MS phenotypes.
Leocani 2017 Ineligible study type: the objective of this conference abstract is to demonstrate the prognostic value of evoked potentials, not prognostic prediction.
Morelli 2020 Ineligible study type: the objective of this study is to show the predictive value of putamen hypertrophy with cognitive impairment instead of prognostic prediction.
Palace 2013 Ineligible study type: the objective of this study is to assess cost‐effectiveness.
Pappalardo 2020 Ineligible outcome: the study looks at endpoints that are unrelated to relapse, disability, or conversion to a more advanced disease stage.
Petrou 2018 Ineligible study type: the aim of this conference abstract is to assess correlations between biomarkers and clinical outcomes. Also, the study contains no multivariable prognostic model for the prediction of future outcomes but rather assesses a non‐statistical combination of two biomarkers.
Preziosa 2015 Ineligible study type: this study aims to show the value of MRI measures and uses a multivariable model to this end.
Rajda 2019 Ineligible population: at the moment of prognostication, the included people had not yet been diagnosed, and the outcome is the differentiation of people with MS from controls.
Rio 2019 Ineligible model: this conference presentation compares different treatment response scores and a count score, which is not a multivariable model, with the stated intention of treatment response prediction.
Rodriguez 2012 Ineligible study type: the aim is to apply a novel model to an MS dataset. The focus is not clinical prediction but demonstration of the methodology.
Rothman 2016 Ineligible study type: in this study, multivariable models are used to assess the association between retinal measurements, visual function, and future disease disability rather than predicting individual outcomes.
Roura 2018 Ineligible timing: this study aims to evaluate the longitudinal changes in brain fractal geometry and its association with disability worsening. Correspondence with the authors has confirmed that the models are not predicting future outcomes but current states.
Sbardella 2011 Ineligible study type: the objective of this study is to demonstrate the predictive value of diffuse brain damage as opposed to prognostic prediction.
Schlaeger 2012 Ineligible study type: the objective of this study is to demonstrate the predictive value of evoked potentials.
Srinivasan 2020 Ineligible outcome: in this abstract, the presented outcomes (QoL, fatigue, depression, falls) are not related to clinical disability with respect to the definition we are using in our review.
Tintore 2015 Ineligible model: this conference presentation contains no prediction but rather categorisation into different groups by analysis of time‐to‐event data and description of these groups' characteristics.
Tomassini 2019 Ineligible model: this is a count‐score. In this study, Cox regression is used to select predictors that are later counted to give a discrete score. This score is used in a univariate model as a factor to report risk stratification.
Tossberg 2013 Ineligible model: the model in this study is not used for prognostic prediction but for diagnostic purposes. The developed score for diagnostic purposes is used for prognostic prediction only in those that convert to MS.
Uher 2017a Ineligible model: in this study, multivariable models are used to select adjusted predictors, followed by counting the positive predictors to create a score.
Uher 2017b Ineligible study type: the purpose is not prognostic prediction but to demonstrate the concurrent predictive value of MRI measures on cognitive impairment.
Veloso 2014 Ineligible model: this publication presents a simulation interface based on previously published studies, most relevant to our review being BREMS, but does not perform any new prediction and only describes or reports correlations for the included study participants.
Vukusic 2006 Ineligible study type: unrelated review
Wahid 2019 Ineligible model: the two models in this conference abstract are not longitudinal in nature, and the only longitudinal model is presented as a treatment response prediction tool.
Zephir 2009 Ineligible study type: the objective of this study is to demonstrate the usefulness of IgG as a biomarker of pathology.
Ziemssen 2019 Ineligible outcome: in this poster presentation, the objective is differentiating between the relapsing and progressing diagnoses instead of prognostic prediction.

BREMS: Bayesian Risk Estimate for Multiple Sclerosis
CDMS: clinically definite multiple sclerosis
CIS: clinically isolated syndrome
CSF: cerebrospinal fluid
EDSS: Expanded Disability Status Scale
IgG: immunoglobulin G
MRI: magnetic resonance imaging
MS: multiple sclerosis
QoL: quality of life
RNA: ribonucleic acid
RRMS: relapsing–remitting multiple sclerosis

Characteristics of studies awaiting classification [ordered by study ID]

Achiron 2007.

General information Reason for awaiting classification
It is unclear whether the study design is longitudinal or whether the sampling is performed at the same time as the outcome assessment.
Model name
Not reported
Primary source
Journal
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • RRMS

  • Participants with good outcome (no deterioration in neurological disability, no relapse during 2‐year follow‐up) or poor outcome (EDSS score change ≥ 0.5)


Exclusion criteria
Not reported
Recruitment
Israel
Age (years)
Mean 43
Sex (%F)
69.8
Disease duration (years)
Mean 10.5 (pooled SD 2.4)
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
100% on interferon beta‐1a
Disease description
(For the source population including all outcomes) EDSS (unclear if mean and SD): development 2.0 (1.0), validation 2.5 (0.2); mean annualised relapse rate: 1.1
Recruitment period
Not reported
Predictors Considered predictors
PBMC RNA microarray analysis of gene transcripts
Outcome Outcome definition
Composite (includes relapse and scores (EDSS)): good outcome defined as no deterioration in neurological disability and no relapse, poor outcome as EDSS score change ≥ 0.5 that needed to be confirmed at 3 months during 2‐year follow‐up
Timing of outcome measurement
Unclear, follow‐up for 2 years
Analysis Number of participants (number of events)
56 (unclear how many events in the validation set, ≥ 9)
Modelling method
Support vector machine
Performance evaluation dataset
Development
Performance evaluation method
Random split
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Classification rate = 88.9%
Predictors in the model
34 gene transcripts from the following 29 genes: ADD1, CA11, CCL17, CD44, COL11A2, CRYGD, DNM1, DR1, GNMT, GPP3, GSTA1, HAB1, HSPA8, IGLJ3, IGLVJ, IL3RA, KIAA0980, KLF4, KLK1, MUC4, NY‐REN‐24, ODZ2, PTN, RRN3, S100B, TCRBV, TOP3B, TPSB2, VEGFB
Interpretation Aim of the study
To evaluate whether gene expression profiling can differentiate RRMS patients according to their clinical course – either favourable or poor
Notes

Behling 2019.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Abstract
Data source
Mixed (routine care, claims), secondary
Study type
Development
Participants Inclusion criteria
  • MS patients treated with a DMT prior to 31 December 2017 (index date)

  • No evidence of relapse in the 30 days prior to the index date


Exclusion criteria
Not reported
Recruitment
Patients from a variety of provider practice types across the USA included in the OM1 Data cloud, USA
Age (years)
Median 54
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
Not reported
Diagnostic criteria
Not reported
Treatment
  • At recruitment: unclear

  • During follow‐up: 100% on DMT


Disease description
Not reported
Recruitment period
2015 to 2018
Predictors Considered predictors
Probably not the complete list, most significant predictors: the number of relapses in the previous 12 months, antiemetic medication use, skeletal muscle relaxants, MS‐related fatigue symptoms
Outcome Outcome definition
Relapse: MS‐related inpatient stay, emergency room visit, or outpatient visit with documented MS and a corticosteroid prescription within 7 days
Timing of outcome measurement
Within 6 months after the index date
Analysis Number of participants (number of events)
18,137 (number of events unclear; approximately 1415, calculated from the reported event rate)
Modelling method
Random forest
Performance evaluation dataset
Development
Performance evaluation method
Random split, 80% training, 20% test
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic > 0.70
Classification estimate
Cutoff determined from data, PPV = 0.203, 1‐NPV = 0.058; unclear whether other reported measure (0.84) is accuracy or sensitivity
Predictors in the model
Probably not the complete list, most significant predictors: the number of relapses in the previous 12 months, antiemetic medication use, skeletal muscle relaxants, MS‐related fatigue symptoms
Interpretation Aim of the study
To use advanced analytics to predict relapses amongst MS patients treated with DMTs identified from a large, representative database of linked EMR and claims data
Notes

Castellazzi 2019.

General information Reason for awaiting classification
The age range of the included patients is not reported, and it is unclear whether the objective is the development of a diagnostic or prognostic model.
Model name
Not reported
Primary source
Poster
Data source
Not reported
Study type
Development
Participants Inclusion criteria
RRMS patients and healthy controls for developing the classifier, and CIS patients for applying it
Exclusion criteria
Not reported
Recruitment
Not reported
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
39.6% CIS, 30.2% RRMS, and 30.2% healthy controls
Diagnostic criteria
McDonald (undefined)
Treatment
Not reported
Disease description
EDSS (unclear if mean and SD): CIS 1.3 (0.8), RRMS 1.7 (1.2)
Recruitment period
Not reported
Predictors Considered predictors
Thresholded and processed cross‐correlation matrix of mean rs‐fMRI signals of parcellated preprocessed rs‐fMRI images
Outcome Outcome definition
Conversion to definite MS (McDonald, undefined): RRMS
Timing of outcome measurement
12 months
Analysis Number of participants (number of events)
106 (unclear how many events in the prediction group of CIS patients, ≥ 32)
Modelling method
  • Multiple models

  • Support vector machine


Performance evaluation dataset
External validation
Performance evaluation method
Model developed to differentiate healthy controls from RRMS is used to predict RRMS conversion in CIS patients
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 69% (SVM), 56% (ANFIS)
Predictors in the model
10 features from 10 distinct AAL (automated anatomical labelling) atlas areas, including the cuneus, pallidum, calcarine cortex, fusiform gyrus, cerebellar lobules 6/7b/8, supplementary motor area, and superior/middle occipital gyri
Interpretation Aim of the study
To predict the conversion to RRMS in participants with CIS
Notes

Chaar 2019.

General information Reason for awaiting classification
The time points used in the model are not reported, and it is unclear whether the design was longitudinal in nature. Also, the age range of included patients is not described.
Model name
Not reported
Primary source
Abstract
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
Patients on fingolimod
Exclusion criteria
Not reported
Recruitment
Not reported
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
Not reported
Diagnostic criteria
Not reported
Treatment
100% on fingolimod
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Unclear features extracted from magnetic resonance imaging, magnetic resonance spectroscopy, magnetisation transfer ratio, diffusion tensor imaging, and optical coherence tomography
Outcome Outcome definition
Disability (EDSS)
Timing of outcome measurement
Unclear, 3 time points at 1‐year intervals
Analysis Number of participants (number of events)
Unclear unit of analysis: 50 participants contributing 135 time points (number of events not reported)
Modelling method
Neural network, single hidden‐layered feed‐forward ANN with Bayesian regularisation
Performance evaluation dataset
Development
Performance evaluation method
Random split, 85% training, 15% test
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Mean squared error = 1.213, accuracy = 77.9%
Predictors in the model
Unclear, possibly non‐tabular data from MRI, MRS, MTR, DTI, and OCT
Interpretation Aim of the study
To predict the clinical disability based on multiple important imaging biomarkers
Notes

Dalla Costa 2014.

General information Reason for awaiting classification
It is unclear whether all the predictors are used to predict an outcome in the future or concurrent to the predictor measurement.
Model name
Not reported
Primary source
Abstract
Data source
Not reported
Study type
Development
Participants Inclusion criteria
Admission within 3 months of the onset of a CIS
Exclusion criteria
Not reported
Recruitment
Patients admitted to the San Raffaele Hospital, Neurological Department, Italy
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Unclear features from clinical data as well as MRI, multimodal EP, and CSF data
Outcome Outcome definition
Conversion to definite MS
Timing of outcome measurement
Unclear, follow‐up mean 6.82 (SD 2.78)
Analysis Number of participants (number of events)
227 (120)
Modelling method
Neural network, multilayer perceptron with a back propagation algorithm
Performance evaluation dataset
Development
Performance evaluation method
Random split, 80% training, 20% validation
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 87%
Predictors in the model
Clinical, MRI, CSF, and EP data
Interpretation Aim of the study
To develop an ANN‐based diagnostic model integrating both clinical and paraclinical baseline data
Notes

Ghosh 2009.

General information Reason for awaiting classification
The age range of included patients is not reported. It is not clear whether individual prediction occurred. Also, the multivariable nature of the model cannot be determined from the limited information.
Model name
Not reported
Primary source
Abstract
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • RRMS

  • Complete information on MRI, on‐study relapses, and baseline covariates


Exclusion criteria
Not reported
Recruitment
Ian McDonald database
Age (years)
Mean 27.9 (at onset)
Sex (%F)
Not reported
Disease duration (years)
Mean 7.5
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
EDSS mean 3
Recruitment period
Not reported
Predictors Considered predictors
Number of Gd‐enhancing lesions, T2 lesion volume
Outcome Outcome definition
Relapse
Timing of outcome measurement
Unclear, follow‐up ≤ 129 weeks
Analysis Number of participants (number of events)
108 (58)
Modelling method
Joint longitudinal model, 3 models connected via random effects, parameter estimates by Markov chain Monte Carlo
Performance evaluation dataset
Not reported
Performance evaluation method
Not reported
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Not reported
Predictors in the model
Not reported
Interpretation Aim of the study
To establish a model that allows the prediction of occurrence of relapses by including longitudinal information on the number of Gd‐enhancing lesions and T2 lesion volume simultaneously
Notes

Kister 2015.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Poster
Data source
Secondary
  • Dev: registry

  • Val: routine care


Study type
Development and validation at a different location (unclear whether predictions adapted)
Participants Inclusion criteria
  • Dev:

    • Age ≥ 19 years (unclear)

    • Diagnosis of MS

    • Completed disability self‐assessment at enrolment and at 2 years and 5 years of follow‐up

  • Val:

    • MS patients

    • 2 or more PDDS scores recorded more than 6 months apart, unless there was a relapse within 3 months of the clinic visit


Exclusion criteria
Not reported
Recruitment
  • Dev:

    • NARCOMS Registry

    • USA, Canada (unclear)

  • Val:

    • Consecutive patients at 2 outpatient MS centres in the greater New York area

    • USA


Age (years)
  • Dev: median 47.1

  • Val: mean 45.5


Sex (%F)
  • Dev: 79.8

  • Val: 73.7


Disease duration (years)
  • Dev: not reported

  • Val: mean 12 (SD 8.7)


Diagnosis
  • Dev: not reported

  • Val: 80.54% RRMS, 10.65% SPMS, 4.09% PPMS, 0.75% PRMS, 3.97% other


Diagnostic criteria
Not reported
Treatment
  • Dev: not reported

  • Val: 80.4% on DMT


Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Gender, age, baseline P‐MSSS
Outcome Outcome definition
  • Dev: disability (P‐MSSS) – aggressive disease defined as worse disability than in 5/6 patients with the same disease duration (P‐MSSS > 0.83); P‐MSSS is PDDS rank‐adjusted by disease duration

  • Val: disability (P‐MSSS) – severe MS defined as P‐MSSS > 0.83


Timing of outcome measurement
  • Dev: follow‐up for 2 years

  • Val: follow‐up mean (range): 10.6 months (6 months to 17 months)

Analysis Number of participants (number of events)
  • Dev: 2364 (not reported)

  • Val: 930 (80)


Modelling method
Logistic regression
Performance evaluation dataset
  • Dev: development

  • Val: external validation


Performance evaluation method
  • Dev: apparent

  • Val: unclear whether development data are also used for validation


Calibration estimate
  • Dev: calibration slope = 0.992, calibration intercept = −0.008 (an illustrative sketch of how these quantities can be computed follows this study's entry)

  • Val: not reported


Discrimination estimate
  • Dev: c‐statistic = 0.925

  • Val: not reported


Classification estimate
  • Dev: cutoff (0.296) chosen to give a positive predictive value of 50% for severe MS; sensitivity = 0.77, specificity = 0.90

  • Val: sensitivity = 0.90, specificity = 0.91, PPV = 0.49, NPV = 0.99


Predictors in the model
Gender, age, P‐MSSS
Interpretation Aim of the study
  • Dev: to develop and internally validate a logistic regression model that uses patients' gender, age, and baseline P‐MSSS as predictor variables to estimate the probability of aggressive MS 2 years later

  • Val: to determine short‐term stability of P‐MSSS in MS clinic patients and to explore the utility of the newly developed P‐MSSS‐based risk calculator for this population

Notes Auxiliary references
Charlson R, Herbert J, Kister I. CME/CNE article: severity grading in multiple sclerosis: a proposal. Int J MS Care 2016;18(5):265‐70.
Kister I, Chamot E, Salter AR, Cutter GR, Bacon TE, Herbert J. Disability in multiple sclerosis: a reference for patients and clinicians. Neurology 2013;80(11):1018‐24.
Kister I, Bacon TE, Cutter GR. Short‐term disability progression in two multiethnic multiple sclerosis centers in the treatment era. Ther Adv Neurol Disord 2018;11:1756286418793613.
Kister I, Kantarci OH. Multiple sclerosis severity score: concept and applications. Mult Scler 2020;26(5):548‐53.
Learmonth YC, Motl RW, Sandroff BM, Pula JH, Cadavid D. Validation of patient determined disease steps (PDDS) scale scores in persons with multiple sclerosis. BMC Neurol 2013;13:37.
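The calibration slope and intercept reported for the development data are typically obtained by regressing the observed outcome on the linear predictor, that is, the logit of the predicted probabilities: the slope is that regression coefficient (1 indicates neither over‐ nor under‐fitting) and the intercept summarises calibration‐in‐the‐large. A minimal sketch follows, assuming arrays y (0/1 outcomes) and p_hat (predicted probabilities) as placeholder inputs; it does not use the NARCOMS data.

```python
# Hedged sketch: calibration intercept and slope from predicted probabilities.
# The slope is the coefficient of the linear predictor (logit of p_hat) in a
# logistic regression of the observed outcome on that linear predictor; some
# authors instead estimate the intercept with the slope fixed at 1 (offset model).
import numpy as np
import statsmodels.api as sm

def calibration_intercept_slope(y, p_hat):
    """y: 0/1 outcomes; p_hat: predicted probabilities (placeholder inputs)."""
    p_hat = np.clip(p_hat, 1e-6, 1 - 1e-6)
    lp = np.log(p_hat / (1 - p_hat))                 # linear predictor (logit scale)
    fit = sm.Logit(y, sm.add_constant(lp)).fit(disp=0)
    intercept, slope = fit.params
    return intercept, slope

# Toy check with perfectly calibrated data: slope near 1, intercept near 0.
rng = np.random.default_rng(0)
p_hat = rng.uniform(0.05, 0.95, size=500)
y = rng.binomial(1, p_hat)
print(calibration_intercept_slope(y, p_hat))
```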

Mallucci 2019.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Abstract
Data source
Not reported
Study type
Development
Participants Inclusion criteria
CIS patients
Exclusion criteria
Not reported
Recruitment
Not reported
Age (years)
Median 32.3
Sex (%F)
65.6
Disease duration (years)
Unclear, upper limit 1 year
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
  • At recruitment: 29.4% on DMT

  • During follow‐up: not reported


Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Not reported
Outcome Outcome definition
Composite (includes symptoms, disability): no evidence of disease activity (NEDA3) status in which NEDA3 maintenance is defined by no relapses, no disability progression, and no MRI activity
Timing of outcome measurement
Unclear, 12 months
Analysis Number of participants (number of events)
279 (not reported)
Modelling method
Logistic regression, Bayesian
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.83
Classification estimate
  • Accuracy = 0.77 (95% CI 0.70 to 0.83)

  • Sensitivity = 0.69, specificity = 0.82


Predictors in the model
Age, onset with optic neuritis, abnormal upper sensory EPs, abnormal visual EPs, therapy with DMD
Interpretation Aim of the study
To define a prognostic model for the early forecast of losing NEDA3 status (no relapses, no disability progression, no MRI activity) in CIS patients within 12 months from disease onset
Notes

Medin 2016.

General information Reason for awaiting classification
Conference abstract
Model name
Composite
Primary source
Poster
Data source
Routine care, secondary
Study type
Development
Participants Inclusion criteria
  • Confirmed diagnosis of RRMS

  • At least 12 months of follow‐up data post‐index date

  • Non‐missing baseline EDSS score

  • Receiving BRACE (interferons, glatiramer acetate) therapy

  • Subgroup analyses were performed for each possible combination of therapy (BRACE continued, BRACE to BRACE, BRACE to first line, BRACE to second line), with inclusion/exclusion criteria applied as appropriate


Exclusion criteria
Not reported
Recruitment
Neuro Trans Data, a group of neurology practices, Germany
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
  • At recruitment, 100% on interferon or glatiramer acetate

  • During follow‐up, 100% on DMT


Disease description
Not reported
Recruitment period
2010 to 2015
Predictors Considered predictors
Unclear whether it is the complete list, demographics (born in Central Europe, aged < 30 years at index date, aged ≥ 30 years and < 40 years at index date), diagnostic history, treatment (fingolimod was available at the index date, teriflunomide was available at the index date), disability status (EDSS score of 0 earlier than 360 days prior to index date), disability history (at least one relapse in the 180 days to 360 days prior to index date, at least one relapse in the 360 days to 720 days prior to index date), cranial and spinal lesion count
Outcome Outcome definition
Relapse: a binary outcome over the 12‐month follow‐up period; a relapse was defined as a patient‐reported or objectively observed event typical of an acute inflammatory demyelinating event in the central nervous system, current or historical, lasting at least 24 hours, in the absence of fever or infection
Timing of outcome measurement
12 months; the period is randomly chosen
Analysis Number of participants (number of events)
4129 (751 or 752, calculated from reported event rate)
Modelling method
Logistic regression, elastic net
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, k‐fold
Calibration estimate
Quintiles of predicted probability of relapse vs actual relapse rate (an illustrative sketch of this grouping follows this study's entry)
Discrimination estimate
c‐Statistic = 0.69 (95% CI 0.67 to 0.71)
Classification estimate
Not reported
Predictors in the model
Whether the patient experienced at least 1 relapse in the 180 days to 360 days prior to index date, whether the patient was aged < 30 years at index date, whether the patient experienced at least one relapse in the 360 days to 720 days prior to index date, whether the patient was aged ≥ 30 years and < 40 years at index date, whether the patient was born in Central Europe, whether fingolimod (Gilenya) was available at the index date, whether teriflunomide (Aubagio) was available at the index date, whether the patient had an EDSS score of 0 earlier than 360 days prior to the index date
Interpretation Aim of the study
To predict disease activity for patients with RRMS using EMR
Notes
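The calibration assessment reported for this study (quintiles of predicted relapse probability versus the actual relapse rate) groups patients by predicted risk and compares the mean prediction with the observed event rate in each group. A minimal sketch follows, with y and p_hat as placeholder names for the observed 12‐month relapse indicator and the model's predicted probabilities; the toy data only roughly match the reported event rate and are not the Neuro Trans Data records.

```python
# Hedged sketch: calibration by quintiles of predicted probability, i.e. mean
# predicted relapse probability versus observed relapse rate in each fifth.
import numpy as np
import pandas as pd

def calibration_by_quintile(y, p_hat):
    """y: 0/1 relapse indicator; p_hat: predicted probabilities (placeholders)."""
    df = pd.DataFrame({"y": y, "p_hat": p_hat})
    df["quintile"] = pd.qcut(df["p_hat"], q=5, labels=False) + 1   # groups 1..5
    return (df.groupby("quintile")
              .agg(mean_predicted=("p_hat", "mean"),
                   observed_rate=("y", "mean"),
                   n=("y", "size")))

rng = np.random.default_rng(1)
p_hat = rng.beta(2, 9, size=4129)            # toy probabilities, event rate around 18%
y = rng.binomial(1, p_hat)
print(calibration_by_quintile(y, p_hat))     # well-calibrated toy data: columns track
```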

Pareto 2017.

General information Reason for awaiting classification
Conference abstract
Model name
Converter and nonconverter
Primary source
Abstract
Data source
Not reported
Study type
Development
Participants Inclusion criteria
Not reported
Exclusion criteria
Not reported
Recruitment
Consecutively
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
PRoNTo‐based imaging parameters from segmented grey matter masks
Outcome Outcome definition
Conversion to definite MS (McDonald 2010 (Polman 2011)): either MRI or clinical demonstration of dissemination in space and time
Timing of outcome measurement
Follow‐up for 3 years
Analysis Number of participants (number of events)
90 (45)
Modelling method
Support vector machine
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, LOOCV
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Sensitivity (converters) = 0.65
Specificity (nonconverters) = 0.63
Predictive values = 0.65 (converters) and 0.64 (nonconverters)
Predictors in the model
PRoNTo‐based imaging parameters from segmented grey matter masks
Interpretation Aim of the study
To test whether 3D‐T1‐weighted structural images in conjunction with the pattern recognition tool PRoNTo could differentiate between CIS patients that converted and CIS patients that did not convert to MS
Notes Auxiliary references
Schrouff J, Rosa MJ, Rondina JM, Marquand AF, Chu C, Ashburner J, et al. PRoNTo: pattern recognition for neuroimaging toolbox. Neuroinformatics 2013;11(3):319‐37.

Sharmin 2020.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Presentation
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
Not reported
Exclusion criteria
  • Not MS

  • < 4 visits record for a patient

  • Patient from centre with < 10 patients

  • Patients with missing data at follow‐up


Recruitment
MSBase registry
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
Unclear, 16.99% CIS, 66.63% RRMS, 7.46% SPMS, 7.34% PPMS, 1.57% PRMS
Diagnostic criteria
Mixed: McDonald 2005 (Polman 2005), McDonald 2010 (Polman 2011)
Treatment
Not reported
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Age (years); sex (female ref); MS course (CIS ref, RR, SP, PP, PR); disease duration (years); EDSS (0 to 5.5 ref, 6+); change in EDSS; recency of relapse (> 2 months ref, < 1 month, 1 month to 2 months); number of affected FSSs; separate predictor for worsening in each of pyramidal, cerebellar, brainstem, sensory, bowel‐bladder, visual, and cerebral systems; 2‐way interaction between disease duration and each of FSS worsening predictors; annualised visit density
Outcome Outcome definition
Disability (unclear): risk of 6‐month confirmed disability progression event being sustained over the long term
Timing of outcome measurement
Median (IQR): 9.48 years (6.02 years to 13.32 years)
Analysis Number of participants (number of events)
14,802 (the unit of analysis is the event, contributed by 8741 participants; the number of outcome events is not reported)
Modelling method
Survival (Cox)
Performance evaluation dataset
Development
Performance evaluation method
Random split
Calibration estimate
Not reported
Discrimination estimate
Harrell's c‐statistic = 0.89 (an illustrative sketch of this measure follows this study's entry)
Classification estimate
Not reported
Predictors in the model
Age, male, primary progressive, relapsing‐remitting, relapse in previous month, EDSS ≥ 6, EDSS change, number of affected FSSs, worsening pyramidal FSS, worsening in cerebellar FSS, worsening in brainstem FSS, worsening in sensory FSS, worsening in visual FSS, worsening in cerebral FSS, worsening in pyramidal FSS: disease duration, worsening in sensory FSS: disease duration, worsening in cerebral FSS: disease duration, (other: annualised visit density)
Interpretation Aim of the study
To identify those 6‐month confirmed disability progression events that are more likely to represent a long‐term disability worsening
Notes Auxiliary references
Giovannoni G, Comi G, Cook S, Rammohan K, Rieckmann P, Soelberg Sørensen P, et al. A placebo‐controlled trial of oral cladribine for relapsing multiple sclerosis. N Engl J Med 2010;362(5):416‐26.
NCT00641537. CLARITY extension study. https://ClinicalTrials.gov/show/NCT00641537 (first received 28 March 2008).
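Harrell's c‐statistic, reported above for the Cox model, is the probability that, of two comparable patients, the one with the higher predicted risk experiences the event sooner. A minimal sketch follows using the lifelines package and its bundled Rossi recidivism dataset purely as a stand‐in; the MSBase predictors and the authors' event‐level analysis are not reproduced here.

```python
# Hedged sketch: fit a Cox proportional hazards model and read off Harrell's
# c-statistic (concordance index) on the training data. Placeholder dataset.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                    # columns: week (time), arrest (event), covariates
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")

print("Harrell's c-statistic: %.3f" % cph.concordance_index_)
```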

Silva 2017.

General information Reason for awaiting classification
Conference abstract
Model name
MS‐COT
  • Relapse LASSO

  • Relapse stepwise

  • Disability


Primary source
Poster
Data source
Randomised trial participants
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 55 years

  • Diagnosed with RRMS

  • One or more confirmed relapses during the preceding year (or 2 or more confirmed relapses during the previous 2 years)

  • EDSS score of 0 to 5.5

  • No relapse or steroid treatment within 30 days before randomisation

  • Interferon β or glatiramer acetate therapy stopped at least 3 months before randomisation


Exclusion criteria
  • Active infection

  • Macular oedema

  • Diabetes mellitus

  • Immune suppression (drug‐ or disease‐induced) or clinically significant systemic disease


Recruitment
  • Participants in the FREEDOMS II, an RCT, from 117 academic and tertiary referral centres in 8 participating countries

  • Unclear subset: Australia, Austria, Belgium, Canada, Czech Republic, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Israel, Netherlands, Poland, Romania, Russia, Slovakia, South Africa, Sweden, Switzerland, Turkey, United Kingdom, United States


Age (years)
Mean 38.7
Sex (%F)
73.6
Disease duration (years)
9.3 (range 0 to 37)
Diagnosis
100% RRMS
Diagnostic criteria
McDonald 2005 (Polman 2005)
Treatment
  • At recruitment, not applicable

  • During follow‐up, 67.2% on fingolimod


Disease description
EDSS mean 2.4, previous year number of relapses 1.5
Recruitment period
2006 to 2011
Predictors Considered predictors
Treatment (fingolimod or placebo), T1 hypointense volume, gender, T2 lesion volume rate, age, NBV, number of relapses in the last year, individualised NBV, number of relapses in the last 2 years, number of Gd+ T1 lesions, EDSS, T2 lesion volume, duration of MS since the first symptom, total number of relapses since the first diagnosis, number of previous DMTs, progression index
Outcome Outcome definition
  • Relapse LASSO and relapse stepwise

    • Relapse: relapse verified by the examining neurologist within 7 days after the onset of symptoms, the symptoms had to be accompanied by an increase of at least half a point in the EDSS score, of 1 point in each of 2 EDSS functional‐system scores, or of 2 points in 1 EDSS functional‐system score, excluding scores for the bowel–bladder or cerebral functional systems

  • Disability

    • Disability (EDSS): 3‐/6‐month confirmed disability progression defined as an increase of 1 point in the EDSS score (or half a point if the baseline EDSS score was equal to 5.5), confirmed after 3/6 months, with an absence of relapse at the time of assessment and with all EDSS scores measured during that time meeting the criteria for disability progression


Timing of outcome measurement
Unclear which of the models is reported: at 1 year or at 2 years
Analysis Number of participants (number of events)
  • Relapse LASSO and relapse stepwise

    • 2355 (unclear, 831)

  • Disability

    • 2355 (unclear; 3‐month confirmed: 521, 6‐month confirmed: 343)


Modelling method
  • Relapse LASSO

    • Logistic regression (LASSO)

  • Relapse stepwise

    • Logistic regression

  • Disability

    • Generalised additive model (binary, nonlinear)


Performance evaluation dataset
Development
Performance evaluation method
Random split with CV within training for predictor ranking
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic
  • Relapse LASSO: 0.66

  • Relapse stepwise and disability: 0.67


Classification estimate
Not reported
Predictors in the model
Not reported
Interpretation Aim of the study
To develop an educational predictor tool based on machine learning techniques to help physicians identify clinical and imaging parameters that contribute to long‐term outcomes in patients with relapsing MS
Notes Auxiliary references
Calabresi PA, Radue EW, Goodin D, Jeffery D, Rammohan KW, Reder AT, et al. Safety and efficacy of fingolimod in patients with relapsing‐remitting multiple sclerosis (FREEDOMS II): a double‐blind, randomised, placebo‐controlled, phase 3 trial. Lancet Neurol 2014;13(6):545‐56.
Kappos L, Radue EW, O'Connor P, Polman C, Hohlfeld R, Calabresi P, et al. A placebo‐controlled trial of oral fingolimod in relapsing multiple sclerosis. N Engl J Med 2010;362(5):387‐401.
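To make the 'relapse LASSO' approach summarised above more concrete (an L1‐penalised logistic regression tuned by cross‐validation within the training part of a random split and assessed by the c‐statistic on the test part), the sketch below shows one common way to set this up. It is not the study's code; the file name, column names, tuning settings, and the assumption that all predictors are numeric are hypothetical.

```python
# Minimal sketch, NOT the study's implementation: LASSO logistic regression with
# a random split and cross-validation inside the training set for tuning.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score

df = pd.read_csv("trial_baseline.csv")               # hypothetical baseline data with a binary 'relapse' outcome
X, y = df.drop(columns=["relapse"]), df["relapse"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# LogisticRegressionCV chooses the penalty strength by internal cross-validation.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=20, cv=5, scoring="roc_auc"),
)
model.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"c-statistic (AUC) on the held-out split: {auc:.2f}")
```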

Tayyab 2020.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Presentation
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 60 years

  • Patients with onset of their first demyelinating symptoms within the previous 180 days

  • A minimum of 2 lesions that were at least 3 mm in diameter on a T2‐weighted (T2w) screening brain MRI (one had to be ovoid, periventricular, or infratentorial)

  • For participants over the age of 50, cerebrospinal fluid oligoclonal bands or spinal MRI changes typical of demyelination


Exclusion criteria
  • Better explanation for the event

  • Previous event reasonably attributable to demyelination

  • Meeting the 2005 McDonald criteria for MS (Polman 2005)


Recruitment
Participants in the placebo‐controlled randomised trial of minocycline, Canada
Age (years)
Mean 35.9 (onset)
Sex (%F)
69.0
Disease duration (years)
Median 0.23 (range 0.06 to 0.52)
Diagnosis
100% CIS
Diagnostic criteria
McDonald 2005 (Polman 2005)
Treatment
  • At recruitment, not applicable

  • During follow‐up, 50.7% on minocycline


Disease description
EDSS median (range): 1.5 (0 to 4.5)
Recruitment period
2009 to 2013
Predictors Considered predictors
Unclear whether it is the complete list: individual DGM nuclei volumes, minocycline vs placebo, CIS type (monofocal vs multifocal), NBV, sex, EDSS, variable for each location of initial CIS event: cerebrum, optic nerve, cerebellum, brainstem, spinal cord, brain parenchymal fraction
Outcome Outcome definition
Composite (includes relapse): new disease activity (clinical or MRI) within 2 years of a first clinical demyelinating event, defined by the McDonald 2005 criteria (Polman 2005) for conversion to definite MS
Timing of outcome measurement
2 years
Analysis Number of participants (number of events)
140 (60)
Modelling method
Random forest
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, 3‐fold
Calibration estimate
Not reported
Discrimination estimate
c‐Statistic = 0.76
Classification estimate
Accuracy = 0.821, sensitivity = 0.81, PPV = 0.87, F1 = 0.84
Predictors in the model
DGM volumes
Interpretation Aim of the study
To develop a machine learning model for predicting new disease activity (clinical or MRI) within 2 years of a first clinical demyelinating event, using baseline DGM volumes
Notes Auxiliary references
Metz LM, Li DKB, Traboulsee AL, Duquette P, Eliasziw M, Cerchiaro G, et al. Trial of minocycline in a clinically isolated syndrome of multiple sclerosis. N Engl J Med 2017;376(22):2122‐33.
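As an illustration of the design summarised in this entry (a random forest evaluated with 3‐fold cross‐validation and reported with accuracy, sensitivity, PPV, and F1), the sketch below shows a generic implementation. It is not the study's code; the file and column names standing in for the baseline DGM volumes and the 2‐year activity outcome are hypothetical.

```python
# Minimal sketch, NOT the study's implementation: random forest with 3-fold CV
# and the same kinds of classification measures as reported above.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

df = pd.read_csv("cis_baseline.csv")                 # hypothetical: DGM volumes plus outcome
X, y = df.drop(columns=["activity_2yr"]), df["activity_2yr"]

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
pred = cross_val_predict(RandomForestClassifier(n_estimators=500, random_state=42), X, y, cv=cv)

print("accuracy:", accuracy_score(y, pred))
print("sensitivity:", recall_score(y, pred))         # sensitivity = recall of the positive class
print("PPV:", precision_score(y, pred))              # PPV = precision of the positive class
print("F1:", f1_score(y, pred))
```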

Thiele 2009.

General information Reason for awaiting classification
Conference abstract
Model name
Model‐based approach
Primary source
Abstract
Data source
Registry, secondary
Study type
Development
Participants Inclusion criteria
RRMS
Exclusion criteria
Not reported
Recruitment
Danish MS register, Denmark
Age (years)
Mean 32.3 (at onset)
Sex (%F)
69
Disease duration (years)
Mean 5.6 (range 0 to 30)
Diagnosis
100% RRMS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
EDSS mean (range): 2.63 (0 to 7.5); number of attacks in the 24 months pre‐study, mean (range): 2.64 (1 to 10)
Recruitment period
1997 to 2001
Predictors Considered predictors
Sex, age at onset, disease duration, number of attacks in the 24 months prior to study, and baseline EDSS
Outcome Outcome definition
Relapse: annualised relapse rates
Timing of outcome measurement
Not reported
Analysis Number of participants (number of events)
1202 (continuous outcome)
Modelling method
Count‐data GLM (quasi‐Poisson, negative binomial, zero‐inflated Poisson)
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, LOOCV
Calibration estimate
Other: mean prediction error 0.53 to 0.54
Discrimination estimate
Not applicable
Classification estimate
Not applicable
Predictors in the model
Sex, age at onset, disease duration, number of attacks in the 24 months prior to study, baseline EDSS
Interpretation Aim of the study
To compare the performance of a matching‐based approach with that of statistical models for predicting annualised relapse rates in people with MS
Notes
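To make the count‐data modelling and leave‐one‐out evaluation summarised in this entry more concrete, the sketch below fits one of the named model families (the negative binomial variant, with a fixed dispersion parameter) and computes a leave‐one‐out prediction error. It is not the study's code; the software, the file and column names, and the interpretation of 'mean prediction error' as mean absolute error are assumptions made only for illustration.

```python
# Minimal sketch, NOT the study's implementation: negative binomial GLM for
# relapse counts with leave-one-out cross-validation.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("registry.csv")                     # hypothetical registry extract
X = sm.add_constant(df[["female", "age_onset", "duration", "attacks_24m", "edss"]])
y = df["relapse_count"]

errors = []
for i in range(len(df)):                             # leave-one-out cross-validation
    mask = np.ones(len(df), dtype=bool)
    mask[i] = False                                  # hold out observation i
    fit = sm.GLM(y[mask], X[mask], family=sm.families.NegativeBinomial()).fit()
    pred = fit.predict(X.iloc[[i]])
    errors.append(abs(y.iloc[i] - pred.iloc[0]))

print("mean absolute prediction error:", np.mean(errors))
```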

Tintoré 2015.

General information Reason for awaiting classification
It is unclear whether the study is longitudinal in nature.
Model name
Not reported
Primary source
Presentation
Data source
Cohort, primary
Study type
Development
Participants Inclusion criteria
  • Within 3 months of CIS

  • < 50 years


Exclusion criteria
Not reported
Recruitment
Spain
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% CIS
Diagnostic criteria
Not reported
Treatment
Not reported
Disease description
Not reported
Recruitment period
1996 to 2014
Predictors Considered predictors
Unclear whether the list is complete: gender, age (40 years to 49 years, 30 years to 39 years, 20 years to 29 years, 0 years to 19 years), optic neuritis, number of T2 lesions (0, 1 to 3, 4 to 9, ≥ 10), DMT before second attack, DMT after second attack, topography, CSF: OB, 12‐month number of T2 lesions, 12‐month Gd+, treatment, relapse during first year
Outcome Outcome definition
Unclear, composite: conversion to definite MS, EDSS ≥ 3
Timing of outcome measurement
Unclear; follow‐up every 12 months and at 5 years
Analysis Number of participants (number of events)
1059 (unclear, different numbers reported in abstract and presentation)
Modelling method
Multiple models: decision tree based on survival model (Cox)
Performance evaluation dataset
Development
Performance evaluation method
Apparent
Calibration estimate
Not reported
Discrimination estimate
Harrell's c‐statistic (for 12‐month model) CDMS 0.76, EDSS 0.75
Classification estimate
Not reported
Predictors in the model
At baseline: number of T2 lesions, oligoclonal bands, optic neuritis, sex, age; at first year: number of new T2 lesions, onset of DMD during first year, relapse during the first year
Interpretation Aim of the study
To develop a dynamic model for predicting long‐term prognosis
Notes

Tommasin 2019.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Poster
Data source
Not reported
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 70 years

  • Diagnosis of MS according to the McDonald criteria (2010 (Polman 2011), 2017 (Thompson 2018b))

  • Baseline clinical assessment and MRI examination not more than 1 month apart

  • Clinical follow‐up available after 2 years to 6 years from the MRI examination


Exclusion criteria
  • Relapses in the last 3 months

  • Contraindication to MRI


Recruitment
Not reported
Age (years)
Mean 38.3
Sex (%F)
76.2
Disease duration (years)
Not reported
Diagnosis
81% RRMS, 19% SPMS
Diagnostic criteria
Mixed: McDonald 2010, McDonald 2017
Treatment
30.5% first line, 29.5% second line, 40% none
Disease description
EDSS median 2.0 (range 0.0 to 7.5)
Recruitment period
Not reported
Predictors Considered predictors
3D T1 images (slices of the sagittal, axial, coronal projections)
Outcome Outcome definition
Disability (EDSS): 5‐year disease progression defined as 1.5‐point increase for patients with a baseline EDSS of 0, 1 point for scores from 1.0 to 5.0, and 0.5 points for scores equal to or higher than 5.5; confirmed at 6 months
Timing of outcome measurement
4 to 6 years
Analysis Number of participants (number of events)
105 (36)
Modelling method
Convolutional neural network
Performance evaluation dataset
Development
Performance evaluation method
Random split, 90% training, 10% validation
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Cutoff = 0.5, sensitivity and specificity reported for unclear selections
Predictors in the model
3D T1 images
Interpretation Aim of the study
To investigate the ability of deep learning models to predict which patients will have disability progression in the following 5 years and which will remain stable, based on 3D T1 MRI images acquired at 3T
Notes Auxiliary references
Rio J, Rovira A, Tintore M, Otero‐Romero S, Comabella M, Vidal‐Jordana A, et al. Disability progression markers over 6‐12 years in interferon‐beta‐treated multiple sclerosis patients. Mult Scler 2018;24(3):322‐30.
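The abstract summarised above does not report the network architecture, so the sketch below is only a generic small 2D convolutional network for binary progression labels from single T1 slices, with a 90/10 random split as in the entry. All shapes, hyperparameters, and the random stand‐in data are hypothetical; it illustrates the kind of pipeline involved rather than the authors' model.

```python
# Minimal sketch, NOT the study's model: generic 2D CNN on T1 slices with a
# 90/10 split and sensitivity/specificity at a 0.5 cutoff.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

images = torch.randn(105, 1, 128, 128)               # stand-in for 105 T1 slices
labels = torch.randint(0, 2, (105,)).float()         # stand-in progression labels
train_set, val_set = random_split(TensorDataset(images, labels), [94, 11])

net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 32 * 32, 1),                      # 128x128 input halved twice -> 32x32 feature maps
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(10):
    for xb, yb in DataLoader(train_set, batch_size=16, shuffle=True):
        opt.zero_grad()
        loss = loss_fn(net(xb).squeeze(1), yb)
        loss.backward()
        opt.step()

# Sensitivity and specificity at a 0.5 probability cutoff on the validation split.
with torch.no_grad():
    xb, yb = next(iter(DataLoader(val_set, batch_size=len(val_set))))
    pred = (torch.sigmoid(net(xb)).squeeze(1) > 0.5).float()
    tp = ((pred == 1) & (yb == 1)).sum()
    fn = ((pred == 0) & (yb == 1)).sum()
    tn = ((pred == 0) & (yb == 0)).sum()
    fp = ((pred == 1) & (yb == 0)).sum()
    print("sensitivity:", (tp / (tp + fn)).item(), "specificity:", (tn / (tn + fp)).item())
```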

Wahid 2018.

General information Reason for awaiting classification
Conference abstract
Model name
Not reported
Primary source
Abstract
Data source
Randomised trial participants, secondary
Study type
Development
Participants Inclusion criteria
  • Age between 18 years and 60 years

  • Available T1 and FLAIR baseline MRI scans and EDSS scores at 3 years

  • RRMS diagnosis

  • EDSS between 0 and 5.5

  • At least 2 exacerbations in the prior 3 years (one exacerbation may utilise the McDonald MRI criteria for dissemination in time)


Exclusion criteria
Not reported
Recruitment
Subset of participants in CombiRx RCT, USA
Age (years)
Not reported
Sex (%F)
Not reported
Disease duration (years)
Not reported
Diagnosis
100% RRMS
Diagnostic criteria
Mixed: Poser 1983, McDonald (undefined)
Treatment
Unclear number of participants on interferon beta, glatiramer acetate, or their combination
Disease description
Not reported
Recruitment period
Not reported
Predictors Considered predictors
Radiomics (shape, intensity, texture), age, sex, baseline EDSS, lesion volume
Outcome Outcome definition
Disability (EDSS): EDSS < 2 vs EDSS ≥ 2
Timing of outcome measurement
3 years
Analysis Number of participants (number of events)
33 (not reported)
Modelling method
Gradient boosting
Performance evaluation dataset
Development
Performance evaluation method
Cross‐validation, repeated
Calibration estimate
Not reported
Discrimination estimate
Not reported
Classification estimate
Accuracy = 0.867 (SD 0.024)
Predictors in the model
Radiomic shape, intensity, and texture measures
Interpretation Aim of the study
To evaluate the predictive performance of machine learning models constructed from MRI radiomic features at baseline to predict clinical outcomes at 3 years in RRMS
Notes Auxiliary references
Bhanushali MJ, Gustafson T, Powell S, Conwit RA, Wolinsky JS, Cutter GR, et al. Recruitment of participants to a multiple sclerosis trial: the CombiRx experience. Clinical Trials 2014;11(2):159‐66.
NCT00211887. Combination therapy in patients with relapsing‐remitting multiple sclerosis (MS) CombiRx. https://clinicaltrials.gov/ct2/show/NCT00211887 (first received 21 September 2005).
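As an illustration of the design summarised in the Wahid 2018 entry (gradient boosting on baseline radiomic and clinical features, evaluated by repeated cross‐validation and summarised as mean accuracy with its SD), the sketch below shows a generic implementation. It is not the study's code; the file and column names are hypothetical.

```python
# Minimal sketch, NOT the study's implementation: gradient boosting with
# repeated stratified cross-validation, summarised as mean accuracy and SD.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

df = pd.read_csv("radiomics.csv")                    # hypothetical radiomic + clinical features
X, y = df.drop(columns=["edss_ge_2_at_3yr"]), df["edss_ge_2_at_3yr"]

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(GradientBoostingClassifier(random_state=0), X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} (SD {scores.std():.3f})")
```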

3T: 3 Tesla
AAL: automated anatomical labelling
ANFIS: adaptive‐neuro‐fuzzy‐inference system
ANN: artificial neural network
BRACE: Betaseron (interferon beta‐1b), Rebif (interferon beta‐1a), Avonex (interferon beta‐1a), Copaxone (glatiramer acetate), and Extavia (interferon beta‐1b)
CDMS: clinically definite multiple sclerosis
CIS: clinically isolated syndrome
CSF: cerebrospinal fluid
CV: cross‐validation
DGM: deep grey matter
DMD: disease‐modifying drug
DMT: disease‐modifying therapy
DTI: diffusion tensor imaging
EDSS: Expanded Disability Status Scale
EMR: electronic medical records
EP: evoked potential
FLAIR: fluid‐attenuated inversion recovery
FSS: Functional Systems Score
Gd: gadolinium
IQR: interquartile range
LASSO: least absolute shrinkage and selection operator
LOOCV: leave‐one‐out cross‐validation
MRI: magnetic resonance imaging
MRS: magnetic resonance spectroscopy
MS: multiple sclerosis
MS‐COT: multiple sclerosis care optimisation tool
MTR: magnetisation transfer ratio
NARCOMS: North American Research Consortium on Multiple Sclerosis
NBV: normalised brain volume
NEDA3: no evidence of disease activity 3
NPV: negative predictive value
OB: oligoclonal bands
OCT: optical coherence tomography
P‐MSSS: patient‐derived MS Severity Score
PBMC: peripheral blood mononuclear cell
PDDS: patient‐determined disease steps
PPMS: primary progressive MS
PPV: positive predictive value
PRMS: progressive‐relapsing multiple sclerosis
PRoNTo: Pattern Recognition for Neuroimaging Toolbox
RCT: randomised controlled trial
RNA: ribonucleic acid
RRMS: relapsing–remitting multiple sclerosis
rs‐fMRI: resting state functional magnetic resonance imaging
SD: standard deviation
SPMS: secondary progressive multiple sclerosis
SVM: support vector machine

Differences between protocol and review

Objectives

  • We relocated the details on the investigation of sources of heterogeneity between studies from the Objectives to the Methods for conciseness and readability.

Criteria for considering studies for this review

  • In 'Types of studies', the eligibility criterion of aiming to develop or validate a prognostic model was already present at the protocol stage. We further operationalised its implementation in the review text. We also clarified that the statistical method used to develop the prognostic model was not a criterion for selection, but that studies on prognostic factors or treatment response prediction were excluded. The possible data sources for prognostic model studies and what is meant by validation were also defined in the review text rather than the protocol.

  • During the review, we came across eligible prognostic model validation studies of models whose development studies would not meet the eligibility criteria outlined in the protocol. In order to have the necessary details on these models, we added a new eligibility criterion to include studies that developed models which were validated in other eligible prognostic prediction studies.

  • In 'Targeted population', we clarified that we included prognostic model studies in people with MS regardless of the MS subtyping they reported. For transparency, we also reported that we considered an episode of optic neuritis as a clinically isolated syndrome and thus studies on people with this condition were eligible.

  • In 'Types of outcomes', we clarified that the data type of the outcome was not a criterion for selection. We also further detailed what was considered to constitute each of the five outcomes (four clinical outcome categories plus their composite) as defined in the protocol, by giving no evidence of disease activity as an example of the composite outcome and by clarifying that cognitive disability fitted into one of those categories, whereas fatigue, depression, or falls did not.

Search

  • As per the recently published PRISMA statement (Page 2021), we gave details on the platforms used to search the databases and the studies we used during the validation of the search.

  • Originally we had planned to perform backward citation tracking by handsearching the references of related studies. While using Web of Science for the forward search, we realised that it offers similar functionality for the backward search. We decided to use this convenient functionality because it allowed not only deduplication but also simultaneous screening of the titles and abstracts of the references, which would not have been possible with handsearching.

Selection of studies

  • We reported the details of how the pilot screening was conducted, which were absent in the protocol text.

  • During screening, we additionally searched the Internet for further information on, or contacted the authors of, studies that could not be included or excluded based on the reported information, including all conference abstracts.

  • At the protocol stage, we had not planned how to proceed with eligible conference abstracts without any full‐text report. During the review, it became clear that the information contained in an abstract was not sufficient for selection or assessment of risk of bias. Hence, we decided to present the data extracted from the conference abstracts without a full‐text report in Characteristics of studies awaiting classification.

  • How we were going to screen non‐English abstracts (by using online translators) and full‐texts (with support from native speakers) was missing from the protocol and is clarified in the review text.

  • We reported the study selection based on the flow‐chart of the recently updated PRISMA statement (Page 2021), rather than the earlier PRISMA statement (Moher 2009) proposed in the protocol.

  • For transparency, we elaborated on the details of how we operationalised and interpreted study eligibility criteria in a new subsection titled 'Details regarding selection of studies' of the review text.

Data extraction and management

  • During the review, we came across multiple reports from a single study that sometimes contained conflicting information. Our prioritisation in such cases was not defined in the protocol but is defined in the review text.

  • Due to the range of studies we came across, there were minor changes to the extracted data items during the review, e.g. adding a tuning parameter item in order to collect important details related to models developed with machine learning (ML) or using the terms primary/secondary data use rather than prospective/retrospective due to the confusion on and misuse of the latter in the literature. These are reflected in the list of items in this section and elaborated in the Appendices.

Assessment of reporting deficiencies

  • In the protocol, this section came after 'Dealing with missing data'. For a better flow of the text, it is now reported after 'Data extraction and management'.

  • In the protocol we had only mentioned that TRIPOD was going to be used for the assessment of reporting. In the review text we gave the details of our operationalisation based on the domains and items we used for this task.

Assessment of risk of bias in included studies

  • In this section of the protocol we had referred to PROBAST as the risk of bias and applicability assessment tool for prognostic model studies and had briefly summarised its domains. Due to the importance of the risk of bias assessment, the challenges encountered by studies in people with MS, and the limited applicability of the current tool to models developed using machine learning, we had to interpret the items in PROBAST. For transparency, in the review we elaborated on our interpretations and assessment of the risk of bias and applicability in the included analyses.

Measures of association or predictive performance measures to be extracted

  • In the protocol we had proposed describing the adjusted effect measures of prognostic factors in models developed over time. Although we extracted data on the effect measures and their uncertainties from studies that could and did report these, comparing them was not possible due to the variety of the predictors considered, differences in their definitions, and the considerable number of included ML methods for which traditional effect measures may not be applicable.

  • For clarity, we operationalised the classification measures and validation categories we collected.

Dealing with missing data

  • In the protocol we had proposed to contact the authors for missing information needed for quantitative data synthesis or risk of bias assessment. In the review we reported that we also contacted them for unclear or missing information needed not only for the aforementioned purposes but also for study eligibility and basic study description.

  • In the protocol we had proposed applying methods to derive missing performance measures (the c‐statistic for discrimination and the O:E ratio for calibration) and their precision from the reported information. The data reported in the studies did not allow for calculation of missing c‐statistics or missing calibration measures, specifically O:E ratios. Thus, we changed this in the review to describe only the method we used to derive the missing precision of a reported c‐statistic; a sketch of one common approach to such a derivation follows this list.
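As a concrete illustration of deriving the precision of a reported c‐statistic, the sketch below uses the Hanley and McNeil (1982) approximation, which needs only the c‐statistic and the numbers of events and non‐events. This is one common approach, shown for illustration, and is not necessarily the exact method applied in the review; the example values are taken from the Tayyab 2020 entry above (c = 0.76, 60 events among 140 participants).

```python
# Minimal sketch: Hanley-McNeil approximation for the SE of a reported c-statistic.
# Illustrative only; not necessarily the review's exact derivation method.
import math

def c_statistic_se(auc: float, n_events: int, n_nonevents: int) -> float:
    """Hanley-McNeil approximation to the standard error of a c-statistic (AUC)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_events - 1) * (q1 - auc**2)
           + (n_nonevents - 1) * (q2 - auc**2)) / (n_events * n_nonevents)
    return math.sqrt(var)

se = c_statistic_se(0.76, n_events=60, n_nonevents=80)   # values from the Tayyab 2020 entry
print(f"SE = {se:.3f}, approximate 95% CI = {0.76 - 1.96 * se:.2f} to {0.76 + 1.96 * se:.2f}")
```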

Data synthesis

  • We had intended to perform a meta‐analysis for models with at least three external validations and had described the methods under the subheading 'Data synthesis and meta‐analysis approaches' in the protocol. However, no single model had at least three independent external validation studies outside its development study, so we decided against performing a meta‐analysis and removed this subheading from the review text. Instead, we added a subheading called 'Synthesis' to describe how we summarised the findings in this review; a sketch of the kind of pooling that had been planned follows this list.

  • Because there was no meta‐analysis, we did not perform any sensitivity analysis and removed the subsection 'Sensitivity analysis'.
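Had a model accumulated at least three external validations reporting a c‐statistic with its standard error, the planned pooling would typically have combined the estimates on the logit scale with a random‐effects model. The sketch below shows a generic DerSimonian‐Laird implementation of that idea; the three c‐statistic and standard‐error pairs are invented for illustration, and no meta‐analysis was actually performed in this review.

```python
# Minimal sketch: DerSimonian-Laird random-effects pooling of c-statistics on
# the logit scale. Inputs are hypothetical; illustrative only.
import math

def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def pool_c_statistics(c_and_se):
    """Pool (c, se) pairs on the logit scale; return pooled c, 95% CI and tau^2."""
    y = [math.log(c / (1 - c)) for c, _ in c_and_se]           # logit-transformed c-statistics
    v = [(se / (c * (1 - c))) ** 2 for c, se in c_and_se]      # delta-method variances on that scale
    w = [1 / vi for vi in v]                                   # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    k = len(y)
    tau2 = max(0.0, (q - (k - 1)) / (sum(w) - sum(wi**2 for wi in w) / sum(w)))
    w_re = [1 / (vi + tau2) for vi in v]                       # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - 1.96 * se_re), inv_logit(y_re + 1.96 * se_re), tau2

pooled, lo, hi, tau2 = pool_c_statistics([(0.72, 0.03), (0.68, 0.04), (0.75, 0.05)])
print(f"pooled c = {pooled:.2f} (95% CI {lo:.2f} to {hi:.2f}), tau^2 = {tau2:.3f}")
```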

Investigation of sources of heterogeneity between studies

  • In the protocol there were two subheadings giving details on the assessment of heterogeneity: 'Assessment of heterogeneity' under 'Data collection' and 'Subgroup analysis and investigation of heterogeneity' under 'Data synthesis'. We had planned to report heterogeneity measures from the meta‐analysis and to perform meta‐regression for different models with the same outcome when at least 10 models for that outcome were identified. Due to the large variability in outcome definitions and poor reporting of performance measures, we were not able to perform this meta‐regression and could only describe the heterogeneity qualitatively. Hence, we reduced the space allocated to this topic to one subsection under 'Data synthesis'.

Terms used for reporting and synthesis

  • For clarity, we added this section to define the terms we used in reporting the review.

Contributions of authors

Task: authors responsible
Draft the protocol: BIO, KAR, JH, MG, JB, UH, UM
Develop and run the search strategy: MG, KAR, BIO, ZA
Obtain copies of studies: AA, BIO, KAR, MG
Select which studies to include: BIO, KAR, AA, ZA, AG
Provide consultation on which studies to include: UH, JH, JB, UM
Extract data from the studies: KAR, BIO, AA, AG
Provide consultation on data extraction: HS, UH, UM, JB, JH
Assess risk of bias: KAR, BIO, AA, AG
Provide consultation on risk of bias assessment: HS, UH, UM, JB, JH
Enter data into RevMan 5: BIO, KAR, ZA, AG
Carry out the analysis: KAR, BIO, ZA
Interpret the analysis: KAR, BIO, JB, UH, UM, HS, JH
Draft the final review: BIO, KAR, JB, JH, UH, UM, MG, HS, AG, ZA, AA
Update the review: UM, UH

Sources of support

Internal sources

  • DIFUTURE Project at Ludwig‐Maximilians‐Universität München, Germany

    DIFUTURE is funded by the German Federal Ministry of Education and Research under 01ZZ1804B and 01ZZ1804C.

  • Clinical Research Priority Program (CRPP), University of Zurich, Switzerland

    The CRPP funded the project PrecisionMS: Implementing Precision Medicine in Multiple Sclerosis.

  • Privatdozenten‐Stiftung, University of Zurich, Switzerland

    Privatdozenten‐Stiftung provided partial financial support for project costs including electronic search consulting and research assistant help.

External sources

  • No sources of support provided

Declarations of interest

  • JH reports a grant for OCT research from the Friedrich‐Baur‐Stiftung and Merck, personal fees and non‐financial support from Alexion, Bayer HealthCare Pharmaceuticals, Biogen, Celgene, F. Hoffman‐La Roche, Janssen Biotech, Merck, Novartis, and Sanofi Genzyme and non‐financial support from the Guthy‐Jackson Charitable Foundation, all outside the submitted work.

  • UH received financial compensation once for a lecture organised by CSL Behring, after submission of the manuscript, and outside the submitted work.

  • BIO has provided consultancy to Roche once on a topic outside the submitted work.

  • KAR, JB, MG, AA, ZA, AG, HS, UM: nothing to declare

These authors should be considered joint first author

These authors contributed equally to this work


References

References to studies included in this review

Aghdam 2021 {published data only}

  1. Abri Aghdam K, Aghajani A, Kanani F, Soltan Sanjari M, Chaibakhsh S, Shirvaniyan F, et al. A novel decision tree approach to predict the probability of conversion to multiple sclerosis in Iranian patients with optic neuritis. Multiple Sclerosis and Related Disorders 2021;47:102658. [DOI] [PubMed] [Google Scholar]

Agosta 2006 {published data only}

  1. Agosta F, Rovaris M, Pagani E, Sormani MP, Comi G, Filippi M. Magnetization transfer MRI metrics predict the accumulation of disability 8 years later in patients with multiple sclerosis. Brain 2006;129(Pt 10):2620-7. [DOI: 10.1093/brain/awl208] [DOI] [PubMed] [Google Scholar]

Ahuja 2021 {published data only}

  1. Ahuja Y, Kim N, Liang L, Cai T, Dahal K, Seyok T, et al. Leveraging electronic health records data to predict multiple sclerosis disease activity. Annals of Clinical and Translational Neurology 2021;8(4):800-10. [DOI] [PMC free article] [PubMed] [Google Scholar]

Bejarano 2011 {published data only}

  1. Bejarano B, Bianco M, Gonzalez-Moron D, Sepulcre J, Goni J, Arcocha J, et al. Computational classifiers for predicting the short-term course of multiple sclerosis. BMC Neurology 2011;11:67. [DOI: 10.1186/1471-2377-11-67] [DOI] [PMC free article] [PubMed] [Google Scholar]

Bendfeldt 2019 {published data only}

  1. Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. MRI-based prediction of conversion from clinically isolated syndrome to clinically definite multiple sclerosis using SVM and lesion geometry. Brain Imaging and Behavior 2019;13(5):1361-74. [DOI: 10.1007/s11682-018-9942-9] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. Predicting conversion to clinically definite multiple sclerosis using machine learning on the basis of cerebral grey matter segmentations. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/116222.
  3. Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. Predicting conversion to clinically definite multiple sclerosis using machine learning on the basis of cerebral grey matter segmentations. Multiple Sclerosis Journal 2015;23(Suppl 11):498-9. [Google Scholar]
  4. Bendfeldt K, Taschler B, Gaetano L, Madoerin P, Kuster P, Mueller-Lenke N, et al. MRI-based prediction of conversion from clinically isolated syndrome to clinically definite multiple sclerosis using SVM and lesion geometry. Brain Imaging and Behavior 2018;13(5):1361-74. [DOI] [PMC free article] [PubMed] [Google Scholar]

Bergamaschi 2001 {published data only}

  1. Bergamaschi R, Berzuini C, Romani A, Cosi V. Predicting secondary progression in relapsing-remitting multiple sclerosis: a bayesian analysis. Journal of the Neurological Sciences 2001;189(1-2):13-21. [DOI] [PubMed] [Google Scholar]

Bergamaschi 2007 {published data only}

  1. Bergamaschi R, Quaglini S, Trojano M, Amato MP, Tavazzi E, Paolicelli D, et al. Early prediction of the long term evolution of multiple sclerosis: the bayesian risk estimate for multiple sclerosis (BREMS) score. Journal of Neurology, Neurosurgery and Psychiatry 2007;78(7):757-9. [DOI: 10.1136/jnnp.2006.107052] [DOI] [PMC free article] [PubMed] [Google Scholar]

Bergamaschi 2015 {published data only}

  1. Bergamaschi R, Montomoli C, Mallucci G, Lugaresi A, Izquierdo G, Grand'Maison F, et al. BREMSO: a simple score to predict early the natural course of multiple sclerosis. European Journal of Neurology 2015;22(6):981-9. [DOI: 10.1111/ene.12696] [DOI] [PubMed] [Google Scholar]
  2. Bergamaschi R, Montomoli C, Mallucci G. Bayesian risk estimate for multiple sclerosis at onset (BREMSO): a simple clinical score for the early prediction of multiple sclerosis long-term evolution. In: 29th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2013 October 2-5; Copenhagen (Denmark). ECTRIMS, 2013. Available at onlinelibrary.ectrims-congress.eu/ectrims/2013/copenhagen/34238.
  3. Bergamaschi R, Montomoli C, Mallucci G. Bayesian risk estimate for multiple sclerosis at onset (BREMSO): a simple clinical score for the early prediction of multiple sclerosis long-term evolution. Multiple Sclerosis Journal 2013;19(Suppl 1):338. [Google Scholar]

Borras 2016 {published data only}

  1. Borras E, Canto E, Choi M, Maria Villar L, Alvarez-Cermeno JC, Chiva C, et al. Protein-based classifier to predict conversion from clinically isolated syndrome to multiple sclerosis. Molecular and Cellular Proteomics 2016;15(1):318-28. [DOI: 10.1074/mcp.M115.053256] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Comabella M, Borràs E, Cantó E, Choi M, Villar LM, Álvarez-Cermeño JC, et al. Protein-based biomarker predicts conversion from clinically isolated syndrome to multiple sclerosis. Multiple Sclerosis Journal 2015;21(Suppl 11):634. [Google Scholar]

Brichetto 2020 {published data only}

  1. Brichetto G, Monti Bragadin M, Fiorini S, Battaglia MA, Konrad G, Ponzio M, et al. The hidden information in patient-reported outcomes and clinician-assessed outcomes: multiple sclerosis as a proof of concept of a machine learning approach. Journal of the Neurological Sciences 2020;41(2):459-62. [DOI: 10.1007/s10072-019-04093-x] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Tacchino A, Fiorini S, Ponzio M, Barla A, Verri A, Battaglia MA, et al. Multiple sclerosis disease course prediction: a machine learning model based on patient reported and clinician assessed outcomes. In: 7th Joint European Committee for Treatment and Research in Multiple Sclerosis-Americas Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS-ACTRIMS); 2017 October 25-28; Paris (France). ECTRIMS, 2017. Available at onlinelibrary.ectrims-congress.eu/ectrims/2017/ACTRIMS-ECTRIMS2017/202553.
  3. Tacchino A, Fiorini S, Ponzio M, Barla A, Verri A, Battaglia MA, et al. Multiple sclerosis disease course prediction: a machine learning model based on patient reported and clinician assessed outcomes. Multiple Sclerosis Journal 2017;23(Suppl 3):58-9. [Google Scholar]

Calabrese 2013 {published data only}

  1. Calabrese M, Poretto V, Favaretto A, Seppi D, Alessio S, Rinaldi F, et al. The grey matter basis of disability progression in multiple sclerosis. Multiple Sclerosis Journal 2012;18(Suppl 4):121-2. [Google Scholar]
  2. Calabrese M, Romualdi C, Poretto V, Favaretto A, Morra A, Rinaldi F, et al. The changing clinical course of multiple sclerosis: a matter of gray matter. Annals of Neurology 2013;74(1):76-83. [DOI: 10.1002/ana.23882] [DOI] [PubMed] [Google Scholar]

De Brouwer 2021 {published data only}

  1. De Brouwer E, Becker T, Moreau Y, Havrdova EK, Trojano M, Eichau S, et al. Longitudinal machine learning modeling of MS patient trajectories improves predictions of disability progression. Computer Methods and Programs in Biomedicine 2021;208:106180. [DOI] [PubMed] [Google Scholar]
  2. De Brouwer E, Peeters L, Becker T, Altintas A, Soysal A, Van Wijmeersch B, et al. Introducing machine learning for full MS patient trajectories improves predictions for disability score progression. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279466.
  3. De Brouwer E, Peeters L, Becker T, Altintas A, Soysal A, Van Wijmeersch B, et al. Introducing machine learning for full MS patient trajectories improves predictions for disability score progression. Multiple Sclerosis Journal 2019;25(Suppl 2):63-5. [Google Scholar]

de Groot 2009 {published data only}

  1. Groot V, Beckerman H, Uitdehaag BM, Hintzen RQ, Minneboo A, Heymans MW, et al. Physical and cognitive functioning after 3 years can be predicted using information from the diagnostic process in recently diagnosed multiple sclerosis. Archives of Physical Medicine and Rehabilitation 2009;90(9):1478-88. [DOI: 10.1016/j.apmr.2009.03.018] [DOI] [PubMed] [Google Scholar]

Gout 2011 {published data only}

  1. Gout O, Bouchareine A, Moulignier A, Deschamps R, Papeix C, Gorochov G, et al. Prognostic value of cerebrospinal fluid analysis at the time of a first demyelinating event. Multiple Sclerosis Journal 2011;17(2):164-72. [DOI: 10.1177/1352458510385506] [DOI] [PubMed] [Google Scholar]

Gurevich 2009 {published data only}

  1. Gurevich M, Tuller T, Rubinstein U, Or-Bach R, Achiron A. Prediction of acute multiple sclerosis relapses by transcription levels of peripheral blood cells. BMC Medical Genomics 2009;2:46. [DOI: 10.1186/1755-8794-2-46] [DOI] [PMC free article] [PubMed] [Google Scholar]

Kosa 2022 {published data only}

  1. Barbour C, Kosa P, Greenwood M, Bielekova B. Constructing a molecular model of disease severity in multiple sclerosis. Neurology 2019;92(Suppl 15):P3.2-006. [Google Scholar]
  2. Barbour C, Kosa P, Varosanec M, Greenwood M, Bielekova B. Molecular models of multiple sclerosis severity identify heterogeneity of pathogenic mechanisms. medRxiv 2020 May 22 [Epub ahead of print]. [DOI: ] [DOI] [PMC free article] [PubMed]
  3. Barbour CR, Kosa P, Greenwood M, Bielekova B. Constructing a molecular model of disease severity in multiple sclerosis. Multiple Sclerosis Journal 2019;25:23. [Google Scholar]
  4. Kosa P, Barbour C, Varosanec M, Wichman A, Sandford M, Greenwood M, et al. Molecular models of multiple sclerosis severity identify heterogeneity of pathogenic mechanisms. Nature Communications 2022;13(1):7670. [DOI: 10.1038/s41467-022-35357-4] [DOI] [PMC free article] [PubMed] [Google Scholar]

Kuceyeski 2018 {published data only}

  1. Kuceyeski A, Monohan E, Morris E, Fujimoto K, Vargas W, Gauthier SA. Baseline biomarkers of connectome disruption and atrophy predict future processing speed in early multiple sclerosis. NeuroImage: Clinical 2018;19:417-24. [DOI: 10.1016/j.nicl.2018.05.003] [DOI] [PMC free article] [PubMed] [Google Scholar]

Law 2019 {published data only}

  1. Law MT, Traboulsee AL, Li DK, Carruthers RL, Freedman MS, Kolind SH, et al. Machine learning in secondary progressive multiple sclerosis: an improved predictive model for short-term disability progression. Multiple Sclerosis Journal Experimental Translational and Clinical 2019;5(4):2055217319885983. [DOI: 10.1177/2055217319885983] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Law MT, Traboulsee AL, Li DK, Carruthers RL, Freedman MS, Kolind SH, et al. Machine learning outperforms linear regression for predicting disability progression in SPMS. In: 34th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2018 October 10-12; Berlin (Germany). ECTRIMS, 2018. Available at onlinelibrary.ectrims-congress.eu/ectrims/2018/ectrims-2018/228174.
  3. Law MT, Traboulsee AL, Li DK, Carruthers RL, Freedman MS, Kolind SH, et al. Machine learning outperforms linear regression for predicting disability progression in SPMS. Multiple Sclerosis Journal 2018;24(Suppl 2):1025. [Google Scholar]

Lejeune 2021 {published data only}

  1. Lejeune F, Chatton A, Laplaud D, Wiertlewski S, Edan G, Le Page E, et al. SMILE: a predictive model for scoring the severity of relapses in multIple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):198. [DOI] [PubMed] [Google Scholar]
  2. Lejeune F, Chatton A, Laplaud DA, Le Page E, Wiertlewski S, Edan G, et al. SMILE: a predictive model for scoring the severity of relapses in multIple sclerosis. Journal of Neurology 2021;268(2):669-79. [DOI] [PubMed] [Google Scholar]
  3. Lejeune F, Chatton A, Laplaud DA, Wiertlewski S, Edan G, Lepage E, et al. SCOPOUSEP: a predictive model for scoring the severity of relapses in multiple sclerosis. In: 34th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2018 October 10-12; Berlin (Germany). ECTRIMS, 2018. Available at onlinelibrary.ectrims-congress.eu/ectrims/2018/ectrims-2018/229235.
  4. Lejeune F, Chatton A, Laplaud DA, Wiertlewski S, Edan G, Lepage E, et al. SCOPOUSEP: a predictive model for scoring the severity of relapses in multiple sclerosis. Multiple Sclerosis Journal 2018;24(Suppl 2):791-2. [DOI] [PubMed] [Google Scholar]

Malpas 2020 {published data only}

  1. Malpas CB, Manouchehrinia A, Sharmin S, Roos I, Horakova D, Havrdova EK, et al. Aggressive form of multiple sclerosis can be predicted early after disease onset. Multiple Sclerosis Journal 2019;25(Suppl 2):605-7. [Google Scholar]
  2. Malpas CB, Manouchehrinia A, Sharmin S, Roos I, Horakova D, Havrdova EK, et al. Early clinical markers of aggressive multiple sclerosis. Brain 2020;143(5):1400-13. [DOI: 10.1093/brain/awaa081] [DOI] [PubMed] [Google Scholar]

Mandrioli 2008 {published data only}

  1. Mandrioli J, Sola P, Bedin R, Gambini M, Merelli E. A multifactorial prognostic index in multiple sclerosis. Cerebrospinal fluid IgM oligoclonal bands and clinical features to predict the evolution of the disease. Journal of Neurology 2008;255(7):1023-31. [DOI: 10.1007/s00415-008-0827-5] [DOI] [PubMed] [Google Scholar]

Manouchehrinia 2019 {published data only}

  1. Manouchehrinia A, Zhu F, Piani-Meier D, Lange M, Silva DG, Carruthers R, et al. Predicting risk of secondary progression in multiple sclerosis: a nomogram. Multiple Sclerosis Journal 2019;25(8):1102-12. [DOI: 10.1177/1352458518783667] [DOI] [PubMed] [Google Scholar]

Margaritella 2012 {published data only}

  1. Margaritella N, Mendozzi L, Garegnani M, Colicino E, Gilardi E, Deleonardis L, et al. Sensory evoked potentials to predict short-term progression of disability in multiple sclerosis. Journal of the Neurological Sciences 2012;33(4):887-92. [DOI: 10.1007/s10072-011-0862-3] [DOI] [PubMed] [Google Scholar]

Martinelli 2017 {published data only}

  1. Martinelli V, Dalla Costa G, Messina MJ, Di Maggio G, Moiola L, Rodegher M, et al. Use of multiple biomarkers to improve the prediction of multiple sclerosis in patients with clinically isolated syndromes. Journal of the Neurological Sciences 2015;23(Suppl 11):370-1. [Google Scholar]
  2. Martinelli V, Dalla Costa G, Messina MJ, Di Maggio G, Sangalli F, Moiola L, et al. Multiple biomarkers improve the prediction of multiple sclerosis in clinically isolated syndromes. Acta Neurologica Scandinavica 2017;136(5):454-61. [DOI: 10.1111/ane.12761] [DOI] [PubMed] [Google Scholar]

Misicka 2020 {published data only}

  1. Misicka E, Sept C, Briggs FBS. Predicting onset of secondary-progressive multiple sclerosis using genetic and non-genetic factors. Journal of Neurology 2020;267(8):2328-39. [DOI: 10.1007/s00415-020-09850-z] [DOI] [PubMed] [Google Scholar]

Montolio 2021 {published data only}

  1. Montolio A, Martin-Gallego A, Cegonino J, Orduna E, Vilades E, Garcia-Martin E, et al. Machine learning in diagnosis and disability prediction of multiple sclerosis using optical coherence tomography. Computers in Biology and Medicine 2021;133:104416. [DOI] [PubMed] [Google Scholar]

Olesen 2019 {published data only}

  1. Olesen MN, Soelberg K, Debrabant B, Nilsson AC, Lillevang ST, Grauslund J, et al. Cerebrospinal fluid biomarkers for predicting development of multiple sclerosis in acute optic neuritis: a population-based prospective cohort study. Journal of Neuroinflammation 2019;16(1):59. [DOI: 10.1186/s12974-019-1440-5] [DOI] [PMC free article] [PubMed] [Google Scholar]

Oprea 2020 {published data only}

  1. Oprea S, Văleanu A, Negreș S. The development and validation of a disability and outcome prediction algorithm in multiple sclerosis patients. Farmacia 2020;68(6):1147-54. [Google Scholar]

Pellegrini 2019 {published data only}

  1. Copetti M, Fontana A, Freudensprung U, De Moor C, Hyde R, Bovis F, et al. Predicting MS disease progression remains a significant challenge: results from advanced statistical models of RCT placebo arms. In: 7th Joint European Committee for Treatment and Research in Multiple Sclerosis-Americas Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS-ACTRIMS); 2017 October 25-28; Paris (France). ECTRIMS, 2017. Available at onlinelibrary.ectrims-congress.eu/ectrims/2017/ACTRIMS-ECTRIMS2017/199979.
  2. Copetti M, Fontana A, Freudensprung U, De Moor C, Hyde R, Bovis F, et al. Predicting MS disease progression remains a significant challenge: results from advanced statistical models of RCT placebo arms. Multiple Sclerosis Journal 2017;23(Suppl 3):113. [Google Scholar]
  3. Pellegrini F, Copetti M, Sormani M P, Bovis F, Moor C, Debray TP, et al. Predicting disability progression in multiple sclerosis: insights from advanced statistical modeling. Multiple Sclerosis Journal 2019;26(14):1828-36. [DOI: 10.1177/1352458519887343] [DOI] [PubMed] [Google Scholar]

Pinto 2020 {published data only}

  1. Pinto MF, Oliveira H, Batista S, Cruz L, Pinto M, Correia I, et al. Prediction of disease progression and outcomes in multiple sclerosis with machine learning. Scientific Reports 2020;10(1):21038. [DOI: 10.1038/s41598-020-78212-6] [DOI] [PMC free article] [PubMed] [Google Scholar]

Pisani 2021 {published data only}

  1. Pisani AI, Scalfari A, Crescenzo F, Romualdi C, Calabrese M. A novel prognostic score to assess the risk of progression in relapsing-remitting multiple sclerosis patients. European Journal of Neurology 2021;28(8):2503-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Pisani AI, Scalfari A, Romualdi C, Calabrese M. The progressive multiple sclerosis score: a prognostic assistant tool in multiple sclerosis disease. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279464.
  3. Pisani AI, Scalfari A, Romualdi C, Calabrese M. The progressive multiple sclerosis score: a prognostic assistant tool in multiple sclerosis disease. Multiple Sclerosis Journal 2019;25(Suppl 2):62. [Google Scholar]

Roca 2020 {published data only}

  1. Roca P, Attye A, Colas L, Tucholka A, Rubini P, Cackowski S, et al. Artificial intelligence to predict clinical disability in patients with multiple sclerosis using FLAIR MRI. Diagnostic and Interventional Imaging 2020;101(12):795-802. [DOI: 10.1016/j.diii.2020.05.009] [DOI] [PubMed] [Google Scholar]

Rocca 2017 {published data only}

  1. Filippi M, Rovaris MG, Sormani MP, Caputo D, Ghezzi A, Montanari E, et al. Earlier prognostication in primary progressive multiple sclerosis using MRI: a 15-year longitudinal study. European Journal of Neurology 2017;24(Suppl 1):43. [Google Scholar]
  2. Rocca MA, Sormani MP, Rovaris M, Caputo D, Ghezzi A, Montanari E, et al. Anticipation of long-term disability progression in PPMS using MRI: a 15-year longitudinal study. Multiple Sclerosis Journal 2017;23(Suppl 3):292-3. [Google Scholar]
  3. Rocca MA, Sormani MP, Rovaris M, Caputo D, Ghezzi A, Montanari E, et al. Long-term disability progression in primary progressive multiple sclerosis: a 15-year study. Brain 2017;140(11):2814-9. [DOI: 10.1093/brain/awx250] [DOI] [PubMed] [Google Scholar]

Rovaris 2006 {published data only}

  1. Rovaris M, Judica E, Gallo A, Benedetti B, Sormani MP, Caputo D, et al. Grey matter damage predicts the evolution of primary progressive multiple sclerosis at 5 years. Brain 2006;129(Pt 10):2628-34. [DOI: ] [DOI] [PubMed] [Google Scholar]

Runia 2014 {published data only}

  1. Runia TF, Jafari N, Siepman DAM, Nieboer D, Steyerberg E, et al. A clinical prediction model for definite multiple sclerosis in patients with clinically isolated syndrome. Multiple Sclerosis 2014;20(Suppl 1):404. [Google Scholar]
  2. Runia TF. Multiple Sclerosis - Predicting the Next Attack [Dissertation]. Rotterdam (Netherlands): Erasmus University Rotterdam, 2015. [Google Scholar]

Seccia 2020 {published data only}

  1. Seccia R, Gammelli D, Dominici F, Romano S, Landi AC, Salvetti M, et al. Considering patient clinical history impacts performance of machine learning models in predicting course of multiple sclerosis. PLOS One 2020;15(3):e0230219. [DOI: 10.1371/journal.pone.0230219] [DOI] [PMC free article] [PubMed] [Google Scholar]

Skoog 2014 {published data only}

  1. Skoog B, Runmarker B, Oden A, Andersen O. Multiple sclerosis: a method to identify high risk for secondary progression. Neurology 2012;78(Suppl 1):P05.089. [Google Scholar]
  2. Skoog B, Tedeholm H, Runmarker B, Oden A, Andersen O. Continuous prediction of secondary progression in the individual course of multiple sclerosis. Multiple Sclerosis and Related Disorders 2014;3(5):584-92. [DOI: 10.1016/j.msard.2014.04.004] [DOI] [PubMed] [Google Scholar]
  3. Tedeholm H, Skoog B, Andersen O. A method to identify the risk of transition to the secondary progressive course in multiple sclerosis patients. Neurology 2013;80(Suppl 7):P04.131. [Google Scholar]
  4. Tedeholm H, Skoog B, Runmarker B, Oden A, Andersen O. A new method to identify multiple sclerosis patients with a high risk for secondary progression. Multiple Sclerosis Journal 2012;18(Suppl 4):91. [Google Scholar]

Skoog 2019 {published data only}

  1. Skoog B, Link J, Tedeholm H, Longfils M, Nerman O, Fagius J, et al. Short-term prediction of secondary progression in a sliding window: a test of a predicting algorithm in a validation cohort. Multiple Sclerosis Journal - Experimental, Translational and Clinical 2019;5(3):2055217319875466. [DOI: 10.1177/2055217319875466] [DOI] [PMC free article] [PubMed] [Google Scholar]

Sombekke 2010 {published data only}

  1. Sombekke MH, Arteta D, de Wiel MA, Crusius JB, Tejedor D, Killestein J, et al. Analysis of multiple candidate genes in association with phenotypes of multiple sclerosis. Multiple Sclerosis 2010;16(6):652-9. [DOI: 10.1177/1352458510364633] [DOI] [PubMed] [Google Scholar]

Sormani 2007 {published data only}

  1. Sormani MP, Rovaris M, Comi G, Filippi M. A composite score to predict short-term disease activity in patients with relapsing-remitting MS. Neurology 2007;69(12):1230-5. [DOI: 10.1212/01.wnl.0000276940.90309.15] [DOI] [PubMed] [Google Scholar]

Spelman 2017 {published data only}

  1. Spelman T, Meyniel C, Rojas JI, Lugaresi A, Izquierdo G, Grand'Maison F, et al. Quantifying risk of early relapse in patients with first demyelinating events: prediction in clinical practice. Multiple Sclerosis Journal 2017;23(10):1346-57. [DOI: 10.1177/1352458516679893] [DOI] [PubMed] [Google Scholar]

Szilasiová 2020 {published data only}

  1. Szilasiová J, Rosenberger J, Mikula P, Vitková M, Fedičová M, Gdovinová Z. Cognitive event-related potentials-the P300 wave is a prognostic factor of long-term disability progression in patients with multiple sclerosis. Journal of Clinical Neurophysiology 2020 Oct 05 [Epub ahead of print]. [DOI: 10.1097/WNP.0000000000000788] [DOI] [PubMed]

Tacchella 2018 {published data only}

  1. Tacchella A, Romano S, Ferraldeschi M, Salvetti M, Zaccaria A, Crisanti A, et al. Collaboration between a human group and artificial intelligence can improve prediction of multiple sclerosis course: a proof-of-principle study. F1000Research 2017;6:2172. [DOI: 10.12688/f1000research.13114.2] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Tacchella A, Romano S, Ferraldeschi M, Salvetti M, Zaccaria A, Crisanti A, et al. Collaboration between a human group and artificial intelligence can improve prediction of multiple sclerosis course: a proof-of-principle study. F1000Research 2018;6:2172. [DOI: 10.12688/f1000research.13114.1] [DOI] [PMC free article] [PubMed] [Google Scholar]

Tommasin 2021 {published data only}

  1. Tommasin S, Cocozza S, Taloni A, Gianni C, Petsas N, Pontillo G, et al. Machine learning classifier to identify clinical and radiological features relevant to disability progression in multiple sclerosis. Journal of Neurology 2021;268(12):4834-45. [DOI] [PMC free article] [PubMed] [Google Scholar]

Tousignant 2019 {published data only}

  1. Tousignant A, Lemaître P, Precup D, Arnold DL, Arbel T. Prediction of disease progression in multiple sclerosis patients using deep learning analysis of MRI data. Proceedings of Machine Learning Research 2019;102:483-92. [Google Scholar]

Vasconcelos 2020 {published data only}

  1. Aurenção JCK, Vasconcelos CCF, Thuler LCS, Alvarenga RMP. Validation of a clinical risk score for long-term progression of MS. Multiple Sclerosis Journal 2017;23(Suppl 3):740. [Google Scholar]
  2. Vasconcelos CCF, Aurenção JCK, Alvarenga RMP, Thuler LCS. Long-term MS secondary progression: derivation and validation of a clinical risk score. Clinical Neurology and Neurosurgery 2020;194:105792. [DOI: 10.1016/j.clineuro.2020.105792] [DOI] [PubMed] [Google Scholar]
  3. Vasconcelos CCF, Thuler LCS, Calvet Kallenbach Aurenção JCK, Papais-Alvarenga RM. A proposal for a risk score for long-term progression of multiple sclerosis. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/115269.
  4. Vasconcelos CCF, Thuler LCS, Calvet Kallenbach Aurenção JCK, Papais-Alvarenga RM. A proposal for a risk score for long-term progression of multiple sclerosis. Multiple Sclerosis Journal 2015;21(Suppl 11):732. [Google Scholar]

Vukusic 2004 {published data only}

  1. Vukusic S, Hutchinson M, Hours M, Moreau T, Cortinovis-Tourniaire P, Adeleine P, et al. Erratum: Pregnancy and multiple sclerosis (the PRIMS study) - clinical predictors of post-partum relapse. Brain 2004;127(Pt 8):1912. [DOI] [PubMed] [Google Scholar]
  2. Vukusic S, Hutchinson M, Hours M, Moreau T, Cortinovis-Tourniaire P, Adeleine P, et al. Pregnancy and multiple sclerosis (the PRIMS study) - clinical predictors of post-partum relapse. Brain 2004;127(Pt 6):1353-60. [DOI: 10.1093/brain/awh152] [DOI] [PubMed] [Google Scholar]

Weinshenker 1991 {published data only}

  1. Weinshenker BG, Rice GPA, Noseworthy JH, Carriere W, Baskerville J, Ebers GC. The natural history of multiple sclerosis: a geographically based study. 3. Multivariate analysis of predictive factors and models of outcome. Brain 1991;114(Pt 2):1045-56. [DOI: 10.1093/brain/114.2.1045] [DOI] [PubMed] [Google Scholar]

Weinshenker 1996 {published data only}

  1. Weinshenker BG, Issa M, Baskerville J. Long-term and short-term outcome of multiple sclerosis: a 3-year follow-up study. Archives of Neurology 1996;53(4):353-8. [DOI: 10.1001/archneur.1996.00550040093018] [DOI] [PubMed] [Google Scholar]

Wottschel 2015 {published data only}

  1. Ciccarelli O, Kwok PP, Wottschel V, Chard D, Stromillo ML, De Stefano N, et al. Predicting clinical conversion to multiple sclerosis in patients with clinically isolated syndrome using machine learning techniques. Multiple Sclerosis Journal 2012;18(Suppl 4):30-1. [Google Scholar]
  2. Wottschel V, Alexander DC, Kwok PP, Chard DT, Stromillo ML, De Stefano N, et al. Predicting outcome in clinically isolated syndrome using machine learning. NeuroImage: Clinical 2015;7:281-7. [DOI: 10.1016/j.nicl.2014.11.021] [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Wottschel V, Ciccarelli O, Chard DT, Miller DH, Alexander DC. Prediction of second neurological attack in patients with clinically isolated syndrome using support vector machines. In: 2013 International Workshop on Pattern Recognition in Neuroimaging. 2013:82-5.

Wottschel 2019 {published data only}

  1. Wottschel V, Chard DT, Enzinger C, Filippi M, Frederiksen JL, Gasperini C, et al. SVM recursive feature elimination analyses of structural brain MRI predicts near-term relapses in patients with clinically isolated syndromes suggestive of multiple sclerosis. NeuroImage: Clinical 2019;24:102011. [DOI: 10.1016/j.nicl.2019.102011] [DOI] [PMC free article] [PubMed] [Google Scholar]

Ye 2020 {published data only}

  1. Ye F, Liang J, Li J, Li H, Sheng W. Development and validation of a five-gene signature to predict relapse-free survival in multiple sclerosis. Frontiers in Neurology 2020;11:579683. [DOI] [PMC free article] [PubMed] [Google Scholar]

Yoo 2019 {published data only}

  1. Yoo Y, Tang LW, Brosch T, Li DKB, Metz L, Traboulsee A, et al. Deep Learning and Data Labeling for Medical Applications. Springer, 2016. [Google Scholar]
  2. Yoo Y, Tang LYW, Li DKB, Metz L, Kolind S, Traboulsee AL, et al. Deep learning of brain lesion patterns and user-defined clinical and MRI features for predicting conversion to multiple sclerosis from clinically isolated syndrome. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization 2019;7(3):250-9. [DOI: 10.1080/21681163.2017.1356750] [DOI] [Google Scholar]

Yperman 2020 {published data only}

  1. Yperman J, Becker T, Valkenborg D, Popescu V, Hellings N, Van Wijmeersch B, et al. Machine learning analysis of motor evoked potential time series to predict disability progression in multiple sclerosis. BMC Neurology 2020;20(1):105. [DOI: 10.1186/s12883-020-01672-w] [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Yperman J, Becker T, Valkenborg D, Popescu V, Hellings N, Van Wijmeersch B, et al. Machine learning analysis of motor evoked potential time series to predict disability progression in multiple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):874-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

Zakharov 2013 {published data only}

  1. Zakharov AV, Khinivtseva EV, Poverennova IE, Gindullina EA, Vlasov Ia V, Sineok EV. Assessment of the risk of the transition of a monofocal clinically isolated syndrome to clinically definite multiple sclerosis. Zhurnal Nevrologii i Psikhiatrii Imeni S.S. Korsakova 2013;113(2 Pt 2):28-31. [PMID: ] [PubMed] [Google Scholar]

Zhang 2019 {published data only}

  1. Zhang H, Alberts E, Pongratz V, Mühlau M, Zimmer C, Wiestler B, et al. Predicting conversion from clinically isolated syndrome to multiple sclerosis - an imaging-based machine learning approach. NeuroImage: Clinical 2019;21:101593. [DOI: 10.1016/j.nicl.2018.11.003] [DOI] [PMC free article] [PubMed] [Google Scholar]

Zhao 2020 {published data only}

  1. Chitnis T, Zhao Y, Healy BC, Rotstein D, Guttmann CRG, Bakshi R, et al. Predicting clinical course in multiple sclerosis using machine learning. In: 6th Joint European Committee for Treatment and Research in Multiple Sclerosis-Americas Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS-ACTRIMS); 2014 September 10-13; Boston (MA). ECTRIMS, 2014. Available at onlinelibrary.ectrims-congress.eu/ectrims/2014/ACTRIMS-ECTRIMS2014/64470.
  2. Chitnis T, Zhao Y, Healy BC, Rotstein D, Guttmann CRG, Bakshi R, et al. Predicting clinical course in multiple sclerosis using machine learning. Multiple Sclerosis Journal 2014;20(Suppl 1):404. [Google Scholar]
  3. Zhao Y, Chitnis T, Doan T. Ensemble learning for predicting multiple sclerosis disease course. Multiple Sclerosis Journal 2019;25(Suppl 1):160-1. [Google Scholar]
  4. Zhao Y, Healy BC, Rotstein D, Guttmann CR, Bakshi R, Weiner HL, et al. Exploration of machine learning techniques in predicting multiple sclerosis disease course. PLOS One 2017;12(4):e0174866. [DOI: 10.1371/journal.pone.0174866] [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Zhao Y, Wang T, Bove R, Cree B, Henry R, Lokhande H, et al. Ensemble learning predicts multiple sclerosis disease course in the SUMMIT study. NPJ Digital Medicine 2020;3:135. [DOI: 10.1038/s41746-020-00361-9] [DOI] [PMC free article] [PubMed] [Google Scholar]

References to studies excluded from this review

Achiron 2006 {published data only}

  1. Achiron A. Measuring disability progression in multiple sclerosis. Journal of Neurology 2006;253(6):vi31-6. [Google Scholar]

Ahlbrecht 2016 {published data only}

  1. Ahlbrecht J, Martino F, Pul R, Skripuletz T, Suhs KW, Schauerte C, et al. Deregulation of microRNA-181c in cerebrospinal fluid of patients with clinically isolated syndrome is associated with early conversion to relapsing-remitting multiple sclerosis. Multiple Sclerosis Journal 2016;22(9):1202-14. [DOI: 10.1177/1352458515613641] [DOI] [PubMed] [Google Scholar]

Andersen 2015 {published data only}

  1. Andersen O, Skoog B, Runmarker B, Lisovskaja V, Nerman O, Tedeholm H. Fifty years untreated prognosis of multiple sclerosis based on an incidence cohort. European Journal of Neurology 2015;22(Suppl 1):25. [Google Scholar]

Azevedo 2019 {published data only}

  1. Azevedo C, Cen S, Zheng L, Jaberzadeh A, Pelletier D. Minimum clinically important difference for brain atrophy measures in multiple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):697-8. [Google Scholar]

Barkhof 1997 {published data only}

  1. Barkhof F, Filippi M, Miller DH, Scheltens P, Campi A, Polman CH, et al. Comparison of MRI criteria at first presentation to predict conversion to clinically definite multiple sclerosis. Brain 1997;120(Pt 11):2059-69. [DOI: 10.1093/brain/120.11.2059] [DOI] [PubMed] [Google Scholar]

Brettschneider 2006 {published data only}

  1. Brettschneider J, Petzold A, Junker A, Tumani H. Axonal damage markers in the cerebrospinal fluid of patients with clinically isolated syndrome improve predicting conversion to definite multiple sclerosis. Multiple Sclerosis Journal 2006;12(2):143-8. [DOI] [PubMed] [Google Scholar]

Bsteh 2021 {published data only}

  1. Bsteh G, Hegen H, Riedl K, Altmann P, Auer M, Berek K, et al. Quantifying the risk of disease reactivation after interferon and glatiramer acetate discontinuation in multiple sclerosis: the VIAADISC score. European Journal of Neurology 2021;28(5):1609-16. [DOI] [PMC free article] [PubMed] [Google Scholar]

Castellaro 2015 {published data only}

  1. Castellaro M, Bertoldo A, Morra A, Monaco S, Calabrese M, Doyle O. Prediction of conversion to secondary progression phase in multiple sclerosis. Multiple Sclerosis Journal 2015;23:198-9. [Google Scholar]

Chalkou 2021 {published data only}

  1. Chalkou K, Steyerberg E, Egger M, Manca A, Pellegrini F, Salanti G. A two-stage prediction model for heterogeneous effects of treatments. Statistics in Medicine 2021;40(20):4362-75. [DOI] [PMC free article] [PubMed] [Google Scholar]

Costa 2017 {published data only}

  1. Costa GD, Di Maggio G, Sangalli F, Moiola L, Colombo B, Comi G, et al. Prognostic factors for multiple sclerosis in patients with spinal isolated syndromes. European Journal of Neurology 2017;24:62. [Google Scholar]

Cutter 2014 {published data only}

  1. Cutter G, Wolinsky JS, Comi G, Ladkani D, Knappertz V, Vainstein A, et al. Indirect comparison of glatiramer acetate 40mg/mL TIW and 20mg/mL QD dosing regimen effects on relapse rate: results of a predictive statistical model. Multiple Sclerosis Journal 2014;20:112. [Google Scholar]

Damasceno 2019 {published data only}

  1. Damasceno A, Pimentel-Silva LR, Damasceno BP, Cendes F. Cognitive trajectories in relapsing–remitting multiple sclerosis: a longitudinal 6-year study. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/278685. [DOI] [PubMed]
  2. Damasceno A, Pimentel-Silva LR, Damasceno BP, Cendes F. Cognitive trajectories in relapsing–remitting multiple sclerosis: a longitudinal 6-year study. Multiple Sclerosis Journal 2019;26(13):1740-51. [DOI: 10.1177/1352458519878685] [DOI] [PubMed] [Google Scholar]

Daumer 2007 {published data only}

  1. Daumer M, Neuhaus A, Lederer C, Scholz M, Wolinsky JS, Heiderhoff M, et al. Prognosis of the individual course of disease - steps in developing a decision support tool for multiple sclerosis. BMC Medical Informatics and Decision Making 2007;7:11. [DOI: 10.1186/1472-6947-7-11] [DOI] [PMC free article] [PubMed] [Google Scholar]

Dekker 2019 {published data only}

  1. Dekker I, Eijlers AJC, Popescu V, Balk LJ, Vrenken H, Wattjes MP, et al. Predicting clinical progression in multiple sclerosis after 6 and 12 years. European Journal of Neurology 2019;26(6):893-902. [DOI] [PMC free article] [PubMed] [Google Scholar]

Esposito 2011 {published data only}

  1. Esposito M, De Falco I, De Pietro G. An evolutionary-fuzzy DSS for assessing health status in multiple sclerosis disease. International Journal of Medical Informatics 2011;80(12):e245-54. [DOI] [PubMed] [Google Scholar]

Filippi 2010 {published data only}

  1. Filippi M, Rocca MA, Calabrese M, Sormani MP, Rinaldi F, Perini P, et al. Intracortical lesions and new magnetic resonance imaging diagnostic criteria for multiple sclerosis. Multiple Sclerosis Journal 2010;16:S42. [DOI] [PubMed] [Google Scholar]

Filippi 2013 {published data only}

  1. Filippi M, Preziosa P, Copetti M, Riccitelli G, Horsfield MA, Martinelli V, et al. Gray matter damage predicts the accumulation of disability 13 years later in MS. Neurology 2013;81(20):1759-67. [DOI] [PubMed] [Google Scholar]

Fuchs 2021 {published data only}

  1. Fuchs TA, Dwyer MG, Jakimovski D, Bergsland N, Ramasamy DP, Weinstock-Guttman B, et al. Quantifying disease pathology and predicting disease progression in multiple sclerosis with only clinical routine T2-FLAIR MRI. NeuroImage: Clinical 2021;31:102705. [DOI] [PMC free article] [PubMed] [Google Scholar]

Gasperini 2021 {published data only}

  1. Gasperini C, Prosperini L, Rovira A, Tintore M, Sastre-Garriga J, Tortorella C, et al. Scoring the 10-year risk of ambulatory disability in multiple sclerosis: the RoAD score. European Journal of Neurology 2021;28(8):2533-42. [DOI] [PubMed] [Google Scholar]
  2. Gasperini C, Prosperini L, Tortorella C, Haggiag S, Ruggieri S, Mancinelli CR, et al. Scoring the 10-year risk of ambulatory disability in DMD treated multiple sclerosis patients: the RoAD score. In: 34th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2018 October 10-12; Berlin (Germany). ECTRIMS, 2018. Available at onlinelibrary.ectrims-congress.eu/ectrims/2018/ectrims-2018/231905.
  3. Gasperini C, Prosperini L, Tortorella C, Haggiag S, Ruggieri S, Mancinelli CR, et al. Scoring the 10-year risk of ambulatory disability in DMD treated multiple sclerosis patients: the RoAD score. Multiple Sclerosis Journal 2018;24(Suppl 2):58. [Google Scholar]

Gomez‐Gonzalez 2010 {published data only}

  1. Gomez-Gonzalez E, Garcia-Sanchez MI, Izquierdo-Ayuso G, Coca De La Torre A, Ramirez-Martinez D, Marco-Ramirez AM, et al. Application of image and signal processing algorithms to oligoclonal IgG bands classification. Multiple Sclerosis Journal 2010;16:341-2. [Google Scholar]

Hakansson 2017 {published data only}

  1. Hakansson I, Tisell A, Cassel P, Blennow K, Zetterberg H, Lundberg P, et al. Neurofilament light chain in cerebrospinal fluid and prediction of disease activity in clinically isolated syndrome and relapsing-remitting multiple sclerosis. European Journal of Neurology 2017;24(5):703-12. [DOI] [PubMed] [Google Scholar]

Ho 2013 {published data only}

  1. Ho J, Ghosh J, Unnikrishnan K. Risk prediction of a multiple sclerosis diagnosis. In: 2013 IEEE International Conference on Healthcare Informatics. 2013:175-83.

Ignatova 2018 {published data only}

  1. Ignatova V, Todorova L, Haralanov L. Predictors of long term disability progression in patients with relapsing remitting multiple sclerosis. Multiple Sclerosis Journal 2018;24(Suppl 2):788-9. [Google Scholar]

Invernizzi 2011 {published data only}

  1. Invernizzi P, Bertolasi L, Bianchi MR, Turatti M, Gajofatto A, Benedetti MD. Prognostic value of multimodal evoked potentials in multiple sclerosis: the EP score. Journal of Neurology 2011;258(11):1933-9. [DOI] [PubMed] [Google Scholar]

Jackson 2020 {published data only}

  1. Jackson KC, Sun K, Barbour C, Hernandez D, Kosa P, Tanigawa M, et al. Genetic model of MS severity predicts future accumulation of disability. Annals of Human Genetics 2020;84(1):1-10. [DOI: 10.1111/ahg.12342] [DOI] [PMC free article] [PubMed] [Google Scholar]

Kalincik 2013 {published data only}

  1. Kalincik T, Guttmann CR, Krasensky J, Vaneckova M, Lelkova P, Tyblova M, et al. Multiple sclerosis susceptibility loci do not alter clinical and MRI outcomes in clinically isolated syndrome. Genes & Immunity 2013;14(4):244-8. [DOI: 10.1038/gene.2013.17] [DOI] [PubMed] [Google Scholar]

Leocani 2017 {published data only}

  1. Leocani L, Pisa M, Bianco M, Guerrieri S, Di Maggio G, Romeo M, et al. Multimodal EPs predict no evidence of disease activity at two years of first line multiple sclerosis treatment. Neurology 2017;88(Suppl 16):P4.386. [Google Scholar]

Morelli 2020 {published data only}

  1. Morelli ME, Baldini S, Sartori A, D'Acunto L, Dinoto A, Bosco A, et al. Early putamen hypertrophy and ongoing hippocampus atrophy predict cognitive performance in the first ten years of relapsing-remitting multiple sclerosis. Neurological Sciences 2020;41(10):2893-904. [DOI] [PubMed] [Google Scholar]

Palace 2013 {published data only}

  1. Palace J, Bregenzer T, Tremlett H, Duddy M, Boggild M, Zhu F, et al. Modelling natural history for the UK multiple sclerosis risk-sharing scheme. Multiple Sclerosis Journal 2013;19(Suppl 1):339. [Google Scholar]

Pappalardo 2020 {published data only}

  1. Pappalardo F, Russo G, Pennisi M, Parasiliti Palumbo GA, Sgroi G, Motta S, et al. The potential of computational modeling to predict disease course and treatment response in patients with relapsing multiple sclerosis. Cells 2020;9(3):586. [DOI] [PMC free article] [PubMed] [Google Scholar]

Petrou 2018 {published data only}

  1. Petrou P, Yagmour N, Karussis D. Biomarkers for diagnosis and prognosis in multiple sclerosis. Multiple Sclerosis Journal 2018;24:15. [Google Scholar]

Preziosa 2015 {published data only}

  1. Preziosa P, Rocca M, Mesaros S, Copetti M, Petrolini M, Drulovic J, et al. Different MRI measures predict clinical deterioration and cognitive impairment in MS: a 5 year longitudinal study. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/116658.

Rajda 2019 {published data only}

  1. Rajda C, Galla Z, Polyák H, Maróti Z, Babarczy K, Pukoli D, et al. High neurofilament light chain and high quinolinic acid levels in the CSF of patients with multiple sclerosis are independent predictors of active, disabling disease. Multiple Sclerosis Journal 2019;25:856. [Google Scholar]

Rio 2019 {published data only}

  1. Rio J, Rovira A, Gasperini C, Tintore M, Prosperini L, Otero-Romero S, et al. Treatment response scoring systems to assess long term prognosis in relapsing-remitting multiple sclerosis patients. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279564.
  2. Rio J, Rovira A, Gasperini C, Tintore M, Prosperini L, Otero-Romero S, et al. Treatment response scoring systems to assess long term prognosis in relapsing-remitting multiple sclerosis patients. Multiple Sclerosis Journal 2019;25:121-2. [Google Scholar]

Rodriguez 2012 {published data only}

  1. Rodriguez JD, Perez A, Arteta D, Tejedor D, Lozano JA. Using multidimensional Bayesian network classifiers to assist the treatment of multiple sclerosis. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2012;42(6):1705-15. [DOI: 10.1109/TSMCC.2012.2217326] [DOI] [Google Scholar]

Rothman 2016 {published data only}

  1. Rothman AM, Button J, Balcer LJ, Frohman EM, Frohman TC, Reich DS, et al. Retinal measurements predict 10-year disability in multiple sclerosis. In: 32nd European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2016 September 14-17; London (UK). ECTRIMS, 2016. Available at onlinelibrary.ectrims-congress.eu/ectrims/2016/32nd/146960.
  2. Rothman AM, Button J, Balcer LJ, Frohman EM, Frohman TC, Reich DS, et al. Retinal measurements predict 10-year disability in multiple sclerosis. Multiple Sclerosis Journal 2016;22:20-1. [Google Scholar]

Roura 2018 {published data only}

  1. Roura E, Maclair G, Martinez-Lapiscina EH, Andorra M, Villoslada P. Brain complexity and damage in patients with multiple sclerosis using fractal analysis: a new imaging outcome for monitoring MS severity. Multiple Sclerosis Journal 2018;24(2):210. [Google Scholar]

Sbardella 2011 {published data only}

  1. Sbardella E, Tomassini V, Stromillo ML, Filippini N, Battaglini M, Ruggieri S, et al. Pronounced focal and diffuse brain damage predicts short-term disease evolution in patients with clinically isolated syndrome suggestive of multiple sclerosis. Multiple Sclerosis Journal 2011;17(12):1432-40. [DOI] [PubMed] [Google Scholar]

Schlaeger 2012 {published data only}

  1. Schlaeger R, D'Souza M, Schindler C, Grize L, Dellas S, Radue EW, et al. Prediction of long-term disability in multiple sclerosis. Multiple Sclerosis Journal 2012;18(1):31-8. [DOI] [PubMed] [Google Scholar]

Srinivasan 2020 {published data only}

  1. Srinivasan J, Gudesblatt M. Multiple sclerosis management: predicting disease trajectory of multiple sclerosis on multi-dimensional data including digital cognitive assessments and patient reported outcomes using machine learning techniques. In: 5th Annual Americas Committee for Treatment and Research in Multiple Sclerosis (ACTRIMS); 2020 February 27-29; West Palm Beach (FL). West Palm Beach (FL): ACTRIMS, 2020.

Tintore 2015 {published data only}

  1. Tintoré M. Predicting MS extremes: benign and aggressive. Multiple Sclerosis 2015;23:56. [Google Scholar]

Tomassini 2019 {published data only}

  1. Tomassini V, Fanelli F, Prosperini L, Cerqua R, Cavalla P, Pozzilli C. Predicting the profile of increasing disability in multiple sclerosis. Multiple Sclerosis Journal 2019;25(9):1306-15. [DOI] [PMC free article] [PubMed] [Google Scholar]

Tossberg 2013 {published data only}

  1. Tossberg JT, Crooke PS, Henderson MA, Sriram S, Mrelashvili D, Vosslamber S, et al. Using biomarkers to predict progression from clinically isolated syndrome to multiple sclerosis. Journal of Clinical Bioinformatics 2013;3(1):18. [DOI] [PMC free article] [PubMed] [Google Scholar]

Uher 2017a {published data only}

  1. Uher T, Vaneckova M, Sobisek L, Tyblova M, Seidl Z, Krasensky J, et al. Combining clinical and magnetic resonance imaging markers enhances prediction of 12-year disability in multiple sclerosis. Multiple Sclerosis Journal 2017;23(1):51-61. [DOI] [PubMed] [Google Scholar]

Uher 2017b {published data only}

  1. Uher T, Vaneckova M, Sormani MP, Krasensky J, Sobisek L, Dusankova JB, et al. Identification of multiple sclerosis patients at highest risk of cognitive impairment using an integrated brain magnetic resonance imaging assessment approach. European Journal of Neurology 2017;24(2):292-301. [DOI] [PubMed] [Google Scholar]

Veloso 2014 {published data only}

  1. Veloso M. A web-based decision support tool for prognosis simulation in multiple sclerosis. Multiple Sclerosis and Related Disorders 2014;3(5):575-83. [DOI] [PubMed] [Google Scholar]

Vukusic 2006 {published data only}

  1. Vukusic S, Confavreux C. Pregnancy and multiple sclerosis: the children of PRIMS. Clinical Neurology and Neurosurgery 2006;108(3):266-70. [DOI] [PubMed] [Google Scholar]

Wahid 2019 {published data only}

  1. Wahid K, Charron O, Colen R, Shinohara RT, Kotrotsou A, Papadimitropoulos G, et al. Prediction of disability and treatment response from radiomic features: a machine learning analysis from the CombiRx multi-center cohort. Multiple Sclerosis Journal 2019;25:112-3. [Google Scholar]

Zephir 2009 {published data only}

  1. Zephir H, Lefranc D, Dubucquoi S, Seze J, Boron L, Prin L, et al. Serum IgG repertoire in clinically isolated syndrome predicts multiple sclerosis. Multiple Sclerosis Journal 2009;15(5):593-600. [DOI] [PubMed] [Google Scholar]

Ziemssen 2019 {published data only}

  1. Ziemssen T, Piani-Meier D, Bennett B, Johnson C, Tinsley K, Trigg A, et al. Validation of the scoring algorithm for a novel integrative MS progression discussion tool. European Journal of Neurology 2019;26:872. [Google Scholar]

References to studies awaiting assessment

Achiron 2007 {published data only}

  1. Achiron A, Gurevich M, Snir Y, Segal E, Mandel M. Zinc-ion binding and cytokine activity regulation pathways predicts outcome in relapsing-remitting multiple sclerosis. Clinical and Experimental Immunology 2007;149(2):235-42. [DOI: 10.1111/j.1365-2249.2007.03405.x] [DOI] [PMC free article] [PubMed] [Google Scholar]

Behling 2019 {published data only}

  1. Behling M, Bryant A, Brecht T, Cerf S, Gliklich R, Su Z. Predicting relapse episodes in patients with multiple sclerosis treated with disease modifying therapies in a large representative real-world cohort in the United States. Pharmacoepidemiology and Drug Safety 2019;28(Suppl 2):130. [Google Scholar]

Castellazzi 2019 {published data only}

  1. Castellazzi G, Martinelli D, Collorone S, Alhamadi A, Debernard L, Melzer TR, et al. A clinical decision system based on resting state fMRI-derived features to predict the conversion of CIS to RRMS. Multiple Sclerosis Journal 2019;25(Suppl 2):686-7. [Google Scholar]
  2. Castellazzi G, Martinelli D, Collorone S, Alhamadi A, Debernard L, Melzer TR, et al. A clinical decision system based on resting state fMRI-derived features to predict the conversion of CIS to RRMS. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/278467.

Chaar 2019 {published data only}

  1. Chaar D, Kakara M, Razmjou S, Bernitsas E. Predicting EDSS in MS through imaging biomarkers using artificial neural networks. Neurology 2019;92(Suppl 15):P5.2-010. [Google Scholar]

Dalla Costa 2014 {published data only}

  1. Dalla Costa G, Moiola L, Leocani L, Furlan R, Filippi M, Comi G, et al. Artificial intelligence techniques in the diagnosis of clinically definite multiple sclerosis. Multiple Sclerosis Journal 2014;20(Suppl 1):170. [Google Scholar]

Ghosh 2009 {published data only}

  1. Ghosh P, Neuhaus A, Daumer M, Basu S. Joint modelling of multivariate longitudinal data for mixed responses and survival in multiple sclerosis. Multiple Sclerosis 2009;15:S157-8. [Google Scholar]

Kister 2015 {published data only}

  1. Kister I, Bacon T, Levinas M, Green R, Cutter G, Chamot E. Stability and prognostic utility of patient-derived MS severity score (P-MSSS) among MS clinic patients. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/115800.
  2. Kister I, Bacon T, Levinas M, Green R, Cutter G, Chamot E. Stability and prognostic utility of patient-derived MS severity score (P-MSSS) among MS clinic patients. Multiple Sclerosis Journal 2015;21(Suppl 11):410-1. [Google Scholar]
  3. Kister I, Cutter G, Salter A, Herbert J, Chamot E. Novel, easy-to-use prediction tool accurately estimates probability of “aggressive MS” at 2-year follow up. Neurology 2015;84(Suppl 14):P3.214. [Google Scholar]

Mallucci 2019 {published data only}

  1. Mallucci G, Trivelli L, Colombo E, Trojano M, Amato MP, Zaffaroni M, et al. The RECIS (risk estimate in CIS) study: a novel model to early predict clinically isolated syndrome evolution. Multiple Sclerosis Journal 2019;25(Suppl 2):405-6. [Google Scholar]

Medin 2016 {published data only}

  1. Medin J, Joyeux A, Braune S, Bergmann A, Rigg J, Wang L. Predicting disease activity for patients with relapsing remitting multiple sclerosis using electronic medical records. In: American Academy of Neurology Annual Meeting; 2016 April 15-21; Vancouver (Canada). 2016. Available at neurotransdata.com/images/publikationen/2016-predicting-disease-activity-aan.pdf.
  2. Medin J, Joyeux A, Braune S, Bergmann A, Rigg J, Wang L. Predicting disease activity for patients with relapsing remitting multiple sclerosis using electronic medical records. Neurology 2016;86(Suppl 16):P1.395. [Google Scholar]

Pareto 2017 {published data only}

  1. Pareto D, Garcia A, Huerga E, Auger C, Sastre-Garriga J, Tintore M, et al. Pattern recognition for neuroimaging toolbox PRoNTo: a pilot study in predicting clinically isolated syndrome conversion. Multiple Sclerosis Journal 2017;23(Suppl 3):231-2. [Google Scholar]

Sharmin 2020 {published data only}

  1. Sharmin S, Bovis F, Malpas C, Horakova D, Havrdova E, Ayuso GI, et al. Predicting long-term sustained disability progression in multiple sclerosis. Neurology 2020;94(Suppl 15):2002. [Google Scholar]
  2. Sharmin S, Bovis F, Sormani MP, Butzkueven H, Kalincik T. Predicting long-term sustained disability progression in multiple sclerosis: application in the clarity trial. Multiple Sclerosis Journal 2020;26(Suppl 3):181. [Google Scholar]
  3. Sharmin S, Malpas C, Horakova D, Havrdova EK, Izquierdo G, Eichau S, et al. Predicting long-term sustained disability progression in multiple sclerosis. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279563.
  4. Sharmin S, Malpas C, Horakova D, Havrdova EK, Izquierdo G, Eichau S, et al. Predicting long-term sustained disability progression in multiple sclerosis. Multiple Sclerosis Journal 2019;25(Suppl 2):119-21. [Google Scholar]
  5. Sharmin S. Follow-up for Cochrane review - prognostic prediction models in multiple sclerosis [pers comm]. Email to: On BI 12 April 2021.

Silva 2017 {published data only}

  1. Silva D, Meier DP, Ritter S, Davorka T, Medin J, Lange M, et al. Multiple sclerosis care optimization tool (MS-COT): a clinical application prototype to predict future disease activity. Neurology 2017;88(16):P1.368. [Google Scholar]
  2. Silva D, Meier DP, Ritter S, Tomic D, Medin J, Lange M, et al. Multiple sclerosis care optimization tool (MSCOT): a clinical application prototype to predict future disease activity. In: 69th Congress of the American Academy of Neurology; 2017 April 22-28; Boston (MA). Novartis Pharma AG, 2017. Available at novartis.medicalcongressposters.com/Default.aspx?doc=ac1bf.

Tayyab 2020 {published data only}

  1. Tam R. Follow-up for Cochrane review - prognostic prediction models in multiple sclerosis (Tayyab 2020) [pers comm]. Email to: K Reeve 20 July 2021.
  2. Tayyab M, Metz L, Dvorak A, Kolind S, Au S, Carruthers R, et al. Machine learning of deep grey matter volumes on MRI for predicting new disease activity after a first clinical demyelinating event. Multiple Sclerosis Journal 2020;26(Suppl 3):116-7. [Google Scholar]

Thiele 2009 {published data only}

  1. Thiele A, Lederer C, Neuhaus A, Strobl R, Fahrmeir L, Koch-Henriksen N, et al. Comparison of model-based and matching-based prediction of the annualised relapse-rate of MS-patients. Multiple Sclerosis 2009;15(9):S163. [Google Scholar]

Tintoré 2015 {published data only}

  1. Tintoré M, Río J, Otero-Romero S, Arrambide G, Tur C, Comabella M, et al. Dynamic model for predicting prognosis in CIS patients. In: 31st European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2015 October 7-10; Barcelona (Spain). ECTRIMS, 2015. Available at onlinelibrary.ectrims-congress.eu/ectrims/2015/31st/116690.
  2. Tintoré M, Río J, Otero-Romero S, Arrambide G, Tur C, Comabella M, et al. Dynamic model for predicting prognosis in CIS patients. Multiple Sclerosis Journal 2015;21(Suppl 11):33. [Google Scholar]

Tommasin 2019 {published data only}

  1. Tommasin S, Taloni A, Farrelly FA, Petsas N, Ruggieri S, Gianni C, et al. Evaluation of 5-year disease progression in multiple sclerosis via magnetic-resonance-based deep learning techniques. In: 35th European Committee for Treatment and Research in Multiple Sclerosis (ECTRIMS); 2019 September 11-13; Stockholm (Sweden). ECTRIMS, 2019. Available at onlinelibrary.ectrims-congress.eu/ectrims/2019/stockholm/279263.
  2. Tommasin S, Taloni A, Farrelly FA, Petsas N, Ruggieri S, Gianni C, et al. Evaluation of 5-year disease progression in multiple sclerosis via magnetic-resonance-based deep learning techniques. Multiple Sclerosis Journal 2019;25(Suppl 2):468. [DOI] [PMC free article] [PubMed] [Google Scholar]

Wahid 2018 {published data only}

  1. Wahid K, Colen R, Kotrotsou A, Lincoln J, Narayana PA, Cofield SS, et al. Radiomic prediction of clinical outcome in multiple sclerosis patients from the CombiRx cohort. Multiple Sclerosis Journal 2018;24(Suppl 1):71-2. [Google Scholar]

Additional references

Adelman 2013

  1. Adelman G, Rane SG, Villa KF. The cost burden of multiple sclerosis in the United States: a systematic review of the literature. Journal of Medical Economics 2013;16(5):639-47. [DOI: 10.3111/13696998.2013.778268] [DOI] [PubMed] [Google Scholar]

Altman 2000

  1. Altman DG, Royston P. What do we mean by validating a prognostic model? Statistics in Medicine 2000;19(4):453-73. [DOI] [PubMed] [Google Scholar]

Altman 2014

  1. Altman DG. The time has come to register diagnostic and prognostic research. Clinical Chemistry 2014;60(4):580-2. [DOI: 10.1373/clinchem.2013.220335] [DOI] [PubMed] [Google Scholar]

Attfield 2022

  1. Attfield KE, Jensen LT, Kaufmann M, Friese MA, Fugger L. The immunology of multiple sclerosis. Nature Reviews Immunology 2022;22(12):734-50. [DOI: 10.1038/s41577-022-00718-z] [DOI] [PubMed] [Google Scholar]

Bakshi 2005

  1. Bakshi R, Dandamudi VS, Neema M, De C, Bermel RA. Measurement of brain and spinal cord atrophy by magnetic resonance imaging as a tool to monitor multiple sclerosis. Journal of Neuroimaging 2005;15(4 Suppl):30s-45s. [DOI] [PubMed] [Google Scholar]

Belbasis 2015

  1. Belbasis L, Bellou V, Evangelou E, Ioannidis JPA, Tzoulaki I. Environmental risk factors and multiple sclerosis: an umbrella review of systematic reviews and meta-analyses. Lancet Neurology 2015;14(3):263-73. [DOI: 10.1016/S1474-4422(14)70267-4] [DOI] [PubMed] [Google Scholar]

Bjornevik 2023

  1. Bjornevik K, Münz C, Cohen JI, Ascherio A. Epstein–Barr virus as a leading cause of multiple sclerosis: mechanisms and implications. Nature Reviews Neurology 2023;19(3):160-71. [DOI: 10.1038/s41582-023-00775-5] [DOI] [PubMed] [Google Scholar]

Bluemke 2020

  1. Bluemke DA, Moy L, Bredella MA, Ertl-Wagner BB, Fowler KJ, Goh VJ, et al. Assessing radiology research on artificial intelligence: a brief guide for authors, reviewers, and readers—from the Radiology editorial board. Radiology 2020;294(3):487-89. [DOI: 10.1148/radiol.2019192515] [DOI] [PubMed] [Google Scholar]

Bossuyt 2015

  1. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ 2015;351:h5527. [DOI: 10.1136/bmj.h5527] [DOI] [PMC free article] [PubMed] [Google Scholar]

Boulesteix 2019

  1. Boulesteix A, Janitza S, Hornung R, Probst P, Busen H, Hapfelmeier A. Making complex prediction rules applicable for readers: current practice in random forest literature and recommendations. Biometrical Journal 2019;61(5):1314-28. [DOI: 10.1002/bimj.201700243] [DOI] [PubMed] [Google Scholar]

Bouwmeester 2012

  1. Bouwmeester W, Zuithoff NPA, Mallett S, Geerlings MI, Vergouwe Y, Steyerberg EW, et al. Reporting and methods in clinical prediction research: a systematic review. PLOS Medicine 2012;9(5):e1001221. [DOI: 10.1371/journal.pmed.1001221] [DOI] [PMC free article] [PubMed] [Google Scholar]

Bovis 2019

  1. Bovis F, Carmisciano L, Signori A, Pardini M, Steinerman JR, Li T, et al. Defining responders to therapies by a statistical modeling approach applied to randomized clinical trial data. BMC Medicine 2019;17:113. [DOI: 10.1186/s12916-019-1345-2] [DOI] [PMC free article] [PubMed] [Google Scholar]

Briggs 2019

  1. Briggs FB, Thompson NR, Conway DS. Prognostic factors of disability in relapsing remitting multiple sclerosis. Multiple Sclerosis and Related Disorders 2019;30:9-16. [DOI: 10.1016/j.msard.2019.01.045] [DOI] [PubMed] [Google Scholar]

Briscoe 2020

  1. Briscoe S, Bethel A, Rogers M. Conduct and reporting of citation searching in Cochrane systematic reviews: a cross-sectional study. Research Synthesis Methods 2020;11(2):169-80. [DOI: 10.1002/jrsm.1355] [DOI] [PMC free article] [PubMed] [Google Scholar]

Brown 2020

  1. Brown FS, Glasmacher SA, Kearns PKA, MacDougall N, Hunt D, Connick P, et al. Systematic review of prediction models in relapsing remitting multiple sclerosis. PLOS One 2020;15(5):e0233575. [DOI: 10.1371/journal.pone.0233575] [DOI] [PMC free article] [PubMed] [Google Scholar]

Chatfield 1995

  1. Chatfield C. Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society. Series A (Statistics in Society) 1995;158(3):419-66. [Google Scholar]

Chen 2017

  1. Chen JH, Asch SM. Machine learning and prediction in medicine — beyond the peak of inflated expectations. New England Journal of Medicine 2017;376(26):2507-9. [DOI: 10.1056/NEJMp1702071] [DOI] [PMC free article] [PubMed] [Google Scholar]

Cochrane 2021

  1. Cochrane Multiple Sclerosis and Rare Diseases of the CNS. Our reviews. msrdcns.cochrane.org/our-review (accessed 30 October 2021).

Cohen 1988

  1. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale (NJ): L. Erlbaum Associates, 1988. [Google Scholar]

Collins 2015

  1. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Journal of Clinical Epidemiology 2015;68(2):112-21. [DOI: 10.1016/j.jclinepi.2014.11.010] [DOI] [PubMed] [Google Scholar]

Concato 1993

  1. Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Annals of Internal Medicine 1993;118(3):201-10. [DOI: 10.7326/0003-4819-118-3-199302010-00009] [DOI] [PubMed] [Google Scholar]

Correale 2012

  1. Correale J, Ysrraelit MC, Fiol MP. Benign multiple sclerosis: does it exist? Current Neurology and Neuroscience Reports 2012;12(5):601-9. [DOI: 10.1007/s11910-012-0292-5] [DOI] [PubMed] [Google Scholar]

Cree 2016

  1. Cree BAC, Gourraud P-A, Oksenberg JR, Bevan C, Crabtree-Hartman E, Gelfand JM, et al. Long-term evolution of multiple sclerosis disability in the treatment era. Annals of Neurology 2016;80(4):499-510. [DOI: 10.1002/ana.24747] [DOI] [PMC free article] [PubMed] [Google Scholar]

Cree 2019

  1. Cree BA, Hollenbach JA, Bove R, Kirkish G, Sacco S, Caverzasi E, et al. Silent progression in disease activity-free relapsing multiple sclerosis. Annals of Neurology 2019;85(5):653-66. [DOI: 10.1002/ana.25463] [DOI] [PMC free article] [PubMed] [Google Scholar]

Day 2018

  1. Day GS, Rae-Grant A, Armstrong MJ, Pringsheim T, Cofield SS, Marrie RA. Identifying priority outcomes that influence selection of disease-modifying therapies in MS. Neurology Clinical Practice 2018;8(3):179-85. [DOI: 10.1212/CPJ.0000000000000449] [DOI] [PMC free article] [PubMed] [Google Scholar]

Debray 2017

  1. Debray TP, Damen JA, Snell KI, Ensor J, Hooft L, Reitsma JB, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ 2017;356:i6460. [DOI: 10.1136/bmj.i6460] [DOI] [PubMed] [Google Scholar]

Debray 2019

  1. Debray TP, Damen JA, Riley RR, Snell K, Reitsma JB, Hooft L, et al. A framework for meta-analysis of prediction model studies with binary and time-to-event outcomes. Statistical Methods in Medical Research 2019;28(9):2768-86. [DOI: 10.1177/0962280218785504] [DOI] [PMC free article] [PubMed] [Google Scholar]

Derfuss 2012

  1. Derfuss T. Personalized medicine in multiple sclerosis: hope or reality? BMC Medicine 2012;10:116. [DOI: 10.1186/1741-7015-10-116] [DOI] [PMC free article] [PubMed] [Google Scholar]

Dhiman 2021

  1. Dhiman P, Ma J, Navarro CA, Speich B, Bullock G, Damen JAA, et al. Reporting of prognostic clinical prediction models based on machine learning methods in oncology needs to be improved. Journal of Clinical Epidemiology 2021;138:60-72. [DOI: 10.1016/j.jclinepi.2021.06.024] [DOI] [PMC free article] [PubMed] [Google Scholar]

Diamond 1989

  1. Diamond GA. Future imperfect: the limitations of clinical prediction models and the limits of clinical prediction. Journal of the American College of Cardiology 1989;14(3):A12-22. [DOI: 10.1016/0735-1097(89)90157-5] [DOI] [PubMed] [Google Scholar]

Diaz 2019

  1. Diaz C, Zarco LA, Rivera DM. Highly active multiple sclerosis: an update. Multiple Sclerosis and Related Disorders 2019;30:215-24. [DOI: 10.1016/j.msard.2019.01.039] [DOI] [PubMed] [Google Scholar]

Ferrazzano 2020

  1. Ferrazzano G, Crisafulli SG, Baione V, Tartaglia M, Cortese A, Frontoni M, et al. Early diagnosis of secondary progressive multiple sclerosis: focus on fluid and neurophysiological biomarkers. Journal of Neurology 2021;268(10):3626-45. [DOI: 10.1007/s00415-020-09964-4] [DOI] [PubMed] [Google Scholar]

Foroutan 2020

  1. Foroutan F, Guyatt G, Zuk V, Vandvik PO, Alba AC, Mustafa R, et al. GRADE Guidelines 28: Use of GRADE for the assessment of evidence about prognostic factors: rating certainty in identification of groups of patients with different absolute risks. Journal of Clinical Epidemiology 2020;121:62-70. [DOI: 10.1016/j.jclinepi.2019.12.023] [DOI] [PubMed] [Google Scholar]

Freedman 2016

  1. Freedman MS, Rush CA. Severe, highly active, or aggressive multiple sclerosis. Continuum 2016;22(3):761-84. [DOI: 10.1212/CON.0000000000000331] [DOI] [PubMed] [Google Scholar]

Gafson 2017

  1. Gafson A, Craner MJ, Matthews PM. Personalised medicine for multiple sclerosis care. Multiple Sclerosis Journal 2017;23(3):362-9. [DOI: 10.1177/1352458516672017] [DOI] [PubMed] [Google Scholar]

Gauthier 2007

  1. Gauthier SA, Mandel M, Guttmann CRG, Glanz BI, Khoury SJ, Betensky RA. Predicting short-term disability in multiple sclerosis. Neurology 2007;68(24):2059-65. [DOI: 10.1212/01.wnl.0000264890.97479.b1] [DOI] [PubMed] [Google Scholar]

Ge 2006

  1. Ge Y. Multiple sclerosis: the role of MR imaging. American Journal of Neuroradiology 2006;27(6):1165-76. [PMC free article] [PubMed] [Google Scholar]

Geersing 2012

  1. Geersing G-J, Bouwmeester W, Zuithoff P, Spijker R, Leeflang M, Moons K. Search filters for finding prognostic and diagnostic prediction studies in MEDLINE to enhance systematic reviews. PLOS One 2012;7(2):e32844. [DOI: 10.1371/journal.pone.0032844] [DOI] [PMC free article] [PubMed] [Google Scholar]

Hanley 1982

  1. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143(1):29-36. [DOI] [PubMed] [Google Scholar]

Harrell 1996

  1. Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 1996;15(4):361-87. [DOI] [PubMed] [Google Scholar]

Harrell 2001

  1. Harrell F. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York (NY): Springer-Verlag, 2001. [Google Scholar]

Hastie 2009

  1. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. 2nd edition. New York (NY): Springer, 2009. [Google Scholar]

Havas 2020

  1. Havas J, Leray E, Rollot F, Casey R, Michel L, Lejeune F, et al. Predictive medicine in multiple sclerosis: a systematic review. Multiple Sclerosis and Related Disorders 2020;40:101928. [DOI: 10.1016/j.msard.2020.101928] [DOI] [PubMed] [Google Scholar]

Hemmer 2021

  1. Hemmer B, et al. Diagnosis and therapy of multiple sclerosis, neuromyelitis optica spectrum diseases and MOG-IgG-associated diseases, S2k guideline [Diagnose und Therapie der Multiplen Sklerose, Neuromyelitis-optica-Spektrum-Erkrankungen und MOG-IgG-assoziierten Erkrankungen, S2k-Leitlinie]. In: Deutsche Gesellschaft für Neurologie, editor(s). Leitlinien für Diagnostik und Therapie in der Neurologie; 2021. Available at www.dgn.org/leitlinien (accessed 17 June 2021).

Hempel 2017

  1. Hempel S, Graham GD, Fu N, Estrada E, Chen AY, Miake-Lye I, et al. A systematic review of modifiable risk factors in the progression of multiple sclerosis. Multiple Sclerosis Journal 2017;23(4):525-33. [DOI: 10.1177/1352458517690270] [DOI] [PubMed] [Google Scholar]

Hernández 2004

  1. Hernández AV, Steyerberg EW, Habbema JDF. Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements. Journal of Clinical Epidemiology 2004;57(5):454-60. [DOI: 10.1016/j.jclinepi.2003.09.014] [DOI] [PubMed] [Google Scholar]

Hohlfeld 2016a

  1. Hohlfeld R, Dornmair K, Meinl E, Wekerle H. The search for the target antigens of multiple sclerosis, part 1: autoreactive CD4+ T lymphocytes as pathogenic effectors and therapeutic targets. Lancet Neurology 2016;15(2):198-209. [DOI: 10.1016/S1474-4422(15)00334-8] [DOI] [PubMed] [Google Scholar]

Hohlfeld 2016b

  1. Hohlfeld R, Dornmair K, Meinl E, Wekerle H. The search for the target antigens of multiple sclerosis, part 2: CD8+ T cells, B cells, and antibodies in the focus of reverse-translational research. Lancet Neurology 2016;15(3):317-31. [DOI: 10.1016/S1474-4422(15)00313-0] [DOI] [PubMed] [Google Scholar]

Iorio 2015

  1. Iorio A, Spencer FA, Falavigna M, Alba C, Lang E, Burnand B, et al. Use of GRADE for assessment of evidence about prognosis: rating confidence in estimates of event rates in broad categories of patients. BMJ 2015;350:h870. [DOI: 10.1136/bmj.h870] [DOI] [PubMed] [Google Scholar]

Jarman 2010

  1. Jarman B, Pieter D, Veen AA, Kool RB, Aylin P, Bottle A, et al. The hospital standardised mortality ratio: a powerful tool for Dutch hospitals to assess their quality of care? BMJ Quality & Safety 2010;19(1):9-13. [DOI: 10.1136/qshc.2009.032953] [DOI] [PMC free article] [PubMed] [Google Scholar]

Justice 1999

  1. Justice AC. Assessing the generalizability of prognostic information. Annals of Internal Medicine 1999;130(6):515-24. [DOI: 10.7326/0003-4819-130-6-199903160-00016] [DOI] [PubMed] [Google Scholar]

Kalincik 2017

  1. Kalincik T, Manouchehrinia A, Sobisek L, Jokubaitis V, Spelman T, Horakova D, et al. Towards personalized therapy for multiple sclerosis: prediction of individual treatment response. Brain 2017;140(9):2426-43. [DOI: 10.1093/brain/awx185] [DOI] [PubMed] [Google Scholar]

Kalincik 2018

  1. Kalincik T. Reply: towards personalized therapy for multiple sclerosis: limitations of observational data. Brain 2018;141(5):e39. [DOI] [PubMed] [Google Scholar]

Kaufman 2011

  1. Kaufman S, Rosset S, Perlich C. Leakage in data mining: formulation, detection, and avoidance. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Vol. 6. 2011:556–63. [DOI: 10.1145/2020408.2020496] [DOI]

Korevaar 2020

  1. Korevaar DA, Salameh J-P, Vali Y, Cohen JF, McInnes MDF, Spijker R, et al. Searching practices and inclusion of unpublished studies in systematic reviews of diagnostic accuracy. Research Synthesis Methods 2020;11(3):343-53. [DOI] [PMC free article] [PubMed] [Google Scholar]

Kreuzberger 2020

  1. Kreuzberger N, Damen JAAG, Trivella M, Estcourt LJ, Aldin A, Umlauff L, et al. Prognostic models for newly-diagnosed chronic lymphocytic leukaemia in adults: a systematic review and meta-analysis. Cochrane Database of Systematic Reviews 2020, Issue 7. Art. No: CD012022. [DOI: 10.1002/14651858.CD012022.pub2] [DOI] [PMC free article] [PubMed] [Google Scholar]

Kurtzke 1977

  1. Kurtzke JF, Beebe GW, Nagler B, Kurland LT, Auth TL. Studies on the natural history of multiple sclerosis--8. Early prognostic features of the later course of the illness. Journal of Chronic Diseases 1977;30(12):819-30. [DOI: 10.1016/0021-9681(77)90010-8] [DOI] [PubMed] [Google Scholar]

Lorscheider 2016

  1. Lorscheider J, Buzzard K, Jokubaitis V, Spelman T, Havrdova E, Horakova D, et al. Defining secondary progressive multiple sclerosis. Brain 2016;139(Pt 9):2395-405. [DOI: 10.1093/brain/aww173] [DOI] [PubMed] [Google Scholar]

Lublin 1996

  1. Lublin FD, Reingold SC. Defining the clinical course of multiple sclerosis: results of an international survey. National Multiple Sclerosis Society (USA) Advisory Committee on Clinical Trials of New Agents in Multiple Sclerosis. Neurology 1996;46(4):907-11. [DOI] [PubMed] [Google Scholar]

Lublin 2014

  1. Lublin FD, Reingold SC, Cohen JA, Cutter GR, Sørensen PS, Thompson AJ, et al. Defining the clinical course of multiple sclerosis. Neurology 2014;83(3):278-86. [DOI: 10.1212/WNL.0000000000000560] [DOI] [PMC free article] [PubMed] [Google Scholar]

Mateen 2020

  1. Mateen BA, Liley J, Denniston AK, Holmes CC, Vollmer SJ. Improving the quality of machine learning in health applications and clinical research. Nature Machine Intelligence 2020;2:554-6. [DOI: 10.1038/s42256-020-00239-1] [DOI] [Google Scholar]

McDonald 2001

  1. McDonald WI, Compston A, Edan G, Goodkin D, Hartung HP, Lublin FD, et al. Recommended diagnostic criteria for multiple sclerosis: guidelines from the International Panel on the diagnosis of multiple sclerosis. Annals of Neurology 2001;50(1):121-7. [DOI: 10.1002/ana.1032] [DOI] [PubMed] [Google Scholar]

Meyer‐Moock 2014

  1. Meyer-Moock S, Feng Y-S, Maeurer M, Dippel F-W, Kohlmann T. Systematic literature review and validity evaluation of the Expanded Disability Status Scale (EDSS) and the Multiple Sclerosis Functional Composite (MSFC) in patients with multiple sclerosis. BMC Neurology 2014;14:58. [DOI: 10.1186/1471-2377-14-58] [DOI] [PMC free article] [PubMed] [Google Scholar]

Miller 2008

  1. Miller A, Avidan N, Tzunz-Henig N, Glass-Marmor L, Lejbkowicz I, Pinter RY, et al. Translation towards personalized medicine in multiple sclerosis. Journal of the Neurological Sciences 2008;274(1):68-75. [DOI: 10.1016/j.jns.2008.07.028] [DOI] [PubMed] [Google Scholar]

Moher 2009

  1. Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLOS Medicine 2009;6(7):e1000097. [DOI: 10.1371/journal.pmed.1000097] [DOI] [PMC free article] [PubMed] [Google Scholar]

Montalban 2018

  1. Montalban X, Gold R, Thompson AJ, Otero-Romero S, Amato MP, Chandraratna D, et al. ECTRIMS/EAN Guideline on the pharmacological treatment of people with multiple sclerosis. Multiple Sclerosis Journal 2018;24(2):25. [DOI: 10.1177/1352458517751049] [DOI] [PubMed] [Google Scholar]

Montavon 2012

  1. Montavon G, Orr G, Müller KR. Neural Networks: Tricks of the Trade. Springer, 2012. [Google Scholar]

Moons 2014

  1. Moons KG, Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLOS Medicine 2014;11(10):e1001744. [DOI: 10.1371/journal.pmed.1001744] [DOI] [PMC free article] [PubMed] [Google Scholar]

Moons 2019

  1. Moons KG, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Annals of Internal Medicine 2019;170(1):W1-33. [DOI: 10.7326/M18-1377] [DOI] [PubMed] [Google Scholar]

Newcombe 2006

  1. Newcombe RG. Confidence intervals for an effect size measure based on the Mann–Whitney statistic. Part 2: asymptotic methods and evaluation. Statistics in Medicine 2006;25(4):559-73. [DOI: 10.1002/sim.2324] [DOI] [PubMed] [Google Scholar]

Niculescu‐Mizil 2005

  1. Niculescu-Mizil A, Caruana R. Predicting good probabilities with supervised learning. In: Proceedings of the 22nd International Conference on Machine Learning. 2005:625-32. [DOI: 10.1145/1102351.1102430] [DOI]

Ontaneda 2019

  1. Ontaneda D, Tallantyre E, Kalincik T, Planchon SM, Evangelou N. Early highly effective versus escalation treatment approaches in relapsing multiple sclerosis. Lancet Neurology 2019;18(10):973-80. [DOI: 10.1016/S1474-4422(19)30151-6] [DOI] [PubMed] [Google Scholar]

Optic Neuritis Study Group 1991

  1. Optic Neuritis Study Group. The clinical profile of optic neuritis. Experience of the optic neuritis treatment trial. Archives of Ophthalmology 1991;109(12):1673-8. [DOI: 10.1001/archopht.1991.01080120057025] [DOI] [PubMed] [Google Scholar]

Ouzzani 2016

  1. Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan - a web and mobile app for systematic reviews. Systematic Reviews 2016;5(1):210. [DOI] [PMC free article] [PubMed] [Google Scholar]

Page 2021

  1. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. PLOS Medicine 2021;18(3):e1003583. [DOI: 10.1371/journal.pmed.1003583] [DOI] [PMC free article] [PubMed] [Google Scholar]

Patsopoulos 2019

  1. Patsopoulos NA, Baranzini SE, Santaniello A, Shoostari P, Cotsapas C, Wong G, et al. Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility. Science 2019;365(6460):eaav7188. [DOI: 10.1126/science.aav7188] [DOI] [PMC free article] [PubMed] [Google Scholar]

Peat 2014

  1. Peat G, Riley RD, Croft P, Morley KI, Kyzas PA, Moons KGM, for the PROGRESS Group. Improving the transparency of prognosis research: the role of reporting, data sharing, registration, and protocols. PLOS Medicine 2014;11(7):e1001671. [DOI: 10.1371/journal.pmed.1001671] [DOI] [PMC free article] [PubMed] [Google Scholar]

Platt 1999

  1. Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers 1999;10(3):61-74. [Google Scholar]

Polman 2005

  1. Polman CH, Reingold SC, Edan G, Filippi M, Hartung H-P, Kappos L, et al. Diagnostic criteria for multiple sclerosis: 2005 revisions to the "McDonald Criteria". Annals of Neurology 2005;58(6):840-6. [DOI: 10.1002/ana.20703] [DOI] [PubMed] [Google Scholar]

Polman 2011

  1. Polman CH, Reingold SC, Banwell B, Clanet M, Cohen JA, Filippi M, et al. Diagnostic criteria for multiple sclerosis: 2010 revisions to the McDonald criteria. Annals of Neurology 2011;69(2):292-302. [DOI: 10.1002/ana.22366] [DOI] [PMC free article] [PubMed] [Google Scholar]

Poser 1983

  1. Poser CM, Paty DW, Scheinberg L, McDonald WI, Davis FA, Ebers GC, et al. New diagnostic criteria for multiple sclerosis: guidelines for research protocols. Annals of Neurology 1983;13(3):227-31. [DOI: 10.1002/ana.410130302] [DOI] [PubMed] [Google Scholar]

Rae‐Grant 2018

  1. Rae-Grant A, Day GS, Marrie RA, Rabinstein A, Cree BA, Gronseth GS, et al. Comprehensive systematic review summary: disease-modifying therapies for adults with multiple sclerosis. Neurology 2018;90(17):789-800. [DOI: 10.1212/WNL.0000000000005345] [DOI] [PubMed] [Google Scholar]

Reich 2018

  1. Reich DS, Lucchinetti CF, Calabresi PA. Multiple sclerosis. New England Journal of Medicine 2018;378(2):169-80. [DOI: 10.1056/NEJMra1401483] [DOI] [PMC free article] [PubMed] [Google Scholar]

Riley 2019

  1. Riley RD, Snell KIE, Ensor J, Burke DL, Harrell Jr FE, Moons KGM, et al. Minimum sample size for developing a multivariable prediction model: part II - binary and time-to-event outcomes. Statistics in Medicine 2019;38(7):1276-96. [DOI: 10.1002/sim.7992] [DOI] [PMC free article] [PubMed] [Google Scholar]

Riley 2020

  1. Riley RD, Ensor J, Snell KIE, Harrell FE Jr, Martin GP, Reitsma JB, et al. Calculating the sample size required for developing a clinical prediction model. BMJ 2020;368:m441. [DOI: 10.1136/bmj.m441] [DOI] [PubMed] [Google Scholar]

Roozenbeek 2009

  1. Roozenbeek B, Maas AIR, Lingsma HF, Butcher I, Lu J, Marmarou A, et al. Baseline characteristics and statistical power in randomized controlled trials: selection, prognostic targeting, or covariate adjustment? Critical Care Medicine 2009;37(10):2683-90. [DOI: 10.1097/ccm.0b013e3181ab85ec] [DOI] [PubMed] [Google Scholar]

Rotstein 2019

  1. Rotstein D, Montalban X. Reaching an evidence-based prognosis for personalized treatment of multiple sclerosis. Nature Reviews Neurology 2019;15(5):287-300. [DOI: 10.1038/s41582-019-0170-8] [DOI] [PubMed] [Google Scholar]

Runmarker 1994

  1. Runmarker B, Andersson C, Odén A, Andersen O. Prediction of outcome in multiple sclerosis based on multivariate models. Journal of Neurology 1994;241(10):597-604. [DOI: 10.1007/BF00920623] [DOI] [PubMed] [Google Scholar]

Río 2009

  1. Río J, Comabella M, Montalban X. Predicting responders to therapies for multiple sclerosis. Nature Reviews Neurology 2009;5(10):553-60. [DOI: 10.1038/nrneurol.2009.139] [DOI] [PubMed] [Google Scholar]

Río 2016

  1. Río J, Ruiz-Peña JL. Short-term suboptimal response criteria for predicting long-term non-response to first-line disease modifying therapies in multiple sclerosis: a systematic review and meta-analysis. Journal of the Neurological Sciences 2016;361:158-67. [DOI: 10.1016/j.jns.2015.12.043] [DOI] [PubMed] [Google Scholar]

Sawcer 2011

  1. Sawcer S. The major cause of multiple sclerosis is environmental: genetics has a minor role--no. Multiple Sclerosis 2011;17(10):1174-5. [DOI: 10.1177/1352458511421106] [DOI] [PubMed] [Google Scholar]

Seccia 2021

  1. Seccia R, Romano S, Salvetti M, Crisanti A, Palagi L, Grassi F. Machine learning use for prognostic purposes in multiple sclerosis. Life 2021;11(2):122. [DOI: 10.3390/life11020122] [DOI] [PMC free article] [PubMed] [Google Scholar]

Sekula 2016

  1. Sekula P, Pressler JB, Sauerbrei W, Goebell PJ, Schmitz-Dräger BJ. Assessment of the extent of unpublished studies in prognostic factor research: a systematic review of p53 immunohistochemistry in bladder cancer as an example. BMJ Open 2016;6(8):e009972. [DOI] [PMC free article] [PubMed] [Google Scholar]

Simera 2008

  1. Simera I, Altman DG, Moher D, Schulz KF, Hoey J. Guidelines for reporting health research: the EQUATOR Network's survey of guideline authors. PLOS Medicine 2008;5(6):e139. [DOI: 10.1371/journal.pmed.0050139] [DOI] [PMC free article] [PubMed]

Snell 2020

  1. Snell KIE, Allotey J, Smuk M, Hooper R, Chan C, Ahmed A, et al. External validation of prognostic models predicting pre-eclampsia: individual participant data meta-analysis. BMC Medicine 2020;18(1):302. [DOI: 10.1186/s12916-020-01766-9] [DOI] [PMC free article] [PubMed] [Google Scholar]

Sormani 2013

  1. Sormani MP, Rio J, Tintorè M, Signori A, Li D, Cornelisse P, et al. Scoring treatment response in patients with relapsing multiple sclerosis. Multiple Sclerosis Journal 2013;19(5):605-12. [DOI: 10.1177/1352458512460605] [DOI] [PubMed] [Google Scholar]

Sormani 2016

  1. Sormani MP, Gasperini C, Romeo M, Rio J, Calabrese M, Cocco E, et al. Assessing response to interferon-β in a multicenter dataset of patients with MS. Neurology 2016;87(2):134-40. [DOI: 10.1212/WNL.0000000000002830] [DOI] [PubMed] [Google Scholar]

Sormani 2017

  1. Sormani MP. Prognostic factors versus markers of response to treatment versus surrogate endpoints: three different concepts. Multiple Sclerosis Journal 2017;23(3):378-81. [DOI] [PubMed] [Google Scholar]

Steyerberg 2013

  1. Steyerberg EW, Moons KG, Windt DA, Hayden JA, Perel P, Schroter S, et al. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLOS Medicine 2013;10(2):e1001381. [DOI: 10.1371/journal.pmed.1001381] [DOI] [PMC free article] [PubMed] [Google Scholar]

Steyerberg 2018

  1. Steyerberg EW, Claggett B. Towards personalized therapy for multiple sclerosis: limitations of observational data. Brain 2018;141(5):e38. [DOI] [PubMed] [Google Scholar]

Steyerberg 2019

  1. Steyerberg E. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. 2nd edition. New York (NY): Springer-Verlag, 2019. [Google Scholar]

Thompson 2000

  1. Thompson AJ, Montalban X, Barkhof F, Brochet B, Filippi M, Miller DH, et al. Diagnostic criteria for primary progressive multiple sclerosis: a position paper. Annals of Neurology 2000;47(6):831-35. [DOI] [PubMed] [Google Scholar]

Thompson 2018a

  1. Thompson AJ, Baranzini SE, Geurts J, Hemmer B, Ciccarelli O. Multiple sclerosis. Lancet 2018;391(10130):1622-36. [DOI: 10.1016/S0140-6736(18)30481-1] [DOI] [PubMed] [Google Scholar]

Thompson 2018b

  1. Thompson AJ, Banwell BL, Barkhof F, Carroll WM, Coetzee T, Comi G, et al. Diagnosis of multiple sclerosis: 2017 revisions of the McDonald criteria. Lancet Neurology 2018;17(2):162-73. [DOI: 10.1016/S1474-4422(17)30470-2] [DOI] [PubMed] [Google Scholar]

van der Ploeg 2014

  1. van der Ploeg T, Austin PC, Steyerberg EW. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Medical Research Methodology 2014;14:137. [DOI: 10.1186/1471-2288-14-137] [DOI] [PMC free article] [PubMed] [Google Scholar]

van Munster 2017

  1. van Munster CEP, Uitdehaag BMJ. Outcome measures in clinical trials for multiple sclerosis. CNS Drugs 2017;31(3):217-36. [DOI: 10.1007/s40263-017-0412-5] [DOI] [PMC free article] [PubMed] [Google Scholar]

van Smeden 2018

  1. van Smeden M. Should a risk prediction model be developed? 3 August 2018. https://twitter.com/maartenvsmeden/status/1025315100796899328 (accessed 26 November 2021).

von Elm 2007

  1. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Annals of Internal Medicine 2007;147(8):573-7. [DOI: 10.7326/0003-4819-147-8-200710160-00010] [DOI] [PubMed] [Google Scholar]

Völler 2017

  1. Völler S, Flint RB, Stolk LM, Degraeuwe PLJ, Simons SHP, Pokorna P, et al. Model-based clinical dose optimization for phenobarbital in neonates: an illustration of the importance of data sharing and external validation. European Journal of Pharmaceutical Sciences 2017;109:S90-7. [DOI: 10.1016/j.ejps.2017.05.026] [DOI] [PubMed] [Google Scholar]

Walton 2020

  1. Walton C, King R, Rechtman L, Kaye W, Leray E, Marrie RA, et al. Rising prevalence of multiple sclerosis worldwide: insights from the Atlas of MS. Multiple Sclerosis 2020;26(14):1816-21. [DOI: 10.1177/1352458520970841] [DOI] [PMC free article] [PubMed] [Google Scholar]

Warnke 2019

  1. Warnke C, Havla J, Kitzrow M, Biesalski A-S, Knauss S. Inflammatory diseases [Entzündliche Erkrankungen]. In: Sturm D, Biesalski A-S, Höffken O, editor(s). Neurologische Pathophysiologie: Ursachen und Mechanismen neurologischer Erkrankungen. Berlin, Heidelberg: Springer, 2019:51-98. [DOI: 10.1007/978-3-662-56784-5_2] [DOI] [Google Scholar]

Weinshenker 1989a

  1. Weinshenker BG, Bass B, Rice GP, Noseworthy J, Carriere W, Baskerville J, et al. The natural history of multiple sclerosis: a geographically based study. I. Clinical course and disability. Brain 1989;112(1):133-46. [DOI: 10.1093/brain/112.1.133] [DOI] [PubMed] [Google Scholar]

Wiendl 2021

  1. Wiendl H, Gold R, Berger T, Derfuss T, Linker R, Mäurer M, et al. Multiple Sclerosis Therapy Consensus Group (MSTCG): position statement on disease-modifying therapies for multiple sclerosis (white paper). Therapeutic Advances in Neurological Disorders 2021;14:17562864211039648. [DOI: 10.1177/17562864211039648] [DOI] [PMC free article] [PubMed] [Google Scholar]

Wingerchuk 2016

  1. Wingerchuk DM, Weinshenker BG. Disease modifying therapies for relapsing multiple sclerosis. BMJ 2016;354:i3518. [DOI: 10.1136/bmj.i3518] [DOI] [PubMed] [Google Scholar]

Wolff 2019

  1. Wolff RF, Moons KG, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Annals of Internal Medicine 2019;170(1):51-8. [DOI: 10.7326/M18-1376] [DOI] [PubMed] [Google Scholar]

Wynants 2017

  1. Wynants L, Collins GS, Van Calster B. Key steps and common pitfalls in developing and validating risk models. BJOG: An International Journal of Obstetrics and Gynaecology 2017;124(3):423-32. [DOI: 10.1111/1471-0528.14170] [DOI] [PubMed] [Google Scholar]

Wynants 2020

  1. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 2020;369:m1328. [DOI: 10.1136/bmj.m1328] [DOI] [PMC free article] [PubMed] [Google Scholar]

Zadrozny 2001

  1. Zadrozny B, Elkan C. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In: Proceedings of the Eighteenth International Conference on Machine Learning. 2001:609-16.

References to other published versions of this review

On Seker 2020

  1. On Seker BI, Reeve K, Havla J, Burns J, Gosteli MA, Lutterotti A, et al. Prognostic models for predicting clinical disease progression, worsening and activity in people with multiple sclerosis. Cochrane Database of Systematic Reviews 2020, Issue 5. Art. No: CD013606. [DOI: 10.1002/14651858.CD013606] [DOI] [PMC free article] [PubMed] [Google Scholar]

Data Availability Statement

The dataset summarised in this review is available as tables in the Appendices and in Characteristics of included studies. The R code used for the statistical description is available upon request from the authors.

